Skip to main content

Supported connections for on-premises databases

The metadata-extractor tool supports the following connection types.

These describe the details required when setting up on-premises database access.

Amazon Redshift with basic authentication

Use <<: *redshift under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • PORT - database port (optional, default is 5439)
  • DATABASE - database name (required)
  • USERNAME - database username (required)
  • PASSWORD - database user password (required)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
my-redshift-database:
<<: *extract
environment:
<<: *redshift
USERNAME: my-database-username
PASSWORD: my-database-password
HOST: redshift-host
PORT: redshift-database-port
DATABASE: my-database-name
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/redshift-example:/output

Amazon Redshift with IAM user authentication

Use <<: *redshift under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • PORT - database port (optional, default is 5439)
  • DATABASE - database name (required)
  • DATABASE_USER - database username of the IAM user (required)
  • AWS_ACCESS_KEY_ID - AWS access key ID (required)
  • AWS_SECRET_ACCESS_KEY - AWS secret access key (required)
  • CLUSTER_ID  - cluster identifier of your private Amazon Redshift cluster (optional)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
redshift-iam-user-example:
<<: *extract
environment:
<<: *redshift-iam-user
AWS_ACCESS_KEY_ID: aws-access-key-id
AWS_SECRET_ACCESS_KEY: aws-secret-access-key
HOST: redshift-host
DATABASE: my-database-name
DATABASE_USER: my-database-user
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
CLUSTER_ID: private-cluster-id
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/redshift-iam-user-example:/output

Hive with basic authentication

Use <<: *hive under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • PORT - database port (optional, default is 10000)
  • DEFAULT_SCHEMA - default schema name (optional, default is default)
  • USERNAME - database user name (required)
  • PASSWORD - database user password (required)
  • SCHEMA_EXCLUDE_REGEX - regex to exclude schemas (optional)
  • SCHEMA_INCLUDE_REGEX - regex to include schemas (optional)
  • TEMP_TABLE_REGEX - regex to exclude tables (optional)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

hive-example:
<<: *hive-extract
environment:
<<: *hive
HOST: hive-host
PORT: hive-port
DEFAULT_SCHEMA: default
USERNAME: my-hive-username
PASSWORD: my-hive-password
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/hive-example:/output

Microsoft SQL Server with basic authentication

Use <<: *mssql under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • DATABASE - database name (required)
  • USERNAME - database user name (required)
  • PASSWORD - database user password (required)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
my-mssql-database:
<<: *extract
environment:
<<: *mssql
USERNAME: db-user
PASSWORD: db-user-password
HOST: mssql-database-host
DATABASE: northwind
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/mssql-database:/output

MySQL with basic authentication

Use <<: *mysql under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • PORT - database port (optional, default is 3306)
  • USERNAME - database user name (required)
  • PASSWORD - database user password (required)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
my-mysql-database:
<<: *extract
environment:
<<: *mysql
USERNAME: db-user
PASSWORD: db-user-password
HOST: mysql-database-host.internal
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/my-mysql-database:/output

MySQL with IAM authentication

Use <<: *mysql-iam under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • PORT - database port (optional, default is 3306)
  • USERNAME - database user name (required)
  • AWS_ACCESS_KEY_ID - AWS access key id (required)
  • AWS_SECRET_ACCESS_KEY - AWS secret access key (required)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
my-mysql-database:
<<: *extract
environment:
<<: *mysql-iam
AWS_ACCESS_KEY_ID: my-access-key-id
AWS_SECRET_ACCESS_KEY: my-secret-access-key
USERNAME: db-user
HOST: mysql-database-host.internal
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/my-mysql-database:/output

Oracle with basic authentication

Use <<: *oracledb under the environment section to use this connection type.

Did you know?

The Oracle JDBC driver is downloaded to the image under the Oracle FUTC license.

Available attributes:

  • HOST - database host name or IP address (required)
  • PORT - database port (optional, default is 1521)
  • USERNAME - database user name (required)
  • PASSWORD - database user password (required)
  • SERVICE - SID/service identifier (usually ORCL) or service name (required)
  • DEFAULT_DATABASE_NAME - database name (usually same as SID) (required)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
my-oracledb-database:
<<: *extract
environment:
<<: *oracledb
HOST: oracle-host.internal
USERNAME: db-user
PASSWORD: db-user-password
SERVICE: ORCL
DEFAULT_DATABASE_NAME: ORCL
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/my-oracledb-database:/output

PostgreSQL with basic authentication

Use <<: *psql under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • PORT - database port (optional, default is 5432)
  • DATABASE - database name (required)
  • USERNAME - database user name (required)
  • PASSWORD - database user password (required)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
my-psql-database:
<<: *extract
environment:
<<: *psql
USERNAME: db-user
PASSWORD: db-user-password
HOST: psql-database-host.internal
DATABASE: inventory
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/my-psql-database:/output

PostgreSQL with IAM authentication

Use <<: *psql-iam under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • PORT - database port (optional, default is 5432)
  • DATABASE - database name (required)
  • USERNAME - database user name (required)
  • AWS_ACCESS_KEY_ID - AWS access key id (required)
  • AWS_SECRET_ACCESS_KEY - AWS secret access key (required)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
my-psql-database:
<<: *extract
environment:
<<: *psql-iam
AWS_ACCESS_KEY_ID: my-access-key-id
AWS_SECRET_ACCESS_KEY: my-secret-access-key
USERNAME: db-user
HOST: psql-database-host.internal
DATABASE: inventory
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/my-psql-database:/output

SAP HANA with basic authentication

Use <<: *saphana under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • PORT - database port (optional, default is 39017)
  • USERNAME - database user name (required)
  • PASSWORD - database user password (required)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
my-saphana-database:
<<: *extract
environment:
<<: *saphana
USERNAME: my-database-username
PASSWORD: my-database-password
HOST: my-database-host
PORT: 39017
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/saphana-example:/output

Snowflake with basic authentication

This is currently only supported when using the information schema extraction method to fetch metadata with basic authentication.

Use <<: *snowflake under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • ROLE - warehouse role (required)
  • WAREHOUSE - warehouse name (required)
  • USERNAME - warehouse username (required)
  • PASSWORD - warehouse user password (required)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if schema-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
snowflake-example:
<<: *extract
environment:
<<: *snowflake
USERNAME: my-username
PASSWORD: my-password
HOST: snowflake-host
ROLE: warehouse-role
WAREHOUSE: warehouse-name
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/snowflake-example:/output

Teradata with basic authentication

Use <<: *teradata under the environment section to use this connection type.

Available attributes:

  • HOST - database host name or IP address (required)
  • PORT - database port (optional, default is 1025)
  • USERNAME - database user name (required)
  • PASSWORD - database user password (required)
  • USE_SOURCE_SCHEMA_FILTERING - Boolean to specify if database-level filtering needs to be enabled while fetching schemas, tables, and columns (optional)
  • EXCLUDE_FILTER_TEMPLATE - exclude filter pattern (optional)
  • INCLUDE_FILTER_TEMPLATE - include filter pattern (optional)
  • USE_JDBC_INTERNAL_METHODS - Boolean to specify if JDBC internal methods need to be used as part of the extraction (optional)

Example

services:
my-teradata-database:
<<: *extract
environment:
<<: *teradata
USERNAME: db-user
PASSWORD: db-user-password
HOST: teradata-database-host.internal
# If using Docker Swarm mode for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$$": ["^DB1$$", "^DB2$$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$$": ["^DB1$$", "^DB2$$"]}'
# If using Docker Compose for offline extraction, format the filters as follows:
INCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$": ["^DB1$", "^DB2$"]}'
EXCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$": ["^DB1$", "^DB2$"]}'
USE_SOURCE_SCHEMA_FILTERING: "false"
USE_JDBC_INTERNAL_METHODS: "false"
volumes:
- *shared-jdbc-drivers
- ./output/my-teradata-database:/output