Crawl PostgreSQL
Extract metadata assets from your PostgreSQL database into Atlan.
Prerequisites
Before you begin, verify you have:
- Configured the required PostgreSQL permissions (and set up a private network link if you connect over a private network)
- Access to your PostgreSQL instance and credentials
- Reviewed the order of operations
Create crawler workflow
Create a new workflow and select PostgreSQL as your connector source.
- In the top right of any screen in Atlan, navigate to New > New Workflow.
- From the Marketplace page, click Postgres Assets > Setup Workflow.
Configure extraction
When setting up metadata extraction from your PostgreSQL database, choose how Atlan connects and extracts metadata. Select the extraction method that best fits your organization's security and network requirements:
- Direct
- Agent
- Offline
Atlan SaaS connects directly to your PostgreSQL database. This method supports Basic and IAM authentication and lets you test the connection before proceeding.
- For Host, enter the host for your PostgreSQL instance.
- For Port, enter the port number of your PostgreSQL instance (default: 5432).
- Choose an authentication type that matches your PostgreSQL configuration:
- Basic authentication
- IAM User authentication
- IAM Role authentication
- Select Basic authentication.
- For Username, enter the username you configured in PostgreSQL.
- For Password, enter the password for the username.
- Select IAM User authentication.
- For AWS Access Key, enter the AWS access key.
- For AWS Secret Key, enter the AWS secret key.
- For Username, enter the database username you configured.
- Select IAM Role authentication.
- For AWS Role ARN, enter the ARN of the role you created.
- For Username, enter the database username you configured.
- For AWS External ID, enter the external ID if you configured one in the role definition.
- For Database, enter the name of the database to crawl.
- Click Test Authentication to confirm connectivity to PostgreSQL. When successful, click Next to proceed with the connection configuration.
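The Host, Port, Database, Username, and Password fields above map onto a standard libpq-style connection URI, so a quick sanity check before clicking Test Authentication is to try the same values with any PostgreSQL client. A minimal sketch with hypothetical values:

```python
from urllib.parse import quote

# Hypothetical values for the fields described above.
host = "pg.example.internal"
port = 5432
database = "analytics"
username = "atlan_crawler"
password = "s3cret/with:chars"

# Build a libpq-style connection URI; the password is percent-encoded
# so special characters survive URI parsing.
dsn = (
    f"postgresql://{quote(username)}:{quote(password, safe='')}"
    f"@{host}:{port}/{database}"
)
print(dsn)
```

Any standard client accepts this URI form, for example `psql "<dsn>" -c "SELECT 1"`; if that works, the same values should pass Atlan's authentication test.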
Self-Deployed Runtime runs within your organization and connects to your PostgreSQL database. This method keeps connections inside your network perimeter.
- Install Self-Deployed Runtime if you haven't already.
- Select the Agent tab and configure the PostgreSQL data source by adding the secret keys for your secret store.
- Choose an authentication type that matches your PostgreSQL configuration:
- Basic authentication
- IAM User authentication
- IAM Role authentication
- Select Basic authentication.
- For Host, enter the host for your PostgreSQL instance.
- For Port, enter the port number of your PostgreSQL instance (default: 5432).
- For Username, enter the username you configured in PostgreSQL (or reference the secret key where it's stored).
- For Password, enter the password for the username (or reference the secret key where it's stored).
- Select IAM User authentication.
- For Host, enter the host for your PostgreSQL instance.
- For Port, enter the port number of your PostgreSQL instance (default: 5432).
- For AWS Access Key, enter the AWS access key (or reference the secret key where it's stored).
- For AWS Secret Key, enter the AWS secret key (or reference the secret key where it's stored).
- For Username, enter the database username you configured.
- Select IAM Role authentication.
- For Host, enter the host for your PostgreSQL instance.
- For Port, enter the port number of your PostgreSQL instance (default: 5432).
- For AWS Role ARN, enter the ARN of the role you created (or reference the secret key where it's stored).
- For Username, enter the database username you configured.
- For AWS External ID, enter the external ID if you configured one in the role definition.
- For Database, enter the name of the database to crawl.
- Store sensitive information in your secret store and reference the secrets in the corresponding fields. For more information, see Configure secrets for workflow execution.
- Click Next after completing the configuration.
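The idea behind referencing secrets rather than entering them directly is that the runtime resolves the reference at execution time. How secrets are surfaced depends on your secret store (see Configure secrets for workflow execution); the sketch below is a hypothetical illustration that assumes secrets appear as environment variables, with a made-up `{{key}}` reference syntax:

```python
import os

# Stand-ins for entries in your secret store (hypothetical keys and values).
os.environ["PG_USERNAME"] = "atlan_crawler"
os.environ["PG_PASSWORD"] = "s3cret"

def resolve_secret(ref: str) -> str:
    """Resolve a '{{key}}'-style reference against the environment.

    Illustrative only: the real reference syntax and resolution mechanism
    are defined by your secret store integration, not by this sketch.
    """
    key = ref.strip("{} ")
    value = os.environ.get(key)
    if value is None:
        raise KeyError(f"secret {key!r} not found")
    return value

username = resolve_secret("{{PG_USERNAME}}")
password = resolve_secret("{{PG_PASSWORD}}")
```

The benefit of this indirection is that credentials never appear in the workflow configuration itself, only a pointer to where they live.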
With the offline extraction method, you extract the metadata yourself and upload it to S3; Atlan then ingests it from there.
- For Bucket name, enter the name of your S3 bucket or Atlan's bucket.
- For Bucket prefix, enter the S3 prefix under which all the metadata files exist (for example, database.json, columns-<database>.json).
- Click Next at the bottom of the screen.
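The crawler looks for the extracted files under the bucket prefix you enter. Assuming the file names shown above (a database.json plus one columns-<database>.json per database), the resulting object keys can be sketched as follows; the bucket name, prefix, and database names here are hypothetical:

```python
# Hypothetical offline-extraction layout in S3.
bucket = "my-metadata-bucket"        # assumption: your S3 bucket (or Atlan's)
prefix = "postgres/extract-2024-06"  # assumption: the Bucket prefix you enter
databases = ["analytics", "sales"]

keys = [f"{prefix}/database.json"]
keys += [f"{prefix}/columns-{db}.json" for db in databases]

for key in keys:
    print(f"s3://{bucket}/{key}")
```

If the crawler reports missing files, comparing the prefix you configured against the actual object keys (for example, via `aws s3 ls`) is a quick first check.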
Configure connection
Set up the connection name and access controls for your PostgreSQL data source in Atlan.
- Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
- To change the users able to manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection (not even admins).
- At the bottom of the screen, click Next to proceed.
Configure crawler
Before running the crawler, you can configure which assets to include or exclude:
- To exclude specific assets from crawling, click Exclude Metadata. This defaults to no assets if none are specified.
- To include specific assets in crawling, click Include Metadata. This defaults to all assets if none are specified.
- To have the crawler ignore tables and views based on a naming convention, specify a regular expression in the Exclude regex for tables & views field.
- For Advanced Config, keep Default for the default configuration or click Custom to configure the crawler:
- For Enable Source Level Filtering, click True to enable schema-level filtering at source or click False to disable it.
- For Use JDBC Internal Methods, click True to enable JDBC internal methods for data extraction or click False to disable them.
If an asset appears in both the include and exclude filters, the exclude filter takes precedence.
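To illustrate the Exclude regex for tables & views field, suppose your team prefixes temporary tables with tmp_ or scratch_ (a hypothetical naming convention). A pattern like `^(tmp_|scratch_)` would then filter those names out; the sketch below shows the filtering idea, assuming the pattern is matched against the start of each table or view name:

```python
import re

# Hypothetical naming convention: temporary and scratch tables are excluded.
exclude_pattern = re.compile(r"^(tmp_|scratch_)")

tables = ["orders", "tmp_orders_backup", "scratch_join_test", "customers"]
crawled = [t for t in tables if not exclude_pattern.search(t)]
print(crawled)  # → ['orders', 'customers']
```

Testing your expression against a list of real table names before running the crawler helps avoid accidentally excluding assets you meant to keep.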
Run crawler
- Direct
- Agent
- Offline
- Click Preflight checks to validate permissions and configuration before running the crawler. This helps identify any potential issues early.
- After the preflight checks pass, you can either:
- Click Run to run the crawler once immediately.
- Click Schedule Run to schedule the crawler to run hourly, daily, weekly, or monthly.
For the Agent and Offline extraction methods, you can either:
- Click Run to run the crawler once immediately.
- Click Schedule Run to schedule the crawler to run hourly, daily, weekly, or monthly.
Once the crawler has finished running, you can view the crawled assets on Atlan's assets page.
See also
- How Atlan connects to PostgreSQL: Connectivity, authentication, and data access patterns
- What does Atlan crawl from PostgreSQL: Metadata and assets discovered during crawling
- Preflight checks for PostgreSQL: Verify prerequisites before crawling
- Troubleshooting PostgreSQL connectivity: Resolve common connection issues