Crawl Cloud SQL for PostgreSQL
Extract metadata assets from your Cloud SQL for PostgreSQL database into Atlan.
Prerequisites
Before you begin, verify you have:
- Completed the Set up Cloud SQL for PostgreSQL guide
- Access to your Cloud SQL for PostgreSQL instance
- Reviewed the order of operations
Create crawler workflow
Create a new workflow and select Cloud SQL (PostgreSQL) as your connector source.
- In the top-right corner of any screen, select New > New Workflow.
- From the list of packages, select Cloud SQL (PostgreSQL) > Setup Workflow.
Choose extraction method
When setting up metadata extraction from your Cloud SQL for PostgreSQL instance, you need to choose how Atlan connects and extracts metadata. Select the extraction method that best fits your organization's security and network requirements:
- Direct
- Agent
Direct: Atlan SaaS connects directly to your Cloud SQL instance. This method supports multiple authentication options and lets you test the connection before proceeding.
Connection type
Choose whether to use the default connection settings or provide a custom JDBC URL:
- Host: Use the default JDBC URL based on standard connection parameters (host, port, database name).
- URL: Provide a custom JDBC URL with specific driver options.
Custom JDBC URL: When using the URL option, make sure your connection string conforms to the Cloud SQL JDBC documentation and the PostgreSQL JDBC Driver documentation.
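As an illustration of the two connection types, the sketch below assembles a PostgreSQL JDBC URL from a host, port, and database name, and optionally appends driver options as a query string. The `jdbc:postgresql://host:port/database` shape follows the PostgreSQL JDBC Driver documentation. The helper name `build_jdbc_url`, the sample host, and the sample `sslmode` option are assumptions chosen for illustration; the exact URL Atlan generates for the Host option isn't described here.

```python
from urllib.parse import quote, urlencode


def build_jdbc_url(host, database, port=5432, params=None):
    """Assemble a PostgreSQL JDBC URL (hypothetical helper, for illustration only)."""
    # Base form documented by the PostgreSQL JDBC driver:
    #   jdbc:postgresql://host:port/database
    url = f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"
    if params:
        # Driver options are appended as a standard query string.
        url += "?" + urlencode(params)
    return url


# Placeholder values; substitute your own instance details.
default_url = build_jdbc_url("203.0.113.10", "analytics")
custom_url = build_jdbc_url("203.0.113.10", "analytics", params={"sslmode": "require"})
print(default_url)
print(custom_url)
```

The first URL corresponds to what the Host option needs (host, port, database name); the second shows the kind of string you might paste into the URL option when you need extra driver options.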
Configure authentication
Choose an authentication method for your direct connection. When using IAM-based authentication, Atlan uses the Cloud SQL Language Connector for added security.
- Built-in authentication
- IAM user authentication
- IAM Service Account authentication
Built-in authentication: Use standard database credentials created in your Cloud SQL instance.
- Username: Enter the database username you created in Cloud SQL for PostgreSQL.
- Password: Enter the password for the specified user.
- Host: Enter the public IP address of your Cloud SQL instance.
- Port: Specify the database port number (default is 5432).
- Database: Enter the name of the database you want to crawl.
IAM user authentication: Authenticate using a Google Cloud IAM user. This method is suitable for environments where you manage access through IAM policies.
- IAM username: Enter the IAM user's email address. If the email ends with .gserviceaccount.com, remove this suffix before entering the username.
- IAM auth token: Provide a short-lived access token for the IAM user. To generate this token after logging in, run:
  gcloud auth print-access-token --account=<iam-user>
  This token expires in about one hour, making this method best suited for manual or short-lived connections, not for scheduled or long-running workflows.
- Connection name: Copy the connection name from your Cloud SQL instance's Overview page. The format is:
  <project_name>:<region>:<instance_name>
- Database: Enter the name of the database you want to crawl.
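Before pasting a connection name, it can help to check that it has the three colon-separated parts described above. The following is a minimal sketch; the `ConnectionName` type, the `parse_connection_name` helper, and the sample value are hypothetical. It assumes the project name itself contains no colon, which may not hold for every project.

```python
from typing import NamedTuple


class ConnectionName(NamedTuple):
    project: str
    region: str
    instance: str


def parse_connection_name(value):
    """Split '<project_name>:<region>:<instance_name>' into its three parts."""
    parts = value.strip().split(":")
    # Exactly three non-empty parts are expected (assumption: no colon in the project name).
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            "expected <project_name>:<region>:<instance_name>, got %r" % value
        )
    return ConnectionName(*parts)


# Placeholder value; copy the real one from the instance's Overview page.
parsed = parse_connection_name("my-project:us-central1:my-instance")
print(parsed.project, parsed.region, parsed.instance)
```

A value with a missing part, such as `my-project:my-instance`, raises `ValueError` instead of being silently accepted.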
IAM Service Account authentication: Use a Google Cloud service account for authentication. This method is recommended for automated or scheduled workflows.
- IAM service account name: Enter the service account's email address. If it ends with .gserviceaccount.com, remove this suffix before entering.
- IAM service account key: Upload or paste the JSON key for the service account. If you need to create or download a key, follow Google's guide: Create and manage service account keys.
- Connection name: Copy the connection name from the Cloud SQL instance's Overview page. Use the format:
<project_name>:<region>:<instance_name>
- Database: Enter the name of the database you want to crawl.
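The username rule shared by the IAM methods (drop a trailing `.gserviceaccount.com`, otherwise use the email as-is) can be sketched as follows. The helper name `to_iam_username` and the sample email addresses are hypothetical.

```python
SUFFIX = ".gserviceaccount.com"


def to_iam_username(email):
    """Return the value to enter as the IAM username (hypothetical helper)."""
    email = email.strip()
    # Service account emails lose the trailing suffix; other emails are unchanged.
    if email.endswith(SUFFIX):
        return email[: -len(SUFFIX)]
    return email


service_account = to_iam_username("crawler@my-project.iam.gserviceaccount.com")
iam_user = to_iam_username("jane@example.com")
print(service_account)  # crawler@my-project.iam
print(iam_user)         # jane@example.com
```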
After entering the authentication details, click Test Authentication to verify your configuration. If the test is successful, click Next to proceed with the connection configuration.
Agent: Atlan's Secure Agent application is deployed within your organization and connects to the Cloud SQL instance. This method provides additional security by keeping connections within your network perimeter.
Connection type
Choose whether to use the default connection settings or provide a custom JDBC URL:
- Host: Use the default JDBC URL based on standard connection parameters (host, port, database name).
- URL: Provide a custom JDBC URL with specific driver options.
Custom JDBC URL: When using the URL option, make sure your connection string conforms to the Cloud SQL JDBC documentation and the PostgreSQL JDBC Driver documentation.
Configure authentication
Choose an authentication method for your agent-based connection. When using IAM-based authentication, the agent uses the Cloud SQL Language Connector for added security.
- Built-in authentication
- IAM user authentication
- IAM Service Account authentication
- IAM Workload Identity Federation for GKE
Built-in authentication: Use standard database credentials created in your Cloud SQL instance.
- Username: Enter the database username you created in Cloud SQL for PostgreSQL.
- Password: Enter the password for the specified user.
- Host: Enter the private IP address or hostname reachable from your agent's network.
- Port: Specify the database port number (default is 5432).
- Database: Enter the name of the database you want to crawl.
IAM user authentication: Authenticate using a Google Cloud IAM user from within your agent environment.
- IAM username: Enter the IAM user's email address. If the email ends with .gserviceaccount.com, remove this suffix before entering the username.
- IAM auth token: Provide a short-lived access token for the IAM user. To generate this token after logging in, run:
  gcloud auth print-access-token --account=<iam-user>
  This token expires in about one hour. Consider using service account authentication for connections that require longer-running or scheduled workflows.
- Connection name: Copy the connection name from your Cloud SQL instance's Overview page. The format is:
  <project_name>:<region>:<instance_name>
- Database: Enter the name of the database you want to crawl.
IAM Service Account authentication: Use a Google Cloud service account for authentication. This method is recommended for automated or scheduled workflows.
- IAM service account name: Enter the service account's email address. If it ends with .gserviceaccount.com, remove this suffix before entering.
- IAM service account key: Upload or paste the JSON key for the service account. If you need to create or download a key, follow Google's guide: Create and manage service account keys.
- Connection name: Copy the connection name from the Cloud SQL instance's Overview page. Use the format:
<project_name>:<region>:<instance_name>
- Database: Enter the name of the database you want to crawl.
IAM Workload Identity Federation for GKE: This method is available when the Secure Agent application is deployed on a Google Kubernetes Engine (GKE) cluster. The Kubernetes Service Account that runs the Secure Agent application impersonates an IAM service account that has the required permissions on the Cloud SQL instance.
To configure this authentication method:
- Configure Workload Identity Federation: Set up the federation between your GKE cluster and Google Cloud IAM. For detailed instructions, see Workload Identity Federation with Kubernetes: use service account impersonation.
- IAM service account name: Enter the name of the IAM service account that's impersonated by the Kubernetes Service Account. Remove the .gserviceaccount.com suffix if present.
- Connection name: Copy the connection name from the Cloud SQL instance's Overview page. Format:
  <project_name>:<region>:<instance_name>
- Database: Enter the name of the database you want to crawl.
For more information on this approach, see Workload Identity Federation for GKE.
Click Next to proceed with the connection configuration.
Configure connection
To complete the connection configuration:
- Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
- (Optional) To change the users able to manage this connection, change the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, not even admins.
- At the bottom of the screen, click Next to proceed.
Configure crawler
Before running the crawler, you can further configure it. (These options are only available when using the direct extraction method.)
You can override the defaults for any of these options:
- To select the assets you want to exclude from crawling, click Exclude Metadata. (This defaults to no assets if none are specified.)
- To select the assets you want to include in crawling, click Include Metadata. (This defaults to all assets, if none are specified.)
- To have the crawler ignore tables and views based on a naming convention, specify a regular expression in the Exclude regex for tables & views field.
- For Advanced Config, keep Default for the default configuration or click Custom to configure the crawler:
  - For Enable Source Level Filtering, click True to enable schema-level filtering at source or click False to disable it.
  - For Use JDBC Internal Methods, click True to enable JDBC internal methods for data extraction or click False to disable it.
If an asset appears in both the include and exclude filters, the exclude filter takes precedence.
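The way these filters combine can be modeled with a short sketch. This is an illustrative model only, not Atlan's implementation: the function name `is_crawled` and the sample schema and table names are hypothetical, filters are assumed to work at schema granularity, and the regular expression is assumed to be searched against the table or view name.

```python
import re


def is_crawled(schema, table, include=None, exclude=None, exclude_regex=None):
    """Decide whether a table or view would be crawled (illustrative model only)."""
    # Exclude wins when a schema is listed in both filters.
    if exclude and schema in exclude:
        return False
    # An empty include filter means "all assets".
    if include and schema not in include:
        return False
    # Assumption: the regex is searched against the table or view name.
    if exclude_regex and re.search(exclude_regex, table):
        return False
    return True


include = {"sales", "staging"}
exclude = {"staging"}

in_include_only = is_crawled("sales", "orders", include, exclude)
in_both_filters = is_crawled("staging", "orders", include, exclude)
matches_regex = is_crawled("sales", "tmp_orders", include, exclude, r"^tmp_")
no_filters = is_crawled("finance", "ledger")
print(in_include_only, in_both_filters, matches_regex, no_filters)
```

Here `staging` appears in both filters and is skipped, `tmp_orders` is skipped by the naming-convention regex, and an asset with no filters configured is crawled by default.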
Run crawler
To run the Cloud SQL for PostgreSQL crawler:
1. Run preflight checks (Direct extraction only): Click Preflight checks to validate permissions and configuration before running the crawler. This helps identify any potential issues early. If you're using Agent extraction, skip to step 2.
2. Execute the crawler: Choose one of the following:
   - To run the crawler once immediately, at the bottom of the screen, click the Run button.
   - To schedule the crawler to run hourly, daily, weekly, or monthly, at the bottom of the screen, click the Schedule Run button.
Once the crawler has completed running, you can see the assets on Atlan's asset page! 🎉
See also
- What does Atlan crawl from Cloud SQL for PostgreSQL: Learn about the Cloud SQL for PostgreSQL assets and metadata that Atlan discovers and catalogs