Crawl AlloyDB for PostgreSQL
Extract metadata assets from your AlloyDB for PostgreSQL database into Atlan.
Prerequisites
Before you begin, verify you have:
- Completed the Set up AlloyDB for PostgreSQL guide
- Access to your AlloyDB for PostgreSQL instance
- Reviewed the order of operations
Create crawler workflow
Create a new workflow and select AlloyDB (PostgreSQL) as your connector source.
- In the top-right corner of any screen, select New > New Workflow.
- From the list of packages, select AlloyDB (PostgreSQL) > Setup Workflow.
Configure extraction
When setting up metadata extraction from your AlloyDB for PostgreSQL instance, you need to choose how Atlan connects and extracts metadata. Select the extraction method that best fits your organization's security and network requirements:
- Direct
- Agent
Direct: Atlan SaaS connects directly to your AlloyDB instance (typically via the public endpoint and AlloyDB connectors). This method supports multiple authentication options and lets you test the connection before proceeding.
- Choose whether to use the default connection settings or provide a custom Postgres Driver URL:
  - Host: Use the default Postgres Driver URL, built from the standard connection parameters (host, port, database name).
  - URL: Provide a custom Postgres Driver URL with specific driver options. Make sure your connection string conforms to the PostgreSQL Driver documentation and is applicable to AlloyDB.
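As a rough illustration of how the Host option relates to the URL option, the sketch below assembles a JDBC-style PostgreSQL URL from a host, port, and database name. The `jdbc:postgresql://` form, the helper name `build_postgres_url`, and the sample `sslmode` option are assumptions made for illustration; check the PostgreSQL Driver documentation for the exact format and options that apply to your setup.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def build_postgres_url(
    host: str,
    database: str,
    port: int = 5432,
    options: Optional[Dict[str, str]] = None,
) -> str:
    """Build a JDBC-style PostgreSQL URL (hypothetical helper, not Atlan's code)."""
    # Percent-encode the database name so unusual characters stay valid in a URL.
    url = f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"
    if options:
        # Driver options are appended as a query string, e.g. ?sslmode=require
        url += "?" + urlencode(options)
    return url


# Host-style settings only (the default port 5432 is taken from this guide).
print(build_postgres_url("10.0.0.5", "sales"))
# The same connection with one extra driver option (assumed example).
print(build_postgres_url("10.0.0.5", "sales", options={"sslmode": "require"}))
```

The Host option corresponds to the first call; the URL option is useful when you need driver options like those in the second.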
- Choose an authentication method for your direct connection. For IAM-based authentication, use the AlloyDB connectors/Auth Proxy to generate database auth tokens.
  - Built-in authentication
  - IAM user authentication
  - IAM Service Account authentication
- Built-in authentication: Use standard database credentials created in your AlloyDB for PostgreSQL instance.
  - Username: Enter the database username you created.
  - Password: Enter the password for the specified user.
  - Host: Enter the IP address or hostname exposed for your AlloyDB instance.
  - Port: Specify the database port number (default is 5432).
  - Database: Enter the name of the database you want to crawl.
- After entering the authentication details, click Test Authentication to verify your configuration. If the test is successful, click Next to proceed with the connection configuration.
- IAM user authentication: Authenticate using a Google Cloud IAM user. This method suits environments where you manage access through IAM policies and use the Auth Proxy or connectors to obtain tokens.
  - IAM username: Enter the IAM user's email address. If the email ends with .gserviceaccount.com, remove this suffix before entering the username.
  - IAM auth token: Provide a short-lived access token for the IAM user. To generate this token after logging in, run:
    gcloud auth print-access-token --account=<iam-user>
    This token expires in about one hour. Prefer service accounts for scheduled workflows.
  - Instance/cluster connection info: Provide connection details from your AlloyDB instance overview.
  - Database: Enter the name of the database you want to crawl.
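The suffix rule for IAM usernames can be sketched as a small helper. The function name `normalize_iam_username` and the sample email addresses are hypothetical; only the rule itself (drop a trailing `.gserviceaccount.com`) comes from this guide.

```python
SERVICE_ACCOUNT_SUFFIX = ".gserviceaccount.com"


def normalize_iam_username(email: str) -> str:
    """Return the username to enter, without a trailing .gserviceaccount.com."""
    email = email.strip()
    if email.endswith(SERVICE_ACCOUNT_SUFFIX):
        # Keep everything before the suffix, e.g. "name@project.iam".
        return email[: -len(SERVICE_ACCOUNT_SUFFIX)]
    # Regular user emails are entered unchanged.
    return email


print(normalize_iam_username("crawler@my-project.iam.gserviceaccount.com"))
print(normalize_iam_username("jane@example.com"))
```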
- After entering the authentication details, click Test Authentication to verify your configuration. If the test is successful, click Next to proceed with the connection configuration.
- IAM Service Account authentication: Use a Google Cloud service account for authentication. This method is recommended for automated or scheduled workflows.
  - IAM service account name: Enter the service account's email address. If it ends with .gserviceaccount.com, remove this suffix before entering it.
  - IAM service account key: Upload or paste the JSON key for the service account (when using key-based auth). For keyless setups, make sure the environment can obtain access tokens.
  - Instance/cluster connection info: Provide connection details from AlloyDB.
  - Database: Enter the name of the database you want to crawl.
- After entering the authentication details, click Test Authentication to verify your configuration. If the test is successful, click Next to proceed with the connection configuration.
Agent: Atlan's Secure Agent application is deployed within your organization and connects to the AlloyDB instance. This method provides additional security by keeping connections within your network perimeter.
- Install Self-Deployed Runtime if you haven't already.
- Choose whether to use the default connection settings or provide a custom Postgres Driver URL:
  - Host: Use the default Postgres Driver URL, built from the standard connection parameters (host, port, database name).
  - URL: Provide a custom Postgres Driver URL with specific driver options. Make sure your connection string conforms to the PostgreSQL Driver documentation and is applicable to AlloyDB.
- Choose an authentication method for your agent-based connection. For IAM-based authentication, configure your environment to obtain database auth tokens via the AlloyDB connectors/Auth Proxy.
  - Built-in authentication
  - IAM user authentication
  - IAM Service Account authentication
- Built-in authentication: Use standard database credentials created in AlloyDB for PostgreSQL.
  - Username: Enter the database username.
  - Password: Enter the password for the specified user.
  - Host: Enter the private IP address or hostname reachable from your agent's network.
  - Port: Specify the database port number (default is 5432).
  - SQLAlchemy Args: Enter additional SQLAlchemy arguments as a comma-separated string.
  - Database: Enter the name of the database you want to crawl.
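Before entering the Host value for an agent-based connection, it can help to confirm that the address is a private IP. The helper below is a minimal sketch using Python's standard `ipaddress` module; the function name `is_private_ip` and the sample addresses are assumptions. It cannot tell whether a hostname resolves inside your network, so hostnames are reported as undetermined.

```python
import ipaddress
from typing import Optional


def is_private_ip(host: str) -> Optional[bool]:
    """Return True/False for IP literals, or None when host is not an IP address."""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # Not an IP literal (for example a hostname); resolution is out of scope here.
        return None


for candidate in ("10.20.0.5", "8.8.8.8", "alloydb.internal.example"):
    print(candidate, is_private_ip(candidate))
```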
- Click Next to proceed with the connection configuration.
- IAM user authentication: Authenticate using a Google Cloud IAM user from within your agent environment.
  - IAM username: Enter the IAM user's email address. If the email ends with .gserviceaccount.com, remove this suffix before entering the username.
  - IAM auth token: Provide a short-lived access token for the IAM user. To generate this token after logging in, run:
    gcloud auth print-access-token --account=<iam-user>
    This token expires in about one hour. Consider using service account authentication for longer-running or scheduled workflows.
  - Instance/cluster connection info: Provide connection details from AlloyDB.
  - SQLAlchemy Args: Enter additional SQLAlchemy arguments as a comma-separated string.
  - Database: Enter the name of the database you want to crawl.
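Because the IAM auth token lasts only about an hour, a workflow that reuses one needs to know when to generate a fresh token. The sketch below assumes a lifetime of exactly one hour and a five-minute safety margin; both values and the helper name `token_needs_refresh` are illustrative assumptions, not Atlan or Google Cloud behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: tokens last roughly one hour, as stated in this guide.
TOKEN_LIFETIME = timedelta(hours=1)


def token_needs_refresh(
    issued_at: datetime,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True once the token is within `margin` of its assumed expiry."""
    now = now or datetime.now(timezone.utc)
    return now >= issued_at + TOKEN_LIFETIME - margin


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(token_needs_refresh(issued, now=issued + timedelta(minutes=30)))
print(token_needs_refresh(issued, now=issued + timedelta(minutes=56)))
```

A scheduled crawl that runs less often than hourly always falls outside this window, which is why service account authentication is the better fit for schedules.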
- Click Next to proceed with the connection configuration.
- IAM Service Account authentication: Use a Google Cloud service account for authentication. This method is recommended for automated or scheduled workflows.
  - IAM service account name: Enter the service account's email address. If it ends with .gserviceaccount.com, remove this suffix before entering it.
  - IAM service account key: Upload or paste the JSON key for the service account (when applicable).
  - Instance/cluster connection info: Provide connection details from AlloyDB.
  - SQLAlchemy Args: Enter additional SQLAlchemy arguments as a comma-separated string.
  - Database: Enter the name of the database you want to crawl.
- Click Next to proceed with the connection configuration.
Advanced options
- SQLAlchemy Args: A comma-separated list of arguments that are passed to the SQLAlchemy engine as connect_args.
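To make the comma-separated format concrete, the sketch below parses such a string into a dictionary of the kind SQLAlchemy accepts through the `connect_args` parameter of `create_engine`. The `key=value` item syntax, the helper name `parse_sqlalchemy_args`, and the sample arguments are assumptions; this guide does not specify how Atlan parses the field.

```python
from typing import Dict


def parse_sqlalchemy_args(raw: str) -> Dict[str, str]:
    """Parse 'key=value,key=value' into a dict (assumed format, hypothetical helper)."""
    args: Dict[str, str] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Expected key=value, got: {item!r}")
        args[key.strip()] = value.strip()
    return args


connect_args = parse_sqlalchemy_args("sslmode=require, connect_timeout=10")
print(connect_args)
# With SQLAlchemy installed, a dict like this would be passed as:
#   sqlalchemy.create_engine(url, connect_args=connect_args)
```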
Configure connection
Set up the connection name and access controls for your AlloyDB for PostgreSQL data source in Atlan.
- Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
- To change the users able to manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection (not even admins).
- At the bottom of the screen, click Next to proceed.
Configure crawler
Before running the crawler, you can configure which assets to include or exclude. These options are only available when using the direct extraction method. If an asset appears in both the include and exclude filters, the exclude filter takes precedence.
- To exclude specific assets from crawling, select Exclude Metadata. This defaults to no assets if none are specified.
- To include specific assets in crawling, select Include Metadata. This defaults to all assets if none are specified.
- To ignore tables and views based on a naming convention, specify a regular expression in the Exclude regex for tables & views field.
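The way these filters combine can be sketched as a single decision function. Modeling the include and exclude filters as sets of schema names, and applying the regular expression to the table name with `re.search`, are simplifying assumptions; the function name `should_crawl` and the sample names are hypothetical, and Atlan's actual matching may differ.

```python
import re
from typing import Optional, Set


def should_crawl(
    schema: str,
    table: str,
    include: Set[str],
    exclude: Set[str],
    exclude_regex: Optional[str] = None,
) -> bool:
    """Decide whether a table or view would be crawled under the rules above."""
    # The exclude filter takes precedence over the include filter.
    if schema in exclude:
        return False
    # Tables and views matching the naming-convention regex are ignored.
    if exclude_regex and re.search(exclude_regex, table):
        return False
    # An empty include filter means "all assets".
    return not include or schema in include


print(should_crawl("public", "orders", include=set(), exclude=set()))
print(should_crawl("public", "orders", include={"public"}, exclude={"public"}))
print(should_crawl("sales", "tmp_orders", include={"sales"}, exclude=set(), exclude_regex=r"^tmp_"))
```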
Run crawler
Direct:
- Click Preflight checks to validate permissions and configuration before running the crawler. This helps identify potential issues early.
- After the preflight checks pass, you can either:
  - Click Run to run the crawler once immediately.
  - Click Schedule Run to schedule the crawler to run hourly, daily, weekly, or monthly.
Agent:
- You can either:
  - Click Run to run the crawler once immediately.
  - Click Schedule Run to schedule the crawler to run hourly, daily, weekly, or monthly.
Once the crawler has finished running, you can see the assets on Atlan's asset page! 🎉
See also
- What does Atlan crawl from AlloyDB for PostgreSQL: Learn about the AlloyDB for PostgreSQL assets and metadata that Atlan discovers and catalogs