Crawl Iceberg
Configure and run the crawler to extract metadata from your Iceberg data lakehouse assets.
Atlan crawls metadata from Iceberg catalogs, namespaces, tables, and columns.
Prerequisites
Before you begin, make sure you have:
- Completed Set up Iceberg
- Admin access to your Atlan instance
- The connection values for your chosen mode (Generic REST Catalog or BigLake Metastore)
Create crawler workflow
To crawl metadata from Iceberg, review the order of operations and then complete the following steps.
- In the top right of any screen, navigate to New and then click New Workflow.
- From the list of packages, select Iceberg Assets and click Setup Workflow.
Configure authentication
Choose one authentication mode and then configure either Direct extraction or Agent extraction for that mode.
- Generic REST Catalog
- BigLake Metastore (GCP)
Generic REST Catalog
Use this mode for REST catalogs that support OAuth2 client credentials.
Direct extraction
- Extraction method: Select Direct.
- Authentication method: Select Token.
- Enter the required values:
- REST Catalog URI: For example, https://your-catalog.com/api/rest
- Token: Enter credentials in the format client-id:client-secret
- Catalog Name
- Warehouse
- Scope (if required by your catalog)
- Click Test Connection.
- Once successful, click Next.
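Before clicking Test Connection, it can help to sanity-check the Token value outside Atlan. The following is a minimal sketch, assuming your catalog exposes the OAuth2 client-credentials token endpoint described in the Apache Iceberg REST catalog specification (`/v1/oauth/tokens`); some catalogs delegate to an external identity provider instead. The function names, the default scope `catalog`, and the example values are illustrative assumptions and do not describe how Atlan itself authenticates. The code only builds the request and sends nothing.

```python
from urllib.parse import urlencode


def split_credential(token: str) -> tuple[str, str]:
    """Split a 'client-id:client-secret' value at the first colon."""
    client_id, sep, client_secret = token.partition(":")
    if not sep or not client_id or not client_secret:
        raise ValueError("expected credentials in the form client-id:client-secret")
    return client_id, client_secret


def build_token_request(catalog_uri: str, token: str, scope: str = "catalog") -> tuple[str, str]:
    """Return (url, form-encoded body) for a client-credentials token request."""
    client_id, client_secret = split_credential(token)
    # Assumption: the token endpoint lives under the REST Catalog URI.
    url = catalog_uri.rstrip("/") + "/v1/oauth/tokens"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, body


# Placeholder values; replace with your own REST Catalog URI and credentials.
url, body = build_token_request("https://your-catalog.com/api/rest", "my-id:my-secret")
print(url)
```

If the credential has no colon, or either half is empty, `split_credential` raises a `ValueError`, which is a quick way to catch a malformed Token value before it reaches the form.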
Agent extraction
- Extraction method: Select Agent.
- Provide the same values as Direct extraction through your configured secret store.
- Complete runtime configuration by following How to configure Secure Agent for workflow execution.
- Click Next.
BigLake Metastore (GCP)
Use this mode for Iceberg catalogs backed by Google BigLake Metastore.
- Service account key
- Workload Identity Federation (WIF)
Service account key
Direct extraction
- Extraction method: Select Direct.
- Authentication method: Select BigLake Metastore (BLM).
- GCP authentication type: Select Service account key.
- Enter the required values:
- REST Catalog URI
- Project ID
- Location
- Catalog Name
- Warehouse (use your configured warehouse path, for example, gs://<bucket>/warehouse)
- Service account JSON key
- Click Test Connection.
- Once successful, click Next.
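A malformed key file is a common reason for a failed connection test. The sketch below checks that a pasted service account JSON key parses and carries the fields that Google Cloud service account keys normally include (`type`, `project_id`, `private_key`, `client_email`). The function name and the sample values are hypothetical, and the check says nothing about whether the account has the permissions the crawler needs.

```python
import json

# Fields that a Google Cloud service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key; an empty list means none found."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["key must be a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


# Hypothetical sample key with placeholder values.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "<redacted>",
    "client_email": "crawler@my-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))  # []
```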
Agent extraction
- Extraction method: Select Agent.
- Select BigLake Metastore (BLM) and Service account key.
- Provide the same values as Direct extraction through your configured secret store.
- Complete runtime configuration by following How to configure Secure Agent for workflow execution.
- Click Next.
Workload Identity Federation (WIF)
Direct extraction
- Extraction method: Select Direct.
- Authentication method: Select BigLake Metastore (BLM).
- GCP authentication type: Select Workload Identity Federation (WIF).
- Enter the required values:
- REST Catalog URI
- Project ID
- Location
- Catalog Name
- Warehouse (use your configured warehouse path, for example, gs://<bucket>/warehouse)
- Service Account Email
- WIF Pool Provider ID
- Atlan OAuth Client ID
- Atlan OAuth Client Secret
- Click Test Connection.
- Once successful, click Next.
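Both BigLake Metastore variants expect a Warehouse path in the gs://<bucket>/warehouse form. As a rough pre-check, the following sketch splits such a path into bucket and prefix with Python's standard `urllib.parse` and rejects anything that is not a `gs://` URI. The helper name and the bucket name are hypothetical; the check is purely syntactic and does not confirm that the bucket exists or is readable.

```python
from urllib.parse import urlparse


def parse_gcs_warehouse(path: str) -> tuple[str, str]:
    """Split a gs:// warehouse path into (bucket, prefix)."""
    parsed = urlparse(path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("warehouse path must look like gs://<bucket>/<prefix>")
    return parsed.netloc, parsed.path.strip("/")


print(parse_gcs_warehouse("gs://my-bucket/warehouse"))  # ('my-bucket', 'warehouse')
```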
Agent extraction
- Extraction method: Select Agent.
- Select BigLake Metastore (BLM) and Workload Identity Federation (WIF).
- Provide the same values as Direct extraction through your configured secret store.
- Complete runtime configuration by following How to configure Secure Agent for workflow execution.
- Click Next.
Configure connection
On this page, define how this Iceberg connection is identified and managed in Atlan.
- Provide a Connection Name that represents your source environment (for example, production, development, or iceberg-blm).
- To control who can manage this connection, configure Connection Admins.
- Click Next.
Configure crawler
Before running the crawler, optionally customize crawl scope on the Metadata page:
- Exclude Metadata: Select specific namespaces and tables to skip.
- Include Metadata: Select specific namespaces and tables to include.
- Preflight checks: Validate connectivity and permissions before execution.
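To reason about how include and exclude selections might interact, here is a small illustrative model. It rests on two assumptions that this page does not state: an empty include filter means everything is in scope, and exclude wins when an asset matches both filters. Filter entries are either a namespace name or a `namespace.table` string, a simplification that ignores multi-level namespaces. Confirm the actual behavior in your workflow before relying on it.

```python
from typing import Iterable, Optional


def select_tables(
    tables: Iterable[tuple[str, str]],
    include: Optional[set[str]] = None,
    exclude: Optional[set[str]] = None,
) -> list[tuple[str, str]]:
    """Filter (namespace, table) pairs by include and exclude entries."""
    include = include or set()
    exclude = exclude or set()

    def matches(ns: str, tbl: str, entries: set[str]) -> bool:
        # An entry matches a whole namespace or one qualified table.
        return ns in entries or f"{ns}.{tbl}" in entries

    selected = []
    for ns, tbl in tables:
        if include and not matches(ns, tbl, include):
            continue  # an include filter is set and this table is not in it
        if matches(ns, tbl, exclude):
            continue  # assumption: exclude takes precedence over include
        selected.append((ns, tbl))
    return selected


# Hypothetical catalog contents.
catalog = [("sales", "orders"), ("sales", "tmp_orders"), ("hr", "people")]
print(select_tables(catalog, include={"sales"}, exclude={"sales.tmp_orders"}))
# [('sales', 'orders')]
```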
Run crawler
After configuration, choose how to run:
- Click Run to run once immediately.
- Click Schedule & Run to run on a schedule.
Verify crawled assets
After the crawler completes:
- Navigate to Workflows and open the Iceberg workflow run.
- Review execution details and logs.
- Confirm status is Success.
Then verify crawled assets from Iceberg in Atlan search and asset views.
See also
- What does Atlan crawl from Iceberg: Assets and metadata that Atlan ingests.
- Preflight checks for Iceberg: Validation checks run before crawling.