Crawl Iceberg
Configure and run the crawler to extract metadata from your Iceberg data lakehouse assets.
Atlan crawls comprehensive metadata from your Iceberg catalog, including catalogs, namespaces, tables, and columns. This gives you visibility into and control over your Iceberg data assets within Atlan.
Prerequisites
Before you begin, make sure you have:
- Completed Iceberg setup
- Admin access to your Atlan instance
- Iceberg REST catalog connection details (URI, authentication credentials)
Create crawler workflow
To crawl metadata from Iceberg, review the order of operations and then complete the following steps.
- In the top right of any screen, navigate to New and then click New Workflow.
- From the list of packages, select Iceberg Assets and click Setup Workflow.
Configure authentication
Choose your extraction method:
- In Direct extraction, Atlan connects to your Iceberg REST catalog and crawls metadata directly.
- In Agent extraction, Atlan's secure agent executes metadata extraction within your organization's environment.
Direct - Token Authentication
- Extraction method: Select Direct
- Authentication method: Select Token
- REST Catalog URI: Enter your Iceberg REST catalog endpoint URL (for example, https://your-catalog.com/api/rest)
- Token: Enter your credentials in the format client-id:client-secret
- Advanced:
  - Catalog Name: Identifier for your catalog instance (default: atlan-wh)
  - Warehouse Name: Identifier for the warehouse within the catalog (default: atlan-wh)
  - Scope: Access scope for the catalog (default: PRINCIPAL_ROLE:lake_readers)
- Click Test Connection to confirm connectivity to your Iceberg catalog. This validates that Atlan can reach your catalog with the provided credentials.
- Once successful, click Next.
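To sanity-check the Direct extraction values before entering them, it can help to see how they might map onto a token request. The following is a minimal sketch, assuming your catalog exposes the OAuth2 client-credentials endpoint described in the Iceberg REST catalog specification (`/v1/oauth/tokens`); whether Atlan's Test Connection uses this exact call isn't stated here, and the helper names `split_credential` and `build_token_request` are hypothetical. The sketch only builds the request and doesn't send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def split_credential(token: str) -> tuple[str, str]:
    """Split a 'client-id:client-secret' string at the first colon."""
    client_id, sep, client_secret = token.partition(":")
    if not sep or not client_id or not client_secret:
        raise ValueError("expected credentials in the form client-id:client-secret")
    return client_id, client_secret


def build_token_request(catalog_uri: str, token: str, scope: str) -> Request:
    """Build (but do not send) an OAuth2 client-credentials token request.

    Assumption: the token endpoint lives at <REST Catalog URI>/v1/oauth/tokens.
    Some catalogs use a separate authorization server instead.
    """
    client_id, client_secret = split_credential(token)
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return Request(
        catalog_uri.rstrip("/") + "/v1/oauth/tokens",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values taken from the field descriptions above.
request = build_token_request(
    "https://your-catalog.com/api/rest",
    "client-id:client-secret",
    "PRINCIPAL_ROLE:lake_readers",
)
print(request.get_method(), request.full_url)
```

Passing the resulting request to `urllib.request.urlopen` from a machine that can reach the catalog would show whether the URI, credentials, and scope are accepted.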
Use Agent extraction when your Iceberg REST catalog isn't reachable from Atlan Cloud (for example, it's behind a firewall). A Self-Deployed Runtime runs inside your network and connects to your catalog, then sends metadata to Atlan over an outbound connection.
Before configuring the crawler:
- Install Self-Deployed Runtime if you haven't already.
- Confirm the runtime can reach your Iceberg REST catalog over your local network and that network security is configured.
To configure the crawler:
- Extraction method: Select Agent
- Configure the Iceberg catalog by adding the secret keys for your secret store. For details on the required fields, refer to the Direct extraction section.
- Complete the Secure Agent configuration by following the instructions in the How to configure Secure Agent for workflow execution guide.
- Click Next after completing the configuration.
Configure connection
On this page, you define how this Iceberg connection is identified and managed within Atlan.
- Provide a Connection Name that represents your source environment. For example, you might use values like production, development, analytics, or iceberg-catalog. This name appears in Atlan's interface and helps you identify this connection when managing multiple Iceberg instances.
- To control who can manage this connection, change the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, including administrators. This maintains governance of connection access within your organization.
- At the bottom of the screen, click Next to proceed.
Configure crawler
Before running the Iceberg crawler, you can customize which assets it crawls. On the Metadata page, you can override the defaults:
- Exclude Metadata: Select specific namespaces and tables to skip during crawling. By default, no assets are excluded.
- Include Metadata: Select specific namespaces and tables to crawl. By default, all assets are included.
- Preflight checks: Click to check for any permissions or configuration issues before running the crawler.
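The interaction between the two filters can be pictured with a small sketch. It assumes that an empty include list means "crawl everything" (matching the defaults above) and that an exclusion takes precedence over an inclusion; the second point is an assumption, not something this page confirms. The function names and the sample namespaces and tables are hypothetical, and a selection entry is written as either `namespace` or `namespace.table`, which ignores multi-level namespaces for simplicity.

```python
def is_selected(namespace: str, table: str, selection: frozenset) -> bool:
    """True if the whole namespace or the specific table is in the selection."""
    return namespace in selection or f"{namespace}.{table}" in selection


def should_crawl(
    namespace: str,
    table: str,
    include: frozenset = frozenset(),
    exclude: frozenset = frozenset(),
) -> bool:
    """Decide whether a table would be crawled under the given filters.

    Assumption: exclusions win over inclusions when both match.
    """
    if is_selected(namespace, table, exclude):
        return False
    # An empty include selection is treated as "include all assets".
    return not include or is_selected(namespace, table, include)


# Hypothetical example: crawl only the sales namespace, minus one table.
include = frozenset({"sales"})
exclude = frozenset({"sales.tmp_orders"})
for ns, tbl in [("sales", "orders"), ("sales", "tmp_orders"), ("staging", "events")]:
    print(ns, tbl, should_crawl(ns, tbl, include, exclude))
```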
Run crawler
After completing the configuration, choose how you want to run the crawler.
- To run the crawler once immediately, at the bottom of the screen click Run.
- To schedule the crawler to run hourly, daily, weekly or monthly, at the bottom of the screen click Schedule & Run.
Verify crawled assets
Once the crawler finishes running, you can see the assets on Atlan's assets page. Verify that the crawl was successful by monitoring the workflow:
- Navigate to the Workflows section in Atlan. Here you can see real-time status updates of your crawler run.
- Click on your Iceberg crawler workflow to view details. Check the Logs tab for detailed execution information about what was crawled and any errors that occurred.
- Wait for the status to show Success before proceeding. This confirms that all assets were successfully crawled and are now available in Atlan's catalog.
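If you track the run from a script rather than the Workflows screen, the waiting step might look like the sketch below. It is deliberately generic: `get_status` is a hypothetical callable that you would supply to return the current workflow status as a string, and apart from `Success`, the status names (such as `Failed`) are assumptions rather than documented values.

```python
import time
from typing import Callable, Iterable


def wait_for_success(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    failure_statuses: Iterable[str] = ("Failed",),  # assumed names
) -> bool:
    """Poll until the status is 'Success' (True) or a failure status (False).

    Raises TimeoutError if neither is seen within timeout_seconds.
    """
    failures = set(failure_statuses)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "Success":
            return True
        if status in failures:
            return False
        if time.monotonic() >= deadline:
            raise TimeoutError(f"workflow still '{status}' after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Simulated run: two in-progress checks followed by success.
simulated = iter(["Running", "Running", "Success"])
print(wait_for_success(lambda: next(simulated), poll_seconds=0))
```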
Once the crawl is complete, you can browse, search, and govern your Iceberg assets within Atlan.
See also
- What does Atlan crawl from Iceberg: Learn what assets, metadata, and properties Atlan crawls from Iceberg
- Preflight checks for Iceberg: Validate permissions and configuration before crawling