
Crawl Iceberg

Configure and run the crawler to extract metadata from your Iceberg data lakehouse assets.

Atlan crawls comprehensive metadata from your Iceberg catalog, including catalogs, namespaces, tables, and columns. This gives you visibility into and control over your Iceberg data assets within Atlan.

Prerequisites

Before you begin, make sure you have:

  • Completed Iceberg setup
  • Admin access to your Atlan instance
  • Iceberg REST catalog connection details (URI, authentication credentials)

Create crawler workflow

To crawl metadata from Iceberg, review the order of operations and then complete the following steps.

  1. In the top right of any screen, navigate to New and then click New Workflow.
  2. From the list of packages, select Iceberg Assets and click Setup Workflow.

Configure authentication

Choose your extraction method:

  • In Direct extraction, Atlan connects to your Iceberg REST catalog and crawls metadata directly.
  • In Agent extraction, Atlan's secure agent runs metadata extraction within your organization's environment.

For Direct extraction, enter the following details:

  1. Extraction method: Select Direct
  2. Authentication method: Select Token
  3. REST Catalog URI: Enter your Iceberg REST catalog endpoint URL (for example, https://your-catalog.com/api/rest)
  4. Token: Enter your credentials in the format client-id:client-secret
  5. Advanced:
    • Catalog Name: Identifier for your catalog instance (default: atlan-wh)
    • Warehouse Name: Identifier for the warehouse within the catalog (default: atlan-wh)
    • Scope: Access scope for the catalog (default: PRINCIPAL_ROLE:lake_readers)
  6. Click Test Connection to confirm connectivity to your Iceberg catalog. This validates that Atlan can reach your catalog with the provided credentials.
  7. Once the connection test succeeds, click Next.
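
To make the Token and Scope fields more concrete, here is a minimal Python sketch of how a client-id:client-secret credential could be turned into an OAuth2 client-credentials request. It assumes the catalog serves the `/v1/oauth/tokens` endpoint described in the Iceberg REST catalog specification; whether Atlan's Test Connection issues exactly this request is not stated here, and the function name `build_token_request` is hypothetical. The sketch only builds the request and sends nothing.

```python
from urllib.parse import urlencode


def build_token_request(catalog_uri, token, scope="PRINCIPAL_ROLE:lake_readers"):
    """Return (url, form_body) for an OAuth2 client-credentials token request."""
    # The Token field uses the format client-id:client-secret; split on the first colon.
    client_id, sep, client_secret = token.partition(":")
    if not (sep and client_id and client_secret):
        raise ValueError("token must look like client-id:client-secret")
    # Assumption: the token endpoint path follows the Iceberg REST catalog specification.
    url = catalog_uri.rstrip("/") + "/v1/oauth/tokens"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, body


# Placeholder values; replace them with your own catalog URI and credentials.
url, body = build_token_request("https://your-catalog.com/api/rest", "my-client:my-secret")
print(url)
print(body)
```

If Test Connection fails, checking that the credential contains both parts and that the URI points at the REST catalog root is a reasonable first step.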

Configure connection

On this page, you define how this Iceberg connection is identified and managed within Atlan.

  1. Provide a Connection Name that represents your source environment. For example, you might use values like production, development, analytics, or iceberg-catalog. This name appears in Atlan's interface and helps you identify this connection when managing multiple Iceberg instances.

  2. To control who can manage this connection, change the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, not even administrators. Keeping this list accurate keeps connection access governed within your organization.

  3. At the bottom of the screen, click Next to proceed.

Configure crawler

Before running the Iceberg crawler, you can customize which assets it crawls. On the Metadata page, you can override the defaults:

  • Exclude Metadata: Select specific namespaces and tables to skip during crawling. By default, no assets are excluded.
  • Include Metadata: Select specific namespaces and tables to crawl. By default, all assets are included.
  • Preflight checks: Click to check for any permissions or configuration issues before running the crawler.
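
The include and exclude filters can be thought of as a simple selection over (namespace, table) pairs. The Python sketch below is illustrative only: the function `select_tables` is hypothetical, and it assumes that an empty include filter means "crawl everything" and that an exclude entry wins when an asset matches both filters. This page does not state how the crawler resolves that overlap.

```python
def _matches(namespace, table, selection):
    # A selection entry is either a whole namespace or a (namespace, table) pair.
    return namespace in selection or (namespace, table) in selection


def select_tables(tables, include=(), exclude=()):
    """Return the (namespace, table) pairs that would be crawled."""
    include, exclude = set(include), set(exclude)
    selected = []
    for namespace, table in tables:
        # Empty include filter: everything is a candidate (the documented default).
        if include and not _matches(namespace, table, include):
            continue
        # Assumption: exclusion takes precedence over inclusion.
        if _matches(namespace, table, exclude):
            continue
        selected.append((namespace, table))
    return selected


tables = [("sales", "orders"), ("sales", "refunds"), ("hr", "staff")]
print(select_tables(tables, include={"sales"}, exclude={("sales", "refunds")}))
```

With these sample values, only the `orders` table in the `sales` namespace remains selected.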

Run crawler

After completing the configuration, choose how you want to run the crawler.

  • To run the crawler once immediately, click Run at the bottom of the screen.
  • To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule & Run at the bottom of the screen.

Verify crawled assets

Once the crawler finishes running, the crawled assets appear on Atlan's assets page. To verify that the crawl succeeded, monitor the workflow:

  1. Navigate to the Workflows section in Atlan. Here you can see real-time status updates of your crawler run.
  2. Click on your Iceberg crawler workflow to view details. Check the Logs tab for detailed execution information about what was crawled and any errors that occurred.
  3. Wait for the status to show Success before proceeding. This confirms that all assets were successfully crawled and are now available in Atlan's catalog.

Once the crawl is complete, you can browse, search, and govern your Iceberg assets within Atlan.
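
If you automate the "wait for Success" check rather than watching the Workflows section, the logic amounts to a polling loop. The Python sketch below is generic and does not call any Atlan API: `get_status` is a hypothetical callable that you would supply, and the "Failed" status name is an assumption, since only Success is named on this page.

```python
import time


def wait_for_success(get_status, poll_seconds=30.0, timeout_seconds=3600.0,
                     failure_statuses=("Failed",), sleep=time.sleep):
    """Poll get_status() until it reports Success, a failure status, or a timeout."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "Success":
            return True
        # Assumption: these status names mark a run that will not recover.
        if status in failure_statuses:
            return False
        if waited >= timeout_seconds:
            raise TimeoutError("workflow did not finish within the timeout")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated status sequence standing in for a real status lookup.
statuses = iter(["Running", "Running", "Success"])
print(wait_for_success(lambda: next(statuses), sleep=lambda seconds: None))
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting, as in the simulated run above.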

See also