
Crawl Iceberg

Configure and run the crawler to extract metadata from your Iceberg data lakehouse assets.

Atlan crawls metadata from Iceberg catalogs, namespaces, tables, and columns.

Prerequisites

Before you begin, make sure you have:

  • Completed Set up Iceberg
  • Admin access to your Atlan instance
  • The connection values for your chosen mode (Generic REST Catalog or BigLake Metastore)

Create crawler workflow

To crawl metadata from Iceberg, review the order of operations and then complete the following steps.

  1. In the top right of any screen, navigate to New and then click New Workflow.
  2. From the list of packages, select Iceberg Assets and click Setup Workflow.

Configure authentication

Choose one authentication mode and then configure either Direct extraction or Agent extraction for that mode.

Generic REST Catalog: Use this mode for REST catalogs that support OAuth2 client credentials.

Direct extraction

  1. Extraction method: Select Direct.
  2. Authentication method: Select Token.
  3. Enter the required values:
    • REST Catalog URI: For example, https://your-catalog.com/api/rest
    • Token: Enter credentials in the format client-id:client-secret
    • Catalog Name
    • Warehouse
    • Scope (if required by your catalog)
  4. Click Test Connection.
  5. Once successful, click Next.
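To sanity-check the Token value before entering it, the following minimal Python sketch splits a client-id:client-secret string and builds (without sending) an OAuth2 client credentials request. It assumes the catalog exposes the token endpoint at /v1/oauth/tokens, as described in the Iceberg REST catalog specification; your catalog may use a different path or a separate authorization server. The function names and the default scope value "catalog" are hypothetical and chosen only for illustration. This isn't a description of how Atlan performs Test Connection.

```python
from urllib.parse import urlencode
from urllib.request import Request


def split_token(token: str) -> tuple[str, str]:
    """Split a 'client-id:client-secret' string at the first colon."""
    client_id, sep, client_secret = token.partition(":")
    if not sep or not client_id or not client_secret:
        raise ValueError("expected a token in the form client-id:client-secret")
    return client_id, client_secret


def build_token_request(catalog_uri: str, token: str, scope: str = "catalog") -> Request:
    """Build (but do not send) an OAuth2 client credentials request.

    Assumption: the token endpoint lives at <catalog_uri>/v1/oauth/tokens.
    """
    client_id, client_secret = split_token(token)
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,  # assumed default; use the scope your catalog requires
        }
    ).encode("ascii")
    url = catalog_uri.rstrip("/") + "/v1/oauth/tokens"
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own catalog URI and credentials.
    request = build_token_request("https://your-catalog.com/api/rest", "my-id:my-secret")
    print(request.get_method(), request.full_url)
```

To send the request, pass the returned object to urllib.request.urlopen. An HTTP 200 response with an access token suggests that the URI, credentials, and scope are consistent with each other.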

Agent extraction

  1. Extraction method: Select Agent.
  2. Provide the same values as for Direct extraction, supplied through your configured secret store.
  3. Complete runtime configuration by following How to configure Secure Agent for workflow execution.
  4. Click Next.

Configure connection

On this page, define how this Iceberg connection is identified and managed in Atlan.

  1. Provide a Connection Name that represents your source environment (for example, production, development, or iceberg-blm).
  2. To control who can manage this connection, configure Connection Admins.
  3. Click Next.

Configure crawler

Before running the crawler, you can optionally customize the crawl on the Metadata page:

  • Exclude Metadata: Select specific namespaces and tables to skip.
  • Include Metadata: Select specific namespaces and tables to include.
  • Preflight checks: Validate connectivity and permissions before execution.

Run crawler

After configuration, choose how to run:

  • Click Run to run once immediately.
  • Click Schedule & Run to run on a schedule.

Verify crawled assets

After the crawler completes:

  1. Navigate to Workflows and open the Iceberg workflow run.
  2. Review execution details and logs.
  3. Confirm status is Success.

Then verify crawled assets from Iceberg in Atlan search and asset views.
