Set up Iceberg
Configure your Iceberg catalog to enable Atlan to connect and crawl your data lakehouse assets.
Atlan supports two setup modes for Iceberg:
- Generic REST Catalog using OAuth2 client credentials
- BigLake Metastore (BLM) on Google Cloud using either service account key auth or Workload Identity Federation (WIF)
Prerequisites
Before you begin, make sure you have:
- Permission to create and assign IAM roles in your environment
- Network connectivity from Atlan (or Self-Deployed Runtime) to your catalog endpoint
Choose setup mode
- Generic REST Catalog
- BigLake Metastore (GCP)
Use this mode for REST catalogs that support OAuth2 client credentials.
- Request REST catalog credentials from your catalog administrator.
- Gather the following values:
- REST Catalog URI (for example, https://your-catalog.com/api/rest)
- Client ID
- Client Secret
- Catalog Name
- Warehouse
- Scope (if required by your catalog)
- When creating the crawler in Atlan, select Authentication method = Token and enter credentials in the format client-id:client-secret.
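As a rough sketch of the two pieces above: the first helper builds the `client-id:client-secret` string Atlan expects for the Token authentication method, and the second builds the form-encoded body of an OAuth2 client-credentials token request of the kind such REST catalogs accept. The function names are illustrative, not part of any Atlan or Iceberg API.

```python
from urllib.parse import urlencode


def atlan_token_credential(client_id: str, client_secret: str) -> str:
    """Credential string for Atlan's Authentication method = Token."""
    return f"{client_id}:{client_secret}"


def oauth2_token_request(client_id: str, client_secret: str, scope: str = "") -> str:
    """Form-encoded body for an OAuth2 client-credentials token request.

    Include scope only if your catalog requires one.
    """
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope:
        fields["scope"] = scope
    return urlencode(fields)
```

The exact token endpoint path and any required scope vary by catalog vendor; confirm both with your catalog administrator.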
Use this mode when your Iceberg REST catalog is backed by Google BigLake Metastore.
Required BigLake permissions
Create a custom IAM role with the following permissions, then assign it to the service account used by Atlan:
- biglake.catalogs.get: Retrieves catalog metadata. This is metadata access only and doesn't grant table data access.
- biglake.databases.get: Retrieves namespace/database metadata. This is metadata access only and doesn't grant table data access.
- biglake.databases.list: Lists namespaces/databases in the catalog for discovery.
- biglake.tables.get: Retrieves table metadata. This is metadata access only and doesn't grant table data access.
- biglake.tables.list: Lists tables in a namespace for discovery.
- biglake.catalogs.use: Enables use of the catalog resource during metadata API calls.
- biglake.databases.use: Enables use of namespace/database resources during metadata API calls.
- biglake.tables.readMetadata: Reads table metadata details (schema, partitions, snapshots) without reading table data.
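The eight permissions above can be assembled into the comma-separated value that `gcloud iam roles create --permissions=...` expects; this small sketch only builds that flag value, it doesn't call any Google Cloud API.

```python
# The BigLake metadata-only permissions required by the Atlan crawler.
BIGLAKE_PERMISSIONS = [
    "biglake.catalogs.get",
    "biglake.databases.get",
    "biglake.databases.list",
    "biglake.tables.get",
    "biglake.tables.list",
    "biglake.catalogs.use",
    "biglake.databases.use",
    "biglake.tables.readMetadata",
]


def permissions_flag(perms: list = BIGLAKE_PERMISSIONS) -> str:
    """Comma-separated value for gcloud's --permissions flag."""
    return ",".join(perms)
```

You would then pass the result to something like `gcloud iam roles create atlanBigLakeReader --project=YOUR_PROJECT --permissions=<flag value>`, where the role name here is only an example.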
Authentication setup for BLM
A Google Cloud service account is required for both authentication options below.
Create service account
- Create (or reuse) a service account in your Google Cloud project.
- Assign the custom BigLake role to this service account.
- Keep the service account email ready for crawler configuration.
Choose authentication mode
- Service account key
- Workload Identity Federation (WIF)
Use this option when you want key-based authentication.
- Create and securely store a JSON key for the service account.
- Use these values when configuring the crawler:
- Project ID
- Location
- Catalog Name
- Warehouse (for example, gs://your-bucket/warehouse)
- Service account JSON key
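Before running the crawler, it can help to sanity-check that all of the values above are present and that the warehouse is a Cloud Storage URI. The field names in this sketch are illustrative, not Atlan's exact form labels.

```python
def validate_blm_key_config(cfg: dict) -> list:
    """Return a list of problems with a service-account-key crawler config."""
    required = ("project_id", "location", "catalog_name",
                "warehouse", "service_account_key")
    problems = [f"missing {field}" for field in required if not cfg.get(field)]
    warehouse = cfg.get("warehouse", "")
    if warehouse and not warehouse.startswith("gs://"):
        problems.append("warehouse should be a gs:// URI, "
                        "e.g. gs://your-bucket/warehouse")
    return problems
```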
Use this option to avoid long-lived service account keys.
- Create an OAuth client in Atlan and securely store:
- OAuth Client ID
- OAuth Client Secret
- In Google Cloud, create a Workload Identity Pool and OIDC provider that trusts your Atlan tenant issuer.
- Configure attribute mapping for audience and add your Atlan OAuth client ID as the audience.
- Grant roles/iam.workloadIdentityUser on the target service account to the workload identity principal set.
- Copy the WIF provider resource name in this format:
//iam.googleapis.com/projects/<project-number>/locations/global/workloadIdentityPools/<pool-id>/providers/<provider-id>
- Use these values when configuring the crawler:
- Project ID
- Location
- Catalog Name
- Warehouse
- Service Account Email
- WIF Pool Provider ID
- Atlan OAuth Client ID
- Atlan OAuth Client Secret
For detailed WIF setup flow, refer to Set up Workload Identity Federation for Google BigQuery. The same Atlan OAuth and Google WIF concepts apply.
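The WIF provider resource name follows the fixed template shown above; this helper just fills in the three variable parts, so you can assemble the value once you know your project number, pool ID, and provider ID. The function name is illustrative.

```python
def wif_provider_resource(project_number: str, pool_id: str, provider_id: str) -> str:
    """Full resource name of a Workload Identity Federation OIDC provider."""
    return (
        "//iam.googleapis.com/projects/{}/locations/global/"
        "workloadIdentityPools/{}/providers/{}"
    ).format(project_number, pool_id, provider_id)
```

Note that the template takes the numeric project number, not the project ID.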
Verify network connectivity
Before crawling, confirm Atlan can reach your Iceberg catalog:
- HTTPS access: Your REST catalog endpoint must be available via HTTPS.
- Firewall rules: Permit outbound connections from Atlan (or Self-Deployed Runtime) to your catalog endpoint.
- DNS resolution: Your catalog hostname must be resolvable from the runtime.
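The three checks above can be approximated locally before configuring the crawler. This sketch validates that the endpoint is HTTPS and, optionally, that its hostname resolves from the runtime; it is a preflight helper of our own, not an Atlan tool.

```python
import socket
from urllib.parse import urlsplit


def check_catalog_endpoint(uri: str, resolve_dns: bool = False) -> tuple:
    """Validate a catalog URI; return (host, port).

    Raises ValueError for a malformed or non-HTTPS URI, and
    socket.gaierror if resolve_dns is set and the hostname doesn't resolve.
    """
    parts = urlsplit(uri)
    if parts.scheme != "https":
        raise ValueError(f"catalog endpoint must use HTTPS, got {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("catalog endpoint has no hostname")
    if resolve_dns:
        socket.gethostbyname(parts.hostname)
    return parts.hostname, parts.port or 443
```

Run it with `resolve_dns=True` from the host where the crawler (or Self-Deployed Runtime) executes, since DNS and firewall rules can differ from your workstation's.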
Next steps
- Crawl Iceberg assets: Configure and run the crawler to extract metadata from Iceberg.