What does Atlan crawl from Iceberg?

Atlan crawls metadata from your Iceberg catalog, including catalogs, namespaces (with nested namespace support), tables, and columns (including nested columns).

Lineage

Atlan establishes the following lineage between Iceberg assets:

  • Catalog -> Namespaces: Each catalog contains multiple namespaces.
  • Namespace -> Tables: Each namespace contains multiple tables.
  • Namespace -> Namespaces: Nested namespaces have parent-child relationships.
  • Table -> Columns: Each table contains multiple columns.
  • Column -> Columns: Nested columns have parent-child relationships (for STRUCT, LIST, and MAP types).
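
The relationships above form a tree rooted at the catalog. The following is a minimal sketch of that shape in Python. It is not Atlan's data model: the Asset class, its methods, and the sample names (prod, analytics, sales, orders, address, city) are all hypothetical and exist only to illustrate the parent-child links.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Asset:
    """Hypothetical stand-in for a crawled asset; not an Atlan SDK class."""
    kind: str  # "Catalog", "Namespace", "Table", or "Column"
    name: str
    parent: Optional["Asset"] = field(default=None, repr=False, compare=False)
    children: List["Asset"] = field(default_factory=list)

    def add(self, kind: str, name: str) -> "Asset":
        """Create a child asset and link it to this one."""
        child = Asset(kind, name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Names from the root catalog down to this asset."""
        names = []
        node: Optional["Asset"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


catalog = Asset("Catalog", "prod")
analytics = catalog.add("Namespace", "analytics")  # Catalog -> Namespace
sales = analytics.add("Namespace", "sales")        # Namespace -> Namespace
orders = sales.add("Table", "orders")              # Namespace -> Table
address = orders.add("Column", "address")          # Table -> Column (a STRUCT)
city = address.add("Column", "city")               # Column -> Column

print(city.path())
# ['prod', 'analytics', 'sales', 'orders', 'address', 'city']
```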

Assets

Atlan crawls the following Iceberg assets and metadata fields.

IcebergCatalog

Iceberg catalogs represent the top-level catalog instances that contain namespaces and tables.

Source field | Atlan field | Description
catalog_name | name | Catalog name
catalog_name | qualifiedName | Unique qualified name for the catalog
catalog_type | icebergCatalogType | Type of catalog (for example, rest)
uri | icebergUri | REST catalog URI
iceberg_warehouse | icebergWarehouse | Warehouse identifier
scope | icebergScope | Access scope configuration
total_namespaces | schemaCount | Number of namespaces in the catalog
iceberg_catalog_properties | icebergCatalogProperties | Catalog configuration properties

IcebergNamespace

Namespaces represent logical containers for organizing tables within a catalog. Iceberg supports nested namespaces.

Source field | Atlan field | Description
namespace_str | name | Namespace name (leaf segment for nested namespaces)
namespace_str | qualifiedName | Unique qualified name for the namespace
namespace_hierarchy | icebergNamespaceHierarchy | Ordered namespace hierarchy path
namespace_str | icebergParentNamespaceQualifiedName | Parent namespace qualified name (for nested namespaces)
table_count | tableCount | Number of tables in the namespace
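
To make the nested-namespace fields concrete, here is a small Python sketch that derives the leaf name, the ordered hierarchy, and the parent from a list of namespace levels. The function name describe_namespace, the catalog_qn parameter, and the way qualified names are joined (a "/" after the catalog prefix and "." between levels) are assumptions made for illustration. The actual qualified-name format and the stored shape of the hierarchy in Atlan may differ.

```python
from typing import Dict, List, Optional


def describe_namespace(levels: List[str], catalog_qn: str) -> Dict[str, object]:
    """Derive name, hierarchy, and parent for a (possibly nested) namespace.

    ASSUMPTION: the qualified-name layout below is hypothetical and only
    illustrates how the leaf, hierarchy, and parent relate to each other.
    """
    if not levels:
        raise ValueError("a namespace needs at least one level")

    def qualified(parts: List[str]) -> str:
        return f"{catalog_qn}/{'.'.join(parts)}"

    # Only nested namespaces have a parent namespace.
    parent: Optional[str] = qualified(levels[:-1]) if len(levels) > 1 else None
    return {
        "name": levels[-1],                         # leaf segment
        "qualifiedName": qualified(levels),
        "icebergNamespaceHierarchy": list(levels),  # ordered, root first
        "icebergParentNamespaceQualifiedName": parent,
    }


nested = describe_namespace(["analytics", "sales"], catalog_qn="example-catalog")
top_level = describe_namespace(["analytics"], catalog_qn="example-catalog")
print(nested["name"], nested["icebergParentNamespaceQualifiedName"])
print(top_level["icebergParentNamespaceQualifiedName"])
```

In this sketch a top-level namespace has no parent namespace, so the parent field is left empty for it.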

IcebergTable

Iceberg tables represent table assets with metadata including partitions, snapshots, and table-level properties.

Source field | Atlan field | Description
table_name | name | Table name
table_name + namespace context | qualifiedName | Unique qualified name for the table
table_uuid | assetSourceId | Source identifier for the table
location | externalLocation | Storage location of table data
location | externalLocationRegion | Parsed storage region (when derivable)
schema_fields | columnCount | Number of columns
snapshots.summary.total-records | rowCount | Number of records
snapshots.summary.total-files-size | sizeBytes | Table size in bytes
partitions | isPartitioned | Whether the table is partitioned
current_snapshot_id | icebergCurrentSnapshotId | Current snapshot identifier
last_updated_ms | sourceUpdatedAt | Last updated timestamp on source
source_created_at | sourceCreatedAt | Created timestamp on source
format_version | icebergFormatVersion | Iceberg format version
properties | icebergTableProperties | Table-level properties
partitions | icebergTablePartitions | Partition specification details
snapshots | icebergSnapshots | Snapshot metadata
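
The sketch below shows, in Python, how several of the fields above could be read from an Iceberg table metadata document. It uses the key names from the Iceberg table metadata JSON (for example table-uuid and last-updated-ms) and assumes that rowCount and sizeBytes come from the summary of the current snapshot. The function map_table, the helper _to_int, and the sample values are hypothetical. This is an illustration of the mapping, not Atlan's crawler logic.

```python
from typing import Any, Dict, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    # Snapshot summary values are stored as strings in Iceberg metadata.
    return int(value) if value is not None else None


def map_table(name: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Map Iceberg table-metadata keys to some of the Atlan fields above."""
    current_id = metadata.get("current-snapshot-id")
    # ASSUMPTION: counts are taken from the current snapshot's summary.
    current = next(
        (s for s in metadata.get("snapshots", []) if s["snapshot-id"] == current_id),
        None,
    )
    summary = current.get("summary", {}) if current else {}

    # The default partition spec decides whether the table is partitioned.
    spec = next(
        (s for s in metadata.get("partition-specs", [])
         if s["spec-id"] == metadata.get("default-spec-id")),
        None,
    )
    partition_fields = spec["fields"] if spec else []

    return {
        "name": name,
        "assetSourceId": metadata.get("table-uuid"),
        "externalLocation": metadata.get("location"),
        "rowCount": _to_int(summary.get("total-records")),
        "sizeBytes": _to_int(summary.get("total-files-size")),
        "isPartitioned": len(partition_fields) > 0,
        "icebergCurrentSnapshotId": current_id,
        "sourceUpdatedAt": metadata.get("last-updated-ms"),
        "icebergFormatVersion": metadata.get("format-version"),
    }


# Hypothetical metadata, trimmed to the keys used above.
sample = {
    "format-version": 2,
    "table-uuid": "9c12d441-03fe-4693-9a96-a0705ddf69c1",
    "location": "s3://example-bucket/warehouse/sales/orders",
    "last-updated-ms": 1700000000000,
    "current-snapshot-id": 2,
    "default-spec-id": 0,
    "partition-specs": [
        {"spec-id": 0, "fields": [
            {"name": "order_day", "transform": "day", "source-id": 3, "field-id": 1000},
        ]},
    ],
    "snapshots": [
        {"snapshot-id": 1, "summary": {"operation": "append",
                                       "total-records": "10", "total-files-size": "2048"}},
        {"snapshot-id": 2, "summary": {"operation": "append",
                                       "total-records": "25", "total-files-size": "5120"}},
    ],
}

row = map_table("orders", sample)
print(row["rowCount"], row["sizeBytes"], row["isPartitioned"])
# 25 5120 True
```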

IcebergColumn

Columns represent table fields, including nested field metadata for complex types.

Source field | Atlan field | Description
column_name | name | Column name
column_name / column_path + table context | qualifiedName | Unique qualified name for the column
data_type | dataType | Column data type
nullable | isNullable | Whether the column accepts null values
is_partition | isPartition | Whether the column is a partition column
description | description | Column description
sub_type | subType | Nested subtype marker
column_depth_level | columnDepthLevel | Nesting depth level
column_order | order | Column order within parent scope
nested_column_order | nestedColumnOrder | Hierarchical order for nested fields
nested_column_count | nestedColumnCount | Number of child columns
parent_column_name | parentColumnName | Parent column name
parent_column_qualified_name | parentColumnQualifiedName | Parent column qualified name
column_hierarchy | columnHierarchy | Ancestor hierarchy for nested columns
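
As an illustration of the nested-column fields, the Python sketch below walks a schema written in the Iceberg JSON schema form and emits one row per column, including children of STRUCT fields. Several details are assumptions: top-level columns are given depth 1, order restarts at 1 within each parent, and LIST and MAP types are not expanded to keep the example short. The function flatten and the sample schema are hypothetical, and the values Atlan actually stores may differ.

```python
from typing import Any, Dict, Iterator, List, Tuple


def flatten(
    fields: List[Dict[str, Any]],
    ancestors: Tuple[str, ...] = (),
) -> Iterator[Dict[str, Any]]:
    """Yield one row per column, parents before their children."""
    for position, f in enumerate(fields, start=1):
        ftype = f["type"]
        is_struct = isinstance(ftype, dict) and ftype.get("type") == "struct"
        nested = ftype["fields"] if is_struct else []
        yield {
            "name": f["name"],
            "dataType": ftype if isinstance(ftype, str) else ftype["type"],
            "isNullable": not f["required"],
            "order": position,                        # ASSUMPTION: 1-based per parent
            "columnDepthLevel": len(ancestors) + 1,   # ASSUMPTION: top level is 1
            "parentColumnName": ancestors[-1] if ancestors else None,
            "columnHierarchy": list(ancestors),
            "nestedColumnCount": len(nested),
        }
        # Recurse into STRUCT children; LIST and MAP are left out of this sketch.
        yield from flatten(nested, ancestors + (f["name"],))


# Hypothetical schema fields in Iceberg JSON form.
schema_fields = [
    {"id": 1, "name": "id", "required": True, "type": "long"},
    {"id": 2, "name": "address", "required": False, "type": {
        "type": "struct",
        "fields": [
            {"id": 3, "name": "city", "required": False, "type": "string"},
            {"id": 4, "name": "zip", "required": False, "type": "string"},
        ],
    }},
]

rows = list(flatten(schema_fields))
for r in rows:
    print(r["columnDepthLevel"], r["name"], r["parentColumnName"])
```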