What does Atlan crawl from Google Cloud Knowledge Catalog?
Atlan crawls Aspect metadata, data quality results, and data profiling results from your Google Cloud Knowledge Catalog, and attaches them to the corresponding BigQuery assets in Atlan.
Aspects
Aspects are custom metadata schemas attached to assets in Knowledge Catalog. Atlan crawls Aspect details including fields, labels, and timestamps, and attaches them to the corresponding BigQuery assets.
Aspect field values are editable in Atlan. When reverse sync is enabled, changes made in Atlan are written back to Knowledge Catalog. Aspect Type schemas are read-only and can't be modified through Atlan. Aspects linked to assets can't be deleted in Atlan.
| Source field | Atlan field | Description |
|---|---|---|
| Aspect list | assetGCPDataplexAspectList | Array of Aspect names attached to the asset |
| Aspect field list | assetGCPDataplexAspectFieldList | Array of formatted Aspect field strings (format: `aspectName\|\|\|fieldName\|\|\|fieldValue`) |
| Aspect details | assetGCPDataplexAspectDetails | Per-Aspect details including full name, display name, Aspect Type, labels, timestamps, and field values |
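
The formatted strings in assetGCPDataplexAspectFieldList can be split back into their three parts. The following is a minimal sketch, assuming the values have already been read from the asset into a plain Python list. The function name `parse_aspect_fields` and the sample Aspect names and values are hypothetical, not part of Atlan's API.

```python
from collections import defaultdict

# Separator documented for assetGCPDataplexAspectFieldList entries.
SEPARATOR = "|||"


def parse_aspect_fields(field_strings):
    """Group 'aspectName|||fieldName|||fieldValue' strings by Aspect name."""
    grouped = defaultdict(dict)
    for entry in field_strings:
        # Split at most twice so a value that itself contains '|||' stays intact.
        parts = entry.split(SEPARATOR, 2)
        if len(parts) != 3:
            continue  # skip entries that don't match the documented format
        aspect_name, field_name, field_value = parts
        grouped[aspect_name][field_name] = field_value
    return dict(grouped)


# Hypothetical sample data, for illustration only.
sample = [
    "data-governance|||owner|||finance-team",
    "data-governance|||tier|||gold",
    "retention|||days|||365",
]
parsed = parse_aspect_fields(sample)
```

Limiting the split to two separators is a design choice in this sketch: it keeps a field value intact even if the value happens to contain the separator.
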
Data quality
When Ingest Data Quality is enabled, Atlan crawls Data Quality scan results from Knowledge Catalog and attaches them to the corresponding BigQuery Tables and Columns. Atlan captures the seven most recent run results for each scan.
| Source field | Atlan field | Description |
|---|---|---|
| DQ scan results | assetExternalDQMetadataDetails | Data Quality scan results including pass/fail status, rule outcomes, and row counts |
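
The retention rule described above (the seven most recent runs per scan) can be illustrated as follows. This is a sketch only, not Atlan's implementation: the record keys `scan`, `finished_at`, and `passed`, and the scan name `orders-dq`, are hypothetical and don't reflect the actual structure of assetExternalDQMetadataDetails.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Number of run results kept per scan, as stated in this section.
MAX_RUNS_PER_SCAN = 7


def latest_runs(runs, limit=MAX_RUNS_PER_SCAN):
    """Return the `limit` most recent runs for each scan, newest first."""
    by_scan = defaultdict(list)
    for run in runs:
        by_scan[run["scan"]].append(run)
    return {
        scan: sorted(items, key=lambda r: r["finished_at"], reverse=True)[:limit]
        for scan, items in by_scan.items()
    }


# Hypothetical run history: ten daily runs of a single scan.
start = datetime(2024, 1, 1)
runs = [
    {"scan": "orders-dq", "finished_at": start + timedelta(days=i), "passed": i % 2 == 0}
    for i in range(10)
]
recent = latest_runs(runs)
```
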
Data profiling
When Ingest Data Profiling is enabled, Atlan crawls Data Profiling scan results from Knowledge Catalog and attaches them to the corresponding BigQuery Column assets. The following attributes are populated on each Column asset based on the most recent DATA_PROFILE scan job for the parent table.
Universal metrics: written for every profiled column
| Atlan field | Description |
|---|---|
| columnMissingValuesCountLong | Number of null or missing values in the column |
| columnMissingValuesPercentage | Percentage of null or missing values relative to total row count |
| columnDistinctValuesCountLong | Number of distinct values in the column |
| columnDistinctValuesPercentage | Percentage of distinct values relative to total row count |
| columnTopValues | Array of the most frequent values in the column, each with columnValue (the value) and columnValueFrequency (its occurrence count) |
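
To show how the universal metrics relate to one another, the sketch below derives them from a small in-memory column. The real values come from the Knowledge Catalog profile scan, not from code like this. Two points are assumptions: that percentages are expressed on a 0 to 100 scale, and that nulls are excluded from the distinct count. The function name `universal_metrics` and the `top_n` parameter are hypothetical.

```python
from collections import Counter


def universal_metrics(values, top_n=3):
    """Derive the universal profiling metrics for one column of values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    missing = total - len(non_null)
    distinct = len(set(non_null))  # assumption: nulls don't count as a distinct value

    def pct(count):
        # Assumption: percentages are on a 0-100 scale, relative to total row count.
        return 100.0 * count / total if total else 0.0

    return {
        "columnMissingValuesCountLong": missing,
        "columnMissingValuesPercentage": pct(missing),
        "columnDistinctValuesCountLong": distinct,
        "columnDistinctValuesPercentage": pct(distinct),
        "columnTopValues": [
            {"columnValue": value, "columnValueFrequency": count}
            for value, count in Counter(non_null).most_common(top_n)
        ],
    }


# Hypothetical four-row column with one missing value.
metrics = universal_metrics(["a", "b", "a", None])
```
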
String metrics: written when the column has a string data type
| Atlan field | Description |
|---|---|
| columnMinimumStringLength | Shortest string length observed in the column |
| columnMaximumStringLength | Longest string length observed in the column |
| columnAverageLengthValue | Average string length across all non-null values in the column |
Numeric metrics: written when the column has a numeric data type
| Atlan field | Description |
|---|---|
| columnMinValue | Minimum numeric value in the column |
| columnMaxValue | Maximum numeric value in the column |
| columnMeanValue | Mean (average) of all numeric values in the column |
| columnMedianValue | Median (Q2) of all numeric values in the column |
| columnStandardDeviationValue | Standard deviation of numeric values in the column |
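
The type-specific metrics above can be sketched with the Python standard library. This is an illustration of what each attribute represents, not how the profile scan computes them. The function names are hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption, because this page doesn't state whether the population or sample formula applies.

```python
import statistics


def string_metrics(values):
    """String-length metrics over the non-null values of a string column."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        return {}
    return {
        "columnMinimumStringLength": min(lengths),
        "columnMaximumStringLength": max(lengths),
        "columnAverageLengthValue": sum(lengths) / len(lengths),
    }


def numeric_metrics(values):
    """Summary statistics over the non-null values of a numeric column."""
    numbers = [v for v in values if v is not None]
    if not numbers:
        return {}
    return {
        "columnMinValue": min(numbers),
        "columnMaxValue": max(numbers),
        "columnMeanValue": statistics.mean(numbers),
        "columnMedianValue": statistics.median(numbers),
        # Assumption: population standard deviation; the source doesn't specify.
        "columnStandardDeviationValue": statistics.pstdev(numbers),
    }


# Hypothetical sample columns, each with one null value.
text_stats = string_metrics(["ab", "abcd", None, ""])
number_stats = numeric_metrics([1, 2, 3, 4, None])
```
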
See also
- Set up Knowledge Catalog: Configure Knowledge Catalog connection and authentication