
What does Atlan crawl from Google Cloud Knowledge Catalog

Atlan crawls Aspect metadata, data quality results, and data profiling results from your Google Cloud Knowledge Catalog, and attaches them to the corresponding BigQuery assets in Atlan.

Aspects

Aspects are custom metadata schemas attached to assets in Knowledge Catalog. Atlan crawls Aspect details including fields, labels, and timestamps, and attaches them to the corresponding BigQuery assets.

Aspect field values are editable in Atlan. When reverse sync is enabled, changes made in Atlan are written back to Knowledge Catalog. Aspect Type schemas are read-only and can't be modified through Atlan. Aspects linked to assets can't be deleted in Atlan.

| Source field | Atlan field | Description |
| --- | --- | --- |
| Aspect list | assetGCPDataplexAspectList | Array of Aspect names attached to the asset |
| Aspect field list | assetGCPDataplexAspectFieldList | Array of formatted Aspect field strings (format: `aspectName\|\|\|fieldName\|\|\|fieldValue`) |
| Aspect details | assetGCPDataplexAspectDetails | Per-Aspect details including full name, display name, Aspect Type, labels, timestamps, and field values |
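Each assetGCPDataplexAspectFieldList entry packs three values into one string separated by `|||`. If you consume these strings downstream, a minimal Python sketch for splitting them back into their parts might look like the following. The names `AspectField`, `parse_aspect_field`, and `group_by_aspect` are hypothetical helpers, not part of any Atlan or Google Cloud SDK, and the sketch assumes Aspect and field names never contain `|||` (any further `|||` is kept as part of the value).

```python
from typing import NamedTuple

SEPARATOR = "|||"


class AspectField(NamedTuple):
    """One parsed assetGCPDataplexAspectFieldList entry (hypothetical helper type)."""

    aspect_name: str
    field_name: str
    field_value: str


def parse_aspect_field(entry: str) -> AspectField:
    """Split an 'aspectName|||fieldName|||fieldValue' string into its three parts.

    Assumes the Aspect name and field name never contain '|||'; maxsplit=2
    keeps any further '|||' inside the field value.
    """
    parts = entry.split(SEPARATOR, 2)
    if len(parts) != 3:
        raise ValueError(f"expected aspectName|||fieldName|||fieldValue, got {entry!r}")
    return AspectField(*parts)


def group_by_aspect(entries: list[str]) -> dict[str, dict[str, str]]:
    """Group parsed entries into {aspectName: {fieldName: fieldValue}}."""
    grouped: dict[str, dict[str, str]] = {}
    for entry in entries:
        field = parse_aspect_field(entry)
        grouped.setdefault(field.aspect_name, {})[field.field_name] = field.field_value
    return grouped


# Illustrative values only; real names depend on your Aspect Types.
sample = ["data_owner|||team|||finance", "data_owner|||tier|||gold"]
print(group_by_aspect(sample))  # {'data_owner': {'team': 'finance', 'tier': 'gold'}}
```

Using `maxsplit=2` rather than a plain split means a field value that happens to contain the separator is not silently truncated.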

Data quality

When Ingest Data Quality is enabled, Atlan crawls Data Quality scan results from Knowledge Catalog and attaches them to the corresponding BigQuery Tables and Columns. Atlan captures the seven most recent run results for each scan.

| Source field | Atlan field | Description |
| --- | --- | --- |
| DQ scan results | assetExternalDQMetadataDetails | Data Quality scan results including pass/fail status, rule outcomes, and row counts |
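The per-scan retention rule (only the seven most recent runs are kept) can be illustrated with a short Python sketch. `ScanRun` and `latest_runs_per_scan` are hypothetical simplifications, not the shape of the Knowledge Catalog API response or of Atlan's internal logic, and the sketch assumes "most recent" means ordering runs by their end time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Retention limit taken from the text: the 7 most recent runs per scan.
MAX_RUNS_PER_SCAN = 7


@dataclass(frozen=True)
class ScanRun:
    """Hypothetical, simplified record of one Data Quality scan run."""

    scan_id: str
    end_time: datetime
    passed: bool


def latest_runs_per_scan(
    runs: list[ScanRun], limit: int = MAX_RUNS_PER_SCAN
) -> dict[str, list[ScanRun]]:
    """Return the `limit` most recent runs for each scan, newest first."""
    by_scan: dict[str, list[ScanRun]] = defaultdict(list)
    for run in runs:
        by_scan[run.scan_id].append(run)
    return {
        scan_id: sorted(scan_runs, key=lambda r: r.end_time, reverse=True)[:limit]
        for scan_id, scan_runs in by_scan.items()
    }


# Ten daily runs of one hypothetical scan; only the latest seven survive.
runs = [ScanRun("orders_dq_scan", datetime(2024, 1, day), day % 2 == 0) for day in range(1, 11)]
kept = latest_runs_per_scan(runs)
print([run.end_time.day for run in kept["orders_dq_scan"]])  # [10, 9, 8, 7, 6, 5, 4]
```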

Data profiling

When Ingest Data Profiling is enabled, Atlan crawls Data Profiling scan results from Knowledge Catalog and attaches them to the corresponding BigQuery Column assets. The following attributes are populated on each Column asset based on the most recent DATA_PROFILE scan job for the parent table.

Universal metrics: written for every profiled column

| Atlan field | Description |
| --- | --- |
| columnMissingValuesCountLong | Number of null or missing values in the column |
| columnMissingValuesPercentage | Percentage of null or missing values relative to total row count |
| columnDistinctValuesCountLong | Number of distinct values in the column |
| columnDistinctValuesPercentage | Percentage of distinct values relative to total row count |
| columnTopValues | Array of the most frequent values in the column, each with columnValue (the value) and columnValueFrequency (its occurrence count) |
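To make the universal metric definitions concrete, the following Python sketch computes them from a plain list of column values. It illustrates the definitions only and does not describe how Knowledge Catalog computes its profiles. The function `universal_metrics` and its `top_n` parameter are hypothetical, and the sketch assumes that nulls are excluded from the distinct and top-value counts and that percentages use a 0 to 100 scale; neither assumption is confirmed by the source.

```python
from collections import Counter
from typing import Any, Optional


def universal_metrics(values: list[Optional[Any]], top_n: int = 3) -> dict[str, Any]:
    """Illustrative universal profiling metrics for one column (hypothetical helper).

    Assumptions: None marks a null; percentages are 0-100 relative to the total
    row count; distinct and top-value counts ignore nulls; top_n is hypothetical.
    """
    total_rows = len(values)
    non_null = [v for v in values if v is not None]
    missing = total_rows - len(non_null)
    distinct = len(set(non_null))

    def pct(count: int) -> float:
        # Guard against an empty column to avoid dividing by zero.
        return 100.0 * count / total_rows if total_rows else 0.0

    return {
        "columnMissingValuesCountLong": missing,
        "columnMissingValuesPercentage": pct(missing),
        "columnDistinctValuesCountLong": distinct,
        "columnDistinctValuesPercentage": pct(distinct),
        "columnTopValues": [
            {"columnValue": value, "columnValueFrequency": freq}
            for value, freq in Counter(non_null).most_common(top_n)
        ],
    }


metrics = universal_metrics(["a", "b", "a", None, "a", "c", "b", None])
print(metrics["columnMissingValuesPercentage"])  # 25.0
print(metrics["columnTopValues"][0])  # {'columnValue': 'a', 'columnValueFrequency': 3}
```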

String metrics: written when the column has a string data type

| Atlan field | Description |
| --- | --- |
| columnMinimumStringLength | Shortest string length observed in the column |
| columnMaximumStringLength | Longest string length observed in the column |
| columnAverageLengthValue | Average string length across all non-null values in the column |
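A minimal Python sketch of the string metrics, again only to illustrate the definitions. The function name `string_length_metrics` is hypothetical; the sketch assumes minimum and maximum, like the average, are taken over non-null values, and that length means Python's `len()` (Unicode code points), which may differ from the profiler's own length semantics.

```python
from typing import Optional


def string_length_metrics(values: list[Optional[str]]) -> dict[str, Optional[float]]:
    """Illustrative string-length metrics over a column's non-null values (hypothetical helper).

    Returns None for every metric when the column has no non-null values.
    """
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        return {
            "columnMinimumStringLength": None,
            "columnMaximumStringLength": None,
            "columnAverageLengthValue": None,
        }
    return {
        "columnMinimumStringLength": min(lengths),
        "columnMaximumStringLength": max(lengths),
        "columnAverageLengthValue": sum(lengths) / len(lengths),
    }


print(string_length_metrics(["ab", None, "abcd", "abc"]))
# {'columnMinimumStringLength': 2, 'columnMaximumStringLength': 4, 'columnAverageLengthValue': 3.0}
```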

Numeric metrics: written when the column has a numeric data type

| Atlan field | Description |
| --- | --- |
| columnMinValue | Minimum numeric value in the column |
| columnMaxValue | Maximum numeric value in the column |
| columnMeanValue | Mean (average) of all numeric values in the column |
| columnMedianValue | Median (Q2) of all numeric values in the column |
| columnStandardDeviationValue | Standard deviation of numeric values in the column |
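The numeric metrics map onto standard summary statistics. The sketch below uses Python's `statistics` module to illustrate them; `numeric_metrics` is a hypothetical helper. It assumes nulls are skipped, that the median of an even-sized column is the mean of the two middle values (`statistics.median`), and that the standard deviation is the population form (`statistics.pstdev`). The profiler may instead use a sample standard deviation or approximate quantiles.

```python
import statistics
from typing import Optional, Union

Number = Union[int, float]


def numeric_metrics(values: list[Optional[Number]]) -> dict[str, Optional[float]]:
    """Illustrative numeric metrics over a column's non-null values (hypothetical helper).

    Assumes population standard deviation and an exact (non-approximate) median.
    """
    nums = [v for v in values if v is not None]
    if not nums:
        # No data to summarize: every metric is None.
        return dict.fromkeys(
            ["columnMinValue", "columnMaxValue", "columnMeanValue",
             "columnMedianValue", "columnStandardDeviationValue"]
        )
    return {
        "columnMinValue": min(nums),
        "columnMaxValue": max(nums),
        "columnMeanValue": statistics.fmean(nums),
        "columnMedianValue": statistics.median(nums),
        "columnStandardDeviationValue": statistics.pstdev(nums),
    }


result = numeric_metrics([2, 4, None, 4, 4, 5, 5, 7, 9])
print(result["columnMeanValue"], result["columnMedianValue"])  # 5.0 4.5
```

Switching to `statistics.stdev` would give the sample standard deviation, which is slightly larger for small columns; confirm which form your values reflect before comparing them with other tools.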

See also