Google Dataplex assets app
The Dataplex assets app crawls Google Dataplex / Knowledge Catalog metadata (aspect
types, data profiling, and data quality) and publishes it to Atlan. Build it with
the AtlanKnowledgeCatalog builder.
Creating an app creates a new connection
Each time you create an app, Atlan mints a new connection and new assets. To re-crawl, re-run the existing workflow instead (see Re-run an existing app).
Dataplex supports two authentication methods: service account (basic) and
Workload Identity Federation (gcp_wif).
Service account
- Python
Dataplex crawling with a service account
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.apps import AtlanKnowledgeCatalog

client = AtlanClient()
response = (
    AtlanKnowledgeCatalog(client)
    .basic(  # (1)
        service_account_json=sa_json,  # (2)
        project_id="my-project",  # (3)
    )
    .connection(
        name="production-dataplex",
        admin_roles=[client.role_cache.get_id_for_name("$admin")],
    )
    .include_projects_optional({"my-project": {}})  # (4)
    .run(name="dataplex-prod")
)
print(response.slug, response.run_id)
- Step 1: Credential. Service-account auth; the JSON key is vaulted.
- Required. The service-account JSON key (as a string; keep \n escaped).
- Required. The home GCP project id. For private connectivity, pass network_connectivity=... and psc_host=... (both optional).
- Step 3: Metadata. GCP projects to include. Empty = all accessible projects.
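The requirement to pass the key as a string with \n kept escaped can be checked before the builder is called. The following is a minimal sketch, not part of the SDK: the helper name check_service_account_json is hypothetical, and it uses only the Python standard library. It assumes the key is a standard Google service-account key file, which carries the type, project_id, private_key, and client_email fields.

```python
import json

# Fields that a Google service-account key file is expected to contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(raw: str) -> str:
    """Validate a service-account key and return the text unchanged.

    Hypothetical helper (not a pyatlan function). Returning the raw text,
    rather than a re-serialized dict, keeps the newlines inside
    private_key escaped exactly as they are in the key file.
    """
    key = json.loads(raw)  # raises ValueError if the text is not valid JSON
    if not isinstance(key, dict):
        raise ValueError("service-account key must be a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"service-account key is missing fields: {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return raw
```

One possible use, assuming the key lives in a local file whose path is a placeholder: sa_json = check_service_account_json(Path("service-account.json").read_text()), with Path imported from pathlib.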
Workload Identity Federation
- Python
Dataplex crawling with WIF
(
    AtlanKnowledgeCatalog(client)
    .gcp_wif(
        service_account_email="svc@my-project.iam.gserviceaccount.com",  # (1)
        wif_pool_provider_id="...",  # (2)
        atlan_oauth_id="...",  # (3)
        atlan_oauth_secret="••••••",  # (4)
        project_id="my-project",  # (5)
    )
    .connection(name="production-dataplex", admin_roles=[...])
    .run(name="dataplex-prod")
)
- Required. The service-account email.
- Required. The WIF pool provider id.
- Required. Atlan OAuth client id.
- Required. Atlan OAuth client secret.
- Required. The home GCP project id.
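All five WIF parameters are required, and one of them is a secret, so it may be preferable to read them from the environment instead of hardcoding them. The sketch below is one way to do that. The helper name wif_settings_from_env and the environment variable names are assumptions chosen for illustration. Only the keyword names come from the gcp_wif example above.

```python
import os

# Maps each gcp_wif keyword (from the example above) to an environment
# variable. The variable names are hypothetical; use whatever your
# deployment defines.
WIF_ENV_VARS = {
    "service_account_email": "DATAPLEX_SA_EMAIL",
    "wif_pool_provider_id": "DATAPLEX_WIF_POOL_PROVIDER_ID",
    "atlan_oauth_id": "ATLAN_OAUTH_ID",
    "atlan_oauth_secret": "ATLAN_OAUTH_SECRET",
    "project_id": "DATAPLEX_PROJECT_ID",
}


def wif_settings_from_env(environ=None):
    """Collect the five required WIF settings, failing fast if any is unset."""
    environ = os.environ if environ is None else environ
    # Treat an empty value the same as a missing one.
    missing = [var for var in WIF_ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError(
            "missing environment variables: " + ", ".join(missing)
        )
    return {param: environ[var] for param, var in WIF_ENV_VARS.items()}
```

The resulting dict could then be unpacked into the builder call as .gcp_wif(**wif_settings_from_env()), which keeps the OAuth secret out of source control.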
Configuration options
All metadata options are optional:
- Python
Dataplex metadata configuration
(
    AtlanKnowledgeCatalog(client)
    .basic(service_account_json=sa_json, project_id="my-project")
    .connection(name="production-dataplex", admin_roles=[...])
    .include_projects_optional({"my-project": {}})  # (1)
    .exclude_projects_optional({"sandbox-project": {}})  # (2)
    .include_aspect_types({"my-aspect": {}})  # (3)
    .exclude_aspect_types({"noisy-aspect": {}})  # (4)
    .ingest_knowledge_catalog_aspect_metadata(True)  # (5)
    .ingest_data_profiling_metadata(True)  # (6)
    .ingest_data_quality_metadata(True)  # (7)
    .run(name="dataplex-prod")
)
- GCP projects to include (empty = all accessible projects).
- GCP projects to exclude.
- Aspect types to include. If set, only these aspects are extracted.
- Aspect types to exclude.
- Discover Knowledge Catalog aspect types and write per-asset aspect metadata.
- Fetch DATA_PROFILE scan results and write per-column profiling metrics.
- Fetch DATA_QUALITY scan results and write DQ scores, rules, and dimensions.
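The include and exclude options above all take a mapping from a name to an empty dict, such as {"my-project": {}}. When the names come from a list, a small helper can build that shape and catch a name that appears on both sides. This is a sketch under assumptions: as_filter and check_disjoint are hypothetical helpers, not SDK functions. The mapping shape is copied from the examples above, and the meaning of the empty inner dict is not stated in this section.

```python
from typing import Dict, Iterable


def as_filter(names: Iterable[str]) -> Dict[str, dict]:
    """Turn a list of names into the {name: {}} shape used in the examples."""
    result: Dict[str, dict] = {}
    for name in names:
        cleaned = name.strip()
        if not cleaned:
            raise ValueError("filter names must not be empty")
        result[cleaned] = {}  # a repeated name collapses into one entry
    return result


def check_disjoint(include: Dict[str, dict], exclude: Dict[str, dict]) -> None:
    """Raise if a name is both included and excluded (likely a config slip)."""
    overlap = sorted(set(include) & set(exclude))
    if overlap:
        raise ValueError(f"names both included and excluded: {overlap}")
```

For example, include = as_filter(["my-project"]) and exclude = as_filter(["sandbox-project"]) could be checked with check_disjoint(include, exclude) and then passed to include_projects_optional and exclude_projects_optional. Whether the app itself rejects overlapping filters is not covered here, so the check is only a local safeguard.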