BigQuery assets app
The BigQuery assets app crawls Google BigQuery assets and publishes them to Atlan
for discovery. Build it with the BigqueryCrawler builder, which mirrors the
"new app" wizard: Credential → Connection → Metadata.
Each time you create the app it mints a new connection and new assets within it—running it repeatedly with the same settings can produce duplicate assets. To re-crawl, re-run the existing workflow (see Re-run an existing app).
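One way to guard against accidental re-creation is to record the slug returned by the first run and look it up before creating the app again. The following is a minimal sketch using only the standard library. The registry file name and the helper names `load_slug` and `save_slug` are hypothetical, not part of the SDK.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical local file that maps a workflow name to the slug it returned.
REGISTRY = Path("atlan_apps.json")


def load_slug(name: str, registry: Path = REGISTRY) -> Optional[str]:
    """Return the slug recorded for `name`, or None if nothing was recorded."""
    if not registry.exists():
        return None
    return json.loads(registry.read_text()).get(name)


def save_slug(name: str, slug: str, registry: Path = REGISTRY) -> None:
    """Record `slug` under `name`, keeping any entries already in the file."""
    data = json.loads(registry.read_text()) if registry.exists() else {}
    data[name] = slug
    registry.write_text(json.dumps(data, indent=2))
```

A script could call `load_slug("bigquery-prod")` first and only build a new app when it returns `None`, then call `save_slug` with `response.slug` after the first successful run.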
Service account
To crawl BigQuery using service-account authentication (the UI default):
- Python
from pathlib import Path

from pyatlan.client.atlan import AtlanClient
from pyatlan.model.apps import BigqueryCrawler

client = AtlanClient()

# Contents of the service-account key file, read unmodified.
# The file path is a placeholder; point it at your own key file.
sa_json = Path("service-account.json").read_text()

response = (
    BigqueryCrawler(client)  # (1)
    .service_account(  # (2)
        email="svc@my-project.iam.gserviceaccount.com",  # (3)
        service_account_json=sa_json,  # (4)
        project_id="my-project",  # (5)
        connectivity="public",  # (6)
    )
    .connection(  # (7)
        name="production-bigquery",
        admin_roles=[client.role_cache.get_id_for_name("$admin")],
        admin_groups=None,
        admin_users=None,
    )
    .include({"my-project": ["analytics", "sales"]})  # (8)
    .exclude({"my-project": ["staging"]})  # (9)
    .exclude_regex(".*_TMP")  # (10)
    .import_nested_columns(True)  # (11)
    .combine_sharded_tables(True)  # (12)
    .run(name="bigquery-prod")  # (13)
)
print(response.slug, response.run_id)  # (14)
- Base configuration for a new BigQuery crawler. You must provide a client.
- Step 1: Credential. Service-account auth; the JSON key is vaulted and never persisted in the workflow.
- The service-account email.
- The service-account JSON key, as a string. Paste the key file's contents unmodified (newlines stay escaped as \n).
- Your GCP project id.
- public uses Google's public endpoint; private uses Private Service Connect. For private, also pass host="https://your-psc-host".
- Step 2: Connection. Provide a display name and at least one admin (role, group, or user). The builder mints the connection qualified name.
- Step 3: Metadata. Datasets to crawl, as {project: [dataset, ...]} (anchored as regex automatically). Omit to crawl everything.
- Datasets to skip. Exclude takes priority over include.
- Regex for tables/views to exclude from extraction.
- Parse nested (STRUCT/ARRAY) columns into child columns.
- Combine sharded tables of the same prefix into a single asset.
- .run(name=...) creates and submits a run. Use .create(name=...) to create without running.
- Persist response.slug for later operations (see Manage apps).
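The include and exclude rules in the notes above can be read as the following plain-Python sketch. It is one plausible interpretation written for illustration, not Atlan's implementation. The helper names are hypothetical, and it assumes "anchored" means each dataset entry must match the whole dataset name.

```python
import re
from typing import Mapping, Optional, Sequence

DatasetFilter = Mapping[str, Sequence[str]]


def _matches(project: str, dataset: str, rules: Optional[DatasetFilter]) -> bool:
    """True if `dataset` fully matches any pattern listed for `project`."""
    if not rules:
        return False
    # re.fullmatch behaves like wrapping each entry as ^entry$ (assumed anchoring).
    return any(re.fullmatch(pattern, dataset) for pattern in rules.get(project, ()))


def dataset_selected(
    project: str,
    dataset: str,
    include: Optional[DatasetFilter] = None,
    exclude: Optional[DatasetFilter] = None,
) -> bool:
    """Decide whether a dataset would be crawled under the assumed rules."""
    if _matches(project, dataset, exclude):
        return False  # exclude takes priority over include
    if not include:
        return True  # no include filter: crawl everything
    return _matches(project, dataset, include)


include = {"my-project": ["analytics", "sales"]}
exclude = {"my-project": ["staging"]}
for name in ["analytics", "analytics_v2", "staging", "sales"]:
    print(name, dataset_selected("my-project", name, include, exclude))
```

Under these assumptions, `analytics_v2` is not selected because the anchored `analytics` entry must match the full name, and a dataset listed in both filters is skipped.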
Workload Identity Federation
To crawl BigQuery using Workload Identity Federation (keyless) auth:
- Python
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.apps import BigqueryCrawler

client = AtlanClient()

response = (
    BigqueryCrawler(client)
    .workload_identity_federation(  # (1)
        project_id="my-project",
        connectivity="public",
    )
    .connection(
        name="production-bigquery",
        admin_roles=[client.role_cache.get_id_for_name("$admin")],
    )
    .include({"my-project": ["analytics"]})
    .run(name="bigquery-prod")
)
- Workload Identity Federation auth—no service-account key is stored. Provide any provider-specific values as additional keyword arguments.
Other metadata options
Beyond the options shown previously, the builder exposes the rest of the wizard's metadata toggles:
- Python
(
    BigqueryCrawler(client)
    .service_account(email=..., service_account_json=..., project_id=...)
    .connection(name="production-bigquery", admin_roles=[...])
    .import_tags(True)  # (1)
    .hidden_assets(True)  # (2)
    .custom_config('{"ignore-all-case": true}')  # (3)
    .run(name="bigquery-prod")
)
- Import tags from BigQuery into Atlan.
- Crawl hidden datasets.
- Switch advanced config to custom and supply a feature-flag JSON string.
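Because the custom config is passed as a JSON string, it may be less error-prone to build it from a Python dict than to write the string by hand. A small sketch using the standard `json` module follows. The variable names are illustrative, and `ignore-all-case` is simply the flag shown in the example above.

```python
import json

# Feature flags as a regular dict; Python's True becomes JSON's true.
flags = {"ignore-all-case": True}

custom_config_json = json.dumps(flags)
print(custom_config_json)
```

The resulting string can then be passed where the example above uses a hand-written literal.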
Preview the payload
Call .preview() instead of .create() / .run() to assemble and inspect the
inputs payload offline (no network call, secret redacted):
- Python
builder = (
    BigqueryCrawler(client)
    .service_account(email=..., service_account_json=..., project_id=...)
    .connection(name="production-bigquery", admin_roles=[...])
    .include({"my-project": ["analytics"]})
)
print(builder.preview())  # (1)
- Returns the full inputs dict the app submits, with the credential redacted, which is handy for review and testing.
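When using the preview in a test, it can be useful to assert that no secret material appears anywhere in the payload. The sketch below walks a nested structure looking for a given string. The helper name `contains_value` and the sample payload shape are hypothetical; the real preview layout may differ.

```python
from typing import Any


def contains_value(payload: Any, needle: str) -> bool:
    """Return True if `needle` occurs in any string key or value of `payload`."""
    if isinstance(payload, str):
        return needle in payload
    if isinstance(payload, dict):
        return any(
            contains_value(key, needle) or contains_value(value, needle)
            for key, value in payload.items()
        )
    if isinstance(payload, (list, tuple)):
        return any(contains_value(item, needle) for item in payload)
    return False  # numbers, booleans, None, and other types cannot hold the secret


# Hypothetical payload shape, for illustration only.
sample_preview = {
    "credential": {"service_account_json": "***"},
    "connection": {"name": "production-bigquery"},
    "include": [{"my-project": ["analytics"]}],
}
secret_fragment = "-----BEGIN PRIVATE KEY-----"
print(contains_value(sample_preview, secret_fragment))
```

In a real test, the first argument would be the dict returned by `builder.preview()` and the second a distinctive fragment of the key you supplied.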
Re-run with an existing credential
To create another workflow that reuses an already-vaulted credential (instead of vaulting a new one), pass its guid:
- Python
(
    BigqueryCrawler(client)
    .credential_guid("e49783c7-...")  # (1)
    .connection(name="production-bigquery", admin_roles=[...])
    .include({"my-project": ["analytics"]})
    .run(name="bigquery-prod-2")
)
- Reuses the vaulted credential by guid—no new secret is stored.