Generate GCS to BigQuery external table lineage
Atlan can generate lineage between BigQuery external tables and the upstream GCS objects they reference through the GCS → BigQuery External Table Lineage workflow. This lets you trace data from GCS files through to BigQuery tables in Atlan, so that impact analysis and discovery reflect the full path from storage to the data warehouse.
Prerequisites
Before you begin, make sure you have:
- Access to the GCS → BigQuery External Table Lineage workflow. You can verify this by searching for GCS → BigQuery External Table Lineage in the Atlan marketplace. If you don't have access, contact Atlan support or your Atlan customer team to request it.
- BigQuery tables and GCS objects already crawled and cataloged in Atlan. Lineage can't be generated for assets that don't exist in Atlan.
- A BigQuery connection in Atlan that contains the external tables you want to link to GCS.
Create lineage workflow
- Navigate to the bottom right of any screen and select Workflow.
- Select Marketplace from the top if you are creating a new workflow, or select Manage if you want to use an existing workflow.
- Select GCS → BigQuery External Table Lineage from the package list.
- Select Setup Workflow.
Configure connection
Specify the BigQuery connection that contains the external tables:
- For BigQuery connection, select the BigQuery connection in Atlan where the external tables are stored.
Configure filename transformation
If the file names stored in BigQuery metadata differ from the object names in Atlan (for example, due to encoding or path differences), use the regex fields to normalize them before matching:
- Regex to match characters to replace: A regular expression that matches characters to replace in the file's full name, excluding the bucket prefix.
- Regex with replacement characters: The replacement expression applied to matches from the preceding regex.
Leave both fields blank if no transformation is needed.
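To preview what a pair of regex values does before entering them in the workflow, you can try them locally. The following is a minimal sketch in Python, not the workflow's own implementation: the helper names `split_gcs_uri` and `normalize_object_name`, the sample URI, and the `%20` encoding case are all hypothetical, and it assumes the workflow's regex behavior is close to Python's `re.sub`, which this page does not confirm.

```python
import re


def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/file' into (bucket, object_name).

    Hypothetical helper: the transformation applies to the file's full
    name excluding the bucket prefix, so the bucket is separated first.
    """
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, object_name = uri[len("gs://"):].partition("/")
    return bucket, object_name


def normalize_object_name(object_name: str, match_regex: str, replacement: str) -> str:
    """Apply the two regex fields to an object name.

    An empty match_regex mirrors leaving both fields blank: the name is
    returned unchanged. Assumes re.sub-like semantics (an assumption).
    """
    if not match_regex:
        return object_name
    return re.sub(match_regex, replacement, object_name)


# Hypothetical example: BigQuery metadata holds a URL-encoded space,
# while the object cataloged in Atlan has a literal space.
bucket, name = split_gcs_uri("gs://sales-raw/exports/2024/orders%20eu.csv")
normalized = normalize_object_name(name, r"%20", " ")
print(bucket, "|", normalized)
```

If the normalized name printed here matches the object name shown in Atlan, the same two values are a reasonable starting point for the Regex to match characters to replace and Regex with replacement characters fields. Confirm the result by running the workflow and checking the lineage.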
Configure operation
Choose what to do when the workflow runs:
- Generate (default): Publish lineage between GCS objects and BigQuery external tables to Atlan.
- Delete: Remove lineage in Atlan that was previously created by this package. Only lineage generated by this package is deleted.
Configure source type
Choose how GCS objects are stored in Atlan:
- File (default): Use this when GCS objects are cataloged in Atlan as GCS objects. This is the standard case when GCS has been crawled with the GCS connector.
- Table: Use this when GCS objects are stored in Atlan as tables inside a GCS connection (where connectorName = gcs).
Run and verify workflow
- After completing the configuration, select Run to run the workflow once immediately, or select Schedule & Run to run it hourly, daily, weekly, or monthly.
- To verify lineage, open in Atlan a BigQuery external table that was in scope for the workflow and navigate to the Lineage tab. Confirm that the upstream GCS objects are linked to the table as expected.
Need help?
If you have any issues configuring or running the workflow, contact Atlan support.
See also
- Crawl GCS assets: Catalog GCS buckets and objects in Atlan.
- What does Atlan crawl from GCS: Reference for GCS assets and properties that Atlan catalogs.