Lineage and asset loader app
The Lineage and asset loader app creates lineage relationships between assets (and optionally creates the assets themselves) based on source-to-target mappings defined in a CSV file stored in Amazon S3. It supports both relational database and S3 object asset types, making it useful for tools that Atlan doesn't natively connect to or where lineage can't be extracted automatically. This reference provides complete configuration details for the Lineage and asset loader app.
Configuration
This section defines the fields required for workflow setup.
Workflow name
Specifies a unique and descriptive name to identify this workflow configuration in the Atlan interface. This name appears in the workflow list and helps distinguish it from other lineage workflows.
Example:
prod-etl-lineage-loader
Input
Specifies the method by which the app accesses the input CSV mapping file. S3 Bucket is the only supported option.
Authentication
Selects the AWS authentication method used to access the S3 bucket containing the mapping file.
- IAM user
- IAM role (role-based)
- IAM role (role delegation)
The IAM user method authenticates using an IAM user's access key and secret. Use this method when you manage a dedicated IAM user with an attached policy that grants access to the S3 bucket.
- AWS Access Key: Access key ID for the IAM user with permission to read from the S3 bucket.
- AWS Access Secret: Secret access key paired with the access key ID in the preceding field.
To set up IAM user authentication:
- Create an IAM user in the AWS Identity and Access Management console.
- On the Set permissions page, attach your S3 access policy to the user.
- After the user is created, copy the access key ID and secret access key for use in the workflow.
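This reference doesn't spell out the S3 access policy mentioned in the steps above. As a minimal sketch, assuming the workflow only needs to read the mapping file, a read-only policy document could be generated as follows. The function name, the bucket and key values, and the choice of `s3:ListBucket` and `s3:GetObject` are assumptions for illustration; confirm the exact permissions Atlan requires before attaching a policy.

```python
import json


def build_read_policy(bucket: str, key: str) -> dict:
    """Build a read-only IAM policy document for one object in one bucket.

    Assumption: reading the mapping file needs s3:ListBucket on the bucket
    and s3:GetObject on the object. This is illustrative only and is not
    taken from Atlan's documentation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{key}"],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name; the key reuses the example path from this page.
    policy = build_read_policy("my-mapping-bucket", "lineage/mappings/etl-to-warehouse.csv")
    print(json.dumps(policy, indent=2))
```

Scoping `s3:GetObject` to the single mapping key keeps the grant narrow; widen the resource to a prefix if several mapping files share the bucket.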
The IAM role (role-based) method authenticates by attaching your S3 access policy directly to the EC2 role that Atlan uses for its EKS cluster instances. No access key or secret is required.
This option requires a support request. Contact Atlan support to enable role-based authentication for your tenant.
No additional fields are required in the workflow for this authentication method.
The IAM role (role delegation) method authenticates using cross-account role delegation. Atlan's node instance role assumes a role in your AWS account to access the S3 bucket.
- AWS Role ARN: ARN of the role in your AWS account that Atlan assumes to access the S3 bucket. Leave empty when using role-identity-based access without delegation.
To set up role delegation:
- Contact Atlan support to get the ARN of the node instance role for your Atlan EKS cluster.
- Create a new IAM role in your AWS account and attach your S3 access policy.
- Add a trust relationship to the role using the following policy, replacing <atlan_nodeinstance_role_arn> with the ARN received from support:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "<atlan_nodeinstance_role_arn>"
          },
          "Action": "sts:AssumeRole",
          "Condition": {}
        }
      ]
    }

- Share the role name and your AWS account ID with Atlan support to complete the setup.
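The placeholder substitution in the trust policy is easy to get wrong when editing JSON by hand. The following sketch fills in the ARN programmatically and rejects a value that doesn't look like a role ARN. The function name and the ARN in the usage example are made up for illustration, and the format check is a loose sanity test rather than full ARN validation.

```python
import json


def build_trust_policy(atlan_role_arn: str) -> str:
    """Return the trust policy JSON with the placeholder ARN filled in."""
    # Loose sanity check only: catches an unreplaced placeholder or a typo,
    # but does not prove the ARN refers to an existing role.
    if not (atlan_role_arn.startswith("arn:") and ":role/" in atlan_role_arn):
        raise ValueError(f"not a role ARN: {atlan_role_arn!r}")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": atlan_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Made-up ARN; use the value received from Atlan support instead.
    print(build_trust_policy("arn:aws:iam::123456789012:role/example-node-instance-role"))
```

The printed JSON can be pasted into the role's trust relationship editor in the AWS console.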
S3 bucket settings
Defines the location of the mapping file in Amazon S3.
- S3 Bucket Name: Name of the S3 bucket where the input CSV file is stored.
- Mapping filename / key: Full key (path and filename) of the CSV mapping file within the bucket, including any prefix.
  Example:
  lineage/mappings/etl-to-warehouse.csv
- S3 Region: AWS region where the S3 bucket is located.
  Example:
  us-east-1
Connection settings
Defines how the workflow tracks the assets and lineage it creates.
Connection qualified name
Specifies the qualified name of the connection where the workflow creates lineage process assets. This connection must already exist in Atlan before running the workflow. It can't be created by the workflow itself.
Example:
default/custom-etl/1234567890
Name
Specifies the name of the custom metadata set the workflow uses to tag assets it creates or manages. If this custom metadata set doesn't already exist, the workflow creates it and locks it in the UI to prevent accidental modification.
Example:
ETL Lineage Loader
Instance name
Specifies the name of the custom metadata property within the set identified in Name that stores the workflow instance identity on each managed asset.
Example:
Loader Instance
Instance unique ID
Assigns a unique identifier stored on every asset and lineage process created by this workflow run. On subsequent runs, the workflow uses this ID to locate the assets it authored so that it can update their metadata or deprecate records no longer present in the mapping file.
Each workflow configuration must use a distinct Instance unique ID. Reusing an ID across configurations causes the workflow to incorrectly manage assets from other runs.
Example:
prod-etl-loader-v1
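Because a reused Instance unique ID causes one workflow to manage another workflow's assets, it can help to check for collisions before saving a new configuration. The sketch below assumes you keep your own inventory of workflow configurations as dictionaries; the keys `workflow_name` and `instance_unique_id` and the workflow names are hypothetical and aren't part of any Atlan API.

```python
from collections import defaultdict


def find_reused_instance_ids(configs: list[dict]) -> dict[str, list[str]]:
    """Map each Instance unique ID used more than once to the workflows using it."""
    by_id: dict[str, list[str]] = defaultdict(list)
    for cfg in configs:
        by_id[cfg["instance_unique_id"]].append(cfg["workflow_name"])
    # Keep only IDs shared by two or more workflow configurations.
    return {iid: names for iid, names in by_id.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical inventory of workflow configurations.
    configs = [
        {"workflow_name": "prod-etl-lineage-loader", "instance_unique_id": "prod-etl-loader-v1"},
        {"workflow_name": "staging-etl-lineage-loader", "instance_unique_id": "prod-etl-loader-v1"},
        {"workflow_name": "finance-lineage-loader", "instance_unique_id": "finance-loader-v1"},
    ]
    for iid, names in find_reused_instance_ids(configs).items():
        print(f"{iid} is reused by: {', '.join(names)}")
```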
Lineage and asset loader CSV file
The input CSV file defines the source-to-target mappings used to create lineage and assets. Each row represents one lineage relationship. Fields fall into four groups: source identifiers, target identifiers, asset creation controllers, and lineage metadata.
Each file supports only one source type and one target type.
Source identifiers
Regardless of source type, the following two fields are required in every row:
| Field | Description |
|---|---|
| SOURCE_TYPE | Asset type of the source. Use Table for relational database assets or S3 Object for S3 assets. |
| SOURCE_CONN | Qualified name of the connection where source assets reside or are created. |
Additional fields depend on the source asset type:
- Database
- S3
Use the following fields when source assets are relational database tables.
| Field | Description |
|---|---|
| SOURCE_DB | Name of the database containing the source table. |
| SOURCE_SCHEMA | Name of the schema containing the source table. |
| SOURCE_TABLE | Name of the source table. |
Example row (database source):
SOURCE_TYPE,SOURCE_CONN,SOURCE_DB,SOURCE_SCHEMA,SOURCE_TABLE
Table,default/snowflake/1234567890,ANALYTICS,PUBLIC,RAW_ORDERS
Use the following fields when source assets are S3 objects.
| Field | Description |
|---|---|
| SOURCE_BUCKET | Name of the S3 bucket containing the source object. |
| SOURCE_KEY | Key (path) of the source S3 object within the bucket. |
Example row (S3 source):
SOURCE_TYPE,SOURCE_CONN,SOURCE_BUCKET,SOURCE_KEY
S3 Object,default/s3/1234567890,my-data-bucket,raw/orders/2024.parquet
Target identifiers
Regardless of target type, the following two fields are required in every row:
| Field | Description |
|---|---|
| TARGET_TYPE | Asset type of the target. Use Table for relational database assets or S3 Object for S3 assets. |
| TARGET_CONN | Qualified name of the connection where target assets reside or are created. |
Additional fields depend on the target asset type:
- Database
- S3
Use the following fields when target assets are relational database tables.
| Field | Description |
|---|---|
| TARGET_DB | Name of the database containing the target table. |
| TARGET_SCHEMA | Name of the schema containing the target table. |
| TARGET_TABLE | Name of the target table. |
Use the following fields when target assets are S3 objects.
| Field | Description |
|---|---|
| TARGET_BUCKET | Name of the S3 bucket containing the target object. |
| TARGET_KEY | Key (path) of the target S3 object within the bucket. |
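To show how source and target identifiers combine in a single row, the following sketch renders a mapping file with a database source and an S3 target using Python's standard `csv` module. The column order, the target connection qualified name, and the target bucket and key are assumptions for illustration; this reference doesn't state whether column order matters.

```python
import csv
import io

# Columns for a Table source and an S3 Object target (order is an assumption).
FIELDS = [
    "SOURCE_TYPE", "SOURCE_CONN", "SOURCE_DB", "SOURCE_SCHEMA", "SOURCE_TABLE",
    "TARGET_TYPE", "TARGET_CONN", "TARGET_BUCKET", "TARGET_KEY",
]

# One lineage relationship per row; target values are made up.
ROWS = [
    {
        "SOURCE_TYPE": "Table",
        "SOURCE_CONN": "default/snowflake/1234567890",
        "SOURCE_DB": "ANALYTICS",
        "SOURCE_SCHEMA": "PUBLIC",
        "SOURCE_TABLE": "RAW_ORDERS",
        "TARGET_TYPE": "S3 Object",
        "TARGET_CONN": "default/s3/1234567890",
        "TARGET_BUCKET": "my-data-bucket",
        "TARGET_KEY": "exports/orders/2024.parquet",
    },
]


def render_mapping_csv(rows: list[dict]) -> str:
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_mapping_csv(ROWS), end="")
```

Using `csv.DictWriter` rather than string concatenation means values containing commas or quotes, such as S3 keys with unusual characters, are quoted correctly.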
Asset creation controllers
Controls whether the workflow creates source or target assets when they don't already exist in Atlan. Lineage is generated only for rows where both the source and target assets exist in Atlan (either pre-existing or created by the workflow in the same run).
| Field | Values | Description |
|---|---|---|
| CREATE_SOURCE_IF_NOT_EXISTS | TRUE / FALSE | When TRUE, the workflow creates the source asset if it doesn't exist. When FALSE, the source asset must already exist in Atlan for lineage to be generated on that row. |
| CREATE_TARGET_IF_NOT_EXISTS | TRUE / FALSE | When TRUE, the workflow creates the target asset if it doesn't exist. When FALSE, the target asset must already exist in Atlan for lineage to be generated on that row. |
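The rule described above can be expressed as a small predicate, which is handy for predicting which rows produce lineage before a run. This is a sketch of the documented rule, not the workflow's actual implementation. The function name is hypothetical, and treating a blank flag as FALSE and parsing flags case-insensitively are assumptions.

```python
def lineage_is_generated(row: dict, source_exists: bool, target_exists: bool) -> bool:
    """Return True when both ends of the row exist or get created in this run."""

    def flag(name: str) -> bool:
        # Assumption: a blank or missing value behaves like FALSE.
        return (row.get(name) or "").strip().upper() == "TRUE"

    source_ok = source_exists or flag("CREATE_SOURCE_IF_NOT_EXISTS")
    target_ok = target_exists or flag("CREATE_TARGET_IF_NOT_EXISTS")
    return source_ok and target_ok


if __name__ == "__main__":
    row = {"CREATE_SOURCE_IF_NOT_EXISTS": "FALSE", "CREATE_TARGET_IF_NOT_EXISTS": "TRUE"}
    # Source is missing and may not be created, so this row yields no lineage.
    print(lineage_is_generated(row, source_exists=False, target_exists=False))
    # Source already exists and the target is created, so lineage is generated.
    print(lineage_is_generated(row, source_exists=True, target_exists=False))
```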
Lineage metadata fields
Attaches descriptive metadata to the lineage process asset connecting each source-to-target pair.
| Field | Description |
|---|---|
| DESCRIPTION | Human-readable description saved on the lineage process asset. Updates on subsequent runs if the value changes in the mapping file. |
| EXPRESSION | SQL statement or expression saved on the lineage process asset. Updates on subsequent runs if the value changes in the mapping file. |
Both fields are optional. Rows without these values create lineage with no description or expression on the process asset.
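Before uploading a mapping file to S3, a quick local check can catch the most common problems: mixed source or target types within one file, and blank identifier values. The sketch below is a hypothetical pre-flight validator built on Python's standard `csv` module, not a tool shipped with the app. Treating the type-specific fields (for example SOURCE_DB or TARGET_KEY) as mandatory in every row is an assumption.

```python
import csv
import io

# Identifier suffixes per asset type, as listed in the tables above.
TYPE_FIELDS = {
    "Table": ["DB", "SCHEMA", "TABLE"],
    "S3 Object": ["BUCKET", "KEY"],
}


def validate_mapping(csv_text: str) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []
    for side in ("SOURCE", "TARGET"):
        # Each file supports only one source type and one target type.
        types = {(row.get(f"{side}_TYPE") or "").strip() for row in rows}
        if len(types) != 1:
            problems.append(f"{side}_TYPE must have exactly one value per file, found {sorted(types)}")
            continue
        (asset_type,) = types
        if asset_type not in TYPE_FIELDS:
            problems.append(f"unsupported {side}_TYPE: {asset_type!r}")
            continue
        required = [f"{side}_CONN"] + [f"{side}_{s}" for s in TYPE_FIELDS[asset_type]]
        # Data rows start on line 2 because line 1 is the header.
        for line_no, row in enumerate(rows, start=2):
            for field in required:
                if not (row.get(field) or "").strip():
                    problems.append(f"line {line_no}: missing {field}")
    return problems


if __name__ == "__main__":
    sample = (
        "SOURCE_TYPE,SOURCE_CONN,SOURCE_BUCKET,SOURCE_KEY,"
        "TARGET_TYPE,TARGET_CONN,TARGET_DB,TARGET_SCHEMA,TARGET_TABLE\n"
        "S3 Object,default/s3/1234567890,my-data-bucket,raw/orders/2024.parquet,"
        "Table,default/snowflake/1234567890,ANALYTICS,PUBLIC,\n"
    )
    for problem in validate_mapping(sample):
        print(problem)
```

The sample row leaves TARGET_TABLE blank on purpose, so running the script reports that field as missing on line 2.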
See also
- Load lineage and assets from CSV: Step-by-step guide for preparing the mapping file, configuring S3 access, and running the workflow.
- Lineage Builder: Reference for creating lineage from CSV uploaded directly or via object storage.
- Alert propagation: Reference for propagating alert metadata through lineage to downstream assets.