Load lineage and assets from CSV

This guide walks you through loading lineage (and optionally creating assets) from a CSV mapping file stored in Amazon S3. You use an IAM user with access keys so the workflow can read the file from your bucket.

Prerequisites

  • Access to the Lineage and asset loader app. Search for it in the Atlan marketplace; if you don't have access, contact Atlan support to request it.
  • An Atlan connection where lineage process assets live. Create the connection in Atlan before you run the workflow; the workflow doesn't create it.
  • A CSV file that follows the required format (one lineage relationship per row). For the exact columns and examples, see the Lineage and asset loader reference.

Prepare mapping file and S3 access

  1. Build your CSV with one row per source-to-target lineage relationship. Include at least Source Type, Source Conn, Target Type, Target Conn, Create Source If Not Exists, and Create Target If Not Exists. For database assets, include Source DB, Source Schema, Source Table and the matching Target columns. Use the Lineage and asset loader reference for the full schema and examples.

  2. Upload the CSV to an S3 bucket. Note the bucket name, the full path (key) to the file (for example, lineage/mappings/etl-to-warehouse.csv), and the AWS region (for example, us-east-1).

  3. Create an IAM user that can read from this bucket:

    • In the AWS Identity and Access Management console, create a new user.
    • On Set permissions, attach a policy that grants read access to your S3 bucket (and the object key prefix where the file lives).
    • After the user is created, create an access key and copy the Access key ID and Secret access key. You enter these in the workflow.
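
The preparation steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the loader's own tooling: the three Target DB, Target Schema, and Target Table column names are assumed to mirror the Source ones, the row values and the bucket name example-lineage-bucket are placeholders, and whether the workflow needs s3:ListBucket in addition to s3:GetObject is an assumption. Check the Lineage and asset loader reference for the accepted columns and values.

```python
import csv
import io
import json

# Column names quoted in step 1. The Target DB/Schema/Table names are an
# assumption (mirroring the Source columns); confirm them in the reference.
COLUMNS = [
    "Source Type", "Source Conn", "Source DB", "Source Schema", "Source Table",
    "Target Type", "Target Conn", "Target DB", "Target Schema", "Target Table",
    "Create Source If Not Exists", "Create Target If Not Exists",
]


def build_mapping_csv(rows):
    """Return CSV text with a header and one lineage relationship per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Columns missing from a row are written as empty cells.
        writer.writerow(row)
    return buffer.getvalue()


def read_only_policy(bucket, prefix):
    """Build an IAM policy document that only allows reads under a key prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the objects themselves.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
            {
                # Assumption: listing is allowed too, restricted to the prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }


# Placeholder row: every value here is illustrative, not a documented format.
example_rows = [{
    "Source Type": "Table", "Source Conn": "etl-source",
    "Source DB": "RAW", "Source Schema": "SALES", "Source Table": "ORDERS",
    "Target Type": "Table", "Target Conn": "warehouse",
    "Target DB": "ANALYTICS", "Target Schema": "SALES", "Target Table": "ORDERS",
    "Create Source If Not Exists": "false",
    "Create Target If Not Exists": "true",
}]

print(build_mapping_csv(example_rows))
print(json.dumps(read_only_policy("example-lineage-bucket", "lineage/mappings/"), indent=2))
```

Save the CSV text to a file and upload it to your bucket, then paste the printed JSON into the policy editor when you attach permissions to the IAM user. Scoping the policy to the prefix keeps the access key from reading unrelated objects in the same bucket.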

Configure and run workflow

  1. In Atlan, go to the homepage and click New workflow in the top navigation bar.

  2. Search for Lineage and asset loader and select Set up workflow.

  3. In Workflow name, enter a descriptive name (for example, prod-etl-lineage-loader).

  4. Under Input, select S3 Bucket. Enter:

    • AWS Access Key and AWS Access Secret (the Access key ID and Secret access key of the IAM user you created in step 3 of Prepare mapping file and S3 access).
    • S3 Bucket Name (the bucket that contains your CSV).
    • Mapping filename / key (the full path to the CSV, for example lineage/mappings/etl-to-warehouse.csv).
    • S3 Region (for example, us-east-1).

  5. In Connection QN, enter the qualified name of the connection where lineage process assets are created. You can copy this from the connection’s qualifiedName in Atlan.

  6. In Name, enter the name of the custom metadata set the workflow uses to tag managed assets. If it doesn’t exist, the workflow creates and locks it.

  7. In Instance name, enter the name of the custom metadata property that stores the workflow instance identity (for example, Loader Instance).

  8. In Instance unique ID, enter a value that is unique to this workflow configuration (for example, prod-etl-loader-v1). The workflow uses this value to find the assets it created and to deprecate those that are no longer in the mapping file.

    Each workflow configuration must use a different Instance unique ID. Reusing an ID across configurations can cause the workflow to change or remove assets from another run.

  9. Click Run to start the workflow.
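
If you maintain several loader workflows, a small pre-flight check can catch empty fields and reused Instance unique IDs before you click Run. The sketch below is hypothetical: LoaderConfig and check_configs are illustrative names, not part of Atlan, and the field names only mirror the form fields described above. Credentials are deliberately left out so they are not kept in scripts.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LoaderConfig:
    """Hypothetical record of the values entered in one workflow configuration."""
    workflow_name: str
    bucket: str
    key: str
    region: str
    connection_qn: str
    cm_set_name: str
    cm_property_name: str
    instance_unique_id: str


def check_configs(configs):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    first_owner = {}  # Instance unique ID -> first workflow name that used it
    for cfg in configs:
        # Every form field must be filled in.
        for field in fields(cfg):
            if not getattr(cfg, field.name).strip():
                problems.append(f"{cfg.workflow_name}: {field.name} is empty")
        # Each configuration needs its own Instance unique ID.
        owner = first_owner.get(cfg.instance_unique_id)
        if owner is None:
            first_owner[cfg.instance_unique_id] = cfg.workflow_name
        else:
            problems.append(
                f"{cfg.workflow_name}: Instance unique ID "
                f"{cfg.instance_unique_id!r} is already used by {owner}"
            )
    return problems


# Placeholder values; the connection qualified name is illustrative only.
configs = [
    LoaderConfig("prod-etl-lineage-loader", "example-lineage-bucket",
                 "lineage/mappings/etl-to-warehouse.csv", "us-east-1",
                 "default/snowflake/1234567890", "Lineage Loader",
                 "Loader Instance", "prod-etl-loader-v1"),
]
for problem in check_configs(configs):
    print(problem)
```

The duplicate check matters most: two configurations that share an ID can each treat the other's assets as their own and deprecate them.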

Verify results

After the workflow completes:

  • In Atlan, search for one of the target assets from your mapping file and open it.
  • On the asset, open the Lineage tab and confirm the expected upstream and downstream relationships.
  • If you used Create Target If Not Exists (or Create Source If Not Exists), check that the new assets appear in search under the right connection.
  • In the asset’s Custom metadata panel, confirm the Instance unique ID appears on the property you configured.

If lineage is missing for a row, confirm that both source and target assets exist and are active under their connections.
