Data Lineage shows how data has evolved through its lifecycle: where it came from and what happened to it along the way. It lets you trace a data asset back to the sources it was derived from and the transformation steps it has gone through.
The main objective of a good Data Lineage is to make the process of backtracking the data's origins as easy as possible.
Atlan's Data Lineage gives a clear picture of the entire flow of the data and makes it easy for the user to navigate through it.
Apart from providing all information about a data asset's source and creation, Atlan also shows other data assets that depend on the original asset and how they will be affected by changes. This works down to the column level and even includes widgets and dashboards from your BI tools.
🔍 Lets users independently track a data quality issue back to the source without bugging the engineering team
✅ Gives quick access to the logic of how a particular column or metric was created
❗ Foresees the list of data tables or columns that will be affected before making any schema changes
Data Lineage gives a clear visual demarcation between the source (or origin) asset and impacted assets. All source assets are shown using green connecting lines, and impacted assets using red connecting lines.
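The split between source assets and impacted assets can be pictured as two walks over a directed graph: one following edges upstream, the other downstream. The snippet below is a minimal sketch of that idea, not Atlan's implementation. The asset names and the `EDGES` list are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage edges: (source asset, asset derived from it).
EDGES = [
    ("raw.orders", "staging.orders"),
    ("raw.customers", "staging.customers"),
    ("staging.orders", "analytics.revenue"),
    ("staging.customers", "analytics.revenue"),
    ("analytics.revenue", "dashboard.sales"),
]


def build_maps(edges):
    """Return (upstream, downstream) adjacency maps for the edge list."""
    upstream, downstream = {}, {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
        upstream.setdefault(dst, []).append(src)
    return upstream, downstream


def _walk(start, neighbours):
    """Breadth-first walk that collects every asset reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def source_assets(asset, edges=EDGES):
    """Assets the given asset was derived from (the 'green' side)."""
    upstream, _ = build_maps(edges)
    return _walk(asset, upstream)


def impacted_assets(asset, edges=EDGES):
    """Assets that depend on the given asset (the 'red' side)."""
    _, downstream = build_maps(edges)
    return _walk(asset, downstream)
```

With these sample edges, `source_assets("staging.orders")` contains only `raw.orders`, while `impacted_assets("staging.orders")` contains `analytics.revenue` and `dashboard.sales`.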
You can click any asset in the Lineage diagram to check its details. The preview shows the source of the asset, its number of rows and columns, and its classification (if any). If you click on the asset name in the preview, you'll be redirected to that asset's lineage.
Just like asset details, you can also preview the code that is transforming the data asset. Just click on the circle to view the intermediary code.
When you hover over any particular asset, only the linked assets are shown. All other assets are blurred out. This helps you focus on the asset you want to learn more about.
If you want to see how an asset will impact other assets, you can do that right from this screen.
Click on the asset whose impacted assets you need.
Check the preview. An option to download the impacted assets will show up at the bottom.
Click on this option, and a file with the names and links of all your impacted assets will get downloaded.
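To make the contents of such a download concrete, here is a rough sketch that collects downstream assets and writes their names and links as CSV. The real file format and link structure are not specified here, so the `DOWNSTREAM` map, the column names, and the `https://example.com/assets/` base URL are all assumptions for illustration.

```python
import csv
import io

# Hypothetical downstream map: asset -> assets built directly from it.
DOWNSTREAM = {
    "staging.orders": ["analytics.revenue"],
    "analytics.revenue": ["dashboard.sales"],
}


def impacted(asset, downstream):
    """Collect every asset reachable downstream of the given asset."""
    found, stack = [], [asset]
    while stack:
        for child in downstream.get(stack.pop(), []):
            if child not in found:
                found.append(child)
                stack.append(child)
    return found


def impacted_csv(asset, downstream, base_url="https://example.com/assets/"):
    """Render impacted assets as CSV text with a name and a link per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "link"])  # assumed column names
    for name in impacted(asset, downstream):
        writer.writerow([name, base_url + name])  # placeholder link scheme
    return buf.getvalue()
```

Calling `impacted_csv("staging.orders", DOWNSTREAM)` yields a header row followed by one row each for `analytics.revenue` and `dashboard.sales`.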
You can expand the Lineage view to full screen and zoom in or out as needed.
Atlan also shows how an asset indirectly impacts other assets. For example, say you are filtering a table by the column profit to show the most profitable products. Here, the column profit is indirectly impacting the entire output table.
If a classification is attached to an asset, it will flow down to all the secondary assets that are created using the original asset. For example, if the column Customer Name is tagged as PII, then all the columns created using Customer Name will automatically be tagged as PII too. This reduces a lot of manual work and helps keep your data secure.
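The way a classification flows downstream can be sketched as repeatedly copying tags from parent columns to the columns derived from them until nothing changes. This is only an illustration of the idea under assumed data. The column names and the `DERIVED_FROM` map are hypothetical, and Atlan's actual propagation logic may differ.

```python
# Hypothetical column-level lineage: derived column -> columns it was built from.
DERIVED_FROM = {
    "crm.customer_name_upper": ["crm.customer_name"],
    "mart.customer_label": ["crm.customer_name_upper", "crm.region"],
    "mart.region_code": ["crm.region"],
}


def propagate(tags, derived_from):
    """Copy classifications from parent columns to derived columns.

    `tags` maps a column to the set of classifications attached directly.
    The loop repeats until no column gains a new classification, so tags
    travel through any number of intermediate columns.
    """
    result = {col: set(found) for col, found in tags.items()}
    changed = True
    while changed:
        changed = False
        for col, parents in derived_from.items():
            inherited = set()
            for parent in parents:
                inherited |= result.get(parent, set())
            current = result.setdefault(col, set())
            if not inherited <= current:
                current |= inherited
                changed = True
    return result
```

Starting from `{"crm.customer_name": {"PII"}}`, the result tags both `crm.customer_name_upper` and `mart.customer_label` as PII, while `mart.region_code` stays untagged because none of its parents carry the classification.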
Your Lineage set-up depends on your organization's use case and infrastructure. We’ll work with you and your team to automate the process and help set up Lineage.
Here are some of the sources we support.
By parsing the SQL queries from query engines like Presto, or from relational databases and warehouses like PostgreSQL and Snowflake, Atlan can automatically create all the relationships among the tables and columns. Atlan can either pull in the SQL queries from the history logs or pick them up from cloud storage.
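As a toy illustration of deriving relationships from SQL, the sketch below pulls table-level lineage out of a single statement with regular expressions. It is deliberately naive and is not how a production parser works: it ignores subqueries, CTEs, quoted identifiers, `IF NOT EXISTS`, and column-level detail. The function name and the sample query are made up for this example.

```python
import re

# Matches the table being written to in the two statement shapes we handle.
TARGET = re.compile(r"\b(?:insert\s+into|create\s+table)\s+([\w.]+)", re.IGNORECASE)
# Matches each table that is read from.
SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql):
    """Return sorted (source_table, target_table) pairs for one statement."""
    target = TARGET.search(sql)
    if target is None:
        return []  # a plain SELECT writes nothing, so there is no edge to record
    written = target.group(1)
    sources = sorted(set(SOURCE.findall(sql)))
    return [(src, written) for src in sources if src != written]


QUERY = (
    "CREATE TABLE analytics.revenue AS "
    "SELECT o.id, c.name FROM staging.orders o "
    "JOIN staging.customers c ON o.cid = c.id"
)
```

For `QUERY`, `table_lineage` returns two edges, one from `staging.customers` and one from `staging.orders`, both pointing at `analytics.revenue`.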
Atlan can use the information provided by your BI tool (like Tableau, Sisense or Redash) to create the Lineage, connecting the widgets and dashboards to the source tables and columns that were used to create them.
For sources that Atlan doesn’t support just yet, we won't make you manually create the Lineage for your organization. Atlan's flexible REST APIs can be used to push the relationships to Atlan and see Lineage come to life. This can be very helpful for creating Lineage from ETL tools like Airflow.
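The general shape of pushing a relationship over REST might look like the sketch below, which describes one ETL step as a JSON payload and prepares a POST request for it. The endpoint URL, the payload field names, and the bearer-token header are placeholders invented for illustration, not Atlan's actual API. Consult the API documentation for the real contract. The request is only constructed here and never sent.

```python
import json
import urllib.request


def lineage_payload(process_name, inputs, outputs):
    """Describe one ETL step: which assets it reads and which it writes.

    The field names are hypothetical, chosen only for this example.
    """
    return {
        "process": process_name,
        "inputs": sorted(inputs),
        "outputs": sorted(outputs),
    }


def build_request(payload, url="https://example.com/api/lineage", token="YOUR_TOKEN"):
    """Prepare (but do not send) a JSON POST request to a placeholder endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


# Example: an Airflow-style task that builds a revenue table from two inputs.
payload = lineage_payload(
    "load_revenue",
    inputs=["staging.orders", "staging.customers"],
    outputs=["analytics.revenue"],
)
request = build_request(payload)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` once the real endpoint and credentials are known.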
Seeing how data flows across different sources in your organization can be very powerful and save a lot of time spent on internal communication and debugging.
At Atlan, we understand that end-to-end Lineage can be complex and dependent on your data architecture. We'd be happy to discuss your infrastructure and see how a Lineage can be created for your sources.