
Traverse lineage

Retrieve lineage

To fetch lineage, you need to request lineage from Atlan from a particular starting point:

Retrieve lineage
FluentLineage.builder(client,
        "495b1516-aaaf-4390-8cfd-b11ade7a7799") // (1)
    .depth(1000000) // (2)
    .direction(AtlanLineageDirection.DOWNSTREAM) // (3)
    .pageSize(10) // (4)
    .includeOnResults(Asset.NAME) // (5)
    .immediateNeighbors(true) // (6)
    .stream() // (7)
    .forEach(result -> { // (8)
        // Do something with the result
    });
  1. Build a request for lineage with the starting point for your lineage retrieval (the GUID of an asset). If you already have an asset, you can also instead run requestLineage() on the asset to directly build the same request.
  2. You can specify how far you want lineage to be fetched using depth(). A depth of 1 will only fetch immediate upstream and downstream assets, while 2 will also fetch the immediate upstream and downstream assets of those assets, and so on. The default value of 1000000 will fetch upstream and downstream assets up to 1,000,000 hops away (basically all lineage).
  3. You can fetch only upstream assets or only downstream assets. In the list API, you can't access both directions at the same time.
  4. You can specify how many results to include per page of results (defaults to 10).
  5. You can also specify any extra attributes you want to include in each asset in the resulting list.
  6. To include details about which asset is upstream and downstream of which other asset, set immediateNeighbors to true. (Without this, all downstream assets will be listed in breadth-first order, but you won't know specifically which asset is downstream of which other asset.)
  7. You can then directly stream the results from the request. These will be lazily-fetched and paged automatically.
  8. A normal Java Stream is created, so you can apply any stream-based operations to it (filtering, mapping, collecting, or doing something for each result as in this example).
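To make the lazy paging in steps 7 and 8 concrete, here is a minimal, self-contained sketch. It doesn't use the Atlan SDK: PagedSource, fetchPage() and pagesFetched() are hypothetical names invented for illustration, and the "server" is just a range of integers. It only shows the general pattern of a stream that requests the next page when the current one is exhausted, so a short-circuiting operation like limit() stops further page requests.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Illustration only: a hypothetical paged source, not part of the Atlan SDK. */
class LazyPagingSketch {

    /** Serves the integers 0..total-1 in pages and counts how many pages were requested. */
    static class PagedSource {
        private final int total;
        private final int pageSize;
        private int pagesFetched = 0;

        PagedSource(int total, int pageSize) {
            this.total = total;
            this.pageSize = pageSize;
        }

        /** Simulates one round trip to a server. */
        private List<Integer> fetchPage(int offset) {
            pagesFetched++;
            List<Integer> page = new ArrayList<>();
            for (int i = offset; i < Math.min(offset + pageSize, total); i++) {
                page.add(i);
            }
            return page;
        }

        int pagesFetched() {
            return pagesFetched;
        }

        /** Streams every result, requesting the next page only once the current one is used up. */
        Stream<Integer> stream() {
            Iterator<Integer> iterator = new Iterator<Integer>() {
                private List<Integer> page = new ArrayList<>();
                private int indexInPage = 0;
                private int nextOffset = 0;

                @Override
                public boolean hasNext() {
                    if (indexInPage < page.size()) return true;
                    if (nextOffset >= total) return false;
                    page = fetchPage(nextOffset);
                    nextOffset += page.size();
                    indexInPage = 0;
                    return !page.isEmpty();
                }

                @Override
                public Integer next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return page.get(indexInPage++);
                }
            };
            return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED), false);
        }
    }

    public static void main(String[] args) {
        PagedSource source = new PagedSource(35, 10);
        List<Integer> firstFifteen = source.stream().limit(15).collect(Collectors.toList());
        System.out.println(firstFifteen.size() + " results from " + source.pagesFetched() + " page(s)");
    }
}
```

In this sketch, limiting a 35-item source to 15 results with a page size of 10 only needs the first two pages; the remaining pages are never requested.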

Traverse lineage

The new lineage list API returns results in breadth-first order. So you can traverse the lineage by progressing through the result list in the order they're returned, even across multiple pages of results.
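As a rough illustration of what breadth-first order and a depth limit mean, the following self-contained sketch walks a small in-memory graph. It isn't SDK code: the graph, the asset names and the breadthFirst() helper are assumptions made up for this example, and leaving the starting point out of the output is a choice of the sketch rather than a statement about the API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustration only: breadth-first traversal of a toy graph with a hop limit. */
class BreadthFirstSketch {

    /** Returns the nodes reachable from start within maxDepth hops, nearest first (start excluded). */
    static List<String> breadthFirst(Map<String, List<String>> downstream, String start, int maxDepth) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(start);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        // Each pass of this loop moves exactly one hop further from the start.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String current : frontier) {
                for (String neighbor : downstream.getOrDefault(current, Collections.emptyList())) {
                    if (seen.add(neighbor)) { // skip anything already visited
                        order.add(neighbor);
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        return order;
    }

    /** A made-up lineage graph: raw feeds two staging tables, which feed a mart and an export. */
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("raw", Arrays.asList("staging_a", "staging_b"));
        graph.put("staging_a", Arrays.asList("mart"));
        graph.put("staging_b", Arrays.asList("mart", "export"));
        graph.put("mart", Arrays.asList("dashboard"));
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(breadthFirst(sampleGraph(), "raw", 1));
        System.out.println(breadthFirst(sampleGraph(), "raw", 1000000));
    }
}
```

With a depth of 1 only the two staging tables are returned; with a very large depth every reachable node is returned, nearest hops first.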

Downstream assets

To traverse downstream assets in lineage:

Traverse downstream lineage
FluentLineage.builder(client,
        "495b1516-aaaf-4390-8cfd-b11ade7a7799") // (1)
    .direction(AtlanLineageDirection.DOWNSTREAM) // (2)
    .immediateNeighbors(true) // (3)
    .stream() // (4)
    .filter(a -> !(a instanceof ILineageProcess)) // (5)
    .limit(100) // (6)
    .forEach(result -> { // (7)
        // Do something with each result
        for (LineageRef ref : result.getImmediateDownstream()) { // (8)
            String guid = ref.getGuid(); // (9)
        }
    });
  1. Specify the GUID of an asset for the starting point. (Or from an asset itself, use requestLineage() to start the same builder.)
  2. Request the DOWNSTREAM direction.
  3. If you want to understand specifically which assets are downstream from which other assets, set immediateNeighbors to true.
  4. You can then stream the results from the request. The pages will be lazily-fetched in the background, as-needed.
  5. With streams, you can apply additional filters over the results (in this example any processes in the results will be ignored).
  6. With streams, you can also limit the total number of results you want to process—independently from page size of retrievals. With lazy-fetching of the results, this will make sure you only retrieve the number of pages required to complete the stream.
  7. Of course, you still need to actually do something with those remaining results.
  8. If immediateNeighbors is true, each asset will have a list of downstream lineage references populated in .getImmediateDownstream().
  9. You can, for example, retrieve the GUID of each of these downstream references to see the assets that are immediately downstream from the asset you are iterating through in the lineage results.
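One thing you might do with the immediate downstream references from steps 8 and 9 is turn the flat result list into explicit edges. The sketch below is self-contained and uses a hypothetical Result class (a GUID plus the GUIDs of its immediate downstream neighbors) in place of the SDK's asset and LineageRef types, so treat the names as assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: turning per-result neighbor lists into explicit edges. */
class DownstreamEdgesSketch {

    /** Hypothetical stand-in for one lineage result; not an Atlan SDK type. */
    static class Result {
        final String guid;
        final List<String> immediateDownstream;

        Result(String guid, String... immediateDownstream) {
            this.guid = guid;
            this.immediateDownstream = Arrays.asList(immediateDownstream);
        }
    }

    /** Produces one "source -> target" entry per immediate downstream reference. */
    static List<String> edges(List<Result> results) {
        List<String> edges = new ArrayList<>();
        for (Result result : results) {
            for (String target : result.immediateDownstream) {
                edges.add(result.guid + " -> " + target);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Result> results = Arrays.asList(
                new Result("table-guid", "view-guid", "extract-guid"),
                new Result("view-guid", "dashboard-guid"),
                new Result("extract-guid"),
                new Result("dashboard-guid"));
        edges(results).forEach(System.out::println);
    }
}
```

Because results arrive in breadth-first order, the edges in this sketch also come out nearest-first.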

Upstream assets

To traverse upstream assets in lineage:

Traverse upstream lineage
FluentLineage.builder(client,
        "495b1516-aaaf-4390-8cfd-b11ade7a7799") // (1)
    .direction(AtlanLineageDirection.UPSTREAM) // (2)
    .immediateNeighbors(true) // (3)
    .stream() // (4)
    .filter(a -> !(a instanceof ILineageProcess)) // (5)
    .limit(100) // (6)
    .forEach(result -> { // (7)
        // Do something with each result
        for (LineageRef ref : result.getImmediateUpstream()) { // (8)
            String guid = ref.getGuid(); // (9)
        }
    });
  1. Specify the GUID of an asset for the starting point. (Or from an asset itself, use requestLineage() to start the same builder.)
  2. Request the UPSTREAM direction.
  3. If you want to understand specifically which assets are upstream from which other assets, set immediateNeighbors to true.
  4. You can then stream the results from the request. The pages will be lazily-fetched in the background, as-needed.
  5. With streams, you can apply additional filters over the results (in this example any processes in the results will be ignored).
  6. With streams, you can also limit the total number of results you want to process—independently from page size of retrievals. With lazy-fetching of the results, this will make sure you only retrieve the number of pages required to complete the stream.
  7. Of course, you still need to actually do something with those remaining results.
  8. If immediateNeighbors is true, each asset will have a list of upstream lineage references populated in .getImmediateUpstream().
  9. You can, for example, retrieve the GUID of each of these upstream references to see the assets that are immediately upstream from the asset you are iterating through in the lineage results.
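A common use of the immediate upstream references is finding the original sources of an asset, meaning the results that have no upstream references of their own. The following self-contained sketch shows the idea with a hypothetical Result class rather than SDK types; how processes and partially-fetched lineage would affect this in a real tenant is not covered here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Illustration only: picking out results that have nothing upstream of them. */
class UpstreamSourcesSketch {

    /** Hypothetical stand-in for one lineage result; not an Atlan SDK type. */
    static class Result {
        final String guid;
        final List<String> immediateUpstream;

        Result(String guid, String... immediateUpstream) {
            this.guid = guid;
            this.immediateUpstream = Arrays.asList(immediateUpstream);
        }
    }

    /** Returns the GUIDs of results with an empty list of immediate upstream references. */
    static List<String> sources(List<Result> results) {
        return results.stream()
                .filter(result -> result.immediateUpstream.isEmpty())
                .map(result -> result.guid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Result> results = Arrays.asList(
                new Result("staging-guid", "raw-a-guid", "raw-b-guid"),
                new Result("raw-a-guid"),
                new Result("raw-b-guid"));
        System.out.println(sources(results));
    }
}
```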

Filter lineage

You can also filter the information fetched through lineage. This can help improve performance of your code by limiting the results it will fetch to only those you require.

Retrieve only active assets

In most cases for lineage you only care about active assets. By filtering to only active assets, you can improve the performance of lineage retrieval by as much as 10x. (The new FluentLineage interface will do this automatically, unless you explicitly request the inclusion of archived assets.)

Not possible to filter by custom metadata

You currently can't filter lineage based on the values of custom metadata.

Limit assets in response

You can limit the assets you will see in the response through entity filters. These restrict what assets will be included in the results, but still traverse all of the lineage:

Limit assets in response
List<Asset> verifiedAssets = Asset.lineage(client, "495b1516-aaaf-4390-8cfd-b11ade7a7799") // (1)
    .direction(AtlanLineageDirection.UPSTREAM)
    .includeInResults(Asset.CERTIFICATE_STATUS.inLineage.eq(CertificateStatus.VERIFIED)) // (2)
    .includesCondition(FilterList.Condition.AND) // (3)
    .stream() // (4)
    .collect(Collectors.toList()); // (5)
  1. Build the request as you would above, or request it directly from an asset. Because this operation will directly request lineage for the asset in Atlan, you must provide it an AtlanClient through which to connect to the tenant.

  2. Add one or more includeInResults to the request before sending it to Atlan. Each of these defines a criterion that an asset must meet to be included in the results. In this example, only assets with a verified certificate will be included. The criterion itself is composed of:

    • The field by which you want to filter (Asset.CERTIFICATE_STATUS in this example).
    • A fixed member within that field that builds lineage filters, called .inLineage.
    • The operator you want to use to compare values for that field in order to determine whether or not an asset matches (.eq() in this example).
    • The value you want to compare against using that operator (CertificateStatus.VERIFIED in this example).
  3. Optionally, you can use includesCondition in your lineage request to specify whether the includeInResults criteria should be combined with AND (default) or if any matching is sufficient (OR).

  4. When you then stream the results, only those assets that match the filter criteria will be included in the response.

  5. You can then collect them (standard stream operation) to give a complete list, across pages, of those assets that match the criteria.
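The difference between combining criteria with AND versus OR (step 3) can be sketched with plain Java predicates. This is a self-contained illustration, not the SDK's implementation: the Item class, the Condition enum and the combine() helper are all hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Illustration only: combining several criteria with AND or OR. */
class FilterConditionSketch {

    /** Hypothetical equivalent of an AND / OR switch. */
    enum Condition { AND, OR }

    /** Hypothetical asset with two properties to filter on. */
    static class Item {
        final String name;
        final boolean verified;
        final boolean table;

        Item(String name, boolean verified, boolean table) {
            this.name = name;
            this.verified = verified;
            this.table = table;
        }
    }

    /** AND requires every criterion to match; OR requires at least one. */
    static <T> Predicate<T> combine(List<Predicate<T>> criteria, Condition condition) {
        return item -> condition == Condition.AND
                ? criteria.stream().allMatch(criterion -> criterion.test(item))
                : criteria.stream().anyMatch(criterion -> criterion.test(item));
    }

    /** Returns the names of the items that satisfy "verified" and/or "is a table". */
    static List<String> select(List<Item> items, Condition condition) {
        Predicate<Item> isVerified = item -> item.verified;
        Predicate<Item> isTable = item -> item.table;
        Predicate<Item> combined = combine(Arrays.asList(isVerified, isTable), condition);
        return items.stream().filter(combined).map(item -> item.name).collect(Collectors.toList());
    }

    static List<Item> sampleItems() {
        return Arrays.asList(
                new Item("orders", true, true),
                new Item("report", true, false),
                new Item("scratch", false, true),
                new Item("tmp", false, false));
    }

    public static void main(String[] args) {
        System.out.println("AND: " + select(sampleItems(), Condition.AND));
        System.out.println("OR:  " + select(sampleItems(), Condition.OR));
    }
}
```

With these sample items, AND keeps only the verified table, while OR keeps everything that is either verified or a table.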

Limit lineage traversal

You can also limit how much of the lineage is traversed. You can do this both at an asset-level and a relationship-level:

Limit lineage traversal
List<Asset> activeAssets = Asset.lineage(client, "495b1516-aaaf-4390-8cfd-b11ade7a7799")
    .direction(AtlanLineageDirection.DOWNSTREAM)
    .whereAsset(FluentLineage.ACTIVE) // (1)
    .assetsCondition(FilterList.Condition.AND) // (2)
    .whereRelationship(FluentLineage.ACTIVE)
    .relationshipsCondition(FilterList.Condition.AND) // (3)
    .stream() // (4)
    .collect(Collectors.toList()); // (5)
  1. Provide your conditions to the whereAsset and whereRelationship of the request. This will make sure that once an asset (or relationship) is found in lineage traversal that does not match the conditions, further lineage traversal beyond that asset (or relationship) won't be done.

    In this example, that means that once we hit an archived or soft-deleted asset (or relationship) in the lineage, we won't look for any further downstream lineage from that archived or soft-deleted asset (or relationship). (In other words, we will limit the lineage results to only active assets by short-circuiting traversal when we hit an archived or soft-deleted asset or relationship.)

    FluentLineage.ACTIVE constant

    Note that the FluentLineage.ACTIVE example here is a predefined filter constant. If you look at its code, it's equivalent to writing any other lineage filter:

    Asset.STATUS.inLineage.eq(AtlanStatus.ACTIVE)

    When you request lineage directly on an asset, as in the example above, by default only active assets and relationships are included. (In other words, the filters by FluentLineage.ACTIVE are applied by default when using the Asset.lineage() request style.)

  2. Optionally, you can use assetsCondition in your lineage request to specify whether the whereAsset criteria should be combined with AND (default) or if any matching is sufficient (OR).

  3. Optionally, you can use relationshipsCondition in your lineage request to specify whether the whereRelationship criteria should be combined with AND (default) or if any matching is sufficient (OR).

  4. When you then fetch the results and iterate through them, not only are those assets that match the filter criteria the only ones included in the response, but the traversal is likely to run significantly faster as well by entirely skipping any further downstream traversal through the assets that don't match.

  5. You can continue to process the results from there as you would with any stream: filtering, mapping, running something for each result, or in this example collecting them into a list.
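To contrast limiting the response (entity filters) with limiting the traversal, here is a self-contained sketch over a toy graph. It is not how Atlan implements lineage: the traverse() helper, the graph and the set of "active" nodes are assumptions for illustration, and the sketch assumes a non-matching asset is itself left out of the results in both modes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustration only: filtering results versus pruning traversal on a toy graph. */
class TraversalLimitSketch {

    /**
     * Breadth-first walk from start. Only nodes in "active" are returned.
     * With prune = false, inactive nodes are still walked through (results filtered only).
     * With prune = true, the walk stops at inactive nodes (traversal limited).
     */
    static List<String> traverse(Map<String, List<String>> downstream, Set<String> active,
                                 String start, boolean prune) {
        List<String> results = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(start);
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : downstream.getOrDefault(current, Collections.emptyList())) {
                if (!seen.add(next)) continue; // already visited
                boolean matches = active.contains(next);
                if (matches) results.add(next);
                if (matches || !prune) queue.add(next); // pruning never looks beyond a non-match
            }
        }
        return results;
    }

    /** a -> b -> c and a -> d -> e, where b is archived. */
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b", "d"));
        graph.put("b", Arrays.asList("c"));
        graph.put("d", Arrays.asList("e"));
        return graph;
    }

    static Set<String> sampleActive() {
        return new HashSet<>(Arrays.asList("a", "c", "d", "e"));
    }

    public static void main(String[] args) {
        System.out.println("filter only: " + traverse(sampleGraph(), sampleActive(), "a", false));
        System.out.println("pruned:      " + traverse(sampleGraph(), sampleActive(), "a", true));
    }
}
```

In the filter-only mode, c is still reached through the archived node b; with pruning, the walk never goes past b, so c is not visited at all. That skipped work is where the speed-up described above would come from.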

Limit asset details

You can also limit the details for each asset returned by lineage:

Limit asset details
LineageListRequest request = Asset.lineage(client, "495b1516-aaaf-4390-8cfd-b11ade7a7799")
    .direction(AtlanLineageDirection.DOWNSTREAM)
    .includeOnResults(Asset.DESCRIPTION) // (1)
    .toRequestBuilder() // (2)
    .excludeAtlanTags(false) // (3)
    .excludeMeanings(false) // (4)
    .build();
List<Asset> withTagsAndTerms = request.fetch(client) // (5)
    .stream() // (6)
    .collect(Collectors.toList());
  1. Build the request as above, but chain as many includeOnResults as you like to specify the attributes you want to include on each asset in the lineage.
  2. You can also decide whether to include or exclude Atlan tags and assigned business terms, but to do this you must first convert the fluent lineage request into a LineageListRequest. You can do this by chaining toRequestBuilder().
  3. You can then use excludeAtlanTags(false) to make sure that Atlan tags are included on each asset in lineage.
  4. You can also use excludeMeanings(false) to make sure that assigned business terms are included on each asset in lineage.
  5. You then need to call fetch() on the LineageListRequest to actually run the lineage request. Because this operation will directly request lineage for the asset in Atlan, you must provide it an AtlanClient through which to connect to the tenant.
  6. You can then stream and further transform or collect the results from the request, directly.

Original API

Deprecated and removed

The original lineage API was previously deprecated, and now no longer exists in the latest releases of the SDKs. It's slower, doesn't support paging, and won't receive any enhancements. We would therefore strongly recommend using the newer API (described above); however, the original API is described here for completeness.

Retrieve lineage (deprecated)

To fetch lineage, you need to request lineage from Atlan from a particular starting point:

Java

Retrieve lineage
LineageRequest request = LineageRequest.builder() // (1)
    .guid("495b1516-aaaf-4390-8cfd-b11ade7a7799") // (2)
    .depth(0) // (3)
    .direction(AtlanLineageDirection.BOTH) // (4)
    .hideProcess(true) // (5)
    .allowDeletedProcess(false) // (6)
    .build(); // (7)
LineageResponse response = request.fetch(); // (8)
  1. Build a LineageRequest to specify the starting point for your lineage retrieval.

  2. The starting point for lineage must be the GUID of an asset.

  3. You can specify how far you want lineage to be fetched using depth(). A depth of 1 will only fetch immediate upstream and downstream assets, while 2 will also fetch the immediate upstream and downstream assets of those assets, and so on. The default value of 0 will fetch all upstream and downstream assets.

    If you expect extensive lineage, change the default!

    The default value of 0 can result in a long-running API call with a very large response payload. If you expect your lineage to be extensive, you may want to try smaller depths first.

  4. You can fetch only upstream assets, only downstream assets, or lineage in both directions.

  5. Decide whether to include processes in the response.

    Use true if you want to use the SDK's traversal helpers

    Currently the SDK's traversal logic only works when this is set to true. Unless you want to code your own traversal logic, set hideProcess to true.

  6. If allowDeletedProcess is set to true and hideProcess is set to false then deleted (archived) processes will also be included in the response.

  7. Build the request.

  8. Call the fetch() method to actually retrieve the lineage details from Atlan.

Python

Retrieve lineage
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.enums import LineageDirection
from pyatlan.model.lineage import LineageRequest

client = AtlanClient()
request = LineageRequest(  # (1)
    guid="495b1516-aaaf-4390-8cfd-b11ade7a7799",  # (2)
    depth=0,  # (3)
    direction=LineageDirection.BOTH,  # (4)
    hide_process=True,  # (5)
    allow_deleted_process=False,  # (6)
)
response = client.asset.get_lineage(request) # (7)
  1. Build a LineageRequest to specify the starting point for your lineage retrieval.

  2. The starting point for lineage must be the GUID of an asset.

  3. You can specify how far you want lineage to be fetched using depth. A depth of 1 will only fetch immediate upstream and downstream assets, while 2 will also fetch the immediate upstream and downstream assets of those assets, and so on. The default value of 0 will fetch all upstream and downstream assets.

    If you expect extensive lineage, change the default!

    The default value of 0 can result in a long-running API call with a very large response payload. If you expect your lineage to be extensive, you may want to try smaller depths first.

  4. You can fetch only upstream assets, only downstream assets, or lineage in both directions.

  5. Decide whether to include processes in the response.

    Use True if you want to use the SDK's traversal helpers

    Currently the SDK's traversal logic only works when this is set to True. Unless you want to code your own traversal logic, set hide_process to True.

  6. If allow_deleted_process is set to True and hide_process is set to False then deleted (archived) processes will also be included in the response.

  7. Call the asset.get_lineage() method to actually retrieve the lineage details from Atlan.

Raw REST API

POST /api/meta/lineage/getlineage
{
"guid": "495b1516-aaaf-4390-8cfd-b11ade7a7799", // (1)
"depth": 0, // (2)
"direction": "BOTH", // (3)
"hideProcess": true, // (4)
"allowDeletedProcess": false // (5)
}
  1. The starting point for lineage must be the GUID of an asset.

  2. You can specify how far you want lineage to be fetched using depth. A depth of 1 will only fetch immediate upstream and downstream assets, while 2 will also fetch the immediate upstream and downstream assets of those assets, and so on. The default value of 0 will fetch all upstream and downstream assets.

    If you expect extensive lineage, change the default!

    The default value of 0 can result in a long-running API call with a very large response payload. If you expect your lineage to be extensive, you may want to try smaller depths first.

  3. You can fetch only upstream assets, only downstream assets, or lineage in both directions.

  4. Decide whether to include processes in the response.

  5. If allowDeletedProcess is set to true and hideProcess is set to false then deleted (archived) processes will also be included in the response.

Traverse lineage (deprecated)

To assist with traversal of the lineage, the SDK provides some helper methods.

Downstream assets (deprecated)

To retrieve assets immediately downstream from the originally-requested asset:

Java

Retrieve downstream assets
Set<String> downstreamGuids = response.getDownstreamAssetGuids(); // (1)
List<Asset> downstreamAssets = response.getDownstreamAssets(); // (2)
Set<String> downstreamProcessGuids = response.getDownstreamProcessGuids(); // (3)
  1. The getDownstreamAssetGuids() method will return the GUIDs of assets that are immediately downstream.
  2. The getDownstreamAssets() method will return the asset objects for the assets that are immediately downstream.
  3. The getDownstreamProcessGuids() method will return the GUIDs of the processes that run immediately downstream.

Python

Retrieve downstream assets
downstream_guids = response.get_downstream_asset_guids() # (1)
downstream_assets = response.get_downstream_assets() # (2)
downstream_process_guids = response.get_downstream_process_guids() # (3)
  1. The get_downstream_asset_guids() method will return the GUIDs of assets that are immediately downstream.
  2. The get_downstream_assets() method will return the asset objects for the assets that are immediately downstream.
  3. The get_downstream_process_guids() method will return the GUIDs of the processes that run immediately downstream.

Raw REST API

POST /api/meta/lineage/getlineage
{
"guid": "495b1516-aaaf-4390-8cfd-b11ade7a7799", // (1)
"depth": 1, // (2)
"direction": "OUTPUT" // (3)
}
  1. The starting point for lineage must be the GUID of an asset.
  2. A depth of 1 will only fetch immediate upstream and downstream assets.
  3. A direction of OUTPUT will fetch only downstream assets.

Upstream assets (deprecated)

To retrieve assets immediately upstream from the originally-requested asset:

Java

Retrieve upstream assets
Set<String> upstreamGuids = response.getUpstreamAssetGuids(); // (1)
List<Asset> upstreamAssets = response.getUpstreamAssets(); // (2)
Set<String> upstreamProcessGuids = response.getUpstreamProcessGuids(); // (3)
  1. The getUpstreamAssetGuids() method will return the GUIDs of assets that are immediately upstream.
  2. The getUpstreamAssets() method will return the asset objects for the assets that are immediately upstream.
  3. The getUpstreamProcessGuids() method will return the GUIDs of the processes that run immediately upstream.

Python

Retrieve upstream assets
upstream_guids = response.get_upstream_asset_guids() # (1)
upstream_assets = response.get_upstream_assets() # (2)
upstream_process_guids = response.get_upstream_process_guids() # (3)
  1. The get_upstream_asset_guids() method will return the GUIDs of assets that are immediately upstream.
  2. The get_upstream_assets() method will return the asset objects for the assets that are immediately upstream.
  3. The get_upstream_process_guids() method will return the GUIDs of the processes that run immediately upstream.

Raw REST API

POST /api/meta/lineage/getlineage
{
"guid": "495b1516-aaaf-4390-8cfd-b11ade7a7799", // (1)
"depth": 1, // (2)
"direction": "INPUT" // (3)
}
  1. The starting point for lineage must be the GUID of an asset.
  2. A depth of 1 will only fetch immediate upstream and downstream assets.
  3. A direction of INPUT will fetch only upstream assets.

Depth-first traversal (deprecated)

You might want to traverse more than only the immediate upstream or downstream assets. To retrieve all assets that are downstream from the originally-requested asset, across multiple degrees of separation, using a depth-first search traversal:

Java

Retrieve all downstream assets
List<String> dfsDownstreamGuids = response.getAllDownstreamAssetGuidsDFS(); // (1)
List<Asset> dfsDownstream = response.getAllDownstreamAssetsDFS(); // (2)
  1. The getAllDownstreamAssetGuidsDFS() method will return the GUIDs of all assets that are downstream.

    The first GUID will always be the GUID of the asset used as the starting point for lineage, so even if there is no downstream lineage this will still return a list with a single GUID.

    The traversal will be in depth-first order downstream. This means after the GUID for the starting point, the list will contain GUIDs of assets immediately downstream. These will be followed by the assets that are immediately downstream from those assets, and so on. (The deeper you get into the list, the further downstream you will be in lineage from the starting point.)

  2. The getAllDownstreamAssetsDFS() method will return the asset objects for all assets that are downstream.

    The first asset object will always be the object for the asset used as the starting point for lineage, so even if there is no downstream lineage this will still return a list with a single asset.

    The traversal will be in depth-first order downstream. This means after the asset for the starting point, the list will contain the assets immediately downstream. These will be followed by the assets that are immediately downstream from those assets, and so on. (The deeper you get into the list, the further downstream you will be in lineage from the starting point.)

Retrieve all upstream assets
List<String> dfsUpstreamGuids = response.getAllUpstreamAssetGuidsDFS(); // (1)
List<Asset> dfsUpstream = response.getAllUpstreamAssetsDFS(); // (2)
  1. The getAllUpstreamAssetGuidsDFS() method will return the GUIDs of all assets that are upstream.

    The first GUID will always be the GUID of the asset used as the starting point for lineage, so even if there is no upstream lineage this will still return a list with a single GUID.

    The traversal will be in depth-first order upstream. This means after the GUID for the starting point, the list will contain GUIDs of assets immediately upstream. These will be followed by the assets that are immediately upstream from those assets, and so on. (The deeper you get into the list, the further upstream you will be in lineage from the starting point.)

  2. The getAllUpstreamAssetsDFS() method will return the asset objects for all assets that are upstream.

    The first asset object will always be the object for the asset used as the starting point for lineage, so even if there is no upstream lineage this will still return a list with a single asset.

    The traversal will be in depth-first order upstream. This means after the asset for the starting point, the list will contain the assets immediately upstream. These will be followed by the assets that are immediately upstream from those assets, and so on. (The deeper you get into the list, the further upstream you will be in lineage from the starting point.)

Python

Retrieve all downstream assets
dfs_downstream_guids = response.get_all_downstream_asset_guids_dfs() # (1)
dfs_downstream_assets = response.get_all_downstream_assets_dfs() # (2)
  1. The get_all_downstream_asset_guids_dfs() method will return the GUIDs of all assets that are downstream.

    The first GUID will always be the GUID of the asset used as the starting point for lineage, so even if there is no downstream lineage this will still return a list with a single GUID.

    The traversal will be in depth-first order downstream. This means after the GUID for the starting point, the list will contain GUIDs of assets immediately downstream. These will be followed by the assets that are immediately downstream from those assets, and so on. (The deeper you get into the list, the further downstream you will be in lineage from the starting point.)

  2. The get_all_downstream_assets_dfs() method will return the asset objects for all assets that are downstream.

    The first asset object will always be the object for the asset used as the starting point for lineage, so even if there is no downstream lineage this will still return a list with a single asset.

    The traversal will be in depth-first order downstream. This means after the asset for the starting point, the list will contain the assets immediately downstream. These will be followed by the assets that are immediately downstream from those assets, and so on. (The deeper you get into the list, the further downstream you will be in lineage from the starting point.)

Retrieve all upstream assets
dfs_upstream_guids = response.get_all_upstream_asset_guids_dfs() # (1)
dfs_upstream_assets = response.get_all_upstream_assets_dfs() # (2)
  1. The get_all_upstream_asset_guids_dfs() method will return the GUIDs of all assets that are upstream.

    The first GUID will always be the GUID of the asset used as the starting point for lineage, so even if there is no upstream lineage this will still return a list with a single GUID.

    The traversal will be in depth-first order upstream. This means after the GUID for the starting point, the list will contain GUIDs of assets immediately upstream. These will be followed by the assets that are immediately upstream from those assets, and so on. (The deeper you get into the list, the further upstream you will be in lineage from the starting point.)

  2. The get_all_upstream_assets_dfs() method will return the asset objects for all assets that are upstream.

    The first asset object will always be the object for the asset used as the starting point for lineage, so even if there is no upstream lineage this will still return a list with a single asset.

    The traversal will be in depth-first order upstream. This means after the asset for the starting point, the list will contain the assets immediately upstream. These will be followed by the assets that are immediately upstream from those assets, and so on. (The deeper you get into the list, the further upstream you will be in lineage from the starting point.)

Raw REST API

Multiple API calls

You may either need to make multiple API calls using the approaches above, or retrieve all downstream lineage and in your code traverse the returned relationships.
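If you do traverse the returned relationships in your own code, a depth-first walk could look roughly like the sketch below. It is self-contained and makes several assumptions: the relationships have already been parsed into a map from each GUID to its downstream GUIDs (the shape of the raw response is not shown here), and the depthFirst() helper and sample names are hypothetical. Like the SDK helpers described above, it puts the starting point first in the output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustration only: depth-first traversal of already-parsed lineage relationships. */
class DepthFirstSketch {

    /** Returns start followed by everything reachable from it, in depth-first order. */
    static List<String> depthFirst(Map<String, List<String>> downstream, String start) {
        List<String> order = new ArrayList<>();
        visit(downstream, start, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> downstream, String current,
                              Set<String> seen, List<String> order) {
        if (!seen.add(current)) return; // already visited, also guards against cycles
        order.add(current);
        // Follow each branch to its end before moving on to the next sibling.
        for (String next : downstream.getOrDefault(current, Collections.emptyList())) {
            visit(downstream, next, seen, order);
        }
    }

    /** A made-up set of relationships keyed by source. */
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("raw", Arrays.asList("staging_a", "staging_b"));
        graph.put("staging_a", Arrays.asList("mart"));
        graph.put("staging_b", Arrays.asList("mart", "export"));
        graph.put("mart", Arrays.asList("dashboard"));
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(depthFirst(sampleGraph(), "raw"));
    }
}
```

The recursion is fine for small graphs; for very deep lineage an explicit stack would avoid stack overflows.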

More details on what all means here

Keep in mind that when we say "all" above we mean all assets that are found in the response. If you have modified your request parameters to limit the lineage (for example, through depth() or direction()) then this will only traverse what's found in the response—not necessarily all lineage in Atlan.
