What does Atlan crawl from Talend?
Atlan crawls the following assets and metadata from Talend projects stored in GitHub or Atlassian Stash (Bitbucket Server).
Lineage
The Talend connector calculates lineage at both table level and column level.
Table-level lineage
Table-level lineage tracks data flow between:
- Source databases → Talend jobs → Target databases
- Source files → Talend jobs → Target databases
- Source databases → Talend jobs → Target files
Example:
MySQL.sales.customers → [Talend Job: ETL_Customer_Data] → Snowflake.analytics.dim_customer
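
To make the shape of a table-level lineage edge concrete, the arrow notation in the example above can be modeled as a small record. This is only an illustrative sketch: `TableLineageEdge` and `parse_edge` are hypothetical names invented here, not part of Atlan or Talend, and the parser assumes exactly the `source → [Talend Job: name] → target` layout shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableLineageEdge:
    """One table-level hop: source asset -> Talend job -> target asset (hypothetical model)."""
    source: str  # source table or file identifier
    job: str     # Talend job name
    target: str  # target table or file identifier

    def render(self) -> str:
        # Reproduce the arrow notation used in this page's example.
        return f"{self.source} → [Talend Job: {self.job}] → {self.target}"


def parse_edge(text: str) -> TableLineageEdge:
    """Parse 'source → [Talend Job: name] → target' into its three parts."""
    source, middle, target = (part.strip() for part in text.split("→"))
    prefix = "[Talend Job:"
    if not (middle.startswith(prefix) and middle.endswith("]")):
        raise ValueError(f"unexpected job segment: {middle!r}")
    job = middle[len(prefix):-1].strip()
    return TableLineageEdge(source, job, target)


edge = parse_edge(
    "MySQL.sales.customers → [Talend Job: ETL_Customer_Data] → Snowflake.analytics.dim_customer"
)
print(edge.job)
```

The same record covers all three flow patterns listed above, since a file path can stand in for either `source` or `target`.
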
Column-level lineage
Column-level lineage tracks transformations for individual columns:
- Column mapping through tMap components
- Aggregations through tAggregateRow components
- Joins through tJoin components
- Filters and transformations
Example:
MySQL.sales.customers.first_name → [tMap: concat] → Snowflake.analytics.dim_customer.full_name
MySQL.sales.customers.last_name → [tMap: concat] → Snowflake.analytics.dim_customer.full_name
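
The two example edges above share one target column, which is how a many-to-one transformation such as a concatenation shows up in column-level lineage. As a rough sketch (the `ColumnEdge` type and `upstream_by_target` helper are hypothetical and do not reflect how Atlan stores lineage internally), grouping edges by target recovers the full set of upstream columns:

```python
from collections import defaultdict
from typing import NamedTuple


class ColumnEdge(NamedTuple):
    """Hypothetical record for one column-level lineage edge."""
    source: str
    component: str
    target: str


edges = [
    ColumnEdge("MySQL.sales.customers.first_name", "tMap: concat",
               "Snowflake.analytics.dim_customer.full_name"),
    ColumnEdge("MySQL.sales.customers.last_name", "tMap: concat",
               "Snowflake.analytics.dim_customer.full_name"),
]


def upstream_by_target(column_edges):
    """Group source columns under the target column they feed."""
    grouped = defaultdict(list)
    for edge in column_edges:
        grouped[edge.target].append(edge.source)
    return dict(grouped)


upstream = upstream_by_target(edges)
for target, sources in upstream.items():
    print(target, "<-", sources)
```
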
Notation
- 🔀 = Includes lineage support
- 📊 = Includes column-level details
For details on how lineage is calculated and known limitations, see Lineage.
FlowProject
Atlan maps Talend projects to its FlowProject asset type.
| Atlan property | File name | Talend property |
|---|---|---|
| name | talend.project | label |
| qualifiedName | N/A | Calculated |
| flowId | talend.project | xmi:id |
| description | talend.project | description |
| assetUserDefinedType | N/A | Hard coded |
| connectorName | N/A | Hard coded |
| connectionName | N/A | UI driven |
| connectionQualifiedName | N/A | Calculated |
| lastSyncRunAt | N/A | Calculated |
| lastSyncWorkflowName | N/A | Workflow ID |
| lastSyncRun | N/A | Run ID |
| tenantId | N/A | Hard coded |
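
The file-backed rows of the FlowProject mapping (`label`, `xmi:id`, `description`) can be read with a standard XML parser. The sketch below is an assumption-heavy illustration, not the connector's implementation: `SAMPLE_PROJECT` is a made-up, heavily trimmed stand-in for a `talend.project` file, the element and namespace names in it are assumed, and `read_project_properties` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard XMI namespace; the xmi:id attribute lives in it.
XMI_NS = "http://www.omg.org/XMI"

# Hypothetical, trimmed stand-in for talend.project (real files carry far more detail).
SAMPLE_PROJECT = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:TalendProperties="http://www.talend.org/properties">
  <TalendProperties:Project xmi:id="_example123"
                            label="SALES_ETL"
                            description="Customer pipelines"/>
</xmi:XMI>
"""


def read_project_properties(xml_text: str) -> dict:
    """Return the three file-backed FlowProject properties from the mapping table."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Compare the local tag name only, so the sample's namespace URI is not load-bearing.
        if element.tag.rsplit("}", 1)[-1] == "Project":
            return {
                "name": element.get("label"),
                "flowId": element.get(f"{{{XMI_NS}}}id"),
                "description": element.get("description"),
            }
    raise ValueError("no Project element found")


props = read_project_properties(SAMPLE_PROJECT)
print(props)
```

Properties marked Calculated, Hard coded, or UI driven in the table have no source file, so they are left out of this sketch.
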
FlowControlOperation
Atlan maps Talend jobs to its FlowControlOperation asset type.
| Atlan property | File name | Talend property |
|---|---|---|
| name | .properties | label |
| qualifiedName | N/A | Calculated |
| flowId | .properties | xmi:id |
| description | .properties | description |
| flowProjectName | talend.project | label |
| flowProjectQualifiedName | talend.project | Calculated |
| assetUserDefinedType | N/A | Hard coded |
| connectorName | N/A | Hard coded |
| connectionName | N/A | UI driven |
| connectionQualifiedName | N/A | Calculated |
| lastSyncRunAt | N/A | Calculated |
| lastSyncWorkflowName | N/A | Workflow ID |
| lastSyncRun | N/A | Run ID |
| tenantId | N/A | Hard coded |
FlowReusableUnit
Atlan maps Talend reusable units (shared components and routines) to its FlowReusableUnit asset type.
| Atlan property | File name | Talend property |
|---|---|---|
| name | N/A | Calculated |
| qualifiedName | N/A | Calculated |
| flowId | N/A | Calculated |
| description | N/A | Hard coded |
| flowProjectName | talend.project | label |
| flowProjectQualifiedName | talend.project | Calculated |
| flowDatasetCount | N/A | Calculated |
| flowControlOperationCount | N/A | Calculated |
| assetUserDefinedType | N/A | Hard coded |
| connectorName | N/A | Hard coded |
| connectionName | N/A | UI driven |
| connectionQualifiedName | N/A | Calculated |
| lastSyncRunAt | N/A | Calculated |
| lastSyncWorkflowName | N/A | Workflow ID |
| lastSyncRun | N/A | Run ID |
| tenantId | N/A | Hard coded |
FlowDataset
Atlan maps Talend job components to its FlowDataset asset type. Components represent individual operations within a job, including transformations, database connections, and file operations.
| Atlan property | File name | Talend property |
|---|---|---|
| name | .item | Node.componentName |
| qualifiedName | N/A | Calculated |
| flowId | N/A | xmi:id |
| flowProjectName | talend.project | label |
| flowProjectQualifiedName | talend.project | Calculated |
| flowReusableUnitName | N/A | Calculated |
| flowReusableUnitQualifiedName | N/A | Calculated |
| assetUserDefinedType | N/A | Hard coded |
| flowType | .item | Node.componentName |
| flowQuery | N/A | Calculated |
| flowExpression | N/A | Calculated |
| connectorName | N/A | Hard coded |
| connectionName | N/A | UI driven |
| connectionQualifiedName | N/A | Calculated |
| lastSyncRunAt | N/A | Calculated |
| lastSyncWorkflowName | N/A | Workflow ID |
| lastSyncRun | N/A | Run ID |
| tenantId | N/A | Hard coded |
Component types
The connector crawls Talend job components across multiple categories:
- Transformation components: Data mapping, filtering, joins, aggregations, sorting, and data normalization operations
- Database components: Input/output operations for various database systems (MySQL, Oracle, SQL Server, PostgreSQL, and generic JDBC connections)
- File components: Operations for delimited files, Excel, JSON, XML, and other file formats
- Orchestration components: Job execution control, loops, and workflow management
The connector's component support is continuously expanding. Contact Atlan support for the most current list of supported component types.
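
As a rough illustration of the four categories above, a component name can be bucketed by its prefix. The prefix lists below are assumptions chosen for this example from commonly seen Talend component names; they are not the connector's actual support list, and `categorize` is a hypothetical helper.

```python
# Illustrative prefix rules only; the connector's real list of supported
# components is not documented here and may differ.
CATEGORY_RULES = [
    ("transformation", ("tMap", "tFilterRow", "tJoin", "tAggregateRow", "tSortRow", "tNormalize")),
    ("database", ("tMysql", "tOracle", "tMSSql", "tPostgresql", "tJDBC", "tDB")),
    ("file", ("tFileInput", "tFileOutput")),
    ("orchestration", ("tRunJob", "tLoop", "tPrejob", "tPostjob")),
]


def categorize(component_name: str) -> str:
    """Return the first category whose prefix matches, else 'unknown'."""
    for category, prefixes in CATEGORY_RULES:
        # str.startswith accepts a tuple and matches any of its entries.
        if component_name.startswith(prefixes):
            return category
    return "unknown"


for name in ("tMap", "tMysqlInput", "tFileInputDelimited", "tRunJob", "tCustomThing"):
    print(name, "->", categorize(name))
```
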
FlowField
Atlan maps Talend component fields (columns and variables) to its FlowField asset type.
| Atlan property | File name | Talend property |
|---|---|---|
| name | .item | Calculated |
| qualifiedName | N/A | Calculated |
| flowId | N/A | xmi:id |
| flowProjectName | talend.project | label |
| flowProjectQualifiedName | talend.project | Calculated |
| flowReusableUnitName | N/A | Calculated |
| flowReusableUnitQualifiedName | N/A | Calculated |
| assetUserDefinedType | N/A | Hard coded |
| connectorName | N/A | Hard coded |
| connectionName | N/A | UI driven |
| connectionQualifiedName | N/A | Calculated |
| lastSyncRunAt | N/A | Calculated |
| lastSyncWorkflowName | N/A | Workflow ID |
| lastSyncRun | N/A | Run ID |
| tenantId | N/A | Hard coded |
| flowDataType | .item | Node.metadata.column.type |
| flowDatasetName | .item | Node.componentName |
| flowDatasetQualifiedName | N/A | Calculated |
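
The `.item`-backed rows of the FlowField mapping come from a component node and the columns nested under it. The following sketch shows one way that traversal might look. It rests on several assumptions: `SAMPLE_ITEM` is a made-up, namespace-free stand-in for a Talend `.item` file, `extract_fields` is a hypothetical helper, and the `column` key is used only for illustration because the table lists the FlowField name as Calculated.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a job's .item file.
SAMPLE_ITEM = """
<ProcessType>
  <node componentName="tMysqlInput">
    <metadata connector="FLOW" name="tMysqlInput_1">
      <column name="first_name" type="id_String"/>
      <column name="last_name" type="id_String"/>
    </metadata>
  </node>
</ProcessType>
"""


def extract_fields(xml_text: str) -> list:
    """Collect one record per column found under each component node."""
    root = ET.fromstring(xml_text)
    fields = []
    for node in root.iter("node"):
        component = node.get("componentName")
        for column in node.iterfind("metadata/column"):
            fields.append({
                "column": column.get("name"),        # illustrative only
                "flowDataType": column.get("type"),  # Node > metadata > column > type
                "flowDatasetName": component,        # Node.componentName
            })
    return fields


fields = extract_fields(SAMPLE_ITEM)
for field in fields:
    print(field)
```
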
FlowDatasetOperation
Atlan maps Talend dataset operations to its FlowDatasetOperation asset type.
| Atlan property | File name | Talend property |
|---|---|---|
| name | N/A | label |
| qualifiedName | N/A | Calculated |
| flowId | N/A | xmi:id |
| description | N/A | Hard coded |
| flowProjectName | talend.project | label |
| flowProjectQualifiedName | talend.project | Calculated |
| inputs | N/A | Calculated |
| outputs | N/A | Calculated |
| assetUserDefinedType | N/A | Hard coded |
| connectorName | N/A | Hard coded |
| connectionName | N/A | UI driven |
| connectionQualifiedName | N/A | Calculated |
| lastSyncRunAt | N/A | Calculated |
| lastSyncWorkflowName | N/A | Workflow ID |
| lastSyncRun | N/A | Run ID |
| tenantId | N/A | Hard coded |
Process
Atlan maps Talend processes to its Process asset type for table-level lineage tracking.
| Atlan property | File name | Talend property |
|---|---|---|
| name | .item | Calculated |
| qualifiedName | N/A | Calculated |
| connectorName | N/A | Hard coded |
| connectionName | N/A | UI driven |
| connectionQualifiedName | N/A | Calculated |
| lastSyncRunAt | N/A | Calculated |
| lastSyncWorkflowName | N/A | Workflow ID |
| lastSyncRun | N/A | Run ID |
| tenantId | N/A | Hard coded |
| flowOrchestratedBy.qualifiedName | N/A | Calculated |
| inputs | N/A | Calculated |
| outputs | N/A | Calculated |
ColumnProcess
Atlan maps Talend column processes to its ColumnProcess asset type for column-level lineage tracking.
| Atlan property | File name | Talend property |
|---|---|---|
| name | .item | Calculated |
| qualifiedName | N/A | Calculated |
| connectorName | N/A | Hard coded |
| connectionName | N/A | UI driven |
| connectionQualifiedName | N/A | Calculated |
| lastSyncRunAt | N/A | Calculated |
| lastSyncWorkflowName | N/A | Workflow ID |
| lastSyncRun | N/A | Run ID |
| tenantId | N/A | Hard coded |
| process.qualifiedName | N/A | Calculated |
| inputs | N/A | Calculated |
| outputs | N/A | Calculated |
See also
- Crawl Talend assets: Configure and run the workflow to discover and catalog Talend assets
- Set up Talend: Configure GitHub or Stash access tokens for the Talend connector