Integrate Apache Flink/OpenLineage
Atlan extracts job-level operational metadata from Apache Flink and generates job lineage through OpenLineage. To learn more about OpenLineage, refer to OpenLineage configuration and facets.
To integrate Apache Flink/OpenLineage with Atlan, review the order of operations and then complete the following steps.
Prerequisites
Before you begin, verify you have:
- Atlan access: Workspace admin access to install packages and create connections.
- Apache Flink environment: A running Flink environment in which you can modify job configurations or the classpath.
- Atlan API token or OAuth client: Required to authenticate OpenLineage events sent from Flink to Atlan.
Create API token in Atlan
Before running the workflow, create an API token or configure an OAuth client in Atlan.
Select source in Atlan
To select Apache Flink/OpenLineage as the source, from within Atlan:
- In the top navigation, click Marketplace.
- Search for Apache Flink Assets and select it.
- Click Install.
- Once installation completes, click Setup Workflow on the same tile.
If you navigated away before installation completed, go to New > New Workflow and select Apache Flink Assets to proceed.
Configure integration in Atlan
You only need to create a connection once to enable Atlan to receive incoming OpenLineage events. Once you've set up the connection, you don't have to rerun the workflow or schedule it. Atlan processes the OpenLineage events as and when your Flink jobs run to catalog your assets.
To configure Apache Flink/OpenLineage connection in Atlan:
- For Connection Name, provide a connection name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
- To change who can manage this connection, update the users or groups listed under Connection Admins.
  Warning: If you don't specify any user or group, no one can manage the connection, not even admins.
- To create the connection, at the bottom of the screen, click the Create connection button.
Configure integration in Apache Flink
You need the Atlan API token and connection name to configure the integration in Apache Flink/OpenLineage. This permits Apache Flink to connect with the OpenLineage API and send events to Atlan.
OpenLineage provides a Flink integration that registers a JobListener with the Flink execution environment to emit lineage events. For general information, see the OpenLineage Flink integration documentation.
To configure Apache Flink to send OpenLineage events to Atlan:
- Add the OpenLineage Flink integration JAR to your job's classpath. Include the following Maven dependency in your Flink job build:

  <dependency>
    <groupId>io.openlineage</groupId>
    <artifactId>openlineage-flink</artifactId>
    <version><!-- latest OpenLineage version --></version>
  </dependency>

  Atlan recommends using the latest available version of the OpenLineage package for the Apache Flink integration. Replace the version placeholder with the latest version of OpenLineage.
- Register the OpenLineage listener with your Flink StreamExecutionEnvironment. For example, in a Java DataStream job:

  import io.openlineage.flink.OpenLineageFlinkJobListener;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

  StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

  // Register the OpenLineage listener before the job is executed.
  env.registerJobListener(
      OpenLineageFlinkJobListener.builder()
          .executionEnvironment(env)
          .jobName("<flink-job-name>")
          .jobNamespace("<connection-name>")
          .build());

  // ... define your sources, transformations, and sinks ...

  env.execute("<flink-job-name>");

  - <flink-job-name>: a stable name for the Flink job. This becomes the OpenLineage job.name.
  - <connection-name>: the connection name exactly as configured in Atlan. This becomes the OpenLineage job.namespace.
- Provide the HTTP transport configuration that points to Atlan. You can do this either through an openlineage.yml configuration file or through environment variables passed to the Flink task managers.

  openlineage.yml:

  transport:
    type: http
    url: https://<instance>.atlan.com
    endpoint: /events/openlineage/flink-openlineage/api/v1/lineage
    auth:
      type: api_key
      apiKey: <Atlan_api_key>

  - url: the base URL of your Atlan instance, for example, https://<instance>.atlan.com.
  - endpoint: the path that consumes OpenLineage events from your Flink job. It must be set to /events/openlineage/flink-openlineage/api/v1/lineage.
  - apiKey: the API token generated in Atlan.

  The OpenLineage Flink integration looks for openlineage.yml in the Flink job's working directory and in ~/.openlineage/, or at the path specified by the OPENLINEAGE_CONFIG environment variable.

  Environment variables:

  export OPENLINEAGE__TRANSPORT__TYPE=http
  export OPENLINEAGE__TRANSPORT__URL="https://<instance>.atlan.com"
  export OPENLINEAGE__TRANSPORT__ENDPOINT="/events/openlineage/flink-openlineage/api/v1/lineage"
  export OPENLINEAGE__TRANSPORT__AUTH__TYPE=api_key
  export OPENLINEAGE__TRANSPORT__AUTH__API_KEY="<Atlan_api_key>"

  - OPENLINEAGE__TRANSPORT__URL: the base URL of your Atlan instance, for example, https://<instance>.atlan.com.
  - OPENLINEAGE__TRANSPORT__ENDPOINT: the path that consumes OpenLineage events from your Flink job. It must be set to /events/openlineage/flink-openlineage/api/v1/lineage.
  - OPENLINEAGE__TRANSPORT__AUTH__API_KEY: the API token generated in Atlan.

  The OpenLineage Flink integration parses OPENLINEAGE__-prefixed environment variables (double underscore separator) into the same nested config tree as openlineage.yml. The connection name is set in code via .jobNamespace("<connection-name>") on the listener builder, so no OPENLINEAGE_NAMESPACE variable is needed.
- Submit and run your Flink job using your usual mechanism (for example, flink run for session mode or your Flink Kubernetes operator).
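To make the relationship between the two configuration forms concrete, the following minimal sketch shows how OPENLINEAGE__-prefixed variables could map onto the nested keys used in openlineage.yml. It is an illustration only, not the parser used by the OpenLineage Flink integration: the class EnvConfigSketch and its methods are hypothetical, and the SNAKE_CASE to camelCase conversion is an assumption inferred from the API_KEY and apiKey pair shown above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration; not part of OpenLineage or Atlan. */
public class EnvConfigSketch {
    private static final String PREFIX = "OPENLINEAGE__";

    /** Converts one SNAKE_CASE segment to camelCase, e.g. API_KEY -> apiKey (assumed rule). */
    static String toCamelCase(String segment) {
        String[] words = segment.toLowerCase(Locale.ROOT).split("_");
        StringBuilder sb = new StringBuilder(words[0]);
        for (int i = 1; i < words.length; i++) {
            if (words[i].isEmpty()) {
                continue;
            }
            sb.append(Character.toUpperCase(words[i].charAt(0)))
              .append(words[i].substring(1));
        }
        return sb.toString();
    }

    /** Builds a nested map from prefixed variables; "__" separates nesting levels. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> toConfigTree(Map<String, String> env) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            if (!entry.getKey().startsWith(PREFIX)) {
                continue; // ignore unrelated variables such as PATH
            }
            String[] path = entry.getKey().substring(PREFIX.length()).split("__");
            Map<String, Object> node = root;
            // Walk (or create) intermediate levels, e.g. transport -> auth.
            for (int i = 0; i < path.length - 1; i++) {
                node = (Map<String, Object>) node.computeIfAbsent(
                        toCamelCase(path[i]), k -> new LinkedHashMap<String, Object>());
            }
            node.put(toCamelCase(path[path.length - 1]), entry.getValue());
        }
        return root;
    }

    public static void main(String[] args) {
        // Prints the tree derived from the current process environment.
        System.out.println(toConfigTree(System.getenv()));
    }
}
```

Running the sketch in the same shell where the export commands were issued prints the nested structure, which can be compared against the openlineage.yml form by eye before submitting a job.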
Once your Flink job has started running, you can see your Flink jobs, along with the lineage generated from OpenLineage events, in Atlan! 🎉
You can also view event logs in Atlan to track and debug events received from OpenLineage.
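When no events appear in the logs, it can help to check how the three transport settings combine into a single request. The sketch below builds (but doesn't send) such a request using the JDK's java.net.http API. It rests on assumptions not stated on this page: that the HTTP transport issues a POST with a JSON body to the url joined with the endpoint, and that the api_key auth type is sent as an Authorization: Bearer header. The class LineageRequestSketch, its methods, and the example host are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical illustration; not part of OpenLineage or Atlan. */
public class LineageRequestSketch {

    /** Joins the base URL and endpoint path without doubling or dropping the slash. */
    static URI lineageUri(String baseUrl, String endpoint) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        String path = endpoint.startsWith("/") ? endpoint : "/" + endpoint;
        return URI.create(base + path);
    }

    /** Builds the request; the Bearer header format is an assumption. */
    static HttpRequest buildRequest(String baseUrl, String endpoint,
                                    String apiKey, String eventJson) {
        return HttpRequest.newBuilder(lineageUri(baseUrl, endpoint))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(eventJson))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest(
                "https://your-instance.atlan.com",                      // placeholder host
                "/events/openlineage/flink-openlineage/api/v1/lineage", // endpoint from this page
                "your-api-token",                                       // placeholder token
                "{}");                                                  // a real OpenLineage event would go here
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Comparing the printed URI against the values in your openlineage.yml or environment variables is a quick way to catch a missing path segment or a trailing slash in the base URL.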
Next steps
- What does Atlan crawl from Apache Flink/OpenLineage?: Review the full list of jobs and metadata that Atlan extracts from your Flink environment.