Skip to main content

Integrate Apache Flink/OpenLineage

Atlan extracts job-level operational metadata from Apache Flink and generates job lineage through OpenLineage. To learn more about OpenLineage, refer to OpenLineage configuration and facets.

To integrate Apache Flink/OpenLineage with Atlan, review the order of operations and then complete the following steps.

Prerequisites

Before you begin, verify you have:

  • Atlan access: Workspace admin access to install packages and create connections.
  • Apache Flink environment: Running Flink environment where you can modify job configurations or classpath.
  • Atlan API token or OAuth client: Required to authenticate OpenLineage events sent from Flink to Atlan.

Create API token in Atlan

Before running the workflow, create an API token or configure an OAuth client in Atlan.

Select source in Atlan

To select Apache Flink/OpenLineage as the source, from within Atlan:

  1. In the top navigation, click Marketplace.
  2. Search for Apache Flink Assets and select it.
  3. Click Install.
  4. Once installation completes, click Setup Workflow on the same tile.

If you navigated away before installation completed, go to New > New Workflow and select Apache Flink Assets to proceed.

Configure integration in Atlan

You only need to create a connection once to enable Atlan to receive incoming OpenLineage events. Once you've set up the connection, you don't have to rerun the workflow or schedule it. Atlan processes the OpenLineage events as and when your Flink jobs run to catalog your assets.

To configure Apache Flink/OpenLineage connection in Atlan:

  1. For Connection Name, provide a connection name that represents your source environment. For example, you might use values like production, development, gold, or analytics.

  2. To change who can manage this connection, update the users or groups listed under Connection Admins.

    warning

    If you don't specify any user or group, no one can manage the connection—not even admins.

  3. To create the connection, at the bottom of the screen, click the Create connection button.

Did you know?

You need the Atlan API token and connection name to configure the integration in Apache Flink/OpenLineage. This permits Apache Flink to connect with the OpenLineage API and send events to Atlan.

OpenLineage provides a Flink integration that registers a JobListener with the Flink execution environment to emit lineage events. For general information, see the OpenLineage Flink integration documentation.

To configure Apache Flink to send OpenLineage events to Atlan:

  1. Add the OpenLineage Flink integration JAR to your job's classpath. Include the following Maven dependency in your Flink job build:

    <dependency>
    <groupId>io.openlineage</groupId>
    <artifactId>openlineage-flink</artifactId>
    <version><!-- latest OpenLineage version --></version>
    </dependency>

    Atlan recommends using the latest available version of the OpenLineage package for the Apache Flink integration. Replace the version placeholder with the latest version of OpenLineage.

  2. Register the OpenLineage listener with your Flink StreamExecutionEnvironment. For example, in a Java DataStream job:

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    env.registerJobListener(
    OpenLineageFlinkJobListener.builder()
    .executionEnvironment(env)
    .jobName("<flink-job-name>")
    .jobNamespace("<connection-name>")
    .build());

    // ... define your sources, transformations, and sinks ...

    env.execute("<flink-job-name>");
    • <flink-job-name>: a stable name for the Flink job. This becomes the OpenLineage job.name.
    • <connection-name>: the connection name exactly as configured in Atlan. This becomes the OpenLineage job.namespace.
  3. Provide the HTTP transport configuration that points to Atlan. You can do this either through an openlineage.yml configuration file or through environment variables passed to the Flink task managers.

transport:
type: http
url: https://<instance>.atlan.com
endpoint: /events/openlineage/flink-openlineage/api/v1/lineage
auth:
type: api_key
apiKey: <Atlan_api_key>
  • url: the base URL of your Atlan instance—for example, https://<instance>.atlan.com.
  • endpoint: the path that consumes OpenLineage events from your Flink job—must be set to /events/openlineage/flink-openlineage/api/v1/lineage.
  • apiKey: the API token generated in Atlan.

The OpenLineage Flink integration looks for openlineage.yml in the Flink job's working directory and in ~/.openlineage/, or at the path specified by the OPENLINEAGE_CONFIG environment variable.

  1. Submit and run your Flink job using your usual mechanism (for example, flink run for session mode or your Flink Kubernetes operator).

Once your Flink job has started running, you see Flink jobs along with lineage from OpenLineage events in Atlan! 🎉

You can also view event logs in Atlan to track and debug events received from OpenLineage.

Next steps