
Integrate Generic OpenLineage

Public Preview

Atlan can ingest OpenLineage (OL) events from any system that emits OpenLineage-compliant events, not only Apache Airflow or Spark. This generic integration connects any such source to Atlan. To learn more about OpenLineage, refer to OpenLineage configuration and facets.

Warning: Atlan currently supports only the HTTP transport mechanism for receiving OpenLineage events.

Once configured, Atlan automatically processes incoming lineage events and catalogs your data workflows and assets.

Prerequisites

Before setting up the integration, make sure you have:

  • An Atlan API token
  • A Generic OpenLineage connection created in Atlan, whose name you use as the event namespace
  • A source system capable of emitting OpenLineage events over HTTP

Create API token in Atlan

To authenticate your OpenLineage source with Atlan, you need an API token.

  1. Go to the Atlan Admin Panel
  2. Navigate to API Tokens
  3. Generate a new token

This token serves as the authentication key when sending OpenLineage events.


Configure integration in Atlan

  1. In the top-right corner, click New
  2. Select New workflow
  3. Search for Generic OpenLineage Assets
  4. Click Setup Workflow

Create connection

A connection represents the namespace under which lineage events are grouped.

  • Connection Name: A name representing the source environment (for example, production, development, analytics)
  • Connection Admins (optional): Assign users or groups who can manage this connection. If no admin is assigned, no one—including Atlan admins—can manage the connection.

Click Create connection to finish.

A single connection can receive OpenLineage events from multiple source systems. However, we recommend creating one connection per source instance—this keeps assets segregated by source in the Atlan UI and makes it easier to debug issues when they arise. Don't create connections with a duplicate name in Generic OpenLineage Assets.

Note: You no longer pick a source system (Airflow, Spark, Flink, and so on) when creating the connection. Atlan identifies the source automatically from each incoming event's job.facets.jobType.integration attribute. Events are routed to this connection by matching job.namespace to the connection name (see Event conventions).
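To make the routing rule concrete, here is a minimal Python sketch that reads the two attributes named above from an event. The function name routing_keys and the sample values (daily_orders, production, AIRFLOW) are illustrative assumptions, not part of Atlan's API. The facet's _producer and _schemaURL fields are left out for brevity.

```python
from typing import Optional, Tuple


def routing_keys(event: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (job.namespace, job.facets.jobType.integration) from an event dict.

    Either value is None when the attribute is absent.
    """
    job = event.get("job", {})
    namespace = job.get("namespace")
    integration = job.get("facets", {}).get("jobType", {}).get("integration")
    return namespace, integration


# Hypothetical event fragment: the namespace matches a connection named "production".
sample_event = {
    "job": {
        "name": "daily_orders",
        "namespace": "production",
        "facets": {
            "jobType": {
                "integration": "AIRFLOW",
                "jobType": "DAG",
                "processingType": "BATCH",
            }
        },
    }
}

print(routing_keys(sample_event))
```

Running a check like this before sending can catch a namespace that doesn't match the connection name.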


Configure your OpenLineage source

Your OpenLineage producer must send events to Atlan using HTTP transport.

Endpoint

All OpenLineage events must be sent to:

https://<instance>.atlan.com/events/openlineage/generic-openlineage/api/v1/lineage

Replace <instance> with your Atlan tenant name.
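For producers that don't use an OpenLineage client library, the request can be sketched in Python as below. This page doesn't specify the authentication header format. The sketch assumes the bearer scheme that OpenLineage's HTTP transport uses for API keys, so confirm it against your tenant. The helper names lineage_url and build_request and the instance name acme are hypothetical.

```python
import json
import urllib.request


def lineage_url(instance: str) -> str:
    """Build the Generic OpenLineage endpoint for an Atlan tenant."""
    return (
        f"https://{instance}.atlan.com"
        "/events/openlineage/generic-openlineage/api/v1/lineage"
    )


def build_request(instance: str, api_token: str, event: dict) -> urllib.request.Request:
    """Prepare (but don't send) a POST carrying a single OpenLineage event."""
    return urllib.request.Request(
        lineage_url(instance),
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API token is passed as a bearer token.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


request = build_request("acme", "<api-token>", {"eventType": "START"})
print(request.get_method(), request.full_url)

# To actually send the event, pass the request to urllib.request.urlopen(request).
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any event reaches Atlan.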


Validate your setup with sample events

Use the public examples repo to quickly verify your connector is working end-to-end.

Clone and install

git clone https://github.com/atlanhq/generic-openlineage-examples.git
cd generic-openlineage-examples
pip install -r requirements.txt

Configure credentials

cp .env.example .env

Edit .env and fill in:

  • OL_ENDPOINT: https://<your-tenant>.atlan.com/events/openlineage/generic-openlineage/api/v1/lineage
  • API_KEY: The API token generated in Create API token in Atlan
  • NAMESPACE: The connection name created in Create connection

Send sample events

python send_events.py examples/01_simple_dag

The script reads all .json files from the example's events/ directory in sorted order and POSTs each one to your Atlan endpoint.
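The file-ordering behavior described above can be sketched as follows. This is not the repository's actual send_events.py, only an illustration of reading .json files from an events/ directory in sorted order. The function name load_events and the two demo file names are assumptions.

```python
import json
import tempfile
from pathlib import Path


def load_events(example_dir):
    """Return (file name, parsed event) pairs from <example_dir>/events, sorted by file name."""
    events_dir = Path(example_dir) / "events"
    return [
        (path.name, json.loads(path.read_text(encoding="utf-8")))
        for path in sorted(events_dir.glob("*.json"))
    ]


# Demo with a throwaway directory and two hypothetical event files,
# written out of order to show that sorting restores START before COMPLETE.
with tempfile.TemporaryDirectory() as tmp:
    events_dir = Path(tmp) / "events"
    events_dir.mkdir()
    (events_dir / "02_complete.json").write_text('{"eventType": "COMPLETE"}', encoding="utf-8")
    (events_dir / "01_start.json").write_text('{"eventType": "START"}', encoding="utf-8")
    order = [(name, event["eventType"]) for name, event in load_events(tmp)]

print(order)
```

Numeric prefixes in the file names are what keep a run's events in sequence, since the files are sent in sorted order.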


Verify your event format

Events are raw OpenLineage RunEvent JSON—no additional envelope needed. Each request body contains a single event.

Minimal required fields:

{
  "eventTime": "2025-01-15T10:00:00.000Z",
  "eventType": "START",
  "producer": "https://my-system.example.com",
  "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
  "run": { "runId": "<uuid>" },
  "job": { "name": "<job-name>", "namespace": "<namespace>" },
  "inputs": [],
  "outputs": []
}
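A minimal event like the one above can be generated and checked in Python before sending. The sketch below fills in runId and eventTime and reports any missing fields. The helper names make_run_event and missing_fields are hypothetical, and the producer URL is the placeholder from the example, not a real system.

```python
import uuid
from datetime import datetime, timezone

REQUIRED_TOP_LEVEL = (
    "eventTime", "eventType", "producer", "schemaURL",
    "run", "job", "inputs", "outputs",
)


def make_run_event(job_name: str, namespace: str, event_type: str = "START") -> dict:
    """Build an event containing only the minimal fields listed above."""
    now = datetime.now(timezone.utc)
    return {
        # UTC timestamp with millisecond precision and a trailing "Z".
        "eventTime": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "eventType": event_type,
        "producer": "https://my-system.example.com",  # placeholder producer URI
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"name": job_name, "namespace": namespace},
        "inputs": [],
        "outputs": [],
    }


def missing_fields(event: dict) -> list:
    """List minimal fields absent from the event, using dotted names for nested ones."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in event]
    run = event.get("run") or {}
    job = event.get("job") or {}
    if "runId" not in run:
        missing.append("run.runId")
    for key in ("name", "namespace"):
        if key not in job:
            missing.append(f"job.{key}")
    return missing


event = make_run_event("daily_orders", "production")
print(missing_fields(event))
```

Events belonging to the same run should reuse one runId, so in practice the identifier would be generated once per run rather than once per event.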
