Crawl Apache Kafka
Extract metadata assets from your Apache Kafka cluster into Atlan.
Prerequisites
Before you begin, complete the following prerequisites:
- Apache Kafka setup: You've configured the Apache Kafka permissions needed for Atlan to connect to your cluster.
- Schema Registry setup (if crawling schemas): You've completed the Confluent Schema Registry setup and have the Schema Registry endpoint, API key, and API secret ready.
- Order of operations: Review the order of operations to understand the sequence of tasks for crawling metadata.
- Access to Atlan workspace: You have the required permissions in Atlan to create and manage a connection.
Create crawler workflow
- In Atlan, select New > New Workflow.
- Select Apache Kafka Assets and click Setup Workflow.
Configure extraction
Select your extraction method and provide the connection details for your Apache Kafka cluster.
- Direct
- Agent
- Offline
Atlan connects directly to your Apache Kafka cluster and crawls metadata over the network.
- For Bootstrap servers, enter one or more hostnames of your Apache Kafka brokers. For multiple hostnames, separate each entry with a comma (,) or semicolon (;).
- For Authentication, choose the method that matches your cluster configuration:
- No Auth -- select this if your cluster doesn't require authentication.
- Basic -- enter the username and password configured for Atlan using SASL/PLAIN.
- SCRAM -- enter the username and password and choose the SCRAM mechanism (SCRAM-SHA-256 or SCRAM-SHA-512).
- mTLS -- upload the client certificate and private key for mutual TLS authentication.
- For Security protocol, select Plaintext or SSL for No Auth, and SASL_PLAINTEXT or SASL_SSL for Basic and SCRAM authentication.
- To crawl Schema Registry subjects alongside Kafka, set Include Schema Registry to True and provide the following details:
  - For Schema registry host, enter the URL of your Schema Registry endpoint (for example, https://psrc-xxxxx.us-east-2.aws.confluent.cloud).
  - For API Key, enter the Schema Registry API key you created.
  - For API Secret, enter the Schema Registry API secret you created.
- Click Test Authentication to confirm connectivity, then click Next.
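The fields above correspond to standard Kafka client connection properties. As a minimal sketch of how they fit together, the hypothetical helper below (not Atlan's API; field names simply mirror the form) maps each form choice onto a client configuration:

```python
# Sketch: assemble Kafka client properties from the crawler form fields above.
# build_client_config is an illustrative helper, not part of Atlan.

def build_client_config(bootstrap_servers, auth="No Auth",
                        username=None, password=None, scram_mechanism=None):
    """Map the crawler form fields onto standard Kafka client properties."""
    # Commas and semicolons are both accepted as broker separators.
    servers = [s.strip() for s in bootstrap_servers.replace(";", ",").split(",")
               if s.strip()]
    config = {"bootstrap.servers": ",".join(servers)}
    if auth == "No Auth":
        config["security.protocol"] = "PLAINTEXT"  # or SSL
    elif auth == "Basic":
        config.update({
            "security.protocol": "SASL_SSL",   # or SASL_PLAINTEXT
            "sasl.mechanism": "PLAIN",
            "sasl.username": username,
            "sasl.password": password,
        })
    elif auth == "SCRAM":
        config.update({
            "security.protocol": "SASL_SSL",   # or SASL_PLAINTEXT
            "sasl.mechanism": scram_mechanism,  # SCRAM-SHA-256 or SCRAM-SHA-512
            "sasl.username": username,
            "sasl.password": password,
        })
    return config

cfg = build_client_config("broker1:9092; broker2:9092", auth="SCRAM",
                          username="atlan", password="s3cret",
                          scram_mechanism="SCRAM-SHA-512")
```

Note how the Authentication choice drives both the SASL mechanism and the valid Security protocol values, which is why the form pairs them.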
Self-Deployed Runtime executes metadata extraction within your organization's environment, keeping all connections inside your network perimeter.
- Install Self-Deployed Runtime if you haven't already.
- Confirm the runtime can reach your Apache Kafka cluster over your local network and that network security is configured.
- Under Secure Agent Configuration, select your deployed agent from the Agent dropdown and the secret store from the Secret Store dropdown.
- For Bootstrap servers, enter one or more hostnames of your Apache Kafka brokers as reachable from within your network.
- For Authentication, choose the method that matches your cluster configuration:
- No Auth -- select this if your cluster doesn't require authentication.
- Basic -- reference the secret store path for the username and password configured for Atlan using SASL/PLAIN.
- SCRAM -- reference the secret store path for the username and password and choose the SCRAM mechanism (SCRAM-SHA-256 or SCRAM-SHA-512).
- mTLS -- reference the secret store paths for the client certificate and private key.
- For Security protocol, select Plaintext or SSL for No Auth, and SASL_PLAINTEXT or SASL_SSL for Basic and SCRAM authentication.
- To crawl Schema Registry subjects alongside Kafka, set Include Schema Registry to True and provide the following details:
  - For Schema registry host, enter the URL of your Schema Registry endpoint (for example, https://psrc-xxxxx.us-east-2.aws.confluent.cloud).
  - For API Key, reference the secret store path where the Schema Registry API key is stored.
  - For API Secret, reference the secret store path where the Schema Registry API secret is stored.
- Store sensitive credential values in your secret store and reference them in the corresponding fields. For more information, see Configure secrets for workflow execution.
- Click Next after completing the configuration.
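With agent-based extraction, the form fields hold secret-store references rather than plain values, and the agent resolves them inside your network at run time. A rough sketch of that pattern, assuming a hypothetical resolve_secret helper and an in-memory stand-in for your secret store:

```python
# Sketch: resolving secret-store references inside your network perimeter.
# SECRET_STORE and resolve_secret are illustrative; Atlan's actual secret
# store integration is configured in the workflow UI.

SECRET_STORE = {  # stands in for your real secret store
    "kafka/atlan/username": "atlan-crawler",
    "kafka/atlan/password": "s3cret",
}

def resolve_secret(reference):
    """Look up a secret-store path and return the stored value."""
    if reference not in SECRET_STORE:
        raise KeyError(f"secret not found: {reference}")
    return SECRET_STORE[reference]

config = {
    "bootstrap.servers": "broker.internal:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-512",
    # The form stores references; the credentials never leave your network.
    "sasl.username": resolve_secret("kafka/atlan/username"),
    "sasl.password": resolve_secret("kafka/atlan/password"),
}
```

The design point: because only the references travel to Atlan, the actual credentials stay within your perimeter alongside the runtime.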
Atlan ingests metadata you extract and upload to object storage using the offline extraction method. First extract the metadata yourself, then make it available in object storage.
- For Bucket name, enter the name of your S3 or GCS bucket.
- For Bucket prefix, enter the prefix under which all metadata files exist. These include topics.json and topic-configs.json.
- Based on your cloud platform, enter the following details:
- AWS -- for Role ARN, enter the ARN of the AWS role to assume when copying files from S3.
- Azure -- enter your Azure Storage Account name and the SAS token for Blob SAS Token.
- Google Cloud -- grant Atlan's tenant service account the Storage Object Viewer role on your GCS bucket. Contact Atlan support to get your tenant's service account name.
- At the bottom of the screen, click Next.
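For orientation, the layout Atlan reads from object storage puts both files directly under the configured bucket prefix. A local staging sketch (the prefix path and JSON contents here are placeholders; follow the offline extraction method for the actual shape Atlan expects):

```python
# Sketch: staging offline extraction output before uploading it to S3/GCS/Blob.
# The file names come from the step above; the JSON bodies are placeholders.
import json
import pathlib
import tempfile

bucket_root = pathlib.Path(tempfile.mkdtemp())  # stands in for the bucket
prefix = bucket_root / "kafka" / "prod"         # your Bucket prefix
prefix.mkdir(parents=True)

# Both metadata files live directly under the configured prefix.
(prefix / "topics.json").write_text(json.dumps({"topics": []}))
(prefix / "topic-configs.json").write_text(json.dumps({"configs": []}))

staged = sorted(p.name for p in prefix.iterdir())
```

After uploading this layout, the Bucket name and Bucket prefix fields above point Atlan at exactly these files.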
Configure connection
Set up the connection identity and access controls for your Apache Kafka source.
- Provide a Connection Name that represents your source environment -- for example, production, development, gold, or analytics.
- Under Connection Admins, add the users or groups that can manage this connection. If you leave this empty, no one can manage the connection, including admins.
- At the bottom of the screen, click Next.
Configure crawling options
On the Metadata page, you can override the defaults for any of these options. If an asset appears in both include and exclude filters, the exclude filter takes precedence. When Schema Registry credentials are provided, the topic include/exclude regex also applies to schema subjects. Subjects are matched using their base topic name (stripping the -key or -value suffix).
- For Skip internal topics, keep the default Yes to skip internal Apache Kafka topics, or select No to crawl them.
- Click Exclude topics regex to exclude specific topics. Defaults to no exclusions if none are specified.
- Click Include topics regex to limit crawling to specific topics. Defaults to all topics if none are specified.
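The filtering rules above can be sketched as follows. This is an illustration of the described behavior only (exclude wins over include; subjects are matched on their base topic name with a trailing -key or -value stripped); whether Atlan anchors the regex match is an assumption here, modeled as full-string matching:

```python
# Sketch of the include/exclude semantics described above. Helper names
# are illustrative, not Atlan's implementation.
import re

def base_topic(subject):
    """Strip a trailing -key or -value from a Schema Registry subject name."""
    return re.sub(r"-(key|value)$", "", subject)

def should_crawl(name, include=None, exclude=None, is_subject=False):
    """Apply the topic filters; the exclude filter always takes precedence."""
    if is_subject:
        name = base_topic(name)  # subjects match on their base topic name
    if exclude and re.fullmatch(exclude, name):
        return False             # excluded even if include also matches
    if include:
        return re.fullmatch(include, name) is not None
    return True                  # no include regex: all topics by default
```

For example, with include orders.* and exclude orders-internal, the topic orders-internal is skipped even though both patterns match it, and the subject orders-value is crawled because it resolves to the base topic orders.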
Run crawler
After configuring all options, run or schedule the crawler.
- For Direct extraction, click Preflight checks to validate permissions and configuration before running. For Agent and Offline extraction, skip this step.
- Click Run to run the crawler once immediately, or click Schedule & Run to schedule the crawler to run hourly, daily, weekly, or monthly.
Once the crawler completes, the assets appear on Atlan's asset page.
See also
- What does Atlan crawl from Apache Kafka: Assets and metadata discovered during crawling
- Preflight checks for Apache Kafka: Validation checks for permissions and configuration