Crawl Apache Kafka

Extract metadata assets from your Apache Kafka cluster into Atlan.

Prerequisites

Before you begin, complete the following prerequisites:

  • Apache Kafka setup: You've configured the Apache Kafka permissions needed for Atlan to connect to your cluster.
  • Schema Registry setup (if crawling schemas): You've completed the Confluent Schema Registry setup and have the Schema Registry endpoint, API key, and API secret ready.
  • Order of operations: Review the order of operations to understand the sequence of tasks for crawling metadata.
  • Access to Atlan workspace: You have the required permissions in Atlan to create and manage a connection.

Create crawler workflow

  1. In Atlan, select New > New Workflow.
  2. Select Apache Kafka Assets and click Setup Workflow.

Configure extraction

Select your extraction method and provide the connection details for your Apache Kafka cluster.

Atlan connects directly to your Apache Kafka cluster and crawls metadata over the network.

  1. For Bootstrap servers, enter one or more hostnames of your Apache Kafka brokers. Separate multiple hostnames with a comma (,) or semicolon (;).

  2. For Authentication, choose the method that matches your cluster configuration:

    • No Auth -- select this if your cluster doesn't require authentication.
    • Basic -- enter the username and password configured for Atlan using SASL/PLAIN.
    • SCRAM -- enter the username and password and choose the SCRAM mechanism (SCRAM-SHA-256 or SCRAM-SHA-512).
    • mTLS -- upload the client certificate and private key for mutual TLS authentication.
  3. For Security protocol, select Plaintext or SSL for No Auth, and SASL_PLAINTEXT or SASL_SSL for Basic and SCRAM authentication.

  4. To crawl Schema Registry subjects alongside Kafka, set Include Schema Registry to True and provide the following details:

    • For Schema registry host, enter the URL of your Schema Registry endpoint (for example, https://psrc-xxxxx.us-east-2.aws.confluent.cloud).
    • For API Key, enter the Schema Registry API key you created.
    • For API Secret, enter the Schema Registry API secret you created.
  5. Click Test Authentication to confirm connectivity, then click Next.
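The delimiter rule for the Bootstrap servers field (step 1) can be sketched as a small parser. This is purely illustrative; `parse_bootstrap_servers` is a hypothetical helper, not part of Atlan or Kafka:

```python
import re

def parse_bootstrap_servers(field):
    """Split a bootstrap-servers field on commas or semicolons,
    trimming whitespace and dropping empty entries."""
    return [h.strip() for h in re.split(r"[,;]", field) if h.strip()]

# Both delimiters yield the same broker list.
print(parse_bootstrap_servers("broker1:9092, broker2:9092; broker3:9092"))
# → ['broker1:9092', 'broker2:9092', 'broker3:9092']
```

Either delimiter, or a mix of both, resolves to the same list of brokers.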

Configure connection

Set up the connection identity and access controls for your Apache Kafka source.

  1. Provide a Connection Name that represents your source environment -- for example, production, development, gold, or analytics.
  2. Under Connection Admins, add the users or groups that can manage this connection. If you leave this field empty, no one can manage the connection, not even admins.
  3. At the bottom of the screen, click Next.

Configure crawling options

On the Metadata page, you can override the defaults for any of these options. If an asset appears in both include and exclude filters, the exclude filter takes precedence. When Schema Registry credentials are provided, the topic include/exclude regex also applies to schema subjects. Subjects are matched using their base topic name (stripping the -key or -value suffix).

  • For Skip internal topics, keep the default Yes to skip internal Apache Kafka topics, or select No to crawl them.
  • For Exclude topics regex, enter a regular expression matching the topics to exclude. If none is specified, no topics are excluded.
  • For Include topics regex, enter a regular expression to limit crawling to matching topics. If none is specified, all topics are crawled.
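The filtering rules above (exclude takes precedence over include, empty filters default to all/none, and Schema Registry subjects are matched by their base topic name) can be sketched in a few lines. The function names here are illustrative, not Atlan's implementation:

```python
import re

def base_topic(subject):
    """Strip the Schema Registry -key/-value suffix to get the topic name."""
    return re.sub(r"-(key|value)$", "", subject)

def is_included(name, include=None, exclude=None):
    """Apply include/exclude regexes; exclude wins on a conflict.
    An empty include means 'all topics'; an empty exclude means 'none'."""
    if exclude and re.search(exclude, name):
        return False
    return not include or bool(re.search(include, name))

# A subject is filtered by its base topic name.
print(base_topic("orders-value"))                            # → orders
print(is_included("orders", include=r"^orders", exclude=r"^orders$"))  # → False: exclude wins
```

Note that a subject such as `orders-value` and its topic `orders` are always filtered the same way, so a single pair of regexes governs both topics and schema subjects.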

Run crawler

After configuring all options, run or schedule the crawler.

  1. For Direct extraction, click Preflight checks to validate permissions and configuration before running. For Agent and Offline extraction, skip this step.
  2. Click Run to run the crawler once immediately, or click Schedule & Run to schedule the crawler to run hourly, daily, weekly, or monthly.

Once the crawler completes, the assets appear on Atlan's asset page.

See also