Crawl on-premises Kafka
Once you have set up the kafka-extractor tool, you can extract metadata from your on-premises Kafka instances by completing the following steps.
Run kafka-extractor
Crawl all Kafka connections
To crawl all Kafka connections using the kafka-extractor tool:
- Log into the server with Docker Compose installed.
- Change to the directory containing the compose file.
- Run Docker Compose:
sudo docker-compose up
Crawl a specific connection
To crawl a specific Kafka connection using the kafka-extractor tool:
- Log into the server with Docker Compose installed.
- Change to the directory containing the compose file.
- Run Docker Compose:
sudo docker-compose up <connection-name>
(Replace <connection-name> with the name of the connection from the services section of the compose file.)
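For reference, the services section of a compose file might look like the following sketch, where each service corresponds to one Kafka connection. The service name, image, and volume mount below are illustrative placeholders, not the tool's actual configuration — use the compose file produced during your kafka-extractor setup.

```yaml
version: "3.8"
services:
  kafka-example:                     # connection name passed to `docker-compose up kafka-example`
    image: <kafka-extractor-image>   # image from your kafka-extractor setup
    volumes:
      - ./output:/output             # extracted JSON files land here
```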
(Optional) Review generated files
The kafka-extractor tool will generate a folder of JSON files for each service. For example:
- topics
- topic-configs
- consumer-groups
- consumer-groups-members
- and many others
You can inspect these files to confirm the extracted metadata is acceptable before providing it to Atlan.
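As a quick sanity check before uploading, you can verify that every generated file parses as valid JSON. This is a minimal sketch; the `validate_metadata` helper and the `output/kafka-example` path are illustrative assumptions, not part of the tool:

```python
import json
from pathlib import Path

def validate_metadata(output_dir: str) -> tuple[int, int]:
    """Try to parse every JSON file under output_dir.

    Returns a (valid, invalid) count so you can spot truncated or
    corrupted extraction output before uploading it.
    """
    valid, invalid = 0, 0
    for path in Path(output_dir).rglob("*.json"):
        try:
            json.loads(path.read_text())
            valid += 1
        except json.JSONDecodeError:
            invalid += 1
    return valid, invalid

# Example: validate_metadata("output/kafka-example")
```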
Upload generated files to object storage
To provide Atlan access to the extracted metadata, you need to upload the metadata to object storage.
- AWS S3
- Google Cloud Storage
To upload the metadata to S3:
- Make sure all files for a particular connection have the same prefix.
- Upload the files to the S3 bucket using your preferred method. Include all the files from the output folder generated after running Docker Compose.
For example, to upload all files using the AWS CLI:
aws s3 cp output/kafka-example s3://my-bucket/metadata/kafka-example --recursive
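The same-prefix requirement simply means that every object for a connection must land under one key prefix. If you want to preview the keys a recursive copy like the one above would create, a small sketch (the `planned_keys` helper and the example paths are illustrative assumptions):

```python
from pathlib import Path

def planned_keys(output_dir: str, prefix: str) -> list[str]:
    """Map each local file under output_dir to the object key it would
    receive under prefix, mirroring a recursive directory copy."""
    root = Path(output_dir)
    return sorted(
        f"{prefix}/{p.relative_to(root).as_posix()}"
        for p in root.rglob("*")
        if p.is_file()
    )

# Example: planned_keys("output/kafka-example", "metadata/kafka-example")
```

Every key returned shares the `metadata/kafka-example` prefix, which is what the crawler expects.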
To upload the metadata to GCS:
- Make sure all files for a particular connection have the same prefix.
- Upload the files to the GCS bucket using your preferred method. Include all the files from the output folder generated after running Docker Compose.
For example, to upload all files using the gcloud CLI:
gcloud storage cp output/kafka-example gs://my-bucket/metadata/kafka-example --recursive
Crawl metadata in Atlan
Once you have extracted metadata on-premises and uploaded the results to object storage, you can crawl the metadata into Atlan:
- How to crawl Apache Kafka
- How to crawl Confluent Kafka
- How to crawl Aiven Kafka
- How to crawl Redpanda Kafka
Be sure to select Offline as the Extraction method.