Crawl on-premises Kafka
Once you have set up the kafka-extractor tool, you can extract metadata from your on-premises Kafka instances by completing the following steps.
Run kafka-extractor
Crawl all Kafka connections
To crawl all Kafka connections using the kafka-extractor tool:
- Log into the server with Docker Compose installed.
- Change to the directory containing the compose file.
- Run Docker Compose:
sudo docker-compose up
Crawl a specific connection
To crawl a specific Kafka connection using the kafka-extractor tool:
- Log into the server with Docker Compose installed.
- Change to the directory containing the compose file.
- Run Docker Compose:
sudo docker-compose up <connection-name>
(Replace <connection-name> with the name of the connection from the services section of the compose file.)
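For reference, the services section of a compose file might look like the following sketch, where each service corresponds to one Kafka connection. The service name, image, and volume mount below are illustrative placeholders, not the tool's actual configuration — use the compose file produced during your kafka-extractor setup.

```yaml
version: "3.8"
services:
  kafka-example:                     # connection name passed to `docker-compose up kafka-example`
    image: <kafka-extractor-image>   # image from your kafka-extractor setup
    volumes:
      - ./output:/output             # extracted JSON files land here
```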
(Optional) Review generated files
The kafka-extractor tool will generate a folder of JSON files for each service. For example:
- topics
- topic-configs
- consumer-groups
- consumer-groups-members
- and many others
You can inspect these files to confirm the extracted metadata is acceptable before providing it to Atlan.
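As a quick sanity check before uploading, you can verify that every generated file parses as valid JSON. This is a minimal sketch; the `validate_metadata` helper and the `output/kafka-example` path are illustrative assumptions, not part of the tool:

```python
import json
from pathlib import Path

def validate_metadata(output_dir: str) -> tuple[int, int]:
    """Try to parse every JSON file under output_dir.

    Returns a (valid, invalid) count so you can spot truncated or
    corrupted extraction output before uploading it.
    """
    valid, invalid = 0, 0
    for path in Path(output_dir).rglob("*.json"):
        try:
            json.loads(path.read_text())
            valid += 1
        except json.JSONDecodeError:
            invalid += 1
    return valid, invalid

# Example: validate_metadata("output/kafka-example")
```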
Upload generated files to object storage
To provide Atlan access to the extracted metadata, you need to upload the metadata to object storage.
- AWS S3
- Google Cloud Storage
To upload the metadata to S3:
- Make sure all files for a particular connection have the same prefix.
- Upload the files to the S3 bucket using your preferred method. Include all the files from the output folder generated after running Docker Compose.
For example, to upload all files using the AWS CLI:
aws s3 cp output/kafka-example s3://my-bucket/metadata/kafka-example --recursive
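The same-prefix requirement simply means that every object for a connection must land under one key prefix. If you want to preview the keys a recursive copy like the one above would create, a small sketch (the `planned_keys` helper and the example paths are illustrative assumptions):

```python
from pathlib import Path

def planned_keys(output_dir: str, prefix: str) -> list[str]:
    """Map each local file under output_dir to the object key it would
    receive under prefix, mirroring a recursive directory copy."""
    root = Path(output_dir)
    return sorted(
        f"{prefix}/{p.relative_to(root).as_posix()}"
        for p in root.rglob("*")
        if p.is_file()
    )

# Example: planned_keys("output/kafka-example", "metadata/kafka-example")
```

Every key returned shares the `metadata/kafka-example` prefix, which is what the crawler expects.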
To upload the metadata to GCS:
- Make sure all files for a particular connection have the same prefix.
- Upload the files to the GCS bucket using your preferred method. Include all the files from the output folder generated after running Docker Compose.
For example, to upload all files using the gcloud CLI:
gcloud storage cp output/kafka-example gs://my-bucket/metadata/kafka-example --recursive
Crawl metadata in Atlan
Once you have extracted metadata on-premises and uploaded the results to object storage, you can crawl the metadata into Atlan:
- How to crawl Apache Kafka
- How to crawl Confluent Kafka
- How to crawl Aiven Kafka
- How to crawl Redpanda Kafka
Be sure to select Offline as the Extraction method.