Crawl CrateDB
Extract metadata from your CrateDB database and make it available in Atlan for data discovery, governance, and lineage tracking. This guide walks you through setting up authentication and running your first crawl.
Prerequisites
Before you begin, make sure you have:
- CrateDB set up with the required user permissions
- Network connectivity between Atlan and your CrateDB instance
- Your CrateDB cluster HTTP endpoint and port information
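Before configuring the workflow, it can help to confirm that the endpoint, port, and credentials work from a machine with comparable network access. The following is a minimal, unofficial sketch in Python: it builds a request against CrateDB's HTTP `/_sql` endpoint with Basic authentication and runs `SELECT 1`. The host, port 4200 (CrateDB's default HTTP port), username, and password are placeholder assumptions, and the helper names `build_sql_request` and `check_connection` are hypothetical; none of this is part of Atlan.

```python
import base64
import json
import urllib.request


def build_sql_request(host, port, username, password, stmt="SELECT 1"):
    """Build a POST request for CrateDB's HTTP /_sql endpoint with Basic auth."""
    # Placeholder values are supplied by the caller; nothing here is Atlan-specific.
    url = f"{host.rstrip('/')}:{port}/_sql"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def check_connection(request, timeout=10):
    """Send the request; return True if the response contains a result set."""
    # Raises urllib.error.URLError / HTTPError on network or auth failures.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return "rows" in payload
```

For example, `check_connection(build_sql_request("https://your-cluster.crate.io", 4200, "atlan_user", "change-me"))` sends the query with those placeholder credentials. An HTTP 401 error at this stage usually points to a username or password problem rather than a network one.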
Set up workflow
Create a new CrateDB Assets workflow to extract metadata from your database.
- Select New > New Workflow.
- From the list of packages, select CrateDB Assets.
- Click Setup Workflow.
Configure extraction method
Choose how to connect to your CrateDB environment:
- Direct extraction
- Agent extraction
- For direct extraction, select Direct for the extraction method.
- Enter your CrateDB connection details:
- Host: Your CrateDB cluster HTTP endpoint (for example, https://your-cluster.crate.io)
- Port: The port number of your CrateDB instance
- Authentication: Choose Basic authentication
- Username: Enter the username you configured in CrateDB
- Password: Enter the password you configured in CrateDB
- Database: Enter the name of the database to crawl
- Click Test Authentication to confirm connectivity to CrateDB using these details.
- When successful, click Next.
- For agent extraction, select Agent for the extraction method.
- Add the secret keys for your secret store configuration.
- Follow the Secure Agent configuration guide.
- Click Next.
Configure connection details
- Enter a Connection Name to identify your CrateDB environment. For example, production-cratedb, analytics-db, or data-warehouse.
- Assign Connection Admins to manage access. At least one admin is required.
Configure crawler settings
Before running the CrateDB crawler, you can configure additional settings:
- Exclude Metadata: Select assets you want to exclude from crawling
- Include Metadata: Select assets you want to include in crawling
- Exclude regex for tables & views: Specify a regular expression to ignore tables and views based on naming conventions
- Advanced Config:
- Enable Source Level Filtering: Enable schema-level filtering at source
- Use JDBC Internal Methods: Enable JDBC internal methods for data extraction
If an asset appears in both the include and exclude filters, the exclude filter takes precedence.
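To make the precedence rule concrete, here is a small illustrative sketch in Python of how the filters could combine. It is not Atlan's implementation: the function name `select_tables` is hypothetical, and it assumes that include and exclude filters apply to schema names, that an empty include filter means "include everything", and that the exclude regex is matched anywhere in the table or view name.

```python
import re


def select_tables(tables, include_schemas=None, exclude_schemas=None, exclude_regex=None):
    """Return the (schema, table) pairs that would be crawled.

    Assumptions (illustrative only): filters act on schema names, an empty
    include filter includes everything, and the regex is searched within
    the table or view name.
    """
    include_schemas = set(include_schemas or [])
    exclude_schemas = set(exclude_schemas or [])
    pattern = re.compile(exclude_regex) if exclude_regex else None

    selected = []
    for schema, table in tables:
        if include_schemas and schema not in include_schemas:
            continue  # schema is not in the include filter
        if schema in exclude_schemas:
            continue  # exclude takes precedence, even if also included
        if pattern and pattern.search(table):
            continue  # table or view name matches the exclude regex
        selected.append((schema, table))
    return selected


tables = [
    ("sales", "orders"),
    ("sales", "tmp_orders"),
    ("staging", "events"),
    ("hr", "people"),
]
result = select_tables(
    tables,
    include_schemas={"sales", "staging"},
    exclude_schemas={"staging"},
    exclude_regex=r"^tmp_",
)
print(result)
```

In this example, `staging` appears in both filters and is dropped because exclusion wins, `tmp_orders` is dropped by the regex, and `hr` is dropped because it is not in the include filter, so only `("sales", "orders")` remains.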
Run crawler
You can now start extracting metadata from your CrateDB database:
- Run now: Click Run to start a one-time crawl.
- Schedule runs: Click Schedule Run to automate recurring crawls (hourly, daily, weekly, or monthly).
Monitor crawl progress in the activity log. Once complete, your CrateDB assets appear in Atlan.
Troubleshooting
If you encounter connection or authentication issues, see Connection issues.