Crawl ClickHouse
Create a ClickHouse crawler workflow in Atlan to extract metadata from your database. The workflow configures the connection, extraction method, and crawl scope.
Prerequisites
Before you begin:
- Complete Set up ClickHouse so user permissions and credentials are configured in ClickHouse.
- Review the order of operations for connection workflows.
Create crawler workflow
- In the top right of any screen, click New and then New Workflow.
- Select ClickHouse Assets and click Setup Workflow.
Configure extraction
Select your extraction method and provide the connection details.
- Direct
- Agent
In Direct extraction, Atlan connects to your database and crawls metadata directly.
- Enter your ClickHouse connection details:
  - Host Name: Enter the host for your ClickHouse instance.
  - Port: Enter the port number of your ClickHouse HTTPS interface (default is 8443).
- Choose Basic authentication and enter the credentials you configured when setting up the ClickHouse user:
  - Username: Enter the username you configured in ClickHouse.
  - Password: Enter the password for the specified user.
- Click Test Authentication to confirm connectivity to ClickHouse using these details.
- When successful, at the bottom of the screen click Next.
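If Test Authentication fails, it can help to check the same host, port, and credentials outside Atlan first. The following is a minimal sketch, not part of Atlan, that assumes the ClickHouse HTTPS interface is reachable from where the script runs and that the user can authenticate with HTTP Basic authentication. The function names and the example host are hypothetical; `/ping` and the `query` URL parameter belong to ClickHouse's HTTP interface.

```python
import base64
import urllib.request


def build_ping_url(host: str, port: int = 8443) -> str:
    """Return the URL of the ClickHouse HTTP(S) health endpoint."""
    return f"https://{host}:{port}/ping"


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic authentication header for the given credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def check_clickhouse(host: str, username: str, password: str,
                     port: int = 8443, timeout: float = 10.0) -> bool:
    """Return True if the server answers /ping and accepts the credentials.

    Raises urllib.error.URLError (or HTTPError) on network, TLS, or
    authentication failures, which usually carries the most useful detail.
    """
    # Step 1: /ping needs no credentials and replies "Ok." when the server is up.
    with urllib.request.urlopen(build_ping_url(host, port), timeout=timeout) as resp:
        if resp.read().strip() != b"Ok.":
            return False

    # Step 2: run a trivial query with the crawler user's credentials.
    query_url = f"https://{host}:{port}/?query=SELECT%201"
    request = urllib.request.Request(
        query_url, headers=basic_auth_header(username, password)
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().strip() == b"1"
```

For example, `check_clickhouse("clickhouse.example.com", "atlan_user", "...")` uses a placeholder host and username. A check that passes from your own machine does not guarantee that Atlan can reach the same host, because network allowlists may differ.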
In Agent extraction, Self-Deployed Runtime executes metadata extraction within your organization's environment.
- Install Self-Deployed Runtime if you haven't already.
- Select the Agent tab.
- Enter the ClickHouse connection details:
  - Host Name: Enter the host for your ClickHouse instance as reachable from within your network.
  - Port: Enter the port number of your ClickHouse HTTPS interface (default is 8443).
- Store sensitive information in the secret store configured with the Self-Deployed Runtime and reference the secrets in the corresponding fields. For more information, see Secret management.
- Click Next after completing the configuration.
Configure connection
Complete the ClickHouse connection configuration:
- Enter a Connection Name that represents your source environment (for example, production, development, gold, or analytics).
- Under Connection Admins, add the users or groups that can manage this connection. If you leave this empty, no one can manage the connection, including admins.
- At the bottom of the screen, click Next.
Configure crawler
Configure crawl scope before running the crawler:
- Click Include Metadata to choose which assets to include (default is all if none are specified).
- Click Exclude Metadata to choose which assets to exclude (default is none). If an asset matches both include and exclude, the exclude filter takes precedence.
- In Exclude regex for tables, enter a regular expression to ignore tables and views by naming convention.
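The way these three filters combine can be sketched in a few lines of Python. This is an illustration of the rules stated above, not Atlan's implementation: the `apply_scope` function and the table names are hypothetical, and it assumes the regular expression must match the whole table name. Check how Atlan evaluates the pattern before relying on anchoring behavior.

```python
import re


def apply_scope(tables, include=None, exclude=None, exclude_regex=None):
    """Return the table names that remain after applying the crawl scope.

    include: names to keep (empty or None means keep everything).
    exclude: names to drop; wins over include when a name is in both.
    exclude_regex: pattern for names to drop (assumed to be a full match).
    """
    include_set = set(include or [])
    exclude_set = set(exclude or [])
    pattern = re.compile(exclude_regex) if exclude_regex else None

    kept = []
    for name in tables:
        if include_set and name not in include_set:
            continue  # not selected by the include filter
        if name in exclude_set:
            continue  # exclude takes precedence over include
        if pattern and pattern.fullmatch(name):
            continue  # dropped by naming convention
        kept.append(name)
    return kept


tables = ["orders", "tmp_orders", "customers", "orders_bak"]

# Drop temporary and backup tables by naming convention.
print(apply_scope(tables, exclude_regex=r"tmp_.*|.*_bak"))

# A table listed in both include and exclude is excluded.
print(apply_scope(tables, include=["orders", "customers"], exclude=["customers"]))
```

Trying a pattern against a list of real table names in this way before entering it in the workflow helps catch an expression that is broader than intended.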
Run crawler
Configure the frequency of crawler runs: choose to run the crawler immediately or schedule it to run at an interval.
- In Direct mode only, click Preflight checks to validate permissions and configuration before running.
- At the bottom of the screen, click Run to run once, or Schedule Run to run hourly, daily, weekly, or monthly.
After the crawler completes, the crawled ClickHouse assets appear on the assets page.
See also
- What does Atlan crawl from ClickHouse: Assets and metadata discovered
- Preflight checks for ClickHouse: Prerequisites and validation