Crawl ClickHouse

Create a ClickHouse crawler workflow in Atlan to extract metadata from your database. The workflow configures the connection, extraction method, and crawl scope.

Prerequisites

Before you begin:

Create crawler workflow

  1. In the top right of any screen, click New and then New Workflow.
  2. Select ClickHouse Assets and click Setup Workflow.

Configure extraction

Select your extraction method and provide the connection details.

In Direct extraction, Atlan connects to your database and crawls metadata directly.

  1. Enter your ClickHouse connection details:

    • Host Name: Enter the host for your ClickHouse instance.
    • Port: Enter the port number of your ClickHouse HTTPS interface (default is 8443).
  2. Choose Basic authentication and enter the credentials you configured when setting up the ClickHouse user:

    • Username: Enter the username you configured in ClickHouse.
    • Password: Enter the password for the specified user.
  3. Click Test Authentication to confirm connectivity to ClickHouse using these details.

  4. When successful, at the bottom of the screen click Next.
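Before entering these details in Atlan, it can help to confirm independently that the host, port, and credentials work. The following is a minimal sketch and not part of Atlan. It assumes the ClickHouse HTTPS interface is reachable from the machine where you run it. The function names, host name, and credentials are hypothetical placeholders. It sends a `SELECT 1` query with HTTP Basic authentication to the HTTPS port.

```python
import base64
import urllib.parse
import urllib.request


def build_probe_request(host: str, username: str, password: str,
                        port: int = 8443) -> urllib.request.Request:
    """Build a `SELECT 1` request against the ClickHouse HTTPS interface."""
    # Percent-encode the query so the space becomes %20 rather than "+".
    query = urllib.parse.urlencode({"query": "SELECT 1"},
                                   quote_via=urllib.parse.quote)
    url = f"https://{host}:{port}/?{query}"
    # HTTP Basic authentication header: base64("username:password").
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    headers = {"Authorization": f"Basic {token.decode('ascii')}"}
    return urllib.request.Request(url, headers=headers)


def probe(host: str, username: str, password: str, port: int = 8443) -> bool:
    """Return True if the server answers the query with "1".

    Raises urllib.error.URLError (or HTTPError) if the host is unreachable
    or the credentials are rejected.
    """
    request = build_probe_request(host, username, password, port)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8").strip() == "1"
```

For example, `probe("clickhouse.example.com", "atlan_user", "your-password")` uses placeholder values that you would replace with your own. A network or authentication failure surfaces as an exception, which roughly corresponds to a failed Test Authentication in the step above.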

Configure connection

Complete the ClickHouse connection configuration:

  1. Enter a Connection Name that represents your source environment (for example, production, development, gold, or analytics).
  2. Under Connection Admins, add the users or groups that can manage this connection. If you leave this empty, no one can manage the connection, not even admins.
  3. At the bottom of the screen, click Next.

Configure crawler

Configure the crawl scope before running the crawler:

  • Click Include Metadata to choose which assets to include. If you select none, all assets are included.
  • Click Exclude Metadata to choose which assets to exclude (default is none). If an asset matches both include and exclude, the exclude filter takes precedence.
  • In Exclude regex for tables, enter a regular expression to ignore tables and views by naming convention.
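How the three filters combine can be sketched as follows. This is an illustration of the rules stated above, not Atlan's implementation. The function name, the use of database names as the include and exclude entries, and the choice to match the regular expression against the whole table name are all assumptions made for the example.

```python
import re
from typing import Iterable, Optional


def is_in_scope(database: str, table: str,
                include: Iterable[str] = (),
                exclude: Iterable[str] = (),
                exclude_table_regex: Optional[str] = None) -> bool:
    """Decide whether a table would be crawled under the given filters."""
    include_set, exclude_set = set(include), set(exclude)
    # Exclude takes precedence, so check it before the include filter.
    if database in exclude_set:
        return False
    # An empty include filter means "include everything".
    if include_set and database not in include_set:
        return False
    # Assumption: the regex must match the entire table name.
    if exclude_table_regex and re.fullmatch(exclude_table_regex, table):
        return False
    return True


# Hypothetical naming convention: skip temporary and backup tables.
pattern = r"(tmp_.*|.*_backup)"
for name in ("orders", "tmp_orders", "orders_backup"):
    print(name, is_in_scope("sales", name, exclude_table_regex=pattern))
```

With this pattern, only `orders` stays in scope. If your naming convention is a prefix or suffix, anchoring the expression explicitly (for example `^tmp_.*`) avoids depending on whether the matcher treats the pattern as a full or partial match.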

Run crawler

Configure the frequency of crawler runs: choose to run the crawler immediately or schedule it to run at an interval.

  1. For Direct extraction only, click Preflight checks to validate permissions and configuration before running.
  2. At the bottom of the screen, click Run to run once, or Schedule Run to run hourly, daily, weekly, or monthly.

After the crawler completes, the crawled assets appear on the assets page.

See also