Crawl Starburst Enterprise

Create a crawler workflow to automatically discover and catalog your Starburst Enterprise assets in Atlan. The crawler extracts the following metadata:

  • Catalogs, schemas, tables, views, materialized views, and columns
  • Domains, data products, datasets, and dataset columns

Prerequisites

Before you begin, make sure you have:

Create crawler workflow

  1. In the top right of any screen, navigate to New and then click New Workflow.
  2. From the list of packages, select Starburst Enterprise and click Setup Workflow.

Configure extraction

Select your extraction method and provide the connection details.

In Direct extraction, Atlan connects to your Starburst Enterprise instance and crawls metadata directly.

  1. For Host, enter the hostname of your Starburst Enterprise coordinator.
  2. For Port, enter the port (default 443).
  3. For HTTP Scheme, select HTTPS (recommended) or HTTP.
  4. For Authentication Method, select either Basic or LDAP.
  5. For Role, enter the Trino role to use (default sysadmin). If you created a dedicated role for Atlan (for example, atlan_metadata_reader), enter that role name.
  6. For Atlan API Token, enter the API token you created during setup. The token must belong to a user with the Admin role.
  7. For Verify SSL, keep True to validate TLS certificates, or change it to False if your instance uses a self-signed certificate.
  8. Click Test Authentication to verify connectivity to your Starburst Enterprise instance.
  9. Once successful, click Next.
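
As a rough illustration of how the fields above fit together, the following Python sketch collects them in one settings object and derives the coordinator URL from the host, port, and HTTP scheme. The class name StarburstSettings, its field names, the base_url helper, and the example hostname are all hypothetical; they are not part of Atlan or Starburst Enterprise. Only the defaults (port 443, HTTPS, role sysadmin, Verify SSL set to True) come from the steps above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StarburstSettings:
    """Hypothetical container mirroring the form fields (not an Atlan API)."""

    host: str
    port: int = 443              # default port from the form
    http_scheme: str = "https"   # HTTPS is recommended; "http" is also allowed
    auth_method: str = "basic"   # "basic" or "ldap"
    role: str = "sysadmin"       # default Trino role
    verify_ssl: bool = True      # set to False only for self-signed certificates

    def __post_init__(self):
        # Reject values the form would not offer.
        if self.http_scheme not in ("https", "http"):
            raise ValueError(f"unsupported HTTP scheme: {self.http_scheme}")
        if self.auth_method not in ("basic", "ldap"):
            raise ValueError(f"unsupported authentication method: {self.auth_method}")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")

    def base_url(self) -> str:
        """Build the coordinator URL from scheme, host, and port."""
        return f"{self.http_scheme}://{self.host}:{self.port}"


# Example with a placeholder hostname and a dedicated role.
settings = StarburstSettings(host="starburst.example.com", role="atlan_metadata_reader")
print(settings.base_url())
```

Writing the values down in this form before opening the workflow can help catch a mismatched scheme or port before you click Test Authentication.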

Configure connection

Set up your Starburst Enterprise connection name and define who can manage this connection in Atlan:

  1. Provide a Connection Name that represents your source environment. For example, you might use values like production, development, or analytics.
  2. To change who can manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, not even admins.
  3. Click Next to proceed.

Configure crawler

Customize the crawler settings to control which assets are extracted from your Starburst Enterprise instance.

  1. For Catalogs, use the tree picker to select the catalogs and schemas you want to crawl. If no selection is made, all accessible catalogs are crawled.

  2. For Domains, use the tree picker to select the domains and data products you want to crawl. If no selection is made, all accessible domains are crawled.

  3. For Allow editing data products, choose whether domains, data products, datasets, and dataset columns can be edited in Atlan:

    • Yes (default): Domains, data products, datasets, and dataset columns are editable in the Atlan UI.
    • No: Domains, data products, datasets, and dataset columns are published as read-only, preserving Starburst Enterprise as the source of truth.
  4. For Autoinclude catalog/schema, keep True (default) to automatically include the backing catalog and schema when a data product is selected. This prevents gaps in metadata for selected data products.

Autoinclude behavior

When a data product is selected and Autoinclude catalog/schema is enabled, the connector automatically includes the data product's backing catalog and schema in the extraction scope. This ensures you have both the data product assets and the underlying SQL assets for selected data products.
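
The autoinclude rule can be pictured as a small set operation. The sketch below is a simplified model only, not the connector's actual implementation: the DataProduct class, the resolve_sql_scope function, and the example catalog, schema, and product names are hypothetical. It also leaves out the case where nothing is selected and all accessible catalogs are crawled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical model: a data product and the catalog/schema backing it."""

    name: str
    catalog: str
    schema: str


def resolve_sql_scope(selected_schemas, selected_products, autoinclude=True):
    """Return the set of (catalog, schema) pairs to crawl.

    With autoinclude enabled, the backing catalog and schema of every
    selected data product is added to the explicitly selected schemas.
    """
    scope = set(selected_schemas)
    if autoinclude:
        scope.update((p.catalog, p.schema) for p in selected_products)
    return scope


# One schema picked by hand, plus one data product backed by another schema.
products = [DataProduct(name="sales_360", catalog="lakehouse", schema="sales")]
picked = {("hive", "web")}

print(sorted(resolve_sql_scope(picked, products)))
# [('hive', 'web'), ('lakehouse', 'sales')]
print(sorted(resolve_sql_scope(picked, products, autoinclude=False)))
# [('hive', 'web')]
```

In this model, disabling autoinclude leaves the data product selected but drops its backing schema from the SQL scope, which is the metadata gap the default setting is meant to prevent.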

Run crawler

Execute the crawler workflow:

  1. To check for any permissions or configuration issues before running the crawler, click Preflight checks. For details, see Preflight checks for Starburst Enterprise. Preflight checks are available for Direct extraction only.

  2. Choose how to run the crawler:

    • To run the crawler once immediately, click Run.
    • To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule Run.

Once the crawler has finished running, you can see the assets on Atlan's asset page.
