Crawl Starburst Enterprise
Create a crawler workflow to automatically discover and catalog your Starburst Enterprise assets in Atlan. The crawler extracts the following metadata:
- Catalogs, schemas, tables, views, materialized views, and columns
- Domains, data products, datasets, and dataset columns
Prerequisites
Before you begin, make sure you have:
- Configured Starburst Enterprise user permissions
- Created an Atlan API token with Admin privileges
- Reviewed the order of operations for connection workflows
- Obtained your Starburst Enterprise host, port, and authentication credentials
Create crawler workflow
- In the top right of any screen, navigate to New and then click New Workflow.
- From the list of packages, select Starburst Enterprise and click Setup Workflow.
Configure extraction
Select your extraction method and provide the connection details.
- Direct
- Agent
In Direct extraction, Atlan connects to your Starburst Enterprise instance and crawls metadata directly.
- For Host, enter the hostname of your Starburst Enterprise coordinator.
- For Port, enter the port (default: 443).
- For HTTP Scheme, select HTTPS (recommended) or HTTP.
- For Authentication Method, select either Basic or LDAP:
- Basic: Enter the username and password you created for password file authentication.
- LDAP: Enter your LDAP username and password.
- For Role, enter the Trino role to use (default: `sysadmin`). If you created a dedicated role for Atlan (for example, `atlan_metadata_reader`), enter that role name.
- For Atlan API Token, enter the API token you created during setup. The token must belong to a user with the Admin role.
- For Verify SSL, keep True to validate TLS certificates or change to False for self-signed certificates.
- Click Test Authentication to verify connectivity to your Starburst Enterprise instance.
- Once successful, click Next.
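If Test Authentication fails for a Direct connection, you can first confirm that the coordinator is reachable from your network. A minimal sketch, assuming the coordinator exposes the standard Trino `/v1/info` endpoint over the scheme and port you configured (the host shown is a placeholder):

```python
import json
import urllib.request


def info_url(host: str, port: int = 443, scheme: str = "https") -> str:
    """Build the coordinator's /v1/info URL from the connection details."""
    return f"{scheme}://{host}:{port}/v1/info"


def check_coordinator(host: str, port: int = 443, scheme: str = "https") -> str:
    """Fetch /v1/info and return the server version the coordinator reports."""
    with urllib.request.urlopen(info_url(host, port, scheme), timeout=10) as resp:
        return json.load(resp).get("nodeVersion", {}).get("version", "unknown")


# Example usage (hypothetical host):
# print(check_coordinator("starburst.example.com"))
```

If this request times out or is refused, the crawler's Test Authentication will fail for the same reason, independent of your credentials.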
In Agent extraction, Self-Deployed Runtime executes metadata extraction within your organization's environment. Use this when your Starburst Enterprise instance is behind a firewall and can't accept inbound connections from the internet.
- Install Self-Deployed Runtime if you haven't already.
- Select the Agent tab.
- Store sensitive information (username, password) in the secret store configured with the Self-Deployed Runtime and reference the secrets in the corresponding fields. For more information, see Configure secrets for workflow execution.
- For details on individual fields (host, port, HTTP scheme, role, Atlan API token, SSL verification), refer to the Direct extraction tab.
- Click Next after completing the configuration.
Configure connection
Set up your Starburst Enterprise connection name and define who can manage this connection in Atlan:
- Provide a Connection Name that represents your source environment. For example, you might use values like `production`, `development`, or `analytics`.
- To change the users able to manage this connection, change the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, including admins.
- Click Next to proceed.
Configure crawler
Customize the crawler settings to control which assets are extracted from your Starburst Enterprise instance.
- Direct
- Agent
- For Catalogs, use the tree picker to select the catalogs and schemas you want to crawl. If no selection is made, all accessible catalogs are crawled.
- For Domains, use the tree picker to select the domains and data products you want to crawl. If no selection is made, all accessible domains are crawled.
When using Agent extraction, tree pickers aren't available. Instead, enter catalog and domain filters as JSON strings using literal names (not regex patterns).
- For Include Catalogs & Schemas, enter a JSON object where keys are catalog names and values are arrays of schema names:

  ```json
  {"my_catalog": ["schema_a", "schema_b"], "other_catalog": ["public"]}
  ```

  To include all schemas in a catalog, use an empty array:

  ```json
  {"my_catalog": []}
  ```

  Leave the field empty or use `{}` to crawl all accessible catalogs.
- For Include Domains & Data Products, enter a JSON object where keys are domain names and values are arrays of data product names:

  ```json
  {"Sales": ["Order Information", "Customer Data"]}
  ```

  Leave the field empty or use `{}` to crawl all accessible domains.
Enter exact catalog, schema, domain, and data product names as they appear in Starburst Enterprise. Do not use regex anchors (for example, `^my_catalog$`). The filter values are passed directly to SQL queries and REST API calls as literal identifiers.
- For Allow editing data products, choose whether domains, data products, datasets, and dataset columns can be edited in Atlan:
- Yes (default): Domains, data products, datasets, and dataset columns are editable in the Atlan UI.
- No: Domains, data products, datasets, and dataset columns are published as read-only, preserving Starburst Enterprise as the source of truth.
- For Autoinclude catalog/schema, keep True (default) to automatically include the backing catalog and schema when a data product is selected. This prevents gaps in metadata for selected data products.
When a data product is selected and Autoinclude catalog/schema is enabled, the connector automatically includes the data product's backing catalog and schema in the extraction scope. This ensures you have both the data product assets and the underlying SQL assets for selected data products.
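When writing Agent filter JSON by hand, a malformed value (for example, a string where an array is expected) is an easy mistake. The shape described above can be checked locally before pasting it into the workflow; this is a hypothetical helper sketch, not part of the connector:

```python
import json


def validate_filter(raw: str) -> dict:
    """Parse a filter string and verify it maps names to arrays of names.

    An empty string or "{}" means "crawl everything accessible".
    """
    data = json.loads(raw or "{}")
    if not isinstance(data, dict):
        raise ValueError("filter must be a JSON object")
    for name, children in data.items():
        if not isinstance(name, str) or not isinstance(children, list):
            raise ValueError(f"{name!r}: value must be an array of names")
        if not all(isinstance(child, str) for child in children):
            raise ValueError(f"{name!r}: array entries must be strings")
    return data


print(validate_filter('{"my_catalog": ["schema_a"], "other_catalog": ["public"]}'))
print(validate_filter(""))  # → {}
```

Remember that the names themselves are matched literally by the connector, so this check catches only structural problems, not typos in catalog or domain names.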
Run crawler
Execute the crawler workflow:
- To check for any permissions or configuration issues before running the crawler, click Preflight checks. For details, see Preflight checks for Starburst Enterprise. Preflight checks are available for Direct extraction only.
- You can either:
  - To run the crawler once immediately, click Run.
  - To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule Run.
Once the crawler has finished running, you can view the crawled assets on the assets page in Atlan.
See also
- How Atlan connects to Starburst Enterprise: Understand connectivity options including Self-Deployed Runtime
- What does Atlan crawl from Starburst Enterprise?: Learn about the metadata and assets that Atlan extracts from Starburst Enterprise
- Data Product integration: Understand how SQL and data product assets are linked