Crawl Starburst Enterprise
Create a crawler workflow to automatically discover and catalog your Starburst Enterprise assets in Atlan. The crawler extracts the following metadata:
- Catalogs, schemas, tables, views, materialized views, and columns
- Domains, data products, datasets, and dataset columns
Prerequisites
Before you begin, make sure you have:
- Configured Starburst Enterprise user permissions
- Created an Atlan API token with Admin privileges
- Reviewed the order of operations for connection workflows
- Obtained your Starburst Enterprise host, port, and authentication credentials
Create crawler workflow
- In the top right of any screen, navigate to New and then click New Workflow.
- From the list of packages, select Starburst Enterprise and click Setup Workflow.
Configure extraction
Select your extraction method and provide the connection details.
- Direct
- Agent
In Direct extraction, Atlan connects to your Starburst Enterprise instance and crawls metadata directly.
- For Host, enter the hostname of your Starburst Enterprise coordinator.
- For Port, enter the port (default: 443).
- For HTTP Scheme, select HTTPS (recommended) or HTTP.
- For Authentication Method, select either Basic or LDAP:
- Basic: Enter the username and password you created for password file authentication.
- LDAP: Enter your LDAP username and password.
- For Role, enter the Trino role to use (default: `sysadmin`). If you created a dedicated role for Atlan (for example, `atlan_metadata_reader`), enter that role name.
- For Atlan API Token, enter the API token you created during setup. The token must belong to a user with the Admin role.
- For Verify SSL, keep True to validate TLS certificates or change to False for self-signed certificates.
- Click Test Authentication to verify connectivity to your Starburst Enterprise instance.
- Once successful, click Next.
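If Test Authentication fails for a Direct connection, you can first confirm that the coordinator is reachable from your network. A minimal sketch, assuming the coordinator exposes the standard Trino `/v1/info` endpoint over the scheme and port you configured (the host shown is a placeholder):

```python
import json
import urllib.request


def info_url(host: str, port: int = 443, scheme: str = "https") -> str:
    """Build the coordinator's /v1/info URL from the connection details."""
    return f"{scheme}://{host}:{port}/v1/info"


def check_coordinator(host: str, port: int = 443, scheme: str = "https") -> str:
    """Fetch /v1/info and return the server version the coordinator reports."""
    with urllib.request.urlopen(info_url(host, port, scheme), timeout=10) as resp:
        return json.load(resp).get("nodeVersion", {}).get("version", "unknown")


# Example usage (hypothetical host):
# print(check_coordinator("starburst.example.com"))
```

If this request times out or is refused, the crawler's Test Authentication will fail for the same reason, independent of your credentials.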
In Agent extraction, Self-Deployed Runtime executes metadata extraction within your organization's environment. Use this when your Starburst Enterprise instance is behind a firewall and can't accept inbound connections from the internet.
- Install Self-Deployed Runtime if you haven't already.
- Select the Agent tab.
- Store sensitive information (username, password) in the secret store configured with the Self-Deployed Runtime and reference the secrets in the corresponding fields. For more information, see Configure secrets for workflow execution.
- For details on individual fields (host, port, HTTP scheme, role, Atlan API token, SSL verification), refer to the Direct extraction tab.
- Click Next after completing the configuration.
Configure connection
Set up your Starburst Enterprise connection name and define who can manage this connection in Atlan:
- Provide a Connection Name that represents your source environment. For example, you might use values like `production`, `development`, or `analytics`.
- To change the users able to manage this connection, change the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, including admins.
- Click Next to proceed.
Configure crawler
Customize the crawler settings to control which assets are extracted from your Starburst Enterprise instance.
- Direct
- Agent
- For Catalogs, use the tree picker to select the catalogs and schemas you want to crawl. If no selection is made, all accessible catalogs are crawled.
- For Domains, use the tree picker to select the domains and data products you want to crawl. If no selection is made, all accessible domains are crawled.
When using Agent extraction, tree pickers aren't available. Instead, enter catalog and domain filters as JSON strings using literal names (not regex patterns).
- For Include Catalogs & Schemas, enter a JSON object where keys are catalog names and values are arrays of schema names:

  ```json
  {"my_catalog": ["schema_a", "schema_b"], "other_catalog": ["public"]}
  ```

  To include all schemas in a catalog, use an empty array:

  ```json
  {"my_catalog": []}
  ```

  Leave the field empty or use `{}` to crawl all accessible catalogs.
- For Include Domains & Data Products, enter a JSON object where keys are domain names and values are arrays of data product names:

  ```json
  {"Sales": ["Order Information", "Customer Data"]}
  ```

  Leave the field empty or use `{}` to crawl all accessible domains.
Enter exact catalog, schema, domain, and data product names as they appear in Starburst Enterprise. Do not use regex anchors (for example, `^my_catalog$`). The filter values are passed directly to SQL queries and REST API calls as literal identifiers.
- For Allow editing data products, choose whether domains, data products, datasets, and dataset columns can be edited in Atlan:
- Yes (default): Domains, data products, datasets, and dataset columns are editable in the Atlan UI.
- No: Domains, data products, datasets, and dataset columns are published as read-only, preserving Starburst Enterprise as the source of truth.
- For Autoinclude catalog/schema, keep True (default) to automatically include the backing catalog and schema when a data product is selected. This prevents gaps in metadata for selected data products.
When a data product is selected and Autoinclude catalog/schema is enabled, the connector automatically includes the data product's backing catalog and schema in the extraction scope. This ensures you have both the data product assets and the underlying SQL assets for selected data products.
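When writing Agent filter JSON by hand, a malformed value (for example, a string where an array is expected) is an easy mistake. The shape described above can be checked locally before pasting it into the workflow; this is a hypothetical helper sketch, not part of the connector:

```python
import json


def validate_filter(raw: str) -> dict:
    """Parse a filter string and verify it maps names to arrays of names.

    An empty string or "{}" means "crawl everything accessible".
    """
    data = json.loads(raw or "{}")
    if not isinstance(data, dict):
        raise ValueError("filter must be a JSON object")
    for name, children in data.items():
        if not isinstance(name, str) or not isinstance(children, list):
            raise ValueError(f"{name!r}: value must be an array of names")
        if not all(isinstance(child, str) for child in children):
            raise ValueError(f"{name!r}: array entries must be strings")
    return data


print(validate_filter('{"my_catalog": ["schema_a"], "other_catalog": ["public"]}'))
print(validate_filter(""))  # → {}
```

Remember that the names themselves are matched literally by the connector, so this check catches only structural problems, not typos in catalog or domain names.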
Run crawler
Execute the crawler workflow:
- To check for any permissions or configuration issues before running the crawler, click Preflight checks. For details, see Preflight checks for Starburst Enterprise. Preflight checks are available for Direct extraction only.
- You can either:
  - To run the crawler once immediately, click Run.
  - To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule Run.
Once the crawler has finished running, you can view the crawled assets on the assets page in Atlan.
See also
- How Atlan connects to Starburst Enterprise: Understand connectivity options including Self-Deployed Runtime
- What does Atlan crawl from Starburst Enterprise?: Learn about the metadata and assets that Atlan extracts from Starburst Enterprise
- Data Product integration: Understand how SQL and data product assets are linked