Crawl Dremio

Configure and run the crawler to extract metadata from your Dremio data lakehouse assets.

Atlan crawls metadata from your Dremio data lakehouse, including physical datasets, virtual datasets, folders, spaces, and sources.

Prerequisites

Before you begin, make sure you have:

  • Completed Dremio setup
  • Admin access to your Atlan instance
  • Dremio connection details (host, credentials)

Create crawler workflow

To crawl metadata from Dremio, review the order of operations and then complete the following steps.

  1. In the top right of any screen, navigate to New and then click New Workflow.
  2. From the list of packages, select Dremio Assets and click Setup Workflow.

Configure authentication

  1. Host: Enter your Dremio server hostname or IP address.
  2. Authentication method: Choose one of the following:
    • Username/Password: Enter atlan-service as the username, along with its password.
    • Personal Access Token (PAT): Enter the token and leave the username empty.
  3. Click Test Connection to confirm connectivity to Dremio.
  4. Once successful, click Next.
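
The two authentication options above are mutually exclusive, which is easy to get wrong when preparing credentials ahead of time. The following is a minimal sketch, not part of Atlan or Dremio, that checks a set of values against the rules described in this step before you type them into the form. The class name DremioAuth, its field names, and the function validate_auth are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DremioAuth:
    """Hypothetical container mirroring the form fields in this step."""
    host: str
    username: Optional[str] = None
    password: Optional[str] = None
    pat: Optional[str] = None


def validate_auth(auth: DremioAuth) -> str:
    """Return the authentication method the values describe.

    Returns "pat" or "basic"; raises ValueError if the values
    do not match either option described in this step.
    """
    if not auth.host.strip():
        raise ValueError("host is required")
    if auth.pat:
        # PAT option: the username must be left empty.
        if auth.username:
            raise ValueError("leave username empty when using a PAT")
        return "pat"
    if auth.username and auth.password:
        # Username/Password option: both values are needed.
        return "basic"
    raise ValueError("provide either a username and password, or a PAT")
```

For example, validate_auth(DremioAuth(host="dremio.example.com", pat="my-token")) returns "pat", while supplying both a username and a PAT raises a ValueError. The hostname and token shown here are placeholders.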

Configure connection

Complete the connection configuration for your Dremio environment:

  1. Provide a Connection Name that represents your source environment. For example, you might want to use values like production, development, gold, or analytics.

  2. To change the users able to manage this connection, change the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, not even admins.

  3. To check for any permissions or configuration issues before running the crawler, click Preflight checks.

  4. At the bottom of the screen, click Next to proceed.

Run crawler

After completing the configuration:

  • To run the crawler once immediately, click Run at the bottom of the screen.
  • To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule & Run at the bottom of the screen.

Verify crawled assets

Once the crawler has completed running, you can see the assets on Atlan's assets page! 🎉

  1. Monitor the crawler progress in the Workflows section:
    • View real-time status updates
    • Check the Logs tab for detailed execution information
    • Wait for the status to show Success before proceeding
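
If you prefer to script the "wait for Success" step instead of watching the Workflows section, the pattern is a simple polling loop. The sketch below is illustrative only: get_status is a hypothetical callable that you would supply yourself to return the current workflow status as a string, and the failure status names are assumptions. Only the Success status comes from this guide.

```python
import time
from typing import Callable


def wait_for_success(get_status: Callable[[], str],
                     poll_seconds: float = 30.0,
                     max_polls: int = 120) -> str:
    """Poll get_status until it reports "Success".

    get_status is a hypothetical, caller-supplied function; this
    sketch makes no assumption about how the status is fetched.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "Success":
            return status
        # Assumed failure names; adjust to the statuses you actually see.
        if status in ("Failed", "Error"):
            raise RuntimeError(f"crawler ended with status {status!r}")
        time.sleep(poll_seconds)
    raise TimeoutError("crawler did not reach Success within the polling limit")
```

Capping the number of polls keeps a stalled run from blocking the script indefinitely. If the loop times out or raises, the Logs tab is the place to look for details.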

See also