Crawl MongoDB (self-managed)

Create a MongoDB (self-managed) crawler workflow to extract and catalog metadata from your MongoDB databases, collections, and columns in Atlan. This guide walks you through configuring the connection, setting up the extraction method, and running the crawler.

Prerequisites

Before you begin, make sure you have:

Create crawler workflow

To create a MongoDB crawler workflow:

  1. In the top right of any screen in Atlan, navigate to +New and click New Workflow.
  2. From the Marketplace page, click MongoDB Assets.
  3. In the right panel, click Setup Workflow.

Choose extraction method

Choose your extraction method and provide the connection details.

In Direct extraction, Atlan connects to your database and crawls metadata directly.

  1. For MongoDB host name, enter the hostname or IP address of your MongoDB server. This is the network address where your MongoDB instance is running. For replica sets or sharded clusters, you can specify multiple hosts separated by commas. Learn more about MongoDB connection strings.

  2. For Port, enter the port number on which MongoDB is listening. The default port is 27017.

  3. For Username, enter the username of the database user you created for Atlan.

  4. For Password, enter the password for the database user specified in the Username field. The password is used for SCRAM authentication to verify your identity when connecting to MongoDB.

  5. For Authentication database, enter the name of the database where the user credentials are stored. Typically, this is admin, but it can be any database where the user was created. Learn more about authentication databases in MongoDB.

  6. For Authentication Mechanism, select the SCRAM authentication method your MongoDB server supports:

    • SCRAM-SHA-256 (recommended): Uses SHA-256 hashing algorithm for password verification. This is the default authentication mechanism for MongoDB 4.0 and later.
    • SCRAM-SHA-1: Uses SHA-1 hashing algorithm. Supported for backward compatibility with older MongoDB versions.

    The authentication mechanism must match what your MongoDB server is configured to use. Learn more about SCRAM authentication in MongoDB.

  7. For SSL, select whether to use SSL/TLS encryption for the connection:

    • Yes: Enables SSL/TLS encryption for secure communication between Atlan and your MongoDB server. Use this when your MongoDB instance requires encrypted connections.
    • No: Disables SSL/TLS encryption. Use this only if your MongoDB instance doesn't require encrypted connections.

  8. For CA certificate, if SSL is enabled, provide the Certificate Authority (CA) certificate that was used to sign your MongoDB server's certificate. Paste the raw contents of the certificate file, including the -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- lines.

  9. For Certificate key file, if SSL is enabled and your MongoDB server requires client certificates, provide the contents of the client certificate key file. This is used for mutual TLS (mTLS) authentication, where both the client and the server present certificates. This field is optional.

  10. Click the Test Authentication button to confirm connectivity to MongoDB.

  11. Once authentication is successful, navigate to the bottom of the screen and click Next.
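If Test Authentication fails, it can help to check the same values outside Atlan. The following is a minimal sketch, not Atlan's implementation: it assembles a standard MongoDB connection URI from the fields described above. The function name build_mongodb_uri and its parameter names are hypothetical; the URI options (authSource, authMechanism, tls, tlsCAFile, tlsCertificateKeyFile) are standard MongoDB connection string options. Note that the URI options expect file paths, whereas the Atlan form expects the pasted certificate contents.

```python
from urllib.parse import quote_plus, urlencode

# Mechanisms offered in the Authentication Mechanism field.
SUPPORTED_MECHANISMS = ("SCRAM-SHA-256", "SCRAM-SHA-1")


def build_mongodb_uri(hosts, port, username, password, auth_db="admin",
                      mechanism="SCRAM-SHA-256", tls=False,
                      ca_file=None, cert_key_file=None):
    """Build a MongoDB connection URI (hypothetical helper, for illustration).

    hosts is a comma-separated string, as in the MongoDB host name field.
    Assumption: every host listens on the same port.
    """
    if mechanism not in SUPPORTED_MECHANISMS:
        raise ValueError(f"unsupported authentication mechanism: {mechanism}")

    # Append the port to each host and drop stray whitespace.
    host_list = ",".join(f"{h.strip()}:{port}" for h in hosts.split(","))

    options = {
        "authSource": auth_db,
        "authMechanism": mechanism,
        "tls": "true" if tls else "false",
    }
    # Certificate options only apply when TLS is enabled.
    if tls and ca_file:
        options["tlsCAFile"] = ca_file
    if tls and cert_key_file:
        options["tlsCertificateKeyFile"] = cert_key_file

    # Percent-encode credentials so characters like '@' and ':' are safe.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{host_list}/?{urlencode(options)}"
```

With a driver such as PyMongo installed, the resulting string can be passed to pymongo.MongoClient, and a ping command against the admin database confirms that the credentials and TLS settings work from your own machine.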

Configure connection

To complete the MongoDB connection configuration:

  1. Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
  2. To change who can manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, no one can manage the connection, not even admins.
  3. Navigate to the bottom of the screen and click Next to proceed.

Configure crawler

Before running the MongoDB crawler, you can further configure it.

On the Metadata Filters page, you can override the defaults for any of these options. If an asset appears in both the include and exclude filters, the exclude filter takes precedence.

  • To select the databases you want to include in crawling, click Include Metadata. This defaults to all databases if none are specified.
  • To select the databases you want to exclude from crawling, click Exclude Metadata. This defaults to no databases if none are specified.
  • To have the crawler ignore Collections based on a naming convention within the included databases, specify a regular expression in the Exclude regex for collections field.
    • For example: _order*|customer*_
  • To set the number of documents to sample from each Collection for field inference, adjust the value in the Sampling Size field. For details on how this parameter affects extraction performance and field inference accuracy, see What does the sampling size workflow setting affect in the FAQ.
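The way these filters combine can be sketched in a few lines. This is an illustration of the rules described above, not Atlan's code: the function should_crawl and its parameters are hypothetical, and the use of re.search (a match anywhere in the collection name) is an assumption, because this guide doesn't specify how the regular expression is applied.

```python
import re


def should_crawl(database, collection, include_dbs=(), exclude_dbs=(),
                 exclude_collection_regex=None):
    """Decide whether a collection would be crawled (hypothetical helper)."""
    # The exclude filter takes precedence over the include filter.
    if database in exclude_dbs:
        return False
    # An empty include filter means all databases are included.
    if include_dbs and database not in include_dbs:
        return False
    # Assumption: the regex may match anywhere in the collection name.
    if exclude_collection_regex and re.search(exclude_collection_regex, collection):
        return False
    return True
```

For example, with include_dbs=("sales", "hr") and exclude_dbs=("hr",), nothing in hr is crawled even though hr also appears in the include filter.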

Run crawler

To run the MongoDB crawler, after completing the previous steps:

  • To run the crawler once immediately, click the Run button at the bottom of the screen.
  • To schedule the crawler to run hourly, daily, weekly, or monthly, click the Schedule & Run button at the bottom of the screen.

Once the crawler has finished running, you can see the assets on Atlan's asset page.

See also