Crawl MongoDB Atlas
Create a crawler workflow to automatically discover and catalog your MongoDB Atlas assets, including databases, collections, and schema metadata.
Prerequisites
Before you begin, make sure you have:
- Configured MongoDB user permissions with the required metadata read access
- Connection details from your MongoDB deployment: SQL interface host name, MongoDB native host, default database, authentication database, and credentials
- Reviewed the order of operations for workflow execution
Create crawler workflow
Create a new MongoDB Atlas crawler workflow in Atlan by selecting the connector package, configuring your extraction method and connection details, and running the crawler to extract metadata.
- In the top right of any screen, navigate to New and then click New Workflow.
- From the list of packages, select MongoDB Atlas Assets and click Setup Workflow.
Configure extraction
Select your extraction method and provide the connection details.
- Direct
- Agent
In Direct extraction, Atlan connects to your database and crawls metadata directly.
- For SQL interface host name, enter the host name of the SQL (or JDBC) endpoint you copied from your MongoDB database.
- For Authentication, Basic is the default method.
- For Username, enter the username you created in your MongoDB database.
- For Password, enter the password you created for the username.
- For MongoDB native host, enter the host name you copied from your MongoDB database.
- For Default database, enter the name of the default database you copied from your MongoDB database.
- For Authentication database, enter the name of the authentication database you copied. `admin` is the default; see authentication databases in MongoDB.
- For SSL, keep Yes to connect via SSL or click No.
- Click Test Authentication to confirm connectivity to MongoDB, then click Next.
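The connection details above map onto a standard MongoDB connection string. A minimal sketch of how they fit together, using Python's standard library only; the host, username, password, and the exact URI scheme the crawler uses are placeholder assumptions, not values from your deployment:

```python
from urllib.parse import quote_plus

def build_mongo_uri(host, username, password, auth_db="admin", use_ssl=True):
    """Assemble a MongoDB URI from the same details the crawler form asks for.

    All argument values passed below are hypothetical examples; substitute
    the host and credentials you copied from your own MongoDB deployment.
    """
    creds = f"{quote_plus(username)}:{quote_plus(password)}"  # escape special chars
    tls = "true" if use_ssl else "false"
    return f"mongodb+srv://{creds}@{host}/?authSource={auth_db}&tls={tls}"

uri = build_mongo_uri("cluster0.example.mongodb.net", "atlan_user", "s3cret")
print(uri)
# mongodb+srv://atlan_user:s3cret@cluster0.example.mongodb.net/?authSource=admin&tls=true
```

Note that `quote_plus` matters here: passwords containing characters such as `@` or `:` must be percent-encoded before they are embedded in a URI.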
In Agent extraction, Self-Deployed Runtime executes metadata extraction within your organization's environment.
- Install Self-Deployed Runtime if you haven't already:
- Select the Agent tab.
- Store sensitive information in the secret store configured with the Self-Deployed Runtime and reference the secrets in the corresponding fields.
- For details on individual fields, refer to the Direct extraction tab.
- Click Next after completing the configuration.
Configure connection
Set up connection details including a descriptive name and admin access.
- Provide a Connection Name that represents your source environment. For example, you might use values like `production`, `development`, `gold`, or `analytics`.
- To change the users able to manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, including admins.
- Click Next at the bottom of the screen.
Configure crawler
Configure crawler settings to control which assets to include or exclude. If an asset appears in both filters, the exclude filter takes precedence.
On the Metadata Filters page, you can override the defaults. The options are the same for Direct and Agent extraction; when you use Agent extraction, filtering and document sampling run on your Self-Deployed Runtime.
- Direct
- Agent
- To select the assets you want to include in crawling, click Include Metadata. By default, all assets are included if none are specified.
- To select the assets you want to exclude from crawling, click Exclude Metadata. By default, no assets are excluded if none are specified.
- To have the crawler ignore collections by naming convention, enter a regular expression in the Exclude regex for collections field (for example `.*_TMP|.*_TEMP|TMP.*|TEMP.*`).
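To see which names that example pattern catches, here is a short sketch using Python's `re` module. The collection names are invented for illustration, and treating the pattern as a full match against each name is an assumption about the crawler's matching semantics:

```python
import re

# The example exclude pattern from the field above.
EXCLUDE_RE = re.compile(r".*_TMP|.*_TEMP|TMP.*|TEMP.*")

# Hypothetical collection names; assumes the crawler matches the
# whole name (fullmatch) rather than a substring.
collections = ["orders", "orders_TMP", "TEMP_staging", "users", "TMP_load"]
kept = [c for c in collections if not EXCLUDE_RE.fullmatch(c)]
print(kept)  # ['orders', 'users']
```

The pattern is an alternation, so it excludes names that end in `_TMP` or `_TEMP` as well as names that start with `TMP` or `TEMP`.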
In Agent extraction, Include Metadata and Exclude Metadata are available as filter options. If you configure via API or app configuration (for example when the workflow runs on the runtime), use these patterns:
- Include Metadata (Include Filter): A JSON object. Keys are database names or regex patterns; values are arrays of collection name patterns (regex). Use an empty array `[]` for a database to mean all collections. Syntax: `{"^DB1$": ["^COLL1$", "^COLL2$"]}`.
- Exclude Metadata (Exclude Filter): Same JSON syntax as the Include Filter. Only the selected databases and collections are excluded. To exclude collections by naming convention, use regex patterns in the value arrays. Syntax: `{"^DB1$": ["^COLL1$", "^COLL2$"]}`.
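A small sketch of how these JSON filters could be interpreted, following the description above (regex keys and values, empty array means all collections, exclude takes precedence over include). The database and collection names are invented, and the exact matching semantics the runtime applies are an assumption:

```python
import json
import re

# Hypothetical filters in the documented JSON syntax.
include = json.loads('{"^SALES$": ["^ORDERS$", "^INVOICES$"], "^HR$": []}')
exclude = json.loads('{"^SALES$": ["^INVOICES$"]}')

def matches(filter_obj, db, coll):
    """Return True if (db, coll) matches any entry in a filter object."""
    for db_pat, coll_pats in filter_obj.items():
        if re.fullmatch(db_pat, db):
            if not coll_pats:  # empty array: every collection in the database
                return True
            return any(re.fullmatch(p, coll) for p in coll_pats)
    return False

def crawled(db, coll):
    # Exclude wins over include, per the precedence rule above.
    return matches(include, db, coll) and not matches(exclude, db, coll)

print(crawled("SALES", "ORDERS"))    # True
print(crawled("SALES", "INVOICES"))  # False: excluded
print(crawled("HR", "PAYROLL"))      # True: empty array includes all
```

Anchoring the patterns with `^` and `$`, as in the documented syntax, keeps a pattern like `^DB1$` from accidentally matching `DB10`.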
Run crawler
Run preflight checks to validate your configuration, then execute the crawler immediately or schedule it to run on a recurring basis.
- To verify permissions and configuration before running, click Preflight checks. This option is available for Direct extraction only.
- Choose your run option:
- To run the crawler once immediately, click Run at the bottom of the screen.
- To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule & Run at the bottom of the screen.
Once the crawler completes, you can view the assets in Atlan's asset page.
Need help?
If you encounter issues, refer to Troubleshooting MongoDB Atlas connectivity to resolve common connection errors. You can also contact the Atlan support team by submitting a support request.
See also
- What does Atlan crawl from MongoDB Atlas: Complete reference of assets and metadata discovered during crawling
- Preflight checks for MongoDB Atlas: Validation checks for permissions and configuration before running the crawler