Crawl MongoDB Atlas
Create a crawler workflow to automatically discover and catalog your MongoDB Atlas assets, including databases, collections, and schema metadata.
Prerequisites
Before you begin, make sure you have:
- Configured MongoDB user permissions with the required metadata read access
- Connection details from your MongoDB deployment: SQL interface host name, MongoDB native host, default database, authentication database, and credentials
- Reviewed the order of operations for workflow execution
Create crawler workflow
Create a new MongoDB Atlas crawler workflow in Atlan by selecting the connector package, configuring your extraction method and connection details, and running the crawler to extract metadata.
- In the top right of any screen, navigate to New and then click New Workflow.
- From the list of packages, select MongoDB Atlas Assets and click Setup Workflow.
Configure extraction
Select your extraction method and provide the connection details.
- Direct
- Agent
In Direct extraction, Atlan connects to your database and crawls metadata directly.
- For SQL interface host name, enter the host name of the SQL (or JDBC) endpoint you copied from your MongoDB database.
- For Authentication, Basic is the default method.
- For Username, enter the username you created in your MongoDB database.
- For Password, enter the password you created for the username.
- For MongoDB native host, enter the host name you copied from your MongoDB database.
- For Default database, enter the name of the default database you copied from your MongoDB database.
- For Authentication database, enter the name of the authentication database you copied. `admin` is the default; see authentication databases in MongoDB.
- For SSL, keep Yes to connect via SSL or click No.
- Click Test Authentication to confirm connectivity to MongoDB, then click Next.
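The connection details above map onto a standard MongoDB connection string. A minimal sketch of how they fit together, using Python's standard library only; the host, username, password, and the exact URI scheme the crawler uses are placeholder assumptions, not values from your deployment:

```python
from urllib.parse import quote_plus

def build_mongo_uri(host, username, password, auth_db="admin", use_ssl=True):
    """Assemble a MongoDB URI from the same details the crawler form asks for.

    All argument values passed below are hypothetical examples; substitute
    the host and credentials you copied from your own MongoDB deployment.
    """
    creds = f"{quote_plus(username)}:{quote_plus(password)}"  # escape special chars
    tls = "true" if use_ssl else "false"
    return f"mongodb+srv://{creds}@{host}/?authSource={auth_db}&tls={tls}"

uri = build_mongo_uri("cluster0.example.mongodb.net", "atlan_user", "s3cret")
print(uri)
# mongodb+srv://atlan_user:s3cret@cluster0.example.mongodb.net/?authSource=admin&tls=true
```

Note that `quote_plus` matters here: passwords containing characters such as `@` or `:` must be percent-encoded before they are embedded in a URI.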
In Agent extraction, Self-Deployed Runtime executes metadata extraction within your organization's environment.
- Install Self-Deployed Runtime if you haven't already:
- Select the Agent tab.
- Store sensitive information in the secret store configured with the Self-Deployed Runtime and reference the secrets in the corresponding fields.
- For details on individual fields, refer to the Direct extraction tab.
- Click Next after completing the configuration.
Configure connection
Set up connection details including a descriptive name and admin access.
- Provide a Connection Name that represents your source environment. For example, you might use values like `production`, `development`, `gold`, or `analytics`.
- To change the users able to manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, including admins.
- Click Next at the bottom of the screen.
Configure crawler
Configure crawler settings to control which assets to include or exclude. If an asset appears in both filters, the exclude filter takes precedence.
On the Metadata Filters page, you can override the defaults. The options are the same for Direct and Agent extraction; when you use Agent extraction, filtering and document sampling run on your Self-Deployed Runtime.
- Direct
- Agent
- To select the assets you want to include in crawling, click Include Metadata. By default, all assets are included if none are specified.
- To select the assets you want to exclude from crawling, click Exclude Metadata. By default, no assets are excluded if none are specified.
- To have the crawler ignore collections by naming convention, enter a regular expression in the Exclude regex for collections field (for example `.*_TMP|.*_TEMP|TMP.*|TEMP.*`).
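To see which names that example pattern catches, here is a short sketch using Python's `re` module. The collection names are invented for illustration, and treating the pattern as a full match against each name is an assumption about the crawler's matching semantics:

```python
import re

# The example exclude pattern from the field above.
EXCLUDE_RE = re.compile(r".*_TMP|.*_TEMP|TMP.*|TEMP.*")

# Hypothetical collection names; assumes the crawler matches the
# whole name (fullmatch) rather than a substring.
collections = ["orders", "orders_TMP", "TEMP_staging", "users", "TMP_load"]
kept = [c for c in collections if not EXCLUDE_RE.fullmatch(c)]
print(kept)  # ['orders', 'users']
```

The pattern is an alternation, so it excludes names that end in `_TMP` or `_TEMP` as well as names that start with `TMP` or `TEMP`.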
In Agent extraction, Include Metadata and Exclude Metadata are available as filter options. If you configure via API or app configuration (for example when the workflow runs on the runtime), use these patterns:
- Include Metadata (Include Filter): A JSON object. Keys are database names or regex patterns; values are arrays of collection name patterns (regex). Use an empty array `[]` for a database to mean all collections. Syntax: `{"^DB1$": ["^COLL1$", "^COLL2$"]}`.
- Exclude Metadata (Exclude Filter): Same JSON syntax as the Include Filter. Only the selected databases and collections are excluded. To exclude collections by naming convention, use regex patterns in the value arrays. Syntax: `{"^DB1$": ["^COLL1$", "^COLL2$"]}`.
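A small sketch of how these JSON filters could be interpreted, following the description above (regex keys and values, empty array means all collections, exclude takes precedence over include). The database and collection names are invented, and the exact matching semantics the runtime applies are an assumption:

```python
import json
import re

# Hypothetical filters in the documented JSON syntax.
include = json.loads('{"^SALES$": ["^ORDERS$", "^INVOICES$"], "^HR$": []}')
exclude = json.loads('{"^SALES$": ["^INVOICES$"]}')

def matches(filter_obj, db, coll):
    """Return True if (db, coll) matches any entry in a filter object."""
    for db_pat, coll_pats in filter_obj.items():
        if re.fullmatch(db_pat, db):
            if not coll_pats:  # empty array: every collection in the database
                return True
            return any(re.fullmatch(p, coll) for p in coll_pats)
    return False

def crawled(db, coll):
    # Exclude wins over include, per the precedence rule above.
    return matches(include, db, coll) and not matches(exclude, db, coll)

print(crawled("SALES", "ORDERS"))    # True
print(crawled("SALES", "INVOICES"))  # False: excluded
print(crawled("HR", "PAYROLL"))      # True: empty array includes all
```

Anchoring the patterns with `^` and `$`, as in the documented syntax, keeps a pattern like `^DB1$` from accidentally matching `DB10`.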
Run crawler
Run preflight checks to validate your configuration, then execute the crawler immediately or schedule it to run on a recurring basis.
- To verify permissions and configuration before running, click Preflight checks. This option is available for Direct extraction only.
- Choose your run option:
- To run the crawler once immediately, click Run at the bottom of the screen.
- To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule & Run at the bottom of the screen.
Once the crawler completes, you can view the assets in Atlan's asset page.
Need help?
If you encounter issues, refer to Troubleshooting MongoDB Atlas connectivity to resolve common connection errors. You can also contact the Atlan support team by submitting a support request.
See also
- What does Atlan crawl from MongoDB Atlas: Complete reference of assets and metadata discovered during crawling
- Preflight checks for MongoDB Atlas: Validation checks for permissions and configuration before running the crawler