Crawl Cloudera Impala
Once you have configured the Cloudera Impala user permissions, you can establish a connection between Atlan and Cloudera Impala.
To crawl metadata from Cloudera Impala, review the order of operations and then complete the following steps.
Select the source
To select Cloudera Impala as your source:
- In the top right of any screen in Atlan, navigate to +New and click New Workflow.
- From the Marketplace page, click Cloudera Impala Assets.
- In the right panel, click Setup Workflow.
Provide your credentials
To enter your Cloudera Impala credentials:
- For Extraction method, Direct is the default selection.
- For Hostname, enter the host name of your Cloudera Impala coordinator or load balancer.
- For Authentication, select LDAP as the authentication method.
- For Username, enter the LDAP username that has access to Cloudera Impala.
- For Password, enter the password associated with the LDAP username.
- For SSL, keep Enabled to connect over a Secure Sockets Layer (SSL) channel, or click Disabled to connect without SSL.
- Click the Test Authentication button to confirm connectivity to Cloudera Impala.
- Once authentication is successful, navigate to the bottom of the screen and click Next.
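If Test Authentication fails, it can help to first rule out basic network problems from a machine that is allowed to reach the cluster. The following is a minimal sketch, not part of Atlan: the function name `can_reach`, the example hostname, and the port are all assumptions. Port 21050 is commonly Impala's HiveServer2 port, but confirm the value for your cluster. The check only verifies that a TCP connection (and optionally a TLS handshake) succeeds; it does not validate LDAP credentials.

```python
import socket
import ssl


def can_reach(hostname: str, port: int, use_ssl: bool = True, timeout: float = 5.0) -> bool:
    """Return True if a TCP (and, optionally, TLS) connection to hostname:port succeeds."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            if use_ssl:
                # Verifies the server certificate against the system trust store.
                context = ssl.create_default_context()
                with context.wrap_socket(sock, server_hostname=hostname):
                    pass
        return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and TLS errors.
        return False


if __name__ == "__main__":
    # Hypothetical host and port; replace with your coordinator or load balancer.
    print(can_reach("impala.example.com", 21050, use_ssl=True))
```

A `False` result with `use_ssl=True` but `True` with `use_ssl=False` suggests the problem lies with the TLS setup or certificate trust rather than with network reachability.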
Configure the connection
To complete the Cloudera Impala connection configuration:
- Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
- (Optional) To change the users who are able to manage this connection, change the users or groups listed under Connection Admins.
  Careful: If you do not specify any user or group, no one will be able to manage the connection, not even admins.
- Navigate to the bottom of the screen and click Next to proceed.
Configure the crawler
Before running the Cloudera Impala crawler, you can further configure it.
On the Metadata Filters page, you can override the defaults for any of these options:
- To include specific assets in crawling, click Include Metadata, and select the assets you want. If you don't select any, all assets will be included by default.
- To exclude specific assets from crawling, click Exclude Metadata, and choose the assets you want to omit. If you don't select any, no assets will be excluded.
If an asset appears in both the include and exclude filters, the exclude filter takes precedence.
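The include and exclude rules above can be sketched as a small function. This is an illustration of the described behavior only, not Atlan's implementation: the function name `is_crawled` and the asset names are hypothetical, and it assumes assets are matched by exact name.

```python
from typing import Collection


def is_crawled(asset: str, include: Collection[str], exclude: Collection[str]) -> bool:
    """Decide whether an asset is crawled under the include/exclude rules."""
    # The exclude filter takes precedence over the include filter.
    if asset in exclude:
        return False
    # An empty include filter means all assets are included by default.
    if not include:
        return True
    return asset in include


if __name__ == "__main__":
    # Hypothetical asset names, for illustration only.
    print(is_crawled("sales", include=[], exclude=[]))                # True
    print(is_crawled("sales", include=["sales"], exclude=["sales"]))  # False
    print(is_crawled("hr", include=["sales"], exclude=[]))            # False
```

Checking the exclude filter first is what makes it win when an asset appears in both filters.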
Run the crawler
After completing the steps above, run the Cloudera Impala crawler:
- To run the crawler once immediately, click the Run button at the bottom of the screen.
- To schedule the crawler to run hourly, daily, weekly, or monthly, click the Schedule & Run button at the bottom of the screen.
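For readers who think of schedules in cron terms, the four frequencies correspond roughly to the standard five-field cron expressions below. This is a sketch for orientation only: the mapping and the `cron_for` helper are assumptions, the specific minute, hour, and day values are arbitrary choices, and nothing here describes how Atlan stores or evaluates schedules.

```python
# Standard five-field cron expressions: minute hour day-of-month month day-of-week.
SCHEDULE_CRON = {
    "hourly": "0 * * * *",   # at minute 0 of every hour
    "daily": "0 0 * * *",    # at 00:00 every day
    "weekly": "0 0 * * 0",   # at 00:00 every Sunday
    "monthly": "0 0 1 * *",  # at 00:00 on the first day of each month
}


def cron_for(frequency: str) -> str:
    """Return an example cron expression for a named frequency (hypothetical helper)."""
    try:
        return SCHEDULE_CRON[frequency.lower()]
    except KeyError:
        raise ValueError(f"unsupported frequency: {frequency!r}") from None


if __name__ == "__main__":
    print(cron_for("daily"))
```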
Once the crawler has completed running, you will see the assets on Atlan's asset page! 🎉