Hive assets app
The Hive assets app crawls Apache Hive databases, tables, views, and columns and
publishes them to Atlan. Build it with the HiveCrawler builder.
Creating an app creates a new connection
Each create mints a new connection and new assets. To re-crawl, re-run the existing workflow (see Re-run an existing app).
Hive supports two authentication methods: basic (username/password) and Kerberos.
Basic authentication
- Python
Hive crawling with basic auth
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.apps import HiveCrawler
client = AtlanClient()
response = (
HiveCrawler(client)
.basic( # (1)
username="atlan_user", # (2)
password="••••••", # (3)
host="hive.example.com", # (4)
)
.connection(
name="production-hive",
admin_roles=[client.role_cache.get_id_for_name("$admin")],
)
.include_metadata({"default": ["sales", "marketing"]}) # (5)
.run(name="hive-prod")
)
print(response.slug, response.run_id)
- Step 1—Credential. Username/password auth; the secret is vaulted.
- Required. Username.
- Required. Password.
- Optional. Host. The port (
port=),default_schema, anddatabase_nameare also optional. - Databases/schemas to crawl, as
{database: [schema, ...]}.
Kerberos authentication
- Python
Hive crawling with Kerberos
(
HiveCrawler(client)
.kerberos(
principal="hive/_HOST@EXAMPLE.COM", # (1)
service_name="hive", # (2)
keytab_file=keytab_contents, # (3)
krb5_conf_file=krb5_contents, # (4)
kerberos_type="...", # (5)
host="hive.example.com",
)
.connection(name="production-hive", admin_roles=[...])
.run(name="hive-prod")
)
- Required. The Kerberos principal.
- Required. The service name.
- Required. The keytab file contents.
- Required. The
krb5.conffile contents. - Required. The Kerberos connection type. TLS material
(
ca_cert_file/client_cert_file/client_key_file/client_key_passphrase) anddefault_schema/database_name/host/portare optional.
Configuration options
All metadata options are optional:
- Python
Hive metadata configuration
(
HiveCrawler(client)
.basic(username="atlan_user", password="••••••", host="...")
.connection(name="production-hive", admin_roles=[...])
.include_metadata({"default": ["sales", "marketing"]}) # (1)
.exclude_metadata({"default": ["tmp"]}) # (2)
.allow_partial_success(True) # (3)
.advanced_config("...") # (4)
.run(name="hive-prod")
)
- Databases/schemas to include, as
{database: [schema, ...]}. - Databases/schemas to exclude.
- Ingest the assets Atlan can read even when some are blocked by permissions. Enable only if you understand the permission limitations.
- Advanced workflow configuration—leave unset unless you know you need it.