Glue assets app
The Glue assets app crawls the AWS Glue Data Catalog (databases, tables, columns)
and publishes it to Atlan. Build it with the AtlanGlue builder.
Creating an app creates a new connection
Every time you create an app, Atlan mints a new connection and new assets. To re-crawl an existing connection, re-run the existing workflow instead (see Re-run an existing app).
Glue supports two authentication methods: access key/secret (iam) and IAM
role (role).
Access key authentication
- Python
Glue crawling with access key/secret
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.apps import AtlanGlue

client = AtlanClient()

response = (
    AtlanGlue(client)
    .iam(  # (1)
        username="AKIA...",  # (2)
        password="••••••",  # (3)
        region="us-east-1",  # (4)
    )
    .connection(
        name="production-glue",
        admin_roles=[client.role_cache.get_id_for_name("$admin")],
    )
    .include_metadata({"AwsDataCatalog": ["analytics", "sales"]})  # (5)
    .run(name="glue-prod")
)
print(response.slug, response.run_id)
- Step 1: Credential. AWS access key/secret auth; the secret is vaulted.
- Required. AWS access key.
- Required. AWS secret key.
- Required. AWS region.
- Step 3: Metadata. Databases to crawl, keyed by catalog: {catalog: [database, ...]} (for example, {"AwsDataCatalog": ["analytics", "sales"]}). The builder nests it into the form the workflow expects. Omit to crawl everything.
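The access key example above hard-codes placeholder credentials. A minimal sketch of reading them from the environment instead, assuming the conventional AWS variable names (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION). The helper iam_kwargs_from_env is hypothetical and not part of pyatlan; it only builds the keyword arguments that .iam() takes in the example above.

```python
import os
from typing import Mapping, Optional


def iam_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the keyword arguments for .iam() from environment variables.

    Hypothetical helper (not part of pyatlan). The variable names follow
    the usual AWS CLI/SDK conventions and are an assumption here.
    """
    env = os.environ if env is None else env
    # Maps each .iam() argument to the environment variable that supplies it.
    required = {
        "username": "AWS_ACCESS_KEY_ID",
        "password": "AWS_SECRET_ACCESS_KEY",
        "region": "AWS_DEFAULT_REGION",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: env[var] for arg, var in required.items()}
```

With this in place, the credential step could be written as AtlanGlue(client).iam(**iam_kwargs_from_env()), which keeps the secret out of source control.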
IAM role authentication
- Python
Glue crawling with an IAM role
(
    AtlanGlue(client)
    .role(
        aws_role_arn="arn:aws:iam::123456789012:role/atlan",  # (1)
        aws_external_id="...",  # (2)
        region="us-east-1",  # (3)
    )
    .connection(name="production-glue", admin_roles=[...])
    .run(name="glue-prod")
)
- Optional. The IAM role ARN to assume.
- Optional. AWS external id for the role.
- Required. AWS region.
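A malformed role ARN is an easy mistake to make before submitting a workflow. The following is a minimal sketch of a local format check, assuming the standard IAM role ARN layout (partition, 12-digit account ID, role path and name). The function is_role_arn is hypothetical and not part of pyatlan, and it checks the shape only, not whether the role exists or can be assumed.

```python
import re

# arn:<partition>:iam::<12-digit account id>:role/<optional path and role name>
_ROLE_ARN = re.compile(r"arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+")


def is_role_arn(value: str) -> bool:
    """Return True if value has the shape of an IAM role ARN (format check only)."""
    return _ROLE_ARN.fullmatch(value) is not None
```

Calling is_role_arn on the value before passing it as aws_role_arn would catch typos such as a truncated account ID.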
Configuration options
All metadata options are optional:
- Python
Glue metadata configuration
(
    AtlanGlue(client)
    .iam(username="AKIA...", password="••••••", region="us-east-1")
    .connection(name="production-glue", admin_roles=[...])
    .catalog_id("AwsDataCatalog")  # (1)
    .include_metadata({"AwsDataCatalog": ["analytics"]})  # (2)
    .exclude_metadata({"AwsDataCatalog": ["staging"]})  # (3)
    .exclude_table_regex(".*_tmp$")  # (4)
    .run(name="glue-prod")
)
- The Glue Data Catalog ID. Use AwsDataCatalog for the default catalog; for S3 Table Buckets use <account_id>:s3tablescatalog/<bucket_name>.
- Databases to include. Exclude takes priority over include.
- Databases to exclude.
- Regex of tables to exclude (defaults to including all tables).
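To reason about how the three filters combine, here is a small local model of the rules stated above: exclude beats include, an omitted include means everything, and the regex removes matching tables. It is a sketch, not the crawler's implementation. The function would_crawl is hypothetical, and matching the regex against the whole table name is an assumption.

```python
import re
from typing import Dict, List, Optional

Selection = Dict[str, List[str]]  # {catalog: [database, ...]}


def would_crawl(
    catalog: str,
    database: str,
    table: str,
    include: Optional[Selection] = None,
    exclude: Optional[Selection] = None,
    exclude_table_regex: Optional[str] = None,
) -> bool:
    """Model whether a table passes the include/exclude/regex filters."""
    # Exclude takes priority over include.
    if exclude and database in exclude.get(catalog, []):
        return False
    # An omitted include filter means "crawl everything".
    if include and database not in include.get(catalog, []):
        return False
    # Assumption: the regex is matched against the whole table name.
    if exclude_table_regex and re.fullmatch(exclude_table_regex, table):
        return False
    return True
```

With the configuration shown above, a table named orders in analytics passes, orders_tmp in analytics is dropped by the regex, and anything in staging is dropped by the exclude filter.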