
Glue assets app


The Glue assets app crawls the AWS Glue Data Catalog (databases, tables, columns) and publishes it to Atlan. Build it with the AtlanGlue builder.

Creating an app creates a new connection

Each time you create and run an app, it mints a new connection with new assets. To re-crawl an existing connection, re-run its workflow instead (see Re-run an existing app).

Glue supports two authentication methods: access key/secret (iam) and IAM role (role).
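Both methods take a region and differ only in the credential. One way to keep secrets out of source code is to choose the method from environment variables. The sketch below is a hypothetical helper, not part of pyatlan. It reads the standard AWS SDK variables AWS_REGION, AWS_ROLE_ARN, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, plus an assumed AWS_EXTERNAL_ID. When a role is set, it prefers the role. It returns the builder method name together with the keyword arguments that method takes in the examples below.

```python
import os
from typing import Dict, Mapping, Optional, Tuple


def glue_auth_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Dict[str, str]]:
    """Pick "role" or "iam" and build the keyword arguments for that builder method.

    Hypothetical helper: the variable names are the standard AWS SDK ones,
    except AWS_EXTERNAL_ID, which is an assumed name.
    """
    env = os.environ if env is None else env
    region = env.get("AWS_REGION")
    if not region:
        raise ValueError("AWS_REGION is required for both authentication methods")

    role_arn = env.get("AWS_ROLE_ARN")
    if role_arn:
        # Prefer the IAM role when one is configured.
        kwargs = {"aws_role_arn": role_arn, "region": region}
        external_id = env.get("AWS_EXTERNAL_ID")
        if external_id:
            kwargs["aws_external_id"] = external_id
        return "role", kwargs

    # Fall back to access key/secret authentication.
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not (key and secret):
        raise ValueError(
            "set AWS_ROLE_ARN, or both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY"
        )
    return "iam", {"username": key, "password": secret, "region": region}
```

Under these assumptions, you could call method, kwargs = glue_auth_from_env() and then getattr(AtlanGlue(client), method)(**kwargs) before chaining .connection(...) and .run(...). Preferring the role avoids distributing long-lived access keys.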

Access key authentication

Glue crawling with access key/secret
```python
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.apps import AtlanGlue

client = AtlanClient()

response = (
    AtlanGlue(client)
    .iam(  # (1)
        username="AKIA...",  # (2)
        password="••••••",  # (3)
        region="us-east-1",  # (4)
    )
    .connection(
        name="production-glue",
        admin_roles=[client.role_cache.get_id_for_name("$admin")],
    )
    .include_metadata({"AwsDataCatalog": ["analytics", "sales"]})  # (5)
    .run(name="glue-prod")
)
print(response.slug, response.run_id)
```
  1. Step 1: Credential. AWS access key/secret authentication; the secret is stored securely in a vault.
  2. Required. AWS access key.
  3. Required. AWS secret key.
  4. Required. AWS region.
  5. Step 3: Metadata. Databases to crawl, keyed by catalog: {catalog: [database, ...]}, for example {"AwsDataCatalog": ["analytics", "sales"]}. The builder converts this into the nested structure the workflow expects. Omit it to crawl everything.
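Database lists sometimes come from a configuration file or a CLI flag as dotted names. In that case, a small helper can build the {catalog: [database, ...]} mapping that include_metadata (and exclude_metadata) takes. This is a hypothetical sketch. The "catalog.database" input format is an assumption. The helper splits on the last dot, on the further assumption that database names themselves contain no dots.

```python
from typing import Dict, Iterable, List


def metadata_filter(qualified_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group "catalog.database" strings into {catalog: [database, ...]}.

    Hypothetical helper. Duplicates are dropped while first-seen order is kept.
    """
    result: Dict[str, List[str]] = {}
    for name in qualified_names:
        # Split on the last dot: everything before it is the catalog ID.
        catalog, sep, database = name.rpartition(".")
        if not sep or not catalog or not database:
            raise ValueError(f"expected 'catalog.database', got {name!r}")
        databases = result.setdefault(catalog, [])
        if database not in databases:
            databases.append(database)
    return result
```

For example, metadata_filter(["AwsDataCatalog.analytics", "AwsDataCatalog.sales"]) yields the same mapping that the example above passes to include_metadata.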

IAM role authentication

Glue crawling with an IAM role
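```python
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.apps import AtlanGlue

client = AtlanClient()

(
    AtlanGlue(client)
    .role(
        aws_role_arn="arn:aws:iam::123456789012:role/atlan",  # (1)
        aws_external_id="...",  # (2)
        region="us-east-1",  # (3)
    )
    .connection(name="production-glue", admin_roles=[...])
    .run(name="glue-prod")
)
```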
  1. Optional. The IAM role ARN to assume.
  2. Optional. AWS external ID to use when assuming the role.
  3. Required. AWS region.
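A mistyped role ARN is likely to surface only when the workflow tries to assume the role, so it can help to check its shape before calling .role(). The sketch below is a hypothetical helper that validates the standard IAM role ARN layout, arn:<partition>:iam::<12-digit account>:role/<optional path>/<role name>. As an assumption, it accepts only the aws, aws-cn and aws-us-gov partitions; extend the pattern if you use others. It checks syntax only and cannot tell whether the role exists or trusts Atlan.

```python
import re
from typing import NamedTuple

# Role names and path segments allow alphanumerics and + = , . @ _ -
_ROLE_ARN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/"
    r"((?:[\w+=,.@-]+/)*)([\w+=,.@-]+)$",
    re.ASCII,
)


class RoleArn(NamedTuple):
    partition: str
    account_id: str
    path: str  # "/" when the role has no explicit path
    role_name: str


def parse_role_arn(arn: str) -> RoleArn:
    """Split an IAM role ARN into its parts, or raise ValueError if malformed."""
    match = _ROLE_ARN.match(arn)
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    partition, account_id, path, role_name = match.groups()
    return RoleArn(partition, account_id, "/" + path, role_name)
```

Calling parse_role_arn("arn:aws:iam::123456789012:role/atlan") before building the app returns the parts of the example ARN above and raises on anything that does not fit the pattern.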

Configuration options

All metadata options are optional:

Glue metadata configuration
```python
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.apps import AtlanGlue

client = AtlanClient()

(
    AtlanGlue(client)
    .iam(username="AKIA...", password="••••••", region="us-east-1")
    .connection(name="production-glue", admin_roles=[...])
    .catalog_id("AwsDataCatalog")  # (1)
    .include_metadata({"AwsDataCatalog": ["analytics"]})  # (2)
    .exclude_metadata({"AwsDataCatalog": ["staging"]})  # (3)
    .exclude_table_regex(".*_tmp$")  # (4)
    .run(name="glue-prod")
)
```
  1. The Glue Data Catalog ID. Use AwsDataCatalog for the default catalog; for S3 Table Buckets, use <account_id>:s3tablescatalog/<bucket_name>.
  2. Databases to include. If a database appears in both the include and exclude filters, exclude takes priority.
  3. Databases to exclude.
  4. Regular expression for tables to exclude. If omitted, all tables are included.
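To reason about what a combination of filters will crawl before running the workflow, you can simulate the documented rules in plain Python. An omitted include means every database, exclude wins over include, and the table regex drops matching tables. This is an illustrative model, not the workflow's implementation. In particular, it assumes that the regex must match the whole table name (re.fullmatch). It also assumes that include and exclude apply at the database level, as the annotations above describe.

```python
import re
from typing import Dict, List, Mapping, Optional, Sequence

# Hypothetical inventory shape: {catalog: {database: [table, ...]}}
Inventory = Mapping[str, Mapping[str, Sequence[str]]]
Filter = Mapping[str, Sequence[str]]  # {catalog: [database, ...]}


def select_tables(
    inventory: Inventory,
    include: Optional[Filter] = None,
    exclude: Optional[Filter] = None,
    exclude_table_regex: Optional[str] = None,
) -> Dict[str, Dict[str, List[str]]]:
    """Model which tables a Glue crawl would pick up under the documented rules."""
    pattern = re.compile(exclude_table_regex) if exclude_table_regex else None
    selected: Dict[str, Dict[str, List[str]]] = {}
    for catalog, databases in inventory.items():
        for database, tables in databases.items():
            # An omitted (or empty) include filter means "crawl everything".
            if include and database not in include.get(catalog, ()):
                continue
            # Exclude takes priority over include.
            if exclude and database in exclude.get(catalog, ()):
                continue
            # Assumption: the regex must match the whole table name.
            kept = [t for t in tables if pattern is None or not pattern.fullmatch(t)]
            selected.setdefault(catalog, {})[database] = kept
    return selected
```

With the filters from the example above, an inventory such as {"AwsDataCatalog": {"analytics": ["orders", "orders_tmp"], "staging": ["raw"]}} reduces to {"AwsDataCatalog": {"analytics": ["orders"]}}.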