Sync Lake Formation tags to custom metadata
The Lake Formation Tag Sync app applies AWS Lake Formation tags as custom metadata properties on assets in Atlan. It reads tag associations exported from AWS Lake Formation and maps them to Atlan custom metadata using two mapping files you provide. This reference provides complete configuration details.
Access
The Lake Formation Tag Sync app isn't enabled by default. To use this app, contact Atlan support and request it be added to your tenant.
Configuration
This section defines the fields used to configure the workflow and control how Lake Formation tag data is retrieved and applied.
Workflow name
Specifies a unique name to identify this workflow configuration in Atlan. This name appears in the workflow list and helps distinguish it from other automation workflows.
Example:
lake-formation-tag-sync-prod
Import lake tags from
Defines where the tag association and mapping files are retrieved from.
- Object storage
Use this option to retrieve the tag and mapping files from a cloud object store. This is the recommended approach for production environments and recurring syncs.
For detailed information on configuring storage credentials, access methods, and required fields for each provider, see the general Object storage configuration for apps guide, which applies to S3, GCS, and ADLS-based imports.
When your Atlan tenant is deployed on AWS, you can leave the AWS access key, AWS secret key, region, and bucket fields blank to reuse Atlan's own backing S3 store. If you configure a cross-account bucket policy that grants Atlan access to your own S3 bucket, you can also leave these fields blank.
Prefix (path)
The directory path within the object store from which to retrieve the files. Use forward slashes (/) as path separators.
- If left blank, the workflow searches from the root of the bucket or container.
- All required files must be present under this path.
Example:
lake-formation/exports/2024-01
Options
Controls how the workflow processes the input files and handles errors.
- Default
- Advanced
When Default is selected, the following behaviors apply:
- All blank fields in the input file are ignored.
- Any invalid value in a field causes the import to fail rather than proceeding.
- Assets are matched case-sensitively.
- Type names in the input file are strictly adhered to.
- Comma (,) is the expected field separator.
- A maximum of 20 records are processed per underlying API request.
Selecting Advanced provides more control over how files are processed.
Fail on errors?
Defines how the workflow responds when it encounters an invalid value.
- Yes: The workflow stops and fails on the first error encountered.
- No: The workflow logs a warning, skips the invalid record, and continues processing.
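The two settings differ only in what happens when a record is invalid. The following Python sketch models that difference under stated assumptions; InvalidRecordError, process_records, and the apply callback are hypothetical names for illustration, not part of the app or any Atlan API.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("lf_tag_sync_sketch")


class InvalidRecordError(ValueError):
    """Hypothetical error for a record that contains an invalid value."""


def process_records(
    records: Iterable[dict],
    apply: Callable[[dict], None],
    fail_on_errors: bool,
) -> List[dict]:
    """Apply each record in order and return the records that were skipped.

    fail_on_errors=True:  re-raise on the first invalid record (the run fails).
    fail_on_errors=False: log a warning, skip the record, and continue.
    """
    skipped: List[dict] = []
    for record in records:
        try:
            apply(record)
        except InvalidRecordError as exc:
            if fail_on_errors:
                raise
            logger.warning("Skipping invalid record %r: %s", record, exc)
            skipped.append(record)
    return skipped


if __name__ == "__main__":
    def apply(record: dict) -> None:
        # Hypothetical validation: a TagKey that isn't a string is invalid.
        if not isinstance(record.get("TagKey"), str):
            raise InvalidRecordError("TagKey must be a string")

    records = [{"TagKey": "pii_flag"}, {"TagKey": 42}, {"TagKey": "data_owner"}]
    print(len(process_records(records, apply, fail_on_errors=False)))  # 1
```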
Field separator
Specifies the character used to separate fields in input files.
- Default is , (comma).
- Other supported options include ; (semicolon) or | (pipe).
Batch size
The maximum number of records submitted per underlying API request.
- Default is 20.
- Increase this value for faster processing on large datasets if your tenant can support it.
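As a rough illustration of what the batch size controls, with the default of 20, a run over 45 records would need three requests. The following minimal Python sketch shows that arithmetic, assuming records are chunked in file order. The app's actual chunking isn't documented, and batches is a hypothetical helper.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], batch_size: int = 20) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


if __name__ == "__main__":
    records = list(range(45))
    print([len(b) for b in batches(records)])  # [20, 20, 5]
```

Raising the batch size reduces the number of requests (for example, a batch size of 50 would send these 45 records in one request), at the cost of larger individual requests.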
Input files
The workflow expects three types of files to be present in the configured object storage location. All files must be placed under the same prefix (path).
Tag association files
One or more JSON files that contain the Lake Formation tag associations exported from AWS. Each file must have a name beginning with iftag_association (for example, iftag_association_2024.json). At least one tag association file is required.
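Before you run the workflow, you may want to confirm that the configured prefix contains what it expects. The following is a minimal Python pre-flight sketch over a list of object keys, which you can obtain however you normally list your bucket or container. check_input_files is a hypothetical helper. The sketch assumes that the files sit directly under the prefix (nested subfolders are ignored) and that a blank prefix means the root. The documentation doesn't spell out either point.

```python
import posixpath
from typing import Dict, Iterable, List

TAG_FILE_PREFIX = "iftag_association"
MAPPING_FILES = ("connection_map.json", "metadata_map.json")


def check_input_files(object_keys: Iterable[str], prefix: str = "") -> Dict[str, List[str]]:
    """Report tag association files found and required files missing.

    Assumes files sit directly under `prefix` (nested subfolders are not
    considered) and that a blank prefix means the bucket or container root.
    """
    base = prefix.strip("/")
    names: List[str] = []
    for key in object_keys:
        parent, name = posixpath.split(key)
        if parent.strip("/") == base:
            names.append(name)
    tag_files = sorted(n for n in names if n.startswith(TAG_FILE_PREFIX))
    missing = [m for m in MAPPING_FILES if m not in names]
    if not tag_files:
        # At least one tag association file is required.
        missing.append(TAG_FILE_PREFIX + "*")
    return {"tag_files": tag_files, "missing": missing}


if __name__ == "__main__":
    keys = [
        "lake-formation/exports/2024-01/iftag_association_prod.json",
        "lake-formation/exports/2024-01/connection_map.json",
        "lake-formation/exports/2023-12/metadata_map.json",
    ]
    print(check_input_files(keys, "lake-formation/exports/2024-01"))
    # {'tag_files': ['iftag_association_prod.json'], 'missing': ['metadata_map.json']}
```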
The files must follow the format documented in the AWS CLI reference for Lake Formation tag associations. Each record specifies which AWS resource (database, table, or column) has which tags and values applied. The workflow uses the resource identifiers to find the corresponding asset in Atlan (within the connection resolved from the connection map) and uses the tag keys and values to set custom metadata (using the metadata map).
Example filename and record structure
Filename:
iftag_association_prod.json
A typical record in the tag association file identifies a resource and one or more tag key-value pairs. The exact schema is defined by AWS; conceptually, a record includes fields such as DatabaseName, and optionally table and column identifiers, plus tag associations (for example, TagKey and TagValue). For a database-level tag, the record might reference only the database name; for a column-level tag, it includes database, table, and column so the workflow can target the right asset in Atlan.
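The level of the target asset follows from which identifiers a record carries. The following Python sketch uses a hypothetical, simplified record type (TagAssociation with database_name, table_name, column_name, tag_key, and tag_value fields) to show that decision. It isn't the AWS export schema, which you should take from the AWS CLI reference. The table and column names in the example are also hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagAssociation:
    """Hypothetical, simplified record; not the exact AWS export schema."""
    database_name: str
    tag_key: str
    tag_value: str
    table_name: Optional[str] = None
    column_name: Optional[str] = None


def target_level(record: TagAssociation) -> str:
    """Return the kind of asset a record targets: database, table, or column."""
    if record.column_name:
        if not record.table_name:
            # A column can only be located through its table.
            raise ValueError("a column-level record also needs a table name")
        return "column"
    if record.table_name:
        return "table"
    return "database"


if __name__ == "__main__":
    rec = TagAssociation(
        database_name="prod_customer_db",
        table_name="customers",
        column_name="email",
        tag_key="pii_flag",
        tag_value="true",
    )
    print(target_level(rec))  # column
```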
Connection mapping file
A single JSON file named exactly connection_map.json. This file maps the DatabaseName values found in the tag association files to fully qualified connection names in Atlan.
Lake Formation tag exports use database names as defined in your AWS account (for example, prod_customer_db or dev_analytics). Those names often don't match the connection names or qualified names used in Atlan. The connection map bridges that gap so the workflow knows which Atlan connection to use when resolving databases, tables, and columns from the tag association records.
When the workflow reads a DatabaseName value, it splits the string at the first underscore (_) character and uses the left part as the lookup key. The corresponding value in connection_map.json must be the fully qualified connection name in Atlan (the same identifier you see in the connection list or in asset qualified names).
Example and how the lookup works
File: connection_map.json
{
"prod": "default/snowflake/production-warehouse",
"dev": "default/snowflake/dev-warehouse"
}
- A DatabaseName of prod_customer_db is split at the first _, giving the key prod. The workflow looks up prod and uses the connection default/snowflake/production-warehouse in Atlan.
- A DatabaseName of dev_analytics gives the key dev and resolves to default/snowflake/dev-warehouse.
Make sure every prefix that appears in your tag association files (the part before the first underscore in each DatabaseName) has a corresponding key in connection_map.json; otherwise the workflow can't resolve the connection and fails or skips those records.
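The following Python sketch shows the lookup described above, along with a pre-flight check that lists any prefixes with no key in the map. The function names are hypothetical, and the map contents are the example connection_map.json shown earlier. How the app treats a DatabaseName that contains no underscore isn't documented; in that case, this sketch uses the whole name as the key.

```python
import json
from typing import Dict, Iterable, List, Optional


def lookup_key(database_name: str) -> str:
    """Left part of a DatabaseName, split at the first underscore."""
    # partition() splits at the first "_" only; a name without "_" is
    # returned whole (an assumption, since that case isn't documented).
    return database_name.partition("_")[0]


def resolve_connection(database_name: str, connection_map: Dict[str, str]) -> Optional[str]:
    """Qualified name of the Atlan connection, or None if the key is unmapped."""
    return connection_map.get(lookup_key(database_name))


def unmapped_keys(database_names: Iterable[str], connection_map: Dict[str, str]) -> List[str]:
    """Sorted lookup keys that have no entry in connection_map."""
    return sorted({lookup_key(n) for n in database_names} - set(connection_map))


if __name__ == "__main__":
    connection_map = json.loads(
        '{"prod": "default/snowflake/production-warehouse",'
        ' "dev": "default/snowflake/dev-warehouse"}'
    )
    print(resolve_connection("prod_customer_db", connection_map))
    # default/snowflake/production-warehouse
    print(unmapped_keys(["prod_customer_db", "dev_analytics", "qa_orders"], connection_map))
    # ['qa']
```

Running a check like unmapped_keys against the DatabaseName values in your export before you start the workflow surfaces missing keys up front, rather than leaving you to find them as failed or skipped records.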
Metadata mapping file
A single JSON file named exactly metadata_map.json. This file maps TagKey values from the tag association files to Atlan custom metadata properties.
Lake Formation tag keys (for example, data_sensitivity, pii_flag) are arbitrary names you define in AWS. In Atlan, the same information is stored as custom metadata (sets and properties such as "Data Governance::Sensitivity Level"). The metadata map defines which Lake Formation tag key corresponds to which Atlan custom metadata set and property. For each tag association record, the workflow looks up the tag key in this file and writes the tag value to the specified custom metadata property on the matched asset.
Each key in the JSON object is a TagKey value from the tag association files. The corresponding value is the human-readable custom metadata set name, followed by ::, followed by the human-readable property name (as shown in the Atlan UI when you manage custom metadata).
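The following small Python sketch reads metadata_map.json and splits each value at :: into a set name and a property name. parse_target and load_metadata_map are hypothetical helpers. Splitting at the first :: and rejecting entries where either side is empty are assumptions about what counts as a valid entry, not documented app behavior.

```python
import json
from typing import Dict, Tuple


def parse_target(target: str) -> Tuple[str, str]:
    """Split 'Set name::Property name' at the first '::'."""
    set_name, sep, prop_name = target.partition("::")
    if not sep or not set_name or not prop_name:
        raise ValueError(f"expected 'Set::Property', got {target!r}")
    return set_name, prop_name


def load_metadata_map(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse metadata_map.json content into TagKey -> (set name, property name)."""
    return {tag_key: parse_target(target) for tag_key, target in json.loads(text).items()}


if __name__ == "__main__":
    text = '{"data_sensitivity": "Data Governance::Sensitivity Level", "pii_flag": "Compliance::PII"}'
    print(load_metadata_map(text)["data_sensitivity"])
    # ('Data Governance', 'Sensitivity Level')
```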
Example
File: metadata_map.json
{
"data_sensitivity": "Data Governance::Sensitivity Level",
"data_owner": "Data Governance::Owner",
"pii_flag": "Compliance::PII"
}
If a tag association record has TagKey data_sensitivity and TagValue Confidential, the workflow sets the Sensitivity Level property of the Data Governance custom metadata set (Data Governance::Sensitivity Level) to Confidential on the corresponding asset in Atlan.
If a TagKey maps to a custom metadata property of type Options, any TagValues specified in the tag association files that don't yet exist as allowed options are created automatically.
Any tag key that appears in the tag association files but is missing from metadata_map.json is ignored (no custom metadata is written for that key).
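Taken together, the rules above can be sketched per asset in Python: mapped keys are written, unmapped keys are ignored, and new values for Options-type properties become new options. plan_updates and its allowed_options input are hypothetical, and the app's internal logic isn't documented. In this sketch, if the same tag key appears more than once for an asset, the last value wins.

```python
from typing import Dict, Iterable, List, Set, Tuple


def plan_updates(
    associations: Iterable[Tuple[str, str]],
    metadata_map: Dict[str, str],
    allowed_options: Dict[str, Set[str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (updates, new_options) for one asset.

    associations:    (TagKey, TagValue) pairs found for the asset.
    metadata_map:    contents of metadata_map.json (TagKey -> "Set::Property").
    allowed_options: hypothetical current options of Options-type properties,
                     keyed by "Set::Property"; other properties are absent.
    """
    updates: Dict[str, str] = {}
    new_options: Dict[str, List[str]] = {}
    for tag_key, tag_value in associations:
        target = metadata_map.get(tag_key)
        if target is None:
            continue  # tag keys missing from the map are ignored
        updates[target] = tag_value  # last value wins in this sketch
        options = allowed_options.get(target)
        if options is not None and tag_value not in options:
            added = new_options.setdefault(target, [])
            if tag_value not in added:
                added.append(tag_value)
    return updates, new_options


if __name__ == "__main__":
    metadata_map = {
        "data_sensitivity": "Data Governance::Sensitivity Level",
        "data_owner": "Data Governance::Owner",
        "pii_flag": "Compliance::PII",
    }
    allowed = {"Data Governance::Sensitivity Level": {"Public", "Internal"}}
    pairs = [("data_sensitivity", "Confidential"), ("cost_center", "1234")]
    updates, new_options = plan_updates(pairs, metadata_map, allowed)
    print(updates)      # {'Data Governance::Sensitivity Level': 'Confidential'}
    print(new_options)  # {'Data Governance::Sensitivity Level': ['Confidential']}
```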