Workflows and Data Processing

Everything about managing data workflows, understanding lineage generation, and optimizing data processing pipelines in Atlan.

How do I configure custom cron schedules?

You can use cron expressions to create custom schedules for your workflows in Atlan. A cron expression specifies the date and time at which a scheduled task runs.

Cron expressions consist of five date and time fields separated by whitespace; a field value itself must not contain any whitespace.

Cron expressions in Atlan include the following five fields and corresponding values:

Field name         Allowed values   Allowed special characters
Minutes            0-59             , - * /
Hours              0-23             , - * /
Day of the month   1-31             , - * /
Month              1-12             , - * /
Day of the week    0-7              , - * /
  • , - comma specifies a list of values
  • - - dash specifies a range of values
  • * - asterisk specifies all possible values for a field
  • / - slash specifies step values, skipping a given number of values (for example, */3 means every third value)

Examples of cron expressions and their respective meanings:

  • 0 0 1 * * - Run at midnight on day 1 of every month
  • 0 0 * * * - Run once a day at midnight
  • 0 */3 * * * - Run at minute 0 past every 3rd hour (that is, every 3 hours)
  • 0 0 1,15 * * - Run at midnight on day 1 and 15 of every month
  • 30 14 * * 1,3 - Run at 14:30 on Monday and Wednesday
  • 0 6,18 * * * - Run at minute 0 past hours 6 and 18 (that is, at 6 AM and 6 PM)
  • 0 0 1 3,6,9,12 * - Run at midnight on day 1 of March, June, September, and December
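The field rules above can be sketched as a small validator. This is a minimal illustration only; `is_valid_cron` and its helper are hypothetical names, not part of Atlan:

```python
# Minimal validator for the five-field cron format described above.
# Field order: minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]

def _valid_field(field, lo, hi):
    """Check one field: lists (,), ranges (-), steps (/), and *."""
    for part in field.split(","):
        part, _, step = part.partition("/")
        if step and not (step.isdigit() and int(step) > 0):
            return False
        if part == "*":
            continue
        bounds = part.split("-")
        if len(bounds) > 2 or not all(b.isdigit() for b in bounds):
            return False
        if not all(lo <= int(b) <= hi for b in bounds):
            return False
    return True

def is_valid_cron(expr):
    """Return True if expr looks like a valid five-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    return all(_valid_field(f, lo, hi)
               for f, (lo, hi) in zip(fields, FIELD_RANGES))
```

For example, `is_valid_cron("30 14 * * 1,3")` accepts the Monday/Wednesday schedule above, while `is_valid_cron("61 0 * * *")` rejects an out-of-range minute.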

Why is the first miner run taking so long to finish?

The first run of the miner typically takes longer than subsequent runs. This is because it parses all queries beginning from the start date chosen during setup.

It's recommended that the start date be no further back than a week. As long as the miner is scheduled and running, it continuously picks up new queries and builds lineage as data flows run.

Subsequent runs should be much quicker than the first - especially if the miner is set up to run daily, since it then parses only new queries rather than historic ones. Keep in mind that the number of queries or transformations running daily also affects how long the miner takes to run.

To learn more about miner logic, see here.

Are there any extra steps required for rerunning the miner?

The miner can be rerun without any additional steps. However, the miner may error out when run again after a few weeks. If this happens, change the start date in the miner configuration to be no further back than a week.
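The one-week guidance can be expressed as a small date helper. This is a hypothetical sketch for illustration; the function name and its use are not part of Atlan:

```python
from datetime import date, timedelta

def safe_miner_start_date(today=None):
    """Return a start date exactly one week back, per the guidance
    that the miner start date be no further back than a week."""
    today = today or date.today()
    return (today - timedelta(days=7)).isoformat()
```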

Why do some workflows take longer?

Several factors can contribute to longer workflow runtimes:

  • Extracting a high volume of assets from the source may increase the time a workflow takes to complete.
  • Atlan workflows run differential crawls, bringing in only delta changes from the source. This speeds things up, and further parallelisation helps optimise runtime.
  • On some days the runtime can exceed the usual duration:
    • There may be more transformations to process, so more delta changes need to be synced to Atlan.
    • If one of those transformations includes a delete operation, Atlan archives the removed assets. Archival can take longer and therefore extend the overall runtime.

For general guidelines, see How to order workflows.

Why is the workflow config or new workflow button not working?

If the workflow config page is blank or the New workflow button doesn't proceed to the next step, try these checks:

  1. Open Atlan in an incognito / private-browsing window and see whether the page loads.
  2. If it loads, verify whether your browser has any ad-blockers enabled.
  3. Either disable the ad-blocker or add *.atlan.com to the allowlist.

Workflow is failing with: Delete percentage is more than 80.0. Exiting.

Atlan has added guardrails to workflow execution to make sure that assets don't get archived accidentally. A circuit breaker is triggered when more than 80% of existing assets are missing from the current workflow run; it aborts the workflow and prevents it from committing any changes.

This situation commonly arises if:

  • Permissions for the Atlan integration user are revoked or updated in the source system.
  • Include and exclude metadata filters in the workflow configuration are modified.
  • Assets are removed from the source system.

If the mass deletion is intentional, reach out to Atlan support to disable the circuit breaker. The next workflow run then proceeds to archive the assets that aren't part of the run. Note, however, that any metadata updates (tags, descriptions, and more) on these assets are lost.
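The guardrail described above can be sketched roughly as follows. This is a simplified illustration; `should_abort` is a hypothetical function, not Atlan's actual implementation:

```python
def should_abort(existing_assets, current_run_assets, threshold=80.0):
    """Sketch of the circuit breaker: abort when the percentage of
    existing assets missing from the current run exceeds the threshold."""
    if not existing_assets:
        return False
    missing = existing_assets - current_run_assets
    delete_pct = 100.0 * len(missing) / len(existing_assets)
    return delete_pct > threshold
```

With 10 existing assets and only 1 seen in the current run, 90% would be missing, tripping the breaker; with 3 seen, 70% missing stays under the 80% threshold and the run proceeds.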

How's lineage from procedures deduced in Atlan?

Lineage from procedures is inferred indirectly from the query history generated when the procedure runs.

Can offline extraction fail if there are spaces in the path?

Atlan currently doesn't support spaces in folder names for S3. The offline extraction workflow fails if any folder name in the S3 path contains a space. For guidelines on safe characters, refer to the Amazon S3 documentation.
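A quick pre-flight check for spaces in an S3 prefix might look like the following. This is a hypothetical helper for illustration, not part of Atlan or the AWS SDK:

```python
def check_s3_prefix(prefix):
    """Raise ValueError if any folder name in an S3 prefix contains a
    space, since offline extraction fails on such paths."""
    for folder in prefix.strip("/").split("/"):
        if " " in folder:
            raise ValueError(f"Folder name contains a space: {folder!r}")
    return prefix
```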

Is the existing file in the bucket overwritten when uploading the JSON files?

Yes. For dbt Core workflows the recommended approach is to replace the folder with the new manifest.json and run_results.json files. The workflow uses file names to locate its inputs and doesn't check timestamps. (The catalog.json file is no longer required.)
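Since the workflow locates its inputs by file name, a pre-upload sanity check could verify that the required files are present in the local folder before replacing the bucket contents. This is a hypothetical helper, for illustration only:

```python
import os

# The dbt Core workflow reads these inputs by name; the new files must
# replace the old ones at the same names (catalog.json is not required).
REQUIRED = {"manifest.json", "run_results.json"}

def missing_dbt_artifacts(folder):
    """Return the set of required dbt artifact files missing from folder."""
    present = set(os.listdir(folder))
    return REQUIRED - present
```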

Can I configure a Snowflake workflow using account usage and then switch to information schema?

When you modify an existing Snowflake connection and change the extraction method, Atlan deletes and recreates all assets in that connection. If you need to switch from Account usage to Information schema, please contact Atlan support so the change can be applied safely.

Can I receive notifications when workflows fail?

Atlan can send failure alerts to Slack and Microsoft Teams.

In Admin → Integrations, open the Slack (or Microsoft Teams) tile and enable Receive failure alerts only to get a notification whenever a workflow fails.

Can I create a multi-step approval workflow in Atlan?

Yes. Atlan integrates with tools such as Jira, enabling you to build multi-step approval workflows that match your organisation's processes.

Is the PII tagging of data or metadata automated?

Atlan propagates tags based on hierarchy and lineage. For example, if you attach a tag named PII to a table and tag propagation is enabled, the tag is copied to downstream columns.

Atlan doesn't automatically detect PII. Propagation only occurs if you enable it manually or automate it using playbooks. For details, see Why does tag propagation take time to apply?

Are there any dbt assets that cannot be viewed in dbt?

Atlan shows the View in dbt link only for dbt models, sources, and tests that include a valid target_url. Assets without a target URL won't display the link.

Can I follow the background processes of workflows in Argo?

If you have cluster-level access, you can open the built-in Argo UI at https://your-atlan-domain/argo to watch each workflow's DAG, pod logs, and retry status in real time. Otherwise, use the History tab inside the workflow sidebar in Atlan or the run-level logs downloadable from the Runs table.

How does Atlan work with dbt single-tenant vs multi-tenant?

For dbt Cloud single-tenant projects, Atlan authenticates with the project-scoped API key you provide. For multi-tenant workspaces, Atlan uses your account-level service token and the project ID to pull lineage and documentation. Behaviour in Atlan is identical; the difference is only in where the credentials are scoped in dbt Cloud.

Cloud logging and monitoring

Atlan sends application and access logs to your cloud provider's native logging service: CloudWatch (AWS), Stackdriver (GCP), or Azure Monitor. You can ingest these logs into your SIEM for central monitoring. Contact Atlan support to enable log shipping for your tenant.