OBSERVABILITY namespace

The OBSERVABILITY namespace exposes operational data from Atlan's Lakehouse services. Use it to track data quality (DQ) scores over time, monitor job success and failure rates, analyze retry patterns, measure job duration across Lakehouse pipelines, and investigate app execution logs.

This reference provides complete configuration details for the OBSERVABILITY namespace, including table schemas, column definitions, and when to use each element in your queries.

Core tables

The following tables are available in the OBSERVABILITY namespace. Use them when you need to query or report on Lakehouse job runs, data quality scores, pipeline health, or app execution logs.

  • JOB_METRICS: One row per internal Lakehouse job execution (workflow run). Use this table when you need to analyze job lifecycle, success or failure rates, run duration, or workflow-specific metrics for internal Lakehouse pipelines (for example, DQ workflows, metadata sync, and table maintenance). The table stores lifecycle timestamps, status codes, error messages, and a custom_metrics JSON field whose structure depends on job_name. The table is partitioned by month on started_at; include a time-range filter on started_at in your queries for efficient partition pruning.

  • APP_LOGS: One row per log event from an Atlan app or connector execution (for example, a Snowflake or Redshift connector run). Use this table when you need to debug a failed workflow, investigate errors or exceptions from app executions, or analyze log events from connector runs. The table is partitioned by day on timestamp; include a time-range filter on timestamp in your queries for efficient partition pruning.

note

JOB_METRICS and APP_LOGS track different systems—JOB_METRICS covers internal Lakehouse jobs, while APP_LOGS covers Atlan app and connector workflows. These tables can't be joined.
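As a sketch of the distinction, each table is queried independently with its own partition filter. This assumes Snowflake syntax and a {{DATABASE}}.OBSERVABILITY qualification (see Example queries for the {{DATABASE}} placeholder); the selected JOB_METRICS columns are taken from the column reference below.

```sql
-- Internal Lakehouse job runs: filter started_at (monthly partitions)
SELECT job_name, status_code, started_at, completed_at
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -7, CURRENT_TIMESTAMP());

-- Atlan app and connector log events: filter timestamp (daily partitions)
SELECT *
FROM {{DATABASE}}.OBSERVABILITY.APP_LOGS
WHERE timestamp >= DATEADD('day', -7, CURRENT_TIMESTAMP());
```

Because the two tables track different systems, run and analyze them separately rather than joining them.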

Column reference

The columns below belong to JOB_METRICS, which stores one row per job execution (workflow run). The custom_metrics column holds a JSON string whose structure depends on job_name; see Custom metrics by job type for details.

| Column | Type | Description |
| --- | --- | --- |
| tenant_id | string | Your tenant identifier. Use it to scope queries to your tenant. |
| service_name | string | The Lakehouse service that ran the job, for example mdlh. Use it to filter by service. |
| job_name | string | Logical job or workflow name, for example AtlasDqOrchestrationWorkflow. Use it to filter or group by job type. |
| job_instance_id | string | Unique identifier for this job run. Use it to join or deduplicate executions. |
| workflow_id | string | Workflow identifier when the job is part of a workflow. Use it for workflow-level correlation. |
| trace_id | string | Distributed trace ID. Use it to correlate this job with logs or other services. |
| correlation_id | string | Cross-service correlation ID. Use it to link related job runs across services. |
| created_at | timestamptz | When the job record was written (UTC). Use it for audit or ordering. |
| started_at | timestamptz | When the job started (UTC). Use it for time-range filters and duration calculations. This column is used for partitioning. |
| completed_at | timestamptz | When the job finished (UTC). Use it with started_at to compute run duration. |
| environment | string | Deployment environment. Use it to filter by environment when relevant. |
| worker_id | string | ID of the worker that ran the job. Use it for capacity or worker-level analysis. |
| node_id | string | Node identifier. Use it for cluster or node-level analysis. |
| cloud | string | Cloud provider, for example aws, azure, or gcp. Use it to segment by cloud. |
| region | string | Cloud region. Use it to segment by region. |
| queue_name | string | Queue name. Use it when analyzing queue-based execution. |
| attempt_number | int | Attempt number for this run. Use it to distinguish the first run from retries. |
| retry_count | int | Total number of retries. Use it to analyze retry behavior. |
| status_message | string | Human-readable status, for example SUCCESS. Use it for reporting. |
| status_code | int | Numeric status code; 200 means success. Use it to filter or aggregate by success or failure. |
| error_message | string | Error details when the job failed. Use it for troubleshooting and failure analysis. |
| custom_metrics | string | Job-specific metrics as JSON. Use it when you need DQ scores, record counts, or other workflow metrics. Structure varies by job_name; see Custom metrics by job type. |
| version | int | Schema version of the job record. Use it when handling multiple schema versions. |

Partitioning

Partitioned by month on started_at. Include a time-range filter on started_at (for example, last 7 or 30 days) so the engine can skip irrelevant partitions.
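For illustration, a minimal sketch of a partition-pruned aggregation, assuming Snowflake syntax and the {{DATABASE}} placeholder described under Example queries:

```sql
-- The started_at filter lets the engine scan only the last ~1-2 monthly partitions
SELECT job_name, COUNT(*) AS runs
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name;
```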

Sort order

Rows are sorted by tenant_id, service_name, job_name, started_at (all ascending). Queries that filter or group by these columns in that order typically perform better.

Custom metrics by job type

The custom_metrics column in JOB_METRICS holds a JSON string whose keys and structure depend on job_name. Use this section when you need to know which fields are available for a given job type (for example, to extract DQ scores or record counts in SQL). Job types fall into three categories: data quality, metadata sync, and table maintenance and scheduling. The table below lists known job types and the key metrics exposed in each.

| Job name | Description | Key metrics |
| --- | --- | --- |
| AtlasDqOrchestrationWorkflow | Orchestrates a full DQ run across all entity types. | dq_score, total_typedefs, total_atlas_count, total_lh_count, total_missing_count, total_extra_count, total_mismatch_count, total_duration_ms |
| AtlasTypeDefDqWorkflow | Per-entity-type DQ check comparing Atlas to Lakehouse counts. | typedef_name, atlas_count, lh_count, missing_count, extra_count, mismatch_count, duration_ms |
| UsageAnalyticsCountValidationWorkflow | Validates row counts for usage analytics tables. | overall_dq_score, tables_validated, failed_table_count, threshold_passed, total_duration_ms |
info

The set of job types and their custom_metrics schemas may expand as new Lakehouse features are added. To see which job types exist in your tenant, run SELECT DISTINCT job_name on the JOB_METRICS table in your Lakehouse database or catalog.
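As an illustration, here is a hedged sketch of extracting one metric from the custom_metrics JSON string on Snowflake (PARSE_JSON with path notation); the dq_score field name comes from the AtlasDqOrchestrationWorkflow row above, and on Databricks you would use get_json_object instead, as noted under Example queries:

```sql
-- Parse the JSON string and cast the overall DQ score to a number
SELECT
  started_at,
  PARSE_JSON(custom_metrics):dq_score::FLOAT AS dq_score
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE job_name = 'AtlasDqOrchestrationWorkflow'
  AND started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP());
```

The same pattern applies to any other key listed in the table, for example PARSE_JSON(custom_metrics):typedef_name for AtlasTypeDefDqWorkflow rows.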

Example queries

The following examples show how to query the OBSERVABILITY namespace for common use cases: trending DQ scores, job success and failure rates, job duration, and app log analysis. Replace {{DATABASE}} with your Lakehouse database name (Snowflake) or catalog name (Databricks). On Databricks, use DATE_ADD(CURRENT_TIMESTAMP(), -30) instead of DATEADD('day', -30, CURRENT_TIMESTAMP()), and use get_json_object(custom_metrics, '$.dq_score') (or the relevant path) to read JSON fields from custom_metrics.

The available example queries, grouped by category:

  • App logs: Debug a failed workflow, Error rate by app, Exception analysis, Recent errors by workflow run
  • Data quality: DQ score over time
  • Performance: Job duration analysis
  • Job health: Job success and failure rates
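As illustrative sketches of two of the query categories above (Snowflake syntax; {{DATABASE}} and the Databricks variants are described at the top of this section):

```sql
-- Job success and failure rates over the last 30 days (status_code = 200 means success)
SELECT
  job_name,
  COUNT(*) AS total_runs,
  SUM(CASE WHEN status_code = 200 THEN 1 ELSE 0 END) AS successes,
  SUM(CASE WHEN status_code <> 200 THEN 1 ELSE 0 END) AS failures
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name
ORDER BY failures DESC;

-- Job duration analysis: average run time per job, from the lifecycle timestamps
SELECT
  job_name,
  AVG(DATEDIFF('second', started_at, completed_at)) AS avg_duration_s
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
  AND completed_at IS NOT NULL
GROUP BY job_name;
```

Both queries filter on started_at so partition pruning applies, and both group by job_name, matching the table's sort order.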

See also

  • Data reference: Overview of all Lakehouse namespaces.
  • Get started with Lakehouse: Enable Lakehouse for your organization before running these queries.
  • Use cases: Browse Lakehouse use cases across metadata quality, lineage, glossary analysis, and usage analytics.