OBSERVABILITY Namespace
The OBSERVABILITY namespace exposes workflow and job execution metrics from Atlan's internal Lakehouse services. Use it to track data quality (DQ) scores over time, monitor job success and failure rates, analyze retry patterns, and measure job duration across Lakehouse pipelines.
This reference provides complete configuration details for the OBSERVABILITY namespace, including table schemas, column definitions, and when to use each element in your queries.
Core tables
The following table is available in the OBSERVABILITY namespace. Use it when you need to query or report on Lakehouse job runs, data quality scores, or pipeline health.
JOB_METRICS: One row per job execution (workflow run). Use this table when you need to analyze job lifecycle, success or failure rates, run duration, or workflow-specific metrics. The table stores lifecycle timestamps, status codes, error messages, and a custom_metrics JSON field whose structure depends on job_name. The table is partitioned by month on started_at; include a time-range filter on started_at in your queries for efficient partition pruning.
JOB_METRICS columns
The columns below define the schema of JOB_METRICS. Use this reference when you write SQL for the OBSERVABILITY namespace or when you need to interpret results. Each row represents a single job execution. The custom_metrics column holds a JSON string whose keys vary by job_name; see Custom metrics by job type for the structure per job type.
| Column | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | Yes | Your tenant identifier. Use it to scope queries to your tenant. |
| service_name | string | Yes | The Lakehouse service that ran the job, for example mdlh. Use it to filter by service. |
| job_name | string | Yes | Logical job or workflow name, for example AtlasDqOrchestrationWorkflow. Use it to filter or group by job type. |
| job_instance_id | string | Yes | Unique identifier for this job run. Use it to join or deduplicate executions. |
| workflow_id | string | No | Workflow identifier when the job is part of a workflow. Use it for workflow-level correlation. |
| trace_id | string | No | Distributed trace ID. Use it to correlate this job with logs or other services. |
| correlation_id | string | No | Cross-service correlation ID. Use it to link related job runs across services. |
| created_at | timestamptz | Yes | When the job record was written (UTC). Use it for audit or ordering. |
| started_at | timestamptz | Yes | When the job started (UTC). Use it for time-range filters and duration calculations. This column is used for partitioning. |
| completed_at | timestamptz | No | When the job finished (UTC). Use it with started_at to compute run duration. |
| environment | string | No | Deployment environment. Use it to filter by environment when relevant. |
| worker_id | string | No | ID of the worker that ran the job. Use it for capacity or worker-level analysis. |
| node_id | string | No | Node identifier. Use it for cluster or node-level analysis. |
| cloud | string | No | Cloud provider, for example aws, azure, or gcp. Use it to segment by cloud. |
| region | string | No | Cloud region. Use it to segment by region. |
| queue_name | string | No | Queue name. Use it when analyzing queue-based execution. |
| attempt_number | int | Yes | Attempt number for this run. Use it to distinguish first run from retries. |
| retry_count | int | No | Total number of retries. Use it to analyze retry behavior. |
| status_message | string | No | Human-readable status, for example SUCCESS. Use it for reporting. |
| status_code | int | Yes | Numeric status code; 200 means success. Use it to filter or aggregate by success or failure. |
| error_message | string | No | Error details when the job failed. Use it for troubleshooting and failure analysis. |
| custom_metrics | string | No | Job-specific metrics as JSON. Use it when you need DQ scores, record counts, or other workflow metrics. Structure varies by job_name; see Custom metrics by job type. |
| version | int | No | Schema version of the job record. Use it when handling multiple schema versions. |
Partitioning
The table is partitioned by month on the started_at column. When you query JOB_METRICS, include a time-range filter on started_at (for example, last 7 or 30 days) so the engine can skip irrelevant partitions and run faster.
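For example, a report over the last 30 days might look like the sketch below (Snowflake syntax; it assumes the namespace maps to a schema named OBSERVABILITY under your {{DATABASE}}):

```sql
-- The time-range filter on started_at lets the engine prune monthly partitions
SELECT job_name,
       COUNT(*) AS runs
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name;
```

Without the started_at predicate, the engine has to scan every monthly partition, so always include one even when you only filter by job_name or status.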
Sort order
Rows are stored in order of tenant_id, service_name, job_name, then started_at (all ascending). Queries that filter or group by these columns in a similar order typically perform better.
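As an illustration, the sketch below filters in the same order as the storage sort (Snowflake syntax; the tenant value and the OBSERVABILITY schema name are placeholder assumptions):

```sql
-- Predicates follow the sort order: tenant_id, service_name, job_name, started_at
SELECT job_name,
       status_message,
       COUNT(*) AS runs
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE tenant_id = 'my-tenant'          -- placeholder: your tenant identifier
  AND service_name = 'mdlh'
  AND started_at >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY job_name, status_message;
```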
Custom metrics by job type
The custom_metrics column in JOB_METRICS holds a JSON string whose keys and structure depend on job_name. Use this section when you need to know which fields are available for a given job type (for example, to extract DQ scores or record counts in SQL). The tables below list known job types and the key metrics exposed in each, grouped into three categories: data quality, metadata sync, and table maintenance and scheduling.
Data quality
| Job name | Description | Key metrics |
|---|---|---|
| AtlasDqOrchestrationWorkflow | Orchestrates a full DQ run across all entity types. | dq_score, total_typedefs, total_atlas_count, total_lh_count, total_missing_count, total_extra_count, total_mismatch_count, total_duration_ms |
| AtlasTypeDefDqWorkflow | Per-entity-type DQ check comparing Atlas to Lakehouse counts. | typedef_name, atlas_count, lh_count, missing_count, extra_count, mismatch_count, duration_ms |
| UsageAnalyticsCountValidationWorkflow | Validates row counts for usage analytics tables. | overall_dq_score, tables_validated, failed_table_count, threshold_passed, total_duration_ms |
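To pull a metric such as dq_score out of the JSON, a query might look like this sketch (Snowflake semi-structured syntax; on Databricks, use get_json_object(custom_metrics, '$.dq_score') instead; the OBSERVABILITY schema name is an assumption):

```sql
-- Daily average DQ score from successful orchestration runs, last 30 days
SELECT DATE_TRUNC('day', started_at) AS run_day,
       AVG(PARSE_JSON(custom_metrics):dq_score::FLOAT) AS avg_dq_score
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE job_name = 'AtlasDqOrchestrationWorkflow'
  AND status_code = 200
  AND started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY run_day
ORDER BY run_day;
```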
Metadata sync
| Job name | Description | Key metrics |
|---|---|---|
| AtlasBulkTypedefRefreshWorkflow | Bulk refresh of entity type definitions into Lakehouse. | typedefs_total, success_count, failed_count, total_records |
| AtlasNotificationProcessorWorkflow | Processes incremental metadata change notifications. | total_files, total_messages, total_batches, total_duration_ms |
| AtlasReconciliationWorkflow | Reconciles mutated assets between Atlas and Lakehouse. | mutated_assets_extracted, typedefs_partitioned, total_records_upserted |
| SnowflakeIncrementalExtractionWorkflow | Extracts incremental changes for Snowflake-connected tables. | tables_processed, total_records, total_duration_ms |
| DataConnectionProcessingWorkflow | Processes data connection records into Lakehouse. | records_processed, records_skipped, destination_namespace, destination_table, total_duration_ms |
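For sync workflows that report total_records, you can combine lifecycle columns with JSON metrics in one pass, as in this sketch (Snowflake syntax; the OBSERVABILITY schema name is an assumption):

```sql
-- Success rate and record throughput for two sync workflows, last 7 days
SELECT job_name,
       COUNT(*) AS runs,
       AVG(CASE WHEN status_code = 200 THEN 1 ELSE 0 END) AS success_rate,
       SUM(TRY_PARSE_JSON(custom_metrics):total_records::INT) AS total_records
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE job_name IN ('AtlasBulkTypedefRefreshWorkflow',
                   'SnowflakeIncrementalExtractionWorkflow')
  AND started_at >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY job_name;
```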
Table maintenance and scheduling
| Job name | Description | Key metrics |
|---|---|---|
| IcebergCompactionWorkflow | Compacts small Iceberg data files for query performance. | No custom metrics (lifecycle only). |
| IcebergSnapshotCleanupWorkflow | Expires old Iceberg snapshots to reclaim storage. | No custom metrics (lifecycle only). |
| IcebergOrphanFileCleanupWorkflow | Removes orphaned data files from object storage. | No custom metrics (lifecycle only). |
| Various *SchedulerWorkflow | Scheduler jobs that trigger other workflows on a schedule. | No custom metrics (lifecycle only). |
The set of job types and their custom_metrics schemas may expand as new Lakehouse features are added. To see which job types exist in your tenant, run SELECT DISTINCT job_name on the JOB_METRICS table in your Lakehouse database or catalog.
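That check can be written as follows (the OBSERVABILITY schema name under {{DATABASE}} is an assumption; adjust to your catalog layout):

```sql
-- List the job types currently reporting metrics in your tenant
SELECT DISTINCT job_name
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
ORDER BY job_name;
```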
Example queries
The following examples show how to query the OBSERVABILITY namespace for common use cases: trending DQ scores, job success and failure rates, and job duration. Replace {{DATABASE}} with your Lakehouse database name (Snowflake) or catalog name (Databricks). On Databricks, use DATE_ADD(CURRENT_TIMESTAMP(), -30) instead of DATEADD('day', -30, CURRENT_TIMESTAMP()), and use get_json_object(custom_metrics, '$.dq_score') (or the relevant path) to read JSON fields from custom_metrics.
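As one illustration of the duration use case, the sketch below computes per-job run duration and retry totals (Snowflake syntax; the OBSERVABILITY schema name is an assumption):

```sql
-- Average and max run duration (seconds) plus retry totals per job, last 30 days
SELECT job_name,
       COUNT(*) AS runs,
       AVG(DATEDIFF('second', started_at, completed_at)) AS avg_duration_s,
       MAX(DATEDIFF('second', started_at, completed_at)) AS max_duration_s,
       SUM(COALESCE(retry_count, 0)) AS total_retries
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE completed_at IS NOT NULL   -- exclude runs still in flight
  AND started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name
ORDER BY avg_duration_s DESC;
```

The completed_at IS NOT NULL predicate matters because completed_at is nullable; in-flight runs would otherwise produce NULL durations.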
See also
- Data reference: Overview of all Lakehouse namespaces.
- Get started with Lakehouse: Enable Lakehouse for your organization before running these queries.
- Use cases: Browse Lakehouse use cases across metadata quality, lineage, glossary analysis, and usage analytics.