OBSERVABILITY Namespace

The OBSERVABILITY namespace exposes workflow and job execution metrics from Atlan's internal Lakehouse services. Use it to track data quality (DQ) scores over time, monitor job success and failure rates, analyze retry patterns, and measure job duration across Lakehouse pipelines.

This reference provides complete configuration details for the OBSERVABILITY namespace, including table schemas, column definitions, and when to use each element in your queries.

Core tables

The following table is available in the OBSERVABILITY namespace. Use it when you need to query or report on Lakehouse job runs, data quality scores, or pipeline health.

  • JOB_METRICS: One row per job execution (workflow run). Use this table when you need to analyze job lifecycle, success or failure rates, run duration, or workflow-specific metrics. The table stores lifecycle timestamps, status codes, error messages, and a custom_metrics JSON field whose structure depends on job_name. The table is partitioned by month on started_at; include a time-range filter on started_at in your queries for efficient partition pruning.

JOB_METRICS columns

The columns below define the schema of JOB_METRICS. Use this reference when you write SQL for the OBSERVABILITY namespace or when you need to interpret results. Each row represents a single job execution. The custom_metrics column holds a JSON string whose keys vary by job_name; see Custom metrics by job type for the structure per job type.

| Column | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | Yes | Your tenant identifier. Use it to scope queries to your tenant. |
| service_name | string | Yes | The Lakehouse service that ran the job, for example mdlh. Use it to filter by service. |
| job_name | string | Yes | Logical job or workflow name, for example AtlasDqOrchestrationWorkflow. Use it to filter or group by job type. |
| job_instance_id | string | Yes | Unique identifier for this job run. Use it to join or deduplicate executions. |
| workflow_id | string | No | Workflow identifier when the job is part of a workflow. Use it for workflow-level correlation. |
| trace_id | string | No | Distributed trace ID. Use it to correlate this job with logs or other services. |
| correlation_id | string | No | Cross-service correlation ID. Use it to link related job runs across services. |
| created_at | timestamptz | Yes | When the job record was written (UTC). Use it for audit or ordering. |
| started_at | timestamptz | Yes | When the job started (UTC). Use it for time-range filters and duration calculations. This column is used for partitioning. |
| completed_at | timestamptz | No | When the job finished (UTC). Use it with started_at to compute run duration. |
| environment | string | No | Deployment environment. Use it to filter by environment when relevant. |
| worker_id | string | No | ID of the worker that ran the job. Use it for capacity or worker-level analysis. |
| node_id | string | No | Node identifier. Use it for cluster or node-level analysis. |
| cloud | string | No | Cloud provider, for example aws, azure, or gcp. Use it to segment by cloud. |
| region | string | No | Cloud region. Use it to segment by region. |
| queue_name | string | No | Queue name. Use it when analyzing queue-based execution. |
| attempt_number | int | Yes | Attempt number for this run. Use it to distinguish first run from retries. |
| retry_count | int | No | Total number of retries. Use it to analyze retry behavior. |
| status_message | string | No | Human-readable status, for example SUCCESS. Use it for reporting. |
| status_code | int | Yes | Numeric status code; 200 means success. Use it to filter or aggregate by success or failure. |
| error_message | string | No | Error details when the job failed. Use it for troubleshooting and failure analysis. |
| custom_metrics | string | No | Job-specific metrics as JSON. Use it when you need DQ scores, record counts, or other workflow metrics. Structure varies by job_name; see Custom metrics by job type. |
| version | int | No | Schema version of the job record. Use it when handling multiple schema versions. |
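For example, run duration can be derived from started_at and completed_at. The following is a sketch in Snowflake syntax; on Databricks, timestampdiff(SECOND, started_at, completed_at) applies instead, and {{DATABASE}} is a placeholder for your database or catalog name:

```sql
-- Average run duration in seconds per job over the last 30 days,
-- skipping rows that are still running (completed_at IS NULL).
SELECT job_name,
       AVG(DATEDIFF('second', started_at, completed_at)) AS avg_duration_s
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE completed_at IS NOT NULL
  AND started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name;
```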

Partitioning

The table is partitioned by month on the started_at column. When you query JOB_METRICS, include a time-range filter on started_at (for example, last 7 or 30 days) so the engine can skip irrelevant partitions and run faster.
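As a sketch of partition pruning in practice (Snowflake syntax; {{DATABASE}} is a placeholder for your database or catalog name), a query restricted to the last 30 days touches only recent monthly partitions:

```sql
-- Count job runs in the last 30 days; the started_at filter lets the
-- engine skip older monthly partitions.
SELECT job_name, COUNT(*) AS runs
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name;
```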

Sort order

Rows are stored in order of tenant_id, service_name, job_name, then started_at (all ascending). Queries that filter or group by these columns in a similar order typically perform better.
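A query shaped to match that order might look like the following sketch (Snowflake syntax; the service and job names are illustrative values from this reference, and {{DATABASE}} is a placeholder):

```sql
-- Filters follow the stored sort order (service_name, job_name,
-- started_at), which typically helps the engine skip data.
SELECT job_instance_id, started_at, status_code
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE service_name = 'mdlh'
  AND job_name = 'AtlasDqOrchestrationWorkflow'
  AND started_at >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY started_at;
```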

Custom metrics by job type

The custom_metrics column in JOB_METRICS holds a JSON string whose keys and structure depend on job_name. Use this section when you need to know which fields are available for a given job type (for example, to extract DQ scores or record counts in SQL). The table below lists known job types and the key metrics exposed in each, covering data quality, metadata sync, and table maintenance and scheduling.

| Job name | Description | Key metrics |
|---|---|---|
| AtlasDqOrchestrationWorkflow | Orchestrates a full DQ run across all entity types. | dq_score, total_typedefs, total_atlas_count, total_lh_count, total_missing_count, total_extra_count, total_mismatch_count, total_duration_ms |
| AtlasTypeDefDqWorkflow | Per-entity-type DQ check comparing Atlas to Lakehouse counts. | typedef_name, atlas_count, lh_count, missing_count, extra_count, mismatch_count, duration_ms |
| UsageAnalyticsCountValidationWorkflow | Validates row counts for usage analytics tables. | overall_dq_score, tables_validated, failed_table_count, threshold_passed, total_duration_ms |
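As a sketch of reading one of these fields (Snowflake syntax, where PARSE_JSON converts the string to a variant; on Databricks use get_json_object(custom_metrics, '$.dq_score') instead, and {{DATABASE}} is a placeholder):

```sql
-- Extract dq_score from custom_metrics for orchestration runs.
SELECT started_at,
       PARSE_JSON(custom_metrics):dq_score::FLOAT AS dq_score
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE job_name = 'AtlasDqOrchestrationWorkflow'
  AND started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
ORDER BY started_at;
```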
Info: The set of job types and their custom_metrics schemas may expand as new Lakehouse features are added. To see which job types exist in your tenant, run SELECT DISTINCT job_name on the JOB_METRICS table in your Lakehouse database or catalog.
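That discovery query can be written as follows ({{DATABASE}} is a placeholder for your database or catalog name):

```sql
-- List the job types present in your tenant's JOB_METRICS table.
SELECT DISTINCT job_name
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS;
```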

Example queries

The following examples show how to query the OBSERVABILITY namespace for common use cases: trending DQ scores, job success and failure rates, and job duration. Replace {{DATABASE}} with your Lakehouse database name (Snowflake) or catalog name (Databricks). On Databricks, use DATE_ADD(CURRENT_TIMESTAMP(), -30) instead of DATEADD('day', -30, CURRENT_TIMESTAMP()), and use get_json_object(custom_metrics, '$.dq_score') (or the relevant path) to read JSON fields from custom_metrics.

The example queries fall into three categories:

  • DQ score over time (data quality)
  • Job duration analysis (performance)
  • Job success and failure rates (job health)
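As a sketch of the job success and failure rates category (Snowflake syntax; status_code 200 indicates success per the schema above, and {{DATABASE}} is a placeholder):

```sql
-- Success rate per job over the last 30 days.
SELECT job_name,
       COUNT(*) AS total_runs,
       SUM(CASE WHEN status_code = 200 THEN 1 ELSE 0 END) AS successes,
       ROUND(100.0 * SUM(CASE WHEN status_code = 200 THEN 1 ELSE 0 END)
             / COUNT(*), 2) AS success_pct
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name;
```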

See also

  • Data reference: Overview of all Lakehouse namespaces.
  • Get started with Lakehouse: Enable Lakehouse for your organization before running these queries.
  • Use cases: Browse Lakehouse use cases across metadata quality, lineage, glossary analysis, and usage analytics.