OBSERVABILITY namespace
The OBSERVABILITY namespace exposes operational data from Atlan's Lakehouse services. Use it to track data quality (DQ) scores over time, monitor job success and failure rates, analyze retry patterns, measure job duration across Lakehouse pipelines, and investigate app execution logs.
This reference provides complete configuration details for the OBSERVABILITY namespace, including table schemas, column definitions, and when to use each element in your queries.
Core tables
The following tables are available in the OBSERVABILITY namespace. Use them when you need to query or report on Lakehouse job runs, data quality scores, pipeline health, or app execution logs.
- JOB_METRICS: One row per internal Lakehouse job execution (workflow run). Use this table when you need to analyze job lifecycle, success or failure rates, run duration, or workflow-specific metrics for internal Lakehouse pipelines (for example, DQ workflows, metadata sync, and table maintenance). The table stores lifecycle timestamps, status codes, error messages, and a custom_metrics JSON field whose structure depends on job_name. The table is partitioned by month on started_at; include a time-range filter on started_at in your queries for efficient partition pruning.
- APP_LOGS: One row per log event from an Atlan app or connector execution (for example, a Snowflake or Redshift connector run). Use this table when you need to debug a failed workflow, investigate errors or exceptions from app executions, or analyze log events from connector runs. The table is partitioned by day on timestamp; include a time-range filter on timestamp in your queries for efficient partition pruning.
JOB_METRICS and APP_LOGS track different systems—JOB_METRICS covers internal Lakehouse jobs, while APP_LOGS covers Atlan app and connector workflows. These tables can't be joined.
Column reference
- JOB_METRICS
- APP_LOGS
One row per job execution (workflow run). The custom_metrics column holds a JSON string whose structure depends on job_name; see Custom metrics by job type for details.
| Column | Type | Description |
|---|---|---|
| tenant_id | string | Your tenant identifier. Use it to scope queries to your tenant. |
| service_name | string | The Lakehouse service that ran the job, for example mdlh. Use it to filter by service. |
| job_name | string | Logical job or workflow name, for example AtlasDqOrchestrationWorkflow. Use it to filter or group by job type. |
| job_instance_id | string | Unique identifier for this job run. Use it to join or deduplicate executions. |
| workflow_id | string | Workflow identifier when the job is part of a workflow. Use it for workflow-level correlation. |
| trace_id | string | Distributed trace ID. Use it to correlate this job with logs or other services. |
| correlation_id | string | Cross-service correlation ID. Use it to link related job runs across services. |
| created_at | timestamptz | When the job record was written (UTC). Use it for audit or ordering. |
| started_at | timestamptz | When the job started (UTC). Use it for time-range filters and duration calculations. This column is used for partitioning. |
| completed_at | timestamptz | When the job finished (UTC). Use it with started_at to compute run duration. |
| environment | string | Deployment environment. Use it to filter by environment when relevant. |
| worker_id | string | ID of the worker that ran the job. Use it for capacity or worker-level analysis. |
| node_id | string | Node identifier. Use it for cluster or node-level analysis. |
| cloud | string | Cloud provider, for example aws, azure, or gcp. Use it to segment by cloud. |
| region | string | Cloud region. Use it to segment by region. |
| queue_name | string | Queue name. Use it when analyzing queue-based execution. |
| attempt_number | int | Attempt number for this run. Use it to distinguish first run from retries. |
| retry_count | int | Total number of retries. Use it to analyze retry behavior. |
| status_message | string | Human-readable status, for example SUCCESS. Use it for reporting. |
| status_code | int | Numeric status code; 200 means success. Use it to filter or aggregate by success or failure. |
| error_message | string | Error details when the job failed. Use it for troubleshooting and failure analysis. |
| custom_metrics | string | Job-specific metrics as JSON. Use it when you need DQ scores, record counts, or other workflow metrics. Structure varies by job_name; see Custom metrics by job type. |
| version | int | Schema version of the job record. Use it when handling multiple schema versions. |
Partitioned by month on started_at. Include a time-range filter on started_at (for example, last 7 or 30 days) so the engine can skip irrelevant partitions.
Rows are sorted by tenant_id → service_name → job_name → started_at (all ascending). Queries that filter or group by these columns in order typically perform better.
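As a sketch, a Snowflake-syntax query that follows both the partitioning and sort-order guidance might look like this. It assumes the table is reachable as {{DATABASE}}.OBSERVABILITY.JOB_METRICS (substitute your own database name and qualification):

```sql
-- Success vs. failure counts and average duration per job, last 30 days.
-- The started_at filter enables monthly partition pruning; grouping by
-- job_name follows the table's sort order.
SELECT
    job_name,
    COUNT_IF(status_code = 200)  AS successes,
    COUNT_IF(status_code <> 200) AS failures,
    AVG(DATEDIFF('second', started_at, completed_at)) AS avg_duration_s
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name
ORDER BY failures DESC;
```

On Databricks, replace the DATEADD expression with the DATE_ADD form described under Example queries.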
One row per log event from an Atlan app or connector execution (for example, a Snowflake or Redshift connector run). Use this table to debug failed connector workflows, investigate exceptions, and analyze log output from app executions.
| Column | Type | Description |
|---|---|---|
| timestamp | timestamp (without timezone) | When the log event occurred (microsecond precision). Use it for time-range filters and ordering. This column is used for partitioning. |
| level | string | Log severity, for example INFO, WARN, or ERROR. Use it to filter by severity. |
| message | string | Log message body. Use it to search for specific diagnostic output. |
| correlation_id | string | Cross-service correlation identifier for a workflow execution. Use it to retrieve all logs for a single workflow run. |
| app_name | string | Name of the app that emitted the log, for example snowflake or redshift. Use it to filter or group by app. |
| logger_name | string | Logger or scope name within the app. Use it for fine-grained filtering. |
| trace_id | string | Distributed trace ID. Use it to correlate log events within the same distributed trace. |
| span_id | string | Span ID within a trace. Use it for detailed trace-level analysis. |
| exception_type | string | Exception class name when the log records an error. Use it to group or filter by exception type. |
| exception_message | string | Exception message when the log records an error. Use it for troubleshooting. |
| exception_stacktrace | string | Full stack trace when the log records an error. Use it for root-cause analysis. |
| tenant_id | string | Your tenant identifier. Use it to scope queries to your tenant. |
Partitioned by day on timestamp. Include a time-range filter on timestamp (for example, last 24 hours or last 7 days) so the engine can skip irrelevant partitions.
Rows are sorted by correlation_id → timestamp (both ascending). Queries that filter by correlation_id and order by timestamp perform best—this matches the most common pattern of retrieving all logs for a single workflow run in chronological order.
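A minimal sketch of that common pattern, in Snowflake syntax (assuming the table is reachable as {{DATABASE}}.OBSERVABILITY.APP_LOGS; the correlation ID placeholder is yours to fill in):

```sql
-- All log events for one workflow run, in chronological order.
-- Filtering on correlation_id matches the sort order, and the timestamp
-- filter enables daily partition pruning.
SELECT timestamp, level, app_name, message, exception_type
FROM {{DATABASE}}.OBSERVABILITY.APP_LOGS
WHERE correlation_id = '<your-correlation-id>'
  AND timestamp >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY timestamp;
```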
Custom metrics by job type
The custom_metrics column in JOB_METRICS holds a JSON string whose keys and structure depend on job_name. Use this section when you need to know which fields are available for a given job type (for example, to extract DQ scores or record counts in SQL). The tables below list known job types and the key metrics exposed in each. Use the tabs to browse by category: data quality, metadata sync, or table maintenance and scheduling.
- Data quality
- Metadata sync
- Table maintenance and scheduling
| Job name | Description | Key metrics |
|---|---|---|
| AtlasDqOrchestrationWorkflow | Orchestrates a full DQ run across all entity types. | dq_score, total_typedefs, total_atlas_count, total_lh_count, total_missing_count, total_extra_count, total_mismatch_count, total_duration_ms |
| AtlasTypeDefDqWorkflow | Per-entity-type DQ check comparing Atlas to Lakehouse counts. | typedef_name, atlas_count, lh_count, missing_count, extra_count, mismatch_count, duration_ms |
| UsageAnalyticsCountValidationWorkflow | Validates row counts for usage analytics tables. | overall_dq_score, tables_validated, failed_table_count, threshold_passed, total_duration_ms |
| Job name | Description | Key metrics |
|---|---|---|
| AtlasBulkTypedefRefreshWorkflow | Bulk refresh of entity type definitions into Lakehouse. | typedefs_total, success_count, failed_count, total_records |
| AtlasNotificationProcessorWorkflow | Processes incremental metadata change notifications. | total_files, total_messages, total_batches, total_duration_ms |
| AtlasReconciliationWorkflow | Reconciles mutated assets between Atlas and Lakehouse. | mutated_assets_extracted, typedefs_partitioned, total_records_upserted |
| SnowflakeIncrementalExtractionWorkflow | Extracts incremental changes for Snowflake-connected tables. | tables_processed, total_records, total_duration_ms |
| DataConnectionProcessingWorkflow | Processes data connection records into Lakehouse. | records_processed, records_skipped, destination_namespace, destination_table, total_duration_ms |
| Job name | Description | Key metrics |
|---|---|---|
| IcebergCompactionWorkflow | Compacts small Iceberg data files for query performance. | No custom metrics (lifecycle only). |
| IcebergSnapshotCleanupWorkflow | Expires old Iceberg snapshots to reclaim storage. | No custom metrics (lifecycle only). |
| IcebergOrphanFileCleanupWorkflow | Removes orphaned data files from object storage. | No custom metrics (lifecycle only). |
| Various *SchedulerWorkflow | Scheduler jobs that trigger other workflows on a schedule. | No custom metrics (lifecycle only). |
The set of job types and their custom_metrics schemas may expand as new Lakehouse features are added. To see which job types exist in your tenant, run SELECT DISTINCT job_name on the JOB_METRICS table in your Lakehouse database or catalog.
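For example, extracting the dq_score metric emitted by AtlasDqOrchestrationWorkflow might look like the following sketch in Snowflake syntax (assuming the table is reachable as {{DATABASE}}.OBSERVABILITY.JOB_METRICS; custom_metrics is stored as a JSON string, so it must be parsed first):

```sql
-- Daily average DQ score over the last 30 days.
-- PARSE_JSON converts the custom_metrics string to a VARIANT so the
-- dq_score key can be extracted and cast to a number.
SELECT
    DATE_TRUNC('day', started_at) AS run_day,
    AVG(PARSE_JSON(custom_metrics):dq_score::FLOAT) AS avg_dq_score
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE job_name = 'AtlasDqOrchestrationWorkflow'
  AND started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY run_day
ORDER BY run_day;
```

On Databricks, use get_json_object(custom_metrics, '$.dq_score') in place of the PARSE_JSON expression.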
Example queries
The following examples show how to query the OBSERVABILITY namespace for common use cases: trending DQ scores, job success and failure rates, job duration, and app log analysis. Replace {{DATABASE}} with your Lakehouse database name (Snowflake) or catalog name (Databricks). On Databricks, use DATE_ADD(CURRENT_TIMESTAMP(), -30) instead of DATEADD('day', -30, CURRENT_TIMESTAMP()), and use get_json_object(custom_metrics, '$.dq_score') (or the relevant path) to read JSON fields from custom_metrics.
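To illustrate the dialect differences above, here is the same retry-analysis query sketched for each engine (column and table names are from the JOB_METRICS reference; adjust qualification to your environment):

```sql
-- Snowflake: jobs with the most retries over the last 30 days.
SELECT job_name, MAX(retry_count) AS max_retries
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name
ORDER BY max_retries DESC;

-- Databricks: same query with the DATE_ADD form.
SELECT job_name, MAX(retry_count) AS max_retries
FROM {{DATABASE}}.observability.job_metrics
WHERE started_at >= DATE_ADD(CURRENT_TIMESTAMP(), -30)
GROUP BY job_name
ORDER BY max_retries DESC;
```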
See also
- Data reference: Overview of all Lakehouse namespaces.
- Get started with Lakehouse: Enable Lakehouse for your organization before running these queries.
- Use cases: Browse Lakehouse use cases across metadata quality, lineage, glossary analysis, and usage analytics.