OBSERVABILITY namespace
The OBSERVABILITY namespace exposes operational data from Atlan's Lakehouse services. Use it to track data quality (DQ) scores over time, monitor job success and failure rates, analyze retry patterns, measure job duration across Lakehouse pipelines, and investigate app execution logs.
This reference provides complete configuration details for the OBSERVABILITY namespace, including table schemas, column definitions, and when to use each element in your queries.
Core tables
The following tables are available in the OBSERVABILITY namespace. Use them when you need to query or report on Lakehouse job runs, data quality scores, pipeline health, or app execution logs.
- JOB_METRICS: One row per internal Lakehouse job execution (workflow run). Use this table when you need to analyze job lifecycle, success or failure rates, run duration, or workflow-specific metrics for internal Lakehouse pipelines (for example, DQ workflows, metadata sync, and table maintenance). The table stores lifecycle timestamps, status codes, error messages, and a custom_metrics JSON field whose structure depends on job_name. The table is partitioned by month on started_at; include a time-range filter on started_at in your queries for efficient partition pruning.
- APP_LOGS: One row per log event from an Atlan app or connector execution (for example, a Snowflake or Redshift connector run). Use this table when you need to debug a failed workflow, investigate errors or exceptions from app executions, or analyze log events from connector runs. The table is partitioned by day on timestamp; include a time-range filter on timestamp in your queries for efficient partition pruning.
JOB_METRICS and APP_LOGS track different systems—JOB_METRICS covers internal Lakehouse jobs, while APP_LOGS covers Atlan app and connector workflows. These tables can't be joined.
Column reference
- JOB_METRICS
- APP_LOGS
One row per job execution (workflow run). The custom_metrics column holds a JSON string whose structure depends on job_name; see Custom metrics by job type for details.
| Column | Type | Description |
|---|---|---|
| tenant_id | string | Your tenant identifier. Use it to scope queries to your tenant. |
| service_name | string | The Lakehouse service that ran the job, for example mdlh. Use it to filter by service. |
| job_name | string | Logical job or workflow name, for example AtlasDqOrchestrationWorkflow. Use it to filter or group by job type. |
| job_instance_id | string | Unique identifier for this job run. Use it to join or deduplicate executions. |
| workflow_id | string | Workflow identifier when the job is part of a workflow. Use it for workflow-level correlation. |
| trace_id | string | Distributed trace ID. Use it to correlate this job with logs or other services. |
| correlation_id | string | Cross-service correlation ID. Use it to link related job runs across services. |
| created_at | timestamptz | When the job record was written (UTC). Use it for audit or ordering. |
| started_at | timestamptz | When the job started (UTC). Use it for time-range filters and duration calculations. This column is used for partitioning. |
| completed_at | timestamptz | When the job finished (UTC). Use it with started_at to compute run duration. |
| environment | string | Deployment environment. Use it to filter by environment when relevant. |
| worker_id | string | ID of the worker that ran the job. Use it for capacity or worker-level analysis. |
| node_id | string | Node identifier. Use it for cluster or node-level analysis. |
| cloud | string | Cloud provider, for example aws, azure, or gcp. Use it to segment by cloud. |
| region | string | Cloud region. Use it to segment by region. |
| queue_name | string | Queue name. Use it when analyzing queue-based execution. |
| attempt_number | int | Attempt number for this run. Use it to distinguish first run from retries. |
| retry_count | int | Total number of retries. Use it to analyze retry behavior. |
| status_message | string | Human-readable status, for example SUCCESS. Use it for reporting. |
| status_code | int | Numeric status code; 200 means success. Use it to filter or aggregate by success or failure. |
| error_message | string | Error details when the job failed. Use it for troubleshooting and failure analysis. |
| custom_metrics | string | Job-specific metrics as JSON. Use it when you need DQ scores, record counts, or other workflow metrics. Structure varies by job_name; see Custom metrics by job type. |
| version | int | Schema version of the job record. Use it when handling multiple schema versions. |
Partitioned by month on started_at. Include a time-range filter on started_at (for example, last 7 or 30 days) so the engine can skip irrelevant partitions.
Rows are sorted by tenant_id → service_name → job_name → started_at (all ascending). Queries that filter or group by these columns in order typically perform better.
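As a sketch, a Snowflake-syntax query that follows both the partitioning and sort-order guidance might look like this. It assumes the table is reachable as {{DATABASE}}.OBSERVABILITY.JOB_METRICS (substitute your own database name and qualification):

```sql
-- Success vs. failure counts and average duration per job, last 30 days.
-- The started_at filter enables monthly partition pruning; grouping by
-- job_name follows the table's sort order.
SELECT
    job_name,
    COUNT_IF(status_code = 200)  AS successes,
    COUNT_IF(status_code <> 200) AS failures,
    AVG(DATEDIFF('second', started_at, completed_at)) AS avg_duration_s
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name
ORDER BY failures DESC;
```

On Databricks, replace the DATEADD expression with the DATE_ADD form described under Example queries.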
One row per log event from an Atlan app or connector execution (for example, a Snowflake or Redshift connector run). Use this table to debug failed connector workflows, investigate exceptions, and analyze log output from app executions.
| Column | Type | Description |
|---|---|---|
| timestamp | timestamp (without timezone) | When the log event occurred (microsecond precision). Use it for time-range filters and ordering. This column is used for partitioning. |
| level | string | Log severity, for example INFO, WARN, or ERROR. Use it to filter by severity. |
| message | string | Log message body. Use it to search for specific diagnostic output. |
| correlation_id | string | Cross-service correlation identifier for a workflow execution. Use it to retrieve all logs for a single workflow run. |
| app_name | string | Name of the app that emitted the log, for example snowflake or redshift. Use it to filter or group by app. |
| logger_name | string | Logger or scope name within the app. Use it for fine-grained filtering. |
| trace_id | string | Distributed trace ID. Use it to correlate log events within the same distributed trace. |
| span_id | string | Span ID within a trace. Use it for detailed trace-level analysis. |
| exception_type | string | Exception class name when the log records an error. Use it to group or filter by exception type. |
| exception_message | string | Exception message when the log records an error. Use it for troubleshooting. |
| exception_stacktrace | string | Full stack trace when the log records an error. Use it for root-cause analysis. |
| tenant_id | string | Your tenant identifier. Use it to scope queries to your tenant. |
Partitioned by day on timestamp. Include a time-range filter on timestamp (for example, last 24 hours or last 7 days) so the engine can skip irrelevant partitions.
Rows are sorted by correlation_id → timestamp (both ascending). Queries that filter by correlation_id and order by timestamp perform best—this matches the most common pattern of retrieving all logs for a single workflow run in chronological order.
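A minimal sketch of that common pattern, in Snowflake syntax (assuming the table is reachable as {{DATABASE}}.OBSERVABILITY.APP_LOGS; the correlation ID placeholder is yours to fill in):

```sql
-- All log events for one workflow run, in chronological order.
-- Filtering on correlation_id matches the sort order, and the timestamp
-- filter enables daily partition pruning.
SELECT timestamp, level, app_name, message, exception_type
FROM {{DATABASE}}.OBSERVABILITY.APP_LOGS
WHERE correlation_id = '<your-correlation-id>'
  AND timestamp >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY timestamp;
```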
Custom metrics by job type
The custom_metrics column in JOB_METRICS holds a JSON string whose keys and structure depend on job_name. Use this section when you need to know which fields are available for a given job type (for example, to extract DQ scores or record counts in SQL). The tables below list known job types and the key metrics exposed in each. Use the tabs to browse by category: data quality, metadata sync, or table maintenance and scheduling.
- Data quality
- Metadata sync
- Table maintenance and scheduling
| Job name | Description | Key metrics |
|---|---|---|
| AtlasDqOrchestrationWorkflow | Orchestrates a full DQ run across all entity types. | dq_score, total_typedefs, total_atlas_count, total_lh_count, total_missing_count, total_extra_count, total_mismatch_count, total_duration_ms |
| AtlasTypeDefDqWorkflow | Per-entity-type DQ check comparing Atlas to Lakehouse counts. | typedef_name, atlas_count, lh_count, missing_count, extra_count, mismatch_count, duration_ms |
| UsageAnalyticsCountValidationWorkflow | Validates row counts for usage analytics tables. | overall_dq_score, tables_validated, failed_table_count, threshold_passed, total_duration_ms |
| Job name | Description | Key metrics |
|---|---|---|
| AtlasBulkTypedefRefreshWorkflow | Bulk refresh of entity type definitions into Lakehouse. | typedefs_total, success_count, failed_count, total_records |
| AtlasNotificationProcessorWorkflow | Processes incremental metadata change notifications. | total_files, total_messages, total_batches, total_duration_ms |
| AtlasReconciliationWorkflow | Reconciles mutated assets between Atlas and Lakehouse. | mutated_assets_extracted, typedefs_partitioned, total_records_upserted |
| SnowflakeIncrementalExtractionWorkflow | Extracts incremental changes for Snowflake-connected tables. | tables_processed, total_records, total_duration_ms |
| DataConnectionProcessingWorkflow | Processes data connection records into Lakehouse. | records_processed, records_skipped, destination_namespace, destination_table, total_duration_ms |
| Job name | Description | Key metrics |
|---|---|---|
| IcebergCompactionWorkflow | Compacts small Iceberg data files for query performance. | No custom metrics (lifecycle only). |
| IcebergSnapshotCleanupWorkflow | Expires old Iceberg snapshots to reclaim storage. | No custom metrics (lifecycle only). |
| IcebergOrphanFileCleanupWorkflow | Removes orphaned data files from object storage. | No custom metrics (lifecycle only). |
| Various *SchedulerWorkflow | Scheduler jobs that trigger other workflows on a schedule. | No custom metrics (lifecycle only). |
The set of job types and their custom_metrics schemas may expand as new Lakehouse features are added. To see which job types exist in your tenant, run SELECT DISTINCT job_name on the JOB_METRICS table in your Lakehouse database or catalog.
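For example, extracting the dq_score metric emitted by AtlasDqOrchestrationWorkflow might look like the following sketch in Snowflake syntax (assuming the table is reachable as {{DATABASE}}.OBSERVABILITY.JOB_METRICS; custom_metrics is stored as a JSON string, so it must be parsed first):

```sql
-- Daily average DQ score over the last 30 days.
-- PARSE_JSON converts the custom_metrics string to a VARIANT so the
-- dq_score key can be extracted and cast to a number.
SELECT
    DATE_TRUNC('day', started_at) AS run_day,
    AVG(PARSE_JSON(custom_metrics):dq_score::FLOAT) AS avg_dq_score
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE job_name = 'AtlasDqOrchestrationWorkflow'
  AND started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY run_day
ORDER BY run_day;
```

On Databricks, use get_json_object(custom_metrics, '$.dq_score') in place of the PARSE_JSON expression.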
Example queries
The following examples show how to query the OBSERVABILITY namespace for common use cases: trending DQ scores, job success and failure rates, job duration, and app log analysis. Replace {{DATABASE}} with your Lakehouse database name (Snowflake) or catalog name (Databricks). On Databricks, use DATE_ADD(CURRENT_TIMESTAMP(), -30) instead of DATEADD('day', -30, CURRENT_TIMESTAMP()), and use get_json_object(custom_metrics, '$.dq_score') (or the relevant path) to read JSON fields from custom_metrics.
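To illustrate the dialect differences above, here is the same retry-analysis query sketched for each engine (column and table names are from the JOB_METRICS reference; adjust qualification to your environment):

```sql
-- Snowflake: jobs with the most retries over the last 30 days.
SELECT job_name, MAX(retry_count) AS max_retries
FROM {{DATABASE}}.OBSERVABILITY.JOB_METRICS
WHERE started_at >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY job_name
ORDER BY max_retries DESC;

-- Databricks: same query with the DATE_ADD form.
SELECT job_name, MAX(retry_count) AS max_retries
FROM {{DATABASE}}.observability.job_metrics
WHERE started_at >= DATE_ADD(CURRENT_TIMESTAMP(), -30)
GROUP BY job_name
ORDER BY max_retries DESC;
```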
See also
- Data reference: Overview of all Lakehouse namespaces.
- Get started with Lakehouse: Enable Lakehouse for your organization before running these queries.
- Use cases: Browse Lakehouse use cases across metadata quality, lineage, glossary analysis, and usage analytics.