Distributed locking

Distributed locking coordinates resource access across multiple workflow instances of your Atlan applications running on different nodes. This coordination prevents race conditions, enforces rate limits, and keeps performance stable when your applications access shared resources such as external APIs, databases, or file systems.

The Application SDK provides a @needs_lock decorator that automatically manages lock acquisition and release around Temporal activity execution.
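
For example, an activity that calls a rate-limited external API might look like the sketch below. The decorator's import path, its parameters (lock_name, max_slots), and the decorator ordering are assumptions for illustration; check the SDK reference for the exact signature.

```python
# Illustrative sketch: the import path, parameter names, and decorator ordering
# for @needs_lock are assumptions, not the confirmed SDK API.
from temporalio import activity

from application_sdk.decorators import needs_lock  # assumed import path


@activity.defn
@needs_lock(lock_name="external-api", max_slots=5)  # assumed parameters
async def fetch_account_metadata(account_id: str) -> dict:
    # Runs only after a lock slot is acquired; the slot is released
    # automatically when the activity completes or fails.
    ...
```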

Why distributed locking matters

Atlan applications run as Temporal workflows where activities execute independently on different worker nodes across your infrastructure. Each worker node operates without inherent knowledge of what other nodes are doing. While this distributed architecture provides scalability and fault tolerance, it creates coordination challenges when multiple workflow instances access shared resources simultaneously.

When multiple workflows run concurrently without coordination, you encounter three primary problems:

🌐 API rate limit violations
Multiple workflows calling the same external API simultaneously can exceed rate limits and cause HTTP 429 throttling errors.

🗄️ Database connection exhaustion
Heavy concurrent queries from multiple workflows can exhaust database connection pools and degrade query performance.

📁 File system bottlenecks
Multiple workflows processing large files simultaneously create I/O bottlenecks that slow down all operations.

Without coordination, you experience unpredictable failures, resource exhaustion, performance degradation, and cascading failures that are difficult to troubleshoot.

How distributed locking works

The Application SDK implements a slot-based locking approach that distributes contention across multiple Redis keys. Each lock configuration creates a pool of available "slots" (independent Redis keys). When an activity needs to execute, it randomly selects a slot and attempts to acquire it. If the slot is available, the activity proceeds. If occupied, the system retries with a different random slot until one becomes available or the workflow timeout is reached.
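
A minimal sketch of that slot-selection loop, assuming a synchronous redis-py client; the function and parameter names (acquire_slot, max_slots) are illustrative, not the SDK's internals.

```python
# Sketch of slot-based acquisition with redis-py; names and backoff are illustrative.
import random
import time

import redis

client = redis.Redis()


def acquire_slot(app: str, lock_name: str, max_slots: int, owner: str,
                 ttl_seconds: int, deadline: float) -> str | None:
    """Try random slots until one is acquired or the deadline passes."""
    while time.monotonic() < deadline:
        slot = random.randrange(max_slots)
        key = f"{app}:{lock_name}:{slot}"
        # SET NX EX: succeeds only if this slot key does not already exist.
        if client.set(key, owner, nx=True, ex=ttl_seconds):
            return key  # slot acquired; caller releases it after the work
        time.sleep(0.1)  # slot occupied; back off briefly, then try another
    return None  # no slot freed up before the workflow timeout
```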

[Diagram: Distributed lock management architecture]

The locking system integrates with Temporal workflows through an interceptor pattern. Temporal interceptors provide hooks into the workflow execution lifecycle, allowing the Application SDK to detect activities that require locking and orchestrate coordination transparently.

When you use the @needs_lock decorator on an activity, the system:

  1. Detects the decorator when the activity is scheduled
  2. Acquires a distributed lock before activity execution
  3. Executes your business logic with the acquired lock
  4. Releases the lock after activity completion or failure
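
The sketch below shows the wrap-around shape of that flow using a Temporal activity inbound interceptor. It is a simplified assumption: the real SDK may detect the decorator on the workflow side when the activity is scheduled, and the `__needs_lock__` marker plus the acquire_lock and release_lock helpers are hypothetical.

```python
# Simplified sketch of the interceptor flow; the SDK's actual hooks may differ.
from typing import Any

from temporalio.worker import ActivityInboundInterceptor, ExecuteActivityInput


async def acquire_lock(config: dict) -> str:
    ...  # hypothetical helper: block until a slot is acquired, return its key


async def release_lock(key: str) -> None:
    ...  # hypothetical helper: ownership-checked release of the slot


class LockingActivityInterceptor(ActivityInboundInterceptor):
    async def execute_activity(self, input: ExecuteActivityInput) -> Any:
        # Marker attribute assumed to be set by @needs_lock.
        lock_config = getattr(input.fn, "__needs_lock__", None)
        if lock_config is None:
            return await self.next.execute_activity(input)  # no locking required

        key = await acquire_lock(lock_config)        # acquire before execution
        try:
            return await self.next.execute_activity(input)  # run business logic
        finally:
            await release_lock(key)                  # release on completion or failure
```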

The lock acquisition process generates a unique owner ID ({APPLICATION_NAME}:{workflow_run_id}), selects a random slot, and attempts to acquire it with a Redis SET command using the NX (set only if the key does not exist) and EX (expiration) options. Lock release runs as an atomic Lua script that verifies ownership before deleting the key.
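
A sketch of that acquire/release pair with redis-py. The Lua script follows the common owner-check pattern (GET, compare, DEL in one atomic step); the SDK's exact script and the key and owner values shown are assumptions.

```python
# Sketch of owner-verified acquire/release; not the SDK's exact implementation.
import redis

client = redis.Redis()

# Delete the slot only if it is still owned by the caller; EVAL runs atomically.
RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
"""


def try_acquire(key: str, owner_id: str, ttl_seconds: int) -> bool:
    # NX: set only if the key does not exist; EX: expire after ttl_seconds.
    return bool(client.set(key, owner_id, nx=True, ex=ttl_seconds))


def release(key: str, owner_id: str) -> bool:
    # Ownership check and delete happen in a single atomic step inside Redis.
    return bool(client.eval(RELEASE_SCRIPT, 1, key, owner_id))


# Owner ID follows the {APPLICATION_NAME}:{workflow_run_id} convention (values illustrative).
owner = "my-app:8f2c1b7e-workflow-run-id"
if try_acquire("my-app:external-api:3", owner, ttl_seconds=300):
    try:
        ...  # perform the rate-limited work
    finally:
        release("my-app:external-api:3", owner)
```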

The system automatically calculates lock TTL (Time To Live) based on your activity's schedule_to_close_timeout configuration. This automatic TTL management prevents deadlocks by ensuring locks are eventually released even if an activity times out or fails.
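
A minimal illustration of that idea, assuming the lock TTL is taken directly from the activity's schedule_to_close_timeout; the SDK's exact formula (for example, whether it adds a safety buffer) is an assumption.

```python
# Assumed TTL derivation; the SDK's exact formula may differ.
from datetime import timedelta

# The activity's end-to-end timeout, as configured in its options.
schedule_to_close_timeout = timedelta(minutes=10)

# Using that timeout as the lock TTL means a slot self-expires even if the
# worker crashes before the release step runs, avoiding permanent deadlocks.
lock_ttl_seconds = int(schedule_to_close_timeout.total_seconds())  # 600
```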

Redis keys follow a hierarchical structure: {APPLICATION_NAME}:{lock_name}:{slot}. This structure isolates different applications, groups related operations, and provides clear ownership patterns for debugging.
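
For illustration, key construction under that layout might look like the snippet below; the application and lock names are made up.

```python
# Hierarchical key layout: {APPLICATION_NAME}:{lock_name}:{slot}; values are illustrative.
APPLICATION_NAME = "my-app"


def slot_key(lock_name: str, slot: int) -> str:
    return f"{APPLICATION_NAME}:{lock_name}:{slot}"


slot_key("external-api", 3)   # "my-app:external-api:3"
slot_key("warehouse-db", 0)   # "my-app:warehouse-db:0"
```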

See also