Distributed locking issues

Resolve common distributed locking errors related to Redis connectivity, lock acquisition, and Sentinel configuration in your Atlan applications.

Redis connection failure

Error

ATLAN-CLIENT-503-00: Redis connection failure

Cause

The application can't establish a connection to the Redis server. This occurs when Redis is unreachable, authentication credentials are incorrect, or network connectivity issues prevent communication.

Solution

  1. Verify Redis server is running:

    # Check Redis service status
    systemctl status redis
  2. Test basic connectivity:

    # Test without authentication
    redis-cli -h $REDIS_HOST -p $REDIS_PORT ping

    # Test with authentication
    redis-cli -h $REDIS_HOST -p $REDIS_PORT -a $REDIS_PASSWORD ping
  3. Check environment variables are set correctly:

    echo $REDIS_HOST
    echo $REDIS_PORT
    echo $REDIS_PASSWORD
    echo $IS_LOCKING_DISABLED
  4. Verify network connectivity:

    # Test port connectivity
    telnet $REDIS_HOST $REDIS_PORT

    # Probe the port; a "filtered" state usually means a firewall is dropping traffic
    nmap -p $REDIS_PORT $REDIS_HOST
  5. Confirm that firewall rules permit connections on the Redis port from your application hosts.

  6. Validate Redis authentication credentials match your configuration.
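If telnet or nmap aren't installed on the application host, the port check in step 4 can be approximated from Python's standard library. This is a minimal sketch, not part of the Atlan SDK: the `can_reach` helper is a hypothetical name, and the `localhost`/`6379` fallbacks are assumptions used only when `REDIS_HOST` and `REDIS_PORT` are unset. It confirms TCP reachability only; it doesn't validate authentication.

```python
import os
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Fallback values are assumptions for illustration only.
    host = os.environ.get("REDIS_HOST", "localhost")
    port = int(os.environ.get("REDIS_PORT", "6379"))
    print(f"{host}:{port} reachable: {can_reach(host, port)}")
```

A `False` result points to a network, DNS, or firewall problem rather than a credentials problem, which narrows down which of the steps above to revisit.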


Lock not available

Error

ATLAN-ACTIVITY-503-01: Lock not available

Cause

The activity timed out while waiting for an available lock slot. This occurs when lock contention is high and all slots are occupied for longer than the activity timeout period.

Solution

  1. Increase the schedule_to_close_timeout for high-contention activities:

    result = await workflow.execute_activity(
        locked_activity,
        args=[data],
        schedule_to_close_timeout=timedelta(minutes=15),  # Increased from 5
    )
  2. Monitor lock contention patterns to identify bottlenecks.

  3. Consider increasing max_locks if resource capacity supports it:

    @needs_lock(max_locks=10, lock_name="api_calls")  # Increased from 5
    async def api_activity(data: dict):
        return await process_data(data)
  4. Split long-running operations into separate activities to reduce lock duration.

  5. Adjust LOCK_RETRY_INTERVAL to retry more frequently:

    export LOCK_RETRY_INTERVAL=3  # Reduced from 5 seconds
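To see how the timeout and the retry interval interact, it can help to estimate how many acquisition attempts fit inside one activity timeout. The sketch below is a rough model and not the SDK's actual logic: it assumes a first attempt at time zero followed by retries at a fixed interval, and it ignores the time each attempt takes. The `max_lock_attempts` function is a hypothetical helper.

```python
import math
from datetime import timedelta


def max_lock_attempts(timeout: timedelta, retry_interval_s: float) -> int:
    """Upper bound on attempts: one immediate try plus one per full interval."""
    if retry_interval_s <= 0:
        raise ValueError("retry_interval_s must be positive")
    return 1 + math.floor(timeout.total_seconds() / retry_interval_s)


# 5-minute timeout with a 5-second interval (the "before" values above)
print(max_lock_attempts(timedelta(minutes=5), 5))   # 61
# 15-minute timeout with a 3-second interval (the "after" values above)
print(max_lock_attempts(timedelta(minutes=15), 3))  # 301
```

Under these assumptions, the two changes together give the activity roughly five times as many chances to acquire a slot. More frequent retries also mean more requests to Redis, so the interval is a trade-off rather than a value to minimize.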

Sentinel master discovery failure

Error

Failed to discover Redis master through Sentinel

Cause

The application can't discover the Redis master through Sentinel. This occurs when Sentinel instances aren't running, the Sentinel service name is incorrect, or network connectivity to the Sentinel hosts is blocked.

Solution

  1. Verify all sentinel instances are running:

    # Check sentinel service status on each host
    systemctl status redis-sentinel
  2. Test sentinel connectivity:

    # Test connection to each sentinel
    redis-cli -h sentinel1.example.com -p 26379 sentinel masters
    redis-cli -h sentinel2.example.com -p 26379 sentinel masters
    redis-cli -h sentinel3.example.com -p 26379 sentinel masters
  3. Verify the sentinel service name matches your configuration:

    # Check sentinel configuration
    redis-cli -h sentinel1.example.com -p 26379 sentinel master mymaster
  4. Check environment variables are correct:

    echo $REDIS_SENTINEL_HOSTS
    echo $REDIS_SENTINEL_SERVICE
    echo $REDIS_PASSWORD
  5. Check network connectivity to all sentinel hosts:

    # Test connectivity to each sentinel
    telnet sentinel1.example.com 26379
    telnet sentinel2.example.com 26379
    telnet sentinel3.example.com 26379
  6. Validate sentinel configuration files have the correct master name and replication setup.
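The per-host connectivity checks above can be scripted so that every configured Sentinel is tested in one pass. The sketch below assumes that `REDIS_SENTINEL_HOSTS` holds a comma-separated list of `host:port` entries; that format is an assumption, so adjust the parsing if your deployment uses a different one. Both function names are hypothetical, and IPv6 literals aren't handled.

```python
import os
import socket


def parse_sentinel_hosts(value: str, default_port: int = 26379) -> list[tuple[str, int]]:
    """Parse 'host1:26379,host2' into [(host, port), ...] (assumed format)."""
    hosts = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            hosts.append((host, int(port)))
        else:
            # No port given: fall back to the conventional Sentinel port.
            hosts.append((entry, default_port))
    return hosts


def unreachable_sentinels(hosts, timeout: float = 3.0):
    """Return the (host, port) pairs that refuse or time out on a TCP connect."""
    failed = []
    for host, port in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failed.append((host, port))
    return failed


if __name__ == "__main__":
    configured = parse_sentinel_hosts(os.environ.get("REDIS_SENTINEL_HOSTS", ""))
    print("Unreachable sentinels:", unreachable_sentinels(configured))
```

An empty list means every configured Sentinel accepted a TCP connection. It doesn't confirm that the Sentinels agree on the master, so the `sentinel master` check in step 3 is still needed.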


Missing schedule_to_close_timeout

Error

Activity with @needs_lock requires schedule_to_close_timeout

Cause

The activity decorated with @needs_lock doesn't specify a schedule_to_close_timeout. The system requires this timeout to calculate lock TTL and prevent deadlocks.

Solution

Add schedule_to_close_timeout when executing the activity:

# Before (incorrect)
result = await workflow.execute_activity(
    locked_activity,
    args=[data],
)

# After (correct)
result = await workflow.execute_activity(
    locked_activity,
    args=[data],
    schedule_to_close_timeout=timedelta(minutes=5),
)

Need help

If you need assistance after trying the steps, contact Atlan support: Submit a request.