Retry strategies
Retry strategies are the mechanisms that enable apps to recover from transient failures automatically. They combine timeouts, retry policies, and heartbeats to detect failures and re-attempt work that's safe to retry, ensuring your apps remain resilient under real-world conditions.
The Application SDK uses Temporal's timeout and retry system to control how long operations can run and how many times they retry on failure. Workflows and Activities each have distinct timeout types and retry behaviors: Workflow timeouts bound overall execution windows, while Activity timeouts control the duration of external calls and long-running operations. Heartbeats enable long-running Activities to signal progress, allowing the system to distinguish genuinely stuck operations from healthy tasks that take time.
Core mental model
Idempotency
The guarantee that performing the same operation multiple times produces the same final state. Idempotency is the foundation of safe retries—without it, a retry after a network failure can create duplicate records or corrupt downstream workflows.
Timeout
An upper bound on execution duration that triggers failure detection when exceeded.
Retry Policy
Configuration that controls how many times an operation retries, how long to wait between attempts, and which errors to exclude from retries.
Heartbeat
A periodic signal from a long-running Activity proving it's still making progress.
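The terms above can be tied together in a small, standard-library-only sketch. It isn't the Application SDK's or Temporal's implementation: `RetryPolicySketch`, `run_with_retries`, and `RecordStore` are hypothetical names chosen for illustration, and the field names only loosely mirror common retry policy parameters. The example shows an idempotent write being retried with exponential backoff after a simulated lost acknowledgement.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicySketch:
    """Hypothetical stand-in for a retry policy; not the SDK's class."""
    initial_interval: float = 1.0         # seconds to wait after the first failure
    backoff_coefficient: float = 2.0      # multiplier applied after each failure
    maximum_interval: float = 10.0        # cap on the wait between attempts
    maximum_attempts: int = 4             # total attempts, including the first
    non_retryable: tuple = (ValueError,)  # error types excluded from retries

    def delay_before_retry(self, attempt: int) -> float:
        """Wait to apply after failed attempt number `attempt` (1-based)."""
        raw = self.initial_interval * self.backoff_coefficient ** (attempt - 1)
        return min(raw, self.maximum_interval)


def run_with_retries(operation, policy, sleep=time.sleep):
    """Call `operation` until it succeeds or the policy is exhausted."""
    for attempt in range(1, policy.maximum_attempts + 1):
        try:
            return operation()
        except policy.non_retryable:
            raise  # excluded errors fail immediately
        except Exception:
            if attempt == policy.maximum_attempts:
                raise  # out of attempts: surface the last error
            sleep(policy.delay_before_retry(attempt))


class RecordStore:
    """Idempotent writes: repeating an upsert leaves the same final state."""

    def __init__(self):
        self.records = {}

    def upsert(self, key, value):
        self.records[key] = value


store = RecordStore()
calls = {"n": 0}
waits = []  # records the backoff delays instead of actually sleeping


def flaky_write():
    calls["n"] += 1
    store.upsert("order-42", {"status": "paid"})  # the write lands...
    if calls["n"] < 3:
        raise ConnectionError("ack lost")  # ...but the caller sees a failure
    return "ok"


result = run_with_retries(flaky_write, RetryPolicySketch(), sleep=waits.append)
```

Because `upsert` is keyed, the three attempts leave a single record. An append-style write in the same position would have produced duplicates.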
How it works
Retry strategies use three mechanisms that work together to build resilient apps:
Workflow timeouts
Workflow timeouts bound how long a Workflow Execution, a single run, or a single Workflow Task may take. Temporal provides three types:
- Workflow Execution Timeout: Upper bound on total Workflow lifetime across all runs. If exceeded, the Workflow times out and fails.
- Workflow Run Timeout: Upper bound on a single run between Continue-As-New operations. Prevents a single run from growing unbounded.
- Workflow Task Timeout: Maximum time for a Worker to process a Workflow Task. Surfaces stuck Workers or excessively long decision logic.
Temporal recommends avoiding strict Workflow timeouts for business logic because Workflows are designed to be durable and long-lived. Activity timeouts and heartbeats provide more precise failure detection for external work.
Activity timeouts
Activity timeouts provide precise control over external operations and long-running work. Temporal supports three key timeout types:
- Schedule-To-Start Timeout: Maximum time from Activity scheduling until a Worker picks it up. Detects capacity issues or Worker unavailability.
- Start-To-Close Timeout: Maximum time the Activity code can run after starting. Detects stuck or excessively long-running execution.
- Schedule-To-Close Timeout: Overall deadline from scheduling to completion. Bounds total wall-clock time including queueing and execution.
Use Schedule-To-Start to surface Worker pool pressure, Start-To-Close to bound external call runtime, and Schedule-To-Close when you need a single upper bound on total Activity duration.
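To make the relationship between the three Activity timeouts concrete, here is a minimal sketch that models a single Activity attempt and reports which deadline is missed first. It's an illustration under simplifying assumptions, not Temporal's implementation: it ignores retries, and `ActivityTimeouts` and `first_timeout` are hypothetical names. Times are plain numbers of seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActivityTimeouts:
    """Hypothetical container for the three timeouts, in seconds."""
    schedule_to_start: Optional[float] = None
    start_to_close: Optional[float] = None
    schedule_to_close: Optional[float] = None


def first_timeout(timeouts, scheduled_at, started_at, closed_at):
    """Return the name of the earliest missed deadline, or None.

    `started_at` / `closed_at` are None when that event never happened.
    """
    inf = float("inf")
    start = inf if started_at is None else started_at
    close = inf if closed_at is None else closed_at
    missed = []  # (deadline, timeout name)

    # Queueing: a Worker must pick the Activity up before this deadline.
    if timeouts.schedule_to_start is not None:
        deadline = scheduled_at + timeouts.schedule_to_start
        if start > deadline:
            missed.append((deadline, "schedule_to_start"))

    # Execution: only applies once the Activity has actually started.
    if timeouts.start_to_close is not None and started_at is not None:
        deadline = started_at + timeouts.start_to_close
        if close > deadline:
            missed.append((deadline, "start_to_close"))

    # Overall: queueing plus execution must finish before this deadline.
    if timeouts.schedule_to_close is not None:
        deadline = scheduled_at + timeouts.schedule_to_close
        if close > deadline:
            missed.append((deadline, "schedule_to_close"))

    return min(missed)[1] if missed else None


limits = ActivityTimeouts(schedule_to_start=5, start_to_close=30, schedule_to_close=60)

healthy = first_timeout(limits, scheduled_at=0, started_at=2, closed_at=20)
never_picked_up = first_timeout(limits, scheduled_at=0, started_at=None, closed_at=None)
stuck_after_start = first_timeout(limits, scheduled_at=0, started_at=2, closed_at=None)
```

In this model, an Activity that no Worker ever picks up misses Schedule-To-Start first, which points to Worker pool pressure. One that starts but never finishes misses Start-To-Close first, which points to the Activity code or the external call.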
Heartbeats
Heartbeats enable long-running Activities to signal progress and respond to cancellation requests quickly. If the Temporal server doesn't receive a heartbeat within the configured heartbeat timeout, the Activity times out and becomes eligible for retry according to its RetryPolicy.
Use heartbeats for work that's chunked or iterative, such as poll loops, ingestion batches, or external calls you can checkpoint. Emit heartbeats at a cadence that matches your unit of work, and keep heartbeat payloads small.
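The following sketch shows one way chunked work can pair each heartbeat with a small checkpoint so that a retry resumes where the previous attempt stopped. It uses only the standard library. `ingest_in_batches` is a hypothetical function, and the `heartbeat` callback is a stand-in for whatever heartbeat call your Activity uses, so treat this as an illustration of the pattern rather than SDK code.

```python
from typing import Callable, List, Sequence


def ingest_in_batches(
    items: Sequence[str],
    batch_size: int,
    process: Callable[[Sequence[str]], None],
    heartbeat: Callable[[int], None],
    resume_from: int = 0,
) -> int:
    """Process items batch by batch, heartbeating the next offset each time."""
    offset = resume_from
    while offset < len(items):
        batch = items[offset:offset + batch_size]
        process(batch)
        offset += len(batch)
        heartbeat(offset)  # small payload: just the next offset to process
    return offset


items = ["a", "b", "c", "d", "e", "f", "g"]
processed: List[str] = []
checkpoints: List[int] = []
fail_once = {"armed": True}


def process(batch):
    # Simulate one transient failure on the third batch.
    if fail_once["armed"] and batch[0] == "e":
        fail_once["armed"] = False
        raise ConnectionError("simulated transient failure")
    processed.extend(batch)


# First attempt fails partway through.
try:
    ingest_in_batches(items, 2, process, checkpoints.append)
except ConnectionError:
    pass

# The retry resumes from the last checkpoint recorded by a heartbeat.
last_checkpoint = checkpoints[-1] if checkpoints else 0
total = ingest_in_batches(items, 2, process, checkpoints.append, resume_from=last_checkpoint)
```

Heartbeating once per batch matches the cadence to the unit of work, and sending only an integer offset keeps the payload small.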
Auto heartbeater decorator
The Application SDK provides an @auto_heartbeater decorator that automatically sends heartbeat signals from Activities to the Temporal server during execution. This prevents long-running Activities from timing out by periodically signaling that the Activity is still making progress.
The decorator calculates the heartbeat interval as one-third of the Activity's configured heartbeat timeout. If no heartbeat timeout is configured, the decorator falls back to a default timeout of 120 seconds, which yields a 40-second heartbeat interval.
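The interval arithmetic described above can be written out as a short sketch. This isn't the decorator's source code: `heartbeat_interval` and `DEFAULT_HEARTBEAT_TIMEOUT` are hypothetical names that only reproduce the stated rule of one-third of the timeout with a 120-second default.

```python
from datetime import timedelta
from typing import Optional

# Assumed default, taken from the description above.
DEFAULT_HEARTBEAT_TIMEOUT = timedelta(seconds=120)


def heartbeat_interval(heartbeat_timeout: Optional[timedelta]) -> timedelta:
    """Return one-third of the heartbeat timeout, using the default if unset."""
    timeout = DEFAULT_HEARTBEAT_TIMEOUT if heartbeat_timeout is None else heartbeat_timeout
    return timeout / 3


default_interval = heartbeat_interval(None)
custom_interval = heartbeat_interval(timedelta(seconds=30))
```

Sending heartbeats at one-third of the timeout leaves room for a delayed or dropped heartbeat before the timeout expires.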
See also
- Retry policies: Configuration reference for retry policy parameters
- Timeouts: Timeout configuration parameters and examples