API Error Handling Patterns with Retry Strategy Configurator

Q: When should an API client retry a failed request?

Retry only on transient errors that are likely to succeed on a subsequent attempt: HTTP 429 (rate limited; respect the Retry-After header), 500/502/503/504 (transient server, gateway, and availability failures), network timeouts, and connection resets. Do not retry 400 (bad request, same payload will fail again), 403 (authorization error), or 404 (resource does not exist). Two codes need handling before any retry: 401 can be retried once after refreshing an expired token, and 409 can be retried only after the client re-reads the resource and resolves the conflict. Base the retry decision on both the HTTP status code and the idempotency of the request: GET, PUT, and DELETE are idempotent and can be retried as-is, while POST requires an idempotency key.
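Respecting Retry-After means parsing it first. The header can carry either a number of seconds or an HTTP-date, so a client has to handle both forms. The helper below is a minimal sketch using only the Python standard library; the function name `retry_after_seconds` and its `now` parameter are illustrative assumptions, not part of any particular HTTP client.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return the number of seconds to wait, or None if the header is
    absent or cannot be parsed (hypothetical helper)."""
    if not header_value:
        return None
    value = header_value.strip()

    # Form 1: a non-negative integer number of seconds, e.g. "120".
    if value.isascii() and value.isdigit():
        return float(value)

    # Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)

    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry immediately", never a negative wait.
    return max(0.0, (when - now).total_seconds())
```

When the helper returns None, falling back to the client's own backoff delay is a reasonable default.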

Q: What is exponential backoff and how do I calculate the delay?

Exponential backoff doubles the delay between successive retries so that clients do not overwhelm a recovering service. With attempts numbered from 1, the formula is: delay = min(base_delay * 2^(attempt - 1) + random_jitter, max_delay). For example, with a 1-second base and 3 retries, attempt 1 waits ~1s, attempt 2 waits ~2s, and attempt 3 waits ~4s, each plus jitter. Random jitter (0-1s) prevents thundering herd problems, where all clients retry at the same moment. The max_delay cap (typically 30-60 seconds) prevents unreasonably long waits. Many HTTP client libraries and retry utilities provide exponential backoff, but you still need to configure the parameters correctly for your specific API.
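The calculation can be sketched in a few lines of Python. This assumes attempts are numbered from 1, so that the first retry waits about one base delay as in the 1s, 2s, 4s example above. The function name `backoff_delay` and the injectable `rng` parameter are illustrative choices made here for testability.

```python
import random


def backoff_delay(attempt: int,
                  base_delay: float = 1.0,
                  max_delay: float = 30.0,
                  max_jitter: float = 1.0,
                  rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    Implements min(base_delay * 2^(attempt - 1) + jitter, max_delay),
    where jitter is drawn uniformly from [0, max_jitter).
    """
    exponential = base_delay * 2 ** (attempt - 1)
    jitter = rng() * max_jitter
    return min(exponential + jitter, max_delay)


if __name__ == "__main__":
    for attempt in (1, 2, 3):
        print(f"attempt {attempt}: wait {backoff_delay(attempt):.2f}s")
```

Passing `rng=lambda: 0.0` removes the randomness, which is convenient in unit tests.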

Q: What is the difference between a retry and a circuit breaker?

A retry attempts the same request again after a failure, hoping the transient issue has resolved. A circuit breaker stops making requests entirely after a threshold of failures, returning an error immediately without contacting the server. The circuit breaker prevents cascading failures by giving the failing service time to recover without being hammered by retries. After a timeout period, the circuit breaker allows a single test request through. If it succeeds, the circuit closes and normal traffic resumes. If it fails, the circuit stays open. Use retries for individual request failures and circuit breakers for systemic service degradation.

Q: How should I structure error responses for a REST API?

Structure error responses with a consistent JSON format containing: an error code (machine-readable string like 'VALIDATION_ERROR' or 'RATE_LIMITED'), a human-readable message describing what went wrong, a details array for field-level validation errors (field name, code, message per field), and a request ID for correlation with server logs. Always return the appropriate HTTP status code. Never return 200 with an error body. Include a documentation URL pointing to the specific error code reference page so developers can self-serve error resolution without contacting support.
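As a rough illustration of that structure, the sketch below builds such a body in Python. The field names (`code`, `message`, `details`, `request_id`, `documentation_url`) and the documentation base URL are assumptions chosen for the example; the text above prescribes the content of the response, not these exact names.

```python
import json
import uuid
from typing import Optional


def build_error_body(code: str,
                     message: str,
                     details: Optional[list] = None,
                     request_id: Optional[str] = None,
                     docs_base_url: str = "https://api.example.com/docs/errors") -> dict:
    """Assemble a consistent error body (hypothetical field names)."""
    return {
        "code": code,                      # machine-readable error code
        "message": message,                # human-readable description
        "details": details or [],          # field-level validation errors
        "request_id": request_id or str(uuid.uuid4()),  # for log correlation
        "documentation_url": f"{docs_base_url}/{code.lower()}",
    }


if __name__ == "__main__":
    body = build_error_body(
        "VALIDATION_ERROR",
        "The request body contains invalid fields",
        details=[{"field": "email", "code": "INVALID_FORMAT",
                  "message": "Must be a valid email address"}],
        request_id="req-123",
    )
    # Send this with a 4xx status such as 400 or 422, never with 200.
    print(json.dumps(body, indent=2))
```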

May 25, 2026 · 16 min read

Error handling is the part of API design that separates production-ready systems from prototypes. When everything works, any API design looks good. When things break, which they will, the error handling determines whether the client can recover automatically, whether the developer can debug quickly, and whether cascading failures bring down your entire service mesh. The three pillars of robust API error handling are structured error responses (so clients know what went wrong), retry strategies (so transient failures resolve automatically), and circuit breakers (so persistent failures do not cascade).

This tool provides an interactive pattern selector that recommends the right error handling approach for your use case, a retry strategy configurator that calculates exponential backoff timings with jitter, and an error response formatter that generates RFC 7807 Problem Details templates. Use the retry visualizer to see exactly how your backoff parameters translate to timing under failure conditions, and copy the generated code templates into your client and server implementations.

Retry Strategy Configurator

Configure your retry parameters below and click Calculate Retry Timeline to visualize the backoff delays. The configurator shows the delay before each retry attempt, the maximum total wait time, and the probability that at least one attempt succeeds, assuming each attempt fails independently with the configured failure probability. Adjust the base delay, max retries, and jitter to find the right balance between fast recovery and server protection.

Retry Parameters: Base Delay (ms), Max Retries, Max Delay Cap (ms), Jitter Strategy, Failure Probability per Attempt (%), Backoff Type
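The arithmetic behind such a timeline is simple enough to reproduce by hand. The sketch below is not the configurator's actual code; it is an approximation under stated assumptions: exponential backoff with no jitter, a first retry that waits exactly the base delay, and attempts that fail independently with the same probability. The function name `retry_timeline` and the keys of the returned dictionary are hypothetical.

```python
def retry_timeline(base_ms: float,
                   max_retries: int,
                   max_delay_ms: float,
                   failure_probability: float) -> dict:
    """Compute per-retry delays, total wait, and overall success probability.

    failure_probability is a fraction between 0 and 1 (0.5 means 50%).
    """
    # Delay before retry i (0-based): base * 2^i, capped at max_delay_ms.
    delays = [min(base_ms * 2 ** i, max_delay_ms) for i in range(max_retries)]

    # One initial attempt plus max_retries retries.
    total_attempts = max_retries + 1

    # All attempts fail with probability p^n, so at least one succeeds
    # with probability 1 - p^n (independence assumed).
    success = 1 - failure_probability ** total_attempts

    return {
        "delays_ms": delays,
        "total_wait_ms": sum(delays),
        "success_probability": success,
    }


if __name__ == "__main__":
    print(retry_timeline(base_ms=1000, max_retries=3,
                         max_delay_ms=30000, failure_probability=0.5))
```

With a 1000 ms base, 3 retries, and a 50% failure rate per attempt, the delays are 1000, 2000, and 4000 ms, and the chance that at least one of the four attempts succeeds is 1 - 0.5^4 = 93.75%.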

Error Pattern Selector

Select your API characteristics below and the tool recommends the appropriate error handling pattern combination. Click the patterns that match your requirements, then click Get Recommendation.

What type of errors does your API encounter?

Transient Network Errors

Timeouts, connection resets, DNS failures that resolve on retry.

Rate Limiting (429)

API throttling with Retry-After headers and quota management.

Validation Errors (400)

Malformed requests with field-level error details for client correction.

Downstream Service Failures

Dependencies that fail intermittently causing 502/503 cascading errors.

Partial Failures

Batch operations where some items succeed and others fail.

Error Response Formatter

Generate a structured error response in RFC 7807 format or a custom format. Select the error type, enter details, and click Generate Error Response to produce a copy-ready JSON response body with the correct HTTP status code.

Error Response Generator fields: HTTP Status, Response Format, Error Message, Error Code

Retryable vs Non-Retryable Errors

The most critical decision in error handling is whether to retry. Retrying a non-retryable error wastes resources, delays the user, and can cause duplicate side effects. Not retrying a retryable error forces the user to manually trigger an action that would have succeeded automatically.

Status	Retryable	Reason	Action
`400`	No	Client error, same payload will fail again	Fix request and resubmit
`401`	Once	Token may have expired	Refresh token, retry once
`403`	No	Permission denied permanently	Check authorization
`404`	No	Resource does not exist	Verify URL
`409`	Conditional	Conflict may resolve after read	Re-read, resolve, retry
`429`	Yes	Rate limit, will reset	Wait for Retry-After
`500`	Yes	Server bug may be transient	Retry with backoff
`502`	Yes	Gateway issue, often transient	Retry with backoff
`503`	Yes	Service temporarily unavailable	Retry with Retry-After
`504`	Yes	Gateway timed out; the upstream may or may not have processed the request	Retry with idempotency key
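The table translates fairly directly into a decision function. The following is a sketch rather than a definitive policy: the name `should_retry` and its parameters are hypothetical, HEAD and OPTIONS are added to the idempotent set on the basis of standard HTTP semantics, and treating 429 as retryable for any method assumes the server rejected the request before processing it.

```python
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE", "OPTIONS"})
TRANSIENT_SERVER_ERRORS = frozenset({500, 502, 503, 504})


def should_retry(status: int,
                 method: str,
                 attempt: int,
                 max_retries: int = 3,
                 has_idempotency_key: bool = False,
                 token_refreshed: bool = False) -> bool:
    """Decide whether to retry; `attempt` counts retries already made."""
    if attempt >= max_retries:
        return False
    if status == 401:
        # Retry once, and only after the caller has refreshed the token.
        return attempt == 0 and token_refreshed
    if status == 429:
        # Assumption: a rate-limited request was not processed.
        return True
    if status in TRANSIENT_SERVER_ERRORS:
        # The server may have acted on the request, so require idempotency.
        return method.upper() in IDEMPOTENT_METHODS or has_idempotency_key
    # 400, 403, 404, and 409 are not retried automatically; 409 needs the
    # client to re-read and resolve the conflict first.
    return False
```

For 429 and 503 the wait before the retry should come from Retry-After when the server sends it, and from the backoff calculation otherwise.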

Exponential Backoff Deep Dive

Exponential backoff is the standard retry timing strategy because it balances fast recovery from transient errors with protection for the server during sustained outages. With attempts numbered from 1, the formula is: delay = min(base * 2^(attempt - 1) + jitter, max_delay). With a 1-second base, the first few retries happen quickly (1s, 2s, 4s) to catch brief glitches, while later retries space out (8s, 16s, 32s) to avoid overwhelming a recovering server, until the delay reaches the max_delay cap.

Jitter is not optional. Without jitter, all clients that failed at the same time retry at the same time, creating a thundering herd that can re-crash the recovering server. Full jitter picks a random delay between 0 and the calculated value. Equal jitter picks a random delay between half and the full value, guaranteeing at least some backoff. Decorrelated jitter derives each delay from the previous delay rather than from the attempt number, producing more variance. The AWS Architecture Blog's analysis of backoff strategies favors full jitter, and the differences in recovery time among the jittered strategies are small compared with the gap between any of them and no jitter, so full jitter is the safe default.
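The three strategies differ only in how the random component is drawn. The sketch below follows the formulas commonly attributed to the AWS Architecture Blog article on backoff and jitter, adapted to 1-based attempt numbers; the function names and default values are illustrative assumptions.

```python
import random


def capped_exponential(attempt: int, base: float, cap: float) -> float:
    """Un-jittered delay for a 1-based attempt number, capped at `cap`."""
    return min(cap, base * 2 ** (attempt - 1))


def full_jitter(attempt: int, base: float = 1.0, cap: float = 30.0,
                rng=random) -> float:
    # Anywhere between 0 and the calculated value.
    return rng.uniform(0, capped_exponential(attempt, base, cap))


def equal_jitter(attempt: int, base: float = 1.0, cap: float = 30.0,
                 rng=random) -> float:
    # Half the calculated value is fixed, the other half is random.
    ceiling = capped_exponential(attempt, base, cap)
    return ceiling / 2 + rng.uniform(0, ceiling / 2)


def decorrelated_jitter(previous_delay: float, base: float = 1.0,
                        cap: float = 30.0, rng=random) -> float:
    # Depends on the previous delay, not on the attempt number.
    return min(cap, rng.uniform(base, previous_delay * 3))


if __name__ == "__main__":
    delay = 1.0  # seed decorrelated jitter with the base delay
    for attempt in range(1, 6):
        delay = decorrelated_jitter(delay)
        print(attempt, round(full_jitter(attempt), 2),
              round(equal_jitter(attempt), 2), round(delay, 2))
```

The `rng` parameter accepts either the `random` module or a seeded `random.Random` instance, which makes the output reproducible in tests.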

Circuit Breaker Pattern

The circuit breaker operates in three states: Closed (normal operation, requests pass through), Open (all requests fail immediately without reaching the server), and Half-Open (a single test request passes through to probe server health). The circuit transitions from Closed to Open when the failure count exceeds a threshold within a time window. It transitions from Open to Half-Open after a timeout period. It transitions from Half-Open to Closed on a successful test request, or back to Open on a failed test request.

The key parameters are: failure threshold (typically 5-10 failures in 60 seconds), timeout duration (typically 30-60 seconds in Open state), and success threshold (typically 1-3 successes in Half-Open to close). These values should be tuned based on your service's recovery characteristics. A service that recovers in seconds (restarting a container) needs a short timeout. A service that recovers in minutes (database failover) needs a longer timeout. Monitor the circuit breaker state and alert when it opens, because an open circuit breaker means the dependency is degraded. For a complete implementation guide, see our circuit breaker pattern guide.
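The state machine described above fits in a small class. This is a minimal, single-threaded sketch: the class and parameter names are hypothetical, the success threshold is fixed at one, the circuit opens when the failure count reaches the threshold, and there is no locking, so it does not enforce a single probe under concurrency.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and the call is rejected."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, window_seconds=60.0,
                 open_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.open_timeout = open_timeout
        self._clock = clock
        self.state = "closed"
        self._failures = []      # timestamps of recent failures
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        now = self._clock()
        if self.state == "open":
            if now - self._opened_at < self.open_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = "half_open"   # timeout elapsed: allow a probe
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure(self._clock())
            raise
        if self.state == "half_open":
            # The probe succeeded: resume normal traffic.
            self.state = "closed"
            self._failures.clear()
        return result

    def _record_failure(self, now):
        if self.state == "half_open":
            self._trip(now)            # failed probe: back to open
            return
        # Keep only failures inside the sliding window, then add this one.
        self._failures = [t for t in self._failures
                          if now - t < self.window_seconds]
        self._failures.append(now)
        if len(self._failures) >= self.failure_threshold:
            self._trip(now)

    def _trip(self, now):
        self.state = "open"
        self._opened_at = now
        self._failures.clear()
```

Wrapping each outbound call as `breaker.call(fetch, url)` (with `fetch` standing in for your own request function) is enough to get fail-fast behavior; exposing `breaker.state` to your metrics system covers the alerting advice above.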

RFC 7807 Problem Details

RFC 7807 (since obsoleted by RFC 9457, which keeps the same core fields) standardizes API error responses with five fields. The type field is a URI that identifies the error class and should point to documentation explaining the error. The title is a short, human-readable summary that stays constant for a given type. The status is the HTTP status code repeated in the body for convenience. The detail is a human-readable explanation specific to this occurrence. The instance is a URI identifying this specific error occurrence, typically built from a request ID or trace ID.

HTTP/1.1 422 Unprocessable Entity
Content-Type: application/problem+json

{
  "type": "https://api.example.com/errors/validation",
  "title": "Validation Error",
  "status": 422,
  "detail": "The request body contains invalid fields",
  "instance": "/requests/abc-123-def",
  "errors": [
    {
      "field": "email",
      "code": "INVALID_FORMAT",
      "message": "Must be a valid email address"
    },
    {
      "field": "age",
      "code": "OUT_OF_RANGE",
      "message": "Must be between 0 and 150"
    }
  ]
}
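A response like the one above can be produced by a small helper. The sketch below uses only the standard library; the function name `problem_details` and its return shape (status, headers, JSON body) are assumptions made for the example, and extension members such as `errors` are passed as keyword arguments.

```python
import json

STANDARD_MEMBERS = ("type", "title", "status", "detail", "instance")


def problem_details(type_uri, title, status, detail=None, instance=None,
                    **extensions):
    """Return (status, headers, body) for a Problem Details response."""
    body = {"type": type_uri, "title": title, "status": status}
    if detail is not None:
        body["detail"] = detail
    if instance is not None:
        body["instance"] = instance
    for name, value in extensions.items():
        # Extension members must not shadow the standard ones.
        if name in STANDARD_MEMBERS:
            raise ValueError(f"{name!r} is a standard member")
        body[name] = value
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(body)


if __name__ == "__main__":
    status, headers, body = problem_details(
        "https://api.example.com/errors/validation",
        "Validation Error",
        422,
        detail="The request body contains invalid fields",
        instance="/requests/abc-123-def",
        errors=[{"field": "email", "code": "INVALID_FORMAT",
                 "message": "Must be a valid email address"}],
    )
    print(status, headers["Content-Type"])
    print(body)
```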

Idempotency and Safe Retries

Retrying a request is only safe if the request is idempotent, meaning executing it multiple times produces the same result as executing it once. GET, PUT, and DELETE are inherently idempotent in REST semantics. POST is not: retrying a POST /orders request might create duplicate orders. The solution is idempotency keys: the client generates a unique key (UUID) and includes it in the request header. The server stores the key with the response and returns the cached response for duplicate keys, guaranteeing that retries do not create duplicates.

Implement idempotency keys with a TTL that comfortably exceeds your retry window. For example, even if your retry strategy completes within 5 minutes, storing idempotency keys for 24 hours also covers delayed retries and manual re-attempts. Enforce the key with a database unique constraint so that concurrent duplicate requests fail at the database level rather than creating duplicates. For an idempotent replay, return the original response (including the original status code), not a new 200 or a 409 conflict, so that the client can handle a replay exactly as it would the original response.
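The server-side replay logic can be sketched as follows. This is an in-memory illustration only: a dictionary stands in for the database table with a unique constraint, there is no locking for concurrent requests, and the class name `IdempotencyStore` and its `execute` method are hypothetical. Skipping the cache for 5xx responses is a design assumption made here so that a retry after a server failure can still succeed.

```python
import time


class IdempotencyStore:
    def __init__(self, ttl_seconds=24 * 3600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}   # key -> (expires_at, status, body)

    def execute(self, key, handler):
        """Run `handler` once per key; replay the stored response after that.

        `handler` is a zero-argument callable returning (status, body).
        """
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Replay: same status code and body as the original response.
            return entry[1], entry[2]

        status, body = handler()
        if status < 500:
            # Assumption: do not pin server errors, so retries can recover.
            self._entries[key] = (now + self._ttl, status, body)
        return status, body


if __name__ == "__main__":
    store = IdempotencyStore()
    orders = []

    def create_order():
        orders.append("order")
        return 201, {"order_id": len(orders)}

    print(store.execute("7f3c-example-key", create_order))
    print(store.execute("7f3c-example-key", create_order))  # replayed
    print(len(orders))  # the handler ran only once
```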

Frequently Asked Questions

What is RFC 7807 Problem Details and should I use it?

RFC 7807 defines a standard JSON format for HTTP API error responses with fields: type (URI), title (summary), status (HTTP code), detail (explanation), and instance (occurrence URI). It has since been obsoleted by RFC 9457, which keeps the same five fields. Use it because it provides a consistent error format across endpoints, is machine-parseable, and is supported by mainstream frameworks such as ASP.NET Core (ProblemDetails) and Spring Framework 6 (ProblemDetail). Clients can build generic error handlers that work across any compliant API.

When should an API client retry a failed request?

Retry only on transient errors: HTTP 429 (rate limited), 500/502/503/504 (server, gateway, and availability failures), network timeouts, and connection resets. Do not retry 400 (bad request), 403 (forbidden), or 404 (not found). Retry 401 once after refreshing the token, and retry 409 only after re-reading the resource and resolving the conflict. The retry decision depends on the HTTP status code and request idempotency: GET, PUT, and DELETE are idempotent and can be retried as-is; POST requires an idempotency key.

What is exponential backoff and how do I calculate the delay?

Exponential backoff increases retry delay exponentially: delay = min(base * 2^(attempt - 1) + jitter, max_delay), with attempts numbered from 1. With a 1s base: attempt 1 waits ~1s, attempt 2 ~2s, attempt 3 ~4s. Jitter randomizes timing to prevent thundering herds. The max_delay cap prevents unreasonably long waits. Many HTTP client libraries and retry utilities provide exponential backoff.

What is the difference between a retry and a circuit breaker?

A retry re-attempts the same request hoping the issue resolved. A circuit breaker stops all requests after a failure threshold, giving the server time to recover. After a timeout, one test request passes through. If it succeeds, traffic resumes. Use retries for individual failures and circuit breakers for systemic degradation.

How should I structure error responses for a REST API?

Use a consistent JSON format with: error code (machine-readable string), message (human-readable), details array (field-level errors), and request ID (for log correlation). Always use the correct HTTP status code. Include a documentation URL for each error code. Never return 200 with an error body.

Michael Lip

Solo developer building free tools at Zovo. Kappafy helps developers work with JSON and APIs faster. No tracking, no accounts, no data collection.

Last updated: May 25, 2026