API Error Handling Patterns with Retry Strategy Configurator
Error handling is the part of API design that separates production-ready systems from prototypes. When everything works, any API design looks good. When things break, which they will, the error handling determines whether the client can recover automatically, whether the developer can debug quickly, and whether cascading failures bring down your entire service mesh. The three pillars of robust API error handling are structured error responses (so clients know what went wrong), retry strategies (so transient failures resolve automatically), and circuit breakers (so persistent failures do not cascade).
This tool provides an interactive pattern selector that recommends the right error handling approach for your use case, a retry strategy configurator that calculates exponential backoff timings with jitter, and an error response formatter that generates RFC 7807 Problem Details templates. Use the retry visualizer to see exactly how your backoff parameters translate to timing under failure conditions, and copy the generated code templates into your client and server implementations.
Retry Strategy Configurator
Configure your retry parameters below and click Calculate Retry Timeline to visualize the backoff delays. The configurator shows the exact delay for each retry attempt, the total maximum wait time, and the probability of success assuming independent failure probability per attempt. Adjust the base delay, max retries, and jitter to find the right balance between fast recovery and server protection.
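The numbers the configurator reports can be approximated by hand. The sketch below is a minimal illustration, not the tool's actual code: the function name `retry_summary` and its parameters are hypothetical, jitter is left out so the delays are the upper bounds, and every attempt is assumed to fail independently with the same probability.

```python
def retry_summary(base_delay, max_retries, max_delay, failure_probability):
    """Return (delays, total_wait, success_probability) for a retry policy.

    Assumptions (illustrative only): attempts are numbered from 0, jitter is
    ignored, and every attempt fails independently with the same probability.
    """
    # Delay before each retry: exponential growth, capped at max_delay.
    delays = [min(base_delay * 2 ** attempt, max_delay)
              for attempt in range(max_retries)]
    # The initial request plus every retry must fail for the call to fail.
    total_attempts = max_retries + 1
    success_probability = 1 - failure_probability ** total_attempts
    return delays, sum(delays), success_probability


delays, total_wait, p_success = retry_summary(
    base_delay=1.0, max_retries=5, max_delay=30.0, failure_probability=0.5
)
print(delays)      # [1.0, 2.0, 4.0, 8.0, 16.0]
print(total_wait)  # 31.0
print(p_success)   # 0.984375
```

With a 50% failure rate per attempt, five retries already push the overall success probability above 98%, at the cost of up to 31 seconds of waiting.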
Error Pattern Selector
Select your API characteristics below and the tool recommends the appropriate error handling pattern combination. Click the patterns that match your requirements, then click Get Recommendation.
What type of errors does your API encounter?
Transient Network Errors
Timeouts, connection resets, DNS failures that resolve on retry.
Rate Limiting (429)
API throttling with Retry-After headers and quota management.
Validation Errors (400)
Malformed requests with field-level error details for client correction.
Downstream Service Failures
Dependencies that fail intermittently causing 502/503 cascading errors.
Partial Failures
Batch operations where some items succeed and others fail.
Error Response Formatter
Generate a structured error response in RFC 7807 format or a custom format. Select the error type, enter details, and click Generate Error Response to produce a copy-ready JSON response body with the correct HTTP status code.
Retryable vs Non-Retryable Errors
The most critical decision in error handling is whether to retry. Retrying a non-retryable error wastes resources, delays the user, and can cause duplicate side effects. Not retrying a retryable error forces the user to manually trigger an action that would have succeeded automatically.
| Status | Retryable | Reason | Action |
|---|---|---|---|
| 400 | No | Client error, same payload will fail again | Fix request and resubmit |
| 401 | Once | Token may have expired | Refresh token, retry once |
| 403 | No | Permission denied permanently | Check authorization |
| 404 | No | Resource does not exist | Verify URL |
| 409 | Conditional | Conflict may resolve after read | Re-read, resolve, retry |
| 429 | Yes | Rate limit, will reset | Wait for Retry-After |
| 500 | Yes | Server bug may be transient | Retry with backoff |
| 502 | Yes | Gateway issue, often transient | Retry with backoff |
| 503 | Yes | Service temporarily unavailable | Retry with Retry-After |
| 504 | Yes | Timeout, request may not have reached server | Retry with idempotency key |
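The table translates directly into a lookup that client code can consult before deciding to retry. This is a minimal sketch: the names `Retry`, `RETRY_TABLE`, and `retry_decision` are hypothetical, and treating any status not listed in the table as non-retryable is a conservative assumption rather than something the table states.

```python
from enum import Enum


class Retry(Enum):
    NO = "no"
    ONCE = "once"                # e.g. refresh the token, then retry once
    CONDITIONAL = "conditional"  # e.g. re-read and resolve before retrying
    YES = "yes"


# Mirrors the "Retryable" column of the table above.
RETRY_TABLE = {
    400: Retry.NO,
    401: Retry.ONCE,
    403: Retry.NO,
    404: Retry.NO,
    409: Retry.CONDITIONAL,
    429: Retry.YES,
    500: Retry.YES,
    502: Retry.YES,
    503: Retry.YES,
    504: Retry.YES,
}


def retry_decision(status: int) -> Retry:
    """Look up the retry decision for an HTTP status code.

    Assumption: statuses missing from the table are not retried.
    """
    return RETRY_TABLE.get(status, Retry.NO)


print(retry_decision(503).value)  # yes
print(retry_decision(401).value)  # once
print(retry_decision(418).value)  # no
```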
Exponential Backoff Deep Dive
Exponential backoff is the standard retry timing strategy because it balances fast recovery for transient errors with protection for the server during sustained outages. The formula is: delay = min(base * 2^attempt, max_delay), with attempt counted from 0 and jitter then applied to the capped value. With a 1-second base, the exponential growth means the first few retries happen quickly (1s, 2s, 4s) to catch brief glitches, while subsequent retries space out dramatically (8s, 16s, 32s) to avoid overwhelming a recovering server.
Jitter is not optional. Without jitter, all clients that failed at the same time retry at the same time, creating a thundering herd that can re-crash the recovering server. Full jitter randomizes the delay between 0 and the calculated value. Equal jitter randomizes between half and the full value, guaranteeing at least some backoff. Decorrelated jitter uses the previous delay to calculate the next, producing more variance. AWS recommends full jitter based on their analysis of distributed system behavior, and the difference in recovery time between jitter strategies is typically under 10%, so full jitter is the safe default.
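The three jitter strategies can be sketched as follows. The function names are hypothetical, attempts are assumed to be numbered from 0, and the decorrelated variant follows the formulation published on the AWS Architecture Blog (a random value between the base and three times the previous delay, capped at the maximum); treat it as an illustration rather than a reference implementation.

```python
import random


def capped_backoff(base, attempt, max_delay):
    """Exponential delay before jitter: base * 2^attempt, capped."""
    return min(base * 2 ** attempt, max_delay)


def full_jitter(base, attempt, max_delay, rng=random):
    """Uniformly random delay between 0 and the capped backoff."""
    return rng.uniform(0, capped_backoff(base, attempt, max_delay))


def equal_jitter(base, attempt, max_delay, rng=random):
    """Half the capped backoff is guaranteed; the other half is random."""
    ceiling = capped_backoff(base, attempt, max_delay)
    return ceiling / 2 + rng.uniform(0, ceiling / 2)


def decorrelated_jitter(base, previous_delay, max_delay, rng=random):
    """Next delay depends on the previous delay rather than the attempt number."""
    return min(max_delay, rng.uniform(base, previous_delay * 3))


rng = random.Random(42)  # seeded only to make the demo repeatable
previous = 1.0
for attempt in range(5):
    previous = decorrelated_jitter(1.0, previous, 30.0, rng)
    print(
        attempt,
        round(full_jitter(1.0, attempt, 30.0, rng), 2),
        round(equal_jitter(1.0, attempt, 30.0, rng), 2),
        round(previous, 2),
    )
```

Passing `rng` explicitly keeps the functions testable; production code would normally rely on the default module-level generator.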
Circuit Breaker Pattern
The circuit breaker operates in three states: Closed (normal operation, requests pass through), Open (all requests fail immediately without reaching the server), and Half-Open (a single test request passes through to probe server health). The circuit transitions from Closed to Open when the failure count exceeds a threshold within a time window. It transitions from Open to Half-Open after a timeout period. It transitions from Half-Open to Closed on a successful test request, or back to Open on a failed test request.
The key parameters are: failure threshold (typically 5-10 failures in 60 seconds), timeout duration (typically 30-60 seconds in Open state), and success threshold (typically 1-3 successes in Half-Open to close). These values should be tuned based on your service's recovery characteristics. A service that recovers in seconds (restarting a container) needs a short timeout. A service that recovers in minutes (database failover) needs a longer timeout. Monitor the circuit breaker state and alert when it opens, because an open circuit breaker means the dependency is degraded. For a complete implementation guide, see our circuit breaker pattern guide.
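The state machine described above fits in a small class. The sketch below is a single-threaded illustration under stated assumptions: the class and method names are hypothetical, the circuit opens when the failure count reaches (rather than strictly exceeds) the threshold, and concurrent probes in the Half-Open state are not limited to one, which a production implementation would need to enforce.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, failure_threshold=5, window_seconds=60.0,
                 open_timeout=30.0, success_threshold=1, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.open_timeout = open_timeout
        self.success_threshold = success_threshold
        self._clock = clock
        self.state = "closed"
        self._failures = deque()  # timestamps of recent failures
        self._opened_at = 0.0
        self._half_open_successes = 0

    def allow_request(self) -> bool:
        """Return True if a request may be sent right now."""
        if self.state == "open":
            if self._clock() - self._opened_at >= self.open_timeout:
                # Timeout elapsed: let a probe through.
                self.state = "half_open"
                self._half_open_successes = 0
                return True
            return False  # fail fast without calling the dependency
        return True

    def record_success(self) -> None:
        if self.state == "half_open":
            self._half_open_successes += 1
            if self._half_open_successes >= self.success_threshold:
                self.state = "closed"
                self._failures.clear()

    def record_failure(self) -> None:
        now = self._clock()
        if self.state == "open":
            return  # already open; nothing to count
        if self.state == "half_open":
            self._trip(now)  # failed probe: back to Open
            return
        self._failures.append(now)
        # Drop failures that fell out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        if len(self._failures) >= self.failure_threshold:
            self._trip(now)

    def _trip(self, now: float) -> None:
        self.state = "open"
        self._opened_at = now
        self._failures.clear()


breaker = CircuitBreaker(failure_threshold=3)
for _ in range(3):
    breaker.record_failure()
print(breaker.state)            # open
print(breaker.allow_request())  # False
```

Callers wrap each outbound call: check `allow_request()` first, then report the outcome with `record_success()` or `record_failure()`. The injectable `clock` exists so the timing can be tested without sleeping.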
RFC 7807 Problem Details
RFC 7807 (since obsoleted by RFC 9457, which keeps the same members) standardizes API error responses with five fields. The type field is a URI that identifies the error class and should point to documentation explaining the error. The title is a short, human-readable summary that stays constant for a given type. The status is the HTTP status code repeated in the body for convenience. The detail is a human-readable explanation specific to this occurrence. The instance is a URI identifying this specific error occurrence, typically built from a request ID or trace ID. APIs may add extension members, such as the errors array in the example below, to carry machine-readable detail.
    HTTP/1.1 422 Unprocessable Entity
    Content-Type: application/problem+json

    {
      "type": "https://api.example.com/errors/validation",
      "title": "Validation Error",
      "status": 422,
      "detail": "The request body contains invalid fields",
      "instance": "/requests/abc-123-def",
      "errors": [
        {
          "field": "email",
          "code": "INVALID_FORMAT",
          "message": "Must be a valid email address"
        },
        {
          "field": "age",
          "code": "OUT_OF_RANGE",
          "message": "Must be between 0 and 150"
        }
      ]
    }
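A response like the one above can be produced by a small helper on the server side. This is a minimal, framework-agnostic sketch: `problem_details` and `problem_response` are hypothetical names, and the `(status, headers, body)` tuple stands in for whatever response object your web framework uses.

```python
import json


def problem_details(type_uri, title, status, detail=None, instance=None,
                    **extensions):
    """Build a Problem Details body as a plain dict.

    Keyword arguments beyond the five standard members become extension
    members (for example, the "errors" array used for validation failures).
    """
    body = {"type": type_uri, "title": title, "status": status}
    if detail is not None:
        body["detail"] = detail
    if instance is not None:
        body["instance"] = instance
    body.update(extensions)
    return body


def problem_response(problem):
    """Return (status, headers, serialized body) for a problem dict."""
    headers = {"Content-Type": "application/problem+json"}
    return problem["status"], headers, json.dumps(problem)


problem = problem_details(
    "https://api.example.com/errors/validation",
    "Validation Error",
    422,
    detail="The request body contains invalid fields",
    instance="/requests/abc-123-def",
    errors=[{"field": "email", "code": "INVALID_FORMAT",
             "message": "Must be a valid email address"}],
)
status, headers, body = problem_response(problem)
print(status, headers["Content-Type"])  # 422 application/problem+json
```

Taking the status code from the problem body keeps the HTTP status line and the `status` member from drifting apart.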
Idempotency and Safe Retries
Retrying a request is only safe if the request is idempotent, meaning executing it multiple times produces the same result as executing it once. GET, PUT, and DELETE are idempotent by definition in HTTP semantics. POST is not: retrying a POST /orders request might create duplicate orders. The solution is idempotency keys: the client generates a unique key (such as a UUID) and includes it in a request header. The server stores the key with the response and returns the cached response for duplicate keys, guaranteeing that retries do not create duplicates.
Give idempotency keys a TTL that comfortably exceeds your retry window. Even if your retry strategy completes within 5 minutes, storing keys for 24 hours covers delayed retries and manual re-attempts. Use the key as a database unique constraint so concurrent duplicate requests fail at the database level rather than creating duplicates. Return the original response (including the original status code) for idempotent replays, not a new 200 or a 409 conflict, so that a retrying client sees exactly what it would have seen had the first response arrived.
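The replay behaviour can be sketched with an in-memory store. Everything here is illustrative: `IdempotencyStore` and `create_order` are hypothetical names, the dict stands in for the database table with a unique constraint described above, and the sketch does not handle concurrent requests that carry the same key.

```python
import time
import uuid


class IdempotencyStore:
    """Caches the first response for each key and replays it for duplicates."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, status, body)

    def execute(self, key, handler):
        """Run handler() once per key; replay the stored response afterwards."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            _, status, body = entry
            return status, body  # replay: same status, same body
        status, body = handler()
        self._entries[key] = (now + self.ttl_seconds, status, body)
        return status, body


orders = []


def create_order():
    # Hypothetical side effect that must not happen twice.
    orders.append({"id": len(orders) + 1})
    return 201, {"order_id": orders[-1]["id"]}


store = IdempotencyStore()
key = str(uuid.uuid4())  # generated by the client, sent in a request header
first = store.execute(key, create_order)
retry = store.execute(key, create_order)
print(first == retry, len(orders))  # True 1
```

The retry receives the original 201 and body, and only one order exists.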
Frequently Asked Questions
What is RFC 7807 Problem Details and should I use it?
RFC 7807 defines a standard JSON format for HTTP API error responses with fields: type (URI), title (summary), status (HTTP code), detail (explanation), and instance (occurrence URI). Use it because it provides a consistent error format across endpoints, is machine-parseable, and has built-in support in mainstream frameworks such as ASP.NET Core (ProblemDetails) and Spring Framework 6 (ProblemDetail). Clients can build generic error handlers that work across any RFC 7807-compliant API.
When should an API client retry a failed request?
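Retry automatically only on transient errors: HTTP 429 (rate limited), 500 and 502/503/504 (server, gateway, and availability errors), network timeouts, and connection resets. Never resend a request unchanged after 400 (bad request), 403 (forbidden), or 404 (not found). Two cases are conditional: retry a 401 once after refreshing the token, and retry a 409 only after re-reading the resource and resolving the conflict. The retry decision depends on both the HTTP status code and request idempotency: GET, PUT, and DELETE are idempotent and safe to retry; POST requires an idempotency key.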
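Put together, a client-side retry loop might look like the sketch below. It makes several assumptions: `send` is a hypothetical callable that performs one request and returns `(status, headers)`, the retryable set follows the status table earlier in this article, only the integer-seconds form of Retry-After is honored (the HTTP-date form is ignored), and the caller is responsible for making the request idempotent.

```python
import random
import time

RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def call_with_retries(send, max_retries=3, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random):
    """Call send() until it returns a non-retryable status or retries run out.

    Returns the last status code received.
    """
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status not in RETRYABLE_STATUSES or attempt == max_retries:
            return status
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            # The server told us how long to wait; respect it.
            delay = float(retry_after)
        else:
            # Full jitter: random delay up to the capped exponential backoff.
            delay = rng.uniform(0, min(base_delay * 2 ** attempt, max_delay))
        sleep(delay)


# Demo with canned responses and a fake sleep so nothing actually waits.
responses = iter([(503, {"Retry-After": "2"}), (429, {}), (200, {})])
slept = []
final_status = call_with_retries(lambda: next(responses), sleep=slept.append,
                                 rng=random.Random(1))
print(final_status, len(slept), slept[0])  # 200 2 2.0
```

Network-level failures such as timeouts and connection resets would surface as exceptions from `send`; catching the specific exception types of your HTTP library and treating them like a retryable status is left out to keep the sketch library-neutral.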
What is exponential backoff and how do I calculate the delay?
Exponential backoff doubles the retry delay after each failure: delay = min(base * 2^attempt, max_delay), with attempt counted from 0 and jitter applied to the result. With a 1s base, the first retry waits up to ~1s, the second ~2s, and the third ~4s. Jitter randomizes timing to prevent thundering herds. The max_delay cap prevents unreasonably long waits. Many HTTP client libraries provide exponential backoff built in or through a retry extension.
What is the difference between a retry and a circuit breaker?
A retry re-attempts the same request hoping the issue resolved. A circuit breaker stops all requests after a failure threshold, giving the server time to recover. After a timeout, one test request passes through. If it succeeds, traffic resumes. Use retries for individual failures and circuit breakers for systemic degradation.
How should I structure error responses for a REST API?
Use a consistent JSON format with: error code (machine-readable string), message (human-readable), details array (field-level errors), and request ID (for log correlation). Always use the correct HTTP status code. Include a documentation URL for each error code. Never return 200 with an error body.