Rate Limiting Calculator & Tester


May 25, 2026 · 14 min read

Rate limiting is the mechanism that prevents any single client from overwhelming an API with too many requests. Most major APIs use it: GitHub caps authenticated users at 5,000 requests per hour, Stripe allows 100 requests per second in live mode, and OpenAI enforces both request-per-minute and token-per-minute limits. Understanding how rate limiting works is not optional for developers who consume or build APIs. The difference between a smooth integration and a broken one often comes down to whether the frontend correctly handles 429 responses and implements proper backoff strategies.

This interactive calculator lets you configure rate limit parameters, select the algorithm, simulate bursts of traffic, and visualize exactly which requests pass and which get rejected. You can compare the three major algorithms side by side to understand their trade-offs: fixed window is simplest but has boundary problems, sliding window is fairest but uses more memory, and token bucket is the industry standard that balances burst tolerance with sustained rate enforcement.

Interactive Rate Limit Calculator

Configure your rate limit parameters below and click Simulate Burst to send a batch of requests through the selected algorithm. The timeline visualization shows each request as a dot: green for accepted, red for rejected. Metrics update in real time.


Algorithm Comparison

Click Compare All Algorithms above to run the same burst against all three algorithms simultaneously. The results appear below, showing exactly how each algorithm handles the same traffic pattern differently.


How Rate Limiting Algorithms Work

Rate limiting algorithms sit between the client and the server, deciding whether each incoming request should be processed or rejected. The decision is based on how many requests the client has already sent within a defined time period. What differs between algorithms is how they define "within a time period" and how they track request counts. Each algorithm makes a different trade-off between simplicity, fairness, memory usage, and burst tolerance.

All three major algorithms serve the same purpose: protect the server from being overwhelmed while giving each client a fair share of capacity. The choice between them depends on your specific requirements: how important is it to prevent boundary spikes? How much memory can you allocate per client? Do you need to support legitimate traffic bursts?

Fixed Window Rate Limiting

The fixed window algorithm divides time into discrete intervals of equal length (windows). Each window has a counter that starts at zero. Every request increments the counter. When the counter reaches the limit, all subsequent requests in that window are rejected. When a new window begins, the counter resets to zero.

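class FixedWindowRateLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;         // max requests per window
    this.windowMs = windowMs;   // window length in milliseconds
    this.counters = new Map();  // clientId -> { count, windowStart }
  }

  allow(clientId) {
    const now = Date.now();
    // Align windows to fixed boundaries (multiples of windowMs), so every
    // window starts and ends at the same instants for all clients
    const windowStart = now - (now % this.windowMs);
    const record = this.counters.get(clientId);

    // First request ever, or first request in a new window: reset the counter
    if (!record || record.windowStart !== windowStart) {
      this.counters.set(clientId, { count: 1, windowStart });
      return true;
    }

    // Same window: accept until the limit is reached
    if (record.count < this.limit) {
      record.count++;
      return true;
    }

    return false; // rate limited until the next window begins
  }
}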

The critical weakness of fixed windows is the boundary problem. If a client sends 10 requests at the end of window 1 and 10 requests at the beginning of window 2, all 20 are accepted within a short span even though the limit is 10 per window. Around window boundaries, a client can therefore reach up to twice the intended rate. For many applications this is acceptable, but for APIs that need strict rate enforcement (payment processing, authentication endpoints), it creates an exploitable gap.
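The boundary problem is easy to reproduce. The sketch below is a self-contained variant of the fixed window limiter that aligns windows to multiples of `windowMs` and takes the current time as a `now` argument, a hypothetical test hook so the timing is deterministic. The timestamps are illustrative: 10 requests in the last 100 ms of the first minute, then 10 in the first 100 ms of the second.

```javascript
// Aligned fixed windows: boundaries fall at multiples of windowMs.
// `now` is a hypothetical injectable clock (ms) used to make timing deterministic.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // clientId -> { count, windowStart }
  }

  allow(clientId, now = Date.now()) {
    const windowStart = now - (now % this.windowMs);
    const record = this.counters.get(clientId);
    if (!record || record.windowStart !== windowStart) {
      this.counters.set(clientId, { count: 1, windowStart });
      return true;
    }
    if (record.count < this.limit) {
      record.count++;
      return true;
    }
    return false;
  }
}

// Limit: 10 requests per 60-second window.
const limiter = new FixedWindowLimiter(10, 60_000);

// 10 requests at 59.90-59.99 s (end of window 1),
// then 10 at 60.00-60.09 s (start of window 2).
const times = [];
for (let i = 0; i < 10; i++) times.push(59_900 + i * 10);
for (let i = 0; i < 10; i++) times.push(60_000 + i * 10);

const accepted = times.filter(t => limiter.allow('client-a', t)).length;
const spanMs = times[times.length - 1] - times[0];
console.log(`${accepted} of ${times.length} accepted within ${spanMs} ms`);
// → 20 of 20 accepted within 190 ms
```

Each window individually respects its limit of 10, yet twice the limit passes in under 200 ms because the counter resets at the 60-second mark.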

Fixed window is the simplest to implement and uses the least memory: one counter and one timestamp per client. Redis makes this trivial: INCR a per-client key for the current window and set EXPIRE on it, so the key's TTL takes the place of the timestamp. If the boundary problem is acceptable for your use case, fixed window is a sound choice. Many public APIs (including GitHub's hourly limit) use fixed windows, accepting the occasional boundary spike in exchange for simplicity.

Sliding Window Rate Limiting

The sliding window algorithm eliminates the boundary problem by tracking the exact timestamp of each request. Instead of counting requests within a fixed interval, it counts requests within a moving window that ends at the current moment. Every time a new request arrives, the algorithm looks back exactly windowMs milliseconds and counts how many requests fall within that range.

class SlidingWindowRateLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.requests = new Map(); // clientId -> [timestamps]
  }

  allow(clientId) {
    const now = Date.now();
    const cutoff = now - this.windowMs;

    if (!this.requests.has(clientId)) {
      this.requests.set(clientId, []);
    }

    const timestamps = this.requests.get(clientId);

    // Remove expired timestamps
    while (timestamps.length > 0 && timestamps[0] <= cutoff) {
      timestamps.shift();
    }

    if (timestamps.length < this.limit) {
      timestamps.push(now);
      return true;
    }

    return false; // rate limited
  }
}

Sliding window is the fairest algorithm. A client can never exceed the limit, regardless of when they send requests relative to any boundary. The trade-off is memory: you need to store every request timestamp within the window for every client. For a limit of 1000 requests per minute with 10,000 active clients, that is up to 10 million timestamps in memory. In practice, a sliding window log is usually stored in a Redis sorted set per client (ZREMRANGEBYSCORE to drop expired entries, ZCARD to count the rest, ZADD to record the new request), which handles this efficiently but at a much higher memory cost than a simple counter.
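As a check on that fairness claim, this sketch replays the boundary pattern from the fixed window discussion (10 requests at the end of one window, 10 at the start of the next, with a limit of 10 per 60 seconds) against a sliding window log. The `now` parameter is a hypothetical injectable clock, and filtering the array replaces the `shift()` loop purely for brevity.

```javascript
// Sliding window log with a hypothetical injectable clock (`now`, in ms).
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.requests = new Map(); // clientId -> timestamps of accepted requests
  }

  allow(clientId, now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Keep only timestamps still inside the window that ends at `now`
    const live = (this.requests.get(clientId) ?? []).filter(t => t > cutoff);
    const allowed = live.length < this.limit;
    if (allowed) live.push(now);
    this.requests.set(clientId, live);
    return allowed;
  }
}

// Limit: 10 requests per 60-second window.
const limiter = new SlidingWindowLog(10, 60_000);
const times = [];
for (let i = 0; i < 10; i++) times.push(59_900 + i * 10); // end of minute 1
for (let i = 0; i < 10; i++) times.push(60_000 + i * 10); // start of minute 2

const accepted = times.filter(t => limiter.allow('client-a', t)).length;
console.log(`${accepted} of ${times.length} accepted`); // → 10 of 20 accepted

// Capacity frees up only as individual requests age out of the window:
console.log(limiter.allow('client-a', 119_899)); // → false (oldest is at 59,900)
console.log(limiter.allow('client-a', 119_901)); // → true
```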

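A common optimization is the sliding window counter, which approximates the sliding window by assuming the previous fixed window's requests were spread evenly and counting only the share that still overlaps the sliding window: estimated count = currentWindowCount + previousWindowCount * (1 - elapsed fraction of the current window). If 70% of the current window has elapsed, the count is currentWindowCount + previousWindowCount * 0.3. This needs only two counters per client, close to fixed window's footprint, but eliminates most of the boundary spike.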
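A minimal sketch of the sliding window counter, under the assumptions stated in its comments: windows aligned to multiples of `windowMs`, a hypothetical `now` argument for deterministic timing, and a previous count that is discarded if more than one full window has passed. It replays the same 10 + 10 boundary pattern used for fixed windows.

```javascript
// Sliding window counter (approximation). Windows are aligned to multiples
// of windowMs; `now` is a hypothetical injectable clock for testing.
class SlidingWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.state = new Map(); // clientId -> { windowStart, current, previous }
  }

  allow(clientId, now = Date.now()) {
    const windowStart = now - (now % this.windowMs);
    let s = this.state.get(clientId);

    if (!s) {
      s = { windowStart, current: 0, previous: 0 };
      this.state.set(clientId, s);
    } else if (s.windowStart !== windowStart) {
      // Entered a new window: the old count becomes "previous" only if that
      // window was the immediately preceding one; otherwise it is stale.
      const adjacent = windowStart - s.windowStart === this.windowMs;
      s.previous = adjacent ? s.current : 0;
      s.current = 0;
      s.windowStart = windowStart;
    }

    // Assume the previous window's requests were evenly spread and count the
    // share that still overlaps the sliding window (70% elapsed -> * 0.3).
    const elapsedFraction = (now - windowStart) / this.windowMs;
    const estimated = s.current + s.previous * (1 - elapsedFraction);

    if (estimated < this.limit) {
      s.current++;
      return true;
    }
    return false;
  }
}

// Boundary pattern: 10 requests at 59.90-59.99 s, then 10 at 60.00-60.09 s,
// against a limit of 10 per 60-second window.
const limiter = new SlidingWindowCounter(10, 60_000);
let accepted = 0;
for (let i = 0; i < 10; i++) if (limiter.allow('client-a', 59_900 + i * 10)) accepted++;
for (let i = 0; i < 10; i++) if (limiter.allow('client-a', 60_000 + i * 10)) accepted++;
console.log(accepted); // → 11 (aligned fixed windows would accept all 20)

// At t = 102 s, 70% of window 2 has elapsed: estimate = 1 + 10 * 0.3 = 4.
console.log(limiter.allow('client-a', 102_000)); // → true
```

The one extra request that slips through comes from the even-spread assumption: the 10 previous requests were actually clustered at the end of window 1, so weighting them by elapsed time slightly undercounts them. That small error is the price of storing two counters instead of a full timestamp log.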

Token Bucket Rate Limiting

The token bucket is one of the most widely deployed rate limiting algorithms in production; AWS API Gateway, Stripe, and GitHub all use variants of it. The mental model is simple: imagine a bucket that holds tokens. Tokens are added at a constant rate (e.g., 10 per second). Each request removes one token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity (the burst size), so tokens accumulate during idle periods up to the cap.

class TokenBucketRateLimiter {
  constructor(rate, bucketSize) {
    this.rate = rate;         // tokens per second
    this.bucketSize = bucketSize;
    this.buckets = new Map(); // clientId -> { tokens, lastRefill }
  }

  allow(clientId) {
    const now = Date.now();

    if (!this.buckets.has(clientId)) {
      this.buckets.set(clientId, {
        tokens: this.bucketSize,
        lastRefill: now
      });
    }

    const bucket = this.buckets.get(clientId);
    const elapsed = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(
      this.bucketSize,
      bucket.tokens + elapsed * this.rate
    );
    bucket.lastRefill = now;

    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return true;
    }

    return false; // rate limited
  }
}

The token bucket naturally handles two scenarios that the other algorithms struggle with. First, it allows legitimate bursts. A client that has been idle accumulates tokens, so they can briefly exceed the sustained rate without being rejected. This matches real traffic patterns: a user loads a dashboard that makes 20 API calls simultaneously, then goes idle for a minute. Second, it enforces a strict average rate over time. No matter how the requests are distributed, the long-term average cannot exceed the token refill rate.
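To see both behaviors, here is a minimal single-client sketch of the same refill logic with the clock passed in explicitly. The `now` parameter is a hypothetical test hook, and the rate of 5 tokens per second with a bucket of 20 are illustrative values. After an idle minute, a 20-call dashboard burst is fully accepted; a client that then sends 100 requests per second for 10 seconds gets only about 5 per second through.

```javascript
// Single-client token bucket with an injectable clock (`now` in ms is a
// hypothetical test hook; production code would use Date.now()).
class TokenBucket {
  constructor(rate, capacity, now = Date.now()) {
    this.rate = rate;           // tokens added per second
    this.capacity = capacity;   // burst size
    this.tokens = capacity;     // start full
    this.lastRefill = now;
  }

  allow(now = Date.now()) {
    const elapsedSec = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.rate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 5 tokens/second sustained, bursts of up to 20.
const bucket = new TokenBucket(5, 20, 0);

// Dashboard load after an idle minute: 20 simultaneous calls all pass.
let burstAccepted = 0;
for (let i = 0; i < 20; i++) if (bucket.allow(60_000)) burstAccepted++;
console.log('burst accepted:', burstAccepted); // → burst accepted: 20

// Then the client sends a request every 10 ms (100/s) for 10 seconds.
let sustainedAccepted = 0;
for (let t = 60_010; t <= 70_000; t += 10) {
  if (bucket.allow(t)) sustainedAccepted++;
}
console.log('sustained accepted:', sustainedAccepted); // roughly 5/s * 10 s = 50
```

Over those 10 seconds the bucket refills only 50 tokens, so that is the ceiling on accepted requests no matter how fast the client sends.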

Token bucket uses minimal memory (two numbers per client: token count and last refill time) and requires only a simple calculation per request. It is the best default choice for most APIs and the algorithm this tool defaults to.

Choosing the Right Algorithm

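Criterion	Fixed window	Sliding window	Token bucket
Implementation complexity	Low	Medium	Low
Memory per client	16 bytes (counter + window start)	8 bytes * limit (one timestamp per request)	16 bytes (tokens + last refill)
Boundary spike risk	Yes (up to 2x)	None	None
Configurable burst size	No (burst = window limit)	No (burst = window limit)	Yes (bucket size)
Fairness	Good	Best	Good
Used by	GitHub (hourly)	Cloudflare	AWS, Stripe, GitHub (per-second)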

For most REST APIs, token bucket is the right default. It handles bursts gracefully, uses minimal memory, and is straightforward to implement with Redis. Use sliding window when you need the strictest possible enforcement (financial APIs, authentication endpoints). Use fixed window when simplicity is the primary concern and boundary spikes are acceptable.

Implementation Patterns

Redis-Based Token Bucket

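In production, when several application instances serve the same API, the rate limiter's state typically lives in Redis (or a similar shared store) so all instances share the same counters. The token bucket maps cleanly to a Lua script, which Redis runs atomically, so two concurrent requests cannot both read the same token count before either writes it back.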

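-- Redis Lua script for an atomic token bucket
-- KEYS[1] = bucket key (one key per client)
-- ARGV[1] = refill rate in tokens per second
-- ARGV[2] = bucket capacity (burst size)
-- ARGV[3] = current time in milliseconds, supplied by the caller
--           (app servers need reasonably synchronized clocks)
local key = KEYS[1]
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Missing fields come back as false and tonumber(false) is nil,
-- so a new client starts with a full bucket
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Refill for the elapsed time; clamp so a clock step backwards never removes tokens
local elapsed = math.max(0, now - last_refill) / 1000
tokens = math.min(capacity, tokens + elapsed * rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

-- Persist state on both paths (multi-field HSET requires Redis 4.0+)
redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
-- After capacity / rate seconds the bucket is full again, so the key can expire
redis.call('EXPIRE', key, math.ceil(capacity / rate) + 1)
return allowed -- 1 = allowed, 0 = rejected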

Rate Limit Response Headers

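Well-designed APIs include rate limit information in response headers so clients can adapt proactively. The X-RateLimit-* headers are a de facto convention rather than an RFC standard, though major APIs widely adopt them; Retry-After, by contrast, is a standard HTTP header (RFC 9110) that servers may send with a 429 response (RFC 6585). Include them in your mock API responses to test your client-side handling.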

Header	Value	Purpose
`X-RateLimit-Limit`	Integer	Maximum requests allowed in the window
`X-RateLimit-Remaining`	Integer	Requests remaining in current window
`X-RateLimit-Reset`	Unix timestamp	When the limit resets (for fixed window)
`Retry-After`	Seconds or HTTP-date	How long to wait before retrying (sent with a 429)
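One way a server might fill in these headers from token bucket state is sketched below. The helper name `rateLimitHeaders` and the mapping are assumptions, not a convention from any specific API: Remaining is the number of whole tokens left, Reset is the moment the bucket would be full again, and Retry-After (only on a rejection) is the time until one whole token has refilled, rounded up to whole seconds.

```javascript
// Hypothetical helper: map token bucket state to rate limit response headers.
// tokens:   current token count after refill (may be fractional)
// rate:     refill rate in tokens per second
// capacity: bucket size, reported as the limit
// nowMs:    current time in milliseconds since the Unix epoch
// rejected: true when this response is a 429
function rateLimitHeaders({ tokens, rate, capacity, nowMs, rejected }) {
  const secondsUntilFull = (capacity - tokens) / rate;
  const headers = {
    'X-RateLimit-Limit': String(capacity),
    'X-RateLimit-Remaining': String(Math.floor(tokens)),
    // Unix timestamp in seconds at which the bucket is full again
    'X-RateLimit-Reset': String(Math.ceil(nowMs / 1000 + secondsUntilFull)),
  };
  if (rejected) {
    // Whole seconds until at least one token is available again
    headers['Retry-After'] = String(Math.ceil((1 - tokens) / rate));
  }
  return headers;
}

// A rejected request: 0.4 tokens left, refilling at 0.5 tokens per second.
const headers = rateLimitHeaders({
  tokens: 0.4,
  rate: 0.5,
  capacity: 10,
  nowMs: 1_700_000_000_000,
  rejected: true,
});
console.log(headers);
```

Rounding Retry-After up rather than down keeps a well-behaved client from retrying a moment before a token actually exists.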

Client-Side Rate Limit Handling

Your frontend must handle 429 responses gracefully. The worst possible behavior is retrying immediately in a tight loop, which amplifies the problem and can get your API key revoked. The correct approach is exponential backoff with Retry-After header support.

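async function fetchWithRateLimitHandling(url, options = {}) {
  // Keep our retry setting out of the options forwarded to fetch()
  const { maxRetries = 3, ...fetchOptions } = options;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, fetchOptions);

    if (response.status !== 429) return response;
    if (attempt === maxRetries) break; // out of retries: don't sleep for nothing

    // Read rate limit headers
    const retryAfter = response.headers.get('Retry-After');
    const remaining = response.headers.get('X-RateLimit-Remaining');
    const resetAt = response.headers.get('X-RateLimit-Reset');

    // Calculate wait time, preferring explicit server hints
    let waitMs = NaN;
    if (retryAfter) {
      // Retry-After is either delay-seconds ("120") or an HTTP-date
      const seconds = Number(retryAfter);
      waitMs = Number.isFinite(seconds)
        ? seconds * 1000
        : Date.parse(retryAfter) - Date.now();
    } else if (resetAt) {
      // Assumes a Unix timestamp in seconds, as GitHub sends
      waitMs = parseInt(resetAt, 10) * 1000 - Date.now();
    }
    if (!Number.isFinite(waitMs)) {
      // No usable hint: exponential backoff (1s, 2s, 4s, ...) plus a little
      // random jitter so many clients don't retry in lockstep
      waitMs = 2 ** attempt * 1000 + Math.random() * 250;
    }

    console.log(
      'Rate limited. Remaining:', remaining,
      'Waiting:', Math.round(waitMs), 'ms'
    );

    // Never wait less than 100 ms (also covers a reset time already in the past)
    await new Promise(r => setTimeout(r, Math.max(waitMs, 100)));
  }

  throw new Error('Rate limit exceeded after ' + maxRetries + ' retries');
}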

For visualizing and debugging the JSON payloads in rate limit responses, use Kappafy's JSON explorer. If you are simulating API errors during development, combine the error simulator with this rate limit calculator to test the full spectrum of failure modes. Teams building webhook systems need to understand rate limiting from both the consumer and provider side.

Frequently Asked Questions

What is the difference between fixed window and sliding window rate limiting?

Fixed window rate limiting resets the request counter at fixed intervals (e.g., every 60 seconds). This creates a boundary problem: a client can send the maximum number of requests at the end of one window and the beginning of the next, effectively doubling throughput for a brief period. Sliding window rate limiting tracks the exact timestamp of each request and enforces the limit over a continuously moving time window, eliminating the boundary spike entirely. Sliding window is fairer but requires more memory because it stores individual request timestamps rather than a simple counter.

How does the token bucket algorithm work?

The token bucket maintains a bucket that holds tokens. Tokens are added at a fixed rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity (burst size), so tokens accumulate up to the limit during idle periods. This allows short bursts of traffic up to the bucket capacity while enforcing an average rate over time. It uses only 16 bytes per client (token count + last refill timestamp) and is the most commonly used algorithm in production APIs including AWS API Gateway, Stripe, and GitHub.

What is a good rate limit for a REST API?

Common rate limits range from 60 requests per minute for free tiers to 1000+ for paid plans. GitHub allows 5,000 per hour authenticated, 60 per hour unauthenticated. Stripe allows 100/second in live mode, 25/second in test mode. OpenAI uses tiered limits based on usage history. Start conservative (100 req/min) and increase based on actual usage. Always return X-RateLimit-Remaining and Retry-After headers so clients can adapt proactively. The right limit depends on your server capacity, cost per request, and business model.

How do I handle 429 Too Many Requests in my frontend?

Read the Retry-After header to determine how long to wait. If it is missing, fall back to exponential backoff: wait 1s, then 2s, then 4s between retries. Queue pending requests and release them after the cooldown period. Show users a non-blocking notification that the system is temporarily throttled. Never retry immediately or in a tight loop: this makes the problem worse and can get your API key suspended or revoked. The code example in the Client-Side Rate Limit Handling section above shows the complete pattern with header parsing and backoff.
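A minimal sketch of the queueing idea, assuming a hypothetical `RequestQueue` class that is not part of any library: every call goes through `enqueue`, and whichever code receives a 429 calls `pause` with the cooldown, so queued calls wait and then resume in order.

```javascript
// Hypothetical client-side queue: runs API calls one at a time and holds
// all of them during a cooldown, instead of letting each call retry alone.
class RequestQueue {
  constructor() {
    this.pausedUntil = 0;           // epoch ms; no cooldown while in the past
    this.chain = Promise.resolve(); // tail of the FIFO chain; never rejects
  }

  // Call when a 429 arrives, e.g. with the Retry-After delay in ms.
  pause(ms) {
    this.pausedUntil = Math.max(this.pausedUntil, Date.now() + ms);
  }

  // task: () => Promise. Resolves or rejects with the task's own result.
  enqueue(task) {
    const run = async () => {
      // Re-check after each sleep in case pause() extended the cooldown
      let waitMs;
      while ((waitMs = this.pausedUntil - Date.now()) > 0) {
        await new Promise(r => setTimeout(r, waitMs));
      }
      return task();
    };
    const result = this.chain.then(run);
    this.chain = result.catch(() => {}); // one failed task must not block the rest
    return result;
  }
}
```

Running one request at a time keeps the sketch simple; a real client would likely allow some concurrency and would pair the queue with the backoff logic rather than replace it.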

What is burst capacity in rate limiting?

Burst capacity is the maximum number of requests a client can send instantaneously before being rate limited. In a token bucket algorithm, it equals the bucket size. A rate limit of 10 requests per second with a burst of 50 means a client can send 50 requests at once (consuming all tokens). After that, requests are admitted only as tokens refill, one every 100 ms, and the bucket needs 5 idle seconds (50 / 10) to refill completely. Burst capacity absorbs natural traffic spikes, like a dashboard loading 20 widgets simultaneously, without rejecting requests. The sustained rate limit still prevents abuse over longer periods.

Michael Lip

Solo developer building free tools at Zovo. Kappafy helps developers work with JSON and APIs faster. No tracking, no accounts, no data collection. Learn more.

Last updated: May 25, 2026
