Rate Limit Algorithm Comparison and Case Studies

Tadashi Shigeoka ·  Sun, January 4, 2026

While implementing rate limiting for the Giselle API, I researched rate limiting algorithms.

Rate limiting is an essential mechanism for preventing excessive requests to a system and maintaining availability and security (such as DoS attack prevention). This article summarizes the characteristics and selection guidelines for representative algorithms.

Algorithm Comparison Table

| Algorithm | Characteristics | Burst Tolerance | Memory Efficiency | Accuracy | Implementation Difficulty |
|---|---|---|---|---|---|
| Fixed Window Counter | Simple, bursts possible at boundaries | Partial (at boundaries) | Best | Low | Low |
| Sliding Window Log | Accurate, high memory usage | No | Low | Best | Medium |
| Sliding Window Counter | Middle ground between Fixed + Sliding | Partial | High | High | Medium |
| Token Bucket | Burst tolerant, smoothing | Yes | High | Medium | Medium |
| Leaky Bucket | Processes at constant rate | No | High | Medium | Medium |

1. Fixed Window Counter

The simplest method that divides time into fixed intervals (such as 1 minute) and counts within them.

Window 1 (12:00:00-12:00:59)    Window 2 (12:01:00-12:01:59)
┌─────────────────────────┐    ┌─────────────────────────┐
│  count: 45/60           │    │  count: 12/60           │
│  ████████████░░░░░░░░░  │    │  ████░░░░░░░░░░░░░░░░░  │
└─────────────────────────┘    └─────────────────────────┘

How It Works

  • Reset the counter for each fixed time window such as “1 minute” or “1 hour”
  • Example: 100 requests per minute
    • 12:00:00 to 12:00:59 → maximum 100
    • Reset at 12:01:00
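The windowed counting above can be sketched in a few lines. This is an illustrative in-memory version only (the names `checkLimit`, `LIMIT`, and `WINDOW_MS` are hypothetical, not the Giselle implementation):

```typescript
const LIMIT = 100;        // max requests per window
const WINDOW_MS = 60_000; // 1-minute window

// One counter per (key, window start) pair
const counters = new Map<string, number>();

function checkLimit(key: string, nowMs: number = Date.now()): boolean {
  // All requests within the same minute map to the same window start,
  // so the counter implicitly "resets" when a new window begins
  const windowStart = Math.floor(nowMs / WINDOW_MS) * WINDOW_MS;
  const bucketKey = `${key}:${windowStart}`;
  const count = (counters.get(bucketKey) ?? 0) + 1;
  counters.set(bucketKey, count);
  return count <= LIMIT; // allowed if still within the limit
}
```

Note how only one counter per active window is needed, which is where the memory efficiency comes from.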

Advantages

  • Very simple to implement
  • Easy to implement with Redis / in-memory
  • Efficient as it only requires 1 record per window

Disadvantage: Boundary Problem

   End of Window 1   │   Start of Window 2
          ↓          │          ↓
─────────────────────┼─────────────────────
   60 requests in    │   60 requests in
   the 59th second   │   the 1st second
                     │
     → 120 requests possible in ~2 seconds!

This is called the “boundary problem” or “burst at window edges” and is the biggest weakness of Fixed Window.

Suitable Use Cases

  • Admin panels
  • APIs that can tolerate some bursts
  • MVPs and initial implementations

Real-World Adoption Cases

GitHub API

GitHub API uses Fixed Window-based rate limiting. Details are available in the GitHub API official documentation.

  • Unauthenticated: 60 requests/hour (per IP address)
  • Personal Access Token: 5,000 requests/hour (per user)
  • GitHub Apps (Enterprise Cloud): 15,000 requests/hour

You can check the remaining request count and reset time via response headers.

X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
X-RateLimit-Reset: 1372700873
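A client can use these headers to decide when to retry. A minimal sketch, assuming lowercase header keys as returned by `fetch` (the helper name `msUntilReset` is hypothetical, not part of any official client):

```typescript
// Given GitHub-style rate limit headers, return how long (in ms) the
// client should wait before the quota resets. X-RateLimit-Reset is a
// Unix timestamp in seconds, per the example headers above.
function msUntilReset(
  headers: Record<string, string>,
  nowMs: number,
): number {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetEpochSec = Number(headers["x-ratelimit-reset"]);
  if (remaining > 0) return 0; // quota left: no need to wait
  return Math.max(0, resetEpochSec * 1000 - nowMs);
}
```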

X (Twitter) API

X API uses a 15-minute fixed window. Details are available in the X API official documentation.

  • Most endpoints: a fixed number of requests per 15-minute window (the exact quota varies by endpoint)
  • Some endpoints (such as posting tweets): 300 requests per 3 hours

Limits differ by authentication method: per-user for OAuth 1.0a, and per-app for OAuth 2.0 Bearer Token.

Slack API

Slack API uses a Fixed Window (1 minute) rate limit with a Tier system. Details are available in the Slack API official documentation.

| Tier | Limit | Example Methods |
|---|---|---|
| Tier 1 | 1 request/min | Most restrictive |
| Tier 2 | 20 requests/min | files.upload |
| Tier 3 | 50 requests/min | General methods |
| Tier 4 | 100 requests/min | chat.postEphemeral |

A Tier is assigned per method, so even if one method is rate limited, other methods remain available.

Implementation Example at Giselle

We adopted Fixed Window Counter for this PR feat(api/apps/run): add team-scoped rate limiting to App Run API · #2603.

// 60-second fixed window
const WINDOW_SECONDS = 60;
 
// Key by window start time
const windowStart = Math.floor(Date.now() / 1000 / WINDOW_SECONDS) * WINDOW_SECONDS;
 
// Atomic counter increment (INSERT ... ON CONFLICT DO UPDATE)

A well-balanced choice between simplicity and performance.

2. Sliding Window Log

A method to solve the boundary problem of fixed windows.

How It Works

  • Record timestamps for all requests
  • Calculate “how many occurred in the last 60 seconds” each time
  • When a new request arrives, delete logs older than “1 minute ago” from the current time
  • Allow if the remaining log count is below the limit
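The steps above can be sketched with a plain timestamp array. This is illustrative only; production implementations typically keep the log in Redis Sorted Sets rather than in process memory (the names `allow`, `LIMIT`, and `WINDOW_MS` are hypothetical):

```typescript
const LIMIT = 100;        // max requests per sliding window
const WINDOW_MS = 60_000; // 60-second window

// Timestamps of accepted requests, oldest first
const log: number[] = [];

function allow(nowMs: number): boolean {
  // Drop entries that have fallen out of the last 60 seconds
  // (shift() is O(n); a real implementation would use a ring buffer)
  while (log.length > 0 && log[0] <= nowMs - WINDOW_MS) log.shift();
  if (log.length >= LIMIT) return false; // limit reached: reject
  log.push(nowMs); // record this request's timestamp
  return true;
}
```

Storing one timestamp per request is exactly why the memory cost grows with traffic.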

Advantages

  • Enables the most accurate rate limiting
  • No boundary bursts

Disadvantages

  • High memory consumption due to storing and calculating timestamps for all requests
  • Not suitable for high traffic

Suitable Use Cases

  • When high-precision limits are needed
  • APIs with few users and requests
  • Situations requiring strictness

Implementation Pattern

Due to high memory consumption, Sliding Window Log is rarely adopted by large-scale services; most choose Sliding Window Counter or Token Bucket.

A typical implementation uses Redis Sorted Sets.

MULTI
ZREMRANGEBYSCORE $key 0 ($currentTime - $window)
ZADD $key $currentTime $uniqueRequestId
ZCARD $key
EXPIRE $key $window
EXEC

Atomicity via MULTI / EXEC prevents race conditions in distributed environments. For high-traffic environments, migration to Sliding Window Counter is recommended.

3. Sliding Window Counter

A compromise combining the low overhead of “Fixed Window” with the accuracy of “Sliding Window Log.”

How It Works

Calculate using a weighted average of the previous window and current window.

effective_count =
  current_window_count +
  previous_window_count × overlap_ratio

For example, if the window is 1 minute and the current time is 12:01:30:

  • Previous window (12:00:00-12:00:59) count × 0.5
  • Current window (12:01:00-12:01:59) count
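The weighted average reduces to a small pure function. A minimal sketch (the name `effectiveCount` is hypothetical; the per-window counts are assumed to be tracked separately, e.g. as two Fixed Window counters):

```typescript
const WINDOW_MS = 60_000; // 1-minute window

// Weighted average of the previous and current window counts,
// following the effective_count formula above
function effectiveCount(
  prevCount: number,
  currCount: number,
  nowMs: number,
): number {
  const elapsed = nowMs % WINDOW_MS; // time elapsed in the current window
  // Fraction of the previous window still covered by the sliding window
  const overlap = (WINDOW_MS - elapsed) / WINDOW_MS;
  return currCount + prevCount * overlap;
}
```

At 30 seconds into the current window the overlap ratio is 0.5, matching the 12:01:30 example above.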

Advantages

  • Smoother than Fixed Window
  • Lighter than Log method
  • Boundary problem is also mitigated

Disadvantages

  • Not completely accurate (approximation)

Suitable Use Cases

  • Many API Gateways
  • When balancing accuracy and performance

Real-World Adoption Cases

Cloudflare

Cloudflare uses Sliding Window Counter to solve the boundary problem of Fixed Window. Detailed implementation is published on their official blog. Also refer to Cloudflare’s official documentation.

Formula:

rate = previous_count × ((window_size - elapsed) / window_size) + current_count

Example: With a limit of 50 requests/minute, if there were 42 requests in the previous minute and 18 requests in the current minute (15 seconds elapsed):

rate = 42 × ((60 - 15) / 60) + 18
     = 42 × 0.75 + 18
     = 49.5 requests

According to Cloudflare’s analysis, they achieved high accuracy with an error rate of only 0.003% across 400 million requests.

The following optimizations are also implemented:

  • Minimal storage: Only 2 values per counter
  • Atomic operations: Implemented with a single INCR command
  • Asynchronous counting: Does not delay legitimate requests

4. Token Bucket

The most commonly used algorithm, adopted by many services including Amazon API Gateway and Stripe.

How It Works

  1. Tokens are replenished at a constant rate into a “bucket” with a fixed capacity
  2. Each request consumes one token from the bucket
  3. If the bucket is empty, the request is rejected
Example:

Bucket capacity: 10 tokens
Refill rate: 1 token/second
→ 10 requests possible instantly, then 1 request/second
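The mechanics above can be sketched as follows; tokens are refilled lazily, proportional to elapsed time, instead of by a background timer. Illustrative only (`TokenBucket` is a hypothetical class, not any service's implementation):

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private capacity: number;
  private refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // the bucket starts full: bursts allowed
    this.lastRefillMs = nowMs;
  }

  tryAcquire(nowMs: number): boolean {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSec,
    );
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false; // bucket empty: reject
    this.tokens -= 1; // consume one token for this request
    return true;
  }
}
```

With capacity 10 and refill rate 1/s, this reproduces the example above: 10 instant requests, then roughly one per second.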

Advantages

  • Can allow short bursts while limiting average rate
  • Compatible with real-world API limits
  • Prioritizes user experience

Disadvantages

  • Somewhat complex to implement
  • Requires synchronization in distributed environments

Suitable Use Cases

  • Public APIs
  • APIs prioritizing user experience
  • Stripe / AWS API style

Real-World Adoption Cases

Stripe

Stripe uses Token Bucket, with detailed implementation published on their official blog. Also refer to the Stripe API official documentation. Each user is assigned a bucket, achieving low latency using Redis.

Stripe actually uses 4 types of limiters in combination.

| Limiter | Purpose |
|---|---|
| Request Rate Limiter | n requests/second per user (burst tolerant) |
| Concurrent Requests Limiter | Limits concurrent requests (for CPU-intensive endpoints) |
| Fleet Usage Load Shedder | Rejects non-critical requests when infrastructure usage exceeds 90% |
| Worker Utilization Load Shedder | Classifies traffic into 4 categories per server for priority control |

The default limit is 25 requests/second.

Discord

Discord API uses Token Bucket-based rate limiting. Details are available in the Discord API official documentation.

  • Global limit: 50 requests/second (total across all endpoints)
  • Per-route limits: Different limits per endpoint

You can check limit status via response headers.

X-RateLimit-Limit: 5
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1470173023.123
X-RateLimit-Bucket: abcd1234

A notable feature is the X-RateLimit-Bucket header, which allows grouping endpoints that share the same bucket. Additionally, independent limits are applied per resource (guild_id, channel_id, etc.), so being limited on one channel doesn’t affect other channels.

5. Leaky Bucket

Similar to Token Bucket, but focuses on request processing rate.

How It Works

  1. Imagine a bucket with a hole at the bottom
  2. Requests enter the bucket and are processed at a constant rate (water leaking from the hole)
  3. If the bucket overflows, new requests are discarded
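The steps above can be sketched with the counter variant of the algorithm: the "water level" drains at a fixed leak rate, and an arriving request is rejected when it would overflow the bucket. Illustrative only (`LeakyBucket` is a hypothetical name; a queue-based variant that delays rather than drops is also common):

```typescript
class LeakyBucket {
  private level = 0; // current water level (queued requests)
  private lastLeakMs: number;
  private capacity: number;
  private leakPerSec: number;

  constructor(capacity: number, leakPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.leakPerSec = leakPerSec;
    this.lastLeakMs = nowMs;
  }

  tryAdd(nowMs: number): boolean {
    // Drain what has leaked out since the last call
    const elapsedSec = (nowMs - this.lastLeakMs) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakPerSec);
    this.lastLeakMs = nowMs;
    if (this.level + 1 > this.capacity) return false; // would overflow
    this.level += 1; // the request enters the bucket
    return true;
  }
}
```

With capacity 40 and a leak rate of 2/second this mirrors Shopify's Standard-plan numbers described below.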

Advantages

  • Can keep output (processing) at a constant pace
  • Reliably protects downstream systems
  • Suited for traffic smoothing

Disadvantages

  • Bursts are immediately delayed or dropped
  • Increased latency

Suitable Use Cases

  • Backend protection
  • Asynchronous job queues
  • Network traffic control

Real-World Adoption Cases

Shopify

Shopify uses Leaky Bucket for all APIs. Details are available in the Shopify API official documentation.

REST Admin API limits:

| Plan | Bucket Size | Leak Rate |
|---|---|---|
| Standard | 40 requests/app/store | 2 requests/second |
| Shopify Plus | 400 requests/app/store | 20 requests/second |

You can check current usage via response headers.

X-Shopify-Shop-Api-Call-Limit: 32/40

For the GraphQL Admin API, limits are based on calculated query cost rather than request count. More complex queries consume more points.

Note that the Storefront API has no rate limits and is designed to handle sudden traffic increases such as flash sales.

Common Combinations in Practice

In real systems, it’s common to combine multiple rate limits.

| Layer | Method |
|---|---|
| Edge / CDN | Fixed or Sliding Window |
| API Gateway | Token Bucket |
| Background | Leaky Bucket |

Which One Should You Choose? (Selection Guidelines)

| Requirement | Recommended Algorithm |
|---|---|
| Implementation priority | Fixed Window Counter |
| Accuracy priority | Sliding Window Log |
| Balanced | Sliding Window Counter |
| UX priority | Token Bucket |
| Downstream protection | Leaky Bucket |

Summary

For Giselle’s PR, we adopted Fixed Window Counter. The reasons for selection are as follows:

  1. Simplicity - Easy to implement and understand
  2. Efficiency - Only 1 record per window needed
  3. Atomic operations - Avoids race conditions with DB upsert
  4. Sufficient accuracy - Acceptable for many use cases

While the boundary problem exists, we determined it to be acceptable for this use case. If stricter control becomes necessary, we’ll consider migrating to Sliding Window Counter or Token Bucket.

That’s all from the Gemba on rate limit algorithm comparison and case studies.