While implementing rate limiting for the Giselle API, I researched rate limiting algorithms.
Rate limiting is an essential mechanism for preventing excessive requests to a system and maintaining availability and security (such as DoS attack prevention). This article summarizes the characteristics and selection guidelines for representative algorithms.
| Algorithm | Characteristics | Burst Tolerance | Memory Efficiency | Accuracy | Implementation Difficulty |
|---|---|---|---|---|---|
| Fixed Window Counter | Simple, bursts possible at boundaries | △ | Best | Low | Low |
| Sliding Window Log | Accurate, high memory usage | × | Low | Best | Medium |
| Sliding Window Counter | Middle ground between Fixed + Sliding | ○ | High | High | Medium |
| Token Bucket | Burst tolerant, smoothing | ◎ | High | Medium | Medium |
| Leaky Bucket | Processes at constant rate | × | High | Medium | Medium |

(Legend: ◎ = excellent, ○ = good, △ = fair, × = poor)
The simplest method: divide time into fixed intervals (such as 1 minute) and count requests within each window.
```
Window 1 (12:00:00-12:00:59)    Window 2 (12:01:00-12:01:59)
┌─────────────────────────┐     ┌─────────────────────────┐
│ count: 45/60            │     │ count: 12/60            │
│ ████████████░░░░░░░░░   │     │ ████░░░░░░░░░░░░░░░░░   │
└─────────────────────────┘     └─────────────────────────┘
```
```
Limit: 60 requests per minute

 End of Window 1    Start of Window 2
        ↓                  ↓
────────────┼────────────┼────────────
   At 59th  │  At 1st
   second   │  second
60 requests │ 60 requests
            │
→ 120 requests possible in 2 seconds!
```
This is called the “boundary problem” or “burst at window edges” and is the biggest weakness of Fixed Window.
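The counting logic can be sketched in a few lines of TypeScript. This is an in-memory illustration with assumed limits (60 requests per 1-minute window); a real deployment would keep the counter in a shared store:

```typescript
// Minimal in-memory Fixed Window Counter (illustrative sketch).
const WINDOW_MS = 60_000; // 1-minute window (assumed value)
const LIMIT = 60;         // 60 requests per window (assumed value)

const counters = new Map<string, { windowStart: number; count: number }>();

function allowFixedWindow(key: string, now: number = Date.now()): boolean {
  // All timestamps inside the same interval map to the same window start
  const windowStart = Math.floor(now / WINDOW_MS) * WINDOW_MS;
  const entry = counters.get(key);
  if (!entry || entry.windowStart !== windowStart) {
    // New window: the count resets
    counters.set(key, { windowStart, count: 1 });
    return true;
  }
  if (entry.count >= LIMIT) return false;
  entry.count += 1;
  return true;
}
```

The boundary problem is visible directly: requests at the 59th second and the 61st second fall into different windows, so each side gets a fresh quota of LIMIT.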
GitHub API uses Fixed Window-based rate limiting. Details are available in the GitHub API official documentation.
You can check the remaining request count and reset time via response headers.
```
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
X-RateLimit-Reset: 1372700873
```
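Reading these headers from a `fetch` response is straightforward; here is a small hypothetical helper (the function name and interface are illustrative, not part of any GitHub SDK). Note that `X-RateLimit-Reset` is a Unix timestamp in seconds:

```typescript
// Hypothetical helper for GitHub-style rate-limit headers.
interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetAt: Date; // X-RateLimit-Reset is seconds since the epoch
}

function parseRateLimit(headers: Headers): RateLimitInfo {
  return {
    limit: Number(headers.get("X-RateLimit-Limit")),
    remaining: Number(headers.get("X-RateLimit-Remaining")),
    // Convert seconds to milliseconds for the Date constructor
    resetAt: new Date(Number(headers.get("X-RateLimit-Reset")) * 1000),
  };
}
```

A client can use `remaining` and `resetAt` to back off before ever hitting a 403/429.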
X API uses a 15-minute fixed window. Details are available in the X API official documentation.
Limits differ by authentication method: per-user for OAuth 1.0a, and per-app for OAuth 2.0 Bearer Token.
Slack API uses a Fixed Window (1 minute) rate limit with a Tier system. Details are available in the Slack API official documentation.
| Tier | Limit | Example Methods |
|---|---|---|
| Tier 1 | 1 request/min | Most restrictive |
| Tier 2 | 20 requests/min | files.upload |
| Tier 3 | 50 requests/min | General methods |
| Tier 4 | 100 requests/min | chat.postEphemeral |
Different Tiers are assigned to each method, so even if one method is rate limited, other methods remain available.
We adopted Fixed Window Counter in the PR “feat(api/apps/run): add team-scoped rate limiting to App Run API” (#2603).
```typescript
// 60-second fixed window
const WINDOW_SECONDS = 60;

// Key by window start time
const windowStart =
  Math.floor(Date.now() / 1000 / WINDOW_SECONDS) * WINDOW_SECONDS;

// Atomic counter increment (INSERT ... ON CONFLICT DO UPDATE)
```

A well-balanced choice between simplicity and performance.
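The atomic increment noted in that comment can be expressed as a single upsert, so concurrent requests never lose updates. A sketch with hypothetical table and column names (not the actual Giselle schema):

```sql
-- Hypothetical table: one row per (team, window start).
-- Names are illustrative, not the actual PR schema.
INSERT INTO app_run_rate_limits (team_id, window_start, count)
VALUES ($1, $2, 1)
ON CONFLICT (team_id, window_start)
DO UPDATE SET count = app_run_rate_limits.count + 1
RETURNING count;
```

If the returned count exceeds the limit, the request is rejected; rows for expired windows can be cleaned up periodically.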
A method that records a timestamp for every request and counts only those within the most recent window, which solves the boundary problem of Fixed Window.
Due to high memory consumption, Sliding Window Log is rarely adopted by large-scale services; most choose Sliding Window Counter or Token Bucket.
A typical implementation uses Redis Sorted Sets.
```
MULTI
ZREMRANGEBYSCORE $key 0 ($currentTime - $window)
ZADD $key $currentTime $uniqueRequestId
ZCARD $key
EXPIRE $key $window
EXEC
```
Atomicity via MULTI / EXEC prevents race conditions in distributed environments. For high-traffic environments, migration to Sliding Window Counter is recommended.
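The same bookkeeping can be shown as an in-memory sketch that mirrors each Redis command (assumed limits; a real deployment needs a shared store such as Redis precisely because in-process state isn't visible to other servers):

```typescript
// In-memory Sliding Window Log mirroring the Redis Sorted Set recipe.
const WINDOW_MS = 60_000; // 1-minute window (assumed value)
const LIMIT = 60;         // 60 requests per window (assumed value)

const requestLog = new Map<string, number[]>(); // key -> request timestamps (ms)

function allowRequestLog(key: string, now: number = Date.now()): boolean {
  // ZREMRANGEBYSCORE: drop timestamps that have left the window
  const fresh = (requestLog.get(key) ?? []).filter((t) => t > now - WINDOW_MS);
  // ZCARD: count what remains
  if (fresh.length >= LIMIT) {
    requestLog.set(key, fresh);
    return false;
  }
  fresh.push(now); // ZADD: record this request
  requestLog.set(key, fresh);
  return true;
}
```

The memory cost is visible here too: one stored timestamp per request per key, which is why this approach scales poorly at high traffic.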
A compromise combining the low overhead of “Fixed Window” with the accuracy of “Sliding Window Log.”
Calculate using a weighted average of the previous window and current window.
```
effective_count = current_window_count
                + previous_window_count × overlap_ratio
```
For example, if the window is 1 minute and the current time is 12:01:30, then 30 seconds of the current window have elapsed, so the overlap ratio is (60 − 30) / 60 = 0.5 and half of the previous window’s count is added to the estimate.
Cloudflare uses Sliding Window Counter to solve the boundary problem of Fixed Window. Detailed implementation is published on their official blog. Also refer to Cloudflare’s official documentation.
Formula:

```
rate = previous_count × ((window_size - elapsed) / window_size) + current_count
```
Example: With a limit of 50 requests/minute, if there were 42 requests in the previous minute and 18 requests in the current minute (15 seconds elapsed):
```
rate = 42 × ((60 - 15) / 60) + 18
     = 42 × 0.75 + 18
     = 49.5 requests
```
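The calculation is a one-liner in code; plugging in the numbers above reproduces the 49.5 figure:

```typescript
// Sliding Window Counter estimate: weight the previous window by how much
// of it still overlaps the sliding window, then add the current count.
function estimatedRate(
  previousCount: number,
  currentCount: number,
  elapsedSec: number,
  windowSec: number,
): number {
  return previousCount * ((windowSec - elapsedSec) / windowSec) + currentCount;
}

estimatedRate(42, 18, 15, 60); // → 49.5
```

Since 49.5 is below the limit of 50, the request in this example is allowed.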
According to Cloudflare’s analysis, they achieved high accuracy with an error rate of only 0.003% across 400 million requests.
Optimizations such as counter updates via the INCR command are also implemented.

The most commonly used algorithm, adopted by many services including Amazon API Gateway and Stripe.
```
Bucket capacity: 10 tokens
Refill rate:     1 token/second
→ 10 requests possible instantly, then 1 request/second
```
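That behavior can be sketched as a small class (an illustrative sketch, not any particular service's implementation; the capacity and refill values mirror the example above):

```typescript
// Minimal Token Bucket: tokens refill continuously up to capacity,
// and each request removes one token.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,     // e.g. 10 tokens
    private refillPerSec: number, // e.g. 1 token/second
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // the bucket starts full, allowing an initial burst
    this.lastRefill = now;
  }

  tryRemove(now: number = Date.now()): boolean {
    // Refill based on elapsed time, capped at capacity
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

Starting full is what makes the algorithm burst-tolerant: a quiet client accumulates capacity it can spend all at once.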
Stripe uses Token Bucket, with detailed implementation published on their official blog. Also refer to the Stripe API official documentation. Each user is assigned a bucket, achieving low latency using Redis.
Stripe actually uses 4 types of limiters in combination.
| Limiter | Purpose |
|---|---|
| Request Rate Limiter | n requests/second per user (burst tolerant) |
| Concurrent Requests Limiter | Limits concurrent requests (for CPU-intensive endpoints) |
| Fleet Usage Load Shedder | Rejects non-critical requests when infrastructure usage exceeds 90% |
| Worker Utilization Load Shedder | Classifies traffic into 4 categories per server for priority control |
The default limit is 25 requests/second.
Discord API uses Token Bucket-based rate limiting. Details are available in the Discord API official documentation.
You can check limit status via response headers.
```
X-RateLimit-Limit: 5
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1470173023.123
X-RateLimit-Bucket: abcd1234
```
A notable feature is the X-RateLimit-Bucket header, which allows grouping endpoints that share the same bucket. Additionally, independent limits are applied per resource (guild_id, channel_id, etc.), so being limited on one channel doesn’t affect other channels.
Similar to Token Bucket, but focuses on request processing rate.
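A meter-style Leaky Bucket (the variant Shopify's limits below resemble) can be sketched as follows. The level rises by one per request and drains at a constant leak rate; a full bucket rejects new requests. Capacity and leak values here are assumptions for illustration:

```typescript
// Leaky Bucket as a meter: requests fill the bucket, which leaks at a
// constant rate; when the bucket is full, requests are rejected.
class LeakyBucket {
  private level = 0;
  private lastLeak: number;

  constructor(
    private capacity: number,   // e.g. 40 requests
    private leakPerSec: number, // e.g. 2 requests/second
    now: number = Date.now(),
  ) {
    this.lastLeak = now;
  }

  tryAdd(now: number = Date.now()): boolean {
    // Drain the bucket according to elapsed time, never below empty
    const elapsedSec = (now - this.lastLeak) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakPerSec);
    this.lastLeak = now;
    if (this.level + 1 > this.capacity) return false;
    this.level += 1;
    return true;
  }
}
```

Unlike Token Bucket, sustained throughput is pinned to the leak rate, which is what makes Leaky Bucket attractive for protecting a downstream system with a fixed processing capacity.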
Shopify uses Leaky Bucket for all APIs. Details are available in the Shopify API official documentation.
REST Admin API limits:
| Plan | Bucket Size | Leak Rate |
|---|---|---|
| Standard | 40 requests/app/store | 2 requests/second |
| Shopify Plus | 400 requests/app/store | 20 requests/second |
You can check current usage via response headers.
```
X-Shopify-Shop-Api-Call-Limit: 32/40
```
For the GraphQL Admin API, limits are based on calculated query cost rather than request count. More complex queries consume more points.
Note that the Storefront API has no rate limits and is designed to handle sudden traffic increases such as flash sales.
In real systems, it’s common to combine multiple rate limits.
| Layer | Method |
|---|---|
| Edge / CDN | Fixed or Sliding Window |
| API Gateway | Token Bucket |
| Background | Leaky Bucket |
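Layering can be expressed as composing limiters: a request passes only if every layer allows it. A toy sketch (names and the in-memory layer are illustrative, not a real gateway API):

```typescript
// A limiter answers: may this key make a request right now?
type Limiter = (key: string, now: number) => boolean;

// Compose layers: the request must pass every one.
function combine(...layers: Limiter[]): Limiter {
  return (key, now) => layers.every((allow) => allow(key, now));
}

// Toy in-memory fixed-window layer for demonstration.
function fixedWindow(limitPerMin: number): Limiter {
  const counts = new Map<string, { start: number; n: number }>();
  return (key, now) => {
    const start = Math.floor(now / 60_000) * 60_000;
    const e = counts.get(key);
    if (!e || e.start !== start) {
      counts.set(key, { start, n: 1 });
      return true;
    }
    if (e.n >= limitPerMin) return false;
    e.n += 1;
    return true;
  };
}
```

Note that `every` short-circuits: a request rejected by an early layer never consumes quota in later layers. A real system must decide deliberately whether rejected requests should still count against downstream limits.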
Selection guidelines by requirement:

| Requirement | Recommended Algorithm |
|---|---|
| Implementation priority | Fixed Window Counter |
| Accuracy priority | Sliding Window Log |
| Balanced | Sliding Window Counter |
| UX priority | Token Bucket |
| Downstream protection | Leaky Bucket |
For Giselle’s PR, we adopted Fixed Window Counter. While the boundary problem exists, we determined it to be acceptable for this use case. If stricter control becomes necessary, we’ll consider migrating to Sliding Window Counter or Token Bucket.
That’s all from the Gemba on rate limit algorithm comparison and case studies.