While implementing rate limiting for the Giselle API, I researched rate limiting algorithms.
Rate limiting is an essential mechanism for preventing excessive requests to a system and maintaining availability and security (such as DoS attack prevention). This article summarizes the characteristics and selection guidelines for representative algorithms.
| Algorithm | Characteristics | Burst Tolerance | Memory Efficiency | Accuracy | Implementation Difficulty |
|---|---|---|---|---|---|
| Fixed Window Counter | Simple, bursts possible at boundaries | △ | Best | Low | Low |
| Sliding Window Log | Accurate, high memory usage | × | Low | Best | Medium |
| Sliding Window Counter | Middle ground between Fixed + Sliding | ○ | High | High | Medium |
| Token Bucket | Burst tolerant, smoothing | ◎ | High | Medium | Medium |
| Leaky Bucket | Processes at constant rate | × | High | Medium | Medium |

(Legend: ◎ = excellent, ○ = good, △ = fair, × = poor)
The simplest method: divide time into fixed intervals (such as 1 minute) and count requests within each window.
```
Window 1 (12:00:00-12:00:59)    Window 2 (12:01:00-12:01:59)
┌─────────────────────────┐     ┌─────────────────────────┐
│ count: 45/60            │     │ count: 12/60            │
│ ████████████░░░░░░░░░   │     │ ████░░░░░░░░░░░░░░░░░   │
└─────────────────────────┘     └─────────────────────────┘
```
```
Limit: 60 requests per minute

 End of Window 1    Start of Window 2
        ↓                  ↓
────────────┼────────────┼────────────
   At 59th  │  At 1st
   second   │  second
60 requests │ 60 requests
            │
→ 120 requests possible in 2 seconds!
```
This is called the “boundary problem” or “burst at window edges” and is the biggest weakness of Fixed Window.
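The counting logic can be sketched in a few lines of TypeScript. This is an in-memory illustration with assumed limits (60 requests per 1-minute window); a real deployment would keep the counter in a shared store:

```typescript
// Minimal in-memory Fixed Window Counter (illustrative sketch).
const WINDOW_MS = 60_000; // 1-minute window (assumed value)
const LIMIT = 60;         // 60 requests per window (assumed value)

const counters = new Map<string, { windowStart: number; count: number }>();

function allowFixedWindow(key: string, now: number = Date.now()): boolean {
  // All timestamps inside the same interval map to the same window start
  const windowStart = Math.floor(now / WINDOW_MS) * WINDOW_MS;
  const entry = counters.get(key);
  if (!entry || entry.windowStart !== windowStart) {
    // New window: the count resets
    counters.set(key, { windowStart, count: 1 });
    return true;
  }
  if (entry.count >= LIMIT) return false;
  entry.count += 1;
  return true;
}
```

The boundary problem is visible directly: requests at the 59th second and the 61st second fall into different windows, so each side gets a fresh quota of LIMIT.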
GitHub API uses Fixed Window-based rate limiting. Details are available in the GitHub API official documentation.
You can check the remaining request count and reset time via response headers.
```
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
X-RateLimit-Reset: 1372700873
```
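Reading these headers from a `fetch` response is straightforward; here is a small hypothetical helper (the function name and interface are illustrative, not part of any GitHub SDK). Note that `X-RateLimit-Reset` is a Unix timestamp in seconds:

```typescript
// Hypothetical helper for GitHub-style rate-limit headers.
interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetAt: Date; // X-RateLimit-Reset is seconds since the epoch
}

function parseRateLimit(headers: Headers): RateLimitInfo {
  return {
    limit: Number(headers.get("X-RateLimit-Limit")),
    remaining: Number(headers.get("X-RateLimit-Remaining")),
    // Convert seconds to milliseconds for the Date constructor
    resetAt: new Date(Number(headers.get("X-RateLimit-Reset")) * 1000),
  };
}
```

A client can use `remaining` and `resetAt` to back off before ever hitting a 403/429.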
X API uses a 15-minute fixed window. Details are available in the X API official documentation.
Limits differ by authentication method: per-user for OAuth 1.0a, and per-app for OAuth 2.0 Bearer Token.
Slack API uses a Fixed Window (1 minute) rate limit with a Tier system. Details are available in the Slack API official documentation.
| Tier | Limit | Example Methods |
|---|---|---|
| Tier 1 | 1 request/min | Most restrictive |
| Tier 2 | 20 requests/min | files.upload |
| Tier 3 | 50 requests/min | General methods |
| Tier 4 | 100 requests/min | chat.postEphemeral |
Different Tiers are assigned to each method, so even if one method is rate limited, other methods remain available.
We adopted Fixed Window Counter in the PR “feat(api/apps/run): add team-scoped rate limiting to App Run API” (#2603).
```typescript
// 60-second fixed window
const WINDOW_SECONDS = 60;

// Key by window start time
const windowStart =
  Math.floor(Date.now() / 1000 / WINDOW_SECONDS) * WINDOW_SECONDS;

// Atomic counter increment (INSERT ... ON CONFLICT DO UPDATE)
```

A well-balanced choice between simplicity and performance.
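The atomic increment noted in that comment can be expressed as a single upsert, so concurrent requests never lose updates. A sketch with hypothetical table and column names (not the actual Giselle schema):

```sql
-- Hypothetical table: one row per (team, window start).
-- Names are illustrative, not the actual PR schema.
INSERT INTO app_run_rate_limits (team_id, window_start, count)
VALUES ($1, $2, 1)
ON CONFLICT (team_id, window_start)
DO UPDATE SET count = app_run_rate_limits.count + 1
RETURNING count;
```

If the returned count exceeds the limit, the request is rejected; rows for expired windows can be cleaned up periodically.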
A method that records a timestamp for every request and counts only those within the most recent window, which solves the boundary problem of Fixed Window.
Due to high memory consumption, Sliding Window Log is rarely adopted by large-scale services; most choose Sliding Window Counter or Token Bucket.
A typical implementation uses Redis Sorted Sets.
```
MULTI
ZREMRANGEBYSCORE $key 0 ($currentTime - $window)
ZADD $key $currentTime $uniqueRequestId
ZCARD $key
EXPIRE $key $window
EXEC
```
Atomicity via MULTI / EXEC prevents race conditions in distributed environments. For high-traffic environments, migration to Sliding Window Counter is recommended.
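The same bookkeeping can be shown as an in-memory sketch that mirrors each Redis command (assumed limits; a real deployment needs a shared store such as Redis precisely because in-process state isn't visible to other servers):

```typescript
// In-memory Sliding Window Log mirroring the Redis Sorted Set recipe.
const WINDOW_MS = 60_000; // 1-minute window (assumed value)
const LIMIT = 60;         // 60 requests per window (assumed value)

const requestLog = new Map<string, number[]>(); // key -> request timestamps (ms)

function allowRequestLog(key: string, now: number = Date.now()): boolean {
  // ZREMRANGEBYSCORE: drop timestamps that have left the window
  const fresh = (requestLog.get(key) ?? []).filter((t) => t > now - WINDOW_MS);
  // ZCARD: count what remains
  if (fresh.length >= LIMIT) {
    requestLog.set(key, fresh);
    return false;
  }
  fresh.push(now); // ZADD: record this request
  requestLog.set(key, fresh);
  return true;
}
```

The memory cost is visible here too: one stored timestamp per request per key, which is why this approach scales poorly at high traffic.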
A compromise combining the low overhead of “Fixed Window” with the accuracy of “Sliding Window Log.”
Calculate using a weighted average of the previous window and current window.
```
effective_count = current_window_count
                + previous_window_count × overlap_ratio
```
For example, if the window is 1 minute and the current time is 12:01:30, then 30 seconds of the current window have elapsed, so the overlap ratio is (60 − 30) / 60 = 0.5 and half of the previous window’s count is added to the estimate.
Cloudflare uses Sliding Window Counter to solve the boundary problem of Fixed Window. Detailed implementation is published on their official blog. Also refer to Cloudflare’s official documentation.
Formula:

```
rate = previous_count × ((window_size - elapsed) / window_size) + current_count
```
Example: With a limit of 50 requests/minute, if there were 42 requests in the previous minute and 18 requests in the current minute (15 seconds elapsed):
```
rate = 42 × ((60 - 15) / 60) + 18
     = 42 × 0.75 + 18
     = 49.5 requests
```
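The calculation is a one-liner in code; plugging in the numbers above reproduces the 49.5 figure:

```typescript
// Sliding Window Counter estimate: weight the previous window by how much
// of it still overlaps the sliding window, then add the current count.
function estimatedRate(
  previousCount: number,
  currentCount: number,
  elapsedSec: number,
  windowSec: number,
): number {
  return previousCount * ((windowSec - elapsedSec) / windowSec) + currentCount;
}

estimatedRate(42, 18, 15, 60); // → 49.5
```

Since 49.5 is below the limit of 50, the request in this example is allowed.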
According to Cloudflare’s analysis, they achieved high accuracy with an error rate of only 0.003% across 400 million requests.
Optimizations such as counter updates via the INCR command are also implemented.

The most commonly used algorithm, adopted by many services including Amazon API Gateway and Stripe.
```
Bucket capacity: 10 tokens
Refill rate:     1 token/second
→ 10 requests possible instantly, then 1 request/second
```
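That behavior can be sketched as a small class (an illustrative sketch, not any particular service's implementation; the capacity and refill values mirror the example above):

```typescript
// Minimal Token Bucket: tokens refill continuously up to capacity,
// and each request removes one token.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,     // e.g. 10 tokens
    private refillPerSec: number, // e.g. 1 token/second
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // the bucket starts full, allowing an initial burst
    this.lastRefill = now;
  }

  tryRemove(now: number = Date.now()): boolean {
    // Refill based on elapsed time, capped at capacity
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

Starting full is what makes the algorithm burst-tolerant: a quiet client accumulates capacity it can spend all at once.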
Stripe uses Token Bucket, with detailed implementation published on their official blog. Also refer to the Stripe API official documentation. Each user is assigned a bucket, achieving low latency using Redis.
Stripe actually uses 4 types of limiters in combination.
| Limiter | Purpose |
|---|---|
| Request Rate Limiter | n requests/second per user (burst tolerant) |
| Concurrent Requests Limiter | Limits concurrent requests (for CPU-intensive endpoints) |
| Fleet Usage Load Shedder | Rejects non-critical requests when infrastructure usage exceeds 90% |
| Worker Utilization Load Shedder | Classifies traffic into 4 categories per server for priority control |
The default limit is 25 requests/second.
Discord API uses Token Bucket-based rate limiting. Details are available in the Discord API official documentation.
You can check limit status via response headers.
```
X-RateLimit-Limit: 5
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1470173023.123
X-RateLimit-Bucket: abcd1234
```
A notable feature is the X-RateLimit-Bucket header, which allows grouping endpoints that share the same bucket. Additionally, independent limits are applied per resource (guild_id, channel_id, etc.), so being limited on one channel doesn’t affect other channels.
Similar to Token Bucket, but focuses on request processing rate.
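A meter-style Leaky Bucket (the variant Shopify's limits below resemble) can be sketched as follows. The level rises by one per request and drains at a constant leak rate; a full bucket rejects new requests. Capacity and leak values here are assumptions for illustration:

```typescript
// Leaky Bucket as a meter: requests fill the bucket, which leaks at a
// constant rate; when the bucket is full, requests are rejected.
class LeakyBucket {
  private level = 0;
  private lastLeak: number;

  constructor(
    private capacity: number,   // e.g. 40 requests
    private leakPerSec: number, // e.g. 2 requests/second
    now: number = Date.now(),
  ) {
    this.lastLeak = now;
  }

  tryAdd(now: number = Date.now()): boolean {
    // Drain the bucket according to elapsed time, never below empty
    const elapsedSec = (now - this.lastLeak) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakPerSec);
    this.lastLeak = now;
    if (this.level + 1 > this.capacity) return false;
    this.level += 1;
    return true;
  }
}
```

Unlike Token Bucket, sustained throughput is pinned to the leak rate, which is what makes Leaky Bucket attractive for protecting a downstream system with a fixed processing capacity.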
Shopify uses Leaky Bucket for all APIs. Details are available in the Shopify API official documentation.
REST Admin API limits:
| Plan | Bucket Size | Leak Rate |
|---|---|---|
| Standard | 40 requests/app/store | 2 requests/second |
| Shopify Plus | 400 requests/app/store | 20 requests/second |
You can check current usage via response headers.
```
X-Shopify-Shop-Api-Call-Limit: 32/40
```
For the GraphQL Admin API, limits are based on calculated query cost rather than request count. More complex queries consume more points.
Note that the Storefront API has no rate limits and is designed to handle sudden traffic increases such as flash sales.
In real systems, it’s common to combine multiple rate limits.
| Layer | Method |
|---|---|
| Edge / CDN | Fixed or Sliding Window |
| API Gateway | Token Bucket |
| Background | Leaky Bucket |
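Layering can be expressed as composing limiters: a request passes only if every layer allows it. A toy sketch (names and the in-memory layer are illustrative, not a real gateway API):

```typescript
// A limiter answers: may this key make a request right now?
type Limiter = (key: string, now: number) => boolean;

// Compose layers: the request must pass every one.
function combine(...layers: Limiter[]): Limiter {
  return (key, now) => layers.every((allow) => allow(key, now));
}

// Toy in-memory fixed-window layer for demonstration.
function fixedWindow(limitPerMin: number): Limiter {
  const counts = new Map<string, { start: number; n: number }>();
  return (key, now) => {
    const start = Math.floor(now / 60_000) * 60_000;
    const e = counts.get(key);
    if (!e || e.start !== start) {
      counts.set(key, { start, n: 1 });
      return true;
    }
    if (e.n >= limitPerMin) return false;
    e.n += 1;
    return true;
  };
}
```

Note that `every` short-circuits: a request rejected by an early layer never consumes quota in later layers. A real system must decide deliberately whether rejected requests should still count against downstream limits.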
Selection guidelines by requirement:

| Requirement | Recommended Algorithm |
|---|---|
| Implementation priority | Fixed Window Counter |
| Accuracy priority | Sliding Window Log |
| Balanced | Sliding Window Counter |
| UX priority | Token Bucket |
| Downstream protection | Leaky Bucket |
For Giselle’s PR, we adopted Fixed Window Counter. While the boundary problem exists, we determined it to be acceptable for this use case. If stricter control becomes necessary, we’ll consider migrating to Sliding Window Counter or Token Bucket.
That’s all from the Gemba on rate limit algorithm comparison and case studies.