While implementing rate limiting for the Giselle API, I researched rate limiting algorithms.
Rate limiting is an essential mechanism for preventing excessive requests to a system and maintaining availability and security (such as DoS attack prevention). This article summarizes the characteristics and selection guidelines for representative algorithms.
Algorithm Comparison Table
| Algorithm | Characteristics | Burst Tolerance | Memory Efficiency | Accuracy | Implementation Difficulty |
|---|---|---|---|---|---|
| Fixed Window Counter | Simple, bursts possible at boundaries | △ | Best | Low | Low |
| Sliding Window Log | Accurate, high memory usage | × | Low | Best | Medium |
| Sliding Window Counter | Middle ground between Fixed + Sliding | ○ | High | High | Medium |
| Token Bucket | Burst tolerant, smoothing | ◎ | High | Medium | Medium |
| Leaky Bucket | Processes at constant rate | × | High | Medium | Medium |

(Symbols: ◎ = excellent, ○ = good, △ = fair, × = poor)
1. Fixed Window Counter
The simplest method that divides time into fixed intervals (such as 1 minute) and counts within them.
```
Window 1 (12:00:00-12:00:59)    Window 2 (12:01:00-12:01:59)
┌─────────────────────────┐     ┌─────────────────────────┐
│ count: 45/60            │     │ count: 12/60            │
│ ████████████░░░░░░░░░   │     │ ████░░░░░░░░░░░░░░░░░   │
└─────────────────────────┘     └─────────────────────────┘
```
How It Works
- Reset the counter for each fixed time window such as “1 minute” or “1 hour”
- Example: 100 requests per minute
  - 12:00:00 to 12:00:59 → maximum 100 requests
  - Counter resets at 12:01:00
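The steps above can be sketched as a minimal in-memory limiter (an illustrative sketch; class and method names are my own, and a real deployment would keep the counter in shared storage such as Redis or a database):

```typescript
// Minimal in-memory Fixed Window Counter (illustrative sketch).
// Each window is identified by its start time; the counter resets
// implicitly whenever the window key changes.
class FixedWindowCounter {
  private windowStart = 0;
  private count = 0;

  constructor(
    private readonly limit: number,        // max requests per window
    private readonly windowSeconds: number // window length, e.g. 60
  ) {}

  /** Returns true if the request arriving at `nowSeconds` is allowed. */
  allow(nowSeconds: number): boolean {
    const start =
      Math.floor(nowSeconds / this.windowSeconds) * this.windowSeconds;
    if (start !== this.windowStart) {
      // A new window has begun: reset the counter.
      this.windowStart = start;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

// 100 requests per minute, as in the example above.
const limiter = new FixedWindowCounter(100, 60);
```

Note that all state is a single window key and a single counter, which is why this approach needs only one record per window.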
Advantages
- Very simple to implement
- Easy to implement with Redis / in-memory
- Efficient as it only requires 1 record per window
Disadvantage: Boundary Problem
```
          End of Window 1 │ Start of Window 2
                          ↓
──────────────────────────┼──────────────────────────
   60 requests in the     │     60 requests in the
   59th second            │     1st second

→ 120 requests possible in about 2 seconds!
```
This is called the “boundary problem” or “burst at window edges” and is the biggest weakness of Fixed Window.
Suitable Use Cases
- Admin panels
- APIs that can tolerate some bursts
- MVPs and initial implementations
Real-World Adoption Cases
GitHub API
GitHub API uses Fixed Window-based rate limiting. Details are available in the GitHub API official documentation.
- Unauthenticated: 60 requests/hour (per IP address)
- Personal Access Token: 5,000 requests/hour (per user)
- GitHub Apps (Enterprise Cloud): 15,000 requests/hour
You can check the remaining request count and reset time via response headers.
```
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
X-RateLimit-Reset: 1372700873
```
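A client can use these headers to decide how long to back off. The sketch below is illustrative (the function name and header handling are my own; only the header names come from GitHub's documented response):

```typescript
// Compute how many seconds a client should wait before retrying,
// based on X-RateLimit-* response headers (client-side sketch).
function secondsUntilReset(
  headers: Record<string, string>,
  nowEpochSeconds: number
): number {
  const remaining = Number(headers["X-RateLimit-Remaining"]);
  const reset = Number(headers["X-RateLimit-Reset"]); // Unix epoch seconds
  if (remaining > 0) return 0; // budget left, no need to wait
  return Math.max(0, reset - nowEpochSeconds);
}
```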
X (Twitter) API
X API uses a 15-minute fixed window. Details are available in the X API official documentation.
- Most endpoints: a set number of requests per 15-minute window (the exact number varies by endpoint)
- Some endpoints (such as posting tweets): 300 requests per 3 hours
Limits differ by authentication method: per-user for OAuth 1.0a, and per-app for OAuth 2.0 Bearer Token.
Slack API
Slack API uses a Fixed Window (1 minute) rate limit with a Tier system. Details are available in the Slack API official documentation.
| Tier | Limit | Example Methods |
|---|---|---|
| Tier 1 | 1 request/min | Most restrictive |
| Tier 2 | 20 requests/min | files.upload |
| Tier 3 | 50 requests/min | General methods |
| Tier 4 | 100 requests/min | chat.postEphemeral |
Different Tiers are assigned to each method, so even if one method is rate limited, other methods remain available.
Implementation Example at Giselle
We adopted Fixed Window Counter in PR #2603: feat(api/apps/run): add team-scoped rate limiting to App Run API.
```typescript
// 60-second fixed window
const WINDOW_SECONDS = 60;

// Key by window start time
const windowStart =
  Math.floor(Date.now() / 1000 / WINDOW_SECONDS) * WINDOW_SECONDS;

// Atomic counter increment (INSERT ... ON CONFLICT DO UPDATE)
```

A well-balanced choice between simplicity and performance.
2. Sliding Window Log
A method to solve the boundary problem of fixed windows.
How It Works
- Record timestamps for all requests
- Calculate “how many occurred in the last 60 seconds” each time
- When a new request arrives, delete logs older than “1 minute ago” from the current time
- Allow if the remaining log count is below the limit
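The steps above can be sketched as an in-memory log of timestamps (an illustrative sketch with names of my own; production implementations typically keep the log in Redis Sorted Sets, as shown later):

```typescript
// Minimal in-memory Sliding Window Log (illustrative sketch).
// Stores one timestamp per accepted request, so memory grows
// linearly with traffic -- the algorithm's main weakness.
class SlidingWindowLog {
  private log: number[] = []; // timestamps of accepted requests

  constructor(
    private readonly limit: number,
    private readonly windowSeconds: number
  ) {}

  allow(nowSeconds: number): boolean {
    // Drop entries older than the window.
    const cutoff = nowSeconds - this.windowSeconds;
    this.log = this.log.filter((t) => t > cutoff);
    // Allow only if the remaining log count is below the limit.
    if (this.log.length >= this.limit) return false;
    this.log.push(nowSeconds);
    return true;
  }
}
```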
Advantages
- Enables the most accurate rate limiting
- No boundary bursts
Disadvantages
- High memory consumption due to storing and calculating timestamps for all requests
- Not suitable for high traffic
Suitable Use Cases
- When high-precision limits are needed
- APIs with few users and requests
- Situations requiring strictness
Implementation Pattern
Due to high memory consumption, Sliding Window Log is rarely adopted by large-scale services; most choose Sliding Window Counter or Token Bucket.
A typical implementation uses Redis Sorted Sets.
```
MULTI
ZREMRANGEBYSCORE $key 0 ($currentTime - $window)
ZADD $key $currentTime $uniqueRequestId
ZCARD $key
EXPIRE $key $window
EXEC
```
Atomicity via MULTI / EXEC prevents race conditions in distributed environments. For high-traffic environments, migration to Sliding Window Counter is recommended.
3. Sliding Window Counter
A compromise combining the low overhead of “Fixed Window” with the accuracy of “Sliding Window Log.”
How It Works
Calculate using a weighted average of the previous window and current window.
```
effective_count = current_window_count
                + previous_window_count × overlap_ratio
```
For example, if the window is 1 minute and the current time is 12:01:30 (30 seconds into the current window, so the remaining overlap ratio is 30/60 = 0.5):
- Previous window (12:00:00-12:00:59) count × 0.5
- Current window (12:01:00-12:01:59) count
Advantages
- Smoother than Fixed Window
- Lighter than Log method
- Boundary problem is also mitigated
Disadvantages
- Not completely accurate (approximation)
Suitable Use Cases
- Many API Gateways
- When balancing accuracy and performance
Real-World Adoption Cases
Cloudflare
Cloudflare uses Sliding Window Counter to solve the boundary problem of Fixed Window. Detailed implementation is published on their official blog. Also refer to Cloudflare’s official documentation.
Formula:

```
rate = previous_count × ((window_size - elapsed) / window_size) + current_count
```

Example: With a limit of 50 requests/minute, if there were 42 requests in the previous minute and 18 requests in the current minute (15 seconds elapsed):

```
rate = 42 × ((60 - 15) / 60) + 18
     = 42 × 0.75 + 18
     = 49.5 requests
```
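The formula is a one-liner in code (an illustrative sketch; the function and parameter names are my own):

```typescript
// Sliding Window Counter estimate, following the formula above:
// the previous window's count is weighted by how much of it still
// overlaps the sliding window.
function slidingWindowRate(
  previousCount: number,
  currentCount: number,
  elapsedInWindow: number, // seconds elapsed in the current window
  windowSize: number       // window length in seconds
): number {
  const overlap = (windowSize - elapsedInWindow) / windowSize;
  return previousCount * overlap + currentCount;
}
```

With the numbers from the example, `slidingWindowRate(42, 18, 15, 60)` yields 49.5, just under the limit of 50, so the request would be allowed.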
According to Cloudflare’s analysis, they achieved high accuracy with an error rate of only 0.003% across 400 million requests.
The following optimizations are also implemented:
- Minimal storage: Only 2 values per counter
- Atomic operations: Implemented with a single INCR command
- Asynchronous counting: Does not delay legitimate requests
4. Token Bucket
One of the most widely used algorithms, adopted by many services including Amazon API Gateway and Stripe.
How It Works
- Tokens are replenished at a constant rate into a “bucket” with a fixed capacity
- Each request consumes one token from the bucket
- If the bucket is empty, the request is rejected
```
Bucket capacity: 10 tokens
Refill rate: 1 token/second
→ 10 requests possible instantly, then 1 request/second
```
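The mechanism can be sketched as follows (an illustrative sketch with names of my own; real deployments must synchronize the bucket state across servers, e.g. via Redis):

```typescript
// Minimal Token Bucket (illustrative sketch). Tokens are refilled
// lazily on each call based on elapsed time, capped at capacity.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,        // max burst size
    private readonly refillPerSecond: number, // steady-state rate
    nowSeconds = 0
  ) {
    this.tokens = capacity; // bucket starts full, allowing a burst
    this.lastRefill = nowSeconds;
  }

  allow(nowSeconds: number): boolean {
    // Replenish tokens at a constant rate, capped at capacity.
    const elapsed = nowSeconds - this.lastRefill;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSecond
    );
    this.lastRefill = nowSeconds;
    if (this.tokens < 1) return false; // bucket empty: reject
    this.tokens -= 1; // each request consumes one token
    return true;
  }
}
```

Starting with a full bucket is what makes the algorithm burst-tolerant: a quiet client can spend its whole capacity at once, while the long-run rate stays bounded by the refill rate.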
Advantages
- Can allow short bursts while limiting average rate
- Compatible with real-world API limits
- Prioritizes user experience
Disadvantages
- Somewhat complex to implement
- Requires synchronization in distributed environments
Suitable Use Cases
- Public APIs
- APIs prioritizing user experience
- Stripe / AWS API style
Real-World Adoption Cases
Stripe
Stripe uses Token Bucket, with detailed implementation published on their official blog. Also refer to the Stripe API official documentation. Each user is assigned a bucket, achieving low latency using Redis.
Stripe actually uses 4 types of limiters in combination.
| Limiter | Purpose |
|---|---|
| Request Rate Limiter | n requests/second per user (burst tolerant) |
| Concurrent Requests Limiter | Limits concurrent requests (for CPU-intensive endpoints) |
| Fleet Usage Load Shedder | Rejects non-critical requests when infrastructure usage exceeds 90% |
| Worker Utilization Load Shedder | Classifies traffic into 4 categories per server for priority control |
The default limit is 25 requests/second.
Discord
Discord API uses Token Bucket-based rate limiting. Details are available in the Discord API official documentation.
- Global limit: 50 requests/second (total across all endpoints)
- Per-route limits: Different limits per endpoint
You can check limit status via response headers.
```
X-RateLimit-Limit: 5
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1470173023.123
X-RateLimit-Bucket: abcd1234
```
A notable feature is the X-RateLimit-Bucket header, which allows grouping endpoints that share the same bucket. Additionally, independent limits are applied per resource (guild_id, channel_id, etc.), so being limited on one channel doesn’t affect other channels.
5. Leaky Bucket
Similar to Token Bucket, but focuses on request processing rate.
How It Works
- Imagine a bucket with a hole at the bottom
- Requests enter the bucket and are processed at a constant rate (water leaking from the hole)
- If the bucket overflows, new requests are discarded
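A common variant models the bucket as a water level that drains at the leak rate (an illustrative sketch with names of my own; a queue-based variant would buffer requests instead of rejecting them):

```typescript
// Minimal Leaky Bucket (illustrative sketch, "bucket as meter" variant).
// The level rises with each accepted request and drains at a constant
// leak rate; requests that would overflow the bucket are dropped.
class LeakyBucket {
  private level = 0;    // requests currently "in the bucket"
  private lastLeak = 0;

  constructor(
    private readonly capacity: number,      // bucket size
    private readonly leakPerSecond: number  // constant processing rate
  ) {}

  allow(nowSeconds: number): boolean {
    // Drain the bucket at a constant rate since the last check.
    const elapsed = nowSeconds - this.lastLeak;
    this.level = Math.max(0, this.level - elapsed * this.leakPerSecond);
    this.lastLeak = nowSeconds;
    if (this.level + 1 > this.capacity) return false; // overflow: drop
    this.level += 1;
    return true;
  }
}
```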
Advantages
- Can keep output (processing) at a constant pace
- Reliably protects downstream systems
- Suited for traffic smoothing
Disadvantages
- Bursts are immediately delayed or dropped
- Increased latency
Suitable Use Cases
- Backend protection
- Asynchronous job queues
- Network traffic control
Real-World Adoption Cases
Shopify
Shopify uses Leaky Bucket for its Admin APIs. Details are available in the Shopify API official documentation.
REST Admin API limits:
| Plan | Bucket Size | Leak Rate |
|---|---|---|
| Standard | 40 requests/app/store | 2 requests/second |
| Shopify Plus | 400 requests/app/store | 20 requests/second |
You can check current usage via response headers.
```
X-Shopify-Shop-Api-Call-Limit: 32/40
```
For the GraphQL Admin API, limits are based on calculated query cost rather than request count. More complex queries consume more points.
Note that the Storefront API has no rate limits and is designed to handle sudden traffic increases such as flash sales.
Common Combinations in Practice
In real systems, it’s common to combine multiple rate limits.
| Layer | Method |
|---|---|
| Edge / CDN | Fixed or Sliding Window |
| API Gateway | Token Bucket |
| Background | Leaky Bucket |
Which One Should You Choose? (Selection Guidelines)
| Requirement | Recommended Algorithm |
|---|---|
| Implementation priority | Fixed Window Counter |
| Accuracy priority | Sliding Window Log |
| Balanced | Sliding Window Counter |
| UX priority | Token Bucket |
| Downstream protection | Leaky Bucket |
Summary
For Giselle’s PR, we adopted Fixed Window Counter. The reasons for selection are as follows:
- Simplicity - Easy to implement and understand
- Efficiency - Only 1 record per window needed
- Atomic operations - Avoids race conditions with DB upsert
- Sufficient accuracy - Acceptable for many use cases
While the boundary problem exists, we determined it to be acceptable for this use case. If stricter control becomes necessary, we’ll consider migrating to Sliding Window Counter or Token Bucket.
That's all from the field on rate limit algorithm comparison and case studies.