Rate Limiting Algorithms
- Chapter 4 of System Design Interview – An Insider's Guide: algorithms for rate limiting
Rate limiting can be implemented using different algorithms, and each of them has distinct pros and cons.
Token Bucket Algorithm for Rate Limiting
- Widely used for rate limiting.
- Commonly implemented by companies like Amazon and Stripe.
How it Works:
- Token Bucket:
- A container with a pre-defined capacity.
- Tokens are refilled periodically at a preset rate.
- Once the bucket is full, no additional tokens are added; extra tokens overflow.
- Request Handling:
- Each request consumes one token.
- If enough tokens are available, one token is removed per request, and the request proceeds.
- If there are not enough tokens, the request is dropped.
Parameters:
- Bucket Size: Maximum number of tokens allowed.
- Refill Rate: Number of tokens added every second.
Bucket Requirements:
- Per Endpoint:
- Different API endpoints may need separate buckets (e.g., 1 post/sec, 150 friends/day, 5 likes/sec per user).
- Per IP Address:
- Each IP address may require an individual bucket.
- Global Bucket:
- For systems allowing a maximum number of requests (e.g., 10,000 requests/sec), a global bucket may be shared.
Pros:
- Simple to implement.
- Memory efficient.
- Allows short bursts of traffic if tokens are available.
Cons:
- Tuning the bucket size and refill rate can be challenging.
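The mechanics above can be sketched in a few lines of Python (class and parameter names are my own; a production limiter would also need per-key buckets and thread safety):

```python
import time

class TokenBucket:
    """Sketch of a token bucket: tokens accumulate at refill_rate per
    second, capped at capacity; each request consumes one token."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity            # bucket size (max tokens)
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = float(capacity)       # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill lazily based on elapsed time; overflow is discarded.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                # one token per request
            return True
        return False                        # not enough tokens: drop the request
```

Because the bucket starts full, a burst of up to `capacity` requests passes immediately, which matches the "allows short bursts" behavior listed in the pros.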
Leaking Bucket Algorithm for Rate Limiting
- Overview:
- Similar to the token bucket algorithm but processes requests at a fixed rate.
- Typically implemented with a first-in-first-out (FIFO) queue.
How it Works:
- Request Arrival:
- Check if the queue is full.
- If the queue is not full, add the request to the queue.
- If the queue is full, drop the request.
- Request Processing:
- Requests are pulled from the queue and processed at regular intervals.
Parameters:
- Bucket Size: Equals the queue size; holds requests to be processed at a fixed rate.
- Outflow Rate: The number of requests processed per unit time, usually per second.
Pros:
- Memory efficient due to the limited queue size.
- Processes requests at a fixed rate, making it suitable for scenarios requiring a stable outflow rate.
Cons:
- Burst traffic can fill the queue with old requests, causing recent requests to be rate-limited if not processed in time.
- Tuning the bucket size and outflow rate can be challenging.
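A minimal Python sketch of the admission side of the algorithm (names are my own; the fixed-rate drain is simulated lazily here, whereas a real implementation would have a worker pulling requests from the queue at regular intervals):

```python
import time
from collections import deque

class LeakingBucket:
    """Sketch of a leaking bucket: requests join a FIFO queue of at most
    bucket_size entries and leak out at outflow_rate per second."""

    def __init__(self, bucket_size, outflow_rate):
        self.bucket_size = bucket_size      # queue capacity
        self.outflow_rate = outflow_rate    # requests processed per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Drain: remove as many queued requests as the elapsed time allows.
        leaked = int((now - self.last_leak) * self.outflow_rate)
        for _ in range(min(leaked, len(self.queue))):
            self.queue.popleft()
        if leaked:
            self.last_leak += leaked / self.outflow_rate  # keep fractional credit
        if len(self.queue) < self.bucket_size:
            self.queue.append(now)          # queue not full: enqueue the request
            return True
        return False                        # queue full: drop the request
```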
Fixed Window Counter Algorithm for Rate Limiting
How It Works:
- Timeline Division:
- Divides the timeline into fixed-sized time windows.
- Assigns a counter for each window.
- Request Handling:
- Each request increments the counter by one.
- If the counter reaches the predefined threshold, new requests are dropped until a new time window starts.
Example:
- Scenario: Time unit is 1 second, with a maximum of 3 requests per second.
- Behavior: In each 1-second window, if more than 3 requests are received, extra requests are dropped.
Major Problem:
- Traffic Bursts at Window Edges:
- A burst of traffic at the edges of time windows can result in more requests than the allowed quota going through.
- Example:
- System allows a maximum of 5 requests per minute.
- Quota resets at the start of each minute.
- Issue: If five requests arrive between 2:00:30 and 2:01:00, and another five between 2:01:00 and 2:01:30, then 10 requests pass within the rolling one-minute window from 2:00:30 to 2:01:30, twice the allowed quota.
Pros:
- Memory efficient.
- Easy to understand.
- Resetting quota at the end of each time window suits certain use cases.
Cons:
- Traffic spikes at window edges can cause more requests than the allowed quota to go through.
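The window-and-counter logic above can be sketched as follows (names are my own; `now` is injectable so the per-window reset is easy to demonstrate):

```python
import time

class FixedWindowCounter:
    """Sketch of a fixed window counter: the timeline is divided into
    windows of window_seconds; each window gets its own counter."""

    def __init__(self, threshold, window_seconds=1.0):
        self.threshold = threshold            # max requests per window
        self.window_seconds = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        window = int(now // self.window_seconds)   # index of this window
        if window != self.current_window:
            self.current_window = window           # new window: reset counter
            self.count = 0
        if self.count < self.threshold:
            self.count += 1
            return True
        return False                               # threshold reached: drop
```

Note that with `threshold=5, window_seconds=60`, five requests just before a minute boundary and five just after would all pass, which is exactly the window-edge problem described in the example above.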
Sliding Window Log Algorithm
- Overview:
- Addresses the edge-case issue of the fixed window counter algorithm.
- Keeps track of request timestamps, usually in a cache like Redis sorted sets.
How It Works:
- Track Timestamps:
- Logs each request’s timestamp.
- Removes outdated timestamps (older than the current time window).
- Request Handling:
- Add the new request’s timestamp to the log.
- Accept the request if the log size is within the allowed count; otherwise, reject it.
Example:
- Scenario: Rate limiter allows 2 requests per minute.
- 1:00:01: Log is empty; timestamp added, log size = 1, request allowed.
- 1:00:30: Timestamp added, log size = 2, request allowed.
- 1:00:50: Timestamp added, log size = 3 (exceeds limit), request rejected.
- 1:01:40: Outdated timestamps (1:00:01, 1:00:30) removed. Log size = 2, request allowed.
Pros:
- Highly accurate rate limiting within any rolling window.
Cons:
- High memory consumption as timestamps are stored even for rejected requests.
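The example above can be replayed with a small Python sketch (names are my own; timestamps are in seconds, and a deque stands in for the Redis sorted set). Note that the timestamp is stored even when the request is rejected, which is the memory cost noted in the cons:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sketch of a sliding window log: keeps one timestamp per request
    and allows at most max_requests within any rolling window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.log = deque()                  # ordered timestamps

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Remove timestamps that fell out of the rolling window.
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        self.log.append(now)                # stored even if rejected
        return len(self.log) <= self.max_requests
```

Replaying the example (2 requests per minute, times in seconds): requests at 1, 30, 50, and 100 yield allowed, allowed, rejected, allowed.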
Sliding Window Counter Algorithm
- Overview:
- Hybrid approach combining fixed window counter and sliding window log algorithms.
- Provides smoother rate limiting by factoring in a weighted count from the previous window.
Pros:
- Smooths out traffic spikes by averaging the previous window’s rate.
- Memory efficient.
Cons:
- Only works for non-strict look-back windows; the rate is an approximation because it assumes requests were evenly distributed across the previous window.
- In practice the approximation is close: experiments by Cloudflare found only 0.003% of 400 million requests were wrongly allowed or rate-limited.
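A common formulation estimates the rolling count as: requests in the current window, plus requests in the previous window weighted by how much of the previous window the rolling window still overlaps. A Python sketch under that assumption (names are my own):

```python
import time

class SlidingWindowCounter:
    """Sketch of a sliding window counter: approximates the rolling count
    as current-window count plus the previous-window count weighted by
    the rolling window's overlap with the previous window."""

    def __init__(self, max_requests, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.current_window = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # Roll over; if more than one window elapsed, previous count is 0.
            self.previous_count = (self.current_count
                                   if window == self.current_window + 1 else 0)
            self.current_count = 0
            self.current_window = window
        # Fraction of the previous window still inside the rolling window.
        overlap = 1.0 - (now % self.window_seconds) / self.window_seconds
        estimated = self.current_count + self.previous_count * overlap
        if estimated < self.max_requests:
            self.current_count += 1
            return True
        return False    # estimated rolling count would exceed the limit
```

The even-distribution assumption is visible in the `overlap` term: only two counters per key are stored, which is why this variant is memory-efficient compared with the sliding window log.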