Day 9/21: System Design (API Rate Limiting and Throttling)
In large-scale systems, API rate limiting and throttling are critical for preventing abuse, ensuring fair usage, and maintaining system stability. Without these controls, a single user or bot could overload servers, leading to degraded performance or downtime.
Why Is API Rate Limiting Important?
- Prevents abuse: Stops excessive requests from users, bots, or attackers.
- Protects server resources: Prevents overloading and keeps response times fast.
- Ensures fair usage: Distributes API access fairly among users.
- Avoids service crashes: Helps manage high traffic and DDoS attacks.
Example:
- Twitter API limits the number of tweets a user can post per hour to prevent spam.
- GitHub API limits requests to avoid overloading its infrastructure.
Types of Rate Limiting Strategies
1. Token Bucket Algorithm (Most Common)
- Each user gets a fixed number of tokens.
- Every request uses up a token.
- Tokens refill at a fixed rate (e.g., 10 tokens per second).
- If tokens run out, requests are blocked or delayed.
Example:
- Cloudflare uses a token bucket to manage API access.
- If you make too many requests too quickly, you have to wait for tokens to refill.
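The refill-and-spend logic above can be sketched in a few lines of Python (the capacity and refill rate below are illustrative, not any real provider's limits):

```python
import time

class TokenBucket:
    """Token bucket: each request spends one token; tokens refill at a fixed rate."""
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum tokens the bucket holds
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last check.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request allowed
        return False      # out of tokens: block or delay until refill

bucket = TokenBucket(capacity=3, refill_rate=1)  # refills 1 token per second
results = [bucket.allow() for _ in range(5)]     # five back-to-back requests
```

Because the bucket can hold up to `capacity` tokens, this algorithm naturally allows short bursts while still enforcing the average rate.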
2. Leaky Bucket Algorithm
- Works like a water bucket with a small hole at the bottom.
- Requests enter the bucket, and only a fixed number of requests “leak” out per second.
- If the bucket overflows, extra requests are dropped.
Example:
- Used in network traffic shaping to ensure smooth data flow.
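A minimal sketch of the leaky bucket as a bounded queue that drains at a constant rate (capacity and leak rate are illustrative):

```python
import time
from collections import deque

class LeakyBucket:
    """Leaky bucket: requests queue up and drain at a constant rate;
    arrivals that would overflow the bucket are dropped."""
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity      # maximum queued requests
        self.leak_rate = leak_rate    # requests drained per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def offer(self, request):
        now = time.monotonic()
        # Drain however many requests "leaked out" since the last check.
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True   # queued for processing
        return False      # bucket overflowed; request dropped

bucket = LeakyBucket(capacity=3, leak_rate=1)
accepted = [bucket.offer(i) for i in range(5)]  # five instant arrivals
```

Unlike the token bucket, the leaky bucket smooths traffic into a steady output rate, which is why it is popular for traffic shaping.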
3. Fixed Window Counter
- A counter tracks requests per user in a fixed time window (e.g., 100 requests per minute).
- If a user exceeds the limit, they are blocked until the window resets.
Example:
- Instagram API allows 200 requests per hour per user.
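A fixed window counter is the simplest strategy to implement; a small Python sketch (limit and window size chosen for the demo):

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    """Fixed window: count requests per user in each time window;
    block requests once the limit is reached, until the window resets."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (user, window id) -> requests seen

    def allow(self, user):
        window_id = int(time.time() // self.window)  # current window number
        key = (user, window_id)
        if self.counts[key] < self.limit:
            self.counts[key] += 1
            return True
        return False  # blocked until the next window starts

limiter = FixedWindowCounter(limit=3, window_seconds=60)
results = [limiter.allow("alice") for _ in range(4)]
```

Its known weakness: a client can send a full limit at the end of one window and another full limit at the start of the next, briefly doubling the effective rate, which is what the sliding window approach below addresses.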
4. Sliding Window Log (More Precise)
- Similar to Fixed Window, but instead of resetting, it keeps a moving log of requests.
- Ensures that rate limits are spread evenly over time instead of resetting suddenly.
Example:
- Used by Stripe API for more precise rate limiting.
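The moving log can be kept as a deque of timestamps; a minimal sketch (limit and window are illustrative):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding window log: keep timestamps of accepted requests and count
    only those inside the moving window, so limits never reset abruptly."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of accepted requests

    def allow(self):
        now = time.monotonic()
        # Evict timestamps that have fallen out of the window.
        while self.log and now - self.log[0] > self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

limiter = SlidingWindowLog(limit=2, window_seconds=60)
results = [limiter.allow() for _ in range(3)]
```

The trade-off is memory: storing one timestamp per request costs more than a single counter, which is why some systems use a hybrid "sliding window counter" instead.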
Where to Implement Rate Limiting?
1. At API Gateway (Most Common)
- Implemented at API Gateways like NGINX, Kong, or AWS API Gateway before requests reach backend services.
- Scalable and efficient because it blocks excessive requests early.
2. In Load Balancer
- Load balancers like AWS ALB, NGINX, or HAProxy can enforce rate limits before requests hit application servers.
3. In Application Code
- API frameworks (Express.js, Flask, Django, Spring Boot) support rate limiting via middleware.
- Example: the express-rate-limit middleware for Express.js controls how many requests a user can make.
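To show the middleware idea without tying it to one framework, here is a framework-agnostic Python sketch of a rate-limiting decorator; the handler name, caller argument, and response shape are all illustrative (production apps would use something like express-rate-limit or Flask-Limiter instead):

```python
import time
from functools import wraps

def rate_limited(limit, window_seconds):
    """Decorator sketch: allow at most `limit` calls per caller per window."""
    calls = {}  # caller -> timestamps of recent accepted calls

    def decorator(handler):
        @wraps(handler)
        def wrapper(caller, *args, **kwargs):
            now = time.monotonic()
            # Keep only calls that are still inside the window.
            recent = [t for t in calls.get(caller, []) if now - t < window_seconds]
            if len(recent) >= limit:
                return {"status": 429, "body": "Too Many Requests"}
            recent.append(now)
            calls[caller] = recent
            return handler(caller, *args, **kwargs)
        return wrapper
    return decorator

@rate_limited(limit=2, window_seconds=60)
def get_profile(caller):
    return {"status": 200, "body": f"profile for {caller}"}

responses = [get_profile("alice")["status"] for _ in range(3)]
```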
Advanced Rate Limiting Strategies
1. IP-Based Rate Limiting
- Limits requests per IP address.
- Issue: Shared networks (like office WiFi) can hit limits quickly.
- Solution: Use user-based or API key-based limits instead.
2. User-Based Rate Limiting
- Limits requests per authenticated user instead of IP.
- Example: GitHub API limits requests per user token.
3. Adaptive Rate Limiting (AI & ML Based)
- Dynamically adjusts rate limits based on user behavior.
- Example: If an account suddenly makes 1000+ requests in a second, AI flags it as a bot and blocks further access.
4. Distributed Rate Limiting (Using Redis or Cloud)
- In distributed systems, requests may come from multiple servers.
- Solution: Store rate limit counters in Redis or use cloud-based rate limiters (AWS WAF, Cloudflare).
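A common Redis pattern for this is a per-window counter built from INCR plus EXPIRE. The sketch below uses a tiny in-memory stand-in for Redis so it runs without a server; with a real deployment you would pass a shared Redis client so every app server sees the same counters:

```python
import time

class FakeRedis:
    """In-memory stand-in for Redis, supporting just incr and expire."""
    def __init__(self):
        self.data = {}  # key -> (value, expires_at or None)

    def incr(self, key):
        value, expires = self.data.get(key, (0, None))
        if expires is not None and time.monotonic() > expires:
            value, expires = 0, None  # key expired; start over
        value += 1
        self.data[key] = (value, expires)
        return value

    def expire(self, key, seconds):
        value, _ = self.data.get(key, (0, None))
        self.data[key] = (value, time.monotonic() + seconds)

def allow(redis, user, limit, window_seconds):
    # Classic pattern: INCR a per-user, per-window counter; set its TTL on first hit.
    key = f"ratelimit:{user}:{int(time.time() // window_seconds)}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, window_seconds)
    return count <= limit

r = FakeRedis()
results = [allow(r, "alice", limit=3, window_seconds=60) for _ in range(5)]
```

Because the counter lives in one shared store, the limit holds no matter which application server handles a given request.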
Handling Exceeded Rate Limits
1. HTTP Response Codes
- 429 Too Many Requests → Standard error when exceeding limits.
- 503 Service Unavailable → Sometimes returned when the server itself is overloaded; a Retry-After header can tell clients when to try again.
2. Backoff Strategies
- Exponential Backoff: Retry requests with increasing wait time (e.g., 1s, 2s, 4s…).
- Jitter (Randomized Delay): Prevents all clients from retrying at the same time.
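The two ideas combine naturally into "exponential backoff with full jitter": each retry waits a random amount between zero and an exponentially growing ceiling. A small sketch (base and cap values are illustrative):

```python
import random

def backoff_delays(max_attempts, base=1.0, cap=30.0):
    """Exponential backoff with full jitter: retry n waits a random amount
    between 0 and min(cap, base * 2**n) seconds, so a fleet of clients
    doesn't retry in lockstep after a shared failure."""
    return [random.uniform(0.0, min(cap, base * 2 ** attempt))
            for attempt in range(max_attempts)]

delays = backoff_delays(4)  # ceilings grow as 1s, 2s, 4s, 8s
```

In a real client, each delay would be passed to a sleep before the next retry, and retries would stop after `max_attempts`.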
3. API Clients Handling Limits
- Good API clients should check rate limits and wait before retrying.
- Example: GitHub API returns an X-RateLimit-Remaining header to tell clients how many requests they have left.
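A well-behaved client can read those headers and compute how long to wait before retrying. A sketch using GitHub-style X-RateLimit-* header names (the headers dict here is a hand-built example, not a live API response):

```python
import time

def seconds_until_reset(headers, now=None):
    """Given rate-limit response headers (GitHub-style X-RateLimit-* names),
    return how long the client should wait before sending more requests."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0  # budget left; no need to wait
    # X-RateLimit-Reset is the epoch time when the window resets.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)

wait = seconds_until_reset(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"},
    now=1700000000,
)
```

Sleeping for `wait` seconds before retrying avoids burning requests on guaranteed 429 responses.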
Conclusion
API rate limiting is essential for system stability, preventing abuse, and ensuring fair access to resources. Different strategies like token bucket, leaky bucket, and sliding window help balance performance and user experience. Implementing proper rate limits at the API gateway or using distributed rate limiters like Redis ensures scalability in large systems.
Understanding these concepts is crucial for designing high-performance, resilient APIs used by platforms like Twitter, GitHub, and Stripe.
I’ll keep posting daily and stay consistent with both my learning and my daily pushups. Thank you!
Follow my journey:
Medium: https://ankittk.medium.com/
Instagram: https://www.instagram.com/ankitengram/