Day 4/21: System Design (Caching)
Caching is a critical technique in distributed systems that improves performance, reduces database load, and enhances user experience. Large-scale systems like Google, Facebook, and Netflix rely heavily on caching to serve millions of requests efficiently.
What is Caching?
- Caching is a method of storing frequently accessed data in a fast storage layer (RAM, SSD, or in-memory stores like Redis).
- Instead of retrieving data from a slow database on every request, the cache returns the stored copy almost instantly.
- It is widely used in databases, APIs, content delivery, and distributed systems.
- YouTube caches popular videos on regional servers, reducing latency for users in different countries.
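As a minimal sketch of the idea, here is the classic cache-aside pattern using a plain Python dict as the cache and a stand-in for a slow database query (both hypothetical):

```python
import time

cache = {}  # in-memory cache: key -> value

def fetch_from_database(key):
    # Stand-in for a slow database query.
    time.sleep(0.1)
    return f"value-for-{key}"

def get(key):
    # Cache hit: return the stored copy without touching the database.
    if key in cache:
        return cache[key]
    # Cache miss: fetch once, store, and serve from memory afterwards.
    value = fetch_from_database(key)
    cache[key] = value
    return value
```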
Why is Caching Important in Distributed Systems?
In large-scale distributed systems, caching helps in:
- Reducing Database Load: Avoids unnecessary repeated queries.
- Improving Response Time: Data is fetched from in-memory storage instead of slow databases.
- Handling High Traffic Efficiently: Helps in scaling horizontally across multiple servers.
- Reducing Cost: Lower compute and database usage reduces infrastructure expenses.
- Enhancing Fault Tolerance: If a database crashes, cached data can still serve users.
Caching in Distributed Systems
1. Distributed Caching
When a system scales, a single cache cannot handle all requests.
- Use distributed caching, where multiple cache nodes store and serve data.
- Facebook’s caching layer (TAO) helps deliver fast user profile lookups across multiple data centers.
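A simplified sketch of client-side sharding, assuming three hypothetical cache hosts; because every client hashes the same way, a given key always routes to the same node:

```python
import hashlib

# Hypothetical cache hosts; in practice these would be Redis/Memcached nodes.
NODES = ["cache-1:6379", "cache-2:6379", "cache-3:6379"]

def node_for(key):
    # Hash the key and map it to one node (simple modulo sharding).
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(node_for("user:42"))  # every client routes this key to the same node
```

One caveat: modulo sharding remaps most keys whenever a node is added or removed, which is why consistent hashing (section 4 below) is preferred at scale.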
2. Cache Invalidation Challenges
If the underlying data changes, the cache can serve stale or incorrect results. Common invalidation strategies:
- Time-to-Live (TTL): Expire cache after a certain period.
- Write-Through Cache: Update cache and database at the same time.
- Event-Driven Cache Invalidation: The cache is invalidated or updated when the underlying data changes.
- Versioning: Store different versions of cached data.
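A minimal sketch of the first two strategies, assuming a plain dict cache and a dict-like database (both hypothetical): entries carry an expiry timestamp (TTL), and writes go through both stores (write-through):

```python
import time

cache = {}          # key -> (value, expires_at)
TTL_SECONDS = 60    # how long an entry stays valid

def cache_set(key, value):
    cache[key] = (value, time.time() + TTL_SECONDS)

def cache_get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() > expires_at:
        del cache[key]   # TTL expired: treat as a miss
        return None
    return value

def write_through(key, value, db):
    db[key] = value          # update the database...
    cache_set(key, value)    # ...and the cache in the same operation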
3. Cache Consistency Models
- Strong Consistency: Cache is always updated with the latest data (expensive).
- Eventual Consistency: Cached data may be slightly outdated but will sync over time (scalable).
- E-commerce websites use eventual consistency for product availability, while payment systems require strong consistency.
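To make the trade-off concrete, here is a deliberately simplified sketch (not a real consistency protocol): a strongly consistent write updates every replica before returning, while an eventually consistent write updates one copy and lets the rest catch up later:

```python
replicas = [{}, {}, {}]  # three hypothetical cache replicas

def write_strong(key, value):
    # Strong consistency: the write completes only after every replica has it.
    for replica in replicas:
        replica[key] = value

def write_eventual(key, value):
    # Eventual consistency: update one replica now; a background process
    # (not shown) propagates the value to the others later.
    replicas[0][key] = value
```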
4. Data Partitioning in Caching
Large-scale caching systems split data into partitions for scalability.
Methods of Partitioning:
- Range-Based Partitioning: Divide cache based on a range (e.g., user ID 1–1000).
- Hash-Based Partitioning: Assign keys to cache nodes using a hash function.
- Consistent Hashing: Distributes cache efficiently even when nodes are added or removed.
- Redis Cluster partitions keys across nodes using 16,384 hash slots (a form of hash-based partitioning), while many Memcached client libraries use consistent hashing.
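Below is a minimal sketch of classic consistent hashing with virtual nodes, in the style used by ketama-based Memcached clients; the node names are hypothetical:

```python
import bisect
import hashlib

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Place each node at many points ("virtual nodes") on the ring
        # so keys spread evenly.
        self.ring = sorted((_hash(f"{node}#{i}"), node)
                           for node in nodes for i in range(vnodes))
        self.hashes = [h for h, _ in self.ring]

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after hash(key).
        idx = bisect.bisect(self.hashes, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.node_for("user:42"))
```

Because each node appears at many points on the ring, adding or removing a node only remaps the keys nearest its positions instead of reshuffling everything.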
5. Handling Cache Failures in Distributed Systems
- Cache Replication: Maintain multiple copies of cached data.
- Failover Mechanisms: If one cache server fails, redirect requests to a backup server.
- Graceful Degradation: Serve partial results instead of complete failure.
- Amazon uses cache replication to ensure its recommendation system continues working during cache failures.
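A sketch of a failover read path, assuming hypothetical cache-client objects that raise an error when a node is unreachable:

```python
class CacheNodeDown(Exception):
    pass

def get_with_failover(key, primary, replica, db):
    # Try the primary cache, fall back to its replica, and degrade
    # gracefully to the database if both copies are unreachable.
    for node in (primary, replica):
        try:
            value = node.get(key)
            if value is not None:
                return value
        except CacheNodeDown:
            continue  # this node is down; try the next copy
    return db.get(key)  # slower, but the request still succeeds
```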
6. Hotspot Caching Problem
Some data is accessed much more frequently than the rest, which can overload a single cache node. Common mitigations (a key-splitting sketch follows this list):
- Sharding: Spread high-demand data across multiple cache nodes.
- Load Balancing: Distribute cache requests across multiple servers.
- Locality-Sensitive Hashing (LSH): Store related data together to minimize overload.
- Twitter handles trending hashtags by caching them across multiple data centers instead of a single location.
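One common mitigation is key splitting: replicate a hot key under several suffixed keys so that reads spread across shards. A minimal sketch, assuming a hypothetical cache client with get/set:

```python
import random

HOT_KEY_COPIES = 8  # how many copies to split a hot key into (tunable)

def write_hot(cache, key, value):
    # Store the hot value under several suffixed keys so that different
    # cache shards each hold a copy.
    for i in range(HOT_KEY_COPIES):
        cache.set(f"{key}#{i}", value)

def read_hot(cache, key):
    # Each reader picks a random copy, spreading the load across shards.
    return cache.get(f"{key}#{random.randrange(HOT_KEY_COPIES)}")
```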
7. Cache Warm-Up Strategies
When a cache restarts, it starts empty (a cold cache), so initial requests miss and fall through to the slow database.
- Preloading Cache: Load frequently used data when starting a cache node.
- Shadow Traffic: Send a copy of real traffic to fill the cache before making it active.
- Background Refreshing: Proactively update cache before it’s needed.
- Netflix preloads cached data for upcoming popular shows to ensure a smooth viewing experience.
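A sketch of preloading at startup; the access-statistics query and the db/cache clients are hypothetical:

```python
def warm_up(cache, db, top_n=1000):
    # Preload the most frequently accessed keys before the node starts
    # taking live traffic.
    hot_keys = db.query(
        "SELECT key FROM access_stats ORDER BY hits DESC LIMIT %s", top_n
    )
    for key in hot_keys:
        cache.set(key, db.get(key))
```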
8. Advanced Caching Architectures
1. Content Delivery Network (CDN) Caching
- CDNs store and serve static content (images, videos, JavaScript) closer to users.
- Reduces latency and offloads traffic from origin servers.
- Cloudflare, Akamai, and AWS CloudFront improve website load times.
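CDN caching behavior is largely driven by standard HTTP headers set at the origin. A minimal illustration of the relevant headers (the values are examples):

```python
# Response headers an origin server might send so CDN edges cache a file:
headers = {
    # "public" allows shared caches (CDN edges) to store it; cache for 1 day.
    "Cache-Control": "public, max-age=86400",
    # A version tag lets edges revalidate cheaply and busts stale copies
    # when a new release ships.
    "ETag": '"v2.3.1"',
}
```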
2. Multi-Layered Caching
- Uses multiple cache levels to optimize performance:
- L1 Cache (In-Memory Cache): Fastest cache stored in RAM.
- L2 Cache (Distributed Cache): Stored across multiple servers.
- L3 Cache (Persistent Storage Cache): Cached data in SSD or disk storage.
- Google Search uses multi-layered caching to optimize query responses.
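A sketch of a layered read path, assuming a local dict as L1 and a hypothetical distributed-cache client as L2, with the database as the final fallback:

```python
l1 = {}  # L1: per-process, in-RAM (fastest, smallest)

def get(key, l2, db):
    # Check each layer in order of speed; fill the faster layers on a hit.
    if key in l1:
        return l1[key]
    value = l2.get(key)       # L2: shared/distributed cache
    if value is None:
        value = db.get(key)   # final fallback: persistent storage
        l2.set(key, value)
    l1[key] = value
    return value
```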
3. Hybrid Cache (Write-Through + Cache-Aside)
Rather than choosing between consistency (write-through) and speed (cache-aside):
- Use both methods together, applying write-through to critical data and cache-aside to the rest.
- Stock trading systems need hybrid caching for instant updates and reliability.
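A sketch of one such hybrid policy, with a hypothetical set of "critical" key prefixes: critical writes go write-through, everything else is handled cache-aside:

```python
CRITICAL_PREFIXES = ("balance:", "order:")  # keys that must stay fresh

def write(key, value, cache, db):
    db.set(key, value)
    if key.startswith(CRITICAL_PREFIXES):
        cache.set(key, value)   # write-through: cache updated with the DB
    else:
        cache.delete(key)       # cache-aside: drop it; next read repopulates

def read(key, cache, db):
    value = cache.get(key)
    if value is None:           # cache-aside read path
        value = db.get(key)
        cache.set(key, value)
    return value
```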
Caching Best Practices for Distributed Systems
- Use Consistent Hashing to balance cache distribution.
- Avoid Stale Data by implementing cache invalidation techniques.
- Replicate Cache Data to prevent failure impact.
- Use Compression to store more cache data in memory.
- Monitor Cache Usage to detect inefficiencies.
Modern systems rely on well-optimized caching strategies to serve millions of users with minimal delay, making caching an essential concept for system design and large-scale architecture.
I’ll be posting daily to stay consistent in both my learning and my daily pushups. Thank you!
Follow my journey:
Medium: https://ankittk.medium.com/
Instagram: https://www.instagram.com/ankitengram/