Day 4/21: System Design (Caching)

Ankit Kumar
3 min read · Feb 20, 2025


Caching is a critical technique in distributed systems that improves performance, reduces database load, and enhances user experience. Large-scale systems like Google, Facebook, and Netflix rely heavily on caching to serve millions of requests efficiently.

What is Caching?

  • Caching is a method of storing frequently accessed data in a fast storage layer (RAM, SSD, or in-memory stores like Redis).
  • Instead of retrieving data from a slow database on every request, a cache returns the stored copy instantly (a minimal sketch follows this list).
  • It is widely used in databases, APIs, content delivery, and distributed systems.
  • YouTube caches popular videos on regional servers, reducing latency for users in different countries.
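
To make this concrete, here is a minimal cache-aside sketch in Python; the dictionary-backed cache and the fetch_from_db function are illustrative stand-ins for a real in-memory store (such as Redis) and a slow database.

```python
import time

cache = {}  # stand-in for an in-memory store like Redis

def fetch_from_db(user_id):
    """Stand-in for a slow database query."""
    time.sleep(0.1)  # simulate query latency
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    # Serve from the cache when possible; fall back to the database on a miss.
    if user_id in cache:
        return cache[user_id]          # cache hit: no database round trip
    record = fetch_from_db(user_id)    # cache miss: query the slow store
    cache[user_id] = record           # populate the cache for next time
    return record
```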

Why Is Caching Important in Distributed Systems?

In large-scale distributed systems, caching helps in:

  • Reducing Database Load: Avoids unnecessary repeated queries.
  • Improving Response Time: Data is fetched from in-memory storage instead of slow databases.
  • Handling High Traffic Efficiently: Helps in scaling horizontally across multiple servers.
  • Reducing Cost: Lower compute and database usage reduces infrastructure expenses.
  • Enhancing Fault Tolerance: If a database crashes, cached data can still serve users.

Caching in Distributed Systems

1. Distributed Caching

When a system scales, a single cache cannot handle all requests.

  • Use distributed caching, where multiple cache nodes store and serve data.
  • Facebook’s caching layer (TAO) helps deliver fast user profile lookups across multiple data centers.

2. Cache Invalidation Challenges

If cached data becomes outdated, users might get stale or incorrect data.

  • Time-to-Live (TTL): Expire cache entries after a set period (a sketch follows this list).
  • Write-Through Cache: Update cache and database at the same time.
  • Event-Driven Cache Invalidation: Cache is updated when new data arrives.
  • Versioning: Store different versions of cached data.
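
As one concrete example, here is a minimal TTL cache sketch in Python; the class name, entry layout, and expiry times are illustrative.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None                # never cached
        value, expires_at = entry
        if time.time() >= expires_at:
            del self.store[key]        # expired: invalidate lazily on read
            return None
        return value

cache = TTLCache(ttl_seconds=5)
cache.set("price:123", 99.99)
print(cache.get("price:123"))  # 99.99 within the TTL window, None after
```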

3. Cache Consistency Models

  • Strong Consistency: Cache is always updated with the latest data (expensive).
  • Eventual Consistency: Cached data may be slightly outdated but will sync over time (scalable).
  • E-commerce websites use eventual consistency for product availability, while payment systems require strong consistency.

4. Data Partitioning in Caching

Large-scale caching systems split data into partitions for scalability.

Methods of Partitioning:

  • Range-Based Partitioning: Divide the cache by key range (e.g., user IDs 1–1000 on one node).
  • Hash-Based Partitioning: Assign keys to cache nodes using a hash function.
  • Consistent Hashing: Keeps most keys on the same node even when nodes are added or removed (a ring sketch follows this list).
  • Memcached clients commonly use consistent hashing (e.g., the Ketama algorithm), and Redis Cluster achieves a similar goal with 16,384 fixed hash slots.
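
Here is a small consistent-hashing ring sketch in Python; the node names, replica count, and MD5-based hash are illustrative choices, not tied to any particular cache product.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []    # sorted list of point hashes on the ring
        self.points = {}  # point hash -> node name
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Virtual nodes (replicas) smooth out the key distribution.
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self.ring, point)
            self.points[point] = node

    def get_node(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect(self.ring, self._hash(key)) % len(self.ring)
        return self.points[self.ring[idx]]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))  # the same key always maps to the same node
```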

5. Handling Cache Failures in Distributed Systems

  • Cache Replication: Maintain multiple copies of cached data.
  • Failover Mechanisms: If one cache server fails, redirect requests to a backup server (a sketch follows this list).
  • Graceful Degradation: Serve partial results instead of complete failure.
  • Amazon uses cache replication to ensure its recommendation system continues working during cache failures.
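
A minimal read-with-failover sketch, assuming a hypothetical DictNode stand-in for a client connection to a cache node:

```python
class CacheUnavailable(Exception):
    pass

class DictNode:
    """Stand-in cache node backed by a dict; raises when marked down."""
    def __init__(self, data=None, up=True):
        self.data = data or {}
        self.up = up

    def get(self, key):
        if not self.up:
            raise CacheUnavailable()   # simulate a crashed node
        return self.data.get(key)

def get_with_failover(key, primary, replica):
    try:
        return primary.get(key)        # normal path: hit the primary node
    except CacheUnavailable:
        return replica.get(key)        # primary down: serve from the replica

primary = DictNode({"user:1": "Ankit"}, up=False)     # primary is down
replica = DictNode({"user:1": "Ankit"})               # replicated copy
print(get_with_failover("user:1", primary, replica))  # served by the replica
```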

6. Hotspot Caching Problem

  • Some data is accessed much more frequently than others, overloading a single cache node.
  • Sharding: Spread high-demand keys across multiple cache nodes (a key-suffix sketch follows this list).
  • Load Balancing: Distribute cache requests across multiple servers.
  • Locality-Sensitive Hashing (LSH): Store related data together to minimize overload.
  • Twitter handles trending hashtags by caching them across multiple data centers instead of a single location.
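
One common sharding trick for a hot key is to write several suffixed copies and let each reader pick one at random, spreading the load. A minimal sketch (the copy count and key names are illustrative):

```python
import random

NUM_COPIES = 4  # how many shards each hot key gets

def write_hot_key(cache, key, value):
    # Write every copy so any suffix serves the same data.
    for i in range(NUM_COPIES):
        cache[f"{key}#{i}"] = value

def read_hot_key(cache, key):
    # Each reader picks a random copy, spreading reads across nodes.
    return cache[f"{key}#{random.randrange(NUM_COPIES)}"]

cache = {}
write_hot_key(cache, "trending:#ai", ["post1", "post2"])
print(read_hot_key(cache, "trending:#ai"))
```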

7. Cache Warm-Up Strategies

When a cache is restarted, it starts empty (a "cold cache"), so every request misses and hits the database until the cache refills.

  • Preloading Cache: Load frequently used data when a cache node starts (a sketch follows this list).
  • Shadow Traffic: Send a copy of real traffic to fill the cache before making it active.
  • Background Refreshing: Proactively update cache before it’s needed.
  • Netflix preloads cached data for upcoming popular shows to ensure a smooth viewing experience.
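
A minimal preloading sketch; the top_keys_from_db helper is a hypothetical stand-in for however a real system identifies its hottest entries:

```python
def top_keys_from_db(limit=100):
    """Stand-in: return the most frequently requested keys."""
    return [f"video:{i}" for i in range(limit)]

def fetch_from_db(key):
    """Stand-in for a slow database lookup."""
    return {"key": key, "payload": "..."}

def warm_up(cache, limit=100):
    # Populate the cache before it receives live traffic,
    # so the first real requests don't all miss.
    for key in top_keys_from_db(limit):
        cache[key] = fetch_from_db(key)

cache = {}
warm_up(cache)
print(len(cache))  # 100 entries ready before traffic arrives
```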

8. Advanced Caching Architectures

1. Content Delivery Network (CDN) Caching

  • CDNs store and serve static content (images, videos, JavaScript) closer to users.
  • Reduces latency and offloads traffic from origin servers.
  • CDNs such as Cloudflare, Akamai, and AWS CloudFront cache content at edge locations worldwide to improve website load times.

2. Multi-Layered Caching

Uses multiple cache levels to optimize performance (a two-tier lookup sketch follows this list):
  • L1 Cache (In-Memory Cache): Fastest cache stored in RAM.
  • L2 Cache (Distributed Cache): Stored across multiple servers.
  • L3 Cache (Persistent Storage Cache): Cached data in SSD or disk storage.
  • Google Search uses multi-layered caching to optimize query responses.
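
A two-tier lookup sketch along these lines; the DictStore class is an illustrative stand-in for a shared distributed cache such as Redis or Memcached:

```python
class DictStore:
    """Stand-in for a shared distributed cache (e.g., Redis)."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def set(self, key, value):
        self.data[key] = value

l1 = {}           # L1: in-process memory, fastest, smallest
l2 = DictStore()  # L2: shared across servers, slower but larger

def fetch_from_db(key):
    """Stand-in for the origin database."""
    return f"value-for-{key}"

def get(key):
    if key in l1:
        return l1[key]               # L1 hit: fastest path
    value = l2.get(key)
    if value is None:
        value = fetch_from_db(key)   # both layers missed: go to origin
        l2.set(key, value)           # fill L2 for other servers
    l1[key] = value                  # fill L1 for this process
    return value

print(get("query:cats"))  # first call fills both layers; later calls hit L1
```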

3. Hybrid Cache (Write-Through + Cache-Aside)

Rather than choosing between consistency (write-through) and speed (cache-aside), a hybrid design combines both.

  • Use write-through for writes and cache-aside for reads (a sketch follows this list).
  • Stock trading systems need hybrid caching for instant updates and reliability.
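
A minimal hybrid sketch, combining write-through on the write path with cache-aside on the read path (the db dictionary stands in for a durable store):

```python
cache, db = {}, {}  # stand-ins for an in-memory cache and a durable store

def write(key, value):
    db[key] = value      # write-through: persist first...
    cache[key] = value   # ...then keep the cache in sync

def read(key):
    if key in cache:
        return cache[key]      # cache-aside read path: hit
    value = db.get(key)
    if value is not None:
        cache[key] = value     # populate the cache on a miss
    return value

write("order:7", {"status": "filled"})
print(read("order:7"))  # served from the cache, consistent with the database
```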

Caching Best Practices for Distributed Systems

  1. Use Consistent Hashing to balance cache distribution.
  2. Avoid Stale Data by implementing cache invalidation techniques.
  3. Replicate Cache Data to prevent failure impact.
  4. Use Compression to store more cache data in memory.
  5. Monitor Cache Usage to detect inefficiencies.

Modern systems rely on well-optimized caching strategies to serve millions of users with minimal delay, making caching an essential concept for system design and large-scale architecture.

I’ll be posting daily to stay consistent in both my learning and my daily pushups. Thank you!

Follow my journey:
Medium: https://ankittk.medium.com/
Instagram: https://www.instagram.com/ankitengram/
