Cache Considerations
Highlighted from System Design Interview Fundamentals, Chapter 5
Chapter 6 of System Design Interview – An insider’s guide
also check Cache strategy notes
A cache is a storage layer that improves query efficiency. A cache differs from a database in that it is volatile: the data is lost when the cache goes down.
- In reality, a cache comes with a high cost in maintainability and complexity.
:brain: Before proposing cache as a solution, it’s important to understand the problem you’re trying to solve.
Improve Latency
- In a system design interview, this is a great spot to think about the latency non-functional requirement.
- Don’t just claim you want to improve latency without knowing why improving latency matters. As in real-life design, you wouldn’t add a cache layer to every part of the stack if it doesn’t add much value to the end-user.
Common scenarios for using a cache
- Reading 1 MB from disk is almost 100 times slower than reading it from memory.
- For example, if the end-users of your application are fine with 500ms latency, it doesn’t matter if you reduce latency from 5 ms to 0.1 ms. However, if the end-user is hoping for less than 20 ms latency, reducing the latency has a much bigger impact.
- You can use a cache to materialize various data sources into fields the client can access quickly, especially if computing the data is compute-intensive.
- The challenge of this pattern is to figure out how to keep the materialized field up to date as the data sources continue to mutate.
- Another way a cache can improve latency is similar to replication, where the architecture brings the data physically closer to the user.
- One canonical example of this is using a CDN, which is considered a form of cache.
Improve Throughput
- Perhaps in an interview, you calculate a QPS, and you see that a single database can’t handle that QPS.
- You might consider caching as a possible solution to the problem.
- Assume the disk-backed and memory-backed servers have similar specs. If both run a single thread on a single core, and memory is 100 times faster than disk, then in theory the memory server can process 100 times more work in the same amount of time.
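Under those same assumptions, the back-of-envelope arithmetic looks like this (the 1 ms disk service time is an assumed example value, not from the source):

```python
# Back-of-envelope throughput estimate: a single-threaded server whose
# per-request work is ~100x faster when served from memory instead of disk.
disk_service_time_ms = 1.0   # assumed time to serve one request from disk
speedup = 100                # memory assumed ~100x faster than disk

disk_qps = 1000 / disk_service_time_ms  # one thread: 1000 ms/sec / service time
memory_qps = disk_qps * speedup         # same thread does 100x more work

print(disk_qps, memory_qps)  # 1000.0 100000.0
```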
Improve Bandwidth
- Similar to replication, you’re able to bring the data source physically closer to the user. By bringing the content closer to the user, you reduce the number of bytes that need to go through the internet and improve the overall bandwidth capacity.
- This is one of the main problems CDNs try to solve.
Cache Considerations
There is a non-trivial cost to having a cache that fronts other data sources. We need a database because of its durability, but we don’t always need a cache. If we have a cache, the cache needs to be useful because it costs money.
:brain: After you’ve identified the reasons to use a cache for your particular system design question, you need to think about utilizing the cache.
A few things to consider when using a cache
Cache Hit Rate
:black_nib: Cache Hit: The data the request is looking for exists in the cache.
:black_nib: Cache Miss: The data the request is looking for doesn’t exist in the cache.
:black_nib: Cache Hit Rate = Cache Hit / (Cache Hit + Cache Miss).
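As a sanity check, the hit-rate formula in code (a trivial helper written for these notes, not from the source):

```python
def cache_hit_rate(hits, misses):
    """Fraction of lookups served from the cache: hits / (hits + misses)."""
    total = hits + misses
    return hits / total if total else 0.0

# e.g. 80 hits and 20 misses -> 0.8 hit rate
assert cache_hit_rate(80, 20) == 0.8
```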
:brain: If your cache hit rate is low, the cache might not be worth it, depending on what you cache and the user’s expected query pattern.
- It may be that a user reads a result once and never issues the same query again. In that case, there’s no point in caching.
- This trade-off is why it is important to understand and state assumptions about the query patterns in an interview, so the interviewer knows how you’re thinking about it.
What Are You Caching?
- Similar to database schema design, it is really important to articulate what you’re going to cache.
Example: The candidate is contemplating using a cache.
:weary: “I am going to put a cache in front of the database and it is going to scale the database because it can handle more traffic and will be faster.”
:whale:
- “From the non-functional requirement, we are trying to achieve sub 20 ms latency with more than 100,000 read throughput.
- Let me consider a cache since it will help achieve the latency and throughput we’re looking for, and I will discuss the complexities that come with it.
- “The key is the user_id and the value is the recommendation list, and here are a couple of invalidation strategies.”
- In the first one, the candidate trivializes the complexity of a cache without demonstrating why the cache is needed in the first place, what exactly is being cached, and the trade-offs.
- In the second one, the candidate has a purpose to use the cache and explains the complexities that come with it.
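A minimal sketch of the second candidate’s proposal, with user_id as the key and the recommendation list as the value. The invalidate-on-write strategy and all names here are assumptions for illustration, not the book’s implementation:

```python
# Hypothetical user_id -> recommendation-list cache in front of a database,
# using invalidate-on-write: updates delete the cached entry so the next
# read falls through to the source of truth and repopulates the cache.
cache = {}                  # user_id -> list of recommended item ids
db = {"u1": ["a", "b"]}     # stand-in for the source-of-truth database

def read_recommendations(user_id):
    if user_id in cache:
        return cache[user_id]        # hit: fast in-memory path
    recs = db.get(user_id, [])       # miss: fall through to the database
    cache[user_id] = recs
    return recs

def update_recommendations(user_id, recs):
    db[user_id] = recs
    cache.pop(user_id, None)         # invalidate so readers never see stale data
```

Compared with a TTL, invalidate-on-write keeps reads fresh immediately after an update, at the cost of coupling the write path to the cache.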
Example 2: a simple search service that takes in a free-form text with English words.
search(free_text) → [doc_id]
- The search result should contain a list of documents with those words. For simplicity, we will assume there are two documents:
Document 1: "System Design Interview"
Document 2: "Coding Interview"
Option 1: Cache the Search Query
- "My OR Design" → [1]
- "Super OR Interview" → [1, 2]
Option 2: Cache the Text Token
- "Design" → [1]
- "Interview" → [1, 2]
- Option 1 is a single key-value look-up on the query, which is faster than option 2, where you need to break the text into tokens, look up each token individually, and combine the results.
- However, option 1 will have a much worse hit rate because end-users are unlikely to search for the exact same free-form text.
- The above is an example where details matter a lot. You need to talk about what you’re caching and what the implications are.
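The two options can be made concrete with a toy sketch, using the two documents above and OR (union) semantics to match the example queries. All code here is an illustrative assumption for these notes, not the book’s implementation:

```python
# Toy inverted index over the two example documents.
index = {"system": [1], "design": [1], "interview": [1, 2], "coding": [2]}

query_cache = {}  # Option 1: whole free-form query -> cached result list

def search_tokens(free_text):
    # Option 2: tokenize, look up each token, and union the postings (OR).
    results = set()
    for token in free_text.lower().split():
        results |= set(index.get(token, []))
    return sorted(results)

def search_cached(free_text):
    # Option 1: one key-value lookup, but a hit requires the exact same text.
    key = free_text.lower()
    if key not in query_cache:
        query_cache[key] = search_tokens(key)  # miss: do the slower token search
    return query_cache[key]
```

Here option 2 pays for tokenizing and merging on every request, while option 1 is a single lookup that only helps when users repeat a query verbatim, which is exactly the hit-rate trade-off described above.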