Cache strategy
Highlighted from System Design Interview Fundamentals, Chapter 5
Write to Cache
1. Write-Through
For write-through cache, the data is written synchronously to both the cache and the underlying data store.
- :whale: The advantage of this solution is the data will, in theory, exist in both the cache and the data store.
- :weary: The disadvantage of it is higher write latency and lower availability since synchronous write to 2 sources has a higher failure rate than just 1.
- :weary: Also, since the cache and database are two separate sources, a synchronous write to both means the atomicity of the write isn't guaranteed without a distributed transaction.
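The write-through path can be sketched with two plain dicts standing in for the cache and the database; the class and field names here are illustrative, not from the book:

```python
# Minimal write-through sketch: both stores are plain dicts standing in
# for a real cache (e.g. Redis) and a real database.
class WriteThroughStore:
    def __init__(self):
        self.cache = {}
        self.db = {}

    def write(self, key, value):
        # Write synchronously to both sources. If either write fails, the
        # operation fails; atomicity across the two stores is NOT
        # guaranteed without a distributed transaction.
        self.db[key] = value      # persist first
        self.cache[key] = value   # then populate the cache

    def read(self, key):
        # Reads can be served from the cache, which mirrors the database.
        return self.cache.get(key, self.db.get(key))
```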
2. Write-Back
In a write-back cache, we first write to the cache and update the database later.
- :whale: The advantage of this write-back cache is the latency is lower because you only need to write to cache without writing to the database. The data would be immediately available to be served by the reads.
- :weary: The disadvantage of write-back cache is that the data on the cache could be lost before the system persists data on the database. In a system design interview, this could be an option if the write latency needs to be performant and occasional data loss due to cache failure is acceptable.
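A write-back path can be sketched the same way; the `dirty` set and `flush()` method are illustrative stand-ins for an asynchronous persistence step:

```python
# Minimal write-back sketch: writes land in the cache immediately and
# are flushed to the database later. The `dirty` set tracks keys not
# yet persisted; a cache crash before flush() loses those writes.
class WriteBackStore:
    def __init__(self):
        self.cache = {}
        self.db = {}
        self.dirty = set()

    def write(self, key, value):
        self.cache[key] = value   # low latency: cache only
        self.dirty.add(key)       # to be persisted asynchronously

    def flush(self):
        # In a real system this runs on a background thread or timer.
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```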
3. Write-Around
In a write-around cache, the system only writes to the database without writing to the cache.
- :whale: The advantage is the data is durable by going to disk first. In addition, you don’t have to worry about writing to cache and having the cache fail on you before it goes to disk.
- :weary: The disadvantage is the cache isn’t populated, so if the client accesses the cache for the first time, the latency will be slower.
- :weary: Also, there’s additional complexity to update and warm up the cache.
- With a write-around cache, you can incorporate a read-through cache:
- Upon read, if there's a cache miss, you fetch from the database and populate the cache.
- Upon updating the key from the database, the system invokes a delete operation onto the cache.
- Updating the cache value directly is also possible, but unlike a delete, it is not idempotent and is sensitive to ordering.
- For example, if there are two updates (A and B) to the database, a distributed system doesn't guarantee A will reach the cache before B.
- It is difficult to guarantee update ordering when requests happen concurrently.
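The write-around pattern combined with a read-through cache and delete-on-update can be sketched as follows (again with dicts standing in for the real stores):

```python
# Write-around with a read-through cache: writes go only to the
# database; updates delete the cached copy; reads populate the cache
# on a miss.
class WriteAroundStore:
    def __init__(self):
        self.cache = {}
        self.db = {}

    def write(self, key, value):
        self.db[key] = value       # durable write, cache untouched
        # Invalidate by delete rather than update: deletes are safe to
        # apply in any order, unlike concurrent updates.
        self.cache.pop(key, None)

    def read(self, key):
        if key not in self.cache:           # cache miss
            self.cache[key] = self.db[key]  # read-through: populate cache
        return self.cache[key]
```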
Cache Invalidation
Designing for a cache is very hard, which is why you need a strong justification.
- Once you have a cache in place, you need to think about updating the cache when the underlying source of truth changes.
Let’s take an example where you frequently read a materialized value computed from value_1, value_2, value_3, and value_4. Let’s make the following assumptions about the values:
- Value 1: The value is computed daily via a batch job.
- Value 2, 3: The values are persisted by the end-user.
- Value 4: The value is derived from another set of values.
Let’s say the cached value is the sum of values 1 to 4.
- If any of the values changes, the sum also needs to change.
- You need to update the cache value to compute the materialized field again.
Option 1: Have a Listener on the Values
- You can have a listener on any of the values, and when any of the values change, you can immediately delete the cache value. The listener is an abstract term since it depends on the data source. It could be a pub-sub queue for post-commit changes.
- :whale: The advantage of this approach is the cache stays fresh since stale entries are invalidated immediately.
- :weary: The disadvantage is that it depends on the flow of data. The architecture could lead to a lot of unnecessary invalidations if the list of dependencies is huge.
- :weary: Sometimes it’s expensive to build a listener pipeline just to update a value.
:brain: Also, you can listen to just a subset of the values if some staleness is acceptable for your application. In an interview setting, think about what the cache value means to the end-user, then apply the appropriate strategy.
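Option 1 can be sketched with a toy in-process pub-sub bus; in practice the "listener" might consume a post-commit change stream instead, and the names below are illustrative:

```python
# Sketch of Option 1: a listener on the dependent values immediately
# deletes the cached materialized value when any of them changes.
class ChangeBus:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, fn):
        self.subscribers.append(fn)

    def publish(self, changed_value):
        for fn in self.subscribers:
            fn(changed_value)

cache = {"materialized_sum": 10}
bus = ChangeBus()

# When any dependent value changes, delete the cached sum right away.
bus.subscribe(lambda changed: cache.pop("materialized_sum", None))
bus.publish("value_2")  # value_2 was updated by the end-user
```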
Option 2: Have a Periodic Job to Calculate the Cache Value
- Instead of listening to value changes, you can have a periodic job compute an updated materialized value. The trade-off is freshness against cost: more frequent runs produce fresher data but consume more resources.
- :whale: The advantage of periodically updating the cache is the simplicity of not needing to listen to a list of dependent values.
- :whale: Another advantage is the value will already be populated when the user queries from the cached data.
- :weary: The disadvantage is the staleness of data depending on how frequently you run the computation. Also, you will waste a lot of resources if the dependent values aren’t queried frequently.
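Option 2 reduces to a recompute function on a schedule; `values` below stands in for the four source values, and a real system would trigger `recompute()` from a scheduler rather than inline:

```python
# Sketch of Option 2: a periodic job recomputes the materialized sum.
values = {"value_1": 1, "value_2": 2, "value_3": 3, "value_4": 4}
cache = {}

def recompute():
    # Freshness depends entirely on how often this job runs.
    cache["materialized_sum"] = sum(values.values())

recompute()              # cache["materialized_sum"] == 10
values["value_2"] = 20   # cache is now stale until the next run
recompute()              # cache["materialized_sum"] == 28
```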
Option 3: Cache a Lower Layer and Fan-In Read
- Instead of optimizing for a super-fast read with a single value, you can trade off a slightly slower read if it satisfies your non-functional requirement.
- Take the above example, and suppose the system caches only the partial result derived from value 1 and value 2. When a client queries the output, you read the cached partial value along with value 3 and value 4 and compute the sum on the fly.
- Also, as a note, a materialized property doesn’t necessarily need to be an in-memory cache value. You could also store it on disk, where it faces the same invalidation challenges.
- This applies to questions like the Netflix recommendation system or generating News Feeds, where candidates often have some sort of cache storing the computed recommendations or feed. If any of the input signals change, you should think about how that impacts the cached recommendation or News Feed.
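A minimal sketch of the fan-in read, assuming the cached lower layer is the partial sum of value_1 and value_2 (names are illustrative):

```python
# Sketch of Option 3: cache only the expensive partial result and
# combine it with the cheaper live values at read time.
values = {"value_1": 1, "value_2": 2, "value_3": 3, "value_4": 4}
cache = {"partial_sum": values["value_1"] + values["value_2"]}

def read_sum():
    # Slightly slower read: one cache lookup plus two live reads, but
    # changes to value_3 / value_4 need no cache invalidation at all.
    return cache["partial_sum"] + values["value_3"] + values["value_4"]
```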
Option 4: Expiration Through TTL
- When you insert a cache value, you have the option to specify a time-to-live (TTL), after which the entry expires.
- The trade-off here is that
- The shorter the TTL, the more frequent the cache misses, each resulting in a lookup to the underlying store.
- The longer the TTL, the higher the chance the entry is stale; and if it’s not frequently accessed, you’re wasting cache memory.
- For example, the browser caches the DNS address. When the website updates the IP to the DNS servers, the DNS servers don’t send massive invalidation updates to all the browsers. The browser relies on TTL to invalidate the cache.
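TTL-based expiration can be sketched by storing an expiry timestamp with each entry; this toy class is an assumption for illustration, not any particular cache's API:

```python
import time

# Minimal TTL cache sketch: each entry carries an expiry timestamp,
# and a read past that timestamp behaves as a cache miss.
class TTLCache:
    def __init__(self):
        self.entries = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self.entries[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:  # expired: treat as a miss
            del self.entries[key]
            return None
        return value
```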
Cache Eviction
We can’t store everything in the cache because cache space is expensive and limited. Cache eviction is a heuristic-based question concerning query patterns and budget, and there’s no single right answer.
- What you’re optimizing for is limiting the amount of memory used in your cache cluster while maintaining a reasonable cache hit rate.
- If you evict a commonly queried key, you might have worse performance on average due to a higher cache miss rate by needing to hit the disk with higher latency.
Ask the interviewer and make assumptions about the query patterns, then pick the most appropriate strategy. The purpose is to let the interviewer know about your thought process and not your ability to recite as many strategies as possible.
Common eviction strategies
Least Recently Used (LRU)
- Whenever you add, update, or access a cache value, the cache marks that value as most recently used. Values that haven’t been updated or accessed for a while have a higher chance of eviction.
- This pattern should be intuitive since data that hasn’t been accessed recently isn’t likely to be accessed again soon.
- But it isn’t always true: a key’s access pattern can be cyclical, queried heavily for a short duration and then not again until much later.
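LRU is commonly sketched with an ordered map; here is a minimal version using Python's `OrderedDict` (the capacity of 2 in the usage below is arbitrary):

```python
from collections import OrderedDict

# Minimal LRU cache sketch: the least recently used key sits at the
# front of the OrderedDict and is evicted first when over capacity.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```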
Least Frequently Used (LFU)
- For each cache value, count the number of times the key is updated or accessed. The ones that are least accessed are more likely to be evicted.
- This pattern makes sense since frequently accessed keys should continue to be accessed, but it isn’t always the case.
- If a stock has a lot of volatility and volume and suddenly goes quiet, LFU will keep it cached because of its high historical access count, even though its hit rate has dropped.
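A simplified LFU sketch: track an access count per key and evict the lowest-count key when full. (Production LFU implementations are more elaborate; this linear-scan version is for illustration only.)

```python
from collections import Counter

# Minimal LFU sketch: the key with the smallest access count is
# evicted when the cache is at capacity.
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.counts = Counter()

    def get(self, key):
        if key not in self.data:
            return None
        self.counts[key] += 1
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict the least frequently used key (linear scan).
            victim = min(self.data, key=lambda k: self.counts[k])
            del self.data[victim]
            del self.counts[victim]
        self.data[key] = value
        self.counts[key] += 1
```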
Custom Eviction
There are more eviction strategies, and they all require you to think about the query pattern.
- For example, most strategies use a heuristic to predict the cache hit rate of future queries.
- However, you might already know what the future looks like and pre-optimize for the future in your application.
- For example, a stadium hosts a concert once a month, and the traffic will be quiet until the day of the concert. Therefore, you may not want to evict that data the day before the concert, and evict it after the concert instead.
Failure Scenario and Redundancy
The cache may occasionally crash like everything else. However, unlike databases, when a cache server crashes, all of its in-memory data is lost.
- When the data is lost, and requests hit the cache, the requests may cause a thundering-herd to the database and bring the database down. There are various options for handling failure scenarios.
:brain: In a system design interview, you need to think about how each option will impact the requirements of the question you’re designing.
- Failure scenarios are often a good deep-dive discussion topic to bring up during the interview. It’s important to think about the options and see how each impacts the end-user experience.
Option 1: Periodic Snapshot
The cache can periodically create a backup file of the data in the current cache. When the cache goes down, the cache server can use the backup file to recreate the cache.
- :whale: The advantage of creating periodic snapshots is the writes are faster without going to disk. In the event of a cache failure, it still has backup files to recreate a point-in-time snapshot.
- :weary: Depending on the frequency of the backup, the disadvantage is the data might be outdated.
- :weary: Also, it takes time to uncompress the backup files and recreate the cache from the snapshot.
:brain: Depending on your requirements, an out-of-date snapshot might be acceptable. Snapshot frequency makes a good trade-off discussion with the interviewer: backup overhead against the freshness requirement.
Option 2: Write-Ahead Log (WAL)
Before writing to the cache, you write to a write-ahead log (WAL) to disk. Since WAL is append-only, the operation is fast.
- :whale: The advantage of WAL is the cache will have the most up-to-date records. In the event of a cache failure, we can replay from the last checkpoint using the WAL.
- :weary: The disadvantage of WAL is the write will be slower because the cache update now requires the overhead of appending to WAL. Depending on the length of the WAL, recreating the cache may take some time to replay the logs.
:brain: You can periodically create a snapshot so you don’t have to replay WAL from the beginning. You can just replay from the latest snapshot.
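The WAL idea can be sketched with an in-memory list standing in for the append-only file; recovery is simply constructing a new cache that replays the log:

```python
# WAL sketch: every cache write is appended to the log first, so that
# after a crash, replaying the log rebuilds the cache. `log` is an
# in-memory list standing in for an append-only file on disk.
class WALCache:
    def __init__(self, log=None):
        self.log = log if log is not None else []
        self.cache = {}
        for key, value in self.log:   # replay the WAL on startup
            self.cache[key] = value

    def write(self, key, value):
        self.log.append((key, value))  # append-only: fast disk write
        self.cache[key] = value
```

A periodic snapshot would let recovery replay only the log suffix written after the snapshot, instead of the whole log.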
Option 3: No Backup
For some systems, the data is so transient that by the time the server restarts and recreates the cache from a backup, the data is already outdated.
- A good example is the driver location update for a ride-sharing service, where the drivers are constantly moving.
- You can just depend on the next update for the latest data instead of recreating from backup.
Option 4: Replication
Cache is a data source like a database. You can have replication for cache, just like a database.
- The pros and cons of cache replication are similar to database replication.
:brain: In a system design interview, if a cache server goes down, you can try to read from another replica to serve the read requests. Also, when the failed leader recovers, you can use the read replicas to rebuild it.
Data Structure for cache
People typically think of a key-value store when they think about a cache, but it doesn’t have to be one. A cache can be any in-memory data structure hosted on a server.
:whale: One advantage of an in-memory data structure is that it is flexible and doesn’t have to fit a database’s structure.
- For example, a trie used in a type-ahead implementation is a cache.
- A quadtree used in location-based system design questions is a cache.
As you use those in-memory data structures in an interview, you should also consider the same set of considerations for a cache.
- For example, if the trie is in-memory, what happens if it goes down? How would you rebuild it?
:rotating_light: A common mistake candidates make with caches is thinking a cache directly improves bandwidth.
- Streaming YouTube or Netflix videos to the user is usually network bound. Having a cache server over the database in the same data center isn’t necessarily going to improve the viewing experience if the network is the bottleneck.
- That’s why people move the data source closer to the user, like CDNs.
Thundering Herd
“the thundering herd problem occurs when a large number of processes or threads waiting for an event are awoken when that event occurs, but only one process is able to handle the event. When the processes wake up, they will each try to handle the event, but only one will win. All processes will compete for resources, possibly freezing the computer, until the herd is calmed down again” - wiki
The purpose of a cache is to scale the number of clients wanting the same data. The question you might have is what happens when the cache is cold, meaning it doesn’t have the data the client is looking for?
- Typically, engineers solve it with a read-through cache.
But what happens if many requests hit the cache simultaneously, all miss, and cause a thundering herd that brings the underlying system down?
- One technique to consider is cache blocking, where only one request is responsible for fetching from the main data source while all the other requests wait.
The problem is, what happens when that one request fails to come back to warm up the cache? The rest of the requests will continue waiting.
- One way to address this is by having a timeout and allowing the next request to fetch from the data source.
:brain: A trade-off discussion you can have with the interviewer: the longer the timeout, the longer requests may wait if the response never comes back; the shorter the timeout, the more requests will hit the data source.
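Cache blocking with a timeout can be sketched with a single lock; `fetch_from_db` and the one-second default timeout are illustrative assumptions:

```python
import threading

# Sketch of cache blocking: the first request takes the lock and
# fetches from the data source while the others wait. If the holder
# never returns, a waiter's acquire() times out and that waiter
# fetches instead of waiting forever.
cache = {}
lock = threading.Lock()

def read_through(key, fetch_from_db, timeout=1.0):
    if key in cache:
        return cache[key]
    acquired = lock.acquire(timeout=timeout)
    try:
        if key in cache:            # another request warmed the cache
            return cache[key]
        value = fetch_from_db(key)  # only this request hits the source
        cache[key] = value
        return value
    finally:
        if acquired:
            lock.release()
```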