
Queue

Highlighted from System Design Interview Fundamentals, Chapter 5 

In asynchronous processing, you most likely need some sort of queue so that events wait in line to be processed.

:thinking: Having a queue is important because consumers can sometimes process events more slowly than producers produce them.

  • For example, a thundering herd of events may break the downstream service or cause events to be dropped.

Just because you have a queue doesn’t make the flow asynchronous.

  • The caller can wait for the queue's result, which makes the flow synchronous.
  • Depending on the queue type, waiting for the queue's post-processing response is often unnecessary and slow.

Here we will focus on when we need an asynchronous workflow, and how we can use a queue to help with reliability and durability.

In an interview, going deeper about the queue of your choice and the trade-offs might be an interesting deep dive area.

Message Queue

In a message queue like Kafka, producers append events to a log for a given topic.

  • A topic is a category that you define.
  • You can partition each Kafka topic into multiple partitions, and each partition maintains its message ordering.

If you need global ordering, having multiple partitions will remove that global ordering guarantee.

  • For each topic partition, there is a consumer.
  • Each consumer maintains a durable offset as message batches are processed and can resume from the offset in case of failure.
  • Also, Kafka retains event logs for a configurable retention period, so consumers that still need events within that window can replay them.
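The partition-plus-offset mechanics above can be sketched with a toy in-memory model (this is illustrative, not real Kafka; the class and method names are assumptions for the sketch):

```python
class PartitionedLog:
    """Toy model of a Kafka-style topic: each partition is an append-only
    log, and ordering is guaranteed only within a single partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, event):
        # Events with the same key always land in the same partition,
        # so per-key ordering is preserved even without global ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(event)
        return p


class Consumer:
    """One consumer per partition; the committed offset lets it resume
    after a crash without reprocessing everything."""

    def __init__(self, log, partition):
        self.log, self.partition = log, partition
        self.committed_offset = 0  # would be stored durably in practice

    def poll(self, max_batch=10):
        start = self.committed_offset
        batch = self.log.partitions[self.partition][start:start + max_batch]
        self.committed_offset += len(batch)  # commit after the batch
        return batch
```

With a single partition you regain global ordering, at the cost of throughput, which is exactly the trade-off discussed above.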

:whale: The advantage of a message queue is that it can handle high throughput by horizontally scaling and adding partitions.

:weary: However, the downside is the complexity of sharding and durability.

In practice, maintaining a durable message queue has a higher maintenance cost.

  • It’s unnecessarily complex for events like a ridesharing request where you only need to process it once.
  • By the time you reprocess the event, the event is already outdated.

In a system design interview, if you have a system that takes in a very high write rate of events, like metrics and logs, a message queue might be a good candidate to hold the data while the downstream does aggregation.

Generally, consumers are partition-aware, which lets the application design for throughput with a message queue.

This adds additional complexity for the application developer to think about which partitions to connect to.

When you do consider a message queue like Kafka, it’s worth considering the following discussion points if it’s relevant to your design interview question:

  1. How do you scale horizontally?
  2. What is your sharding scheme and does it lead to hotspots?
  3. How do you query the result after the consumers finish processing?
  4. What’s the replication scheme for the partitions?
  5. How does rebalancing work when you add or remove partitions?

You can apply the same knowledge in the sharding and replications chapters.

  6. Do you need global ordering? Can you sacrifice throughput by having one partition to guarantee global ordering?
  7. What is the retention period for the partition? The longer you store, the more storage you will need, but the safer it is in case a consumer needs to replay from it. The default for Kafka is 7 days, but you can use it as a trade-off discussion.
  • Each consumer maintains its offset per topic partition.
  • In case of consumer failure in the process, the consumer can replay from the last offset.

Examples: Kafka, Amazon Kinesis

Publisher Subscriber

  • In a pub/sub model, the subscriber subscribes to events based on configuration.
  • Once the producer publishes an event to a message exchange, the exchange will forward the event into each subscriber’s queue based on their subscription configuration.
  • Once the consumer successfully processes the event, the consumer will send an acknowledgment to the queue, and the queue will remove the event.
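The publish/ack flow above can be sketched as a minimal in-memory model (illustrative only; names like `Exchange` are assumptions, not a real broker API):

```python
from collections import deque

class Exchange:
    """Minimal pub/sub sketch: the exchange fans out each published event
    to every matching subscriber's private queue, based on its topic
    subscriptions."""

    def __init__(self):
        self.queues = {}          # subscriber name -> deque of events
        self.subscriptions = {}   # subscriber name -> set of topics

    def subscribe(self, name, topics):
        self.queues[name] = deque()
        self.subscriptions[name] = set(topics)

    def publish(self, topic, event):
        for name, topics in self.subscriptions.items():
            if topic in topics:
                self.queues[name].append((topic, event))

    def pull(self, name):
        # Peek without removing; the event stays queued until acknowledged.
        return self.queues[name][0] if self.queues[name] else None

    def ack(self, name):
        # Only on acknowledgment is the event removed from the queue.
        self.queues[name].popleft()
```

Note how each subscriber's queue is independent: one slow subscriber backs up only its own queue, which is the independence advantage described next.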

This model has the advantage that each subscriber can behave independently from other subscribers.

Unlike a message queue like Kafka, there’s no retention period for the message.

  • Once the subscriber pulls the event out, the event is removed.
  • Not having a retention period eliminates the complexity and maintenance of durability of the message queue.
  • Once the queue removes the message, it’s up to the subscriber to persist the processed data.

How you deal with the durability post queue processing is no longer the queue’s responsibility.

  • Pub/sub is useful where you intend the event to be consumed and removed immediately.
  • For example, if you’re designing a ridesharing service that takes in a message request, that request should be processed as soon as possible and be done with. There’s no need to replay it since 10 minutes later, that request is no longer relevant.

Examples: RabbitMQ, Amazon SQS

Delivery Semantics and Guarantees

  • Delivery semantics and guarantees are important topics to discuss in a system design interview because your selection can impact the performance and the user experience.
  • For example, imagine you’re designing a queue to process payments. Delivering more than once will cause an over-charge, and failing to deliver will result in undercharging.

At-Most-Once

The queue delivers the message at-most-once. You will choose this semantic when the system would rather lose a message than deliver the message twice.

Producer:

  • The producer can achieve this by firing and forgetting, without waiting for the queue’s acknowledgment.
  • If the message never made it to the consumer or the consumer fails to consume the event, the message is lost forever.

Consumer:

  • On the consumer side, the consumer polls the event and removes the message from the queue before processing it.
  • In the event of consumer failure, the system loses the message forever.
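The consumer behavior above can be sketched in a few lines (a toy model, not a real queue client): the message is removed before processing, so a crash mid-processing loses it.

```python
from collections import deque

def consume_at_most_once(queue, process):
    """At-most-once sketch: each message is removed from the queue BEFORE
    it is processed, so a failure during process() loses it forever."""
    while queue:
        msg = queue.popleft()   # removed immediately; no ack round-trip
        try:
            process(msg)
        except Exception:
            pass                # message is already gone; no retry possible
```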

Trade Off:

  • At-most-once semantics results in better throughput because it has a lower overhead by firing and forgetting the events.
  • The downside is you will occasionally lose events.

Use Cases:

  • Metrics collection like server health status is usually a good use case where occasionally losing metrics data isn’t the end of the world, since there is already a large volume.
  • Driver location update in a ridesharing application is another possible use case.
  • Just because you drop a driver’s location at a given point in time doesn’t mean the whole system will stop working. Instead, you will get another update in 5-10 seconds.

At-Least-Once

  • In at-least-once semantics, the queue may contain already processed events.
  • Having already processed events can happen when the consumer fails to acknowledge the queue and remove the event.
  • On subsequent retries, the consumer receives the same message again.

Producer:

  • When the producer sends an event to the queue, if the producer doesn’t receive an acknowledgment and resends the data, it can create duplicate events in the queue.

Consumer:

  • When the consumer polls from the queue and commits the change, but the acknowledgment back to the queue fails, the event will be processed again when the consumer retries.
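The redelivery behavior can be sketched like this (a toy model; the `ack_fails_once` flag is a hypothetical knob that simulates one lost acknowledgment):

```python
from collections import deque

def consume_at_least_once(queue, process, ack_fails_once=False):
    """At-least-once sketch: the message is removed only AFTER a successful
    acknowledgment. If processing succeeds but the ack is lost, the same
    message is redelivered and processed a second time."""
    failed_ack = False
    deliveries = []
    while queue:
        msg = queue[0]               # peek; not removed yet
        process(msg)
        deliveries.append(msg)
        if ack_fails_once and not failed_ack:
            failed_ack = True        # simulated lost ack: msg stays queued
            continue
        queue.popleft()              # ack succeeded -> now remove
    return deliveries
```

The duplicate delivery in the lost-ack case is exactly why the use cases below lean on idempotent processing.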

Trade Off

  • At-least-once semantics guarantees the event won’t be lost but may process the event multiple times. The throughput will be slightly worse than at-most-once because it requires an acknowledgment for both the producer and consumer.

Use Cases

  • A periodic job that parses a file and saves it to the database. Even if the job runs twice, the same file will be parsed and overwrites the record by file_id.
  • The reason is that this parsing and writing is idempotent based on file_id.
  • For notifications, sometimes it may be ok to have the occasional minor end-user annoyance of the same notification in favor of better throughput than exactly once. In this use case, you don’t even need to worry about idempotency if your product is ok with the occasional duplicate.

Exactly-Once

  • Exactly-once semantics ensure the message is processed exactly-once.

  • It does so by having an idempotent key inside of the queue where if you try to insert multiple events into the queue, the queue will dedupe the messages based on the idempotent key.

  • As a result, the throughput is much worse because of the need to dedupe on the idempotent key, in exchange for the exactly-once guarantee.

  • For queues that don’t support the exactly-once guarantee, you can use at-least-once delivery and handle the idempotency at the application level, using your own data store to detect duplicate events.
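That application-level idempotency layer can be sketched as follows (illustrative; here `seen` is an in-memory set standing in for a durable store, e.g. a database table with a unique constraint, which it would have to be in production):

```python
def make_idempotent(process, seen=None):
    """Wrap a handler so duplicate deliveries from an at-least-once queue
    are skipped. Each event carries an idempotency key; keys we've already
    handled are ignored."""
    seen = seen if seen is not None else set()

    def handler(event):
        key = event["idempotency_key"]
        if key in seen:
            return False          # duplicate delivery; skip
        process(event)
        seen.add(key)             # in production: same transaction as process
        return True
    return handler
```

Note the comment on the last step: recording the key and doing the side effect must be atomic, or a crash between them reintroduces the duplicate problem.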

Trade Off

  • The throughput will be the worst here because of the idempotency check, but you get the simplicity of an exact guarantee, which is how most people expect a queue to behave.

Use Case

  • Suppose you need to send a payment request to a third party that does not handle idempotency; you need to ensure the same payment doesn’t get charged twice. Also, if the downstream processing is costly to call, you may want to ensure each event is only handled once instead of wasting resources on reprocessing.

Custom Queue

  • Queues can take many forms. A queue doesn’t have to be a pub/sub or message queue necessarily. Here are some examples:

Durable Priority Queue

  • You can implement a queue using a relational database table.
  • For example, you might have a priority queue table with columns event_id and priority_number.
  • You index on priority_number and fetch the events with the highest priority number to process.
  • Or you can simply have a table of event_id and status, and grab all the events in the pending status.
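The table-backed queue above can be sketched with SQLite (the schema follows the column names in the text; the status-flip claim logic is simplified and single-worker, since a real multi-worker setup would need row locking, e.g. SELECT ... FOR UPDATE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE priority_queue (
    event_id        TEXT PRIMARY KEY,
    priority_number INTEGER NOT NULL,
    status          TEXT NOT NULL DEFAULT 'pending')""")
# Index so the highest-priority pending event is cheap to find.
conn.execute("CREATE INDEX idx_priority ON priority_queue (priority_number)")

def enqueue(event_id, priority_number):
    conn.execute(
        "INSERT INTO priority_queue (event_id, priority_number) VALUES (?, ?)",
        (event_id, priority_number))

def dequeue():
    # Claim the highest-priority pending event by flipping its status,
    # which also gives the durability a plain in-memory heap lacks.
    row = conn.execute("""SELECT event_id FROM priority_queue
                          WHERE status = 'pending'
                          ORDER BY priority_number DESC LIMIT 1""").fetchone()
    if row is None:
        return None
    conn.execute("UPDATE priority_queue SET status = 'processing' "
                 "WHERE event_id = ?", (row[0],))
    return row[0]
```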

In-Memory FIFO Queue

  • The queue can also be an in-memory Java concurrent queue on the application server.
  • In a system design interview, just make sure you develop a queue solution that makes the most sense for your application.
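A sketch of that in-memory FIFO, using Python's thread-safe `queue.Queue` as a stand-in for a Java concurrent queue (bounded, so a slow consumer applies back-pressure to the producer):

```python
import queue
import threading

# Bounded FIFO shared between producer and consumer threads on the same
# application server; put() blocks when full, which is the back-pressure.
q = queue.Queue(maxsize=100)

def producer(events):
    for e in events:
        q.put(e)        # blocks if the queue is full
    q.put(None)         # sentinel: no more events

def consumer(out):
    while True:
        e = q.get()
        if e is None:
            break
        out.append(e)   # "process" in FIFO order
```

Usage: start the consumer in a thread, run the producer, and join. Remember this queue lives and dies with the process, so it offers no durability.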

In-Memory Priority Queue

  • A max heap data structure can back the queue, with the root holding the event with the highest priority.
  • In a system design interview, you need to consider concurrent threads accessing the heap and the throughput implications.
  • Also, since it is in-memory, you need to worry about what happens when the queue goes down.
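Both concerns can be seen in a short sketch: a lock guards concurrent access, and nothing guards against process death. (Illustrative; Python's `heapq` is a min-heap, so priorities are negated to get max-heap behavior.)

```python
import heapq
import threading

class MaxPriorityQueue:
    """In-memory max-heap guarded by a lock so concurrent threads can
    push/pop safely. Everything here is lost if the process dies."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()
        self._count = 0  # tie-breaker so equal priorities stay FIFO

    def push(self, priority, event):
        with self._lock:
            # Negate priority: heapq's min-heap root becomes the max item.
            heapq.heappush(self._heap, (-priority, self._count, event))
            self._count += 1

    def pop(self):
        with self._lock:
            if not self._heap:
                return None
            return heapq.heappop(self._heap)[2]
```

The single lock is the throughput implication mentioned above: every push and pop serializes through it.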