Service Discovery and Request Routing
- chapter 6 of System Design Interview – An insider’s guide
Load Balancing
- Use a load balancer when you need to distribute the load to a set of servers.
- Unless the load balancer significantly impacts your design, you don’t need to spend too much time here, since it’s generic across questions.
- There are various load balancing algorithms. You can round robin across the servers, or monitor the hosts for health (CPU, memory, bandwidth) and forward each request to the most underutilized one.
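The two algorithms above can be sketched side by side. This is a minimal illustration, not a production balancer; the host names and the utilization scores are made-up assumptions.

```python
import itertools

class LoadBalancer:
    """Sketch of two selection strategies: round robin vs. least-utilized."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self._cycle = itertools.cycle(self.hosts)

    def round_robin(self):
        # Cycle through hosts in order, ignoring their current load.
        return next(self._cycle)

    def least_utilized(self, utilization):
        # utilization: host -> load score (e.g. CPU fraction); pick the lowest.
        return min(self.hosts, key=lambda h: utilization[h])

lb = LoadBalancer(["host-a", "host-b", "host-c"])
picks = [lb.round_robin() for _ in range(4)]  # wraps back to host-a
target = lb.least_utilized({"host-a": 0.9, "host-b": 0.2, "host-c": 0.6})
```

Round robin needs no health data, which is why it is the default answer; the health-aware variant only pays off if the balancer actually receives fresh utilization metrics.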
Shard Discovery
- A shard discovery service tells you which shard a request belongs to and which node in that shard to forward the request to.
- A shard is usually a logical concept, since there can be many nodes within a shard.
- Usually, the mapping is stored in a data store like ZooKeeper.
- There are 2 reasonable approaches to maintaining the mapping.
Option 1: Clients Call A Central Service
- For each request, the client hits a partition-aware service, such as ZooKeeper itself.
- The advantage is simplicity since you only have a central service that manages the mapping.
- The downside is the additional latency of an interprocess call from the client to a separate service on every request.
- A central service is also harder to scale, since all requests must hit it.
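Option 1 can be sketched as follows. This is a toy stand-in for a ZooKeeper-like store, assuming a simple hash-modulo shard assignment; the node names, the `lookups` counter, and the hash function are illustrative, not from the book.

```python
def hash_key(key):
    # Deterministic toy hash (Python's built-in hash() is randomized per process).
    return sum(key.encode())

class ShardDiscoveryService:
    """Stands in for a central, partition-aware service (e.g. ZooKeeper)."""

    def __init__(self, shard_to_node):
        self.shard_to_node = dict(shard_to_node)
        self.lookups = 0  # counts the extra hop every request pays

    def node_for(self, key):
        self.lookups += 1
        shard = hash_key(key) % len(self.shard_to_node)
        return self.shard_to_node[shard]

svc = ShardDiscoveryService({0: "node-0", 1: "node-1", 2: "node-2"})
nodes = [svc.node_for(k) for k in ("user:1", "user:2", "user:3")]
# Three requests -> three calls to the central service.
```

The `lookups` counter makes the cost visible: the central service is on the hot path of every request, which is both the latency penalty and the scaling bottleneck.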
Option 2: Client is Node Aware
- Each client maintains its own copy of the shard-to-node mapping, fetched from the discovery service when the client starts up.
- The advantage is improved latency, since you don’t need to call the shard discovery service on every request.
- The disadvantages are
    - the complexity of updating every client’s mapping whenever the configuration changes.
    - weaker consistency: if the push to a client after a config update is delayed, that client routes using a stale mapping.
- In a system design interview, this may be an interesting discussion if your system is latency-sensitive.
- You can consider option 2 to improve that latency at the cost of occasionally hitting the wrong node if the client mapping isn’t successfully updated.
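Option 2 can be sketched like this. The discovery stub, the `refresh()` hook, and the hash are assumptions for illustration; the point is that the hot path makes no remote call.

```python
class Discovery:
    """Toy stand-in for the central mapping store."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)
        self.calls = 0  # how often clients actually hit the store

    def fetch_mapping(self):
        self.calls += 1
        return dict(self._mapping)

class NodeAwareClient:
    """Option 2: fetch shard -> node once at startup, route locally."""

    def __init__(self, discovery):
        self.discovery = discovery
        self.mapping = discovery.fetch_mapping()  # one call at startup

    def node_for(self, key):
        shard = sum(key.encode()) % len(self.mapping)
        return self.mapping[shard]  # no remote call on the hot path

    def refresh(self):
        # Called on a config change. If this update is delayed, the client
        # keeps routing with a stale mapping and may hit the wrong node.
        self.mapping = self.discovery.fetch_mapping()

disc = Discovery({0: "node-0", 1: "node-1"})
client = NodeAwareClient(disc)
for k in ("a", "b", "c"):
    client.node_for(k)
# disc.calls stays at 1: only the startup fetch, no per-request hop.
```

Comparing the call counters against the Option 1 sketch shows the trade-off directly: Option 2 trades per-request lookups for the refresh/staleness problem in the bullets above.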