Failure/Error handling

wiki

chapter 6 of System Design Interview – An insider’s guide

Gossip protocol

A decentralized failure detection methods
The concept of gossip communication can be illustrated by the analogy of office workers spreading rumors
Computer systems typically implement this type of protocol with a form of random “peer selection”: with a given frequency, each machine picks another machine at random and shares any rumors.
Gossip protocols are just one class among many classes of networking protocols. Each protocols differs in their details and performance properties but similar at the level of the guarantees offered to users.

How?

Each node periodically sends heartbeats to a set of random nodes, which in turn propagate to another set of nodes.
Once nodes receive heartbeats, membership list is updated to the latest info.
If the heartbeat for a node X has not increased for more than predefined periods from node Y’s perspective, plus once other nodes confirm that heartbeat counter of node X has not been updated for a long time, node X is marked down, and this information is propagated to other nodes.

After failures have been detected through the gossip protocol, the system needs to deploy certain mechanisms to ensure availability.

Sloppy quorum

Sloppy quorum is used to improve availability. Instead of enforcing the quorum requirement, the system chooses the first W healthy servers for writes and first R healthy servers for reads on the hash ring. Offline servers are ignored. (Hence “sloppy”)
If a server is unavailable due to network or server failures, another server will process requests temporarily.
When the down server is up, changes will be pushed back to achieve data consistency. This process is called hinted handoff, which is used to handle temporary failures.
What if a replica is permanently unavailable?

“Anti-entropy protocol” keeps replicas in sync, which involves comparing each piece of data on replicas and updating each replica to the newest version.
A Merkle tree is used for inconsistency detection and minimizing the amount of data transferred.

Merkle tree

“A hash tree or Merkle tree is a tree in which every non-leaf node is labeled with the hash of the labels or values (in case of leaves) of its child nodes.
To compare two Merkle trees, start by comparing the root hashes. If root hashes match, both servers have the same data. If root hashes disagree, then the left child hashes are compared followed by right child hashes.
You can traverse the tree to find which buckets are not synchronized and synchronize those buckets only.
Using Merkle trees, the amount of data needed to be synchronized is proportional to the differences between the two replicas, and not the amount of data they contain.