Exponential Backoff
Highlighted from System Design Interview Fundamentals, Chapter 5
In system design, a downstream dependency can have temporary errors that resolve on their own over time:
- the server may be busy because of a sudden, unexpected spike in traffic,
- a buggy release may later be rolled back, or
- network congestion may have caused a timeout.
The client may choose to retry certain types of errors, anticipating that the problem will be gone by the next attempt.
- However, frequent retries can cause a snowball effect: retried requests pile up on top of regular traffic, adding load to a downstream that is already struggling.
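This is usually mitigated with exponential backoff and a maximum number of retries: the wait between attempts grows exponentially (for example, 1 s, 2 s, 4 s, 8 s), and the client stops after a fixed number of attempts.
- At some point, the error may not resolve on its own. Reaching the maximum number of retries most likely signals a true downstream failure that requires manual intervention.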
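A minimal sketch of retrying with exponential backoff and a maximum number of retries, assuming a Python client. The names `retry_with_backoff` and `TransientError`, the defaults (1 s base delay, 5 retries), and the 30 s cap on a single wait are hypothetical choices for illustration, not something the text above prescribes. The `sleep` parameter is injectable so the delay schedule can be observed without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def retry_with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,  # assumed cap on a single wait
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Invoke `call`, retrying on TransientError with exponentially growing waits.

    The wait before retry n (0-indexed) is min(base_delay * 2**n, max_delay).
    Once max_retries retries have failed, the last error is re-raised so it
    can be surfaced for manual intervention.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: likely a true downstream failure
            sleep(min(base_delay * (2 ** attempt), max_delay))
    raise AssertionError("unreachable when max_retries >= 0")


# Usage: a hypothetical downstream call that fails twice, then succeeds.
delays: list[float] = []
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("server busy")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Only errors classified as transient are retried here; any other exception propagates immediately, which matches the idea of retrying only on certain types of errors.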
:brain: In a system design interview, it is helpful to discuss what happens if there’s a temporary first- or third-party downstream failure.
- A larger exponential backoff will :weary: delay processing but :whale: relieve the downstream of excessive client retries.
- A smaller exponential backoff will :whale: get a quicker response once the service is back up, but at the expense of :weary: more stress on the downstream service.
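To put rough numbers on this trade-off, the sketch below compares two hypothetical base delays (100 ms and 1000 ms), each doubling for up to 5 retries. The helper names and values are illustrative assumptions, and request latency is ignored.

```python
from itertools import accumulate


def backoff_schedule(base_ms: int, max_retries: int, factor: int = 2) -> list[int]:
    """Wait (ms) before each retry n = 0..max_retries-1: base_ms * factor**n."""
    return [base_ms * factor**n for n in range(max_retries)]


def retries_within(schedule: list[int], window_ms: int) -> int:
    """Count retries fired within window_ms of the first failure (latency ignored)."""
    return sum(1 for t in accumulate(schedule) if t <= window_ms)


for base_ms in (100, 1000):
    waits = backoff_schedule(base_ms, max_retries=5)
    print(base_ms, waits, sum(waits), retries_within(waits, 2000))
# 100 [100, 200, 400, 800, 1600] 3100 4
# 1000 [1000, 2000, 4000, 8000, 16000] 31000 1
```

With these assumed numbers, the smaller base sends 4 retries to the downstream in the first 2 seconds versus 1 for the larger base, while the larger base waits up to 31 seconds in total before giving up, versus 3.1 seconds.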