
Exponential Backoff

Highlighted from System Design Interview Fundamentals, Chapter 5 

In system design, a downstream service can have temporary errors that go away over time.

  • The server could be busy because of a sudden, unexpected spike in traffic.
  • A bug could have been deployed and later rolled back.
  • Network congestion could have caused a timeout.

The client may decide to retry on certain types of errors, anticipating that the problem will be resolved by the next attempt.

  • However, retrying too frequently can cause a snowball effect: the retries pile on top of new requests and add load to a downstream that is already struggling.

This is usually mitigated with exponential backoff, where the wait between retries grows with each failed attempt, combined with a maximum number of retries.

  • At some point, the error may not be fixable by retrying. Once the maximum number of retries is reached, manual intervention is most likely required, since this signals a true downstream failure.
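
The retry loop described above can be sketched roughly as follows. This is a minimal illustration, not a production implementation: `TransientError`, `call_with_backoff`, and the default values for `max_retries`, `base_delay`, and `factor` are all hypothetical names and numbers chosen for the example. The wait before retry number n is assumed to be `base_delay * factor**n`.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that may clear up on their own."""


def call_with_backoff(operation, max_retries=5, base_delay=0.5, factor=2.0,
                      sleep=time.sleep):
    """Call operation() and retry on TransientError with growing waits.

    The wait before retry n (starting at 0) is base_delay * factor**n.
    Once max_retries retries have failed, the last error is re-raised so
    the caller can escalate, for example by alerting an operator.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except TransientError:
            if attempt >= max_retries:
                # Retries exhausted: treat this as a true downstream failure.
                raise
            sleep(base_delay * factor ** attempt)
            attempt += 1
```

Only `TransientError` is retried here; any other exception propagates immediately, which mirrors the idea of retrying on certain types of errors only. The `sleep` parameter is injectable so the waits can be observed without actually pausing.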

:brain: In a system design interview, it is helpful to discuss what happens when a first-party or third-party downstream has a temporary failure.

  • A bigger exponential backoff will :weary: delay processing but :whale: ease the load on the downstream by reducing the number of client retries.
  • A smaller exponential backoff will :whale: get a quicker response once the service is back up, but at the expense of :weary: more stress on the downstream service.
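
To make the trade-off concrete, the sketch below compares two wait schedules. The helper `backoff_schedule` and the specific numbers (100 ms versus 1000 ms base delay, doubling factor, five retries) are assumptions picked for illustration only.

```python
def backoff_schedule(base_delay_ms, factor, max_retries):
    """Return the wait in milliseconds before each retry: base * factor**n."""
    return [base_delay_ms * factor ** n for n in range(max_retries)]


# Smaller backoff: retries arrive quickly, so recovery is noticed sooner,
# but the downstream sees all five retries within a few seconds.
small = backoff_schedule(base_delay_ms=100, factor=2, max_retries=5)

# Bigger backoff: the same five retries are spread over a much longer window.
big = backoff_schedule(base_delay_ms=1000, factor=2, max_retries=5)

print(small, sum(small))
print(big, sum(big))
```

Under these assumed values, the smaller schedule waits 100, 200, 400, 800, and 1600 ms, giving up after about 3.1 seconds in total, while the bigger one waits 1, 2, 4, 8, and 16 seconds, about 31 seconds in total. The same number of retries is spread over a window ten times longer, which is gentler on the downstream but slower to finish processing.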