Twitter Snowflake ID generator
- Chapter 7 of System Design Interview – An Insider’s Guide
- Twitter Engineering blog
- Twitter has since deprecated this project; the original implementation was written in Scala
Requirement
- From the Twitter Engineering blog:
We needed something that could generate tens of thousands of ids per second in a highly available manner. This naturally led us to choose an uncoordinated approach. These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how we and most Twitter clients sort tweets. Additionally, these numbers have to fit into 64 bits. We’ve been through the painful process of growing the number of bits used to store tweet ids before. It’s unsurprisingly hard to do when you have over 100,000 different codebases involved.
Idea
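- Embed the timestamp in the high-order bits, so IDs sort roughly by creation time!
- Layout of the 64 bits, from most to least significant:
- Sign bit: 1 bit, reserved and always 0, so the ID stays positive as a signed 64-bit integer.
- Timestamp: 41 bits. Milliseconds since the Unix epoch or a custom epoch.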
- Datacenter ID: 5 bits, which gives us 2 ^ 5 = 32 data centers.
- Machine ID: 5 bits, which gives us 2 ^ 5 = 32 machines per data center.
- Sequence number: 12 bits.
- 12 bits give only 2^12 = 4096 distinct values, so how is that enough? The clever part is that the sequence number resets to 0 every millisecond, so each machine can generate up to 4096 unique IDs per millisecond!
- The 41-bit timestamp can count up to 2^41 - 1 = 2,199,023,255,551 ms, which is about 69 years from the chosen epoch.
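A minimal Python sketch of the 1/41/5/5/12 layout and the per-millisecond sequence reset described above. This is not Twitter's Scala code. The class name `SnowflakeGenerator`, the `decode` helper, the injectable `clock` argument, and the 2020-01-01 custom epoch are all hypothetical choices made for illustration. The sketch also makes one assumption that goes beyond these notes: if a machine uses all 4096 sequence values within a single millisecond, it busy-waits until the next millisecond.

```python
import threading
import time

# Bit widths from the layout above; the sign bit is implicit (always 0).
TIMESTAMP_BITS = 41
DATACENTER_BITS = 5
MACHINE_BITS = 5
SEQUENCE_BITS = 12

MAX_DATACENTER_ID = (1 << DATACENTER_BITS) - 1  # 31
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 31
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1  # 4095
MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1  # about 69 years of milliseconds

# The sequence sits in the lowest 12 bits; each field above it is shifted left.
MACHINE_SHIFT = SEQUENCE_BITS  # 12
DATACENTER_SHIFT = SEQUENCE_BITS + MACHINE_BITS  # 17
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS + DATACENTER_BITS  # 22

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000


def _wall_clock_ms():
    """Current Unix time in milliseconds."""
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Hypothetical Snowflake-style generator for one (datacenter, machine) pair."""

    def __init__(self, datacenter_id, machine_id, epoch_ms=CUSTOM_EPOCH_MS, clock=_wall_clock_ms):
        if not 0 <= datacenter_id <= MAX_DATACENTER_ID:
            raise ValueError(f"datacenter_id must be in [0, {MAX_DATACENTER_ID}]")
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError(f"machine_id must be in [0, {MAX_MACHINE_ID}]")
        self._datacenter_id = datacenter_id
        self._machine_id = machine_id
        self._epoch_ms = epoch_ms
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()  # serializes calls within this process
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Reusing an earlier millisecond could repeat (timestamp, sequence) pairs.
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # All 4096 values used in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                # New millisecond: the sequence number resets to 0.
                self._sequence = 0
            self._last_ms = now

            elapsed = now - self._epoch_ms
            if not 0 <= elapsed <= MAX_TIMESTAMP:
                raise OverflowError("timestamp does not fit in 41 bits for this epoch")
            return (
                (elapsed << TIMESTAMP_SHIFT)
                | (self._datacenter_id << DATACENTER_SHIFT)
                | (self._machine_id << MACHINE_SHIFT)
                | self._sequence
            )


def decode(snowflake_id, epoch_ms=CUSTOM_EPOCH_MS):
    """Split an ID back into its fields (hypothetical helper)."""
    return {
        "timestamp_ms": (snowflake_id >> TIMESTAMP_SHIFT) + epoch_ms,
        "datacenter_id": (snowflake_id >> DATACENTER_SHIFT) & MAX_DATACENTER_ID,
        "machine_id": (snowflake_id >> MACHINE_SHIFT) & MAX_MACHINE_ID,
        "sequence": snowflake_id & SEQUENCE_MASK,
    }
```

For example, `SnowflakeGenerator(datacenter_id=1, machine_id=3).next_id()` returns a positive int below 2^63. Because the timestamp occupies the high bits, later IDs compare larger. The lock only guarantees uniqueness within one process. Across processes, uniqueness depends on each process having a distinct (datacenter_id, machine_id) pair. Raising an error when the clock moves backwards is one conservative choice, not the only possible one.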
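Here is a quick back-of-the-envelope check of the bit budget in the 64-bit layout. It assumes 365-day years and assumes every one of the 32 × 32 machines runs at its full per-millisecond limit, so the last line is a theoretical ceiling rather than a measured rate:

```latex
\begin{aligned}
\text{lifespan} &= \frac{2^{41} - 1\ \text{ms}}{1000 \cdot 60 \cdot 60 \cdot 24 \cdot 365\ \text{ms/year}}
  = \frac{2{,}199{,}023{,}255{,}551}{31{,}536{,}000{,}000} \approx 69.7\ \text{years} \\
\text{IDs per machine} &= 2^{12}\ \text{per ms} = 4096 \times 1000 = 4{,}096{,}000\ \text{per second} \\
\text{machines} &= 2^{5} \times 2^{5} = 1024 \\
\text{theoretical peak} &= 4{,}096{,}000 \times 1024 = 4{,}194{,}304{,}000\ \text{IDs per second}
\end{aligned}
```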