Unique ID generation in distributed systems

Solution at hand –

1. Using UUID – Index size is a key consideration if uuid is used as index. Some UUID types are completely random and have no natural sort.

2. Using a Ticket Server – This is one of the very famous approaches where you can simply maintain a table to store just the latest generated ID and every time a node asks for ID they make a ‘select for update’ on this table, update the value with a incremented value and use the selected value as the next ID.
This approach is resilient and distributed in nature. The ID generation can be separated from the actual data store. However there is a risk of Single Point of Failure as all the nodes rely on this table for the next ID and if this service goes down your app may stop functioning properly.
Additionally  MySQL shards are built as master-master replicant pairs for resiliency. This means we need to be able to guarantee uniqueness within a shard in order to avoid key collisions. We’d love to go on using MySQL auto-incrementing columns for primary keys like everyone else, but MySQL can’t guarantee uniqueness across physical and logical databases.
Also this approach might not be suitable in case where the writes per second are very high because that will overload the Ticket Server and also degrade your app performance.  Flickr Ticketing service

3. Twitter Snowflake – This approach tackles the problem of SPOF as well as the latency issues. Here the ID is generated as a concatenation of timestamp, node ID and Sequence number. The timestamp is the System time measured as number of millisec since EPOCH. 41 bits are allotted to timestamp.
Node ID can be assigned to any physical node when during its startup and it can be retrieved from a shared cache in the cluster. Node ID can occupy next 10 bits.
And the Sequence number can be a monotonically increasing 12 bit number.
Twitter has Snowflake service which is open source.
Pros:
– Snowflake IDs are 64-bits, half the size of a UUID
– Can use time as first component and remain sortable
– Distributed system that can survive nodes dying
Cons: Would introduce additional complexity and more ‘moving parts’ (ZooKeeper, Snowflake servers) into our architecture

4. Instagram built there own stack.  Before starting out, we listed out what features were essential in our system:

  1. * Generated IDs should be sortable by time
  2. IDs should ideally be 64 bits (for smaller indexes, and better storage in systems like Redis)
    The basis for this is the initial bits(40) represent timestamp and the rest of the bit is formed based on other info like – node-id, machine-id.
Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s