Unique ID generation in distributed systems

Solutions at hand –

1. Using UUID – Index size is a key consideration if a UUID is used as an index key.
Pros – Each application thread generates IDs independently, minimizing points of failure and contention for ID generation. If you use a timestamp as the first component of the ID, the IDs remain time-sortable.
Cons – Generally requires more storage space (96 bits or higher) to make reasonable uniqueness guarantees. Some UUID types are completely random and have no natural sort order.
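
A minimal sketch with Python's standard uuid module, showing both flavours (uuid4 is fully random; uuid1 embeds a timestamp and the host's MAC address):

import uuid

# uuid4(): a fully random 128-bit ID -- generated independently on every
# thread, but with no natural sort order, which hurts index locality.
random_id = uuid.uuid4()

# uuid1(): embeds a 60-bit timestamp plus the host's MAC address. Note that
# the timestamp fields are laid out low-word first, so the string form is
# not lexicographically time-sortable; the .time attribute is.
time_based_id = uuid.uuid1()

print(random_id)
print(time_based_id, time_based_id.time)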

2. Using a Ticket Server – This is a well-known approach: maintain a table that stores just the latest generated ID, and every time a node needs an ID it issues a 'SELECT ... FOR UPDATE' on this table, updates the value to an incremented one, and uses the selected value as the next ID (a minimal sketch of this flow follows the cons below).
The ID generation is separated from the actual data store, and the IDs come out sequential. However, there is a risk of a single point of failure: every node relies on this table for the next ID, and if the service goes down your application may stop functioning properly.
At Flickr, for example, the MySQL shards are built as master-master replicant pairs for resiliency, which means uniqueness must be guaranteed within a shard to avoid key collisions; MySQL auto-incrementing columns alone can't guarantee uniqueness across physical and logical databases, which is why Flickr runs a dedicated ticketing service (described in detail below).
This approach may also be unsuitable when writes per second are very high, because that load will overwhelm the ticket server and degrade application performance.
Cons
Can eventually become a write bottleneck (though Flickr reports that, even at huge scale, it’s not an issue).
An additional couple of machines (or EC2 instances) to admin.
If using a single DB, becomes single point of failure. If using multiple DBs, can no longer guarantee that they are sortable over time.
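
A minimal sketch of the SELECT ... FOR UPDATE flavour described above, assuming a hypothetical single-row counter table (CREATE TABLE tickets (name VARCHAR(32) PRIMARY KEY, next_id BIGINT) on InnoDB) and any DB-API connection such as pymysql; the REPLACE INTO variant Flickr actually uses is shown further down.

def next_id(conn, name="default"):
    """Fetch-and-increment the counter row inside one transaction."""
    with conn.cursor() as cur:
        # Lock the counter row so concurrent callers serialize here.
        cur.execute("SELECT next_id FROM tickets WHERE name = %s FOR UPDATE", (name,))
        (current,) = cur.fetchone()
        cur.execute("UPDATE tickets SET next_id = next_id + 1 WHERE name = %s", (name,))
    conn.commit()
    return current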

3. Twitter Snowflake –
Snowflake is a service used to generate unique IDs for objects within Twitter (Tweets, Direct Messages, Users, Collections, Lists etc.). These IDs are unique 64-bit unsigned integers, which are based on time, instead of being sequential. The full ID is composed of a timestamp, a worker number, and a sequence number.

This approach tackles the problem of a SPOF as well as the latency issues (a bit-packing sketch follows the pros and cons below).
– Here the ID is generated as a concatenation of a timestamp, a node ID and a sequence number. 41 bits are allotted to the timestamp; because it occupies the high-order bits, the IDs come out roughly sorted by time.
– A node ID can be assigned to each physical node during its startup and retrieved from a shared cache in the cluster. The node ID occupies the next 10 bits, and these numbers are coordinated by ZooKeeper.
– The sequence number is a 12-bit counter that increments for each ID generated within the same millisecond and resets when the millisecond ticks over.
Twitter's Snowflake service is open source.
Pros:
– Snowflake IDs are 64 bits, half the size of a UUID
– Can use time as first component and remain sortable
– Distributed system that can survive nodes dying
Cons: 
Would introduce additional complexity and more ‘moving parts’ (ZooKeeper, Snowflake servers) into our architecture
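
A minimal single-process sketch of the 41/10/12 layout described above; the class name, epoch handling and locking are illustrative, not Twitter's actual implementation.

import time
import threading

CUSTOM_EPOCH_MS = 1288834974657  # Snowflake's custom epoch (Nov 2010)

class SnowflakeLike:
    # Packs a 41-bit timestamp, a 10-bit node ID and a 12-bit sequence
    # into one 64-bit integer. Illustrative sketch only.
    def __init__(self, node_id):
        assert 0 <= node_id < 1024          # node ID must fit in 10 bits
        self.node_id = node_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def _now_ms(self):
        return int(time.time() * 1000) - CUSTOM_EPOCH_MS

    def next_id(self):
        with self.lock:
            now = self._now_ms()
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit counter
                if self.sequence == 0:                        # 4096 IDs used this ms
                    while now <= self.last_ms:                # spin to the next ms
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.node_id << 12) | self.sequence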

4. Instagram built their own stack. Before starting out, they listed the features that were essential in their system:

  1. Generated IDs should be sortable by time
  2. IDs should ideally be 64 bits (for smaller indexes, and better storage in systems like Redis)
    The basis for this is that the initial 41 bits represent the timestamp and the remaining bits are formed from other information such as the logical shard ID and a per-shard sequence.

Each of their IDs consists of:
–   41 bits for time in milliseconds (gives them 41 years of IDs with a custom epoch)
–  13 bits that represent the logical shard ID
–  10 bits that represent an auto-incrementing sequence, modulo 1024. This means they can generate 1,024 IDs per shard, per millisecond.

This looks a lot like Twitter's Snowflake approach.
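
A rough sketch of packing an ID with that 41/13/10 layout (Instagram's own write-up generates these inside PostgreSQL; the Python below only illustrates the bit arithmetic, and the epoch value here is made up):

import time

CUSTOM_EPOCH_MS = 1400000000000  # illustrative custom epoch, not Instagram's

def make_id(shard_id, per_shard_seq):
    # 41 bits of time at the top of the word, then 13 bits of logical
    # shard ID, then the low 10 bits of a per-shard sequence.
    ms = int(time.time() * 1000) - CUSTOM_EPOCH_MS
    result = ms << (64 - 41)
    result |= (shard_id % 8192) << (64 - 41 - 13)
    result |= per_shard_seq % 1024
    return result

print(make_id(shard_id=5, per_shard_seq=7001))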




Ticket Servers: Distributed Unique Primary Keys on the Cheap [Link]

Ticket servers aren’t inherently interesting, but they’re an important building block at Flickr. They are core to topics we’ll be talking about later, like sharding and master-master. Ticket servers give us globally (Flickr-wide) unique integers to serve as primary keys in our distributed setup.

Why?

Sharding (aka data partitioning) is how we scale Flickr’s datastore. Instead of storing all our data on one really big database, we have lots of databases, each with some of the data, and spread the load between them. Sometimes we need to migrate data between databases, so we need our primary keys to be globally unique.
Additionally our MySQL shards are built as master-master replicant pairs for resiliency. This means we need to be able to guarantee uniqueness within a shard in order to avoid key collisions. We’d love to go on using MySQL auto-incrementing columns for primary keys like everyone else, but MySQL can’t guarantee uniqueness across physical and logical databases.

GUIDs?

Given the need for globally unique ids the obvious question is, why not use GUIDs? Mostly because GUIDs are big, and they index badly in MySQL. One of the ways we keep MySQL fast is we index everything we want to query on, and we only query on indexes. So index size is a key consideration. If you can’t keep your indexes in memory, you can’t keep your database fast. Additionally ticket servers give us sequentiality which has some really nice properties including making reporting and debugging more straightforward, and enabling some caching hacks.

Consistent Hashing?

Some projects, like Amazon’s Dynamo, provide a consistent hashing ring on top of the datastore to handle the GUID/sharding issue. This is better suited to write-cheap environments (e.g. LSM-tree based stores), while MySQL is optimized for fast random reads.

Centralizing Auto-Increments

If we can’t make MySQL auto-increments work across multiple databases, what if we just used one database? If we inserted a new row into this one database every time someone uploaded a photo we could then just use the auto-incrementing ID from that table as the primary key for all of our databases.

Of course at 60+ photos a second that table is going to get pretty big. We can get rid of all the extra data about the photo, and just have the ID in the centralized database. Even then the table gets unmanageably big quickly. And there are comments, and favorites, and group postings, and tags, and so on, and those all need IDs too.

REPLACE INTO

A little over a decade ago MySQL shipped with a non-standard extension to the ANSI SQL spec, “REPLACE INTO”. Later “INSERT ON DUPLICATE KEY UPDATE” came along and solved the original problem much better. However REPLACE INTO is still supported.

REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.

This allows us to atomically update a single row in place and get back a new auto-incremented primary ID.

Putting It All Together

A Flickr ticket server is a dedicated database server, with a single database on it, and in that database there are tables like Tickets32 for 32-bit IDs, and Tickets64 for 64-bit IDs.

The Tickets64 schema looks like:

CREATE TABLE `Tickets64` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM

SELECT * from Tickets64 returns a single row that looks something like:

+-------------------+------+
| id                | stub |
+-------------------+------+
| 72157623227190423 |    a |
+-------------------+------+

When I need a new globally unique 64-bit ID I issue the following SQL:

REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
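
A minimal client-side sketch of that two-statement dance, assuming a pymysql connection to the ticket database (connection parameters are illustrative):

import pymysql

def get_ticket(conn):
    # REPLACE INTO rewrites the single row keyed by the 'stub' unique index,
    # which bumps the table's auto_increment value; LAST_INSERT_ID() then
    # reads that value back on the same connection.
    with conn.cursor() as cur:
        cur.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
        cur.execute("SELECT LAST_INSERT_ID()")
        (ticket,) = cur.fetchone()
    return ticket

conn = pymysql.connect(host="ticketserver1", user="app", password="...", database="tickets")
print(get_ticket(conn))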

SPOFs

You really really don’t know want provisioning your IDs to be a single point of failure. We achieve “high availability” by running two ticket servers. At this write/update volume replicating between the boxes would be problematic, and locking would kill the performance of the site. We divide responsibility between the two boxes by dividing the ID space down the middle, evens and odds, using:

TicketServer1:
auto-increment-increment = 2
auto-increment-offset = 1

TicketServer2:
auto-increment-increment = 2
auto-increment-offset = 2

We round-robin between the two servers to load balance and to deal with downtime. The sides do drift a bit out of sync; I think we have a few hundred thousand more odd-numbered objects than even-numbered objects at the moment, but this hurts no one.
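
A rough sketch of that client-side round robin with failover between the two boxes (host names and the retry policy are illustrative; get_ticket() is the helper from the sketch above):

import itertools
import pymysql

TICKET_HOSTS = ["ticketserver1", "ticketserver2"]
_rotation = itertools.cycle(range(len(TICKET_HOSTS)))

def get_ticket_ha():
    # Alternate which server we try first; if it is unreachable, fall back
    # to the other one, which hands out the other half of the ID space.
    start = next(_rotation)
    for offset in range(len(TICKET_HOSTS)):
        host = TICKET_HOSTS[(start + offset) % len(TICKET_HOSTS)]
        try:
            conn = pymysql.connect(host=host, user="app", password="...", database="tickets")
            try:
                return get_ticket(conn)
            finally:
                conn.close()
        except pymysql.MySQLError:
            continue  # that box is down; try the other one
    raise RuntimeError("both ticket servers are down")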


Comments

  1. When using the node ID for Snowflake or the MAC address for Boundary’s Flake, you’re relying on that value being unique to ensure each ID is unique. If there’s a ZooKeeper system, it just means something else is coordinating the original uniqueness. With MAC addresses you rely on the assumption that the MAC addresses are unique within a cluster. Coordination between disparate systems can be minimised if we can always assume that the underlying system starts off as unique as well. In Boundary’s Erlang Flake, they expect that you only run one service per network interface. But if you run more than one, then you need something to coordinate the uniqueness between them.

