What’s the difference between sharding DB tables and partitioning them?
I take sharding to mean the partitioning of a table over multiple machines (over multiple database instances in a distributed database system – each having it’s own replication node setup), whereas partitioning may just refer to the splitting up of a table on the same machine. So a table that is sharded has been partitioned, but a table that has been partitioned has not necessarily been sharded.
Sharding is a method to distribute data across multiple different servers. MongoDB achieves horizontal scalability through sharding.
Sharding is the equivalent of “horizontal partitioning”. When you shard a database, you create replica’s of the schema, and then divide what data is stored in each shard based on a shard key. For example, I might shard my customer database using CustomerId as a shard key – I’d store ranges 0-10000 in one shard and 10001-20000 in a different shard. When choosing a shard key, the DBA will typically look at data-access patterns and space issues to ensure that they are distributing load and space across shards evenly.
A good article from InterviewBit describing the case of sharding. It also explains the case of how to distribute the load evenly across multiple shard with fault tolerance.
Storage with every machine : 10TB
Q: What is the minimum number of machines required to store the data?
As, such H%(S+1) changes for every single key causing us to relocate each and every key in our data store. This is extremely expensive and highly undesirable.
What happens if we need to add another shard ? Or what if one of the shard goes down and we need to re-distribute the data among remaining shards?
Similarily, there is a problem of cascading failure when a shard goes down.
Modified consistent hashing
What if we slightly changed the ring so that instead of one copy per shard, now we have multiple copies of the same shard spread over the ring.
- Case when new shard is added :
- Case when a shard goes down : No cascading failure. Yay!