CAP Theorem

A plain english introduction to CAP Theorem

The CAP Theorem (henceforth ‘CAP’) says that it is impossible to build an implementation of read-write storage in an asynchronous network that satisfies all of the following three properties:

• Availability – will a request made to the data store always eventually complete?
• Consistency – will all executions of reads and writes seen by all nodes be atomic or linearizably consistent?
• Partition tolerance – the network is allowed to drop any messages.

CAP Theorem
wiki

In theoretical computer science, the CAP theorem, also named Brewer’s theorem after computer scientist Eric Brewer, states that it is impossible for a distributed computer system to simultaneously provide more than two out of three of the following guarantees:[1][2][3]

Consistency Availability Partition tolerance
Every read receives the most recent write or an error Every request receives a (non-error) response – without guarantee that it contains the most recent write The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

Note that consistency as defined in the CAP Theorem is quite different from the consistency guaranteed in ACID database transactions.

No distributed system is safe from network failures, thus network partitioning generally has to be tolerated. In the presence of a partition, one is then left with two options: consistency or availability. When choosing consistency over availability, the system will return an error or a time out if particular information cannot be guaranteed to be up to date due to network partitioning. When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning.[4]

In the absence of network failure – that is, when the distributed system is running normally – both availability and consistency can be satisfied.

CAP is frequently misunderstood as if one had to choose to abandon one of the three guarantees at all times. In fact, the choice is really between consistency and availability for when a partition happens only; at all other times, no trade-off has to be made.[5]

Database systems designed with traditional ACID guarantees in mind such as RDBMS choose consistency over availability, whereas systems designed around the BASE philosophy, common in the NoSQL movement for example, choose availability over consistency

Why is CAP true?

The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can’t possibly know about the most recent write. Now you’re faced with a choice: do you respond to the reads with potentially stale information, or do you wait (potentially forever) to hear from the other side of the partition and compromise availability?

This is a proof by construction – we demonstrate a single situation where a system cannot be consistent and available. One reason that CAP gets some press is that this constructed scenario is not completely unrealistic. It is not uncommon for a total partition to occur if networking equipment should fail.