PACELC theorem

In any distributed system, different kinds of failure can happen like network loss or device failure in a machine etc. So What are the guiding principles says about the desirable balance between various characteristics of distributed system.

CAP theorem

According to this theorem, its impossible for distributed system to provide all these three characteristics at the same time.

C – Consistency: All nodes in distributed system should have same data at the same time. We can achieve consistency by updating every node with latest write action so user can read up-to date data.
A – Availability: All nodes should be available for reading of data and system should be available.
P – Partition tolerance: There is network failure and because of that, two nodes are unable to communicate with each other.

Read more about CAP theorem from here.

According to this theorem, we can achieve only two properties out of these three.

Consistency & Availability: If there is no network failure, then definitely as all nodes are available for reading and all nodes will always be updated with latest data. Hence, user will always get the consistent data through out all available nodes.

Availability & Partition tolerance: If there is a network failure between two nodes of distributed system, nodes won’t have consistent data as two nodes cannot communicate with each other and one of the node does not have up-to-date data. Example: When user updates his profile, write action occurs on one of the node. However, due to network failure, other nodes of system cannot receive recent data and whenever that node will serve the request, user won’t get consistent data.

Consistency & Partition tolerance: To achieve consistency in case of network failure, Node which has out of date data should stop serving requests but then there won’t be 100% availability of all nodes.

What is missing in the CAP theorem?

One place where the CAP theorem is silent is what happens when there is no network partition? What choices does a distributed system have when there is no partition?

PACELC theorem

The PACELC theorem states that in a system that replicates data:

  • if there is a partition (‘P’), a distributed system can tradeoff between availability and consistency (i.e., ‘A’ and ‘C’);
  • else (‘E’), when the system is running normally in the absence of partitions, the system can tradeoff between latency (‘L’) and consistency (‘C’).
PACELC theorem

In this theorem, first part is the CAP theorem however ELC is the extension of CAP. The whole thesis assumes we maintain high availability by replication. So, when there is a failure, CAP theorem prevails. But if not, we still have to consider the tradeoff between consistency and latency of a replicated system.

Example

MongoDB: can be considered PA/EC (default configuration): MongoDB have primary and secondary replicas. By default, all writes and reads are performed on the primary. As all replication is done asynchronously (from primary to secondaries), when there is a network partition in which primary is not available due to network or other failure, there is a chance of losing data that is not replicated yet to secondaries, hence there is a loss of consistency during partitions.

Therefore, it can be concluded that in the case of a network partition, MongoDB chooses availability but otherwise guarantees consistency.

Alternately, when MongoDB is configured to write on majority replicas and read from the primary, it could be categorized as PC/EC.

Conclusion

Strong consistency adds a steep price from higher write latencies due to data having to replicate and commit across large distances. Strong consistency may also suffer from reduced availability (during failures) because data cannot replicate and commit in every region.

Eventual consistency offers higher availability and better performance, but its more difficult to program applications because data may not be completely consistent across all regions.

Leave a Reply

Your email address will not be published. Required fields are marked *