Is CAP just used by truckers that distribute systems?

What are distributed systems and where can we find them in our everyday life? Bottle CAP and CAP theorem what is the difference?

To answer these questions we need to go back in time to the introduction of network communication between different machines in the '60s and the continued technological advances during the next decades that set the foundation for machine-to-machine communication as we know it today. Machine-to-machine communication provided the ability for machines to coordinate their work towards a common task by communicating. This gave birth to a new kind of system, distributed replication systems, or distributed systems for short, are defined as multiple devices connected via a network cooperating to perform some task. Sabu M. Thampi explains one of the many reasons why distributed systems have grown with the entrance of the information era:

"When a handful of powerful computers are linked together and communicate with each other, the overall computing power available can be amazingly vast. Such a system can have a higher performance share than a single supercomputer."

A problem designers of distributed systems are faced with is the CAP Theorem. CAP stands for Consistency, Availability, and Partitioning. When designing a distributed system you want the system to have strong consistency, always be available, and be operational even when the network is partitioned (you want to reach the center of the illustration in the image below). When a distributed system has geographically long distanced replicas of data and information is sent between the replicas, network partitioning will always be present, may it be due to the human factor or a glitch in the matrix.

CAP diagram — Illustration of the CAP theorem

What the CAP theorem states are that you can only have two out of the three CAP properties, thus there are two main options when handling a system that is in a partitioned state (always present). One of the options is to block all operations to replicas thus keeping consistency but providing no availability. The other is to allow all operations to replicas even though it breaks consistency. However, choosing between consistency and availability is not exclusive, trade-offs between consistency and availability can be made, for example, reading but not writing may be allowed or certain data might be changed thus keeping some level of relation between consistency and availability.

Strong consistency is desirable but costly for the system's performance and it also blocks operations thus lowering the availability of the system. A bank transfer requires stronger consistency because of the nature of the transaction than for example Amazon Dynamo

which has sacrificed strong consistency for better availability. Let's face it, in today's society, availability is extremely important. When people can’t access their Facebook or Instagram accounts it is on every newspaper around the globe.

Even though these incidents probably didn't happen because of their servers dropping availability to reach a consistent state, you and I would probably not “accept” their services to be down for a couple of hours every other Tuesday night for them to reach consistency.

The structure of a system can be constructed in two different ways: centralized, where one replica is the primary and the rest of the replicas in the network are secondary, or decentralized where all nodes have the same priority. In a centralized system, all communication from the slave nodes is directed towards the primary node while in the decentralized system communication is made between arbitrary replicas.

The above image displays a possible network of five replicas where four out of the five replicas are symmetric and the middle replica differs from the other replicas. This is the primary replica and the others are secondary replicas. Secondary replicas that receive updates (e.g. an upvote on a post, illustrated by the 👍 emoji), do not apply the 👍 immediately. Instead, they notify the primary replica that a 👍has been received and wait for the main replica to process consequent updates. An example would be when two secondary replicas receive a 👍 at the same time, notify the primary replica at the same time and apply the two 👍 in the same order due to the decision of order being made by the primary replica.

Now you are thinking why does it matter in which order the 👍 are made, and it doesn't matter for this specific case as in this case two 👍 will increase the total count by two always. But what if the updates were 👍 and 👎 on the same cat picture from the same user? If 👎 is performed before 👍 nothing will happen for the first update as there is no 👍 to remove resulting in one more 👍 on the picture. If the order was reversed, the thumbs would remove each other resulting in no more likes for the cat.

In a decentralized system, all nodes perform actions to reach a consistent state by communicating with other replicas. Actions are performed locally and the changes of the state are communicated to replicas currently in the network. The updates made, propagate across the network and reach all replicas either directly or indirectly if no replica crashes or disconnects indefinitely.

There are pros and cons for a centralized system, where simplicity is a strong argument for this kind of system. A weakness in a centralized system is the single point of failure, all secondary nodes rely on the primary replica for updates. If the primary replica never responds with updates to the secondary replicas they can never update and thus lose both availability and consistency. The system is also faced with the possible bottleneck of the primary replica not having the performance to handle all requests from the secondary replicas, making the whole system not function properly. If this is the case then it does not matter if the secondary replicas might have enough performance to handle their tasks if the primary replica is the point of congestion.

In comparison to centralized replication, decentralized replication has no single point of failure, all replicas can continue working if any replica disconnects. The advantage of no single point of failure leads to higher complexity for keeping consistency when replicating data as all replicas must reach consensus. The complexity of the replication depends on the level of consistency required in the system. Stronger consistency requires more complex solutions than weak consistency.

Both systems have their pros and cons but there is never a universal fit for all use cases. For a larger system like Facebook, a decentralized system may be the best fit and for a DIY project, maybe the less complex central system is the better choice.

Relevant resources recommended by the author

Introduction to Distributed Systems

Computing has passed through many transformations since the birth of the first computing machines. Developments in technology have resulted in the availability of fast and inexpensive processors, and progresses in communication technology have resulted in the availability of lucrative and highly proficient computer networks. Among these, the centralized networks have one component that is shared by users all the time. All resources are accessible, but there is a single point of control as well as a single point of failure. The integration of computer and networking technologies gave birth to new paradigm of computing called distributed computing in the late 1970s. Distributed computing has changed the face of computing and offered quick and precise solutions for a variety of complex problems for different fields. Nowadays, we are fully engrossed by the information age, and expending more time communicating and gathering information through the Internet. The Internet keeps on progressing along more than a few magnitudes, abiding end systems increasingly to communicate in more and more different ways. Over the years, several methods have evolved to enable these developments, ranging from simplistic data sharing to advanced systems supporting a multitude of services. This article provides an overview of distributed computing systems. The definition, architecture, characteristics of distributed systems and the various distributed computing fallacies are discussed in the beginning. Finally, discusses client/server computing, World Wide Web and types of distributed systems.

Relevant resources recommended by the author

Introduction to Distributed Systems

Did you like the post?