In my working experience, hardware failure is very common in a production environment. Any node can go down at any point in time; there is no 100% guarantee of protecting the nodes, and it is not practically possible to safeguard every one of them. To handle this, Cassandra is designed with a peer-to-peer distributed architecture and has the ability to store data on different nodes based on the partition.
This is one of the setups I came up with for testing a data-center-level failure. In this setup, we have two data centers, one in North India and another in South India, with two Cassandra nodes running in each DC.
Now let's take this real-time example and look at the component details:
Node (Node 1, 2, 3, 4):
This is the basic component of Cassandra and the place where the data is actually stored.
Data Center (North India and South India):
A group of nodes forms a data center. It is a logical entity for grouping nodes based on the use case and needs.
Cluster (qnap):
This is the logical grouping of data centers, which in turn are groupings of nodes.
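If we want this setup to survive a full data center failure, the keyspace has to keep replicas in both data centers. Here is a minimal sketch, assuming the DataStax Python driver; the keyspace name, the contact point and the data center names 'NorthIndia' / 'SouthIndia' are illustrative assumptions (the real names come from your snitch configuration), and 2 replicas per DC simply matches the two nodes running in each DC.

```python
# A minimal sketch, assuming the DataStax Python driver (pip install cassandra-driver).
# The keyspace name, contact point and data center names below are illustrative
# assumptions, not values taken from this post's configuration files.
from cassandra.cluster import Cluster

cluster = Cluster(contact_points=["172.30.50.60"])  # any reachable node will do
session = cluster.connect()

# Keep 2 replicas of every row in each data center, so either DC can keep
# serving the data even if the other data center goes down completely.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS dc_failover_test
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'NorthIndia': 2,
        'SouthIndia': 2
    }
""")
cluster.shutdown()
```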
How does Cassandra bootstrap work?
Every Cassandra cluster has a unique name, configured in the cassandra.yaml file that ships with the Cassandra setup package. Any node that wants to join a cluster must have the same cluster name (for example, cluster_name: India). There is a concept in Cassandra bootstrapping called 'seed nodes'. What do they do? Seed nodes have no purpose other than bootstrapping the gossip protocol for new nodes joining the cluster.
Let's say I have planned to create a 3-node cluster: Node N1 - 172.30.50.60, Node N2 - 172.30.50.61 and Node N3 - 172.30.50.62,
in which I have configured the seed nodes as N1 and N2. To start the Cassandra cluster, every node should have the same cluster name in its configuration file. It is then mandatory to start the seed nodes N1 and N2 first; they form a two-node cluster and start gossiping with each other to exchange information. When I then start N3, it looks up the seed nodes and contacts them to join the cluster and to get information about the other nodes.
Seed nodes are not a single point of failure, but if there are no seed nodes in the cluster then it is not possible for new nodes to join the cluster.
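As a quick sanity check before starting the nodes, the relevant cassandra.yaml settings can be compared across all three nodes. The following is a hypothetical helper, not part of Cassandra, and the file paths are assumptions for this sketch; it only shows where cluster_name and the seed list live in the configuration.

```python
# A hypothetical helper (not part of Cassandra) that reads each node's cassandra.yaml
# and checks the two things discussed above: every node must carry the same
# cluster_name, and the seed list should be identical on all nodes.
import yaml  # pip install pyyaml

NODE_CONFIGS = {
    "N1": "/etc/cassandra/n1/cassandra.yaml",  # 172.30.50.60 (seed)
    "N2": "/etc/cassandra/n2/cassandra.yaml",  # 172.30.50.61 (seed)
    "N3": "/etc/cassandra/n3/cassandra.yaml",  # 172.30.50.62
}

def load(path):
    with open(path) as f:
        return yaml.safe_load(f)

configs = {name: load(path) for name, path in NODE_CONFIGS.items()}

# cluster_name must match on every node that wants to join the same cluster
cluster_names = {cfg["cluster_name"] for cfg in configs.values()}
assert len(cluster_names) == 1, f"cluster_name differs: {cluster_names}"

# the seed list lives under seed_provider -> parameters -> seeds in cassandra.yaml
seed_lists = {
    cfg["seed_provider"][0]["parameters"][0]["seeds"]
    for cfg in configs.values()
}
assert len(seed_lists) == 1, f"seed lists differ: {seed_lists}"

print("cluster_name:", cluster_names.pop(), "| seeds:", seed_lists.pop())
```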
I believe a lot of questions are getting raised in your mind by now, so let me ask them myself and clarify :)
What is gossiping, and what is the gossip protocol in Cassandra?
The gossip protocol is a peer-to-peer communication protocol in which Cassandra nodes periodically exchange state information about themselves and about the other nodes they know about. In Cassandra, this gossip process runs every second and exchanges state information with up to three other nodes in the cluster.
If it gossips with a maximum of 3 nodes, how do the other nodes know about the remaining nodes?
The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster.
How is consistency in gossiping maintained, i.e. how does Cassandra ensure it has the most recent gossip data?
A gossip message has a version associated with it so that during a gossip exchange, older information is overwritten with the most current state for a particular node.
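To make the idea concrete, here is a toy sketch (not Cassandra's actual implementation) of versioned gossip: every node bumps a version for its own state each round, gossips with up to three peers, and during an exchange the entry with the higher version always wins, so stale information is overwritten.

```python
# A toy sketch of versioned gossip (not Cassandra's real implementation).
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.version = 0
        # what this node believes about every node: {node_name: (version, status)}
        self.view = {name: (0, "UP")}

    def heartbeat(self):
        # bump our own version every round so peers can spot fresher information
        self.version += 1
        self.view[self.name] = (self.version, "UP")

    def gossip_with(self, peer):
        # merge views in both directions; older information is overwritten
        for src, dst in ((self.view, peer.view), (peer.view, self.view)):
            for node, (version, status) in src.items():
                if node not in dst or dst[node][0] < version:
                    dst[node] = (version, status)

nodes = [Node(f"N{i}") for i in range(1, 5)]
for _ in range(10):  # the gossip round runs every second in Cassandra
    for node in nodes:
        node.heartbeat()
        peers = [n for n in nodes if n is not node]
        for peer in random.sample(peers, k=min(3, len(peers))):
            node.gossip_with(peer)

# every node has quickly learned the latest state of every other node
for node in nodes:
    print(node.name, node.view)
```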
Why does there need to be more than one seed node in the cluster?
As mentioned, seed nodes are not a single point of failure, but if there are no seed nodes in the cluster then it is not possible for new nodes to join it. So there definitely needs to be more than one seed node available in the cluster, so that new nodes can still join even when one seed node is down.
Why are seed nodes called gossip hotspots?
Gossiping mostly favours the seed nodes, i.e. if a seed node is available in the cluster, then among the maximum of three nodes gossiped with in a round, one seed node is always included. So seed nodes are always hotspots, which is also a good architecture, since new nodes always get information about the other nodes through the seed nodes. This is also the reason we should not make all the nodes in the cluster seed nodes.
How does Cassandra distribute the data?
Cassandra's data distribution architecture is viewed as a 'ring', because it distributes the data evenly among all the nodes in the cluster. Cassandra uses a hashing algorithm to distribute the data across the token range from -2^63 (-9,223,372,036,854,775,808) to 2^63 - 1 (9,223,372,036,854,775,807).
For understanding, let's consider that we have only 20 tokens (hash values) and see how Cassandra distributes the data.
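For intuition only, here is a tiny sketch of that simplified 20-token world, splitting the tokens evenly across the four nodes of the setup above; the numbers are stand-ins for real Murmur3 tokens.

```python
# 20 simplified tokens (stand-ins for real Murmur3 tokens in the range
# -2**63 .. 2**63 - 1) split evenly across four nodes.
NODES = ["Node1", "Node2", "Node3", "Node4"]
TOKENS = list(range(1, 21))                     # 20 simplified token values

per_node = len(TOKENS) // len(NODES)            # 5 tokens per node
ownership = {
    node: TOKENS[i * per_node:(i + 1) * per_node]
    for i, node in enumerate(NODES)
}
for node, tokens in ownership.items():
    print(node, "owns tokens", tokens)          # Node1 owns tokens [1, 2, 3, 4, 5] ...
```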
In real time, this is how the data is distributed from -2^63 to 2^63 - 1.
To see how the token/hash values are distributed, use the following utility (./nodetool ring):
This real-time example shows how the data is distributed based on the token/hash values from -2^63 to 2^63 - 1. As per the above example, the token range from -918897766822945945 to -9162825472670438566 is owned by node 172.30.56.60, and so on. The end of the nodetool ring output also clearly shows that the token ring extends up to 2^63 - 1.
During Cassandra bootstrap, each Cassandra node is assigned a token range, which determines its position in the cluster. Each node receives an equal share of the token range so that data is spread evenly across the Cassandra nodes in the ring. Like the sample distribution mentioned above, the tokens are equally distributed among all the nodes in the cluster, and the backup / replicated data will be available on the other nodes. Each node is assigned a token and is responsible for token values from the previous token (exclusive) to the node's own token (inclusive).
In other words, each node in a cluster is responsible for a certain set of data, which is determined by the partitioner. A partitioner is a hash function for computing the resultant token for a particular row key. This token is then used to determine the node which will store the first replica. As of Cassandra 3.0, Cassandra offers Murmur3Partitioner (the default), RandomPartitioner and ByteOrderedPartitioner. It is also possible to use a custom partitioner.
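To make the 'previous token (exclusive) to own token (inclusive)' rule concrete, here is a small sketch of a ring lookup. The node tokens and the stand-in hash function are assumptions for illustration; the real default partitioner is Murmur3Partitioner.

```python
# A sketch of the ring lookup: a stand-in hash maps a row key to a token, and
# the node whose token range covers it stores the first replica. Each node owns
# the range from the previous node's token (exclusive) up to its own token
# (inclusive), wrapping around after the highest token.
import bisect

RING = [  # (token, node) pairs sorted by token, as nodetool ring would list them
    (-3074457345618258603, "172.30.50.60"),
    ( 3074457345618258602, "172.30.50.61"),
    ( 9223372036854775807, "172.30.50.62"),
]
TOKENS = [token for token, _ in RING]

def partition_token(row_key: str) -> int:
    # stand-in for Murmur3: any hash into the range -2**63 .. 2**63 - 1 works here
    return hash(row_key) % (2 ** 64) - 2 ** 63

def first_replica(row_key: str) -> str:
    token = partition_token(row_key)
    # the first node token >= the row's token owns it; wrap to the first node if none
    idx = bisect.bisect_left(TOKENS, token)
    return RING[idx % len(RING)][1]

print(first_replica("user:42"))  # node that stores the first replica for this key
```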
Here is the full bootstrapping process in steps. The new node joining the cluster starts as an empty node without system tables or data:
- Contact the seed nodes to learn about gossip state.
- Transition to Up and Joining state (to indicate it is joining the cluster; represented by UJ in the ./nodetool status).
- Contact the seed nodes to ensure schema agreement.
- Calculate the tokens that it will become responsible for.
- Stream replica data associated with the tokens it is responsible for from the former owners.
- Transition to Up and Normal state once streaming is complete (to indicate it is now part of the cluster; represented by UN in the ./nodetool status).
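As a small operational aid, here is a hypothetical helper (not part of Cassandra) that polls ./nodetool status and waits for the joining node to move from UJ to UN, matching the last two steps above.

```python
# A hypothetical helper that polls `nodetool status` until the joining node moves
# from UJ (Up/Joining, still streaming data) to UN (Up/Normal, part of the cluster).
import subprocess
import time

NEW_NODE_IP = "172.30.50.62"  # node N3 from the earlier bootstrap example

def node_state(ip: str) -> str:
    out = subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout
    for line in out.splitlines():
        parts = line.split()
        # status lines start with the two-letter state code followed by the address
        if len(parts) >= 2 and parts[1] == ip:
            return parts[0]
    return "unknown"

state = node_state(NEW_NODE_IP)
while state != "UN":
    print(f"{NEW_NODE_IP} is {state}, still bootstrapping...")
    time.sleep(30)
    state = node_state(NEW_NODE_IP)
print(f"{NEW_NODE_IP} is UN - bootstrap complete, the node is part of the cluster")
```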
We will explore and research data distribution further when we come to Cassandra's data structures later in this same blog :)