Sunday, 24 December 2017

Demonstration of High-Availability and Data-Replication



This is a practical demo of the NetworkTopologyStrategy replication we discussed in the previous post.

For this analysis, I used the following setup:

[Setup diagram: one Cassandra cluster spanning two data centers, 'newyork' (172.30.56.60, 172.30.56.61) and 'chennai' (including 172.30.56.62)]


Setup status using nodetool:
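The original output was captured as a screenshot; illustratively, nodetool status for this setup looks like the sketch below (UN = Up/Normal; load, tokens, host IDs, and racks are elided since they are not in the text):

    $ nodetool status
    Datacenter: newyork
    ===================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load   Tokens  Owns   Host ID   Rack
    UN  172.30.56.60   ...    ...     ...    ...       ...
    UN  172.30.56.61   ...    ...     ...    ...       ...

    Datacenter: chennai
    ===================
    --  Address        Load   Tokens  Owns   Host ID   Rack
    UN  172.30.56.62   ...    ...     ...    ...       ...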

 

Now let's see the need for a multi-datacenter setup:


1) Workload separation:
We can distribute the work between the two data centers based on functional need.
Example:
We can have two Cassandra clients, one reading data from the 'newyork' data center and the other from the 'chennai' data center (see the cqlsh sketch after this list).

2) Data backup across data centers:
Since we are going to follow the NetworkTopologyStrategy (explained in the previous post), if a whole data center fails, the other data center will still hold the data.

3) Geographical location:
Clients can be served from the data center nearest to them, which reduces read latency.
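As referenced in item 1, a minimal sketch of the workload split using cqlsh, with the node IPs from the setup above (9042 is Cassandra's default CQL port; whether this cluster uses it is an assumption):

    # client pinned to the 'newyork' data center
    cqlsh 172.30.56.60 9042

    # client pinned to the 'chennai' data center
    cqlsh 172.30.56.62 9042

In a real application, the same pinning is usually done with a DC-aware load-balancing policy in the driver rather than by hard-coding hosts.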

Proposed replication strategy for a multi-datacenter setup:



For a multi-datacenter setup, we need to follow 'NetworkTopologyStrategy' in Cassandra.
NetworkTopologyStrategy:
As per the current structure ('newyork' and 'chennai' data centers), if a client writes to the 'newyork' data center, the data is replicated to the 'chennai' data center as per the configuration we provide.

Example :

I have configured NetworkTopologyStrategy for the 'qnapusers' keyspace as below.
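The original statement was shown as a screenshot; reconstructed from the details in the text (keyspace name, data center names, and replication factors), it would be:

    CREATE KEYSPACE qnapusers
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'newyork': 1,
        'chennai': 1
    };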


Here I set a replication factor of '1' in each data center,

i.e., if a client writes data in 'newyork', it is replicated to one node in the 'chennai' data center.
i.e., if a client writes data in 'chennai', it is replicated to one node in the 'newyork' data center.

If the number of nodes in each data center is high, we could increase the replication factor in each data center and lower the consistency level accordingly.
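For example, a sketch assuming each data center has grown to at least three nodes (the factor '3' is illustrative, not from the original setup):

    ALTER KEYSPACE qnapusers
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'newyork': 3,
        'chennai': 3
    };

    -- reads and writes can then use a DC-local consistency level in cqlsh:
    CONSISTENCY LOCAL_QUORUM;

LOCAL_QUORUM needs only a quorum of the local data center's replicas to respond, so cross-DC latency stays out of the request path.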

NOTE: Everything depends on the use case and the configuration we provide for the keyspaces.


DEMO:


1) I have created a table 'qnapuser' in the keyspace 'qnapusers' on one of the nodes (172.30.56.60) in the data center 'newyork' and written data into it as below.
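The original statements were shown as a screenshot; a sketch using the table and keyspace names from the text (the columns and the sample row are hypothetical):

    CREATE TABLE qnapusers.qnapuser (
        user_id   int PRIMARY KEY,   -- hypothetical column
        user_name text               -- hypothetical column
    );

    INSERT INTO qnapusers.qnapuser (user_id, user_name)
    VALUES (1, 'user_a');            -- hypothetical sample row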


2) I forcefully brought down the nodes (172.30.56.60 and 172.30.56.61) in the data center 'newyork'.
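One way to do this (a sketch; the exact method used in the demo is not shown in the text):

    # on 172.30.56.60 and 172.30.56.61, stop the Cassandra daemon
    nodetool stopdaemon

    # from a surviving node, both 'newyork' nodes now report DN (Down/Normal)
    nodetool status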

 

3) I am now reading the same data from the nodes in the data center 'chennai'. Since we used 'NetworkTopologyStrategy' with a replication factor of '1' per data center, every write is replicated to the other data center ('chennai' in our case).

We are able to successfully read the data from the nodes (example: 172.30.56.62) in the other data center, 'chennai'.
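A sketch of the read, assuming the hypothetical schema from step 1 (with consistency ONE, the 'chennai' replica alone can satisfy the read while 'newyork' is down):

    -- connected via: cqlsh 172.30.56.62
    CONSISTENCY ONE;
    SELECT * FROM qnapusers.qnapuser;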

 

Based on this analysis, study, and observation, it is clear that data replication and a live-backup scenario are entirely possible in Cassandra.

Using the multi-datacenter setup, we can provide workload separation and geography-based data separation. We also need to apply 'NetworkTopologyStrategy' to replicate the data across the data centers efficiently, which in turn gives us live backups (as described in step 3 of the Demo section).
The only caveat with NetworkTopologyStrategy is that if the data centers are far apart, the latency between them results in higher write latency, which is expected; using DC-local consistency levels such as LOCAL_QUORUM keeps this out of the client's request path, since cross-DC replication then happens asynchronously.

