Study notes from DS201: Foundations of Apache Cassandra™ and DataStax Enterprise.
What is a Ring?
Cassandra’s architecture is built on clusters of multiple nodes. At the core of this architecture is the ring (often called the token ring), a logical structure that determines how data is distributed among the nodes in the cluster.
Data Distribution and Placement
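Cassandra divides all data into partitions and computes a token for each partition by hashing its partition key with the cluster’s partitioner. The default, Murmur3Partitioner, produces 64-bit signed tokens from -2^63 to 2^63-1. Because the hash is deterministic, rows with the same partition key always get the same token. These tokens are placed on a virtual circle called the ring, which represents the full range of possible token values: after the largest token, the ring wraps around to the smallest.
Each node “owns” one or more token ranges on this ring and stores the partitions whose tokens fall within them, so a partition’s token determines which node holds it. Because the partitioner spreads tokens roughly uniformly around the ring, data ends up distributed roughly evenly across the cluster.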
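The ownership rule can be made concrete with a toy model. In Cassandra, a node with token T owns the range from the previous token (exclusive) to T (inclusive), and a token above the highest node token wraps around to the node with the lowest token. The sketch below applies that rule with a sorted list and a binary search. It is a minimal sketch under stated assumptions: `toy_token`, `owner`, the node names, and the token values in `RING` are all hypothetical, and the hash is an MD5 stand-in, not Cassandra’s Murmur3, so its tokens will not match what `token()` returns in cqlsh.

```python
import bisect
import hashlib

# Hypothetical stand-in for Cassandra's Murmur3Partitioner: hash the partition
# key with MD5 and fold the first 8 bytes into a signed 64-bit integer.
# This is NOT Murmur3, so the values differ from Cassandra's token().
def toy_token(partition_key: str) -> int:
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Hypothetical four-node ring, one token per node (no vnodes).
RING = {
    -6_000_000_000_000_000_000: "node-a",
    -1_000_000_000_000_000_000: "node-b",
    3_000_000_000_000_000_000: "node-c",
    8_000_000_000_000_000_000: "node-d",
}

def owner(token: int, ring: dict[int, str]) -> str:
    """Return the node whose range (previous token, node token] contains `token`."""
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, token)  # index of first node token >= token
    if i == len(tokens):                   # above the highest token: wrap around
        i = 0
    return ring[tokens[i]]

for tag in ["datastax", "cassandra"]:
    t = toy_token(tag)
    print(f"{tag:>10} -> token {t:>21} -> {owner(t, RING)}")
```

A sorted list plus `bisect` keeps each lookup at O(log n) in the number of tokens, which is why the ring is usually modeled this way rather than as a scan over all nodes.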
Scalability and Fault Tolerance
The ring structure is designed to minimize data redistribution when nodes are added or removed.
- Adding nodes: A new node is assigned one or more tokens (positions) on the ring and takes over the corresponding token ranges, along with their data, from the nodes that previously owned them. Only the data in those ranges moves; all other partitions stay where they are. The new node then stores and serves that data, increasing the cluster’s overall capacity and throughput.
- Removing nodes / Failures: All nodes are peers and there is no master node, so the ring has no single point of failure. If a node fails or goes down, the other nodes holding replicas of its token ranges continue to serve reads and writes, provided the replication factor is greater than 1, and the system keeps operating. If a node is removed permanently, its token ranges are reassigned to the remaining nodes.
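The claim that adding a node moves only part of the data can be checked with a toy model. The sketch below assigns 10,000 hypothetical partition keys to a three-node ring, inserts a fourth node between two existing ones, and confirms that every partition that changed owner came from a single neighbour and went to the new node. It is a sketch under stated assumptions: the node names, tokens, and key names are hypothetical, the hash is an MD5 stand-in rather than Murmur3, each node has a single token, and it only recomputes ownership; it does not model how Cassandra actually streams data to a joining node.

```python
import bisect
import hashlib

def toy_token(key: str) -> int:
    # Hypothetical stand-in hash (MD5 folded to a signed 64-bit int), not Murmur3.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)

def owner(token: int, ring: dict[int, str]) -> str:
    # A node with token T owns (previous token, T]; the modulo wraps past the top.
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, token)
    return ring[tokens[i % len(tokens)]]

# Hypothetical three-node ring, one token per node.
before = {
    -5_000_000_000_000_000_000: "node-a",
    1_000_000_000_000_000_000: "node-b",
    6_000_000_000_000_000_000: "node-c",
}
# node-d joins with a token between node-a's and node-b's.
after = dict(before)
after[-2_000_000_000_000_000_000] = "node-d"

keys = [f"partition-{n}" for n in range(10_000)]
moved = [k for k in keys if owner(toy_token(k), before) != owner(toy_token(k), after)]

# Every moved partition was owned by node-b and is now owned by node-d.
assert all(owner(toy_token(k), before) == "node-b" for k in moved)
assert all(owner(toy_token(k), after) == "node-d" for k in moved)
print(f"{len(moved)} of {len(keys)} partitions moved")
```

Only the range (-5e18, -2e18] changes hands, so roughly a sixth of the keys move. With virtual nodes, a joining node instead takes many small ranges from several existing nodes, which spreads the streaming load.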
Replication
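Cassandra’s ring structure also supports replication for high availability and durability. Each partition is copied to multiple nodes, and the number of copies is set by the keyspace’s replication factor (RF). With SimpleStrategy, the first replica goes to the node that owns the partition’s token and the remaining RF - 1 replicas go to the next distinct nodes clockwise around the ring; NetworkTopologyStrategy walks the ring the same way but also spreads replicas across racks within each datacenter. Because every partition lives on several nodes, the failure of one node does not lose data, and reads can still succeed as long as enough replicas are up to satisfy the requested consistency level.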
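SimpleStrategy-style placement can be sketched as a clockwise walk that starts at the owning token and collects distinct nodes until RF of them are found. Distinctness matters once nodes own several tokens (virtual nodes), because consecutive tokens may belong to the same node. This is a minimal sketch under stated assumptions: `replicas` and the ring below, with its node names and tokens, are hypothetical, and it ignores the rack and datacenter awareness that NetworkTopologyStrategy adds.

```python
import bisect

def replicas(token: int, ring: dict[int, str], rf: int) -> list[str]:
    """SimpleStrategy-style placement: start at the node owning `token`, then
    walk clockwise, collecting distinct nodes until `rf` have been found."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token) % len(tokens)  # owning token index
    result: list[str] = []
    for step in range(len(tokens)):
        node = ring[tokens[(start + step) % len(tokens)]]
        if node not in result:  # skip a node already chosen via another vnode
            result.append(node)
        if len(result) == rf:
            break
    return result

# Hypothetical ring: three nodes, two tokens (vnodes) each.
RING = {
    -8_000_000_000_000_000_000: "node-a",
    -6_000_000_000_000_000_000: "node-c",
    -4_000_000_000_000_000_000: "node-b",
    0: "node-a",
    4_000_000_000_000_000_000: "node-c",
    8_000_000_000_000_000_000: "node-b",
}
print(replicas(-5_000_000_000_000_000_000, RING, rf=2))  # → ['node-b', 'node-a']
```

If RF exceeds the number of distinct nodes, the loop simply returns every node once, since a partition cannot have two replicas on the same node.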
In this way, Cassandra’s ring is a powerful mechanism for managing the key challenges of distributed databases: data placement, scalability, fault tolerance, and consistency.
Exercises
Result of SELECT token(tag), tag FROM videos_by_tag;
Here is the result of selecting token(tag) and tag from the videos_by_tag table in cqlsh:
cqlsh:killrvideo> SELECT token(tag), tag
              ... FROM videos_by_tag;
 system.token(tag)    | tag
----------------------+-----------
 -1651127669401031945 |  datastax
 -1651127669401031945 |  datastax
   356242581507269238 | cassandra
   356242581507269238 | cassandra
   356242581507269238 | cassandra
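The five rows fall into just two partitions. Because token() accepts only partition key columns, tag must be the partition key of videos_by_tag, so every row with the same tag has the same token: both "datastax" rows have token -1651127669401031945, and all three "cassandra" rows have token 356242581507269238. This token, computed from the partition key, determines which node in the cluster stores each partition.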
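Counting partitions from the query output amounts to grouping rows by token. The sketch below copies the five (token, tag) rows shown in the cqlsh output into a list and groups them; the variable names are illustrative only. It relies on distinct tokens meaning distinct partitions, which holds here, although 64-bit hash collisions between different partition keys are possible in principle.

```python
from collections import defaultdict

# The five (token, tag) rows returned by SELECT token(tag), tag FROM videos_by_tag.
rows = [
    (-1651127669401031945, "datastax"),
    (-1651127669401031945, "datastax"),
    (356242581507269238, "cassandra"),
    (356242581507269238, "cassandra"),
    (356242581507269238, "cassandra"),
]

# Group rows by token: in this output, each distinct token is one partition.
partitions: dict[int, list[str]] = defaultdict(list)
for token, tag in rows:
    partitions[token].append(tag)

for token, tags in partitions.items():
    print(f"token {token}: {len(tags)} row(s), tag(s) {sorted(set(tags))}")
```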
Output of nodetool ring
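The nodetool ring command lists every token on the ring together with the node that owns it. Each row represents one token, not one node: with virtual nodes (vnodes), a single node owns many tokens and therefore appears in many rows. The Token column shows the inclusive end of a token range; that range starts just after the token in the previous row.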
$ nodetool ring
Datacenter: datacenter1
==========
Address    Rack   Status State   Load       Owns     Token
                                                     8495111347830785616
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  -9107256078387604241
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  -7666987848485021001
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  -6595487232144988189
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  -5577635827402561173
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  -4759963894790210379
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  -3684208013564630839
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  -2948292320853737199
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  -1868919513406135542
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  -625399507725543569
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  341964735352991929
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  1931969287866890567
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  3550992583563933864
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  4529138036080047940
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  6307772336903635068
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  7403814237138573357
127.0.0.1  rack1  Up     Normal  12.04 MiB  100.00%  8495111347830785616
Warning: "nodetool ring" is used to output all the tokens of a node.
To view status related info of a node use "nodetool status" instead.
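This output comes from a single-node cluster. The same node, 127.0.0.1, appears in all 16 rows because it owns 16 tokens (virtual nodes, configured by num_tokens in cassandra.yaml), and Owns shows 100.00% because this one node holds all the data. Each Token value is the inclusive end of a range that begins just after the previous row’s token. The lone token above the first row is the highest token, repeated to show that the ring wraps around: the first range runs from just after 8495111347830785616, past the maximum token value, to -9107256078387604241.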
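The two exercises can be tied together by locating each tag’s token on this ring. The sketch below copies the 16 tokens from the nodetool ring output and the two tokens from the SELECT token(tag) query, then finds the range (start, end] containing each, applying the rule that a range ends at a ring token and starts just after the previous one. On this single-node cluster the owner is always 127.0.0.1; the point is to see which vnode range holds each partition. `containing_range` is a hypothetical helper written for this illustration, not part of any Cassandra tool.

```python
import bisect

# Tokens copied from the nodetool ring output (all owned by 127.0.0.1).
RING_TOKENS = sorted([
    -9107256078387604241, -7666987848485021001, -6595487232144988189,
    -5577635827402561173, -4759963894790210379, -3684208013564630839,
    -2948292320853737199, -1868919513406135542, -625399507725543569,
    341964735352991929, 1931969287866890567, 3550992583563933864,
    4529138036080047940, 6307772336903635068, 7403814237138573357,
    8495111347830785616,
])

def containing_range(token: int, ring_tokens: list[int]) -> tuple[int, int]:
    """Return (start, end) of the ring range (start, end] that contains `token`."""
    i = bisect.bisect_left(ring_tokens, token)  # first ring token >= token
    end = ring_tokens[i % len(ring_tokens)]      # past the highest token, wrap to the lowest
    start = ring_tokens[i - 1]                   # i == 0 yields the highest token (wraparound)
    return start, end

# Partition tokens copied from the SELECT token(tag), tag query.
for tag, tok in [("datastax", -1651127669401031945), ("cassandra", 356242581507269238)]:
    start, end = containing_range(tok, RING_TOKENS)
    print(f"{tag}: token {tok} is in ({start}, {end}]")
```

For these values, "datastax" falls in (-1868919513406135542, -625399507725543569] and "cassandra" falls in (341964735352991929, 1931969287866890567]. In a multi-node cluster, the node listed next to each range’s end token in nodetool ring would be the primary owner of that partition.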