Study notes from DS201: Foundations of Apache Cassandra™ and DataStax Enterprise.
What is Replication?
Cassandra employs replication to achieve high availability and fault tolerance by copying data to multiple nodes. This prevents data loss and service interruptions even when some nodes experience failures, enabling continuous operation.
Replication Factor (RF)
The Replication Factor (RF) specifies how many copies (replicas) of each partition are stored in the cluster. It is set per keyspace and, with NetworkTopologyStrategy, per data center within that keyspace.
| RF | Fault Tolerance (data survival) | Storage | Write Cost |
|---|---|---|---|
| 1 | None (data loss on node failure) | 1x | Low |
| 2 | Tolerates 1 node failure | 2x | Medium |
| 3 | Tolerates 2 node failures | 3x | High |
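RF = 3 is the commonly recommended value for production environments. With three replicas, the data itself survives the loss of two replica nodes. Requests at QUORUM need 2 of the 3 replicas, however, so they keep succeeding only while at most one replica is down.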
Setting RF with CQL
You specify the RF when creating a keyspace.
-- Create a keyspace with SimpleStrategy and RF=3
CREATE KEYSPACE my_keyspace
WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': 3
};
-- Create a keyspace with NetworkTopologyStrategy
-- 3 replicas in data center dc1, 2 in dc2 (names must match those reported by the snitch)
CREATE KEYSPACE production_keyspace
WITH replication = {
'class': 'NetworkTopologyStrategy',
'dc1': 3,
'dc2': 2
};
To modify the RF of an existing keyspace, use ALTER KEYSPACE.
ALTER KEYSPACE my_keyspace
WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': 2
};
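Changing the RF does not move existing data by itself. After increasing the RF, run a full nodetool repair on each node so that the new replicas receive the existing data. After decreasing it, as in the example above, run nodetool cleanup on each node to remove data that the node no longer owns.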
Replication Strategy
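The replication strategy determines which nodes store the replicas of each partition. These notes cover the two strategies used for application keyspaces: SimpleStrategy and NetworkTopologyStrategy.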
SimpleStrategy
Designed for clusters with a single data center. The first replica goes on the node that owns the partition’s token (the primary node), and the remaining RF - 1 replicas go on the next nodes clockwise around the ring.
- Suitable for development and testing environments
- Does not consider rack or data center topology
- Should not be used in multi-data center deployments
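The clockwise walk described above can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's implementation: it assumes one token per node (no vnodes), small integer tokens, and the hypothetical names `simple_strategy_replicas`, `ring`, and `node1` to `node4`.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, partition_token, rf):
    """Pick replicas by walking the ring clockwise from the primary node.

    ring: list of (token, node_name) pairs, one token per node (assumption).
    """
    ring = sorted(ring)
    tokens = [token for token, _ in ring]
    # The primary node owns the first token >= the partition token,
    # wrapping around to the start of the ring if there is none.
    start = bisect_left(tokens, partition_token) % len(ring)
    replicas = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


ring = [(0, "node1"), (25, "node2"), (50, "node3"), (75, "node4")]
print(simple_strategy_replicas(ring, 30, 3))  # ['node3', 'node4', 'node1']
```

Note that the walk never looks at racks or data centers, which is why this strategy is unsuitable for multi-data center deployments.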
NetworkTopologyStrategy
Designed for clusters with multiple data centers and racks. Allows a different RF for each data center and, within each data center, tries to place replicas on different racks.
- Always recommended for production (even with a single DC, to prepare for future expansion)
- Independent RF per data center
- Better data protection against rack failures
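As a rough illustration of per-data center RF and rack awareness, the sketch below walks the ring once per data center and prefers nodes on racks that do not yet hold a replica. It is a simplification under stated assumptions (one token per node, hypothetical node, rack, and function names), and the real placement algorithm differs in its details.

```python
from bisect import bisect_left


def nts_replicas(ring, partition_token, rf_per_dc):
    """Simplified rack-aware placement, computed independently per data center.

    ring: list of (token, node, dc, rack) tuples, one token per node (assumption).
    rf_per_dc: mapping such as {"dc1": 3, "dc2": 2}.
    """
    ring = sorted(ring)
    tokens = [entry[0] for entry in ring]
    start = bisect_left(tokens, partition_token) % len(ring)
    # Ring order starting at the node that owns the partition's token.
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]

    replicas = {}
    for dc, rf in rf_per_dc.items():
        chosen, racks_used, skipped = [], set(), []
        for _, node, node_dc, rack in walk:
            if node_dc != dc or len(chosen) == rf:
                continue
            if rack in racks_used:
                skipped.append(node)  # same rack as an earlier replica
            else:
                chosen.append(node)
                racks_used.add(rack)
        # Fewer racks than replicas: fall back to skipped nodes in ring order.
        chosen.extend(skipped[: rf - len(chosen)])
        replicas[dc] = chosen
    return replicas


ring = [
    (0, "a1", "dc1", "r1"),
    (10, "a2", "dc1", "r1"),
    (20, "b1", "dc2", "r1"),
    (30, "a3", "dc1", "r2"),
    (40, "b2", "dc2", "r2"),
    (50, "a4", "dc1", "r3"),
]
print(nts_replicas(ring, 55, {"dc1": 3, "dc2": 2}))
# {'dc1': ['a1', 'a3', 'a4'], 'dc2': ['b1', 'b2']}
```

In this example `a2` is passed over because it shares rack `r1` with `a1`, so the three dc1 replicas end up on three different racks.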
Choosing a Strategy
| Scenario | Recommended Strategy |
|---|---|
| Development/Test (single node) | SimpleStrategy |
| Production (single DC) | NetworkTopologyStrategy |
| Production (multi DC) | NetworkTopologyStrategy |
Replication and Consistency Level
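Replication strategy and RF are closely related to the Consistency Level, which sets how many replicas must respond before a read or write is considered successful. QUORUM requires \( \lfloor RF/2 \rfloor + 1 \) replicas, so with RF=3 both reads and writes at QUORUM need responses from 2 replicas.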
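The quorum arithmetic can be checked with a short Python sketch. The `rf_per_dc` values reuse the dc1/dc2 settings from the NetworkTopologyStrategy keyspace above; the function and variable names are illustrative only.

```python
def quorum(rf):
    """Replicas required at QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


# Single data center, RF = 3.
print(quorum(3))      # 2
print(3 - quorum(3))  # 1 replica may be down while QUORUM still succeeds

# Multiple data centers: QUORUM counts replicas across all data centers,
# while LOCAL_QUORUM counts only those in the coordinator's data center.
rf_per_dc = {"dc1": 3, "dc2": 2}
total_rf = sum(rf_per_dc.values())
print(quorum(total_rf))  # 3
local_quorum = {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
print(local_quorum)      # {'dc1': 2, 'dc2': 2}
```

Note that dc2 with RF=2 needs both of its replicas for LOCAL_QUORUM, so it tolerates no replica failure at that level.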
The rule for strong consistency:
\[ W + R > RF \]
where \(W\) is the number of nodes required for the write Consistency Level, and \(R\) is the number required for the read Consistency Level.
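To see the rule in action, the following sketch evaluates \(W + R > RF\) for a few combinations of Consistency Levels. It covers only ONE, QUORUM, and ALL, and the helper names are hypothetical.

```python
def required_replicas(level, rf):
    """Replicas that must respond at a given Consistency Level (subset only)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level in this sketch: {level}")


def is_strongly_consistent(write_level, read_level, rf):
    """True when every read set must overlap every write set: W + R > RF."""
    w = required_replicas(write_level, rf)
    r = required_replicas(read_level, rf)
    return w + r > rf


rf = 3
print(is_strongly_consistent("QUORUM", "QUORUM", rf))  # True  (2 + 2 > 3)
print(is_strongly_consistent("ONE", "ONE", rf))        # False (1 + 1 <= 3)
print(is_strongly_consistent("ONE", "ALL", rf))        # True  (1 + 3 > 3)
```

When the inequality holds, at least one replica contacted by a read has also acknowledged the latest write, which is what makes the read see that write.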
DS201 Series
This article is part of the DS201 study notes series.