Key Takeaway: This article covers Cassandra’s replication factor (RF) settings and the choice between SimpleStrategy and NetworkTopologyStrategy. For production, RF=3 + NetworkTopologyStrategy is recommended.
DS201 Series Table of Contents (17 articles)
- Installing Cassandra on macOS
- CQL Basics
- Cassandra Partitions
- Cassandra Clustering Columns
- Cassandra Python Driver
- Cassandra nodetool Commands
- Cassandra Ring Structure
- Cassandra VNodes
- Cassandra Gossip Protocol
- Cassandra Snitches
- Cassandra Replication (this article)
- Cassandra Consistency Levels
- Cassandra Hinted Handoff
- Cassandra Read Repair
- Cassandra NodeSync
- Cassandra Write Path and Read Path
- Cassandra Compaction
Study notes from DS201: Foundations of Apache Cassandra™ and DataStax Enterprise.
What is Replication?
Cassandra employs replication to achieve high availability and fault tolerance by copying data to multiple nodes. This prevents data loss and service interruptions even when some nodes experience failures, enabling continuous operation.
Replication Factor (RF)
The Replication Factor (RF) specifies how many copies (replicas) of each partition are stored in the cluster. The RF value is set per keyspace and, with NetworkTopologyStrategy, per data center within that keyspace.
| RF | Replica failures survived without data loss | Storage | Write Cost |
|---|---|---|---|
| 1 | 0 (a node failure makes its data unavailable or lost) | 1x | Low |
| 2 | 1 | 2x | Medium |
| 3 | 2 | 3x | High |
RF = 3 is the commonly recommended value for production environments. Each partition survives the loss of two replicas without data loss, but availability depends on the Consistency Level: at QUORUM (2 of 3 replicas) requests succeed with only one replica down, and only at ONE can requests still succeed with two replicas down.
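The arithmetic behind the table can be checked with a short Python sketch. The function names `quorum` and `tolerated_down` are illustrative helpers, not part of Cassandra or any driver, and the model counts only the replicas of a single partition.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def tolerated_down(rf: int, required: int) -> int:
    """Replicas that can be down while `required` replicas still respond."""
    return max(rf - required, 0)


for rf in (1, 2, 3):
    q = quorum(rf)
    print(
        f"RF={rf}: storage {rf}x, "
        f"ONE tolerates {tolerated_down(rf, 1)} down, "
        f"QUORUM (needs {q}) tolerates {tolerated_down(rf, q)} down"
    )
```

Under these assumptions, RF=2 offers no extra availability at QUORUM compared with RF=1, which is one reason RF=3 is the usual production choice.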
Setting RF with CQL
You specify the RF when creating a keyspace.
-- Create a keyspace with SimpleStrategy and RF=3
CREATE KEYSPACE my_keyspace
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};
-- Create a keyspace with NetworkTopologyStrategy
CREATE KEYSPACE production_keyspace
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 2
};
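When keyspaces are created from application or deployment code, the statement can be generated from a mapping of data center names to RF values. The following is a minimal sketch; `create_keyspace_cql` is a hypothetical helper, and it assumes the keyspace and data center names are trusted identifiers (it does no escaping).

```python
def create_keyspace_cql(name: str, rf_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    `rf_per_dc` maps data center names to replication factors.
    Names are inserted as-is, so they must come from a trusted source.
    """
    if not rf_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {int(rf)}" for dc, rf in rf_per_dc.items()]
    body = ",\n  ".join(options)
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name}\n"
        f"WITH replication = {{\n  {body}\n}};"
    )


print(create_keyspace_cql("production_keyspace", {"dc1": 3, "dc2": 2}))
```

The resulting string could then be passed to `session.execute(...)` of the DataStax Python driver, assuming a connected session.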
To modify the RF of an existing keyspace, use ALTER KEYSPACE.
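-- Lower the RF of my_keyspace from 3 to 2
ALTER KEYSPACE my_keyspace
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 2
};
Changing the RF does not move existing data by itself. After increasing the RF, run a full nodetool repair on each node so the new replicas receive the existing data. After decreasing the RF, as in the example above, run nodetool cleanup on each node to remove replicas the node no longer owns.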
Replication Strategy
The replication strategy determines which nodes store the replicas of each partition. The two strategies used in practice are SimpleStrategy and NetworkTopologyStrategy.
SimpleStrategy
Designed for clusters with a single data center. Places the first replica on the node that owns the partition’s token, then places the remaining RF - 1 replicas on the next nodes clockwise around the ring.
- Suitable for development and testing environments
- Does not consider rack or data center topology
- Should not be used in multi-data center deployments
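The clockwise placement described above can be sketched in Python. This is a simplified model, not Cassandra's implementation: it assumes one token per node (no vnodes), small integer tokens, and the hypothetical function name `simple_strategy_replicas`.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, token, rf):
    """Return the nodes holding a partition under a simplified SimpleStrategy.

    `ring` is a list of (token, node) pairs sorted by token. A node owns
    the token range ending at its own token (inclusive).
    """
    tokens = [t for t, _ in ring]
    # First node whose token is >= the partition token, wrapping to index 0.
    start = bisect_left(tokens, token) % len(ring)
    distinct_nodes = len({node for _, node in ring})
    replicas = []
    i = start
    # Walk clockwise until rf distinct nodes are collected.
    while len(replicas) < min(rf, distinct_nodes):
        node = ring[i % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        i += 1
    return replicas


ring = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
print(simple_strategy_replicas(ring, 30, 3))  # ['node-c', 'node-d', 'node-a']
```

Nothing in this walk looks at racks or data centers, which is why all replicas can land on the same rack.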
NetworkTopologyStrategy
Designed for clusters with multiple data centers and racks. Allows different RF per data center and distributes replicas across different racks within each data center.
- Always recommended for production (even with a single DC, to prepare for future expansion)
- Independent RF per data center
- Better data protection against rack failures
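The rack-aware behavior can be illustrated with a simplified Python model. It is a sketch under stated assumptions rather than Cassandra's actual algorithm: one token per node, a hypothetical function `nts_replicas`, and a plain fallback to same-rack nodes when a data center has fewer racks than replicas.

```python
from bisect import bisect_left


def nts_replicas(ring, token, rf_per_dc):
    """Pick replicas per data center, preferring distinct racks.

    `ring` is a list of (token, node, dc, rack) tuples sorted by token.
    `rf_per_dc` maps data center names to replication factors.
    """
    tokens = [entry[0] for entry in ring]
    start = bisect_left(tokens, token) % len(ring)
    # Ring order starting at the owning node, walking clockwise.
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    result = {}
    for dc, rf in rf_per_dc.items():
        chosen, racks_used, deferred = [], set(), []
        for _, node, node_dc, rack in walk:
            if node_dc != dc or len(chosen) == rf:
                continue
            if rack in racks_used:
                deferred.append(node)  # rack already holds a replica
            else:
                chosen.append(node)
                racks_used.add(rack)
        # Fewer racks than replicas: fill up from the deferred nodes.
        chosen += deferred[: rf - len(chosen)]
        result[dc] = chosen
    return result


ring = [
    (0, "a1", "dc1", "r1"),
    (10, "a2", "dc1", "r1"),
    (20, "b1", "dc2", "r1"),
    (30, "a3", "dc1", "r2"),
    (40, "b2", "dc2", "r2"),
    (50, "a4", "dc1", "r3"),
]
print(nts_replicas(ring, 55, {"dc1": 3, "dc2": 2}))
```

In this example the walk starts at `a1`; `a2` is skipped because it shares rack `r1` with `a1`, so `dc1` ends up with one replica in each of its three racks.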
Choosing a Strategy
| Scenario | Recommended Strategy |
|---|---|
| Development/Test (single node) | SimpleStrategy |
| Production (single DC) | NetworkTopologyStrategy |
| Production (multi DC) | NetworkTopologyStrategy |
Replication and Consistency Level
Replication strategy and RF are closely related to the Consistency Level, because a Consistency Level is counted in replicas. QUORUM requires floor(RF / 2) + 1 replicas, so with RF=3 and QUORUM consistency, both reads and writes require responses from 2 nodes.
The rule for strong consistency:
\[ W + R > \mathrm{RF} \]
where \(W\) is the number of nodes required for the write Consistency Level, and \(R\) is the number required for the read Consistency Level.
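The rule can be checked for concrete Consistency Level pairs with a small Python sketch. It models a single data center and only the levels ONE, TWO, THREE, QUORUM, and ALL; `required_replicas` and `is_strongly_consistent` are illustrative names, not driver APIs.

```python
def required_replicas(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level (single-DC view)."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(write_level: str, read_level: str, rf: int) -> bool:
    """True when W + R > RF, so read and write replica sets must overlap."""
    w = required_replicas(write_level, rf)
    r = required_replicas(read_level, rf)
    return w + r > rf


rf = 3
pairs = [("QUORUM", "QUORUM"), ("ALL", "ONE"), ("ONE", "QUORUM"), ("ONE", "ONE")]
for w, r in pairs:
    print(f"RF={rf} write={w} read={r}: strong={is_strongly_consistent(w, r, rf)}")
```

With RF=3, QUORUM/QUORUM gives 2 + 2 > 3, so every read overlaps the latest write on at least one replica, while ONE/QUORUM gives 1 + 2 = 3 and does not satisfy the rule.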
Practical Scenarios: Choosing RF and Strategy
Scenario 1: Development/Test Environment
For single-node development, SimpleStrategy + RF=1 is sufficient.
CREATE KEYSPACE dev_keyspace
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};
Scenario 2: Single-DC Production
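For clusters with 3+ nodes, use NetworkTopologyStrategy + RF=3 to prepare for future expansion. The data center name in the statement ('dc1' here) must match the name reported by the snitch, which you can check with nodetool status.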
CREATE KEYSPACE prod_keyspace
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3
};
Scenario 3: Multi-Region Global Service
For a two-region setup (e.g., Tokyo and Virginia), you can set different RF per DC.
CREATE KEYSPACE global_keyspace
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'tokyo': 3,
  'virginia': 2
};
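The primary region (Tokyo) uses RF=3 for full redundancy, while the DR region (Virginia) uses RF=2 to reduce storage cost. The trade-off is reduced availability in Virginia: with RF=2, LOCAL_QUORUM needs both replicas, so that data center cannot serve LOCAL_QUORUM requests for a partition while either of its replicas is down.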
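The per-data-center effect of these RF values can be worked out with a short Python sketch. `quorum_report` is a hypothetical helper; it applies the standard quorum formula, floor(n / 2) + 1, to each data center's RF for LOCAL_QUORUM and to the sum of all RFs for QUORUM.

```python
def quorum_report(replication: dict) -> dict:
    """Summarize quorum sizes for a NetworkTopologyStrategy replication map.

    Returns per-DC LOCAL_QUORUM sizes and tolerated replica failures,
    plus the cluster-wide QUORUM size computed from the sum of all RFs.
    """
    per_dc = {}
    for dc, rf in replication.items():
        needed = rf // 2 + 1
        per_dc[dc] = {"local_quorum": needed, "tolerated_down": rf - needed}
    total_rf = sum(replication.values())
    return {"per_dc": per_dc, "quorum": total_rf // 2 + 1, "total_rf": total_rf}


report = quorum_report({"tokyo": 3, "virginia": 2})
for dc, info in report["per_dc"].items():
    print(f"{dc}: LOCAL_QUORUM needs {info['local_quorum']}, "
          f"tolerates {info['tolerated_down']} replica(s) down")
print(f"QUORUM across both DCs needs {report['quorum']} of {report['total_rf']}")
```

For this keyspace, Tokyo tolerates one replica down at LOCAL_QUORUM and Virginia tolerates none, while a cluster-wide QUORUM needs 3 of the 5 replicas.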