DS201 Study Notes: Cassandra Replication

How to choose the right Cassandra replication factor (RF=1/2/3) and replication strategy (SimpleStrategy vs NetworkTopologyStrategy), with CQL examples and practical scenarios.

Key Takeaway: This article covers Cassandra’s replication factor (RF) settings and the choice between SimpleStrategy and NetworkTopologyStrategy. For production, RF=3 + NetworkTopologyStrategy is recommended.

DS201 Series Table of Contents (17 articles)
  1. Installing Cassandra on macOS
  2. CQL Basics
  3. Cassandra Partitions
  4. Cassandra Clustering Columns
  5. Cassandra Python Driver
  6. Cassandra nodetool Commands
  7. Cassandra Ring Structure
  8. Cassandra VNodes
  9. Cassandra Gossip Protocol
  10. Cassandra Snitches
  11. Cassandra Replication (this article)
  12. Cassandra Consistency Levels
  13. Cassandra Hinted Handoff
  14. Cassandra Read Repair
  15. Cassandra NodeSync
  16. Cassandra Write Path and Read Path
  17. Cassandra Compaction

Study notes from DS201: Foundations of Apache Cassandra™ and DataStax Enterprise.

What is Replication?

Cassandra employs replication to achieve high availability and fault tolerance by copying data to multiple nodes. This prevents data loss and service interruptions even when some nodes experience failures, enabling continuous operation.

Replication Factor (RF)

The Replication Factor (RF) specifies how many copies of each partition are stored in the cluster. The RF value is set per keyspace.

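| RF | Fault Tolerance | Storage | Write Cost |
|----|-----------------|---------|------------|
| 1 | None (data loss on node failure) | 1x | Low |
| 2 | Tolerates 1 node failure | 2x | Medium |
| 3 | Tolerates 2 node failures | 3x | High |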

RF = 3 is the commonly recommended value for production environments. With three replicas, every partition still has a surviving copy when two of its replica nodes are down at once, and QUORUM reads and writes keep working with one replica down.
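How many replica failures a request can absorb depends on the consistency level as well as the RF. The following is a minimal Python sketch of that arithmetic, not Cassandra code; the function names quorum and tolerated_failures are hypothetical helpers introduced only for illustration.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> dict:
    """Replica failures a single partition can absorb, per consistency level."""
    return {
        "ONE": rf - 1,              # one reachable replica is enough
        "QUORUM": rf - quorum(rf),  # a majority must still be reachable
        "ALL": 0,                   # every replica must respond
    }


if __name__ == "__main__":
    for rf in (1, 2, 3):
        print(f"RF={rf} quorum={quorum(rf)} tolerated={tolerated_failures(rf)}")
```

Under these assumptions, RF=2 gains nothing over RF=1 at QUORUM, because a quorum of two replicas is both of them.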

Setting RF with CQL

You specify the RF when creating a keyspace.

-- Create a keyspace with SimpleStrategy and RF=3
CREATE KEYSPACE my_keyspace
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};

-- Create a keyspace with NetworkTopologyStrategy
CREATE KEYSPACE production_keyspace
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 2
};

To modify the RF of an existing keyspace, use ALTER KEYSPACE.

ALTER KEYSPACE my_keyspace
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 2
};

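Changing the RF does not redistribute existing data automatically. If you increased the RF, run a full nodetool repair on the keyspace so the new replicas receive the existing data. If you decreased it, as in the example above, run nodetool cleanup on each node to remove data that the node no longer owns.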

Replication Strategy

Cassandra provides two replication strategies for user keyspaces. The strategy determines which nodes hold the replicas of each partition.

SimpleStrategy

Designed for clusters with a single data center. It places the first replica on the node that owns the partition’s token (the primary node), then places the remaining replicas on the next nodes clockwise around the ring.
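The clockwise walk can be sketched in Python as follows. This is a simplified illustration, not Cassandra's implementation: the ring uses small made-up tokens and one token per node (no vnodes), and the function name simple_strategy_replicas and the node names are hypothetical.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, token, rf):
    """Pick rf distinct nodes by walking the ring clockwise.

    ring  -- list of (token, node) pairs sorted by token
    token -- token of the partition being placed
    """
    tokens = [t for t, _ in ring]
    # The primary node is the first one whose token is >= the partition
    # token; past the largest token the walk wraps around to index 0.
    start = bisect_left(tokens, token) % len(ring)
    replicas = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


# Hypothetical four-node ring with illustrative tokens.
RING = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]

if __name__ == "__main__":
    print(simple_strategy_replicas(RING, token=30, rf=3))
    print(simple_strategy_replicas(RING, token=80, rf=3))
```

Note that the walk looks only at ring order. Nothing stops all three replicas from landing in the same rack, which is the limitation the bullets below describe.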

  • Suitable for development and testing environments
  • Does not consider rack or data center topology
  • Should not be used in multi-data center deployments

NetworkTopologyStrategy

Designed for clusters with multiple data centers and racks. Allows different RF per data center and distributes replicas across different racks within each data center.

  • Always recommended for production (even with a single DC, to prepare for future expansion)
  • Independent RF per data center
  • Better data protection against rack failures
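To illustrate the rack-aware placement, here is a deliberately simplified Python sketch for a single data center. It is an assumption-laden approximation rather than Cassandra's actual algorithm: it assumes one token per node, takes the ring order as given, and the function name nts_replicas_for_dc plus all node and rack names are hypothetical.

```python
def nts_replicas_for_dc(dc_ring, start, rf):
    """Choose rf replicas in one data center, preferring distinct racks.

    dc_ring -- list of (node, rack) pairs in ring order for this data center
    start   -- index of the first candidate node for the partition
    """
    replicas, seen_racks, skipped = [], set(), []
    for i in range(len(dc_ring)):
        node, rack = dc_ring[(start + i) % len(dc_ring)]
        if rack not in seen_racks:
            # First node seen in this rack: take it as a replica.
            replicas.append(node)
            seen_racks.add(rack)
        else:
            # Rack already holds a replica: remember the node as a fallback.
            skipped.append(node)
        if len(replicas) == rf:
            return replicas
    # Fewer racks than rf: fill the remainder from skipped nodes, in ring order.
    return replicas + skipped[: rf - len(replicas)]


# Hypothetical layouts used only for illustration.
THREE_RACKS = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2"),
               ("n4", "rack2"), ("n5", "rack3"), ("n6", "rack3")]
TWO_RACKS = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2")]

if __name__ == "__main__":
    print(nts_replicas_for_dc(THREE_RACKS, start=0, rf=3))
    print(nts_replicas_for_dc(TWO_RACKS, start=0, rf=3))
```

With three racks and RF=3, each replica ends up in a different rack, so losing one rack removes only one copy. With fewer racks than replicas, some rack necessarily holds more than one copy.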

Choosing a Strategy

| Scenario | Recommended Strategy |
|----------|----------------------|
| Development/Test (single node) | SimpleStrategy |
| Production (single DC) | NetworkTopologyStrategy |
| Production (multi DC) | NetworkTopologyStrategy |

Replication and Consistency Level

Replication strategy and RF are closely related to the Consistency Level. A quorum is floor(RF / 2) + 1 replicas, so with RF=3 and QUORUM consistency, both reads and writes require responses from 2 nodes.

The rule for strong consistency:

\[ W + R > RF \]

Where \(W\) is the number of nodes required for the write Consistency Level, and \(R\) is the number required for the read Consistency Level.
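The rule can be checked for concrete combinations with a short Python sketch. It assumes a single data center and covers only the consistency levels listed in the dictionary; required_replicas and is_strongly_consistent are hypothetical helper names, not part of any Cassandra driver.

```python
def required_replicas(level: str, rf: int) -> int:
    """Replicas that must acknowledge a request at the given consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    return levels[level]


def is_strongly_consistent(write_cl: str, read_cl: str, rf: int) -> bool:
    """True when W + R > RF, so every read overlaps the latest write."""
    w = required_replicas(write_cl, rf)
    r = required_replicas(read_cl, rf)
    return w + r > rf


if __name__ == "__main__":
    for write_cl, read_cl in [("QUORUM", "QUORUM"), ("ONE", "ONE"),
                              ("ALL", "ONE"), ("ONE", "QUORUM")]:
        ok = is_strongly_consistent(write_cl, read_cl, rf=3)
        print(f"W={write_cl:<6} R={read_cl:<6} RF=3 strong={ok}")
```

With RF=3, QUORUM/QUORUM gives 2 + 2 = 4 > 3 and satisfies the rule, while ONE/QUORUM gives 1 + 2 = 3, which does not.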

Practical Scenarios: Choosing RF and Strategy

Scenario 1: Development/Test Environment

For single-node development, SimpleStrategy + RF=1 is sufficient.

CREATE KEYSPACE dev_keyspace
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};

Scenario 2: Single-DC Production

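For clusters with 3+ nodes, use NetworkTopologyStrategy + RF=3 to prepare for future expansion. The data center name in the replication map ('dc1' here) must match the name reported by the snitch, which you can check with nodetool status.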

CREATE KEYSPACE prod_keyspace
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3
};

Scenario 3: Multi-Region Global Service

For a two-region setup (e.g., Tokyo and Virginia), you can set different RF per DC.

CREATE KEYSPACE global_keyspace
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'tokyo': 3,
  'virginia': 2
};

The primary region (Tokyo) uses RF=3 for full redundancy, while the DR region (Virginia) uses RF=2 to reduce costs while maintaining availability.
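The per-region trade-off can be made concrete with a small Python sketch. It only does the quorum arithmetic for the replication map above; the variable and function names are hypothetical, and the data center names are the ones used in this scenario.

```python
# Replication map from the global_keyspace example above.
REPLICATION = {"tokyo": 3, "virginia": 2}


def local_quorum(rf: int) -> int:
    """Replicas required in one data center at LOCAL_QUORUM."""
    return rf // 2 + 1


def summarize(replication: dict) -> dict:
    """Per-DC LOCAL_QUORUM size and replica failures tolerated at that level."""
    return {
        dc: {
            "local_quorum": local_quorum(rf),
            "tolerated_failures": rf - local_quorum(rf),
        }
        for dc, rf in replication.items()
    }


def global_quorum(replication: dict) -> int:
    """Replicas required at QUORUM, counted across all data centers."""
    return sum(replication.values()) // 2 + 1


if __name__ == "__main__":
    for dc, info in summarize(REPLICATION).items():
        print(dc, info)
    print("QUORUM across all DCs:", global_quorum(REPLICATION))
```

Under this arithmetic, Tokyo can lose one replica and still serve LOCAL_QUORUM requests, whereas Virginia with RF=2 needs both replicas at LOCAL_QUORUM. That is the price of the lower storage cost in the DR region.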
