DS201 Study Notes: Cassandra Replication

Study notes from DS201: Foundations of Apache Cassandra™ and DataStax Enterprise, covering replication factor settings and the replication strategies SimpleStrategy and NetworkTopologyStrategy.

What is Replication?

Cassandra employs replication to achieve high availability and fault tolerance by copying data to multiple nodes. This prevents data loss and service interruptions even when some nodes experience failures, enabling continuous operation.

Replication Factor (RF)

The Replication Factor (RF) specifies how many copies of each partition are stored in the cluster. The RF value is set per keyspace.

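| RF | Fault Tolerance | Storage | Write Cost |
|----|-----------------|---------|------------|
| 1 | None (data loss on node failure) | 1x | Low |
| 2 | Tolerates 1 node failure | 2x | Medium |
| 3 | Tolerates 2 node failures | 3x | High |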

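RF = 3 is the commonly recommended value for production environments. Each partition then has three copies, so no data is lost even when two of its replica nodes are down at the same time. Whether reads and writes keep succeeding in that situation depends on the consistency level: with QUORUM, 2 of the 3 replicas must respond, so operations tolerate only one replica being down.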
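The trade-off between RF, storage, and availability can be worked out with a few lines of arithmetic. The following is a minimal sketch, not part of the course material; the helper names `quorum` and `tolerated_down` are made up for illustration.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def tolerated_down(rf: int, required: int) -> int:
    """Replicas of a partition that may be down while `required` can still respond."""
    return rf - required


for rf in (1, 2, 3, 5):
    print(
        f"RF={rf}: storage {rf}x, "
        f"ONE tolerates {tolerated_down(rf, 1)} down, "
        f"QUORUM tolerates {tolerated_down(rf, quorum(rf))} down, "
        f"ALL tolerates {tolerated_down(rf, rf)} down"
    )
```

Note that RF = 2 gives no extra availability at QUORUM, because a majority of 2 is still 2. This is one reason odd values such as 3 are usually preferred.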

Setting RF with CQL

You specify the RF when creating a keyspace.

-- Create a keyspace with SimpleStrategy and RF=3
CREATE KEYSPACE my_keyspace
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};

-- Create a keyspace with NetworkTopologyStrategy
CREATE KEYSPACE production_keyspace
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 2
};

To modify the RF of an existing keyspace, use ALTER KEYSPACE.

ALTER KEYSPACE my_keyspace
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 2
};

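Changing the RF does not move existing data by itself. After increasing the RF, run nodetool repair (as a full repair) on each node so that existing data is streamed to the new replicas. After decreasing the RF, as in the example above, run nodetool cleanup on each node to remove data the node is no longer responsible for.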

Replication Strategy

Cassandra has several replication strategies that determine how data is replicated.

SimpleStrategy

Designed for clusters with a single data center. The first replica goes to the node that owns the partition's token, and the remaining RF - 1 replicas go to the next nodes clockwise around the ring.

  • Suitable for development and testing environments
  • Does not consider rack or data center topology
  • Should not be used in multi-data center deployments
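The clockwise placement used by SimpleStrategy can be sketched in a few lines. This is a simplified model under stated assumptions: one token per node (no vnodes), small integer tokens instead of real hash values, and a hypothetical function name `simple_strategy_replicas`. It is not Cassandra's actual implementation.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, token, rf):
    """Pick replicas by walking the ring clockwise.

    ring: list of (token, node) tuples sorted by token (assumed: one token per node).
    A node with token T owns the range (previous token, T].
    """
    tokens = [t for t, _ in ring]
    # First node whose token is >= the partition token; wrap around past the end.
    start = bisect_left(tokens, token) % len(ring)
    replicas = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


# Hypothetical four-node ring.
ring = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
print(simple_strategy_replicas(ring, token=60, rf=3))
# -> ['node-d', 'node-a', 'node-b']
```

Nothing in this walk looks at racks or data centers, which is why all three replicas can end up on the same rack.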

NetworkTopologyStrategy

Designed for clusters with multiple data centers and racks. Allows different RF per data center and distributes replicas across different racks within each data center.

  • Always recommended for production (even with a single DC, to prepare for future expansion)
  • Independent RF per data center
  • Better data protection against rack failures
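Rack-aware placement can be illustrated with a similar ring walk. The sketch below is a deliberately simplified model, not Cassandra's real algorithm: it assumes one token per node, uses made-up node, DC, and rack names, and the function `nts_replicas` is hypothetical. For each data center it walks the ring clockwise, prefers nodes on racks that do not hold a replica yet, and falls back to skipped nodes only when the data center has fewer racks than replicas.

```python
from bisect import bisect_left


def nts_replicas(ring, token, rf_per_dc):
    """Simplified rack-aware placement.

    ring: list of (token, node, dc, rack) tuples sorted by token.
    rf_per_dc: mapping of data center name to its replication factor.
    """
    tokens = [entry[0] for entry in ring]
    start = bisect_left(tokens, token) % len(ring)
    # Ring order starting at the node that owns the token.
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]

    result = {}
    for dc, rf in rf_per_dc.items():
        chosen, used_racks, skipped = [], set(), []
        for _, node, node_dc, rack in walk:
            if node_dc != dc:
                continue
            if len(chosen) == rf:
                break
            if rack in used_racks:
                skipped.append(node)  # rack already holds a replica; keep as fallback
            else:
                chosen.append(node)
                used_racks.add(rack)
        # Fewer racks than replicas: fill the remaining slots in ring order.
        chosen.extend(skipped[: rf - len(chosen)])
        result[dc] = chosen
    return result


# Hypothetical two-DC ring with two racks per DC.
ring = [
    (0, "a1", "dc1", "rack1"),
    (10, "a2", "dc1", "rack1"),
    (20, "b1", "dc2", "rack1"),
    (30, "a3", "dc1", "rack2"),
    (40, "b2", "dc2", "rack2"),
    (50, "a4", "dc1", "rack2"),
]
print(nts_replicas(ring, token=0, rf_per_dc={"dc1": 2, "dc2": 2}))
# -> {'dc1': ['a1', 'a3'], 'dc2': ['b1', 'b2']}
```

In the example, a2 is the next dc1 node after a1 on the ring, but it shares rack1 with a1, so the second dc1 replica goes to a3 on rack2 instead.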

Choosing a Strategy

| Scenario | Recommended Strategy |
|----------|----------------------|
| Development/Test (single node) | SimpleStrategy |
| Production (single DC) | NetworkTopologyStrategy |
| Production (multi DC) | NetworkTopologyStrategy |

Replication and Consistency Level

Replication strategy and RF are closely related to the Consistency Level. QUORUM requires a majority of the replicas, that is floor(RF / 2) + 1, so with RF=3 and QUORUM consistency both reads and writes require responses from 2 replicas.

The rule for strong consistency:

\[ W + R > RF \]

Where \(W\) is the number of nodes required for the write Consistency Level, and \(R\) is the number required for the read Consistency Level.
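To see which combinations of write and read Consistency Levels satisfy this rule, the check can be enumerated for RF = 3. This is an illustrative sketch only; the helper names `quorum` and `is_strongly_consistent` are assumptions, and just the levels ONE, QUORUM, and ALL are considered.

```python
def quorum(rf: int) -> int:
    """Replicas required at QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def is_strongly_consistent(w: int, r: int, rf: int) -> bool:
    """True when every read set must overlap every write set: W + R > RF."""
    return w + r > rf


rf = 3
levels = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}

for write_cl, w in levels.items():
    for read_cl, r in levels.items():
        verdict = "strong" if is_strongly_consistent(w, r, rf) else "eventual"
        print(f"write={write_cl:<6} read={read_cl:<6} W+R={w + r} -> {verdict}")
```

With RF = 3, QUORUM writes combined with QUORUM reads give 2 + 2 = 4 > 3, so at least one replica in every read has seen the latest write. Writing and reading at ONE gives 1 + 1 = 2, which does not satisfy the rule.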

DS201 Series

This article is part of the DS201 study notes series.