Study notes from DS201: Foundations of Apache Cassandra™ and DataStax Enterprise.
NodeSync
NodeSync is a feature provided by DataStax Enterprise (DSE) that continuously verifies data consistency within a Cassandra cluster and automatically repairs inconsistencies. It is designed as a more efficient and lower-overhead alternative to traditional anti-entropy repair (e.g., nodetool repair) for maintaining data consistency.
Key Features of NodeSync
- Continuous verification and repair: Continuously verifies data synchronization across all replicas and automatically performs repairs when inconsistencies are found.
- Low overhead: Always runs in the background with minimal impact on cluster performance.
- Fully automated: Once NodeSync is enabled on a table, consistency maintenance for that table requires no manual intervention.
- Anti-entropy repair replacement: Functions as an alternative to the traditional nodetool repair command, providing a more efficient method for ensuring data consistency.
How It Works
The NodeSync service runs on each node by default. NodeSync can be enabled per table, and it continuously validates the local data ranges of enabled tables.
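As a minimal sketch of the per-table switch: in DSE, NodeSync is turned on through the nodesync table option in CQL. The helper below only builds the statement text. The function name nodesync_statement and the keyspace and table names are hypothetical, and running the statement would need a CQL session, which is not shown here.

```python
def nodesync_statement(keyspace: str, table: str, enabled: bool = True) -> str:
    """Build the CQL text that switches NodeSync on or off for one table."""
    flag = "true" if enabled else "false"
    # Doubled braces produce literal { } in the f-string output.
    return f"ALTER TABLE {keyspace}.{table} WITH nodesync = {{'enabled': '{flag}'}};"


# Hypothetical keyspace and table names, for illustration only.
print(nodesync_statement("killrvideo", "videos"))
```

The printed statement could then be run from cqlsh or a driver session against the cluster.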
- Segment division: Data ranges are divided into small units called “segments.” This keeps the amount of data processed at once small, reducing the impact on resources.
- Prioritization: Segments are prioritized based on the need for validation.
- Validation and repair: NodeSync selects segments, reads the data within them, and checks for inconsistencies between replicas. If inconsistencies are found, it performs repairs (data synchronization) as needed.
- Recording validation state: The validation state of each segment is stored in the system_distributed.nodesync_status table. This allows administrators to monitor NodeSync’s progress and results.
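The segment division and prioritization steps above can be illustrated with a toy model. This is a sketch under stated assumptions, not how DSE implements NodeSync: the Segment class, split_range, and next_segment are hypothetical names, and the priority rule (never-validated segments first, then the oldest validation) is an assumed stand-in for "need for validation."

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: int                              # exclusive lower token bound
    end: int                                # inclusive upper token bound
    last_validated: Optional[float] = None  # timestamp; None means never validated


def split_range(start: int, end: int, count: int) -> List[Segment]:
    """Divide the token range (start, end] into `count` contiguous segments."""
    width = end - start
    bounds = [start + width * i // count for i in range(count + 1)]
    return [Segment(lo, hi) for lo, hi in zip(bounds, bounds[1:])]


def next_segment(segments: List[Segment]) -> Segment:
    """Pick the segment most in need of validation.

    Assumed rule: never-validated segments first, then the oldest validation.
    """
    return min(
        segments,
        key=lambda s: (s.last_validated is not None, s.last_validated or 0.0),
    )


# Toy range and timestamps, for illustration only.
segments = split_range(0, 1000, 4)
segments[0].last_validated = 50.0
segments[1].last_validated = 10.0
segments[2].last_validated = 30.0
chosen = next_segment(segments)  # the fourth segment: it has never been validated
```

Keeping each segment small is what bounds the work done per validation, and the priority function is the only place the scheduling policy lives, so a different rule could be swapped in without touching the rest.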
Segment Validation States
The system_distributed.nodesync_status table records the following segment validation states:
- successful: Validation completed normally and consistency was confirmed.
  - full_in_sync: The entire segment is fully synchronized.
  - full_repaired: The entire segment has been successfully repaired.
- unsuccessful: Validation failed.
  - partial_in_sync: Part of the segment is synchronized.
  - partial_repaired: Part of the segment has been repaired.
  - uncompleted: Validation is incomplete.
  - failed: Validation was interrupted due to an error.
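When reviewing validation results, it can help to roll the detailed states up into the two broad groups. The sketch below assumes the grouping suggested by the order of the states above (full_* under successful, everything else under unsuccessful). The names classify and summarize are hypothetical, and the sample outcomes are made up for illustration rather than read from a real cluster.

```python
from collections import Counter
from typing import Iterable

# Assumed grouping of the detailed states into the two broad groups.
SUCCESSFUL = {"full_in_sync", "full_repaired"}
UNSUCCESSFUL = {"partial_in_sync", "partial_repaired", "uncompleted", "failed"}


def classify(outcome: str) -> str:
    """Map a detailed validation state to 'successful' or 'unsuccessful'."""
    if outcome in SUCCESSFUL:
        return "successful"
    if outcome in UNSUCCESSFUL:
        return "unsuccessful"
    raise ValueError(f"unknown validation state: {outcome!r}")


def summarize(outcomes: Iterable[str]) -> Counter:
    """Count how many segments fall into each broad group."""
    return Counter(classify(o) for o in outcomes)


# Made-up sample of per-segment states.
sample = ["full_in_sync", "full_repaired", "partial_in_sync", "failed", "full_in_sync"]
summary = summarize(sample)
print(summary["successful"], summary["unsuccessful"])
```

Raising on an unknown state, rather than silently counting it, makes it obvious if the assumed list of states is incomplete.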
NodeSync is an important feature that greatly simplifies data consistency maintenance operations in DSE clusters and improves reliability.