Cassandra Cluster Synchronization

Cassandra is a highly-distributable NoSQL database with tunable consistency. What makes it highly distributable makes it also, in part, vulnerable: the whole deployment must run on synchronized clocks.

It’s quite surprising that, given how crucial this is, it is not covered sufficiently in literature. And, if it is, it simply refers to installation of a NTP daemon on each node which – if followed blindly – leads to really bad consequences. You will find blog posts by users who got burned by clock drifting. [Holub]

Cassandra labels updates with times and then uses those times to figure out the freshest update. Suppose machine A and B are both working with a database of records. B updates record R, then A  updates record R and 3 out of 5 machines holding copies of R get this second update. Then B asks for record R and the read operation gets 5 responses – it can throw away the stale copies of R and forward the fresh copy because the timestamp on the update from A is more recent than those on the older copies.   But this only works if the times are right – which means you need solid clock synchronization. Time synchronization clients like NTPd and PTDp (and variants) will silently fail, so the system can lose synchronization without  notice.  This will silently corrupt data. One of the most sophisticated parts of TimeKeeper is the algorithm to detect incorrect times, to failover where possible, and to alarm where not possible.

See: TimeKeeper, Clouds, Big Data