Posts

Showing posts from December, 2015

Cassandra data modelling: Redundant data, a tough decision

The biggest challenge in building an efficient data model for Cassandra is data redundancy. The basic rules for data modelling with Cassandra list the usual RDBMS modelling goals as non-goals (refer: Basic rules for C* data modelling). Those rules, however, build on the assumptions that clusters run on commodity hardware, that storage is cheap, and that as data needs grow, more nodes can be added to the cluster at very low cost. In real life we face technical as well as non-technical problems:

a. Keeping multiple column families (CFs) in sync can be a major overhead if the same data is spread across them. What if writes to some CFs succeed and others fail? How long, and how many times, will we retry?

b. Horizontal scalability may be real, but consider the mundane question of where to keep all those heat-producing, energy-guzzling machines.

So how do we model a database that does not allow joins without redundancy? The simple answer is, we do not. Wh...
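To make the sync problem in point (a) concrete, here is a minimal Python sketch. It does not talk to Cassandra at all: the column families are plain dictionaries, and the names `write_to_all`, `FlakyWriter`, `users_by_id`, and `users_by_email` are hypothetical, chosen only for illustration. It shows how a bounded retry can still leave redundant copies out of step, which is exactly the decision the questions above are pointing at.

```python
class FlakyWriter:
    """Simulated writer (an assumption for this sketch, not a driver API).

    Fails a fixed number of times per column family before succeeding.
    """

    def __init__(self, failures_by_cf):
        self.remaining = dict(failures_by_cf)

    def __call__(self, cf_name, cf, key, row):
        if self.remaining.get(cf_name, 0) > 0:
            self.remaining[cf_name] -= 1
            raise IOError("simulated write failure on " + cf_name)
        cf[key] = dict(row)  # store a copy of the redundant row


def write_to_all(column_families, key, row, writer, max_attempts=3):
    """Write the same row to every column family with bounded retries.

    Returns the names of the column families that are still missing
    the row after max_attempts tries each.
    """
    failed = []
    for cf_name, cf in column_families.items():
        for _ in range(max_attempts):
            try:
                writer(cf_name, cf, key, row)
                break  # this column family is up to date
            except IOError:
                continue  # retry the same column family
        else:
            failed.append(cf_name)  # retries exhausted
    return failed


# Two redundant copies of the same data (same key used in both for brevity).
cfs = {"users_by_id": {}, "users_by_email": {}}
writer = FlakyWriter({"users_by_email": 5})

out_of_sync = write_to_all(cfs, "u1", {"email": "a@example.com"}, writer)
print("still out of sync:", out_of_sync)
```

In this sketch the caller gets back the list of column families that missed the write and has to decide what to do next: retry later, queue a repair, or surface the error. Each of those is part of the overhead of keeping redundant data.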