Oct 29 2009

HBase vs. Cassandra: NoSQL Battle!


Distributed, scalable databases are desperately needed these days. From building massive data warehouses at a social media startup to protein folding analysis at a biotech company, “Big Data” is becoming more important every day. While Hadoop has emerged as the de facto standard for handling big data problems, there are still quite a few distributed databases out there, and each has its own strengths.

Two databases have garnered the most attention: HBase and Cassandra. The differences between these equally ambitious projects fall into two categories: Features (things that are missing but could be added at any time) and Architecture (fundamental differences that can’t be coded away). HBase is a near-clone of Google’s BigTable, whereas Cassandra purports to be a “BigTable/Dynamo hybrid”.

In my opinion, while Cassandra’s “writes-never-fail” emphasis has its advantages, HBase is the more robust database for the majority of use cases. Cassandra relies mostly on key-value pairs for storage, with a table-like structure added to make richer data structures possible. And far more people are using HBase than Cassandra at this moment, even though the two projects are similarly recent.

Let’s explore the differences between the two in more detail…
