Jul 10 2009

Decline of the Enterprise Data Warehouse

(…due to Hadoop, HBase, and Hive)

If you wanted to generate intelligence or other OLAPish things from large amounts of data (TBs) three years ago, you faced a terrifying prospect: You bought a server (or several large servers) costing $x00,000, paid for software from Oracle, Sybase, or IBM, and then prayed to Data Jesus that it would meet your needs for a few years. You’d find all the data you could in all the disparate formats your company used, try to cram it into tables, have data analysts schedule SQL queries, and output your reports. This was by far the most common use case for “big data analysis”.

Then Social Media happened: small businesses needed to process terabytes, web-scale companies needed to process petabytes, and neither could afford the millions of dollars to do it. With the advent of the Hadoop and HBase ecosystem, however, anyone will soon be able to scale their data warehousing in predictable, affordable ways.

The established and new vendors in this space may become irrelevant to a decent portion of businesses. They recognize there’s a problem, but their solutions so far do nothing to address the fundamental issues. They won’t disappear, but the high-cost Enterprise Data Warehouse is no longer the only solution out there, and its significance will continue to dwindle.

Hadoop represents the first radical shift in a long time in an industry that was built around “pay big or go home”.



Jun 19 2009

Social Media Kills the Database

(or: How the Greatest Tool of the 1980s Crippled a Generation, and How Hadoop and HBase Help)

Postnote: This isn’t about the complete death of the RDBMS, just the death of the idea that it’s a tool meant for all your structured data storage needs.

(Like this? Check out our data platform startup, Drawn to Scale. Fill out the contact form and we’ll drop you a line!)

Imagine someone told you that there was a piece of software out there which could change the way you do everything, and I mean everything. This code is so perpetually practical, it’s a Swiss Army Knife! Surely it can handle the simple data storage and analytics you want to do on a ‘Social Media’ scale. For example, you can:

  • Store pieces of content for your website so it’s easy to manage
  • Store content from EVERY website on the Internet
  • Track your company’s financial transactions
  • Track every credit card transaction for your multibillion-dollar banking conglomerate
  • Store an address so it’s easy to find by a unique key
  • Store every photo on Facebook so it’s accessible by a unique key
  • Generate charts and graphs on how your business did in the last month
  • Generate charts and graphs on key analytics for every piece of Social Media on the Internet
  • Analyze records of how people have accessed your website and attempt to simulate their behavior
  • Analyze every click ever made on MySpace and optimize intrasite workflow

…Well, in theory you can do the above.

Does it make sense that people can install the exact same software package and be able to do this at all scales? Just throw bigger computers at the problem? It’s counter-intuitive to me, and it sure doesn’t work in reality. With the coming need to analyze things not at enterprise scale but at Web Scale, social media is driving the final stake into the large analytical RDBMS (Relational Database Management System).

The ACIDy, transactional RDBMS doesn’t scale, and it needs to be relegated to the proper dustbin before it does any more damage to engineers trying to write scalable software.
