Decline of the Enterprise Data Warehouse
(…due to Hadoop, HBase, and Hive)
If you wanted to generate business intelligence or do other OLAPish things with large amounts of data (TBs) three years ago, you faced a terrifying prospect: you bought a server (or several large servers) costing $x00,000, paid for software from Oracle, Sybase, or IBM, and then prayed to Data Jesus that it would meet your needs for a few years. You’d find all the data you could, in all the disparate formats your company used, try to cram it into tables, have data analysts schedule SQL queries, and output your reports. This was by far the most common use case for “big data analysis”.
Then social media happened. Small businesses needed to process TBs, web-scale companies needed to process PBs, and neither could afford to spend $millions doing it. With the advent of the Hadoop and HBase ecosystem, however, anyone will soon be able to scale their data warehousing in predictable, affordable ways.
The established and new vendors in this space may become irrelevant to a decent portion of businesses. They recognize there’s a problem, but their solutions so far do nothing to address the fundamental issues. They won’t disappear, but the high-cost Enterprise Data Warehouse is no longer the only option, and its significance will continue to dwindle.
Hadoop represents the first radical shift in a long time in an industry that was built around “pay big or go home”.
Hi, I'm Bradford. I write about scalability and the fringes of Computer Science.