<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Decline of the Enterprise Data Warehouse</title>
	<atom:link href="http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/</link>
	<description>The Fringes of Scalability, Social Media, and Computer Science.</description>
	<lastBuildDate>Fri, 12 Mar 2010 17:10:58 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Martyn Richard Jones</title>
		<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/comment-page-1/#comment-249</link>
		<dc:creator>Martyn Richard Jones</dc:creator>
		<pubDate>Tue, 29 Sep 2009 18:18:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=42#comment-249</guid>
		<description>Hadoop, HBase, and Hive are interesting technologies, but they don&#039;t signal the end of Data Warehousing and Business Intelligence as we know it. Sorry, great article, and I&#039;ll keep an eye on the progress, but this isn&#039;t even a remote threat to the major DW players, never mind the niche players on the DB/DW adapter side.

Just wait until correlation database technologies get their feet under the DW/BI table.  

Cheers, Martyn</description>
		<content:encoded><![CDATA[<p>Hadoop, HBase, and Hive are interesting technologies, but they don&#8217;t signal the end of Data Warehousing and Business Intelligence as we know it. Sorry, great article, and I&#8217;ll keep an eye on the progress, but this isn&#8217;t even a remote threat to the major DW players, never mind the niche players on the DB/DW adapter side.</p>
<p>Just wait until correlation database technologies get their feet under the DW/BI table.  </p>
<p>Cheers, Martyn</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Roddy</title>
		<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/comment-page-1/#comment-115</link>
		<dc:creator>Roddy</dc:creator>
		<pubDate>Tue, 11 Aug 2009 21:06:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=42#comment-115</guid>
		<description>@DanTheMan: actually, here at Facebook, Hive is wildly popular, much more so than our RDBMS deployment; almost 30% of employees have used it.  More than half of them are non-engineers.  Check out the note at http://www.facebook.com/data.</description>
		<content:encoded><![CDATA[<p>@DanTheMan: actually, here at Facebook, Hive is wildly popular, much more so than our RDBMS deployment; almost 30% of employees have used it.  More than half of them are non-engineers.  Check out the note at <a href="http://www.facebook.com/data" rel="nofollow">http://www.facebook.com/data</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DanTheMan</title>
		<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/comment-page-1/#comment-64</link>
		<dc:creator>DanTheMan</dc:creator>
		<pubDate>Thu, 30 Jul 2009 18:57:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=42#comment-64</guid>
		<description>Hadoop is great.  MapReduce is great.  I&#039;ve read everything I can find on the web, pro and con, comparing it with data warehouses and other techniques.  It&#039;s only the uninformed, immature coder who thinks Hadoop and MR will replace traditional data warehouses.  I notice most of the people saying the DW is bad are using the most inefficient SQL database: MySQL.  
OK, I yield on price.  Free open source is hard to beat.  

But what most MR/Hadoop aficionados overlook is that the only way to use Hadoop is through a programmer.  A million years ago, the SQL guys were like that.  But today, the business user has cool tools to access the data: slice, dice, OLAP, pivot, etc.  They don&#039;t need no stinking programmers.  The business user does amazing things and makes incredible discoveries when interacting with the data directly.  Hadoop has a long way to go to get there.  Hive and Pig are a good start, but quite young and immature.

The other thing Hadoop folks ignore is performance.  They say &quot;well, the database takes 20 minutes and Hadoop takes 2 hours, but given that Hadoop is free, it&#039;s worth it.&quot;  What they leave out is the 1-8 hours of programmer labor needed to get the Hadoop MR job built, tested, and finally running.  We can have endless debate over every labor cost and job needed to run (ETL, load times, programming, operations, etc.), but at the end of the day, the data warehouse stuff is strong, works well, and is often already running.  

I suggest a better way to think of this.  Hadoop does not destroy or eliminate the data warehouse.  There are many things the DW does that Hadoop/Hive/Pig cannot do and might never do.  At the same time, Hadoop/Pig/Hive have a place in this world, and it&#039;s not yet settled what that place is.  We don&#039;t need to piss on one another to win.  Pissing contests always end up with two stinkers.  Hadoop/MR is a tool.  Use it wisely and it will yield amazing things.  The Data Warehouse is a tool.  It too produces fantastic things.  And don&#039;t give me that whining about cost; I&#039;ve seen many multi-million dollar warehouses sold when the alternative was free or $20K.  If the ROI is 800%, the cost of the warehouse means nothing.  We only care that we have happy execs and users (emphasis on users, not programmers) AND that the right tool for the job was used.  

Last, Hadoop/MR has NOT proven itself in the market.  A few dozen sites are doing great things.  By the model in Geoffrey Moore&#039;s Crossing the Chasm, Hadoop/MR is at the beginning of the adoption curve.  That stage is called the lunatic fringe: the earliest adopters.  Look at the Gartner Hype Cycles: Hadoop/MR hasn&#039;t even hit the charts, while cloud computing is roaring along.  These things follow a consistent pattern.  Hadoop/MR is not mainstream stuff and is primarily limited to a few dozen dot-coms.  I&#039;ve been doing this a long time, and Hadoop/MR has all the earmarks of becoming mainstream in the next 5-7 years.  Today, it&#039;s still risky stuff, but marvelous in our eyes.  Hadoop and EDW will sit side by side in the same site for the next 10-15 years.  So why the fight?  Use the right tool for the job.</description>
		<content:encoded><![CDATA[<p>Hadoop is great.  MapReduce is great.  I&#8217;ve read everything I can find on the web, pro and con, comparing it with data warehouses and other techniques.  It&#8217;s only the uninformed, immature coder who thinks Hadoop and MR will replace traditional data warehouses.  I notice most of the people saying the DW is bad are using the most inefficient SQL database: MySQL.<br />
OK, I yield on price.  Free open source is hard to beat.  </p>
<p>But what most MR/Hadoop aficionados overlook is that the only way to use Hadoop is through a programmer.  A million years ago, the SQL guys were like that.  But today, the business user has cool tools to access the data: slice, dice, OLAP, pivot, etc.  They don&#8217;t need no stinking programmers.  The business user does amazing things and makes incredible discoveries when interacting with the data directly.  Hadoop has a long way to go to get there.  Hive and Pig are a good start, but quite young and immature.</p>
<p>The other thing Hadoop folks ignore is performance.  They say &#8220;well, the database takes 20 minutes and Hadoop takes 2 hours, but given that Hadoop is free, it&#8217;s worth it.&#8221;  What they leave out is the 1-8 hours of programmer labor needed to get the Hadoop MR job built, tested, and finally running.  We can have endless debate over every labor cost and job needed to run (ETL, load times, programming, operations, etc.), but at the end of the day, the data warehouse stuff is strong, works well, and is often already running.  </p>
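<p>To make the time comparison above concrete, here is a minimal Python sketch that adds one-time programmer labor to each system's run time. The 20-minute, 2-hour, and 1-8 hour figures come from the comment; the helper name and the assumption that the warehouse query needs no build time (because the warehouse is already running) are hypothetical.</p>

```python
def time_to_first_answer(run_minutes: float, build_hours: float) -> float:
    """Hours until the first result: one-time build/test labor plus one run."""
    return build_hours + run_minutes / 60.0

# Assumption: the existing warehouse needs no new build work for this query.
warehouse = time_to_first_answer(run_minutes=20, build_hours=0)
# Hadoop job: 2-hour run plus 1 to 8 hours of programmer labor (from the comment).
hadoop_best = time_to_first_answer(run_minutes=120, build_hours=1)
hadoop_worst = time_to_first_answer(run_minutes=120, build_hours=8)

print(f"warehouse: {warehouse:.2f} h")                      # warehouse: 0.33 h
print(f"hadoop:    {hadoop_best:.2f}-{hadoop_worst:.2f} h")  # hadoop:    3.00-10.00 h
```

<p>Under these assumptions the first Hadoop answer arrives after 3 to 10 hours, versus about 20 minutes from the warehouse; re-running an already-built job would narrow that gap, since the labor is paid only once.</p>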
<p>I suggest a better way to think of this.  Hadoop does not destroy or eliminate the data warehouse.  There are many things the DW does that Hadoop/Hive/Pig cannot do and might never do.  At the same time, Hadoop/Pig/Hive have a place in this world, and it&#8217;s not yet settled what that place is.  We don&#8217;t need to piss on one another to win.  Pissing contests always end up with two stinkers.  Hadoop/MR is a tool.  Use it wisely and it will yield amazing things.  The Data Warehouse is a tool.  It too produces fantastic things.  And don&#8217;t give me that whining about cost; I&#8217;ve seen many multi-million dollar warehouses sold when the alternative was free or $20K.  If the ROI is 800%, the cost of the warehouse means nothing.  We only care that we have happy execs and users (emphasis on users, not programmers) AND that the right tool for the job was used.  </p>
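<p>The ROI remark above is simple arithmetic: ROI = (return - cost) / cost, so an 800% ROI means the total return is nine times the cost. The Python sketch below works one case; the $2M warehouse cost is a hypothetical figure, not one from the comment.</p>

```python
def roi(total_return: float, cost: float) -> float:
    """Return on investment as a fraction: (return - cost) / cost."""
    return (total_return - cost) / cost

def return_needed(cost: float, target_roi: float) -> float:
    """Total return required to reach a target ROI for a given cost."""
    return cost * (1 + target_roi)

# Hypothetical: a $2M warehouse evaluated against the 800% ROI in the comment.
cost = 2_000_000
needed = return_needed(cost, target_roi=8.0)
print(f"return needed: ${needed:,.0f}")       # return needed: $18,000,000
print(f"check ROI: {roi(needed, cost):.0%}")  # check ROI: 800%
```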
<p>Last, Hadoop/MR has NOT proven itself in the market.  A few dozen sites are doing great things.  By the model in Geoffrey Moore&#8217;s Crossing the Chasm, Hadoop/MR is at the beginning of the adoption curve.  That stage is called the lunatic fringe: the earliest adopters.  Look at the Gartner Hype Cycles: Hadoop/MR hasn&#8217;t even hit the charts, while cloud computing is roaring along.  These things follow a consistent pattern.  Hadoop/MR is not mainstream stuff and is primarily limited to a few dozen dot-coms.  I&#8217;ve been doing this a long time, and Hadoop/MR has all the earmarks of becoming mainstream in the next 5-7 years.  Today, it&#8217;s still risky stuff, but marvelous in our eyes.  Hadoop and EDW will sit side by side in the same site for the next 10-15 years.  So why the fight?  Use the right tool for the job.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ali Sohani</title>
		<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/comment-page-1/#comment-58</link>
		<dc:creator>Ali Sohani</dc:creator>
		<pubDate>Sun, 26 Jul 2009 10:14:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=42#comment-58</guid>
		<description>There is no arguing that MapReduce/Hadoop has already proven itself to be a highly scalable &amp; fault-tolerant mechanism in the cloud for data-intensive operations such as computing and aggregating; at the end of the day, it’s the same old distributed computing via grid jobs that’s showing its magic.

The whole NoSQL movement, and everything related to it, isn&#039;t about shedding everything that exists and going a new way; it&#039;s about making people aware, letting them out of local maxima and helping them see the world beyond, which is to realize &quot;There&#039;s more than one way to do it&quot; (the Perl mantra), and there always is.

Always sticking to traditional approaches or systems for a next-generation or different sort of job surely isn&#039;t the way to go; we have to change the solution space when the problem space changes.

I think the future is about hybrid technology, to the point where it won&#039;t even need to be called hybrid... We already see combinations of technologies complementing each other: the [Hadoop + HBase + Hive] stack Bradford mentioned, Facebook’s [Hadoop + Cassandra + Hive], and LinkedIn’s [Hadoop + Voldemort + RDBMS (Oracle, MySQL)]. (Reference: http://developer.yahoo.net/blog/archives/2009/06/nosql_meetup.html)

The line between traditional RDBMS &amp; NoSQL systems is already blurring: HadoopDB (http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf) combines the power of [Hadoop + RDBMS (PostgreSQL) + Hive] to do the job. MongoDB and Yahoo Sherpa are both working to provide a scalable data storage system with as many friendly querying capabilities as possible. (Reference: http://developer.yahoo.net/blog/archives/2009/06/nosql_meetup.html)

I believe big vendors like Oracle will very soon introduce such parallel DBMSs too, with a hybrid combination of some of these NoSQL approaches in the back end, as the closed-source data warehousing vendors Greenplum (www.greenplum.com/technology/mapreduce) and Aster Data (http://www.asterdata.com/product/mapreduce.php) already have. 
(Reference: http://www.dbms2.com/2008/08/26/why-mapreduce-matters-to-sql-data-warehousing)

The future belongs to parallel DBMSs that function as one package (Reference: http://www.computerworld.com/s/article/print/9131526/Researchers_Databases_still_beat_Google_s_MapReduce), where we won’t need to worry about integrating components to make it work. Even when that arrives, of course, no single solution will cover every problem; problems will evolve along with our solutions, and we will have to adapt and keep picking the hat that (most closely) fits the head.</description>
		<content:encoded><![CDATA[<p>There is no arguing that MapReduce/Hadoop has already proven itself to be a highly scalable &amp; fault-tolerant mechanism in the cloud for data-intensive operations such as computing and aggregating; at the end of the day, it’s the same old distributed computing via grid jobs that’s showing its magic.</p>
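<p>For readers new to the model, here is a minimal single-process Python sketch of the map, shuffle, and reduce phases behind such compute-and-aggregate jobs, using word count as the example. It only mimics the data flow; it is not Hadoop's API, and a real cluster runs these phases in parallel across machines and re-runs failed tasks for fault tolerance.</p>

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input document."""
    for word in doc.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Aggregate all values that share one key."""
    return key, sum(values)

def word_count(docs: list[str]) -> dict[str, int]:
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["hadoop hive", "hive hbase hive"]))
# {'hadoop': 1, 'hive': 3, 'hbase': 1}
```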
<p>The whole NoSQL movement, and everything related to it, isn&#8217;t about shedding everything that exists and going a new way; it&#8217;s about making people aware, letting them out of local maxima and helping them see the world beyond, which is to realize &#8220;There&#8217;s more than one way to do it&#8221; (the Perl mantra), and there always is.</p>
<p>Always sticking to traditional approaches or systems for a next-generation or different sort of job surely isn&#8217;t the way to go; we have to change the solution space when the problem space changes.</p>
<p>I think the future is about hybrid technology, to the point where it won&#8217;t even need to be called hybrid&#8230; We already see combinations of technologies complementing each other: the [Hadoop + HBase + Hive] stack Bradford mentioned, Facebook’s [Hadoop + Cassandra + Hive], and LinkedIn’s [Hadoop + Voldemort + RDBMS (Oracle, MySQL)]. (Reference: <a href="http://developer.yahoo.net/blog/archives/2009/06/nosql_meetup.html" rel="nofollow">http://developer.yahoo.net/blog/archives/2009/06/nosql_meetup.html</a>)</p>
<p>The line between traditional RDBMS &amp; NoSQL systems is already blurring: HadoopDB (<a href="http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf" rel="nofollow">http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf</a>) combines the power of [Hadoop + RDBMS (PostgreSQL) + Hive] to do the job. MongoDB and Yahoo Sherpa are both working to provide a scalable data storage system with as many friendly querying capabilities as possible. (Reference: <a href="http://developer.yahoo.net/blog/archives/2009/06/nosql_meetup.html" rel="nofollow">http://developer.yahoo.net/blog/archives/2009/06/nosql_meetup.html</a>)</p>
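<p>Roughly, the HadoopDB idea is to push SQL down into a database on each node and let the Hadoop layer coordinate the work and combine the partial results. The sketch below imitates that shape in a single Python process: in-memory sqlite3 databases stand in for the per-node PostgreSQL instances, and a plain Python merge stands in for the reduce step. The table name, columns, and rows are hypothetical, and none of this is HadoopDB's actual interface.</p>

```python
import sqlite3
from collections import Counter

# Hypothetical page-view rows, spread round-robin across three "nodes".
ROWS = [("us", 3), ("de", 5), ("us", 2), ("fr", 1), ("de", 4), ("us", 7)]
NODES = 3

def make_node() -> sqlite3.Connection:
    """One in-memory database standing in for a node's local RDBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE views (country TEXT, hits INTEGER)")
    return conn

nodes = [make_node() for _ in range(NODES)]
for i, row in enumerate(ROWS):
    # Round-robin placement keeps the example deterministic.
    nodes[i % NODES].execute("INSERT INTO views VALUES (?, ?)", row)

# "Map": every node runs the same SQL aggregate over its local partition.
local_sql = "SELECT country, SUM(hits) FROM views GROUP BY country"
partials = [conn.execute(local_sql).fetchall() for conn in nodes]

# "Reduce": merge the partial sums by key.
totals: Counter = Counter()
for partial in partials:
    for country, hits in partial:
        totals[country] += hits

print(dict(sorted(totals.items())))  # {'de': 9, 'fr': 1, 'us': 12}
```

<p>Doing the GROUP BY inside each local database means only small per-node summaries travel to the merge step, which is the main appeal of pairing an RDBMS with a MapReduce layer.</p>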
<p>I believe big vendors like Oracle will very soon introduce such parallel DBMSs too, with a hybrid combination of some of these NoSQL approaches in the back end, as the closed-source data warehousing vendors Greenplum (www.greenplum.com/technology/mapreduce) and Aster Data (<a href="http://www.asterdata.com/product/mapreduce.php" rel="nofollow">http://www.asterdata.com/product/mapreduce.php</a>) already have.<br />
(Reference: <a href="http://www.dbms2.com/2008/08/26/why-mapreduce-matters-to-sql-data-warehousing" rel="nofollow">http://www.dbms2.com/2008/08/26/why-mapreduce-matters-to-sql-data-warehousing</a>)</p>
<p>The future belongs to parallel DBMSs that function as one package (Reference: <a href="http://www.computerworld.com/s/article/print/9131526/Researchers_Databases_still_beat_Google_s_MapReduce" rel="nofollow">http://www.computerworld.com/s/article/print/9131526/Researchers_Databases_still_beat_Google_s_MapReduce</a>), where we won’t need to worry about integrating components to make it work. Even when that arrives, of course, no single solution will cover every problem; problems will evolve along with our solutions, and we will have to adapt and keep picking the hat that (most closely) fits the head.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bradford</title>
		<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/comment-page-1/#comment-54</link>
		<dc:creator>Bradford</dc:creator>
		<pubDate>Wed, 22 Jul 2009 22:36:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=42#comment-54</guid>
		<description>You know, that&#039;s a really interesting use case. For a lot of the scenarios I&#039;m talking about, there are few &quot;ad-hoc&quot; queries. I can see that if you&#039;re running data at that sort of scale (and classic analysis), you&#039;d want something like Oracle, Greenplum, etc. For our needs, we&#039;ve found that&#039;s way overkill. 

How large is your data size, how many concurrent requests do you have, what software are you using, and how much did everything cost? 

You can e-mail me if you want, I&#039;m truly interested :)</description>
		<content:encoded><![CDATA[<p>You know, that&#8217;s a really interesting use case. For a lot of the scenarios I&#8217;m talking about, there are few &#8220;ad-hoc&#8221; queries. I can see that if you&#8217;re running data at that sort of scale (and classic analysis), you&#8217;d want something like Oracle, Greenplum, etc. For our needs, we&#8217;ve found that&#8217;s way overkill. </p>
<p>How large is your data size, how many concurrent requests do you have, what software are you using, and how much did everything cost? </p>
<p>You can e-mail me if you want, I&#8217;m truly interested :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ken farmer</title>
		<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/comment-page-1/#comment-53</link>
		<dc:creator>ken farmer</dc:creator>
		<pubDate>Wed, 22 Jul 2009 18:04:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=42#comment-53</guid>
		<description>&gt; For many types of large data warehouse analytics, the difference between 20 minutes from a Vendor solution and 1-2 hours from Hadoop may not be enough to justify the extra expenditure.

Actually, I&#039;m doing real-time data warehousing now, with peaks of 60,000+ queries per day.  And I expect over 99% of my queries to run in under a tenth of a second.  And then there are those massive queries that take ten minutes.

But 2 hours?  My users aren&#039;t happy if their queries are taking that long - especially since it often takes a half-dozen iterations to get them perfect.</description>
		<content:encoded><![CDATA[<p>&gt; For many types of large data warehouse analytics, the difference between 20 minutes from a Vendor solution and 1-2 hours from Hadoop may not be enough to justify the extra expenditure.</p>
<p>Actually, I&#8217;m doing real-time data warehousing now, with peaks of 60,000+ queries per day.  And I expect over 99% of my queries to run in under a tenth of a second.  And then there are those massive queries that take ten minutes.</p>
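<p>For scale, the figures above can be checked with a small Python sketch: 60,000 queries per day averages under one query per second, and "over 99% in under a tenth of a second" amounts to a 99th-percentile latency target. The nearest-rank percentile method and the latency sample are assumptions for illustration, not the commenter's measurements.</p>

```python
SECONDS_PER_DAY = 24 * 60 * 60
queries_per_day = 60_000  # peak daily volume quoted in the comment

# Average rate if that load were spread evenly over the day (real traffic is burstier).
avg_qps = queries_per_day / SECONDS_PER_DAY
print(f"average: {avg_qps:.2f} queries/s")  # average: 0.69 queries/s


def p99(latencies_s: list[float]) -> float:
    """99th-percentile latency using the nearest-rank method."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]             # rank is 1-based


# Hypothetical sample: 990 fast lookups plus 10 heavy ten-minute queries.
sample = [0.05] * 990 + [600.0] * 10
print(p99(sample) < 0.1)  # True: the ten slow queries sit above the 99th percentile
```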
<p>But 2 hours?  My users aren&#8217;t happy if their queries are taking that long &#8211; especially since it often takes a half-dozen iterations to get them perfect.</p>
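<p>The iteration point compounds the latency gap. A minimal Python sketch, assuming each of the half-dozen refinements is a full sequential re-run with no caching between attempts:</p>

```python
ITERATIONS = 6  # "a half-dozen iterations", from the comment

def total_hours(minutes_per_run: float, iterations: int = ITERATIONS) -> float:
    """Wall-clock hours to refine a query when every attempt runs end to end."""
    return minutes_per_run * iterations / 60

print(total_hours(10))   # 1.0  (ten-minute warehouse query)
print(total_hours(120))  # 12.0 (two-hour Hadoop job)
```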
]]></content:encoded>
	</item>
	<item>
		<title>By: Flow &#187; Blog Archive &#187; Daily Digest for July 12th - The zeitgeist daily</title>
		<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/comment-page-1/#comment-42</link>
		<dc:creator>Flow &#187; Blog Archive &#187; Daily Digest for July 12th - The zeitgeist daily</dc:creator>
		<pubDate>Sun, 12 Jul 2009 04:33:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=42#comment-42</guid>
		<description>[...] Decline of the Enterprise Data Warehouse &#8212; 10:24am via Google [...]</description>
		<content:encoded><![CDATA[<p>[...] Decline of the Enterprise Data Warehouse &mdash; 10:24am via Google [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Soruja</title>
		<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/comment-page-1/#comment-41</link>
		<dc:creator>Soruja</dc:creator>
		<pubDate>Sun, 12 Jul 2009 02:34:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=42#comment-41</guid>
		<description>I liek gigabytes. &lt;3</description>
		<content:encoded><![CDATA[<p>I liek gigabytes. &lt;3</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike</title>
		<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/comment-page-1/#comment-38</link>
		<dc:creator>Mike</dc:creator>
		<pubDate>Sat, 11 Jul 2009 05:49:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=42#comment-38</guid>
		<description>Dan,

Do you work for Oracle?  Do you have a vested interest in the Data Warehouse status quo continuing?  A traditional data warehouse consultant?

The fact is that despite its substantial flaws (as you have pointed out) Hadoop SOMEHOW manages to get chosen for data warehousing by companies that seem to have SOME idea of what is going on.

And as far as &quot;getting my facts straight&quot; goes I&#039;ll simply say &quot;Facebook, Last.fm, IBM, Sun, Amazon, Cloudera, Yahoo, Google, Hulu, ImageShack, Joost, Baidu, Rackspace&quot; and others, all listed at http://wiki.apache.org/hadoop/PoweredBy</description>
		<content:encoded><![CDATA[<p>Dan,</p>
<p>Do you work for Oracle?  Do you have a vested interest in the Data Warehouse status quo continuing?  A traditional data warehouse consultant?</p>
<p>The fact is that despite its substantial flaws (as you have pointed out) Hadoop SOMEHOW manages to get chosen for data warehousing by companies that seem to have SOME idea of what is going on.</p>
<p>And as far as &#8220;getting my facts straight&#8221; goes I&#8217;ll simply say &#8220;Facebook, Last.fm, IBM, Sun, Amazon, Cloudera, Yahoo, Google, Hulu, ImageShack, Joost, Baidu, Rackspace&#8221; and others, all listed at <a href="http://wiki.apache.org/hadoop/PoweredBy" rel="nofollow">http://wiki.apache.org/hadoop/PoweredBy</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff</title>
		<link>http://www.roadtofailure.com/2009/07/10/decline-of-the-enterprise-data-warehouse/comment-page-1/#comment-36</link>
		<dc:creator>Jeff</dc:creator>
		<pubDate>Fri, 10 Jul 2009 22:54:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=42#comment-36</guid>
		<description>Hey Dan,

While Hadoop can certainly complement existing data warehouses, Facebook has been using Hive in place of a data warehouse for over a year now. See http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/ for more. In fact, the scale of their data warehouse and the number and diversity of users submitting queries make it quite clear that Hadoop is a mature (potentially &quot;upstanding&quot;?) technology that&#039;s seeing serious use in enterprises and labs around the world.

Of course, unlike other data warehousing products, you don&#039;t have to just read about the technology. Go check it out for yourself and help us produce &quot;differentiated value&quot;: http://hadoop.apache.org.</description>
		<content:encoded><![CDATA[<p>Hey Dan,</p>
<p>While Hadoop can certainly complement existing data warehouses, Facebook has been using Hive in place of a data warehouse for over a year now. See <a href="http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/" rel="nofollow">http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/</a> for more. In fact, the scale of their data warehouse and the number and diversity of users submitting queries make it quite clear that Hadoop is a mature (potentially &#8220;upstanding&#8221;?) technology that&#8217;s seeing serious use in enterprises and labs around the world.</p>
<p>Of course, unlike other data warehousing products, you don&#8217;t have to just read about the technology. Go check it out for yourself and help us produce &#8220;differentiated value&#8221;: <a href="http://hadoop.apache.org" rel="nofollow">http://hadoop.apache.org</a>.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
