<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Social Media Kills the Database</title>
	<atom:link href="http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/</link>
	<description>The Fringes of Scalability, Social Media, and Computer Science.</description>
	<lastBuildDate>Fri, 12 Mar 2010 17:10:58 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: What is NoSQL &#8211; Part 1 ? &#124; Devguru</title>
		<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/comment-page-1/#comment-1314</link>
		<dc:creator>What is NoSQL &#8211; Part 1 ? &#124; Devguru</dc:creator>
		<pubDate>Fri, 05 Mar 2010 04:16:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=21#comment-1314</guid>
		<description>[...] Social Media Kills the RDBMS&#160;by Bradford Stephens [...]</description>
		<content:encoded><![CDATA[<p>[...] Social Media Kills the RDBMS&nbsp;by Bradford Stephens [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: What is NoSQL database? Enter NoSQL East, conference of non-relational data stores — PaulStamatiou.com</title>
		<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/comment-page-1/#comment-277</link>
		<dc:creator>What is NoSQL database? Enter NoSQL East, conference of non-relational data stores — PaulStamatiou.com</dc:creator>
		<pubDate>Mon, 05 Oct 2009 14:29:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=21#comment-277</guid>
		<description>[...] Social Media Kills the RDBMS by Bradford Stephens  Slides from a NoSQL meetup [...]</description>
		<content:encoded><![CDATA[<p>[...] Social Media Kills the RDBMS by Bradford Stephens  Slides from a NoSQL meetup [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: How To Make Life Suck Less (While Making Scalable Systems) &#124; Road to Failure</title>
		<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/comment-page-1/#comment-192</link>
		<dc:creator>How To Make Life Suck Less (While Making Scalable Systems) &#124; Road to Failure</dc:creator>
		<pubDate>Wed, 09 Sep 2009 18:51:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=21#comment-192</guid>
		<description>[...] Social Media Kills the Database  [...]</description>
		<content:encoded><![CDATA[<p>[...] Social Media Kills the Database  [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Peterson</title>
		<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/comment-page-1/#comment-153</link>
		<dc:creator>Andrew Peterson</dc:creator>
		<pubDate>Fri, 21 Aug 2009 15:04:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=21#comment-153</guid>
		<description>You are right on: key-value stores, Hadoop, Cassandra, etc. are a must. But they don’t replace the RDBMS. They replace what the RDBMS doesn’t do well. When Dr. Codd wrote up the relational concepts in 1969, hierarchical databases were the norm, and accounting systems were the primary systems using a database. ACID and structure were critical. The RDBMS is great for maintaining an ACID-driven, absolute relationship. Key-value is great for finding text-based relationships. Like any tool, use both wisely for what they do well, not as a Swiss army knife.</description>
		<content:encoded><![CDATA[<p>You are right on: key-value stores, Hadoop, Cassandra, etc. are a must. But they don’t replace the RDBMS. They replace what the RDBMS doesn’t do well. When Dr. Codd wrote up the relational concepts in 1969, hierarchical databases were the norm, and accounting systems were the primary systems using a database. ACID and structure were critical. The RDBMS is great for maintaining an ACID-driven, absolute relationship. Key-value is great for finding text-based relationships. Like any tool, use both wisely for what they do well, not as a Swiss army knife.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bradford</title>
		<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/comment-page-1/#comment-106</link>
		<dc:creator>Bradford</dc:creator>
		<pubDate>Mon, 10 Aug 2009 22:42:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=21#comment-106</guid>
		<description>We make the indexes with a MR job, and then Katta (distributed Lucene) pulls them out of HDFS onto a separate cluster optimized for serving searches.</description>
		<content:encoded><![CDATA[<p>We make the indexes with a MR job, and then Katta (distributed Lucene) pulls them out of HDFS onto a separate cluster optimized for serving searches.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fred</title>
		<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/comment-page-1/#comment-104</link>
		<dc:creator>Fred</dc:creator>
		<pubDate>Mon, 10 Aug 2009 14:38:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=21#comment-104</guid>
		<description>Thanks very much for your reply!  One thing that wasn&#039;t clear to me is how exactly Lucene fits into this.  Do you have Lucene using HDFS as its directory storage and running directly on those 19 nodes, or are you extracting from HDFS to a set of conventional (local-file-based) index shards?</description>
		<content:encoded><![CDATA[<p>Thanks very much for your reply!  One thing that wasn&#8217;t clear to me is how exactly Lucene fits into this.  Do you have Lucene using HDFS as its directory storage and running directly on those 19 nodes, or are you extracting from HDFS to a set of conventional (local-file-based) index shards?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bradford</title>
		<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/comment-page-1/#comment-88</link>
		<dc:creator>Bradford</dc:creator>
		<pubDate>Fri, 07 Aug 2009 03:33:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=21#comment-88</guid>
		<description>Thanks for your comment! You&#039;re right in that some of the OLAP/OLTP points are moot.

The cost and performance benefits are really nice so far. For example, we have a $140,000 SQL Server + license that is so full of data that it takes 24 hours to load and process a few GB of documents. 

Yesterday, we loaded all 1TB of our data set into HBase, indexed it with Lucene, and deployed it to  our search cluster in about 18 hours -- and that&#039;s with some random failures that set us back because we&#039;re kinda new to Linux and things weren&#039;t very efficient. This was done on a cluster of 19 nodes that cost $1700 each. 

To do the same work on our SQL Server box took us over a week :)

Dev time wasn&#039;t too bad -- I knew Hadoop pretty well, but the team had to ramp up on learning Linux, HBase, Zookeeper, Lucene and the rest of the stack. It took our team of 5 people about 3 weeks for an end-to-end system prototype, which was pretty fragile.  We&#039;re going to spend several months to roll out a full &quot;enterprise&quot; solution, with proper monitoring, workflow, perf, etc. Which is really no different than any other large-scale application.  

If you have more questions, feel free to e-mail me! </description>
		<content:encoded><![CDATA[<p>Thanks for your comment! You&#8217;re right in that some of the OLAP/OLTP points are moot.</p>
<p>The cost and performance benefits are really nice so far. For example, we have a $140,000 SQL Server + license that is so full of data that it takes 24 hours to load and process a few GB of documents. </p>
<p>Yesterday, we loaded all 1TB of our data set into HBase, indexed it with Lucene, and deployed it to  our search cluster in about 18 hours &#8212; and that&#8217;s with some random failures that set us back because we&#8217;re kinda new to Linux and things weren&#8217;t very efficient. This was done on a cluster of 19 nodes that cost $1700 each. </p>
<p>To do the same work on our SQL Server box took us over a week :)</p>
<p>Dev time wasn&#8217;t too bad &#8212; I knew Hadoop pretty well, but the team had to ramp up on learning Linux, HBase, Zookeeper, Lucene and the rest of the stack. It took our team of 5 people about 3 weeks for an end-to-end system prototype, which was pretty fragile.  We&#8217;re going to spend several months to roll out a full &#8220;enterprise&#8221; solution, with proper monitoring, workflow, perf, etc. Which is really no different than any other large-scale application.  </p>
<p>If you have more questions, feel free to e-mail me!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fred</title>
		<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/comment-page-1/#comment-87</link>
		<dc:creator>Fred</dc:creator>
		<pubDate>Fri, 07 Aug 2009 02:50:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=21#comment-87</guid>
		<description>Interesting article!  This is a thought-provoking discussion.

It sounds like the problem you&#039;re trying to solve is more OLAP than OLTP, so some of your objections to DB transaction cost (for example) could be moot depending on the implementation.  And your storage requirements (you mentioned hundreds of millions of docs), while large, don&#039;t seem beyond the scope of what modern OLAP (or OLTP for that matter) systems can handle.  You&#039;re certainly right of course that there will be a substantial cost.

The Hadoop etc. solution doesn&#039;t sound exactly simple or cheap either, though.  Surely it&#039;s cheaper than all your MS SQL Server boxes, but you still need a bunch of nodes, plus a fair amount of custom development to ingest data, perform the computations, and manage everything.  How has the cost equation turned out?  For example, how many nodes do you need in the new cluster to store everything?  (And how many SQL Servers did you need before?)  What&#039;s the relative scale of the development effort -- like, team of 3 for 3 months or team of 6 for 6 months etc.?

Anyway, thanks for sharing your experiences.  It&#039;s interesting to see the different ways people are trying to solve these problems.</description>
		<content:encoded><![CDATA[<p>Interesting article!  This is a thought-provoking discussion.</p>
<p>It sounds like the problem you&#8217;re trying to solve is more OLAP than OLTP, so some of your objections to DB transaction cost (for example) could be moot depending on the implementation.  And your storage requirements (you mentioned hundreds of millions of docs), while large, don&#8217;t seem beyond the scope of what modern OLAP (or OLTP for that matter) systems can handle.  You&#8217;re certainly right of course that there will be a substantial cost.</p>
<p>The Hadoop etc. solution doesn&#8217;t sound exactly simple or cheap either, though.  Surely it&#8217;s cheaper than all your MS SQL Server boxes, but you still need a bunch of nodes, plus a fair amount of custom development to ingest data, perform the computations, and manage everything.  How has the cost equation turned out?  For example, how many nodes do you need in the new cluster to store everything?  (And how many SQL Servers did you need before?)  What&#8217;s the relative scale of the development effort &#8212; like, team of 3 for 3 months or team of 6 for 6 months etc.?</p>
<p>Anyway, thanks for sharing your experiences.  It&#8217;s interesting to see the different ways people are trying to solve these problems.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bradford</title>
		<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/comment-page-1/#comment-73</link>
		<dc:creator>Bradford</dc:creator>
		<pubDate>Sun, 02 Aug 2009 15:27:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=21#comment-73</guid>
		<description>That&#039;s very interesting -- one of my friends who went to TechEd was the source of the MSSQL Boxes fact. Now I&#039;ll have to go double-check it&#039;s legit :)</description>
		<content:encoded><![CDATA[<p>That&#8217;s very interesting &#8212; one of my friends who went to TechEd was the source of the MSSQL Boxes fact. Now I&#8217;ll have to go double-check it&#8217;s legit :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gary</title>
		<link>http://www.roadtofailure.com/2009/06/19/social-media-kills-the-rdbms/comment-page-1/#comment-71</link>
		<dc:creator>Gary</dc:creator>
		<pubDate>Sun, 02 Aug 2009 09:14:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.roadtofailure.com/?p=21#comment-71</guid>
		<description>The database requirements of MySpace are pretty tiny compared to those of the search giants.

According to the article below, MySpace runs on 500ish database servers. That is a much smaller scale than the tens of thousands at the big search companies.

http://highscalability.com/myspace-architecture</description>
		<content:encoded><![CDATA[<p>The database requirements of MySpace are pretty tiny compared to those of the search giants.</p>
<p>According to the article below, MySpace runs on 500ish database servers. That is a much smaller scale than the tens of thousands at the big search companies.</p>
<p><a href="http://highscalability.com/myspace-architecture" rel="nofollow">http://highscalability.com/myspace-architecture</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>
