A New DB for 80% of Facebook, YouTube-scale Sites
Imagine you’re visiting several of your favorite massive-scale websites, or go ahead and pull them up in your browser: Facebook, MySpace, LiveJournal, Bebo, Streamy, YouTube, or any other sufficiently large social media site that deals with huge amounts of data. While you’re at it, take a look at any data-driven website: a forum, your favorite blog, or a news site.
As you examine these sites, try to think about what kind of data they need to present to their users rapidly. There’s usually a selection of:
- Profiles (like users)
- Search results
- Sets of items by various criteria (all videos by one author, posts in a thread)
- Networks (“my friends”)
- Real-time updates (“My status”)
- Communications subsystems (messages to inboxes, e-mails, IMs)
Until recently, the only way to retrieve this data has been a classic RDBMS, which, as we’ve seen earlier, can be expensive and crippling to scale. Yet I’d estimate that a few use cases cover 80% of how data needs to be presented on these websites. The Swiss-Army RDBMS is costly overkill for what most sites *really* use it for. There needs to be a new type of database optimized for serving simple, web-scale data in real time.
Let’s examine these operations, along with some examples. Then, I’d like to propose a scalable engine I’m architecting that’s actually optimized to drive 80% of websites, built on HBase.
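As a rough illustration of why a handful of operations can cover so much ground, here is a minimal sketch of how a profile lookup, a "set of items by criteria" query, and a friends list could all reduce to key gets and prefix scans over a sorted key-value store, which is the general access model HBase offers. Everything in it is an assumption made for illustration: the `SortedKV` class is a toy in-memory stand-in rather than HBase's API, and the key layout (`user:`, `videos:`, `friends:`) is hypothetical, not the design of the engine proposed here.

```python
import bisect


class SortedKV:
    """Toy in-memory sorted key-value store (hypothetical; not HBase's API)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._vals = {}   # key -> value

    def put(self, key, value):
        # Insert the key in sorted position the first time it is seen.
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key):
        # Single-key lookup, e.g. fetching one profile.
        return self._vals.get(key)

    def scan(self, prefix):
        # Range scan: walk the sorted keys starting at the first key >= prefix
        # and stop as soon as a key no longer shares the prefix.
        i = bisect.bisect_left(self._keys, prefix)
        rows = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            rows.append((key, self._vals[key]))
            i += 1
        return rows


db = SortedKV()
db.put("user:alice", {"name": "Alice"})          # profile
db.put("videos:alice:0002", "Cat video")         # items by one author
db.put("videos:alice:0001", "Intro")
db.put("videos:bob:0001", "Bob's vlog")
db.put("friends:alice:bob", True)                # network edge

profile = db.get("user:alice")
alice_videos = [title for _, title in db.scan("videos:alice:")]
alice_friends = [key.split(":")[2] for key, _ in db.scan("friends:alice:")]
```

The point of the sketch is only that none of these reads needs a join or an ad hoc query planner: each is one lookup or one contiguous scan, provided the keys are laid out with the read pattern in mind.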
Hi, I'm Bradford. I write about scalability and the fringes of Computer Science.