NoSQL: Why it’s So Damn Sticky

We wouldn’t care as much about NoSQL if it wasn’t called NoSQL. Even the word itself is triggering existential crises among some very smart, seasoned people.

But why?

…because the term NoSQL encapsulates just about every phase of database innovation in the last decade.

The inspiration for this post came from reading Chip and Dan Heath’s book Made to Stick. Their theories, plus much buzzing from the anti-NoSQL camp about how “this is just a phase like XML DB” and “you obviously aren’t smart enough to understand the RDBMS”, have given me some fresh thoughts on why NoSQL is such a controversial thing.

According to the Heath brothers, a sticky idea is one that’s: simple, unexpected, concrete, credible, emotional, and a story. Now… the connection between psychology, computers, and social science can get pretty vague for someone untrained in two of those fields (me), but I hope this article will at least spark some interesting discussion.

So with all that in mind, let’s examine why NoSQL is such an awesome idea — and why it scares us.

(if you want a data platform that’s scalable, real-time, AND has a query language, check out Drawn to Scale)

Simple

“NoSQL” is the essence of simplicity. It’s one word. Yet it implies so much. First of all, it’s kind of modern-sounding, which is appropriate. But it’s also based on something that people know about (and in some cases, are quite happy to imagine being without).

When introducing new ideas, it’s best to build them upon previous ideas. That forms a bridge and sets expectations for what we’ll see next. For example, let’s learn about tapirs, because they’re badass:

“A tapir is a browsing mammal with a short, prehensile snout, belonging to a family of odd-toed ungulates. Tapirs inhabit jungle and forest regions of South America and Southern Asia. Most tapirs are about 7 ft long, stand about 3 ft high at the shoulder, and weigh about 500 lbs. Their diet is herbivorous and they eat up to 85 lbs of food a day.”

Now, if I were to ask you, “Are tapirs cuddly?”, you’d probably have to ponder that for a while. There’s nothing relevant to anchor your thoughts about it.

But what if I said, “A tapir is like a pig with a long snout that lives in the jungle.”?

Then you’d say, “Hells yeahs, tapirs are cuddly.” You’d also guess that they probably eat vegetation with that snout, and live in dense undergrowth for protection from predators.

Similarly, “NoSQL” immediately implies a database which does not use SQL as an interface. This is powerful. It raises many questions:

“How do I access data?”
“How does someone maintain it?”
“How can you store data without tables and rows?”
“Are there schemas?”
“What do I click on?”
“Why the hell would someone wanna do that?”

There’s always a loss of fidelity with simplicity. Technically, any database could support SQL as a language (Drawn to Scale’s platform will). But the name gives you a good start, and that’s worth quite a lot.

The word NoSQL fundamentally challenges everything we assume about databases. That’s worth embracing.

Unexpected

Unexpected events grab our attention, and keep it by raising questions. Humans have evolved to react and pay attention to changes in the environment. Imagine a class being taught on tapirs. The professor starts up a slide deck, shows a picture of the critter, and says “Tapirs are mammals.” zzzzzzzzz….

Now instead, imagine if the professor darkened the room and said “Tapirs are responsible for 78% of the world’s economic output.” Whoa! Serious business! You’d be interested. This is unexpected news. And it raises a mystery: why are tapirs so important? You’d need to know why.

Likewise, the word NoSQL is unexpected and creates a mystery. The concept steps on more than 25 years of data traditions.

“You can’t have no SQL… it makes no sense!”

Until now, using an RDBMS was as ordinary as wearing underwear. It’s just what you did, because why imagine the world any other way?

The logic goes: This is the way it is done. Data is stored in a database. It is structured as rows and columns. You use a language called SQL to access and manipulate that data. If you want it to go faster, hire consultants or buy a bigger server. If you need to do something else with data outside of the SQL language — tough luck. Go have fun with that.

The database industry is worth roughly $150 billion, counting software and services, and most of that is spent on RDBMSs. Entire careers and corporations are built around tweaking queries and tables to get a few more percent of space or speed.

NoSQL is terrifying. Everything these people have known for decades just became a little more irrelevant. RDBMSs are based on computer science fundamentals — but if you don’t know those fundamentals, it’s hard to understand why you’d need any other kind of database.

It’s always a bit surprising when the pillars of existence start to crack.

Concrete

It seems fairly intuitive that you’d need to make ideas concrete for them to be memorable and communicable, but engineers especially struggle with this. Staying abstract and technical allows us to communicate with each other much faster, at the expense of talking to anyone else. There’s a reason why whiteboards are so prevalent.

As we get more advanced in our studies of a topic, we tend to talk about it in a more abstract way. This makes it harder to communicate with the uninitiated. As an engineer founding a startup, I talk to a lot of CEOs and CTOs. I am painfully aware of this.

Yes, it’s hard to remember what it’s like to not know something… but there’s a point where it just comes off as wilful vaporousness. If you want to teach math, you don’t start with lambda calculus and explain the recursive fundamentals of subtraction. You present something real: “If I have three Pikachus and put two in a blender, how many do I have left?” That’s easier to remember.

Concrete ideas are more than memorable; they’re easier to communicate. Concrete ideas are like a common language. At my previous workplace, product management asked us to build something that “lets people explore all social media data. Like using Google, but you can drill down on things like Authors”. Using that as a concrete goal, my engineering team extrapolated “build a distributed, real-time database capable of faceted search that is fast no matter how much data you put in.”
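
To make “drill down on things like Authors” a bit more concrete, here is a toy sketch of faceted search in Python. It is an in-memory illustration only, nothing like the distributed, real-time system described above; the documents, field names, and function names are all made up for the example.

```python
from collections import Counter

# Made-up sample documents; the field names are assumptions for illustration.
DOCS = [
    {"author": "alice", "source": "blog", "text": "nosql scales out"},
    {"author": "bob", "source": "twitter", "text": "sql is fine"},
    {"author": "alice", "source": "twitter", "text": "nosql is sticky"},
]

def search(docs, keyword, **filters):
    """Return docs whose text contains the keyword as a whole word
    and whose fields match every facet filter."""
    return [
        doc for doc in docs
        if keyword in doc["text"].split()
        and all(doc.get(field) == value for field, value in filters.items())
    ]

def facet_counts(hits, field):
    """Count how many hits fall under each value of a facet field."""
    return Counter(doc[field] for doc in hits)

# "Like using Google": a plain keyword search.
hits = search(DOCS, "nosql")
print(facet_counts(hits, "author"))
print(facet_counts(hits, "source"))

# "Drill down": the same search, narrowed to one facet value.
drilled = search(DOCS, "nosql", source="twitter")
print([doc["text"] for doc in drilled])
```

The facet counts are what let a user see which authors or sources dominate a result set before narrowing it. Keeping that fast at any data volume is the hard part, and this sketch does not attempt it.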

It’s tricky to think about software fundamentals in a concrete way.

The suggested replacement term for NoSQL, “AltDB”, isn’t too useful. There’s nothing concrete about it. “Alternate Database? Alternate to what? Does that mean MySQL is an alternate to Postgres? Why do I need that?”

When I say “NoSQL”, something concrete is in there. “Well, shit. It’s not SQL. What do I do now?”

This inspires communication and thought. You may not know how a database does synchronous indexing, but you know it uses SQL, and that’s pretty important. Without that… there are a lot of questions to be answered. And you won’t easily forget the term NoSQL.

Concrete things stick around much longer than abstract ideas.

Credible

NoSQL is especially powerful because many people are talking about it — and those who started the movement had some serious horsepower behind them. Not only that, but the movement seems to be pissing off the “right” kind of people.

External credibility is divided between authority and anti-authority figures. Both are important.

Authority figures have credibility. If the FDA tells you not to smoke because it’ll cause cancer, that’s believable. They’re a legitimate government institution.

Likewise, anti-authority figures add credibility to an idea. The classic example is the story of the Marlboro Man from old cigarette advertisements. Marlboro cigarettes projected a “tough, rugged, cool cowboy image”. In the 80s, the man was smoking through a tube in his throat and very much the opposite of a tobacco commercial. His misfortune helped strengthen the idea that maybe something was wrong here.

The NoSQL movement has advocates from academia and industry. There are quite a few academic papers about alternative data stores like Hadoop. In addition, Google led the NoSQL charge with their BigTable paper a few years ago. Google has pretty much the biggest “engineer cred” around. If they say SQL doesn’t scale for their needs, that’s the truth. More and more smaller companies are getting on the NoSQL bandwagon too, like Twitter, LinkedIn, and Digg.

What’s just as interesting is the type of people NoSQL is aggravating. The folks who have the most invested in the status quo are not happy. They provide anti-authority credibility.

Database Administrators, 3-tier webapp designers, SQL tuners, analysts, data warehouse vendors, and more are all blogging furiously about the dire repercussions of non-traditional databases. This is often thinly veiled “concern” about “using untested technology”, or implications that engineers don’t REALLY understand relational databases, or they’d be using them.

Emotional

The strongest memories we have in life are the ones with the most intense emotion. NoSQL seems to elicit strong emotions from detractors and advocates alike.

NoSQL the word is not inherently stirring. But it strongly resonates with engineers who regularly feel crippled by the wrong tools for their job. Guys struggling with SQL problems seem to grasp the NoSQL term immediately. If you’ve been in a similar situation, you know what it’s like. There’s a fundamental “wrongness”, like having to write with a different hand. It gnaws at you, even as you sleep. The world would be a fundamentally better place if you could just do it right.

On the other side, NoSQL elicits strong emotional responses from its detractors. It sort of eats at the core of their being. When you can no longer define yourself as a “database engineer” by knowing how to intimately tweak around the query planner, you begin to question tenets of your professional existence.

There’s no one more unhappy than an engineer who can’t solve problems the right way. It’s a core part of our profession: build something that people want, the right way. Most engineers are builders; it’s why we studied this in college (or dug through the textbooks ourselves). When the RDBMS keeps you from building software and doing your duty as a professional, frustration and resentment begin to build against it.

It may sound corny, but the RDBMS paradigm has prevented people from realizing their potential and conquering intellectual challenges. NoSQL isn’t SQL; it’s not “that horrible thing”. The word evokes a sense of optimism, that these hard problems which kill RDBMSs *can* be solved through solid engineering.

Story

NoSQL is not a story in itself. It’s just a word. But it came about from the experiences of so many people. As we’ve seen, these experiences are a common thread between everyone who is interested in NoSQL.

The problems usually faced are:

  • Large data volumes
  • Non-row data structures (graphs, columns, images)
  • Slow processing and query times
  • Many simultaneous users
  • Increasingly complex data
  • Elastic data demand
  • Affordable performance/scalability

When your business has one of these problems, you either have to change your business model, or adopt a NoSQL solution. That always leads to a story.

I’ve told my story in previous blog posts, but I’ll summarize it here. It’s not a special story — in fact, Drawn to Scale was inspired by hearing so many stories just like this.

My first week with my previous employer was eventful. It was the last week of 2007. The company provided business intelligence and brand management on social media — a worthy goal. Imagine being able to track topics, influencers, and everyone talking about your brand on the Internet. That’s incredibly powerful.

It was also utterly impossible with the company’s infrastructure. It would always be impossible. And they didn’t know it. So I had to tell them in my first week there.

The workflow was basically:
1. Collect blog posts and tweets
2. Extract entities (Author Name, Comments) from unstructured text
3. Store this data in Microsoft SQL Server
4. Rework the data about 30 different ways to power some dashboards
5. Allow users to slice-and-dice, aggregate, and filter data.
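
For readers who think better in code, the workflow above can be sketched roughly like this in Python. It is a toy stand-in rather than the company’s actual system: the sample posts, the “Posted by …” text layout, and the function names are all assumptions, and an in-memory SQLite database plays the part of Microsoft SQL Server.

```python
import re
import sqlite3

# Step 1: "collected" posts. Hard-coded here; the text layout is made up.
RAW_POSTS = [
    "Posted by alice: NoSQL is sticky. Comments: 3",
    "Posted by bob: I still like my RDBMS. Comments: 7",
    "Posted by alice: Tapirs are cuddly. Comments: 1",
]

# Step 2: extract entities (author, comment count) from unstructured text.
PATTERN = re.compile(
    r"Posted by (?P<author>\w+): (?P<body>.*) Comments: (?P<comments>\d+)"
)

def extract(raw):
    """Return (author, body, comments) or None if the text does not parse."""
    match = PATTERN.fullmatch(raw)
    if match is None:
        return None
    return match["author"], match["body"], int(match["comments"])

# Step 3: store the rows in a relational database (SQLite as a stand-in).
def load(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (author TEXT, body TEXT, comments INTEGER)")
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)
    return conn

# Steps 4-5: rework the data for a dashboard, with a simple user filter.
def posts_by_author(conn, min_comments=0):
    """Return (author, post count, total comments) per author."""
    query = (
        "SELECT author, COUNT(*), SUM(comments) FROM posts "
        "WHERE comments >= ? GROUP BY author ORDER BY author"
    )
    return conn.execute(query, (min_comments,)).fetchall()

rows = [row for row in map(extract, RAW_POSTS) if row is not None]
conn = load(rows)
dashboard = posts_by_author(conn)
print(dashboard)
```

At this scale the whole thing runs instantly. The trouble described next comes from pushing every step through one database server as the data grows.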

All this was being done with a handful of inexpensive boxes and one pricey database server.

To collect about 10,000 sites a day, clean the data, load it, and then generate dashboards took about 24 hours. There was ~1 GB of data daily. That’s not “the entire social web”. That wouldn’t even cover Blogspot.

Not to mention, it had extreme reliability and performance issues. The server would go down once a week. It would slow down enough to be unusable to power the website. Customers would quit.

Many engineers insisted all we had to do was add more disks and processors. It helped for a while — cutting processing time in half, doubling uptime. For 10x the cost. But more data would flow in, and the system would collapse again. This had been happening for years (and would continue).

Eventually I just asked myself, “How did Google do this?” That led me to the BigTable whitepaper, and then Hadoop and HBase. A quick prototype was powerful enough to earn me the right to hire a team, and I got some awesome people. We then built some distributed indexing and search. The new system also never crashed under high data volumes.

The platform we built there (which inspired Drawn to Scale’s platform) used a bunch of $1500 boxes to process in 5 minutes what used to take 10 hours. And queries could be updated in 250ms, instead of 24 hours. It was amazing, and never would have been possible with an RDBMS.
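
To put those numbers in perspective, here is the back-of-the-envelope arithmetic as a small Python snippet. It uses only the figures quoted above; the function and variable names are just for illustration.

```python
from datetime import timedelta

def speedup(before, after):
    """How many times faster 'after' is than 'before'."""
    return before / after  # timedelta / timedelta gives a plain float

# Batch processing: 10 hours down to 5 minutes.
batch = speedup(timedelta(hours=10), timedelta(minutes=5))

# Query updates: 24 hours down to 250 ms.
query = speedup(timedelta(hours=24), timedelta(milliseconds=250))

print(f"batch processing: {batch:.0f}x faster")
print(f"query updates: {query:,.0f}x faster")
```

That works out to a 120x improvement in processing time and a 345,600x improvement in how quickly queries reflect new data.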

This story isn’t unique. You’ll see similar threads among many NoSQL users… even Google’s BigTable.

Due to the growth of the web and its massive data, and data-centric business, you’ll see more and more NoSQL solutions.

Conclusion

NoSQL isn’t controversial because of a few bloggers and tweets. There are a ton of nice people across the whole range of opinion.

NoSQL is controversial because it fundamentally challenges so many aspects of business and software engineering.

Someone said to me recently: “NoSQL is popular in the same way NoDeath or NoTaxes would be popular. It’s the opposite of a well-known symbol for what some think is wrong with the world.”

I hear complaints about how you can’t frame things in terms of what they *aren’t*, like this post. But that is simply false. One of the best ways to highlight what something is, is by saying what it isn’t.

NoSQL certainly does that.

(P.S. if you want a data platform that’s scalable, real-time, AND has a query language, drop us a line at Drawn to Scale)

19 Responses to “NoSQL: Why it’s So Damn Sticky”

  • j_king Says:

    I think there are a great number of reasons to choose an RDBMS — only the least of which should be one’s opinion of SQL.

    A lot of the NoSQL fervor sounds like a lot of bandwagon hyping and that is probably why there are a lot of resistors out there.

    You have a point that people often reach for an RDBMS without giving it much thought. Yet often it is still the right choice, even if it was made in ignorance.

    Developers just need to be more aware of their persistence needs.

  • Adam Ierymenko Says:

    Instead of NoSQL, I’d really like to see something more conservative (in some ways) but more feature-preserving. How about “SQL reloaded?”

    You should be able to invent new columns and tables on the fly. Replication should be easy. Establishing rules to define subsets of data to replicate should be easy. Everything should be indexed or indexes should be auto-generated when needed and auto-freed when not needed. I’m sure more could come to mind.

    At the very least, changing your schema should be trivial and should be able to be done both automatically/programmatically and manually.

    Even “modern” SQL servers are so 1980s. They incorporate a lot of limitations that come from an era when computers had far less storage, far less memory, and programming techniques were more “plodding” and less advanced in general.

  • Bradford Says:

    Adam: Thanks for the sage commentary!

    (P.S. what you ask for sounds an awful lot like what our platform does).

  • Ryan Says:

    How about “DataMall”? It gives a better picture of its scalable and distributed functionality than, say, datastore.

    Those I’ve spoken to who have implemented or are prototyping a NoSQL architecture are well aware of the trade-offs being made. But there was also a specific data problem to be solved, and in the long term the RDBMS wasn’t going to scale economically with the business.

    And as with most emerging technologies, adoption is based on seeing similar problems being solved at Google, Amazon, etc., which provides the necessary confidence for acceptance.

    As for the hype factor, as with most technology, that is happening more at the periphery. Those in the middle of it have been burned enough by the wrong solution to understand that due diligence is in order, so the square peg fits into the square hole.

  • Russ Says:

    I personally think that NoSQL isn’t always the right tool for the job. Yes, it scales. Yes, you don’t need to structure your data. Yes, it is fast and distributed and can be redundant. To that I say, it is too new, it has a steep learning curve, there aren’t any tools to abstract away the database portion.

    Now SQL has ease of use, a myriad of tools such as ActiveRecord, or Hibernate, an ANSI specification. All of which allow developers to quickly pump out a site that will handle the needs of 98% of the businesses out there.

    There is no need for a small time web store to be using a Hadoop Cluster, just like we wouldn’t expect Google to be running on a MySQL backend.

    As someone who has worked on both sides of the industry, I can definitely say that trying to integrate a NoSQL solution with a standard 3-tier type website would increase the cost of development tenfold. Sure, you can get it with no license fee, but you end up paying for it elsewhere.

    Basically it boils down to NoSQL != NoPricetag

    P.S. I recently wasted a week of my life trying to get HBase up and working on a small cluster with the data from our Vertica solution. I ended up using InfoBright and got it up and data loaded in a few hours. I am still open to the NoSQL idea for our specific business, but why can’t it be just as easy?

  • Robert J. Berger Says:

    I don’t think pigs or tapirs are all that cuddly.

  • CodingSamurai Says:

    Interesting article. I think the danger is the way that the NoSQL folks are hyping the solution by insinuating that traditional SQL is NEVER useful. Even your language in the article comes close to that while also seeming to acknowledge at the end that what NoSQL is really good for is solving problems that are IMPOSSIBLE for traditional RDBMS’s to solve.

    Yes by all means we should use new tools when the old ones don’t work to solve a problem. My issue is that doesn’t mean that traditional databases shouldn’t ever be used or are somehow always inferior.

  • Chad Says:

    Where are you getting $150M for the database market size? IDC says ~10M for database software.

  • eric norlin Says:

    Bradford –

    file under “great minds think alike” –
    http://gluecon.ipower.com/blog/?p=201

    great post.

  • Bradford Says:

    Tim Anglade gave a talk recently with the statistic (I believe the $150B included software and services).

  • Merv Adrian Says:

    Wonderfully written and entertaining. Thanks. At its heart, you’ve nailed the problem as being in the difference from the norm. I’d add that it challenges a belief, propagated for 3 decades, that the RDBMS is the universal solution to anything that requires data persistence. That’s always been nonsense. And it gets farther from reality as more data types, more usages, and more volume make that fact more apparent.

  • Saša Tomislav Mataić Says:

    Could it be stated that SQL is one approach, and the idea behind NoSQL is the other approach to data organization, management and structure? Maybe SQL is the “thesis” (being really thought through, engineered and successfully used), NoSQL is the “antithesis” (a pragmatic answer for data management), and we have yet to see the emergence of a “synthesis”, an approach which combines these two in one scalable, simple and quick way of handling data?
    It seems to me that SQL is a bit over-engineered, extremely smart solution, and NoSQL is a discovered pattern of use which solves 90% of the problems which SQL suffers from. But, I may be oversimplifying :)

  • Bradford Says:

    Sasa — it is a bit of an oversimplification, but there’s a kernel of truth. NoSQL is all about being aware of the sacrifices you need to make for scalability up front. The RDBMS is “everything to everyone”, and it’s difficult to scale.

  • araybould Says:

    We should distinguish between SQL and the relational model of databases. The latter is of considerable value, regardless of whether one is using SQL (and SQL itself is criticized by relational proponents as being a poor implementation of the model.)

    We have been here before. In the beginning, RDBMSs had performance problems, and were not used for intensive data processing. Moore’s law took care of that for a while, but then the growth of the internet, together with new, data-hungry techniques for analysis, have brought the wheel full cycle. Moore’s law has an interesting role in this cycle, having made those new techniques possible (or at least practical).
