Wednesday, March 24 2010

Fighting The NoSQL Mindset, Though This Isn't an anti-NoSQL Piece

Shocked by the incredibly poor database performance described on the Digg technology blog, baffled that they cast it as demonstrative of performance issues with RDBMS’ in general, I was motivated to create a simile of their database problem.

While they posted that entry six months ago, they recently followed up with more statements on the NoSQL / RDBMS divide, and are now being heavily used as a citation of sorts.

For instance Dare Obasanjo held Digg's moves as a rebuttal of my prior entry on SQL scaling (though my entry actually explicitly excluded incredibly rare edge cases like Digg's, and my core point was that the majority of database uses don't have the needs of a site like Digg, I'm always one to take on a challenge), which then got picked up in other blogs.

Digg's case is an example of an entry-level RDBMS product used arguably suboptimally on under-powered hardware, and it seems questionable whether it proves anything of substance about either database technology. Yet it's held as demonstrative of something — in particular the failing of the RDBMS — which is why I focus on it. They are different tools in the toolbox, arguably for different purposes, and that isn't the focus of this entry.

So let's take a look at Digg's scenario.

I do this to evaluate their performance claims, to confirm my previous statements about indexing improvements, and to determine the impact that SSDs have on the problem space, because I strongly believe that SSDs (and cheap memory) completely change the equation.

The focus on this entry is not to question or answer whether NoSQL is the right choice for Digg — though there are some ramifications as SSDs take over, which is, I think, an interesting aside — or whether Google or Amazon or anyone else should use it, etc.

SQL Server 2008 Developer Edition, itself viewed as almost a training-wheels RDBMS by many, on Windows 7 was the most convenient platform for me when I ran this test, so I created a quick script to create what I think is a pretty accurate reproduction of the database described in their blog entry—

  • 500,000 users, having…
  • 10,000,000 friend relationships (using a power law distribution)
  • ..and 500,000,000 “Diggs”, randomly distributed among 500,000 virtual “items” (which might be comments, submissions, etc) with a date range covering four years.

The database weighed in at a svelte 30GB.

I ran this on a two-year-old desktop machine with a Q6600 processor and 6GB of RAM, on a standard 7200 RPM consumer drive. You can easily find laptops with more processing and I/O power.

I opted against running it on a real server (you know, like a 24-core, 128GB, banks-of-SSDs monster than many real databases run on) simply because I knew it wasn’t necessary, and went contrary to the demonstration that even a mediocre machine can beat their results.

DISCLAIMER: This is not a high-fidelity reproduction of Digg's situation, as is pointed out many times in many ways in this post. However Digg took the time to post metrics to support their claims that they are some sort of extreme case, at the edge of database limits, and I simply don't believe that is true. Digg's data quantities are relatively small and lend themselves to sharding. The second point, which again is hammered home many times, is that SSDs present a solution that changes the equation, and, I think, provides some interesting inputs to the situation.

The First Clue That Something Isn’t Right: You Can’t Do a Simple Join

The Digg blog entry detailed how they had to manually build an IN clause given their selected database product’s inability to adequately run a trivial join, with the resulting query taking 14 seconds to find the Diggs for a given user’s friends against a single selected item.

This yielded a results return rate of 0.07 per second.

If you can add an IN clause that solves a database join issue that functionally should achieve the same, there is a much larger underlying issue that needs to be dealt with. I'm not a MySQL user, but apparently it offers minimal plan investigation tools, so there aren't the options to fully flesch-out what the query engine is doing. Nonetheless, it is a warning sign of a foundational product issue.

I ran a similar query in SQL Server, albeit without the hand-coded SQL builder, looking up friend Diggs for randomly selected combinations of users and items. It returned so quickly relative to Digg’s experience, even from a cold cache, that I had to up the iteration count to 1000 to get good test durations.

SQL Server was returning a fairly constant 36 result sets per second, probing the friend table’s ten million relationships to find the selected user’s friends, and then probing the five hundred million Diggs for the pertinent records, sorting it in the manner that Digg sorted their results. The query needed to draw data from all over each of the respective table populations, ensuring that it wouldn’t benefit from localized hot-spot caching. To prove this, limiting SQL Server to only have access to 1GB of memory had a negligible impact on the performance.

CPU usage was marginal, with the limiting factor being the slovenly I/O of the lowly magnetic disk. The iterations were run sequentially, as parallel runs yielded no net benefit, the magnetic disk moving as quickly as it possibly could already.

Already we’re running at close to 500x the rate reported by Digg, without doing anything beyond using an arguably better database product, at least in that it can join properly. MySQL's many weaknesses are well, well known, so the core point from that is not to question Digg (though their indexes were suboptimal), but to put their database product under a cloud, as they themselves often do when posting about their move (usually openly declaring their restricted option set given that they limited themselves to open-source products, obviously eliminating from consideration many of the clusterable, very high performance RDBMS options, even if that were a better choice which is completely uncertain).

Their dataset distribution may be entirely different, however even if I doubled or quadrupled or octupled the count in every table it would only marginally impact performance.

At this point I implemented the indexing changes I described in my prior entry – removing the surrogate keys and cluster-indexing on the unique columns – and the lookup rate jumped to 71 result sets per second, or around 1000x the speed reported by Digg. If I massively increased the data quantity and return counts, the difference between their poor indexing and proper indexing would dramatically widen, with the proper indexed solution showing little difference with significantly increased data counts.

If the database was cached in memory those index changes would have had a much more profound impact.

What If Localized Data Isn’t Your Primary Optimization Strategy?

I had been meaning to get an SSD for Eclipse Android development, so when my new 100GB SSD arrived (it’s an MLC unit that did well on an Anandtech review, though I won’t mention specifics as it isn’t pertinent – any decent SSD will perform at a similar level. Of course for real-world production use you would want an SLC drive) I detached and moved the database files over.

A quick reattach later and the 30GB was very amply hosted in the 100GB MLC SSD.

I fired up the benchmark to be pleasantly surprised to find it returning results at a rate of 4100 result sets per second. The write performance, while not a focus of this test, also hit extraordinarily high levels (which would conveniently lubricate the use of copious indexes).

Correcting the indexes and moving the database to a single inexpensive consumer-grade SSD, running on a dated desktop, had results coming back at a rate 60,000x what Digg reported.

None of this is intended to be a serious benchmark of SQL Server (I don’t wish to fall on the wrong side of a DeWitt clause), or even Digg's use of MySQL: This is not a disciplined benchmark, and during parts of it I hopped into some windowed online matches of Battlefield:Bad Company 2 while tests ran, after seeing that it had a limited impact on the results. I knew that the primary weakness was simply the movement of the hard drive head, and different technology choices (NoSQL versus RDBMS, normalized versus denormalized, clustered versus heap, etc) primarily impact how often and how far that head has to move.

And of course I don’t have Digg’s data, so it is completely speculative on my part based upon some rough descriptions given in the Digg posting. Maybe he hugely underestimated their data counts, or their data entropy is vastly different.

This is a macro-benchmark: Digg’s claimed results were so poor that I went in knowing that the difference would be very large.

Their described data quantities are small in the world of large databases. Most decent relational database products don’t even start to sweat with tens or hundreds of millions, or billions, of rows.

The key, of course, is proper indexing, trading write performance for read performance targeting your specific needs.

Indexes could be viewed as ways of creating “virtual tables” that are maintained in lock-step with your base table. Decent database products like SQL Server even allow you to include unordered but included columns in your index to ensure that you have a covering index (the best kind) for all scenarios. And that’s before you even get to the magical world of materialized views.

So either MySQL is an atrociously bad product at the larger limits, which ample evidence seems to point as a truism, or the Digg staffers simply weren’t getting the most out of their systems, but it’s hard to take their statements about the RDBMS field with seriousness, and their arguments more correctly invalidates MySQL more than it invalidates the RDBMS.

The fact that Digg is a large site says nothing to their technical leadership or mastery. Their site has sped up by leaps and bounds over the past year, so I suspect that they know what they are doing, but I'm wary of any cargo-cult like "they did {x} and they're a big site therefore it must be a good option" appeals to example.

On The Role of the DBA

What is most disturbing about this whole database debate are the number of commentators who excuse horrific database usage (not in relation to Digg's issue, but as a general conversation point whenever people make comments about proper database use in virtually any online discussion), ascribing rudimentary database performance design and knowledge as something that is limited to the elusive “DBA”.

This is ignorant and frightful.

You don’t know what a b-tree is? Don’t know how indexes work? Don’t know what a red-black tree is? Please get away from the compiler and save the world from your monstrosities until you have some knowledge of these basic concepts.

This is not esoteric knowledge, and instead is rudimentary comp. sci. knowledge.

DBAs are the guys who setup user accounts and monitor security, schedule backups and determine macro-optimizations like how to allocate file groups on the SAN arrays. They might probe lowest-hanging fruit performance issues and flag offenders or offer up suggestions.

Rudimentary database design and proper usage is the basic responsibility of developers, and if you don’t know it then it is your responsibility to learn it. Alternately you can just clutch onto NoSQL and bleat about how it changes all of the rules anyways, which is the route quite a few have decided to pursue (I fully expect to get the standard angry responses from those who take this like a religion).

Is NoSQL a Solution for Yesterday’s Problems?

Database servers really like having a lot of RAM. Ideally you should have more RAM than you have data, allowing it to cache the entirety of your DB (or at least the working-set quantity of DB on that partition) making incredible read performance achievable.

Joining rows is not a hard activity for database servers. It can do it at unfathomable rates if the data can be fed to it at the appropriate pace and in the right form. Even heavily normalized databases can be high performance.

What normally makes joins a performance issue is data locality: if you have to load two rows from different places on the disk, that’s two seeks instead of just one (or three, four, five or more instead of one). When seeks are as costly as they are on a magnetic disk, you avoid it (either by striving for a database that fits in memory, which paradoxical often calls for heavy normalization, or by de-normalizing).

Writes are obviously important too, yet on a site like Digg I would guess that reads outweighs writes — from a user interaction perspective — by a factor of 10000:1 or more outside of logging (which usually goes to a log-specific technology anyways).

In contrast to all of the “everyone is a publisher and the internet changes everything” bluster that is used to herald the wave of change that NoSQL brings, the reality is that it’s a very small percentage of users that post submissions and add comments, or even that do the simplest possible action of clicking an arrow.

Users overwhelmingly simply consume data, whether it’s the latest tech news, Asthon Kutcher’s tweets, or just browsing through the comments on a Slashdot article to see if they add any additional insight.

Despite Digg’s recent claim that they are “write intensive” (maybe because they’ve decided to dramatically explode the number of writes a simple action causes?), at its root their platform is primarily read focused, which is why they pursued Cassandra in the first place. Take note that their NoSQL solution for friend Digg lookups is to take every Digg and massively explode the number of writes it causes to happen (in the case of a Digg by Kevin Rose, a single write becomes 40,000+ large writes).

Hardware Is Cheap. Manpower is Expensive

If I had 48GB of RAM in the test machine (which is fairly pedestrian outside of gerbil-sized cloud instances. Note that you can now add 128GB of RAM to servers for around $4000 in some cases), outside of the initial caching period the select rates would be stratospheric regardless of storage medium, though SSDs would still come in a very, very strong lead when it came to write performance.

For the same $4000 you could chain five Intel X25-E drives for 320GB of intensely high performance – and persistent – storage. Just keep going up until you have more throughput, I/O and storage than you could dream of.

Some high-end enterprise solutions now tier storage and automatically place data as appropriate, choosing between magnetic, SSD, and memory caching systems. The pages of the table that are never touched end up on the magnetic storage while the hot area – say Diggs within the last 6 months – are moved to SSDs or to huge banks of memory caching.

There are bountiful options to achieve incredible performance, even on a budget, balancing memory and high performance storage systems.

Throwing Storage at the Problem

I didn’t waste the disk space, but as mentioned before I could do a simple join between the tables, materialize the view, and the performance would be very high even on magnetic disk, although it would add a serious cost to writes: When a user with a large number of people who befriended them dug something, their record creation would branch out into the write of potentially thousands of records.

That was the route that Digg took: They are pre-computing the sets of data that a user might possibly want, even apparently for reams and reams of long abandoned accounts.

They do this because looking up data that can’t be cached in memory is an expensive operation. Yet as has been shown, SSDs, which are getting faster and cheaper regularly, completely flip the I/O equation.

SSDs change everything.

Turning a small amount of data into a massive amount of data to improve performance paradoxically makes SSDs much less attainable (because the cost per GB is so much higher), and humorously may thwart the end goal. It also reduces the ability to memory-cache the relevant data.

By pursuing this solution, Digg has limited their ability to choose other solutions that are clearly hitting the mainstream.

Coming Next – PostgreSQL versus Cassandra

There is a complete absence of objective measures of the performance of Cassandra. In place of real performance comparisons and load metrics are a lot of hand-waving and comparisons against completely broken database products (never, ever hold MySQL as the vanguard of the RDBMS world. It is comical to do that) running horrendously malignant queries.

Not anymore. I’m on the case.

My goal is not to belittle the product (which I think is elegant, beautiful and concise, and serves a very important role), but simply to bring some rationality to the argument, as it is currently missing.

[EDIT: The following statement has been proven to be a wrong interpretation, but I leave it here out of humbled shame]Digg claims that Cassandra brings them “linear scalability”, yet every one of their Cassandra nodes is 100% replica of the other, meaning that a write (or 40,000 writes) on one is communicated and then replicated on every single other instance.

Response to Criticisms - 2010-03-25

This entry got picked up on a couple of excellent tech-oriented sites: Hacker's News and Reddit r/programming. Included in the comments of a lot of very smart people are a couple of common criticisms that I thought worthy of specific response.

"Your benchmark stinks. How about you..."

My benchmark, if you can even call it that, was focused on O(n) complexity and the difficulty of joins among very large tables with a half-decent database product, with the core take-away being "it's a solved problem. With proper indexing and a decent database system most datasets are `small'."

On the topic of concurrency, I mentioned that in the entry, noting that executing many parallel runs of the test yielded the same net output on the magnetic disk, while it actually significantly improved performance on the SSD and then leveled off. Database servers are fairly smart about concurrency and task queing.

The top result of 4100 resultsets per second, which was achieved using many simultaneous runs, still wasn't fully exploiting the I/O capabilities of the SSD, owing to the tuned-for-magnetic-disks nature of the database server that I didn't bother resolving.

However the Digg case study lacked significant details beyond a couple of spurious size details there to indicate, I believe, that "we think our data is large and the RDBMS can't service our needs". What I based my run on was quantity-of-data (which is not large in the land of databases) and key phrases like "from a cold-cache" (which can be reasonably interpreted as "on a test instance"). There is a lack of details in the Digg benchmark, given that I don't think they were intending it to be a industry standard metric, so it isn't reasonable to expect so much more regimented discipline from mine. However let me say that I did take the meager stats that were given and, where possible, erred to the high end — where "hundreds of million" appeared, I went with 500 million (if I went with a billion it would have barely impacted the results, but I was impatient and didn't want to wait for the data setup script to run that long). Where they said "millions" I went with 10 million. Some of the responses are demonstrative of how fact-free the debate has become, so it's not particularly surprising that NoSQL blogs group-hug around it.

This is not a replication of Digg's runtime environment, and any illusions that it pretended to be intentionally misinterprets. Though if it were a serious apples-to-apples comparison I would have run it on a serious server with serious load simulations, where the only orders-of-magnitude would be the difference between the results and what I achieved on a dated desktop.

In fact, 36 or even 71 results per second is still far too slow for Digg's use (especially given that they are stuck with a web technology that forces synchronous database calls), and I'm not even purporting that to be a viable option for them as they add out a lot of data-intensive personalization options. It's simply to contrast against their abhorrent performance number which I think are grossly misleading.

"But Google and Facebook and..."

Sure. That has nothing to do with this.

"So you're an RDBMS guy who hopes SSD prevents change..."

I'm not an "RDBMS guy". In a prior outing I was declared a DBA because I didn't just roll over for the NoSQL propaganda, and now I'm cast as a guy who holds himself as a database expert. Actually neither are my primary competency, and I think that's the point: I don't purport to be Joe Celko, or even a remote approximation, yet even I can see some massive issues in Digg's case study.

I'm just a solutions guy that looks at technologies and tries to digg (har har) through the cruft and get to the truth, which can be tough amidst tech religions: Warp back to 2001 and try to have a rational discussion about XML. In the case of Cassandra (and many NoSQL solutions) there is stunning ease with which many make absolute statements about RDBMS, such as the many "relational databases can't handle large amounts of data, just look at Digg" claims that litter the web, while cheering on vaguery and unsubstantiated hype about NoSQL solutions.

Show me real performance numbers for NoSQL solutions: They are disturbingly rare. Instead the argument is dominated by noise comments and hand-waving about how grand NoSQL is because it just simply solves everything and makes everything great.

Digg's NoSQL performance advantage is achieved by localizing all of the data necessary for a given request — in this case "tell me all of my friends who Dugg this item/parent item" which they had precomputed and cast out — ignoring the problem of MySQL not competently doing joins (apparently it has troubles sorting as well).

That is overwhelmingly a storage seek issue, and Digg's solution was to turn many seek actions into one or two by massively exploding their core dataset so the data for every need is repeated and persisted for every possible use. I can say right now that there is no question that if I performed the same benchmark on Cassandra, drawing randomly distributed user-item buckets from the same magnetic disk, my performance would max out at the number of seeks per second of the disk, which in the case of a normal desktop drive is somewhere in the range of 100-200 seeks per second.

Of course NoSQL yields the same massive seek gain of SSDs, but that's where you encounter the competing optimizations: By massively exploding data to optimize seek patterns, SSD solutions become that much more expensive. Digg mentioned that they turned their friend data, which I would estimate to be about 30GB of data (or a single X25-E 64GB with room to spare per "shard") with the denormalizing they did, into 1.5TB, which in the same case blows up to 24 X25-Es per shard.

This is interesting, is it not? Maybe it rains a little on the NoSQL parade, but to me it's a pretty fascinating development.

"Why do you have to insult the Digg crew?"

I don't intend to insult them, but at the same time I don't fall in the camp that gives them credibility simply because they're behind a large site. Many of the largest sites on the internet made technology mistake after mistake, yet succeeded regardless because they have a good product: These are some serious examples where ideas beat out execution. PHP somehow formed the base of a good number of the internet's largest sites, yet are there many that will seriously argue for its technical superiority?

"These are two different tools. I'm sick of this argument! Let's get back to the NoSQL Propaganda Parade."

In many cases they are used to solve the same problem. In the specific entry I refer to for this post the whole point was "we were using an RDBMS, and now we're using a NoSQL and it's so much better", so is it really rational to claim that they're two completely different worlds?

"NoSQL solves different problems like scaling out, data centers, etc."

Orthogonal. Cassandra solves problems that you can't as easily do with MySQL. MySQL != the RDBMS industry.

   

Reader Comments

I have to admit to being caught off guard by the improvement you noted with the SSD, though I guess it makes sense given the relative I/O counts. I'm going to put in a req. for a couple of test units to see how it impacts our solution.
Rogue @ 3/24/2010 12:09:13 PM
Brilliant. This is a really interesting way of looking at it. As a suggestion, however, you might want to throw some images in there as it comes across as a bit of a wall of text.
John S @ 3/24/2010 12:31:52 PM
"every one of their Cassandra nodes is 100% replica of the other"

Get your facts straight before knocking down straw men, please.
Jonathan Ellis @ 3/24/2010 1:24:16 PM
> Just look at Digg claiming that Cassandra brings them “linear
> scalability”, yet every one of their Cassandra nodes is 100%
> replica of the other
I don't know where you got this idea, but it is absolutely not true. You configure the number of copies of your data using a parameter called replication-factor (RF). Digg probably keeps 3-6 total copies of each item, despite the fact that they have at least 10x that many servers.

Also, your benchmark "without caching" is also without simultaneous updates to the dataset, which has very negative effects on cache performance.
Stu Hood @ 3/24/2010 1:41:08 PM
From http://about.digg.com/node/564

"every node is identical and there is no single point of failure"
Dennis Forbes @ 3/24/2010 2:24:08 PM
"I don't know where you got this idea, but it is absolutely not true."

Seems irrational, but I pulled it directly from the Digg technology blog. One of their statements concerned the node-full-replica nature of their deployment. Why they would do that was a mystery to me, but just taking their word for it.

"Also, your benchmark "without caching" is also without simultaneous updates to the dataset, which has very negative effects on cache performance."

Hey there Stu. Absolutely no doubt about it, though I assume that when the staffer was working on that problem they were doing so on some sort of test environment. Maybe not, but yeah of course there would be a dramatic cost of I/O contention.
Dennis Forbes @ 3/24/2010 2:29:49 PM
> every one of their Cassandra nodes is 100% replica of the other, meaning that a write (or 40,000 writes) on one is communicated and then replicated on every single other instance.

> "every node is identical and there is no single point of failure"

You've misunderstood that sentence. It's saying all nodes are identical from a protocol perspective (there are no masters/slaves, only "peers"), not that all nodes contain identical data.
Joe K @ 3/24/2010 2:46:52 PM
"every node is identical and there is no single point of failure"
does not imply:
"every one of their Cassandra nodes is 100% replica of the other"
Josh @ 3/24/2010 2:52:22 PM
Another "get your facts straight" comment. Cassandra is heavily optimized for writes, not reads.

http://wiki.apache.org/cassandra/FAQ#reads_slower_writes
Wooo @ 3/24/2010 3:08:27 PM
Which fact, exactly, does that "get straight"?
Dennis Forbes @ 3/24/2010 3:11:12 PM
Great post, Dennis. I think it's pretty telling that the defensive crew all so viciously focused on one irrelevant aside while completely ignoring the main point.

Two years ago the database was constantly the limiting factor in our deploy. Now I have to admit that it's probably our application layer that is our stress point. In our case the major improvement was the drop in memory prices, so we're now geared up with far more memory than it would ever need, and it dramatically reduced our IO contention issues.
Jacob @ 3/24/2010 3:15:03 PM
Our kit regularly shifts 5000 queries a second from SQL Server.

That is not enough to run Digg, even with the caching.
Chris Smith @ 3/24/2010 3:23:44 PM
"The fact that Digg is a large site says nothing to their technical credibility, and it is just as likely in spite of their technical ignorance than because of their technical prowess. Their site has sped up by leaps and bounds over the past year, so I suspect that they know what they are doing, but I'm wary of any cargo-cult like "they did {x} and they're a big site therefore it must be a good option" appeals to example."

You're "wary", but how often do you really think that web operations similar in magnitude to Digg are run by ignorant technical teams? You truly believe that the Digg team is just as likely to be technically ignorant as technically proficient? I find that hard to believe...
Max @ 3/24/2010 3:50:30 PM
Hey there Chris.

While my core intention was never to argue that Digg should run on an RDBMS (they are the ultimate example of a *perfect* user of a product like Cassandra), to make the case for their transition the relational database was maligned in the most fantastically ridiculous fashion, which was then held up as an example of the failure of that technology to meet their needs.

It's farce. Their schema was bad, their RDBMS product even worse, and moreso some technology shifts make the performance advantages (although not other elements like geographical distribution, which isn't the contention of this) moot or actually a net negative.

I can perform 4100 lookups a second against a junction of a ten million row table with a five hundred million row table on a low-end desktop, which zero caching beyond the basic caching of the DB. That's without getting into the serious performance options.
Dennis Forbes @ 3/24/2010 3:58:05 PM
Dennis, I feel like your test is a bit lacking in that there were no simultaneous writes going on messing up the caching and indexing while you're querying.

This all reminds me of what Joel describes as the infrastructure for StackOverflow. Very straight forward, unremarkable hardware, yet excellent performance and lots of spare capacity.
Almost Anonymous @ 3/24/2010 4:05:11 PM
Perhaps your call for rationality concerns those who foretell the death of SQL. I'll agree with those aims, SQL databases aren't going away. Swinging a k/v, document or column oriented hammer at every data mgt nail is silly; some things *really* belong in an RDBMS. I'll also accept as a given that performance problems will largely evaporate if you can keep your working set in RAM. And I'll agree that SSDs are a disruptive change in how we should think about persistence. However, unless you're subjecting large data sets to high update volumes, posting about high QPS on an RDBMS is specious. I'm sure you know that getting good performance from a single database instance hosting read-only data is trivial. But the real world workloads of apps like digg aren't comparable.
Ian Kallen @ 3/24/2010 4:11:52 PM
Ian and Almost,

Yeah, I'm certain that writes upset the apple cart. I also would wager, with very strong certainty, that the numbers posted by Digg were done on a test instance (I doubt they were running 14 second queries against prod while trying to develop this functionality).

Secondly the point was not that it was 10%, or even 30%, faster. It was that it was 50000% faster and beyond, and most importantly that with SSDs -- which is less tenable with the NoSQL "explode it all out" approach -- it was 6,000,000% faster (okay...5,999,900% faster). This isn't a nit picking type comparison. It's orders of magnitude.
Dennis Forbes @ 3/24/2010 4:41:45 PM
Besides the question if NoSQL is just a hype or not... are your really sure, that you are open-minded enough to listen to these people, catch up their good ideas, throw away the crazy ones and integrate it into a SQL-ng standard?

For me it's clear that you can do everything you can imagine using SQL - just like C or machine code - but we live in 2010 and why should I still implement everything again and again using low-level methods and SQL anti-patterns? Why is there no SQL for databases optimized for key-value solutions? Why no one for document solutions? Why no one for graph based solutions? Why no one for deep-integration into programming languages besides LINQ? and so on... I think there would be no religious war if the SQL-guys would provide the tools the developers are asking for!

But in the end, in my opinion there is only one real database - the internet - and for this scenario your "just buy more ram"-attitute does not really work... neither for any query you have to ask more than one database for ;)

(Disclaimer: I'm involved in a NoSQL solution... or better a graphDB solution, and we spend a lot of time implementing a SQL/OQL-based GraphQL - we love it ;) )
Achim Friedland @ 3/24/2010 4:44:12 PM
"Rudimentary database design and proper usage is the basic responsibility of developers, and if you don’t know it then it is your responsibility to learn it."

I recently finished a MS CS at Duke, during which I took the intro "grad DB" class. My single biggest take-away from this class was an idea I reduced down to the pithy: "execute programmers, not queries." If you expect 80-way joins against terabytes of data to "just work" quickly, all the time then YOU'RE DOING IT WRONG. Expecting RDBMS systems to consistently work miracles in the face of horribly complex and ill-specified requests . . . is just a recipe for disaster.
Matt Sayler @ 3/24/2010 5:34:28 PM
Generally when I've run SQL Server and Mysql on the same machine I get much better performance (3-10x) out of Mysql. I will say, however, that I've seen certain RAID arrays on UNIX systems on which Mysql performance just sucks. I've never really gotten to the bottom of it.

There's definitely a lot of people who are jumping too fast into fashionable technologies. I had a job that took 3 days to run, and many of my pals were telling me to use EC2 and Hadoop. I took a little time to redo my data structures and algorithms, eliminated the random access I/O and got that 3 day job down to a 20 minute job.
Paul Houle @ 3/24/2010 5:42:33 PM
Hey there Achim!

"are your really sure, that you are open-minded enough to listen to these people, catch up their good ideas, throw away the crazy ones and integrate it into a SQL-ng standard?"

A year from now someone is inevitably going to look back and say "Geee, didn't you used to be against NoSQL? Why have you switched positions?"

Yet it will be the case that I didn't switch positions at all, but instead (maybe it's just courtesy of my ability to use weasly-wordiness) I say a lot without actually declaring a position at all? In the end my primary campaign has been simply an honest and level playing field where technologies are argued on merit and not fanaticism.

There is a lot of unsubstantiated hype right now, and really it is somewhat staggering how much is said in favor of Cassandra -- given that it's the great hope right now -- with poorly defined/substantiated anecdotes and an absolute dearth of actual comparable metrics. Even in the Digg case, look at how fuzzy their definitions are, and how difficult it is to counterpoint it, even when talking about many, many scales of magnitude?

In some ways the NoSQL thing is hyped along like various diet crazes, with a kernel of truth taken and cast out into a million distorted directions, way beyond the realms of reason. Somewhere in the middle is the rational truth, but it can be get at with all of the "you're either with us or against us" zealots.
Dennis Forbes @ 3/24/2010 5:56:08 PM
you brought up a lot of very valid points in this article and for that you are to be commended but you seriously can't expect to create your own schema and data and say that it is even remotely comparable to the schema/data that you are trying to prove is inefficient

I do agree with you quite a lot though that there is a SEVERE lack of knowledge with basic comp. sci. skills out there that most people who are in our field should be straight up embarrassed for
ian @ 3/24/2010 6:47:50 PM
You're one of those people who still don't get.
Chris @ 3/24/2010 6:53:37 PM
Hey there Ian!

I based my schema directly on the one from the Digg post, and distributed my data by upping the amounts given in the same (e.g. 500 million Diggs where he says "hundreds of millions", and 10 million friend relationship where he says "millions").

However no I don't intend it to be a direct comparison (and if it came anywhere within two orders of magnitude I would have just moved on). Instead it's an assault on the absurd claims by not only Digg, but by others, that relational databases just can't do lots of data, and once you reach that scale it falls apart and you have to evolve to something better.

That is truly absurd, yet it's repeated like a mantra. Digg's old architect has been on the road casting all of their failures with MySQL as failures of RDBMS and it's really a bit bizarre.

No doubt that joining a lot of rows -- beyond the limits of memory caching -- can be a trying ordeal as a simple facet of data locality (which is 100% what the gain from Digg's Cassandra implementation is, reducing many, many seeks down to just a couple), but it's one that SSDs and embiggened memory quantities are making a thing of the past, and it isn't just that "oh, it makes the RDBMS a contanda again!", but it actually puts solutions like Diggs not just on par, but at a disadvantage.
Dennis Forbes @ 3/24/2010 6:55:25 PM
I have to say that like many others, you completely miss the point of "NoSQL" solutions.

Some argue that NoSQL addresses "impedance mismatch", and they are completely wrong.
Others say that it is because RDBMS is slow and of course nothing could be farther from the truth.

The real reason behind modern NoSQL is actually an attempt to solve 3 technical problems that every big company faces:

Transparent automation of sharding, replication and failover.

It's when your application does not concern itself with sharding, replication and failover. When it is not aware from which server the data it is requesting is coming and to which server and how many times it is written.

Unfortunately there no RDBMS that allows you to transparently automate these tasks, no matter how much money you throw at it.

This is the true goal of NoSQL and true reason all big companies are using it.
Vagif Verdi @ 3/24/2010 7:17:34 PM
Brilliant post, Dennis. Maybe it's just because I've made successful RDBMS solutions that I'm biased, though I have nothing against NoSQL solutions either.

You're going to get attacked by the knowNOthingSQL camp from every direction, however. They'll argue that it's for fast reads, fast writes, decoupling, coupling, up, down, left, right. The "movement", as it has been coined, has all the hallmarks of classic snake-oil sales pitches.

And there are of course the classic signs of complete and utter ignorance intermixed. Have these people never heard of Sybase ASE? Oracle RAC? Greenplum? Vertica? Countless other solutions that have the advantages that they ridiculously hold as the sole domain of "NoSQL"? Of course they haven't, their heads dug deep in the sand while they cheer each other on.
Jacob @ 3/24/2010 7:22:33 PM
It does seem like the web folk regard everything done before
as 'old folk' stuff and 'those old guys just don't get it'.

I rememember writing a 'nosql' solution in awk 20 years ago. It
seems SQL, let alone SQL tuning and database design, is
regarded as black magic in many quarters. It's a strange
world, I wonder if the main frame guys had the same
thoughts when mini computers came along?

Don't they teach merge sort, disk latency, disk block structure
etc. at uni anymore?
old guy @ 3/24/2010 7:55:41 PM
Great post. Thanks for the reality check.
Ranga @ 3/24/2010 8:25:15 PM
Achim,

"For me it's clear that you can do everything you can imagine using SQL - just like C or machine code "

The mistake here is assuming that SQL is a low-level interface, but it's not. It's a high-level interface that abstracts all the details of collecting together rows of data, filtering it, etc. Most NoSQL solutions are much more low-level than SQL; taking away the high-level query language that runs on server and forcing you to do much of the work yourself on the client.

"Why is there no SQL for databases optimized for key-value solutions?"

You could simply store JSON in a table row and you've accomplished nearly the same thing.

"Why no one for deep-integration into programming languages besides LINQ?"

Actually, deep-integration of SQL is *old*. DBase, FoxPro, etc. It's simply gone out of favor. I've used FoxPro and SQL is fully and naturally integrated into the language.
Almost Anonymous @ 3/24/2010 9:29:24 PM
Great article. I don't think I've ever said this before, but I think a tiny bit of polite rhetoric would go a long way towards helping people understand your position. Maybe giving in examples of where NoSQL has a serious advantage (like iterating a product over a constantly evolving schema), or at the very least stating the obvious "no silver bullet" disclaimer.
Andrey Fedorov @ 3/24/2010 11:47:41 PM
Achim:
Since my main RDBMS is PostgreSQL, I'll answer from that perspective.

"Why is there no SQL for databases optimized for key-value solutions?"
Key-value is not a solution, but nevertheless. A two field table works here, one field for key, one for value, index on key, and use a functional index to index on some property of the value.

"Why no one for document solutions?"
A document store can be represented either in a normalised schema, or it can be serialized into XML or JSON into a large text field (or field of XML type) in a table (preferably XML, as most RDBMSes have integrated native XML support. You can then create functional indexes over that field using, f.e. xpath queries. A similar solution using JSON could be created with a field of text type (or using a JSON domain) which you could then use external language stored procedure support in PgSQL to write validation and accessing functions for the data, and then use those functions as the basis for indexes. Full text search could also be used, if necessary. Also contrib/hstore might be helpful there.

"Why no one for graph based solutions?"
A graph can be represented quite easily in an relational database, and with recursive query (recursive CTEs) support (or just using stored procedures) they can be queried efficiently. Either use the adjacency model or nested set models to represent your relationships.

Let's leave the SQL-ng or whatever effort for real problems, like temporal data, instead of problems created by poor choice of RDBMS, lack of familiarity with the state of RDBMS technology or lack of knowledge of SQL or the relational model.
Jay @ 3/25/2010 1:45:48 AM
Hey "Almost Anonymous"...

"The mistake here is assuming that SQL is a low-level interface, but it's not."

For me SQL is just a low-level language for sets. But what I want is a declarative database language which:

- has a first class support of n:m-relations without maintaining a
relation table on my own,
- has a first class support of generic lists
- has a first class support of key-value pairs (which is something else as a table, as key-value-pairs might lead to other sharding and locking strategies)
- ...a lot more SQL anti patterns...

In all these points SQL isn't a real declarative language anymore, as I can not tell the database what I real want. Sure there are always ways around it, but that's no longer the spirit of a declarative language.
Achim Friedland @ 3/25/2010 2:42:46 AM
Hi, I am about two years into to my comp sci degree.

I am doing a paper on NoSQL vs RDBMS. From the material I have researched, NoSql is rarley "sold" as an "all-in-one" solution. Most of the companies that are using No-Sql also use RDBMS. They use RDBMS for structured, sensitive data, and "No-Sql" for large chunks of unstructured data. Facebook wallposts, amazon reviews, twitter images, etc.

What RDBMS articles seem to never discuss is "the compromise" that creates a more efficient hybrid system. 4 out 5 No-Sql papers always bring up this hybrid solution.

~~.23 cents
Tim @ 3/25/2010 3:53:12 AM
And one important diffrence between your test and the Digg case is that Digg nowdays has datacenters on both the east and west coast of the USA. Introduce a slow network pipe between your two instances and rerun the test?
Staffan Ericsson @ 3/25/2010 5:29:46 AM
From your description of your test case I assume you are running a single query at a time and counting how many times the queries gets executed in a given second. How about running a test for 1000 simultaneous users reading and maybe say 10 simultaneous writes?

I am sure you are going to see different numbers when you do simultaneous reads (and some writes)
bhatti @ 3/25/2010 6:22:20 AM
How about a better statement. There is a time and place for the various DB technologies. Developers need to learn the strengths and weaknesses of the technologies that they are using.
Scott Bevington @ 3/25/2010 6:48:04 AM
Boy oh Boy, it would be a gas to work with folks who understand.
Robert Young @ 3/25/2010 7:09:23 AM
I think in-memory-databases prevent these type of problems http://www.mcobject.com/in_memory_database
David Rukstales @ 3/25/2010 7:21:00 AM
@Jay

As I wrote earlier you can do all this using SQL - no problem.
But e.g. using SQL + stored procedures for graphs isn't really a declarative way of doing it, as it is just as good as using textfiles with some lines of code.
Achim Friedland @ 3/25/2010 7:42:48 AM
(1) the team there couldn't solve the problem with RDBMS, even if someone brighter could have
(2) the team there was able to solve the problem with nosql

that's interesting, even if someone brilliant can solve the problem with an RDBMS. sounds to me like nosql makes scaling easier, at least for the masses.
anon @ 3/25/2010 7:43:53 AM
>> The key, of course, is proper indexing, trading write performance for read performance targeting your specific needs.

A common statement, but mostly wrong. Real databases, SQL Server included, don't impose a write penalty simply due to indexes. There is only one case where indexing affects writes, and that's when an indexed column is updated (or a new row inserted, of course). (In the normal course of events, page overflows/splits are a far more common write penalty.) In normal circumstances, indexed columns will not be updated, since they represent identity characteristics of the row. If the indexes are important, they will be in buffers, and the update will appear there immediately, thence written to log then to the file (or the other way round, depending on the DB logging architecture).


>> This is not esoteric knowledge, and instead is rudimentary comp. sci. knowledge.

That used to be true. From what I can see, CS grads over the last decade never took a DB course, or slept through it. It's all about java these days.


>> Turning a small amount of data into a massive amount of data to improve performance paradoxically makes SSDs much less attainable (because the cost per GB is so much higher), and humorously may thwart the end goal. It also reduces the ability to memory-cache the relevant data.
>> This is overwhelmingly a storage seek issue, and Digg's solution was to turn many seek actions into one or two by massively exploding their core dataset so the data for every need is repeated and persisted for every possible use.

The point of my endeavor, exactly. If these young-uns had read Codd and Date and Gray & Reuter more, and less futzing with yet another killer language, they would understand that data reduction starts with the data model (and, no, xml is not a data model).
Robert Young @ 3/25/2010 8:14:39 AM
@Paul Houle:
Generally when I've run SQL Server and Mysql on the same machine I get much better performance (3-10x) out of Mysql. I will say, however, that I've seen certain RAID arrays on UNIX systems on which Mysql performance just sucks. I've never really gotten to the bottom of it.

Depends mightily on which flavour of MySql one uses. The InnoDB engined version is far closer to industrial strength engines (BUT, since it implements Oracle's MVCC semantics, not so much vanilla SQL Server) in terms of services provided, and thus will appear slower when used as a file server for application logic. OTOH, vanilla MySql (any version which behaves like 3.x) provides nothing more than a SQL parser fronting the file system and no services, so it will appear faster; all the data integrity logic (you do have some, don't you) is spaghettied through the application code.

If you're using the InnoDB engined version, and application code written to 3.x semantics, you're getting the worst of both worlds.
Robert Young @ 3/25/2010 8:26:04 AM
Finally, I think: taken to its logical conclusion, the future for smart developers will be Real RDBMS, with a minimal cover normalised dataset and lots of joins, running on multi-core/processor SSD machines implemented with stored procedures. Such development will involve client side coders making pretty screens, and server side developers taking care of the data. I, for one on the data side, am willing to bury the hatchet in the spirit of cooperation. Now, to find just ONE client side coder willing to do the same!!! :)
Robert Young @ 3/25/2010 8:31:57 AM
Well done. For some reason when programmers encounter database performance problems they tend to blame the RDBMS or SQL itself rather than buckle down and try to understand it. This article is a step in the right direction.
Noah Yetter @ 3/25/2010 8:58:27 AM
@anon: The team there used Cassandra because their other Bay Area buddies used it too and they could trade notes. Plus, a dash of FOSS elitism thrown in. I doubt they put serious efforts with other, better RDBMSes or they'd have talked about it too.
another anon @ 3/25/2010 9:05:03 AM
On the topic of SSDs, I highly recommend a look at RethinkDB, which is a new engine for MySQL based on the different perofmance characteristics of SSDs. The blog is interesting too.
alphadogg @ 3/25/2010 9:42:12 AM
Excellent article!

Just wondering how long it did take to create your test data set?
Jochen S @ 3/25/2010 2:56:22 PM
So your article says NoSQL can scale, but why bother when you can throw lots of money into commercial RDBMS and hardware. If this is the case, then why is NoSQL bad?
Mark Smith @ 3/25/2010 3:23:30 PM
Loved this article... was very enlightening. I like the relational model a lot. I'm gaining mastery over the model and building abstractions that rely on it in my own work. But I don't have mastery over what I can expect from relational systems on the performance / scalability front. This piece gives me hope that my work to fully leverage the power of the model is not in vain and inspires me to learn more about indexing and other optimization techniques. Here's hoping there's more open source resources dedicated to improving relational technology and leveraging these new SSDs!
Nathan Sobo @ 3/25/2010 4:05:38 PM
I am not an database expert but have to agree with Dennis that developers should have some fundamentals before jumping in to designing something that becomes a laughing stock later.

Coming to MySQL, most of the people use MySQL because it is free and most of the hosting companies give you multiple instances for free. For normal applications with the complexity of 5-7 in the scale of 1-10 can easily use MySQL and it performs very well.

The reason why it is always "MySQL to NoSQL" is because most of the startup companies use MySQL for the same reason I mentioned above and when they reach a breaking point they will consider moving to NoSQL rather than moving to commercial version of the DB like "Sybase / SQL Server / Oracle" because of the licensing cost. Think about how much money a company has to pay for an Enterprise version of any of the commercial DB.

MySQL is good but it comes with its limitations and one has to know what those limitations are. I have used MySQL in multiple projects and it has worked out great so far. We have some issues with the .NET driver and with the current release of the driver, even that is gone.
Shiv @ 3/25/2010 7:42:16 PM
>> Think about how much money a company has to pay for an Enterprise version of any of the commercial DB.

But, they don't consider the cost of creating and maintaining and debugging all that application code they have to write in order to get the services they get from the engine of an industrial strength DB. Of course not; they _want_ to write code until the cows come home. Designing a proper BCNF database, and running it in a proper engine eliminates so much coding effort that many of them would become redundant. Can't have that, now can we?

And, so far as $$$ go, Postgres is every bit as functional as commercial engines.
Robert Young @ 3/26/2010 7:50:16 AM
Why were they using MySQL and not Postgres...
GuessWhat @ 3/26/2010 1:42:02 PM
Excellent post Dennis!

I've been enjoying your past few articles on this subject, which seem to have a more balanced view. My experience lies more in the commercial RDBMS camp, but it would be an interesting exercise to see how Postgres fairs in your tests, as it's starting to become my open source database of choice. Keep up the good work!
Garett @ 3/26/2010 3:04:34 PM
It's funny to see all of this NoSQL guys... Of course NoSQL has it's interest, but hey it's like everybody has a startup that plan to be the next Facebook or youtube... But still has a tiny budget an their server belong to an eeePC in their student room right now. But if a solution need 10X more time to be implemented it's not a problem.

It appear the interned bubble has not exploded yet...


I throught that the main case for us developper was that we were employed by real enterprises... Not a startup located in some studient room.

And only a few of us will end up trying, or succeding building the next digg, youtube or google... It's funny because the "Internet" community is very different from real people.

Here all of us know the NoSQL movement, Scala and all. But outside this community nobody does, included the developper community !

What is said on the Internet is really different from the reality. The reality is much more like Dennis express it than the basic NoSQL fanboy.

Nobody care of NoSQL outside the internet community. Even there, most will end up building a RDBMS system anyway because they work for a bank or thing like that. With volume far far greater that digg ones.

If a comment is lost on twitter nobody care... If some financial transaction is lost, it's another thing...
Nicolas @ 3/27/2010 1:00:23 PM
A year back, I was responsible for evaluating and recommending a scalable and ultra-low latency database (the SLA was 95th percentile response time on queries under 30ms). Cassandra was one of the systems I evaluated, but the organization I was working for was incredibly conservative and slow-moving and the right solution for them was PostgreSQL on SSD (I benchmarked with FusionIO). They ended up going with Oracle because a free product with 50% better performance was not enterprise grade, but I digress.

I benchmarked MySQL on the same hardware. I was getting 16,000 tps on with PostgreSQL at 1.2ms 95th percentile latency, with MySQL that was over 100ms, oddly enough for queries that only used SSD (we tested with a mix of SSD and HDD to see if there was any priority inversion problems in the DBMS engines). This was MySQL with InnoDB and the Google/Percona patches, yet there clearly are internal bottlenecks that preclude high performance, no matter what.

The Digg guys' paper only shows one thing, that they have absolutely no clue what they are talking about, and that they confuse the limitations of their toy database with limitations in all RDBMSes. Preaching with the fervid conviction of the newly converted, they wax lyrical about a system that, when I last evaluated it, lacked something as basic as an online ALTER TABLE ADD COLUMN facility...

I have no doubt there is a need for a NoSQL system that relaxes some of the ACID properties in situations where untimely responses to a query are more objectionable than the risk of data loss in case of systen failure. I doubt Digg falls under that category, probably only Google and Facebook do.
Fazal Majid @ 3/29/2010 8:31:36 AM
It's hard to compare your results with the server load of Digg. You have to imagine thousands of concurrent users making simultaneous requests which fire off a plethora of INSERTS, UPDATES, and SELECTS. You can't simply assume that the INSERTS and UPDATES will not drastically affect the table and row level locking, thereby drastically reducing the performance of subsequent SELECT statements. There's too much variability involved. It would also be useful to know Digg's estimated read/write ratio and queries per second. This might help perform more accurate benchmarking.

I will go ahead and say, however, that I'm sure their database was far from optimized and could have been tweaked much more prior to switching architectures entirely. Nice article nonetheless as it's created quite the commentary.
Corey Ballou @ 3/30/2010 6:02:14 AM
Dennis,
when writing about MySQL performance, even if it is only indirectly as in your case, you should also keep in mind that "MySQL != MySQL".
MySQL comes with two storage engines out of the box.
The known performance limitations of these two engines when it comes to JOINs are widely discussed.
Additionally, MySQL customers can choose to swap the storage engine with third party products or develop their own storage engines.
MySQL storage engines like 'Kickfire' perform at the same level as SQL Server at much better cost/query efficiency.
Christoph @ 6/3/2010 2:21:31 AM
As a LAMP developer who's looking at some MySQL alternatives it's shocking to hear SQLServer getting results like that. In this world you never even hear about real db products.

the progression is always this:

begin: single db server
bigger: add slave
bigger: add more slaves
bigger: add memcache [and pepper your app w/ (cache?cache:sql)]
biggest: scrap architecture, & most of ur code for something exotic b/c high-end databases are too expensive.

given your results it really makes me think we're throwing the baby out with the bathwater in regards to mysql.
lurchpop @ 6/5/2010 2:35:17 AM

Add Comment

Name *:

Email Address:

(your email address is not displayed)
Website:

Comment *:



About the Author
Dennis Forbes Dennis Forbes is a Toronto-based software architect. While focused primarily on the .NET and SQL Server worlds, Dennis frequently ventures outside of this comfort zone into game development and image processing. He has been published in several industry magazines, has been quoted in the Wall Street Journal and has been interviewed by NPR.

He is a vice president and lead software architect at an innovative New York City hedge fund back-office services firm.

Dennis has been working on solutions for the financial, telecommunications, and power generation markets for over 15 years.





 

Dennis Forbes