Thursday, May 12 2011

Max Schireson, president of 10gen (the makers of MongoDB), has made the case for document-based data systems – such as MongoDB and CouchDB – by arguing against the heavily-normalized relational model.

Max offers up his entry as a challenge to the “relational-is-always-best set”, asking them to prove that the complexity of storing data in a relational form is worth the trouble, at least for the scenario he describes.

Given that I’ve been anointed as an anti-NoSQL crusader on a number of occasions, I feel obligated to argue on behalf of the relational model, which I will do in a later entry.

This is despite my being a big fan of MongoDB. As I have done many times in the past, I encourage everyone to download and play around with the excellent MongoDB product. Do yourself a favour by running through the tutorial.

All things have a place.

Recognizing the MySpace Angle And Its Inverse

I once sat in a meeting where a peer described the purportedly intractable complexity of a task they were failing at. They did this by drawing the various actors on the whiteboard and then detailing their many complex relationships.

Imagine the best path-finding algorithm. Now imagine the opposite: the least efficient, most unnecessarily sloppy routing imaginable.

That was how complexity was deceptively exaggerated, with absurdly circuitous relationship lines weaving to and fro. It was comical.

That memory, and the realization that the deception goes both ways, came to mind while reading Max’s entry, and again when reading the linked entry by MongoDBer Kyle Banker.

When comparing the document model with the relational model, many if not all examples seem to contrast a complex relational model – one that encapsulates an end-to-end platform for a whole domain – against a trivial island of a tiny subset of data in a document structure. The former is usually built to support entire operations and systems, while the latter tends to be crafted for one single purpose (like "allow customer services to look at an order", as was used in Max's scenario).

Max highlights relational complexity by pointing to an Oracle end-to-end order reference platform containing “126 tables”. Kyle does the same thing when comparing a simple could-be-one-single-row document against a complex catalogue schema. (Humorously, that document includes four relationships, and resolving them would require four expensive round-trips to the MongoDB server given the platform’s bizarre lack of server-side joins.) Both explain their arguably deceptive comparisons with statements like “Of course, this is not a complete representation of a product”…
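The round-trip point can be made concrete with a rough sketch. This is not MongoDB client code: FakeDocumentStore is a hypothetical in-memory stand-in that simply counts lookups, the product schema and its four references are invented for illustration, and sqlite3 stands in for any relational engine.

```python
import sqlite3

REFERENCES = ("category", "vendor", "price", "image")  # hypothetical relationships


class FakeDocumentStore:
    """In-memory stand-in for a document server with no server-side joins."""

    def __init__(self, collections):
        self._collections = collections
        self.round_trips = 0

    def find_one(self, collection, doc_id):
        self.round_trips += 1  # every lookup models one trip to the server
        return self._collections[collection][doc_id]


def load_from_documents(store, product_id):
    """Fetch the product, then resolve each reference with its own lookup."""
    product = store.find_one("product", product_id)
    resolved = {"name": product["name"]}
    for ref in REFERENCES:
        resolved[ref] = store.find_one(ref, product[ref + "_id"])["label"]
    return resolved


def load_from_sql(conn, product_id):
    """Resolve the same four references with a single joined query."""
    row = conn.execute(
        """
        SELECT p.name, c.label, v.label, pr.label, i.label
        FROM product p
        JOIN category c ON c.id = p.category_id
        JOIN vendor v ON v.id = p.vendor_id
        JOIN price pr ON pr.id = p.price_id
        JOIN image i ON i.id = p.image_id
        WHERE p.id = ?
        """,
        (product_id,),
    ).fetchone()
    return dict(zip(("name",) + REFERENCES, row))


def build_demo():
    """Load identical made-up data into both stores."""
    labels = {"category": "Tools", "vendor": "Acme", "price": "9.99", "image": "widget.png"}
    ids = {"category": 10, "vendor": 20, "price": 30, "image": 40}
    collections = {ref: {ids[ref]: {"label": labels[ref]}} for ref in REFERENCES}
    collections["product"] = {
        1: {"name": "Widget", **{ref + "_id": ids[ref] for ref in REFERENCES}}
    }
    store = FakeDocumentStore(collections)

    conn = sqlite3.connect(":memory:")
    for ref in REFERENCES:
        conn.execute(f"CREATE TABLE {ref} (id INTEGER PRIMARY KEY, label TEXT)")
        conn.execute(f"INSERT INTO {ref} VALUES (?, ?)", (ids[ref], labels[ref]))
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, "
        "category_id INTEGER, vendor_id INTEGER, price_id INTEGER, image_id INTEGER)"
    )
    conn.execute("INSERT INTO product VALUES (1, 'Widget', 10, 20, 30, 40)")
    return store, conn


store, conn = build_demo()
from_docs = load_from_documents(store, 1)
from_sql = load_from_sql(conn, 1)
print(store.round_trips)      # 1 product fetch + 4 reference lookups
print(from_docs == from_sql)  # both paths resolve the same data
```

Under these assumptions the document path makes one lookup for the product plus one per reference, while the relational path issues a single statement. Embedding the referenced data in the document would avoid the extra lookups, at the cost of duplicating it.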

I would argue that in such a case the comparison shouldn’t be made at all. Why contrast an incomplete example of a document-based implementation – simplistic in its useless innocence – against a fully scoped relational platform?

It is the “MySpace angle” used to hide the ugly reality of technology. If you have a MongoDB equivalent of the compared product, have at it, but simply hiding the ugly details and zooming in on a non-functional, cherry-picked subset just misleads potential suitors.

Realtors use this trick when taking photos of homes, showing just enough of the grass while avoiding nearby structures. Your mind naturally extrapolates, imagining expanses of lush green fields, when in reality there’s usually another house imposing itself four feet over.

In Defense of Relational Databases

I have a full workload right now, but in the near future, during a mental lull, I will respond to Max. There is a very compelling counterargument to be made.


Reader Comments

"... asking them to prove that the complexity of storing data in a relational form is worth the trouble..."

Non-sequitur, as storing data using the relational model is rarely complex. It seems complex when you *don't understand* the relational model, which sadly most developers don't (nor do they apparently care to).

Also, 126 tables is a near-trivial system. The system I care for daily has, at a quick count, 1904 tables, not including reporting structures.
Noah Yetter @ 5/12/2011 8:36:19 AM
That is a very cogent point. Our current, main operational system comprises multiple databases, the largest of which has 1300+ tables across a number of namespaces. Some of those tables, where the use case calls for it, use embedded XML documents for the odd cases where that approach is optimal.
Dennis Forbes @ 5/12/2011 8:52:54 AM
Basically, an e-commerce system can consist of two parts: a CMS and a shopping cart.

The CMS can use anything you can imagine as a suitable data system: SQL databases, cloud storage such as Amazon Dynamo, or simply a filesystem.

For the shopping cart an RDBMS works well, but I'd use only 4 tables: one for short product details, one for orders (including "negative orders" for available quantities), one for customers, and finally one for any information associated with those customers (for example payment methods, addresses, etc.). I wouldn't care about "proper normalization".

When I need performance and/or availability for the shopping cart, I would rely on massive distribution.
Moriz @ 5/20/2011 2:42:35 AM




About the Author
Dennis Forbes is a Toronto-based software architect. While focused primarily on the .NET and SQL Server worlds, Dennis frequently ventures outside of this comfort zone into game development and image processing. He has been published in several industry magazines, has been quoted in the Wall Street Journal and has been interviewed by NPR.

He is a vice president and lead software architect at an innovative New York City hedge fund back-office services firm.

Dennis has been working on solutions for the financial, telecommunications, and power generation markets for over 15 years.





 
