Sunday, March 14 2010

Cassandra has gotten a lot of hype lately, having recently been chosen as the nucleus of the Digg upgrade while Reddit simultaneously takes baby steps onto the platform. Digg is promoting its revised technology stack as enabling a "wicked fast", much more individualized experience, while Reddit is thus far only using it as a drop-in key/value replacement for MemcacheDB.

And of course Cassandra is well known for its use by Facebook and Twitter.

Naturally, given the white-hot hype, many people want to see what the big deal is. Emerging web technologies often require a Linux box (either a physical machine or a virtual instance in a product like VirtualBox); however, with just a couple of minor config changes and deviations from the docs, you can do a trial run and kick the tires with Windows as a first-class host, even if any production use would almost certainly be deployed on Linux.

Cassandra is layered over Java, and of course a benefit of that platform is that it is inherently cross-platform.

  1. Download and install the latest Java JDK. After installing, ensure that the JAVA_HOME environment variable points at the root of your JDK install (for a 32-bit JDK on a 64-bit machine this might be C:\Program Files (x86)\Java\jdk1.6.0_18).
  2. Download Apache Ant. Uncompress to the folder C:\Apache\Ant (giving you files like C:\Apache\Ant\bin\ant.bat).
  3. Download Cassandra. Given that you're probably going to be playing around for a while, go with the 0.6.0b2 release, downloading the bin version. Uncompress the package into C:\Apache\Cassandra.
  4. Open a command prompt and navigate to the Cassandra directory (e.g. run C: and then cd C:\Apache\Cassandra). Run the command C:\Apache\Ant\bin\ant ivy-retrieve. This will download Cassandra's dependencies.
  5. Edit C:\Apache\Cassandra\conf\storage-conf.xml, updating lines 188-193 to replace each /var/lib/cassandra/ reference with C:\Apache\Cassandra\Files\.
  6. Copy the files from C:\Apache\Cassandra\build\lib\jars (the files that Ant downloaded) to C:\Apache\Cassandra\lib. It isn't the most elegant solution, but it's the most concise in point form.
  7. In a command prompt, after running C: and cd \Apache\Cassandra, run the command bin\cassandra. Cassandra should start up successfully (and, if applicable, the Windows firewall will ask whether it should make an exception). If it doesn't start, you likely missed one of the prior steps.
  8. In another console window, navigate to C:\Apache\Cassandra and run the command bin\cassandra-cli. At the prompt, run the command connect localhost/9160 and you should connect. You can now try out some of the simple set and get commands described in README.txt.
  9. Start reading up on the Thrift API, the basics of data storage, what a "super-column" is, and so on.
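If you'd rather sanity-check step 1 than eyeball it, here is a minimal Python sketch, assuming Python 3 is installed (Cassandra itself doesn't need it). The function name `check_java_home` is hypothetical, and looking for `bin\javac.exe` is only a heuristic for telling a JDK root apart from a plain JRE or from the JDK's `bin` folder.

```python
import os
from pathlib import Path


def check_java_home(java_home=None):
    """Return a list of problems with JAVA_HOME; an empty list means it looks OK.

    If java_home is None, the JAVA_HOME environment variable is read.
    Heuristic: a JDK root has bin\\javac(.exe); a JRE does not.
    """
    if java_home is None:
        java_home = os.environ.get("JAVA_HOME", "")
    if not java_home:
        return ["JAVA_HOME is not set"]
    root = Path(java_home)
    if not root.is_dir():
        return [f"JAVA_HOME points at a missing directory: {root}"]
    if root.name.lower() == "bin":
        # A common slip: pointing at ...\jdk1.6.0_18\bin instead of the root.
        return [f"JAVA_HOME should be the JDK root, not its bin folder: {root}"]
    candidates = [root / "bin" / "javac.exe", root / "bin" / "javac"]
    if not any(p.is_file() for p in candidates):
        return [f"no bin\\javac found under {root}; is this a JRE rather than a JDK?"]
    return []
```

Calling `check_java_home()` with no argument checks whatever JAVA_HOME the current console sees, so run it from a fresh command prompt after changing the variable.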
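Step 5 is a plain search-and-replace, so it can also be scripted. The sketch below is a hypothetical Python 3 helper (not part of Cassandra): it backs up storage-conf.xml to storage-conf.xml.bak, replaces every /var/lib/cassandra/ prefix with C:\Apache\Cassandra\Files\, and returns the number of replacements. Unlike the manual edit it isn't limited to lines 188-193, so any other mention of the prefix (in a comment, say) changes too; compare the count against what you expected.

```python
import shutil
from pathlib import Path

# Defaults taken from step 5: the Unix prefix in the shipped config and the
# Windows folder used in this walkthrough.
UNIX_PREFIX = "/var/lib/cassandra/"
WINDOWS_PREFIX = "C:\\Apache\\Cassandra\\Files\\"


def rewrite_data_paths(xml_text, old=UNIX_PREFIX, new=WINDOWS_PREFIX):
    """Replace every occurrence of `old` with `new`.

    Returns (new_text, number_of_replacements).
    """
    return xml_text.replace(old, new), xml_text.count(old)


def patch_storage_conf(conf_path):
    """Back up conf_path to <name>.bak, rewrite it in place, return the count."""
    path = Path(conf_path)
    shutil.copyfile(path, path.with_name(path.name + ".bak"))
    patched, count = rewrite_data_paths(path.read_text(encoding="utf-8"))
    path.write_text(patched, encoding="utf-8")
    return count
```

Usage would be along the lines of `patch_storage_conf(r"C:\Apache\Cassandra\conf\storage-conf.xml")`; if the result looks wrong, restore the .bak copy.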
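The jar copy in step 6 can likewise be done with a short Python 3 script. `copy_jars` is a hypothetical helper; it copies only `*.jar` files and, by a choice made for this sketch, skips any jar already present in the destination unless `overwrite=True`, so the jars that ship in Cassandra's own lib folder aren't clobbered.

```python
import shutil
from pathlib import Path


def copy_jars(src_dir, dest_dir, overwrite=False):
    """Copy every *.jar in src_dir into dest_dir; return the names copied.

    Jars that already exist in dest_dir are skipped unless overwrite=True.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(src.glob("*.jar")):
        target = dest / jar.name
        if target.exists() and not overwrite:
            continue  # leave the existing jar alone
        shutil.copy2(jar, target)  # copy2 also preserves timestamps
        copied.append(jar.name)
    return copied
```

For the layout above that would be `copy_jars(r"C:\Apache\Cassandra\build\lib\jars", r"C:\Apache\Cassandra\lib")`.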
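To confirm steps 7 and 8 without the CLI, a hypothetical Python 3 helper can check whether anything is accepting TCP connections on port 9160, the Thrift port that `connect localhost/9160` uses. It only shows that the port is open; it doesn't speak Thrift or prove that the node is healthy.

```python
import socket


def is_listening(host="localhost", port=9160, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # The connection is closed again as soon as it is established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

If `is_listening()` returns False while bin\cassandra appears to be running, the Windows firewall prompt from step 7 is a likely suspect.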

I've been playing around with various NoSQL* solutions for some time; however, given the incredible hype, which is strangely coupled with a complete lack of any objective measure, I've decided to put Cassandra to the test. In the next couple of days a high-performance SSD will arrive, and I'll gather some objective metrics, because the message being sold doesn't technically pass the B.S. test.

* - A better name than "NoSQL" is desperately needed. Backronyms and revisionist history — seriously, guys, "Not Only SQL?" — don't solve the problem that the name is incendiary, inaccurate, and a little ridiculous. KVDBMS works for some of the products, but isn't quite applicable to richer solutions like Cassandra.

Cassandra is a very, very cool product, and I immediately see lots of interesting uses for it, but what is most striking is what is missing from it. It is intensely bare-bones at the moment, which is exactly how MySQL made inroads. When MySQL first became the first love of many of the same people and sites that now herald NoSQL (the same people who almost without fail rallied behind PHP, which...well...enough said), it was almost comically deficient as a database product, but as it gained those features it grew away from its core contingent.

Exciting times regardless. The technology space has many niches, each calling for its own appropriate solution, so it is always worth keeping an open mind.


Reader Comments

Thank you Dennis. It works like a charm.

I'm very intrigued by the comment about metrics (benchmarking?) and SSDs.
K.B. @ 3/14/2010 9:55:01 PM
It's been frustrating trying to decide whether or not it's worth the time to learn about a NoSQL database product with the only major source of information being the throngs of bad developers writing bad SQL against bad databases and proclaiming it bad. I'm very much looking forward to an objective assessment of this product from someone with the experience and technical chops necessary to actually perform a fair evaluation/comparison.

So thanks for choosing to invest some of your own time in this study - keep us posted!
Aaron G @ 3/15/2010 7:29:26 AM
This works perfectly. Thanks mucho!

I didn't realize it was Java, as Windows is completely unmentioned in the docs or on their wiki. Weird because it works superbly.
sergei @ 3/15/2010 1:27:39 PM
I see the hype, and I used it.

It is nothing more than a key/value store, with redundancy and load balancing added on. It will get better, but if you think about it, a database could do exactly this.

Throw together a two-column table (key and value), and the value column can contain many, many fields within that one actual column. So yes, you could do it and probably get great performance fetching the values, but that is not the value an RDBMS provides, so if that is all you are going to do, then you should be using things like Cassandra and not an RDBMS. Just saying....
Tom @ 3/22/2010 7:55:52 AM

About the Author
Dennis Forbes is a Toronto-based software architect. While focused primarily on the .NET and SQL Server worlds, Dennis frequently ventures outside of this comfort zone into game development and image processing. He has been published in several industry magazines, has been quoted in the Wall Street Journal and has been interviewed by NPR.

He is a vice president and lead software architect at an innovative New York City hedge fund back-office services firm.

Dennis has been working on solutions for the financial, telecommunications, and power generation markets for over 15 years.





 

Dennis Forbes