Monday, 5 July 2010

Getting started with Cassandra and .NET

Over the past couple of days, I've started playing around with Cassandra in an effort to satisfy my curiosity and to see what it's all about. In terms of databases, SQL Server is my main skill - I first started using it nearly 10 years ago, and will always remember the first book I read cover to cover on it: Inside Microsoft SQL Server 6.5 by Ron Soukup. So Cassandra is a step outside of my comfort zone; something new and a little alien to me....a challenge. The aim of this post, is mainly to dump my early thoughts, key points and useful links - basically an overflow area for my brain; things that I find are important/helpful resources.

Along with SQL Server my other core technology is C#. So what I'm trying to do is get up and running with Cassandra and knock up some C# to talk to it. Just something basic will do - a "Hello Cassandra" app. It's not about the pros/cons of NoSQL, SQL Server vs Cassandra vs anything else, and it's definitely not a guide on best practices when it comes to this subject (I'm hardly an expert after 2 days). End of disclaimer.

First things first, what is Cassandra?
It's a highly scalable, eventually consistent, distributed, structured key-value store.
Check out the WIKI on
Also a great reference here on the architecture overview.

You can install it on any OS for which there is a Java runtime. Windows is what I'm comfortable with, so that's the host OS I used (XP to be exact - yes, my laptop is old-school, cue violins). Future steps will be to get more comfortable with Linux and have it running on a number of VMs.

Great guide for installing Cassandra by Nick Berardi (Twitter): Cassandra Jump Start For The Windows Developer. Gives instructions for Windows and Linux.

Data model - key terminology
Cluster, ColumnFamily, Keyspace, SuperColumn - sounds strange. A read through of the DataModel WIKI page is a must.

From the WIKI:
Thrift is a software framework for scalable cross-language services development. Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages.
You can write your code directly using the Thrift API to Cassandra, but for beginners (like me) it's definitely worth downloading a high-level client. Plenty available for a wide variety of languages including C# / .NET, Python, Ruby, Java and more.

For .NET I originally went to try Fluent Cassandra (Github) which is being developed by Nick Berardi - after all, his site gave me the guide to get up and running. Plus he has a number of useful Cassandra articles from a .NET perspective. However, I currently don't have .NET 4.0/VS2010 so to get something going quickly, I looked at Aquiles (Codeplex) which requires .NET 3.5 or above.

That got me to the point where I am now - able to connect to Cassandra from .NET. Nothing shiny, nothing flash. But the main point is the links I've collated in this post are the ones I've used to get going with the basics.

Thursday, 1 July 2010

Validating an SQL query programmatically, with gotcha

If you want to check the validity of a TSQL statement programmatically, you can make use of the SET NOEXEC statement. As this MSDN reference states, SET NOEXEC ON will compile the query but won't actually execute it. This is ideal (well, nearly....) if perhaps you have a dynamically generated statement that you want to check is valid before executing it.

Example time
SELECT TOP * FROM sys.tables
This results in the following error, as I've failed to specify the number of rows to return for the TOP:
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near '*'.
So if we correct the statement, then try again:
SELECT TOP 10 * FROM sys.tables
This time, as we'd expect, we get:
Command(s) completed successfully.
However, it appears that in SQL Server 2008, SET NOEXEC does not work exactly as documented. The documentation states it will validate the syntax and object names. As shown above, it did validate the syntax. But I found it didn't actually error when invalid objects are referenced.
SELECT * FROM NonExistentTable
This comes back as successful, despite the table not existing.
SQL Server 2005 DOES behave as documented, and throws error 208 - "Invalid object name".
I've raised a connect bug report - if you can reproduce it in SQL Server 2008, please add your input to that bug report. If anyone is using SQL Server 2008 R2, I'd be interested to know the behaviour in that - drop me an email or add as a comment below.

Another statement worth looking at, is SET FMTONLY which as per MSDN, returns metadata from the query without actually running the query. This does throw error 208 when referencing an invalid table name. If the query is valid, it will return the metadata of the resultset which is handy if you want to test the response format without executing the statement.