Monday, 16 January 2012

Excluding nodes from XML data before returning from SQL Server

This post follows on from a question I recently replied to, for how to exclude a specific node from an XML column value before returning it, using TSQL.

Example setup

CREATE TABLE Example
(
ID INTEGER IDENTITY(1,1) PRIMARY KEY,
XmlField XML
)

INSERT Example (XmlField) VALUES ('<Root><ChildA>Value I want to see</ChildA><ChildB>Value I do not want to see</ChildB></Root>')
So if you want to return the XML minus the ChildB node, how do you do it?

Modify it

Literally, using the modify method that is supported on the XML data type. To quote MSDN, modify:
Modifies the contents of an XML document. Use this method to modify the content of an xml type variable or column. This method takes an XML DML statement to insert, update, or delete nodes from XML data. The modify() method of the xml data type can only be used in the SET clause of an UPDATE statement.
That last statement, to me at least, seems slightly open for misinterpretation as strictly speaking it isn't only valid in an UPDATE statement - you can use it as part of a standalone SET statement on an XML variable (though yes, to UPDATE that variable).

So, while you can't do something like this:
SELECT XmlField.modify('delete /Root/ChildB')
FROM Example
WHERE ID = 1
as it results in the following error:
Msg 8137, Level 16, State 1, Line 1
Incorrect use of the XML data type method 'modify'. A non-mutator method is expected in this context.
you can instead do this:
DECLARE @xml XML
SELECT @xml = XmlField
FROM Example
WHERE ID = 1

-- Delete the node from the XML in the variable
SET @xml.modify('delete /Root/ChildB')

SELECT @xml
That does the trick.

Dealing with multiple rows

If you wanted to do something like this from the database side when returning multiple rows then you'd need to either:
  1. create a scalar UDF to perform the removal. e.g.
    CREATE FUNCTION dbo.RemoveNode(@xml XML)
     RETURNS XML
    AS
    BEGIN
     SET @xml.modify('delete /Root/ChildB')
     RETURN @xml
    END
    GO
    
    SELECT dbo.RemoveNode(XmlField) AS XmlField
    FROM Example
    
    Note the parameter to the modify method must be a literal string, so you can't parameterise the UDF to make it a bit more generic by using that parameter in the call to modify().

  2. SELECT the xml values into a temp table/table variable, do an UPDATE like this:
    UPDATE #Temp
    SET XmlField.modify('delete /Root/ChildB')
    
    and then SELECT from the temp table/table variable.

  3. Use SQLCLR
Of course, the typical question should be asked - do you need to do this on the database side, or can you return the XML as-is from SQL Server and modify it accordingly in your calling code? In the original case acting on a single XML value, it's OK but when you start looking into scalar UDF's etc for multi-row requirements then you should be careful to check performance.

Monday, 2 January 2012

Round up of 2011

Another year is done and dusted so it's that time again - time to reflect on the past year.

Quick Timeline
  • March: Mongo UK - London
    A chance to attend my first NOSQL conference, having recently entered in the world of MongoDB for some projects. Picked up some good pointers and food-for-thought.
    Just some words: NOSQL, hard seats, gonna-need-a-bigger-venue, enlightening.
  • April: SQLBits 8 - Beside The Seaside - Brighton
    Highlight of the year for me, conference-wise (was unable to make SQLBits 9 - Liverpool). Fantastic venue, truly awesome speakers/technical content and a really great crowd of people - a true community. Out of all conferences I've gone to over the years, SQLBits never fails to stand out and this one took things to a whole new level.
    Just some words: Crappy code, community, awesome, invaluable, bean bags.
  • April: Microsoft Community Contributor Award
    I really enjoy chipping in on sites like StackOverflow and the MSDN forums, and this was a nice (humbling) surprise when I got the email to say I'd received this acknowledgement (blog).
    Just some words: Surprise
  • July: Job Change
    After a great 4 and a half years, it was time to move on to a new role. Took the opportunity to work with fellow SQLSoton attendee Matt Whitfield (twitter) - who by the way, has some great (FREE) SQL Server tools which can be found here.
    Just some words: New challenge, acquisition, end of an era.
  • September: Mongo UK - London
    A chance to brush up on the latest in the world of MongoDB after a few months away from it.
    Just some words: NOSQL, comfy seats, information gems, lego man USB stick.
  • Monthly: SQLSoton User Group
    A really good year of UserGroup meets down here in Southampton every month. Thanks to Mark Pryce-Maher (twitter) for his time and effort, and all the speakers who have taken the time to give a talk. From the first meeting just over a year a go - a small group, in a tiny side room, with no projector - to now - a larger crowd, in a bigger hall, with a projector - it's obvious SQLSoton is providing a valuable SQL Server community service to the Southampton/South Coast area. Check out all the UK SQL Server UserGroups over at SQLServerFAQ.
    Just some words: Community, pizza, swag, network.

Top 3 Blog Posts By Views
  1. High Performance Bulk Loading to SQL Server Using SqlBulkCopy
  2. SQL Server 2008 - Table Valued Parameters vs XML vs CSV
  3. Getting Started With Cassandra and .NET
  4. Ordnance Survey Post Code Geographic Data Load to SQL Server Using .NET*
* OK, so that's the top 4, not the top 3. But that's because I suspect #3 is actually skewed by people looking for "an entirely different kind of site"...I know I had to be careful which links to click when googling for Cassandra (as in the NOSQL database, obviously) and .NET!

Open Source Projects
I started on a few little side projects, which I stuck up on to GitHub for the world to see and (hopefully) use. First came a .NET app to load some of the freely available geographical datasets from Ordnance Survey into SQL Server (GitHub) which relates to blog post #4 above. Next came a MongoDB-backed ASP.NET session state store (GitHub). It was a lot of fun working on these, even though they are pretty trivial things, and was great to actually get some nice feedback from people who found and started using them.

Bottom Line
2011 was a good year. A lot of learning and new challenges meant I was kept on my toes. Looking back though, there are definitely things I'm not pleased about. I dropped the blogging ball in the second half of the year - activity pretty much flatlined for various reasons. So I'm disappointed about that. To quote a tweet by John Sansom (twitter)...
blogging is like going to the gym. Once you fall out of a routine it's tougher to get back into your rhythm
True that. Looking forward to 2012!

Thursday, 15 September 2011

OS CodePoint Data Geography Update Sept 2011

History:
GB Post Code Geographic Data Load to SQL Server using .NET
OS CodePoint Data Geography Load Update

Following a comment on my original post listed above, it appears the structure of the CodePoint data file has changed. This means that the importer will fail when attempting to import the latest data files supplied by Ordnance Survey as the columns in the files have changed.

I've pushed an update to the project on GitHub to address this - it now takes a sensible approach so the index positions are no longer hard coded. Instead, the CodePoint importer now takes an extra argument - the path to the CSV file that contains the column header listing as supplied in the supporting docs in the download. The column indices we are interested in (for Easting, Northing and PostCode) are worked out from this. Obviously makes it a bit more future proof, so if the structure changes again in the future, it should pick up on that automatically.

This is an update to the CodePoint import only at the moment as I haven't had time to check the Scale Gazetteer import as yet. See the project/readme on GitHub for more info.

Thursday, 18 August 2011

Getting Started with Riak and .NET

Earlier in the year, I started playing around with MongoDB using .NET and wrote up a Getting Started guide. Now it's Riak's turn.

Nutshell
Riak is a distributed, schemaless, data-type agnostic key-value "NoSQL" database written primarily in Erlang.

Check out the "What is Riak?" page on Basho's wiki.

Riak does not run on Windows so you'll need to choose your supported OS of choice to install Riak on. Having used Ubuntu before for MongoDB, I went down that route so my notes here are oriented that way. So, after getting a Ubuntu VM set up...

Step 1
Install the latest custom .deb package, as per the Basho guide.

Step 2
Before you start Riak up, you need to tweak the configuration in Riak's app.config file. Follow the Basic Cluster Setup guide, but note that when you update the IP address for the http interface per those instructions, also make the same change to the pb_ip config value - this is for the Protocol Buffers interface. For performance, I wanted to use the Protocol Buffers interface instead of the HTTP REST API and had some initial problems as the pb_ip configuration was still set to 127.0.0.1.

You then need to edit vm.args, per the setup guide.

Step 3
Now you're ready to fire up Riak. Run:
$ riak start
Then make sure all is well using:
$ riak ping
If all is well, it will respond with "pong".

Step 4
Now, back in the comfort of Windows (my Linux-fu is seriously lacking) we're ready to get going. There is no official .NET client library at time of writing, but Basho list a few community contributed projects. When I first looked, only 2 were listed and these appeared to be inactive with no updates since last year. It was a bit disappointing to see the lack of support for those of us in the .NET world. I then discovered CorrugatedIron, very hot-of-the-press, under development by OJ Reeves (Twitter | Blog) and Jeremiah Peschka (Twitter | Blog).

The simplest way to get everything you need to get going is via NuGet.

Nu-what?

NuGet. It's a Visual Studio extension that you can download from here. Once installed, fire up VS and create a new .NET 4.0 Framework project. Then in the Solution Explorer, right click the project and select "Manage NuGet Packages...".


In the dialog, search online packages for: CorrugatedIron.



Click Install, and it will download all the assemblies/dependencies you need. When it's finished, you'll see you have references to CorrugatedIron, Newtonsoft.Json and protobuf-net.

Open up app.config and you'll see some initial example config for CorrugatedIron. Configure that accordingly to point at your running Riak node. For example, my Riak node is running on a VM called "Riak1", with the REST interface listening on port 8098 and the protocol buffers interface listening on port 8087:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <configSections>
    <section name="riakConfig" type="CorrugatedIron.Config.RiakClusterConfiguration, CorrugatedIron" />
  </configSections>
  <riakConfig nodePollTime="5000" defaultRetryWaitTime="200" defaultRetryCount="3">
    <nodes>
      <node name="Riak1" hostAddress="Riak1" pbcPort="8087" restScheme="http" restPort="8098" poolSize="20" />      
    </nodes>
  </riakConfig>
</configuration>

Step 5
A quick check to make sure you can communicate with your Riak node:
static void Main(string[] args)
{
    IRiakCluster cluster = RiakCluster.FromConfig("riakConfig");
    IRiakClient client = cluster.CreateClient();
    RiakResult result = client.Ping();
    Console.WriteLine(result.IsSuccess);
}

Fingers crossed, and you will get a successful response! From here, you can start playing around with Riak from .NET and find your way round the client.

CorrugatedIron is still very new, so keep an eye on the site and/or on the GitHub repository. Give it a whirl, and be sure to let the guys know how you get on - I'm sure they'd be keen to hear from those using it and the project deserves support from those of us in the .NET world.

Useful tool
It's useful to have some form of GUI over Riak so you can actually "see stuff". I've been using Rekon which I found useful to get up and running with Riak. See the download/install instructions on it's GitHub repository page.

Hopefully, this post will help anyone else wanting to give Riak a try from a .NET perspective!

Monday, 27 June 2011

sp_executesql change between 2005 and 2008

Today I tripped over what turned out to be a difference in the way sp_executesql behaves between SQL Server 2005 and 2008 when executing a string containing a parameterised stored procedure call.

Take this simplified example:
DECLARE @SQL NVARCHAR(256)
SET @SQL = 'sp_help @obj'
EXECUTE sp_executesql @SQL, N'@obj NVARCHAR(100)', 'sp_help'
In SQL Server 2008 (10.0.4000.0), the above executes successfully.
In SQL Server 2005 (9.00.1399.06), it throws the following exception:
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near 'sp_help'.
Adding the "EXEC(UTE)" before the stored procedure name in @SQL resolves the issue in 2005. As standard, I (usually) standardise on ensuring all stored procedure calls are made with "EXEC(UTE)" even when it is the only statement in the batch. Obviously, in this case it was overlooked and tripped me up!


Friday, 24 June 2011

The importance of "Working Set"

One of the things that I see cropping up pretty often is this thing called "working set". After recently chipping in on another StackOverflow question on the subject of "What does it meant to fit 'working set' in RAM?", I thought it was a good subject for a blog post. This is really just a copy and extension of my input on that question and focused in certain parts on MongoDB, but is also as relevant to other databases.

"Working set" is basically the amount of data and indexes that will be active/in use by your system at any given time.

Why is it important to keep your working set in RAM?

Accessing RAM is quick. Accessing disk is slow. When querying your data store, if all the data and indexes typically accessed are in RAM, then performance is blisteringly quick. If it's not in RAM, then disk access is required and that is when performance suffers. Hence it is important to ensure you have enough to hold your working set. The moment your working set exceeds the about of RAM you have, you will start to notice the performance degradation as it has to pull stuff back off disk, so it's important to monitor the situation and react.

Crude Example

Suppose you have 1 year's worth of data. For simplicity, each month relates to 1GB of data giving 12GB in total, and to cover each month's worth of data you have 1GB worth of indexes again totalling 12GB for the year.

If you are always accessing the last 12 month's worth of data, then your working set is: 12GB (data) + 12GB (indexes) = 24GB.

However, if you actually only access the last 3 month's worth of data, then your working set is: 3GB (data) + 3GB (indexes) = 6GB.

You need to understand your data set, scenario and the usage patterns, in order to work out a ball park estimate of your working set. Don't expect a black and white answer for what your working set is in your environment, from someone who doesn't know these things.

What if my working set sky-rockets?

Add more RAM. This can be a case of adding more into your existing node(s). Or, if you need non-trivial increases, making use of sharding to split the data over a number of nodes and just bring more nodes online as you need. This provides incredible potential to scale out your workload.

The key point is to ask yourself: do I have enough RAM for my working set? If the answer is: "I don't know", then get yourself to the position of knowing.

Thursday, 23 June 2011

Upcoming SQLSoton UG - 6th July 2011

The next SQLSoton (twitter) UserGroup meet is coming up on Wednesday 6th July. Matt Whitfield (twitter | blog) will be presenting some "notes from the field" and Alex Whittles (twitter | blog) will be giving a talk on automating cube documentation with SSRS, DMV & spatial data.

Don't forget to register!

Oh, and did I forget to mention? There's an XBox up for grabs. See you there? Thought so :)