Monday, 28 May 2012

Configuring Azure Storage Emulator SQL Server Instance

If you're using Windows Azure Storage, you are almost certainly going to be running the storage emulator during development instead of working directly against your storage account up in the cloud. This emulator, which comes with the Windows Azure SDK (see the "other" category), allows you to test locally against the Blob, Queue and Table services.

As per the Overview of Running a Windows Azure Application with Storage Emulator reference, the emulator needs a local SQL Server instance. By default, it's configured to run against a SQL Server Express 2005 or 2008 database.

If you want to point it at a different instance of SQL Server, for example a shared development database server, you can do this using the DSInit command line tool. I originally came across the MSDN article on How to Configure SQL Server for the Storage Emulator, which led me to try the following in the Windows Azure Command Prompt:

DSInit /sqlInstance:MyDevServerName\MyInstanceName
This tried to create the storage database, but failed with the following:
Creating database DevelopmentStorageDb20110816...
Cannot create database 'DevelopmentStorageDb20110816' : A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified)

One or more initialization actions have failed. Resolve these errors before attempting to run the storage emulator again. These errors can occur if SQL Server was installed by someone other than the current user. Please refer to http://go.microsoft.com/fwlink/?LinkID=205140 for more details.
The correct way to target a different database server, rather than the local machine, is to use the SERVER switch instead:
DSInit /SERVER:MyDevServerName\MyInstanceName
See the full DSInit Command-Line Tool reference.

When you then run the Storage Emulator, it will target that database server/instance. You can easily clear down/reset that database by right-clicking the Windows Azure Emulator icon in the taskbar, selecting "Show Storage Emulator UI" and then clicking "Reset". NB. Just to be clear, this will delete everything in your local storage emulator database.

An added "gotcha" to watch out for, if you have the storage account connection stored in a web/app.config and want to specify to use the local emulated instance, you need to use

UseDevelopmentStorage=true
exactly as it appears here. If, like me, you initially add a trailing semi-colon, you will get a FormatException with the error: "Invalid account string".
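For example, reading the connection string from config and parsing it - a minimal sketch, assuming an appSetting named "StorageConnectionString" (purely an illustrative key name) and the 1.x storage client library namespaces:

// web.config / app.config:
// <appSettings>
//   <add key="StorageConnectionString" value="UseDevelopmentStorage=true" />
// </appSettings>

using System.Configuration;
using Microsoft.WindowsAzure;

// Note: no trailing semi-colon on the value, otherwise Parse throws a FormatException
CloudStorageAccount account = CloudStorageAccount.Parse(
    ConfigurationManager.AppSettings["StorageConnectionString"]);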

Thursday, 24 May 2012

Error Creating Azure Blob Storage Container

I received a rather...vague...error when trying out a bit of .NET code to connect to a Windows Azure Blob Storage account, and create a new container in which to store some blobs.

The code

CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=https;AccountName=REMOVED;AccountKey=REMOVED");

CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
            
CloudBlobContainer container = 
    blobClient.GetContainerReference("TestContainer1");

container.CreateIfNotExist(); // This error'd

The error

A StorageClientException was thrown, stating "One of the request inputs is out of range." An inner WebException showed that "The remote server returned an error: (400) Bad Request."

The cause

I'd assumed (incorrectly) that a container name could be pretty much any string. But that's not the case. As per this MSDN reference, a container name must:
  • be in lowercase (this was the cause in my case)
  • start with a letter or a number and only contain letters, numbers and hyphens (multiple consecutive hyphens are not allowed)
  • be between 3 and 63 characters long
The "Naming and Referencing Containers, Blobs and Metadata" reference is worth a bookmark.

Wednesday, 23 May 2012

ASP.NET MVC Performance Profiling

Building up a profile of how a web application functions, all the database interactions that take place and where the server-side time is spent during a request can be a challenging task when you are new to an existing codebase. If you're trying to address the generic/non-specific "we need to improve performance / it's slow" issue, you need to get a good picture of what is going on and where to prioritise effort.

There are a number of ways to identify specific problems depending on the technologies. For example, if your application is backed by SQL Server, you can query a set of DMVs to identify the top n worst performing queries and then focus your effort on tuning those. Identifying a badly performing query and tuning it can obviously yield huge benefits in terms of the end user's experience, but this doesn't necessarily flag up all the problems. If you are looking at a particular page/view within the application, then you could start a SQL Profiler trace to monitor what's going on during the lifecycle of the request - another common and valuable tool. Personally, I usually have SQL Profiler open most of the time during development. If you're developing against a shared dev database with others, you can filter out other people's events from the trace; one way that works nicely is to inject your machine name into the connection string as the ApplicationName and then filter on that.
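For example, one way of doing the ApplicationName trick - a sketch, assuming you build the connection string somewhere central (originalConnectionString is a placeholder):

using System;
using System.Data.SqlClient;

var builder = new SqlConnectionStringBuilder(originalConnectionString)
{
    // Shows up in the ApplicationName column in SQL Profiler, so the trace
    // can be filtered down to just this machine's events
    ApplicationName = Environment.MachineName
};

var connection = new SqlConnection(builder.ConnectionString);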

MiniProfiler For The Win

Recently, I've started using another extremely valuable tool within an ASP.NET MVC solution - MiniProfiler which is (quote):
A simple but effective mini-profiler for ASP.NET MVC and ASP.NET
It was developed by the team over at StackOverflow. Simply put, it can render performance statistics on the page you are viewing, detailing where the time was spent server-side fulfilling that request. But the key thing for me is that it provides an ADO.NET profiler. Say you're using LINQ-to-SQL: by wrapping the SqlConnection in a ProfiledDbConnection before passing it to the constructor of a DataContext, information on the SQL queries executed within the lifetime of a request is also included in the statistics displayed. (It can also profile calls via raw ADO.NET / Entity Framework etc., with minimal effort required.)
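For instance, wiring LINQ-to-SQL up to it might look something like the following - a sketch only, as namespaces/types have moved between MiniProfiler versions, and MyDataContext/connectionString are placeholders:

using System.Data.SqlClient;
using StackExchange.Profiling;
using StackExchange.Profiling.Data;

var sqlConnection = new SqlConnection(connectionString);

// Wrap the real connection so MiniProfiler records each command executed through it
var profiledConnection = new ProfiledDbConnection(sqlConnection, MiniProfiler.Current);

// Hand the wrapped connection to LINQ-to-SQL as normal
var db = new MyDataContext(profiledConnection);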

Make the invisible, visible

Since integrating this into an MVC application, the benefits have been priceless. The key thing for me is: VISIBILITY. It provides extremely valuable visibility of what is happening under the covers. Going back to the start of this post, if you're new to a codebase, then having this information presented to you as you browse is invaluable. It enables you to identify problems at a glance, and it increases the visibility of problems to other developers, so the "life expectancy" of those problems is lower - they're a lot less likely to hover undetected under the radar if the information is being pushed right in front of the developer on screen. It also helps you build up a picture of how things hang together.

MiniProfiler includes functionality to flag up N+1 and duplicate queries - a common potential problem with ORMs if you're not careful. If a view were performing 100 cheap queries, none of them may show up individually as a query worth tuning. But the fact that 100 database round trips are being made could scream out that they could perhaps be replaced with a single round trip, with a performance improvement gained there.
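As a rough illustration of the N+1 shape it flags up - a hypothetical LINQ-to-SQL context and entities, purely for illustration:

using System.Linq;

// Hypothetical context (db) with Orders and child OrderLines
var orders = db.Orders.ToList();            // 1 query for the orders...

var total = 0;
foreach (var order in orders)
{
    // ...plus one lazily-loaded query per order = the "N+1" pattern
    total += order.OrderLines.Count();
}

// A single round trip instead: let the database do the counting
var totalInOneQuery = db.Orders.SelectMany(o => o.OrderLines).Count();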

I'm now a big fan of MiniProfiler, especially due to its simplicity to integrate into a codebase. Working on ASP.NET MVC/ASP.NET applications? You might want to give it a try!

Friday, 13 April 2012

Trust your developer instinct and experience

That feature you're implementing, that bug you're fixing, that performance improvement you're trying to make...it's using up a fair few brain CPU cycles trying to analyse how best to implement it. Scalability, performance, ease of maintenance, clean code...just some of the things you are mentally scoring your potential approaches against to determine which route to take. And that's without any mention of business domain logic. We've all been in the situation where you start heading down one path only to find a hurdle. It's OK though, you're an athlete (very debatable in my case) - you can jump over that hurdle. So you do. But then you find a 7 foot high wall with barbed wire on top. No problem - you have a ladder and wire cutters; it's a bit unwieldy carrying them around with you everywhere, but hey - over you go. All good. Until you come across Chuck Norris, who is angry about that "Yo' mamma.." joke you made about his mother. That wall you climbed...that was protecting his private property. Your ladder? Oh, that's still on the outside of the wall - you couldn't pull it over with you so you jumped down without it. Brace for pain.

Those voices in your head

There's always unexpected problems and hurdles that crop up. It's all part of being a developer after all. But maybe at the point you got your ladder and wire cutters out to climb over that wall, you heard a little voice inside say "Hmm, this doesn't quite feel right" or perhaps you just felt something niggling away with you about the approach. That voice, your instinct and experience as a developer built up over your career, is worth listening to. When it pipes up, sit back and take 5 to just take a bird's eye view.

Trust that instinct and experience - it's usually right. At the very least, it's highlighting the need to at least re-justify to yourself that you are heading down the right path.

Wednesday, 14 March 2012

Ordnance Survey Data Importer Coordinate Conversion Accuracy

Update 13 March 2013: Please see latest blog post (fix) on this here.

Thanks to a comment on my original post about my project that imports Ordnance Survey CodePoint data into SQL Server, I was made aware of a potential issue with the (awesome) third-party GeoCoordConversion DLL I use to convert the Eastings/Northings coordinates supplied in the Ordnance Survey data files into Latitude/Longitude coordinates. The issue relates to an inaccuracy in the conversion process, specifically integer divisions being performed where double (floating-point) divisions were intended.
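As a trivial illustration of that class of bug (not the actual GeoCoordConversion code):

double truncated = 5 / 2;   // integer division happens first, result is 2.0
double expected  = 5 / 2.0; // floating-point division, result is 2.5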

So, this blog post is to document my analysis of that, and to try to quantify what impact, if any, there is on the accuracy of the conversion from Eastings/Northings to Latitude/Longitude. What I've done is:

  1. Imported the Ordnance Survey data set using the released version (1.0.0.0) of the GeoCoordConversion library (i.e. the version my project uses)
  2. Downloaded the source code for GeoCoordConversion and corrected the parts of the calculations where the raised issue noted an inaccuracy
  3. Reran the data import into a separate table using the "corrected" conversion code
  4. Ran the following query to try and quantify how much of a difference it actually makes:
;WITH CTEDiff AS 
(
    SELECT orig.OutwardCode, orig.InwardCode,
        orig.GeoLocation.STDistance(new.GeoLocation) Dist
    FROM PostCodeData orig
        JOIN PostCodeDataWithFix new ON orig.InwardCode = new.InwardCode 
            AND orig.OutwardCode = new.OutwardCode
)
SELECT MIN(Dist) AS MinDifferenceInMetres, 
    MAX(Dist) AS MaxDifferenceInMetres, 
    AVG(Dist) AS AverageDifferenceInMetres, 
    SUM(CASE WHEN Dist = 0 THEN 0 ELSE 1 END) AS NumberOfDifferences
FROM CTEDiff

Results

The MAXIMUM distance between the 2 conversions for a given postcode is: 124.9 metres.
The MINIMUM distance between the 2 conversions for a given postcode is: 0 metres.
The AVERAGE distance between the 2 conversions for a given postcode is: 2.5 metres.
The TOTAL NUMBER of differences was 1,652,458 (out of 1,705,177 conversions).

This map shows the case that had the biggest difference (postcode: NR34 8JW). Plotted points are as follows:
(A) = position from conversion process before fix
(B) = position from conversion process with fix
(C) = position Google Maps gives for location
Note (B) and (C) are in the same spot, so that's why you can't see both!


[Embedded Google Map showing points (A), (B) and (C)]

Summary

For me, and what I see as potential uses for this data, I personally don't see this as much of an issue, especially with an average difference of 2.5 metres. I will add a comment on to the GeoCoordConversion project issue page, referencing this post. There doesn't look to be any activity on the project though, so whether a fix can be applied is yet to be seen. That's my preferred route at present - to see if the author of that project is around to apply a fix - as I currently don't have full knowledge of what changes I can make to the source code and what would be acceptable in order to distribute the amended source as part of my project (my project is released under a different, but compatible, license... but things might get murkier when it comes to amending/distributing those amendments). In a nutshell, I just don't have time to try to make head or tail of what I can legitimately do (and I don't want to change the license for my project)!

All feedback welcomed

Monday, 5 March 2012

Quick win - check your table variable use

Quick wins are awesome. Making a change that takes minimal effort and yields a significant performance improvement is very satisfying.

This particular potential quick win relates to the use of table variables vs. temporary tables. Have a non-trivial stored procedure that produces some intermediary results, stores them in a table variable, and then uses that table variable further on in the procedure? Consider evaluating a switch to a temporary table instead. This very quick change can give a great performance boost for such little effort, due to the points outlined on MSDN (quote):

  • table variables are not supported in the SQL Server optimizer's cost-based reasoning model. Therefore, they should not be used when cost-based choices are required to achieve an efficient query plan. Temporary tables are preferred when cost-based choices are required. This typically includes queries with joins, parallelism decisions, and index selection choices.
  • Queries that modify table variables do not generate parallel query execution plans. Performance can be affected when very large table variables, or table variables in complex queries, are modified. In these situations, consider using temporary tables instead. ... Queries that read table variables without modifying them can still be parallelized.
  • Indexes cannot be created explicitly on table variables, and no statistics are kept on table variables. In some cases, performance may improve by using temporary tables instead, which support indexes and statistics.
I've seen cases where this alone, with no other modifications, results in noticeably better performance. This could then be built on further by adding supporting indexes. Of course, you should be considering whether you need temporary tables/table variables at all in your stored procedure. If you can remove the need entirely, then look to do that.
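As a sketch of the kind of switch I mean - hypothetical intermediary results, not from a real procedure:

-- Before: table variable - no statistics, limited optimizer support
DECLARE @IntermediateResults TABLE (CustomerID INT, OrderTotal MONEY)
INSERT @IntermediateResults (CustomerID, OrderTotal)
    SELECT CustomerID, SUM(Total) FROM dbo.Orders GROUP BY CustomerID

-- After: temporary table - the optimizer can cost it properly, and it supports indexes
CREATE TABLE #IntermediateResults (CustomerID INT, OrderTotal MONEY)
INSERT #IntermediateResults (CustomerID, OrderTotal)
    SELECT CustomerID, SUM(Total) FROM dbo.Orders GROUP BY CustomerID

CREATE INDEX IX_IntermediateResults_CustomerID ON #IntermediateResults (CustomerID)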


Monday, 20 February 2012

Automating Web Performance Stats Collection in .NET

You have a web application. You're a .NET developer. Maybe you already have some automated UI testing in place via Selenium, or maybe you don't. What you want to do is automate the collection of some performance metrics about your application.

Q. How would you go about doing that in .NET?
A. Use the following recipe for success.

Ingredients

  • BrowserMob Proxy by Webmetrics, which (quote) is:
    A free utility to help web developers watch and manipulate network traffic from their web applications
  • Selenium, which (quote):
    automates browsers. That's it.
  • BrowserMob Proxy .NET Library, a .NET library to provide a simple way to work with BrowserMob Proxy and Selenium, written by David Burns (blog | twitter) and myself (you're already on my blog | twitter).

Preparation

  1. Download the BrowserMob Proxy from GitHub
  2. Download the BrowserMob Proxy .NET Library from GitHub (binaries, or get the source and build yourself)
  3. Reference Selenium in your .NET project (available via nuget, or from seleniumhq)
  4. Reference the BrowserMob Proxy .NET Library in your .NET project

Make and Bake

Example:
using AutomatedTester.BrowserMob.HAR;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

namespace AutomatedTester.BrowserMob.Example
{
    public class ExampleClass
    {
        public void ExampleUse()
        {
            // Supply the path to the Browsermob Proxy batch file
            Server server = new Server(@"C:\BrowserMobProxy\bin\browsermob-proxy.bat");
            server.Start();

            Client client = server.CreateProxy();
            client.NewHar("google");

            var seleniumProxy = new Proxy { HttpProxy = client.SeleniumProxy };
            var profile = new FirefoxProfile();
            
            profile.SetProxyPreferences(seleniumProxy);
            // Navigate to the page to retrieve performance stats for
            IWebDriver driver = new FirefoxDriver(profile);
            driver.Navigate().GoToUrl("http://www.google.co.uk");

            // Get the performance stats
            HarResult harData = client.GetHar();           
 
            // Do whatever you want with the metrics here. Easy to persist 
            // out to a data store for ongoing metrics over time.
            
            driver.Quit();
            client.Close();
            server.Stop();
        }   
    }
}
What's great is that if you already have some Selenium tests in place, you can add in the collection of performance metrics quickly and easily. This gives you the ability to collate performance metrics over time - a perfect way to identify problem areas to investigate and to quantify the performance improvements you make.

To learn more about what is in the performance data, check out these links which go into more detail about the HAR (HTTP Archive) format - this is what Webmetrics' BrowserMob Proxy returns, which we expand out into a POCO structure (the HarResult type). BrowserMob Proxy also allows you to do some pretty funky stuff, such as:
  • blacklisting / whitelisting content
  • simulating network traffic / latency
The .NET library wrapper we've made available supports this functionality. Hopefully, this will come in useful for those in the .NET world!

Monday, 16 January 2012

Excluding nodes from XML data before returning from SQL Server

This post follows on from a question I recently replied to, about how to exclude a specific node from an XML column value before returning it, using TSQL.

Example setup

CREATE TABLE Example
(
    ID INTEGER IDENTITY(1,1) PRIMARY KEY,
    XmlField XML
)

INSERT Example (XmlField) 
VALUES ('<Root><ChildA>Value I want to see</ChildA><ChildB>Value I do not want to see</ChildB></Root>')
So if you want to return the XML minus the ChildB node, how do you do it?

Modify it

Literally, using the modify method that is supported on the XML data type. To quote MSDN, modify:
Modifies the contents of an XML document. Use this method to modify the content of an xml type variable or column. This method takes an XML DML statement to insert, update, or delete nodes from XML data. The modify() method of the xml data type can only be used in the SET clause of an UPDATE statement.
That last statement, to me at least, seems slightly open to misinterpretation: strictly speaking it isn't only valid in an UPDATE statement - you can use it as part of a standalone SET statement on an XML variable (though yes, that does update the variable).

So, while you can't do something like this:
SELECT XmlField.modify('delete /Root/ChildB')
FROM Example
WHERE ID = 1
as it results in the following error:
Msg 8137, Level 16, State 1, Line 1
Incorrect use of the XML data type method 'modify'. A non-mutator method is expected in this context.
you can instead do this:
DECLARE @xml XML
SELECT @xml = XmlField
FROM Example
WHERE ID = 1

-- Delete the node from the XML in the variable
SET @xml.modify('delete /Root/ChildB')

SELECT @xml
That does the trick.

Dealing with multiple rows

If you wanted to do something like this from the database side when returning multiple rows then you'd need to either:
  1. create a scalar UDF to perform the removal. e.g.
    CREATE FUNCTION dbo.RemoveNode(@xml XML)
     RETURNS XML
    AS
    BEGIN
     SET @xml.modify('delete /Root/ChildB')
     RETURN @xml
    END
    GO
    
    SELECT dbo.RemoveNode(XmlField) AS XmlField
    FROM Example
    
    Note the parameter to the modify method must be a literal string, so you can't parameterise the UDF to make it a bit more generic by using that parameter in the call to modify().

  2. SELECT the xml values into a temp table/table variable, do an UPDATE like this:
    UPDATE #Temp
    SET XmlField.modify('delete /Root/ChildB')
    
    and then SELECT from the temp table/table variable.

  3. Use SQLCLR
Of course, the typical question should be asked - do you need to do this on the database side, or can you return the XML as-is from SQL Server and modify it accordingly in your calling code? In the original case, acting on a single XML value, it's OK; but when you start looking into scalar UDFs etc. for multi-row requirements, then you should be careful to check performance.
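If you do take the calling-code route, a minimal sketch using LINQ to XML could look like this (xmlFromDatabase is a placeholder for the XmlField value returned as-is from SQL Server):

using System.Xml.Linq;

XDocument doc = XDocument.Parse(xmlFromDatabase);

// Remove all ChildB nodes under Root
doc.Root.Elements("ChildB").Remove();

string cleanedXml = doc.ToString(SaveOptions.DisableFormatting);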

Monday, 2 January 2012

Round up of 2011

Another year is done and dusted so it's that time again - time to reflect on the past year.

Quick Timeline
  • March: Mongo UK - London
    A chance to attend my first NOSQL conference, having recently entered the world of MongoDB for some projects. Picked up some good pointers and food for thought.
    Just some words: NOSQL, hard seats, gonna-need-a-bigger-venue, enlightening.
  • April: SQLBits 8 - Beside The Seaside - Brighton
    Highlight of the year for me, conference-wise (was unable to make SQLBits 9 - Liverpool). Fantastic venue, truly awesome speakers/technical content and a really great crowd of people - a true community. Out of all conferences I've gone to over the years, SQLBits never fails to stand out and this one took things to a whole new level.
    Just some words: Crappy code, community, awesome, invaluable, bean bags.
  • April: Microsoft Community Contributor Award
    I really enjoy chipping in on sites like StackOverflow and the MSDN forums, and this was a nice (humbling) surprise when I got the email to say I'd received this acknowledgement (blog).
    Just some words: Surprise
  • July: Job Change
    After a great 4 and a half years, it was time to move on to a new role. Took the opportunity to work with fellow SQLSoton attendee Matt Whitfield (twitter) - who by the way, has some great (FREE) SQL Server tools which can be found here.
    Just some words: New challenge, acquisition, end of an era.
  • September: Mongo UK - London
    A chance to brush up on the latest in the world of MongoDB after a few months away from it.
    Just some words: NOSQL, comfy seats, information gems, lego man USB stick.
  • Monthly: SQLSoton User Group
    A really good year of UserGroup meets down here in Southampton every month. Thanks to Mark Pryce-Maher (twitter) for his time and effort, and to all the speakers who have taken the time to give a talk. From the first meeting just over a year ago - a small group, in a tiny side room, with no projector - to now - a larger crowd, in a bigger hall, with a projector - it's obvious SQLSoton is providing a valuable SQL Server community service to the Southampton/South Coast area. Check out all the UK SQL Server UserGroups over at SQLServerFAQ.
    Just some words: Community, pizza, swag, network.

Top 3 Blog Posts By Views
  1. High Performance Bulk Loading to SQL Server Using SqlBulkCopy
  2. SQL Server 2008 - Table Valued Parameters vs XML vs CSV
  3. Getting Started With Cassandra and .NET
  4. Ordnance Survey Post Code Geographic Data Load to SQL Server Using .NET*
* OK, so that's the top 4, not the top 3. But that's because I suspect #3 is actually skewed by people looking for "an entirely different kind of site"...I know I had to be careful which links to click when googling for Cassandra (as in the NOSQL database, obviously) and .NET!

Open Source Projects
I started on a few little side projects, which I stuck up on GitHub for the world to see and (hopefully) use. First came a .NET app to load some of the freely available geographical datasets from Ordnance Survey into SQL Server (GitHub), which relates to blog post #4 above. Next came a MongoDB-backed ASP.NET session state store (GitHub). It was a lot of fun working on these, even though they are pretty trivial things, and it was great to actually get some nice feedback from people who found and started using them.

Bottom Line
2011 was a good year. A lot of learning and new challenges meant I was kept on my toes. Looking back though, there are definitely things I'm not pleased about. I dropped the blogging ball in the second half of the year - activity pretty much flatlined for various reasons. So I'm disappointed about that. To quote a tweet by John Sansom (twitter)...
blogging is like going to the gym. Once you fall out of a routine it's tougher to get back into your rhythm
True that. Looking forward to 2012!