Monday, 24 January 2011

GB Post Code Geographic Data Load to SQL Server using .NET

Ordnance Survey now make a number of mapping/geographic datasets available to download for free, with unrestricted use for both commercial and non-commercial purposes. One of these is the Code-Point Open dataset, which gives a precise geographic location for each postcode in Great Britain as a CSV file.

This file contains the Easting and Northing coordinates for each postcode. You may want to convert these to Latitude/Longitude coordinates so you can load them into a GEOGRAPHY column in SQL Server (2008 onwards) and do all kinds of spatial wizardry in the database. But how?

If at first you don't find, search, search and search again

I did quite a bit of googling and found various complex calculations to convert from Eastings/Northings to Lat/Lon. I also tried some conversion apps that didn't convert into the decimal degrees format I wanted. I finally found the exact kind of .NET library code I was looking for, available under the GNU General Public License v3: geocoordconversion. I downloaded the DLL and from there it only took a matter of minutes of playing about to work out the API to convert the Eastings/Northings from the Code-Point data files into Lat/Lon coordinates (in the WGS84 coordinate system).

Conversion snippet

After referencing the geocoordconversion.dll in your .NET project, converting coordinates is as simple as:
long easting, northing;
// assume easting & northing are set.
GridReference gridRef = new GridReference(easting, northing);
PolarGeoCoordinate polarCoord = GridReference.ChangeToPolarGeo(gridRef);
polarCoord = PolarGeoCoordinate.ChangeCoordinateSystem(polarCoord, 
    CoordinateSystems.WGS84);
// Pull Lat/Lon coordinates from polarCoord.Lat & polarCoord.Lon

End to end import process

Using this, I knocked up a C# console app to:
  • read in all the Code-Point data from the .csv files downloaded from the Ordnance Survey (approx 1.7 million rows of data)
  • convert the Eastings/Northings to WGS84 Lat/Lon coordinates
  • bulk load all the data to a new table in SQL Server
  • populate a GEOGRAPHY column with the point data for each post code
  • create a spatial index on that column
The code is now available on GitHub. If you'd like to know more about it, just ping me and I'll be happy to help.
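The steps above could be wired together roughly like this. This is a simplified sketch under assumptions, not the actual GitHub code: the table and column names (dbo.PostCodeData, OutwardCode, InwardCode, Latitude, Longitude, GeoLocation), the CSV column positions and the index name are illustrative, while GridReference, PolarGeoCoordinate and CoordinateSystems come from geocoordconversion.dll.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

public static class CodePointImportSketch
{
    public static void Import(string csvDirectory, string connectionString)
    {
        // Stage the converted rows in memory (fine for ~1.7 million rows).
        var table = new DataTable();
        table.Columns.Add("OutwardCode", typeof(string));
        table.Columns.Add("InwardCode", typeof(string));
        table.Columns.Add("Latitude", typeof(double));
        table.Columns.Add("Longitude", typeof(double));

        foreach (string file in Directory.EnumerateFiles(csvDirectory, "*.csv"))
        {
            foreach (string line in File.ReadLines(file))
            {
                string[] cols = line.Split(',');

                // Assumed column positions: postcode first, easting/northing
                // in columns 2 and 3 (the newer Code-Point layout).
                long easting = long.Parse(cols[2]);
                long northing = long.Parse(cols[3]);

                // Convert to WGS84 Lat/Lon via geocoordconversion.dll.
                PolarGeoCoordinate polar = PolarGeoCoordinate.ChangeCoordinateSystem(
                    GridReference.ChangeToPolarGeo(new GridReference(easting, northing)),
                    CoordinateSystems.WGS84);

                // The inward code is the final 3 characters of the postcode.
                string postcode = cols[0].Trim('"').Replace(" ", "");
                table.Rows.Add(
                    postcode.Substring(0, postcode.Length - 3),  // OutwardCode
                    postcode.Substring(postcode.Length - 3),     // InwardCode
                    polar.Lat,
                    polar.Lon);
            }
        }

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Bulk load the staged rows.
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.PostCodeData" })
            {
                foreach (DataColumn col in table.Columns)
                    bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);
                bulk.WriteToServer(table);
            }

            // Populate the GEOGRAPHY column, then index it.
            new SqlCommand(
                @"UPDATE dbo.PostCodeData
                      SET GeoLocation = geography::Point(Latitude, Longitude, 4326);
                  CREATE SPATIAL INDEX SIdx_PostCodeData_GeoLocation
                      ON dbo.PostCodeData (GeoLocation);", conn).ExecuteNonQuery();
        }
    }
}
```

Note that geography::Point takes latitude first, then longitude, with SRID 4326 for WGS84.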

Example query

Here's a quick, very basic demo of querying the geography data, finding all postcodes within about 5 miles of a given postcode.

DECLARE @home GEOGRAPHY
SELECT @home = GeoLocation FROM dbo.PostCodeData WHERE OutwardCode = 'AB12' AND InwardCode = '3CD'

SELECT OutwardCode, InwardCode
FROM dbo.PostCodeData
WHERE GeoLocation.STDistance(@home) <= (5 * 1609) -- 1609 = approx metres in 1 mile

This returned 8354 rows in around 500ms, with a spatial index in place on the GeoLocation column.

UPDATE (14 March 2012)
See my recent post for information on the accuracy of the GeoCoordConversion third party library I'm using to convert Eastings/Northings to Latitude/Longitude.

27 comments:

  1. just a quick note to say this page is brilliant - thank you for it - after a very frustrating day with this yesterday this blog post has made today a breeze :)

  2. @johncscott - Thanks for the kind words, I appreciate you taking the time to give some feedback!

  3. been pulling my hair out for a few days now with all this.

    THANKS!!!

    cracking stuff.

    Cheers,
    Crispin

  4. @Crispin - thanks, glad it helped!

  5. Hi Adrian,

I liked your blog and want to implement this in my code. I want to do a radius search based on an OS Grid reference point in C#.

    Can you please help me in this.

    Thanks in advance,
    Mike

  6. This is brilliant, thank you. Running a conversion now and checking the results against the data on http://mapit.mysociety.org/. The DLL is not quite as accurate as the mapit data (not as many decimal places) but it appears to be accurate enough and agrees with the mapit data.

  7. Thanks for this great article. One question though, do you know how to make this work with Microsoft's Entity Framework? Your sample shows SQL to do the job.

    Thanks

  8. Me again,

Just been playing with this, and it looks like the data format has changed since you wrote your code. If you get the latest data, it will throw an index out of bounds exception in ReadDataFromFiles() in the CodePointDataImporter class.

    You need to change the indices as shown on the following two lines (lines 157 and 158)...

    easting = long.Parse(lineData[2]);
    northing = long.Parse(lineData[3]);

    These are no longer the 11th and 12th columns, they are now the 3rd and 4th. Once that's changed, it works fine.

    Thanks again

  9. @Yossu - I'm afraid EF isn't an area I'm particularly familiar with. Thanks for the tip-off re: change in format - I will check it out!

  10. @Yossu - I've updated the project on GitHub. For the CODEPOINT import, it now no longer has the column index positions hardcoded but instead, the positions are looked up in the column headers CSV file that comes with the supporting docs in the download. So there's now an extra arg to allow that file name to be passed in - see ReadMe on GitHub for more info.
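A minimal sketch of that header-lookup approach (the header line used here is illustrative; the real names come from the column headers CSV in the OS download):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderLookupSketch
{
    // Build a column-name -> position map from the header line of the
    // column-headers CSV, so the importer no longer hard-codes indices.
    public static Dictionary<string, int> BuildIndex(string headerLine)
    {
        return headerLine.Split(',')
                         .Select((name, i) => new { Name = name.Trim(), Index = i })
                         .ToDictionary(x => x.Name, x => x.Index);
    }

    public static void Main()
    {
        // Illustrative header names, not necessarily the exact Code-Point ones.
        var index = BuildIndex("PC,PQ,EA,NO");
        Console.WriteLine(index["EA"]); // 2 (easting column)
        Console.WriteLine(index["NO"]); // 3 (northing column)
    }
}
```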

  11. Thanks Adrian, glad to see you're still supporting this great piece of code. Thanks again for a very useful article.

    It turned out that EF doesn't support the geography data type, so I had to write the query in a stored procedure, and import that as a function in my EF model.

    I have written a blog post explaining how to use this data in Entity Framework in case it's of interest to anyone. Please see http://dotnetwhatnot.pixata.co.uk/2011/09/20/TheGeographyDataTypeInSQLServer2008AndGeographicSearchesFromEntityFramework.aspx

    Just one other point. In your post, you mention that your code is supposed to create a spatial index when it imports the data. It didn't do this on my machine, which meant that my query took around 14 seconds, as opposed to your 500ms. Once I spotted the reason, it was easy to fix (described in my blog post), but you might want to check your code to make sure it does create the spatial index, and maybe add a note to the post to remind people to check. It made a huge difference to the performance.

    Thanks again.

  12. @Yossu - ha, great catch on the spatial index! I've updated the code so it now creates the spatial index once the data has been imported.

    And thanks for the EF info (and the kind words in your blog!) - great stuff.

  13. Hi again Adrian,

    I've just moved my code to the production server, and have found that the select is pretty slow. I have the same data as you used, and am using a slight variation of your SQL for the query, but it's taking a lot longer than you reported.

On my dev machine (Intel i7 2600 @ 3.4GHz) it takes around 11 seconds to search for postcodes within 200 miles of the specified postcode. That's a big difference from your 500ms. On my production server (Intel Pentium Dual CPU 2GHz), it takes around a minute to do the same query. Now I know the server is a slower machine (need to upgrade it sometime), but a minute is a huge difference from your 500ms, and is simply too slow to be of use for the web site it's running on. Requests time out before the query has completed.

Any idea why it's so slow? I have the spatial index properties all set to the defaults (i.e. 16 cells per object and medium for all grid levels), mainly as I don't really understand what the properties all mean! Do you know of any way to improve matters?

    Thanks again

  14. Sorry, me again! Forgot to mention that your blog post uses a field called PostCode in the PostCodeData table, but the code on GitHub doesn't create such a field. It creates OutwardCode, InwardCode, Longitude, Latitude and GeoLocation.

    I fixed the query by changing it to...

    DECLARE @home GEOGRAPHY
    SELECT @home = GeoLocation FROM PostCodeData WHERE OutwardCode + InwardCode = 'MY POSTCODE'
    SELECT OutwardCode + InwardCode
    FROM PostCodeData
    WHERE GeoLocation.STDistance(@home) <= (5 * 1609)

    ...which works fine, but I'm wondering if this might be the cause of the big difference in speed I see from your lightning results.

    Any thoughts? Thanks

@Yossu - Have updated the post above to refer to the latest fields (OutwardCode + InwardCode, not PostCode). Firstly, although it isn't the cause of the performance problem you're seeing: instead of concatenating OutwardCode and InwardCode to match on, search on them separately as per my updated example, i.e.:

    WHERE OutwardCode = 'AB12' AND InwardCode = '3CD'

    This makes that query sargable (results in index seek, instead of a scan).

As for the geo query, the performance does deteriorate as you broaden the search radius - I was doing a 5 mile radius, which gave the 500ms response time. When I try a 100 mile radius, I too see performance of more than 10 seconds - but it is bringing back a far greater number of rows (roughly 700,000). Difficult to help without a better picture of what your goal is, but that's a lot of rows to return. Feel free to drop me an email and I'll see if I can help - easier than via these comments!

  16. Just wanted to say thanks, I've been looking at doing the same thing and this page has saved me a fair amount of trial and error.

    Jay

  17. Mate, I just want to shake your hand! Thanks for the uber-useful script!

  18. Awesome stuff! Thanks for this!

    Just one thing to be aware of. Where imported Post Codes have a Positional Quality Indicator of 90, the Easting and Northing are set to 0, putting them out in the Celtic Sea, and no doubt messing with the district and sector averages.

  19. Thanks for letting me know re: Positional Quality Indicator, I'll add myself a TODO to look into it!

  20. Re: PQI issue, fix pushed to GitHub. Those with no coordinates are now not imported (amounts to ~760 postcodes out of over 1.6 million)

    Replies
    1. That was quick! Thanks again!

  21. This is great, thank you very much!

  22. Fantastic script - thanks a lot. It's saved me a bunch of time and stopped me being reliant on a huge out of date dataset I'd imported years ago.

    One question - there's an issue on the GeoCoordConversion project:
    http://code.google.com/p/geocoordconversion/issues/detail?id=2

    Do you know if this issue is correct and if this library suffers from this as a result?

    Replies
Glad you found it useful! Re: the GeoCoordConversion project, I don't know if this issue is correct - I'll try and have a look later. As I use that lib, then yes, *if* it's an issue, the coords I get out could be slightly off - but I'd imagine the difference would be negligible.

    2. Ok, I've looked into it and written up a blog post on my analysis/findings surrounding the GeoCoordConversion issue: http://www.adathedev.co.uk/2012/03/ordnance-survey-data-importer.html

  23. No offence but if I was a woman, I would ask to have your kids!! dude I have spent a week phaffing around trying to sort this out!

  24. Hi,

    Is the output file (postcodes mapped to latitude and longitude) available for download anywhere?

    Much appreciated,
    Paul
