Sunday, 14 March 2010

The QCon London 2010 Experience

QCon London 2010 - what a conference!

Awesome...with a capital AWE

So many inspiring speakers, a packed schedule of talks over 3 days on a number of tracks and a great bunch of people. This was my first QCon, and I cannot recommend the experience enough. The only thing I didn't enjoy was having to decide which talks to attend. But that's the true sign of a good conference. I'm finding it difficult to write a blog post as my mind is still buzzing and there was a lot to soak up. In all honesty, I won't do justice to the talks if I try to paraphrase them too much, so I'm just going to summarise my experience and some key points I've left with.

Getting the ball rolling

The opening keynote by "Uncle Bob" Martin (twitter) drew some mixed comments on twitter from what I saw, but the main thing I took from the talk was pride of workmanship. After all, if you're not proud of the code you've written and you hope no one finds out it was you who wrote it, then you should probably think about improving it so you can take pride in what you've done. He's definitely a seasoned, engaging speaker, regardless of your own personal opinions on the topics at hand, and I enjoyed the talk.

Day 1 - Architectures you've always wondered about

The sessions that particularly grabbed me were the ones by Facebook's director of engineering (Aditya Agarwal) and Skype's architecture team lead (Andres Kütt). The level of scale talked about was mind-blowing. To give an example, here are some of the points mentioned during Facebook's talk:
  • people spend 8 billion minutes per day on the site
  • 5 billion pieces of shared content per month
  • 3 billion photos uploaded per month
  • tens of TB of RAM in use across thousands of servers holding cached data
  • tweaked memcached to make it perform even better
  • developed HipHop for PHP, a source code transformer that converts PHP into highly optimised C++
Pretty jaw-dropping stuff. You can check out the Facebook engineering blog here.

I would have liked to hear more technical detail in some of the architecture talks, but I guess they have their competitive advantages to protect, so they're hardly going to give away all their secrets!

The conference party finished off the first day in a nearby pub: a chance to grab a beer, take in the wealth of information from day 1, and mingle with other attendees. Cosy is probably the best way to describe it - the ratio of attendees to pub space was slightly uneven! But a good time nonetheless.

Day 2 - AlphaGeeks on .NET

Being a C# developer, I found this track a natural choice, and I wasn't disappointed. It was kicked off by Ben Hall (twitter), who gave a great talk on BDD (Behaviour Driven Development) as opposed to TDD (Test Driven Development), and added IronRuby to the list of technologies I should look at.

This was followed by a great talk from Ayende Rahien (Oren Eini) (twitter), who talked about how to scale an application with a "divide and conquer" approach. What I particularly liked about his session was that he worked through a real-world example and gave some live metrics on the performance improvements. The talk was one of a number to mention the CAP theorem, whereby a distributed system can satisfy at most two of the following three properties:
  • Consistency
  • Availability
  • Partition tolerance
It also added a whole set of technologies to my list to look at: Rhino DHT, Rhino Mocks, Rhino ESB and RavenDB.

Jon Skeet's (twitter) presentation style was familiar to me from the StackOverflow DevDay last year, and his talk on Noda Time, a .NET port of Java's Joda Time date/time library, didn't disappoint. And yes, Tony the Pony did make an appearance on stage.

The award for the most entertaining talk of the whole conference has to go to Roy Osherove (twitter). A cracking talk on "Beautiful teams and leaders" was topped off with a sublime solo guitar performance. Brilliant.

The day ended with a talk by Josh Graham and Amanda Laucher on "12 months of things to learn" in .NET. I found my TODO list rapidly growing: F#, M, Guerilla SOA, MEF... Another really good talk, with some good banter.

Day 2 ended with a number of usergroup events, and I went along to the NNUG (Norwegian .NET User Group) / alt.net beers event at a pub in Soho. It was somewhat surreal for me, having drinks and chatting with the kind of experts I can only strive to become (to put it in very geeky terms, my heroes): Ayende Rahien (Oren Eini), Jon Skeet, Roy Osherove and Udi Dahan, to name but a few. It was fantastic to see a number of other .NET developers turn up who hadn't attended QCon, and it was good chatting to them. This was my first usergroup event - the lesson being, I really should have attended one before now. Take that as another TODO on my list.

Day 3 - Pick and mix

On the final day I mixed it up a bit, jumping between the "SOA 2010", "The Concurrency Challenge" and "Browser as a platform" tracks. Udi Dahan (twitter) kicked things off with a talk on how to avoid a failed SOA, covering the combination of EDA and SOA (Event Driven and Service Oriented Architecture) and why you should only fall back to a Request/Response model if you cannot get an event-driven approach to meet your requirements. Great speaker, very informative and enlightening.

Justin Sheehy (twitter) followed on the concurrency track with a talk on "embracing concurrency at scale". Eventual consistency was a term coming more and more to the forefront of my mind, as was the point that ACID does not work for a distributed, scaled system. Instead you make a trade-off with BASE:
  • Basically Available
  • Soft state
  • Eventually consistent
which leads to choosing Availability and Partition tolerance (from the CAP theorem).

Summary

Some of the key points I'm walking away from the conference with:
  • keep things simple - complicated doesn't work. If you find your design is complicated, chances are you're doing something wrong!
  • focus on optimising for an individual task, rather than taking a one-size-fits-all approach
  • ACID doesn't go well with scaling
  • eventual consistency - does it matter if your data is consistent everywhere immediately? Probably not. As long as the system is eventually consistent, that's usually all that's required, and it allows you to scale better
  • asynchronous messaging
  • rules of thumb don't automatically apply - use existing patterns as a starting point for discussion; just because you've done something one way before doesn't mean you should do it the same way in future

Check out the QCon slides.

I would love to go back to QCon next year, if I'm fortunate enough to have the opportunity again. Once I get the chance to fully absorb the whole experience and put things into practice, I will be a better developer because of it.

Thursday, 4 March 2010

Rise of the SQL Server DevBA

Something that has got me thinking recently is the distinction between an SQL Server Developer and a DBA. I imagine that most people would describe themselves as one or the other exclusively. My CV, for example, says I'm a developer - that is what I am. I would never market myself as a DBA; aside from the fact I just don't have the full skillset of a DBA, it would be an insult to the real DBAs out there whose knowledge and experience in that arena far outweigh mine. Ask me to set up RAID, a backup/disaster recovery strategy or a SAN, and I will give you a blank stare. More than likely, I'd make a speedy exit through the nearest door/window/cat flap.

I'm surrounded!

I signed up for the recent SQL Server Dynamic Management View training offered by Quest Software - a free, one-day online conference. This was a very popular event - after all, not only was it free, but there were 3 great speakers presenting a range of sessions at levels from beginner to expert. The notes, slides and links can be found here. Midway through, I started wondering how many other "Database Developers" had signed up, as I was surrounded (albeit virtually) by DBAs, and very proficient ones at that. This naturally got me thinking...

Why am I here?

It turns out the answer is pretty simple - to make myself a better Database Developer. Performance and scalability are very important, and are always at the forefront of my mind. As a developer, I want to design and develop a database to support the system I'm working on that will perform and scale well. For me this involves thinking about the bigger picture: not just how it performs at the time of development, with relatively low volumes of data on development servers, but how it would perform with current live data volumes and the volumes expected in the future. Thinking about things from a DBA's perspective, and understanding the issues a DBA has to deal with and the skills they use to keep a database server running smoothly, is in my opinion a valuable asset to have. I don't mean that I should be able to do even half of what a DBA can do - that would have been a different career path had I wanted to head down that route. But some DBA skills are very beneficial for a developer to have - at least I find they are in my role.

Rise of the "DevBA"?

An SQL Server Developer with a bit of DBA thrown in. Knowing how to diagnose performance issues, make use of DMVs and pinpoint potential bottlenecks has been extremely valuable to me. It's led me to broaden my horizons and fine-tune my skills as a developer. For those of you who perhaps work for a small company without a dedicated DBA role, where pretty much everyone mucks in, I think you're in prime DevBA territory.
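
For anyone wondering what that looks like in practice, here's a quick sketch of the sort of query I mean, using the standard sys.dm_exec_query_stats and sys.dm_exec_sql_text DMVs to pick out the most CPU-hungry queries on a server (the TOP 10 cut-off is just illustrative):

-- Top 10 queries by total CPU time, with their text and execution counts
SELECT TOP 10
    qs.total_worker_time AS total_cpu_time,
    qs.execution_count,
    qs.total_worker_time / qs.execution_count AS avg_cpu_time_per_execution,
    st.text AS query_text
FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_worker_time DESC

A few minutes spent with a query like this on a busy server tends to be very revealing about where tuning effort is best spent.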

Another carrot being dangled

The upcoming SQLBits event I recently blogged about is another opportunity I'm looking forward to making the most of. Sessions are currently being submitted in the Dev, DBA and BI disciplines. The Dev ones are obviously of main interest to me, but there are also some DBA sessions I'd like to attend. I can't actually recall the point at which I started becoming more aware of DBA-oriented topics, but it's something that is happening more and more.

Am I a wanna-be DBA? No.

Could I double-up as a DBA? No.

Does being a "DevBA" make me a better developer and increase my technical ability? Yes.

Monday, 1 March 2010

Queue table processing in SQL Server

Implementing queue table processing logic in SQL Server is something I keep meaning to blog about, and I've finally got round to it thanks to my memory being jogged by StackOverflow questions I've recently participated in, including this one. The scenario: you queue up records in a database table, each representing a piece of work that needs to be done. You then have processes that periodically poll this table to pick up the next item of work from the queue and process it.

What you want to avoid
  • Multiple processes picking up the same queue item - after all, you want each item in the queue to be processed exactly once.
  • Blocking. If multiple processes are polling the queue and they are blocking each other, then scalability will be limited.

Solution
DECLARE @NextId INTEGER
BEGIN TRANSACTION

-- Find the next available item
SELECT TOP 1 @NextId = ID
FROM QueueTable WITH (UPDLOCK, READPAST)
WHERE IsBeingProcessed = 0
ORDER BY ID ASC

-- If found, flag it to prevent being picked up again
IF (@NextId IS NOT NULL)
    BEGIN
        UPDATE QueueTable
        SET IsBeingProcessed = 1
        WHERE ID = @NextId
    END

COMMIT TRANSACTION

-- Now return the queue item, if we have one
IF (@NextId IS NOT NULL)
    SELECT * FROM QueueTable WHERE ID = @NextId
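
For reference, the snippet above assumes a queue table along the lines of the following - the exact columns are only illustrative, and will depend on what data each work item needs to carry:

-- Example queue table structure (illustrative)
CREATE TABLE QueueTable
(
    ID INTEGER IDENTITY(1,1) CONSTRAINT PK_QueueTable PRIMARY KEY,
    IsBeingProcessed BIT NOT NULL CONSTRAINT DF_QueueTable_IsBeingProcessed DEFAULT(0),
    Payload VARCHAR(255) NULL -- placeholder for the data describing the work item
)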

It's all about the table hints

UPDLOCK
This grabs an update lock until the transaction is completed and prevents another process from picking up the same queue item.

READPAST
If a process encounters a row that is currently locked by another, this hint makes it skip over that locked row and move on to find the next available one.

This was a topic I investigated and worked through some time ago to arrive at this approach, finding this MSDN reference on table hints a valuable resource. I then found this article on MSSQLTips which demonstrates the same approach - if only I'd found that at the start, as READPAST was the hint I wasn't initially aware of!
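
One thing the snippet above doesn't show is what happens once a work item has actually been processed - the polling process needs to either delete the row or flag it as complete when it's done, otherwise the flagged rows just sit in the table. Something as simple as the following, run at the end of the processing step, does the job:

-- Work item handled, so remove it from the queue
DELETE FROM QueueTable
WHERE ID = @NextId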

Changing primary key index structure

Changing the structure of a primary key constraint index from nonclustered to clustered (or from clustered to nonclustered) is not necessarily as straightforward as it first seems. The process of changing it over involves dropping and then recreating the constraint. This could cause a problem if you're making the change on a table while there could be activity against it.

Example
TableX was originally created as below:
CREATE TABLE [TableX]
(
    FieldA INTEGER CONSTRAINT PK_TableX PRIMARY KEY NONCLUSTERED,
    FieldB DATETIME NOT NULL,
    FieldC VARCHAR(50)
)
GO
CREATE CLUSTERED INDEX IX_TableX_FieldB ON TableX(FieldB)
GO
After a period of time, it becomes clear that performance would be better if the primary key were made clustered instead, and the existing clustered index switched to nonclustered. The following script demonstrates how to make the switch in a manner that prevents the primary key being violated while it is switched over, by creating a temporary unique constraint.
-- 1) Drop the existing CLUSTERED index
DROP INDEX TableX.IX_TableX_FieldB

-- 2) Create a (temporary) UNIQUE constraint on the unique fields referenced in the primary key. This will enforce the uniqueness when we drop the PK.
ALTER TABLE TableX
ADD CONSTRAINT UQ_TableX UNIQUE(FieldA)

-- 3) Drop the existing nonclustered PRIMARY KEY constraint.
ALTER TABLE TableX
DROP CONSTRAINT PK_TableX

-- 4) Recreate the PRIMARY KEY as CLUSTERED
ALTER TABLE TableX
ADD CONSTRAINT PK_TableX PRIMARY KEY CLUSTERED(FieldA)

-- 5) Drop the temporary UNIQUE constraint
ALTER TABLE TableX
DROP CONSTRAINT UQ_TableX

-- 6) Add the IX_TableX_FieldB index back on as NONCLUSTERED
CREATE NONCLUSTERED INDEX IX_TableX_FieldB ON TableX(FieldB)
This scenario is a good reason why I always avoid SQL Server's autogenerated constraint names!
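
As a quick sanity check after running a script like the one above, a look at sys.indexes confirms whether the switch has taken effect as expected:

-- Check which index on TableX is now clustered, and which backs the primary key
SELECT name, type_desc, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID('TableX')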