Wednesday, 26 May 2010

Optimising wildcard prefixed LIKE conditions

Suppose you want to write a query to find all products in your database that have a name beginning with "Long-Sleeve". You'd more than likely use something like below (examples based on AdventureWorks LT sample database):
SELECT name 
FROM SalesLT.Product
WHERE name LIKE 'Long-Sleeve%'
This produces a good execution plan, performing an index seek on the nonclustered index that exists on the name column. If you look at the seek operation, you'll see the query optimiser has done a good job of ensuring an index can be used by looking at the seek predicate:

You'll see what it's actually done is converted the LIKE condition to a ">=" OR "<" query, searching for product names where the name >= "Long-Sleeve" and is < "Long-SleevF".

Now, suppose you want to find all products where the name ends with "Tire Tube". The immediate query that springs to mind is:
FROM SalesLT.Product
WHERE name LIKE '%Tire Tube'
The execution plan shows an index scan like below. Unfortunately, it's not ideal as it's not making good use of the index.

A way to optimise this scenario is flip the LIKE condition on its head, to get the wildcard at the end of the string being matched on. If you reverse the product name, and reverse the string being matched on you get the same results.
SELECT name 
FROM SalesLT.Product
WHERE name LIKE '%Tire Tube'
gives the same results as:
FROM SalesLT.Product
WHERE REVERSE(name) LIKE REVERSE('%Tire Tube') -- or in other words, LIKE 'ebuT eriT%'
Now that as it stands isn't any better as it still can't use an index. But add a computed column to reflect the reversed name column, and then add an index on that, then you soon can see the benefits.
-- Add a NameReversed column that is the Name column, but reversed
ADD NameReversed AS REVERSE(Name)

-- Now index that reversed representation of the product name, and INCLUDE the original Name column
CREATE INDEX IX_Product_NameReversed ON SalesLT.Product(NameReversed) INCLUDE (Name) 
Run 2 queries side by side to compare like below:
SELECT Name FROM SalesLT.Product WHERE Name LIKE '%Tire Tube'

SELECT Name FROM SalesLT.Product WHERE NameReversed LIKE REVERSE('%Tire Tube')
Note I'm keeping the use of REVERSE just to make the condition a bit more readable for demonstration purposes.

The optimised, indexed computed column approach is using an index seek, compared to the original way that uses the index scan as shown below.

Trying this on a slightly bigger table (the AdventureWorks sample database Product table has only about 800 rows) highlights the difference further. In a table with approximately 50,000 rows, the cost of the original attempt (relative to the batch) showed as 98%. Leaving the relative cost of the computed column approach at 2%. The execution stats showed:
Original Approach
Cold cache: CPU=62, Reads=138, Duration=95
Hot cache: CPU=63, Reads=124, Duration=65

Indexed Computed Column Approach
Cold cache: CPU=0, Reads=28, Duration=28
Hot cache: CPU=0, Reads=2, Duration=0

Now imagine on a larger table with million of rows, how this optimisation would scale up...

As per usual, you need to consider the specifics of your situation as adding extra indexes etc. will add to database size and means there is more work to do when performing INSERTs/UPDATEs. If you identify you have a performance problem with a similar scenario, then this approach is at least something to consider.

Monday, 24 May 2010

QCon London 2010 Videos

It's been a couple of months now since QCon London (check out my previous blog post on The QCon London 2010 Experience) and some session videos are starting to make their way out on InfoQ. With a lot of great content, they're well worth checking out.

I've re-watched the sessions I attended and enjoyed them as much the second time round as I did the first time.
The State of the Art on .NET by Josh Graham and Amanda Laucher.
Code Leaders and Beautiful Teams by Roy Osherove.
Testing C# and ASP.NET Applications Using Ruby by Ben Hall.

One session I didn't get to attend as I chose to stay on a different track was From Dev To Production Through Build Pipelines and Teamwork by Sam Newman who has a style of talk that I really enjoy, so I would have liked to have attended that one in hindsight.

Update (more session videos available):
Scale at Facebook by Aditya Agarwal.
Bad Code, Craftmanship, Engineering and Certification by Robert C. Martin.
The Wizardry of Scaling by Oren Eini (aka Ayende Rahien).
Save the Day with Noda Time by Jon Skeet.

So, check'em out, share them with your team, and keep an eye out for the others as they are made available. Chop chop!