
Wednesday, 23 May 2012

ASP.NET MVC Performance Profiling

Building up a profile of how a web application functions, all the database interactions that take place and where the server-side time is spent during a request can be a challenging task when you are new to an existing codebase. If you're trying to address the generic/non-specific "we need to improve performance / it's slow" issue, you need to get a good picture of what is going on and where to prioritise effort.

There are a number of ways to identify specific problems depending on the technologies. For example, if your application is backed by SQL Server, you can query a set of DMVs to identify the top n worst performing queries and then focus your effort on tuning those. Identifying a badly performing query and tuning it can obviously yield huge benefits in terms of the end user's experience. But this doesn't necessarily flag up all the problems. If you are looking at a particular page/view within the application, then you could start a SQL Profiler trace to monitor what's going on during the lifecycle of the request - this is another common and valuable tool to use. Personally, I usually have SQL Profiler open most of the time during development. If you're developing against a shared dev database with others, you can filter out other people's events from the trace - one way that works nicely is to inject your machine name into the connection string as the ApplicationName, and then filter on that.
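As a sketch of that trick (server, database and machine names are illustrative), the Application Name keyword in the connection string is what surfaces in the profiler's ApplicationName column:

```
Server=SharedDevServer;Database=DevDb;Integrated Security=SSPI;Application Name=MYMACHINE-Dev
```

Then in SQL Profiler, edit the trace's column filters so that ApplicationName is LIKE 'MYMACHINE-%' and you'll only see your own events.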

MiniProfiler For The Win

Recently, I've started using another extremely valuable tool within an ASP.NET MVC solution - MiniProfiler, which is (quote):
A simple but effective mini-profiler for ASP.NET MVC and ASP.NET
It was developed by the team over at StackOverflow. Simply put, it can render performance statistics on the page you are viewing that detail where the time was spent server-side fulfilling that request. But the key thing for me is that it provides an ADO.NET profiler. Say you're using LINQ-to-SQL - by wrapping the SqlConnection in a ProfiledDbConnection before passing it to the constructor of a DataContext, info on the SQL queries executed within the lifetime of a request is then also included in the statistics displayed. (It can also profile calls via raw ADO.NET, Entity Framework etc., with minimal effort required.)
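A minimal sketch of that LINQ-to-SQL wiring (the connection string and MyDataContext name are illustrative; the profiling types come from the MiniProfiler package, whose namespaces have varied between MvcMiniProfiler and StackExchange.Profiling depending on version):

```csharp
using System.Data.Common;
using System.Data.SqlClient;
using StackExchange.Profiling;
using StackExchange.Profiling.Data;

public static MyDataContext CreateProfiledContext(string connectionString)
{
    DbConnection connection = new SqlConnection(connectionString);

    // Wrap the real connection so MiniProfiler sees every command executed on it
    DbConnection profiled = new ProfiledDbConnection(connection, MiniProfiler.Current);

    // LINQ-to-SQL is none the wiser - it just receives a DbConnection
    return new MyDataContext(profiled);
}
```

Every query run through the resulting DataContext then shows up in the on-page statistics.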

Make the invisible, visible

Since integrating this into an MVC application, the benefits have been priceless. The key thing for me is: VISIBILITY. It provides extremely valuable visibility of what is happening under the covers. Going back to the start of this post, if you're new to a codebase, then having this information provided to you as you browse is invaluable. It enables you to identify problems at a glance, and increases visibility of problems to other developers so the "life expectancy" of those problems is lower - they're a lot less likely to hover undetected just under the radar if the information is being pushed right in front of the developer on screen. It also helps you build up a picture of how things hang together.

MiniProfiler includes functionality to flag up N+1 and duplicate queries, a common potential problem you could encounter with ORMs if you're not careful. If a view performs 100 cheap queries, none of them may show up individually as a query worth tuning. But the fact that 100 database round trips are being made could scream out that they could perhaps be replaced with a single round trip, and a performance improvement gained there.

I'm now a big fan of MiniProfiler, especially due to its simplicity to integrate into a codebase. Working on ASP.NET MVC/ASP.NET applications? You might want to give it a try!

Monday, 5 March 2012

Quick win - check your table variable use

Quick wins are awesome. Making a change that takes minimal effort and yields a significant performance improvement is very satisfying.

This particular potential quick win relates to the use of table variables vs. temporary tables. Have a non-trivial stored procedure that produces some intermediary results, stores them in a table variable and then uses that further on in the procedure? Consider evaluating a switch to a temporary table instead. This very quick change can give a great performance boost for such little effort, due to the points outlined on MSDN (quote):

table variables are not supported in the SQL Server optimizer's cost-based reasoning model. Therefore, they should not be used when cost-based choices are required to achieve an efficient query plan. Temporary tables are preferred when cost-based choices are required. This typically includes queries with joins, parallelism decisions, and index selection choices.
Queries that modify table variables do not generate parallel query execution plans. Performance can be affected when very large table variables, or table variables in complex queries, are modified. In these situations, consider using temporary tables instead......Queries that read table variables without modifying them can still be parallelized.
Indexes cannot be created explicitly on table variables, and no statistics are kept on table variables. In some cases, performance may improve by using temporary tables instead, which support indexes and statistics.
I've seen cases where this alone, with no other modifications, results in noticeably better performance. This could then be built on further by adding supporting indexes. Of course, you should be considering whether you need temporary tables/table variables at all in your stored procedure. If you can remove the need entirely, then look to do that.
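As a minimal sketch of the switch (table and column names are illustrative):

```sql
-- Before: intermediary results in a table variable - no statistics,
-- and modifications to it can't use a parallel plan
DECLARE @Results TABLE (CustomerID INT, Total MONEY)
INSERT @Results (CustomerID, Total)
SELECT CustomerID, SUM(Total) FROM dbo.OrderHead GROUP BY CustomerID

-- After: the same intermediary results in a temporary table - the optimizer
-- gets statistics to cost with, and you can add supporting indexes
CREATE TABLE #Results (CustomerID INT, Total MONEY)
INSERT #Results (CustomerID, Total)
SELECT CustomerID, SUM(Total) FROM dbo.OrderHead GROUP BY CustomerID

CREATE INDEX IX_Results_CustomerID ON #Results (CustomerID)
```

The rest of the procedure then reads from #Results exactly as it previously read from @Results.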


Friday, 24 June 2011

The importance of "Working Set"

One of the things that I see cropping up pretty often is this thing called "working set". After recently chipping in on another StackOverflow question on the subject of "What does it mean to fit 'working set' in RAM?", I thought it was a good subject for a blog post. This is really just a copy and extension of my input on that question; it focuses in parts on MongoDB, but is just as relevant to other databases.

"Working set" is basically the amount of data and indexes that will be active/in use by your system at any given time.

Why is it important to keep your working set in RAM?

Accessing RAM is quick. Accessing disk is slow. When querying your data store, if all the data and indexes typically accessed are in RAM, then performance is blisteringly quick. If it's not in RAM, then disk access is required and that is when performance suffers. Hence it is important to ensure you have enough RAM to hold your working set. The moment your working set exceeds the amount of RAM you have, you will start to notice the performance degradation as data has to be pulled back off disk, so it's important to monitor the situation and react.

Crude Example

Suppose you have 1 year's worth of data. For simplicity, each month relates to 1GB of data giving 12GB in total, and to cover each month's worth of data you have 1GB worth of indexes again totalling 12GB for the year.

If you are always accessing the last 12 months' worth of data, then your working set is: 12GB (data) + 12GB (indexes) = 24GB.

However, if you actually only access the last 3 months' worth of data, then your working set is: 3GB (data) + 3GB (indexes) = 6GB.

You need to understand your data set, scenario and the usage patterns, in order to work out a ball park estimate of your working set. Don't expect a black and white answer for what your working set is in your environment, from someone who doesn't know these things.

What if my working set sky-rockets?

Add more RAM. This can be a case of adding more into your existing node(s). Or, if you need non-trivial increases, making use of sharding to split the data over a number of nodes and just bring more nodes online as you need. This provides incredible potential to scale out your workload.

The key point is to ask yourself: do I have enough RAM for my working set? If the answer is: "I don't know", then get yourself to the position of knowing.

Wednesday, 12 January 2011

SqlBulkCopy to SQL Server in Parallel

In an earlier post last year, I blogged about high performance bulk loading to SQL Server from .NET using SqlBulkCopy. That post highlighted the performance gain that SqlBulkCopy gives over another batched insert approach using an SqlDataAdapter. But is it possible to squeeze more performance out? Oh yes.

First, a quick recap. For optimal performance:
  • load into a heap table (with no indexes - add any indexes you need AFTER you've loaded the data)
  • specify the TableLock SqlBulkCopy option. This results in bulk update locks being taken on the destination table instead of row locks. This is key for what I'm going to focus on in this post as I'll go on to explain. Relevant reading:
    - SqlBulkCopyOptions Enumeration MSDN documentation
  • pave the way for minimal logging by setting the database to the Bulk-Logged recovery model (or Simple). In the Full recovery model all inserts are fully logged in the transaction log, whereas in Bulk-Logged, certain bulk operations can be minimally logged, reducing the growth of the transaction log - and SqlBulkCopy is one such operation. Relevant reading:
    - Prerequisites For Minimal Logging in Bulk Import MSDN reference
    - Operations That Can Be Minimally Logged MSDN reference
    - SQL Server Myth A Day : Bulk-Logged Recovery Model by Paul Randal (Twitter)
Get squeezing
How can we squeeze more throughput in bulk inserts using SqlBulkCopy?

Multiple bulk update locks on the table from different processes can be taken out at the same time without blocking each other, which means you can scale out the bulk loading across multiple clients in parallel. There will be limitations of course, such as the network connection and the speed of the disks behind the database, but performance gains are very much possible.

Instead of loading data from a single machine, split the data over a number of machines, each one loading its own chunk of the data to the database using SqlBulkCopy as previously outlined. Finding an optimal distribution of data/number of clients may take a bit of investigation, but the rewards can pay dividends.

Pimp My Test Environment
Unfortunately I don't have a server rack at home housing multiple servers with a fast IO subsystem - hey, call me a cheap-skate. What I do have is a laptop with an i5-560M (2.67GHz dual core, 4 threads), 8GB RAM and a 7200RPM HDD. That's how I roll.

What I've done is to knock up a quick command-line app in .NET 4.0 using the new Task Parallel Library (TPL) to split the loading of data to SQL Server across 4 threads in parallel, each using a separate database connection and SqlBulkCopy instance. To quote MSDN:
The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.

Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code.

I loaded 20 million rows of data from an in-memory DataTable to a heap table in SQL Server 2008 with 2 columns: FieldA VARCHAR(10), FieldB INTEGER.
FieldA is just a random sequence of 10 characters, FieldB is an incrementing number.

I ran 2 tests, recreating the database fresh before each run:
Test 1: 1 DataTable of 20 million rows, loaded single threaded. LOAD TIME: 15.4088814s
Test 2: 4 DataTables of 5 million rows, loaded in parallel. LOAD TIME: 12.3377056s

Given that this was all running on a single machine (a laptop at that), a 20% reduction in loading time is not too shabby at all! On production quality hardware, with multiple clients separate to the database server, the concept of loading in parallel is well worth bearing in mind and could speed up your data loading process.

To give a brief taster of the .NET approach I used, here's the code of the key parts.
public void ParallelLoadTest()
{
    // GetTestData() is just a basic method that returns a List<> of 4 DataTables.
    // Each DataTable is populated with 5 million rows.
    List<DataTable> tables = GetTestData();

    // For each of the 4 DataTables, kick off a bulk load task to run in parallel
    Parallel.ForEach(tables, table => BulkLoadData(table));
}

public void BulkLoadData(DataTable dt)
{
    using (SqlConnection conn = new SqlConnection("{removed connectionstring}"))
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = "Dataload";
        bulkCopy.BulkCopyTimeout = 60;
        bulkCopy.ColumnMappings.Add("FieldA", "FieldA");
        bulkCopy.ColumnMappings.Add("FieldB", "FieldB");

        conn.Open();
        bulkCopy.WriteToServer(dt);
    }
}

Wednesday, 26 May 2010

Optimising wildcard prefixed LIKE conditions

Suppose you want to write a query to find all products in your database that have a name beginning with "Long-Sleeve". You'd more than likely use something like below (examples based on AdventureWorks LT sample database):
SELECT name 
FROM SalesLT.Product
WHERE name LIKE 'Long-Sleeve%'
This produces a good execution plan, performing an index seek on the nonclustered index that exists on the name column. If you look at the seek operation, you'll see the query optimiser has done a good job of ensuring an index can be used by looking at the seek predicate:

You'll see what it's actually done is converted the LIKE condition to a ">=" AND "<" range, searching for product names where the name >= "Long-Sleeve" and is < "Long-SleevF".
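In other words, the seek predicate behaves as if the query had been written as the range below (a sketch mirroring the plan's predicate, not a query you'd need to write yourself):

```sql
SELECT name
FROM SalesLT.Product
WHERE name >= 'Long-Sleeve' AND name < 'Long-SleevF'
```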

Now, suppose you want to find all products where the name ends with "Tire Tube". The immediate query that springs to mind is:
SELECT name 
FROM SalesLT.Product
WHERE name LIKE '%Tire Tube'
The execution plan shows an index scan like below. Unfortunately, it's not ideal as it's not making good use of the index.

A way to optimise this scenario is flip the LIKE condition on its head, to get the wildcard at the end of the string being matched on. If you reverse the product name, and reverse the string being matched on you get the same results.
SELECT name 
FROM SalesLT.Product
WHERE name LIKE '%Tire Tube'
gives the same results as:
SELECT name 
FROM SalesLT.Product
WHERE REVERSE(name) LIKE REVERSE('%Tire Tube') -- or in other words, LIKE 'ebuT eriT%'
Now, as it stands, that isn't any better as it still can't use an index. But add a computed column to reflect the reversed name column, then add an index on that, and you soon see the benefits.
-- Add a NameReversed column that is the Name column, but reversed
ALTER TABLE SalesLT.Product
ADD NameReversed AS REVERSE(Name)

-- Now index that reversed representation of the product name, and INCLUDE the original Name column
CREATE INDEX IX_Product_NameReversed ON SalesLT.Product(NameReversed) INCLUDE (Name) 
Run 2 queries side by side to compare like below:
SELECT Name FROM SalesLT.Product WHERE Name LIKE '%Tire Tube'

SELECT Name FROM SalesLT.Product WHERE NameReversed LIKE REVERSE('%Tire Tube')
Note I'm keeping the use of REVERSE just to make the condition a bit more readable for demonstration purposes.

The optimised, indexed computed column approach is using an index seek, compared to the original way that uses the index scan as shown below.

Trying this on a slightly bigger table (the AdventureWorks sample database Product table has only about 800 rows) highlights the difference further. In a table with approximately 50,000 rows, the cost of the original attempt (relative to the batch) showed as 98%. Leaving the relative cost of the computed column approach at 2%. The execution stats showed:
Original Approach
Cold cache: CPU=62, Reads=138, Duration=95
Hot cache: CPU=63, Reads=124, Duration=65

Indexed Computed Column Approach
Cold cache: CPU=0, Reads=28, Duration=28
Hot cache: CPU=0, Reads=2, Duration=0

Now imagine, on a larger table with millions of rows, how this optimisation would scale up...

As per usual, you need to consider the specifics of your situation as adding extra indexes etc. will add to database size and means there is more work to do when performing INSERTs/UPDATEs. If you identify you have a performance problem with a similar scenario, then this approach is at least something to consider.

Monday, 19 April 2010

The SQLBits VI Experience

I had high expectations of SQLBits VI and it certainly lived up to them! Not only did it have great speakers, who really know their stuff and have an effortless way of getting the information across, but it was also free. So it goes without saying that it was a popular event. There were 4 sessions going on at any one time, with a theme of performance and scalability and plenty of choice across the DBA, Dev and BI areas. It did mean some tough decisions had to be made about which sessions to attend!

Monitoring and Tuning Parallel Query Execution

The first session I attended was held by Ramesh Meyyappan on the DBA track (you can see some of his webcasts here). I chose it because, of the 4 opening sessions, it seemed most relevant to my day job. If I'm honest, I wasn't expecting to get much from the talk beforehand; not because I thought I knew it all - quite the opposite in fact, as I didn't understand some of the terms in the overview (CXPACKET, what? what's a doobry-wotsit-dicombobulater?). Nor did I appreciate the significance of the topic. These were terms I knew of before, but didn't actually know about - probably because I felt they were a bit too far out of my comfort zone, so when I did come across them in the past, I didn't dive down into them to really find out more.

What a great session it was. Ramesh certainly knows his stuff and I'm glad I opted for it. What was great was the way Ramesh shared his knowledge - his ability to explain things so clearly to people like me who were learning new things was superb, especially given the time constraints. I now have a greater appreciation of MAXDOP, CXPACKET, SLEEP_TASK & SOS_SCHEDULER_YIELD - seeing live demonstrations gets a big thumbs up from me. Seeing real examples of where running queries in parallel can be slower was very beneficial. CDs were given out with all the examples on, so I'll be firing that up soon!

Inside the SQL Server Query Optimizer

Conor Cunningham, Principal Architect at Microsoft, was one of the must-see speakers for me and one I expected to draw a big crowd. He's a natural speaker, with an informal nature and did an entertaining talk which I rate extremely highly. "DMVs basically exist because I was p*ssed off" was a particularly entertaining moment. Hearing his insight into how the Query Optimizer works definitely made it a session well worth attending, as proved by the size of the audience - the room was packed. "Snug" was becoming a theme word for the day already!

Note to self: get on with reading "Microsoft SQL Server 2008 Internals" which is currently sitting on my desk.

Performance Monitoring and Tuning - Tips and tricks

Another DBA session by Ramesh Meyyappan that didn't disappoint. And another one that is on my list that I want to watch back again when the recordings are available.

At this point in the day, 3 out of 3 sessions I'd attended were on the DBA track. This just further emphasised something I previously blogged about: Rise of the SQL Server DevBA. As an SQL Server developer, I found these sessions hugely valuable. If you want to build performant, scalable database systems, you can't be blind to these topic areas.

Time to switch to Dev sessions

After lunch, I attended 2 Dev sessions: "Denormalisation - Having your cake and eating it" by Mark Whitehorn & Yasmeen Ahmed followed by "Improving Database Performance by removing the Database" by Simon Munro. Denormalisation is a topic I'm comfortable with and the session was all theory. The other session was something I followed as a general theme at QCon recently (as I previously blogged about). So I personally didn't take as much away from these sessions, but this was something I should have anticipated from the session abstracts. I did enjoy some of Simon's slides in particular, prompting a few LOLs.

Icing on the cake

The final session of the day for me, was on the BI track: "Designing and Tuning High Speed Data Loading", held by Thomas Kejser from the SQL Customer Advisory Team. I really enjoyed his talk on the various techniques for quickly loading data into SQL Server and on minimally logged operations. Another session I want to watch back again as it topped off a brilliant day.

SQLBits - you know it makes sense

This was the first SQLBits I've attended and if SQL Server is your thing, I completely recommend you try to make it to future ones. I know I wouldn't hesitate to attend again if I have the chance. It's a great opportunity to sponge up more knowledge and meet up with other SQL Server professionals. I had a number of good chats with other attendees, in particular SQL Server tweeps John Samson (Twitter) and Vivekanand Serou (Twitter).

Overall, this kind of event is great for broadening your horizons. Never be satisfied just with what you already know - go out there and push to learn more. Sometimes the most valuable nuggets of information take you by surprise!

Keeping my eyes peeled for the next event to attend!

Sunday, 21 February 2010

Optimising date filtered SQL queries

How you structure your SQL queries is very important and choosing the wrong approach can have big effects on the performance of the query.

One of the key things that should flag up the potential need for optimisation is the use of a function/calculation within a WHERE clause. These can lead to inefficient execution plans, preventing optimal index use.

A classic example to demonstrate this is when querying the data for a specific month. Here's the test setup, creating a table to hold customer orders, populating it with 100,000 records spread out randomly throughout the year of 2009, and creating an index on the OrderDate column that includes the "Total" data (SQL Server 2005 and later support the ability to include other non-key columns in nonclustered indexes, via the INCLUDE syntax):
CREATE TABLE [OrderHead]
(
    OrderDate DATETIME,
    CustomerID INTEGER,
    Total MONEY
)

DECLARE @Counter INTEGER
SET @Counter = 1

WHILE (@Counter <= 100000)
BEGIN
    INSERT [OrderHead] (OrderDate, CustomerID, Total)
    SELECT DATEADD(dd, CAST(RAND() * 365 AS INTEGER), '20090101'), 
        CAST(RAND() * 100 AS INTEGER),
        CAST(RAND() * 100 AS DECIMAL(5,2))
    SET @Counter = @Counter + 1
END

CREATE NONCLUSTERED INDEX IX_OrderHead_OrderDate
ON OrderHead(OrderDate) INCLUDE(Total)
Say you want to find the total value of sales made within June 2009. The key thing to do is to get optimal usage of the index on the OrderDate field.

Optimal Approach
SELECT SUM(Total) AS MonthlyTotal
FROM [OrderHead]
WHERE OrderDate >= '20090601'
  AND OrderDate < '20090701'
Average stats over 3 runs:
Duration: 45ms, CPU: 0, Reads: 29

Execution Plan:
This is a good execution plan, with an optimal index seek.

How does that compare to the alternative approaches?

Alternative #1
SELECT SUM(Total) AS MonthlyTotal
FROM [OrderHead]
WHERE MONTH(OrderDate) = 6
    AND YEAR(OrderDate) = 2009
Average stats over 3 runs:
Duration: 153ms, CPU: 36, Reads: 324

Execution Plan:

See how an index scan is now being used, not a seek. This is due to the use of the MONTH and YEAR functions within the WHERE clause preventing the most optimal use of the index.

Alternative #2
SELECT SUM(Total) AS MonthlyTotal
FROM [OrderHead]
WHERE DATEDIFF(mm, OrderDate, '20090601') = 0
Average stats over 3 runs:
Duration: 186ms, CPU: 78, Reads: 324

Execution Plan:
This is the same as Alternative #1.

In this scenario, the optimal approach reduced duration by up to 75%, took CPU down to 0 and reduced reads by about 91%.

For such a simple tweak, you can see how much of a difference it can have on performance. When you want a high performance, scalable system, it is important to optimise these scenarios in your SQL queries.

Wednesday, 17 February 2010

Using the SP:StmtCompleted SQL Profiler trace event class

One question I've seen popping up a few times recently, is how to check what individual statement(s) within an SQL Server stored procedure are taking the most time.

The scenario is that there is a stored procedure that consists of a number of SQL statements. The stored procedure has been flagged up as running slower than perhaps it was expected to, but as there are a number of statements/queries within it, it's not immediately clear where the time is being taken.

Now the immediate solution that springs to mind is to check the execution plan as that will show where the time is being spent. Job done.

But in this instance, the question related to how to extract a list of execution times/stats for each statement within a stored procedure and the focus was away from just looking at the execution plan.

One approach mentioned was to record the start and end time either side of each statement and log out to an audit table to then be reviewed. I think this is a bit clunky, and means you have to go in and add potentially a lot of temporary code that needs to be stripped out again later.

Instead, what I think is a better approach is to use SQL Profiler. When you run a standard/default trace it doesn't capture the execution of each individual statement, just the overall execution of the stored procedure. What you need to do is set up the trace to monitor the SP:StmtCompleted event so that what you end up with is a trace of every statement executed within the stored procedure call. The trace can then of course be saved to file or to a database table if needed.

It's simple, but effective. I think this particular event class often slips under the radar as it's (rightly so) not traced by the standard template that most people (including me) tend to use when using SQL Profiler. And unless you've looked through all the event classes, you may not know that this was there or what it actually does.

Friday, 12 February 2010

Sorting a DataTable - LINQ performance

Whenever there are a number of ways to achieve the same goal, I'm always inquisitive as to what the performance difference is between them. I'm a firm believer in thinking about scalability from the start - if you make your best effort to prepare for scale, then you can save yourself time and effort further down the line. I like to try and avoid pain - it doesn't agree with me.

In this instance, I became curious about sorting an in-memory DataTable, using C# / .NET 3.5. So I decided to run a few tests using the common approaches. Now, you may be thinking "Why not just order the results from the database using an ORDER BY clause?". Well yes, you could do that. But what about when you want to cache some data once in memory, and then use that cached copy for subsequent purposes to prevent round-tripping back to/re-hitting the database server? Or, what if the data isn't actually coming from a database but some other source?

So I knocked up a quick test harness. For each method, I tested sorting a DataTable containing between 100 and 2 million rows of data. The DataTable contained 2 columns:
ColumnA - integer, just an incrementing number
ColumnB - string, in the format {Character}{Row Number} where {Character} just loops round from A-Z just to mix the records up a bit and give the need for ordering.

Method 1 - DataView.Sort
DataView vw = dt.DefaultView;
vw.Sort = "ColumnB ASC";

Method 2 - DataTable.Select
DataRow[] rows = dt.Select("", "ColumnB ASC");

Method 3 - LINQ to DataSet
var rows = (from r in dt.AsEnumerable()
orderby r["ColumnB"] ascending
select r).ToArray();
Note: the .ToArray() bit in the LINQ above is important - it makes the execution of the query immediate. Without it, what you are actually doing is just defining a query object; it does not execute until you try to request data from it - this is known as deferred execution. So, in this example, without the enclosing brackets and the subsequent .ToArray(), the data wouldn't actually be sorted at this point.
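A tiny sketch of that difference (variable names illustrative):

```csharp
// Just defines the query - nothing is sorted yet (deferred execution)
var query = from r in dt.AsEnumerable()
            orderby r["ColumnB"] ascending
            select r;

// Forces immediate execution - the sort actually happens here
DataRow[] sorted = query.ToArray();

// Enumerating 'query' again later would re-execute the sort from scratch -
// another reason to materialise the results once with ToArray()
```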

Here's the results:
No. of Rows | Method 1 - DataView.Sort | Method 2 - DataTable.Select | Method 3 - LINQ
100         | 0.0625s                  | 0.0520s                     | 0.0475s
1000        | 0.0781s                  | 0.0573s                     | 0.0573s
10,000      | 0.1618s                  | 0.1094s                     | 0.0989s
100,000     | 1.4793s                  | 0.8959s                     | 0.7084s
1,000,000   | 16.1318s                 | 9.8290s                     | 8.4534s
2,000,000   | 35.0940s                 | 21.5995s                    | 18.3420s

As you can see from my tests, LINQ to DataSet came out tops. With a smaller DataTable the difference is, as you'd expect, minimal. Though as the volume of rows increases, LINQ seems to keep out-performing the other two approaches, being nearly 100% quicker than a DataView Sort as you get to the level of hundreds of thousands of rows, and about 14% quicker than a DataTable.Select.

Thursday, 11 February 2010

SQL Server 2008 - Table Valued Parameters vs XML vs CSV

The scenario is, you want to create a stored procedure that returns the records relating to a finite set of keys/IDs that you pass in to it. How do you do it? Well you could use dynamic SQL, but for the purpose of this article I'm going to ignore that option.

Test scenario:
-- Create basic customer table (column definitions inferred from the queries below)
CREATE TABLE [dbo].[Customer]
(
    Id INTEGER IDENTITY(1,1) PRIMARY KEY,
    Name VARCHAR(50)
)

-- Populate Customer table with 100,000 sample records
DECLARE @Counter INTEGER
SET @Counter = 1
WHILE (@Counter <= 100000)
BEGIN
    INSERT Customer (Name)
    VALUES ('Test Customer #' + CAST(@Counter AS VARCHAR(10)))
    SET @Counter = @Counter + 1
END

Option 1 - CSV list of keys
In SQL Server 2000, you would most likely have ended up passing in a CSV list of keys, splitting that out into a table, and then joining that on to your real data table to match the records to return. dbo.fnSplit is a user defined function that iterates through the string and splits each value out into a TABLE variable which is then returned. To save space, I haven't supplied the definition of that, but there's plenty of examples of this function out there via a quick Google.
CREATE PROCEDURE [dbo].[uspGetCustomersCSV]
    @CustomerIDs VARCHAR(8000)
AS
SELECT c.Id, c.Name
FROM [dbo].[Customer] c
JOIN dbo.fnSplit(@CustomerIDs, ',') t ON c.Id = t.item

Example Use:
EXECUTE [dbo].[uspGetCustomersCSV] '1,10,100'

Option 2 - XML
SQL Server 2005 added the XML datatype, which allows you to pass in an XML blob containing the keys and using the new built-in XML support to incorporate it into a JOIN.
CREATE PROCEDURE [dbo].[uspGetCustomersXML]
    @CustomerIDs XML
AS
SELECT c.ID, c.Name
FROM [dbo].[Customer] c
JOIN @CustomerIDs.nodes('IDList/ID') AS x(Item) ON c.ID = x.Item.value('.', 'int')

Example Use:
EXECUTE [dbo].[uspGetCustomersXML] '<IDList><ID>1</ID><ID>10</ID><ID>100</ID></IDList>'

Option 3 - Table Valued Parameters
Now in SQL Server 2008, there's a new kid on the block. Table Valued Parameters. In a nutshell, this is the ability to pass a TABLE type in as a parameter, which previously was not allowed. The beauty of this, is it allows you to focus on writing the query in a more natural manner - no string parsing/manipulation, no XML functionality. Plain, simple, SQL.

First you create a new TYPE, defining the TABLE structure.
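Given how the type is used below (a single Id column joined on Customer.Id), its definition would look something like:

```sql
-- A table type holding the set of customer IDs to look up
CREATE TYPE CustomerIDTableType AS TABLE
(
    Id INT
)
```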

Next you create your stored procedure, taking that new type as a parameter.
CREATE PROCEDURE [dbo].[uspGetCustomersTable]
    @CustomerIDs CustomerIDTableType READONLY
AS
SELECT c.ID, c.Name
FROM [dbo].[Customer] c
JOIN @CustomerIDs t ON c.Id = t.Id

Example Use:
DECLARE @Ids CustomerIDTableType
INSERT @Ids VALUES (1), (10), (100)
EXECUTE [dbo].[uspGetCustomersTable] @Ids

If you're wanting to call this from .NET, it's quite simple. You define the parameter as SqlDbType.Structured and the value can be set to a DataTable, a DbDataReader, or an IEnumerable<SqlDataRecord>!
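A minimal sketch of that .NET call using a DataTable as the parameter value (the connection string and method name are illustrative):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static DataTable GetCustomers(string connectionString, IEnumerable<int> ids)
{
    // Shape the IDs as a DataTable matching the CustomerIDTableType definition
    var idTable = new DataTable();
    idTable.Columns.Add("Id", typeof(int));
    foreach (int id in ids)
        idTable.Rows.Add(id);

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.uspGetCustomersTable", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;

        // The Structured type plus TypeName is what makes this a TVP
        SqlParameter p = cmd.Parameters.AddWithValue("@CustomerIDs", idTable);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.CustomerIDTableType";

        var results = new DataTable();
        conn.Open();
        using (var reader = cmd.ExecuteReader())
            results.Load(reader);
        return results;
    }
}
```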

The following stats are the average figures over 3 runs, against the 100,000 row table, retrieving just 7 rows of data for keys: 1, 10, 200, 3000, 40000, 50000 and 90000. Each time, the cache was fully cleared to allow fair comparisons.

Option 1 (CSV): Duration=123ms, Reads=107, Writes=1, CPU=15
Option 2 (XML): Duration=80ms, Reads=69, Writes=0, CPU=5
Option 3 (TVP): Duration=69ms, Reads=27, Writes=0, CPU=0

This was quite a modest test, and by no means an absolute stress test, but the results look very promising for table valued parameters. And not only that - I like them because they let you get on with writing straightforward, maintainable queries.

Tuesday, 9 February 2010

Would you like SQL cache with that?

One of the things I keep badgering on about in the world of SQL Server and query tuning is being fair and consistent when comparing the different possible variants. If you don't level the playing field, you can end up thinking a particular query out-performs another when in fact the opposite is true. It's a bit like comparing 2 paper boys to see which one can complete a given round the quickest: if one of them has been doing the round for the past 2 weeks but the other has never done that round before, can you reliably tell which one is actually quicker?


One of them knows the route, the houses, and which houses have the dog that will chew their hand off given half the chance, but the other paper boy has no such knowledge. This puts him at an unfair disadvantage and almost certainly he will be slower. So until you compare them fair and square, from the same starting point, then you won't truly know who is quicker.

This is true when query tuning. So how do I do fair comparisons of queries? Well, nothing ground-breaking:

Clear the data cache and execution plan cache.
The first time you run a query an execution plan will be generated and stored in the execution plan cache. The data will be held in SQL Server's data cache/buffer. If you then run it for a second time, there is already an execution plan to reuse and the data is in the cache meaning it doesn't need to be re-read from disk. This can give vast improvements in query performance.

-- NB: run these on a dev/test server only - clearing the cache on a live box will hurt performance
CHECKPOINT -- force dirty pages in the buffer to be written to disk
DBCC DROPCLEANBUFFERS -- clear the data cache
DBCC FREEPROCCACHE -- clear the execution plan cache

So, if you don't clear the cache down between tests, you will get skewed results - a second, different query may appear to be more performant when in fact it's benefiting unfairly from the cache primed by the first query.

The data cache is usually the biggest contributor to the performance improvement. If you want to get an idea of what is in the data cache, here's a basic query to run in your database that will show the number of cached data pages by table. This is a slightly adapted version of a query available in MSDN (it strips out the sys table figures) for SQL Server 2005 and above:

SELECT t.name AS TableName, COUNT(*) AS NumberOfCachedPages
FROM sys.dm_os_buffer_descriptors AS bd
    INNER JOIN
    (
        SELECT object_id, allocation_unit_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p
                ON au.container_id = p.hobt_id
                    AND (au.type = 1 OR au.type = 3)
        UNION ALL
        SELECT object_id, allocation_unit_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p
                ON au.container_id = p.partition_id
                    AND au.type = 2
    ) AS obj
        ON bd.allocation_unit_id = obj.allocation_unit_id
    INNER JOIN sys.tables t ON obj.object_id = t.object_id
WHERE bd.database_id = DB_ID()
GROUP BY t.name
ORDER BY NumberOfCachedPages DESC;

Running this after clearing the cache, and then after running a query, will soon give you at least some idea of what is being cached. I recommend having a look at sys.dm_os_buffer_descriptors.

So, this is why whenever I want to try out different versions of a query for performance, I like to follow a standard approach:
1) start a trace in SQL Profiler, filtered down to just my process (by user name)
2) clear the cache
3) run query #1
4) record the reads, writes, CPU and duration from profiler
5) repeat steps 2 to 4, at least once more
6) run query #1 again but without clearing the cache - this is because I like to see how the query runs with a hot/primed cache
7) repeat steps 2 to 6 for query #2
8) compare the stats

I find it good to try each query a number of times, to get an average performance. Note that up until now I've been talking about performance in terms of raw query duration. I also monitor reads/writes/CPU, as they are good to keep an eye on too. Where two queries run in about the same time, there may be a more sizeable difference in the other stats to help choose a winner. Of course, you may want to optimise for reads or CPU at the expense of overall speed, but generally it's duration that is of the most interest.

So when I see questions asking "why does query X run faster than query Y?", I feel it's worth checking whether the tests were done on a level playing field if it isn't stated. Usually the answer is yes, but on the occasions where they haven't been done in a fair manner, it's well worth the gentle nudge about clearing the cache.