Optimising date filtered SQL queries

How you structure your SQL queries is very important and choosing the wrong approach can have big effects on the performance of the query.

One of the key things that should flag up the potential for needing optimisation, is if you are using a function/calculation within a WHERE clause. These can lead to inefficient execution plans, preventing sub-optimal index use.

A classic example to demonstrate this is when querying the data for a specific month. Here’s the test setup, creating a table to hold customer orders, populating it with 100,000 records spread out randomly throughout the year of 2009, and creating an index on the OrderDate column that includes the “Total” data (SQL Server 2005 and later support the ability to include other non-key columns in nonclustered indexes, via the INCLUDE syntax):

CREATE TABLE [OrderHead]  
    OrderDate DATETIME,  
    CustomerID INTEGER,  
    Total MONEY  
SET @Counter = 1  
WHILE (@Counter <= 100000)  
    INSERT [OrderHead] (OrderDate, CustomerID, Total)  
    SELECT DATEADD(dd, CAST(RAND() * 365 AS INTEGER), '20090101'),   
        CAST(RAND() * 100 AS INTEGER),  
        CAST(RAND() * 100 AS DECIMAL(5,2))  
    SET @Counter = @Counter + 1  
ON OrderHead(OrderDate) INCLUDE(TOTAL)  

Say you want to find the total value of sales made within June 2009. The key thing to do is to get optimal usage of the index on the OrderDate field.

Optimal Approach

SELECT SUM(Total) AS MonthlyTotal  
FROM [OrderHead]  
WHERE OrderDate >= '20090601'  
    AND OrderDate < '20090701'  

Average stats over 3 runs:
Duration: 45ms, CPU: 0, Reads: 29

Execution Plan:

Optimal Plan

This is a good execution plan, with an optimal index seek.

How does that compare to the alternative approaches?

Alternative #1

SELECT SUM(Total) AS MonthlyTotal  
FROM [OrderHead]  
WHERE MONTH(OrderDate) = 6  
    AND YEAR(OrderDate) = 2009  

Average stats over 3 runs:
Duration: 153ms, CPU: 36, Reads: 324

Execution Plan:

Less Optimal Plan

See how an index scan is now being used, not a seek. This is due to the use of the MONTH and YEAR functions within the WHERE clause preventing the most optimal use of the index.

Alternative #2

SELECT SUM(Total) AS MonthlyTotal  
FROM [OrderHead]  
WHERE DATEDIFF(mm, OrderDate, '20090601') = 0  

Average stats over 3 runs:
Duration: 186ms, CPU: 78, Reads: 324

Execution Plan:
This is the same as Alternative #1.

In this scenario, the optimal approach reduced duration by up to 75%, took CPU down to 0 and reduced reads by about 91%.

For such a simple tweak, you can see how much of a difference it can have on performance. When you want a high performance, scalable system, it is important to optimise these scenarios in your SQL queries.

See also