How you structure your SQL queries is very important and choosing the wrong approach can have big effects on the performance of the query.
One of the key things that should flag up the potential for needing optimisation, is if you are using a function/calculation within a WHERE clause. These can lead to inefficient execution plans, preventing sub-optimal index use.
A classic example to demonstrate this is when querying the data for a specific month. Here's the test setup, creating a table to hold customer orders, populating it with 100,000 records spread out randomly throughout the year of 2009, and creating an index on the OrderDate column that includes the "Total" data (SQL Server 2005 and later support the ability to include other non-key columns in nonclustered indexes, via the INCLUDE syntax):
Optimal Approach
Duration: 45ms, CPU: 0, Reads: 29
Execution Plan:
This is a good execution plan, with an optimal index seek.
How does that compare to the alternative approaches?
Alternative #1
Duration: 153ms, CPU: 36, Reads: 324
Execution Plan:
See how an index scan is now being used, not a seek. This is due to the use of the MONTH and YEAR functions within the WHERE clause preventing the most optimal use of the index.
Alternative #2
Duration: 186ms, CPU: 78, Reads: 324
Execution Plan:
This is the same as Alternative #1.
In this scenario, the optimal approach reduced duration by up to 75%, took CPU down to 0 and reduced reads by about 91%.
For such a simple tweak, you can see how much of a difference it can have on performance. When you want a high performance, scalable system, it is important to optimise these scenarios in your SQL queries.
One of the key things that should flag up the potential for needing optimisation, is if you are using a function/calculation within a WHERE clause. These can lead to inefficient execution plans, preventing sub-optimal index use.
A classic example to demonstrate this is when querying the data for a specific month. Here's the test setup, creating a table to hold customer orders, populating it with 100,000 records spread out randomly throughout the year of 2009, and creating an index on the OrderDate column that includes the "Total" data (SQL Server 2005 and later support the ability to include other non-key columns in nonclustered indexes, via the INCLUDE syntax):
CREATE TABLE [OrderHead] ( ID INTEGER IDENTITY(1,1) PRIMARY KEY, OrderDate DATETIME, CustomerID INTEGER, Total MONEY ) DECLARE @Counter INTEGER SET @Counter = 1 WHILE (@Counter <= 100000) BEGIN INSERT [OrderHead] (OrderDate, CustomerID, Total) SELECT DATEADD(dd, CAST(RAND() * 365 AS INTEGER), '20090101'), CAST(RAND() * 100 AS INTEGER), CAST(RAND() * 100 AS DECIMAL(5,2)) SET @Counter = @Counter + 1 END CREATE NONCLUSTERED INDEX IX_OrderHead_OrderDate ON OrderHead(OrderDate) INCLUDE(TOTAL)Say you want to find the total value of sales made within June 2009. The key thing to do is to get optimal usage of the index on the OrderDate field.
Optimal Approach
SELECT SUM(Total) AS MonthlyTotal FROM [OrderHead] WHERE OrderDate >= '20090601' AND OrderDate < '20090701'Average stats over 3 runs:
Duration: 45ms, CPU: 0, Reads: 29
Execution Plan:
This is a good execution plan, with an optimal index seek.
How does that compare to the alternative approaches?
Alternative #1
SELECT SUM(Total) AS MonthlyTotal FROM [OrderHead] WHERE MONTH(OrderDate) = 6 AND YEAR(OrderDate) = 2009Average stats over 3 runs:
Duration: 153ms, CPU: 36, Reads: 324
Execution Plan:
See how an index scan is now being used, not a seek. This is due to the use of the MONTH and YEAR functions within the WHERE clause preventing the most optimal use of the index.
Alternative #2
SELECT SUM(Total) AS MonthlyTotal FROM [OrderHead] WHERE DATEDIFF(mm, OrderDate, '20090601') = 0Average stats over 3 runs:
Duration: 186ms, CPU: 78, Reads: 324
Execution Plan:
This is the same as Alternative #1.
In this scenario, the optimal approach reduced duration by up to 75%, took CPU down to 0 and reduced reads by about 91%.
For such a simple tweak, you can see how much of a difference it can have on performance. When you want a high performance, scalable system, it is important to optimise these scenarios in your SQL queries.
No comments:
Post a comment