Wednesday, 12 January 2011

SqlBulkCopy to SQL Server in Parallel

In an earlier post last year, I blogged about high performance bulk loading to SQL Server from .NET using SqlBulkCopy. That post highlighted the performance gain that SqlBulkCopy gives over another batched insert approach using an SqlDataAdapter. But is it possible to squeeze more performance out? Oh yes.

First, a quick recap. For optimal performance:
  • load into a heap table (with no indexes - add any indexes you need AFTER you've loaded the data)
  • specify the TableLock SqlBulkCopy option. This results in bulk update locks being taken on the destination table instead of row locks. It's key to what I'm going to focus on in this post, as I'll go on to explain. Relevant reading:
    - SqlBulkCopyOptions Enumeration MSDN documentation
  • pave the way for minimal logging by setting the database to the Bulk-Logged recovery model (or Simple). Under the Full recovery model all inserts are fully logged in the transaction log, whereas under Bulk-Logged, certain bulk operations can be minimally logged, reducing transaction log growth. SqlBulkCopy is one such operation. Relevant reading:
    - Prerequisites For Minimal Logging in Bulk Import MSDN reference
    - Operations That Can Be Minimally Logged MSDN reference
    - SQL Server Myth A Day : Bulk-Logged Recovery Model by Paul Randal (Twitter)
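For reference, switching the recovery model is a one-line change. A minimal sketch (the database name here is just a placeholder - substitute your own):

```sql
-- Switch to Bulk-Logged recovery so bulk operations can be minimally logged.
-- If you normally run Full recovery, switch back and take a log backup afterwards.
ALTER DATABASE MyLoadTestDb
SET RECOVERY BULK_LOGGED;
```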
Get squeezing
How can we squeeze more throughput in bulk inserts using SqlBulkCopy?

Multiple bulk update locks on the table from different processes can be taken out at the same time without blocking each other, which means you can scale out the bulk loading across multiple clients in parallel. There will be limitations of course, such as the network connection and the speed of the disks behind the database, but performance gains are very much possible.

Instead of loading data from a single machine, split the data over a number of machines, each one loading its own chunk of the data to the database using SqlBulkCopy as previously outlined. Finding the optimal distribution of data/number of clients may take a bit of investigation, but the rewards can pay dividends.

Pimp My Test Environment
Unfortunately I don't have a server rack at home housing multiple servers with a fast IO subsystem - hey, call me a cheapskate. What I do have is a laptop with an i5-560M (2.67GHz dual core, 4 threads), 8GB RAM and a 7200RPM HDD. That's how I roll.

What I've done is to knock up a quick command-line app in .NET 4.0 using the new Task Parallel Library (TPL) to split the loading of data to SQL Server across 4 threads in parallel, each using a separate database connection and SqlBulkCopy instance. To quote MSDN:
The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.

Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code.

I loaded 20 million rows of data from an in-memory DataTable to a heap table in SQL Server 2008 with 2 columns: FieldA VARCHAR(10), FieldB INTEGER.
FieldA is just a random sequence of 10 characters, FieldB is an incrementing number.
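GetTestData() isn't shown in the snippets below, but here's a sketch of how one of those DataTables could be built. Note this is just my illustrative version - the column names match the Dataload table, but the random string generation details are an assumption on my part:

```csharp
using System;
using System.Data;

public static class TestData
{
    private static readonly Random Rand = new Random();
    private const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Builds a DataTable shaped like the Dataload table:
    // FieldA VARCHAR(10) - a random sequence of 10 characters
    // FieldB INTEGER     - an incrementing number
    public static DataTable BuildTable(int rowCount, int startAt)
    {
        DataTable table = new DataTable();
        table.Columns.Add("FieldA", typeof(string));
        table.Columns.Add("FieldB", typeof(int));

        char[] buffer = new char[10];
        for (int i = 0; i < rowCount; i++)
        {
            for (int j = 0; j < buffer.Length; j++)
            {
                buffer[j] = Chars[Rand.Next(Chars.Length)];
            }
            table.Rows.Add(new string(buffer), startAt + i);
        }
        return table;
    }
}
```

For the parallel test, this would be called 4 times with 5 million rows each, offsetting startAt so FieldB keeps incrementing across the chunks.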

I ran 2 tests, recreating the database fresh before each run:
Test 1: 1 DataTable of 20 million rows, loaded single threaded. LOAD TIME: 15.4088814s
Test 2: 4 DataTables of 5 million rows, loaded in parallel. LOAD TIME: 12.3377056s

Given that this was all running on a single machine (a laptop at that), a 20% reduction in loading time is not too shabby at all! On production-quality hardware, with multiple clients separate from the database server, loading in parallel is well worth bearing in mind and could speed up your data loading process.

Snippets
To give a brief taster of the .NET approach I used, here are the key parts of the code.
public void ParallelLoadTest()
{
    // GetTestData() is just a basic method that returns a List<> of 4 DataTables.
    // Each DataTable is populated with 5 million rows.
    List<DataTable> tables = GetTestData();

    // For each of the 4 DataTables, kick off a bulk load task to run in parallel
    Parallel.ForEach(tables, table =>
        {
            BulkLoadData(table);
        }
    );
}

public void BulkLoadData(DataTable dt)
{
    using (SqlConnection conn = new SqlConnection("{removed connectionstring}"))
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = "Dataload";
        bulkCopy.BulkCopyTimeout = 60;
        bulkCopy.ColumnMappings.Add("FieldA", "FieldA");
        bulkCopy.ColumnMappings.Add("FieldB", "FieldB");
        conn.Open();
        bulkCopy.WriteToServer(dt);
    }
}

4 comments:

  1. Hi,

    I tried using this approach with multiple threads running in parallel, but seem to be getting a fair bit of blocking. Any tips?

    ReplyDelete
  2. Hi Mark,

    Saw your tweet where you said you had an index on the PK column. This could be the cause of the blocking, as for optimal performance/least blocking you should load into a table without indexes and add them afterwards. If you're appending into an existing table, is it viable to remove the index beforehand, do the load, and then re-add it afterwards? I can't say whether that would be a good idea without knowing e.g. the size of the table before loading, typical load volumes each time, etc.

    You could try loading into new partitions, though I don't have a post that covers this (yet).

    ReplyDelete
  3. Hi, thanks for the article.

    I'm bulk loading against a SQL Azure database and these tables are all required to have a clustered index, so I can't drop and re-create them. I'm getting blocking and deadlocks similar to Mark.

    I assume I could try something like temp tables for each db connection, but as I think about it, the data all has to flow at some point back to the main table, correct? And so I might just be delaying the problem.

    The weird part is that I'm only loading perhaps 5000 rows at a time, using row locking (not table locks), so I wouldn't anticipate deadlocks.

    Thanks again for the article and for any insights.

    ReplyDelete
    Replies
    1. @Chris - there's a similar question to this on the MSDN forums - see: http://social.msdn.microsoft.com/Forums/en-US/sqldataaccess/thread/932cd26c-53fc-49c0-b082-e7f5f05a9801
      and the referenced MSDN article: http://msdn.microsoft.com/en-us/library/ms186341(v=sql.100).aspx which states:

      "When indexes exist on a table, you cannot perform a parallel load operation by using the TABLOCK option."

      So to gain the most performance, you would need to remove the indexes first, I'm afraid. Re: row locking, note that when a statement takes roughly 5,000 or more row locks, they can be escalated straight up to a table lock. So perhaps try lowering the batch size to stay comfortably under that threshold.

      Hope this helps!

      Delete