Monday, 15 February 2010

High performance bulk loading to SQL Server using SqlBulkCopy

If you ever want to bulk load data into an SQL Server database as quickly as possible, the SqlBulkCopy class is your friend (in the System.Data.SqlClient namespace). Since being introduced in .NET 2.0, it has provided an extremely efficient way to bulk load data into SQL Server, and is one of the classes that I see as a "must know about". A typical scenario is where you want to dump some data into the database to do some processing on it, or you want to temporarily load some data in so you can query it for some stats.

To give an idea of how well it performs, I'll compare it to an SqlDataAdapter approach.

SqlBulkCopy approach
// Instantiate SqlBulkCopy with default options,
// supplying an open SqlConnection to the database
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
    // The table I'm loading the data to
    bulkCopy.DestinationTableName = "BulkLoadTable";
    // How many records to send to the database in one go (all of them)
    bulkCopy.BatchSize = myDataTable.Rows.Count;

    // Load the data to the database
    bulkCopy.WriteToServer(myDataTable);
    
    // Close up          
    bulkCopy.Close();
}        

SqlDataAdapter approach
using (SqlCommand insertCommand = new SqlCommand(
    "INSERT BulkLoadTable(FieldA, FieldB) VALUES (@FieldA, @FieldB)", connection))
{
    insertCommand.Parameters.Add("@FieldA", SqlDbType.VarChar, 10, "FieldA");
    insertCommand.Parameters.Add("@FieldB", SqlDbType.Int, 4, "FieldB");
    // Setting UpdatedRowSource is important if you want to batch up the inserts
    insertCommand.UpdatedRowSource = UpdateRowSource.None;
    using (SqlDataAdapter insertAdapter = new SqlDataAdapter())
    {
        insertAdapter.InsertCommand = insertCommand;
        // How many records to send to the database in one go (all of them)
        insertAdapter.UpdateBatchSize = myDataTable.Rows.Count;
        
        // Send the inserts to the database
        insertAdapter.Update(myDataTable);                   
    }
}

I've left out the boring stuff like the instantiation and opening of the SqlConnection (connection). Plus I haven't shown the generation of the DataTable (myDataTable) as that's not particularly exciting either.
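
For completeness though, here's a rough sketch of that setup. The connection string is a placeholder and the column types are assumptions, so adjust to suit your own environment:

// Requires the System, System.Data and System.Data.SqlClient namespaces.
// The connection string is a placeholder - substitute your own.
SqlConnection connection = new SqlConnection(
    @"Data Source=.;Initial Catalog=MyDatabase;Integrated Security=SSPI");
connection.Open();

// FieldA holds a GUID (as a string), FieldB an incrementing number
DataTable myDataTable = new DataTable("BulkLoadTable");
myDataTable.Columns.Add("FieldA", typeof(string));
myDataTable.Columns.Add("FieldB", typeof(int));
for (int i = 0; i < 100000; i++)
{
    myDataTable.Rows.Add(Guid.NewGuid().ToString(), i);
}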

For this test, I'm loading 2 columns of data to the table named "BulkLoadTable" in my database. The columns in myDataTable match those in the table, which keeps things nice and simple. If they didn't match, then when using SqlBulkCopy you'd need to specify which columns in the DataTable map to which columns in the destination table, using the ColumnMappings collection on the SqlBulkCopy instance. All straightforward.
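
For example, if the DataTable columns had different names to the destination table's columns (the source names below are made up for illustration), the mappings would look something like this:

// Map source DataTable columns to destination table columns by name.
// "SourceA" and "SourceB" are hypothetical source column names.
bulkCopy.ColumnMappings.Add("SourceA", "FieldA");
bulkCopy.ColumnMappings.Add("SourceB", "FieldB");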

So how does SqlBulkCopy perform?

I populated the DataTable with 100,000 records containing some random data. FieldA is just a GUID, FieldB is an incrementing number. The times recorded are average measurements of the actual time to persist the data (i.e. the times to complete bulkCopy.WriteToServer and insertAdapter.Update respectively):

SqlBulkCopy: 1.5885s
SqlDataAdapter: 25.0729s

That's about a 93% drop in time taken when using the SqlBulkCopy class over the SqlDataAdapter approach.

It gets better.

One of the SqlBulkCopy constructor overloads provides the ability to specify SqlBulkCopyOptions. For maximum performance, specify the TableLock option:
SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null);
This obtains a bulk update lock on the destination table for the duration of the bulk copy; without it, row-level locks are taken. This further reduces the time to load the 100,000 records:

SqlBulkCopy with TableLock option: 0.8229s

Of course, this lock can block out other processes until the loading is complete. In my experience this is usually not an issue. For optimal performance I recommend the following (there's a rough sketch of the workflow after the list):
  • bulk load into a new, empty table (a staging table)
  • add suitable indexes to the staging table only after the data has been bulk loaded, to help with the subsequent processing/querying; adding the indexes at the start just unnecessarily hinders the insert speed
  • use the TableLock option to maximise throughput
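
A rough sketch of that workflow is below; the staging table name, its schema and the index are purely illustrative and not part of the original example:

// 1. Create a new, empty staging table (illustrative schema)
using (SqlCommand createTable = new SqlCommand(
    "CREATE TABLE BulkLoadStaging (FieldA VARCHAR(36), FieldB INT)", connection))
{
    createTable.ExecuteNonQuery();
}

// 2. Bulk load into it, taking a table lock for maximum throughput
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
{
    bulkCopy.DestinationTableName = "BulkLoadStaging";
    bulkCopy.BatchSize = myDataTable.Rows.Count;
    bulkCopy.WriteToServer(myDataTable);
}

// 3. Only now add the indexes needed for the subsequent processing/querying
using (SqlCommand createIndex = new SqlCommand(
    "CREATE INDEX IX_BulkLoadStaging_FieldB ON BulkLoadStaging (FieldB)", connection))
{
    createIndex.ExecuteNonQuery();
}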

The data source doesn't have to be a DataTable either. The WriteToServer method also accepts a DataRow array, an IDataReader, or a DataTable combined with a DataRowState (in which case only the rows matching the supplied state are persisted).
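
As an illustration, here's a sketch of streaming rows from a query straight into the destination table via an SqlDataReader. The source table name is made up, and a second open connection (sourceConnection) is assumed, since you can't read from and bulk copy over the same connection at the same time:

// "SourceTable" is an illustrative name; sourceConnection is a second,
// separately opened SqlConnection to read the source rows from.
using (SqlCommand sourceCommand = new SqlCommand(
    "SELECT FieldA, FieldB FROM SourceTable", sourceConnection))
using (SqlDataReader reader = sourceCommand.ExecuteReader())
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
    bulkCopy.DestinationTableName = "BulkLoadTable";
    // The IDataReader overload - rows are streamed through without
    // being held in memory as a DataTable
    bulkCopy.WriteToServer(reader);
}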

So, "can I use it for bulk updates?" I hear you ask. No. It's a mechanism for doing bulk inserts only. You can't have everything I suppose.

Update (12 Jan 2011):
Want to squeeze more performance out? Check out my follow-up post on SqlBulkCopy to SQL Server in Parallel.

18 comments:

  1. Well you can indirectly use this for bulk updating. Let's say you have 100,000 rows to update. Create a temporary table, load the rows to update using SqlBulkCopy, then update the real rows in a single SQL statement. This will be order(s) of magnitude faster than 100,000 updates.

  2. Sure - that's a good point. I'm planning to do a separate blog post on that topic at some point.

  3. FYI, this approach works on #Temp tables, which could allow for the table-level locking improvements AND the bulk updating.

  4. This approach will not be helpful in a situation like mine: I have to upload customers, and the next time I will receive new customers as well as existing customers with updated data. How can I accomplish that with SqlBulkCopy upload? For the updates, I would have to manually update the rows.

  5. You cannot use SqlBulkCopy to perform updates - it's a mechanism to quickly *insert* data. However, you can still use it indirectly by bulk loading the data into a *new* staging table within SQL Server, before then merging across into the main table (either using the MERGE statement in SQL Server, or one INSERT plus one UPDATE statement from the staging table to the master table).
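
    As a rough sketch of that pattern (the table and column names below are purely illustrative, and MERGE needs SQL Server 2008 or later):

    // 1. Bulk load the incoming data into an empty staging table (inserts only - fast)
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = "CustomerStaging";
        bulkCopy.WriteToServer(myDataTable);
    }

    // 2. Merge the staged rows into the main table in a single statement
    string mergeSql = @"
        MERGE Customer AS target
        USING CustomerStaging AS source
            ON target.CustomerId = source.CustomerId
        WHEN MATCHED THEN
            UPDATE SET target.Name = source.Name
        WHEN NOT MATCHED THEN
            INSERT (CustomerId, Name) VALUES (source.CustomerId, source.Name);";

    using (SqlCommand mergeCommand = new SqlCommand(mergeSql, connection))
    {
        mergeCommand.ExecuteNonQuery();
    }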

  6. Super helpful post; I was reading some of your comments on this topic on Stack Overflow and found this more detailed post to be exactly what I was looking for.

    There is *some* information out there similar to this, but you boil it down better =).

    Thank you!

  7. @simon Thanks for the kind words! I'm glad you found those comments and this post useful

  8. Hi,

    Nice post, which definitely points in the right direction (SqlBulkCopy). Just one question: wouldn't it have been an even comparison had you increased SqlDataAdapter.UpdateBatchSize as well?

    In my experience, one of the things that can also kill insert performance is a bad clustered index choice (one that leads to constant clustered index node splits). In case someone is interested, here is some more detailed information on that: http://arnosoftwaredev.blogspot.com/2011/10/tips-for-lightning-fast-insert.html

    Cheers,
    Arno

  9. @Arno - thanks. Re: SqlDataAdapter.UpdateBatchSize - it was an even comparison: the code I've given does set UpdateBatchSize, so in both tests the batch size was set to the full 100,000 rows.

  10. @Adrian: Right, I see it now. The speedup is really amazing!

  11. Adrian, thanks for this post. I was using the data adapter with SQL Server Express, utilizing batch update. It was still slow. This is way, way faster. Thanks again for sharing.


    Murali

  12. Yeah, this class rocks in terms of performance. There are some problems with it though - the error messages are not the most descriptive, and it has a really bad case-sensitivity bug - https://connect.microsoft.com/VisualStudio/feedback/details/94135/sqlbulkcopy-column-mappings-case-sensitivity#tabs

    The bug was reported years ago, doubtful it will ever get fixed.

  13. The case sensitivity bug is an inconvenience; I blogged about that too here: http://www.adathedev.co.uk/2010/04/sqlbulkcopy-columnmappings-mismatch.html

  14. Just what I have been looking for. Thanks

  15. I'm using SqlBulkCopy to upload an Excel file to SQL Server. Do you have any ideas on checking for rows that already exist in SQL Server and ignoring them?

    Replies
    1. SqlBulkCopy is for inserts only. But you could bulk load the file into a staging table, and then insert from there into the destination table where the row does not already exist.

  16. Gabriel Rodriguez - 1 June 2013 06:30

    Wish I had learned this earlier. Great stuff man, thanks.

  17. I love SqlBulkCopy. A couple of weeks ago I stumbled across this guy's blog that explains how to combine SqlBulkCopy with a MERGE statement.

    http://www.jarloo.com/c-bulk-upsert-to-sql-server-tutorial/

    Since then I've been using SqlBulkCopy to dump my DataTable rows into a temp table which has the same structure as the final destination table. Then I perform a MERGE into the destination table using the temp table as the source. It works great. I don't know if this will be an issue for others, but I was having issues copying into the temp table until I created the temp table programmatically using SELECT TOP 1 * INTO #tempTableName FROM DestinationTableName and then truncated it. At that point SqlBulkCopy worked well for me. I'm using ColumnMappings, of course, to specify just the fields I want to bulk copy.

    Anyway, the combination of SqlBulkCopy to copy data into a temp table, then merging the temp table into the destination table, works really well.
