Fastest Way To Insert the Data in MS SQL – Part 3 – Multi-value insert vs Bulk Api

Feb 11, 2024 · 4 min read · java ms sql server ·

Share on:

In a previous post, we concluded that the Bulk API insert is the fastest way to insert data into an MS SQL Server database if the batch size is reasonably large. For Jdbc Template inserts, we created a regular insert statement in the form of insert into table (c1, c2, c3) values (v1, v2, v3), where c1, c2, and c3 are column names, and v1, v2, and v3 are values.

For 100,000 Person objects with a batch size in the range of 1,000-10,000, we concluded that Bulk API insert is four to five times faster than JDBC Template. However, with MS SQL Server, JDBC Template has one more potential speedup that it can utilize, possibly competing with Bulk API and proving useful in certain situations.

If we check the MS SQL Server official syntax for the insert statement, we can see that it supports multiple values inserts in one statement. This means that if we want to insert multiple values into the same table, instead of sending multiple insert statements, we can do that in a single insert statement. The insert statement will take the following form: insert into table (c1, c2, c3) values (v1, v2, v3), (v4, v5, v6).... The maximum number of value bindings that can be included in one statement is 2,100, consistent with any other MS SQL Server statement.

How multiple values insert impact performance

Firstly, we will compare this multiple value insert with a single value insert and check if increasing the number of value bindings per insert statement improves performance.

The following code is used to perform the insertion of 1,000,000 persons. It is very similar to the code from Part 2 except that it now generates multi-value insert statements.

 1public void insertPeople(TableDescriptor<Person> tableDescriptor,
 2                         List<Person> people,
 3                         int batchSize,
 4                         int objectsPerInsert) {
 5    if (people.isEmpty()) return;
 6    objectsPerInsert = Math.min(objectsPerInsert, people.size());
 7
 8    List<List<Person>> partitions = Lists.partition(people, objectsPerInsert);
 9    if (partitions.size() == 1 || partitions.get(partitions.size() - 1).size() == objectsPerInsert) {
10        persist(tableDescriptor, batchSize, objectsPerInsert, partitions);
11    } else {
12        List<List<Person>> equalSizePartitions = partitions.subList(0, partitions.size() - 1);
13        persist(tableDescriptor, batchSize, objectsPerInsert, equalSizePartitions);
14        List<List<Person>> lastPartition = Collections.singletonList(partitions.get(partitions.size() - 1));
15        persist(tableDescriptor, 1, lastPartition.size(), lastPartition);
16    }
17}
18
19private void persist(TableDescriptor<Person> tableDescriptor, int batchSize, int objectsPerInsert, List<List<Person>> partitions) {
20    String insertQuery = getInsertQuery(tableDescriptor, objectsPerInsert);
21    jdbcTemplate.batchUpdate(insertQuery,
22            partitions,
23            batchSize,
24            (ps, argument) -> BulkPersistCustomRepositoryImpl.insertBatchData(tableDescriptor, ps, argument));
25}
26
27public <T> void insertBulk(TableDescriptor<T> tableDescriptor,
28                           List<? extends T> data,
29                           int batchSize) {
30    if (data.isEmpty()) return;
31    String insertQuery = getInsertQuery(tableDescriptor, 1);
32    jdbcTemplate.batchUpdate(insertQuery,
33            data,
34            batchSize,
35            (ps, argument) -> {
36                List<ColumnDescriptor<T>> columns = tableDescriptor.getColumns();
37                for (int i = 0; i < columns.size(); i++) {
38                    ColumnDescriptor<T> columnDescriptor = columns.get(i);
39                    ps.setObject(i + 1, columnDescriptor.getObjectFieldValue(argument));
40                }
41            });
42}

I ran multiple tests with different batch sizes and the number of people per insert. Each person has five fields, so the number of binding values per insert is actually the number of people per insert times five.

The results:

Number of people per insert	Batch size	Median duration (ms)
400	100	4175
400	10	4275
400	1000	4348
400	10000	4697
100	1000	5335
100	100	5359
100	10	5585
100	10000	5873
10	10000	6302.5
10	1000	6331
10	100	6512
10	10	8027.5
1	10000	14792.5
1	1000	15222
1	100	17812.5
1	10	25252

As we can observe, the number of bound values per insert has a significant impact on performance. It has an even greater performance impact than the batch size, which is, for me at least, very surprising. The more parameters we bind, the better the performance we achieve. Interestingly, the best performance is with a batch size of 100.

Multi-value insert vs Bulk API

After concluding that we have much better performance with multi-value inserts, I will now compare Bulk API inserts with JDBC Template 400-person inserts per statement.

Method name	Batch size	Median duration (ms)
Bulk Api	10000	3188.5
Jdbc Template	100	4175
Jdbc Template	10	4275
Jdbc Template	1000	4348
Bulk Api	1000	4358
Jdbc Template	10000	4697
Bulk Api	100	14440.5
Bulk Api	10	101251.5

Based on the results, we can see that the performance difference is much smaller than before. As the batch size increases, Bulk API performance improves. However, when we increase the batch size for JDBC Template to more than 100, the performance worsens. Therefore, we should be careful when setting the batch size for JDBC Template with multi-value inserts.

Conclusion

Based on the results above, we can conclude that multi-value inserts drastically improve insert performances and that you should definitely consider it if you have performance problems. On the other hand, if you need to insert a huge amount of data, then Bulk API insert is the king.

For updates, you can follow me on Twitter or LinkedIn.

Thank you for your attention.