Transaction Considerations

Transactionally protecting your container operations is an important part of ensuring the integrity of your containers and databases. However, be aware that transactions can affect your application's performance.

The Berkeley DB Programmer's Reference Guide contains several sections that can help you understand the performance impact transactions can have on your application; see that guide for the complete discussion.

The next several sections in this guide provide a rough introduction to this information.

There are two areas of consideration when it comes to transactional performance: the first is disk I/O, and the second is lock contention.

Transaction Disk I/O

Normally, when you perform a write to a BDB XML container, the data is not written to disk until a sync is performed on the in-memory cache. This sync occurs either when you force it by using DbEnv::sync, or when your environment is closed. (Note that you can suppress the sync when you close your environment, but this is not the normal case.)

When you transactionally protect your database writes, however, the data modified by the write is written to disk every time the transaction is committed. For applications that run for extremely long periods of time, and which perform relatively few write operations, this can actually improve your application's performance because the commit only writes those portions of the cache that were dirtied (written) by the transaction. A full sync, on the other hand, writes the entire cache to disk, which is considerably more expensive than the partial write performed by a commit.
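To make the commit-time write concrete, here is a minimal sketch of a transactionally protected document insertion using the BDB XML C++ API. The environment home directory ("envHome"), the container name, and the document name and content are values invented for this example; adapt them to your application.

    #include "DbXml.hpp"

    using namespace DbXml;

    int main()
    {
        // Open a transactional environment. "envHome" is a placeholder
        // directory assumed to already exist.
        DbEnv *env = new DbEnv(0);
        env->open("envHome",
                  DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG |
                  DB_INIT_MPOOL | DB_INIT_TXN, 0);

        // The manager adopts the environment and closes it when it is
        // destroyed.
        XmlManager mgr(env, DBXML_ADOPT_DBENV);
        XmlContainer container =
            mgr.openContainer("container.dbxml",
                              DB_CREATE | DBXML_TRANSACTIONAL);

        XmlUpdateContext uc = mgr.createUpdateContext();
        XmlTransaction txn = mgr.createTransaction();
        try {
            // Document name and content are invented for this sketch.
            container.putDocument(txn, "doc1.xml",
                                  "<root><a/></root>", uc, 0);
            // The commit writes only the cache pages this transaction
            // dirtied, not the entire cache.
            txn.commit();
        } catch (XmlException &e) {
            txn.abort();
            throw;
        }
        return 0;
    }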

Transaction and Lock Contention

Because transactions guarantee isolation from all other threads of control, they must perform locking, and they hold those locks for the duration of the transaction. Holding these locks may cause other threads of control to wait before they can access the locked data. How much this affects your application depends on its data access patterns.

Additionally, with transactional applications, it is possible that conflicting lock requests from different threads of control can cause a deadlock to occur. To understand more about deadlocks and how to handle them, please refer to the Deadlock detection section of the Berkeley DB Programmer's Reference Guide (available at: http://www.sleepycat.com/docs/ref/transapp/deadlock.html).
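For illustration only, one common way to respond to a deadlock is to abort the transaction and retry it a bounded number of times. In the sketch below, mgr, container, and uc are the objects created in the earlier example; MAX_RETRIES and the document being written are arbitrary choices, and the sketch assumes that XmlException reports the underlying Berkeley DB error through getExceptionCode() and getDbErrno().

    // Retry a transactional write if it is chosen as a deadlock victim.
    const int MAX_RETRIES = 5;   // arbitrary limit for this sketch
    for (int attempt = 0; attempt < MAX_RETRIES; ++attempt) {
        XmlTransaction txn = mgr.createTransaction();
        try {
            container.putDocument(txn, "doc2.xml",
                                  "<root><b/></root>", uc, 0);
            txn.commit();
            break;                       // success
        } catch (XmlException &e) {
            txn.abort();                 // release this transaction's locks
            if (e.getExceptionCode() == XmlException::DATABASE_ERROR &&
                e.getDbErrno() == DB_LOCK_DEADLOCK)
                continue;                // deadlock victim: try again
            throw;                       // any other error: rethrow
        }
    }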

The performance penalty you might pay due to the additional locking required by your transactions depends on a number of factors:

  • The amount of time that your transaction lives. If your transaction is short-lived (the ideal situation), then there is less chance that it will be holding a lock required by another transaction.

  • The number of operations performed by the transaction. A transaction that must read and write hundreds of documents will hold considerably more locks, potentially for longer periods of time, than one that reads and writes only a few documents.

  • The number of transactions (typically this means threads of control) in existence at any given time. The more transactions there are, the greater the chance of lock contention and deadlocks.

Given this, for best results try to use only short-lived transactions. Also, keep the number of operations performed by each transaction small, and keep the number of transactions in existence at any given time small.
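One way to put this advice into practice is to break a large workload into several small transactions rather than one long-lived transaction. The following sketch is illustrative only: the batch size, total document count, and document contents are invented, and mgr, container, and uc are assumed to exist as in the earlier example.

    #include <sstream>

    // Insert many documents using several short transactions so that
    // each commit releases its locks promptly. The sizes are arbitrary.
    const int TOTAL_DOCS = 1000;
    const int BATCH_SIZE = 50;

    for (int start = 0; start < TOTAL_DOCS; start += BATCH_SIZE) {
        XmlTransaction txn = mgr.createTransaction();
        try {
            for (int i = start;
                 i < start + BATCH_SIZE && i < TOTAL_DOCS; ++i) {
                std::ostringstream name;
                name << "doc" << i << ".xml";
                container.putDocument(txn, name.str(),
                                      "<root><item/></root>", uc, 0);
            }
            txn.commit();   // locks are held only for this small batch
        } catch (XmlException &e) {
            txn.abort();
            throw;
        }
    }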

Index Operations and Transactions

One final thing to consider when using transactions with BDB XML has to do with re-indexing containers. If you are performing index add, delete, or replace operations on a very large container (tens of thousands of documents or greater), and you are using transactions to protect these operations, then the operation can potentially fail with the following error message:

Lock table is out of available locks

When you perform an index operation on a container, you are reading and writing every document node in the container. This means that you are asking Berkeley DB to read and write every record in the underlying database.
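For example, a transactionally protected index add might look like the following sketch. The empty namespace URI, the element name, and the index description string are values chosen for illustration; mgr and container are assumed to be the objects created in the earlier example.

    // Adding an index under a single transaction. On a very large
    // container this one transaction touches every document, so it can
    // acquire a very large number of locks before it commits.
    XmlUpdateContext uc = mgr.createUpdateContext();
    XmlTransaction txn = mgr.createTransaction();
    try {
        container.addIndex(txn, "", "title",
                           "node-element-equality-string", uc);
        txn.commit();
    } catch (XmlException &e) {
        txn.abort();
        throw;
    }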

Every time Berkeley DB performs a read or a write operation, it acquires one or more locks on the database pages on which it is operating. Normally, Berkeley DB releases those locks once it has completed the operation. However, as discussed above, when you use transactions to protect write operations, Berkeley DB holds all locks that it acquires until the transaction completes (is either committed or aborted).

Locks are a finite resource, and so Berkeley DB maintains an internal data structure that identifies how many locks it can use at any given time. By default, this number is 1,000 locks.

The end result is that if you are performing index operations on large containers and you are using transactions to protect those operations, you can run out of locks. When this happens, Berkeley DB fails the operation with the error message noted above.

To work around this problem, you must increase the number of locks available to Berkeley DB. You do this with DbEnv::set_lk_max_locks(). See the online Berkeley DB documentation for more information.
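For example, a sketch that raises the lock limit before opening the environment might look like this. The figure of 10,000 locks is an arbitrary value chosen for illustration, not a recommendation; size it to your containers and workload.

    #include <db_cxx.h>

    // Raise the lock limits before opening the environment. 10000 is an
    // arbitrary figure for this sketch; choose a value appropriate to
    // the size of your containers.
    DbEnv env(0);
    env.set_lk_max_locks(10000);
    env.set_lk_max_objects(10000);   // related limit on locked objects
    env.open("envHome",
             DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG |
             DB_INIT_MPOOL | DB_INIT_TXN, 0);

Note that these limits must be set before the environment is opened. Alternatively, they can be placed in the environment's DB_CONFIG file.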