Read/Write Locks for Multi-Threaded Training and Execution

I finally built an application that exercised the write half of the read/write lock pattern used for most of LingPipe’s classifiers and chunkers (see the recent blog entry describing our chunk annotation GUI).

Perhaps not coincidentally, we’re getting customer and mailing-list requests for the same functionality: the ability for multiple users (perhaps over the web) to add new training data dynamically while the server stays up to date. It’s an issue for everything from significant phrase extraction to chunking.

This post will explain how LingPipe’s models need to be synchronized so that they are thread safe under concurrent training and decoding (where decoding may be significant phrase extraction, language model estimation, chunking, classification or tagging).

Throughout, I’ve implemented our models to be thread safe for multiple concurrent readers. That’s not an issue, and if that’s all you want to do, have at it. We have demos that run on servlets this way: a model is a member object of a servlet, and multiple threads access it in the default configuration.

The problem comes in for writes, which means training in this case, including resetting training and decoding parameters. When a thread is writing to a model, no other thread may read or write from the model.

Luckily, this is a very common pattern of synchronization, known as the read-write lock pattern. In addition to generics, the 1.5 JDK finally brought Doug Lea‘s wonderful util.concurrent library into the language as java.util.concurrent. If you don’t know anything about threads, but like detailed coding books, I can recommend Brian Goetz’s master class textbook, Java Concurrency in Practice, to which Doug Lea, Josh Bloch and other Java API design and textbook gurus contributed.

In code, read-write locks are almost trivial. Here’s what you do. Let’s say you have a dynamically trainable chunker:

CharLmHmmChunker chunker = ...;

We’ll also need a read-write lock:

ReadWriteLock mTaggerRwLock
  = new ReentrantReadWriteLock(true);

The parameter true makes the lock fair in that it schedules threads for the lock in the order of their requests. This is not necessary for LingPipe, but may be desirable for applications.
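To make the fairness flag concrete, here is a minimal sketch (the class name FairnessDemo is just for illustration). A fair lock hands out the lock in roughly first-come, first-served order; the default non-fair lock generally gives higher throughput but can let a steady stream of readers starve a waiting writer:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairnessDemo {
    public static void main(String[] args) {
        // Fair lock: threads acquire in roughly the order they requested it.
        ReentrantReadWriteLock fairLock = new ReentrantReadWriteLock(true);

        // Default (non-fair) lock: better throughput, but a continuous
        // stream of readers may delay a waiting writer indefinitely.
        ReentrantReadWriteLock fastLock = new ReentrantReadWriteLock();

        System.out.println(fairLock.isFair());  // prints true
        System.out.println(fastLock.isFair());  // prints false
    }
}
```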

Then, if we have a piece of training data on a thread, we simply grab the write lock:

Chunking trainingData = ...;
mTaggerRwLock.writeLock().lock();
try {
  chunker.handle(trainingData);
} finally {
  mTaggerRwLock.writeLock().unlock();
}

Then do the training inside the scope of the lock, making sure to release the lock in a finally block in case the handler throws an exception (this may be unchecked, such as an underlying out-of-memory exception, or may be a checked exception, such as an I/O exception in some cases).

As with training, decoding requires acquiring a lock, doing some work, and then freeing the lock in a finally block:

CharSequence text = ...;
Chunking chunking = null;
mTaggerRwLock.readLock().lock();
try {
  chunking = chunker.chunk(text);
} finally {
  mTaggerRwLock.readLock().unlock();
}

Note that only the action requiring synchronization is locked. Whatever action the thread needs to take with the resulting chunking will happen after the try/finally block.
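Putting the two halves together, here is a self-contained sketch of the whole pattern. Since LingPipe isn’t on everyone’s classpath, a plain word-count map stands in for a model like CharLmHmmChunker; the locking discipline is the point, not the model. The class and method names here are illustrative, not LingPipe API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TrainDecodeDemo {
    // Stand-in for a trainable model; guarded entirely by rwLock.
    private final Map<String,Integer> model = new HashMap<>();
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock(true);

    // "Training" mutates the model, so it takes the exclusive write lock.
    public void train(String word) {
        rwLock.writeLock().lock();
        try {
            model.merge(word, 1, Integer::sum);
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    // "Decoding" only reads, so any number of threads may hold the
    // read lock at the same time.
    public int count(String word) {
        rwLock.readLock().lock();
        try {
            return model.getOrDefault(word, 0);
        } finally {
            rwLock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TrainDecodeDemo demo = new TrainDecodeDemo();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) demo.train("the");
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1000; i++) demo.count("the");
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println(demo.count("the"));  // prints 1000
    }
}
```

Note that the reads never block each other; only the writer excludes everyone else, which is exactly the behavior we want for a server that trains occasionally but decodes constantly.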

That’s it. And it works for any of the LingPipe classes that allow training and execution to be interleaved. Just remember that anything that modifies the object itself is a write operation.

One final note. LingPipe’s cache class, com.aliasi.util.FastCache, which may be used, for example, with HMMs, is already thread safe. Thus even though it does writes, it doesn’t need the model’s write lock. It may be used in models without requiring any modification to the above read/write synchronization pattern. For instance, we’ve used it in servlets that do spelling correction to handle caches in that multi-threaded environment.
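The point generalizes to any internally synchronized cache. As a sketch, here the JDK’s ConcurrentHashMap stands in for FastCache (CacheDemo, score, and expensiveScore are hypothetical names, not LingPipe API): because the cache’s own writes are atomic, callers can populate it while holding only the read lock, or no lock at all:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CacheDemo {
    // Internally synchronized, so updates don't need the model's write lock.
    private final ConcurrentMap<String,Double> cache = new ConcurrentHashMap<>();

    double score(String input) {
        // computeIfAbsent is atomic per key: the expensive computation runs
        // at most once per input, even under concurrent readers.
        return cache.computeIfAbsent(input, s -> expensiveScore(s));
    }

    private double expensiveScore(String s) {
        return s.length() * 0.5;  // placeholder for a real model computation
    }

    public static void main(String[] args) {
        CacheDemo demo = new CacheDemo();
        System.out.println(demo.score("hello"));  // computed: prints 2.5
        System.out.println(demo.score("hello"));  // cached: prints 2.5
    }
}
```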