I’m busy working on auto-completion for text inputs (like on the Google home page), so I’ve been thinking hard about efficiency, and thus about bounded priority queues. Auto-completion’s easy compared to more general spell checking in that it has a natural A* search (the A* bound is the best scoring completion of a prefix). Good A* search requires good priority queues.
Recall that a priority queue is a data structure with an element ordering (e.g. a
java.util.Comparator) that supports operations to add elements, and to pop (or peek at) the largest element. A reasonable implementation is with a resizable heap, which makes the add and pop operations logarithmic on the number of entries and the peek operation constant time.
For search algorithms, the beam heuristic (in one incarnation), restricts the queue to the best K elements (it’s a heuristic because it can result in search errors). This means a bounded priority queue, which has a fixed upper bound on size. The problem is that you can’t use a regular heap-based priority queue for this, because there’s no way to pop off the minimal element in less than linear time in the size of the heap.
You can use a fully sorted implementation, like
java.util.TreeSet, and that’s exactly what LingPipe’s
util.BoundedPriorityQueue does. But it’s inefficient, because the red-black tree underlying the set implementation does a whole lot of work, including allocating nodes.
Today’s review paper has the “right” algorithm to do bounded priority queues of fixed maximum sizes, which I implemented way back in LingPipe 2.4 in
- Atkinson, M.D., J.-R. Sack, N. Santoro and T. Strothotte. 1986. Min-max heaps and generalized priority queues. Communications of the ACM 29(10):996-1000.
The usual heap is a binary tree satisfying the heap property, namely that each element is larger than all of its descendants. It’s faster than fully ordering, because of the reduced requirement on sorting all elements.
A min-max heap is like a heap where every other level in the binary tree satisfies the heap property, with the root being the largest element in the queue, and the two daughters of the root being the two smallest elements in the queue, and so on down, alternating greater or lesser at each level. With this arrangement of nodes, getting either the largest or smallest element is constant time. The bubble-up portion of the min-max heap algorithm is very similar to that of the heap algorithm, only with a lot of grand-daughter/grand-mother computations in the tree rather than simple mother/daughter computations. Because its backed by an array, it doesn’t require any allocation of node objects on the fly. It would be possible to resize such a min-max heap, but the basic version was plenty twisty to code on its own.
A related algorithm is the double-ended priority queue, which basically keeps two tree structures overlayed on top of each other, one for the largest-to-smallest and one for the reverse ordering. It’s a better algorithm if you need intervals, which for some reason
java.util.SortedSet requires. I’ve just extended LingPipe’s priority queues to implemented
SortedSet, but I had to punt on the dynamic views returned by the
For auto-complete, I also just added a small bounded priority queue for accumulating results, because bubble-sort‘s much faster for small sets than all the tree-fiddling in tree sets.