Atkinson, Sack, Santoro, and Strothotte (1986) Min-max heaps and generalized priority queues. CACM.

by

I’m busy working on auto-completion for text inputs (like on the Google home page), so I’ve been thinking hard about efficiency, and thus about bounded priority queues. Auto-completion’s easy compared to more general spell checking in that it has a natural A* search (the A* bound is the best scoring completion of a prefix). Good A* search requires good priority queues.

Recall that a priority queue is a data structure with an element ordering (e.g. a java.util.Comparator) that supports operations to add elements, and to pop (or peek at) the largest element. A reasonable implementation is with a resizable heap, which makes the add and pop operations logarithmic on the number of entries and the peek operation constant time.

For search algorithms, the beam heuristic (in one incarnation), restricts the queue to the best K elements (it’s a heuristic because it can result in search errors). This means a bounded priority queue, which has a fixed upper bound on size. The problem is that you can’t use a regular heap-based priority queue for this, because there’s no way to pop off the minimal element in less than linear time in the size of the heap.

You can use a fully sorted implementation, like java.util.TreeSet, and that’s exactly what LingPipe’s util.BoundedPriorityQueue does. But it’s inefficient, because the red-black tree underlying the set implementation does a whole lot of work, including allocating nodes.

Today’s review paper has the “right” algorithm to do bounded priority queues of fixed maximum sizes, which I implemented way back in LingPipe 2.4 in util.MinMaxHeap:

The usual heap is a binary tree satisfying the heap property, namely that each element is larger than all of its descendants. It’s faster than fully ordering, because of the reduced requirement on sorting all elements.

A min-max heap is like a heap where every other level in the binary tree satisfies the heap property, with the root being the largest element in the queue, and the two daughters of the root being the two smallest elements in the queue, and so on down, alternating greater or lesser at each level. With this arrangement of nodes, getting either the largest or smallest element is constant time. The bubble-up portion of the min-max heap algorithm is very similar to that of the heap algorithm, only with a lot of grand-daughter/grand-mother computations in the tree rather than simple mother/daughter computations. Because its backed by an array, it doesn’t require any allocation of node objects on the fly. It would be possible to resize such a min-max heap, but the basic version was plenty twisty to code on its own.

A related algorithm is the double-ended priority queue, which basically keeps two tree structures overlayed on top of each other, one for the largest-to-smallest and one for the reverse ordering. It’s a better algorithm if you need intervals, which for some reason java.util.SortedSet requires. I’ve just extended LingPipe’s priority queues to implemented SortedSet, but I had to punt on the dynamic views returned by the headSet(), tailSet() and subSet() methods.

For auto-complete, I also just added a small bounded priority queue for accumulating results, because bubble-sort‘s much faster for small sets than all the tree-fiddling in tree sets.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s