I’ve decided, at least in the first pass, to calculate forward/backward for CRFs on a log scale. The alternative way to maintain floating point stability, by keeping log multipliers for columns of outputs, was just making my head hurt too much in the nested loops.

The log computation involves taking the logarithm of a sum over the exponentiated values found in an array. In symbols:

logSumOfExponentials(x) = log( exp(x[0]) + exp(x[1]) + ... + exp(x[n-1]) )
The problem, of course, is that if any of the x values get sufficiently large (greater than `log(Double.MAX_VALUE)`, which is about 710), they overflow to positive infinity when you calculate the exponent; if all of the values are tiny (less than -750 or so), they all underflow to 0.0, the sum is zero, and the resulting log sum is negative infinity.

Luckily, we can get more arithmetic stability with a little algebra. First, find the maximum value in x:

m = max(x[0], x[1], ..., x[n-1])

Then use it to normalize the largest value to 0:

logSumOfExponentials(x) = m + log( exp(x[0] - m) + ... + exp(x[n-1] - m) )

So the largest value passed to the exponential function is 0. If there are really tiny values after the subtraction, they underflow to zero and drop out of the sum, as they should with limited precision arithmetic.
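A quick sketch of the difference in plain Java (the array values and class name here are just for illustration):

```java
public class LogSumDemo {
    public static void main(String[] args) {
        double[] xs = {1000.0, 1001.0, 999.0}; // exp() of any of these overflows

        // naive: each exp() overflows to +infinity, so the log sum is +infinity
        double naiveSum = 0.0;
        for (double x : xs)
            naiveSum += Math.exp(x);
        System.out.println(Math.log(naiveSum)); // Infinity

        // stabilized: subtract the max so the largest exponent is exp(0) = 1
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs)
            max = Math.max(max, x);
        double sum = 0.0;
        for (double x : xs)
            sum += Math.exp(x - max);
        System.out.println(max + Math.log(sum)); // about 1001.408
    }
}
```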

I’ve implemented a new static utility method to compute the log of the sum of the exponentiated values of an array:

`com.aliasi.util.Math.logSumOfExponentials(double[])`.

It’ll be out in the next release (hopefully along with CRFs).

In case you’re curious where this comes up, here’s the inductive step of the forward pass of the forward-backward algorithm for CRFs (φ(n,k’) is the feature vector for token position n and previous outcome k’; β[k] is the coefficient vector for outcome k):

α[n][k] = Σ_k’ α[n-1][k’] * exp(β[k] · φ(n,k’))

and here it is on the log scale:

log α[n][k] = logSumOfExponentials_k’( log α[n-1][k’] + β[k] · φ(n,k’) )
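Here’s a minimal sketch of that log-scale induction step, assuming the local log potentials β[k] · φ(n,k’) have been precomputed into a matrix; the names `forwardStep` and `logPhi` are mine for illustration, not LingPipe’s:

```java
import java.util.Arrays;

public class CrfForward {

    // log of the sum of exponentiated values, stabilized by the max
    static double logSumOfExponentials(double[] xs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs)
            max = Math.max(max, x);
        if (max == Double.NEGATIVE_INFINITY)
            return Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double x : xs)
            if (x != Double.NEGATIVE_INFINITY)
                sum += Math.exp(x - max);
        return max + Math.log(sum);
    }

    // one inductive step on the log scale: given the previous column of log
    // forward values and local log potentials logPhi[kPrev][k]
    // (= beta[k] dot phi(n,kPrev) in the post's notation), compute the next
    // column without ever leaving log space
    static double[] forwardStep(double[] logAlphaPrev, double[][] logPhi) {
        int numOutcomes = logPhi[0].length;
        double[] logAlpha = new double[numOutcomes];
        double[] terms = new double[logAlphaPrev.length];
        for (int k = 0; k < numOutcomes; ++k) {
            for (int kPrev = 0; kPrev < logAlphaPrev.length; ++kPrev)
                terms[kPrev] = logAlphaPrev[kPrev] + logPhi[kPrev][k];
            logAlpha[k] = logSumOfExponentials(terms);
        }
        return logAlpha;
    }

    public static void main(String[] args) {
        // tiny forward probabilities that would underflow on the linear scale
        double[] logAlphaPrev = { -1000.0, -1001.0 };
        double[][] logPhi = { { 0.5, -0.5 }, { 1.0, 0.0 } };
        System.out.println(Arrays.toString(forwardStep(logAlphaPrev, logPhi)));
        // the results stay finite on the log scale
    }
}
```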
Here’s our Java source from `com.aliasi.util.Math`:

    public static double logSumOfExponentials(double[] xs) {
        if (xs.length == 1)
            return xs[0];
        double max = maximum(xs);
        double sum = 0.0;
        for (int i = 0; i < xs.length; ++i)
            if (xs[i] != Double.NEGATIVE_INFINITY)
                sum += java.lang.Math.exp(xs[i] - max);
        return max + java.lang.Math.log(sum);
    }
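For readers without the LingPipe source handy, here is the same logic as a self-contained class, with the `maximum` helper inlined and an extra guard (my addition, not in the original) for the all-negative-infinity case, plus a quick sanity check:

```java
public class LogSum {

    public static double logSumOfExponentials(double[] xs) {
        if (xs.length == 1)
            return xs[0];
        // inlined maximum helper
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs)
            max = Math.max(max, x);
        // guard: if every entry is -infinity, the sum of exps is 0,
        // so the log sum is -infinity (avoids computing -inf + log(0))
        if (max == Double.NEGATIVE_INFINITY)
            return Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double x : xs)
            if (x != Double.NEGATIVE_INFINITY)
                sum += Math.exp(x - max);
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        // log(e^0 + e^0) = log 2
        System.out.println(logSumOfExponentials(new double[]{0.0, 0.0}));
        // shifting every input by c shifts the result by c; stable at 1000
        System.out.println(logSumOfExponentials(new double[]{1000.0, 1000.0}));
    }
}
```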

June 25, 2009 at 3:12 pm |

Many of us have defined the same operation pairwise so as to have (with operator overloading) numbers that support the usual operations. In adding pairwise you can use a special function, log1p, to compute log(1+x), which is more accurate for small x. You could use that in your vector operation as well, although it would only be relevant if all the other numbers were much smaller than the maximum.

June 25, 2009 at 3:37 pm |

Thanks for reminding me about the special log calculation.

For everyone else who’s not knee deep in these derivations, the binary case is:

log(exp(a) + exp(b)) = a + log(1 + exp(b - a))    (taking a to be the larger of the two, for stability)

and this is where the special calculation for log(1+x) comes into play.

Check out John Cook’s discussion of log(1+x), which uses a Taylor approximation for x < 1e-4 (that is, x < 0.0001), and simply makes a lib call otherwise.
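Putting the two comments together, here’s a sketch of the binary case in Java using the standard library’s `Math.log1p` (the class and method names are just for illustration):

```java
public class LogAddExp {

    // log(exp(a) + exp(b)) computed as max + log1p(exp(min - max));
    // Math.log1p(x) evaluates log(1 + x) accurately even for tiny x
    public static double logAddExp(double a, double b) {
        // -infinity represents log(0), so it simply drops out of the sum
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double max = Math.max(a, b);
        double min = Math.min(a, b);
        return max + Math.log1p(Math.exp(min - max));
    }

    public static void main(String[] args) {
        System.out.println(logAddExp(0.0, 0.0));         // log 2, about 0.6931
        System.out.println(logAddExp(-1000.0, -1001.0)); // finite, no underflow
    }
}
```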

June 25, 2009 at 9:45 pm |

That’s what I said in the reply to the logistic regression post.

June 25, 2009 at 11:01 pm |

sth4nth is talking about his or her comment on my softmax without overflow blog entry.

Sorry I’m so slow with all this arithmetic precision stuff; I should’ve definitely put exp(2) and exp(2) together in this case.

July 13, 2009 at 7:46 pm |

In my limited experience, the more useful function to approximate is f(x) = log(1+exp(x)). You only need to handle one of x >= 0 or x <= 0 directly. For sufficiently large x, log(1+exp(x)) ~= x; and for sufficiently small x (e.g. x < -750 in IEEE double precision), log(1+exp(x)) ~= 0. Moreover, the difference between the function and its asymptotes is symmetric about 0, i.e., f(x) - x == f(-x). So you only need an approximation for f near 0 and only on one side of 0. A rational approximation of f can usually be evaluated much faster than log (or log1p) and exp, since each one of those is itself computed by a polynomial or rational approximation in turn.

(Aside: f should be familiar, though what usually shows up is its derivative df/dx, the logistic function.)
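This isn’t the rational approximation the commenter describes, but the asymptote-and-symmetry observation alone already gives a stable evaluation of f, as a sketch:

```java
public class Softplus {

    // f(x) = log(1 + exp(x)), evaluated without overflow: using the identity
    // f(x) = x + f(-x), for x > 0 we compute x + log1p(exp(-x)), so exp is
    // only ever called on a non-positive argument and never overflows
    public static double softplus(double x) {
        if (x > 0.0)
            return x + Math.log1p(Math.exp(-x));
        return Math.log1p(Math.exp(x));
    }

    public static void main(String[] args) {
        System.out.println(softplus(0.0));     // log 2, about 0.6931
        System.out.println(softplus(1000.0));  // near the asymptote f(x) ~= x
        System.out.println(softplus(-1000.0)); // near the asymptote f(x) ~= 0
        // symmetry check: f(x) - x should equal f(-x)
        System.out.println(softplus(3.0) - 3.0 - softplus(-3.0));
    }
}
```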