I just implemented a class OnlineNormalEstimator for the next version of LingPipe. Here’s the Java, which uses fewer member variables than John D. Cook’s implementation:

    private long mN = 0L;
    private double mM = 0.0;
    private double mS = 0.0;

    public void handle(double x) {
        ++mN;
        double nextM = mM + (x - mM) / mN;
        mS += (x - mM) * (x - nextM);
        mM = nextM;
    }

    public double mean() {
        return mM;
    }

    public double varianceUnbiased() {
        return mN > 1 ? mS / (mN - 1) : 0.0;
    }
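To see the estimator in action, here's a minimal sketch that wraps the methods above in the OnlineNormalEstimator class named earlier (the main method and sample data are my additions, not LingPipe code):

```java
public class OnlineNormalEstimator {

    private long mN = 0L;
    private double mM = 0.0;   // running mean
    private double mS = 0.0;   // running sum of squared deviations

    public void handle(double x) {
        ++mN;
        double nextM = mM + (x - mM) / mN;
        mS += (x - mM) * (x - nextM);
        mM = nextM;
    }

    public double mean() {
        return mM;
    }

    public double varianceUnbiased() {
        return mN > 1 ? mS / (mN - 1) : 0.0;
    }

    public static void main(String[] args) {
        OnlineNormalEstimator est = new OnlineNormalEstimator();
        for (double x : new double[] { 2.0, 4.0, 6.0, 8.0 })
            est.handle(x);
        System.out.println(est.mean());             // 5.0
        System.out.println(est.varianceUnbiased()); // 20/3 ≈ 6.667
    }
}
```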

There’s also a link in the post to John Cook’s blog, which talks about the recursive formulas some more.

http://anyall.org/blog/2008/11/calculating-running-variance-in-python-and-c/

I actually seem to recall that there are simple recursive formulas you can use for the mean and variance, but the above approach should work ok, I think.
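For reference, the recurrences that the handle() method above implements are (writing m for the running mean and S for the running sum of squared deviations, as in the code):

    m_n = m_{n-1} + (x_n - m_{n-1}) / n
    S_n = S_{n-1} + (x_n - m_{n-1}) * (x_n - m_n)

with varianceUnbiased() returning S_n / (n - 1). So the code is exactly those recursive formulas.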

The place I want(ed?) to use this is in scaling feature values to z-scores to improve the numerical stability of stochastic gradient descent:

z(x) = (x - mean) / sqrt(variance)

As variance approaches zero, the z-scores diverge. We’re just tossing out zero-variance dimensions (no way to estimate coefficients in that case anyway).
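A sketch of that scaling (my own code, not from LingPipe; the class name, method name, and MIN_VARIANCE threshold are all made up for illustration):

```java
// Standardize one feature dimension to a z-score, tossing out
// (near-)zero-variance dimensions by mapping them to 0.0.
public class ZScore {

    static final double MIN_VARIANCE = 1e-12; // threshold is an arbitrary choice

    static double z(double x, double mean, double variance) {
        if (variance < MIN_VARIANCE)
            return 0.0; // zero-variance dimension: dropped
        return (x - mean) / Math.sqrt(variance);
    }

    public static void main(String[] args) {
        System.out.println(z(12.0, 10.0, 4.0)); // (12 - 10) / 2 = 1.0
        System.out.println(z(42.0, 42.0, 0.0)); // 0.0
    }
}
```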

That involves walking over the corpus of vectors once, collecting the sum of x and x**2, or twice, collecting first the sum of x and then the sum of (x - mean)**2.
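A sketch (my own) of those two batch strategies for a single dimension:

```java
// Two batch ways to compute the unbiased variance of a sample.
public class BatchVariance {

    // One pass: collect sum(x) and sum(x^2), then use
    // (sum(x^2) - n*mean^2) / (n - 1).
    static double onePass(double[] xs) {
        double sum = 0.0, sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / xs.length;
        return (sumSq - xs.length * mean * mean) / (xs.length - 1);
    }

    // Two passes: compute the mean first, then sum((x - mean)^2).
    static double twoPass(double[] xs) {
        double sum = 0.0;
        for (double x : xs)
            sum += x;
        double mean = sum / xs.length;
        double ss = 0.0;
        for (double x : xs)
            ss += (x - mean) * (x - mean);
        return ss / (xs.length - 1);
    }

    public static void main(String[] args) {
        double[] xs = { 2.0, 4.0, 6.0, 8.0 };
        System.out.println(onePass(xs)); // 20/3 ≈ 6.667
        System.out.println(twoPass(xs)); // 20/3 ≈ 6.667
    }
}
```

On well-behaved data the two agree; the difference only shows up when the mean is large relative to the spread.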

Don’t we get the same problem with the terms in the usual definition that uses (x - mean)**2? In fact, the only way we’d seem to get catastrophic cancellation (that is, the difference between two close numbers losing all or most of its arithmetic precision) is when these terms are all near zero.
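To make the cancellation concrete, here's a small numeric illustration (mine, not from the post): the sum-of-squares formula applied to the same data with and without a large shift. The true unbiased variance is 30 in both cases, since variance is shift-invariant.

```java
// Demonstrates catastrophic cancellation in the one-pass
// (sum(x^2) - n*mean^2) / (n - 1) formula when the mean is large
// relative to the spread. Data is {4, 7, 13, 16}, optionally + 1e9.
public class CancellationDemo {

    static double sumOfSquaresVariance(double[] xs) {
        double sum = 0.0, sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / xs.length;
        return (sumSq - xs.length * mean * mean) / (xs.length - 1);
    }

    public static void main(String[] args) {
        double[] small = { 4.0, 7.0, 13.0, 16.0 };
        double[] shifted = { 1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16 };
        System.out.println(sumOfSquaresVariance(small));   // 30.0
        // x^2 is near 1e18, so its double representation is only good to
        // hundreds; the subtraction wipes out the answer (it can even
        // come out negative):
        System.out.println(sumOfSquaresVariance(shifted)); // nowhere near 30
    }
}
```

The (x - mean)**2 terms don't have this problem: each subtraction happens before squaring, so no precision has been thrown away yet when the close numbers are differenced.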

Is there a better way to compute this? How do packages like R compute this kind of thing (probably lots of ways, but I just meant the built-in)?

More generally, is there some kind of textbook presentation of this kind of thing I can read?

http://en.wikipedia.org/wiki/Variance#Computational_formula_for_variance
