JDK 1.6 for Number Crunching

OK, the next time Martin Jansche and Chris Manning independently recommend something, I'm going to listen. I was relaxing this weekend after our Friday NIH grant submission by playing around with training some Gaussian mixture models with EM. As we all know, Gaussian estimates involve transcendental-style calculations (square roots and exponentials), and EM applies them in a tight loop. I plugged in the 1.6 JDK (there's now a release candidate; the end of 1.4 may be in sight), and EM per-iteration times went down 40%. This is running 32-bit on my 3GHz Pentium 4 at home in -server mode.
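To make the "tight loop" concrete, here's a minimal sketch of the E-step for a one-dimensional Gaussian mixture. This is not the code I was timing; the class and method names (`GaussianEStep`, `density`, `responsibilities`) are made up for illustration, and it assumes univariate components and skips the log-space arithmetic you'd want to guard against underflow. The point is just that every data point times every component costs one `Math.exp` and one `Math.sqrt`, every iteration.

```java
public class GaussianEStep {

    // Univariate Gaussian density: one exp and one sqrt per call.
    public static double density(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-0.5 * diff * diff / variance)
            / Math.sqrt(2.0 * Math.PI * variance);
    }

    // E-step: result[i][k] is the posterior probability that xs[i]
    // came from component k, given the current parameters.
    // Sketch only: a point far from every mean can underflow to 0/0.
    public static double[][] responsibilities(double[] xs,
                                              double[] weights,
                                              double[] means,
                                              double[] variances) {
        double[][] result = new double[xs.length][weights.length];
        for (int i = 0; i < xs.length; ++i) {
            double total = 0.0;
            for (int k = 0; k < weights.length; ++k) {
                result[i][k] = weights[k] * density(xs[i], means[k], variances[k]);
                total += result[i][k];
            }
            // Normalize so the responsibilities for each point sum to 1.
            for (int k = 0; k < weights.length; ++k) {
                result[i][k] /= total;
            }
        }
        return result;
    }
}
```

With N points and K components that's N × K calls to `exp` and `sqrt` per EM iteration, so how fast the JVM runs those two methods plausibly accounts for much of the run time.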

And the memory footprint was smaller with less dynamic resizing; this app doesn’t generate a lot of small objects, so I don’t know what’s up with this behavior. Maybe 1.6 will be a lot faster in other areas, too. I hear they’re still tuning the GC.

For LingPipe, 1.6 will provide a huge speed boost for all of the dynamic estimators, which compute logs for each estimate. This includes language models, taggers, chunkers and classifiers. Note that this is estimation time, not training time; training doesn't involve transcendental operations for our models, which is one of the reasons we chose them. But it will make compilation of the models faster, as compilation just runs all the possible estimates it can precompile.
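Here's a toy sketch of the dynamic-versus-compiled distinction. It is not LingPipe's API; the class `CompiledEstimatorSketch` and its methods are hypothetical, and it assumes a plain maximum-likelihood unigram model with no smoothing. Training only bumps counts, the dynamic estimate takes a log on every call, and compilation runs every estimate once and stores the results.

```java
import java.util.HashMap;
import java.util.Map;

public class CompiledEstimatorSketch {

    private final Map<String, Integer> counts = new HashMap<String, Integer>();
    private long total = 0;

    // Training is just counting: no logs, no exps.
    public void train(String symbol) {
        Integer count = counts.get(symbol);
        counts.put(symbol, count == null ? 1 : count + 1);
        ++total;
    }

    // Dynamic estimate: one Math.log call per request.
    public double dynamicLogEstimate(String symbol) {
        Integer count = counts.get(symbol);
        if (count == null) {
            return Double.NEGATIVE_INFINITY; // unseen symbol, no smoothing in this sketch
        }
        return Math.log((double) count / total);
    }

    // Compilation: run every estimate once and cache the logs,
    // so later lookups need no transcendental operations at all.
    public Map<String, Double> compile() {
        Map<String, Double> compiled = new HashMap<String, Double>();
        for (String symbol : counts.keySet()) {
            compiled.put(symbol, dynamicLogEstimate(symbol));
        }
        return compiled;
    }
}
```

Compilation is one `Math.log` call per symbol, so a faster log makes it faster too, while lookups in the compiled map don't touch `Math.log` at all.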

All I can say is, wow. Thanks, Sun.
