“select” Isn’t Broken; or Horses, not Zebras


Hunt and Thomas’s book The Pragmatic Programmer is my first recommendation to someone starting to think about the process of programming. It’s organized into a series of tips, of which the relevant one for today’s discussion is:

Number 26. “select” Isn’t Broken

It is rare to find a bug in the OS or the compiler, or even a third-party product or library. The bug is most likely in the application.

They tell a story about a stubborn colleague who insisted that the select system call (I/O multiplexing, a concept later brought into Java with the nio package) was broken under Solaris. After weeks of burying his head in the sand, he finally read the documentation and found the fix in minutes. Hunt and Thomas’s advice in the text is “if you see hoof prints, think horses, not zebras”. This is actually a good operating principle. Just don’t rule out zebras if you happen to be in Africa or at a zoo, or if you’re a patient on the TV show House, M.D.

While I appreciate Hunt and Thomas’s faith in OSes, compilers and third-party libraries like LingPipe, the fact is, today’s complex compilers and libraries are more often the root of problems than the ANSI C compiler was back in the day.

Let me enumerate problems we’ve had with Java that have affected Alias-i or LingPipe:

  • 2GB file transfer limit under Windows with java.nio. This is a known bug. It forced me to rewrite my backup scripts when I recently backed up all of Alias-i’s data. The same scripts had worked fine for years at home backing up photos and music.
  • JDK 1.4 Signature error for XML. This was another known bug. They missed an exception in one of the more rarely used methods, but it messed up all of LingPipe’s XML filters. I had to catch arbitrary exceptions, check them for type using instanceof, and then rethrow in order to get compatibility across versions.
  • 1.6 Generics Compilation. Another known bug. Here the 1.6 compiler can’t handle complex conjunctive, dependent type declarations.
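The catch, check, and rethrow workaround for the signature bug looks roughly like the sketch below. This is not LingPipe's actual filter code: the names CompatRethrow, Action and runCompat are hypothetical, and IOException merely stands in for whichever exception the method signature failed to declare.

```java
import java.io.IOException;

class CompatRethrow {

    // Hypothetical stand-in for a library call whose declared
    // exceptions differ from one JDK version to the next.
    interface Action {
        void run() throws Exception;
    }

    // Catch broadly, then narrow by runtime type, so callers see the
    // same exception types whatever the library signature declares.
    static void runCompat(Action action) throws IOException {
        try {
            action.run();
        } catch (Exception e) {
            if (e instanceof IOException) {
                throw (IOException) e;       // the exception we expected
            }
            if (e instanceof RuntimeException) {
                throw (RuntimeException) e;  // pass unchecked ones through
            }
            // Any other checked exception is unexpected here; wrap it.
            throw new IllegalStateException("unexpected checked exception", e);
        }
    }

    public static void main(String[] args) {
        try {
            runCompat(() -> { throw new IOException("simulated failure"); });
        } catch (IOException e) {
            System.out.println("caught " + e.getMessage());
        }
    }
}
```

It's ugly, but it keeps the calling code identical across JDK versions, which was the whole point.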

I may be missing something, but that’s a non-trivial number of times it was Java’s fault. If you don’t trust me, check out the Java Bug Parade. In fact, in looking up the link, I found that:

  • File.deleteOnExit() doesn’t work on open files. Some of our temp files never do get cleaned up under Windows. I only just realized this was Java’s or Windows’s fault and not mine while looking at the Top 25 Java Bugs (most of which have to do with Swing).
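One way to sidestep the open-file problem is to not rely on deleteOnExit() at all: close the stream first, then delete the file explicitly. The sketch below assumes a modern JDK with java.nio.file; the class and method names (TempFileCleanup, useAndDelete) are made up for illustration and this is not what LingPipe does.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class TempFileCleanup {

    // Writes scratch data to a temp file and removes the file before
    // returning. The path is returned so a caller can confirm it is gone.
    static Path useAndDelete(byte[] data) throws IOException {
        Path tmp = Files.createTempFile("scratch", ".tmp");
        try {
            // try-with-resources closes the stream before deletion is
            // attempted, so no open handle blocks the delete.
            try (OutputStream out = Files.newOutputStream(tmp)) {
                out.write(data);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path gone = useAndDelete(new byte[] {1, 2, 3});
        System.out.println("still exists? " + Files.exists(gone));
    }
}
```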

The best-known bug reported in Java recently was Josh Bloch’s bombshell about binary search:

  • Binary Search is Broken for large arrays. Josh tells the story better than me. (The source of the bug, Jon Bentley’s book Programming Pearls, is also one of my top recommendations for people who want to think like developers, not theory professors [like me].)
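For anyone who hasn't seen it, the bug is in the midpoint calculation: adding the low and high indices can overflow an int once the array is large enough. Here is a minimal sketch of the failure and two standard fixes. The class and method names are mine, not the JDK's.

```java
class MidpointOverflow {

    // Midpoint as written in the buggy version: low + high can exceed
    // Integer.MAX_VALUE and wrap around to a negative number.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    // Overflow-safe: high - low cannot overflow when 0 <= low <= high.
    static int safeMid(int low, int high) {
        return low + (high - low) / 2;
    }

    // Alternative fix: the unsigned shift reads the wrapped sum as an
    // unsigned value, so the result is the true midpoint.
    static int shiftMid(int low, int high) {
        return (low + high) >>> 1;
    }

    public static void main(String[] args) {
        int low = 1 << 30;           // 1,073,741,824
        int high = (1 << 30) + 2;    // low + high is larger than 2^31 - 1
        System.out.println(naiveMid(low, high));  // negative: useless as an index
        System.out.println(safeMid(low, high));   // 1073741825
        System.out.println(shiftMid(low, high));  // 1073741825
    }
}
```

The naive version only misbehaves once the two indices sum past 2^31 - 1, which is why it went unnoticed for so long.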

When I was working at SpeechWorks, we had lots of multi-threaded dialog/speech recognition code which worked fine on every platform but Linux, where threads would just die. I don’t even recall what the workaround was there. And the number of times bugs were traced to C++’s Standard Template Library led the company to institute a no-C++-in-the-interface policy. It was just too non-portable.

The Latest Debugging Story

Now that I’ve probably lost most of the readers, I’ll confess that the recent perplexing problem with TF/IDF classification was indeed my fault and not Java’s. The unit tests were right; I’d spent hours building them by hand with pencil and paper. The implementation was wrong. I’d swapped two internal arrays, one holding term IDFs and one holding document TF/IDFs. It just coincidentally turned out the two arrays had the same values for “a” and “b” under the 1.5 iterator order, but not under the 1.6 iterator order. I hate dealing with problems like this where the compiler can’t catch the bug. A larger unit test almost certainly would’ve caught this problem.
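To see how a swap like this can hide, here is a toy sketch. It is not LingPipe's code: termWeights is a hypothetical function that divides each document TF/IDF value by the term's IDF, chosen only because the operation is not symmetric in its two arguments. Both parameters are double[], so the compiler accepts them in either order.

```java
import java.util.Arrays;

class SwappedArrays {

    // Toy stand-in: per-term weight = document TF/IDF divided by term IDF.
    // Both parameters share a type, so swapping them still compiles.
    static double[] termWeights(double[] termIdfs, double[] docTfIdfs) {
        double[] weights = new double[termIdfs.length];
        for (int i = 0; i < weights.length; i++) {
            weights[i] = docTfIdfs[i] / termIdfs[i];
        }
        return weights;
    }

    public static void main(String[] args) {
        double[] idfs = {2.0, 4.0};

        // Tiny test: the arrays happen to hold the same values, so a
        // swapped call gives the same answer and the bug is invisible.
        double[] coincident = {2.0, 4.0};
        System.out.println(Arrays.equals(
            termWeights(idfs, coincident),
            termWeights(coincident, idfs)));   // true

        // Test data with distinct values exposes the swap.
        double[] distinct = {6.0, 2.0};
        System.out.println(Arrays.equals(
            termWeights(idfs, distinct),
            termWeights(distinct, idfs)));     // false
    }
}
```

Wrapping each array in its own small class would let the compiler reject the swap outright, at the cost of a little ceremony.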

The 3.1.1 release patches the bug and is now available from the LingPipe Home Page.
