When You Hear Hoofbeats…

I’ve been learning C++ and feeling like a total newbie. I spent about half an hour debugging a “forgot to put a semicolon at the end of a class definition” bug and several hours on a “virtual functions must be defined, not just declared, in abstract base classes” bug. Oddly, I sorted out my templating/overloading bug really quickly.
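For readers who haven't met these two horses yet, here is a minimal sketch of both fixes. The class names are hypothetical. The idea it assumes: a virtual function that is declared but not marked pure (`= 0`) must be defined somewhere, and with GCC the missing definition tends to show up as a link-time complaint about the class's vtable rather than about the function itself.

```cpp
#include <string>

class Animal {
public:
    virtual ~Animal() = default;

    // Declared virtual but NOT pure (no "= 0"), so it needs a definition.
    // Leave the definition out and the program typically fails to link;
    // GCC, for example, tends to report an undefined reference to the
    // vtable for Animal instead of naming sound() directly.
    virtual std::string sound() const;
};  // A class definition ends with a semicolon. Forget it and the compiler
    // usually reports an error on some later, unrelated-looking line.

// The out-of-class definition that lets the program link.
std::string Animal::sound() const { return "hoofbeats"; }

class Horse : public Animal {
public:
    std::string sound() const override { return "neigh"; }
};
```

Calling `sound()` through an `Animal&` that refers to a `Horse` dispatches to `Horse::sound()`, while a plain `Animal` uses the base-class definition.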

If you hear hoofbeats…

This got me to thinking about my old friend, Hunt and Thomas’s Pragmatic Programmer book. One of their pieces of advice for debugging is:

If you hear hoofbeats, think horses, not zebras.

(Of course, this isn’t new: Wikipedia’s Zebra (medicine) page explains that “zebra” is medical slang for a surprising diagnosis, and it attributes the aphorism to Theodore Woodward in the 1940s.)

Uh, it’s not working

It dawned on me that the problem newbies have is that we’ve never seen a horse, so when we hear hoofbeats, we have no idea what to expect. A better analogy is that I grew up in the American wild west and recently moved to Africa, so the sounds are vaguely familiar in one sense (animals, birds, water) yet entirely different in another (what’s a parrot?). But my instincts are pretty far off. Who’d have thought you needed to define a virtual method you’d declared in a base class?

I found some online resources pretty helpful. Ironically, the most useful were threads where clueless newbs posted their error messages without their code and wizened sages conjectured about possible causes. I can diagnose error messages like that in my sleep in Java, but I have no clue at all when it comes to C++.

Please build the Hoofbeats web site

I found sites like StackOverflow pretty useless for the problems I was having. That’s because they’re indexed, sensibly enough, by content rather than by error message.

What I want is a reverse index where you type in your error message and get back a list of possible causes.

You could even set it up like a community site. For instance, if your error message was “hoofbeats”, people could add possible causes like “horses” (972 votes [let’s be optimistic]), “zebras” (112 votes, qualified by “likely only if you’re in Africa or at a zoo”), and “antelope” (22 votes).
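A minimal sketch of the proposed reverse index, assuming the simplest possible design: an exact-match map from error-message text to community-suggested causes, returned most-voted first. `Cause` and `HoofbeatsIndex` are hypothetical names. A practical version would presumably need to normalize messages first (strip file names, line numbers, and template arguments) before looking them up, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One community-suggested cause for an error message.
struct Cause {
    std::string description;
    int votes;
    std::string qualifier;  // optional caveat, e.g. "likely only if you're in Africa or at a zoo"
};

// Hypothetical reverse index: error message text -> candidate causes.
class HoofbeatsIndex {
public:
    void add_cause(const std::string& error, Cause cause) {
        index_[error].push_back(std::move(cause));
    }

    // Causes for an exact error message, most-voted first; empty if unknown.
    std::vector<Cause> lookup(const std::string& error) const {
        auto it = index_.find(error);
        if (it == index_.end()) return {};
        std::vector<Cause> causes = it->second;
        std::stable_sort(causes.begin(), causes.end(),
                         [](const Cause& a, const Cause& b) { return a.votes > b.votes; });
        return causes;
    }

private:
    std::map<std::string, std::vector<Cause>> index_;
};
```

Loading the hoofbeats example from above (horses 972, zebras 112, antelope 22) and looking up `"hoofbeats"` returns horses, then zebras, then antelope, regardless of the order in which they were added.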

Pretty please. Not just for me, for humanity.

14 Responses to “When You Hear Hoofbeats…”

  1. Alex Says:

    Searching Google for error messages often turns up the right clue. Unfortunately, search engines filter out many symbols that are important in programming. It’d be great if they didn’t do that when they encounter discussions of programming problems.

    • Bob Carpenter Says:

      I certainly do that. Do you know the great XKCD tech support cheat sheet?

      The problem is that most of what rises to the top of the Google results list is automated build-failure reports from large projects. After that, there are scads of annoying ad-driven sites that barrage you with advertisements zinging all over the place. As I mentioned above, the sites where people do other people’s homework seem the best.

      The C++ FAQ is great, but doesn’t go down to the level of compiler or linker errors.

  2. Dr. Jochen L. Leidner Says:

    The trouble with C++ is that it evolved more like a disease (sorry for the analogy, Bjarne) than like a planned language, so it’s not compiler friendly. As a formal language it’s not LALR(1); I believe people have even demonstrated that it has non-context-free elements.

    What this formal-language complexity means in practice is that nobody can understand C++ compilers’ error messages, and the compilers are bad at recovering. In the early days, STL on top of the core language made things even more mysterious. It took more than 15 years to get compilers to assign the same meaning to a piece of template code (Java developers: read “generics”), and we’re still waiting for hash tables [to be added to the ISO standard].

    At surface value, Java and C++ look very much alike, but that is an illusion – the meaning and machine model are radically different.
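One concrete way to see why a C++ parser can't work from the grammar alone (an illustration chosen for this edit, not one from the comment): the statement `T * x;` is a declaration when `T` names a type and a discarded multiplication when `T` names a variable, so the parser has to know what `T` refers to before it can decide. The namespace and function names are hypothetical.

```cpp
namespace t_is_a_type {
    struct T { int value = 1; };

    int demo() {
        T t;
        T * x;            // declaration: x is a pointer to T (uninitialized here)
        x = &t;
        return x->value;
    }
}

namespace t_is_a_variable {
    int T = 6;

    int demo() {
        int x = 7;
        T * x;            // same tokens, but now an expression statement:
                          // computes 6 * 7 and throws the result away
        return T * x;
    }
}
```

Compilers will typically warn about the unused result in the second version, but both compile; `t_is_a_type::demo()` returns 1 and `t_is_a_variable::demo()` returns 42.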

  3. Keith Trnka Says:

    Might be nice to have a diagnostic site like you’re suggesting. If I had to compare C++ and Java error messages, I’d say that (on the surface), C++ error messages normally have nothing to do with the piece of code you’re missing. So you end up learning the error messages as a sort of language of their own.

  4. Bob Carpenter Says:

    @Dr. Leidner: Indeed. I believe the polite term is “federation of languages”, or at least that was Scott Meyers’ first point in Effective C++.

    I do like the execution model, though. It’s just so tight compared to Java’s. It’s sort of like riding in a Formula One car compared to the Mercedes sedan that is Java.

    @Keith Trnka: I think this is a problem with many compilers. You just don’t get the error until you’re well past where the conceptual error was. It just seems that C++ is even worse, with all the expansion going on through both the basic preprocessor and templates.

    The worst time I’ve had debugging Java error messages was with generics, but that’s not so much a problem with the error messages as with the convoluted form of Java generics (which was largely the result of the heroic backward-compatibility retrofit through which they were introduced).

  5. Tom Emerson Says:

    Some compilers are definitely better than others in this regard. If you can, try building your code in different environments. Or switch compilers altogether while you’re still using training wheels: clang is well known for the quality of its diagnostics, for example.

  6. Keith Trnka Says:

    Forgive me if I’m assuming too much, but in the initial paragraph it seems like you want virtual functions to replace abstract methods in Java, when what you probably want are pure virtual functions (with the funky = 0 at the end of the prototype). In this sense I definitely agree that C++ gives you a lot more than Java, because you can control whether the version of a member function that gets called is determined by the variable type or by the object type.

    • Bob Carpenter Says:

      Exactly. They’re now funky = 0-ed to be “pure virtual”. I was just used to Java, where abstract methods have the same syntax as interface methods.
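A minimal sketch of Keith's point, with hypothetical class names: `= 0` makes `area()` pure virtual (the nearest analogue of a Java abstract method) and makes `Shape` abstract; a virtual member is chosen by the object's dynamic type; a non-virtual member is chosen by the static type of the variable or reference you call it through.

```cpp
#include <string>
#include <type_traits>

class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;                       // pure virtual: makes Shape abstract
    virtual std::string name() const { return "shape"; }   // virtual: picked by the object type
    std::string kind() const { return "shape"; }           // non-virtual: picked by the variable type
};

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
    std::string name() const override { return "square"; }
    std::string kind() const { return "square"; }          // hides Shape::kind, does not override it

private:
    double side_;
};

// Shape cannot be instantiated; Square can.
static_assert(std::is_abstract<Shape>::value, "Shape has a pure virtual function");
static_assert(!std::is_abstract<Square>::value, "Square overrides every pure virtual");
```

Through a `const Shape&` bound to a `Square`, `name()` returns `"square"` but `kind()` returns `"shape"`; calling `kind()` on the `Square` object directly returns `"square"`.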

  7. Mark J Says:

    If you want an experience that is both sad and scary, take a look at the plans for C++0x, the “next generation” C++. The language and the standard library are set to become much more complicated. This reminds me of what happened to Common Lisp: version 1 was pretty easy to master, but version 2 incorporated so many extensions that it became impossible for people with a day job to learn all of it. Also, it seems that some of the C++0x “improvements” may be seriously broken. Sigh.

  8. derenrich productions » Towards Better Support Says:

    […] recently read an interesting proposal for a new kind of programming support site. Something that could improve things like StackOverflow. […]

  9. Denzel Says:

    @Bob, did you ever mention your motivation for learning/using C++? I have been using Java for normal software development and Matlab for prototyping, but never thought about going back to C/C++. I will be learning Python, though, as it is used more in bioinformatics nowadays.

    • Bob Carpenter Says:

      Scalability and the availability of solid statistical and matrix libraries. Anything that looks efficient in Python or Matlab is written behind the scenes in Fortran or C/C++. I imagine this is why C/C++ is quite a bit more popular in both stats and large-scale machine learning than Java.

      The application I’m working on is scalable Bayesian inference, for which we’re using Hamiltonian Monte Carlo [the link is to MacKay’s book chapter and has Octave source code]. I’m using automatic differentiation based on C++ templating (you can do it in Lisp-like languages and C++, but not Java) to compute the gradients. The PyMC developers tell me they’re working on something similar; we’re sharing some benchmark data with them, which will allow us to test each other’s solutions, too.
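To make the templating point concrete, here is a minimal forward-mode sketch using dual numbers. It is an illustration under simplifying assumptions, not the implementation described above, which may work quite differently (gradients over many parameters are usually computed in reverse mode). `Dual`, `f`, and `df` are hypothetical names. The key idea is that `f` is written once as a template, so the same source can be instantiated for plain `double`s or for `Dual`s that carry a derivative along with each value; Java, lacking operator overloading, can't express this as directly.

```cpp
#include <cmath>

// A dual number: a value together with its derivative with respect to one input.
struct Dual {
    double val;  // f(x)
    double dot;  // f'(x)
};

// Sum rule and product rule.
inline Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
inline Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.dot * b.val + a.val * b.dot}; }

// Chain rule for exp; found by argument-dependent lookup when T is Dual.
inline Dual exp(Dual a) {
    double e = std::exp(a.val);
    return {e, e * a.dot};
}

// Written once as a template: f(x) = x * x + exp(x).
template <typename T>
T f(T x) {
    using std::exp;  // plain doubles use std::exp; Dual uses the overload above
    return x * x + exp(x);
}

// Derivative of f at x0: seed the input's derivative with 1 and read off .dot.
inline double df(double x0) { return f(Dual{x0, 1.0}).dot; }
```

For example, `df(1.0)` evaluates 2x + e^x at x = 1, giving 2 + e, with no symbolic manipulation and no finite-difference step size to tune.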

  10. Matthew H. Says:

    From the Zebra (medicine) page:

    In making the diagnosis of the cause of illness in an individual case, calculations of probability have no meaning. The pertinent question is whether the disease is present or not. Whether it is rare or common does not change the odds in a single patient. … If the diagnosis can be made on the basis of specific criteria, then these criteria are either fulfilled or not fulfilled. — A. McGehee Harvey, James Bordley II, Jeremiah Barondess

    I know people are really bad at thinking about probabilities explicitly, and that this probably wouldn’t be a useful way to come up with a diagnosis. But this is still sort of terrifying. They don’t think the frequency distribution of the conditions is relevant _at all_?! Maybe they just misspoke and simply meant that what matters is the posterior probability you end up with. Yeah, that must be it.

    • Bob Carpenter Says:

      I agree that the Harvey et al. quote is terrifying. And wrong.

      I suspect they’re muddled by the frequentist philosophy of probability, wherein probabilities are only defined for events with a potentially unbounded number of replications. The early-20th-century statistician-philosophers nearly ruined statistics!
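Matthew's posterior point can be made concrete with Bayes' rule. The sketch below assumes just two exhaustive causes (zebra or horse) and uses made-up numbers purely for illustration: with evidence that is 99% likely under a zebra and 5% likely under a horse, a 1-in-1000 prior on zebras gives a posterior of about 2%, while a 1-in-10 prior gives about 69%. Same evidence, different base rates, very different answers. `posterior_zebra` is a hypothetical name.

```cpp
// Bayes' rule with two mutually exclusive, exhaustive causes:
//   P(zebra | E) = P(E | zebra) P(zebra)
//                  / [P(E | zebra) P(zebra) + P(E | horse) P(horse)]
double posterior_zebra(double prior_zebra,
                       double p_evidence_given_zebra,
                       double p_evidence_given_horse) {
    double prior_horse = 1.0 - prior_zebra;  // assumption: only zebras or horses
    double joint_zebra = p_evidence_given_zebra * prior_zebra;
    double joint_horse = p_evidence_given_horse * prior_horse;
    return joint_zebra / (joint_zebra + joint_horse);
}
```

`posterior_zebra(0.001, 0.99, 0.05)` comes out near 0.019 and `posterior_zebra(0.1, 0.99, 0.05)` near 0.69, which is the whole argument for thinking horses first.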
