Author Archive

Canceled: Help Build a Watson Clone–Participants Sought for LingPipe Code Camp

January 10, 2014

Code Camp is canceled. We are late delivering a LingPipe recipes book to our publisher, and that will have to be our project for March. But we could have testers/reviewers come and hang out. Less fun, I think.




Dates: To be absolutely clear, the dates are 3/2/2014 to 3/31/2014 in Driggs, Idaho. You can work remotely; we will also be doing setup work beforehand.

New: We have set up a GitHub repository. URL is


Every year we go out west for a month of coding and skiing. Last year it was Salt Lake City, Utah; this year it is Driggs, Idaho, for access to Grand Targhee and Jackson Hole. The house is rented for the month of March and the project selected. This year we will create a Watson clone.

I have blogged about Watson before and I totally respect what they have done. But the coverage is getting a bit breathless with the latest billion-dollar effort. So how about assembling a scrappy bunch of developers and seeing how close we can come to recreating the Jeopardy! beast?

How this works:

  • We have a month of focused development. We tend to code mornings, ski afternoons. House has 3 bedrooms if you want to come stay. Prior arrangements must be made. House is paid for. Nothing else is.
  • The code base will be open source. We are moving LingPipe to the AGPL so maybe that license or we could just apache license it. We want folks to be comfortable contributing.
  • You don’t have to be present to contribute.
  • We had a very fun time last year. We worked very hard on an email app that didn’t quite get launched, but the lesson learned was to start with the project defined.

If you are interested in participating let us know at Let us know your background, what you want to do, whether you expect to stay with us, etc. No visa situations please, and we don’t have any funds to support folks. Obviously we have limited space, physically and mentally, so we may say no, but we will do our best to be inclusive. Step 1: transcribe some Jeopardy! shows.

Ask questions in the comments so all can benefit. Check comments before asking questions pls. I’ll answer the first one that is on everyone’s mind:

Q: Are you frigging crazy???

A: Why, yes, yes we are. But we are also really good computational linguists….


LingPipe Incubator Welcomes Seer

September 9, 2013

We have started an incubator program for start-ups with natural language processing (NLP) needs. Seer is our first egg. The company creates productivity apps based on unstructured data sources–that is where LingPipe fits in. The guys (Conall and Joe) fit a pattern that we see quite often: smart people with a good idea that presupposes NLP, but without much experience in it.

The GetSeer guys were on the ball right away because they had a bunch of data annotated and had cobbled together a regular expression based classifier that served as an excellent starting point. We had machine learning based classifiers up and running within two hours of starting. That included an evaluation harness. Nice.


The great thing about our working setup is that I get to teach/advise, which keeps my time commitment manageable, and they get to do most of the heavy lifting, which is how they learn to build and maintain NLP systems. I have had to learn some Scala, since I co-code most of the time when we work together and that is their implementation language. Scala is a hip extension of Java with a less verbose syntax and stronger type inference.

I’ll keep the blog updated with developments. Current status:

  • There is a small amount of training data.
  • Current results are awful (no surprise).
  • We have been roughing in the solution with Naive Bayes. Next steps will be opening a can of logistic regression for more interesting feature extraction.
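For readers new to the approach, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Java. This is an illustration of the technique only, not Seer's code and not the LingPipe implementation; all class and method names are invented, and the smoothing is the simplest possible (add-one).

```java
import java.util.*;

//Toy multinomial Naive Bayes with add-one smoothing (illustration only).
public class ToyNaiveBayes {
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocab = new HashSet<>();
    private int totalDocs = 0;

    public void train(String category, String text) {
        totalDocs++;
        docCounts.merge(category, 1, Integer::sum);
        Map<String, Integer> counts
            = tokenCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            vocab.add(token);
        }
    }

    public String classify(String text) {
        String best = null;
        double bestLogProb = Double.NEGATIVE_INFINITY;
        for (String cat : docCounts.keySet()) {
            //log prior
            double logProb = Math.log(docCounts.get(cat) / (double) totalDocs);
            Map<String, Integer> counts = tokenCounts.get(cat);
            int catTotal = 0;
            for (int c : counts.values()) catTotal += c;
            //log likelihood with add-one smoothing
            for (String token : text.toLowerCase().split("\\W+")) {
                if (token.isEmpty()) continue;
                int c = counts.getOrDefault(token, 0);
                logProb += Math.log((c + 1.0) / (catTotal + vocab.size()));
            }
            if (logProb > bestLogProb) {
                bestLogProb = logProb;
                best = cat;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyNaiveBayes nb = new ToyNaiveBayes();
        nb.train("sports", "goal match team");
        nb.train("tech", "server code deploy");
        System.out.println(nb.classify("new code deploy tonight")); //prints tech
    }
}
```

The real thing in LingPipe handles feature extraction, smoothing choices and evaluation far better; this sketch just shows why "roughing in" a classifier can take an afternoon rather than a month.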

YapMap: Breck’s Fun New Project to Improve Search

January 27, 2012

I have signed on as chief scientist at YapMap. It is a part-time position that grew out of my being on their advisory board for the past three years. Try the search interface for the forums below:

Automotive Forums

YapMap search for Low Carb Breakfast on Diabetes Daily

A screen shot of the interface:

UI for YapMap's search results

What I like about the user interface is that threads can be browsed easily–I have spent hours on remote-controlled airplane forums reading every post, because it is quite difficult to find relevant information within a thread. The color coding and summary views are quite helpful in eliminating irrelevant posts.

My first job is to get query spell checking rolling. Next is search optimized for the challenges of thread based postings. The fact that relevance of a post to a query is a function of a thread is very interesting. I will hopefully get to do some discourse analysis as well.

I will continue to run Alias-i/LingPipe. The YapMap involvement is just too fun a project to pass up given that I get to build a fancy search and discovery tool.


Peter Jackson has Passed Away

August 4, 2011

I am very sorry to convey that Peter Jackson has passed away. Below is an email sent to Thomson Reuters from their CTO.

Peter was a great friend, early customer and advisor to LingPipe. I will miss him dearly.



From James Powell, CTO Thomson Reuters

Dear Colleagues,

It is with great sadness that I must inform you that Peter Jackson, our Chief Scientist and head of R&D, died suddenly this morning.

Peter was a colleague of immense talents, both professionally and personally. He started his career in academia, and after a post as a Lecturer in Artificial Intelligence at Edinburgh University in Scotland he moved to the U.S. where he went on to become Associate Professor in the Department of Electrical and Computer Engineering at Clarkson University, NY.

A true global citizen who was born in Barbados and educated in England, Peter also served as Visiting Professor at the Artificial Intelligence Laboratory for the School of Electrical & Electronic Engineering at Singapore Polytechnic.

Peter switched from the corridors of academia to the meeting rooms of Rochester when he joined Lawyers Cooperative Publishing (LCP) back in 1995, as Lead Software Engineer for the Natural Language Group. LCP was acquired by West Group the following year.

His penetrating intelligence, allied to his outgoing and immediately likeable personality, ensured that Peter moved quickly through the ranks at Thomson; next in the West Group and then at Thomson Legal & Regulatory, where he rose to become Chief Research Scientist & Vice-President, Technology in 2005.

A lasting legacy

Peter was the obvious choice for the position of Chief Scientist when we became Thomson Reuters in 2008, and it was a role he fulfilled with exceptional results, forming a 40-strong R&D group as a corporate-wide resource, and aligning our Research & Development capabilities with the major platform and strategy initiatives of recent years.

Peter’s legacy at Thomson Reuters is significant and lasting. He oversaw the advanced technologies, such as CaRE, Concord and Results Plus, that enabled us to launch radical and successful new product platforms such as WestlawNext and PeopleMap. He was also integral to the development of our innovative Reuters Insider channel.

On top of that he represented Thomson Reuters with great professionalism — and considerable personality and panache — in the wider world of conferences, academia and professional associations. He was a great ambassador for Thomson Reuters in his role overseeing university liaison with regards to joint research projects with institutions of the caliber of MIT, NYU and CMU.

These achievements give some sense of Peter’s abilities and drive. Peter, though, had many more strings to his bow. He published three books and about 40 papers in his fields of expertise (artificial intelligence and natural language processing), and held four US patents.

Above all, when it came to his work, Peter was proudest of the group that he built and worked with side-by-side every day.

If this wasn’t enough for one man, he was also a talented musician and a serious music fan, serving for a number of years on the Board of Directors for the Saint Paul Chamber Orchestra. In recent times, he traveled to Chicago, New York and Ojai to support the SPCO on tour. Many people in Eagan will remember Peter’s “Jazzkickers” band playing at employee events around campus.

He even demonstrated his guitar playing prowess in this internal technology video filmed at Joe’s Pub in NYC earlier this summer.

The many of us who worked with Peter will miss greatly his enthusiasm, the clarity of his thinking, his wisdom and his wit.

He leaves behind his wife Sandy, also a former employee here. Peter was 62.

Sometimes I Wish NLP Systems Occasionally Blew Up

June 28, 2011

The consequences of a badly performing NLP (Natural Language Processing) system tend to be pretty low-key, low-drama events. Bad things may happen, but not in a way that will get noticed in the same way as: Rocket Go Boom

The failure of a rocket launch highly motivates the involved engineers to not have that happen again. Miss that named entity, blam! If only NLP programmers had such explicit negative feedback. I think the field would be better for it.

NLP Systems are Easier to Sell than Build

Customers get the potential value of advanced NLP/Text Analytics/etc in the same way that people get the potential value of space flight.

Space ships in artist's conception

The Dreaded Artist's Conception–picture credit NASA

It would be so cool to do sentiment analysis in low earth orbit! Sadly, the tremendous promise of the field is held back by a combination of overselling, under-delivering and lack of awareness of how to build good performing systems. What contributes to poor performance the most?

Be aware that you are selling to the best NLP systems out there: Humans

One of the greatest frustrations I face is severely underfunded projects. For the most part rockets get a much healthier dose of funding because people see the failures clearly and do not have a grasp of how rockets work. Not so much for NLP. Language processing is so easy for humans that it is like trying to sell cargo airplanes to eagles. They just don’t get what is hard, what is easy and the necessity of infrastructure. “Mr. Eagle, um, well we really need a runway to get the 20 tons of product into the air”. Mr. Eagle responds with “What are you talking about? I can take off, land and raise a family on a tree branch. Cargo planes are easy because flying is easy for me. So I will give you a fish to do the job.”

Don’t ask a banker from 1994 to understand your tweets

Another source of poor performance is reliance on general-purpose solutions that are not well suited or tuned to the domain. It is unrealistic to expect a named entity model to perform well on Twitter if its training data is the 1994 Wall Street Journal. Modules customized for the domain can make a huge difference in performance. But customization costs money, takes time and requires a different mindset.

Understand the problem clearly with hand-coded examples

The #1 bit of advice I give customers is to emulate what you expect the system to do by hand. Then show that to stakeholders to make sure the problem being solved addresses something your business cares about. Also, Mr. Eagle will much better appreciate the need for another solution after ferrying 100 pounds of product 1 pound at a time. By doing this you will have reduced the risk of failure by half, in my opinion.

NLP is hard not only because it is technically difficult, but because it seems easy for humans to do, so people under-appreciate the challenges. If systems blew up spectacularly we might have a better appreciation of that.


Programmers Expect Dog Like Consistency in Software, but NLP is more like Cats

June 21, 2011
Typical Startup Staffing Choices

Photo Credit hoangnam_nguyen at

From the perspective of a traditionally trained programmer, one of the hardest things to appreciate when working with Natural Language Processing (NLP) systems is the approximate, unpredictable quality provided by NLP modules. I liken it to getting dog people to learn to work with cats.

For example, a standard NLP module could decide whether “Bill Jones” is mentioned in the sentence “An announcement was made by Sir Bill Jones on the eve of his retirement…” In database terms the task is to satisfy the query “SELECT person FROM text” or a string function “detectPerson(sentence)”.

A range of answers is possible. They include:

  • Getting it right. “Bill Jones” is definitely a person in the sentence.
  • Getting it right, but hedging. “Bill Jones” has a 75% likelihood of being a person in the sentence. A person could also be “Sir Bill” with 15% likelihood or “Jones on” with likelihood 1% with some other low confidence estimates.
  • Getting it dead wrong. No “Bill Jones” person found at all. It turns out that the system had never seen “Sir” as an honorific in training and had other issues. Sorry, maybe next time — really the system gets a 95% F-measure so it won’t happen often.
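In code, the hedged case means handling a ranked list of candidates rather than a single string. Here is a toy sketch in plain Java; the Mention type and the thresholding policy are invented for illustration and are not any real NER API:

```java
import java.util.*;

public class HedgedMentions {
    //A hypothetical hedged result: a candidate span plus a confidence estimate.
    static class Mention {
        final String span;
        final double probability;
        Mention(String span, double probability) {
            this.span = span;
            this.probability = probability;
        }
    }

    //Keep only candidates at or above a confidence threshold, best first.
    static List<Mention> threshold(List<Mention> candidates, double minProb) {
        List<Mention> kept = new ArrayList<>();
        for (Mention m : candidates) {
            if (m.probability >= minProb) {
                kept.add(m);
            }
        }
        kept.sort((x, y) -> Double.compare(y.probability, x.probability));
        return kept;
    }

    public static void main(String[] args) {
        List<Mention> candidates = Arrays.asList(
            new Mention("Bill Jones", 0.75),
            new Mention("Sir Bill", 0.15),
            new Mention("Jones on", 0.01));
        for (Mention m : threshold(candidates, 0.10)) {
            System.out.println(m.span + " " + m.probability); //drops the 1% reading
        }
    }
}
```

Even this tiny sketch forces the calling code to pick a threshold and an ordering, which is exactly the extra decision making that dog-expecting programmers find uncomfortable.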

Obviously getting it right is what we want. The programmer throws the stick and the program dutifully fetches the information like a dog. Provide a pat on the head and some who’s-a-good-program positive reinforcement and all is well.

Getting it right but hedging is more of a problem. The existence of a probability complicates things a bit. What do you do with probabilities that are low or tied? What about close cases where the spans are overlapping? It’s like throwing a ball and getting a chew toy, a bit of a stick and some regurgitated food.

Getting it dead wrong is really a wrench in the works because it is just so friggin’ obvious that the right answer is sitting right there, pristine. How can the program ignore it? Just go get it, pick it up, and bring it back you stoooopid program. What is this 95% pedigree worth if it can’t handle that? Try playing catch with a cat and you will have exactly the right experience. But it is not the cat’s fault; you can’t expect a cat to act like a dog, but cats are a useful tool if you want to catch mice or amusingly pursue a laser pointer. Like NLP, cats have to be influenced to do what you want, not commanded.

Working with Cats

The challenges of NLP require that the developer work with the strengths of the technology and factor in the shortcomings. It is a different mindset that tends to require more complicated data structures. Also, multiple possible answers slow things down with extra decision-making loops. Systems often need extensive tuning to work with the capabilities of the cat^h^h^h NLP technology. Once that is mastered, the additional flexibility and tolerance of uncertainty of signal in data will result in a very different capability. Systems can degrade more gracefully, apply to more situations and provide a more human flexibility to text processing. But you’ve got to scratch the cat just the right way.

Breck (who loves dogs and cats)

P.S. If you want to get a start on NLP programming check out our tutorials.

IBM’s Watson and the State of NLP

June 14, 2011

Aditya Kalyanpur presented an overview of the Jeopardy!-winning Watson computer system June 6 at in New York for the New York Semantic Web Meetup. I was asked to present a three-minute overview of the state of Natural Language Processing (NLP). In this post I want to couch the Watson system in the context of the state of the art, since it didn’t make sense to do so at the meetup because I presented first.

The State of NLP According to Breck

Conveying the state of the art in three minutes is quite a challenge, so let’s run with an analogy to aviation for ease of comprehension. So where is NLP?

It Flies!

We have achieved the analog of basic powered flight. No doubt.

Yikes! and away

But in no sense have we gotten to this level of performance.

Amelia Earhart's Lockheed Vega

My best guess is that we are at the point of a reasonable commercial foundation as an industry, with some changes to come that we don’t know about yet, not unlike aviation in the mid-1920s. Perhaps the beginning of the Golden Age of NLP.

And in no sense are we in the reliable, high technology commercial space that modern air transport provides.

Boeing 777

Where does Watson Fit in the Analogy?

Watson fits perfectly in the example of the red 1928 Lockheed Vega above for the following reasons:

  • The Vega is actually Amelia Earhart’s plane, which was used to break records (crossing the Atlantic solo), generate publicity, and was a stunning success for a nascent industry.
  • While inspirational, the Vega’s success had little to do with advancing the underlying technology. What would I consider an advancement of technology? Frank Whittle patented the turbojet in 1930.
  • Watson shows how a 20-person team working for 4 years can win a very challenging game with skill, effort and daring, much in the same way that aviation records were broken with the same. Don’t think some careers were not on the line with the Watson effort–I think IBM ceased termination by firing squad in the 70’s, so Earhart had more on the line. But what are the prospects of a mid-level ex-IBM exec in today’s economy? Perhaps the firing squad would be a kindness.

But Watson is Playing a Game

There is one issue that seriously concerns me: Watson won a question answering game with the trivial twist that the answers must be phrased as questions. So the clue “First President of the US” is answered with “Who is George Washington?” But Watson is not a general-purpose question answering system. What is the difference?

Another analogy: The game of chess is based on medieval battles, but even though Deep Blue beats the best human players, one would never consider using Deep Blue to manage an actual battle. Real war is messy, approximate and without clear rules, which makes chess algorithms totally inappropriate.

Real world question answering has similar qualities to real war: messy, approximate and without clear rules. The game of Jeopardy! is based on the existence of a unique, easily understood and verified answer given the clue. Taking one of the examples from the talk:

In 1698, this comet discoverer took a ship called the Paramour Pink on the first purely scientific sea voyage

The correct “question” is “Who is Edmond Halley?”, of Halley’s Comet fame. The example is used to work through an impressive system diagram that resembles a well-developed model train set (thanks to Prof. Mark Steedman for the simile). Much is done to generate the correct answer while avoiding distractors like Peter Sellers from the Pink Panther movies. But run the same clue past Google with “-watson -jeopardy” appended to eliminate pages that discuss this publicized example, and the first result is Halley’s Comet Stamps, with the first sentence mentioning the correct answer.

There is still an impressive amount of work in extracting the correct name, but the answer was ready to be found exactly because it is a game: the answer is unambiguous, well known and well selected given the clue.

What does Real World Question Answering Look Like?

What kinds of questions have I approached a search engine with?

What is the current 30 year FHA mortgage rate?

This question is a disaster from the uniqueness-of-answer perspective. My initial search results were pretty low quality and did not provide accurate rate information for what I knew the answer to be.

When is it best to ski in Chile?

This went better. There was a FAQ on the first page of results but the answer just went on and on. “The season runs from mid-June to mid-October. Although every year is different, and it comes down to Mother Nature, the best time for dry powder is mid-June, July, August, and up to the 2nd week in September. After that,….” Again we have a non-unique answer because my question was not that specific in the first place.

What is the Reputation of LingPipe?

This is a question that a group of Columbia MBA students took on for us in their small business program which I recommend btw.

This question was hopeless in search because there is not a page out there that needs to be found with our reputation nicely summarized. Answering the question requires distillation across many resources even if information was restricted to web only.

Welcome to the real world, question answering is hell.

Where Might Watson Flourish Outside of Jeopardy! Tournaments?

Jeopardy! is a game of finding the uniquely obvious given indirect clues. Otherwise it is not a game that can be judged and played. What else in the world has this quality? The Watson team is now approaching medical diagnosis, which is a real-world use case that might match the Jeopardy! game format, with symptoms as clues and diagnoses as answers. Uniqueness is not guaranteed in diagnosis, but Watson can handle multiple answers. This is an area where computer systems from the 1970s, e.g. Mycin, outperformed experts, but they didn’t have an NLP component. Medical diagnosis, once symptoms are recognized, is a game-like problem.

In the end Watson is an engineering achievement, but in no way have the skills of a good reference librarian been replicated.

I came across an interesting article by Michael Lind on information technology and its role in productivity while writing this blog post. Interestingly he puts information technology in the same time bracket as I do.


Pro Bono Projects and Internships with LingPipe

April 4, 2011

We have initiated a pro bono program that teams a LingPipe employee with a programmer wanting to learn about LingPipe, to serve a non-profit or researcher needing help with a text analytics/NLP/computational linguistics project.

So far it is going pretty well, with one classification project underway. The way it went was that I worked with an intern, a solid NLP programmer, to help a Georgia Tech researcher get a classification system up and running. Now the intern is running the project with some feedback from me, but not much is required at this point.

We are taking applications for interns wanting to learn how to code with LingPipe. We also will take applications for projects. Non-profits or academic research only please. Just send an email outlining your skills/needs and we will see what we can do.


Processing Tweets with LingPipe #3: Near duplicate detection and evaluation

January 2, 2011


In Post #1 we covered how to search Twitter and get a useful on disk data representation of the search results. In Post #2 we covered the first level of deduplication using HashSets as the test of sameness. We extended sameness by doing some normalization of the tweets which included removing urls and retweets.
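The normalization from Post #2 amounts to something like the following regex-based sketch. This is a plain-Java approximation for illustration; the actual code in Post #2 differs, and the class name here is made up:

```java
public class NormalizeSketch {
    //Strip urls and retweet prefixes, then collapse whitespace,
    //so near-identical tweets reduce to the same string for HashSet dedup.
    public static String normalize(String tweet) {
        return tweet
            .replaceAll("https?://\\S+", "")       //remove urls
            .replaceAll("\\bRT\\b\\s*@\\w+:?", "") //remove prefixes like "RT @user:"
            .replaceAll("\\s+", " ")               //collapse runs of whitespace
            .trim();
    }

    public static void main(String[] args) {
        String tweet = "RT @mparent77772: Should Obama's 'internet kill switch' power be curbed?";
        System.out.println(normalize(tweet));
        //prints: Should Obama's 'internet kill switch' power be curbed?
    }
}
```

Two tweets that differ only by a retweet prefix or a shortened url then collide in the HashSet and the later one is discarded.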

You can download the code with this tarball or get it from subversion with the command

svn co

Despite the solutions above we still have annoyingly similar tweets that are not useful for my goals for this set of posts. In particular, the near duplicates really foul up any attempts at cross validation, where one part of the data is used as training and the other as test data. If there are duplicates then we are training on test data for some examples, and results end up looking too good.

Tokenization and Jaccard Distance

The duplicate detection problem in Twitter is really about word overlap with a slight game of telephone quality to it. The elaborations of previous tweets tend to be leading or following comments with the core of the source tweet preserved. Not much rephrasing is going on so word overlap between tweets is the obvious place to go. That entails a few elaborations to our approach:

  1. Find words in the tweets: Tokenization
  2. Measure similarity of tweets without sensitivity to tweet length: Jaccard Distance


A very simple tokenizer can just break on the characters that separate words, as follows:

public class Tokenize1 {
    //Approach #1: find the token separators
    public static void main(String[] args) {
        String tweet = "RT @mparent77772: Should Obama's 'internet kill switch' power be curbed?";
        System.out.println("Tweet: " + tweet);
        String[] tokens = tweet.split(" ");
        for (String token : tokens) {
            System.out.println("Token: " + token);
        }
    }
}

This code is in src/ It produces output:

> ant tokenize1
     [java] Tweet: RT @mparent77772: Should Obama's 'internet kill switch' power be curbed?
     [java] Token: RT
     [java] Token: @mparent77772:
     [java] Token: Should
     [java] Token: Obama's
     [java] Token: 'internet

Another approach would be to try and define what characters belong in a token and match those, but that is a little more complex to program because String does not have a simple method call to capture an array of regex matches–see the regex chapter in the LingPipe book for more discussion:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Tokenize2 {
    //Approach #2: define what is a token
    public static void main(String[] args) {
        String tweet = "RT @mparent77772: Should Obama's 'internet kill switch' power be curbed?";
        Pattern tokenPattern = Pattern.compile("\\w+"); //match one or more letter or digit chars
        System.out.println("Tweet: " + tweet);
        Matcher matcher = tokenPattern.matcher(tweet);
        while (matcher.find()) { //keep matching until no more matches
            System.out.println("Token: " +; //print what matched
        }
    }
}

This approach produces the following output:

> ant tokenize2
     [java] Tweet: RT @mparent77772: Should Obama's 'internet kill switch' power be curbed?
     [java] Token: RT
     [java] Token: mparent77772
     [java] Token: Should
     [java] Token: Obama
     [java] Token: s
     [java] Token: internet

Note that the text separating the tokens varies quite a bit. Between ‘RT’ and ‘mparent77772’ the separator is ‘ @’. Depending on the needs of the application, it can be easier to define tokens by what they look like rather than by what the spaces around them look like. Often it is both.

While these approaches might work for the case at hand we will introduce the LingPipe tokenizers instead. They offer much richer tokenization options ready to go–see chapter 8 of the LingPipe book draft for more details or look at the Java doc.

An equivalent tokenization in the LingPipe API is created as follows:

import com.aliasi.tokenizer.RegExTokenizerFactory;
import com.aliasi.tokenizer.TokenizerFactory;
import com.aliasi.tokenizer.Tokenizer;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Tokenize3 {
    public static void main(String[] args) {
        //Approach #3: a LingPipe tokenizer
        String tweet = "RT @mparent77772: Should Obama's 'internet kill switch' power be curbed?";
        TokenizerFactory tokFactory
            = new RegExTokenizerFactory("\\w+"); //match one or more letter or digit chars
        System.out.println("Tweet: " + tweet);
        char[] chars = tweet.toCharArray();
        Tokenizer tokenizer
            = tokFactory.tokenizer(chars, 0, chars.length);
        String token;
        System.out.println("White Space :'" + tokenizer.nextWhitespace() + "'");
        while ((token = tokenizer.nextToken()) != null) {
            System.out.println("Token: " + token);
            System.out.println("White Space :'" + tokenizer.nextWhitespace() + "'");
        }
    }
}

Its output reports both the white spaces and the tokens.

     [java] Tweet: RT @mparent77772: Should Obama's 'internet kill switch' power be curbed?
     [java] White Space :''
     [java] Token: RT
     [java] White Space :' @'
     [java] Token: mparent77772
     [java] White Space :': '
     [java] Token: Should

We add the normalization features of Post #2 by extending ModifyTokenTokenizerFactory to filter retweets and urls. A good way to develop a tokenizer is to put it into its own class and create a stand-alone task that runs the tokenizer over example data. In ‘src/’ there is a main method:

public static void main(String[] args) {
    TokenizerFactory tokFactory = new NormalizedTokenizerFactory();
    String text = "RT @mparent77772: Should Obama's 'internet kill switch' power be curbed?";
    System.out.println("Tweet: " + text);
    char[] chars = text.toCharArray(); //convert to a char array
    Tokenizer tokenizer
        = tokFactory.tokenizer(chars, 0, chars.length);
    String token;
    System.out.println("White Space :'" + tokenizer.nextWhitespace() + "'");
    while ((token = tokenizer.nextToken()) != null) {
        System.out.println("Token: " + token);
        System.out.println("White Space :'" + tokenizer.nextWhitespace() + "'");
    }
}

Running the above with ant normalizedTokenizerFactory produces nicely normalized output.

     [java] Tweet: RT @mparent77772: Should Obama's 'internet kill switch' power be curbed?
     [java] White Space :''
     [java] Token: Should
     [java] White Space :' '
     [java] Token: Obama
     [java] White Space :'''
     [java] Token: s
     [java] White Space :' ''
     [java] Token: internet

How that tokenizer functions is left as an exercise for the reader. Time to take on Mr. Jaccard.

Jaccard Distance as a Measure of Similarity

Jaccard Distance is a good way to impress folks with a fancy-sounding term that, in our usage, is really just percent word overlap. If you think it helps you can also call it by the term ‘coefficient de communauté’, but then you might have to reveal that it was popularized by a French botanist–kind of undermines its jargon impact score. From the Jaccard Javadoc, the proximity of two character sequences is:

   = size(termSet(cs1) INTERSECT termSet(cs2))
     / size(termSet(cs1) UNION termSet(cs2))
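For intuition, the same proximity can be computed with plain Java collections. This is a sketch with a simple \w+ tokenization, independent of the LingPipe classes; the class name is made up:

```java
import java.util.*;

public class JaccardSketch {
    //Proximity = |intersection| / |union| of the two token sets.
    public static double proximity(String cs1, String cs2) {
        Set<String> terms1 = termSet(cs1);
        Set<String> terms2 = termSet(cs2);
        Set<String> intersection = new HashSet<>(terms1);
        intersection.retainAll(terms2);
        Set<String> union = new HashSet<>(terms1);
        union.addAll(terms2);
        return union.isEmpty() ? 0.0 : intersection.size() / (double) union.size();
    }

    private static Set<String> termSet(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(proximity("a b c", "b c d")); //prints 0.5
    }
}
```

Because the measure is over sets, repeating a word changes nothing: proximity("high speed buses", "high speed high speed buses") is 1.0, which matters for near-duplicate tweets.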

We get the term sets from the tokenizer, and proximity is the percentage of tokens shared by both character sequences. The code to compute this for all pairs of tweets in our search results is in src/ and the relevant bit of code is:

	JaccardDistance jaccardD = new JaccardDistance(tokFactory);
	List<String> candidateTweets
	    = filterNormalizedDuplicates(texts,
					 tokFactory); //throw out easy cases
	System.out.println("Normalized duplicate filter leaves " + candidateTweets.size() + " tweets");
	for (int i = 0; i < candidateTweets.size(); ++i) {
	    String closestTweet = "default value";
	    double closestProximity = -1d;
	    String targetTweet = candidateTweets.get(i);
	    for (int j = 0; j < candidateTweets.size(); ++j) { //cross product, ouchy, ow ow
		String comparisonTweet = candidateTweets.get(j);
		double thisProximity
		    = jaccardD.proximity(targetTweet, comparisonTweet);
		if (i != j) { //can't match self
		    if (closestProximity < thisProximity) {
			closestTweet = comparisonTweet;
			closestProximity = thisProximity;
		    }
		}
	    }
	}
The goal of the above wildly inefficient program is to explore what the closest tweet is, as determined by token overlap. The filterNormalizedDuplicates(texts, tokFactory) call filters out duplicates as discussed in Post #2. We can then decide on a threshold to throw out tweets with too much overlap. Running the program on the Obama.csv example:

ant exploreJaccard -Ddata=searches/Obama.csv -Doutfile=runs/Obama.jaccard.csv

and then viewing the output .csv file with a sort on column B we get (click to see larger image):

Jaccard Distance sorted for similarity, click on for larger image

Note that we get some proximity values of 1 even though there are differences between Tweet1 and Tweet2. In row 2 the normalization filters out the url, leaving as the only difference that the phrase “replacing trains by high speed buses” in Tweet1 has an additional “high speed” in “replacing high speed trains by high speed busses” in Tweet2. Since Jaccard compares sets of words rather than word counts, the repeated phrase adds no new words to the set.

Scrolling down reveals just how bad redundancy can be with tweets. At the 50% overlap point the tweets are still looking quite similar:

	Vet, 77, Busted For Obama Death Threat | The Smoking Gun via @
	Vet, 77, Busted For Obama Death Threat #tcot #tlot #sgp

There are 14 unique tokens with 7 overlapping, yielding 50% overlap. 36% of the Obama query tweets have an overlap of 50% or more with another tweet. A different query, "the", finds less overlap: only 3% of its tweets have another tweet with 50% overlap. For the query "today" the figure is 14%. The lesson is that some queries yield more uniform tweets, and that uniformity impacts subsequent language processing. A more technical way of expressing this observation is that the entropy of the search results varies with the query: the "obama" results have lower entropy (they are less random) than the results for "the".
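Entropy here is just Shannon entropy over the token distribution of the result set: near-duplicate results concentrate probability mass on a few tokens and score lower. A hedged sketch with whitespace tokens and base-2 logs (the two tiny corpora are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class TokenEntropy {
    // Shannon entropy in bits of the token distribution of a corpus
    public static double entropy(String[] corpus) {
        Map<String,Integer> counts = new HashMap<String,Integer>();
        int total = 0;
        for (String text : corpus) {
            for (String token : text.split("\\s+")) {
                Integer c = counts.get(token);
                counts.put(token, c == null ? 1 : c + 1);
                ++total;
            }
        }
        double entropy = 0d;
        for (int count : counts.values()) {
            double p = count / (double) total;
            entropy -= p * Math.log(p) / Math.log(2d);
        }
        return entropy;
    }

    public static void main(String[] args) {
        // near-duplicate results: mass concentrated on repeated tokens, low entropy
        String[] repetitive = { "obama knocks fifa decision",
                                "obama knocks fifa decision wrong" };
        // varied results: every token distinct, higher entropy
        String[] varied = { "playing a show tonight",
                            "vet busted for threat" };
        System.out.println(entropy(repetitive) < entropy(varied)); // true
    }
}
```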

The ‘today’ query had usefully different tweets at the 50% overlap level:

Playing a show in Chicago, IL at 9:00 PM today at LE PASSAGE	
Playing a show in Cape Girardeau, MO at 9:00 PM today at The Venue

Despite sharing half their words, they are clearly not retweets and contain different information. It might be quite difficult to automatically reject the ‘obama’ near duplicates at 50% token overlap while retaining the ‘today’ near duplicates at 50% overlap–I encourage people to suggest approaches in the comments.
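One possible direction (my suggestion, not part of the code in this post): weight tokens by informativeness, for example inverse document frequency, so that overlap on boilerplate like "Playing a show ... at 9:00 PM today" counts for less than overlap on venue or topic words. A hedged sketch with hand-assigned weights standing in for real IDF estimates:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WeightedJaccard {
    // weighted Jaccard proximity: shared weight / total weight over the union,
    // so rare (informative) tokens count more than boilerplate; tokens
    // missing from the weight map default to weight 1
    public static double proximity(Set<String> a, Set<String> b,
                                   Map<String,Double> weight) {
        double shared = 0d;
        double total = 0d;
        Set<String> union = new HashSet<String>(a);
        union.addAll(b);
        for (String token : union) {
            double w = weight.containsKey(token) ? weight.get(token) : 1d;
            total += w;
            if (a.contains(token) && b.contains(token))
                shared += w;
        }
        return shared / total;
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<String>(Arrays.asList("playing", "show", "today", "chicago"));
        Set<String> b = new HashSet<String>(Arrays.asList("playing", "show", "today", "cape"));
        Map<String,Double> idf = new HashMap<String,Double>();
        idf.put("playing", 0.5); idf.put("show", 0.5); idf.put("today", 0.5); // boilerplate
        idf.put("chicago", 2.0); idf.put("cape", 2.0);                        // informative
        // unweighted overlap is 3/5 = 0.6; weighting drops it to 1.5/5.5 ~ 0.27
        System.out.println(proximity(a, b, idf));
    }
}
```

The same pair that looked like a near duplicate under plain token overlap now falls well below a 0.5 cutoff, while tweets sharing rare content words would stay above it.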

Finally we add a class that takes a list of tweets and filters it for near duplicates at a given Jaccard proximity in src/ The interesting bit is:

    public static List<String> filterTweetsJaccard(List<String> texts,
					       TokenizerFactory tokFactory,
					       double cutoff) {
	JaccardDistance jaccardD = new JaccardDistance(tokFactory);
	List<String> filteredTweets = new ArrayList<String>();
	for (int i = 0; i < texts.size(); ++i) {
	    String targetTweet = texts.get(i);
	    boolean addTweet = true;
	    //big research literature on making the below loop more efficient
	    for (int j = 0; j < filteredTweets.size(); ++j) {
		if (jaccardD.proximity(targetTweet, filteredTweets.get(j)) >= cutoff) {
		    addTweet = false;
		    break; //one nod to efficiency
		}
	    }
	    if (addTweet)
		filteredTweets.add(targetTweet);
	}
	return filteredTweets;
    }
Deduplication along these lines is a big deal for web search as well as for cleaning up Twitter feeds. The painful bit is comparing each new tweet to all the tweets that have passed the filter so far, which does not scale well. Much work has gone into hashing schemes that test for a match against a lexical fingerprint of the existing corpus of tweets, as well as into more sophisticated similarity metrics. A recent paper with a decent overview of past work is
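One classic member of that family of hashing schemes, not used in the code above, is MinHash: fingerprint each tweet's token set with the minima of k random hash functions, so the fraction of agreeing minima between two fingerprints estimates their Jaccard proximity, and candidates can be compared (or bucketed) via short fingerprints instead of full token sets. A self-contained sketch with made-up hash parameters:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class MinHashSketch {
    private final int[] seedsA;
    private final int[] seedsB;

    public MinHashSketch(int numHashes, long randomSeed) {
        Random rng = new Random(randomSeed);
        seedsA = new int[numHashes];
        seedsB = new int[numHashes];
        for (int k = 0; k < numHashes; ++k) {
            seedsA[k] = rng.nextInt() | 1; // odd multiplier for the affine hash
            seedsB[k] = rng.nextInt();
        }
    }

    // fingerprint: the minimum hash value of the token set under each hash function
    public int[] fingerprint(Set<String> tokens) {
        int[] mins = new int[seedsA.length];
        Arrays.fill(mins, Integer.MAX_VALUE);
        for (String token : tokens) {
            int h = token.hashCode();
            for (int k = 0; k < mins.length; ++k) {
                int hk = seedsA[k] * h + seedsB[k]; // wraps on overflow, which is fine
                if (hk < mins[k])
                    mins[k] = hk;
            }
        }
        return mins;
    }

    // fraction of agreeing minima estimates Jaccard proximity
    public static double estimatedProximity(int[] fp1, int[] fp2) {
        int matches = 0;
        for (int k = 0; k < fp1.length; ++k)
            if (fp1[k] == fp2[k])
                ++matches;
        return matches / (double) fp1.length;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(512, 42L);
        Set<String> a = new HashSet<String>(Arrays.asList("obama", "fifa", "decision"));
        Set<String> b = new HashSet<String>(Arrays.asList("obama", "fifa", "today"));
        // true Jaccard proximity is 2/4 = 0.5; the estimate should land close to it
        System.out.println(estimatedProximity(mh.fingerprint(a), mh.fingerprint(b)));
    }
}
```

The payoff is that fingerprints are fixed-size, so near-duplicate candidates can be found by hashing fingerprint bands into buckets rather than by the all-pairs comparison above.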

Running the code on the Obama search results removes half the tweets with a .5 cutoff.

ant dedupeJaccard -Ddata=searches/Obama.csv -Doutfile=runs/ObamaJaccardFiltered.csv
Buildfile: build.xml
    [javac] Compiling 1 source file to /home/breck/devguard/teaching/lingpipe4twitter/build/classes

      [jar] Building jar: /home/breck/devguard/teaching/lingpipe4twitter/demo.jar

     [java] Writing to: runs/ObamaJaccardFiltered.csv
     [java] Filtered size: 752 originally 1500

Looking at the tweets another issue immediately presents itself:

Coupe du monde - Obama: "Mauvaise décision": Le président Barack Obama a… #Maroc #sport #foot
RT @Luminary212: LOL “@BreakingNews: Obama knocks FIFA's decision to award Qatar, not the U.S., the 2022 World Cup: 'Wrong decision'”
I want to meet President Obama.
RT @laaficion Barack Obama, presidente de EU, consideró que la FIFA se equivocó al darle a Qatar el Mundial de 2022 y no a su país// celos

All sorts of languages are in the data: French and Spanish are mixed in with the English. The next post will cover building a language identification classifier, and will return to the notion of entropy discussed above.


In the quest to clean up the search results I have introduced some key concepts in text processing that will resurface in various forms throughout the posts. Summarizing:

  • Tokenization: Breaking strings into words/word separators is key to getting at the unit of meaning for word-based processing. Oddly enough, in the next post we won’t be using words for language id, but we will still be tokenizing.
  • Normalization: By eliminating trivial differences we can make text look more uniform–but beware of this tendency.
  • Entropy: Some collections of data are more uniform than others.
  • Similarity: There are measures like Jaccard distance for estimating the similarity of texts. We will see many more examples that implement topic assignment, sentiment and more.
  • Evaluation: We have picked an operating point for near duplicate detection by examining data. While the evaluation metric remains quite loose, in the next post we will both create gold-standard (or truth) data and evaluate the performance of our language id system against it.


A Call to Code: Microsoft Research/Bing Query Spell Check Challenge

December 21, 2010

And I quote: "Microsoft Research in partnership with Bing is happy to launch the Speller Challenge." First place is $10,000; the contest starts Jan 17, 2011 and runs through May 27, 2011.

Speller Challenge

We have the baseline implementation of the required system in our tutorials, see Spelling Tutorial. The system is not wrapped in a web service and you will need to dig up your own training data—Wikipedia anyone?

There may be licensing issues if you want to use LingPipe tools to build your entry; here is our Royalty Free License. For the purposes of this contest we will create a free, non-restrictive license. If you’d like to request a free license for the contest, send me an email at