The Futility of Commenting Code

I often read about novice programmers’ aspirations for a greater density of code comments, because they think it’s “professional”. I have news for you. Professional coders don’t comment their own code much and never trust the comments of others they find in code. Instead, we try to learn to read code and write more readable code.

API Documentation vs. Code Comments

I’m a big believer in clear API documentation. Java makes this easy with javadoc. It even produces relatively spiffy results.

But if you look at the LingPipe source code, you’ll find very few code comments.

Comments Lie

The reason to be very suspicious of code comments is that they can lie. The code is what’s executed, so it can’t lie. Sure, it can be obfuscated, but that’s not the same as lying.

I don’t mean little white lies, I mean big lies that’ll mess up your code if you believe them. I mean comments like “verifies the integrity of the object before returning”, when it really doesn’t.

A typical reason for slippage between comments and code is patches to the code that are not accompanied by patches to the comments. (This can happen for API doc, too, if you fiddle with your APIs.)
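Here's a minimal sketch of that kind of drift, in Java. The Roster class and its methods are hypothetical, made up purely for illustration. The assumed scenario is that the comment was accurate until a later patch dropped the copy and nobody touched the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only.
class Roster {
    private final List<String> names = new ArrayList<>();

    void add(String name) {
        names.add(name);
    }

    // returns a defensive copy
    // (stale: in this sketch, a later patch removed the copy)
    List<String> names() {
        return names;
    }
}
```

A caller who trusts the comment will happily mutate the returned list and, in this sketch, silently change the roster's internal state. Reading the one-line method body would have told the truth.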

Another common reason is that the code author didn’t actually understand what the code was doing, so wrote comments that were wrong. If you really mess up along these lines, your code’s likely to be called out on blogs like Coding Horror or Daily WTF.

Most Comments Considered Useless

The worst offenses in the useless category are things that simply repeat what the code says.

// Set x to 3
x = 3;

It’s even better when embedded in a bloated Fortran-like comment block:

/******************************************
 ************* add1 ***********************
 ******************************************
 *  A function to add 1 to integers       *
 *  arguments                             *
 *     + n, any old integer               *
 *****************************************/
public int add1(int n) { return n+1; }

Thanks. I didn’t know what int meant or what n+1 did. This is a classic rookie mistake, because rookies don’t know the language or its libraries, so often what they do seems mysterious to them. For instance, commenting n >>> 2 with “shift two bits to right and fill in on the left with zeroes” may seem less egregious, but your reader should know that >>> is the unsigned shift operator (or look it up).
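For readers who do want to look it up, here's a small sketch contrasting Java's signed (>>) and unsigned (>>>) right shifts. The class and method names are hypothetical, chosen only for this example.

```java
// Hypothetical demo class, for illustration only.
class ShiftDemo {
    // >> keeps the sign: vacated high bits are filled with the sign bit.
    static int signedShift(int n) {
        return n >> 2;
    }

    // >>> ignores the sign: vacated high bits are filled with zeroes.
    static int unsignedShift(int n) {
        return n >>> 2;
    }

    public static void main(String[] args) {
        System.out.println(signedShift(-8));    // prints -2
        System.out.println(unsignedShift(-8));  // prints 1073741822
        System.out.println(unsignedShift(8));   // prints 2
    }
}
```

The two operators only differ on negative inputs, which is exactly the kind of thing a reader should learn once from the language spec rather than from a comment on every shift.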

There is a grey line in the sand (I love mixed metaphors) here. When you’re pulling out techniques like those in Bloch and Gafter’s Java Puzzlers, you might want to reconsider how you code something, or add a wee comment.

Eliminate, don’t Comment Out, Dead Code

I remember being handed a 30-page Tcl/Tk script at Bell Labs back in the 1990s. It ran some kind of speech recognition pipeline, because that’s what I was up to at the time. I picked it up and found dozens of methods with the same name, and lots of commented-out code. This makes it really hard to follow what’s going on, especially if whole blocks get commented out with /* ... */.

Please don’t do this. Learn a version control system instead, like SVN.

When do I Comment Code?

I add comments in code in two situations. The first is when I wrote something inefficiently, but I know a better way to do it. I’ll write something like “use dynamic programming to reduce quadratic to linear”. This is a bad practice, and I wish I could stop myself from doing it. I feel bad writing something inefficient when I know how to do it more efficiently, and I certainly don’t want people reading the code to think I’m clueless.

I know only one compelling reason to leave comments: when you write code that’s not idiomatic in the language it’s written in, or when you do something for efficiency that’s obscure. And even then, keep the comments telegraphic, like “C style strings for efficiency”, “unfolds the E step and M step of EM”, “first step unfolded for boundary checks” or something along those lines.
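As a rough sketch of what a telegraphic comment might look like in Java, consider the following. The LetterCounts class is hypothetical and not taken from LingPipe; the assumption is that a reader would expect a Map here and the array was chosen for speed.

```java
// Hypothetical class, for illustration only.
class LetterCounts {
    // int[] indexed by letter, not Map<Character,Integer>: avoids boxing
    static int[] count(String s) {
        int[] counts = new int[26];
        for (int i = 0; i < s.length(); ++i) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z')
                ++counts[c - 'a'];
        }
        return counts;
    }
}
```

The single comment flags the one non-obvious choice and stays out of the way of everything else.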

Update: 13 Dec 2012. I’ve thought about this issue a bit more and wanted to add another reason to comment code: to associate the code with an algorithm. If you’re implementing a relatively complex algorithm, then you’re going to have the algorithm design somewhere and it can be helpful to indicate which parts of the code correspond to which parts of the algorithm. Ideally, though, you’d just write functions with good names to do the stages of the algorithm if they’re clear. But often that’s not really an option because of the shape of the algorithm, mutability of arguments, etc.
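Here's a minimal sketch of what I mean by tying code to an algorithm, using Welford's online algorithm for the mean and variance as a stand-in. The RunningStats class and the step numbering are my own assumptions for illustration; a real design document would have its own labels.

```java
// Hypothetical class, for illustration only.
class RunningStats {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0;  // running sum of squared deviations

    void add(double x) {
        ++n;
        double delta = x - mean;    // Welford step 1: deviation from old mean
        mean += delta / n;          // Welford step 2: update mean
        m2 += delta * (x - mean);   // Welford step 3: update squared deviations
    }

    double mean() {
        return mean;
    }

    // Sample variance; defined as 0.0 here for fewer than two values.
    double variance() {
        return n > 1 ? m2 / (n - 1) : 0.0;
    }
}
```

The three steps share mutable state and depend on their order, so splitting them into well-named functions wouldn't buy much. The comments just point back at the algorithm.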

Also, I want to be clear that I’m not talking about API comments in code, which I really like. For instance, we do that in LingPipe using Javadoc. I think these comments are really really important, but I think of them somehow more as specifications than as comments.

Write Readable Code Instead

What you should be doing is trying to write code that’s more readable.

I don’t actually mean in Knuth’s literate programming sense; Knuth wants programs to look more like natural language, which is a Very Bad Idea. For one, Knuth has a terrible track record, bringing us TeX, which is a great typesetting language, but impossible to read, and a three-volume set of great algorithms written in some of the most impenetrable, quirky pseudocode you’re ever likely to see.

Instead, I think we need to become literate in the idioms and standard libraries of the programming language we’re using and then write literately in that language.

The biggest shock for me in moving from academia to industry is just how much of other people’s code I have to read. It’s a skill, and I got much better at it with practice. Just like reading English.

Unfortunately, effective programming is as hard as, if not harder than, effective essay writing. Writing understandable code that’s also efficient enough and general enough takes lots of revision, and ideally feedback from a good code reviewer.

Not everyone agrees about what’s most readable. Do I call out my member variables from local variables with a prefix? I do, but lots of perfectly reasonable programmers disagree, including the coders for Java itself. Do I use really verbose names for variables in loops? I don’t, but some programmers do. Do I use four-word-long class names, or abbreviations? The former, but it’s another judgment call.
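To make the first of those judgment calls concrete, here's a tiny sketch of a member-variable prefix. The "m" prefix and the Counter class are assumptions for illustration; any consistent prefix would make the same point.

```java
// Hypothetical class, for illustration only.
class Counter {
    private int mCount;  // "m" prefix marks a member variable

    void add(int count) {
        // The prefix distinguishes the member from the parameter
        // without needing "this.".
        mCount += count;
    }

    int count() {
        return mCount;
    }
}
```

Without the prefix, the same method would need this.count += count, which is the style used in the Java libraries themselves.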

However you code, try to do it consistently. Also, remember that coding standards and macro libraries are not good places to innovate. It takes a while to develop a consistent coding style, but it’s worth the investment.

19 Responses to “The Futility of Commenting Code”

  1. David R. MacIver Says:

    I find that commenting code is useful when the desired end result is qualitative – UI code, things where you’re trying to rank things according to some fuzzy and ill defined notion of “goodness”, etc. There ends up being a lot of trial and error trying to figure out what works, and the design process isn’t always obvious from the end result: Why we do it this way and not the other, what this magic parameter is for, etc. The rules are relatively ad hoc, and thus no matter how clearly they’re expressed the reason behind them can’t be obvious from the code.

  2. Daniel Marbach Says:

    Hi David,
    Unfortunately you are not quite right. When you apply the principles of clean code to your own code and refactor a lot in your TDD process, your code is automatically easier to read and understand. The true art in coding lies in writing understandable and readable code which is self explanatory.

    When you mention the magic parameter: that’s where the problem lies hidden. If you define meaningful names for your magic parameters and assign them predefined, meaningful constants, the client of your code need not spend a minute wondering why you are passing in these parameters, because the context and the name define their purpose.

    Daniel

    • David R. MacIver Says:

      I disagree. It doesn’t matter if you define your parameters: You still need to comment as to why they have that value.

      Suppose I want to ignore all objects that have some attribute smaller than a particular value

      if (foo.stuff < 3)
      ignore(foo);

      I can rewrite this to

      val StuffThreshold = 3;

      if(foo.stuff < StuffThreshold)
      ignore(foo);

      or

      def isBad(foo : Foo) = foo.stuff < StuffThreshold

      if(isBad(foo))
      ignore(foo);

      You could argue that this is more clear. I don't particularly think it is, but I don't care to have that argument. Either way it's certainly not done anything to enlighten us about why the value "3" was chosen, and for good reason: It was derived experimentally.

      Your argument presupposes exactly the reason why it is wrong: Not all values have "meaning" other than "this is the value that works". Comments can explain what happens when you twiddle these values and why this particular one has been chosen. Factoring out this supposedly "unclear" code has done nothing except go to great lengths to elucidate the bit that was obvious (what the code was doing) and nothing to explain the process by which it was arrived at.

  3. Eric Lafrance Says:

    I do not totally agree with this post, though I believe in writing readable code.

    Comments such as “// add 1 to x” are indeed useless (should be punishable IMO). However, writing a small and clear comment for a group of lines of code can be a real time saver for someone who has to go through your code to correct a bug, especially if it’s a language they are not totally familiar with, or with classes they have not yet mastered (e.g. GUI code). The main goal of writing comments should be to give extra details (something that cannot be expressed with clear lines of code, that happens). That also means that you don’t have to write comments in all methods (though the method comment (e.g. javadoc) should be written in most cases).

    Comments should also be something a developer does not overlook when writing/updating code. This is a boring task, but not doing it well will end up costing more money than what you saved when you “forgot” to write and/or update them.

  4. Sandman Says:

    I have to disagree even more strongly with your theory. I have never seen code that wouldn’t benefit from better commenting. Occasionally (rarely) that’s fewer or more accurate comments, often it’s clearer comments, but it’s almost always just ‘more comments’. If you ignore the red herrings of comments that are wrong or say something utterly useless then you have the *usual* case. (In 15 years of programming I’ve seen the /* increase x by 3 */ type of comment dozens of times and every one was in an article about bad comments. I’m pleased to report that few people are that dumb but we all have one or two anecdotes that come close.)

    So, why comment in the usual case (which is code that has about 2-4 lines per hundred… my random measure for the sake of this discussion)? It’s always clearest to me when I think about what exactly we do that has value. We write lines of code. A line of code is kind of like an atom: it’s not too useful when you break it apart (sometimes dangerous) and nothing (software, anyway) is built from anything else. A class… is a bunch of lines. A module/library is a bunch of classes. An application is a bunch of modules. An enterprise application is usually stitched together applications (stitched by… lines of code). For sure, you will want to document your applications. It’s all a hierarchy of lines of code so the question isn’t “why are comments bad when API docs as javadoc comments are good?”. The question is where in the hierarchy is the best place to *stop* commenting/documenting.

    The obvious answer to that question is: “When comments create worse problems than they solve”. That’s an abstract quality that we are all supposed to understand. I don’t understand why we should think that API docs are (even usually) the place where that happens. It almost never is. “…but what about self-documenting code?”. Self-documenting code is like the Axis of Evil. It’s a string of weasel words that pre-supposes the correctness of one side of the argument. If code is self-documenting then the comments were written when the code was. Take an example. Say there is one function… 15 or so lines long. You are familiar with the application because you read the docs and got what you could from browsing the API docs. Maybe you even looked at some of the code for a few minutes. You are asked to figure out what one specific function is for (you have had this exact experience before. It’s called debugging :-) ). How much time will it take to answer that question (to reasonable certainty… I’m not talking about really digging deep… just searching for why you are ‘getting this crash’)? My rule of thumb is this:

    Function has a comment (again, not a moronic one but one written by someone who at least tries to write good comments) : scale factor = 1x
    “Self-documenting” code: scale factor = 10x +
    “Normal” code (which even most proponents of not commenting will admit isn’t exactly self-documenting): scale factor = 100x +

    It would never take more than about 2-3 seconds to read the comment or I could probably read and understand the ‘self-documenting’ function in 30-60 seconds. Do that, say, 8 times and then you’ve tracked down the bug. Do that a few times a day for a few months and you will want the original code’s author hanged for sloth.

    One last argument I frequently hear against comments is that ‘all comments are wrong’ (by this, I assume people mean ‘most’ comments are wrong). This is probably true but it’s also missing the point. Most code is wrong. No one suggests you stop writing code. The suggestion is: “Do better”. That’s what comments call for. I’ve also found that many ‘pros’ don’t comment their code. I work with many such pros. It’s too bad because they are good engineers whose careers stall because their projects slip or require so much of their time in maintenance while my projects live on for years and do not require so much maintenance (and managers notice…). How is that? Easy. My code isn’t better than theirs but anyone can *fix* or augment my code in a short period of time because it’s … commented. All the while, others’ tools become more and more of a time sink.

    Lastly, the thing I understand least about this debate is that coding is so much easier when you write the comments *first* (CDD?). It goes back to the hierarchy debate but laying out the hierarchy in comments and code ‘framework’ first (function declarations with empty definitions, etc). Are we still fighting the code-planning step of development? If not, I don’t know why this is still such a holy war. I want to know about someone who was burned by a valid comment. The worst I can come up with is that they can be a little ugly (although good comments are compact and cleanly formatted… which usually means unformatted). For every one mentioned, I will post 10 times that a good comment saved my sanity. Feels like a safe bet.

  5. Frank Smith Says:

    Sandman, you are obviously passionate about this topic. Given that my experience is quite different than yours it causes me to wonder why different programmers have such different perspectives on this issue. One possibility is that some developers are better at reading code than others. In other words, some programmers would prefer to read a few sentence natural language description than read a similar number of lines of expressive code (let’s assume expressive code written by good programmers just as you assumed good comments). For myself, I make an effort to write code that both humans and computers can understand. Again assuming the humans can read code. I prefer that approach since it eliminates duplication and the extra effort and risk of errors associated with it. Also, my experience is that most programmers don’t know when to write appropriate comments (in other words, your assumption is not typically true in my experience) and instead they comment many things that don’t need comments and over time the comments diverge from the implementation. Both lead to wastes of time which I imagine are comparable to your own imaginary time multipliers attributed to uncommented code. You want to know someone who’s been burned by a valid comment. I’m certainly not one, but I’ve been burned by useless and bad comments (I know you didn’t ask that question). I seldom pay much attention to comments when debugging code and I’m known to be quite good at debugging software problems. In fact, it’s not uncommon for me to help someone debug their software over the phone with no access to the code at all.

    I also wonder if there might be some aspect of this debate that is related to open source software. During my several decade career as a programmer I have often learned how software works by reading source code. Maybe people that do that frequently become better at reading code, see more instances of invalid comments, and therefore value expressive code over (bad) comments. On the other hand, I also value valid, useful comments that are not just duplicating what’s clear in the code. To play the devil’s advocate, I do understand why people that must work with non-expressive code would prefer more heavily commented code. Given that you appear convinced that self-describing, expressive code is very rare or even impossible, I understand your position. However, I believe the premise is incorrect based on my experience.

  6. Jim Danby Says:

    @Frank Smith

    Is this a case of “If you don’t like the argument, attack the source”? It seems to me that though Sandman’s comment is a reasoned argument, your response is simply “I’m better than you”. If your argument is that some developers can’t read code as well as a genius such as yourself, you’d better comment your code so that they will be able to maintain it.

  7. Frank Smith Says:

    I’m surprised you think I’m attacking the source. I attacked some of the “statistics” drawn from thin air, but I assume Sandman is a good, solid programmer. I did explore some of the possible reasons why the perspective he described exists when there are many who believe quite differently. The ability to read code is one reason and, yes, I believe code reading ability is one skill, among many, that is important for a good programmer. Fortunately, I work with developers who tend to have that skill (it’s certainly not my personal genius) and good code writing skills so commenting is not as necessary as some people might believe. As you said, if I worked with programmers who generally did not have that skill I’d tend to comment more (and look for other employment).

  8. Development Error Says:

    I had to rant about this one so much I started a blog and posted my reply there.

    http://developmenterror.blogspot.com/2009/10/theres-no-futility-in-commenting-code.html

  9. lingpipe Says:

    I added some examples in this followup, Examples of “Futility of Commenting Code”.

  10. Daniel Marbach Says:

    Interesting I just visited the blog 30 minutes ago and it was there…

    Here is the comment he made on the blog:

    I fired up the RSS reader this morning and spotted a real gem, “The Futility of Commenting Code”. By “gem”, I mean a piece of dirt that has been buried for a damn long time. OK, it hasn’t been buried yet but it should be.

    Every month or two, if you subscribe to DZone feeds, you will see a blog post or article that explains why you shouldn’t comment your code. This post was one of them but they all need to be refuted. My irritation at this recurring theme has led me to be a refuter via this blog. I could have commented on the OP but I reckon this rant may last a while.

    Let’s take it from the top…

    Professional coders don’t comment their own code much and never trust the comments of others they find in code. Instead, we try to learn to read code and write more readable code.

    I never get the read-the-code argument. Code that others wrote, and probably code you wrote yourself twelve months ago, can be hard to read. Other people have different styles to you. I like short, well-named methods. It means that what some developers might code in a twenty line method I may put into four five-liners or five four-liners. Now, when Mr Maintenance-Programmer looks at my code I hope he finds it easy to read and specifically, to understand the intent. However, he didn’t spend months working with ABC plc and doesn’t understand the inner workings of project accounting so maybe he doesn’t get it. He could spend five minutes reading the code (or perhaps a lot longer in a large codebase), following the possible flow of control of multiple methods and struggling with the rule of seven. He could also read a one-line succinct comment and realise that this isn’t the code he is looking for.

    The reason to be very suspicious of code comments is that they can lie. The code is what’s executed, so it can’t lie.

    Wanna bet? I’ve seen plenty of code that isn’t maintained properly that lies. Sometimes when the pressure is on and potential lawsuits are building, management insist on quick fixes. That often means that that wonderfully-named method now does something that really belongs elsewhere. Of course, when you are doing quick-and-dirty fixes you could edit a hundred methods. When you come back and refactor ninety-eight of them you are left with method names that lie. Yes comments can lie too, but only if you treat them as second-class citizens. Maintain your code, maintain your comments. Problem solved.

    Another common reason is that the code author didn’t actually understand what the code was doing, so wrote comments that were wrong.

    Maybe if you had commented that code he wouldn’t have had to read and misunderstand it.

    The worst offenses in the useless category are things that simply repeat what the code says.

    Ah yes, good old // Set i to 3. Personally I have never seen such an inane comment in production code. If I did, I would remove it and talk with the developer who put it there in the first place (unless he has resigned and gone off to pursue his gardening passion). This is a straw man argument, not a sensible one. Comments are supposed to tell you the intent of the programmer – something often not present and maybe impossible to express in the code. Or maybe to tell you why that apparently inefficient code is good because it avoids a framework bug (and which version of the framework because maybe it’s been fixed now).

    Eliminate, don’t Comment Out, Dead Code

    Good point, though sometimes leaving the dead (or maybe still twitching) code in there can be useful. At least until you have the new code living breathing and tested to within an inch of its life. Then stamp on the old stuff and hit Del before your next check-in.

    This all feels a little bit like the wrong type of laziness. Some laziness is good. It’s the reason we have cars, planes, ships and bad air quality. OK, perhaps the air quality argument is a tricky one to win. Bad laziness means that software isn’t documented, there isn’t a help file, the code isn’t well-structured and is hard to read and uncommented. You just need to treat comments as a part of the source code and ensure you update them. If, like me, you have short methods, it’s not even as if the comments are off the screen so get missed.

    So why did I write all this?

    Some programmers are very vocal in their expression that comments are bad and shouldn’t exist at all. They probably really mean that most comments are a waste but some are good. Even the author of the OP mentions that he does comment occasionally. Bad comments are as bad as bad code. Good comments improve the code and increase efficiency. This topic is not, as some would reason, black and white.

    • lingpipe Says:

      @Daniel Marbach: Thanks for the repost.

      I didn’t realize this was a recurring theme on developer blogs, but I wouldn’t be surprised.

      Just to be clear, I didn’t mean to imply that method names can’t lie. I meant the code itself can’t lie. You have to be just as suspicious of method names as comments.

      Also, I’m all in favor of documenting intent. I’m pretty sure we’d all agree that the main intent of a package, class or method should be documented in the API.

      So what about really tricky bits of code that aren’t clear (and either can’t be refactored due to efficiency/modularity, or aren’t worth refactoring)? By all means, comment away.

      I didn’t mean to imply that code shouldn’t be commented at all! Maybe it was the marketing department’s provocative title?

      I actually think the code that Sun distributes with the JDK for most of the public classes in Java is a good example. You’ll find very few comments in there. Now having said that, I’m actually seeing many more in java.util.ArrayList than I remember seeing anywhere else. Ditto for java.lang.String. Many more than I’d likely use, but nothing of the silly or completely useless variety. On the other hand, java.lang.Math has all sorts of comments on delegation that fall into what I’d consider the totally useless category:

      public static double pow(double a, double b) {
          // default impl. delegates to StrictMath
          return StrictMath.pow(a, b); 
      }
      
  11. Helltime for October 19 « I Built His Cage Says:

    […] something called The LingPipe Blog comes the futility of writing comments. It’s an old, oft-repeated-but-hardly-followed rule: comments are a code smell. But like all […]

  12. Michael Says:

    I routinely write multi-paragraph comment blocks (>100 lines) to explain the math behind some intricate C code (talking about scientific/numerical code). These would typically appear at the head of a function implementing the idea. Then the within-code comments can be kept to a manageable level.

    I’ve never had second thoughts about this. As I’ve spent more time programming, these comment blocks have become more typical of code I write.

    For the easy stuff, I agree, better to restrict to API comments, gotchas, loop invariants — key bits.

  13. Daily Digest for October 28th Says:

    […] Shared The Futility of Commenting Code « LingPipe Blog. […]

  14. Gene Golovchinsky Says:

    Real programmers don’t comment their code. If it was hard to write, it should be hard to understand :-)

  15. Eric Says:

    Hog Wash!

    We don’t live in a perfect world where everything is immediately clear; we don’t live in a perfect world where every programmer on a team actually cares or has the ability to write crystal clear code; we don’t live in a perfect world where every bit of code we write will be so clear that it documents itself. We don’t have perfect memories that never forget what we were doing the day before or why we chose a certain method of doing something over another. Comments help with this!

    The thing that gets me is that the people saying ‘no comments’ haven’t done any real research into the topic. For example, what state is your mind in when reading code (i.e. variable names and unit test)? What state is your mind in when reading natural language? Which do you comprehend more easily? I know that when I read code my mind isn’t in the same state as when I read natural language prose. Also, research (REAL LIVE RESEARCH — novel isn’t it) indicates that programmers are pretty good at updating comments when code changes.

    http://portal.acm.org/citation.cfm?id=1339530

    Could we stop propagating this lie please?

    • Bob Carpenter Says:

      If you go back and read what I wrote, you’ll see that I don’t say “no comments”. I specifically think API comments are critical for both inter-developer communication and the eventual clients. I also list a couple reasons why I think comments in code can be useful.

      Judging from the abstract of the paper you link (the content’s paywalled), they seem to include API and method comments as well as the in-code comments I was specifically trying to address.

      I’m speaking from experience, not hearsay or myth propagation, when I say that comments get stale in almost every piece of code I see. For instance, another message that arrived in my e-mail today is about the API drift and stale comments in the API doc for the C++ Matrix library Eigen. Last week, I sent the Eigen developers comments about stale abstract base class doc in an obscure corner of their API. I’m not picking on Eigen, which is a great lib and very well documented and tested. The point is that I had to dig into the code to see how to extend a virtual class properly to act as a template arg.
