Archive for February 2008
Read Before Running
Debugging: How to Start (commonly accepted method)
Search for “debugging” on the internet, and most entries will tell you that the first step is to reproduce the problem (properly called “the software failure”) and that the second step is to create the simplest configuration that still reproduces it.
I agree that it is essential to reproduce the problem, because you need to know you’re working on the right problem.
Fun to Run, or “the thrill of the chase”
But programmers, myself included, like to run things—to let the computer do the work—so there’s a great temptation to reproduce the problem, change something, run the software, add some “debugging statements”, and run it again. And there goes an hour of valuable time.
Does readability fit in here somewhere?
If the code is written clearly, and by that I mean that it’s readable, with meaningful variable names and the proper levels of abstraction, then the second step might be to actually read through the code.
A short debugging story
Today, I showed myself both the wrong way, and the right way, to respond to a “bug report”.
Where I work, I’ve turned out to be the owner of a small set of text-parsing scripts, written in awk. We use them to pull out compiler warning messages from the long build logs, so that developers can see them and address them. These parsing scripts are short—half a page of code including comments—and are tested only with the few build logs that I’ve had time to try. Given limited time (it is not my main job to write parsing scripts!), I have to hope that all the build logs for a given compiler are pretty similar, so whatever pattern I’ve identified will be followed in other build logs. It isn’t always so.
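For a sense of what those half-page scripts look like, here is a minimal sketch of the idea, not the actual script: the GCC-style “file:line: warning: message” format, the file names build.log and warnings.csv, and the CSV layout are all assumptions for illustration.

    awk '/: warning: /{
            split($0, part, ":")                     # part[1]=file, part[2]=line
            msg = $0
            sub(/^[^:]*:[^:]*: warning: /, "", msg)  # strip the "file:line: warning: " prefix
            gsub(/"/, "\"\"", msg)                   # double embedded quotes for CSV
            printf "%s,%s,\"%s\"\n", part[1], part[2], msg
        }' build.log > warnings.csv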
A developer noticed (first by eye, confirmed with a simple “grep”) that there were more compiler warnings in his build log than in the parsed csv (comma-separated-value) output file. Not good. I set about debugging!
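The check itself is trivial. With the same illustrative names and warning pattern as above, something like this is enough to show the gap:

    grep -c ': warning: ' build.log      # count of warnings in the raw log
    wc -l < warnings.csv                 # rows in the parsed output: fewer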
It took about 5 minutes to reproduce the problem: really just gathering the sample input and bad output files, and setting up a copy of the relevant scripts. One run (about 2 seconds) showed that, indeed, some warnings were missing.
My next step, by the standard debugging technique, was to start printing all the partial results, run the script over and over again, and look to see where it was failing. Then I would spend some more time adjusting the failing script and retesting.
Haven’t I seen this somewhere before?
After the first 5 minutes (reproducing the problem), I already had a pretty good idea of which script was failing: a sed one-liner that adds newlines after each compilation command line, and before the first warning message, so that the awk script would be able to separate the first warning message as a record and select it. What was odd was that when I opened that script (just one executable line; the rest is comments, including change history), the last modification comment was about how I had fixed exactly the same problem before. Briefly I wondered about that, but rushed onward to more than half an hour of running parts of the script pipeline to prove to myself exactly what was failing and why.
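To make the rest of the story concrete, imagine a one-liner of roughly this shape; the real pattern and log format are beside the point here:

    # Illustration only.  Suppose compilation command lines end with the
    # source file name, e.g. "cc -O2 -c src/foo.c".  Appending a blank line
    # after each such line (sed's G command) lets a later awk stage, reading
    # in paragraph mode (RS=""), pick up the first warning that follows as
    # its own record.
    sed '/\.c$/G' build.log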
I could have saved myself the time.
Debugging: How to Start (the code-reading method)
After reproducing the initial problem (again, just 5 minutes), and arriving at the suspect script with the worrisome comment, I should have stopped running the code, and started reading it. After all, if my last modification was to fix this same problem, then apparently that modification was either wrong, or insufficient.
It turned out to be insufficient, which was visible simply by comparing the one-line script with the new build log on which it failed. The new build log had some extra spaces at the end of the compilation command line, and the sed one-liner, designed to identify such lines, was unprepared for the extra spaces.
It did take me another 15 minutes to verify that most new build logs had a random number of extra spaces, and thus to pick the right two-character adjustment to the regular expression. Regression testing took another half hour.
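In terms of the illustrative one-liner above, the change looks something like this (again, only the shape of the fix matters, not the particular pattern):

    # Before (assumed shape): the address insists the line ends right at ".c"
    sed '/\.c$/G' build.log
    # After: two more characters, " *", absorb a random number of trailing spaces
    sed '/\.c *$/G' build.log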
But I could have saved myself almost an hour in the middle of “debugging-by-running” if I had applied a few minutes of “debugging by reading the code”.
Even better would have been reading it out loud, or reading and explaining it to someone else.
Why We Write Software
“Isn’t it nice to know that, when all else fails us, we have an innate decision-making tool to fall back on?”
Robert L. Glass, “Intuition’s Role in Decision Making” (IEEE Software, January/February 2008)
Yes, Glass admits, estimates that come unbidden from a manager’s subconscious seem the very opposite of quantitative or rational. But most decision-making methods have a common theme: using historical data to decide what to do next. Quantitative estimation puts everything out in the open. Rational support for a desired deadline is at least based on the facts. Intuition, at the other extreme, pulls that data, informally known as “experience”, from the subconscious. And since a good manager has experience, intuition works.
Sort of.
Hidden in Glass’ essay are the unstated quality standards that justify each method. Only a complete, quantitative estimate, matched by continuous measurement and adjustment, can promise high quality by the target date. Intuition, on the other hand, even coming from an experienced manager, makes quality likely, but hardly guaranteed. The manager’s successes are what keep him in his job, and his few failures are forgiven. Rightly so. He is employed to deliver good-enough software, quickly, to meet modern society’s ever-growing appetite for computerized life.
So should we favor intuitive, seat-of-the-pants estimating, with its benefit of early delivery, but its cost of uncertain quality?
Digging deeper, we find that there are some very different reasons why we write software, which strongly influence how we plan our work.
- Profit
- Functionality
- Beauty
Anyone who gets paid for writing commercial software must acknowledge that profit pays his or her salary. Profit leads eventually to pretty good quality, via competition. In the short run, though, we experience a lot of pressure, a lot of errors, and many customer-accepted (if sometimes societally unacceptable) failures. In that context, it seems, we should estimate quickly and intuitively, but admit that quality is expendable.
In a well-funded, non-profit organization (e.g. NASA, famed for error-free software — wholly separate from failure-free flights), software can be about complete functionality. Take as much time as needed to implement everything, perfectly. After all, when the spaceship flies past Jupiter, there’s no second chance.
Where, then, is guaranteed perfection, estimated correctly and delivered quickly? The only place we see that in life is artistic performance. A pianist plays, from memory, a piece that she only decided a few months ago to perform. An hour long, no mistakes. The event starts, and finishes, on time. The customer — the audience — rises to its feet in noisy appreciation. The enabler of perfection is seeking beauty, or doing a thing for its own sake.
Many software developers write software because it’s beautiful, fun, or spiritually rewarding. And that reason engenders the highest-quality work. A quality that delivers on time, with no costly rework. Functionality. Profit too.
Somehow, in the commercial context, that reason why we write software must be harnessed and encouraged by software managers. Otherwise, dismissing a developer’s data with an intuitive estimate is not a fall-back, but falling backwards.