Archive for February 2010
We may never know all the details of the recent Toyota failures, because, unlike the Challenger and NASA, Toyota is a commercial entity rather than taxpayer-funded government. And it’s a bit early to be looking for definitive analysis. What we can do, as embedded software developers, is realize what we’re up against, and how it didn’t start yesterday.
When microprocessors first came out the buzz on the news was that soon they were going to be inside everything. I remember wondering what do we need them in everything for? My washer and my microwave oven, for example, worked just fine with a mechanical dial. But I was wrong and now they are in everything.
Much earlier on, complex systems were based on the Six Simple Machines. I can’t say which is simplest, but maybe it’s the lever. What’s important here is that the behavior of the lever is governed by physical properties of solids, which by now are pretty well understood. Push down on one end, and the other end goes up. Exceptions exist: solids can stretch under tension, deform under pressure, and if you push too hard, they can break. Apparently Toyota had problems with this too, in their gas pedal problem and fix.
Technical advances brought hydraulics — based on the properties of fluids and gases. I also remember replacing a brake line on an old car once, and getting to see how a leak in the line leads to spongy braking, and, if I hadn’t been refilling it weekly before the replacement, to brake failure.
In both of those technologies, the basic idea is that the system user (e.g. the driver) pushes on the control, force is transmitted immediately through a physical medium, and the desired effect occurs. Failure modes are well understood and (usually) are prevented.
Now replace the lever with software. No more physical properties: the behavior of the software transmission line is governed by the behavior of people — software developers writing lists of instructions which are translated several times before reaching the hardware. Now you push on the control, and a cacophony of instructions (okay, I’m being unfair — if the software is well-written, it’s a concert of instructions) rush down a wire to an actuating motor at the other end.
Let’s be clear about this: real-time software is a myth. The metal of the lever, or even the fluid of the hydraulic line, has been removed and replaced by your entire software development organization. Materials science out the door, organizational sociology in its place. If you want to know where that leaves us, read this short post about drive-by-wire, and the many comments there.
What’s the answer? In the code itself — perhaps it’s avoiding interrupts and multi-threading, and moving to synchronous programming where everything proceeds in lock-step. At least then a modern car would run like clockwork. Other than that, notice that many of the suggestions in the discussion talk about mechanical back-up systems, which are basically putting the lever back into the system.
But software development is not, as commonly believed, about technology. It is about writing, reading, and communication in ever-larger groups. Those who have said, “The pen is mightier than the sword” were thinking of politics. Now writing, in the form of software, interposes itself between every hand and hammer. We have broken the lever; do we know how to wield its replacement?
Sometimes I get the feeling that when people consider and talk about code review, they’re talking about different things but they don’t know it. I see code review everywhere during coding.
Try this: put aside the image of code review as a meeting, or even code review as an independent activity. Limit your description of code review to: someone else is reading your code, you’re discussing it together, and there’s effective writing as a result. If you see those three things happening, you’re seeing code review.
The great hope was that code review would be solely a preventive activity: do code review before execution, and there won’t be anything to debug. That’s not going to happen. Too much code is delivered under pressure, without code review. But all is not lost. Just review the code when debugging. Look again at the three steps — how does code review, as defined above, change debugging? First, don’t debug alone. Whatever you may think of pair programming for or against, pair debugging is a must. Especially if you wrote the code, having someone else look for the problem, or better, restate the problem, is essential. Second — discussion. Debugging that rushes to a fix often produces the wrong fix. And finally, effective writing. Sure there’s the writing which is fixing the code. But what about adding comments? And perhaps more important and effective, especially if it’s a quick fix: submit a new defect or change request to your tracking system for what really needs changing to reduce the chance of this failure recurring.
Tools! If I have tools that can find problems, I won’t have to review code. Yes, you will. And you should want to. Static Analysis tools are great for finding simple errors, and potential defects. For narrowing the field of focus. But then, to find out what’s really wrong … code review. Additional benefit of the tool — since it has no feelings, it can be two of you against the reviewer: read the code with a partner, discuss why the tool thinks there’s a problem and whether you agree, and again, effective writing. Not just the fix, but why that fix, and also perhaps an adjustment to the tool so it will do better next time. High-end static analysis systems let you document that right inside the tool.
Testing, and its partner, code coverage, are often seen as the opposite of code review. Whatever I couldn’t catch by code review (and static analysis) is left for exercising the code under controlled conditions, causing failures, and fixing them. Actually code review is the most important part of developer testing. Not only reviewing the code under test in order to design good white-box tests. But also reviewing the code based on the test results and coverage results. Why does the software intermittently fail this test? Why is it so difficult to cover all the branches in this function? Again, read together, discuss, write. Here the effective writing is likely to be refactoring the code. If someone wants code review “documentation” — give them a diff and a one-sentence explanation of what you did and why.
Code Review is All-Purpose
There are many more activities in the developer’s daily life. Detailed design, branching and merging, delivery — all can benefit from in-line code review with this three-step model of reading, discussion, and effective results-writing. And by acknowledging this model, we see that code review, or almost code review, is happening all the time. Just have to complete the steps to get code review that real people will do.