Talk About Quality

Tom Harris

Archive for the ‘Prevention’ Category

Code review is code use

with 3 comments

Wordle: Code Quality Jesse Gibbs at Atlassian sent me to the following post from Scott Bilas at the game software company Loose Cannon Studios. In Peer Code Reviews: Good Commenting Practices, Bilas says code reviewers should seek architectural issues, and adherence to good software development practices and coding standards. And that they should look for mentoring opportunities. At the same time, he lets them off from other responsibilities, saying “Reviewers aren’t expected to catch everything,” and, “Reviewers aren’t expected to catch deep or systemic design problems.”

It’s a pretty good tutorial on the current best practice of code review. So why does it feel like something big is missing?

Upon reflection, it turns out that, unlike, say, editing and commenting on a book, code review is not really a reviewer-author relationship at all. An editor may make a lot of changes, but those changes end with publication. Code, on the other hand, will be fixed, extended, and refactored by future developers. Each future, or “next” developer will need exactly two things from the code in order to do his or her job: readability and understandability. Code is readable if it has the right balance of abstraction vs. detail at each level of the code. Code is understandable if the reviewer can, without help from the author, see what the code does, and why. If code is clear, that is, readable and understandable, modifications will be easy and error-free.

So the question “what are the responsibilities of a code reviewer” turns out to be a trick question. Like this one, from driving class:

Question: In the following traffic diagram (imagine any diagram you want), who has the right-of-way?

Answer: Nobody has the right-of-way. Right-of-way can never be possessed, only given.

The code reviewer does not have any review responsibility. The code author has all the responsibility to write clear code, and provide it to the code reviewer. We should stop calling that person a “code reviewer”; instead, say “code user”.

Code review becomes a simple and easily explained two-step process. First the “code user” reads the code and notes down what s/he doesn’t understand how to use, and second, s/he meets with the code author to present what was not clear. In that manner, the “code user” successfully stands in for the “next developer”, and gives the code author an early chance to make things better. As a bonus, all the things a traditional code reviewer was supposed to check, or at least the important things, will be checked. Mentoring will happen. All this when we realize that code review is code use.

Written by Tom Harris

July 28, 2009 at 1:06 am

The Only Valid Measure of Code Quality

without comments

It’s Thom Holwerda — keeping things simple for us:

Simple Code Quality Metric

Simple Code Quality Metric

Here’s his site — with a search for “code quality” (warning: if you can’t ignore “Evony” web game ad graphics, stay away).

Written by Tom Harris

July 16, 2009 at 4:13 am

Errors are always cumulative

without comments

Errors are always cumulative
Nobody likes to write error-handling code, but at least it’s easy (if boring): check inputs and results with “if” statements, and reject or recover on failure. But is it really that simple?
A little thought shows that errors are cumulative, and that failure is always the gathering or intensification of some faulty condition. Let’s prove that by contradiction. One of the simplest and most common cases of error handling is input data checking in a user interface. For example, password checking for your bank account login. The simple error-handling code is:
if username.password <> password then reject login
That should work fine, right? Nothing cumulative there. Every time I submit a wrong password, it tells me “wrong password” and prompts for a retry. But software developers (and many bank website users) will recognize the problems with that solution, among them:
1. Unbounded retry loop — if I keep getting it wrong, I can’t escape login
2. Denial of service — bring down the server by overwhelming it with bad logins
3. Eliminating wrong passwords — if it tells me “wrong”, I cross that try off my list
All of these real outcomes have something cumulative in them:
1. Time — user may run out of patience
2. Load — too much for server to handle
3. Learning — revealing more and more information about correct password
Take another common example: the elevator. Would simple limit-checking work for stopping at the right floor? Let’s try.
if floor.location <> floor.desired then keep descending
Bang! I wouldn’t want to be on that elevator. What’s cumulative? The momentum of the elevator, and the decreasing distance between current and desired location. (Even though nothing is wrong, that distance is commonly called “error”.) But a better example is the elevator’s door-closing protection. It started out with a simple:
if not (door.can_close) then door.re-open
Wasn’t it fun back then to keep waving your hand in the door and watch it open again, and keep everyone on the other floors waiting? Quickly, though, elevator software designers realized that even if it’s totally unacceptable to ignore a deliberate foot in the door and start moving, it’s equally wrong to go on closing and re-opening forever. So they added that unpleasant buzzing sound, triggered when the retries reach a certain number or amount of time. Cumulative again.
Real-life error-handling, then, has to do more than test for the limit and reject it. It has to recognize faults, count or measure them, and prevent them from growing and leading to failure. In fact, since a fault may be just the limit of an otherwise acceptable condition (e.g. buffer almost full — OK; buffer overflow — fault), error prevention requires identifying and tracking resources even before they reach their limits.

Nobody likes to write error-handling code, but at least it’s easy: check inputs and results with “if” statements, and reject or recover on failure. But is it really that simple?

A little thought shows that errors are cumulative, and that failure is always the gathering or intensification of some faulty condition. Let’s prove that by contradiction. One of the simplest and most common cases of error handling is input data checking in a user interface. For example, password checking for your bank account login. The simple error-handling code is:

if username.password <> password login.reject(“wrong password”)

That should work fine, right? Every time I submit a wrong password, it rejects the attempt and prompts for a retry. But software developers (and many bank website users) will recognize the problems with that solution, among them:

  1. Unbounded retry loop — if I keep getting it wrong, I can’t escape login
  2. Denial of service — bring down the server by overwhelming it with bad logins
  3. Eliminating wrong passwords — if it tells me “wrong”, I cross that try off my list

All of these real outcomes have something cumulative in them:

  1. Time — user may run out of patience
  2. Load — too much for server to handle
  3. Learning — revealing more and more information about correct password

Take another common example: the elevator. Would simple limit-checking work for stopping at the right floor? Let’s try.

if floor.location <> floor.desired elevator.descend

Bang! I wouldn’t want to be on that elevator. What’s cumulative? The momentum of the elevator, and the decreasing distance between current and desired location. (Even when nothing is wrong, that distance is commonly called “error”.) But a better example is the elevator’s door-closing protection. It started out with a simple:

if not (door.can_close) door.re-open

Wasn’t it fun back then to keep waving your hand in the door and watch it open again, and keep everyone on the other floors waiting? Quickly, though, elevator software designers realized that even if it’s totally unacceptable to ignore a deliberate foot in the door and start moving, it’s equally wrong to go on closing and re-opening forever. So they added that unpleasant buzzing sound, triggered when the retries reach a certain number or amount of time. Cumulative again.

Real-life error-handling, then, has to do more than test for the limit and reject it. It has to recognize faults, count or measure them, and prevent them from growing and leading to failure. In fact, since a fault may be just the limit of an otherwise acceptable condition (e.g. buffer almost full — OK; buffer overflow — fault), error prevention requires identifying and tracking resources even before they reach their limits.

Written by Tom Harris

July 6, 2009 at 9:26 am

The Tip of the Iceberg

without comments

We all like to think that functional requirements are the main thing, and successfully designing and coding to them is enough. Who wants to worry about all the suprises from users, data, and even hardware?

But as Professor Behrooz Parhami shows, in a short (2-page!) article, Defect, Fault, Error,…, or Failure? (pdf), the “Ideal” state that we focus on is just one of 7 common possibilities. The other 6, descending into unpleasantness, are Defective, Faulty, Erroneous, Malfunctioning, Degraded, and Failed.

Our job is really twofold:

  1. Meet the functional requirements of the ideal state
  2. Keep the system in that ideal state, and avoid failure

Does failure avoidance have to take 86% (6/7) of the code? I don’t know. But it certainly sounds like the bottom half of an iceberg–a lot more than half is underwater.

Don’t get stuck

without comments

Having a standalone consumer application get stuck or crash, requiring reboot, is not the worst thing that can happen. (Worse is incorrect behavior that causes data loss or physical harm.) But requiring a reboot is the most annoying failure in non-safety-critical systems.

If there’s any good news, it’s that the list of fault modes is short:

  • System resources exhausted
  • Mistakenly idling
  • Waiting for acknowledgement that never comes
  • Deadlock

Did I miss any?

Only exception-safe code can avoid these undesired end states.

Design by Contract (DbC) is one way to exception safety.

Failure mode and effects analysis (FMEA) helps you plan a path to get there.