Talk About Quality

Tom Harris

Errors are always cumulative

Nobody likes to write error-handling code, but at least it’s easy: check inputs and results with “if” statements, and reject or recover on failure. But is it really that simple?

A little thought shows that errors are cumulative, and that failure is always the gathering or intensification of some faulty condition. Let’s prove that by contradiction. One of the simplest and most common cases of error handling is input checking in a user interface, for example the password check at your bank account login. The simple error-handling code is:

if username.password <> password then login.reject("wrong password")

That should work fine, right? Every time I submit a wrong password, it rejects the attempt and prompts for a retry. But software developers (and many bank website users) will recognize the problems with that solution, among them:

  1. Unbounded retry loop — if I keep getting it wrong, I can’t escape login
  2. Denial of service — bring down the server by overwhelming it with bad logins
  3. Eliminating wrong passwords — if it tells me “wrong”, I cross that try off my list

All of these real outcomes have something cumulative in them:

  1. Time — user may run out of patience
  2. Load — too much for server to handle
  3. Learning — revealing more and more information about correct password
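
To make the cumulative part concrete, here is a minimal Python sketch of a login check that counts failures instead of judging each attempt in isolation. Everything in it is my assumption for illustration: the `LoginGuard` name, the limit of 5 failures, the 300-second lockout, and the plain-text password table (a real system would store hashes). It bounds the retry loop and gives the same answer for an unknown user and a wrong password; it does nothing about server-wide load, which would need rate limiting in front of it.

```python
import hmac
import time

MAX_FAILURES = 5        # assumed limit, not from the article
LOCKOUT_SECONDS = 300   # assumed lockout window, not from the article


class LoginGuard:
    """Hypothetical login check that tracks failures per user."""

    def __init__(self, passwords, clock=time.monotonic):
        self._passwords = passwords   # username -> password (plain text only to keep the sketch short)
        self._failures = {}           # username -> consecutive failed attempts
        self._locked_until = {}       # username -> clock value when the lockout ends
        self._clock = clock

    def attempt(self, username, password):
        now = self._clock()
        if now < self._locked_until.get(username, float("-inf")):
            return "locked"           # the retry loop has an exit
        expected = self._passwords.get(username, "")
        # compare_digest avoids leaking information through comparison time
        ok = username in self._passwords and hmac.compare_digest(
            expected.encode(), password.encode()
        )
        if ok:
            self._failures.pop(username, None)
            return "ok"
        count = self._failures.get(username, 0) + 1
        self._failures[username] = count
        if count >= MAX_FAILURES:
            # the accumulated failures, not the single attempt, trigger the lockout
            self._locked_until[username] = now + LOCKOUT_SECONDS
            self._failures[username] = 0
        return "rejected"             # same answer for unknown user and wrong password
```

The injectable `clock` is only there so the lockout can be exercised without waiting five minutes.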

Take another common example: the elevator. Would simple limit-checking work for stopping at the right floor? Let’s try.

if floor.location <> floor.desired then elevator.descend

Bang! I wouldn’t want to be on that elevator. What’s cumulative? The momentum of the elevator, and the shrinking distance between its current and desired location. (Even when nothing is wrong, that distance is commonly called “error”.) But a better example is the elevator’s door-closing protection. It started out with a simple check:

if not door.can_close then door.reopen

Wasn’t it fun back then to keep waving your hand in the door and watch it open again, and keep everyone on the other floors waiting? Quickly, though, elevator software designers realized that even if it’s totally unacceptable to ignore a deliberate foot in the door and start moving, it’s equally wrong to go on closing and re-opening forever. So they added that unpleasant buzzing sound, triggered when the retries reach a certain number or amount of time. Cumulative again.
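
Here is roughly what the counting version of the door check might look like, as a Python sketch. The `DoorController` name and the threshold of 3 re-opens are hypothetical (I only know that the buzzer starts after "a certain number"), and a real controller would probably track elapsed time as well.

```python
from dataclasses import dataclass

MAX_REOPENS = 3  # assumed threshold, not a real elevator setting


@dataclass
class DoorController:
    """Hypothetical door logic that remembers how often it had to re-open."""

    reopens: int = 0
    buzzer_on: bool = False

    def try_close(self, obstructed: bool) -> str:
        """Make one closing attempt and return the action taken."""
        if not obstructed:
            # a clean close resets the accumulated state
            self.reopens = 0
            self.buzzer_on = False
            return "closed"
        self.reopens += 1
        if self.reopens >= MAX_REOPENS:
            # escalate on the accumulated count, but still never close on an obstruction
            self.buzzer_on = True
        return "reopened"
```

The single-attempt rule is unchanged: an obstructed door always re-opens. What is new is the counter, which turns the third wave of the hand into a different event from the first.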

Real-life error handling, then, has to do more than test for the limit and reject it. It has to recognize faults, count or measure them, and prevent them from growing and leading to failure. In fact, since a fault may be just the limit of an otherwise acceptable condition (e.g. buffer almost full: OK; buffer overflow: fault), error prevention requires identifying and tracking resources even before they reach their limits.
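
The buffer case can be sketched the same way. In this Python example the `MonitoredBuffer` class, its `warn_at` threshold, and the three status names are all my own illustration, not any particular library's API. The point is only that the buffer reports that it is filling up before the overflow check ever fires.

```python
from collections import deque


class MonitoredBuffer:
    """Hypothetical bounded buffer that reports pressure before it overflows."""

    def __init__(self, capacity, warn_at):
        self._items = deque()
        self.capacity = capacity   # hard limit: reaching it is the fault
        self.warn_at = warn_at     # assumed early-warning level, below capacity

    def status(self):
        n = len(self._items)
        if n >= self.capacity:
            return "full"
        if n >= self.warn_at:
            return "warning"       # still acceptable, but the condition is building up
        return "ok"

    def put(self, item):
        """Add an item; return False instead of overflowing."""
        if len(self._items) >= self.capacity:
            return False
        self._items.append(item)
        return True

    def get(self):
        """Remove and return the oldest item."""
        return self._items.popleft()
```

A producer that checks `status()` and slows down on "warning" is handling the error while it is still only a trend.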

Written by Tom Harris

July 6, 2009 at 9:26 am

Posted in Exception handling
