Cancel that!

July 11, 2009

A medical radiation machine operator types the letter ‘x’, realizes it’s an error, backspaces, types ‘e’, and continues. Consequences of the error and related defects lead to the patient’s untimely death.

Ten years later, a Japanese stock broker mistakenly switches the share price and amount on the sell order of a new stock. He tries to cancel the order, but fails. His employers lose $225 million, and are involved in lawsuits for years.

A blogger clicks the “back arrow” by mistake, gets a warning message, concludes he meant to continue writing, and clicks “OK” to continue. He loses his work, and has to rewrite it.  Here, the consequences are annoying, if trivial by comparison.

What do all these cases have in common? Canceling a request. If it’s not hard enough to program computers to do what we want them to do, who would have thought that telling them not to do it would be hard too?

The cancellation scenario — or “use case” as it’s called in software design — is the silent partner of every positive request supported by a piece of software. It has to cover giving the user clear options, executing the cancellation, and rolling back any partial results. Things get more complicated if authorization is required, or if the transaction has already gone through (both of those requirements figured into the Japanese broker error story). Cancellation and rollback are also part of automatic requests that may occur if one software module (the “server”) cannot complete a request by another (the “client”) and has to make sure to put everything back the way it was, and send the proper response code.

So the next time you’re designing a piece of software, no matter how simple, think what it’s supposed to do, but also what it will do if the user or client calls out, “Cancel that!”


Software’s Unwelcome Advantage

July 10, 2009
Software’s Unwelcome Advantage
You can do anything in software. That’s the mantra, and it’s true. It’s why hordes of
eager computer science graduates, not to mention brilliant open source coders, keep
joining the ranks of software development. It’s fast, it’s fun, and you can make a
machine that does anything.
The fact that software is instructions to a machine (the “hard” in “hardware”) seems
to have been the only thing John Tukey had in mind when he coined the word “software”
back in 1958. (Dr. Tukey, an accomplished statistician, was more focused on
computerizing the Fast-Fourier Transform, and criticizing the Kinsey Report for its
questionable sampling methods.)
While software attracts developers with its ease in creating things, it tempts all of
us with its other “softness”: amenability to change. Software can be endless fixed,
extended, improved. And that advantage demands something of developers which was
unexpected, and, well … hard.
Hardware must fit form to function so it’s easy to use. And come with good
documentation for maintenance and repair. But that’s all on the outside. Software,
always ready for change, has to be clear and readable on the inside too. In other
words, software developers have to be good writers, because the next developer will
have to read and quickly understand what’s going on in order to change it. And those
written changes have to leave the software again in a clear and readable state.
Good writers? In high school, I avoided English class like the plague (and got bad
marks for using cliches in my papers), preferring to go into school on snow days to
use the (one) computer. Good writing is not why people become programmers. But it’s
exactly what we need. Clear written communication. Now equal in impact to life-
changing books (pen mightier than sword and all that), software crucially affects our
lives — from cars, to food transport, to the electric grid.
That good writing is unwelcome requirement of sofware is why developers code quickly
and obscurely, hate documentation, and shun code review. And  why managers push for
features, delivery, and fixes, while devaluing internal quality.
Is there hope? The only one I can think of must exploit these other likes and
dislikes: managers want software changes fast, while developers like writing new code
more than fixing someone else’s (or their own) bugs. Good writing is the only way to
make code support that scenario, and reap the real advantages of software.

You can do anything in software. Hordes of eager computer science graduates, not to mention brilliant open source coders, keep joining the ranks of software development because it’s fast and fun.

The fact that software is instructions to a machine (the “hard” in “hardware”) seems to have been the only thing John Tukey had in mind when he coined the word “software” back in 1958. (Dr. Tukey, an accomplished statistician, was more focused on computerizing the Fast-Fourier Transform, and criticizing the Kinsey Report for its questionable sampling methods.)

While software attracts developers with its ease in creating things, it tempts us all with its other “softness”: amenability to change. Software can be endlessly fixed, extended, improved. And that advantage demands something of developers which was unexpected, and, well … hard.

Hardware must fit form to function so it’s easy to use. And come with good documentation for maintenance and repair. But that’s all on the outside. Software, always ready for change, has to be clear and readable on the inside too. In other words, software developers have to be good writers, because the next developer will have to read and quickly understand what’s going on in order to change it.  Written changes, again, have to leave the software in a clear and readable state.

In high school, I avoided English class like the plague (and got bad marks for using cliches in my papers), preferring to go into school on snow days to use the (one) computer. Good writing is not why people become programmers. But it’s exactly what we need. Clear written communication. Equal in impact to life-changing books (pen mightier than sword and all that), software crucially affects our lives — from cars, to food transport, to the electric grid.

That good writing is an unwelcome requirement of sofware is why developers code quickly and obscurely, hate documentation, and shun code review. And  why managers push for features, delivery, and fixes, while devaluing internal quality.

Is there hope? The only one I can think of must exploit other likes and dislikes: managers want software changes fast, while developers like writing new code more than fixing someone else’s (or their own) bugs. Good writing is the only way to make code support that scenario, and reap the advantages of software.


Errors are always cumulative

July 6, 2009
Errors are always cumulative
Nobody likes to write error-handling code, but at least it’s easy (if boring): check inputs and results with “if” statements, and reject or recover on failure. But is it really that simple?
A little thought shows that errors are cumulative, and that failure is always the gathering or intensification of some faulty condition. Let’s prove that by contradiction. One of the simplest and most common cases of error handling is input data checking in a user interface. For example, password checking for your bank account login. The simple error-handling code is:
if username.password <> password then reject login
That should work fine, right? Nothing cumulative there. Every time I submit a wrong password, it tells me “wrong password” and prompts for a retry. But software developers (and many bank website users) will recognize the problems with that solution, among them:
1. Unbounded retry loop — if I keep getting it wrong, I can’t escape login
2. Denial of service — bring down the server by overwhelming it with bad logins
3. Eliminating wrong passwords — if it tells me “wrong”, I cross that try off my list
All of these real outcomes have something cumulative in them:
1. Time — user may run out of patience
2. Load — too much for server to handle
3. Learning — revealing more and more information about correct password
Take another common example: the elevator. Would simple limit-checking work for stopping at the right floor? Let’s try.
if floor.location <> floor.desired then keep descending
Bang! I wouldn’t want to be on that elevator. What’s cumulative? The momentum of the elevator, and the decreasing distance between current and desired location. (Even though nothing is wrong, that distance is commonly called “error”.) But a better example is the elevator’s door-closing protection. It started out with a simple:
if not (door.can_close) then door.re-open
Wasn’t it fun back then to keep waving your hand in the door and watch it open again, and keep everyone on the other floors waiting? Quickly, though, elevator software designers realized that even if it’s totally unacceptable to ignore a deliberate foot in the door and start moving, it’s equally wrong to go on closing and re-opening forever. So they added that unpleasant buzzing sound, triggered when the retries reach a certain number or amount of time. Cumulative again.
Real-life error-handling, then, has to do more than test for the limit and reject it. It has to recognize faults, count or measure them, and prevent them from growing and leading to failure. In fact, since a fault may be just the limit of an otherwise acceptable condition (e.g. buffer almost full — OK; buffer overflow — fault), error prevention requires identifying and tracking resources even before they reach their limits.

Nobody likes to write error-handling code, but at least it’s easy: check inputs and results with “if” statements, and reject or recover on failure. But is it really that simple?

A little thought shows that errors are cumulative, and that failure is always the gathering or intensification of some faulty condition. Let’s prove that by contradiction. One of the simplest and most common cases of error handling is input data checking in a user interface. For example, password checking for your bank account login. The simple error-handling code is:

if username.password <> password login.reject(”wrong password”)

That should work fine, right? Every time I submit a wrong password, it rejects the attempt and prompts for a retry. But software developers (and many bank website users) will recognize the problems with that solution, among them:

  1. Unbounded retry loop — if I keep getting it wrong, I can’t escape login
  2. Denial of service — bring down the server by overwhelming it with bad logins
  3. Eliminating wrong passwords — if it tells me “wrong”, I cross that try off my list

All of these real outcomes have something cumulative in them:

  1. Time — user may run out of patience
  2. Load — too much for server to handle
  3. Learning — revealing more and more information about correct password

Take another common example: the elevator. Would simple limit-checking work for stopping at the right floor? Let’s try.

if floor.location <> floor.desired elevator.descend

Bang! I wouldn’t want to be on that elevator. What’s cumulative? The momentum of the elevator, and the decreasing distance between current and desired location. (Even when nothing is wrong, that distance is commonly called “error”.) But a better example is the elevator’s door-closing protection. It started out with a simple:

if not (door.can_close) door.re-open

Wasn’t it fun back then to keep waving your hand in the door and watch it open again, and keep everyone on the other floors waiting? Quickly, though, elevator software designers realized that even if it’s totally unacceptable to ignore a deliberate foot in the door and start moving, it’s equally wrong to go on closing and re-opening forever. So they added that unpleasant buzzing sound, triggered when the retries reach a certain number or amount of time. Cumulative again.

Real-life error-handling, then, has to do more than test for the limit and reject it. It has to recognize faults, count or measure them, and prevent them from growing and leading to failure. In fact, since a fault may be just the limit of an otherwise acceptable condition (e.g. buffer almost full — OK; buffer overflow — fault), error prevention requires identifying and tracking resources even before they reach their limits.


Really simple code review tool

June 17, 2009

I know what I want: highlight-and-comment capability in a text editor. And since plain text really shouldn’t be marked up, I’d want it to save the comments in a separate text file, with line numbers of what was commented on, and the comments’ text. I’m not the first one looking for this capability: “climbanymountain” asked about it on Joel on Software back in 2006: “win32 text editor for code review?

I started thinking about writing a plugin for some popular text editor, such as Notepad++ or Notepad2 (the latter is my favorite for simple editing). But before the “build or buy” decision, I thought I’d ask around the office.

A lot of ideas, but here was the simplest. Light and off-the-shelf. Use a diff tool. 

The following procedure works fine, and “supports” any version control system. In my example, I’m using open-source WinMerge as my diff tool, IBM Rational ClearCase as my version control system, and Microsoft Windows Explorer as my tree browser. Say the source file to review is called “MyFile.c”, and the reviewer’s name is Firstname Lastname.

The parts in square brackets are optional: omit them if you don’t want to clutter up your version control system with code review records, but just want to use them to prepare for sitting with code author.

  1. Browse to the source code file I want to review; [check it out and] open it.
  2. Immediately Save As MyFile_flastname_review.c in the same directory. Close the source code file.
  3. Select the source code file and the review file (they are next to each other in alphabetical order), right-click and open WinMerge on them.
  4. Go to the pane of the review file and type review comments anywhere I want. Save regularly with CTRL-S. (Optionally, for better syntax highlighting, can precede comments with comment character such as “//”. But don’t have to.)
  5. Generate a diff (patch) file, call it the same name as the review file, but add .diff on the end. (In WinMerge, that’s on the menu with Tools > Generate a patch.)
  6. [Check in all 3 files (adding 2 new ones to source control) -- check-in activity per defect tracking system, and comment with one-line summary of the review.]

From this point, code review comments are easily viewable side-by-side with the code by viewing in WinMerge or similar viewer.

Also, the diff file clearly identifies the line number commented on, and the comment text. I figure I could write an AWK script to turn that into CSV, for tabular display and tracking in Microsoft Excel, if I wanted to.


The Tip of the Iceberg

May 10, 2009

We all like to think that functional requirements are the main thing, and successfully designing and coding to them is enough. Who wants to worry about all the suprises from users, data, and even hardware?

But as Professor Behrooz Parhami shows, in a short (2-page!) article, Defect, Fault, Error,…, or Failure? (pdf), the “Ideal” state that we focus on is just one of 7 common possibilities. The other 6, descending into unpleasantness, are Defective, Faulty, Erroneous, Malfunctioning, Degraded, and Failed.

Our job is really twofold:

  1. Meet the functional requirements of the ideal state
  2. Keep the system in that ideal state, and avoid failure

Does failure avoidance have to take 86% (6/7) of the code? I don’t know. But it certainly sounds like the bottom half of an iceberg–a lot more than half is underwater.


Don’t get stuck

May 7, 2009

Having a standalone consumer application get stuck or crash, requiring reboot, is not the worst thing that can happen. (Worse is incorrect behavior that causes data loss or physical harm.) But requiring a reboot is the most annoying failure in non-safety-critical systems.

If there’s any good news, it’s that the list of fault modes is short:

  • System resources exhausted
  • Mistakenly idling
  • Waiting for acknowledgement that never comes
  • Deadlock

Did I miss any?

Only exception-safe code can avoid these undesired end states.

Design by Contract (DbC) is one way to exception safety.

Failure mode and effects analysis (FMEA) helps you plan a path to get there.


Software Developer

May 7, 2009

That’s right, not “development” but “developer”.

The latest issue of The Embedded Muse newsletter—#179 (pdf)—has a reference to an interesting and very different kind of book about C. It’s about the business pressures and the thought processes of C software developers. And a lot more. It’s also long—1600 pages! So download a copy and skim for what interests you:

The New C Standard: An Economic and Cultural Commentary

Some lines that caught my eye, just in the first 100 pages, skimming with the “Page Down” key:

“Software developer interaction with source code occurs over a variety of timescales [for example, those] whose timescale are measured in seconds.”

“The act of reading and writing software has an immediate personal cost.”

“Source code faults are nearly always clichés; that is, developers tend to repeat the mistakes of others and their own previous mistakes.”

“A list of possible deviations should be an integral part of any coding guideline.”

“The following are different ways of reading source code, as it might be applied during code reviews …”

Other related references from the book’s author Derek M. Jones (http://www.knosof.co.uk/):

And somehow related, from another author, Les Hatton (http://www.leshatton.org/):


Simple Use Cases

May 1, 2009

Google “use case format” and you’ll get lots of templates. Which one to use? Here’s my really simple template that people around me have started using:

  • Who I am (role)
  • What I’m doing (current task)
  • What I want (current goal)
  • What I type (into the system being defined)
  • What I expect to get (out of the system being defined)

True, it doesn’t address pre-conditions, post-conditions, extensions, and more. But it serves well for high-level discussions, and gives a clear understanding of what use cases are for. After using this template for a while, go to Google and get one of those fancier ones if you want.


Keeping Embedded Software on Track

April 23, 2009

Consider a function in software code, and how to write it properly so that it meets its requirements and doesn’t fail. The standard way of looking at it is that a function has an API — a signature — of input and output variables and types. The code for the function has a pretty standard form. There are checks for invalid input, and a function body which implements a mini-flowchart of conditional checks, and maybe a state machine if needed. All this to take every given input vector to the correct output.

Unit tests, whether they’re written before coding (test-driven development), or after, apply test input vectors to the function and check if the expected output is produced. The same story applies to a module: a collection of functions or classes that implements a feature. There are still valid inputs to check if they give correct output, and invalid inputs that must be rejected. Best-practice test design methods, such as boundary-value checking and equivalence classes, help the developer choose the best test vectors from the sea of possibilities. The ones that will produce good tests that cause failures early, while the code is still in the hands of the developer.

So why do we still have so many “surprise” failures? Unit-tested code that nevertheless comes back from system testing with failures? And the Steps to Reproduce — so simple — like, “I left the system running overnight and when I came back in the morning it had crashed.”

The answer may be in the flow of data. Functions and modules in embedded software don’t just deal with single inputs one by one. Generally they are expected to process streams of data in real time. For example, a modern television set-top box, with a built-in disk, has to process multiplexed video data and metadata, the same from the hard disk, as well as a much slower stream of user input via the TV remote control. If the code slips up, or leaks memory, sooner or later, you get a wrong, stuck, or crash situation that QC proudly reports.

Where have we seen this before? How about those robot car competitions that university engineering departments often hold? Contests which have expanded outward to worldwide challenges to design, for example, an auto-piloted car. (See “The DARPA Urban Challenge“). These contests demand software that will keep a driverless car on track for long periods of time, avoiding stationary and moving obstacles (other cars).

Could it help, then, to see a lowly function in embedded software not as a flowchart of if-then-else statements, but as a little car — or perhaps delivery truck– that must be kept on track, avoiding obstacles in the data stream while delivering packages to the right place? How would that view affect code layout, code review, and unit test design? See you at the track!


Zen and the Art of Boyle

April 17, 2009

“Britain’s Ugly Duckling Breaks Out in Song” I slowly translated off the showbiz page of a foreign-language newspaper. I had to work at it to figure out that it said “Susan Boyle” and then look up her appearance on YouTube from a week ago. Anyone who wants to be moved by song, and doesn’t mind (or enjoys) the contrast with “beautiful people” celebrity judges putting feet in their mouths, should stop and listen.

I thought I would have nothing to add to the commentary on a musical appearance that saw over two million views (and that’s just on one upload, let alone the original TV broadcast). But after reading the 5-day-old Wikipedia entry, and some of the newspaper articles in the references, I wondered why nobody offered the obvious. Good singing comes from interpretive ability, soul, and poise. Anyone who has enjoyed opera would have no reason to be surprised–and every reason to be moved–by Boyle’s voice. Similarly by that of her “predecessor” Paul Potts. It is the furthest thing from coincidence that Potts sang opera, and Boyle sang from a musical, both genres that are formally performed live without a microphone.

So, while Tanya Gold and others may not be wrong in their social analyses, they are missing the point. Song, in contrast to the child of yesteryear, is to be heard and not seen.