Archive for the ‘Design’ Category
Software is Writing
Software is not construction
Software is not construction, no matter how popular that statement may be. A program is not a building. Parts are not chosen first and then fitted into place. Design continues almost to the last minutes of coding. Further, almost nobody spends time creating new code, at least not for very long. Sure there are new products and new codebases. But even then, after a short period, most additional coding is extending or changing an existing codebase. So if coding is not construction, and mostly involves changing what’s already created, what is it?
Well, coding is writing. It’s using text to communicate a message. That may not sound like much, but understanding this helps a lot. Instead of struggling to make up new ways of thinking and learning, as if programming were a completely new human activity, we can look at how good writers write. People have been writing for many hundreds of years, and have become pretty good at it.
Learning to write
How do people get into writing? Do we start, as most do in programming, full grown, writing essays? No, not at all. We start as children, and learn first to talk, then to read, and finally to write. Yes, I know that school seems to teach writing, at least forming the letters, before reading. But it’s an illusion: listening to a bedtime story is essentially reading – paying attention to someone else’s writing, and understanding it – and comes before writing. Most people don’t try writing that communicates a message – expository prose – until secondary school or later.
Good writers have their audience in mind at all times. They edit their writing through several drafts, as well as getting others to review it, and give written comments. And they talk about their writing. Maybe not always directly, but when a political columnist goes to a dinner party, you can bet he’ll talk politics. When a novelist chats in the park, she’s talking about the same human struggles that will later appear in her books. Even for the solitary writer typing in a darkened room, the life that feeds the writing is one of talking with, listening to, and reading about people.
Software is Writing
Now back to software: software is not like writing, it is writing. In our industrial setting, it is too late or too radical to suggest re-teaching programming, by teaching first how to talk, then to read, then to write programs. But we can highlight where daily programming work already involves each of these three activities, and where it doesn’t, but should.
Talking includes formal design reviews and code reviews, but also whiteboard conversations and hallway arguments. How should I write this here? Why is that data structure that way? Why does the compiler give that warning? Don’t just think alone – talk it out with someone nearby. It is the same with tests. Why does that scenario fail intermittently? How should I write it differently? And even for defect entries -what does the submitter really mean? Phone him up. Why does this change fix the problem? Find the developer and ask.
Reading is for design documents, training materials and perhaps professional books. But mainly, it includes reading lots of existing code. That’s reading it until you’re really sure what it does, so you know how to change it, fix it, extend it. There is also code review — where the author asks you to read their code, not just to say that it looks fine, but to identify what’s wrong, or unclear.
Writing is the coding, of course. But who is the audience? There are two audiences. One, the compiler, which is your personal translator, from a non-native language (nobody’s mother spoke to them in C), to an almost unknown language – the machine language that the hardware actually understands. The compiler is a rigid, simple-minded, sometimes unpredictable audience, just a ghost of the person who wrote it. Here you have to be very precise, careful, and limit yourself to what the compiler understands and to what the hardware will accept. The other audience, is so very different – the next developer. This may be yourself a few weeks later, or someone else, but there will always be another developer, coming back months or years later, to fix or extend the code. She wants comments, explanations, meaningful names for things, and an easily readable structure. Because now you’re long gone, she can’t talk to you: she must read, so you must write.
Tools!
Well, with all this talk about programming being writing, let’s come back to earth. Software developers are technical people, and expect to use the best tools to do their jobs. There’s the whiteboard in the hallway — we need more of those, for talking. Defect tracking systems, where we can make sure we write clear resolution notes. And for automatically sharing and discussing code, web-hosted code tools for static analysis and code review are the way to go.
–
Originally published in company internal newsletter. Edited and released here with permission.
Cancel that!
A medical radiation machine operator types the letter ‘x’, realizes it’s an error, backspaces, types ‘e’, and continues. Consequences of the error and related defects lead to the patient’s untimely death.
Ten years later, a Japanese stock broker mistakenly switches the share price and amount on the sell order of a new stock. He tries to cancel the order, but fails. His employers lose $225 million, and are involved in lawsuits for years.
A blogger clicks the “back arrow” by mistake, gets a warning message, concludes he meant to continue writing, and clicks “OK” to continue. He loses his work, and has to rewrite it. Here, the consequences are annoying, if trivial by comparison.
What do all these cases have in common? Canceling a request. If it’s not hard enough to program computers to do what we want them to do, who would have thought that telling them not to do it would be hard too?
The cancellation scenario — or “use case” as it’s called in software design — is the silent partner of every positive request supported by a piece of software. It has to cover giving the user clear options, executing the cancellation, and rolling back any partial results. Things get more complicated if authorization is required, or if the transaction has already gone through (both of those requirements figured into the Japanese broker error story). Cancellation and rollback are also part of automatic requests that may occur if one software module (the “server”) cannot complete a request by another (the “client”) and has to make sure to put everything back the way it was, and send the proper response code.
So the next time you’re designing a piece of software, no matter how simple, think what it’s supposed to do, but also what it will do if the user or client calls out, “Cancel that!”
Errors are always cumulative
Nobody likes to write error-handling code, but at least it’s easy: check inputs and results with “if” statements, and reject or recover on failure. But is it really that simple?
A little thought shows that errors are cumulative, and that failure is always the gathering or intensification of some faulty condition. Let’s prove that by contradiction. One of the simplest and most common cases of error handling is input data checking in a user interface. For example, password checking for your bank account login. The simple error-handling code is:
if username.password <> password login.reject(“wrong password”)
That should work fine, right? Every time I submit a wrong password, it rejects the attempt and prompts for a retry. But software developers (and many bank website users) will recognize the problems with that solution, among them:
- Unbounded retry loop — if I keep getting it wrong, I can’t escape login
- Denial of service — bring down the server by overwhelming it with bad logins
- Eliminating wrong passwords — if it tells me “wrong”, I cross that try off my list
All of these real outcomes have something cumulative in them:
- Time — user may run out of patience
- Load — too much for server to handle
- Learning — revealing more and more information about correct password
Take another common example: the elevator. Would simple limit-checking work for stopping at the right floor? Let’s try.
if floor.location <> floor.desired elevator.descend
Bang! I wouldn’t want to be on that elevator. What’s cumulative? The momentum of the elevator, and the decreasing distance between current and desired location. (Even when nothing is wrong, that distance is commonly called “error”.) But a better example is the elevator’s door-closing protection. It started out with a simple:
if not (door.can_close) door.re-open
Wasn’t it fun back then to keep waving your hand in the door and watch it open again, and keep everyone on the other floors waiting? Quickly, though, elevator software designers realized that even if it’s totally unacceptable to ignore a deliberate foot in the door and start moving, it’s equally wrong to go on closing and re-opening forever. So they added that unpleasant buzzing sound, triggered when the retries reach a certain number or amount of time. Cumulative again.
Real-life error-handling, then, has to do more than test for the limit and reject it. It has to recognize faults, count or measure them, and prevent them from growing and leading to failure. In fact, since a fault may be just the limit of an otherwise acceptable condition (e.g. buffer almost full — OK; buffer overflow — fault), error prevention requires identifying and tracking resources even before they reach their limits.
The Tip of the Iceberg
We all like to think that functional requirements are the main thing, and successfully designing and coding to them is enough. Who wants to worry about all the surprises from users, data, and even hardware?
But as Professor Behrooz Parhami shows, in a short (2-page!) article, Defect, Fault, Error,…, or Failure? (pdf), the “Ideal” state that we focus on is just one of 7 common possibilities. The other 6, descending into unpleasantness, are Defective, Faulty, Erroneous, Malfunctioning, Degraded, and Failed.
Our job is really twofold:
- Meet the functional requirements of the ideal state
- Keep the system in that ideal state, and avoid failure
Does failure avoidance have to take 86% (6/7) of the code? I don’t know. But it certainly sounds like the bottom half of an iceberg–a lot more than half is underwater.
Don’t get stuck
Having a standalone consumer application get stuck or crash, requiring reboot, is not the worst thing that can happen. (Worse is incorrect behavior that causes data loss or physical harm.) But requiring a reboot is the most annoying failure in non-safety-critical systems.
If there’s any good news, it’s that the list of fault modes is short:
- System resources exhausted
- Mistakenly idling
- Waiting for acknowledgement that never comes
- Deadlock
Did I miss any?
Only exception-safe code can avoid these undesired end states.
Design by Contract (DbC) is one way to exception safety.
Failure mode and effects analysis (FMEA) helps you plan a path to get there.