Archive for the ‘Testing’ Category
Recently I went to a workshop on Exploratory Testing given by John Stevenson. Just as exploratory testing finds the unexpected, so I learned something unexpected. Every time John used the word “testing”, he meant it as (re)defined in Michael Bolton’s Testing vs. Checking. It’s a powerful redefinition, and simpler to say than what I’ve always meant by “adversarial testing”. Here’s how the redefinition works:
- Notice that many tests don’t find defects
- Decide we need a new definition for the word “test”
- But the current definition is useful, so move it to the word “check”
- Now the word “test” is open for redefinition
- Define “test” as in Testing vs. Checking
Quick summary of new and useful definition:
Let “test” mean an exercise of the software under test that helps a thinking tester answer the question, “Is there a problem here?”
Recently I became the unwitting user of a new piece of consumer software, when our office upgraded our mobile phones. Suddenly, with every phone call, appointment reminder, and even when charging the phone, I’m acutely aware of what I want from this piece of embedded software that I did not even buy. As a software user I want only two things: more or better features, and problems fixed. I don’t care what version the software is — I care what’s changed. It would make sense, then, that software developers should focus on changes, not versions, and that tools should support them.
The most common tool on a software development project is always the “bug tracker” or defect tracking system. It’s really a change request tracking system, with two kinds of requests for changes: enhancements (performance improvement or new feature requests) and defects (fix requests). Tracking requests for changes. That’s good.
Now look at the other major tool in every software development group — it’s usually called a “version control system”. But see above — users don’t care about versions. And developers have to report what they’ve done to the defect tracking system, which is full of requests for changes, not requests for versions. So why would we want a version control system?
That’s why the light went on for me when I read Joel Spolsky’s Hg Init: a Mercurial tutorial. Not because Mercurial (or Git) makes branching cheap and merges easy. (It does.) But because, as Spolsky writes:
Subversion likes to think about revisions. A revision is what the entire file system looked like at some particular point in time. In Mercurial, you think about changesets. A changeset is a concise list of the changes between one revision and the next revision. (When Spolsky writes “revisions”, read “versions”.)
Now I understand why there’s so much trouble connecting our defect (change) tracking and version control workflows. Testing too — what we really want to know is, “Which change broke the build?” And bringing in code review, more problems, because people want to review whether each change accomplishes the goal of adding or fixing something. Software development is all about changes: specifying them, designing them, coding them, testing them, and delivering them. Not versions. Changes. Sure, versions have a place in software delivery — they are labels for the software after a certain number of changes. (Mercurial supports them, as “tags”.) So if you want all the tools to work together and make your software development life easier, switch to a software repository that lets you control changes.
What should we call them? Maybe “software changeset control systems”. The two big ones these days are Mercurial and Git. For Mercurial, see Spolsky’s tutorial. For Git, listen to Linus Torvalds on git.
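The distinction is easy to sketch in a few lines of code (a toy model of my own, not how Mercurial actually stores data): history is a sequence of changesets, and a version is nothing more than a label on a point in that history.

```python
# Toy model: history as changesets, a "version" as a mere label (tag).
history = [
    {"id": 1, "change": "add login form", "fixes": None},
    {"id": 2, "change": "fix crash on empty password", "fixes": "BUG-101"},
    {"id": 3, "change": "add password reset", "fixes": None},
]

# A version is just a label for the software after a certain number of
# changes -- here, a tag pointing at a changeset id.
tags = {"v1.0": 2}

def changes_since(tag):
    """What a user actually asks: what changed since the release I have?"""
    return [c["change"] for c in history if c["id"] > tags[tag]]

print(changes_since("v1.0"))  # ['add password reset']
```

Notice that the question users and testers care about (“what changed since v1.0?”) falls out naturally when changes, not versions, are the unit of storage.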
Automated testing is essential for test-driven development, and for regression testing. But remember:
- If you have a system that runs automated tests, look at the results. Leaving it on “FAIL” won’t help.
- Supplement with adversarial testing, to catch what only the human eye sees
Still not sure?
Okay, what test was ignored or missed before releasing this context-sensitive advertising software?
(all relevant credit to The New York Times)
Google’s “Mr. Automated Testing”, Miško Hevery, talks about Software Testing Categorization. From the title, it doesn’t sound like much, but it prompted a lot of good discussion. Some excerpts on key points, edited for length and clarity:
Know what you’re talking about
You hear people talking about small/medium/large/unit/integration/functional/scenario tests but do most of us really know what is meant by that? — Miško Hevery
Need for speed
Let’s start with unit test. The best definition I can find is that it is a test which runs super-fast (under 1 ms) and when it fails you don’t need debugger to figure out what is wrong. — Miško Hevery
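Taken strictly, a unit test in Hevery’s sense might look like this sketch: pure logic, no I/O and no setup, so it runs in microseconds, and a failing assert points straight at the defect with no debugger needed. (Plain asserts here; any xUnit framework would do.)

```python
# The function under test: pure, no dependencies, nothing to mock.
def word_count(text):
    return len(text.split())

# A unit test in the strict sense: sub-millisecond, and if it fails,
# the failing assert line alone tells you what is wrong.
def test_word_count():
    assert word_count("") == 0
    assert word_count("one two three") == 3
    assert word_count("  spaced   out  ") == 2

test_word_count()  # raises AssertionError at the offending line if broken
```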
Make slow tests faster by running them in the background on spare machine cycles. Works for static analysis tools too. For example, Riverblade’s Visual Lint add-on for Visual Studio and Gimpel Software’s PC-lint. — Talk About Quality
I heard a senior test manager present at a conference who espoused the sophistication of his automated test regime (on a mainframe no less). Countless tests ran every evening, unattended. Lots of clapping in the audience. In the break I met a guy who worked for the presenter. He asked, “do you want to know why the tests run so fast? We took out all the comparison checks”. — Paul Gerrard
You’re right to have a healthy concern. Whether automated tests run in a millisecond or in several hours, they always have to be re-reviewed to see if they test something useful. Passing tests are especially dangerous. Everyone thinks green is great so they don’t look at the tests to discover that features have moved on and the tests don’t test anything. Automated tests are like automatic shift (does anyone still drive stick like I do?): very nice and we take it for granted, but when you press on the gas you still have to look where you’re going so you don’t cause an accident. — Talk About Quality
Test Planning applies to automated testing too
At my previous employer, the standard unit tests took upwards of an hour (for fewer than 2000 tests). This discouraged developers from running them, which resulted in broken builds for everyone. When we worked on fixing this, mock objects provided most of the salvation. However, a good portion of the tests were really integration tests and by correctly identifying them as such, we were able to remove them from the standard suite, and instead run them nightly. What Misko is missing from his post is how test classifications go hand in hand with scheduling tests. — TheseThingsNeedFixed
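The classification-plus-scheduling idea in that comment can be sketched like this (the decorator and suite names are illustrative, not from any particular framework): tag each test with a category, run only the unit tests on every build, and save the rest for the nightly run.

```python
# Tag tests with a category so the runner can schedule them differently.
def category(name):
    def mark(fn):
        fn.category = name
        return fn
    return mark

@category("unit")
def test_parse_header():
    assert "Host".lower() == "host"

@category("integration")
def test_database_roundtrip():
    # would talk to a real database; too slow to run on every commit
    assert True

ALL_TESTS = [test_parse_header, test_database_roundtrip]

def run(suite, tests=ALL_TESTS):
    """Standard suite runs only unit tests; nightly runs everything."""
    ran = []
    for t in tests:
        if suite == "nightly" or t.category == "unit":
            t()
            ran.append(t.__name__)
    return ran

print(run("standard"))  # only the fast unit test
print(run("nightly"))   # everything, including integration tests
```

The point is that classification is only useful if the scheduler honors it: the fast suite stays fast because the slow tests are identified, not deleted.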
Thanks to Bulkan Evcimen for tweeting about an article I doubt I would have found otherwise: “A Genetic Programming Approach to Automated Software Repair” (pdf link here), by Stephanie Forrest et al. They describe how they fed a genetic programming algorithm the following inputs:
- Source code with a known defect
- A negative test case
- Several positive test cases
The authors show how the genetic algorithm repairs the defect so that the software under test passes the previously failing negative test case, while still passing the positive test cases. They complete the activity with additional software that reviews the change and reduces it to a minimal patch.
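A drastically simplified sketch of the idea (my own toy code, not the authors’ implementation): mutate the program’s syntax tree at random, and use the negative and positive test cases together as the fitness function. Here the only mutation operator is flipping comparisons, which happens to be enough to repair this planted defect.

```python
import ast
import copy
import random

# The buggy source: returns the smaller value instead of the larger one.
BUGGY_SRC = """
def bigger(a, b):
    if a < b:      # defect: comparison is inverted
        return a
    return b
"""

# One negative test case (currently failing) and several positive ones.
NEGATIVE = [((2, 5), 5)]
POSITIVE = [((7, 3), 7), ((0, 0), 0), ((-1, -4), -1)]

class FlipComparisons(ast.NodeTransformer):
    """A tiny mutation operator: randomly swap < and > in comparisons."""
    def visit_Compare(self, node):
        if random.random() < 0.5:
            for i, op in enumerate(node.ops):
                if isinstance(op, ast.Lt):
                    node.ops[i] = ast.Gt()
                elif isinstance(op, ast.Gt):
                    node.ops[i] = ast.Lt()
        return node

def fitness(tree):
    """Fitness = number of test cases the candidate program passes."""
    ns = {}
    exec(compile(ast.fix_missing_locations(tree), "<candidate>", "exec"), ns)
    f = ns["bigger"]
    score = 0
    for args, want in NEGATIVE + POSITIVE:
        try:
            if f(*args) == want:
                score += 1
        except Exception:
            pass
    return score

def repair(src, generations=50):
    """Generate mutants until one passes every test case, or give up."""
    base = ast.parse(src)
    goal = len(NEGATIVE) + len(POSITIVE)
    for _ in range(generations):
        candidate = FlipComparisons().visit(copy.deepcopy(base))
        if fitness(candidate) == goal:
            return candidate
    return None

random.seed(0)  # deterministic for the example
fixed = repair(BUGGY_SRC)
print(fixed is not None)  # a repaired variant was found
```

The real system is far richer (it mutates by copying and deleting statements drawn from elsewhere in the program, and then minimizes the patch), but the shape is the same: the test suite is the oracle, and “repair” means maximizing the number of tests passed.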
Is this the end of the maintenance programmer? If so, this time there would be a happy ending: John Henry lives, and goes on to be a railroad engineer.
A magnifying glass brings us closer to the subject — allows us to examine the details. A prism, on the other hand, separates the rainbow of colors that are otherwise hidden in white light.
What if testing were separated from coding?
While there’s a popular trend these days towards combining them again in a new way — Test-Driven Development — many software organizations still have separate testing groups at the product or system level. We can ask whether that’s good or bad, and why, but that’s an ongoing argument.
I’d like to ask a less familiar “what if” question.
What if debugging were separated from coding?
Imagine a software development organization where developers write code, another group tests it (that’s QC), and then the developers fix what QC has found. Now imagine that a separate group does all the debugging: all the figuring out what’s wrong, setting breakpoints, reading logs, doing traces, etc. In short, the troubleshooting. Developers are not allowed to do that. The debugging team are experts in the codebase, and they narrow down each problem all the way to identifying where to fix the code and how. Then they return their results to the original coders for use.
No, I’m not suggesting such a process. Rather, I’ve been surprised to see so many books about professional programming say that debugging is an essential skill for a programmer. If it’s such an important activity, maybe it should have its own professional team to do it.
On the other hand, what does the need for such an activity say about the code?
Just the other day I saw my own coding project get stuck in debugging. The irony is that I was working on error-handling, and I was debugging an assert statement that didn’t seem to be working. I felt several degrees off from forward motion in coding.
Maybe we have to go back to history.
The First Bug
Many people are familiar with the story. The first computer “bug” actually walked into the machine and caused a failure. A hardware failure.
But do bugs walk into code? Do they really arrive from outside, so that we should spend so much time, so many tools, and so much energy on searching for them, capturing them, and getting rid of them?
For most people, “talking to a computer” conjures up scenes of being trapped in a low-budget retailer’s voice mail system. But today I was reminded of when it can be helpful to talk to a computer, or specifically, to explain something to a computer.
I saw a great demonstration today of an automated test system for on-screen TV guide software. We got to try it out in a hands-on tutorial. I learned the most, though, from just one question and answer.
We were using a screen capture feature with text recognition to set up a test of whether the right words appeared at the top of a menu on screen. All visual, drag-and-drop design. You draw a rectangle on the screen around the area whose text you want to check.
How do I know how much blank space to include around the text for capture? Makes sense to just mark the text, but maybe I should mark more? What if the right text appears, but aligned on the other side of the screen? Is that an error, or not?
Automated testing is very black and white. In the middle of the night, the test will either pass or fail. If you’re not sure what should count as an error and what should not, talk to the person who wrote the requirement. Maybe s/he didn’t think of the question either, but will now.
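The rectangle question is easy to model (a toy sketch, not the vendor’s API): represent the screen as rows of characters, and pass only if the expected text falls inside the marked region. The same text aligned elsewhere on screen fails, which is exactly the requirements question the tool forces you to answer.

```python
# Toy model of the rectangle-based text check: the screen is rows of
# characters, and the check passes only if the expected text appears
# inside the marked rectangle.
def check_region(screen, top, left, bottom, right, expected):
    region = [row[left:right] for row in screen[top:bottom]]
    return any(expected in row for row in region)

screen = [
    "Settings            ",
    "                    ",
    "            Settings",
]

# Text inside the rectangle: pass.
print(check_region(screen, 0, 0, 1, 10, "Settings"))  # True
# Same text, but right-aligned on another row, outside the rectangle: fail.
print(check_region(screen, 2, 0, 3, 10, "Settings"))  # False
```

Whether that second case *should* fail is precisely the question the requirement’s author has to answer before the test can be automated.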
I always thought of automated testing, even with its setup costs, as a way to save time, avoid boring, repetitious work, build in self-test, and run reliable regression testing.
But the real benefit may be in its power to clarify requirements. Put a computer on your test team and it will ask all sorts of detail questions that a human reviewer didn’t think of.