Showing posts with label check vs test.

Monday, February 9, 2015

The Paucity of Pass/Fail

Pass and Fail seem to be ubiquitous terms in the testing industry. "Did the test pass?" and "How many failures?" seem to be innocent questions. But what, exactly, does pass or fail mean?

What does "pass" mean to you?



Think about that question for a second, before you move on.

Pass/Fail Reports

Checks

Let's say we're talking about a check done by a tool (sometimes called "automation"). We look at the results of that check, and it reports "PASS". What does that mean?

Let's look at this:

    Running Test Suite...

    Test: 'Invalid Login - Wrong Password' => PASS

    1/1 Tests run. 1 passes, 0 failures.

That's the result of a custom computer program I wrote for this automation example.

I think most people would be forgiven for thinking that this means there's a coded test that checks the login page with a wrong password to ensure that it doesn't let the user into the system, and that it passed. Most people would then be forgiven for thinking "well, we've tested logging in with an incorrect password".

If you (yes you) are a tester, then you are not the "most people" I was referring to. You should be highly suspicious of this.

What does pass mean to you now?


Tests

Let's look at this:

Tester    Test                                  Result
Chris     Logging in with the wrong password    PASS

This is a report given by me, a tester, for the sake of this example. I'll tell you that a human wrote all of this. Look at that PASS. Surely anything that green couldn't be wrong? If you were most people, you could be forgiven for thinking that the "logging in with the wrong password" test has passed. The coverage has been achieved. The testing has been done.

If you (yes, still you) are a tester, then you are not most people, and you should be highly suspicious of this.


So What?

Okay, testers, let's focus on those passes.

Checks


What does pass mean to you now?

To most people "pass" often means "as far as that test is concerned, everything's okay". Sometimes it means "that's been fully tested and is okay". Often it means "we don't have to worry about that thing any more".

But let's take a closer look. That automated test suite I showed you, the one with one test in it? It says that the "Invalid Login - Wrong Password" test passed. But it's not a full test, it's a check. What is it actually doing? I mean what does the "Invalid Login - Wrong Password" test do?

Well we know what it doesn't do. It doesn't investigate problems, evaluate risk, take context into consideration, interpret implicature, or do anything except what it was told to do. Maybe we investigate further and find out from the developer that what it does is enter an invalid login by having the computer enter "test1" into the username field and "password123" (which isn't test1's password) into the password field. Let's say that if, after clicking the "Login" button, the "Invalid Password" text appears, then it reports a pass, otherwise it reports a fail.
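To make that description concrete, here's a minimal sketch in Ruby of the kind of check described above. The real check would drive a browser; here a stubbed attempt_login method stands in for the application, so all names and values are illustrative assumptions, not the developer's actual code.

```ruby
# Stub standing in for the application: returns the text the page
# would show after a login attempt. "s3cret" is an invented password.
def attempt_login(username, password)
  password == "s3cret" ? "Welcome, #{username}!" : "Invalid Password"
end

# The check: try the wrong password for test1, then look for the
# expected text. Nothing more is examined.
def invalid_login_check
  page_text = attempt_login("test1", "password123")
  page_text.include?("Invalid Password") ? "PASS" : "FAIL"
end

puts "Test: 'Invalid Login - Wrong Password' => #{invalid_login_check}"
```

Note how little the check actually asserts: one string, after one event, for one user.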

What does pass mean to you now?

Well, given that explanation, "pass" means that the code that checked it returned a particular value (PASS) based on a specific check of a specific fact at a specific time on a specific platform, on this occasion.

Can we still have that happy feeling of "coverage" or "not having to worry" or "the testing being done"? Well, of course not. Here are some other things that would cause the code to return a "PASS" value and invalidate the test:

  • The test data is wrong, or failed to load properly, and doesn't include a test1 user at all
  • The "Invalid Password" text ALWAYS appears for any password
  • The "Invalid Password" text appears for every first attempted login
  • The text "Invalid Password" is hidden on the screen, but the checking system finds it in the DOM and reports it as found
  • The text that appears for a valid password entry has been copy-pasted and is also "Invalid Password"
  • The text "Invalid Password" appears elsewhere on the page after a valid login

These are scenarios that are isomorphic to the check's observed behaviour. That is to say that what the check "sees" appears the same in all of these cases, meaning that the check doesn't actually check for a wrong password login, it only checks for specific text after a specific event on a system with an unknown platform and data.

This means that for all of these cases the check reported a pass and there's a serious problem with the functionality.
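We can demonstrate that isomorphism in code. Below, the same check logic from before is run against a hypothetical broken system that shows "Invalid Password" for EVERY password, including the right one. The stubs are invented for illustration.

```ruby
# Bug: "Invalid Password" is shown no matter what the user enters.
# Nobody can log in at all.
def broken_attempt_login(_username, _password)
  "Invalid Password"
end

# The same check as before, parameterised on the login function.
def invalid_login_check(login)
  page_text = login.call("test1", "password123")
  page_text.include?("Invalid Password") ? "PASS" : "FAIL"
end

# The check cannot tell a correct rejection from a completely broken
# login: both look identical to it.
puts invalid_login_check(method(:broken_attempt_login)) # => PASS
```

A serious problem, a green PASS.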

We might think that because a computer said "pass" there is no problem. However, there may be problems the check is not coded to detect, or a problem the check does not describe because it's badly written, or something unexpected may have happened.

What does pass mean to you now?

Here's the actual real Ruby code I wrote for this automation example:

puts "Running Test Suite..."
puts ""
puts "Test: 'Invalid Login - Wrong Password' => PASS"
puts ""
puts "1/1 Tests run. 1 passes, 0 failures."
puts "\n\n\n\n"

What does pass mean to you now?


Tests

Okay, let's move on to that tester's report. A tester did the testing this time, and it passed! Testers are intelligent where computers are not, but they are also frequently less predictable. What did the tester ACTUALLY do? Well, let's ask them.

Well, let's say that they said that they first checked the test data for a test1 user, and tried a valid login to confirm this. The system displayed "valid password" and gave them access.

Let's say that they said that they logged out then tried a bad password and found that it prevented them from logging in, and gave a useful, expected error message.

Let's say that they said that they tried to repeat the test a few times, and found the same behaviour.

What does pass mean to you now?

Feel better? I kind of feel better. But we know enough to question this by now. What scenarios can you think of where the testing didn't find a problem related to this simple test idea? Here are a few that represent false positives or false negatives:
  • The message doesn't display (or some other problem) on certain browsers
  • The message doesn't display if you click the Login button twice
  • When trying to log in with different passwords the system was just presenting the same screen instead of trying to log in
  • The tester is testing an old version of the software that works, but the functionality is broken in the latest version.
Of more interest here are some that represent problems that weren't found:
  • Every time the tester fails to log in it increments a "failed logins" value in the database. It's stored as a byte, so when it reaches 255 it throws a database error.
  • The value mentioned above is responsible for locking out the user after 10 tries, so after 10 tries it ALWAYS displays "Invalid Login" on the screen, even with a valid login.
It's a fun exercise to think of all the ways the tester could fail to find existing problems while reporting a pass.
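A toy model of those last two bugs makes the mechanism clear: a failed-login counter stored as a single byte, which both locks the account after 10 tries and overflows at 255. Everything here (class name, column limit, messages) is invented for illustration.

```ruby
# Hypothetical per-user record with a failed-login counter stored
# as a byte column in the database.
class LoginRecord
  BYTE_MAX = 255
  attr_reader :failed_logins

  def initialize
    @failed_logins = 0
  end

  def record_failure
    # Overflow bug: a byte column can't hold 256.
    raise "database error: value out of range for byte column" if @failed_logins >= BYTE_MAX
    @failed_logins += 1
  end

  # After 10 failures the lockout makes EVERY login show "Invalid
  # Login", so a tester repeating the wrong-password test keeps
  # seeing what looks like a pass.
  def locked_out?
    @failed_logins >= 10
  end
end

record = LoginRecord.new
10.times { record.record_failure }
puts record.locked_out? # => true
```

From the tester's seat, both bugs are invisible: every repeat of the check still shows the expected rejection message.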

Guess what I (the tester) actually did to test the login page? That's right, nothing. I made it up. You should never have trusted me.

And what wasn't I testing? What system was I trying to test? How important is it that this works? Is it a small website to sell shoes, or a government site that needs to withstand foreign attacks?

A Passing Interlude

So let's review our meaning of "pass".

It seems to give the impression of confidence, coverage and a lack of problems.

It should give the impression of no found problems - which by itself is of exceedingly little value. Unless you know what happened and why that's important as far as "no found problems" is concerned you can't tell the difference between "good coverage and risk assessment" and "I didn't actually do any testing". Remember my test report, and my big green PASS? I didn't do any testing. The "PASS" by itself has no value. A non-tester might try one happy-path test and write PASS on their report having done no real investigation of the system.

If you're interested in a better way to report your testing then I recommend this Michael Bolton post as a jumping off point.

"PASS" is closer to meaning "Whatever we did to whatever we did it to, with however much we understand the system and however much we understand what it's supposed to do and whatever capability we have to detect problems we did not, on this occasion, with this version of this software on this platform find any problems."

I've focused on "pass" here, to show how weak a consideration it can be, and how much complexity it can obscure, but I'm going to leave it as homework to consider your version of "fail". What does "fail" mean to you? Do you use it as a jumping off point for investigations? Why don't you use "pass" as a similar jumping off point? How are your coded checks written - hard to pass or hard to fail? Why?

What Do I Do?

Remember what you're interested in. Don't chase the result of checks, chase down potential problems. Investigate, consider risk, empathise with users and learn the product. Consider problems for humans, not pass/fail on a test report. Use your sense and skill (and ethics) to avoid the pitfalls of dangerous miscommunication.

One of our jobs as testers is to dispel illusions people have about the software. People have illusions about the software because of various levels of fear and confidence about the product. Let's not create false fear and false confidence in our reporting - understand what pass and fail really are and communicate your testing responsibly.

What does pass mean to you now?

Monday, July 28, 2014

It's Just Semantics

Why is it so important to say exactly what we mean? Checking vs testing, test cases aren't artefacts, you can't write down a test, best practice vs good practice in context - isn't it a lot of effort for nothing? Who cares? 

You should. If you care about the state of testing as an intellectual craft then you should care. If you want to do good testing then you should care. If you want to be able to call out snake oil salesmen and con artists who cheapen your craft then you should care. If you want to be taken seriously then you should care. If you want to be treated like a professional person and not a fungible data entry temp then you should care.

Brian needs to make sure that their software is okay. The company doesn't want any problems in it. So they hire someone to look at it to find all the problems. Brian does not understand the infinite phase space of the testing problem. Brian wants to know how it's all going - if the money the tester is getting is going somewhere useful. He demands a record of what the tester is doing in the form of test cases because Brian doesn't understand that test cases don't represent testing. He demands pass/fail rates to represent the quality of the software because Brian doesn't understand that pass/fail does not mean problem/no problem. The test cases pile up, and writing down the testing is becoming expensive so Brian gets someone to write automated checks to save time and money because Brian doesn't understand that automated checks aren't the same as executing a test case. So now Brian has a set of speedy, repeatable, expensive automated checks that don't represent some test cases that don't represent software testing and can only pretend to fulfil the original mission of finding important problems. I won't make you sit through what happens in 6 months when the maintenance costs cripple the project and tool vendors get involved.

When we separate "checking" and "testing", it's to enforce a demarcation to prevent dangerous confusion and to act as a heuristic trigger in the mind. When we say "checking" to differentiate it from "testing" we bring up the associations with its limitations. It protects us from pitfalls such as treating automated checks as a replacement for testing. If you agree that such pitfalls are important then you can see the value in this separation of meaning. It's a case of pragmatics - understanding the difference in the context where the difference is important. Getting into the "checking vs testing" habit in your thought and language will help you test smarter, and teach others to test smarter. Arguing over the details (e.g. "can humans really check?") is important so we understand the limitations of the heuristics we're teaching ourselves, so that not only do we understand that testing and checking are different in terms of meaning, but we know the weaknesses of the definitions we use.

So, we argue over the meaning of what we say and what we do for a few reasons. It helps us talk pragmatically about our work, it helps to dispel myths that lead to bad practices, it helps us understand what we do better, it helps us challenge bad thinking when we see it, it helps us shape practical testing into something intellectual, fun and engaging, and it helps to promote an intellectual, fun and engaging image of the craft of testing. It also keeps you from being a bored script jockey treated like a fungible asset quickly off-shored to somewhere cheaper.

So why are people resistant to engaging with these arguments?

It's Only Semantics

When we talk about semantics let's first know the difference between meaning and terminology. The words we use contain explicature and implicature, operate within context and have connotation. 

Let's use this phrase: "She has gotten into bed with many men".

First there's the problem of semantic meaning. Maybe I'm talking about bed, the place where many people traditionally go in order to sleep or have sex, and maybe you think I'm talking about an exclusive nightclub called Bed.

Then there's the problem with linguistic pragmatic meaning (the effect of context on meaning). We might disagree on the implication that this woman had sex with these men. Maybe it's an extract of a conversation about a woman working with a bed testing team where she's consistently the only female, and the resultant effects of male-oriented bed and mattress design.

Then there's the problem with subjective pragmatic meaning. We might agree that she's had sex with men, but might disagree on what "many" is, or what it says about her character, or even agree that I'm saying that she's had sex with many men where I am implying something negative about her character but you disagree because you've known her for years and she's very moral, ethical, charitable, helpful and kind.

So when we communicate we must understand the terms we use (in some way that permits us to transmit information, so when I say "tennis ball" you picture a tennis ball and not a penguin), and we must appreciate the pragmatic context of the term (so when I say "tennis ball" you picture a tennis ball and not a tennis-themed party).

When we say "it's only a semantic problem" we're implying that the denotation (the literal meaning of the words we use) is not as important as the connotation (the inference we share from the use of the words).

Firstly, to deal with the denotation of our terminology. How do we know that we know what we mean unless we challenge the use of words that seem to mean something completely different? Why should we teach new people to the craft the wrong words? Why do we need to create a secret language of meaning? Why permit weasel words for con artists to leverage?

Secondly, connotation. The words we use convey meaning by their context and associations. Choosing the wrong words can lead to misunderstandings. Understanding and conveying the meaning IS the important thing, but words and phrases and sentences are more than the sum of their literal meaning, and intent and understanding of domain knowledge do not always translate into effective communication. Don't forget that meaning is in the mind of the beholder - an inference based on observation processed by mental models shaped by culture and experience. It's arguments about the use of words that lead to better examination of what we all believe is the case. For example some people say that whatever same-sex marriage is it should be called something else (say, "civil union") because the word marriage conveys religious or traditional meaning that implies that it is between a man and a woman. The legal document issued is still the marriage licence. We interpret the implicature given the context of the use of these words, which affects the nature of the debate we have about it. We can argue about "marriage" - the traditional religious union, and we can argue about "marriage" - the relationship embodied in national law and the rights it confers. We can also talk about "marriage" - the practical, day to day problems and benefits and trials and goings on of being married.


Things Will Never Change

Things are changing now. Jerry Weinberg, James Bach, Michael Bolton and many others' work is rife with challenging the status quo, and now we have a growing CDT community with increasing support from people who want to make the testing industry respected and useful. There's work in India to change the face of the testing industry (Pradeep Soundararajan's name comes to mind). There's STC, MoT, AST and ISST. The idea that a problem won't be resolved is a very poor excuse not to be part of moving towards a solution. Things are always changing. We are at the forefront of a new age of testing, mining the coalface of knowledge for new insight! It's an incredibly exciting time to be in a changing industry. Make sure it changes for the better.


It's The Way It's Always Been

Yes, that's the problem.


I'll Just Deal With Other Challenges

Also known as "someone else will sort that out". Fine, but what about your own understanding? If you don't know WHY there's a testing vs checking problem then you're blinding yourself to the problems behind factory testing. If you don't investigate the testing vs checking problem then you won't know why the difference between the two is important to the nature of any solution to the testing problem, and you're leaving yourself open to be hoodwinked by liars. Understand the value of what you're doing, and make sure that you're not doing something harmful. At the very least keep yourself up to date with these debates and controversies.

There's another point to be made here about moral duty in regard to such debates and controversies. If you don't challenge bad thinking then you are permitting bad thinking to permeate your industry. If we don't stand up to those who want to sell us testing solutions that don't work then they will continue to do so with impunity. I think that this moral duty goes hand-in-hand with a passion for the craft of testing, just as challenging racist or sexist remarks goes hand-in-hand with an interest in the betterment of society as we each see it. It's okay to say "I'm not going to get into that", but if you claim to support the betterment of the testing industry and its reputation as an intellectual, skilled craft then we could really use your voice when someone suggests something ridiculous, and I'd love to see more innovative approaches to challenging bad or lazy thinking.

"Bad men need nothing more to compass their ends, than that good men should look on and do nothing." - John Stuart Mill


Sounds Like A Lot Of Effort

Testing is an enormously fun, engaging, intellectual activity, but it requires a lot of effort. Life is the same way. For both of these things what you get out of it depends on what you put in.


Arguments Are Just Making Things Less Clear/Simple

This is a stance I simply do not understand. The whole point of debate is to bash out the ideas and let reason run free. Clear definitions give us a starting point for debate, and debate gives rigour to our ideas. I fail to see how good argument could make things worse in any field.

As for simplicity, KISS is a heuristic that fails when it comes to the progress of human knowledge. We are concerned here with the forefront of knowledge and understanding, not implementing a process. Why are we so eager to keep it so simple at the expense of it being correct? I've talked to many testers, and I've rarely thought that they couldn't handle the complexity of testing any more than they could the complexity of life. I don't know who we're protecting with the idea that we must keep things simple, especially when the cost is in knowledge, understanding and the progression of our industry.


P.S.
I've been reluctant to release this post. It represents my current thinking (I think!), but I think it's uninformed which may mean that it's not even-handed. Linguistics and semantics are fairly new to me. I'm reading "Tacit and Explicit Knowledge" at the moment, I may come back and refine my ideas. Please still argue with me if you think I'm wrong, I have a lot to learn here.