Monday, February 9, 2015

The Paucity of Pass/Fail

Pass and Fail are ubiquitous terms in the testing industry. "Did the test pass?" and "How many failures?" seem like innocent questions. But what, exactly, does pass or fail mean?

What does "pass" mean to you?



Think about that question for a second, before you move on.

Pass/Fail Reports

Checks

Let's say we're talking about a check done by a tool (sometimes called "automation"). We look at the results of that check, and it reports "PASS". What does that mean?

Let's look at this:
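
    Running Test Suite...

    Test: 'Invalid Login - Wrong Password' => PASS

    1/1 Tests run. 1 passes, 0 failures.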


That's the result of a custom computer program I wrote for this automation example.

I think most people would be forgiven for thinking that this means that there is a test that's been coded that checks the login page with a wrong password to ensure that it doesn't let the user into the system, and it passed. Most people would be forgiven for then thinking "well, we've tested logging in with an incorrect password".

If you (yes, you) are a tester, then you are not the "most people" I was referring to. You should be highly suspicious of this.

What does pass mean to you now?


Tests

Let's look at this:

Tester | Test                               | Result
Chris  | Logging in with the wrong password | PASS

This is a report given by me, a tester, for the sake of this example. I'll tell you that a human wrote all of this. Look at that PASS. Surely anything that green couldn't be wrong? If you were most people, you could be forgiven for thinking that the "logging in with the wrong password" test has passed. The coverage has been achieved. The testing has been done.

If you (yes, still you) are a tester, then you are not most people, and you should be highly suspicious of this.


So What?

Okay, testers, let's focus on those passes.

Checks


What does pass mean to you now?

To most people "pass" often means "as far as that test is concerned, everything's okay". Sometimes it means "that's been fully tested and is okay". Often it means "we don't have to worry about that thing any more".

But let's take a closer look. That automated test suite I showed you, the one with one test in it? It says that the "Invalid Login - Wrong Password" test passed. But it's not a full test, it's a check. What is it actually doing? I mean what does the "Invalid Login - Wrong Password" test do?

Well, we know what it doesn't do. It doesn't investigate problems, evaluate risk, take context into consideration, interpret implicature, or do anything except what it was told to do. Maybe we investigate further and find out from the developer that what it does is enter an invalid login by having the computer enter "test1" into the username field and "password123" (which isn't test1's password) into the password field. Let's say that if, after clicking the "Login" button, the "Invalid Password" text appears, then it reports a pass, otherwise it reports a fail.
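
If the developer's description is accurate, a check like that might look roughly like this. It's a sketch only, assuming a Watir-style browser driver; the URL, element ids and the invalid_login_check helper name are inventions for illustration, not the real system's:

require 'watir'

# Sketch of the check as described: log in as test1 with a password that
# isn't test1's, then report PASS if "Invalid Password" appears in the page text.
def invalid_login_check(browser)
  browser.goto 'https://example.test/login'             # hypothetical login page
  browser.text_field(id: 'username').set 'test1'        # hypothetical field id
  browser.text_field(id: 'password').set 'password123'  # not test1's password
  browser.button(id: 'login').click                     # hypothetical button id
  browser.text.include?('Invalid Password') ? 'PASS' : 'FAIL'
end

browser = Watir::Browser.new :chrome
puts "Test: 'Invalid Login - Wrong Password' => #{invalid_login_check(browser)}"
browser.close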

What does pass mean to you now?

Well, given that explanation, "pass" means that the checking code returned a particular value (PASS) based on a specific check of a specific fact at a specific time on a specific platform, on this occasion.

Can we still have that happy feeling of "coverage" or "not having to worry" or "the testing being done"? Well, of course not. Here are some other things that would cause the code to return a "PASS" value and invalidate the test:

  • The test data is wrong, or failed to load properly, and doesn't include a test1 user at all
  • The "Invalid Password" text ALWAYS appears for any password
  • The "Invalid Password" text appears for every first attempted login
  • The text "Invalid Password" is hidden on the screen, but the checking system finds it in the DOM and reports it as found
  • The text that appears for a valid password entry has been copy-pasted and is also "Invalid Password"
  • The text "Invalid Password" appears elsewhere on the page after a valid login

These scenarios are all isomorphic as far as the check's observations are concerned. That is to say, what the check "sees" appears the same in every one of these cases, meaning that the check doesn't actually check for a wrong password login; it only checks for specific text after a specific event on a system with an unknown platform and unknown data.

This means that in every one of these cases the check reported a pass even though there's a serious problem with the functionality.
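
To make the "hidden on the screen" case above concrete: a check that searches the raw page source will happily find text that no user can see, while a check that insists the message is actually displayed will not. Another sketch, again assuming a Watir-style driver and a hypothetical element id:

# The raw HTML can contain "Invalid Password" even when CSS hides it, so a
# source-level check can report PASS for a message no user ever sees:
dom_level_pass = browser.html.include?('Invalid Password')

# Requiring the message element to actually be present (it exists and is
# displayed) before comparing its text is a stronger observation:
message = browser.div(id: 'login-message')     # hypothetical element id
displayed_pass = message.present? && message.text == 'Invalid Password'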

We might think that because a computer said "pass" there is no problem. However, there may be problems the check is not coded to detect, or a problem that the check does not reveal because it's badly written, or something unexpected may have happened.

What does pass mean to you now?

Here's the actual real Ruby code I wrote for this automation example:

puts "Running Test Suite..."
puts ""
puts "Test: 'Invalid Login - Wrong Password' => PASS"
puts ""
puts "1/1 Tests run. 1 passes, 0 failures."
puts "\n\n\n\n"

What does pass mean to you now?


Tests

Okay, let's move on to that tester's report. A tester did the testing this time, and it passed! But while testers are intelligent in ways that computers are not, they are also frequently less predictable. What did the tester ACTUALLY do? Well, let's ask them.

Well, let's say that they said that they first checked the test data for a test1 user, and tried a valid login to confirm this. The system displayed "valid password" and gave them access.

Let's say that they said that they logged out then tried a bad password and found that it prevented them from logging in, and gave a useful, expected error message.

Let's say that they said that they tried to repeat the test a few times, and found the same behaviour.

What does pass mean to you now?

Feel better? I kind of feel better. But I think we know enough by now to question this. What scenarios can you think of where the testing didn't find a problem related to this simple test idea? Here are a few that represent false positives or false negatives:
  • The message doesn't display (or some other problem) on certain browsers
  • The message doesn't display if you click the Login button twice
  • When trying to log in with different passwords the system was just presenting the same screen instead of trying to log in
  • The tester is testing an old version of the software that works, but the functionality is broken in the latest version.
Of more interest here are some that represent problems that weren't found:
  • Every time the tester fails to log in it increments a "failed logins" value in the database. It's stored as a byte, so when it reaches 255 it throws a database error.
  • The value mentioned above is responsible for locking out the user after 10 tries, so after 10 tries it ALWAYS displays "Invalid Login" on the screen, even with a valid login.
It's a fun experiment to think of all the ways the tester could fail to find existing problems while reporting a pass.
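
Neither a coded check nor a tester who stops after a few repeats would see problems like those two. One cheap way to probe for them is to hammer the same action far more often than any single check ever would. A throwaway sketch, reusing the hypothetical invalid_login_check helper from the earlier sketch:

# Repeat the same failed login many times, watching for lockouts, counter
# overflows or any other change in behaviour that a single attempt can't reveal.
300.times do |attempt|
  puts "Attempt #{attempt + 1}: #{invalid_login_check(browser)}"
end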

Guess what I (the tester) actually did to test the login page? That's right, nothing. I made it up. You should never have trusted me.

And what wasn't I testing? What system was I trying to test? How important is it that this works? Is it a small website to sell shoes, or a government site that needs to withstand foreign attacks?

A Passing Interlude

So let's review our meaning of "pass".

It seems to give the impression of confidence, coverage and a lack of problems.

It should give the impression of "no problems found" - which by itself is of exceedingly little value. Unless you know what was actually done, and why that matters to a claim of "no problems found", you can't tell the difference between "good coverage and risk assessment" and "I didn't actually do any testing". Remember my test report, and my big green PASS? I didn't do any testing. The "PASS" by itself has no value. A non-tester might try one happy-path test and write PASS on their report, having done no real investigation of the system.

If you're interested in a better way to report your testing then I recommend this Michael Bolton post as a jumping off point.

"PASS" is closer to meaning "Whatever we did to whatever we did it to, with however much we understand the system and however much we understand what it's supposed to do and whatever capability we have to detect problems we did not, on this occasion, with this version of this software on this platform find any problems."

I've focused on "pass" here, to show how weak a consideration it can be, and how much complexity it can obscure, but I'm going to leave it as homework to consider your version of "fail". What does "fail" mean to you? Do you use it as a jumping off point for investigations? Why don't you use "pass" as a similar jumping off point? How are your coded checks written - hard to pass or hard to fail? Why?

What Do I Do?

Remember what you're interested in. Don't chase the result of checks, chase down potential problems. Investigate, consider risk, empathise with users and learn the product. Consider problems for humans, not pass/fail on a test report. Use your sense and skill (and ethics) to avoid the pitfalls of dangerous miscommunication.

One of our jobs as testers is to dispel illusions people have about the software. People have illusions about the software because of various levels of fear and confidence about the product. Let's not create false fear and false confidence in our reporting - understand what pass and fail really are and communicate your testing responsibly.

What does pass mean to you now?

2 comments:

  1. Great post Chris! This is my big gripe with people thinking that checking is "enough"...

    Just another perspective on one of your examples:
    "Maybe we investigate further and find out from the developer that what it does is enter an invalid login by having the computer enter "test1" into the username field and "password123" (which isn't test1's password) into the password field. Let's say that if, after clicking the "Login" button, the "Invalid Password" text appears, then it reports a pass, otherwise it reports a fail."
    - I'd say that the "Invalid Password" text is actually a security threat because that text implies that the "username" is correct...
    (I point to trello.com for this example - if you are on the trello.com login screen and go to log in with any garble username and password it says 'there isn't an account for this username', but try 'test' as the username with any password and you get 'invalid password' - showing that the "test" account is actually a valid account... Now all someone would have to do is plug in Brutus!).

    Anyway, more people have to be aware of the message in this post. You should turn it into a talk and take it to some Automation & Dev conferences (and some old school testing conferences too!!)

  2. Thank you very much! I hadn't considered turning it into a talk, but I might try it out :).

    "I'd say that the "Invalid Password" text is actually a security threat"

    I'd certainly agree with you! This is another great example of just how poor pass/fail results are, and how little they can actually describe. A pass gives the sense that the product is okay, but actually contains a security flaw which (given more context, which I excluded on purpose) we might think of as very important.

    Not that the example I gave is necessarily true, of course, even in the sense of it being a valid example. I was careful to say "find out from the developer"... and I think I know enough about you to know that you'd treat information from a developer with a respectful scepticism :).

    Interesting find for Trello, by the way!
