Principles of Test Automation
About This Chapter
In the Goals of Test Automation narrative chapter I described the goals we should strive to achieve to help us be successful at automating our unit tests and customer tests. In the Philosophy Of Test Automation narrative I discussed some of the differences in the way people approach software design, construction and testing. This provides the background for the principles that experienced test automaters follow while automating their tests. I call these principles because they are too high level to be patterns and because they represent a value system that not everyone will share. A different value system may cause you to choose different patterns than the ones I would choose. By making this value system explicit I hope to accelerate the process of understanding where we disagree and why.
The Principles
When Shaun Smith and I came up with the list in the original Test Automation Manifesto [TAM], we were considering what was driving us to write tests the way we did. The Manifesto is a list of the qualities we'd like to see in a test, not a set of patterns that can be directly applied. However, those principles have led us to identify a number of somewhat more concrete principles, some of which I list here. What makes these different from the goals is that there is more debate about them.
Principles are more "prescriptive" than patterns and also higher-level in nature. Unlike patterns, they don't have alternatives but are presented in a "do this because" fashion. To distinguish them from patterns, I have given them imperative names rather than the noun-phrase names I use for goals, patterns and smells.
For the most part, these principles apply equally to unit tests and story tests. The possible exception is the principle Verify One Condition per Test, which may not be practical for customer tests that exercise more involved chunks of functionality. It is, however, still worth striving to follow these principles, deviating from them only when we are fully cognizant of the consequences.
Principle: Write the Tests First
Also known as: Test-Driven Development, Test First Development
Test-driven development is very much an acquired habit. Once one has "gotten the hang of it", writing code in any other way can seem just as strange as TDD seems to those who have never done it. There are two major arguments in favor of doing TDD:
- The unit tests save us a lot of debugging effort; effort that often fully offsets the cost of automating the tests.
- Writing the tests before we write the code forces the code to be designed for testability. We don't need to think about testability as a separate design condition; it just happens because we have written tests.
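To make this concrete, here is a minimal sketch of the test-first rhythm using JUnit and a hypothetical Money class (both the class and its behavior are invented for illustration). The test is written first; it cannot even compile until the production code beneath it is written to satisfy it.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Written before Money exists; it fails until Money is implemented.
    public class MoneyTest {
        @Test
        public void addingTwoAmountsYieldsTheirSum() {
            Money five = new Money(5, "CAD");
            Money seven = new Money(7, "CAD");
            assertEquals(new Money(12, "CAD"), five.add(seven));
        }
    }

    // The simplest production code that makes the test pass:
    class Money {
        private final int amount;
        private final String currency;
        Money(int amount, String currency) { this.amount = amount; this.currency = currency; }
        Money add(Money other) { return new Money(amount + other.amount, currency); }
        public boolean equals(Object o) {
            return o instanceof Money && ((Money) o).amount == amount
                    && ((Money) o).currency.equals(currency);
        }
        public int hashCode() { return 31 * amount + currency.hashCode(); }
    }

Note that the test also forced a design decision: Money needed a value-based equals() so the outcome could be verified, which is exactly the kind of testability that "just happens" when the test comes first.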
Principle: Design for Testability
Given the last principle, this principle may seem redundant. For those who choose to ignore Write the Tests First (see Principles of Test Automation on page X), Design for Testability becomes an even more important principle because they won't be able to write automated tests after the fact if testability wasn't designed in. Anyone who has tried to retrofit automated unit tests onto legacy software can testify to the difficulty this raises. Mike Feathers describes special techniques for introducing tests in this situation in [WEwLC].
Principle: Use the Front Door First
Also known as: Front Door First
Objects have several kinds of interfaces. There is the "public" interface that clients are expected to use and there may also be a "private" interface that only close friends should use. Many objects also have an "outgoing interface" consisting of the parts they use of the interfaces of any objects on which they depend.
The types of interfaces we use have an influence on the robustness of our tests. The use of Back Door Manipulation (page X) to set up the fixture or verify the expected outcome of a test can result in Overcoupled Software (see Fragile Test on page X) that needs more frequent test maintenance. Overuse of Behavior Verification (page X) and Mock Objects (page X) can result in Overspecified Software (see Fragile Test) and tests that are more brittle and that may discourage developers from doing desirable refactorings.
When all choices are equally effective, we should use round trip tests to test our system under test (SUT). To do this, we test an object through its public interface and use State Verification (page X) to determine whether it behaved correctly. If this is not sufficient to accurately describe the expected behavior, we can make our tests layer-crossing tests and use Behavior Verification to verify the calls the SUT makes to depended-on components (DOCs). If we must replace a slow or unavailable DOC with a faster Test Double (page X), using a Fake Object (page X) is preferable because it encodes fewer assumptions into the test (the only assumption being that the component that the Fake Object replaces is actually needed).
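As an illustration, here is a minimal round trip test sketch in JUnit; the Invoice class is a hypothetical SUT invented for the example. The test drives the SUT only through its public interface and uses State Verification on the observable result.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class InvoiceTest {
        @Test
        public void addingALineItemIncreasesTheTotal() {
            Invoice invoice = new Invoice();                 // front door: public constructor
            invoice.addLineItem("Widget", 2, 5.00);          // front door: public method
            assertEquals(10.00, invoice.getTotal(), 0.001);  // State Verification of the outcome
        }
    }

    class Invoice {
        private double total;
        void addLineItem(String product, int quantity, double unitPrice) {
            total += quantity * unitPrice;
        }
        double getTotal() { return total; }
    }

Because the test never reaches behind the SUT's back, it keeps working as long as the public behavior stays the same, no matter how the internals are refactored.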
Principle: Communicate Intent
Also known as: Higher Level Language, Single Glance Readable
Fully Automated Tests, especially Scripted Tests (page X), are programs. They need to be syntactically correct to compile and semantically correct to run successfully. They need to implement whatever detailed logic is required to put the SUT into the appropriate starting state and to verify that the expected outcome has occurred. While these characteristics are necessary, they are not sufficient because they neglect the single most important interpreter of the tests: the test maintainer.
Tests that contain a lot of code (anything more than about ten lines is getting to be too much) or Conditional Test Logic (page X) are usually Obscure Tests (page X). They are much harder to understand because we need to infer the "big picture" from all the details. This takes extra time each time we need to revisit the test either to maintain it or to use the Tests as Documentation. This increases the cost of ownership of the tests and reduces their return on investment.
Tests can be made easier to understand and maintain if we Communicate Intent. We can do this by calling Test Utility Methods (page X) with Intent Revealing Names [SBPP] to set up our test fixture and to verify that our expected outcome has been realized. It should be readily apparent within the Test Method (page X) how the test fixture influences the expected outcome of each test: which inputs result in which outputs. A rich library of Test Utility Methods also makes tests easier to write because we don't have to code the details in every test.
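A sketch of what this looks like in JUnit, using an invented flight-booking example; the helper names are the point, not the domain. The Test Method states the intent, and the details live in Test Utility Methods with Intent Revealing Names.

    import org.junit.Test;
    import static org.junit.Assert.assertFalse;

    public class FlightManagementTest {
        @Test
        public void cancelledFlightsAreNotBookable() {
            Flight flight = createCancelledFlight();   // fixture setup, intent revealed
            assertFlightIsNotBookable(flight);         // outcome verification, intent revealed
        }

        // --- Test Utility Methods (hypothetical helpers) ---
        private Flight createCancelledFlight() {
            Flight flight = new Flight("YYC", "YVR");
            flight.cancel();
            return flight;
        }

        private void assertFlightIsNotBookable(Flight flight) {
            assertFalse("expected flight to refuse bookings", flight.isBookable());
        }
    }

    class Flight {
        private boolean cancelled;
        Flight(String origin, String destination) {}
        void cancel() { cancelled = true; }
        boolean isBookable() { return !cancelled; }
    }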
Principle: Don't Modify the SUT
Effective testing often requires us to replace a part of the application with a Test Double or override part of its behavior using a Test-Specific Subclass (page X). This may be because we need to get control of its indirect inputs or to do Behavior Verification by intercepting its indirect outputs. It may also be because parts of its behavior have unacceptable side-effects or dependencies that are impossible to satisfy in our development or test environment.
Modifying the SUT is a dangerous thing, whether we are putting in test "hooks", overriding behavior in a Test-Specific Subclass or replacing a DOC with a Test Double, because we may no longer actually be testing the code we plan to put into production.
We need to ensure that we are testing the software in a configuration that is truly representative of how it will be used in production. If we do need to replace something it depends on to get better control of the context surrounding the SUT, we must make sure that we are doing so in a representative way. Otherwise, we may end up replacing part of the SUT that we think we are testing. Suppose, for example, that we are writing tests for objects X, Y and Z where object X depends on object Y which depends on object Z. When writing tests for X, it is reasonable to replace Y and Z with a Test Double. When testing Y, we can replace Z with a Test Double but when testing Z, we cannot replace it with a Test Double because Z is what we are testing! This is particularly salient when we have to refactor the code to improve testability.
When we use a Test-Specific Subclass to override part of the behavior of an object to allow testing, we have to be careful to override only those methods that the test specifically needs to null out or to use for injecting indirect inputs. If we choose to reuse a Test-Specific Subclass created for another test, we must ensure that it does not override any of the behavior that this test is verifying.
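The following sketch (with an invented DiscountCalculator) shows that discipline: the Test-Specific Subclass overrides only the method that supplies the indirect input (the day of week), while the discount logic being verified is inherited unchanged.

    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    public class DiscountCalculatorTest {
        @Test
        public void weekendOrdersGetWeekendDiscount() {
            // Test-Specific Subclass: only the indirect input is overridden.
            DiscountCalculator sut = new DiscountCalculator() {
                @Override protected int dayOfWeek() { return SATURDAY; }
            };
            assertTrue(sut.weekendDiscountApplies());   // the logic under test is untouched
        }
    }

    class DiscountCalculator {
        static final int SATURDAY = 6;
        static final int SUNDAY = 7;
        boolean weekendDiscountApplies() { return dayOfWeek() >= SATURDAY; }
        protected int dayOfWeek() { /* normally derived from the system clock */ return 1; }
    }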
Another way of looking at this principle is as follows: The term SUT is relative to the tests we are writing. In our X uses Y uses Z example, the SUT for some component tests might be the aggregate of X, Y and Z while for unit testing purposes, it might be just X for some tests, just Y for other tests and just Z for yet other tests. Just about the only time we consider the entire application to be the SUT is when we are doing user acceptance testing using the user interface and going all the way back to the database. Even here, we might only be testing one module of the entire application (e.g., the "Customer Management Module"). So, SUT rarely equals "application".
Principle: Keep Tests Independent
Also known as: Independent Test
When doing manual testing, it is common to have long test procedures that verify many aspects of the SUT's behavior in a single test. This is necessary because the steps involved in setting up the starting state of the system for one test may simply be a repetition of the steps used to verify other parts of its behavior. When tests are executed manually, this repetition is not cost-effective. As well, human testers have the ability to recognize when a test failure should preclude continuing execution of the test, when it should cause certain tests to be skipped or when the failure is irrelevant.
If tests are interdependent and (even worse) order dependent, we deprive ourselves of the useful feedback that individual test failures provide. Interacting Tests (see Erratic Test on page X) tend to fail in a group. The failure of a test that moved the SUT into the state required by a dependent test will lead to the failure of the dependent test too. With both tests failing, how can we tell whether the problem is in code that both tests rely on or in code that only the first test relies on? We can't. And we are only talking about two tests here; imagine how much worse this is with tens or hundreds of interdependent tests.
An Independent Test can be run by itself. It sets up its own Fresh Fixture (page X) to put the SUT into a state that lets it verify the behavior it is testing. Tests that build a Fresh Fixture are much more likely to be independent than tests that use a Shared Fixture (page X). The latter can lead to various kinds of Erratic Tests including Lonely Tests, Interacting Tests and Test Run Wars. With independent tests, unit test failures give us Defect Localization to help us pinpoint the source of the failure.
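A minimal sketch of two independent tests in JUnit, using an invented Account class: each Test Method builds its own Fresh Fixture, so either test can run alone, in any order, and a failure in one cannot cascade into the other.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class AccountTest {
        @Test
        public void depositIncreasesBalance() {
            Account account = new Account(100);   // Fresh Fixture, built by this test alone
            account.deposit(50);
            assertEquals(150, account.balance());
        }

        @Test
        public void withdrawalDecreasesBalance() {
            Account account = new Account(100);   // built again; no reliance on the test above
            account.withdraw(30);
            assertEquals(70, account.balance());
        }
    }

    class Account {
        private int balance;
        Account(int openingBalance) { balance = openingBalance; }
        void deposit(int amount) { balance += amount; }
        void withdraw(int amount) { balance -= amount; }
        int balance() { return balance; }
    }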
Principle: Isolate the SUT
Some pieces of software depend on nothing but the (presumably correct) runtime system or operating system. Most pieces of software build on other pieces of software developed by us or by others. When our software depends on other software that may change over time, our tests may suddenly start failing because the behavior of the other software has changed. I call this problem Context Sensitivity (see Fragile Test), a form of Fragile Test.
When our software depends on other software whose behavior we cannot control, we may find it difficult to verify that our software behaves properly with all possible return values. This is likely to lead to Untested Code (see Production Bugs on page X) or Untested Requirements (see Production Bugs). To avoid this, we need to be able to inject all possible reactions of the dependency into our software under the complete control of our tests.
Whatever application, component, class or method we are testing, we should strive to isolate it as much as possible from all the other parts of the software that we are choosing not to test. This allows us to Test Concerns Separately and to Keep Tests Independent of each other. It also helps us achieve Robust Test by reducing the likelihood of Context Sensitivity caused by too much coupling between our SUT and the software that surrounds it.
We can achieve this by designing our software so that each piece of depended-on software can be replaced with a Test Double using Dependency Injection (page X) or Dependency Lookup (page X) or overridden with a Test-Specific Subclass that gives us control of the indirect inputs of the SUT. This makes our test more repeatable and robust.
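Here is a small sketch of constructor-based Dependency Injection with a hand-coded Test Double; the ExchangeRateService and PriceQuoter names are invented for the example. Because the SUT receives its DOC from outside, the test can substitute a stub that returns a rate entirely under the test's control.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class PriceQuoterTest {
        @Test
        public void quoteConvertsUsingCurrentRate() {
            ExchangeRateService stubbedRate = currency -> 2.0;  // Test Double: controlled indirect input
            PriceQuoter sut = new PriceQuoter(stubbedRate);     // Dependency Injection via constructor
            assertEquals(20.0, sut.quoteInCurrency(10.0, "EUR"), 0.001);
        }
    }

    interface ExchangeRateService {
        double rateFor(String currency);   // the real implementation might call a remote system
    }

    class PriceQuoter {
        private final ExchangeRateService rates;
        PriceQuoter(ExchangeRateService rates) { this.rates = rates; }
        double quoteInCurrency(double basePrice, String currency) {
            return basePrice * rates.rateFor(currency);
        }
    }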
Principle: Minimize Test Overlap
Most applications have lots of functionality to verify. Proving the functionality all works correctly in all the combination and interaction scenarios is pretty much impossible. Therefore, picking the tests to write is an exercise in risk management.
We should structure our tests so that as few tests as possible depend on a particular piece of functionality. This may seem counter-intuitive at first because one would think we could improve test coverage by testing the software as often as possible. Unfortunately, tests that verify the same functionality typically fail at the same time. They also tend to need the same maintenance when the functionality of the SUT is modified. Having several tests verify the same functionality is likely to increase test maintenance costs without improving quality very much.
We do want to ensure that all the test conditions are covered by the tests that we do have. Each test condition should be covered by exactly one test, no more, no less. If it seems valuable to test the same functionality in several different ways, we may actually have identified several distinct test conditions.
Principle: Minimize Untestable Code
Some kinds of code are difficult to test using Fully Automated Tests. GUI components, multi-threaded code and Test Methods come immediately to mind as "untestable" code. The problem all these kinds of code share is being embedded in a context that makes it hard to instantiate or interact with them from automated tests.
Untestable code simply won't have any Fully Automated Tests to protect it from those nefarious little bugs that creep into our code when we aren't looking. That makes it harder to refactor safely and more dangerous to modify to introduce new functionality.
It is highly desirable to minimize the amount of untestable code that we have to maintain. We can refactor the untestable code to improve its testability by moving the logic we want to test out of the class that is causing the lack of testability. For active objects and multi-threaded code we can refactor to Humble Executable (see Humble Object on page X) while for user interface objects we can refactor to Humble Dialog (see Humble Object). Even Test Methods can have much of their untestable code extracted into Test Utility Methods that can be tested.
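A sketch of the Humble Dialog idea, with invented names: the enabling rule has been pulled out of the GUI class into a plain object that a test can instantiate directly, leaving the widget itself too humble to need unit tests of its own.

    import org.junit.Test;
    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    public class FlightBookingPresenterTest {
        @Test
        public void bookButtonIsDisabledWhenPassengerNameIsBlank() {
            assertFalse(new FlightBookingPresenter().isBookButtonEnabled(""));
        }

        @Test
        public void bookButtonIsEnabledWhenPassengerNameIsEntered() {
            assertTrue(new FlightBookingPresenter().isBookButtonEnabled("Ada Lovelace"));
        }
    }

    // Extracted, easily testable logic; the actual dialog merely delegates to it.
    class FlightBookingPresenter {
        boolean isBookButtonEnabled(String passengerName) {
            return passengerName != null && !passengerName.trim().isEmpty();
        }
    }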
When we Minimize Untestable Code, we improve the overall test coverage of our code and in so doing we also improve our confidence in the code and our ability to refactor at will. The fact that it improves the quality of the code is another benefit.
Principle: Keep Test Logic out of Production Code
Also known as: No Test Logic in Production Code
When the production code hasn't been designed for testability (whether a result of test-driven development or otherwise), we may be tempted to put "hooks" into the production code to make it easier to test. These hooks typically take the form of if testing then ... and may either run alternate logic or may prevent certain logic from running.
Testing is about verifying the behavior of a system. If the system behaves differently when under test, then how can we be certain that the production code actually works? Even worse, the test hooks could cause the software to fail in production!
The production code should not contain any conditional statements of the "if testing then" sort. There should be no test logic in production code. A well designed system (from a testing perspective) is one that allows for the isolation of functionality. Object-oriented systems are particularly amenable to testing since they are composed of discrete objects. Unfortunately even object-oriented systems can be built in such a way as to be difficult to test and we still encounter code with embedded test logic.
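As a sketch of what this principle rules out, and one common alternative, consider this invented MailSender: the first version embeds an if-testing flag, while the second has a single code path and lets the test substitute a Test Double for the outgoing gateway instead.

    // The kind of hook this principle warns against:
    class MailSender {
        static boolean TESTING = false;          // test logic living in production code
        void send(Message message) {
            if (TESTING) { return; }             // behaves differently under test
            // ... actually deliver the message ...
        }
    }

    // Preferable: no conditional at all; tests inject a fake MailGateway,
    // so the production code has exactly one code path.
    interface MailGateway { void deliver(Message message); }

    class BetterMailSender {
        private final MailGateway gateway;
        BetterMailSender(MailGateway gateway) { this.gateway = gateway; }
        void send(Message message) { gateway.deliver(message); }
    }

    class Message {}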
Principle: Verify One Condition per Test
Also known as: Single Condition Test
Many tests require a starting state other than the default state of the SUT and many operations of the SUT leave it in a different state from that in which it started. There is a strong temptation to reuse the end state of one test condition as the starting state of the next by combining them into a single test because this makes it more efficient. This is not recommended because when one assertion fails, the rest of the test is not executed. This makes it hard to achieve Defect Localization.
Verifying multiple conditions in a single test makes sense when executing tests manually because of the high overhead of test setup and because the "liveware" can adapt to test failures. It is too much work for human testers to set up the fixture for a large number of tests manually, so they tend to write long, multi-condition tests. They also have the intelligence to work around any issues they encounter, so all is not lost if a single step fails. With automated tests, a single failed assertion causes the test to stop running and the rest of the test provides no data on what works and what doesn't.
Each Scripted Test should verify a single test condition. This is possible because the test fixture is set up programmatically rather than by a human. Programs can set up fixtures very quickly and they don't have trouble executing exactly the same sequence of steps hundreds of times! If several tests need the same test fixture, we can either move the Test Methods onto a single Testcase Class per Fixture (page X) so we can use Implicit Setup (page X) or we can call Test Utility Methods to set up the fixture using Delegated Setup (page X).
We design each test to have four distinct phases (see Four-Phase Test on page X) that are executed in sequence. The four parts are fixture setup, exercise SUT, result verification and fixture teardown.
- In the first phase, we set up the test fixture (the "before" picture) that is required for the SUT to exhibit the expected behavior, as well as anything we need to put in place to be able to observe the actual outcome (such as a Test Double).
- In the second phase, we interact with the SUT to exercise whatever behavior we are trying to verify. This should be a single, distinct behavior; if we try to exercise several parts of the SUT, we are not writing a Single Condition Test.
- In the third phase, we do whatever is necessary to determine whether the expected outcome has been obtained.
- In the fourth phase, we tear down the test fixture to put the world back into the state in which we found it.
Note that there is a single exercise SUT phase and a single verify outcome phase. We do not have a series of such alternating calls (exercise, verify, exercise, verify) because that would be verifying several distinct conditions that would be better done in distinct Test Methods.
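A minimal sketch of the four phases in a single JUnit Test Method, using an invented AuditLog class:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class AuditLogTest {
        @Test
        public void recordingAnEventAddsExactlyOneEntry() {
            // Phase 1: fixture setup
            AuditLog log = new AuditLog();
            // Phase 2: exercise the SUT (one distinct behavior)
            log.record("user logged in");
            // Phase 3: result verification
            assertEquals(1, log.entryCount());
            // Phase 4: fixture teardown (nothing to do for this in-memory fixture;
            // garbage collection reclaims it)
        }
    }

    class AuditLog {
        private int entries;
        void record(String event) { entries++; }
        int entryCount() { return entries; }
    }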
One possibly contentious aspect of Verify One Condition per Test is what we mean by "one condition". Some test drivers insist on one assertion per test. This insistence may be based on using a Testcase Class per Fixture organization of the Test Methods and naming each test based on what the one assertion is verifying (e.g., AwaitingApprovalFlight.validApproverRequestShouldBeApproved). Having one assertion per test makes such naming very easy but it does lead to many more test methods if we have to assert on many output fields. Of course, we can often comply with this interpretation by extracting a Custom Assertion (page X) or Verification Method (see Custom Assertion) that allows us to reduce the multiple assertion method calls to one. Sometimes that makes the test more readable; when it doesn't, I wouldn't be too dogmatic about insisting on a single assertion.
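For example (with an invented Reservation class), several per-field assertions can be folded into one Verification Method so the Test Method still reads as a single logical verification:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class ReservationTest {
        @Test
        public void approvedRequestProducesConfirmedReservation() {
            Reservation actual = new Reservation("CONFIRMED", "12B");   // stands in for the SUT's output
            assertReservationEquals("CONFIRMED", "12B", actual);        // one intention-revealing call
        }

        // Custom Assertion: collapses several field checks into one logical assertion.
        private void assertReservationEquals(String expectedStatus, String expectedSeat,
                                             Reservation actual) {
            assertEquals("status", expectedStatus, actual.status);
            assertEquals("seat", expectedSeat, actual.seat);
        }
    }

    class Reservation {
        final String status;
        final String seat;
        Reservation(String status, String seat) { this.status = status; this.seat = seat; }
    }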
Principle: Test Concerns Separately
The behavior of a complex application is made up of the aggregate of a large number of smaller behaviors. Sometimes, several of these behaviors are provided by the same component. Each of these behaviors is a different concern and may have a significant number of scenarios in which it needs to be verified.
The problem with testing several concerns in a single Test Method is that the test will be broken whenever any of the tested concerns is modified. Even worse, it won't be obvious which concern is the one at fault. Identifying the culprit typically requires Manual Debugging (see Frequent Debugging on page X) because of the lack of Defect Localization. The net effect is that more tests will fail and each test will take longer to troubleshoot and fix. Testing several concerns in the same test also makes it harder to "tease apart" an eager class into several independent classes that each implement a single concern, because the tests will need extensive redesign.
Testing our concerns separately from each other allows a failure to tell us that we have a problem in a specific part of our system rather than simply telling us that we have a problem somewhere. It also makes it easier to understand the behavior now and to separate the concerns in subsequent refactorings, because we should be able to move a subset of the tests to a different Testcase Class (page X) that verifies the newly created class; it shouldn't be necessary to modify the tests much beyond changing the class name of the SUT.
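A small sketch with an invented OrderProcessor that currently implements two concerns: each concern is verified by its own Test Method, so a failure names the broken concern and, if the class is later split, each test can move with the concern it verifies.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    public class OrderProcessorTest {
        @Test
        public void totalIncludesTax() {                       // pricing concern only
            OrderProcessor sut = new OrderProcessor();
            assertEquals(107.0, sut.totalWithTax(100.0), 0.001);
        }

        @Test
        public void largeOrdersRequireApprovalNotice() {       // notification concern only
            OrderProcessor sut = new OrderProcessor();
            assertTrue(sut.requiresApprovalNotice(10000.0));
        }
    }

    class OrderProcessor {   // currently implements both concerns
        double totalWithTax(double subtotal) { return subtotal * 1.07; }
        boolean requiresApprovalNotice(double subtotal) { return subtotal >= 5000.0; }
    }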
Principle: Ensure Commensurate Effort and Responsibility
The amount of effort it takes to write or modify tests should not exceed the effort it takes to implement the corresponding functionality. Likewise, the tools required to write or maintain the test should require no more expertise than the tools used to implement the functionality. As an example, if we can configure the behavior of a SUT using metadata and we want to write tests that verify that the metadata is set up correctly, we should not have to write code to do it. A Data-Driven Test (page X) would be much more appropriate.
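A minimal sketch of a Data-Driven Test using JUnit's Parameterized runner; the TaxRates class and the province/rate figures are invented stand-ins for whatever metadata is being verified. The point is that adding a verification means adding a row of data, not writing more test code.

    import java.util.Arrays;
    import java.util.Collection;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.junit.runners.Parameterized;
    import org.junit.runners.Parameterized.Parameters;
    import static org.junit.Assert.assertEquals;

    @RunWith(Parameterized.class)
    public class TaxRateMetadataTest {
        @Parameters
        public static Collection<Object[]> rows() {   // the test data, one row per verification
            return Arrays.asList(new Object[][] {
                { "AB", 0.05 },
                { "BC", 0.12 },
                { "ON", 0.13 },
            });
        }

        private final String province;
        private final double expectedRate;

        public TaxRateMetadataTest(String province, double expectedRate) {
            this.province = province;
            this.expectedRate = expectedRate;
        }

        @Test
        public void configuredRateMatchesExpectedRate() {
            assertEquals(expectedRate, TaxRates.rateFor(province), 0.0001);
        }
    }

    class TaxRates {   // stands in for the metadata-driven configuration under test
        static double rateFor(String province) {
            if (province.equals("ON")) return 0.13;
            if (province.equals("BC")) return 0.12;
            return 0.05;
        }
    }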
What's Next?
In previous chapters we covered the common pitfalls (in the form of test smells) and goals of test automation. This chapter made explicit the value system we use while choosing patterns. In the Test Automation Strategy narrative I start examining the "hard to change" decisions that we should try to get right early in our project.
Copyright © 2003-2008 Gerard Meszaros all rights reserved