Long, long ago, so long ago that no one knows how long ago, but still not all that long ago, developers treated test suites like web pages with static content and maintained them by hand.
Tom and Jerry are software developers working on a large J2EE project for a bank. All the developers have been using TDD. The team has already written around 1500 unit tests, 200 integration tests and 300 acceptance tests. The team is using JUnit for unit and integration tests and Fitnesse for acceptance tests. Currently it takes 2 mins to run all the unit tests, 3–4 mins to run all the integration tests and 5–6 mins to run all the acceptance tests.
Discovering the problem:
Tom and Jerry are pairing on a 4 hour task. They are happily TDDing by writing a unit test, then writing just enough code to make the test pass, then refactoring and so on. Each time they want to validate that everything is still working fine, they execute all the unit tests. This takes about 2 mins and 2 mins seems to be a long time to wait.
So Jerry suggests that they create different unit test suites. The idea is to split the tests into multiple suites, along cross-functional boundaries. This will shorten the feedback cycle by running the most relevant tests and minimize the cost of running the least relevant tests. This will ensure they don't wait for 2 mins each time they change code and can still be sure that they did not break anything. Tom likes the idea. They start creating test suites and only running the suite that contains the unit tests for the code they are changing. Now the feedback time drops to 20 secs. Occasionally one or two tests outside this suite break. But they are fine with it, because the static test suites save them a lot of waiting time and help them go faster.
Each time they make a major change to the code they run all the unit test suites. Waiting for 2 mins once in a while is fine with them. Before checking in the code changes, they run all tests [all unit test suites, integration tests and acceptance tests] and wait for 10–11 mins for the feedback. If all the tests pass, they check in the code.
A year goes by. Tom and Jerry are pairing again on a 6 hour task. Since the code has grown, each test suite takes about 2 mins to execute. There are more and more cases where a change in one place makes tests outside the suite fail. Jerry is not comfortable with the feedback given by just running one test suite, and so they often end up running all the unit test suites anyway. After spending a lot of time creating and maintaining these static test suites, they give up. They need a better way to know which tests to run and in what order to execute them. There is a constant trade-off between accuracy and time. Is there a way to be fairly accurate and still not pay the cost of waiting?
After working on the project for 2 years, Jerry realizes that for a given change in code not every test gives him the feedback he needs. A lot of time is spent waiting for unnecessary tests to execute which have nothing to do with the code change just made. Creating static test suites worked for a while, but after that he was back to square one. Because his test suite has grown so big, it does not give him the feedback he needs in a timely fashion. In an ideal world he would not have any test suites; he could just run all the unit tests and get complete feedback immediately. Even though all the unit tests are pure unit tests that do not talk to any external systems [file system, network, database, etc.], the sheer number of tests means there will be a long wait if all the tests are run all the time. It would be great if there were an intelligent way of finding out which tests matter and running only those tests, in order of importance. This also applies to other types of tests like integration tests and acceptance tests.
So Jerry speaks to a few developers on the team who share the same frustration. They come up with an idea to create a tool which can give them an ordered list of tests for a given set of code changes. Executing the tests in this order would give them the feedback they need in an acceptable time frame. This tool would help their tests to fail fast. Instead of waiting for 5 mins to find out about a failing unit test, the tool will execute this test right at the beginning and let them know if it fails.
As they start talking, they realize that there are different complementary ways of building this test suite dynamically at run time. They could use the knowledge from previous test runs to find out which tests had failed. They can figure out what classes were changed and find out what tests are affected by those changes. They can run the faster tests before the slower ones and so on.
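To make the three ideas above concrete, here is a minimal sketch of how they could be combined into one ordering. It is not how ProTest is implemented; the `TestInfo` class, its fields and the `prioritize` method are all hypothetical names invented for this illustration, and it assumes that the last result, the covered classes and the last duration of each test have already been recorded somewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class DynamicSuite {

    /** Hypothetical record of what earlier runs told us about one test. */
    static final class TestInfo {
        final String name;
        final boolean failedLastRun;      // knowledge from the previous test run
        final Set<String> coveredClasses; // production classes this test exercises
        final long lastDurationMillis;    // how long the test took last time

        TestInfo(String name, boolean failedLastRun,
                 Set<String> coveredClasses, long lastDurationMillis) {
            this.name = name;
            this.failedLastRun = failedLastRun;
            this.coveredClasses = coveredClasses;
            this.lastDurationMillis = lastDurationMillis;
        }
    }

    /** True if the test exercises at least one of the changed classes. */
    static boolean touches(TestInfo test, Set<String> changedClasses) {
        for (String covered : test.coveredClasses) {
            if (changedClasses.contains(covered)) {
                return true;
            }
        }
        return false;
    }

    /** Orders tests: recent failures first, then affected tests, then faster tests. */
    static List<TestInfo> prioritize(List<TestInfo> tests, Set<String> changedClasses) {
        List<TestInfo> ordered = new ArrayList<>(tests);
        ordered.sort((a, b) -> {
            // Arguments are swapped so that "true" sorts before "false".
            int byFailure = Boolean.compare(b.failedLastRun, a.failedLastRun);
            if (byFailure != 0) return byFailure;
            int byImpact = Boolean.compare(touches(b, changedClasses), touches(a, changedClasses));
            if (byImpact != 0) return byImpact;
            return Long.compare(a.lastDurationMillis, b.lastDurationMillis);
        });
        return ordered;
    }

    public static void main(String[] args) {
        List<TestInfo> tests = List.of(
            new TestInfo("AccountTest", false, Set.of("Account"), 40),
            new TestInfo("LedgerTest", true, Set.of("Ledger"), 900),
            new TestInfo("TransferTest", false, Set.of("Account", "Transfer"), 15),
            new TestInfo("ReportTest", false, Set.of("Report"), 5));

        // Suppose the pair has just edited the Account class.
        for (TestInfo test : prioritize(tests, Set.of("Account"))) {
            System.out.println(test.name);
        }
    }
}
```

The fixed priority of the three criteria is just one possible choice; a real tool might weigh them against each other instead of applying them strictly one after another.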
Live happily ever after:
Today, Tom and Jerry use ProTest (Prioritized Test) to execute all their tests written in JUnit and Fitnesse. They are able to make any code change, run the tests and get the feedback in less than 10 secs. ProTest maintains a history of test runs and knows what code changes affect which tests. ProTest is able to run the most-likely-to-fail tests first. Tom and Jerry are able to tell ProTest what level of feedback they want. For example, if they are making a small change to a few classes, they set ProTest to give top 10% feedback. ProTest then runs only the top 10% of the tests from the dynamic suite it built. If they make a significant change in an architectural component, they set ProTest to give 100% feedback. ProTest will then run all tests from the dynamic suite, ordered by the failure ranking of each test. Finally they have a tool which can learn about the test execution order and pattern. It executes just enough tests to give the required feedback.
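The "level of feedback" setting can be pictured as taking a prefix of the already ranked suite. The sketch below is only an illustration of that idea under my own assumptions: the `FeedbackLevel` class and its `top` method are hypothetical names, and the choice to round up (so that a non-empty suite always runs at least one test) is mine, not something ProTest is known to do.

```java
import java.util.ArrayList;
import java.util.List;

final class FeedbackLevel {

    /**
     * Returns the first `percent` percent of an already ranked list of tests.
     * The count is rounded up, so a non-empty list always yields at least one test.
     */
    static <T> List<T> top(List<T> rankedTests, int percent) {
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 1 and 100");
        }
        int count = (int) Math.ceil(rankedTests.size() * percent / 100.0);
        return new ArrayList<>(rankedTests.subList(0, count));
    }

    public static void main(String[] args) {
        // Stand-in for the 1500 ranked unit tests from the story.
        List<String> ranked = new ArrayList<>();
        for (int i = 1; i <= 1500; i++) {
            ranked.add("Test" + i);
        }
        System.out.println(top(ranked, 10).size());   // small change: top 10% only
        System.out.println(top(ranked, 100).size());  // architectural change: everything
    }
}
```

With the 1500 unit tests from the story, the 10% setting selects the 150 highest ranked tests and the 100% setting selects all 1500.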
[Welcome to reality]
Dennis Byrne, Kent Spillner, a few others from ThoughtWorks and I are working on ProTest, an open source project. At the Agile Alliance Functional Testing Tool Visioning Workshop in 2007, I gave a talk about ProTest.
Dennis Byrne has a blog on the same topic: TDD Re-visted.