"Just re-run the build - it should go green"
Do you hear this on your project? Are your tests flaky? Does the build change from green to red and back again without any real explanation? What are you doing about it?
Recently I joined a project that had this problem. It's quite normal to encounter it, and quite common to work around it by simply running the tests again. Some teams have even added scripts that pick up failed tests and re-run them automatically at the end of the same build, in case they pass the second time around.
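To make the pattern concrete, here is a minimal sketch of that kind of retry wrapper. The function names and the counter-based "flaky" test are hypothetical, invented for illustration; the point is that the retry masks the intermittent failure rather than fixing it.

```python
def run_with_retries(test_fn, retries=2):
    """Run a test, re-running it up to `retries` extra times on failure.

    This is the masking pattern described above: a test that fails
    intermittently will usually 'go green' on a later attempt.
    """
    for attempt in range(1, retries + 2):
        try:
            test_fn()
            return True, attempt  # passed on this attempt
        except AssertionError:
            pass  # swallow the failure and try again
    return False, retries + 1

# A deliberately intermittent "test": fails on the first call only.
calls = {"n": 0}

def flaky_test():
    calls["n"] += 1
    if calls["n"] == 1:
        raise AssertionError("intermittent failure")

passed, attempts = run_with_retries(flaky_test)
# passed is True and attempts == 2 -- the retry hid the flakiness
```

The build goes green, but the underlying non-determinism is still there, which is exactly why this approach is dangerous.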
Flaky test or flaky system?
But how can you tell the difference between a flaky test and a flaky system? We certainly couldn't - it turned out that the system was flaky and our users experienced problems. This is not an easy problem to solve. Here are two approaches that helped us:
1. Sniff out the non-deterministic tests.
You should attempt to work out whether the problem is in the test code or the application code. Read Martin Fowler's bliki entry on non-deterministic tests, then think about your own test suites: do you have any non-deterministic tests? You probably do, unless you've been very careful. Try deconstructing your flaky tests and reimplementing them at a lower level. You may lose a certain degree of coverage, but the results of the tests will be reliable. At the very least, watch the logs in real time and, where appropriate, check the UI of the test machine as the steps run to see whether they're behaving as expected.
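A common shape of non-determinism is a hidden dependency on the system clock. The sketch below is a hypothetical example of deconstructing such a test: the function names are invented, but the fix is the general one of injecting the dependency so the test controls it.

```python
import datetime

# Non-deterministic: reads the real clock, so this check can flip
# depending on what day the build happens to run.
def is_weekend_nondeterministic():
    return datetime.date.today().weekday() >= 5

# Deterministic rewrite: the date is passed in, so a test can pin it.
def is_weekend(today):
    return today.weekday() >= 5

# The tests now supply fixed inputs instead of reading the clock.
assert is_weekend(datetime.date(2024, 1, 6)) is True   # a Saturday
assert is_weekend(datetime.date(2024, 1, 8)) is False  # a Monday
```

The rewritten test exercises slightly less of the system (it no longer touches the real clock), but it gives the same answer on every run, which is the trade-off described above.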
2. Don’t ignore them, fix them.
Understanding your problem doesn't make it go away. You need to set new features aside and go fix this stuff. Using the head-in-the-sand approach, our functional test suite quickly descended from mostly green to mostly red. The day we had 13 out of 17 builds go red was the last straw, and we decided more drastic action was needed. What was our action? Simply having better discipline in following the process we already had in place. We removed the option to re-run the suite without a code change, and we reinforced the rule that you were only allowed to push on a red build if what you were pushing was intended to fix the build.
Sure enough, by the end of the next day there were 7 people crowded around the build machine trying to fix the flaky build. As a result we discovered some badly written tests and identified six genuine issues with the application. The build light was still red, but instead of crossed fingers we had six juicy, clearly defined bugs to fix.