Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
"Just re-run the build - it should go green"
Do you hear this on your project? Are your tests flaky? Does the build change from green to red and back again without any real explanation? What are you doing about it?
Recently I joined a project that had this problem. It's a common problem to encounter, and the usual way around it is simply to run the test again. Some teams had even added scripts to pick up the failed tests and re-run them automatically at the end of the same build, in case they pass the second time around.
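To make that pattern concrete, the sketch below shows the kind of retry wrapper teams bolt onto their suites, written here as a JUnit 4 rule. The class names, the retry count and the example test are illustrative assumptions, not code from our project.

```java
// A minimal sketch of an automatic-retry wrapper, written as a JUnit 4 TestRule.
// The retry count and the test it wraps are illustrative assumptions.
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

public class RetryFlakyTest {

    // Re-runs a failing test up to maxAttempts times and only reports the
    // last failure, so a pass on any attempt shows up as a green build.
    static class Retry implements TestRule {
        private final int maxAttempts;

        Retry(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Statement apply(Statement base, Description description) {
            return new Statement() {
                @Override
                public void evaluate() throws Throwable {
                    Throwable lastFailure = null;
                    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                        try {
                            base.evaluate();
                            return; // passed on this attempt, report green
                        } catch (Throwable t) {
                            lastFailure = t;
                            System.err.println(description.getDisplayName()
                                    + ": attempt " + attempt + " failed");
                        }
                    }
                    throw lastFailure; // only fails if every attempt fails
                }
            };
        }
    }

    @Rule
    public Retry retry = new Retry(2);

    @Test
    public void checkoutCompletes() {
        // a hypothetical flaky end-to-end step would go here
    }
}
```

Notice that the rule reports green as long as any attempt passes - the first failure is simply swallowed.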
But how can you tell the difference between a flaky test and a flaky system? We certainly couldn't - it turned out that the system was flaky and our users experienced problems. This is not an easy problem to solve. Here are two approaches that helped us:
First, attempt to work out whether the problem is in the test code or the app code. Read Martin Fowler's bliki entry on non-deterministic tests, then think about your own test suites - do you have any non-deterministic tests? You probably do, unless you've been very careful. One option is to deconstruct your flaky tests and re-implement them at a lower level; you may lose some coverage, but the results will be reliable. At the very least, watch the logs in real time and, where appropriate, check the UI of the test machine as the steps run to see whether they behave as expected.
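As a rough illustration of re-implementing a flaky check at a lower level, here is a hedged sketch in JUnit. The domain - an order service with a simple stock rule - is entirely invented for the example; the point is that the replacement test exercises the same behaviour with no browser, no sleeps and no shared environment.

```java
// A hedged sketch of "deconstructing" a flaky test and re-implementing it at a
// lower level. All names here (OrderService, OrderStatus, the stock rule) are
// hypothetical examples, not code from a real project.
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class OrderSubmissionTest {

    enum OrderStatus { ACCEPTED, REJECTED }

    // The domain logic we actually care about, extracted so it can be tested
    // without a browser, a queue or a deployed environment.
    static class OrderService {
        private final int stockLevel;

        OrderService(int stockLevel) {
            this.stockLevel = stockLevel;
        }

        OrderStatus submitOrder(int quantity) {
            return quantity <= stockLevel ? OrderStatus.ACCEPTED : OrderStatus.REJECTED;
        }
    }

    // Before (in spirit): drive the UI, Thread.sleep(5000) and hope the
    // asynchronous processing has finished - the classic source of flakiness.

    // After: the same rule verified one level down, with no clock, browser or
    // shared environment involved, so the result is deterministic.
    @Test
    public void orderWithinStockIsAccepted() {
        OrderService service = new OrderService(10);
        assertEquals(OrderStatus.ACCEPTED, service.submitOrder(3));
    }

    @Test
    public void orderBeyondStockIsRejected() {
        OrderService service = new OrderService(10);
        assertEquals(OrderStatus.REJECTED, service.submitOrder(30));
    }
}
```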
Understanding your problem doesn't make it go away. You need to set new feature work aside and go and fix this stuff. While we kept our heads in the sand, our functional test suite quickly descended from mostly green to mostly red. The day 13 out of 17 builds went red was the last straw, and we decided more drastic action was needed. What was our action? Simply having the discipline to follow the process we already had in place: we removed the option to re-run the suite without a code change, and we reinforced the rule that you were only allowed to push on a red build if what you were pushing was intended to fix the build.
Sure enough, by the end of the next day there were seven people crowded around the build machine trying to fix these flaky builds. As a result we discovered some badly written tests and identified six genuine issues with the application. The build light was still red, but instead of crossed fingers we had six juicy, clearly defined bugs to fix.