This post has been sitting in my drafts pile for far too long. I wrote an outline for this post nearly a year ago! Then Janet Gregory and Lisa Crispin recently published a post about flaky tests, which reminded me I still had this post awaiting my attention.
I decided it’s time to pick this back up and finish writing my guide to handling flaky tests.

Test flakiness is frustrating for everyone. But before we write it off as a flaky test and delete it, we need to be sure of one thing. Is it the test or the application that is flaky?

Our test failing intermittently could be a sign of an intermittent issue in our application, and so it warrants investigation just as much as any other test failure. For this reason, I encourage you to stop referring to them as flaky tests. There are plenty of alternatives: inconsistent results, unstable results, and my favourite, fluctuating results.

How to handle fluctuating results

Ultimately, whatever you call them, the process is the same. We need to stop running them until we get to the bottom of why the results are inconsistent, and then fix the cause. Below I’ll provide a high-level guide for how I recommend teams approach automated tests with fluctuating results.

Disable The Test

If the test isn’t providing reliable results, the first thing we need to do is disable it. The last thing we want is for it to keep failing intermittently, causing people to lose faith in the accuracy of our automated tests. Even worse, they may begin ignoring failing tests and miss a real issue. So, disable any tests with fluctuating results straight away.
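
How you disable a test depends on your framework. As a minimal sketch, assuming a Python suite run with pytest (the test name and ticket reference below are made up for illustration), a skip marker with a reason keeps the context visible to anyone who stumbles across the test later:

```python
import pytest

# Disable the test straight away so its intermittent failures stop eroding
# trust in the suite. The reason points readers at the tracking ticket.
@pytest.mark.skip(reason="Fluctuating results under investigation, see QD-123")
def test_checkout_total_is_correct():
    ...  # original test body stays in place while the test is disabled
```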

Review and Document the Issue

I recommend that all tests with fluctuating results get at least a preliminary review straight away. This offers an opportunity to identify and resolve minor issues quickly.

When we can’t resolve the issue immediately, we must document it. This means we need to capture as much detail as possible about the fluctuating results and what our initial review uncovered. This allows for the next stage, prioritising the fix.
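
One detail worth capturing is how often the test actually fails. Here’s a rough sketch, again assuming pytest (the test node ID is illustrative), that re-runs the suspect test and tallies the results so the ticket can quote a concrete failure rate rather than “it fails sometimes”:

```python
import subprocess

# Re-run the suspect test repeatedly and record how often it fails.
TEST_ID = "tests/test_checkout.py::test_checkout_total_is_correct"  # illustrative
RUNS = 50

failures = sum(
    subprocess.run(["pytest", "-q", TEST_ID], capture_output=True).returncode != 0
    for _ in range(RUNS)
)

print(f"{failures}/{RUNS} runs failed ({failures / RUNS:.0%})")
```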

Prioritise The Fix

Now that we have disabled the test and documented our findings, we need to prioritise the fix. To do this we need to understand the risk. If not having test coverage of this area of the application is high risk, then the fix deserves a much higher priority than it would if the lack of coverage is low risk.

What makes it high or low risk? There are many factors. Is this an area of the system that is undergoing regular change? Does it belong to legacy code that rarely receives updates? Would there be compliance, regulatory, or reputational issues if this area of the system was not working or was inaccurate? All of these, and more, contribute to the risk.
Answering these questions is crucial for prioritising the fix correctly, and it may require collaboration with product or other teams.

Review The Test

Now we need to get to the bottom of the fluctuating results. To do this, we first need to understand how the results differ between a passing run and a failing run. Then we need to understand whether this is intended system behaviour or not. It could be that there is a subtle difference in system behaviour based on something like the time of day. If this is the case, we need to ensure that our tests allow for any such expected difference.
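
As an example, if the system genuinely behaves differently at different times of day, the test can make that difference explicit rather than leaving it to whenever CI happens to run. This is a sketch assuming Python, pytest, and the freezegun library; the greeting function is a made-up stand-in for real application behaviour:

```python
from datetime import datetime

import pytest
from freezegun import freeze_time  # third-party: pip install freezegun

def greeting_for_now() -> str:
    # Stand-in for application behaviour that legitimately varies by time of day.
    return "Good morning" if datetime.now().hour < 12 else "Good evening"

# Pin the clock so each expected behaviour is exercised deterministically,
# instead of the result fluctuating with the wall-clock time of the test run.
@pytest.mark.parametrize(
    "frozen_time, expected",
    [
        ("2024-01-01 09:00:00", "Good morning"),
        ("2024-01-01 21:00:00", "Good evening"),
    ],
)
def test_greeting_matches_time_of_day(frozen_time, expected):
    with freeze_time(frozen_time):
        assert greeting_for_now() == expected
```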

Ultimately there could be a wide range of reasons why we are receiving fluctuating results, far more than I could possibly cover in a single post. What we are trying to determine at this stage is whether the application is behaving as expected or not. This may require documenting all of the observed behaviours and validating them with a product owner.

If you do uncover an issue with the application, then raise a bug! I recommend creating a linked ticket for the test as well, so that once the issue is resolved you can come back to re-enable or update the test.

When fixing the app is low priority

In some cases, the team may consider the uncovered issue a low priority and won’t address it quickly. In these situations, you have 3 options.

  1. Forget about it – Sometimes you have done your job by raising it, and the rest sits with other people. We never like a defect to go unresolved, but if it’s out of your hands, you might not be able to do anything about it.
  2. Fix the app yourself – There is nothing that says a Quality Engineer can’t write production code. If the fix is something you are confident with and won’t take up too much of your time, then discuss it with your team and, if they are on board, have at it!
  3. Patch your test – This is where we update our test to be resilient to the failing scenario (see the sketch after this list). It allows us to continue to get some value from the test without the fluctuating results. This may not always be possible, but where it is, it can be a worthwhile stopgap.
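
As a sketch of option 3 (pytest again; the test names and ticket reference are illustrative), one way to patch the test is to split the coverage so the stable part keeps running as normal, while the scenario affected by the known bug is marked as an expected failure linked to its ticket:

```python
import pytest

# The stable part of the coverage keeps running, and failing loudly, as normal.
def test_order_confirmation_email_is_sent():
    ...

# The scenario hit by the known, low-priority bug is marked as an expected
# failure linked to its ticket. The suite stays green despite the fluctuating
# result, and pytest reports an unexpected pass once the bug is actually fixed.
@pytest.mark.xfail(reason="Known rounding bug, see QD-456", strict=False)
def test_order_total_rounds_to_two_decimal_places():
    ...
```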

Summary

Diagnosing whether tests with fluctuating results are caused by the test code or by the application under test can be a frustrating process, especially when it feels like there is time pressure to do so. By taking a step back and following the steps above to review and prioritise the risk accordingly, you remove that pressure and ensure you are able to focus on the next most important thing, whatever that may be.

Further reading

If you enjoyed this post then be sure to check out my other posts on Quality and Testing.

Subscribe to The Quality Duck

Did you know you can now subscribe to The Quality Duck? Don’t miss a post – get them delivered directly to your mailbox whenever I create a new one. Don’t worry, you won’t get flooded with emails; I post at most once a week.