The recent “Is TDD Dead?” hangout between DHH, Martin Fowler and Kent Beck made it pretty clear how much dissatisfaction there is with excessive mocking and stubbing in automated tests. DHH expressed strong opinions about the fundamentalism of insisting that tests must never touch real collaborators. This has also been a key point in Martin’s recent post. The three of them claimed to “mock almost nothing”.
The “Can you sleep at night?” test
Kent Beck emphasized that, at the end of the day, it’s our job as developers to make sure we can sleep at night knowing we didn’t break anything. That is very different from the early days, when developers would just commit code and wait for a different group of people, the testers, to make sure nothing had been broken. This is one of the main purposes of an automated suite of tests, regardless of whether it was written “test first” or after the code. The objective of the tests is to make sure things are working and that new code doesn’t cause any problems with the existing code. We do not get this level of confidence from a test that mocks or stubs all of its collaborators. I’ll explain why.
Why is mocking/stubbing dangerous?
Mocks and stubs are not evil per se. As with any other tool, it’s the way we use them that makes them bad. The problem is not when we stub or mock external dependencies like web service calls or other integration points. The main issues emerge when we isolate our code from our own code. Tautological TDD is an anti-pattern that explains some of the reasons why mocking and stubbing too much is dangerous. It was said during the hangout, “If I use TDD I can refactor”. But what happens if your test is so white box, and knows so much about how things are implemented, that refactoring the code makes the test fail and forces you to rewrite the test as well? That entirely defeats the purpose of having a test to make sure that you didn’t break anything with your changes.
When testing the controller, mockists would decide to isolate the model, and then mock or stub the method errors on the model. Ruby, dynamic as it is, and testing frameworks like Mocha, allow us to stub/mock the model.
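Here is a minimal sketch of that setup in plain Ruby. The names Article and ArticlesController are hypothetical, made up for illustration, and a hand-rolled double stands in for what a stubbing library such as Mocha would generate. The sketch runs the same controller twice, once against the double and once against the real model.

```ruby
# Hypothetical model: the real method is `errors` (plural).
class Article
  def initialize(title)
    @title = title
  end

  # Returns a list of validation messages; empty when the article is valid.
  def errors
    @title.to_s.empty? ? ["title can't be blank"] : []
  end
end

# Hypothetical controller: note that it calls `error` (singular).
class ArticlesController
  def initialize(article)
    @article = article
  end

  def create
    @article.error.empty? ? :created : :unprocessable
  end
end

# Over-isolated check: the model is replaced by a double that answers to
# whatever method name the test author typed, so the real model is never used.
def outcome_with_stubbed_model
  double = Object.new
  double.define_singleton_method(:error) { [] }
  ArticlesController.new(double).create
end

# Same controller, real collaborator.
def outcome_with_real_model
  ArticlesController.new(Article.new("Is TDD dead?")).create
rescue NoMethodError => e
  e.class
end

puts outcome_with_stubbed_model.inspect # => :created
puts outcome_with_real_model.inspect    # => NoMethodError
```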
If you pay attention, the method in the model is called errors (plural). However, the controller has a problem because it calls it in the singular, error. Which means that the code is wrong! But…the test passes! What do I have here? A false positive: a green test giving the developer a sense of security when the code is actually wrong. Not only could the names of the methods be wrong, but also their interfaces. Recently, after upgrading a dependency, I discovered that a method that was being “stubbed” in several unit tests had changed its return contract: when there are no errors, it now returns an empty array instead of nil. Once again, our tests are green, but the code is broken. Furthermore, beyond names and interfaces, and most importantly, the behavior of the code could be wrong. The wrong methods could be called, but if the test is too white box and only checks for interactions, the test will pass and the developers will think they can sleep safely at night while their code is broken, sometimes already in production, where the problems will be caught by real users. Believe me, I’ve seen this happen. Have you?
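The return-contract problem can be sketched the same way. This is not the actual dependency I upgraded; Validator, Publisher and errors_for are hypothetical names, and the stub is hand-rolled rather than created with a mocking library. The stub still returns nil, as the old version did, while the real object now returns an empty array.

```ruby
# Hypothetical dependency AFTER the upgrade: "no errors" is now [] instead of nil.
class Validator
  def errors_for(record)
    record.to_s.empty? ? ["can't be blank"] : []
  end
end

# Hypothetical caller still written against the old contract (nil meant valid).
class Publisher
  def initialize(validator)
    @validator = validator
  end

  def publish(record)
    @validator.errors_for(record).nil? ? :published : :rejected
  end
end

# The unit test's stub was never updated, so it keeps returning nil.
def outcome_with_stale_stub
  stub = Object.new
  stub.define_singleton_method(:errors_for) { |_record| nil }
  Publisher.new(stub).publish("a perfectly valid post")
end

# With the real, upgraded collaborator a valid record is wrongly rejected.
def outcome_with_real_validator
  Publisher.new(Validator.new).publish("a perfectly valid post")
end

puts outcome_with_stale_stub.inspect     # => :published
puts outcome_with_real_validator.inspect # => :rejected
```

The stubbed run stays green because the stub encodes what the dependency used to do, not what it does now.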
So what’s the purpose of tests that only check interactions between collaborators and mock more than they need to? It’s better not to have them; at least then we would know we had to test this behaviour at a different level. The three gurus also vigorously discussed a related question: what is a unit?
How do you define a “Unit”?
The unit to be tested is the entire point of confusion and debate. As Martin pointed out, “Object-oriented design tends to treat a class as the unit, procedural or functional approaches might consider a single function as a unit”. The unit, though, is really a behavior. It has to be up to the developers, without any fundamentalism, to determine what a unit is. Think about the depth of your tests (DOT) and make them shallow enough that each one tests a single behavior. The image below illustrates the concept of DOT.
A unit is not necessarily a class, a method or a function; it’s whatever YOU, as the developer writing the code and the test, decide it to be, based on your design and your boundaries. And, of course, if after determining the depth of your test you still need some stubs or mocks underneath its boundaries, go for it. But let’s stop mocking and stubbing just because we can, or because we think we have to.
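As a rough sketch of what that can look like, assume a hypothetical CommentPoster whose behavior spans our own Comment model and a remote spam-checking service. All of these names are invented for illustration. The test goes through the real model and fakes only the external boundary.

```ruby
# Our own code: a hypothetical model with real validation logic.
class Comment
  attr_reader :body

  def initialize(body)
    @body = body
  end

  def errors
    body.to_s.strip.empty? ? ["body can't be blank"] : []
  end
end

# Our own code: the behavior under test, which collaborates with the model
# and with an external spam-checking service.
class CommentPoster
  def initialize(spam_service)
    @spam_service = spam_service
  end

  def post(comment)
    return :invalid unless comment.errors.empty?
    return :rejected if @spam_service.spam?(comment.body)

    :posted
  end
end

# The only fake: it stands in for the (hypothetical) remote web service,
# which sits underneath the boundary chosen for this test.
class FakeSpamService
  def spam?(text)
    text.include?("buy now")
  end
end

# The unit here is the "posting a comment" behavior, real model included.
def post_comment(body)
  CommentPoster.new(FakeSpamService.new).post(Comment.new(body))
end

puts post_comment("Great article").inspect # => :posted
puts post_comment("   ").inspect           # => :invalid
puts post_comment("buy now!!!").inspect    # => :rejected
```

Because Comment is real, renaming errors or changing what it returns would make this test fail, which is exactly the safety net the over-mocked version lacks.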