I’ve been struggling to use this principle in practice. I’ve read about it, heard it mentioned, heard it discussed and explained. But it only clicked when I stumbled onto this example at work:
We needed to sync certain tasks from one tool to another, and we needed to make sure each task was sent only once, with no duplicates at the receiving end. We also wanted the stats to show a count of the tasks that had been synced.
As a solution, we decided to add a flag to our tasks in the database, and I wrote the following tests:
- Tasks are flagged as synced when synced
- Tasks that are flagged as already synced are not synced
- Stats count of synced equals tasks flagged as already synced
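For illustration, here is roughly what those first tests could look like as a minimal Python sketch. Everything in it is hypothetical: `Task` stands in for the database row with its `synced` flag, `FakeRemote` for the receiving tool, and `Syncer` and `synced_count` for the sync and stats code. Notice that every test reads or sets the flag directly.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task row; `synced` plays the role of the database flag.
    id: int
    synced: bool = False


@dataclass
class FakeRemote:
    # Hypothetical stand-in for the receiving tool: records uploaded task ids.
    uploads: list = field(default_factory=list)

    def upload(self, task: Task) -> None:
        self.uploads.append(task.id)


class Syncer:
    def __init__(self, remote: FakeRemote) -> None:
        self.remote = remote

    def sync(self, task: Task) -> None:
        if task.synced:
            return  # already sent once; skip to avoid a duplicate
        self.remote.upload(task)
        task.synced = True


def synced_count(tasks: list) -> int:
    # Stats: count the tasks that carry the flag.
    return sum(1 for t in tasks if t.synced)


# Each test below asserts on, or sets up, the flag itself.
def test_tasks_are_flagged_as_synced_when_synced():
    task = Task(id=1)
    Syncer(FakeRemote()).sync(task)
    assert task.synced


def test_flagged_tasks_are_not_synced():
    remote = FakeRemote()
    Syncer(remote).sync(Task(id=1, synced=True))
    assert remote.uploads == []


def test_stats_count_equals_flagged_tasks():
    tasks = [Task(id=1, synced=True), Task(id=2), Task(id=3, synced=True)]
    assert synced_count(tasks) == 2
```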
After a while, I changed them to:
- Syncing a task a second time does nothing
- Stats count of synced equals tasks flagged as already synced
But my stats test still checked the flag. Even worse, nothing now guaranteed that synced tasks would actually get the flag, so the stats could be broken without any test noticing! It was bugging me a lot. I considered reverting, so the tests would at least be consistent, and I considered leaving it as it was even though it bothered me, because I was certain the new version was much closer to testing behavior.
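To make that worry concrete, here is a hypothetical refactor in the same sketch style (all names invented for illustration): the syncer now remembers sent task ids in a set and no longer sets the flag. The duplicate test still passes, and so does the stats test, because it sets the flags by hand. Yet the stats no longer move when a task is actually synced.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task row with the database flag.
    id: int
    synced: bool = False


@dataclass
class FakeRemote:
    # Hypothetical receiving tool: records uploaded task ids.
    uploads: list = field(default_factory=list)

    def upload(self, task: Task) -> None:
        self.uploads.append(task.id)


class Syncer:
    # Hypothetical refactor: dedupes with an in-memory set, never sets the flag.
    def __init__(self, remote: FakeRemote) -> None:
        self.remote = remote
        self._sent = set()

    def sync(self, task: Task) -> None:
        if task.id in self._sent:
            return
        self.remote.upload(task)
        self._sent.add(task.id)


def synced_count(tasks: list) -> int:
    # Stats still count flagged tasks.
    return sum(1 for t in tasks if t.synced)


def test_syncing_a_task_a_second_time_does_nothing():
    remote = FakeRemote()
    syncer = Syncer(remote)
    task = Task(id=1)
    syncer.sync(task)
    syncer.sync(task)
    assert remote.uploads == [1]  # passes


def test_stats_count_equals_flagged_tasks():
    # Also passes: the test sets the flags itself instead of syncing.
    assert synced_count([Task(id=1, synced=True), Task(id=2)]) == 1


def stats_after_real_sync() -> int:
    # What production would show: sync a task, then read the stats.
    task = Task(id=1)
    Syncer(FakeRemote()).sync(task)
    return synced_count([task])  # 0, because nothing set the flag
```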
And then I realized I could do the same thing for the stats part as I did for the sync part.
I asked myself: “What is the behavior I want?” For syncing, I don’t actually want flags. They aren’t the end goal; they’re just an implementation detail. The end goal is having no duplicates, that is, never uploading the same task twice.
The same goes for the stats. I don’t want them to count flags; that’s the same implementation detail. I want them to count the synced tasks, that is, to increment by the number of tasks uploaded.
So I ended up with these tests:
- Syncing a task a second time does nothing
- Syncing a task causes the stats to increment
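Continuing the same hypothetical sketch, the final two tests could look like this. The implementation still uses the flag, but the tests only call `sync` and read the counts through a `stats` method (an invented name), so they would keep passing if the flag were swapped for some other mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task row; the tests below never touch `synced`.
    id: int
    synced: bool = False


@dataclass
class FakeRemote:
    # Hypothetical receiving tool: records uploaded task ids.
    uploads: list = field(default_factory=list)

    def upload(self, task: Task) -> None:
        self.uploads.append(task.id)


class Syncer:
    def __init__(self, remote: FakeRemote, tasks: list) -> None:
        self.remote = remote
        self.tasks = tasks  # stands in for the tasks table

    def sync(self, task: Task) -> None:
        if task.synced:
            return  # implementation detail: the flag prevents duplicates
        self.remote.upload(task)
        task.synced = True

    def stats(self) -> dict:
        # Implementation detail: derived from the flag, invisible to the tests.
        return {"synced": sum(1 for t in self.tasks if t.synced)}


def test_syncing_a_task_a_second_time_does_nothing():
    remote = FakeRemote()
    task = Task(id=1)
    syncer = Syncer(remote, [task])
    syncer.sync(task)
    syncer.sync(task)
    assert remote.uploads == [1]


def test_syncing_a_task_causes_the_stats_to_increment():
    task = Task(id=1)
    syncer = Syncer(FakeRemote(), [task])
    before = syncer.stats()["synced"]
    syncer.sync(task)
    assert syncer.stats()["synced"] == before + 1
```

If a change ever stopped the stats from following the uploads, for example because the flag was no longer set after an upload, the second test would fail, whereas a stats test that sets the flags by hand would still pass.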
Now I’m testing the behavior, not the implementation.