Debugging Part Two: make it fail, show you’re in control
Debugging can be hard – everyone wants to spend less time debugging and more time writing new code. That’s why we’re writing tools to make your debugging life easier. But it’s also why we’ve put together this series of posts sharing everything we know about debugging tools, tips and tricks so you can spend more time creating and less time debugging.
Lesson 2: Sometimes you have to break something to fix something
Bear with me. Here’s what I mean…
Debugging a difficult problem (especially in a complex system) can lead to multiple ideas about what the underlying cause is, which in turn can lead to lots of speculative changes, not all of which have the desired effect. And sometimes they have no effect at all.
It’s surprisingly common to see people make changes that have no real effect on the system. In complex systems, or when it isn’t clear who is responsible for which part, an engineer can make a change with no way of telling whether it did what they hoped or did nothing at all.
It can seem funny for the first few minutes, but I’ve seen programmers (especially tired programmers) realise, after hours of debugging, that nothing has changed because none of their edits were ever taking effect.
Even programmers with years of experience can find themselves making small changes to configuration files, included libraries or devices, only to notice… nothing. I’ve seen programmers change the version of a library, tweak the build files or try any number of other guesses, and still nothing changes.
Nothing changes because you’re not actually changing anything. At least, not anything relevant.
It’s surprisingly easy for the wrong Python version, the wrong build step, a cached config or any number of other things to trick you into thinking that the changes you’re making are having an effect. But it needn’t be this way…
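One cheap way to check for this kind of mix-up in Python is to ask the running process what it is actually using. Treat the following as a sketch only: describe_runtime is a name made up for this post, and json is a stand-in for whichever library you suspect.

```python
import importlib
import sys


def describe_runtime(module_name):
    """Report which interpreter and which copy of a module are in use."""
    module = importlib.import_module(module_name)
    return {
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        # Built-in modules have no file on disk, hence the default of None.
        "module_file": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    # "json" is only a placeholder; substitute the library you are editing.
    for key, value in describe_runtime("json").items():
        print(f"{key}: {value}")
```

If the paths printed are not the ones you have been editing, you have found out why nothing was changing.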
Here’s how to avoid it…
To fix something, break something first
The first fact you need to establish about any system that you want to debug is that you have control of it. If you have no control, you have no hope of debugging it.
Breaking it is a cunning – and often surprisingly easy – way of confirming you have control of it. It often requires little detailed knowledge of how the system should work but can save you hours of fruitless editing. Make the system die, crash, output errors, mess up data … anything just to show that your changes are having an effect.
Make it crash, throw exceptions, kill the process, exit early. In dynamic languages, calling a method that doesn’t exist will do the job. Chuck in syntax errors. Delete important database records.
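In Python, a deliberate break can be as small as this. It is a minimal sketch: die_here and the marker text are invented for this post, not part of any library.

```python
import os
import sys


def die_here(marker):
    """Stop the process at once with a message that cannot be missed."""
    # sys.exit() with a string prints it to stderr and exits with status 1,
    # so wrappers, CI jobs and supervisors notice as well.
    sys.exit(f"CONTROL CHECK {marker}: pid {os.getpid()} reached this line")
```

Drop a call such as die_here("edit-1") into the code you believe is running. If the program carries on exactly as before, your edit is not the code being executed. One caveat: sys.exit() raises SystemExit, so a bare except: or except BaseException: further up the stack can swallow it.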
For a new codebase where you haven’t got the mental model of how things fit together and you can’t reason about what’s going on, this can let you zero in much faster.
For scripting languages such as Ruby and Python, which are often used as part of a larger suite of programs, the humble syntax error can be used to confirm that the script you’re looking at is the one that is being run.
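The reason a syntax error is such a clean test in Python is that the whole file is compiled before any of it runs, so not even the first line executes. The sketch below shows this with the built-in compile(); BROKEN_SCRIPT and runs_at_all are names made up for illustration.

```python
BROKEN_SCRIPT = '''
print("this line never runs")
this is not valid python !!!
'''


def runs_at_all(source):
    """Return True if the source compiles, False if a syntax error stops it."""
    try:
        compile(source, "<script-under-suspicion>", "exec")
    except SyntaxError:
        return False
    return True
```

So if you add a line of nonsense to a script and the larger system still behaves normally, that script is not the one being run.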
Why not just log information?
Logs are often the first source of debugging information, but getting information into a log and reading it back out often means running the entire process. This is where even experienced programmers lose time: they litter the code with printf() statements, run the code and then scour the logs. For a long-running process, making it stop dead at the point you care about gives faster feedback than anything you could log.
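As a rough sketch of the difference, imagine a long-running job that walks through a list of records. The function, the "amount" field and the negative-value check are all hypothetical; the point is only that the run stops the moment it reaches the thing you are hunting.

```python
def process(records):
    """Hypothetical long-running job: stop dead at the first suspect record."""
    for index, record in enumerate(records):
        if record.get("amount", 0) < 0:
            # Stopping here gives feedback immediately, with the offending
            # data in the message, instead of after the whole run has finished.
            raise RuntimeError(f"stopped at record {index}: {record!r}")
        # ... the real work on each record would go here ...
    return len(records)
```

With a log line in place of the raise, you would wait for every remaining record to be processed before you could go looking for the answer.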
Of course, programmers with breakpoint and reverse/time-travel debuggers will be able to pause the code while it’s executing, avoiding having to run the code to the end.
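In Python the quickest route to a breakpoint is the built-in breakpoint() call, available since Python 3.7. The function around it here is a made-up example, not code from any real project.

```python
def apply_discount(price, percent):
    """Hypothetical function under investigation."""
    discounted = price * (1 - percent / 100)
    # breakpoint() pauses here and opens pdb by default.
    # Setting the environment variable PYTHONBREAKPOINT=0 makes it a no-op.
    breakpoint()
    return discounted
```

If the debugger prompt never appears, that tells you the same thing a deliberate crash would: this code is not being run.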
You shouldn’t have gotten here in the first place
Ideally you’ll have code coverage reports, automated tests, breakpoint debugging and reverse debugging. Together these let you zero in on the part of the code that is actually being run, so you can start reasoning about what steps to take next. That is debugging the right way.
If you don’t have these tools (which you should!), then breaking something to fix something is a fine strategy.