Technical Paper: Time Travel Debugging
Understand complex code and fix bugs faster
Overview
Over the last decade, software development has become mind-blowingly complex – as have the challenges related to finding and fixing bugs.
Two problems make debugging increasingly difficult:
- Complexity hinders developers’ ability to understand what the code is doing and to fix bugs efficiently. Complex architectures and large teams mean that developers often have to troubleshoot bugs that have little or nothing to do with the code they wrote.
- 91% of developers admit to having software defects that remain unresolved because they cannot reproduce them (see: an analyst’s study on software reliability).
Traditional methods of debugging and analyzing code are no longer sufficient for the challenges of understanding complex codebases and debugging modern applications.
This technical paper explores what every software engineer working on complex codebases (e.g. multithreaded or multiprocess programs) should know about time travel debugging.
It goes on to outline how upgrading to time travel debugging (TTD) can save you time debugging – enabling you to get changes into the pipeline faster, complete your code deliverables on time, and resolve customer-reported defects in hours, not weeks.
▶︎ This is an excerpt from our technical paper “Time Travel Debugging”. Download below.
Debugging is a costly productivity killer
As the world becomes increasingly dependent on software, finding and fixing software failures in complex systems has moved from being an inconvenience to a major problem. Delays in shipping code because of a growing backlog of bugs push back product releases and negatively impact engineering productivity and innovation.
While tools exist to prevent bugs when coding, there has been little innovation in tools that help with debugging once these bugs surface.
Beyond the productivity cost of excessive time spent debugging, software errors that cause failures can also cost businesses in other ways.
So how are developers currently going about the process of debugging?
Traditional debugging options
There is a range of traditional options and approaches currently available to developers to help diagnose errors in a codebase.
Why it’s time to transform the way we debug
Traditional debuggers can be painful and inefficient
The most inefficient part of traditional debugging stems from only being able to work forward toward the point of the crash or error. This way of working is slow: it forces developers to restart the application repeatedly and work through a complex series of steps to get back to the point of failure.
The time cost of debugging this way can be high: reproducing the issue over and over again, stepping in again and again, and lots of “Oops, I’ve stepped over too far. Now I have to restart.”
Working this way is time-consuming and inefficient, just when a developer is under pressure to fix a bug as quickly as possible.
Time travel debugging is a game changer
Time travel debugging (TTD) is the ability to wind back the clock to any point in an application’s execution and see exactly what it was doing. Integral to TTD is the ability to reverse debug through program execution history.
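To make “wind back the clock” concrete, here is a toy model in Python. It is a sketch under loud assumptions, not how LiveRecorder or any production time travel debugger stores history: it assumes the recording is simply an ordered list of variable writes (the hypothetical `Write` record) and reconstructs program state at any step by replaying those writes up to that step. Real tools generally avoid logging every write; recording non-deterministic inputs and re-executing is a common alternative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    step: int      # position in the recorded execution history (hypothetical model)
    var: str       # name of the variable written
    value: object  # value stored at that step


def state_at(history: list[Write], step: int) -> dict[str, object]:
    """Rebuild every variable's value as of `step` by replaying writes up to it.

    Assumes `history` is ordered by step.
    """
    state: dict[str, object] = {}
    for w in history:
        if w.step > step:
            break  # later writes have not happened yet at this point in time
        state[w.var] = w.value
    return state


history = [
    Write(1, "count", 0),
    Write(2, "total", 10),
    Write(3, "count", 1),
    Write(4, "total", -5),  # the suspicious write we want to look around
]

print(state_at(history, 3))  # {'count': 1, 'total': 10}
print(state_at(history, 4))  # {'count': 1, 'total': -5}
```

Because any step can be queried, moving “backward” is just asking for an earlier step; nothing has to be re-run by hand.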
This transformative debugging capability allows developers to observe and understand the precise conditions that led to a specific bug. By letting them rewind the program’s execution directly back to the root cause, TTD accelerates finding and fixing bugs. It is also a powerful way for developers to learn about and navigate an unfamiliar codebase.
Take the example of tracking down some corrupted memory. With time travel debugging, a developer can put a watchpoint (aka data breakpoint) on the variable that contains the bad data and run backward to go straight to the line of code, and the thread, that most recently modified it.
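The watchpoint-plus-reverse-execution workflow can be sketched in the same toy style. In this model (an assumption, not LiveRecorder’s API), the recording is a list of writes tagged with the thread and source line that made them; `reverse_watch` is a hypothetical helper that walks the history backward from the current step and stops at the most recent write to the watched variable, which is the question a reverse watchpoint answers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Write:
    step: int    # position in the recorded execution history
    thread: str  # thread that performed the write
    line: int    # source line that performed the write
    var: str     # variable that was written
    value: int   # value stored


def reverse_watch(history: list[Write], var: str, from_step: int) -> Optional[Write]:
    """Walk backward from `from_step` and return the most recent write to `var`.

    `history` must be ordered by step. Returns None if `var` was never
    written at or before `from_step`.
    """
    for w in reversed(history):
        if w.step <= from_step and w.var == var:
            return w
    return None


# Hypothetical recording of a multithreaded run (steps, threads, lines are made up).
history = [
    Write(10, "main", 42, "balance", 100),
    Write(11, "worker-1", 87, "balance", 120),
    Write(12, "worker-2", 87, "other", 7),
    Write(13, "worker-2", 93, "balance", -3),  # the corrupting write
    Write(14, "main", 50, "other", 8),
]

# Symptom noticed at step 14: balance is -3. Which thread and line wrote it?
culprit = reverse_watch(history, "balance", from_step=14)
print(culprit.thread, culprit.line)  # worker-2 93

# Run backward again to find the write before that one.
previous = reverse_watch(history, "balance", from_step=culprit.step - 1)
print(previous.thread, previous.line)  # worker-1 87
```

Repeating the backward search from the step just before the last hit walks through earlier writes one at a time, much like repeating a reverse continue. For comparison, GDB’s built-in process record supports the interactive equivalent: `record full`, then `watch balance` and `reverse-continue`.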
This “direct to root cause” approach accelerates debugging by eliminating the need for trial and error and repeatedly restarting the program with different breakpoint locations. It reduces multiple iterations down to one loop.
Types of bugs that time travel debugging can resolve more quickly
Time travel debugging is really powerful for:
• Any bug where time passes between the bug occurring and its symptoms appearing, e.g. assertion failures, segmentation faults, or simply bad results being produced.
• Any bug which occurs intermittently or sporadically – for example, a bug that occurs one time in a thousand, or occurs in a different way on each run of the program. Bugs tend to manifest in this way when the program’s execution is non-deterministic due to multithreading and/or interaction with other processes and services.
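Why do multithreading bugs show up on some runs and not others? The sketch below is an illustrative model rather than real threads: it enumerates every interleaving of two threads, `A` and `B`, that each perform a non-atomic increment (read the shared counter, then write it back plus one). A real scheduler picks one interleaving per run, so the lost update only appears when it happens to pick a bad one.

```python
from itertools import combinations

# Each thread runs a non-atomic increment in two steps:
# step 0 reads the shared counter into a private register,
# step 1 writes register + 1 back to the counter.
OPS = ("read", "write")


def run(schedule: tuple[str, ...]) -> int:
    """Execute one interleaving; schedule[i] names the thread that runs step i."""
    counter = 0
    reg: dict[str, int] = {}  # each thread's private register
    pc = {"A": 0, "B": 0}     # each thread's next operation
    for t in schedule:
        if OPS[pc[t]] == "read":
            reg[t] = counter
        else:
            counter = reg[t] + 1
        pc[t] += 1
    return counter


def all_schedules() -> list[tuple[str, ...]]:
    """Every ordering of A's two steps with B's two steps (4 choose 2 = 6)."""
    return [
        tuple("A" if i in a_slots else "B" for i in range(4))
        for a_slots in combinations(range(4), 2)
    ]


results = {s: run(s) for s in all_schedules()}
for schedule, final in results.items():
    print("".join(schedule), final)
```

Only the two serial schedules (`AABB` and `BBAA`) end with the expected value of 2; the other four lose an update and end with 1. In a real program the read and write can be far apart and the bad interleaving rare, which is what makes capturing the failing run so valuable.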
The following are some common scenarios that TTD can help resolve more quickly and more efficiently than other debugging methods.
- Race conditions
- Memory corruption
- Segmentation faults (aka segfault)
- Stack corruption
- Memory leaks
- Long run times
- Frequently called functions
- Dynamic code
The key benefits of using time travel debugging
- Reveal the root cause of bugs that other methods of debugging cannot reach
- Boost developer productivity
- Increase observability and understandability
Introducing LiveRecorder
LiveRecorder makes bugs 100% reproducible with time travel debugging.
LiveRecorder lets developers start debugging test failures instantly. It provides a one-click workflow from a test failure to a time travel debugger positioned exactly at the point of failure, skipping the tedious steps usually required to reproduce the problem.
Developers working on complex C/C++, Go, and Java software can now save a huge amount of time diagnosing the root causes of new regressions, legacy bugs, and flaky tests. Bugs that took days or weeks to isolate can now be resolved in hours.
Three-step method:
- Record CI and system test failures: capture the execution of failing test runs, including intermittent failures, and store the recordings for later analysis and cross-team collaboration.
- Replay recordings: jump from the test failure (or bug report) straight into a ready-to-go, fully set up, debug session in your web browser.
- Resolve bugs fast by tracing from symptom to root cause in one cycle: go back to any point in the execution history to inspect application state (including contents of all the variables and the heap) and see exactly what your software did.
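The record-then-replay idea behind the first two steps can be illustrated with a small Python sketch. It is an assumption-laden toy, not LiveRecorder’s mechanism: it captures a single source of non-determinism (random draws) in a log while a hypothetical flaky test runs, then feeds the same values back so the failing run repeats exactly. `Recorder` and `flaky_job` are illustrative names only.

```python
import random
from typing import Optional


class Recorder:
    """Toy record/replay layer for one source of non-determinism: random draws.

    Record mode: each value comes from the real RNG and is appended to the log.
    Replay mode: values are served from a previously captured log instead.
    """

    def __init__(self, log: Optional[list[int]] = None) -> None:
        self.replaying = log is not None
        self.log: list[int] = list(log) if log is not None else []
        self.pos = 0  # next log entry to serve during replay

    def next_int(self, rng: random.Random, lo: int, hi: int) -> int:
        if self.replaying:
            value = self.log[self.pos]
            self.pos += 1
            return value
        value = rng.randint(lo, hi)
        self.log.append(value)
        return value


def flaky_job(rec: Recorder, rng: random.Random) -> bool:
    """Hypothetical test body: passes unless any of five draws in 0..9 is 0."""
    for _ in range(5):
        if rec.next_int(rng, 0, 9) == 0:
            return False  # the intermittent failure
    return True


# Record: rerun the flaky test until it fails, keeping the failing run's log.
rng = random.Random()
while True:
    rec = Recorder()
    if not flaky_job(rec, rng):
        failing_log = rec.log
        break

# Replay: the captured log reproduces the failure on every run,
# regardless of what a fresh, unseeded RNG would have produced.
for _ in range(3):
    assert flaky_job(Recorder(failing_log), random.Random()) is False
print("recorded draws:", failing_log)
```

Only the non-deterministic inputs are logged; everything else is recomputed during replay, which keeps the log small while still making the failure repeatable.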
LiveRecorder boosts developer productivity
LiveRecorder fits into your existing development workflow: no software to install and no setup required.
Once inside a recording, whether in VS Code or at the command line, LiveRecorder provides the full functionality expected of a modern debugger, such as scripting, conditional breakpoints and watchpoints, and full inspection of globals and locals.
It also allows these features to be used with the program running in reverse or forward.