Technical Paper: Time Travel Debugging

Understand complex code and fix bugs faster

Overview

Over the last decade, software development has become mind-blowingly complex – as have the challenges related to finding and fixing bugs.

Two problems make debugging increasingly difficult:

  1. Complexity hinders developers’ ability to understand what the code is doing and to fix bugs efficiently. Complex architectures and large teams mean that developers often have to troubleshoot bugs that have little or nothing to do with the code they wrote.
  2. 91% of developers admit to having unresolved software defects because they cannot reproduce them (see: an analyst’s study on software reliability).

This is an excerpt from our technical paper "Time Travel Debugging".


Traditional methods of debugging and analyzing code are no longer sufficient for the challenges of understanding complex codebases and debugging modern applications.

This technical paper explores what every software engineer working on complex codebases (e.g. multithreaded or multiprocess programs) should know about time travel debugging.

It goes on to outline how upgrading to time travel debugging (TTD) can save you time debugging – enabling you to get changes into the pipeline faster, complete your code deliverables on time, and resolve customer-reported defects in hours, not weeks.

Debugging is a costly productivity killer

As the world becomes increasingly dependent on software, finding and fixing software failures in complex systems has moved from being an inconvenience to a major problem. Delays in shipping code because of a growing backlog of bugs push back product releases and negatively impact engineering productivity and innovation.

[Figure: 25–50%]

While tools exist to prevent bugs when coding, there has been little innovation in tools that help with debugging once these bugs surface.

Beyond the productivity cost of excessive time spent debugging, software errors that cause failures can also cost businesses in other ways.

So how are developers currently going about the process of debugging?

Traditional debugging options

Developers currently have a range of traditional options and approaches for diagnosing errors in a codebase.

[Figure: traditional debugging options]

Why it’s time to transform the way we debug

Traditional debuggers can be painful and inefficient

The biggest inefficiency in traditional debugging is that you can only work forward toward the crash or error. This is slow: it forces you to restart the application repeatedly and work through a complex series of steps to get back to the point of failure.

The time cost of debugging this way can be high: a lot of reproducing the issue over and over again, a lot of stepping in, and a lot of “Oops, I’ve stepped over too far. Now I have to restart.”

Working this way is time-consuming and inefficient, just when a developer is under pressure to fix a bug as quickly as possible.

Time travel debugging is a game changer

Time travel debugging (TTD) is the ability to wind back the clock to any point in an application’s execution and see exactly what it was doing. Integral to TTD is the ability to debug in reverse through the program’s execution history.

This transformative debugging capability allows developers to observe and understand the precise conditions that led to a specific bug. By letting them rewind the program’s execution directly back to the root cause, TTD accelerates finding and fixing bugs. It is also a powerful way for developers to learn about and navigate an unfamiliar codebase.
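As a rough mental model of “winding back the clock”, the toy Python sketch below records a full copy of program state after every step, so any earlier point can be inspected after the fact. The `Recorder` class and its methods are hypothetical names for illustration; this is not how LiveRecorder works internally.

```python
import copy
from typing import Any, Dict, List


class Recorder:
    """Toy execution history: keeps a full copy of program state after every step.

    Hypothetical illustration only. Many real record/replay tools instead log
    non-deterministic inputs and re-execute the program to reconstruct state.
    """

    def __init__(self, initial: Dict[str, Any]) -> None:
        self.history: List[Dict[str, Any]] = [copy.deepcopy(initial)]

    def step(self, **updates: Any) -> None:
        # Apply this step's updates to the latest state and append the result.
        state = copy.deepcopy(self.history[-1])
        state.update(updates)
        self.history.append(state)

    def state_at(self, step: int) -> Dict[str, Any]:
        # "Wind back the clock": return a copy of the state as it was at `step`.
        return copy.deepcopy(self.history[step])


rec = Recorder({"balance": 100})
rec.step(balance=150)
rec.step(balance=-20)                # the bad value first appears at step 2
rec.step(balance=-20, flagged=True)  # the symptom only shows up at step 3

print(rec.state_at(1))  # {'balance': 150}
print(rec.state_at(3))  # {'balance': -20, 'flagged': True}
```

Copying the whole state at every step keeps the toy simple, but it would not scale to a real process; it only illustrates that every past state stays reachable once execution has been recorded.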

Take, for example, tracking down corrupted memory. With time travel debugging, a developer can put a watchpoint (aka data breakpoint) on the variable that contains bad data and run backward to go straight to the line of code, and the thread, that most recently modified it.

This “direct to root cause” approach accelerates debugging by eliminating the need for trial and error and repeatedly restarting the program with different breakpoint locations. It reduces multiple iterations down to one loop.
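The watchpoint workflow can be modeled as a backward search over a recorded history of memory writes. In the hypothetical sketch below, `Write`, `reverse_watch`, and the log contents (thread names, line numbers, values) are invented for illustration; this is a model of the idea, not a debugger API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Write:
    step: int      # position in the recorded execution
    thread: str    # thread that performed the write
    line: int      # source line of the write (made-up numbers)
    var: str       # variable that was written
    value: object  # value stored


def reverse_watch(log: List[Write], var: str, from_step: int) -> Optional[Write]:
    """Toy reverse watchpoint: run backward from `from_step` and stop at the
    most recent write to `var`. Assumes `log` is ordered by step."""
    for w in reversed(log):
        if w.step <= from_step and w.var == var:
            return w
    return None


# A hypothetical recorded history in which `buf_len` gets corrupted.
log = [
    Write(1, "main", 10, "buf_len", 16),
    Write(2, "worker", 42, "count", 1),
    Write(3, "worker", 57, "buf_len", -1),  # the corrupting write
    Write(4, "main", 12, "count", 2),
]

# The symptom is observed at step 4; run backward to the last writer of buf_len.
print(reverse_watch(log, "buf_len", from_step=4))
# Write(step=3, thread='worker', line=57, var='buf_len', value=-1)
```

Because the search starts where the symptom was observed and stops at the first matching write, one backward run replaces repeated restarts with guessed breakpoint locations.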

Types of bugs that time travel debugging can resolve quicker

Time travel debugging is really powerful for:

• Any bug where time passes between the bug occurring and its symptoms appearing, e.g. assertion failures, segmentation faults, or simply bad results being produced.

• Any bug which occurs intermittently or sporadically – for example, a bug that occurs one time in a thousand, or occurs in a different way on each run of the program. Bugs tend to manifest in this way when the program’s execution is non-deterministic due to multithreading and/or interaction with other processes and services.
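A toy Python simulation (hypothetical, not taken from the paper) shows why scheduling non-determinism makes a bug intermittent: a classic lost-update race whose outcome depends on which interleaving the scheduler happens to pick. A seeded random choice stands in for the OS scheduler.

```python
import random


def run(seed: int) -> int:
    """Two toy threads each perform `counter += 1` as a separate read and write.

    A seeded random scheduler picks which thread runs next, standing in for
    non-deterministic OS scheduling. The correct final value is 2.
    """
    rng = random.Random(seed)
    counter = 0
    local = {"A": 0, "B": 0}  # the value each thread read
    pending = {"A": ["read", "write"], "B": ["read", "write"]}
    while any(pending.values()):
        thread = rng.choice([t for t, ops in pending.items() if ops])
        op = pending[thread].pop(0)
        if op == "read":
            local[thread] = counter
        else:
            counter = local[thread] + 1  # can overwrite the other thread's update
    return counter


results = [run(seed) for seed in range(1000)]
print(f"{results.count(1)} of {len(results)} runs lost an update")
```

In this toy the race window is deliberately wide, so about half the interleavings lose an update; in real programs the window is usually tiny, which is how the same kind of bug becomes a one-in-a-thousand failure.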

The following are some common scenarios that TTD can help resolve more quickly and more efficiently than other debugging methods.

  • Race conditions
  • Memory corruption
  • Segmentation faults (aka segfault)
  • Stack corruption
  • Memory leaks
  • Long run times
  • Frequently called functions
  • Dynamic code

The key benefits of using time travel debugging

  • Reveal the root cause of bugs that other debugging methods cannot reach
  • Boost developer productivity
  • Increase observability and understanding of unfamiliar code

Introducing LiveRecorder


LiveRecorder makes bugs 100% reproducible with time travel debugging.

LiveRecorder lets developers start debugging test failures instantly. It provides a one-click workflow from a test failure to a time-travel debugger placed exactly at the point of failure – skipping the tedious steps usually required to reproduce the problem.

Developers working on complex C/C++, Go, and Java software can now save a huge amount of time diagnosing the root causes of new regressions, legacy bugs, and flaky tests. Bugs that took days or weeks to isolate can now be resolved in hours.

Three-step method:
  1. Record CI/system test failures to capture the execution of failing test runs, including intermittent failures; store the recordings for later analysis and cross-team collaboration.
  2. Replay recordings: jump from the test failure (or bug report) straight into a ready-to-go, fully set-up debug session in your web browser.
  3. Resolve bugs fast by tracing from symptom to root cause in one cycle: go back to any point in the execution history to inspect application state (including contents of all the variables and the heap) and see exactly what your software did.
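The excerpt does not describe LiveRecorder's internal mechanism, so the sketch below is only a hypothetical toy model of the general record/replay idea behind the three steps: log every non-deterministic decision during a failing run, then feed those decisions back to reproduce the run exactly and inspect it. The race, `record`, `replay`, and the schedule format are all invented for illustration.

```python
import random
from typing import Callable, List, Optional, Tuple

Trace = List[Tuple[str, str, int]]  # (thread, operation, counter after the operation)


def run(pick: Callable[[List[str]], str], trace: Optional[Trace] = None) -> int:
    """Lost-update race between two toy threads; `pick` decides who runs next."""
    counter = 0
    local = {"A": 0, "B": 0}
    pending = {"A": ["read", "write"], "B": ["read", "write"]}
    while any(pending.values()):
        thread = pick([t for t, ops in pending.items() if ops])
        op = pending[thread].pop(0)
        if op == "read":
            local[thread] = counter
        else:
            counter = local[thread] + 1
        if trace is not None:
            trace.append((thread, op, counter))
    return counter


def record(seed: int) -> Tuple[int, List[str]]:
    """Step 1, Record: run with a random scheduler and log every choice it makes."""
    rng = random.Random(seed)
    schedule: List[str] = []

    def pick(runnable: List[str]) -> str:
        choice = rng.choice(runnable)
        schedule.append(choice)
        return choice

    return run(pick), schedule


def replay(schedule: List[str], trace: Trace) -> int:
    """Step 2, Replay: feed the logged choices back so the run repeats exactly."""
    choices = iter(schedule)
    return run(lambda _runnable: next(choices), trace)


# Find a failing "test run" (the correct result is 2) and keep its recording.
result, schedule = next(r for r in map(record, range(1000)) if r[0] != 2)

# Step 3, Resolve: replay as often as needed and inspect the full history.
trace: Trace = []
assert replay(schedule, trace) == result
for step in trace:
    print(step)
```

Replaying from the logged schedule yields the same interleaving and the same wrong result every time, which is what makes an intermittent failure debuggable after the fact.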

LiveRecorder boosts developer productivity

LiveRecorder fits in your existing development workflow – no software to install and no set-up required.

Once inside a recording, whether in the VS Code interface or on the command line, LiveRecorder provides the full functionality expected of a modern debugger, such as scripting, conditional breakpoints and watchpoints, and full inspection of globals and locals.

All of these features work with the program running either forward or in reverse.

Final Thoughts

Most organizations building complex software are still severely limited in their delivery velocity by the rate at which software bugs can be discovered and resolved.

Time travel debugging empowers developers to capture and inspect application state at any point during the execution of a failing process. By letting developers see exactly what happened, it saves a huge amount of time otherwise spent trying to reproduce the problem.

It shortens the traditional iterative “reproduce–guess–test–restart” debugging cycle down to one loop, leaving developers free to fast-track to the exact root cause and fix the problem.

Large development teams at companies including SAP, Siemens, Synopsys, Juniper Networks, and Palo Alto Networks are already using this technology and see time and cost savings that free up developers to code more productively and deliver new features. Time is of the essence while troubleshooting bugs, and every day that can be saved makes a big difference to getting to market faster.

The pressure to increase the speed of software development is growing relentlessly. Now is the time to switch to time travel debugging – a faster, more efficient method of debugging, fit for the 21st century. It’s time to Debug Different.