
UndoDB use cases: intermittent bugs, UndoDB reversible debugging and GDB scripting

One use case where reversible debugging is particularly useful is in finding intermittent bugs – where you may have to run a program hundreds or thousands of times before something goes wrong.

Conventional debuggers such as GDB don’t handle this sort of bug very well. If you step past something by mistake, or want to use a different watchpoint or breakpoint, you have to re-run the program many times before you capture the bug again with the new watchpoint or breakpoint active. Worse, if the new breakpoint or watchpoint halts execution earlier than before, it may be difficult even to tell whether the bug has actually struck.

A reversible debugger like UndoDB helps to avoid these sorts of problems. Once you have captured a bug in UndoDB, you can go backwards and forwards as much as you like, setting different breakpoints and watchpoints as required. You never have to re-run the program multiple times in order to re-capture a bug.

You still have to capture the bug for the first time in UndoDB. You can use GDB’s Python scripting to help with this, by repeatedly running the program with the bug (the debuggee) until the bug strikes.

For example, if a debuggee usually exits successfully but occasionally an intermittent bug causes it to exit with a non-zero return code, you can repeatedly run it until the bug has been captured using a small Python script:

File repeat_until_non_zero_exit.py:

'''
Repeatedly run the debuggee until it exits with a non-zero exit code.
'''
import gdb

while True:
    gdb.execute('run')
    # GDB sets $_exitcode when the debuggee terminates normally.
    e = gdb.parse_and_eval('$_exitcode')
    print('$_exitcode is: %s' % e)
    if e != 0:
        break

Then at the undodb-gdb prompt, do:

(undodb-gdb) source repeat_until_non_zero_exit.py

This will repeatedly run the debuggee until it exits with a non-zero return code, whereupon you can use UndoDB to explore the execution of the debuggee and figure out what went wrong.
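
If the bug never strikes, the loop above runs forever. A minimal sketch of a bounded variant follows. The names `run_until_failure`, `gdb_run_once` and `max_runs` are hypothetical and not part of GDB or UndoDB, and the run step is passed in as a callable so that the loop logic can be tried outside the debugger:

```python
def run_until_failure(run_once, max_runs=1000):
    """Call run_once() until it returns a non-zero exit code.

    run_once is a hypothetical callable that runs the debuggee once and
    returns its exit code as an int. Returns (attempt, exit_code) for the
    first failing run, or None if all max_runs runs succeeded.
    """
    for attempt in range(1, max_runs + 1):
        exit_code = run_once()
        if exit_code != 0:
            return attempt, exit_code
    return None


def gdb_run_once():
    """Run the debuggee once under GDB and return its exit code.

    Only usable inside GDB or undodb-gdb, where the gdb module exists.
    Assumes the debuggee exits normally, so that $_exitcode is set.
    """
    import gdb  # imported here so run_until_failure works outside GDB
    gdb.execute('run')
    return int(gdb.parse_and_eval('$_exitcode'))
```

Inside a sourced script, `run_until_failure(gdb_run_once, max_runs=500)` would then stop either at the first failing run or after 500 successful ones.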

Similarly, if an intermittent bug results in a breakpoint being hit rather than the debuggee exiting, you can capture the bug using a script like this:

File repeat_until_breakpoint.py:

'''
Repeatedly run debuggee until we hit a breakpoint.
'''
import gdb

# Events delivered by GDB during the current run.
events = []

def event_handler(event):
    events.append(event)

gdb.events.exited.connect(event_handler)
gdb.events.stop.connect(event_handler)

while True:
    events = []
    gdb.execute('run')
    # Look for a breakpoint hit among the events from this run.
    hit = None
    for event in events:
        if isinstance(event, gdb.BreakpointEvent):
            hit = event
            break
    if hit:
        numbers = ', '.join(str(b.number) for b in hit.breakpoints)
        print('have hit breakpoint: %s' % numbers)
        break

Then at the undodb-gdb prompt, do:

(undodb-gdb) source repeat_until_breakpoint.py
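
Note that a script of this kind leaves its handlers connected for the rest of the session, and sourcing it again connects a second copy. One way to tidy this up is a context manager that disconnects the handler on exit. This is a minimal sketch, not part of GDB or UndoDB: `collect_events` is a hypothetical name, and the only assumption about each registry is that it offers `connect` and `disconnect` methods, as `gdb.events.stop` and `gdb.events.exited` do:

```python
from contextlib import contextmanager


@contextmanager
def collect_events(*registries):
    """Collect events from the given registries while the block runs.

    Each registry must provide connect(handler) and disconnect(handler).
    The handler is disconnected again even if the block raises.
    """
    events = []

    def handler(event):
        events.append(event)

    for registry in registries:
        registry.connect(handler)
    try:
        yield events
    finally:
        for registry in registries:
            registry.disconnect(handler)
```

Inside GDB this could be used as `with collect_events(gdb.events.stop, gdb.events.exited) as events: gdb.execute('run')`, after which `events` holds the events from that single run.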

Finally, if an intermittent bug results in a signal being delivered rather than the debuggee exiting, you can capture the bug using a script like this:

File repeat_until_signal.py:

'''
Repeatedly run debuggee until it receives SIGSEGV.
'''
import gdb

# Events delivered by GDB during the current run.
events = []

def event_handler(event):
    events.append(event)

gdb.events.exited.connect(event_handler)
gdb.events.stop.connect(event_handler)

while True:
    events = []
    gdb.execute('run')
    # Look for a SIGSEGV stop among the events from this run.
    signal_event = None
    for event in events:
        if (isinstance(event, gdb.SignalEvent)
                and event.stop_signal == 'SIGSEGV'):
            signal_event = event
            break
    if signal_event:
        print('have received signal: %s' % signal_event.stop_signal)
        break

Then at the undodb-gdb prompt, do:

(undodb-gdb) source repeat_until_signal.py
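
The script above stops only on SIGSEGV. To stop on any of several signals, the check can be pulled out into a small helper. This is a minimal sketch: `first_crash_signal` and the default set `CRASH_SIGNALS` are hypothetical, and the helper uses duck typing (any event with a `stop_signal` attribute) rather than `isinstance(event, gdb.SignalEvent)` so that it can be tried outside GDB:

```python
# Hypothetical default: signals that usually indicate a crash.
CRASH_SIGNALS = frozenset(['SIGSEGV', 'SIGBUS', 'SIGABRT', 'SIGFPE', 'SIGILL'])


def first_crash_signal(events, names=CRASH_SIGNALS):
    """Return the first event whose stop_signal is in names, else None."""
    for event in events:
        # Events without a stop_signal attribute (exits, breakpoint
        # stops) yield None here and are skipped.
        if getattr(event, 'stop_signal', None) in names:
            return event
    return None
```

In the loop, `signal_event = first_crash_signal(events)` would replace the inner `for` loop.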

