Using a reversible debugger to recover from stack corruption
Debugging, Reversible debugging, Stack corruption, UndoDB

If a program corrupts its own stack badly enough to overwrite a saved return address, the program counter register is loaded with garbage when the function returns. A conventional debugger can do very little at that point: without a valid program counter it cannot work out which function the program was running, so it cannot interpret the stack or show where the code was immediately before the corruption.

With a reversible debugger, however, recovery is almost comically simple. Run:

reverse-stepi

to rewind one instruction. The program state moves back to the instruction that corrupted the program counter, so you can see what has gone wrong, and the debugger once again knows which function was running and can interpret and display the stack. From there you can step further back through the recorded execution until you reach the code that caused the damage.

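For example, in this program, the array b holds a single int, but the function foo clears 100 bytes starting at b. The zeros run past the array and over the rest of foo's stack frame, including the saved return address, so when foo attempts to return, the program counter register is set to zero.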

#include <strings.h>

static void
foo( void)
{
    int b[1];
    bzero( b, 100); /* Overwrite our own stack. */
    return;
}

int
main( void)
{
    foo();
    return 0;
}

The program crashes when run:

> gcc -g foo.c
> ./a.out
Segmentation fault (core dumped)
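
To get a feel for why 100 bytes is more than enough to reach the return address, one option is to measure how far a local array sits from the function's frame address. The sketch below is not part of foo.c; the helper names are made up for illustration, and it assumes GCC or Clang, since `__builtin_frame_address` is a compiler extension. The exact distance depends on the compiler, optimisation level and architecture. On a typical unoptimised x86-64 build the saved frame pointer and return address sit at and just above the frame address, only a few bytes past the array.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of foo.c: returns the distance in bytes
 * from a local array to this function's frame address.
 * __builtin_frame_address is a GCC/Clang extension.
 */
static long
local_to_frame_distance( void)
{
    volatile int b[1] = { 0 };
    intptr_t local = (intptr_t) &b[0];
    intptr_t frame = (intptr_t) __builtin_frame_address( 0);
    return (long) ( frame - local);
}

/* Hypothetical helper: prints the distance next to the 100 bytes foo() clears. */
static void
print_frame_distance( void)
{
    printf( "local array to frame address: %ld bytes (foo clears 100)\n",
            local_to_frame_distance());
}
```

Calling print_frame_distance() from main should print a distance far smaller than 100 on most platforms, which is why the bzero call in foo reaches the return address.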

Looking at the core file with gdb doesn’t give us much information because the program counter register has been trashed, so there’s no usable backtrace:

> gdb -q a.out core
Reading symbols from .../a.out...done.
[New LWP 30704]

warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff8f3ee000
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000000000 in ?? ()
(gdb) backtrace
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) info reg pc
pc 0x0 0x0
(gdb)

Running under gdb doesn’t help either – there’s still no backtrace after the program crashes:

> gdb -q a.out
Reading symbols from .../a.out...done.
(gdb) run
Starting program: .../a.out
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) backtrace
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) info reg pc
pc 0x0 0x0
(gdb)

With UndoDB, however, we can quickly figure out what has gone wrong. The crash itself looks the same at first:

> undodb-gdb -q a.out
undodb-gdb: UndoDB reversible debugging system. Copyright 2006-2014 Undo Ltd.
undodb-gdb: undodb-4.0.3363
undodb-gdb: Licensed to: test user <support@undo-software.com>
undodb-gdb: By running this software you agree to the terms in:
undodb-gdb: /usr/local/lib/undodb-4.0.3363/demo_license.html
Reading symbols from .../a.out...done.
(undodb-gdb) r
undodb-gdb: debug-server pid 31546, port 34773
Starting program: .../a.out
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff2a1fe000
undodb: license type: UndoDB version 4.0, demo, arm, user: test user
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(undodb-gdb) backtrace
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
(undodb-gdb) info reg pc
pc 0x0 0x0
(undodb-gdb)

Do the reverse-stepi trick to recover from the program counter corruption:

(undodb-gdb) reverse-stepi
0x0000000000400518 9 }
(undodb-gdb) backtrace
#0 0x0000000000400518 in foo () at foo.c:9
#1 0x0000000000000000 in ?? ()
(undodb-gdb)

Now that we know where we are, we can step back and figure out what went wrong:

(undodb-gdb) reverse-next
8           return;
(undodb-gdb) backtrace
#0 foo () at foo.c:8
#1 0x0000000000000000 in ?? ()
(undodb-gdb) reverse-next
7           bzero( b, 100);
(undodb-gdb) backtrace
#0 foo () at foo.c:7
#1 0x0000000000400522 in main () at foo.c:14
(undodb-gdb)

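The backtraces show the moment things broke. Just before the bzero call on line 7 runs, frame #1 is still main; once it has run, the caller's frame reads as zero. The bzero call is therefore what overwrote foo's return address, and UndoDB has enabled us to pin that down in seconds.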
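
Once the offending line is known, the fix is to clear only the bytes that b actually owns. The version below is a minimal sketch rather than code from the original program: foo_fixed is a hypothetical name, it uses memset in place of bzero, and it returns b[0] purely so that the result can be checked.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical corrected version of foo(), not from the original program:
 * clear only the bytes that b actually owns.
 */
static int
foo_fixed( void)
{
    int b[1];
    memset( b, 0, sizeof b); /* sizeof b is sizeof( int), not 100. */
    assert( b[0] == 0);      /* The array itself has been zeroed. */
    return b[0];
}
```

Using sizeof b instead of a hard-coded length keeps the size correct if the declaration of b changes later.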
