This is a talk that Dewang Li (Software Architect at Synopsys) and I gave at CppCon earlier this year (2019) on how some of the seemingly magical modern Linux C++ tools actually work, so that you can make the most of them. In the talk, we give an overview of the kinds of debugging tools available, explain how they work, and give debugging tips.
I've pulled out some particularly interesting and useful points below, but do watch the video to get the juicy details.
Types of debugging tools
There are 4 main categories of debugging tools which we cover in the talk: the debugger, record and replay, dynamic checkers, and static analysis.
The debugger is something like GDB which can pause time during execution to allow you to look around and inspect what's going on inside your program so you can find out what it's doing.
The record and replay tools such as rr and LiveRecorder help you answer: what did my program do? There's a bit of overlap between these tools and the debugger; but unlike the debugger, you can travel backwards and forwards through time to see how your program arrived where it did.
Dynamic checkers focus on finding out whether a certain class of thing happened. Buffer overrun detection is the canonical example, but there are others, such as race condition checkers, CPU cache checkers and heap analyzers.
Lastly, the static checkers help you investigate whether a given problem could occur. Dewang goes into detail on this towards the end of the talk.
How debuggers work
Debuggers can seem like magic, but it's well worth understanding what's going on under the hood. What I cover in the talk is about GDB, but any Linux debugger works in much the same way.
When debugging, there are three parts to consider: GDB, the kernel, and the program you are debugging (which GDB calls the "inferior"). GDB talks to the program being debugged over a kernel API called ptrace, and asynchronous notifications go back to the debugger over signals. So you have this 2-way interaction between GDB and the program via ptrace and signals.
How a program handles a signal depends on what the signal is and how the program has been configured. One of the possible outcomes is a tracing stop, which GDB uses to control the program.
Suppose a hello world program is being debugged and receives a signal such as SIGALRM. Instead of handling the signal, it enters a stopped state and the debugger is notified. The debugger can then let the program continue with PTRACE_CONT. The program runs as normal until it receives another signal, at which point it stops in a tracing stop again and the debugger is notified again.
Signals only reach the tracee if they are passed in via PTRACE_CONT. What's useful to know is that this is how breakpoints happen: when your target process hits a breakpoint, this is actually just a signal - SIGTRAP.
To see how GDB will behave towards each type of signal, run the following in GDB:
(gdb) info signals
This lists each signal along with whether it will stop the process, whether it will be printed, and whether it will be passed to the program being debugged. For example, CTRL-C would normally kill your program because it sends it a SIGINT, but by default GDB does not pass this signal on to the program.
DWARF, which stands for Debugging With Attributed Record Formats, was created at the same time as ELF (Executable and Linkable Format), so its name is as much a pun as it is an acronym.
The DWARF information contains the detailed description of your program which the debugger needs to allow you to debug your program. For instance, some of the simplest information maps a program counter to a source line; so when your program has stopped at, for example instruction address 0x1234, the debugger can look at the DWARF information and see that it corresponds to foo.c line 42.
DWARF contains far more than just this: it contains information on types, functions, classes, templates, macros and more. If you're interested in this, I have previously covered it in another gdbWatchPoint post.
We've all been debugging, tried to print out a variable, and been told that it's optimized out:
(gdb) print foo
$1 = <optimized out>
This is annoying and quite misleading. Let's see what's really going on. Suppose you have a program:
$ cat optimized.c
int foo = rand();
printf("foo is %d\n", foo++);
... which you compile and debug:
$ gcc -O3 -g3 optimized.c
In GDB you then start debugging and print a variable foo, which you know appears in the code:
7 int foo = rand();
(gdb) print foo
$1 = <optimized out>
This suggests to me that the value just doesn't exist anymore because the compiler has been able to get rid of foo completely. I find this a really unhelpful error message because that isn't what it means. What it actually means is that the variable isn't yet live.
Knowing this, if I step to the next line and do print foo again, then I'm able to see the value:
8 printf("foo is %d\n", foo++);
(gdb) print foo
$2 = 1804289384
In the CppCon video above, I show how you can use the readelf utility to see where the variable is live and why the variable hasn't really been optimized out. You can also follow a similar example in another gdbWatchPoint video.
Static analyzers allow you to find issues which might arise in a program without having to run the code. Coverity is a tool which provides static analysis for C++; it provides a nice complement to GDB, LiveRecorder and the other tools discussed in the talk.
It shows you what conditions would cause your code to have an issue. It does this by taking the raw source code, compiling it and generating semantic representations, which it then uses to find command injection, crashes, resource leaks and other vulnerabilities.
The result is presented as an annotation of the code which describes the steps that will lead to whatever problem Coverity has found. This can be delivered on the command line, in an IDE or into code review tools.
In the talk, Dewang gives a demo of Coverity showing how to find issues which might only occur in very specific cases.