Tutorial: Getting Started with UDB
Overview
In this tutorial, we will walk you through how to use UDB and time travel debugging to identify the root cause of a small bug.
You will:
- get familiar with using UDB
- learn the basics of time travelling backward and forward through code in order to inspect and easily understand program state
The principles learned by following this guide can be used on more complex code.
Unpack the Tar file
If you have just downloaded UDB, you will need to install it to get going.
If you have already done this, you can skip this step.
It’s easy to install UDB. Here’s what you need to do:
- Open a terminal
- Unpack the .tgz file that you just downloaded with the following command:
tar -xzf UDB-Individual-Evaluation-<version>.tgz
- This will create a folder named
UDB-Individual-Evaluation-<version>
Build & run the example program
We will use the sample program cache.c (cache calculate) that can be found in the examples directory of the UDB-Individual-Evaluation-<version> folder you just created.
This sample program maintains a square root cache data structure in memory and validates it through repeatedly looking up values, caching additional new values on a cache miss.
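To make that description concrete, here is a minimal sketch of a square root cache of this general kind. It is not the code in cache.c: the names sketch_entry, sketch_cache, isqrt_slow and sketch_lookup, the int fields, the cache size and the round-robin replacement policy are all assumptions made purely for illustration.

```c
#include <stdbool.h>

#define SKETCH_CACHE_SIZE 100 /* assumed size, for illustration only */

/* One cached (number, square root) pair. Hypothetical layout. */
struct sketch_entry {
    bool valid;
    int number;
    int sqroot;
};

static struct sketch_entry sketch_cache[SKETCH_CACHE_SIZE];
static int sketch_next_slot; /* next slot to overwrite on a cache miss */

/* Integer square root by linear search: the largest r with r*r <= n.
 * Returns 0 for any n <= 0. */
int isqrt_slow(int n)
{
    int r = 0;
    while ((long long)(r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* Return the integer square root of n, consulting the cache first. */
int sketch_lookup(int n)
{
    for (int i = 0; i < SKETCH_CACHE_SIZE; i++) {
        if (sketch_cache[i].valid && sketch_cache[i].number == n)
            return sketch_cache[i].sqroot; /* cache hit */
    }

    /* Cache miss: compute the value and store it, round-robin. */
    int root = isqrt_slow(n);
    sketch_cache[sketch_next_slot] = (struct sketch_entry){ true, n, root };
    sketch_next_slot = (sketch_next_slot + 1) % SKETCH_CACHE_SIZE;
    return root;
}
```

Calling sketch_lookup(255) twice computes the value on the first call and returns the stored value on the second. A validation loop like the one the tutorial describes would compare each looked-up value against a freshly computed one and abort on a mismatch.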
- Change directory to the examples:
cd UDB-<TAB>/examples
- Feel free to open cache.c in a text editor so you can follow along.
- Build the cache example:
make cache
- Run the program to see where it fails:
./cache
The sample program crashes with an error message – the number that the program is pulling out of the cache is not the expected number.
Run UDB & diagnose the problem
- Let’s open the program with the UDB debugger to analyze the program execution and diagnose the reason for this failure.
../udb cache
Press Enter to page through the license and y to accept.
- Next, let’s run the application, so type
run
The application runs to the point where it crashes.
- Let’s now examine the call stack to see a summary of how our program got to where it is. Type
backtrace
- Because UDB is a time travel debugger, we can run the execution of the program in reverse. Use the reverse-finish command twice to reverse up the stack to the abort() statement in main() at cache.c line 85.
Note: in UDB, as in GDB, pressing Enter on an empty line repeats the previous command.
- It will be best to switch into TUI (Text User Interface) mode now to see what is going on in the source code more easily. You can do this by pressing Ctrl+X and then A.
- With a time travel debugger, you can go back to any line of code that executed and see the complete program state. So type
info locals
to see the state of the variables at this point.
We can see that the integer square root of 255 is 15 (in sqroot_correct). But sqroot_cache is 0, which is the wrong value.
This is the point where the defect manifests, but it’s not the root cause of the defect. We need to find the point where the cache is populated with the incorrect value.
- Line 78 is where the sqroot_correct variable is set.
The reverse-next command executes the program backwards to the previous source line in the current function, stepping back over any function calls. So use the reverse-next command 3 times to go back in time to line 78.
- The previous line is where the sqroot_cache variable is set to its incorrect value.
The reverse-step command executes the program backwards until it reaches a different source line. So use the reverse-step command once to step back into the cache_calculate() function and again to go back to line 39, where it returns this incorrect value.
- Type
print g_cache[i]
We see that the square root stored in the cache for 255 is 0, which is incorrect.
- Now we need to find out where this cache entry was populated with this incorrect value. We can do this by setting a watchpoint (a.k.a. a data breakpoint) on the incorrect entry in the cache and running back in time to where it was set.
First set the watchpoint:
watch -l g_cache[i].sqroot
Then type
reverse-continue
to run backwards in time to see where this value was written.
UDB stops at the point where the incorrect value was written to g_cache[i].sqroot.
- Type
info locals
again to see the state of the variables at this point.
Here we see that sqroot_adj is a very large negative number. But when stored into g_cache[i].sqroot, it is being stored as 0.
This suggests a type casting error. We need to investigate why the large negative number is being stored as 0.
- Let’s print the type of the variables.
ptype sqroot_adj
ptype g_cache
The output shows that g_cache is an array made up of 100 pairs of unsigned char. But unsigned char can only hold values 0 to 255, so when sqroot_adj is stored in the cache, its value is truncated.
- Now we know how we got the zero. Similarly, when -1 is cast to an unsigned char, it becomes 255. But where did the -2,147,483,648 actually come from?
, it becomes 255, but where did the -2,147,483,648 actually come from? - Line 48 shows that
sqroot_adj
is set tosqrt(number_adj)
, statically cast to an integer assqrt()
returns a double.
print sqrt(number_adj)
- You’ve discovered that the root cause of this application failure is an attempt to put the square root of -1 into the cache, which was not intended. That happens because the for loop on line 46 loops from number-1 to number+1, but nothing handles the special case we just hit, where number is zero. A simple
if (number_adj < 0) continue;
at the start of the for loop would have avoided this error.
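The two mechanical parts of this diagnosis can be reproduced in isolation. The sketch below is not taken from cache.c; the helper names store_as_byte and count_cacheable_neighbours are hypothetical. It shows how storing an out-of-range int in an unsigned char wraps the value, and how the suggested guard skips the negative neighbour. It deliberately leaves out the sqrt() call itself: the square root of a negative number is NaN on IEEE 754 systems, and converting NaN to int is undefined behaviour in C. The -2,147,483,648 seen in the debugger is the result commonly produced on x86-64, which should be treated as platform-dependent.

```c
/* Storing an int in an unsigned char keeps only the value modulo 256
 * (assuming 8-bit chars). This conversion is well defined in C. */
unsigned char store_as_byte(int value)
{
    return (unsigned char)value;
}

/*
 * Visit number-1, number and number+1, as the tutorial says the loop
 * does, apply the suggested guard, and count how many values would
 * actually be cached.
 */
int count_cacheable_neighbours(int number)
{
    int cached = 0;

    for (int number_adj = number - 1; number_adj <= number + 1; number_adj++) {
        if (number_adj < 0)
            continue; /* guard: a negative number has no real square root */
        cached++;
    }
    return cached;
}
```

With these helpers, store_as_byte(-1) gives 255 and storing the most negative 32-bit int gives 0, matching the number and sqroot fields seen in the bad cache entry. count_cacheable_neighbours(0) gives 2 rather than 3, because the guard skips -1.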
Tutorial Complete
Congratulations! You successfully used time travel debugging to diagnose the root cause of the error in no time!
You’re now a time travelling Bug Hunter!
Next steps
- Try it on your own code (see Docs)
Help and Support
If you get stuck, help is always at hand.