Diagnosing database application failures

Diagnosing database application failures

Data center

Building a database system is hard. Really hard. And here, we don’t mean creating a database of customers or invoices. We mean building the database management systems which power those databases. While much of the software world aims to make stateless, independent microservices, building a database is exactly the opposite: it’s all about storing, updating and persisting state.

Skimming the core developer forums or the git change logs for any open source database shows the kinds of challenges that occur when building database systems and reveals the approaches many developers take: from blind fixes which appear to stop the issues to tests which pass only sometimes and other software voodoo.

Race conditions are a good example of hard-to-reproduce bugs and, in database systems which have lots of concurrency and often a lack of access to production data, this is a good example of a very difficult bug. Database engineers are often without the information or environment to understand what the issue is.

As we’ve written before in our post about debugging race conditions in C/C++, the hard (but normal) way of solving race conditions boils down to identifying something unexpected and using that to work back to what happened which shouldn’t have. This can be painstaking.

It turns out that the fastest way to fix hard to reproduce bugs is not to try to reproduce them. If you don’t have access to live data, live systems or good bug reports, the best thing is just to know what happened. LiveRecorder provides the ability to replay a bug exactly as it occurred, and then the ability to step back through the code and find the offending line just as though you were debugging the program locally.

Fixing race conditions with LiveRecorder

In this example, we’ll show how LiveRecorder can run the program just as it would in the wild and create a recording of the execution for later replaying and debugging.

We’ll be using an example file (which is at the end of this post) which causes a race condition by having one thread unsafely changing a shared variable. This is a toy example, but the process of recording and debugging remains the same and to a large extent just as simple when scaled up.

The live-record command comes with LiveRecorder and can be used in a variety of ways, including compiling it with your program and attaching to running processes. In this example, we’re going to use LiveRecorder to run an existing program.

Once you have LiveRecorder installed, the live record command is used to execute the so-called ‘inferior’ program. The first argument sets where the recording will go while the second is the example program we’re going to debug:

./live-record -o ../recording.undo examples/race_x64
s_threadfn: it=0
s_threadfn: it=10000
s_threadfn: it=20000
...
race_x64: examples/race.cpp:35: void* s_threadfn(void*): Assertion `s_value == old_value + a' failed.
live-record: Saving to '/home/vagrant/install/undodb-4.7.13988/../recording.undo'...
live-record: Termination recording written to /home/vagrant/install/undodb-4.7.13988/../recording.undo

The program, `examples/race_x64` executes as it normally would and –  as intended for this program – it aborts. During its execution, LiveRecorder is capturing a recording of the program in the file `recording.undo` so that we can later replay it.

Why is this relevant? Imagine this program had been your database service which ships to hundreds or thousands of customers, or perhaps millions of devices. And imagine the abort wasn’t intended but was a bug happening apparently randomly in production but without any way of reproducing it. The power of LiveRecorder comes into play here: we can debug exactly what just happened in production.

To use the recording, we fire up UDB (formerly known as UndoDB) and pass the recording to it:

./udb recording.undo
udb: UndoDB reversible debugging system. Copyright 2006-2018 Undo Ltd.
...
udb: Have loaded Undo Recording:
udb:     recording.undo
udb: Note that the debuggee is currently at the beginning of
udb: the recording. You can use the continue command to udb: run to the end of the recording. 

This has just taken us back in time to the point when the program was about to be run. We can now start the program and move forward in time to reproduce the error:

(udb) continue
Continuing.
...
Program received signal SIGABRT, Aborted.
[Switching to Thread 8792.8818]
0x00007f2a477f7c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
udb: Have reached end of recorded history.
udb: The recorded program has exited.
udb: (You may use undodb commands to go backwards.)

In 2 steps, loading the file and issuing the continue command, we have reproduced the production error perfectly.

But we still don’t know what happened, so let’s debug the program. We first reverse-finish back to a point in our program we recognize:

(udb) rf
89    in abort.c
...
(udb) rf
35            assert( s_value == old_value + a);

Next, we see where we got to:

(udb) list
30            }
31            /* Increment <s_value>. Should be safe because we own s_mutex. */
32            int old_value = s_value;
33            int a = s_random_int( 5);
34            s_value += a;
35            assert( s_value == old_value + a);
36            pthread_mutex_unlock( &s_mutex);
37        }
38        return NULL;
39    }

We notice that the assertion on s_value causes the abort so we know the direct cause of the issue but not the root cause.

To find out how the program got into this error state we need to go back in time to find out which lines of code affected s_value. To do this we put a watch on s_value:

(udb) watch s_value
Hardware watchpoint 1: s_value

And then issue reverse-continue, which takes us back in time to the next point when s_value was changed:

(udb) rc
Continuing.
Hardware watchpoint 1: s_value

Old value = 236253
New value = 236250
0x0000000000400c73 in s_threadfn () at examples/race.cpp:34
34            s_value += a;
(udb) list
29                std::cout << __FUNCTION__ << ": it=" << it << "\n";
30            }
31            /* Increment <s_value>. Should be safe because we own s_mutex. */
32            int old_value = s_value;
33            int a = s_random_int( 5);
34            s_value += a;
35            assert( s_value == old_value + a);
36            pthread_mutex_unlock( &s_mutex);
37        }
38        return NULL;

This is just the line above, so we reverse-continue again:

(udb) rc
Continuing.
[New Thread 8792.8820]
[Switching to Thread 8792.8820]
Hardware watchpoint 1: s_value

Old value = 236250
New value = 236249
0x0000000000400b86 in s_threadfn2 () at examples/race.cpp:50
50            s_value += 1;   /* Unsafe. */

Now, obviously the “Unsafe” comment isn’t going to be in a real system but you get the idea: in just a few steps, we have arrived at the line which is causing the race condition:

(udb) list
45        {
46            if ( it % (100) == 0)
47            {
48                std::cout << __FUNCTION__ << ": it=" << it << "\n";
49            }
50            s_value += 1;   /* Unsafe. */
51            usleep(10*1000);
52        }
53        return NULL;
54    }

And there it is.

To replay what just happened: first, we ran a program, which could have been your database system in production. It hit a mysterious error and, without asking any users, customers or sysadmins, we took a recording and found the bug.

This shortcuts the entire process of getting a bug report, interpreting it and reproducing the error, none of which is required with LiveRecorder. We simply load a recording of what happened into UDB (formerly UndoDB) and replay the exact steps.

Database customers often don’t want to share example data or might not have the time to reproduce the error but they still want the bug fixed. For database vendors, this can be an impossible situation but using the right solution – LiveRecorder – can make it possible to find and fix production errors in race conditions and other hard to reproduce bugs.

If you’ve read our post on debugging race conditions with UDB, this follows exactly the same steps as debugging a program locally. For software engineers, this means that debugging an event from production doesn’t require a different environment or setting up dependencies.

This also allows the engineer to debug without having to set up precise OS config, hardware peripherals, other processes, or a database state: tasks which can take a huge amount of time before you’ve even got to the debugging step.

The code we used

#include <iostream>
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

static int
s_random_int( int max)
{
    return rand() / ( RAND_MAX + 1.0) * max;
}

static int  s_value = 0;

static pthread_mutex_t  s_mutex = PTHREAD_MUTEX_INITIALIZER;

static void*
s_threadfn( void*)
{
    for ( int it=0;; ++it)
    {
        pthread_mutex_lock( &s_mutex);
        if ( it % (10*1000) == 0)
        {
            std::cout << __FUNCTION__ << ": it=" << it << "\n";
        }
        /* Increment <s_value>. Should be safe because we own s_mutex. */
        int old_value = s_value;
        int a = s_random_int( 5);
        s_value += a;
        assert( s_value == old_value + a);
        pthread_mutex_unlock( &s_mutex);
    }
    return NULL;
}

static void*
s_threadfn2( void*)
{
    for ( int it=0;; ++it)
    {
        if ( it % (100) == 0)
        {
            std::cout << __FUNCTION__ << ": it=" << it << "\n";
        }
        s_value += 1;   /* Unsafe. */
        usleep(10*1000);
    }
    return NULL;
}

int
main()
{
    pthread_t   t1;
    pthread_t   t2;
    pthread_t   t3;
    pthread_create( &t1, NULL, s_threadfn, NULL);
    pthread_create( &t2, NULL, s_threadfn, NULL);
    pthread_create( &t3, NULL, s_threadfn2, NULL);

    pthread_join( t1, NULL);
    pthread_join( t2, NULL);
    pthread_join( t3, NULL);

    return 0;
}

Stay informed. Get the latest in your inbox.

Leave a Reply

Your email address will not be published.Required fields are marked *