Resources
Detecting, Managing, and Preventing Deadlocks in C/C++
Deadlocks are a common concurrency problem in C and C++ programming, occurring when multiple threads or processes are stuck waiting for resources held by one another, causing the system to freeze.
Detecting deadlocks is a critical step in building high-performance applications in a multithreaded environment. In this article, Ed Tait, Senior Software Engineer at Undo explores how to detect deadlocks, as well as manage and prevent them, in your C and C++ code.
Detecting Deadlocks
Symptomatic
The typical manifestation is a hang—the program ceases to produce output and becomes unresponsive to input. Often (though not always) examination with ps
will show no CPU time being consumed and the process and its threads in the S
(sleep) state. It is possible to have a deadlock which continues to consume CPU, if the synchronization primitives involved use ‘spinning’ instead of system calls.
Example
The following is a very simple example of a deadlock. You can build it with gcc -Og -ggdb -o deadlock deadlock.c -lpthread
. I’ll use this program to illustrate some of the debugging methods later in this post. This example is a case of the dining philosophers problem, where there are only two philosophers.
/** Simple program which can deadlock */ #include <stdio.h> #include <pthread.h> /* Two worker threads claim the same two locks, but * do so in a different order. This means we can end * up in a situation where each thread holds one of the * locks and is waiting for the lock held by the other. */ static void * s_worker_1(void *arg) { pthread_mutex_t *locks = arg; pthread_mutex_lock(&locks[0]); pthread_mutex_lock(&locks[1]); printf("Thread 1 - both locks locked\n"); pthread_mutex_unlock(&locks[1]); pthread_mutex_unlock(&locks[0]); } static void * s_worker_2(void *arg) { pthread_mutex_t *locks = arg; pthread_mutex_lock(&locks[1]); pthread_mutex_lock(&locks[0]); printf("Thread 2 - both locks locked\n"); pthread_mutex_unlock(&locks[0]); pthread_mutex_unlock(&locks[1]); } int main (void) { pthread_t t1, t2; pthread_mutex_t locks[2]; pthread_mutex_init(&locks[0], NULL); pthread_mutex_init(&locks[1], NULL); pthread_create(&t1, NULL, s_worker_1, locks); pthread_create(&t2, NULL, s_worker_2, locks); pthread_join(t1, NULL); pthread_join(t2, NULL); return 0; }
Even this simple program exhibits the key complication of multithreaded programming – it is non-deterministic: A single run of the program may complete successfully or it may deadlock. Run it in a loop using shell scripting: while true; do ./deadlock; done
. You should end up with output like this:
$ while true; do ./deadlock; done Thread 1 - both locks locked Thread 2 - both locks locked Thread 2 - both locks locked Thread 1 - both locks locked Thread 1 - both locks locked Thread 2 - both locks locked Thread 1 - both locks locked Thread 2 - both locks locked Thread 1 - both locks locked Thread 2 - both locks locked Thread 1 - both locks locked Thread 2 - both locks locked
Eventually, no more output will be produced – the program has deadlocked. You can kill it with Ctrl-C.
The situation when the deadlock happens is a little like this, with each thread owning one lock, but needing to own the other in order to make progress.
Investigation
Debugger
Attaching a debugger is probably the surest way to diagnose a deadlock. I’ll discuss GDB here, though other debuggers will support similar commands and workflows. Firstly, identify your process. I normally do ps a | grep deadlock
to find the process id. Once you’ve got this you can attach with gdb -p $PID
.
Note, you may need to adjust certain system settings to enable attaching to existing processes. See the documentation on yama for more information.
Get your bearings by issuing thread apply all bt
to see backtraces for all the threads (this can produce a lot of output if you’ve got many threads or they’re deep in nested calls. You can limit the number of frames in the backtrace by supplying a numeric argument to bt: thread apply all bt 3
gives the three deepest nested frames.
Armed withsome backtraces, you can get your sleuthing hat on. There are particular functions to look for in the backtrace. Names containing strings like: mutex_lock
or cond_wait
should arouse your suspicions.
$ gdb -p 17541 <snip startup messages> Attaching to process 17541 [New LWP 17542] [New LWP 17543] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".__pthread_clockjoin_ex (threadid=140737351599872, thread_return=0x0, clockid=<optimised out>, abstime=<optimised out>, block=<optimised out>) at pthread_join_common.c:145 145 pthread_join_common.c: No such file or directory. (gdb) thread apply all bt Thread 3 (Thread 0x7ffff7593700 (LWP 17543)): #0 __lll_lock_wait (futex=futex@entry=0x7fffffffdad0, private=0) at lowlevellock.c:52 #1 0x00007ffff7f950a3 in __GI___pthread_mutex_lock (mutex=0x7fffffffdad0) at ../nptl/pthread_mutex_lock.c:80 #2 0x000055555555522a in s_worker_2 (arg=0x7fffffffdad0) at deadlock.c:27 #3 0x00007ffff7f92609 in start_thread (arg=<optimised out>) at pthread_create.c:477 #4 0x00007ffff7eb7353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 2 (Thread 0x7ffff7d94700 (LWP 17542)): #0 __lll_lock_wait (futex=futex@entry=0x7fffffffdaf8, private=0) at lowlevellock.c:52 #1 0x00007ffff7f950a3 in __GI___pthread_mutex_lock (mutex=0x7fffffffdaf8) at ../nptl/pthread_mutex_lock.c:80 #2 0x000055555555526b in s_worker_1 (arg=0x7fffffffdad0) at deadlock.c:16 #3 0x00007ffff7f92609 in start_thread (arg=<optimised out>) at pthread_create.c:477 #4 0x00007ffff7eb7353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 1 (Thread 0x7ffff7d95740 (LWP 17541)): #0 __pthread_clockjoin_ex (threadid=140737351599872, thread_return=0x0, clockid=<optimised out>, abstime=<optimised out>, block=<optimised out>) at pthread_join_common.c:145 #1 0x0000555555555308 in main () at deadlock.c:42
There are three threads here – Thread 1 is the main thread. It’s inside __pthread_clockjoin_ex
, a function used to rendezvous with other threads. That’s what we’d expect as the main thread waits for the other two threads to complete. The other two threads are the workers. We can see that they are both attempting to claim a lock (___ll_lock_wait
).
Via the /proc filesystem
It may be possible to avoid using a debugger and instead directly check what system calls an application is blocked in. Certain syscalls are associated with deadlocks. The key one to look for is futex
. You can examine what system call a process is currently running by looking at /proc/$PID/syscall
. You can look at the current syscall for each thread using /proc/$PID/task/$TID/syscall
:
Analysis
Once you think you have a deadlock on your hands, you need to analyze how it came about. Establishing which locks each thread is waiting on is a good first step (this is a little easier than finding out which locks a thread holds). Assuming your locks are implemented using futex
(pthread’s locks are), you can check the first argument in /proc/$PID/syscall
. Alternatively, if you’ve attached a debugger, then the arguments in the back trace will give you what you want.
Example – Inspecting pthread locks using a debugger
The pthread_mutex_t
structure is intended to be opaque (so it can be changed as the threading library is updated); but for any specific version it is possible to extract useful information. Of particular interest is the ‘owner’ field. Once a deadlock has occurred you can inspect this field in a debugger to work out which threads own which locks.
(gdb) thread 2 # Switch to one of the deadlocked threads [Switching to thread 2 (Thread 0x7ffff7d94700 (LWP 17542))] #0 __lll_lock_wait (futex=futex@entry=0x7fffffffdaf8, private=0) at lowlevellock.c:52 52 lowlevellock.c: No such file or directory. (gdb) backtrace #0 __lll_lock_wait (futex=futex@entry=0x7fffffffdaf8, private=0) at lowlevellock.c:52 #1 0x00007ffff7f950a3 in __GI___pthread_mutex_lock (mutex=0x7fffffffdaf8) at ../nptl/pthread_mutex_lock.c:80 #2 0x000055555555526b in s_worker_1 (arg=0x7fffffffdad0) at deadlock.c:16 #3 0x00007ffff7f92609 in start_thread (arg=<optimised out>) at pthread_create.c:477 #4 0x00007ffff7eb7353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) up # select the frame where 'mutex' is in scope #1 0x00007ffff7f950a3 in __GI___pthread_mutex_lock (mutex=0x7fffffffdaf8) at ../nptl/pthread_mutex_lock.c:80 80 ../nptl/pthread_mutex_lock.c: No such file or directory. (gdb) print *mutex $1 = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 17543, Robust = No, Shared = No, Protocol = None}
Here we can see that the lock thread 2 (with id 17542) is trying to claim. It is currently owned by the thread with id 17543. Consulting info threads
tells us that this is thread 3:
(gdb) info threads Id Target Id Frame 1 Thread 0x7ffff7d95740 (LWP 17541) "deadlock" __pthread_clockjoin_ex (threadid=140737351599872, thread_return=0x0, clockid=<optimised out>, abstime=<optimised out>, block=<optimised out>) at pthread_join_common.c:145 * 2 Thread 0x7ffff7d94700 (LWP 17542) "deadlock" __lll_lock_wait (futex=futex@entry=0x7fffffffdaf8, private=0) at lowlevellock.c:52 3 Thread 0x7ffff7593700 (LWP 17543) "deadlock" __lll_lock_wait (futex=futex@entry=0x7fffffffdad0, private=0) at lowlevellock.c:52
We can switch to thread 3 and see that thread is trying to claim a lock owned by thread 2:
(gdb) thread 3 [Switching to thread 3 (Thread 0x7ffff7593700 (LWP 17543))] #0 __lll_lock_wait (futex=futex@entry=0x7fffffffdad0, private=0) at lowlevellock.c:52 52 lowlevellock.c: No such file or directory. (gdb) backtrace #0 __lll_lock_wait (futex=futex@entry=0x7fffffffdad0, private=0) at lowlevellock.c:52 #1 0x00007ffff7f950a3 in __GI___pthread_mutex_lock (mutex=0x7fffffffdad0) at ../nptl/pthread_mutex_lock.c:80 #2 0x000055555555522a in s_worker_2 (arg=0x7fffffffdad0) at deadlock.c:27 #3 0x00007ffff7f92609 in start_thread (arg=<optimised out>) at pthread_create.c:477 #4 0x00007ffff7eb7353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) up #1 0x00007ffff7f950a3 in __GI___pthread_mutex_lock (mutex=0x7fffffffdad0) at ../nptl/pthread_mutex_lock.c:80 80 ../nptl/pthread_mutex_lock.c: No such file or directory. (gdb) print *mutex $2 = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 17542, Robust = No, Shared = No, Protocol = None}
Prevention
Enforce order of locking / unlocking deadlocks can be avoided by ensuring that locks are always claimed in a particular order. This can become complicated in large codebases, and is prone to apparently benign code changes introducing lock order violations. Helgrind can help you here, by detecting ordering violations. You can read more about Valgrind (the suite that contains the Helgrind tool).
In certain circumstances, it may be possible to avoid deadlocks by using a trylock call; instead of blocking, this will give up and return an error code if the target lock has already been claimed. This is useful in cases where the task that requires the lock is optional, or can be retried later.
Maybe you don’t need locks at all? Consider lockless algorithms instead, or remove the need to synchronize.