
Comparison of ThreadSanitizer and Thread Fuzzing

Author: Gareth Rees, Senior Software Engineer at Undo

ThreadSanitizer and Thread Fuzzing are two tools for detecting data races in multi-threaded code. This article compares the tools.

ThreadSanitizer

ThreadSanitizer is a compiler extension, originally developed by Google, that has been added to various compilers, including Clang/LLVM and GCC, where it can be enabled using the -fsanitize=thread option.

It works by maintaining, for each word of memory allocated by the program, a set of N “shadow words”, where N is 2, 4, or 8, depending on the configuration (the larger the set of shadow words, the more accurate the analysis, but the greater the memory overhead). The shadow words represent recent accesses to the word, or to parts of the word, and are managed using random eviction. Each shadow word records the thread that accessed the memory, the “epoch” of the access, the bytes accessed, and whether the access was a read or a write. The epoch is a logical clock belonging to the accessing thread; the runtime advances and propagates these clocks when threads synchronize (for example, through mutexes or atomic operations), which allows it to tell whether one access happens before another.

When the -fsanitize=thread option is provided, the compiler instruments every memory access in the compiled program with a call to a runtime function that updates the shadow words and checks them against the new access. The runtime reports a data race when two different threads have accessed overlapping bytes, at least one of the accesses was a write, and neither access happens before the other. See the ThreadSanitizer algorithm documentation for details.
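The check can be sketched in C as follows. This is a simplified model, not ThreadSanitizer's source: the names shadow_word, vector_clock, and is_race are hypothetical, the real runtime packs the fields into a single 64-bit word, and the sketch assumes the happens-before model of the ThreadSanitizer algorithm documentation, in which each thread keeps a vector clock recording the latest epoch of every other thread it has synchronized with.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_THREADS 64 /* hypothetical limit for this sketch */

/* Hypothetical, simplified shadow word (the real one is packed into 64 bits). */
typedef struct {
    uint16_t tid;      /* thread that made the access */
    uint64_t epoch;    /* that thread's logical clock at the time */
    uint8_t  offset;   /* first byte accessed within the 8-byte word */
    uint8_t  size;     /* bytes accessed: 1, 2, 4 or 8 */
    bool     is_write; /* write (true) or read (false) */
} shadow_word;

/* The accessing thread's vector clock: clock[t] is the latest epoch of
   thread t that this thread has synchronized with. */
typedef struct {
    uint64_t clock[MAX_THREADS];
} vector_clock;

/* Do the byte ranges [offset, offset + size) of the two accesses overlap? */
static bool bytes_overlap(const shadow_word *a, const shadow_word *b)
{
    return a->offset < b->offset + b->size &&
           b->offset < a->offset + a->size;
}

/* Does the recorded access `old` race with the new access `cur`,
   made by a thread whose vector clock is `vc`? */
static bool is_race(const shadow_word *old, const shadow_word *cur,
                    const vector_clock *vc)
{
    if (old->tid == cur->tid)
        return false;                    /* same thread: ordered by program order */
    if (!old->is_write && !cur->is_write)
        return false;                    /* two reads never race */
    if (!bytes_overlap(old, cur))
        return false;                    /* disjoint bytes of the same word */
    /* `old` happens before `cur` only if the current thread has
       synchronized with old->tid at or after old->epoch. */
    return old->epoch > vc->clock[old->tid];
}
```

In a real detector this comparison is made against each of the N shadow words for the accessed word, after which the new access is stored, evicting a randomly chosen shadow word if all N are in use.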

Thread Fuzzing

Thread Fuzzing is a feature of Undo’s LiveRecorder product. LiveRecorder records the runtime behaviour of a program and saves it as an Undo recording, so that it can later be replayed in a debugger. LiveRecorder serializes the program’s threads: only the thread holding a lock runs, while all other threads block waiting for it, and LiveRecorder regularly releases the lock to give other threads the opportunity to claim it and run.

When Thread Fuzzing is enabled, LiveRecorder varies the timing with which the lock is released and other threads may run. There are several fuzzing strategies which can be configured: see the fuzzing modes documentation for details.
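The following toy model may help to show why varying the switch points matters. It is not LiveRecorder's implementation: it uses ordinary pthreads, passes a single hypothetical run_lock between them to mimic serialized execution, and splits an increment of shared_count into a read and a write so that a switch can land between them. With fuzz_percent set to 0, threads switch only between complete increments and the race never shows; with a non-zero value, the toy scheduler sometimes switches mid-increment and updates can be lost. All names are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

#define ITERATIONS 1000
#define MAX_WORKERS 16

static pthread_mutex_t run_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned shared_count; /* the "program's" racy counter */
static unsigned fuzz_percent; /* chance of a switch mid-increment */

/* xorshift32: a tiny PRNG, adequate for a toy scheduler. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Give any other waiting thread the chance to take the run lock. */
static void switch_threads(void)
{
    pthread_mutex_unlock(&run_lock);
    sched_yield();
    pthread_mutex_lock(&run_lock);
}

static void *worker(void *arg)
{
    uint32_t seed = (uint32_t)(uintptr_t)arg; /* must be non-zero */
    pthread_mutex_lock(&run_lock);            /* only the lock holder runs */
    for (int i = 0; i < ITERATIONS; i++) {
        unsigned tmp = shared_count;          /* read ...                    */
        if (next_random(&seed) % 100 < fuzz_percent)
            switch_threads();                 /* fuzzed switch point         */
        shared_count = tmp + 1;               /* ... write: a racy increment */
        switch_threads();                     /* regular switch point        */
    }
    pthread_mutex_unlock(&run_lock);
    return NULL;
}

/* Run n (<= MAX_WORKERS) workers serialized on run_lock; return the count. */
static unsigned run_serialized(int n, unsigned percent)
{
    pthread_t threads[MAX_WORKERS];
    shared_count = 0;
    fuzz_percent = percent;
    for (int i = 0; i < n; i++)
        pthread_create(&threads[i], NULL, worker, (void *)(uintptr_t)(i + 1));
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    return shared_count;
}
```

Because the mutex is not fair, a thread that releases run_lock may reacquire it immediately, so the number of lost updates varies from run to run; exposing that kind of schedule-dependent behaviour is the purpose of fuzzing.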

Comparison

Build configuration

ThreadSanitizer requires a special build configuration, since the program must be compiled with the -fsanitize=thread option.

Thread Fuzzing can be applied to any program, and does not require a special build configuration.

Runtime configuration

ThreadSanitizer places the shadow words at locations computed only from the addresses of the original words, so that no memory accesses are required to find them. This relies on the program’s memory being mapped within particular address ranges, so ThreadSanitizer may fail to start under Address-Space Layout Randomization (ASLR) if a mapping lands outside those ranges, resulting in runtime failures like this:

FATAL: ThreadSanitizer: unexpected memory mapping 0x6395c9526000-0x6395c9527000

If this affects your program, you need to run it with ASLR disabled, for example, by running the program under setarch --addr-no-randomize, or by calling personality() with ADDR_NO_RANDOMIZE.
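The personality() route needs a little care: ADDR_NO_RANDOMIZE only affects programs started by a later execve(), and the ThreadSanitizer runtime checks the memory layout as it initializes, before main() runs. The call therefore belongs in a small launcher, built without -fsanitize=thread, that then executes the instrumented program. A minimal sketch of the launcher's core, assuming Linux and glibc (the function names are hypothetical):

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/personality.h>
#include <unistd.h>

/* True if ADDR_NO_RANDOMIZE is set in this process's personality.
   Passing 0xffffffff queries the persona without changing it. */
static bool aslr_disabled(void)
{
    int persona = personality(0xffffffff);
    return persona != -1 && (persona & ADDR_NO_RANDOMIZE) != 0;
}

/* Set ADDR_NO_RANDOMIZE. Existing mappings do not move: the flag takes
   effect for the program started by a later execve(). Returns 0 or -1. */
static int disable_aslr_for_exec(void)
{
    int persona = personality(0xffffffff);
    if (persona == -1)
        return -1;
    return personality((unsigned long)persona | ADDR_NO_RANDOMIZE) == -1 ? -1 : 0;
}

/* Launcher core: run `path` (for example, a binary built with
   -fsanitize=thread) with ASLR disabled. Returns only on failure. */
static int exec_without_aslr(const char *path, char *const argv[])
{
    if (disable_aslr_for_exec() == -1) {
        perror("personality");
        return -1;
    }
    execv(path, argv);
    perror("execv");
    return -1;
}
```

A launcher's main() could then call exec_without_aslr(argv[1], &argv[1]), which is essentially what setarch --addr-no-randomize does.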

Thread Fuzzing works both with ASLR and without ASLR.

Performance

The Clang documentation says that “typical slowdown introduced by ThreadSanitizer is about 5×–15×.”

The slowdown due to Thread Fuzzing depends on the program, and has two factors. The first is LiveRecorder’s recording overhead, which varies with the type of workload: if the program works mostly in its own private memory, the slowdown may be as low as 1.5×, but if it makes extensive use of shared memory, the slowdown can be much larger, as much as 50× if all memory is shared. The second is the serialization of threads, which is roughly proportional to the parallel workload: if the program keeps n threads busy when run natively, the recording overhead should be multiplied by n. For example, a program that keeps 4 threads busy, using mainly private memory and a small amount of shared memory, might see a slowdown of 2× (recording overhead) × 4 (threading overhead) = 8×.
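The rule of thumb above amounts to multiplying two factors, which can be captured in a one-line C helper. The name estimated_fuzzing_slowdown is hypothetical, and the estimate is only as good as its inputs, which have to be measured or guessed for a particular workload.

```c
/* Rough estimate of the slowdown under Thread Fuzzing: the recording
   overhead (about 1.5x to 50x, depending on how much memory is shared)
   multiplied by the number of threads kept busy when run natively,
   since only one thread runs at a time. Zero threads is treated as one. */
static double estimated_fuzzing_slowdown(double recording_overhead,
                                         unsigned busy_threads)
{
    return recording_overhead * (busy_threads > 0 ? busy_threads : 1);
}
```

With the figures from the example, estimated_fuzzing_slowdown(2.0, 4) returns 8.0. By the same rule, four busy threads working entirely in shared memory would give roughly 50 × 4 = 200×.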

Memory Usage

The Clang documentation says that “typical memory overhead introduced by ThreadSanitizer is about 5×–10×.”

The memory overhead introduced by LiveRecorder is usually less than 2×. Thread Fuzzing adds no further memory overhead.

Classes of bugs detected

ThreadSanitizer reports data races regardless of whether they cause a program failure. This catches races as soon as they occur, instead of waiting for incorrect results to propagate through the program until it crashes or asserts (if it ever does).

Not every data race reported by ThreadSanitizer leads to incorrect behaviour, so some reports are, in effect, false positives. For example, consider a program in which increment_count() is called from multiple threads:

unsigned count = 0;

void
increment_count(void)
{
    ++count;
}

ThreadSanitizer correctly reports this as a data race, since an increment is not an atomic operation but a read followed by a write, so concurrent calls can lose updates. However, if the program uses count only as a lower bound on the number of times that increment_count() was called, then the lost updates are harmless in practice (although, strictly speaking, a data race on a non-atomic variable is undefined behaviour in C11). ThreadSanitizer provides mechanisms for suppressing such reports, either by using a suppression file or by annotating the source.
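Two possible ways of handling this counter are sketched below, assuming a GCC or Clang toolchain; the function names are hypothetical. Making the increment a relaxed atomic operation removes the race altogether (and makes the count exact), while the no_sanitize("thread") function attribute, supported by recent Clang and GCC, asks ThreadSanitizer not to check the plain memory accesses in one function.

```c
unsigned count = 0;

/* Option 1: remove the race. A relaxed atomic increment is indivisible,
   so no updates are lost, but it imposes no ordering on other memory. */
void
increment_count_atomic(void)
{
    __atomic_fetch_add(&count, 1, __ATOMIC_RELAXED);
}

/* Option 2: keep the plain increment, but tell ThreadSanitizer not to
   report races on the plain memory accesses in this function. */
__attribute__((no_sanitize("thread")))
void
increment_count_unchecked(void)
{
    ++count;
}
```

The suppression-file route needs no code change: a line such as race:increment_count in a file named by TSAN_OPTIONS="suppressions=<file>" has a similar effect.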

Thread Fuzzing reproduces data races, locking errors, and deadlocks. It does not reproduce races related to weak memory model semantics (out-of-order updates) on the ARM64 architecture. It does not detect or report errors: it is up to the program to bring data races to the attention of developers, for example, by crashing or asserting. It only reproduces error conditions that can occur in practice: that is, there are no false positives.
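The ARM64 caveat concerns code like the following message-passing pattern. With relaxed atomics, as in the hypothetical publish_relaxed() and consume_relaxed(), the compiler or a weakly ordered processor such as an ARM64 core may let another thread see ready == 1 while still seeing the old value of payload. Presumably such reorderings cannot show up when only one thread runs at a time, which would explain why Thread Fuzzing cannot reproduce them. The release/acquire version, publish() and consume(), rules the reordering out on every architecture. All names are hypothetical, and the sketch assumes the GCC/Clang __atomic builtins.

```c
#include <pthread.h>
#include <stdbool.h>

static int payload; /* data being published */
static int ready;   /* flag: payload has been written */

/* Broken on weakly ordered hardware: relaxed operations impose no order,
   so a reader may observe ready == 1 before payload == 42. */
static void publish_relaxed(void)
{
    __atomic_store_n(&payload, 42, __ATOMIC_RELAXED);
    __atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

static bool consume_relaxed(int *out)
{
    if (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
        return false;
    *out = __atomic_load_n(&payload, __ATOMIC_RELAXED); /* may be stale */
    return true;
}

/* Correct: the release store keeps the payload write before the flag
   write, and the acquire load keeps the flag read before the payload read. */
static void publish(void)
{
    __atomic_store_n(&payload, 42, __ATOMIC_RELAXED);
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

static bool consume(int *out)
{
    if (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        return false;
    *out = __atomic_load_n(&payload, __ATOMIC_RELAXED); /* guaranteed 42 */
    return true;
}

static void *publisher(void *arg)
{
    (void)arg;
    publish();
    return NULL;
}
```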

Interpreting the results

ThreadSanitizer emits its findings as reports giving backtraces for the racing threads. For example, consider these functions implementing lockless push and pop operations on a linked list:

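#include <stddef.h>
#include <stdlib.h>

/* Type and list head as they appear in the debugger session below;
   the type of data is an assumption. */
typedef struct list {
    struct list *next;
    void *data;
} list_t;

static list_t list_head; /* list_head.next points at the first item */

static void
s_push(list_t *item)
{
    list_t *tmp = __atomic_load_n(&list_head.next, __ATOMIC_ACQUIRE);
    /* (A) */ item->next = tmp;
    __atomic_store_n(&list_head.next, item, __ATOMIC_RELEASE);
}

static void *
s_pop(void)
{
    list_t *item = __atomic_load_n(&list_head.next, __ATOMIC_ACQUIRE);
    if (!item)
    {
        return NULL;
    }
    /* (B) */ __atomic_store_n(&list_head.next, item->next, __ATOMIC_RELEASE);
    return item;
}

static void *
s_consumer(void *arg)
{
    (void)arg;
    for (;;)
    {
        list_t *item = s_pop();
        if (item)
        {
            free(item);
        }
    }
    return NULL;
}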

When this code is compiled with ThreadSanitizer we get the following report:

==================
WARNING: ThreadSanitizer: data race (pid=945926) 
  Read of size 8 at 0x72040008b1e0 by thread T2: 
    #0 s_pop examples/linked-list.c:65 (linked-list+0x1401) 
    #1 s_consumer examples/linked-list.c:117 (linked-list+0x1543) 

  Previous write of size 8 at 0x72040008b1e0 by thread T1: 
    #0 s_push examples/linked-list.c:43 (linked-list+0x1388) 
    #1 s_producer examples/linked-list.c:100 (linked-list+0x1508) 

  Location is heap block of size 16 at 0x72040008b1e0 allocated by thread T1: 
    #0 malloc src/libsanitizer/tsan/tsan_interceptors_posix.cpp:665 (libtsan.so.2+0x54b3f) 
    #1 s_producer examples/linked-list.c:99 (linked-list+0x14f8) 

  Thread T2 (tid=945929, running) created by main thread at: 
    #0 pthread_create src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1022 (libtsan.so.2+0x5ac1a) 
    #1 main examples/linked-list.c:146 (linked-list+0x1651) 

  Thread T1 (tid=945928, running) created by main thread at: 
    #0 pthread_create src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1022 (libtsan.so.2+0x5ac1a) 
    #1 main examples/linked-list.c:145 (linked-list+0x1634)

SUMMARY: ThreadSanitizer: data race examples/linked-list.c:65 in s_pop
==================

The developer reading the report must use pure deduction to figure out the cause of the race. In this case it is not too hard to see that the write of item->next at (A) is racing with the read of item->next at (B): because s_push and s_pop each read list_head.next and then write it back as two separate steps, a push and a pop running concurrently can interleave between those steps, so a lock is needed to make each operation indivisible. In more complex cases, however, the cause may not be so easy to deduce.
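A minimal sketch of the lock-based fix, assuming POSIX threads: a single hypothetical mutex, list_lock, makes each push and pop indivisible, so a pop can no longer interleave with a push between reading and writing list_head.next. The list_t layout mirrors the fields shown in the debugger session; the _locked function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct list {
    struct list *next;
    void *data;
} list_t;

static list_t list_head; /* list_head.next is the first item */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static void
s_push_locked(list_t *item)
{
    pthread_mutex_lock(&list_lock);
    item->next = list_head.next; /* read and write happen under one lock */
    list_head.next = item;
    pthread_mutex_unlock(&list_lock);
}

static void *
s_pop_locked(void)
{
    pthread_mutex_lock(&list_lock);
    list_t *item = list_head.next;
    if (item)
    {
        list_head.next = item->next; /* unlinked while still holding the lock */
    }
    pthread_mutex_unlock(&list_lock);
    return item;
}
```

Once s_pop_locked() returns an item, no other thread can reach it through the list, so the consumer can free() it safely.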

Thread Fuzzing does not detect or report data races, but LiveRecorder saves an Undo recording: an instruction-precise trace of the program’s behaviour. Replaying the recording reproduces the failure and lets you inspect all of memory at any point in time. In the example below, the program crashes with a segmentation fault and live-record saves an Undo recording:

$ live-record --thread-fuzzing ./linked-list
live-record: Termination recording will be written to linked-list-968476-2025-03-18T16-40-45.616.undo
live-record: Maximum event log size is 1G.
live-record: Saving to linked-list-968476-2025-03-18T16-40-45.616.undo ...
live-record: Saving.. 100%

live-record: Termination recording written to linked-list-968476-2025-03-18T16-40-45.616.undo
live-record: Detaching...
Segmentation fault (core dumped)

We can load the Undo recording into UDB:

$ udb -q linked-list-968476-2025-03-18T16-40-45.616.undo
0x00005bad0e0f6120 in _start ()

The debugged program is at the beginning of recorded history. Start debugging
from here or, to proceed towards the end, use: 
continue - to replay from the beginning 
ugo end - to jump straight to the end of history

The crash is at the end of history, so we jump there:

start 1> ugo end
[New Thread 968476.968506]
[New Thread 968476.968507]
[Switching to Thread 968476.968507]
0x00005bad0e0f625e in s_pop () at examples/linked-list.c:65
65 __atomic_store_n(&list_head.next, item->next, __ATOMIC_RELEASE);
end 15,289,385> bt
#0 0x00005bad0e0f625e in s_pop () at examples/linked-list.c:65
#1 0x00005bad0e0f632d in s_consumer (arg=0x0) at examples/linked-list.c:117
#2 0x0000759b4b49caa4 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:447
#3 0x0000759b4b529c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The reason for the crash is that item points at unmapped memory:

end 15,289,385> p item->next
Cannot access memory at address 0x759c1db41545
end 15,289,385> p item
$1 = (list_t *) 0x759c1db41545
end 15,289,385> p *item
Cannot access memory at address 0x759c1db41545

We can run backwards to see how item got this bad value:

end 15,289,385> last item
Searching backward for changes to 0x759b4a98ee88-0x759b4a98ee90 for the
expression: 
  item

Thread 3 "linked-list" hit Hardware watchpoint -25: *(list_t * *) 0x759b4a98ee88

Was = (list_t *) 0x759c1db41545
Now = (list_t *) 0xffffffffffffff88
0x00005bad0e0f6248 in s_pop () at examples/linked-list.c:58
58 list_t *item = __atomic_load_n(&list_head.next, __ATOMIC_ACQUIRE);
end 15,289,385> p list_head.next
$3 = (struct list *) 0x759c1db41545
end 15,289,385> p *list_head.next
Cannot access memory at address 0x759c1db41545

The bad value was loaded from list_head.next. We run backwards again to find out how list_head.next got this bad value:

end 15,289,385> last list_head.next
Searching backward for changes to 0x5bad0e0f9030-0x5bad0e0f9038 for the
expression: 
  list_head.next
Enable debuginfod for this session? (y or [n]) n

Thread 3 "linked-list" hit Hardware watchpoint -28: *(struct list * *) 0x5bad0e0f9030

Was = (struct list *) 0x759c1db41545
Now = (struct list *) 0x759b44005190
s_pop () at examples/linked-list.c:65
65 __atomic_store_n(&list_head.next, item->next, __ATOMIC_RELEASE);
99% 15,289,375> bt
#0 s_pop () at examples/linked-list.c:65
#1 0x00005bad0e0f632d in s_consumer (arg=0x0) at examples/linked-list.c:117
#2 0x0000759b4b49caa4 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:447
#3 0x0000759b4b529c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
99% 15,289,375> p item
$4 = (list_t *) 0x759b44005190
99% 15,289,375> p *item
$5 = {next = 0x759c1db41545, data = 0x0}
99% 15,289,375> p *item->next
Cannot access memory at address 0x759c1db41545

We run backwards a third time, to find out where item->next was changed:

99% 15,289,375> last item->next
Searching backward for changes to 0x759b44005190-0x759b44005198 for the
expression: 
  item->next

Thread 3 "linked-list" hit Hardware watchpoint -36: *(struct list * *) 0x759b44005190

Was = (struct list *) 0x759c1db41545
Now = (struct list *) 0x759b44005270
0x0000759b4b4ab2f6 in _int_free (av=0x759b44000030, p=<optimised out>, have_lock=0) at ./malloc/malloc.c:4619
4619 ./malloc/malloc.c: No such file or directory.
99% 15,283,100> bt
#0 0x0000759b4b4ab2f6 in _int_free (av=0x759b44000030, p=<optimised out>, have_lock=0) 
     at ./malloc/malloc.c:4619
#1 0x0000759b4b4addae in __GI___libc_free (mem=0x759b44005190) at ./malloc/malloc.c:3398
#2 0x00005bad0e0f634b in s_consumer (arg=0x0) at examples/linked-list.c:128
#3 0x0000759b4b49caa4 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:447
#4 0x0000759b4b529c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

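So the crash was caused by a use-after-free. The item at 0x759b44005190 was freed by s_consumer at line 128, and the call to free() overwrote item->next with the allocator’s own data; the same item was later popped from the list again, and that stale value was stored in list_head.next. The debugger is now stopped at the point of the free, ready for further investigation of why the freed item was still reachable from the list.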

Conclusion

Each tool has its own strengths and weaknesses. ThreadSanitizer detects data races directly, which is effective if the program does not check the consistency of its own data structures. Thread Fuzzing varies the scheduling of threads, which can be effective when a data race occurs only under unusual conditions. It also produces an Undo recording that makes it possible to reproduce the failure in a debugger, which is valuable when the cause of a race is complex.

It makes sense to use both tools and take advantage of each one in the cases where it is best suited. For example, you might run the product first under ThreadSanitizer and fix the easy-to-reproduce races, then run the product under Thread Fuzzing to vary the scheduling of threads and discover hard-to-reproduce races, or capture recordings of races which can’t be solved by pure deduction from the ThreadSanitizer report.

Interested in seeing Thread Fuzzing in action? Book a slot with one of our Solutions Engineers for a quick demo and determine whether this will work in your environment.

