Replaying, not reproducing bugs – how LiveRecorder works at a technical level

In our previous blog post we looked at the worst-case scenario for a software vendor: a bug in your software that is discovered in production at a remote customer site. We covered the business impact and the time and effort it takes to reproduce bugs of this type, and introduced LiveRecorder, our new tool, which has been launched to help solve the problem.

LiveRecorder allows Linux programs to make a detailed recording of themselves while they are running. The resulting Undo Recording can then be sent back to developers, letting them debug an exact copy of the original program’s execution on their own work machine. This means developers can track down bugs without needing to reproduce them in-house, write test cases or make time-consuming visits to customer sites.

LiveRecorder – how it works

In this blog I want to look at the technical details of LiveRecorder – how it works and how developers can integrate it into their code.

LiveRecorder comes as a library (available as a .so shared object for dynamic linking and as a .a archive for static linking) with a very simple API consisting of a small number of easy-to-use C functions declared in a single undolr.h header file. These functions allow the application itself to control all aspects of LiveRecorder’s behaviour.

There are functions to start recording and save the current recording to a file. There is also a function that sets up an automatic save to a file when the application terminates, so that unexpected failures can be easily captured.

For example, if the developer wants to ensure that a recording is created whenever the application terminates unexpectedly, they can call

undolr_recording_start()

and

undolr_save_on_termination()

at the start of main(), then call

undolr_save_on_termination_cancel()

before a normal exit.
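As a rough illustration, the sequence above might look like the sketch below. The three undolr_ function names come from this post, but the parameter lists, the "0 means success" return convention and the crash.undo filename are assumptions made for the example, and the stubs only stand in for the real library so that the sketch compiles on its own. Check undolr.h for the actual declarations.

```c
#include <stdio.h>

/* --- Hypothetical stand-ins for the declarations in undolr.h ---
 * Names are taken from the article; signatures and return values are
 * assumptions. In a real build, remove these stubs, include undolr.h
 * and link against the LiveRecorder library instead. */
static int recording_active = 0;
static const char *termination_file = NULL;

static int undolr_recording_start(void)
{
    recording_active = 1;
    return 0;
}

static int undolr_save_on_termination(const char *filename)
{
    termination_file = filename;
    return 0;
}

static int undolr_save_on_termination_cancel(void)
{
    termination_file = NULL;
    return 0;
}

/* Placeholder for the application's real work; returns an exit status. */
static int do_work(void)
{
    return 0;
}

/* Intended to be called as the body of main(). */
int run_application(void)
{
    /* Start recording first, then arm the automatic save. */
    if (undolr_recording_start() != 0) {
        fprintf(stderr, "recording not started; continuing without it\n");
    } else if (undolr_save_on_termination("crash.undo") != 0) {
        fprintf(stderr, "automatic save not armed\n");
    }

    int status = do_work();

    /* Normal exit path: disarm the automatic save so that no
     * recording is written for a healthy run. */
    undolr_save_on_termination_cancel();
    return status;
}
```

If do_work() never returns because the process dies unexpectedly, the cancel call is never reached, so the automatic save stays armed. That is the case this pattern is meant to capture.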

Alternatively, the vendor may want to let the user control recording, in which case the vendor should add an appropriate user interface to the application and make calls to

undolr_recording_start()

and

undolr_recording_save()

as directed by the user.
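A minimal sketch of the user-controlled approach is shown below. The command names (record-start, record-save), the handle_user_command() helper and the rule that a save is refused until recording has started are all hypothetical choices for this example. The undolr_ signatures are assumptions too, and are stubbed out so the sketch compiles without the library.

```c
#include <stdio.h>
#include <string.h>

/* --- Hypothetical stand-ins for the declarations in undolr.h ---
 * Signatures and return values are assumptions; see undolr.h. */
static int recording_active = 0;
static char last_saved[256] = "";

static int undolr_recording_start(void)
{
    recording_active = 1;
    return 0;
}

static int undolr_recording_save(const char *filename)
{
    snprintf(last_saved, sizeof last_saved, "%s", filename);
    return 0;
}

/* Hypothetical dispatcher for commands coming from the application's
 * own user interface. Returns 0 on success, -1 on failure or on an
 * unknown command. */
int handle_user_command(const char *command, const char *argument)
{
    if (strcmp(command, "record-start") == 0) {
        if (recording_active)
            return 0; /* already recording, nothing to do */
        return undolr_recording_start() == 0 ? 0 : -1;
    }

    if (strcmp(command, "record-save") == 0) {
        /* This sketch refuses to save without an active recording
         * or without a target filename. */
        if (!recording_active || argument == NULL)
            return -1;
        return undolr_recording_save(argument) == 0 ? 0 : -1;
    }

    return -1;
}
```

The same two calls could equally be wired to a menu item, a signal handler or an admin endpoint. The dispatcher is just one way of putting the decision in the user's hands.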

We’ve also just developed the ability to stop (and then re-start) recording – this is currently undergoing final testing, so contact us if you want early access to it.

The LiveRecorder event log

LiveRecorder uses an event log to store all non-deterministic input to the application. Non-deterministic input is all data that isn’t generated by the application itself, for example data read from sockets and files. More generally, the event log includes all data returned to the application by system calls.

The event log is a circular buffer in memory, so LiveRecorder will discard the earliest portions of the application’s history if the event log fills up. This means the amount of history that can be saved depends on how much external input the application receives and on the size of the event log, not on how long the recording has been active.
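To make the circular-buffer behaviour concrete, here is a toy byte ring that overwrites its oldest contents once it is full. It is only an illustration of the general technique, not LiveRecorder's actual implementation. The event_log type, the tiny LOG_CAPACITY and the helper names are all invented for this sketch.

```c
#include <stddef.h>

#define LOG_CAPACITY 8 /* deliberately tiny so the wrap-around is visible */

typedef struct {
    unsigned char data[LOG_CAPACITY];
    size_t head;    /* index where the next byte will be written */
    size_t used;    /* number of valid bytes, never above LOG_CAPACITY */
    size_t dropped; /* bytes of early history that were overwritten */
} event_log;

/* Append n bytes of "external input". Once the buffer is full, each
 * new byte overwrites the oldest one still held. */
void log_append(event_log *log, const unsigned char *bytes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (log->used == LOG_CAPACITY)
            log->dropped++;
        else
            log->used++;
        log->data[log->head] = bytes[i];
        log->head = (log->head + 1) % LOG_CAPACITY;
    }
}

/* Return the i-th retained byte, where index 0 is the oldest byte
 * that has not been overwritten. Caller must keep i below log->used. */
unsigned char log_at(const event_log *log, size_t i)
{
    size_t start = (log->head + LOG_CAPACITY - log->used) % LOG_CAPACITY;
    return log->data[(start + i) % LOG_CAPACITY];
}
```

Appending 6 bytes and then 5 more to this 8-byte log leaves the most recent 8 bytes and drops the first 3. Nothing in the structure tracks time: only the volume of appended data decides how far back the retained history reaches.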

The application can use the

undolr_event_log_size_set()

function to set the size of the event log.
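One way to choose a size is to work backwards from how much history you want to keep, as in the rough sketch below. The start_recording_with_history() helper is hypothetical, and so are the assumptions that the size is given in bytes, that it is set before recording starts, and that the functions return 0 on success. The stubs stand in for the real library. The estimate also ignores any bookkeeping overhead in the real event log, so treat the result as a lower bound.

```c
/* --- Hypothetical stand-ins for the declarations in undolr.h ---
 * Signatures, units and return values are assumptions; see undolr.h. */
static long configured_log_bytes = 0;

static int undolr_event_log_size_set(long bytes)
{
    if (bytes <= 0)
        return -1;
    configured_log_bytes = bytes;
    return 0;
}

static int undolr_recording_start(void)
{
    return 0;
}

/* Hypothetical helper: size the event log from an expected rate of
 * external input and the amount of history wanted, then start
 * recording. Returns 0 on success, -1 on failure. Callers are
 * expected to pass values whose product fits in a long. */
int start_recording_with_history(long input_bytes_per_sec,
                                 long seconds_of_history)
{
    long bytes = input_bytes_per_sec * seconds_of_history;

    if (undolr_event_log_size_set(bytes) != 0)
        return -1;
    return undolr_recording_start();
}
```

For instance, an application that reads about 1 MiB of external input per second and wants roughly a minute of history would ask for about 60 MiB under these assumptions.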

From our beta program and our own testing, we’ve seen that LiveRecorder is simple to add to an application and provides a seamless way of recording exactly what a program is doing as it executes. Because you can then replay a bug rather than go through the potentially expensive and time-consuming process of reproducing it, bugs can be found faster, keeping customers happy and increasing efficiency. To see more about how LiveRecorder works, watch our demo video or download our whitepaper on the subject.
