How to quickly resolve long-standing failures in your test suite with UDB

When dealing with unfamiliar code, there is a huge productivity benefit in being able to go backwards and forwards over the same section of code until you fully understand what it does.

Rob Thompson, Senior Software Engineer, Siemens EDA

 

I was working on Catapult, our high-level synthesis (HLS) tool. The tool takes as input an algorithm written in C or C++ and a set of constraints; it then generates output that can be used to build that algorithm as part of a chip. A typical use is to take the description of a video compression algorithm in C++ and generate a hardware implementation of that algorithm for a cellphone or tablet.

The problem I was trying to debug manifested as a test failure in a suite that tests the “save and restore” capabilities in Catapult, which uses an XML-based file to save a snapshot of the current work. I checked our records and found that the test in question (in fact the entire test suite) had been exhibiting random failures. The particular test showed that the names of objects, which should be the same every time the test is run, were changing. A changed name no longer matched what the test expected, so a command to manipulate an object with a particular name failed with an “object not found” message.

Because of what the test expected, I knew a “good” name and a “bad” name for an object in Catapult’s database. I created a breakpoint that would fire when the incorrect name was first created, then ran the test repeatedly in UDB until the breakpoint fired. Fortunately, this took only two or three tries.

The way Catapult generates names means that names, or parts of them, are often inherited from other objects. I used conditional breakpoints in the constructor for the database objects to track the history of the name. In this case, a link between objects that was supposed to indicate parentage was null when it shouldn’t have been.

The root cause of the random failures was the way the data was originally written into an internal data structure. That data structure was subsequently referenced by a dictionary which used each object’s address as the key, and the XML writer iterated over the dictionary in an order that was sensitive to those addresses. Since Catapult uses multiple threads and a shared memory pool, the address of an object is not reproducible from run to run. The order of objects in the XML file therefore changed between runs, and whenever a child object happened to appear before its parent, linking the child to its parent failed on restore. The solution was to add code that builds a list of forward-referenced objects and resolves those forward references once all the data has been read.
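The fix described above, deferring parent links that point at objects not yet read, can be sketched in C++ as follows. This is a minimal illustration and not Catapult’s code: the `Record`, `Node` and `Loader` types, and the idea of identifying a parent by name, are all assumptions made for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record as read from the saved file: an object's name and the
// name of its parent (empty if the object has no parent).
struct Record {
    std::string name;
    std::string parent_name;
};

// Hypothetical in-memory object with a parentage link.
struct Node {
    std::string name;
    Node* parent = nullptr;
};

class Loader {
public:
    // Pass 1: create the object for one record. If its parent has not been
    // read yet, remember the link as a forward reference instead of failing.
    // Returns false (and changes nothing) if the name was already read.
    bool read(const Record& rec) {
        if (by_name_.count(rec.name) != 0) return false;
        auto node = std::make_unique<Node>();
        node->name = rec.name;
        Node* raw = node.get();
        by_name_.emplace(rec.name, std::move(node));
        if (rec.parent_name.empty()) return true;
        auto it = by_name_.find(rec.parent_name);
        if (it != by_name_.end()) {
            raw->parent = it->second.get();                // parent already known
        } else {
            pending_.emplace_back(raw, rec.parent_name);   // resolve later
        }
        return true;
    }

    // Pass 2: call once every record has been read. Links each deferred
    // child to its parent. Returns false if any parent never appeared.
    bool resolve() {
        bool all_found = true;
        for (auto& [child, parent_name] : pending_) {
            auto it = by_name_.find(parent_name);
            if (it == by_name_.end()) {
                all_found = false;
                continue;
            }
            child->parent = it->second.get();
            assert(child->parent != nullptr);
        }
        pending_.clear();
        return all_found;
    }

    // Look up an object by name; returns nullptr if it was never read.
    const Node* find(const std::string& name) const {
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : it->second.get();
    }

private:
    std::unordered_map<std::string, std::unique_ptr<Node>> by_name_;
    std::vector<std::pair<Node*, std::string>> pending_;
};
```

With this shape, reading a child record before its parent leaves the child’s `parent` pointer null only until `resolve()` runs, so the result no longer depends on the order in which the writer emitted the objects. A complementary option, not described in the original fix, would be to make the writer’s order deterministic by iterating over a stable key such as the object name rather than its address.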

From start to finish, this took most of a week to resolve. The purely mechanical aspects of creating breakpoints and letting the debugger reverse into them probably took one or two days; the longer part was making the intellectual leap to understand what the debugger was showing me.

Without UDB, I seriously doubt this problem would have been solved. I discovered that the test suite had been giving random failures for the past year, and that other smart people had investigated the failures without finding the cause. The random failures in the test suite went away with this fix.

The bug took me into areas of code that I was unfamiliar with. Being able to step through unfamiliar code in both directions is a huge advantage. Sometimes in debugging, you overshoot the point you really wanted to examine and you’re left thinking, “OK, that’s not what I expected. Just how did that happen?” This is especially true when looking at a section of someone else’s code for the first time. When there is also an element of non-repeatability involved, the ability to replay in greater detail is even more important, because you may not be able to get back to the same point to look again.

Using conditional breakpoints in a constructor to trace back where an object came from, and doing that repeatedly until a problem ancestor is found, has turned out to be a powerful technique which I’ve employed several times since. Even when a result is reproducible, the mechanics of going backwards in a single debug session turned out to be easier than trying to do the same with conventional tools. When dealing with unfamiliar code, there is a huge productivity benefit in being able to go backwards and forwards over the same section of code until (finally) you fully understand what it does.

Learn more about UDB’s reverse debugging capabilities

This is a guest article by Rob Thompson, Senior Software Engineer at Siemens EDA
