How Time Travel Debugging Tames Complex Codebases and Overconfident AI
Author: Chris Croft-White, Staff Solutions Architect at Undo
As a C and C++ developer, I recently found myself in unfamiliar territory: debugging the MySQL optimizer with zero prior knowledge of the codebase. What made this possible wasn’t years of MySQL expertise, but rather the combination of time travel debugging and carefully managed AI assistance. This is the story of how modern debugging tools can empower you to tackle even the most complex, unfamiliar systems, and how to prevent AI overconfidence from leading you astray.
The challenge: debugging the unknown
It started with MySQL bug #113097: a query returning an empty set when it should return data. I had never touched the MySQL codebase before. I didn’t even fully understand the SQL query in question. Traditional debugging wisdom would suggest this was foolhardy: how can you debug what you don’t understand?
This is where time travel debugging fundamentally changes the game: I could capture the bug happening and then explore it forensically, learning the codebase through observation rather than study.
The power of observable behavior
My first step wasn’t to read MySQL internals documentation. Instead, I did the following:
- Built MySQL with debug symbols,
- Reproduced the issue,
- Recorded the entire execution with Undo,
- Started exploring.
The recording immediately revealed crucial information:
- The issue was 100% reproducible (likely ruling out a race condition),
- Changing the WHERE clause column made it work (suggesting an algorithmic issue, not a data issue),
- The derived table had zero rows for the failing query.
Without reading a single line of documentation, I could see the behavior difference between working and failing queries. Time travel debugging turned the opaque MySQL server into a glass box.
Learning through exploration
Not being an SQL expert, I used ChatGPT to explain what the query from the bug should do. Then I used the debugger to see what it actually did. This gap between expectation and reality became my guide.
CREATE TABLE temp (createdAt datetime, timezone varchar(64));
INSERT INTO temp VALUES('2022-02-15 16:47:12', null);
SELECT * FROM (
  SELECT coalesce(timezone, '+00:00') tz,
         convert_tz(createdAt, '+00:00', (select tz)) ca,
         if(createdAt, if((select 1) = 1,1,1), 1) random
  FROM temp
) aux
WHERE ca = '2022-02-15 16:47:12';
The query should return a single row, but instead it returns the empty set.
I searched the codebase to find the two functions used in the column definitions. Setting breakpoints on the coalesce and convert_tz functions, I discovered something fascinating: the third argument passed to convert_tz changed from (select tz) to just tz when the outer WHERE clause changed from a working query (WHERE tz = '+00:00', which returns the correct single row of data, with the ca column containing '2022-02-15 16:47:12') to the failing one above. This was observable, concrete evidence, not speculation.
The time travel debugger let me:
- Start from the network event receiving the query
- Follow execution forward to see where it diverged
- Jump backward when I went too deep
- Compare execution paths between working and failing queries side-by-side
The AI partnership: benefits and dangers
Armed with concrete observations, I engaged Claude Code. Here’s where the story becomes instructive about AI-assisted debugging.
First session: the overconfidence trap
My first Claude Code session immediately fell into the classic AI trap. Despite my careful observations, Claude latched onto a plausible pattern it recognized: “This is clearly a subquery handling issue in condition pushdown.” It confidently produced a patch adding validation for subqueries.
The problem? The patch checked item->has_subquery(), which returned false at runtime. Claude had assumed (select tz) would be an Item::SUBSELECT_ITEM, but MySQL had already optimized it to an Item::REF_ITEM. The entire patch was based on a false assumption. Claude was very confident in its hypothesis, and very convincing to someone who doesn’t know the codebase when explaining the hypothesis, the root cause, and the proposed patch.
Second session: evidence-based success
Before the second session with Claude Code, I discovered MySQL’s optimizer tracing, which revealed the exact transformation: (select tz) was becoming temp.timezone in the “original_condition”. Armed with this concrete evidence, I started fresh with Claude Code.
This time, I insisted on runtime verification at every step:
Me: “I have a recording of this bug. Let’s verify everything at runtime.”
Claude: “The issue appears to be with subquery handling…”
Me: “Let’s check. What’s the actual type at this breakpoint?”
[Result: Item::REF_ITEM, not SUBSELECT_ITEM]
The recording allowed us to:
- Verify field indexes were correct (tz=0, ca=1)
- See that the REF_ITEM correctly pointed to COALESCE initially
- Trace through transformations to find corruption during clone_expression()
- Discover the unexpected serialize/reparse implementation
- Pinpoint the exact line where serialization went wrong
Time travel debugging: the great equalizer
What made it possible for someone with zero MySQL experience to find this bug? Time travel debugging provided capabilities that traditional debugging cannot:
1. Non-destructive exploration
I could aggressively explore code paths, knowing reverse-next would bring me back. No need to restart, no lost state, no fear of going too deep.
2. Comparative analysis
With recordings of both working and failing queries, I could literally diff the execution paths. Where did they diverge? What was different? Observable facts, not theories.
3. Evidence collection
Every assumption could be immediately verified:
- “The field index might be wrong” → Check it
- “The expression might be corrupted” → Examine it
- “This might be a race condition” → Run it multiple times in the recording
4. Learning by observation
I learned MySQL’s architecture by watching it run:
- How derived tables are created,
- How expressions are transformed,
- How the optimizer works,
- How cloning is implemented (serialize/reparse, who knew?)
5. Precision root cause analysis
When we found data corruption happened inside a function, we could:
reverse-next   # Step back to before corruption
step           # Step forward into the corrupting function
               # Repeat until we find the exact line
The journey to root cause
The successful debugging session revealed a fascinating bug:
- Observable symptom: WHERE clause on derived table returns no results
- First evidence: convert_tz third argument changes with the outer WHERE clause
- Deeper investigation: Optimizer trace shows (select tz) becomes temp.timezone
- Runtime discovery: (select tz) is optimized to REF_ITEM, not SUBSELECT_ITEM
- Transformation tracking: REF_ITEM correctly points to COALESCE initially
- Corruption point: During clone_expression()
- Architectural surprise: Cloning uses serialize/reparse
- Root cause: Item_ref::print() prints just “tz” instead of the full expression
- Initial fix: Add check for FUNC_ITEM in print condition
The patch that wasn’t complete
Just when we thought we had solved the MySQL optimizer bug, reality provided a humbling lesson: once we got the MySQL main testsuite running, our fix broke other tests. Time travel debugging once again proved invaluable: I could record the failing tests and explore exactly why our patch caused problems.
Discovering the real complexity
The initial fix added a check for FUNC_ITEM: if the reference points to a function item, the code falls through to call print() on that FUNC_ITEM instead of just using the reference’s name (all other parts of the if predicate will be true at this point, and ref_item()->real_item()->item_name returns the name of the reference, whereas we want to output the full function expression):
  if (!const_item() && m_alias_of_expr &&
+     ref_item()->type() != Item::FUNC_ITEM &&  // Initial fix
      ref_item()->type() != Item::CACHE_ITEM && ref_type() != VIEW_REF &&
      table_name == nullptr && item_name.ptr()) {
    Simple_cstring str1 = ref_item()->real_item()->item_name;
    append_identifier(thd, str, str1.ptr(), str1.length());
  } else {
    ref_item()->print(thd, str, query_type);
  }
But recording the failing tests revealed this broke legitimate cases where a column reference in a HAVING clause should print as just the column name. For example:
SELECT COALESCE(col, 0) AS alias FROM t HAVING alias > 5
The HAVING clause should be printed as “alias”, not “COALESCE(col, 0)”, but the fix above produced the latter.
Understanding the context with time travel debugging
The key insight was that the problematic behavior only occurs during a specific operation: when pushing WHERE conditions down to derived tables and serializing expressions for re-parsing. The Item *Query_block::clone_expression() function in sql_derived.cc is used for cloning expressions to push down conditions onto derived tables.

This function calls the static Item *parse_expression() function to serialize and reparse the Item which represents the ca column of the derived table. Inside parse_expression(), before it calls print() on the Item (recursing down the representation of the ca column definition, ultimately regenerating the string representation of the convert_tz() function call), it sets a state variable in the thread data structure to provide context for the operation at hand: specifically, thd->lex->reparse_derived_table_condition is set to true for the duration of the recursive print().
This context flag was the breakthrough. By stepping through both the original bug and the failing tests, I could see:
- During normal query printing: the flag is false
- During derived table condition pushdown: the flag is true
The refined fix
The refined patch leverages this context:
  if (!const_item() && m_alias_of_expr &&
+     (ref_item()->type() != Item::FUNC_ITEM ||
+      !thd->lex->reparse_derived_table_condition) &&
      ref_item()->type() != Item::CACHE_ITEM && ref_type() != VIEW_REF &&
      table_name == nullptr && item_name.ptr()) {
This ensures that:
- During normal operations (HAVING/WHERE clauses): aliases are preserved
- During derived table condition pushdown: function expressions are fully serialized
The optimizer trace mystery
Even with the fix applied, the optimizer trace reveals an interesting anomaly. The “expanded_query” shows:
select coalesce(`temp`.`timezone`,'+00:00') AS `tz`, convert_tz(`temp`.`createdAt`,'+00:00',`tz`) AS `ca`, if(`temp`.`createdAt`,if((1 = 1, 1, 1), 1) AS `random` from `temp`
This expanded query is actually invalid SQL; MySQL would report “Unknown column 'tz' in 'field list'” because tz is being referenced in the same SELECT where it is defined.
However, the “original_condition” in the trace shows the critical difference that caused the bug (`tz` was replaced by `temp`.`timezone`), and the patch now fixes this so the coalesce() function is evaluated correctly, effectively the same as `tz` in the derived table:
With the fix:
"original_condition": "(convert_tz(`temp`.`createdAt`,'+00:00',coalesce(`temp`.`timezone`,'+00:00')) = '2022-02-15 16:47:12')"
Without the fix:
"original_condition": "(convert_tz(`temp`.`createdAt`,'+00:00',`temp`.`timezone`) = '2022-02-15 16:47:12')"
This reveals that the fix works at the crucial moment, when the condition is being serialized for pushdown, even though the trace’s “expanded_query” representation remains confusing. There is potentially a better location further upstream to address this bug, which a MySQL expert could identify; with my limited understanding of the full codebase, I could not see how to fix it upstream while carrying down the context needed to clone the query properly.
Lessons for complex debugging
1. Recordings democratize debugging
You don’t need years of codebase experience. You need observable behavior and the ability to trace execution.
2. AI assists but doesn’t lead
Use AI for navigation and hypothesis generation, but verify everything against the recording. Remember that AI can be very convincing when it explains what it thinks it understands: question everything.
3. Evidence beats expertise
My lack of MySQL knowledge was overcome by concrete runtime evidence. The recording doesn’t lie.
4. Test comprehensively and iteratively
A fix that solves one problem while creating two others is not a fix. Time travel debugging lets you record failing tests and understand exactly why your fix breaks them. Remember that the AI is likely focusing on the bug details it was given, not looking for better or more general fixes.
5. Context is everything
The same function (Item_ref::print()) needs different behavior in different contexts. Time travel debugging reveals these contexts by letting you trace the full call stack and understand the state at each point.
6. Architecture reveals itself
By stepping through execution, surprising implementations (like serialize/reparse cloning) reveal themselves. The reparse_derived_table_condition flag discovery came from methodically tracing the execution path.
7. Question upstream (or backwards in time)
The fact that the optimizer trace shows invalid SQL suggests there might be a better fix location upstream. However, without deep MySQL knowledge, the pragmatic fix at the serialization point is defensible.
8. Document edge cases
The refined patch specifically handles the derived table pushdown case while preserving normal behavior. This specificity should be documented in the code so the intent of the patch is clear if further conditions ever exhibit erroneous behavior here.
9. Tracers can mislead you
The optimizer trace helped identify the problem but also shows confusing intermediate states. Understanding what the trace represents versus what actually executes is important.
The debugging protocol
For anyone facing a complex bug in unfamiliar code:
- Record first: Capture the failing behavior
- Observe differences: Record working cases for comparison
- Trace execution: Follow data flow, not code structure
- Verify everything: Every AI suggestion gets runtime verification
- Find corruption points: When data goes bad, reverse-next and dive deeper
- Test and re-record: When fixes fail tests, record those too
- Understand context: Trace back to understand when and why code is called
- Document evidence: Build a trail of verified facts
Conclusion
Time travel debugging transforms debugging from an exercise in expertise to an exercise in investigation. Combined with AI assistance, properly managed to avoid overconfidence, it enables developers to tackle bugs in any codebase, no matter how complex or unfamiliar.
The MySQL bug that seemed impossibly complex to someone who had never seen the codebase became a straightforward investigation with the right tools. The AI that initially led us astray became invaluable when constrained by runtime evidence. Even when our first fix proved incomplete, time travel debugging allowed us to understand why and refine our solution.
The key insight wasn’t just finding where the bug occurred, but understanding the different contexts in which the code operates. The reparse_derived_table_condition flag discovery exemplifies how time travel debugging reveals not just what code does, but when and why it does it.
The next time you face a bug in unfamiliar territory, remember: you don’t need to be an expert in the code. You just need to be competent at investigation. Record the behavior, trace the execution, verify every assumption, test comprehensively, and let the evidence guide you to the complete solution.
Bugs do not hide in code; they reveal themselves in execution. You just need the right tools to see them, and the discipline to keep investigating until the fix is truly complete.
Epilogue: when great minds debug alike
After submitting my findings to MySQL bug #113097, the MySQL Verification Team provided an unexpected response:
Thank you for the feedback.
Confirmed from the development team that this issue has been already fixed by an internally reported bug Bug #35710183.
Despite bug #113097 remaining open for months, it had already been fixed internally. Their solution, committed to MySQL 8.0.36, was remarkably similar to mine, with one key difference that perfectly illustrates the value of domain expertise.
Comparing the fixes
My proposed fix:
  if (!const_item() && m_alias_of_expr &&
+     (ref_item()->type() != Item::FUNC_ITEM ||
+      !thd->lex->reparse_derived_table_condition) &&
      ref_item()->type() != Item::CACHE_ITEM && ref_type() != VIEW_REF &&
      table_name == nullptr && item_name.ptr()) {
The official MySQL fix:
- if (!const_item() && m_alias_of_expr &&
+ if (!thd->lex->reparse_derived_table_condition && m_alias_of_expr &&
      ref_item()->type() != Item::CACHE_ITEM && ref_type() != VIEW_REF &&
      table_name == nullptr && item_name.ptr()) {
The core insight was identical: use the reparse_derived_table_condition flag to determine when to print the full expression versus the alias. However, their fix was more general; they removed the const_item() check entirely and didn’t limit the new condition to FUNC_ITEM cases.
The optimizer trace mystery persists
Interestingly, even with the official fix in MySQL 8.0.36, the optimizer trace anomaly I discovered remains. The trace still shows:
select coalesce(`temp`.`timezone`,'+00:00') AS `tz`, convert_tz(`temp`.`createdAt`,'+00:00',`tz`) AS `ca`, if(`temp`.`createdAt`,if((1 = 1, 1, 1), 1) AS `random` from `temp`
Here tz is still being referenced in the same SELECT where it is defined, which is invalid. While the actual query execution now works correctly (the fix ensures proper serialization during condition pushdown), the trace representation issue suggests there may be additional refinements needed in how MySQL generates its optimizer traces.
Why might MySQL generate invalid expanded queries?
It seems odd that MySQL’s optimizer trace would output SQL that isn’t semantically equivalent to the original query. The transformation from (select tz) to just tz breaks SQL scoping rules: in SQL, column aliases can’t be referenced in the same SELECT clause where they’re defined, but (select tz) creates a scalar subquery context that makes the reference valid (thank you, Claude, for explaining that one to me).
The expanded_query may show an intermediate internal representation rather than valid SQL. MySQL may be displaying how it conceptually transforms queries before applying proper scoping rules, prioritizing clarity of transformations over syntactic validity in this debugging output. At this point I very much leave this one to the MySQL team to determine if this is a bug or a feature.
This is where time travel debugging proved invaluable. While the optimizer trace produced confusing output that seemed to contradict SQL semantics, the recording let us see exactly what MySQL actually executed at runtime. We could verify that the real issue was in the serialization during clone_expression(), not in some mysterious scope-breaking transformation. When built-in debugging tools themselves produce misleading output, being able to trace actual execution becomes essential for finding ground truth.
The power of time travel debugging validated
This outcome validates the entire debugging approach. Starting with zero MySQL knowledge, using time travel debugging and carefully managed AI assistance, I arrived at the same fundamental solution as the MySQL experts who were fixing a different bug (#35710183: “mysql server 8.1.0 crashed at Item_ident::walk”).
Their commit message reveals they encountered a similar issue with user variables during condition pushdown, concluding: “In case of condition pushdown, we always want the original expression to be printed instead of the alias.”
Conservative vs. general solutions
The difference between our fixes perfectly illustrates the expertise gap:
- My fix: Conservative, handling only the specific case I could observe and verify (function references)
- Their fix: General, handling all cases where condition pushdown occurs
I couldn’t know that the fix could be more general without understanding the full scope of condition pushdown scenarios. The MySQL team, having encountered this pattern with user variables, recognized it as a broader issue requiring a comprehensive solution.
The debugging method stands strong
This outcome reinforces the key lessons:
- Time travel debugging enables accurate root cause analysis – I found the exact problem location and mechanism
- Observable evidence leads to correct solutions – The fix addressed the actual issue, just more conservatively
- Domain expertise refines, but doesn’t replace investigation – The experts made a more general fix, but the core insight was the same
- AI assistance works when constrained by evidence – Claude helped navigate, but the recording provided truth
The fact that bug #113097 was closed only after my submission, despite being fixed internally, suggests that connecting symptoms to root causes across different bug reports isn’t always straightforward, even for experts. Time travel debugging cut through this complexity by following the execution directly.
Final reflection
Starting from complete unfamiliarity with MySQL’s codebase, time travel debugging led me to implement the same fundamental fix as the official developers. The only difference was scope: they knew it was safe to apply the fix more broadly.
This experience demonstrates that modern debugging tools truly democratize complex problem-solving. You don’t need to be an expert to find and fix bugs in complex systems. You need the right tools, a systematic approach, and the discipline to follow evidence rather than assumptions.
The bug wasn’t hiding; it was revealing itself in execution, waiting for someone with the right tools to see it. Whether that someone is a MySQL expert or a complete newcomer with a time travel debugger, the truth remains the same.