How Time Travel Debugging Tames Complex Codebases and Overconfident AI
Author: Chris Croft-White, Staff Solutions Architect at Undo
As a C and C++ developer, I recently found myself in unfamiliar territory: debugging the MySQL optimizer with zero prior knowledge of the codebase. What made this possible wasn’t years of MySQL expertise, but rather the combination of time travel debugging and carefully managed AI assistance. This is the story of how modern debugging tools can empower you to tackle even the most complex, unfamiliar systems, and how to prevent AI overconfidence from leading you astray.
The challenge: debugging the unknown
It started with MySQL bug #113097: a query returning an empty set when it should return data. I had never touched the MySQL codebase before. I didn’t even fully understand the SQL query in question. Traditional debugging wisdom would suggest this was foolhardy: how can you debug what you don’t understand?
This is where time travel debugging fundamentally changes the game: I could capture the bug happening and then explore it forensically, learning the codebase through observation rather than study.
The power of observable behavior
My first step wasn’t to read MySQL internals documentation. Instead, I did the following:
- Built MySQL with debug symbols,
- Reproduced the issue,
- Recorded the entire execution with Undo,
- Started exploring.
The recording immediately revealed crucial information:
- The issue was 100% reproducible (likely ruling out a race condition),
- Changing the WHERE clause column made it work (suggesting an algorithmic issue, not a data issue),
- The derived table had zero rows for the failing query.
Without reading a single line of documentation, I could see the behavior difference between working and failing queries. Time travel debugging turned the opaque MySQL server into a glass box.
Learning through exploration
Not being an SQL expert, I used ChatGPT to explain what the query from the bug should do. Then I used the debugger to see what it actually did. This gap between expectation and reality became my guide.
CREATE TABLE temp (createdAt datetime, timezone varchar(64));
INSERT INTO temp VALUES('2022-02-15 16:47:12', null);
SELECT * FROM (
  SELECT coalesce(timezone, '+00:00') tz,
         convert_tz(createdAt, '+00:00', (select tz)) ca,
         if(createdAt, if((select 1) = 1,1,1), 1) random
  FROM temp
) aux
WHERE ca = '2022-02-15 16:47:12';
The query should return a single row, but instead it returns the empty set.
I searched the codebase to find the two functions used in the column definitions. Setting breakpoints on the coalesce and convert_tz functions, I discovered something fascinating: the third argument passed to convert_tz changed from (select tz) to just tz when the outer WHERE clause changed from a working query (WHERE tz = '+00:00', which returns the correct single row of data, with the ca column containing '2022-02-15 16:47:12') to the failing one above. This was observable, concrete evidence, not speculation.
The time travel debugger let me:
- Start from the network event receiving the query
- Follow execution forward to see where it diverged
- Jump backward when I went too deep
- Compare execution paths between working and failing queries side-by-side
The AI partnership: benefits and dangers
Armed with concrete observations, I engaged Claude Code. Here’s where the story becomes instructive about AI-assisted debugging.
First session: the overconfidence trap
My first Claude Code session immediately fell into the classic AI trap. Despite my careful observations, Claude latched onto a plausible pattern it recognized: “This is clearly a subquery handling issue in condition pushdown.” It confidently produced a patch adding validation for subqueries.
The problem? The patch checked item->has_subquery(), which returned false at runtime. Claude had assumed (select tz) would be an Item::SUBSELECT_ITEM, but MySQL had already optimized it to an Item::REF_ITEM. The entire patch was based on a false assumption. Claude was very confident in its hypothesis, and very convincing to someone who doesn’t know the codebase when explaining the hypothesis, the root cause, and the proposed patch.
Second session: evidence-based success
Before the second session with Claude Code, I discovered MySQL’s optimizer tracing, which revealed the exact transformation: (select tz) was becoming temp.timezone in the “original_condition”. Armed with this concrete evidence, I started fresh with Claude Code.
This time, I insisted on runtime verification at every step:
Me: “I have a recording of this bug. Let’s verify everything at runtime.”
Claude: “The issue appears to be with subquery handling…”
Me: “Let’s check. What’s the actual type at this breakpoint?”
[Result: Item::REF_ITEM, not SUBSELECT_ITEM]
The recording allowed us to:
- Verify field indexes were correct (tz=0, ca=1)
- See that the REF_ITEM correctly pointed to COALESCE initially
- Trace through transformations to find corruption during clone_expression()
- Discover the unexpected serialize/reparse implementation
- Pinpoint the exact line where serialization went wrong
Time travel debugging: the great equalizer
What made it possible for someone with zero MySQL experience to find this bug? Time travel debugging provided capabilities that traditional debugging cannot:
1. Non-destructive exploration
I could aggressively explore code paths, knowing reverse-next would bring me back. No need to restart, no lost state, no fear of going too deep.
2. Comparative analysis
With recordings of both working and failing queries, I could literally diff the execution paths. Where did they diverge? What was different? Observable facts, not theories.
3. Evidence collection
Every assumption could be immediately verified:
- “The field index might be wrong” → Check it
- “The expression might be corrupted” → Examine it
- “This might be a race condition” → Run it multiple times in the recording
4. Learning by observation
I learned MySQL’s architecture by watching it run:
- How derived tables are created,
- How expressions are transformed,
- How the optimizer works,
- How cloning is implemented (serialize/reparse, who knew?)
5. Precision root cause analysis
When we found data corruption happened inside a function, we could:
reverse-next   # Step back to before corruption
step           # Step forward into the corrupting function
               # Repeat until we find the exact line
The journey to root cause
The successful debugging session revealed a fascinating bug:
- Observable symptom: WHERE clause on derived table returns no results
- First evidence: convert_tz third argument changes with the outer WHERE clause
- Deeper investigation: Optimizer trace shows (select tz) becomes temp.timezone
- Runtime discovery: (select tz) is optimized to REF_ITEM, not SUBSELECT_ITEM
- Transformation tracking: REF_ITEM correctly points to COALESCE initially
- Corruption point: During clone_expression()
- Architectural surprise: Cloning uses serialize/reparse
- Root cause: Item_ref::print() prints just “tz” instead of the full expression
- Initial fix: Add check for FUNC_ITEM in print condition
The patch that wasn’t complete
Just when we thought we had solved the MySQL optimizer bug, reality provided a humbling lesson: once we got the MySQL main testsuite running, our fix broke other tests. Time travel debugging once again proved invaluable: I could record the failing tests and explore exactly why our patch caused problems.
Discovering the real complexity
The initial fix added a check for FUNC_ITEM: if the reference points to a function item, the code falls through to call print() on that FUNC_ITEM instead of just using the reference’s name (all other parts of the if predicate will be true at this point, and ref_item()->real_item()->item_name returns the name of the reference, whereas we want to output the full function expression):
  if (!const_item() && m_alias_of_expr &&
+     ref_item()->type() != Item::FUNC_ITEM &&  // Initial fix
      ref_item()->type() != Item::CACHE_ITEM && ref_type() != VIEW_REF &&
      table_name == nullptr && item_name.ptr()) {
    Simple_cstring str1 = ref_item()->real_item()->item_name;
    append_identifier(thd, str, str1.ptr(), str1.length());
  } else {
    ref_item()->print(thd, str, query_type);
  }
But recording the failing tests revealed this broke legitimate cases where a column reference in a HAVING clause should print as just the column name. For example:
SELECT COALESCE(col, 0) AS alias FROM t HAVING alias > 5
The HAVING clause should be printed as “alias”, not “COALESCE(col, 0)”, but the fix above produced the latter.
Understanding the context with time travel debugging
The key insight was that the problematic behavior only occurs during a specific operation: when pushing WHERE conditions down to derived tables and serializing expressions for re-parsing. The Item *Query_block::clone_expression() function in sql_derived.cc is used for cloning expressions to push down conditions onto derived tables.

This function calls the static Item *parse_expression() function to serialize and reparse the Item which represents the ca column of the derived table. Inside parse_expression(), before it calls print() on the Item (recursing down the representation of the ca column definition, ultimately regenerating the string representation of the convert_tz() function call), it sets a state variable in the thread data structure to provide context for the operation at hand: specifically, thd->lex->reparse_derived_table_condition is set to true for the duration of the recursive print().
This context flag was the breakthrough. By stepping through both the original bug and the failing tests, I could see:
- During normal query printing: the flag is false
- During derived table condition pushdown: the flag is true
The refined fix
The refined patch leverages this context:
  if (!const_item() && m_alias_of_expr &&
+     (ref_item()->type() != Item::FUNC_ITEM ||
+      !thd->lex->reparse_derived_table_condition) &&
      ref_item()->type() != Item::CACHE_ITEM && ref_type() != VIEW_REF &&
      table_name == nullptr && item_name.ptr()) {
This ensures that:
- During normal operations (HAVING/WHERE clauses): aliases are preserved
- During derived table condition pushdown: function expressions are fully serialized
The optimizer trace mystery
Even with the fix applied, the optimizer trace reveals an interesting anomaly. The “expanded_query” shows:
select coalesce(`temp`.`timezone`,'+00:00') AS `tz`, convert_tz(`temp`.`createdAt`,'+00:00',`tz`) AS `ca`, if(`temp`.`createdAt`,if((1 = 1, 1, 1), 1) AS `random` from `temp`
This expanded query is actually invalid SQL; MySQL would report “Unknown column 'tz' in 'field list'” because tz is being referenced in the same SELECT where it is defined.
However, the “original_condition” in the trace shows the critical difference that caused the bug (`tz` was replaced by `temp`.`timezone`), and the patch now fixes this so the coalesce() function is evaluated correctly, effectively the same as `tz` in the derived table:
With the fix:
"original_condition": "(convert_tz(`temp`.`createdAt`,'+00:00',coalesce(`temp`.`timezone`,'+00:00')) = '2022-02-15 16:47:12')"
Without the fix:
"original_condition": "(convert_tz(`temp`.`createdAt`,'+00:00',`temp`.`timezone`) = '2022-02-15 16:47:12')"
This reveals that the fix works at the crucial moment, when the condition is being serialized for pushdown, even though the trace’s “expanded_query” representation remains confusing. There is potentially a better location further upstream to address this bug, which a MySQL expert could identify; with my limited understanding of the full codebase, I could not see how to fix it upstream while carrying down the context needed to clone the query properly.
Lessons for complex debugging
1. Recordings democratize debugging
You don’t need years of codebase experience. You need observable behavior and the ability to trace execution.
2. AI assists but doesn’t lead
Use AI for navigation and hypothesis generation, but verify everything against the recording. Remember that AI can be very convincing when it explains what it thinks it understands: question everything.
3. Evidence beats expertise
My lack of MySQL knowledge was overcome by concrete runtime evidence. The recording doesn’t lie.
4. Test comprehensively and iteratively
A fix that solves one problem while creating two others is not a fix. Time travel debugging lets you record failing tests and understand exactly why your fix breaks them. Remember that the AI is likely focusing on the bug details it was given, not looking for better or more general fixes.
5. Context is everything
The same function (Item_ref::print()) needs different behavior in different contexts. Time travel debugging reveals these contexts by letting you trace the full call stack and understand the state at each point.
6. Architecture reveals itself
By stepping through execution, surprising implementations (like serialize/reparse cloning) reveal themselves. The reparse_derived_table_condition flag discovery came from methodically tracing the execution path.
7. Question upstream (or backwards in time)
The fact that the optimizer trace shows invalid SQL suggests there might be a better fix location upstream. However, without deep MySQL knowledge, the pragmatic fix at the serialization point is defensible.
8. Document edge cases
The refined patch specifically handles the derived table pushdown case while preserving normal behavior. This specificity should be documented in the code so the intent of the patch is clear if further conditions ever exhibit erroneous behavior here.
9. Tracers can mislead you
The optimizer trace helped identify the problem but also shows confusing intermediate states. Understanding what the trace represents versus what actually executes is important.
The debugging protocol
For anyone facing a complex bug in unfamiliar code:
- Record first: Capture the failing behavior
- Observe differences: Record working cases for comparison
- Trace execution: Follow data flow, not code structure
- Verify everything: Every AI suggestion gets runtime verification
- Find corruption points: When data goes bad, reverse-next and dive deeper
- Test and re-record: When fixes fail tests, record those too
- Understand context: Trace back to understand when and why code is called
- Document evidence: Build a trail of verified facts
Conclusion
Time travel debugging transforms debugging from an exercise in expertise to an exercise in investigation. Combined with AI assistance, properly managed to avoid overconfidence, it enables developers to tackle bugs in any codebase, no matter how complex or unfamiliar.
The MySQL bug that seemed impossibly complex to someone who had never seen the codebase became a straightforward investigation with the right tools. The AI that initially led us astray became invaluable when constrained by runtime evidence. Even when our first fix proved incomplete, time travel debugging allowed us to understand why and refine our solution.
The key insight wasn’t just finding where the bug occurred, but understanding the different contexts in which the code operates. The reparse_derived_table_condition flag discovery exemplifies how time travel debugging reveals not just what code does, but when and why it does it.
The next time you face a bug in unfamiliar territory, remember: you don’t need to be an expert in the code. You just need to be competent at investigation. Record the behavior, trace the execution, verify every assumption, test comprehensively, and let the evidence guide you to the complete solution.
Bugs do not hide in code; they reveal themselves in execution. You just need the right tools to see them, and the discipline to keep investigating until the fix is truly complete.
Epilogue: when great minds debug alike
After submitting my findings to MySQL bug #113097, the MySQL Verification Team provided an unexpected response:
Thank you for the feedback.
Confirmed from the development team that this issue has been already fixed by an internally reported bug Bug #35710183.
Despite bug #113097 remaining open for months, it had already been fixed internally. Their solution, committed to MySQL 8.0.36, was remarkably similar to mine, with one key difference that perfectly illustrates the value of domain expertise.
Comparing the fixes
My proposed fix:
  if (!const_item() && m_alias_of_expr &&
+     (ref_item()->type() != Item::FUNC_ITEM ||
+      !thd->lex->reparse_derived_table_condition) &&
      ref_item()->type() != Item::CACHE_ITEM && ref_type() != VIEW_REF &&
      table_name == nullptr && item_name.ptr()) {
The official MySQL fix:
- if (!const_item() && m_alias_of_expr &&
+ if (!thd->lex->reparse_derived_table_condition && m_alias_of_expr &&
      ref_item()->type() != Item::CACHE_ITEM && ref_type() != VIEW_REF &&
      table_name == nullptr && item_name.ptr()) {
The core insight was identical: use the reparse_derived_table_condition flag to determine when to print the full expression versus the alias. However, their fix was more general; they removed the const_item() check entirely and didn’t limit the new condition to FUNC_ITEM cases.
The optimizer trace mystery persists
Interestingly, even with the official fix in MySQL 8.0.36, the optimizer trace anomaly I discovered remains. The trace still shows:
select coalesce(`temp`.`timezone`,'+00:00') AS `tz`, convert_tz(`temp`.`createdAt`,'+00:00',`tz`) AS `ca`, if(`temp`.`createdAt`,if((1 = 1, 1, 1), 1) AS `random` from `temp`
Here tz is still being referenced in the same SELECT where it is defined, which is invalid. While the actual query execution now works correctly (the fix ensures proper serialization during condition pushdown), the trace representation issue suggests there may be additional refinements needed in how MySQL generates its optimizer traces.
Why might MySQL generate invalid expanded queries?
It seems odd that MySQL’s optimizer trace would output SQL that isn’t semantically equivalent to the original query. The transformation from (select tz) to just tz breaks SQL scoping rules: in SQL, column aliases can’t be referenced in the same SELECT clause where they’re defined, but (select tz) creates a scalar subquery context that makes the reference valid (thank you, Claude, for explaining that one to me).
The expanded_query may show an intermediate internal representation rather than valid SQL. MySQL may be displaying how it conceptually transforms queries before applying proper scoping rules, prioritizing clarity of transformations over syntactic validity in this debugging output. At this point I very much leave this one to the MySQL team to determine if this is a bug or a feature.
This is where time travel debugging proved invaluable. While the optimizer trace produced confusing output that seemed to contradict SQL semantics, the recording let us see exactly what MySQL actually executed at runtime. We could verify that the real issue was in the serialization during clone_expression(), not in some mysterious scope-breaking transformation. When built-in debugging tools themselves produce misleading output, being able to trace actual execution becomes essential for finding ground truth.
The power of time travel debugging validated
This outcome validates the entire debugging approach. Starting with zero MySQL knowledge, using time travel debugging and carefully managed AI assistance, I arrived at the same fundamental solution as the MySQL experts who were fixing a different bug (#35710183: “mysql server 8.1.0 crashed at Item_ident::walk”).
Their commit message reveals they encountered a similar issue with user variables during condition pushdown, concluding: “In case of condition pushdown, we always want the original expression to be printed instead of the alias.”
Conservative vs. general solutions
The difference between our fixes perfectly illustrates the expertise gap:
- My fix: Conservative, handling only the specific case I could observe and verify (function references)
- Their fix: General, handling all cases where condition pushdown occurs
I couldn’t know that the fix could be more general without understanding the full scope of condition pushdown scenarios. The MySQL team, having encountered this pattern with user variables, recognized it as a broader issue requiring a comprehensive solution.
The debugging method stands strong
This outcome reinforces the key lessons:
- Time travel debugging enables accurate root cause analysis – I found the exact problem location and mechanism
- Observable evidence leads to correct solutions – The fix addressed the actual issue, just more conservatively
- Domain expertise refines, but doesn’t replace investigation – The experts made a more general fix, but the core insight was the same
- AI assistance works when constrained by evidence – Claude helped navigate, but the recording provided truth
The fact that bug #113097 was closed only after my submission, despite being fixed internally, suggests that connecting symptoms to root causes across different bug reports isn’t always straightforward, even for experts. Time travel debugging cut through this complexity by following the execution directly.
Final reflection
Starting from complete unfamiliarity with MySQL’s codebase, time travel debugging led me to implement the same fundamental fix as the official developers. The only difference was scope: they knew it was safe to apply the fix more broadly.
This experience demonstrates that modern debugging tools truly democratize complex problem-solving. You don’t need to be an expert to find and fix bugs in complex systems. You need the right tools, a systematic approach, and the discipline to follow evidence rather than assumptions.
The bug wasn’t hiding; it was revealing itself in execution, waiting for someone with the right tools to see it. Whether that someone is a MySQL expert or a complete newcomer with a time travel debugger, the truth remains the same.