Then it hit me: we’re tracking version history wrong in the AI era. Or I guess, we’re not tracking it at all.
Let’s consider prompting Claude Code as its own programming language. When I write “modify our user authentication system to handle edge case X and constraint Y,” that’s not just a request - that’s source code. The actual JavaScript it outputs is the ‘compiled’ result.
But we’re then only tracking the git history of the codebase at the “assembly” level. It’s as if you could only check in your compiled binaries and always threw away your original C code. Every time you wanted to modify your program, you’d have to reverse-engineer what you originally meant to write from the assembly output.
Why don’t we track the original inputs? The conversation contains the real logic:

* Requirements and constraints
* Edge cases discovered through iteration
* Why we rejected certain approaches
* Business context that shaped decisions
Right now, all of that reasoning just… disappears. We’re both (me and Claude) left with just the code and have to guess at the intent.
The scale problem is real - you can’t just dump entire conversation threads into version control. A single coding session might be 50k tokens of back-and-forth, and most of that is noise. The signal is in specific moments: the user prompt and the agent reasoning that led to each code modification.
What if we tracked it line by line? Claude Code already works line by line - when it edits code, it rewrites entire lines. We could tag every line of code with conversation IDs: store all Claude Code conversations as JSON, where each prompt and each piece of agent reasoning gets its own ID. When a prompt makes the agent think and produce a tool call that edits lines, those lines get their own metadata of conversation IDs pointing directly at why they were written.
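As a minimal sketch of what that JSON could look like (all of these names - `ConversationEntry`, `LineTag`, `entryIds` - are hypothetical, not an existing Claude Code format):

```ts
// One possible shape for a stored session. Each prompt / reasoning step
// gets a stable ID, and line tags point back at those IDs.

interface ConversationEntry {
  id: string;               // e.g. "P_047"
  role: "user" | "agent";   // user prompt vs. agent reasoning
  text: string;
}

interface LineTag {
  file: string;
  line: number;             // line number at the time of the edit
  entryIds: string[];       // the conversation moments behind this line
}

interface Session {
  sessionId: string;
  entries: ConversationEntry[];
  tags: LineTag[];
}

const session: Session = {
  sessionId: "2025-01-15-auth-edge-case",
  entries: [
    { id: "P_001", role: "user",  text: "I want to make this page only allow users who…" },
    { id: "P_047", role: "agent", text: "The user wants me to X, I should change the server file…" },
  ],
  tags: [
    { file: "src/auth.js", line: 23, entryIds: ["P_001", "P_047"] },
  ],
};
```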
Imagine browsing your codebase and seeing that line 23 has four conversation tags. Click to expand and see: “I want to make this page only allow users who… [P_001]”, “The user wants me to X, I should change the server file… [P_047]”, “make sure we also include… [P_089]”, “All tests passed but we still haven’t solved X edge case… [P_156]”.
You can trace the entire decision history behind every single line.
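Given the hypothetical `Session` shape above, that tracing is basically a blame-style lookup - a sketch, not a real tool:

```ts
// "Conversation blame": given a file and line, pull back the prompts and
// reasoning entries that produced it. Assumes the Session shape above.
function blameLine(session: Session, file: string, line: number): ConversationEntry[] {
  const ids = new Set(
    session.tags
      .filter((t) => t.file === file && t.line === line)
      .flatMap((t) => t.entryIds),
  );
  return session.entries.filter((e) => ids.has(e.id));
}

// blameLine(session, "src/auth.js", 23)
//   -> [{ id: "P_001", … }, { id: "P_047", … }]
```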
I’m sure there are implementation challenges I’m not seeing as a coding newcomer. We’ve figured out how to make AI write code that works - but we’re losing the most valuable part (the why) and only keeping the output.
Has anyone experimented with conversational version control? Are there technical reasons this wouldn’t work that I’m missing?
My suggestion is that you try using actual, literal Git for this, and then evaluate for yourself. Git doesn't care about programming language syntax. It cares about being able to feed text files to a diff algorithm.
Aside from saving versioned conversations as text files, if you have e.g. an entire commit generated from a prompt, you can include the prompt in the commit message.
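For example (the `.prompts/` path and the message wording are just one possible convention, nothing Git itself prescribes):

```sh
# Version the conversation as a plain text file next to the code it produced.
git add src/auth.js .prompts/2025-01-15-auth-edge-case.json

# Put the originating prompt in the commit message, with a pointer to the
# full conversation file. Each -m adds a separate paragraph.
git commit \
  -m "Handle edge case X in user authentication" \
  -m "Prompt: modify our user authentication system to handle edge case X and constraint Y" \
  -m "Conversation: .prompts/2025-01-15-auth-edge-case.json"
```

After that, `git log` recovers the "why" per commit and `git blame` gets you most of the way to per-line history, for free.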
At the moment I just use extensive docs to track decisions and business logic, but they’re static and constantly going stale.