Search for articles on agentic coding and you’ll find the same story repeated. Someone builds a small application without writing a line by hand – sometimes an internal tool, sometimes a side project, always a clean greenfield prototype. The stories are true, and the technology can do this. What’s missing is the next question: does it scale to a production system with users, history, and stakes? When you search for that, the articles thin out. The ones that do exist mostly catalogue the problems without solving them.
That gap was why I stayed skeptical about agentic development. The prototype demos were impressive, but I couldn’t see the bridge from a weekend tool to a system real people depended on. And the writing wasn’t building that bridge for me either.
Then I joined a team that was already running 100% agentic development on a production-scale system – complex enough that the answer mattered, with real users on the way – and the question stopped being theoretical. I saw the approach work at that scale. I also saw where it didn’t – the practices were partial, and a lot of the gotchas hadn’t been worked out yet. So I kept working, on that project and on the ones after it, all in the same pure agentic mode, and kept filling the gaps as I hit them.
Now I have enough to share. This is the first piece in the series.
What vibe coding actually is
Vibe coding, briefly: describe what you want, accept what the agent produces, ship it. Don’t review the code. Trust the model.
For a prototype, that’s fine. The whole point of a prototype is to be cheap to discard. Vibe coding gets you from idea to demo faster than anything else available, and it puts working software in the hands of people who never learned to code. That’s a genuinely new thing in the world. I’m not dismissing it.
But “fast from idea to demo” is a different problem from “ship and maintain a production system.” The gap between those two problems is where most agentic coding writing currently has nothing useful to say.
Where vibe coding breaks
The failure mode isn’t that the model is too dumb. The failure mode is that nobody reviewed the decisions.
Here is what happens. Every time you ask the agent for a new feature, it makes a set of locally reasonable choices based on the context it has at that moment. You never give it the full roadmap – because you don’t have the full roadmap yourself, or because it doesn’t fit in context, or because you’re working feature by feature and haven’t thought that far ahead. So the agent assumes. And the assumptions accumulate.
For a while, nothing is visibly wrong. Each feature works. Tests pass. The application runs. The drift is invisible only because you never look at the code.
Then one day you hit a wall. The next feature you ask for doesn’t fit. The application throws errors in places you don’t recognize, and the agent’s fixes don’t converge because the problem isn’t local – the global structure of the code was never planned, and now it’s incoherent. Because you never looked at the code, you can’t help. You don’t know what was built, where the wrong assumption is buried, or which of the agent’s fixes is making things worse. The system is too large to throw away. You have no map to navigate it with.
This wall is well documented at this point. Analyst forecasts predict that a large share of AI-generated codebases will be cancelled or rewritten. Academic papers name it the “flow-debt tradeoff.” Practitioner reports use vivid labels like “the pit of despair” for the state of trying to unwind it. What’s missing isn’t acknowledgment of the problem. What’s missing is a working solution that holds up at production scale. Most of the prescriptions on offer are governance abstractions – “separate agent roles,” “enforce project memory,” “gate on test coverage” – rather than a day-to-day practice you can actually run.
The fix is getting the developer back into the loop
The fix isn’t using less AI. Going back to coding by hand gives up the whole productivity gain, and the agent really does code well – across stacks, on production systems, at a speed no one on the team could match. What it does badly is own decisions with global consequences in a system it can’t see all of.
So you put the developer back in. The agent codes. The developer owns everything else – intent, planning, architecture, review, validation, and the judgment that ties them together. This isn’t a simple handoff. Both sides do real work – the agent executes fast across the full codebase, the developer keeps the decisions intentional and the system coherent. Each part of what the developer owns is its own topic, with its own gotchas and failure modes.
Two of them are where to start tomorrow. First, always start in planning mode. The agent maps out what it’s going to do before writing any code – you get a review point before any time is spent, and the assumptions get surfaced where you can still answer them. Second, review the changes. Not line by line, but with enough attention to catch the moments the agent did something nobody would have signed off on. Neither is a magic switch. Planning mode done wrong still produces drift. Review without knowing what to look for misses the problems that matter. Both have enough underneath them to fill a piece each, and I’ll get there. But skip either and you end up back at the wall.
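For readers who think in code, here is one way to picture those two checkpoints. It is a minimal sketch, not a description of any real agent tool: Plan, Change, run_task, and the four callables are all hypothetical names made up for illustration. The only point is the ordering, with the plan approved before any code exists and the changes reviewed before anything is kept.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    """Hypothetical: what the agent says it will do, before writing code."""
    steps: List[str]
    assumptions: List[str] = field(default_factory=list)


@dataclass
class Change:
    """Hypothetical: a summary of one edit the agent made."""
    path: str
    summary: str


def run_task(
    propose_plan: Callable[[], Plan],
    approve_plan: Callable[[Plan], bool],
    execute: Callable[[Plan], List[Change]],
    review: Callable[[Change], bool],
) -> List[Change]:
    """Run one task with the developer in the loop at two points.

    Returns the changes the developer accepted. Raises ValueError if the
    plan is rejected, so nothing is built on unreviewed assumptions.
    """
    plan = propose_plan()
    # Checkpoint 1: assumptions are surfaced while they are still cheap to fix.
    if not approve_plan(plan):
        raise ValueError("plan rejected; revise the plan before coding")
    changes = execute(plan)
    # Checkpoint 2: keep only the changes the developer signs off on.
    # (In practice a rejected change would go back to the agent.)
    return [change for change in changes if review(change)]


if __name__ == "__main__":
    plan = Plan(
        steps=["add password reset endpoint"],
        assumptions=["users table has an email column"],
    )
    produced = [
        Change("api.py", "add password reset endpoint"),
        Change("db.py", "drop unused column"),  # nobody asked for this
    ]
    accepted = run_task(
        propose_plan=lambda: plan,
        approve_plan=lambda p: True,
        execute=lambda p: produced,
        review=lambda c: "drop" not in c.summary,
    )
    print([c.path for c in accepted])
```

In this toy run the developer approves the plan, then rejects the change that drops a column nobody asked about. The real judgment lives inside approve_plan and review, which is exactly the part a sketch like this can’t supply.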
That’s the shape of it. The developer doesn’t type anymore. But the developer is in the loop on every decision that shapes the system – and that’s what makes this pair programming and not vibe coding. Coding by hand is gone. Decision ownership isn’t.
Where this goes next
This is the start of a series. The setup that makes the loop work, the practices inside the loop that took me the longest to figure out, the gotchas nobody warned me about, the tools I actually use, what changes when a whole team runs this way – I’ll get into all of it across the pieces that follow.
Pair programming with an agent is a moving target. Vendors ship, models get better, last year’s practice stops being the best option. I’m not going back to writing code by hand, and I’ll keep filling gaps as I hit them – in the work, and in the writing. The next piece gets into the first one.
