The AI Productivity Lie Nobody Wants to Admit

I’ve been producing bad code. And it’s not because I forgot how to code.

I’ve tried every workflow. Terminal agents, IDE copilots, full vibe coding, augmented coding. I keep exploring because that’s what I do — evaluate, keep what works, move on from what doesn’t.

But here’s where I am right now: most of the time, it’s still more reliable for me to write the code myself than to let the agent do it.

When the AI writes it, yes — it’s faster sometimes. But then I review it. I find things. I correct things. And suddenly I’m in a loop where I’m either spending more time than if I just did it myself, or the same time doing a more tedious version of the work. I’m not writing code anymore. I’m auditing code I didn’t write and don’t fully trust.

And I’m starting to wonder what that’s doing to my engineering skills. Not because AI replaced them. Because the constant pressure to delegate everything is pulling me away from the deep thinking that built those skills in the first place.


The Pressure Nobody Talks About

The expectation to be dramatically more productive with AI is real. It’s coming from management, from Twitter, from inside your own head. Every demo makes it look like everyone else figured it out and you’re falling behind.

And yeah — AI can be a huge boost. But only if you have perfect project structure, perfect product context, perfect documentation.

That doesn’t exist. Not in any real codebase I’ve ever worked on.

You know what exists? Legacy code. Tech debt. Patterns that were “temporary” three years ago. Business logic that lives in someone’s head and nowhere else.

When you point an AI agent at that, it doesn’t fix the problems. It copies them. It amplifies them. It confidently reproduces your worst patterns at scale.

So now you’re not just dealing with tech debt. You’re dealing with AI-generated tech debt that looks clean because the agent formatted it nicely.


The Data That Should Make You Uncomfortable

Here’s what made me feel less crazy about all of this. And each study hits harder than the last.

You’re Not Even Faster

METR, a nonprofit research organization, ran what might be the most rigorous study on this topic to date. They took 16 experienced open-source developers — people who maintain large repositories, averaging 22,000+ stars and over a million lines of code — and had them work on real issues in their own codebases. Real bugs, real features, real refactors. Not toy problems.

Half the time they could use AI. Half the time they couldn’t.

The result? When developers used AI tools, they took 19% longer to complete their tasks.

Not faster. Slower.

But here’s the part that should genuinely unsettle you: before the study, these developers predicted AI would make them 24% faster. After using it — after actually experiencing the slowdown — they still believed AI had made them 20% faster.

19% slower. Felt 20% faster. A 39 percentage point gap between perception and reality.

This is what I call the speed mirage. You feel like you’re flying. The data says you’re walking backwards. And you can’t even tell.

You Understand Less of What You Ship

Anthropic, the company that makes Claude, ran a randomized controlled trial on their own tool. Developers using AI assistance scored 17% lower on comprehension tests compared to developers who coded manually. The AI group finished slightly faster, but that speed difference wasn’t even statistically significant.

Marginal speed gain. Real understanding loss.

It gets worse. The developers who delegated code generation to AI scored below 40% on comprehension. The ones who used AI for conceptual questions — asking “why” and “how does this work” — scored above 65%.

Same tool. Completely different outcomes depending on how you used it. And the biggest gap was in debugging — the skill you need most when things break in production.

The Industry Already Knows

Stack Overflow’s 2025 Developer Survey: 84% of developers use or plan to use AI tools, but only 33% trust the output. Down from 43% the year before.

Adoption up. Trust down.

So we’re not faster. We understand less. And we don’t even trust what we ship. But we keep using it because everyone else seems to have figured it out.


Throughput vs. Confidence

Let me name the thing clearly.

The industry is optimizing for throughput when it should be optimizing for confidence.

Throughput is lines of code generated. PRs opened. Features “shipped.” It’s the metric that looks good on a dashboard and falls apart in production.

Confidence is different. Do I understand what this code does? Do I trust it handles the edge cases? Can I debug it at 2 AM when something breaks?

Vibe coding optimizes for throughput. You get a dopamine spike. You feel productive. And then you spend the rest of the day cleaning up after the machine.

I’m not anti-AI. I use it every day. It’s incredible for researching tradeoffs, validating ideas, catching things I missed in review. When I use AI as a thinking partner, it genuinely makes me better.

But when I use it as a coding replacement, it makes my output worse. And that’s the gap the industry isn’t willing to talk about.


The Confidence Boundary

So where does this leave us?

I don’t have the perfect workflow. I’m not going to pretend I do. But I’ve been paying attention to what actually works, and the pattern is consistent.

The developers getting the best results from AI aren’t the ones who figured out the perfect prompt. They’re the ones who figured out what to delegate and what to keep.

I’ve been calling this the confidence boundary.

Here’s a real example. I had a feature to build recently. Instead of just opening the terminal and prompting the agent to do it, I stopped. I wrote the spec first. A clear, detailed explanation of what needed to be accomplished. The edge cases that the implementation had to survive. The constraints. The things I explicitly didn’t want.

Then I handed that to the agent and let it implement against my spec.

Because I did the thinking upfront, reviewing the output took minutes instead of hours. I knew exactly what should be there and what shouldn’t.

But here’s the thing nobody tells you — and this is where it gets uncomfortable.

To get a good result from the agent, you have to be very specific. You’re writing a detailed spec, thinking through edge cases, defining constraints. And at some point you realize: for certain features, you’ve already done most of the hard work. The thinking is the work.

At that point, it’s genuinely faster to just write the code yourself and use the agent as a pair reviewer to make sure you’re on the right track.

Other times — boilerplate, scaffolding, repetitive patterns, implementations where the spec is clear and the risk is low — full delegation is absolutely the move. Hand the agent the guidance and let it run.

The real skill isn’t prompting. It’s learning what to delegate and what to keep. And that judgment comes from understanding your codebase, the complexity of the task, and honestly — how much you trust the output for that specific context.

The messier your codebase — legacy code, real tech debt, patterns with history the agent will never know — the more that judgment matters. The tooling is irrelevant. Neovim, Cursor, whatever. The bottleneck is you. Whether you know where your confidence boundary is and whether you’re honest about it.


The Bet I’m Making

If you feel like AI is making you more productive but less confident in what you ship — you’re not falling behind. You’re paying attention.

If your engineering skills feel soft because you keep delegating instead of thinking — that’s not paranoia. The research says it’s real.

The speed mirage is powerful. It feels like progress. The dashboards say it’s progress. But if you can’t explain what you shipped, debug it when it breaks, or trust it handles the edge cases — that’s not progress. That’s debt with a nice commit message.

I’m not quitting AI. I’m quitting the lie that it makes everything faster.

The developers who are going to thrive aren’t the ones who ship more code. They’re the ones who learned what to keep and what to let go. Who built the judgment for when to delegate and when to do the work themselves.

Confidence over throughput. That’s the bet I’m making.


