The Review Bottleneck: Why AI Explanations Are Making Us Trust Less, Not More

Last week I spent 3 hours reviewing code that took 20 minutes to write.

The AI was faster. The review wasn’t.

And I’m starting to realize: that’s the problem.

“Less coding, more engineering.”

I keep hearing this phrase everywhere. The idea is simple: AI handles the coding, so developers focus on the higher-level work. The engineering. The architecture. The review.

But here’s what nobody’s talking about: AI isn’t just writing the code anymore. It’s reviewing it too.

And the paradox is obvious once you see it: AI generates code faster, but reviewing it takes longer than ever.

Here’s what those 3 hours looked like:

I read through 300 lines of code carefully. Checked the tests. Verified the logic flow. Examined edge cases.

But that was only the first hour.

The next two hours? Reading AI-generated explanations. Reviewing the AI code reviewer’s feedback. Cross-referencing the AI’s architectural justifications with the actual implementation. Trying to reconcile conflicting suggestions from different AI systems.

By the end, I understood the code. But I’d spent more time processing AI commentary than reviewing actual logic.

And here’s what bothered me: I see people across the industry approving similar PRs in 20 minutes.

Are they reading all of this? Or are they skimming the AI explanations and trusting by default?

I’m pretty sure it’s the second one.

This isn’t about being thorough versus lazy. It’s about recognizing that something has shifted. Here’s what that PR actually contained:

  • 300 lines of actual code
  • 1,200 words of AI-generated explanation
  • 800 words of AI code review feedback
  • 15 inline comments from the AI about trade-offs and alternatives

I had more documentation to review than code.

  • 300 lines of actual implementation
  • 2,000+ words of AI-generated commentary

The code was the easy part. The cognitive load came from synthesizing multiple AI perspectives, each confident, each reasonable-sounding, some subtly contradicting each other.

The tests passed. The linting passed. The AI explanations sounded reasonable. The AI reviewer’s concerns seemed addressed.

So I trusted the process and moved on.

And that’s becoming the norm.

I’m not alone.

At Anthropic—the company building Claude—engineers are generating 2,000 to 3,000 line pull requests regularly. Mike Krieger, their Chief Product Officer, openly admits: “pretty much 100%” of their code is now AI-generated.

And they’re using Claude to review it too.

Boris Cherny, head of Anthropic’s Claude Code team, hasn’t written a single line of code in over two months. He shipped 22 pull requests in one day, 27 the next.

“Each one 100% written by Claude.”

This isn’t the future. It’s happening right now, at the companies building the AI tools we’re all using.

The Confidence Trap

Code reviews were already hard. They required skill, domain knowledge, and patience.

Now multiply that by the sheer volume AI generates.

But volume isn’t even the real issue.

The real issue is that AI writes confident code. It comes with detailed explanations. Trade-off analysis. References. Architecture justifications.

Enough well-articulated reasoning to make everything sound sensible.

When you look at a 500-line PR with a 2,000-word explanation of why every decision was made, the cognitive load is enormous.

You can dig in and verify every claim.

Or you can trust that the explanation sounds reasonable and move on.

Most developers are choosing “move on.”

Here’s where we are:

Claude Code, Codex, and GitHub Copilot are generating code at unprecedented scale: 46% of developers’ code is now AI-written across these major tools.

84% of developers use AI coding tools regularly.

And here’s the kicker: while AI generates nearly half our code, only 30% of AI-suggested code actually gets accepted.

The rest gets rejected during review—or should get rejected.

Then We Added AI Code Review

So teams did the obvious thing: bring in AI code review tools.

Now every PR has:

  • The AI-generated code (500 lines)
  • The AI’s explanation of what it built and why (2,000 words)
  • The AI reviewer’s analysis (another 1,500 words)
  • Sometimes multiple AI reviewers, each with their own opinions

You’re staring at 4,000+ words of confident, reasonable-sounding explanations from multiple AI systems.

All of it well-structured. All of it articulate. Much of it contradicting itself in subtle ways.

And you’re supposed to synthesize all of this, make a judgment call, and approve or reject.

What actually happens?

You skim the AI’s explanation. You skim the AI reviewer’s comments. If they roughly agree and the tests pass, you approve it.

The AI’s confidence became your confidence by default.

Research confirms what we all feel: AI-generated code creates 1.7x more issues than human-written code.

Unclear naming. Mismatched terminology. Generic identifiers everywhere.

All of it increasing cognitive load for reviewers.

And all of it explained so confidently that you don’t question it.

This is what researchers call “automation bias”—our tendency to accept answers from automated systems, even when we encounter contradictory information.

We’re not carefully evaluating the code. We’re trusting that the volume of explanation equals correctness.

More Explanation ≠ More Understanding

The paradox is obvious once you see it:

Adding AI code reviewers didn’t make reviews better. It made them worse.

Not because the AI reviewers are bad. But because the sheer volume of explanation—from the writer AI, from the reviewer AI, sometimes from multiple reviewer AIs—has become impossible to actually process.

We traded one problem (not enough context) for another (too much confident noise).

And the human reviewer, the supposed quality gate, is now just the person who clicks “Approve” after skimming thousands of words they don’t have time to verify.

The bottleneck isn’t writing code anymore.

It’s not even reviewing code.

It’s trusting code we don’t fully understand because we’re drowning in explanations that sound reasonable but are too expensive to verify.

Even OpenAI acknowledges this in their Codex documentation: “It still remains essential for users to manually review and validate all agent-generated code.”

But are we actually doing that?

The evidence suggests no.

Wait. See What Just Happened?

I need to be honest with you.

I almost did the exact same thing to you.

I almost buried this post in citations.

16 footnotes. Statistics every other paragraph. Research from Anthropic, OpenAI, arXiv, CodeRabbit, Qodo. All credible. All well-sourced. All making the same point.

And if you’re like most readers, you would have skimmed them. Trusted that they said what I claimed. Moved on.

That’s exactly what we’re doing with code reviews.

The volume of explanation—even when accurate—becomes its own problem. Too many words. Too much confidence. Not enough time to verify.

So we trust by default.

What I’m Trying

I don’t have this solved. But here’s what’s working for me:

The 30-minute rule – If I can’t understand the PR in 30 minutes of focused review, it’s too big. Send it back or break it down (a rough size check is sketched after these three rules).

No AI reviewer without human review – AI review is a supplement, not a replacement. I still need to read the actual code, not just the summary.

The explain-it test – If I can’t explain the core logic to someone else, I don’t approve it. Knowing “the tests passed” isn’t good enough.
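To keep the 30-minute rule honest, I’ve started gating reviews on raw diff size before I even open the PR. Here’s a minimal sketch of that check, assuming origin/main is the base branch; the 400-line ceiling is just my own rough threshold for what fits in one focused session, not a standard.

```python
import subprocess
import sys

# Rough ceiling for what I can actually absorb in one 30-minute session.
# This number is my own guess, not a published standard.
MAX_REVIEWABLE_LINES = 400

def changed_lines(base: str = "origin/main") -> int:
    """Count added + deleted lines between the base branch and HEAD."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        # Binary files show "-" instead of counts; skip them.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    lines = changed_lines()
    if lines > MAX_REVIEWABLE_LINES:
        print(f"{lines} changed lines: too big for one focused review. Split it up.")
        sys.exit(1)
    print(f"{lines} changed lines: reviewable in one sitting.")
```

It’s crude. Lines changed is a bad proxy for complexity. But it catches the 2,000-line monsters before they eat an afternoon.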

Does this slow me down? Yes.

Does it help? I think so.

But I’m also watching my team ship faster by trusting more. And I don’t know if I’m being careful or just stubborn.

Where This Leaves Us

I’m caught in the same trap.

I want to ship faster. But I also want to understand what I’m shipping.

And the current tools make both feel impossible at the same time.

Some days I slow down and review everything carefully. Other days I skim and trust.

And I’m not sure which approach is right anymore.

Anthropic’s Dario Amodei predicts the industry may be “just six to twelve months away from AI handling most or all of software engineering work from start to finish.”

25% of Google’s code is already AI-assisted.

30% of Microsoft’s code is AI-generated.

These aren’t small experiments. This is how we’re building software now.

But here’s what we’re not saying out loud:

We’ve replaced code we wrote and didn’t fully understand with code AI wrote and we definitely don’t understand.

We’re not talking about this problem honestly enough.

The “less coding, more engineering” narrative assumes we’re still doing the review work.

We’re not.

We’re skimming AI-generated justifications and hoping for the best.

Maybe that’s fine. Maybe the tests are good enough. Maybe AI review plus AI generation actually works.

But we should stop pretending we’re still doing the review work.

Because “less coding, more engineering” sounds great until you realize:

We’re not doing more engineering.

We’re doing more trusting.

So here’s my question:

Are you actually reviewing AI code? Or are you just hoping the explanation is right?

Because if it’s the second one—and the data suggests it is—we need to start talking about what comes next.

The quality gate we automated away isn’t coming back. We need to figure out what replaces it.

Next up: I’m going to share how I’m breaking down AI-generated features into bite-sized review sessions that force comprehension instead of trust. It’s slower. It’s deliberate. And it might be the only way to stay honest about what we’re shipping.

References & Further Reading

Key Sources:

  1. Fortune: “Top engineers at Anthropic, OpenAI say AI now writes 100% of their code” – Mike Krieger and Boris Cherny interviews, January 2026
  2. GitHub Copilot Statistics 2026 – 46% of code AI-generated
  3. CodeRabbit: “AI code creates 1.7x more issues” – Cognitive load study, 2025
  4. Index.dev: Developer Productivity Statistics 2026 – 84% adoption, 30% acceptance rate
  5. OpenAI: Using Codex with ChatGPT – Manual review guidance
  6. Springer: Automation bias in human–AI collaboration – AI & Society, July 2025
