
Context Is a Documentation Problem

20/04/2026

A product manager I’ve been working with, one with a coding background, wowed everyone a couple of weeks ago. He’d wired Claude Code into his team’s monorepo and built an agentic workflow to squash bugs, sometimes almost as soon as they’d been raised. The team were freed up to take on the big-picture tasks.

Then the hallucinations started.

The agent had started inventing things. Confident, well-formed things — functions that didn’t exist, types that were plausible but wrong, imports from packages they hadn’t installed.

For every successful fix there was another that wasted review cycles: code that was, at least to the engineers, obviously wrong; code that failed pipelines. And the problem was confined to this one codebase: another repo the agent worked in was fine.

That was the clue.

The agent wasn’t actually worse in one repo. The context it had to work within was different. “Use the whole codebase” is a shape of context that works fine if the codebase is already consistent. It fails when the agent has to reconcile competing design patterns, conflicting architectural paradigms, or a framework that leaves too much to convention. The agent has to pattern-match across code that disagrees with itself, and when nothing in its window is definitive, it invents the most plausible thing. That’s what hallucination is — a guess made under uncertainty.

So the question then is: should we narrow the context? Restrict the agent to specific directories? My answer to that is an emphatic no — that’s almost always worse. An agent that can’t see the payments module when writing against the checkout module produces fast, confident, architecturally wrong code. You end up with a new failure mode rather than fewer of the old one.

The real fix isn’t narrower context. It’s denser context.

Here’s what I mean. An agent working against your codebase is reading it the way a new hire would — line by line, trying to build a mental model. When your codebase is silent — no READMEs, no architecture notes, no comments explaining why the weird bit is weird — both the new hire and the agent have to infer everything. The new hire asks a teammate on Slack. The agent hallucinates.

So the fix was boring and obvious: we spent a week writing the documents the team had needed for two years.

Specifically:

  • A one-page architecture README at the root, explaining what each service is for and which ones talk to which.
  • Module-level READMEs in the three most-edited directories, explaining the domain, the invariants, and the “don’t do this” footguns.
  • A CONTRIBUTING.md with conventions — not style rules, conventions. Where we use event buses and where we use RPC. When we care about idempotency. How we think about transactions.
  • Three architecture decision records for the choices the team had made but no one on the team remembered why.

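To make the second item concrete, here’s a sketch of what one of those module-level READMEs might look like. The service, invariants, and footguns below are hypothetical, invented for illustration — the point is the shape, not the specifics:

```markdown
# billing/ — invoicing and payment reconciliation

## What this module does
Owns invoice generation and reconciliation against the payment
provider. It does NOT take payments — that lives in payments/.

## Invariants
- An invoice is immutable once issued; corrections are new
  credit notes, never edits.
- All amounts are integer minor units (pence). Never floats.

## Don't do this
- Don't call the payment provider directly from here; go through
  payments/gateway so retries and idempotency keys are handled
  in one place.
```

Half a page like this answers the questions an agent (or a new hire) would otherwise have to guess at.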
None of this was written for the agent. It was documentation his team should have had anyway. Every line was aimed at a human reader — the hypothetical engineer joining next month who would otherwise have to ask the same three questions every senior had already answered a dozen times.

The hallucinations didn’t go to zero. Nothing does. But they fell from “multiple per day” to “occasional and usually obvious.” More importantly, the team stopped treating every agent suggestion with suspicion. They were reading the same documents the agent was, so they were now genuinely collaborating in a shared context instead of guessing past each other.

The lesson I’ve taken away — and I’ve since seen the same pattern play out at other teams — is that good context engineering is indistinguishable from good documentation for humans. If you write the README your team actually needs, you’ve already done the bulk of the work to make AI agents trustworthy in your codebase. If you haven’t, no amount of tooling will rescue you.

It’s tempting to think of AI as a new category that demands new practices. For this particular problem, it isn’t. The agent is just another reader — one that happens to have no institutional memory, no Slack to ask questions in, and no sense of smell for things that feel off. The thing that helps humans work well in your codebase is, almost always, the thing that will help your agents work well too.

If you’re rolling out AI agents in a codebase and seeing great results, watch the signal over time. A dip at six to twelve weeks, as the codebase grows, is the normal symptom of thin context. The fix usually lives in a folder called docs/ — which probably needs more love and attention than it has seen in months (if not years).

If that folder’s mostly empty and you’d like a second pair of eyes on what belongs in it, that’s a conversation I have most weeks.
