I maintain 7 AI context files. What goes in them.
After maintaining 7 AI context files, I learned shorter files beat longer ones - with a 41% token reduction. Here's what to include and leave out.
I added one feature to PitchDocs last week. A new skill. That meant updating “15 skills” to “16 skills” in CLAUDE.md. And AGENTS.md. And .cursorrules. And copilot-instructions.md. And .windsurfrules, .clinerules, and GEMINI.md. Seven files, same number change, across the same repo.
I missed two of them. Of course I did.
TL;DR:
- Every AI coding tool wants its own instruction file. Maintaining them all is a losing game unless you centralise.
- An ETH Zurich study found that overstuffed context files actually make AI agents worse - lowering task success while raising inference costs by over 20%.
- What works: one canonical AGENTS.md with short bridge files per tool, containing only what the AI can’t figure out from your code.
Three studies from 2025-2026, same finding. A February 2026 ETH Zurich paper (Gloaguen et al.) evaluated AGENTS.md across four coding agents and found that LLM-generated context files reduced task success on AGENTbench while increasing inference cost by over 20%. A separate November 2025 study of 2,303 context files across 1,925 repos found developers over-weight implementation details (69.9% of content) and architecture (67.7%) at the expense of constraints the agent actually needs. A March 2026 MSR paper analysing Cursor Rules in 401 repos went further: 28.7% of all lines were duplicates of what the code already tells the agent. The evidence is consistent - shorter, developer-written, constraint-focused files beat longer auto-generated ones.
The irony is not subtle. I build a documentation tool. My own docs went stale because I couldn’t keep 7 files in sync. Most developers won’t hit 7 - you’ll probably have 2-3 if you use more than one AI coding tool. Even 2 files that say the same thing will drift, and everything I learned about keeping them honest applies at any scale.
Why do I have 7 context files?
Each AI coding tool reads its own instruction file from your project root. That’s how this happened - not through any grand design. Each tool shipped its own convention independently.
| File | Tool | Auto-loaded? |
|---|---|---|
| CLAUDE.md | Claude Code, OpenCode | Yes, every session |
| AGENTS.md | Codex CLI, OpenCode, Gemini CLI | Yes, at startup |
| .cursorrules | Cursor | Yes, from project root |
| .github/copilot-instructions.md | GitHub Copilot | Yes, GitHub convention |
| .windsurfrules | Windsurf | Yes, from project root |
| .clinerules | Cline | Yes, from project root |
| GEMINI.md | Gemini CLI | Yes, at startup |
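If you're not sure how many of these files a repo already carries, a quick inventory helps. Here's a minimal Python sketch that checks for the file names in the table above. The function name `find_context_files` is illustrative, and the sketch assumes each entry is a plain file at the listed path relative to the repo root.

```python
from pathlib import Path

# File names taken from the table above; paths are relative to the repo root.
CONTEXT_FILES = [
    "CLAUDE.md",
    "AGENTS.md",
    ".cursorrules",
    ".github/copilot-instructions.md",
    ".windsurfrules",
    ".clinerules",
    "GEMINI.md",
]


def find_context_files(repo_root: str) -> list[str]:
    """Return the known context files that exist under repo_root."""
    root = Path(repo_root)
    return [name for name in CONTEXT_FILES if (root / name).is_file()]


if __name__ == "__main__":
    # Run from the repo root to list what is present.
    for name in find_context_files("."):
        print(name)
```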
If you only use one tool, you have one file. Easy. Most of my repos have 2-3. PitchDocs is different, though - it’s a documentation tool that generates context files for other people’s repos, so it needs to work with every editor. I use Claude Code daily (including from my phone while walking the dog) and test PitchDocs with Cursor, Codex CLI, and Gemini CLI for cross-tool compatibility. That repo alone has all 7 files.
The problem isn’t writing them. Writing a context file takes 20 minutes. The problem is that they drift within days. How many of your context files say the same thing right now? Are you sure? I added a PitchDocs feature, updated the skill count in CLAUDE.md, and forgot .cursorrules and GEMINI.md. Three days later, Cursor was telling a contributor there were 15 skills when there were 16. Small thing. That’s how trust erodes, though - not through dramatic failures. Through quiet inaccuracies that pile up.
A study analysing 2,303 context files across Claude Code, Codex, and GitHub Copilot found the median update interval for Claude Code files is 24.1 hours. People are touching these files constantly. And every update that doesn’t propagate to the other 6 files creates drift.
What actually goes in a context file?
Most context files I’ve seen are too long. They contain directory trees, architecture overviews, file listings, and detailed explanations of how the codebase works. The instinct makes sense - give the AI more context, get better results. Right?
Wrong.
An ETH Zurich study from February 2026 tested context files against four coding agents (Claude Code with Sonnet-4.5, Codex with GPT-5.2, Codex with GPT-5.1 mini, and Qwen Code with Qwen3-30B). Their finding: LLM-generated context files - the kind that dump everything about a project into one file - reduced task success rates by 2% on AGENTbench. Inference costs went up over 20%. The agents took more steps, consumed more reasoning tokens, and solved fewer problems.
The developer-written context files performed better - they actually improved success rates by 4% on the same benchmark. The difference? The developer-written files were shorter and focused on what the developer knew was unusual about the project. Not architecture overviews. Not file trees. Just the gotchas.
A March 2026 MSR paper by Jiang and Nam found the same pattern from a different angle. They analysed Cursor Rules across 401 repositories and found 28.7% of all lines were duplicates of code or documentation the agent could already see. Every one of those duplicated lines is tokens paid for a second time, and the ETH Zurich data shows what you get for them: slower agents, higher bills, fewer solved problems.
I call this the Signal Gate principle: only include what the AI cannot discover by reading your code.
What to include
Think of it like a tourist phrase book. You don’t hand someone a dictionary when they land in a new country. You give them the five phrases locals actually use and a warning about the taxi scam. Context files work the same way - the agent already speaks the language, it just doesn’t know your local customs.
Your AI coding agent is going to read your source files, your tests, your config. It’ll figure out your framework, your language, your project structure. You don’t need to tell it any of that. What you need to tell it is the stuff that isn’t in the code:
- Naming conventions that break expectations. If your Python project uses `camelCase` because it wraps a JavaScript API, say so. The agent will default to `snake_case` otherwise
- Build quirks and environment setup. `direnv exec . wrangler deploy` instead of just `wrangler deploy`. The agent won’t guess that your shell doesn’t inherit direnv
- Architectural decisions that aren’t obvious from the code. “We use D1 for transactional data and KV for config. Don’t mix them.” The agent can see both bindings. It doesn’t know why you split them that way
- Testing conventions. “Integration tests hit the real database. Don’t mock D1.” The agent would default to mocking
- Things that have gone wrong before. “The content filter blocks CODE_OF_CONDUCT generation. Use the content-filter-guard hook.” Also the $4,868 bill I got from an infinite loop - context files are where those lessons live so the next AI session doesn’t repeat them
What to leave out
Anything the agent can figure out by reading files. Here’s the split at a glance:
| Include (undiscoverable) | Leave out (discoverable) |
|---|---|
| Naming conventions that break defaults | Directory trees and file listings |
| Build environment quirks | Architecture overviews |
| Testing rules that contradict framework | API documentation |
| Past mistakes and gotchas | Dependency lists |
| Deployment constraints | How the framework works |
Every line of redundant context wastes tokens. The ETH Zurich data shows it actively degrades performance too. The agents spend reasoning tokens processing information they would have discovered anyway. Sometimes they contradict themselves because the context file describes something slightly differently than the code does.
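One rough way to spot redundant context is to measure how many lines of a context file also appear verbatim in docs the agent can already read. The sketch below is a simple heuristic, not the methodology of any of the studies cited here. The names `normalise` and `duplicate_ratio` and the 20-character cut-off are assumptions chosen for illustration.

```python
def normalise(line: str) -> str:
    """Collapse whitespace and case so trivial formatting differences don't hide a duplicate."""
    return " ".join(line.split()).lower()


def duplicate_ratio(context_text: str, other_texts: list[str], min_length: int = 20) -> float:
    """Fraction of substantive context-file lines that also appear verbatim elsewhere.

    Lines shorter than min_length (headings, blank lines, list markers) are
    ignored so they don't inflate the result.
    """
    # Every normalised line from the other documents (README, docs, and so on).
    seen = {
        normalise(line)
        for text in other_texts
        for line in text.splitlines()
    }
    candidates = [
        normalise(line)
        for line in context_text.splitlines()
        if len(normalise(line)) >= min_length
    ]
    if not candidates:
        return 0.0
    duplicated = sum(1 for line in candidates if line in seen)
    return duplicated / len(candidates)
```

Feed it the text of AGENTS.md plus the text of your README and docs. A high ratio suggests lines the agent would have found anyway. Exact-match comparison misses paraphrased duplicates, so treat the number as a lower bound.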
For perspective: React’s CLAUDE.md is 8 lines. The most popular JavaScript library on GitHub, 244,000 stars, and their context file says “React is a JavaScript library for building user interfaces” and points to one subdirectory. That’s it. An Anthropic engineer on Hacker News recommended keeping context files under 1,000 tokens - roughly 60 lines. “We remove from it with every model release,” they said, “since smarter models need less hand-holding.”
Line budgets that work
After three months of iteration across multiple repos, I’ve landed on these limits:
- AGENTS.md: Under 120 lines. This is the shared canonical source
- CLAUDE.md: Under 80 lines. Starts with `@AGENTS.md` to import the shared context, then adds Claude-specific instructions
- Bridge files (.cursorrules, copilot-instructions.md, etc.): Under 60 lines. Just tool-specific additions
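These limits are easy to check mechanically. Here's a minimal Python sketch using the budgets above. The helper names `budget_for` and `over_budget` are illustrative, and treating every file other than AGENTS.md and CLAUDE.md as a bridge is a simplifying assumption.

```python
from pathlib import Path

# Budgets from the list above. Any file not named here is treated as a bridge.
LINE_BUDGETS = {"AGENTS.md": 120, "CLAUDE.md": 80}
BRIDGE_BUDGET = 60


def budget_for(filename: str) -> int:
    """Look up the line budget by base name, so nested paths work too."""
    return LINE_BUDGETS.get(Path(filename).name, BRIDGE_BUDGET)


def over_budget(files: dict[str, str]) -> list[str]:
    """Return a message for each file whose line count reaches or exceeds its budget.

    files maps a file name to its text content.
    """
    problems = []
    for name, text in files.items():
        count = len(text.splitlines())
        limit = budget_for(name)
        if count >= limit:  # "under N lines" means N itself is already too long
            problems.append(f"{name}: {count} lines, budget is under {limit}")
    return problems
```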
Before I enforced these limits, my auto-loaded context was consuming about 2,613 tokens per session. After trimming to only undiscoverable signals, it dropped to about 1,554 tokens - a 41% reduction. The AI didn’t get worse. If anything, it got more focused because it wasn’t wading through architectural details it already knew.
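The 41% figure follows directly from those two token counts. The helper name `percent_reduction` is illustrative:

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage drop from before to after."""
    return (before - after) / before * 100


# 2,613 tokens before, 1,554 after: 1,059 fewer tokens per session.
print(round(percent_reduction(2613, 1554)))  # 41
```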
One file to rule them all
Once I accepted that maintaining 7 independent context files is unsustainable, the answer was obvious: write the conventions once and reference them everywhere. The AGENTS.md-first model puts all your project conventions, gotchas, and build quirks in one canonical file. Each AI coding tool gets a thin bridge file that either imports AGENTS.md directly (Claude Code does this with @AGENTS.md) or contains only the tool-specific additions that can’t go in the shared file. When I first started using this approach, the maintenance burden dropped immediately - one file to update instead of seven, with bridges that rarely change at all.
In practice, for Claude Code, it looks like this:
```markdown
# CLAUDE.md

@AGENTS.md

## Claude Code specific
- Use `direnv exec .` for all wrangler and gh commands
- MCP servers: brave-search, github, trello, jina, pal, context7
- Hook: content-filter-guard.sh handles CODE_OF_CONDUCT generation
```
That’s it. CLAUDE.md is 8 lines. Everything else lives in AGENTS.md, which Claude Code loads through the @AGENTS.md import.
For Cursor, the .cursorrules bridge might be:
```text
Read AGENTS.md for project conventions.
Cursor-specific: use the built-in terminal for build commands,
not the chat interface. TypeScript strict mode is enforced.
```
The bridge files are short because they should be. The conventions live in one place. When I update a skill count or change a naming convention, I update AGENTS.md and everything stays in sync.
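If you have several bridges, you can generate them from one template so the pointer to AGENTS.md never varies. Here's a minimal Python sketch, modelled on the Cursor bridge shown earlier. The function name `render_bridge` and the output layout are assumptions for illustration, and it only helps for tools that will actually follow a "read AGENTS.md" instruction.

```python
BRIDGE_HEADER = "Read AGENTS.md for project conventions."


def render_bridge(tool_name: str, tool_specific: list[str]) -> str:
    """Build a bridge file that points at AGENTS.md and adds tool-only notes."""
    lines = [BRIDGE_HEADER]
    if tool_specific:
        lines.append("")
        lines.append(f"{tool_name}-specific:")
        lines.extend(f"- {note}" for note in tool_specific)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_bridge("Cursor", [
        "Use the built-in terminal for build commands, not the chat interface.",
        "TypeScript strict mode is enforced.",
    ]))
```

Because the shared conventions never appear in the generated text, there is nothing in a bridge to drift except the tool-specific notes themselves.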
This isn’t just my approach. In December 2025 the Linux Foundation launched the Agentic AI Foundation (AAIF) with AGENTS.md as a founding project, alongside Anthropic’s MCP and Block’s goose. Platinum members include AWS, Anthropic, Cloudflare, Google, Microsoft, and OpenAI. Over 60,000 open-source projects already use AGENTS.md. Codex CLI and Gemini CLI auto-load it at startup. OpenCode reads it natively. The ecosystem is moving toward “one canonical file, tool-specific bridges where needed” - and for good reason. Maintaining 7 independent files that say the same thing is a solved problem, and the solution is don’t.
When context files go stale
Context files rot faster than I expected. Not months. Days. When was the last time you actually checked whether your CLAUDE.md matches your code?
Here’s the specific moment I knew I had a real problem. I ran my own documentation audit tool - PitchDocs - on the PitchDocs repo itself. It scored my docs as stale. My documentation tool told me my documentation was out of date. I mentioned this in my last blog post and it’s still the most embarrassing moment in this project’s history.
The staleness was real. I’d added the platform-profiles skill and the /pitchdocs:platform command, which bumped skill count from 15 to 16 and command count from 12 to 13. I updated CLAUDE.md. I did not update .cursorrules, GEMINI.md, or copilot-instructions.md. Three files, wrong for 4 days before I noticed.
Stale context doesn’t just look bad. It actively misleads the AI. When CLAUDE.md says “16 skills” and .cursorrules says “15 skills”, a Cursor user asking “what commands are available?” gets an incomplete answer. When the file mentions a command that was renamed, the AI hallucinates the old command name. The cost of stale context isn’t theoretical - it’s wrong output that looks right.
How I stopped the rot
I built a two-tier enforcement system. The first tier is a nudge: at the end of each Claude Code session, a hook checks whether context files have drifted from the codebase. If they have, it suggests running an update. The second tier is harder: a pre-commit hook that blocks the commit entirely if context files are stale. You can’t merge code that makes your context files wrong.
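To make the second tier concrete, here's a minimal Python sketch of a pre-commit check. It isn't the actual Context Guard implementation. It assumes, purely for illustration, that each skill lives in its own directory under `skills/` and that context files state the total as "N skills". Git aborts a commit when the pre-commit hook exits with a non-zero status, which is what `main` relies on.

```python
import re
import sys
from pathlib import Path

SKILL_CLAIM = re.compile(r"\b(\d+)\s+skills\b")


def actual_skill_count(repo_root: Path, skills_dir: str = "skills") -> int:
    """Count skill directories. The skills/ layout is an assumption for this sketch."""
    base = repo_root / skills_dir
    if not base.is_dir():
        return 0
    return sum(1 for entry in base.iterdir() if entry.is_dir())


def stale_files(repo_root: Path, context_files: list[str]) -> list[str]:
    """Return context files whose 'N skills' claim disagrees with the codebase."""
    expected = actual_skill_count(repo_root)
    stale = []
    for name in context_files:
        path = repo_root / name
        if not path.is_file():
            continue  # a missing file cannot be stale
        claims = {int(n) for n in SKILL_CLAIM.findall(path.read_text(encoding="utf-8"))}
        if claims and claims != {expected}:
            stale.append(name)
    return stale


def main() -> int:
    stale = stale_files(Path("."), ["AGENTS.md", "CLAUDE.md", ".cursorrules", "GEMINI.md"])
    for name in stale:
        print(f"stale context file: {name}", file=sys.stderr)
    return 1 if stale else 0  # a non-zero exit makes git abort the commit


if __name__ == "__main__":
    sys.exit(main())
```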
Then I automated the fix. When the hooks detect drift, they can launch a context-updater agent that patches the files automatically - updating counts, adding new commands, removing references to deleted features. The human reviews the diff. The tedious work of finding and fixing every stale reference is handled.
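The real updater is an agent, but the "patch, then show a diff" loop can be illustrated with a deterministic stand-in. This Python sketch rewrites a count claim and produces a unified diff for review using the standard library's `difflib`. The function names `update_count` and `review_diff` are hypothetical and not part of ContextDocs.

```python
import difflib
import re


def update_count(text: str, noun: str, new_count: int) -> str:
    """Rewrite every '<number> <noun>' claim in text to use new_count."""
    pattern = re.compile(rf"\b\d+(\s+{re.escape(noun)})\b")
    return pattern.sub(lambda m: f"{new_count}{m.group(1)}", text)


def review_diff(filename: str, old: str, new: str) -> str:
    """Unified diff of the proposed change, for a human to approve."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{filename}",
            tofile=f"b/{filename}",
        )
    )


if __name__ == "__main__":
    old = "PitchDocs ships 15 skills and 12 commands.\n"
    new = update_count(update_count(old, "skills", 16), "commands", 13)
    print(review_diff(".cursorrules", old, new))
```

Nothing is written to disk here. The point is that the tool proposes the change and a person approves it.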
Why I split one plugin into two
Maintaining context files across Python, TypeScript, and SwiftUI codebases led me to a bigger realisation. I’d been treating documentation generation and context file management as one problem because they both involve files that describe your project. They’re not the same problem.
Documentation (README, CHANGELOG, user guides) is public-facing, written for humans, and changes with releases. Context files (CLAUDE.md, AGENTS.md, .cursorrules) are AI-facing, written for agents, and change with every single feature you ship. Completely different audiences, completely different update cadences.
So I split PitchDocs v2.0.0 into two separate tools: PitchDocs handles repository documentation - README, CHANGELOG, ROADMAP, user guides, and 15 other doc types. ContextDocs handles AI context files - AGENTS.md generation, bridge files for 8 tools, health scoring, and the Context Guard hooks that keep everything in sync. It was a breaking change with 39 files modified, but the separation has been clean. Each tool does one job and does it well.
What I’d tell you to do right now
- Start with AGENTS.md. Not CLAUDE.md, not .cursorrules - AGENTS.md. It works across the most tools natively (Codex CLI, Gemini CLI, OpenCode, Claude Code via import), and it positions you for the direction the ecosystem is heading.
- Keep it under 120 lines. The ETH Zurich data backs this up - shorter, human-written context files outperform longer LLM-generated ones. If your file is over 200 lines, you’re probably including things the agent already knows.
- Only write what the agent can’t discover from your code. No directory trees, no architecture overviews, no framework explanations. Naming conventions, build quirks, testing rules, past mistakes. That’s it.
- If you use multiple AI tools, create thin bridge files that reference AGENTS.md rather than duplicating content. The bridge should be under 60 lines and contain only what’s specific to that tool.
- Automate the sync. If you maintain more than 2 context files, you will not remember to update all of them manually. I didn’t, and I built the tools that are supposed to solve this problem. If I can’t do it manually, you probably can’t either. I ended up building Context Guard hooks to catch drift automatically, but even a simple CI check that diffs file counts between AGENTS.md and your bridges would help.
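A simple version of that CI check might look like the Python sketch below. It assumes the counts appear in prose as "N skills" or "N commands", which is how the examples in this post phrase them. The names `count_claims` and `find_drift` are illustrative. Fail the CI job whenever `find_drift` returns anything.

```python
import re

COUNT_CLAIM = re.compile(r"\b(\d+)\s+(skills|commands)\b")


def count_claims(text: str) -> dict[str, set[int]]:
    """Collect every 'N skills' / 'N commands' claim found in text."""
    claims: dict[str, set[int]] = {}
    for number, noun in COUNT_CLAIM.findall(text):
        claims.setdefault(noun, set()).add(int(number))
    return claims


def find_drift(canonical: str, bridges: dict[str, str]) -> list[str]:
    """Report bridge claims that disagree with the canonical file."""
    expected = count_claims(canonical)
    problems = []
    for name, text in bridges.items():
        for noun, numbers in count_claims(text).items():
            # Only compare nouns the canonical file also makes a claim about.
            if noun in expected and numbers != expected[noun]:
                problems.append(
                    f"{name}: says {sorted(numbers)} {noun}, "
                    f"canonical says {sorted(expected[noun])}"
                )
    return problems
```

With thin bridges that only reference AGENTS.md there should be almost nothing for this to find, which is the point of keeping them thin.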
The boring truth is that context files are a maintenance problem, not a writing problem. Writing the first version is easy. Keeping it accurate as your project evolves is the actual work. And the research is clear: a short, accurate file beats a long, stale one every time.
If you want to see how I actually use these files when jumping between 30 active projects, I wrote about the re-onboarding problem separately - it’s the workflow side of what this post covers.
Common questions about AI context files
Do I need a separate context file for every AI coding tool?
No. Start with one AGENTS.md file. Codex CLI, Gemini CLI, and OpenCode all read it natively, and Claude Code picks it up through an @AGENTS.md import in CLAUDE.md. For tools like Cursor or Windsurf that need their own format, create a thin bridge file that references AGENTS.md rather than duplicating content. Seven independent files with overlapping content is the fastest way to ensure at least three of them are wrong at any given time.
How often should I update AI context files?
Every time you change a convention the AI needs to follow. Research on 2,303 context files found a median update interval of 24.1 hours for Claude Code files. That’s not a target - it’s an observation. The real question is whether your context file has drifted since you last checked. If you added a feature, renamed a command, or changed a testing pattern and didn’t update your context files, they’re stale.
What is the difference between AGENTS.md and CLAUDE.md?
AGENTS.md is the cross-tool canonical source for project conventions. It works with Codex CLI, Claude Code (via @AGENTS.md import), Gemini CLI, and OpenCode. CLAUDE.md is a Claude Code-specific bridge file that imports AGENTS.md and adds Claude-only instructions - MCP server configuration, hook behaviour, tool-specific patterns. Think of AGENTS.md as the shared rules and CLAUDE.md as the Claude-specific addendum.
Does adding more context to CLAUDE.md make AI better?
No. An ETH Zurich study testing four coding agents found that LLM-generated context files reduced task success rates while increasing inference costs by over 20%. Shorter, developer-written files that contained only undiscoverable information performed better - improving success rates by 4%. More tokens spent on context means fewer tokens available for reasoning.
About the author: Nathan Schram is a solo developer at Little Bear Apps building open-source tools from Melbourne, Australia. 13 years in tech, currently building things he actually uses. Find him on GitHub, Bluesky, or get in touch.
Last reviewed: 2026-04-24. All numbers in this post are verified from GitHub commits, published research papers (ETH Zurich arXiv:2602.11988, MSR arXiv:2512.18925, Nov 2025 study arXiv:2511.12884), and project changelogs. PitchDocs and ContextDocs are both open source - the split and every commit referenced here is public.