
We scored the CLAUDE.md files of the biggest open-source projects. The best ones break the rules.

We recently shipped a CLAUDE.md scorer: paste your project's agent memory file and get it graded against current best practices. To pressure-test it, we pointed it at the most popular open-source projects on GitHub. Then we kept pulling the thread and surveyed the 155 most-starred TypeScript, Python, Rust, and Go repositories for root-level agent memory files.

What came back surprised us three times. Here is what we found, with the receipts.

Finding 1: CLAUDE.md is quietly becoming an alias

Of the 155 most-starred repos we probed, 67 (43%) now ship a root-level CLAUDE.md or AGENTS.md. That alone is remarkable for a file category that barely existed two years ago.

But look closer and the story gets stranger. Of those 67 repos, 46 (69%) treat AGENTS.md as the canonical file. AGENTS.md is the vendor-neutral standard now stewarded by the Linux Foundation's Agentic AI Foundation and used by over 60,000 projects. Thirty repos keep CLAUDE.md around purely as an alias: a symlink, an @AGENTS.md import, or a one-line pointer.

The reason is mundane: Claude Code does not natively read AGENTS.md. The feature request has thousands of reactions, and until it lands, the shim is the workaround. It is even the officially sanctioned one: Anthropic's memory docs now tell AGENTS.md repos to create a CLAUDE.md that imports it, or symlink the two. Home Assistant, Ghostty, Dify, and Goose all do exactly this.
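For a repo that already treats AGENTS.md as canonical, either shim is a one-file change. The Python sketch below creates one or the other. The function name `create_claude_shim` is our own invention, and the sketch assumes an AGENTS.md at the repository root and no existing CLAUDE.md. The `@AGENTS.md` line relies on Claude Code's @-import syntax from the memory docs.

```python
from pathlib import Path

def create_claude_shim(repo_root: str, mode: str = "import") -> Path:
    """Create CLAUDE.md as an alias for an existing root-level AGENTS.md (hypothetical helper).

    mode="import"  writes a one-line file using Claude Code's @-import syntax.
    mode="symlink" makes CLAUDE.md a relative symlink to AGENTS.md.
    """
    root = Path(repo_root)
    agents = root / "AGENTS.md"
    claude = root / "CLAUDE.md"
    if not agents.is_file():
        raise FileNotFoundError(f"{agents} does not exist")
    # is_symlink() also catches a dangling link, which exists() would miss.
    if claude.exists() or claude.is_symlink():
        raise FileExistsError(f"{claude} already exists; refusing to overwrite")
    if mode == "import":
        claude.write_text("@AGENTS.md\n", encoding="utf-8")
    elif mode == "symlink":
        # Relative target, so the link still resolves after the repo is cloned or moved.
        claude.symlink_to("AGENTS.md")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return claude
```

The import variant is the more portable of the two. Symlinks can require extra privileges on Windows, and Git may check them out there as plain files that contain only the target path. A one-line import is an ordinary file everywhere.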

And the standards war is not settled. Bun runs the alias in reverse: its 75,000-character CLAUDE.md is canonical and AGENTS.md symlinks to it. Zed points both files at its own .rules. Hugging Face's transformers points both at .ai/AGENTS.md. Browser-use maintains two divergent real files, an 11k-character CLAUDE.md and a 38k-character "AGENTS.md Version 2", which is its own cautionary tale about drift.

Finding 2: nobody agrees how big this file should be

Anthropic's guidance is unambiguous, and unusually quantitative. The best-practices guide says "keep it short and human-readable," and advises asking of every line: "Would removing this cause Claude to make mistakes? If not, cut it. Bloated CLAUDE.md files cause Claude to ignore your actual instructions!" The memory docs put a number on it: "target under 200 lines per CLAUDE.md file. Longer files consume more context and reduce adherence." The AGENTS.md spec, by contrast, deliberately offers no size guidance at all.

Here is what the field actually does. Across the 67 repos with a root memory file, measuring the effective file (following aliases):

Percentile | Effective size
25th | ~2,000 chars
Median | ~5,400 chars (roughly 1,300 tokens)
75th | ~11,200 chars
90th | ~19,200 chars
Max | 74,669 chars (Bun)
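"Effective file" in the table above means the content an agent actually ends up reading once aliases are followed. A simplified version of that resolution might look like the Python sketch below. It follows filesystem symlinks and treats a file whose entire content is a single `@path` line as a pure alias. It ignores free-text pointers and imports embedded mid-file. The helper names and the hop limit are assumptions for illustration, and the survey's actual tooling may differ.

```python
import os
from pathlib import Path
from typing import Optional

MAX_HOPS = 5  # assumption: enough for real alias chains, small enough to stop cycles

def resolve_effective_file(repo_root: str, name: str) -> Optional[Path]:
    """Follow symlinks and whole-file @-imports from repo_root/name to real content."""
    path = Path(repo_root) / name
    for _ in range(MAX_HOPS):
        if not path.exists():                 # missing file or dangling symlink
            return None
        path = Path(os.path.realpath(path))   # collapse any symlink chain
        text = path.read_text(encoding="utf-8").strip()
        # Simplification: only a file that is nothing but one "@target" line counts as an alias.
        if text.startswith("@") and "\n" not in text:
            path = path.parent / text[1:].strip()
            continue
        return path
    return None  # ran out of hops: treat as a cycle

def effective_size_chars(repo_root: str) -> Optional[int]:
    """Character count of the first root memory file that resolves to real content."""
    for name in ("CLAUDE.md", "AGENTS.md"):
        target = resolve_effective_file(repo_root, name)
        if target is not None:
            return len(target.read_text(encoding="utf-8"))
    return None
```

Because both names resolve to the same target in an aliased repo, the size is counted once no matter which file is canonical. This is why Bun's reverse alias and Home Assistant's forward alias land in the same distribution.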

The median project is reasonably aligned with Anthropic's advice: a page or two of dense instructions, well under the 200-line target. The best files in our scored set are tighter still; Ghostty and Home Assistant both land around 40 lines. But a third of projects carry more than 10,000 characters, and the right tail is wild. Bun's file is 75k characters across 435 lines. Nous Research's hermes-agent runs 1,369 lines. OpenClaw, the most-starred project in our entire sample, carries 36k characters.

Here is a wrinkle the line-count metric misses: OpenClaw's file is only 279 lines, barely 40% over the official target, because its compressed telegraph style packs roughly 130 characters into every line. Measured in lines it almost complies; measured in tokens it is seven times the median project. If adherence degrades with context consumed, tokens are the honest unit, and "200 lines" is only a proxy for it.
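The arithmetic is easy to reproduce. The Python sketch below uses the approximate figures quoted in this post and assumes roughly four characters per token. That ratio is a common rule of thumb, not any specific tokenizer, so the token estimates are ballpark only.

```python
CHARS_PER_TOKEN = 4.0   # assumption: rough rule of thumb, not a real tokenizer
LINE_TARGET = 200       # Anthropic's "under 200 lines" guidance
MEDIAN_CHARS = 5_400    # median effective file size reported above

def size_profile(chars: int, lines: int) -> dict:
    """Compare a memory file against the line target and the survey median."""
    return {
        "chars_per_line": chars / lines,
        "x_line_target": lines / LINE_TARGET,
        "est_tokens": chars / CHARS_PER_TOKEN,
        "x_median": chars / MEDIAN_CHARS,
    }

openclaw = size_profile(chars=36_000, lines=279)  # "36k characters" is approximate
bun = size_profile(chars=74_669, lines=435)
print({k: round(v, 1) for k, v in openclaw.items()})
```

For OpenClaw this gives about 129 characters per line, 1.4 times the line target, roughly 9,000 estimated tokens, and about 6.7 times the median file. Bun's 74,669 characters over 435 lines work out to about 172 characters per line, so a line count understates its context cost even more.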

These are not careless teams. Which brings us to the most interesting finding.

Finding 3: the biggest files belong to the most agent-native teams

When we ran ten of these files through our scorer, OpenClaw landed at 75/100, dinged hard on token economy. The orthodox critique writes itself: split that monster into scoped files.

Except OpenClaw already did. The repo contains more than twenty AGENTS.md files. The root file opens by declaring its own architecture: "Telegraph style. Root rules only. Read scoped AGENTS.md before subtree work." The prose is brutally compressed, grammar dropped, maximum rules per token.
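One plausible reading of "Read scoped AGENTS.md before subtree work" is the following: an agent touching a file loads the root AGENTS.md plus every AGENTS.md in the directories between the root and that file, with the most specific file last. The Python sketch below implements that reading. The lookup order and the function name are our assumptions, not OpenClaw's documented behavior.

```python
from pathlib import Path
from typing import List

def scoped_agents_files(repo_root: str, target: str) -> List[Path]:
    """AGENTS.md files that apply to `target`, root first, most specific last.

    Assumption: every AGENTS.md on the path from the repo root down to the
    target's directory applies, and deeper files refine shallower ones.
    """
    root = Path(repo_root).resolve()
    current = (root / target).resolve()
    current.relative_to(root)        # raises ValueError if target escapes the repo
    if not current.is_dir():         # a file (or not-yet-created path): use its directory
        current = current.parent
    chain = []
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            chain.append(candidate)
        if current == root:
            break
        current = current.parent
    return list(reversed(chain))
```

Under this scheme, the root file can stay limited to "root rules only," because subtree detail lives in files that an agent loads only when it works in that subtree.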

So why is it still 36k characters? Because OpenClaw is not using the file the way the guidance assumes. The repo is substantially operated by agents: ClawSweeper, its AI maintainer bot, fans out dozens of parallel workers to review issues and PRs, and the root file explicitly requires every review worker to read it in full before rendering a verdict. Most of the bulk is review law: evidence requirements, merge-risk rules, and verification gates like "No direct ../codex check means no Codex verdict," each one reading like scar tissue from a specific incident of an agent hallucinating dependency behavior.

Anthropic's guidance optimizes a file that helps an agent assist a human. OpenClaw's file governs development that agents perform. When an unsupervised agent can merge a bad PR into a project with 380,000 stars, policy completeness beats token thrift. The maintainer, Peter Steinberger, famously spent over a million dollars in API tokens in a single month. Against that budget, roughly nine thousand tokens of policy preamble per review worker is noise next to the cost of one wrong autonomous merge.

So who is right?

Both, within their assumptions, and the empirical evidence genuinely cuts both ways.

The case for brevity is not just vibes. Chroma's context rot research tested 18 frontier models and found every one degrades as input length grows, even on simple tasks, with mid-context information hit hardest. Anthropic's "important rules get lost in the noise" warning is a real, measured failure mode, not house style.

The case for the giant constitution is operational. Teams running agent fleets have something most CLAUDE.md authors lack: a feedback loop. Thousands of agent runs a day tell you empirically whether a rule earns its tokens. Best practices exist for people who cannot measure; when you can, you override the prior. It is worth noting that Anthropic's own guide closes by saying its patterns "aren't set in stone"; they are starting points.
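Here is one way to make "earns its tokens" concrete. Compare how often a targeted mistake happens in runs with and without a rule, then divide the reduction by what the rule costs in context. The Python sketch below is hypothetical throughout. The `RuleTrial` fields, the per-million-token unit, the threshold, and the example numbers are invented for illustration. A real evaluation would also need significance testing and controls for task mix.

```python
from dataclasses import dataclass

@dataclass
class RuleTrial:
    """Hypothetical ablation record for one rule in an agent memory file."""
    rule_id: str
    token_cost: int        # tokens the rule adds to every run's context
    runs_with: int         # runs where the rule was present
    mistakes_with: int     # targeted mistakes observed in those runs
    runs_without: int      # runs where the rule was removed
    mistakes_without: int  # targeted mistakes observed in those runs

def mistakes_prevented_per_million_tokens(t: RuleTrial) -> float:
    """Reduction in mistake rate, normalized by the rule's context cost."""
    rate_with = t.mistakes_with / t.runs_with
    rate_without = t.mistakes_without / t.runs_without
    prevented_per_run = rate_without - rate_with
    return prevented_per_run / t.token_cost * 1_000_000

def earns_its_tokens(t: RuleTrial, threshold: float) -> bool:
    """Keep the rule only if it clears a team-chosen bar (an assumption, not a standard)."""
    return mistakes_prevented_per_million_tokens(t) >= threshold

trial = RuleTrial("hypothetical-rule", token_cost=40,
                  runs_with=1_000, mistakes_with=2,
                  runs_without=1_000, mistakes_without=30)
print(round(mistakes_prevented_per_million_tokens(trial)))
```

In this made-up example, a 40-token rule cuts a mistake from 30 to 2 per 1,000 runs. That works out to about 700 mistakes prevented per million tokens the rule adds. Whether that clears the bar is a judgment each team makes for itself.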

Our read: the size of your agent memory file should track the autonomy of your agents. Human-supervised assistant? Anthropic's short-file ideal is right, and the Chroma data says ruthless pruning pays. Unsupervised fleet merging PRs? You are not writing a memory aid anymore; you are writing law, and law is long.

What the scorer found in everyone else

For the 99% of projects that are not agent-operated, the benchmark surfaced consistent, fixable patterns:

  • The universal sin is pasting file trees. Deleting a directory tree or structure inventory was the number-one fix for 4 of the 10 files we scored, including two of the three highest scorers. Agents can read your repo; the tree wastes tokens today and lies tomorrow.
  • The most commonly missing line is how to run a single test. It was the top fix for Home Assistant's otherwise excellent file. It is the highest-leverage line you can add, because agents that re-run full suites burn time and tokens on every iteration.
  • Specificity is a solved problem at the top. Across all ten projects, "specific and actionable" was the strongest criterion. Big OSS projects have learned to write "use 2-space indentation" instead of "write clean code."
  • Check what your file asks the model to do. One 98k-star project's file instructs the model to print "!!!!" whenever policy prevents a normal answer, inside a section also asking for a dismissive personality. Your agent config is shared infrastructure; it deserves review like code.
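The first two patterns are mechanical enough to flag automatically. The Python sketch below is a rough heuristic linter, not the scorer described in this post. It counts `tree`-style box-drawing lines and searches for a handful of single-test invocations: pytest node IDs or `-k`, `go test -run`, `cargo test <name>`, and the Jest/Vitest `-t` or `--testNamePattern` filters. The pattern list and the three-line tree threshold are assumptions, and the list is far from exhaustive.

```python
import re
from typing import List

# Lines drawn by `tree`-style listings, e.g. "├── src/" or "│   └── main.py".
TREE_LINE = re.compile(r"^[ \t│]*[├└]── ", re.MULTILINE)

# Assumption: a few common ways to target one test; far from exhaustive.
SINGLE_TEST_PATTERNS = [
    re.compile(r"pytest\b[^\n]*\S::\S"),     # pytest node ID: file.py::test_name
    re.compile(r"pytest\b[^\n]*\s-k\s"),     # pytest -k expression
    re.compile(r"go test\b[^\n]*\s-run\b"),  # go test -run Regex
    re.compile(r"cargo test\s+[A-Za-z_]"),   # cargo test name_filter
    re.compile(r"--testNamePattern|\s-t\s"), # Jest / Vitest name filter
]

def lint_memory_file(text: str, min_tree_lines: int = 3) -> List[str]:
    """Flag two of the fixable patterns listed above. Heuristic, not a scorer."""
    findings = []
    tree_lines = TREE_LINE.findall(text)
    if len(tree_lines) >= min_tree_lines:
        findings.append(
            f"pasted directory tree ({len(tree_lines)} lines): agents can list the repo themselves"
        )
    if not any(p.search(text) for p in SINGLE_TEST_PATTERNS):
        findings.append("no single-test command found: say how to run one test")
    return findings
```

A check this crude can run in CI next to ordinary linters. That is one inexpensive way to start reviewing agent config like code.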

The last of those points is really the thesis. These files are load-bearing now. Nearly half of the biggest projects in open source ship one, agents read them in every conversation, and almost nobody reviews them with the rigor of code. They quietly shape what your agents do.

That is, incidentally, the problem Blume exists to solve: making the hidden files, rules, and skills that steer your agents visible. If you want to know where your own CLAUDE.md stands, the scorer is free. Median time to a graded report: about twenty seconds.


Methodology: we probed the 155 most-starred TypeScript, Python, Rust, and Go repositories on GitHub (plus 19 known adopters, excluded from adoption statistics) for root-level CLAUDE.md and AGENTS.md files in June 2026, following symlinks and imports to measure effective content. Ten files were scored with our Gemini-based scorer against a rubric derived from Anthropic's published guidance. Scores reflect project-root expectations; OpenClaw's file is discussed above as a deliberate exception to them.