The Karpathy CLAUDE.md: Four Rules That Fix AI Coding Agents
Andrej Karpathy posted a thread about LLM coding agents that hit a nerve. Not because it was surprising, but because it named things precisely:
“The models make wrong assumptions on your behalf and just run along with them without checking. They don’t manage their confusion, don’t seek clarifications, don’t surface inconsistencies, don’t present tradeoffs, don’t push back when they should.”
“They really like to overcomplicate code and APIs, bloat abstractions, don’t clean up dead code… implement a bloated construction over 1000 lines when 100 would do.”
“They still sometimes change/remove comments and code they don’t sufficiently understand as side effects, even if orthogonal to the task.”
Three failure modes. Most developers who use Claude Code, Cursor, or Copilot CLI daily recognize all three.
multica-ai/andrej-karpathy-skills is a single CLAUDE.md file that addresses all of them — four principles, one file, drop it in any project.
The Four Principles
1. Think Before Coding
LLMs pick an interpretation and run with it. This principle forces explicit reasoning:
- State assumptions explicitly — if uncertain, ask rather than guess
- Present multiple interpretations — don’t pick silently when ambiguity exists
- Push back when warranted — if a simpler approach exists, say so
- Stop when confused — name what’s unclear and ask for clarification
The failure mode it fixes: agents that generate 200 lines of confident, wrong code because they assumed one meaning of an ambiguous requirement.
2. Simplicity First
Minimum code that solves the problem. Nothing speculative.
- No features beyond what was asked
- No abstractions for single-use code
- No “flexibility” or “configurability” that wasn’t requested
- No error handling for impossible scenarios
- If 200 lines could be 50, rewrite it
The test: Would a senior engineer say this is overcomplicated? If yes, simplify.
This directly addresses the pattern where agents implement a class hierarchy with dependency injection when you asked for a script.
3. Surgical Changes
Touch only what you must. Clean up only your own mess.
When editing existing code:
- Don’t “improve” adjacent code, comments, or formatting
- Don’t refactor things that aren’t broken
- Match existing style, even if you’d do it differently
- If you notice unrelated dead code, mention it — don’t delete it
When your changes create orphans:
- Remove imports/variables/functions that your changes made unused
- Don’t remove pre-existing dead code unless asked
The test: Every changed line should trace directly to the user’s request.
This eliminates the diff noise problem — PRs where the agent changed 400 lines but only 20 were actually requested.
4. Goal-Driven Execution
Define success criteria. Loop until verified.
| Instead of… | Transform to… |
|---|---|
| ”Add validation" | "Write tests for invalid inputs, then make them pass" |
| "Fix the bug" | "Write a test that reproduces it, then make it pass" |
| "Refactor X" | "Ensure tests pass before and after” |
For multi-step tasks, state a brief plan with verifiable checkpoints:
1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]
Strong success criteria let the agent loop independently. Weak criteria (“make it work”) require constant clarification.
Installing It
Claude Code plugin (all projects):
/plugin marketplace add forrestchang/andrej-karpathy-skills
/plugin install andrej-karpathy-skills@karpathy-skills
Per-project CLAUDE.md:
# New project
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md
# Existing project (append)
echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
Works with Cursor (committed project rule included in the repo) and any agent that reads project context files.
Why This Works
CLAUDE.md files are loaded into the agent’s context at the start of every session. They’re not prompts you have to remember to include — they’re always there, shaping how the agent reasons about every task in that project.
The Karpathy principles work because they’re behavioral constraints, not domain knowledge. You don’t need to tell the agent what your codebase does. You need to tell it how to reason about any codebase — and these four principles do that in under 200 words.
The principles also generalize. They’re not Claude-specific. Cursor, Copilot CLI, Gemini CLI — any agent that reads project instructions benefits from explicit constraints on assumption-making, scope-creep, and success criteria.
Resources
- GitHub: github.com/multica-ai/andrej-karpathy-skills
- Original Karpathy post: x.com/karpathy/status/2015883857489522876
- Multica (related project — open-source platform for coding agents with reusable skills): github.com/multica-ai/multica