What is the Karpathy CLAUDE.md?

It's a single CLAUDE.md file derived from Andrej Karpathy's observations on LLM coding pitfalls. It encodes four principles — Think Before Coding, Simplicity First, Surgical Changes, and Goal-Driven Execution — that directly address the most common failure modes of AI coding agents. Drop it into any project to improve Claude Code, Cursor, or Copilot CLI behavior.

What are the four Karpathy principles for AI coding agents?

1) Think Before Coding — state assumptions explicitly, ask rather than guess, push back when a simpler approach exists, stop when confused. 2) Simplicity First — minimum code that solves the problem, no speculative features or abstractions. 3) Surgical Changes — touch only what the request requires, don't improve adjacent code. 4) Goal-Driven Execution — define verifiable success criteria before starting, write tests first.

How do I install the Karpathy CLAUDE.md for Claude Code?

Option A (plugin): In Claude Code, run '/plugin marketplace add forrestchang/andrej-karpathy-skills' then '/plugin install andrej-karpathy-skills@karpathy-skills'. Option B (per-project): curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

What problem does 'Think Before Coding' solve?

LLMs often pick an interpretation silently and run with it — making wrong assumptions without surfacing them. Think Before Coding forces the agent to state assumptions explicitly, present multiple interpretations when ambiguity exists, push back when a simpler approach exists, and stop when confused rather than generating plausible-looking wrong code.

What does Simplicity First mean for AI coding agents?

No features beyond what was asked. No abstractions for single-use code. No 'flexibility' or 'configurability' that wasn't requested. No error handling for impossible scenarios. If 200 lines could be 50, rewrite it. The test: would a senior engineer say this is overcomplicated? If yes, simplify.

What does Surgical Changes prevent?

AI agents often 'improve' adjacent code, reformat things they didn't touch, or delete dead code that wasn't part of the request — creating diff noise and introducing unintended changes. Surgical Changes means: only edit what the request requires, match existing style even if you'd do it differently, mention unrelated issues but don't fix them.

What is Goal-Driven Execution?

Transform imperative tasks into verifiable goals. Instead of 'add validation', say 'write tests for invalid inputs, then make them pass'. Instead of 'fix the bug', say 'write a test that reproduces it, then make it pass'. Strong success criteria let the agent loop independently without constant clarification.

Does this work with Cursor and Copilot CLI?

Yes. The repo includes a committed Cursor project rule. The CLAUDE.md approach works with any coding agent that reads project context files. The principles themselves apply regardless of which agent you're using — they're about how the agent should reason, not Claude-specific behavior.

The Karpathy CLAUDE.md: Four Rules That Fix AI Coding Agents

By Prahlad Menon · Published 2026-04-19 · 4 min read

Andrej Karpathy posted a thread about LLM coding agents that hit a nerve. Not because it was surprising, but because it named things precisely:

“The models make wrong assumptions on your behalf and just run along with them without checking. They don’t manage their confusion, don’t seek clarifications, don’t surface inconsistencies, don’t present tradeoffs, don’t push back when they should.”

“They really like to overcomplicate code and APIs, bloat abstractions, don’t clean up dead code… implement a bloated construction over 1000 lines when 100 would do.”

“They still sometimes change/remove comments and code they don’t sufficiently understand as side effects, even if orthogonal to the task.”

Three failure modes. Most developers who use Claude Code, Cursor, or Copilot CLI daily recognize all three.

multica-ai/andrej-karpathy-skills packages its answer as a single CLAUDE.md file that addresses all three: four principles, one file, ready to drop into any project.

The Four Principles

1. Think Before Coding

LLMs pick an interpretation and run with it. This principle forces explicit reasoning:

State assumptions explicitly — if uncertain, ask rather than guess
Present multiple interpretations — don’t pick silently when ambiguity exists
Push back when warranted — if a simpler approach exists, say so
Stop when confused — name what’s unclear and ask for clarification

The failure mode it fixes: agents that generate 200 lines of confident, wrong code because they assumed one meaning of an ambiguous requirement.
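To make the failure mode concrete, take a hypothetical one-line request: "remove duplicates from the list of user IDs." Two readings are plausible, and they return different results. The Python sketch below writes both out (the function names are invented for this example). Under Think Before Coding, the agent names this fork and asks, instead of silently shipping one.

```python
def dedupe_keep_order(ids):
    """Interpretation A: drop repeats, keep first-seen order."""
    # dict keys preserve insertion order (Python 3.7+), so the first occurrence of each ID wins.
    return list(dict.fromkeys(ids))


def dedupe_sorted(ids):
    """Interpretation B: return the unique IDs in sorted order."""
    return sorted(set(ids))


ids = [3, 1, 3, 2, 1]
print(dedupe_keep_order(ids))  # [3, 1, 2]
print(dedupe_sorted(ids))      # [1, 2, 3]
```

Either function is trivial to write. The costly part is choosing one without saying so.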

2. Simplicity First

Minimum code that solves the problem. Nothing speculative.

No features beyond what was asked
No abstractions for single-use code
No “flexibility” or “configurability” that wasn’t requested
No error handling for impossible scenarios
If 200 lines could be 50, rewrite it

The test: Would a senior engineer say this is overcomplicated? If yes, simplify.

This directly addresses the pattern where agents implement a class hierarchy with dependency injection when you asked for a script.
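For a sense of scale, take a hypothetical request (not from the repo) for "a script that counts word frequencies." The overbuilt answer adds a tokenizer interface, a pluggable counting strategy, and a factory to wire them together. A sketch of what Simplicity First asks for instead, using only the Python standard library:

```python
import re
from collections import Counter


def word_counts(text):
    """Count case-insensitive word occurrences in text."""
    # \w+ matches runs of letters, digits, and underscores; lower() merges "The" and "the".
    return Counter(re.findall(r"\w+", text.lower()))


print(word_counts("The cat and the hat").most_common(2))  # [('the', 2), ('cat', 1)]
```

If a later request needs stemming or a custom tokenizer, that is the point to add it.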

3. Surgical Changes

Touch only what you must. Clean up only your own mess.

When editing existing code:

Don’t “improve” adjacent code, comments, or formatting
Don’t refactor things that aren’t broken
Match existing style, even if you’d do it differently
If you notice unrelated dead code, mention it — don’t delete it

When your changes create orphans:

Remove imports/variables/functions that your changes made unused
Don’t remove pre-existing dead code unless asked

The test: Every changed line should trace directly to the user’s request.

This eliminates the diff noise problem — PRs where the agent changed 400 lines but only 20 were actually requested.
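One way to apply "every changed line should trace to the request" during review is to list exactly which lines changed and check each one. The sketch below uses Python's standard `difflib` on a hypothetical before/after pair where the request was "load the config from TOML instead of JSON"; `changed_lines` is a helper name made up for this example.

```python
import difflib


def changed_lines(before, after):
    """Return (removed, added) lines between two versions of a file."""
    a, b = before.splitlines(), after.splitlines()
    removed, added = [], []
    # Opcodes other than "equal" cover every line the edit deleted, inserted, or replaced.
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if tag != "equal":
            removed.extend(a[i1:i2])
            added.extend(b[j1:j2])
    return removed, added


# Hypothetical request: "load the config from TOML instead of JSON".
BEFORE = '''import json
import os

def load_config(path):
    with open(path) as f:
        return json.load(f)

def legacy_helper():  # pre-existing dead code
    return os.getcwd()
'''

AFTER = '''import tomllib
import os

def load_config(path):
    with open(path, "rb") as f:
        return tomllib.load(f)

def legacy_helper():  # pre-existing dead code
    return os.getcwd()
'''

removed, added = changed_lines(BEFORE, AFTER)
for line in removed:
    print("-", line)
for line in added:
    print("+", line)
```

Every printed line maps to the TOML request. The `json` import goes because the change orphaned it, while `legacy_helper` stays because it was dead before the edit.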

4. Goal-Driven Execution

Define success criteria. Loop until verified.

Instead of…	Transform to…
"Add validation"	"Write tests for invalid inputs, then make them pass"
"Fix the bug"	"Write a test that reproduces it, then make it pass"
"Refactor X"	"Ensure tests pass before and after"
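Here is a sketch of the first row of the table in practice. `parse_port` is a hypothetical function, and its rules (an integer from 1 to 65535) are assumptions chosen for the example. The tests for invalid inputs are written first and define "done"; the implementation is whatever makes them pass.

```python
import unittest


def parse_port(value):
    """Parse a TCP port number from a string, rejecting anything outside 1-65535."""
    # str.isdigit() rejects signs, spaces, and decimal points before int() runs.
    if not value.isdigit():
        raise ValueError(f"not a port number: {value!r}")
    port = int(value)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortTests(unittest.TestCase):
    # Written first: these encode the success criteria for "add validation".
    def test_rejects_non_numeric(self):
        for bad in ["", "http", "80a", "-1", " 80"]:
            with self.assertRaises(ValueError):
                parse_port(bad)

    def test_rejects_out_of_range(self):
        for bad in ["0", "65536", "99999"]:
            with self.assertRaises(ValueError):
                parse_port(bad)

    def test_accepts_valid_ports(self):
        self.assertEqual(parse_port("80"), 80)
        self.assertEqual(parse_port("65535"), 65535)
```

Saved as `test_ports.py` (a hypothetical filename), the tests run with `python -m unittest test_ports`. Watching them fail before `parse_port` exists and pass afterward is the verifiable goal.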

For multi-step tasks, state a brief plan with verifiable checkpoints:

1. [Step] → verify: [check]
2. [Step] → verify: [check]  
3. [Step] → verify: [check]

Strong success criteria let the agent loop independently. Weak criteria (“make it work”) require constant clarification.
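The step → verify template can also be read as a small control loop: do a step, run its check, and stop at the first check that fails instead of pressing on. A minimal Python sketch; `Step`, `run_plan`, and the toy steps are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    action: Callable[[dict], None]  # does the work, mutating shared state
    verify: Callable[[dict], bool]  # the checkpoint that must pass before moving on


def run_plan(steps, state):
    """Run steps in order; return the description of the first failed step, or None."""
    for number, step in enumerate(steps, start=1):
        step.action(state)
        passed = step.verify(state)
        print(f"{number}. {step.description} -> verify: {'ok' if passed else 'FAILED'}")
        if not passed:
            # Stop instead of building later steps on an unverified result.
            return step.description
    return None


# Toy plan with made-up steps, mirroring the "[Step] -> verify: [check]" template.
plan = [
    Step("load records",
         lambda s: s.update(records=[3, -1, 2]),
         lambda s: len(s["records"]) == 3),
    Step("drop negatives",
         lambda s: s.update(records=[r for r in s["records"] if r >= 0]),
         lambda s: all(r >= 0 for r in s["records"])),
    Step("sort records",
         lambda s: s["records"].sort(),
         lambda s: s["records"] == [2, 3]),
]

failed = run_plan(plan, {})
print("all checkpoints passed" if failed is None else f"stopped at: {failed}")
```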

Installing It

Claude Code plugin (all projects):

/plugin marketplace add forrestchang/andrej-karpathy-skills
/plugin install andrej-karpathy-skills@karpathy-skills

Per-project CLAUDE.md:

# New project
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

# Existing project (append)
echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md

Works with Cursor (committed project rule included in the repo) and any agent that reads project context files.
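The append command above has no guard, so running it twice puts two copies of the rules in CLAUDE.md. If you script the install, a check for existing content avoids that. A Python sketch, assuming the same raw URL as the curl commands; `merge_rules` and `install` are hypothetical helpers, not part of the repo.

```python
import urllib.request
from pathlib import Path

RULES_URL = "https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md"


def merge_rules(existing, rules):
    """Return existing CLAUDE.md text with rules appended once, separated by a blank line."""
    if rules.strip() in existing:
        return existing  # already installed; appending again would duplicate the rules
    if not existing:
        return rules
    return existing.rstrip("\n") + "\n\n" + rules


def install(project_dir="."):
    """Download the rules and merge them into the project's CLAUDE.md (hypothetical helper)."""
    path = Path(project_dir) / "CLAUDE.md"
    with urllib.request.urlopen(RULES_URL) as response:
        rules = response.read().decode("utf-8")
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(merge_rules(existing, rules), encoding="utf-8")
```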

Why This Works

CLAUDE.md files are loaded into the agent’s context at the start of every session. They’re not prompts you have to remember to include — they’re always there, shaping how the agent reasons about every task in that project.

The Karpathy principles work because they’re behavioral constraints, not domain knowledge. You don’t need to tell the agent what your codebase does. You need to tell it how to reason about any codebase — and these four principles do that in under 200 words.

The principles also generalize; they’re not Claude-specific. Cursor, Copilot CLI, Gemini CLI, or any other agent that reads project instructions benefits from explicit constraints on assumption-making, scope creep, and success criteria.
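Conceptually, loading a context file is concatenation that the tool performs so you don't have to. The sketch below is a hypothetical harness, not how Claude Code or any other agent is actually implemented: `build_session_prompt` reads a project's CLAUDE.md, if one exists, and places it ahead of the task text.

```python
import tempfile
from pathlib import Path


def build_session_prompt(project_dir, task):
    """Prepend a project's CLAUDE.md, if present, to the task text (hypothetical harness)."""
    rules_file = Path(project_dir) / "CLAUDE.md"
    if not rules_file.is_file():
        return task
    rules = rules_file.read_text(encoding="utf-8").strip()
    # Every task in this project gets the same rules up front; nobody has to paste them in.
    return f"Project instructions:\n{rules}\n\nTask:\n{task}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "CLAUDE.md").write_text("Think before coding.\n", encoding="utf-8")
        print(build_session_prompt(tmp, "Add validation to parse_port"))
```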

Resources

GitHub: github.com/multica-ai/andrej-karpathy-skills
Original Karpathy post: x.com/karpathy/status/2015883857489522876
Multica (related project — open-source platform for coding agents with reusable skills): github.com/multica-ai/multica