Your Coding Agent's Other Problem: It's Reading the Wrong Code (code-review-graph, Part 2)

By Prahlad Menon

This is Part 2 of a two-part series on making coding agents actually reliable. Part 1 covered reasoning quality: Superpowers, Chain-of-Thought, and externalized planning. This one covers input quality.


There is a simpler version of the coding agent problem that gets less attention than the reasoning problem.

Claude Code, on every task, re-reads your entire codebase. Not the relevant files. Not the changed files. The whole thing. On a 500-file project, that's tens of thousands of tokens burned before the model has written a single line.

The result is predictable: the model drowns in irrelevant context. It hallucinates dependencies that don't exist. It misses the actual dependency that does. Review quality degrades not because the model is thinking poorly, but because it's reading poorly.

This is the input quality problem. And it has a clean solution.

What code-review-graph Does

code-review-graph, built by Tirth Patel, parses your repository with Tree-sitter into Abstract Syntax Trees (ASTs) and stores the result as a graph:

  • Nodes: functions, classes, imports
  • Edges: call sites, inheritance chains, test coverage relationships
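The tool itself uses Tree-sitter so it can parse every supported language; as a rough single-language sketch of the same idea, Python's stdlib `ast` module can pull function nodes and import/call edges out of one file. (The `extract_graph` helper below is illustrative, not code-review-graph's actual API.)

```python
import ast

def extract_graph(source: str, module: str):
    """Sketch: collect function/class nodes and import/call edges
    from a single Python source file."""
    tree = ast.parse(source)
    nodes, edges = [], []
    for item in ast.walk(tree):
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            nodes.append((module, item.name))            # graph node
        elif isinstance(item, (ast.Import, ast.ImportFrom)):
            for alias in item.names:
                edges.append(("imports", module, alias.name))
        elif isinstance(item, ast.Call) and isinstance(item.func, ast.Name):
            edges.append(("calls", module, item.func.id))
    return nodes, edges

src = """
import hashlib

def checksum(data):
    return hashlib.sha256(data).hexdigest()
"""
nodes, edges = extract_graph(src, "utils")
# nodes: [("utils", "checksum")]
# edges include: ("imports", "utils", "hashlib")
```

A real implementation resolves attribute calls and cross-file references, which is exactly where Tree-sitter's language-agnostic grammars earn their keep.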

When a file changes, the graph performs blast-radius analysis: trace every caller, every dependent, every test that could be affected by this change. The result is the minimal set of files Claude actually needs to read.
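Conceptually, blast-radius analysis is a breadth-first traversal over reverse dependency edges. A minimal sketch, with an invented dependency map standing in for the real graph:

```python
from collections import deque

def blast_radius(dependents: dict, changed: str) -> set:
    """BFS over reverse dependency edges: everything that
    transitively depends on the changed file."""
    affected, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected

# file -> files that depend on it (hypothetical project)
dependents = {
    "auth.py":   ["api.py", "test_auth.py"],
    "api.py":    ["app.py", "test_api.py"],
    "models.py": ["auth.py"],
}
print(sorted(blast_radius(dependents, "auth.py")))
# ['api.py', 'app.py', 'test_api.py', 'test_auth.py']
```

Note what is absent: `models.py` is a dependency of `auth.py`, not a dependent, so it stays out of the reading set. That asymmetry is the whole trick.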

Instead of dumping 21,000 tokens of Next.js source code into the context window, Claude gets a 4,500-token structural summary of exactly what changed and what it touches.

The Numbers Are Striking

Benchmarked on three real production codebases across six git commits:

| Repo    | Files  | Standard      | With Graph   | Reduction | Review Quality |
|---------|--------|---------------|--------------|-----------|----------------|
| httpx   | 125    | 12,507 tokens | 458 tokens   | 26.2x     | 9.0 vs 7.0     |
| FastAPI | 2,915  | 5,495 tokens  | 871 tokens   | 8.1x      | 8.5 vs 7.5     |
| Next.js | 27,732 | 21,614 tokens | 4,457 tokens | 6.0x      | 9.0 vs 7.0     |
| Average |        | 13,205        | 1,928        | 6.8x      | 8.8 vs 7.2     |

The token reduction is expected. The quality improvement is not, but it makes sense. A model given 458 precisely relevant tokens produces a better review than a model given 12,507 tokens of mixed signal. Less noise, higher signal. Precision beats volume.

Incremental updates run in under 2 seconds on a 2,900-file project. SHA-256 hash checks on every file mean only changed files get re-parsed. The graph stays current automatically on every save and commit.
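The change-detection step can be sketched with stdlib hashing alone (the in-memory cache shape here is an assumption for illustration, not the tool's on-disk format):

```python
import hashlib

def changed_files(contents: dict, cache: dict) -> list:
    """Return paths whose SHA-256 digest differs from the cached one;
    only these need re-parsing. Updates the cache in place."""
    stale = []
    for path, data in contents.items():
        digest = hashlib.sha256(data).hexdigest()
        if cache.get(path) != digest:
            stale.append(path)
            cache[path] = digest
    return stale

cache = {}
files = {"auth.py": b"def login(): ...", "api.py": b"import auth"}
print(changed_files(files, cache))   # first run: both files are new
files["auth.py"] = b"def login(u, p): ..."
print(changed_files(files, cache))   # ['auth.py'] -- only the edited file
```

Hashing content rather than trusting modification times is what makes the check reliable across branch switches and rebases, where mtimes churn but most file contents do not.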

The Mechanism: Why Blast Radius Works

The key insight is that code changes don't affect code randomly. They propagate along dependency edges. If you change auth.py, anything that imports from auth.py might break. Any test that calls those functions needs to run. Nothing else does.

A flat file scan misses this structure entirely. It reads everything or guesses. The graph knows the structure and traces it precisely.

This is not a new idea in software engineering: it's how incremental compilers work, how test impact analysis works, and how monorepo build systems (Bazel, Buck, Nx) decide what to rebuild. code-review-graph applies the same principle to LLM context windows.

How It Connects to Superpowers

In Part 1, the three-layer stack for reliable coding agents looked like this:

soul.py         ← What do I know? (cross-session memory)
Superpowers     ← How should I plan and execute?
Reasoning model ← How should I think?

code-review-graph adds the missing fourth layer:

┌─────────────────────────────────────────────┐
│  soul.py / MEMORY.md                        │  ← cross-session memory
├─────────────────────────────────────────────┤
│  Superpowers                                │  ← planning and execution
├─────────────────────────────────────────────┤
│  code-review-graph                          │  ← precise context
├─────────────────────────────────────────────┤
│  Reasoning model (o3, R1, Claude 3.7)       │  ← per-token thinking
└─────────────────────────────────────────────┘

The layers address different failure modes:

  • Reasoning model: shallow or impulsive thinking
  • code-review-graph: reading the wrong code
  • Superpowers: jumping to implementation without a plan
  • soul.py: forgetting what it did last session

A coding agent with all four layers thinks carefully, reads precisely, plans before acting, and remembers what it learned.

That's the complete picture.

Installation

Claude Code (recommended):

claude plugin marketplace add tirth8205/code-review-graph
claude plugin install code-review-graph@code-review-graph

pip:

pip install code-review-graph
code-review-graph install

Then restart Claude Code and run:

Build the code review graph for this project

Initial build: ~10 seconds for a 500-file project. After that, fully automatic. 12 languages supported (Python, TypeScript, JavaScript, Go, Rust, Java, C#, Ruby, Kotlin, Swift, PHP, C/C++). MCP compatible.


code-review-graph – Tirth Patel
Part 1: Superpowers + CoT + Reasoning Models
soul.py – persistent memory for any LLM agent