Superpowers, Chain-of-Thought, and the Problem of Impulsive Code: How Reasoning Became External

By Prahlad Menon · 6 min read

There is a consistent failure mode in every coding agent you have ever used.

You describe what you want. The agent immediately starts writing code. It makes assumptions, skips edge cases, ignores the test suite, and produces something that mostly works if you squint at it and don’t push too hard. You ask it to fix one thing. It breaks two others. An hour later you’re back where you started with a messier codebase.

This failure mode has a name in cognitive science: System 1 thinking. Fast, associative, confident, often wrong. And the last several years of LLM research have been, in large part, a sustained effort to make language models slower.

The Reasoning Revolution: Making Models Think Before They Answer

In 2022, Google researchers published a deceptively simple finding: showing a model worked examples that spell out their intermediate reasoning steps dramatically improved performance on math and logic tasks. This was Chain-of-Thought (CoT) prompting — forcing the model to generate intermediate reasoning before producing a final answer. (The zero-shot variant, simply appending “Let’s think step by step” to the prompt, followed a few months later.)

The intuition was right. Models that rush to an answer make different (worse) mistakes than models that work through a problem. The intermediate steps act as a kind of working memory, catching errors that would be invisible in a direct response.
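The original trick is small enough to show in full. Here is a sketch of the two prompt styles, with the model call itself elided (any completion API would accept the returned string as input):

```python
# Minimal sketch of chain-of-thought prompting: the only difference from a
# direct prompt is the reasoning trigger appended to the answer slot. The
# model call is elided; a chat/completions API would take the string below.

def direct_prompt(question: str) -> str:
    """Ask for the answer immediately (System 1 style)."""
    return f"Q: {question}\nA:"

def cot_prompt(question: str) -> str:
    """Nudge the model into generating intermediate reasoning steps first."""
    return f"Q: {question}\nA: Let's think step by step."

if __name__ == "__main__":
    q = ("A bat and a ball cost $1.10. The bat costs $1.00 more than "
         "the ball. How much is the ball?")
    print(direct_prompt(q))
    print(cot_prompt(q))
```

The classic bat-and-ball question above is exactly the kind of problem where the direct prompt invites the confident wrong answer ($0.10) and the CoT prompt tends to recover the right one ($0.05).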

CoT prompting evolved into reasoning models: OpenAI o1 and o3, Claude 3.7 extended thinking, DeepSeek R1. These models allocate additional test-time compute to generate internal reasoning traces — sometimes thousands of tokens of private “thinking” — before producing a final answer. The user sees a cleaner response; behind the scenes, the model has been arguing with itself.

The results are striking. On competition math (AIME), programming (Codeforces), and PhD-level science (GPQA), reasoning models outperform their non-reasoning counterparts by wide margins. Slow thinking works.

But reasoning models have a fundamental limitation for software development: the reasoning is hidden.

You can’t correct a thought the model had internally. You can’t review the design decisions made in the thinking space. You can’t say “wait, you assumed we’d use a relational database, but we’re actually using a document store.” By the time you see the output, the architectural decisions have already been made.

Superpowers: Externalized, Correctable Chain-of-Thought

Superpowers, built by Jesse Vincent, takes the same core insight — slow down before you act — and externalizes it as a structured workflow with human checkpoints.

The workflow has seven stages:

1. Brainstorming — Before writing a line of code, the agent asks clarifying questions, explores alternatives, and presents a design in digestible chunks. Not as internal tokens. As a document you can read and correct.

2. Git worktree setup — Creates an isolated workspace so the implementation doesn’t contaminate your main branch. A clean baseline is verified before work starts.

3. Writing plans — A detailed implementation plan with exact file paths, complete code snippets, and verification steps for each task. Tasks are sized at 2–5 minutes each. Clear enough, as the README puts it, for “an enthusiastic junior engineer with poor taste, no judgement, no project context, and an aversion to testing.”

4. Subagent-driven development — Each task is dispatched to a fresh subagent with two-stage review: spec compliance first, then code quality. No accumulated context drift.

5. Test-driven development — Strict RED-GREEN-REFACTOR. Write failing test. Watch it fail. Write minimal code. Watch it pass. Commit. Code written before a test exists gets deleted.

6. Code review — Between tasks, the agent reviews against the plan. Critical issues block progress.

7. Branch finishing — Verifies tests, presents merge/PR/discard options, cleans up.
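Stage 5 is concrete enough to sketch. Here is a toy RED-GREEN cycle in Python; the `slugify` function and its test are illustrative examples, not part of Superpowers:

```python
# A toy RED-GREEN cycle from stage 5. The unit under test (slugify) is a
# hypothetical example; the point is the order of operations: the test must
# be seen to fail before any implementation code is allowed to exist.

# RED: write the test first, against a stub that cannot pass.
def slugify(title: str) -> str:
    raise NotImplementedError  # stub: no implementation yet

def test_slugify():
    assert slugify("Hello, World!") == "hello-world"

try:
    test_slugify()
    raise AssertionError("expected RED: the test passed against the stub")
except NotImplementedError:
    pass  # RED confirmed: the failing test now justifies writing code

# GREEN: write the minimal implementation that makes the test pass.
def slugify(title: str) -> str:
    words = "".join(c.lower() if c.isalnum() else " " for c in title).split()
    return "-".join(words)

test_slugify()  # GREEN: passes; commit, then refactor with the test as a net
```

In the actual workflow the agent runs the test suite at each of these checkpoints; the discipline, not the example, is the point.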

This is not a suggestion. Superpowers enforces these stages as mandatory workflows, not optional prompts, and the agent checks for relevant skills before starting any task.

The Key Difference: Correctable vs Hidden Reasoning

Here is the structural comparison:

|  | Standard LLM | Reasoning Model (o3/R1) | Superpowers |
|---|---|---|---|
| Reasoning depth | Low | High | Medium-High |
| Reasoning visible? | No | No (hidden tokens) | Yes (documents) |
| Human can correct? | After the fact | After the fact | Before execution |
| Persists across sessions? | No | No | Via saved docs |
| Test coverage enforced? | No | No | Yes (TDD) |

Reasoning models moved the thinking earlier in the process. Superpowers moves the thinking outside the model entirely — into artifacts that you, a human, can review and correct before the code is written.

This matters because the cost of an error scales with when it’s caught. A wrong assumption in the spec is free to fix. A wrong assumption in the implementation costs hours. A wrong assumption in production costs customers.

Where CoT, Reasoning Models, and Superpowers Intersect

The most interesting setup is using a reasoning model with Superpowers: the model’s internal thinking improves the quality of the externalized spec and plan, and the externalized spec and plan constrain the model’s subsequent implementation.

DeepSeek R1, for example, produces noticeably better brainstorming and implementation plans than its non-reasoning predecessor. The internal thinking produces better external artifacts. The external artifacts then bound the implementation. It compounds.

The analogy that fits: a great architect who sketches before they build, and is willing to throw away the sketch.

The Missing Layer: Memory

Both CoT reasoning and Superpowers address reasoning quality within a single session. Neither addresses what happens when the session ends.

A coding agent with Superpowers knows how to build well. It doesn’t remember:

  • Why you chose this architecture three weeks ago
  • That you tried the other approach and it failed
  • What the conventions of this codebase are
  • Who the stakeholders are and what they care about

This is where soul.py closes the loop. Persistent memory and identity across sessions — the same MEMORY.md that SoulSearch uses in the browser, applied to the coding agent context.

The three layers stack cleanly:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  soul.py / MEMORY.md                    β”‚  ← What do I know? (cross-session)
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Superpowers                            β”‚  ← What should I build? (per-session)
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Reasoning model (o3, R1, Claude 3.7)   β”‚  ← How should I think? (per-token)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

A coding agent with all three layers: thinks carefully at the token level, externalizes that thinking into correctable artifacts, and remembers what it learned last time.

That’s a qualitatively different thing from the agent that reads your prompt and immediately starts writing code.

Installation

Claude Code (official marketplace):

/plugin install superpowers@claude-plugins-official

Cursor:

/add-plugin superpowers

Codex / OpenCode / Gemini CLI: See the Superpowers README for platform-specific instructions.


Superpowers: github.com/obra/superpowers
soul.py: github.com/menonpg/soul.py
CoT paper (Wei et al. 2022): arxiv.org/abs/2201.11903


Continue to Part 2: code-review-graph — your agent’s other problem is reading the wrong code