Skills vs MCP: The Token Efficiency War (And Why It's Not Either/Or)
When Claude Skills launched in October 2025, developer Simon Willison called them “maybe a bigger deal than MCP.” His reasoning: MCP’s token consumption was killing context windows, and Skills seemed to solve that problem elegantly.
A few months later, we have real benchmarks. And the numbers are stark.
The Benchmark That Changes Everything
Scalekit conducted 75 benchmark runs comparing CLI, CLI+Skills, and MCP on identical GitHub tasks. The model (Claude Sonnet 4) and the prompts were held constant; only the tool interface changed.
Token usage for “What language is this repo?”:
- CLI: 1,365 tokens
- CLI + Skills: 4,724 tokens
- MCP: 44,026 tokens
That’s roughly 32× as many tokens for MCP as for the plain CLI, just to answer a simple question.
The difference? Schema injection. GitHub’s Copilot MCP server exposes 43 tools. Every conversation loads all 43 tool definitions — names, descriptions, input schemas, output schemas — even if the agent only uses one.
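The ratios are easy to check from the quoted numbers. Here is a minimal Python sketch that uses only the token counts and the tool count given above. The per-tool figure is a rough upper bound that assumes the entire gap over the CLI baseline is schema text, which the benchmark itself does not state.

```python
# Token counts quoted in the benchmark for "What language is this repo?"
TOKENS = {"cli": 1_365, "cli_skills": 4_724, "mcp": 44_026}
MCP_TOOL_COUNT = 43  # tools exposed by the MCP server, per the article


def ratio_vs_cli(approach: str) -> float:
    """How many times as many tokens an approach uses as the plain CLI run."""
    return TOKENS[approach] / TOKENS["cli"]


def estimated_tokens_per_tool() -> float:
    """Rough upper bound: assumes the whole MCP-vs-CLI gap is schema text."""
    return (TOKENS["mcp"] - TOKENS["cli"]) / MCP_TOOL_COUNT


if __name__ == "__main__":
    for name in TOKENS:
        print(f"{name:>10}: {ratio_vs_cli(name):.1f}x the CLI token count")
    print(f"~{estimated_tokens_per_tool():.0f} tokens per tool definition (estimate)")
```

Under that assumption, each unused tool definition costs on the order of a thousand tokens per conversation, which is why the tool count matters more than the question being asked.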
The Cost Math
At Claude Sonnet 4 pricing ($3/M input, $15/M output), running 10,000 operations per month:
| Approach | Monthly Cost |
|---|---|
| CLI | ~$3.20 |
| CLI + Skills | ~$4.50 |
| MCP (Direct) | ~$55.20 |
That’s a 17× cost multiplier for MCP. And it gets worse: MCP had a 28% failure rate in testing (timeout errors), while CLI hit 100% reliability.
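The failure rate compounds the cost gap. The sketch below takes the dollar figures from the table as given (it does not try to re-derive them from token prices) and estimates the cost of 10,000 successful operations. The retry model is an assumption, not something the benchmark reports: every failed run is billed in full and re-run until it succeeds, with failures independent of each other.

```python
# Monthly cost figures quoted in the table above (USD, 10,000 operations).
MONTHLY_COST = {"cli": 3.20, "cli_skills": 4.50, "mcp": 55.20}
# Failure rates reported in the article; no rate is given for CLI + Skills.
FAILURE_RATE = {"cli": 0.0, "mcp": 0.28}


def cost_multiplier(approach: str, baseline: str = "cli") -> float:
    """Quoted monthly cost of an approach relative to the baseline."""
    return MONTHLY_COST[approach] / MONTHLY_COST[baseline]


def cost_per_10k_successes(approach: str) -> float:
    """Expected cost of 10,000 successful runs.

    Assumption: failed runs are billed in full and retried independently,
    so the expected number of attempts per success is 1 / (1 - p).
    """
    return MONTHLY_COST[approach] / (1.0 - FAILURE_RATE[approach])


if __name__ == "__main__":
    print(f"MCP vs CLI: {cost_multiplier('mcp'):.2f}x")
    print(f"MCP, per 10k successes: ${cost_per_10k_successes('mcp'):.2f}")
```

With these inputs the effective MCP bill rises from about $55 to about $77, which pushes the multiplier over the CLI from roughly 17× to roughly 24×.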
How Skills Achieve Efficiency
Skills work fundamentally differently from MCP. Instead of injecting tool schemas, Skills inject knowledge about how to use existing tools.
A Skill is just a markdown file with tips:
- Which `gh` flags to use
- Output formatting patterns
- Common workflows
The agent already knows how to use bash. The Skill just makes it smarter about which bash commands to run. No schema overhead. No tool definitions. Just 800 tokens of guidance that reduces tool calls by a third.
Armin Ronacher, creator of Flask, explains why he moved entirely from MCP to Skills:
“Skills are really just short summaries of which skills exist and in which file the agent can learn more about them. Crucially, skills do not actually load a tool definition into the context. The tools remain the same: bash and the other tools the agent already has.”
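To make that description concrete, here is a minimal Python sketch of building such an index. It assumes each skill file starts with a simple frontmatter block containing `name` and `description` fields; the file path, the field values, and the `build_index` helper are hypothetical and are not taken from any particular agent's implementation.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Read simple `key: value` pairs between the leading `---` markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def build_index(skill_files: dict[str, str]) -> str:
    """Render the only skill text that enters the context up front."""
    rows = []
    for path, text in sorted(skill_files.items()):
        meta = parse_frontmatter(text)
        name = meta.get("name", path)
        description = meta.get("description", "")
        rows.append(f"- {name}: {description} (see {path})")
    return "\n".join(rows)


# Hypothetical skill file; the long body is never copied into the index.
EXAMPLE = {
    "skills/github/SKILL.md": (
        "---\nname: github\ndescription: Tips for the gh CLI\n---\n"
        "# Long body the agent reads only when it needs it\n"
    ),
}

if __name__ == "__main__":
    print(build_index(EXAMPLE))
```

The agent sees one line per skill and opens the file itself, with the tools it already has, when a task calls for it.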
The killer feature: Skills can be self-maintaining. When a Skill breaks, you ask the agent to fix it. The agent maintains its own tools. MCP servers, by contrast, can change their APIs without warning, and when they do, your integrations break silently.
MCP’s January 2026 Comeback
MCP didn’t stand still. In January 2026, Anthropic shipped progressive discovery — the same trick that made Skills efficient.
Now when you load an MCP server, you get:
- Tool name + short description: 20-50 tokens each
- Full schema loads only when the agent decides to use that tool
Results:
- Token overhead dropped 85% (77,000 → 8,700 tokens for 50+ tools)
- Tool calling accuracy improved: Claude Opus 4 went from 49% to 74%
This closes the gap significantly. But Skills still win on pure efficiency because they avoid schema injection entirely.
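The mechanism behind progressive discovery can be sketched in a few lines of Python. This is an illustration of the idea only, not the MCP protocol or any SDK's API: the `Tool` and `LazyToolRegistry` classes, the tool names, and the schema are all made up, and character counts stand in as a crude proxy for tokens.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    summary: str  # short description, always visible to the agent
    schema: dict  # full input schema, loaded only on demand


@dataclass
class LazyToolRegistry:
    tools: dict[str, Tool]
    loaded: set[str] = field(default_factory=set)

    def initial_context(self) -> str:
        """Progressive discovery: names and summaries only."""
        return "\n".join(f"{t.name}: {t.summary}" for t in self.tools.values())

    def load_schema(self, name: str) -> str:
        """Called when the agent decides to use one specific tool."""
        self.loaded.add(name)
        return json.dumps(self.tools[name].schema)

    def eager_context(self) -> str:
        """The older behaviour: every schema injected up front."""
        return "\n".join(
            f"{t.name}: {t.summary}\n{json.dumps(t.schema)}"
            for t in self.tools.values()
        )


def demo_registry(n_tools: int = 50) -> LazyToolRegistry:
    """Build hypothetical tools that share one made-up schema."""
    schema = {
        "type": "object",
        "properties": {"repo": {"type": "string"}, "query": {"type": "string"}},
        "required": ["repo"],
    }
    tools = {
        f"tool_{i}": Tool(f"tool_{i}", f"Hypothetical tool number {i}", schema)
        for i in range(n_tools)
    }
    return LazyToolRegistry(tools)


if __name__ == "__main__":
    registry = demo_registry()
    print("eager chars:", len(registry.eager_context()))
    print("lazy chars: ", len(registry.initial_context()))
```

The up-front cost now scales with the length of the summaries rather than the schemas, and the full schema is paid for only by the tools that actually get called.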
When MCP Still Wins
Here’s where the “just use Skills” advice breaks down: multi-user applications.
If you’re building a personal developer tool, CLI+Skills is the obvious choice. The agent inherits your credentials, acts with your permissions, and the only person at risk is you.
But if you’re building B2B SaaS — a project management tool, support platform, or code review assistant — your agent acts as your customer’s employees, inside your customer’s organizations, touching your customer’s data.
That requires:
- Per-user OAuth: Each user grants scoped access. They can revoke it. Your app never touches their credentials.
- Tenant isolation: Acme’s repos must never appear in Globex’s Jira. This is data isolation, not just access control.
- Audit trails: When the security team asks “which user triggered that action?”, you need a protocol-level answer.
CLI agents can’t provide these. The properties that make CLI efficient (ambient auth, arbitrary execution, zero protocol overhead) are exactly what cause security incidents when agents cross from developer tool to customer-facing product.
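The three requirements above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not how MCP or any OAuth library implements them: `UserGrant`, `AuditedGateway`, the scope strings, and the resource names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class AccessDenied(Exception):
    """Raised when a call falls outside the user's grant or tenant."""


@dataclass(frozen=True)
class UserGrant:
    user_id: str
    tenant_id: str
    scopes: frozenset[str]  # what this user explicitly granted


@dataclass
class AuditedGateway:
    resource_owner: dict[str, str]  # resource id -> owning tenant
    audit_log: list[dict] = field(default_factory=list)

    def call(self, grant: UserGrant, action: str, resource: str) -> str:
        # Per-user scope check plus tenant isolation check.
        allowed = (
            action in grant.scopes
            and self.resource_owner.get(resource) == grant.tenant_id
        )
        # Every attempt is recorded, including denied ones.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": grant.user_id,
            "tenant": grant.tenant_id,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })
        if not allowed:
            raise AccessDenied(f"{grant.user_id} may not {action} {resource}")
        return f"{action} on {resource} as {grant.user_id}"


if __name__ == "__main__":
    gateway = AuditedGateway({"acme/repo": "acme", "globex/board": "globex"})
    alice = UserGrant("alice", "acme", frozenset({"repo:read"}))
    print(gateway.call(alice, "repo:read", "acme/repo"))
    try:
        gateway.call(alice, "repo:read", "globex/board")
    except AccessDenied as err:
        print("denied:", err)
    print(len(gateway.audit_log), "audit entries")
```

A shell-based agent has no equivalent choke point: whatever credentials sit in the environment apply to every command, and nothing records which end user a command was run for.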
The OpenClaw security incidents illustrated this perfectly: 10,000+ exposed instances leaking credentials, 12% of community skills found malicious, 770,000 agents vulnerable to remote hijacking. These aren’t code bugs — they’re architectural consequences of running shell access without authorization boundaries.
The Decision Framework
| Scenario | Best Approach |
|---|---|
| Personal automation | CLI + Skills |
| Developer tools | CLI + Skills |
| Internal team tools | Skills (maybe MCP) |
| B2B SaaS (multi-tenant) | MCP with OAuth |
| Customer-facing agents | MCP with Gateway |
The Hybrid Future
The smart play isn’t Skills or MCP — it’s Skills wrapping MCP.
Use Skills for:
- Teaching the agent domain knowledge
- Workflow orchestration
- Context-efficient instruction delivery
Use MCP for:
- Authenticated external integrations
- Multi-user scenarios
- Actions requiring audit trails
“Skills and MCPs aren’t competing solutions to the same problem. They’re fundamentally different architectures serving different purposes. Skills excel at information delivery and adaptive context management. MCPs provide structured tool integration with authorization boundaries.”
Practical Recommendations
If you’re building on OpenClaw:
- Default to Skills for everything that doesn’t require external auth
- Use mcporter to expose MCPs as CLI tools when you need both
- Add an 800-token skill file for any complex tool — it’s the highest-ROI optimization available
- Monitor token usage with observability tools like opik-openclaw
If you’re evaluating architecture:
- Count your tools. If < 10, MCP overhead is manageable
- Count your users. If > 1, you need MCP’s auth infrastructure
- Count your tenants. If > 1, you need MCP’s isolation guarantees
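Those three counts can be written down as a small Python function. It is a sketch of the heuristics exactly as listed, with a hypothetical `recommend` helper and made-up return strings. Note that it is stricter than the decision table, which treats internal team tools as a gray area even though they have more than one user.

```python
def recommend(tool_count: int, user_count: int, tenant_count: int) -> str:
    """Apply the three counting heuristics, most restrictive first."""
    if tenant_count > 1:
        return "MCP (tenant isolation required)"
    if user_count > 1:
        return "MCP (per-user auth required)"
    if tool_count < 10:
        return "CLI + Skills (MCP overhead would also be manageable)"
    return "CLI + Skills"


if __name__ == "__main__":
    print(recommend(tool_count=5, user_count=1, tenant_count=1))
    print(recommend(tool_count=50, user_count=200, tenant_count=1))
    print(recommend(tool_count=3, user_count=10, tenant_count=4))
```

Checking tenants and users before tools reflects the article's argument: token overhead is a cost question, while authorization and isolation are requirements that rule options out.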
The token efficiency war isn’t over. But the winner isn’t Skills or MCP — it’s knowing when to use each.