Elephant Alpha: The Mystery 100B Model That Appeared at the Top of OpenRouter for Free

By Prahlad Menon · 2 min read

Last week a 100B-parameter model called Elephant Alpha appeared at the top of OpenRouter’s leaderboard. No announcement. No paper. No known lab.

It was free. It was fast. It was beating paid models that cost $15+/million tokens.

The only description: “a 100B-parameter text model focused on intelligence efficiency, delivering strong reasoning performance while minimizing token usage.” From “a prominent open model lab.”

That’s it. That’s all anyone knows.


What We Know

The specs:

  • 100B parameters
  • 256K context window
  • 32K max output tokens
  • Function calling and structured output
  • Prompt caching
  • $0/million tokens (input and output) during alpha
  • OpenRouter model ID: openrouter/elephant-alpha

The positioning: Intelligence efficiency. Not raw benchmark maximization. Not the biggest context window. Not the deepest reasoning. The explicit design goal is to solve problems in fewer tokens — faster responses, lower cost per task, without the latency of models that think for 2,000 tokens before answering.
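The arithmetic behind that positioning is simple: cost per task is tokens used times price, so a model that answers directly can undercut a cheaper-per-token model that deliberates at length. A minimal sketch, with all prices and token counts hypothetical:

```python
# Cost per task depends on tokens used, not just the per-token price.
# All figures below are hypothetical, for illustration only.

def cost_per_task(prompt_tokens, completion_tokens, in_price, out_price):
    """Prices are USD per million tokens."""
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1e6

# A "thinking" model that emits 2,000 reasoning tokens before a 200-token answer
verbose = cost_per_task(1_000, 2_200, in_price=3.0, out_price=15.0)

# A token-efficient model that answers the same task directly in 300 tokens
efficient = cost_per_task(1_000, 300, in_price=3.0, out_price=15.0)

print(f"verbose: ${verbose:.4f}, efficient: ${efficient:.4f}")
# The efficient model is ~5x cheaper per task at identical per-token pricing.
```

Same per-token price, nearly five times cheaper per task — and correspondingly faster, since fewer output tokens means less generation time.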

The intended use cases:

  • Rapid code completion and debugging
  • Large document processing (research reports, repositories, dependency trees)
  • Agentic workflows — particularly agents running iterative loops where speed matters more than maximal reasoning depth
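The agentic case is the clearest fit: a loop that calls the model once per step cares more about per-call latency than reasoning depth, because every extra second is paid on every iteration. A rough sketch of that loop shape — `call_model` is a stub standing in for a real OpenRouter chat-completions request:

```python
# Sketch of an iterative agent loop. `call_model` is a placeholder; a real
# implementation would POST the messages to the OpenRouter API and return
# the assistant's reply. Fast, token-efficient models shorten every pass.

def call_model(messages):
    # Stubbed for illustration: pretend the model finished in one step.
    return "DONE: stub answer"

def run_agent(task, max_steps=5):
    messages = [{"role": "user", "content": task}]
    reply = ""
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("DONE"):   # model signals it has finished
            break
        messages.append({"role": "user", "content": "Continue."})
    return reply

print(run_agent("Summarize this repository"))
```

With, say, five steps per task, a model that shaves two seconds off each call saves ten seconds per task — which compounds quickly across thousands of tasks.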

The catch: Prompts and completions may be logged during alpha and used to improve the model. Don’t use it with confidential data until the data policy is formalized post-reveal.


The Stealth Model Pattern

This is not the first time this has happened on OpenRouter.

Kilo.ai, which also features Elephant Alpha, previously ran a stealth release called Giga Potato — described as their most popular stealth model. Same pattern: anonymous lab, strong performance, free during evaluation phase, eventual reveal.

The strategy is deliberate:

  1. Eliminates hype bias. Users evaluating an anonymous model judge it on output quality, not lab reputation. GPT-5 gets the benefit of the doubt; an unknown model gets judged on merit.

  2. Collects real-world signal. Alpha users generate diverse prompts across real tasks. That data is more valuable than curated benchmarks for understanding where the model excels and fails.

  3. Builds the reveal. If the model performs well anonymously, the lab reveal becomes a positive news event — “the model you’ve been using and loving is from us.”

The pattern is increasingly common as the LLM market matures and differentiation gets harder. When you can’t compete on name recognition, you compete on surprise.


How It’s Performing

Community reports on Reddit and OpenRouter’s leaderboard indicate Elephant Alpha is notably strong for its stated use cases. Users are finding it:

  • Fast — perceptibly lower latency than comparable-sized models
  • Token-efficient — shorter, more direct answers without sacrificing accuracy
  • Solid on code — comparable to GPT-4o-class models for completion and debugging
  • Effective for document analysis — holds context well over long inputs

Where it’s reportedly weaker:

  • Deep multi-step mathematical reasoning (not its target use case)
  • Creative tasks requiring extended generation
  • Tasks where longer deliberation actually helps

This matches the stated design philosophy exactly. It’s not trying to be the smartest model. It’s trying to be the most useful model per token.


The Comparison That Matters

100B parameters at $0/million tokens, beating a significant slice of the paid leaderboard.

Compare that to:

  • GPT-4o: ~$5–15/million tokens
  • Claude Sonnet 4: ~$3/million tokens
  • Gemini 1.5 Pro: ~$3.50/million tokens

For high-volume agent applications — an agent reading 50 research papers a day and generating summaries, a code assistant processing thousands of files — per-token price dominates operating cost. A model that performs at 80–90% of SOTA capability at $0 (and presumably at low cost post-alpha) changes the economics of building on top of it.
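A back-of-envelope version of that economics for the 50-papers-a-day agent. Token counts and input/output price splits are hypothetical; the per-million figures loosely follow the list above:

```python
# Hypothetical daily cost of a summarization agent at different price points.
PAPERS_PER_DAY = 50
TOKENS_PER_PAPER = 30_000    # assumed input tokens per paper
SUMMARY_TOKENS = 1_000       # assumed output tokens per summary

def daily_cost(in_price, out_price):
    """Prices are USD per million tokens."""
    tokens_in = PAPERS_PER_DAY * TOKENS_PER_PAPER
    tokens_out = PAPERS_PER_DAY * SUMMARY_TOKENS
    return (tokens_in * in_price + tokens_out * out_price) / 1e6

# Illustrative price points (USD per million tokens, input/output)
for name, in_p, out_p in [("GPT-4o (approx.)", 5.0, 15.0),
                          ("Claude Sonnet 4 (approx.)", 3.0, 15.0),
                          ("Elephant Alpha (alpha)", 0.0, 0.0)]:
    print(f"{name}: ${daily_cost(in_p, out_p):.2f}/day")
```

At these assumed rates the paid models land in the $5–9/day range for one such agent; multiply by a fleet of agents and the free-versus-paid gap becomes a real budget line.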


The Unknown

No one knows who made it.

The most plausible guesses circulating:

  • Alibaba / Qwen team — active in releasing capable models, has the infrastructure
  • DeepSeek — has demonstrated intelligence-efficient architectures (MoE)
  • Mistral — European lab, capable of 100B+ work
  • A new lab doing a debut release via stealth

The 100B-parameter size, combined with the efficiency focus and long-context capability, is consistent with a Mixture-of-Experts architecture — where 100B total parameters doesn’t mean 100B active parameters per token. That would explain how it achieves the speed-quality balance it’s reportedly hitting.
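To see why total and active counts diverge, here is a rough parameter-count sketch of a top-k routed MoE transformer. Every number below (layer count, widths, expert count) is invented for illustration — it is not a claim about Elephant Alpha's actual architecture:

```python
# Why a 100B-total MoE can run like a much smaller dense model: only the
# top-k routed experts execute per token. All dimensions are hypothetical.

def moe_params(n_layers, d_model, d_ff, n_experts, top_k):
    attn = n_layers * 4 * d_model * d_model           # attention weights (Q,K,V,O)
    expert_ffn = 2 * d_model * d_ff                   # one expert's FFN weights
    total = attn + n_layers * n_experts * expert_ffn  # every expert counted
    active = attn + n_layers * top_k * expert_ffn     # only routed experts run
    return total, active

total, active = moe_params(n_layers=60, d_model=6144, d_ff=16384,
                           n_experts=8, top_k=2)
print(f"total = {total / 1e9:.0f}B, active = {active / 1e9:.0f}B")
# With these made-up dimensions: ~106B total, but only ~33B active per token.
```

A model like that pays the memory cost of ~106B parameters but the compute and latency cost of ~33B — which is one plausible route to the speed-quality balance users are reporting.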

The reveal will come. When it does, the model’s performance in the wild will already be established — exactly as intended.


How to Try It

OpenRouter:

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openrouter/elephant-alpha",
    "messages": [{"role": "user", "content": "Your task here"}]
  }'

Kilo.ai: Available free in the VS Code extension, CLI (kiloclaw), and web interface.


What to Watch

  • The reveal — which lab, when, what architecture
  • Post-alpha pricing — will it stay affordable or rise to match market rates
  • Data policy — once formalized, whether it’s safe for enterprise use
  • Open weights — if the lab is one that open-sources models, weights may follow

A free, performant, 100B-parameter model with 256K context and function calling, appearing on the leaderboard from nowhere — whether or not you know who made it, it’s worth knowing about.
