Elephant Alpha: The Mystery 100B Model That Appeared at the Top of OpenRouter for Free

By Prahlad Menon · 2 min read

Last week a 100B-parameter model called Elephant Alpha appeared at the top of OpenRouter’s leaderboard. No announcement. No paper. No known lab.

It was free. It was fast. It was beating paid models that cost $15+/million tokens.

The only description: “a 100B-parameter text model focused on intelligence efficiency, delivering strong reasoning performance while minimizing token usage.” From “a prominent open model lab.”

That’s it. That’s all anyone knows.


What We Know

The specs:

  • 100B parameters
  • 256K context window
  • 32K max output tokens
  • Function calling and structured output
  • Prompt caching
  • $0/million tokens (input and output) during alpha
  • OpenRouter model ID: openrouter/elephant-alpha

The positioning: Intelligence efficiency. Not raw benchmark maximization. Not the biggest context window. Not the deepest reasoning. The explicit design goal is to solve problems in fewer tokens — faster responses, lower cost per task, without the latency of models that think for 2,000 tokens before answering.
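The arithmetic behind that positioning is simple: cost per task is tokens used times price, so a model that answers directly can undercut a cheaper-per-token model that deliberates at length. A minimal sketch, with all prices and token counts hypothetical:

```python
# Cost per task depends on tokens used, not just the per-token price.
# All figures below are hypothetical, for illustration only.

def cost_per_task(prompt_tokens, completion_tokens, in_price, out_price):
    """Prices are USD per million tokens."""
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1e6

# A "thinking" model that emits 2,000 reasoning tokens before a 200-token answer
verbose = cost_per_task(1_000, 2_200, in_price=3.0, out_price=15.0)

# A token-efficient model that answers the same task directly in 300 tokens
efficient = cost_per_task(1_000, 300, in_price=3.0, out_price=15.0)

print(f"verbose: ${verbose:.4f}, efficient: ${efficient:.4f}")
# The efficient model is ~5x cheaper per task at identical per-token pricing.
```

Same per-token price, nearly five times cheaper per task — and correspondingly faster, since fewer output tokens means less generation time.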

The intended use cases:

  • Rapid code completion and debugging
  • Large document processing (research reports, repositories, dependency trees)
  • Agentic workflows — particularly agents running iterative loops where speed matters more than maximal reasoning depth
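The agentic case is the clearest fit: a loop that calls the model once per step cares more about per-call latency than reasoning depth, because every extra second is paid on every iteration. A rough sketch of that loop shape — `call_model` is a stub standing in for a real OpenRouter chat-completions request:

```python
# Sketch of an iterative agent loop. `call_model` is a placeholder; a real
# implementation would POST the messages to the OpenRouter API and return
# the assistant's reply. Fast, token-efficient models shorten every pass.

def call_model(messages):
    # Stubbed for illustration: pretend the model finished in one step.
    return "DONE: stub answer"

def run_agent(task, max_steps=5):
    messages = [{"role": "user", "content": task}]
    reply = ""
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("DONE"):   # model signals it has finished
            break
        messages.append({"role": "user", "content": "Continue."})
    return reply

print(run_agent("Summarize this repository"))
```

With, say, five steps per task, a model that shaves two seconds off each call saves ten seconds per task — which compounds quickly across thousands of tasks.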

The catch: Prompts and completions may be logged during alpha and used to improve the model. Don’t use it with confidential data until the data policy is formalized post-reveal.


The Stealth Model Pattern

This is not the first time this has happened on OpenRouter.

Kilo.ai, which also features Elephant Alpha, previously ran a stealth release called Giga Potato — described as their most popular stealth model. Same pattern: anonymous lab, strong performance, free during evaluation phase, eventual reveal.

The strategy is deliberate:

  1. Eliminates hype bias. Users evaluating an anonymous model judge it on output quality, not lab reputation. GPT-5 gets the benefit of the doubt; an unknown model gets judged on merit.

  2. Collects real-world signal. Alpha users generate diverse prompts across real tasks. That data is more valuable than curated benchmarks for understanding where the model excels and fails.

  3. Builds the reveal. If the model performs well anonymously, the lab reveal becomes a positive news event — “the model you’ve been using and loving is from us.”

The pattern is increasingly common as the LLM market matures and differentiation gets harder. When you can’t compete on name recognition, you compete on surprise.


How It’s Performing

Community reports on Reddit and OpenRouter’s leaderboard indicate Elephant Alpha is notably strong for its stated use cases. Users are finding it:

  • Fast — perceptibly lower latency than comparable-sized models
  • Token-efficient — shorter, more direct answers without sacrificing accuracy
  • Solid on code — comparable to GPT-4o-class models for completion and debugging
  • Effective for document analysis — holds context well over long inputs

Where it’s reportedly weaker:

  • Deep multi-step mathematical reasoning (not its target use case)
  • Creative tasks requiring extended generation
  • Tasks where longer deliberation actually helps

This matches the stated design philosophy exactly. It’s not trying to be the smartest model. It’s trying to be the most useful model per token.


The Comparison That Matters

100B parameters at $0/million tokens, beating a significant slice of the paid leaderboard.

Compare that to:

  • GPT-4o: ~$5–15/million tokens
  • Claude Sonnet 4: ~$3/million tokens
  • Gemini 1.5 Pro: ~$3.50/million tokens

For high-volume agent applications — an agent reading 50 research papers a day and generating summaries, a code assistant processing thousands of files — per-token price dominates operating cost. A model that performs at 80–90% of SOTA capability at $0 (and presumably at low cost post-alpha) changes the economics of building on top of it.
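A back-of-envelope version of that economics for the 50-papers-a-day agent. Token counts and input/output price splits are hypothetical; the per-million figures loosely follow the list above:

```python
# Hypothetical daily cost of a summarization agent at different price points.
PAPERS_PER_DAY = 50
TOKENS_PER_PAPER = 30_000    # assumed input tokens per paper
SUMMARY_TOKENS = 1_000       # assumed output tokens per summary

def daily_cost(in_price, out_price):
    """Prices are USD per million tokens."""
    tokens_in = PAPERS_PER_DAY * TOKENS_PER_PAPER
    tokens_out = PAPERS_PER_DAY * SUMMARY_TOKENS
    return (tokens_in * in_price + tokens_out * out_price) / 1e6

# Illustrative price points (USD per million tokens, input/output)
for name, in_p, out_p in [("GPT-4o (approx.)", 5.0, 15.0),
                          ("Claude Sonnet 4 (approx.)", 3.0, 15.0),
                          ("Elephant Alpha (alpha)", 0.0, 0.0)]:
    print(f"{name}: ${daily_cost(in_p, out_p):.2f}/day")
```

At these assumed rates the paid models land in the $5–9/day range for one such agent; multiply by a fleet of agents and the free-versus-paid gap becomes a real budget line.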


The Unknown

No one knows who made it.

The most plausible guesses circulating:

  • Alibaba / Qwen team — active in releasing capable models, has the infrastructure
  • DeepSeek — has demonstrated intelligence-efficient architectures (MoE)
  • Mistral — European lab, capable of 100B+ work
  • A new lab doing a debut release via stealth

The 100B-parameter size, combined with the efficiency focus and long-context capability, is consistent with a Mixture-of-Experts architecture — where 100B total parameters doesn’t mean 100B active parameters per token. That would explain how it achieves the speed-quality balance it’s reportedly hitting.
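To see why total and active counts diverge, here is a rough parameter-count sketch of a top-k routed MoE transformer. Every number below (layer count, widths, expert count) is invented for illustration — it is not a claim about Elephant Alpha's actual architecture:

```python
# Why a 100B-total MoE can run like a much smaller dense model: only the
# top-k routed experts execute per token. All dimensions are hypothetical.

def moe_params(n_layers, d_model, d_ff, n_experts, top_k):
    attn = n_layers * 4 * d_model * d_model           # attention weights (Q,K,V,O)
    expert_ffn = 2 * d_model * d_ff                   # one expert's FFN weights
    total = attn + n_layers * n_experts * expert_ffn  # every expert counted
    active = attn + n_layers * top_k * expert_ffn     # only routed experts run
    return total, active

total, active = moe_params(n_layers=60, d_model=6144, d_ff=16384,
                           n_experts=8, top_k=2)
print(f"total = {total / 1e9:.0f}B, active = {active / 1e9:.0f}B")
# With these made-up dimensions: ~106B total, but only ~33B active per token.
```

A model like that pays the memory cost of ~106B parameters but the compute and latency cost of ~33B — which is one plausible route to the speed-quality balance users are reporting.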

The reveal will come. When it does, the model’s performance in the wild will already be established — exactly as intended.


How to Try It

OpenRouter:

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openrouter/elephant-alpha",
    "messages": [{"role": "user", "content": "Your task here"}]
  }'

Kilo.ai: Available free in the VS Code extension, CLI (kiloclaw), and web interface.


What to Watch

  • The reveal — which lab, when, what architecture
  • Post-alpha pricing — will it stay affordable or rise to match market rates
  • Data policy — once formalized, whether it’s safe for enterprise use
  • Open weights — if the lab is one that open-sources models, weights may follow

A free, performant, 100B-parameter model with 256K context and function calling, appearing on the leaderboard from nowhere — whether or not you know who made it, it’s worth knowing about.
