OpenSeeker: The First Open-Source Search Agent That Beats Frontier Models

By Prahlad Menon

The most capable search agents today (Perplexity Deep Research, OpenAI Deep Research, Gemini Deep Research) are closed black boxes. You can use them, but you can’t inspect them, reproduce them, or build on them. Their training data is proprietary, their fine-tuning approaches are unpublished, and their benchmark scores are unverifiable.

OpenSeeker changes that.

Published in March 2026 by a purely academic team at SJTU, OpenSeeker is the first open-source search agent to achieve state-of-the-art performance on frontier search benchmarks while releasing everything: training data, model weights, and evaluation code.

What It Achieves

OpenSeeker-v1 was fine-tuned from Qwen3-30B-A3B-Thinking-2507 using just 11.7K training examples. The results:

Benchmark            Score
BrowseComp-ZH        48.4
BrowseComp           29.5
xbench-DeepSearch    74.0
WideSearch           59.4

BrowseComp is OpenAI’s hardest search benchmark: complex, multi-step questions that require synthesizing information across many web pages. Scoring 29.5 on BrowseComp puts OpenSeeker in range of proprietary systems that have had years and massive datasets to train on.

The 11.7K figure is the key. This isn’t a “throw compute at it” result. It’s a data quality and training recipe story, and they’ve open-sourced both.

How It Works

OpenSeeker is a tool-using agent built on a thinking LLM. The architecture is clean:

OpenSeeker/
├── src/
│   ├── llm_tool_openseeker.py   # Core agent loop
│   └── tools/
│       ├── search.py            # Web search
│       └── visit.py             # Page reading
├── eval/
│   ├── generate_answer.py       # Run agent on benchmark
│   └── eval.py                  # Score results
└── run_openseeker.sh            # Deploy model server

The agent loop is iterative: search → read → reason → search again if needed → answer. The Qwen3 thinking model reasons through each step before deciding what tool to call, which is what drives the high scores on complex multi-hop questions.
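As a rough illustration of what that loop looks like in code, here is a minimal sketch of an iterative tool-calling agent driven through an OpenAI-compatible endpoint. It is not the repo’s actual implementation (that lives in src/llm_tool_openseeker.py); the tool schemas, helper functions, port, and model name below are assumptions.

# Minimal sketch of an iterative search-agent loop.
# Illustrative only: tool schemas, helpers, port, and model name are assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

TOOLS = [
    {"type": "function", "function": {
        "name": "search",
        "description": "Web search; returns a list of results.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "visit",
        "description": "Fetch and read a single web page.",
        "parameters": {"type": "object",
                       "properties": {"url": {"type": "string"}},
                       "required": ["url"]}}},
]

def web_search(query: str) -> str:
    """Placeholder: plug in your search backend here."""
    raise NotImplementedError

def visit_page(url: str) -> str:
    """Placeholder: fetch the page and extract readable text."""
    raise NotImplementedError

def run_agent(question: str, max_turns: int = 20) -> str | None:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model="OpenSeeker-v1-30B-SFT", messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:          # no tool requested: this is the answer
            return msg.content
        for call in msg.tool_calls:     # execute each requested tool
            args = json.loads(call.function.arguments)
            if call.function.name == "search":
                result = web_search(args["query"])
            else:
                result = visit_page(args["url"])
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": result})
    return None  # ran out of turns without a final answer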

Running It

# Clone and install
git clone https://github.com/rui-ye/OpenSeeker.git
cd OpenSeeker
conda create --name openseeker python=3.10
conda activate openseeker
pip install -r requirements.txt

# Download model (requires git-xet for HuggingFace large files)
brew install git-xet
git xet install
git clone https://huggingface.co/OpenSeeker/OpenSeeker-v1-30B-SFT

# Configure and start model server
# Edit run_openseeker.sh: set MODEL_PATH to your model directory
bash run_openseeker.sh

# Set API endpoints
source setup_env.sh

# Evaluate
python eval/generate_answer.py \
  --dataset_path /path/to/dataset.jsonl \
  --out_dir ./output

python eval/eval.py \
  --data_path ./output/result_tool200.jsonl \
  --max_workers 20

You need a machine capable of running a 30B model: an A100 or equivalent. The model server is vLLM-based, so standard serving infrastructure applies.
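Before kicking off the full evaluation, a quick request against the OpenAI-compatible API that vLLM exposes confirms the server is up. The port and model name below are assumptions; use whatever run_openseeker.sh and setup_env.sh actually configure.

# Smoke test for the model server (port and model name are assumptions).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
print([m.id for m in client.models.list()])   # should list the served model

resp = client.chat.completions.create(
    model="OpenSeeker-v1-30B-SFT",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)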

Why Open-Sourcing Training Data Matters

The model weights alone aren’t the breakthrough. The training data is.

Fine-tuning a search agent is hard because you need examples of good search behavior: knowing when to search again, how to synthesize conflicting results, how to visit pages selectively. This data is expensive to generate, and almost no one publishes it.
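To make that concrete, here is a purely illustrative sketch of what one such trajectory might contain. The field names and structure are hypothetical, not the published schema; check the released dataset for the real format.

# Hypothetical shape of a single SFT search trajectory (NOT the actual schema).
example = {
    "question": "Which paper first reported ... ?",
    "trajectory": [
        {"role": "assistant", "think": "I should start with a broad search.",
         "tool": "search", "args": {"query": "..."}},
        {"role": "tool", "content": "top search results ..."},
        {"role": "assistant", "think": "Result 3 looks authoritative; read it.",
         "tool": "visit", "args": {"url": "https://example.org/..."}},
        {"role": "tool", "content": "extracted page text ..."},
        {"role": "assistant", "content": "Final answer: ..."},
    ],
}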

OpenSeeker publishes 11.7K of these examples, with a higher-quality batch available on request. That dataset is now available for anyone to fine-tune their own models, run ablations, or build on.

This is what academic AI research is supposed to look like: reproducible, auditable, and genuinely useful to the community.

The Research

The paper is on arXiv at 2603.15594:

OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
Yuwen Du, Rui Ye, Shuo Tang, Xinyu Zhu, Yijun Lu, Yuzhu Cai, Siheng Chen
SJTU, 2026

What to Build With It

A few directions worth exploring:

Domain-specific search agents. The architecture is domain-agnostic. Fine-tune on medical literature searches, legal case research, financial filings; the 11.7K training format gives you a template for curating domain-specific search trajectories.

Smaller distilled versions. 30B is large. The training data could be used to distill a 7B or 14B search agent that trades some benchmark points for deployability.

Search + memory. Pair OpenSeeker with a persistent memory layer (soul.py) so the agent remembers what it searched for in prior sessions and avoids redundant queries. The result is a research agent that compounds knowledge over time rather than starting fresh each session.

RAG replacement. For complex multi-hop questions, OpenSeeker’s iterative search approach may outperform static RAG pipelines. Instead of embedding a fixed corpus, let the agent search live.

Resources