PoisonedRAG: 5 Documents Can Hijack Your RAG System 97% of the Time

By Prahlad Menon · 4 min read

The industry spent two years optimizing RAG retrieval quality. The attack surface that makes all of it weaponizable was sitting there the whole time.

PoisonedRAG (arXiv:2402.07867), accepted at USENIX Security 2025, demonstrates that an attacker can inject 5 documents into a knowledge base of 2.68 million texts and control a frontier LLM’s output 97% of the time. The attacker never touches the model. Never sees the retriever. They just write a document.

The Setup You’re Already Running

The target is every enterprise RAG system shipping in 2026:

  • A knowledge base of documents (wikis, support articles, medical records, legal corpora)
  • A retriever that fetches the most relevant documents for each user query
  • An LLM that reads those documents and generates a grounded answer (a minimal sketch of this pipeline follows the list)
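For concreteness, here is a minimal sketch of that pipeline, assuming a sentence-transformers embedding model; generate() is a hypothetical stand-in for whatever LLM call you use, not anything from the paper:

```python
# Minimal RAG pipeline sketch. The embedding model is an arbitrary choice,
# and generate() is a hypothetical stand-in for any LLM call.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(corpus: list[str]) -> np.ndarray:
    # Embed every document once; normalized vectors make dot product = cosine.
    return embedder.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, corpus: list[str], index: np.ndarray, k: int = 5) -> list[str]:
    # Dense retrieval: top-k documents by cosine similarity to the query.
    q = embedder.encode(query, normalize_embeddings=True)
    top = np.argsort(-(index @ q))[:k]
    return [corpus[i] for i in top]

def answer(query: str, corpus: list[str], index: np.ndarray) -> str:
    # Grounded generation: the LLM is told to answer from retrieved context.
    context = "\n\n".join(retrieve(query, corpus, index))
    return generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer from the context only.")
```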

Every defense the industry sells — grounding, citations, retrieval — lives in this pipeline. PoisonedRAG flips the safety story on its head.

From the paper:

“Knowledge databases of RAG systems introduce a new and practical attack surface. An attacker can inject malicious texts into the knowledge database of a RAG system to induce an LLM to generate attacker-desired answers to user questions.”

The target isn’t the model. It’s the library the model reads from.

The Threat Model Is What Makes This Dark

The attacker:

  • Cannot access or query the LLM
  • Cannot see the contents of the knowledge database
  • Cannot access the retriever’s parameters (black-box setting)

All the attacker can do is write a document the retriever might eventually see. That is enough.

The Numbers

From Table 1 of the paper:

Dataset             Corpus Size   ASR (GPT-4)   ASR (PaLM 2)   Injected Texts
Natural Questions   2.68M         97%           97%            5 per question
HotpotQA            5.23M         —             99%            5 per question
MS-MARCO            8.84M         —             91%            5 per question

Five documents out of 2.68 million is roughly 0.0002% of the corpus achieving near-total output control.

Across 8 LLMs tested (GPT-4, GPT-3.5-Turbo, PaLM 2, LLaMA-2 7B/13B, Vicuna 7B/13B/33B), the attack consistently works. A bigger database does not mean a safer RAG.

How It Works

Each malicious text is crafted to satisfy two conditions simultaneously:

  1. Retrieval condition: The embedding vector of the malicious text must be similar to the target question’s embedding, so the retriever fetches it.
  2. Generation condition: The LLM must generate the attacker’s chosen answer when this text alone is used as context.

One document. Two payloads.
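In the black-box setting the paper describes, both conditions can be met by simple concatenation: the target question itself serves as the retrieval payload, and an LLM-written passage asserting the attacker’s answer serves as the generation payload. A minimal sketch, reusing the hypothetical generate() from above; the prompt wording is my own, not the paper’s:

```python
# Sketch of the two-condition poison construction (black-box setting).
# generate() is again a hypothetical LLM call; the prompt is illustrative.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def craft_poison(target_question: str, target_answer: str) -> str:
    # Generation condition: an LLM-written passage that, read as context,
    # supports the attacker-chosen answer.
    payload = generate(
        f"Write a short, factual-sounding passage stating that the answer to "
        f"'{target_question}' is '{target_answer}'."
    )
    # Retrieval condition: prepending the question itself drags the document's
    # embedding toward the question's embedding, so the retriever fetches it.
    return f"{target_question} {payload}"

def retrieval_similarity(question: str, doc: str) -> float:
    # Cosine similarity between question and document; high means retrieved.
    q, d = embedder.encode([question, doc], normalize_embeddings=True)
    return float(q @ d)
```

The concatenation works because the question is, trivially, maximally on-topic for itself; the attacker needs no retriever access to know the similarity will be high.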

How Poison Gets Into Real Corpora

The paper’s own attack-vector analysis:

  • Maliciously editing public resources like Wikipedia
  • Posting fake news or hosting malicious websites that get scraped into corporate knowledge bases
  • Exploiting insider access to private databases

Your enterprise RAG is only as safe as the most corruptible source feeding its pipeline.

Every Known Defense Fails

The authors tested the defenses the industry currently recommends:

Defense                                  Result
Paraphrasing                             ASR stays 79–93%
Perplexity-based detection               False positive rate too high to be usable
Duplicate text filtering                 “The ASR is the same”
Knowledge expansion (top-50 retrieval)   ASR still 41–43%

The paper’s conclusion: “These defenses are insufficient to defend against PoisonedRAG.”

Real-World Stakes

The authors name them directly:

“These threats bring serious safety and ethical concerns for the deployment of RAG systems for real-world applications in healthcare, finance, legal consulting, etc.”

Example from the paper: target question “Who is the CEO of OpenAI?” → attacker-chosen answer “Tim Cook.” The LLM generates it on demand.

Scale that up: a company falsely “confirmed” as bankrupt. A drug falsely “confirmed” as contraindicated. A legal precedent fabricated from whole cloth.

The 2025 OWASP Top 10 for LLMs now includes Vector and Embedding Weaknesses (LLM08:2025) as a new category explicitly covering embedding poisoning and retrieval manipulation.

The Field Is Pivoting From Prevention to Forensics

A follow-up paper from 2025 — Traceback of Poisoning Attacks to Retrieval-Augmented Generation — confirms the direction. Its framing:

“Existing defenses focus on inference-time mitigation and have proven insufficient against sophisticated attacks.”

The research community is accepting that some poisoning will succeed and pivoting to tracing attacks after the fact.

What This Means for You

If you’re building or deploying RAG:

  1. Audit your data pipeline more aggressively than your model. The retrieval layer is the attack surface.
  2. Treat document ingestion as a security boundary, not a data engineering task.
  3. Don’t rely on existing defenses (paraphrasing, perplexity filters) as primary protection — the paper proves they’re insufficient.
  4. Monitor for output anomalies — forensic detection may be more viable than prevention (one illustrative signal is sketched after this list).
  5. Limit corpus write access as tightly as you’d limit database admin credentials.
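None of these steps is a validated defense (the paper shows the obvious ones fail), but the retrieval condition does leave a statistical fingerprint: a poisoned document embeds abnormally close to one specific query. A hypothetical ingestion-time check, my own illustration rather than anything evaluated in the paper:

```python
# Hypothetical ingestion-time heuristic (illustration only, not a defense
# evaluated in the paper): flag documents whose embedding sits unusually
# close to any single known user query, which is exactly the signature
# the retrieval condition of a poisoned document optimizes for.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def flag_suspicious(doc: str, known_queries: list[str], threshold: float = 0.9) -> bool:
    d = embedder.encode(doc, normalize_embeddings=True)
    q = embedder.encode(known_queries, normalize_embeddings=True)
    # Max similarity to any single query; the 0.9 threshold is a guess
    # and would need tuning against your own traffic.
    return float(np.max(q @ d)) >= threshold
```

A determined attacker can tune around any fixed threshold, which is part of why the field is pivoting to post-hoc traceback.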

The headline the industry sold you: “RAG eliminates hallucinations.”

The headline the research supports: RAG creates a new, high-leverage attack surface.


Paper: PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models (USENIX Security 2025)

Code: github.com/sleeepeer/PoisonedRAG