PoisonedRAG: 5 Documents Can Hijack Your RAG System 97% of the Time
The industry spent two years optimizing RAG retrieval quality. The attack surface that makes all of it weaponizable was sitting there the whole time.
PoisonedRAG (arXiv:2402.07867), accepted at USENIX Security 2025, demonstrates that an attacker can inject 5 documents into a knowledge base of 2.6 million texts and control a frontier LLMâs output 97% of the time. The attacker never touches the model. Never sees the retriever. They just write a document.
The Setup Youâre Already Running
The target is every enterprise RAG system shipping in 2026:
- A knowledge base of documents (wikis, support articles, medical records, legal corpora)
- A retriever that fetches the most relevant documents for each user query
- An LLM that reads those documents and generates a grounded answer
Every defense the industry sells â grounding, citations, retrieval â lives in this pipeline. PoisonedRAG flips the safety story on its head.
From the paper:
âKnowledge databases of RAG systems introduce a new and practical attack surface. An attacker can inject malicious texts into the knowledge database of a RAG system to induce an LLM to generate attacker-desired answers to user questions.â
The target isnât the model. Itâs the library the model reads from.
The Threat Model Is What Makes This Dark
The attacker:
- Cannot access or query the LLM
- Cannot see the contents of the knowledge database
- Cannot access the retrieverâs parameters (black-box setting)
All the attacker can do is write a document the retriever might eventually see. That is enough.
The Numbers
From Table 1 of the paper:
| Dataset | Corpus Size | ASR (GPT-4) | ASR (PaLM 2) | Injected Texts |
|---|---|---|---|---|
| Natural Questions | 2.68M | 97% | 97% | 5 per question |
| HotpotQA | 5.23M | â | 99% | 5 per question |
| MS-MARCO | 8.84M | â | 91% | 5 per question |
Thatâs 0.0002% of the corpus achieving near-total output control.
Across 8 LLMs tested (GPT-4, GPT-3.5-Turbo, PaLM 2, LLaMA-2 7B/13B, Vicuna 7B/13B/33B), the attack consistently works. A bigger database does not mean a safer RAG.
How It Works
Each malicious text is crafted to satisfy two conditions simultaneously:
- Retrieval condition: The embedding vector of the malicious text must be similar to the target questionâs embedding, so the retriever fetches it.
- Generation condition: The LLM must generate the attackerâs chosen answer when this text alone is used as context.
One document. Two payloads.
How Poison Gets Into Real Corpora
The paperâs own attack-vector analysis:
- Maliciously editing public resources like Wikipedia
- Posting fake news or hosting malicious websites that get scraped into corporate knowledge bases
- Exploiting insider access to private databases
Your enterprise RAG is only as safe as the most corruptible source feeding its pipeline.
Every Known Defense Fails
The authors tested the defenses the industry currently recommends:
| Defense | Result |
|---|---|
| Paraphrasing | ASR stays 79â93% |
| Perplexity-based detection | False positive rate too high to be usable |
| Duplicate text filtering | âThe ASR is the sameâ |
| Knowledge expansion (top-50 retrieval) | ASR still 41â43% |
The paperâs conclusion: âThese defenses are insufficient to defend against PoisonedRAG.â
Real-World Stakes
The authors name them directly:
âThese threats bring serious safety and ethical concerns for the deployment of RAG systems for real-world applications in healthcare, finance, legal consulting, etc.â
Example from the paper: target question âWho is the CEO of OpenAI?â â attacker-chosen answer âTim Cook.â The LLM generates it on demand.
Scale that up: a company falsely âconfirmedâ as bankrupt. A drug falsely âconfirmedâ as contraindicated. A legal precedent fabricated from whole cloth.
The 2025 OWASP Top 10 for LLMs now includes Vector and Embedding Weaknesses (LLM08:2025) as a new category explicitly covering embedding poisoning and retrieval manipulation.
The Field Is Pivoting From Prevention to Forensics
A follow-up paper from 2025 â Traceback of Poisoning Attacks to Retrieval-Augmented Generation â confirms the direction. Its framing:
âExisting defenses focus on inference-time mitigation and have proven insufficient against sophisticated attacks.â
The research community is accepting that some poisoning will succeed and pivoting to tracing attacks after the fact.
What This Means for You
If youâre building or deploying RAG:
- Audit your data pipeline more aggressively than your model. The retrieval layer is the attack surface.
- Treat document ingestion as a security boundary, not a data engineering task.
- Donât rely on existing defenses (paraphrasing, perplexity filters) as primary protection â the paper proves theyâre insufficient.
- Monitor for output anomalies â forensic detection may be more viable than prevention.
- Limit corpus write access as tightly as youâd limit database admin credentials.
The headline the industry sold you: âRAG eliminates hallucinations.â
The headline the research supports: RAG creates a new, high-leverage attack surface.
Paper: PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models (USENIX Security 2025)