DontFeedTheAI: Anonymize Your Data Before It Hits Claude
You're a pentester halfway through an engagement. You've got a pile of scan results (real IPs, internal hostnames, client domain names, maybe some credential hashes) and you need Claude Code to help you analyze them. But sending 192.168.4.17 and acme-corp-dc01.internal to Anthropic's API means that data lives in their logs. Your client's NDA says no.
So you either sanitize everything by hand (tedious, error-prone) or you skip the AI help entirely. Neither option is great.
DontFeedTheAI is a third option: a transparent reverse proxy that sits between Claude Code and Anthropic's API, automatically anonymizes sensitive data in your requests, sends the sanitized version to Claude, then restores the real values in the response before you see it.
How the flow works
- You run Claude Code normally; it points at `localhost` instead of `api.anthropic.com`
- DontFeedTheAI intercepts the request and scans your prompt for sensitive data
- Sensitive values get replaced with realistic surrogates: `192.168.4.17` becomes `10.0.0.42`, `acme-corp` becomes `example-org`, API keys become dummy strings
- The sanitized prompt goes to Anthropic's API; Claude sees only fake data
- Claude's response comes back referencing the surrogate values
- DontFeedTheAI swaps the surrogates back to real values before displaying the response
From your perspective, nothing changes. You type real data, you get real data back. But Anthropic's servers never see the originals.
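The round trip can be pictured as a pair of string substitutions keyed by a persistent mapping. This is a simplified sketch, not the project's actual code; the `anonymize` and `restore` names and the example mapping are invented here for illustration:

```python
# Simplified sketch of the proxy's round trip: swap real values for
# surrogates on the way out, swap them back on the way in.
# Illustrative only -- not DontFeedTheAI's actual implementation.

mapping = {"192.168.4.17": "10.0.0.42", "acme-corp": "example-org"}

def anonymize(prompt: str) -> str:
    """Replace real values with surrogates before the prompt leaves the machine."""
    for real, surrogate in mapping.items():
        prompt = prompt.replace(real, surrogate)
    return prompt

def restore(response: str) -> str:
    """Replace surrogates with real values before showing the response."""
    for real, surrogate in mapping.items():
        response = response.replace(surrogate, real)
    return response

# The API only ever sees the sanitized prompt...
sanitized = anonymize("Scan 192.168.4.17 on the acme-corp network")
# ...and the user only ever sees real values in the restored response.
restored = restore("Host 10.0.0.42 at example-org is reachable")
```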
Dual-layer detection
Simple regexes can catch structured patterns: IP addresses, API keys, JWT tokens, hashes, email addresses. DontFeedTheAI ships with regex rules for all of these (53 test fixtures to validate them).
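These structured formats lend themselves to straightforward rules. A hedged sketch of what such a ruleset might look like (illustrative patterns only, not the project's shipped rules, which are broader and fixture-validated):

```python
import re

# Illustrative detection rules for structured secrets. The real ruleset
# in DontFeedTheAI is more extensive; these are simplified examples.
RULES = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "jwt": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+\b"),
    "sha256": re.compile(r"\b[0-9a-f]{64}\b"),
}

def detect(text: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_value) pairs found in the text."""
    hits = []
    for name, pattern in RULES.items():
        hits.extend((name, match) for match in pattern.findall(text))
    return hits
```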
But regex can't catch everything. A hostname like jenkins-prod-east embedded in a sentence doesn't match a simple pattern. Neither does a client name like "Northwind Financial" mentioned casually in a prompt.
That's where the second layer comes in: a local Ollama LLM runs on your machine and analyzes prompts for context-sensitive items, such as organization names, server names in prose, and project codenames, anything that looks sensitive in context but doesn't fit a regex pattern.
Both layers run locally. Nothing sensitive leaves your machine at any point in the detection process.
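A call to that local layer might be sketched as follows. Ollama's `/api/generate` endpoint and its `response` field are its documented REST API; the model name, detection prompt wording, and function names here are assumptions for illustration:

```python
import json
import urllib.request

# Sketch of the second detection layer: ask a local Ollama model to flag
# context-sensitive items. The request stays on localhost, so nothing
# sensitive leaves the machine. Prompt wording and model name are assumed.

DETECT_PROMPT = (
    "List any organization names, hostnames, or project codenames in the "
    "following text, one per line. Output only the items.\n\n{text}"
)

def parse_items(raw: str) -> list[str]:
    """Turn the model's line-per-item reply into a clean list."""
    return [line.strip() for line in raw.splitlines() if line.strip()]

def llm_detect(text: str, model: str = "llama3") -> list[str]:
    """Send the text to the local Ollama daemon and parse its reply."""
    payload = json.dumps({
        "model": model,
        "prompt": DETECT_PROMPT.format(text=text),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_items(json.load(resp)["response"])
```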
Per-engagement vaults
If you're a consultant working with multiple clients, DontFeedTheAI keeps surrogate mappings isolated in per-engagement vaults. Client A's 10.1.1.5 maps to one surrogate; Client B's 10.1.1.5 maps to a different one. No cross-contamination between projects.
This matters for compliance. If an auditor asks "did any of Client A's data appear in your AI interactions?", you can point to the vault and the visual audit tool that shows every original → surrogate mapping for that engagement.
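The isolation property can be sketched by deriving surrogates from a per-engagement namespace, so the same real value yields different surrogates in different vaults and each vault's mapping doubles as an audit record. This is a hypothetical illustration; the project's actual vault format may differ:

```python
import hashlib

# Sketch of per-engagement surrogate vaults. Surrogates are derived from
# the engagement ID, so the same real value maps differently per client,
# and the mapping dict is the auditable original -> surrogate record.
# Hypothetical illustration only.

class Vault:
    def __init__(self, engagement_id: str) -> None:
        self.engagement_id = engagement_id
        self.mapping: dict[str, str] = {}  # original -> surrogate

    def surrogate_for(self, real: str) -> str:
        """Stable within an engagement; namespaced by engagement ID."""
        if real not in self.mapping:
            d = hashlib.sha256(f"{self.engagement_id}:{real}".encode()).digest()
            # Map two digest bytes into a plausible RFC 1918 surrogate IP.
            self.mapping[real] = f"10.0.{d[0] % 254 + 1}.{d[1] % 254 + 1}"
        return self.mapping[real]

client_a = Vault("client-a")
client_b = Vault("client-b")
client_a.surrogate_for("10.1.1.5")  # Client A's surrogate for this IP
client_b.surrogate_for("10.1.1.5")  # independently derived for Client B
```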
Setup
It's straightforward:
git clone https://github.com/zeroc00I/DontFeedTheAI
cd DontFeedTheAI
pip install -r requirements.txt
python wizard.py
The wizard configures your Ollama model, sets up the proxy port, and walks you through pointing Claude Code at localhost. Works on macOS, Linux, and Windows.
The visual audit tool
One of the more thoughtful features: a built-in audit view that shows you exactly what was replaced and with what. Before you trust the proxy with a real engagement, you can review every substitution it made. If something slipped through or was incorrectly replaced, you catch it here.
Who needs this
- Pentesters and red teamers: the original use case. Client data stays off third-party servers.
- Developers in regulated environments: healthcare, finance, government. If your compliance team says no production data in cloud AI tools, this is your workaround.
- Legal and consulting firms: client names, case details, contract terms. Attorney-client privilege doesn't extend to Anthropic's log retention policy.
- SREs and DevOps: production configs, internal DNS, infrastructure topology. Useful data for debugging with AI, dangerous to expose.
- Researchers: IRB-protected datasets, patient identifiers, proprietary data.
Why not just use alternatives?
- Cloud anonymization APIs (like Presidio as a service): You're sending your sensitive data to another third party to anonymize it before sending it to the first third party. That defeats the purpose.
- Ollama alone: Great for local inference, but you lose Claude's quality. And there's no interception layer; you'd have to manually sanitize prompts.
- Just use Claude directly: Everything goes into Anthropic's logs. For many professional contexts, that's a non-starter.
DontFeedTheAI gives you Claude-quality responses with local-only data handling. The proxy layer is the key innovation: it makes anonymization transparent instead of manual.
The self-improving part
The project ships with an auto-improvement loop: when the regex layer misses a pattern that the LLM layer catches, it can generate a new regex rule and add it to the ruleset. Over time, the regex layer gets smarter and the LLM layer handles fewer edge cases. The 53 test fixtures act as a regression suite so new rules don't break existing detection.
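The loop can be sketched as: take a value the LLM caught, generalize it into a candidate pattern, and adopt the rule only if it passes the regression fixtures. The deliberately naive generalization and the `generalize`/`adopt_rule` names here are assumptions; the project's actual rule generation is more sophisticated:

```python
import re

# Sketch of the auto-improvement loop: when the LLM layer flags a value
# the regex layer missed, derive a candidate rule and only adopt it if it
# matches everything it should and nothing it shouldn't. Illustrative only.

def generalize(value: str) -> str:
    """Turn a caught literal like 'web01-staging' into a broader pattern
    by replacing digit runs with \\d+."""
    return re.sub(r"\d+", r"\\d+", re.escape(value))

def adopt_rule(candidate: str, fixtures: list[tuple[str, bool]]) -> bool:
    """Regression-check a candidate rule against (text, should_match) fixtures."""
    rx = re.compile(candidate)
    return all(bool(rx.search(text)) == expected for text, expected in fixtures)
```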
It's a clever design. Worth checking out if you've been avoiding AI tools because of data sensitivity; this might be the thing that lets you use them responsibly.