Agent Safehouse: Kernel-Level Sandboxing for Your AI Coding Agents

By Prahlad Menon

You’ve probably given an AI coding agent the --dangerously-skip-permissions flag at some point. Maybe --yolo. The agent is faster and more autonomous with those flags — but you’ve just handed a probabilistic system full access to your home directory.

That’s where Agent Safehouse comes in.

The Problem With AI Agent Permissions

LLM coding agents are, by nature, probabilistic. There’s always a small chance of an unexpected action — a misunderstood instruction, a hallucinated file path, an overly aggressive cleanup command. The agents themselves know this: that’s why Claude Code has an approval prompt by default.

The issue is that when you bypass those prompts for productivity, you’re relying entirely on the LLM to stay within bounds. And LLMs don’t always stay within bounds.

Agent Safehouse takes a different approach: the kernel enforces the boundary, not the LLM.

How It Works

Safehouse wraps any agent invocation with macOS’s built-in sandbox-exec facility — the Seatbelt sandbox that has shipped with the OS since Mac OS X 10.5 and underpins macOS App Sandbox. This is kernel-level enforcement, not a userspace monitor.

The access model is deny-first: nothing is accessible unless explicitly granted. By default, Safehouse automatically grants:

  • ✅ Read/write to your current project directory (git root by default)
  • ✅ Read-only access to installed toolchains (node_modules, Python, etc.)
  • ❌ ~/.ssh/ — denied
  • ❌ ~/.aws/ — denied
  • ❌ All other repos in your home directory — denied
  • ❌ Personal files, dotfiles, everything else — denied
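Under the hood, a deny-first Seatbelt profile looks roughly like this. This is a hand-written sketch to illustrate the model — the paths and rules are hypothetical, not Safehouse’s actual generated profile:

```lisp
;; Hypothetical minimal Seatbelt profile (sandbox-exec syntax)
(version 1)
(deny default)                          ;; nothing is allowed unless granted below
(allow process-fork process-exec*)      ;; let the agent spawn its tools
(allow file-read*                       ;; read-only access to system toolchains
  (subpath "/usr")
  (subpath "/bin")
  (subpath "/opt/homebrew"))
(allow file-read* file-write*           ;; full access only to the project
  (subpath "/Users/you/projects/my-app"))
```

Anything not matched by an allow rule — ~/.ssh, other repos, dotfiles — falls through to `(deny default)` and is blocked by the kernel.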

The agent thinks it has normal filesystem access. The kernel silently enforces the boundary. Even if the LLM decides to run rm -rf ~, the kernel blocks it before the process ever touches anything outside the sandbox.

Installation: One Shell Script, Zero Dependencies

# 1. Download safehouse (single self-contained script)
mkdir -p ~/.local/bin
curl -fsSL https://raw.githubusercontent.com/eugene1g/agent-safehouse/main/dist/safehouse.sh \
  -o ~/.local/bin/safehouse
chmod +x ~/.local/bin/safehouse

# 2. Run any agent inside Safehouse
cd ~/projects/my-app
safehouse claude --dangerously-skip-permissions

No build step. No dependencies. Just Bash and macOS. The entire tool is a single shell script.
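One assumption in the install step is that ~/.local/bin is already on your PATH. If the safehouse command doesn’t resolve after installing, a generic guard like this in your shell rc fixes it (a standard shell idiom, not from the Safehouse docs):

```shell
# Ensure ~/.local/bin is on PATH so the `safehouse` command resolves.
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;                     # already present, do nothing
  *) export PATH="$HOME/.local/bin:$PATH" ;;     # otherwise prepend it
esac
```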

See It Work

The proof is straightforward — try accessing something outside the sandbox:

# Try to read your SSH private key — denied by the kernel
safehouse cat ~/.ssh/id_ed25519
# cat: /Users/you/.ssh/id_ed25519: Operation not permitted

# Try to list another repo — invisible
safehouse ls ~/other-project
# ls: /Users/you/other-project: Operation not permitted

# But your current project works fine
safehouse ls .
# README.md src/ package.json ...

The kernel blocks access at the syscall level: any attempt to open, list, or stat a path outside the sandbox fails with Operation not permitted. There’s nothing for the agent to negotiate with or route around; from the agent’s perspective, those paths are simply unreachable.

Shell Aliases for Automatic Sandboxing

The most practical setup is to make sandboxed execution the default for every agent command:

# ~/.zshrc or ~/.bashrc
safe() { safehouse --add-dirs-ro="$HOME/mywork" "$@"; }

# Every agent invocation is sandboxed by default
claude() { safe claude --dangerously-skip-permissions "$@"; }
codex()  { safe codex --dangerously-bypass-approvals-and-sandbox "$@"; }
amp()    { safe amp --dangerously-allow-all "$@"; }
gemini() { NO_BROWSER=true safe gemini --yolo "$@"; }

# Bypass when needed: `command claude` (plain, unsandboxed)

With this setup, typing claude always runs inside Safehouse. If you explicitly need unsandboxed access, command claude bypasses the function. The default is safe; the escape hatch requires deliberate intent.
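One refinement worth considering — a sketch of my own, not part of Safehouse itself: guard the wrapper so a missing or broken safehouse install fails loudly instead of silently running the agent with full filesystem access.

```shell
# Defensive wrapper: refuse to fall back to unsandboxed execution
# if the safehouse command is not on PATH.
safe() {
  if command -v safehouse >/dev/null 2>&1; then
    safehouse --add-dirs-ro="$HOME/mywork" "$@"
  else
    echo "safe: safehouse not found; refusing to run unsandboxed" >&2
    return 127
  fi
}
```

With this version, a botched install surfaces immediately as an error rather than quietly degrading to unsandboxed execution.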

Generate Custom Profiles With an LLM

For more complex setups — monorepos, shared libraries, unusual toolchain paths — Safehouse provides a copy-paste prompt for generating custom sandbox-exec profiles using an LLM:

llm-instructions.txt

The prompt tells the model to inspect real Safehouse profile templates, ask about your home directory layout and toolchain, and generate a least-privilege profile for your specific setup. It suggests a durable profile path (~/.config/sandbox-exec.profile) and wrapper scripts for each agent.
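A generated wrapper script might end up looking something like this. The sketch below is hypothetical — sandbox-exec and its -f flag are real, and the profile path is the one the prompt suggests, but the wrapper itself is illustrative, not Safehouse output:

```shell
# Write a hypothetical per-agent wrapper that runs claude under a
# durable custom profile at the path suggested by the prompt.
mkdir -p "$HOME/.local/bin"
cat > "$HOME/.local/bin/claude-sandboxed" <<'EOF'
#!/usr/bin/env bash
# sandbox-exec -f loads a Seatbelt profile from a file;
# exec replaces this shell with the sandboxed agent process.
exec sandbox-exec -f "$HOME/.config/sandbox-exec.profile" claude "$@"
EOF
chmod +x "$HOME/.local/bin/claude-sandboxed"
```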

It’s a nice meta-move: using an AI agent to configure the sandbox that contains AI agents.

Why Kernel-Level Enforcement Matters

The alternative approaches to agent sandboxing all have weaknesses:

  • LLM approval prompts — bypassed with --dangerously-skip-permissions
  • Userspace monitors — the agent can write code to bypass them
  • Docker containers — heavy overhead, breaks toolchain paths
  • VM isolation — even heavier, poor DX for local dev
  • sandbox-exec (Safehouse) — the kernel enforces it; no bypass is possible from userspace

The kernel boundary is the right place to enforce agent permissions for local development. It’s lightweight (negligible runtime overhead on normal operations), composable (profiles can be combined), and bypass-proof from the agent’s process context.

Tested Against Real Agents

The Safehouse docs include agent investigation reports — detailed analyses of how each major coding agent (including Cursor) behaves inside the sandbox. The conclusion: all tested agents work correctly within their allowed project directories and can’t reach anything outside the boundary.

The Bigger Picture: Defense in Depth for AI Development

Agent Safehouse represents a shift in thinking about AI agent security. The current model — “trust the LLM to stay within bounds” — works fine until it doesn’t. Even a small per-invocation failure rate becomes a certainty given enough agent invocations.

Kernel-level sandboxing adds a layer that doesn’t depend on LLM reliability. Your SSH keys are safe not because the agent is well-behaved, but because the kernel doesn’t show them to the agent at all.

For production agent deployments, this principle scales up: microVMs like AWS Firecracker and userspace kernels like Google gVisor provide stronger isolation for cloud agents the same way sandbox-exec provides kernel-level isolation for local ones. The architecture is different, but the principle is identical: enforce the boundary at a layer the agent cannot reason about or escape.

If you’re running coding agents locally on macOS — and especially if you’ve gotten comfortable with --yolo flags — Agent Safehouse is worth five minutes to install.

GitHub: eugene1g/agent-safehouse
Website: agent-safehouse.dev
License: Apache 2.0