PinchTab: The 12MB Binary That Gives AI Agents Full Browser Control
One of the biggest bottlenecks for AI agents, browser control, now has a fix that fits in a 12MB file.
PinchTab is a standalone Go binary that gives any AI agent full browser control through a plain HTTP API. Your agent sends a request, and PinchTab clicks, types, navigates, and extracts data from Chrome. No WebDriver setup, no Playwright dependencies, no complex SDKs.
Why This Matters
Browser automation has long been a pain point for AI agents. The usual approaches run into four problems:
- Screenshots: expensive. A single page screenshot can burn 10,000+ tokens just to “see” what’s on screen.
- Heavy dependencies: Playwright, Puppeteer, and Selenium all require substantial runtime environments.
- Bot detection: many sites block automated browsers instantly.
- Language lock-in: most solutions work best with JavaScript or Python, leaving other stacks behind.
PinchTab addresses all of these.
The Token Efficiency Play
Here’s where PinchTab really shines: instead of taking expensive screenshots, it parses the accessibility tree — the same structure screen readers use. This gives your agent a semantic understanding of the page at roughly 800 tokens instead of 10,000+.
At those figures, that’s roughly a 12.5x reduction in token costs for browser-based tasks (10,000 / 800), which the rest of this post rounds to 13x.
# Get page structure (token-efficient)
pinchtab snap -i -c
# Extract text (~800 tokens vs 10,000 for screenshot)
pinchtab text
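The ratio quoted above is plain division and easy to check. A minimal sketch, using only the two figures given in this post; real token counts vary by page and by model, and the function name is illustrative:

```python
def token_reduction(screenshot_tokens: int, a11y_tokens: int) -> float:
    """Return how many times cheaper the accessibility-tree read is."""
    if a11y_tokens <= 0:
        raise ValueError("a11y_tokens must be positive")
    return screenshot_tokens / a11y_tokens


# Figures quoted in this post; treat them as ballpark numbers, not measurements.
ratio = token_reduction(10_000, 800)
print(f"{ratio:.1f}x fewer tokens per page read")  # 12.5x fewer tokens per page read
```

Because the screenshot figure is a floor ("10,000+"), 12.5x is the low end of the saving at these numbers.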
The accessibility-first approach also means stable element references. Instead of fragile CSS selectors or pixel coordinates, you get refs like e5 that represent actual interactive elements.
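To make that concrete, here is a sketch of how an agent might pick a ref by role and label instead of by selector. The node shape (`ref`, `role`, `name` keys) and the sample data are assumptions for illustration, not PinchTab's documented snapshot format; check the docs for the real field names.

```python
from typing import Optional

# Hypothetical snapshot nodes: the keys "ref", "role" and "name" are assumed
# for illustration and may not match PinchTab's actual response.
SAMPLE_NODES = [
    {"ref": "e3", "role": "textbox", "name": "Email"},
    {"ref": "e4", "role": "textbox", "name": "Password"},
    {"ref": "e5", "role": "button", "name": "Sign in"},
]


def find_ref(nodes: list[dict], role: str, name: str) -> Optional[str]:
    """Return the ref of the first node matching role and case-insensitive name."""
    wanted = name.strip().lower()
    for node in nodes:
        label = node.get("name", "").strip().lower()
        if node.get("role") == role and label == wanted:
            return node.get("ref")
    return None


print(find_ref(SAMPLE_NODES, "button", "sign in"))  # e5
```

Matching on role and label is what makes the reference stable: a restyled page changes CSS classes and pixel positions, but a "Sign in" button usually stays a button named "Sign in".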
Dead Simple API
Whether you’re building in Python, TypeScript, Go, Rust, or anything else — if you can make an HTTP request, you can control a browser:
# Navigate to a page ($TAB holds the ID of an open tab)
curl -X POST "http://localhost:9867/tabs/$TAB/navigate" \
-d '{"url":"https://example.com"}'
# Get interactive elements
curl "http://localhost:9867/tabs/$TAB/snapshot?filter=interactive"
# Click an element
curl -X POST "http://localhost:9867/tabs/$TAB/action" \
-d '{"kind":"click","ref":"e5"}'
# Fill a form field
curl -X POST "http://localhost:9867/tabs/$TAB/action" \
-d '{"kind":"fill","ref":"e3","text":"user@example.com"}'
No SDK required. No language-specific bindings. Just HTTP.
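The same four calls can be wrapped in a few lines of any language. Here is a minimal Python sketch using only the standard library. The paths and JSON bodies mirror the curl examples above; the helper names, the `Content-Type` header, and the assumption that tab IDs are URL-safe are mine, not part of any PinchTab SDK.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:9867"  # default address used in this post


def build_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET (no payload) or a JSON POST request for the PinchTab server."""
    url = f"{BASE_URL}{path}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    # Assumption: the curl examples send no Content-Type; JSON is declared here.
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, method="POST", headers=headers)


def navigate(tab: str, url: str) -> urllib.request.Request:
    return build_request(f"/tabs/{tab}/navigate", {"url": url})


def snapshot(tab: str) -> urllib.request.Request:
    return build_request(f"/tabs/{tab}/snapshot?filter=interactive")


def click(tab: str, ref: str) -> urllib.request.Request:
    return build_request(f"/tabs/{tab}/action", {"kind": "click", "ref": ref})


def fill(tab: str, ref: str, text: str) -> urllib.request.Request:
    payload = {"kind": "fill", "ref": ref, "text": text}
    return build_request(f"/tabs/{tab}/action", payload)


def send(request: urllib.request.Request, timeout: float = 10.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a server running and `tab_id` holding the ID of an open tab, `send(navigate(tab_id, "https://example.com"))` performs the first curl call. Keeping request building separate from sending makes the payloads easy to inspect and test without a browser.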
Stealth Mode Built-In
PinchTab includes stealth injection aimed at bot detection, managing Chrome instances with fingerprinting and detection evasion built in. That gets it past many sites that block stock Puppeteer or Playwright, though sites with aggressive detection may still call for a dedicated anti-detect browser.
Multi-Instance Orchestration
Need to run parallel browser sessions? PinchTab supports multiple isolated Chrome instances, each with its own profile:
# Spin up multiple instances
curl -X POST http://localhost:9867/instances/start \
-d '{"profileId":"alice","mode":"headless"}'
curl -X POST http://localhost:9867/instances/start \
-d '{"profileId":"bob","mode":"headless"}'
Each instance maintains its own cookies, history, and local storage. Log in once, stay logged in across restarts.
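Starting a batch of instances is the same request repeated per profile. A sketch that builds the payloads and hands them to whatever HTTP function your stack provides. The endpoint and JSON fields come from the curl examples above; the helper names, and the choice to reject duplicate profile IDs, are assumptions for illustration.

```python
import json
from typing import Callable, Iterable

START_PATH = "/instances/start"  # endpoint shown in the curl examples above


def start_payloads(profile_ids: Iterable[str], mode: str = "headless") -> list[dict]:
    """Build one start payload per profile, rejecting duplicate profile IDs."""
    seen: set[str] = set()
    payloads = []
    for profile_id in profile_ids:
        if profile_id in seen:
            # Assumption: one instance per profile, so duplicates are a mistake.
            raise ValueError(f"duplicate profileId: {profile_id}")
        seen.add(profile_id)
        payloads.append({"profileId": profile_id, "mode": mode})
    return payloads


def start_all(profile_ids: Iterable[str],
              post: Callable[[str, bytes], object]) -> list:
    """POST each payload through the caller-supplied post(path, body) function."""
    return [
        post(START_PATH, json.dumps(payload).encode("utf-8"))
        for payload in start_payloads(profile_ids)
    ]
```

Passing `post` in as a parameter keeps the sketch independent of any HTTP library and lets you dry-run it with a function that simply records what would be sent.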
Installation
One command, whichever route you prefer:
# macOS / Linux
curl -fsSL https://pinchtab.com/install.sh | bash
# Or via npm
npm install -g pinchtab
# Or Docker
docker run -d --name pinchtab -p 127.0.0.1:9867:9867 pinchtab/pinchtab
Then start the server:
pinchtab
That’s it. Your browser service is running at http://localhost:9867.
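If an agent launches PinchTab itself, it helps to wait until the port accepts connections before sending the first request. A small stdlib sketch; it only checks that the TCP port is open, since this post doesn't describe a health endpoint and none is assumed here. The function name and defaults are illustrative.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 9867,
                  timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Call `wait_for_port()` right after starting the binary or container and proceed only when it returns `True`.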
MCP Integration
For AI agents using the Model Context Protocol, PinchTab includes an SMCP plugin that exposes 15 tools (pinchtab__navigate, pinchtab__snapshot, pinchtab__action, etc.). No extra runtime dependencies — it’s stdlib-only Go.
How It Compares: PinchTab vs Clawdbot vs OpenClaw vs Camofox
If you’re in the AI agent space, you’ve got options for browser control. Here’s how PinchTab stacks up against other tools we’ve tested:
| Feature | PinchTab | Clawdbot Browser | OpenClaw Browser | Camofox |
|---|---|---|---|---|
| Architecture | Standalone HTTP server | Integrated in agent framework | Integrated in agent framework | Standalone HTTP server |
| Binary Size | 12MB (Go) | Part of Clawdbot | Part of OpenClaw | ~50MB (Node + deps) |
| Token Efficiency | ⭐⭐⭐ Accessibility tree (~800 tokens) | ⭐⭐ Snapshots | ⭐⭐ Snapshots | ⭐⭐ Snapshots |
| Stealth/Anti-Detection | ⭐⭐ Built-in stealth injection | ⭐ Basic | ⭐ Basic | ⭐⭐⭐ C++ level fingerprint spoofing |
| Multi-Instance | ⭐⭐⭐ Native orchestration | ⭐⭐ Via profiles | ⭐⭐ Via profiles | ⭐⭐ Session-based |
| MCP Integration | ✅ SMCP plugin | ✅ Native | ✅ Native | ❌ |
| Language Agnostic | ✅ Pure HTTP | Tied to Clawdbot | Tied to OpenClaw | ✅ Pure HTTP |
| Best For | Standalone automation, token-conscious agents | Integrated agent workflows | Open-source agent workflows | Sites with aggressive bot detection |
When to Use What
Use Clawdbot/OpenClaw browser when:
- You’re already running Clawdbot or OpenClaw as your agent framework
- You want tight integration with your agent’s memory, tools, and sessions
- The target site doesn’t have aggressive bot detection
Use PinchTab when:
- You need a lightweight, standalone browser service
- Token costs matter (roughly 13x fewer tokens per page, by the figures above)
- You’re building in a language without good Playwright/Puppeteer bindings
- You want to decouple browser control from your agent framework
Use Camofox when:
- You’re hitting sites with serious bot detection (Twitter/X, LinkedIn, Cloudflare)
- You need C++-level fingerprint spoofing (not just JS shims)
- Stealth is more important than token efficiency
In practice, these tools complement each other. Use PinchTab for everyday automation, Camofox when sites fight back, and the Clawdbot/OpenClaw browser when you want everything integrated.
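The guidance above boils down to two questions, which can be written as a tiny routing function. The priority order (stealth first, then framework integration) is my reading of the lists above, not a rule from any of these projects, and the function name is illustrative.

```python
def pick_browser_tool(aggressive_bot_detection: bool,
                      uses_agent_framework: bool) -> str:
    """Map the guidance above to a tool name; the priority order is an assumption."""
    if aggressive_bot_detection:
        # Stealth outranks everything else when the target site fights back.
        return "Camofox"
    if uses_agent_framework:
        # Already on Clawdbot or OpenClaw: use the integrated browser.
        return "Clawdbot/OpenClaw browser"
    # Default: lightweight, standalone, token-efficient.
    return "PinchTab"


print(pick_browser_tool(aggressive_bot_detection=False, uses_agent_framework=False))  # PinchTab
```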
The Bottom Line
PinchTab is what browser automation for AI agents should have been from the start:
- Tiny: a 12MB binary with no dependencies beyond Chrome itself
- Universal: HTTP API works with any language
- Efficient: roughly 13x fewer tokens via accessibility tree parsing
- Stealthy: built-in stealth injection against bot detection
- Flexible: headless or headed, single or multi-instance
If you’re building AI agents that need to interact with the web, this removes a major friction point.
GitHub: github.com/pinchtab/pinchtab
Docs: pinchtab.com/docs
License: MIT (100% open source)