PinchTab: The 12MB Binary That Gives AI Agents Full Browser Control

By Prahlad Menon

The biggest bottleneck for AI agents just got solved by a 12MB file.

PinchTab is a standalone Go binary that gives any AI agent full browser control through a plain HTTP API. Your agent sends a request, and PinchTab clicks, types, navigates, and extracts data from Chrome. No WebDriver setup, no Playwright dependencies, no complex SDKs.

Why This Matters

Browser automation has been a pain point for AI agents. The typical approaches run into four problems:

  1. Screenshots — Expensive. A single page screenshot can burn 10,000+ tokens just to “see” what’s on screen.
  2. Heavy dependencies — Playwright, Puppeteer, Selenium all require substantial runtime environments.
  3. Bot detection — Many sites block automated browsers instantly.
  4. Language lock-in — Most solutions work best with JavaScript/Python, leaving other stacks behind.

PinchTab addresses all of these.

The Token Efficiency Play

Here’s where PinchTab really shines: instead of taking expensive screenshots, it parses the accessibility tree — the same structure screen readers use. This gives your agent a semantic understanding of the page at roughly 800 tokens instead of 10,000+.

That’s roughly a 12.5x reduction in token costs for browser-based tasks (10,000 / 800 = 12.5), which this post rounds up to 13x.
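The arithmetic behind that figure is easy to check, and it compounds over a multi-step task. The sketch below uses the two per-observation costs quoted above; the `observations` count of 20 is a made-up task size for illustration, not a measurement.

```shell
# Per-observation token costs quoted in the article.
screenshot_tokens=10000
tree_tokens=800

# Hypothetical task size: how many times the agent looks at a page.
observations=20

# Shell arithmetic is integer-only, so use awk for the ratio.
ratio=$(LC_ALL=C awk -v s="$screenshot_tokens" -v t="$tree_tokens" \
  'BEGIN { printf "%.1f", s / t }')
saved=$(( (screenshot_tokens - tree_tokens) * observations ))

echo "reduction per observation: ${ratio}x"
echo "tokens saved over ${observations} observations: ${saved}"
```

With these inputs the ratio comes out at 12.5 and the savings at 184,000 tokens. Real pages vary, so treat both input numbers as rough averages rather than guarantees.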

# Get page structure (token-efficient)
pinchtab snap -i -c

# Extract text (~800 tokens vs 10,000 for screenshot)
pinchtab text

The accessibility-first approach also means stable element references. Instead of fragile CSS selectors or pixel coordinates, you get refs like e5 that represent actual interactive elements.

Dead Simple API

Whether you’re building in Python, TypeScript, Go, Rust, or anything else — if you can make an HTTP request, you can control a browser:

# $TAB must hold the ID of an open tab
# Navigate to a page
curl -X POST "http://localhost:9867/tabs/$TAB/navigate" \
  -d '{"url":"https://example.com"}'

# Get interactive elements
curl "http://localhost:9867/tabs/$TAB/snapshot?filter=interactive"

# Click an element
curl -X POST "http://localhost:9867/tabs/$TAB/action" \
  -d '{"kind":"click","ref":"e5"}'

# Fill a form field
curl -X POST "http://localhost:9867/tabs/$TAB/action" \
  -d '{"kind":"fill","ref":"e3","text":"user@example.com"}'

No SDK required. No language-specific bindings. Just HTTP.
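If an agent issues many of these calls, a thin wrapper keeps the JSON bodies consistent. The following is a minimal sketch, not part of PinchTab: the function names (`json_escape`, `action_payload`, `pt_action`) and the `PINCHTAB_URL` variable are hypothetical, and the explicit `Content-Type` header is an addition of this sketch that the examples above do not send. It uses only the `/tabs/$TAB/action` endpoint and the `kind`, `ref`, and `text` fields already shown.

```shell
# Base URL from the article; override PINCHTAB_URL if the server runs elsewhere.
PINCHTAB_URL="${PINCHTAB_URL:-http://localhost:9867}"

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Sketch only: control characters such as newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the JSON body for an action: KIND REF [TEXT]
action_payload() {
  if [ "$#" -ge 3 ]; then
    printf '{"kind":"%s","ref":"%s","text":"%s"}' \
      "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
  else
    printf '{"kind":"%s","ref":"%s"}' \
      "$(json_escape "$1")" "$(json_escape "$2")"
  fi
}

# POST an action to a tab: pt_action TAB KIND REF [TEXT]
# The body runs in a subshell so "tab" does not leak into the caller.
pt_action() (
  tab="$1"
  shift
  curl -fsS -X POST "$PINCHTAB_URL/tabs/$tab/action" \
    -H 'Content-Type: application/json' \
    -d "$(action_payload "$@")"
)
```

With a server running, `pt_action "$TAB" click e5` and `pt_action "$TAB" fill e3 user@example.com` would send the same bodies as the two curl commands above. The same pattern ports directly to any language with an HTTP client.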

Stealth Mode Built-In

PinchTab includes stealth injection to get past common bot detection. It manages Chrome instances with fingerprinting and detection evasion baked in, so many sites that block a stock Puppeteer or Playwright setup will load. For the most aggressively protected sites, see the Camofox comparison later in this post.

Multi-Instance Orchestration

Need to run parallel browser sessions? PinchTab supports multiple isolated Chrome instances, each with its own profile:

# Spin up multiple instances
curl -X POST http://localhost:9867/instances/start \
  -d '{"profileId":"alice","mode":"headless"}'

curl -X POST http://localhost:9867/instances/start \
  -d '{"profileId":"bob","mode":"headless"}'

Each instance maintains its own cookies, history, and local storage. Log in once, stay logged in across restarts.
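Starting one instance per profile is a natural loop. The sketch below is an illustration under stated assumptions: `instance_payload` and `start_instances` are hypothetical helper names, and the only `mode` value confirmed by the examples above is `"headless"`, so the optional second argument should be checked against the docs before passing anything else.

```shell
# Base URL from the article; override PINCHTAB_URL if the server runs elsewhere.
PINCHTAB_URL="${PINCHTAB_URL:-http://localhost:9867}"

# Build the JSON body for /instances/start: PROFILE_ID [MODE]
# MODE defaults to "headless", the only value shown in the article.
# Assumes profile IDs are plain identifiers without quotes or backslashes.
instance_payload() {
  printf '{"profileId":"%s","mode":"%s"}' "$1" "${2:-headless}"
}

# Start one headless instance per profile ID, stopping at the first failure.
start_instances() (
  for profile in "$@"; do
    curl -fsS -X POST "$PINCHTAB_URL/instances/start" \
      -d "$(instance_payload "$profile")" || exit 1
    echo
  done
)
```

With a server running, `start_instances alice bob` would issue the same two requests as the curl commands above.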

Installation

One command:

# macOS / Linux
curl -fsSL https://pinchtab.com/install.sh | bash

# Or via npm
npm install -g pinchtab

# Or Docker
docker run -d --name pinchtab -p 127.0.0.1:9867:9867 pinchtab/pinchtab

Then start the server:

pinchtab

That’s it. Your browser service is running at http://localhost:9867.

MCP Integration

For AI agents using the Model Context Protocol, PinchTab includes an SMCP plugin that exposes 15 tools (pinchtab__navigate, pinchtab__snapshot, pinchtab__action, etc.). No extra runtime dependencies — it’s stdlib-only Go.

How It Compares: PinchTab vs Clawdbot vs OpenClaw vs Camofox

If you’re in the AI agent space, you’ve got options for browser control. Here’s how PinchTab stacks up against other tools we’ve tested:

| Feature | PinchTab | Clawdbot Browser | OpenClaw Browser | Camofox |
| --- | --- | --- | --- | --- |
| Architecture | Standalone HTTP server | Integrated in agent framework | Integrated in agent framework | Standalone HTTP server |
| Binary Size | 12MB (Go) | Part of Clawdbot | Part of OpenClaw | ~50MB (Node + deps) |
| Token Efficiency | ⭐⭐⭐ Accessibility tree (~800 tokens) | ⭐⭐ Snapshots | ⭐⭐ Snapshots | ⭐⭐ Snapshots |
| Stealth/Anti-Detection | ⭐⭐ Built-in stealth injection | ⭐ Basic | ⭐ Basic | ⭐⭐⭐ C++ level fingerprint spoofing |
| Multi-Instance | ⭐⭐⭐ Native orchestration | ⭐⭐ Via profiles | ⭐⭐ Via profiles | ⭐⭐ Session-based |
| MCP Integration | ✅ SMCP plugin | ✅ Native | ✅ Native | |
| Language Agnostic | ✅ Pure HTTP | Tied to Clawdbot | Tied to OpenClaw | ✅ Pure HTTP |
| Best For | Standalone automation, token-conscious agents | Integrated agent workflows | Open-source agent workflows | Sites with aggressive bot detection |

When to Use What

Use Clawdbot/OpenClaw browser when:

  • You’re already running Clawdbot or OpenClaw as your agent framework
  • You want tight integration with your agent’s memory, tools, and sessions
  • The target site doesn’t have aggressive bot detection

Use PinchTab when:

  • You need a lightweight, standalone browser service
  • Token costs matter (the 13x reduction is real)
  • You’re building in a language without good Playwright/Puppeteer bindings
  • You want to decouple browser control from your agent framework

Use Camofox when:

  • You’re hitting sites with serious bot detection (Twitter/X, LinkedIn, Cloudflare)
  • You need C++-level fingerprint spoofing (not just JS shims)
  • Stealth is more important than token efficiency

In practice, these tools complement each other: PinchTab for everyday automation, Camofox when sites fight back, and Clawdbot/OpenClaw browser when you want everything integrated.

The Bottom Line

PinchTab is what browser automation for AI agents should have been from the start:

  • Tiny: 12MB binary with no runtime dependencies beyond Chrome
  • Universal: HTTP API works with any language
  • Efficient: 13x token reduction via accessibility tree parsing
  • Stealthy: Built-in bot detection bypass
  • Flexible: Headless or headed, single or multi-instance

If you’re building AI agents that need to interact with the web, this removes a major friction point.

GitHub: github.com/pinchtab/pinchtab
Docs: pinchtab.com/docs
License: MIT (100% open source)