What makes PinchTab different from Playwright/Puppeteer?

PinchTab is a 12MB standalone Go binary with no external dependencies. Its pure HTTP API works with any language. It cuts token usage roughly 13x by reading the accessibility tree (~800 tokens versus 10,000+ for a screenshot), and it includes built-in stealth for bot-detection bypass and multi-instance orchestration.

How does PinchTab's token efficiency work?

Instead of taking expensive screenshots, PinchTab parses the accessibility tree (the structure screen readers use). This gives a semantic view of the page at ~800 tokens instead of 10,000+, with stable element refs like 'e5' instead of fragile CSS selectors or pixel coordinates.

How do I control a browser with PinchTab?

Send HTTP requests: POST /tabs/$TAB/navigate with a url to load a page, GET /tabs/$TAB/snapshot?filter=interactive to list interactive elements, and POST /tabs/$TAB/action with {kind:'click', ref:'e5'} or {kind:'fill', ref:'e3', text:'...'} to act on them. The API is language-agnostic.

When should I use PinchTab vs Clawdbot/OpenClaw browser vs Camofox?

Use PinchTab for standalone, token-conscious, language-agnostic automation. Use the Clawdbot/OpenClaw browser for integrated agent workflows with tight memory and tools integration. Use Camofox for sites with aggressive bot detection (Twitter, LinkedIn, Cloudflare) that need C++-level fingerprint spoofing.

Does PinchTab support multiple browser instances?

Yes, it has native multi-instance orchestration. Each instance has its own profile with cookies, history, and local storage, so you log in once and stay logged in across restarts. Start one with POST /instances/start, passing profileId and mode.

How do I install PinchTab?

Run 'curl -fsSL https://pinchtab.com/install.sh | bash' or 'npm install -g pinchtab', or use Docker. Then run 'pinchtab' to start the server at localhost:9867. MCP integration is available through an SMCP plugin with 15 tools.

PinchTab: The 12MB Binary That Gives AI Agents Full Browser Control

By Prahlad Menon. Published 2026-03-10.

Browser control, one of the biggest bottlenecks for AI agents, can now be handled by a 12MB file.

PinchTab is a standalone Go binary that gives any AI agent full browser control through a plain HTTP API. Your agent sends a request, and PinchTab clicks, types, navigates, and extracts data from Chrome. No WebDriver setup, no Playwright dependencies, no complex SDKs.

Why This Matters

Browser automation has been a pain point for AI agents. The typical approach involves:

Screenshots — Expensive. A single page screenshot can burn 10,000+ tokens just to “see” what’s on screen.
Heavy dependencies — Playwright, Puppeteer, Selenium all require substantial runtime environments.
Bot detection — Many sites block automated browsers instantly.
Language lock-in — Most solutions work best with JavaScript/Python, leaving other stacks behind.

PinchTab addresses all of these.

The Token Efficiency Play

Here’s where PinchTab really shines: instead of taking expensive screenshots, it parses the accessibility tree — the same structure screen readers use. This gives your agent a semantic understanding of the page at roughly 800 tokens instead of 10,000+.

That’s roughly a 13x reduction in token costs for browser-based tasks (10,000 / 800 = 12.5).

# Get page structure (token-efficient)
pinchtab snap -i -c

# Extract text (~800 tokens vs 10,000 for screenshot)
pinchtab text
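
To put the ratio in concrete terms, here is a back-of-the-envelope calculation using the per-page figures quoted above. The 20-step task length is a made-up assumption for illustration, and real token counts will vary from page to page.

```shell
#!/usr/bin/env bash
# Per-page token figures quoted in this article (approximate).
SNAPSHOT_TOKENS=800
SCREENSHOT_TOKENS=10000

# Hypothetical task length: how many times the agent observes a page.
STEPS=20

# Ratio of screenshot cost to snapshot cost, to one decimal place.
ratio=$(awk -v a="$SCREENSHOT_TOKENS" -v b="$SNAPSHOT_TOKENS" \
  'BEGIN { printf "%.1f", a / b }')

# Total tokens spent on page observations over the whole task.
snapshot_total=$(( SNAPSHOT_TOKENS * STEPS ))
screenshot_total=$(( SCREENSHOT_TOKENS * STEPS ))

echo "ratio: ${ratio}x"
echo "snapshots: ${snapshot_total} tokens, screenshots: ${screenshot_total} tokens"
```

With these inputs the ratio works out to 12.5, which the article rounds to 13x, and the hypothetical 20-step task costs 16,000 tokens with snapshots versus 200,000 with screenshots.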

The accessibility-first approach also means stable element references. Instead of fragile CSS selectors or pixel coordinates, you get refs like e5 that represent actual interactive elements.

Dead Simple API

Whether you’re building in Python, TypeScript, Go, Rust, or anything else — if you can make an HTTP request, you can control a browser:

# Navigate to a page
curl -X POST http://localhost:9867/tabs/$TAB/navigate \
  -d '{"url":"https://example.com"}'

# Get interactive elements
curl "http://localhost:9867/tabs/$TAB/snapshot?filter=interactive"

# Click an element
curl -X POST "http://localhost:9867/tabs/$TAB/action" \
  -d '{"kind":"click","ref":"e5"}'

# Fill a form field
curl -X POST "http://localhost:9867/tabs/$TAB/action" \
  -d '{"kind":"fill","ref":"e3","text":"user@example.com"}'

No SDK required. No language-specific bindings. Just HTTP.
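
If you find yourself repeating these calls, a few shell functions can wrap them. This is only a sketch: the function names (`pt_navigate`, `pt_snapshot`, `pt_click`, `pt_fill`) and the `PINCHTAB_URL` variable are hypothetical helpers, not part of PinchTab. They mirror the endpoints and payloads shown above, and assume you already have a valid tab ID.

```shell
#!/usr/bin/env bash
# Hypothetical convenience wrappers around the endpoints shown above.
PINCHTAB_URL="${PINCHTAB_URL:-http://localhost:9867}"

# Build JSON request bodies. Note: values are inserted verbatim, so inputs
# containing double quotes or backslashes would need escaping first.
navigate_body() { printf '{"url":"%s"}' "$1"; }
click_body()    { printf '{"kind":"click","ref":"%s"}' "$1"; }
fill_body()     { printf '{"kind":"fill","ref":"%s","text":"%s"}' "$1" "$2"; }

# POST a JSON body to a path on the server; fail on HTTP errors.
pt_post() { curl -fsS -X POST "$PINCHTAB_URL$1" -d "$2"; }

# Arguments: tab ID first, then the URL, element ref, or text as needed.
pt_navigate() { pt_post "/tabs/$1/navigate" "$(navigate_body "$2")"; }
pt_snapshot() { curl -fsS "$PINCHTAB_URL/tabs/$1/snapshot?filter=interactive"; }
pt_click()    { pt_post "/tabs/$1/action" "$(click_body "$2")"; }
pt_fill()     { pt_post "/tabs/$1/action" "$(fill_body "$2" "$3")"; }

# Example usage (requires a running server and a tab ID in $TAB):
#   pt_navigate "$TAB" "https://example.com"
#   pt_snapshot "$TAB"
#   pt_fill "$TAB" e3 "user@example.com"
#   pt_click "$TAB" e5
```

Keeping the body builders separate from the network calls makes the payloads easy to inspect or log before anything is sent.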

Stealth Mode Built-In

PinchTab includes stealth injection intended to get past bot detection, managing Chrome instances with fingerprinting and detection evasion built in. It handles many sites that typically block Puppeteer or Playwright, though the comparison later in this article rates Camofox higher for the most aggressive detection.

Multi-Instance Orchestration

Need to run parallel browser sessions? PinchTab supports multiple isolated Chrome instances, each with its own profile:

# Spin up multiple instances
curl -X POST http://localhost:9867/instances/start \
  -d '{"profileId":"alice","mode":"headless"}'

curl -X POST http://localhost:9867/instances/start \
  -d '{"profileId":"bob","mode":"headless"}'

Each instance maintains its own cookies, history, and local storage. Log in once, stay logged in across restarts.
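
For more than a couple of profiles, the same request can be issued in a loop. The sketch below reuses the `/instances/start` payload shown above. The `start_instances` and `instance_body` names, the `PINCHTAB_URL` variable, and the `carol` profile are hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
PINCHTAB_URL="${PINCHTAB_URL:-http://localhost:9867}"

# Build the request body used by /instances/start in the examples above.
# Arguments: profile ID, mode. Values are inserted verbatim (no escaping).
instance_body() { printf '{"profileId":"%s","mode":"%s"}' "$1" "$2"; }

# Start one headless instance per profile ID given as an argument.
# Stops at the first failed request and returns a non-zero status.
start_instances() {
  local profile
  for profile in "$@"; do
    curl -fsS -X POST "$PINCHTAB_URL/instances/start" \
      -d "$(instance_body "$profile" headless)" || return 1
    echo
  done
}

# Example usage (requires a running server):
#   start_instances alice bob carol
```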

Installation

One command:

# macOS / Linux
curl -fsSL https://pinchtab.com/install.sh | bash

# Or via npm
npm install -g pinchtab

# Or Docker
docker run -d --name pinchtab -p 127.0.0.1:9867:9867 pinchtab/pinchtab

Then start the server:

pinchtab

That’s it. Your browser service is running at http://localhost:9867.
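
When starting the server from a script, it can help to wait until the port is accepting connections before sending requests. The article does not describe a dedicated health endpoint, so this sketch simply polls the base URL and assumes the server answers HTTP there with any status code. The `wait_for_server` function is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Poll a URL until something answers over HTTP.
# Arguments: URL, max attempts (default 30), delay in seconds (default 1).
wait_for_server() {
  local url=$1 attempts=${2:-30} delay=${3:-1} i
  for (( i = 1; i <= attempts; i++ )); do
    # Without -f, curl exits 0 on any HTTP response regardless of status
    # code, so this only checks that a server is listening and replying.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example usage:
#   pinchtab &
#   wait_for_server "http://localhost:9867" && echo "PinchTab is up"
```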

MCP Integration

For AI agents using the Model Context Protocol, PinchTab includes an SMCP plugin that exposes 15 tools (pinchtab__navigate, pinchtab__snapshot, pinchtab__action, etc.). No extra runtime dependencies — it’s stdlib-only Go.

How It Compares: PinchTab vs Clawdbot vs OpenClaw vs Camofox

If you’re in the AI agent space, you’ve got options for browser control. Here’s how PinchTab stacks up against other tools we’ve tested:

Feature	PinchTab	Clawdbot Browser	OpenClaw Browser	Camofox
Architecture	Standalone HTTP server	Integrated in agent framework	Integrated in agent framework	Standalone HTTP server
Binary Size	12MB (Go)	Part of Clawdbot	Part of OpenClaw	~50MB (Node + deps)
Token Efficiency	⭐⭐⭐ Accessibility tree (~800 tokens)	⭐⭐ Snapshots	⭐⭐ Snapshots	⭐⭐ Snapshots
Stealth/Anti-Detection	⭐⭐ Built-in stealth injection	⭐ Basic	⭐ Basic	⭐⭐⭐ C++ level fingerprint spoofing
Multi-Instance	⭐⭐⭐ Native orchestration	⭐⭐ Via profiles	⭐⭐ Via profiles	⭐⭐ Session-based
MCP Integration	✅ SMCP plugin	✅ Native	✅ Native	❌
Language Agnostic	✅ Pure HTTP	Tied to Clawdbot	Tied to OpenClaw	✅ Pure HTTP
Best For	Standalone automation, token-conscious agents	Integrated agent workflows	Open-source agent workflows	Sites with aggressive bot detection

When to Use What

Use Clawdbot/OpenClaw browser when:

You’re already running Clawdbot or OpenClaw as your agent framework
You want tight integration with your agent’s memory, tools, and sessions
The target site doesn’t have aggressive bot detection

Use PinchTab when:

You need a lightweight, standalone browser service
Token costs matter (the 13x reduction is real)
You’re building in a language without good Playwright/Puppeteer bindings
You want to decouple browser control from your agent framework

Use Camofox when:

You’re hitting sites with serious bot detection (Twitter/X, LinkedIn, Cloudflare)
You need C++-level fingerprint spoofing (not just JS shims)
Stealth is more important than token efficiency

In practice, these tools complement each other: use PinchTab for everyday automation, Camofox when sites fight back, and the Clawdbot/OpenClaw browser when you want everything integrated.

The Bottom Line

PinchTab is what browser automation for AI agents should have been from the start:

Tiny — 12MB binary, no external dependencies
Universal — HTTP API works with any language
Efficient — 13x token reduction via accessibility tree parsing
Stealthy — Built-in bot detection bypass
Flexible — Headless or headed, single or multi-instance

If you’re building AI agents that need to interact with the web, this removes a major friction point.

GitHub: github.com/pinchtab/pinchtab
Docs: pinchtab.com/docs
License: MIT (100% open source)