Browser Harness: The Self-Healing Browser Agent That Writes Its Own Tools Mid-Task

By Prahlad Menon

Every browser automation framework makes a tradeoff: it gives you a clean API in exchange for limiting what you can do. Playwright handles roughly 95% of browser tasks well. The other 5% (uploading files in unusual ways, handling edge-case OAuth flows, interacting with iframes that break standard selectors) hit framework limits and fail.

For human programmers, this is acceptable. You hit a wall, you find a workaround.

For AI agents running autonomously, this is a fundamental problem. The agent has no way to work around framework limits. It just fails.

Browser Harness takes a different approach: remove the framework entirely, and let the agent write what’s missing.

The Architecture

The entire system is ~600 lines:

run.py        (~36 lines)  — runs plain Python with helpers preloaded
helpers.py    (~195 lines) — the starting toolset; the agent edits these
admin.py      (~361 lines) — daemon bootstrap + CDP WebSocket + socket bridge
SKILL.md                   — day-to-day usage instructions
install.md                 — first-time setup
domain-skills/             — agent-generated site-specific knowledge

One WebSocket to Chrome via CDP. No Playwright, no Selenium, no intermediate abstraction. The agent sees the browser state directly and acts on it directly.
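
To make "one WebSocket via CDP" concrete: a CDP command is a JSON object with an id, a method name, and params, and Chrome answers with a message carrying the same id. The sketch below only frames and matches such messages. The class name CDPSession, the helper is_reply_to, and the send_text callable are illustrative assumptions, not code taken from admin.py.

```python
import itertools
import json
from typing import Any, Callable, Dict, Optional


class CDPSession:
    """Hypothetical helper that frames CDP commands as JSON text.

    `send_text` stands in for whatever writes to the WebSocket; it is an
    assumption of this sketch, not a name from browser-harness.
    """

    def __init__(self, send_text: Callable[[str], None]) -> None:
        self._send_text = send_text
        self._ids = itertools.count(1)  # CDP replies echo the command id

    def send(self, method: str, params: Optional[Dict[str, Any]] = None,
             session_id: Optional[str] = None) -> int:
        """Serialize one command, hand it to the transport, return its id."""
        msg_id = next(self._ids)
        message: Dict[str, Any] = {"id": msg_id, "method": method,
                                   "params": params or {}}
        if session_id is not None:
            # Commands aimed at an attached target carry a sessionId.
            message["sessionId"] = session_id
        self._send_text(json.dumps(message))
        return msg_id


def is_reply_to(raw: str, msg_id: int) -> bool:
    """True if `raw` is the reply to command `msg_id` (events carry no id)."""
    return json.loads(raw).get("id") == msg_id


# Usage with a list standing in for the socket.
sent: list = []
session = CDPSession(sent.append)
nav_id = session.send("Page.navigate", {"url": "https://example.com"})
```

A real transport would replace sent.append with the WebSocket's send call and read replies in a loop, dispatching on the id.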

Self-Healing in Practice

Here’s what the self-healing looks like in the repo’s own example:

● agent: wants to upload a file

● helpers.py → upload_file() missing

● agent edits the harness and writes it
  helpers.py 192 → 199 lines
  + upload_file()

✓ file uploaded

The agent needs upload_file(). It doesn’t exist. Instead of failing, the agent:

  1. Opens helpers.py
  2. Writes the upload_file() function
  3. Adds it to the file (192 → 199 lines)
  4. Calls it
  5. Continues
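
For a sense of how small such a helper can be, here is a minimal sketch of what an agent-written upload_file() might look like. It is not the function from the repo. The cdp parameter is a hypothetical callable that sends one CDP command and returns its result. The three CDP methods it calls (DOM.getDocument, DOM.querySelector, DOM.setFileInputFiles) are real protocol methods.

```python
import os
from typing import Any, Callable, Dict

# Hypothetical signature: cdp(method, params) -> result dict.
CDPCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def upload_file(cdp: CDPCall, selector: str, path: str) -> None:
    """Attach a local file to the <input type="file"> matched by `selector`."""
    root_id = cdp("DOM.getDocument", {})["root"]["nodeId"]
    node_id = cdp("DOM.querySelector",
                  {"nodeId": root_id, "selector": selector})["nodeId"]
    if not node_id:
        # A nodeId of 0 means the selector matched nothing.
        raise LookupError(f"no element matches {selector!r}")
    # Chrome reads the file itself, so it needs an absolute path.
    cdp("DOM.setFileInputFiles",
        {"files": [os.path.abspath(path)], "nodeId": node_id})
```

Taking the transport as an argument keeps the sketch testable with a stub. A helper living inside helpers.py would more likely use whatever connection the harness already holds.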

The function persists. If the agent needs to upload another file later in the same session, upload_file() is there. If someone contributes the session’s helpers.py additions back to the repo, the fix is available to everyone.
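
Persistence falls out of the design if the helpers file is loaded fresh for each snippet. The sketch below illustrates that idea under stated assumptions: run_with_helpers is a hypothetical name, and loading via runpy is one plausible way to run "plain Python with helpers preloaded", not necessarily how run.py does it.

```python
import runpy
from pathlib import Path
from typing import Any, Dict


def run_with_helpers(snippet: str, helpers_path: Path) -> Dict[str, Any]:
    """Execute `snippet` with every name from the helpers file in scope.

    The helpers file is re-read on every call, so a function appended to
    it between calls is visible to the next snippet.
    """
    namespace = runpy.run_path(str(helpers_path))  # module globals as a dict
    exec(snippet, namespace)
    return namespace
```

With this shape, "the agent edits helpers.py" needs no plugin system or registration step: appending a function to the file is the whole extension mechanism.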

This is the “bitter lesson” applied to browser automation: instead of trying to anticipate every possible task with a comprehensive framework, give the agent the primitives and let it build what it needs.

Setting It Up (Claude Code or Codex)

The install prompt from the README is worth quoting directly:

Set up https://github.com/browser-use/browser-harness for me.

Read `install.md` first to install and connect this repo to my real browser.
Then read `SKILL.md` for normal usage.
Always read `helpers.py` because that is where the functions are.
When you open a setup or verification tab, activate it so I can see the active browser tab.
After it is installed, open this repository in my browser and, if I am logged in
to GitHub, ask me whether you should star it for me as a quick demo that the
interaction works — only click the star if I say yes. If I am not logged in,
just go to browser-use.com.

You paste this into Claude Code or Codex and the agent handles the rest — reading the install docs, enabling Chrome remote debugging, connecting via CDP, and verifying the setup with a live browser interaction.
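
For readers curious what "connecting via CDP" involves, one common route is sketched below. When Chrome runs with a remote debugging port, it serves a small JSON document at /json/version that includes the browser-level WebSocket URL. This is an assumption about the general mechanism: install.md may take a different path, and the port 9222 and both function names here are illustrative.

```python
import json
import urllib.request


def ws_url_from_version(payload: str) -> str:
    """Extract the browser WebSocket URL from a /json/version response body."""
    info = json.loads(payload)
    try:
        return info["webSocketDebuggerUrl"]
    except KeyError:
        raise RuntimeError("response has no webSocketDebuggerUrl field") from None


def discover_ws_url(host: str = "127.0.0.1", port: int = 9222,
                    timeout: float = 2.0) -> str:
    """Ask a locally running Chrome for its debugging WebSocket URL.

    Assumes Chrome was started with remote debugging enabled on `port`.
    """
    url = f"http://{host}:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return ws_url_from_version(resp.read().decode("utf-8"))
```

discover_ws_url() raises a URLError if nothing is listening on the port, which is a quick signal that remote debugging is not enabled yet.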

Domain Skills: Agent-Generated Knowledge

Browser Harness includes a domain-skills/ directory with agent-generated knowledge for specific sites: GitHub, LinkedIn, Amazon, and others. These capture the non-obvious selectors, flows, and edge cases that take trial and error to discover.

The design principle is important: skills are written by the agent, not by you. When your agent figures out how to handle an unusual flow on a site, it files the skill itself. The repo asks contributors to submit these agent-generated files as pull requests rather than hand-authored ones, because hand-authored skills describe what developers think the flows look like, not what actually works.

The community effect compounds: every agent that encounters an unusual LinkedIn login flow and figures it out adds to the domain skill. The next agent starts with that knowledge already baked in.
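
How an agent finds the right skill before touching a site can be sketched as a hostname lookup. The layout assumed here (one folder per site under domain-skills/, holding Markdown files) and the function name skill_files_for are hypothetical. Check the repo for the actual structure.

```python
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def skill_files_for(url: str, skills_root: Path) -> List[Path]:
    """Return skill files for the site in `url`, or [] if none exist.

    Hypothetical layout: skills_root/<site>/*.md. For
    https://www.linkedin.com/... the folders tried are, in order,
    "www.linkedin.com", "linkedin.com", then "linkedin".
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    candidates = [".".join(labels[i:]) for i in range(len(labels) - 1)]
    if len(labels) >= 2:
        candidates.append(labels[-2])  # bare site name, e.g. "linkedin"
    for name in candidates:
        folder = skills_root / name
        if folder.is_dir():
            return sorted(folder.glob("*.md"))
    return []
```

An agent would read whatever this returns before planning its actions, and write a new file into the same folder once it has worked out a flow the existing skills do not cover.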

Why Framework-Free Matters

The browser-use team has a post called “The Bitter Lesson” that explains the design philosophy. The short version: every framework layer encodes assumptions about what browser tasks look like. Those assumptions are wrong for the long tail of tasks. The more powerful the agent, the more the framework gets in the way.

Direct CDP gives the agent:

  • Complete DOM access: not filtered through framework abstractions
  • Arbitrary JavaScript execution: in any frame, on any element
  • Network interception: modify requests and responses in flight
  • Full input simulation: mouse, keyboard, touch, exactly as Chrome sees them
  • No framework version lag: the agent speaks the protocol Chrome itself exposes, so there is no intermediate framework API to drift out of sync with the browser

The failure class “this isn’t supported by the framework” largely disappears: anything Chrome exposes over CDP is within the agent’s reach.
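
Each capability in the list corresponds to a CDP method. The sketch below builds the command payloads for three of them: Runtime.evaluate for JavaScript execution, Input.dispatchMouseEvent for input simulation, and Fetch.enable for network interception. The CDP method and parameter names are real. The Python function names are illustrative and not taken from helpers.py.

```python
from typing import Any, Dict, List, Tuple

# A command is (method, params), ready to be framed and sent over the socket.
Command = Tuple[str, Dict[str, Any]]


def evaluate_js(expression: str) -> Command:
    """Run JavaScript in the page and ask for the result as plain JSON."""
    return ("Runtime.evaluate",
            {"expression": expression, "returnByValue": True})


def click_at(x: float, y: float) -> List[Command]:
    """A left click is a press followed by a release at the same point."""
    base = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        ("Input.dispatchMouseEvent", {"type": "mousePressed", **base}),
        ("Input.dispatchMouseEvent", {"type": "mouseReleased", **base}),
    ]


def intercept_requests(url_pattern: str = "*") -> Command:
    """Pause matching requests so they can be inspected or modified."""
    return ("Fetch.enable", {"patterns": [{"urlPattern": url_pattern}]})
```

Once Fetch.enable is active, Chrome emits a Fetch.requestPaused event for each matching request, and the client must answer each one (for example with Fetch.continueRequest) or the request stays paused.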

Cloud Option

For deployments that need browser instances without managing Chrome yourself, browser-use cloud offers 3 concurrent browsers free, no card required. The agent can sign itself up by reading docs.browser-use.com/llms.txt — a plain-text file designed for LLM consumption that walks through the signup flow.

Resources