Virtual Desktop Infrastructure for the Agentic Era

By Prahlad Menon, published 2026-03-08

AI agents are increasingly capable, but they hit a wall when the task requires a GUI. Not everything has an API. Sometimes you need to click a button, fill a form, or navigate a website that actively blocks automation.

The solution? Give your agent a real browser — isolated, secure, and controllable via code.

This guide walks you through deploying n.eko, an open-source virtual browser platform, and connecting it to AI agents for safe, sandboxed web automation.

Why Agents Need Virtual Desktops

Traditional browser automation (Selenium, Playwright, Puppeteer) typically runs on the same machine as your agent. This creates several problems:

Security risk: A compromised agent has access to your local filesystem
Detection: Headless browsers get flagged by anti-bot systems
No human oversight: You can’t see what the agent is doing in real-time
Resource contention: Browser processes compete with your agent’s compute

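A virtual desktop addresses each of these. The browser runs in a container, and by default only the video and audio stream leaves it. Your agent sends commands in, while cookies, tokens, and other sensitive data stay inside the sandbox unless you deliberately expose a control channel such as CDP.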

What is n.eko?

n.eko is a self-hosted virtual browser that runs in Docker and streams via WebRTC. Key features:

Multiple browsers: Firefox, Chrome, Brave, Edge, Tor Browser
Full desktop environments: XFCE, KDE — run any Linux app
Multi-user control: Multiple people (or agents) can view/control the same session
Built-in audio: Synced audio streaming for video content
GPU acceleration: Smooth rendering with NVIDIA support
API for room management: Programmatically create/destroy sessions

With 746K+ Docker pulls and 17K+ GitHub stars, it has seen substantial real-world use.

Step 1: Deploy a VPS

You’ll need a server with a public IP. Recommended specs:

Resolution	Cores	RAM	Experience
1280x720@30	4	3GB	Good
1280x720@30	6	4GB	Recommended
1280x720@30	8+	4GB+	Best

Providers that work well:

Hetzner Cloud — Best price/performance in EU
DigitalOcean — Simple, reliable
Contabo — Budget option with high specs
Vultr — Global presence

Spin up an Ubuntu 22.04+ instance and SSH in.

Step 2: Install Docker

# Install Docker
curl -sSL https://get.docker.com/ | CHANNEL=stable bash

# Install Docker Compose plugin
sudo apt-get update
sudo apt-get install -y docker-compose-plugin

# Verify
docker --version
docker compose version

Step 3: Deploy n.eko (Single Room)

For a quick single-browser setup:

# Create project directory
mkdir ~/neko && cd ~/neko

# Download docker-compose.yaml
wget https://raw.githubusercontent.com/m1k1o/neko/master/docker-compose.yaml

# Start n.eko
sudo docker compose up -d

Visit http://YOUR_SERVER_IP:8080 in your browser. The default password is neko; change it before exposing the server to the internet.

To customize, edit docker-compose.yaml:

services:
  neko:
    image: "ghcr.io/m1k1o/neko/firefox:latest"
    restart: unless-stopped
    shm_size: "2gb"
    ports:
      - "8080:8080"
      - "52000-52100:52000-52100/udp"
    environment:
      NEKO_SCREEN: 1280x720@30
      NEKO_PASSWORD: your-user-password
      NEKO_PASSWORD_ADMIN: your-admin-password
      NEKO_EPR: 52000-52100
      NEKO_ICELITE: 1

Available browser images:

ghcr.io/m1k1o/neko/firefox:latest
ghcr.io/m1k1o/neko/chromium:latest
ghcr.io/m1k1o/neko/brave:latest
ghcr.io/m1k1o/neko/tor-browser:latest
ghcr.io/m1k1o/neko/google-chrome:latest
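
The image names above slot straight into the compose file from this step. As a minimal sketch, the helper below builds the same service definition for any of the listed browsers. `nekoService` and its option names are illustrative and not part of n.eko; the fixed values simply mirror the sample configuration shown earlier.

```javascript
// Image names copied from the list above.
const NEKO_IMAGES = {
  firefox: 'ghcr.io/m1k1o/neko/firefox:latest',
  chromium: 'ghcr.io/m1k1o/neko/chromium:latest',
  brave: 'ghcr.io/m1k1o/neko/brave:latest',
  'tor-browser': 'ghcr.io/m1k1o/neko/tor-browser:latest',
  'google-chrome': 'ghcr.io/m1k1o/neko/google-chrome:latest',
};

// Hypothetical helper: returns a compose service object for one browser.
function nekoService(browser, options) {
  const {
    userPassword,
    adminPassword,
    screen = '1280x720@30',
    eprStart = 52000,
    eprEnd = 52100,
  } = options;

  const image = NEKO_IMAGES[browser];
  if (!image) {
    throw new Error(`Unknown browser "${browser}"`);
  }

  // The same range is used for NEKO_EPR and for the published UDP ports.
  const epr = `${eprStart}-${eprEnd}`;
  return {
    image,
    restart: 'unless-stopped',
    shm_size: '2gb',
    ports: ['8080:8080', `${epr}:${epr}/udp`],
    environment: {
      NEKO_SCREEN: screen,
      NEKO_PASSWORD: userPassword,
      NEKO_PASSWORD_ADMIN: adminPassword,
      NEKO_EPR: epr,
      NEKO_ICELITE: 1,
    },
  };
}

const compose = {
  services: {
    neko: nekoService('brave', {
      userPassword: 'your-user-password',
      adminPassword: 'your-admin-password',
    }),
  },
};
console.log(JSON.stringify(compose, null, 2));
```

Because JSON is also valid YAML, the printed output should be usable as a compose file, though hand-written YAML is usually easier to maintain.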

Step 4: Deploy n.eko Rooms (Multi-Session)

For agents that need to spawn multiple isolated browser sessions, use neko-rooms:

# Zero-knowledge install with HTTPS (uses Traefik)
wget -O neko-rooms-traefik.sh https://raw.githubusercontent.com/m1k1o/neko-rooms/master/traefik/install
sudo bash neko-rooms-traefik.sh

Follow the prompts. You’ll need a domain pointing to your server; Let’s Encrypt then provisions the SSL certificate automatically.

Once running, you get a web UI to create/manage rooms, plus an API for programmatic control.

Step 5: Connect Your AI Agent

Here’s where it gets interesting. n.eko explicitly supports automation:

“You can install playwright or puppeteer and automate tasks while being able to actively intercept them.”

Option A: Playwright Inside the Container

Create a custom Dockerfile that includes Playwright:

FROM ghcr.io/m1k1o/neko/chromium:latest

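# Install Node.js and Playwright (apt lists removed to keep the layer small)
RUN apt-get update \
    && apt-get install -y nodejs npm \
    && rm -rf /var/lib/apt/lists/*
RUN npm install -g playwright
# Downloads Playwright's own Chromium build, separate from the image's preinstalled browser
RUN npx playwright install chromium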

# Your agent script
COPY agent.js /app/agent.js

Your agent runs inside the container, controlling the browser directly.

Option B: External Agent via WebSocket/WebRTC

For agents running outside the container (like our SkyClaw deployment), connect via:

WebSocket API: n.eko exposes control via WebSocket
Screenshot + Click coordinates: Agent views the stream, sends mouse/keyboard events
Custom plugin: n.eko supports plugins for extended functionality
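
For the screenshot-and-click route, one practical detail is that the image an agent inspects is often downscaled relative to the real desktop, so click coordinates need to be mapped back to screen pixels before they are sent. A minimal sketch of that mapping, assuming the screenshot covers the whole screen. `toScreenCoords` is a hypothetical helper, and how the resulting event is delivered to n.eko is deliberately left out.

```javascript
// Map a point chosen on a (possibly downscaled) screenshot back to
// real screen pixels. Assumes the screenshot shows the entire screen.
function toScreenCoords(point, shot, screen) {
  // Keep the result inside the valid pixel range [0, max - 1].
  const clamp = (value, max) => Math.min(Math.max(value, 0), max - 1);
  const x = Math.round((point.x / shot.width) * screen.width);
  const y = Math.round((point.y / shot.height) * screen.height);
  return { x: clamp(x, screen.width), y: clamp(y, screen.height) };
}

// The agent saw a 640x360 screenshot of a 1280x720 desktop
// (the NEKO_SCREEN value used earlier) and picked its center.
const target = toScreenCoords(
  { x: 320, y: 180 },
  { width: 640, height: 360 },
  { width: 1280, height: 720 },
);
console.log(target); // { x: 640, y: 360 }
```

Clamping matters at the edges: a click on the last column of the screenshot would otherwise round to a coordinate one pixel past the screen.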

Option C: CDP (Chrome DevTools Protocol)

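For Chromium-based images, expose CDP. The protocol has no authentication, so restrict the published port to trusted addresses with a firewall rule or an SSH tunnel:

environment:
  NEKO_CHROMIUM_ARGS: "--remote-debugging-port=9222"
ports:
  - "9222:9222"  # CDP grants full browser control; do not leave it open to the internet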

Then connect Playwright externally:

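const { chromium } = require('playwright');

(async () => {
  // Attach to the browser already running inside the n.eko container
  const browser = await chromium.connectOverCDP('http://YOUR_SERVER:9222');
  const context = browser.contexts()[0];
  // Reuse the open tab, or open one if none exists yet
  const page = context.pages()[0] ?? await context.newPage();

  // Agent controls the browser
  await page.goto('https://example.com');
  await page.click('button#submit');
})();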
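
A freshly started container may not accept CDP connections right away, so it can help to poll before attaching. The sketch below checks `/json/version`, the HTTP endpoint Chrome serves on its remote debugging port, until it returns a `webSocketDebuggerUrl`. `waitForCdp` and its options are hypothetical names, and the retry counts are arbitrary defaults rather than recommendations.

```javascript
// Poll Chrome's DevTools HTTP endpoint until the browser is ready.
// fetchImpl is injectable so the helper can be exercised without a network.
async function waitForCdp(baseUrl, { fetchImpl = fetch, retries = 10, delayMs = 500 } = {}) {
  const url = new URL('/json/version', baseUrl).toString();
  let lastError = new Error('no attempts made');

  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetchImpl(url);
      if (res.ok) {
        const info = await res.json();
        if (info.webSocketDebuggerUrl) return info;
        lastError = new Error('response had no webSocketDebuggerUrl');
      } else {
        lastError = new Error(`HTTP ${res.status}`);
      }
    } catch (err) {
      // Connection refused, DNS failure, and similar errors land here.
      lastError = err;
    }
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`CDP at ${url} not ready after ${retries} attempts: ${lastError.message}`);
}
```

Once `await waitForCdp('http://YOUR_SERVER:9222')` resolves, the `connectOverCDP` call shown above should succeed on the first try.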

Step 6: Security Hardening

The beauty of this architecture:

Only video leaves the container — no cookies, tokens, or credentials
Agent is sandboxed — even if compromised, it can’t access host
Human oversight — you can watch the agent work in real-time via WebRTC
Kill switch — destroy the container instantly if something goes wrong

Additional hardening:

# Top-level: restrict network access
networks:
  neko-net:
    driver: bridge
    internal: true  # No outbound internet; use only when the agent targets internal sites

# Service-level: read-only filesystem where possible
read_only: true
tmpfs:
  - /tmp
  - /run

For sensitive tasks, pair n.eko with a VPN container: neko-vpn routes all traffic through a VPN.

Real-World Architecture: SkyClaw on Railway

We deployed an agent system called SkyClaw (codename: Ray) on Railway — a Rust-based runtime with Telegram as its interface. The architecture:

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Telegram      │────▶│   SkyClaw Agent  │────▶│   n.eko Room    │
│   (Interface)   │◀────│   (Railway)      │◀────│   (VPS/Docker)  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
         │                       │                        │
    User sends              Agent processes          Browser executes
    command                 intent, plans            actions visually

When a task requires web interaction:

SkyClaw spins up an n.eko room via API
Connects via CDP or WebSocket
Executes the task (screenshots streamed to user if needed)
Destroys the room when done
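
The create, use, destroy cycle above is easiest to get right when cleanup is structural rather than something each task has to remember. A minimal sketch of that pattern follows. `withRoom`, `createRoom`, and `destroyRoom` are hypothetical names supplied by the caller; this is not SkyClaw's code, and the stand-in API below keeps everything in memory instead of calling neko-rooms.

```javascript
// Run a task against a fresh room and always destroy the room afterwards,
// even when the task throws.
async function withRoom({ createRoom, destroyRoom }, task) {
  const room = await createRoom();
  try {
    return await task(room);
  } finally {
    await destroyRoom(room);
  }
}

// In-memory stand-ins so the sketch runs without a server.
const liveRooms = new Set();
const fakeApi = {
  createRoom: async () => {
    const room = { id: `room-${liveRooms.size + 1}` };
    liveRooms.add(room.id);
    return room;
  },
  destroyRoom: async (room) => {
    liveRooms.delete(room.id);
  },
};

withRoom(fakeApi, async (room) => {
  throw new Error(`task failed in ${room.id}`);
}).catch((err) => {
  console.log(err.message);    // task failed in room-1
  console.log(liveRooms.size); // 0
});
```

The failing task still leaves no room behind, which is the property that matters when rooms hold logged-in sessions.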

Railway handles the agent compute. n.eko handles the browser isolation. Telegram provides the human interface.

Comparing Approaches

Approach	Isolation	Human Oversight	Multi-Agent	Cost
Local Playwright	❌ None	❌ No	⚠️ Complex	Free
Browserless.io	✅ Container	❌ No	✅ Yes	$$
Hyperbeam API	✅ Full	✅ Yes	✅ Yes	$$$
n.eko (self-hosted)	✅ Full	✅ Yes	✅ Yes	$ (VPS cost)

n.eko offers isolation and oversight comparable to Hyperbeam for the price of a VPS; the trade-off is that you manage the infrastructure yourself.

Troubleshooting

Black screen on connect?

Check WebRTC ports (52000-52100 UDP) are open
Try adding NEKO_ICELITE: 1 for NAT traversal
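
A frequent cause of the port problem is a published UDP range that no longer matches `NEKO_EPR` after one of them was edited. The sketch below checks a service definition, given as a plain object shaped like the compose example earlier, for that mismatch. `checkWebrtcPorts` is a hypothetical helper, and it assumes the range should be published one-to-one on the same port numbers, as in the sample configuration.

```javascript
// Parse "START-END" into numbers, or return null if malformed.
function parseRange(text) {
  const match = /^(\d+)-(\d+)$/.exec(String(text).trim());
  return match ? { start: Number(match[1]), end: Number(match[2]) } : null;
}

// Return a list of human-readable problems; an empty list means the
// UDP port mapping and NEKO_EPR agree.
function checkWebrtcPorts(service) {
  const epr = parseRange(service.environment?.NEKO_EPR ?? '');
  if (!epr) {
    return ['NEKO_EPR is missing or not in START-END form'];
  }

  const sameRange = (range) => range && range.start === epr.start && range.end === epr.end;
  const published = (service.ports ?? [])
    .map(String)
    .filter((mapping) => mapping.endsWith('/udp'))
    .some((mapping) => {
      // Drop the protocol suffix and any leading bind address.
      const [host, container] = mapping.replace(/\/udp$/, '').split(':').slice(-2);
      return sameRange(parseRange(host)) && sameRange(parseRange(container));
    });

  return published
    ? []
    : [`No UDP mapping publishes ${epr.start}-${epr.end} on matching host and container ports`];
}

const service = {
  ports: ['8080:8080', '52000-52050:52000-52050/udp'],
  environment: { NEKO_EPR: '52000-52100' },
};
console.log(checkWebrtcPorts(service));
```

This only validates the compose side; a cloud firewall or security group can still block the range even when the check passes.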

High latency?

Reduce resolution: NEKO_SCREEN: 1024x576@24
Enable hardware encoding with NVIDIA GPU
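
To see how much the suggested resolution change reduces the load, it helps to compare pixels per second. The sketch below parses the `WIDTHxHEIGHT@FPS` form used by `NEKO_SCREEN` in this guide. `parseScreen` and `pixelRate` are hypothetical helpers, and raw pixel rate is only a rough proxy for encoder work and bandwidth.

```javascript
// Parse a NEKO_SCREEN-style value such as "1280x720@30".
function parseScreen(value) {
  const match = /^(\d+)x(\d+)@(\d+)$/.exec(value);
  if (!match) {
    throw new Error(`Expected WIDTHxHEIGHT@FPS, got "${value}"`);
  }
  const [width, height, fps] = match.slice(1).map(Number);
  return { width, height, fps };
}

// Pixels the encoder has to process each second.
function pixelRate(value) {
  const { width, height, fps } = parseScreen(value);
  return width * height * fps;
}

const ratio = pixelRate('1024x576@24') / pixelRate('1280x720@30');
console.log(ratio.toFixed(3)); // 0.512
```

Each of the three dimensions shrinks to 80% of its original value, so the stream carries roughly half the pixels per second.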

Browser crashes?

Increase shared memory: shm_size: "4gb"
Check container logs: docker compose logs -f

What’s Next

The agentic era demands new infrastructure primitives. n.eko is one of them — a way to give AI agents real browser access without compromising security.

For teams building agent systems:

Start with a single n.eko room for development
Graduate to neko-rooms for multi-agent workloads
Consider GPU instances for smooth performance at scale

The browser is the universal client. Now your agents can use it too.
