Virtual Desktop Infrastructure for the Agentic Era

By Prahlad Menon

AI agents are increasingly capable, but they hit a wall when the task requires a GUI. Not everything has an API. Sometimes you need to click a button, fill a form, or navigate a website that actively blocks automation.

The solution? Give your agent a real browser — isolated, secure, and controllable via code.

This guide walks you through deploying n.eko, an open-source virtual browser platform, and connecting it to AI agents for safe, sandboxed web automation.

Why Agents Need Virtual Desktops

Traditional browser automation (Selenium, Playwright, Puppeteer) runs on the same machine as your agent. This creates problems:

  • Security risk: A compromised agent has access to your local filesystem
  • Detection: Headless browsers get flagged by anti-bot systems
  • No human oversight: You can’t see what the agent is doing in real-time
  • Resource contention: Browser processes compete with your agent’s compute

A virtual desktop addresses each of these. The browser runs in a container, and by default only the video and audio stream leaves it. Your agent sends commands in, while cookies, tokens, and other sensitive data stay inside the sandbox.

What is n.eko?

n.eko is a self-hosted virtual browser that runs in Docker and streams via WebRTC. Key features:

  • Multiple browsers: Firefox, Chrome, Brave, Edge, Tor Browser
  • Full desktop environments: XFCE, KDE — run any Linux app
  • Multi-user control: Multiple people (or agents) can view/control the same session
  • Built-in audio: Synced audio streaming for video content
  • GPU acceleration: Smooth rendering with NVIDIA support
  • API for room management: Programmatically create/destroy sessions

With 746K+ Docker pulls and 17K+ GitHub stars, it’s battle-tested infrastructure.

Step 1: Deploy a VPS

You’ll need a server with a public IP. Recommended specs:

Resolution     Cores   RAM    Experience
1280x720@30    4       3GB    Good
1280x720@30    6       4GB    Recommended
1280x720@30    8+      4GB+   Best
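
As a rough planning aid, the table can be turned into a back-of-the-envelope capacity estimate. The sketch below assumes every session needs the "Recommended" row (6 cores, 4 GB) and that demand grows linearly with the number of sessions, which is a simplification. The names estimateRooms and RECOMMENDED are illustrative and not part of n.eko.

```javascript
// Per-session budget taken from the "Recommended" row of the table.
// Linear scaling is an assumption; measure real load before relying on it.
const RECOMMENDED = { cores: 6, ramGb: 4 };

// Returns how many concurrent sessions fit on a host, limited by
// whichever resource (CPU or RAM) runs out first.
function estimateRooms(hostCores, hostRamGb, perRoom = RECOMMENDED) {
  const byCpu = Math.floor(hostCores / perRoom.cores);
  const byRam = Math.floor(hostRamGb / perRoom.ramGb);
  return Math.max(0, Math.min(byCpu, byRam));
}

console.log(estimateRooms(16, 32)); // 2 (CPU-bound: 16 / 6 rounds down to 2)
```

On this estimate a 16-core, 32 GB host runs out of CPU long before RAM, so core count is the figure to watch when sizing a multi-room server.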

Spin up an Ubuntu 22.04+ instance and SSH in.

Step 2: Install Docker

# Install Docker
curl -sSL https://get.docker.com/ | CHANNEL=stable bash

# Install Docker Compose plugin
sudo apt-get update
sudo apt-get install -y docker-compose-plugin

# Verify
docker --version
docker compose version

Step 3: Deploy n.eko (Single Room)

For a quick single-browser setup:

# Create project directory
mkdir ~/neko && cd ~/neko

# Download docker-compose.yaml
wget https://raw.githubusercontent.com/m1k1o/neko/master/docker-compose.yaml

# Start n.eko
sudo docker compose up -d

Visit http://YOUR_SERVER_IP:8080 in your browser. Default password: neko
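
An agent that starts the container itself cannot "visit the page and look", so it helps to wait until the web client answers before connecting. The following is a minimal sketch, assuming the n.eko web client returns a successful HTTP status on the root path. The helper name waitForHttp and its options are hypothetical, and it relies on the global fetch shipped with Node 18+.

```javascript
// Polls a URL until it answers with a 2xx status or attempts run out.
// `fetchImpl` is injectable so the helper can be exercised without a server;
// it defaults to the global fetch available in Node 18+.
async function waitForHttp(url, { attempts = 10, delayMs = 1000, fetchImpl = fetch } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchImpl(url);
      if (res.ok) return i; // number of attempts it took
    } catch (err) {
      // Connection refused while the container is still starting: retry.
    }
    if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`No successful response from ${url} after ${attempts} attempts`);
}

// Usage (YOUR_SERVER_IP is the placeholder from the text above):
// waitForHttp('http://YOUR_SERVER_IP:8080').then((n) => console.log(`up after ${n} tries`));
```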

To customize, edit docker-compose.yaml:

services:
  neko:
    image: "ghcr.io/m1k1o/neko/firefox:latest"
    restart: unless-stopped
    shm_size: "2gb"
    ports:
      - "8080:8080"
      - "52000-52100:52000-52100/udp"
    environment:
      NEKO_SCREEN: 1280x720@30
      NEKO_PASSWORD: your-user-password
      NEKO_PASSWORD_ADMIN: your-admin-password
      NEKO_EPR: 52000-52100
      NEKO_ICELITE: 1
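
In the configuration above, NEKO_EPR and the published UDP range are the same numbers, and they need to stay in sync when you edit either one: WebRTC advertises those ports to the client, so a mismatch typically shows up as a session that connects but never displays video. The check below is a small sketch of that rule. The object shape mirrors the compose file, and the function names parseRange and eprMatchesPorts are made up for illustration.

```javascript
// Parses "52000-52100" into { start, end }; returns null if malformed.
function parseRange(text) {
  const m = /^(\d+)-(\d+)$/.exec(String(text).trim());
  if (!m) return null;
  const start = Number(m[1]);
  const end = Number(m[2]);
  return start <= end ? { start, end } : null;
}

// Checks that NEKO_EPR equals a published "A-B:A-B/udp" mapping.
// `service` mirrors the shape of the `neko` service in docker-compose.yaml.
function eprMatchesPorts(service) {
  const epr = parseRange(service.environment.NEKO_EPR);
  if (!epr) return false;
  return service.ports.some((p) => {
    const m = /^(\d+-\d+):(\d+-\d+)\/udp$/.exec(p);
    if (!m) return false;
    const host = parseRange(m[1]);
    const container = parseRange(m[2]);
    return [host, container].every(
      (r) => r && r.start === epr.start && r.end === epr.end
    );
  });
}

const service = {
  ports: ['8080:8080', '52000-52100:52000-52100/udp'],
  environment: { NEKO_EPR: '52000-52100' },
};
console.log(eprMatchesPorts(service)); // true
```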

Available browser images:

  • ghcr.io/m1k1o/neko/firefox:latest
  • ghcr.io/m1k1o/neko/chromium:latest
  • ghcr.io/m1k1o/neko/brave:latest
  • ghcr.io/m1k1o/neko/tor-browser:latest
  • ghcr.io/m1k1o/neko/google-chrome:latest

Step 4: Deploy n.eko Rooms (Multi-Session)

For agents that need to spawn multiple isolated browser sessions, use neko-rooms:

# Zero-knowledge install with HTTPS (uses Traefik)
wget -O neko-rooms-traefik.sh https://raw.githubusercontent.com/m1k1o/neko-rooms/master/traefik/install
sudo bash neko-rooms-traefik.sh

Follow the prompts. You’ll need a domain pointing to your server; Let’s Encrypt then provisions the TLS certificate automatically.

Once running, you get a web UI to create/manage rooms, plus an API for programmatic control.

Step 5: Connect Your AI Agent

Here’s where it gets interesting. n.eko explicitly supports automation:

“You can install playwright or puppeteer and automate tasks while being able to actively intercept them.”

Option A: Playwright Inside the Container

Create a custom Dockerfile that includes Playwright:

FROM ghcr.io/m1k1o/neko/chromium:latest

# Install Node.js and Playwright
RUN apt-get update && apt-get install -y nodejs npm
RUN npm install -g playwright
RUN npx playwright install chromium

# Your agent script
COPY agent.js /app/agent.js

Your agent runs inside the container, controlling the browser directly.

Option B: External Agent via VNC/WebRTC

For agents running outside the container (like our SkyClaw deployment), connect via:

  1. WebSocket API: n.eko exposes control via WebSocket
  2. Screenshot + Click coordinates: Agent views the stream, sends mouse/keyboard events
  3. Custom plugin: n.eko supports plugins for extended functionality
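
For the screenshot-plus-coordinates approach, one detail is easy to get wrong: vision models usually see a downscaled image, so the point they pick has to be mapped back to desktop pixels before it is sent as a mouse event. The sketch below covers only that mapping. The function name and argument shapes are illustrative, and the step of actually delivering the event over n.eko's WebSocket is left out because its message format is not covered here.

```javascript
// Maps a point chosen on a downscaled screenshot back to desktop pixels.
// `shot` is the size of the image the model saw; `screen` is the n.eko
// desktop size (for example 1280x720 from NEKO_SCREEN).
function toScreenCoords(point, shot, screen) {
  const x = Math.round((point.x / shot.width) * screen.width);
  const y = Math.round((point.y / shot.height) * screen.height);
  // Clamp so rounding never produces a coordinate outside the desktop.
  return {
    x: Math.min(Math.max(x, 0), screen.width - 1),
    y: Math.min(Math.max(y, 0), screen.height - 1),
  };
}

const click = toScreenCoords(
  { x: 320, y: 180 },
  { width: 640, height: 360 },
  { width: 1280, height: 720 }
);
console.log(click); // { x: 640, y: 360 }
```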

Option C: CDP (Chrome DevTools Protocol)

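For Chromium-based images, expose CDP. CDP has no authentication of its own, so restrict port 9222 with a firewall rule or an SSH tunnel rather than opening it to the internet: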

environment:
  NEKO_CHROMIUM_ARGS: "--remote-debugging-port=9222"
ports:
  - "9222:9222"

Then connect Playwright externally:

const { chromium } = require('playwright');

(async () => {
  // Attach to the browser already running inside n.eko
  const browser = await chromium.connectOverCDP('http://YOUR_SERVER:9222');
  const context = browser.contexts()[0];
  const page = context.pages()[0];

  // Agent controls the browser
  await page.goto('https://example.com');
  await page.click('button#submit');
})();
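
The snippet above indexes contexts()[0] and pages()[0], which are undefined if the attached browser has no context or no open tab at that moment. A slightly more defensive variant, sketched here with a hypothetical helper name, falls back to creating what is missing. It uses only Playwright's contexts(), newContext(), pages(), and newPage().

```javascript
// Returns an open page from a CDP-attached browser, creating one if needed.
// `browser` is expected to be the object returned by Playwright's
// chromium.connectOverCDP(); only contexts(), newContext(), pages() and
// newPage() are used.
async function firstPageOrNew(browser) {
  const context = browser.contexts()[0] ?? (await browser.newContext());
  return context.pages()[0] ?? (await context.newPage());
}

// Usage inside an async function, after connecting:
// const page = await firstPageOrNew(browser);
// await page.goto('https://example.com');
```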

Step 6: Security Hardening

The beauty of this architecture:

  • Only video leaves the container: cookies, tokens, and credentials stay inside (exposing CDP as in Option C weakens this, since CDP can read them)
  • Agent is sandboxed — even if compromised, it can’t access host
  • Human oversight — you can watch the agent work in real-time via WebRTC
  • Kill switch — destroy the container instantly if something goes wrong

Additional hardening:

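# Service-level settings (under services.neko)
services:
  neko:
    networks:
      - neko-net
    # Read-only filesystem where possible
    read_only: true
    tmpfs:
      - /tmp
      - /run

# Restrict network access (top-level key)
networks:
  neko-net:
    driver: bridge
    internal: true  # No internet access; omit if the browser must reach public sites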

For sensitive tasks, pair with a VPN container:

  • neko-vpn routes all traffic through a VPN

Real-World Architecture: SkyClaw on Railway

We deployed SkyClaw (codename: Ray), a Rust-based agent runtime with Telegram as its interface, on Railway. The architecture:

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Telegram      │────▶│   SkyClaw Agent  │────▶│   n.eko Room    │
│   (Interface)   │◀────│   (Railway)      │◀────│   (VPS/Docker)  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
         │                       │                        │
    User sends              Agent processes          Browser executes
    command                 intent, plans            actions visually

When a task requires web interaction:

  1. SkyClaw spins up an n.eko room via API
  2. Connects via CDP or WebSocket
  3. Executes the task (screenshots streamed to user if needed)
  4. Destroys the room when done
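
The important property of this lifecycle is that step 4 runs even when step 3 fails, otherwise crashed tasks leave rooms running. Below is a minimal sketch of that control flow, not SkyClaw's actual code: rooms is a hypothetical client object with create() and destroy(id) methods that you would wire to your room API, and fakeRooms is an in-memory stand-in used only for demonstration.

```javascript
// Runs `task` against a freshly created room and always destroys the room,
// even when the task throws. `rooms` is a hypothetical client object with
// create() and destroy(id) methods; wire it to whatever room API you use.
async function withRoom(rooms, task) {
  const room = await rooms.create();
  try {
    return await task(room);
  } finally {
    await rooms.destroy(room.id);
  }
}

// In-memory stand-in used only to demonstrate the control flow.
function fakeRooms() {
  const live = new Set();
  let next = 1;
  return {
    live,
    create: async () => {
      const id = `room-${next++}`;
      live.add(id);
      return { id };
    },
    destroy: async (id) => {
      live.delete(id);
    },
  };
}

// Usage inside an async function:
// const result = await withRoom(fakeRooms(), async (room) => `worked in ${room.id}`);
```

Because cleanup lives in a finally block, the caller still sees the task's original error while the room is torn down.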

Railway handles the agent compute. n.eko handles the browser isolation. Telegram provides the human interface.

Comparing Approaches

Approach              Isolation      Human Oversight   Multi-Agent   Cost
Local Playwright      ❌ None        ❌ No             ⚠️ Complex    Free
Browserless.io        ✅ Container   ❌ No             ✅ Yes        $$
Hyperbeam API         ✅ Full        ✅ Yes            ✅ Yes        $$$
n.eko (self-hosted)   ✅ Full        ✅ Yes            ✅ Yes        $ (VPS cost)

n.eko gives you Hyperbeam-level capability at a fraction of the cost — you just manage the infrastructure yourself.

Troubleshooting

Black screen on connect?

  • Check WebRTC ports (52000-52100 UDP) are open
  • Try adding NEKO_ICELITE: 1 for NAT traversal

High latency?

  • Reduce resolution: NEKO_SCREEN: 1024x576@24
  • Enable hardware encoding with NVIDIA GPU
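
To get a feel for how much the lower setting helps, compare the raw pixel rate (width × height × fps) of the two NEKO_SCREEN values. This is a sketch with illustrative function names, and encoded bitrate is not strictly proportional to raw pixel rate, so treat the ratio as an approximation.

```javascript
// Parses a NEKO_SCREEN value such as "1280x720@30".
function parseScreen(value) {
  const m = /^(\d+)x(\d+)@(\d+)$/.exec(value);
  if (!m) throw new Error(`Unrecognized screen value: ${value}`);
  const [width, height, fps] = m.slice(1).map(Number);
  return { width, height, fps };
}

// Raw pixels per second before encoding: width * height * fps.
function pixelRate(value) {
  const { width, height, fps } = parseScreen(value);
  return width * height * fps;
}

const before = pixelRate('1280x720@30'); // 27,648,000
const after = pixelRate('1024x576@24');  // 14,155,776
console.log((after / before).toFixed(3)); // prints 0.512
```

Dropping to 1024x576@24 roughly halves the pixels the encoder has to process each second.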

Browser crashes?

  • Increase shared memory: shm_size: "4gb"
  • Check container logs: docker compose logs -f

What’s Next

The agentic era demands new infrastructure primitives. n.eko is one of them — a way to give AI agents real browser access without compromising security.

For teams building agent systems:

  1. Start with a single n.eko room for development
  2. Graduate to neko-rooms for multi-agent workloads
  3. Consider GPU instances for smooth performance at scale

The browser is the universal client. Now your agents can use it too.

