Windows-MCP: Giving AI Agents Full Control of Windows
The “computer use” trend in AI has been building for over a year now. Anthropic demoed it, OpenAI shipped Operator, and dozens of startups are racing to let LLMs control desktop environments. But most of these solutions target Mac or Linux, require vision models, or depend on heavy infrastructure.
Windows — the OS running on 72% of desktops worldwide — has been underserved. Until now.
What Is Windows-MCP?
Windows-MCP is an open-source MCP server that bridges LLMs and the Windows operating system. It lets any AI agent perform file navigation, application control, UI interaction, keyboard/mouse simulation, and QA testing — all through the Model Context Protocol.
The project has crossed 2 million users via Claude Desktop Extensions. That’s not a toy. That’s real traction.
Install is a one-liner:
uvx windows-mcp
It’s on PyPI, the MCP Registry, and works with Claude Desktop, Gemini CLI, Codex CLI, Qwen Code, Perplexity Desktop, and Claude Code. Basically every major MCP client.
Why It Matters: No Vision Required
Most computer-use approaches rely on screenshots and vision models — the agent “looks” at the screen and decides where to click. This works, but it’s slow, expensive, and fragile. A button moves 3 pixels and the agent is lost.
Windows-MCP takes a fundamentally different approach: it uses Windows UI Automation APIs to read the actual UI tree. The agent gets structured data about every window, button, text field, and menu item — no screenshots, no CV models, no fine-tuning required.
This means:
- Any LLM works. GPT-4o, Claude, Gemini, Llama, Qwen — if it can call MCP tools, it can control Windows.
- 0.2–0.9 second latency between actions. Fast enough for real workflows.
- DOM mode for browser automation — filters out browser chrome and gives the agent clean web page structure.
It’s lightweight (MIT license, minimal dependencies) and runs on Windows 7 through 11.
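To make that concrete, here is a toy sketch of why structured lookup is robust. The tree shape and field names are invented for illustration (Windows-MCP's actual representation will differ), but the idea holds: find an element by its accessible name, and screen coordinates never enter the picture.

```python
# Illustrative only: a toy UI tree in the spirit of what Windows UI
# Automation exposes (control type, accessible name, children). The
# helper finds an element by name no matter where it sits on screen,
# so a button moving 3 pixels changes nothing.

ui_tree = {
    "control": "Window", "name": "Notepad",
    "children": [
        {"control": "MenuBar", "name": "", "children": [
            {"control": "MenuItem", "name": "File", "children": []},
        ]},
        {"control": "Button", "name": "Save", "children": []},
    ],
}

def find_element(node, name, control=None):
    """Depth-first search for an element by accessible name."""
    if node["name"] == name and (control is None or node["control"] == control):
        return node
    for child in node["children"]:
        hit = find_element(child, name, control)
        if hit is not None:
            return hit
    return None

save = find_element(ui_tree, "Save", control="Button")
print(save["control"])  # -> Button
```

A vision-based agent would have to re-locate that Save button from pixels on every step; here it is a dictionary lookup, which is where the sub-second latency comes from.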
The Toolset
Windows-MCP exposes a focused set of tools through MCP:
- State capture — read the current UI tree, active windows, and element properties
- Mouse/keyboard simulation — clicks, typing, key combinations, scrolling
- Application control — launch apps, switch windows, manage focus
- File navigation — browse directories, open files
- DOM mode — browser-specific automation with use_dom=True, stripping browser UI for cleaner web interaction
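As a rough mental model of what DOM mode does (the element roles below are invented, not Windows-MCP's schema): drop anything that belongs to the browser itself and hand the agent only page content.

```python
# Hypothetical sketch of DOM-mode filtering. Given a mixed element
# list, keep page content and discard browser chrome (tab strip,
# address bar, toolbars). Role names are made up for illustration.

BROWSER_CHROME = {"tab_bar", "address_bar", "toolbar", "bookmark_bar"}

elements = [
    {"role": "tab_bar", "text": "3 tabs"},
    {"role": "address_bar", "text": "https://example.com"},
    {"role": "heading", "text": "Example Domain"},
    {"role": "link", "text": "More information..."},
]

page_only = [e for e in elements if e["role"] not in BROWSER_CHROME]
for e in page_only:
    print(f'{e["role"]}: {e["text"]}')
```

Less chrome in the context window also means fewer tokens per step, which compounds over a long automation run.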
The agent doesn’t need to “see” the screen. It reads the UI structure, reasons about it, and acts. This is closer to how accessibility tools work — and it’s far more reliable than pixel-based approaches.
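The loop an agent runs on top of these tools looks roughly like this. Tool names and return shapes are placeholders and the "model" is a trivial stub; this sketches the control flow, not Windows-MCP's actual API.

```python
# Stripped-down capture -> reason -> act loop with stubbed tools.
# In the real system, capture_state and click would be MCP tool calls
# and decide would be an LLM choosing among the exposed tools.

def capture_state():
    # Stand-in for the state-capture tool: returns structured elements.
    return [{"id": 1, "control": "Button", "name": "Save"}]

def click(element_id):
    # Stand-in for the mouse-simulation tool.
    return f"clicked element {element_id}"

def decide(goal, state):
    # Stand-in for the LLM: pick the element whose name appears in the goal.
    for el in state:
        if el["name"].lower() in goal.lower():
            return el["id"]
    return None

goal = "Save the document"
state = capture_state()
target = decide(goal, state)
result = click(target) if target is not None else "no matching element"
print(result)  # -> clicked element 1
```

Because the state is structured rather than a screenshot, any model that can emit a tool call can run this loop, which is why the project works across Claude, Gemini, Qwen, and the rest.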
Beyond the Server: Windows-Use
The team also ships windows-use, a standalone agent built on top of Windows-MCP. Think of it as a ready-made computer-use agent for Windows — useful if you want something working out of the box rather than wiring up your own agent loop.
For teams that need VM isolation, windowsmcp.io offers cloud-hosted Windows environments where agents can operate safely without touching your local machine.
Cross-Platform Context
The computer-use landscape is splitting along OS lines:
- macOS has Apple’s Accessibility APIs, and tools like Peekaboo that expose screen state to LLMs. Apple’s own accessibility framework is powerful but locked to their ecosystem.
- Linux sees most computer-use research (Anthropic’s original demo ran on Linux VMs), but the desktop fragmentation across GNOME, KDE, and various display servers makes universal tooling harder.
- Windows was the gap — the most-used desktop OS with the least agent tooling. Windows-MCP fills it cleanly by leveraging the mature UI Automation framework that Microsoft has maintained since Vista.
The pattern is converging: structured UI access beats screenshots. Windows-MCP proves this at scale on the platform that matters most for enterprise adoption.
Getting Started
{
  "mcpServers": {
    "windows-mcp": {
      "command": "uvx",
      "args": ["windows-mcp"]
    }
  }
}
Drop that into your MCP client config. Requires Python 3.13+ and uv. First run takes a minute or two for dependency installation.
The Bigger Picture
Two million users means Windows-MCP isn’t an experiment — it’s infrastructure. AI agents are moving from “call an API and return text” to “operate the computer like a human would.” The MCP ecosystem is making this composable and standardized.
Windows-MCP is the clearest example yet of this shift on the world’s most popular desktop OS. If you’re building agents that need to interact with Windows applications, this is the starting point.
GitHub · PyPI · MIT License