Mike: The Open-Source Legal AI That Wants to Replace Harvey
Legal AI has a pricing problem. Harvey sells enterprise contracts with per-seat licensing. Legora does the same. Both are closed-source black boxes where you can't inspect how prompts are constructed, how citations are parsed, or where your confidential documents actually go. For an industry built on privilege and confidentiality, that's a hard pill to swallow.
Enter Mike, an AGPL-3.0 licensed legal AI platform built by Will Chen, a former BigLaw attorney who apparently got tired of the status quo. It just hit the Hacker News front page, and the discussion is worth reading for the raw takes on legal tech incentives alone.
What Mike Actually Does
Mike is a web application that wraps LLM providers (Claude, Gemini) into legal-specific workflows. You bring your own API keys: no markup on model costs, no per-seat fees. The core feature set maps directly to what Harvey and Legora sell at enterprise prices:
- Document-aware chat: Upload contracts, SPAs, leases, and diligence packs. The assistant maintains context across conversations and documents within a matter-scoped project.
- Tabular extraction: Spreadsheet-style data extraction across hundreds of documents in parallel. Every cell cites back to a specific page and verbatim quote.
- Workflow templates: Save proven prompts as reusable workflows (CP checklists, credit agreement summaries, change-of-control reviews) that junior associates can run with one click.
- Contract drafting and editing: End-to-end within the chat interface.
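To make the tabular extraction idea concrete, here is a minimal Python sketch of a review grid filled in parallel, where each cell carries its own page and quote. It is not Mike's code: `Cell`, `extract_cell`, and `build_grid` are hypothetical names, and `extract_cell` is a keyword-matching stand-in for what would be a model call in a real deployment.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """One grid cell: an extracted value plus the citation that backs it."""
    value: str
    page: int   # 1-based page number
    quote: str  # verbatim text the value was taken from


def extract_cell(pages: List[str], keyword: str) -> Optional[Cell]:
    """Hypothetical stand-in for a model call: first sentence containing keyword."""
    for page_number, text in enumerate(pages, start=1):
        for sentence in text.split(". "):
            if keyword.lower() in sentence.lower():
                quote = sentence.strip().rstrip(".")
                return Cell(value=quote, page=page_number, quote=quote)
    return None  # leave the cell empty rather than guess


def build_grid(
    documents: Dict[str, List[str]],
    columns: Dict[str, str],
    max_workers: int = 8,
) -> Dict[Tuple[str, str], Optional[Cell]]:
    """Fill every (document, column) cell concurrently."""
    jobs = [
        (name, column, pages, keyword)
        for name, pages in documents.items()
        for column, keyword in columns.items()
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with jobs
        cells = list(pool.map(lambda job: extract_cell(job[2], job[3]), jobs))
    return {(name, column): cell for (name, column, _, _), cell in zip(jobs, cells)}


# Illustrative data only: each document is a list of page texts.
documents = {
    "lease_a.pdf": [
        "This Lease is made between Landlord and Tenant.",
        "The term is five years. Rent is payable monthly.",
    ],
    "lease_b.pdf": ["The term is ten years."],
}
columns = {"Term": "term is", "Rent": "rent is"}
grid = build_grid(documents, columns)
```

Threads suit this shape of work because each cell would be an I/O-bound API call. Returning `None` for a missing answer keeps an empty cell visibly empty instead of filled with a guess.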
This isn't a fine-tuned legal LLM. Several HN commenters noted this distinction: Harvey and Legora have invested in RLHF with legal professionals and partnerships with Lexis/Westlaw for research grounding. Mike is a well-structured wrapper. Whether that matters depends on your use case.
Architecture and Self-Hosting
The stack is straightforward: Next.js frontend, Express backend, Supabase for auth and Postgres, S3-compatible object storage (Cloudflare R2 works), and LibreOffice for DOC/DOCX-to-PDF conversion. Setup is npm install, copy .env.example, run a one-shot SQL migration, and start both services.
For a law firm's IT team, this is refreshingly deployable. No Kubernetes cluster required, no exotic dependencies. The Supabase dependency is the main architectural choice to evaluate: it handles auth and the database, which means you're either self-hosting Supabase or trusting their cloud, and the latter partially undermines the "documents never leave your perimeter" pitch.
The AGPL-3.0 license is a deliberate choice. It means if you modify Mike and offer it as a service, you must release your changes. This protects the project from cloud providers wrapping it without contributing back, while keeping it genuinely free for firms running it internally.
The Citation Problem
This is where Mike gets interesting for actual legal work. The tabular extraction feature promises verifiable citations: every extracted data point links back to a specific page and verbatim quote. No hallucinated answers, no dead links.
This matters enormously. The recent United States v. Heppner ruling established that using AI chatbots through public services can break attorney-client privilege. If you're self-hosting Mike with your own API keys under an enterprise agreement with Anthropic or Google, the privilege analysis changes significantly. Several HN commenters noted that a firm hosting its own deployment with proper data segregation could likely maintain privilege protections, though no court has specifically ruled on this configuration yet.
The citation-per-cell approach in tabular review is also the right design for legal diligence. When a partner asks "where did you get this number?", pointing to a page and quote is table stakes. Hallucinated citations are career-ending in BigLaw.
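A page-plus-quote citation has one useful property: it can be checked mechanically. The sketch below shows one way a reviewer could do that, by confirming the quoted text really appears on the cited page. How Mike's own citation engine works is not documented, so `Citation`, `normalize`, and `citation_is_verbatim` are hypothetical names and the whitespace handling is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Citation:
    page: int   # 1-based page number
    quote: str  # text the model claims appears on that page


def normalize(text: str) -> str:
    """Collapse runs of whitespace so PDF line breaks do not cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip()


def citation_is_verbatim(citation: Citation, pages: List[str]) -> bool:
    """True only if the page exists and contains the quote (case-sensitive)."""
    if not 1 <= citation.page <= len(pages):
        return False
    quote = normalize(citation.quote)
    return bool(quote) and quote in normalize(pages[citation.page - 1])


# Illustrative page texts, as they might come out of a PDF text layer.
pages = [
    "Purchase price:\n  USD 12,500,000 payable at Closing.",
    "Governing law: England and Wales.",
]
good = Citation(page=1, quote="Purchase price: USD 12,500,000")
wrong_text = Citation(page=2, quote="Governing law: New York")
wrong_page = Citation(page=3, quote="Governing law: England and Wales")
```

A check like this only proves the quote exists on the page. Whether the quote actually supports the extracted value still needs a human reader.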
What's Promising
The value proposition is clear: zero license cost, full source access, self-hostable. For small and mid-size firms priced out of Harvey, or for in-house legal teams that can't justify enterprise contracts, Mike removes the barrier entirely. You pay only model API costs.
The workflow template system is smart. Legal work is repetitive by nature: the same CP checklist, the same lease abstraction, the same change-of-control review. Codifying these as reusable templates that junior associates can execute is exactly how legal AI should work.
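As a rough illustration of a saved workflow, the sketch below treats one as a named prompt with placeholders that get filled in per matter. This is an assumption about the general shape, not Mike's actual schema. `Workflow`, `render`, and the prompt wording are hypothetical; `string.Template` is from the Python standard library.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Workflow:
    name: str
    prompt: Template

    def render(self, **fields: str) -> str:
        # substitute() raises KeyError if any placeholder is left unfilled,
        # so an incomplete prompt is never sent to the model
        return self.prompt.substitute(**fields)


change_of_control = Workflow(
    name="Change-of-control review",
    prompt=Template(
        "Review $document for change-of-control provisions. "
        "For each one, give the clause number, a one-line summary, "
        "and a verbatim quote with its page number. "
        "Counterparty of interest: $counterparty."
    ),
)

rendered = change_of_control.render(document="SPA.pdf", counterparty="Acme Ltd")
```

Keeping the prompt text fixed and varying only the named fields is what makes the result repeatable across associates and matters.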
What's Missing
Let's be honest about the gaps. Mike doesn't have fine-tuned legal models; it's using general-purpose Claude and Gemini. Harvey's pitch includes legally-trained RLHF and Westlaw/Lexis integration for case law research. Mike doesn't do legal research at all, as far as I can tell.
There's no mention of local model support (Ollama, vLLM), which would be the real answer to the privilege question. The README lists no supported local backends. For firms genuinely concerned about data leaving their infrastructure, BYOK to Anthropic's cloud isn't the same as running everything on-premise.
The documentation is thin: the GitHub README is essentially a setup guide with no architecture docs, no API reference, and no explanation of how the citation engine works under the hood. For an open-source project asking law firms to trust it with confidential documents, that's a gap worth closing.
And the HN discussion raises a valid structural question: BigLaw partners billing $500/hour don't necessarily want efficiency tools that reduce billable hours. Mike's natural audience might be the clients, not the firms, or the smaller firms competing on fixed-fee work where efficiency directly improves margins.
Bottom Line
Mike is a solid v1 of what legal AI should look like: open, auditable, self-hostable, with the right feature primitives (document context, citations, workflow templates). It's not a Harvey replacement for firms that need legal research or fine-tuned models. But for document review, contract analysis, and structured extraction, which is where most legal AI spend actually goes, it's a genuine alternative at zero license cost.
The AGPL license, the BYOK model, and the clean architecture suggest Will Chen is building for the long game. Worth watching.
Links: mikeoss.com · GitHub · HN Discussion