PDF Parsing for AI Agents: liteparse vs GLM-OCR vs LlamaParse

By Prahlad Menon

PDF parsing sounds like a solved problem. It isn't — and the gap between "good enough for simple PDFs" and "reliable for production agent pipelines" is a lesson most builders learn the hard way.

Three tools cover the practical spectrum for AI agent use cases in 2026. Here's when to reach for each.


The Three Tools

liteparse — Local, Fast, Zero Setup

Repo: github.com/run-llama/liteparse
From: LlamaIndex team (run-llama)
Model: Tesseract.js (CPU-only, classical OCR)

npm i -g @llamaindex/liteparse
lit parse document.pdf

That's it. No API key. No GPU. No cloud. Works on Linux, macOS (Intel/ARM), and Windows.

What it does well:

  • Native-text PDFs (those generated by software — Word, LaTeX, most web PDFs): near-perfect extraction
  • Bounding boxes on every text element — spatial layout preserved
  • Buffer input: zero disk I/O, pipe PDFs in from memory
  • Batch processing: lit batch-parse ./input ./output
  • Screenshot mode: renders pages as images for downstream VLM processing
  • Pluggable OCR: swap Tesseract for EasyOCR, PaddleOCR, or any custom HTTP server

Where it struggles:

  • Scanned PDFs with no text layer (Tesseract accuracy drops sharply on complex layouts)
  • Dense tables, multi-column academic papers, handwritten annotations
  • Non-English documents (Tesseract needs language packs configured)

Node.js API:

import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse('document.pdf');
console.log(result.text); // with bounding boxes in result.pages
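Because the OCR step is pluggable, liteparse can hand page images to any HTTP server you run. Below is a minimal sketch of such a backend; the request/response shape (raw image bytes in, JSON out) and the stub recognize function are illustrative assumptions, not liteparse's documented contract.

```javascript
import http from 'node:http';

// Stub recognizer: a real backend would invoke EasyOCR, PaddleOCR, etc. here.
function recognize(imageBuffer) {
  // Return shape is an assumption; adapt it to what the parser expects.
  return { text: '', bytesReceived: imageBuffer.length };
}

const server = http.createServer((req, res) => {
  const chunks = [];
  req.on('data', (chunk) => chunks.push(chunk));
  req.on('end', () => {
    const result = recognize(Buffer.concat(chunks));
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(result));
  });
});

// server.listen(8080); // then point the custom-OCR setting at http://localhost:8080
```

Keeping the recognizer a pure function makes swapping OCR engines a one-line change, and the HTTP plumbing never needs to know which engine is behind it.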

GLM-OCR 0.9B — VLM-Quality, Still Local

Repo: github.com/THUDM/GLM-OCR
From: Tsinghua KEG Lab
Model: 0.9B vision-language model

GLM-OCR is a different category of tool — a small vision-language model purpose-built for document understanding. In benchmarks published in March 2026, it outperformed Gemini on document parsing tasks and matched models many times its size.

What it does well:

  • Dense tables: understands cell relationships, not just text extraction
  • Multi-column layouts: tracks reading order semantically
  • Handwritten annotations: handles mixed printed/handwritten content
  • Visual document elements: charts, figures, form fields
  • Scanned PDFs: robust to image quality variation
  • Mathematical notation (arXiv papers): reasonable accuracy

Where it struggles:

  • Requires a GPU for fast inference (CPU inference is slow, even at 0.9B parameters)
  • More setup than liteparse (Python, model download ~1.8GB)
  • Overkill for clean native-text PDFs

When to use it: when layout fidelity and accuracy matter more than speed — financial statements, academic papers, legal contracts, forms with structured data.


LlamaParse β€” Production Cloud Parsing

URL: cloud.llamaindex.ai
From: LlamaIndex (same team as liteparse)
Model: Proprietary cloud pipeline

LlamaParse is what you reach for when the document is complex, accuracy is non-negotiable, and you're willing to trade privacy and per-page cost for reliability.

What it does well:

  • Complex tables across pages
  • Charts and figures with context
  • Mixed document types (scanned + native text)
  • Structured markdown output, ready for LLM consumption
  • Handles edge cases that break local tools
  • Per-page SLA guarantees in production

Where it falls short:

  • Documents leave your machine (not suitable for sensitive data without a data processing agreement, or DPA)
  • Per-page pricing at scale
  • Requires API key and internet access

Decision Framework

Is the PDF native-text (generated by software)?
  ├── YES → liteparse (fast, free, local)
  └── NO (scanned / complex layout) →
        Is data sensitivity a concern?
          ├── YES → GLM-OCR (local VLM, no cloud)
          └── NO → LlamaParse (most accurate, handles edge cases)

Are you processing at scale (1000+ docs/day)?
  ├── liteparse for native-text (parallelizable, zero cost)
  ├── LlamaParse for complex (API rate limits apply)
  └── Self-hosted GLM-OCR for sensitive + complex
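The two trees above collapse into a small routing function. A minimal sketch, assuming boolean flags you would set per document (the field and tool names here are illustrative, not any tool's API):

```javascript
// Route a document to a parser following the decision framework above.
function chooseParser({ nativeText, sensitive }) {
  if (nativeText) return 'liteparse'; // fast, free, local
  // Scanned or complex layout from here on.
  if (sensitive) return 'glm-ocr';    // local VLM, nothing leaves the machine
  return 'llamaparse';                // most accurate, handles edge cases
}
```

Note the ordering: native-text short-circuits everything, so sensitivity only matters once a document actually needs heavyweight parsing.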

Tiered Agent Pipeline

The pattern used in production agent systems combines all three:

PDF arrives
  ↓
liteparse: attempt extraction
  ↓
Confidence check (text length, layout flags)
  ↓
Sufficient?     → use liteparse output
Not sufficient? → GLM-OCR (if on-prem required)
                → LlamaParse (if cloud allowed)
  ↓
Structured output → LLM context window

This keeps costs near-zero for the majority of documents (most PDFs have native text layers) while preserving accuracy for the minority that need it.
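In code, the tiered pattern is just a guarded fallback. The sketch below injects the parsers as plain async functions so the routing stays tool-agnostic; the 200-characters-per-page threshold and the layoutFlags field are illustrative assumptions, not values from any of the three tools.

```javascript
// Crude confidence check: a native-text page usually yields hundreds of
// characters, so very little text suggests a scan with no text layer.
function isSufficient(result, minCharsPerPage = 200) {
  const pages = Math.max(result.pageCount ?? 1, 1);
  const charsPerPage = (result.text ?? '').length / pages;
  return charsPerPage >= minCharsPerPage && !(result.layoutFlags?.length > 0);
}

// Tier 1: local extraction. Tier 2: GLM-OCR on-prem, or LlamaParse if cloud is allowed.
async function parseTiered(pdf, { liteparse, glmOcr, llamaParse, onPremOnly }) {
  const first = await liteparse(pdf);
  if (isSufficient(first)) return first;
  return onPremOnly ? glmOcr(pdf) : llamaParse(pdf);
}
```

Injecting the parsers also makes the fallback logic trivial to unit-test with stubs, before any real model or API key is involved.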


One More Thing: liteparse as an Agent Skill

liteparse ships with an official agent skill:

npx skills add run-llama/llamaparse-agent-skills --skill liteparse

The skill uses a SKILL.md format — the same spec used by OpenClaw skills. If you're building agents on OpenClaw, you can drop it straight in. This is the direction the LlamaIndex team is pushing: document parsing as a composable agent capability, not a preprocessing step you bolt on before the real work starts.


Bottom Line

Tool        Best For                                Setup         Cost      Privacy
liteparse   Native-text PDFs, agent pipelines       npm i         Free      100% local
GLM-OCR     Complex layouts, scanned docs, on-prem  Python + GPU  Free      100% local
LlamaParse  Production complex docs, max accuracy   API key       Per-page  Cloud

For most agent builders: start with liteparse. If your documents have complex layouts or low text-layer quality, reach for GLM-OCR before paying for cloud. Reserve LlamaParse for the cases where accuracy genuinely can't be compromised and data residency isn't a constraint.