soul.py v0.2.0: Modulizer – 50% Token Savings with Zero Infrastructure

By Prahlad Menon

📚 Part of the soul.py ecosystem – see the original soul.py post for the full story on persistent AI memory.

The Problem: MEMORY.md Gets Big

soul.py stores memories in plain markdown. It’s human-readable, git-versionable, and works everywhere. But as your agent accumulates knowledge, MEMORY.md grows: 10KB, 25KB, 50KB.

Every conversation loads the full file into context. At 25KB, that’s ~6,000 tokens before you even ask a question. Most of it is irrelevant to the current query.

The old approach:

Query: "What tools have I used?"
→ Load full MEMORY.md (25KB, ~6,000 tokens)
→ 94% of tokens wasted on irrelevant context
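The ~6,000-token figure follows from the common rule of thumb of roughly four characters per token. A quick back-of-the-envelope sketch (the 4 chars/token ratio is an approximation, not anything from soul.py):

```python
# Back-of-the-envelope token cost of loading a memory file whole.
# Assumes ~4 characters per token, a common rule of thumb for English
# text under GPT-style tokenizers (real counts vary by model).

def estimate_tokens(size_kb: float, chars_per_token: float = 4.0) -> int:
    return int(size_kb * 1024 / chars_per_token)

print(estimate_tokens(25))  # prints 6400, in line with the ~6,000 figure above
```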

The Fix: Modulizer

soul.py v0.2.0 introduces Modulizer, a zero-dependency memory segmentation system inspired by progressive-memory patterns.

pip install --upgrade soul-agent
soul modulize MEMORY.md

This creates:

modules/
├── INDEX.md        (1.7KB, table of contents)
├── projects.md     (6KB)
├── tools.md        (2KB)
├── procedures.md   (5KB)
├── learnings.md    (2KB)
└── reference.md    (9KB)
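INDEX.md is the table of contents the agent scans first. A hypothetical example of what it might contain (the module names and sizes match the tree above; the summaries and exact format are invented for illustration and may differ from what soul.py generates):

```markdown
# Memory Index

- projects.md (6KB): active and completed projects, with current status
- tools.md (2KB): CLIs, libraries, and services in regular use
- procedures.md (5KB): step-by-step workflows and runbooks
- learnings.md (2KB): lessons learned, gotchas, postmortem notes
- reference.md (9KB): long-form reference material and snippets
```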

The new approach:

Query: "What tools have I used?"
→ Read INDEX.md (1.7KB)
→ LLM picks: tools.md, projects.md
→ Read only those (8KB instead of 25KB)
→ ~47% token savings, averaged across queries

How It Works

Phase 1: Modulization (One-Time)

soul modulize MEMORY.md --output ./modules/

Behind the scenes:

  1. Chunker: splits markdown by headers
  2. Classifier: an LLM categorizes each chunk (projects, tools, people, decisions, etc.)
  3. Splitter: groups chunks into modules
  4. Indexer: generates INDEX.md with summaries
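The chunking step can be sketched in a few lines of Python. This is an illustrative re-implementation, not soul.py’s internals, and it simplifies header-level handling:

```python
import re

# Illustrative chunker: split a markdown document into header-delimited
# chunks. soul.py's real chunker likely handles nesting and edge cases
# this sketch ignores.

def chunk_by_headers(markdown: str) -> list[dict]:
    chunks = []
    current = {"header": None, "lines": []}
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line):              # markdown header line
            if current["header"] or current["lines"]:
                chunks.append(current)                # close the previous chunk
            current = {"header": line.lstrip("# ").strip(), "lines": []}
        else:
            current["lines"].append(line)
    chunks.append(current)
    # drop an empty preamble chunk (no header, no text)
    return [c for c in chunks if c["header"] or any(s.strip() for s in c["lines"])]

doc = "# Tools\nI use ripgrep.\n# Projects\nsoul.py itself.\n"
for chunk in chunk_by_headers(doc):
    print(chunk["header"], "->", " ".join(chunk["lines"]).strip())
# prints:
# Tools -> I use ripgrep.
# Projects -> soul.py itself.
```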

Phase 2: Two-Phase Retrieval (Every Query)

from soul import Agent

agent = Agent(use_modules=True)  # default
response = agent.ask("What tools have I used?")

stats = agent.get_memory_stats()
# {
#   'mode': 'modules',
#   'modules_read': ['tools.md', 'projects.md'],
#   'total_kb': 8.5,
#   'index_kb': 1.7
# }

The agent:

  1. Reads INDEX.md (always small)
  2. Asks the LLM: “Which modules are relevant to this query?”
  3. Reads only the selected modules
  4. Answers with full context, fewer tokens
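The steps above can be sketched as a minimal retrieval loop. Here `select_modules` stands in for the LLM relevance call, stubbed with naive keyword overlap, and the index and module contents are invented for illustration; none of this is soul.py’s actual implementation:

```python
# Sketch of two-phase retrieval. select_modules() stands in for the
# LLM call ("Which modules are relevant to this query?"); here it is
# stubbed with naive keyword overlap purely for illustration.

def select_modules(index: dict[str, str], query: str) -> list[str]:
    """Pick modules whose index summary shares words with the query."""
    query_words = set(query.lower().split())
    return [name for name, summary in index.items()
            if query_words & set(summary.lower().split())]

def two_phase_retrieve(index: dict[str, str],
                       modules: dict[str, str],
                       query: str) -> str:
    """Phase 1: scan the small index. Phase 2: load only chosen modules."""
    chosen = select_modules(index, query)
    return "\n\n".join(modules[name] for name in chosen)

# Invented example data, mirroring the modules/ layout above.
index = {
    "tools.md": "command-line tools and libraries in daily use",
    "projects.md": "active projects and their status",
    "learnings.md": "lessons learned and gotchas",
}
modules = {
    "tools.md": "ripgrep, fzf, uv",
    "projects.md": "soul.py: persistent agent memory",
    "learnings.md": "always pin dependencies",
}
print(two_phase_retrieve(index, modules, "what tools do I use"))  # prints: ripgrep, fzf, uv
```

In soul.py the selected module text plus the query would go to the LLM; the point of the sketch is that only the chosen files ever enter the context window.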

CLI Integration

# Modulize your memory
soul modulize MEMORY.md

# Auto-detect large files (>50KB)
soul modulize --auto

# Chat with modules (automatic)
soul chat

# Chat without modules
soul chat --no-modules

# View module stats during chat
> /modules
πŸ“ Modules in ./modules/:
   πŸ“‘ INDEX.md (1.7KB)
   πŸ“„ projects.md (6.0KB)
   πŸ“„ tools.md (2.0KB)
   ...
   Last query used: tools.md, projects.md

Real-World Results

Tested on a 25KB MEMORY.md with 44 sections:

| Metric | Before | After | Savings |
| --- | --- | --- | --- |
| Memory read per query | 25KB | 13KB avg | 47% |
| INDEX size | N/A | 1.7KB | – |
| Module count | 1 file | 6 files | – |
| Categories | – | 7 unique | – |

For larger memory files (100KB+), expect 60-80% savings.

Zero Infrastructure

Unlike RAG, Modulizer needs:

  • ❌ No vector database (Qdrant, Pinecone, etc.)
  • ❌ No embedding model or API
  • ❌ No background services

Just your existing LLM provider. Works with Anthropic, OpenAI, Gemini, and Ollama.

When to Use Modulizer vs RAG

| Use Case | Modulizer | RAG (v2.0) |
| --- | --- | --- |
| Memory size | 10-100KB | 100KB+ |
| Infrastructure | None | Qdrant + embeddings |
| Query style | Category-based | Semantic search |
| Setup time | 1 command | ~30 min |
| Offline/airgapped | ✅ Yes | Depends |

Modulizer is the “right tool” for solo agents with moderate memory. It’s the 90% solution with 0% infrastructure.

RAG is better when you need true semantic search across massive knowledge bases.

Backwards Compatible

If you don’t run soul modulize, everything works exactly as before:

  • No modules? → Full MEMORY.md is loaded
  • --no-modules flag? → Full MEMORY.md is loaded
  • Modules exist? → Two-phase retrieval kicks in automatically

Your existing workflows are unaffected.

Try It Now

pip install --upgrade soul-agent

# If you have an existing memory file
soul modulize MEMORY.md

# Start chatting
soul chat

Check /modules during chat to see which modules are being used.


Modulizer was inspired by the progressive-memory pattern: scan an index first, fetch details on demand. We brought this to soul.py’s file-based memory system.