soul.py v0.2.0: Modulizer – 50% Token Savings with Zero Infrastructure

By Prahlad Menon

📚 Part of the soul.py ecosystem – see the original soul.py post for the full story on persistent AI memory.

The Problem: MEMORY.md Gets Big

soul.py stores memories in plain markdown. It’s human-readable, git-versionable, and works everywhere. But as your agent accumulates knowledge, MEMORY.md grows: 10KB, 25KB, 50KB.

Every conversation loads the full file into context. At 25KB, that’s ~6,000 tokens before you even ask a question. Most of it is irrelevant to the current query.

The old approach:

Query: "What tools have I used?"
→ Load full MEMORY.md (25KB, ~6,000 tokens)
→ 94% of tokens wasted on irrelevant context
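The ~6,000-token figure follows from the common rule of thumb of roughly four characters per token. A quick back-of-the-envelope sketch (the 4 chars/token ratio is an approximation, not anything from soul.py):

```python
# Back-of-the-envelope token cost of loading a memory file whole.
# Assumes ~4 characters per token, a common rule of thumb for English
# text under GPT-style tokenizers (real counts vary by model).

def estimate_tokens(size_kb: float, chars_per_token: float = 4.0) -> int:
    return int(size_kb * 1024 / chars_per_token)

print(estimate_tokens(25))  # prints 6400, in line with the ~6,000 figure above
```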

The Fix: Modulizer

soul.py v0.2.0 introduces Modulizer, a zero-dependency memory segmentation system inspired by progressive-memory patterns.

pip install --upgrade soul-agent
soul modulize MEMORY.md

This creates:

modules/
├── INDEX.md        (1.7KB, table of contents)
├── projects.md     (6KB)
├── tools.md        (2KB)
├── procedures.md   (5KB)
├── learnings.md    (2KB)
└── reference.md    (9KB)
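INDEX.md is the table of contents the agent scans first. A hypothetical example of what it might contain (the module names and sizes match the tree above; the summaries and exact format are invented for illustration and may differ from what soul.py generates):

```markdown
# Memory Index

- projects.md (6KB): active and completed projects, with current status
- tools.md (2KB): CLIs, libraries, and services in regular use
- procedures.md (5KB): step-by-step workflows and runbooks
- learnings.md (2KB): lessons learned, gotchas, postmortem notes
- reference.md (9KB): long-form reference material and snippets
```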

The new approach:

Query: "What tools have I used?"
→ Read INDEX.md (1.7KB)
→ LLM picks: tools.md, projects.md
→ Read only those (8KB instead of 25KB)
→ ~47% token savings, averaged across queries

How It Works

Phase 1: Modulization (One-Time)

soul modulize MEMORY.md --output ./modules/

Behind the scenes:

  1. Chunker: splits markdown by headers
  2. Classifier: an LLM categorizes each chunk (projects, tools, people, decisions, etc.)
  3. Splitter: groups chunks into modules
  4. Indexer: generates INDEX.md with summaries
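The chunking step can be sketched in a few lines of Python. This is an illustrative re-implementation, not soul.py’s internals, and it simplifies header-level handling:

```python
import re

# Illustrative chunker: split a markdown document into header-delimited
# chunks. soul.py's real chunker likely handles nesting and edge cases
# this sketch ignores.

def chunk_by_headers(markdown: str) -> list[dict]:
    chunks = []
    current = {"header": None, "lines": []}
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line):              # markdown header line
            if current["header"] or current["lines"]:
                chunks.append(current)                # close the previous chunk
            current = {"header": line.lstrip("# ").strip(), "lines": []}
        else:
            current["lines"].append(line)
    chunks.append(current)
    # drop an empty preamble chunk (no header, no text)
    return [c for c in chunks if c["header"] or any(s.strip() for s in c["lines"])]

doc = "# Tools\nI use ripgrep.\n# Projects\nsoul.py itself.\n"
for chunk in chunk_by_headers(doc):
    print(chunk["header"], "->", " ".join(chunk["lines"]).strip())
# prints:
# Tools -> I use ripgrep.
# Projects -> soul.py itself.
```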

Phase 2: Two-Phase Retrieval (Every Query)

from soul import Agent

agent = Agent(use_modules=True)  # default
response = agent.ask("What tools have I used?")

stats = agent.get_memory_stats()
# {
#   'mode': 'modules',
#   'modules_read': ['tools.md', 'projects.md'],
#   'total_kb': 8.5,
#   'index_kb': 1.7
# }

The agent:

  1. Reads INDEX.md (always small)
  2. Asks the LLM: “Which modules are relevant to this query?”
  3. Reads only the selected modules
  4. Answers with full context, fewer tokens
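The steps above can be sketched as a minimal retrieval loop. Here `select_modules` stands in for the LLM relevance call, stubbed with naive keyword overlap, and the index and module contents are invented for illustration; none of this is soul.py’s actual implementation:

```python
# Sketch of two-phase retrieval. select_modules() stands in for the
# LLM call ("Which modules are relevant to this query?"); here it is
# stubbed with naive keyword overlap purely for illustration.

def select_modules(index: dict[str, str], query: str) -> list[str]:
    """Pick modules whose index summary shares words with the query."""
    query_words = set(query.lower().split())
    return [name for name, summary in index.items()
            if query_words & set(summary.lower().split())]

def two_phase_retrieve(index: dict[str, str],
                       modules: dict[str, str],
                       query: str) -> str:
    """Phase 1: scan the small index. Phase 2: load only chosen modules."""
    chosen = select_modules(index, query)
    return "\n\n".join(modules[name] for name in chosen)

# Invented example data, mirroring the modules/ layout above.
index = {
    "tools.md": "command-line tools and libraries in daily use",
    "projects.md": "active projects and their status",
    "learnings.md": "lessons learned and gotchas",
}
modules = {
    "tools.md": "ripgrep, fzf, uv",
    "projects.md": "soul.py: persistent agent memory",
    "learnings.md": "always pin dependencies",
}
print(two_phase_retrieve(index, modules, "what tools do I use"))  # prints: ripgrep, fzf, uv
```

In soul.py the selected module text plus the query would go to the LLM; the point of the sketch is that only the chosen files ever enter the context window.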

CLI Integration

# Modulize your memory
soul modulize MEMORY.md

# Auto-detect large files (>50KB)
soul modulize --auto

# Chat with modules (automatic)
soul chat

# Chat without modules
soul chat --no-modules

# View module stats during chat
> /modules
πŸ“ Modules in ./modules/:
   πŸ“‘ INDEX.md (1.7KB)
   πŸ“„ projects.md (6.0KB)
   πŸ“„ tools.md (2.0KB)
   ...
   Last query used: tools.md, projects.md

Real-World Results

Tested on a 25KB MEMORY.md with 44 sections:

| Metric | Before | After | Savings |
| --- | --- | --- | --- |
| Memory read per query | 25KB | 13KB avg | 47% |
| INDEX size | N/A | 1.7KB | – |
| Module count | 1 file | 6 files | – |
| Categories | – | 7 unique | – |

For larger memory files (100KB+), expect 60-80% savings.

Zero Infrastructure

Unlike RAG, Modulizer needs:

  • ❌ No vector database (Qdrant, Pinecone, etc.)
  • ❌ No embedding model or API
  • ❌ No background services

Just your existing LLM provider. Works with Anthropic, OpenAI, Gemini, and Ollama.

When to Use Modulizer vs RAG

| Use Case | Modulizer | RAG (v2.0) |
| --- | --- | --- |
| Memory size | 10-100KB | 100KB+ |
| Infrastructure | None | Qdrant + embeddings |
| Query style | Category-based | Semantic search |
| Setup time | 1 command | ~30 min |
| Offline/airgapped | ✅ Yes | Depends |

Modulizer is the “right tool” for solo agents with moderate memory. It’s the 90% solution with 0% infrastructure.

RAG is better when you need true semantic search across massive knowledge bases.

Backwards Compatible

If you don’t run soul modulize, everything works exactly as before:

  • No modules? → Full MEMORY.md is loaded
  • --no-modules flag? → Full MEMORY.md is loaded
  • Modules exist? → Two-phase retrieval kicks in automatically

Your existing workflows are unaffected.

Try It Now

pip install --upgrade soul-agent

# If you have an existing memory file
soul modulize MEMORY.md

# Start chatting
soul chat

Check /modules during chat to see which modules are being used.


Modulizer was inspired by the progressive-memory pattern: scan an index first, fetch details on demand. We brought this to soul.py’s file-based memory system.