# soul.py: Persistent Memory for LLM Agents in Python — Complete Guide
soul.py gives LLM agents persistent memory and identity across sessions — using plain markdown files, zero external dependencies, and a provider-agnostic design that works with any LLM.
It’s available via `pip install soul-agent` and is open source at github.com/menonpg/soul.py.
## The Problem: LLM Agents Have No Memory
Every time you start a new session with an LLM agent, it wakes up blank. It doesn’t remember your preferences, your project context, the decisions you made last week, or anything you told it before.
This “goldfish memory” problem is the single biggest practical limitation of LLM agents for real-world use. You end up re-explaining context in every conversation, and the agent can’t build on past work.
soul.py solves this with a simple principle: files are memory.
## How soul.py Works
soul.py uses a layered file-based memory architecture:
| File | Purpose |
|---|---|
| `SOUL.md` | Agent identity — persona, tone, values |
| `MEMORY.md` | Curated long-term knowledge (like a human’s long-term memory) |
| `memory/YYYY-MM-DD.md` | Daily session logs (raw notes) |
At its core (v0.1), soul.py reads these files and injects them into the LLM’s system prompt. No database, no server, no dependencies.
```python
from soul_agent import Soul

soul = Soul()
system_prompt = soul.build_system_prompt()
# Returns SOUL.md + MEMORY.md content, formatted for injection
```
The agent reads its own memory files at the start of each session and writes updates back to them as it works. Memory persists indefinitely, survives restarts, and is fully human-readable.
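Because memory is just files, the write path is ordinary file I/O. The exact write API isn’t shown in this guide, but a minimal illustrative helper (not part of soul.py itself) that appends a note to today’s daily log might look like this:

```python
from datetime import date
from pathlib import Path

def append_daily_note(note: str, memory_dir: str = "memory") -> Path:
    """Append a bullet to today's session log (memory/YYYY-MM-DD.md)."""
    log_dir = Path(memory_dir)
    log_dir.mkdir(exist_ok=True)
    log_file = log_dir / f"{date.today().isoformat()}.md"
    with log_file.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")
    return log_file

log = append_daily_note("Decided to use SQLite for the cache layer")
print(log.read_text(encoding="utf-8"))
```

Because the log is append-only markdown, every note survives restarts and stays diffable in git.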
## Installation
```bash
# Core (zero dependencies)
pip install soul-agent

# With specific providers (quoted so zsh doesn't expand the brackets)
pip install "soul-agent[anthropic]"  # Claude support
pip install "soul-agent[openai]"     # OpenAI/GPT support

# Initialize memory files in the current directory
soul init
```
This creates:
```
SOUL.md     ← Edit this to define the agent's identity
MEMORY.md   ← Agent writes curated memories here
memory/     ← Daily session logs go here
```
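A hypothetical `SOUL.md` (the file contents are entirely up to you — it’s plain markdown) might look like:

```markdown
# SOUL.md — Agent Identity

## Persona
You are Ada, a pragmatic research assistant for a small engineering team.

## Tone
Concise and direct. No filler.

## Values
- Prefer citing memory over guessing.
- Flag uncertainty explicitly.
- Record important decisions in MEMORY.md.
```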
## Three Versions: Pick Your Complexity

### v0.1 — Markdown Injection (Zero Dependencies)
The simplest possible implementation. soul.py reads SOUL.md + MEMORY.md and injects them as the system prompt.
```python
from soul_agent import Soul

soul = Soul()
messages = [
    {"role": "system", "content": soul.build_system_prompt()},
    {"role": "user", "content": "What are we working on today?"},
]
```
**Best for:** Personal assistants, simple agents, offline/local use.
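Under the hood, v0.1 is little more than file concatenation. A rough sketch of the idea (illustrative only — not the actual soul.py source):

```python
import tempfile
from pathlib import Path

def build_system_prompt(root: str) -> str:
    """Concatenate SOUL.md and MEMORY.md into one system prompt
    (rough sketch of soul.py v0.1 behavior)."""
    parts = []
    for name in ("SOUL.md", "MEMORY.md"):
        path = Path(root) / name
        if path.exists():
            parts.append(f"## {name}\n\n{path.read_text(encoding='utf-8').strip()}")
    return "\n\n".join(parts)

# Demo with throwaway files
root = tempfile.mkdtemp()
Path(root, "SOUL.md").write_text("You are Ada, a concise assistant.")
Path(root, "MEMORY.md").write_text("Project X uses PostgreSQL.")
print(build_system_prompt(root))
```

Identity always precedes long-term memory in the prompt, so the agent’s persona frames how it interprets everything that follows.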
### v1.0 — RAG with Vector Search
Adds semantic retrieval — the agent can search across all its memory files using embeddings, not just load everything.
```python
from soul_agent import HybridAgent

agent = HybridAgent(mode="rag")
result = agent.ask("What did we decide about the database schema last month?")
print(result["answer"])
```
**Best for:** Agents with large memory files (100KB+), research assistants, long-running projects.

**Requires:** Qdrant (local or cloud), Azure/OpenAI embeddings.
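The indexing side boils down to three steps: chunk the markdown files, embed each chunk, and store the vectors in Qdrant. A provider-free sketch of just the chunking step (paragraph-aligned chunks; soul.py’s actual chunking strategy may differ):

```python
def chunk_markdown(text: str, max_chars: int = 800) -> list[str]:
    """Split markdown into paragraph-aligned chunks of at most max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = "# Notes\n\n" + "\n\n".join(f"Paragraph {i} about the schema." for i in range(10))
print(len(chunk_markdown(doc, max_chars=120)))  # → 4
```

Splitting on paragraph boundaries keeps each chunk a coherent unit of meaning, which matters more for retrieval quality than exact chunk size.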
### v2.0 — RAG + RLM Hybrid
Adds a query router that automatically selects the right retrieval strategy per query:
- ~90% of queries → RAG (fast, focused retrieval)
- ~10% of queries → RLM (exhaustive synthesis across all memory)
```python
from soul_agent import HybridAgent

agent = HybridAgent()  # v2.0 default
result = agent.ask("Summarize everything you know about Project X")
print(result["answer"])
print(result["route"])         # "RAG" or "RLM"
print(result["router_ms"])     # routing latency
print(result["retrieval_ms"])  # retrieval latency
```
The router uses a lightweight classifier trained on query patterns. Synthesis queries (“summarize everything about…”) go to RLM; factual lookups go to RAG.
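The trained classifier itself isn’t shown in this guide, but the routing decision can be approximated with a keyword heuristic — a toy stand-in, not soul.py’s actual router:

```python
# Markers that suggest a broad synthesis query (hypothetical list)
SYNTHESIS_MARKERS = ("summarize", "everything", "overall", "all you know", "across")

def route_query(query: str) -> str:
    """Return 'RLM' for broad synthesis queries, 'RAG' for factual lookups."""
    q = query.lower()
    return "RLM" if any(marker in q for marker in SYNTHESIS_MARKERS) else "RAG"

print(route_query("Summarize everything you know about Project X"))  # → RLM
print(route_query("What port does the staging server use?"))         # → RAG
```

A real router earns its keep by keeping the common case fast: routing costs a few milliseconds, while sending a narrow factual query through exhaustive RLM synthesis would cost seconds.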
## Real-World Usage: Monica the AI Agent
The clearest proof of soul.py’s effectiveness is Monica — our own AI agent running 24/7 on Railway, controlled via Telegram.
Monica uses soul.py v2.0 as her memory layer. Her MEMORY.md contains 700+ lines of curated knowledge about ongoing projects, preferences, technical decisions, and lessons learned. Her daily notes capture session context.
When Monica starts a new session after a restart or context compaction, she reads her memory files and is immediately back up to speed — same personality, same context, no re-explaining required.
## Integration with Popular Frameworks

### CrewAI
```python
from crewai import Agent
from crewai_soul import SoulMateMemory

memory = SoulMateMemory(soul_path="./SOUL.md")
agent = Agent(
    role="Research Assistant",
    memory=True,
    memory_store=memory,
)
```
### LangChain
```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_soul import SoulChatMessageHistory

history = SoulChatMessageHistory(session_id="my-session")
chain = ConversationChain(memory=ConversationBufferMemory(chat_memory=history))
```
### LlamaIndex
```python
from llama_index.core.memory import ChatMemoryBuffer
from llamaindex_soul import SoulChatStore

chat_store = SoulChatStore()
memory = ChatMemoryBuffer.from_defaults(chat_store=chat_store, token_limit=3000)
```
## Why soul.py Instead of MemGPT, LangChain Memory, or mem0?
| Feature | soul.py | MemGPT | LangChain Memory | mem0 |
|---|---|---|---|---|
| Zero dependencies | ✅ (v0.1) | ❌ | ❌ | ❌ |
| Human-readable memory | ✅ | ❌ | ❌ | ❌ |
| Cross-session persistence | ✅ | ✅ | ❌ | ✅ |
| Provider-agnostic | ✅ | ✅ | ✅ | ✅ |
| No server required | ✅ | ❌ | ✅ | ❌ |
| Git-trackable memory | ✅ | ❌ | ❌ | ❌ |
The key differentiator: soul.py memory lives in files you can read, edit, and version-control with git. The agent’s memory is transparent and auditable — not buried in a vector database.
## Getting Started in 5 Minutes
```bash
# 1. Install
pip install soul-agent

# 2. Initialize memory files
soul init

# 3. Edit SOUL.md to define your agent's identity
#    (just a markdown file — write in plain English)
```

```python
# 4. Use in your agent
from soul_agent import Soul

soul = Soul()
system_prompt = soul.build_system_prompt()
```
That’s it. Your agent now has persistent memory.
Links:
- PyPI: pypi.org/project/soul-agent
- GitHub: github.com/menonpg/soul.py
- Docs: soul.themenonlab.com