soul.py: Persistent Memory for LLM Agents in Python — Complete Guide

By Prahlad Menon

soul.py gives LLM agents persistent memory and identity across sessions — using plain markdown files, zero external dependencies, and a provider-agnostic design that works with any LLM.

It’s available as pip install soul-agent and is open source at github.com/menonpg/soul.py.


The Problem: LLM Agents Have No Memory

Every time you start a new session with an LLM agent, it wakes up blank. It doesn’t remember your preferences, your project context, the decisions you made last week, or anything you told it before.

This “goldfish memory” problem is the single biggest practical limitation of LLM agents for real-world use. You end up re-explaining context in every conversation, and the agent can’t build on past work.

soul.py solves this with a simple principle: files are memory.


How soul.py Works

soul.py uses a layered file-based memory architecture:

File                  Purpose
SOUL.md               Agent identity — persona, tone, values
MEMORY.md             Curated long-term knowledge (like a human’s long-term memory)
memory/YYYY-MM-DD.md  Daily session logs (raw notes)

At its core (v0.1), soul.py reads these files and injects them into the LLM’s system prompt. No database, no server, no dependencies.

from soul_agent import Soul

soul = Soul()
system_prompt = soul.build_system_prompt()
# Returns: SOUL.md + MEMORY.md content, formatted for injection

The agent reads its own memory files at the start of each session and writes updates back to them as it works. Memory persists indefinitely, survives restarts, and is fully human-readable.
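The write-back side is plain file I/O. A minimal sketch of appending a note to today's daily log (a hypothetical helper, not part of the soul_agent API, using the memory/YYYY-MM-DD.md convention described above):

```python
from datetime import date
from pathlib import Path

def append_daily_note(note: str, memory_dir: str = "memory") -> Path:
    """Append one bullet to today's session log, creating it if needed."""
    log = Path(memory_dir) / f"{date.today().isoformat()}.md"
    log.parent.mkdir(exist_ok=True)  # ensure memory/ exists
    with log.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")
    return log
```

Because the log is plain markdown, the same entry stays readable by both the agent and a human reviewer.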


Installation

# Core (zero dependencies)
pip install soul-agent

# With specific providers
pip install soul-agent[anthropic]   # Claude support
pip install soul-agent[openai]      # OpenAI/GPT support

# Initialize memory files in current directory
soul init

This creates:

SOUL.md       ← Edit this to define the agent's identity
MEMORY.md     ← Agent writes curated memories here
memory/       ← Daily session logs go here
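If you prefer not to shell out to the CLI, the same scaffold can be created in a few lines of Python. This is a hypothetical equivalent of soul init (the file names come from the article; the starter headers are illustrative):

```python
from pathlib import Path

def init_soul(root: str = ".") -> None:
    """Create the SOUL.md / MEMORY.md / memory/ scaffold if absent."""
    base = Path(root)
    (base / "memory").mkdir(exist_ok=True)  # daily session logs live here
    starters = {
        "SOUL.md": "# Identity\n",        # placeholder header, not soul.py's template
        "MEMORY.md": "# Long-Term Memory\n",
    }
    for name, header in starters.items():
        f = base / name
        if not f.exists():  # never overwrite existing memory
            f.write_text(header, encoding="utf-8")
```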

Three Versions: Pick Your Complexity

v0.1 — Markdown Injection (Zero Dependencies)

The simplest possible implementation. soul.py reads SOUL.md + MEMORY.md and injects them as the system prompt.
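Conceptually, the injection step amounts to concatenating the two files into one prompt. The sketch below is a simplified stand-in for illustration, not soul.py's actual implementation:

```python
from pathlib import Path

def build_prompt_sketch(root: str = ".") -> str:
    """Concatenate SOUL.md and MEMORY.md into a single system prompt."""
    parts = []
    for name in ("SOUL.md", "MEMORY.md"):
        p = Path(root) / name
        if p.exists():
            # Label each section so the model knows which file it came from.
            parts.append(f"## {name}\n\n{p.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```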

from soul_agent import Soul

soul = Soul()
messages = [
    {"role": "system", "content": soul.build_system_prompt()},
    {"role": "user", "content": "What are we working on today?"}
]

Best for: Personal assistants, simple agents, offline/local use.

v1.0 — RAG (Semantic Retrieval)

Adds semantic retrieval — the agent can search across all its memory files using embeddings instead of loading everything into the prompt.

from soul_agent import HybridAgent

agent = HybridAgent(mode="rag")
result = agent.ask("What did we decide about the database schema last month?")
print(result["answer"])

Best for: Agents with large memory files (100KB+), research assistants, long-running projects.
Requires: Qdrant (local or cloud), Azure/OpenAI embeddings.
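Under the hood, semantic retrieval ranks memory chunks by embedding similarity. Here is a toy illustration with hand-rolled cosine similarity (illustrative only; v1.0 itself uses Qdrant and real embedding models, not these two-dimensional vectors):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k memory chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```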

v2.0 — RAG + RLM Hybrid

Adds a query router that automatically selects the right retrieval strategy per query:

  • ~90% of queries → RAG (fast, focused retrieval)
  • ~10% of queries → RLM (exhaustive synthesis across all memory)

from soul_agent import HybridAgent

agent = HybridAgent()  # v2.0 default
result = agent.ask("Summarize everything you know about Project X")
print(result["answer"])
print(result["route"])       # "RAG" or "RLM"
print(result["router_ms"])   # routing latency
print(result["retrieval_ms"]) # retrieval latency

The router uses a lightweight classifier trained on query patterns. Synthesis queries (“summarize everything about…”) go to RLM; factual lookups go to RAG.
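The trained classifier itself isn't published here, but a keyword heuristic conveys the routing idea (an illustrative stand-in, not soul.py's actual router):

```python
# Cues that suggest a synthesis query spanning all memory (hypothetical list).
SYNTHESIS_CUES = ("summarize", "everything", "overall", "all you know")

def route(query: str) -> str:
    """Send synthesis-style queries to RLM, factual lookups to RAG."""
    q = query.lower()
    return "RLM" if any(cue in q for cue in SYNTHESIS_CUES) else "RAG"
```

A real router would also weigh query length and retrieval confidence, but the 90/10 split above means even a cheap classifier pays for itself in latency.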


Real-World Usage: Monica the AI Agent

The clearest proof of soul.py’s effectiveness is Monica — our own AI agent running 24/7 on Railway, controlled via Telegram.

Monica uses soul.py v2.0 as her memory layer. Her MEMORY.md contains 700+ lines of curated knowledge about ongoing projects, preferences, technical decisions, and lessons learned. Her daily notes capture session context.

When Monica starts a new session after a restart or context compaction, she reads her memory files and is immediately back up to speed — same personality, same context, no re-explaining required.


Framework Integrations

CrewAI

from crewai import Agent
from crewai_soul import SoulMateMemory

memory = SoulMateMemory(soul_path="./SOUL.md")

agent = Agent(
    role="Research Assistant",
    memory=True,
    memory_store=memory
)

LangChain

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_soul import SoulChatMessageHistory

history = SoulChatMessageHistory(session_id="my-session")
chain = ConversationChain(memory=ConversationBufferMemory(chat_memory=history))

LlamaIndex

from llama_index.core.memory import ChatMemoryBuffer
from llamaindex_soul import SoulChatStore

chat_store = SoulChatStore()
memory = ChatMemoryBuffer.from_defaults(chat_store=chat_store, token_limit=3000)

Why soul.py Instead of MemGPT, LangChain Memory, or mem0?

Compared with MemGPT, LangChain Memory, and mem0, soul.py offers:

  • Zero dependencies ✅ (v0.1)
  • Human-readable memory
  • Cross-session persistence
  • Provider-agnostic design
  • No server required
  • Git-trackable memory

The key differentiator: soul.py memory lives in files you can read, edit, and version-control with git. The agent’s memory is transparent and auditable — not buried in a vector database.


Getting Started in 5 Minutes

# 1. Install
pip install soul-agent

# 2. Initialize memory files
soul init

# 3. Edit SOUL.md to define your agent's identity
# (just a markdown file — write in plain English)

# 4. Use in your agent
from soul_agent import Soul
soul = Soul()
system_prompt = soul.build_system_prompt()

That’s it. Your agent now has persistent memory.


Links: