soul.py: Persistent Memory for LLM Agents in Python — Complete Guide

By Prahlad Menon

soul.py gives LLM agents persistent memory and identity across sessions — using plain markdown files, zero external dependencies, and a provider-agnostic design that works with any LLM.

It’s available as pip install soul-agent and is open source at github.com/menonpg/soul.py.


The Problem: LLM Agents Have No Memory

Every time you start a new session with an LLM agent, it wakes up blank. It doesn’t remember your preferences, your project context, the decisions you made last week, or anything you told it before.

This “goldfish memory” problem is the single biggest practical limitation of LLM agents for real-world use. You end up re-explaining context in every conversation, and the agent can’t build on past work.

soul.py solves this with a simple principle: files are memory.


How soul.py Works

soul.py uses a layered file-based memory architecture:

File                  Purpose
SOUL.md               Agent identity — persona, tone, values
MEMORY.md             Curated long-term knowledge (like a human’s long-term memory)
memory/YYYY-MM-DD.md  Daily session logs (raw notes)

At its core (v0.1), soul.py reads these files and injects them into the LLM’s system prompt. No database, no server, no dependencies.

from soul_agent import Soul

soul = Soul()
system_prompt = soul.build_system_prompt()
# Returns: SOUL.md + MEMORY.md content, formatted for injection

The agent reads its own memory files at the start of each session and writes updates back to them as it works. Memory persists indefinitely, survives restarts, and is fully human-readable.
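The write-back side is plain file I/O. A minimal sketch of appending a note to today's daily log (a hypothetical helper, not part of the soul_agent API, using the memory/YYYY-MM-DD.md convention described above):

```python
from datetime import date
from pathlib import Path

def append_daily_note(note: str, memory_dir: str = "memory") -> Path:
    """Append one bullet to today's session log, creating it if needed."""
    log = Path(memory_dir) / f"{date.today().isoformat()}.md"
    log.parent.mkdir(exist_ok=True)  # ensure memory/ exists
    with log.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")
    return log
```

Because the log is plain markdown, the same entry stays readable by both the agent and a human reviewer.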


Installation

# Core (zero dependencies)
pip install soul-agent

# With specific providers
pip install soul-agent[anthropic]   # Claude support
pip install soul-agent[openai]      # OpenAI/GPT support

# Initialize memory files in current directory
soul init

This creates:

SOUL.md       ← Edit this to define the agent's identity
MEMORY.md     ← Agent writes curated memories here
memory/       ← Daily session logs go here
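If you prefer not to shell out to the CLI, the same scaffold can be created in a few lines of Python. This is a hypothetical equivalent of soul init (the file names come from the article; the starter headers are illustrative):

```python
from pathlib import Path

def init_soul(root: str = ".") -> None:
    """Create the SOUL.md / MEMORY.md / memory/ scaffold if absent."""
    base = Path(root)
    (base / "memory").mkdir(exist_ok=True)  # daily session logs live here
    starters = {
        "SOUL.md": "# Identity\n",        # placeholder header, not soul.py's template
        "MEMORY.md": "# Long-Term Memory\n",
    }
    for name, header in starters.items():
        f = base / name
        if not f.exists():  # never overwrite existing memory
            f.write_text(header, encoding="utf-8")
```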

Three Versions: Pick Your Complexity

v0.1 — Markdown Injection (Zero Dependencies)

The simplest possible implementation. soul.py reads SOUL.md + MEMORY.md and injects them as the system prompt.
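Conceptually, the injection step amounts to concatenating the two files into one prompt. The sketch below is a simplified stand-in for illustration, not soul.py's actual implementation:

```python
from pathlib import Path

def build_prompt_sketch(root: str = ".") -> str:
    """Concatenate SOUL.md and MEMORY.md into a single system prompt."""
    parts = []
    for name in ("SOUL.md", "MEMORY.md"):
        p = Path(root) / name
        if p.exists():
            # Label each section so the model knows which file it came from.
            parts.append(f"## {name}\n\n{p.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```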

from soul_agent import Soul

soul = Soul()
messages = [
    {"role": "system", "content": soul.build_system_prompt()},
    {"role": "user", "content": "What are we working on today?"}
]

Best for: Personal assistants, simple agents, offline/local use.

v1.0 — RAG (Semantic Retrieval)

Adds semantic retrieval — the agent can search across all its memory files using embeddings instead of loading everything into the prompt.

from soul_agent import HybridAgent

agent = HybridAgent(mode="rag")
result = agent.ask("What did we decide about the database schema last month?")
print(result["answer"])

Best for: Agents with large memory files (100KB+), research assistants, long-running projects.
Requires: Qdrant (local or cloud), Azure/OpenAI embeddings.
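Under the hood, semantic retrieval ranks memory chunks by embedding similarity. Here is a toy illustration with hand-rolled cosine similarity (illustrative only; v1.0 itself uses Qdrant and real embedding models, not these two-dimensional vectors):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k memory chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```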

v2.0 — RAG + RLM Hybrid

Adds a query router that automatically selects the right retrieval strategy per query:

  • ~90% of queries → RAG (fast, focused retrieval)
  • ~10% of queries → RLM (exhaustive synthesis across all memory)

from soul_agent import HybridAgent

agent = HybridAgent()  # v2.0 default
result = agent.ask("Summarize everything you know about Project X")
print(result["answer"])
print(result["route"])       # "RAG" or "RLM"
print(result["router_ms"])   # routing latency
print(result["retrieval_ms"]) # retrieval latency

The router uses a lightweight classifier trained on query patterns. Synthesis queries (“summarize everything about…”) go to RLM; factual lookups go to RAG.
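The trained classifier itself isn't published here, but a keyword heuristic conveys the routing idea (an illustrative stand-in, not soul.py's actual router):

```python
# Cues that suggest a synthesis query spanning all memory (hypothetical list).
SYNTHESIS_CUES = ("summarize", "everything", "overall", "all you know")

def route(query: str) -> str:
    """Send synthesis-style queries to RLM, factual lookups to RAG."""
    q = query.lower()
    return "RLM" if any(cue in q for cue in SYNTHESIS_CUES) else "RAG"
```

A real router would also weigh query length and retrieval confidence, but the 90/10 split above means even a cheap classifier pays for itself in latency.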


Real-World Usage: Monica the AI Agent

The clearest proof of soul.py’s effectiveness is Monica — our own AI agent running 24/7 on Railway, controlled via Telegram.

Monica uses soul.py v2.0 as her memory layer. Her MEMORY.md contains 700+ lines of curated knowledge about ongoing projects, preferences, technical decisions, and lessons learned. Her daily notes capture session context.

When Monica starts a new session after a restart or context compaction, she reads her memory files and is immediately back up to speed — same personality, same context, no re-explaining required.


Framework Integrations

CrewAI

from crewai import Agent
from crewai_soul import SoulMateMemory

memory = SoulMateMemory(soul_path="./SOUL.md")

agent = Agent(
    role="Research Assistant",
    memory=True,
    memory_store=memory
)

LangChain

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_soul import SoulChatMessageHistory

history = SoulChatMessageHistory(session_id="my-session")
chain = ConversationChain(memory=ConversationBufferMemory(chat_memory=history))

LlamaIndex

from llama_index.core.memory import ChatMemoryBuffer
from llamaindex_soul import SoulChatStore

chat_store = SoulChatStore()
memory = ChatMemoryBuffer.from_defaults(chat_store=chat_store, token_limit=3000)

Why soul.py Instead of MemGPT, LangChain Memory, or mem0?

Compared with MemGPT, LangChain Memory, and mem0, soul.py offers:

  • Zero dependencies ✅ (v0.1)
  • Human-readable memory
  • Cross-session persistence
  • Provider-agnostic design
  • No server required
  • Git-trackable memory

The key differentiator: soul.py memory lives in files you can read, edit, and version-control with git. The agent’s memory is transparent and auditable — not buried in a vector database.


Getting Started in 5 Minutes

# 1. Install
pip install soul-agent

# 2. Initialize memory files
soul init

# 3. Edit SOUL.md to define your agent's identity
# (just a markdown file — write in plain English)

# 4. Use in your agent
from soul_agent import Soul
soul = Soul()
system_prompt = soul.build_system_prompt()

That’s it. Your agent now has persistent memory.


Links: