How do I add persistent memory to my AI project?

Install soul-agent with pip install soul-agent, run soul init, then use Agent() in your code. Memory persists across sessions using two markdown files — no database needed.

How long does it take to set up soul.py?

About 5 minutes. Install the package, run the init wizard, and you have persistent memory working. Two files (SOUL.md and MEMORY.md), zero infrastructure.

Does soul.py require a database?

No. soul.py stores all memory in plain markdown files (SOUL.md for identity, MEMORY.md for conversation history). You can read and edit them with any text editor.

What's the difference between Agent and HybridAgent in soul.py?

Agent uses pure markdown injection (simple, zero deps). HybridAgent adds RAG+RLM routing for larger memory sets. Both are included in pip install soul-agent.

Can I add soul.py to an existing codebase?

Yes. Just run soul init in your project directory. It creates SOUL.md and MEMORY.md without touching your existing files. The AI will learn about your project through conversations.

How does soul.py remember context across sessions?

Every conversation is appended to MEMORY.md with timestamps. When you start a new session, soul.py reads this file and injects it into the LLM's context.

Add Persistent Memory to Any Project in 5 Minutes

By Prahlad Menon Published 2026-03-02 3 min read

Updated for soul-agent 0.1.2 on PyPI

Version note: pip install soul-agent gives you version 0.1.2, which includes both the simple Agent class (pure markdown) and the HybridAgent class (RAG+RLM routing). The “v2.0” mentioned in demos refers to the HybridAgent architecture, not a separate package.

Your AI forgets everything when you close the terminal.

You spend 10 minutes explaining your project context. The AI gives great help. You close the session. Tomorrow, it’s a stranger again.

soul.py fixes this in two files.

The Fix

from soul import Agent

# First conversation
agent = Agent()
agent.ask("My name is Prahlad and I'm building a Flask app for task management.")
# → "Nice! What's the first feature you're working on?"

# Close Python. Go to lunch. Come back tomorrow.

# New session — memory persists
agent = Agent()
agent.ask("What do you know about my project?")
# → "You're Prahlad, building a Flask task management app."

That’s it. Memory survives across sessions. No database. No server running in the background.

How It Works

soul.py uses two markdown files:

your-project/
├── src/
├── docs/
├── SOUL.md      ← Who the agent is
├── MEMORY.md    ← What it remembers
└── README.md

SOUL.md — The agent’s identity:

# Project Assistant

I am the AI assistant for this Flask project.

## How I Help
- Answer questions about the codebase
- Remember project decisions
- Help debug issues

## Style
- Reference specific files when helpful
- Acknowledge when I'm uncertain

MEMORY.md — Conversation history (grows automatically):

# Memory

## 2026-03-02 10:00
Q: My name is Prahlad and I'm building a Flask app for task management.
A: Nice! What's the first feature you're working on?

## 2026-03-02 10:15
Q: We decided to use SQLite for the database.
A: Good choice for a task app — simple and serverless.

Every agent.ask() call reads these files, calls the LLM, and appends the exchange to MEMORY.md.

Quick Start

pip install soul-agent
soul init
soul chat   # NEW in v0.1.2 — interactive CLI!

Or use the Python API:

soul init

The wizard asks:

What’s your agent’s name?
Which provider? (anthropic / openai / openai-compatible)

Creates SOUL.md and MEMORY.md in your current directory.

Then use it in Python:

from soul import Agent

agent = Agent()
response = agent.ask("How should I structure the routes?")
print(response)

Configuration

Configuration happens in Python, not config files:

# Anthropic (default)
agent = Agent(provider="anthropic")

# OpenAI
agent = Agent(provider="openai")

# Local with Ollama (free, private, no API key)
agent = Agent(
    provider="openai-compatible",
    base_url="http://localhost:11434/v1",
    model="llama3.2",
    api_key="ollama"
)

You can also specify custom file paths:

agent = Agent(
    soul_path="docs/SOUL.md",
    memory_path="docs/MEMORY.md"
)

The Persistence Demo

This is the whole point. Let me show it clearly:

Session 1:

from soul import Agent

agent = Agent()
agent.ask("I'm debugging a CORS issue in the API")
# Agent helps with CORS

agent.ask("Fixed it — needed to allow credentials")
# "Great! I'll remember that for future API questions."

Close Python. New terminal. New day.

Session 2:

from soul import Agent

agent = Agent()
agent.ask("I'm having another API issue")
# "Is this related to the CORS issue you fixed yesterday 
#  with credentials? Or something new?"

The agent remembers because MEMORY.md persisted.

What’s in the Package (soul-agent 0.1.2)

pip install soul-agent gives you two classes:

Class	Import	What it does
`Agent`	`from soul import Agent`	Simple markdown injection — reads SOUL.md + MEMORY.md, injects into prompt
`HybridAgent`	`from hybrid_agent import HybridAgent`	RAG+RLM routing — vector search for large memories, automatic query classification

Start with Agent if your memory is small (<100 entries). Use HybridAgent when memory grows large or you want semantic search.

Package features (0.1.2):

Reads SOUL.md and MEMORY.md from current directory
Injects both into the system prompt
Calls your chosen LLM provider (Anthropic, OpenAI, Ollama)
Appends every exchange to MEMORY.md with timestamp
NEW: soul chat — interactive CLI, no Python code needed
NEW: soul status — check your memory file stats
NEW: ChromaDB local vector search for large memories
NEW: Direct OpenAI embeddings (not just Azure)

What it doesn’t do:

No automatic codebase indexing (you paste relevant code)
No git integration (yet)

The simplicity is intentional. ~300 lines of Python. No infrastructure.

The CLI Experience (v0.1.2)

You don’t need to write Python anymore. Just chat:

soul chat

🧠 soul.py (HybridAgent mode)
   Soul:   SOUL.md
   Memory: MEMORY.md (12 entries)
   Commands: /memory  /reset  /help  exit

You: What did we decide about the database schema?
Assistant: Based on our conversations, you decided to use SQLAlchemy 
with SQLite for development. The Task model has: id, title, description, 
status (enum: todo/doing/done), due_date, and user_id foreign key.
[RAG · 847ms]

You: /memory
📝 MEMORY.md — 12 entries, 4.2KB

You: exit
👋 Memory saved. See you next time.

Check memory stats anytime:

soul status

🧠 soul.py status

✅ SOUL.md     — 15 lines
✅ MEMORY.md   — 47 entries, 8.3KB

When Memory Gets Large: Vector Search

v0.1 injects your entire MEMORY.md into the LLM context. This works great for months of use, but eventually you’ll hit context limits.

v0.1.2 adds ChromaDB — a local vector database that runs on your machine with zero configuration:

pip install soul-agent[chromadb]

from hybrid_agent import HybridAgent

agent = HybridAgent(
    mode="auto",  # Automatically chooses RAG vs RLM per query
)

Now when you ask a question, soul.py:

Routes the query — Is this a specific fact lookup (RAG) or synthesis question (RLM)?
Retrieves relevant memories — Vector search finds the top-k most similar entries
Generates response — Only relevant context goes to the LLM

This scales to thousands of memory entries without hitting token limits.

Vector Database Deep Dive

Under the hood, soul.py supports multiple vector backends:

Backend	Install	Best For
BM25	Built-in	Small memories, offline, zero deps
ChromaDB	`pip install soul-agent[chromadb]`	Local dev, medium memories
Qdrant	Cloud or self-hosted	Production, large scale

How Collections Work

Each agent gets its own collection (like a database table) in the vector store:

agent = HybridAgent(
    collection_name="my_project_memory",  # Your collection name
    # Defaults to "soul_v2_memory" if not specified
)

What goes in a collection:

Each entry from MEMORY.md becomes a vector
Entries are embedded using your configured provider (OpenAI, Azure)
Vectors are stored with the original text as payload

Querying:

Your question: "What database did we choose?"
         ↓
    Embed question → vector [0.12, -0.45, 0.78, ...]
         ↓
    Search collection for similar vectors
         ↓
    Return top-5 most similar memory entries
         ↓
    Inject into LLM prompt as context

Configuring Qdrant (Production)

For production deployments, use Qdrant Cloud:

agent = HybridAgent(
    qdrant_url="https://your-cluster.qdrant.io:6333",
    qdrant_api_key="your-api-key",
    azure_embedding_endpoint="https://your-azure.openai.azure.com",
    azure_embedding_key="your-key",
    collection_name="prod_agent_memory",
)

Or set via environment variables:

export QDRANT_URL=https://your-cluster.qdrant.io:6333
export QDRANT_API_KEY=xxx
export AZURE_EMBEDDING_ENDPOINT=https://xxx.openai.azure.com
export AZURE_EMBEDDING_KEY=xxx

Using OpenAI Embeddings Directly

New in v0.1.2 — you don’t need Azure anymore:

agent = HybridAgent(
    openai_api_key="sk-...",  # Direct OpenAI, not Azure
)

Uses text-embedding-3-small (1536 dimensions) by default.

RAG vs RLM: The Query Router

soul.py v2.0 doesn’t just do RAG. It automatically routes queries:

Query Type	Route	Method
”What’s my name?”	RAG	Vector search, return top matches
”Summarize all our decisions”	RLM	Read ALL memories, synthesize

RAG (Retrieval-Augmented Generation):

Fast (~500ms)
Finds specific facts
Good for: “What did we decide about X?”

RLM (Retrieval + Learning Memory):

Slower (~5-10s)
Processes everything recursively
Good for: “Give me a summary of the whole project”

The router is a lightweight LLM call that classifies your query, then dispatches to the right retrieval strategy.

Try it live: soulv2.themenonlab.com

Team Usage (Convention, Not Feature)

For team projects, a useful pattern:

Share identity, keep separate memories:

# .gitignore
MEMORY.md          # Each dev has their own history

Commit SOUL.md so the team shares agent identity. Gitignore MEMORY.md so each developer has their own conversation history.

Or share everything:

Don’t gitignore MEMORY.md. The team builds shared institutional knowledge:

## Key Decisions
- 2026-02-15: Chose SQLite over PostgreSQL (Sarah)
- 2026-02-20: Moving auth to Clerk (Prahlad)

This isn’t a built-in feature — it’s just how files work. That’s the point.

Why This Matters

Most AI coding assistants are stateless. You explain context, get help, close the tab — gone.

With soul.py:

Context accumulates over time
Decisions are remembered
The assistant knows your project
It’s all in readable files you control

You can open MEMORY.md in any text editor. Edit it. Delete things. Add context manually. It’s just markdown.

Try it now:

pip install soul-agent
soul init

Then start a Python session and talk to your agent.

Repo: github.com/menonpg/soul.py

Live demo: soul.themenonlab.com

v2.0 demo (with RAG): soulv2.themenonlab.com