Add Persistent Memory to Any Project in 5 Minutes

By Prahlad Menon

Updated for soul-agent 0.1.2 on PyPI

Version note: pip install soul-agent gives you version 0.1.2, which includes both the simple Agent class (pure markdown) and the HybridAgent class (RAG+RLM routing). The “v2.0” mentioned in demos refers to the HybridAgent architecture, not a separate package.

Your AI forgets everything when you close the terminal.

You spend 10 minutes explaining your project context. The AI gives great help. You close the session. Tomorrow, it’s a stranger again.

soul.py fixes this in two files.

The Fix

from soul import Agent

# First conversation
agent = Agent()
agent.ask("My name is Prahlad and I'm building a Flask app for task management.")
# → "Nice! What's the first feature you're working on?"

# Close Python. Go to lunch. Come back tomorrow.

# New session — memory persists
agent = Agent()
agent.ask("What do you know about my project?")
# → "You're Prahlad, building a Flask task management app."

That’s it. Memory survives across sessions. No database. No server running in the background.

How It Works

soul.py uses two markdown files:

your-project/
├── src/
├── docs/
├── SOUL.md      ← Who the agent is
├── MEMORY.md    ← What it remembers
└── README.md

SOUL.md — The agent’s identity:

# Project Assistant

I am the AI assistant for this Flask project.

## How I Help
- Answer questions about the codebase
- Remember project decisions
- Help debug issues

## Style
- Reference specific files when helpful
- Acknowledge when I'm uncertain

MEMORY.md — Conversation history (grows automatically):

# Memory

## 2026-03-02 10:00
Q: My name is Prahlad and I'm building a Flask app for task management.
A: Nice! What's the first feature you're working on?

## 2026-03-02 10:15
Q: We decided to use SQLite for the database.
A: Good choice for a task app — simple and serverless.

Every agent.ask() call reads these files, calls the LLM, and appends the exchange to MEMORY.md.
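The read, call, append loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not soul.py's actual source: the function names `ask` and `fake_llm` and the prompt layout are hypothetical, and the LLM call is replaced by a stub so the snippet runs offline.

```python
from datetime import datetime
from pathlib import Path


def fake_llm(system_prompt: str, question: str) -> str:
    """Stand-in for a real provider call (hypothetical; returns a canned reply)."""
    return f"(stub reply to: {question})"


def ask(question: str, soul_path: Path, memory_path: Path, llm=fake_llm) -> str:
    # 1. Read identity and memory from disk (missing files count as empty).
    soul = soul_path.read_text(encoding="utf-8") if soul_path.exists() else ""
    if memory_path.exists():
        memory = memory_path.read_text(encoding="utf-8")
    else:
        memory = "# Memory\n"

    # 2. Inject both files into the system prompt and call the model.
    system_prompt = f"{soul}\n\n{memory}"
    answer = llm(system_prompt, question)

    # 3. Append the exchange under a timestamp heading, matching the
    #    MEMORY.md layout shown earlier.
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    entry = f"\n## {stamp}\nQ: {question}\nA: {answer}\n"
    memory_path.write_text(memory + entry, encoding="utf-8")
    return answer
```

Because the only state is the file on disk, a second call in a new process sees everything the first call wrote, which is where the cross-session memory comes from.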

Quick Start

pip install soul-agent
soul init
soul chat   # NEW in v0.1.2 — interactive CLI!

Or use the Python API. Either way, soul init runs a short setup wizard first:

soul init

The wizard asks:

  • What’s your agent’s name?
  • Which provider? (anthropic / openai / openai-compatible)

Creates SOUL.md and MEMORY.md in your current directory.

Then use it in Python:

from soul import Agent

agent = Agent()
response = agent.ask("How should I structure the routes?")
print(response)

Configuration

Configuration happens in Python, not config files:

# Anthropic (default)
agent = Agent(provider="anthropic")

# OpenAI
agent = Agent(provider="openai")

# Local with Ollama (free, private, no API key)
agent = Agent(
    provider="openai-compatible",
    base_url="http://localhost:11434/v1",
    model="llama3.2",
    api_key="ollama"
)

You can also specify custom file paths:

agent = Agent(
    soul_path="docs/SOUL.md",
    memory_path="docs/MEMORY.md"
)

The Persistence Demo

This is the whole point. Let me show it clearly:

Session 1:

from soul import Agent

agent = Agent()
agent.ask("I'm debugging a CORS issue in the API")
# Agent helps with CORS

agent.ask("Fixed it — needed to allow credentials")
# "Great! I'll remember that for future API questions."

Close Python. New terminal. New day.

Session 2:

from soul import Agent

agent = Agent()
agent.ask("I'm having another API issue")
# "Is this related to the CORS issue you fixed yesterday 
#  with credentials? Or something new?"

The agent remembers because MEMORY.md persisted.

What’s in the Package (soul-agent 0.1.2)

pip install soul-agent gives you two classes:

| Class | Import | What it does |
| --- | --- | --- |
| Agent | from soul import Agent | Simple markdown injection: reads SOUL.md + MEMORY.md, injects into prompt |
| HybridAgent | from hybrid_agent import HybridAgent | RAG+RLM routing: vector search for large memories, automatic query classification |

Start with Agent if your memory is small (<100 entries). Use HybridAgent when memory grows large or you want semantic search.

Package features (0.1.2):

  • Reads SOUL.md and MEMORY.md from current directory
  • Injects both into the system prompt
  • Calls your chosen LLM provider (Anthropic, OpenAI, Ollama)
  • Appends every exchange to MEMORY.md with timestamp
  • NEW: soul chat — interactive CLI, no Python code needed
  • NEW: soul status — check your memory file stats
  • NEW: ChromaDB local vector search for large memories
  • NEW: Direct OpenAI embeddings (not just Azure)

What it doesn’t do:

  • No automatic codebase indexing (you paste relevant code)
  • No git integration (yet)

The simplicity is intentional. ~300 lines of Python. No infrastructure.

The CLI Experience (v0.1.2)

You don’t need to write Python anymore. Just chat:

soul chat
🧠 soul.py (HybridAgent mode)
   Soul:   SOUL.md
   Memory: MEMORY.md (12 entries)
   Commands: /memory  /reset  /help  exit

You: What did we decide about the database schema?
Assistant: Based on our conversations, you decided to use SQLAlchemy 
with SQLite for development. The Task model has: id, title, description, 
status (enum: todo/doing/done), due_date, and user_id foreign key.
[RAG · 847ms]

You: /memory
📝 MEMORY.md — 12 entries, 4.2KB

You: exit
👋 Memory saved. See you next time.

Check memory stats anytime:

soul status
🧠 soul.py status

✅ SOUL.md     — 15 lines
✅ MEMORY.md   — 47 entries, 8.3KB
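
To give a feel for where numbers like these could come from, here is a rough sketch that counts entries in a MEMORY.md string. It assumes entries start with a `## YYYY-MM-DD HH:MM` heading, as in the example file earlier; the function name `memory_stats` is made up for this post and is not the CLI's real implementation.

```python
import re

# An entry heading looks like "## 2026-03-02 10:00" on its own line.
ENTRY_HEADING = re.compile(r"^## \d{4}-\d{2}-\d{2} \d{2}:\d{2}[ \t]*$", re.MULTILINE)


def memory_stats(text: str) -> dict:
    """Count timestamped entries and report the size of a MEMORY.md string."""
    return {
        "entries": len(ENTRY_HEADING.findall(text)),
        "lines": len(text.splitlines()),
        "kilobytes": round(len(text.encode("utf-8")) / 1024, 1),
    }


sample = """# Memory

## 2026-03-02 10:00
Q: My name is Prahlad and I'm building a Flask app for task management.
A: Nice! What's the first feature you're working on?

## 2026-03-02 10:15
Q: We decided to use SQLite for the database.
A: Good choice for a task app, simple and serverless.
"""

print(memory_stats(sample))
```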

The simple Agent class (the original v0.1 behavior) injects your entire MEMORY.md into the LLM context. That works well for months of typical use, but eventually you'll hit context limits.

v0.1.2 adds ChromaDB — a local vector database that runs on your machine with zero configuration:

pip install "soul-agent[chromadb]"   # quotes keep shells like zsh from expanding the brackets

from hybrid_agent import HybridAgent

agent = HybridAgent(
    mode="auto",  # Automatically chooses RAG vs RLM per query
)

Now when you ask a question, soul.py:

  1. Routes the query — Is this a specific fact lookup (RAG) or synthesis question (RLM)?
  2. Retrieves relevant memories — Vector search finds the top-k most similar entries
  3. Generates response — Only relevant context goes to the LLM

This scales to thousands of memory entries without hitting token limits.

Vector Database Deep Dive

Under the hood, soul.py supports multiple vector backends:

| Backend | Install | Best For |
| --- | --- | --- |
| BM25 | Built-in | Small memories, offline, zero deps |
| ChromaDB | pip install soul-agent[chromadb] | Local dev, medium memories |
| Qdrant | Cloud or self-hosted | Production, large scale |
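
For intuition about the zero-dependency option, here is a small pure-Python sketch of standard Okapi BM25 ranking over memory entries. It shows the general technique only; the function names, tokenizer, and default parameters (`k1=1.5`, `b=0.75`) are assumptions for this example and may differ from what soul.py ships.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75):
    """Return (score, doc) pairs sorted best-first using Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized) or 1.0
    # Document frequency: the number of docs each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    query_terms = tokenize(query)

    scored = []
    for doc, tokens in zip(docs, tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


memories = [
    "We decided to use SQLite for the database.",
    "Fixed the CORS issue by allowing credentials.",
    "The Task model has a due_date field.",
]
best_score, best_entry = bm25_rank("Which database did we choose?", memories)[0]
print(best_entry)
```

BM25 matches on shared words rather than meaning, so it needs no embedding provider or network access. The trade-off is that paraphrased questions can miss relevant entries.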

How Collections Work

Each agent gets its own collection (like a database table) in the vector store:

agent = HybridAgent(
    collection_name="my_project_memory",  # Your collection name
    # Defaults to "soul_v2_memory" if not specified
)

What goes in a collection:

  • Each entry from MEMORY.md becomes a vector
  • Entries are embedded using your configured provider (OpenAI, Azure)
  • Vectors are stored with the original text as payload

Querying:

Your question: "What database did we choose?"

    Embed question → vector [0.12, -0.45, 0.78, ...]

    Search collection for similar vectors

    Return top-5 most similar memory entries

    Inject into LLM prompt as context
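
The same flow can be sketched with a toy embedding so it runs without any API key. Everything here is illustrative: `embed` is a word-count vector standing in for a real embedding model, and `top_k` does a brute-force cosine search rather than using a vector database.

```python
import math
import re
from collections import Counter


def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy embedding: word counts over a fixed vocabulary (not a real model)."""
    counts = Counter(words(text))
    return [float(counts[word]) for word in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(question: str, entries: list[str], k: int = 5) -> list[str]:
    """Embed the question, score every entry, return the k most similar."""
    vocab = sorted({w for entry in entries for w in words(entry)})
    question_vec = embed(question, vocab)
    ranked = sorted(
        entries,
        key=lambda entry: cosine(question_vec, embed(entry, vocab)),
        reverse=True,
    )
    return ranked[:k]


entries = [
    "We decided to use SQLite for the database.",
    "Fixed the CORS issue by allowing credentials.",
    "The Task model has a due_date field.",
]
context = top_k("What database did we choose?", entries, k=1)
prompt = "Relevant memories:\n" + "\n".join(f"- {c}" for c in context)
print(prompt)
```

A real embedding model places paraphrases close together, which word counts cannot do, but the search-then-inject shape is the same.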

Configuring Qdrant (Production)

For production deployments, use Qdrant Cloud:

agent = HybridAgent(
    qdrant_url="https://your-cluster.qdrant.io:6333",
    qdrant_api_key="your-api-key",
    azure_embedding_endpoint="https://your-azure.openai.azure.com",
    azure_embedding_key="your-key",
    collection_name="prod_agent_memory",
)

Or set via environment variables:

export QDRANT_URL=https://your-cluster.qdrant.io:6333
export QDRANT_API_KEY=xxx
export AZURE_EMBEDDING_ENDPOINT=https://xxx.openai.azure.com
export AZURE_EMBEDDING_KEY=xxx
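
If you go the environment-variable route, it can help to fail fast when one is missing. The helper below is a hypothetical convenience for your own startup code, not part of soul-agent; it only checks the four variable names listed above.

```python
import os

REQUIRED = (
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "AZURE_EMBEDDING_ENDPOINT",
    "AZURE_EMBEDDING_KEY",
)


def load_vector_config(env=os.environ) -> dict:
    """Collect the required settings; report missing ones instead of guessing."""
    config = {name: env.get(name) for name in REQUIRED}
    missing = [name for name, value in config.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return config
```

Calling `load_vector_config()` before constructing the agent turns a confusing connection error later into a clear message up front.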

Using OpenAI Embeddings Directly

New in v0.1.2 — you don’t need Azure anymore:

agent = HybridAgent(
    openai_api_key="sk-...",  # Direct OpenAI, not Azure
)

Uses text-embedding-3-small (1536 dimensions) by default.

RAG vs RLM: The Query Router

soul.py v2.0 doesn’t just do RAG. It automatically routes queries:

| Query Type | Route | Method |
| --- | --- | --- |
| “What’s my name?” | RAG | Vector search, return top matches |
| “Summarize all our decisions” | RLM | Read ALL memories, synthesize |

RAG (Retrieval-Augmented Generation):

  • Fast (~500ms)
  • Finds specific facts
  • Good for: “What did we decide about X?”

RLM (Retrieval + Learning Memory):

  • Slower (~5-10s)
  • Processes everything recursively
  • Good for: “Give me a summary of the whole project”

The router is a lightweight LLM call that classifies your query, then dispatches to the right retrieval strategy.
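
The dispatch idea can be sketched without an LLM. In this illustration a keyword check stands in for the classifier call, so treat the hint list and the names `classify` and `route_query` as assumptions made for the example. The RLM branch here simply hands over every entry rather than processing them recursively.

```python
# Phrases that suggest a broad synthesis question (illustrative, not exhaustive).
SYNTHESIS_HINTS = ("summarize", "summary", "overview", "all our", "everything", "whole project")


def classify(query: str) -> str:
    """Return "RLM" for broad synthesis questions, "RAG" for specific lookups.

    Keyword matching is a stand-in for the LLM classifier described above.
    """
    lowered = query.lower()
    return "RLM" if any(hint in lowered for hint in SYNTHESIS_HINTS) else "RAG"


def route_query(query: str, entries: list[str], retrieve, k: int = 5):
    """Pick a route, then choose the context that would be sent to the LLM."""
    route = classify(query)
    if route == "RLM":
        context = list(entries)               # synthesis: consider all memories
    else:
        context = retrieve(query, entries, k)  # lookup: only the top-k matches
    return route, context


def first_k(query: str, entries: list[str], k: int) -> list[str]:
    """Placeholder retriever for the demo; a real one would rank by similarity."""
    return entries[:k]


print(route_query("What's my name?", ["a", "b", "c"], first_k, k=1))
print(route_query("Summarize all our decisions", ["a", "b", "c"], first_k, k=1))
```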

Try it live: soulv2.themenonlab.com

Team Usage (Convention, Not Feature)

For team projects, a useful pattern:

Share identity, keep separate memories:

# .gitignore
MEMORY.md          # Each dev has their own history

Commit SOUL.md so the team shares agent identity. Gitignore MEMORY.md so each developer has their own conversation history.

Or share everything:

Don’t gitignore MEMORY.md. The team builds shared institutional knowledge:

## Key Decisions
- 2026-02-15: Chose SQLite over PostgreSQL (Sarah)
- 2026-02-20: Moving auth to Clerk (Prahlad)

This isn’t a built-in feature — it’s just how files work. That’s the point.

Why This Matters

Most AI coding assistants are stateless. You explain context, get help, close the tab — gone.

With soul.py:

  • Context accumulates over time
  • Decisions are remembered
  • The assistant knows your project
  • It’s all in readable files you control

You can open MEMORY.md in any text editor. Edit it. Delete things. Add context manually. It’s just markdown.
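
Because it is plain text, you can also script the cleanup. The sketch below drops entries older than a cutoff. It assumes the `## YYYY-MM-DD HH:MM` heading format shown earlier, and `prune_memory` is a hypothetical helper, not something the package provides.

```python
import re
from datetime import datetime

HEADING = re.compile(r"^## (\d{4}-\d{2}-\d{2} \d{2}:\d{2})[ \t]*$", re.MULTILINE)


def prune_memory(text: str, keep_since: datetime) -> str:
    """Drop timestamped entries older than keep_since; keep everything else.

    Text between one timestamp heading and the next is treated as part of
    the earlier entry, so untimestamped sections travel with it.
    """
    matches = list(HEADING.finditer(text))
    if not matches:
        return text
    kept = [text[: matches[0].start()]]  # preamble such as "# Memory"
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")
        if stamp >= keep_since:
            kept.append(text[match.start():end])
    return "".join(kept)


memory_text = """# Memory

## 2026-03-02 10:00
Q: My name is Prahlad.
A: Nice to meet you.

## 2026-03-02 10:15
Q: We decided to use SQLite for the database.
A: Good choice for a task app.
"""

print(prune_memory(memory_text, datetime(2026, 3, 2, 10, 10)))
```

Back up the file (or commit it) before running anything like this against a real MEMORY.md.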

Try it now:

pip install soul-agent
soul init

Then start a Python session and talk to your agent.


Repo: github.com/menonpg/soul.py

Live demo: soul.themenonlab.com

v2.0 demo (with RAG): soulv2.themenonlab.com