agentmemory vs soul.py: Two Approaches to Persistent Memory for AI Agents

By Prahlad Menon

Every AI agent has the same problem: amnesia. Each session starts from zero. You re-explain your architecture, re-teach your preferences, re-discover the same bugs. Two open-source projects — agentmemory and soul.py — are attacking this problem from fundamentally different directions.

agentmemory is a memory engine built for coding agents. soul.py is a memory library built for any LLM application. Same problem, very different philosophies.

The Core Difference

agentmemory is infrastructure-first. It runs as a standalone server, captures agent activity automatically through hooks, and serves memories back via MCP or REST. Think of it as a sidecar process that watches what your coding agent does and builds a searchable knowledge base from it — without you doing anything.

soul.py is library-first. You import it, point it at a text file (like SOUL.md or MEMORY.md), and it gives you hybrid retrieval — combining vector search (RAG) with direct LLM reasoning over the raw text (RLM). It’s designed to be embedded into any Python application, not run as a separate service.

The distinction matters. agentmemory assumes you’re using a coding agent (Claude Code, Cursor, Gemini CLI, Codex) and wants to be invisible — silently capturing and replaying context. soul.py assumes you’re building something custom and wants to be a composable building block.

Architecture

agentmemory

  • Runtime: Node.js server (port 3113)
  • Storage: SQLite + iii engine (Rust-based)
  • Search: BM25 + Vector + Knowledge Graph with RRF (Reciprocal Rank Fusion)
  • Capture: 12 hooks for Claude Code, MCP for other agents, REST API for everything else
  • Memory lifecycle: 4-tier consolidation with confidence scoring, decay, and auto-forget
  • Embeddings: all-MiniLM-L6-v2 (local, free)

The key innovation is the hook system. When integrated with Claude Code, agentmemory registers 12 hooks that fire on events like file edits, tool calls, and conversation turns. It compresses these into memories automatically — no manual add() calls.
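
The search stack listed above fuses its BM25, vector, and knowledge-graph rankings with Reciprocal Rank Fusion. As a rough illustration of the general technique (not agentmemory's actual code), here is a minimal sketch. The constant k=60 is a commonly used default and an assumption here, and the memory IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical memory IDs as ranked by three independent retrievers.
bm25_hits = ["m1", "m2", "m3"]
vector_hits = ["m2", "m1", "m4"]
graph_hits = ["m2", "m5"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
print(fused)
```

RRF only needs rank positions, not raw scores, which is why it can combine retrievers whose scores live on incompatible scales.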

soul.py

  • Runtime: Python library (import and use)
  • Storage: Qdrant, ChromaDB, or BM25 (pluggable backends)
  • Search: RAG (vector similarity) + RLM (direct LLM reasoning over source text)
  • Capture: Explicit — you choose what to remember
  • Memory lifecycle: Manual or application-controlled
  • LLM providers: Anthropic, OpenAI, Gemini, any OpenAI-compatible endpoint

The key innovation is the RAG+RLM hybrid. Instead of relying solely on embeddings (which miss nuance) or solely on LLM reasoning (which is expensive), soul.py runs both and merges the results. RAG finds relevant chunks fast; RLM catches what embeddings miss by reasoning directly over the source material.

Retrieval Quality

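agentmemory publishes retrieval benchmarks on LongMemEval-S (ICLR 2025, 500 questions), reporting recall at 5 and 10 results (R@5, R@10) and mean reciprocal rank (MRR) against a BM25-only baseline: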

Metric    agentmemory    BM25-only
R@5       95.2%          86.2%
R@10      98.6%          94.6%
MRR       88.2%          71.5%
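
For readers unfamiliar with the metrics in the table above, here is a minimal sketch of how recall@k and MRR are usually computed. It uses the textbook definitions on toy data; the benchmark's exact scoring variant is not described here, so treat this as an assumption rather than a reproduction.

```python
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(relevant & set(ranked[:k])) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Toy queries: (ranked results, set of relevant items).
runs = [
    (["a", "b", "c"], {"a"}),  # first relevant hit at rank 1
    (["x", "y", "z"], {"z"}),  # first relevant hit at rank 3
    (["p", "q", "r"], {"s"}),  # relevant item never retrieved
]

mean_recall_at_2 = sum(recall_at_k(r, rel, 2) for r, rel in runs) / len(runs)
mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
print(round(mean_recall_at_2, 3), round(mrr, 3))
```

Recall@k asks whether the right memory shows up at all within the first k results, while MRR rewards putting it near the top, which is why the two numbers can move independently.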

soul.py’s architecture is harder to benchmark directly because the RLM component’s quality depends on which LLM you use. With a strong model (Claude, GPT-4), the RLM fallback catches edge cases that pure vector search misses — queries where the semantic relationship isn’t captured well by embeddings. The tradeoff is latency and cost: RLM requires an LLM call per retrieval.

Agent Compatibility

This is where agentmemory has a clear edge in breadth:

agentmemory supports:

  • Claude Code (12 hooks + MCP + skills)
  • Cursor, Windsurf, Cline, Roo Code (MCP)
  • Gemini CLI, Codex CLI, OpenCode (MCP)
  • Goose, Kilo Code, Aider (MCP/REST)
  • Claude Desktop, Claude SDK
  • Any agent speaking MCP or HTTP (104 endpoints)

soul.py supports:

  • Any Python application (library import)
  • CrewAI (via crewai-soul)
  • LangChain (via langchain-soul)
  • LlamaIndex (via llamaindex-soul)
  • SoulMate API for managed deployments
  • n8n, custom REST services

agentmemory is optimized for the coding agent ecosystem. soul.py is optimized for the AI application framework ecosystem. If you’re using Claude Code or Cursor, agentmemory plugs in with zero friction. If you’re building a CrewAI pipeline or a LangChain application, soul.py has native integrations.

Token Efficiency

Both projects care about this. Stuffing full context into every prompt is expensive and eventually hits window limits.

agentmemory reports roughly 1,900 tokens per session injection, which it translates to about $10/year with API embeddings or $0 with local embeddings. Its 4-tier consolidation compresses raw session data into increasingly abstract memories over time.
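
The lifecycle behind that consolidation is described as confidence scoring, decay, and auto-forget. The formula agentmemory uses is not given here, so the following is only an illustrative sketch of one common approach, exponential decay with a forget threshold. The Memory class, the 30-day half-life, and the 0.2 threshold are all invented for the example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    text: str
    confidence: float  # 0.0 to 1.0 when the memory was last reinforced
    age_days: float    # days since it was last reinforced


def decayed_confidence(memory: Memory, half_life_days: float = 30.0) -> float:
    """Exponential decay: confidence halves every half_life_days."""
    return memory.confidence * math.pow(0.5, memory.age_days / half_life_days)


def forget(
    memories: List[Memory],
    threshold: float = 0.2,
    half_life_days: float = 30.0,
) -> List[Memory]:
    """Keep only memories whose decayed confidence is at or above threshold."""
    return [
        m for m in memories
        if decayed_confidence(m, half_life_days) >= threshold
    ]


memories = [
    Memory("Project uses pnpm, not npm", confidence=0.9, age_days=10),
    Memory("Temporary debug flag was enabled", confidence=0.5, age_days=90),
]
kept = forget(memories)
print([m.text for m in kept])
```

Dropping stale, low-confidence memories is one way a system like this can keep the injected context small as sessions accumulate.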

soul.py keeps token usage low by chunking source material and only retrieving relevant chunks via RAG. The RLM path is more expensive (full LLM call), but it’s a fallback — RAG handles the majority of queries. For a typical MEMORY.md file, RAG retrieval costs are negligible.
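
A sketch of the chunk-then-select idea, under stated assumptions: splitting on markdown headings, the four-characters-per-token estimate, and all function names are illustrative choices, not soul.py's documented behavior. The chunks are assumed to arrive already ranked by relevance; the example simply uses file order.

```python
from typing import List


def chunk_markdown(text: str) -> List[str]:
    """Split a markdown file into chunks, one per '#'-style heading section."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def estimate_tokens(text: str) -> int:
    """Rough heuristic of about four characters per token; tokenizers differ."""
    return max(1, len(text) // 4)


def select_within_budget(ranked_chunks: List[str], budget: int) -> List[str]:
    """Take chunks in ranked order until the next one would exceed the budget."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        selected.append(chunk)
        used += cost
    return selected


memory_md = """# Preferences
Tabs over spaces.

# Architecture
Monorepo with a Go backend and a React frontend.

# Known bugs
Login redirect loops on Safari.
"""

chunks = chunk_markdown(memory_md)
picked = select_within_budget(chunks, budget=25)
print(len(chunks), len(picked))
```

With a budget of 25 estimated tokens, the first two sections fit and the third is left out, so the prompt only pays for the chunks that made the cut.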

Self-Hosting & Dependencies

agentmemory: Fully self-hosted by default. SQLite for storage, local embeddings, no external API keys needed. The iii engine (Rust) handles the heavy lifting. Single command to start: npx @agentmemory/agentmemory.

soul.py: Also self-hostable. Can run entirely local with Ollama + BM25 backend (zero API calls). Or use Qdrant/ChromaDB for vector search. The SoulMate API offers a managed option for production deployments. Install: pip install soul-agent.

Both projects are genuinely self-hostable without cloud dependencies, which is increasingly rare.

The Real-Time Viewer

agentmemory ships with a built-in web viewer on port 3113 that lets you watch memories form in real time, browse the knowledge graph, and replay past sessions. This is a significant UX advantage — you can actually see what the system remembers and debug retrieval issues visually.

soul.py doesn’t have an equivalent viewer. Debugging retrieval means inspecting the returned chunks programmatically or through the SoulMate API dashboard.

When to Use Which

Choose agentmemory if:

  • You’re using Claude Code, Cursor, or another coding agent
  • You want zero-effort memory capture (hooks do the work)
  • You need multi-agent memory sharing (one server, all agents read/write)
  • You want a visual viewer for debugging memory
  • You prefer the Node.js/TypeScript ecosystem

Choose soul.py if:

  • You’re building a custom AI application in Python
  • You want hybrid RAG+RLM retrieval (catches what embeddings miss)
  • You’re using CrewAI, LangChain, or LlamaIndex
  • You need database semantic layers (soul-schema)
  • You want a library you embed, not a server you run
  • You need identity persistence (SOUL.md pattern for agent personality)

Use both if:

  • You use coding agents for development AND build AI applications — agentmemory for your coding workflow, soul.py for your application’s memory layer

The Bigger Picture

The fact that both projects exist (along with mem0, Letta, Khoj, and others) signals that persistent memory is becoming table stakes for AI agents. The “chat with amnesia” era is ending.

What’s interesting is the convergence. agentmemory started with coding agents and is expanding toward general-purpose memory. soul.py started with general-purpose memory and has been adopted by coding-adjacent tools. Both are heading toward the same destination — reliable, efficient, cross-session memory that just works.

The question isn’t which one “wins.” It’s whether persistent memory becomes a standard feature of every AI framework, or whether standalone memory layers remain the architecture. Based on the trajectory, both will likely happen: frameworks will build basic memory in, and specialized tools like agentmemory and soul.py will handle the cases where basic isn’t enough.


Disclosure: soul.py is built by The Menon Lab. This comparison aims to be fair to both projects.