Introducing SoulMate: Persistent AI Memory as a Service

We built soul.py to solve a fundamental problem: AI agents forget everything between sessions. Two markdown files — SOUL.md for identity, MEMORY.md for memory — and suddenly your agent remembers who it is and who it’s talking to.
The response was overwhelming. The library hit #1 on r/ollama with 50,000+ views. Developers started building with it immediately.
Then the enterprise requests started coming in.
The Enterprise Problem
soul.py works beautifully for individual developers and small teams. But enterprises have different requirements:
Scale. Not one agent remembering one user — millions of customers, each with their own persistent context. A telecom company handling support calls needs per-customer memory for every subscriber.
Compliance. Healthcare needs HIPAA. Finance needs SOC 2. GDPR requires the ability to delete customer data on request. Memory infrastructure has to support all of this.
Operations. Self-hosting is overhead. Enterprises want managed infrastructure — APIs they can call, not servers they have to maintain.
Integration. Engineering teams don’t want to learn a new framework. They want a REST API that fits into their existing stack.
soul.py is the primitive. Enterprises need the platform.
Introducing SoulMate
SoulMate is persistent AI memory as a service. The architecture is simple:
You bring the LLM. Use your existing Anthropic, OpenAI, or Ollama setup. Keep your model relationships, your rate limits, your enterprise agreements. SoulMate doesn’t touch your LLM tokens.
We handle the memory. Per-customer persistent memory, managed and scaled. Every customer interaction builds context. Every conversation remembers the last one. No database setup, no vector store configuration, no infrastructure to maintain.
Think of it like Pinecone for memory. Pinecone charges for vector storage, not the embedding model. SoulMate charges for memory operations, not the LLM. You keep control of the AI; we handle what it remembers.
How It Works
The SoulMate API is a REST service with a simple mental model:
```python
from soulmate import SoulMateClient

# Initialize with your API key and LLM credentials
sm = SoulMateClient(
    api_key="sm_live_xxxxx",      # Your SoulMate key
    llm_provider="anthropic",     # Your LLM provider
    llm_key="sk-ant-api03-..."    # Your LLM key (BYOK)
)

# Every call maintains per-customer memory
response = sm.ask("customer_123", "What's my account status?")
# SoulMate remembers this customer's entire history

# Later, same customer, memory persists
response = sm.ask("customer_123", "What about that issue I mentioned last week?")
# Context from "last week" is automatically available
```
Under the hood, SoulMate:
- Maintains a persistent memory store for each customer_id
- Retrieves relevant context using RAG + RLM hybrid routing
- Constructs the prompt with identity (SOUL.md) and relevant memories
- Calls your LLM with the enriched context
- Stores the interaction for future reference
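That loop can be sketched in a few lines as a toy in-memory version. The class and attribute names here are illustrative, not the service's internals; the hosted API persists memories to a database and retrieves selectively rather than loading everything:

```python
# Toy sketch of the ask() pipeline: store per-customer memory,
# build a prompt from identity + memories, call the LLM, record
# the exchange. (Illustrative only; not SoulMate's actual code.)

class MemorySketch:
    def __init__(self, soul: str):
        self.soul = soul                        # identity (SOUL.md contents)
        self.store: dict[str, list[str]] = {}   # per-customer memory store

    def build_prompt(self, customer_id: str, message: str) -> str:
        memories = "\n".join(self.store.get(customer_id, []))
        return f"{self.soul}\n\n# Memories\n{memories}\n\n# User\n{message}"

    def ask(self, customer_id: str, message: str, llm) -> str:
        prompt = self.build_prompt(customer_id, message)
        answer = llm(prompt)                    # your LLM call goes here
        # store the interaction for future reference
        log = self.store.setdefault(customer_id, [])
        log.append(f"user: {message}")
        log.append(f"assistant: {answer}")
        return answer
```

The second ask() for the same customer_id sees the first exchange in its prompt, which is the whole trick: memory lives outside the model, and the model is re-briefed on every call.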
You get persistent, context-aware AI without managing any of the memory infrastructure.
The API
SoulMate exposes a clean REST API:
| Endpoint | Purpose |
|---|---|
| POST /v1/ask | Send a message, get a response with memory |
| GET /v1/memory/{customer_id} | Retrieve a customer’s stored memory |
| DELETE /v1/memory/{customer_id} | GDPR delete — remove all customer data |
| POST /v1/souls | Upload SOUL.md configurations |
| GET /v1/usage | Track your API usage |
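As a sketch of how a thin client might target these endpoints, the helpers below only build request descriptions rather than send them. The payload field names are assumptions for illustration, not a documented schema:

```python
# Build request specs for the endpoints in the table above.
# The JSON field names ("customer_id", "message") are assumed,
# not taken from official API docs.

BASE = "https://soulmate-api.themenonlab.com"

def ask_request(customer_id: str, message: str) -> dict:
    """POST /v1/ask — send a message, get a memory-aware response."""
    return {
        "method": "POST",
        "url": f"{BASE}/v1/ask",
        "json": {"customer_id": customer_id, "message": message},
    }

def gdpr_delete_request(customer_id: str) -> dict:
    """DELETE /v1/memory/{customer_id} — remove all data for one customer."""
    return {"method": "DELETE", "url": f"{BASE}/v1/memory/{customer_id}"}
```

In practice you would hand these specs to any HTTP client; the point is that the whole surface is plain REST, so no SDK is strictly required.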
SoulMate v2: Qdrant-Powered Semantic Memory
Released March 2026
v1 works. It’s simple, battle-tested, and production-ready. But it has one fundamental limitation: every memory gets loaded into context every time. As your agents accumulate history, context windows fill up and costs climb.
v2 solves this with real vector search.
Instead of injecting all memories into every prompt, v2 uses Qdrant and Azure embeddings to semantically retrieve only the top-8 most relevant memories for each query. Context stays lean regardless of how much the agent has accumulated. Retrieval scales to millions of entries per customer.
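The retrieval step can be illustrated with plain cosine similarity over toy vectors — a simplified stand-in for Qdrant's approximate nearest-neighbor search over Azure embeddings:

```python
# Top-k semantic retrieval sketch: rank stored memories by cosine
# similarity to the query embedding and keep only the best k.
# (Toy stand-in; the real system uses Qdrant ANN search.)
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, memories, k=8):
    """memories: list of (text, embedding) pairs; returns k best texts."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Because only the top-k texts ever reach the prompt, the context footprint is bounded by k, not by how many memories exist — which is exactly the constant-cost property described below.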
What’s New in v2
Managed Qdrant collections. Every customer automatically gets a dedicated vector collection (sm_{customer_id}). No setup, no config — it’s provisioned on first write. The Qdrant infrastructure is fully managed; you never touch it.
Azure text-embedding-3-large. State-of-the-art embeddings. Also fully managed — no Azure account or embedding key required on your side.
Semantic retrieval. Instead of “all memories,” each query gets the 8 memories most semantically similar to the question being asked. Relevant context, not exhaustive context.
Constant token cost. Whether your agent has 10 memories or 100,000, the context window footprint stays the same. Costs don’t compound with usage.
rag_hits in every response. The response now tells you exactly how many memories matched the query — useful for debugging and understanding what the agent knew.
v1 vs v2 at a Glance
| | v1 — Classic | v2 — RAG + Qdrant |
|---|---|---|
| Storage | Flat MEMORY.md | Qdrant vector collections |
| Retrieval | All memories → context | Top-8 semantic matches only |
| Embeddings | None | Azure text-embedding-3-large (managed) |
| Context size | Grows with history | Constant regardless of size |
| Response | answer | answer + rag_hits |
| Scale | Lightweight agents | Millions of memories per customer |
| Endpoint | soulmate-api.themenonlab.com | soulmate-api-v2.themenonlab.com |
Migrating from v1 to v2
Zero code changes required beyond the base URL:
```python
# v1
sm = SoulMateClient(
    api_key="sm_live_xxxxx",
    llm_provider="anthropic",
    llm_key="sk-ant-..."
)

# v2 — one line different
sm = SoulMateClient(
    api_key="sm_live_xxxxx",    # Same key works on both
    base_url="https://soulmate-api-v2.themenonlab.com",
    llm_provider="anthropic",
    llm_key="sk-ant-..."
)

resp = sm.ask("user_123", "What's my account status?")
print(resp["rag_hits"])  # How many memories matched
print(resp["answer"])    # Response with relevant context injected
```
The same API key works on both v1 and v2. You can run both in parallel, migrate incrementally, or keep different agents on different versions. v1 stays live and maintained.
Use Cases
We’re seeing early interest in verticals where customer relationships matter and context compounds:
Telecom Support. Verizon, AT&T, and T-Mobile handle millions of support calls monthly. Imagine if the AI already knew each customer’s plan, devices, history, and previous issues before they said a word. No more “let me pull up your account.” Handle time could drop an estimated 40-60%.

Healthcare. Patient-facing AI that remembers medical history, care preferences, and prior interactions. HIPAA compliance via the self-hosted deployment option. Reduces intake friction and improves care continuity.

Financial Services. Wealth management AI that knows each client’s portfolio, risk tolerance, life events, and goals. Personalization at scale without relationship-manager overhead.

Retail & E-Commerce. Shopping AI that remembers preferences, past purchases, size profiles, and gift occasions. True relationship commerce across every touchpoint.
Pricing Model
SoulMate uses consumption-based pricing:
- Memory operations — reads, writes, deletes
- Storage — per-customer memory footprint
- No LLM markup — your tokens, your costs
Free tier for developers to experiment. Enterprise tiers with SLAs, dedicated support, and compliance certifications.
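As a back-of-the-envelope illustration of consumption-based billing, here is a sketch with placeholder rates — the per-operation and storage prices below are hypothetical, not published pricing:

```python
# Hypothetical cost model: pay per memory operation plus storage.
# op_rate and storage_rate are made-up placeholder prices.

def monthly_cost(reads: int, writes: int, deletes: int, storage_mb: float,
                 op_rate: float = 0.0001, storage_rate: float = 0.02) -> float:
    """Total = (all memory operations x op_rate) + (storage_mb x storage_rate)."""
    ops = reads + writes + deletes
    return ops * op_rate + storage_mb * storage_rate
```

The key property is what is absent: LLM tokens never appear in the formula, because those are billed by your provider at your negotiated rates.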
The Ecosystem
SoulMate sits in a layered architecture:
```
┌─────────────────────────────────────────┐
│ Your Application                        │
├─────────────────────────────────────────┤
│ SoulMate v2 API (Qdrant + RAG memory)   │ ← Hosted, enterprise
├─────────────────────────────────────────┤
│ SoulMate v1 API (markdown memory)       │ ← Hosted, getting started
├─────────────────────────────────────────┤
│ Self-hosted via Docker (v1 or v2)       │ ← Your infra, your data
├─────────────────────────────────────────┤
│ soul.py (open source, MIT)              │ ← Build your own
├─────────────────────────────────────────┤
│ LLM Provider (Anthropic/OpenAI)         │ ← You choose
└─────────────────────────────────────────┘
```
soul.py stays open source (MIT). It’s the credibility engine, the community funnel, and the foundation everything builds on. If you want to self-host and manage your own memory infrastructure, soul.py gives you everything you need.
SoulMate is for teams that want managed infrastructure. Less ops, more building. The same persistent memory patterns, hosted and scaled.
Self-Hosting
Both v1 and v2 are available as Docker images on Docker Hub. Self-hosting for internal use is fully permitted under the license — no restrictions, no fees.
```bash
# v1 — simple markdown memory
docker pull pgmenon/soulmate-api:latest
docker run -p 8000:8000 \
  -e SUPABASE_URL=your_url \
  -e SUPABASE_SERVICE_KEY=your_key \
  -v /data/soulmate:/data/soulmate \
  pgmenon/soulmate-api:latest

# v2 — Qdrant + RAG memory
docker pull pgmenon/soulmate-api-v2:latest
docker run -p 8000:8000 \
  -e SUPABASE_URL=your_url \
  -e SUPABASE_SERVICE_KEY=your_key \
  -e QDRANT_URL=your_qdrant_url \
  -e QDRANT_API_KEY=your_qdrant_key \
  -e AZURE_EMBEDDING_ENDPOINT=your_azure_endpoint \
  -e AZURE_EMBEDDING_KEY=your_azure_key \
  -e AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large \
  -v /data/soulmate:/data/soulmate \
  pgmenon/soulmate-api-v2:latest
```
v2 falls back to BM25 keyword search automatically if Qdrant/Azure credentials aren’t provided — so you can run it without any vector-search dependencies and upgrade to full semantic retrieval later.
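The fallback can be illustrated with a minimal BM25 scorer. This is a textbook sketch of the Okapi BM25 formula, not the service's actual implementation:

```python
# Minimal BM25 keyword scoring: rank tokenized documents against a
# tokenized query using term frequency, inverse document frequency,
# and document-length normalization. (Textbook sketch only.)
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """query: list of tokens; docs: list of token lists. Returns one score per doc."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Keyword matching is cruder than embeddings (no synonyms, no paraphrase), but it needs nothing beyond the text itself, which is what makes the zero-dependency mode possible.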
License
SoulMate API is licensed under Business Source License 1.1 (BSL 1.1):
- ✅ Self-host for internal use — unrestricted, no fee
- ✅ Use the hosted API — free tier + paid plans
- ✅ Read, modify, fork — full source on GitHub
- ❌ Cannot run a competing commercial hosted memory service
- 📅 Converts to MIT on March 4, 2030 — fully open source after that
For commercial hosting licenses (white-label, OEM, reseller): prahlad.menon@quant.md
Get Started
Developers: Get a free API key at soulmate.thinkcreateai.com — choose v1 (simple) or v2 (RAG+Qdrant). 1,000 free memory operations per month, no credit card.
```bash
pip install soul-agent --upgrade
```

```python
from soulmate import SoulMateClient

# v2 — semantic RAG memory
sm = SoulMateClient(
    api_key="sm_live_xxxxx",
    base_url="https://soulmate-api-v2.themenonlab.com",
    llm_provider="anthropic",
    llm_key="sk-ant-..."
)
```
Enterprise: Book a call to discuss your use case, compliance requirements, and deployment options.
Related
- soul.py: Persistent Memory for LLM Agents — The open-source foundation
- soul.py v2.0: RAG + RLM Hybrid — How the retrieval works
- Meet Darwin: The Soul Book Companion — See persistent memory in action
- Open Source Projects — All Menon Lab releases