Introducing SoulMate: Persistent AI Memory as a Service

We built soul.py to solve a fundamental problem: AI agents forget everything between sessions. Two markdown files — SOUL.md for identity, MEMORY.md for memory — and suddenly your agent remembers who it is and who it’s talking to.
The response was overwhelming. The library hit #1 on r/ollama with 50,000+ views. Developers started building with it immediately.
Then the enterprise requests started coming in.
The Enterprise Problem
soul.py works beautifully for individual developers and small teams. But enterprises have different requirements:
Scale. Not one agent remembering one user — millions of customers, each with their own persistent context. A telecom company handling support calls needs per-customer memory for every subscriber.
Compliance. Healthcare needs HIPAA. Finance needs SOC 2. GDPR requires the ability to delete customer data on request. Memory infrastructure has to support all of this.
Operations. Self-hosting is overhead. Enterprises want managed infrastructure — APIs they can call, not servers they have to maintain.
Integration. Engineering teams don’t want to learn a new framework. They want a REST API that fits into their existing stack.
soul.py is the primitive. Enterprises need the platform.
Introducing SoulMate
SoulMate is persistent AI memory as a service. The architecture is simple:
You bring the LLM. Use your existing Anthropic, OpenAI, or Ollama setup. Keep your model relationships, your rate limits, your enterprise agreements. SoulMate doesn’t touch your LLM tokens.
We handle the memory. Per-customer persistent memory, managed and scaled. Every customer interaction builds context. Every conversation remembers the last one. No database setup, no vector store configuration, no infrastructure to maintain.
Think of it like Pinecone for memory. Pinecone charges for vector storage, not the embedding model. SoulMate charges for memory operations, not the LLM. You keep control of the AI; we handle what it remembers.
How It Works
The SoulMate API is a REST service with a simple mental model:
```python
from soulmate import SoulMateClient

# Initialize with your API key and LLM credentials
sm = SoulMateClient(
    api_key="sm_live_xxxxx",      # Your SoulMate key
    llm_provider="anthropic",     # Your LLM provider
    llm_key="sk-ant-api03-..."    # Your LLM key (BYOK)
)

# Every call maintains per-customer memory
response = sm.ask("customer_123", "What's my account status?")
# SoulMate remembers this customer's entire history

# Later, same customer, memory persists
response = sm.ask("customer_123", "What about that issue I mentioned last week?")
# Context from "last week" is automatically available
```
Under the hood, SoulMate:
- Maintains a persistent memory store for each customer_id
- Retrieves relevant context using RAG + RLM hybrid routing
- Constructs the prompt with identity (SOUL.md) and relevant memories
- Calls your LLM with the enriched context
- Stores the interaction for future reference
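That loop can be sketched in a few lines as a toy in-memory version. The class and attribute names here are illustrative, not the service's internals; the hosted API persists memories to a database and retrieves selectively rather than loading everything:

```python
# Toy sketch of the ask() pipeline: store per-customer memory,
# build a prompt from identity + memories, call the LLM, record
# the exchange. (Illustrative only; not SoulMate's actual code.)

class MemorySketch:
    def __init__(self, soul: str):
        self.soul = soul                        # identity (SOUL.md contents)
        self.store: dict[str, list[str]] = {}   # per-customer memory store

    def build_prompt(self, customer_id: str, message: str) -> str:
        memories = "\n".join(self.store.get(customer_id, []))
        return f"{self.soul}\n\n# Memories\n{memories}\n\n# User\n{message}"

    def ask(self, customer_id: str, message: str, llm) -> str:
        prompt = self.build_prompt(customer_id, message)
        answer = llm(prompt)                    # your LLM call goes here
        # store the interaction for future reference
        log = self.store.setdefault(customer_id, [])
        log.append(f"user: {message}")
        log.append(f"assistant: {answer}")
        return answer
```

The second ask() for the same customer_id sees the first exchange in its prompt, which is the whole trick: memory lives outside the model, and the model is re-briefed on every call.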
You get persistent, context-aware AI without managing any of the memory infrastructure.
The API
SoulMate exposes a clean REST API:
| Endpoint | Purpose |
|---|---|
| POST /v1/ask | Send a message, get a response with memory |
| GET /v1/memory/{customer_id} | Retrieve a customer’s stored memory |
| DELETE /v1/memory/{customer_id} | GDPR delete — remove all customer data |
| POST /v1/souls | Upload SOUL.md configurations |
| GET /v1/usage | Track your API usage |
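As a sketch of how a thin client might target these endpoints, the helpers below only build request descriptions rather than send them. The payload field names are assumptions for illustration, not a documented schema:

```python
# Build request specs for the endpoints in the table above.
# The JSON field names ("customer_id", "message") are assumed,
# not taken from official API docs.

BASE = "https://soulmate-api.themenonlab.com"

def ask_request(customer_id: str, message: str) -> dict:
    """POST /v1/ask — send a message, get a memory-aware response."""
    return {
        "method": "POST",
        "url": f"{BASE}/v1/ask",
        "json": {"customer_id": customer_id, "message": message},
    }

def gdpr_delete_request(customer_id: str) -> dict:
    """DELETE /v1/memory/{customer_id} — remove all data for one customer."""
    return {"method": "DELETE", "url": f"{BASE}/v1/memory/{customer_id}"}
```

In practice you would hand these specs to any HTTP client; the point is that the whole surface is plain REST, so no SDK is strictly required.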
SoulMate v2: Qdrant-Powered Semantic Memory
Released March 2026
v1 works. It’s simple, battle-tested, and production-ready. But it has one fundamental limitation: every memory gets loaded into context every time. As your agents accumulate history, context windows fill up and costs climb.
v2 solves this with real vector search.
Instead of injecting all memories into every prompt, v2 uses Qdrant and Azure embeddings to semantically retrieve only the top-8 most relevant memories for each query. Context stays lean regardless of how much the agent has accumulated. Retrieval scales to millions of entries per customer.
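The retrieval step can be illustrated with plain cosine similarity over toy vectors — a simplified stand-in for Qdrant's approximate nearest-neighbor search over Azure embeddings:

```python
# Top-k semantic retrieval sketch: rank stored memories by cosine
# similarity to the query embedding and keep only the best k.
# (Toy stand-in; the real system uses Qdrant ANN search.)
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, memories, k=8):
    """memories: list of (text, embedding) pairs; returns k best texts."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Because only the top-k texts ever reach the prompt, the context footprint is bounded by k, not by how many memories exist — which is exactly the constant-cost property described below.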
What’s New in v2
Managed Qdrant collections. Every customer automatically gets a dedicated vector collection (sm_{customer_id}). No setup, no config — it’s provisioned on first write. The Qdrant infrastructure is fully managed; you never touch it.
Azure text-embedding-3-large. State-of-the-art embeddings. Also fully managed — no Azure account or embedding key required on your side.
Semantic retrieval. Instead of “all memories,” each query gets the 8 memories most semantically similar to the question being asked. Relevant context, not exhaustive context.
Constant token cost. Whether your agent has 10 memories or 100,000, the context window footprint stays the same. Costs don’t compound with usage.
rag_hits in every response. The response now tells you exactly how many memories matched the query — useful for debugging and understanding what the agent knew.
v1 vs v2 at a Glance
| | v1 — Classic | v2 — RAG + Qdrant |
|---|---|---|
| Storage | Flat MEMORY.md | Qdrant vector collections |
| Retrieval | All memories → context | Top-8 semantic matches only |
| Embeddings | None | Azure text-embedding-3-large (managed) |
| Context size | Grows with history | Constant regardless of size |
| Response | answer | answer + rag_hits |
| Scale | Lightweight agents | Millions of memories per customer |
| Endpoint | soulmate-api.themenonlab.com | soulmate-api-v2.themenonlab.com |
Migrating from v1 to v2
Zero code changes required beyond the base URL:
```python
# v1
sm = SoulMateClient(
    api_key="sm_live_xxxxx",
    llm_provider="anthropic",
    llm_key="sk-ant-..."
)

# v2 — one line different
sm = SoulMateClient(
    api_key="sm_live_xxxxx",    # Same key works on both
    base_url="https://soulmate-api-v2.themenonlab.com",
    llm_provider="anthropic",
    llm_key="sk-ant-..."
)

resp = sm.ask("user_123", "What's my account status?")
print(resp["rag_hits"])  # How many memories matched
print(resp["answer"])    # Response with relevant context injected
```
The same API key works on both v1 and v2. You can run both in parallel, migrate incrementally, or keep different agents on different versions. v1 stays live and maintained.
Use Cases
We’re seeing early interest in verticals where customer relationships matter and context compounds:
Telecom Support. Verizon, AT&T, and T-Mobile handle millions of support calls monthly. Imagine if the AI already knew each customer’s plan, devices, history, and previous issues before they said a word. No more “let me pull up your account.” Handle time could drop an estimated 40-60%.

Healthcare. Patient-facing AI that remembers medical history, care preferences, and prior interactions. HIPAA compliance via the self-hosted deployment option. Reduces intake friction and improves care continuity.

Financial Services. Wealth management AI that knows each client’s portfolio, risk tolerance, life events, and goals. Personalization at scale without relationship-manager overhead.

Retail & E-Commerce. Shopping AI that remembers preferences, past purchases, size profiles, and gift occasions. True relationship commerce across every touchpoint.
Pricing Model
SoulMate uses consumption-based pricing:
- Memory operations — reads, writes, deletes
- Storage — per-customer memory footprint
- No LLM markup — your tokens, your costs
Free tier for developers to experiment. Enterprise tiers with SLAs, dedicated support, and compliance certifications.
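As a back-of-the-envelope illustration of consumption-based billing, here is a sketch with placeholder rates — the per-operation and storage prices below are hypothetical, not published pricing:

```python
# Hypothetical cost model: pay per memory operation plus storage.
# op_rate and storage_rate are made-up placeholder prices.

def monthly_cost(reads: int, writes: int, deletes: int, storage_mb: float,
                 op_rate: float = 0.0001, storage_rate: float = 0.02) -> float:
    """Total = (all memory operations x op_rate) + (storage_mb x storage_rate)."""
    ops = reads + writes + deletes
    return ops * op_rate + storage_mb * storage_rate
```

The key property is what is absent: LLM tokens never appear in the formula, because those are billed by your provider at your negotiated rates.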
The Ecosystem
SoulMate sits in a layered architecture:
```
┌─────────────────────────────────────────┐
│ Your Application                        │
├─────────────────────────────────────────┤
│ SoulMate v2 API (Qdrant + RAG memory)   │ ← Hosted, enterprise
├─────────────────────────────────────────┤
│ SoulMate v1 API (markdown memory)       │ ← Hosted, getting started
├─────────────────────────────────────────┤
│ Self-hosted via Docker (v1 or v2)       │ ← Your infra, your data
├─────────────────────────────────────────┤
│ soul.py (open source, MIT)              │ ← Build your own
├─────────────────────────────────────────┤
│ LLM Provider (Anthropic/OpenAI)         │ ← You choose
└─────────────────────────────────────────┘
```
soul.py stays open source (MIT). It’s the credibility engine, the community funnel, and the foundation everything builds on. If you want to self-host and manage your own memory infrastructure, soul.py gives you everything you need.
SoulMate is for teams that want managed infrastructure. Less ops, more building. The same persistent memory patterns, hosted and scaled.
Self-Hosting
Both v1 and v2 are available as Docker images on Docker Hub. Self-hosting for internal use is fully permitted under the license — no restrictions, no fees.
```bash
# v1 — simple markdown memory
docker pull pgmenon/soulmate-api:latest
docker run -p 8000:8000 \
  -e SUPABASE_URL=your_url \
  -e SUPABASE_SERVICE_KEY=your_key \
  -v /data/soulmate:/data/soulmate \
  pgmenon/soulmate-api:latest

# v2 — Qdrant + RAG memory
docker pull pgmenon/soulmate-api-v2:latest
docker run -p 8000:8000 \
  -e SUPABASE_URL=your_url \
  -e SUPABASE_SERVICE_KEY=your_key \
  -e QDRANT_URL=your_qdrant_url \
  -e QDRANT_API_KEY=your_qdrant_key \
  -e AZURE_EMBEDDING_ENDPOINT=your_azure_endpoint \
  -e AZURE_EMBEDDING_KEY=your_azure_key \
  -e AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large \
  -v /data/soulmate:/data/soulmate \
  pgmenon/soulmate-api-v2:latest
```
v2 falls back to BM25 keyword search automatically if Qdrant/Azure credentials aren’t provided — so you can run it without any vector-search dependencies and upgrade to full semantic retrieval later.
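The fallback can be illustrated with a minimal BM25 scorer. This is a textbook sketch of the Okapi BM25 formula, not the service's actual implementation:

```python
# Minimal BM25 keyword scoring: rank tokenized documents against a
# tokenized query using term frequency, inverse document frequency,
# and document-length normalization. (Textbook sketch only.)
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """query: list of tokens; docs: list of token lists. Returns one score per doc."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Keyword matching is cruder than embeddings (no synonyms, no paraphrase), but it needs nothing beyond the text itself, which is what makes the zero-dependency mode possible.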
License
SoulMate API is licensed under Business Source License 1.1 (BSL 1.1):
- ✅ Self-host for internal use — unrestricted, no fee
- ✅ Use the hosted API — free tier + paid plans
- ✅ Read, modify, fork — full source on GitHub
- ❌ Cannot run a competing commercial hosted memory service
- 📅 Converts to MIT on March 4, 2030 — fully open source after that
For commercial hosting licenses (white-label, OEM, reseller): prahlad.menon@quant.md
Get Started
Developers: Get a free API key at soulmate.thinkcreateai.com — choose v1 (simple) or v2 (RAG+Qdrant). 1,000 free memory operations per month, no credit card.
```bash
pip install soul-agent --upgrade
```

```python
from soulmate import SoulMateClient

# v2 — semantic RAG memory
sm = SoulMateClient(
    api_key="sm_live_xxxxx",
    base_url="https://soulmate-api-v2.themenonlab.com",
    llm_provider="anthropic",
    llm_key="sk-ant-..."
)
```
Enterprise: Book a call to discuss your use case, compliance requirements, and deployment options.
Related
- soul.py: Persistent Memory for LLM Agents — The open-source foundation
- soul.py v2.0: RAG + RLM Hybrid — How the retrieval works
- Meet Darwin: The Soul Book Companion — See persistent memory in action
- Open Source Projects — All Menon Lab releases