Introducing SoulMate: Persistent AI Memory as a Service

By Prahlad Menon · 3 min read


We built soul.py to solve a fundamental problem: AI agents forget everything between sessions. Two markdown files — SOUL.md for identity, MEMORY.md for memory — and suddenly your agent remembers who it is and who it’s talking to.

The response was overwhelming. The library hit #1 on r/ollama with 50,000+ views. Developers started building with it immediately.

Then the enterprise requests started coming in.

The Enterprise Problem

soul.py works beautifully for individual developers and small teams. But enterprises have different requirements:

Scale. Not one agent remembering one user — millions of customers, each with their own persistent context. A telecom company handling support calls needs per-customer memory for every subscriber.

Compliance. Healthcare needs HIPAA. Finance needs SOC 2. GDPR requires the ability to delete customer data on request. Memory infrastructure has to support all of this.

Operations. Self-hosting is overhead. Enterprises want managed infrastructure — APIs they can call, not servers they have to maintain.

Integration. Engineering teams don’t want to learn a new framework. They want a REST API that fits into their existing stack.

soul.py is the primitive. Enterprises need the platform.

Introducing SoulMate

SoulMate is persistent AI memory as a service. The architecture is simple:

You bring the LLM. Use your existing Anthropic, OpenAI, or Ollama setup. Keep your model relationships, your rate limits, your enterprise agreements. SoulMate doesn’t touch your LLM tokens.

We handle the memory. Per-customer persistent memory, managed and scaled. Every customer interaction builds context. Every conversation remembers the last one. No database setup, no vector store configuration, no infrastructure to maintain.

Think of it like Pinecone for memory. Pinecone charges for vector storage, not the embedding model. SoulMate charges for memory operations, not the LLM. You keep control of the AI; we handle what it remembers.

How It Works

The SoulMate API is a REST service with a simple mental model:

from soulmate import SoulMateClient

# Initialize with your API key and LLM credentials
sm = SoulMateClient(
    api_key="sm_live_xxxxx",           # Your SoulMate key
    llm_provider="anthropic",           # Your LLM provider
    llm_key="sk-ant-api03-..."          # Your LLM key (BYOK)
)

# Every call maintains per-customer memory
response = sm.ask("customer_123", "What's my account status?")
# SoulMate remembers this customer's entire history

# Later, same customer, memory persists
response = sm.ask("customer_123", "What about that issue I mentioned last week?")
# Context from "last week" is automatically available

Under the hood, SoulMate:

  1. Maintains a persistent memory store for each customer_id
  2. Retrieves relevant context using RAG + RLM hybrid routing
  3. Constructs the prompt with identity (SOUL.md) and relevant memories
  4. Calls your LLM with the enriched context
  5. Stores the interaction for future reference

You get persistent, context-aware AI without managing any of the memory infrastructure.
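The five steps above can be sketched as a toy loop. This is illustrative only: the class and method names are hypothetical, the keyword overlap stands in for SoulMate's actual RAG routing, and the LLM call is stubbed out.

```python
# Toy version of the per-customer memory loop described above.
from collections import defaultdict

class MemoryLoop:
    def __init__(self, soul, llm):
        self.soul = soul                  # identity (SOUL.md contents)
        self.llm = llm                    # any callable(prompt) -> str
        self.store = defaultdict(list)    # step 1: per-customer memory store

    def relevant(self, customer_id, query, k=8):
        # step 2: naive keyword overlap standing in for real retrieval
        words = set(query.lower().split())
        scored = sorted(self.store[customer_id],
                        key=lambda m: -len(words & set(m.lower().split())))
        return scored[:k]

    def ask(self, customer_id, message):
        memories = self.relevant(customer_id, message)
        # step 3: construct the prompt with identity + relevant memories
        prompt = (self.soul + "\n\nMemories:\n" + "\n".join(memories)
                  + "\n\nUser: " + message)
        answer = self.llm(prompt)         # step 4: call your LLM
        # step 5: store the interaction for future reference
        self.store[customer_id].append(f"U: {message} | A: {answer}")
        return answer
```

A stubbed LLM makes the persistence visible: ask once, and the next call's retrieval step can see the earlier interaction.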

The API

SoulMate exposes a clean REST API:

Endpoint                            Purpose
POST   /v1/ask                      Send a message, get a response with memory
GET    /v1/memory/{customer_id}     Retrieve a customer’s stored memory
DELETE /v1/memory/{customer_id}     GDPR delete — remove all customer data
POST   /v1/souls                    Upload SOUL.md configurations
GET    /v1/usage                    Track your API usage
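To make the table concrete, here is how those endpoints map onto plain HTTP requests. The base URL comes from the comparison table later in this post; the payload shape and `Bearer` auth header are assumptions, not documented client behavior.

```python
# Sketch of raw HTTP calls against the endpoints listed above.
# Payload fields and auth scheme are assumptions for illustration.
import json
from urllib import request

BASE = "https://soulmate-api.themenonlab.com"

def build_request(method, path, api_key, body=None):
    data = json.dumps(body).encode() if body is not None else None
    return request.Request(
        BASE + path,
        data=data,
        method=method,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

# POST /v1/ask -- send a message, get a response with memory
ask = build_request("POST", "/v1/ask", "sm_live_xxxxx",
                    {"customer_id": "customer_123",
                     "message": "What's my account status?"})

# DELETE /v1/memory/{customer_id} -- GDPR delete
forget = build_request("DELETE", "/v1/memory/customer_123", "sm_live_xxxxx")
```

Each `Request` object would then be sent with `urllib.request.urlopen` (or any HTTP client); the SDK shown elsewhere in this post wraps the same calls.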

SoulMate v2: Qdrant-Powered Semantic Memory

Released March 2026

v1 works. It’s simple, battle-tested, and production-ready. But it has one fundamental limitation: every memory gets loaded into context every time. As your agents accumulate history, context windows fill up and costs climb.

v2 solves this with real vector search.

Instead of injecting all memories into every prompt, v2 uses Qdrant and Azure embeddings to semantically retrieve only the top-8 most relevant memories for each query. Context stays lean regardless of how much the agent has accumulated. Retrieval scales to millions of entries per customer.
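The core idea behind "top-8 most relevant" is cosine-similarity ranking over embedding vectors. The sketch below uses tiny hand-made vectors in place of Azure text-embedding-3-large outputs, and a linear scan in place of Qdrant's indexed search; it shows the ranking logic, not the production path.

```python
# Illustrative top-k semantic retrieval over (text, embedding) pairs.
# Toy 3-dim vectors stand in for real embedding-model outputs.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, memories, k=8):
    # memories: list of (text, embedding) pairs
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

Because only the top k texts reach the prompt, the context size is bounded by k, not by how many memories the customer has accumulated.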

What’s New in v2

Managed Qdrant collections. Every customer automatically gets a dedicated vector collection (sm_{customer_id}). No setup, no config — it’s provisioned on first write. The Qdrant infrastructure is fully managed; you never touch it.
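Provision-on-first-write is simple to picture: the collection named `sm_{customer_id}` comes into existence the first time that customer's memory is written. The sketch below is a toy with a dict in place of Qdrant; the managed service does this server-side.

```python
# Toy provision-on-first-write: a collection named sm_{customer_id}
# is created lazily on that customer's first memory write.
collections = {}

def write_memory(customer_id, text):
    name = f"sm_{customer_id}"
    if name not in collections:       # provisioned on first write
        collections[name] = []
    collections[name].append(text)
    return name
```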

Azure text-embedding-3-large. State-of-the-art embeddings. Also fully managed — no Azure account or embedding key required on your side.

Semantic retrieval. Instead of “all memories,” each query gets the 8 memories most semantically similar to the question being asked. Relevant context, not exhaustive context.

Constant token cost. Whether your agent has 10 memories or 100,000, the context window footprint stays the same. Costs don’t compound with usage.

rag_hits in every response. The response now tells you exactly how many memories matched the query — useful for debugging and understanding what the agent knew.

v1 vs v2 at a Glance

                 v1 — Classic                    v2 — RAG + Qdrant
Storage          Flat MEMORY.md                  Qdrant vector collections
Retrieval        All memories → context          Top-8 semantic matches only
Embeddings       None                            Azure text-embedding-3-large (managed)
Context size     Grows with history              Constant regardless of size
Response         answer                          answer + rag_hits
Scale            Lightweight agents              Millions of memories per customer
Endpoint         soulmate-api.themenonlab.com    soulmate-api-v2.themenonlab.com

Migrating from v1 to v2

Zero code changes required beyond the base URL:

# v1
sm = SoulMateClient(
    api_key="sm_live_xxxxx",
    llm_provider="anthropic",
    llm_key="sk-ant-..."
)

# v2 — one line different
sm = SoulMateClient(
    api_key="sm_live_xxxxx",           # Same key works on both
    base_url="https://soulmate-api-v2.themenonlab.com",
    llm_provider="anthropic",
    llm_key="sk-ant-..."
)

resp = sm.ask("user_123", "What's my account status?")
print(resp["rag_hits"])   # How many memories matched
print(resp["answer"])     # Response with relevant context injected

The same API key works on both v1 and v2. You can run both in parallel, migrate incrementally, or keep different agents on different versions. v1 stays live and maintained.


Use Cases

We’re seeing early interest in verticals where customer relationships matter and context compounds:

Telecom Support
Verizon, AT&T, and T-Mobile handle millions of support calls monthly. Imagine if the AI already knew each customer’s plan, devices, history, and previous issues before they said a word. No more “let me pull up your account.” Handle time could drop 40-60%.

Healthcare
Patient-facing AI that remembers medical history, care preferences, and prior interactions. HIPAA-compliant via self-hosted deployment option. Reduces intake friction, improves care continuity.

Financial Services
Wealth management AI that knows each client’s portfolio, risk tolerance, life events, and goals. Personalization at scale without relationship manager overhead.

Retail & E-Commerce
Shopping AI that remembers preferences, past purchases, size profiles, and gift occasions. True relationship commerce across every touchpoint.

Pricing Model

SoulMate uses consumption-based pricing:

  • Memory operations — reads, writes, deletes
  • Storage — per-customer memory footprint
  • No LLM markup — your tokens, your costs

Free tier for developers to experiment. Enterprise tiers with SLAs, dedicated support, and compliance certifications.

The Ecosystem

SoulMate sits in a layered architecture:

┌─────────────────────────────────────────┐
│           Your Application              │
├─────────────────────────────────────────┤
│  SoulMate v2 API (Qdrant + RAG memory)  │  ← Hosted, enterprise
├─────────────────────────────────────────┤
│  SoulMate v1 API (markdown memory)      │  ← Hosted, getting started
├─────────────────────────────────────────┤
│   Self-hosted via Docker (v1 or v2)     │  ← Your infra, your data
├─────────────────────────────────────────┤
│         soul.py (open source, MIT)      │  ← Build your own
├─────────────────────────────────────────┤
│     LLM Provider (Anthropic/OpenAI)     │  ← You choose
└─────────────────────────────────────────┘

soul.py stays open source (MIT). It’s the credibility engine, the community funnel, and the foundation everything builds on. If you want to self-host and manage your own memory infrastructure, soul.py gives you everything you need.

SoulMate is for teams that want managed infrastructure. Less ops, more building. The same persistent memory patterns, hosted and scaled.

Self-Hosting

Both v1 and v2 are available as Docker images on Docker Hub. Self-hosting for internal use is fully permitted under the license — no restrictions, no fees.

# v1 — simple markdown memory
docker pull pgmenon/soulmate-api:latest
docker run -p 8000:8000 \
  -e SUPABASE_URL=your_url \
  -e SUPABASE_SERVICE_KEY=your_key \
  -v /data/soulmate:/data/soulmate \
  pgmenon/soulmate-api:latest

# v2 — Qdrant + RAG memory
docker pull pgmenon/soulmate-api-v2:latest
docker run -p 8000:8000 \
  -e SUPABASE_URL=your_url \
  -e SUPABASE_SERVICE_KEY=your_key \
  -e QDRANT_URL=your_qdrant_url \
  -e QDRANT_API_KEY=your_qdrant_key \
  -e AZURE_EMBEDDING_ENDPOINT=your_azure_endpoint \
  -e AZURE_EMBEDDING_KEY=your_azure_key \
  -e AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large \
  -v /data/soulmate:/data/soulmate \
  pgmenon/soulmate-api-v2:latest

v2 falls back to BM25 keyword search automatically if Qdrant/Azure credentials aren’t provided — so you can run it with zero external dependencies and upgrade to full vector search later.
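For intuition about what that fallback buys you, here is a minimal BM25 scorer. The `k1` and `b` values are the commonly used defaults, not necessarily what the API ships with, and this toy tokenizes by whitespace only.

```python
# Minimal BM25 keyword scoring, illustrating the fallback retrieval
# path when no Qdrant/Azure credentials are configured.
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for terms in tokenized:
        tf = Counter(terms)
        score = 0.0
        for q in query.lower().split():
            df = sum(1 for t in tokenized if q in t)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            denom = tf[q] + k1 * (1 - b + b * len(terms) / avgdl)
            score += idf * tf[q] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Keyword scoring like this needs no embedding model or vector store, which is why the zero-dependency mode works; semantic search simply ranks by meaning instead of exact word overlap once Qdrant and Azure are wired in.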

License

SoulMate API is licensed under Business Source License 1.1 (BSL 1.1):

  • Self-host for internal use — unrestricted, no fee
  • Use the hosted API — free tier + paid plans
  • Read, modify, fork — full source on GitHub
  • Cannot run a competing commercial hosted memory service
  • Converts to MIT on March 4, 2030 — fully open source after that

For commercial hosting licenses (white-label, OEM, reseller): prahlad.menon@quant.md

Get Started

Developers: Get a free API key at soulmate.thinkcreateai.com — choose v1 (simple) or v2 (RAG+Qdrant). 1,000 free memory operations per month, no credit card.

pip install soul-agent --upgrade

from soulmate import SoulMateClient

# v2 — semantic RAG memory
sm = SoulMateClient(
    api_key="sm_live_xxxxx",
    base_url="https://soulmate-api-v2.themenonlab.com",
    llm_provider="anthropic",
    llm_key="sk-ant-..."
)

Enterprise: Book a call to discuss your use case, compliance requirements, and deployment options.

→ Get your API key