ReMe: The Agent Memory Framework That Tracks Which Tools Actually Work

By Prahlad Menon · 5 min read

TL;DR: ReMe is a memory framework that gives agents four types of memory: personal, task, tool, and working. The killer feature is tool memory — it tracks every API call, records success rates and token costs, and turns that history into dynamic usage guidelines, yielding a 15%+ improvement in tool selection accuracy. Install: pip install reme-ai

Your AI agent has access to 50 tools. How does it know which one to use?

Right now, it reads the tool description. That’s it. No performance data, no success rates, no “this one times out when you ask for more than 20 results.” Just a static string.

ReMe fixes this.

What Problem Does ReMe Solve?

ReMe solves the stateless agent problem — agents that start from scratch every session, never learning from their own experience.

It tackles two core issues:

  1. Limited context window — early information gets truncated in long conversations
  2. Stateless sessions — new sessions can’t inherit history

But the real innovation is tool memory: a system that learns which tools actually work.

What Are the 4 Types of Memory?

ReMe provides four distinct memory types:

| Memory Type | Purpose | Example |
| --- | --- | --- |
| Personal | User preferences, long-term facts | "User prefers Python over JavaScript" |
| Task | Success/failure patterns from past tasks | "Navigation tasks work better with smaller steps" |
| Tool | API performance history, optimal parameters | "web_search succeeds 92% with max_results=10" |
| Working | Current session context, active state | Token-aware conversation buffer |

Each type serves a different cognitive function — just like human memory isn’t one monolithic thing.

How Does Tool Memory Work?

This is the standout feature. Here’s the problem it solves:

Before Tool Memory:

Tool A: "Search the web for information"
Tool B: "Perform web searches with customizable parameters"  
Tool C: "Query search engines and return results"

The descriptions are nearly identical. But in reality:

  • Tool A: 95% success rate, 2.3s average, best for technical queries
  • Tool B: 70% success rate, 5.8s average, often times out
  • Tool C: 85% success rate, 3.1s average, good for general queries

After Tool Memory:

The agent sees enriched context:

Tool: web_search

Static Description:
"Search the web for information"

+ Tool Memory Context:
"Based on 150 historical calls:
- Success rate: 92% (138 successful, 12 failed)
- Avg time: 2.3s, Avg tokens: 150
- Best for: Technical documentation (95% success)
- Optimal params: max_results=5-20, language='en'
- Common failures: Generic queries timeout
- Recommendation: Use specific multi-word queries"

The agent now makes data-driven decisions instead of guessing.
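The enrichment step is, at heart, plain bookkeeping: aggregate per-tool call records into summary statistics and append them to the static description. The `ToolStats` class below is an illustrative sketch of that idea, not ReMe's actual internals — all names and thresholds are mine.

```python
from dataclasses import dataclass


@dataclass
class ToolStats:
    """Illustrative per-tool call bookkeeping (not ReMe's internal API)."""
    calls: int = 0
    successes: int = 0
    total_time: float = 0.0
    total_tokens: int = 0

    def record(self, success: bool, time_cost: float, token_cost: int) -> None:
        self.calls += 1
        self.successes += int(success)
        self.total_time += time_cost
        self.total_tokens += token_cost

    def enrich(self, static_description: str) -> str:
        """Append a data-driven summary to the static tool description."""
        rate = 100 * self.successes / self.calls
        return (
            f"{static_description}\n"
            f"+ Tool Memory Context: based on {self.calls} historical calls:\n"
            f"- Success rate: {rate:.0f}% ({self.successes} successful, "
            f"{self.calls - self.successes} failed)\n"
            f"- Avg time: {self.total_time / self.calls:.1f}s, "
            f"Avg tokens: {self.total_tokens // self.calls}"
        )


# Record three illustrative calls, then build the enriched description
stats = ToolStats()
for ok, secs, toks in [(True, 2.1, 140), (True, 2.5, 160), (False, 5.0, 150)]:
    stats.record(ok, secs, toks)
print(stats.enrich("Search the web for information"))
```

Because the summary lives alongside the static description rather than replacing it, the model still sees the tool author's intent — the history only adds evidence on top.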

What’s the Performance Impact?

ReMe’s benchmarks show significant improvements:

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Success rate | 75% | 92% | +17% |
| Average time | 5.2s | 2.8s | -46% |
| Token cost | 200+ | 150 | -25% |

The tool memory alone drives a 15%+ improvement in tool selection accuracy from historical performance data.

ReMe also achieves state-of-the-art results on the LoCoMo and HaluMem benchmarks for agent memory evaluation.

How Do I Install ReMe?

# Full installation
pip install reme-ai

# Light version (file-based only; quote the extra so zsh doesn't expand the brackets)
pip install "reme-ai[light]"

# From source
git clone https://github.com/agentscope-ai/ReMe.git
cd ReMe
pip install -e ".[light]"

How Do I Use Tool Memory?

from reme import ToolMemoryManager

# Initialize tool memory manager
manager = ToolMemoryManager(workspace_id="my-agent")

# After each tool call, record the result
# (these awaits must run inside an async function / event loop)
await manager.add_tool_call_result(
    tool_name="web_search",
    input={"query": "python async tutorial"},
    output="Found 15 results...",
    success=True,
    time_cost=2.3,
    token_cost=150
)

# Before calling a tool, retrieve learned guidelines
guidelines = await manager.retrieve_tool_memory("web_search")
# Returns: success rates, optimal params, failure patterns

The system automatically:

  • Evaluates the quality of each call (LLM-as-judge)
  • Synthesizes usage guidelines from patterns
  • Updates recommendations as it learns more
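To make the synthesis step concrete, here is a rule-based stand-in for it (ReMe uses an LLM-as-judge; this heuristic, and every name in it, is purely illustrative): group recorded calls by a parameter value and flag values whose observed success rate falls below a threshold.

```python
from collections import defaultdict


def synthesize_guidelines(records: list[dict], param: str,
                          threshold: float = 0.8) -> list[str]:
    """Flag values of `param` whose observed success rate is below `threshold`.

    Heuristic stand-in for ReMe's LLM-based guideline synthesis.
    """
    by_value = defaultdict(list)
    for rec in records:
        by_value[rec["input"].get(param)].append(rec["success"])

    guidelines = []
    for value, outcomes in sorted(by_value.items(), key=lambda kv: str(kv[0])):
        rate = sum(outcomes) / len(outcomes)
        if rate < threshold:
            guidelines.append(
                f"Avoid {param}={value!r}: {rate:.0%} success over {len(outcomes)} calls"
            )
    return guidelines


# Four illustrative call records: small result sets succeed, large ones are flaky
records = [
    {"input": {"max_results": 10}, "success": True},
    {"input": {"max_results": 10}, "success": True},
    {"input": {"max_results": 50}, "success": False},
    {"input": {"max_results": 50}, "success": True},
]
print(synthesize_guidelines(records, "max_results"))
```

An LLM-based judge can additionally read the query text and outputs to explain *why* a pattern fails; the statistical grouping above only tells you *that* it fails.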

How Does the File-Based Memory Work?

ReMe stores memory as readable Markdown files:

working_dir/
├── MEMORY.md          # Long-term memory
├── memory/
│   └── 2026-03-22.md  # Daily journal (auto-written)
├── dialog/
│   └── 2026-03-22.jsonl  # Raw conversation records
└── tool_result/
    └── <uuid>.txt     # Cached tool outputs
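Because the store is just files, the layout above can be written (or read) with nothing beyond the standard library. A minimal sketch of appending a dated journal entry, assuming only the directory convention shown above (the function name is mine, not ReMe's API):

```python
import tempfile
from datetime import date
from pathlib import Path


def append_journal(working_dir: str, entry: str) -> Path:
    """Append a bullet to today's journal file (memory/YYYY-MM-DD.md)."""
    journal_dir = Path(working_dir) / "memory"
    journal_dir.mkdir(parents=True, exist_ok=True)
    journal_file = journal_dir / f"{date.today().isoformat()}.md"
    with journal_file.open("a", encoding="utf-8") as f:
        f.write(f"- {entry}\n")
    return journal_file


# Demo in a throwaway directory so nothing in the real workspace is touched
demo_dir = tempfile.mkdtemp()
path = append_journal(demo_dir, "User asked about async tutorials")
print(path.read_text(encoding="utf-8"))
```

The same property cuts both ways: you can fix a bad memory with a text editor, and `git diff` shows you exactly what the agent learned today.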

This is similar to soul.py’s approach — human-readable, git-versionable, directly editable. The difference is ReMe’s focus on operational memory types vs soul.py’s focus on identity and conversational memory.

How Does ReMe Compare to Other Memory Solutions?

| Feature | ReMe | soul.py | Mem0 |
| --- | --- | --- | --- |
| Memory types | 4 (personal, task, tool, working) | 2 (identity, conversational) | 1 (general) |
| Tool performance tracking | ✅ Full history | — | — |
| File-based storage | ✅ Markdown | ✅ Markdown | — |
| Vector retrieval | ✅ Hybrid | ✅ RAG+RLM | ✅ RAG |
| Context compression | ✅ Auto-compact | — | — |
| MCP ecosystem focus | ✅ | — | — |

They’re complementary. ReMe excels at operational memory (especially tool learning). soul.py excels at identity persistence and conversational memory. You could use both.

When Should I Use ReMe?

Use ReMe when:

  • Your agent uses many tools and needs to learn which work best
  • You’re building MCP-based workflows
  • You need automatic context compression for long sessions
  • You want file-based memory you can inspect and edit

Consider alternatives when:

  • You primarily need conversational memory (soul.py)
  • You need simple key-value memory (most built-in solutions)
  • You’re not using tool-heavy workflows

What’s the Architecture?

ReMe uses a pre-reasoning hook that runs before each agent step:

  1. compact_tool_result — Truncate long tool outputs
  2. check_context — Count tokens, decide if compaction needed
  3. compact_memory — Generate summary if over limit
  4. summary_memory — Async persistence to files

The tool memory layer sits alongside this, tracking every tool invocation and synthesizing guidelines.
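The four hooks above can be sketched as a single pipeline over the message list. Everything here is illustrative: the token estimate is a crude characters-divided-by-four heuristic, the limit is a toy value so the demo triggers compaction, and the summarizer is a stub standing in for ReMe's LLM-backed compaction.

```python
MAX_TOOL_RESULT_CHARS = 500
CONTEXT_TOKEN_LIMIT = 100  # toy limit for the demo; real limits are far larger


def estimate_tokens(messages: list[dict]) -> int:
    # Crude heuristic: roughly 4 characters per token
    return sum(len(m["content"]) for m in messages) // 4


def compact_tool_result(messages: list[dict]) -> list[dict]:
    # 1. Truncate long tool outputs
    out = []
    for m in messages:
        if m["role"] == "tool" and len(m["content"]) > MAX_TOOL_RESULT_CHARS:
            m = {**m, "content": m["content"][:MAX_TOOL_RESULT_CHARS] + " [truncated]"}
        out.append(m)
    return out


def check_context(messages: list[dict]) -> bool:
    # 2. Count tokens, decide whether compaction is needed
    return estimate_tokens(messages) > CONTEXT_TOKEN_LIMIT


def compact_memory(messages: list[dict]) -> list[dict]:
    # 3. Replace older messages with a summary; keep the most recent turn.
    #    In ReMe this summary comes from an LLM call, not a placeholder string.
    summary = f"[summary of {len(messages) - 1} earlier message(s)]"
    return [{"role": "system", "content": summary}, messages[-1]]


def pre_reasoning_hook(messages: list[dict]) -> list[dict]:
    messages = compact_tool_result(messages)
    if check_context(messages):
        messages = compact_memory(messages)
    # 4. summary_memory: ReMe also persists the summary to files, asynchronously
    return messages


history = [
    {"role": "tool", "content": "x" * 10_000},
    {"role": "user", "content": "next step?"},
]
print(pre_reasoning_hook(history))
```

Running truncation *before* the token check matters: a single oversized tool result would otherwise force a full summarization pass that truncation alone could have avoided.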

Frequently Asked Questions

What is ReMe?

ReMe (Remember Me, Refine Me) is a memory management framework for AI agents. It provides four types of memory — personal, task, tool, and working — enabling agents to learn from experience instead of starting fresh every session.

How do I install ReMe?

pip install reme-ai

For the file-based light version: pip install "reme-ai[light]"

What makes tool memory different?

Tool memory tracks actual API performance: success rates, execution times, token costs, and failure patterns. It transforms static tool descriptions into data-driven usage guidelines that improve over time.

Does ReMe work with Ollama/local models?

Yes. ReMe is model-agnostic. Configure your LLM via environment variables (LLM_API_KEY, LLM_BASE_URL) to point at any OpenAI-compatible API including Ollama.
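A hedged example, assuming the `LLM_API_KEY` / `LLM_BASE_URL` variables named above, Ollama's default OpenAI-compatible endpoint, and an illustrative model name:

```shell
# Point ReMe at a local Ollama server via its OpenAI-compatible API
export LLM_API_KEY="ollama"                      # Ollama ignores the key, but the variable must be set
export LLM_BASE_URL="http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint
export LLM_MODEL="qwen2.5:7b"                    # illustrative; any model you have pulled locally
```

Any other OpenAI-compatible gateway (vLLM, LM Studio, a hosted proxy) slots in the same way by changing `LLM_BASE_URL`.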

How does context compression work?

ReMe automatically compacts conversation history when approaching context limits. Their benchmark shows 223,838 tokens → 1,105 tokens (99.5% compression) while retaining key information.

Is this production-ready?

ReMe achieves state-of-the-art on LoCoMo and HaluMem benchmarks. It’s actively maintained by the AgentScope team. Check GitHub activity for current status.

What’s the relationship to AgentScope?

ReMe is developed by the AgentScope team. It integrates with CoPaw, their personal assistant agent, but works standalone with any agent framework.

Links: