ReMe: The Agent Memory Framework That Tracks Which Tools Actually Work

By Prahlad Menon · 5 min read

TL;DR: ReMe is a memory framework that gives agents four types of memory: personal, task, tool, and working. The killer feature is tool memory — it tracks every API call, records success rates and token costs, and turns that history into dynamic usage guidelines, yielding a 15%+ improvement in tool selection accuracy. Install: pip install reme-ai

Your AI agent has access to 50 tools. How does it know which one to use?

Right now, it reads the tool description. That’s it. No performance data, no success rates, no “this one times out when you ask for more than 20 results.” Just a static string.

ReMe fixes this.

What Problem Does ReMe Solve?

ReMe solves the stateless agent problem — agents that start from scratch every session, never learning from their own experience.

It tackles two core issues:

  1. Limited context window — early information gets truncated in long conversations
  2. Stateless sessions — new sessions can’t inherit history

But the real innovation is tool memory: a system that learns which tools actually work.

What Are the 4 Types of Memory?

ReMe provides four distinct memory types:

| Memory Type | Purpose | Example |
| --- | --- | --- |
| Personal | User preferences, long-term facts | "User prefers Python over JavaScript" |
| Task | Success/failure patterns from past tasks | "Navigation tasks work better with smaller steps" |
| Tool | API performance history, optimal parameters | "web_search succeeds 92% with max_results=10" |
| Working | Current session context, active state | Token-aware conversation buffer |

Each type serves a different cognitive function — just like human memory isn’t one monolithic thing.

How Does Tool Memory Work?

This is the standout feature. Here’s the problem it solves:

Before Tool Memory:

Tool A: "Search the web for information"
Tool B: "Perform web searches with customizable parameters"  
Tool C: "Query search engines and return results"

The descriptions are nearly identical. But in reality:

  • Tool A: 95% success rate, 2.3s average, best for technical queries
  • Tool B: 70% success rate, 5.8s average, often times out
  • Tool C: 85% success rate, 3.1s average, good for general queries

After Tool Memory:

The agent sees enriched context:

Tool: web_search

Static Description:
"Search the web for information"

+ Tool Memory Context:
"Based on 150 historical calls:
- Success rate: 92% (138 successful, 12 failed)
- Avg time: 2.3s, Avg tokens: 150
- Best for: Technical documentation (95% success)
- Optimal params: max_results=5-20, language='en'
- Common failures: Generic queries timeout
- Recommendation: Use specific multi-word queries"

The agent now makes data-driven decisions instead of guessing.
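The enrichment step is, at heart, plain bookkeeping: aggregate per-tool call records into summary statistics and append them to the static description. The `ToolStats` class below is an illustrative sketch of that idea, not ReMe's actual internals — all names and thresholds are mine.

```python
from dataclasses import dataclass


@dataclass
class ToolStats:
    """Illustrative per-tool call bookkeeping (not ReMe's internal API)."""
    calls: int = 0
    successes: int = 0
    total_time: float = 0.0
    total_tokens: int = 0

    def record(self, success: bool, time_cost: float, token_cost: int) -> None:
        self.calls += 1
        self.successes += int(success)
        self.total_time += time_cost
        self.total_tokens += token_cost

    def enrich(self, static_description: str) -> str:
        """Append a data-driven summary to the static tool description."""
        rate = 100 * self.successes / self.calls
        return (
            f"{static_description}\n"
            f"+ Tool Memory Context: based on {self.calls} historical calls:\n"
            f"- Success rate: {rate:.0f}% ({self.successes} successful, "
            f"{self.calls - self.successes} failed)\n"
            f"- Avg time: {self.total_time / self.calls:.1f}s, "
            f"Avg tokens: {self.total_tokens // self.calls}"
        )


# Record three illustrative calls, then build the enriched description
stats = ToolStats()
for ok, secs, toks in [(True, 2.1, 140), (True, 2.5, 160), (False, 5.0, 150)]:
    stats.record(ok, secs, toks)
print(stats.enrich("Search the web for information"))
```

Because the summary lives alongside the static description rather than replacing it, the model still sees the tool author's intent — the history only adds evidence on top.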

What’s the Performance Impact?

ReMe’s benchmarks show significant improvements:

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Success rate | 75% | 92% | +17% |
| Average time | 5.2s | 2.8s | -46% |
| Token cost | 200+ | 150 | -25% |

The tool memory alone drives a 15%+ improvement in tool selection accuracy from historical performance data.

ReMe also achieves state-of-the-art results on the LoCoMo and HaluMem benchmarks for agent memory evaluation.

How Do I Install ReMe?

# Full installation
pip install reme-ai

# Light version (file-based only; quote the extra so zsh doesn't expand the brackets)
pip install "reme-ai[light]"

# From source
git clone https://github.com/agentscope-ai/ReMe.git
cd ReMe
pip install -e ".[light]"

How Do I Use Tool Memory?

from reme import ToolMemoryManager

# Initialize tool memory manager
manager = ToolMemoryManager(workspace_id="my-agent")

# After each tool call, record the result
# (these awaits must run inside an async function / event loop)
await manager.add_tool_call_result(
    tool_name="web_search",
    input={"query": "python async tutorial"},
    output="Found 15 results...",
    success=True,
    time_cost=2.3,
    token_cost=150
)

# Before calling a tool, retrieve learned guidelines
guidelines = await manager.retrieve_tool_memory("web_search")
# Returns: success rates, optimal params, failure patterns

The system automatically:

  • Evaluates the quality of each call (LLM-as-judge)
  • Synthesizes usage guidelines from patterns
  • Updates recommendations as it learns more
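To make the synthesis step concrete, here is a rule-based stand-in for it (ReMe uses an LLM-as-judge; this heuristic, and every name in it, is purely illustrative): group recorded calls by a parameter value and flag values whose observed success rate falls below a threshold.

```python
from collections import defaultdict


def synthesize_guidelines(records: list[dict], param: str,
                          threshold: float = 0.8) -> list[str]:
    """Flag values of `param` whose observed success rate is below `threshold`.

    Heuristic stand-in for ReMe's LLM-based guideline synthesis.
    """
    by_value = defaultdict(list)
    for rec in records:
        by_value[rec["input"].get(param)].append(rec["success"])

    guidelines = []
    for value, outcomes in sorted(by_value.items(), key=lambda kv: str(kv[0])):
        rate = sum(outcomes) / len(outcomes)
        if rate < threshold:
            guidelines.append(
                f"Avoid {param}={value!r}: {rate:.0%} success over {len(outcomes)} calls"
            )
    return guidelines


# Four illustrative call records: small result sets succeed, large ones are flaky
records = [
    {"input": {"max_results": 10}, "success": True},
    {"input": {"max_results": 10}, "success": True},
    {"input": {"max_results": 50}, "success": False},
    {"input": {"max_results": 50}, "success": True},
]
print(synthesize_guidelines(records, "max_results"))
```

An LLM-based judge can additionally read the query text and outputs to explain *why* a pattern fails; the statistical grouping above only tells you *that* it fails.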

How Does the File-Based Memory Work?

ReMe stores memory as readable Markdown files:

working_dir/
├── MEMORY.md          # Long-term memory
├── memory/
│   └── 2026-03-22.md  # Daily journal (auto-written)
├── dialog/
│   └── 2026-03-22.jsonl  # Raw conversation records
└── tool_result/
    └── <uuid>.txt     # Cached tool outputs
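Because the store is just files, the layout above can be written (or read) with nothing beyond the standard library. A minimal sketch of appending a dated journal entry, assuming only the directory convention shown above (the function name is mine, not ReMe's API):

```python
import tempfile
from datetime import date
from pathlib import Path


def append_journal(working_dir: str, entry: str) -> Path:
    """Append a bullet to today's journal file (memory/YYYY-MM-DD.md)."""
    journal_dir = Path(working_dir) / "memory"
    journal_dir.mkdir(parents=True, exist_ok=True)
    journal_file = journal_dir / f"{date.today().isoformat()}.md"
    with journal_file.open("a", encoding="utf-8") as f:
        f.write(f"- {entry}\n")
    return journal_file


# Demo in a throwaway directory so nothing in the real workspace is touched
demo_dir = tempfile.mkdtemp()
path = append_journal(demo_dir, "User asked about async tutorials")
print(path.read_text(encoding="utf-8"))
```

The same property cuts both ways: you can fix a bad memory with a text editor, and `git diff` shows you exactly what the agent learned today.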

This is similar to soul.py’s approach — human-readable, git-versionable, directly editable. The difference is ReMe’s focus on operational memory types vs soul.py’s focus on identity and conversational memory.

How Does ReMe Compare to Other Memory Solutions?

| Feature | ReMe | soul.py | Mem0 |
| --- | --- | --- | --- |
| Memory types | 4 (personal, task, tool, working) | 2 (identity, conversational) | 1 (general) |
| Tool performance tracking | ✅ Full history | — | — |
| File-based storage | ✅ Markdown | ✅ Markdown | — |
| Vector retrieval | ✅ Hybrid | ✅ RAG+RLM | ✅ RAG |
| Context compression | ✅ Auto-compact | — | — |
| MCP ecosystem focus | ✅ | — | — |

They’re complementary. ReMe excels at operational memory (especially tool learning). soul.py excels at identity persistence and conversational memory. You could use both.

When Should I Use ReMe?

Use ReMe when:

  • Your agent uses many tools and needs to learn which work best
  • You’re building MCP-based workflows
  • You need automatic context compression for long sessions
  • You want file-based memory you can inspect and edit

Consider alternatives when:

  • You primarily need conversational memory (soul.py)
  • You need simple key-value memory (most built-in solutions)
  • You’re not using tool-heavy workflows

What’s the Architecture?

ReMe uses a pre-reasoning hook that runs before each agent step:

  1. compact_tool_result — Truncate long tool outputs
  2. check_context — Count tokens, decide if compaction needed
  3. compact_memory — Generate summary if over limit
  4. summary_memory — Async persistence to files

The tool memory layer sits alongside this, tracking every tool invocation and synthesizing guidelines.
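The four hooks above can be sketched as a single pipeline over the message list. Everything here is illustrative: the token estimate is a crude characters-divided-by-four heuristic, the limit is a toy value so the demo triggers compaction, and the summarizer is a stub standing in for ReMe's LLM-backed compaction.

```python
MAX_TOOL_RESULT_CHARS = 500
CONTEXT_TOKEN_LIMIT = 100  # toy limit for the demo; real limits are far larger


def estimate_tokens(messages: list[dict]) -> int:
    # Crude heuristic: roughly 4 characters per token
    return sum(len(m["content"]) for m in messages) // 4


def compact_tool_result(messages: list[dict]) -> list[dict]:
    # 1. Truncate long tool outputs
    out = []
    for m in messages:
        if m["role"] == "tool" and len(m["content"]) > MAX_TOOL_RESULT_CHARS:
            m = {**m, "content": m["content"][:MAX_TOOL_RESULT_CHARS] + " [truncated]"}
        out.append(m)
    return out


def check_context(messages: list[dict]) -> bool:
    # 2. Count tokens, decide whether compaction is needed
    return estimate_tokens(messages) > CONTEXT_TOKEN_LIMIT


def compact_memory(messages: list[dict]) -> list[dict]:
    # 3. Replace older messages with a summary; keep the most recent turn.
    #    In ReMe this summary comes from an LLM call, not a placeholder string.
    summary = f"[summary of {len(messages) - 1} earlier message(s)]"
    return [{"role": "system", "content": summary}, messages[-1]]


def pre_reasoning_hook(messages: list[dict]) -> list[dict]:
    messages = compact_tool_result(messages)
    if check_context(messages):
        messages = compact_memory(messages)
    # 4. summary_memory: ReMe also persists the summary to files, asynchronously
    return messages


history = [
    {"role": "tool", "content": "x" * 10_000},
    {"role": "user", "content": "next step?"},
]
print(pre_reasoning_hook(history))
```

Running truncation *before* the token check matters: a single oversized tool result would otherwise force a full summarization pass that truncation alone could have avoided.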

Frequently Asked Questions

What is ReMe?

ReMe (Remember Me, Refine Me) is a memory management framework for AI agents. It provides four types of memory — personal, task, tool, and working — enabling agents to learn from experience instead of starting fresh every session.

How do I install ReMe?

pip install reme-ai

For the file-based light version: pip install "reme-ai[light]"

What makes tool memory different?

Tool memory tracks actual API performance: success rates, execution times, token costs, and failure patterns. It transforms static tool descriptions into data-driven usage guidelines that improve over time.

Does ReMe work with Ollama/local models?

Yes. ReMe is model-agnostic. Configure your LLM via environment variables (LLM_API_KEY, LLM_BASE_URL) to point at any OpenAI-compatible API including Ollama.
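A hedged example, assuming the `LLM_API_KEY` / `LLM_BASE_URL` variables named above, Ollama's default OpenAI-compatible endpoint, and an illustrative model name:

```shell
# Point ReMe at a local Ollama server via its OpenAI-compatible API
export LLM_API_KEY="ollama"                      # Ollama ignores the key, but the variable must be set
export LLM_BASE_URL="http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint
export LLM_MODEL="qwen2.5:7b"                    # illustrative; any model you have pulled locally
```

Any other OpenAI-compatible gateway (vLLM, LM Studio, a hosted proxy) slots in the same way by changing `LLM_BASE_URL`.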

How does context compression work?

ReMe automatically compacts conversation history when approaching context limits. Their benchmark shows 223,838 tokens → 1,105 tokens (99.5% compression) while retaining key information.

Is this production-ready?

ReMe achieves state-of-the-art on LoCoMo and HaluMem benchmarks. It’s actively maintained by the AgentScope team. Check GitHub activity for current status.

What’s the relationship to AgentScope?

ReMe is developed by the AgentScope team. It integrates with CoPaw, their personal assistant agent, but works standalone with any agent framework.

Links: