We Forked a Rust AI Agent for 24/7 Railway Hosting — Here's Everything We Had to Fix
We wanted a 24/7 cloud AI agent — always on, Telegram-native, running in a container we control. Monica (our OpenClaw agent) runs great on a Mac but is tied to local hardware. We needed something that lives in the cloud, reacts instantly, and survives restarts.
SkyClaw (menonpg/skyclaw, forked from nagisanzenin/skyclaw) fit the bill: a Rust-based AI agent runtime with Telegram integration, shell access, a browser tool, and file operations — all deployable on Railway in minutes. But “deployable” and “works correctly” turned out to be different things.
This is the full post-mortem on everything we found and fixed.
What SkyClaw Is
SkyClaw is a Rust runtime that turns an LLM (Anthropic, OpenAI, Gemini) into a capable cloud agent. It:
- Listens on Telegram (or CLI)
- Runs tools: shell, file read/write, web fetch, headless Chrome browser
- Maintains session history in memory
- Connects to a SoulMate RAG/RLM memory backend for deep cross-session recall
- Deploys as a single Docker container on Railway
The architecture is clean — a trait-based provider system, a session manager, an agent runtime loop, and a tool executor. The bones are solid. But several things needed fixing before it was production-ready.
Railway Deployment Setup
Before getting into the bugs, here’s how we deploy SkyClaw on Railway — because the deployment config itself is part of the story.
Dockerfile
SkyClaw ships with a multi-stage Dockerfile. The builder stage compiles the Rust binary; the runtime stage is a minimal Debian slim image:
```dockerfile
FROM rust:1.82-slim AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates chromium && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/skyclaw /usr/local/bin/
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
```
Chromium is included for the headless browser tool.
The 5GB Persistent Volume
This is critical and easy to overlook. Railway containers are ephemeral — every deploy or restart gets a fresh filesystem. Without a volume, Ray (our SkyClaw agent) loses everything: his workspace, memory files, session state, any files he created.
We attach a 5GB Railway volume mounted at ~/.skyclaw/workspace. Everything important lives here:
```text
~/.skyclaw/workspace/
├── SOUL.md           # Ray's identity and personality
├── MEMORY.md         # Curated long-term knowledge (git-synced)
├── SESSION-STATE.md  # Last task context (written after every reply)
└── memory/           # Daily notes and conversation logs
```
Without this volume, every deployment is a complete amnesia event.
entrypoint.sh: Git-Synced Workspace
The entrypoint script handles workspace initialization at startup:
```bash
WORKSPACE_DIR="$HOME/.skyclaw/workspace"

if [ -d "$WORKSPACE_DIR/.git" ]; then
    # Volume exists — pull latest memory/identity updates
    cd "$WORKSPACE_DIR" && git pull origin main
else
    # Fresh volume — clone the workspace repo
    git clone "https://$GITHUB_TOKEN@github.com/menonpg/ray-workspace.git" "$WORKSPACE_DIR"
fi

exec skyclaw
```
Ray’s SOUL.md and MEMORY.md live in a separate git repo (menonpg/ray-workspace). On startup, he pulls the latest. This means you can update Ray’s knowledge or personality by pushing to that repo — the next restart picks it up automatically.
SESSION-STATE.md is NOT committed to git — it’s a local file on the volume that survives Railway restarts but not volume recreations. This is intentional: it’s transient task context, not permanent knowledge.
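One way to enforce that split — assuming the workspace repo uses a standard ignore file, which the source doesn't show — is a single entry in the repo's `.gitignore`:

```gitignore
SESSION-STATE.md
```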
Environment Variables
```bash
ANTHROPIC_API_KEY=sk-ant-...
TELEGRAM_BOT_TOKEN=...
SOULMATE_API_KEY=...   # For RAG/RLM memory backend
GITHUB_TOKEN=...       # For workspace git sync
```
SoulMate: The Memory Backend
Out of the box, SkyClaw stores session history in RAM — which dies on restart. Our deployment connects Ray to SoulMate, a RAG+RLM hybrid memory backend built on soul.py.
What SoulMate Does
Every conversation turn gets stored to Qdrant (a vector database). On each new user message, SoulMate searches those stored memories semantically and injects the top matches into the prompt:
```toml
[memory]
backend = "soulmate"
path = "soulmate://ray/ray"  # customer_id/soul_id
api_key = "${SOULMATE_API_KEY}"

[memory.search]
vector_weight = 0.7
keyword_weight = 0.3
```
The 70/30 vector/keyword split means Ray retrieves memories that are semantically related to the current query, not just keyword matches.
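The weighted blend can be sketched as a simple scoring function. This is illustrative only — SoulMate's actual scoring lives inside the service, and `hybrid_score` is a name we made up:

```rust
/// Blend a semantic (vector) similarity score with a keyword-match score.
/// Both inputs are assumed normalized to the range [0.0, 1.0].
fn hybrid_score(vector_sim: f64, keyword_sim: f64) -> f64 {
    const VECTOR_WEIGHT: f64 = 0.7;
    const KEYWORD_WEIGHT: f64 = 0.3;
    VECTOR_WEIGHT * vector_sim + KEYWORD_WEIGHT * keyword_sim
}

fn main() {
    // A memory that is semantically close but shares few keywords
    // still outranks a pure keyword match under the 70/30 split.
    let semantic_hit = hybrid_score(0.9, 0.1); // 0.66
    let keyword_hit = hybrid_score(0.2, 0.9);  // 0.41
    assert!(semantic_hit > keyword_hit);
    println!("semantic: {semantic_hit:.2}, keyword: {keyword_hit:.2}");
}
```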
RAG vs RLM
SoulMate has two retrieval modes that get used together:
RAG (Retrieval-Augmented Generation): Fast vector search. Given “what did we decide about the 407singles email config?”, it returns the 3-5 most relevant conversation snippets in milliseconds. Works for focused, specific recall.
RLM (Retrospective Language Memory): Slower but more thorough. Scans the full memory corpus and synthesizes a summary rather than returning raw snippets. Better for open-ended questions like “what have we been building lately?”
The query router auto-classifies each retrieval request:
- ~90% → RAG (specific, fast)
- ~10% → RLM (open-ended, comprehensive)
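A rough sketch of what such a router might look like — SoulMate's real classifier is more sophisticated, and `route_query` plus the marker list are our own illustration:

```rust
#[derive(Debug, PartialEq)]
enum RetrievalMode {
    Rag, // fast vector search, returns raw snippets
    Rlm, // slower corpus scan, returns a synthesized summary
}

/// Heuristic: open-ended phrasings go to RLM synthesis; everything
/// else takes the fast RAG path.
fn route_query(query: &str) -> RetrievalMode {
    let q = query.to_lowercase();
    let open_ended = ["lately", "overall", "summarize", "what have we"];
    if open_ended.iter().any(|m| q.contains(*m)) {
        RetrievalMode::Rlm
    } else {
        RetrievalMode::Rag
    }
}

fn main() {
    // Specific recall stays on the fast path.
    assert_eq!(route_query("what did we decide about the email config?"), RetrievalMode::Rag);
    // Open-ended review goes to the synthesis path.
    assert_eq!(route_query("what have we been building lately?"), RetrievalMode::Rlm);
}
```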
The Three-Layer Memory Stack
With SoulMate connected and the file-based state working, Ray has three memory layers:
| Layer | What it holds | Survives restart? | Survives volume loss? |
|---|---|---|---|
| SoulMate RAG | Every conversation, searchable | ✅ External API | ✅ Cloud-hosted |
| MEMORY.md | Curated permanent knowledge | ✅ Git-synced | ✅ In git repo |
| SESSION-STATE.md | Last task context | ✅ On volume | ❌ If volume wiped |
SoulMate is the foundation. Without it, Ray is goldfish-brained after every restart regardless of what files you give him.
Problem 1: Anthropic 400 Errors After ~25 Tool Steps
The most visible failure. After a string of tool calls, Ray would crash with:
```text
messages.1: tool_use ids were found without tool_result blocks
immediately after: toolu_011eEbFmQA3w5P7YAZuT8G1c
```
Anthropic’s API requires strict alternation: every assistant message containing tool_use blocks must be immediately followed by a user message containing the matching tool_result blocks. Any break in that sequence is a hard 400 error.
Root Cause
The context builder in context.rs applies a token budget — it trims the oldest messages when the window gets too large. The trimming happened message-by-message, which meant it could slice through the middle of a tool pair, leaving an assistant(tool_use) as the last message with no following tool_result.
The front of the context window was cleaned (a loop stripped non-user messages from the start), but the end was never validated.
Fix: Strip Dangling Tool Pairs from the Tail
```rust
// After token budget trimming, strip any orphaned tool_use from the END
loop {
    match kept.last() {
        Some(last) if matches!(last.role, Role::Assistant) => {
            let has_dangling_tool_use = match &last.content {
                MessageContent::Parts(parts) =>
                    parts.iter().any(|p| matches!(p, ContentPart::ToolUse { .. })),
                _ => false,
            };
            if has_dangling_tool_use {
                kept.pop();
                // Also remove preceding tool_result that lost its pair
                if matches!(kept.last().map(|m| &m.role), Some(Role::Tool)) {
                    kept.pop();
                }
            } else {
                break;
            }
        }
        Some(last) if matches!(last.role, Role::Tool) => { kept.pop(); }
        _ => break,
    }
}
```
We also added a last-resort sanitizer in the Anthropic provider layer that catches anything that slips through: drops orphaned tool_use blocks, merges consecutive same-role messages, and ensures the sequence starts with user.
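The shape of that sanitizer can be sketched with simplified message types. The real provider layer operates on full content blocks and also merges same-role messages; `Msg` and `sanitize` here are illustrative names:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Msg {
    User,
    Assistant { has_tool_use: bool },
    ToolResult,
}

/// Last-resort cleanup before calling the API: drop a trailing assistant
/// turn whose tool_use never got a tool_result, then drop leading
/// non-user messages so the sequence starts with `user`.
fn sanitize(mut msgs: Vec<Msg>) -> Vec<Msg> {
    while matches!(msgs.last(), Some(Msg::Assistant { has_tool_use: true })) {
        msgs.pop();
    }
    while msgs.first().map_or(false, |m| *m != Msg::User) {
        msgs.remove(0);
    }
    msgs
}

fn main() {
    let history = vec![
        Msg::ToolResult,                       // orphaned at the front
        Msg::User,
        Msg::Assistant { has_tool_use: true }, // dangling at the tail
    ];
    assert_eq!(sanitize(history), vec![Msg::User]);
}
```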
Problem 2: History Front-Corruption After Truncation
SessionManager::update_session() truncated history by draining the oldest N messages. After the drain, the first remaining message could be an Assistant or Tool message rather than the User message Anthropic requires a conversation to start with.
Fix: Clean the Front After Every Drain
```rust
if session.history.len() > MAX_HISTORY_PER_SESSION {
    let drain_count = session.history.len() - MAX_HISTORY_PER_SESSION;
    session.history.drain(..drain_count);
}

// Ensure we don't start with a non-user message after draining
while session.history.first()
    .map(|m| !matches!(m.role, Role::User))
    .unwrap_or(false)
{
    session.history.remove(0);
}
```
Problem 3: Zero Context After Railway Restarts
Railway restarts containers regularly — on deploy, on memory pressure, on crash recovery. SoulMate handles long-term recall, but there was nothing bridging the immediate task context: what was Ray working on 5 minutes ago before the restart?
Monica (our OpenClaw agent) handles this by writing SESSION-STATE.md after every reply — the current task, last user message, and last response, all on disk. Files on the persistent volume survive restarts. RAM doesn’t.
Fix: SESSION-STATE.md Written After Every Reply
In runtime.rs, after each successful reply:
```rust
let state_summary = format!(
    "# Ray — Session State\n\
     ## Last Active\n{}\n\
     ## Last User Message\n{}\n\
     ## Last Response\n{}\n",
    chrono::Utc::now().format("%Y-%m-%d %H:%M UTC"),
    last_user_message,
    &reply_text.chars().take(600).collect::<String>(),
);
tokio::fs::write(&state_path, &state_summary).await?;
```
On startup, if SESSION-STATE.md exists it’s injected into the system prompt — Ray immediately knows what he was working on before the restart. SoulMate gives him the full history; SESSION-STATE.md gives him the immediate context.
/new deletes SESSION-STATE.md so a deliberate fresh start is actually fresh.
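The startup side of this can be sketched as follows. Paths and prompt layout are simplified from our runtime; `build_system_prompt` is an illustrative name, not the function in runtime.rs:

```rust
use std::path::Path;

/// On startup, splice last-session context into the system prompt if the
/// state file survived the restart. A missing file (fresh volume, or a
/// deliberate /new reset) simply yields the base prompt.
fn build_system_prompt(base: &str, state_path: &Path) -> String {
    match std::fs::read_to_string(state_path) {
        Ok(state) => format!("{base}\n\n---\n\n# Restored Session State\n{state}"),
        Err(_) => base.to_string(),
    }
}

fn main() {
    // With no state file present, the prompt is just the base instructions.
    let prompt = build_system_prompt("base instructions", Path::new("/nonexistent/SESSION-STATE.md"));
    assert_eq!(prompt, "base instructions");
}
```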
Problem 4: MEMORY.md Existed But Was Never Loaded
Ray had a MEMORY.md in his git-synced workspace with project context, credentials, and curated knowledge. But main.rs only loaded SOUL.md. The memory file was sitting there unread.
SoulMate can retrieve relevant memories on demand, but there’s a class of knowledge — default behaviors, current project state, team context — that should always be in the system prompt rather than searched for. That’s what MEMORY.md is for.
Fix: Load Both at Startup
```rust
// Load SOUL.md
let soul_content = read_workspace_file("SOUL.md");

// Load MEMORY.md — always-on curated knowledge
let memory_content = read_workspace_file("MEMORY.md");

// System prompt order: SOUL → MEMORY → base instructions
let system_prompt = vec![soul_content, memory_content, SYSTEM_PROMPT]
    .join("\n\n---\n\n");
```
After this fix, Ray boots knowing about every active project, repo, credential, and team member — without a single tool call to look anything up.
Problem 5: “Never Give Up” Instruction Caused Tool-Call Storms
The original system prompt contained this instruction:
```text
NEVER give up on a task by explaining limitations. You have a multi-round tool loop — keep calling tools until the task is done or you hit a real error. Do not stop early to explain what you 'cannot' do.
```
The intent was good — prevent the agent from giving up prematurely. The effect was catastrophic: Ray would burn through every available tool round on even simple research tasks, reading every file in a directory instead of targeted lookups.
A request like “review what Monica built” turned into 25 tool calls of aimless file exploration.
Fix: Replace With Efficiency Rules
```text
TOOL EFFICIENCY — READ THIS CAREFULLY:
- Each tool call costs time and context. Use the MINIMUM calls needed.
- For research/review tasks: search memory first, then read 2-3 key files.
  Do NOT read every file in a directory. Scan indices before full reads.
- Aim for under 15 tool calls for information tasks, under 30 for builds.
- NEVER read files you don't specifically need. Grep for what you want.
- If you can answer from memory or a quick search, do so without tool calls.
```
We also added a round-20 nudge — a system message injected into context at tool call #20 telling the model to wrap up rather than continue exploring. It’s not a hard stop; it’s a gentle redirect.
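The nudge mechanism is simple enough to sketch directly. The threshold and wording are ours; `nudge_for_round` is an illustrative name for the check the runtime makes before each tool round:

```rust
/// Returns the system message to inject at a given tool round, if any.
/// At round 20 the model gets a soft redirect; no round is ever a hard stop.
fn nudge_for_round(round: u32) -> Option<&'static str> {
    const NUDGE_ROUND: u32 = 20;
    if round == NUDGE_ROUND {
        Some("You have used many tool rounds. Wrap up and answer with \
              what you have rather than continuing to explore.")
    } else {
        None
    }
}

fn main() {
    // Early rounds proceed uninterrupted; round 20 gets the redirect.
    assert!(nudge_for_round(5).is_none());
    assert!(nudge_for_round(20).is_some());
    assert!(nudge_for_round(21).is_none());
}
```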
Problem 6: Hard Tool-Round Cap Produced Useless Responses
The config had max_tool_rounds = 25. When hit, Ray returned:
“I reached the maximum number of tool execution steps. Here is what I have so far.”
…followed by nothing. Twenty-five tool calls of work, zero output.
Fix: Remove the Hard Cap
Monica doesn’t have one. The LLM naturally stops calling tools when it has a complete answer. The right approach is to trust that — combined with the efficiency rules and the round-20 nudge — rather than hitting an arbitrary wall and discarding all gathered context.
We raised the config limit to 200 (effectively unlimited) and rely on the LLM’s natural termination behavior.
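In config terms this is a one-line change. The key name matches the one quoted above; where it sits in the TOML is not shown in the source:

```toml
max_tool_rounds = 200  # effectively unlimited; the LLM stops on its own
```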
What Ray Looks Like Now
| | Before | After |
|---|---|---|
| Anthropic 400 crash | Every ~25 tool steps | Fixed at root cause |
| Context after restart | Zero | SoulMate RAG + SESSION-STATE.md |
| Project knowledge on boot | None | MEMORY.md + SoulMate on every message |
| Research task tool calls | 25 (exhaustive) | 3-8 (targeted) |
| Tool-round limit behavior | Silent failure | Doesn’t apply |
| History corruption | Possible after truncation | Cleaned at both ends |
| Long-term cross-session recall | None | SoulMate RAG/RLM |
The fixes are all in menonpg/skyclaw. The structural changes to context.rs, runtime.rs, session.rs, and router.rs would apply cleanly to any fork of the original nagisanzenin/skyclaw codebase.
Why Ray Matters: Two Firsts
Beyond the debugging, Ray represents two significant milestones for our agent ecosystem:
First Compiled Rust Agent
Every other agent we run — Monica (OpenClaw), Pi (Android), Clawdbot — is interpreted: Node.js or Python runtimes parsing code at execution time. Ray is a native binary compiled from Rust.
The difference matters:
- Startup time: Ray is ready in milliseconds, not seconds. No runtime initialization, no module resolution.
- Memory footprint: A few hundred MB vs. the GB+ a Node.js process can consume with agent tooling loaded.
- Deployment simplicity: One static binary. No node_modules/, no dependency hell, no runtime version mismatches.
SkyClaw proves that a Rust-based agent runtime is viable — and that the performance characteristics are worth the steeper development curve. For cloud deployments where you pay per MB of RAM and per second of compute, compiled matters.
First True SoulMate Integration
We’ve had the SoulMate API running for months. We’ve had soul.py with RAG and RLM working. But no agent was actually using it end-to-end in production until Ray.
Monica and Clawdbot use file-based memory: MEMORY.md loaded at startup, daily notes appended, manual curation. It works, but it doesn’t scale — eventually the files get too big to inject, and there’s no semantic retrieval.
Ray is the first agent where every conversation turn gets vectorized and stored to Qdrant. Every new message triggers a semantic search. The memory grows without bound and retrieval stays fast because it’s indexed, not scanned.
The full production stack:
```text
User (Telegram)
      ↓
SkyClaw (Rust binary)
      ↓
SoulMate API (RAG/RLM retrieval)
      ↓
Qdrant (vector storage)
      ↓
Anthropic Claude (BYOK LLM)
```
This is the architecture we’ve been building toward: compiled runtime, semantic memory, cloud-native, always-on. Ray is proof it works.
Lessons
1. “Never give up” is bad agent instruction. Persistence without efficiency is thrashing. The right instruction is: use the minimum tools needed, answer from memory when possible, be targeted.
2. Context window trimming needs to validate both ends. Most implementations check that history starts with the right role. Few check that it also ends cleanly after trimming.
3. Persistent volumes aren’t optional for cloud agents. RAM resets on restart. A 5GB Railway volume costs next to nothing and gives your agent a stable home for its workspace, memory files, and session state. Without it, every restart is complete amnesia.
4. Pair file-based state with a vector memory backend. SESSION-STATE.md handles immediate task continuity (survives restarts). SoulMate handles long-term recall (survives everything). You need both — they solve different problems at different timescales.
5. Load your memory files. This one was almost embarrassing — the memory file existed, the path was correct, the loading code pattern was already there for SOUL.md. We just forgot to extend it to MEMORY.md. Check your startup sequence.
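Lesson 2 above is cheap to encode as a debug-time invariant. A sketch with simplified roles (the names are ours, not from the codebase):

```rust
#[derive(PartialEq)]
enum Role { User, Assistant, Tool }

/// History must start with a user turn, and must not end on an
/// assistant turn that is still waiting for tool results.
fn history_is_well_formed(roles: &[Role], last_has_tool_use: bool) -> bool {
    let starts_with_user = matches!(roles.first(), Some(Role::User));
    let ends_cleanly =
        !(matches!(roles.last(), Some(Role::Assistant)) && last_has_tool_use);
    starts_with_user && ends_cleanly
}

fn main() {
    // A completed user/assistant exchange is fine.
    assert!(history_is_well_formed(&[Role::User, Role::Assistant], false));
    // A dangling tool_use at the tail is not.
    assert!(!history_is_well_formed(&[Role::User, Role::Assistant], true));
    // Neither is a history that starts mid-tool-exchange.
    assert!(!history_is_well_formed(&[Role::Tool, Role::User], false));
}
```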