OpenClaw Finally Gets Full Observability — And Why Native Beats Proxy
OpenClaw agents are black boxes. Until now.
When your agent assembles context for an LLM call, it pulls from system prompts, conversation history, tool schemas, skills, and memory. All of that happens inside the gateway — invisible to external observability tools that only see the final API request hitting the LLM provider.
That’s the proxy problem. Most observability setups (LangSmith, Langfuse, custom solutions) work by intercepting LLM API calls. They see inputs and outputs at the provider boundary. But they’re blind to what happened before that call was made.
`opik-openclaw` solves this by running inside the OpenClaw Gateway itself.
What Native Observability Actually Sees
Because the plugin hooks into OpenClaw’s internal event system, it captures the full picture:
- LLM request/response spans — the actual call, with full context
- Tool execution — inputs, outputs, errors, duration
- Sub-agent delegation — when agents spawn other agents
- Per-request cost breakdowns — by model, by session
- Conversation threads — spanning multiple sessions
The event mapping is comprehensive:
| OpenClaw Event | Opik Entity |
|---|---|
| `llm_input` | trace + llm span |
| `llm_output` | llm span update/end |
| `before_tool_call` | tool span start |
| `after_tool_call` | tool span update/end |
| `subagent_spawning` | subagent span start |
| `subagent_ended` | subagent span update/end |
| `agent_end` | trace finalize |
Every tool call, every sub-agent delegation, every LLM interaction — captured with full context.
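To make the mapping concrete, here is a minimal sketch of how an in-gateway recorder could turn paired start/end events into spans. This is illustrative only: the `TraceRecorder` class and its method names are assumptions for the sake of the example, not the plugin's actual API.

```typescript
// Hypothetical sketch of event-to-span bookkeeping inside the gateway.
// Paired events (before_tool_call / after_tool_call, etc.) open and
// close spans keyed by a correlation id.
type SpanType = "llm" | "tool" | "subagent";

interface Span {
  type: SpanType;
  name: string;
  startedAt: number;
  endedAt?: number;
  input?: unknown;
  output?: unknown;
}

class TraceRecorder {
  spans: Span[] = [];
  private open = new Map<string, Span>();

  // Called on llm_input / before_tool_call / subagent_spawning.
  start(id: string, type: SpanType, name: string, input?: unknown): void {
    const span: Span = { type, name, startedAt: Date.now(), input };
    this.open.set(id, span);
    this.spans.push(span);
  }

  // Called on llm_output / after_tool_call / subagent_ended.
  end(id: string, output?: unknown): void {
    const span = this.open.get(id);
    if (span) {
      span.endedAt = Date.now();
      span.output = output;
      this.open.delete(id);
    }
  }
}
```

Because the recorder lives inside the gateway, the `input` it captures is the fully assembled context, not just what crosses the provider boundary.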
Three Commands to Install
```bash
openclaw plugins install @opik/opik-openclaw
openclaw opik configure
openclaw gateway restart
```

The setup wizard validates your Opik endpoint and credentials, then writes the config. Run `openclaw opik status` to verify.
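For orientation, the resulting config might look something like the fragment below. The file location and every field name here are illustrative assumptions; the wizard's actual output may differ.

```json
{
  "opik": {
    "endpoint": "https://your-opik-host.example.com",
    "apiKey": "<set by the wizard>",
    "project": "openclaw-agents"
  }
}
```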
Why This Matters for Debugging
Consider a failing agent. With proxy-based observability, you see:
- LLM input: some assembled prompt
- LLM output: some response
- Maybe the final tool call result
With native observability, you see:
- Which skills were loaded and why
- What memory was retrieved
- How tool schemas were assembled
- The full chain of sub-agent delegations
- Where exactly things went wrong
The difference between “the LLM returned an error” and “the browser tool failed because the snapshot timed out, which caused the agent to retry with a different approach, which hit the rate limit” is the difference between guessing and knowing.
Cost Visibility
OpenClaw routes to multiple models — Claude, GPT-4, Gemini, local models. Native observability tracks costs per model, per request, per session. You can finally answer “why did that conversation cost $4?” with data, not estimates.
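As a sketch of what that answer looks like in practice, per-request cost records can be rolled up by model or by session with a simple aggregation. The `CostRecord` shape below is an assumption for illustration, not Opik's actual schema.

```typescript
// Illustrative roll-up of per-request cost records into per-model or
// per-session totals, as a native collector could compute them.
interface CostRecord {
  sessionId: string;
  model: string;
  usd: number;
}

function costByKey(
  records: CostRecord[],
  key: "model" | "sessionId"
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const k = r[key];
    totals.set(k, (totals.get(k) ?? 0) + r.usd);
  }
  return totals;
}
```

The same records answer both "which model is expensive?" and "which conversation is expensive?" without re-instrumenting anything.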
Open Source, Built by Comet
Opik is Comet ML’s open-source observability platform. The OpenClaw plugin is Apache-2.0 licensed and doesn’t require any changes to OpenClaw core — it uses the native plugin hooks.
Self-host Opik or use Comet’s hosted version. Either way, your traces are yours.
The Bigger Picture
OpenClaw’s plugin ecosystem is maturing. We’ve covered the ecosystem market map and PinchBench for agent benchmarks. Observability was the missing piece.
Now you can build agents, benchmark them, and actually see what they’re doing. The black box is open.
Links: