OpenClaw Finally Gets Full Observability — And Why Native Beats Proxy

By Prahlad Menon · 1 min read

OpenClaw agents are black boxes. Until now.

When your agent assembles context for an LLM call, it pulls from system prompts, conversation history, tool schemas, skills, and memory. All of that happens inside the gateway — invisible to external observability tools that only see the final API request hitting the LLM provider.

That’s the proxy problem. Most observability setups (LangSmith, Langfuse, custom solutions) work by intercepting LLM API calls. They see inputs and outputs at the provider boundary. But they’re blind to what happened before that call was made.

opik-openclaw solves this by running inside the OpenClaw Gateway itself.

What Native Observability Actually Sees

Because the plugin hooks into OpenClaw’s internal event system, it captures the full picture:

  • LLM request/response spans — the actual call, with full context
  • Tool execution — inputs, outputs, errors, duration
  • Sub-agent delegation — when agents spawn other agents
  • Per-request cost breakdowns — by model, by session
  • Conversation threads — spanning multiple sessions

The event mapping is comprehensive:

OpenClaw Event        Opik Entity
llm_input             trace + llm span
llm_output            llm span update/end
before_tool_call      tool span start
after_tool_call       tool span update/end
subagent_spawning     subagent span start
subagent_ended        subagent span update/end
agent_end             trace finalize

Every tool call, every sub-agent delegation, every LLM interaction — captured with full context.
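To make the mapping concrete, here is a minimal sketch of how an event-to-span dispatch like the one in the table could work. The event names come from the table above; the TraceBuilder and Span classes are illustrative assumptions, not the actual opik-openclaw API.

```python
# Hypothetical sketch of mapping gateway events to Opik-style entities.
# Event names follow the mapping table; Span/TraceBuilder are illustrative,
# not the real opik-openclaw internals.

class Span:
    def __init__(self, kind):
        self.kind = kind      # "llm", "tool", or "subagent"
        self.ended = False

    def end(self):
        self.ended = True


class TraceBuilder:
    def __init__(self):
        self.spans = []
        self.finalized = False

    def on_event(self, event):
        # Start events open a span; end events close the most recent
        # open span of the matching kind; agent_end finalizes the trace.
        if event == "llm_input":
            self.spans.append(Span("llm"))
        elif event == "llm_output":
            self._last_open("llm").end()
        elif event == "before_tool_call":
            self.spans.append(Span("tool"))
        elif event == "after_tool_call":
            self._last_open("tool").end()
        elif event == "subagent_spawning":
            self.spans.append(Span("subagent"))
        elif event == "subagent_ended":
            self._last_open("subagent").end()
        elif event == "agent_end":
            self.finalized = True

    def _last_open(self, kind):
        return next(s for s in reversed(self.spans)
                    if s.kind == kind and not s.ended)
```

Feeding a typical event sequence (llm_input, before_tool_call, after_tool_call, llm_output, agent_end) through on_event yields two closed spans and a finalized trace.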

Three Commands to Install

openclaw plugins install @opik/opik-openclaw
openclaw opik configure
openclaw gateway restart

The setup wizard validates your Opik endpoint and credentials, then writes the config. Run openclaw opik status to verify.

Why This Matters for Debugging

Consider a failing agent. With proxy-based observability, you see:

  • LLM input: some assembled prompt
  • LLM output: some response
  • Maybe the final tool call result

With native observability, you see:

  • Which skills were loaded and why
  • What memory was retrieved
  • How tool schemas were assembled
  • The full chain of sub-agent delegations
  • Where exactly things went wrong

The difference between “the LLM returned an error” and “the browser tool failed because the snapshot timed out, which caused the agent to retry with a different approach, which hit the rate limit” is the difference between guessing and knowing.
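"Knowing" in this case means walking the trace tree down to the span where the failure actually originated. A toy sketch of that idea, assuming spans nest as dicts with hypothetical "children" and "error" fields (not the real Opik trace schema):

```python
# Illustrative only: given a nested span tree, find the deepest failed span,
# i.e. the root cause a native trace would surface. Field names ("children",
# "error", "name") are assumptions for this sketch, not Opik's schema.

def find_root_cause(span):
    """Depth-first search for the deepest span marked with an error."""
    for child in span.get("children", []):
        cause = find_root_cause(child)
        if cause:
            return cause
    return span if span.get("error") else None

# Example trace: the agent run failed because the browser tool's
# snapshot step failed, matching the scenario described above.
trace = {
    "name": "agent_run", "error": True, "children": [
        {"name": "llm_call_1", "error": False, "children": []},
        {"name": "browser_tool", "error": True, "children": [
            {"name": "snapshot", "error": True, "children": []},
        ]},
    ],
}
```

Here find_root_cause(trace) returns the "snapshot" span rather than the top-level "agent_run" failure, which is exactly the step a proxy-based view would never show you.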

Cost Visibility

OpenClaw routes to multiple models — Claude, GPT-4, Gemini, local models. Native observability tracks costs per model, per request, per session. You can finally answer “why did that conversation cost $4?” with data, not estimates.
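A per-model and per-session rollup over cost-annotated spans might look like the sketch below. The span records, field names, and dollar amounts are made-up examples, not real Opik data:

```python
from collections import defaultdict

# Illustrative rollup: sum per-request costs by model and by session.
# Span dicts and their fields ("model", "session", "cost_usd") are
# assumptions for this sketch.

def cost_breakdown(spans):
    by_model = defaultdict(float)
    by_session = defaultdict(float)
    for s in spans:
        by_model[s["model"]] += s["cost_usd"]
        by_session[s["session"]] += s["cost_usd"]
    return dict(by_model), dict(by_session)

spans = [
    {"model": "claude", "session": "a", "cost_usd": 0.12},
    {"model": "gpt-4",  "session": "a", "cost_usd": 0.30},
    {"model": "claude", "session": "b", "cost_usd": 0.05},
]
```

With these example spans, session "a" rolls up to $0.42 and the claude rows to $0.17, which is the shape of answer "why did that conversation cost $4?" calls for.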

Open Source, Built by Comet

Opik is Comet ML’s open-source observability platform. The OpenClaw plugin is Apache-2.0 licensed and doesn’t require any changes to OpenClaw core — it uses the native plugin hooks.

Self-host Opik or use Comet’s hosted version. Either way, your traces are yours.

The Bigger Picture

OpenClaw’s plugin ecosystem is maturing. We’ve covered the ecosystem market map and PinchBench for agent benchmarks. Observability was the missing piece.

Now you can build agents, benchmark them, and actually see what they’re doing. The black box is open.
