OpenClaw Finally Gets Full Observability — And Why Native Beats Proxy
OpenClaw agents are black boxes. Until now.
When your agent assembles context for an LLM call, it pulls from system prompts, conversation history, tool schemas, skills, and memory. All of that happens inside the gateway — invisible to external observability tools that only see the final API request hitting the LLM provider.
That’s the proxy problem. Most observability setups (LangSmith, Langfuse, custom solutions) work by intercepting LLM API calls. They see inputs and outputs at the provider boundary. But they’re blind to what happened before that call was made.
`opik-openclaw` solves this by running inside the OpenClaw Gateway itself.
What Native Observability Actually Sees
Because the plugin hooks into OpenClaw’s internal event system, it captures the full picture:
- LLM request/response spans — the actual call, with full context
- Tool execution — inputs, outputs, errors, duration
- Sub-agent delegation — when agents spawn other agents
- Per-request cost breakdowns — by model, by session
- Conversation threads — spanning multiple sessions
The event mapping is comprehensive:
| OpenClaw Event | Opik Entity |
|---|---|
| `llm_input` | trace + llm span |
| `llm_output` | llm span update/end |
| `before_tool_call` | tool span start |
| `after_tool_call` | tool span update/end |
| `subagent_spawning` | subagent span start |
| `subagent_ended` | subagent span update/end |
| `agent_end` | trace finalize |
Every tool call, every sub-agent delegation, every LLM interaction — captured with full context.
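To make the mapping concrete, here is a minimal sketch of how an in-gateway recorder could turn paired start/end events into spans. This is illustrative only: the `TraceRecorder` class and its method names are assumptions for the sake of the example, not the plugin's actual API.

```typescript
// Hypothetical sketch of event-to-span bookkeeping inside the gateway.
// Paired events (before_tool_call / after_tool_call, etc.) open and
// close spans keyed by a correlation id.
type SpanType = "llm" | "tool" | "subagent";

interface Span {
  type: SpanType;
  name: string;
  startedAt: number;
  endedAt?: number;
  input?: unknown;
  output?: unknown;
}

class TraceRecorder {
  spans: Span[] = [];
  private open = new Map<string, Span>();

  // Called on llm_input / before_tool_call / subagent_spawning.
  start(id: string, type: SpanType, name: string, input?: unknown): void {
    const span: Span = { type, name, startedAt: Date.now(), input };
    this.open.set(id, span);
    this.spans.push(span);
  }

  // Called on llm_output / after_tool_call / subagent_ended.
  end(id: string, output?: unknown): void {
    const span = this.open.get(id);
    if (span) {
      span.endedAt = Date.now();
      span.output = output;
      this.open.delete(id);
    }
  }
}
```

Because the recorder lives inside the gateway, the `input` it captures is the fully assembled context, not just what crosses the provider boundary.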
Three Commands to Install
```bash
openclaw plugins install @opik/opik-openclaw
openclaw opik configure
openclaw gateway restart
```

The setup wizard validates your Opik endpoint and credentials, then writes the config. Run `openclaw opik status` to verify.
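For orientation, the resulting config might look something like the fragment below. The file location and every field name here are illustrative assumptions; the wizard's actual output may differ.

```json
{
  "opik": {
    "endpoint": "https://your-opik-host.example.com",
    "apiKey": "<set by the wizard>",
    "project": "openclaw-agents"
  }
}
```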
Why This Matters for Debugging
Consider a failing agent. With proxy-based observability, you see:
- LLM input: some assembled prompt
- LLM output: some response
- Maybe the final tool call result
With native observability, you see:
- Which skills were loaded and why
- What memory was retrieved
- How tool schemas were assembled
- The full chain of sub-agent delegations
- Where exactly things went wrong
The difference between “the LLM returned an error” and “the browser tool failed because the snapshot timed out, which caused the agent to retry with a different approach, which hit the rate limit” is the difference between guessing and knowing.
Cost Visibility
OpenClaw routes to multiple models — Claude, GPT-4, Gemini, local models. Native observability tracks costs per model, per request, per session. You can finally answer “why did that conversation cost $4?” with data, not estimates.
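As a sketch of what that answer looks like in practice, per-request cost records can be rolled up by model or by session with a simple aggregation. The `CostRecord` shape below is an assumption for illustration, not Opik's actual schema.

```typescript
// Illustrative roll-up of per-request cost records into per-model or
// per-session totals, as a native collector could compute them.
interface CostRecord {
  sessionId: string;
  model: string;
  usd: number;
}

function costByKey(
  records: CostRecord[],
  key: "model" | "sessionId"
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const k = r[key];
    totals.set(k, (totals.get(k) ?? 0) + r.usd);
  }
  return totals;
}
```

The same records answer both "which model is expensive?" and "which conversation is expensive?" without re-instrumenting anything.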
Open Source, Built by Comet
Opik is Comet ML’s open-source observability platform. The OpenClaw plugin is Apache-2.0 licensed and doesn’t require any changes to OpenClaw core — it uses the native plugin hooks.
Self-host Opik or use Comet’s hosted version. Either way, your traces are yours.
The Bigger Picture
OpenClaw’s plugin ecosystem is maturing. We’ve covered the ecosystem market map and PinchBench for agent benchmarks. Observability was the missing piece.
Now you can build agents, benchmark them, and actually see what they’re doing. The black box is open.
Links: