Google Just Standardized Karpathy's LLM Wiki Pattern

By Prahlad Menon | Published 2026-06-17 | 6 min read

Andrej Karpathy posted a gist about LLM wikis last year. The idea: give your AI agents a shared markdown library that grows more useful over time. Let agents handle the tedious parts — updating cross-references, filing changes, maintaining indexes — while humans curate the content and manage it like code.

The pattern spread everywhere. Obsidian vaults wired to coding agents. The AGENTS.md / CLAUDE.md family of convention files. Repos full of index.md and log.md artifacts that agents consult before doing real work. Metadata-as-code repositories inside data teams.

The problem: every instance was bespoke, and none was designed to interoperate with the others.

Google just fixed that. The Open Knowledge Format (OKF) formalizes the pattern into an open specification — vendor-neutral, agent-friendly, human-readable, and portable across organizations.

What OKF Actually Is

A bundle of OKF documents is:

Just markdown — readable in any editor, renderable on GitHub, indexable by any search tool
Just files — shippable as a tarball, hostable in any git repo, mountable on any filesystem
Just YAML frontmatter — for the small set of structured fields that need to be queryable: type, title, description, resource, tags, and timestamp

That’s it. No complex compression scheme, no new runtime, no required SDK.

The full v0.1 specification fits on a single page. Every concept is one file. The file path is the concept’s identity. Concepts link to each other with normal markdown links, turning the directory into a graph.
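To make the "directory as a graph" idea concrete, here is a rough Python sketch that extracts the link graph from a bundle. It is not part of the spec or of Google's tooling. The function name `build_link_graph`, the in-memory `bundle` dict, and the rule that a leading slash means "relative to the bundle root" are all assumptions made for illustration.

```python
import posixpath
import re

# Inline markdown links like [text](target); image links (![...]) are skipped.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")

def build_link_graph(bundle):
    """Map each concept path to the sorted concept paths it links to.

    `bundle` is a dict of {path: markdown text}, with paths relative
    to the bundle root (a hypothetical in-memory stand-in for a directory).
    """
    graph = {}
    for path, text in bundle.items():
        targets = set()
        for target in LINK_RE.findall(text):
            target = target.split("#", 1)[0]  # drop any fragment
            if "://" in target or not target.endswith(".md"):
                continue  # external URL or non-concept link
            if target.startswith("/"):
                # Assumption: a leading slash means the bundle root.
                resolved = target.lstrip("/")
            else:
                base = posixpath.dirname(path)
                resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved in bundle and resolved != path:
                targets.add(resolved)
        graph[path] = sorted(targets)
    return graph

bundle = {
    "tables/orders.md": "FK into [customers](/tables/customers.md). See [index](../index.md).",
    "tables/customers.md": "Referenced by [orders](orders.md).",
    "index.md": "Start at [orders](tables/orders.md).",
}
graph = build_link_graph(bundle)
print(graph["tables/orders.md"])  # ['index.md', 'tables/customers.md']
```

Because the file path is the identity, the graph needs no ID registry: resolving a link is just path arithmetic.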

Example Concept Document

---
type: BigQuery Table
title: Customer Orders
description: One row per completed customer order across all channels.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, orders, revenue]
timestamp: 2026-05-28T14:30:00Z
---

# Schema

| Column        | Type      | Description                              |
|---------------|-----------|------------------------------------------|
| `order_id`    | STRING    | Globally unique order identifier.        |
| `customer_id` | STRING    | FK into [customers](/tables/customers.md)|
| `total_usd`   | NUMERIC   | Order total in US dollars.               |

# Joins

This table joins to [customers](/tables/customers.md) on `customer_id`.

One required field: type. Everything else is optional but recommended.
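A consumer-side check for that one required field might look like the following sketch. The `parse_concept` helper is hypothetical and handles only flat `key: value` lines and inline `[a, b]` lists, which is enough for the example above; a real consumer would presumably use a full YAML parser instead.

```python
def parse_concept(text):
    """Split an OKF-style document into (frontmatter dict, markdown body).

    Simplified on purpose: flat `key: value` lines and inline lists only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter block")
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        raise ValueError("unterminated frontmatter block") from None
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so URLs and timestamps survive.
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value
    if not meta.get("type"):
        raise ValueError("frontmatter must include a non-empty 'type' field")
    return meta, "\n".join(lines[end + 1:]).strip()

doc = """---
type: BigQuery Table
title: Customer Orders
tags: [sales, orders, revenue]
timestamp: 2026-05-28T14:30:00Z
---

# Schema
"""
meta, body = parse_concept(doc)
print(meta["type"], meta["tags"])  # BigQuery Table ['sales', 'orders', 'revenue']
```

Everything except `type` passes through untouched, which mirrors the "optional but recommended" stance: the check rejects a document only when `type` is absent or empty.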

The Problem OKF Solves

In most organizations, the information that foundation models need is internal knowledge: the schema of a table, the business definition of a metric, the runbook for an incident, the join paths between two systems, the deprecation notice for an old API.

Today, these atoms of knowledge live in fragmented systems:

Metadata catalogs with their own APIs
Wikis, third-party systems, or shared drives
Code comments, docstrings, or notebook cells
The heads of a few senior engineers

When an AI agent needs to answer “How do I compute weekly active users from our event stream?” it has to assemble the answer from these scattered, mutually incompatible surfaces. Every agent builder solves the same context-assembly problem from scratch. Every catalog vendor reinvents the same data models. The knowledge itself is locked behind whichever surface created it.

OKF is the interchange format that lets these surfaces finally talk to each other.

Three Design Principles

1. Minimally opinionated. OKF requires exactly one thing of every concept: a type field. Everything else — what types exist, what other fields to include, what sections the body has — is left to the producer. The spec defines the interoperability surface, not the content model.

2. Producer/consumer independence. A bundle hand-authored by a human can be consumed by an AI agent. A bundle generated by a metadata export pipeline can be browsed in a visualizer. A bundle synthesized by one LLM can be queried by another. The format is the contract; the tooling at each end is independently swappable.

3. Format, not platform. OKF isn’t tied to any specific cloud, database, model provider, or agent framework. It never requires a proprietary account or SDK to read, write, or serve. Google is publishing it as an open standard because the value of a knowledge format comes from how many parties speak it, not from who owns it.
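The second principle is easiest to see in code. In the sketch below, a producer writes concept documents and a consumer filters them by tag. The two functions share nothing except the text format. Both function names, the `Metric` type, and the `metrics/wau.md` path are made up for illustration and do not come from the spec.

```python
def render_concept(meta, body):
    """Producer side: serialize a concept as frontmatter plus markdown body."""
    if not meta.get("type"):
        raise ValueError("'type' is the one required field")
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            value = "[" + ", ".join(value) + "]"  # inline YAML list
        lines.append(f"{key}: {value}")
    lines += ["---", "", body.strip(), ""]
    return "\n".join(lines)

def concepts_with_tag(bundle, tag):
    """Consumer side: return paths whose frontmatter `tags` line lists `tag`.

    Works on the text alone; it has no knowledge of render_concept.
    """
    hits = []
    for path, text in bundle.items():
        header = text.split("\n---", 1)[0]  # frontmatter only, body ignored
        for line in header.splitlines():
            if line.startswith("tags:"):
                tags = [t.strip() for t in line[5:].strip().strip("[]").split(",")]
                if tag in tags:
                    hits.append(path)
    return sorted(hits)

bundle = {
    "tables/orders.md": render_concept(
        {"type": "BigQuery Table", "title": "Customer Orders", "tags": ["sales", "orders"]},
        "# Schema",
    ),
    "metrics/wau.md": render_concept(  # hypothetical concept
        {"type": "Metric", "title": "Weekly Active Users", "tags": ["engagement"]},
        "# Definition",
    ),
}
print(concepts_with_tag(bundle, "sales"))  # ['tables/orders.md']
```

Swap the producer for a hand-edited file or an export pipeline and the consumer does not change, which is the point of treating the format as the contract.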

What Google Ships With the Spec

To make the format concrete, Google is shipping reference implementations alongside the spec:

Enrichment agent — Walks a BigQuery dataset, drafts an OKF concept document for every table and view, then runs a second LLM pass that crawls authoritative documentation and enriches each concept with citations, schemas, and join paths.
Static HTML visualizer — Turns any OKF bundle into an interactive graph view in a single self-contained file. No backend, no install on the viewing side, no data leaves the page.
Sample bundles — GA4 e-commerce, Stack Overflow, and Bitcoin public datasets, ready to browse.

How This Connects to Tools You Already Use

We’ve covered several tools that implement variations of the LLM wiki pattern:

Tolaria — A Mac/Linux desktop app that turns markdown folders into AI-native knowledge bases with Git versioning and built-in MCP server support. Could adopt OKF as its interchange format.
obsidian-mind — Uses Obsidian vaults for AI coding agent memory. The frontmatter conventions are OKF-adjacent; alignment would enable cross-tool portability.
The Karpathy CLAUDE.md — The four principles for AI coding agents. OKF is the standardization of the knowledge layer those agents consume.

OKF isn’t competing with these tools — it’s the common language that lets them interoperate.

Why This Matters

The pattern was always good. What was missing was agreement.

If Tolaria exports OKF, and your company’s data catalog imports OKF, and Claude Code reads OKF natively, you’ve got a knowledge layer that survives tool changes, org changes, and vendor changes. The knowledge compounds instead of fragmenting.

Google’s backing could give OKF the enterprise momentum to become a de facto standard. The spec is minimal enough that adoption should be cheap: if you already write markdown with YAML frontmatter, the main remaining step is adding a type field to each document.

The Elephant in the Room: Obsidian Did This First

OKF didn’t emerge from a vacuum. The Google Cloud blog acknowledges this explicitly: “If you’ve used Obsidian, Notion, Hugo, or any of the LLM wiki patterns that have emerged over the past year, the shape will feel familiar.”

That’s putting it mildly. The Obsidian community has been doing markdown-with-YAML-frontmatter knowledge management for years. The index.md and log.md conventions that OKF reserves? They exist in countless Obsidian vaults already. The graph-of-linked-concepts model? That’s literally Obsidian’s signature feature.

What Google did is take community patterns, formalize the minimal set of conventions needed for interoperability, and ship it with enterprise backing. Whether that’s “standardizing best practices” or “codifying someone else’s namespace” depends on your perspective.

The charitable read: Google created a vendor-neutral interchange format that lets tools like Obsidian, Notion, and enterprise data catalogs finally speak the same language. Nobody owns markdown.

The cynical read: A multi-trillion-dollar company just claimed the spec namespace for patterns the PKM community developed organically, and now everyone will cite Google’s spec instead of the community work that preceded it.

Reality is probably somewhere in between. OKF is Apache 2.0 licensed and genuinely minimal. If it becomes the interchange format that finally makes PKM tools interoperable, everyone wins — even if the credit distribution feels off.

Spec: GoogleCloudPlatform/knowledge-catalog/okf/SPEC.md

Blog post: How the Open Knowledge Format can improve data sharing

Karpathy’s original gist: LLM Wiki