Where Does Medical Knowledge Live Inside LLMs? Knowledge Maps Tell You
When an LLM answers a medical question correctly, where did that answer come from? Not which training document — which layers, which weights, which part of the network’s internal geometry?
A team from Lumos AI decided to find out. They systematically mapped where medical knowledge is stored inside five open-source LLMs — Llama 3.3-70B, Gemma 3-27B, MedGemma-27B, Qwen-32B, and GPT-OSS-120B — using four independent interpretability methods. The result: LLM knowledge maps that show, layer by layer, where different types of medical information are encoded.
The Four Lenses
The researchers didn’t rely on a single technique. They used four complementary methods to triangulate where knowledge lives:
- UMAP projections of intermediate activations — visualize how the model clusters medical concepts at each layer
- Gradient-based saliency — which weights contribute most to medical outputs
- Layer lesioning — remove a layer entirely, measure how much medical performance degrades
- Activation patching — replace one layer’s output with activations from a different prompt, see what breaks (a minimal sketch of this follows the list)
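To make the last lens concrete, here is a minimal sketch of activation patching with PyTorch forward hooks. This is not the team’s code: gpt2 stands in for the paper’s much larger models, and the prompts, layer index, and patched position are illustrative assumptions.

```python
# Minimal sketch of activation patching via PyTorch forward hooks.
# gpt2 is a stand-in; the same hook logic applies to any HF transformer
# whose blocks return a (hidden_states, ...) tuple.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

clean = tok("Aspirin is used to treat", return_tensors="pt")
corrupt = tok("Insulin is used to treat", return_tensors="pt")

LAYER = 6  # which block to patch (arbitrary choice for the sketch)
cache = {}

def save_hook(module, args, output):
    # Record this block's hidden states from the clean run.
    cache["h"] = output[0].detach()

def patch_hook(module, args, output):
    # Splice the clean run's last-token state into the corrupted run.
    patched = output[0].clone()
    patched[:, -1] = cache["h"][:, -1]
    return (patched,) + output[1:]

block = model.transformer.h[LAYER]

with torch.no_grad():
    handle = block.register_forward_hook(save_hook)
    model(**clean)
    handle.remove()

    handle = block.register_forward_hook(patch_hook)
    logits = model(**corrupt).logits[0, -1]
    handle.remove()

print("top next token after patching:", tok.decode(logits.argmax()))
```

The same hook machinery, pointed at different blocks and token positions, is how you sweep an entire model: if patching layer i transfers the clean answer, layer i carries the relevant information.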
Using multiple methods on the same models means the findings aren’t artifacts of one technique. When all four methods point to the same layers, you can trust the result.
The Findings That Matter
Medical knowledge concentrates early
In Llama 3.3-70B, most medical knowledge is processed in the first half of the model’s layers. This challenges the assumption that deeper layers handle more complex reasoning. For medical tasks at least, the heavy lifting happens earlier than you’d expect.
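Layer lesioning is the most direct way to see this concentration. Here is a rough sketch of such a sweep; gpt2 again stands in for the 70B model, and KL divergence from the intact model substitutes for the paper’s medical benchmarks.

```python
# Sketch of a layer-lesioning sweep: make each block an identity map in
# turn and measure how far the next-token distribution drifts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def lesion(block):
    """Replace a block's output hidden states with its input hidden
    states, turning the block into an identity map."""
    def hook(module, args, output):
        hidden_in = args[0]
        if isinstance(output, tuple):
            return (hidden_in,) + output[1:]
        return hidden_in
    return block.register_forward_hook(hook)

prompt = "Metformin is a first-line treatment for"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    base_logits = model(**inputs).logits[0, -1]

for i, block in enumerate(model.transformer.h):
    handle = lesion(block)
    with torch.no_grad():
        lesioned_logits = model(**inputs).logits[0, -1]
    handle.remove()
    # KL(intact || lesioned): how much the lesion distorts the output.
    kl = torch.nn.functional.kl_div(
        lesioned_logits.log_softmax(-1),
        base_logits.softmax(-1),
        reduction="sum",
    )
    print(f"layer {i:2d}: KL from intact model = {kl:.3f}")
```

In the paper’s framing, “knowledge concentrates early” means the big degradation spikes cluster in the first half of such a sweep.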
Age isn’t a number line
How does a model represent “the patient is 45 years old”? You might assume it’s a linear encoding — a smooth gradient from young to old. It’s not. Age is encoded non-linearly, with sharp discontinuities. The boundary between under-18 and over-18 shows up as a clear break in the representation — the model has internalized a categorical distinction that mirrors legal and clinical significance.
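A finding like this is cheap to test on any open model. The sketch below is a hypothetical probe, not the paper’s method: it embeds ages around the legal boundary and looks for a jump in the distance between consecutive ages’ activations. The prompt template and layer index are assumptions.

```python
# Hypothetical probe for the age discontinuity: a categorical break
# should show up as a spike at the 17 -> 18 step.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()
LAYER = 6  # arbitrary layer for the sketch

acts = []
with torch.no_grad():
    for age in range(10, 26):
        inputs = tok(f"The patient is {age} years old.", return_tensors="pt")
        hidden = model(**inputs, output_hidden_states=True).hidden_states
        acts.append(hidden[LAYER][0, -1])  # last-token activation

for age, (a, b) in zip(range(10, 25), zip(acts, acts[1:])):
    print(f"{age} -> {age + 1}: activation distance {torch.dist(a, b).item():.2f}")
```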
Disease progression is circular
This is the strangest finding. You’d expect disease progression to be represented as a sequence: early → moderate → severe. Instead, at certain layers, progression follows a circular pattern in activation space. The representation wraps around, which could reflect how diseases cycle through remission and relapse, or how the model learned from clinical narratives that don’t always follow linear trajectories.
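You can look for this geometry yourself with nothing fancier than PCA. The sketch below is a hedged illustration: the stage phrases, prompt template, and layer are placeholder assumptions, and a real analysis would use many more prompts per stage.

```python
# Hypothetical check for circular structure: project stage activations
# to 2D and inspect the angle each stage makes around the centroid. A
# monotone sweep of angles that wraps back toward the start is the
# circular signature; a straight segment is not.
import numpy as np
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()
STAGES = ["early stage", "moderate stage", "severe stage",
          "remission", "relapse"]
LAYER = 6

acts = []
with torch.no_grad():
    for stage in STAGES:
        inputs = tok(f"The disease is in {stage}.", return_tensors="pt")
        hidden = model(**inputs, output_hidden_states=True).hidden_states
        acts.append(hidden[LAYER][0, -1].numpy())

xy = PCA(n_components=2).fit_transform(np.stack(acts))
xy -= xy.mean(axis=0)  # measure angles around the centroid
for stage, (x, y) in zip(STAGES, xy):
    print(f"{stage:15s} angle = {np.degrees(np.arctan2(y, x)):7.1f} deg")
```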
Drugs cluster by specialty, not mechanism
In Llama 3.3-70B, drugs organize themselves by medical specialty (cardiology drugs together, oncology drugs together) rather than by pharmacological mechanism of action. This is exactly how clinicians think about medications — by the condition they treat, not by the receptor they bind — and suggests the model learned clinical practice patterns rather than pharmacology textbook structure.
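This is exactly the kind of structure the UMAP lens surfaces. A toy version follows; the drug list, specialty labels, prompt, and layer are all illustrative stand-ins for the paper’s setup.

```python
# Toy UMAP projection of drug-name activations. With real models and
# hundreds of drugs, you would color points by specialty and by
# mechanism and see which labeling yields clean clusters.
import numpy as np
import torch
import umap  # pip install umap-learn
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()
DRUGS = {"atenolol": "cardiology", "amlodipine": "cardiology",
         "lisinopril": "cardiology", "atorvastatin": "cardiology",
         "cisplatin": "oncology", "paclitaxel": "oncology",
         "rituximab": "oncology", "tamoxifen": "oncology"}
LAYER = 6

acts = []
with torch.no_grad():
    for drug in DRUGS:
        inputs = tok(f"The drug {drug}", return_tensors="pt")
        hidden = model(**inputs, output_hidden_states=True).hidden_states
        acts.append(hidden[LAYER][0, -1].numpy())

xy = umap.UMAP(n_neighbors=4, random_state=0).fit_transform(np.stack(acts))
for (drug, specialty), point in zip(DRUGS.items(), xy):
    print(f"{drug:12s} ({specialty}): {point}")
```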
Activation collapse and recovery
Gemma 3-27B and MedGemma-27B show a phenomenon where activations collapse at intermediate layers — losing discriminative structure — then recover by the final layers. This is architecturally concerning. It means there are layers in the network that contribute nothing meaningful to medical reasoning, or worse, that the model is working around its own internal bottleneck.
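Collapse is measurable with a simple metric. One plausible choice, assuming “discriminative structure” means distinct concepts staying far apart: mean pairwise cosine distance between concept activations at each layer. A dip at intermediate layers followed by a rebound is the collapse-and-recovery signature. The concept list below is a placeholder.

```python
# Per-layer discriminability: mean pairwise cosine distance between
# the activations of distinct medical concepts.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()
CONCEPTS = ["pneumonia", "diabetes", "melanoma", "arrhythmia"]

reps = []
with torch.no_grad():
    for concept in CONCEPTS:
        inputs = tok(concept, return_tensors="pt")
        hidden = model(**inputs, output_hidden_states=True).hidden_states
        reps.append(torch.stack([h[0, -1] for h in hidden]))  # (layers, d)
reps = torch.stack(reps)  # (concepts, layers, d)

n = len(CONCEPTS)
off_diag = ~torch.eye(n, dtype=torch.bool)
for layer in range(reps.shape[1]):
    x = torch.nn.functional.normalize(reps[:, layer], dim=-1)
    dist = 1 - (x @ x.T)[off_diag].mean()
    print(f"layer {layer:2d}: mean pairwise cosine distance {dist:.3f}")
```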
Why This Matters Beyond Research
The practical implication is concrete: if you know which layers store which knowledge, you can target interventions.
Fine-tuning. Instead of fine-tuning the entire model, target the layers where medical knowledge concentrates. This is cheaper and less likely to damage other capabilities.
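Concretely, that can be as simple as freezing everything outside the mapped layers. A sketch, assuming a hypothetical knowledge map that points at blocks 4-11 of a GPT-2-style model:

```python
# Layer-targeted fine-tuning: freeze all parameters, then unfreeze only
# the blocks the knowledge map flagged, and train as usual.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
TARGET_LAYERS = range(4, 12)  # hypothetical output of a knowledge map

for param in model.parameters():
    param.requires_grad = False
for i in TARGET_LAYERS:
    for param in model.transformer.h[i].parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable / total:.1%} of parameters")
```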
Debiasing. If you find demographic biases in specific layers (the age discontinuity is a clear example), you can apply corrections surgically rather than hoping RLHF catches everything.
Model editing. Need to update a drug interaction that changed in clinical guidelines? Layer-specific editing is more tractable than retraining.
Safety auditing. Before deploying a medical LLM, map its knowledge layers. If critical clinical knowledge is concentrated in layers that also show activation collapse or instability, that’s a red flag.
The Connection to Abliteration
This work sits in the same intellectual space as abliteration research — the idea that understanding where features live geometrically inside a network gives you power over those features. Abliteration found that refusal lives in a single direction. This paper finds that medical knowledge lives in specific layers with specific geometric properties.
The implication is the same: linear algebra on the right layers can modify model behavior with surgical precision. The question is whether we use that capability for safety (targeted debiasing, knowledge updates) or risk (removing clinical guardrails).
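For a feel of what “linear algebra on the right layers” means, here is the core projection in miniature. The direction below is random for the sake of a runnable sketch; a real one would be derived from contrastive activations, as in abliteration.

```python
# Project a direction v out of a layer's output weights so the layer
# can no longer write along v. Shapes follow GPT-2's Conv1D convention,
# where the projection computes x @ W with W of shape (in, out).
import torch

d_model = 768
W_out = torch.randn(4 * d_model, d_model)  # MLP output projection

v = torch.randn(d_model)
v = v / v.norm()  # unit direction to ablate

# Right-multiplying by (I - v v^T) removes each row's component along v,
# so the layer's contribution becomes orthogonal to v.
W_ablated = W_out - (W_out @ v).outer(v)

print("component along v before:", (W_out @ v).norm().item())
print("component along v after: ", (W_ablated @ v).norm().item())
```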
The Models and the Code
The five models analyzed span the current open-source medical AI landscape:
- Llama 3.3-70B — Meta’s general-purpose flagship
- Gemma 3-27B — Google’s efficient architecture
- MedGemma-27B — Google’s medical fine-tune of Gemma
- Qwen-32B — Alibaba’s competitive mid-size model
- GPT-OSS-120B — OpenAI’s large-scale open-weight model
The paper and source code are available for reproducibility, and the methodology generalizes to any transformer-based LLM. If you’re deploying medical AI, mapping your model’s knowledge layers before deployment isn’t just good science — it’s due diligence.
Medical AI is moving from “does it get the right answer?” to “does it get the right answer for the right reasons, in the right part of the network?” This paper is a significant step toward that second question.