APOLLO: A Foundation Model That Turns 33 Years of Hospital Data Into Virtual Patients

By Prahlad Menon · 4 min read

Most medical AI models solve one problem. A radiology model reads chest X-rays. A clinical NLP model extracts diagnoses from notes. A genomics model predicts drug response. Each operates in its own silo, on its own modality, for its own task.

APOLLO does something fundamentally different. It learns from everything — labs, clinical notes, pathology images, medications, diagnoses, procedures — across time, creating what the authors call virtual patient representations: dense, computable embeddings that capture a patient’s entire longitudinal trajectory.

The Scale

The numbers are staggering:

  • 25 billion clinical events
  • 7.2 million patients
  • 33 years of longitudinal hospital records from a major US hospital system
  • 28 modalities unified into a single representation space

This isn’t a model trained on curated research datasets. It’s trained on the full, messy, heterogeneous reality of healthcare delivery — decades of it.

Why Temporal Matters

Most clinical AI treats each encounter as independent. A model looks at today’s labs and makes a prediction. But medicine doesn’t work that way. A hemoglobin A1c of 7.2 means something very different for a patient whose last three readings were 6.8, 7.0, and 7.1 (trending up) versus one whose readings were 9.4, 8.1, and 7.6 (improving on treatment).
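To make the contrast concrete, here's a minimal sketch (illustrative only, not APOLLO's actual feature engineering) of how the same snapshot value carries opposite meaning once trajectory is considered:

```python
# Toy illustration: the same current HbA1c reads very differently
# depending on the trajectory that preceded it.

def trend(readings):
    """Average change per step across a sequence of lab values."""
    deltas = [b - a for a, b in zip(readings, readings[1:])]
    return sum(deltas) / len(deltas)

worsening = [6.8, 7.0, 7.1, 7.2]   # drifting upward
improving = [9.4, 8.1, 7.6, 7.2]   # responding to treatment

# Identical snapshot...
assert worsening[-1] == improving[-1] == 7.2
# ...opposite trajectories.
print(trend(worsening))  # positive: patient is worsening
print(trend(improving))  # negative: patient is improving
```

A snapshot model sees 7.2 in both cases; a temporal model sees two different patients.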

APOLLO’s temporal architecture captures these trajectories natively. It doesn’t just see a snapshot — it sees the movie. Lab values, medication changes, imaging findings, clinical notes — all sequenced in time and learned jointly.

This is what makes the “virtual patient” framing precise rather than marketing. The model produces an embedding that encodes not just what is happening clinically, but how the patient got there and where they’re heading.

Why Multimodal Matters

Healthcare generates wildly different types of data. A pathology slide is a gigapixel image. A clinical note is free text. A medication list is structured data. A lab panel is a time series. An ICD code is a categorical label.

Most approaches handle these separately, then try to fuse predictions downstream. APOLLO learns a shared representation across all 28 modalities from the start. The model understands that a pathology finding, a lab trend, and a medication change can all describe the same underlying biological process — because it’s seen millions of examples where they do.
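The core idea, stripped of scale, is per-modality encoders projecting into one shared vector space. Here's a hypothetical sketch (the encoder shapes, dimensions, and pooling are invented for illustration; APOLLO's actual architecture is described in the preprint):

```python
import numpy as np

# Hypothetical sketch: each modality gets its own encoder that maps
# raw features into the SAME shared embedding space, so heterogeneous
# data can be pooled into one "virtual patient" vector.
rng = np.random.default_rng(0)
SHARED_DIM = 64

def make_encoder(input_dim, shared_dim=SHARED_DIM):
    # A linear projection stands in for a learned encoder.
    W = rng.standard_normal((input_dim, shared_dim)) / np.sqrt(input_dim)
    return lambda x: x @ W

encode_labs  = make_encoder(input_dim=20)    # e.g., a 20-value lab panel
encode_notes = make_encoder(input_dim=300)   # e.g., a note text embedding
encode_meds  = make_encoder(input_dim=50)    # e.g., a medication vector

# All three land in the same 64-dimensional space...
lab_vec  = encode_labs(rng.standard_normal(20))
note_vec = encode_notes(rng.standard_normal(300))
med_vec  = encode_meds(rng.standard_normal(50))

# ...so they can be pooled into a single patient representation.
patient_embedding = (lab_vec + note_vec + med_vec) / 3
print(patient_embedding.shape)  # (64,)
```

The point of the shared space is that relationships can be learned *across* modalities, not just within them.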

What This Enables

The authors highlight several downstream applications, but the implications run deeper than any single task:

Risk prediction. With a rich longitudinal embedding, you can predict adverse events, readmissions, or disease progression — not from a handful of features, but from the full context of a patient’s medical history.

Treatment response modeling. Given a patient trajectory and a proposed intervention, predict the likely outcome. This is where virtual patient representations could transform clinical trial design.

Clinical trial matching. Instead of matching patients to trials on a few inclusion criteria, match on deep phenotypic similarity — patients whose trajectories look like they’d benefit from a specific intervention.

Biomarker discovery. When the model learns that certain trajectory patterns predict outcomes, the features driving those predictions become candidate biomarkers.

Agentic clinical systems. Perhaps most forward-looking: APOLLO-style representations could serve as the patient context layer for AI agents that assist with clinical decision-making — agents that understand not just the current state, but the full history.
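Several of these applications reduce to the same primitive: comparing patients in embedding space. A toy sketch of trial matching by phenotypic similarity (random vectors stand in for learned patient embeddings; the centroid-matching scheme is an assumption, not the paper's method):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(42)
dim = 64

# A cohort of patients who responded well to some intervention,
# summarized as the mean of their embeddings.
responders = rng.standard_normal((100, dim))
cohort_centroid = responders.mean(axis=0)

# Rank candidate patients by similarity to the responder profile.
candidates = {f"patient_{i}": rng.standard_normal(dim) for i in range(5)}
ranked = sorted(candidates,
                key=lambda p: cosine(candidates[p], cohort_centroid),
                reverse=True)
print(ranked[0])  # the closest phenotypic match under this toy metric
```

With embeddings that encode full trajectories rather than a few inclusion criteria, "similar patients" becomes a far richer notion than rule-based matching allows.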

The Bigger Picture

Healthcare has been waiting for its “foundation model moment.” Not a chatbot that answers medical questions — those already exist — but a model that actually understands patient data at the level of complexity that medicine demands.

APOLLO’s contribution is showing that this is architecturally feasible. A single model can learn meaningful representations across modalities, across time, across disease states, at the scale of an entire hospital system.

The disease-agnostic design is particularly significant. Rather than training separate models for cardiology, oncology, and neurology, APOLLO learns the shared structure underlying human health and disease. The same model that tracks a cancer patient’s treatment response can track a diabetic’s metabolic trajectory — because at the representation level, it’s learned what “clinical trajectory” means in general.

The preprint is available now. The team — Andrew Zhang, Tong Ding, Sophia J. Wagner, and collaborators from Faisal Mahmood’s group — has built what could be the most comprehensive clinical foundation model to date.

Whether APOLLO itself becomes the standard or inspires the next generation of clinical foundation models, the direction is clear: the future of medical AI is multimodal, temporal, and patient-centric. Not single-task classifiers on isolated snapshots, but unified models that learn what it means to be a patient over time.