MosaicMRI is the largest open-source raw musculoskeletal (MSK) MRI dataset to date, released by researchers at USC. It contains 2,671 fully sampled multi-coil volumes, 80,156 slices, from 454 patients across 10 anatomies. Data was collected on a 1.5T Siemens Magnetom Avantofit scanner and stored as HDF5 files with ISMRMRD-compatible headers.

What anatomies does MosaicMRI cover?

MosaicMRI covers 10 musculoskeletal anatomies with broad variation in contrast, orientation, and coil configuration, going well beyond the knee- and brain-focused datasets that have dominated MRI benchmarks. Named anatomies include the spine and ankle, alongside others across the MSK spectrum.

What MRI research challenges does MosaicMRI support?

MosaicMRI supports accelerated MRI reconstruction, low-field MRI, motion compensation, anatomy generalization, contrast generalization, and foundation model research including scaling laws, continual learning, heterogeneous data mixtures, robustness, and out-of-distribution generalization.

How do I access MosaicMRI?

MosaicMRI requires an access request at mosaicmri.ai. The dataset is ~3TB in total (train: 2,382 GiB, val: 580 GiB, test: 72 GiB), stored as HDF5 files. Code is available on GitHub, and the paper is on arXiv:2604.11762.

What makes MosaicMRI different from fastMRI?

fastMRI is primarily knee and brain data, with limited contrast and coil variation. MosaicMRI covers 10 anatomies including spine and ankle, with broad variation in contrast (PD, T1, T2, STIR, DIXON, DESS, TIRM), orientation, and coil count (4–46 channels). It's designed to stress-test generalization, the exact thing fastMRI-trained models notoriously fail at.

What are the MosaicMRI benchmark tracks?

The preliminary benchmark includes three tracks: mixed-anatomy accelerated reconstruction (4x and 8x), anatomy generalization (held-out ankle data), and contrast generalization (held-out T1 fat-suppressed data). Official benchmark tracks and community challenges will be announced over time at mosaicmri.ai/benchmark.

Who created MosaicMRI?

MosaicMRI was created by Paula Arguello, Berk Tinaz, Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi, and collaborators at USC. The paper (arXiv:2604.11762) was submitted April 13, 2026. The project builds on earlier work by former student Zalan Fabian that started three years prior.

What file format does MosaicMRI use?

MosaicMRI uses HDF5 files with ISMRMRD-compatible headers and fastMRI-style internal layout. Each file contains ismrmrd_header, kspace, and reconstruction_rss datasets, plus study fields (anatomy, protocol) and acquisition fields (scanner, channels, matrix size, FOV, TR/TE/TI, acceleration factor).
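A minimal sketch of pulling those acquisition fields out of the header with Python's standard library. It assumes the header is standard ISMRMRD XML (namespace `http://www.ismrm.org/ISMRMRD`) and that, as in fastMRI files, it can be read as bytes with h5py via `h5py.File(path, "r")["ismrmrd_header"][()]`. Which tags MosaicMRI files actually populate is not confirmed here, and the sample header values below are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by ISMRMRD XML headers.
NS = {"ismrmrd": "http://www.ismrm.org/ISMRMRD"}

# Made-up header for illustration; real files carry many more tags.
SAMPLE_HEADER = b"""<ismrmrdHeader xmlns="http://www.ismrm.org/ISMRMRD">
  <acquisitionSystemInformation>
    <systemFieldStrength_T>1.5</systemFieldStrength_T>
    <receiverChannels>16</receiverChannels>
  </acquisitionSystemInformation>
  <encoding>
    <encodedSpace>
      <matrixSize><x>640</x><y>320</y><z>1</z></matrixSize>
      <fieldOfView_mm><x>280</x><y>140</y><z>3</z></fieldOfView_mm>
    </encodedSpace>
  </encoding>
  <sequenceParameters><TR>3000</TR><TE>30</TE></sequenceParameters>
</ismrmrdHeader>"""


def summarize_header(xml_bytes):
    """Extract a few acquisition fields; tags that are absent map to None."""
    root = ET.fromstring(xml_bytes)

    def num(path):
        # Qualify every path step with the ISMRMRD namespace prefix.
        query = "/".join(f"ismrmrd:{step}" for step in path.split("/"))
        node = root.find(query, NS)
        return None if node is None or node.text is None else float(node.text)

    space = "encoding/encodedSpace"
    channels = num("acquisitionSystemInformation/receiverChannels")
    return {
        "field_T": num("acquisitionSystemInformation/systemFieldStrength_T"),
        "channels": None if channels is None else int(channels),
        "matrix_xy": (num(f"{space}/matrixSize/x"), num(f"{space}/matrixSize/y")),
        "fov_mm_xy": (num(f"{space}/fieldOfView_mm/x"), num(f"{space}/fieldOfView_mm/y")),
        "TR": num("sequenceParameters/TR"),
        "TE": num("sequenceParameters/TE"),
        "TI": num("sequenceParameters/TI"),
    }
```

Missing tags come back as `None` rather than raising, which matters because only inversion-recovery protocols set TI.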

MosaicMRI: The Largest Open-Source Musculoskeletal MRI Dataset Just Dropped

By Prahlad Menon. Published 2026-04-18.

MRI AI research has a dirty secret: most models are trained and benchmarked on knee and brain data from a handful of scanner configurations. When those models encounter a spine, an ankle, or a different coil setup, they fail. Not gracefully — dramatically.

MosaicMRI is the dataset built to fix that.

Released last week by researchers at USC, it’s the largest open-source raw musculoskeletal MRI dataset ever published: 2,671 volumes, 80,156 slices, 454 patients, 10 anatomies. Raw k-space data — the full acquisition, not just magnitude images — with realistic clinical variability across contrast, orientation, and coil configuration.

Why This Matters

The standard MRI ML datasets (fastMRI, SKM-TEA, CMRxRecon) were instrumental in demonstrating that learned reconstruction could work. But they mostly showed it works in narrow, controlled settings. Single anatomy. A handful of scanners. Limited contrast variation. Training on knee data and deploying on spine data is a known failure mode, but benchmarks haven't pushed researchers to solve it.

MosaicMRI is explicitly designed to make generalization the central challenge:

10 anatomies (vs. 1–2 in most existing datasets)
Multi-contrast: PD, T1, T2, STIR, and clinical variants including DIXON, DESS, TIRM
Multi-orientation: axial, sagittal, coronal across all anatomies
Multi-coil: 4–46 channels, with 16-channel most common
Spine included — almost entirely absent from public raw MRI datasets

The benchmark tracks operationalize this directly. The anatomy generalization challenge withholds ankle data from training. The contrast generalization challenge withholds T1 fat-suppressed. You can’t memorize your way to good scores.
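The rule behind both generalization tracks fits in a few lines: no volume carrying the held-out label may reach training. The `Volume` record and its field names below are hypothetical. In practice the anatomy and contrast labels would come from each file's metadata (for example its anatomy and protocol fields). The labels `ankle` and `T1_FS` follow the challenge folder names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical per-volume record; the field names are illustrative."""
    path: str
    anatomy: str
    contrast: str


def training_pool(volumes, held_out_anatomy=None, held_out_contrast=None):
    """Return only the volumes allowed into training for a given track."""
    return [
        v for v in volumes
        if v.anatomy != held_out_anatomy and v.contrast != held_out_contrast
    ]


volumes = [
    Volume("knee_pd.h5", "knee", "PD"),
    Volume("ankle_t2.h5", "ankle", "T2"),
    Volume("spine_t1fs.h5", "spine", "T1_FS"),
]
# Anatomy track: ankle never reaches training.
print([v.path for v in training_pool(volumes, held_out_anatomy="ankle")])
# ['knee_pd.h5', 'spine_t1fs.h5']
# Contrast track: T1 fat-suppressed never reaches training.
print([v.path for v in training_pool(volumes, held_out_contrast="T1_FS")])
# ['knee_pd.h5', 'ankle_t2.h5']
```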

The Data

Collected on a 1.5T Siemens Magnetom Avantofit scanner between July and September 2025. Every scan was visually quality-checked. Data is stored as HDF5 with ISMRMRD-compatible headers and a fastMRI-style layout, so it should drop into existing fastMRI-based reconstruction pipelines with little or no modification.

MosaicMRI/
├── multicoil_train/    1,873 scans | 303 patients | 56,235 slices | 2,382 GiB
├── multicoil_val/        398 scans |  68 patients | 12,027 slices |   580 GiB
├── multicoil_test/       400 scans |  79 patients | 11,894 slices |    72 GiB
├── anatomy_transfer_challenge/
│   └── ankle/           (held out from training — 20 files, 49 GiB)
└── contrast_generalization_challenge/
    └── T1_FS/           (held out from training — 17 files, 21 GiB)
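A quick tally checks the listing against the headline figures. Scans and slices sum exactly to the 2,671 volumes and 80,156 slices quoted earlier. The sizes add up to 3,034 GiB, about 2.96 TiB or 3.26 TB, which is consistent with "~3TB". The patient counts sum to 450 rather than the stated 454, and the post does not say where the remaining four patients fall.

```python
# Per-split figures copied from the directory listing above.
SPLITS = {
    "multicoil_train": {"scans": 1873, "patients": 303, "slices": 56235, "gib": 2382},
    "multicoil_val": {"scans": 398, "patients": 68, "slices": 12027, "gib": 580},
    "multicoil_test": {"scans": 400, "patients": 79, "slices": 11894, "gib": 72},
}

# Sum each column across the three splits.
totals = {
    key: sum(split[key] for split in SPLITS.values())
    for key in ("scans", "patients", "slices", "gib")
}

# 1 GiB = 2**30 bytes, 1 TB = 10**12 bytes, 1 TiB = 1024 GiB.
tb = totals["gib"] * 2**30 / 10**12
tib = totals["gib"] / 1024

print(totals, f"{tb:.2f} TB", f"{tib:.2f} TiB")
# {'scans': 2671, 'patients': 450, 'slices': 80156, 'gib': 3034} 3.26 TB 2.96 TiB
```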

Splits are patient-disjoint to prevent leakage, balanced by slice count with per-anatomy coverage preserved across train/val/test.
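The post does not describe the splitting procedure. One simple way to get patient-disjoint splits that track target slice shares is a greedy assignment. Take patients from largest to smallest by slice count, and give each one to the split that is furthest below its target. This is an illustrative sketch, not the authors' method. It also ignores the per-anatomy coverage constraint, which would need stratification on top.

```python
from collections import defaultdict


def patient_disjoint_split(records, fractions):
    """Assign whole patients to splits so slice totals track target fractions.

    records: iterable of (patient_id, n_slices) pairs, one per volume.
    fractions: dict of split name -> target share of slices (summing to 1).
    Returns a dict of split name -> set of patient ids.
    """
    per_patient = defaultdict(int)
    for patient_id, n_slices in records:
        per_patient[patient_id] += n_slices

    total = sum(per_patient.values())
    assigned = {name: set() for name in fractions}
    filled = {name: 0 for name in fractions}

    # Largest patients first, so smaller ones can fine-tune the balance.
    for patient_id, n in sorted(per_patient.items(), key=lambda kv: (-kv[1], kv[0])):
        # Choose the split with the largest remaining deficit vs. its target.
        name = max(fractions, key=lambda s: fractions[s] * total - filled[s])
        assigned[name].add(patient_id)
        filled[name] += n
    return assigned
```

Because a patient is assigned as a unit, its volumes can never land in two splits, which is exactly the leakage that patient-disjointness guards against.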

Each H5 file contains:

kspace: raw multi-coil acquisition data
reconstruction_rss: root-sum-of-squares reference reconstruction
ismrmrd_header: full acquisition metadata (TR, TE, TI, FOV, matrix, coil count, acceleration factor, trajectory)
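The `reconstruction_rss` reference can be approximated from `kspace` with a centered inverse FFT per coil followed by a root-sum-of-squares combine. This sketch assumes fastMRI's `(slices, coils, ky, kx)` layout and orthonormal FFT scaling. MosaicMRI's reference may also crop or rescale the result (fastMRI's knee references, for instance, are center-cropped to 320 x 320), so treat this as an approximation rather than a bit-exact reproduction.

```python
import numpy as np


def ifft2c(kspace):
    """Centered 2-D inverse FFT over the last two axes (k-space -> image)."""
    shifted = np.fft.ifftshift(kspace, axes=(-2, -1))
    image = np.fft.ifft2(shifted, axes=(-2, -1), norm="ortho")
    return np.fft.fftshift(image, axes=(-2, -1))


def rss(coil_images, coil_axis=0):
    """Root-sum-of-squares coil combination: sqrt(sum over coils of |x|^2)."""
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=coil_axis))
```

For a single slice shaped `(coils, ky, kx)` the default `coil_axis=0` applies. For a whole volume, `rss(ifft2c(kspace), coil_axis=1)` returns one magnitude image per slice.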

The Research Directions It Opens

Accelerated reconstruction across anatomy. Much of today's work in learned MRI reconstruction trains a separate model per anatomy or contrast. MosaicMRI enables, and its benchmark requires, single models that generalize across the full MSK spectrum.

Foundation models for MRI. The paper frames MosaicMRI as a testbed for foundation model questions: scaling laws (does more diverse data keep improving reconstruction?), data mixtures (which anatomy combinations transfer?), continual learning (can a model learn new anatomies without forgetting old ones?), and OOD generalization.
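The post names data mixtures as an open question without saying how a mixture is defined. One common knob, borrowed from multilingual language-model training rather than from the MosaicMRI paper, is temperature-based sampling. Each group is sampled with probability proportional to its size raised to 1/T. With T = 1 you get proportional sampling, and larger T flattens the mixture toward uniform. The anatomy names and counts below are made up.

```python
import random


def mixture_weights(counts, temperature=1.0):
    """Per-group sampling probabilities proportional to count ** (1 / temperature).

    temperature=1 samples in proportion to data size; larger temperatures
    flatten the mixture toward uniform, up-weighting rare groups.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {name: n ** (1.0 / temperature) for name, n in counts.items()}
    total = sum(scaled.values())
    return {name: s / total for name, s in scaled.items()}


# Hypothetical per-anatomy slice counts, not MosaicMRI's real distribution.
counts = {"knee": 9000, "spine": 4000, "ankle": 1000}
weights = mixture_weights(counts, temperature=2.0)

# Draw the anatomy for each of the next 8 training batches.
rng = random.Random(0)
print(rng.choices(list(weights), weights=list(weights.values()), k=8))
```

With these made-up counts, moving from T = 1 to T = 2 raises the ankle share from about 0.07 to about 0.17. A scaling or mixture study would sweep that knob and measure reconstruction quality per anatomy.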

Low-field reconstruction. The 1.5T data, combined with realistic coil variability, can serve as a proxy for the noise and artifact distributions seen in low-field scanners. The expectation is that reconstruction models trained on 1.5T data with this coil diversity will transfer better to 0.55T and 1.0T systems, a clinically important direction as low-field MRI expands globally.

Motion compensation. Real clinical scans have motion artifacts. MosaicMRI includes scans with realistic patient motion, enabling development of motion-robust reconstruction methods that don't rely on training with pristine phantom data.

The Benchmark

Three initial tracks:

Track	Challenge	Withheld Data
Mixed-anatomy reconstruction	4x and 8x acceleration, all anatomies	—
Anatomy generalization	Reconstruct ankle with no ankle training data	Ankle (20 volumes)
Contrast generalization	Reconstruct T1-FS with no T1-FS training data	T1 fat-suppressed (17 volumes)
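The post does not say how the 4x and 8x undersampling is generated. Suppose the benchmark follows fastMRI's convention (an assumption). Then each slice keeps a fully sampled band of low-frequency columns, 8% of columns at 4x and 4% at 8x in the fastMRI challenge. It also keeps randomly chosen outer columns, so that about 1/acceleration of all columns are sampled on average. A sketch of such a mask, with a hypothetical 368-column width:

```python
import numpy as np


def random_cartesian_mask(num_cols, acceleration, center_fraction, seed=None):
    """Boolean column mask in the style of fastMRI's random Cartesian masks.

    A centered band of low-frequency columns is always kept; the remaining
    columns are kept at random with a probability chosen so that about
    num_cols / acceleration columns are sampled in expectation.
    """
    num_low = int(round(num_cols * center_fraction))
    prob = (num_cols / acceleration - num_low) / (num_cols - num_low)
    if not 0.0 <= prob <= 1.0:
        raise ValueError("center_fraction too large for this acceleration")
    rng = np.random.default_rng(seed)
    mask = rng.uniform(size=num_cols) < prob
    start = (num_cols - num_low + 1) // 2
    mask[start:start + num_low] = True
    return mask


# 4x and 8x masks for a hypothetical 368-column k-space.
mask_4x = random_cartesian_mask(368, acceleration=4, center_fraction=0.08, seed=0)
mask_8x = random_cartesian_mask(368, acceleration=8, center_fraction=0.04, seed=0)
print(mask_4x.mean(), mask_8x.mean())
```

Multiplying k-space by the mask, broadcast along the last axis, zeroes the unsampled columns. This assumes, as in fastMRI, that the last axis is the phase-encode direction.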

Submit results at mosaicmri.ai/benchmark. More tracks will be announced over time.

Access and Citation

Data access requires a request at mosaicmri.ai. Code is on GitHub. Paper is arXiv:2604.11762.

@misc{arguello2026mosaicmri,
  title  = {MosaicMRI: A Diverse Dataset and Benchmark for Raw Musculoskeletal MRI},
  author = {Paula Arguello and Berk Tinaz and Mohammad Shahab Sepehri and Zalan Fabian and Maryam Soltanolkotabi},
  year   = {2026},
  eprint = {2604.11762},
  archivePrefix = {arXiv},
  primaryClass  = {eess.IV},
  url    = {https://arxiv.org/abs/2604.11762}
}

This is the kind of dataset release that changes what’s possible in a field. If you work on MRI reconstruction, foundation models for medical imaging, or clinical AI generalization — this is worth your attention.