MosaicMRI: The Largest Open-Source Musculoskeletal MRI Dataset Just Dropped
MRI AI research has a dirty secret: most models are trained and benchmarked on knee and brain data from a handful of scanner configurations. When those models encounter a spine, an ankle, or a different coil setup, they fail. Not gracefully, but dramatically.
MosaicMRI is the dataset built to fix that.
Released last week by researchers at USC, it's the largest open-source raw musculoskeletal MRI dataset ever published: 2,671 volumes, 80,156 slices, 454 patients, 10 anatomies. It ships raw k-space data (the full acquisition, not just magnitude images) with realistic clinical variability across contrast, orientation, and coil configuration.
Why This Matters
The standard MRI ML datasets (fastMRI, SKM-TEA, CMRxRecon) were instrumental in demonstrating that learned reconstruction could work. But they mostly showed it works in narrow, controlled settings. Single anatomy. Single scanner. Limited contrast variation. Training on knee data and deploying on spine data is a known failure mode, but benchmarks haven't pushed researchers to solve it.
MosaicMRI is explicitly designed to make generalization the central challenge:
- 10 anatomies (vs. 1-2 in most existing datasets)
- Multi-contrast: PD, T1, T2, STIR, and clinical variants including DIXON, DESS, TIRM
- Multi-orientation: axial, sagittal, coronal across all anatomies
- Multi-coil: 4 to 46 channels, with 16-channel most common
- Spine included, which is almost entirely absent from public raw MRI datasets
The benchmark tracks operationalize this directly. The anatomy generalization challenge withholds ankle data from training. The contrast generalization challenge withholds T1 fat-suppressed. You can't memorize your way to good scores.
The Data
Collected on a 1.5T Siemens Magnetom Avantofit scanner between July and September 2025. Every scan was visually quality-checked. Stored as HDF5 with ISMRMRD-compatible headers and fastMRI-style layout, so it is drop-in compatible with existing reconstruction pipelines.
MosaicMRI/
├── multicoil_train/    1,873 scans | 303 patients | 56,235 slices | 2,382 GiB
├── multicoil_val/        398 scans |  68 patients | 12,027 slices |   580 GiB
├── multicoil_test/       400 scans |  79 patients | 11,894 slices |    72 GiB
├── anatomy_transfer_challenge/
│   └── ankle/    (held out from training; 20 files, 49 GiB)
└── contrast_generalization_challenge/
    └── T1_FS/    (held out from training; 17 files, 21 GiB)
Splits are patient-disjoint to prevent leakage, balanced by slice count with per-anatomy coverage preserved across train/val/test.
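To make the patient-disjoint idea concrete, here is a minimal sketch of one way such a split could be built. It is not the authors' procedure: the `(scan_id, patient_id, n_slices)` schema and the greedy balancing rule are assumptions for illustration, and it does not attempt to preserve per-anatomy coverage.

```python
from collections import defaultdict


def patient_disjoint_split(scans, fractions):
    """Assign whole patients to splits, greedily balancing slice counts.

    scans: list of (scan_id, patient_id, n_slices) tuples (hypothetical schema).
    fractions: dict mapping split name -> target fraction of all slices.
    Returns (splits, assignment): scan ids per split, and split per patient.
    """
    # A patient is the unit of assignment, so total up slices per patient.
    slices_per_patient = defaultdict(int)
    for _, patient_id, n_slices in scans:
        slices_per_patient[patient_id] += n_slices

    total = sum(slices_per_patient.values())
    targets = {name: frac * total for name, frac in fractions.items()}
    filled = {name: 0 for name in fractions}
    assignment = {}

    # Place the largest patients first, each into the split that is
    # currently furthest below its slice-count target.
    ordered = sorted(slices_per_patient.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for patient_id, n_slices in ordered:
        split = max(fractions, key=lambda name: targets[name] - filled[name])
        assignment[patient_id] = split
        filled[split] += n_slices

    # Every scan follows its patient, so no patient spans two splits.
    splits = {name: [] for name in fractions}
    for scan_id, patient_id, _ in scans:
        splits[assignment[patient_id]].append(scan_id)
    return splits, assignment
```

Because assignment happens at the patient level, every scan of a given patient lands in the same split, which is the property that prevents leakage between train, val, and test.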
Each H5 file contains:
- k-space: raw multi-coil acquisition data
- reconstruction_rss: root-sum-of-squares reference reconstruction
- ISMRMRD header: full acquisition metadata (TR, TE, TI, FOV, matrix, coil count, acceleration factor, trajectory)
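As a rough illustration of how the raw data relates to the reference image, the sketch below computes a root-sum-of-squares reconstruction from multi-coil k-space with NumPy. The HDF5 keys `kspace` and `reconstruction_rss` follow the fastMRI convention and are assumptions here, as is the `(slices, coils, rows, cols)` array layout; check an actual file before relying on them.

```python
import numpy as np


def centered_ifft2(kspace):
    """Centered, orthonormal 2D inverse FFT over the last two axes."""
    shifted = np.fft.ifftshift(kspace, axes=(-2, -1))
    image = np.fft.ifft2(shifted, axes=(-2, -1), norm="ortho")
    return np.fft.fftshift(image, axes=(-2, -1))


def rss_reconstruction(kspace, coil_axis=1):
    """Root-sum-of-squares combination of per-coil images.

    kspace: complex array, assumed shape (slices, coils, rows, cols).
    Returns a real array of shape (slices, rows, cols).
    """
    coil_images = centered_ifft2(kspace)
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=coil_axis))


def load_volume(path):
    """Read k-space and the reference image from one file.

    The dataset keys below are assumed (fastMRI-style), not confirmed.
    """
    import h5py  # imported lazily so the functions above work without it

    with h5py.File(path, "r") as f:
        kspace = f["kspace"][()]                 # assumed key
        reference = f["reconstruction_rss"][()]  # assumed key
    return kspace, reference
```

The stored reference may be cropped or scaled differently from this naive computation, so an exact match against `reconstruction_rss` should not be expected without checking how it was produced.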
The Research Directions It Opens
Accelerated reconstruction across anatomy. Today's state of the art in learned MRI reconstruction trains a separate model per anatomy/contrast. MosaicMRI enables, and the benchmark requires, single models that generalize across the full MSK spectrum.
Foundation models for MRI. The paper frames MosaicMRI as a testbed for foundation model questions: scaling laws (does more diverse data keep improving reconstruction?), data mixtures (which anatomy combinations transfer?), continual learning (can a model learn new anatomies without forgetting old ones?), and OOD generalization.
Low-field reconstruction. The 1.5T data combined with realistic coil variability creates a proxy for the noise and artifact distributions seen in low-field scanners. Training reconstruction models on 1.5T with coil diversity improves performance on 0.55T and 1.0T systems, a clinically important direction as low-field MRI expands globally.
Motion compensation. Real clinical scans have motion artifacts. MosaicMRI includes scans with realistic patient motion, enabling development of motion-robust reconstruction methods that aren't trained on pristine phantom data.
The Benchmark
Three initial tracks:
| Track | Challenge | Withheld Data |
|---|---|---|
| Mixed-anatomy reconstruction | 4x and 8x acceleration, all anatomies | None |
| Anatomy generalization | Reconstruct ankle with no ankle training data | Ankle (20 volumes) |
| Contrast generalization | Reconstruct T1-FS with no T1-FS training data | T1 fat-suppressed (17 volumes) |
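
For readers new to accelerated MRI, the 4x and 8x figures refer to how much of k-space is skipped at acquisition time. The sketch below simulates that retrospectively with an equispaced Cartesian mask plus a fully sampled center band. The mask type, the `center_fraction` default, and the function names are assumptions for illustration; the benchmark's actual sampling pattern is not described here.

```python
import numpy as np


def equispaced_mask(num_cols, acceleration, center_fraction=0.08, offset=0):
    """Boolean mask over phase-encode columns.

    Keeps every `acceleration`-th column starting at `offset`, plus a
    fully sampled band of low frequencies around the center.
    """
    mask = np.zeros(num_cols, dtype=bool)
    mask[offset::acceleration] = True

    # Fully sample the central low-frequency columns.
    num_center = max(1, int(round(num_cols * center_fraction)))
    start = (num_cols - num_center) // 2
    mask[start:start + num_center] = True
    return mask


def undersample(kspace, mask):
    """Zero out unsampled columns; the mask broadcasts over the last axis."""
    return kspace * mask


# Example: simulate nominal 4x acceleration on synthetic k-space
# shaped (slices, coils, rows, cols).
rng = np.random.default_rng(0)
kspace = rng.standard_normal((2, 4, 16, 320)) + 1j * rng.standard_normal((2, 4, 16, 320))
mask_4x = equispaced_mask(320, acceleration=4)
kspace_4x = undersample(kspace, mask_4x)
```

Because the center band adds extra lines, the effective acceleration of such a mask is somewhat lower than the nominal factor.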
Submit results at mosaicmri.ai/benchmark. More tracks announced over time.
Access and Citation
Data access requires a request at mosaicmri.ai. Code is on GitHub. Paper is arXiv:2604.11762.
@misc{arguello2026mosaicmri,
title = {MosaicMRI: A Diverse Dataset and Benchmark for Raw Musculoskeletal MRI},
author = {Paula Arguello and Berk Tinaz and Mohammad Shahab Sepehri and Zalan Fabian and Maryam Soltanolkotabi},
year = {2026},
eprint = {2604.11762},
archivePrefix = {arXiv},
primaryClass = {eess.IV},
url = {https://arxiv.org/abs/2604.11762}
}
This is the kind of dataset release that changes what's possible in a field. If you work on MRI reconstruction, foundation models for medical imaging, or clinical AI generalization, this is worth your attention.