OmniMRI: A Unified Vision--Language Foundation Model for Generalist MRI Interpretation

Xingxin He, Aurora Rofena, Ruimin Feng, Haozhe Liao, Zhaoye Zhou, Albert Jang, and Fang Liu

arXiv:2508.17524 · cs.CV · August 26, 2025

TL;DR

OmniMRI is a vision-language foundation model that unifies multiple MRI interpretation tasks within a single architecture. It is trained on a large, heterogeneous corpus (60 public datasets, over 220,000 MRI volumes) to improve generalizability and clinical utility.

Contribution

The paper introduces OmniMRI, a unified model that integrates vision and language for end-to-end MRI analysis and is trained on large-scale, diverse datasets for broad clinical applicability.

Findings

1. Performs MRI reconstruction, segmentation, detection, and report generation within one model.

2. Demonstrates strong cross-task generalization and instruction-following capabilities.

3. Consolidates multiple MRI workflows into a single, scalable framework.

Abstract

Magnetic Resonance Imaging (MRI) is indispensable in clinical practice but remains constrained by fragmented, multi-stage workflows encompassing acquisition, reconstruction, segmentation, detection, diagnosis, and reporting. While deep learning has achieved progress in individual tasks, existing approaches are often anatomy- or application-specific and lack generalizability across diverse clinical settings. Moreover, current pipelines rarely integrate imaging data with the complementary language information that radiologists rely on in routine practice. Here, we introduce OmniMRI, a unified vision-language foundation model designed to generalize across the entire MRI workflow. OmniMRI is trained on a large-scale, heterogeneous corpus curated from 60 public datasets, comprising over 220,000 MRI volumes and 19 million MRI slices, and incorporating image-only data, paired vision-text data, and…
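The abstract describes one model trained on a mix of image-only and paired vision-text data. As a rough illustration of what such mixing can look like, the sketch below maps heterogeneous samples into one shared instruction/target record and shuffles them into mixed batches. It is a hypothetical sketch, not OmniMRI's data pipeline: the `MRISample` class, the task names, the prompt strings, and the `<image-target:...>` placeholder are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional
import random

@dataclass
class MRISample:
    """One training example. All field names here are hypothetical."""
    volume_id: str
    task: str                   # assumed task labels, e.g. "segmentation", "report"
    text: Optional[str] = None  # paired text (e.g. a report); None for image-only data

# Hypothetical instruction prompts, one per task.
PROMPTS = {
    "reconstruction": "Reconstruct the image from undersampled k-space.",
    "segmentation": "Segment the anatomy in this image.",
    "detection": "Detect and localize abnormalities.",
    "report": "Write a radiology report for this scan.",
}

def to_instruction(sample: MRISample) -> dict:
    """Convert any sample into the same (instruction, input, target) record."""
    if sample.task not in PROMPTS:
        raise ValueError(f"unknown task: {sample.task}")
    # Paired samples are supervised by their text; image-only samples get a
    # placeholder standing in for an image or mask target.
    if sample.text is not None:
        target = sample.text
    else:
        target = f"<image-target:{sample.volume_id}>"
    return {"instruction": PROMPTS[sample.task], "input": sample.volume_id, "target": target}

def mixed_batches(samples, batch_size, seed=0):
    """Shuffle all records together so a batch can mix tasks and data types."""
    rng = random.Random(seed)
    records = [to_instruction(s) for s in samples]
    rng.shuffle(records)
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]
```

A shared record format is what lets image-supervised tasks (reconstruction, segmentation, detection) and text-supervised tasks (report generation) run through a single training loop; how OmniMRI actually encodes its inputs and targets is not stated in this abstract.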
