Structure is Supervision: Multiview Masked Autoencoders for Radiology

Sonia Laguna; Andrea Agostini; Alain Ryser; Samuel Ruiperez-Campillo; Irene Cannistraci; Moritz Vandenhirtz; Stephan Mandt; Nicolas Deperrois; Farhad Nooralahzadeh; Michael Krauthammer; Thomas M. Sutter; Julia E. Vogt

arXiv:2511.22294·cs.CV·April 3, 2026

Structure is Supervision: Multiview Masked Autoencoders for Radiology

Sonia Laguna, Andrea Agostini, Alain Ryser, Samuel Ruiperez-Campillo, Irene Cannistraci, Moritz Vandenhirtz, Stephan Mandt, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter, Julia E. Vogt

PDF

TL;DR

This paper introduces MVMAE, a self-supervised learning framework for radiology that exploits multi-view data and reports to improve disease classification, outperforming existing methods.

Contribution

The paper proposes MVMAE and MVMAE-V2T, novel self-supervised models that leverage multi-view radiology data and reports for better medical image representations.

Findings

01

MVMAE outperforms supervised and vision-language baselines on three datasets.

02

MVMAE-V2T improves performance especially in low-label settings.

03

Structural and textual supervision are effective complementary signals.

Abstract

Building robust medical machine learning systems requires pretraining strategies that exploit the intrinsic structure present in clinical data. We introduce Multiview Masked Autoencoder (MVMAE), a self-supervised framework that leverages the natural multi-view organization of radiology studies to learn view-invariant and disease-relevant representations. MVMAE combines masked image reconstruction with cross-view alignment, transforming clinical redundancy across projections into a powerful self-supervisory signal. We further extend this approach with MVMAE-V2T, which incorporates radiology reports as an auxiliary text-based learning signal to enhance semantic grounding while preserving fully vision-based inference. Evaluated on a downstream disease classification task on three large-scale public datasets, MIMIC-CXR, CheXpert, and PadChest, MVMAE consistently outperforms supervised and…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.