Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging

Tan Pan; Shuhao Mei; Yixuan Sun; Kaiyu Guo; Chen Jiang; Zhaorui Tan; Mengzhu Li; Limei Han; Xiang Zou; Yuan Cheng; Mahsa Baktashmotlagh

arXiv:2605.14654·cs.CV·May 15, 2026

TL;DR

This paper introduces a novel self-supervised learning approach for 3D multi-modal medical imaging that leverages anatomical topological consistency across instances to improve downstream task performance.

Contribution

The method uses cross-instance topological consistency as a supervisory signal, which helps it cope with variability across instances and modalities and strengthens multi-modal representation learning.

Findings

01 Achieved a 1.1% improvement on segmentation tasks.

02 Achieved a 5.94% improvement on classification tasks.

03 Demonstrated better robustness when modalities are missing.

Abstract

Self-supervised pre-training methods in medical imaging typically treat each individual as an isolated instance, learning representations through augmentation-based objectives or masked reconstruction. They often do not adequately capitalize on a key characteristic of physiological features: anatomical structures maintain consistent spatial relationships across individuals (instances), such as the thalamus being medial to the basal ganglia, regardless of variations in brain size, shape, or pathology. We propose leveraging this cross-instance topological consistency as a supervisory signal. The challenge arises from the inherent variability in medical imaging, which can differ significantly across instances and modalities. To tackle this, we focus on two alignment regimes. (i) Intra-instance: with pixel-level correspondences available, a cross-modal triplet objective explicitly preserves…
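The abstract is cut off before it states what the intra-instance objective preserves, so the following is only a generic sketch of a triplet margin loss across two modalities, not the authors' formulation. It assumes that `emb_a[i]` and `emb_b[i]` are embeddings of the same voxel position in two co-registered modalities; the function name, the Euclidean distance, and the `margin` value are all illustrative choices.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cross_modal_triplet_loss(emb_a, emb_b, margin=1.0):
    """Mean triplet margin loss over corresponding positions (hypothetical helper).

    Anchor:   embedding of position i in modality A.
    Positive: embedding of the same position i in modality B.
    Negative: embedding of every other position j != i in modality B.
    """
    if len(emb_a) != len(emb_b) or len(emb_a) < 2:
        raise ValueError("need two equally sized sets with at least 2 positions")
    total, count = 0.0, 0
    for i, anchor in enumerate(emb_a):
        d_pos = euclidean(anchor, emb_b[i])
        for j, negative in enumerate(emb_b):
            if j == i:
                continue
            d_neg = euclidean(anchor, negative)
            # Hinge: penalize when the positive is not closer than the
            # negative by at least `margin`.
            total += max(0.0, d_pos - d_neg + margin)
            count += 1
    return total / count


# Toy example with two positions and 2-D embeddings.
aligned = [[0.0, 0.0], [10.0, 0.0]]
swapped = [aligned[1], aligned[0]]
print(cross_modal_triplet_loss(aligned, aligned))  # 0.0
print(cross_modal_triplet_loss(aligned, swapped))  # 11.0
```

When the two modalities agree at every position, the loss is zero; swapping the correspondences makes each positive farther than its negative, so the loss becomes large. A practical version would operate on batched tensors and sample negatives rather than enumerate all pairs.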
