VolTA-3D: Self-Supervised Learning for Brain MRI using 3D Volumetric Token Alignment
Amy Makawana, Abhijeet Parida, Marius George Linguraru, Julia Ive, Syed Muhammad Anwar

TL;DR
VolTA-3D introduces a self-supervised learning framework for 3D brain MRI that enhances transferability and robustness across various tasks and domain shifts by aligning global and local features.
Contribution
It proposes a novel global-local alignment approach within a 3D Vision Transformer for improved generalization of MRI models.
Findings
Outperforms baseline models on hippocampal segmentation
Achieves higher accuracy in sex classification
Demonstrates robustness in Alzheimer's disease detection
Abstract
Self-supervised learning (SSL) has advanced medical image analysis by enabling learning from large unlabelled data. However, in brain magnetic resonance imaging (MRI), most 3D models remain specialized for either segmentation or classification, limiting their ability to generalize across datasets, imaging protocols, and downstream tasks. This lack of transferability constrains the clinical utility of 3D MRI models, despite the availability of unlabelled volumetric data. We present VolTA-3D, a self-supervised 3D Vision Transformer framework designed to learn transferable volumetric representations. VolTA-3D jointly aligns global class-style tokens and local patch tokens within a student-teacher paradigm and enforces fine-grained structural reconstruction. This combined global-local alignment addresses the limited semantic diversity and subtle anatomical characteristics of brain MRI,…
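The abstract does not spell out the loss, so the following is only a minimal sketch of what a joint global-local student-teacher alignment could look like, not the authors' implementation. It assumes a self-distillation objective in the style of DINO/iBOT: a cross-entropy between teacher and student distributions on the class token (global) and on masked patch tokens (local), with the teacher updated as an exponential moving average of the student. All names, shapes, temperatures, and the weight `w_local` are hypothetical, and the structural reconstruction term mentioned in the abstract is omitted.

```python
import numpy as np

def log_softmax(x, temp):
    """Numerically stable log-softmax over the last axis with a temperature."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distill_ce(teacher_logits, student_logits, t_temp=0.04, s_temp=0.1):
    """Cross-entropy H(teacher, student), averaged over all tokens.
    Temperatures are assumed values, not taken from the paper."""
    p_teacher = np.exp(log_softmax(teacher_logits, t_temp))
    log_p_student = log_softmax(student_logits, s_temp)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

def global_local_loss(s_cls, t_cls, s_patch, t_patch, mask, w_local=1.0):
    """Global term on class tokens plus local term on masked patch tokens.
    s_cls, t_cls: (B, K); s_patch, t_patch: (B, N, K); mask: (B, N) bool."""
    loss_global = distill_ce(t_cls, s_cls)
    loss_local = distill_ce(t_patch[mask], s_patch[mask])
    return loss_global + w_local * loss_local

def ema_update(teacher_w, student_w, momentum=0.996):
    """Teacher weights follow the student as an exponential moving average."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

# Toy example: 2 volumes, 8 patch tokens each (e.g. a 2x2x2 grid), 16 prototypes.
rng = np.random.default_rng(0)
B, N, K = 2, 8, 16
s_cls, t_cls = rng.normal(size=(B, K)), rng.normal(size=(B, K))
s_patch, t_patch = rng.normal(size=(B, N, K)), rng.normal(size=(B, N, K))
mask = np.zeros((B, N), dtype=bool)
mask[:, ::2] = True  # pretend every other patch was masked for the student

loss = global_local_loss(s_cls, t_cls, s_patch, t_patch, mask)
new_teacher = ema_update(np.zeros(3), np.ones(3), momentum=0.9)
```

In a real pipeline the logits would come from projection heads on top of the student and teacher 3D ViT encoders, and gradients would flow only through the student.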
