VolTA-3D: Self-Supervised Learning for Brain MRI using 3D Volumetric Token Alignment

Amy Makawana; Abhijeet Parida; Marius George Linguraru; Julia Ive; Syed Muhammad Anwar

arXiv:2605.16775·cs.CV·May 19, 2026


TL;DR

VolTA-3D introduces a self-supervised learning framework for 3D brain MRI that enhances transferability and robustness across various tasks and domain shifts by aligning global and local features.

Contribution

It proposes a novel global-local alignment approach within a 3D Vision Transformer for improved generalization of MRI models.

Findings

01 Outperforms baseline models on hippocampal segmentation

02 Achieves higher accuracy in sex classification

03 Demonstrates robustness in Alzheimer's disease detection

Abstract

Self-supervised learning (SSL) has advanced medical image analysis by enabling learning from large amounts of unlabeled data. However, in brain magnetic resonance imaging (MRI), most 3D models remain specialized for either segmentation or classification, limiting their ability to generalize across datasets, imaging protocols, and downstream tasks. This lack of transferability constrains the clinical utility of 3D MRI models, despite the availability of unlabeled volumetric data. We present VolTA-3D, a self-supervised 3D Vision Transformer framework designed to learn transferable volumetric representations. VolTA-3D jointly aligns global class-style tokens and local patch tokens within a student-teacher paradigm and enforces fine-grained structural reconstruction. This combined global-local alignment addresses the limited semantic diversity and subtle anatomical characteristics of brain MRI,…
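The abstract does not spell out the training objective, so the following is only a rough sketch of what a student-teacher global-local alignment loss could look like, written in the general self-distillation style (temperature-scaled softmax, cross-entropy from teacher to student, and a momentum-averaged teacher). The function names, temperatures, `local_weight`, and momentum value are all illustrative assumptions and are not taken from the paper. The structural reconstruction term mentioned in the abstract is omitted.

```python
import math

# Hypothetical sketch: none of these names or default values come from the paper.

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(teacher_probs, student_probs):
    """Cross-entropy H(teacher, student) between two distributions."""
    return -sum(t * math.log(s + 1e-12)
                for t, s in zip(teacher_probs, student_probs))

def alignment_loss(student_cls, teacher_cls, student_patches, teacher_patches,
                   student_temp=0.1, teacher_temp=0.04, local_weight=1.0):
    """Global (class-token) term plus the mean local (patch-token) term."""
    global_term = cross_entropy(softmax(teacher_cls, teacher_temp),
                                softmax(student_cls, student_temp))
    local_terms = [
        cross_entropy(softmax(t, teacher_temp), softmax(s, student_temp))
        for s, t in zip(student_patches, teacher_patches)
    ]
    local_term = sum(local_terms) / len(local_terms)
    return global_term + local_weight * local_term

def ema_update(teacher_params, student_params, momentum=0.996):
    """Move each teacher parameter toward the student by a factor (1 - momentum)."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In a setup like this, the loss is small when the student's class-token and patch-token distributions agree with the teacher's and grows as they diverge. Only the student would receive gradients, while the teacher would be refreshed with `ema_update` after each step.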
