Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI

Mingjie Li; Edward Kim; Yue Zhao; Ehsan Adeli; Kilian M. Pohl

arXiv:2604.05171·cs.CV·April 8, 2026

TL;DR

This paper introduces NeuroQuant, a modality-aware 3D vector-quantized VAE that reconstructs multi-modal brain MRIs by separating modality-invariant anatomical structure from modality-dependent appearance.

Contribution

It proposes a dual-stream 3D encoder with shared anatomical encoding and modality-specific appearance features, trained with a joint 2D/3D strategy for multi-modal MRI reconstruction.

Findings

01

NeuroQuant outperforms existing VAEs in reconstruction fidelity.

02

The shared latent representation, learned with factorized multi-axis attention, captures relationships between distant brain regions.

03

The model enables scalable cross-modal brain image analysis.

Abstract

Learning a robust Variational Autoencoder (VAE) is a fundamental step for many deep learning applications in medical image analysis, such as MRI synthesis. Existing brain VAEs predominantly focus on single-modality data (i.e., T1-weighted MRI), overlooking the complementary diagnostic value of other modalities like T2-weighted MRIs. Here, we propose a modality-aware and anatomically grounded 3D vector-quantized VAE (VQ-VAE) for reconstructing multi-modal brain MRIs. Called NeuroQuant, it first learns a shared latent representation across modalities using factorized multi-axis attention, which can capture relationships between distant brain regions. It then employs a dual-stream 3D encoder that explicitly separates the encoding of modality-invariant anatomical structures from modality-dependent appearance. Next, the anatomical encoding is discretized using a shared codebook and…
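
The discretization step mentioned in the abstract can be illustrated with a generic nearest-neighbor codebook lookup, as used in standard VQ-VAEs. The sketch below is not the authors' implementation: the codebook size, feature dimension, the `quantize` helper, and the toy "T1" and "T2" anatomical features are all hypothetical, chosen only to show how two modalities that share anatomy could map to the same codes in a shared codebook.

```python
import numpy as np


def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (squared L2)."""
    # features: (N, D), codebook: (K, D) -> pairwise squared distances (N, K)
    dists = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    indices = dists.argmin(axis=1)
    return codebook[indices], indices


rng = np.random.default_rng(0)

# Hypothetical shared codebook: K=8 codes of dimension D=4.
codebook = rng.normal(size=(8, 4))

# Hypothetical anatomical features for the same three locations in two
# modalities: close to the same codes, with small modality-specific noise.
anat_t1 = codebook[[2, 5, 5]] + 0.01 * rng.normal(size=(3, 4))
anat_t2 = codebook[[2, 5, 5]] + 0.01 * rng.normal(size=(3, 4))

q_t1, idx_t1 = quantize(anat_t1, codebook)
q_t2, idx_t2 = quantize(anat_t2, codebook)

# Both modalities select the same discrete anatomical codes.
print(idx_t1, idx_t2)
```

In a trained model the codebook would be learned jointly with the encoder and decoder, and the modality-dependent appearance features would be handled by a separate stream; neither is modeled in this toy example.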
