Multi-Plane Vision Transformer for Hemorrhage Classification Using Axial and Sagittal MRI Data

Badhan Kumar Das, Gengyan Zhao, Boris Mailhe, Thomas J. Re, Dorin Comaniciu, Eli Gibson, and Andreas Maier

arXiv:2505.07349·eess.IV·May 12, 2026

TL;DR

This paper introduces a multi-plane vision transformer (MP-ViT) that classifies brain hemorrhages from MRI data acquired in varying orientations without resampling to a fixed plane, outperforming a standard ViT by 5.5% and CNN architectures by 1.8% in AUC.

Contribution

The proposed MP-ViT uses two separate transformer encoders, one for axial and one for sagittal MRI data, and integrates information across the two orientations with cross-attention, improving hemorrhage classification performance.
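As a rough illustration of the cross-attention idea, the sketch below lets tokens from one orientation attend to tokens from the other. It is a minimal single-head NumPy example, not the authors' implementation: the token counts, the embedding size `d`, the random projection matrices, and the function name `cross_attention` are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: `queries` tokens attend to `context` tokens."""
    q = queries @ w_q                        # (n_q, d)
    k = context @ w_k                        # (n_c, d)
    v = context @ w_v                        # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 16                                       # hypothetical embedding size
axial_tokens = rng.normal(size=(8, d))       # stand-in for axial encoder output
sagittal_tokens = rng.normal(size=(6, d))    # stand-in for sagittal encoder output
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

# Axial tokens query the sagittal tokens, pulling in cross-plane information.
fused_axial, attn = cross_attention(axial_tokens, sagittal_tokens, w_q, w_k, w_v)
print(fused_axial.shape, attn.shape)
```

Swapping the roles of the two token sets gives the sagittal-to-axial direction. Whether the model applies cross-attention in one or both directions is not stated in the text above.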

Findings

01 MP-ViT outperforms ViT by 5.5% in AUC.

02 MP-ViT surpasses CNN architectures by 1.8% in AUC.

03 Model demonstrates robustness across diverse MRI orientations.

Abstract

Identifying brain hemorrhages from magnetic resonance imaging (MRI) is a critical task for healthcare professionals. The diverse nature of MRI acquisitions, with varying contrasts and orientations, introduces complexity in identifying hemorrhages using neural networks. For acquisitions with varying orientations, traditional methods often involve resampling images to a fixed plane, which can lead to information loss. To address this, we propose a 3D multi-plane vision transformer (MP-ViT) for hemorrhage classification with varying orientation data. It employs two separate transformer encoders for axial and sagittal contrasts, using cross-attention to integrate information across orientations. MP-ViT also includes a modality indication vector to provide missing contrast information to the model. The effectiveness of the proposed model is demonstrated with extensive experiments on real world…
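One plausible reading of the modality indication vector is a binary mask that tells the model which contrasts are present in a study. The sketch below follows that assumption and is not taken from the paper: the contrast names in `CONTRASTS`, the zero-filling of missing volumes, the volume shape, and the helper `build_inputs` are all hypothetical.

```python
import numpy as np

# Hypothetical fixed ordering of contrasts; the actual set is not given above.
CONTRASTS = ("T1", "T2", "FLAIR", "SWI")

def build_inputs(available, volume_shape=(4, 8, 8)):
    """Stack contrasts in a fixed order and flag which ones are present.

    `available` maps contrast name -> 3D array. Missing contrasts are
    zero-filled (an assumption) and marked 0 in the indicator vector.
    """
    indicator = np.zeros(len(CONTRASTS), dtype=np.float32)
    volumes = np.zeros((len(CONTRASTS), *volume_shape), dtype=np.float32)
    for i, name in enumerate(CONTRASTS):
        if name in available:
            volumes[i] = available[name]
            indicator[i] = 1.0
    return volumes, indicator

# A study that contains only two of the four hypothetical contrasts.
study = {
    "T1": np.ones((4, 8, 8), dtype=np.float32),
    "FLAIR": np.full((4, 8, 8), 2.0, dtype=np.float32),
}
volumes, indicator = build_inputs(study)
print(indicator)  # [1. 0. 1. 0.]
```

The indicator lets a model distinguish a contrast that was never acquired from one whose voxels happen to be zero. How MP-ViT feeds this vector into the network is not described in the text above.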
