MultiCrossViT: Multimodal Vision Transformer for Schizophrenia Prediction using Structural MRI and Functional Network Connectivity Data
Yuda Bi, Anees Abrol, Zening Fu, Vince Calhoun

TL;DR
This paper introduces MultiCrossViT, a multimodal vision transformer that combines structural MRI and functional network connectivity data to predict schizophrenia, reaching an AUC of 0.832 on a small dataset and visualizing the brain regions and covariance patterns associated with the disorder.
Contribution
The study presents a novel multimodal deep learning pipeline, MultiCrossViT, that integrates structural MRI (sMRI) and static functional network connectivity (sFNC) data for schizophrenia prediction, outperforming existing models.
Findings
Achieved an AUC of 0.832 on a small dataset.
Visualized key brain regions and covariance patterns related to schizophrenia.
Demonstrated the effectiveness of ViT-based models in medical imaging classification.
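For readers unfamiliar with the metric, the AUC reported in the findings is the probability that a randomly chosen patient receives a higher model score than a randomly chosen control. The sketch below computes it from scratch with the pairwise (Mann-Whitney) formulation. The labels and scores are made-up illustration values, not data from the paper, and the 1 = patient, 0 = control encoding is an assumption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: iterable of 0/1 (assumed encoding: 1 = patient, 0 = control)
    scores: iterable of model scores; higher means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (not results from the paper).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported 0.832.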
Abstract
Vision Transformer (ViT) is a pioneering deep learning framework that can address real-world computer vision problems, such as image classification and object recognition. Importantly, ViTs have been shown to outperform traditional deep learning models, such as convolutional neural networks (CNNs). Recently, a number of ViT variants have been adapted to the field of medical imaging, addressing a variety of critical classification and segmentation challenges, especially for brain imaging data. In this work, we present a novel multimodal deep learning pipeline, MultiCrossViT, which is capable of analyzing both structural MRI (sMRI) and static functional network connectivity (sFNC) data for the prediction of schizophrenia. On a dataset with minimal training subjects, our model achieves an AUC of 0.832. Finally, we visualize multiple brain regions…
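The abstract does not spell out how the two modalities are fused. One common way to let tokens from one modality attend to tokens from another is scaled dot-product cross-attention, and the sketch below illustrates that generic mechanism only. It is not the authors' architecture: it omits learned projections, multiple heads, and normalization, and the token counts, embedding size, and variable names (`smri_tokens`, `sfnc_tokens`) are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention without learned weights.

    queries: tokens from one modality, shape (n_q, d)
    keys, values: tokens from the other modality, shape (n_k, d)
    Returns (output, weights) with shapes (n_q, d) and (n_q, n_k).
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    output, weights = [], []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value tokens.
        output.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return output, weights


# Hypothetical stand-ins for embedded tokens: 4 sMRI tokens, 6 sFNC tokens, dim 8.
random.seed(0)
smri_tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
sfnc_tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]

# sMRI tokens query the sFNC tokens; swapping the roles gives the other direction.
fused, attn = cross_attention(smri_tokens, sfnc_tokens, sfnc_tokens)
```

Each row of `attn` is a probability distribution over the sFNC tokens, so every fused sMRI token is a convex combination of sFNC token vectors.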
Taxonomy
Topics: Functional Brain Connectivity Studies · Advanced Neuroimaging Techniques and Applications · Machine Learning in Healthcare
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Softmax · Adam · Absolute Position Encodings · Byte Pair Encoding
