CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification

Guangqian Yang; Kangrui Du; Zhihan Yang; Ye Du; Yongping Zheng; Shujun Wang

arXiv:2403.16520·cs.CV·March 26, 2024·3 cites

Open Access

TL;DR

This paper introduces CMViM, a novel contrastive masked autoencoder designed for 3D multi-modal data, significantly improving Alzheimer's disease classification by learning unified and discriminative representations from complex medical images.

Contribution

The paper presents the first efficient 3D multi-modal representation learning method combining masked autoencoding with intra- and inter-modal contrastive learning for AD diagnosis.

Findings

01. Achieves a 2.7% AUC improvement over state-of-the-art methods.

02. Effectively models long-range dependencies in 3D medical images.

03. Enhances discriminative feature learning in multi-modal data.

Abstract

Alzheimer's disease (AD) is an incurable neurodegenerative condition leading to cognitive and functional deterioration. Given the lack of a cure, prompt and precise AD diagnosis is vital, yet it is a complex process that depends on multiple factors and multi-modal data. While successful efforts have been made to integrate multi-modal representation learning into medical datasets, scant attention has been given to 3D medical images. In this paper, we propose Contrastive Masked Vim Autoencoder (CMViM), the first efficient representation learning method tailored for 3D multi-modal data. Our proposed framework is built on a masked Vim autoencoder to learn a unified multi-modal representation and the long-range dependencies contained in 3D medical images. We also introduce an intra-modal contrastive learning module to enhance the capability of the multi-modal Vim encoder for modeling the discriminative features in…
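The two ingredients named in the abstract, masking patches for an autoencoder and a contrastive objective over embeddings, can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names (`random_mask`, `info_nce`), the 0.75 mask ratio, the temperature of 0.1, and the toy embeddings `modality_a` and `modality_b` are all assumptions chosen for illustration, and a generic one-directional InfoNCE loss stands in for whatever contrastive loss the paper actually uses.

```python
import math
import random


def random_mask(num_patches, mask_ratio, rng):
    """Return a list of booleans; True marks a patch hidden from the encoder."""
    num_masked = round(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def normalize(vector):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(anchors, positives, temperature=0.1):
    """One-directional InfoNCE: anchors[i] should match positives[i]
    and treat every other row of `positives` as a negative."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature
                  for cand in p]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


rng = random.Random(0)

# Hypothetical setting: 64 patches per volume, 75% of them masked.
mask = random_mask(num_patches=64, mask_ratio=0.75, rng=rng)

# Toy embeddings: modality_b is a lightly perturbed copy of modality_a,
# so row i of each list plays the role of the same subject in two modalities.
modality_a = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(8)]
modality_b = [[x + 0.05 * rng.gauss(0, 1) for x in row] for row in modality_a]

aligned_loss = info_nce(modality_a, modality_b)
shuffled_loss = info_nce(modality_a, modality_b[::-1])  # pairs deliberately broken
print(sum(mask), aligned_loss < shuffled_loss)
```

The loss is low when matching rows are the most similar pair and rises when the pairing is broken, which is the property a contrastive module relies on to pull corresponding representations together.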

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning · ALIGN