MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu, Yutong Xie, Zeyu Zhang, Minh Hieu Phan, Qi Chen, Ling Chen, Qi Wu

TL;DR
MMCLIP introduces a novel cross-modal masked modeling framework for medical vision-language pretraining, effectively utilizing paired and unpaired data to improve pathological feature learning and achieve state-of-the-art results.
Contribution
The paper proposes MMCLIP with attention-masked image modeling and entity-driven masked language modeling, leveraging unpaired data and disease prompts for enhanced medical multimodal learning.
Findings
Achieves state-of-the-art zero-shot classification performance.
Improves reconstruction of pathological features in medical images.
Effectively utilizes unpaired data with disease prompts.
Abstract
Vision-and-language pretraining (VLP) in the medical field utilizes contrastive learning on image-text pairs to achieve effective transfer across tasks. Yet, current VLP approaches with the masked modeling strategy face two challenges when applied to the medical domain. First, current models struggle to accurately reconstruct key pathological features due to the scarcity of medical data. Second, most methods only adopt either paired image-text or image-only data, failing to exploit the combination of both paired and unpaired data. To this end, this paper proposes the MMCLIP (Masked Medical Contrastive Language-Image Pre-Training) framework to enhance pathological learning and feature learning via unpaired data. First, we introduce the attention-masked image modeling (AttMIM) and entity-driven masked language modeling module (EntMLM), which learns to reconstruct pathological visual and…
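To make the attention-masked idea concrete, here is a minimal sketch of attention-guided patch masking. It is an illustration under stated assumptions, not the authors' AttMIM implementation: it assumes each image patch already has a scalar relevance score (for example, from a cross-modal attention map between text and image tokens) and that the highest-scoring patches are the ones hidden for reconstruction. The function name `attention_guided_mask`, the `mask_ratio` parameter, and the example scores are all hypothetical.

```python
from typing import List


def attention_guided_mask(scores: List[float], mask_ratio: float = 0.5) -> List[bool]:
    """Return a boolean mask that hides the patches with the highest scores.

    Assumption: `scores` holds one attention-derived relevance value per image
    patch. How MMCLIP computes these values is not specified in the abstract.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    num_masked = int(len(scores) * mask_ratio)
    # Rank patch indices from highest to lowest score (stable for ties).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    masked = set(ranked[:num_masked])
    # True means the patch is masked and must be reconstructed.
    return [i in masked for i in range(len(scores))]


# Hypothetical relevance scores for six patches.
example_scores = [0.05, 0.40, 0.10, 0.30, 0.15, 0.00]
print(attention_guided_mask(example_scores, mask_ratio=0.5))
# [False, True, False, True, True, False]
```

Compared with uniform random masking, a score-driven rule like this concentrates the reconstruction task on the regions the text attends to most, which is one plausible way to emphasize pathological features.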
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging
Methods: Contrastive Learning
