MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training

Biao Wu; Yutong Xie; Zeyu Zhang; Minh Hieu Phan; Qi Chen; Ling Chen; Qi Wu

arXiv:2407.19546·cs.CV·April 17, 2025·3 cites

TL;DR

MMCLIP introduces a novel cross-modal masked modeling framework for medical vision-language pretraining, effectively utilizing paired and unpaired data to improve pathological feature learning and achieve state-of-the-art results.

Contribution

The paper proposes MMCLIP with attention-masked image modeling and entity-driven masked language modeling, leveraging unpaired data and disease prompts for enhanced medical multimodal learning.

Findings

01. Achieves state-of-the-art zero-shot classification performance.

02. Improves reconstruction of pathological features in medical images.

03. Effectively utilizes unpaired data with disease prompts.

Abstract

Vision-and-language pretraining (VLP) in the medical field utilizes contrastive learning on image-text pairs to achieve effective transfer across tasks. Yet, current VLP approaches with the masked modeling strategy face two challenges when applied to the medical domain. First, current models struggle to accurately reconstruct key pathological features due to the scarcity of medical data. Second, most methods only adopt either paired image-text or image-only data, failing to exploit the combination of both paired and unpaired data. To this end, this paper proposes the MMCLIP (Masked Medical Contrastive Language-Image Pre-Training) framework to enhance pathological learning and feature learning via unpaired data. First, we introduce the attention-masked image modeling (AttMIM) and entity-driven masked language modeling module (EntMLM), which learns to reconstruct pathological visual and…
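
The contrastive learning on image-text pairs mentioned at the start of the abstract can be illustrated with a generic CLIP-style symmetric loss. The following is a minimal sketch only, not the MMCLIP implementation: the function names, the plain-list embeddings, and the temperature value of 0.07 are assumptions made for illustration, and the sketch does not cover AttMIM or EntMLM.

```python
import math


def _normalize(vec):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    # Mean cross-entropy where the correct "class" for row i is column i,
    # i.e. the i-th image should match the i-th text (and vice versa).
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    # Hypothetical helper: symmetric image-to-text and text-to-image loss
    # over a batch of paired embeddings (temperature is an assumed value).
    images = [_normalize(v) for v in image_embs]
    texts = [_normalize(v) for v in text_embs]
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (
        _diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_transposed)
    )


# Toy batch: two image embeddings and their paired text embeddings.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired embeddings yield a much lower loss than the swapped ones, which is the signal such an objective uses to pull matching image and text representations together.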

Code & Models

Repositories

aigeeksgroup/mmclip (PyTorch, official)

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging

Methods: Contrastive Learning