Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval
Zeqiang Wei, Kai Jin, Xiuzhuang Zhou

TL;DR
This paper introduces MCR, a novel vision-language pretraining (VLP) framework that uses masked data for both contrastive learning and reconstruction, improving the efficiency and accuracy of cross-modal medical image-report retrieval.
Contribution
The paper proposes MCR, which uses masked data as the shared input for both the contrastive and reconstruction tasks, reducing interference between the tasks and lowering memory use. It also introduces a new modality alignment strategy, MbA, for better semantic matching.
Findings
Achieves state-of-the-art results on the MIMIC-CXR dataset.
Significantly reduces GPU memory use and training time.
Enhances semantic consistency in cross-modal retrieval.
Abstract
The cross-modal medical image-report retrieval task plays a significant role in clinical diagnosis and various medical generative tasks. Eliminating heterogeneity between modalities to enhance semantic consistency is the key challenge of this task. Current Vision-Language Pretraining (VLP) models, which use cross-modal contrastive learning and masked reconstruction as joint training tasks, can effectively enhance cross-modal retrieval performance. This framework typically employs dual-stream inputs, using unmasked data for cross-modal contrastive learning and masked data for reconstruction. However, the significant differences between the inputs of the two proxy tasks cause task competition and information interference, which limit the effectiveness of representation learning for intra-modal and cross-modal features. In this paper, we propose an efficient VLP framework…
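The idea of feeding masked data to the contrastive task can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, mean pooling over visible tokens stands in for the image and report encoders, the loss is a generic symmetric InfoNCE objective, and the reconstruction branch is omitted entirely.

```python
import math
import random


def mask_tokens(tokens, mask_ratio, rng):
    """Hide a random fraction of tokens and return only the visible ones."""
    n_keep = max(1, round(len(tokens) * (1.0 - mask_ratio)))
    keep = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in keep]


def mean_pool(tokens):
    """Stand-in encoder (assumption): average the visible tokens, then L2-normalize."""
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: image-to-report and report-to-image terms, averaged."""
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txt_emb]
        for img in img_emb
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy batch: 4 paired samples, 16 image patches and 12 report tokens each, dimension 8.
rng = random.Random(0)
images = [[[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)] for _ in range(4)]
reports = [[[rng.gauss(0, 1) for _ in range(8)] for _ in range(12)] for _ in range(4)]

# The contrastive task sees only the masked (visible) tokens of each modality.
img_emb = [mean_pool(mask_tokens(patches, 0.5, rng)) for patches in images]
txt_emb = [mean_pool(mask_tokens(words, 0.25, rng)) for words in reports]
loss = symmetric_contrastive_loss(img_emb, txt_emb)
```

Because the visible tokens are the only input here, a reconstruction head could in principle reuse the same encoder pass rather than requiring a second, unmasked stream. The mask ratios and temperature above are illustrative values, not settings reported in the paper.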
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Contrastive Learning
