MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction
Shunjie-Fabian Zheng, Hyeonjun Lee, Thijs Kooi, Ali Diba

TL;DR
This paper presents MV-MLM, a multi-view mammography and language model that leverages synthetic radiology reports and cross-modal self-supervision to improve breast cancer diagnosis and risk prediction, achieving state-of-the-art results.
Contribution
Introduction of MV-MLM, a novel multi-view vision-language model trained on synthetic reports, enhancing robustness and data efficiency in breast cancer classification and risk prediction.
Findings
Achieves state-of-the-art performance on multiple classification tasks.
Outperforms existing baselines while training on synthetic text reports rather than real ones.
Demonstrates strong data efficiency without real radiology reports.
Abstract
Large annotated datasets are essential for training robust Computer-Aided Diagnosis (CAD) models for breast cancer detection or risk prediction. However, acquiring such datasets with fine-grained annotations is both costly and time-consuming. Vision-Language Models (VLMs), such as CLIP, which are pre-trained on large sets of image-text pairs, offer a promising solution by enhancing robustness and data efficiency in medical imaging tasks. This paper introduces a novel Multi-View Mammography and Language Model for breast cancer classification and risk prediction, trained on a dataset of paired mammogram images and synthetic radiology reports. Our MV-MLM leverages multi-view supervision to learn rich representations from extensive radiology data by employing cross-modal self-supervision across image-text pairs. This includes multiple views and the corresponding pseudo-radiology reports. We propose…
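The cross-modal self-supervision described in the abstract can be illustrated with a CLIP-style symmetric contrastive loss. The sketch below is not the paper's actual objective, which the truncated abstract does not spell out. It assumes each exam has one embedding per view and one report embedding, and it simply averages a per-view image-text loss. The view names (`CC`, `MLO`), the function names, and the temperature of 0.07 are all illustrative assumptions.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each input belongs to the same exam."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity between every image and every report, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: each image should pick out its own report.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: each report should pick out its own image.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (i2t + t2i)


def multi_view_loss(views, text_embs, temperature=0.07):
    """Average the image-text loss over views (an assumed aggregation rule).

    views maps a view name to a list of embeddings, one per exam.
    """
    losses = [clip_style_loss(embs, text_embs, temperature) for embs in views.values()]
    return sum(losses) / len(losses)


# Toy batch of two exams with 2-D embeddings (hypothetical values).
reports = [[1.0, 0.0], [0.0, 1.0]]
aligned = {"CC": [[1.0, 0.0], [0.0, 1.0]], "MLO": [[0.9, 0.1], [0.1, 0.9]]}
swapped = {"CC": [[0.0, 1.0], [1.0, 0.0]], "MLO": [[0.1, 0.9], [0.9, 0.1]]}

aligned_loss = multi_view_loss(aligned, reports)
swapped_loss = multi_view_loss(swapped, reports)
```

On this toy batch the loss is small when each view's embedding lines up with its own report and much larger when the pairs are swapped, which is the behavior a contrastive objective is meant to reward. A real implementation would compute the same quantity on batched tensors from an image encoder and a text encoder.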
