MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification

Halil Ibrahim Gulluk; Olivier Gevaert

arXiv:2605.19359·cs.CV·May 20, 2026

TL;DR

This paper introduces MAM-CLIP, a vision-language model trained on mammography images and captions to improve BI-RADS classification, especially with limited labeled data.

Contribution

It presents a multi-modal approach that applies contrastive learning to image-caption pairs from mammography atlases, improving on classifiers trained on image labels alone.

Findings

01. 3-class F1 score improved by up to 14% with fewer labeled samples

02. 2K image-text pairs can outperform 2K labeled samples in training

03. Pretrained model achieves superior BI-RADS prediction accuracy

Abstract

Deep learning methods have demonstrated promising results in predicting BI-RADS scores from mammography images. However, the interpretation of these images can vary, leading to discrepancies even among radiologists. Given the inherent complexity of mammograms, training classification models solely on image labels often yields limited performance. To address this challenge, we curated 2313 mammogram images and their corresponding captions from two mammography atlases. Our proposed approach employs a multi-modal model that uses a pretrained PubMedBERT as the language component. By training this model on image-text pairs with contrastive learning, we enable the vision encoder to absorb the rich information contained in the captions, thereby improving its understanding of mammography findings. We then fine-tune the vision encoder on two datasets for BI-RADS prediction, achieving superior…
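The abstract says only that the model is trained on image-text pairs "with contrastive learning"; it does not give the loss. As a rough illustration, the sketch below shows a CLIP-style symmetric contrastive (InfoNCE) loss in plain Python. The symmetric form, the function names, and the temperature of 0.07 are assumptions made for this example, not details confirmed by the paper. The embeddings stand in for the outputs of a vision encoder and a text encoder such as PubMedBERT.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_z for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss for a batch where image i matches caption i.

    temperature=0.07 is an assumed value for illustration only.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j]: temperature-scaled cosine similarity of image i and caption j.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: each row should put its mass on the diagonal.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: the same, applied to the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.1], [0.1, 1.0]]   # caption i is close to image i
    swapped = [[0.1, 1.0], [1.0, 0.1]]   # captions paired with the wrong image
    print(clip_contrastive_loss(images, matched) < clip_contrastive_loss(images, swapped))
```

Minimizing a loss of this kind pulls each image embedding toward its own caption and away from the other captions in the batch, which is one way a vision encoder could absorb information from the caption text before being fine-tuned for BI-RADS prediction.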

Code & Models

Repositories

igulluk/MAM-CLIP (GitHub)

Datasets

gulluk/mammosightr-preprocessed
dataset · 72 downloads
