CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer

Moein Sorkhei; Yue Liu; Hossein Azizpour; Edward Azavedo; Karin Dembrower; Dimitra Ntoula; Athanasios Zouzos; Fredrik Strand; Kevin Smith

arXiv:2112.01330·cs.CV·December 16, 2025

TL;DR

This paper introduces CSAW-M, a large annotated mammographic dataset for benchmarking cancer masking, and demonstrates that deep learning models trained on it can better predict difficult-to-detect cancers than traditional density measures.

Contribution

The paper presents CSAW-M, the largest public dataset with expert annotations of mammographic masking, and shows that masking levels estimated by deep learning models trained on it predict interval and large invasive cancers better than density measures do.

Findings

01. Deep learning models trained on CSAW-M outperform density measures in predicting interval and large invasive cancers.

02. CSAW-M is the largest dataset with expert-annotated masking potential for mammography.

03. Estimated masking levels correlate strongly with difficult-to-detect cancers.

Abstract

Interval and large invasive breast cancers, which are associated with worse prognosis than other cancers, are usually detected at a late stage due to false negative assessments of screening mammograms. The missed screening-time detection is commonly caused by the tumor being obscured by its surrounding breast tissues, a phenomenon called masking. To study and benchmark mammographic masking of cancer, in this work we introduce CSAW-M, the largest public mammographic dataset, collected from over 10,000 individuals and annotated with potential masking. In contrast to previous approaches, which measure breast image density as a proxy, our dataset directly provides annotations of masking potential assessments from five specialists. We also trained deep learning models on CSAW-M to estimate the masking level and showed that the estimated masking is significantly more predictive of…
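Because masking level is an ordered label, one common way to set up ordinal classification is to encode each level as a series of cumulative "is the level above k?" binary targets. The sketch below illustrates that general technique only; it is not taken from the paper, and the function names and the choice of 5 levels are hypothetical placeholders.

```python
from typing import List


def encode_ordinal(level: int, num_levels: int) -> List[int]:
    """Encode a level in 1..num_levels as num_levels - 1 cumulative binary targets.

    Position k (1-based) answers the question "is level > k?".
    """
    if not 1 <= level <= num_levels:
        raise ValueError(f"level must be in 1..{num_levels}, got {level}")
    return [1 if level > k else 0 for k in range(1, num_levels)]


def decode_ordinal(probs: List[float], threshold: float = 0.5) -> int:
    """Map per-threshold probabilities back to a level.

    The predicted level is 1 plus the number of thresholds whose
    probability exceeds `threshold`.
    """
    return 1 + sum(1 for p in probs if p > threshold)


def mean_absolute_error(true_levels: List[int], pred_levels: List[int]) -> float:
    """Average absolute distance between true and predicted levels."""
    if len(true_levels) != len(pred_levels) or not true_levels:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_levels, pred_levels))
    return total / len(true_levels)


if __name__ == "__main__":
    NUM_LEVELS = 5  # hypothetical; use the number of levels in your labels
    print(encode_ordinal(3, NUM_LEVELS))
    print(decode_ordinal([0.9, 0.8, 0.2, 0.1]))
    print(mean_absolute_error([1, 3, 5], [2, 3, 3]))
```

Under this encoding, a model would output one probability per threshold, and an error metric such as mean absolute error penalizes a prediction in proportion to how many levels it is off by, which a plain multi-class accuracy would not do.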

Code & Models

Datasets

SinKove/synthetic_mammography_csaw
dataset· 5 dl
