TL;DR
This paper introduces a framework that combines region-of-interest (RoI) token reduction, contrastive learning, and pretrained Vision Transformers (ViTs) to improve breast cancer classification from mammograms, addressing two challenges: high image resolution and fine-grained class distinctions.
Contribution
It proposes a new approach combining RoI-based token reduction, contrastive learning, and pretrained ViT models to enhance mammogram classification accuracy.
Findings
Achieves superior performance over existing baselines on public datasets.
Demonstrates the effectiveness of RoI-guided attention and contrastive learning in fine-grained medical image classification.
Establishes potential clinical utility for large-scale breast cancer screening.
Abstract
Vision Transformers have become the architecture of choice for many computer vision tasks, yet their performance in computer-aided diagnostics remains limited. Focusing on breast cancer detection from mammograms, we identify two main causes of this shortfall. First, medical images are high-resolution and contain small abnormalities, which produces an excessive number of tokens and makes it difficult for softmax-based attention to localize and attend to the relevant regions. Second, medical image classification is inherently fine-grained, with low inter-class and high intra-class variability, so standard cross-entropy training is insufficient. To overcome these challenges, we propose a framework with three key components: (1) region-of-interest-based token reduction, using an object detection model to guide attention; (2) contrastive learning between selected…
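The abstract does not say how detected regions are mapped to ViT tokens. The sketch below is one plausible reading, assuming a standard non-overlapping patch grid and axis-aligned boxes `(x_min, y_min, x_max, y_max)` in pixels from some detector; `num_patch_tokens` and `select_roi_tokens` are hypothetical helpers, not the authors' code. It keeps only the patch tokens that overlap at least one box. For scale, an illustrative 2048 x 2048 input with 16-pixel patches would yield 16,384 tokens before reduction (this resolution is an example, not a figure from the paper).

```python
import math
from typing import List, Tuple

# Hypothetical box format: half-open pixel box [x_min, x_max) x [y_min, y_max).
Box = Tuple[float, float, float, float]


def num_patch_tokens(image_h: int, image_w: int, patch: int) -> int:
    """Number of non-overlapping patch tokens a plain ViT would produce."""
    return (image_h // patch) * (image_w // patch)


def select_roi_tokens(image_h: int, image_w: int, patch: int,
                      boxes: List[Box]) -> List[int]:
    """Return raster-order indices of patch tokens that overlap any RoI box.

    A sketch of RoI-guided token reduction under the assumptions stated above;
    the paper's actual selection rule may differ.
    """
    n_rows, n_cols = image_h // patch, image_w // patch
    keep = set()
    for x0, y0, x1, y1 in boxes:
        if x1 <= x0 or y1 <= y0:
            continue  # skip empty or degenerate boxes
        # First and last patch column/row touched by the box, clamped to the grid.
        c0 = max(0, int(x0 // patch))
        r0 = max(0, int(y0 // patch))
        c1 = min(n_cols - 1, math.ceil(x1 / patch) - 1)
        r1 = min(n_rows - 1, math.ceil(y1 / patch) - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                keep.add(r * n_cols + c)
    return sorted(keep)


def reduce_tokens(tokens: list, keep: List[int]) -> list:
    """Gather only the selected tokens before they enter the transformer."""
    return [tokens[i] for i in keep]


if __name__ == "__main__":
    # 64 x 64 image, 16-pixel patches -> 4 x 4 grid of 16 tokens.
    boxes = [(10, 10, 30, 20), (50, 50, 64, 64)]
    keep = select_roi_tokens(64, 64, 16, boxes)
    print(keep)  # [0, 1, 4, 5, 15]
    print(len(reduce_tokens(list(range(16)), keep)), "of", num_patch_tokens(64, 64, 16))
```

In this toy example, two boxes reduce 16 tokens to 5, so self-attention (quadratic in sequence length) operates on a much shorter sequence concentrated on the detected regions.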
