Multimodal Breast Lesion Classification Using Cross-Attention Deep Networks
Hung Q. Vo, Pengyu Yuan, Tiancheng He, Stephen T.C. Wong, and Hien V. Nguyen

TL;DR
This paper explores multimodal deep learning architectures that combine mammogram images and clinical data to improve breast lesion classification accuracy, addressing challenges like missing data and optimal feature integration.
Contribution
The paper introduces and systematically compares cross-attention-based and co-attention-based deep networks for multimodal breast lesion classification, demonstrating significant performance improvements.
Findings
Classification AUC improved from 0.89 to 0.94.
Cross-attention models outperform simple concatenation methods.
Models maintain robustness even with missing clinical data.
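To make the contrast between concatenation and cross-attention concrete, here is a minimal NumPy sketch. It is not the authors' architecture: the function names, the single-head formulation, the residual connection, and the masking strategy for missing clinical fields are all illustrative assumptions. Image tokens act as queries and embedded clinical fields act as keys and values; missing fields are masked out so they receive zero attention weight.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; columns set to -inf get weight exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def concat_fusion(img_tokens, clin_tokens):
    # Baseline (assumed form): mean-pool each modality, then concatenate.
    return np.concatenate([img_tokens.mean(axis=0), clin_tokens.mean(axis=0)])

def cross_attention(img_tokens, clin_tokens, clin_present, Wq, Wk, Wv):
    """Single-head cross-attention: image tokens attend over clinical tokens.

    img_tokens:   (n_img, d) image feature vectors (queries)
    clin_tokens:  (n_clin, d) embedded clinical fields (keys/values)
    clin_present: (n_clin,) bool, False where a clinical field is missing
    Wq, Wk, Wv:   (d, d) projection matrices (hypothetical parameters)
    Returns (fused image tokens, attention weights).
    """
    n_img, n_clin = img_tokens.shape[0], clin_tokens.shape[0]
    if not clin_present.any():
        # Assumed fallback: with no clinical data, pass image features through.
        return img_tokens, np.zeros((n_img, n_clin))
    Q = img_tokens @ Wq
    K = clin_tokens @ Wk
    V = clin_tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Mask missing clinical fields so they cannot be attended to.
    scores = np.where(clin_present[None, :], scores, -np.inf)
    weights = softmax(scores, axis=-1)
    # Residual connection keeps the image features alongside the attended ones.
    return img_tokens + weights @ V, weights
```

Unlike concatenation, which gives every clinical field a fixed slot in the fused vector, the attention weights here are renormalized over whichever fields are present. That is one plausible way a model could tolerate missing clinical data, though the paper's own mechanism may differ.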
Abstract
Accurate breast lesion risk estimation can significantly reduce unnecessary biopsies and help doctors decide optimal treatment plans. Most existing computer-aided systems rely solely on mammogram features to classify breast lesions. While this approach is convenient, it does not fully exploit useful information in clinical reports to achieve the optimal performance. Would clinical features significantly improve breast lesion classification compared to using mammograms alone? How to handle missing clinical information caused by variation in medical practice? What is the best way to combine mammograms and clinical features? There is a compelling need for a systematic study to address these fundamental questions. This paper investigates several multimodal deep networks based on feature concatenation, cross-attention, and co-attention to combine mammograms and categorical clinical…
