Performance Evaluation of Deep Learning and Transformer Models Using Multimodal Data for Breast Cancer Classification

Sadam Hussain; Mansoor Ali; Usman Naseem; Beatriz Alejandra Bosques Palomo; Mario Alexis Monsivais Molina; Jorge Alberto Garza Abdala; Daly Betzabeth Avendano Avalos; Servando Cardona-Huerta; T. Aaron Gulliver; Jose Gerardo Tamez Pena

arXiv:2410.10146·eess.IV·October 15, 2024

TL;DR

This study develops and evaluates a multimodal deep learning model that combines imaging and textual data for breast cancer classification, demonstrating higher accuracy and robustness than unimodal approaches.

Contribution

The study introduces a novel multimodal dataset and architecture that integrate mammogram images with radiological reports, and it compares multiple state-of-the-art models to improve diagnostic performance.
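One common way to integrate the two modalities is late fusion: each mammogram view and the report are first encoded into feature vectors, which are then concatenated and passed to a classifier head. The sketch below is illustrative only and assumes concatenation fusion, toy embeddings standing in for CNN outputs (e.g. VGG16/VGG19) and text-encoder outputs (e.g. an LSTM), and a single hypothetical dense layer with a sigmoid output. The paper's actual fusion strategy and layer sizes are not stated here, and names such as `late_fusion_predict` are hypothetical.

```python
import math
from typing import List, Sequence


def concat(*vectors: Sequence[float]) -> List[float]:
    """Concatenate feature vectors end to end (the fusion step in this sketch)."""
    fused: List[float] = []
    for v in vectors:
        fused.extend(v)
    return fused


def sigmoid(z: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def late_fusion_predict(
    view_features: Sequence[Sequence[float]],
    report_features: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Return a hypothetical P(malignant) from four view embeddings plus one report embedding.

    view_features: one vector per mammogram view (e.g. L-CC, R-CC, L-MLO, R-MLO),
        assumed to come from an image backbone.
    report_features: one vector assumed to come from a text encoder.
    weights, bias: parameters of a single dense output layer (a simplification).
    """
    if len(view_features) != 4:
        raise ValueError("expected one embedding per mammogram view (4 views)")
    fused = concat(*view_features, report_features)
    if len(fused) != len(weights):
        raise ValueError("weight vector must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return sigmoid(logit)


# Toy usage: 2-dim embedding per view, 3-dim report embedding.
views = [[0.2, 0.1], [0.0, 0.3], [0.5, 0.4], [0.1, 0.0]]
report = [1.0, 0.0, 0.5]
weights = [0.1] * 8 + [0.4, -0.2, 0.3]
p_malignant = late_fusion_predict(views, report, weights, bias=-0.5)
```

Concatenation keeps each modality's features separate until the final layer, so an image-only or report-only baseline can be obtained by dropping one input and retraining the head on the shorter vector.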

Findings

1. VGG19+ANN achieved an accuracy of 95.1%.
2. VGG16+LSTM achieved a sensitivity of 0.903.
3. VGG16+LSTM achieved an AUC of 0.937.
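The findings quote accuracy, sensitivity and AUC. For reference, here is a minimal pure-Python sketch of the standard definitions, assuming binary labels (1 = malignant, 0 = benign), a 0.5 decision threshold, and AUC computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count as one half). The threshold and the toy data are assumptions; the paper's exact evaluation protocol is not described here.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = malignant."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def apply_threshold(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Turn predicted probabilities into hard labels (assumed cutoff 0.5)."""
    return [1 if s >= cutoff else 0 for s in scores]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall on the malignant class: TP / (TP + FN)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Exact AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage with made-up scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = apply_threshold(scores)     # [1, 1, 0, 1, 0, 0]
acc = accuracy(y_true, y_pred)       # 4/6
sens = sensitivity(y_true, y_pred)   # 2/3
auc = roc_auc(y_true, scores)        # 8/9
```

Accuracy and sensitivity depend on the chosen threshold, while AUC summarises ranking quality across all thresholds, which is why a model can lead on one metric and not the other. The pairwise AUC loop is O(P·N); for large test sets a library routine such as scikit-learn's `roc_auc_score` is the practical choice.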

Abstract

Rising breast cancer (BC) incidence and mortality are major global concerns for women. Deep learning (DL) has demonstrated superior diagnostic performance in BC classification compared to human expert readers. However, the predominant reliance on unimodal features (digital mammography) may limit the performance of current diagnostic models. To address this, we collected a novel multimodal dataset comprising both imaging and textual data. This study proposes a multimodal DL architecture for BC classification that uses images (mammograms; four views) and textual data (radiological reports) from our new in-house dataset. Various augmentation techniques were applied to increase the size of the training data for both imaging and textual data. We explored the performance of eleven state-of-the-art (SOTA) DL architectures (VGG16, VGG19, ResNet34, ResNet50, MobileNet-v3, EffNet-b0, EffNet-b1, EffNet-b2, EffNet-b3,…
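The abstract states that augmentation was applied to both images and reports but does not name the techniques in the visible text. Purely as an illustration, the sketch below assumes two common choices: horizontal flipping for images (represented as a 2D list of pixel intensities) and random word deletion for report text. The helpers `hflip`, `random_word_deletion` and `augment_pair` are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple

Image = List[List[float]]


def hflip(image: Image) -> Image:
    """Mirror a 2D grayscale image left to right (assumed image augmentation)."""
    return [row[::-1] for row in image]


def random_word_deletion(text: str, p: float, rng: random.Random) -> str:
    """Drop each word with probability p (assumed text augmentation); keep at least one word."""
    words = text.split()
    if not words:
        return text
    kept = [w for w in words if rng.random() >= p]
    if not kept:
        kept = [rng.choice(words)]
    return " ".join(kept)


def augment_pair(
    image: Image, report: str, n_copies: int, p_delete: float = 0.1, seed: int = 0
) -> List[Tuple[Image, str]]:
    """Create n_copies augmented (image, report) pairs; odd-indexed copies are flipped."""
    rng = random.Random(seed)  # seeded so the augmented set is reproducible
    out: List[Tuple[Image, str]] = []
    for i in range(n_copies):
        img = hflip(image) if i % 2 == 1 else [row[:] for row in image]
        out.append((img, random_word_deletion(report, p_delete, rng)))
    return out
```

Augmenting each image together with its own report keeps every synthetic sample a consistent image-report pair with the original label, which a multimodal model relies on during training.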

Taxonomy

Methods: Attention Is All You Need · Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer · Softmax