# Multimodal deep learning for breast tumor classification: Integrating mammography and ultrasound for enhanced diagnostic accuracy

**Authors:** Yu Yan, Yichen Xu, Ge Fang, Xu He, Yifei Qian, Wenwen Zhu

PMC · DOI: 10.1002/acm2.70464 · 2026-01-18

## TL;DR

This paper introduces a multimodal deep learning model combining mammography and ultrasound to improve breast tumor classification accuracy and support clinical decisions.

## Contribution

The novel contribution is a multimodal model with modality-specific attention mechanisms that outperforms single-modality approaches in breast tumor classification.

## Key findings

- The MPM-MU model achieved an AUC of 87.9% for breast tumor classification.
- It outperformed the attention-enhanced single-modality models by 13.4 and 15.6 percentage points of AUC for mammography (74.5%) and ultrasound (72.3%), respectively.
- Ablation studies confirmed the effectiveness of multimodal fusion and attention mechanisms.
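The AUCs are already percentages, so the reported gains are absolute differences (percentage points) between the multimodal AUC and the single-modality AUCs given in the abstract (74.5% and 72.3%). Expressed relative to each baseline, the same gains are larger:

$$
\begin{aligned}
87.9 - 74.5 &= 13.4 \text{ pp (mammography)}, & \tfrac{13.4}{74.5} &\approx 18.0\% \text{ relative} \\
87.9 - 72.3 &= 15.6 \text{ pp (ultrasound)}, & \tfrac{15.6}{72.3} &\approx 21.6\% \text{ relative}
\end{aligned}
$$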

## Abstract

Deep learning has advanced breast tumor prediction research, but traditional single‐modality models limit feature diversity and accuracy.

The study aimed to develop and validate a multimodal deep learning approach that combines mammography and ultrasound imaging to improve breast tumor classification and enhance clinical decision‐making.

This retrospective study analyzed 663 female patients with breast lesions from 2018 to 2021 (384 benign and 279 malignant cases). The two‐stage prediction model used improved modality‐specific attention mechanisms: efficient channel attention (ECA‐Net) for ultrasound and the convolutional block attention module (CBAM) for mammography. The fused features were fed into a stacking ensemble with logistic regression (LR), support vector machine (SVM), random forest (RF), and Extra‐Trees (ET) as base learners and a multilayer perceptron (MLP) neural network as the meta‐learner. The data were divided into training (464), validation (133), and test (66) sets in an approximately 7:2:1 ratio.
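The abstract names ECA‐Net as the attention module for the ultrasound branch. As a rough illustration, the sketch below follows the standard ECA formulation: global average pooling per channel, a zero‐padded 1D convolution across neighbouring channels whose odd kernel size is chosen from the channel count, and a sigmoid gate that rescales each channel. It is not the authors' "improved" variant, whose modifications the abstract does not describe. The function names, the plain‐list tensor layout, and passing fixed convolution weights instead of learned ones are illustrative assumptions; CBAM, used for the mammography branch, adds spatial attention and is not shown.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """ECA-style adaptive kernel size: t = int(|(log2(C) + b) / gamma|), bumped to the next odd number if even."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def _sigmoid(x):
    # Numerically stable logistic function for large positive or negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def eca_attention(feature_map, conv_weights):
    """Efficient channel attention on one feature map (illustrative sketch).

    feature_map: list of C channels, each a flat list of H*W activations.
    conv_weights: k weights of the 1D convolution that slides across channels.
        In a trained network these are learned; here they are passed in.
    Returns the reweighted feature map and the per-channel gates.
    """
    num_channels = len(feature_map)
    pad = len(conv_weights) // 2
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in feature_map]
    gates = []
    for c in range(num_channels):
        # Local cross-channel interaction: zero-padded 1D conv without bias.
        s = 0.0
        for j, w in enumerate(conv_weights):
            idx = c + j - pad
            if 0 <= idx < num_channels:
                s += w * pooled[idx]
        # Sigmoid turns the conv output into a gate in (0, 1).
        gates.append(_sigmoid(s))
    # Excite: scale every activation in channel c by its gate.
    reweighted = [[v * g for v in ch] for ch, g in zip(feature_map, gates)]
    return reweighted, gates
```

For a 64‐channel feature map, `eca_kernel_size(64)` gives k = 3, so each gate depends only on a channel and its two neighbours. The design point of ECA is that it avoids the channel dimensionality reduction of squeeze‐and‐excitation blocks: its only parameters are the k convolution weights, so it adds very little to the backbone.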

The proposed multimodal prediction model for mammography and ultrasound (MPM‐MU) achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 87.9 ± 0.21%, an improvement of 13.4 and 15.6 percentage points over the attention‐enhanced mammography (74.5%) and ultrasound (72.3%) models, respectively. Ablation studies confirmed that both multimodal feature fusion and the attention mechanisms improved diagnostic performance.
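The AUC values above summarize the whole ROC curve in one number. An equivalent reading, useful when interpreting them, is the probability that a randomly chosen malignant case receives a higher model score than a randomly chosen benign case (the Mann‐Whitney statistic). The minimal sketch below computes AUC that way on hypothetical scores; it is not the authors' evaluation code, and the function name is illustrative.

```python
def roc_auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC.

    scores: predicted malignancy scores (higher = more likely malignant).
    labels: 1 for malignant, 0 for benign.
    Returns the fraction of (malignant, benign) pairs in which the malignant
    case scores higher, counting ties as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.9, 0.7, 0.6, 0.2], [1, 0, 1, 0])` returns 0.75, because three of the four malignant-benign pairs are ranked correctly. The pairwise loop costs O(P·N), which is negligible for a test set of 66 cases; a sort-based ranking scales better to large datasets.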

The MPM‐MU model, with its modality‐specific attention mechanisms, distinguished benign from malignant breast tumors better than single‐modality approaches. It can help radiologists classify breast lesions more accurately and support clinical decision‐making, potentially reducing unnecessary biopsies and improving diagnostic consistency.

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** breast tumor (MESH:D001943), breast lesion (MESH:D061325)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12813411/full.md

---
Source: https://tomesphere.com/paper/PMC12813411