# Ensemble Deep Learning-Based High-Precision Framework for Breast Cancer Detection from Histopathological Images

**Authors:** Faizan Ahmad, Arfan Jaffar, Ghazanfar Latif, Jaafar Alghazo, Sohail Masood Bhatti

PMC · DOI: 10.3390/diagnostics16050653 · Diagnostics · 2026-02-24

## TL;DR

This paper introduces a deep learning framework that combines CNNs and ViTs with self-attention to improve breast cancer detection from histopathological images.

## Contribution

The novel framework integrates CNNs and ViTs with self-attention for enhanced feature fusion and robust breast cancer diagnosis.
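The fusion idea can be illustrated with a minimal sketch of scaled dot-product self-attention over two pooled feature vectors, one standing in for a CNN branch and one for a ViT branch. This is not the authors' implementation: the identity query/key/value projections, the concatenation of attended tokens, the function name `self_attention_fuse`, and the toy feature values are all assumptions made for illustration. A trained model would use learned projection matrices and much higher-dimensional features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention_fuse(tokens):
    """Fuse equal-length feature vectors with scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real model would learn these projections.
    Returns the attended tokens concatenated into one fused vector.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        # Attention weights of this token over every token (including itself).
        weights = softmax([dot(query, key) / scale for key in tokens])
        # Weighted sum of the value vectors, one dimension at a time.
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return [x for token in attended for x in token]


# Hypothetical pooled features (e.g. after Global Average Pooling and scaling).
cnn_feat = [0.2, 0.9, 0.1, 0.4]
vit_feat = [0.7, 0.1, 0.5, 0.3]
fused = self_attention_fuse([cnn_feat, vit_feat])
print(len(fused))
```

Each attended token is a convex combination of the two inputs, so every branch's representation is adjusted by what the other branch encodes before the fused vector is passed to downstream layers or a classifier.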

## Key findings

- Among the evaluated ML classifiers, XGBoost performed best, achieving 98.7% accuracy and a 98.7% F1-score on the BreakHis dataset.
- The framework reached 95.8% accuracy on the external BACH dataset, with Grad-CAM and Grad-CAM++ visualizations supporting interpretability.

## Abstract

Background/Objectives: Histopathological image analysis is the gold standard for breast cancer diagnosis. However, modern deep learning and ViT-based architectures still struggle to capture both local and global discriminative patterns, and attempts to do so tend to make architectures more complex, increasing the risk of overfitting and optimization problems.

Methods: To address these problems, this paper proposes a four-phase hybrid framework that enhances feature fusion to improve the model’s robustness and generalization ability. In Phase 1, the BreakHis dataset was split patient-wise in a 70-15-15 ratio to avoid data leakage; extensive data augmentation, normalization, and a five-fold cross-validation protocol were applied to increase data variety and support reliable, unbiased evaluation. In Phase 2, three CNNs (VGG16, ResNet50, and DenseNet121) and four ViTs (DeiT, CaiT, T2T-ViT, and Swin Transformer) were trained independently to establish strict baseline performance. In Phase 3, the CNN-based features were fused and classified with a soft voting mechanism to allow more stable and representative learning. Phase 4 is the proposed framework, which combines the two best-performing CNN and ViT models. Features were refined using Global Average Pooling and feature scaling, and a self-attention mechanism performed the cross-modal feature fusion. The generalization capability of the fused representation was further enhanced by subsequent dense layers followed by dropout.

Results: XGBoost performed best among the evaluated ML classifiers, achieving 98.7% accuracy and a 98.7% F1-score on BreakHis and 95.8% accuracy on the external BACH dataset, supported by Grad-CAM- and Grad-CAM++-based interpretability.

Conclusions: By integrating CNNs and ViTs through self-attention, the proposed framework offers a robust and interpretable solution for automated breast cancer diagnosis.
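The patient-wise 70-15-15 split described in Phase 1 can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `patient_wise_split`, the subset names `train`/`val`/`test`, the seed, and the toy patient IDs are assumptions. The point it shows is that whole patients, not individual images, are assigned to a subset, so no patient's images appear on both sides of a split.

```python
import random


def patient_wise_split(patient_ids, ratios=(0.70, 0.15, 0.15), seed=0):
    """Assign image indices to subsets so that each patient lands in exactly one.

    patient_ids: one patient identifier per image, in image order.
    ratios: fractions of *patients* (not images) per subset; the subset
            names used below are an assumption for illustration.
    """
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)

    n_train = round(ratios[0] * len(patients))
    n_val = round(ratios[1] * len(patients))
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }

    # Route every image to the subset that owns its patient.
    split = {name: [] for name in groups}
    for index, pid in enumerate(patient_ids):
        for name, members in groups.items():
            if pid in members:
                split[name].append(index)
                break
    return split


# Hypothetical toy data: 20 patients with 5 images each.
image_patients = [f"P{p:02d}" for p in range(20) for _ in range(5)]
split = patient_wise_split(image_patients)
print({name: len(indices) for name, indices in split.items()})
```

Because the ratios apply to patients, the image-level proportions only match 70-15-15 exactly when every patient contributes the same number of images, as in this toy data; with uneven image counts per patient they are approximate.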

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** Breast Cancer (MESH:D001943)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12985142/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12985142/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12985142/full.md

---
Source: https://tomesphere.com/paper/PMC12985142