# Explainable and uncertainty-aware ensemble framework with causal analysis for breast cancer detection

**Authors:** Muhammad Zaheer Sajid, Muhammad Fareed Hamid, Imran Qureshi

PMC · DOI: 10.3389/fonc.2025.1751090 · Frontiers in Oncology · 2026-02-20

## TL;DR

This paper introduces a machine learning framework for breast cancer detection that pairs an uncertainty-aware ensemble with causal analysis and explainability, so that less-confident predictions are flagged and each prediction comes with a confidence level.

## Contribution

The novel contribution is an uncertainty-aware ensemble framework with causal analysis and multimodal explainability for breast cancer prediction.

## Key findings

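- On the UCTH Clinical Dataset, the model reached an AUC of 0.97, an accuracy of 0.95, and an F1 score of 0.94, with 100% precision on high-confidence cases and no false positives.
- On the Breast Cancer Wisconsin dataset, it reached an AUC of 0.99, an accuracy of 0.94, and an F1 score of 0.92, rising to 0.98 accuracy and 0.98 F1 when only certain predictions were considered.
- Causal analysis identified important clinical confounders such as lymph node involvement, tumor size, and metastasis.
- Fairness tests showed balanced results across demographic groups; the authors argue that per-prediction confidence levels can reduce diagnostic errors and improve reliability in clinical use.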
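
As a rough illustration of what a check for balanced performance across groups can look like, the sketch below computes accuracy per group and the largest gap between groups. This summary does not say which fairness metric or demographic attributes the authors used, so the function name `accuracy_by_group`, the age-band grouping, and all labels and predictions here are assumptions made up for illustration only.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions, groups."""
    hits, totals = defaultdict(int), defaultdict(int)
    for y, y_hat, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(y == y_hat)
    return {g: hits[g] / totals[g] for g in totals}


# Toy values, NOT taken from the paper. The age bands are a hypothetical
# demographic attribute chosen only to make the example concrete.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
groups = ["<50", "<50", "<50", "<50", ">=50", ">=50", ">=50", ">=50"]

per_group = accuracy_by_group(y_true, y_pred, groups)
# A small gap between the best and worst group suggests balanced performance.
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A fuller audit would usually compare error types per group as well (for example false negative rates), since equal accuracy can hide unequal kinds of mistakes.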

## Abstract

Breast cancer is one of the main causes of cancer deaths around the world and is known for its aggressive growth and ability to spread. While machine learning has shown good results for diagnosis, most existing methods do not handle uncertainty or explain their predictions clearly. In this study, we present an integrated framework that combines uncertainty-aware ensemble learning with causal feature analysis and multimodal explainability for breast cancer prediction. The framework uses an ensemble of Light Gradient Boosting Machine (LightGBM), random forest, and gradient boosting classifiers with uncertainty estimation, so that the model can flag predictions that are less confident. It also applies causal analysis to detect possible clinical confounders and uses SHAP (Shapley Additive Explanations), permutation importance, and feature attribution for interpretation. Tests on two public datasets showed strong and consistent performance. On the UCTH Clinical Dataset, the model reached an area under the curve (AUC) of 0.97, an accuracy of 0.95, and an F1 score of 0.94, with 100% precision for high-confidence cases and no false positives. On the Breast Cancer Wisconsin dataset, it achieved an AUC of 0.99, an accuracy of 0.94, and an F1 score of 0.92, which increased to 0.98 accuracy and 0.98 F1 score when only certain predictions were considered. Causal analysis pointed out important clinical confounders such as lymph node involvement, tumor size, and metastasis, while fairness tests showed balanced results across demographic groups. Overall, the framework combines uncertainty estimation and causal interpretability to give predictions that are both accurate and trustworthy. It provides clinicians with clear confidence levels for every prediction and supports transparent decision-making that can reduce diagnostic errors and improve reliability in clinical use.
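
The idea of an ensemble that flags its less-confident predictions, and of reporting metrics on the certain subset only, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not say how uncertainty is estimated, so the rule used here (disagreement between models plus distance from the decision threshold) is a stand-in, and the function name `ensemble_with_uncertainty`, the parameters `max_disagreement` and `margin`, and every probability below are hypothetical.

```python
from statistics import mean, pstdev


def ensemble_with_uncertainty(model_probs, threshold=0.5,
                              max_disagreement=0.15, margin=0.2):
    """Soft-vote per-model probabilities and mark each case certain or not.

    model_probs: one list of positive-class probabilities per model,
    all of the same length. Returns a list of (label, probability, certain).
    """
    results = []
    for probs in zip(*model_probs):
        p = mean(probs)            # soft-voting average across models
        spread = pstdev(probs)     # disagreement between models
        label = int(p >= threshold)
        # Assumed rule: certain only if models agree AND p is far from threshold.
        certain = spread <= max_disagreement and abs(p - threshold) >= margin
        results.append((label, p, certain))
    return results


def accuracy(pairs):
    """Fraction of (true, predicted) pairs that match; NaN if there are none."""
    pairs = list(pairs)
    if not pairs:
        return float("nan")
    return sum(y == y_hat for y, y_hat in pairs) / len(pairs)


# Toy probabilities standing in for LightGBM, random forest, and gradient
# boosting outputs on six cases. Illustrative values, NOT from the paper.
lgbm = [0.95, 0.10, 0.55, 0.80, 0.30, 0.05]
rf = [0.90, 0.20, 0.40, 0.85, 0.60, 0.10]
gb = [0.92, 0.15, 0.65, 0.75, 0.45, 0.08]
y_true = [1, 0, 0, 1, 1, 0]

preds = ensemble_with_uncertainty([lgbm, rf, gb])
overall = accuracy((y, lab) for y, (lab, _, _) in zip(y_true, preds))
confident = accuracy((y, lab) for y, (lab, _, ok) in zip(y_true, preds) if ok)
print(f"overall accuracy: {overall:.2f}, accuracy on certain cases: {confident:.2f}")
```

In this toy data the two misclassified cases are exactly the ones marked uncertain, which mirrors the pattern reported above where accuracy rises once only certain predictions are scored. In practice the flagged cases would be routed to a clinician for review, and the thresholds would be tuned on held-out data.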

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** death (MESH:D003643), lung, cervical, colorectal, or prostate cancer (MESH:D015179), UCTH (MESH:D003428), metastasis (MESH:D009362), Breast Cancer (MESH:D001943), swelling (MESH:D004487), Tumor (MESH:D009369), calcifications (MESH:D002114), pain (MESH:D010146), AI (MESH:C538142), multiple organ failure (MESH:D009102), skin cancer (MESH:D012878)
- **Chemicals:** alcohol (MESH:D000438)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** UCTH — Homo sapiens (Human), Ovarian carcinoma, Cancer cell line (CVCL_RV35)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12962921/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12962921/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/PMC12962921/full.md

---
Source: https://tomesphere.com/paper/PMC12962921