# Breast Cancer Prediction Using Rotation Forest Algorithm Along with Finding the Influential Causes

**Authors:** Prosenjit Das, Proshenjit Sarker, Jun-Jiat Tiang, Abdullah-Al Nahid

PMC · DOI: 10.3390/bioengineering12101020 · Bioengineering · 2025-09-25

## TL;DR

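This paper classifies breast cancer and non-breast cancer cases on the Breast Cancer Coimbra dataset with an Optuna-tuned Rotation Forest classifier, and uses Diverse Counterfactual Explanations to identify which features most influence the predictions.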

## Contribution

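The study tunes a Rotation Forest classifier with the Optuna optimizer, combines the best feature subsets from three wrapper-based feature selection methods (Sequential Forward Selection, Sequential Backward Selection, and Exhaustive Feature Selection) in soft and hard voting ensembles, and analyzes the misclassifications of both strategies with Diverse Counterfactual Explanations.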

## Key findings

- The hard voting strategy achieves higher accuracy (85.71% vs. 80.00%) and F1-score (83.87% vs. 77.42%) than soft voting.
- Diverse Counterfactual Explanations of the misclassified cases indicate that BMI and Glucose values are most influential in predicting the correct class, while HOMA, Adiponectin, and Resistin values have little influence.
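
The difference between the hard and soft voting strategies compared above can be illustrated with a minimal sketch. It assumes three binary classifiers that each output a positive-class probability; the function names, the 0.5 threshold, and the probabilities are hypothetical and do not come from the paper.

```python
from collections import Counter
from statistics import mean


def hard_vote(probas, threshold=0.5):
    """Majority vote over each model's thresholded class label."""
    labels = [int(p >= threshold) for p in probas]
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas, threshold=0.5):
    """Threshold the average of the models' predicted probabilities."""
    return int(mean(probas) >= threshold)


# Hypothetical positive-class probabilities from three models for one sample.
probas = [0.9, 0.4, 0.4]

hard = hard_vote(probas)  # two of three models vote for class 0
soft = soft_vote(probas)  # one confident model pulls the average above 0.5
print("hard:", hard, "soft:", soft)
```

In this example the two strategies disagree: a single confident model can outweigh the majority under soft voting but not under hard voting. With an even number of models, hard voting would also need a tie-breaking rule, which this sketch omits.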

## Abstract

Breast cancer is a widespread disease involving abnormal (uncontrolled) growth of breast tissue cells along with the formation of a tumor and metastasis. Breast cancer cases occur mostly among women. Early detection and regular screening have significantly improved survival rates. This research classifies breast cancer and non-breast cancer cases using machine learning algorithms based on the Breast Cancer Coimbra dataset by optimizing the classifier performance and feature selection methodology. In addition, this research identifies the influential features responsible for breast cancer classification by using Diverse Counterfactual Explanations. The Rotation Forest classifier algorithm is used to classify breast cancer and non-breast cancer cases. The hyperparameters of this algorithm are optimized using the Optuna optimizer. Three wrapper-based feature selection techniques (Sequential Forward Selection, Sequential Backward Selection, and Exhaustive Feature Selection) are used to select the most relevant features. An ensemble environment is also created using the best feature subsets of these methods, incorporating both soft and hard voting strategies. Experimental results show that the hard voting strategy achieves an accuracy of 85.71%, F1-score of 83.87%, precision of 92.85%, and recall of 76.47%. In contrast, the soft voting strategy obtains an accuracy of 80.00%, F1-score of 77.42%, precision of 85.71%, and recall of 70.59%. These findings demonstrate that hard voting achieves noticeably better performance. The misclassification outcomes of both strategies are explored using Diverse Counterfactual Explanations, revealing that BMI and Glucose values are most influential in predicting correct classes, whereas the HOMA, Adiponectin, and Resistin values have little influence.
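
As a consistency check on the figures in the abstract, the F1-score can be recomputed from the reported precision and recall using the standard definition F1 = 2PR / (P + R). The sketch below uses only the percentages stated above; the variable and function names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Metrics as reported in the abstract, expressed as fractions.
reported = {
    "hard": {"precision": 0.9285, "recall": 0.7647, "f1": 0.8387},
    "soft": {"precision": 0.8571, "recall": 0.7059, "f1": 0.7742},
}

recomputed = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, value in recomputed.items():
    print(f"{name} voting: recomputed F1 = {value:.4f}, "
          f"reported F1 = {reported[name]['f1']:.4f}")
```

For both strategies the recomputed values agree with the reported F1-scores to within rounding.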

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** RETN (resistin) [NCBI Gene 56729] {aka ADSF, FIZZ3, RENT, RETN1, RSTN, XCP1}, ADIPOQ (adiponectin, C1Q and collagen domain containing) [NCBI Gene 9370] {aka ACDC, ACRP30, ADIPQTL1, ADPN, APM-1, APM1}
- **Diseases:** tumor (MESH:D009369), metastasis (MESH:D009362), Breast Cancer (MESH:D001943)
- **Chemicals:** Glucose (MESH:D005947)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12561878/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12561878/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/PMC12561878/full.md

---
Source: https://tomesphere.com/paper/PMC12561878