# Understanding Cancer Risk Among Bangladeshi Women: An Explainable Machine Learning Approach to Socio-Reproductive Factors Using Tertiary Hospital Data

**Authors:** Muhammad Rafiqul Islam, Humayera Islam, Syeda Masuma Siddiqua, Salman Bashar Al Ayub, Beauty Saha, Nargis Akter, Rashedul Islam, Nazrina Khatun, Andrew Craver, Habibul Ahsan

PMC · DOI: 10.3390/healthcare13121432 · Healthcare · 2025-06-15

## TL;DR

This study uses explainable machine learning to identify socio-reproductive factors associated with two breast cancer subtypes (HR+ and TNBC) in Bangladeshi women, with the aim of informing early detection and public health strategies.

## Contribution

The study applies SHAP-based explainable machine learning to uncover subtype-specific (HR+ vs. TNBC) risk factors for breast cancer in a low-resource setting.
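To make the idea behind SHAP concrete, the following is a minimal sketch of exact Shapley value attribution for a single prediction. It is a brute-force illustration, not the `shap` library and not the authors' pipeline. The scoring function `toy_score`, its coefficients, the three binary features, and the all-zero baseline are hypothetical and chosen only for readability.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for toy examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical toy risk score (NOT the paper's model). Binary features:
# [rural_residence, low_education, undernutrition], with one interaction term.
def toy_score(features):
    rural, low_edu, undernourished = features
    return (0.1 + 0.2 * rural + 0.15 * low_edu + 0.1 * undernourished
            + 0.05 * rural * low_edu)


x = [1, 1, 0]          # rural, low education, not undernourished
baseline = [0, 0, 0]   # hypothetical reference profile
phi = shapley_values(toy_score, x, baseline)
```

The attributions in `phi` sum to `toy_score(x) - toy_score(baseline)`, which is the additivity property that lets SHAP values be read as per-feature contributions to a prediction. The interaction term is split evenly between the two features involved.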

## Key findings

- XGBoost achieved the highest performance (F1-score = 0.750) in predicting breast cancer subtypes.
- Rural residence, low education, and undernutrition were significant predictors across HR+ and TNBC subtypes.
- Cesarean delivery and multiple abortions were more predictive of TNBC, while urban residence and higher education were more predictive of HR+.

## Abstract

**Background:** Breast cancer poses a significant health challenge in Bangladesh, where limited screening and unique reproductive patterns contribute to delayed diagnoses and subtype-specific disparities. While reproductive risk factors such as age at menarche, parity, and contraceptive use are well studied in high-income countries, their associations with hormone-receptor-positive (HR+) and triple-negative breast cancer (TNBC) remain underexplored in low-resource settings.

**Methods:** A case-control study was conducted at the National Institute of Cancer Research and Hospital (NICRH), including 486 histopathologically confirmed breast cancer cases (246 HR+, 240 TNBC) and 443 cancer-free controls. Socio-demographic and reproductive data were collected through structured interviews. Machine learning models (Logistic Regression, Lasso, Support Vector Machines, Random Forest, and XGBoost) were trained using stratified five-fold cross-validation. Model performance was evaluated using sensitivity, F1-score, and Area Under the Receiver Operating Characteristic Curve (AUROC). To interpret model predictions and quantify the contribution of individual features, we employed SHapley Additive exPlanations (SHAP) values.

**Results:** XGBoost achieved the highest overall performance (F1-score = 0.750), and SHAP-based interpretability revealed key predictors for each subtype. Rural residence, low education (≤5 years), and undernutrition were significant predictors across subtypes. Cesarean delivery and multiple abortions were more predictive of TNBC, while urban residence, employment, and higher education were more predictive of HR+. Age at menarche and age at first childbirth showed decreasing predictive importance with increasing age for HR+, while larger gaps between marriage and childbirth were more predictive of TNBC.

**Conclusions:** Our findings underscore the value of machine learning coupled with SHAP-based explainability in identifying context-specific risk factors for breast cancer subtypes in resource-limited settings. This approach enhances transparency and supports the development of targeted public health interventions to reduce breast cancer disparities in Bangladesh.
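The evaluation setup described in the Methods can be sketched as follows. This is a standard-library illustration of stratified five-fold splitting and of the sensitivity and F1 calculations, under stated assumptions. The helper names `stratified_kfold` and `sensitivity_and_f1` are hypothetical, the metric is shown for a single positive class, and this summary does not say how the authors framed the classification task or which tooling they used. Only the class counts are taken from the abstract.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def sensitivity_and_f1(y_true, y_pred, positive=1):
    """Sensitivity (recall) and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, f1


# Class counts from the abstract: 246 HR+, 240 TNBC, 443 controls.
labels = ["HR+"] * 246 + ["TNBC"] * 240 + ["control"] * 443
splits = list(stratified_kfold(labels, k=5))

# Tiny hand-made example for the metrics (not study data).
sens, f1 = sensitivity_and_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Stratification matters here because the three groups differ in size. Each test fold keeps roughly the same HR+ : TNBC : control ratio as the full sample, so per-fold metrics are comparable.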

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** NR4A1 (nuclear receptor subfamily 4 group A member 1) [NCBI Gene 3164] {aka GFRP1, HMR, N10, NAK-1, NGFIB, NP10}
- **Diseases:** TNBC (MESH:D064726), undernutrition (MESH:D044342), abortions (MESH:D000026), Cancer (MESH:D009369), Breast cancer (MESH:D001943)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12192815/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12192815/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/PMC12192815/full.md

---
Source: https://tomesphere.com/paper/PMC12192815