# Machine learning approaches for predicting progression in hormone-sensitive prostate cancer patients

**Authors:** Bingyu Zhu, Haiyang Jiang, Chongjian Zhang, Qilin Wang, Libo Yang, Hong Yang, Ruiqian Li, Jun Li, Xusong Pang, Yufeng Zheng, Lingtao Yan, Yu Wang, Yu Bai

PMC · DOI: 10.3389/fonc.2026.1704671 · Frontiers in Oncology · 2026-02-12

## TL;DR

This study uses machine learning to predict when hormone-sensitive prostate cancer will progress to a more resistant form, showing that ensemble methods like random forest perform best.

## Contribution

The study introduces a machine learning model using ensemble methods to predict progression in hormone-sensitive prostate cancer patients.

## Key findings

- Ensemble learning methods, especially random forest, showed the best performance in predicting HSPC progression.
- Random forest achieved an AUC of 0.873 on the test dataset, indicating strong predictive accuracy.
- The model demonstrated good calibration and no significant overfitting, suggesting reliable performance.

## Abstract

Almost all hormone-sensitive prostate cancer (HSPC) cases eventually progress to castration-resistant prostate cancer (CRPC) following androgen deprivation therapy (ADT). This study aims to develop a machine learning (ML) model to predict the progression of HSPC patients. Additionally, we conducted statistical analyses on the dataset to identify significant features and clinical markers predictive of HSPC transitioning to CRPC.

Data from 410 HSPC patients treated at Yunnan Cancer Hospital between 1 January 2017 and 31 May 2022 were analyzed. Predictive analyses were performed on a series of features recorded at the patients’ initial visits. The primary ML methods employed were decision tree (DT), random forest (RF), XGBoost, artificial neural network (ANN), and support vector machine (SVM). Feature selection was conducted using a genetic algorithm (GA). The ML models were trained on 80% of the data and evaluated on the remaining 20%, held out as a test set. Model fit and calibration were assessed using the area under the ROC curve (AUC), calibration plots, and learning curves. Evaluation metrics included accuracy (ACC), precision (PRE), specificity (SPE), sensitivity (SEN), and F1 score.
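The evaluation metrics named above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code; the function name `binary_metrics`, the `BinaryMetrics` container, and the convention that label 1 means "progressed to CRPC" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float      # ACC = (TP + TN) / all cases
    precision: float     # PRE = TP / (TP + FP)
    specificity: float   # SPE = TN / (TN + FP)
    sensitivity: float   # SEN = TP / (TP + FN), also called recall
    f1: float            # harmonic mean of precision and sensitivity


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of dividing by zero when a class is absent.
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred) -> BinaryMetrics:
    """Compute ACC, PRE, SPE, SEN and F1 from 0/1 labels and 0/1 predictions.

    Assumption: 1 marks the positive class (here, progression to CRPC).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        precision=precision,
        specificity=_ratio(tn, tn + fp),
        sensitivity=sensitivity,
        f1=_ratio(2 * precision * sensitivity, precision + sensitivity),
    )


if __name__ == "__main__":
    # Toy labels only; these are not data from the study.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(binary_metrics(truth, preds))
```

Reporting specificity alongside sensitivity matters when the outcome classes are imbalanced, since accuracy alone can look strong for a model that rarely predicts the minority class.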

Evaluation metrics were visualized with confusion matrices and ROC curves. Ensemble learning methods, particularly RF and XGBoost, demonstrated the best model performance. RF achieved a score of 0.838 (95% CI: 0.8324-0.902) on the training dataset and 0.817 (95% CI: 0.659-0.829) on the test dataset (AUC: 0.873, 95% CI: 0.730-0.878). XGBoost achieved a score of 0.814 (95% CI: 0.790-0.878) on the training dataset and 0.805 (95% CI: 0.707-0.829) on the test dataset (AUC: 0.866, 95% CI: 0.780-0.871). Calibration curves indicated good model calibration, and learning curves suggested no significant overfitting across the training and test sets.
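The abstract reports AUC values with 95% confidence intervals but does not say how the intervals were obtained. As one plausible illustration, the sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, then estimates a percentile bootstrap interval. The bootstrap is an assumption here, not the authors' stated method, and the names `auc_score` and `bootstrap_auc_ci` are hypothetical.

```python
import random


def auc_score(y_true, scores) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (assumed method, see lead-in)."""
    if len(set(y_true)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that lost one class entirely
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy labels and scores only; these are not data from the study.
    labels = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
    probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9, 0.6, 0.3]
    print(auc_score(labels, probs), bootstrap_auc_ci(labels, probs))
```

With a 20% test split of 410 patients (roughly 82 cases), intervals of this kind tend to be wide, which is consistent with the breadth of the test-set intervals reported above.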

Our findings demonstrate that ensemble learning methods, particularly RF, exhibit superior performance in predicting HSPC progression. This study represents a preliminary step toward a predictive tool, highlighting the potential of baseline clinical data for risk stratification. Future prospective studies with larger, multi-center cohorts are warranted to validate and refine this approach for possible clinical integration.

## Linked entities

- **Diseases:** prostate cancer (MONDO:0005159)

## Full-text entities

- **Genes:** ALPP (alkaline phosphatase, placental) [NCBI Gene 250] {aka ALP, PALP, PLAP, PLAP-1}, KLK3 (kallikrein related peptidase 3) [NCBI Gene 354] {aka APS, KLK2A1, PSA, hK3}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** bladder (MESH:D001745), ML (MESH:D007859), CRPC (MESH:D064129), breast cancer (MESH:D001943), soft-tissue lesions (MESH:D012983), urinary tract obstruction (MESH:D014552), nodal involvement (MESH:D013611), Cancer (MESH:D009369), diabetic retinopathy (MESH:D003930), adenocarcinoma (MESH:D000230), immune system disorders (MESH:D007154), lung cancer (MESH:D008175), detrusor overactivity (MESH:D053201), hematuria (MESH:D006417), breast-lesion (MESH:D061325), blood disorders (MESH:D006402), bladder outlet obstruction (MESH:D001748), bone metastases (MESH:D009362), Obstructive symptoms (MESH:D012816), HSPC (MESH:D011471), HVD (MESH:D004194), FI (MESH:D000076263)
- **Chemicals:** LIME (MESH:C016538), bisphosphonates (MESH:D004164), testosterone (MESH:D013739)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12935601/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12935601/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12935601/full.md

---
Source: https://tomesphere.com/paper/PMC12935601