# Using Machine Learning to Create Prognostic Systems for Primary Prostate Cancer

**Authors:** Kevin Guan, Andy Guan, Anwar E. Ahmed, Andrew J. Waters, Shyh-Han Tan, Dechang Chen

PMC · DOI: 10.3390/diagnostics15192462 · Diagnostics · 2025-09-26

## TL;DR

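This study applies an unsupervised machine learning algorithm (EACCD) to SEER prostate cancer data and builds a prognostic system that predicts patient outcomes more accurately than the AJCC TNM staging system (C-index of 0.8293 to 0.8504 versus 0.7676).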

## Contribution

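The study applies the Ensemble Algorithm for Clustering Cancer Data (EACCD) to prostate cancer, first with the five AJCC variables and then with age and race added, and shows that the resulting prognostic groups outperform the AJCC staging system in predicting patient outcomes.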

## Key findings

- The EACCD model built on the five AJCC variables (T, N, M, P, G) achieved a C-index of 0.8293 (95% CI: 0.8245–0.8341), compared with 0.7676 (95% CI: 0.7622–0.7731) for the AJCC TNM system.
- Adding age and race raised the C-index to 0.8504 (95% CI: 0.8461–0.8547).
- EACCD effectively stratified patients into distinct groups with well-separated survival curves.
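
The C-index values in these findings measure how often a prognostic system ranks pairs of patients in the correct order of risk. As a rough illustration only, the sketch below assumes Harrell's definition with a simple tie rule; this summary does not state which variant the authors used, and the function name `concordance_index` and the toy data are hypothetical.

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Harrell-style C-index: share of comparable pairs ranked correctly.

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- higher value means worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter time ends in an observed event (not censoring).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0      # higher risk failed first: correct order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5      # tied risk (same group) counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): four patients assigned to two risk groups.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_group = [2, 2, 1, 1]
print(concordance_index(times, events, risk_group))  # 0.9
```

Because a staging system assigns the same risk to every patient in a group, ties are common, which is why the half-credit rule matters when comparing grouped systems. Production code would more likely rely on an established survival-analysis library than on this quadratic loop.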

## Abstract

**Background:** Cancer staging, guided by anatomical and clinicopathologic factors, is essential for determining treatment strategies and patient prognosis. The current gold standard for prostate cancer is the American Joint Committee on Cancer (AJCC) Tumor, Lymph Node, and Metastasis (TNM) Staging System 9th Version (2024). This system incorporates five prognostic variables: tumor (T), spread to lymph nodes (N), metastasis (M), prostate-specific antigen (PSA) levels (P), and Grade Group/Gleason score (G). While effective, further refinement of prognostic systems may improve prediction of patient outcomes and support more individualized treatment.

**Methods:** We applied the Ensemble Algorithm for Clustering Cancer Data (EACCD), an unsupervised machine learning approach. EACCD involves three steps: calculating initial dissimilarities, performing ensemble learning, and conducting hierarchical clustering. We first developed an EACCD model using the five AJCC variables (T, N, M, P, G). The model was then expanded to include two additional factors, age (A) and race (R). Prostate cancer patient data were obtained from the Surveillance, Epidemiology, and End Results (SEER) program from the National Cancer Institute.

**Results:** The EACCD algorithm effectively stratified patients into distinct prognostic groups, each with well-separated survival curves. The five-variable model achieved a concordance index (C-index) of 0.8293 (95% CI: 0.8245–0.8341), while the seven-variable model, including age and race, improved performance to 0.8504 (95% CI: 0.8461–0.8547). Both outperformed the AJCC TNM system, which had a C-index of 0.7676 (95% CI: 0.7622–0.7731).

**Conclusions:** EACCD provides a refined prognostic framework for primary localized prostate cancer, demonstrating superior accuracy over the AJCC staging system. With further validation in independent cohorts, EACCD could enhance risk stratification and support precision oncology.
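
The three EACCD steps named in the abstract (initial dissimilarities, ensemble learning, hierarchical clustering) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the abstract does not specify the dissimilarity measure, the partitioning method, or the linkage, so the sketch substitutes an absolute difference of a per-combination survival summary, random-medoid partitions, and average linkage. All function names and the `summary` values are hypothetical.

```python
import random
from itertools import combinations

def initial_dissimilarity(summary):
    """Step 1 (stand-in): absolute difference of a survival summary
    between every pair of variable combinations."""
    n = len(summary)
    return [[abs(summary[i] - summary[j]) for j in range(n)] for i in range(n)]

def learned_dissimilarity(d0, runs=200, seed=0):
    """Step 2 (stand-in): fraction of random partitions in which two
    combinations land in different clusters."""
    rng = random.Random(seed)
    n = len(d0)                      # requires n >= 3
    apart = [[0] * n for _ in range(n)]
    for _ in range(runs):
        k = rng.randint(2, n - 1)              # vary the number of clusters
        medoids = rng.sample(range(n), k)      # random medoids, no refinement
        label = [min(medoids, key=lambda m: d0[i][m]) for i in range(n)]
        for i, j in combinations(range(n), 2):
            if label[i] != label[j]:
                apart[i][j] += 1
                apart[j][i] += 1
    return [[apart[i][j] / runs for j in range(n)] for i in range(n)]

def hierarchical_groups(d, n_groups):
    """Step 3 (stand-in): average-linkage agglomerative clustering,
    merging the closest pair of clusters until n_groups remain."""
    clusters = [[i] for i in range(len(d))]

    def linkage(a, b):
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_groups:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]              # p < q, so index p stays valid
    return sorted(sorted(c) for c in clusters)

# Hypothetical survival summaries for six variable combinations
# (illustrative numbers only, not taken from the paper).
summary = [0.95, 0.93, 0.70, 0.68, 0.30, 0.28]
d0 = initial_dissimilarity(summary)
groups = hierarchical_groups(learned_dissimilarity(d0), n_groups=3)
print(groups)
```

With these well-separated toy values, the combinations with similar summaries (indices 0 and 1, 2 and 3, 4 and 5) end up in the same prognostic group. The point of the ensemble step is that the final clustering works on a dissimilarity averaged over many partitions rather than on a single, possibly unstable, one.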

## Linked entities

- **Diseases:** prostate cancer (MONDO:0005159)

## Full-text entities

- **Genes:** KLK3 (kallikrein related peptidase 3) [NCBI Gene 354] {aka APS, KLK2A1, PSA, hK3}
- **Diseases:** metastasis (MESH:D009362), Prostate Cancer (MESH:D011471), TNM (MESH:D008207), Cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12523577/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12523577/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12523577/full.md

---
Source: https://tomesphere.com/paper/PMC12523577