# Development and validation of a pathomics-driven machine learning model for individualized prediction of neoadjuvant chemotherapy response and early recurrence in HR-positive, HER2-negative breast cancer

**Authors:** Jiaxian Yue, Jiaxiang Liu, Xiyu Kang, Pei Yuan, Wei Wang, Zhanyu Wang, Chao Shang, Qingyao Shang, Guangyu Li, Xubin Dong, Tianxiao Wang, Dongmin Yang, Shuhao Wang, Chenxuan Yang, Jianming Ying, Xin Wang

PMC · DOI: 10.3389/fonc.2026.1770037 · Frontiers in Oncology · 2026-02-23

## TL;DR

This study develops a machine learning model that combines clinical variables with pathomic features extracted from digitized H&E biopsy slides to predict neoadjuvant chemotherapy response and 1-year recurrence in HR-positive, HER2-negative breast cancer.

## Contribution

The novel contribution is a pathomics-driven model combining clinical and AI-extracted features for individualized prediction in HR-positive, HER2-negative breast cancer.

## Key findings

- The CatBoost model combining clinical and pathomic variables predicted chemotherapy response with AUC = 0.900 in training and 0.848 in validation.
- Recurrence models built mainly on pathomic features predicted 1-year recurrence with AUC = 0.907 in training and 0.769 in validation.
- Key factors included Ki-67, age, histological grade, PR status, and AI-extracted pathomic features.

## Abstract

Hormone receptor (HR)-positive, human epidermal growth factor receptor 2 (HER2)-negative breast cancer is the most prevalent subtype among women but has a modest response to neoadjuvant chemotherapy (NAC). Accurately predicting NAC efficacy and recurrence risk remains challenging, as conventional clinical and molecular markers have limited predictive power. Advances in digital pathology and artificial intelligence now enable quantitative pathomics analysis, offering new opportunities for precise prediction and prognostic assessment.

In this retrospective study, 162 HR-positive, HER2-negative breast cancer patients treated with NAC between 2014 and 2021 were included. Hematoxylin and eosin (H&E)-stained pretreatment biopsy slides were digitized and analyzed using Vision Transformer (ViT) and Unified Network for Image (UNI) deep learning models to extract pathomic features. Thirteen clinical variables were collected. After least absolute shrinkage and selection operator (LASSO)-based feature selection, multiple machine learning models were developed for both response prediction and prognostic evaluation of recurrence, with performance evaluated by receiver operating characteristic (ROC) curves, area under the curve (AUC), sensitivity, specificity, confusion matrix, calibration curves, and decision curve analysis (DCA). Furthermore, SHapley Additive exPlanations (SHAP) was used to rank the importance of features for each model.
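The evaluation metrics named above can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the function names, the toy labels, and the toy predicted probabilities are all assumptions made up for illustration. It shows AUC computed as a pairwise ranking probability, sensitivity and specificity from confusion-matrix counts, and the net benefit that a decision curve plots at one threshold.

```python
from typing import List, Tuple


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_counts(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(labels: List[int], scores: List[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Assumes both classes are present in `labels`."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(labels: List[int], scores: List[float],
                threshold: float) -> float:
    """Decision-curve net benefit at threshold probability p_t:
    TP/n - (FP/n) * p_t / (1 - p_t). Requires 0 < p_t < 1."""
    tp, fp, _, _ = confusion_counts(labels, scores, threshold)
    n = len(labels)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data, invented for illustration only (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

auc = roc_auc(y_true, y_prob)
sens, spec = sensitivity_specificity(y_true, y_prob, threshold=0.5)
nb = net_benefit(y_true, y_prob, threshold=0.5)
print(f"AUC={auc:.4f} sensitivity={sens:.2f} specificity={spec:.2f} net benefit={nb:.2f}")
```

Repeating `net_benefit` over a grid of thresholds and plotting the result against the "treat all" and "treat none" strategies gives a decision curve. Calibration curves and SHAP rankings need the fitted model and are outside the scope of this sketch.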

The CatBoost model achieved the best predictive performance (AUC = 0.900 in training and 0.848 in validation) when a combination of clinical and pathomics-derived variables was used. Key predictive factors included Ki-67 expression, age, histological grade, progesterone receptor (PR) status, and prominent pathomic features. Kaplan–Meier analysis showed no significant difference in recurrence or survival outcomes in this cohort when patients were stratified by either Miller-Payne (MP) grade or pathological complete response (pCR) status. Furthermore, the recurrence models developed mainly using pathomics predicted 1-year recurrence with AUC = 0.907 in training and 0.769 in validation.
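For readers unfamiliar with the Kaplan–Meier analysis mentioned above, the following is a minimal sketch of the product-limit estimator. The function name and the follow-up times and event flags are hypothetical values invented for illustration, not data from the study. Comparing two strata, as the authors did for MP grade and pCR status, would additionally require a test such as the log-rank test, which is not shown.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of recurrence-free survival.

    times[i]  : follow-up time for patient i (e.g. months)
    events[i] : 1 if recurrence was observed at times[i], 0 if censored
    Returns [(t, S(t))] at each distinct time where an event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Patients censored at t still count as at risk at t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up data (months), invented for illustration only.
times = [3, 5, 5, 8, 12, 12]
events = [1, 0, 1, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2} months  S(t)={s:.4f}")
```

The estimate only changes at observed event times; censored patients reduce the number at risk for later times without lowering the curve themselves.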

Integrating pathomic features with clinical variables via machine learning enables robust pretreatment prediction of NAC efficacy and short-term recurrence in HR-positive, HER2-negative breast cancer. This approach has the potential to offer a clinically practical tool to optimize individualized therapy and improve patient management, highlighting the translational value of AI-powered digital pathology in breast cancer care.

## Linked entities

- **Proteins:** MKI67 (antigen identified by monoclonal antibody Ki 67), PGR (progesterone receptor)
- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** ESR1 (estrogen receptor 1) [NCBI Gene 2099] {aka ER, ESR, ESRA, ESTRR, Era, NR3A1}, PGR (progesterone receptor) [NCBI Gene 5241] {aka NR3C3, PR}, NR4A1 (nuclear receptor subfamily 4 group A member 1) [NCBI Gene 3164] {aka GFRP1, HMR, N10, NAK-1, NGFIB, NP10}, EREG (epiregulin) [NCBI Gene 2069] {aka EPR, ER, Ep}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}
- **Diseases:** MP (MESH:C537680), toxicity (MESH:D064420), deaths (MESH:D003643), N stage III (MESH:D062706), TNBC (MESH:D064726), Breast cancer (MESH:D001943), pCR (MESH:D005598), cancer (MESH:D009369)
- **Chemicals:** H&E (-), H&E (MESH:D006371), TG (MESH:D013866), Hematoxylin (MESH:D006416), eosin (MESH:D004801), glucose (MESH:D005947)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12967959/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12967959/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12967959/full.md

---
Source: https://tomesphere.com/paper/PMC12967959