# Construction of a prediction model for axillary lymph node metastasis in stage cN0 hormone receptor-positive breast cancer: based on interpretable machine learning methods

**Authors:** Wenyan Liu, Zhijun Ma, Yufei Wang, Qishuai Chen, Liu Wang, Jiuqing Chi

PMC · DOI: 10.3389/fonc.2026.1763228 · 2026-02-03

## TL;DR

This paper develops and compares machine learning models that predict axillary lymph node metastasis in clinically node-negative (cN0), hormone receptor-positive breast cancer, using SHAP-based interpretation to support preoperative decision-making.

## Contribution

The novel contribution is an interpretable K-nearest neighbors (KNN) model, explained with SHAP, for predicting axillary lymph node metastasis (ALNM) in cN0 HR+ breast cancer (BC) patients.
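SHAP attributes a single prediction to the input features using Shapley values. The summary does not say which SHAP explainer the authors applied to the KNN model; model-agnostic explainers such as KernelSHAP estimate these values by sampling. Purely as an illustration, the sketch below computes exact Shapley values for a toy KNN risk score by enumerating every feature coalition, under the simplifying assumption that a feature "absent" from a coalition is set to one baseline value (here, the training mean) rather than averaged over a background dataset. The functions `knn_proba` and `shapley_values`, the choice k = 3, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import factorial


def knn_proba(train_X, train_y, x, k=3):
    """Share of positive labels among the k nearest training rows (Euclidean)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    return sum(label for _, label in ranked[:k]) / k


def shapley_values(train_X, train_y, x, baseline, k=3):
    """Exact Shapley values of each feature for knn_proba at point x.

    Assumption: features outside a coalition take their baseline value,
    so v(S) = knn_proba(z_S), where z_S uses x on S and baseline elsewhere.
    Cost grows as 2^n_features, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return knn_proba(train_X, train_y, z, k)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Made-up, standardized toy data: 3 features, binary outcome.
    X = [[0.1, 1.2, -0.3], [1.5, 0.2, 0.8], [-0.7, -1.1, 0.4],
         [1.1, 0.9, 1.3], [-1.2, 0.3, -0.9], [0.8, -0.4, 1.0]]
    y = [0, 1, 0, 1, 0, 1]
    baseline = [sum(col) / len(col) for col in zip(*X)]
    patient = [1.0, 0.5, 1.1]
    phi = shapley_values(X, y, patient, baseline)
    # Efficiency: the attributions sum to f(patient) - f(baseline).
    print([round(p, 3) for p in phi], sum(phi))
```

The attributions satisfy the efficiency property: they add up to the difference between the patient's predicted risk and the baseline's. Exact enumeration becomes infeasible as the feature count grows, which is why sampling-based explainers are used in practice.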

## Key findings

- The KNN model achieved an AUC of 0.898 in the internal test set and 0.774 in the external (2024 temporal) validation set.
- SHAP analysis identified parity as the most critical predictor of axillary lymph node metastasis.
- Within the 30%–65% threshold probability range, the KNN model provided the highest net clinical benefit on decision curve analysis.
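Decision curve analysis plots a model's net benefit across threshold probabilities `p_t` and compares it with the default strategies of treating every patient or none. With `n` patients, the standard net benefit is `NB(p_t) = TP/n - (FP/n) * p_t / (1 - p_t)`, where TP and FP count true and false positives among patients whose predicted risk is at least `p_t`. The pure-Python sketch below shows the computation; the function names and toy data are illustrative assumptions, not the authors' code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    treated = [y for y, p in zip(y_true, y_prob) if p >= threshold]
    tp = sum(1 for y in treated if y == 1)
    fp = len(treated) - tp
    # The harm of a false positive is weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """Rows of (threshold, model NB, treat-all NB, treat-none NB)."""
    return [
        (t, net_benefit(y_true, y_prob, t), net_benefit_treat_all(y_true, t), 0.0)
        for t in thresholds
    ]


if __name__ == "__main__":
    # Made-up outcomes (1 = node metastasis) and predicted risks.
    y = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
    p = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.35, 0.5, 0.8, 0.3]
    for t, nb_model, nb_all, nb_none in decision_curve(y, p, [0.3, 0.4, 0.5, 0.6]):
        print(f"p_t={t:.2f}  model={nb_model:.3f}  all={nb_all:.3f}  none={nb_none:.3f}")
```

A model adds value at a given threshold when its net benefit exceeds both reference strategies; the paper's claim is that the KNN curve was the highest across 30%–65%.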

## Abstract

Accurately predicting axillary lymph node metastasis (ALNM) preoperatively is crucial for optimizing management in patients with clinically node-negative (cN0) hormone receptor-positive (HR+) breast cancer (BC).

We retrospectively analyzed 816 cN0 HR+ BC patients treated from 2016 to 2024. Data from 2016–2023 (n=726) were randomly assigned to a training set (n=503) or an internal test set (n=223) at an approximately 7:3 ratio. Patients treated in the most recent year, 2024 (n=90), were reserved as a held-out temporal validation set. After feature selection by Recursive Feature Elimination (RFE), five machine learning models were developed: XGBoost, Random Forest, Logistic Regression, Support Vector Machine, and K-Nearest Neighbors (KNN). Performance was assessed by the area under the receiver operating characteristic curve (AUC) and by decision curve analysis (DCA). The best-performing model was interpreted using SHapley Additive exPlanations (SHAP).
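The cohort split combines a temporal hold-out with a random split of the earlier years. The sketch below assumes each patient record is a dict with a `year` field (a hypothetical schema) and uses a plain seeded shuffle; the summary does not say whether the random split was stratified by outcome. A plain 70/30 cut of 726 patients yields 508/218 rather than the reported 503/223, so the authors' exact procedure evidently differed in some detail not described here.

```python
import random


def temporal_then_random_split(records, holdout_year=2024, test_fraction=0.3, seed=42):
    """Hold out one calendar year, then randomly split the earlier years.

    Returns (train, test, temporal_validation). `records` is assumed to be
    a list of dicts carrying a "year" key (hypothetical schema).
    """
    temporal = [r for r in records if r["year"] == holdout_year]
    earlier = [r for r in records if r["year"] < holdout_year]
    shuffled = earlier[:]
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test], temporal


if __name__ == "__main__":
    # Synthetic cohort with the paper's sizes: 726 patients in 2016-2023, 90 in 2024.
    cohort = [{"id": i, "year": 2016 + i % 8} for i in range(726)]
    cohort += [{"id": 726 + i, "year": 2024} for i in range(90)]
    train, test, temporal = temporal_then_random_split(cohort)
    print(len(train), len(test), len(temporal))  # 508 218 90
```

Holding out the most recent year checks whether the model carries over to later patients, something a purely random split cannot show.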

The incidence of ALNM was 30.9%. The KNN model performed best, achieving an AUC of 0.898 (95% CI: 0.857–0.939) in the internal test set and 0.774 (95% CI: 0.655–0.892) in the external (2024 temporal) validation set. DCA indicated that the KNN model provided the highest net clinical benefit within the 30%–65% threshold probability range. SHAP analysis identified parity as the most important predictor, followed by age, tumor location, menopausal status, tumor diameter, lymphocyte count, platelet count, alpha-fetoprotein (AFP), neutrophil count, and carcinoembryonic antigen (CEA).
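The reported AUCs come with 95% confidence intervals. The summary does not state how the intervals were computed (DeLong's method and bootstrapping are both common), so the sketch below uses a percentile bootstrap as a swapped-in choice. It scores held-out patients with a plain k-nearest-neighbors risk (the share of positive neighbors) and computes the AUC as the probability that a randomly chosen positive case outranks a randomly chosen negative one. All function names, k = 3, and the data are hypothetical.

```python
import random


def knn_proba(train_X, train_y, x, k=3):
    """Share of positive labels among the k nearest training rows (Euclidean)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    return sum(label for _, label in ranked[:k]) / k


def auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up training and test data (2 standardized features).
    train_X = [[0.2, 1.0], [1.4, 0.3], [-0.8, -1.0], [1.0, 1.1], [-1.1, 0.2], [0.9, -0.5]]
    train_y = [0, 1, 0, 1, 0, 1]
    test_X = [[1.2, 0.8], [-0.9, -0.6], [0.7, 0.1], [-0.2, 0.9], [1.3, -0.2], [-1.0, 0.4]]
    test_y = [1, 0, 1, 0, 1, 0]
    scores = [knn_proba(train_X, train_y, x) for x in test_X]
    print(auc(test_y, scores), bootstrap_auc_ci(test_y, scores, n_boot=500))
```

Smaller evaluation sets give wider intervals, which matches the much wider CI reported for the 90-patient validation cohort than for the 223-patient test set.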

The KNN model, integrated with the SHAP interpretability framework, shows favorable performance, interpretability, and clinical applicability for predicting ALNM in cN0 HR+ BC, offering a valuable tool for preoperative risk assessment and individualized decision-making.

## Linked entities

- **Chemicals:** carcinoembryonic antigen (PubChem CID 10306739)
- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** ESR1 (estrogen receptor 1) [NCBI Gene 2099] {aka ER, ESR, ESRA, ESTRR, Era, NR3A1}, CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}, AFP (alpha fetoprotein) [NCBI Gene 174] {aka AFPD, FETA, HPAFP}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, PGR (progesterone receptor) [NCBI Gene 5241] {aka NR3C3, PR}, NR4A1 (nuclear receptor subfamily 4 group A member 1) [NCBI Gene 3164] {aka GFRP1, HMR, N10, NAK-1, NGFIB, NP10}, IL1B (interleukin 1 beta) [NCBI Gene 3553] {aka IL-1, IL1-BETA, IL1F2, IL1beta}, CSF3 (colony stimulating factor 3) [NCBI Gene 1440] {aka C17orf33, CSF3OS, GCSF}
- **Diseases:** ALND (MESH:D000072717), BC (MESH:D001943), ALNM (MESH:D008207), stage IV disease (MESH:D007676), Male breast cancer (MESH:D018567), HR (MESH:D002303), breast masses (MESH:D061325), Inflammatory breast cancer (MESH:D058922), hormone receptor-positive (MESH:D046150), axillary metastasis (MESH:D009362), T3 (MESH:C537047), Luminal A tumors (MESH:D009369), ductal carcinoma in situ (MESH:D002285), inflammation (MESH:D007249), node (MESH:D012804)
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]

## Figures

The complete paper includes 7 figures with captions: https://tomesphere.com/paper/PMC12909162/full.md

---
Source: https://tomesphere.com/paper/PMC12909162