# A machine learning model for predicting lymph node positivity in ovarian cancer: development, validation, and clinical application

**Authors:** QingYong Guo, Jinji Wang, Ru Chen, LiPing Hu, Wenqiang You

PMC · DOI: 10.3389/fonc.2025.1527674 · Frontiers in Oncology · 2025-07-02

## TL;DR

This paper introduces a machine learning model that predicts lymph node involvement in ovarian cancer patients, helping guide treatment decisions.

## Contribution

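The authors present what they describe as the first machine learning model built specifically to predict lymph node positivity in ovarian cancer. It was trained on 26,844 patients from the SEER database and externally validated on an independent cohort from Fujian Provincial Maternity and Children's Hospital.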

## Key findings

- The XGBoost model achieved an AUC of 0.98 in training and 0.847 in external validation for predicting lymph node positivity.
- Tumor size ≥5 cm, histological subtype, and chemotherapy were identified as key predictive features.
- A free online calculator was developed to help clinicians estimate lymph node positivity risk based on patient data.

## Abstract

Ovarian cancer (OC) remains a highly lethal gynecological malignancy, often diagnosed at advanced stages with a poor prognosis. Lymph node involvement is a critical prognostic factor and significantly influences treatment planning. However, accurately predicting lymph node positivity remains challenging due to the disease’s heterogeneity and the limitations of traditional models in handling high-dimensional and imbalanced data.

A retrospective analysis was conducted using the SEER database (2000–2021), including 26,844 OC patients with complete clinical information. We developed a machine learning model incorporating multiple algorithms, with XGBoost demonstrating superior performance. SMOTE was used to address class imbalance, and LASSO regression aided in selecting key predictors such as tumor size, histology, chemotherapy, and surgery. Model performance was assessed via accuracy, sensitivity, specificity, F1 score, and AUC, with external validation performed using an independent cohort from Fujian Provincial Maternity and Children’s Hospital.
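The class-imbalance step can be illustrated with a minimal sketch of the idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its nearest minority neighbours. The abstract does not say which implementation or settings the authors used, so the function name `smote_oversample`, the choice `k=3`, and the two-feature toy data below are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch only: not the authors' implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class rows with two numeric features (made-up values).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
synthetic = smote_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that the evaluation data keeps its original class balance.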

The XGBoost model achieved an AUC of 0.98 (95% CI: 0.975–0.986) in the training set and 0.847 (95% CI: 0.823–0.871) in external validation. The model demonstrated high sensitivity and robust performance in identifying lymph node-positive cases. Tumor size ≥5 cm, histological subtype, and chemotherapy were key predictive features, with SHAP analysis identifying tumor size as the most influential factor.
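To make the reported figures concrete, the sketch below shows one common way to compute an AUC and a 95% confidence interval around it. The AUC is calculated as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and the interval comes from a percentile bootstrap. The abstract does not state how the authors derived their intervals, so the bootstrap is an assumption, and the labels and scores are made-up toy values.

```python
import random


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample needs both classes for the AUC to be defined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = lymph node positive, 0 = negative; scores are hypothetical.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.65, 0.4, 0.7, 0.3, 0.2, 0.35, 0.1, 0.5]

point_estimate = auc(labels, scores)  # 21 of 24 pairs ranked correctly = 0.875
lower, upper = bootstrap_ci(labels, scores)
```

With only ten toy cases the interval is very wide. The narrow intervals in the abstract reflect the much larger training and validation cohorts.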

We present the first machine learning model specifically developed for predicting lymph node positivity in OC, validated across large, diverse cohorts. To facilitate clinical translation, we developed a free, user-friendly online calculator, which allows clinicians to quickly estimate lymph node positivity risk using patient-specific clinical parameters. This tool can be accessed at http://127.0.0.1:6818 and serves as a practical, evidence-based aid to support individualized treatment decisions and potentially improve patient outcomes. Future studies should integrate molecular data and broaden external validation to enhance generalizability.

## Linked entities

- **Diseases:** ovarian cancer (MONDO:0005140)

## Full-text entities

- **Diseases:** OC (MESH:D010051), Tumor (MESH:D009369), gynecological malignancy (MESH:D005833), Lymph node (MESH:D000072717)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12265300/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12265300/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/PMC12265300/full.md

---
Source: https://tomesphere.com/paper/PMC12265300