# Development of a Machine Learning Model for Predicting Treatment-Related Amenorrhea in Young Women with Breast Cancer

**Authors:** Long Song, Zobaida Edib, Uwe Aickelin, Hadi Akbarzadeh Khorshidi, Anne-Sophie Hamy, Yasmin Jayasinghe, Martha Hickey, Richard A. Anderson, Matteo Lambertini, Margherita Condorelli, Isabelle Demeestere, Michail Ignatiadis, Barbara Pistilli, H. Irene Su, Shanton Chang, Patrick Cheong-Iao Pang, Fabien Reyal, Scott M. Nelson, Paniti Sukumvanich, Alessandro Minisini, Fabio Puglisi, Kathryn J. Ruddy, Fergus J. Couch, Janet E. Olson, Kate Stern, Franca Agresta, Lesley Stafford, Laura Chin-Lenn, Wanda Cui, Antoinette Anazodo, Alexandra Gorelik, Tuong L. Nguyen, Ann Partridge, Christobel Saunders, Elizabeth Sullivan, Mary Macheras-Magias, Michelle Peate

PMC · DOI: 10.3390/bioengineering12111171 · 2025-10-28

## TL;DR

A Lasso-based machine learning model was developed to predict the risk of treatment-related amenorrhea at 12 months after starting chemotherapy in young women with breast cancer, with the aim of supporting fertility counseling and decision-making.

## Contribution

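The study contributes a model for personalised prediction of treatment-related amenorrhea, which the authors report as a substantial improvement over existing models, together with a framework for combining six retrospective datasets with notable missingness through KNN-based cross imputation.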

## Key findings

- The model achieved an internal validation AUC of 0.820 and external validation AUC of 0.743.
- Twenty variables were identified as significant predictors of amenorrhea risk.
- The model demonstrated high sensitivity (91.3% internally, 92.9% externally) at a cutoff of 0.20.
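
The metrics in the findings above (AUC, and sensitivity and precision at a probability cutoff) can be illustrated with a minimal sketch. The function names, the toy labels and probabilities, and the convention that a patient is flagged as high risk when the predicted probability is greater than or equal to the cutoff are all assumptions made for illustration; none of this is the authors' code or data.

```python
from typing import Sequence, Tuple


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive case is ranked above a
    random negative case; tied scores count as half a win."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_precision(
    y_true: Sequence[int], y_prob: Sequence[float], cutoff: float = 0.20
) -> Tuple[float, float]:
    """Sensitivity and precision when cases with y_prob >= cutoff are
    flagged as high risk (the >= convention is an assumption)."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        flagged = p >= cutoff
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision


# Toy data (hypothetical): 1 = amenorrhea at 12 months, 0 = menses regained.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.90, 0.60, 0.25, 0.30, 0.10, 0.05, 0.15, 0.22]

print(auc_score(y_true, y_prob))                    # 0.8125
print(sensitivity_precision(y_true, y_prob, 0.20))  # (0.75, 0.6)
print(sensitivity_precision(y_true, y_prob, 0.50))  # (0.5, 1.0)
```

On this toy data, lowering the cutoff from 0.50 to 0.20 raises sensitivity from 0.5 to 0.75 while precision falls from 1.0 to 0.6, which is the same trade-off the reported 0.20 cutoff makes in favour of fewer false negatives.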

## Abstract

Treatment-induced ovarian function loss is a significant concern for many young patients with breast cancer. Accurately predicting this risk is crucial for counselling young patients and informing their fertility-related decision-making. However, current risk prediction models for treatment-related ovarian function loss have limitations. To provide a broader representation of patient cohorts and improve feature selection, we combined retrospective data from six datasets within the FoRECAsT (Infertility after Cancer Predictor) databank, including 2679 pre-menopausal women diagnosed with breast cancer. This combined dataset presented notable missingness, prompting us to employ cross imputation using the k-nearest neighbours (KNN) machine learning (ML) algorithm. Employing Lasso regression, we developed an ML model to forecast the risk of treatment-related amenorrhea as a surrogate marker of ovarian function loss at 12 months after starting chemotherapy. Our model identified 20 variables significantly associated with risk of developing amenorrhea. Internal validation resulted in an area under the receiver operating characteristic curve (AUC) of 0.820 (95% CI: 0.817–0.823), while external validation with another dataset demonstrated an AUC of 0.743 (95% CI: 0.666–0.818). A cutoff of 0.20 was chosen to achieve higher sensitivity in validation, as false negatives—patients incorrectly classified as likely to regain menses—could miss timely opportunities for fertility preservation if desired. At this threshold, internal validation yielded sensitivity and precision rates of 91.3% and 61.7%, respectively, while external validation showed 92.9% and 60.0%. Leveraging ML methodologies, we not only devised a model for personalised risk prediction of amenorrhea, demonstrating substantial enhancements over existing models but also showcased a robust framework for maximally harnessing available data sources.
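
The workflow described in the abstract (KNN imputation of missing values, a Lasso-type model, AUC on held-out data, and a 0.20 risk cutoff) might look roughly like the following scikit-learn sketch. Several things here are assumptions rather than details from the paper: the data are synthetic, "Lasso regression" is stood in for by L1-penalised logistic regression, the paper's cross-imputation scheme across datasets is reduced to a single `KNNImputer`, and the values of `n_neighbors` and `C` are arbitrary.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 600 rows, 10 numeric features,
# of which only the first three influence the outcome.
n, d = 600, 10
X = rng.normal(size=(n, d))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.8 * X[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Remove roughly 20% of entries at random to mimic missingness.
X_missing = X.copy()
X_missing[rng.random(size=X.shape) < 0.2] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.3, random_state=0, stratify=y
)

# Impute, standardise, then fit an L1-penalised (Lasso-type) classifier.
# Keeping the imputer inside the pipeline means it is fitted on training
# rows only, so no information leaks from the held-out rows.
model = make_pipeline(
    KNNImputer(n_neighbors=5),
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(X_train, y_train)

risk = model.predict_proba(X_test)[:, 1]   # predicted probability of class 1
auc = roc_auc_score(y_test, risk)

# The L1 penalty shrinks some coefficients exactly to zero; the rest are
# the variables the model retains.
coef = model.named_steps["logisticregression"].coef_.ravel()
selected = np.flatnonzero(coef)

# Flag high-risk cases at the 0.20 cutoff (>= is an assumed convention).
high_risk = risk >= 0.20

print(f"held-out AUC: {auc:.3f}")
print(f"features retained: {selected.tolist()}")
print(f"flagged high risk: {int(high_risk.sum())} of {len(risk)}")
```

The L1 penalty is what makes the feature-selection step implicit: variables whose coefficients are driven to zero drop out of the model, which is one plausible route to a reduced predictor set such as the 20 variables the abstract reports.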

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** Breast Cancer (MESH:D001943), Cancer (MESH:D009369), ovarian function loss (MESH:D010051), Amenorrhea (MESH:D000568)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12649454/full.md

---
Source: https://tomesphere.com/paper/PMC12649454