# Validation of machine learning-based models to predict and explain the risk of ovarian cancer: a multicentric study on BRCA-mutated patients undergoing risk-reducing salpingo-oophorectomy

**Authors:** Vera Loizzi, Maria Colomba Comes, Francesca Arezzo, Adriana Ionelia Apostol, Samantha Bove, Annarita Fanizzi, Robert Fruscio, Vanesa Gregorc, Francesco Legge, Rosanna Mancari, Claudia Marchetti, Serena Negri, Giorgia Russo, Laura Vertechy, Giovanni Scambia, Raffaella Massafra, Gennaro Cormio

PMC · DOI: 10.3389/fonc.2025.1574037 · Frontiers in Oncology · 2025-04-15

## TL;DR

This study developed machine learning models to predict ovarian cancer risk in BRCA-mutated patients, finding that CA125, age, and time to surgery are key risk factors.

## Contribution

The study introduces explainable machine learning models validated across multiple centers to assess ovarian cancer risk in BRCA-mutated patients.

## Key findings

- The best model achieved an AUC of 79.3% and identified CA125, age, and MatoRRSO as significant risk factors.
- Estroprogestinuse and PregnancyNfdt were identified as protective factors against ovarian cancer.
- The model's sensitivity was limited, reducing its effectiveness in high-risk populations.

## Abstract

BRCA-mutated women are advised to undergo bilateral risk-reducing salpingo-oophorectomy (RRSO) after childbearing, because no effective method exists for the early detection of ovarian cancer. Predictive machine learning (ML) techniques could therefore help clinicians identify high-risk BRCA-mutated patients and determine the appropriate timing for RRSO.

In this work, we addressed this task by developing explainable ML models using clinical data from a multicentric cohort of 694 BRCA-mutated patients who underwent salpingo-oophorectomy at six Italian centers (Policlinico Gemelli, IRCCS San Gerardo, Policlinico Bari, Istituto Tumori Regina Elena, Istituto Tumori Giovanni Paolo II, Ospedale F. Miulli). Of these patients, 39 (5.6%) were found to have a tumor. Data from Istituto Tumori Regina Elena and Policlinico Bari were used as the External Validation Cohort (EVC); data from the remaining centers formed the Investigational Cohort (IC). Resampling and ensemble techniques were implemented to handle dataset imbalance. Explainability techniques enabled us to identify which variables the models treated as protective factors and which as risk factors.
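The abstract does not say which resampling method was used. As a minimal sketch of the general idea, the snippet below applies plain random oversampling of the minority class, which is an assumption chosen for illustration and not the authors' stated method. The function name `random_oversample` and the toy data are hypothetical; the class counts (39 tumor, 655 non-tumor) are the whole-cohort figures reported above, not the IC counts, which the abstract does not give.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)

    # Group rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap; the majority class gets no extras.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy illustration: 39 positives and 655 negatives (whole-cohort counts).
rows = list(range(694))          # placeholder patient indices, not real data
labels = [1] * 39 + [0] * 655
balanced_rows, balanced_labels = random_oversample(rows, labels)

print(balanced_labels.count(1), balanced_labels.count(0))
```

In practice, resampling of this kind is applied only to training data, so that validation cohorts keep their natural class balance.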

On the EVC, the best ML model achieved an AUC of 79.3% (95% CI: 75.3% - 83.0%), an accuracy of 73.8% (95% CI: 69.6% - 78.2%), a sensitivity of 66.7% (95% CI: 58.1% - 75.3%), a specificity of 74.3% (95% CI: 68.7% - 80.0%), and a G-mean of 70.4% (95% CI: 63.0% - 76.0%). Although the model demonstrated good overall performance, its limited sensitivity reduces its effectiveness in this high-risk population. The variables CA125, age and MatoRRSO were found to be the most significant risk factors, in agreement with the clinical perspective. Conversely, variables such as Estroprogestinuse and PregnancyNfdt acted as protective factors.
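The G-mean is conventionally defined as the geometric mean of sensitivity and specificity, which makes it less forgiving than accuracy when one class is rare. The short sketch below, assuming that standard definition, shows that the reported sensitivity and specificity point estimates are consistent with the reported G-mean. The function name `g_mean` is illustrative.

```python
import math


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (both as fractions in [0, 1])."""
    return math.sqrt(sensitivity * specificity)


# Point estimates reported for the External Validation Cohort.
sensitivity = 0.667
specificity = 0.743

evc_g_mean = g_mean(sensitivity, specificity)
print(f"G-mean: {evc_g_mean:.3f}")  # approximately 0.704, matching the reported 70.4%
```

Because the geometric mean drops toward zero whenever either term is low, a model that ignored the 5.6% minority class would score a G-mean near zero despite high accuracy.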

Our ML proposal explores the intricate relationships between multiple clinical variables, with a particular emphasis on understanding their non-linear associations. However, while our approach provides valuable insights into risk assessment for BRCA-mutated patients, its current predictive capacity does not significantly improve upon existing clinical models.

## Linked entities

- **Genes:** Brca2 (BRCA2, DNA repair associated) [NCBI Gene 37916]
- **Diseases:** ovarian cancer (MONDO:0005140)

## Full-text entities

- **Genes:** BRCA1 (BRCA1 DNA repair associated) [NCBI Gene 672] {aka BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4}, MUC16 (mucin 16, cell surface associated) [NCBI Gene 94025] {aka CA125}
- **Diseases:** tumor (MESH:D009369), ovarian cancer (MESH:D010051)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12037974/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12037974/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC12037974/full.md

---
Source: https://tomesphere.com/paper/PMC12037974