# Prediction of a Multi-Gene Assay (Oncotype DX and Mammaprint) Recurrence Risk Group Using Machine Learning in Estrogen Receptor-Positive, HER2-Negative Breast Cancer—The BRAIN Study

**Authors:** Jung-Hwan Ji, Sung Gwe Ahn, Youngbum Yoo, Shin-Young Park, Joo-Heung Kim, Ji-Yeong Jeong, Seho Park, Ilkyun Lee

PMC · DOI: 10.3390/cancers16040774 · Cancers · 2024-02-13

## TL;DR

This study develops a machine learning model to predict the results of expensive breast cancer gene tests, offering a cost-effective alternative.

## Contribution

A machine learning-based model is developed to predict multi-gene assay outcomes with high accuracy and accessibility.

## Key findings

- The model achieved 84.8% accuracy for MammaPrint and 87.9% for Oncotype DX.
- The ensemble model reached 86.8% accuracy and 95.7% in a specific subgroup.
- The model could serve as a cost-effective alternative to expensive multi-gene assays.

## Abstract

Multi-gene assays (MGAs), such as Oncotype DX and Mammaprint, are used to provide predictive and prognostic values in treatment of ER+HER2− breast cancer. However, their accessibility is restricted due to their high cost in some countries. For this reason, many studies have been conducted to develop the tests that can replace the multi-gene assays, but practicality is still insufficient. The aim of our study is to develop a highly accessible machine learning-based model for predicting the result of MGA. Our accurate and affordable machine learning-based predictive model may serve as a cost-effective alternative to the expensive multi-gene assays.

This study aimed to develop a machine learning-based prediction model for predicting multi-gene assay (MGA) risk categories. Patients with estrogen receptor-positive (ER+)/HER2− breast cancer who had undergone Oncotype DX (ODX) or MammaPrint (MMP) were used to develop the prediction model. The development cohort consisted of a total of 2565 patients including 2039 patients tested with ODX and 526 patients tested with MMP. The MMP risk prediction model utilized a single XGBoost model, and the ODX risk prediction model utilized combined LightGBM, CatBoost, and XGBoost models through soft voting. Additionally, the ensemble (MMP + ODX) model combining MMP and ODX utilized CatBoost and XGBoost through soft voting. Ten random samples, corresponding to 10% of the modeling dataset, were extracted, and cross-validation was performed to evaluate the accuracy on each validation set. The accuracy of our predictive models was 84.8% for MMP, 87.9% for ODX, and 86.8% for the ensemble model. In the ensemble cohort, the sensitivity, specificity, and precision for predicting the low-risk category were 0.91, 0.66, and 0.92, respectively. The prediction accuracy exceeded 90% in several subgroups, with the highest prediction accuracy of 95.7% in the subgroup that met Ki-67 <20 and HG 1~2 and premenopausal status. Our machine learning-based predictive model has the potential to complement existing MGAs in ER+/HER2− breast cancer.

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, EREG (epiregulin) [NCBI Gene 2069] {aka EPR, ER, Ep}, ESR1 (estrogen receptor 1) [NCBI Gene 2099] {aka ER, ESR, ESRA, ESTRR, Era, NR3A1}
- **Diseases:** Breast Cancer (MESH:D001943)
- **Chemicals:** MGAs (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10887075/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10887075/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC10887075/full.md

---
Source: https://tomesphere.com/paper/PMC10887075