# Multidimensional cell-free DNA fragmentomics enables early detection of breast cancer

**Authors:** Lixian Yang, Mengyang An, Heng Song, Xuan Zhang, Meiqi Wang, Liu Yang, Xinle Wang, Hua Yang, Xinyue Hong, Zhenchuan Song

PMC · DOI: 10.1186/s13058-025-02190-8 · Breast Cancer Research : BCR · 2025-12-09

## TL;DR

This study shows that a machine learning ensemble trained on cell-free DNA fragmentomic features from whole-genome sequencing detects breast cancer with 96.5% sensitivity at 93.7% specificity in a validation cohort.

## Contribution

A stacked ensemble model combining cfDNA fragmentomic features and multiple machine learning algorithms achieves high sensitivity for early breast cancer detection.
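The stacking idea can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic (`make_classification`), the three feature blocks (`feat_a`, `feat_b`, `feat_c`) are hypothetical stand-ins for the three cfDNA feature types, and only two base algorithms are used instead of the paper's six, which this summary does not name. The logistic-regression meta-learner is likewise an assumption. The train/validation sizes (231 and 164) mirror the cohort sizes given in the abstract.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a samples-by-features matrix (NOT real cfDNA data).
X, y = make_classification(
    n_samples=395, n_features=30, n_informative=10, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=164, stratify=y, random_state=0
)

# Hypothetical feature blocks: column ranges standing in for three feature types.
blocks = {
    "feat_a": list(range(0, 10)),
    "feat_b": list(range(10, 20)),
    "feat_c": list(range(20, 30)),
}

# Two illustrative algorithms; factories so each pair gets a fresh estimator.
algos = {
    "logreg": lambda: LogisticRegression(max_iter=1000),
    "rf": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
}

# One base learner per feature-algorithm pair; each sees only its own block.
base_learners = [
    (
        f"{block}_{algo}",
        make_pipeline(
            ColumnTransformer([("select", "passthrough", cols)]),
            make_algo(),
        ),
    )
    for block, cols in blocks.items()
    for algo, make_algo in algos.items()
]

# The meta-learner is fit on out-of-fold probabilities from the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

val_scores = stack.predict_proba(X_val)[:, 1]
val_auc = roc_auc_score(y_val, val_scores)
print(f"validation AUC on synthetic data: {val_auc:.3f}")
```

Using cross-validated (out-of-fold) predictions to train the meta-learner keeps it from simply trusting whichever base learner overfits the training data most. The AUC printed here describes the synthetic data only and says nothing about the paper's results.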

## Key findings

- The ensemble model achieved 93.3% sensitivity at 94.6% specificity in the training cohort.
- The model demonstrated 96.5% sensitivity at 93.7% specificity in the validation cohort.
- The model performed well across different cancer stages, types, and molecular classifications.
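As a consistency check on the figures above, the sketch below back-calculates the number of correctly classified participants from the reported percentages and the cohort sizes given in the abstract (training: 119 cancer, 112 healthy; validation: 85 cancer, 79 healthy). The counts it produces are inferred by rounding and are not reported in this summary; the `Cohort` class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    name: str
    n_cancer: int           # cohort size from the abstract
    n_healthy: int          # cohort size from the abstract
    sensitivity_pct: float  # reported sensitivity
    specificity_pct: float  # reported specificity


def implied_counts(c: Cohort) -> tuple[int, int]:
    """Nearest whole numbers of true positives and true negatives (inferred)."""
    tp = round(c.n_cancer * c.sensitivity_pct / 100)
    tn = round(c.n_healthy * c.specificity_pct / 100)
    return tp, tn


def recomputed_pct(c: Cohort) -> tuple[float, float]:
    """Sensitivity = TP / cancer cases; specificity = TN / healthy controls."""
    tp, tn = implied_counts(c)
    return round(100 * tp / c.n_cancer, 1), round(100 * tn / c.n_healthy, 1)


training = Cohort("training", 119, 112, 93.3, 94.6)
validation = Cohort("validation", 85, 79, 96.5, 93.7)

for cohort in (training, validation):
    tp, tn = implied_counts(cohort)
    sens, spec = recomputed_pct(cohort)
    print(
        f"{cohort.name}: TP {tp}/{cohort.n_cancer} -> {sens}%, "
        f"TN {tn}/{cohort.n_healthy} -> {spec}%"
    )
```

Under this rounding assumption, the reported percentages correspond to 111 of 119 cancers and 106 of 112 controls in training, and 82 of 85 cancers and 74 of 79 controls in validation, so the stated sensitivities and specificities are internally consistent with the cohort sizes.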

## Abstract

Cell-free DNA (cfDNA) fragmentomics represents a transformative approach for early breast cancer detection, offering significant potential to improve patient survival through timely intervention. Despite this promise, existing cfDNA-based methods demonstrate inadequate sensitivity for clinical implementation, particularly in early-stage malignancies. There remains an urgent need to develop robust, cost-effective diagnostic strategies integrating cfDNA fragmentomic profiling with advanced machine learning algorithms.

This study enrolled 204 participants diagnosed with breast cancer and 191 participants without cancer. Plasma cfDNA samples from all participants were profiled by whole-genome sequencing. Multiple cfDNA features and machine learning models were evaluated in the training cohort to select the best-performing model, whose performance was then assessed in a separate validation cohort.

A stacked ensemble model combining three cfDNA features with six machine learning algorithms, developed in the training cohort (cancer: 119; healthy: 112), outperformed all models built from individual feature-algorithm pairs. The ensemble achieved a sensitivity of 93.3% at 94.6% specificity in the training cohort (area under the curve [AUC], 0.983) and 96.5% at 93.7% specificity in the validation cohort (cancer: 85; healthy: 79; AUC, 0.989). The model also remained sensitive across cancer stages, pathological types, and molecular classifications.

We have established a stacked ensemble model using cfDNA fragmentomics features and achieved superior sensitivity for detecting early-stage breast cancer, which could promote early diagnosis and benefit more patients.

The online version contains supplementary material available at 10.1186/s13058-025-02190-8.

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** cancer (MESH:D009369), breast cancer (MESH:D001943)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12801790/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12801790/full.md

## References

5 references — full list in the complete paper: https://tomesphere.com/paper/PMC12801790/full.md

---
Source: https://tomesphere.com/paper/PMC12801790