# Leveraging cfDNA fragmentomic features for the early detection of colorectal cancer

**Authors:** Lina Shan, Dengyong Xu, Jie Chen, Wenjia Liu, Ji Lin, Juhang Bao, Jianfei Huang, Hanqing Zhang, Hanchen Zhao, Wei Xue, Ziao Lin, Bingjun Bai

PMC · DOI: 10.3389/fimmu.2026.1705156 · 2026-01-28

## TL;DR

This study applies machine learning to cell-free DNA (cfDNA) fragmentation patterns to detect colorectal cancer early. It reports AUCs of 0.959 to 0.979 and identifies cfDNA features that differ between benign and malignant samples.

## Contribution

The study develops a machine learning classifier, built with generalized linear modeling, that uses cfDNA fragmentomic features for early CRC detection.

## Key findings

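- The generalized linear model (GLM) achieved AUCs of 0.959 to 0.979 across the training, internal validation, and external validation cohorts.
- Malignant samples showed distinct end motif profiles, with enrichment of specific 4-mer end motifs, while benign samples had elevated frequencies of Alu and long terminal repeat (LTR) elements.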
- The model demonstrated strong classification accuracy for advanced-stage colorectal cancer.

## Abstract

Early detection of colorectal cancer (CRC) is crucial for improving patient outcomes. Cell-free DNA (cfDNA) analysis has emerged as a promising non-invasive approach for cancer detection. This study aims to develop a machine learning algorithm leveraging cfDNA fragmentomic features to accurately detect CRC.

A total of 573 individuals were enrolled from Sir Run Run Shaw Hospital, two community healthcare centers, and three additional medical centers between April 1, 2023, and December 12, 2025. Participants were divided into training, internal validation, and external validation cohorts. A range of cfDNA fragmentomic features was extracted and used as inputs to machine learning models. The models were evaluated with 10-fold cross-validation and assessed for accuracy, sensitivity, specificity, and area under the ROC curve (AUC). We also performed a differential analysis of key genomic features, such as Alu elements and long terminal repeats (LTRs), between benign and malignant colorectal samples.
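The abstract names 10-fold cross-validation and four metrics but gives no code. The following is a minimal standard-library sketch of how those metrics can be computed, not the authors' pipeline. It assumes binary labels (1 = malignant, 0 = benign) and one malignancy score per sample. Stratifying the folds by class and the 0.5 decision threshold are assumptions, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds while keeping class proportions similar in each fold."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin so every fold gets a similar share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def confusion_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float = 0.5) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity); score >= threshold is called malignant."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # recall on malignant samples
    specificity = tn / (tn + fp)  # recall on benign samples
    return accuracy, sensitivity, specificity


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC = P(random malignant score > random benign score), ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In a 10-fold loop, each fold would be scored by a model fit on the other nine folds, and the pooled out-of-fold scores passed to `roc_auc` and `confusion_metrics`. The pairwise AUC is quadratic in sample count, which is acceptable at a cohort size of a few hundred.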

Using generalized linear modeling (GLM), the algorithm showed robust discriminative performance across all datasets, with AUCs of 0.959 (training set), 0.979 (internal validation cohort), and 0.959 (external validation cohort). Classification accuracy was particularly high for advanced-stage CRC. Comparative cfDNA profiling revealed distinct molecular signatures between benign and malignant samples. Benign samples showed elevated frequencies of Alu elements and LTRs, whereas malignant samples showed distinct end motif profiles marked by significant enrichment of specific 4-mer end motifs. These molecular features may therefore serve as candidate biomarkers for malignancy detection.
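An end motif is the short sequence at a cfDNA fragment's end. The sketch below, a minimal illustration rather than the study's method, turns a set of fragments into a 256-dimensional 4-mer end motif frequency profile that a classifier such as a GLM could take as input. It assumes each input string is the fragment sequence read 5' to 3' from the end being scored. For paired-end data, both reads' 5' ends would be supplied as separate strings. The function name is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable

ALPHABET = "ACGT"


def end_motif_frequencies(fragment_seqs: Iterable[str], k: int = 4) -> Dict[str, float]:
    """Fraction of fragment ends that start with each of the 4**k possible k-mers."""
    counts: Counter = Counter()
    for seq in fragment_seqs:
        motif = seq[:k].upper()
        # Skip ends shorter than k or containing N or other non-ACGT characters.
        if len(motif) == k and set(motif) <= set(ALPHABET):
            counts[motif] += 1
    total = sum(counts.values())
    motifs = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Unobserved motifs get 0.0 so every sample yields a vector of the same length.
    return {m: (counts[m] / total if total else 0.0) for m in motifs}
```

For example, the profile for `["CCCAGT", "CCCAAA", "ACGTNN", "NNNNAC"]` assigns 2/3 to `CCCA` and 1/3 to `ACGT`, and the `NNNN` end is ignored. Per-motif differences between benign and malignant groups could then be tested motif by motif.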

This study demonstrates that cfDNA fragmentomic profiling, particularly differential patterns of Alu and LTR elements, effectively discriminates benign from malignant colorectal lesions. These findings validate the clinical utility of repetitive element analysis and provide a foundation for developing improved non-invasive CRC diagnostics through machine learning approaches incorporating genomic features.
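The abstract does not define how Alu and LTR "frequencies" were measured. One common reading is the fraction of cfDNA fragments overlapping an annotated element of that repeat family. The sketch below assumes that definition and 0-based half-open `(chrom, start, end)` coordinates, with repeat intervals taken from a RepeatMasker-style annotation. `RepeatIndex` and `repeat_fraction` are hypothetical names.

```python
import bisect
from collections import defaultdict
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


class RepeatIndex:
    """Per-chromosome index of one repeat family's intervals (e.g. all Alu elements)."""

    def __init__(self, intervals: Iterable[Interval]):
        by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for chrom, start, end in intervals:
            by_chrom[chrom].append((start, end))
        self.starts: Dict[str, List[int]] = {}
        self.max_ends: Dict[str, List[int]] = {}
        for chrom, ivs in by_chrom.items():
            ivs.sort()
            self.starts[chrom] = [s for s, _ in ivs]
            # max_ends[j] = largest end among the first j + 1 intervals sorted by start.
            self.max_ends[chrom] = list(accumulate((e for _, e in ivs), max))

    def overlaps(self, chrom: str, start: int, end: int) -> bool:
        starts = self.starts.get(chrom)
        if not starts:
            return False
        # Intervals with start < fragment end are candidates; one overlaps
        # exactly when the largest end among them exceeds the fragment start.
        j = bisect.bisect_left(starts, end)
        return j > 0 and self.max_ends[chrom][j - 1] > start


def repeat_fraction(fragments: Iterable[Interval], index: RepeatIndex) -> float:
    """Fraction of fragments overlapping at least one interval in the index."""
    frags = list(fragments)
    if not frags:
        return 0.0
    hits = sum(1 for chrom, s, e in frags if index.overlaps(chrom, s, e))
    return hits / len(frags)
```

Building one index per family (Alu, LTR) and computing `repeat_fraction` per sample gives a per-sample value that could be compared between benign and malignant groups. The prefix-maximum array keeps each query logarithmic even when repeat annotations are nested.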

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575), CRC (MONDO:0005575)

## Full-text entities

- **Diseases:** CRC (MESH:D015179), cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12891135/full.md

---
Source: https://tomesphere.com/paper/PMC12891135