# Comprehensive serum glycopeptide spectrum analysis with machine learning for non-invasive early detection of gastrointestinal cancers

**Authors:** Yuichi Hisamatsu, Kazuhiro Tanabe, Kensuke Kudo, Hirofumi Hasuda, Eiji Kusumoto, Hideo Uehara, Rintaro Yoshida, Mitsuhiko Ota, Yoshihisa Sakaguchi, Chihiro Hayashi, Mikio Mikami, Tetsuya Kusumoto

PMC · DOI: 10.1016/j.csbj.2025.10.067 · Computational and Structural Biotechnology Journal · 2025-11-01

## TL;DR

This study combines serum glycopeptide profiling with machine learning to detect colorectal, gastric, and esophageal cancers non-invasively, reporting neural-network AUC values of 0.966, 0.992, and 0.995, respectively.

## Contribution

The study introduces Comprehensive Serum Glycopeptide Spectrum Analysis (CSGSA), a method that integrates enriched glycopeptide profiles with conventional tumor markers in machine learning models for early cancer detection.

## Key findings

- Two glycopeptides, α1-antitrypsin at Asn271 and α2-macroglobulin at Asn70, were highly cancer-specific; integrating them with tumor markers and enriched glycopeptides improved diagnostic performance.
- Neural network models achieved AUC values of 0.966 for CRC, 0.992 for GC, and 0.995 for EC.
- CSGSA differentiated cancer types with high accuracy, even in early-stage disease, and its positive predictive values (54.5 %, 35.3 %, and 11.1 % for CRC, GC, and EC) exceeded non-invasive diagnostic benchmarks.
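
For readers less familiar with the AUC metric quoted in these findings, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It uses the rank interpretation of AUC: the probability that a randomly chosen case scores higher than a randomly chosen control. The function name `roc_auc` and all scores are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model outputs (NOT from the paper): higher means "more likely cancer".
cancer_scores = [0.9, 0.8, 0.35, 0.7]
control_scores = [0.1, 0.4, 0.2, 0.3]

auc = roc_auc(cancer_scores, control_scores)
print(f"AUC = {auc:.4f}")
```

In this toy example, 15 of the 16 case-control pairs are ordered correctly. The pairwise loop is quadratic in sample size, which is acceptable for illustration; a rank-based implementation would be preferable for cohorts of the size analyzed here.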

## Abstract

Gastrointestinal cancers, including colorectal cancer (CRC), gastric cancer (GC), and esophageal cancer (EC), are among the most common and lethal malignancies worldwide. Early detection is critical for improving patient outcomes, but current diagnostic methods, such as endoscopy, are burdensome, costly, and impractical for widespread screening. Here, we examine the potential of non-invasive blood-based diagnostics that integrate advanced glycan biomarkers with machine learning.

This study analyzed serum samples from 296 CRC, 180 GC, and 42 EC patients, alongside 590 healthy controls. Nine conventional tumor markers were quantified and 1688 enriched glycopeptides (EGPs) were identified via liquid chromatography-mass spectrometry. Using Comprehensive Serum Glycopeptide Spectrum Analysis (CSGSA), EGPs were integrated with conventional markers into machine learning models, including neural networks, to develop and validate diagnostic frameworks.

Two glycopeptides, α1-antitrypsin at Asn271 and α2-macroglobulin at Asn70, were identified as highly cancer-specific biomarkers. Integrating these glycopeptides, tumor markers, and EGPs significantly improved the diagnostic performance. The neural network-based model achieved area under the curve values of 0.966, 0.992, and 0.995 for CRC, GC, and EC, respectively, with respective positive predictive values of 54.5 %, 35.3 %, and 11.1 %, exceeding non-invasive diagnostic benchmarks. Remarkably, the CSGSA approach differentiated cancer types with high accuracy, even in early-stage disease.
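
The reported positive predictive values are much lower than the AUC values, and differ widely between cancer types. A general property of PPV may help in reading them: by Bayes' rule, PPV depends on disease prevalence as well as on sensitivity and specificity, so a rarer condition yields a lower PPV at the same test accuracy. The sketch below illustrates this relationship only. The sensitivity, specificity, and prevalence values are hypothetical assumptions; the abstract does not state the operating point or prevalence behind its PPVs.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule.

    PPV = TP / (TP + FP), with TP and FP expressed as population fractions.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point (NOT taken from the paper).
SENSITIVITY = 0.90
SPECIFICITY = 0.99

# Hypothetical prevalences, from a common to a rare condition.
for prevalence in (0.01, 0.005, 0.001):
    value = ppv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={value:.3f}")
```

With the test characteristics held fixed, the PPV in this sketch falls as prevalence falls, which is why PPV should be interpreted alongside the screening population it was computed for.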

CSGSA represents a breakthrough in non-invasive gastrointestinal cancer diagnostics, combining glycopeptide profiling with machine learning to achieve unprecedented accuracy. This method provides a cost-effective and scalable alternative to invasive procedures and may have potential utility in general health screening, which warrants further investigation.

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575), gastric cancer (MONDO:0001056), esophageal cancer (MONDO:0007576)

## Full-text entities

- **Genes:** A2M (alpha-2-macroglobulin) [NCBI Gene 2] {aka A2MD, CPAMD5, FWP007, S863-7}, SERPINA1 (serpin family A member 1) [NCBI Gene 5265] {aka A1A, A1AT, AAT, PI, PI1, PRO2275}
- **Diseases:** CRC (MESH:D015179), EC (MESH:D004938), GC (MESH:D013274), cancer (MESH:D009369), Gastrointestinal cancers (MESH:D005770)
- **Chemicals:** glycan (MESH:D011134)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12636384/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12636384/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/PMC12636384/full.md

---
Source: https://tomesphere.com/paper/PMC12636384