# Identification and predictive machine learning model construction of gut microbiota associated with carcinoembryonic antigens in colorectal cancer

**Authors:** Yongzhi Wu, Zigui Huang, Yongqi Huang, Chuanbin Chen, Mingjian Qin, Zhen Wang, Fuhai He, Shenghai Liu, Rumao Zhong, Jun Liu, Chenyan Long, Jungang Liu, Xiaoliang Huang

PMC · DOI: 10.1128/msphere.00454-25 · mSphere · 2025-09-17

## TL;DR

This study identifies gut bacteria linked to high CEA levels in colorectal cancer and uses machine learning to predict CEA levels noninvasively.

## Contribution

The study identifies R. callidus as a gut microbiota species newly associated with high CEA levels in CRC and develops ML models that predict CEA levels from gut microbiota features.

## Key findings

- Ruminococcus callidus is significantly enriched in high-CEA colorectal cancer patients.
- High-CEA patients show elevated resting memory CD4+ T cells, while low-CEA patients show increased T follicular helper cells.
- Random forest (RF) and XGBoost models built on the 30 differential gut microbial features reached training-set AUCs of 0.969 and 0.815, respectively, falling to 0.715 and 0.639 on the test set.
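The AUC values reported for the RF and XGBoost models measure how well a model's scores rank H-CEA patients above L-CEA patients: 0.5 is chance-level ranking and 1.0 is perfect separation. As a minimal sketch (not the authors' pipeline), AUC can be computed as the fraction of (H-CEA, L-CEA) patient pairs in which the H-CEA patient receives the higher predicted score, with ties counted as half. This equals the Mann-Whitney U statistic divided by n₁n₂. The function name `roc_auc` and the example scores are hypothetical, not study data.

```python
from itertools import product


def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive (e.g. H-CEA) sample
    outscores a random negative (e.g. L-CEA) sample; ties count as 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one sample in each class")
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of "high CEA" for six patients.
h_cea_scores = [0.9, 0.8, 0.4]
l_cea_scores = [0.7, 0.3, 0.2]
auc = roc_auc(h_cea_scores, l_cea_scores)
```

The pairwise loop costs O(n₁n₂), which is fine for cohorts of a few hundred patients. A large gap between training and test AUC, as reported here, is a common sign of overfitting, so the test-set values are the better guide to performance on new patients.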

## Abstract

Carcinoembryonic antigen (CEA) is a critical colorectal cancer (CRC) biomarker, but its mechanistic link to gut microbiota remains unclear. This study characterized gut microbiota differences between high-CEA (H-CEA) and low-CEA (L-CEA) CRC patients and explored their associations with host immunity and tumor progression mechanisms. Stool samples from 187 CRC patients were subjected to 16S rRNA sequencing, and LEfSe analysis identified 30 differentially abundant bacteria. Ruminococcus callidus was significantly enriched in H-CEA patients. Transcriptome sequencing of tumor tissues from 25 patients revealed distinct immune micro-environments: H-CEA patients showed elevated resting memory CD4+ T cells, while L-CEA patients showed increased T follicular helper cells. Functional enrichment analysis identified differential GO terms (26 in L-CEA; 31 in H-CEA) and KEGG pathways (three in H-CEA). R. callidus correlated positively with mast cell infiltration and the CXCL1 chemokine, and with upregulation of long-chain fatty acid metabolism. For predicting high versus low CEA levels, the random forest (RF) and XGBoost models constructed from the differential gut microbiota achieved training-set area under the curve (AUC) values of 0.969 and 0.815, respectively, and test-set AUCs of 0.715 and 0.639. These findings demonstrate that CEA-level-specific gut microbiota dysbiosis modulates CRC progression through immune micro-environment alterations and regulation of related biological pathways. Gut microbiota, as a noninvasive biomarker, can be used to construct an effective machine learning (ML) model for predicting blood CEA levels.
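LEfSe first screens each taxon with a nonparametric Kruskal-Wallis test across classes and then estimates effect size with linear discriminant analysis (LDA). With only two classes (H-CEA vs L-CEA), the Kruskal-Wallis test is equivalent to a two-sided Wilcoxon rank-sum test. The sketch below implements only that screening step, using a normal approximation with a tie-corrected variance. It is a simplified stand-in, not LEfSe itself, and omits the subclass and LDA steps. The names `screen_taxa` and `rank_sum_pvalue`, the `"H"`/`"L"` labels, the 0.05 threshold, and the toy abundance table are all hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every element tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> 1-based mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_pvalue(group_a, group_b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(group_a), len(group_b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one sample")
    n = n1 + n2
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for group_a
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value identical: no evidence of a difference
        return 1.0
    z = (u1 - mean_u) / var_u**0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def screen_taxa(abundance, labels, alpha=0.05):
    """Return (taxon, p) pairs that differ between 'H' and 'L' samples, sorted by p."""
    hits = []
    for taxon, values in abundance.items():
        high = [v for v, lab in zip(values, labels) if lab == "H"]
        low = [v for v, lab in zip(values, labels) if lab == "L"]
        p = rank_sum_pvalue(high, low)
        if p < alpha:
            hits.append((taxon, p))
    return sorted(hits, key=lambda item: item[1])


# Hypothetical toy data: relative abundances for 6 H-CEA then 6 L-CEA samples.
labels = ["H"] * 6 + ["L"] * 6
abundance = {
    "taxon_x": [0.30, 0.28, 0.35, 0.32, 0.31, 0.29, 0.10, 0.12, 0.08, 0.11, 0.09, 0.13],
    "taxon_y": [0.20, 0.40, 0.10, 0.30, 0.50, 0.60, 0.25, 0.45, 0.15, 0.35, 0.55, 0.05],
}
hits = screen_taxa(abundance, labels)
```

In this toy table only `taxon_x`, whose values separate the two groups completely, passes the screen. With real 16S data, testing hundreds of taxa also calls for some control of false positives, which this sketch does not attempt.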

This study reveals R. callidus as a key gut microbiota species enriched in CRC patients with high CEA levels, demonstrating its novel pro-tumor associations through positive correlations with mast cell infiltration and CXCL1 chemokine and upregulation of long-chain fatty acid metabolism. Concurrently, we identify distinct immune micro-environments: elevated resting memory CD4+ T cells in high-CEA patients versus increased T follicular helper cells in low-CEA cohorts. Critically, by leveraging 30 differential microbial features, we develop ML models for noninvasive prediction of CEA levels. These findings establish gut microbiota as both a mechanistic mediator of CEA-driven CRC progression and a foundation for microbiome-based diagnostic tools.
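The summary does not state which correlation coefficient underlies the positive association between R. callidus and mast cell infiltration. Spearman's rank correlation is a common choice for relative-abundance data because it assumes only a monotonic relationship, and the sketch below assumes that choice. It computes Spearman's ρ as the Pearson correlation of tie-averaged ranks. The function names and the per-patient values are hypothetical, not data from the study.

```python
from math import sqrt
from statistics import fmean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> 1-based mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(var_a * var_b)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-patient values: R. callidus relative abundance vs. an
# estimated mast-cell fraction from the tumor transcriptome.
r_callidus = [0.01, 0.03, 0.02, 0.05, 0.04]
mast_cells = [0.10, 0.25, 0.20, 0.40, 0.30]
rho = spearman_rho(r_callidus, mast_cells)
```

Here the two toy series rise in the same order, so ρ is 1. A real analysis would also report a p-value and account for testing many taxa against many immune cell types; this sketch computes only the coefficient.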

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575)
- **Species:** Ruminococcus callidus (taxon 40519)

## Full-text entities

- **Genes:** CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}
- **Diseases:** tumor (MESH:D009369), CRC (MESH:D015179)
- **Chemicals:** long-chain fatty acid (-)
- **Species:** Ruminococcus callidus (species) [taxon 40519], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12570507/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12570507/full.md

## References

68 references — full list in the complete paper: https://tomesphere.com/paper/PMC12570507/full.md

---
Source: https://tomesphere.com/paper/PMC12570507