# Gut microbial and functional signatures in breast cancer: an integrated metagenomic and machine learning approach to non-invasive detection

**Authors:** Yalin Li, Yi Cheng, Weichi Liu, Jingjin Li, Shiqi Li, Suriguga, Teng Ma, Lai-Yu Kwok, Zhihui Cai, Zhihong Sun

PMC · DOI: 10.3389/fmicb.2025.1722632 · 2026-01-15

## TL;DR

This study profiles gut microbial and predicted metabolic changes in patients with stage I–III breast cancer and uses machine learning to evaluate these features as non-invasive biomarkers for detection.

## Contribution

The study integrates shotgun metagenomic data with machine learning to identify gut microbial, phageome, and predicted metabolic signatures as candidate non-invasive markers for breast cancer detection.

## Key findings

- Breast cancer patients showed altered gut microbiota with depletion of beneficial taxa like Limosilactobacillus fermentum and Blautia sp., and enrichment of Prevotella copri.
- Pathways for short-chain fatty acid and purine metabolism were reduced, and predicted levels of butyrate, propionate, hypoxanthine, xanthine, and nicotinate were lower in patients.
- A machine learning model combining microbial and predicted metabolic features achieved an AUC of 0.78 in the discovery cohort and 0.73 (recall = 0.74) in an independent validation cohort.
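"Depletion" and "enrichment" describe a taxon's relative abundance in patients versus controls. A common way to express the direction and size of such a shift is a log2 fold change of mean relative abundance. The sketch below is illustrative only. The summary does not state which statistical method the authors used, and the function name, pseudocount, and abundance values are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def log2_fold_change(
    case_abund: Sequence[float],
    control_abund: Sequence[float],
    pseudocount: float = 1e-6,  # hypothetical value; avoids log2(0) for absent taxa
) -> float:
    """log2(mean case abundance / mean control abundance).

    Negative values mean the taxon is depleted in cases; positive values
    mean it is enriched. This is an effect size, not a significance test.
    """
    case_mean = mean(case_abund) + pseudocount
    control_mean = mean(control_abund) + pseudocount
    return math.log2(case_mean / control_mean)


# Hypothetical relative abundances (fractions of each sample's reads).
depleted_cases = [0.01, 0.02, 0.03]
depleted_controls = [0.04, 0.08, 0.06]
enriched_cases = [0.10, 0.12, 0.14]
enriched_controls = [0.03, 0.04, 0.05]

lfc_depleted = log2_fold_change(depleted_cases, depleted_controls)
lfc_enriched = log2_fold_change(enriched_cases, enriched_controls)
```

A fold change alone does not establish a real difference. In practice it is paired with a test across samples and a multiple-testing correction.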

## Abstract

Breast cancer is associated with significant restructuring of the gut ecosystem. Gut microbial composition and function may influence cancer development and progression through immune modulation, metabolic regulation, and inflammation-related pathways.

Shotgun metagenomic sequencing was performed on fecal samples from 38 stage I–III breast cancer patients and 36 age- and body mass index-matched healthy controls. Machine learning models were constructed to evaluate the diagnostic potential of integrated microbial and metabolic features.

Significant alterations were observed in gut microbiota composition, including depletion of beneficial taxa (Limosilactobacillus fermentum, Blautia sp.) and enrichment of Prevotella copri. Pathways involved in short-chain fatty acid and purine metabolism were reduced. The gut phageome exhibited structural changes and altered correlations with bacterial hosts. Predictive analysis revealed depletion of short-chain fatty acids (butyrate, propionate), purine intermediates (hypoxanthine, xanthine), and nicotinate in patients. A machine learning model integrating microbial and predicted metabolic features achieved area under the curve (AUC) values of 0.78 in the discovery cohort and 0.73 (recall = 0.74) in an independent validation cohort.
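The two reported metrics measure different things. AUC is threshold-free: it equals the probability that a randomly chosen patient receives a higher model score than a randomly chosen control (the Mann-Whitney formulation). Recall (sensitivity) is the fraction of patients classified as positive at one fixed decision threshold. The sketch below computes both from classifier scores. It assumes scores in [0, 1] and a hypothetical threshold of 0.5. The summary does not state the threshold or model the authors used, and the labels and scores are invented toy values.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    Counts, over every (case, control) pair, how often the case scores
    higher; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Sensitivity: fraction of true cases whose score reaches the threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp / (tp + fn)


# Hypothetical toy data: 1 = breast cancer patient, 0 = healthy control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.1]

auc = roc_auc(labels, scores)    # 14 of 16 case-control pairs ranked correctly
sens = recall(labels, scores)    # 3 of 4 cases reach the 0.5 threshold
```

Because AUC does not depend on the threshold, a model can keep a moderate AUC while its recall shifts with the cut-off chosen. This is why the validation result reports both values.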

Coordinated gut microbiome, phageome, and metabolome alterations characterize breast cancer, offering potential non-invasive biomarkers and mechanistic insights for disease detection and intervention.

## Linked entities

- **Chemicals:** butyrate (PubChem CID 104775), propionate (PubChem CID 104745), hypoxanthine (PubChem CID 135398638), xanthine (PubChem CID 1188), nicotinate (PubChem CID 937)
- **Diseases:** breast cancer (MONDO:0004989)
- **Species:** Limosilactobacillus fermentum (taxon 1613), Blautia sp. (taxon 1955243)

## Full-text entities

- **Diseases:** Breast cancer (MESH:D001943), inflammation (MESH:D007249), cancer (MESH:D009369)
- **Chemicals:** nicotinate (MESH:D009525), propionate (MESH:D011422), hypoxanthine (MESH:D019271), butyrate (MESH:D002087), purine (MESH:C030985), short-chain fatty acid (MESH:D005232), xanthine (MESH:D019820)
- **Species:** Segatella copri (species) [taxon 165179], gut metagenome (species) [taxon 749906], Homo sapiens (human, species) [taxon 9606], Blautia sp. (species) [taxon 1955243]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12853649/full.md

---
Source: https://tomesphere.com/paper/PMC12853649