# Prediction of sporulating Firmicutes from uncultured gut microbiota using SpoMAG, an ensemble learning tool

**Authors:** Douglas Terra Machado, Otávio José Bernardes Brustolini, Ellen dos Santos Corrêa, Ana Tereza Ribeiro Vasconcelos

PMC · DOI: 10.7717/peerj.20232 · PeerJ · 2025-10-17

## TL;DR

This paper introduces SpoMAG, a machine learning tool that predicts which gut bacteria can form spores, helping to understand their survival and transmission in different hosts.

## Contribution

SpoMAG is a novel ensemble learning framework that predicts sporulation potential in uncultured Firmicutes from the presence/absence of 160 sporulation-associated genes in metagenome-assembled genomes (MAGs).
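As a rough illustration of the input representation, the sketch below encodes a MAG's annotated genes as a 0/1 presence/absence vector over a reference gene panel and then averages two model probabilities. It is not SpoMAG's code: the short panel stands in for the full 160-gene list, the MAG annotation and the two model probabilities are invented, and the averaging rule and 0.5 cutoff are assumptions, since this summary does not say how the Random Forest and support vector machine outputs are combined.

```python
# Toy reference panel: a few gene names taken from the abstract, standing in
# for the full 160-gene list (which is not reproduced in this summary).
REFERENCE_GENES = ["pth", "yaaT", "spoIIAB", "spoIIIAE", "spoVD", "lytH", "cotP", "gerD"]


def encode_presence_absence(annotated_genes, reference_genes=REFERENCE_GENES):
    """Return a 0/1 vector: 1 where the reference gene was annotated in the MAG."""
    found = set(annotated_genes)
    return [1 if gene in found else 0 for gene in reference_genes]


def soft_vote(probabilities):
    """Average per-model probabilities (an assumed combination rule)."""
    if not probabilities:
        raise ValueError("need at least one model probability")
    return sum(probabilities) / len(probabilities)


# Hypothetical MAG annotation; "dnaA" is outside the panel and is ignored.
mag_genes = ["spoIIAB", "spoVD", "pth", "dnaA"]
features = encode_presence_absence(mag_genes)

# Hypothetical outputs of a Random Forest and an SVM for this feature vector.
ensemble_probability = soft_vote([0.9, 0.7])
is_predicted_sporulating = ensemble_probability >= 0.5  # assumed cutoff
```

In a real workflow the feature vector would be passed to trained classifiers; here the probabilities are hard-coded so the example stays self-contained.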

## Key findings

- SpoMAG achieved high performance with an AUC of 92.2% and F1-score of 88.2%.
- 63 putatively spore-forming MAGs were identified across human, cattle, poultry, and swine fecal metagenomes.
- Nine genes were consistently present across all predicted spore-formers, indicating conserved genetic elements.
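The idea of genes "consistently present across all predicted spore-formers" amounts to a set intersection over per-genome gene sets. The sketch below shows this on invented data. The MAG identifiers and gene sets are hypothetical, and the function name `core_genes` is illustrative and not part of SpoMAG.

```python
def core_genes(gene_sets_by_mag):
    """Return the genes present in every genome (intersection of all gene sets)."""
    sets = [set(genes) for genes in gene_sets_by_mag.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical predicted spore-formers with toy gene sets.
predicted_spore_formers = {
    "MAG_human_01": {"pth", "spoIIAB", "spoVD", "cotP"},
    "MAG_cattle_07": {"pth", "spoIIAB", "spoVD", "gerD"},
    "MAG_swine_03": {"pth", "spoIIAB", "spoVD", "lytH", "cotP"},
}

# Sorted only to make the result order deterministic for display.
shared = sorted(core_genes(predicted_spore_formers))
```

Applied to the 63 predicted spore-formers in the study, the same operation would yield the nine-gene set reported by the authors, assuming it was derived from strict presence in every genome.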

## Abstract

Sporulation represents a key adaptive strategy among Firmicutes, facilitating bacterial persistence under environmental stress while mediating host colonization, transmission dynamics, and microbiome stability. Despite the recognized ecological and biomedical significance of spore-forming Bacilli and Clostridia, most taxa remain uncultivated, limiting phenotypic characterization of their sporulation capacity. To bridge this knowledge gap, we developed SpoMAG, an ensemble machine learning framework that predicts sporulation potential of metagenome-assembled genomes (MAGs) through supervised classification models trained on the presence/absence of 160 sporulation-associated genes. This R-based tool integrates Random Forest and support vector machine algorithms, achieving probabilistic predictions with high performance (AUC = 92.2%, F1-score = 88.2%). Application to fecal metagenomes from humans, cattle, poultry, and swine identified 63 putatively spore-forming MAGs exhibiting distinct host- and order-specific patterns. Bacilli MAGs from the Bacillales and Paenibacillales orders showed high sporulation probabilities and gene richness, while Clostridia MAGs exhibited more heterogeneous profiles. Predictions included families that are poorly characterized with respect to sporulation, such as Acetivibrionaceae, Christensenellaceae, and UBA1381, expanding the known phylogenetic breadth of sporulation capacity. Nine genes spanning sporulation steps were consistently present across all 63 predicted spore-formers (namely pth, yaaT, spoIIAB, spoIIIAE, spoIIIAD, ctpB, ftsW, spoVD, and lgt), suggesting conserved genetic elements across uncultivated Firmicutes for future research. Average nucleotide identity (ANI) analysis revealed seven cases of species-level sharing (ANI value > 95%) among hosts, including a putative novel Acetivibrionaceae species, suggesting possible cross-host transmission facilitated by sporulation. In addition, SHapley Additive exPlanations (SHAP) analysis identified 16 consensus genes consistently contributing to predictions (namely lytH, cotP, spoIIIAG, spoIIR, spoVAD, gerC, yabP, yqfD, gerD, spoVAA, gpr, ytaF, gdh, ypeB, spoVID, and ymfJ), highlighting biologically meaningful features across sporulation stages. By combining gene annotation with interpretable machine learning, SpoMAG provides a reproducible and accessible framework to infer sporulation potential in uncultured microbial taxa. This tool enhances targeted investigations into microbial survival strategies and supports research in microbiome ecology, probiotic discovery, food safety, and public health surveillance. SpoMAG is freely available as an R package and expands current capabilities for functional inference in metagenomic datasets.
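The species-level sharing criterion in the abstract (ANI value > 95% between MAGs from different hosts) can be expressed as a simple filter over pairwise ANI values. The sketch below is a minimal illustration under that reading. The MAG names, host assignments, and ANI values are invented, and the function name is hypothetical. It does not compute ANI itself, which the study would have obtained from a dedicated tool.

```python
ANI_SPECIES_THRESHOLD = 95.0  # percent; species-level cutoff stated in the abstract


def cross_host_species_pairs(ani_records, host_of, threshold=ANI_SPECIES_THRESHOLD):
    """Keep MAG pairs whose ANI exceeds the threshold and whose hosts differ."""
    pairs = []
    for mag_a, mag_b, ani in ani_records:
        if ani > threshold and host_of[mag_a] != host_of[mag_b]:
            pairs.append((mag_a, mag_b, ani))
    return pairs


# Hypothetical host assignments and pairwise ANI values (percent).
host_of = {
    "MAG_A": "human",
    "MAG_B": "swine",
    "MAG_C": "swine",
    "MAG_D": "cattle",
}
ani_records = [
    ("MAG_A", "MAG_B", 97.3),  # different hosts, above cutoff: kept
    ("MAG_B", "MAG_C", 98.1),  # same host: dropped
    ("MAG_A", "MAG_D", 88.4),  # below cutoff: dropped
    ("MAG_C", "MAG_D", 95.0),  # exactly at cutoff, not strictly above: dropped
]

shared_across_hosts = cross_host_species_pairs(ani_records, host_of)
```

The strict inequality mirrors the "> 95%" wording; a pair at exactly 95.0 is therefore excluded in this sketch.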

## Linked entities

- **Genes:** PTH (parathyroid hormone) [NCBI Gene 5741], fixX (putative ferredoxin FixX) [NCBI Gene 948590], spoIIAB (anti-sigma factor (antagonist of sigma(F)) and serine kinase) [NCBI Gene 938930], spoIIIAE (stage III sporulation protein (feeding tube apparatus)) [NCBI Gene 938569], spoIIIAD (stage III sporulation protein (feeding tube apparatus)) [NCBI Gene 938577], ctpB (cation-transporter P-type ATPase B) [NCBI Gene 886928], ftsW (putative plastid division protein) [NCBI Gene 800983], spoVD (transpeptidase penicillin-binding protein) [NCBI Gene 936661], lgt (leg tumor) [NCBI Gene 249584], lytH (sporulation-specific L-Ala-D-Glu endopeptidase) [NCBI Gene 936510], cotP (spore coat protein) [NCBI Gene 938073], spoIIIAG (stage III sporulation engulfment assembly protein) [NCBI Gene 938595], spoIIR (regulator signal of pro-sigma(E) spoIIGA endopeptidase (stage II sporulation)) [NCBI Gene 937017], spoVAD (stage V sporulation protein AD (uptake of pyridine-2,6-dicarboxylic acid)) [NCBI Gene 938932], gerC (germination protein p109) [NCBI Gene 8626210], yabP (putative uncharacterized protein YabP) [NCBI Gene 945039], yqfD (stage IV sporulation protein; putative UDP-glucose-4-epimerase) [NCBI Gene 937867], gerD (lipoprotein factor mediating clustering of germination proteins) [NCBI Gene 938910], spoVAA (stage V sporulation protein AA) [NCBI Gene 938734], gpr (L-glyceraldehyde 3-phosphate reductase) [NCBI Gene 916499], ytaF (sporulation membrane protein YtaF) [NCBI Gene 4541849], GLUD1 (glutamate dehydrogenase 1) [NCBI Gene 2746], ypeB (PF12843 family protein YpeB) [NCBI Gene 1450285], spoVID (morphogenetic spore protein (stage VI sporulation)) [NCBI Gene 936772], ymfJ (putative enzyme) [NCBI Gene 939670]
- **Species:** Acetivibrionaceae (taxon 3120654), Christensenellaceae (taxon 990719)

## Full-text entities

- **Genes:** GLUD1 (glutamate dehydrogenase 1) [NCBI Gene 281785] {aka GDH, GDH 1, GDH1}, PTH (parathyroid hormone) [NCBI Gene 280903]
- **Species:** Bos taurus (bovine, species) [taxon 9913], Clostridia (class) [taxon 186801], Bacilli (class) [taxon 91061], Sus scrofa (pig, species) [taxon 9823], Homo sapiens (human, species) [taxon 9606], Bacillota (clostridial firmicutes, phylum) [taxon 1239]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12536801/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12536801/full.md

## References

85 references — full list in the complete paper: https://tomesphere.com/paper/PMC12536801/full.md

---
Source: https://tomesphere.com/paper/PMC12536801