# Prediction of soil probiotics based on foundation model representation enhancement and stacked aggregation classifier

**Authors:** Qiang Kang, Haotong Sun, Yayu Wang, Xiaolong Fang, Yuxiang Li, Yong Zhang, Tong Wei, Peng Yin

PMC · DOI: 10.1093/bib/bbaf567 · Briefings in Bioinformatics · 2025-10-29

## TL;DR

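This paper introduces a method for predicting soil probiotics that enhances genomic foundation model representations with domain-specific engineered features and classifies samples with a stacked aggregation classifier. It reports excellent performance on balanced and imbalanced test sets and reveals potential functional genes in the predicted probiotics.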

## Contribution

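The paper proposes a stacked aggregation classifier, inspired by stacking ensemble learning, that predicts a sample's label from only a subset of its sequence segments. The classifier is trained on foundation model representations enhanced with domain-specific engineered features, rather than by fine-tuning the foundation model's parameters.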

## Key findings

- The method demonstrates excellent performance on both balanced and imbalanced test sets.
- Potential functional genes are revealed from the predicted probiotics, offering biological insights for related studies.

## Abstract

Soil probiotics are indispensable in agro-ecosystems, enhancing crop yield through nutrient solubilization, pathogen suppression, and soil structure improvement. However, reliable prediction methods for soil probiotics are still lacking. In this study, we use genomic foundation models to generate representations from sample sequences and enhance them by deeply integrating domain-specific engineered features. The enhanced representations enable training a powerful classifier for a target task, rather than relying on conventional parameter fine-tuning. Inspired by the stacking ensemble learning framework, we design a stacked aggregation classifier. It predicts a sample’s label by leveraging only a subset of its sequence segments, effectively addressing the challenges in processing long or incompletely assembled sequences. The proposed method is applied to the prediction of soil probiotics and demonstrates excellent performance on both balanced and imbalanced test sets. Furthermore, potential functional genes are revealed from the predicted probiotics, providing valuable biological insights for related studies.
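The segment-based prediction described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: `toy_embedding` is a stand-in for a genomic foundation model, `engineered_features` uses GC fraction as a hypothetical engineered feature, enhancement is reduced to plain concatenation (the paper describes a deeper integration), and the second-level step is a fixed threshold on the mean segment score rather than a trained meta-classifier.

```python
import math
import random
from statistics import mean


def split_segments(seq, size):
    """Cut a sequence into non-overlapping fixed-size segments (a trailing partial segment is dropped)."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def toy_embedding(segment):
    """Stand-in for a foundation model embedding (assumption): per-base frequencies."""
    n = len(segment)
    return [segment.count(base) / n for base in "ACGT"]


def engineered_features(segment):
    """Hypothetical engineered feature (assumption): GC fraction of the segment."""
    gc = sum(base in "GC" for base in segment) / len(segment)
    return [gc]


def enhance(segment):
    """Simplified enhancement (assumption): concatenate embedding and engineered features."""
    return toy_embedding(segment) + engineered_features(segment)


def base_score(features, weights, bias):
    """First-level scorer: logistic model giving a per-segment probability."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def predict_sample(seq, weights, bias, size=8, n_segments=3, threshold=0.5, rng=None):
    """Label a sample from a random subset of its segments.

    Second level (assumption): threshold the mean of the segment scores.
    """
    rng = rng or random.Random(0)
    segments = split_segments(seq, size)
    if not segments:
        raise ValueError("sequence is shorter than one segment")
    subset = rng.sample(segments, min(n_segments, len(segments)))
    scores = [base_score(enhance(s), weights, bias) for s in subset]
    return int(mean(scores) >= threshold), scores
```

For example, with hypothetical weights that respond only to the GC feature, `predict_sample("GC" * 20, [0, 0, 0, 0, 4.0], -2.0)` scores three of the five available segments and labels the sample from their mean. Because only a subset of segments is scored, the same function applies to long sequences or to incompletely assembled ones where some segments are missing.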

## Full-text entities

- **Genes:** MLC1 (modulator of VRAC current 1) [NCBI Gene 23209] {aka LVM, MLC, VL}
- **Chemicals:** 2,4-diacetylphloroglucinol (MESH:C059817), Nucleotide (MESH:D009711), phosphate (MESH:D010710), EVO (-), N (MESH:D009584)
- **Species:** Homo sapiens (human, species) [taxon 9606], Bacillus (genus) [taxon 55087], Ralstonia solanacearum (species) [taxon 305], Pseudomonas fluorescens (species) [taxon 294], Bacillus amyloliquefaciens (species) [taxon 1390], Paenibacillus peoriae (species) [taxon 59893], Azotobacter (genus) [taxon 352], Bacillus subtilis (species) [taxon 1423]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12570017/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12570017/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/PMC12570017/full.md

---
Source: https://tomesphere.com/paper/PMC12570017