# Decoding emergent properties of microbial community functions through subcommunity observations and interpretable machine learning

**Authors:** Hidehiro Ishizawa, Sunao Noguchi, Miku Kito, Yui Nomura, Kodai Kimura, Masahiro Takeo

PMC · DOI: 10.1093/ismejo/wraf236 · The ISME Journal · 2025-10-23

## TL;DR

Researchers measured the functions of small microbial subcommunities and used interpretable machine learning to predict and explain the functions of larger communities, such as degradation of the pollutant aniline.

## Contribution

A novel framework that combines subcommunity observations with interpretable machine learning (random forest models) to decode emergent microbial community functions from the bottom up.

## Key findings

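- Subcommunity combinations of just three to four species enabled quantitative prediction of functions in larger (5–9-member) communities (Pearson's r = 0.78–0.80).
- Interpreting models trained on these simple subcommunities identified key species and interspecies interactions that strongly influence overall community function.
- Prediction remained robust with limited subcommunity data, suggesting the approach may extend to more diverse communities where exhaustive subcommunity observation is infeasible.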
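Working with limited subcommunity data matters because a pool of n species admits C(n, k) distinct k-member subcommunities and 2^n − 1 non-empty combinations overall, so exhaustive observation grows exponentially with diversity. The short Python sketch below tabulates these counts. The function names and the 30-species example are illustrative and do not come from the paper.

```python
from math import comb

def subcommunity_counts(n_species):
    """Number of distinct k-member subcommunities for k = 1..n_species."""
    return {k: comb(n_species, k) for k in range(1, n_species + 1)}

def total_subcommunities(n_species):
    """Every non-empty species combination: 2**n - 1."""
    return 2 ** n_species - 1

if __name__ == "__main__":
    counts = subcommunity_counts(9)
    print(counts[3] + counts[4])      # 3- and 4-member combinations: 210
    print(total_subcommunities(9))    # all non-empty combinations: 511
    print(total_subcommunities(30))   # hypothetical 30-species pool: 1073741823
```

Even a modest increase in species number makes full enumeration impractical, which is why robustness to partial subcommunity data is relevant.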

## Abstract

The functions of microbial communities, including substrate conversion and pathogen suppression, arise not as a simple sum of individual species’ capabilities but through complex interspecies interactions. Understanding how such functions arise from individual species and their interactions remains a major challenge, limiting efforts to rationally understand microbial roles in both natural and engineered ecosystems. Because current holistic (meta-omics) and reductionist (isolation- or single-cell-based) approaches struggle to capture these emergent microbial community functions, this study explores an intermediate strategy: analyzing simple subcommunity combinations to enable a bottom-up understanding of community-level functions. To examine the validity of this approach, we used a nine-member synthetic microbial community capable of degrading the environmental pollutant aniline, and systematically generated a dataset of 256 subcommunity combinations and their associated functions. Analyses using random forest models revealed that the subcommunity combinations of just three to four species enabled the quantitative prediction of functions in larger communities (5–9-member; Pearson’s r = 0.78–0.80). Prediction performance remained robust even with limited subcommunity data, suggesting applicability to more diverse microbial communities where exhaustive subcommunity observation is infeasible. Moreover, interpreting models trained on these simple subcommunity combinations enabled the identification of key species and interspecies interactions that strongly influence the overall community function. These findings provide a methodological framework for mechanistically dissecting complex microbial community functions through subcommunity-based analysis.
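The evaluation design described in the abstract (learn from subcommunities of three to four species, predict 5–9-member communities, score with Pearson's r) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: communities are encoded as sets of species labels, and a deliberately simple "nested-subset average" predictor stands in for the random forest models the study actually used. All names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def split_by_size(functions, train_sizes=(3, 4), test_sizes=range(5, 10)):
    """Split {community (frozenset): measured function} by community size."""
    train = {c: f for c, f in functions.items() if len(c) in train_sizes}
    test = {c: f for c, f in functions.items() if len(c) in test_sizes}
    return train, test

def predict_nested_mean(community, train):
    """Stand-in model (NOT a random forest): average the measured function
    of every training subcommunity nested inside `community`."""
    nested = [f for c, f in train.items() if c <= community]
    return mean(nested) if nested else None

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def evaluate(functions):
    """Train on 3-4-member data, predict 5-9-member communities, return r.
    Test communities with no observed nested subsets are skipped."""
    train, test = split_by_size(functions)
    pairs = [(predict_nested_mean(c, train), f) for c, f in test.items()]
    pairs = [(p, f) for p, f in pairs if p is not None]
    preds, observed = zip(*pairs)
    return pearson_r(preds, observed)

if __name__ == "__main__":
    # Made-up toy values: the "function" is present only when hypothetical
    # species "A" is in the community (a single key species, no interactions).
    pool = "ABCDEFGHI"  # nine hypothetical species labels
    toy = {frozenset(c): (1.0 if "A" in c else 0.0)
           for k in range(3, 10) for c in combinations(pool, k)}
    print(round(evaluate(toy), 3))
```

Swapping `predict_nested_mean` for a trained regressor over presence/absence features would bring the sketch closer to the random forest setup, while the size-based split and the correlation score stay the same.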

## Linked entities

- **Chemicals:** aniline (PubChem CID 6115)

## Full-text entities

- **Chemicals:** aniline (MESH:C023650)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12599306/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12599306/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/PMC12599306/full.md

---
Source: https://tomesphere.com/paper/PMC12599306