# DynaBiome: interpretable unsupervised learning of gut microbiome dysbiosis via temporal deep models

**Authors:** Awais Qureshi, Abdul Wahid, Shams Qazi, Muhammad K. Shahzad, Hashir Moheed Kiani, Muhammad Daud Abdullah Asif

PMC · DOI: 10.1186/s12859-026-06400-8 · BMC Bioinformatics · 2026-02-27

## TL;DR

DynaBiome predicts gut microbiome dysbiosis by combining an unsupervised LSTM autoencoder with clinical phenotypic proxy labels, avoiding the need for manually annotated sequencing labels.

## Contribution

The framework combines unsupervised temporal anomaly detection with weak supervision from phenotypic proxies (e.g., fever, neutropenia) to predict dysbiosis, reducing reliance on scarce genomic annotations.

## Key findings

- The stacked ensemble model achieved an ROC AUC of 0.8908 and a Weighted F1-score of 0.7909.
- DynaBiome outperformed the One-Class SVM baseline (ROC AUC 0.8908 vs. 0.6033).
- The model's performance was comparable to fully supervised baselines.
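
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how ROC AUC and a support-weighted F1-score are conventionally computed. The function names `roc_auc` and `weighted_f1` and the toy inputs are illustrative assumptions, not the paper's evaluation code.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    total = len(y_true)
    score = 0.0
    for cls in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * (tp + fn) / total  # tp + fn is the class support
    return score


# Toy inputs (hypothetical): 1 marks a dysbiotic window, 0 a normal one.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
toy_f1 = weighted_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

ROC AUC is threshold-free and works on continuous scores, while the weighted F1-score needs hard class predictions, which is why the two numbers are reported side by side.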

## Abstract

Gut microbiome dysbiosis is a critical determinant for autologous fecal microbiota transplantation (Auto-FMT) eligibility, yet current classification approaches rely predominantly on supervised learning with manually annotated sequencing labels, which are often scarce. This study proposes DynaBiome, a framework designed to predict gut dysbiosis by leveraging unsupervised learning and clinical phenotypic proxies as a scalable alternative to ground-truth genomic labeling.

Our framework employs an LSTM autoencoder architecture to capture temporal microbiome dynamics within 14-day windows. The model reconstructs normal microbiome patterns, where high reconstruction errors signal potential dysbiosis. To ensure rigorous evaluation and prevent data leakage, the dataset was partitioned via a strict patient-level split. Unsupervised anomaly signals were refined through weak supervision with phenotypic proxy labels (e.g., fever, neutropenia), and ensemble learning methods were applied to optimize classification performance.
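
The data-handling steps in this paragraph can be sketched roughly as follows, under several stated assumptions. The patient-level split and the 14-day window length come from the abstract; everything else is hypothetical, including all function names, the one-scalar-per-day toy data, the 0.95 quantile threshold, and above all the reconstructor, which is a simple per-day mean standing in for the LSTM autoencoder.

```python
import random
from statistics import mean

WINDOW_DAYS = 14  # window length stated in the abstract


def patient_level_split(patient_ids, test_fraction=0.2, seed=0):
    """Assign whole patients to train or test so no patient lands in both."""
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    return set(unique[n_test:]), set(unique[:n_test])


def sliding_windows(series, length=WINDOW_DAYS):
    """Return every contiguous run of `length` daily samples for one patient."""
    return [series[i:i + length] for i in range(len(series) - length + 1)]


def reconstruction_error(window, reconstruction):
    """Mean squared error between a window and its reconstruction."""
    return mean((x - y) ** 2 for x, y in zip(window, reconstruction))


def percentile_threshold(train_errors, q=0.95):
    """Pick a cut-off from training errors (the 0.95 quantile is an assumption)."""
    ordered = sorted(train_errors)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def flag_anomalies(errors, threshold):
    """Mark windows whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Toy data: one hypothetical scalar per day (e.g. a diversity index) per patient.
toy = {f"p{k}": [0.5 + 0.01 * ((day + k) % 5) for day in range(20)] for k in range(5)}

train_ids, test_ids = patient_level_split(toy)
train_windows = [w for pid in sorted(train_ids) for w in sliding_windows(toy[pid])]

# Stand-in reconstructor: the per-day mean of training windows, NOT an LSTM autoencoder.
template = [mean(column) for column in zip(*train_windows)]
threshold = percentile_threshold([reconstruction_error(w, template) for w in train_windows])

test_windows = [w for pid in sorted(test_ids) for w in sliding_windows(toy[pid])]
flags = flag_anomalies([reconstruction_error(w, template) for w in test_windows], threshold)
```

Splitting by patient before windowing matters because overlapping windows from the same patient share most of their days; a window-level split would place near-duplicates on both sides of the partition.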

The initial LSTM autoencoder successfully flagged dysbiotic sequences but required refinement to reduce false positives. Ensemble learning significantly enhanced predictive accuracy. The stacked ensemble (with a Logistic Regression meta-learner) demonstrated optimal performance with an ROC AUC of 0.8908 and a Weighted F1-score of 0.7909. This approach significantly outperformed the standard One-Class SVM baseline (ROC AUC 0.6033), confirming the superiority of deep temporal modeling over static anomaly detection. Critically, the model achieved performance levels comparable to fully supervised baselines, confirming the efficacy of the proxy-label framework.
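
To illustrate what a logistic regression meta-learner does in a stacked ensemble, here is a minimal sketch that fits one on base-model scores with plain gradient descent. The abstract does not name the base learners or the training procedure, so the two score columns, the proxy labels, the hyperparameters, and the function names are all made-up assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_learner(base_scores, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on base-model scores by batch gradient descent."""
    n_features = len(base_scores[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(base_scores, labels):
            err = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(weights, bias, row):
    """Meta-learner probability that a window is dysbiotic."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


# Hypothetical inputs: each row holds scores from two base models for one window.
# In a real stacking setup these would be out-of-fold predictions.
base_scores = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.7, 0.6], [0.8, 0.9], [0.9, 0.7]]
proxy_labels = [0, 0, 0, 1, 1, 1]  # e.g. derived from fever or neutropenia flags

weights, bias = fit_meta_learner(base_scores, proxy_labels)
```

Calling `predict_proba(weights, bias, [0.9, 0.8])` then gives the combined probability for a new window. A linear meta-learner keeps the combination step interpretable, since each learned weight indicates how much the ensemble relies on the corresponding base model.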

Integrating unsupervised temporal feature extraction with stacked ensemble methods provides a viable framework for dysbiosis prediction. These results demonstrate that leveraging phenotypic proxies via weak supervision can effectively approximate supervised baselines, thereby reducing the reliance on comprehensive metagenomic annotations for longitudinal patient monitoring.

## Linked entities

- **Diseases:** neutropenia (MONDO:0001475)

## Full-text entities

- **Species:** gut metagenome (species) [taxon 749906]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13023179/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13023179/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/PMC13023179/full.md

---
Source: https://tomesphere.com/paper/PMC13023179