TaxaPLN: a taxonomy-aware augmentation strategy for microbiome-trait classification including metadata
Alexandre Chaussard, Anna Bonnet, Sylvain Le Corff, Harry Sokol

TL;DR
TaxaPLN is a novel taxonomy-aware data augmentation method for microbiome-trait classification that improves predictive performance by generating realistic synthetic microbiome data while preserving ecological properties.
Contribution
We introduce TaxaPLN, a new model-based augmentation strategy that incorporates taxonomic relationships and covariate information for microbiome data analysis.
Findings
TaxaPLN enhances predictive accuracy, especially with non-linear classifiers.
It preserves ecological properties of microbiome data.
Conditional augmentation provides covariate-aware synthetic data.
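As a rough illustration of what model-based, covariate-aware augmentation can look like, the sketch below draws synthetic count vectors from a generic Poisson log-normal (PLN) model. This is an assumption suggested by the method's name and is not the TaxaPLN procedure: it ignores the taxonomy entirely, and the function `sample_pln` and the parameters `mean_log`, `cov_log`, and `coef` are hypothetical, set by hand here rather than fitted to data.

```python
import numpy as np

def sample_pln(n_samples, mean_log, cov_log, covariates=None, coef=None, seed=0):
    """Draw synthetic counts from a generic Poisson log-normal model.

    Hypothetical sketch: parameters are assumed to be already estimated.
    mean_log: (p,) baseline mean of the latent log-abundances
    cov_log: (p, p) covariance of the latent Gaussian layer
    covariates: optional (n_samples, d) covariate matrix
    coef: optional (d, p) regression coefficients on the latent mean
    """
    rng = np.random.default_rng(seed)
    p = len(mean_log)
    # Latent mean per sample, optionally shifted by the covariates.
    mu = np.tile(mean_log, (n_samples, 1))
    if covariates is not None:
        mu = mu + covariates @ coef
    # Gaussian latent layer, then Poisson emission on the exponential scale.
    noise = rng.multivariate_normal(np.zeros(p), cov_log, size=n_samples)
    return rng.poisson(np.exp(mu + noise))

# Toy usage: 3 taxa, 1 binary covariate, 4 synthetic samples.
mean_log = np.array([1.0, 0.5, -1.0])
cov_log = 0.2 * np.eye(3)
covariates = np.array([[0.0], [1.0], [1.0], [0.0]])
coef = np.array([[0.5, -0.5, 0.0]])
counts = sample_pln(4, mean_log, cov_log, covariates, coef)
```

Conditioning enters only through the shift of the latent mean, so synthetic samples generated for a given covariate profile share that profile's expected log-abundances.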
Abstract
The gut microbiome plays a crucial role in human health, making it a cornerstone of modern biomedical research. To study its structure and dynamics, machine learning models are increasingly used to identify key microbial patterns associated with disease and environmental factors. However, microbiome data present unique challenges due to their compositionality, high dimensionality, sparsity, and high variability, which can obscure meaningful signals. Moreover, the effectiveness of machine learning models is often constrained by limited sample sizes, as microbiome data collection remains costly and time-consuming. In this context, data augmentation has emerged as a promising strategy to enhance model robustness and predictive performance by generating artificial microbiome data. The aim of this study is to improve predictive modeling from microbiome data by introducing a model-based data…
