Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome

Elliott Gordon-Rodriguez; Thomas P. Quinn; John P. Cunningham

arXiv:2205.09906·stat.ML·May 23, 2022·6 cites

TL;DR

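This paper introduces data augmentation strategies tailored to compositional (simplex-valued) microbiome data. Adding them to standard supervised learning pipelines improves predictive performance across benchmark datasets and sets a new state of the art on key disease prediction tasks. The authors also develop a contrastive learning model for microbiome data.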

Contribution

The paper defines novel augmentation methods for compositional data, drawing on the Aitchison geometry of the simplex and on subcompositions, and applies them to microbiome prediction and representation learning.

Findings

01. Achieved state-of-the-art results in disease prediction tasks.

02. Enhanced model performance across multiple benchmark datasets.

03. Developed a contrastive learning approach for microbiome data.

Abstract

Data augmentation plays a key role in modern machine learning pipelines. While numerous augmentation strategies have been studied in the context of computer vision and natural language processing, less is known for other data modalities. Our work extends the success of data augmentation to compositional data, i.e., simplex-valued data, which is of particular interest in the context of the human microbiome. Drawing on key principles from compositional data analysis, such as the Aitchison geometry of the simplex and subcompositions, we define novel augmentation strategies for this data modality. Incorporating our data augmentations into standard supervised learning pipelines results in consistent performance gains across a wide range of standard benchmark datasets. In particular, we set a new state-of-the-art for key disease prediction tasks including colorectal cancer, type 2 diabetes,…
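
To make the two principles named in the abstract concrete, here is a minimal sketch of what augmentations built on the Aitchison geometry and on subcompositions could look like. It is not taken from the authors' repository: the function names (`closure`, `aitchison_interpolate`, `random_subcomposition`) and the parameters `lam` and `keep_prob` are hypothetical, and the interpolation assumes strictly positive inputs (real microbiome data contains zeros, which would need a pseudocount or similar treatment first).

```python
import numpy as np


def closure(x):
    """Rescale a non-negative vector so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def aitchison_interpolate(x1, x2, lam):
    """Interpolate between two compositions along a straight line in
    Aitchison geometry: a weighted geometric mean followed by closure.
    Assumes every part of x1 and x2 is strictly positive."""
    log_mix = lam * np.log(x1) + (1.0 - lam) * np.log(x2)
    return closure(np.exp(log_mix))


def random_subcomposition(x, keep_prob, rng):
    """Keep a random subset of parts of a 1-D composition, zero out the
    rest, and re-close. Ratios among the kept parts are unchanged."""
    x = np.asarray(x, dtype=float)
    mask = rng.random(x.shape[-1]) < keep_prob
    if not mask.any():
        # Illustrative fallback: always keep the largest part.
        mask[np.argmax(x)] = True
    return closure(x * mask)


rng = np.random.default_rng(0)
a = closure([2.0, 3.0, 5.0])
b = closure([6.0, 3.0, 1.0])

mixed = aitchison_interpolate(a, b, lam=0.5)            # new sample between a and b
sub = random_subcomposition(a, keep_prob=0.5, rng=rng)  # a with some parts dropped
```

In a supervised pipeline, samples such as `mixed` and `sub` would be added to the training set alongside the originals. How labels are assigned to interpolated samples is a design choice this sketch leaves open.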

Code & Models

Repositories

cunningham-lab/augcoda (official)

Videos

Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome (SlidesLive)

Taxonomy

Topics: Oral microbiology and periodontitis research · Dental Radiography and Imaging · HIV/AIDS oral health manifestations

Methods: Contrastive Learning