A New Approach for Scalable Analysis of Microbial Communities
Ehsaneddin Asgari, Kiavash Garakani, Mohammad R.K. Mofrad

TL;DR
This paper introduces a scalable, reference-free, n-gram-based method for analyzing microbial communities from 16S rRNA data, substantially reducing the size of the data to be processed and supporting classification tasks such as distinguishing body sites and health states.
Contribution
The authors present a novel n-gram sequence analysis technique that eliminates the need for taxonomic alignment, offering a more scalable approach for microbial community analysis.
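As a rough illustration of what an n-gram (k-mer) representation of 16S rRNA reads can look like, the sketch below pools overlapping k-mers from all reads in a sample into one normalized frequency vector, with no alignment step. The function name `kmer_profile`, the default `k=3`, and the choice to skip k-mers containing ambiguous bases are assumptions made for this example; the summary above does not specify the authors' exact pipeline or parameters.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequences, k=3, alphabet="ACGT"):
    """Pool overlapping k-mers from all reads and return normalized frequencies.

    Hypothetical helper for illustration only; not the authors' implementation.
    The output has one entry per possible k-mer, in lexicographic order.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the read, one base at a time.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: skip k-mers containing ambiguous bases such as N.
            if all(ch in alphabet for ch in kmer):
                counts[kmer] += 1

    total = sum(counts.values())
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[v] / total if total else 0.0 for v in vocab]


if __name__ == "__main__":
    sample_reads = ["ACGTACGT", "TTGACNGT"]  # toy reads, not real data
    profile = kmer_profile(sample_reads, k=3)
    print(len(profile))  # number of possible 3-mers over ACGT
```

The resulting vector has a fixed length of 4^k regardless of how many reads the sample contains, which is what allows a sample to be summarized compactly and passed to a standard classifier.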
Findings
Reduced processing data size by 10^5-fold
Effective classification across body sites and health states
Proposed continuous vector representations for deep learning
Abstract
Microbial communities play important roles in the function and maintenance of various biosystems, ranging from the human body to the environment. Current methods for analysis of microbial communities are typically based on taxonomic phylogenetic alignment using 16S rRNA metagenomic or Whole Genome Sequencing data. In typical characterizations of microbial communities, studies deal with billions of microbial sequences, aligning them to a phylogenetic tree. We introduce a new approach for the efficient analysis of microbial communities. Our new reference-free analysis technique is based on n-gram sequence analysis of 16S rRNA data and reduces the processing data size dramatically (by 10^5-fold), without requiring taxonomic alignment. The proposed approach is applied to characterize phenotypic microbial community differences in different settings. Specifically, we applied this approach in…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · Algorithms and Data Compression
