CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads
Sourav Chatterji, Ichitaro Yamazaki, Zhaojun Bai, Jonathan Eisen

TL;DR
CompostBin is a novel DNA composition-based algorithm that accurately bins raw metagenomic reads into taxon-specific groups without requiring assembly or prior training, facilitating microbial diversity studies.
Contribution
CompostBin bins raw environmental shotgun sequencing reads directly, using principal component analysis (PCA) and normalized cut clustering, and needs neither assembly nor training on reference genomes.
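The pipeline named here (composition features, then PCA, then normalized cut clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The k-mer length (k=4), the plain unweighted PCA, the Gaussian similarity kernel, the sign-threshold bisection, and all function names are assumptions made for illustration, and the reads are synthetic.

```python
import itertools
import numpy as np

def kmer_profile(read, k=4):
    """Normalized k-mer frequency vector of one read (k=4 is an assumption)."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product("ACGT", repeat=k))}
    counts = np.zeros(len(index))
    for i in range(len(read) - k + 1):
        j = index.get(read[i:i + k])  # k-mers with ambiguous bases are skipped
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts

def pca_project(X, n_components=3):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def normalized_cut_bisect(Y):
    """Two-way normalized cut using the spectral relaxation.

    Builds a Gaussian similarity graph (an assumed kernel), then splits on
    the sign of the second generalized eigenvector of (D - W) y = lambda D y.
    """
    n = len(Y)
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    sigma2 = d2.sum() / (n * (n - 1))      # mean off-diagonal squared distance
    W = np.exp(-d2 / (2 * sigma2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    y = vecs[:, 1] * d_inv_sqrt            # back to the generalized eigenvector
    return (y > 0).astype(int)

def bin_reads(reads, k=4, n_components=3):
    """Assign each read to one of two bins from its composition alone."""
    X = np.array([kmer_profile(r, k) for r in reads])
    return normalized_cut_bisect(pca_project(X, n_components))

def random_read(rng, length, gc):
    """Synthetic read with a given GC content (for demonstration only)."""
    p = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choice(list("ACGT"), size=length, p=p))

# Two synthetic "taxa" that differ only in base composition.
rng = np.random.default_rng(0)
reads = [random_read(rng, 1000, 0.3) for _ in range(20)] + \
        [random_read(rng, 1000, 0.7) for _ in range(20)]
labels = bin_reads(reads)
print(labels)
```

The sketch only produces two bins. Handling more taxa would require applying the bisection repeatedly or using a multi-way cut, and real reads would likely need a more careful choice of k and of the similarity graph than the toy settings used here.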
Findings
Accurately bins simulated metagenomic data
Successfully bins reads from real metagenomic data in which the source species are known
Operates without assembly or training on reference genomes
Abstract
A major hindrance to studies of microbial diversity has been that the vast majority of microbes cannot be cultured in the laboratory and thus are not amenable to traditional methods of characterization. Environmental shotgun sequencing (ESS) overcomes this hurdle by sequencing the DNA from the organisms present in a microbial community. The interpretation of this metagenomic data can be greatly facilitated by associating every sequence read with its source organism. We report the development of CompostBin, a DNA composition-based algorithm for analyzing metagenomic sequence reads and distributing them into taxon-specific bins. Unlike previous methods that seek to bin assembled contigs and often require training on known reference genomes, CompostBin has the ability to accurately bin raw sequence reads without need for assembly or training. It applies principal component analysis to…
Taxonomy
Topics: Environmental DNA in Biodiversity Studies · Genomics and Phylogenetic Studies · Microbial Community Ecology and Physiology
