Low-density locality-sensitive hashing boosts metagenomic binning
Yunan Luo, Jianyang Zeng, Bonnie Berger, Jian Peng

TL;DR
This paper introduces Opal, a fast and accurate composition-based metagenomic binning method. Opal uses low-density locality-sensitive hashing inspired by error-correcting codes; it runs faster than alignment-based tools such as BWA and bins more accurately than traditional k-mer-based models.
Contribution
Opal applies Gallager's low-density parity-check code principles to design discriminative hashing functions for improved metagenomic binning accuracy and robustness.
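The idea of a low-density hash over k-mers can be illustrated with a small sketch. This is not Opal's actual construction: the function names (`low_density_masks`, `hashed_features`, `masks_affected`), the greedy balancing rule, and the parameters `k`, `t`, and `n_masks` are all assumptions made for illustration. Each hash function reads only `t` of the `k` positions in a k-mer (low density), and positions are chosen so that every position is covered about equally often, loosely mirroring the balanced sparse rows of a low-density parity-check matrix.

```python
import random
from collections import Counter

def low_density_masks(k, t, n_masks, seed=0):
    """Pick n_masks subsets of t positions out of k (hypothetical helper).

    Greedy balancing: each new mask takes the t least-covered positions,
    breaking ties at random, so coverage stays even across positions.
    """
    rng = random.Random(seed)
    coverage = [0] * k
    masks = []
    for _ in range(n_masks):
        order = sorted(range(k), key=lambda p: (coverage[p], rng.random()))
        mask = tuple(sorted(order[:t]))
        for p in mask:
            coverage[p] += 1
        masks.append(mask)
    return masks

def hashed_features(seq, k, masks):
    """Count (mask index, sampled characters) pairs over all k-mer windows."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        for m_idx, mask in enumerate(masks):
            key = "".join(kmer[p] for p in mask)
            counts[(m_idx, key)] += 1
    return counts

def masks_affected(pos, masks):
    """Indices of masks whose hash value can change if position pos mutates."""
    return [i for i, mask in enumerate(masks) if pos in mask]

masks = low_density_masks(k=12, t=4, n_masks=3)
features = hashed_features("ACGTACGTACGTACGT", k=12, masks=masks)
```

In this toy setting (k = 12, t = 4, three masks) every position falls in exactly one mask, so a single substitution inside a window can change at most one of that window's three hashed features, whereas the full contiguous 12-mer would change outright. The resulting counts could then be fed to any classifier as a feature vector.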
Findings
Opal is up to 100 times faster than BWA.
Opal achieves higher binning accuracy than traditional k-mer-based models.
Opal is robust to sequencing errors and mutations.
Abstract
Metagenomic binning is an essential task in analyzing metagenomic sequence datasets. To analyze structure or function of microbial communities from environmental samples, metagenomic sequence fragments are assigned to their taxonomic origins. Although sequence alignment algorithms can readily be used and usually provide high-resolution alignments and accurate binning results, the computational cost of such alignment-based methods becomes prohibitive as metagenomic datasets continue to grow. Alternative compositional-based methods, which exploit sequence composition by profiling local short k-mers in fragments, are often faster but less accurate than alignment-based methods. Inspired by the success of linear error correcting codes in noisy channel communication, we introduce Opal, a fast and accurate novel compositional-based binning method. It incorporates ideas from Gallager's…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Gene expression and cancer classification · Algorithms and Data Compression
