The Metagenomic Binning Problem: Clustering Markov Sequences
G. Greenberg, I. Shomorony

TL;DR
This paper models the metagenomic binning problem as clustering Markov sequences, establishes the information-theoretic limit on contig length for perfect binning, and proposes conditional relative entropy as the distance measure for more accurate clustering.
Contribution
It introduces an information-theoretic framework for metagenomic binning, derives the minimum contig length needed for perfect binning, and recommends conditional relative entropy as the distance measure for clustering.
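The summary contrasts conditional relative entropy with Euclidean distance but gives no formulas. A minimal sketch, assuming contigs are strings over ACGT and each species is an order-k Markov chain (k = 3 corresponds to tetranucleotides), computes the conditional relative entropy D(P̂ ‖ Q | π̂) = Σ_i π̂(i) Σ_j P̂(j|i) log(P̂(j|i) / Q(j|i)) between a contig's empirical chain and a model Q, alongside the plain Euclidean distance between k-mer frequency profiles. All function names are hypothetical, and this is not presented as the authors' implementation.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"

def transition_counts(seq, k):
    """Count (context, next_base) pairs, where context is the preceding k bases."""
    return Counter((seq[i:i + k], seq[i + k]) for i in range(len(seq) - k))

def conditional_relative_entropy(seq, model, k):
    """D(P_hat || Q | pi_hat) between the empirical order-k chain of `seq` and a model Q.

    `model` maps every k-base context to a dict of next-base probabilities (all > 0).
    """
    counts = transition_counts(seq, k)
    n = sum(counts.values())
    context_totals = Counter()
    for (ctx, _), c in counts.items():
        context_totals[ctx] += c
    d = 0.0
    for (ctx, nxt), c in counts.items():
        p_hat = c / context_totals[ctx]  # empirical P(next | context)
        # c / n equals pi_hat(context) * p_hat, the weight of this term
        d += (c / n) * math.log(p_hat / model[ctx][nxt])
    return d

def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k k-mers (zero for k-mers that never occur)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: counts[w] / total for w in map("".join, product(BASES, repeat=k))}

def euclidean_distance(f, g):
    """Plain L2 distance between two k-mer frequency profiles."""
    return math.sqrt(sum((f[w] - g[w]) ** 2 for w in f))

# Usage: compare a contig with a uniform order-1 model, and two contigs by k-mer profile.
uniform = {c: {b: 0.25 for b in BASES} for c in BASES}
print(conditional_relative_entropy("ACGTACGTACGT", uniform, k=1))
print(euclidean_distance(kmer_frequencies("AAAA", 1), kmer_frequencies("CCCC", 1)))
```

The design difference is visible in the code: conditional relative entropy compares next-base distributions context by context, weighting each context by how often it occurs in the contig, while the Euclidean distance treats every k-mer frequency as an independent coordinate.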
Findings
For perfect binning, contig length must scale with the inverse of the Chernoff information between the two most similar species.
Conditional relative entropy outperforms Euclidean distance in clustering.
Theoretical limits guide practical binning strategies.
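To make the 1/C scaling concrete, a minimal sketch computes the Chernoff information C between two Markov chains, assuming the standard large-deviation form for stationary, irreducible chains with strictly positive transition matrices: C = -min over λ in [0, 1] of log ρ(λ), where ρ(λ) is the Perron root of the matrix with entries P(i,j)^λ · Q(i,j)^(1-λ). The summary gives neither the constant in front of 1/C nor any dependence on the number of contigs, so the sketch stops at C and its inverse. The helper names and the example matrices are hypothetical.

```python
import math

def spectral_radius(A, iters=200):
    """Perron root of a strictly positive square matrix, by power iteration."""
    n = len(A)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = max(w)  # v is scaled so that max(v) == 1, hence max(w) -> rho
        v = [x / rho for x in w]
    return rho

def chernoff_information(P, Q, grid=1000):
    """Chernoff information between two Markov chains with strictly positive
    transition matrices P and Q (lists of rows):
        C = -min over lam in [0, 1] of log rho(lam),
    where rho(lam) is the Perron root of the matrix P[i][j]**lam * Q[i][j]**(1 - lam).
    The minimum is found by a simple grid search over lam.
    """
    n = len(P)
    best = 0.0  # log rho(0) = log rho(1) = 0, since P and Q are stochastic
    for t in range(grid + 1):
        lam = t / grid
        A = [[P[i][j] ** lam * Q[i][j] ** (1 - lam) for j in range(n)]
             for i in range(n)]
        best = min(best, math.log(spectral_radius(A)))
    return -best

# Usage with two hypothetical two-state chains.
P = [[0.9, 0.1], [0.2, 0.8]]
Q = [[0.6, 0.4], [0.4, 0.6]]
C = chernoff_information(P, Q)
print(C, 1 / C)  # Chernoff information and the 1/C length scale
```

When every row of P is identical (and likewise for Q), the chains are i.i.d. and the formula reduces to the classical Chernoff information between the two row distributions; the closer the two species' chains, the smaller C and the longer the contigs must be.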
Abstract
The goal of metagenomics is to study the composition of microbial communities, typically using high-throughput shotgun sequencing. In the metagenomic binning problem, we observe random substrings (called contigs) from a mixture of genomes and want to cluster them according to their genome of origin. Based on the empirical observation that genomes of different bacterial species can be distinguished based on their tetranucleotide frequencies, we model this task as the problem of clustering N sequences generated by M distinct Markov processes, where M ≪ N. Utilizing the large-deviation principle for Markov processes, we establish the information-theoretic limit for perfect binning. Specifically, we show that the length of the contigs must scale with the inverse of the Chernoff Information between the two most similar species. Our result also implies that contigs should be binned using the…
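To make the modeling assumption in the abstract concrete, a minimal sketch draws a contig from an order-k Markov chain over ACGT and re-estimates the chain from the contig's (k+1)-mer counts; with k = 3 those counts are exactly the tetranucleotide frequencies the abstract refers to. The helper names, the uniform choice of the first k bases, and the additive pseudocount are assumptions made for illustration, not details taken from the paper.

```python
import random
from collections import Counter
from itertools import product

BASES = "ACGT"

def sample_markov_contig(model, k, length, rng):
    """Draw a contig of `length` bases (k >= 1, length >= k) from an order-k Markov chain.

    `model` maps every k-base context to a dict of next-base probabilities.
    The first k bases are drawn uniformly (an assumption made for illustration).
    """
    seq = [rng.choice(BASES) for _ in range(k)]
    while len(seq) < length:
        probs = model["".join(seq[-k:])]
        seq.append(rng.choices(BASES, weights=[probs[b] for b in BASES])[0])
    return "".join(seq)

def estimate_transitions(seq, k, pseudocount=1.0):
    """Estimate order-k transition probabilities from (k+1)-mer counts.

    With k = 3 the counts are tetranucleotide counts; a positive pseudocount
    keeps every probability > 0 and avoids empty contexts.
    """
    counts = Counter(seq[i:i + k + 1] for i in range(len(seq) - k))
    model = {}
    for ctx in map("".join, product(BASES, repeat=k)):
        row = {b: counts[ctx + b] + pseudocount for b in BASES}
        total = sum(row.values())
        model[ctx] = {b: c / total for b, c in row.items()}
    return model

# Usage: a hypothetical deterministic cycle A -> C -> G -> T -> A (order 1).
cycle = {"A": {"A": 0, "C": 1, "G": 0, "T": 0},
         "C": {"A": 0, "C": 0, "G": 1, "T": 0},
         "G": {"A": 0, "C": 0, "G": 0, "T": 1},
         "T": {"A": 1, "C": 0, "G": 0, "T": 0}}
contig = sample_markov_contig(cycle, k=1, length=20, rng=random.Random(0))
print(contig)
print(estimate_transitions(contig, k=1, pseudocount=0.0)["A"])
```

In the binning setting, each of the M species would supply its own `model`, and the N contigs are the strings to be grouped back by their source chain.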
Taxonomy
Topics: Gut microbiota and health · Genomics and Phylogenetic Studies · Probiotics and Fermented Foods
