RepBin: Constraint-based Graph Representation Learning for Metagenomic Binning
Hansheng Xue, Vijini Mallawaarachchi, Yujia Zhang, Vaibhav Rajan, Yu Lin

TL;DR
RepBin introduces a novel graph-based approach for metagenomic binning that incorporates biological constraints and heterophilous signals, significantly improving clustering accuracy in complex microbial communities.
Contribution
This paper presents a new constraint-based graph representation learning and clustering framework specifically designed for metagenomic binning, addressing biological constraints and skewed cluster sizes.
Findings
Outperforms existing binning methods on real and synthetic datasets.
Effectively incorporates biological constraints into graph learning.
Advances state-of-the-art in both metagenomics and graph representation learning.
Abstract
Mixed communities of organisms are found in many environments (from the human gut to marine ecosystems) and can have profound impact on human health and the environment. Metagenomics studies the genomic material of such communities through high-throughput sequencing that yields DNA subsequences for subsequent analysis. A fundamental problem in the standard workflow, called binning, is to discover clusters, of genomic subsequences, associated with the unknown constituent organisms. Inherent noise in the subsequences, various biological constraints that need to be imposed on them and the skewed cluster size distribution exacerbate the difficulty of this unsupervised learning problem. In this paper, we present a new formulation using a graph where the nodes are subsequences and edges represent homophily information. In addition, we model biological constraints providing heterophilous…
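The graph formulation in the abstract can be illustrated with a small sketch. This is not the RepBin implementation: it only shows one way to score a candidate binning against the two kinds of signal the abstract names. It assumes that homophily edges link subsequences likely to share a bin and that the constraints can be written as pairs that should not share a bin (the abstract is truncated before it defines them, so that form is an assumption). The names `edge_agreement` and `constraint_violations`, and the toy contig identifiers, are hypothetical.

```python
def edge_agreement(labels, edges):
    """Fraction of homophily edges whose two endpoints fall in the same bin."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def constraint_violations(labels, cannot_link):
    """Number of constraint pairs (assumed 'should not co-occur') placed in one bin."""
    return sum(1 for u, v in cannot_link if labels[u] == labels[v])


# Toy data (hypothetical): five subsequences, three homophily edges,
# and two assumed heterophilous constraint pairs.
edges = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
cannot_link = [("c1", "c4"), ("c3", "c5")]

# A binning that respects both signals.
good_bins = {"c1": 0, "c2": 0, "c3": 0, "c4": 1, "c5": 1}

# A binning that moves c4 into bin 0, breaking one edge and one constraint.
bad_bins = {"c1": 0, "c2": 0, "c3": 0, "c4": 0, "c5": 1}

good_score = (edge_agreement(good_bins, edges), constraint_violations(good_bins, cannot_link))
bad_score = (edge_agreement(bad_bins, edges), constraint_violations(bad_bins, cannot_link))
```

A learning method in this setting would aim for high edge agreement and few constraint violations at once; the two functions here only measure those quantities and do not perform any representation learning or clustering.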
Taxonomy
Topics: Bioinformatics and Genomic Networks · Genomics and Phylogenetic Studies · Gene expression and cancer classification
