MetaPar: Metagenomic Sequence Assembly via Iterative Reclassification
Minji Kim, Jonathan G. Ligo, Amin Emad, Farzad Farnoud (Hassanzadeh), Olgica Milenkovic, Venugopal V. Veeravalli

TL;DR
MetaPar is a parallel algorithm for metagenomic sequence assembly that reduces processing time and memory usage by iterative read reclassification and selective assembly, enabling large dataset analysis on low-resource computers.
Contribution
It introduces a novel iterative reclassification approach for metagenomic assembly that improves efficiency and scalability compared to existing methods.
Findings
Evaluated on synthetic data from 15 randomly chosen NCBI species, using effective gap and effective coverage metrics
Reduces assembly time and memory usage
Compatible with existing assemblers like Velvet and IDBA-UD
Abstract
We introduce a parallel algorithmic architecture for metagenomic sequence assembly, termed MetaPar, which allows for significant reductions in assembly time and consequently enables the processing of large genomic datasets on computers with limited memory. The gist of the approach is to iteratively perform read (re)classification based on phylogenetic marker genes and assembler outputs generated from random subsets of metagenomic reads. Once a sufficiently accurate classification within genera is performed, de novo metagenomic assemblers (such as Velvet or IDBA-UD) or reference-based assemblers may be used for contig construction. We analyze the performance of MetaPar on synthetic data consisting of 15 randomly chosen species from the NCBI database through the effective gap and effective coverage metrics.
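The iterative (re)classification loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the k-mer overlap classifier, the `toy_assemble` stand-in (which only pools k-mers and does not call Velvet or IDBA-UD), and all function and parameter names are assumptions made for illustration. The sketch seeds each bin with a marker sequence, "assembles" a random subset of the reads currently assigned to that bin, and then reclassifies every read against the enlarged bins.

```python
import random


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(reads, bin_kmers, k):
    """Label each read with the bin sharing the most k-mers, or None if no bin shares any."""
    labels = []
    for read in reads:
        read_kmers = kmers(read, k)
        scores = {name: len(read_kmers & known) for name, known in bin_kmers.items()}
        best = max(scores, key=scores.get, default=None)
        labels.append(best if best is not None and scores[best] > 0 else None)
    return labels


def toy_assemble(reads, k):
    """Hypothetical stand-in for a real assembler: pools the k-mers of the given reads."""
    pooled = set()
    for read in reads:
        pooled |= kmers(read, k)
    return pooled


def iterative_reclassify(reads, markers, k=4, subset_fraction=0.5, rounds=2, seed=0):
    """Alternate between assembling random per-bin subsets and reclassifying all reads."""
    rng = random.Random(seed)
    # Seed every bin with the k-mers of its (assumed) marker sequence.
    bin_kmers = {name: kmers(seq, k) for name, seq in markers.items()}
    labels = classify(reads, bin_kmers, k)
    for _ in range(rounds):
        for name in markers:
            members = [i for i, label in enumerate(labels) if label == name]
            if not members:
                continue
            n = max(1, int(len(members) * subset_fraction))
            subset = [reads[i] for i in rng.sample(members, n)]
            bin_kmers[name] |= toy_assemble(subset, k)
        labels = classify(reads, bin_kmers, k)
    return labels


# Two made-up "genomes" over disjoint alphabets, so their k-mers never collide.
genome_a = "AAAAACCCCCACACAC"
genome_b = "GGGGGTTTTTGTGTGT"
markers = {"a": genome_a[:6], "b": genome_b[:6]}
reads = [g[i:i + 8] for g in (genome_a, genome_b) for i in (0, 4, 8)]

initial = iterative_reclassify(reads, markers, rounds=0)
final = iterative_reclassify(reads, markers, subset_fraction=1.0, rounds=2)
```

In this toy input, only the reads overlapping a marker are classified at first; each round extends the bins by one overlapping read, so after two rounds with `subset_fraction=1.0` every read is assigned to its source genome. With a smaller `subset_fraction` the outcome depends on which reads are sampled, which is the trade-off the random-subset step introduces.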
Taxonomy
Topics: Genomics and Phylogenetic Studies · Gene expression and cancer classification · Algorithms and Data Compression
