Extreme Scale De Novo Metagenome Assembly
Evangelos Georganas, Rob Egan, Steven Hofmeyr, Eugene Goltsman, Bill Arndt, Andrew Tritt, Aydin Buluc, Leonid Oliker, Katherine Yelick

TL;DR
MetaHipMer is a scalable, high-performance metagenome assembler that handles terabyte-scale datasets, matches or exceeds the accuracy of existing tools, scales efficiently to many processors, and enables the assembly of previously intractable metagenomes.
Contribution
We introduce MetaHipMer, a novel parallel metagenome assembler capable of assembling terabyte-scale datasets using an iterative de Bruijn graph approach.
Findings
MetaHipMer matches or exceeds the accuracy of state-of-the-art tools.
It scales efficiently to large numbers of processors.
It produced the first full assembly of the Twitchell Wetlands metagenome dataset.
Abstract
Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiome's genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is parallelized end to end using the Unified Parallel C language and can therefore run seamlessly on both shared- and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms the state-of-the-art…
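To illustrate what an "iterative de Bruijn graph approach" means in general terms, here is a minimal serial sketch. It assumes a simplified scheme, loosely in the style of iterative-k assemblers: for each k in an increasing list, count k-mers in the reads, drop low-count k-mers as likely sequencing errors (the `min_count` threshold is a hypothetical parameter), add the k-mers of the previous round's contigs, and walk unambiguous paths to form contigs. The function names are hypothetical, reverse complements are ignored, and this is not MetaHipMer's actual implementation, which is distributed and far more elaborate.

```python
from collections import Counter

BASES = "ACGT"

def count_kmers(seqs, k):
    """Count every length-k substring across the input sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def successors(km, kmers):
    # k-mers that overlap km by k-1 bases on the right
    return [km[1:] + b for b in BASES if km[1:] + b in kmers]

def predecessors(km, kmers):
    # k-mers that overlap km by k-1 bases on the left
    return [b + km[:-1] for b in BASES if b + km[:-1] in kmers]

def extend(seed, kmers, used, forward):
    """Follow the path from seed while it is unambiguous in both directions."""
    path, cur = [], seed
    while True:
        nxt = successors(cur, kmers) if forward else predecessors(cur, kmers)
        if len(nxt) != 1:
            break  # dead end or fork
        cand = nxt[0]
        back = predecessors(cand, kmers) if forward else successors(cand, kmers)
        if len(back) != 1 or cand in used:
            break  # merge point, or already consumed (e.g. a cycle)
        used.add(cand)
        path.append(cand)
        cur = cand
    return path

def build_contigs(kmers):
    """Spell out maximal unambiguous paths (unitigs) of the k-mer graph."""
    used, contigs = set(), []
    for seed in sorted(kmers):
        if seed in used:
            continue
        used.add(seed)
        right = extend(seed, kmers, used, forward=True)
        left = extend(seed, kmers, used, forward=False)
        contig = ("".join(km[0] for km in reversed(left)) + seed
                  + "".join(km[-1] for km in right))
        contigs.append(contig)
    return contigs

def iterative_assembly(reads, ks, min_count=2):
    """Rebuild the graph at each k, carrying forward the previous contigs."""
    contigs = []
    for k in ks:
        counts = count_kmers(reads, k)
        # Error filtering: keep only k-mers seen at least min_count times.
        solid = {km for km, c in counts.items() if c >= min_count}
        # Assumption: the previous round's contigs are trusted extra input.
        for c in contigs:
            for i in range(len(c) - k + 1):
                solid.add(c[i:i + k])
        contigs = build_contigs(solid)
    return contigs
```

For example, `iterative_assembly(["ATGCGTAC", "ATGCGTAC", "ATGCTTAC"], ks=[4, 5])` discards the k-mers unique to the third read, which look like a sequencing error because each is seen only once, and reconstructs the single contig `ATGCGTAC`. Small k values connect low-coverage regions, while larger k values resolve short repeats; carrying contigs forward lets later rounds keep what earlier rounds recovered. In a distributed setting, tables like `counts` would be partitioned across processes rather than held in one dictionary.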
