Extreme Scale De Novo Metagenome Assembly

Evangelos Georganas; Rob Egan; Steven Hofmeyr; Eugene Goltsman; Bill Arndt; Andrew Tritt; Aydin Buluc; Leonid Oliker; Katherine Yelick

arXiv:1809.07014·cs.DC·September 20, 2018

TL;DR

MetaHipMer is a scalable, high-performance metagenome assembler that handles terabyte-scale datasets, matches or exceeds existing tools in accuracy, and enables the assembly of metagenomes that were previously intractable.

Contribution

We introduce MetaHipMer, a novel parallel metagenome assembler capable of assembling terabyte-scale datasets using an iterative de Bruijn graph approach.

Findings

01

MetaHipMer matches or exceeds the accuracy of state-of-the-art tools.

02

It scales efficiently to large numbers of processors.

03

It completed the first full assembly of the Twitchell Wetlands metagenome dataset.

Abstract

Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiomes' genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is parallelized end to end using the Unified Parallel C language and can therefore run seamlessly on shared- and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms the state-of-the-art…
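The iterative de Bruijn graph idea mentioned in the abstract can be illustrated with a small single-process sketch. This is not MetaHipMer's implementation (which is written in Unified Parallel C and runs on distributed memory); it is a toy under simplifying assumptions: no reverse complements, no paired reads, no scaffolding, and purely cyclic paths are ignored. The function names `count_kmers`, `build_contigs`, and `iterative_assemble`, the `min_count` threshold, and the example sequences are all hypothetical. Each round filters out rare k-mers as likely errors, walks unbranched paths to form contigs, and feeds those contigs into the next round with a larger k.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every length-k substring across all input sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def build_contigs(kmers):
    """Spell out the unbranched paths of a de Bruijn graph whose nodes are k-mers.

    Two k-mers are connected when they overlap by k-1 bases.
    Toy simplification: paths that form a pure cycle are never emitted.
    """
    kmers = set(kmers)

    def successors(km):
        return [km[1:] + b for b in "ACGT" if km[1:] + b in kmers]

    def predecessors(km):
        return [b + km[:-1] for b in "ACGT" if b + km[:-1] in kmers]

    contigs, seen = [], set()
    for km in sorted(kmers):
        if km in seen:
            continue
        preds = predecessors(km)
        if len(preds) == 1 and len(successors(preds[0])) == 1:
            continue  # interior of a path; it is reached from the path's start
        contig, cur = km, km
        seen.add(km)
        while True:
            succ = successors(cur)
            if len(succ) != 1:
                break  # dead end or fork
            nxt = succ[0]
            if len(predecessors(nxt)) != 1 or nxt in seen:
                break  # merge point, or already used
            contig += nxt[-1]
            seen.add(nxt)
            cur = nxt
        contigs.append(contig)
    return contigs


def iterative_assemble(reads, k_values, min_count=2):
    """Run one assembly round per k, carrying contigs forward to the next round."""
    contigs = []
    for k in k_values:
        counts = count_kmers(reads, k)
        # k-mers seen fewer than min_count times are treated as likely errors.
        solid = {km for km, c in counts.items() if c >= min_count}
        # k-mers from the previous round's contigs are trusted regardless of count.
        solid |= set(count_kmers(contigs, k))
        contigs = build_contigs(solid)
    return contigs


if __name__ == "__main__":
    genome = "ATGGCGTTGGCA"              # made-up sequence; TGGC occurs twice
    reads = [genome, genome, "ATGGCGCTGGCA"]  # last read carries one substitution
    print(iterative_assemble(reads, [4]))
    print(iterative_assemble(reads, [4, 6]))
```

With k = 4 alone, the repeated `TGGC` forks the graph and the toy input breaks into four contigs. Adding a round at k = 6 spans the repeat and recovers the full 12-base sequence, while the k-mers from the erroneous read are dropped because each occurs only once.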
