MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Dinghua Li; Chi-Man Liu; Ruibang Luo; Kunihiko Sadakane; Tak-Wah Lam

arXiv:1409.7208·q-bio.GN·December 24, 2014·Bioinform.·457 cites

Open Access

TL;DR

MEGAHIT is a de novo assembler for large, complex metagenomics datasets that runs on a single computing node and is built on succinct de Bruijn graphs. On a 252 Gbp soil dataset it produced a larger assembly with longer contigs than the previous assembly.

Contribution

It introduces a single-node assembler that handles large, complex metagenomics data as a whole, without pre-processing such as partitioning or normalization that might compromise result integrity.

Findings

01

Assembled a 252 Gbp soil metagenomics dataset on a single computing node in 44.1 hours with a GPU and 99.6 hours without one.

02

Generated an assembly 3 times larger than the previous assembly, with longer contig N50 and average contig length.

03

Aligned 55.8% of the reads to the assembly, a rate 4 times higher than for the previous assembly.

Abstract

MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset of 252 Gbp in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., it avoids pre-processing such as partitioning and normalization, which might compromise result integrity. MEGAHIT generates an assembly 3 times larger than the previous assembly, with longer contig N50 and average contig length. 55.8% of the reads were aligned to the assembly, a rate 4 times higher than for the previous assembly. The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under the GPLv3 license.
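
The abstract compares assemblies by contig N50 and average contig length. As a minimal illustrative sketch (not part of MEGAHIT, and using made-up contig lengths), the two statistics can be computed as follows. The function names `n50` and `mean_length` are hypothetical, and N50 is taken in its usual sense: the largest length L such that contigs of length at least L together cover at least half of the total assembly length.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (hypothetical helper)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


def mean_length(contig_lengths: Iterable[int]) -> float:
    """Return the average contig length (hypothetical helper)."""
    lengths = list(contig_lengths)
    if not lengths:
        raise ValueError("at least one contig length is required")
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Illustrative lengths only; these are not values from the paper.
    example = [100, 200, 300, 400, 500]
    print("N50:", n50(example))
    print("Mean contig length:", mean_length(example))
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is reported alongside total assembly size and average contig length.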

Taxonomy

Topics: Genomics and Phylogenetic Studies · Gene expression and cancer classification · Microbial Community Ecology and Physiology