DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies
Chengxi Ye, Chris Hill, Shigang Wu, Jue Ruan, Zhanshan (Sam) Ma

TL;DR
This paper introduces a hybrid genome assembly method that combines next generation sequencing (NGS) and third generation sequencing (3GS) data, enabling efficient assembly of large genomes despite the high error rates of long reads while reducing sequencing costs.
Contribution
It presents a novel hybrid assembly approach that uses NGS and 3GS data together, relying on a compact representation of the long reads and a conversion from de Bruijn graphs to overlap graphs.
Findings
Assembles mammalian-sized genomes efficiently and faster than existing methods.
Reduces sequencing costs by approximately 50%.
Successfully handles high-error long reads from third-generation sequencing.
Abstract
(An updated version of this manuscript was accepted to Scientific Reports in 2016; please refer to http://www.nature.com/articles/srep31900) The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS…
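To make principle (i) concrete, here is a minimal sketch of one way a compact representation could work. It assumes each long read is reduced to the ordered list of NGS contig identifiers it shares k-mers with, and that overlaps are then found on those short identifier lists rather than on raw bases. This is an illustration under those assumptions, not the DBG2OLC implementation: the function names, the `min_hits` threshold, and the exact suffix-prefix matching rule are all hypothetical.

```python
def build_kmer_index(contigs, k):
    """Map each k-mer to the one contig containing it.

    k-mers that occur in more than one contig are dropped, since they
    cannot identify a contig unambiguously.
    """
    index = {}
    ambiguous = set()
    for cid, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index and index[kmer] != cid:
                ambiguous.add(kmer)
            index[kmer] = cid
    for kmer in ambiguous:
        del index[kmer]
    return index


def compress_read(read, index, k, min_hits=2):
    """Reduce a long read to the ordered list of contig IDs it hits.

    k-mers broken by base-level errors simply fail to match and are
    skipped. min_hits is a hypothetical noise filter: a contig is kept
    only if a run of at least min_hits k-mers supports it.
    """
    runs = []  # each entry is [contig_id, number_of_supporting_kmers]
    for i in range(len(read) - k + 1):
        cid = index.get(read[i:i + k])
        if cid is None:
            continue
        if runs and runs[-1][0] == cid:
            runs[-1][1] += 1
        else:
            runs.append([cid, 1])
    compact = []
    for cid, hits in runs:
        # Drop weak runs, then merge neighbours that became adjacent.
        if hits >= min_hits and (not compact or compact[-1] != cid):
            compact.append(cid)
    return compact


def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of list a equal to a prefix of list b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


# Toy data: three made-up contigs and two reads spanning contig pairs.
contigs = {"c1": "ACGTACGGTA", "c2": "TTGACCAGTC", "c3": "GGCATCGATT"}
index = build_kmer_index(contigs, k=4)

read_a = "ACGTATGGTATTGACCAGTC"  # c1 followed by c2, one substitution in c1
read_b = "TTGACCAGTCGGCATCGATT"  # c2 followed by c3

compact_a = compress_read(read_a, index, k=4)
compact_b = compress_read(read_b, index, k=4)
shared = suffix_prefix_overlap(compact_a, compact_b)
```

In this toy case the substitution in `read_a` destroys several k-mers, yet the read still compresses to the same contig list it would have had without the error. That is the intuition behind skipping base-level errors: the comparison happens on a handful of identifiers instead of thousands of noisy bases.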
