TL;DR
This paper introduces a practical method for mapping sequencing reads directly onto de Bruijn graphs, improving mapping efficiency and coverage over traditional contig-based approaches, especially for complex eukaryotic genomes.
Contribution
It formally defines read mapping on de Bruijn graphs, shows that the problem is NP-complete, and presents GGMAP, a pipeline that uses the BGREAT heuristic to map reads efficiently.
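To make the problem statement concrete, here is a minimal sketch of exact read mapping on a node-centric de Bruijn graph, where nodes are k-mers and an edge joins any two k-mers that overlap by k-1 characters. It is an illustration under simplifying assumptions, not the GGMAP or BGREAT algorithm: it allows no mismatches, and the function names, toy sequences, and hand-written contig list are all hypothetical.

```python
def build_dbg(sequences, k):
    """Collect the k-mers of the input sequences as graph nodes.

    Assumption: node-centric de Bruijn graph, so edges are implicit
    (two nodes are linked whenever they overlap by k-1 characters).
    """
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def maps_on_graph(read, kmers, k):
    """Exact mapping: the read spells a path iff all of its k-mers are nodes.

    Consecutive k-mers of a read always overlap by k-1 characters, so the
    edges along the path exist automatically in a node-centric graph.
    """
    if len(read) < k:
        return False
    return all(read[i:i + k] in kmers for i in range(len(read) - k + 1))


def maps_on_contigs(read, contigs):
    """Exact mapping on contigs: the read must be a substring of one contig."""
    return any(read in contig for contig in contigs)


# Toy data (hypothetical): two sequences that share the middle "ACGT".
k = 3
kmers = build_dbg(["AACGTC", "GACGTT"], k)

# Hand-written non-branching paths of that graph, standing in for contigs.
contigs = ["AAC", "GAC", "ACGT", "GTC", "GTT"]

# This read enters through one branch and leaves through another.
read = "AACGTT"
on_graph = maps_on_graph(read, kmers, k)
on_contigs = maps_on_contigs(read, contigs)
```

In this toy case the read follows a valid path through the graph but crosses two branching nodes, so no single contig contains it. This is the kind of sequence information that the set of contigs does not record.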
Findings
Up to 22% more reads mapped on the graph than on contigs.
GGMAP can process millions of reads per CPU hour.
The problem of mapping on de Bruijn graphs is NP-complete.
Abstract
Background: Next Generation Sequencing (NGS) has dramatically enhanced our ability to sequence genomes, but not to assemble them. In practice, many published genome sequences remain in the state of a large set of contigs. Each contig describes the sequence found along some path of the assembly graph; however, the set of contigs does not record all the sequence information contained in that graph. Although many subsequent analyses can be performed with the set of contigs, one may ask whether mapping reads on the contigs is as informative as mapping them on the paths of the assembly graph. Currently, one lacks practical tools to perform mapping on such graphs.
Results: Here, we propose a formal definition of mapping on a de Bruijn graph, analyse the problem complexity, which turns out to be NP-complete, and provide a practical solution. We propose a pipeline called GGMAP (Greedy Graph…
