Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs

Antoine Limasset; Jean-Francois Flot; Pierre Peterlongo

arXiv:1711.03336 · cs.DS · April 8, 2020

TL;DR

This paper introduces Bcool, a scalable method that corrects short reads by mapping them onto a filtered de Bruijn graph; it yields more accurate reads than existing k-mer-spectrum correctors and scales to large genomic datasets.

Contribution

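The paper presents a short-read correction approach that maps each read, as a whole, onto a cleaned de Bruijn graph instead of treating it as a set of independent k-mers. The corrected reads are more accurate than those produced by k-mer-spectrum correctors, and the method scales to human-size genomes.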

Findings

1. Bcool achieves higher correction accuracy than k-mer spectrum correctors.
2. The method scales efficiently to human-sized genomes.
3. Open-source implementation available for broad use.

Abstract

Motivations: Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well on large data sets or consider reads as mere sequences of k-mers, without taking into account their full-length read information.

Results: We propose a new method to correct short reads using de Bruijn graphs, and implement it as a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This graph is filtered on the basis of k-mer abundance, then of unitig abundance, thereby removing most sequencing errors. The cleaned graph is then used as a reference onto which the reads are mapped to correct them. We show that this approach yields more accurate reads than k-mer-spectrum correctors while being scalable to human-size genomic…
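
The three steps in the abstract (build a de Bruijn graph from the reads, filter it by k-mer abundance and then by unitig abundance, map the reads onto the cleaned graph) can be illustrated with a small Python sketch. It is a toy built on our own simplifying assumptions, not Bcool's implementation: it ignores reverse complements, keeps the graph as a plain set of k-mers and computes unitigs only for filtering, takes a unitig's abundance to be the mean count of its k-mers, and "maps" a read by exhaustively searching for the same-length graph path with the fewest substitutions (no indels). All function names and the thresholds `kmer_min`, `unitig_min` and `max_mm` are hypothetical.

```python
from collections import Counter

BASES = "ACGT"  # assumption: reads contain only A, C, G, T


def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, a simplification)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def successors(kmer, kmers):
    return [kmer[1:] + b for b in BASES if kmer[1:] + b in kmers]


def predecessors(kmer, kmers):
    return [b + kmer[:-1] for b in BASES if b + kmer[:-1] in kmers]


def unitigs(kmers):
    """Split a k-mer set into maximal non-branching paths (unitigs)."""
    seen, paths = set(), []

    def walk(start):
        path, cur = [start], start
        seen.add(start)
        while True:
            succ = successors(cur, kmers)
            if len(succ) != 1:
                break  # dead end or branching node
            nxt = succ[0]
            if nxt in seen or len(predecessors(nxt, kmers)) != 1:
                break  # next k-mer is a merge point or closes a cycle
            path.append(nxt)
            seen.add(nxt)
            cur = nxt
        return path

    def starts_unitig(km):
        preds = predecessors(km, kmers)
        return len(preds) != 1 or len(successors(preds[0], kmers)) != 1

    for km in sorted(kmers):
        if km not in seen and starts_unitig(km):
            paths.append(walk(km))
    for km in sorted(kmers):  # anything left lies on an isolated cycle
        if km not in seen:
            paths.append(walk(km))
    return paths


def clean_graph(reads, k, kmer_min, unitig_min):
    """Count k-mers, keep solid ones, then drop low-abundance unitigs."""
    counts = count_kmers(reads, k)
    solid = {km for km, c in counts.items() if c >= kmer_min}
    kept = set()
    for path in unitigs(solid):
        # assumption: unitig abundance = mean count of its k-mers
        if sum(counts[km] for km in path) / len(path) >= unitig_min:
            kept.update(path)
    return kept


def correct_read(read, graph, k, max_mm):
    """Replace the read by the same-length graph path with the fewest
    substitutions (at most max_mm); return it unchanged if none exists."""
    if len(read) < k:
        return read
    best_mm, best_seq = max_mm + 1, None

    def extend(seq, mm):
        nonlocal best_mm, best_seq
        if mm >= best_mm:
            return  # this branch cannot beat the best path found so far
        if len(seq) == len(read):
            best_mm, best_seq = mm, seq
            return
        for b in BASES:  # follow every outgoing edge of the last k-mer
            if seq[len(seq) - k + 1:] + b in graph:
                extend(seq + b, mm + (b != read[len(seq)]))

    for km in sorted(graph):  # try every k-mer as the path's first node
        extend(km, sum(x != y for x, y in zip(km, read)))
    return best_seq if best_seq is not None else read


genome = "ATGGCGTACGTTAGCCTAGGATCCGATTAC"
windows = [genome[i:i + 15] for i in range(len(genome) - 14)]
erroneous = "GTACGTTAGTCTAGG"  # genome[5:20] with a C->T error at index 9
reads = windows * 2 + [erroneous]
graph = clean_graph(reads, k=7, kmer_min=2, unitig_min=3)
print(correct_read(erroneous, graph, k=7, max_mm=2))  # prints GTACGTTAGCCTAGG
```

In this toy data set every genome 7-mer is seen at least twice, while the 7-mers spanning the error are seen once, so `kmer_min=2` removes them and the read is mapped back onto the genome path. With `kmer_min=1` the erroneous k-mers stay in the graph and the read maps onto itself unchanged, which is why the filtering step matters. The exhaustive path search is exponential in the worst case and only stands in for a real read-to-graph aligner.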
