ReadsClean: a new approach to error correction of sequencing reads based on alignments clustering
Oleg Fokin (1), Anastasia Bakulina (1, 2), Igor Seledtsov (1), and Victor Solovyev (1)

TL;DR
ReadsClean introduces a novel alignment clustering method for error correction in Illumina sequencing reads, significantly improving genome assembly and SNP detection accuracy over existing tools.
Contribution
The paper presents a new error correction approach based on clustering alignments, implemented in the ReadsClean program, outperforming existing methods in accuracy.
Findings
ReadsClean achieves superior error correction in sequencing reads.
The method improves genome assembly quality.
ReadsClean is freely available for academic use.
Abstract
Motivation: Next-generation DNA sequencing methods produce a relatively high rate of read errors, which interfere with de novo genome assembly of newly sequenced organisms and particularly affect the quality of SNP detection, which is important for diagnosing many hereditary diseases. A number of programs have been developed for correcting errors in NGS reads. These programs use various approaches and are optimized for different specific tasks, but all of them are far from being able to correct all errors, especially in reads that span repeats and in reads from di/polyploid eukaryotic genomes.
Results: This paper describes a novel method of error correction based on clustering alignments of similar reads. The method is implemented in the ReadsClean program, which is designed for cleaning Illumina HiSeq sequencing reads. We compared ReadsClean to other reads cleaning…
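The abstract names the idea (clustering alignments of similar reads before correcting them) but does not give the algorithm in the text available here. The toy sketch below is therefore not the ReadsClean method. It only illustrates, under strong simplifying assumptions, why grouping reads first can help with repeats and di/polyploid genomes. It assumes the reads are already aligned to the same window and have equal length, and the names `informative_columns`, `cluster_reads`, `correct_reads`, and the `min_support` parameter are hypothetical.

```python
from collections import Counter


def informative_columns(reads, min_support=2):
    """Return columns where at least two different bases are each
    seen in >= min_support reads (likely true variants, not errors)."""
    cols = []
    for i in range(len(reads[0])):
        counts = Counter(r[i] for r in reads)
        if sum(1 for c in counts.values() if c >= min_support) >= 2:
            cols.append(i)
    return cols


def cluster_reads(reads, min_support=2):
    """Group read indices by their bases at the informative columns.
    This is a crude stand-in for a real clustering step: a read with an
    error in an informative column ends up alone in its own cluster."""
    cols = informative_columns(reads, min_support)
    clusters = {}
    for idx, read in enumerate(reads):
        key = tuple(read[i] for i in cols)
        clusters.setdefault(key, []).append(idx)
    return list(clusters.values())


def correct_reads(reads, min_support=2):
    """Replace each read with the per-column majority base of its cluster."""
    corrected = list(reads)
    for members in cluster_reads(reads, min_support):
        consensus = "".join(
            Counter(reads[m][i] for m in members).most_common(1)[0][0]
            for i in range(len(reads[0]))
        )
        for m in members:
            corrected[m] = consensus
    return corrected


# Two haplotypes that differ at column 3 (T vs A), plus two isolated errors.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACCT",  # error at column 6 (G -> C)
    "ACGAACGT",
    "ACGAACGT",
    "TCGAACGT",  # error at column 0 (A -> T)
]
print(correct_reads(reads))
```

In this example the two isolated errors are removed, while the T/A difference at column 3 is kept because each variant is supported by several reads. A single majority vote over all six reads would have no clear winner at that column, which is the kind of case the abstract singles out as hard for existing tools.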
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Chromosomal and Genetic Variations
