# ReadsClean: a new approach to error correction of sequencing reads based on alignments clustering

**Authors:** Oleg Fokin (1), Anastasia Bakulina (1, 2), Igor Seledtsov (1), and Victor Solovyev (1)

arXiv: 1907.12718 · 2019-07-31

## TL;DR

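ReadsClean corrects errors in Illumina HiSeq sequencing reads by clustering alignments of similar reads, targeting the errors that hamper de novo genome assembly and SNP detection. In the authors' assembly tests on actual and simulated reads, it gave better results than the read-cleaning programs it was compared against.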

## Contribution

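The paper presents a new error correction approach based on clustering alignments of similar reads, implemented in the ReadsClean program. In the authors' sequence assembly tests, it gave better results than the existing read-cleaning programs it was compared against.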
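To make the general idea concrete, here is a toy sketch of why clustering aligned reads before correcting them can matter for repeats and di/polyploid genomes. It is not the ReadsClean algorithm, whose details this summary does not give. It assumes the reads are already aligned to the same window, have equal length, and contain only substitution errors; the function names and the `min_support` parameter are hypothetical.

```python
from collections import Counter, defaultdict

def consensus_correct(reads, members):
    """Replace each base by the strict-majority base of its column within `members`."""
    columns = [Counter(reads[m][i] for m in members) for i in range(len(reads[0]))]
    out = {}
    for m in members:
        bases = []
        for i, counts in enumerate(columns):
            base, n = counts.most_common(1)[0]
            # Only overwrite when the top base has a strict majority in this group.
            bases.append(base if 2 * n > len(members) else reads[m][i])
        out[m] = "".join(bases)
    return out

def cluster_by_variant_columns(reads, min_support=2):
    """Group reads by their bases at columns where two or more bases are well supported."""
    informative = [
        i for i in range(len(reads[0]))
        if sum(c >= min_support for c in Counter(r[i] for r in reads).values()) >= 2
    ]
    groups = defaultdict(list)
    for idx, read in enumerate(reads):
        groups[tuple(read[i] for i in informative)].append(idx)
    return list(groups.values())

def correct_reads(reads, min_support=2):
    """Correct each read against the consensus of its own cluster only."""
    corrected = list(reads)
    for members in cluster_by_variant_columns(reads, min_support):
        for idx, seq in consensus_correct(reads, members).items():
            corrected[idx] = seq
    return corrected

# Two genuine sequence copies (e.g. two haplotypes) differing at position 3.
reads = [
    "ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTACCT",  # copy A, last read has an error
    "ACGAACGT", "ACGAACGT", "TCGAACGT",              # copy B, last read has an error
]
clustered = correct_reads(reads)
naive = list(consensus_correct(reads, range(len(reads))).values())
```

In this toy input, `clustered` fixes the two isolated errors while keeping the T/A difference at position 3, whereas `naive` (one consensus over all reads) overwrites copy B with copy A. A real corrector would also need to handle indels, quality scores, and reads whose errors fall on the variant columns themselves.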

## Key findings

- ReadsClean is reported to correct errors more effectively than the other read-cleaning programs tested, on both actual and simulated reads.
- The method improves genome assembly quality.
- ReadsClean is freely available for academic use.

## Abstract

Motivation: Next-generation methods of DNA sequencing produce a relatively high rate of reading errors, which interfere with de novo genome assembly of newly sequenced organisms and particularly affect the quality of SNP detection, which is important for the diagnostics of many hereditary diseases. A number of programs have been developed for correcting errors in NGS reads. Such programs use various approaches and are optimized for different specific tasks, but all of them are far from being able to correct all errors, especially in sequencing reads that cross repeats and in reads from di/polyploid eukaryotic genomes.

Results: This paper describes a novel method of error correction based on clustering of alignments of similar reads. This method is implemented in the ReadsClean program, which is designed for cleaning Illumina HiSeq sequencing reads. We compared ReadsClean to other read-cleaning programs recognized as the best by several publications. Our sequence assembly tests using actual and simulated sequencing reads show superior results achieved by ReadsClean.

Availability and implementation: ReadsClean is implemented as standalone C code. It is incorporated in an error correction pipeline and is freely available to academic users at the Softberry web server, www.softberry.com.

---
Source: https://tomesphere.com/paper/1907.12718