RECKONER: Read Error Corrector Based on KMC
Maciej Dlugosz, Sebastian Deorowicz

TL;DR
RECKONER is a fast, memory-efficient error correction tool for genomic sequencing reads. It corrects high-error-rate data from a 300 Mbp eukaryotic genome using less than 4 GB of RAM in under 40 minutes on a 16-core CPU.
Contribution
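It introduces a new correction algorithm that builds on the KMC 2 k-mer counter and combines k-mer counts with quality indicators. The algorithm is designed for large genomes and low memory use, and its accuracy is better than or comparable to that of competing tools.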
Findings
Corrects high-error-rate data from a 300 Mbp eukaryotic genome in under 40 minutes on a 16-core CPU
Uses less than 4 GB of RAM during processing
Achieves better or comparable correction accuracy than competitors
Abstract
Motivation: Next-generation sequencing tools have made it possible to produce huge amounts of genomic information at low cost. Unfortunately, the presence of sequencing errors in such data affects the quality of downstream analyses, whose accuracy can be improved by performing error correction. Because of the huge amount of such data, correction algorithms have to be fast and memory-frugal, and must provide high accuracy of error detection and elimination for variously sized organisms. Results: We introduce a new algorithm for genomic data correction, capable of processing data with a high error rate from a eukaryotic genome of 300 Mbp using less than 4 GB of RAM in less than 40 minutes on a 16-core CPU. The algorithm corrects sequencing data at a better or comparable level than competitors. This was achieved by using the very robust KMC 2 k-mer counter, a new method of erroneous regions correction based on both…
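To make the role of k-mer counts concrete, here is a minimal Python sketch of the general idea behind k-mer-based error detection: count every k-mer across the reads, then flag positions in a read whose k-mers are rare. This is not RECKONER's algorithm and it does not use KMC 2. The function names, the toy reads, and the threshold `min_count` are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(read, counts, k, min_count):
    """Return start positions of k-mers in `read` seen fewer than `min_count` times.

    Rare k-mers are treated as likely to contain a sequencing error;
    `min_count` is an illustrative threshold, not a value from the paper.
    """
    return [
        i
        for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] < min_count
    ]


# Toy data: three identical reads and one read with a substitution at index 4.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts = count_kmers(reads, k=4)
print(weak_positions(reads[3], counts, k=4, min_count=2))  # prints [1, 2, 3, 4]
```

Every k-mer that overlaps the substituted base is rare, so the flagged start positions bracket the error. A real corrector would then search for a replacement base that turns those k-mers into frequent ones, and per the summary above RECKONER additionally takes quality indicators into account.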
