Gerbil: A Fast and Memory-Efficient $k$-mer Counter with GPU-Support
Marius Erbert, Steffen Rechner, Matthias Müller-Hannemann

TL;DR
Gerbil is an open-source, GPU-accelerated $k$-mer counting software optimized for large $k$, offering efficient performance and low memory usage in genome analysis, especially for long-read sequencing data.
Contribution
Gerbil introduces a two-disk approach combined with optional GPU support to efficiently count large $k$-mers, outperforming existing tools in speed and memory usage.
Findings
Outperforms state-of-the-art $k$-mer counters for large genomes
Supports large $k$ (≥32) efficiently with minimal performance loss
Uses a two-disk approach and GPU acceleration for improved scalability
Abstract
A basic task in bioinformatics is the counting of $k$-mers in genome strings. The $k$-mer counting problem is to build a histogram of all substrings of length $k$ in a given genome sequence. We present the open source $k$-mer counting software Gerbil that has been designed for the efficient counting of $k$-mers for $k \geq 32$. Given the technology trend towards long reads of next-generation sequencers, support for large $k$ becomes increasingly important. While existing $k$-mer counting tools suffer from excessive memory resource consumption or degrading performance for large $k$, Gerbil is able to efficiently support large $k$ without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored at a working disk. In a second step, the temporary files are read again, split into…
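To make the problem statement concrete, the following is a minimal Python sketch, not Gerbil's implementation. `count_kmers` builds the histogram of all length-$k$ substrings directly in memory. `count_kmers_partitioned` loosely mirrors the two-step idea from the abstract: substrings are first distributed to temporary files, and each file is then counted on its own. The function names, the CRC32-based distribution rule, the `num_files` parameter, and the text file format are all assumptions made for illustration; the abstract does not say how Gerbil assigns data to files or what happens after the files are read back.

```python
import os
import tempfile
import zlib
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Histogram of all length-k substrings of one sequence (in memory)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kmers_partitioned(reads, k: int, num_files: int = 4) -> Counter:
    """Two-step counting via temporary files (illustrative stand-in only)."""
    if k <= 0 or num_files <= 0:
        raise ValueError("k and num_files must be positive")
    total = Counter()
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, f"part_{i}.txt") for i in range(num_files)]

        # Step 1: distribute every k-mer to one temporary file.
        # Assumption: a CRC32 checksum picks the file, so equal k-mers
        # always land in the same file.
        files = [open(path, "w") for path in paths]
        try:
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    index = zlib.crc32(kmer.encode("ascii")) % num_files
                    files[index].write(kmer + "\n")
        finally:
            for handle in files:
                handle.close()

        # Step 2: read each temporary file again and count it independently.
        # Only one file's k-mers need to be examined at a time.
        for path in paths:
            with open(path) as handle:
                total.update(line.strip() for line in handle)
    return total


if __name__ == "__main__":
    print(count_kmers("ACGTACGT", 4))
    print(count_kmers_partitioned(["ACGTACGT", "TTTTT"], 4))
```

Because equal substrings always go to the same temporary file, the per-file counts can simply be added together, which is why the partitioned version returns the same histogram as the direct one.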
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · DNA and Biological Computing
