QuadRank: Engineering a High Throughput Rank
R. Groot Koerkamp

TL;DR
QuadRank introduces high-throughput, space-efficient rank data structures for DNA and binary alphabets, achieving near-cache-miss-free performance and significant speedups in bioinformatics applications.
Contribution
The paper extends recent inlining techniques to the DNA alphabet, enabling queries that incur a single cache miss and efficient batch processing with prefetching.
Findings
BiRank and QuadRank are 1.5x and 2x faster, respectively, than similar methods without inlining.
Prefetching yields an additional 2x speedup, limited by RAM bandwidth.
QuadRank with prefetching outperforms state-of-the-art FM-indexes, reducing size and increasing speed.
Abstract
Given a text, a rank query counts the number of occurrences of a given character within a prefix of the text of a given length. Space-efficient methods to answer these rank queries form an important building block in many succinct data structures. For example, the FM-index is a widely used data structure that uses rank queries to locate all occurrences of a pattern in a text. In bioinformatics applications, the goal is usually to process a given input as fast as possible. Thus, data structures should have high throughput when used with many threads. Contributions. For the binary alphabet, we develop BiRank with 3.28% space overhead. It merges the central ideas of two recent papers: (1) we interleave (inline) offsets in each cache line of the underlying bit vector [Laws et al., 2024], reducing cache misses, and (2) these offsets are to the middle of each block so that only…
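To make the rank query and the inlining idea concrete, here is a minimal Python sketch. It is not the paper's implementation. The names `naive_rank`, `Block`, `build`, and `rank1` are hypothetical, as is the 64-bit block size. As a simplification, each offset counts the 1-bits before the start of its block rather than up to the middle of the block. The point it illustrates is idea (1): the offset is stored next to the bits it belongs to, so a query reads a single block.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical block size for illustration; a real layout would size a block
# (offset plus bits) to fill exactly one cache line.
BLOCK_BITS = 64


def naive_rank(text: str, pos: int, ch: str) -> int:
    """Reference definition: occurrences of ch among the first pos characters."""
    return text[:pos].count(ch)


@dataclass
class Block:
    offset: int  # number of 1-bits in all earlier blocks, stored next to the bits
    bits: int    # up to BLOCK_BITS bits; bit i of the block is (bits >> i) & 1


def build(bits: List[int]) -> List[Block]:
    """Pack a list of 0/1 values into blocks with inlined cumulative offsets."""
    blocks = []
    ones_so_far = 0
    for start in range(0, len(bits), BLOCK_BITS):
        word = 0
        for i, b in enumerate(bits[start:start + BLOCK_BITS]):
            word |= (b & 1) << i
        blocks.append(Block(offset=ones_so_far, bits=word))
        ones_so_far += bin(word).count("1")
    return blocks


def rank1(blocks: List[Block], pos: int) -> int:
    """Number of 1-bits among the first pos bits, reading a single block."""
    if pos == 0:
        return 0
    # Locate the block holding bit pos - 1, the last bit that is counted.
    block_idx, last = divmod(pos - 1, BLOCK_BITS)
    block = blocks[block_idx]
    mask = (1 << (last + 1)) - 1  # keep bits 0..last of the block
    return block.offset + bin(block.bits & mask).count("1")
```

In this sketch a query touches only `blocks[block_idx]`, which mirrors the single memory access that inlining aims for. A separate offset array would instead require two lookups in different memory locations.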
