QuadRank: Engineering a High Throughput Rank

R. Groot Koerkamp

arXiv:2602.04103 · cs.DS · April 2, 2026

TL;DR

QuadRank introduces high-throughput, space-efficient rank data structures for DNA and binary alphabets, achieving near-cache-miss-free performance and significant speedups in bioinformatics applications.

Contribution

It extends recent inlining techniques to the DNA alphabet, enabling single cache miss queries and efficient batch processing with prefetching.
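As a rough illustration of what inlining means for the DNA alphabet, the sketch below stores, next to each block of 2-bit-packed characters, the counts of A, C, G, and T that precede the block. A query then reads a single block record. The class name `BlockedDnaRank`, the block size of 128 characters, and the tuple-based layout are assumptions made for this example; they are not the layout used in the paper, and Python cannot show the cache or prefetching behaviour itself.

```python
# Hypothetical 2-bit encoding of the DNA alphabet (an assumption for this sketch).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


class BlockedDnaRank:
    """Toy DNA rank structure: each block keeps its counts next to its data."""

    BLOCK_CHARS = 128  # hypothetical block size, not taken from the paper

    def __init__(self, text: str):
        self.n = len(text)
        self.blocks = []  # entries are (counts_before_block, packed_chars)
        counts = [0, 0, 0, 0]
        for start in range(0, self.n, self.BLOCK_CHARS):
            chunk = text[start:start + self.BLOCK_CHARS]
            packed = 0
            for i, ch in enumerate(chunk):
                packed |= CODE[ch] << (2 * i)  # 2 bits per character
            # Inline the running counts with the block they belong to.
            self.blocks.append((tuple(counts), packed))
            for ch in chunk:
                counts[CODE[ch]] += 1
        # Sentinel block so that q == n is always a valid lookup.
        self.blocks.append((tuple(counts), 0))

    def rank(self, q: int, c: str) -> int:
        """Count occurrences of c among the first q characters (0 <= q <= n)."""
        block, within = divmod(q, self.BLOCK_CHARS)
        counts, packed = self.blocks[block]  # one record holds all we need
        code = CODE[c]
        total = counts[code]
        for i in range(within):
            if (packed >> (2 * i)) & 3 == code:
                total += 1
        return total
```

For example, `BlockedDnaRank("ACGTACGT").rank(5, "A")` counts the two occurrences of `A` in `ACGTA`. A production implementation would replace the inner loop with word-level bit tricks and popcount instructions; the loop is kept here for readability.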

Findings

1. BiRank and QuadRank are 1.5x and 2x faster, respectively, than similar methods without inlining.

2. Prefetching yields an additional 2x speedup, limited by RAM bandwidth.

3. QuadRank with prefetching outperforms state-of-the-art FM-indexes, reducing size and increasing speed.

Abstract

Given a text, a query $\mathrm{rank}(q, c)$ counts the number of occurrences of character $c$ among the first $q$ characters of the text. Space-efficient methods to answer these rank queries form an important building block in many succinct data structures. For example, the FM-index is a widely used data structure that uses rank queries to locate all occurrences of a pattern in a text. In bioinformatics applications, the goal is usually to process a given input as fast as possible. Thus, data structures should have high throughput when used with many threads.

Contributions. For the binary alphabet, we develop BiRank with 3.28% space overhead. It merges the central ideas of two recent papers: (1) we interleave (inline) offsets in each cache line of the underlying bit vector [Laws et al., 2024], reducing cache misses, and (2) these offsets are to the middle of each block so that only…
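To make the rank definition and the idea of interleaved offsets concrete, here is a minimal sketch. `rank_naive` follows the definition directly, and `BlockedBitRank` is a toy binary structure in which each block record carries the number of 1 bits before it. The names, the 448-bit payload, and the choice to measure offsets from the start of each block are assumptions made for this example. The abstract describes offsets to the middle of each block, which this sketch does not reproduce, and the sizes here do not correspond to the 3.28% overhead quoted above.

```python
def rank_naive(text: str, q: int, c: str) -> int:
    """Reference definition: occurrences of c among the first q characters."""
    return text[:q].count(c)


class BlockedBitRank:
    """Toy binary rank: each block stores its offset together with its bits."""

    BLOCK_BITS = 448  # hypothetical payload size, not taken from the paper

    def __init__(self, bits):
        self.n = len(bits)
        self.blocks = []  # entries are (ones_before_block, payload_int)
        ones = 0
        for start in range(0, self.n, self.BLOCK_BITS):
            chunk = bits[start:start + self.BLOCK_BITS]
            payload = 0
            for i, b in enumerate(chunk):
                payload |= (b & 1) << i
            # The offset sits next to the payload, so one lookup fetches both.
            self.blocks.append((ones, payload))
            ones += sum(chunk)
        # Sentinel block so that q == n is always a valid lookup.
        self.blocks.append((ones, 0))

    def rank1(self, q: int) -> int:
        """Number of 1 bits among the first q bits (0 <= q <= n)."""
        block, within = divmod(q, self.BLOCK_BITS)
        offset, payload = self.blocks[block]
        mask = (1 << within) - 1  # keep only the first `within` bits
        return offset + bin(payload & mask).count("1")
```

For example, `rank_naive("ACGTACGT", 5, "A")` returns 2, and `BlockedBitRank([1, 0, 1, 1, 0]).rank1(4)` returns 3. In a compiled implementation the offset and payload would share one cache line, which is where the reduction in cache misses comes from.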
