Fast Prefix Search in Little Space, with Applications
Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, Sebastiano Vigna

TL;DR
This paper introduces space-efficient data structures for weak prefix searches that are faster and more cache-friendly than traditional methods, enabling applications like prefix counting with minimal space and constant query time.
Contribution
It presents novel, asymptotically space-optimal data structures for weak prefix searches with query times depending only on prefix length, outperforming previous solutions in speed and simplicity.
Findings
Data structures are asymptotically space-optimal.
Query time depends only on prefix length, down to constant time.
Applications include prefix counting and approximate tuple matching.
Abstract
It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand and predecessor/rank searches on the other, in that the former provably allows faster query resolution. Traditionally, prefix search is solved by data structures that are also dictionaries: they actually contain the strings in the indexed set S. For very large collections stored in slow-access memory, we propose much more compact data structures that support weak prefix searches: they return the ranks of matching strings provided that some string in S starts with the given prefix. In fact, we show that our most space-efficient data structure is asymptotically space-optimal. Previously, data structures such as String B-trees (and more complicated cache-oblivious string data structures) have implicitly supported weak prefix queries, but they all have…
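To make the query contract described in the abstract concrete, here is a minimal Python sketch of what a prefix search must return: the half-open interval [lo, hi) of ranks, in lexicographic order, of the strings that start with the prefix. It is a brute-force reference that keeps every string in a sorted list, so it is exactly the kind of dictionary the weak structures avoid and not the authors' data structure. The function name prefix_rank_interval and the use of Python strings over {0, 1} are illustrative assumptions.

```python
from bisect import bisect_left
from typing import List, Tuple


def prefix_rank_interval(sorted_strings: List[str], prefix: str) -> Tuple[int, int]:
    """Hypothetical reference helper (not from the paper).

    Returns the half-open rank interval [lo, hi) of the strings in
    `sorted_strings` (assumed sorted lexicographically) that start with
    `prefix`. A weak prefix search structure must return this same
    interval whenever it is non-empty; if no string starts with `prefix`,
    the weak structure's answer is unspecified.
    """
    # Strings starting with `prefix` form a contiguous block that begins
    # at the first string >= prefix in lexicographic order.
    lo = bisect_left(sorted_strings, prefix)

    # Truncating every string to len(prefix) characters keeps the list
    # sorted, so binary-search for the first string (at or after lo)
    # whose truncation is greater than the prefix.
    m = len(prefix)
    left, right = lo, len(sorted_strings)
    while left < right:
        mid = (left + right) // 2
        if sorted_strings[mid][:m] <= prefix:
            left = mid + 1
        else:
            right = mid
    return lo, left
```

For example, with S = ["0010", "0101", "0110", "0111", "1000"], the call prefix_rank_interval(S, "01") returns (1, 4), so the prefix count is hi - lo = 3. For the prefix "11" the reference returns the empty interval (5, 5); a weak structure may return any interval in that case, and this relaxation is what lets it avoid storing the strings themselves.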
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Network Packet Processing and Optimization
