Fast entropy-bounded string dictionary look-up with mismatches
Paweł Gawrychowski, Gad M. Landau, Tatiana Starikovskaya

TL;DR
This paper introduces a data structure for dictionary look-up with mismatches that extends the earlier work of Chan and Lewenstein (CPM 2015) to a much wider range of mismatch bounds k, with similar query time and extra space.
Contribution
It generalizes an existing entropy-bounded string dictionary look-up method to a much wider range of mismatch bounds k, with bounds similar to those of the earlier structure.
Findings
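Query time is O(m/w + log^k d + occ), where m is the string length, w the machine word size, d the number of dictionary strings, k the mismatch bound, and occ the output size.
Uses O(w d log^k d) extra bits of space, beyond the entropy-bounded space required to store the dictionary strings.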
Applicable to a broader range of mismatch scenarios in string dictionaries.
Abstract
We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of d strings of length m and an integer k, we must preprocess it into a data structure to answer the following queries: Given a query string Q of length m, find all strings in the dictionary that are at Hamming distance at most k from Q. Chan and Lewenstein (CPM 2015) showed a data structure for k = 1 with optimal query time O(m/w + occ), where w is the size of a machine word and occ is the size of the output. The data structure occupies O(w d log^{1+ε} d) extra bits of space (beyond the entropy-bounded space required to store the dictionary strings). In this work we give a solution with similar bounds for a much wider range of values k. Namely, we give a data structure that has query time O(m/w + log^k d + occ) and uses O(w d log^k d) extra bits…
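To make the query in the abstract concrete, here is a minimal sketch of a naive baseline that scans the whole dictionary. It is not the data structure from the paper: it does no preprocessing and takes time proportional to the total dictionary size per query. The function names `hamming_distance` and `lookup_with_mismatches` are hypothetical, chosen only for this illustration.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def lookup_with_mismatches(dictionary: List[str], query: str, k: int) -> List[str]:
    """Naive baseline (an assumption for illustration, not the paper's method):
    return every dictionary string at Hamming distance at most k from the query.
    Each query compares against all strings in the dictionary."""
    return [s for s in dictionary if hamming_distance(s, query) <= k]


if __name__ == "__main__":
    dictionary = ["abcd", "abce", "xbcd", "wxyz"]
    # Strings within one mismatch of "abcd".
    print(lookup_with_mismatches(dictionary, "abcd", 1))  # ['abcd', 'abce', 'xbcd']
```

The baseline only pins down what a correct answer looks like. The structures discussed in the abstract aim to answer the same query much faster after preprocessing, at the cost of extra space.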
