Fast entropy-bounded string dictionary look-up with mismatches

Paweł Gawrychowski; Gad M. Landau; Tatiana Starikovskaya

arXiv:1806.09646 · cs.DS · June 27, 2018


TL;DR

This paper introduces a new data structure for fast dictionary look-up with mismatches, extending previous work to handle larger mismatch values efficiently in terms of query time and space.

Contribution

It extends the entropy-bounded dictionary look-up structure of Chan and Lewenstein, which handles k = 1, to a much wider range of mismatch bounds k while keeping similar query-time and space bounds.

Findings

01

Query time is O(m/w + log^k d + occ), where m is the string length, w the machine word size, d the number of dictionary strings, and occ the output size.

02

Uses O(w d log^k d) extra bits of space beyond the entropy-bounded space for the dictionary, comparable to the O(w d log^{1+ε} d) extra bits of the earlier k = 1 structure.

03

Covers a much wider range of mismatch bounds k, whereas the earlier structure with comparable bounds handled only k = 1.

Abstract

We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of $d$ strings of length $m$ and an integer $k$, we must preprocess it into a data structure to answer the following queries: Given a query string $Q$ of length $m$, find all strings in the dictionary that are at Hamming distance at most $k$ from $Q$. Chan and Lewenstein (CPM 2015) showed a data structure for $k = 1$ with optimal query time $O(m/w + occ)$, where $w$ is the size of a machine word and $occ$ is the size of the output. The data structure occupies $O(w d \log^{1+\varepsilon} d)$ extra bits of space (beyond the entropy-bounded space required to store the dictionary strings). In this work we give a solution with similar bounds for a much wider range of values $k$. Namely, we give a data structure that has $O(m/w + \log^{k} d + occ)$ query time and uses $O(w d \log^{k} d)$ extra bits…
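
To make the query in the abstract concrete, here is a minimal Python sketch of the naive baseline: scan every dictionary string and keep those within Hamming distance $k$ of $Q$. This is not the data structure from the paper. It does no preprocessing and takes $O(dm)$ time per query, and it only illustrates what a correct answer looks like. The function names `hamming_distance` and `naive_lookup` and the sample strings are hypothetical, chosen for illustration.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def naive_lookup(dictionary: List[str], query: str, k: int) -> List[str]:
    """Return all dictionary strings at Hamming distance at most k from query.

    Brute-force baseline (hypothetical helper, not the paper's method):
    compares the query against each of the d strings in O(m) time.
    """
    return [s for s in dictionary if hamming_distance(s, query) <= k]


# Illustrative dictionary of d = 4 strings, each of length m = 4.
dictionary = ["abcd", "abce", "xbcd", "wxyz"]
matches = naive_lookup(dictionary, "abcd", 1)
print(matches)
```

Here `occ` corresponds to `len(matches)`. The point of the paper's bounds is that the query cost depends on $m/w$, $\log^{k} d$, and `occ` rather than on the product $dm$ paid by this scan.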
