Space-Efficient Indexes for Uncertain Strings
Esteban Gabory, Chang Liu, Grigorios Loukides, Solon P. Pissis, Wiktor Zuba

TL;DR
This paper introduces a space-efficient index for uncertain strings that significantly reduces storage requirements while maintaining fast pattern matching capabilities, addressing the impracticality of existing large indexes.
Contribution
The authors propose a novel index with expected size $\frac{nz}{\ell} \times \log z$, which is much smaller than previous methods, enabling efficient pattern searches in uncertain strings.
Findings
The index is up to two orders of magnitude smaller than the state-of-the-art index.
Supports fast pattern matching for patterns of length at least $\ell$.
Offers competitive or faster query and construction times.
Abstract
Strings in the real world are often encoded with some level of uncertainty. In the character-level uncertainty model, an uncertain string $X$ of length $n$ on an alphabet $\Sigma$ is a sequence of $n$ probability distributions over $\Sigma$. Given an uncertain string $X$ and a weight threshold $1/z \in (0, 1]$, we say that pattern $P$ occurs in $X$ at position $i$, if the product of probabilities of the letters of $P$ at positions $i, \ldots, i + |P| - 1$ is at least $1/z$. While indexing standard strings for online pattern searches can be performed in linear time and space, indexing uncertain strings is much more challenging. Specifically, the state-of-the-art index for uncertain strings has $O(nz)$ size, requires $O(nz)$ time and $O(nz)$ space to be constructed, and answers pattern matching queries in the optimal $O(m + |\text{Occ}|)$ time,…
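The occurrence definition in the abstract can be illustrated with a naive scan. This is only a sketch of the definition, not the index proposed in the paper: it checks every position directly and uses no preprocessing. The function names `occurs_at` and `find_occurrences`, the 0-based positions, the toy uncertain string `X`, and the convention of writing the weight threshold as 1/z for an integer z are all assumptions made for illustration. Exact fractions are used so that a product exactly equal to the threshold is not lost to floating-point rounding.

```python
from fractions import Fraction


def occurs_at(x, pattern, i, z):
    """Return True if `pattern` occurs in uncertain string `x` at 0-based
    position `i`, i.e. the product of the letter probabilities is >= 1/z.

    `x` is a list of dicts mapping each letter to its probability at that
    position; letters missing from a dict are taken to have probability 0.
    """
    if i < 0 or i + len(pattern) > len(x):
        return False
    prob = Fraction(1)
    for offset, letter in enumerate(pattern):
        prob *= x[i + offset].get(letter, Fraction(0))
        # The product can only shrink, so stop as soon as it drops below 1/z.
        if prob * z < 1:
            return False
    return True


def find_occurrences(x, pattern, z):
    """Naive scan over all start positions (illustrative, not an index)."""
    return [
        i
        for i in range(len(x) - len(pattern) + 1)
        if occurs_at(x, pattern, i, z)
    ]


# Hypothetical uncertain string of length 4 over the alphabet {a, b}.
X = [
    {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    {"a": Fraction(1)},
    {"a": Fraction(3, 4), "b": Fraction(1, 4)},
    {"b": Fraction(1)},
]

print(find_occurrences(X, "ab", 4))  # threshold 1/4
print(find_occurrences(X, "ab", 2))  # threshold 1/2
```

With threshold 1/4 the pattern "ab" occurs at positions 1 and 2 (products 1/4 and 3/4); raising the threshold to 1/2 leaves only position 2. The scan costs time proportional to the string length times the pattern length per query, which is the kind of cost an index is meant to avoid.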
Taxonomy
Topics: Data Management and Algorithms · Algorithms and Data Compression · Advanced Data Compression Techniques
