Probabilistic Threshold Indexing for Uncertain Strings
Sharma V. Thankachan, Manish Patil, Rahul Shah, and Sudip Biswas

TL;DR
This paper introduces a novel probabilistic indexing method for uncertain strings, enabling efficient substring search and string listing with theoretical guarantees, applicable to biological data and other uncertain information sources.
Contribution
It presents the first indexing solution for uncertain strings that supports arbitrary probability thresholds with strong theoretical bounds and near-optimal query times.
Findings
Indexes constructed in linear space with near-optimal query time
Supports queries for arbitrary probability thresholds greater than a construction-time value τ
Includes an approximate index for faster substring searches with additive error
Abstract
Strings form a fundamental data type in computer systems. String searching has been extensively studied since the inception of computer science. Increasingly many applications have to deal with imprecise strings or strings with fuzzy information in them. String matching becomes a probabilistic event when a string contains uncertainty, i.e. each position of the string can have different probable characters with associated probability of occurrence for each character. Such uncertain strings are prevalent in various applications such as biological sequence data, event monitoring and automatic ECG annotations. We explore the problem of indexing uncertain strings to support efficient string searching. In this paper we consider two basic problems of string searching, namely substring searching and string listing. In substring searching, the task is to find the occurrences of a deterministic…
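To make the two problems concrete, here is a minimal Python sketch of the uncertain-string model the abstract describes. It is a naive linear scan for illustration only, not the paper's index. The representation (one dictionary of character probabilities per position), the assumption that positions are independent, the inclusive threshold comparison, and all function names are assumptions introduced here.

```python
from math import prod

# Assumed representation: one dict per position, mapping character -> probability.
UncertainString = list[dict[str, float]]


def match_probability(text: UncertainString, pattern: str, start: int) -> float:
    """Probability that `pattern` occurs at `start`, assuming independent positions."""
    return prod(text[start + k].get(ch, 0.0) for k, ch in enumerate(pattern))


def substring_search(text: UncertainString, pattern: str, tau: float) -> list[int]:
    """Naive scan: start positions whose match probability is at least tau."""
    last_start = len(text) - len(pattern)
    return [
        i for i in range(last_start + 1)
        if match_probability(text, pattern, i) >= tau
    ]


def string_listing(collection: list[UncertainString], pattern: str, tau: float) -> list[int]:
    """Indices of the uncertain strings with at least one probable occurrence."""
    return [
        idx for idx, text in enumerate(collection)
        if substring_search(text, pattern, tau)
    ]


# Illustrative data: position 0 is 'A' with probability 0.6 or 'C' with 0.4, and so on.
text = [
    {"A": 0.6, "C": 0.4},
    {"A": 1.0},
    {"A": 0.5, "T": 0.5},
    {"T": 1.0},
]
print(substring_search(text, "AT", 0.4))                 # [1, 2]
print(string_listing([text, [{"G": 1.0}]], "AT", 0.4))   # [0]
```

Each query in this scan costs time proportional to the text length times the pattern length. The indexes summarized above are meant to avoid that cost.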
Taxonomy
Topics: Data Quality and Management · Algorithms and Data Compression · Data Management and Algorithms
