Efficient Index for Weighted Sequences
Carl Barton, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski

TL;DR
This paper introduces an efficient index for weighted sequences that supports fast pattern matching, prefix table computation, and cover detection, improving on previous methods by a factor of $z \log z$.
Contribution
It presents a novel $O(nz)$-time index construction for weighted sequences, enhancing query efficiency and related computations compared to prior approaches.
Findings
Achieves $O(nz)$ construction time for the index
Answers pattern matching queries in optimal time
Improves performance over previous methods by a factor of $z \, \log z$
Abstract
The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to support efficient on-line pattern queries. We study this problem in the case where the text is weighted: for every position of the text and every letter of the alphabet a probability of occurrence of this letter at this position is given. Sequences of this type, also called position weight matrices, are commonly used to represent imprecise or uncertain data. A weighted sequence may represent many different strings, each with probability of occurrence equal to the product of probabilities of its letters at subsequent positions. Given a probability threshold $\frac{1}{z}$, we say that a pattern string $P$ matches a weighted text at position $i$ if the product of…
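The matching condition can be illustrated with a small brute-force check. This is only a sketch of the definition, not the paper's $O(nz)$ index: it assumes the truncated sentence above continues as "the product of the probabilities of the pattern's letters at consecutive text positions is at least $\frac{1}{z}$", and it represents the weighted text as a list of letter-to-probability dictionaries with 0-based positions. The names `match_probability` and `naive_occurrences` are hypothetical and chosen for illustration.

```python
from fractions import Fraction


def match_probability(text, pattern, i):
    """Product of the probabilities of the pattern's letters at text positions i, i+1, ..."""
    prob = Fraction(1)
    for offset, letter in enumerate(pattern):
        # A letter absent from a position's dictionary has probability 0 there.
        prob *= text[i + offset].get(letter, 0)
    return prob


def naive_occurrences(text, pattern, z):
    """All 0-based positions where the pattern matches with probability at least 1/z.

    Brute force over every starting position; assumed definition, not the indexed query.
    """
    return [
        i
        for i in range(len(text) - len(pattern) + 1)
        if match_probability(text, pattern, i) * z >= 1
    ]


half = Fraction(1, 2)
# Toy weighted sequence of length 4 over the alphabet {a, b} (made-up values).
text = [
    {"a": 1},
    {"a": half, "b": half},
    {"a": half, "b": half},
    {"b": 1},
]

print(naive_occurrences(text, "ab", 2))  # [0, 2]
print(naive_occurrences(text, "ab", 4))  # [0, 1, 2]
```

With $z = 2$ the occurrence at position 1 is rejected because its probability is $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, below the threshold $\frac{1}{2}$; raising $z$ to 4 admits it. Exact fractions are used so that comparisons against the threshold are not affected by floating-point rounding.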
