String Indexing for Top-$k$ Close Consecutive Occurrences
Philip Bille, Inge Li Gørtz, Max Rishøj Pedersen, Eva Rotenberg, Teresa Anna Steiner

TL;DR
This paper introduces the string indexing for top-$k$ close consecutive occurrences problem, providing efficient data structures with various space-time trade-offs for reporting closest consecutive pattern occurrences.
Contribution
It formulates a new string indexing problem and offers three different data structures balancing space and query time, including novel techniques like line segment intersection translation and recursive clustering.
Findings
Optimal query time of O(m+k) with O(n log n) space.
Linear space solutions with query times O(m+k^{1+ε}) and O(m+k log^{1+ε} n).
Development of new techniques such as line segment intersection translation and recursive clustering.
Abstract
The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-$k$ close consecutive occurrences problem (SITCCO). Here, a consecutive occurrence is a pair $(i, j)$, $i < j$, such that $P$ occurs at positions $i$ and $j$ in $S$ and there is no occurrence of $P$ between $i$ and $j$, and their distance is defined as $j - i$. Given a pattern $P$ and a parameter $k$, the goal is to report the top-$k$ consecutive occurrences of $P$ in $S$ of minimal distance. The challenge is to compactly represent $S$ while supporting queries in time close to the length of $P$ and $k$. We give three time-space trade-offs for the…
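To make the problem definition concrete, here is a minimal brute-force sketch in Python. It is not one of the paper's data structures: it scans the whole string on every query, so it only illustrates what a correct answer looks like. The function names `find_occurrences` and `top_k_close_consecutive` are hypothetical, positions are 0-indexed (an assumption; the paper may index from 1), and ties between equal distances are broken by position.

```python
import heapq


def find_occurrences(s: str, p: str) -> list[int]:
    """Return all starting positions of p in s (overlaps allowed), in increasing order."""
    positions = []
    i = s.find(p)
    while i != -1:
        positions.append(i)
        # Restart one position later so overlapping occurrences are found too.
        i = s.find(p, i + 1)
    return positions


def top_k_close_consecutive(s: str, p: str, k: int) -> list[tuple[int, int]]:
    """Naive baseline: the k consecutive occurrences (i, j) of p in s with smallest j - i.

    Hypothetical illustration only; it does no preprocessing of s.
    """
    occ = find_occurrences(s, p)
    # Adjacent entries of the sorted occurrence list are exactly the
    # consecutive occurrences: no occurrence of p lies strictly between them.
    pairs = zip(occ, occ[1:])
    # nsmallest behaves like sorted(...)[:k], so ties keep left-to-right order.
    return heapq.nsmallest(k, pairs, key=lambda ij: ij[1] - ij[0])


# "ab" occurs at positions 0, 2, 4, 7, giving pairs with distances 2, 2, 3.
example = top_k_close_consecutive("abababcab", "ab", 2)  # [(0, 2), (2, 4)]
```

Each query in this baseline costs time proportional to the length of the string, which is the cost the indexing results above are designed to avoid.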
Taxonomy
Topics: Algorithms and Data Compression · semigroups and automata theory · Data Management and Algorithms
