Gapped Indexing for Consecutive Occurrences
Philip Bille, Inge Li Gørtz, Max Rishøj Pedersen, Teresa Anna Steiner

TL;DR
This paper introduces data structures that index a string so that, given two patterns and a gap range, the consecutive occurrences of the patterns with distance in that range can be found quickly. It pairs near-linear-space upper bounds with conditional lower bounds.
Contribution
It presents novel data structures for gap-constrained pattern matching with near-linear space and analyzes their optimality through conditional lower bounds.
Findings
Data structures with Õ(n) space and Õ(|P1|+|P2|+n^{2/3}) query time for existence and counting queries
Conditional lower bounds based on the set intersection problem
New suffix tree decomposition technique
Abstract
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). In this paper we consider a variant of string indexing, where the goal is to compactly represent the string such that given two patterns P1 and P2 and a gap range [α, β] we can quickly find the consecutive occurrences of P1 and P2 with distance in [α, β], i.e., pairs of occurrences immediately following each other and with distance within the range. We present data structures that use Õ(n) space and query time Õ(|P1|+|P2|+n^{2/3}) for existence and counting and Õ(|P1|+|P2|+n^{2/3} occ^{1/3}) for reporting.…
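To make the query in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's data structure: it scans the string on every query and gives none of the stated space or time bounds. The function names `find_all` and `consecutive_occurrences` are hypothetical. The sketch also assumes that an occurrence is identified by its start position, that a consecutive occurrence is a pair (i, j) with i < j where no other occurrence of either pattern starts strictly between i and j, that the distance is j - i, and that both patterns are non-empty.

```python
from bisect import bisect_right


def find_all(text: str, pattern: str) -> list[int]:
    """Return every start index of pattern in text (overlaps allowed)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def consecutive_occurrences(text: str, p1: str, p2: str,
                            alpha: int, beta: int) -> list[tuple[int, int]]:
    """Naive reporting query: pairs (i, j) where p1 starts at i, p2 starts
    at j > i, no other occurrence of p1 or p2 starts strictly between them,
    and alpha <= j - i <= beta (distance taken as j - i, an assumption)."""
    occ1 = find_all(text, p1)  # sorted start positions of p1
    occ2 = find_all(text, p2)  # sorted start positions of p2
    pairs = []
    for idx, i in enumerate(occ1):
        k = bisect_right(occ2, i)  # first occurrence of p2 strictly after i
        if k == len(occ2):
            break  # no later occurrence of p2 exists for this or any later i
        j = occ2[k]
        # Skip i if another occurrence of p1 starts between i and j.
        if idx + 1 < len(occ1) and occ1[idx + 1] < j:
            continue
        if alpha <= j - i <= beta:
            pairs.append((i, j))
    return pairs


text = "a.b..a.a...b"
print(consecutive_occurrences(text, "a", "b", 0, 10))  # [(0, 2), (7, 11)]
print(consecutive_occurrences(text, "a", "b", 1, 3))   # [(0, 2)]
```

In the example, the occurrence of `a` at position 5 does not pair with the `b` at position 11, because another `a` starts at position 7 in between. Existence and counting queries follow from the same list by testing whether it is empty or taking its length.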
