Data Structures for Range Sorted Consecutive Occurrence Queries
Waseem Akram, Takuya Mieno

TL;DR
This paper introduces advanced data structures for efficiently answering range-based consecutive occurrence queries in strings, improving query times and space complexity for pattern matching problems with applications in bioinformatics and information retrieval.
Contribution
It presents novel data structures for range top-k and gap-bounded consecutive occurrence queries with optimized space and query time complexities.
Findings
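Achieved an O(n log^2 n)-space data structure for range top-k queries with O(m + log log n + k) query time.
Developed an O(n log^{2+ε} n)-space data structure for gap-bounded queries with O(m + log log n + output) query time, where ε > 0 is a constant and output is the number of reported pairs.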
Connected consecutive occurrences to closed substrings, with applications in geometric problems.
Abstract
The string indexing problem is a fundamental computational problem with numerous applications, including information retrieval and bioinformatics. It aims to efficiently solve the pattern matching problem: given a text T of length n for preprocessing and a pattern P of length m as a query, the goal is to report all occurrences of P as substrings of T. Navarro and Thankachan [CPM 2015, Theor. Comput. Sci. 2016] introduced a variant of this problem called the gap-bounded consecutive occurrence query, which reports pairs of consecutive occurrences of P in T such that their gaps (i.e., the distances between them) lie within a query-specified range [g_1, g_2]. Recently, Bille et al. [FSTTCS 2020, Theor. Comput. Sci. 2022] proposed the top-k close consecutive occurrence query, which reports the k closest consecutive occurrences of P in T, sorted in non-decreasing order of distance. Both…
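To make the two query types defined in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's data structure: it rescans the text on every query and serves only as a reference for what the queries return. The function names are hypothetical, positions are 0-based, the pattern is assumed non-empty, and the distance between two consecutive occurrences is assumed to be the difference of their starting positions.

```python
def occurrences(text, pattern):
    """Return the sorted 0-based start positions of all occurrences
    of pattern in text, including overlapping ones."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def consecutive_pairs(text, pattern):
    """Return pairs (i, j) of consecutive occurrences: i < j and no
    occurrence of pattern starts strictly between i and j."""
    occ = occurrences(text, pattern)
    return list(zip(occ, occ[1:]))


def gap_bounded(text, pattern, g1, g2):
    """Gap-bounded query: consecutive pairs whose distance lies in [g1, g2].
    Assumption: distance = j - i (difference of start positions)."""
    return [(i, j) for i, j in consecutive_pairs(text, pattern)
            if g1 <= j - i <= g2]


def top_k_close(text, pattern, k):
    """Top-k close query: the k closest consecutive pairs, sorted in
    non-decreasing order of distance (ties keep text order)."""
    pairs = consecutive_pairs(text, pattern)
    return sorted(pairs, key=lambda p: p[1] - p[0])[:k]


if __name__ == "__main__":
    T = "abracadabra"
    print(consecutive_pairs(T, "a"))  # [(0, 3), (3, 5), (5, 7), (7, 10)]
    print(gap_bounded(T, "a", 2, 2))  # [(3, 5), (5, 7)]
    print(top_k_close(T, "a", 3))     # [(3, 5), (5, 7), (0, 3)]
```

Each call here costs time proportional to the text length, whereas the indexed structures summarized above aim for query time that depends only on m, log log n, and the output size.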
Taxonomy
Topics: Algorithms and Data Compression · Advanced Database Systems and Queries · DNA and Biological Computing
