Gapped Indexing for Consecutive Occurrences

Philip Bille; Inge Li Gørtz; Max Rishøj Pedersen; Teresa Anna Steiner

arXiv:2102.02505 · cs.DS · February 5, 2021

TL;DR

This paper introduces data structures that index a string so that, given two patterns and a gap range, the consecutive occurrences of the patterns whose distance falls within that range can be found efficiently. It also gives conditional lower bounds on the trade-off between space and query time.

Contribution

It presents novel data structures for gap-constrained pattern matching with near-linear space and analyzes their optimality through conditional lower bounds.

Findings

01

Data structures with \~O(n) space and query time \~O(|P1|+|P2|+n^(2/3)) for existence and counting, and \~O(|P1|+|P2|+n^(2/3)*occ^(1/3)) for reporting

02

Conditional lower bounds based on the set intersection problem

03

New suffix tree decomposition technique

Abstract

The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). In this paper we consider a variant of string indexing, where the goal is to compactly represent the string such that given two patterns P1 and P2 and a gap range [\alpha,\beta] we can quickly find the consecutive occurrences of P1 and P2 with distance in [\alpha,\beta], i.e., pairs of occurrences immediately following each other and with distance within the range. We present data structures that use \~O(n) space and query time \~O(|P1|+|P2|+n^(2/3)) for existence and counting and \~O(|P1|+|P2|+n^(2/3)*occ^(1/3)) for reporting.…
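To make the query in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's data structure: it rescans the string on every query and serves only as a reference for what the query returns. It assumes that a consecutive occurrence is a pair (i, j) with i < j, where P1 starts at i, P2 starts at j, no other occurrence of either pattern starts strictly between them, and the distance is j - i. The function names are hypothetical.

```python
from bisect import bisect_right


def occurrences(s, p):
    """Return the sorted start positions of p in s (overlaps allowed)."""
    out = []
    i = s.find(p)
    while i != -1:
        out.append(i)
        i = s.find(p, i + 1)
    return out


def consecutive_occurrences(s, p1, p2, alpha, beta):
    """Report pairs (i, j) of consecutive occurrences of p1 and p2
    with alpha <= j - i <= beta (assumed definition, see lead-in)."""
    occ1 = occurrences(s, p1)
    occ2 = occurrences(s, p2)
    pairs = []
    for idx, i in enumerate(occ1):
        k = bisect_right(occ2, i)  # first occurrence of p2 strictly after i
        if k == len(occ2):
            break  # no later occurrence of p2, so no further pairs
        j = occ2[k]
        # Skip i if another occurrence of p1 starts strictly between i and j.
        if idx + 1 < len(occ1) and occ1[idx + 1] < j:
            continue
        if alpha <= j - i <= beta:
            pairs.append((i, j))
    return pairs


def exists(s, p1, p2, alpha, beta):
    """Existence query: is there at least one qualifying pair?"""
    return bool(consecutive_occurrences(s, p1, p2, alpha, beta))


def count(s, p1, p2, alpha, beta):
    """Counting query: number of qualifying pairs."""
    return len(consecutive_occurrences(s, p1, p2, alpha, beta))
```

For example, `consecutive_occurrences("aXbaYYbab", "a", "b", 2, 3)` reports the pairs at positions (0, 2) and (3, 6), while the pair (7, 8) is excluded because its distance of 1 falls outside the gap range.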
