Sliding Window String Indexing in Streams
Philip Bille, Johannes Fischer, Inge Li Gørtz, Max Rishøj Pedersen, Tord Joakim Stordalen

TL;DR
This paper introduces an efficient streaming data structure for sliding window string indexing that supports fast pattern matching with minimal space, improving worst-case processing time and handling delayed queries effectively.
Contribution
It presents the first worst-case efficient streaming index for sliding window pattern matching, using a hierarchical suffix tree structure inspired by log-structured merge trees.
Findings
Achieves $O(w)$ space and $O(\log w)$ time per character with high probability.
Supports delayed queries with $O(w + \text{delay})$ space and $O(\log(w/\text{delay}))$ time.
Provides constant-time processing per character for delays proportional to window size.
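The constant-time case can be read off the delayed-query bounds by substitution. The short derivation below assumes the delay is a constant fraction of the window, written here as $\text{delay} = \varepsilon w$ for a constant $0 < \varepsilon \le 1$; the symbol $\varepsilon$ is introduced for illustration and does not appear in the summary above.

```latex
\[
O\!\left(\log\frac{w}{\text{delay}}\right)
  = O\!\left(\log\frac{w}{\varepsilon w}\right)
  = O\!\left(\log\frac{1}{\varepsilon}\right)
  = O(1),
\qquad
O(w + \text{delay}) = O(w + \varepsilon w) = O(w).
\]
```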
Abstract
Given a string $S$ over an alphabet $\Sigma$, the 'string indexing problem' is to preprocess $S$ to subsequently support efficient pattern matching queries, i.e., given a pattern string $P$, report all the occurrences of $P$ in $S$. In this paper we study the 'streaming sliding window string indexing problem'. Here the string $S$ arrives as a stream, one character at a time, and the goal is to maintain an index of the last $w$ characters, called the 'window', for a specified parameter $w$. At any point in time a pattern matching query for a pattern $P$ may arrive, also streamed one character at a time, and all occurrences of $P$ within the current window must be returned. The streaming sliding window string indexing problem naturally captures scenarios where we want to index the most recent data (i.e. the window) of a stream while supporting efficient pattern matching. Our main result…
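To make the problem statement concrete, the sketch below is a naive baseline, not the paper's data structure. It stores the last `w` characters and answers each query by scanning the whole window, so it uses $O(w)$ space but comes nowhere near the time bounds listed under Findings. The class and method names are hypothetical, and for simplicity the pattern is passed as a whole string rather than streamed one character at a time.

```python
from collections import deque


class NaiveSlidingWindowIndex:
    """Brute-force baseline: stores the window and scans it on every query."""

    def __init__(self, w: int) -> None:
        if w <= 0:
            raise ValueError("window size w must be positive")
        self.w = w
        self.window = deque(maxlen=w)  # oldest characters fall out automatically
        self.seen = 0  # total number of characters streamed so far

    def append(self, ch: str) -> None:
        """Process one character of the stream."""
        self.window.append(ch)
        self.seen += 1

    def occurrences(self, pattern: str) -> list[int]:
        """Return 0-based stream positions where pattern starts inside the window."""
        text = "".join(self.window)
        offset = self.seen - len(text)  # stream position of the window's first character
        hits = []
        start = text.find(pattern)
        while start != -1:
            hits.append(offset + start)
            start = text.find(pattern, start + 1)  # advance by one to allow overlaps
        return hits


index = NaiveSlidingWindowIndex(w=5)
for ch in "abracadabra":
    index.append(ch)
```

After streaming `"abracadabra"` with `w = 5`, the window holds `"dabra"` (stream positions 6 to 10), so `index.occurrences("abra")` returns `[7]`, while `index.occurrences("cad")` returns `[]` because that occurrence has already left the window.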
