Tight bounds on the maximum number of shortest unique substrings
Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

TL;DR
This paper investigates the structural properties of shortest unique substrings (SUS) in strings, establishing tight bounds on their maximum number for point and interval SUSs, and providing insights into their combinatorial characteristics.
Contribution
It reveals tight bounds on the maximum number of SUSs in strings and analyzes their structural and combinatorial properties, advancing understanding of SUS complexity.
Findings
The number of intervals in a string of length n that correspond to point SUSs is less than 1.5n, and this bound is tight: the upper and lower bounds match.
Provides structural and combinatorial insights into SUS properties.
Establishes tight bounds on the maximum number of interval SUSs in a string.
Abstract
A substring Q of a string S is called a shortest unique substring (SUS) for interval [s,t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s,t], and every substring of S that contains interval [s,t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUSs for interval [s,t] can be answered quickly. When s = t, we call the SUSs for [s,t] point SUSs, and when s ≤ t, we call the SUSs for [s,t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: Namely, we show that the number of intervals in…
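To make the definition concrete, here is a minimal brute-force sketch in Python. It is not the O(n)-time preprocessing scheme mentioned in the abstract. It follows the definition directly and is only meant for small strings. The function names (`count_occurrences`, `sus_for_interval`, `point_sus_intervals`) and the 1-based, inclusive position convention are assumptions made for illustration.

```python
def count_occurrences(S, Q):
    """Count (possibly overlapping) occurrences of Q in S."""
    return sum(1 for i in range(len(S) - len(Q) + 1) if S.startswith(Q, i))


def sus_for_interval(S, s, t):
    """Return all SUSs for the interval [s, t] as (start, end) pairs.

    Positions are 1-based and inclusive (an assumed convention).
    Candidate lengths are tried in increasing order, so the first
    length that yields a unique substring covering [s, t] is the
    SUS length.
    """
    n = len(S)
    for length in range(t - s + 1, n + 1):
        # A substring starting at i with this length covers [s, t]
        # iff i <= s and i + length - 1 >= t, while staying inside S.
        lo = max(1, t - length + 1)
        hi = min(s, n - length + 1)
        found = [
            (i, i + length - 1)
            for i in range(lo, hi + 1)
            if count_occurrences(S, S[i - 1:i - 1 + length]) == 1
        ]
        if found:
            return found
    return []  # only reached for the empty string


def point_sus_intervals(S):
    """Collect the distinct intervals that are point SUSs for some position."""
    intervals = set()
    for p in range(1, len(S) + 1):
        intervals.update(sus_for_interval(S, p, p))
    return intervals


if __name__ == "__main__":
    S = "aab"
    print(sus_for_interval(S, 2, 2))       # [(1, 2), (2, 3)]
    print(sus_for_interval(S, 1, 2))       # [(1, 2)]
    print(len(point_sus_intervals(S)))     # 3, which is below 1.5 * len(S) = 4.5
```

In "aab", position 2 holds an "a" that occurs twice, so no substring of length 1 qualifies; both "aa" and "ab" are unique and cover position 2, which gives two point SUSs. Counting the distinct intervals returned over all positions is the quantity that the paper bounds by 1.5n.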
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Genome Rearrangement Algorithms
