Relating Left and Right Extensions of Maximal Repeats
Shunsuke Inenaga, Dmitry Kosolobov

TL;DR
This paper investigates the relationship between left and right extensions of maximal repeats in strings, establishing tight bounds on their ratio and analyzing the stability of the CDAWG index under reversal.
Contribution
It proves tight bounds on the ratio of left to right extensions of maximal repeats and analyzes the stability of CDAWG size under string reversal.
Findings
The ratio of left to right extensions is O(√n) for every string of length n, and some strings attain Ω(√n).
The established bounds are asymptotically tight.
The bound on the ratio also depends on the alphabet size; the paper gives explicit alphabet-dependent bounds.
Abstract
The compact directed acyclic word graph (CDAWG) of a string T of length n is an index occupying O(e) space, where e is the number of right extensions of maximal repeats in T. For highly repetitive datasets, the measure e is typically small compared to the length n of T and, thus, the CDAWG serves as a compressed index. Unlike other compressibility measures (such as LZ77, string attractors, BWT runs, etc.), e is very unstable with respect to reversals: the CDAWG of the reversed string has size O(ē), where ē is the number of left extensions of maximal repeats in T, and there are strings T with ē/e ∈ Ω(√n). In this note, we prove that this lower bound is…
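To make the two measures in the abstract concrete, here is a minimal brute-force sketch that counts the right and left extensions of maximal repeats of a short string. It is not the paper's method and is only meant for experimenting with small inputs. The function name `extension_counts` is hypothetical, and the conventions are assumptions: a substring counts as a maximal repeat when it has at least two distinct left contexts and at least two distinct right contexts (string boundaries included as contexts), the empty string is allowed as a maximal repeat, and boundaries themselves are not counted as extensions. Other sources may use slightly different conventions.

```python
from collections import defaultdict


def extension_counts(text):
    """Return (right_extensions, left_extensions) of maximal repeats in `text`.

    Brute force over all substrings, so roughly cubic time; small inputs only.
    """
    n = len(text)
    left = defaultdict(set)   # substring -> symbols seen immediately before it
    right = defaultdict(set)  # substring -> symbols seen immediately after it

    for i in range(n + 1):
        for j in range(i, n + 1):
            w = text[i:j]
            # None stands for a string boundary (assumed convention).
            left[w].add(text[i - 1] if i > 0 else None)
            right[w].add(text[j] if j < n else None)

    e_right = e_left = 0
    for w in left:
        # Maximal repeat: cannot be extended in either direction
        # without losing at least one occurrence.
        if len(left[w]) >= 2 and len(right[w]) >= 2:
            e_right += len(right[w] - {None})  # distinct letters a with w+a in text
            e_left += len(left[w] - {None})    # distinct letters a with a+w in text
    return e_right, e_left
```

Under these conventions, `extension_counts("aab")` gives `(4, 3)` while the reversed string `"baa"` gives `(3, 4)`: reversing the string swaps the two counts, which is why the size of the CDAWG can change under reversal.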
Taxonomy
Topics: Matrix Theory and Algorithms · Computability, Logic, AI Algorithms · Semigroups and Automata Theory
