Substring Complexity in Sublinear Space
Giulia Bernardini, Gabriele Fici, Paweł Gawrychowski, and Solon P. Pissis

TL;DR
This paper investigates how efficiently the substring complexity measure δ of a string can be computed in sublinear working space, giving algorithms with time-space trade-offs in different computational models.
Contribution
It introduces new algorithms for computing δ in sublinear space, with bounds depending on available workspace and computational model.
Findings
Algorithms for δ computation with sublinear space in comparison and word RAM models.
Time complexity bounds depend on workspace size, e.g., O(n^3 log b / b^2) time with O(b) space.
Feasibility of computing string complexity measures efficiently in limited-space environments.
Abstract
Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of strings, e.g., the size of the Lempel-Ziv parse or the number of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size γ of a smallest string attractor. Let T be a string of length n. A string attractor of T is a set of positions of T capturing the occurrences of all the substrings of T. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing γ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure of compressibility that is based on the function S_T(k) counting the number of distinct substrings of length k of T, also known as the substring complexity of T. This new…
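To make the measure concrete, here is a minimal Python sketch. It assumes the definition from Kociumaka et al., δ = max over k ≥ 1 of S(k)/k, where S(k) is the number of distinct substrings of length k. The function names are hypothetical. The first counter uses a set, so its extra space grows with the input. The second counts only first occurrences and keeps just a few indices, at the cost of much more time. Neither is the paper's algorithm; they only illustrate the two ends of the time-space trade-off.

```python
from fractions import Fraction


def substring_complexity(text: str, k: int) -> int:
    """Count distinct substrings of length k using a set (extra space grows with n)."""
    return len({text[i:i + k] for i in range(len(text) - k + 1)})


def substring_complexity_small_space(text: str, k: int) -> int:
    """Count distinct substrings of length k, conceptually with O(1) extra words.

    A position i is counted only if text[i:i+k] does not already occur
    at some earlier starting position j < i (i.e., i is a first occurrence).
    """
    n = len(text)
    count = 0
    for i in range(n - k + 1):
        seen_earlier = any(
            all(text[j + t] == text[i + t] for t in range(k))
            for j in range(i)
        )
        if not seen_earlier:
            count += 1
    return count


def delta(text: str, count=substring_complexity) -> Fraction:
    """Return max_k S(k)/k as an exact fraction (0 for the empty string)."""
    n = len(text)
    return max(
        (Fraction(count(text, k), k) for k in range(1, n + 1)),
        default=Fraction(0),
    )
```

For example, `delta("abab")` evaluates to 2: there are two distinct letters (2/1), two distinct substrings of length 2 (2/2), two of length 3 (2/3), and one of length 4 (1/4). Passing `substring_complexity_small_space` as `count` gives the same value while trading time for space.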
