Substring Complexity in Sublinear Space

Giulia Bernardini, Gabriele Fici, Paweł Gawrychowski, and Solon P. Pissis

arXiv:2007.08357 · cs.DS · November 16, 2023

TL;DR

This paper studies how efficiently the substring complexity measure δ of a string can be computed in sublinear working space, and gives algorithms for different computational models.

Contribution

The paper introduces new algorithms for computing δ in sublinear space, with time bounds that depend on the available workspace and on the computational model.

Findings

01. Algorithms for δ computation with sublinear space in comparison and word RAM models.

02. Time complexity bounds depend on workspace size, e.g., O(n^3 log b / b^2) time with O(b) space.

03. Feasibility of approximating string complexity measures efficiently in limited space environments.

Abstract

Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of strings, e.g., the size $z$ of the Lempel-Ziv parse or the number $r$ of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size $γ$ of a smallest string attractor. Let $T$ be a string of length $n$. A string attractor of $T$ is a set of positions of $T$ capturing the occurrences of all the substrings of $T$. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing $γ$ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure of compressibility that is based on the function $S_{T}(k)$ counting the number of distinct substrings of length $k$ of $T$, also known as the substring complexity of $T$. This new…
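To make the function $S_{T}(k)$ concrete, here is a minimal Python sketch that counts distinct length-$k$ substrings by brute force. The abstract is cut off before it defines δ, so the `delta` function below assumes the definition from Kociumaka et al., namely the maximum of $S_{T}(k)/k$ over $k$ from 1 to $n$. The function names are illustrative only. This is not one of the paper's algorithms: it stores every substring in a set and so uses far more than sublinear space.

```python
from fractions import Fraction


def substring_complexity(text: str, k: int) -> int:
    """Return S_T(k): the number of distinct substrings of length k in text."""
    if k < 1 or k > len(text):
        return 0
    # Collect every length-k window in a set; duplicates collapse automatically.
    return len({text[i:i + k] for i in range(len(text) - k + 1)})


def delta(text: str) -> Fraction:
    """Return max over k in [1, n] of S_T(k) / k.

    Assumption: this is the definition of delta from Kociumaka et al.;
    it is not spelled out in the truncated abstract above.
    """
    n = len(text)
    if n == 0:
        return Fraction(0)
    # Exact rational arithmetic avoids floating-point ties between ratios.
    return max(Fraction(substring_complexity(text, k), k) for k in range(1, n + 1))
```

For instance, `substring_complexity("banana", 2)` counts the three distinct substrings `ba`, `an`, and `na`. A highly repetitive string such as `"aaaa"` has exactly one distinct substring of every length, so its δ under the assumed definition is 1.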
