
TL;DR
This paper introduces the concept of string attractors as a unifying framework for understanding string repetitiveness, providing bounds, approximation algorithms, and applications to compressed data structures.
Contribution
It defines string attractors, relates them to existing repetitiveness measures, and develops approximation algorithms and applications for compressed text indexing.
Findings
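The size of the smallest attractor yields upper bounds on repetitiveness as measured by linguistic complexity and by the length of the longest repeated substring.
Every known compressor for repetitive strings induces an attractor whose size is bounded by that compressor's repetitiveness measure, so each one approximates the smallest attractor.
A universal compressed data structure for text extraction is obtained.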
Abstract
Let S be a string of length n. In this paper we introduce the notion of string attractor: a subset of the string's positions such that every distinct substring of S has an occurrence crossing one of the attractor's elements. We first show that the minimum attractor's size yields upper-bounds to the string's repetitiveness as measured by its linguistic complexity and by the length of its longest repeated substring. We then prove that all known compressors for repetitive strings induce a string attractor whose size is bounded by their associated repetitiveness measure, and can therefore be considered as approximations of the smallest one. Using further reductions, we derive the approximation ratios of these compressors with respect to the smallest attractor and solve several open problems related to the asymptotic relations between repetitiveness measures (in…
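The definition in the abstract can be made concrete with a small brute-force sketch. It is only an illustration of the definition, not the paper's algorithms: the function names is_string_attractor and smallest_attractor_size are hypothetical, positions are 0-based here for convenience, and the exhaustive search is only practical for very short strings.

```python
from itertools import combinations
from typing import Iterable


def is_string_attractor(s: str, positions: Iterable[int]) -> bool:
    """Check the definition directly: every distinct substring of s must
    have at least one occurrence that crosses a position in `positions`.

    Positions are 0-based (an assumption of this sketch).
    """
    pos = set(positions)
    n = len(s)
    for length in range(1, n + 1):
        all_subs = set()   # distinct substrings of this length
        covered = set()    # those with an occurrence crossing a position
        for i in range(n - length + 1):
            sub = s[i:i + length]
            all_subs.add(sub)
            # The occurrence s[i : i+length] crosses p when i <= p < i+length.
            if any(i <= p < i + length for p in pos):
                covered.add(sub)
        if covered != all_subs:
            return False
    return True


def smallest_attractor_size(s: str) -> int:
    """Exhaustive search over position subsets, smallest first.

    Exponential time; intended only for tiny illustrative inputs.
    """
    n = len(s)
    for k in range(n + 1):
        for cand in combinations(range(n), k):
            if is_string_attractor(s, cand):
                return k
    return n  # not reached: the full position set is always an attractor


if __name__ == "__main__":
    print(is_string_attractor("abab", {0, 1}))  # both letters and all longer substrings are crossed
    print(is_string_attractor("abab", {0}))     # no occurrence of "b" crosses position 0
    print(smallest_attractor_size("aaaa"))
```

Every distinct character must be covered by at least one position, so the alphabet size is a simple lower bound on the result of smallest_attractor_size for any non-empty string.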
