String Attractors

Nicola Prezza

arXiv:1709.05314 · cs.DS · September 20, 2017

TL;DR

This paper introduces the concept of string attractors as a unifying framework for understanding string repetitiveness, providing bounds, approximation algorithms, and applications to compressed data structures.

Contribution

It defines string attractors, relates them to existing repetitiveness measures, and develops approximation algorithms and applications for compressed text indexing.

Findings

01

The minimum attractor size yields upper bounds on repetitiveness as measured by linguistic complexity and by the length of the longest repeated substring.

02

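Every known compressor for repetitive strings induces an attractor whose size is bounded by its repetitiveness measure, so each approximates the smallest attractor.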

03

A universal compressed data structure for text extraction is presented.

Abstract

Let $S$ be a string of length $n$. In this paper we introduce the notion of a string attractor: a subset of the string's positions $[1, n]$ such that every distinct substring of $S$ has an occurrence crossing one of the attractor's elements. We first show that the minimum attractor's size yields upper bounds on the string's repetitiveness as measured by its linguistic complexity and by the length of its longest repeated substring. We then prove that all known compressors for repetitive strings induce a string attractor whose size is bounded by their associated repetitiveness measure, and can therefore be considered approximations of the smallest one. Using further reductions, we derive the approximation ratios of these compressors with respect to the smallest attractor and solve several open problems related to the asymptotic relations between repetitiveness measures (in…
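The definition at the start of the abstract can be made concrete with a small brute-force check. The sketch below is illustrative only and is not taken from the paper: the function names `is_attractor` and `smallest_attractor` are hypothetical, positions are 1-based to match $[1, n]$, and the exhaustive search is exponential, so it is usable only on very short strings.

```python
from itertools import combinations


def is_attractor(s, positions):
    """Check whether `positions` (1-based, as in [1, n]) is a string attractor of `s`.

    Brute force over all O(n^2) substring occurrences; for illustration only.
    """
    n = len(s)
    gamma = set(positions)
    if any(p < 1 or p > n for p in gamma):
        raise ValueError("positions must lie in [1, n]")

    distinct = set()  # every distinct substring of s
    covered = set()   # substrings with an occurrence crossing some position
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = s[i:j]  # this occurrence occupies 1-based positions i+1 .. j
            distinct.add(sub)
            if any(i + 1 <= p <= j for p in gamma):
                covered.add(sub)
    return covered == distinct


def smallest_attractor(s):
    """Return one minimum-size attractor of `s` by exhaustive search (exponential time)."""
    n = len(s)
    for k in range(n + 1):
        for candidate in combinations(range(1, n + 1), k):
            if is_attractor(s, candidate):
                return list(candidate)


print(is_attractor("abab", [1, 2]))  # True
print(is_attractor("abab", [1]))     # False: "b" has no occurrence crossing position 1
print(smallest_attractor("aaaa"))    # [1]
```

Every single-character substring needs an occurrence at an attractor position, so any attractor contains at least as many positions as the string has distinct characters; this is why `[1]` fails for "abab" but suffices for "aaaa".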
