On the near-tightness of $\chi \leq 2r$: a general $\sigma$-ary construction and a binary case via LFSRs
Vinicius T. V. Date, Leandro M. Zatesko

TL;DR
This paper investigates the tightness of the bound relating the repetitiveness measures $\chi$ and $r$ used in compressed string indexes, providing a construction for arbitrary alphabet sizes and analyzing the binary case using LFSRs.
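The measure $r$ counts the maximal runs of equal symbols in the Burrows–Wheeler Transform (BWT) of a string. The minimal sketch below builds the BWT naively by sorting rotations. It assumes a unique terminator `$` is appended to the text and sorts before every other symbol. That is a common convention, not one this page specifies. The function names `bwt` and `runs` are illustrative only.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Burrows-Wheeler Transform of text + terminator, by sorting all rotations.

    Naive construction for small illustrative inputs only. Assumes the
    terminator does not occur in `text` and sorts before every other symbol.
    """
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def runs(s: str) -> int:
    """Number of maximal runs of equal consecutive symbols in s."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, runs(transformed))  # prints: annb$aa 5
```

For `banana`, the BWT is `annb$aa`, which splits into the runs `a`, `nn`, `b`, `$`, `aa`, so $r = 5$ under this convention.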
Contribution
It introduces a general construction showing that the $\chi \leq 2r$ bound is asymptotically tight, and characterizes de Bruijn sequences that achieve minimal run patterns.
Findings
The $\chi \leq 2r$ bound is asymptotically tight for certain constructions.
Binary de Bruijn sequences can achieve the minimal run pattern.
For alphabets of size $\sigma \geq 3$, de Bruijn sequences do not close the gap.
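The binary case relies on a classical fact. An LFSR whose feedback polynomial of degree $n$ is primitive over $\mathbb{F}_2$ cycles through all $2^n - 1$ nonzero states. Inserting one extra 0 next to its single run of $n-1$ zeros then yields a binary de Bruijn sequence of order $n$. Below is a minimal Fibonacci-LFSR sketch of that textbook construction, not the authors' code. The helper names and the tap encoding are hypothetical: `taps` holds the exponents $i < n$ whose coefficient in $p(x) = x^n + \dots$ is 1. Primitivity of the polynomial is assumed rather than checked.

```python
from itertools import product


def lfsr_de_bruijn(degree: int, taps: tuple[int, ...]) -> str:
    """Binary de Bruijn sequence of order `degree` from a Fibonacci LFSR.

    Feedback polynomial (hypothetical encoding): p(x) = x^degree + sum of x^i
    for i in `taps`, over GF(2). p is assumed primitive (not checked), so the
    register visits all 2^degree - 1 nonzero states (an m-sequence).
    """
    state = [0] * (degree - 1) + [1]  # nonzero seed: degree-1 zeros, then a 1
    seq = []
    for _ in range(2 ** degree - 1):
        seq.append(state[0])
        # Linear recurrence s[k+n] = sum of s[k+i] for i in taps (mod 2).
        feedback = sum(state[i] for i in taps) % 2
        state = state[1:] + [feedback]
    # The m-sequence starts with its unique run of degree-1 zeros; prepending
    # one more zero supplies the missing all-zero window.
    return "0" + "".join(map(str, seq))


def is_cyclic_de_bruijn(seq: str, order: int, alphabet: str = "01") -> bool:
    """True iff every word of length `order` over `alphabet` occurs exactly once cyclically."""
    if len(seq) != len(alphabet) ** order:
        return False
    wrapped = seq + seq[: order - 1]
    windows = {wrapped[i : i + order] for i in range(len(seq))}
    return windows == {"".join(w) for w in product(alphabet, repeat=order)}


if __name__ == "__main__":
    # x^3 + x + 1 is primitive over GF(2): taps are the exponents 0 and 1.
    s = lfsr_de_bruijn(3, (0, 1))
    print(s, is_cyclic_de_bruijn(s, 3))  # prints: 00010111 True
```

Swapping in a non-primitive polynomial such as $x^4 + x^3 + x^2 + x + 1$ (irreducible, but of order 5) makes the de Bruijn check fail. That failure is why the construction requires primitive polynomials.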
Abstract
In the field of compressed string indexes, recent work has introduced suffixient sets and their corresponding repetitiveness measure $\chi$. In particular, researchers have explored its relationship to other repetitiveness measures, notably $r$, the number of runs in the Burrows–Wheeler Transform (BWT) of a string. Navarro et al. (2025) proved that $\chi \leq 2r$, although empirical results by Cenzato et al. (2024) suggest that this bound is loose, with real data bounding by around to when the size of the alphabet is . To better understand this gap, we present two cases for the asymptotic tightness of the bound: a general construction for arbitrary values of $\sigma$, and a binary alphabet case, consisting of de Bruijn sequences constructed by linear-feedback shift registers (LFSRs) from primitive polynomials over $\mathbb{F}_2$. The…
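For tiny strings, $\chi$ can be computed by brute force directly from the definition of suffixient sets. This sketch assumes the definition as commonly stated in the suffixient-set literature, recalled from general knowledge and not taken from this page. A substring $x$ is right-maximal if it occurs followed by at least two distinct symbols. A set $S$ of 1-based positions is suffixient if, for every right-maximal $x$ and every symbol $a$ such that $xa$ occurs in $T$, some $p \in S$ makes $xa$ a suffix of $T[1..p]$. $\chi$ is the smallest size of such a set. No end-of-text terminator is added, so values may differ from the paper's conventions. The search is exponential and meant only for illustration. The names `right_extensions`, `is_suffixient` and `chi` are hypothetical.

```python
from itertools import combinations
from typing import Optional


def right_extensions(t: str) -> set[str]:
    """Strings x + a where x is right-maximal in t and x + a occurs in t.

    Right-maximal here: x occurs followed by at least two distinct symbols.
    No end-of-text terminator is appended (a simplifying assumption).
    """
    followers: dict[str, set[str]] = {}
    for i in range(len(t)):
        for j in range(i, len(t)):
            # x = t[i:j] (possibly empty) occurs here, followed by t[j].
            followers.setdefault(t[i:j], set()).add(t[j])
    return {x + a for x, nxt in followers.items() if len(nxt) >= 2 for a in nxt}


def is_suffixient(t: str, positions, exts: Optional[set[str]] = None) -> bool:
    """True iff every right-extension is a suffix of t[1..p] for some 1-based p in positions."""
    if exts is None:
        exts = right_extensions(t)
    return all(any(t[:p].endswith(e) for p in positions) for e in exts)


def chi(t: str) -> int:
    """Size of a smallest suffixient set, by exhaustive search (exponential time)."""
    exts = right_extensions(t)
    for k in range(len(t) + 1):
        for subset in combinations(range(1, len(t) + 1), k):
            if is_suffixient(t, subset, exts):
                return k
    # Unreachable: every right-extension occurs in t, so all positions suffice.
    raise AssertionError("full position set should always be suffixient")


if __name__ == "__main__":
    print(chi("ab"), chi("abab"))  # prints: 2 2
```

For `ab`, the empty string is right-maximal because it is followed by both `a` and `b`. A suffixient set therefore needs a prefix ending in `a` and another ending in `b`, so $\chi = 2$ under these assumptions.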
Taxonomy
Topics: Algorithms and Data Compression · Coding theory and cryptography · Cellular Automata and Applications
