Computing NP-hard Repetitiveness Measures via MAX-SAT
Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Köppl, and Takaaki Nishimoto

TL;DR
This paper introduces MAX-SAT based methods for the exact computation of NP-hard repetitiveness measures in datasets, enabling precise analysis of string attractors, macro schemes, and straight-line programs.
Contribution
It presents the first MAX-SAT formulations for exact computation of these measures, improving over heuristic approaches and enabling analysis of larger datasets.
Findings
Exact computation feasible for small to medium datasets
Implemented methods successfully compute measures for texts up to a million characters
Demonstrated practical applicability of MAX-SAT formulations in dataset analysis
Abstract
Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straight-forward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a vast variety of implementations for heuristically computing approximations exist, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our…
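To make the first of these measures concrete, here is a minimal sketch of the smallest string attractor problem. A set of text positions is a string attractor if every distinct substring has at least one occurrence that covers a chosen position. This can be read as a hitting-set problem: one "at least one of these positions" constraint per distinct substring, with the goal of choosing as few positions as possible. In MAX-SAT terms the constraints would be hard clauses and each position would carry a soft clause preferring it unchosen. This is only one natural way to view the problem; the abstract does not spell out the paper's encoding, and the sketch below uses plain exhaustive search rather than a MAX-SAT solver. The function names `attractor_clauses` and `smallest_attractor` are hypothetical, and positions are 0-indexed.

```python
from itertools import combinations


def attractor_clauses(text):
    """Return one position set per distinct substring.

    Each set holds every position covered by some occurrence of that
    substring; an attractor must contain at least one position from each set.
    """
    n = len(text)
    covered = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            # text[i:j] occurs at positions i..j-1
            covered.setdefault(text[i:j], set()).update(range(i, j))
    # Deduplicate identical constraints.
    return {frozenset(c) for c in covered.values()}


def smallest_attractor(text):
    """Exhaustively search for a minimum-size string attractor.

    Exponential time: intended only for very short strings, as an
    illustration of the problem, not as a practical method.
    """
    clauses = attractor_clauses(text)
    n = len(text)
    for k in range(n + 1):
        for candidate in combinations(range(n), k):
            chosen = set(candidate)
            if all(chosen & clause for clause in clauses):
                return list(candidate)
    return None


if __name__ == "__main__":
    for t in ["aaaa", "abab", "abc"]:
        print(t, smallest_attractor(t))
```

The exhaustive search grows exponentially with the text length, which illustrates why the abstract calls straightforward computation infeasible and why delegating the search to a solver is attractive.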
Taxonomy
Topics: Algorithms and Data Compression · Web Data Mining and Analysis · Natural Language Processing Techniques
