Practical evaluation of Lyndon factors via alphabet reordering
Marcelo K. Albertini, Felipe A. Louza

TL;DR
This paper studies how the choice of alphabet ordering affects the Lyndon factorization of a string. For most reorderings the number of Lyndon factors is small and the longest factor can be as long as the input itself, which is unfavorable for algorithms and indexes that depend on the number of Lyndon factors.
Contribution
It provides an empirical evaluation of Lyndon factorization under different alphabet reorderings on Pizza & Chili datasets, and reports results for randomized alphabet permutations as a baseline for assessing heuristics that modify the factorization via alphabet reordering.
Findings
Most alphabet reorderings result in a small number of Lyndon factors.
The longest Lyndon factor can be as long as the input string.
Randomized alphabet permutations can serve as a baseline for evaluating reordering heuristics.
Abstract
We evaluate the influence of different alphabet orderings on the Lyndon factorization of a string. Experiments with Pizza & Chili datasets show that for most alphabet reorderings, the number of Lyndon factors is usually small, and the length of the longest Lyndon factor can be as large as the input string, which is unfavorable for algorithms and indexes that depend on the number of Lyndon factors. We present results with randomized alphabet permutations that can be used as a baseline to assess the effectiveness of heuristics and methods designed to modify the Lyndon factorization of a string via alphabet reordering.
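To make the effect of alphabet reordering concrete, the following is a minimal sketch, not the authors' implementation. It uses Duval's standard linear-time algorithm for Lyndon factorization, with the alphabet order passed in as a list of characters. The function names `lyndon_factors` and `random_order_baseline`, the `trials` and `seed` parameters, and the choice of statistics (factor count and longest factor length) are assumptions made for illustration only.

```python
import random


def lyndon_factors(s, order=None):
    """Lyndon factorization of s via Duval's algorithm.

    `order` lists the alphabet from smallest to largest symbol; it is a
    hypothetical parameter for this sketch. If omitted, the usual
    lexicographic order of the characters is used.
    """
    if order is None:
        order = sorted(set(s))
    rank = {c: i for i, c in enumerate(order)}
    r = [rank[c] for c in s]  # compare symbols by rank, not by code point
    n = len(r)
    factors = []
    i = 0
    while i < n:
        j, k = i + 1, i
        # Extend the current run of repetitions of a Lyndon word.
        while j < n and r[k] <= r[j]:
            k = i if r[k] < r[j] else k + 1
            j += 1
        # Emit each complete repetition of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def random_order_baseline(s, trials=100, seed=0):
    """For `trials` random alphabet permutations, return a list of
    (number of factors, length of longest factor) pairs."""
    rng = random.Random(seed)
    alphabet = sorted(set(s))
    stats = []
    for _ in range(trials):
        order = alphabet[:]
        rng.shuffle(order)
        fs = lyndon_factors(s, order)
        stats.append((len(fs), max((len(f) for f in fs), default=0)))
    return stats


print(lyndon_factors("banana"))                   # ['b', 'an', 'an', 'a']
print(lyndon_factors("banana", ["b", "a", "n"]))  # ['banana']
```

In this toy example, the order a < b < n gives four factors, while the order b < a < n makes the whole string a single Lyndon factor, which mirrors the observation that the longest factor can be as long as the input. The pairs returned by `random_order_baseline` could then be summarized (for example by mean or median) and compared against the output of a reordering heuristic.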
Taxonomy
Topics: Algorithms and Data Compression · Text and Document Classification Technologies · Semigroups and Automata Theory
