Alternative Algorithms for Lyndon Factorization
Sukhpal Singh Ghuman, Emanuele Giaquinta, Jorma Tarhio

TL;DR
This paper introduces two variations of Duval's algorithm for Lyndon factorization: one for small alphabets that skips over runs of the smallest character, and one that operates directly on run-length encoded strings.
Contribution
It adapts Duval's algorithm to two classes of input, strings over small alphabets and run-length encoded strings, and obtains faster Lyndon factorization for each.
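Both algorithms are described as variations of Duval's algorithm, so the baseline is worth having at hand. The block below is a standard textbook formulation of Duval's algorithm (our own sketch, not code from the paper; the name `duval` is ours). It keeps the current candidate factor start `i`, a scan position `j`, and a comparison position `k`, and outputs factors of length `j - k` once the scan finds a smaller character.

```python
def duval(s: str) -> list[str]:
    """Lyndon factorization of s via Duval's algorithm (linear time, constant extra space)."""
    n = len(s)
    i = 0
    factors = []
    while i < n:
        j, k = i + 1, i
        # Extend while s[i:j] remains a prefix of a power of a Lyndon word.
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i      # s[i:j+1] is itself a Lyndon word
            else:
                k += 1     # still repeating the current period
            j += 1
        # Emit copies of the period (length j - k) that lie entirely before k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors
```

For example, `duval("banana")` yields `["b", "an", "an", "a"]`.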
Findings
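On long DNA strings, the small-alphabet algorithm is more than ten times faster than Duval's original algorithm.
The run-length encoded algorithm computes the Lyndon factorization in time linear in the length of the run-length encoding, using constant space.
Both algorithms improve on Duval's original algorithm in their respective settings.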
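When comparing implementations like these, it helps to check outputs against the definition. A factorization of s is its Lyndon factorization when the factors concatenate to s, each factor is a Lyndon word (strictly smaller than each of its proper suffixes), and the factors are lexicographically non-increasing; by the Chen-Fox-Lyndon theorem this factorization is unique. The quadratic-time checker below is our own testing aid, not part of the paper, and both function names are hypothetical.

```python
def is_lyndon(w: str) -> bool:
    # A non-empty word is Lyndon iff it is strictly smaller than each proper suffix.
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def is_lyndon_factorization(s: str, factors: list[str]) -> bool:
    # Python's str comparison is lexicographic with a proper prefix ordered first,
    # which is the order used to define Lyndon words.
    return (
        "".join(factors) == s
        and all(is_lyndon(f) for f in factors)
        and all(a >= b for a, b in zip(factors, factors[1:]))
    )
```

For instance, `["b", "an", "an", "a"]` passes for `"banana"`, while `["ban", "ana"]` fails because `"ban"` is not a Lyndon word.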
Abstract
We present two variations of Duval's algorithm for computing the Lyndon factorization of a word. The first algorithm is designed for the case of small alphabets and is able to skip a significant portion of the characters of the string, for strings containing runs of the smallest character in the alphabet. Experimental results show that it is faster than Duval's original algorithm, more than ten times in the case of long DNA strings. The second algorithm computes, given a run-length encoded string $u$ of length $\rho$, the Lyndon factorization of $u$ in $O(\rho)$ time and constant space.
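The abstract does not spell out how the second algorithm works. As a hedged illustration of the general idea of running Duval's comparisons over runs instead of single characters, the sketch below (not the authors' algorithm) takes runs as `(char, length)` pairs. When the compared characters are equal it advances both positions to the nearest run end in one step. When `s[k] < s[j]` it skips the rest of the run at `j`, which is safe because the first character of the current Lyndon prefix is its smallest. The name `lyndon_rle` and the `(start, period, reps)` output format are our assumptions. Reporting repetitions compactly matters: a single run of `m` copies of one letter has `m` one-letter factors, so listing them one by one already costs time proportional to `m`. We have not verified that this sketch meets the $O(\rho)$ bound; the `bisect` lookups alone add a logarithmic factor.

```python
from bisect import bisect_right


def lyndon_rle(runs: list[tuple[str, int]]) -> list[tuple[int, int, int]]:
    """Hypothetical run-level Duval sketch. runs: (char, length) pairs, length >= 1.
    Returns (start, period, reps): s[start:start+period] repeated reps times."""
    starts = []
    n = 0
    for _, length in runs:
        starts.append(n)
        n += length

    def locate(p: int) -> tuple[str, int]:
        # Character at position p and the exclusive end of the run containing it.
        r = bisect_right(starts, p) - 1
        return runs[r][0], starts[r] + runs[r][1]

    factors = []
    i = 0
    while i < n:
        j, k = i + 1, i
        while j < n:
            ck, ek = locate(k)
            cj, ej = locate(j)
            if ck == cj:
                # Duval would repeat k += 1, j += 1 until a run ends: do it at once.
                step = min(ek - k, ej - j)
                k += step
                j += step
            elif ck < cj:
                # Duval sets k = i, j += 1; since s[i] <= s[k] < cj, the same
                # happens for the rest of this run, so jump j to the run's end.
                k = i
                j = ej
            else:
                break
        period = j - k
        reps = (k - i) // period + 1
        factors.append((i, period, reps))
        i += reps * period
    return factors
```

For `"banana"` encoded as six runs of length 1, the sketch returns `[(0, 1, 1), (1, 2, 2), (5, 1, 1)]`, i.e. `b`, `an` twice, `a`, matching Duval's output on the decoded string.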
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Semigroups and Automata Theory
