$O(n \log n)$-time text compression by LZ-style longest first substitution
Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

TL;DR
This paper introduces a faster $O(n \, \log n)$-time algorithm for LZ-LFS text compression, improving efficiency over previous quadratic-time methods and offering a simpler linear-time variant.
Contribution
It presents an $O(n \, \log n)$-time algorithm for LZ-LFS compression and a simplified linear-time version, enhancing speed and simplicity over prior approaches.
Findings
Achieved an $O(n \log n)$-time LZ-LFS compression algorithm.
Developed a linear-time simplified LZ-LFS variant.
Builds on LZ-LFS, which Mauer et al. showed can achieve a better compression ratio on repetitive texts than some state-of-the-art compression algorithms.
Abstract
Mauer et al. [A Lempel-Ziv-style Compression Method for Repetitive Texts, PSC 2017] proposed a hybrid text compression method called LZ-LFS, which has features of both Lempel-Ziv 77 factorization and longest first substitution. They showed that LZ-LFS can achieve a better compression ratio for repetitive texts than some state-of-the-art compression algorithms. The drawback of Mauer et al.'s method is that their LZ-LFS compression algorithm takes $O(n^2)$ time on an input string of length $n$. In this paper, we show a faster LZ-LFS compression algorithm that works in $O(n \log n)$ time. We also propose a simpler version of LZ-LFS that can be computed in $O(n)$ time.
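For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal sketch of each in isolation, using their standard textbook definitions. It is not the LZ-LFS procedure of Mauer et al., nor the algorithm of this paper, neither of which is specified in the abstract. The function names are hypothetical, and both routines are deliberately naive rather than the efficient suffix-structure-based methods such papers rely on.

```python
def lz77_factorize(text):
    """Greedy LZ77-style factorization (naive illustration).

    Each factor is either a fresh single character (str) or a pair
    (source_position, length) that copies the longest prefix of the
    remaining text also occurring at an earlier starting position.
    The copied region may overlap the factor itself.
    """
    factors = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # try every earlier starting position
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len == 0:
            factors.append(text[i])
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def lz77_decode(factors):
    """Invert lz77_factorize by copying character by character."""
    out = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            src, length = f
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


def longest_nonoverlapping_repeat(text):
    """Longest substring with two non-overlapping occurrences (naive).

    This is the kind of substring that a longest-first substitution
    step would pick for replacement. Returns "" if there is none.
    """
    n = len(text)
    for length in range(n // 2, 0, -1):  # longest candidates first
        first_seen = {}
        for i in range(n - length + 1):
            sub = text[i:i + length]
            if sub in first_seen:
                if i - first_seen[sub] >= length:
                    return sub
            else:
                first_seen[sub] = i
    return ""
```

On a repetitive input such as `"abababab"`, the factorization collapses almost the whole string into a single self-overlapping copy, which is why methods of this family suit repetitive texts.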
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · Natural Language Processing Techniques
