Substring Complexities on Run-length Compressed Strings
Akiyoshi Kawamoto, Tomohiro I

TL;DR
This paper introduces an efficient method to compute the substring complexity measure $\delta$ directly on run-length compressed strings, in sorting time and in space linear in the number of runs, which supports analysis of string repetitiveness without decompressing the input.
Contribution
It presents an algorithm that computes $\delta$ directly from a run-length compressed string of size $r$ in $C_{\mathsf{sort}}(r, n)$ time and $O(r)$ space, improving the analysis of repetitive string structures.
Findings
$\delta$ can be computed in $C_{\mathsf{sort}}(r, n)$ time
The algorithm uses $O(r)$ space
Efficient analysis of highly-repetitive strings is enabled
Abstract
Let $S_T(k)$ denote the set of distinct substrings of length $k$ in a string $T$, then the $k$-th substring complexity is defined by its cardinality $|S_T(k)|$. Recently, $\delta = \max \{ |S_T(k)| / k : k \ge 1 \}$ is shown to be a good compressibility measure of highly-repetitive strings. In this paper, given $T$ of length $n$ in the run-length compressed form of size $r$, we show that $\delta$ can be computed in $C_{\mathsf{sort}}(r, n)$ time and $O(r)$ space, where $C_{\mathsf{sort}}(r, n)$ is the time complexity for sorting $r$ $O(\lg n)$-bit integers in $O(r)$ space in the Word-RAM model with word size $\Omega(\lg n)$.
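To make the definitions in the abstract concrete, the following Python sketch run-length encodes a string and evaluates $\delta$ by brute force straight from the definition. It is not the paper's algorithm: it works on the uncompressed string and takes time polynomial in $n$, whereas the paper works on the run-length compressed form. The function names `run_length_encode`, `substring_complexity`, and `delta_bruteforce` are hypothetical, chosen for illustration, and the input is assumed to be non-empty.

```python
from fractions import Fraction
from itertools import groupby


def run_length_encode(text):
    """Return the run-length encoding of text as (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]


def substring_complexity(text, k):
    """Return |S_T(k)|, the number of distinct length-k substrings of text."""
    return len({text[i:i + k] for i in range(len(text) - k + 1)})


def delta_bruteforce(text):
    """Return delta = max over k >= 1 of |S_T(k)| / k as an exact fraction.

    Illustrative only: this enumerates every substring length on the
    uncompressed string and assumes text is non-empty.
    """
    return max(Fraction(substring_complexity(text, k), k)
               for k in range(1, len(text) + 1))


if __name__ == "__main__":
    text = "aaabbbbaa"
    print(run_length_encode(text))   # the runs of the string, so r = 3 here
    print(delta_bruteforce(text))    # exact value of delta for this string
```

For `"aaabbbbaa"` the encoding has $r = 3$ runs while $n = 9$. Exact fractions are used so that the maximum over $k$ is not affected by floating-point rounding.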
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing
