Minimal Absent Words on Run-Length Encoded Strings
Tooru Akagi, Kouta Okabe, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga

TL;DR
This paper develops methods to compute minimal absent words directly from run-length encoded strings, providing bounds and a compact data structure that efficiently reports all such words.
Contribution
It introduces the first approach for computing MAWs from RLE-compressed strings, including bounds and an optimal reporting data structure.
Findings
Bounds for the number of MAWs in most RLE-based categories
A compact data structure with O(m) space
Efficient reporting of all MAWs in optimal time
Abstract
A string w is called a minimal absent word (MAW) for another string T if w does not occur (as a substring) in T and any proper substring of w occurs in T. State-of-the-art data structures for reporting the set MAW(T) of MAWs from a given string T of length n require O(n) space, can be built in O(n) time, and can report all MAWs in O(|MAW(T)|) time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same characters a by a^p where p is the length of the run. Let m be the RLE-size of string T. After categorizing the MAWs into five disjoint sets M_1, M_2, M_3, M_4, M_5 using…
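To make the two definitions in the abstract concrete, here is a minimal Python sketch. It is a naive brute-force enumeration for illustration only, not the paper's RLE-based method or its O(m)-space data structure. The function names `rle`, `substrings`, and `naive_maws` are hypothetical, and the default alphabet is assumed to be the set of characters occurring in the input.

```python
from itertools import groupby


def rle(text):
    """Run-length encode text as a list of (character, run_length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def substrings(text):
    """Return the set of all substrings of text, including the empty string."""
    subs = {""}
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            subs.add(text[i:j])
    return subs


def naive_maws(text, alphabet=None):
    """Enumerate all minimal absent words of text by brute force.

    Assumption: if no alphabet is given, it is the set of characters in text,
    so no MAW of length 1 exists in that case.
    """
    if alphabet is None:
        alphabet = sorted(set(text))
    subs = substrings(text)
    maws = set()

    # Length 1: a character of the alphabet that never occurs in text.
    for c in alphabet:
        if c not in subs:
            maws.add(c)

    # Length >= 2: w = a + u + b is a MAW exactly when a + u and u + b
    # both occur in text but w itself does not.
    for left in subs:
        if not left:
            continue
        u = left[1:]
        for b in alphabet:
            w = left + b
            if w not in subs and (u + b) in subs:
                maws.add(w)
    return maws


if __name__ == "__main__":
    t = "aab"
    print(rle(t))                 # RLE-size m is len(rle(t))
    print(sorted(naive_maws(t)))
```

This sketch takes time polynomial in the uncompressed length n, so it is useful only for checking small examples by hand; the point of the paper is to avoid working on the uncompressed string.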
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Semigroups and Automata Theory
