Minimal Absent Words on Run-Length Encoded Strings

Tooru Akagi; Kouta Okabe; Takuya Mieno; Yuto Nakashima; Shunsuke Inenaga

arXiv:2202.13591·cs.DS·April 18, 2022

Open Access

TL;DR

This paper develops methods to compute minimal absent words directly from run-length encoded strings, providing bounds and a compact data structure that efficiently reports all such words.

Contribution

It introduces the first approach for computing MAWs from RLE-compressed strings, including bounds and an optimal reporting data structure.

Findings

01 Bounds for the number of MAWs in most RLE-based categories

02 A compact data structure with $O(m)$ space

03 Efficient reporting of all MAWs in optimal time

Abstract

A string $w$ is called a minimal absent word (MAW) for another string $T$ if $w$ does not occur (as a substring) in $T$ and every proper substring of $w$ occurs in $T$. State-of-the-art data structures for reporting the set $\mathsf{MAW}(T)$ of MAWs from a given string $T$ of length $n$ require $O(n)$ space, can be built in $O(n)$ time, and can report all MAWs in $O(|\mathsf{MAW}(T)|)$ time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same character $a$ by $a^{p}$, where $p$ is the length of the run. Let $m$ be the RLE-size of string $T$. After categorizing the MAWs into five disjoint sets $M_{1}$, $M_{2}$, $M_{3}$, $M_{4}$, $M_{5}$ using…
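The two definitions in the abstract (MAW and RLE) can be made concrete with a small brute-force sketch. This is not the paper's algorithm or its $O(m)$-space data structure: it enumerates all substrings of an uncompressed $T$, so it is only suitable for short strings. The function names `rle` and `minimal_absent_words` and the explicit `alphabet` parameter are assumptions introduced here for illustration.

```python
from itertools import groupby


def rle(text):
    """Run-length encode text as a list of (character, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def minimal_absent_words(text, alphabet):
    """Return the set of MAWs of text over alphabet, by brute force.

    A word of length 1 is a MAW if its character does not occur in text.
    A longer word a + u + b is a MAW if a + u and u + b both occur in text
    but a + u + b does not; every proper substring of a + u + b lies inside
    a + u or u + b, so this matches the definition.
    """
    n = len(text)
    # All substrings of text, including the empty string.
    substrings = {text[i:j] for i in range(n + 1) for j in range(i, n + 1)}

    maws = {c for c in alphabet if c not in substrings}
    for u in substrings:
        for a in alphabet:
            if a + u not in substrings:
                continue
            for b in alphabet:
                if u + b in substrings and a + u + b not in substrings:
                    maws.add(a + u + b)
    return maws
```

For example, with $T = a^{2}b^{1}$ (the string `aab`), `rle("aab")` gives `[('a', 2), ('b', 1)]`, so the RLE size is $m = 2$, and `sorted(minimal_absent_words("aab", "ab"))` gives `['aaa', 'ba', 'bb']`.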

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Semigroups and Automata Theory