Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph
Hiroki Arimura, Shunsuke Inenaga, Yasuaki Kobayashi, Yuto Nakashima, Mizuki Sue

TL;DR
This paper studies the computational complexity of converting the Compact Directed Acyclic Word Graph (CDAWG), an automata-based text index, into several other compressed text indexing structures suited to highly repetitive texts, and gives optimal algorithms for these conversions.
Contribution
It introduces the first optimal algorithms for converting the CDAWG into multiple compressed indexing structures in O(e) time, linear in the size e of the CDAWG, without access to the text.
Findings
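Conversion from the CDAWG to each of the other structures takes O(e) worst-case time, where e is the size of the CDAWG.
Techniques are developed for enumerating a particular subset of the text's suffixes using the CDAWG.
All conversions use O(e) words of working space, matching the size of the input CDAWG.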
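The time bound above is stated in terms of e, the size of the CDAWG. To give a feel for that parameter, here is a minimal brute-force sketch for tiny texts. It takes e to be the number of CDAWG edges and assumes the characterization, common in the CDAWG literature, of e as the number of right extensions of maximal repeats, with the empty string standing in for the source node. The helper name cdawg_edge_count is hypothetical. This cubic-time count is only a reference point, not the paper's O(e)-time machinery.

```python
def cdawg_edge_count(text):
    """Brute-force count of e, the number of CDAWG edges, for a tiny text.

    Hypothetical helper for illustration only. It relies on the assumed
    characterization of e as the number of right extensions of maximal
    repeats, with the empty string playing the role of the source node.
    `text` should end with a unique terminator such as '$'.
    Roughly O(n^3) time: not the paper's method, just a reference count.
    """
    n = len(text)
    left = {}   # substring -> set of characters seen just before it
    right = {}  # substring -> set of characters seen just after it
    for i in range(n + 1):
        for j in range(i, n + 1):
            x = text[i:j]
            # None means "no character": the occurrence touches the
            # start (left side) or the end (right side) of the text.
            left.setdefault(x, set()).add(text[i - 1] if i > 0 else None)
            right.setdefault(x, set()).add(text[j] if j < n else None)

    edges = 0
    for x, followers in right.items():
        is_source = x == ""
        # Maximal repeat: at least two distinct left and right contexts
        # (None counts as a context, so a prefix occurrence is distinct).
        is_maximal_repeat = len(left[x]) >= 2 and len(followers) >= 2
        if is_source or is_maximal_repeat:
            # One outgoing CDAWG edge per distinct following character.
            edges += len(followers - {None})
    return edges


print(cdawg_edge_count("abab$"))  # 5
```

For "abab$" the only maximal repeat besides the empty string is "ab". The source contributes 3 edges (a, b, $) and "ab" contributes 2 (a, $), so e = 5.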
Abstract
In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, called the Compact Directed Acyclic Word Graph (CDAWG), of size e for a text T of length n into other text indexing structures for the same text, suitable for highly repetitive texts: the run-length BWT of size r, the irreducible PLCP array of size r, and the quasi-irreducible LPF array of size e, as well as the lex-parse of size O(r) and the LZ77-parse of size z, where r, z ≤ e. As main results, we showed that the above structures can be optimally computed from either the CDAWG for T stored in read-only memory or its self-index version of size e without the text in O(e) worst-case time and words of working space. To obtain the above results, we devised techniques for enumerating a particular subset of suffixes in the lexicographic and…
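To make two of the target structures concrete, here is a naive sketch for tiny inputs. It builds the BWT by sorting suffixes, run-length encodes it to expose r, and derives a greedy LZ77 parse from a longest-previous-factor (LPF) array to expose z. Unlike the paper's conversions, it reads the text directly and is far from O(e) time. The function names are hypothetical, and the LZ77 variant assumed here (each phrase is either a fresh character or the longest factor that also starts earlier, overlaps allowed) is one of several in use.

```python
def bwt(text):
    """Burrows-Wheeler transform by naive suffix sorting.

    Assumes `text` ends with a unique terminator that sorts before every
    other character (e.g. '$'), so sorting suffixes equals sorting rotations.
    """
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # text[i - 1] wraps around to the terminator when i == 0.
    return "".join(text[i - 1] for i in sa)


def run_length_encode(s):
    """Run-length encode s into (character, run length) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def lpf_array(text):
    """Naive LPF array: lpf[i] is the length of the longest prefix of
    text[i:] that also starts at some j < i (the two may overlap)."""
    n = len(text)
    lpf = [0] * n
    for i in range(n):
        for j in range(i):
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            lpf[i] = max(lpf[i], k)
    return lpf


def lz77_phrases(text):
    """Greedy LZ77 parse read off the LPF array: the phrase starting at i
    has length lpf[i], or is a single fresh character when lpf[i] == 0."""
    lpf = lpf_array(text)
    phrases, i = [], 0
    while i < len(text):
        length = max(1, lpf[i])
        phrases.append(text[i:i + length])
        i += length
    return phrases


text = "abab$"
b = bwt(text)
print(b, len(run_length_encode(b)))  # bb$aa 3
print(lz77_phrases(text))            # ['a', 'b', 'ab', '$']
```

For "abab$" the BWT "bb$aa" has r = 3 runs and the parse a | b | ab | $ has z = 4 phrases.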
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing
