Linear-time Computation of DAWGs, Symmetric Indexing Structures, and MAWs for Integer Alphabets
Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

TL;DR
This paper presents new linear-time algorithms for constructing DAWGs, suffix trees, affix trees, and related structures for strings over integer alphabets, improving efficiency in text indexing tasks.
Contribution
It introduces the first linear-time algorithms for constructing DAWGs and affix trees from suffix trees for integer alphabets, and applies these to compute minimal absent words efficiently.
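To make the main object concrete, here is a minimal sketch of a DAWG built with the classic online construction usually attributed to Blumer et al. This is a swapped-in technique, not the paper's method. It adds the letters of y one at a time instead of deriving the DAWG from the suffix tree. It also stores edges in Python dicts, so its linear-time behavior depends on hashing rather than on the integer-alphabet techniques the paper targets. The names `State`, `build_dawg`, and `accepts` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    length: int                                # length of the longest string in this node's class
    link: int = -1                             # suffix link (-1 only for the source)
    edges: dict = field(default_factory=dict)  # letter -> target node id


def build_dawg(y):
    """Online DAWG construction for a str or a list of integer letters."""
    states = [State(0)]
    last = 0
    for c in y:
        cur = len(states)
        states.append(State(states[last].length + 1))
        p = last
        # Walk the suffix-link path, adding c-edges to the new node where missing.
        while p != -1 and c not in states[p].edges:
            states[p].edges[c] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].edges[c]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # q's class must be split: clone it to hold the shorter strings.
                clone = len(states)
                states.append(State(states[p].length + 1, states[q].link, dict(states[q].edges)))
                while p != -1 and states[p].edges.get(c) == q:
                    states[p].edges[c] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    # Accepting nodes are those on the suffix-link path from the node for all of y.
    finals = set()
    p = last
    while p != -1:
        finals.add(p)
        p = states[p].link
    return states, finals


def accepts(states, finals, w):
    """True iff w is a suffix of the indexed string."""
    node = 0
    for c in w:
        if c not in states[node].edges:
            return False
        node = states[node].edges[c]
    return node in finals


states, finals = build_dawg("abcbc")
print(len(states), accepts(states, finals, "cbc"), accepts(states, finals, "cb"))  # prints: 8 True False
```

The string "cb" occurs in "abcbc" but is not a suffix, so it is rejected. The 8 nodes stay within the usual bound of 2n - 1 nodes for n >= 2.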
Findings
DAWG construction from suffix tree in O(n) time for integer alphabets
Linear-time construction of affix trees and symmetric CDAWGs
Efficient computation of minimal absent words in O(n + |MAW|) time
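As a reference point for the MAW result, the following sketch uses the classic DAWG-based characterization of minimal absent words, in the spirit of Crochemore, Mignosi, and Restivo. Take every non-source node p with suffix link s, and every letter a that labels an edge out of s but not out of p. Then the shortest string of p, followed by a, is a MAW. Single letters of the alphabet that never occur in y are also MAWs. This is not the paper's O(n + |MAW|) integer-alphabet algorithm, because it may scan more edges than there are MAWs. `State`, `build_dawg`, and `minimal_absent_words` are hypothetical names, and the `end` field (one end position per node) is an assumption added so the shortest string can be sliced out of y.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    length: int                                # length of the longest string in the class
    end: int                                   # an end position (0-based) shared by the class
    link: int = -1                             # suffix link
    edges: dict = field(default_factory=dict)  # letter -> target node id


def build_dawg(y):
    """Online DAWG construction that also records one end position per node."""
    states = [State(0, -1)]
    last = 0
    for i, c in enumerate(y):
        cur = len(states)
        states.append(State(states[last].length + 1, i))
        p = last
        while p != -1 and c not in states[p].edges:
            states[p].edges[c] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].edges[c]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Every end position of q is also an end position of its clone.
                clone = len(states)
                states.append(State(states[p].length + 1, states[q].end,
                                    states[q].link, dict(states[q].edges)))
                while p != -1 and states[p].edges.get(c) == q:
                    states[p].edges[c] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states


def minimal_absent_words(y, alphabet=None):
    """All MAWs of y as sorted tuples; the alphabet defaults to the letters of y."""
    states = build_dawg(y)
    sigma = set(y) if alphabet is None else set(alphabet)
    # Length-1 MAWs: alphabet letters that never occur in y.
    maws = [(a,) for a in sigma if a not in states[0].edges]
    for p in range(1, len(states)):
        s = states[p].link
        shortest = states[s].length + 1            # length of the shortest string of p
        end = states[p].end
        u = tuple(y[end - shortest + 1:end + 1])   # that shortest string, written c.v
        for a in states[s].edges:
            if a not in states[p].edges:
                # c.v and v.a both occur in y, but c.v.a does not.
                maws.append(u + (a,))
    return sorted(maws)


print(["".join(w) for w in minimal_absent_words("abaab")])  # ['aaa', 'aaba', 'bab', 'bb']
```

Because the DAWG is keyed by letters, the same code also accepts a list of integers for y.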
Abstract
The directed acyclic word graph (DAWG) of a string y of length n is the smallest (partial) DFA which recognizes all suffixes of y with only O(n) nodes and edges. In this paper, we show how to construct the DAWG for the input string y from the suffix tree for y, in O(n) time for integer alphabets of polynomial size in n. In so doing, we first describe a folklore algorithm which, given the suffix tree for y, constructs the DAWG for the reversed string of y in O(n) time. Then, we present our algorithm that builds the DAWG for y in O(n) time for integer alphabets, from the suffix tree for y. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional…
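The folklore construction mentioned in the abstract relies on a well-known duality. The nodes of the suffix tree of y correspond one-to-one to the nodes of the DAWG of the reversed string, and the DAWG's edges correspond to the suffix tree's Weiner links. The brute-force check below covers only the node half of that correspondence. It treats every suffix of y as an explicit suffix-tree node, which is an assumption about how implicit suffixes are handled. It runs in cubic time and illustrates the correspondence only; it is not the O(n)-time folklore algorithm. The function names are hypothetical.

```python
def endpos_classes(r):
    """Group the substrings of r by end-position set; each group is one DAWG node."""
    n = len(r)
    classes = {}
    for i in range(n + 1):
        for j in range(i, n + 1):
            w = r[i:j]
            ends = frozenset(k + len(w) for k in range(n - len(w) + 1) if r[k:k + len(w)] == w)
            classes.setdefault(ends, set()).add(w)
    return list(classes.values())


def suffix_tree_node_strings(y):
    """Strings spelled by suffix-tree nodes of y: the root, right-branching
    substrings, and (with no end marker) every suffix of y."""
    n = len(y)
    subs = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    letters = set(y)
    branching = {x for x in subs if sum(x + c in subs for c in letters) >= 2}
    suffixes = {y[i:] for i in range(n + 1)}
    return branching | suffixes


def reversed_dawg_node_strings(y):
    """Longest member of each DAWG node of reversed(y), read back in y's direction."""
    return {max(cls, key=len)[::-1] for cls in endpos_classes(y[::-1])}


print(suffix_tree_node_strings("abaab") == reversed_dawg_node_strings("abaab"))  # True
```

For y = "abaab", both sides are the seven strings "", "a", "b", "ab", "aab", "baab", and "abaab".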
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Network Packet Processing and Optimization
