Linear-time Minimization of Wheeler DFAs
Jarno Alanko, Nicola Cotumaccio, Nicola Prezza

TL;DR
This paper introduces a linear-time algorithm for minimizing Wheeler DFAs, significantly improving efficiency over previous methods and enabling faster, more compact data structures for pattern matching in large datasets.
Contribution
The authors develop the first linear-time minimization algorithm for Wheeler DFAs, improving on the prior $O(n \log n)$ complexity inherited from general DFA minimization.
Findings
Reduces node count by up to 51% on DNA datasets
Implementation processes over 1 million nodes per second
Enables more efficient compressed data structures for pattern matching
Abstract
Wheeler DFAs (WDFAs) are a sub-class of finite-state automata which is playing an important role in the emerging field of compressed data structures: as opposed to general automata, WDFAs can be stored in just $\log \sigma + O(1)$ bits per edge, $\sigma$ being the alphabet's size, and support optimal-time pattern matching queries on the substring closure of the language they recognize. An important step to achieve further compression is minimization. When the input is a general deterministic finite-state automaton (DFA), the state-of-the-art is represented by the classic Hopcroft's algorithm, which runs in $O(n \log n)$ time. This algorithm stands at the core of the only existing minimization algorithm for Wheeler DFAs, which inherits its complexity. In this work, we show that the minimum WDFA equivalent to a given input WDFA can be computed in linear…
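To make the notion of DFA minimization concrete, here is a minimal sketch of Moore-style partition refinement for a general, complete DFA. It is neither Hopcroft's algorithm nor the linear-time WDFA algorithm of the paper: it takes up to $O(n^2 \sigma)$ time in the worst case and is included only to show what "merging equivalent states" means. The function name `moore_minimize` and the dictionary-based DFA encoding are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set, Tuple

State = Hashable
Symbol = Hashable


def moore_minimize(
    states: Sequence[State],
    alphabet: Sequence[Symbol],
    delta: Dict[Tuple[State, Symbol], State],
    accepting: Set[State],
) -> Dict[State, int]:
    """Map each state to a block id; states with the same id are equivalent.

    Assumes a complete DFA: delta[(q, a)] is defined for every state q
    and every symbol a (hypothetical encoding chosen for this sketch).
    """
    # Initial partition: accepting vs. non-accepting states.
    block = {q: int(q in accepting) for q in states}
    while True:
        # A state's signature is its current block together with the
        # blocks it reaches on each symbol of the alphabet.
        ids: Dict[tuple, int] = {}
        new_block: Dict[State, int] = {}
        for q in states:
            sig = (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            new_block[q] = ids.setdefault(sig, len(ids))
        # Refinement only ever splits blocks, so an unchanged block
        # count means the partition is stable.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


if __name__ == "__main__":
    # Toy DFA over {a, b} accepting strings that end in 'a'.
    # States 0 and 2 are equivalent and end up in the same block.
    states = [0, 1, 2]
    alphabet = ["a", "b"]
    delta = {
        (0, "a"): 1, (0, "b"): 2,
        (1, "a"): 1, (1, "b"): 2,
        (2, "a"): 1, (2, "b"): 2,
    }
    blocks = moore_minimize(states, alphabet, delta, {1})
    print(len(set(blocks.values())), "states in the minimized DFA")
```

The minimized automaton has one state per block, with transitions taken from any representative of that block. Hopcroft's algorithm reaches the same partition in $O(n \log n)$ time by choosing splitters more carefully.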
Taxonomy
Topics: DNA and Biological Computing · semigroups and automata theory · Network Packet Processing and Optimization
