Packed Acyclic Deterministic Finite Automata
Hiroki Shibata, Masakazu Ishihata, Shunsuke Inenaga

TL;DR
This paper introduces the packed ADFA (PADFA), a compact variant of the acyclic deterministic finite automaton that speeds up pattern searching and, when the dictionary is small relative to the number of states, uses fewer bits than a trie. Searching becomes fully time-optimal for sufficiently long patterns.
Contribution
The paper presents the packed ADFA (PADFA), a compact variant of ADFA that improves pattern search efficiency and space utilization by encoding specific paths as packed strings stored in contiguous memory.
Findings
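Pattern searching in PADFA is near time-optimal with a small additional overhead, and fully time-optimal for sufficiently long patterns.
A PADFA requires fewer bits than a trie when the dictionary size is small relative to the number of states in the PADFA.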
Empirical results show improved space and time efficiency on real datasets.
Abstract
An acyclic deterministic finite automaton (ADFA) is a data structure that represents a set of strings (i.e., a dictionary) and facilitates a pattern searching problem of determining whether a given pattern string is present in the dictionary. We introduce the packed ADFA (PADFA), a compact variant of ADFA, which is designed to achieve more efficient pattern searching by encoding specific paths as packed strings stored in contiguous memory. We theoretically demonstrate that pattern searching in PADFA is near time-optimal with a small additional overhead and becomes fully time-optimal for sufficiently long patterns. Moreover, we prove that a PADFA requires fewer bits than a trie when the dictionary size is relatively smaller than the number of states in the PADFA. Lastly, we empirically show that PADFAs improve both the space and time efficiency of pattern searching on real-world datasets.
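The idea of encoding paths as packed strings can be illustrated with a small sketch. The code below is not the paper's PADFA construction: it is an ordinary path-compressed trie, written under the assumption that "specific paths" can be approximated by chains of non-accepting states with a single outgoing transition, and it does not merge equivalent suffixes as an ADFA would. The names `Node`, `build_trie`, `pack`, and `contains` are hypothetical. It shows only how a search can compare a contiguous block of bytes in one step instead of following one transition per character.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    accepting: bool = False
    # Maps the first byte of an edge to (packed label, child node).
    # The label is a bytes object, so it lives in contiguous memory.
    edges: dict = field(default_factory=dict)


def build_trie(words):
    """Build a plain trie with one byte per edge."""
    root = Node()
    for word in words:
        node = root
        for byte in word.encode():
            if byte not in node.edges:
                node.edges[byte] = (bytes([byte]), Node())
            node = node.edges[byte][1]
        node.accepting = True
    return root


def pack(node):
    """Merge chains of non-accepting, single-successor states into one label."""
    for first, (label, child) in list(node.edges.items()):
        while not child.accepting and len(child.edges) == 1:
            (next_label, next_child), = child.edges.values()
            label += next_label
            child = next_child
        node.edges[first] = (label, child)
        pack(child)
    return node


def contains(root, pattern):
    """Return True if pattern is in the dictionary represented by root."""
    data = pattern.encode()
    node, i = root, 0
    while i < len(data):
        edge = node.edges.get(data[i])
        if edge is None:
            return False
        label, child = edge
        # One slice comparison replaces len(label) single-character steps.
        if data[i:i + len(label)] != label:
            return False
        i += len(label)
        node = child
    return node.accepting
```

For example, after `root = pack(build_trie(["pack", "packed", "path", "automaton"]))`, the word "automaton" is stored as a single packed label on one edge, and `contains(root, "automaton")` performs one lookup and one slice comparison rather than nine transitions.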
Taxonomy
Topics: semigroups and automata theory · DNA and Biological Computing · Cellular Automata and Applications
