Engineering faster double-array Aho-Corasick automata
Shunsuke Kanda, Koichi Akabe, Yusuke Oda

TL;DR
This paper reviews and improves double-array Aho-Corasick automata (DAACs) for faster multiple pattern matching. It proposes new techniques, analyzes them experimentally, and releases an open-source Rust library, Daachorse, that outperforms existing implementations.
Contribution
It provides a comprehensive review of DAAC implementation techniques, introduces new optimization methods, and develops a high-performance Rust library for pattern matching.
Findings
The best-performing combination of techniques differs from the combinations used in existing libraries.
Daachorse outperforms other AC-automaton implementations.
The new library is suitable for fast pattern matching applications.
Abstract
Multiple pattern matching in strings is a fundamental problem in text processing applications such as regular expressions and tokenization. This paper studies efficient implementations of double-array Aho-Corasick automata (DAACs), data structures for fast multiple pattern matching. The practical performance of a DAAC depends on careful design of the data structure, and many implementation techniques have been proposed thus far. The problem is that these techniques are scattered and have not been brought together. Since comprehensive descriptions and experimental analyses are unavailable, engineers face difficulties in implementing an efficient DAAC. In this paper, we review implementation techniques for DAACs and provide a comprehensive description of them. We also propose several new techniques for further improvement. We conduct exhaustive experiments on real-world datasets and…
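For readers unfamiliar with the underlying automaton, the following is a minimal sketch of a plain Aho-Corasick matcher in Rust. It assumes hash-map transitions for brevity, so it is not the double-array layout the paper studies and not the API of the paper's library. The names State, AhoCorasick, and find_overlapping are illustrative only. A double-array implementation would replace the per-state hash map with lookups into flat arrays.

```rust
use std::collections::{HashMap, VecDeque};

/// One automaton state: outgoing edges, failure link, and ids of patterns ending here.
#[derive(Default)]
struct State {
    next: HashMap<u8, usize>,
    fail: usize,
    out: Vec<usize>,
}

/// Illustrative teaching automaton (hypothetical type, hash-map transitions).
struct AhoCorasick {
    states: Vec<State>,
    lens: Vec<usize>, // byte length of each pattern, indexed by pattern id
}

impl AhoCorasick {
    fn new(patterns: &[&str]) -> Self {
        let mut states = vec![State::default()]; // state 0 is the root
        let mut lens = Vec::new();

        // Step 1: insert every pattern into a trie.
        for (id, p) in patterns.iter().enumerate() {
            let mut s = 0;
            for &b in p.as_bytes() {
                s = match states[s].next.get(&b).copied() {
                    Some(t) => t,
                    None => {
                        states.push(State::default());
                        let t = states.len() - 1;
                        states[s].next.insert(b, t);
                        t
                    }
                };
            }
            states[s].out.push(id);
            lens.push(p.len());
        }

        // Step 2: breadth-first traversal to set failure links.
        // Depth-1 states keep the default failure link (the root).
        let mut queue: VecDeque<usize> = states[0].next.values().copied().collect();
        while let Some(s) = queue.pop_front() {
            let edges: Vec<(u8, usize)> =
                states[s].next.iter().map(|(&b, &t)| (b, t)).collect();
            for (b, t) in edges {
                // Follow the parent's failure chain until a state with edge `b` is found.
                let mut f = states[s].fail;
                while f != 0 && !states[f].next.contains_key(&b) {
                    f = states[f].fail;
                }
                let target = states[f].next.get(&b).copied().unwrap_or(0);
                states[t].fail = target;
                // Inherit the outputs of the failure target (patterns that are proper suffixes).
                let inherited = states[target].out.clone();
                states[t].out.extend(inherited);
                queue.push_back(t);
            }
        }
        Self { states, lens }
    }

    /// Returns (start, end, pattern_id) for every occurrence, overlaps included.
    /// Positions are byte offsets; `end` is exclusive.
    fn find_overlapping(&self, text: &str) -> Vec<(usize, usize, usize)> {
        let mut hits = Vec::new();
        let mut s = 0;
        for (i, &b) in text.as_bytes().iter().enumerate() {
            while s != 0 && !self.states[s].next.contains_key(&b) {
                s = self.states[s].fail;
            }
            s = self.states[s].next.get(&b).copied().unwrap_or(0);
            for &id in &self.states[s].out {
                hits.push((i + 1 - self.lens[id], i + 1, id));
            }
        }
        hits
    }
}
```

Building the automaton from the patterns he, she, his, and hers and scanning the text ushers reports she at bytes 1..4, he at 2..4, and hers at 2..6. The scan reads each text byte once and falls back along failure links on a mismatch, which is the behavior that DAAC implementations aim to make fast.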
Taxonomy
Topics: Network Packet Processing and Optimization · Algorithms and Data Compression · Semigroups and automata theory
