On the Practical Power of Automata in Pattern Matching
Ora Amir, Amihood Amir, Aviezri Fraenkel, David Sarne

TL;DR
This paper compares the practical efficiency of the KMP automaton and the naive pattern matching algorithm on random and DNA texts. It finds that the KMP automaton outperforms naive matching in parameterized matching, contrary to traditional expectations, while the two perform similarly in exact matching.
Contribution
It provides a comprehensive analysis of the KMP automaton's performance in parameterized pattern matching on random and biological texts, highlighting surprising efficiency gains.
Findings
The KMP automaton is faster than the naive algorithm for parameterized matching on random texts.
In exact matching, the naive algorithm and the KMP automaton perform similarly, confirming the folklore.
Structured cases can significantly improve the automaton's efficiency.
Abstract
The classical pattern matching paradigm is that of seeking occurrences of one string - the pattern, in another - the text, where both strings are drawn from an alphabet set $\Sigma$. Assuming the text length is $n$ and the pattern length is $m$, this problem can naively be solved in time $O(nm)$. In Knuth, Morris and Pratt's seminal paper of 1977, an automaton was developed that allows solving this problem in time $O(n)$ for any alphabet. This automaton, which we will refer to as the KMP-automaton, has proven useful in solving many other problems. A notable example is the parameterized pattern matching model. In this model, a consistent renaming of symbols from $\Sigma$ is allowed in a match. The parameterized matching paradigm has proven useful in problems in software engineering, computer vision, and other applications. It has long been suspected that for texts where…
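To make the two approaches named in the abstract concrete, here is a minimal Python sketch of naive matching, of exact matching driven by the KMP failure function, and of a check for whether two equal-length strings match under a consistent one-to-one renaming of symbols. The function names are illustrative, the renaming check treats every symbol as renameable, and none of this is taken from the paper's own implementation or benchmark setup.

```python
def naive_search(text, pattern):
    """Return all start indices of pattern in text by testing every alignment."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def failure_function(pattern):
    """fail[j] is the length of the longest proper border of pattern[:j + 1]."""
    fail = [0] * len(pattern)
    k = 0  # length of the border currently being extended
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_search(text, pattern):
    """Return all start indices of pattern in text using the failure function."""
    if not pattern:
        return list(range(len(text) + 1))
    fail = failure_function(pattern)
    hits = []
    k = 0  # number of pattern symbols currently matched
    for i, symbol in enumerate(text):
        while k > 0 and symbol != pattern[k]:
            k = fail[k - 1]  # follow failure links; the text index never moves back
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going so overlapping occurrences are found
    return hits


def parameterized_equal(a, b):
    """True if a consistent one-to-one renaming of symbols maps a onto b.

    Assumption for illustration: every symbol may be renamed.
    """
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True
```

For example, both `naive_search("abababa", "aba")` and `kmp_search("abababa", "aba")` return `[0, 2, 4]`, and `parameterized_equal("abab", "xyxy")` is true while `parameterized_equal("ab", "xx")` is false because two different symbols cannot be renamed to the same one.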
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Network Packet Processing and Optimization
