A Report on Achieving Complete Regular-Expression Matching using Mealy Machines
Ricardo Almeida

TL;DR
This paper introduces an algorithm that constructs, from a regular expression, a Mealy machine that finds all matches in a data stream, including overlapping ones, while reading each input symbol only once.
Contribution
It presents a method for building minimal Mealy machines from regexes, enabling complete single-pass matching in which the machine can also report which pattern or sub-pattern matched.
Findings
Constructed Mealy machines find all pattern matches, including overlaps.
The approach reduces processing cost by reading each symbol only once.
Formalizing Mealy machines in terms of regular languages allows a minimal machine to be computed via a variation of DFA minimization.
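
To make the first two findings concrete, here is a minimal sketch in Python. It is not the paper's construction: the transition table is written by hand for the single pattern "aba" over the alphabet {a, b}, and the names TRANSITIONS and match_ends are hypothetical. It only illustrates how a Mealy machine emits an output on each transition, so overlapping matches are reported while each symbol is read once, whereas Python's re.finditer skips the overlapping occurrence.

```python
import re

# Hand-built Mealy machine for the pattern "aba" (illustrative assumption,
# not derived by the paper's algorithm).
# A state records the longest suffix of the input read so far that is also
# a prefix of "aba":  0 = "",  1 = "a",  2 = "ab".
# Each entry maps (state, symbol) -> (next_state, output); the output is
# True exactly when an occurrence of "aba" ends at the current symbol.
TRANSITIONS = {
    (0, "a"): (1, False),
    (0, "b"): (0, False),
    (1, "a"): (1, False),
    (1, "b"): (2, False),
    (2, "a"): (1, True),   # match ends here; keep "a" so an overlap can follow
    (2, "b"): (0, False),
}

def match_ends(stream):
    """Return the index of the last symbol of every match, overlaps included."""
    state = 0
    ends = []
    for i, symbol in enumerate(stream):   # each symbol is read exactly once
        state, output = TRANSITIONS[(state, symbol)]
        if output:
            ends.append(i)
    return ends

# Overlapping occurrences of "aba" in "ababa" end at indices 2 and 4.
mealy_result = match_ends("ababa")

# A conventional engine resumes after the first match and misses the second.
regex_result = [m.end() - 1 for m in re.finditer("aba", "ababa")]
```

On the input "ababa" the machine reports two match ends, while re.finditer reports only the first, because it resumes scanning after the end of the previous match.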
Abstract
While regexp matching is a powerful mechanism for finding patterns in data streams, regexp engines in general only find matches that do not overlap. Moreover, different forms of nondeterministic exploration, where symbols read are processed more than once, are often used, which can be costly in real-time matching. We present an algorithm that constructs from any regexp a Mealy machine that finds all matches while reading each input symbol only once. The machine computed can also detect and distinguish different patterns or sub-patterns inside patterns. Additionally, we show how to compute a minimal Mealy machine via a variation of DFA minimization, by formalizing Mealy machines in terms of regular languages.
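
The abstract does not spell out the minimization procedure, so the following is only a sketch of the general idea, using textbook partition refinement on a Mealy machine rather than the paper's formalization through regular languages. The function minimize_mealy and the example machine are hypothetical. Two states end up in the same block when they produce the same outputs on every input sequence, so they can be merged.

```python
def minimize_mealy(states, alphabet, delta, out):
    """Group equivalent states of a complete Mealy machine.

    delta[(s, a)] is the next state and out[(s, a)] the output emitted
    when symbol a is read in state s. Returns a dict state -> block id.
    """
    def renumber(signature):
        # Give states with equal signatures the same small integer id.
        ids = {}
        return {s: ids.setdefault(signature(s), len(ids)) for s in states}

    # Start by separating states whose immediate outputs differ.
    block = renumber(lambda s: tuple(out[(s, a)] for a in alphabet))
    while True:
        # Split blocks whose members lead to different blocks on some symbol.
        refined = renumber(
            lambda s: (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
        )
        if len(set(refined.values())) == len(set(block.values())):
            return refined            # no block was split: partition is stable
        block = refined

# Example (assumed for illustration): a machine for "aba" with a redundant
# state 3 that behaves exactly like state 1.
states = [0, 1, 2, 3]
alphabet = ["a", "b"]
delta = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 3, (2, "b"): 0,
         (3, "a"): 1, (3, "b"): 2}
out = {(s, a): False for s in states for a in alphabet}
out[(2, "a")] = True                  # a match of "aba" ends on this transition

blocks = minimize_mealy(states, alphabet, delta, out)
```

In this example states 1 and 3 fall into the same block, so the four-state machine collapses to three states. A production version would typically use Hopcroft-style refinement for better asymptotic cost; the simple loop above favors readability.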
Taxonomy
Topics: Algorithms and Data Compression · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
