Fast Pattern Matching with Epsilon Transitions
Nicola Cotumaccio

TL;DR
This paper extends efficient pattern matching algorithms on Wheeler automata to include epsilon transitions, significantly enhancing their expressive power while maintaining linear time and space complexity.
Contribution
It introduces a method to incorporate ε-transitions into pattern matching on Wheeler automata without increasing the asymptotic time or space complexity, using only two additional bitvectors.
Findings
Pattern matching with epsilon transitions is achievable in linear time.
Two additional bitvectors suffice to handle epsilon transitions efficiently.
The approach maintains space efficiency comparable to previous models.
Abstract
In the String Matching in Labeled Graphs (SMLG) problem, we need to determine whether a pattern string appears on a given labeled graph or a given automaton. Under the Orthogonal Vectors hypothesis, the SMLG problem cannot be solved in subquadratic time [ICALP 2019]. In typical bioinformatics applications, pattern matching algorithms should be both fast and space-efficient, so we need to determine useful classes of graphs on which the SMLG problem can be solved efficiently. In this paper, we improve on a recent result [STACS 2024] that shows how to solve the SMLG problem in linear time on the compressed representation of Wheeler generalized automata, a class of string-labeled automata that extend de Bruijn graphs. More precisely, we show how to remove the assumption that the automata contain no ε-transitions (namely, edges labeled with the empty string), while retaining the…
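To make the problem statement concrete, here is a minimal sketch of the SMLG decision problem on a graph that may contain ε-transitions. It is the textbook state-set simulation with ε-closures, not the paper's Wheeler-automaton algorithm, and it runs in time proportional to the pattern length times the number of edges, which is the quadratic behaviour the paper aims to avoid. The edge-triple representation, the use of the empty string as the ε label, and the function names `epsilon_closure` and `pattern_occurs` are assumptions made for illustration.

```python
from collections import defaultdict

EPS = ""  # label used for ε-transitions (an assumption of this sketch)


def epsilon_closure(states, eps_edges):
    """Return every state reachable from `states` using only ε-edges."""
    closure = set(states)
    stack = list(states)
    while stack:
        u = stack.pop()
        for v in eps_edges.get(u, ()):
            if v not in closure:
                closure.add(v)
                stack.append(v)
    return closure


def pattern_occurs(edges, pattern):
    """Decide whether `pattern` labels some walk in the graph.

    `edges` is a list of (source, label, target) triples, where each
    label is a single character or EPS. Naive baseline, not the
    linear-time method described in the paper.
    """
    eps_edges = defaultdict(list)
    char_edges = defaultdict(list)
    nodes = set()
    for u, label, v in edges:
        nodes.update((u, v))
        if label == EPS:
            eps_edges[u].append(v)
        else:
            char_edges[(u, label)].append(v)

    current = set(nodes)  # an occurrence may start at any node
    for c in pattern:
        # Follow ε-edges for free, then consume one pattern character.
        current = epsilon_closure(current, eps_edges)
        current = {v for u in current for v in char_edges.get((u, c), ())}
        if not current:
            return False
    return True


# Hypothetical example graph: 0 -a-> 1 -ε-> 2 -b-> 3 -c-> 0
example_edges = [(0, "a", 1), (1, EPS, 2), (2, "b", 3), (3, "c", 0)]
```

On this example graph, `pattern_occurs(example_edges, "ab")` succeeds only because the ε-edge from node 1 to node 2 can be crossed without consuming a character, while `pattern_occurs(example_edges, "ba")` fails.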
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Semigroups and Automata Theory
