Subset seed automaton
Gregory Kucherov, Laurent Noé, Mikhail Roytberg

TL;DR
This paper studies a compact pattern-matching automaton for seed-based similarity search. The automaton is much smaller than the one produced by the Aho-Corasick construction, can be built efficiently, and experiments show that it also applies to more general situations.
Contribution
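It studies a compact automaton for pattern matching in seed-based similarity search, introduced in "A unifying framework for seed sensitivity and its application to subset seeds". It gives an efficient construction method and shows that the automaton applies to more general situations.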
Findings
The automaton is significantly smaller than the Aho-Corasick automaton.
Efficient construction algorithm for the automaton.
Successful application to various similarity search scenarios.
Abstract
We study the pattern matching automaton introduced in "A unifying framework for seed sensitivity and its application to subset seeds" for the purpose of seed-based similarity search. We show that our definition provides a compact automaton, much smaller than the one obtained by applying the Aho-Corasick construction. We study properties of this automaton and present an efficient implementation of the automaton construction. We also present some experimental results and show that this automaton can be successfully applied to more general situations.
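To make concrete what such an automaton recognizes, here is a minimal Python sketch. It assumes the common DNA setting for subset seeds. Alignment letters are `#` (match), `@` (transition mismatch) and `_` (other mismatch). Seed letter `#` accepts only `#`, `@` accepts `#` or `@`, and `_` accepts anything. The sketch builds a generic subset-construction DFA whose states are sets of matched seed-prefix lengths. This is a plain baseline and NOT the paper's compact construction. It then computes seed sensitivity under an assumed Bernoulli model, in which alignment letters are drawn independently. The names `SEED_LETTERS`, `hits_naive`, `build_dfa`, `accepts` and `sensitivity`, and the probabilities used, are hypothetical.

```python
# Hypothetical alphabets: each seed letter accepts a subset of alignment letters.
SEED_LETTERS = {
    "#": {"#"},
    "@": {"#", "@"},
    "_": {"#", "@", "_"},
}
ALIGN_ALPHABET = "#@_"


def hits_naive(seed: str, alignment: str) -> bool:
    """True if the subset seed matches the alignment at some position."""
    k = len(seed)
    return any(
        all(a in SEED_LETTERS[s] for s, a in zip(seed, alignment[i:i + k]))
        for i in range(len(alignment) - k + 1)
    )


def build_dfa(seed: str):
    """Generic subset-construction DFA for alignments containing a seed hit.

    A state is the frozenset of prefix lengths j such that the last j
    alignment letters are accepted by seed[:j]. Reaching len(seed) leads to
    a single absorbing accepting state. (Baseline only, not the compact
    automaton studied in the paper.)
    """
    k = len(seed)
    start = frozenset({0})
    final = "HIT"
    trans = {}
    seen = {start}
    todo = [start]
    while todo:
        state = todo.pop()
        for a in ALIGN_ALPHABET:
            # Extend every partial match that accepts letter a; 0 restarts.
            nxt = {0} | {j + 1 for j in state if a in SEED_LETTERS[seed[j]]}
            target = final if k in nxt else frozenset(nxt)
            trans[(state, a)] = target
            if target != final and target not in seen:
                seen.add(target)
                todo.append(target)
    for a in ALIGN_ALPHABET:
        trans[(final, a)] = final
    return start, final, trans


def accepts(dfa, alignment: str) -> bool:
    """Run the DFA over the alignment and report whether a hit occurred."""
    start, final, trans = dfa
    state = start
    for a in alignment:
        state = trans[(state, a)]
    return state == final


def sensitivity(seed: str, probs: dict, n: int) -> float:
    """Probability that a random length-n alignment contains a hit.

    Assumes letters are i.i.d. with probabilities probs (Bernoulli model);
    propagates a probability distribution over DFA states for n steps.
    """
    start, final, trans = build_dfa(seed)
    dist = {start: 1.0}
    for _ in range(n):
        new = {}
        for state, p in dist.items():
            for a, pa in probs.items():
                t = trans[(state, a)]
                new[t] = new.get(t, 0.0) + p * pa
        dist = new
    return dist.get(final, 0.0)


if __name__ == "__main__":
    seed = "#@_#"
    _, _, trans = build_dfa(seed)
    print("DFA states:", len({s for s, _ in trans}))
    print("sensitivity:", sensitivity(seed, {"#": 0.7, "@": 0.1, "_": 0.2}, 64))
```

Because the DFA is read once per alignment letter, the sensitivity computation costs O(n × |states| × |alphabet|). This is why the number of automaton states matters, and it is the quantity the paper's compact automaton aims to reduce.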
