Construction of minimal DFAs from biological motifs
Tobias Marschall

TL;DR
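This paper presents a method for efficiently constructing minimal deterministic finite automata (DFAs) from biological motifs. The motif is first turned into a simple non-deterministic finite automaton (NFA), and the standard subset construction then yields a minimal DFA directly. The approach applies to common bioinformatics patterns such as generalized strings and Hamming neighborhoods.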
Contribution
The paper introduces a class of NFAs, called simple NFAs, that is tailored to biological motif patterns, and proves that applying the subset construction to a simple NFA yields a minimal DFA.
Findings
Subset construction from simple NFAs produces minimal DFAs.
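Simple NFAs can be constructed from (sets of) generalized strings and from (generalized) strings with a Hamming neighborhood.
Because the subset construction already yields a minimal DFA, no separate minimization step is needed, which makes automaton construction for biological motifs more efficient.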
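To make the claim concrete, here is a minimal sketch in Python. It assumes the DNA alphabet, represents a generalized string as a list of allowed-letter sets, and builds the usual search NFA for Σ*·p (states 0..m, with a self-loop on state 0). The names `generalized_string_nfa`, `subset_construction` and `count_minimal_states` are hypothetical, and the NFA shown is not claimed to be the paper's exact definition of a simple NFA. The sketch applies the textbook subset construction (reachable subsets only) and then counts Moore equivalence classes, so one can check on a small instance that no two DFA states can be merged.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

ALPHABET = "ACGT"  # assumption: DNA alphabet

def generalized_string_nfa(pattern: List[Set[str]], alphabet: str = ALPHABET):
    """NFA for Sigma* . pattern, where `pattern` is a generalized string
    (one set of allowed letters per position). States are 0..m; state 0
    loops on every letter so a match may start anywhere in the text."""
    delta: Dict[Tuple[int, str], Set[int]] = {(0, c): {0} for c in alphabet}
    for i, allowed in enumerate(pattern):
        for c in allowed:
            delta.setdefault((i, c), set()).add(i + 1)
    return 0, delta, {len(pattern)}

def subset_construction(start, delta, accepting, alphabet: str = ALPHABET):
    """Textbook subset construction, exploring only reachable subsets.
    Returns a transition table (list of {letter: state}) and the final states."""
    subsets: List[FrozenSet[int]] = [frozenset([start])]
    index = {subsets[0]: 0}
    table: List[Dict[str, int]] = []
    k = 0
    while k < len(subsets):  # `subsets` grows while we scan it
        row = {}
        for c in alphabet:
            succ = frozenset(r for q in subsets[k] for r in delta.get((q, c), ()))
            if succ not in index:
                index[succ] = len(subsets)
                subsets.append(succ)
            row[c] = index[succ]
        table.append(row)
        k += 1
    final = {s for s, subset in enumerate(subsets) if subset & accepting}
    return table, final

def count_minimal_states(table, final, alphabet: str = ALPHABET) -> int:
    """Moore partition refinement on a complete DFA whose states are all
    reachable; returns the number of states of the equivalent minimal DFA."""
    block = [1 if q in final else 0 for q in range(len(table))]
    while True:
        # A state's signature: its own block plus the blocks of its successors.
        sigs = [(block[q],) + tuple(block[table[q][c]] for c in alphabet)
                for q in range(len(table))]
        ids: Dict[tuple, int] = {}
        new_block = [ids.setdefault(sig, len(ids)) for sig in sigs]
        if len(ids) == len(set(block)):  # no block was split: partition is stable
            return len(ids)
        block = new_block

motif = [{"A"}, {"C", "G"}, {"T"}]  # the generalized string A[CG]T
table, final = subset_construction(*generalized_string_nfa(motif))
print(len(table), count_minimal_states(table, final))  # → 4 4
```

For A[CG]T the reachable subsets are {0}, {0,1}, {0,2} and {0,3}, and refinement keeps all four apart, which is consistent with the paper's result for generalized strings. A single instance is only a sanity check, not a proof.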
Abstract
Deterministic finite automata (DFAs) are constructed for various purposes in computational biology. Little attention, however, has been given to the efficient construction of minimal DFAs. In this article, we define simple non-deterministic finite automata (NFAs) and prove that the standard subset construction transforms NFAs of this type into minimal DFAs. Furthermore, we show how simple NFAs can be constructed from two types of patterns popular in bioinformatics, namely (sets of) generalized strings and (generalized) strings with a Hamming neighborhood.
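For (generalized) strings with a Hamming neighborhood, a minimal sketch is shown below. It assumes the common textbook NFA whose states are pairs (i, e), where i is the number of pattern positions read and e ≤ d is the number of mismatches so far; this is not claimed to be the exact simple NFA built in the paper, and the sketch makes no claim that materializing its state sets would give a minimal DFA. Instead of building the DFA, it tracks the set of active NFA states while scanning the text, which is the subset construction performed lazily. The functions `hamming_nfa_matches` and `hamming_brute_force` are hypothetical names; the brute-force scan exists only as a cross-check.

```python
from typing import Iterator, List, Set, Tuple

def hamming_nfa_matches(text: str, pattern: List[Set[str]], d: int) -> Iterator[int]:
    """Yield end positions j (0-based) such that the length-m window ending
    at j is within Hamming distance d of the generalized string `pattern`.

    NFA state (i, e): i pattern positions read, e mismatches so far (e <= d).
    The start state (0, 0) has a self-loop on every letter, so it is
    re-added before each step."""
    m = len(pattern)
    active: Set[Tuple[int, int]] = set()
    for j, c in enumerate(text):
        nxt: Set[Tuple[int, int]] = set()
        for i, e in active | {(0, 0)}:
            if i == m:
                continue  # accepting states have no outgoing transitions
            if c in pattern[i]:
                nxt.add((i + 1, e))      # match: mismatch count unchanged
            elif e < d:
                nxt.add((i + 1, e + 1))  # mismatch: spend one of the d allowed
        if any(i == m for i, _ in nxt):
            yield j
        active = nxt

def hamming_brute_force(text: str, pattern: List[Set[str]], d: int) -> List[int]:
    """Reference implementation: count mismatches in every window."""
    m = len(pattern)
    return [j + m - 1 for j in range(len(text) - m + 1)
            if sum(text[j + k] not in pattern[k] for k in range(m)) <= d]

motif = [{"A"}, {"C", "G"}, {"T"}]  # the generalized string A[CG]T
print(list(hamming_nfa_matches("ACGTTAGT", motif, d=1)))  # → [2, 3, 7]
```

Each start position follows exactly one path through the (i, e) layers, so a window is reported precisely when its mismatch count is at most d, which is why the result agrees with the brute-force scan.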
Taxonomy
Topics: Semigroups and Automata Theory · Algorithms and Data Compression · DNA and Biological Computing
