Minimum Probabilistic Finite State Learning Problem on Finite Data Sets: Complexity, Solution and Approximations
Elisabeth Paulson, Christopher Griffin

TL;DR
This paper investigates the computational complexity of learning minimal probabilistic finite state machines from finite data, proving NP-hardness, and introduces an approximation algorithm with empirical analysis.
Contribution
It establishes NP-hardness of the problem and, building on the hardness proof, provides a provably correct algorithm for constructing a minimum-state machine, with an empirical evaluation of its running time.
Findings
NP-hardness of the minimum probabilistic finite state machine problem
A provably correct algorithm, derived from the hardness proof, for constructing a minimum-state probabilistic finite state machine from data
Empirical analysis of the algorithm's running time
Abstract
In this paper, we study the problem of determining a minimum state probabilistic finite state machine capable of generating statistically identical symbol sequences to samples provided. This problem is qualitatively similar to the classical Hidden Markov Model problem and has been studied from a practical point of view in several works beginning with the work presented in: Shalizi, C.R., Shalizi, K.L., Crutchfield, J.P. (2002) An algorithm for pattern discovery in time series. Technical Report 02-10-060, Santa Fe Institute. arxiv.org/abs/cs.LG/0210025. We show that the underlying problem is NP-hard and thus all existing polynomial time algorithms must be approximations on finite data sets. Using our NP-hardness proof, we show how to construct a provably correct algorithm for constructing a minimum state probabilistic finite state machine given data and…
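As a rough illustration of the setting the abstract describes, the sketch below defines a small probabilistic finite state machine, samples a symbol sequence from it, and estimates the next-symbol distribution that follows each fixed-length history. This is not the paper's algorithm. The two-state machine `PFSM`, the helper names `generate` and `next_symbol_distributions`, and the choice to compare next-symbol distributions as a reading of "statistically identical" are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict

# Hypothetical two-state machine over the alphabet {"0", "1"}.
# Each state maps a symbol to (emission probability, next state).
PFSM = {
    "A": {"0": (0.5, "A"), "1": (0.5, "B")},
    "B": {"0": (1.0, "A")},
}

def generate(machine, start, length, rng):
    """Emit `length` symbols by walking the machine from `start`."""
    state, out = start, []
    for _ in range(length):
        symbols = list(machine[state])
        weights = [machine[state][s][0] for s in symbols]
        sym = rng.choices(symbols, weights=weights)[0]
        out.append(sym)
        state = machine[state][sym][1]
    return "".join(out)

def next_symbol_distributions(sample, history_len):
    """Empirical P(next symbol | preceding `history_len` symbols)."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(sample)):
        counts[sample[i - history_len:i]][sample[i]] += 1
    return {
        h: {s: c / sum(ctr.values()) for s, c in ctr.items()}
        for h, ctr in counts.items()
    }

# Fixed seed so the illustration is reproducible.
sample = generate(PFSM, "A", 10_000, random.Random(0))
dists = next_symbol_distributions(sample, 1)
```

In this toy machine a "1" is always followed by a "0", so the estimated distribution after the history "1" puts all of its mass on "0", while the distribution after "0" is close to uniform. Because the data set is finite, the estimates are only approximate, which is the regime the abstract is concerned with.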
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Semigroups and automata theory
