Learning Mixtures of Markov Chains and MDPs
Chinmaya Kausik, Kevin Tan, Ambuj Tewari

TL;DR
This paper introduces a multi-step algorithm for learning mixtures of Markov chains and MDPs from short, unlabeled trajectories, with end-to-end theoretical guarantees and better empirical performance than the EM algorithm with random initialization.
Contribution
The paper proposes a multi-step algorithm for learning mixtures of Markov chains and MDPs. It combines subspace estimation, spectral clustering of trajectories with EM refinement, model estimation, and classification of new trajectories.
Findings
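Achieves 96.6% average accuracy on a mixture of two MDPs in gridworld
Provides end-to-end guarantees that explicitly require only trajectory length linear in the number of states and a number of trajectories linear in a mixing time parameter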
Outperforms EM with random initialization in experiments
Abstract
We present an algorithm for learning mixtures of Markov chains and Markov decision processes (MDPs) from short unlabeled trajectories. Specifically, our method handles mixtures of Markov chains with optional control input by going through a multi-step process, involving (1) a subspace estimation step, (2) spectral clustering of trajectories using "pairwise distance estimators," along with refinement using the EM algorithm, (3) a model estimation step, and (4) a classification step for predicting labels of new trajectories. We provide end-to-end performance guarantees, where we only explicitly require the length of trajectories to be linear in the number of states and the number of trajectories to be linear in a mixing time parameter. Experimental results support these guarantees, where we attain 96.6% average accuracy on a mixture of two MDPs in gridworld, outperforming the EM algorithm…
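As a rough illustration of steps (3) and (4) only, the sketch below fits one transition matrix per cluster by counting transitions and then labels new trajectories by maximum log-likelihood. It assumes the cluster labels from steps (1) and (2) are already available, and it covers plain Markov chains without control input. The likelihood-based classifier, the add-one smoothing, the two toy chains, and all function names are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def sample_trajectory(P, length, rng):
    """Sample a state sequence from a Markov chain with transition matrix P."""
    n = P.shape[0]
    states = [rng.integers(n)]  # uniform initial state (an assumption)
    for _ in range(length - 1):
        states.append(rng.choice(n, p=P[states[-1]]))
    return np.array(states)

def estimate_transition_matrix(trajectories, n_states, smoothing=1.0):
    """Step (3) sketch: count transitions, smooth, and normalize each row."""
    counts = np.full((n_states, n_states), smoothing)
    for traj in trajectories:
        np.add.at(counts, (traj[:-1], traj[1:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def classify(traj, models):
    """Step (4) sketch: pick the model with the highest log-likelihood."""
    scores = [np.log(P[traj[:-1], traj[1:]]).sum() for P in models]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)

# Two hypothetical 3-state chains: one "sticky", one "cyclic".
P0 = np.array([[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]])
P1 = np.array([[0.05, 0.90, 0.05], [0.05, 0.05, 0.90], [0.90, 0.05, 0.05]])
true_models = [P0, P1]

# Pretend clustering has already grouped the training trajectories.
clusters = {k: [sample_trajectory(P, 30, rng) for _ in range(50)]
            for k, P in enumerate(true_models)}
models = [estimate_transition_matrix(clusters[k], 3) for k in range(2)]

# Classify fresh trajectories and measure agreement with the true labels.
test_labels = rng.integers(2, size=200)
test_trajs = [sample_trajectory(true_models[k], 30, rng) for k in test_labels]
predictions = np.array([classify(t, models) for t in test_trajs])
accuracy = (predictions == test_labels).mean()
print(f"classification accuracy: {accuracy:.3f}")
```

The two toy chains are deliberately easy to tell apart, so this says nothing about the harder regime the paper targets, where trajectories are short and the cluster labels must first be recovered.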
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Time Series Analysis and Forecasting · Bayesian Methods and Mixture Models
Methods: Spectral Clustering
