Offline Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity
Imon Banerjee, Harsha Honnappa, Vinayak Rao

TL;DR
This paper develops sample complexity bounds for nonparametric estimation of transition probabilities in controlled Markov chains using offline data, highlighting the influence of mixing properties and demonstrating applicability across various chain types.
Contribution
It introduces new statistical bounds and conditions for minimaxity in estimating controlled Markov chain transition matrices from fixed datasets, considering mixing properties.
Findings
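Sample complexity bounds depend on the logging policy through its mixing properties.
Achieving a given statistical risk bound involves a trade-off between the strength of the mixing properties and the number of samples.
Conditions for minimaxity of the estimator are established.
Results hold for ergodic Markov chains, weakly ergodic inhomogeneous Markov chains, and controlled Markov chains with non-stationary Markov, episodic, and greedy controls.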
Abstract
In this work, we study a natural nonparametric estimator of the transition probability matrices of a finite controlled Markov chain. We consider an offline setting with a fixed dataset, collected using a so-called logging policy. We develop sample complexity bounds for the estimator and establish conditions for minimaxity. Our statistical bounds depend on the logging policy through its mixing properties. We show that achieving a particular statistical risk bound involves a subtle and interesting trade-off between the strength of the mixing properties and the number of samples. We demonstrate the validity of our results under various examples, such as ergodic Markov chains, weakly ergodic inhomogeneous Markov chains, and controlled Markov chains with non-stationary Markov, episodic, and greedy controls. Lastly, we use these sample complexity bounds to establish concomitant ones for…
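As an illustration of the setting described in the abstract, the sketch below estimates transition probabilities from a single logged trajectory by normalizing transition counts. It assumes that the "natural nonparametric estimator" is this empirical count-based one, which the abstract does not spell out. The function name `estimate_transitions`, its arguments, and the convention of leaving unvisited state-action pairs at zero are all hypothetical choices made for the example.

```python
import numpy as np

def estimate_transitions(states, actions, n_states, n_actions):
    """Count-based estimate of P(s' | s, a) from one logged trajectory.

    Assumed inputs (hypothetical names): `states` holds s_0, ..., s_T and
    `actions` holds a_0, ..., a_{T-1}, both as integer indices.
    """
    states = np.asarray(states)
    actions = np.asarray(actions)
    n_steps = len(states) - 1
    counts = np.zeros((n_states, n_actions, n_states))
    # Count each observed transition (s_t, a_t) -> s_{t+1}.
    np.add.at(counts, (states[:-1], actions[:n_steps], states[1:]), 1)
    # Number of times each (s, a) pair was visited.
    visits = counts.sum(axis=2, keepdims=True)
    # Normalize counts; rows for unvisited pairs stay at zero (assumed convention).
    p_hat = np.divide(counts, visits, out=np.zeros_like(counts), where=visits > 0)
    return p_hat, visits[..., 0]

# Toy logged data: 2 states, 2 actions, 5 transitions.
states = [0, 1, 0, 1, 1, 0]
actions = [0, 0, 1, 0, 1]
p_hat, visits = estimate_transitions(states, actions, n_states=2, n_actions=2)
```

The `visits` array shows how often the logging policy reached each state-action pair. Pairs that are rarely visited give noisy rows of `p_hat`, which is one informal way to see why the bounds depend on how well the logged chain mixes.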
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Bayesian Modeling and Causal Inference · Statistical Methods and Inference
