A central limit theorem for temporally non-homogenous Markov chains with applications to dynamic programming
Alessandro Arlotto, J. Michael Steele

TL;DR
This paper establishes a central limit theorem for additive processes in non-homogeneous Markov chains, with applications to the asymptotic normality of optimal rewards in finite horizon Markov decision problems.
Contribution
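It generalizes Dobrushin's classic CLT (1956) for temporally non-homogeneous Markov chains by letting each summand depend on the current state and a bounded number of future states, which gives a direct route to asymptotic normality of the optimal total reward in finite horizon Markov decision problems (MDPs).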
Findings
Proves a CLT for additive processes in non-homogeneous Markov chains.
Demonstrates asymptotic normality of optimal total rewards in finite horizon MDPs.
Provides examples showing the method's advantages over state space enlargement techniques.
Abstract
We prove a central limit theorem for a class of additive processes that arise naturally in the theory of finite horizon Markov decision problems. The main theorem generalizes a classic result of Dobrushin (1956) for temporally non-homogeneous Markov chains, and the principal innovation is that here the summands are permitted to depend on both the current state and a bounded number of future states of the chain. We show through several examples that this added flexibility gives one a direct path to asymptotic normality of the optimal total reward of finite horizon Markov decision problems. The same examples also explain why such results are not easily obtained by alternative Markovian techniques such as enlargement of the state space.
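The kind of additive process described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the two-state chain, the time-varying switching probability, and the one-step-lookahead function `summand` are all hypothetical choices made only to show a sum whose terms depend on the current state and one future state. It simulates many independent copies of the sum and reports how much of the standardized sample falls within 1.96 standard deviations, which would be close to 0.95 if the sums are approximately normal.

```python
import math
import random


def summand(x, x_next):
    # Hypothetical reward f(x_i, x_{i+1}): it looks one step ahead,
    # so it depends on the current state and one future state.
    return x * (1 - x_next) + 0.5 * x_next


def simulate_sum(n, rng):
    # One path of a temporally non-homogeneous two-state chain.
    x = rng.randrange(2)
    total = 0.0
    for i in range(1, n + 1):
        # Assumed time-dependent switching probability, kept in [0.3, 0.7]
        # so that every transition matrix mixes uniformly well.
        p_switch = 0.5 + 0.2 * math.sin(i)
        x_next = 1 - x if rng.random() < p_switch else x
        total += summand(x, x_next)
        x = x_next
    return total


def standardized_sums(n, reps, seed=0):
    # Simulate `reps` independent sums and center/scale them empirically.
    rng = random.Random(seed)
    sums = [simulate_sum(n, rng) for _ in range(reps)]
    mean = sum(sums) / reps
    sd = math.sqrt(sum((s - mean) ** 2 for s in sums) / reps)
    return [(s - mean) / sd for s in sums]


z = standardized_sums(n=200, reps=2000)
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)
print(f"fraction within 1.96 standard deviations: {coverage:.3f}")
```

A simulation like this only suggests approximate normality for one toy chain; it says nothing about the conditions under which the theorem holds, which are stated in the paper itself.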
