Central Limit Theorems for Transition Probabilities of Controlled Markov Chains

Ziwei Su; Imon Banerjee; Diego Klabjan

arXiv:2508.01517 · math.ST · March 26, 2026

Open Access

TL;DR

This paper establishes central limit theorems for estimators of transition probabilities and value functions in controlled Markov chains, providing statistical tools for offline policy evaluation and hypothesis testing.

Contribution

It introduces CLTs for transition matrices and value functions in controlled Markov chains, with conditions on logging policies and applications to goodness-of-fit tests.

Findings

01 CLTs for transition probability estimators, with precise conditions on the logging policy

02 Asymptotic normality of value and Q-function estimators for stationary policies

03 Goodness-of-fit tests for logged data

Abstract

We develop a central limit theorem (CLT) for a non-parametric estimator of the transition matrices in controlled Markov chains (CMCs) with finite state-action spaces. Our results establish precise conditions on the logging policy under which the estimator is asymptotically normal, and reveal settings in which no CLT can exist. We then build on this result to derive CLTs for the value, Q-, and advantage functions of any stationary stochastic policy, including the optimal policy recovered from the estimated model. Goodness-of-fit tests are derived as a corollary, which make it possible to test whether the logged data is stochastic. These results provide new statistical tools for offline policy evaluation and optimal policy recovery, and enable hypothesis tests for transition probabilities.
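
To make the setting concrete, the following is a minimal Python sketch of one common non-parametric estimator for finite state-action spaces: the empirical frequency of each next state given a state-action pair, followed by a plug-in value function for a stationary stochastic policy. The abstract does not spell out the estimator, so the count-based form is an assumption here, and the names estimate_transitions, plug_in_value, rewards, and gamma are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_transitions(states, actions, n_states, n_actions):
    """Count-based estimate of P(s' | s, a) from one logged trajectory.

    Assumed form (not taken from the paper): empirical frequencies.
    `states` has length T + 1 and `actions` has length T.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for t in range(len(actions)):
        counts[states[t], actions[t], states[t + 1]] += 1
    visits = counts.sum(axis=2, keepdims=True)
    # State-action pairs never visited by the logging policy get NaN:
    # with no data there is nothing to estimate.
    with np.errstate(invalid="ignore", divide="ignore"):
        p_hat = np.where(visits > 0, counts / visits, np.nan)
    return p_hat, visits.squeeze(axis=2)


def plug_in_value(p_hat, rewards, policy, gamma):
    """Discounted value of a stationary stochastic policy under p_hat.

    policy[s, a] is the probability of action a in state s, and
    rewards[s, a] is a hypothetical known reward table.
    Solves (I - gamma * P_pi) v = r_pi for v.
    """
    p_pi = np.einsum("sa,sat->st", policy, p_hat)  # state-to-state kernel
    r_pi = (policy * rewards).sum(axis=1)          # expected one-step reward
    n = p_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * p_pi, r_pi)
```

Calling estimate_transitions on a logged sequence returns the estimated kernel together with the visit counts per state-action pair. The visit counts matter because the logging policy determines how often each pair is observed, which is the quantity the abstract's conditions for asymptotic normality concern. If any pair is unvisited, the NaN entries propagate into plug_in_value, so the sketch only yields a usable value estimate when every pair the evaluated policy can reach has been observed.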

Taxonomy

Topics: Advanced Queuing Theory Analysis · Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods