Uncertainty quantification for Markov chain induced martingales with application to temporal difference learning
Weichen Wu, Yuting Wei, Alessandro Rinaldo

TL;DR
This paper develops new high-dimensional concentration inequalities for Markov chain-induced martingales and applies them to analyze the statistical properties of Temporal Difference learning in Reinforcement Learning.
Contribution
It introduces general concentration bounds for vector-valued martingales from Markov chains and provides sharp performance guarantees for TD learning algorithms.
Findings
Established high-probability consistency guarantees for TD learning.
Derived an $O(T^{-1/4}\log T)$ distributional convergence rate for the Gaussian approximation.
Provided broad martingale bounds applicable to various stochastic processes.
Abstract
We establish novel and general high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains. We apply these results to analyze the performance of the Temporal Difference (TD) learning algorithm with linear function approximations, a widely used method for policy evaluation in Reinforcement Learning (RL), obtaining a sharp high-probability consistency guarantee that matches the asymptotic variance up to logarithmic factors. Furthermore, we establish an $O(T^{-1/4}\log T)$ distributional convergence rate for the Gaussian approximation of the TD estimator, measured in convex distance. Our martingale bounds are of broad applicability, and our analysis of TD learning provides new insights into statistical inference for RL algorithms, bridging gaps between classical stochastic approximation theory and modern RL applications.
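For readers unfamiliar with the algorithm under study, the following is a minimal sketch of TD(0) with linear function approximation run on a single Markov trajectory. It is not taken from the paper: the toy 3-state chain, the rewards, the constant step size, the tabular features, and the running average of iterates are all assumptions chosen for illustration, and the function name `td0_linear` is hypothetical.

```python
import numpy as np

def td0_linear(P, r, Phi, gamma=0.9, T=30000, step=0.05, seed=0):
    """TD(0) for the approximation V(s) ~ Phi[s] @ theta on a Markov reward process.

    P: (n, n) transition matrix, r: (n,) rewards, Phi: (n, d) feature matrix.
    Returns the last iterate and the running average of all iterates.
    The constant step size and the averaging are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    s = 0
    for t in range(1, T + 1):
        # Sample the next state of the Markov chain from row s of P.
        s_next = rng.choice(n_states, p=P[s])
        # Temporal difference error for the observed transition.
        td_error = r[s] + gamma * (Phi[s_next] @ theta) - Phi[s] @ theta
        # Stochastic approximation update along the current feature vector.
        theta = theta + step * td_error * Phi[s]
        # Incremental running mean of the iterates.
        theta_bar += (theta - theta_bar) / t
        s = s_next
    return theta, theta_bar

# Toy problem (assumed, not from the paper).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([1.0, 0.0, 2.0])
Phi = np.eye(3)  # tabular features, so the TD fixed point is the true value function
gamma = 0.9

# Exact value function from the Bellman equation v = r + gamma * P v.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)

theta, theta_bar = td0_linear(P, r, Phi, gamma=gamma)
err = np.max(np.abs(theta_bar - v_true))
print("averaged TD estimate:", theta_bar)
print("exact values:        ", v_true)
print("max abs error:       ", err)
```

Because consecutive samples come from one trajectory rather than independent draws, the update noise inherits the dependence structure of the Markov chain, which is the setting the abstract's martingale bounds are meant to handle.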
Taxonomy
Topics: Fault Detection and Control Systems
