Uncertainty quantification for Markov chain induced martingales with application to temporal difference learning

Weichen Wu; Yuting Wei; Alessandro Rinaldo

arXiv:2502.13822 · stat.ML · May 22, 2026

TL;DR

This paper develops new high-dimensional concentration inequalities for Markov chain-induced martingales and applies them to analyze the statistical properties of Temporal Difference learning in Reinforcement Learning.

Contribution

It introduces general high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains, and uses them to give sharp performance guarantees for TD learning with linear function approximation.

Findings

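1. Established a sharp high-probability consistency guarantee for TD learning with linear function approximation, matching the asymptotic variance up to logarithmic factors.

2. Derived an $O(T^{-1/4}\log T)$ convergence rate, measured in convex distance, for the Gaussian approximation of the TD estimator.

3. Provided general martingale bounds of broad applicability beyond TD learning.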
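The distributional rate in the second finding is stated in convex distance. The standard definition of convex distance for random vectors in $\mathbb{R}^d$ is given below, followed by one plausible form of the rate statement. The $\sqrt{T}$ scaling, the averaged TD iterate $\bar{\theta}_T$, the target $\theta^\star$ and the covariance $\Lambda^\star$ are notation assumed here for illustration. They are not taken from the paper, whose exact normalization may differ.

```latex
d_{\mathcal{C}}(X, Y) = \sup_{A \subseteq \mathbb{R}^d,\ A \text{ convex}}
  \bigl| \mathbb{P}(X \in A) - \mathbb{P}(Y \in A) \bigr|,
\qquad
d_{\mathcal{C}}\Bigl(\sqrt{T}\,(\bar{\theta}_T - \theta^\star),\ \mathcal{N}(0, \Lambda^\star)\Bigr)
  = O\bigl(T^{-1/4}\log T\bigr).
```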

Abstract

We establish novel and general high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains. We apply these results to analyze the performance of the Temporal Difference (TD) learning algorithm with linear function approximations, a widely used method for policy evaluation in Reinforcement Learning (RL), obtaining a sharp high-probability consistency guarantee that matches the asymptotic variance up to logarithmic factors. Furthermore, we establish an $O(T^{-1/4}\log T)$ distributional convergence rate for the Gaussian approximation of the TD estimator, measured in convex distance. Our martingale bounds are of broad applicability, and our analysis of TD learning provides new insights into statistical inference for RL algorithms, bridging gaps between classical stochastic approximation theory and modern RL applications.
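The abstract analyzes TD learning with linear function approximation using samples from a single Markov trajectory. The sketch below shows one common variant of that setup. It assumes the TD(0) update, polynomially decaying step sizes $\eta_t = \eta_0 t^{-\alpha}$, and Polyak-Ruppert averaging of the iterates. The toy chain, the features, the step-size schedule and the function names `td0_polyak_ruppert` and `td_fixed_point` are all hypothetical and are not taken from the paper. The averaged iterate is compared with the projected TD fixed point $\theta^\star = A^{-1} b$, where $A = \Phi^\top D(\Phi - \gamma P \Phi)$, $b = \Phi^\top D r$, and $D$ holds the stationary distribution on its diagonal.

```python
import numpy as np


def td0_polyak_ruppert(P, r, Phi, gamma, T, eta0=1.0, alpha=0.75, seed=0):
    """Hypothetical sketch: TD(0) with linear features along one Markov
    trajectory, returning the Polyak-Ruppert average of the iterates."""
    rng = np.random.default_rng(seed)
    d = Phi.shape[1]
    cum_P = np.cumsum(P, axis=1)
    cum_P[:, -1] = 1.0  # guard against floating-point round-off in row sums
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    s = 0  # arbitrary start state; the chain need not start at stationarity
    for t in range(1, T + 1):
        # sample the next state from row s of the transition matrix
        s_next = int(np.searchsorted(cum_P[s], rng.random(), side="right"))
        # temporal-difference error for the transition s -> s_next
        delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        # assumed step size eta_t = eta0 * t^(-alpha)
        theta = theta + eta0 * t ** (-alpha) * delta * Phi[s]
        theta_bar += (theta - theta_bar) / t  # running average of iterates
        s = s_next
    return theta_bar


def td_fixed_point(P, r, Phi, gamma):
    """Solve A theta = b with A = Phi^T D (Phi - gamma P Phi), b = Phi^T D r."""
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    mu = mu / mu.sum()  # stationary distribution of the chain
    D = np.diag(mu)
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


# Toy 3-state Markov reward process with 2 linear features (all hypothetical).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
r = np.array([1.0, 0.0, -0.5])
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.5, 0.5]])
gamma = 0.5

theta_star = td_fixed_point(P, r, Phi, gamma)
theta_bar = td0_polyak_ruppert(P, r, Phi, gamma, T=50_000)
print("fixed point:", theta_star)
print("averaged TD estimate:", theta_bar)
```

Rerunning with different seeds and plotting $\sqrt{T}(\bar{\theta}_T - \theta^\star)$ gives an informal view of the Gaussian approximation that the paper quantifies. This sketch does not reproduce the paper's bounds or constants.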
