Finite-Time Accuracy of Temporal-Difference Learning Under Schur-Stable Recursions
Donghwan Lee, Do Wan Kim

TL;DR
This paper develops a new finite-time error analysis for tabular TD learning, using control-theoretic methods and Schur stability to yield both insights and a reusable framework for finite-sample analysis.
Contribution
It introduces a novel finite-time error analysis framework for TD learning that exploits Schur stability and stochastic linear system representation, offering new theoretical insights.
Findings
Provides finite-time error bounds for TD learning.
Introduces a control-theoretic analysis framework.
Offers insights for future finite-sample RL research.
Abstract
Temporal difference (TD) learning is a cornerstone reinforcement learning (RL) method for policy evaluation, where the goal is to estimate the value function of a Markov decision process under a fixed policy. While a substantial body of work has established its convergence and stability properties, more recent efforts have focused on its statistical efficiency through finite-time error bounds. In this paper, we advance this line of research by developing a new finite-time error analysis for tabular TD learning that directly exploits a discrete-time stochastic linear system representation and leverages Schur stability of the associated matrices. Beyond the specific bounds obtained, the proposed framework provides a reusable template for analyzing TD learning and related RL algorithms, and it offers control-theoretic insights that may guide future developments in finite-sample RL theory.
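To make the "stochastic linear system" and "Schur stability" language concrete, the sketch below builds one common form of the mean error dynamics for tabular TD(0). It is an illustration, not the paper's construction: the abstract does not state the exact system matrices. The sketch assumes states are sampled i.i.d. from a positive distribution d, so that the expected error e_k = V_k - V* evolves as e_{k+1} = A e_k with A = I + alpha * D * (gamma * P - I) and D = diag(d). The chain `P`, the distribution `d`, the step size `alpha`, and the helper names are all hypothetical.

```python
import numpy as np

def td_error_matrix(P, d, gamma, alpha):
    """Mean error-dynamics matrix A = I + alpha * D (gamma * P - I).

    Assumes i.i.d. state sampling from the distribution d (an assumption
    made for this sketch, not taken from the paper).
    """
    n = P.shape[0]
    D = np.diag(d)
    return np.eye(n) + alpha * D @ (gamma * P - np.eye(n))

def spectral_radius(A):
    """Largest eigenvalue magnitude; A is Schur stable iff this is < 1."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

# Hypothetical 3-state Markov chain under a fixed policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
d = np.array([0.5, 0.3, 0.2])   # hypothetical state-sampling distribution
gamma, alpha = 0.9, 0.5         # discount factor and constant step size

A = td_error_matrix(P, d, gamma, alpha)
rho = spectral_radius(A)

# For alpha in (0, 1) every entry of A is nonnegative, so the infinity norm
# equals the largest row sum, 1 - alpha * min(d) * (1 - gamma), which is < 1.
row_sum_bound = float(np.linalg.norm(A, np.inf))

# Noise-free mean recursion e_{k+1} = A e_k: the error contracts geometrically.
e = np.array([1.0, -2.0, 0.5])
norms = []
for _ in range(200):
    norms.append(float(np.max(np.abs(e))))
    e = A @ e

print(f"spectral radius = {rho:.4f}, infinity-norm bound = {row_sum_bound:.4f}")
print(f"initial error = {norms[0]:.3f}, error after 199 steps = {norms[-1]:.3e}")
```

Under these assumptions the infinity norm of A is strictly below one, which is sufficient for Schur stability and gives a geometric decay rate for the mean error. A finite-time bound for the actual stochastic iterates would additionally need to control the noise term, which this sketch omits.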
Taxonomy
Topics: Innovation Diffusion and Forecasting · Traffic control and management
