Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version
Masoud S. Sakha, Rushikesh Kamalapurkar, Sean Meyn

TL;DR
This paper analyzes the stability and sensitivity of relative TD learning with linear function approximation, highlighting the importance of baseline distribution choice and providing bounds on bias and covariance as the discount factor nears one.
Contribution
The paper establishes stability conditions for relative TD learning with linear function approximation and characterizes the asymptotic bias and covariance of the parameter estimates, especially when the baseline is the empirical distribution.
Findings
When the baseline is the empirical distribution of the state-action process, the algorithm is stable for any non-negative baseline weight and any discount factor.
Asymptotic bias and covariance remain bounded as the discount factor approaches one.
Baseline distribution choice critically affects stability and performance.
Abstract
Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods when the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied in the tabular setting, stability guarantees with function approximation remain poorly understood. This paper analyzes relative TD learning with linear function approximation. We establish stability conditions for the algorithm and show that the choice of baseline distribution plays a central role. In particular, when the baseline is chosen as the empirical distribution of the state-action process, the algorithm is stable for any non-negative baseline weight and any discount factor. We also provide a sensitivity analysis of the resulting parameter estimates, characterizing both asymptotic bias and covariance. The asymptotic covariance and asymptotic…
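The idea of subtracting a baseline from the temporal-difference update can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the recursion analyzed in the paper: it uses state values on a fixed policy rather than the state-action process, it represents the empirical baseline by a running mean of the observed features, and the names `relative_td_step`, `run_relative_td`, `phi_bar`, and `kappa` (the baseline weight) are hypothetical.

```python
import numpy as np


def relative_td_step(theta, phi, phi_next, reward, phi_bar, gamma, kappa, alpha):
    """One relative TD(0) update with linear features (illustrative form only)."""
    # Ordinary temporal difference for the linear estimate theta @ phi.
    td = reward + gamma * (theta @ phi_next) - theta @ phi
    # Baseline term (assumed form): kappa times the value estimate averaged
    # under the baseline distribution, summarized by the mean feature phi_bar.
    baseline = kappa * (theta @ phi_bar)
    return theta + alpha * (td - baseline) * phi


def run_relative_td(P, r, Phi, gamma=0.99, kappa=1.0, n_steps=5000, seed=0):
    """Run the sketch on a finite Markov chain with transition matrix P,
    per-state reward vector r, and feature matrix Phi (one row per state)."""
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    theta = np.zeros(d)
    phi_bar = np.zeros(d)  # running mean of features = empirical baseline
    x = 0
    for n in range(1, n_steps + 1):
        x_next = rng.choice(n_states, p=P[x])
        phi_bar += (Phi[x] - phi_bar) / n
        alpha = 1.0 / (n + 1)  # decaying step size, an arbitrary choice here
        theta = relative_td_step(
            theta, Phi[x], Phi[x_next], r[x], phi_bar, gamma, kappa, alpha
        )
        x = x_next
    return theta


if __name__ == "__main__":
    P = np.array([[0.9, 0.1], [0.2, 0.8]])
    r = np.array([1.0, 0.0])
    Phi = np.eye(2)  # tabular features as the simplest linear case
    print(run_relative_td(P, r, Phi, gamma=0.99, kappa=1.0))
```

Setting `kappa=0` recovers the ordinary TD(0) update, which makes it easy to compare the two recursions on the same chain as the discount factor is pushed toward one.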
