Baird Counterexample is Solved: with an example of How to Debug a Two-time-scale Algorithm
Hengshuai Yao

TL;DR
This paper analyzes the slow convergence of two-time-scale algorithms on Baird's counterexample, introduces debugging techniques to understand this behavior, and demonstrates that recent algorithms converge rapidly, effectively solving the classic divergence problem.
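To make the "classic divergence problem" concrete, here is a minimal sketch of the expected (deterministic, DP-style) off-policy TD(0) update on Baird's counterexample. It assumes the common textbook version of the example (Sutton and Barto 2018): seven states, eight features, zero rewards, discount 0.99, a target policy that always moves to the lower state, a behavior policy that induces a uniform state distribution, and the initial weights (1,1,1,1,1,1,10,1). The function names and the step size `alpha=0.01` are illustrative assumptions, not the author's code.

```python
import numpy as np

def baird_features() -> np.ndarray:
    """Feature matrix for the 7-state Baird counterexample (textbook version, assumed).

    Rows are states: rows 0..5 are the upper states, row 6 is the lower state.
    """
    X = np.zeros((7, 8))
    for i in range(6):
        X[i, i] = 2.0   # upper state i: 2 * e_i + e_8
        X[i, 7] = 1.0
    X[6, 6] = 1.0       # lower state: e_7 + 2 * e_8
    X[6, 7] = 2.0
    return X

def expected_td0_matrix(gamma: float = 0.99) -> np.ndarray:
    """A = X^T D (gamma * P_pi X - X), the expected importance-weighted TD(0) matrix.

    D: uniform behavior state distribution (1/7 per state).
    P_pi: the target policy always goes to the lower state, so every row of P_pi X is x(lower).
    """
    X = baird_features()
    D = np.eye(7) / 7.0
    P_pi = np.zeros((7, 7))
    P_pi[:, 6] = 1.0
    return X.T @ D @ (gamma * P_pi @ X - X)

def run_expected_td0(alpha: float = 0.01, steps: int = 1000, gamma: float = 0.99):
    """Iterate w <- w + alpha * A w (rewards are zero, so there is no constant term)."""
    A = expected_td0_matrix(gamma)
    w = np.array([1, 1, 1, 1, 1, 1, 10, 1], dtype=float)
    norms = [np.linalg.norm(w)]
    for _ in range(steps):
        w = w + alpha * A @ w
        norms.append(np.linalg.norm(w))
    return A, w, norms
```

Inspecting `np.linalg.eigvals(A)` should reveal an eigenvalue with positive real part, which is why `norms` keeps growing: the expected update itself is unstable, so divergence is not a sampling artifact.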
Contribution
It provides a debugging analysis of why TDC is slow on Baird's example and shows that recent algorithms like Impression GTD achieve fast, linear convergence.
Findings
TDC converges slowly on Baird's counterexample
Debugging techniques reveal reasons for slow convergence
Impression GTD converges rapidly, at a linear rate
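The sketch below runs the expected (deterministic) form of TDC on the same assumed textbook setup, as a two-time-scale system: slow weights `w` with step size `alpha` and fast secondary weights `h` with step size `beta`. It also logs one generic diagnostic for two-time-scale methods: the fast-iterate residual `A w - C h`, which is zero exactly when `h` sits at its fast-timescale equilibrium for the current `w`. This residual check is a hypothetical illustration chosen here, not necessarily the debugging technique used in the paper, and the step sizes are assumptions.

```python
import numpy as np

def baird_features() -> np.ndarray:
    """Same assumed textbook feature matrix: 6 upper states, 1 lower state, 8 features."""
    X = np.zeros((7, 8))
    for i in range(6):
        X[i, i], X[i, 7] = 2.0, 1.0
    X[6, 6], X[6, 7] = 1.0, 2.0
    return X

def expected_tdc(alpha: float = 0.005, beta: float = 0.05, steps: int = 1000,
                 gamma: float = 0.99):
    X = baird_features()
    d = np.full(7, 1.0 / 7.0)     # behavior-policy state distribution (uniform)
    x_next = X[6]                 # target policy always moves to the lower state
    # C = E[x x^T]
    C = X.T @ (d[:, None] * X)
    # A = E[rho x (gamma x' - x)^T]
    A = X.T @ (d[:, None] * (gamma * x_next[None, :] - X))
    # B = E[rho gamma x' x^T], the TDC correction matrix
    B = gamma * np.outer(x_next, d @ X)
    w = np.array([1, 1, 1, 1, 1, 1, 10, 1], dtype=float)
    h = np.zeros(8)
    log = []
    for _ in range(steps):
        td_update = A @ w            # E[rho * delta * x]; rewards are zero
        residual = td_update - C @ h # expected h-increment divided by beta
        log.append((np.linalg.norm(td_update), np.linalg.norm(residual)))
        # simultaneous update: both lines use the old w and h
        w = w + alpha * (td_update - B @ h)  # slow timescale
        h = h + beta * residual              # fast timescale
    return A, B, C, w, h, log
```

Comparing the two logged norms over time indicates whether `h` actually tracks its target (small residual relative to the TD update) or lags behind, which is one place to look when a two-time-scale method converges slowly. A useful sanity check is the identity `B == A.T + C`, which follows directly from the definitions of `A` and `C`.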
Abstract
The Baird counterexample was proposed by Leemon Baird in 1995 and was first used to show that the Temporal Difference (TD(0)) algorithm diverges on it. Since then, it has often been used to test and compare off-policy learning algorithms. Gradient TD algorithms solved the divergence issue of TD on the Baird counterexample. However, their convergence on this example is still very slow, and the nature of the slowness is not well understood; see, e.g., (Sutton and Barto 2018). This note aims to understand, in particular, why TDC is slow on this example, and provides a debugging analysis of this behavior. Our debugging technique can be used to study the convergence behavior of two-time-scale stochastic approximation algorithms. We also provide empirical results for the recent Impression GTD algorithm on this example, showing that its convergence is very fast, in fact, at a linear rate. We conclude…
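"Converges at a linear rate" means the error shrinks geometrically, roughly ||e_k|| ≈ c·ρ^k with ρ < 1. A minimal sketch for checking this on a recorded error curve is to fit a straight line to log ||e_k|| against k; the helper name `estimate_linear_rate` and the `burn_in` parameter are hypothetical, and the synthetic sequence in the usage line is made up for illustration, not data from the paper.

```python
import numpy as np

def estimate_linear_rate(errors, burn_in: int = 0):
    """Fit ||e_k|| ~ c * rho**k by least squares on log(errors).

    Returns (rho, c). A fitted rho clearly below 1 with a good straight-line fit
    suggests linear (geometric) convergence; sublinear methods give a log-curve
    that flattens out, so a single rho describes them poorly.
    """
    e = np.asarray(errors, dtype=float)[burn_in:]
    if np.any(e <= 0):
        raise ValueError("errors must be strictly positive to take logarithms")
    k = np.arange(len(e))
    slope, intercept = np.polyfit(k, np.log(e), 1)  # degree-1 fit: [slope, intercept]
    return float(np.exp(slope)), float(np.exp(intercept))

# Usage on a synthetic geometric sequence (illustrative only):
rho, c = estimate_linear_rate(5.0 * 0.9 ** np.arange(50))
```

Skipping an initial transient with `burn_in` is usually sensible, since many methods only enter their asymptotic linear regime after some iterations.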
