Finite-Time Error Bounds for Greedy-GQ
Yue Wang, Yi Zhou, Shaofeng Zou

TL;DR
This paper derives the tightest finite-time error bounds for the Greedy-GQ algorithm in reinforcement learning, demonstrating its convergence rates under different settings and providing insights for practical step-size choices.
Contribution
It develops the tightest finite-time error bounds for Greedy-GQ, a non-linear two-timescale off-policy RL algorithm, and introduces a nested-loop variant whose sample complexity matches that of the vanilla algorithm.
Findings
Converges at $\mathcal{O}(1/\sqrt{T})$ under i.i.d. data.
Converges at $\mathcal{O}(\log T/\sqrt{T})$ under Markovian data.
Sample complexity of the nested-loop variant matches that of vanilla Greedy-GQ.
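To get a feel for how the two rates above differ, the following minimal Python sketch evaluates both with all constants dropped. The function names (`iid_rate`, `markov_rate`, `iterations_for`) are hypothetical helpers introduced here for illustration, and the numbers only indicate scaling, since the actual bounds carry problem-dependent constants that this page does not state.

```python
import math


def iid_rate(T):
    """Rate 1/sqrt(T) from the i.i.d. finding, constants dropped (assumption)."""
    return 1.0 / math.sqrt(T)


def markov_rate(T):
    """Rate log(T)/sqrt(T) from the Markovian finding, constants dropped (assumption)."""
    return math.log(T) / math.sqrt(T)


def iterations_for(rate, eps, T=2):
    """Return the first T in the doubling sequence 2, 4, 8, ... with rate(T) <= eps."""
    while rate(T) > eps:
        T *= 2
    return T


if __name__ == "__main__":
    for eps in (0.1, 0.01):
        print(eps, iterations_for(iid_rate, eps), iterations_for(markov_rate, eps))
```

The extra $\log T$ factor under Markovian data means more iterations are needed to reach the same target accuracy, but the gap grows only logarithmically.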
Abstract
Greedy-GQ with linear function approximation, originally proposed in \cite{maei2010toward}, is a value-based off-policy algorithm for optimal control in reinforcement learning. It has a non-linear two-timescale structure with a non-convex objective function. This paper develops its tightest finite-time error bounds. We show that the Greedy-GQ algorithm converges as fast as $\mathcal{O}(1/\sqrt{T})$ under the i.i.d. setting and $\mathcal{O}(\log T/\sqrt{T})$ under the Markovian setting. We further design a variant of the vanilla Greedy-GQ algorithm using the nested-loop approach, and show that its sample complexity matches that of the vanilla Greedy-GQ. Our finite-time error bounds match those of stochastic gradient descent algorithms for general smooth non-convex optimization problems, despite its…
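The two-timescale structure mentioned in the abstract can be illustrated with a single update step. The sketch below follows the commonly stated form of the Greedy-GQ update from \cite{maei2010toward} with a hard greedy target policy; this page does not spell out the update rule, so the exact form, the function name `greedy_gq_step`, and all parameter names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def greedy_gq_step(theta, omega, phi, reward, next_phis, alpha, beta, gamma):
    """One Greedy-GQ update with linear features (illustrative sketch).

    theta     : main parameter vector, shape (d,), updated with step size alpha
    omega     : auxiliary parameter vector, shape (d,), updated with step size beta
    phi       : features of the current state-action pair, shape (d,)
    next_phis : features of every action in the next state, shape (num_actions, d)
    """
    # Greedy action in the next state under the current estimate theta.
    q_next = next_phis @ theta
    greedy = int(np.argmax(q_next))
    phi_hat = next_phis[greedy]

    # Temporal-difference error with the greedy bootstrap target.
    delta = reward + gamma * q_next[greedy] - phi @ theta

    # Main update: TD term plus the gradient-correction term.
    new_theta = theta + alpha * (delta * phi - gamma * (omega @ phi) * phi_hat)
    # Auxiliary update: tracks the projection of delta onto the features.
    new_omega = omega + beta * (delta - phi @ omega) * phi
    return new_theta, new_omega
```

The two step sizes `alpha` and `beta` are what make the scheme two-timescale. Note that the hard `argmax` used here is only for illustration; it makes the objective non-smooth, whereas the comparison to smooth non-convex optimization in the abstract suggests a smooth objective.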
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization
