Finite-Time Error Bounds for Greedy-GQ

Yue Wang; Yi Zhou; Shaofeng Zou

arXiv:2209.02555·cs.LG·May 3, 2024

TL;DR

This paper derives the tightest finite-time error bounds for the Greedy-GQ algorithm in reinforcement learning, demonstrating its convergence rates under different settings and providing insights for practical step-size choices.

Contribution

It develops the first tight finite-time error bounds for Greedy-GQ, a non-linear two-timescale off-policy RL algorithm, and introduces a variant with matching sample complexity.

Findings

01

Converges at $\mathcal{O}(1/\sqrt{T})$ under i.i.d. data.

02

Converges at $\mathcal{O}(\log T/\sqrt{T})$ under Markovian data.

03

Sample complexity of the variant matches vanilla Greedy-GQ.

Abstract

Greedy-GQ with linear function approximation, originally proposed in \cite{maei2010toward}, is a value-based off-policy algorithm for optimal control in reinforcement learning. It has a non-linear two-timescale structure with a non-convex objective function. This paper develops its tightest finite-time error bounds. We show that the Greedy-GQ algorithm converges as fast as $\mathcal{O}(1/\sqrt{T})$ under the i.i.d. setting and $\mathcal{O}(\log T/\sqrt{T})$ under the Markovian setting. We further design a variant of the vanilla Greedy-GQ algorithm using the nested-loop approach, and show that its sample complexity is $\mathcal{O}(\log(1/\epsilon)\,\epsilon^{-2})$, which matches that of the vanilla Greedy-GQ. Our finite-time error bounds match those of stochastic gradient descent algorithms for general smooth non-convex optimization problems, despite its…

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization