On Convergence of Gradient Expected Sarsa(λ)
Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

TL;DR
This paper analyzes the convergence issues of Expected Sarsa(λ) with linear function approximation and introduces a new convergent gradient-based algorithm, GES(λ), with theoretical guarantees and empirical validation.
Contribution
The paper proposes GES(λ), a convergent gradient-based Expected Sarsa(λ) algorithm with a proven linear convergence rate, and introduces a Lyapunov function technique for finite-time analysis.
Findings
GES(λ) converges linearly to the optimal solution.
Applying the off-line estimate (multi-step bootstrapping) to Expected Sarsa(λ) is unstable for off-policy learning.
Experimental results verify the effectiveness of GES(λ).
Abstract
We study the convergence of Expected Sarsa(λ) with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to Expected Sarsa(λ) is unstable for off-policy learning. Furthermore, based on a convex-concave saddle-point framework, we propose a convergent Gradient Expected Sarsa(λ) (GES(λ)) algorithm. The theoretical analysis shows that GES(λ) converges to the optimal solution at a linear rate, which is comparable to that of existing state-of-the-art gradient temporal difference (GTD) learning algorithms. Furthermore, we develop a Lyapunov function technique to investigate how the step-size influences the finite-time performance of GES(λ); this technique can potentially be generalized to other GTD algorithms. Finally,…
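For readers unfamiliar with the setting, the sketch below shows the standard one-step Expected Sarsa TD error under linear function approximation, Q(s, a) = θᵀφ(s, a), followed by a plain semi-gradient update. It is not the paper's GES(λ) algorithm, and it omits eligibility traces and the saddle-point formulation; the function names, feature vectors, and numbers are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def expected_sarsa_td_error(theta, phi_sa, reward, phi_next, pi_next, gamma):
    """One-step Expected Sarsa TD error with linear Q(s, a) = theta . phi(s, a).

    phi_next: list of feature vectors phi(s', a), one per action a.
    pi_next:  target-policy probabilities pi(a | s'), in the same action order.
    """
    q_sa = dot(theta, phi_sa)
    # Expectation of Q(s', .) under the target policy, instead of a sampled action.
    expected_q_next = sum(p * dot(theta, phi) for p, phi in zip(pi_next, phi_next))
    return reward + gamma * expected_q_next - q_sa


def semi_gradient_step(theta, phi_sa, delta, step_size):
    """Plain semi-gradient update (illustrative only; not the GES update)."""
    return [t + step_size * delta * f for t, f in zip(theta, phi_sa)]


# Hypothetical two-feature, two-action example.
theta = [1.0, 2.0]
phi_sa = [1.0, 0.0]
phi_next = [[1.0, 0.0], [0.0, 1.0]]
pi_next = [0.5, 0.5]

delta = expected_sarsa_td_error(theta, phi_sa, reward=1.0,
                                phi_next=phi_next, pi_next=pi_next, gamma=0.9)
new_theta = semi_gradient_step(theta, phi_sa, delta, step_size=0.1)
```

Gradient TD methods, the family the abstract compares against, replace this semi-gradient step with updates derived from a well-defined objective; the sketch is meant only to fix notation for the quantity being bootstrapped.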
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
