On Convergence of Gradient Expected Sarsa($\lambda$)

Long Yang; Gang Zheng; Yu Zhang; Qian Zheng; Pengfei Li; Gang Pan

arXiv:2012.07199 · cs.LG · December 15, 2020

TL;DR

This paper analyzes the convergence issues of Expected Sarsa(λ) with linear function approximation and introduces a new convergent gradient-based algorithm, GES(λ), with theoretical guarantees and empirical validation.
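
For reference, here is a minimal sketch of one linear Expected Sarsa(λ) update. It assumes an accumulating eligibility trace whose decay is scaled by a per-decision importance ratio `rho`, with `rho=1.0` meaning on-policy learning. The function name and arguments are hypothetical. The trace weighting and off-line estimate analyzed in the paper may differ, so this is not the authors' implementation.

```python
import numpy as np


def expected_sarsa_lambda_step(theta, e, phi_sa, r, phi_next, pi_next,
                               gamma, lam, alpha, rho=1.0):
    """One linear Expected Sarsa(lambda) update (illustrative sketch).

    theta    : weights, Q(s, a) is approximated by theta @ phi(s, a), shape (d,)
    e        : eligibility trace from the previous step, shape (d,)
    phi_sa   : features of the current pair (s, a), shape (d,)
    r        : reward received after taking a in s
    phi_next : features phi(s', a') for every action a', shape (n_actions, d)
    pi_next  : target-policy probabilities pi(a' | s'), shape (n_actions,)
    rho      : per-decision importance ratio scaling the trace decay
               (assumption; 1.0 corresponds to on-policy learning)
    """
    # Expected value of the next state under the target policy pi.
    expected_next = pi_next @ (phi_next @ theta)
    # Expected Sarsa TD error.
    delta = r + gamma * expected_next - phi_sa @ theta
    # Accumulating trace: decay the old trace, then add the current features.
    e = gamma * lam * rho * e + phi_sa
    # Semi-gradient update along the trace.
    theta = theta + alpha * delta * e
    return theta, e, delta
```

In use, the trace `e` is reset to zeros at the start of each episode and threaded through successive calls together with `theta`.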

Contribution

It proposes GES(λ), a convergent Gradient Expected Sarsa(λ) algorithm with a proven linear convergence rate, and a novel Lyapunov function technique for analyzing its finite-time performance.

Findings

01

GES(λ) converges linearly to the optimal solution.

02

Applying the off-line estimate (multi-step bootstrapping) to Expected Sarsa(λ) is unstable for off-policy learning.

03

Experimental results verify the effectiveness of GES(λ).
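
The instability in the second finding reflects a general hazard of combining bootstrapping, linear function approximation, and off-policy updates. The sketch below is an illustration only. It uses the classic two-state "θ → 2θ" example from Sutton and Barto's textbook, not the counterexample constructed in the paper. It repeats a semi-gradient TD(0) update on the single transition from a state with feature 1 to a state with feature 2, with reward 0. The function name is hypothetical.

```python
def theta_to_2theta(theta0, alpha, gamma, steps):
    """Repeat an off-policy semi-gradient TD(0) update on one transition.

    The transition goes from a state with feature 1 (value theta) to a
    state with feature 2 (value 2 * theta) with reward 0. Only this
    transition is ever updated, as can happen under a behavior policy
    that differs from the target policy.
    """
    theta = theta0
    history = [theta]
    for _ in range(steps):
        # TD error: r + gamma * v(s') - v(s) with r = 0.
        delta = 0.0 + gamma * (2.0 * theta) - 1.0 * theta
        # Gradient of v(s) = 1 * theta with respect to theta is 1.
        theta = theta + alpha * delta * 1.0
        history.append(theta)
    return history
```

Each update multiplies θ by 1 + α(2γ - 1), so θ grows without bound whenever γ > 1/2. This happens even though the true values (all zero) are exactly representable with θ = 0.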

Abstract

We study the convergence of Expected Sarsa(λ) with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to Expected Sarsa(λ) is unstable for off-policy learning. Furthermore, based on the convex-concave saddle-point framework, we propose a convergent Gradient Expected Sarsa(λ) (GES(λ)) algorithm. The theoretical analysis shows that GES(λ) converges to the optimal solution at a linear convergence rate, which is comparable to many existing state-of-the-art gradient temporal-difference (GTD) learning algorithms. In addition, we develop a Lyapunov function technique to investigate how the step size influences the finite-time performance of GES(λ); this technique can potentially be generalized to other GTD algorithms. Finally,…
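
The convex-concave saddle-point idea can be illustrated with a generic GTD2-style primal-dual update applied to the expected TD error. The sketch below is a hypothetical simplification. It assumes λ = 0, so there are no eligibility traces and no importance ratio is needed for the sampled action. It also assumes fixed step sizes `alpha` (primal) and `beta` (dual). It is not GES(λ) and carries none of the paper's guarantees. The function name is an assumption.

```python
import numpy as np


def gtd2_expected_step(theta, w, phi_sa, r, phi_next, pi_next,
                       gamma, alpha, beta):
    """One GTD2-style primal-dual step with an Expected Sarsa target (lambda = 0).

    theta : primal weights, Q(s, a) is approximated by theta @ phi(s, a)
    w     : dual (auxiliary) weights of the saddle-point problem
    alpha : primal step size; beta : dual step size
    """
    # Expected next feature vector under the target policy pi.
    phi_bar_next = pi_next @ phi_next
    # Expected Sarsa TD error for the sampled pair (s, a).
    delta = r + gamma * (phi_bar_next @ theta) - phi_sa @ theta
    # Primal step (descent in theta), using the current dual weights w.
    theta_new = theta + alpha * (phi_sa - gamma * phi_bar_next) * (phi_sa @ w)
    # Dual step (ascent in w): w tracks a linear least-squares fit of the TD error.
    w_new = w + beta * (delta - phi_sa @ w) * phi_sa
    return theta_new, w_new
```

Take a one-state, one-action problem with feature 1, reward 1, and γ = 0.5. Repeated calls with `alpha=0.1` and `beta=0.5` drive `theta` toward the true value 1/(1 - 0.5) = 2. Changing `alpha` and `beta` changes how quickly this happens.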

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms