Gradient Q$(\sigma, \lambda)$: A Unified Algorithm with Function Approximation for Reinforcement Learning

Long Yang; Yu Zhang; Qian Zheng; Pengfei Li; Gang Pan

arXiv:1909.02877·cs.LG·September 9, 2019

Open Access

TL;DR

This paper introduces GQ$(\sigma,\lambda)$, a unified reinforcement learning algorithm combining sampling and expectation methods with function approximation, and proves its convergence with empirical validation.

Contribution

It extends the tabular Q$(\sigma,\lambda)$ to large-scale settings using linear function approximation and provides convergence guarantees.

Findings

01. GQ$(\sigma,\lambda)$ outperforms traditional methods in standard domains.

02. The algorithm effectively combines sampling and expectation techniques.

03. Empirical results demonstrate improved performance over existing algorithms.

Abstract

Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma, \lambda)$ is the first approach that unifies them with eligibility traces through the sampling degree $\sigma$. However, it is limited to the tabular case: for large-scale learning, Q$(\sigma, \lambda)$ is too expensive because it requires a huge volume of tables to store value functions accurately. To address this problem, we propose GQ$(\sigma, \lambda)$, which extends tabular Q$(\sigma, \lambda)$ with linear function approximation. We prove the convergence of GQ$(\sigma, \lambda)$. Empirical results on some standard domains show that GQ$(\sigma, \lambda)$ with a combination of full sampling and pure expectation achieves better performance than either full-sampling or pure-expectation methods.
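To make the role of the sampling degree $\sigma$ concrete, here is a minimal sketch of a one-step TD error that mixes a sampled backup with an expected backup under a linear value estimate $Q(s,a) = \theta^\top \phi(s,a)$. It is not the paper's GQ$(\sigma, \lambda)$ update: it omits the eligibility trace and uses a plain semi-gradient step rather than a gradient-TD correction. All function and parameter names (`sigma_td_error`, `semi_gradient_step`, `next_features`, `next_policy`) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigma_td_error(
    theta: Sequence[float],                  # weight vector (assumed linear model)
    phi_sa: Sequence[float],                 # features of the current (state, action)
    reward: float,
    gamma: float,                            # discount factor
    sigma: float,                            # sampling degree in [0, 1]
    next_features: Sequence[Sequence[float]],  # features of each action in the next state
    next_policy: Sequence[float],            # target-policy probabilities in the next state
    next_action: int,                        # index of the action actually sampled
) -> float:
    """One-step TD error mixing a sampled and an expected backup.

    sigma = 1 uses only the sampled next action (full sampling);
    sigma = 0 uses only the policy-weighted average (pure expectation).
    """
    q_next = [dot(theta, f) for f in next_features]
    sampled = q_next[next_action]
    expected = sum(p * q for p, q in zip(next_policy, q_next))
    target = reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)
    return target - dot(theta, phi_sa)


def semi_gradient_step(
    theta: Sequence[float], phi_sa: Sequence[float], delta: float, alpha: float
) -> List[float]:
    """Illustrative semi-gradient update: theta <- theta + alpha * delta * phi(s, a)."""
    return [w + alpha * delta * x for w, x in zip(theta, phi_sa)]


if __name__ == "__main__":
    theta = [1.0, 2.0]
    phi_sa = [1.0, 0.0]
    next_features = [[0.0, 1.0], [1.0, 1.0]]
    next_policy = [0.5, 0.5]
    for sigma in (0.0, 0.5, 1.0):
        delta = sigma_td_error(theta, phi_sa, 1.0, 0.5, sigma,
                               next_features, next_policy, next_action=1)
        print(sigma, delta)
```

Intermediate values of `sigma` interpolate linearly between the two backups, which is the sense in which a single parameter can span sampling-based and expectation-based methods.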

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Optimization and Search Problems

Methods: Eligibility Trace