Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient

Haonan Jia; Xiao Zhang; Jun Xu; Wei Zeng; Hao Jiang; Xiaohui Yan; Ji-Rong Wen

arXiv:2007.12817 · cs.LG · July 28, 2020 · 5 cites

Open Access

TL;DR

This paper introduces SRG-DQN, a recursive variance reduction method for deep Q-learning that improves gradient estimation stability and training efficiency by eliminating the need for anchor points, supported by theoretical and experimental validation.

Contribution

The paper proposes a novel recursive gradient update algorithm for deep Q-learning, overcoming limitations of SVRG-based methods and enhancing training stability and efficiency.

Findings

01 SRG-DQN reduces the variance of the gradient estimates.

02 The method accelerates training convergence.

03 In the reported experiments, SRG-DQN outperforms existing algorithms.

Abstract

Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance, resulting in unstable training and poor sample efficiency. Stochastic variance-reduced gradient methods such as SVRG have been applied to reduce the estimation variance (Zhao et al. 2019). However, because reinforcement learning generates its training instances online, directly applying SVRG to deep Q-learning runs into inaccurate estimation of the anchor points, which dramatically limits the potential of SVRG. To address this issue, and inspired by the recursive gradient variance reduction algorithm SARAH (Nguyen et al. 2017), this paper introduces a recursive framework for updating the stochastic gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN. Unlike the SVRG-based algorithms, SRG-DQN designs a recursive update of the…
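To make the recursive idea concrete, the following is a minimal sketch of a SARAH-style gradient estimate on a toy least-squares problem. It is not SRG-DQN and is not taken from the paper: the objective, the function names (`full_gradient`, `sample_gradient`, `loss`, `sarah`), and all hyperparameters are assumptions chosen for illustration. The only point it shows is that each new estimate corrects the previous estimate, rather than referring back to a fixed anchor point as SVRG does.

```python
import numpy as np


def full_gradient(A, b, w):
    """Gradient of the mean loss 0.5 * mean((A @ w - b) ** 2)."""
    return A.T @ (A @ w - b) / A.shape[0]


def sample_gradient(A, b, w, i):
    """Gradient of the loss on the single sample i."""
    return A[i] * (A[i] @ w - b[i])


def loss(A, b, w):
    return 0.5 * np.mean((A @ w - b) ** 2)


def sarah(A, b, step=0.02, outer_loops=10, inner_steps=200, seed=0):
    """SARAH-style recursive gradient descent (illustrative settings)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(outer_loops):
        # Outer step: one full gradient starts the recursion.
        v = full_gradient(A, b, w)
        w_prev, w = w, w - step * v
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Recursive estimate: update the previous estimate v by the
            # change in the sampled gradient between consecutive iterates.
            v = sample_gradient(A, b, w, i) - sample_gradient(A, b, w_prev, i) + v
            w_prev, w = w, w - step * v
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(200, 5))
    b = A @ rng.normal(size=5)  # noise-free targets, an assumption
    w = sarah(A, b)
    print("loss at start:", loss(A, b, np.zeros(5)))
    print("loss after SARAH:", loss(A, b, w))
```

In a supervised setting like this one the full gradient in the outer step is cheap to compute exactly. The abstract's concern is that deep Q-learning generates its data online, so an SVRG anchor gradient can only be estimated, which is the motivation it gives for a recursive update.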

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Reinforcement Learning in Robotics

Methods: Q-Learning · Adam