Fast Stochastic Policy Gradient: Negative Momentum for Reinforcement Learning

Haobin Zhang; Zhuang Yang

arXiv:2405.12228 · cs.LG · May 22, 2024

TL;DR

This paper introduces SPG-NM, a stochastic policy gradient algorithm with negative momentum, which accelerates convergence in reinforcement learning tasks while maintaining computational efficiency and robustness to hyper-parameters.

Contribution

The paper proposes a novel negative momentum technique for stochastic policy gradient methods, improving convergence speed without increasing computational complexity.

Findings

1. Faster convergence in bandit and MDP tasks compared to state-of-the-art algorithms.

2. Robustness of SPG-NM to hyper-parameter variations.

3. Computational complexity similar to that of existing SPG algorithms.

Abstract

Stochastic optimization algorithms, particularly stochastic policy gradient (SPG), have achieved significant success in reinforcement learning (RL). Nevertheless, how to quickly obtain an optimal solution for RL remains a challenge. To tackle this issue, this work develops a fast SPG algorithm, coined SPG-NM, from the perspective of utilizing momentum. Specifically, SPG-NM applies a novel type of negative momentum (NM) technique to the classical SPG algorithm. Unlike existing NM techniques, SPG-NM adopts a few hyper-parameters. Moreover, its computational complexity is nearly the same as that of modern SPG-type algorithms, e.g., accelerated policy gradient (APG), which equips SPG with Nesterov's accelerated gradient (NAG). We evaluate the resulting algorithm on two classical tasks, the bandit setting and the Markov decision process (MDP).…
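To make the idea of combining SPG with negative momentum concrete, here is a minimal sketch on a softmax bandit. The abstract does not give the SPG-NM update rule, so this is not the authors' algorithm: it assumes a generic heavy-ball update whose momentum coefficient is negative, and the function name `spg_negative_momentum`, the parameters `lr` and `beta`, and the Gaussian reward noise are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over action preferences."""
    z = theta - np.max(theta)
    e = np.exp(z)
    return e / e.sum()


def spg_negative_momentum(mean_rewards, steps=2000, lr=0.1, beta=-0.3, seed=0):
    """Softmax bandit trained with a REINFORCE gradient estimate and a
    heavy-ball term with a negative coefficient.

    ASSUMPTION: this generic update is illustrative only; it is not the
    SPG-NM rule from the paper, which the abstract does not specify.
    """
    rng = np.random.default_rng(seed)
    mean_rewards = np.asarray(mean_rewards, dtype=float)
    k = len(mean_rewards)
    theta = np.zeros(k)
    prev_theta = theta.copy()
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choice(k, p=pi)  # sample an action from the policy
        r = mean_rewards[a] + 0.1 * rng.standard_normal()  # noisy reward
        # Stochastic policy gradient: r * (one_hot(a) - pi)
        grad = -r * pi
        grad[a] += r
        # Gradient ascent step plus momentum; beta < 0 makes it "negative"
        new_theta = theta + lr * grad + beta * (theta - prev_theta)
        prev_theta, theta = theta, new_theta
    return softmax(theta)


if __name__ == "__main__":
    print(spg_negative_momentum([0.2, 0.5, 0.9]))
```

Setting `beta=0` in this sketch recovers plain SPG, which gives a simple baseline for comparing the effect of the momentum term.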

Taxonomy

Topics: Reinforcement Learning in Robotics