Fast Stochastic Policy Gradient: Negative Momentum for Reinforcement Learning
Haobin Zhang, Zhuang Yang

TL;DR
This paper introduces SPG-NM, a stochastic policy gradient algorithm with negative momentum, which accelerates convergence in reinforcement learning tasks while maintaining computational efficiency and robustness to hyper-parameters.
Contribution
The paper proposes a novel negative momentum technique for stochastic policy gradient methods, improving convergence speed without increasing computational complexity.
Findings
Faster convergence in bandit and MDP tasks compared to state-of-the-art algorithms.
Robustness of SPG-NM to hyper-parameter variations.
Computational complexity similar to that of existing SPG algorithms.
Abstract
Stochastic optimization algorithms, particularly stochastic policy gradient (SPG), have achieved significant success in reinforcement learning (RL). Nevertheless, how to reach an optimal solution for RL quickly remains a challenge. To tackle this issue, this work develops a fast SPG algorithm based on momentum, coined SPG-NM. Specifically, SPG-NM applies a novel type of negative momentum (NM) technique to the classical SPG algorithm. Unlike existing NM techniques, our SPG-NM algorithm adopts a few hyper-parameters. Moreover, its computational complexity is nearly the same as that of modern SPG-type algorithms, e.g., accelerated policy gradient (APG), which equips SPG with Nesterov's accelerated gradient (NAG). We evaluate the resulting algorithm on two classical tasks, the bandit setting and the Markov decision process (MDP).…
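Illustrative sketch
The abstract does not spell out the SPG-NM update rule, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes a softmax policy on a multi-armed bandit, a REINFORCE-style stochastic gradient, and a heavy-ball momentum term whose coefficient beta is set negative. The names spg_bandit, softmax, mean_rewards, lr, and beta are hypothetical and chosen for illustration.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def spg_bandit(mean_rewards, steps=3000, lr=0.1, beta=0.0, seed=0):
    """Stochastic softmax policy gradient on a multi-armed bandit.

    beta is a heavy-ball momentum coefficient. Setting beta < 0 gives a
    negative-momentum step. ASSUMPTION: this generic heavy-ball form is
    used for illustration only; it is not the paper's exact SPG-NM rule.
    """
    rng = random.Random(seed)
    k = len(mean_rewards)
    theta = [0.0] * k   # current policy logits
    prev = theta[:]     # logits from the previous iteration
    for _ in range(steps):
        pi = softmax(theta)
        # Sample one arm from the current policy and observe a noisy reward.
        a = rng.choices(range(k), weights=pi)[0]
        r = mean_rewards[a] + rng.gauss(0.0, 0.1)
        # REINFORCE estimate: r * grad log pi(a) = r * (one_hot(a) - pi).
        grad = [r * ((1.0 if i == a else 0.0) - pi[i]) for i in range(k)]
        # Gradient ascent step plus momentum on the last parameter change.
        new = [
            theta[i] + lr * grad[i] + beta * (theta[i] - prev[i])
            for i in range(k)
        ]
        prev, theta = theta, new
    return softmax(theta)


if __name__ == "__main__":
    arms = [1.0, 0.2, 0.1]  # hypothetical mean rewards
    print("plain SPG:        ", spg_bandit(arms, beta=0.0))
    print("negative momentum:", spg_bandit(arms, beta=-0.1))
```

In this sketch each iteration still samples one action and computes one gradient estimate, so the momentum term adds only a constant amount of work per parameter. Whether a negative beta actually speeds up convergence depends on the task and step size; the sketch makes no claim about reproducing the paper's results.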
Taxonomy
Topics: Reinforcement Learning in Robotics
