Stein Variational Policy Gradient

Yang Liu; Prajit Ramachandran; Qiang Liu; Jian Peng

arXiv:1704.02399·cs.LG·April 11, 2017·65 cites

TL;DR

This paper introduces Stein Variational Policy Gradient (SVPG), a method that casts maximum entropy policy optimization as a Bayesian inference problem and combines existing policy gradient methods with a repulsive functional to learn a set of diverse policies, improving average return and data efficiency on continuous control tasks.

Contribution

The paper proposes SVPG, a new policy gradient method that promotes diverse policies through Stein variational inference, improving convergence and exploration in reinforcement learning.

Findings

01. SVPG improves average return on continuous control tasks.

02. SVPG enhances data efficiency compared to baseline methods.

03. SVPG is robust to initialization and easy to parallelize.

Abstract

Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem. We then propose a novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods and a repulsive functional to generate a set of diverse but well-behaved policies. SVPG is robust to initialization and can easily be implemented in a parallel manner. On continuous control problems, we find that implementing SVPG on top of REINFORCE and advantage actor-critic algorithms improves both average return and data efficiency.
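
The combination of a policy gradient with a repulsive term described in the abstract can be illustrated with a minimal NumPy sketch of a Stein variational update over a set of policy parameter vectors ("particles"). This is not the authors' implementation. The function names, the temperature `alpha`, the flat prior over parameters, and the median-heuristic RBF bandwidth are all assumptions made for illustration, and `policy_grads` stands in for whatever per-particle gradient estimate (for example from REINFORCE or advantage actor-critic) is available.

```python
import numpy as np


def rbf_kernel(particles):
    """Return an RBF kernel matrix and its gradient w.r.t. the first argument.

    The bandwidth uses a median heuristic (an assumption of this sketch).
    """
    n = particles.shape[0]
    # diffs[j, i] = theta_j - theta_i, shape (n, n, d)
    diffs = particles[:, None, :] - particles[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    h = np.median(sq_dists) / np.log(n + 1.0) + 1e-8
    k = np.exp(-sq_dists / h)
    # grad_k[j, i] = d k(theta_j, theta_i) / d theta_j
    grad_k = -2.0 / h * diffs * k[:, :, None]
    return k, grad_k


def svpg_direction(particles, policy_grads, alpha=1.0):
    """Stein variational update direction for each particle.

    particles:    (n, d) array, one policy parameter vector per row.
    policy_grads: (n, d) array, estimated gradient of expected return
                  for each particle (hypothetical input).
    alpha:        temperature weighting exploration (assumed name).
    A flat prior is assumed, so the log-posterior gradient is
    policy_grads / alpha.
    """
    n = particles.shape[0]
    k, grad_k = rbf_kernel(particles)
    # Kernel-weighted average of gradients: shares information across particles.
    drive = k.T @ (policy_grads / alpha)
    # Sum over j of grad_k[j, i]: pushes particle i away from its neighbours.
    repulse = grad_k.sum(axis=0)
    return (drive + repulse) / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    thetas = rng.normal(size=(4, 3))
    grads = rng.normal(size=(4, 3))       # placeholder gradient estimates
    thetas = thetas + 0.1 * svpg_direction(thetas, grads, alpha=1.0)
    print(thetas.shape)
```

In this sketch, each particle's step mixes the gradients of nearby particles with a term that pushes particles apart, which is what keeps the policies diverse. Because each particle's gradient estimate can be computed independently before the kernel step, the rollouts lend themselves to parallel execution.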

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Insurance, Mortality, Demography, Risk Management

Methods: REINFORCE · Adam · Stein Variational Policy Gradient