Variational Inference for Policy Gradient
Tianbing Xu

TL;DR
This paper introduces a variational inference approach to improve policy gradient methods in reinforcement learning by explicitly minimizing KL divergence, enabling Bayesian neural network parameterizations.
Contribution
It develops a novel variational inference technique for policy gradient methods, integrating Bayesian neural networks into reinforcement learning algorithms.
Findings
Enhanced policy gradient methods with Bayesian neural networks.
Effective sample generation from posterior distributions.
Improved reinforcement learning performance.
Abstract
Inspired by the seminal work on Stein Variational Inference and Stein Variational Policy Gradient, we derived a method to generate samples from the posterior variational parameter distribution by explicitly minimizing the KL divergence to match the target distribution in an amortized fashion. Consequently, we applied this variational inference technique to vanilla policy gradient, TRPO, and PPO with Bayesian neural network parameterizations for reinforcement learning problems.
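To make the idea of explicitly minimizing a KL divergence toward a target concrete, here is a minimal sketch. It is not the paper's algorithm: it swaps in a one-dimensional Gaussian variational distribution with reparameterized samples in place of a Bayesian neural network, and the target (a Gaussian with mean 2.0 and standard deviation 0.5), the function names, and all hyperparameters are assumptions chosen purely for illustration.

```python
import numpy as np

def grad_log_target(theta, mu=2.0, sigma=0.5):
    # Score of the assumed target p(theta) = N(mu, sigma^2).
    # Only this gradient is needed, so p may be unnormalized.
    return -(theta - mu) / sigma**2

def fit_gaussian_by_kl(steps=2000, n_samples=64, lr=0.05, seed=0):
    # Minimize KL(q || p) for q = N(m, s^2) by stochastic gradient descent,
    # drawing samples as theta = m + s * eps (reparameterization).
    rng = np.random.default_rng(seed)
    m, log_s = 0.0, 0.0  # variational parameters; s = exp(log_s) > 0
    for _ in range(steps):
        eps = rng.standard_normal(n_samples)
        s = np.exp(log_s)
        theta = m + s * eps
        g = grad_log_target(theta)
        # KL(q || p) = -H(q) - E_q[log p] + const, with H(q) = log s + const.
        grad_m = -np.mean(g)
        grad_log_s = -1.0 - np.mean(g * s * eps)
        m -= lr * grad_m
        log_s -= lr * grad_log_s
    return m, np.exp(log_s)

m, s = fit_gaussian_by_kl()
print(f"fitted mean ~ {m:.2f}, fitted std ~ {s:.2f}")
```

Under these assumptions the fitted mean and standard deviation should settle near the target's values, up to Monte Carlo noise. In a policy gradient setting the target would instead be a posterior over policy parameters, which this toy example does not attempt to model.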
Taxonomy
Topics: Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods · Advanced Bandit Algorithms Research
Methods: Entropy Regularization · Proximal Policy Optimization · Trust Region Policy Optimization
