Improving Actor-Critic Reinforcement Learning via Hamiltonian Monte Carlo Method
Duo Xu, Faramarz Fekri

TL;DR
This paper introduces Hamiltonian Policy, integrating Hamiltonian Monte Carlo with actor-critic reinforcement learning to enhance policy approximation, exploration, and safety in continuous control tasks.
Contribution
It proposes a novel Hamiltonian Policy method that reduces the amortization gap, improves exploration, and enhances safety in actor-critic RL through HMC integration and a new leapfrog operator.
Findings
Improves policy approximation and exploration efficiency.
Reduces safety constraint violations in safe RL.
Achieves better data efficiency on continuous control benchmarks.
Abstract
Actor-critic RL is widely used in various robotic control tasks. When actor-critic RL is viewed from the perspective of variational inference (VI), the policy network is trained to obtain the approximate posterior of actions given the optimality criteria. However, in practice, actor-critic RL may yield suboptimal policy estimates due to the amortization gap and insufficient exploration. In this work, inspired by the previous use of Hamiltonian Monte Carlo (HMC) in VI, we propose to integrate the policy network of actor-critic RL with HMC, an approach we term Hamiltonian Policy. Specifically, we evolve actions sampled from the base policy according to HMC, which brings several benefits. First, HMC can improve the policy distribution to better approximate the posterior and hence reduce the amortization gap. Second, HMC can also guide the exploration more to the…
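To make the idea of evolving an action with HMC concrete, the following is a minimal sketch of the standard leapfrog integrator, not the new leapfrog operator the paper proposes. It assumes a toy quadratic log-target standing in for a scaled critic value; the names `leapfrog`, `grad_log_target`, `step_size`, `n_steps`, and `mu` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def leapfrog(action, momentum, grad_log_target, step_size=0.1, n_steps=5):
    """Standard leapfrog for H(a, p) = -log_target(a) + 0.5 * ||p||^2.

    Illustrative only: this is the textbook integrator, not the paper's operator.
    """
    a = np.array(action, dtype=float)
    p = np.array(momentum, dtype=float)
    p = p + 0.5 * step_size * grad_log_target(a)      # initial half step on momentum
    for i in range(n_steps):
        a = a + step_size * p                         # full step on the action
        if i < n_steps - 1:
            p = p + step_size * grad_log_target(a)    # full step on momentum
    p = p + 0.5 * step_size * grad_log_target(a)      # final half step on momentum
    return a, p

# Assumed toy target: log_target(a) = -0.5 * ||a - mu||^2, a stand-in for a
# critic-derived log-density; mu plays the role of the high-value action.
mu = np.array([0.5, -0.2])

def grad_log_target(a):
    return -(a - mu)

def hamiltonian(a, p):
    return 0.5 * np.sum((a - mu) ** 2) + 0.5 * np.sum(p ** 2)

a0 = np.array([1.5, -1.2])   # action sampled from a hypothetical base policy
p0 = np.zeros(2)             # zero initial momentum keeps the demo deterministic
a1, p1 = leapfrog(a0, p0, grad_log_target)
print("distance to mu before:", np.linalg.norm(a0 - mu))
print("distance to mu after: ", np.linalg.norm(a1 - mu))
```

In a full HMC sampler the momentum would be drawn from a Gaussian rather than set to zero; the zero momentum here only serves to show the gradient of the target pulling the action toward the high-density region.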
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
Methods: Variational Inference
