V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick

TL;DR
V-MPO is an on-policy reinforcement learning algorithm for discrete and continuous control that performs policy iteration based on a learned state-value function, improving stability and performance over prior policy gradient methods.
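The "learned state-value function" drives policy evaluation. The sketch below is a minimal illustration, assuming the n-step bootstrapped targets described in the full V-MPO paper (not reproduced on this page). The function names are hypothetical, and episode terminations inside a trajectory segment are ignored for brevity.

```python
from typing import List


def n_step_targets(rewards: List[float], values: List[float],
                   bootstrap_value: float, gamma: float, n: int) -> List[float]:
    """Hypothetical helper: n-step bootstrapped targets for one trajectory segment.

    G_t = sum_{k<h} gamma^k * r_{t+k} + gamma^h * V(s_{t+h}), with h = min(n, T - t).
    Assumption: no episode terminations inside the segment.
    """
    T = len(rewards)
    # Append V(s_T) so that all_values[t + h] is always defined.
    all_values = list(values) + [bootstrap_value]
    targets = []
    for t in range(T):
        h = min(n, T - t)  # real reward steps available from time t
        g = sum(gamma ** k * rewards[t + k] for k in range(h))
        targets.append(g + gamma ** h * all_values[t + h])
    return targets


def advantages(targets: List[float], values: List[float]) -> List[float]:
    # A(s_t, a_t) = G_t - V(s_t)
    return [g - v for g, v in zip(targets, values)]


def value_loss(targets: List[float], values: List[float]) -> float:
    # Half mean squared error of V(s_t) against the fixed targets.
    return 0.5 * sum((g - v) ** 2 for g, v in zip(targets, values)) / len(values)


targets = n_step_targets([1.0, 0.0, 1.0], [0.5, 0.5, 0.5],
                         bootstrap_value=0.0, gamma=0.5, n=2)
print(targets)                               # [1.125, 0.5, 1.0]
print(advantages(targets, [0.5, 0.5, 0.5]))  # [0.625, 0.0, 0.5]
```

The resulting advantages are what the policy improvement step consumes; since the data is collected on-policy, no off-policy correction is applied to them here.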
Contribution
Introduces V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that requires no importance weighting, entropy regularization, or population-based hyperparameter tuning, and achieves state-of-the-art results across multiple benchmarks.
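To make the MPO connection concrete, here is a minimal sketch of the nonparametric weighting step, assuming the formulation in the full paper: keep the top half of a batch by advantage and weight those samples by a softmax of A / eta, with the temperature eta trained by a dual loss. The function names, the rounding rule for odd batch sizes, and the example values of `eta` and `eps_eta` are our own illustrative choices, not taken from this page.

```python
import math
from typing import List, Tuple


def top_half_weights(adv: List[float], eta: float) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: keep the top half of samples by advantage and
    weight them by softmax(A / eta) over the kept samples.

    No importance weights appear because the data is on-policy.
    """
    k = max(1, len(adv) // 2)  # "top half"; rounding down for odd sizes is an assumption
    order = sorted(range(len(adv)), key=lambda i: adv[i], reverse=True)
    selected = order[:k]
    scaled = [adv[i] / eta for i in selected]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return selected, [e / z for e in exps]


def temperature_loss(adv: List[float], selected: List[int],
                     eta: float, eps_eta: float) -> float:
    """Dual loss for the temperature: eta * eps_eta + eta * log(mean exp(A / eta))."""
    scaled = [adv[i] / eta for i in selected]
    m = max(scaled)
    # Log-mean-exp computed stably via the max shift.
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return eta * eps_eta + eta * log_mean_exp


adv = [0.625, 0.0, 0.5, -1.0]
idx, w = top_half_weights(adv, eta=1.0)
print(idx)  # [0, 2]
```

Discarding the lower half of advantages means poorly performing samples never pull the policy toward them, and the temperature controls how sharply the remaining weight concentrates on the best samples.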
Findings
V-MPO surpasses previously reported scores on the Atari-57 and DMLab-30 benchmark suites in the multi-task setting.
V-MPO achieves higher scores on individual DMLab and Atari levels.
V-MPO also extends to high-dimensional continuous control, including simulated humanoids and OpenAI Gym tasks.
Abstract
Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algorithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state-value function. We show that V-MPO surpasses previously reported scores for both the Atari-57 and DMLab-30 benchmark suites in the multi-task setting, and does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters. On individual DMLab and Atari levels, the proposed algorithm…
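In place of entropy regularization, the full paper constrains each policy update with a KL divergence to the previous policy, enforced through a Lagrange multiplier alpha. The sketch below assumes that formulation for a categorical (discrete-action) policy. It only computes loss values in plain Python; the names and the projection of alpha to be non-negative are illustrative assumptions, and a real implementation would differentiate these losses with an autodiff framework.

```python
import math
from typing import List, Sequence, Tuple


def categorical_kl(p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """KL(p_old || p_new) between two categorical distributions over the same actions."""
    return sum(po * (math.log(po) - math.log(pn))
               for po, pn in zip(p_old, p_new) if po > 0)


def policy_loss(log_probs: List[float], weights: List[float],
                old_probs: List[Sequence[float]], new_probs: List[Sequence[float]],
                alpha: float) -> Tuple[float, float]:
    """Hypothetical helper: weighted maximum likelihood plus an alpha-scaled KL penalty.

    log_probs[i] is log pi_theta(a_i | s_i) for the selected samples; alpha is held fixed.
    Returns (loss minimized w.r.t. policy parameters, mean KL over states).
    """
    nll = -sum(w * lp for w, lp in zip(weights, log_probs))
    mean_kl = sum(categorical_kl(po, pn)
                  for po, pn in zip(old_probs, new_probs)) / len(old_probs)
    return nll + alpha * mean_kl, mean_kl


def update_alpha(alpha: float, mean_kl: float, eps_alpha: float, lr: float) -> float:
    """One gradient step on alpha * (eps_alpha - KL): alpha grows when KL exceeds eps_alpha.

    Clamping at zero keeps the multiplier valid (an assumed projection step).
    """
    return max(0.0, alpha - lr * (eps_alpha - mean_kl))


old = [[0.5, 0.5]]
new = [[0.9, 0.1]]
loss, kl = policy_loss([math.log(0.9)], [1.0], old, new, alpha=1.0)
print(round(kl, 4))  # 0.5108
```

Because alpha rises whenever the mean KL exceeds its bound, large jumps away from the previous policy are penalized more heavily over time, which is the mechanism that guards against the policy collapse mentioned in the abstract.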
