Model-Based Reinforcement Learning via Meta-Policy Optimization
Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel

TL;DR
This paper introduces MB-MPO, a model-based reinforcement learning method that uses meta-policy optimization with an ensemble of models to achieve data efficiency and robustness, matching the performance of model-free methods.
Contribution
MB-MPO is a novel approach that reduces reliance on accurate dynamics models by meta-learning a policy that adapts quickly to model discrepancies.
Findings
MB-MPO is more robust to model imperfections than previous methods.
It achieves asymptotic performance comparable to model-free algorithms.
It requires significantly less experience than model-free methods to reach high performance.
Abstract
Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic performance as model-free methods. We propose Model-Based Meta-Policy-Optimization (MB-MPO), an approach that foregoes the strong reliance on accurate learned dynamics models. Using an ensemble of learned dynamic models, MB-MPO meta-learns a policy that can quickly adapt to any model in the ensemble with one policy gradient step. This steers the meta-policy towards internalizing consistent dynamics predictions among the ensemble while shifting the burden of behaving optimally w.r.t. the model discrepancies towards the adaptation step. Our experiments show that MB-MPO is more robust to model imperfections than previous model-based approaches. Finally, we…
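The meta-learning idea in the abstract (adapt to each model in the ensemble with one gradient step, then improve the pre-adaptation parameters so that the adapted policies do well on average) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scalar dynamics `s' = a*s + b*u`, the linear policy `u = -theta*s`, the one-step imagined return, the step sizes `alpha` and `beta`, and all function names. Finite differences stand in for policy gradient estimates so that the example stays dependency-free.

```python
# Toy sketch of meta-policy optimization over a model ensemble.
# All names, dynamics, and step sizes here are hypothetical illustrations.

def model_return(theta, model, s0=1.0):
    """One-step imagined return of policy u = -theta * s under one model."""
    a, b = model                      # assumed scalar dynamics: s' = a*s + b*u
    u = -theta * s0
    s_next = a * s0 + b * u
    return -(s_next ** 2)             # reward: keep the next state near zero

def grad(f, x, eps=1e-5):
    """Central finite difference; a stand-in for a policy gradient estimate."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def adapt(theta, model, alpha):
    """Inner step: one gradient ascent step on a single model's return."""
    return theta + alpha * grad(lambda t: model_return(t, model), theta)

def meta_objective(theta, ensemble, alpha):
    """Average return after adapting theta separately to each model."""
    returns = [model_return(adapt(theta, m, alpha), m) for m in ensemble]
    return sum(returns) / len(returns)

def meta_train(ensemble, theta=0.0, alpha=0.1, beta=0.5, steps=200):
    """Outer loop: gradient ascent on the post-adaptation objective."""
    for _ in range(steps):
        theta += beta * grad(lambda t: meta_objective(t, ensemble, alpha), theta)
    return theta

# A hypothetical ensemble of three slightly different learned models (a, b).
ensemble = [(0.9, 0.5), (1.0, 0.4), (1.1, 0.6)]
alpha = 0.1
theta_meta = meta_train(ensemble, alpha=alpha)
adapted = [adapt(theta_meta, m, alpha) for m in ensemble]
```

In this sketch `theta_meta` settles on a gain that is a compromise across the three models, and each entry of `adapted` nudges it toward the gain that suits one particular model. This mirrors, in miniature, the division of labor the abstract describes between the meta-policy and the adaptation step.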
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics
