Model-Based Reinforcement Learning via Meta-Policy Optimization

Ignasi Clavera; Jonas Rothfuss; John Schulman; Yasuhiro Fujita; Tamim Asfour; Pieter Abbeel

arXiv:1809.05214 · cs.LG · September 17, 2018 · 117 cites

TL;DR

This paper introduces MB-MPO, a model-based reinforcement learning method that meta-learns a policy over an ensemble of learned dynamics models, improving data efficiency and robustness to model error while matching the asymptotic performance of model-free methods.

Contribution

MB-MPO is a novel approach that reduces reliance on accurate dynamics models by meta-learning a policy that adapts quickly to model discrepancies.

Findings

01 MB-MPO is more robust to model imperfections than previous model-based approaches.

02 It achieves asymptotic performance comparable to model-free algorithms.

03 It requires significantly less experience than model-free methods to reach high performance.

Abstract

Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic performance as model-free methods. We propose Model-Based Meta-Policy-Optimization (MB-MPO), an approach that foregoes the strong reliance on accurate learned dynamics models. Using an ensemble of learned dynamic models, MB-MPO meta-learns a policy that can quickly adapt to any model in the ensemble with one policy gradient step. This steers the meta-policy towards internalizing consistent dynamics predictions among the ensemble while shifting the burden of behaving optimally w.r.t. the model discrepancies towards the adaptation step. Our experiments show that MB-MPO is more robust to model imperfections than previous model-based approaches. Finally, we…
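The adaptation scheme described in the abstract (one policy gradient step per model, with a meta-policy trained on the post-adaptation return) can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions and not the authors' implementation: the policy is a single scalar parameter, each "learned model" is replaced by a hypothetical quadratic surrogate return with its own optimum, and all function names and step sizes (`alpha`, `beta`) are made up for illustration.

```python
def model_return(theta, target):
    """Toy stand-in for the expected return of policy parameter `theta`
    under one learned dynamics model (hypothetical quadratic surrogate)."""
    return -(theta - target) ** 2


def model_return_grad(theta, target):
    """Gradient of the toy return with respect to theta."""
    return -2.0 * (theta - target)


def adapt(theta, target, alpha):
    """Adaptation step: one gradient-ascent step on a single model's return."""
    return theta + alpha * model_return_grad(theta, target)


def meta_gradient(theta, targets, alpha):
    """Gradient of the mean post-adaptation return across the ensemble.

    For this quadratic toy, d(adapted)/d(theta) = 1 - 2 * alpha, so the
    chain rule through the adaptation step has a closed form.
    """
    total = 0.0
    for target in targets:
        adapted = adapt(theta, target, alpha)
        total += (1.0 - 2.0 * alpha) * model_return_grad(adapted, target)
    return total / len(targets)


def meta_train(targets, alpha=0.1, beta=0.05, steps=500):
    """Meta-optimization loop: ascend the mean post-adaptation return."""
    theta = 0.0
    for _ in range(steps):
        theta += beta * meta_gradient(theta, targets, alpha)
    return theta


# Four "models" that disagree about where the best policy parameter lies.
targets = [0.0, 1.0, 1.0, 2.0]
theta_meta = meta_train(targets)
adapted = [adapt(theta_meta, t, 0.1) for t in targets]
```

In this toy the meta-parameter settles at the point the surrogate models agree on on average, and a single adaptation step then moves it toward each individual model's optimum. That mirrors, in a very reduced form, the split the abstract describes between consistent predictions (absorbed by the meta-policy) and model discrepancies (handled by the adaptation step).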

Code & Models

Repositories

ray-project/ray/tree/master/rllib

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics