Evaluating model-based planning and planner amortization for continuous control

Arunkumar Byravan; Leonard Hasenclever; Piotr Trochim; Mehdi Mirza; Alessandro Davide Ialongo; Yuval Tassa; Jost Tobias Springenberg; Abbas Abdolmaleki; Nicolas Heess; Josh Merel; Martin Riedmiller

arXiv:2110.03363·cs.RO·October 8, 2021·1 cites

TL;DR

This paper evaluates model predictive control (MPC) combined with learned models and policies on continuous control tasks. It reports improved performance and data efficiency in hard multi-task settings and shows that planning can be distilled into a policy.

Contribution

It introduces a hybrid approach combining MPC with learned models and policies, and shows how to distill planning into policies without performance loss.
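
To make the idea of a hybrid concrete, here is a minimal sketch of one way a policy can act as a proposal for a sampling-based planner. Everything in it is an assumption made for illustration: the toy point-mass `dynamics` stands in for a learned model, `policy` is a hand-written controller standing in for a learned policy, and the planner is plain random shooting rather than whatever planner the paper actually uses.

```python
import numpy as np

def dynamics(state, action):
    # Hypothetical stand-in for a learned model: 1-D point mass, state = [position, velocity].
    pos, vel = state[..., 0], state[..., 1]
    new_vel = vel + 0.1 * action
    new_pos = pos + 0.1 * new_vel
    return np.stack([new_pos, new_vel], axis=-1)

def reward(state, action):
    # Hypothetical reward: stay near the origin, move slowly, use small actions.
    return -(state[..., 0] ** 2) - 0.1 * state[..., 1] ** 2 - 0.01 * action ** 2

def policy(state):
    # Hypothetical stand-in for a learned policy: a weak PD controller.
    return -0.5 * state[..., 0] - 0.5 * state[..., 1]

def plan(state, rng, num_samples=64, horizon=10, noise_std=0.5):
    """Random-shooting MPC whose candidate actions are sampled around the policy's actions."""
    states = np.repeat(state[None, :], num_samples, axis=0)
    returns = np.zeros(num_samples)
    first_actions = None
    for t in range(horizon):
        noise = noise_std * rng.standard_normal(num_samples)
        noise[0] = 0.0  # candidate 0 follows the policy exactly
        actions = policy(states) + noise
        if t == 0:
            first_actions = actions
        returns += reward(states, actions)
        states = dynamics(states, actions)
    # Execute only the first action of the best candidate, then replan at the next step.
    return first_actions[np.argmax(returns)]

# Closed-loop usage: replan at every step from the current state.
rng = np.random.default_rng(0)
state = np.array([1.0, 0.0])
for _ in range(50):
    state = dynamics(state, plan(state, rng))
```

Because candidate 0 is the unperturbed policy rollout, the selected plan never scores worse than the policy under the model. With a poor model that guarantee says nothing about real-world return.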

Findings

01. MPC with learned proposals and models can improve performance and data efficiency in hard multi-task/multi-goal settings

02. Well-tuned model-free agents are strong baselines, even for high-DoF control

03. A model-based planner can be distilled into a policy that amortizes the planning computation

Abstract

There is a widespread intuition that model-based control methods should be able to surpass the data efficiency of model-free approaches. In this paper we attempt to evaluate this intuition on various challenging locomotion tasks. We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning; the learned policy serves as a proposal for MPC. We find that well-tuned model-free agents are strong baselines even for high DoF control problems but MPC with learned proposals and models (trained on the fly or transferred from related tasks) can significantly improve performance and data efficiency in hard multi-task/multi-goal settings. Finally, we show that it is possible to distil a model-based planner into a policy that amortizes the planning computation without any loss of performance. Videos of agents performing different tasks can be…
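
The distillation idea in the abstract can be illustrated with the simplest possible variant: record the actions a planner chooses and fit a policy to them by supervised regression. This is a sketch under stated assumptions, not the paper's procedure, which the abstract does not describe. The `planner` below is a hypothetical exhaustive one-step search on a toy point-mass model, and the distilled policy is linear.

```python
import numpy as np

# Hypothetical discretized action set searched by the toy planner.
CANDIDATES = np.linspace(-2.0, 2.0, 401)

def planner(state):
    # Hypothetical "expensive" planner: exhaustive one-step search on a toy point-mass model.
    pos, vel = state
    new_vel = vel + 0.1 * CANDIDATES
    new_pos = pos + 0.1 * new_vel
    cost = new_pos ** 2 + new_vel ** 2 + 0.01 * CANDIDATES ** 2
    return CANDIDATES[np.argmin(cost)]

# 1. Collect (state, planner action) pairs.
rng = np.random.default_rng(0)
states = rng.uniform(-0.1, 0.1, size=(500, 2))
targets = np.array([planner(s) for s in states])

# 2. Fit a linear policy (with bias) to the planner's actions by least squares.
features = np.hstack([states, np.ones((len(states), 1))])
weights, *_ = np.linalg.lstsq(features, targets, rcond=None)

def distilled_policy(state):
    # One dot product at action time instead of a search over candidates.
    return np.append(state, 1.0) @ weights

# 3. Compare the distilled policy against the planner on the training states.
errors = np.array([abs(distilled_policy(s) - planner(s)) for s in states])
max_error = errors.max()
```

On this toy problem the planner's optimum is linear in the state, so a linear policy can match it up to the resolution of the action grid. In realistic settings the policy would be a neural network and the match would have to be checked empirically.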

Videos

Evaluating Model-Based Planning and Planner Amortization for Continuous Control · SlidesLive

Taxonomy

Topics: AI-based Problem Solving and Planning · Robotic Path Planning Algorithms · Reinforcement Learning in Robotics