Exploring Model-based Planning with Policy Networks

Tingwu Wang; Jimmy Ba

arXiv:1906.08649 · cs.LG · June 21, 2019 · 76 cites

TL;DR

This paper introduces POPLIN, a model-based reinforcement learning algorithm that combines policy networks with online planning, achieving state-of-the-art sample efficiency in complex environments by optimizing action sequences and policy parameters.

Contribution

The paper proposes POPLIN, an algorithm that integrates policy networks with online planning. It shows that POPLIN outperforms existing methods and that planning in policy-parameter space yields a smoother optimization surface than planning in action space.

Findings

01

POPLIN is about 3x more sample efficient than PETS, TD3, and SAC.

02

Optimizing in policy-parameter space yields a smoother optimization surface than optimizing in action space, which makes planning easier.

03

Distilled policy networks can be used without model predictive control during testing.
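The parameter-space finding can be illustrated with a minimal sketch. It is not the authors' implementation: the point-mass `model_step`, the `reward`, the linear `policy_action`, and every hyperparameter are hypothetical stand-ins, and the search is a plain cross-entropy method (CEM) in which each candidate weight perturbation is held fixed over the whole planning horizon.

```python
import numpy as np

def model_step(state, action):
    # Toy stand-in for a learned dynamics model (assumption):
    # 1-D point mass with state = [position, velocity].
    pos, vel = state[:, 0], state[:, 1]
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return np.stack([pos, vel], axis=1)

def reward(state, action):
    # Hypothetical reward: stay near the origin, use small actions.
    return -(state[:, 0] ** 2) - 0.01 * action ** 2

def policy_action(state, weights):
    # Hypothetical linear policy, one weight vector per candidate.
    # state: (pop, 2), weights: (pop, 2) -> actions: (pop,)
    return np.tanh(np.sum(state * weights, axis=1))

def plan_in_parameter_space(state0, weights, horizon=10, pop=64,
                            n_elite=8, iters=5, seed=0):
    """CEM over additive noise on the policy weights."""
    rng = np.random.default_rng(seed)
    mean = np.zeros_like(weights)
    std = 0.3 * np.ones_like(weights)
    for _ in range(iters):
        # Sample weight perturbations from the current search distribution.
        noise = mean + std * rng.standard_normal((pop, weights.size))
        candidates = weights + noise
        # Roll every perturbed policy through the model.
        state = np.tile(state0, (pop, 1))
        returns = np.zeros(pop)
        for t in range(horizon):
            action = policy_action(state, candidates)
            returns += reward(state, action)
            state = model_step(state, action)
        # Refit the distribution to the highest-return perturbations.
        elite = noise[np.argsort(returns)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    planned_weights = weights + mean
    first_action = float(np.tanh(state0 @ planned_weights))
    return first_action, planned_weights

# Usage: plan from position 1.0 at rest, starting from an all-zero policy.
action, planned = plan_in_parameter_space(np.array([1.0, 0.0]), np.zeros(2))
print(action, planned)
```

In a model-predictive control loop, only `first_action` would be executed before replanning from the next observed state.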

Abstract

Model-based reinforcement learning (MBRL) with model-predictive control or online planning has shown great potential for locomotion control tasks in terms of both sample efficiency and asymptotic performance. Despite their initial successes, the existing planning methods search from candidate sequences randomly generated in the action space, which is inefficient in complex high-dimensional environments. In this paper, we propose a novel MBRL algorithm, model-based policy planning (POPLIN), that combines policy networks with online planning. More specifically, we formulate action planning at each time-step as an optimization problem using neural networks. We experiment with both optimization w.r.t. the action sequences initialized from the policy network, and also online optimization directly w.r.t. the parameters of the policy network. We show that POPLIN obtains state-of-the-art…
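The first variant the abstract mentions, optimizing an action sequence initialized from the policy network, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the point-mass `dynamics`, the `reward`, the linear `policy`, and all hyperparameters are hypothetical, and a plain cross-entropy method (CEM) stands in for the optimizer.

```python
import numpy as np

def dynamics(state, action):
    # Toy stand-in for a learned dynamics model (assumption):
    # 1-D point mass with state = [position, velocity].
    pos, vel = state[..., 0], state[..., 1]
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return np.stack([pos, vel], axis=-1)

def reward(state, action):
    # Hypothetical reward: stay near the origin, use small actions.
    return -(state[..., 0] ** 2) - 0.01 * action ** 2

def policy(state, weights):
    # Hypothetical linear policy with a tanh squashing.
    return np.tanh(state @ weights)

def evaluate(state0, actions):
    # actions: (pop, horizon) -> model-predicted return per candidate.
    pop, horizon = actions.shape
    state = np.tile(state0, (pop, 1))
    total = np.zeros(pop)
    for t in range(horizon):
        total += reward(state, actions[:, t])
        state = dynamics(state, actions[:, t])
    return total

def plan_action_space(state0, weights, horizon=10, pop=64,
                      n_elite=8, iters=5, seed=0):
    """CEM over additive noise on a policy-proposed action sequence."""
    rng = np.random.default_rng(seed)
    # Step 1: roll the policy through the model to get the initial sequence.
    init_actions = np.zeros(horizon)
    state = state0.copy()
    for t in range(horizon):
        init_actions[t] = policy(state, weights)
        state = dynamics(state, init_actions[t])
    # Step 2: search around that sequence instead of around random actions.
    mean = np.zeros(horizon)
    std = 0.5 * np.ones(horizon)
    for _ in range(iters):
        noise = mean + std * rng.standard_normal((pop, horizon))
        candidates = np.clip(init_actions + noise, -1.0, 1.0)
        returns = evaluate(state0, candidates)
        elite = noise[np.argsort(returns)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return np.clip(init_actions + mean, -1.0, 1.0)

# Usage: plan from position 1.0 at rest, starting from an all-zero policy.
plan = plan_action_space(np.array([1.0, 0.0]), np.zeros(2))
print(plan[0])  # first action, the one a control loop would execute
```

The only difference from purely random-shooting planners in this sketch is Step 1: candidates are sampled around the policy's proposal, so a better policy narrows the search.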

Code & Models

Repositories

WilsonWangTHU/POPLIN
tf · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control · Real-time simulation and control systems

Methods: Experience Replay · Dense Connections · Target Policy Smoothing · Clipped Double Q-learning · Adam · Twin Delayed Deep Deterministic