Policy Prediction Network: Model-Free Behavior Policy with Model-Based Learning in Continuous Action Space

Zac Wellmer; James Kwok

arXiv:1909.07373·cs.LG·September 18, 2019

TL;DR

This paper introduces a novel deep reinforcement learning architecture called Policy Prediction Network that combines model-free and model-based methods for continuous control, improving sample efficiency without increasing rollout computation.

Contribution

Policy Prediction Network is presented as the first approach to bring implicit model-based learning to Policy Gradient algorithms for continuous action spaces, made possible by an empirically justified clipping scheme.

Findings

01 Improved sample complexity in MuJoCo environments

02 Enhanced performance over traditional methods

03 Effective integration of model-based learning in policy gradients

Abstract

This paper proposes a novel deep reinforcement learning architecture inspired by previous tree-structured architectures, which were only usable in discrete action spaces. Policy Prediction Network offers a way to improve sample complexity and performance on continuous control problems in exchange for extra computation at training time, with no additional computation at rollout time. Our approach integrates model-free and model-based reinforcement learning. Policy Prediction Network is the first to introduce implicit model-based learning to Policy Gradient algorithms for continuous action spaces, which is made possible by an empirically justified clipping scheme. Our experiments focus on the MuJoCo environments so that results can be compared with similar work in this area.
