Policy Prediction Network: Model-Free Behavior Policy with Model-Based Learning in Continuous Action Space
Zac Wellmer, James Kwok

TL;DR
This paper introduces a novel deep reinforcement learning architecture called Policy Prediction Network that combines model-free and model-based methods for continuous control, improving sample efficiency without increasing rollout computation.
Contribution
Policy Prediction Network is the first method to incorporate implicit model-based learning into policy gradient algorithms for continuous action spaces, made possible by an empirically justified clipping scheme.
Findings
Improved sample complexity in MuJoCo environments
Enhanced performance over traditional methods
Effective integration of model-based learning in policy gradients
Abstract
This paper proposes a novel deep reinforcement learning architecture inspired by previous tree-structured architectures, which were usable only in discrete action spaces. Policy Prediction Network improves sample complexity and performance on continuous control problems in exchange for extra computation at training time, with no added computation at rollout time. Our approach combines model-free and model-based reinforcement learning. Policy Prediction Network is the first to introduce implicit model-based learning to policy gradient algorithms for continuous action spaces, and it is made possible by an empirically justified clipping scheme. Our experiments focus on the MuJoCo environments so that results can be compared with similar work in this area.
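The abstract does not spell out the clipping scheme, so the sketch below is only a generic illustration of how clipping is commonly used in policy gradient methods for continuous actions: a PPO-style clipped surrogate objective with a Gaussian policy. It is an assumption that this resembles the paper's scheme; the function names `gaussian_logp` and `clipped_surrogate` and the default `clip_eps=0.2` are hypothetical and not taken from the paper.

```python
import math

def gaussian_logp(action, mean, std):
    """Log-density of a scalar action under a Gaussian policy N(mean, std^2)."""
    return (-0.5 * ((action - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))

def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO-style clipped surrogate objective (to be maximized).

    Assumption: this is the generic PPO clipping rule, shown for illustration;
    it is not claimed to be the exact scheme used by Policy Prediction Network.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Example: the new policy's mean moved toward an action with positive advantage.
old_lp = gaussian_logp(0.5, mean=0.0, std=1.0)
new_lp = gaussian_logp(0.5, mean=0.4, std=1.0)
objective = clipped_surrogate([new_lp], [old_lp], [1.0])
```

When the probability ratio leaves the interval [1 - clip_eps, 1 + clip_eps] in the direction that would increase the objective, the clipped term takes over and the incentive to move the policy further disappears, which is what keeps each update conservative.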
