Model-Augmented Actor-Critic: Backpropagating through Paths

Ignasi Clavera; Violet Fu; Pieter Abbeel

arXiv:2005.08068 · cs.LG · May 19, 2020 · 40 citations

TL;DR

This paper introduces a differentiable model-based reinforcement learning method that leverages pathwise derivatives for policy optimization, improving sample efficiency and scalability to long horizons.

Contribution

The paper presents an actor-critic algorithm that backpropagates through the learned model and policy across future timesteps, using a terminal value function to keep learning stable over many timesteps.

Findings

01. Consistently more sample efficient than existing state-of-the-art model-based algorithms

02. Matches the asymptotic performance of model-free algorithms

03. Scales effectively to long horizons

Abstract

Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator to augment the data for policy optimization or value function learning. In this paper, we show how to make more effective use of the model by exploiting its differentiability. We construct a policy optimization algorithm that uses the pathwise derivative of the learned model and policy across future timesteps. Instabilities of learning across many timesteps are prevented by using a terminal value function, learning the policy in an actor-critic fashion. Furthermore, we present a derivation on the monotonic improvement of our objective in terms of the gradient error in the model and value function. We show that our approach (i) is consistently more sample efficient than existing state-of-the-art model-based algorithms, (ii) matches the asymptotic performance of model-free…
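The pathwise-derivative idea in the abstract can be illustrated with a small JAX sketch. It unrolls a differentiable model and policy for a few steps, adds a discounted terminal value estimate, and differentiates the whole sum with respect to the policy parameters. Everything here is an assumption made for illustration and is not taken from the paper: the functions `policy`, `model`, `reward`, and `terminal_value` are hypothetical toy stand-ins for learned networks, the policy is deterministic for simplicity, and `GAMMA` and `HORIZON` are arbitrary values.

```python
import jax
import jax.numpy as jnp

GAMMA = 0.99   # discount factor (assumed value)
HORIZON = 5    # number of model steps to unroll (assumed value)

def policy(theta, s):
    # Hypothetical deterministic policy: linear map followed by tanh.
    return jnp.tanh(theta @ s)

def model(s, a):
    # Hypothetical stand-in for a learned, differentiable dynamics model.
    return 0.9 * s + 0.1 * jnp.concatenate([a, a])

def reward(s, a):
    # Hypothetical quadratic reward: stay near the origin, use small actions.
    return -jnp.sum(s ** 2) - 0.01 * jnp.sum(a ** 2)

def terminal_value(s, a):
    # Hypothetical stand-in for a learned critic Q(s, a).
    return -jnp.sum(s ** 2) - 0.01 * jnp.sum(a ** 2)

def objective(theta, s0):
    # Discounted rewards along the model rollout plus a terminal value.
    s, total, discount = s0, 0.0, 1.0
    for _ in range(HORIZON):
        a = policy(theta, s)
        total = total + discount * reward(s, a)
        s = model(s, a)
        discount *= GAMMA
    return total + discount * terminal_value(s, policy(theta, s))

# jax.grad differentiates through every policy and model call in the rollout,
# which is the pathwise derivative with respect to the policy parameters.
grad_fn = jax.jit(jax.grad(objective))

theta = 0.1 * jnp.ones((2, 4))            # 2 action dims, 4 state dims
s0 = jnp.array([1.0, -0.5, 0.3, 0.2])
g = grad_fn(theta, s0)
theta_new = theta + 1e-2 * g              # one gradient-ascent step
```

In this sketch the terminal value bounds how far gradients have to travel: only `HORIZON` model steps are unrolled, and the critic summarizes everything beyond that point.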

Taxonomy

Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Model-Driven Software Engineering Techniques