Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

Shixiang Gu; Timothy Lillicrap; Zoubin Ghahramani; Richard E. Turner; Sergey Levine

arXiv:1611.02247 · cs.LG · March 1, 2017 · 100 citations

TL;DR

Q-Prop introduces a novel policy gradient method that combines the stability of on-policy algorithms with the sample efficiency of off-policy methods by using a Taylor expansion of the critic as a control variate.
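As a sketch of what "a Taylor expansion of the critic as a control variate" means, here is the estimator as we understand the Q-Prop construction. The notation ($Q_w$ for the off-policy critic, $\mu_\theta(s)$ for the policy's mean action, $\hat{A}$ for an on-policy advantage estimate such as GAE, $\rho_\pi$ for the on-policy state distribution) is ours and should be checked against the paper.

```latex
\begin{align*}
\bar{A}_w(s_t,a_t) &= \nabla_a Q_w(s_t,a)\big|_{a=\mu_\theta(s_t)}\,\big(a_t - \mu_\theta(s_t)\big) \\
\nabla_\theta J(\theta) &= \mathbb{E}_{\rho_\pi,\pi}\Big[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\big(\hat{A}(s_t,a_t) - \bar{A}_w(s_t,a_t)\big)\Big] \\
&\quad + \mathbb{E}_{\rho_\pi}\Big[\nabla_a Q_w(s_t,a)\big|_{a=\mu_\theta(s_t)}\,\nabla_\theta \mu_\theta(s_t)\Big]
\end{align*}
```

The first expectation is a likelihood-ratio gradient applied only to the part of the advantage that the linearized critic fails to explain; the second is a DDPG-style gradient through the critic. Holding the expansion point fixed while differentiating, $\mathbb{E}_{\pi}[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bar{A}_w(s_t,a_t)]$ equals the second term exactly, because $\bar{A}_w$ is linear in $a_t$ and $\mathbb{E}_\pi[a_t] = \mu_\theta(s_t)$. Subtracting one and adding the other therefore leaves the gradient unbiased even when $Q_w$ is inaccurate.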

Contribution

The paper proposes Q-Prop, a new policy gradient algorithm that leverages off-policy critics to improve sample efficiency while maintaining stability, bridging the gap between on-policy and off-policy RL.

Findings

1. Q-Prop outperforms TRPO with GAE in sample efficiency.
2. Q-Prop shows improved stability over DDPG.
3. Q-Prop achieves better performance on MuJoCo environments.

Abstract

Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is their high sample complexity. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often requires large batches. TD-style methods, such as off-policy actor-critic and Q-learning, are more sample-efficient but biased, and often require costly hyperparameter sweeps to stabilize. In this work, we aim to develop methods that combine the stability of policy gradients with the efficiency of off-policy RL. We present Q-Prop, a policy gradient method that uses a Taylor expansion of the off-policy critic as a control variate. Q-Prop is both sample efficient and stable, and effectively combines the benefits of on-policy and off-policy methods. We analyze the connection between Q-Prop…
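To make the control-variate idea concrete, the sketch below compares a plain likelihood-ratio (REINFORCE-style) gradient estimate with a Q-Prop-style one on a hypothetical one-state problem: a Gaussian policy N(theta, 1), a true action value Q(a) = -(a - 2)^2, an exactly known baseline, and a deliberately wrong critic Q_w(a) = -(a - 2.5)^2. None of these choices come from the paper, and a single-sample advantage stands in for GAE; the only point is that subtracting the linearized critic and adding back its analytic gradient keeps the estimate unbiased while lowering its variance.

```python
import random
import statistics

# Hypothetical toy problem (not from the paper): one state, Gaussian policy
# pi_theta(a) = N(theta, SIGMA^2), true action value Q(a) = -(a - TARGET)^2.
SIGMA = 1.0
TARGET = 2.0
CRITIC_TARGET = 2.5  # deliberately wrong critic Q_w(a) = -(a - 2.5)^2


def true_q(a: float) -> float:
    """Exact action value of the toy problem (stands in for sampled returns)."""
    return -(a - TARGET) ** 2


def critic_grad(a: float) -> float:
    """dQ_w/da for the biased critic; used for the first-order Taylor expansion."""
    return -2.0 * (a - CRITIC_TARGET)


def estimate_gradients(theta: float, n: int, seed: int = 0) -> tuple[list[float], list[float]]:
    """Return per-sample plain likelihood-ratio and Q-Prop-style estimates of dJ/dtheta."""
    rng = random.Random(seed)
    mu = theta  # mean action mu_theta; here d(mu)/d(theta) = 1
    # Exact state-value baseline V = E[Q(a)] = -((theta - TARGET)^2 + SIGMA^2).
    baseline = -((mu - TARGET) ** 2 + SIGMA ** 2)
    dq_mu = critic_grad(mu)  # critic slope at the mean action (expansion point)
    plain, qprop = [], []
    for _ in range(n):
        a = rng.gauss(mu, SIGMA)
        score = (a - mu) / SIGMA ** 2   # d/dtheta log pi_theta(a)
        adv = true_q(a) - baseline      # single-sample advantage estimate
        taylor_adv = dq_mu * (a - mu)   # control variate: linearized critic advantage
        plain.append(score * adv)
        # Likelihood-ratio term on the residual plus the analytic critic term
        # dQ_w/da * d(mu)/d(theta), which adds back what the control variate removed.
        qprop.append(score * (adv - taylor_adv) + dq_mu * 1.0)
    return plain, qprop


if __name__ == "__main__":
    theta = 0.0
    true_grad = -2.0 * (theta - TARGET)  # analytic dJ/dtheta for the toy problem
    plain, qprop = estimate_gradients(theta, n=200_000)
    print(f"true gradient:   {true_grad:.3f}")
    print(f"plain  mean/var: {statistics.fmean(plain):.3f} / {statistics.pvariance(plain):.1f}")
    print(f"Q-Prop mean/var: {statistics.fmean(qprop):.3f} / {statistics.pvariance(qprop):.1f}")
```

At theta = 0 the true gradient is 4. Working the Gaussian moments out by hand, the per-sample variance is 42 for the plain estimator and 12 for the Q-Prop-style one, so both sample means should sit near 4 while the second varies far less. The bias in the critic only changes how much variance is removed, not the mean.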

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Neural Network Applications