Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

David Balduzzi; Muhammad Ghifary

arXiv:1509.03005 · cs.LG · September 11, 2015 · 23 cites

TL;DR

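GProp is a deep reinforcement learning algorithm for continuous policies. It learns the gradient of the value function with a temporal-difference method and uses the deviator-actor-critic (DAC) model, whose three networks estimate the value function, estimate its gradient, and determine the policy. It is competitive with fully supervised methods on a contextual bandit task and achieves the best reported results on the octopus arm benchmark.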

Contribution

The paper presents GProp, which combines a temporal-difference (TD) method for learning the gradient of the value function with the deviator-actor-critic (DAC) model for learning continuous policies.
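The summary does not spell out the TD update for the gradient. The derivation below is a minimal sketch of one first-order reading of the idea, not the paper's exact rule. The symbols are assumed notation: actor $\mu(s)$, critic $Q(s,a)$, deviator $G(s)$, and zero-mean exploration noise $\epsilon$ added to the actor's action. If $Q$ satisfies the Bellman equation for $\mu$, the expected TD error at the perturbed action is approximately the inner product of the action-gradient with $\epsilon$. Regressing $\delta$ on $\epsilon$ therefore recovers that gradient.

```latex
% Assumed notation (not quoted from the paper): executed action a = \mu(s) + \epsilon,
% deviator G(s) intended to approximate \nabla_a Q(s,a) at a = \mu(s).
\begin{align*}
\delta &= r + \gamma\, Q\big(s', \mu(s')\big) - Q\big(s, \mu(s)\big), \\
\mathbb{E}[\delta \mid s, \epsilon] &= Q\big(s, \mu(s) + \epsilon\big) - Q\big(s, \mu(s)\big)
  \approx \big\langle \nabla_a Q(s,a)\big|_{a=\mu(s)},\ \epsilon \big\rangle, \\
G &\in \arg\min_{G}\ \mathbb{E}\Big[\big(\delta - \langle G(s), \epsilon \rangle\big)^2\Big].
\end{align*}
```

Because $Q(s, \mu(s))$ does not depend on $\epsilon$, it drops out of $\mathbb{E}[\epsilon\,\delta \mid s]$. The critic therefore serves as a variance-reducing baseline for the gradient fit rather than a source of bias.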

Findings

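1. GProp is competitive with fully supervised methods on a contextual bandit task constructed from nonparametric regression datasets.
2. GProp achieves the best performance reported to date on the octopus arm benchmark.
3. The bandit task is designed to probe how accurately an algorithm estimates gradients, so GProp's results on it indicate that it estimates value-function gradients accurately.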
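Each regression example has a known target, so a bandit built from it exposes the exact reward gradient. That property is what makes such a task useful for checking gradient estimates. The sketch below assumes the reward is the negative squared error between action and target; the abstract does not state the reward. It scores a simple perturbation-based gradient estimate against the exact gradient. The helpers `make_regression_bandit` and `perturbation_gradient` and the two-example dataset are hypothetical.

```python
import random


def make_regression_bandit(pairs):
    """Wrap (context, target) regression pairs as a contextual bandit.

    Hypothetical construction: the reward for action a on example i is
    -(a - y_i)**2, so the exact reward gradient -2 * (a - y_i) is known
    and a learned gradient estimate can be scored against it.
    """
    def reward(index, action):
        _, target = pairs[index]
        return -(action - target) ** 2

    def true_gradient(index, action):
        _, target = pairs[index]
        return -2.0 * (action - target)

    return reward, true_gradient


def perturbation_gradient(reward_fn, action, sigma=0.1, samples=2000, seed=0):
    """Estimate d reward / d action by regressing reward changes on Gaussian noise."""
    rng = random.Random(seed)
    base = reward_fn(action)
    num = den = 0.0
    for _ in range(samples):
        eps = rng.gauss(0.0, sigma)
        num += eps * (reward_fn(action + eps) - base)
        den += eps * eps
    return num / den  # least-squares slope through the origin


pairs = [([0.5], 1.0), ([1.5], -2.0)]  # hypothetical (context, target) data
reward, true_gradient = make_regression_bandit(pairs)
estimate = perturbation_gradient(lambda a: reward(0, a), action=0.0)
print(estimate, true_gradient(0, 0.0))  # the two values should be close
```

With a quadratic reward, the estimate differs from the exact gradient only by a zero-mean third-moment term in the noise. That term shrinks as the sample count grows.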

Abstract

This paper proposes GProp, a deep reinforcement learning algorithm for continuous policies with compatible function approximation. The algorithm is based on two innovations. First, we present a temporal-difference based method for learning the gradient of the value function. Second, we present the deviator-actor-critic (DAC) model, which comprises three neural networks that estimate the value function, estimate its gradient, and determine the actor's policy, respectively. We evaluate GProp on two challenging tasks: a contextual bandit problem constructed from nonparametric regression datasets that is designed to probe the ability of reinforcement learning algorithms to accurately estimate gradients; and the octopus arm, a challenging reinforcement learning benchmark. GProp is competitive with fully supervised methods on the bandit task and achieves the best performance to date on the octopus…
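The following toy deviator-actor-critic loop in pure Python makes the three-part structure concrete. It is a sketch under stated assumptions, not the paper's implementation:

- Linear models on hand-picked features stand in for the three neural networks.
- The task is a one-step bandit (gamma = 0) with a hypothetical target y = 1 + 2x and reward -(a - y)**2.
- The deviator is fitted by regressing the TD error on the exploration noise.

The function name `train_dac_sketch` and all hyperparameters are illustrative.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def train_dac_sketch(steps=20000, sigma=0.5, lr_critic=0.05,
                     lr_deviator=0.1, lr_actor=0.002, seed=0):
    """Toy deviator-actor-critic loop on a one-step contextual bandit.

    Hypothetical setup (not from the paper): context x ~ U[-1, 1], hidden
    target y = 1 + 2x, reward -(a - y)**2, discount gamma = 0. Linear models
    on hand-picked features stand in for the three neural networks.
    """
    rng = random.Random(seed)
    critic = [0.0, 0.0, 0.0]  # value estimate, features [1, x, x^2]
    deviator = [0.0, 0.0]     # estimate of dQ/da at a = mu(x), features [1, x]
    actor = [0.0, 0.0]        # deterministic policy mu(x) = actor . [1, x]
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        y = 1.0 + 2.0 * x
        phi, psi = [1.0, x, x * x], [1.0, x]
        eps = rng.gauss(0.0, sigma)          # exploration noise on the action
        reward = -(dot(actor, psi) + eps - y) ** 2
        delta = reward - dot(critic, phi)    # TD error; no bootstrap term since gamma = 0
        grad_est = dot(deviator, psi)
        residual = delta - grad_est * eps    # first-order fit: delta ~ grad * eps
        for i in range(3):
            critic[i] += lr_critic * delta * phi[i]
        for i in range(2):
            deviator[i] += lr_deviator * residual * eps * psi[i]
            actor[i] += lr_actor * grad_est * psi[i]  # dQ/da times dmu/dtheta_i
    return actor, deviator, critic


actor, deviator, critic = train_dac_sketch()
print(actor)  # expected to end near [1.0, 2.0], the coefficients of y = 1 + 2x
```

The deviator learns faster than the actor, so the actor follows an up-to-date gradient estimate through the chain rule. The critic's output does not depend on the noise, so it only reduces the variance of the deviator's regression target.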

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research