Reinforcement Learning by Value Gradients
Michael Fairbank

TL;DR
This paper introduces value-gradients in reinforcement learning and argues that learning them finds locally optimal trajectories without stochastic exploration, and far more efficiently than learning values alone. The argument is supported by theoretical analysis and experiments.
Contribution
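It proposes learning value-gradients as the actual objective of any value-function learning algorithm for control problems, argues that this is more efficient than learning values alone, and proves an equivalence between a value-gradient learning algorithm and a policy-gradient learning algorithm.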
Findings
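In experiments across several problem domains, learning value-gradients is several orders of magnitude more efficient than learning values alone.
Learning value-gradients removes the need for exploration or stochastic behaviour when finding locally optimal trajectories.
An equivalence between a value-gradient learning algorithm and a policy-gradient learning algorithm is proven.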
Abstract
The concept of the value-gradient is introduced and developed in the context of reinforcement learning. It is shown that, by learning the value-gradients, exploration or stochastic behaviour is no longer needed to find locally optimal trajectories. This is the main motivation for using value-gradients, and it is argued that learning value-gradients is the actual objective of any value-function learning algorithm for control problems. It is also argued that learning value-gradients is significantly more efficient than learning just the values, and this argument is supported in experiments by efficiency gains of several orders of magnitude, in several problem domains. Once value-gradients are introduced into learning, several analyses become possible. For example, a surprising equivalence between a value-gradient learning algorithm and a policy-gradient learning algorithm is proven, and…
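To make the term concrete, here is a minimal sketch of what a value-gradient is: the derivative of the total reward along a trajectory with respect to the starting state. Everything in it is an illustrative assumption rather than a setup taken from the paper: a scalar deterministic system `x' = x + a`, a reward `-(x^2 + a^2)`, a fixed linear policy `a = -k*x`, a finite horizon with no discounting, and the hypothetical function names `rollout`, `value_gradient`, and `finite_difference`. The gradient is computed by a backward chain-rule recursion along the trajectory and compared with a numerical derivative of the rollout return.

```python
def rollout(x0, k=0.5, steps=10):
    """Simulate the assumed toy system and return (states, total reward)."""
    xs, total, x = [x0], 0.0, x0
    for _ in range(steps):
        a = -k * x                 # assumed fixed linear policy
        total += -(x * x + a * a)  # assumed quadratic reward
        x = x + a                  # assumed deterministic model
        xs.append(x)
    return xs, total


def value_gradient(x0, k=0.5, steps=10):
    """Backward recursion for dV/dx at the first state of the trajectory."""
    xs, _ = rollout(x0, k, steps)
    g = 0.0  # no reward is collected after the final step
    for x in reversed(xs[:-1]):
        a = -k * x
        dr_dx, dr_da = -2.0 * x, -2.0 * a  # partial derivatives of the reward
        df_dx, df_da = 1.0, 1.0            # partial derivatives of the model
        dpi_dx = -k                        # derivative of the policy
        # Chain rule through reward, policy, and model:
        g = dr_dx + dpi_dx * dr_da + (df_dx + dpi_dx * df_da) * g
    return g


def finite_difference(x0, k=0.5, steps=10, eps=1e-5):
    """Central-difference estimate of dV/dx, used only as a check."""
    _, up = rollout(x0 + eps, k, steps)
    _, down = rollout(x0 - eps, k, steps)
    return (up - down) / (2.0 * eps)


if __name__ == "__main__":
    print(value_gradient(1.0), finite_difference(1.0))
```

The two printed numbers should agree closely, which shows that the recursion yields the slope of the value with respect to the state. A value-gradient learning method would train a function approximator to output this quantity along trajectories; that training step is not shown here.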
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Adaptive Dynamic Programming Control
