Meta-Gradient Reinforcement Learning
Zhongwen Xu, Hado van Hasselt, David Silver

TL;DR
This paper introduces a meta-gradient reinforcement learning algorithm that dynamically adapts the return function online, leading to improved performance across diverse Atari 2600 games.
Contribution
The paper presents a meta-gradient method that tunes the parameters of the return, such as the discount factor and the bootstrapping parameter, online while the agent interacts with the environment.
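The idea of differentiating through a learning update to adjust a return parameter can be illustrated on a toy scalar problem. The sketch below is not the authors' implementation. It assumes a single value estimate `theta`, a one-step target `r + gamma * v_next` in which `v_next` is treated as a constant, and a squared-error meta-objective evaluated on a second sample with a fixed reference discount. The function name `meta_gradient_step`, the step sizes `alpha` and `beta`, the reference discount `gamma_ref`, and the clamping of `gamma` to [0, 1] are all hypothetical choices made for illustration.

```python
def meta_gradient_step(theta, gamma, sample, val_sample,
                       alpha=0.1, beta=0.01, gamma_ref=0.99):
    """One inner update of theta and one meta-update of gamma (toy scalar case)."""
    r, v_next = sample

    # Inner update: move the value estimate toward the target r + gamma * v_next.
    target = r + gamma * v_next
    theta_new = theta + alpha * (target - theta)

    # Sensitivity of the updated estimate to gamma: d(theta_new) / d(gamma).
    # v_next is treated as a constant here (an assumption of this sketch).
    dtheta_dgamma = alpha * v_next

    # Meta-objective on a second sample with a fixed reference discount:
    #   J' = 0.5 * (r' + gamma_ref * v' - theta_new) ** 2
    r_val, v_val = val_sample
    val_error = (r_val + gamma_ref * v_val) - theta_new
    dJ_dtheta = -val_error

    # Chain rule: dJ'/d(gamma) = dJ'/d(theta_new) * d(theta_new)/d(gamma).
    meta_grad = dJ_dtheta * dtheta_dgamma

    # Gradient descent on gamma, clamped to [0, 1] (clamping is an assumption).
    gamma_new = min(1.0, max(0.0, gamma - beta * meta_grad))
    return theta_new, gamma_new, meta_grad
```

Calling `meta_gradient_step(0.0, 0.5, (1.0, 2.0), (1.0, 2.0))` performs one inner update and one meta-update. In this toy setting the meta-gradient is negative, so `gamma` is nudged upward toward the reference discount.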
Findings
Achieved state-of-the-art performance, at the time of publication, across 57 Atari 2600 games.
Demonstrated effective online adaptation of return functions.
Improved learning efficiency over traditional fixed-return methods.
Abstract
The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function. This proxy is typically based on a sampled and bootstrapped approximation to the true value function, known as a return. The particular choice of return is one of the chief components determining the nature of the algorithm: the rate at which future rewards are discounted; when and how values should be bootstrapped; or even the nature of the rewards themselves. It is well-known that these decisions are crucial to the overall success of RL algorithms. We discuss a gradient-based meta-learning algorithm that is able to adapt the nature of the return, online, whilst…
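The abstract describes a return as shaped by how rewards are discounted and how values are bootstrapped. One common return with exactly these two knobs is the λ-return. The following is a minimal sketch, assuming the standard recursive definition G_t = r_{t+1} + γ((1 − λ) v(s_{t+1}) + λ G_{t+1}) with full bootstrapping at the end of the segment. The function name `lambda_return` and its argument layout are illustrative and not taken from the paper.

```python
def lambda_return(rewards, next_values, gamma, lam):
    """Compute lambda-returns for one trajectory segment.

    rewards[t]     : reward received after step t
    next_values[t] : value estimate of the state reached after step t
    gamma, lam     : discount factor and bootstrapping parameter
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have equal length")

    returns = [0.0] * len(rewards)
    # Beyond the end of the segment, bootstrap fully from the last value estimate.
    g_next = next_values[-1] if next_values else 0.0

    # Work backwards so each return can reuse the one that follows it.
    for t in reversed(range(len(rewards))):
        bootstrap = (1.0 - lam) * next_values[t] + lam * g_next
        returns[t] = rewards[t] + gamma * bootstrap
        g_next = returns[t]
    return returns
```

With `lam=0` each entry reduces to the one-step target `r + gamma * v`. With `lam=1` it becomes the discounted sum of rewards, bootstrapped only at the end of the segment. Changing `gamma` or `lam` therefore changes the learning target itself, which is why these choices matter so much.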
Taxonomy
Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Data Stream Mining Techniques
