Meta-Gradient Reinforcement Learning with an Objective Discovered Online
Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver

TL;DR
This paper introduces a meta-gradient reinforcement learning algorithm that discovers and adapts its own learning objective online, and that outperforms a strong actor-critic baseline on Atari games.
Contribution
It presents a novel meta-gradient method that learns its own objective function from experience, enabling adaptive and more effective reinforcement learning.
Findings
The algorithm discovers objectives addressing bootstrapping, non-stationarity, and off-policy issues.
It outperforms a strong actor-critic baseline on Atari games.
The method adapts over time to improve learning efficiency.
Abstract
Deep reinforcement learning includes a broad family of algorithms that parameterise an internal representation, such as a value function or policy, by a deep neural network. Each algorithm optimises its parameters with respect to an objective, such as Q-learning or policy gradient, that defines its semantics. In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment. Over time, this allows the agent to learn how to learn increasingly effectively. Furthermore, because the objective is discovered online, it can adapt to changes over time. We demonstrate that the algorithm discovers how to address several important issues in RL, such as bootstrapping, non-stationarity, and off-policy learning. On the Atari Learning Environment, the…
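The meta-gradient idea in the abstract can be illustrated with a toy scalar sketch. This is not the paper's algorithm: the deep networks are replaced by two scalars, and every name here (`run_meta_gradient`, `theta`, `eta`, `true_target`, the learning rates) is a hypothetical choice made for illustration. The agent parameter `theta` takes a gradient step on an inner objective whose target `eta` is itself learned, and `eta` is then updated by differentiating an outer loss through that inner step.

```python
def run_meta_gradient(true_target=3.0, inner_lr=0.5, meta_lr=0.5, steps=200):
    """Toy scalar meta-gradient loop (illustrative assumption, not the paper's method)."""
    theta = 0.0  # agent parameter, trained on the learned (inner) objective
    eta = 0.0    # meta-parameter: the target that defines the inner objective

    for _ in range(steps):
        # Inner step: one gradient-descent update on 0.5 * (theta - eta)^2.
        theta_new = theta - inner_lr * (theta - eta)

        # Outer loss: 0.5 * (theta_new - true_target)^2, measured after the update.
        d_outer_d_theta_new = theta_new - true_target

        # The updated parameter depends on eta through the inner step:
        # d(theta_new)/d(eta) = inner_lr.
        d_theta_new_d_eta = inner_lr

        # Chain rule gives the meta-gradient of the outer loss w.r.t. eta.
        meta_grad = d_outer_d_theta_new * d_theta_new_d_eta

        # Meta-update of the learned target, then commit the inner update.
        eta -= meta_lr * meta_grad
        theta = theta_new

    return theta, eta


if __name__ == "__main__":
    theta, eta = run_meta_gradient()
    print(f"theta={theta:.6f}, eta={eta:.6f}")
```

With these step sizes both `theta` and `eta` settle at `true_target`: the learned target ends up pointing the inner update at whatever minimises the outer loss. In the setting the abstract describes, the objective is parameterised by a deep neural network and the feedback comes from interaction with the environment rather than from a known target.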
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks
Methods: Q-Learning
