Meta-Gradient Reinforcement Learning

Zhongwen Xu; Hado van Hasselt; David Silver

arXiv:1805.09801 · cs.LG · May 25, 2018 · 96 cites

TL;DR

This paper introduces a meta-gradient reinforcement learning algorithm that dynamically adapts the return function online, leading to improved performance across diverse Atari 2600 games.

Contribution

The paper presents a novel meta-gradient method that optimizes the return parameters in reinforcement learning during interaction with the environment.
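As a rough illustration of what optimizing a return parameter by gradient descent can look like, the sketch below computes a meta-gradient with respect to the discount factor for a single linear TD(0) update. It is a toy under stated assumptions, not the paper's implementation: the function names (`td_update`, `meta_loss`, `meta_gradient_gamma`), the linear value function, the squared-error meta-objective, and the fixed validation target are all hypothetical choices made for illustration, and the sketch ignores any dependence of the current parameters on earlier values of the discount.

```python
import numpy as np

def td_update(theta, phi, reward, phi_next, gamma, alpha):
    """One semi-gradient TD(0) step towards the target r + gamma * v(s')."""
    target = reward + gamma * (theta @ phi_next)  # bootstrapped target, held fixed
    return theta + alpha * (target - theta @ phi) * phi

def meta_loss(theta_new, phi_val, target_val):
    """Squared error of the updated value estimate on a held-out sample."""
    return (target_val - theta_new @ phi_val) ** 2

def meta_gradient_gamma(theta, phi, reward, phi_next, gamma, alpha,
                        phi_val, target_val):
    """Chain rule: d(meta_loss)/d(gamma) = d(meta_loss)/d(theta') . d(theta')/d(gamma)."""
    theta_new = td_update(theta, phi, reward, phi_next, gamma, alpha)
    # The update is linear in gamma, so its derivative is alpha * v(s') * phi.
    dtheta_dgamma = alpha * (theta @ phi_next) * phi
    dloss_dtheta = -2.0 * (target_val - theta_new @ phi_val) * phi_val
    return dloss_dtheta @ dtheta_dgamma

# Hypothetical toy values, chosen only to exercise the functions.
theta = np.array([0.5, -0.2])
phi, phi_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
phi_val, target_val = np.array([1.0, 1.0]), 2.0
gamma, alpha, beta = 0.9, 0.1, 0.01

grad = meta_gradient_gamma(theta, phi, 1.0, phi_next, gamma, alpha,
                           phi_val, target_val)
gamma_adapted = gamma - beta * grad  # one meta-step on the discount
```

In this toy, the discount is nudged in whichever direction makes the updated value function fit the held-out sample better, which is the general flavour of adapting the return online.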

Findings

1. Achieved state-of-the-art results on 57 Atari games.

2. Demonstrated effective online adaptation of return functions.

3. Improved learning efficiency over traditional fixed-return methods.

Abstract

The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function. This proxy is typically based on a sampled and bootstrapped approximation to the true value function, known as a return. The particular choice of return is one of the chief components determining the nature of the algorithm: the rate at which future rewards are discounted; when and how values should be bootstrapped; or even the nature of the rewards themselves. It is well-known that these decisions are crucial to the overall success of RL algorithms. We discuss a gradient-based meta-learning algorithm that is able to adapt the nature of the return, online, whilst…
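To make the role of the return concrete, here is a minimal sketch of one common family of returns, the λ-return, in which a discount factor `gamma` controls how quickly future rewards are discounted and a bootstrapping parameter `lam` controls how much each step relies on the current value estimate versus the sampled rewards that follow. The function name `lambda_return` and the array layout are assumptions for illustration and are not taken from the paper's code.

```python
import numpy as np

def lambda_return(rewards, next_values, gamma, lam):
    """Compute lambda-returns by backward recursion.

    g_t = r_t + gamma * ((1 - lam) * v(s_{t+1}) + lam * g_{t+1}),
    with the final step bootstrapping entirely from next_values[-1].
    """
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    returns = np.zeros_like(rewards)
    g = next_values[-1]  # value estimate after the last transition
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

# Hypothetical three-step trajectory ending in a terminal state (value 0).
rewards = [1.0, 1.0, 1.0]
next_values = [0.5, 0.5, 0.0]
one_step = lambda_return(rewards, next_values, gamma=0.9, lam=0.0)
monte_carlo = lambda_return(rewards, next_values, gamma=0.9, lam=1.0)
```

With `lam=0.0` every target is the one-step bootstrapped return, and with `lam=1.0` the targets are discounted sums of sampled rewards; intermediate settings blend the two, which is why the choice of these parameters shapes the behaviour of the learning algorithm.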

Code & Models

Repositories

RobvanGastel/meta-rl-algorithms (PyTorch)

Videos

Is human data enough? | David Silver · YouTube

Taxonomy

Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Data Stream Mining Techniques