Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Kenny Young; Baoxiang Wang; Matthew E. Taylor

arXiv:1805.04514 · cs.LG · May 27, 2019

TL;DR

This paper introduces Metatrace, a meta-gradient-based step-size tuning method for online reinforcement learning control. It improves learning speed and robustness to non-stationarity with both linear and nonlinear function approximation.

Contribution

The paper develops Metatrace, a meta-gradient-based algorithm for online RL control that adapts step-sizes during learning rather than fixing them in advance.

Findings

01 Metatrace speeds up learning in control tasks.

02 It is robust to initial step-size choices.

03 It effectively handles non-stationarity in RL environments.

Abstract

Reinforcement learning (RL) has had many successes in both "deep" and "shallow" settings. In both cases, significant hyperparameter tuning is often required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this, most notably large experience replay buffers or the use of multiple parallel actors. These techniques come at the cost of moving away from the online RL problem as it is traditionally formulated (i.e., a single agent learning online without maintaining a large database of training examples). Meta-learning can potentially help with both these issues by tuning hyperparameters online and allowing the algorithm to more robustly adjust to non-stationarity in a problem. This paper applies meta-gradient descent to derive a set…
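
To make the idea of tuning step-sizes online by meta-gradient descent concrete, here is a minimal sketch. It is not Metatrace itself. It uses a swapped-in, simpler technique: Sutton's IDBD (Incremental Delta-Bar-Delta) rule for supervised linear regression, with no actor, critic, or eligibility traces. The function names (idbd_step, run_idbd), the meta step-size theta, the initial step-size, and the toy non-stationary target are all assumptions chosen for illustration and do not come from the paper.

```python
import math
import random


def idbd_step(w, beta, h, x, y, theta):
    """One IDBD update for a linear predictor; mutates w, beta, h in place.

    Illustrative only: supervised IDBD, not the paper's Metatrace algorithm.
    """
    delta = y - sum(wi * xi for wi, xi in zip(w, x))  # prediction error
    for i, xi in enumerate(x):
        # Meta-gradient step on the log step-size beta[i].
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])  # per-weight step-size, always positive
        # Ordinary LMS weight update using the per-weight step-size.
        w[i] += alpha * delta * xi
        # h[i] is a decaying trace of how recent updates to w[i] depended on beta[i].
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta


def run_idbd(num_steps=20000, seed=0, theta=0.01, init_alpha=0.05):
    """Run IDBD on a hypothetical toy problem and return the learned step-sizes.

    Only the first two inputs affect the target; every 200 steps the sign of
    one of those two target weights flips, which makes the problem non-stationary.
    """
    rng = random.Random(seed)
    n_relevant, n_total = 2, 6
    target = [1.0, -1.0] + [0.0] * (n_total - n_relevant)
    w = [0.0] * n_total
    beta = [math.log(init_alpha)] * n_total
    h = [0.0] * n_total
    for t in range(num_steps):
        if t > 0 and t % 200 == 0:
            k = (t // 200) % n_relevant  # alternate which relevant weight flips
            target[k] = -target[k]
        x = [rng.choice((-1.0, 1.0)) for _ in range(n_total)]
        y = sum(ti * xi for ti, xi in zip(target, x))
        idbd_step(w, beta, h, x, y, theta)
    return [math.exp(b) for b in beta]


if __name__ == "__main__":
    print(run_idbd())
```

In this toy setting the step-sizes of the two inputs whose target weights keep changing should end up larger than those of the irrelevant inputs. This is the kind of adaptation to non-stationarity the abstract describes, shown here only for the supervised linear case.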

Taxonomy

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Experience Replay · Eligibility Trace