Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control
Kenny Young, Baoxiang Wang, Matthew E. Taylor

TL;DR
This paper introduces Metatrace, a meta-gradient-based step-size tuning method for online reinforcement learning control, improving learning speed and robustness to non-stationarity with both linear and nonlinear function approximation.
Contribution
The paper develops a novel meta-gradient-based step-size tuning algorithm, Metatrace, for online RL control that adapts step-sizes dynamically during learning.
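The general idea of meta-gradient step-size tuning can be illustrated with IDBD (Sutton, 1992), the classic incremental delta-bar-delta rule for a linear least-mean-squares learner. The sketch below is not Metatrace itself, which the paper derives for RL control. It assumes a fixed, noiseless linear target, and the helper names (`idbd_update`, `run_demo`) and all parameter values are hypothetical. Each weight keeps a log step-size `beta`, and a trace `h` approximates how the weight depends on `beta`. The meta step can then move `beta` along an approximate gradient of the squared error.

```python
import numpy as np


def idbd_update(w, beta, h, x, y, theta):
    """One IDBD step (Sutton, 1992) for a linear predictor y_hat = w @ x.

    w, beta (log step-sizes) and h (approx. dw/dbeta) are updated in place.
    theta is the meta step-size. Returns the pre-update prediction error.
    """
    delta = y - w @ x                    # prediction error
    beta += theta * delta * x * h        # meta-gradient step on log step-sizes
    alpha = np.exp(beta)                 # exp keeps every step-size positive
    w += alpha * delta * x               # LMS step with per-weight step-sizes
    # Decay h by [1 - alpha * x^2]^+ (clamped at 0), then add this step's change.
    h *= np.maximum(0.0, 1.0 - alpha * x * x)
    h += alpha * delta * x
    return delta


def run_demo(n_features=5, steps=20000, alpha0=1e-3, theta=5e-3, seed=0):
    """Hypothetical demo: learn a fixed, noiseless linear target online."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=n_features)
    w = np.zeros(n_features)
    beta = np.full(n_features, np.log(alpha0))   # deliberately small initial step-size
    h = np.zeros(n_features)
    for _ in range(steps):
        x = rng.normal(size=n_features)
        idbd_update(w, beta, h, x, w_true @ x, theta)
    return w_true, w, np.exp(beta)


w_true, w, alpha = run_demo()
```

In this setting, the learned step-sizes increase from the small starting value while the weight error is still large. The meta step-size `theta` still has to be chosen by hand, so the tuning problem is shifted to a higher level rather than removed entirely.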
Findings
Metatrace speeds up learning in control tasks.
It is robust to initial step-size choices.
It effectively handles non-stationarity in RL environments.
Abstract
Reinforcement learning (RL) has had many successes in both "deep" and "shallow" settings. In both cases, significant hyperparameter tuning is often required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this, most notably large experience replay buffers or the use of multiple parallel actors. These techniques come at the cost of moving away from the online RL problem as it is traditionally formulated (i.e., a single agent learning online without maintaining a large database of training examples). Meta-learning can potentially help with both these issues by tuning hyperparameters online and allowing the algorithm to more robustly adjust to non-stationarity in a problem. This paper applies meta-gradient descent to derive a set…
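In the online RL setting, meta-gradient step-size adaptation can be applied to a linear TD(λ) value estimate that uses an eligibility trace. The following minimal sketch makes several assumptions: a hypothetical five-state deterministic chain, one-hot features, semi-gradient meta updates (the dependence of the next-state value on the weights is ignored), and a per-weight, diagonal approximation for the sensitivity trace `h`. It covers prediction only and is not the Metatrace update derived in the paper, which targets actor-critic control. The function name `run_chain_demo` and all constants are illustrative.

```python
import numpy as np

N_STATES = 5   # hypothetical chain length
GAMMA = 0.9    # discount factor


def run_chain_demo(lam=0.9, alpha0=0.05, meta_rate=0.01, episodes=1000):
    """Linear TD(lambda) with IDBD-style per-weight step-sizes on a
    hypothetical deterministic chain: state i moves to i + 1, and the last
    state exits to a terminal state with reward 1 (all other rewards 0)."""
    phi = np.eye(N_STATES)                     # one-hot (tabular) features
    w = np.zeros(N_STATES)                     # value-function weights
    beta = np.full(N_STATES, np.log(alpha0))   # per-weight log step-sizes
    h = np.zeros(N_STATES)                     # approx. dw/dbeta, kept across episodes
    for _ in range(episodes):
        z = np.zeros(N_STATES)                 # eligibility trace, reset each episode
        for s in range(N_STATES):
            terminal = s == N_STATES - 1
            reward = 1.0 if terminal else 0.0
            x = phi[s]
            v_next = 0.0 if terminal else w @ phi[s + 1]
            delta = reward + GAMMA * v_next - w @ x      # TD error
            beta += meta_rate * delta * x * h            # semi-gradient meta step
            alpha = np.exp(beta)
            z = GAMMA * lam * z + x                      # accumulating trace
            w += alpha * delta * z                       # TD(lambda) weight update
            h = h * np.maximum(0.0, 1.0 - alpha * x * z) + alpha * delta * z
    return w, np.exp(beta)


w, alpha = run_chain_demo()
# True values: GAMMA ** (steps remaining before the rewarded exit).
v_true = GAMMA ** np.arange(N_STATES - 1, -1, -1)
```

With one-hot features, this reduces to tabular TD(λ), so the learned weights can be checked directly against the known values. Whether `h` is reset between episodes is a design choice; keeping it, as here, is an assumption of this sketch.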
Taxonomy
Methods: Experience Replay · Eligibility Trace
