One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning

Clément Bonnet; Paul Caron; Thomas Barrett; Ian Davies; Alexandre Laterre

arXiv:2111.00206·cs.LG·November 2, 2021

Open Access

TL;DR

This paper investigates multi-step meta-gradients in reinforcement learning. It shows that accumulating meta-gradients over several inner steps increases the variance of the estimate, and it proposes a mixing method that trades off bias and variance to improve stability and performance.

Contribution

It introduces a novel mixing meta-gradient method that reduces variance and enhances robustness in multi-step meta-gradient reinforcement learning.

Findings

01. Multi-step meta-gradients increase learning signal but also variance.

02. The mixing method reduces variance by a factor of 3.

03. The proposed approach achieves comparable or better performance on the Snake game.

Abstract

Self-tuning algorithms that adapt the learning process online encourage more effective and robust learning. Among all the methods available, meta-gradients have emerged as a promising approach. They leverage the differentiability of the learning rule with respect to some hyper-parameters to adapt them in an online fashion. Although meta-gradients can be accumulated over multiple learning steps to avoid myopic updates, this is rarely used in practice. In this work, we demonstrate that whilst multi-step meta-gradients do provide a better learning signal in expectation, this comes at the cost of a significant increase in variance, hindering performance. In the light of this analysis, we introduce a novel method mixing multiple inner steps that enjoys a more accurate and robust meta-gradient signal, essentially trading off bias and variance in meta-gradient estimation. When applied to the…
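The mechanism the abstract describes can be illustrated on a toy problem. The sketch below is not the paper's algorithm or its Snake experiment: it assumes a scalar quadratic inner loss, treats the SGD step size `eta` as the only meta-parameter, and adds Gaussian noise to the inner gradients. All names (`inner_rollout`, `meta_gradient`, `meta_gradient_variance`) and constants are hypothetical. It differentiates the n-step learning rule with respect to `eta` and estimates how the variance of that meta-gradient changes with the number of inner steps. The proposed mixing method is not implemented here.

```python
import random
import statistics

# Toy setup (an assumption for illustration): inner loss 0.5 * A * (theta - TARGET)**2
A, TARGET, THETA0 = 1.0, 0.0, 1.0


def inner_rollout(eta, n_steps, noise=None):
    """Run n_steps of SGD with step size eta; return (theta_n, d theta_n / d eta)."""
    theta, dtheta = THETA0, 0.0
    for k in range(n_steps):
        grad = A * (theta - TARGET)
        if noise is not None:
            grad += noise[k]  # additive gradient noise, independent of eta
        # Differentiate theta' = theta - eta * grad(theta) with respect to eta.
        # grad depends on eta through theta, which gives the (1 - eta * A) factor.
        dtheta = (1.0 - eta * A) * dtheta - grad
        theta = theta - eta * grad
    return theta, dtheta


def outer_loss(eta, n_steps, noise=None):
    """Noise-free outer objective evaluated at the parameters after n_steps."""
    theta, _ = inner_rollout(eta, n_steps, noise)
    return 0.5 * A * (theta - TARGET) ** 2


def meta_gradient(eta, n_steps, noise=None):
    """d outer_loss / d eta, accumulated through all n_steps inner updates."""
    theta, dtheta = inner_rollout(eta, n_steps, noise)
    return A * (theta - TARGET) * dtheta  # chain rule through the outer loss


def meta_gradient_variance(eta, n_steps, sigma=1.0, samples=5000, seed=0):
    """Monte Carlo estimate of the meta-gradient variance under gradient noise."""
    rng = random.Random(seed)
    draws = [
        meta_gradient(eta, n_steps, [rng.gauss(0.0, sigma) for _ in range(n_steps)])
        for _ in range(samples)
    ]
    return statistics.variance(draws)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, meta_gradient(0.01, n), meta_gradient_variance(0.01, n))
```

In the noise-free case, `meta_gradient` can be checked against a central finite difference of `outer_loss`. With noise, each extra inner step adds another noisy gradient to the accumulated derivative, so in this toy setting the estimated variance grows with `n_steps`. How closely this mirrors the behaviour reported in the paper depends on the assumptions listed above.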

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Model Reduction and Neural Networks