One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning
Clément Bonnet, Paul Caron, Thomas Barrett, Ian Davies, Alexandre Laterre

TL;DR
This paper investigates multi-step meta-gradient reinforcement learning, shows that accumulating meta-gradients over multiple learning steps increases variance, and proposes a mixing method that balances bias and variance, improving stability and performance.
Contribution
It introduces a novel mixing meta-gradient method that reduces variance and enhances robustness in multi-step meta-gradient reinforcement learning.
Findings
Multi-step meta-gradients provide a better learning signal in expectation, but at the cost of significantly higher variance.
The mixing method reduces variance by a factor of 3.
The proposed approach achieves comparable or better performance on the Snake game.
Abstract
Self-tuning algorithms that adapt the learning process online encourage more effective and robust learning. Among all the methods available, meta-gradients have emerged as a promising approach. They leverage the differentiability of the learning rule with respect to some hyper-parameters to adapt them in an online fashion. Although meta-gradients can be accumulated over multiple learning steps to avoid myopic updates, this is rarely used in practice. In this work, we demonstrate that whilst multi-step meta-gradients do provide a better learning signal in expectation, this comes at the cost of a significant increase in variance, hindering performance. In the light of this analysis, we introduce a novel method mixing multiple inner steps that enjoys a more accurate and robust meta-gradient signal, essentially trading off bias and variance in meta-gradient estimation. When applied to the…
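To make the idea of differentiating the learning rule with respect to a hyper-parameter concrete, here is a minimal sketch on a toy problem, not the paper's implementation. It assumes a scalar parameter `theta` trained by SGD on the loss 0.5 * (theta - x)^2, treats the learning rate `eta` as the meta-parameter, and propagates d(theta)/d(eta) forward through each inner step by hand. All function names are hypothetical, and the uniform average in `mixed_meta_gradient` is only one possible reading of "mixing multiple inner steps"; the paper's actual mixing rule may differ.

```python
import random


def inner_step(theta, dtheta, eta, x):
    """One SGD step on 0.5 * (theta - x)**2, also propagating d(theta)/d(eta)."""
    grad = theta - x
    new_theta = theta - eta * grad
    # Differentiate the update w.r.t. eta: grad depends on eta through theta.
    new_dtheta = dtheta - grad - eta * dtheta
    return new_theta, new_dtheta


def meta_gradients(theta0, eta, train_xs, val_x):
    """Return [g_1, ..., g_K], where g_k is the derivative w.r.t. eta of the
    validation loss 0.5 * (theta_k - val_x)**2 after k inner steps."""
    theta, dtheta = theta0, 0.0
    grads = []
    for x in train_xs:
        theta, dtheta = inner_step(theta, dtheta, eta, x)
        # Chain rule through the validation loss.
        grads.append((theta - val_x) * dtheta)
    return grads


def mixed_meta_gradient(grads, weights=None):
    """Weighted average of the k-step meta-gradients (uniform by default).
    ASSUMPTION: illustrative mixing rule, not taken from the paper."""
    if weights is None:
        weights = [1.0 / len(grads)] * len(grads)
    return sum(w * g for w, g in zip(weights, grads))


def val_loss_after(theta0, eta, train_xs, val_x):
    """Validation loss after running all inner steps (used to check gradients)."""
    theta = theta0
    for x in train_xs:
        theta -= eta * (theta - x)
    return 0.5 * (theta - val_x) ** 2


rng = random.Random(0)
train_xs = [rng.gauss(1.0, 0.5) for _ in range(4)]  # noisy inner-loop samples
val_x = rng.gauss(1.0, 0.5)                         # noisy validation sample
grads = meta_gradients(theta0=0.0, eta=0.1, train_xs=train_xs, val_x=val_x)
mixed = mixed_meta_gradient(grads)
```

Here `grads[0]` is the usual one-step (myopic) meta-gradient and `grads[-1]` is the full multi-step one. Repeating the computation over many random seeds and comparing the spread of `grads[0]`, `grads[-1]`, and `mixed` is one simple way to explore the bias-variance trade-off the abstract describes on this toy problem.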
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Model Reduction and Neural Networks
