An Investigation of the Bias-Variance Tradeoff in Meta-Gradients
Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar

TL;DR
This paper analyzes the bias-variance tradeoff in meta-gradient estimation for reinforcement learning, comparing methods like Hessian-based estimators, truncated backpropagation, and evolution strategies, especially in long-horizon settings.
Contribution
It provides a detailed empirical study disentangling bias and variance sources in meta-gradient estimators, highlighting limitations of Hessian-based methods and exploring alternatives.
Findings
Hessian estimation, as implemented by DiCE and its variants, always adds bias and can also add variance to meta-gradient estimates.
Truncated backpropagation introduces bias by ignoring early inner-loop updates, but can lower variance.
Evolution strategies offer a different tradeoff in long-horizon meta-learning.
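The effect of truncation on a meta-gradient can be illustrated with a small deterministic sketch. It is not taken from the paper: the toy problem (gradient descent on a 1-D quadratic, with the step size `alpha` as the meta-parameter) and the names `final_loss` and `meta_gradient` are assumptions chosen for illustration. Because nothing is sampled, the sketch shows only the bias side of the tradeoff, not the variance side.

```python
def final_loss(alpha, theta0=1.0, a=1.0, T=10):
    """Run T inner gradient steps on L(theta) = 0.5 * a * theta**2 and return L(theta_T)."""
    theta = theta0
    for _ in range(T):
        theta = theta - alpha * a * theta
    return 0.5 * a * theta ** 2


def meta_gradient(alpha, theta0=1.0, a=1.0, T=10, K=None):
    """Derivative of L(theta_T) with respect to alpha.

    K is a hypothetical truncation length: only the last K inner steps are
    differentiated through. K=None differentiates through all T steps.
    """
    K = T if K is None else K
    theta, dtheta = theta0, 0.0  # dtheta tracks d(theta_t) / d(alpha)
    for t in range(T):
        if t == T - K:
            dtheta = 0.0  # truncation: treat theta at this step as a constant
        grad = a * theta
        # theta_next = theta - alpha * grad, differentiated with respect to alpha
        dtheta = dtheta - grad - alpha * a * dtheta
        theta = theta - alpha * grad
    return a * theta * dtheta  # chain rule through L(theta_T)


full = meta_gradient(0.1, T=10)
truncated = meta_gradient(0.1, T=10, K=5)
print(full, truncated, truncated / full)
```

In this particular toy problem the truncated meta-gradient equals the full one scaled by K/T, so the error from truncation is systematic rather than noise that averages out. Real meta-RL objectives are stochastic and non-quadratic, so the size of the bias there will differ.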
Abstract
Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms. Estimation of meta-gradients is central to the performance of these meta-algorithms, and has been studied in the setting of MAML-style short-horizon meta-RL problems. In this context, prior work has investigated the estimation of the Hessian of the RL objective, as well as tackling the problem of credit assignment to pre-adaptation behavior by making a sampling correction. However, we show that Hessian estimation, implemented for example by DiCE and its variants, always adds bias and can also add variance to meta-gradient estimation. Meanwhile, meta-gradient estimation has been studied less in the important long-horizon setting, where backpropagation through the full inner optimization trajectories is not feasible. We study the bias and variance tradeoff arising from…
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms · Model Reduction and Neural Networks
