Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function
Clément Bonnet, Laurence Midgley, Alexandre Laterre

TL;DR
This paper identifies a bias in meta-gradient reinforcement learning: the outer objective estimates advantages with a critic that was trained under the meta-learned discount factor rather than the discount factor the outer objective requires. It proposes eliminating this bias by learning a separate outer value function, improving training stability and performance.
Contribution
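The paper removes the bias in meta-gradient RL by adding a second head to the critic network and training it, alongside the usual critic, with the outer-loss discount factor. This outer value function is then used to estimate the outer loss, which makes the meta-gradient estimates more accurate.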
Findings
Bias can cause catastrophic failure in meta-gradient RL.
Using an outer value function reduces bias and improves performance.
The method demonstrates significant gains in complex environments.
Abstract
Meta-gradient Reinforcement Learning (RL) allows agents to self-tune their hyper-parameters in an online fashion during training. In this paper, we identify a bias in the meta-gradient of current meta-gradient RL approaches. This bias comes from using the critic that is trained using the meta-learned discount factor for the advantage estimation in the outer objective, which requires a different discount factor. Because the meta-learned discount factor is typically lower than the one used in the outer objective, the resulting bias can cause the meta-gradient to favor myopic policies. We propose a simple solution to this issue: we eliminate this bias by using an alternative, outer value function in the estimation of the outer loss. To obtain this outer value function we add a second head to the critic network and train it alongside the classic critic, using the outer loss discount…
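The discount mismatch described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' implementation: the constant-reward setting, the discount values 0.9 and 0.99, and every function and variable name are assumptions chosen for illustration. It computes an n-step advantage under the outer discount factor twice, once bootstrapping from a value function that is exact for the lower (inner) discount and once from a value function that is exact for the outer discount. Because the agent receives the same reward at every step, the true advantage is zero, so any nonzero result is bias introduced by the mismatched critic.

```python
# Toy illustration (hypothetical setup): constant reward of 1 per step, forever.

def discounted_value(reward, gamma):
    """Exact value of receiving a constant `reward` forever under `gamma`."""
    return reward / (1.0 - gamma)


def n_step_advantage(rewards, gamma, v_start, v_end):
    """n-step advantage: discounted rewards + bootstrapped value - baseline."""
    n = len(rewards)
    n_step_return = sum(gamma**k * r for k, r in enumerate(rewards))
    return n_step_return + gamma**n * v_end - v_start


GAMMA_INNER = 0.9   # stands in for the (lower) meta-learned discount factor
GAMMA_OUTER = 0.99  # stands in for the discount factor of the outer objective
rewards = [1.0] * 5  # five observed steps before bootstrapping

# Each "critic" is exact for the discount factor it was trained with.
v_inner = discounted_value(1.0, GAMMA_INNER)
v_outer = discounted_value(1.0, GAMMA_OUTER)

# Outer-loss advantage computed with the inner critic (mismatched discount).
adv_with_inner_critic = n_step_advantage(rewards, GAMMA_OUTER, v_inner, v_inner)

# Outer-loss advantage computed with a critic trained for the outer discount.
adv_with_outer_critic = n_step_advantage(rewards, GAMMA_OUTER, v_outer, v_outer)

print("advantage with inner critic:", adv_with_inner_critic)
print("advantage with outer critic:", adv_with_outer_critic)
```

In this toy setting the advantage computed with the matched outer value function vanishes up to floating-point error, while the one computed with the inner critic does not. The example only shows that a bias exists. It does not reproduce the two-headed critic network or the meta-gradient computation itself.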
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing · Machine Learning and ELM
