Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function

Clément Bonnet; Laurence Midgley; Alexandre Laterre

arXiv:2211.10550 · cs.LG · November 22, 2022

TL;DR

This paper identifies a bias in meta-gradient reinforcement learning: the outer objective estimates advantages with a critic trained under the meta-learned discount factor, although the outer objective itself uses a different discount factor. It proposes removing this bias by learning a separate outer value function, which improves training stability and performance.

Contribution

The paper introduces a novel approach to remove bias in meta-gradient RL by learning an outer value function with a separate head, enhancing the accuracy of meta-gradient estimates.

Findings

1. Bias can cause catastrophic failure in meta-gradient RL.
2. Using an outer value function reduces bias and improves performance.
3. The method demonstrates significant gains in complex environments.

Abstract

Meta-gradient Reinforcement Learning (RL) allows agents to self-tune their hyper-parameters in an online fashion during training. In this paper, we identify a bias in the meta-gradient of current meta-gradient RL approaches. This bias comes from using the critic that is trained using the meta-learned discount factor for the advantage estimation in the outer objective, which requires a different discount factor. Because the meta-learned discount factor is typically lower than the one used in the outer objective, the resulting bias can cause the meta-gradient to favor myopic policies. We propose a simple solution to this issue: we eliminate this bias by using an alternative, outer value function in the estimation of the outer loss. To obtain this outer value function, we add a second head to the critic network and train it alongside the classic critic, using the outer loss discount…
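The mismatch described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it is a toy example that assumes a single deterministic trajectory, uses exact discounted returns as stand-ins for the two critic heads, and picks hypothetical discount values (`gamma_inner = 0.5`, `gamma_outer = 0.9`). The helper names `discounted_returns` and `one_step_advantages` are introduced here for illustration only. In this setting the true advantage under the outer discount is zero at every step, so any nonzero estimate is bias.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def one_step_advantages(rewards, values, gamma):
    """One-step advantage estimates r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds one entry per state, including the terminal state,
    so it is one element longer than `rewards`.
    """
    return [
        rewards[t] + gamma * values[t + 1] - values[t]
        for t in range(len(rewards))
    ]


# Toy trajectory: a reward of 1 at each of four steps (assumed for illustration).
rewards = [1.0, 1.0, 1.0, 1.0]

# Hypothetical discounts: the meta-learned (inner) one is lower than the outer one.
gamma_inner, gamma_outer = 0.5, 0.9

# Stand-ins for the two critic heads: exact values under each discount,
# with a terminal value of 0 appended.
v_inner = discounted_returns(rewards, gamma_inner) + [0.0]
v_outer = discounted_returns(rewards, gamma_outer) + [0.0]

# Outer-objective advantages computed with the inner critic (mismatched discount)...
adv_biased = one_step_advantages(rewards, v_inner, gamma_outer)
# ...and with the outer value function (matching discount).
adv_unbiased = one_step_advantages(rewards, v_outer, gamma_outer)

print("inner-critic advantages:", adv_biased)
print("outer-critic advantages:", adv_unbiased)
```

With the outer value function the estimates vanish, as they should on a deterministic trajectory, while the inner-critic estimates are nonzero wherever the two discounts disagree about future reward. In the method the abstract describes, both value functions would be learned heads of one critic network rather than exact returns.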

Code & Models

Repositories

instadeepai/outer-value-function-meta-rl
JAX · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing · Machine Learning and ELM