Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

Yunhao Tang; Tadashi Kozuno; Mark Rowland; Rémi Munos; Michal Valko

arXiv:2106.13125 · cs.LG · November 4, 2021 · 1 citation

TL;DR

This paper introduces a unified framework for estimating higher-order derivatives in meta-reinforcement learning using off-policy evaluation, addressing bias and variance issues and enabling practical implementation with auto-differentiation.

Contribution

It unifies existing Hessian estimation methods under a common framework and proposes new estimators that improve practical performance.

Findings

01 Framework clarifies bias-variance trade-offs in Hessian estimates

02 New estimators are easily implemented with auto-differentiation

03 Performance gains demonstrated in meta-reinforcement learning tasks

Abstract

Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions. This is challenging from an implementation perspective, as repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates. In this work, we provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation. Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates. This framework also opens the door to a new family of estimates, which can be easily implemented with auto-differentiation libraries, and lead to performance gains in practice.
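
The bias mentioned in the abstract can be seen in a toy setting. The sketch below is not the paper's estimator. It assumes a hypothetical one-step problem with three actions, and the names `rewards`, `theta0`, `score_function_surrogate` and `importance_weighted_surrogate` are illustrative, not taken from the paper or its repository. Expectations over actions are computed exactly, so any gap between the Hessians is bias rather than sampling noise. Differentiating the usual policy-gradient surrogate once recovers the true gradient, but differentiating it twice does not recover the true Hessian. A surrogate built from the importance-sampling ratio, the simplest off-policy evaluation estimate, gives the correct result at both orders.

```python
import jax
import jax.numpy as jnp

# Hypothetical one-step problem: 3 actions with fixed rewards (illustrative values).
rewards = jnp.array([1.0, 0.0, -0.5])
# Logits of the behavior policy; derivatives are also evaluated at this point.
theta0 = jnp.array([0.2, -0.1, 0.4])


def true_value(theta):
    """Exact value J(theta) = sum_a pi_theta(a) * r(a)."""
    return jnp.sum(jax.nn.softmax(theta) * rewards)


def score_function_surrogate(theta):
    """E_{a ~ mu}[log pi_theta(a) * r(a)], with mu = pi_theta0 held fixed."""
    mu = jax.nn.softmax(theta0)  # does not depend on theta
    return jnp.sum(mu * jax.nn.log_softmax(theta) * rewards)


def importance_weighted_surrogate(theta):
    """E_{a ~ mu}[(pi_theta(a) / mu(a)) * r(a)], an off-policy value estimate."""
    mu = jax.nn.softmax(theta0)  # does not depend on theta
    ratio = jax.nn.softmax(theta) / mu
    return jnp.sum(mu * ratio * rewards)


grad_true = jax.grad(true_value)(theta0)
hess_true = jax.hessian(true_value)(theta0)

grad_sf = jax.grad(score_function_surrogate)(theta0)
hess_sf = jax.hessian(score_function_surrogate)(theta0)

grad_is = jax.grad(importance_weighted_surrogate)(theta0)
hess_is = jax.hessian(importance_weighted_surrogate)(theta0)

# First-order derivatives agree for both surrogates.
sf_grad_ok = bool(jnp.allclose(grad_sf, grad_true, atol=1e-5))
is_grad_ok = bool(jnp.allclose(grad_is, grad_true, atol=1e-5))
# Second-order derivatives agree only for the importance-weighted surrogate.
sf_hess_ok = bool(jnp.allclose(hess_sf, hess_true, atol=1e-5))
is_hess_ok = bool(jnp.allclose(hess_is, hess_true, atol=1e-5))

print("score-function gradient matches:", sf_grad_ok)
print("importance-weighted gradient matches:", is_grad_ok)
print("score-function Hessian matches:", sf_hess_ok)
print("importance-weighted Hessian matches:", is_hess_ok)
```

The score-function Hessian is missing the term E[∇log π ∇log πᵀ r], because log π is only a first-order stand-in for the ratio π/μ around `theta0`. With sampled actions instead of exact expectations, the importance-weighted form remains unbiased but its variance becomes a concern, which is the trade-off the abstract refers to.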

Code & Models

Repositories

robintyh1/neurips2021-meta-gradient-offpolicy-evaluation (JAX, official)

Datasets

misovalko/my-research-papers
dataset · 21 dl

Videos

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials · Adversarial Robustness in Machine Learning