Value-aware Importance Weighting for Off-policy Reinforcement Learning

Kristopher De Asis; Eric Graves; Richard S. Sutton

arXiv:2306.15625 · cs.LG · June 28, 2023

TL;DR

This paper introduces value-aware importance weights for off-policy reinforcement learning. They reduce the variance of importance sampling while keeping estimates unbiased, with the goal of more stable learning algorithms.

Contribution

The paper proposes a broader class of importance weights that take the sample space into account, aiming to improve the stability and accuracy of off-policy prediction in RL.

Findings

01. Lower variance than standard importance sampling weights

02. Estimates remain unbiased under the new weights

03. Empirical evaluation shows improved stability in off-policy RL prediction algorithms

Abstract

Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from a distribution to obtain unbiased estimates under another distribution. However, importance sampling weights tend to exhibit extreme variance, often leading to stability issues in practice. In this work, we consider a broader class of importance weights to correct samples in off-policy learning. We propose the use of value-aware importance weights which take into account the sample space to provide lower variance, but still unbiased, estimates under a target distribution. We derive how such weights can be computed, and detail key properties of the resulting importance weights. We then extend several reinforcement learning prediction algorithms to the off-policy setting with these weights, and evaluate them empirically.
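To make the contrast in the abstract concrete, here is a small sketch that computes exactly (not by sampling) the mean and variance of a single-sample weighted estimate rho(a) * x(a), with the action a drawn from a behavior distribution b. Ordinary importance sampling uses rho(a) = pi(a) / b(a), which is unbiased for the pi-expectation of any function of the sample. As a hypothetical and deliberately simple member of the broader class, the second weight is one constant rho_bar = E_pi[x] / E_b[x] chosen using the values x themselves. It is unbiased only for this particular x, and it is an assumption for illustration, not the weighting derived in the paper. The three-action distributions and values are made up.

```python
from typing import List, Sequence, Tuple


def weighted_estimate_stats(
    b: Sequence[float], weights: Sequence[float], x: Sequence[float]
) -> Tuple[float, float]:
    """Exact mean and variance of the estimator weights[a] * x[a] with a ~ b."""
    mean = sum(p * w * v for p, w, v in zip(b, weights, x))
    second_moment = sum(p * (w * v) ** 2 for p, w, v in zip(b, weights, x))
    return mean, second_moment - mean ** 2


def ordinary_is_weights(pi: Sequence[float], b: Sequence[float]) -> List[float]:
    # Standard per-sample ratio rho(a) = pi(a) / b(a).
    # Requires b(a) > 0 wherever pi(a) > 0.
    return [p / q for p, q in zip(pi, b)]


def constant_value_aware_weights(
    pi: Sequence[float], b: Sequence[float], x: Sequence[float]
) -> List[float]:
    # HYPOTHETICAL illustration: a single weight shared by every sample, chosen so
    # that E_b[rho * x] = E_pi[x] holds exactly for this x. It depends on the values
    # x ("value-aware") but is NOT the construction from the paper.
    target_mean = sum(p * v for p, v in zip(pi, x))
    behavior_mean = sum(q * v for q, v in zip(b, x))
    rho_bar = target_mean / behavior_mean  # assumes E_b[x] != 0
    return [rho_bar] * len(x)


b = [0.8, 0.1, 0.1]   # behavior policy over three actions
pi = [0.1, 0.1, 0.8]  # target policy over the same actions
x = [1.0, 2.0, 3.0]   # value attached to each action (e.g., a return)

is_mean, is_var = weighted_estimate_stats(b, ordinary_is_weights(pi, b), x)
va_mean, va_var = weighted_estimate_stats(b, constant_value_aware_weights(pi, b, x), x)
print(f"ordinary IS: mean={is_mean:.4f} var={is_var:.4f}")
print(f"value-aware: mean={va_mean:.4f} var={va_var:.4f}")
```

With these numbers both estimators have mean 2.7, which equals E_pi[x], but the ordinary importance sampling estimate has a variance of about 50.72, against about 1.77 for the constant value-aware weight, because the rarely sampled action with ratio 8 dominates the first estimator. In an actual prediction algorithm the values x would typically be learned estimates, so this toy only illustrates why the set of weights that stay unbiased for a given quantity is larger than pi / b.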
