Value-aware Importance Weighting for Off-policy Reinforcement Learning
Kristopher De Asis, Eric Graves, Richard S. Sutton

TL;DR
This paper introduces value-aware importance weights for off-policy reinforcement learning. These weights have lower variance than standard importance sampling ratios and still give unbiased estimates, which leads to more stable learning algorithms.
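The re-weighting idea behind off-policy prediction can be made concrete with ordinary importance sampling on a single decision. The sketch below is a minimal illustration with made-up policies `pi`, `mu` and action values `q`, none of which are taken from the paper. Samples drawn from `mu` are re-weighted by `pi / mu`, so their mean matches the target expectation, but the weighted values can vary far more than values sampled on-policy.

```python
import numpy as np

# Made-up example (not from the paper): three actions, a target policy pi,
# a behaviour policy mu, and a fixed value q(a) for each action.
pi = np.array([0.7, 0.2, 0.1])
mu = np.array([0.1, 0.3, 0.6])
q = np.array([10.0, 2.0, -1.0])

# Ordinary importance sampling ratios rho(a) = pi(a) / mu(a).
rho = pi / mu

# Exact quantities over the finite action set.
v_pi = np.dot(pi, q)                                # E_pi[q(A)]
is_mean = np.dot(mu, rho * q)                       # E_mu[rho(A) q(A)]
is_var = np.dot(mu, (rho * q) ** 2) - is_mean ** 2  # Var_mu[rho(A) q(A)]
on_policy_var = np.dot(pi, q ** 2) - v_pi ** 2      # Var_pi[q(A)]

# Monte Carlo: sample actions from mu and average the re-weighted values.
rng = np.random.default_rng(0)
actions = rng.choice(len(mu), size=100_000, p=mu)
mc_estimate = np.mean(rho[actions] * q[actions])

print(f"E_pi[q]             = {v_pi:.3f}")
print(f"E_mu[rho*q]         = {is_mean:.3f}")
print(f"Monte Carlo         = {mc_estimate:.3f}")
print(f"Var_mu[rho*q]       = {is_var:.2f}")
print(f"Var_pi[q] on-policy = {on_policy_var:.2f}")
```

With these numbers the exact target expectation is 7.3 and the re-weighted estimator stays centred on it, but its variance is about 437, compared with about 17.6 when sampling directly from `pi`. The large ratio `pi / mu = 7` on the first action drives most of that variance.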
Contribution
It proposes a broader class of importance weights that take the sample space into account, improving the stability and accuracy of off-policy RL prediction.
Findings
Lower variance than standard importance sampling weights
Estimates remain unbiased under the new weights
Empirical evaluation shows improved stability in off-policy RL prediction algorithms
Abstract
Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from one distribution to obtain unbiased estimates under another distribution. However, importance sampling weights tend to exhibit extreme variance, often leading to stability issues in practice. In this work, we consider a broader class of importance weights to correct samples in off-policy learning. We propose the use of value-aware importance weights, which take into account the sample space to provide lower-variance, but still unbiased, estimates under a target distribution. We derive how such weights can be computed, and detail key properties of the resulting importance weights. We then extend several reinforcement learning prediction algorithms to the off-policy setting with these weights, and evaluate them empirically.
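A broader class of unbiased weights can be illustrated on a single decision. As a minimal sketch, assume the requirement is that, for action values q, the weights rho satisfy E_mu[rho(A) q(A)] = E_pi[q(A)]. Ordinary ratios pi/mu meet this condition, but so do other weights that depend on q. The helper `min_second_moment_weights` below is hypothetical: among weights meeting that constraint, it picks the ones with the smallest second moment E_mu[rho(A)^2]. By the Cauchy-Schwarz inequality these are proportional to q, namely rho(a) = q(a) E_pi[q] / E_mu[q^2]. This is an illustrative choice, not necessarily the objective or the closed form derived in the paper, and the numbers are made up.

```python
import numpy as np


def min_second_moment_weights(pi, mu, q):
    """Hypothetical value-dependent weights, for illustration only.

    Returns the weights rho that minimise E_mu[rho(A)^2] subject to the
    unbiasedness constraint E_mu[rho(A) q(A)] = E_pi[q(A)]. By the
    Cauchy-Schwarz inequality the minimiser is proportional to q.
    This is an assumed objective, not the paper's formula.
    """
    v_pi = np.dot(pi, q)         # target expectation E_pi[q(A)]
    second = np.dot(mu, q ** 2)  # E_mu[q(A)^2], assumed to be > 0
    return q * v_pi / second


# Made-up policies and action values (not from the paper).
pi = np.array([0.7, 0.2, 0.1])   # target policy
mu = np.array([0.1, 0.3, 0.6])   # behaviour policy
q = np.array([10.0, 2.0, -1.0])  # action values

rho_is = pi / mu                               # ordinary ratios
rho_va = min_second_moment_weights(pi, mu, q)  # value-dependent alternative

for name, rho in [("ordinary", rho_is), ("value-dependent", rho_va)]:
    mean = np.dot(mu, rho * q)            # E_mu[rho q], should equal E_pi[q]
    second_moment = np.dot(mu, rho ** 2)  # E_mu[rho^2]
    print(f"{name:>16}: E_mu[rho*q] = {mean:.3f}, E_mu[rho^2] = {second_moment:.3f}")
```

Both weightings reproduce E_pi[q] = 7.3, while the second moment of the weights drops from 5.05 for the ordinary ratios to 7.3^2 / 11.8 (about 4.52). Because these weights scale with q, the action with a negative value receives a negative weight, a side effect of this particular illustrative objective.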
Taxonomy
Topics: Reinforcement Learning in Robotics
