Low Variance Off-policy Evaluation with State-based Importance Sampling

David M. Bossens; Philip S. Thomas

arXiv:2212.03932 · cs.LG · May 7, 2024

TL;DR

This paper introduces state-based importance sampling estimators for off-policy evaluation in reinforcement learning, significantly reducing variance and improving accuracy by selectively dropping states from importance weight calculations.

Contribution

The paper proposes state-based importance sampling methods that lower the variance of off-policy evaluation and can be applied to several existing estimators.

Findings

1. State-based estimators consistently reduce variance.
2. Improved accuracy over traditional importance sampling methods.
3. Effective across multiple off-policy evaluation techniques.

Abstract

In many domains, the exploration process of reinforcement learning will be too costly as it requires trying out suboptimal policies, resulting in a need for off-policy evaluation, in which a target policy is evaluated based on data collected from a known behaviour policy. In this context, importance sampling estimators provide estimates for the expected return by weighting the trajectory based on the probability ratio of the target policy and the behaviour policy. Unfortunately, such estimators have a high variance and therefore a large mean squared error. This paper proposes state-based importance sampling estimators which reduce the variance by dropping certain states from the computation of the importance weight. To illustrate their applicability, we demonstrate state-based variants of ordinary importance sampling, weighted importance sampling, per-decision importance sampling,…
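The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `importance_sampling_estimate`, the tabular policy arrays, the toy trajectories, and the `dropped_states` argument are all assumptions made for illustration, and the sketch takes the set of dropped states as a given input rather than showing how such a set would be chosen. With an empty set it reduces to ordinary importance sampling; with a non-empty set, the probability ratios at those states are simply left out of the trajectory weight.

```python
import numpy as np

def importance_sampling_estimate(trajectories, pi_e, pi_b,
                                 dropped_states=frozenset(), gamma=1.0):
    """Estimate the expected return of the target policy pi_e from
    trajectories collected under the behaviour policy pi_b.

    trajectories: list of trajectories, each a list of (state, action, reward)
    pi_e, pi_b:   arrays of shape (n_states, n_actions) with action probabilities
    dropped_states: hypothetical set of states whose ratios are skipped
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            ret += (gamma ** t) * r  # the return always uses every step
            if s not in dropped_states:
                # probability ratio of target policy to behaviour policy
                weight *= pi_e[s, a] / pi_b[s, a]
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Toy data (assumed): two states, two actions. In state 1 both policies
# agree, so every ratio there equals 1 and dropping it changes nothing.
pi_b = np.array([[0.5, 0.5], [0.5, 0.5]])
pi_e = np.array([[0.9, 0.1], [0.5, 0.5]])
trajectories = [
    [(0, 0, 1.0), (1, 1, 0.0)],
    [(0, 1, 0.0), (1, 0, 1.0)],
]

ois = importance_sampling_estimate(trajectories, pi_e, pi_b)
sis = importance_sampling_estimate(trajectories, pi_e, pi_b, dropped_states={1})
print(ois, sis)
```

In this toy case the two estimates coincide because the dropped state contributes ratios of exactly 1. In general, leaving out a state where the policies differ changes the weight, so the choice of dropped states matters for both variance and bias.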

Code & Models

Repositories

bossdm/importancesampling (official)

Taxonomy

Topics: Reinforcement Learning in Robotics