Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design

Shuze Liu; Shangtong Zhang

arXiv:2301.13734·cs.LG·October 3, 2024

TL;DR

This paper introduces data-efficient methods for policy evaluation in reinforcement learning. The methods use offline data to reduce the variance of online Monte Carlo estimators without introducing bias, and they perform better empirically.

Contribution

The paper proposes a closed-form behavior policy that reduces estimator variance, along with algorithms that learn this policy from offline data, improving data efficiency and empirical performance.
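A minimal sketch of why a tailored behavior policy can reduce variance, assuming the simplest one-step (bandit) setting rather than the paper's multi-step formulation. If actions are sampled from a behavior policy mu and each reward is reweighted by rho = pi(a) / mu(a), the estimator stays unbiased. Its second moment is sum_a pi(a)^2 E[R^2 | a] / mu(a), which is minimized by mu*(a) proportional to pi(a) * sqrt(E[R^2 | a]). The function names and toy rewards below are hypothetical.

```python
import numpy as np


def closed_form_behavior_policy(pi, second_moment):
    """One-step variance-minimizing behavior policy: mu(a) ∝ pi(a) * sqrt(E[R^2 | a])."""
    weights = pi * np.sqrt(second_moment)
    return weights / weights.sum()


def is_variance(pi, mu, mean, second_moment):
    """Exact variance of the single-sample estimator rho * R with a ~ mu, rho = pi(a) / mu(a)."""
    value = np.sum(pi * mean)  # J(pi), the quantity being estimated
    return np.sum(pi ** 2 * second_moment / mu) - value ** 2


# Hypothetical toy problem: two actions, unit-variance reward noise.
pi = np.array([0.5, 0.5])            # target policy
mean = np.array([1.0, 3.0])          # E[R | a]
second_moment = mean ** 2 + 1.0      # E[R^2 | a] = [2.0, 10.0]

mu_star = closed_form_behavior_policy(pi, second_moment)
var_on = is_variance(pi, pi, mean, second_moment)         # mu = pi: ordinary on-policy MC
var_star = is_variance(pi, mu_star, mean, second_moment)  # tailored behavior policy
print(mu_star, var_on, var_star)  # mu_star ≈ [0.309, 0.691], var_on = 2.0, var_star ≈ 1.236
```

Jensen's inequality gives (sum_a pi(a) sqrt(E[R^2 | a]))^2 <= sum_a pi(a) E[R^2 | a], so in this setting mu* is never worse than sampling from pi itself. The paper's closed-form policy targets the multi-step setting and is not reproduced here.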

Findings

01

Reduced variance in Monte Carlo estimators.

02

Better empirical performance across diverse environments.

03

Less offline data required.

Abstract

Most reinforcement learning practitioners evaluate their policies with online Monte Carlo estimators for either hyperparameter tuning or testing different algorithmic design choices, where the policy is repeatedly executed in the environment to get the average outcome. Such massive interactions with the environment are prohibitive in many scenarios. In this paper, we propose novel methods that improve the data efficiency of online Monte Carlo estimators while maintaining their unbiasedness. We first propose a tailored closed-form behavior policy that provably reduces the variance of an online Monte Carlo estimator. We then design efficient algorithms to learn this closed-form behavior policy from previously collected offline data. Theoretical analysis is provided to characterize how the behavior policy learning error affects the amount of reduced variance. Compared with previous works,…
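The step of learning the behavior policy from offline data can be sketched in the same hypothetical one-step setting. Here the only quantity to learn is E[R^2 | a], which is estimated by per-action averages over logged (action, reward) pairs. Fresh online episodes are then drawn from the learned policy and reweighted. The estimate therefore stays unbiased even when the learned policy is imperfect, as long as the learned policy assigns positive probability to every action that pi takes. The environment, sample sizes, and variable names are assumptions for illustration. This tabular estimate is a stand-in for the learning algorithms the paper proposes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-step environment: action a yields R ~ Normal(mean[a], 1).
pi = np.array([0.5, 0.5])               # target policy to evaluate
mean = np.array([1.0, 3.0])             # unknown to the estimator
true_value = float(np.sum(pi * mean))   # J(pi) = 2.0, used only for checking

# 1) Offline data: (action, reward) pairs logged earlier by some other policy.
#    Assumes every action appears at least once in the log.
logged_actions = rng.integers(0, 2, size=5_000)
logged_rewards = rng.normal(mean[logged_actions], 1.0)

# 2) Learn the behavior policy: estimate E[R^2 | a] per action from the log,
#    then set mu_hat(a) proportional to pi(a) * sqrt(estimate).
second_moment_hat = np.array(
    [np.mean(logged_rewards[logged_actions == a] ** 2) for a in range(len(pi))]
)
weights = pi * np.sqrt(second_moment_hat) + 1e-8  # small floor keeps mu_hat > 0
mu_hat = weights / weights.sum()

# 3) Online evaluation: n fresh episodes under mu_hat, reweighted by pi / mu_hat.
n = 100_000
actions = rng.choice(len(pi), size=n, p=mu_hat)
rewards = rng.normal(mean[actions], 1.0)
off_policy_samples = (pi[actions] / mu_hat[actions]) * rewards

# Baseline: ordinary on-policy Monte Carlo with the same episode budget.
on_actions = rng.choice(len(pi), size=n, p=pi)
on_policy_samples = rng.normal(mean[on_actions], 1.0)

print("off-policy estimate:", off_policy_samples.mean())
print("per-episode variance (off, on):", off_policy_samples.var(), on_policy_samples.var())
```

Errors in the learned second moments move mu_hat away from the optimal policy. This raises the variance but does not add bias, which mirrors the abstract's point that the behavior-policy learning error affects how much variance is removed.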

Code & Models

Repositories

shuzeliu/behavior-policy-design-for-policy-evaluation
PyTorch, official implementation

Taxonomy

Topics: Simulation Techniques and Applications