Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

Josiah P. Hanna; Peter Stone; Scott Niekum

arXiv:1606.06126 · cs.AI · September 25, 2018

TL;DR

This paper introduces two bootstrapping methods that use learned MDP transition models to estimate lower confidence bounds on policy performance in off-policy evaluation. The methods apply to both continuous and discrete state spaces, and the paper supports them with a theoretical bias analysis and empirical validation.

Contribution

The paper proposes two bootstrapping off-policy evaluation methods built on learned MDP transition models, extends model-based confidence bounds from discrete to continuous state spaces, and provides a theoretical analysis of the bias that a learned model can introduce.

Findings

01. Model-based bounds can be biased depending on model accuracy

02. Bootstrapping methods perform well with limited data in continuous spaces

03. Theoretical bias bounds help understand when model-based evaluation is reliable

Abstract

For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce…
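
The general idea of a model-based bootstrap lower bound can be illustrated with a small sketch. This is not the authors' implementation: it assumes a tabular maximum-likelihood model, a deterministic target policy, and a percentile bootstrap, and every name in it (`fit_tabular_model`, `model_value`, `bootstrap_lower_bound`, `collect_toy_data`, and the two-state toy MDP itself) is hypothetical and introduced only for illustration.

```python
import random
from collections import defaultdict


def fit_tabular_model(trajectories):
    """Maximum-likelihood transition probabilities and mean rewards
    from trajectories of (s, a, r, s_next) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for traj in trajectories:
        for s, a, r, s_next in traj:
            counts[(s, a)][s_next] += 1
            reward_sum[(s, a)] += r
            visits[(s, a)] += 1
    P = {sa: {s2: c / visits[sa] for s2, c in nxt.items()}
         for sa, nxt in counts.items()}
    R = {sa: reward_sum[sa] / visits[sa] for sa in visits}
    return P, R


def model_value(P, R, policy, start_state, horizon):
    """Finite-horizon value of a deterministic policy under the learned model."""
    states = {s for (s, _a) in P} | {s2 for nxt in P.values() for s2 in nxt}
    V = {s: 0.0 for s in states}
    for _ in range(horizon):
        new_V = {}
        for s in states:
            sa = (s, policy[s])
            if sa in P:
                new_V[s] = R[sa] + sum(p * V[s2] for s2, p in P[sa].items())
            else:
                # Assumption: unseen (state, action) pairs contribute zero value.
                new_V[s] = 0.0
        V = new_V
    return V.get(start_state, 0.0)


def bootstrap_lower_bound(trajectories, policy, start_state, horizon,
                          delta=0.05, n_boot=200, seed=0):
    """Percentile-bootstrap estimate of a (1 - delta) lower confidence bound."""
    rng = random.Random(seed)
    n = len(trajectories)
    estimates = []
    for _ in range(n_boot):
        # Resample whole trajectories with replacement, then refit the model.
        resampled = [trajectories[rng.randrange(n)] for _ in range(n)]
        P, R = fit_tabular_model(resampled)
        estimates.append(model_value(P, R, policy, start_state, horizon))
    estimates.sort()
    return estimates[min(int(delta * n_boot), n_boot - 1)]


def collect_toy_data(n_traj, horizon, seed=1):
    """Hypothetical two-state MDP sampled under a uniform-random behavior policy."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_traj):
        s, traj = 0, []
        for _ in range(horizon):
            a = rng.choice([0, 1])
            # Action 1 reaches state 1 with probability 0.8, action 0 with 0.2.
            s_next = 1 if rng.random() < (0.8 if a == 1 else 0.2) else 0
            r = 1.0 if s_next == 1 else 0.0
            traj.append((s, a, r, s_next))
            s = s_next
        data.append(traj)
    return data


toy_data = collect_toy_data(n_traj=50, horizon=5)
target_policy = {0: 1, 1: 1}  # always take action 1
print(bootstrap_lower_bound(toy_data, target_policy, start_state=0, horizon=5))
```

Because the bound is a percentile of bootstrap estimates rather than a concentration inequality, it is approximate: it trades a strict guarantee for data efficiency, and any bias in the fitted model carries through to the bound.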
