Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation
Josiah P. Hanna, Peter Stone, Scott Niekum

TL;DR
This paper introduces bootstrapping methods that use learned models to estimate lower confidence bounds on policy performance in off-policy evaluation. The methods apply to both continuous and discrete state spaces and are supported by a theoretical bias analysis and empirical validation.
Contribution
The paper proposes bootstrapping off-policy evaluation techniques that extend to continuous state spaces, derives theoretical bounds on their bias, and analyzes their effectiveness.
Findings
Model-based bounds can be biased depending on model accuracy
Bootstrapping methods perform well with limited data in continuous spaces
Theoretical bias bounds help understand when model-based evaluation is reliable
Abstract
For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains, we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce…
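As an illustration of the general idea in the abstract, the sketch below computes a percentile-bootstrap lower bound with a tabular model on a small discrete MDP. It is a minimal reading of "bootstrapping with a learned model", not the authors' algorithm: the function names, the `(s, a, r, s_next)` tuple format, the zero-reward self-loop used for unseen state-action pairs, and the simple order-statistic rule are all assumptions made for this example.

```python
import random
from collections import defaultdict


def fit_tabular_model(trajectories):
    """Maximum-likelihood reward and transition estimates from (s, a, r, s_next) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    starts = defaultdict(int)
    for traj in trajectories:
        starts[traj[0][0]] += 1  # first state of each trajectory
        for s, a, r, s_next in traj:
            counts[(s, a)][s_next] += 1
            reward_sum[(s, a)] += r
    model = {}
    for sa, nexts in counts.items():
        n = sum(nexts.values())
        model[sa] = (reward_sum[sa] / n, {s2: c / n for s2, c in nexts.items()})
    total = sum(starts.values())
    start_dist = {s: c / total for s, c in starts.items()}
    return model, start_dist


def model_value(model, start_dist, policy, states, actions, horizon):
    """Undiscounted finite-horizon policy evaluation inside the learned model."""
    v = {s: 0.0 for s in states}
    for _ in range(horizon):
        new_v = {}
        for s in states:
            total = 0.0
            for a in actions:
                # Assumption: an unseen (s, a) pair yields zero reward and a self-loop.
                r, nexts = model.get((s, a), (0.0, {s: 1.0}))
                total += policy[s][a] * (r + sum(p * v[s2] for s2, p in nexts.items()))
            new_v[s] = total
        v = new_v
    return sum(p * v[s] for s, p in start_dist.items())


def bootstrap_lower_bound(trajectories, policy, states, actions, horizon,
                          delta=0.05, n_boot=200, seed=0):
    """Approximate (1 - delta) lower bound: the delta-quantile of bootstrap estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample whole trajectories with replacement, then refit the model.
        sample = [rng.choice(trajectories) for _ in trajectories]
        model, start_dist = fit_tabular_model(sample)
        estimates.append(model_value(model, start_dist, policy, states, actions, horizon))
    estimates.sort()
    return estimates[min(int(delta * n_boot), n_boot - 1)]
```

Here `policy[s][a]` is the evaluation policy's probability of action `a` in state `s`, and `trajectories` were collected by some other behavior policy. Because every bootstrap estimate comes from a model fit to the data, the resulting bound inherits any bias in that model and is approximate rather than a strict guarantee.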
