PAC Off-Policy Prediction of Contextual Bandits
Yilong Wan, Yuqiang Li, Xianyi Wu

TL;DR
This paper introduces a PAC-valid conformal prediction method for off-policy evaluation in contextual bandits. Its prediction intervals come with finite-sample coverage guarantees that hold conditional on the offline data set, which suits safety-critical applications.
Contribution
The paper develops a PAC-valid conformal prediction algorithm for constructing off-policy prediction intervals, establishes PAC-type bounds on its coverage, and demonstrates superior empirical performance relative to existing methods.
Findings
The method achieves PAC-type coverage bounds in finite samples.
Empirical results show improved coverage accuracy over existing approaches.
Theoretical analysis confirms both finite-sample and asymptotic validity.
Abstract
This paper investigates off-policy evaluation in contextual bandits, aiming to quantify the performance of a target policy using data collected under a different and potentially unknown behavior policy. Recently, methods based on conformal prediction have been developed to construct reliable prediction intervals that guarantee marginal coverage in finite samples, making them particularly suited for safety-critical applications. To further achieve coverage conditional on a given offline data set, we propose a novel algorithm that constructs probably approximately correct prediction intervals. Our method builds upon a PAC-valid conformal prediction framework, and we strengthen its theoretical guarantees by establishing PAC-type bounds on coverage. We analyze both finite-sample and asymptotic properties of the proposed method, and compare its empirical performance with existing methods in…
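The difference between marginal coverage and coverage conditional on the data set can be illustrated with a minimal sketch. This is not the paper's algorithm: it is the standard split-conformal calibration step for i.i.d. scores, with no correction for the shift between behavior and target policy. The names `alpha` (miscoverage level), `delta` (failure probability over the draw of the calibration data), and all function names are assumptions introduced for illustration. The sketch relies on the fact that, for continuous i.i.d. scores, the coverage of the k-th smallest calibration score follows a Beta(k, n - k + 1) distribution, so the probability that coverage is at least 1 - alpha equals P(Binomial(n, 1 - alpha) <= k - 1).

```python
import math
import random


def binom_pmf(j, n, p):
    """P(Binomial(n, p) == j) for 0 < p < 1, computed in log space for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
               + j * math.log(p) + (n - j) * math.log(1 - p))
    return math.exp(log_pmf)


def marginal_rank(n, alpha):
    """Rank used by ordinary split conformal prediction (marginal coverage).

    A value above n means the threshold would be infinite.
    """
    return math.ceil((n + 1) * (1 - alpha))


def pac_rank(n, alpha, delta):
    """Smallest rank k whose order statistic has coverage >= 1 - alpha with
    probability >= 1 - delta over the calibration draw; None if no rank does."""
    cdf = 0.0
    for k in range(1, n + 1):
        # After this update, cdf == P(Binomial(n, 1 - alpha) <= k - 1).
        cdf += binom_pmf(k - 1, n, 1 - alpha)
        if cdf >= 1 - delta:
            return k
    return None


def pac_threshold(scores, alpha, delta):
    """Score threshold for a PAC-style interval; infinite if n is too small."""
    k = pac_rank(len(scores), alpha, delta)
    if k is None:
        return math.inf
    return sorted(scores)[k - 1]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical nonconformity scores, e.g. absolute residuals of a reward model.
    scores = [abs(rng.gauss(0.0, 1.0)) for _ in range(500)]
    print("marginal rank:", marginal_rank(len(scores), 0.1))
    print("PAC rank:", pac_rank(len(scores), 0.1, 0.05))
    print("PAC threshold:", pac_threshold(scores, 0.1, 0.05))
```

The PAC rank is never smaller than the rank that targets marginal coverage alone, so the resulting interval is wider. That extra width is the price of a guarantee that holds for the particular data set in hand rather than on average over data sets.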
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research
