PAC Off-Policy Prediction of Contextual Bandits

Yilong Wan; Yuqiang Li; Xianyi Wu

arXiv:2507.16236·stat.ML·July 23, 2025

Open Access

TL;DR

This paper introduces a PAC-valid conformal prediction method for off-policy evaluation in contextual bandits. Its prediction intervals carry finite-sample coverage guarantees that hold conditional on the offline data set rather than only marginally, which makes them suited to safety-critical applications.

Contribution

The paper develops a PAC-valid conformal prediction algorithm, establishes PAC-type bounds on its coverage, and reports better empirical coverage than existing methods.

Findings

01. The method achieves PAC-type coverage bounds in finite samples.

02. Empirical results show improved coverage accuracy over existing approaches.

03. Theoretical analysis confirms both finite-sample and asymptotic validity.
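
For readers unfamiliar with the distinction between marginal and PAC-type coverage, the two guarantees can be written side by side. The notation here is generic and assumed for illustration, not taken from the paper: $\mathcal{D}$ is the offline data set, $(X, Y)$ is a fresh context and outcome under the target policy, $\hat{C}_{\mathcal{D}}$ is the interval built from $\mathcal{D}$, $\alpha$ is the miscoverage level, and $\delta$ is the confidence parameter.

```latex
% Marginal coverage: probability taken jointly over the data set and the test point.
\mathbb{P}\bigl( Y \in \hat{C}_{\mathcal{D}}(X) \bigr) \;\ge\; 1 - \alpha

% PAC (data-conditional) coverage: with probability at least 1 - \delta over the
% draw of the data set, the interval built from it covers at level 1 - \alpha.
\mathbb{P}_{\mathcal{D}}\Bigl( \mathbb{P}\bigl( Y \in \hat{C}_{\mathcal{D}}(X) \,\big|\, \mathcal{D} \bigr) \ge 1 - \alpha \Bigr) \;\ge\; 1 - \delta
```

The second statement is the stronger one for a practitioner who holds a single fixed data set, since it bounds the chance that this particular data set yields an under-covering interval.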

Abstract

This paper investigates off-policy evaluation in contextual bandits, aiming to quantify the performance of a target policy using data collected under a different and potentially unknown behavior policy. Recently, methods based on conformal prediction have been developed to construct reliable prediction intervals that guarantee marginal coverage in finite samples, making them particularly suited for safety-critical applications. To further achieve coverage conditional on a given offline data set, we propose a novel algorithm that constructs probably approximately correct prediction intervals. Our method builds upon a PAC-valid conformal prediction framework, and we strengthen its theoretical guarantees by establishing PAC-type bounds on coverage. We analyze both finite-sample and asymptotic properties of the proposed method, and compare its empirical performance with existing methods in…
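
To make the difference between marginal and PAC-style calibration concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes the plain i.i.d. split-conformal setting with continuous scores and ignores the shift between behavior and target policy that off-policy methods must correct for. The function names (`binom_cdf`, `marginal_rank`, `pac_rank`, `threshold`) are hypothetical. The sketch relies on the standard fact that, for i.i.d. continuous scores, the conditional coverage of the k-th smallest calibration score is at least 1 - alpha exactly when at most k - 1 of n uniform draws fall below 1 - alpha.

```python
import math


def binom_cdf(k, n, p):
    """P(Binomial(n, p) <= k) for 0 < p < 1, summed in log space for stability."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for j in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def marginal_rank(n, alpha):
    """Rank used by standard split conformal prediction (marginal coverage)."""
    k = math.ceil((n + 1) * (1 - alpha))
    return k if k <= n else None  # None: n too small, interval is unbounded


def pac_rank(n, alpha, delta):
    """Smallest rank k whose threshold covers at level 1 - alpha
    with probability at least 1 - delta over the calibration draw."""
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, 1 - alpha) >= 1 - delta:
            return k
    return None  # None: n too small for the requested (alpha, delta)


def threshold(scores, k):
    """k-th smallest calibration score (1-indexed)."""
    return sorted(scores)[k - 1]


if __name__ == "__main__":
    n, alpha, delta = 1000, 0.1, 0.05
    print("marginal rank:", marginal_rank(n, alpha))
    print("PAC rank:     ", pac_rank(n, alpha, delta))
```

With the same calibration scores, the PAC rank is never smaller than the marginal rank for small delta, so the resulting interval is wider. That extra width is the price of a guarantee that holds for the data set actually in hand.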

Taxonomy

Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research