PAC-Bayesian Offline Contextual Bandits With Guarantees

Otmane Sakhi; Pierre Alquier; Nicolas Chopin

arXiv:2210.13132 · stat.ML · May 30, 2023 · 1 citation

TL;DR

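This paper presents a PAC-Bayesian framework for off-policy learning in offline contextual bandits. It derives generalization bounds that are proven tighter than competing ones, optimizes them directly to improve on the logging policy with guarantees, and requires no additional hyperparameters tuned on held-out data. Extensive experiments show that the approach provides performance guarantees in practical scenarios.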

Contribution

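It introduces a PAC-Bayesian approach to offline contextual bandits that interprets policies as mixtures of decision rules. This yields tighter generalization bounds and tractable algorithms to optimize them, rather than learning principles derived from intractable or loose bounds as in previous work.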

Findings

01. Tighter generalization bounds than existing methods

02. Algorithms that optimize bounds directly for policy improvement

03. Effective in practical scenarios with performance guarantees

Abstract

This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the PAC-Bayesian lens, interpreting policies as mixtures of decision rules. This allows us to propose novel generalization bounds and provide tractable algorithms to optimize them. We prove that the derived bounds are tighter than their competitors, and can be optimized directly to confidently improve upon the logging policy offline. Our approach learns policies with guarantees, uses all available data and does not require tuning additional hyperparameters on held-out sets. We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios.
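
To make "policies as mixtures of decision rules" concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical setup in which each decision rule is a linear argmax rule with weights W, and the distribution Q over rules is a Gaussian centred on `mean_weights` with identity covariance. The function name `mixture_policy_probs` and all of its parameters are illustrative assumptions. The induced stochastic policy gives each action the probability that a rule drawn from Q selects it, estimated here by Monte Carlo sampling.

```python
import numpy as np


def mixture_policy_probs(x, mean_weights, n_samples=2000, seed=0):
    """Monte Carlo estimate of pi_Q(a | x) = P_{W ~ Q}[argmax_k (W x)_k = a].

    Assumption (illustrative only): Q = N(mean_weights, I), and each sampled
    W defines a deterministic decision rule x -> argmax_k (W x)_k.
    """
    rng = np.random.default_rng(seed)
    n_actions, dim = mean_weights.shape
    # Draw n_samples decision rules from Q.
    noise = rng.standard_normal((n_samples, n_actions, dim))
    # Score every action under every sampled rule: shape (n_samples, n_actions).
    scores = (mean_weights + noise) @ x
    # Each sampled rule deterministically picks one action.
    chosen = scores.argmax(axis=1)
    # The policy is the fraction of sampled rules that pick each action.
    return np.bincount(chosen, minlength=n_actions) / n_samples


# Hypothetical example: 3 actions, 2 context features.
mean_weights = np.array([[5.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
context = np.array([1.0, 0.0])
probs = mixture_policy_probs(context, mean_weights)
print(probs)
```

The returned vector is a valid probability distribution over actions. Shifting `mean_weights` changes the mixture Q and therefore the policy, which is the object a PAC-Bayesian bound would be stated over.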

Taxonomy

Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Machine Learning and Data Classification