Refined PAC-Bayes Bounds for Offline Bandits

Amaury Gouverneur, Tobias J. Oechtering, and Mikael Skoglund

arXiv:2502.11953 · stat.ML · February 18, 2025

TL;DR

This paper refines PAC-Bayesian bounds on empirical reward estimates for off-policy learning in bandit problems. It optimizes the "in probability" parameter with a discretization technique, which yields parameter-free bounds that are almost optimal.

Contribution

It develops two parameter-free PAC-Bayes bounds for offline bandits by applying the discretization-based parameter optimization approach introduced by Rodríguez et al. (2024).

Findings

01

The bounds are almost optimal: they recover the same rate as would be obtained by setting the "in probability" parameter after the realization of the data.

02

Two bounds are provided: one based on Hoeffding-Azuma's inequality, the other on Bernstein's inequality.

03

The bounds improve upon previous results by Seldin et al. (2010).

Abstract

In this paper, we present refined probabilistic bounds on empirical reward estimates for off-policy learning in bandit problems. We build on the PAC-Bayesian bounds from Seldin et al. (2010) and improve on their results using a new parameter optimization approach introduced by Rodríguez et al. (2024). This technique is based on a discretization of the space of possible events to optimize the "in probability" parameter. We provide two parameter-free PAC-Bayes bounds, one based on Hoeffding-Azuma's inequality and the other based on Bernstein's inequality. We prove that our bounds are almost optimal as they recover the same rate as would be obtained by setting the "in probability" parameter after the realization of the data.
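
For readers unfamiliar with the setting, the following is a minimal sketch of what an "empirical reward estimate" for off-policy learning can look like. It assumes the standard importance-weighted estimator, which this page does not spell out; the function names `ips_reward_estimate` and `simulate_logs`, the Bernoulli rewards, and the data layout are all hypothetical choices made for illustration. It does not implement the paper's bounds.

```python
import random


def ips_reward_estimate(logged, target_policy):
    """Importance-weighted estimate of a target policy's expected reward.

    Assumption (not taken from the paper): `logged` is a list of
    (action, reward, behavior_prob) tuples collected under a known
    behavior policy, and `target_policy` maps each action to its
    probability under the policy being evaluated.
    """
    if not logged:
        raise ValueError("need at least one logged round")
    total = 0.0
    for action, reward, behavior_prob in logged:
        # Reweight each observed reward by how much more (or less) likely
        # the target policy is to play the logged action.
        weight = target_policy.get(action, 0.0) / behavior_prob
        total += weight * reward
    return total / len(logged)


def simulate_logs(n, behavior_policy, mean_rewards, seed=0):
    """Hypothetical helper: draw n rounds of Bernoulli-reward bandit data."""
    rng = random.Random(seed)
    actions = list(behavior_policy)
    probs = [behavior_policy[a] for a in actions]
    logs = []
    for _ in range(n):
        a = rng.choices(actions, weights=probs, k=1)[0]
        r = 1.0 if rng.random() < mean_rewards[a] else 0.0
        logs.append((a, r, behavior_policy[a]))
    return logs


if __name__ == "__main__":
    behavior = {0: 0.5, 1: 0.5}   # uniform logging policy
    means = {0: 0.8, 1: 0.2}      # true (unknown) mean rewards
    target = {0: 0.9, 1: 0.1}     # policy we want to evaluate offline
    logs = simulate_logs(20000, behavior, means)
    print(ips_reward_estimate(logs, target))
```

Bounds of the kind described in the abstract control, with high probability, how far such an empirical estimate can be from the true expected reward of the target policy.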

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms