PAC-Bayesian Analysis of Martingales and Multiarmed Bandits
Yevgeny Seldin, François Laviolette, John Shawe-Taylor, Jan Peters, and Peter Auer

TL;DR
This paper introduces new PAC-Bayesian methods for analyzing dependent sequences like martingales and multiarmed bandits, expanding the theoretical toolkit for reinforcement learning and related fields.
Contribution
It presents two novel PAC-Bayesian approaches for dependent variables and applies them to derive generalization and regret bounds in bandit problems.
Findings
New lemma for bounding expectations of dependent variables
Integration of Hoeffding-Azuma with PAC-Bayesian analysis
Expanded applicability of PAC-Bayesian methods in reinforcement learning
Abstract
We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that makes it possible to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative to the Hoeffding-Azuma inequality for bounding the concentration of martingale values. Our second approach is based on integrating the Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situations of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the…
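For readers unfamiliar with the Hoeffding-Azuma inequality mentioned in the abstract, the following is a minimal numerical sketch and is not taken from the paper. It assumes the standard two-sided form of the inequality, P(|M_n| >= t) <= 2 exp(-t^2 / (2 n c^2)), for a martingale whose increments are bounded by c in absolute value. The toy martingale and the names `azuma_bound`, `simulate_martingale`, and `empirical_tail` are hypothetical, chosen only for illustration.

```python
import math
import random


def azuma_bound(n, c, t):
    """Two-sided Hoeffding-Azuma bound on P(|M_n| >= t), increments bounded by c."""
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * n * c * c)))


def simulate_martingale(n, rng):
    """Return M_n for one path of a toy martingale with dependent increments.

    Each increment is +s or -s with equal probability, where the scale s
    (1.0 or 0.5) depends on the sign of the running sum. The increments are
    therefore dependent, but each has zero conditional mean and |Z_i| <= 1.
    """
    m = 0.0
    for _ in range(n):
        s = 1.0 if m <= 0 else 0.5  # step size depends on the past
        m += s if rng.random() < 0.5 else -s
    return m


def empirical_tail(n, t, trials, seed=0):
    """Monte Carlo estimate of P(|M_n| >= t) over the given number of trials."""
    rng = random.Random(seed)
    hits = sum(abs(simulate_martingale(n, rng)) >= t for _ in range(trials))
    return hits / trials


n, t = 100, 25.0
emp = empirical_tail(n, t, trials=5000)
bound = azuma_bound(n, 1.0, t)
print(f"empirical tail: {emp:.4f}, Hoeffding-Azuma bound: {bound:.4f}")
```

The empirical tail frequency should fall below the bound, which holds even though the increments are not independent. The bound uses only the worst-case increment size, which is typically why it is loose for a process like this one whose steps are often smaller than c.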
Taxonomy
Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Stochastic processes and financial applications
