Profitable Bandits
Mastane Achab, Stephan Clémençon, Aurélien Garivier

TL;DR
This paper introduces the profitable bandit problem, analyzing and comparing three strategies—kl-UCB, Bayes-UCB, and Thompson Sampling—for maximizing earnings, providing theoretical regret bounds and empirical performance insights.
Contribution
It formulates the profitable bandit problem, proves asymptotic optimality of three strategies, and compares their theoretical and empirical performances.
Findings
Thompson Sampling shows a slight practical advantage.
All three strategies are asymptotically optimal.
Simple proofs highlight similarities and differences.
Abstract
Originally motivated by default risk management applications, this paper investigates a novel problem, referred to here as the profitable bandit problem. At each step, an agent chooses a subset of the K possible actions. For each action chosen, she then receives the sum of a random number of rewards. Her objective is to maximize her cumulated earnings. For this purpose, we adapt and study three well-known strategies that were proved to be most efficient in other settings: kl-UCB, Bayes-UCB and Thompson Sampling. For each of them, we prove a finite-time regret bound which, together with a lower bound we also obtain, establishes asymptotic optimality. We also aim to compare these three strategies from both a theoretical and an empirical perspective. We give simple, self-contained proofs that emphasize their similarities, as well as their differences. While both…
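The setting described in the abstract (choose a subset of actions, then collect a random number of rewards for each) can be illustrated with a small Thompson Sampling sketch. This is not the authors' implementation. It assumes Bernoulli rewards, uniform Beta(1, 1) priors, a single break-even level `threshold` that decides whether an action is worth choosing, and a reward count drawn uniformly from 1 to `max_rewards`. None of these specifics appear in this summary, and the names `thompson_profitable`, `threshold` and `max_rewards` are hypothetical.

```python
import random


def thompson_profitable(means, threshold, horizon, max_rewards=3, seed=0):
    """Thompson Sampling sketch for a subset-selection bandit.

    means[k]    : true success probability of action k (hidden from the agent)
    threshold   : ASSUMED break-even level; a reward x earns x - threshold
    horizon     : number of decision steps
    max_rewards : ASSUMED cap on the random number of rewards per chosen action
    """
    rng = random.Random(seed)
    num_actions = len(means)
    successes = [0] * num_actions
    failures = [0] * num_actions
    pulls = [0] * num_actions
    earnings = 0.0

    for _ in range(horizon):
        # Draw one posterior sample per action from Beta(1 + s, 1 + f) and
        # keep every action whose sample looks profitable.
        chosen = [
            k for k in range(num_actions)
            if rng.betavariate(1 + successes[k], 1 + failures[k]) > threshold
        ]
        for k in chosen:
            pulls[k] += 1
            # Each chosen action returns a random number of Bernoulli rewards.
            for _ in range(rng.randint(1, max_rewards)):
                x = 1 if rng.random() < means[k] else 0
                successes[k] += x
                failures[k] += 1 - x
                earnings += x - threshold

    return pulls, earnings


if __name__ == "__main__":
    pulls, earnings = thompson_profitable([0.9, 0.1], threshold=0.5, horizon=2000)
    print("times each action was chosen:", pulls)
    print("cumulated earnings:", round(earnings, 1))
```

Under these assumptions the agent should end up choosing the first action at almost every step and the second only rarely, since their posteriors concentrate on opposite sides of the break-even level.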
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Risk and Portfolio Optimization
