An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund

TL;DR
This paper analyzes the performance of Thompson Sampling for logistic bandits using an information-theoretic approach, providing new regret bounds that depend logarithmically on the slope parameter and are independent of the number of actions.
Contribution
It establishes a novel bound on the information ratio for logistic bandits. This bound yields the first regret bounds that depend only logarithmically on the slope parameter and do not depend on the number of actions.
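The standard Russo and Van Roy (2016) argument turns an information-ratio bound into a regret bound. It is sketched below in assumed notation: $A_t$ is the action played at round $t$, $Y_t$ the observed reward, $A^\star$ the optimal action, $\mu(a,\theta)$ the expected reward, and $\mathbb{E}_t$, $I_t$ the expectation and mutual information conditioned on the history. This generic form relies on the entropy $H(A^\star)$, so it assumes a finite action set. It is not the paper's own derivation for continuous action sets.

```latex
\Gamma_t = \frac{\big(\mathbb{E}_t[\mu(A^\star,\theta) - \mu(A_t,\theta)]\big)^2}{I_t\big(A^\star; (A_t, Y_t)\big)},
\qquad
\Gamma_t \le \bar{\Gamma}\ \ \forall t
\;\Longrightarrow\;
\mathbb{E}[\mathrm{Regret}(T)] \le \sqrt{\bar{\Gamma}\, H(A^\star)\, T}.

% Plugging in \bar{\Gamma} = \tfrac{9}{2} d \alpha^{-2}:
\mathbb{E}[\mathrm{Regret}(T)]
\le \frac{3}{\alpha}\sqrt{\frac{d\, H(A^\star)\, T}{2}}
\le \frac{3}{\alpha}\sqrt{\frac{d\, T \log|\mathcal{A}|}{2}}.
```

The last step uses $H(A^\star) \le \log|\mathcal{A}|$, which grows with the number of actions. The paper's stated bound avoids that dependence.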
Findings
Bound on the information ratio: $\tfrac{9}{2} d\,\alpha^{-2}$
Regret bound of order $O\big(\tfrac{d}{\alpha}\sqrt{T \log(\beta T / d)}\big)$
Regret of order $\tilde{O}(d\sqrt{T})$ when the action space contains the parameter space
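The helper below sketches how the order-level regret expression scales with the slope parameter. It assumes the constants hidden in $O(\cdot)$ can be dropped, so only ratios between values are meaningful, and it assumes $\beta T / d > 1$ so the logarithm is positive. The function names are hypothetical.

```python
import math


def info_ratio_bound(d: int, alpha: float) -> float:
    """Information-ratio bound 9/2 * d * alpha**-2 as stated in the findings."""
    return 4.5 * d / alpha**2


def regret_order(d: int, alpha: float, beta: float, T: int) -> float:
    """Order-level regret expression (d / alpha) * sqrt(T * log(beta * T / d)).

    Constants hidden in the O(.) are dropped, so only ratios between
    values of this function are meaningful, not the absolute numbers.
    Assumes beta * T / d > 1 so that the logarithm is positive.
    """
    ratio = beta * T / d
    if ratio <= 1:
        raise ValueError("expression assumes beta * T / d > 1")
    return (d / alpha) * math.sqrt(T * math.log(ratio))


if __name__ == "__main__":
    d, alpha, T = 10, 1.0, 100_000
    print(f"information-ratio bound: {info_ratio_bound(d, alpha):.1f}")
    base = regret_order(d, alpha, beta=1.0, T=T)
    for beta in [1.0, 10.0, 100.0, 1000.0]:
        r = regret_order(d, alpha, beta, T)
        # A 10x increase in beta only adds log(10) inside the square root.
        print(f"beta={beta:>7.1f}  relative regret-order={r / base:.3f}")
```

With $d = 10$, $\alpha = 1$ and $T = 10^5$, each tenfold increase of $\beta$ adds $\log 10$ inside the square root. The relative values for $\beta = 10, 100, 1000$ are therefore $\sqrt{5/4}$, $\sqrt{6/4}$ and $\sqrt{7/4}$, which is the logarithmic dependence on $\beta$ described above.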
Abstract
We study the performance of the Thompson Sampling algorithm for logistic bandit problems. In this setting, an agent receives binary rewards with probabilities determined by a logistic function, $\exp(\beta\langle a,\theta\rangle)/(1+\exp(\beta\langle a,\theta\rangle))$, with slope parameter $\beta > 0$, and where both the action $a$ and the parameter $\theta$ lie within the $d$-dimensional unit ball. Adopting the information-theoretic framework introduced by Russo and Van Roy (2016), we analyze the information ratio, a statistic that quantifies the trade-off between the immediate regret incurred and the information gained about the optimal action. We improve upon previous results by establishing that the information ratio is bounded by $\tfrac{9}{2} d\,\alpha^{-2}$, where $\alpha$ is a minimax measure of the alignment between the action space …
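Below is a minimal sketch of Thompson Sampling in this setting. It is not the authors' implementation. It assumes the logistic link $\sigma(\beta\langle a,\theta\rangle)$ with $\sigma(z) = 1/(1+e^{-z})$, a finite action set, and a prior supported on finitely many candidate parameters, so the posterior can be updated exactly. The paper allows continuous action and parameter sets, where the posterior has no closed form. The helper `ts_information_ratio` computes the one-round information ratio $(\text{expected regret})^2 / I(A^\star; (A, Y))$ under these assumptions. All function names are hypothetical.

```python
import numpy as np


def logistic(z):
    """Standard logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def random_unit_vectors(n, d, rng):
    """Hypothetical helper: n points drawn uniformly on the unit sphere in R^d."""
    x = rng.normal(size=(n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), elementwise."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))


def thompson_sampling(actions, thetas, prior, theta_true, beta, T, rng):
    """Thompson Sampling with an exact posterior over finitely many candidate parameters.

    actions: (K, d) array, thetas: (M, d) array, prior: (M,) probabilities.
    Returns the cumulative expected regret after each round.
    """
    log_post = np.log(np.asarray(prior, dtype=float))
    mean_true = logistic(beta * actions @ theta_true)  # P(Y = 1 | a, theta_true)
    best = mean_true.max()
    cum_regret = np.empty(T)
    total = 0.0
    for t in range(T):
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        # Sample a parameter from the posterior ...
        theta_hat = thetas[rng.choice(len(thetas), p=post)]
        # ... and act greedily for it; sigma is increasing and beta > 0,
        # so maximizing <a, theta_hat> maximizes the success probability.
        a = int(np.argmax(actions @ theta_hat))
        y = rng.random() < mean_true[a]  # binary reward
        # Exact Bayes update: add the log-likelihood of y under every candidate,
        # using log sigma(z) = -logaddexp(0, -z) for numerical stability.
        z = beta * (thetas @ actions[a])
        log_post += -np.logaddexp(0.0, -z) if y else -np.logaddexp(0.0, z)
        total += best - mean_true[a]
        cum_regret[t] = total
    return cum_regret


def ts_information_ratio(actions, thetas, post, beta):
    """One-round information ratio of TS: (expected regret)^2 / I(A*; (A, Y)).

    Under TS the played action A has the same law as A* and is independent
    of it, so I(A*; (A, Y)) = sum_a P(A=a) I(A*; Y | A=a).
    Returns nan when the posterior already pins down A* (0 / 0).
    """
    means = logistic(beta * thetas @ actions.T)  # (M, K): P(Y=1 | theta, a)
    opt = means.argmax(axis=1)                   # optimal action for each theta
    p_star = np.bincount(opt, weights=post, minlength=len(actions))  # P(A* = a)
    mbar = post @ means                          # P(Y=1 | A=a), theta marginalized
    regret = post @ means.max(axis=1) - p_star @ mbar
    info = 0.0
    for a_star in np.flatnonzero(p_star > 0):
        mask = opt == a_star
        m_cond = post[mask] @ means[mask] / p_star[a_star]  # P(Y=1 | A=a, A*=a_star)
        info += p_star[a_star] * (p_star @ bernoulli_kl(m_cond, mbar))
    return regret**2 / info if info > 0 else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, K, M, beta, T = 3, 20, 50, 5.0, 500
    actions = random_unit_vectors(K, d, rng)
    thetas = random_unit_vectors(M, d, rng)
    prior = np.full(M, 1.0 / M)
    regret = thompson_sampling(actions, thetas, prior, thetas[0], beta, T, rng)
    print(f"cumulative regret after {T} rounds: {regret[-1]:.2f}")
    ratio = ts_information_ratio(actions, thetas, prior, beta)
    print(f"information ratio at the prior: {ratio:.3f}")
```

The finite prior keeps the Bayes update exact and the mutual information computable in closed form. With continuous parameter sets, one would instead need an approximate posterior sampler, which this sketch does not attempt.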
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and ELM
