An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund

arXiv:2412.02861·stat.ML·February 21, 2025

Open Access

TL;DR

This paper analyzes the performance of Thompson Sampling for logistic bandits using an information-theoretic approach, providing new regret bounds that depend logarithmically on the slope parameter and are independent of the number of actions.

Contribution

It establishes a novel bound on the information ratio for logistic bandits, leading to the first regret bounds with logarithmic dependence on the slope parameter and independence from action count.

Findings

01

Bound on the information ratio: $\frac{9}{2} d α^{-2}$

02

Regret bound of order $O\left(\frac{d}{α} \sqrt{T \log(β T / d)}\right)$

03

Regret is of order $\tilde{O}(d \sqrt{T})$ when the action space includes the parameter space
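
To get a feel for how the quantities in the findings scale, the expressions can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names are hypothetical, the constants hidden by the $O(\cdot)$ notation are omitted, and it assumes $α > 0$ and $β T / d > 1$ so that the logarithm is positive.

```python
import math


def information_ratio_bound(d, alpha):
    """Evaluate the stated information-ratio bound (9/2) * d * alpha**(-2)."""
    if alpha <= 0:
        raise ValueError("alpha is assumed to be positive")
    return 4.5 * d / alpha**2


def regret_rate(d, alpha, beta, T):
    """Evaluate the order term (d / alpha) * sqrt(T * log(beta * T / d)).

    This is only the growth rate; the constant factor hidden by O() is
    not specified here, so the value is not an actual regret bound.
    """
    ratio = beta * T / d
    if alpha <= 0 or ratio <= 1:
        raise ValueError("assumes alpha > 0 and beta * T / d > 1")
    return (d / alpha) * math.sqrt(T * math.log(ratio))


if __name__ == "__main__":
    # Illustrative values only: the rate grows with log(beta), not with beta.
    for beta in (10.0, 100.0, 1000.0):
        print(beta, regret_rate(d=5, alpha=1.0, beta=beta, T=10_000))
```

Running the loop shows the rate increasing slowly as the slope parameter grows by orders of magnitude, which is what a logarithmic dependence on $β$ means in practice.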

Abstract

We study the performance of the Thompson Sampling algorithm for logistic bandit problems. In this setting, an agent receives binary rewards with probabilities determined by a logistic function, $\exp(β ⟨a, θ⟩) / (1 + \exp(β ⟨a, θ⟩))$, with slope parameter $β > 0$, and where both the action $a \in \mathcal{A}$ and parameter $θ \in \mathcal{O}$ lie within the $d$-dimensional unit ball. Adopting the information-theoretic framework introduced by Russo and Van Roy (2016), we analyze the information ratio, a statistic that quantifies the trade-off between the immediate regret incurred and the information gained about the optimal action. We improve upon previous results by establishing that the information ratio is bounded by $\frac{9}{2} d α^{-2}$, where $α$ is a minimax measure of the alignment between the action space $\mathcal{A}$ …
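
The reward model described in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names `reward_probability` and `sample_reward` are hypothetical, and vectors are represented as plain Python lists assumed to lie in the unit ball.

```python
import math
import random


def reward_probability(action, theta, beta):
    """Return exp(beta * <a, theta>) / (1 + exp(beta * <a, theta>))."""
    inner = sum(a_i * t_i for a_i, t_i in zip(action, theta))
    z = beta * inner
    # Branch on the sign of z so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_reward(action, theta, beta, rng):
    """Draw a binary reward (0 or 1) with the logistic success probability."""
    return 1 if rng.random() < reward_probability(action, theta, beta) else 0


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0, 0.0]          # hypothetical unknown parameter
    aligned = [1.0, 0.0]        # action aligned with theta
    orthogonal = [0.0, 1.0]     # action orthogonal to theta
    for beta in (1.0, 5.0, 25.0):
        print(beta,
              reward_probability(aligned, theta, beta),
              reward_probability(orthogonal, theta, beta))
    print(sample_reward(aligned, theta, 5.0, rng))
```

A larger slope $β$ pushes the success probability of a well-aligned action toward 1, while an action orthogonal to $θ$ always succeeds with probability 1/2.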

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and ELM
