When and why randomised exploration works (in linear bandits)

Marc Abeille; David Janz; Ciara Pike-Burke

arXiv:2502.08870 · cs.LG · February 14, 2025


TL;DR

This paper analyzes when and why randomized exploration algorithms such as Thompson sampling are effective in linear bandits, proving new regret bounds without relying on forced optimism or posterior inflation.

Contribution

It introduces a new analysis of randomized exploration in linear bandits and uses it to show that Thompson sampling attains regret with optimal dimension dependence when the action space is smooth and strongly convex.

Findings

01

Thompson sampling achieves $O(d\,\sqrt{n}\log(n))$ regret when the action space is smooth and strongly convex.

02

First demonstration of optimal dimension dependence for Thompson sampling in non-trivial linear bandit settings.

03

Analysis does not depend on forced optimism or posterior inflation techniques.
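For reference, the regret in these findings is usually defined within the standard linear bandit model below. The notation ($\theta_\star$, $\mathcal{X}$, $X_t$, $\eta_t$, $R_n$) is our own assumption and may differ from the paper's. "Optimal dimension dependence" is read against the classical $\Omega(d\sqrt{n})$ lower bound for linear bandits on the unit ball, which the stated upper bound matches up to the logarithmic factor.

```latex
% Standard linear bandit model (notation assumed, not taken from the paper)
Y_t = \langle \theta_\star, X_t \rangle + \eta_t,
\qquad X_t \in \mathcal{X} \subset \mathbb{R}^d,
\\[4pt]
R_n = \sum_{t=1}^{n} \Big( \max_{x \in \mathcal{X}} \langle \theta_\star, x \rangle
      - \langle \theta_\star, X_t \rangle \Big)
    = O\big(d\sqrt{n}\,\log n\big)
\quad \text{when } \mathcal{X} \text{ is smooth and strongly convex.}
```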

Abstract

We provide an approach for the analysis of randomised exploration algorithms like Thompson sampling that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the $d$-dimensional linear bandit setting, when the action space is smooth and strongly convex, randomised exploration algorithms enjoy an $n$-step regret bound of the order $O(d\sqrt{n}\log(n))$. Notably, this shows for the first time that there exist non-trivial linear bandit settings where Thompson sampling can achieve optimal dimension dependence in the regret.
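To make the setting concrete, here is a minimal sketch of linear Thompson sampling on the unit ball in $\mathbb{R}^d$, a set that is both smooth and strongly convex. It is an illustration under our own assumptions (Gaussian reward noise, a ridge-regression estimate with regulariser `lam`, and parameter samples drawn from $\mathcal{N}(\hat\theta_t, V_t^{-1})$ with no inflation factor), not the authors' algorithm or code. The function name `linear_thompson_sampling` and all of its parameters are hypothetical.

```python
import numpy as np


def linear_thompson_sampling(theta_star, n, noise_sd=0.1, lam=1.0, seed=0):
    """Run linear Thompson sampling on the unit ball and return cumulative regret.

    Illustrative assumptions (not from the paper): Gaussian noise with std
    `noise_sd`, ridge regression with regulariser `lam`, and samples from
    N(theta_hat, V^{-1}) with no posterior inflation.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    V = lam * np.eye(d)                 # regularised design matrix V_t
    b = np.zeros(d)                     # running sum of reward * action
    best = np.linalg.norm(theta_star)   # max of <theta_star, x> over the unit ball
    regret = np.empty(n)
    total = 0.0
    for t in range(n):
        theta_hat = np.linalg.solve(V, b)           # ridge estimate
        # Sample theta_tilde ~ N(theta_hat, V^{-1}): with V = L L^T,
        # L^{-T} z has covariance (L L^T)^{-1} = V^{-1} for z ~ N(0, I).
        L = np.linalg.cholesky(V)
        z = rng.standard_normal(d)
        theta_tilde = theta_hat + np.linalg.solve(L.T, z)
        # Greedy action for the sampled parameter; on the unit ball the
        # maximiser of <theta_tilde, x> is theta_tilde / ||theta_tilde||.
        x = theta_tilde / np.linalg.norm(theta_tilde)
        reward = x @ theta_star + noise_sd * rng.standard_normal()
        V += np.outer(x, x)
        b += reward * x
        total += best - x @ theta_star              # per-step pseudo-regret
        regret[t] = total
    return regret
```

For example, `linear_thompson_sampling(np.array([0.6, -0.3, 0.5, 0.2, -0.4]), n=2000)` returns the cumulative regret curve, which can be plotted against $n$ to eyeball its growth. The unit ball is used here because its greedy step has a closed form. A polytope action set would instead need a linear-programming oracle, and it is neither smooth nor strongly convex, so it falls outside the setting described in the abstract.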

Peer Reviews

Decision · ALT 2025

Reviewer 01 · Rating: Accept · Confidence: 4

Reviewer 02 · Rating: 7 · Confidence: 3

Reviewer 03 · Rating: 8 · Confidence: 4

Reviewer 04 · Rating: 7 · Confidence: 4


Taxonomy

Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Auction Theory and Applications