Bandits with Mean Bounds

Nihal Sharma; Soumya Basu; Karthikeyan Shanmugam; Sanjay Shakkottai

arXiv:2002.08405 · cs.LG · October 29, 2024 · 1 citation

Open Access

TL;DR

This paper introduces bandit algorithms that exploit side information in the form of bounds on each arm's mean. The bounds yield tighter estimates of the arms' subgaussian factors, which the algorithms use to reduce exploration, with regret upper bounds that are never worse than those of standard OFUL and UCB.

Contribution

It develops two algorithms that use mean bounds to reduce regret: R-OFUL (Restricted-set OFUL) for the linear setting, and GLUE (Global Under-Explore), a non-optimistic algorithm for the stochastic setting.

Findings

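01. The regret upper bounds of R-OFUL and GLUE are never worse than those of standard OFUL and UCB, respectively.

02. The algorithms adapt their exploration rates using subgaussian estimates derived from the mean bounds.

03. The approach is applicable to learning from confounded logs.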

Abstract

We study a variant of the bandit problem where side information in the form of bounds on the mean of each arm is provided. We prove that these translate to tighter estimates of subgaussian factors and develop novel algorithms that exploit these estimates. In the linear setting, we present the Restricted-set OFUL (R-OFUL) algorithm that additionally uses the geometric properties of the problem to (potentially) restrict the set of arms being played and reduce exploration rates for suboptimal arms. In the stochastic case, we propose the non-optimistic Global Under-Explore (GLUE) algorithm which employs the inferred subgaussian estimates to adapt the rate of exploration for the arms. We analyze the regret of R-OFUL and GLUE, showing that our regret upper bounds are never worse than that of the standard OFUL and UCB algorithms respectively. Further, we also consider a practically motivated…
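
As a rough illustration of how a bound on an arm's mean can tighten a subgaussian factor, the sketch below compares the Hoeffding variance proxy of 1/4 for rewards in [0, 1] with the Kearns-Saul proxy for a Bernoulli(p) variable, (1 - 2p) / (2 ln((1 - p) / p)), which also holds for any [0, 1]-valued reward with mean p. Using this particular inequality, the function names, and the confidence-radius formula are assumptions made here for illustration; the abstract does not say which estimate the paper uses, and this is not the R-OFUL or GLUE algorithm.

```python
import math


def kearns_saul_variance_proxy(p: float) -> float:
    """Subgaussian variance proxy of a [0, 1]-valued reward with mean p.

    Uses the Kearns-Saul inequality (an assumption of this sketch, not
    necessarily the estimate used in the paper).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("mean must lie in [0, 1]")
    if p == 0.0 or p == 1.0:
        return 0.0  # degenerate reward: no randomness
    if abs(p - 0.5) < 1e-9:
        return 0.25  # limit of the formula as p -> 1/2 (Hoeffding's value)
    return (1.0 - 2.0 * p) / (2.0 * math.log((1.0 - p) / p))


def variance_proxy_from_mean_bounds(lower: float, upper: float) -> float:
    """Worst-case variance proxy over all means in [lower, upper].

    The proxy is symmetric about 1/2 and grows as the mean approaches 1/2,
    so the worst case is the point of the interval closest to 1/2.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("need 0 <= lower <= upper <= 1")
    if lower <= 0.5 <= upper:
        return 0.25  # the bounds give no improvement over Hoeffding
    closest_to_half = upper if upper < 0.5 else lower
    return kearns_saul_variance_proxy(closest_to_half)


def confidence_radius(variance_proxy: float, pulls: int, delta: float) -> float:
    """Standard subgaussian confidence radius sqrt(2 * proxy * ln(1/delta) / n)."""
    if pulls <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need pulls > 0 and 0 < delta < 1")
    return math.sqrt(2.0 * variance_proxy * math.log(1.0 / delta) / pulls)


if __name__ == "__main__":
    # Hypothetical arm whose mean is known to lie in [0.0, 0.1].
    tight = variance_proxy_from_mean_bounds(0.0, 0.1)
    loose = 0.25  # no side information
    print("variance proxy with mean bounds:", tight)
    print("radius with mean bounds:", confidence_radius(tight, 100, 0.05))
    print("radius without mean bounds:", confidence_radius(loose, 100, 0.05))
```

When the mean interval excludes 1/2, the proxy falls below 1/4 and the confidence radius shrinks for the same number of pulls, which is the sense in which such an arm needs less exploration. When the interval contains 1/2, the sketch falls back to the Hoeffding value.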

Taxonomy

Topics: Advanced Bandit Algorithms Research