Efficient and Adaptive Posterior Sampling Algorithms for Bandits

Bingshan Hu; Zhiming Huang; Tianyue H. Zhang; Mathias Lécuyer; Nidhi Hegde

arXiv:2405.01010·cs.LG·May 3, 2024



TL;DR

This paper tightens the problem-dependent regret bound for Thompson Sampling with Gaussian priors in stochastic bandits with bounded rewards. It also introduces two scalable, adaptive algorithms whose parameter α balances utility against computational resources, making them suitable for large-scale applications.

Contribution

It provides a tighter regret bound for Gaussian-prior Thompson Sampling and proposes two new algorithms with an adjustable utility-computation trade-off.
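To give a sense of scale for the tightened bound, the short computation below compares the two leading-term coefficients quoted in the abstract (288 e^64 for the earlier bound of Agrawal and Goyal, 1270 for the new one). It is a back-of-the-envelope sketch: the helper `old_bound_is_vacuous` is a hypothetical name that simply encodes the abstract's statement that the earlier bound is vacuous when T ≤ 288 e^64.

```python
import math

# Leading-term coefficients quoted in the abstract.
OLD_COEFF = 288 * math.exp(64)  # earlier Gaussian-prior TS bound [Agrawal and Goyal, 2017]
NEW_COEFF = 1270                # tightened coefficient derived in this paper


def old_bound_is_vacuous(horizon: int) -> bool:
    """Hypothetical helper: True when the horizon lies in the range
    T <= 288 e^64 where, per the abstract, the earlier bound is vacuous."""
    return horizon <= OLD_COEFF


if __name__ == "__main__":
    print(f"288 e^64      ~ {OLD_COEFF:.3e}")
    print(f"tightening by ~ {OLD_COEFF / NEW_COEFF:.3e}x")
    # Even a trillion rounds falls far inside the vacuous range.
    print(old_bound_is_vacuous(10**12))
```

The point of the comparison is that 288 e^64 is roughly 1.8 × 10^30, so any horizon a practitioner would actually run lies inside the range where the earlier bound says nothing useful.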

Findings

1. A tighter regret bound for Thompson Sampling with Gaussian priors.
2. Introduction of the TS-MA-α and TS-TD-α algorithms, whose parameter α ∈ [0, 1] is adjustable.
3. Both algorithms achieve regret bounds of O(K ln^{α+1}(T)/Δ).
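The stated order O(K ln^{α+1}(T)/Δ) can be evaluated directly to see how α affects the bound. The sketch below sets the hidden constant to 1 (an assumption; the true constants are not given here), so only ratios between settings are meaningful. Moving from α = 0 to α = 1 multiplies the bound by exactly ln T; how computation scales with α is not stated in this summary, so the sketch does not model it.

```python
import math


def regret_order(K: int, T: int, gap: float, alpha: float) -> float:
    """Evaluate K * ln(T)^(alpha + 1) / gap, the order of the regret bound
    stated for TS-MA-alpha and TS-TD-alpha, with the hidden constant set to 1.

    K: number of arms, T: horizon, gap: the suboptimality gap Delta,
    alpha: the utility/computation trade-off parameter in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if gap <= 0.0:
        raise ValueError("gap must be positive")
    return K * math.log(T) ** (alpha + 1) / gap


if __name__ == "__main__":
    for a in (0.0, 0.5, 1.0):
        print(a, regret_order(K=10, T=10**6, gap=0.1, alpha=a))
```

At α = 0 the expression reduces to K ln(T)/Δ, the familiar logarithmic order; larger α adds up to one extra logarithmic factor.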

Abstract

We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \leq 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-world applications that require scalability, adaptive computational resource allocation, and a balance between utility and computation, we propose two parameterized Thompson Sampling-based algorithms: Thompson Sampling with Model Aggregation (TS-MA-$\alpha$) and Thompson Sampling with Timestamp Duelling (TS-TD-$\alpha$), where $\alpha \in [0, 1]$ controls the trade-off between utility and computation. Both algorithms achieve an $O(K \ln^{\alpha + 1}(T)/\Delta)$ regret bound, where $K$ is the…
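For readers unfamiliar with the baseline being analyzed, the following is a minimal sketch of Thompson Sampling with Gaussian priors in the style of Agrawal and Goyal (2017): each arm's sample is drawn from N(μ̂_i, 1/(k_i + 1)), where k_i is the arm's pull count and μ̂_i = S_i/(k_i + 1) with S_i its total reward, and the arm with the largest sample is played. It is not TS-MA-α or TS-TD-α, whose details are not given here. Bernoulli rewards, the function name `gaussian_thompson_sampling`, and the pseudo-regret bookkeeping are illustrative assumptions.

```python
import math
import random


def gaussian_thompson_sampling(arm_means, horizon, seed=0):
    """Sketch of Gaussian-prior Thompson Sampling on a Bernoulli bandit
    (Bernoulli rewards are an assumption; they keep rewards in [0, 1]).

    Returns (pseudo_regret, pulls) after `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k          # k_i: number of times arm i has been played
    reward_sums = [0.0] * k  # S_i: total reward collected from arm i
    best = max(arm_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one posterior sample per arm: N(S_i / (k_i + 1), 1 / (k_i + 1)).
        samples = []
        for i in range(k):
            mean_hat = reward_sums[i] / (pulls[i] + 1)
            std = 1.0 / math.sqrt(pulls[i] + 1)
            samples.append(rng.gauss(mean_hat, std))
        # Play the arm whose sample is largest.
        arm = max(range(k), key=samples.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the played arm's mean.
        regret += best - arm_means[arm]
    return regret, pulls


if __name__ == "__main__":
    regret, pulls = gaussian_thompson_sampling([0.2, 0.5, 0.8], horizon=5000, seed=1)
    print(regret, pulls)
```

Because the sampling variance shrinks as 1/(k_i + 1), under-explored arms still get occasional large samples while well-estimated suboptimal arms are played less and less often.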


Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Cognitive Radio Networks and Spectrum Sensing