Efficient and Adaptive Posterior Sampling Algorithms for Bandits
Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde

TL;DR
This paper improves regret bounds for Thompson Sampling in stochastic bandits and introduces two scalable, adaptive algorithms that balance utility and computational resources, suitable for large-scale applications.
Contribution
It provides a tighter regret bound for Gaussian prior Thompson Sampling and proposes two new algorithms with adjustable utility-computation trade-offs.
Findings
Tighter regret bounds for Thompson Sampling with Gaussian priors.
Introduction of TS-MA-α and TS-TD-α algorithms with adjustable parameters.
Both algorithms achieve regret bounds of O(K ln^{α+1}(T)/Δ).
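To make the role of α in the stated bound concrete, the substitution below evaluates O(K ln^{α+1}(T)/Δ) at two illustrative values. The choices α = 0 and α = 1 are assumptions made for illustration; this summary does not state the admissible range of α.

```latex
\[
\alpha = 0:\quad O\!\left(\frac{K \ln T}{\Delta}\right),
\qquad
\alpha = 1:\quad O\!\left(\frac{K \ln^{2} T}{\Delta}\right)
\]
```

Each unit increase in α multiplies the bound by one additional factor of ln T.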
Abstract
We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous in some regimes, we derive a more practical bound that tightens the coefficient of the leading term. Additionally, motivated by large-scale real-world applications that require scalability, adaptive computational resource allocation, and a balance between utility and computation, we propose two parameterized Thompson Sampling-based algorithms: Thompson Sampling with Model Aggregation (TS-MA-α) and Thompson Sampling with Timestamp Duelling (TS-TD-α), where α controls the trade-off between utility and computation. Both algorithms achieve an O(K ln^{α+1}(T)/Δ) regret bound, where K is the…
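For readers unfamiliar with the baseline the abstract refers to, here is a minimal sketch of Thompson Sampling with Gaussian priors, assuming the commonly used formulation in which arm i's sample is drawn from a normal distribution with mean (sum of rewards)/(n_i + 1) and variance 1/(n_i + 1). The Bernoulli reward model, the function name, and all parameters are assumptions chosen for illustration. This is not the paper's TS-MA-α or TS-TD-α, whose details are not given here.

```python
import numpy as np


def gaussian_thompson_sampling(means, horizon, seed=0):
    """Simulate Gaussian-prior Thompson Sampling on Bernoulli arms.

    `means` are hypothetical true arm means in [0, 1] (bounded rewards).
    Returns (pull counts per arm, cumulative pseudo-regret).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    n_arms = len(means)
    pulls = np.zeros(n_arms, dtype=int)
    reward_sums = np.zeros(n_arms)
    best_mean = means.max()
    regret = 0.0

    for _ in range(horizon):
        # Assumed posterior for arm i: N(sum_i / (n_i + 1), 1 / (n_i + 1)).
        post_mean = reward_sums / (pulls + 1)
        post_std = 1.0 / np.sqrt(pulls + 1)
        samples = rng.normal(post_mean, post_std)

        # Pull the arm whose posterior sample is largest.
        arm = int(np.argmax(samples))
        reward = float(rng.random() < means[arm])  # Bernoulli reward

        pulls[arm] += 1
        reward_sums[arm] += reward
        regret += best_mean - means[arm]

    return pulls, regret


if __name__ == "__main__":
    pulls, regret = gaussian_thompson_sampling([0.1, 0.9], horizon=2000)
    print("pulls per arm:", pulls)
    print("cumulative pseudo-regret:", regret)
```

Every round draws one posterior sample per arm, which is the per-round cost that the utility-computation trade-off in the abstract is concerned with.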
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Cognitive Radio Networks and Spectrum Sensing
