Thompson Sampling for Budgeted Multi-armed Bandits

Yingce Xia; Haifang Li; Tao Qin; Nenghai Yu; Tie-Yan Liu

arXiv:1505.00146·cs.LG·May 4, 2015·31 cites



TL;DR

This paper extends Thompson sampling to budgeted multi-armed bandits with random costs, providing regret bounds and demonstrating effectiveness through simulations.

Contribution

It introduces a Thompson sampling algorithm for Budgeted MAB with Bernoulli rewards and costs, and extends it to general distributions with similar regret guarantees.

Findings

1. Regret bound of O(ln B) for Bernoulli case
2. Extension to general distributions with similar regret
3. Simulation results confirm effectiveness

Abstract

Thompson sampling is one of the earliest randomized algorithms for multi-armed bandits (MAB). In this paper, we extend Thompson sampling to Budgeted MAB, where there is a random cost for pulling an arm and the total cost is constrained by a budget. We start with the case of Bernoulli bandits, in which the random rewards (costs) of an arm are independently sampled from a Bernoulli distribution. To implement the Thompson sampling algorithm in this case, at each round, we sample two numbers from the posterior distributions of the reward and cost for each arm, obtain their ratio, select the arm with the maximum ratio, and then update the posterior distributions. We prove that the distribution-dependent regret bound of this algorithm is $O(\ln B)$, where $B$ denotes the budget. By introducing a Bernoulli trial, we further extend this algorithm to the setting that the rewards (costs) are…
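The per-round procedure described in the abstract for the Bernoulli case can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions that the abstract does not state: uniform Beta(1, 1) priors on every reward and cost parameter, a stopping rule that ends the run once the budget is spent, strictly positive true cost means, and the hypothetical function name `budgeted_thompson_sampling` with its simulation inputs.

```python
import random


def budgeted_thompson_sampling(reward_means, cost_means, budget, seed=0):
    """Simulate Thompson sampling on a budgeted Bernoulli bandit.

    reward_means / cost_means are the true Bernoulli parameters of each arm.
    They are hypothetical simulation inputs, hidden from the learner, and the
    cost means are assumed to be strictly positive so the budget runs out.
    Returns (total_reward, pulls_per_arm, budget_spent).
    """
    rng = random.Random(seed)
    n_arms = len(reward_means)
    # Assumption: Beta(1, 1) priors, stored as [alpha, beta] for each arm.
    reward_post = [[1, 1] for _ in range(n_arms)]
    cost_post = [[1, 1] for _ in range(n_arms)]
    remaining = budget
    total_reward = 0
    pulls = [0] * n_arms

    while remaining > 0:
        # Sample one reward value and one cost value per arm from the
        # current posteriors, then form their ratio.
        ratios = []
        for i in range(n_arms):
            r = rng.betavariate(*reward_post[i])
            c = rng.betavariate(*cost_post[i])
            ratios.append(r / max(c, 1e-12))  # guard against a zero draw

        # Select the arm with the maximum sampled reward-to-cost ratio.
        arm = max(range(n_arms), key=ratios.__getitem__)

        # Observe a Bernoulli reward and a Bernoulli cost for that arm.
        reward = 1 if rng.random() < reward_means[arm] else 0
        cost = 1 if rng.random() < cost_means[arm] else 0

        # Assumed stopping rule: a pull the budget cannot cover is discarded.
        if cost > remaining:
            break
        remaining -= cost
        total_reward += reward
        pulls[arm] += 1

        # Conjugate Beta-Bernoulli update of both posteriors.
        reward_post[arm][0] += reward
        reward_post[arm][1] += 1 - reward
        cost_post[arm][0] += cost
        cost_post[arm][1] += 1 - cost

    return total_reward, pulls, budget - remaining


if __name__ == "__main__":
    total, pulls, spent = budgeted_thompson_sampling(
        reward_means=[0.2, 0.8], cost_means=[0.8, 0.2], budget=500
    )
    print("total reward:", total, "pulls per arm:", pulls, "spent:", spent)
```

In this toy instance the second arm has the higher expected reward and the lower expected cost, so the sampler should concentrate its pulls there as the posteriors sharpen. The `seed` argument only makes the simulation reproducible.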


Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms