Bandit Learning with Positive Externalities
Virag Shah, Jose Blanchet, Ramesh Johari

TL;DR
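This paper studies multiarmed bandit (MAB) problems with positive externalities, where future arrivals tend to share the preferences of users who were satisfied in the past. It shows that standard algorithms such as UCB can incur linear regret in this setting, and proposes new algorithms that achieve asymptotically optimal regret by exploring arms carefully and successively eliminating suboptimal ones.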
Contribution
It introduces the Balanced Exploration (BE) algorithm and an adaptive variant, both designed to handle positive externalities in bandit problems, and establishes their asymptotic optimality by showing that no algorithm can perform better.
Findings
Standard algorithms like UCB can have linear regret under positive externalities.
Balanced Exploration (BE) explores arms carefully so that arrivals do not converge on a suboptimal arm before sufficient evidence is gathered, and it achieves asymptotically optimal regret.
An adaptive variant of BE successively eliminates suboptimal arms over time.
Abstract
In many platforms, user arrivals exhibit a self-reinforcing behavior: future user arrivals are likely to have preferences similar to users who were satisfied in the past. In other words, arrivals exhibit positive externalities. We study multiarmed bandit (MAB) problems with positive externalities. We show that the self-reinforcing preferences may lead standard benchmark algorithms such as UCB to exhibit linear regret. We develop a new algorithm, Balanced Exploration (BE), which explores arms carefully to avoid suboptimal convergence of arrivals before sufficient evidence is gathered. We also introduce an adaptive variant of BE which successively eliminates suboptimal arms. We analyze their asymptotic regret, and establish optimality by showing that no algorithm can perform better.
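The self-reinforcing arrival dynamic described in the abstract can be illustrated with a small simulation. This is a minimal sketch under assumptions that this summary does not state: an urn-style arrival rule in which the chance that the next user prefers an arm grows with that arm's past rewards (the hypothetical parameters `theta` and `alpha` set the initial bias and the strength of reinforcement), and a reward that is possible only when the shown arm matches the user's preference. The function names are illustrative, and the policy is textbook UCB1 rather than the paper's BE algorithm.

```python
import math
import random


def arrival_probs(theta, successes, alpha):
    """Chance that the next user prefers each arm (assumed urn-style rule)."""
    weights = [(t + s) ** alpha for t, s in zip(theta, successes)]
    total = sum(weights)
    return [w / total for w in weights]


def ucb_choice(pulls, rewards, t):
    """Textbook UCB1: pull each arm once, then maximize mean plus bonus."""
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    return max(
        range(len(pulls)),
        key=lambda a: rewards[a] / pulls[a]
        + math.sqrt(2 * math.log(t) / pulls[a]),
    )


def simulate(mu, theta, alpha, horizon, seed=0):
    """Run UCB1 against self-reinforcing arrivals; return pulls and rewards."""
    rng = random.Random(seed)
    m = len(mu)
    pulls = [0] * m
    rewards = [0] * m  # past rewards also drive future arrivals
    for t in range(1, horizon + 1):
        probs = arrival_probs(theta, rewards, alpha)
        preferred = rng.choices(range(m), weights=probs)[0]
        arm = ucb_choice(pulls, rewards, t)
        pulls[arm] += 1
        # Assumption: a user is satisfied only by their preferred arm.
        if arm == preferred and rng.random() < mu[arm]:
            rewards[arm] += 1
    return pulls, rewards


if __name__ == "__main__":
    # Arm 0 is better (mu = 0.9), but early luck can tilt arrivals either way.
    for seed in range(5):
        print(seed, simulate([0.9, 0.5], [1, 1], alpha=1.0, horizon=2000, seed=seed))
```

Running the script over several seeds shows how the per-arm reward counts, and therefore the arrival mix, evolve under the assumed model. Rewards collected early on one arm raise the chance that later users prefer it, which is the feedback loop the abstract warns can trap an algorithm before it has gathered sufficient evidence.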
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems
