Bandit Learning with Positive Externalities

Virag Shah; Jose Blanchet; Ramesh Johari

arXiv:1802.05693 · cs.LG · March 8, 2019 · 1 citation

TL;DR

This paper studies multiarmed bandit problems with positive externalities, shows that standard algorithms such as UCB can incur linear regret in this setting, and proposes Balanced Exploration and an adaptive variant whose asymptotic regret is shown to be optimal.

Contribution

It introduces the Balanced Exploration algorithm and its adaptive variant, designed to handle positive externalities in bandit problems, with proven asymptotic optimality.

Findings

01

Standard algorithms like UCB can have linear regret under positive externalities.

02

Balanced Exploration explores arms carefully so that arrivals do not converge on a suboptimal arm before sufficient evidence is gathered; its asymptotic regret is optimal.

03

An adaptive variant of BE successively eliminates suboptimal arms and is likewise asymptotically optimal.

Abstract

In many platforms, user arrivals exhibit a self-reinforcing behavior: future user arrivals are likely to have preferences similar to users who were satisfied in the past. In other words, arrivals exhibit positive externalities. We study multiarmed bandit (MAB) problems with positive externalities. We show that the self-reinforcing preferences may lead standard benchmark algorithms such as UCB to exhibit linear regret. We develop a new algorithm, Balanced Exploration (BE), which explores arms carefully to avoid suboptimal convergence of arrivals before sufficient evidence is gathered. We also introduce an adaptive variant of BE which successively eliminates suboptimal arms. We analyze their asymptotic regret, and establish optimality by showing that no algorithm can perform better.
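The self-reinforcing arrivals described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's Balanced Exploration algorithm, which the abstract does not specify. It assumes a hypothetical urn-style model: each arriving user prefers one arm, the chance of preferring an arm grows with the reward already earned on it, and a pull pays off only when it matches that preference. The names `simulate`, `ucb`, `uniform`, `theta`, and `alpha` are illustrative choices, not the authors' notation.

```python
import math
import random


def simulate(policy, mu, horizon, theta=1.0, alpha=1.0, seed=0):
    """Simulate one run of a bandit with self-reinforcing arrivals.

    Assumed model (illustrative, not taken from the abstract): the user
    arriving at time t prefers arm a with probability proportional to
    (theta + reward earned so far on a) ** alpha, and a pull yields a
    Bernoulli(mu[a]) reward only if the pulled arm matches the preference.
    """
    rng = random.Random(seed)
    m = len(mu)
    pulls = [0] * m    # number of times each arm was pulled
    rewards = [0] * m  # total reward earned on each arm
    for t in range(1, horizon + 1):
        # Past successes on an arm attract more users who prefer that arm.
        weights = [(theta + rewards[a]) ** alpha for a in range(m)]
        preferred = rng.choices(range(m), weights=weights)[0]
        arm = policy(t, pulls, rewards, rng)
        hit = arm == preferred and rng.random() < mu[arm]
        pulls[arm] += 1
        rewards[arm] += int(hit)
    return pulls, rewards


def ucb(t, pulls, rewards, rng):
    """Textbook UCB1 index policy; it ignores the arrival dynamics."""
    for a, n in enumerate(pulls):
        if n == 0:
            return a  # pull every arm once before using the index
    index = [
        rewards[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a])
        for a in range(len(pulls))
    ]
    return max(range(len(pulls)), key=index.__getitem__)


def uniform(t, pulls, rewards, rng):
    """Baseline that pulls an arm uniformly at random."""
    return rng.randrange(len(pulls))
```

Calling `simulate(ucb, [0.9, 0.5], 10_000, seed=s)` for several seeds `s` and comparing the returned pull counts gives a rough feel for how early rewards can steer both arrivals and the policy toward one arm. Whether the run locks onto the worse arm depends on the assumed `theta` and `alpha`, so this is a toy for building intuition rather than a reproduction of the paper's results.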

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems