Multiplayer Bandit Learning, from Competition to Cooperation
Simina Brânzei, Yuval Peres

TL;DR
This paper investigates how competition and cooperation influence exploration strategies in multiplayer multi-armed bandit problems, revealing that competition reduces exploration while cooperation enhances it, with implications for strategic learning.
Contribution
It introduces a model analyzing the effects of different cooperation levels on exploration in multiplayer bandits, highlighting the contrasting behaviors and outcomes.
Findings
Competing players explore less than a single player.
Cooperating players explore more than a single player.
Neutral players learn from each other and earn strictly higher total rewards than they would playing alone.
Abstract
The stochastic multi-armed bandit model captures the tradeoff between exploration and exploitation. We study the effects of competition and cooperation on this tradeoff. Suppose there are $k$ arms and two players, Alice and Bob. In every round, each player pulls an arm, receives the resulting reward, and observes the choice of the other player but not their reward. Alice's utility is $\Gamma_A + \lambda \Gamma_B$ (and similarly for Bob), where $\Gamma_A$ is Alice's total reward and $\lambda \in [-1, 1]$ is a cooperation parameter. At $\lambda = -1$ the players are competing in a zero-sum game, at $\lambda = 1$ they are fully cooperating, and at $\lambda = 0$ they are neutral: each player's utility is their own reward. The model is related to the economics literature on strategic experimentation, where usually players observe each other's rewards. With discount factor $\beta$, the…
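The utility described in the abstract can be made concrete with a small sketch. It assumes that each player's utility is their own total reward plus the cooperation parameter times the other player's total reward, and that the total reward is a discounted sum over rounds. The function names (`discounted_total`, `utilities`), the variable names, and the toy reward sequences are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def discounted_total(rewards, beta):
    """Discounted total reward: sum of beta**t * r_t over rounds t = 0, 1, 2, ...

    Assumption: rewards are discounted geometrically by a factor beta per round.
    """
    return sum((beta ** t) * r for t, r in enumerate(rewards))


def utilities(gamma_a, gamma_b, lam):
    """Return (Alice's utility, Bob's utility) for cooperation parameter lam.

    Assumption: utility = own total reward + lam * other player's total reward,
    with lam restricted to the interval [-1, 1].
    """
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [-1, 1]")
    return gamma_a + lam * gamma_b, gamma_b + lam * gamma_a


# Toy reward sequences (made up for illustration): one reward per round.
alice_rewards = [1, 0, 1]
bob_rewards = [0, 1, 1]
beta = 0.5

gamma_a = discounted_total(alice_rewards, beta)
gamma_b = discounted_total(bob_rewards, beta)

for lam, label in [(-1.0, "competing"), (0.0, "neutral"), (1.0, "cooperating")]:
    u_a, u_b = utilities(gamma_a, gamma_b, lam)
    print(f"{label:12s} lam={lam:+.0f}  Alice={u_a:.2f}  Bob={u_b:.2f}")
```

Under this reading, the two utilities sum to zero at the competing endpoint, coincide at the cooperating endpoint, and equal each player's own total reward at the neutral midpoint, matching the three cases listed in the abstract.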
Videos
Multiplayer Bandit Learning: From Competition to Cooperation (YouTube)
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Experimental Behavioral Economics Studies
