Bandit Social Learning with Exploration Episodes
Kiarash Banihashem, Natalie Collina, Aleksandrs Slivkins

TL;DR
This paper analyzes social learning dynamics in which each self-interested agent controls a short episode of a multi-armed bandit protocol. It shows that aggregate exploration typically fails, so Bayesian regret grows linearly over time, and that externally driven exploration is therefore needed.
Contribution
It demonstrates that self-interested agents following simple bandit protocols with episodes typically fail to explore effectively, resulting in linear regret growth over time.
Findings
Aggregate exploration fails in typical social learning scenarios.
Bayesian regret grows linearly over time even though agents are incentivized to explore within their own episodes.
External exploration mechanisms are necessary for effective learning.
Abstract
We study a stylized social learning dynamics where self-interested agents collectively follow a simple multi-armed bandit protocol. Each agent controls an "episode": a short sequence of consecutive decisions. Motivating applications include users repeatedly interacting with an AI, or repeatedly shopping at a marketplace. While agents are incentivized to explore within their respective episodes, we show that the aggregate exploration fails: e.g., the Bayesian regret grows linearly over time. In fact, such failure is a (very) typical case, not just a worst-case scenario. This conclusion persists even if an agent's per-episode utility is some fixed function of the per-round outcomes rather than just their sum. Thus, externally driven exploration is needed even when some amount of exploration happens organically.
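For readers unfamiliar with the term, the standard textbook notion of Bayesian regret can be written as follows. This is a sketch under assumed notation, not the paper's own: the horizon T, the number of arms K, the mean rewards \mu_a (drawn from a prior), and the arm a_t chosen in round t are all symbols introduced here for illustration.

```latex
% Assumed notation (not taken from the paper):
%   T      = total number of rounds across all episodes
%   K      = number of arms
%   \mu_a  = mean reward of arm a, drawn from the prior
%   a_t    = arm chosen in round t
\[
  \mathrm{BReg}(T)
  \;=\;
  \mathbb{E}\!\left[\, T \cdot \max_{a \in [K]} \mu_a \;-\; \sum_{t=1}^{T} \mu_{a_t} \right],
  \qquad
  \text{linear growth means } \mathrm{BReg}(T) = \Omega(T).
\]
% The expectation is over the prior on (\mu_a), the reward noise,
% and any randomness in the agents' choices.
```

Under this reading, linear growth means the dynamics lose a constant amount of expected reward per round on average, which is what happens when play keeps returning to a suboptimal arm with constant probability.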
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Machine Learning and Algorithms
