Bandit Social Learning with Exploration Episodes

Kiarash Banihashem; Natalie Collina; Aleksandrs Slivkins

arXiv:2602.05835·cs.GT·February 6, 2026

Open Access

TL;DR

This paper analyzes social learning dynamics in which self-interested agents collectively follow a simple multi-armed bandit protocol, each controlling a short episode of consecutive decisions. It shows that aggregate exploration typically fails, so that Bayesian regret grows linearly over time, and concludes that externally driven exploration is needed.

Contribution

The paper shows that self-interested agents, each controlling one episode of a simple multi-armed bandit protocol, typically fail to explore in aggregate, so that Bayesian regret grows linearly over time.

Findings

01. Aggregate exploration fails in the typical case, not just in a worst-case scenario.

02. Bayesian regret grows linearly over time, even though agents are incentivized to explore within their own episodes.

03. Externally driven exploration is needed even when some exploration happens organically.

Abstract

We study a stylized social learning dynamics where self-interested agents collectively follow a simple multi-armed bandit protocol. Each agent controls an "episode": a short sequence of consecutive decisions. Motivating applications include users repeatedly interacting with an AI, or repeatedly shopping at a marketplace. While agents are incentivized to explore within their respective episodes, we show that the aggregate exploration fails: e.g., its Bayesian regret grows linearly over time. In fact, such failure is a (very) typical case, not just a worst-case scenario. This conclusion persists even if an agent's per-episode utility is some fixed function of the per-round outcomes: e.g., $\min$ or $\max$, not just the sum. Thus, externally driven exploration is needed even when some amount of exploration happens organically.
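
For readers unfamiliar with the term, Bayesian regret can be written as follows. This is the standard textbook definition, given only as a sketch: the symbols $K$, $T$, $\mu_a$, $a_t$, and $\mathrm{BR}$ are assumed here for illustration and may not match the paper's notation. Suppose there are $K$ arms whose mean rewards $\mu_1, \dots, \mu_K$ are drawn from a prior, and let $a_t$ be the arm chosen in round $t$.

$$
\mathrm{BR}(T) \;=\; \mathbb{E}\left[\, \sum_{t=1}^{T} \left( \max_{a \in \{1,\dots,K\}} \mu_a \;-\; \mu_{a_t} \right) \right]
$$

The expectation is taken over the prior, the reward noise, and any randomness in the choices. Linear growth means $\mathrm{BR}(T) \ge c\,T$ for some constant $c > 0$ and all sufficiently large $T$, which is the same order as never learning at all. By contrast, standard bandit algorithms achieve sublinear regret, on the order of $\tilde{O}(\sqrt{KT})$.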

Taxonomy

Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Machine Learning and Algorithms