Bayesian Incentive-Compatible Bandit Exploration
Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis

TL;DR
This paper designs a Bayesian incentive-compatible bandit algorithm with which a social planner, through controlled information disclosure, induces self-interested decision-makers to balance exploration and exploitation, maximizing social welfare with asymptotically optimal regret.
Contribution
It introduces a Bayesian incentive-compatible bandit algorithm and a black-box reduction that converts an arbitrary bandit algorithm into an incentive-compatible one with only a constant multiplicative increase in regret.
Findings
Achieves regret that is asymptotically optimal among all bandit algorithms, incentive-compatible or not.
Provides a general reduction applicable to contextual and feedback-rich bandit settings.
Ensures incentive compatibility with respect to the decision-makers' Bayesian priors.
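The incentive-compatibility requirement named in the findings can be written as a constraint on the planner's recommendations. The following is a sketch under assumed notation, none of which appears in the summary above: \mu_i is the unknown expected reward of arm i, drawn from the common prior, and I_t is the arm recommended to the agent arriving in round t. The constraint says that, conditional on being told to play arm i, the agent cannot expect to gain by switching to any other arm j.

```latex
\mathbb{E}\left[\, \mu_i - \mu_j \;\middle|\; I_t = i \,\right] \;\ge\; 0
\qquad \text{for all rounds } t \text{ and all arms } i, j \text{ with } \Pr[I_t = i] > 0 .
```

The expectation is taken over the prior and any randomness in the planner's algorithm. The agent conditions only on the recommendation itself, not on the history the planner has observed.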
Abstract
Individual decision-makers consume information revealed by the previous decision makers, and produce information that may help in future decisions. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as in other domains such as medical decisions. Each decision-maker would individually prefer to "exploit": select an action with the highest expected reward given her current information. At the same time, each decision-maker would prefer previous decision-makers to "explore", producing information about the rewards of various actions. A social planner, by means of carefully designed information disclosure, can incentivize the agents to balance the exploration and exploitation so as to maximize social welfare. We formulate this problem as a multi-armed bandit problem (and various generalizations thereof) under incentive-compatibility constraints induced…
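The tension described in the abstract can be illustrated with a small simulation of agents who all "exploit" under full disclosure of the history. This is a minimal sketch, not the paper's algorithm: Bernoulli rewards, Beta priors, and the names posterior_mean and simulate_greedy are assumptions made here for illustration.

```python
import random


def posterior_mean(successes, failures, prior_a=1.0, prior_b=1.0):
    """Mean of the Beta(prior_a + successes, prior_b + failures) posterior."""
    a = prior_a + successes
    b = prior_b + failures
    return a / (a + b)


def simulate_greedy(true_means, priors, rounds, seed=0):
    """Simulate agents who see the full history and always exploit.

    true_means: Bernoulli success probability of each arm (hidden from agents).
    priors: one (a, b) Beta prior per arm, shared by all agents.
    Returns the number of times each arm was chosen.
    """
    rng = random.Random(seed)
    k = len(true_means)
    wins = [0] * k
    losses = [0] * k
    pulls = [0] * k
    for _ in range(rounds):
        # Each agent picks the arm with the highest posterior mean reward.
        means = [posterior_mean(wins[i], losses[i], *priors[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: means[i])  # ties go to the lowest index
        reward = 1 if rng.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Arm 1 is better in truth, but the prior makes it look worse than arm 0.
    pulls = simulate_greedy(
        true_means=[0.6, 0.9],
        priors=[(1.0, 1.0), (1.0, 3.0)],
        rounds=1000,
    )
    print(pulls)
```

In this setup arm 1 is tried only if arm 0's posterior mean falls below arm 1's prior mean of 0.25, so on many runs the better arm is never sampled. A planner who controls what each agent learns can avoid this outcome, which is the role of the information-disclosure policy the abstract describes.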
