The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation
Zhe Feng, David C. Parkes, Haifeng Xu

TL;DR
This paper investigates how stochastic bandit algorithms such as UCB, ε-Greedy, and Thompson Sampling perform when the arms are rational agents that strategically manipulate their own rewards. It shows that the algorithms remain robust and that their regret bounds are tight under certain conditions.
Contribution
The paper provides the first analysis of the robustness of classic bandit algorithms to strategic manipulation, establishing regret bounds that hold under arbitrary adaptive manipulation strategies of the arms and that are tight.
Findings
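All three algorithms achieve an $\mathcal{O}(\max\{B, K \ln T\})$ regret upper bound, where $B$ is the total budget across arms, $K$ is the number of arms, and $T$ is the time horizon.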
The regret bounds are tight even under Nash equilibrium strategies.
The algorithms retain sublinear regret as long as the total manipulation budget B is o(T).
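The setting behind these findings can be explored with a small simulation. The sketch below is illustrative only and is not the paper's construction: it assumes Bernoulli rewards, the standard UCB1 index with exploration term sqrt(2 ln t / n), and a hypothetical greedy manipulation rule in which each suboptimal arm spends its budget to lift its reported reward to 1 whenever it is pulled. The function name `run_ucb` and all parameters are assumptions introduced for illustration.

```python
import math
import random


def run_ucb(means, budgets, horizon, seed=0):
    """UCB1 on Bernoulli arms where arms may inflate their reported rewards.

    Hypothetical manipulation rule (an assumption, not the paper's strategy):
    whenever a suboptimal arm is pulled, it spends as much of its remaining
    budget as needed to raise the reported reward to 1.
    """
    rng = random.Random(seed)
    k = len(means)
    best = max(range(k), key=lambda i: means[i])
    remaining = list(budgets)      # budget left per arm
    pulls = [0] * k                # number of pulls per arm
    reported_sum = [0.0] * k       # sum of (manipulated) rewards seen by UCB
    regret = 0.0                   # pseudo-regret measured with the true means

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1            # pull each arm once to initialise
        else:
            arm = max(
                range(k),
                key=lambda i: reported_sum[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )

        true_reward = 1.0 if rng.random() < means[arm] else 0.0
        boost = 0.0
        if arm != best:
            boost = min(remaining[arm], 1.0 - true_reward)
            remaining[arm] -= boost

        pulls[arm] += 1
        reported_sum[arm] += true_reward + boost
        regret += means[best] - means[arm]

    return regret, pulls, remaining


if __name__ == "__main__":
    for total_budget in (0.0, 20.0, 200.0):
        # Split the budget evenly between the two suboptimal arms.
        b = total_budget / 2
        regret, pulls, _ = run_ucb([0.9, 0.5, 0.4], [0.0, b, b], horizon=5000)
        print(f"B={total_budget:6.1f}  regret={regret:8.1f}  pulls={pulls}")
```

Running the script with increasing total budgets gives a rough feel for how regret grows with B under this one particular manipulation rule; it does not verify the stated bound, which covers arbitrary adaptive strategies.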
Abstract
Motivated by economic applications such as recommender systems, we study the behavior of stochastic bandit algorithms under strategic behavior conducted by rational actors, i.e., the arms. Each arm is a self-interested strategic player who can modify its own reward whenever pulled, subject to a cross-period budget constraint, in order to maximize its own expected number of times of being pulled. We analyze the robustness of three popular bandit algorithms: UCB, ε-Greedy, and Thompson Sampling. We prove that all three algorithms achieve a regret upper bound $\mathcal{O}(\max\{B, K \ln T\})$, where $B$ is the total budget across arms, $K$ is the total number of arms, and $T$ is the length of the time horizon. This regret guarantee holds under arbitrary adaptive manipulation strategies of the arms. Our second set of main results shows that this regret bound is…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Reinforcement Learning in Robotics
