Is Thompson Sampling Susceptible to Algorithmic Collusion?
Yi Xiong, Ningyuan Chen, Xuefeng Gao

TL;DR
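This paper investigates whether Thompson sampling, a popular bandit algorithm, leads to algorithmic collusion when two players use it independently in a repeated game. It finds that play converges to the Nash equilibrium under a mild assumption on the payoff matrices; when that assumption fails, the game may converge to collusive outcomes.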
Contribution
It proves that Thompson sampling converges to the Nash equilibrium in two-player repeated games under a mild assumption on the payoff matrices, and introduces a novel sample-path-wise approach for the proof.
Findings
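Thompson sampling converges to the Nash equilibrium under a mild assumption on the payoff matrices, so algorithmic collusion does not arise in that case.
When the payoff matrices do not satisfy the assumption, the game may converge to collusive outcomes.
A new sample-path-wise proof technique was developed because the stochastic approximation framework does not apply: inferior actions are updated only sporadically, and Lipschitz continuity is lacking.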
Abstract
When two players are engaged in a repeated game with unknown payoff matrices, they may use single-agent multi-armed bandit algorithms to choose their actions independently of each other. We show that when the players use Thompson sampling, the game dynamics converge to the Nash equilibrium under a mild assumption on the payoff matrices. Therefore, algorithmic collusion doesn't arise in this case, despite the fact that the players do not intentionally deploy competitive strategies. To prove the convergence result, we find that the framework developed in stochastic approximation doesn't apply, because of the sporadic and infrequent updates of the inferior actions and the lack of Lipschitz continuity. We develop a novel sample-path-wise approach to show the convergence. However, when the payoff matrices do not satisfy the assumption, the game may converge to collusive outcomes.
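The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code or experimental setup: it assumes Gaussian Thompson sampling (standard normal prior on each action's mean payoff, unit-variance reward noise), and the function name `thompson_repeated_game` and the prisoner's-dilemma payoff matrices are hypothetical choices made for illustration. Whether these matrices satisfy the paper's assumption is not checked here; the example only shows two players running independent single-agent bandit updates on their own noisy payoffs.

```python
import numpy as np


def thompson_repeated_game(A, B, T=5000, seed=0):
    """Two players run independent Gaussian Thompson sampling.

    A[i, j] and B[i, j] are the mean payoffs of the row and column player
    when the row player picks i and the column player picks j. Each player
    observes only its own noisy payoff. Assumed model (not from the paper):
    N(0, 1) prior per action and unit-variance Gaussian reward noise.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sizes = A.shape
    sums = [np.zeros(sizes[0]), np.zeros(sizes[1])]    # reward sums per action
    counts = [np.zeros(sizes[0]), np.zeros(sizes[1])]  # pull counts per action
    history = np.empty((T, 2), dtype=int)

    for t in range(T):
        actions = []
        for p in range(2):
            # Posterior under the assumed model: N(sum / (n + 1), 1 / (n + 1)).
            post_mean = sums[p] / (counts[p] + 1.0)
            post_sd = 1.0 / np.sqrt(counts[p] + 1.0)
            sample = rng.normal(post_mean, post_sd)
            actions.append(int(np.argmax(sample)))
        i, j = actions
        rewards = (A[i, j] + rng.normal(), B[i, j] + rng.normal())
        for p, a in enumerate(actions):
            # Only the action actually played is updated.
            sums[p][a] += rewards[p]
            counts[p][a] += 1.0
        history[t] = (i, j)
    return history


# Hypothetical prisoner's dilemma: action 0 = cooperate, action 1 = defect.
# (1, 1) is the unique Nash equilibrium; (0, 0) is the collusive outcome.
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = A.T
history = thompson_repeated_game(A, B)
tail = history[-1000:]
nash_share = float(np.mean((tail[:, 0] == 1) & (tail[:, 1] == 1)))
print("share of Nash play in the last 1000 rounds:", nash_share)
```

Because only the played action is updated, an action that looks inferior is revisited rarely, which is the "sporadic and infrequent updates" feature the abstract identifies as an obstacle to standard stochastic approximation arguments.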
Taxonomy
Topics: Topological and Geometric Data Analysis · Opinion Dynamics and Social Influence · Advanced Bandit Algorithms Research
