On the Impossibility of Convergence of Mixed Strategies with No Regret Learning
Vidya Muthukumar, Soham Phade, Anant Sahai

TL;DR
This paper proves that in repeated 2x2 competitive games, the mixed strategies of players running mean-based, monotonic optimal no-regret algorithms cannot converge almost surely to a Nash equilibrium, and it identifies the inherent stochasticity of players' realizations as a key obstacle.
Contribution
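It establishes a negative convergence result for mean-based, monotonic optimal no-regret algorithms in repeated 2x2 competitive games, and extends it to a broad relaxation of these assumptions that includes popular variants of Online-Mirror-Descent with optimism and/or adaptive step-sizes.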
Findings
Limiting mixed strategies cannot converge almost surely to any Nash equilibrium.
The negative result also holds under a broad relaxation of the assumptions, including Online-Mirror-Descent variants with optimism and/or adaptive step-sizes.
Stochasticity in players' realizations is a fundamental obstacle to convergence.
Abstract
We study the limiting behavior of the mixed strategies that result from optimal no-regret learning strategies in a repeated game setting where the stage game is any 2 by 2 competitive game. We consider optimal no-regret algorithms that are mean-based and monotonic in their argument. We show that for any such algorithm, the limiting mixed strategies of the players cannot converge almost surely to any Nash equilibrium. This negative result is also shown to hold under a broad relaxation of these assumptions, including popular variants of Online-Mirror-Descent with optimism and/or adaptive step-sizes. Finally, we conjecture that the monotonicity assumption can be removed, and provide partial evidence for this conjecture. Our results identify the inherent stochasticity in players' realizations as a critical factor underlying this divergence in outcomes between using the opponent's mixtures…
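The setting described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes matching pennies as the 2x2 competitive stage game, the Hedge (multiplicative weights) rule as a familiar no-regret learner, and a step size of 1/sqrt(t). The names `hedge_probs` and `simulate` are hypothetical, and no claim is made that this setup matches the paper's exact assumptions. Each player updates on the opponent's realized action, not on the opponent's mixture.

```python
import numpy as np

# Matching pennies: payoff matrix of the row player.
# The column player receives the negative (zero-sum), an assumed example game.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])


def hedge_probs(cum_payoff, eta):
    """Hedge (multiplicative weights) distribution from cumulative payoffs."""
    z = eta * (cum_payoff - cum_payoff.max())  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()


def simulate(T=10000, seed=0):
    """Return the row player's probability of action 0 at each round."""
    rng = np.random.default_rng(seed)
    cum_row = np.zeros(2)  # cumulative payoff of each row action
    cum_col = np.zeros(2)  # cumulative payoff of each column action
    p_hist = np.empty(T)
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)  # assumed decaying step size
        p = hedge_probs(cum_row, eta)
        q = hedge_probs(cum_col, eta)
        p_hist[t - 1] = p[0]
        # Players sample actions; each sees only the opponent's realization.
        i = rng.choice(2, p=p)
        j = rng.choice(2, p=q)
        cum_row += A[:, j]    # what each row action would have earned against j
        cum_col += -A[i, :]   # what each column action would have earned against i
    return p_hist
```

Calling `simulate()` and plotting the returned array against 0.5, the equilibrium probability in matching pennies, gives a quick view of how the last-iterate mixed strategy behaves when updates are driven by sampled actions.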
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Reinforcement Learning in Robotics
