Fast and Furious Learning in Zero-Sum Games: Vanishing Regret with Non-Vanishing Step Sizes
James P. Bailey, Georgios Piliouras

TL;DR
This paper shows that in two-agent, two-strategy zero-sum games, gradient descent with a fixed step size can achieve vanishing time-average regret, with time-averaged strategies converging to Nash equilibrium. This challenges the belief that diminishing step sizes are necessary.
Contribution
It introduces the concept of 'fast and furious' learning, showing fixed step sizes can yield optimal regret bounds in simple zero-sum games without prior horizon knowledge.
Findings
Achieves $\Theta(\sqrt{T})$ regret with fixed step sizes
Time-averaged strategies converge to Nash equilibrium
Applicable to simple two-agent zero-sum games
Abstract
We show for the first time, to our knowledge, that it is possible to reconcile in online learning in zero-sum games two seemingly contradictory objectives: vanishing time-average regret and non-vanishing step sizes. This phenomenon, which we coin "fast and furious" learning in games, sets a new benchmark for what is possible both in max-min optimization and in multi-agent systems. Our analysis does not depend on introducing a carefully tailored dynamic. Instead we focus on the most well-studied online dynamic, gradient descent. Similarly, we focus on the simplest textbook class of games, two-agent two-strategy zero-sum games, such as Matching Pennies. Even for this simplest of benchmarks the best known bound for total regret, prior to our work, was the trivial one of $O(T)$, which is immediately applicable even to a non-learning agent. Based on a tight understanding of the…
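The setting described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a lazy (dual-averaging) form of projected gradient descent, the standard Matching Pennies payoff (2p - 1)(2q - 1) for the first player, and hypothetical names such as `simulate_matching_pennies`, `eta`, and `horizon`. It tracks the first player's regret against the best fixed pure strategy while both players use the same constant step size.

```python
def clip(value, lo=-1.0, hi=1.0):
    """Clamp value to [lo, hi]; plays the role of the projection step."""
    return max(lo, min(hi, value))


def simulate_matching_pennies(eta=0.1, horizon=20000, p0=0.8, q0=0.4):
    """Fixed-step gradient descent in Matching Pennies (illustrative sketch).

    p and q are the probabilities that players 1 and 2 play heads.
    Player 1 receives (2p - 1)(2q - 1); player 2 receives the negative.
    Assumption: each player keeps an unprojected dual variable and plays
    its projection (lazy projected gradient descent).
    """
    d_p = 2.0 * p0 - 1.0  # dual variable of player 1
    d_q = 2.0 * q0 - 1.0  # dual variable of player 2
    payoff_sum = 0.0      # player 1's cumulative expected payoff
    heads_sum = 0.0       # payoff player 1 would get from always playing heads
    p_sum = 0.0
    q_sum = 0.0
    a = b = 0.0
    for _ in range(horizon):
        a = clip(d_p)     # a = 2p - 1 after projection
        b = clip(d_q)     # b = 2q - 1 after projection
        payoff_sum += a * b
        heads_sum += b
        p_sum += (1.0 + a) / 2.0
        q_sum += (1.0 + b) / 2.0
        # Simultaneous updates with the same constant step size eta.
        d_p += 2.0 * eta * b
        d_q -= 2.0 * eta * a
    # Always-tails earns -heads_sum, so the best fixed pure strategy earns
    # abs(heads_sum).
    regret = abs(heads_sum) - payoff_sum
    return {
        "regret": regret,
        "avg_regret": regret / horizon,
        "avg_p": p_sum / horizon,
        "avg_q": q_sum / horizon,
        "last_p": (1.0 + a) / 2.0,
        "last_q": (1.0 + b) / 2.0,
    }


if __name__ == "__main__":
    for horizon in (1000, 10000, 100000):
        result = simulate_matching_pennies(eta=0.1, horizon=horizon)
        print(horizon, result["avg_regret"], result["avg_p"], result["last_p"])
```

Running the script for increasing horizons lets one compare the time-average regret and the average strategies, which should approach 0 and 1/2 respectively, against the last-played strategies, which need not settle at the equilibrium.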
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
