Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games
Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

TL;DR
This paper introduces uncoupled learning dynamics for multiplayer games under which each player's swap regret is bounded by O(log T), improving on previous results while maintaining optimal O(√T) swap regret in adversarial settings.
Contribution
The paper presents novel uncoupled learning dynamics with a time-invariant learning rate whose second-order path lengths are bounded by O(log T), which leads to improved swap regret bounds in multiplayer games.
Findings
Achieves O(log T) swap regret in multiplayer games
Maintains O(√T) swap regret in adversarial regimes
Uses a novel combination of optimistic regularization and self-concordant barriers
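To make the quantity in these findings concrete, the following is a minimal sketch of how swap regret can be computed for a recorded sequence of play, using the standard textbook definition rather than anything specific to the paper's algorithm. The function names `swap_regret` and `external_regret`, and the list-of-lists input layout, are assumptions made for this illustration.

```python
def swap_regret(strategies, utilities):
    """Swap regret of a sequence of mixed strategies.

    strategies[t][a]: probability placed on action a in round t (illustrative layout).
    utilities[t][a]:  utility action a would have earned in round t.
    """
    n = len(strategies[0])
    total = 0.0
    for a in range(n):
        # The best swap function decomposes per action: for the mass placed
        # on action a, pick the single replacement b that gains the most.
        gains = [
            sum(x[a] * (u[b] - u[a]) for x, u in zip(strategies, utilities))
            for b in range(n)
        ]
        total += max(gains)
    return total


def external_regret(strategies, utilities):
    """External regret: gain from the best single fixed action in hindsight."""
    n = len(strategies[0])
    earned = sum(
        sum(x[a] * u[a] for a in range(n))
        for x, u in zip(strategies, utilities)
    )
    best_fixed = max(sum(u[b] for u in utilities) for b in range(n))
    return best_fixed - earned


# Two rounds, two actions: the player picks the worse action each round.
strategies = [[1.0, 0.0], [0.0, 1.0]]
utilities = [[0.0, 1.0], [1.0, 0.0]]
print(swap_regret(strategies, utilities))      # 2.0
print(external_regret(strategies, utilities))  # 1.0
```

In this toy sequence the swap regret (2.0) exceeds the external regret (1.0), which reflects the general fact that swap regret is at least as large as external regret, since a constant swap function recovers the best fixed action.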
Abstract
In this paper we establish efficient and \emph{uncoupled} learning dynamics so that, when employed by all players in a general-sum multiplayer game, the \emph{swap regret} of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$. At the same time, we guarantee optimal $O(\sqrt{T})$ swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a \emph{time-invariant} learning rate, the \emph{second-order path lengths} of the dynamics up to time $T$ are bounded by $O(\log T)$, a fundamental property which could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine in a novel way \emph{optimistic} regularized learning with the use of \emph{self-concordant…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
