Multi-agent online learning in time-varying games
Benoit Duvocelle, Panayotis Mertikopoulos, Mathias Staudigl, and Dries Vermeulen

TL;DR
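This paper studies the long-run behavior of multi-agent online learning in games that change over time. It shows that mirror descent-based policies converge to Nash equilibrium when the game stabilizes to a strictly monotone limit, and that the guarantees extend to bandit feedback, where players observe only the payoffs of their chosen actions.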
Contribution
It establishes convergence and equilibrium-tracking results for mirror descent-based policies in time-varying games, under both gradient-based and bandit feedback.
Findings
The sequence of play converges to Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit.
Players' strategies stay asymptotically close to the evolving equilibrium when the stage games are strongly monotone.
Results cover both gradient-based and payoff-based feedback cases.
Abstract
We examine the long-run behavior of multi-agent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit; and (b) it stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback - i.e., the "bandit feedback" case where players only get to observe the payoffs of their chosen actions.
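To make part (a) of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-player game with actions in [-1, 1], gradient-based feedback, and the Euclidean special case of mirror descent (projected gradient ascent). The payoff functions, the 1/t drift term, and the step-size exponent 0.6 are all illustrative assumptions rather than conditions taken from the paper.

```python
# Hypothetical game (an assumption for illustration only):
#   u1_t(x) = -x1^2 - x1*x2 + b1_t*x1
#   u2_t(x) = -x2^2 + x1*x2 + b2_t*x2
# with b_t = (1 + 1/t, 0.5 - 1/t), so the stage games stabilize as t grows.
# The limit game is strongly monotone, with unique equilibrium (0.3, 0.4).

LIMIT_EQUILIBRIUM = (0.3, 0.4)


def payoff_gradients(x, t):
    """Each player's payoff gradient with respect to their own action at stage t."""
    drift = 1.0 / t  # time-varying perturbation that vanishes in the long run
    b1, b2 = 1.0 + drift, 0.5 - drift
    v1 = -2.0 * x[0] - x[1] + b1
    v2 = -2.0 * x[1] + x[0] + b2
    return v1, v2


def project(z, lo=-1.0, hi=1.0):
    """Euclidean projection onto the action interval [lo, hi]."""
    return max(lo, min(hi, z))


def run(num_stages=5000):
    """Projected gradient ascent: mirror descent with a quadratic regularizer."""
    x = [0.0, 0.0]
    for t in range(1, num_stages + 1):
        step = 1.0 / t ** 0.6  # vanishing step size (illustrative choice)
        v = payoff_gradients(x, t)
        # Players update simultaneously, each using only their own gradient.
        x = [project(x[i] + step * v[i]) for i in range(2)]
    return x


final = run()
print("final play:", final, "limit equilibrium:", LIMIT_EQUILIBRIUM)
```

In this toy setup the iterates settle near the equilibrium of the limit game even though every stage game along the way is slightly different. Replacing `payoff_gradients` with an estimate built from observed payoffs only would correspond to the bandit setting, which this sketch does not attempt.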
Taxonomy
Topics: Reinforcement Learning in Robotics · Game Theory and Applications
