Multi-agent online learning in time-varying games

Benoit Duvocelle, Panayotis Mertikopoulos, Mathias Staudigl, and Dries Vermeulen

arXiv:1809.03066 · cs.GT · August 11, 2022 · 1 cites

TL;DR

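This paper studies the long-run behavior of multi-agent online learning in games that change over time. It shows that mirror descent policies converge to Nash equilibrium when the game stabilizes to a strictly monotone limit, and stay close to the evolving equilibrium when the stage games are strongly monotone, including under bandit (payoff-only) feedback.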

Contribution

It establishes convergence and equilibrium-tracking results for mirror descent-based policies in time-varying games, covering both gradient-based and bandit feedback.

Findings

01

Convergence to Nash equilibrium in stabilizing time-varying games.

02

Players' strategies remain close to the evolving equilibrium under strong monotonicity.

03

Results cover both gradient-based and payoff-based feedback cases.

Abstract

We examine the long-run behavior of multi-agent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit; and (b) stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback, i.e., the "bandit feedback" case where players only get to observe the payoffs of their chosen actions.
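
As a rough illustration of result (a), the sketch below runs projected gradient play, which is mirror descent with a Euclidean regularizer, in a toy two-player game whose payoffs drift over time and settle to a strongly monotone limit. Everything in it is an assumption made for illustration and is not taken from the paper: the quadratic payoffs u_i(x; t) = b(t) x_i - x_i^2 / 2 - c x_i x_j, the drift b(t) = 0.6 + 0.3 / (t + 1), the coupling c = 0.5, the step-size schedule, and the names run_gradient_play and clip01.

```python
def clip01(value):
    """Euclidean projection onto the action interval [0, 1]."""
    return min(1.0, max(0.0, value))


def run_gradient_play(num_rounds=5000, coupling=0.5, limit_offset=0.6, drift=0.3):
    """Projected gradient play in a hypothetical time-varying quadratic game.

    Player i's payoff at round t is assumed to be
        u_i(x; t) = b(t) * x_i - x_i**2 / 2 - coupling * x_i * x_j,
    with b(t) = limit_offset + drift / (t + 1), so the game stabilizes
    as t grows. With |coupling| < 1 the limit game is strongly monotone.
    """
    x1, x2 = 1.0, 0.0  # arbitrary initial actions
    for t in range(num_rounds):
        offset_t = limit_offset + drift / (t + 1)
        # Each player's individual payoff gradient at the current profile.
        grad1 = offset_t - x1 - coupling * x2
        grad2 = offset_t - x2 - coupling * x1
        # Vanishing step size (an assumed schedule, not the paper's).
        step = 0.5 / (t + 1) ** 0.6
        # Simultaneous ascent step followed by projection onto [0, 1].
        x1 = clip01(x1 + step * grad1)
        x2 = clip01(x2 + step * grad2)
    return x1, x2


# Symmetric equilibrium of the limit game: x = limit_offset / (1 + coupling).
limit_equilibrium = 0.6 / (1.0 + 0.5)
final_x1, final_x2 = run_gradient_play()
print(final_x1, final_x2, limit_equilibrium)
```

In this toy setting both players' actions should end up near the equilibrium of the limit game. Swapping the Euclidean projection for an entropic regularizer would give an exponential-weights variant of the same mirror descent template; the bandit case would additionally require estimating each gradient from observed payoffs, which this sketch does not attempt.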

Taxonomy

Topics: Reinforcement Learning in Robotics · Game Theory and Applications