Generalized Individual Q-learning for Polymatrix Games with Partial Observations
Ahmed Said Donmez, Muhammed O. Sayin

TL;DR
This paper introduces generalized individual Q-learning dynamics for multi-agent polymatrix games with partial observations of opponents' actions. By combining belief-based and payoff-based learning, the dynamics converge faster to quantal response equilibrium.
Contribution
It proposes generalized individual Q-learning dynamics that combine belief-based learning (used whenever opponents' actions are observed) with payoff-based learning (used otherwise) in multi-agent polymatrix games.
Findings
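Using opponents' actions whenever they are observed yields faster, guaranteed convergence to quantal response equilibrium in multi-agent zero-sum and potential polymatrix games.
The dynamics reduce to smoothed fictitious play under full access to opponents' actions and to individual Q-learning under no access.
Numerical simulations quantify the improvement in convergence rate gained from observing opponents' actions.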
Abstract
This paper addresses the challenge of limited observations in non-cooperative multi-agent systems where agents can have partial access to other agents' actions. We present the generalized individual Q-learning dynamics that combine belief-based and payoff-based learning for the networked interconnections of more than two self-interested agents. This approach leverages access to opponents' actions whenever possible, demonstrably achieving a faster (guaranteed) convergence to quantal response equilibrium in multi-agent zero-sum and potential polymatrix games. Notably, the dynamics reduce to the well-studied smoothed fictitious play and individual Q-learning under full and no access to opponent actions, respectively. We further quantify the improvement in convergence rate due to observing opponents' actions through numerical simulations.
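The two limiting cases named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' exact dynamics: the function names, the single pairwise payoff matrix, the logit (softmax) response, the temperature `tau`, and the step-size cap are all assumptions made for illustration. When the opponent's action is observed, every entry of the agent's q-vector is updated (belief-based, as in smoothed fictitious play). Otherwise only the played entry is updated from the realized payoff, normalized by its play probability (payoff-based, as in individual Q-learning).

```python
import numpy as np


def softmax(q, tau):
    """Logit (smoothed) response to a q-vector with temperature tau."""
    z = (np.asarray(q, dtype=float) - np.max(q)) / tau
    p = np.exp(z)
    return p / p.sum()


def generalized_q_step(q, payoff_matrix, own_action, opp_action,
                       observed, policy, step):
    """One illustrative update of a single agent's q-vector (hypothetical helper).

    payoff_matrix[a, b] is the agent's payoff for playing a against b
    on one edge of the polymatrix game (an assumption of this sketch).
    """
    q = np.asarray(q, dtype=float).copy()
    if observed:
        # Belief-based branch: the opponent's action is known, so the
        # payoff of every own action can be evaluated and updated.
        q += step * (payoff_matrix[:, opp_action] - q)
    else:
        # Payoff-based branch: only the realized payoff is known, so only
        # the played entry moves; dividing by its play probability
        # compensates for how rarely it is updated. The cap at 1 is a
        # safeguard assumed here, not taken from the paper.
        reward = payoff_matrix[own_action, opp_action]
        rate = min(1.0, step / policy[own_action])
        q[own_action] += rate * (reward - q[own_action])
    return q


def simulate(payoff_matrix, obs_prob, steps=2000, tau=0.5, seed=0):
    """Run both agents of a two-player zero-sum game; return final policies."""
    rng = np.random.default_rng(seed)
    A = np.asarray(payoff_matrix, dtype=float)
    B = -A.T  # column player's payoffs, indexed by its own action first
    q1, q2 = np.zeros(A.shape[0]), np.zeros(A.shape[1])
    for k in range(1, steps + 1):
        p1, p2 = softmax(q1, tau), softmax(q2, tau)
        a1 = rng.choice(len(p1), p=p1)
        a2 = rng.choice(len(p2), p=p2)
        step = 1.0 / (k + 1)
        # Each agent independently observes the opponent with prob. obs_prob.
        q1 = generalized_q_step(q1, A, a1, a2, rng.random() < obs_prob, p1, step)
        q2 = generalized_q_step(q2, B, a2, a1, rng.random() < obs_prob, p2, step)
    return softmax(q1, tau), softmax(q2, tau)
```

Setting `obs_prob=1.0` in `simulate` exercises only the belief-based branch and `obs_prob=0.0` only the payoff-based branch, mirroring the two special cases described above. For example, `simulate([[1, -1], [-1, 1]], obs_prob=0.5)` runs matching pennies with each opponent action observed half of the time.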
Taxonomy
Topics: Adaptive Dynamic Programming Control · Bayesian Modeling and Causal Inference · Reinforcement Learning in Robotics
