Generalized Individual Q-learning for Polymatrix Games with Partial Observations

Ahmed Said Donmez; Muhammed O. Sayin

arXiv:2409.02663·cs.GT·September 5, 2024

Open Access

TL;DR

This paper introduces generalized individual Q-learning dynamics for multi-agent systems with partial observations. By combining belief-based and payoff-based learning, the dynamics converge faster to quantal response equilibrium whenever opponents' actions can be observed.

Contribution

It proposes generalized individual Q-learning dynamics that combine belief-based learning (used when opponents' actions are observed) with payoff-based learning (used when they are not) in multi-agent zero-sum and potential polymatrix games.

Findings

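01  Using whatever observations of opponents' actions are available yields faster, guaranteed convergence to quantal response equilibrium in zero-sum and potential polymatrix games.

02  The dynamics unify existing learning algorithms: they reduce to smoothed fictitious play under full access to opponents' actions and to individual Q-learning under no access.

03  Numerical simulations quantify the improvement in convergence rate gained from observing opponents' actions.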

Abstract

This paper addresses the challenge of limited observations in non-cooperative multi-agent systems where agents can have partial access to other agents' actions. We present the generalized individual Q-learning dynamics that combine belief-based and payoff-based learning for the networked interconnections of more than two self-interested agents. This approach leverages access to opponents' actions whenever possible, demonstrably achieving a faster (guaranteed) convergence to quantal response equilibrium in multi-agent zero-sum and potential polymatrix games. Notably, the dynamics reduce to the well-studied smoothed fictitious play and individual Q-learning under full and no access to opponent actions, respectively. We further quantify the improvement in convergence rate due to observing opponents' actions through numerical simulations.
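
The interpolation described in the abstract can be illustrated with a minimal sketch. It is not the authors' exact dynamics: the function names, the observation probability `obs_prob`, the temperature `tau`, the `1/(t+1)` step size, and the normalization by the action probability (borrowed from the standard individual Q-learning update) are all assumptions made for illustration. When the opponent's action is observed, every entry of the agent's q-vector is updated, as in smoothed fictitious play; when it is not, only the played action's entry is updated from the realized payoff.

```python
import numpy as np

def logit_response(q, tau):
    """Logit (softmax) quantal response to q-values q with temperature tau."""
    z = (q - np.max(q)) / tau          # shift by the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def q_update(q, policy, action, payoff_row, observed, step):
    """One q-vector update (illustrative, not the paper's exact rule).

    payoff_row[a] is the payoff the agent would have received from action a
    against the opponent's realized action.
    """
    q = q.copy()
    if observed:
        # Belief-based branch: opponent's action is known, update every entry.
        q += step * (payoff_row - q)
    else:
        # Payoff-based branch: only the played action's payoff is known.
        # The step is divided by the action probability and capped at 1.
        scaled = min(1.0, step / policy[action])
        q[action] += scaled * (payoff_row[action] - q[action])
    return q

def simulate(payoff, obs_prob, tau=0.5, steps=2000, seed=0):
    """Two-player zero-sum game; each agent observes the other with prob. obs_prob."""
    rng = np.random.default_rng(seed)
    n_row, n_col = payoff.shape
    q_row, q_col = np.zeros(n_row), np.zeros(n_col)
    for t in range(1, steps + 1):
        pi_row, pi_col = logit_response(q_row, tau), logit_response(q_col, tau)
        a = rng.choice(n_row, p=pi_row)
        b = rng.choice(n_col, p=pi_col)
        step = 1.0 / (t + 1)
        see_row, see_col = rng.random() < obs_prob, rng.random() < obs_prob
        q_row = q_update(q_row, pi_row, a, payoff[:, b], see_row, step)
        q_col = q_update(q_col, pi_col, b, -payoff[a, :], see_col, step)
    return logit_response(q_row, tau), logit_response(q_col, tau)
```

Calling `simulate` on a small matrix such as matching pennies with `obs_prob=1.0` and `obs_prob=0.0` gives the two limiting cases the abstract mentions. Values in between model partial access to the opponent's actions.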

Taxonomy

Topics: Adaptive Dynamic Programming Control · Bayesian Modeling and Causal Inference · Reinforcement Learning in Robotics