Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation
Pedro Cisneros-Velarde, Sanmi Koyejo

TL;DR
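This paper provides finite-sample guarantees for Nash Q-learning with linear function approximation, the representation used when the state space is large or continuous. The guarantees indicate that the algorithm is sample efficient, and the obtained performance nearly matches an existing efficient result for single-agent RL under the same representation.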
Contribution
It extends the finite-sample analysis of Nash Q-learning from the tabular case to linear function approximation, the regime relevant to multi-agent RL with large or continuous state spaces.
Findings
The obtained performance nearly matches an existing efficient result for single-agent RL under the same representation.
Finite-sample guarantees indicate the sample efficiency of Nash Q-learning with linear approximation.
The result has a polynomial gap when compared to the best-known result for the tabular case.
Abstract
Nash Q-learning may be considered one of the first and best-known algorithms in multi-agent reinforcement learning (MARL) for learning policies that constitute a Nash equilibrium of an underlying general-sum Markov game. Its original proof provided asymptotic guarantees and was for the tabular case. Recently, finite-sample guarantees have been provided using more modern RL techniques for the tabular case. Our work analyzes Nash Q-learning using linear function approximation -- a representation regime introduced when the state space is large or continuous -- and provides finite-sample guarantees that indicate its sample efficiency. We find that the obtained performance nearly matches an existing efficient result for single-agent RL under the same representation and has a polynomial gap when compared to the best-known result for the tabular case.
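To make "linear function approximation" concrete, here is a minimal Python sketch of the representation only. It assumes each agent's Q-value for a state and joint action is the inner product of a feature vector with a weight vector, and that the weights are fit by ridge regression. The function names, feature dimension, and synthetic data are hypothetical, and the Nash-equilibrium step is omitted; this summary does not specify the paper's actual algorithm, so this is not it.

```python
import numpy as np

def fit_linear_q(features, targets, reg=1.0):
    """Ridge-regression fit of w so that features @ w approximates targets.

    features: (n, d) array, one row per observed (state, joint action) pair.
    targets:  (n,) array of regression targets for one agent's Q-values.
    reg:      ridge regularization strength (hypothetical default).
    """
    d = features.shape[1]
    gram = features.T @ features + reg * np.eye(d)
    return np.linalg.solve(gram, features.T @ targets)

def q_value(phi_sa, w):
    """Linear Q estimate: inner product of the feature vector and weights."""
    return float(phi_sa @ w)

# Synthetic check (assumed data): targets are exactly linear in the features.
rng = np.random.default_rng(0)
d = 4
true_w = np.array([1.0, -2.0, 0.5, 3.0])
features = rng.normal(size=(500, d))
targets = features @ true_w
w_hat = fit_linear_q(features, targets, reg=1e-6)
```

With noise-free linear targets and a small regularizer, the fitted weights should be close to the generating weights, which is the sense in which a d-dimensional weight vector replaces a table over all states and joint actions.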
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Game Theory and Applications
Methods: Q-Learning
