Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation

Pedro Cisneros-Velarde; Sanmi Koyejo

arXiv:2303.00177·cs.LG·March 2, 2023·1 cites

Open Access

TL;DR

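This paper provides finite-sample guarantees for Nash Q-learning with linear function approximation, a representation used when the state space is large or continuous. The guarantees indicate that the algorithm is sample efficient, and the resulting performance nearly matches an existing efficient result for single-agent RL under the same representation.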

Contribution

It extends the finite-sample analysis of Nash Q-learning from the tabular case to linear function approximation, the regime relevant to multi-agent RL with large or continuous state spaces.

Findings

01

The obtained performance nearly matches an existing efficient result for single-agent RL under the same linear representation.

02

Finite-sample guarantees indicate the sample efficiency of Nash Q-learning with linear approximation.

03

The obtained performance has a polynomial gap when compared to the best-known result for the tabular case.

Abstract

Nash Q-learning may be considered one of the first and most known algorithms in multi-agent reinforcement learning (MARL) for learning policies that constitute a Nash equilibrium of an underlying general-sum Markov game. Its original proof provided asymptotic guarantees and was for the tabular case. Recently, finite-sample guarantees have been provided using more modern RL techniques for the tabular case. Our work analyzes Nash Q-learning using linear function approximation -- a representation regime introduced when the state space is large or continuous -- and provides finite-sample guarantees that indicate its sample efficiency. We find that the obtained performance nearly matches an existing efficient result for single-agent RL under the same representation and has a polynomial gap when compared to the best-known result for the tabular case.
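
To make the representation in the abstract concrete, here is a minimal sketch, not the paper's algorithm. It assumes each player's Q-value is an inner product of a weight vector and a joint-action feature vector, and it finds only pure-strategy equilibria of a two-player stage game by enumeration. The names `linear_q`, `stage_game`, `pure_nash`, and `one_hot` are hypothetical, and the one-hot features with prisoner's-dilemma payoffs are illustrative choices; Nash Q-learning in general works with mixed equilibria.

```python
from itertools import product

def linear_q(theta, phi):
    """Q-value as the inner product of a weight vector and a feature vector."""
    return sum(t * f for t, f in zip(theta, phi))

def stage_game(thetas, features, n_actions):
    """For one fixed state, build payoffs[i][(a1, a2)] = Q_i(s, a1, a2)."""
    joint = list(product(range(n_actions[0]), range(n_actions[1])))
    return [{a: linear_q(theta, features(a)) for a in joint} for theta in thetas]

def pure_nash(payoffs, n_actions):
    """Enumerate pure-strategy Nash equilibria of a two-player stage game."""
    equilibria = []
    for a1, a2 in product(range(n_actions[0]), range(n_actions[1])):
        # Player 1 cannot gain by deviating while player 2 keeps a2.
        best1 = all(payoffs[0][(a1, a2)] >= payoffs[0][(b, a2)]
                    for b in range(n_actions[0]))
        # Player 2 cannot gain by deviating while player 1 keeps a1.
        best2 = all(payoffs[1][(a1, a2)] >= payoffs[1][(a1, b)]
                    for b in range(n_actions[1]))
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria

# Hypothetical example: one-hot features over the four joint actions,
# so the weights are just the payoff entries (action 0 = cooperate, 1 = defect).
def one_hot(joint_action):
    a1, a2 = joint_action
    phi = [0.0] * 4
    phi[a1 * 2 + a2] = 1.0
    return phi

theta_1 = [3.0, 0.0, 5.0, 1.0]  # assumed weights for player 1
theta_2 = [3.0, 5.0, 0.0, 1.0]  # assumed weights for player 2
payoffs = stage_game([theta_1, theta_2], one_hot, (2, 2))
equilibria = pure_nash(payoffs, (2, 2))
```

With one-hot features the linear model reduces to the tabular case; lower-dimensional features are what allow the representation to cover large or continuous state spaces.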

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Game Theory and Applications

Methods: Q-Learning