Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence

Sarath Pattathil; Kaiqing Zhang; Asuman Ozdaglar

arXiv:2210.12812 · math.OC · March 21, 2023 · 1 cites

Open Access

TL;DR

This paper analyzes the convergence issues of natural policy gradient methods in multi-agent reinforcement learning and proposes symmetric variants with guaranteed parameter convergence, supported by theoretical proofs and simulations.

Contribution

It identifies the non-convergence problem of vanilla NPG in multi-agent settings and introduces symmetric NPG variants with proven global last-iterate parameter convergence guarantees.

Findings

01. Vanilla NPG may not converge in parameters even with regularization.

02. Symmetric NPG variants achieve global last-iterate parameter convergence.

03. Simulations support theoretical convergence results.

Abstract

Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods have attracted surging research interest. We investigate the global convergence of natural policy gradient (NPG) algorithms in multi-agent learning. We first show that vanilla NPG may not have parameter convergence, i.e., the convergence of the vector that parameterizes the policy, even when the costs are regularized (which enabled strong convergence guarantees in the policy space in the literature). This non-convergence of parameters leads to stability issues in learning, which becomes especially relevant in the function approximation setting, where we can only operate on low-dimensional parameters, instead of the high-dimensional policy. We then propose variants of the NPG algorithm, for several standard multi-agent learning…
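
The distinction the abstract draws between convergence of the policy and convergence of the parameter vector can be made concrete with a small sketch. It assumes a softmax parameterization, which the abstract does not name, so treat that choice as an assumption. Under softmax, adding the same constant to every parameter leaves the policy unchanged, so a parameter sequence can drift without bound while the policy it induces stays fixed. The names `softmax`, `shift`, and `theta_star` are hypothetical, and this is neither the paper's counterexample nor its algorithm.

```python
import math


def softmax(theta):
    """Map a parameter vector to a probability distribution (the policy)."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in theta]
    total = sum(exps)
    return [e / total for e in exps]


def shift(theta, c):
    """Add the same constant c to every coordinate of theta."""
    return [x + c for x in theta]


def norm(theta):
    """Euclidean norm of the parameter vector."""
    return math.sqrt(sum(x * x for x in theta))


# Hypothetical parameter vector for a policy over three actions.
theta_star = [0.5, -0.2, 0.1]

# A drifting parameter sequence: theta_t = theta_star + t * (1, 1, 1).
params = [shift(theta_star, float(t)) for t in range(5)]
policies = [softmax(p) for p in params]
param_norms = [norm(p) for p in params]

for t, (pi, n) in enumerate(zip(policies, param_norms)):
    print(t, [round(p, 4) for p in pi], round(n, 4))
```

Each printed row shows the same policy next to a growing parameter norm. In this toy setting, a guarantee stated only in policy space therefore says nothing about whether the parameters themselves settle.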

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Machine Learning and ELM