Polymatrix Competitive Gradient Descent

Jeffrey Ma, Alistair Letcher, Florian Schäfer, Yuanyuan Shi, and Anima Anandkumar

arXiv:2111.08565 · cs.LG · November 17, 2021

Open Access

TL;DR

This paper introduces Polymatrix Competitive Gradient Descent (PCGD), a new method for solving multi-agent competitive optimization problems that converges locally and outperforms existing algorithms in multi-agent reinforcement learning tasks.

Contribution

The paper proposes PCGD, a novel algorithm for general-sum multi-agent games, with proven local convergence and efficient computation, applicable to multi-agent reinforcement learning.

Findings

01. PCGD converges locally to stable fixed points in n-player general-sum games.

02. PCGD outperforms existing methods like simultaneous gradient descent and extragradient in speed and effectiveness.

03. Agents trained with PCGD achieve better performance in multi-agent RL environments.

Abstract

Many economic games and machine learning approaches can be cast as competitive optimization problems where multiple agents are minimizing their respective objective functions, which depend on all agents' actions. While gradient descent is a reliable basic workhorse for single-agent optimization, it often leads to oscillation in competitive optimization. In this work we propose polymatrix competitive gradient descent (PCGD) as a method for solving general-sum competitive optimization involving arbitrary numbers of agents. The updates of our method are obtained as the Nash equilibria of a local polymatrix approximation with a quadratic regularization, and can be computed efficiently by solving a linear system of equations. We prove local convergence of PCGD to stable fixed points for $n$-player general-sum games, and show that it does not require adapting the step size to the strength of…
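The linear-system update described in the abstract can be illustrated with a minimal sketch. It assumes each player $i$ picks its step $\Delta x_i$ to minimize a local model consisting of its own gradient term, bilinear interaction terms with every other player's step, and a regularizer $\frac{1}{2\eta}\|\Delta x_i\|^2$. Setting each player's first-order condition to zero then gives $(I + \eta H_o)\,\Delta x = -\eta\,\xi$, where $\xi$ stacks the players' own gradients and $H_o$ holds the mixed second derivatives with a zero diagonal. The toy three-player game, the names `pcgd_step`, `stacked_gradient`, `run_pcgd`, and `run_simgd`, and the step size are all hypothetical choices for illustration and are not taken from the paper's code.

```python
import numpy as np

def pcgd_step(x, xi, H_o, eta):
    """One update of the assumed form x + dx, with (I + eta * H_o) dx = -eta * xi.

    x   : stacked strategies of all players
    xi  : stacked gradients, each player's loss w.r.t. its own strategy
    H_o : mixed second derivatives between players (zero diagonal blocks)
    eta : step size
    """
    n = x.shape[0]
    dx = np.linalg.solve(np.eye(n) + eta * H_o, -eta * xi)
    return x + dx

# Hypothetical toy game with three scalar players:
#   f_i(x) = 0.5 * a * x_i**2 + x_i * sum_j B[i, j] * x_j
# B is antisymmetric, so the pairwise interactions are purely competitive.
a = 0.1
B = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])

def stacked_gradient(x):
    # d f_i / d x_i for every player i (B has a zero diagonal).
    return a * x + B @ x

def run_pcgd(x0, eta=0.5, steps=500):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        # For this game the mixed-derivative matrix H_o is exactly B.
        x = pcgd_step(x, stacked_gradient(x), B, eta)
    return x

def run_simgd(x0, eta=0.5, steps=200):
    # Baseline: simultaneous gradient descent with the same step size.
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * stacked_gradient(x)
    return x

if __name__ == "__main__":
    x0 = [1.0, -0.5, 0.25]
    print("PCGD distance to equilibrium:  ", np.linalg.norm(run_pcgd(x0)))
    print("SimGD distance to equilibrium: ", np.linalg.norm(run_simgd(x0)))
```

On this toy game the sketch's iterates shrink toward the equilibrium at the origin, while simultaneous gradient descent with the same step size spirals outward, which is the oscillation problem the abstract refers to. A practical implementation would likely replace the dense `np.linalg.solve` call with an iterative solver driven by Hessian-vector products; that is a design assumption here, not something stated in the text above.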

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques