Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

Stephen McAleer; JB Lanier; Kevin Wang; Pierre Baldi; Roy Fox; Tuomas Sandholm

arXiv:2207.06541 · cs.GT · July 15, 2022 · 1 citation

TL;DR

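Self-Play PSRO (SP-PSRO) adds an approximately optimal stochastic policy to the population in each iteration, alongside the usual deterministic best response. In two-player zero-sum games this empirically tends to reach an approximate Nash equilibrium faster than earlier methods such as APSRO.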

Contribution

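The paper proposes Self-Play PSRO (SP-PSRO), a PSRO variant that learns an approximately optimal stochastic policy in each iteration and adds it to the population together with the deterministic best response, with the aim of speeding up convergence to a Nash equilibrium.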

Findings

01. SP-PSRO converges faster than APSRO in empirical tests.

02. In many games, SP-PSRO converges within a few iterations.

03. SP-PSRO effectively adds stochastic policies to improve equilibrium approximation.

Abstract

In competitive two-agent environments, deep reinforcement learning (RL) methods based on the Double Oracle (DO) algorithm, such as Policy Space Response Oracles (PSRO) and Anytime PSRO (APSRO), iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies will approximate a Nash equilibrium. However, these methods might need to add all deterministic policies before converging. In this work, we introduce Self-Play PSRO (SP-PSRO), a method that adds an approximately optimal stochastic policy to the population in each iteration. Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well. As a result, SP-PSRO empirically tends to converge much faster than…
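To make the abstract's point concrete, here is a minimal tabular sketch of a Double Oracle loop on rock-paper-scissors. It is an illustration under stated assumptions, not the paper's method: the function names (`solve_restricted`, `double_oracle`, `exploitability`) are hypothetical, the restricted game is solved approximately with fictitious play instead of an exact solver, and best responses are exact table lookups in place of RL. The sketch shows that the deterministic population has to grow to all three pure policies before its mixture is close to a Nash equilibrium, whereas a single stochastic policy (uniform play) already has zero exploitability. This is the kind of gap that motivates adding stochastic policies to the population.

```python
# Row player's payoffs for rock-paper-scissors; the column player receives the negation.
RPS = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]

def row_values(payoff, col_mix):
    """Expected payoff of each pure row action against a column mixture."""
    return [sum(a * q for a, q in zip(row, col_mix)) for row in payoff]

def col_values(payoff, row_mix):
    """Expected payoff to the column player of each pure column action."""
    n_cols = len(payoff[0])
    return [-sum(row_mix[i] * payoff[i][j] for i in range(len(payoff)))
            for j in range(n_cols)]

def exploitability(payoff, row_mix, col_mix):
    """Sum of both players' best-response values; zero at a Nash equilibrium."""
    return max(row_values(payoff, col_mix)) + max(col_values(payoff, row_mix))

def solve_restricted(payoff, rows, cols, iters=20000):
    """Approximate the restricted game's Nash mixture with fictitious play (assumption)."""
    sub = [[payoff[i][j] for j in cols] for i in rows]
    row_counts = [1] + [0] * (len(rows) - 1)
    col_counts = [1] + [0] * (len(cols) - 1)
    for _ in range(iters):
        row_mix = [c / sum(row_counts) for c in row_counts]
        col_mix = [c / sum(col_counts) for c in col_counts]
        rv, cv = row_values(sub, col_mix), col_values(sub, row_mix)
        row_counts[rv.index(max(rv))] += 1
        col_counts[cv.index(max(cv))] += 1
    return ([c / sum(row_counts) for c in row_counts],
            [c / sum(col_counts) for c in col_counts])

def double_oracle(payoff):
    """Grow deterministic populations until no new best response appears."""
    rows, cols = [0], [0]  # each population starts with one deterministic policy
    history = []           # (population size, exploitability of restricted mixture)
    while True:
        row_sub, col_sub = solve_restricted(payoff, rows, cols)
        # Lift the restricted mixtures back to the full action space.
        row_mix = [0.0] * len(payoff)
        col_mix = [0.0] * len(payoff[0])
        for i, p in zip(rows, row_sub):
            row_mix[i] = p
        for j, q in zip(cols, col_sub):
            col_mix[j] = q
        history.append((len(rows), exploitability(payoff, row_mix, col_mix)))
        rv, cv = row_values(payoff, col_mix), col_values(payoff, row_mix)
        br_row, br_col = rv.index(max(rv)), cv.index(max(cv))
        if br_row in rows and br_col in cols:
            return row_mix, col_mix, history
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)

if __name__ == "__main__":
    _, _, history = double_oracle(RPS)
    for size, expl in history:
        print(f"population size {size}: exploitability {expl:.4f}")
    uniform = [1 / 3] * 3  # a single stochastic policy
    print("uniform policy exploitability:", exploitability(RPS, uniform, uniform))
```

In this toy game the loop only stops once every pure action is in both populations. SP-PSRO, as described in the abstract, learns its stochastic policy rather than being handed one, so the uniform policy here only stands in for the result of that learning step.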

Taxonomy

Topics: Reinforcement Learning in Robotics