Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Sriram Srinivasan; Marc Lanctot; Vinicius Zambaldi; Julien Perolat; Karl Tuyls; Remi Munos; Michael Bowling

arXiv:1810.09026 · cs.LG · June 15, 2020 · 71 cites

TL;DR

This paper explores actor-critic policy optimization methods in partially observable multiagent environments, providing new convergence guarantees and demonstrating effective learning in complex adversarial games like Poker.

Contribution

The paper introduces several policy update rules, relates them to regret minimization to obtain convergence guarantees in the one-shot and tabular cases, and applies them to model-free multiagent RL in zero-sum imperfect information games.

Findings

1. Achieved empirical convergence to approximate Nash equilibria in benchmark Poker domains.
2. Demonstrated performance comparable to or better than baseline algorithms.
3. Required no domain-specific state-space reductions.

Abstract

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect information games), using RL-style function approximation. We evaluate on commonly used benchmark Poker domains, showing performance against…
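To make the regret-minimization foundation mentioned in the abstract concrete, here is a minimal sketch of classic regret matching in a one-shot zero-sum game. It is not one of the paper's policy update rules; the rock-paper-scissors payoff matrix, the function names, the non-zero initial regrets, and the iteration count are all assumptions chosen for illustration.

```python
# Illustrative sketch: generic regret matching in self-play on
# rock-paper-scissors. Everything here is a hypothetical example,
# not the paper's algorithm.

# Payoff to the row player; the column player receives the negation.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def regret_matching_policy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n  # no positive regret: fall back to uniform
    return [p / total for p in positive]


def self_play(num_iters=10000):
    """Run both players with regret matching; return their average policies."""
    n = len(PAYOFF)
    # Assumption: start from asymmetric regrets so the dynamics are not
    # stuck at the uniform policy from the first step.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    policy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(num_iters):
        pi = [regret_matching_policy(r) for r in regrets]
        # Expected value of each action against the opponent's current policy.
        q_row = [sum(PAYOFF[a][b] * pi[1][b] for b in range(n)) for a in range(n)]
        q_col = [sum(-PAYOFF[a][b] * pi[0][a] for a in range(n)) for b in range(n)]
        for player, q in enumerate((q_row, q_col)):
            value = sum(p * x for p, x in zip(pi[player], q))
            for a in range(n):
                # Regret: how much better action a would have done than the policy.
                regrets[player][a] += q[a] - value
                policy_sums[player][a] += pi[player][a]
    return [[s / num_iters for s in sums] for sums in policy_sums]
```

Calling `self_play()` returns one average policy per player. In a two-player zero-sum game, the average policies of regret-minimizing players approach a Nash equilibrium, which for rock-paper-scissors is the uniform policy; the current (non-averaged) policies may keep cycling.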

Code & Models

Repositories

deepmind/open_spiel (official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Sports Analytics and Performance