Learning to Cooperate via Policy Search

Leonid Peshkin; Kee-Eung Kim; Nicolas Meuleau; Leslie Pack Kaelbling

arXiv:cs/0105032·cs.LG·May 25, 2017

TL;DR

This paper introduces a gradient-based distributed policy-search method for cooperative games in which the agents only partially observe the game state. It compares the notion of local optimum to that of Nash equilibrium and validates the method experimentally in a small simulated soccer domain.

Contribution

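The paper presents a gradient-based distributed policy-search method for cooperative games under partial observability, where value-based methods such as Q-learning variants do not apply, and compares the notion of local optimum to that of Nash equilibrium.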

Findings

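01 The method is effective in a small, partially observable simulated soccer domain

02 Policy search handles partial observability in cooperative games, where value-based methods such as Q-learning variants require a completely observable state

03 The notion of local optimum is compared to that of Nash equilibrium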

Abstract

Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
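To make the idea of distributed gradient-based policy search concrete, here is a minimal sketch. It is not the authors' implementation, and it rests on several assumptions of its own: a single-state, two-agent matrix game rather than the paper's partially observable soccer domain, softmax policies, and a REINFORCE-style gradient estimate. The names `train`, `is_pure_nash`, and the example payoff matrix are all hypothetical. Each agent updates its own parameters using only its own action and the shared payoff, which is the sense in which the search is distributed here.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an action index according to probs."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def train(payoff, episodes=2000, lr=0.1, seed=0):
    """Two independent learners climb the gradient of one shared payoff."""
    rng = random.Random(seed)
    theta_a = [0.0] * len(payoff)      # agent A's private parameters
    theta_b = [0.0] * len(payoff[0])   # agent B's private parameters
    for _ in range(episodes):
        pa, pb = softmax(theta_a), softmax(theta_b)
        a, b = sample(pa, rng), sample(pb, rng)
        r = payoff[a][b]               # both agents receive the same payoff
        # REINFORCE-style step: r * d/d(theta) log pi(own action).
        # Neither agent looks at the other's action or parameters.
        for i in range(len(theta_a)):
            theta_a[i] += lr * r * ((1.0 if i == a else 0.0) - pa[i])
        for j in range(len(theta_b)):
            theta_b[j] += lr * r * ((1.0 if j == b else 0.0) - pb[j])
    return softmax(theta_a), softmax(theta_b)


def is_pure_nash(payoff, a, b):
    """True if neither agent gains by unilaterally changing its action."""
    best_for_a = max(payoff[i][b] for i in range(len(payoff)))
    best_for_b = max(payoff[a][j] for j in range(len(payoff[0])))
    return payoff[a][b] == best_for_a and payoff[a][b] == best_for_b


if __name__ == "__main__":
    # Hypothetical coordination game: (0, 0) pays 1.0, (1, 1) pays 0.5.
    payoff = [[1.0, 0.0], [0.0, 0.5]]
    pa, pb = train(payoff)
    a = max(range(len(pa)), key=lambda i: pa[i])
    b = max(range(len(pb)), key=lambda j: pb[j])
    print("greedy joint action:", (a, b), "pure Nash:", is_pure_nash(payoff, a, b))
```

In this toy game both coordinated joint actions, (0, 0) and (1, 1), are pure Nash equilibria, but only (0, 0) maximizes the shared payoff. That is a small illustration of why a point that no single agent can improve on unilaterally need not be the best joint policy, which is the kind of distinction the abstract draws between local optima and Nash equilibria.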

Taxonomy

Topics: Reinforcement Learning in Robotics · Experimental Behavioral Economics Studies · Auction Theory and Applications