Learning to Cooperate via Policy Search
Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau, Leslie Pack Kaelbling

TL;DR
This paper introduces a gradient-based distributed policy search method for cooperative games under partial observability, compares the notion of local optimum with that of Nash equilibrium, and demonstrates the method experimentally in a small simulated soccer domain.
Contribution
The paper presents a gradient-based, distributed policy search method for cooperative games under partial observability and compares the notion of local optimum with that of Nash equilibrium.
Findings
The method learns effective cooperative policies in a small, partially observable simulated soccer domain.
Local optima can differ from Nash equilibria in cooperative game settings.
Policy search is a viable alternative where value-based methods such as Q-learning, which require the game state to be completely observable to both agents, do not apply.
Abstract
Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
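As a rough illustration of the idea in the abstract, the sketch below runs a REINFORCE-style gradient update independently in each of two agents that receive the same payoff. It is not the paper's algorithm: the one-shot matrix game, the softmax parameterization, and the names `softmax`, `sample`, and `train` are all assumptions made for the example, and the game has no state or observations at all.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an action index according to probs."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def train(payoff, episodes=5000, lr=0.1, seed=0):
    """Hypothetical distributed policy search on a shared-payoff matrix game.

    payoff[a][b] is the single reward both agents receive when agent 0
    plays a and agent 1 plays b. Each agent updates only its own
    parameters, using its own action and the shared reward.
    """
    rng = random.Random(seed)
    theta = [[0.0] * len(payoff), [0.0] * len(payoff[0])]
    for _ in range(episodes):
        probs = [softmax(t) for t in theta]
        actions = [sample(p, rng) for p in probs]
        reward = payoff[actions[0]][actions[1]]
        for agent in range(2):
            for a in range(len(theta[agent])):
                # Gradient of log softmax: indicator(a chosen) - pi(a).
                indicator = 1.0 if a == actions[agent] else 0.0
                theta[agent][a] += lr * reward * (indicator - probs[agent][a])
    return [softmax(t) for t in theta]


if __name__ == "__main__":
    # Assumed toy game: only the joint action (0, 0) pays off.
    print(train([[1.0, 0.0], [0.0, 0.0]]))
```

Neither agent reads the other's action or parameters during the update; the shared reward is the only coupling between them. In a game with several payoff peaks, a gradient procedure of this kind may settle on a local optimum rather than the best joint action.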
Taxonomy
Topics: Reinforcement Learning in Robotics · Experimental Behavioral Economics Studies · Auction Theory and Applications
