Learning to Cooperate via Policy Search
Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau, Leslie Pack Kaelbling

TL;DR
This paper introduces a gradient-based distributed policy search method for cooperative games, addressing partial observability issues and comparing local optima to Nash equilibria, with experimental validation in a simulated soccer domain.
Contribution
It presents a novel policy search approach for cooperative games under partial observability and analyzes local optima versus Nash equilibria.
Findings
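The method is shown experimentally to be effective in a small, partially observable simulated soccer domain
Policy search is offered as an alternative to value-based methods, which require the game state to be completely observable to both agents
The notion of local optimum is compared to that of Nash equilibrium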
Abstract
Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
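To make the idea of gradient-based distributed policy search concrete, here is a minimal sketch. It is not the paper's algorithm or its soccer domain. It assumes a much simpler setting: a stateless, repeated two-agent matrix game with a shared payoff, softmax policies, and a plain REINFORCE-style update. The names `train`, `expected_payoff`, `theta_a`, `theta_b`, and the payoff matrix are illustrative assumptions. The point it shows is that each agent adjusts only its own parameters, using only its own action and the common reward.

```python
import math
import random


def softmax(prefs):
    """Turn a list of action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index according to the given probabilities."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def expected_payoff(payoff, pi_a, pi_b):
    """Expected shared payoff when the agents act independently."""
    return sum(
        pi_a[i] * pi_b[j] * payoff[i][j]
        for i in range(len(pi_a))
        for j in range(len(pi_b))
    )


def train(payoff, episodes=5000, lr=0.1, seed=0):
    """Distributed REINFORCE-style ascent on a shared-payoff matrix game.

    Assumption: a stateless game, so each policy is one softmax over actions.
    """
    rng = random.Random(seed)
    theta_a = [0.0] * len(payoff)      # agent A's private parameters
    theta_b = [0.0] * len(payoff[0])   # agent B's private parameters
    for _ in range(episodes):
        pi_a, pi_b = softmax(theta_a), softmax(theta_b)
        a, b = sample(pi_a, rng), sample(pi_b, rng)
        r = payoff[a][b]  # both agents receive the same reward
        # Each agent follows the gradient of log pi(own action) scaled by r.
        # Neither update reads the other agent's action or parameters.
        for i in range(len(theta_a)):
            theta_a[i] += lr * r * ((1.0 if i == a else 0.0) - pi_a[i])
        for j in range(len(theta_b)):
            theta_b[j] += lr * r * ((1.0 if j == b else 0.0) - pi_b[j])
    return softmax(theta_a), softmax(theta_b)


if __name__ == "__main__":
    # Hypothetical coordination game: payoff 1 only when actions match.
    game = [[1.0, 0.0], [0.0, 1.0]]
    pi_a, pi_b = train(game)
    print(pi_a, pi_b, expected_payoff(game, pi_a, pi_b))
```

In this toy game both matching action pairs are pure Nash equilibria, and the independent gradient steps drift toward one of them. Which one is reached depends on the random seed, which gives a small-scale feel for why the points a gradient method converges to are worth comparing with equilibria.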
Taxonomy
Topics: Reinforcement Learning in Robotics · Experimental Behavioral Economics Studies · Auction Theory and Applications
