Status-quo policy gradient in Multi-Agent Reinforcement Learning

Pinkesh Badjatiya; Mausoom Sarkar; Nikaash Puri; Jayakumar Subramanian; Abhishek Sinha; Siddharth Singh; Balaji Krishnamurthy

arXiv:2111.11692·cs.MA·November 24, 2021

TL;DR

This paper introduces a loss function based on the status-quo bias that lets reinforcement learning agents learn high-utility strategies in social dilemmas and in more complex multi-agent environments, where it outperforms existing methods.

Contribution

The paper proposes a novel status-quo loss (SQLoss) and policy gradient algorithm that incorporate human-like bias to improve multi-agent RL performance in social dilemmas.

Findings

1. SQLoss enables high-utility policies in social dilemma matrix games.

2. SQLoss outperforms state-of-the-art methods in visual input non-matrix games.

3. SQLoss promotes cooperative behavior in multi-agent settings like Braess' paradox.

Abstract

Individual rationality, which involves maximizing expected individual returns, does not always lead to high-utility individual or group outcomes in multi-agent problems. For instance, in multi-agent social dilemmas, Reinforcement Learning (RL) agents trained to maximize individual rewards converge to a low-utility mutually harmful equilibrium. In contrast, humans evolve useful strategies in such social dilemmas. Inspired by ideas from human psychology that attribute this behavior to the status-quo bias, we present a status-quo loss (SQLoss) and the corresponding policy gradient algorithm that incorporates this bias in an RL agent. We demonstrate that agents trained with SQLoss learn high-utility policies in several social dilemma matrix games (Prisoner's Dilemma, Stag Hunt matrix variant, Chicken Game). We show how SQLoss outperforms existing state-of-the-art methods to obtain…
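The low-utility equilibrium mentioned in the abstract can be illustrated with a one-shot Prisoner's Dilemma. The sketch below is not the paper's SQLoss or its training setup. It only checks, under assumed textbook payoff values, that defecting is each player's best response and that the resulting equilibrium is worse for the group than mutual cooperation. The payoff numbers and the helper names `best_response` and `pure_nash_equilibria` are hypothetical and chosen for illustration.

```python
# Actions: C = cooperate, D = defect.
C, D = 0, 1
ACTIONS = (C, D)

# Assumed textbook payoffs (not taken from the paper).
# Key: (row action, column action) -> (row payoff, column payoff).
PAYOFF = {
    (C, C): (-1, -1),
    (C, D): (-3, 0),
    (D, C): (0, -3),
    (D, D): (-2, -2),
}


def best_response(player, opponent_action):
    """Return the action that maximizes `player`'s own payoff
    against a fixed opponent action (player 0 = row, 1 = column)."""
    def own_payoff(own_action):
        if player == 0:
            joint = (own_action, opponent_action)
        else:
            joint = (opponent_action, own_action)
        return PAYOFF[joint][player]

    return max(ACTIONS, key=own_payoff)


def pure_nash_equilibria():
    """List joint actions where each player is best-responding to the other."""
    return [
        (a, b)
        for a in ACTIONS
        for b in ACTIONS
        if best_response(0, b) == a and best_response(1, a) == b
    ]


if __name__ == "__main__":
    print("Pure Nash equilibria:", pure_nash_equilibria())
    print("Group payoff, mutual cooperation:", sum(PAYOFF[(C, C)]))
    print("Group payoff, mutual defection:", sum(PAYOFF[(D, D)]))
```

With these assumed payoffs, an agent that only maximizes its own return defects regardless of what the other agent does, which is the individually rational but mutually harmful outcome that the abstract says standard RL agents converge to.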

Taxonomy

Topics: Experimental Behavioral Economics Studies · Evolutionary Game Theory and Cooperation · Reinforcement Learning in Robotics