SA-IGA: A Multiagent Reinforcement Learning Method Towards Socially Optimal Outcomes

Chengwei Zhang; Xiaohong Li; Jianye Hao; Siqi Chen; Karl Tuyls; Wanli Xue

arXiv:1803.03021·cs.AI·March 9, 2018

Open Access

TL;DR

This paper introduces SA-IGA, a gradient ascent multiagent reinforcement learning algorithm that promotes socially optimal outcomes and demonstrates superior performance and robustness in multiagent environments.

Contribution

It proposes a novel social awareness augmentation to gradient ascent algorithms and provides theoretical analysis and a practical Q-learning based implementation.

Findings

1. SA-IGA exhibits linear dynamics in many game types.

2. SA-PGA achieves higher social welfare than previous methods.

3. SA-PGA is robust against rational opponents and reaches Nash equilibrium.

Abstract

In multiagent environments, the capability of learning is important for an agent to behave appropriately in the face of unknown opponents and dynamic environments. From the system designer's perspective, it is desirable if the agents can learn to coordinate towards socially optimal outcomes while also avoiding being exploited by selfish opponents. To this end, we propose a novel gradient-ascent-based algorithm (SA-IGA) which augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory, and SA-IGA is shown to have linear dynamics for a wide range of games, including symmetric games. The learning dynamics of two representative games (the prisoner's dilemma game and the coordination game) are analyzed in detail. Based on the idea of SA-IGA, we further propose…
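As a rough illustration of what "incorporating social awareness into the policy update" could look like, the sketch below runs projected gradient ascent in a two-player prisoner's dilemma. It is not the paper's update rule, which this excerpt does not state. The payoff values, the fixed mixing weight `w`, the step size `eta`, and all function names are assumptions made for illustration. Each player ascends a blend of the gradient of its own expected payoff and the gradient of the summed payoff (social welfare).

```python
import numpy as np

# Prisoner's dilemma payoffs (illustrative values, not taken from the paper).
# Action 0 = cooperate, action 1 = defect.
R = np.array([[3.0, 0.0],
              [5.0, 1.0]])   # row player's payoff, indexed [row action, column action]
C = R.T                      # column player's payoff (symmetric game)
S = R + C                    # social welfare: sum of both payoffs


def grad_p(M, q):
    """Derivative of the expected value of M w.r.t. p, the row player's P(action 0)."""
    y = np.array([q, 1.0 - q])
    return float((M[0] - M[1]) @ y)


def grad_q(M, p):
    """Derivative of the expected value of M w.r.t. q, the column player's P(action 0)."""
    x = np.array([p, 1.0 - p])
    return float(x @ (M[:, 0] - M[:, 1]))


def run(w, steps=2000, eta=0.01, p=0.5, q=0.5):
    """Simultaneous projected gradient ascent with a fixed weight w (an assumption).

    w = 1 is purely selfish gradient ascent; w = 0 ascends social welfare only.
    Returns the final cooperation probabilities (p, q).
    """
    for _ in range(steps):
        dp = w * grad_p(R, q) + (1.0 - w) * grad_p(S, q)
        dq = w * grad_q(C, p) + (1.0 - w) * grad_q(S, p)
        # Update both players from the old (p, q), then project back onto [0, 1].
        p = min(1.0, max(0.0, p + eta * dp))
        q = min(1.0, max(0.0, q + eta * dq))
    return p, q


if __name__ == "__main__":
    print("selfish (w=1):", run(1.0))
    print("social  (w=0):", run(0.0))
```

With these illustrative payoffs, the purely selfish setting drives both players to mutual defection and the purely social setting drives them to mutual cooperation, which is the tension between exploitation and social optimality that the abstract describes.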

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research