SA-IGA: A Multiagent Reinforcement Learning Method Towards Socially Optimal Outcomes
Chengwei Zhang, Xiaohong Li, Jianye Hao, Siqi Chen, Karl Tuyls, Wanli Xue

TL;DR
This paper introduces SA-IGA, a gradient ascent multiagent reinforcement learning algorithm that promotes socially optimal outcomes and demonstrates superior performance and robustness in multiagent environments.
Contribution
It proposes a novel social awareness augmentation to gradient ascent algorithms and provides theoretical analysis and a practical Q-learning based implementation.
Findings
SA-IGA exhibits linear dynamics in many game types.
SA-PGA achieves higher social welfare than previous methods.
SA-PGA is robust against rational opponents and reaches a Nash equilibrium.
Abstract
In multiagent environments, the capability of learning is important for an agent to behave appropriately in the face of unknown opponents and dynamic environments. From the system designer's perspective, it is desirable if the agents can learn to coordinate towards socially optimal outcomes, while also avoiding being exploited by selfish opponents. To this end, we propose a novel gradient-ascent-based algorithm (SA-IGA), which augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory, and SA-IGA is shown to have linear dynamics for a wide range of games, including symmetric games. The learning dynamics of two representative games (the prisoner's dilemma game and the coordination game) are analyzed in detail. Based on the idea of SA-IGA, we further propose…
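The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea: gradient ascent in a two-player, two-action game where each agent climbs a weighted mix of its own expected payoff and the average payoff of both players. The fixed weight `w`, the payoff values, the step size, and all function names are assumptions made for illustration; this is not the paper's SA-IGA update.

```python
# Illustrative sketch only: gradient ascent on a blend of individual and
# social payoff in the prisoner's dilemma. The fixed weight w and the
# payoff numbers are assumptions, not taken from the paper.

# Payoffs indexed [row_action][col_action]; action 0 = cooperate, 1 = defect.
A = [[3.0, 0.0], [5.0, 1.0]]  # row player's payoffs
B = [[3.0, 5.0], [0.0, 1.0]]  # column player's payoffs


def grad_wrt_p(M, q):
    """d/dp of the expected value of M, where p = P(row cooperates)."""
    return q * (M[0][0] - M[1][0]) + (1 - q) * (M[0][1] - M[1][1])


def grad_wrt_q(M, p):
    """d/dq of the expected value of M, where q = P(column cooperates)."""
    return p * (M[0][0] - M[0][1]) + (1 - p) * (M[1][0] - M[1][1])


def clip(x):
    """Keep a probability inside [0, 1]."""
    return max(0.0, min(1.0, x))


def simulate(w, steps=2000, eta=0.01, p=0.5, q=0.5):
    """Run simultaneous gradient ascent for both players.

    w = 1.0 means purely selfish; w = 0.0 means each agent maximizes the
    average payoff of the two players (the 'social' objective here).
    """
    for _ in range(steps):
        # Gradient of the social payoff (mean of both players' payoffs).
        social_p = 0.5 * (grad_wrt_p(A, q) + grad_wrt_p(B, q))
        social_q = 0.5 * (grad_wrt_q(A, p) + grad_wrt_q(B, p))
        # Blend individual and social gradients with weight w.
        dp = w * grad_wrt_p(A, q) + (1 - w) * social_p
        dq = w * grad_wrt_q(B, p) + (1 - w) * social_q
        p, q = clip(p + eta * dp), clip(q + eta * dq)
    return p, q


if __name__ == "__main__":
    print("selfish (w=1):", simulate(1.0))
    print("social  (w=0):", simulate(0.0))
```

Under these assumed payoffs, the purely selfish setting drives both cooperation probabilities to 0 (mutual defection), while the purely social setting drives them to 1 (mutual cooperation), which illustrates why blending in a social term can move learners toward the socially optimal outcome.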
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research
