Toward Finding Strong Pareto Optimal Policies in Multi-Agent Reinforcement Learning
Bang Giang Le, Viet Cuong Ta

TL;DR
This paper introduces MGDA++, an improved variant of the Multiple Gradient Descent algorithm (MGDA) for cooperative multi-agent reinforcement learning that finds strong Pareto optimal policies, addressing the weak Pareto convergence issue of standard MGDA.
Contribution
We propose MGDA++, an enhanced algorithm that guarantees convergence to strong Pareto optimal solutions in cooperative multi-agent reinforcement learning.
Findings
MGDA++ converges efficiently in convex, smooth bi-objective problems.
MGDA++ outperforms standard MGDA and other methods on the Gridworld benchmark.
Standard MGDA suffers from weak Pareto convergence issues in multi-agent settings.
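The distinction between weak and strong Pareto optimality that the findings rely on can be stated with the standard textbook definitions. The notation below is an assumption made for illustration and is not taken from the paper: $J_i(\pi)$ denotes the expected return of agent $i$ under joint policy $\pi$, with $n$ agents and returns to be maximized.

```latex
% Weak Pareto optimality: no policy strictly improves every agent at once.
\pi \text{ is weakly Pareto optimal} \iff
  \nexists\, \pi' :\; J_i(\pi') > J_i(\pi) \quad \forall i \in \{1,\dots,n\}

% Strong Pareto optimality: no policy improves some agent without hurting another.
\pi \text{ is strongly Pareto optimal} \iff
  \nexists\, \pi' :\; J_i(\pi') \ge J_i(\pi) \;\; \forall i
  \;\text{ and }\; J_j(\pi') > J_j(\pi) \text{ for some } j
```

Every strongly Pareto optimal policy is also weakly Pareto optimal, but not the other way around: a weakly Pareto optimal policy may leave one agent's return improvable at no cost to the others.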
Abstract
In this work, we study the problem of finding Pareto optimal policies in multi-agent reinforcement learning problems with cooperative reward structures. We show that any algorithm in which each agent optimizes only its own reward is subject to suboptimal convergence. Therefore, to achieve Pareto optimality, agents have to act altruistically by considering the rewards of others. This observation connects the multi-objective optimization framework with multi-agent reinforcement learning. We first propose a framework for applying the Multiple Gradient Descent algorithm (MGDA) to learning in multi-agent settings. We further show that standard MGDA is subject to weak Pareto convergence, a problem that is often overlooked in other learning settings but is prevalent in multi-agent reinforcement learning. To mitigate this issue, we propose MGDA++, an improvement of the existing algorithm…
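To make the weak Pareto convergence issue concrete, here is a minimal sketch of a standard two-objective MGDA step on a toy problem. Everything in it is an assumption chosen for illustration rather than the paper's setup: the objectives f1(x, y) = x^2 and f2(x, y) = x^2 + y^2 (both minimized), the step size, and the function names `mgda_direction`, `grads`, and `run_mgda`. It uses the well-known closed form for the minimum-norm point in the convex hull of two gradients, and it does not implement MGDA++.

```python
import numpy as np


def mgda_direction(g1, g2):
    """Minimum-norm point in the convex hull of two gradients (two-objective MGDA)."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        # Identical gradients: any convex combination is the same vector.
        return g1.copy()
    # Closed-form minimizer of ||a*g1 + (1-a)*g2||^2 over a in [0, 1].
    alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2


def grads(p):
    """Gradients of the toy objectives f1 = x^2 and f2 = x^2 + y^2 (hypothetical example)."""
    x, y = p
    g1 = np.array([2.0 * x, 0.0])
    g2 = np.array([2.0 * x, 2.0 * y])
    return g1, g2


def run_mgda(p0, lr=0.1, steps=200):
    """Plain gradient descent along the MGDA common descent direction."""
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = p - lr * mgda_direction(*grads(p))
    return p


final = run_mgda([1.0, 1.0])
print(final)  # x shrinks toward 0 while y never moves from its initial value
```

In this toy problem the minimum-norm combination always coincides with the gradient of f1, so the update only ever changes x. The iterate settles near (0, 1), where f1 cannot be improved and the MGDA direction vanishes, even though moving to (0, 0) would lower f2 without raising f1. The limit point is therefore weakly but not strongly Pareto optimal, which is the kind of stall the abstract attributes to standard MGDA.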
Taxonomy
Topics: Reinforcement Learning in Robotics
