MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning
Kefan Su, Siyuan Zhou, Jiechuan Jiang, Chuang Gan, Xiangjun Wang, Zongqing Lu

TL;DR
This paper introduces MA2QL, a simple yet theoretically grounded decentralized multi-agent reinforcement learning method where agents alternate Q-learning updates, effectively addressing non-stationarity and outperforming independent Q-learning in cooperative tasks.
Contribution
The paper proposes MA2QL, a minimalist, fully decentralized MARL algorithm with theoretical convergence guarantees, requiring minimal modifications to existing independent Q-learning methods.
Findings
MA2QL outperforms independent Q-learning in various cooperative tasks.
When each agent attains ε-convergence at its turn, the alternating updates drive the joint policy to a Nash equilibrium.
Minimal changes to existing Q-learning suffice for effective decentralized learning.
Abstract
Decentralized learning has shown great promise for cooperative multi-agent reinforcement learning (MARL). However, non-stationarity remains a significant challenge in fully decentralized learning. In this paper, we tackle the non-stationarity problem in the simplest and most fundamental way and propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning. MA2QL is a minimalist approach to fully decentralized cooperative MARL but is theoretically grounded. We prove that when each agent guarantees ε-convergence at each turn, their joint policy converges to a Nash equilibrium. In practice, MA2QL only requires minimal changes to independent Q-learning (IQL). We empirically evaluate MA2QL on a variety of cooperative multi-agent tasks. Results show MA2QL consistently outperforms IQL, which verifies the effectiveness of MA2QL, despite…
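The turn-taking idea in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the 3x3 payoff matrix, the function names (`ma2ql_sketch`, `is_nash`), the hyperparameters, and the choice to let only the current learner explore are all assumptions made for illustration. The game has a single state, so the Q-learning target reduces to the immediate shared reward.

```python
import random

# Hypothetical 2-agent cooperative matrix game (an assumption, not from the paper).
# PAYOFF[a0][a1] is the reward shared by both agents.
PAYOFF = [
    [8.0, 0.0, 0.0],
    [0.0, 6.0, 5.0],
    [0.0, 5.0, 7.0],
]
N_ACTIONS = 3


def greedy(q):
    """Index of the largest Q-value (first one wins on ties)."""
    return max(range(len(q)), key=lambda a: q[a])


def ma2ql_sketch(n_rounds=10, steps_per_turn=300, alpha=0.5, explore=0.5, seed=0):
    """Alternate Q-learning: one agent updates per turn, the other stays fixed."""
    rng = random.Random(seed)
    # Each agent keeps a Q-table over its OWN actions only (fully decentralized).
    q = [[rng.random() for _ in range(N_ACTIONS)] for _ in range(2)]
    for _ in range(n_rounds):
        for learner in range(2):  # agents take turns
            for _ in range(steps_per_turn):
                # Both agents act greedily; only the learner explores (assumption).
                actions = [greedy(q[0]), greedy(q[1])]
                if rng.random() < explore:
                    actions[learner] = rng.randrange(N_ACTIONS)
                reward = PAYOFF[actions[0]][actions[1]]
                a = actions[learner]
                # Single-state game: no bootstrapped next-state term.
                q[learner][a] += alpha * (reward - q[learner][a])
                # The other agent's Q-table is untouched, so from the learner's
                # point of view the environment is stationary during this turn.
    return q, (greedy(q[0]), greedy(q[1]))


def is_nash(joint):
    """True if neither agent can raise the shared reward by deviating alone."""
    a0, a1 = joint
    current = PAYOFF[a0][a1]
    best_dev_0 = max(PAYOFF[b][a1] for b in range(N_ACTIONS))
    best_dev_1 = max(PAYOFF[a0][b] for b in range(N_ACTIONS))
    return current >= best_dev_0 and current >= best_dev_1


if __name__ == "__main__":
    for seed in range(5):
        _, joint = ma2ql_sketch(seed=seed)
        print(seed, joint, PAYOFF[joint[0]][joint[1]], is_nash(joint))
```

In this toy game every diagonal cell is a Nash equilibrium, so different random initializations may settle on different equilibria, and not necessarily the one with the highest reward. That matches the abstract's claim, which concerns convergence to a Nash equilibrium rather than to the global optimum. Removing the outer turn structure and letting both agents update at every step would give an IQL-style baseline for comparison.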
Taxonomy
Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Distributed Control Multi-Agent Systems
Methods: Q-Learning · Implicit Q-Learning
