Mixed Q-Functionals: Advancing Value-Based Methods in Cooperative MARL with Continuous Action Domains
Yasin Findik, S. Reza Ahmadzadeh

TL;DR
This paper introduces Mixed Q-Functionals (MQF), a novel value-based algorithm for cooperative multi-agent reinforcement learning in continuous action spaces that improves sample efficiency and performance over DDPG-based baselines.
Contribution
The paper proposes MQF, a new multi-agent value-based method that evaluates multiple actions simultaneously and enhances collaboration in continuous domains.
Findings
MQF outperforms four variants of Deep Deterministic Policy Gradient.
MQF achieves faster action evaluation and higher sample efficiency.
MQF demonstrates superior performance in six cooperative multi-agent scenarios.
Abstract
Tackling multi-agent learning problems efficiently is challenging in continuous action domains. While value-based algorithms excel in sample efficiency in discrete action domains, they are usually inefficient when dealing with continuous actions. Policy-based algorithms, on the other hand, attempt to address this challenge by leveraging critic networks to guide the learning process and stabilize gradient estimation. However, limitations in estimating the true return and a tendency to fall into local optima lead these methods to inefficient and often sub-optimal policies. In this paper, we diverge from the trend of further enhancing critic networks and instead focus on improving the effectiveness of value-based methods in multi-agent continuous domains by concurrently evaluating numerous actions. We propose a novel multi-agent value-based algorithm, Mixed Q-Functionals…
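The abstract does not spell out how many actions are evaluated at once, so the following is only a minimal sketch under stated assumptions. It assumes Q(s, ·) is represented as a state-dependent coefficient vector dotted with basis features of the action, so that scoring a candidate action costs one dot product (a single matrix-vector product over all candidates in a vectorised implementation). The fixed coefficient lists stand in for a network's output at state s, the per-dimension polynomial basis (no cross terms) is a simplifying choice, and the additive, VDN-style mixer is a hypothetical stand-in for whatever mixing MQF actually uses. The names `polynomial_basis`, `evaluate_actions`, `greedy_action` and `additive_mix` are illustrative, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def polynomial_basis(action: Sequence[float], degree: int) -> List[float]:
    """Features [1, a_1, ..., a_1^degree, a_2, ...]; no cross terms (simplifying assumption)."""
    feats = [1.0]
    for a in action:
        feats.extend(a ** k for k in range(1, degree + 1))
    return feats


def evaluate_actions(coeffs: Sequence[float],
                     actions: Sequence[Sequence[float]],
                     degree: int) -> List[float]:
    """Q(s, a) = <coeffs(s), phi(a)> for every candidate a.

    `coeffs` is a hypothetical stand-in for a network's output at state s.
    """
    return [sum(c * f for c, f in zip(coeffs, polynomial_basis(a, degree)))
            for a in actions]


def greedy_action(coeffs: Sequence[float],
                  actions: Sequence[Sequence[float]],
                  degree: int) -> Tuple[Sequence[float], float]:
    """Score all candidates, then return the best action and its Q-value."""
    values = evaluate_actions(coeffs, actions, degree)
    best = max(range(len(actions)), key=values.__getitem__)
    return actions[best], values[best]


def additive_mix(per_agent_q: Sequence[float]) -> float:
    """Hypothetical VDN-style mixer: joint Q is the sum of per-agent Q-values."""
    return sum(per_agent_q)


if __name__ == "__main__":
    degree = 2
    # Hypothetical coefficients for two agents with 1-D actions, phi(a) = [1, a, a^2]
    agent_coeffs = [
        [0.0, 1.0, -1.0],   # Q_1(a) = a - a^2, best at a = 0.5
        [0.5, -2.0, -1.0],  # Q_2(a) = 0.5 - 2a - a^2, best at a = -1.0 on [-1, 1]
    ]
    # 101 evenly spaced candidate actions in [-1, 1]
    candidates = [[-1.0 + 2.0 * i / 100] for i in range(101)]
    picks = [greedy_action(c, candidates, degree) for c in agent_coeffs]
    joint_q = additive_mix([q for _, q in picks])
    print([a for a, _ in picks], joint_q)  # prints [[0.5], [-1.0]] 1.75
```

With an additive mixer, each agent's greedy choice also maximises the joint value, so decentralised action selection stays consistent with the centralised objective in this toy setting; richer, non-additive mixers would not guarantee this without further constraints.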