Multi-agent Collaboration for Feasible Collaborative Behavior Construction and Evaluation
Yunkai Wang, Shenhan Jia, Zexi Chen, Zheyuan Huang, Rong Xiong

TL;DR
This paper introduces a reinforcement learning-based method for multi-agent collaboration that constructs feasible behavior sets and efficiently selects optimal actions, demonstrated through RoboCup robot passing tasks.
Contribution
It proposes a novel approach combining action space discretization, model-based prediction, and deep Q-learning for effective multi-agent collaboration in complex environments.
Findings
Efficient construction of feasible collaborative behavior sets.
Successful application to RoboCup robot passing.
Improved policy safety and calculation speed.
Abstract
For two-person zero-sum stochastic games with a central controller, this paper proposes a reinforcement-learning-based algorithm that searches for and selects the best collaborative behavior, addressing how the central controller should choose the best collaboration partner and action. Existing reinforcement learning methods for multi-agent collaboration and confrontation traverse all actions in a given state, which leads to long computation times and unsafe policy exploration. This paper instead constructs a feasible collaborative behavior set through action space discretization, models of both sides, model-based prediction, and parallel search. A scoring function is then trained with deep Q-learning to select the optimal collaborative behavior from the feasible collaborative behavior…
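The pipeline described in the abstract (discretize the action space, predict both sides with simple models, keep only feasible behaviors, then pick the best one with a scoring function) can be illustrated with a minimal sketch for the passing task. Everything specific here is an assumption made for illustration, not the authors' implementation: the constant-deceleration ball model, the constant-speed opponent model, the names `PassAction`, `feasible_set`, and `select_best`, and the hand-written `score` callable that stands in for the scoring function the paper trains with deep Q-learning. The search is also sequential here, whereas the paper describes a parallel search.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class PassAction:
    receiver: int   # index of the teammate chosen as collaboration partner
    speed: float    # initial ball speed in m/s (discretized)


def discretize_speeds(v_min: float, v_max: float, n: int) -> List[float]:
    """Evenly spaced candidate ball speeds (requires n >= 2)."""
    step = (v_max - v_min) / (n - 1)
    return [v_min + i * step for i in range(n)]


def ball_travel_time(distance: float, speed: float, decel: float) -> Optional[float]:
    """Time to cover `distance` under constant deceleration (decel > 0).

    Solves distance = speed*t - 0.5*decel*t^2 for the smaller root.
    Returns None when the ball stops before covering the distance.
    """
    disc = speed * speed - 2.0 * decel * distance
    if disc < 0.0:
        return None
    return (speed - math.sqrt(disc)) / decel


def is_feasible(ball: Point, target: Point, opponents: List[Point],
                speed: float, decel: float, opp_speed: float,
                n_samples: int = 20) -> bool:
    """Assumed feasibility test: the ball reaches the target, and no
    opponent moving at opp_speed can reach any sampled point on the
    pass line before the ball does."""
    dist = math.dist(ball, target)
    for k in range(1, n_samples + 1):
        frac = k / n_samples
        t = ball_travel_time(dist * frac, speed, decel)
        if t is None:
            return False  # ball stops short of this point
        p = (ball[0] + (target[0] - ball[0]) * frac,
             ball[1] + (target[1] - ball[1]) * frac)
        if any(math.dist(o, p) <= opp_speed * t for o in opponents):
            return False  # an opponent can intercept here
    return True


def feasible_set(ball: Point, teammates: List[Point], opponents: List[Point],
                 speeds: List[float], decel: float,
                 opp_speed: float) -> List[PassAction]:
    """Enumerate the discretized (receiver, speed) pairs and keep the feasible ones."""
    return [PassAction(i, v)
            for i, mate in enumerate(teammates)
            for v in speeds
            if is_feasible(ball, mate, opponents, v, decel, opp_speed)]


def select_best(actions: List[PassAction],
                score: Callable[[PassAction], float]) -> Optional[PassAction]:
    """Pick the highest-scoring feasible action, or None if the set is empty."""
    return max(actions, key=score) if actions else None


if __name__ == "__main__":
    speeds = discretize_speeds(2.0, 6.0, 3)
    candidates = feasible_set(ball=(0.0, 0.0),
                              teammates=[(4.0, 0.0), (0.0, 4.0)],
                              opponents=[(2.0, 0.1)],
                              speeds=speeds, decel=0.5, opp_speed=2.0)
    # Placeholder score that prefers slower passes; a trained network
    # would take this role in the method the abstract describes.
    print(select_best(candidates, score=lambda a: -a.speed))
```

Restricting the scoring step to the feasible set is what keeps exploration away from actions the models already predict will fail, and it shrinks the number of candidates the scoring function has to evaluate.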
Taxonomy
Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Evacuation and Crowd Dynamics
