Sub-optimal Policy Aided Multi-Agent Reinforcement Learning for Flocking Control
Yunbo Qiu, Yue Jin, Jian Wang, Xudong Zhang

TL;DR
This paper introduces SPA-MARL, a sample-efficient multi-agent reinforcement learning algorithm that leverages sub-optimal policies to improve flocking control, reducing training time and outperforming baselines.
Contribution
The paper proposes SPA-MARL, which uses a sub-optimal prior policy to improve learning efficiency in multi-agent flocking control tasks.
Findings
SPA-MARL accelerates training compared to traditional MARL.
SPA-MARL outperforms the sub-optimal policy and baseline methods.
Using a classical control policy as the prior improves learning efficiency.
Abstract
Flocking control is a challenging problem, where multiple agents, such as drones or vehicles, need to reach a target position while maintaining the flock and avoiding collisions with obstacles and collisions among agents in the environment. Multi-agent reinforcement learning has achieved promising performance in flocking control. However, methods based on traditional reinforcement learning require a considerable number of interactions between agents and the environment. This paper proposes a sub-optimal policy aided multi-agent reinforcement learning algorithm (SPA-MARL) to boost sample efficiency. SPA-MARL directly leverages a prior policy that can be manually designed or solved with a non-learning method to aid agents in learning, where the performance of the policy can be sub-optimal. SPA-MARL recognizes the difference in performance between the sub-optimal policy and itself, and…
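The abstract notes that the prior policy can be designed by hand and need not be optimal. As a rough illustration of what such a prior might look like, the sketch below combines the three flocking objectives named above (reach the target, keep the flock together, avoid collisions among agents) into one velocity command. This is not the paper's controller: the function name, the gains `k_goal`, `k_coh`, `k_sep`, the `safe_dist` threshold, and the 2D point-agent model are all assumptions made for illustration, and obstacle avoidance is omitted.

```python
import math


def prior_flocking_action(pos, neighbors, target,
                          k_goal=1.0, k_coh=0.5, k_sep=1.0,
                          safe_dist=1.0, max_speed=1.0):
    """Hypothetical hand-designed velocity command for one 2D agent.

    pos, target: (x, y) tuples; neighbors: list of (x, y) tuples.
    All gains and thresholds are illustrative assumptions.
    """
    # Goal term: steer toward the target position.
    vx = k_goal * (target[0] - pos[0])
    vy = k_goal * (target[1] - pos[1])

    if neighbors:
        # Cohesion term: steer toward the centroid of the neighbors.
        cx = sum(n[0] for n in neighbors) / len(neighbors)
        cy = sum(n[1] for n in neighbors) / len(neighbors)
        vx += k_coh * (cx - pos[0])
        vy += k_coh * (cy - pos[1])

    for n in neighbors:
        # Separation term: push away from neighbors closer than safe_dist,
        # with a push that grows as the gap shrinks.
        dx, dy = pos[0] - n[0], pos[1] - n[1]
        dist = math.hypot(dx, dy)
        if 0.0 < dist < safe_dist:
            push = k_sep * (safe_dist - dist) / dist
            vx += push * dx
            vy += push * dy

    # Clip the command to the maximum allowed speed.
    speed = math.hypot(vx, vy)
    if speed > max_speed:
        vx, vy = vx * max_speed / speed, vy * max_speed / speed
    return vx, vy
```

A rule like this is cheap to evaluate and gives reasonable behavior in simple cases, but fixed gains cannot trade off the three terms well in every situation, which is the sense in which such a prior is sub-optimal.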
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Reinforcement Learning in Robotics · UAV Applications and Optimization
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
