Sub-optimal Policy Aided Multi-Agent Reinforcement Learning for Flocking Control

Yunbo Qiu; Yue Jin; Jian Wang; Xudong Zhang

arXiv:2209.08347 · cs.LG · September 20, 2022

Open Access

TL;DR

This paper introduces SPA-MARL, a sample-efficient multi-agent reinforcement learning algorithm that leverages sub-optimal policies to improve flocking control, reducing training time and outperforming baselines.

Contribution

It proposes SPA-MARL, which utilizes sub-optimal policies to enhance learning efficiency in multi-agent flocking control tasks.

Findings

01. SPA-MARL accelerates training compared to traditional MARL.
02. SPA-MARL outperforms the sub-optimal policy and baseline methods.
03. Using a classical control policy as prior improves learning efficiency.

Abstract

Flocking control is a challenging problem, where multiple agents, such as drones or vehicles, need to reach a target position while maintaining the flock and avoiding collisions with obstacles and collisions among agents in the environment. Multi-agent reinforcement learning has achieved promising performance in flocking control. However, methods based on traditional reinforcement learning require a considerable number of interactions between agents and the environment. This paper proposes a sub-optimal policy aided multi-agent reinforcement learning algorithm (SPA-MARL) to boost sample efficiency. SPA-MARL directly leverages a prior policy that can be manually designed or solved with a non-learning method to aid agents in learning, where the performance of the policy can be sub-optimal. SPA-MARL recognizes the difference in performance between the sub-optimal policy and itself, and…
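To make the idea of a manually designed, sub-optimal prior policy concrete, here is a minimal sketch of one possible hand-written flocking rule: each agent steers toward the target and is pushed away from neighbors that come too close. This is an illustration only, not the prior policy used in the paper. The function name `prior_flocking_action` and the parameters `safe_dist` and `max_speed` are hypothetical, and obstacles are ignored for brevity.

```python
import math


def prior_flocking_action(positions, i, target, safe_dist=1.0, max_speed=1.0):
    """Hypothetical hand-designed prior: velocity command for agent i in 2D.

    positions: list of (x, y) tuples, one per agent
    target:    (x, y) goal shared by the flock
    """
    px, py = positions[i]

    # Attraction: unit vector pointing from the agent to the target.
    tx, ty = target[0] - px, target[1] - py
    dist = math.hypot(tx, ty)
    vx, vy = (tx / dist, ty / dist) if dist > 1e-8 else (0.0, 0.0)

    # Repulsion: push away from any neighbor closer than safe_dist,
    # more strongly the closer that neighbor is.
    for j, (ox, oy) in enumerate(positions):
        if j == i:
            continue
        dx, dy = px - ox, py - oy
        d = math.hypot(dx, dy)
        if 1e-8 < d < safe_dist:
            gain = (safe_dist - d) / d
            vx += gain * dx
            vy += gain * dy

    # Clip the command to the assumed maximum speed.
    speed = math.hypot(vx, vy)
    if speed > max_speed:
        vx, vy = vx * max_speed / speed, vy * max_speed / speed
    return vx, vy


if __name__ == "__main__":
    flock = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
    for k in range(len(flock)):
        print(k, prior_flocking_action(flock, k, target=(10.0, 0.0)))
```

A rule like this reaches the target and avoids some collisions but is far from optimal, which is the kind of prior the abstract says can still aid learning.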

Taxonomy

Topics: Distributed Control Multi-Agent Systems · Reinforcement Learning in Robotics · UAV Applications and Optimization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings