Dashing for the Golden Snitch: Multi-Drone Time-Optimal Motion Planning with Multi-Agent Reinforcement Learning
Xian Wang, Jin Zhou, Yuanli Feng, Jiahao Mei, Jiming Chen, Shuo Li

TL;DR
This paper introduces a decentralized multi-agent reinforcement learning approach for time-optimal multi-drone flight, balancing efficiency and collision avoidance, validated through extensive simulations and real-world experiments.
Contribution
It develops a novel PPO-based multi-agent policy with a soft collision mechanism for efficient, stable, and lightweight multi-drone time-optimal motion planning.
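To make the idea of a soft collision mechanism concrete, here is a minimal sketch of one way a collision term could enter the reward as a penalty rather than as a hard constraint. This summary does not give the paper's actual penalty form, so the quadratic hinge, the `safe_dist` threshold, the `weight` value, and the function names are all illustrative assumptions.

```python
import math

def soft_collision_penalty(positions, safe_dist=0.5, weight=1.0):
    """Return one penalty per drone (hypothetical form, not the paper's).

    The penalty is zero while every neighbor is at least `safe_dist` away
    and grows quadratically as a neighbor gets closer, so the policy is
    discouraged from, rather than forbidden from, flying close to others.
    """
    penalties = [0.0] * len(positions)
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i == j:
                continue
            d = math.dist(p, q)  # Euclidean distance between drones i and j
            if d < safe_dist:
                penalties[i] += weight * (safe_dist - d) ** 2
    return penalties

def step_reward(progress, penalty):
    """Trade progress toward the goal against the soft collision penalty."""
    return progress - penalty

# Three drones: the first two are closer than safe_dist, the third is far away.
positions = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (5.0, 0.0, 0.0)]
penalties = soft_collision_penalty(positions)
rewards = [step_reward(1.0, p) for p in penalties]
```

Raising `weight` in a sketch like this would favor safety over speed, which is the kind of efficiency versus collision-avoidance trade-off the summary describes.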
Findings
Near-time-optimal performance in multi-drone flight
Low collision rates during dynamic maneuvers
Successful real-world deployment with onboard computation
Abstract
Recent innovations in autonomous drones have facilitated time-optimal flight in single-drone configurations, and enhanced maneuverability in multi-drone systems by applying optimal control and learning-based methods. However, few studies have achieved time-optimal motion planning for multi-drone systems, particularly during highly agile maneuvers or in dynamic scenarios. This paper presents a decentralized policy network using multi-agent reinforcement learning for time-optimal multi-drone flight. To strike a balance between flight efficiency and collision avoidance, we introduce a soft collision-free mechanism inspired by optimization-based methods. By customizing PPO in a centralized training, decentralized execution (CTDE) fashion, we unlock higher efficiency and stability in training while ensuring lightweight implementation. Extensive simulations show that, despite slight…
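The CTDE arrangement mentioned in the abstract can be illustrated structurally with the sketch below. It is only a shape-level illustration under stated assumptions: tiny linear maps stand in for neural networks, no PPO update is shown, and the class names, dimensions, and the choice to share one actor across drones are hypothetical rather than taken from the paper.

```python
import math
import random

class Actor:
    """Decentralized policy: maps ONE drone's local observation to an action."""
    def __init__(self, obs_dim, act_dim, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                  for _ in range(act_dim)]

    def act(self, obs):
        # tanh keeps each action component in [-1, 1]
        return [math.tanh(sum(wi * oi for wi, oi in zip(row, obs)))
                for row in self.w]

class CentralCritic:
    """Centralized value estimate: used in training only, sees ALL observations."""
    def __init__(self, joint_dim, rng):
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(joint_dim)]

    def value(self, joint_obs):
        return sum(wi * oi for wi, oi in zip(self.w, joint_obs))

rng = random.Random(0)
n_drones, obs_dim, act_dim = 3, 6, 4  # hypothetical sizes

actor = Actor(obs_dim, act_dim, rng)               # shared by all drones (assumption)
critic = CentralCritic(n_drones * obs_dim, rng)    # discarded at deployment

local_obs = [[rng.uniform(-1, 1) for _ in range(obs_dim)]
             for _ in range(n_drones)]

# Execution: each drone acts from its own observation only.
actions = [actor.act(o) for o in local_obs]

# Training: the critic scores the concatenated joint observation.
joint_obs = [x for o in local_obs for x in o]
v = critic.value(joint_obs)
```

Because only the actor is needed at flight time, an arrangement like this keeps the deployed policy small, which is consistent with the lightweight onboard implementation the abstract claims.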
Taxonomy
Topics: Robotic Path Planning Algorithms · Optimization and Search Problems · Distributed Control Multi-Agent Systems
Methods: Entropy Regularization · Proximal Policy Optimization · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
