Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
Ruize Zhang, Sirui Xiang, Zelai Xu, Feng Gao, Shilong Ji, Wenhao Tang, Wenbo Ding, Chao Yu, Yu Wang

TL;DR
This paper introduces Hierarchical Co-Self-Play (HCSP), a hierarchical reinforcement learning framework enabling multi-drone teams to learn strategic coordination and agile control for 3v3 volleyball, achieving high performance without expert demonstrations.
Contribution
The paper presents a novel hierarchical RL approach for multi-drone volleyball, with a three-stage training pipeline that learns both strategy and skills from scratch.
Findings
HCSP outperforms non-hierarchical and rule-based baselines, achieving an 82.9% win rate.
Emergent team behaviors include role switching and coordinated formations.
The hierarchical design effectively handles long-horizon, multi-agent drone tasks.
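HCSP separates centralized high-level strategic decision-making from decentralized low-level motion control. A common reason such a split helps with long horizons is temporal abstraction: the strategy layer decides only occasionally, while the controllers act at every step. The minimal Python sketch below illustrates that split under stated assumptions. The skill count NUM_SKILLS, the re-planning period HIGH_LEVEL_PERIOD, the linear stand-in for the learned strategy network, the set-point "skills", and the toy first-order dynamics are all hypothetical. They are not the paper's networks or quadrotor physics.

```python
import numpy as np

NUM_DRONES = 3          # one team in 3v3
OBS_DIM = 3             # toy per-drone state, e.g. a 3-D position
NUM_SKILLS = 4          # hypothetical size of the low-level skill library
HIGH_LEVEL_PERIOD = 10  # hypothetical: strategy re-plans every 10 control steps


def high_level_policy(joint_obs, weights, bias):
    """Centralized: reads every drone's state, returns one skill index per drone."""
    scores = joint_obs.ravel() @ weights + bias  # linear stand-in for a learned network
    return scores.reshape(NUM_DRONES, NUM_SKILLS).argmax(axis=1)


def low_level_policy(local_obs, skill_id):
    """Decentralized: one drone maps its own state and assigned skill to a command."""
    target = np.full(OBS_DIM, float(skill_id))     # hypothetical skill-specific set point
    return np.clip(target - local_obs, -1.0, 1.0)  # bounded proportional command


def rollout(num_steps, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(NUM_DRONES * OBS_DIM, NUM_DRONES * NUM_SKILLS))
    bias = rng.normal(size=NUM_DRONES * NUM_SKILLS)
    obs = np.zeros((NUM_DRONES, OBS_DIM))
    skills, num_decisions = None, 0
    for t in range(num_steps):
        if t % HIGH_LEVEL_PERIOD == 0:  # slow loop: team-level decision
            skills = high_level_policy(obs, weights, bias)
            num_decisions += 1
        # Fast loop: each drone acts on its own observation only.
        actions = np.stack([low_level_policy(obs[i], skills[i]) for i in range(NUM_DRONES)])
        obs = obs + 0.1 * actions  # toy first-order dynamics, not quadrotor physics
    return obs, num_decisions


final_obs, decisions = rollout(num_steps=100)
print(decisions, final_obs.round(2))
```

With 100 control steps and a period of 10, the strategy layer makes only 10 decisions. In this sense the hierarchy shortens the horizon the strategy has to reason over, while the per-drone controllers still run at full rate.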
Abstract
In this paper, we tackle the problem of learning to play 3v3 multi-drone volleyball, a new embodied competitive task that requires both high-level strategic coordination and low-level agile control. The task is turn-based, multi-agent, and physically grounded, posing significant challenges due to its long-horizon dependencies, tight inter-agent coupling, and the underactuated dynamics of quadrotors. To address this, we propose Hierarchical Co-Self-Play (HCSP), a hierarchical reinforcement learning framework that separates centralized high-level strategic decision-making from decentralized low-level motion control. We design a three-stage population-based training pipeline to enable both strategy and skill to emerge from scratch without expert demonstrations: (I) training diverse low-level skills, (II) learning high-level strategy via self-play with fixed low-level skills, and (III)…
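Stages (I) and (II) of the pipeline can be sketched as follows. The abstract is cut off at stage (III), so the sketch stops at stage (II). This is a toy under stated assumptions. Stage I is replaced by a hypothetical train_low_level_skills that only samples one success rate per play. The match is a hypothetical rock-paper-scissors-like zero-sum payoff_matrix. The "population" is a pool of past strategy snapshots, which is one simple form of population-based self-play and not necessarily HCSP's scheme. The point the sketch shows is the stage II constraint: the strategy is updated through self-play while the low-level skills stay fixed.

```python
import numpy as np

NUM_PLAYS = 3  # hypothetical number of team-level plays the strategy chooses between


def train_low_level_skills(rng):
    """Stage I stand-in: returns frozen skill parameters (one success rate per play)."""
    return rng.uniform(0.6, 0.9, size=NUM_PLAYS)


def payoff_matrix(skill_success):
    """Toy zero-sum game built on the frozen skills: entry [a, b] is our expected
    score margin when we pick play a and the opponent picks play b."""
    counters = np.array([[0, 1, -1], [-1, 0, 1], [1, -1, 0]], dtype=float)
    return counters * np.add.outer(skill_success, skill_success) / 2.0


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


def self_play(skill_success, iters=500, lr=0.1, snapshot_every=50, seed=0):
    """Stage II stand-in: gradient ascent on expected payoff against opponents drawn
    from a population of past strategy snapshots. skill_success is only read."""
    rng = np.random.default_rng(seed)
    A = payoff_matrix(skill_success)
    logits = np.zeros(NUM_PLAYS)
    population = [softmax(logits)]
    for it in range(1, iters + 1):
        opponent = population[rng.integers(len(population))]
        p = softmax(logits)
        values = A @ opponent                     # expected payoff of each play vs this opponent
        logits += lr * p * (values - p @ values)  # exact gradient of p @ values w.r.t. logits
        if it % snapshot_every == 0:
            population.append(softmax(logits))    # freeze a copy into the opponent pool
    return softmax(logits), population


rng = np.random.default_rng(0)
skills = train_low_level_skills(rng)      # Stage I
frozen = skills.copy()
strategy, population = self_play(skills)  # Stage II
assert np.allclose(skills, frozen)        # low-level skills stayed fixed during Stage II
print(strategy.round(3), len(population))
```

Sampling opponents from older snapshots, rather than only from the latest strategy, is a standard guard against the strategy cycling in non-transitive games like this toy one.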
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · UAV Applications and Optimization
