Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning

Ruize Zhang; Sirui Xiang; Zelai Xu; Feng Gao; Shilong Ji; Wenhao Tang; Wenbo Ding; Chao Yu; Yu Wang

arXiv:2505.04317 · cs.AI · February 27, 2026

Open Access

TL;DR

This paper introduces Hierarchical Co-Self-Play (HCSP), a hierarchical reinforcement learning framework that lets multi-drone teams learn strategic coordination and agile control for 3v3 volleyball without expert demonstrations, reaching an 82.9% win rate against non-hierarchical and rule-based baselines.

Contribution

The paper presents a hierarchical RL approach with a three-stage, population-based training pipeline for multi-drone volleyball, in which both high-level strategy and low-level skills are learned from scratch.

Findings

01. HCSP outperforms non-hierarchical and rule-based baselines with an 82.9% win rate.

02. Emergent team behaviors include role switching and coordinated formations.

03. The hierarchical design effectively manages long-horizon, multi-agent drone tasks.

Abstract

In this paper, we tackle the problem of learning to play 3v3 multi-drone volleyball, a new embodied competitive task that requires both high-level strategic coordination and low-level agile control. The task is turn-based, multi-agent, and physically grounded, posing significant challenges due to its long-horizon dependencies, tight inter-agent coupling, and the underactuated dynamics of quadrotors. To address this, we propose Hierarchical Co-Self-Play (HCSP), a hierarchical reinforcement learning framework that separates centralized high-level strategic decision-making from decentralized low-level motion control. We design a three-stage population-based training pipeline to enable both strategy and skill to emerge from scratch without expert demonstrations: (I) training diverse low-level skills, (II) learning high-level strategy via self-play with fixed low-level skills, and (III)…
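
The split between centralized high-level decisions and decentralized low-level control described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the skill names (`hold`, `approach`), the 3D position inputs, the unit-vector "command", and the nearest-drone heuristic, which only stands in for a learned high-level policy.

```python
import math
from typing import Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical skill signature: (drone position, ball position) -> command vector.
Skill = Callable[[Vec3, Vec3], Vec3]


def hold_position(drone_pos: Vec3, ball_pos: Vec3) -> Vec3:
    """Placeholder low-level skill: command zero motion."""
    return (0.0, 0.0, 0.0)


def approach_ball(drone_pos: Vec3, ball_pos: Vec3) -> Vec3:
    """Placeholder low-level skill: unit direction from the drone to the ball."""
    delta = tuple(b - d for d, b in zip(drone_pos, ball_pos))
    norm = math.sqrt(sum(c * c for c in delta)) or 1.0  # avoid division by zero
    return tuple(c / norm for c in delta)


# Hypothetical skill library; in a learned system each entry would be a trained policy.
SKILLS: Dict[str, Skill] = {"hold": hold_position, "approach": approach_ball}


def high_level_policy(team_pos: List[Vec3], ball_pos: Vec3) -> List[str]:
    """Centralized decision: sees the whole team and assigns one skill per drone.

    Stand-in heuristic (not a learned policy): the drone nearest the ball
    approaches it, the others hold position.
    """
    dists = [math.dist(p, ball_pos) for p in team_pos]
    nearest = dists.index(min(dists))
    return ["approach" if i == nearest else "hold" for i in range(len(team_pos))]


def team_step(team_pos: List[Vec3], ball_pos: Vec3) -> List[Vec3]:
    """One decision step: centralized assignment, then per-drone execution.

    Each skill only receives its own drone's position and the ball position,
    which is the decentralized part of the hierarchy.
    """
    assignment = high_level_policy(team_pos, ball_pos)
    return [SKILLS[name](pos, ball_pos) for name, pos in zip(assignment, team_pos)]
```

For example, `team_step([(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (4.0, 0.0, 1.0)], (2.0, 0.0, 3.0))` assigns `approach` to the middle drone and `hold` to the other two. The sketch covers only the control-flow separation; it says nothing about how the training stages are carried out.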

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · UAV Applications and Optimization