Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games
Alejandro Sanchez Roncero, Yixi Cai, Olov Andersson, Petter Ogren

TL;DR
This paper introduces the Asynchronous Multi-Stage Population-Based (AMSPB) reinforcement learning algorithm for training agile quadrotor controllers in 1v1 pursuit-evasion. It targets two training problems, non-stationarity and catastrophic forgetting, and reports better performance than baseline methods in simulation.
Contribution
The paper proposes the AMSPB algorithm for stable multi-stage training of quadrotor controllers, improving pursuit-evasion performance and generalization in high-fidelity simulations.
Findings
AMSPB-trained policies outperform baseline methods.
Body-rate-and-thrust controllers enable more agile flight.
Policies generalize well across different arena sizes.
Abstract
We address the problem of agile 1v1 quadrotor pursuit-evasion, where a pursuer and an evader learn to outmaneuver each other through reinforcement learning (RL). Such settings face two major challenges: non-stationarity, since each agent's evolving policy alters the environment dynamics and destabilizes training, and catastrophic forgetting, where a policy overfits to the current adversary and loses effectiveness against previously encountered strategies. To tackle these issues, we propose an Asynchronous Multi-Stage Population-Based (AMSPB) algorithm. At each stage, the pursuer and evader are trained asynchronously against a frozen pool of opponents sampled from a growing population of past and current policies, stabilizing training and ensuring exposure to diverse behaviors. Within this framework, we train neural network controllers that output either velocity commands or body rates…
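The stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Policy` class, `train_against`, `amspb_sketch`, and the parameters `num_stages`, `pool_size`, and `steps` are all hypothetical names, the RL update is replaced by a counter, and "asynchronous" is assumed here to mean that the two agents take turns learning while the opponent pool stays frozen.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical stand-in for a neural network controller."""
    role: str
    stage: int = 0
    updates: int = 0  # counts placeholder training updates


def train_against(learner, frozen_pool, steps, rng):
    """Placeholder for an RL training loop (assumption: any RL method fits here).

    Each step picks one opponent from the frozen pool; only the learner changes.
    """
    for _ in range(steps):
        _opponent = rng.choice(frozen_pool)  # opponent is never updated
        learner.updates += 1  # a real rollout and gradient step would go here
    return learner


def amspb_sketch(num_stages=3, pool_size=2, steps=10, seed=0):
    rng = random.Random(seed)
    pursuer, evader = Policy("pursuer"), Policy("evader")
    # Each population starts with a snapshot of the untrained policy.
    populations = {
        "pursuer": [copy.deepcopy(pursuer)],
        "evader": [copy.deepcopy(evader)],
    }
    for stage in range(1, num_stages + 1):
        # Assumed turn-taking: one agent learns at a time.
        for learner, opponent_role in ((pursuer, "evader"), (evader, "pursuer")):
            candidates = populations[opponent_role]
            k = min(pool_size, len(candidates))
            # Sample opponents from the growing population and freeze them as copies.
            frozen_pool = [copy.deepcopy(p) for p in rng.sample(candidates, k)]
            train_against(learner, frozen_pool, steps, rng)
            learner.stage = stage
            # Store a snapshot so later stages can still face this policy.
            populations[learner.role].append(copy.deepcopy(learner))
    return populations


if __name__ == "__main__":
    pops = amspb_sketch()
    for role, snapshots in pops.items():
        print(role, [(p.stage, p.updates) for p in snapshots])
```

Keeping snapshots of earlier policies in the population is what exposes the learner to past strategies, and copying the sampled opponents before training keeps them fixed for the duration of a stage. How the paper actually samples opponents, sizes the pool, or schedules the stages is not stated in the abstract, so those choices above are placeholders.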
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Path Planning Algorithms · Guidance and Control Systems
