Learning Acrobatic Flight from Preferences
Colin Merk, Ismail Geles, Jiaxu Xing, Angel Romero, Giorgia Ramponi, Davide Scaramuzza

TL;DR
This paper introduces Reward Ensemble under Confidence (REC), a probabilistic framework for preference-based reinforcement learning that learns acrobatic flight policies from human preferences, reaches 88.4% of shaped-reward performance, and transfers zero-shot to the real world.
Contribution
The paper presents REC, an ensemble of distributional reward models that captures per-timestep reward uncertainty and improves policy learning from preferences in complex, dynamic tasks.
Findings
REC achieves 88.4% of the performance obtained with a shaped reward in acrobatic quadrotor control.
Policies trained with REC transfer zero-shot to real-world scenarios.
Validation on a continuous control benchmark confirms REC's broader applicability.
Abstract
Preference-based reinforcement learning (PbRL) enables agents to learn control policies without requiring manually designed reward functions, making it well-suited for tasks where objectives are difficult to formalize or inherently subjective. Acrobatic flight poses a particularly challenging problem due to its complex dynamics, rapid movements, and the importance of precise execution. However, manually designed reward functions for such tasks often fail to capture the qualities that matter: we find that hand-crafted rewards agree with human judgment only 60.7% of the time, underscoring the need for preference-driven approaches. In this work, we propose Reward Ensemble under Confidence (REC), a probabilistic reward learning framework for PbRL that explicitly models per-timestep reward uncertainty through an ensemble of distributional reward models. By propagating uncertainty into the…
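As a rough illustration of the idea the abstract describes (an ensemble of distributional reward models with per-timestep uncertainty), the sketch below combines Gaussian reward predictions from several ensemble members and turns two trajectory segments into a Bradley-Terry style preference probability. It is not the authors' implementation: the function names `ensemble_reward_stats` and `preference_probability`, the Gaussian form of each member's output, the independence of timesteps, and the variance-damped logistic (a standard approximation to the logistic-Gaussian integral) are all assumptions made for this example.

```python
import numpy as np


def ensemble_reward_stats(means, variances):
    """Combine K Gaussian reward predictions per timestep (hypothetical helper).

    means, variances: array-likes of shape (K, T), one row per ensemble member.
    Returns the mixture mean and total variance, each of shape (T,).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mix_mean = means.mean(axis=0)
    aleatoric = variances.mean(axis=0)  # average predicted noise
    epistemic = means.var(axis=0)       # disagreement between members
    return mix_mean, aleatoric + epistemic


def preference_probability(mean_a, var_a, mean_b, var_b):
    """P(segment A preferred over segment B), Bradley-Terry on summed rewards.

    Assumption: per-timestep rewards are independent Gaussians, so return
    variances add. The logistic is damped with the common approximation
    E[sigmoid(x)] ~= sigmoid(mu / sqrt(1 + pi * var / 8)).
    """
    diff = np.sum(mean_a) - np.sum(mean_b)
    total_var = np.sum(var_a) + np.sum(var_b)
    scale = np.sqrt(1.0 + np.pi * total_var / 8.0)
    return float(1.0 / (1.0 + np.exp(-diff / scale)))


# Usage: 3 ensemble members, two segments of 4 timesteps each (synthetic data).
rng = np.random.default_rng(0)
seg_a = ensemble_reward_stats(rng.normal(1.0, 0.2, (3, 4)), np.full((3, 4), 0.1))
seg_b = ensemble_reward_stats(rng.normal(0.5, 0.2, (3, 4)), np.full((3, 4), 0.1))
print(preference_probability(*seg_a, *seg_b))
```

In this sketch, higher uncertainty pulls the predicted preference toward 0.5, which is one simple way uncertainty could be made to influence a preference loss; how REC actually propagates uncertainty is not recoverable from the truncated abstract.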
