Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training
Mahya Ramezani, Holger Voos

TL;DR
This paper introduces a hierarchical decision-making framework that combines fixed rule-based high-level guidance with an online goal-conditioned reinforcement learning controller for UAV search-and-rescue missions, aiming to improve early safety and sample efficiency when simulation training is limited.
Contribution
It proposes a novel hybrid framework that integrates interpretable rules with online RL, improving early safety and sample efficiency in UAV SAR tasks under limited training.
Findings
Reduces collision-related terminations in UAV missions.
Improves early safety and sample efficiency.
Maintains online adaptability to scenario dynamics.
Abstract
This paper presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) missions motivated by search-and-rescue (SAR) scenarios under limited simulation training. The framework combines a fixed rule-based high-level advisor with an online goal-conditioned low-level reinforcement learning (RL) controller. To stress-test early adaptation, we also consider a strict no-pretraining deployment regime. The high-level advisor is defined offline from a structured task specification and compiled into deterministic rules. It provides interpretable mission- and safety-aware guidance through recommended actions, avoided actions, and regime-dependent arbitration weights. The low-level controller learns online from task-defined dense rewards and reuses experience through a mode-aware prioritized replay mechanism augmented with rule-derived metadata. We evaluate the framework on…
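To make the advisor's three outputs concrete (recommended actions, avoided actions, and a regime-dependent arbitration weight), the following is a minimal Python sketch of one way such advice could be blended with a learned policy. It is not the authors' implementation. The action names, the 2 m obstacle threshold, the weights 0.9 and 0.3, the hard masking of avoided actions, and the helpers `advise` and `arbitrate` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Advice:
    """Output of the (hypothetical) rule-based high-level advisor."""
    recommended: frozenset  # actions the rules suggest
    avoided: frozenset      # actions the rules forbid in this regime
    weight: float           # arbitration weight in [0, 1]; higher trusts the rules more


def advise(obstacle_distance_m: float, goal_direction: str) -> Advice:
    """Deterministic rules keyed on a coarse regime (thresholds are assumptions)."""
    if obstacle_distance_m < 2.0:
        # Safety regime: strongly prefer evasive actions, forbid moving forward.
        return Advice(frozenset({"hover", "ascend"}), frozenset({"forward"}), 0.9)
    # Cruise regime: gently nudge toward the goal, mostly trust the learner.
    return Advice(frozenset({goal_direction}), frozenset(), 0.3)


def arbitrate(q_values: dict, advice: Advice) -> str:
    """Blend a softmax over Q-values with the advisor's preference."""
    # Numerically stable softmax over the low-level controller's Q-values.
    q_max = max(q_values.values())
    exp_q = {a: math.exp(q - q_max) for a, q in q_values.items()}
    total = sum(exp_q.values())
    policy = {a: e / total for a, e in exp_q.items()}

    scores = {}
    for action in q_values:
        if action in advice.avoided:
            continue  # hard mask (an assumption; a soft penalty is another option)
        bonus = 1.0 / len(advice.recommended) if action in advice.recommended else 0.0
        scores[action] = (1.0 - advice.weight) * policy[action] + advice.weight * bonus

    if not scores:
        # Every action was masked: fall back to the greedy learned action.
        return max(q_values, key=q_values.get)
    return max(scores, key=scores.get)


q = {"forward": 2.0, "left": 0.5, "right": 0.1, "ascend": 0.0, "hover": 0.0}
near = arbitrate(q, advise(1.0, "forward"))  # safety regime, rules dominate
far = arbitrate(q, advise(10.0, "left"))     # cruise regime, learner dominates
```

In this sketch a high weight lets the rules override a learner that still prefers `forward` near an obstacle, while a low weight leaves the learned preference in control. How the paper actually defines and applies its arbitration weights is not specified in the text above.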
