Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training

Mahya Ramezani; Holger Voos

arXiv:2604.26833·cs.RO·April 30, 2026

TL;DR

This paper introduces a hierarchical decision-making framework that pairs a fixed rule-based high-level advisor with an online goal-conditioned reinforcement learning controller for UAV search-and-rescue missions, with the aim of improving early safety and sample efficiency when simulation training is limited.

Contribution

It proposes a novel hybrid framework that integrates interpretable rules with online RL, improving early safety and sample efficiency in UAV SAR tasks under limited training.

Findings

01 Reduces collision-related terminations in UAV missions.

02 Improves early safety and sample efficiency.

03 Maintains online adaptability to scenario dynamics.

Abstract

This paper presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) missions motivated by search-and-rescue (SAR) scenarios under limited simulation training. The framework combines a fixed rule-based high-level advisor with an online goal-conditioned low-level reinforcement learning (RL) controller. To stress-test early adaptation, we also consider a strict no-pretraining deployment regime. The high-level advisor is defined offline from a structured task specification and compiled into deterministic rules. It provides interpretable mission- and safety-aware guidance through recommended actions, avoided actions, and regime-dependent arbitration weights. The low-level controller learns online from task-defined dense rewards and reuses experience through a mode-aware prioritized replay mechanism augmented with rule-derived metadata. We evaluate the framework on…
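The abstract does not specify the rule set, the arbitration formula, or the replay priorities, so the following is only a minimal sketch of how such an advisor could interact with a learned controller. Everything in it is an assumption made for illustration: the action set, the distance threshold, the weight values, the linear blend in `arbitrate`, and the safety boost in `replay_priority` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical discrete action set (not from the paper).
ACTIONS = ("forward", "left", "right", "ascend", "hover")


@dataclass(frozen=True)
class Advice:
    recommended: frozenset  # actions the rules suggest
    avoided: frozenset      # actions the rules discourage
    weight: float           # arbitration weight in [0, 1], regime-dependent
    regime: str             # rule-derived metadata stored with each transition


def advise(obstacle_dist_m: float, target_visible: bool) -> Advice:
    """Deterministic rules; thresholds and weights are illustrative only."""
    if obstacle_dist_m < 2.0:
        return Advice(frozenset({"ascend", "hover"}), frozenset({"forward"}), 0.9, "safety")
    if target_visible:
        return Advice(frozenset({"forward"}), frozenset(), 0.5, "approach")
    return Advice(frozenset({"left", "right"}), frozenset(), 0.2, "search")


def arbitrate(q_values: dict, advice: Advice) -> str:
    """Blend learned Q-values with a rule prior (assumed linear blend)."""
    def score(action: str) -> float:
        prior = 1.0 if action in advice.recommended else 0.0
        if action in advice.avoided:
            prior = -1.0
        return (1.0 - advice.weight) * q_values[action] + advice.weight * prior
    return max(ACTIONS, key=score)


def replay_priority(td_error: float, advice: Advice, boost: float = 2.0) -> float:
    """Mode-aware priority: safety-regime transitions are replayed more often."""
    base = abs(td_error) + 1e-3
    return base * boost if advice.regime == "safety" else base


# A controller that currently prefers "forward" in every state.
q = {"forward": 1.0, "left": 0.0, "right": 0.0, "ascend": 0.0, "hover": 0.0}
near_obstacle = advise(obstacle_dist_m=1.0, target_visible=False)
open_space = advise(obstacle_dist_m=10.0, target_visible=False)
print(arbitrate(q, near_obstacle), arbitrate(q, open_space))
```

In this sketch the high arbitration weight in the safety regime lets the rules override the learned preference near an obstacle, while the low weight in the search regime leaves the decision mostly to the controller. Storing `regime` alongside each transition is one way rule-derived metadata could feed a prioritized replay buffer.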
