Autonomous UAV Flight Navigation in Confined Spaces: A Reinforcement Learning Approach

Marco S. Tayar; Lucas K. de Oliveira; Felipe Andrade G. Tommaselli; Juliano D. Negri; Thiago H. Segreto; Ricardo V. Godoy; Marcelo Becker

arXiv:2508.16807 · cs.RO · December 19, 2025

TL;DR

This paper compares on-policy and off-policy reinforcement learning algorithms for UAV navigation in confined spaces, finding that on-policy PPO offers more reliable, collision-free policies in safety-critical environments.

Contribution

It provides a direct comparison between PPO and SAC in high-fidelity simulated duct navigation, highlighting the importance of training stability over sample efficiency for safety-critical tasks.

Findings

01. PPO achieved stable, collision-free navigation in all trials.
02. SAC failed to learn a complete solution, navigating only the initial duct segments.
03. On-policy methods may be preferable for safety-critical UAV applications.

Abstract

Autonomous UAV inspection of confined industrial infrastructure, such as ventilation ducts, demands robust navigation policies where collisions are unacceptable. While Deep Reinforcement Learning (DRL) offers a powerful paradigm for developing such policies, it presents a critical trade-off between on-policy and off-policy algorithms. Off-policy methods promise high sample efficiency, a vital trait for minimizing costly and unsafe real-world fine-tuning. In contrast, on-policy methods often exhibit greater training stability, which is essential for reliable convergence in hazard-dense environments. This paper directly investigates this trade-off by comparing a leading on-policy algorithm, Proximal Policy Optimization (PPO), against an off-policy counterpart, Soft Actor-Critic (SAC), for precision flight in procedurally generated ducts within a high-fidelity simulator. Our results show…
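One commonly cited reason for the training stability of on-policy methods such as PPO is the clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The sketch below illustrates that standard objective in NumPy. It is not the authors' implementation: the function name, the clip range of 0.2, and the three-sample batch are assumptions chosen only for illustration.

```python
import numpy as np

def ppo_clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the data-collecting policy. clip_eps is an assumed value.
    """
    ratio = np.exp(logp_new - logp_old)            # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The element-wise minimum removes any incentive to push the ratio
    # outside [1 - clip_eps, 1 + clip_eps] in the direction that helps.
    return np.mean(np.minimum(unclipped, clipped))

# Hypothetical batch of three transitions (values are illustrative only).
logp_old = np.log(np.array([0.5, 0.5, 0.5]))
logp_new = np.log(np.array([0.9, 0.5, 0.1]))
advantages = np.array([1.0, 1.0, -1.0])

objective = ppo_clipped_surrogate(logp_new, logp_old, advantages)
print(f"clipped surrogate: {objective:.4f}")
```

In this toy batch the first sample's probability ratio is 1.8, but its contribution is capped at 1.2 times the advantage, so a large policy shift earns no extra credit. Off-policy methods such as SAC instead reuse replay-buffer data collected by older policies, which is where their sample efficiency comes from.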
