Model-Free versus Model-Based Reinforcement Learning for Fixed-Wing UAV Attitude Control Under Varying Wind Conditions

David Olivares; Pierre Fournier; Pavan Vasishta; Julien Marzat

arXiv:2409.17896 · cs.RO · September 27, 2024

Open Access

TL;DR

This study compares model-free and model-based reinforcement learning methods for fixed-wing UAV attitude control, demonstrating the superiority of a Temporal Difference Model Predictive Control agent in robustness and accuracy under varying wind conditions.

Contribution

The paper compares model-free and model-based RL methods for fixed-wing UAV attitude control against a PID reference, highlighting the effectiveness of Temporal Difference Model Predictive Control in nonlinear and wind-disturbed flight conditions.

Findings

01. Temporal Difference Model Predictive Control outperforms PID and the model-free RL methods in tracking accuracy and robustness

02. Actuation fluctuation is introduced as a metric for assessing energy efficiency and actuator wear

03. Performance of all control methods varies under turbulence and gusts, with consequences for the MDP assumptions behind them

Abstract

This paper evaluates and compares the performance of model-free and model-based reinforcement learning for the attitude control of fixed-wing unmanned aerial vehicles using PID as a reference point. The comparison focuses on their ability to handle varying flight dynamics and wind disturbances in a simulated environment. Our results show that the Temporal Difference Model Predictive Control agent outperforms both the PID controller and other model-free reinforcement learning methods in terms of tracking accuracy and robustness over different reference difficulties, particularly in nonlinear flight regimes. Furthermore, we introduce actuation fluctuation as a key metric to assess energy efficiency and actuator wear, and we test two different approaches from the literature: action variation penalty and conditioning for action policy smoothness. We also evaluate all control methods when…
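
The abstract names actuation fluctuation and an action variation penalty without defining them here. As a rough illustration only, the sketch below assumes one common formulation: fluctuation as the mean absolute change between consecutive actuator commands, and the penalty as a weighted term subtracted from the tracking reward. The function names, the `weight` parameter, and both formulas are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def actuation_fluctuation(actions: Sequence[Sequence[float]]) -> float:
    """Mean absolute change between consecutive commands, over all actuators.

    Assumed definition for illustration; the paper's exact metric may differ.
    `actions` is a time-ordered list of command vectors (one entry per step).
    """
    if len(actions) < 2:
        return 0.0  # no consecutive pair, so no fluctuation to measure
    total = 0.0
    count = 0
    for prev, curr in zip(actions, actions[1:]):
        for p, c in zip(prev, curr):
            total += abs(c - p)
            count += 1
    return total / count


def penalized_reward(
    tracking_reward: float,
    prev_action: Sequence[float],
    action: Sequence[float],
    weight: float = 0.1,  # hypothetical penalty coefficient
) -> float:
    """Subtract a weighted action-variation term from the tracking reward."""
    variation = sum(abs(c - p) for p, c in zip(prev_action, action))
    return tracking_reward - weight * variation


if __name__ == "__main__":
    smooth = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]]
    jittery = [[0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]]
    print("smooth:", actuation_fluctuation(smooth))
    print("jittery:", actuation_fluctuation(jittery))
    print("reward:", penalized_reward(1.0, [0.0, 0.0], [0.5, -0.5]))
```

Under this assumed definition, a policy that switches commands abruptly scores a higher fluctuation than one that ramps them gradually, and the penalty lowers its reward accordingly, which is the kind of trade-off between tracking accuracy and actuator wear that the abstract alludes to.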

Taxonomy

Topics: Adaptive Dynamic Programming Control · Adaptive Control of Nonlinear Systems · Advanced Control Systems Optimization