Reinforcement learning reward function in unmanned aerial vehicle control tasks
Mikhail S. Tovarnov, Nikita V. Bykov

TL;DR
This paper introduces a novel reward function based on Bezier curve trajectories for deep reinforcement learning in UAV control, successfully applied to navigation, interception avoidance, and interception tasks in a virtual environment.
Contribution
A new reward function based on simplified trajectory estimation improves deep reinforcement learning for UAV control in 2D and 3D environments.
Findings
All tested algorithms performed well with the new reward function.
The reward function effectively guided UAV navigation and interception tasks.
The approach is applicable to multiple deep RL algorithms.
Abstract
This paper presents a new reward function for deep reinforcement learning in unmanned aerial vehicle (UAV) control and navigation problems. The reward function is based on constructing simplified trajectories to the target, which are third-order Bezier curves, and estimating the time needed to fly them. It can be applied unchanged in both two-dimensional and three-dimensional virtual environments. Its effectiveness was tested in a newly developed virtual environment: a simplified two-dimensional model of UAV control and flight dynamics that accounts for thrust, inertia, gravity, and aerodynamic drag. In this formulation, three UAV control and navigation tasks were successfully solved: UAV flight to a given point in space, avoidance of interception by another UAV, and…
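The trajectory-time idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of control points, the constant `cruise_speed`, the chord-sum length estimate, and the function names (`estimated_time`, `step_reward`) are all assumptions made for illustration. The sketch builds a third-order Bezier curve from the UAV's position, leaving along its current velocity direction and ending at the target, estimates the flight time as curve length divided by speed, and rewards the decrease in that estimate between steps. Because it operates on coordinate tuples of any length, the same code works in 2D and 3D.

```python
import math

def bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic (third-order) Bezier curve at parameter t in [0, 1]."""
    s = 1.0 - t
    b0, b1, b2, b3 = s**3, 3 * s * s * t, 3 * s * t * t, t**3
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

def bezier_length(p0, p1, p2, p3, samples=64):
    """Approximate arc length by summing chords between evenly spaced samples."""
    pts = [bezier_point(p0, p1, p2, p3, i / samples) for i in range(samples + 1)]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def estimated_time(pos, vel, target, cruise_speed=10.0):
    """Hypothetical time-to-target estimate along a simplified trajectory.

    Assumptions (not from the paper): the first control point lies along the
    current velocity direction, the second lies on the straight line to the
    target, and the UAV flies the curve at a constant cruise_speed.
    """
    dist = math.dist(pos, target)
    if dist == 0.0:
        return 0.0
    speed = math.hypot(*vel)
    if speed > 0.0:
        direction = tuple(v / speed for v in vel)
    else:
        # No velocity yet: aim the curve straight at the target.
        direction = tuple((t - p) / dist for p, t in zip(pos, target))
    p1 = tuple(p + d * dist / 3.0 for p, d in zip(pos, direction))
    p2 = tuple(p + 2.0 * (t - p) / 3.0 for p, t in zip(pos, target))
    return bezier_length(pos, p1, p2, target) / cruise_speed

def step_reward(prev_time, curr_time):
    """Assumed shaping: reward equals the reduction in estimated time."""
    return prev_time - curr_time
```

With these assumptions, a UAV already heading at the target gets the straight-line time, while one heading away gets a longer estimate because the curve must first turn around. An agent is therefore rewarded for turning toward the target as well as for closing distance.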
