Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments
Zijian Hu, Xiaoguang Gao, Kaifang Wan, Qianglong Wang, Yiwei Zhai

TL;DR
This paper introduces an advanced deep reinforcement learning method called ACER for UAV autonomous motion control in complex dynamic 3D environments, improving learning efficiency and robustness.
Contribution
It proposes an asynchronous curriculum experience replay (ACER) that enhances experience sampling and prioritization, integrating curriculum learning for better UAV control in unknown environments.
Findings
ACER converges 24.66% faster than TD3.
ACER achieves a 5.59% better convergence result than TD3.
Demonstrates strong robustness and generalization in various environments.
Abstract
Unmanned aerial vehicles (UAVs) have been widely used in military warfare. In this paper, we formulate the autonomous motion control (AMC) problem as a Markov decision process (MDP) and propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in large-scale dynamic three-dimensional (3D) environments. To overcome the limitations of the prioritized experience replay (PER) algorithm and improve performance, the proposed asynchronous curriculum experience replay (ACER) uses multiple threads to update the priorities asynchronously, assigns the true priorities, and applies a temporary experience pool so that higher-quality experiences are available for learning. A first-in-useless-out (FIUO) experience pool is also introduced to ensure that the stored experiences have a higher use value. In addition, combined with curriculum learning (CL), a more reasonable…
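To make the experience-pool idea in the abstract more concrete, here is a minimal Python sketch of a fixed-capacity pool that samples in proportion to priority and, when full, discards the least useful stored experience instead of the oldest one. This is only one possible reading of "first-in-useless-out": the class name `UselessOutPool`, the `priority` field, the eviction rule, and the `eps` smoothing term are all assumptions made for illustration, not the paper's implementation. The asynchronous priority updates and the temporary experience pool are not modeled.

```python
import random
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Experience:
    transition: Any   # e.g. (state, action, reward, next_state, done)
    priority: float   # hypothetical usefulness score, e.g. an absolute TD error


class UselessOutPool:
    """Fixed-capacity pool that evicts the lowest-priority experience when full.

    Assumption: "useless" is approximated by the smallest priority value.
    """

    def __init__(self, capacity: int, eps: float = 1e-6) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.eps = eps  # keeps every sampling weight strictly positive
        self.items: List[Experience] = []

    def add(self, transition: Any, priority: float) -> None:
        if len(self.items) >= self.capacity:
            # min() returns the first minimal index, so ties evict the oldest entry.
            worst = min(range(len(self.items)), key=lambda i: self.items[i].priority)
            self.items.pop(worst)
        self.items.append(Experience(transition, priority))

    def sample(self, batch_size: int, rng: Any = random) -> List[Experience]:
        # Priority-proportional sampling with replacement, as in basic PER.
        weights = [e.priority + self.eps for e in self.items]
        return rng.choices(self.items, weights=weights, k=batch_size)


pool = UselessOutPool(capacity=3)
for name, prio in [("a", 0.5), ("b", 0.1), ("c", 0.9), ("d", 0.4)]:
    pool.add(name, prio)

kept = [e.transition for e in pool.items]
batch = pool.sample(5, rng=random.Random(0))
```

In this run, adding "d" to the full pool removes "b", the entry with the lowest priority, so `kept` holds "a", "c", and "d". A plain first-in-first-out buffer would have dropped "a" instead, regardless of how useful it was.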
Taxonomy
Topics: Video Surveillance and Tracking Methods · Reinforcement Learning in Robotics · Human Pose and Action Recognition
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Dense Connections · Retrace · Trust Region Policy Optimization · Stochastic Dueling Network · Convolution · Softmax · Entropy Regularization · ACER
