UAV Path Planning Employing MPC-Reinforcement Learning Method Considering Collision Avoidance
Mahya Ramezani, Hamed Habibi, Jose Luis Sanchez Lopez, Holger Voos

TL;DR
This paper introduces a novel UAV path planning method combining MPC and reinforcement learning, utilizing LSTM networks for better collision avoidance and efficiency in complex environments.
Contribution
It proposes an LSTM-based MPC integrated with DDPG, enhancing robustness, convergence speed, and collision avoidance in UAV path planning.
Findings
Improved convergence speed over traditional methods
Reduced failure rate in path planning
Enhanced robustness in complex environments
Abstract
In this paper, we tackle the problem of Unmanned Aerial Vehicle (UAV) path planning in complex and uncertain environments by designing a Model Predictive Control (MPC) scheme, based on a Long Short-Term Memory (LSTM) network, integrated into the Deep Deterministic Policy Gradient (DDPG) algorithm. In the proposed solution, LSTM-MPC operates as a deterministic policy within the DDPG network, and it leverages a predicting pool to store predicted future states and actions for improved robustness and efficiency. The use of the predicting pool also enables the initialization of the critic network, leading to improved convergence speed and reduced failure rate compared to traditional reinforcement learning and deep reinforcement learning methods. The effectiveness of the proposed solution is evaluated by numerical simulations.
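The idea of an MPC acting as a deterministic policy while filling a predicting pool can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the 2D point-mass UAV, the single circular obstacle, the eight-direction action set, the exhaustive search over short action sequences, and all names (predict_next, mpc_policy, run_episode). In particular, predict_next is a simple kinematic stand-in for the learned LSTM model, and no DDPG actor or critic is trained here.

```python
import itertools
import math
from collections import deque

# Hypothetical scenario values, chosen only for illustration.
GOAL = (5.0, 0.0)
OBSTACLE = (2.5, 0.0, 0.8)          # centre x, centre y, radius
STEP = 0.5                          # distance travelled per action
HORIZON = 3                         # MPC prediction horizon
ACTIONS = [(STEP * math.cos(k * math.pi / 4), STEP * math.sin(k * math.pi / 4))
           for k in range(8)]       # eight compass directions


def predict_next(state, action):
    """Stand-in for the learned LSTM model: plain kinematic update."""
    return (state[0] + action[0], state[1] + action[1])


def stage_cost(state):
    """Distance to the goal plus a large penalty inside the obstacle."""
    ox, oy, radius = OBSTACLE
    collided = math.dist(state, (ox, oy)) < radius
    return math.dist(state, GOAL) + (100.0 if collided else 0.0)


def mpc_policy(state, pool):
    """Deterministic policy: return the first action of the cheapest plan.

    The predicted transitions of the chosen plan are appended to `pool`,
    mimicking the role of a predicting pool.
    """
    best_cost, best_plan = math.inf, None
    for sequence in itertools.product(ACTIONS, repeat=HORIZON):
        s, cost, plan = state, 0.0, []
        for a in sequence:
            s_next = predict_next(s, a)
            cost += stage_cost(s_next)
            plan.append((s, a, s_next))
            s = s_next
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    pool.extend(best_plan)
    return best_plan[0][1]


def run_episode(start=(0.0, 0.0), max_steps=40):
    """Roll the policy forward until the goal is within one step."""
    pool = deque(maxlen=1000)
    state, path = start, [start]
    for _ in range(max_steps):
        if math.dist(state, GOAL) < STEP:
            break
        state = predict_next(state, mpc_policy(state, pool))
        path.append(state)
    return path, pool


if __name__ == "__main__":
    path, pool = run_episode()
    print(f"steps taken: {len(path) - 1}, predicted transitions stored: {len(pool)}")
```

In a DDPG setting, the stored (state, action, next state) tuples are the kind of data that could be used to warm-start a critic before interaction begins, which is the role the abstract attributes to the predicting pool. How the paper actually does this is not specified here.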
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
Methods: Weight Decay · Convolution · Experience Replay · Batch Normalization · Dense Connections · Adam · Deep Deterministic Policy Gradient · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
