UAV Path Planning Employing MPC-Reinforcement Learning Method Considering Collision Avoidance

Mahya Ramezani; Hamed Habibi; Jose Luis Sanchez Lopez; Holger Voos

arXiv:2302.10669 · cs.LG · March 8, 2023

TL;DR

This paper introduces a novel UAV path planning method combining MPC and reinforcement learning, utilizing LSTM networks for better collision avoidance and efficiency in complex environments.

Contribution

It proposes an LSTM-based MPC integrated with Deep Deterministic Policy Gradient (DDPG), improving robustness, convergence speed, and collision avoidance in UAV path planning.

Findings

01. Improved convergence speed compared with traditional reinforcement learning and deep reinforcement learning methods

02. Reduced failure rate in path planning

03. Enhanced robustness in complex environments

Abstract

In this paper, we tackle the problem of Unmanned Aerial Vehicle (UAV) path planning in complex and uncertain environments by designing a Model Predictive Control (MPC) scheme based on a Long Short-Term Memory (LSTM) network and integrated into the Deep Deterministic Policy Gradient (DDPG) algorithm. In the proposed solution, the LSTM-MPC operates as a deterministic policy within the DDPG network and uses a predicting pool to store predicted future states and actions, which improves robustness and efficiency. The predicting pool also enables initialization of the critic network, leading to faster convergence and a lower failure rate than traditional reinforcement learning and deep reinforcement learning methods. The effectiveness of the proposed solution is evaluated through numerical simulations.
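The abstract describes an MPC acting as a deterministic policy whose predicted future states and actions are stored in a predicting pool. The sketch below illustrates only that data flow, under assumptions that are not from the paper: a known 2-D point-mass model stands in for the learned LSTM predictor; the MPC enumerates a fixed set of constant-velocity candidates (which keeps the policy deterministic) instead of whatever optimizer the authors use; the reward is taken as the negative stage cost; and the names `predict_next`, `stage_cost`, `PredictingPool`, `CANDIDATES` and `mpc_policy` are hypothetical. It is not the authors' implementation.

```python
import math
from collections import deque
from dataclasses import dataclass

# ASSUMPTION: a known 2-D point-mass model replaces the paper's learned LSTM
# predictor. State = (x, y) position, action = (vx, vy) velocity command.
DT = 0.1


def predict_next(state, action):
    """One-step prediction: position advances by velocity * DT."""
    return (state[0] + action[0] * DT, state[1] + action[1] * DT)


def stage_cost(state, goal, obstacles, safe_radius=0.5):
    """Distance to goal plus a large penalty inside any obstacle's safety radius."""
    cost = math.dist(state, goal)
    for obstacle in obstacles:
        if math.dist(state, obstacle) < safe_radius:
            cost += 100.0
    return cost


@dataclass(frozen=True)
class Transition:
    state: tuple
    action: tuple
    reward: float  # ASSUMPTION: reward = -stage cost of the predicted next state
    next_state: tuple


class PredictingPool:
    """Hypothetical fixed-size buffer of model-predicted transitions."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def __len__(self):
        return len(self.buffer)


# Deterministic candidate set: 16 unit-speed headings, each held constant over the horizon.
CANDIDATES = [(math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16))
              for k in range(16)]


def mpc_policy(state, goal, obstacles, pool, horizon=10):
    """Return the candidate with the lowest predicted cost; store its predicted rollout."""
    best_cost, best_action, best_rollout = math.inf, None, []
    for action in CANDIDATES:
        s, total, rollout = state, 0.0, []
        for _ in range(horizon):
            s_next = predict_next(s, action)
            cost = stage_cost(s_next, goal, obstacles)
            rollout.append(Transition(s, action, -cost, s_next))
            total += cost
            s = s_next
        if total < best_cost:  # strict '<' keeps the first candidate on ties
            best_cost, best_action, best_rollout = total, action, rollout
    for transition in best_rollout:  # predicted future states and actions
        pool.add(transition)
    return best_action


if __name__ == "__main__":
    pool = PredictingPool()
    action = mpc_policy((0.0, 0.0), goal=(5.0, 0.0), obstacles=[], pool=pool)
    print(action, len(pool))  # (1.0, 0.0) 10
    # An obstacle on the straight-line path makes a sideways heading cheaper.
    detour = mpc_policy((0.0, 0.0), goal=(5.0, 0.0), obstacles=[(0.5, 0.0)], pool=pool)
    print(detour)
```

Transitions accumulated this way are the kind of model-generated data a critic could be pre-fitted on before DDPG interaction begins; that regression step and the DDPG actor-critic updates themselves are omitted here.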

Taxonomy

Topics: Distributed Control Multi-Agent Systems · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control

Methods: Weight Decay · Convolution · Experience Replay · Batch Normalization · Dense Connections · Adam · Deep Deterministic Policy Gradient · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings