On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration

Ali Moltajaei Farid; Jafar Roshanian; Malek Mouhoub

arXiv:2409.11058 · cs.MA · September 18, 2024

TL;DR

This paper presents a multi-UAV exploration method using on-policy reinforcement learning with PPO, employing CNN and LSTM to improve coverage efficiency and obstacle avoidance in unknown environments.
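
To make "coverage" concrete, here is a minimal sketch. It assumes that the two-dimensional area of interest is discretized into a grid and that each UAV observes a square footprint of cells around its position. The grid representation, the square footprint, the `radius` parameter, and the helper names `update_coverage` and `coverage_fraction` are hypothetical illustrations, not details taken from the paper.

```python
import numpy as np

def update_coverage(covered, positions, radius):
    """Mark every cell within `radius` cells (square footprint) of each UAV as covered.

    Hypothetical model: a square sensing footprint on a grid; not from the paper.
    covered: 2D boolean array, True where a cell has already been observed.
    positions: iterable of (row, col) integer UAV positions.
    radius: half-width of the square footprint, in cells.
    """
    rows, cols = covered.shape
    for r, c in positions:
        # Clip the footprint to the grid boundaries.
        r0, r1 = max(r - radius, 0), min(r + radius + 1, rows)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, cols)
        covered[r0:r1, c0:c1] = True
    return covered

def coverage_fraction(covered, obstacles):
    """Fraction of free (non-obstacle) cells that have been covered so far."""
    free = ~obstacles
    return covered[free].sum() / free.sum()
```

For example, on an empty 10x10 grid, two UAVs at opposite corners with `radius=1` cover 4 cells each, so `coverage_fraction` returns 0.08. Excluding obstacle cells from the denominator means a fully explored map reaches 1.0 even when some cells can never be visited.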

Contribution

The paper introduces a distributed multi-UAV exploration approach that uses PPO with CNN and LSTM networks and, in simulation, outperforms other RL techniques such as PG and A3C in unknown environments.
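
The CNN part of such an architecture can be illustrated with a single convolutional layer applied to a grid-shaped observation. The sketch below is a minimal NumPy version of a multi-channel 2D convolution (implemented as cross-correlation, as deep-learning libraries do) with no padding and stride 1. The channel layout (for example, separate channels for obstacles, covered cells, and UAV positions) and the function name `conv2d_valid` are assumptions for illustration; the paper's actual layer sizes are not given here.

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Multi-channel 2D cross-correlation, no padding, stride 1.

    x: input of shape (C_in, H, W), e.g. hypothetical channels for
       obstacles, covered cells and UAV positions.
    w: kernels of shape (C_out, C_in, kH, kW).
    b: biases of shape (C_out,).
    Returns an array of shape (C_out, H - kH + 1, W - kW + 1).
    """
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.empty((c_out, h - kh + 1, wd - kw + 1))
    for i in range(h - kh + 1):
        for j in range(wd - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]  # shape (C_in, kH, kW)
            # Sum over input channels and kernel positions for every output channel.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out
```

Stacking a few such layers and flattening the result yields a feature vector that a recurrent layer or the actor and critic heads can consume.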

Findings

1. PPO outperforms PG and A3C in exploration tasks.
2. Combining LSTM with CNN enhances exploration efficiency.
3. The method successfully explores new maps differing from training data.

Abstract

Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study addresses that challenge by using on-policy reinforcement learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs. The UAVs avoid collisions with obstacles and with each other while exploring in a distributed manner. The proposed solution uses actor-critic networks built from deep convolutional neural networks (CNN) and long short-term memory (LSTM) to identify the UAVs and the areas that have already been covered. Compared to other RL techniques, such as policy gradient (PG) and asynchronous advantage actor-critic (A3C), the simulation results demonstrate the…
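
PPO's central idea is a clipped surrogate objective that limits how far each update can move the policy away from the policy that collected the data. Below is a minimal NumPy sketch of the standard combined actor-critic loss: a clipped policy term, a squared-error critic term, and an entropy bonus (the entropy regularization listed under Methods). The coefficient values are common defaults chosen as assumptions, not hyperparameters reported by the authors, and the name `ppo_loss` is hypothetical.

```python
import numpy as np

def ppo_loss(new_logp, old_logp, advantages, values, returns, entropy,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Combined PPO actor-critic loss to be minimized (sketch; coefficients are assumed defaults).

    new_logp: log-probabilities of the taken actions under the current policy.
    old_logp: log-probabilities of the same actions under the data-collecting policy.
    advantages: advantage estimates derived from the critic.
    values: critic predictions V(s); returns: target returns for the critic.
    entropy: per-sample policy entropy, used as a regularization bonus.
    """
    ratio = np.exp(new_logp - old_logp)  # probability ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negate because the clipped surrogate is maximized, while this function is minimized.
    policy_loss = -np.mean(np.minimum(unclipped, clipped))
    value_loss = np.mean((returns - values) ** 2)  # critic regression loss
    entropy_bonus = np.mean(entropy)  # higher entropy encourages exploration
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
```

With `clip_eps=0.2`, a ratio of 2 on a positive advantage is clipped to 1.2, while on a negative advantage the unclipped term is kept. The objective is therefore a pessimistic bound on the policy improvement, which is what keeps on-policy updates conservative.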

Taxonomy

Topics: Reinforcement Learning in Robotics · Distributed Control Multi-Agent Systems · Optimization and Search Problems

Methods: Entropy Regularization · Tanh Activation · Proximal Policy Optimization · Sigmoid Activation · Long Short-Term Memory
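
The LSTM, sigmoid, and tanh entries in the Methods list combine in a standard LSTM cell. Sigmoid gates decide what to forget, what to write, and what to output, and tanh squashes both the candidate update and the cell state. The NumPy sketch below shows one step of a textbook LSTM cell. The stacked gate order, the shapes, and the name `lstm_step` are conventions assumed here. Feeding the cell a flattened CNN feature vector is also an assumption about how the two parts connect, not a detail confirmed by the source.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (gate order is an assumed convention).

    x: input vector (e.g. hypothetical flattened CNN features), shape (n_in,).
    h_prev, c_prev: previous hidden and cell states, shape (n_hidden,).
    W: input weights, shape (4 * n_hidden, n_in).
    U: recurrent weights, shape (4 * n_hidden, n_hidden).
    b: biases, shape (4 * n_hidden,).
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate
    f = sigmoid(z[n:2 * n])      # forget gate
    g = np.tanh(z[2 * n:3 * n])  # candidate cell update
    o = sigmoid(z[3 * n:4 * n])  # output gate
    c = f * c_prev + i * g       # new cell state keeps part of the old memory
    h = o * np.tanh(c)           # new hidden state passed to the actor/critic heads
    return h, c
```

Because the cell state carries information across steps, a policy built on it can remember regions it has already visited even when they are no longer in the current observation.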