On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration
Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub

TL;DR
This paper presents a multi-UAV exploration method using on-policy reinforcement learning with PPO, employing CNN and LSTM to improve coverage efficiency and obstacle avoidance in unknown environments.
Contribution
It introduces a novel distributed multi-UAV exploration approach using PPO with CNN and LSTM, outperforming other RL techniques in unknown environments.
Findings
PPO outperforms PG and A3C in exploration tasks.
Combining LSTM with CNN enhances exploration efficiency.
The method successfully explores new maps differing from training data.
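To make the PPO versus PG comparison above concrete, here is a minimal sketch of PPO's clipped surrogate objective. It is not the paper's implementation: the function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative assumptions, and the inputs are plain Python lists of per-sample log-probabilities and advantage estimates.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for those actions.
    clip_eps: clipping range; 0.2 is an assumed, commonly used default.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(nl - ol)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum gives a pessimistic bound on the improvement.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio of 2 is capped at 1.2, so the objective stops rewarding further movement away from the data-collecting policy. A plain policy-gradient update has no such cap, which is one commonly cited reason PPO tends to train more stably.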
Abstract
Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study aims to address this challenge by utilizing on-policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore the two-dimensional area of interest with multiple UAVs. The UAVs will avoid collision with obstacles and each other and do the exploration in a distributed manner. The proposed solution includes actor-critic networks using deep convolutional neural networks (CNN) and long short-term memory (LSTM) for identifying the UAVs and areas that have already been covered. Compared to other RL techniques, such as policy gradient (PG) and asynchronous advantage actor-critic (A3C), the simulation results demonstrate the…
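The setting in the abstract (several UAVs covering a two-dimensional area while avoiding obstacles and each other) can be illustrated with a toy grid-world step function. This is a sketch under stated assumptions, not the paper's simulator: the function `step`, the grid encoding (1 = obstacle, 0 = free), the unit-step moves, and the reward values (+1, 0, -1) are all hypothetical choices made for illustration.

```python
from collections import Counter


def step(grid, covered, positions, moves):
    """Apply one joint move for all UAVs on a 2D grid (hypothetical model).

    grid: list of lists, 1 = obstacle, 0 = free cell.
    covered: set of (row, col) cells already visited; updated in place.
    positions: list of (row, col), one per UAV, all distinct.
    moves: list of (drow, dcol) unit steps, one per UAV.
    Returns (new_positions, rewards).
    """
    rows, cols = len(grid), len(grid[0])
    targets = [(r + dr, c + dc) for (r, c), (dr, dc) in zip(positions, moves)]
    counts = Counter(targets)
    new_positions, rewards = [], []
    for i, (pos, tgt) in enumerate(zip(positions, targets)):
        r, c = tgt
        off_map = not (0 <= r < rows and 0 <= c < cols)
        hits_obstacle = (not off_map) and grid[r][c] == 1
        # A cell currently held by another UAV cannot be entered.
        occupied = any(tgt == p for j, p in enumerate(positions) if j != i)
        # Two UAVs aiming at the same cell would collide.
        contested = counts[tgt] > 1
        if off_map or hits_obstacle or occupied or contested:
            new_positions.append(pos)   # stay in place
            rewards.append(-1.0)        # assumed collision penalty
        elif tgt in covered:
            new_positions.append(tgt)
            rewards.append(0.0)         # no gain for revisiting a cell
        else:
            new_positions.append(tgt)
            rewards.append(1.0)         # assumed reward for new coverage
    covered.update(new_positions)
    return new_positions, rewards
```

Because each UAV's reward depends only on its own target cell and the shared coverage set, the same function could serve per-agent policies acting in a distributed way. How the paper actually shapes rewards is not stated in the text above.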
Taxonomy
Topics: Reinforcement Learning in Robotics · Distributed Control Multi-Agent Systems · Optimization and Search Problems
Methods: Entropy Regularization · Tanh Activation · Proximal Policy Optimization · Sigmoid Activation · Long Short-Term Memory
