Deep Reinforcement Learning for Continuous Docking Control of Autonomous Underwater Vehicles: A Benchmarking Study

Mihir Patil; Bilal Wehbe; Matias Valdenegro-Toro

arXiv:2108.02665·cs.RO·August 6, 2021

TL;DR

This study applies advanced deep reinforcement learning algorithms to the continuous control problem of autonomous underwater vehicle docking, demonstrating the effectiveness of a new reward function and the superiority of the TD3 algorithm in simulation.

Contribution

It introduces a novel reward function and benchmarks three DRL algorithms for AUV docking, highlighting TD3's superior performance in a physics-based simulation environment.

Findings

01

TD3 achieved a 100% success rate on the docking task in simulation.

02

The new reward function outperforms previous formulations.

03

Simulation results favor TD3 over PPO and SAC.

Abstract

Docking control of an autonomous underwater vehicle (AUV) is a task that is integral to achieving persistent long-term autonomy. This work explores the application of state-of-the-art model-free deep reinforcement learning (DRL) approaches to the task of AUV docking in the continuous domain. We provide a detailed formulation of the reward function used to successfully dock the AUV onto a fixed docking platform. A major contribution that distinguishes our work from previous approaches is the use of a physics simulator to define and simulate the underwater environment as well as the DeepLeng AUV. We propose a new reward function formulation for the docking task, incorporating several components, that outperforms previous reward formulations. We evaluate proximal policy optimization (PPO), twin delayed deep deterministic policy gradients (TD3) and soft actor-critic (SAC) in…

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Clipped Double Q-learning · Experience Replay · Target Policy Smoothing · Dense Connections · Adam · Twin Delayed Deep Deterministic
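
Two of the methods listed above, Clipped Double Q-learning and Target Policy Smoothing, are the core of how TD3 builds its critic target. The sketch below illustrates that general computation with NumPy. It is not taken from the paper's code: the function name `td3_target`, the toy target networks, and the hyperparameter values (`gamma`, `noise_std`, `noise_clip`, action bounds) are assumptions chosen for illustration, not the settings used in the study.

```python
import numpy as np


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0):
    """Generic TD3 critic target (hypothetical helper, not the authors' code)."""
    # Target policy smoothing: perturb the target action with clipped Gaussian
    # noise, then clip the result back into the valid action range.
    next_action = target_actor(next_state)
    noise = rng.normal(0.0, noise_std, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, action_low, action_high)

    # Clipped double Q-learning: take the smaller of the two target critics
    # to reduce overestimation of the action value.
    q_min = np.minimum(target_q1(next_state, next_action),
                       target_q2(next_state, next_action))

    # Bellman target; terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min


# Toy stand-ins for the target networks (assumptions for the demo only).
toy_actor = lambda s: np.zeros((s.shape[0], 1))
toy_q1 = lambda s, a: np.full(s.shape[0], 2.0)
toy_q2 = lambda s, a: np.full(s.shape[0], 1.0)

rng = np.random.default_rng(0)
next_state = np.zeros((2, 3))      # batch of 2 states with 3 features each
reward = np.array([1.0, 1.0])
done = np.array([0.0, 1.0])        # second transition is terminal

y = td3_target(reward, done, next_state, toy_actor, toy_q1, toy_q2, rng)
print(y)
```

With these toy critics the minimum is always 1.0, so the non-terminal transition gets a target of 1 + 0.99 × 1 and the terminal one keeps only its reward. In a real agent the three callables would be the slowly updated target networks, and the batch would be sampled from the experience replay buffer.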