Deep Reinforcement Learning for Continuous Docking Control of Autonomous Underwater Vehicles: A Benchmarking Study
Mihir Patil, Bilal Wehbe, Matias Valdenegro-Toro

TL;DR
This study applies three model-free deep reinforcement learning algorithms (PPO, TD3 and SAC) to the continuous control problem of autonomous underwater vehicle docking. In simulation, a new reward function outperforms previous formulations and TD3 performs best.
Contribution
It introduces a novel reward function and benchmarks three DRL algorithms for AUV docking, highlighting TD3's superior performance in a physics-based simulation environment.
Findings
TD3 achieved a 100% success rate in the docking task.
The new reward function outperforms previous formulations.
Simulation results favor TD3 over PPO and SAC.
Abstract
Docking control of an autonomous underwater vehicle (AUV) is a task that is integral to achieving persistent long-term autonomy. This work explores the application of state-of-the-art model-free deep reinforcement learning (DRL) approaches to the task of AUV docking in the continuous domain. We provide a detailed formulation of the reward function, utilized to successfully dock the AUV onto a fixed docking platform. A major contribution that distinguishes our work from the previous approaches is the usage of a physics simulator to define and simulate the underwater environment as well as the DeepLeng AUV. We propose a new reward function formulation for the docking task, incorporating several components, that outperforms previous reward formulations. We evaluate proximal policy optimization (PPO), twin delayed deep deterministic policy gradients (TD3) and soft actor-critic (SAC) in…
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Clipped Double Q-learning · Experience Replay · Target Policy Smoothing · Dense Connections · Adam · Twin Delayed Deep Deterministic
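
Two of the methods listed above, clipped double Q-learning and target policy smoothing, together define the bootstrap target that TD3 uses to train its critics. The sketch below is a minimal illustration for a single transition with a scalar action, not the authors' implementation. The function name `td3_target`, its parameters, and the default hyperparameter values are assumptions chosen for illustration, and the two target critics are passed in as plain callables.

```python
import random


def td3_target(reward, done, next_action, q1_next, q2_next,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Return a TD3-style critic target for one transition (illustrative).

    next_action: the target actor's action at the next state (scalar).
    q1_next, q2_next: hypothetical callables mapping an action to the value
        each target critic assigns at the next state.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target
    # action, then clamp the result to the valid action range.
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    smoothed = max(action_low, min(action_high, next_action + noise))

    # Clipped double Q-learning: take the smaller of the two critic
    # estimates to reduce overestimation bias.
    q_min = min(q1_next(smoothed), q2_next(smoothed))

    # No bootstrapping from terminal states.
    return reward + gamma * (1.0 - float(done)) * q_min


if __name__ == "__main__":
    # Toy critics standing in for learned networks.
    print(td3_target(1.0, False, 0.5,
                     q1_next=lambda a: 2.0 * a,
                     q2_next=lambda a: 1.0 + a))
```

In a full agent the same computation would run over batches sampled from an experience replay buffer, with the critics and actor implemented as neural networks.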
