Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

Wenjie Shi; Shiji Song; Cheng Wu; C. L. Philip Chen

arXiv:1909.03204·cs.LG·September 10, 2019

TL;DR

This paper introduces a novel multi pseudo Q-learning based deterministic policy gradient algorithm for improving trajectory tracking accuracy and learning stability in autonomous underwater vehicles with unknown dynamics.

Contribution

It proposes a hybrid actors-critics architecture with multi pseudo Q-learning to enhance tracking control and stability in AUVs, addressing limitations of existing policy gradient methods.

Findings

01 Achieves high tracking accuracy for AUVs.

02 Demonstrates stable learning with multiple actors and critics.

03 Validates effectiveness on different reference trajectories.

Abstract

This paper investigates the trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Unlike existing policy gradient methods, which employ a single actor-critic and achieve neither satisfactory tracking accuracy nor stable learning, the proposed algorithm achieves high tracking accuracy and stable learning by applying a hybrid actors-critics architecture, in which multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively. Specifically, for the critics, an updating rule based on the expected absolute Bellman error is used to choose the worst critic to be updated at each time step. Subsequently, to calculate the loss function with a more accurate target value for the chosen critic, Pseudo Q-learning, which uses sub-greedy policy to…
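The critic-selection step described in the abstract can be illustrated with a minimal sketch. It assumes that the "worst" critic is the one with the largest absolute Bellman error averaged over a sampled batch, and that the target values are already available; the function name `select_worst_critic`, the array shapes, and the toy numbers are all hypothetical and are not taken from the paper. How the targets themselves are computed (the Pseudo Q-learning step) is not covered here.

```python
import numpy as np

def select_worst_critic(q_values, targets):
    """Pick the critic with the largest mean absolute Bellman error.

    q_values: shape (n_critics, batch), critic i's estimate Q_i(s, a)
              for each sampled transition (hypothetical layout).
    targets:  shape (batch,), target value for each transition,
              assumed to be computed elsewhere.
    """
    q_values = np.asarray(q_values, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Sample-mean estimate of the expected absolute Bellman error per critic.
    errors = np.mean(np.abs(targets[None, :] - q_values), axis=1)
    return int(np.argmax(errors)), errors

# Toy batch: three critics evaluated on three transitions.
q_values = np.array([[1.0, 2.0, 3.0],
                     [0.0, 0.0, 0.0],
                     [1.5, 2.5, 2.5]])
targets = np.array([1.0, 2.0, 3.0])
worst_index, errors = select_worst_critic(q_values, targets)
```

In this toy batch the second critic (index 1) has the largest error and would be the one selected for the update, while the other critics are left unchanged for that step.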

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Mechanical Circulatory Support Devices

Methods: Q-Learning