Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles
Wenjie Shi, Shiji Song, Cheng Wu, C. L. Philip Chen

TL;DR
This paper introduces a novel multi pseudo Q-learning based deterministic policy gradient algorithm for improving trajectory tracking accuracy and learning stability in autonomous underwater vehicles with unknown dynamics.
Contribution
It proposes a hybrid actors-critics architecture with multi pseudo Q-learning to improve tracking accuracy and learning stability for AUVs. This addresses a limitation of existing policy gradient methods that rely on a single actor-critic.
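The deterministic policy gradient that each actor follows can be illustrated on its own. The gradient is ∇_θ J ≈ mean over states of ∇_a Q(s, a)|_{a=μ(s)} · ∇_θ μ(s). The sketch below is a minimal illustration under stated assumptions. It uses a linear actor μ(s) = W s and a fixed, hypothetical quadratic critic Q(s, a) = -||a - K s||². That critic was chosen only because its action-gradient has a closed form. In the paper's setting the critics are learned networks, and this page does not describe how the multiple actors are combined. The names `critic_grad_a` and `dpg_step`, the matrix `K`, and the step size are illustrative.

```python
import numpy as np

def critic_grad_a(states, actions, K):
    """dQ/da for the hypothetical critic Q(s, a) = -||a - K s||^2 (not from the paper)."""
    return -2.0 * (actions - states @ K.T)

def dpg_step(W, states, K, lr=0.1):
    """One deterministic policy gradient ascent step for a linear actor mu(s) = W s.

    grad_W J ~= mean_i [ dQ/da(s_i, mu(s_i)) s_i^T ]  (chain rule through mu)
    """
    actions = states @ W.T                      # mu(s_i) for each sample, shape (N, m)
    dq_da = critic_grad_a(states, actions, K)   # action-gradient of the critic, shape (N, m)
    grad_W = dq_da.T @ states / len(states)     # minibatch average of dQ/da * s^T, shape (m, n)
    return W + lr * grad_W                      # gradient ascent on J

rng = np.random.default_rng(0)
K = np.array([[1.0, -0.5], [0.3, 2.0]])        # maximizer of the toy critic at every state
W = np.zeros((2, 2))                            # actor parameters
for _ in range(500):
    s = rng.normal(size=(64, 2))                # minibatch of sampled states
    W = dpg_step(W, s, K)
```

Ascending this chain-rule gradient drives W toward K, which maximizes the toy critic at every state. In practice the closed-form dQ/da is replaced by the gradient of a learned critic.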
Findings
Achieves high tracking accuracy for AUVs.
Demonstrates stable learning with multiple actors and critics.
Validates effectiveness on different reference trajectories.
Abstract
This paper investigates the trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Existing policy gradient methods employ a single actor-critic and cannot achieve satisfactory tracking control accuracy or stable learning. Our proposed algorithm achieves both high tracking control accuracy for AUVs and stable learning by applying a hybrid actors-critics architecture. In this architecture, multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively. Specifically, the critics use an updating rule based on the expected absolute Bellman error to choose the worst critic to be updated at each time step. Subsequently, to calculate the loss function with a more accurate target value for the chosen critic, Pseudo Q-learning, which uses a sub-greedy policy to…
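The critic-selection rule described in the abstract can be sketched as follows. For each critic k, estimate E|δ_k| by the mean absolute Bellman error over a minibatch. Then pick the critic with the largest value; only that critic receives a gradient step at this time step. This is a minimal sketch, not the authors' implementation. The abstract is truncated before it explains how Pseudo Q-learning forms the target, so the bootstrapped next-state value `next_q` is simply passed in. The function names `bellman_targets` and `select_worst_critic` are hypothetical.

```python
import numpy as np

def bellman_targets(rewards, next_q, gamma=0.99, dones=None):
    """One-step targets y = r + gamma * (1 - done) * next_q.

    Assumption: next_q stands in for the value the paper computes with
    Pseudo Q-learning; here the caller supplies it directly.
    """
    if dones is None:
        dones = np.zeros_like(rewards)
    return rewards + gamma * (1.0 - dones) * next_q

def select_worst_critic(q_preds, targets):
    """Return the index of the critic with the largest mean |Bellman error|.

    q_preds: shape (num_critics, batch), Q_k(s_i, a_i) for each critic k
    targets: shape (batch,) shared by all critics, or (num_critics, batch)
    """
    errors = np.mean(np.abs(q_preds - targets), axis=1)  # minibatch estimate of E|delta_k|
    return int(np.argmax(errors)), errors

# Usage on a toy minibatch of three transitions and three critics.
rewards = np.array([1.0, 0.0, -1.0])
next_q = np.array([2.0, 1.0, 0.5])
y = bellman_targets(rewards, next_q, gamma=0.9)
q_preds = np.array([[2.7, 1.0, -0.5],   # critic 0: close to the targets
                    [2.0, 0.0, 0.5],    # critic 1: far from the targets
                    [2.8, 0.9, -0.6]])  # critic 2: closest to the targets
k, errs = select_worst_critic(q_preds, y)  # critic 1 has the largest mean |error|, so k == 1
```

Updating only the currently worst critic keeps the ensemble's estimates from drifting apart. The other critics are left unchanged at this time step.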
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Mechanical Circulatory Support Devices
Methods: Q-Learning
