MPC-based Reinforcement Learning for a Simplified Freight Mission of Autonomous Surface Vehicles

Wenqi Cai; Arash B. Kordabad; Hossein N. Esfahani; Anastasios M. Lekkas; Sebastien Gros

arXiv:2106.08634 · eess.SY · August 6, 2021 · 1 citation

TL;DR

This paper introduces an MPC-based reinforcement learning approach for autonomous surface vehicles to optimize freight missions involving path following and docking, demonstrating improved performance in simulations.

Contribution

It presents an MPC-LSTD-based Deterministic Policy Gradient (DPG) method for ASV freight missions: a parametrized MPC scheme approximates the policy, and Least Squares Temporal Difference (LSTD)-based DPG updates its parameters.

Findings

01. Enhanced closed-loop performance during learning

02. Effective collision-free path following and docking

03. Successful simulation validation of the approach

Abstract

In this work, we propose a Model Predictive Control (MPC)-based Reinforcement Learning (RL) method for Autonomous Surface Vehicles (ASVs). The objective is to find an optimal policy that optimizes the closed-loop performance of a simplified freight mission, including collision-free path following, autonomous docking, and a skillful transition between them. We use a parametrized MPC scheme to approximate the optimal policy, which considers path-following and docking costs together with state (position, velocity) and input (thruster force, angle) constraints. The Least Squares Temporal Difference (LSTD)-based Deterministic Policy Gradient (DPG) method is then applied to update the policy parameters. Our simulation results demonstrate that the proposed MPC-LSTD-based DPG method can improve the closed-loop performance during learning for the ASV freight mission problem.
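
The parameter update described in the abstract can be illustrated with the standard deterministic policy gradient for a cost-minimizing policy. This is a generic sketch and not necessarily the paper's exact formulation. All symbols are assumptions introduced here: $\theta$ (the MPC parameters), $\pi_\theta$ (the policy, assumed to be the first input of the MPC solution), $L$ (stage cost), $\gamma$ (discount factor), $Q^{\pi_\theta}$ (action-value function), and $\alpha$ (step size).

```latex
\begin{align}
J(\pi_\theta) &= \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^{k} L(s_k, a_k) \,\middle|\, a_k = \pi_\theta(s_k)\right], \\
\nabla_\theta J(\pi_\theta) &= \mathbb{E}_{s}\left[\nabla_\theta \pi_\theta(s)\, \nabla_a Q^{\pi_\theta}(s,a)\big|_{a=\pi_\theta(s)}\right], \\
\theta &\leftarrow \theta - \alpha\, \nabla_\theta J(\pi_\theta).
\end{align}
```

In an LSTD-based variant, the critic $Q^{\pi_\theta}$ is typically fitted by least squares on temporal-difference data collected from closed-loop rollouts, rather than by stochastic gradient steps.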

Taxonomy

Topics: Fault Detection and Control Systems · Electric Vehicles and Infrastructure · Advanced Control Systems Optimization

Methods: Deterministic Policy Gradient