CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards global optimality

Gianluigi Grandesso; Elisa Alboni; Gastone P. Rosati Papini; Patrick M. Wensing; Andrea Del Prete

arXiv:2211.06625·cs.RO·May 9, 2023

TL;DR

This paper introduces CACTO, a novel algorithm that integrates Trajectory Optimization and Reinforcement Learning to improve continuous control of nonlinear systems, effectively escaping local minima and reducing computational costs.

Contribution

The paper proposes a combined TO-RL framework that enhances trajectory optimization by guiding it with RL policies, addressing local minima and efficiency issues in continuous control tasks.

Findings

01. CACTO outperforms DDPG and PPO in escaping local minima.

02. The method is validated on diverse dynamical systems including a 6D car model.

03. CACTO demonstrates improved computational efficiency and control performance.

Abstract

This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework. The motivations behind this algorithm are the two main limitations of TO and RL when applied to continuous nonlinear systems to minimize a non-convex cost function. Specifically, TO can get stuck in poor local minima when the search is not initialized close to a "good" minimum. On the other hand, when dealing with continuous state and control spaces, the RL training process may be excessively long and strongly dependent on the exploration strategy. Thus, our algorithm learns a "good" control policy via TO-guided RL policy search that, when used as initial guess provider for TO, makes the trajectory optimization process less prone to converge to poor local optima. Our method is validated on several reaching…
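The abstract's point that a local optimizer depends on its initial guess can be illustrated with a toy example. The sketch below is not the CACTO algorithm: the one-dimensional cost, the plain gradient descent standing in for a TO solver, and the two starting points are all assumptions chosen for illustration. The "warm start" value is simply fixed by hand to play the role of a guess that a learned policy might supply.

```python
def cost(x):
    # Hypothetical non-convex cost with two local minima,
    # one near x = -1.30 (lower cost) and one near x = 1.13 (higher cost).
    return x**4 - 3 * x**2 + x


def cost_gradient(x):
    # Analytical derivative of the cost above.
    return 4 * x**3 - 6 * x + 1


def local_descent(x0, step=0.01, iterations=2000):
    # Plain gradient descent, standing in for a gradient-based TO solver.
    x = x0
    for _ in range(iterations):
        x -= step * cost_gradient(x)
    return x


# Uninformed initial guess: descent settles in the nearby, poorer minimum.
cold_start = local_descent(2.0)

# Initial guess assumed to come from a learned policy (fixed by hand here):
# descent settles in the better minimum.
warm_start = local_descent(-0.5)

print(f"cold start: x = {cold_start:.2f}, cost = {cost(cold_start):.2f}")
print(f"warm start: x = {warm_start:.2f}, cost = {cost(warm_start):.2f}")
```

Both runs use the same optimizer and the same cost, so the difference in final cost comes only from where the search begins. That is the gap a policy-provided initial guess is meant to close.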

Taxonomy

Topics: Robotic Path Planning Algorithms · Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics

Methods: Batch Normalization · Weight Decay · Adam · Experience Replay · Dense Connections · Convolution · Deep Deterministic Policy Gradient