Final-State Constrained Optimal Control via a Projection Operator Approach

Ivano Notarnicola, Florian A. Bayer, Giuseppe Notarstefano, and Frank Allgöwer

arXiv:1703.08356·cs.SY·March 27, 2017

Open Access

TL;DR

This paper introduces a numerical method for solving nonlinear optimal control problems with exact final-state constraints, ensuring recursive feasibility and suitability for real-time applications.

Contribution

It extends the PRONTO method to exactly handle final-state constraints using a projection operator, guaranteeing feasibility and enabling real-time implementation.

Findings

1. Successfully applied to inverted pendulum state transfer

2. Guarantees recursive feasibility of final-state constraints

3. Enables real-time optimal control with exact final-state satisfaction

Abstract

In this paper we develop a numerical method to solve nonlinear optimal control problems with final-state constraints. Specifically, we extend the PRojection Operator based Newton's method for Trajectory Optimization (PRONTO), which was proposed by Hauser for unconstrained optimal control problems. While in the standard method final-state constraints can be only approximately handled by means of a terminal penalty, in this work we propose a methodology to meet the constraints exactly. Moreover, our method guarantees recursive feasibility of the final-state constraint. This is an appealing property especially in real-time applications in which one would like to be able to stop the computation even if the desired tolerance has not been reached, but still satisfy the constraints. Following the same conceptual idea of PRONTO, the proposed strategy is based on two main steps which (differently…

Equations

$$\min_{(x(\cdot),u(\cdot))}\ \int_{0}^{T}\ell(x(\tau),u(\tau))\,\mathrm{d}\tau\quad\text{subj. to}\quad\dot{x}(t)=f(x(t),u(t)),\ x(0)=x_{0},\ x(T)=x_{T}$$

$$H(x(t),p(t),u(t)):=\ell(x(t),u(t))+p(t)^{T}f(x(t),u(t))$$

$$q(\xi)\cdot(\zeta,\zeta):=\int_{0}^{T}\begin{bmatrix}z(\tau)\\ v(\tau)\end{bmatrix}^{T}\begin{bmatrix}H_{xx}(\tau)&H_{xu}(\tau)\\ H_{ux}(\tau)&H_{uu}(\tau)\end{bmatrix}\begin{bmatrix}z(\tau)\\ v(\tau)\end{bmatrix}\mathrm{d}\tau$$

$$\dot{z}=f_{x}(\bar{x}(t),\bar{u}(t))\,z+f_{u}(\bar{x}(t),\bar{u}(t))\,v$$

$$\min_{(x(\cdot),u(\cdot))}\ \int_{0}^{T}a(\tau)^{T}x(\tau)+b(\tau)^{T}u(\tau)+\frac{1}{2}\begin{bmatrix}x(\tau)\\ u(\tau)\end{bmatrix}^{T}\begin{bmatrix}Q(\tau)&S(\tau)\\ S(\tau)^{T}&R(\tau)\end{bmatrix}\begin{bmatrix}x(\tau)\\ u(\tau)\end{bmatrix}\mathrm{d}\tau\quad\text{subj. to}\quad\dot{x}=A(t)x+B(t)u,\ x(0)=x_{0},\ x(T)=x_{T}$$

$$u=-R^{-1}\big(S^{T}x+B^{T}p+b\big)$$

$$\begin{bmatrix}\dot{x}\\ \dot{p}\end{bmatrix}=\begin{bmatrix}\tilde{A}&-BR^{-1}B^{T}\\ -\tilde{Q}&-\tilde{A}^{T}\end{bmatrix}\begin{bmatrix}x\\ p\end{bmatrix}+\begin{bmatrix}-BR^{-1}b\\ SR^{-1}b-a\end{bmatrix},\qquad x(0)=x_{0},\quad p(T)=p_{1}$$

$$p=Px+r$$

$$-\dot{P}=\tilde{A}^{T}P+P\tilde{A}-PBR^{-1}B^{T}P+\tilde{Q},\qquad P(T)=0$$

$$-\dot{r}=(A-BK)^{T}r+a-K^{T}b,\qquad r(T)=p_{1}$$

$$\dot{x}=(A-BK)\,x-BR^{-1}\big(B^{T}r+b\big),\qquad x(0)=x_{0}$$

$$x(T)=x_{u}(T)+x_{f,b}(T)+x_{f,r}(T)$$

$$W_{c}(t):=\int_{0}^{t}\Phi_{c}(t,\tau)B(\tau)R(\tau)^{-1}B(\tau)^{T}\Phi_{c}(t,\tau)^{T}\,\mathrm{d}\tau$$

$$\dot{n}=(A-BK)\,n-BR^{-1}\big(B^{T}r_{f}+b\big),\qquad n(0)=0$$

$$p_{1}=W_{c}(T)^{-1}\big(x_{u}(T)+n(T)-x_{T}\big)$$

$$\min_{(x(\cdot),u(\cdot))}\ \int_{0}^{T}\ell(x(\tau),u(\tau))\,\mathrm{d}\tau+m(x(T))\quad\text{subj. to}\quad\dot{x}(t)=f(x(t),u(t)),\ x(0)=x_{0}$$

$$\dot{x}(t)=f(x(t),u(t)),\quad x(0)=\alpha(0),\qquad u(t)=\mu(t)+K(t)[\alpha(t)-x(t)]$$

$$\dot{z}(t)=f_{x}(x(t),u(t))z(t)+f_{u}(x(t),u(t))v(t),\quad z(0)=0,\qquad v(t)=\nu(t)+K(t)[\beta(t)-z(t)]$$

$$h(\xi):=\int_{0}^{T}\ell(x(\tau),u(\tau))\,\mathrm{d}\tau+m(x(T))$$

$$\zeta_{i}=\arg\min_{\zeta\in T_{\xi_{i}}\mathcal{T}}\ Dg(\xi_{i})\cdot\zeta+\frac{1}{2}D^{2}g(\xi_{i})\cdot(\zeta,\zeta)$$

$$\gamma_{i}=\arg\min_{\gamma\in(0,1]}\ g(\xi_{i}+\gamma\zeta_{i})$$

$$\xi_{i+1}=\mathcal{P}(\xi_{i}+\gamma_{i}\zeta_{i})$$

$$\min_{\zeta=(z(\cdot),v(\cdot))}\ \int_{0}^{T}a(\tau)^{T}z(\tau)+b(\tau)^{T}v(\tau)+\frac{1}{2}\begin{bmatrix}z(\tau)\\ v(\tau)\end{bmatrix}^{T}\begin{bmatrix}Q(\tau)&S(\tau)\\ S(\tau)^{T}&R(\tau)\end{bmatrix}\begin{bmatrix}z(\tau)\\ v(\tau)\end{bmatrix}\mathrm{d}\tau+z(T)^{T}P_{1}z(T)+r_{1}^{T}z(T)\quad\text{subj. to}\quad\dot{z}=A(t)z+B(t)v,\ z(0)=0$$

$$\mathcal{G}(\xi_{k})+D\mathcal{G}(\xi_{k})\cdot\zeta_{k}=0$$

$$x_{k}(T)-x_{T}+z_{k}(T)=0$$

$$\zeta_{k}:=(z_{k}(\cdot),v_{k}(\cdot))=\arg\min_{(z(\cdot),v(\cdot))}\ \frac{1}{2}\int_{0}^{T}\|z(\tau)\|^{2}+\|v(\tau)\|^{2}\,\mathrm{d}\tau\quad\text{subj. to}\quad\dot{z}=A(t)z+B(t)v,\ z(0)=0,\ z(T)=-x_{k}(T)+x_{T}$$

$$\zeta_{k}=\arg\min_{\zeta}\ \frac{1}{2}\|\zeta\|_{L_{2}}^{2}\quad\text{subj. to}\quad\zeta\in T_{\xi_{k}}\mathcal{T},\ (\pi_{1}\zeta)(T)=-\mathcal{F}(\xi_{k})$$

$$\xi_{k+1}=\mathcal{P}(\xi_{k}+\zeta_{k})$$

$$\zeta_{i}=\arg\min_{\zeta\in T_{\xi_{i}}\mathcal{T}}\ Dg(\xi_{i})\cdot\zeta+\frac{1}{2}D^{2}g(\xi_{i})\cdot(\zeta,\zeta)\quad\text{subj. to}\quad(\pi_{1}\zeta)(T)=0$$

$$\gamma_{i}=\arg\min_{\gamma\in(0,1]}\ g(\xi_{i}+\gamma\zeta_{i})$$

Taxonomy

Topics: Vehicle Dynamics and Control Systems · Advanced Optimization Algorithms Research · Advanced Control Systems Optimization

Full text

Final-State Constrained Optimal Control via a Projection Operator Approach

Ivano Notarnicola (1), Florian A. Bayer (2), Giuseppe Notarstefano (1), and Frank Allgöwer (2)

(1) Ivano Notarnicola and Giuseppe Notarstefano are with the Department of Engineering, Università del Salento, Lecce, Italy, [email protected]

(2) Florian Bayer and Frank Allgöwer are with the Institute for Systems Theory and Automatic Control, University of Stuttgart, 70550 Stuttgart, Germany, {bayer, allgower}@ist.uni-stuttgart.de

This result is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 638992 - OPT4SMART).

F. Bayer and F. Allgöwer would like to thank the German Research Foundation (DFG) for financial support within the Cluster of Excellence in Simulation Technology (EXC 310/2) at the University of Stuttgart.

Abstract

In this paper we develop a numerical method to solve nonlinear optimal control problems with final-state constraints. Specifically, we extend the PRojection Operator based Newton’s method for Trajectory Optimization (PRONTO), which was proposed by Hauser for unconstrained optimal control problems. While in the standard method final-state constraints can be only approximately handled by means of a terminal penalty, in this work we propose a methodology to meet the constraints exactly. Moreover, our method guarantees recursive feasibility of the final-state constraint. This is an appealing property especially in real-time applications in which one would like to be able to stop the computation even if the desired tolerance has not been reached, but still satisfy the constraints. Following the same conceptual idea of PRONTO, the proposed strategy is based on two main steps which (differently from the standard scheme) preserve the feasibility of the final-state constraints: (i) solve a quadratic approximation of the nonlinear problem to find a descent direction, and (ii) get a (feasible) trajectory by means of a feedback law (which turns out to be a nonlinear projection operator). To find the (feasible) descent direction we take advantage of final-state constrained Linear Quadratic optimal control methods, while the second step is performed by suitably designing a constrained version of the trajectory tracking projection operator. The effectiveness of the proposed strategy is tested on the optimal state transfer of an inverted pendulum.

I Introduction

Optimal control problems (OCPs) are an active field of research in the controls community, since they arise in many application areas such as process control, robotics, aerospace, and automotive. Over the last decades, many different approaches have been presented to solve these problems. A possible classification of these methods has been given in [1]: (i) dynamic programming, (ii) indirect methods, and (iii) direct methods. Methods in the first class solve the OCP by finding optimal input segments using the Principle of Optimality (see, e.g., [2] or [3]). Methods in the second class are based on the necessary conditions for optimality, which lead to a (two-point) boundary value problem obtained by means of calculus of variations ([4], [5]) or Pontryagin’s Maximum Principle ([6], [7]). The third class is the most investigated and simplifies the OCP by parameterizing the control. According to the way the dynamics are handled, these methods are classified into fully discretized (or collocation) methods (see, e.g., [8]) and direct shooting methods, where the dynamics are included by some integration scheme (see, e.g., [9]). A detailed overview of direct methods can be found, for example, in [10].

Of special interest for our paper is the PRojection Operator based Newton method for Trajectory Optimization (PRONTO) which was introduced in [11], see also [12]. In contrast to many other approaches solving optimal control problems, this method is able to guarantee feasibility of the dynamics after each iteration of the underlying Newton method using a “projection operator” defined by a feedback, closed-loop system. According to the classification in [1] this can be seen as a combination of shooting and collocation.

This method was designed to handle unconstrained optimal control problems (and extended to input-constrained problems in [13]), considering final-state constraints only approximately by means of a final penalty. Exactly matching final-state constraints is of interest in many control applications. This is the case, for example, in the field of hybrid systems, that is, systems that consist of continuous and discrete event dynamics (see, e.g., [14] and the references therein). Discontinuous jumps of continuous states may occur when the system state traverses a certain region of the state space, which demands exact satisfaction of constraints on the final state. Another field where this is of interest is Model Predictive Control (MPC) (see, e.g., [15] and the references therein). In MPC, the system is controlled by repeatedly solving a finite-horizon OCP. In many MPC approaches, convergence and stability can be guaranteed if a certain terminal condition is satisfied. This leads to the need for an algorithm that is able to handle final-state constraints.

A first approach to solve the nonlinear transfer problem was introduced in [16]. There, the terminal constraint was satisfied asymptotically by iteratively choosing a terminal reference until the actual final state matches the target one.

The contribution of this paper is twofold. First, we introduce a new projection operator, inspired by the one presented in [12], such that not only the dynamics, but also the terminal constraint is satisfied after each iteration of the optimization algorithm. We reformulate the constrained projection as a root-finding problem for an infinite-dimensional functional, which can be solved by means of a Newton root-finding method in Banach spaces. Then, based on this new projection operator, as the main contribution we propose an optimal control method for final-state constrained problems that is recursively feasible. The proposed algorithm consists of two steps. First, a feasible descent direction is determined using a quadratic approximation of the nonlinear problem. The descent direction is chosen such that the mismatch on the final state is zero. Second, the perturbed curve is projected onto the feasible manifold such that the dynamics and the terminal constraint are satisfied.

An interesting feature of the proposed algorithm is that it is amenable to real-time, fast MPC schemes. Indeed, in many applications one may not be able to run the algorithm until convergence is achieved with a desired tolerance. With a limited computation time, only a (much) smaller number of iterations may be run. Since feasibility of both the dynamics and the final-state constraint is guaranteed at each iteration, one can stop the computation and still get a feasible trajectory.

The paper is organized as follows. In Section II we introduce the problem setup and recall how to solve final-state constrained linear quadratic optimal control problems. PRONTO is introduced in Section III. Our new final-state constrained PRONTO is presented in Section IV and a numerical simulation for the optimal state-transfer of an inverted pendulum is given in Section V.

Notation

Given a smooth vector field $f(x,u)$, we denote by $f_{x}(\bar{x},\bar{u})$ its derivative with respect to $x$ evaluated at $(\bar{x},\bar{u})$, and, consistently, by $f_{u}$ its derivative with respect to $u$. For the curve $\xi=(x(\cdot),u(\cdot))$, we introduce the projections $\pi_{1}=[I~0]$ and $\pi_{2}=[0~I]$ such that $x(\cdot)=\pi_{1}\xi$ and $u(\cdot)=\pi_{2}\xi$. Given a functional $\mathcal{G}:X\to\mathbb{R}$, with $X$ a Banach space, and a point $\xi\in X$, we denote by $D\mathcal{G}(\xi)$ the first Fréchet derivative of $\mathcal{G}$ evaluated at $\xi$, and, consistently, by $D^{2}\mathcal{G}(\xi)$ its second Fréchet derivative [17].

II Problem Setup and Preliminaries

In this paper we consider a final-state constrained optimal control problem. That is, we aim at finding a trajectory of a dynamical system that minimizes a given objective functional while satisfying an initial and a terminal constraint. Formally, we consider the problem

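$$\begin{aligned}\min_{(x(\cdot),u(\cdot))}\quad&\int_{0}^{T}\ell(x(\tau),u(\tau))\,\mathrm{d}\tau\\ \text{subj. to}\quad&\dot{x}(t)=f(x(t),u(t)),\\ &x(0)=x_{0},\quad x(T)=x_{T},\end{aligned}\tag{1}$$

where $\ell:\mathbb{R}^{n}\times\mathbb{R}^{m}\to\mathbb{R}$ is the running cost, $f:\mathbb{R}^{n}\times\mathbb{R}^{m}\to\mathbb{R}^{n}$ is the nonlinear vector field describing the control system, and $x_{0}\in\mathbb{R}^{n}$ and $x_{T}\in\mathbb{R}^{n}$ are the fixed initial and final states, respectively. We assume $\ell$ and $f$ to be $\mathcal{C}^{2}$ functions. In the rest of the paper, for the sake of brevity, we omit the dimensions of the quantities when they are clear from the equations.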

Before stating the main assumptions for problem (1), we recall some notation that will be also useful in the rest of the paper. Consider the Hamiltonian of (1) given by

$$H(x(t),p(t),u(t)):=\ell(x(t),u(t))+p(t)^{T}f(x(t),u(t)),\tag{2}$$

where $p(\cdot)$ is the costate. Then, for $\xi=(\bar{x}(\cdot),\bar{u}(\cdot))$ define

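$$q(\xi)\cdot(\zeta,\zeta):=\int_{0}^{T}\begin{bmatrix}z(\tau)\\ v(\tau)\end{bmatrix}^{T}\begin{bmatrix}H_{xx}(\tau)&H_{xu}(\tau)\\ H_{ux}(\tau)&H_{uu}(\tau)\end{bmatrix}\begin{bmatrix}z(\tau)\\ v(\tau)\end{bmatrix}\mathrm{d}\tau,\tag{3}$$

where $\zeta=(z(\cdot),v(\cdot))$ is a (state-input) curve representing a variation from $\xi$, while $H_{xx}(t)$, $H_{xu}(t)$ and $H_{uu}(t)$ denote the corresponding second derivatives of $H$ evaluated along the extremal state-control-costate trajectory, e.g., $H_{xx}(t)=H_{xx}(\bar{x}(t),\bar{p}(t),\bar{u}(t))$.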

Given a dynamical system $\dot{x}=f(x,u)$ , $x(0)=x_{0}$ , we say that a state-input curve $\xi=(\bar{x}(t),\bar{u}(t))$ is a trajectory of the system if it satisfies the dynamics, i.e., $\dot{\bar{x}}(t)=f(\bar{x}(t),\bar{u}(t))$ for all $t\in[0,T]$ and $\bar{x}(0)=x_{0}$ . We denote the (infinite-dimensional) manifold of all system trajectories by $\mathcal{T}$ , so that we write $\xi\in\mathcal{T}$ .

Given a trajectory $\xi=(\bar{x}(t),\bar{u}(t))$ , we denote by $T_{\xi}\mathcal{T}$ the manifold of curves $\zeta=(z(\cdot),v(\cdot))$ satisfying the linearized dynamics

$$\dot{z}=f_{x}(\bar{x}(t),\bar{u}(t))\,z+f_{u}(\bar{x}(t),\bar{u}(t))\,v\tag{4}$$

with $z(0)=0$ and for $v(\cdot)\in L_{2}$ . We say that $T_{\xi}\mathcal{T}$ is the tangent space of the trajectory manifold at $\xi$ .

Assumption II.1 (Linear controllability).

The system $\dot{x}=f(x,u)$ is linearly controllable around any trajectory. That is, for any $(\bar{x}(\cdot),\bar{u}(\cdot))$ defined on $[0,T]$ , the linearized system (4) is controllable over $[0,T]$ .

Assumption II.2 (Second Order Sufficiency).

Given a trajectory $\xi\in\mathcal{T}$, the Hamiltonian $H$ satisfies $H_{uu}(t)\geq r_{0}I$ for $t\in[0,T]$ and some $r_{0}>0$, and the quadratic functional $q$ is positive definite on $T_{\xi}\mathcal{T}$ (see, e.g., [17] for the definition of a positive-definite functional). $\square$

Theorem II.3 ([16, Theorem $2.1$ ]).

Let $\xi=(x(\cdot),u(\cdot))$ be a stationary trajectory of (1) with corresponding costate trajectory $p(\cdot)$. Suppose that Assumption II.2 holds at $\xi$. If the system is linearly controllable around $\xi$, then $\xi$ is an isolated local minimum of (1). $\square$

Remark II.4.

Assumption II.1 is not only a sufficient condition for the theorem above, but also guarantees that the subproblems of the algorithm we propose are solvable at each iteration. $\square$

II-A Linear Quadratic (LQ) optimal state transfer problem

We start by considering a special version of problem (1) in which the cost is quadratic and the dynamics is linear and time-varying, i.e., we consider the problem

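$$\begin{aligned}\min_{(x(\cdot),u(\cdot))}\quad&\int_{0}^{T}a(\tau)^{T}x(\tau)+b(\tau)^{T}u(\tau)+\frac{1}{2}\begin{bmatrix}x(\tau)\\ u(\tau)\end{bmatrix}^{T}\begin{bmatrix}Q(\tau)&S(\tau)\\ S(\tau)^{T}&R(\tau)\end{bmatrix}\begin{bmatrix}x(\tau)\\ u(\tau)\end{bmatrix}\mathrm{d}\tau\\ \text{subj. to}\quad&\dot{x}=A(t)x+B(t)u,\\ &x(0)=x_{0},\quad x(T)=x_{T},\end{aligned}\tag{5}$$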

where we assume that $a(\cdot)$ and $b(\cdot)$ are piecewise continuous vectors, and $A(\cdot)$ , $B(\cdot)$ , $Q(\cdot)=Q(\cdot)^{T}$ , $R(\cdot)=R(\cdot)^{T}$ , and $S(\cdot)$ are piecewise continuous matrices with $R(t)\geq r_{0}I$ , $t\in[0,T]$ , for some $r_{0}>0$ .

Remark II.5.

Problem (5) can be obtained as the linear-quadratic approximation of problem (1). In particular, $A(\cdot)$ and $B(\cdot)$ result from the linearization of the nonlinear dynamics $f$ at a given trajectory, while $Q$ , $R$ , $S$ , $a$ and $b$ define the quadratic approximation of the nonlinear cost functional $\ell$ at the same trajectory. $\square$

Theorem II.6 ([16, Proposition $1.1$ ]).

If $(A(\cdot),B(\cdot))$ in (5) describes a controllable linear time-varying system over $[0,T]$ and $q$ is positive definite on the space of the system trajectories, then problem (5) has a unique solution. $\square$

Next, we recall how to solve problem (5). We start by imposing the first-order necessary conditions of optimality.

Setting to zero the first variation of the Hamiltonian with respect to $u$ , we obtain the optimal feedback law

$$u=-R^{-1}\big(S^{T}x+B^{T}p+b\big).\tag{6}$$

By setting the first variations of the Hamiltonian with respect to $x$ and $p$ to zero and by using (6), we obtain the following linear two-point boundary value problem

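$$\begin{bmatrix}\dot{x}\\ \dot{p}\end{bmatrix}=\begin{bmatrix}\tilde{A}&-BR^{-1}B^{T}\\ -\tilde{Q}&-\tilde{A}^{T}\end{bmatrix}\begin{bmatrix}x\\ p\end{bmatrix}+\begin{bmatrix}-BR^{-1}b\\ SR^{-1}b-a\end{bmatrix},\qquad\begin{array}{l}x(0)=x_{0}\\ p(T)=p_{1}\end{array},\tag{9}$$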

where $p(t)$ is the costate, $p_{1}$ is a boundary value to be determined, $\tilde{A}:=A-BR^{-1}S^{T}$ and $\tilde{Q}:=Q-SR^{-1}S^{T}$ .

It can be shown that $p$ and $x$ in (9) are related via an affine relation, i.e.,

$$p=Px+r.\tag{10}$$

By defining the gain matrix $K:=R^{-1}(S^{T}+B^{T}P)$, the optimal input (6) results in the affine feedback law $u=-Kx-R^{-1}(B^{T}r+b)$. Then, equation (9) can be decoupled by means of the sweep method [3], which leads to the following differential (Riccati) equations

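$$-\dot{P}=\tilde{A}^{T}P+P\tilde{A}-PBR^{-1}B^{T}P+\tilde{Q},\qquad P(T)=0,\tag{11}$$

$$-\dot{r}=(A-BK)^{T}r+a-K^{T}b,\qquad r(T)=p_{1},\tag{12}$$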

where the boundary conditions follow from (10).

The above equations should be integrated to determine the optimal control (6) and thus solve problem (5). However, the terminal vector $p_{1}$ is still unknown. Thus, we need to express explicitly the relation between $p_{1}$ and the terminal condition $x_{T}$ . Plugging (10) into the first equation of (9), we obtain

$$\dot{x}=(A-BK)\,x-BR^{-1}\big(B^{T}r+b\big),\qquad x(0)=x_{0}.\tag{13}$$

Next, we observe that

$$x(T)=x_{u}(T)+x_{f,b}(T)+x_{f,r}(T),\tag{14}$$

where $x_{u}(T)$ is the unforced response of system (13) at time $t=T$ , whereas $x_{f,b}(T)$ and $x_{f,r}(T)$ are the forced responses due to the inputs $BR^{-1}b$ and $BR^{-1}B^{T}r$ , respectively.

Focusing on $x_{f,r}(T)$ , we note that it can be further split into two contributions related, respectively, to the forced and unforced responses of $r$ . The latter contribution depends directly on $p_{1}$ and it can be shown that equation (14) can be rewritten as $x(T)=x_{u}(T)+n(T)-W_{c}(T)p_{1}$ , where $W_{c}(T)$ is the controllability Gramian matrix,

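$$W_{c}(t):=\int_{0}^{t}\Phi_{c}(t,\tau)B(\tau)R(\tau)^{-1}B(\tau)^{T}\Phi_{c}(t,\tau)^{T}\,\mathrm{d}\tau,$$

evaluated at time $T$, with $\Phi_{c}$ being the state transition matrix associated with the closed-loop system with state matrix $A-BK$, while $n(T)$ denotes the terminal state of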

$$\dot{n}=(A-BK)\,n-BR^{-1}\big(B^{T}r_{f}+b\big),\qquad n(0)=0,$$

where $r_{f}$ denotes the forced response of $r$ , i.e., it solves (12) with zero terminal condition.

To conclude, $p_{1}$ can be computed as

$$p_{1}=W_{c}(T)^{-1}\big(x_{u}(T)+n(T)-x_{T}\big).$$
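To make the role of $p_{1}$ concrete, the following is a minimal sketch on a double integrator, which is an illustrative system chosen here and not taken from the paper. It assumes $Q=0$, $S=0$, $a=b=0$ and $R=1$, so that $P\equiv 0$, $K=0$, $n(T)=0$ and $r(t)=\Phi_{c}(T,t)^{T}p_{1}$. Imposing $x(T)=x_{T}$ in the relation $x(T)=x_{u}(T)+n(T)-W_{c}(T)p_{1}$ then gives the linear system $W_{c}(T)\,p_{1}=x_{u}(T)-x_{T}$, which the code solves before simulating the resulting input.

```python
import numpy as np

# Double integrator x1' = x2, x2' = u (illustrative choice, not from the paper).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
T = 2.0
x0 = np.array([1.0, 0.0])   # initial state
xT = np.array([0.0, 0.0])   # required final state

def Phi(t, tau):
    """State transition matrix exp(A (t - tau)); closed form because A @ A = 0."""
    return np.eye(2) + A * (t - tau)

# Controllability Gramian W_c(T) by trapezoidal quadrature (R = 1, K = 0 here).
taus = np.linspace(0.0, T, 2001)
vals = np.array([Phi(T, s) @ B @ B.T @ Phi(T, s).T for s in taus])
dtau = taus[1] - taus[0]
Wc = dtau * (vals.sum(axis=0) - 0.5 * (vals[0] + vals[-1]))

# Unforced response at time T; n(T) = 0 because a = b = 0.
x_u_T = Phi(T, 0.0) @ x0
p1 = np.linalg.solve(Wc, x_u_T - xT)

def u_opt(t):
    """u = -R^{-1} B^T r(t), with r(t) = Phi(T, t)^T p1."""
    return -(B.T @ Phi(T, t).T @ p1)

def simulate(u, steps=2000):
    """Integrate x' = A x + B u(t) from x0 over [0, T] with classical RK4."""
    dt = T / steps
    x = x0.copy()
    rhs = lambda t, y: A @ y + B @ u(t)
    for k in range(steps):
        t = k * dt
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, x + dt / 2 * k1)
        k3 = rhs(t + dt / 2, x + dt / 2 * k2)
        k4 = rhs(t + dt, x + dt * k3)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

x_final = simulate(u_opt)
print("p1 =", p1, " x(T) =", x_final)
```

For this example the Gramian is available in closed form, $W_{c}(T)=\begin{bmatrix}T^{3}/3&T^{2}/2\\ T^{2}/2&T\end{bmatrix}$, so the quadrature is only there to mirror the general time-varying case.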

III Projection Operator Newton Method for Trajectory Optimization (PRONTO)

PRONTO was introduced in [12] to solve the following finite-horizon optimal control problem

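$$\begin{aligned}\min_{(x(\cdot),u(\cdot))}\quad&\int_{0}^{T}\ell(x(\tau),u(\tau))\,\mathrm{d}\tau+m(x(T))\\ \text{subj. to}\quad&\dot{x}(t)=f(x(t),u(t)),\quad x(0)=x_{0},\end{aligned}\tag{15}$$

which, unlike problem (1), has a terminal penalty $m:\mathbb{R}^{n}\to\mathbb{R}$ rather than a terminal constraint.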

The key idea of PRONTO is to (i) convert the dynamically constrained (infinite-dimensional) optimization problem into an unconstrained one by means of a projection operator, and (ii) solve the unconstrained problem via an infinite-dimensional Newton method.

We start by recalling the projection operator, which is based on a trajectory tracking feedback law.

III-A The trajectory tracking nonlinear projection operator

Suppose that $\xi:=(\alpha(\cdot),\mu(\cdot))$ (defined on $t\geq 0$ ) is a bounded state-input curve and let $\eta:=(x(\cdot),u(\cdot))$ be the trajectory determined by the nonlinear feedback system

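$$\left\{\begin{aligned}&\dot{x}(t)=f(x(t),u(t)),\qquad x(0)=\alpha(0)\\ &u(t)=\mu(t)+K(t)[\alpha(t)-x(t)].\end{aligned}\right.\tag{16}$$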

Under suitable conditions on $f$ and $K$ , the feedback system in (16) defines a continuous nonlinear projection operator $\mathcal{P}:\xi=(\alpha(\cdot),\mu(\cdot))\mapsto\eta=(x(\cdot),u(\cdot))$ .

The operator $\mathcal{P}$ is a projection since $\mathcal{P}=\mathcal{P}\circ\mathcal{P}$ on its domain. Indeed, independent of $K$ , if $\xi$ is a trajectory of $f$ , then $\xi$ is a fixed point of $\mathcal{P}$ , i.e., $\xi=\mathcal{P}(\xi)$ . As a consequence, a trajectory can be characterized in terms of the projection operator as $\xi\in\mathcal{T}$ if and only if $\xi=\mathcal{P}(\xi)$ . In [12], the authors have proven that the projection operator $\mathcal{P}$ is as smooth as $f$ and one can compute (and analyze) its derivatives. In particular, if $f$ is $\mathcal{C}^{1}$ , then the first derivative of the projection operator is the linear mapping $\zeta=(\beta(\cdot),\nu(\cdot))\mapsto D\mathcal{P}(\xi)\cdot\zeta=(z(\cdot),v(\cdot))$ defined by

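$$\left\{\begin{aligned}&\dot{z}(t)=f_{x}(x(t),u(t))z(t)+f_{u}(x(t),u(t))v(t),\quad z(0)=0\\ &v(t)=\nu(t)+K(t)[\beta(t)-z(t)],\end{aligned}\right.$$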

which is obtained by linearizing (16) about $\xi\in\mathcal{T}$ . It can be shown that $D\mathcal{P}(\xi)$ is itself a projection, so that $\zeta\in T_{\xi}\mathcal{T}$ if and only if $\zeta=D\mathcal{P}(\xi)\cdot\zeta$ .
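The projection property can be checked numerically. The sketch below is an illustration under stated assumptions: the pendulum-like model `f`, the constant gain `K`, and the explicit Euler discretization are all choices made here, not the paper's setup. It applies the feedback system (16) to a curve that is not a trajectory, then applies it again to the result and confirms that the second pass changes nothing.

```python
import numpy as np

h, N = 0.01, 300                      # Euler step and number of steps (assumed)
K = np.array([[4.0, 2.0]])            # constant tracking gain (assumed, not tuned)

def f(x, u):
    """Illustrative pendulum-like model: theta'' = -sin(theta) + u."""
    return np.array([x[1], -np.sin(x[0]) + u[0]])

def project(alpha, mu):
    """Discrete-time version of (16): track the curve (alpha, mu) in closed loop."""
    x = np.empty_like(alpha)
    u = np.empty_like(mu)
    x[0] = alpha[0]
    for k in range(N):
        u[k] = mu[k] + K @ (alpha[k] - x[k])
        x[k + 1] = x[k] + h * f(x[k], u[k])
    return x, u

# A curve that is NOT a trajectory: the angle ramps up while the velocity stays at zero.
t = h * np.arange(N + 1)
alpha = np.column_stack([0.5 * t, np.zeros(N + 1)])
mu = np.zeros((N, 1))

x1, u1 = project(alpha, mu)           # eta = P(xi)
x2, u2 = project(x1, u1)              # P(P(xi))

print("max |P(P(xi)) - P(xi)| =", np.abs(x2 - x1).max())
print("max |P(xi) - xi|       =", np.abs(x1 - alpha).max())
```

With the same discretization used in both passes, the feedback term vanishes identically on the second pass, which is the discrete counterpart of $\mathcal{P}=\mathcal{P}\circ\mathcal{P}$.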

III-B The PRONTO algorithm

Writing the cost in (15) as the functional

$$h(\xi):=\int_{0}^{T}\ell(x(\tau),u(\tau))\,\mathrm{d}\tau+m(x(T)),$$

we see that the optimal control problem (15) is equivalent to the constrained optimization problem $\min_{\xi\in\mathcal{T}}h(\xi)$ . Using the trajectory characterization and defining $g(\xi):=h(\mathcal{P}(\xi))$ the constrained problem can be converted into an unconstrained one as $\min_{\xi\in\mathcal{T}}h(\xi)=\min_{\xi}g(\xi).$

The PRONTO algorithm, stated in Algorithm 1, is based on a Newton method applied to $\min_{\xi}g(\xi)$ and includes two key steps. First, the search direction $\zeta_{i}$ is determined by an optimization problem involving the first and second derivatives of the nonlinear functional $g$. Since the derivatives of $g$ are computed, the projection $\mathcal{P}$ is inherently taken into account in the calculation of the search direction. Moreover, the search direction is restricted to the tangent space of the trajectory manifold at the current trajectory $\xi_{i}$, that is, $\zeta_{i}\in T_{\xi_{i}}\mathcal{T}$. Second, the update is performed using the projection $\mathcal{P}$ in (18), so that a feasible trajectory is obtained after each iteration of the optimization algorithm.

Remark III.1.

Notice that step (17) consists of solving a (standard) LQR problem in the form

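$$\begin{aligned}\min_{\zeta=(z(\cdot),v(\cdot))}\quad&\int_{0}^{T}a(\tau)^{T}z(\tau)+b(\tau)^{T}v(\tau)+\frac{1}{2}\begin{bmatrix}z(\tau)\\ v(\tau)\end{bmatrix}^{T}\begin{bmatrix}Q(\tau)&S(\tau)\\ S(\tau)^{T}&R(\tau)\end{bmatrix}\begin{bmatrix}z(\tau)\\ v(\tau)\end{bmatrix}\mathrm{d}\tau+z(T)^{T}P_{1}z(T)+r_{1}^{T}z(T)\\ \text{subj. to}\quad&\dot{z}=A(t)z+B(t)v,\quad z(0)=0.\end{aligned}$$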

Step (18) consists of computing the updated trajectory $\xi_{i+1}=(x_{i+1}(\cdot),u_{i+1}(\cdot))$ by running the closed loop system (16) with (given) curve $(\alpha(\cdot),\mu(\cdot))=\xi_{i}+\gamma_{i}\zeta_{i}=(x_{i}(\cdot)+\gamma_{i}z_{i}(\cdot),u_{i}(\cdot)+\gamma_{i}v_{i}(\cdot))$ . $\square$
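As a compact restatement of the iteration described in this section (assembled from the steps named in the text rather than copied from Algorithm 1, and assuming an initial trajectory $\xi_{0}\in\mathcal{T}$ is given), each PRONTO iteration $i=0,1,2,\dots$ performs:

```latex
\begin{align*}
\zeta_{i} &= \arg\min_{\zeta\in T_{\xi_{i}}\mathcal{T}}\; Dg(\xi_{i})\cdot\zeta
            + \tfrac{1}{2}\,D^{2}g(\xi_{i})\cdot(\zeta,\zeta)
            && \text{(search direction, step (17))}\\
\gamma_{i} &= \arg\min_{\gamma\in(0,1]}\; g(\xi_{i}+\gamma\zeta_{i})
            && \text{(step size)}\\
\xi_{i+1} &= \mathcal{P}(\xi_{i}+\gamma_{i}\zeta_{i})
            && \text{(projection, step (18))}
\end{align*}
```

Every iterate $\xi_{i+1}$ is a trajectory of the system by construction, but nothing in these three steps forces $x_{i+1}(T)$ to hit a prescribed final state.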

IV Final-state constrained PRONTO

In this section, we introduce an optimization algorithm which solves the nonlinear optimal state transfer problem. The key approach is to: (i) introduce a projection operator, inspired by the one introduced in [12] (and recalled in Section III), such that not only the dynamics, but also the terminal constraint is satisfied, and (ii) compute a descent direction that satisfies the final-state constraint to first-order.

IV-A Final-state constrained projection operator

The projection operator recalled in Section III-A is not able to guarantee an exact matching of the terminal constraint. As a key step of our algorithm, we introduce a final-state constrained projection operator, $\xi=(\alpha(\cdot),\mu(\cdot))\mapsto\mathcal{P}_{c}(\xi)=\eta=(x(\cdot),u(\cdot))$, satisfying $x(T)=\alpha(T)$ where, as usual, $\xi$ is a curve while $\eta\in\mathcal{T}$ is a trajectory. Our idea is to design the operator $\mathcal{P}_{c}$ as an iterative routine in which, at each iteration: (i) we perturb the current trajectory in order to hit the terminal constraint exactly and (ii) we project the resulting curve by means of the standard projection operator (16).

The final-state constrained projection can be formalized as an infinite-dimensional root-finding problem. Given $x_{T}\in\mathbb{R}^{n}$, let us define a functional $\mathcal{F}$ which associates to a state-input curve $\xi=(\alpha(\cdot),\mu(\cdot))$ the difference between its terminal state $\alpha(T)$ and $x_{T}$. Hence, a trajectory $\eta$ that is a root of $\mathcal{F}$, i.e., such that $\mathcal{F}(\eta)=0$, is exactly what we expect to be the result of the final-state constrained projection operator $\mathcal{P}_{c}$ when applied to a curve $\xi$.

Following the same high-level idea used in Section III-B to derive the PRONTO algorithm, we convert the constrained root-finding of $\mathcal{F}$ into the unconstrained root-finding of $\mathcal{G}(\cdot):=\mathcal{F}(\mathcal{P}(\cdot))$, with $\mathcal{P}$ being the (unconstrained) projection operator introduced in (16).

Given an initial curve $\xi$, the root of the functional $\mathcal{G}$ is found by means of an infinite-dimensional Newton method. Formally, at each iteration the perturbation $\zeta_{k}$ is obtained by setting to zero the first-order approximation of the perturbed functional, i.e., by solving for $\zeta_{k}$ the following equation

$$\mathcal{G}(\xi_{k})+D\mathcal{G}(\xi_{k})\cdot\zeta_{k}=0.\tag{19}$$

Using the chain rule, the linear mapping $D\mathcal{G}(\xi_{k})$ applied to a state-input curve $\zeta_{k}$ can be expressed as $D\mathcal{G}(\xi_{k})\cdot\zeta_{k}=D\mathcal{F}(\xi_{k})\cdot D\mathcal{P}(\xi_{k})\cdot\zeta_{k}$ . When $\xi_{k}$ is a trajectory, the linear mapping $D\mathcal{P}(\xi_{k})$ is a projection on the tangent space $T_{\xi_{k}}\mathcal{T}$ (see [12]). Moreover, the first order expansion of the perturbed functional $\mathcal{F}(\xi_{k}+\zeta)$ turns out to be $D\mathcal{F}(\xi_{k})\cdot\zeta=(\pi_{1}\zeta)(T)$ . Thus, we can conclude that equation (19) simply enforces a terminal condition on $\zeta_{k}$ , i.e., find the state component $z_{k}(\cdot)$ of $D\mathcal{P}(\xi_{k})\cdot\zeta_{k}\in T_{\xi_{k}}\mathcal{T}$ such that

$$x_{k}(T)-x_{T}+z_{k}(T)=0.\tag{20}$$

Note that, since the linear mapping $D\mathcal{G}(\xi_{k})$ is not invertible, the solution of (19) is not unique.

A finite-dimensional counterpart of equation (19) is a linear system of the form $Mz+n=0$. When $\ker M$ is nontrivial, the equation does not have a unique solution. A typical approach to overcome this problem is to consider the equivalent least-squares problem, which selects the minimum-norm solution of the linear system.
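A small numerical instance of this finite-dimensional analogue, with a matrix `M` and vector `n` made up here for illustration: the pseudoinverse picks the solution of $Mz+n=0$ that is orthogonal to the kernel of $M$, and any other solution is strictly longer.

```python
import numpy as np

# Underdetermined system M z + n = 0: two equations, three unknowns (made-up data).
M = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
n = np.array([-3.0, -2.0])

# Minimum-norm solution z* = -M^+ n, computed with the Moore-Penrose pseudoinverse.
z_min = -np.linalg.pinv(M) @ n

# Adding a kernel element gives another solution with a larger norm.
kernel_dir = np.array([2.0, -1.0, 1.0])   # satisfies M @ kernel_dir = 0
z_other = z_min + 0.5 * kernel_dir

print("residuals:", M @ z_min + n, M @ z_other + n)
print("norms:", np.linalg.norm(z_min), np.linalg.norm(z_other))
```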

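Motivated by this finite-dimensional observation, a reasonable choice is to select a $\zeta_{k}\in T_{\xi_{k}}\mathcal{T}$ satisfying condition (20) with minimum $L_{2}$ norm. It can be obtained by solving the following linear quadratic optimal state transfer problem

$$\begin{aligned}\zeta_{k}:=(z_{k}(\cdot),v_{k}(\cdot))=\arg\min_{(z(\cdot),v(\cdot))}\quad&\frac{1}{2}\int_{0}^{T}\|z(\tau)\|^{2}+\|v(\tau)\|^{2}\,\mathrm{d}\tau\\ \text{subj. to}\quad&\dot{z}=A(t)z+B(t)v\\ &z(0)=0,\quad z(T)=-x_{k}(T)+x_{T},\end{aligned}$$

where $A(\cdot)$ and $B(\cdot)$ result from the linearization of the dynamics $f$ around the current iterate $\xi_{k}$.

Pseudocode for the constrained projection operator $\mathcal{P}_{c}$ is given in the following table (Algorithm 2).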
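The Newton iteration behind $\mathcal{P}_{c}$ can be sketched in discrete time as follows. This is a rough illustration under several assumptions that are not in the paper: an explicit Euler discretization, a pendulum-like model `f` with unit parameters, a constant gain `K`, and a dense linear-algebra solve for the minimum-norm direction in place of the Riccati-based LQ solver of Section II-A. Each pass projects the current curve, measures the terminal mismatch, computes the minimum-norm tangent perturbation that cancels it to first order, and projects again.

```python
import numpy as np

h, N = 0.01, 200                      # Euler step and horizon length (T = N*h = 2)
K = np.array([[1.0, 1.0]])            # constant tracking gain (assumed, not tuned)

def f(x, u):
    """Illustrative pendulum-like model: theta'' = -sin(theta) + u."""
    return np.array([x[1], -np.sin(x[0]) + u[0]])

def project(alpha, mu):
    """Unconstrained projection P: track the curve (alpha, mu) in closed loop."""
    x, u = np.empty_like(alpha), np.empty_like(mu)
    x[0] = alpha[0]
    for k in range(N):
        u[k] = mu[k] + K @ (alpha[k] - x[k])
        x[k + 1] = x[k] + h * f(x[k], u[k])
    return x, u

def min_norm_direction(x, d):
    """Minimum-norm (z, v) with z[0] = 0, z[k+1] = Ad[k] z[k] + Bd v[k], z[N] = d."""
    Ad = [np.eye(2) + h * np.array([[0.0, 1.0], [-np.cos(x[k, 0]), 0.0]])
          for k in range(N)]
    Bd = h * np.array([0.0, 1.0])
    G = np.zeros((2 * N, N))          # maps inputs v[0..N-1] to stacked z[1..N]
    for j in range(N):
        z = Bd.copy()                 # response z[j+1] to a unit input v[j]
        for k in range(j + 1, N + 1):
            G[2 * (k - 1):2 * k, j] = z
            if k < N:
                z = Ad[k] @ z
    GN = G[-2:, :]                    # maps inputs to the terminal state z[N]
    M = np.eye(N) + G.T @ G           # cost 0.5*(|z|^2 + |v|^2) written in terms of v
    W = np.linalg.solve(M, GN.T)
    v = W @ np.linalg.solve(GN @ W, d)
    z = np.vstack([np.zeros(2), (G @ v).reshape(N, 2)])
    return z, v.reshape(N, 1)

def project_constrained(alpha, mu, tol=1e-10, max_iter=20):
    """Newton iteration on the terminal mismatch; returns a trajectory ending at alpha[-1]."""
    x_target = alpha[-1]
    x, u = project(alpha, mu)
    for _ in range(max_iter):
        d = x_target - x[-1]          # terminal mismatch to be cancelled
        if np.linalg.norm(d) < tol:
            break
        z, v = min_norm_direction(x, d)
        x, u = project(x + z, u + v)
    return x, u

# A curve that is not a trajectory: constant-velocity ramp with zero input.
t = h * np.arange(N + 1)
alpha = np.column_stack([0.3 * t, 0.3 * np.ones(N + 1)])
mu = np.zeros((N, 1))

x_p, _ = project(alpha, mu)
x_c, _ = project_constrained(alpha, mu)
err_unconstrained = np.linalg.norm(x_p[-1] - alpha[-1])
err_constrained = np.linalg.norm(x_c[-1] - alpha[-1])
print(err_unconstrained, err_constrained)
```

The dense matrix `G` keeps the sketch short but scales quadratically with the number of steps; the sweep-method solution of Section II-A avoids forming it.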

Remark IV.1.

The convergence of Algorithm 2 can be guaranteed by satisfying the hypotheses of the Newton-Kantorovich theorem (see, e.g., [18, 19]). $\square$

IV-B fsPRONTO Algorithm

We are ready to present the final-state constrained PRojection Operator Newton method for Trajectory Optimization (fsPRONTO), an iterative algorithm able to solve problem (1). The algorithm extends PRONTO, as outlined in Section III-B, by combining a particular descent direction with the final-state constrained projection operator presented in Section IV-A.

First, we search for a descent direction $\zeta_{i}\in T_{\xi_{i}}\mathcal{T}$ satisfying the final constraint to first-order by means of a linear-quadratic state transfer problem as in (5). Since each $\xi_{i}$ is already feasible, in order to maintain feasibility to first order, the perturbation $\zeta_{i}$ must satisfy the terminal constraint $z_{i}(T):=(\pi_{1}\zeta_{i})(T)=0$ . Second, we perform a backtracking line-search to modulate the descent direction. Finally, we perform the projection step by means of the constrained projection operator described by Algorithm 2.

The fsPRONTO algorithm is formally stated in the following table (Algorithm 3).

In the following, we have a closer look at some of the specific aspects of our newly presented Algorithm 3.

Remark IV.2.

Notice that step (21) consists of solving a linear quadratic optimal state transfer problem in the form

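$$\begin{aligned}\min_{\zeta=(z(\cdot),v(\cdot))}\quad&\int_{0}^{T}a(\tau)^{T}z(\tau)+b(\tau)^{T}v(\tau)+\frac{1}{2}\begin{bmatrix}z(\tau)\\ v(\tau)\end{bmatrix}^{T}\begin{bmatrix}Q(\tau)&S(\tau)\\ S(\tau)^{T}&R(\tau)\end{bmatrix}\begin{bmatrix}z(\tau)\\ v(\tau)\end{bmatrix}\mathrm{d}\tau\\ \text{subj. to}\quad&\dot{z}=A(t)z+B(t)v,\\ &z(0)=0,\quad z(T)=0,\end{aligned}$$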

as discussed in detail in Section II-A. Step (23) consists of computing the updated trajectory $\xi_{i+1}=(x_{i+1}(\cdot),u_{i+1}(\cdot))$ via Algorithm 2 with a (given) curve $\bar{\xi}=\xi_{i}+\gamma_{i}\zeta_{i}=(x_{i}(\cdot)+\gamma_{i}z_{i}(\cdot),u_{i}(\cdot)+\gamma_{i}v_{i}(\cdot))$ . $\square$

V Numerical Computations

In this section we provide numerical computations showing the effectiveness of the proposed nonlinear algorithm. We solve the optimal state transfer problem for a driven inverted pendulum. We consider the problem

[TABLE]

with $L=0.5$ m being the length of the pendulum and $g$ the gravitational acceleration. We set the time horizon to $T=20$ s. Moreover, $(x_{d}(\cdot),u_{d}(\cdot))$ is a (continuous) desired curve, $Q\in\mathbb{R}^{2\times 2}$ is a symmetric, positive-definite matrix and $R$ is a positive scalar.

Before testing the fsPRONTO algorithm, we highlight the applicability of the final-state constrained projection operator presented in Algorithm 2.

We consider a given curve $\xi$ which is not a feasible trajectory of the inverted pendulum. The projected state $x_{1}$ is depicted in Figure 2. Both projections $\mathcal{P}(\xi)$ (in magenta) and $\mathcal{P}_{c}(\xi)$ (in red) provide a trajectory close to the curve $\xi$ (in green). However, when closely checking the terminal state, one can see that only the trajectory projected under $\mathcal{P}_{c}(\xi)$ satisfies the terminal constraint.

Next, we apply fsPRONTO (Algorithm 3) to optimize the trajectory of the inverted pendulum. We use $Q=\operatorname{diag}(100,1)$ and $R=1$ as cost parameters. The higher penalty on the first component $x_{1}$ of the least-squares distance results in an optimal solution (solid red) which almost overlaps the first component of the desired curve (dash-dotted blue), as shown in Figure 1.
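The effect of the weights can be checked with a small sketch. It assumes a running cost of the form $\tfrac{1}{2}(x-x_{d})^{\top}Q(x-x_{d})+\tfrac{1}{2}R(u-u_{d})^{2}$, which is a common least-squares tracking cost but is an assumption here, and integrates it with the trapezoidal rule. The function name `tracking_cost` and the constant test errors are hypothetical.

```python
import numpy as np

Q = np.diag([100.0, 1.0])   # state weights from the text
R = 1.0                     # input weight from the text

def tracking_cost(t, x, u, x_d, u_d):
    """Trapezoidal approximation of the assumed quadratic tracking cost."""
    ex = x - x_d                                   # state error, shape (N, 2)
    eu = u - u_d                                   # input error, shape (N,)
    running = 0.5 * np.einsum("ni,ij,nj->n", ex, Q, ex) + 0.5 * R * eu ** 2
    return float(np.sum(0.5 * (running[1:] + running[:-1]) * np.diff(t)))

t = np.linspace(0.0, 20.0, 201)                    # horizon T = 20 s
x_d = np.zeros((t.size, 2))
u_d = np.zeros(t.size)

# Same constant error of 0.1 placed on x1 or on x2 (illustrative values).
err_x1 = np.column_stack([np.full(t.size, 0.1), np.zeros(t.size)])
err_x2 = np.column_stack([np.zeros(t.size), np.full(t.size, 0.1)])
cost_x1 = tracking_cost(t, err_x1, u_d, x_d, u_d)
cost_x2 = tracking_cost(t, err_x2, u_d, x_d, u_d)
```

Under these assumptions an error on $x_{1}$ costs one hundred times more than the same error on $x_{2}$, which is consistent with the optimizer tracking the first component more tightly.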

It is worth noting that, as expected, the algorithm guarantees recursive feasibility. In fact, the terminal error, highlighted in the inset, is zero at each iteration for both state components.

Figure 3 shows the descent at each iteration on a logarithmic scale. It gives a measure of the rate of convergence of the algorithm, which appears to be quadratic.
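One way to quantify an apparent convergence rate from such a sequence is to estimate the order from three consecutive values. The sketch below uses a synthetic, exactly quadratic sequence, not data from the paper, and the function name `estimated_orders` is hypothetical.

```python
import numpy as np

def estimated_orders(e):
    """Estimate the convergence order p from consecutive error-like values.

    Uses p_k = log(e[k+1] / e[k]) / log(e[k] / e[k-1]), which is valid when
    e[k+1] is approximately C * e[k]**p with a constant C.
    """
    e = np.asarray(e, dtype=float)
    return np.log(e[2:] / e[1:-1]) / np.log(e[1:-1] / e[:-2])

# Synthetic sequence with e[k+1] = e[k]**2 (illustrative only).
errors = [1e-1, 1e-2, 1e-4, 1e-8]
orders = estimated_orders(errors)
```

Estimates close to 2 indicate quadratic convergence, while estimates close to 1 indicate a linear rate.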

VI Conclusions

In this paper we have presented a new numerical approach for solving final-state constrained optimal control problems. The main advantage of the proposed method is that it guarantees recursive feasibility of both the dynamics and the final-state constraint at each iteration. Specifically, we have proposed a Newton method, inspired by the one introduced in [11], based on: (i) the design of a final-state constrained projection operator, which finds a trajectory satisfying the final constraint, and (ii) the computation of a descent direction satisfying the final constraint to first order.

Bibliography

[1] M. Diehl, H. G. Bock, H. Diedam, and P.-B. Wieber, “Fast direct multiple shooting algorithms for optimal robot control,” in Fast motions in biomechanics and robotics. Springer, 2006, pp. 65–93.
[2] D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd ed. Athena Scientific, 2005.
[3] A. E. Bryson and Y.-C. Ho, Applied Optimal Control - Optimization, Estimation, and Control. Hemisphere Publishing Corporation, 1975.
[4] D. E. Kirk, Optimal Control Theory. Prentice-Hall Inc., 1970.
[5] A. P. Sage, Optimum Systems Control. Prentice-Hall, 1968.
[6] L. S. Pontryagin, V. G. Boltyanskii, R. Gamkrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes. Wiley (NY), 1962.
[7] D. Liberzon, Calculus of Variations and Optimal Control Theory: a Concise Introduction. Princeton, NJ: Princeton University Press, 2012.
[8] A. Cervantes and L. T. Biegler, “Large-scale DAE optimization using a simultaneous NLP formulation,” AIChE Journal, vol. 44, no. 5, pp. 1038–1050, 1998.
