Timescale Separation in Autonomous Optimization

Adrian Hauswirth; Saverio Bolognani; Gabriela Hug; Florian Dörfler

arXiv:1905.06291·math.OC·October 8, 2020·IEEE Trans. Autom. Control.

TL;DR

This paper analyzes the stability of autonomous optimization controllers modeled after optimization algorithms, emphasizing the importance of timescale separation and providing stability bounds for various methods.

Contribution

It quantifies the necessary timescale separation for stable autonomous optimization and offers direct design prescriptions using singular perturbation analysis.

Findings

1. Derived stability bounds for gradient-based feedback laws

2. Identified robustness issues with certain optimization algorithms in autonomous settings

3. Provided guidelines for designing stable autonomous optimization controllers

Equations

$$\dot{x}=f(x,t),\qquad x(0)=x_{0},$$

$$T_{v}\mathcal{V}$$

$$N_{v}\mathcal{V}$$

$$\boldsymbol{\nabla}\Phi(v^{\star})+\nabla\xi(v^{\star})^{T}\lambda+\nabla\zeta(v^{\star})^{T}\mu=0$$

$$\dot{x}$$

$$\|x(t)-h(\hat{u})\|\leq K\|x_{0}-h(\hat{u})\|e^{-\tau t},$$

$$\alpha\|x-h(u)\|^{2}\leq W(x,u)$$

$$\dot{W}(x,u)$$

$$\|\nabla_{u}W(x,u)\|$$

$$\dot{u}=-Q(u)\nabla\tilde{\Phi}(u)^{T},\qquad u(0)=u_{0}$$

$$\underset{x,u}{\text{minimize}}\quad\Phi(x,u)\qquad\text{subject to}\quad x=h(u),$$

$$\underset{u}{\text{minimize}}\quad\tilde{\Phi}(u)$$

$$\dot{u}=-Q(u)\boldsymbol{\nabla}\tilde{\Phi}(u)=-Q(u)H(u)^{T}\boldsymbol{\nabla}\Phi(h(u),u),$$

$$H(u)^{T}:=\begin{bmatrix}\nabla h(u)^{T}&\mathbb{I}_{p}\end{bmatrix}.$$

$$\dot{x}$$

$$\dot{u}$$

$$0=\boldsymbol{\nabla}\Phi(x^{\star},u^{\star})+\begin{bmatrix}\mathbb{I}_{n}\\-\nabla h(u^{\star})^{T}\end{bmatrix}\lambda^{\star}.$$

$$\underset{y,u}{\text{minimize}}\quad\Phi_{\mathrm{io}}(y,u)\qquad\text{subject to}\quad y=h_{\mathrm{io}}(u),$$

$$\dot{u}=-Q(u)H_{\mathrm{io}}^{T}(u)\boldsymbol{\nabla}\Phi_{\mathrm{io}}(y,u)$$

$$H_{\mathrm{io}}^{T}(u)\boldsymbol{\nabla}\Phi_{\mathrm{io}}(g(x),u)$$

$$\left\|H(u)^{T}\left(\boldsymbol{\nabla}\Phi(x^{\prime},u)-\boldsymbol{\nabla}\Phi(x,u)\right)\right\|\leq L\|x^{\prime}-x\|$$

$$\sup_{u\in\mathbb{R}^{p}}\|Q(u)\|<\frac{\gamma}{\zeta L},$$

$$\min\{\Phi(x,u)\,|\,x=h(u)\}.$$

$$\Psi(x,u)=(1-\delta)\tilde{\Phi}(u)+\delta W(x,u),$$

$$\Lambda:=\begin{bmatrix}-(1-\delta)&\frac{1}{2}\left(\kappa L(1-\delta)+\kappa\zeta\delta\right)\\ \frac{1}{2}\left(\kappa L(1-\delta)+\kappa\zeta\delta\right)&-\gamma\delta\end{bmatrix}$$

$$\dot{\Psi}(x,u)=(1-\delta)\nabla\tilde{\Phi}(u)Q(u)g(x,u)+\delta\nabla_{x}W(x,u)f(x,u)+\delta\nabla_{u}W(x,u)Q(u)g(x,u),$$

$$\begin{aligned}\nabla\tilde{\Phi}(u)Q(u)g(x,u)&=\nabla\Phi(h(u),u)H(u)Q(u)g(x,u)\\&=\left(\nabla\Phi(h(u),u)-\nabla\Phi(x,u)\right)H(u)Q(u)g(x,u)+\nabla\Phi(x,u)H(u)Q(u)g(x,u)\\&\leq L\|x-h(u)\|\,\|Q^{\frac{1}{2}}(u)\|\,\|Q^{\frac{1}{2}}(u)g(x,u)\|+\nabla\Phi(x,u)H(u)Q(u)g(x,u)\\&\leq\kappa L\|x-h(u)\|\,\|g(x,u)\|_{Q(u)}-g(x,u)^{T}Q(u)g(x,u)\\&=\kappa L\|x-h(u)\|\,\|g(x,u)\|_{Q(u)}-\|g(x,u)\|_{Q(u)}^{2},\end{aligned}$$

$$\nabla_{u}W(x,u)Q(u)g(x,u)\leq\kappa\zeta\|x-h(u)\|\,\|g(x,u)\|_{Q(u)}.$$

Full text

Adrian Hauswirth, Saverio Bolognani, Gabriela Hug, and Florian Dörfler

The authors are with the Department of Information Technology and Electrical Engineering, ETH Zürich, 8092 Zürich, Switzerland. Email: {hadrian,bsaverio,ghug,dorfler}@ethz.ch. This work was supported by ETH Zürich funds, by the SNF AP Energy Grant #160573, and by the Swiss Federal Office of Energy grant #SI/501708 UNICORN. Manuscript received May 20, 2019.

Abstract

Autonomous optimization refers to the design of feedback controllers that steer a physical system to a steady state that solves a predefined, possibly constrained, optimization problem. As such, no exogenous control inputs such as setpoints or trajectories are required. Instead, these controllers are modeled after optimization algorithms that take the form of dynamical systems. The interconnection of this type of optimization dynamics with a physical system is however not guaranteed to be stable unless both dynamics act on sufficiently different timescales. In this paper, we quantify the required timescale separation and give prescriptions that can be directly used in the design of this type of feedback controllers. Using ideas from singular perturbation analysis, we derive stability bounds for different feedback laws that are based on common continuous-time optimization schemes. In particular, we consider gradient descent and its variations, including projected gradient, and Newton gradient. We further give stability bounds for momentum methods and saddle-point flows. Finally, we discuss how optimization algorithms like subgradient and accelerated gradient descent, while well-behaved in offline settings, are unsuitable for autonomous optimization due to their general lack of robustness.

Index Terms:

Optimization, Gradient methods, Closed-loop systems

I Introduction

Two of the first and foremost motivations for feedback control have traditionally been the stabilization of unstable dynamical systems and tracking of a reference signal in the presence of disturbances. Although prevalent control design methods often serve to accomplish both goals at the same time, the task of stabilization is generally associated with the design of a proportional controller, whereas tracking of a setpoint under constant disturbances usually requires the incorporation of an integral control component. These setpoints are, in turn, carefully designed, e.g., in a conventional setting via an offline (i.e., feedforward) optimization procedure.

Against this backdrop, we consider in this paper the concept of autonomous optimization (or feedback-based optimization), which aims at generalizing controllers beyond basic setpoint tracking. Instead, we consider the design of (integral) feedback controllers that steer a (stable) physical system to the solution of a general optimization problem without requiring an explicit solution in the form of an exogenous setpoint, hence being “autonomous”. This particular choice of words also refers to the fact that for most practical applications only time-invariant feedback controllers are of relevance.

A particular feature of feedback-based optimization is the variety of ways in which constraints that need to be satisfied at steady state can be incorporated. These constraints can either be saturation-like, in that they are satisfied at all times, or asymptotic, in the sense that they can be violated during the transient behavior but need to be satisfied in the limit. As the name suggests, saturation-like constraints are generally associated with physical saturation, e.g., due to limited actuation capabilities at the input, and constraints on outputs are often formulated as asymptotic constraints.

The concept of autonomous optimization is in marked contrast with optimal control frameworks such as dynamic programming or model predictive control, since transient optimality of trajectories is not the primary goal. Instead, one aims for controllers that achieve asymptotic optimality at low computational cost and with little model information.

The problem of steering the state (or output) of a physical system to an optimal steady state has been considered in different contexts and fields (see next section). However, many previous works start from a timescale separation assumption where the physical system exhibits fast-decaying dynamics that are ignored in the control design. This simplifies the problem since the physical system can be abstracted by algebraic constraints, i.e., its steady-state behavior.

In this paper, we quantify the required timescale separation for feedback-based optimization schemes that take the simple form illustrated in Fig. 1. Namely, we consider a physical system that is interconnected with optimization dynamics that are modeled after common optimization algorithms (e.g. gradient descent, momentum methods, or saddle-point flows) and apply ideas inspired by singular perturbation analysis to derive sufficient conditions for closed-loop stability.

Throughout, we assume that the physical system is stable (or stabilized by an appropriate fast controller). By doing so, we follow a paradigm of “first stabilize, then optimize” which is in contrast to other recent works that base their designs on integral quadratic constraints [1, 2], backstepping [3], or output regulation [4]. In particular, [1, 2] pursue a holistic perspective where stabilization and tracking are considered as joint objectives. These works, however, arrive at complex and convoluted LMI conditions to certify stability that are computationally expensive at large scales and often do not directly translate into a systematic design method.

In contrast, our results, although simple and potentially conservative, give immediate design prescriptions while requiring only limited model information, which can often be estimated in practice. They can be applied to large-scale systems without redesigning existing stabilizing controllers and have an intuitive interpretation in terms of the timescale separation required between slow optimization dynamics and fast underlying system behavior. Finally, the generality of our approach allows us to consider nonlinear plants as well as a plethora of optimization algorithms.

I-A Related Work

The problem of driving a physical system to an optimal steady state has a considerable history. Early precursors can be found in process control under the name of optimizing control [5] which has evolved into the modern notion of real-time optimization [6, 7, 8]. This line of work is, however, mostly concerned with reducing the effect of inaccurate steady-state models, rather than the interactions with fast dynamics.

Further, the concept of extremum-seeking [9, 10, 11] aims at learning a gradient direction without recourse to any model information by means of a probing signal and exploitation of non-commutativity, but significant limitations arise when considering high-dimensional systems or constraints.

The historic roots of the approach pursued in this paper can be traced back to the study of communication networks where congestion control algorithms have been analyzed from an optimization perspective [12, 13, 14]. Similar ideas have recently attracted a lot of interest in power systems, where feedback-based optimization schemes have been proposed for voltage control [15, 16], frequency control [17, 18, 19], or general power flow optimization [20, 21, 22, 23]. For a survey see [24].

I-B Contributions

In this paper, we extend and generalize the results in [25]. Namely, we consider nonlinear physical systems instead of linear time-invariant (LTI) plants and we study a variety of optimization dynamics other than mere gradient flows. In particular, we study a general class of variable-metric gradient descent algorithms, including special cases such as Newton descent. Furthermore, we consider the case of projected gradient descent which, in the feedback-optimization context, can be interpreted as a model for physical input saturation. We also develop a stability bound for momentum methods (such as the heavy ball method). Finally, we provide a general result that can be applied, for instance, to saddle-point algorithms that are commonly used in autonomous optimization to enforce asymptotic constraints (that can be transiently violated) on output variables.

For our analysis, we use ideas from singular perturbation analysis to construct classes of Lyapunov functions that can be used not only to certify stability but also to provide direct prescriptions for the feedback control synthesis.

Finally, through the non-examples of subgradient flows and accelerated gradient descent, we illustrate the sharpness of our analysis (in the sense that our assumptions cannot generally be avoided) and the fundamental limitations of the general framework of autonomous optimization.

I-C Organization

In Section II we fix the notation and recall basic results from nonlinear systems theory. Section III provides a comprehensive study of gradient-based feedback controllers, describes the main proof ideas, and explores specific examples and variations of gradient-based schemes. In Sections IV and V we consider momentum-based algorithms and general feedback optimization schemes, respectively. Finally, in Section VI we summarize our results and discuss open problems. In Appendix D we also provide an additional result specialized to LTI systems.

II Preliminaries

We consider the usual Euclidean setup for $\mathbb{R}^{n}$ where $\left\langle\cdot,\cdot\right\rangle$ denotes the canonical inner product and $\|\cdot\|$ the associated 2-norm. The non-negative real line is denoted by $\mathbb{R}_{+}$ . If $A\in\mathbb{R}^{n\times m}$ is a matrix, $\|A\|$ denotes the induced matrix norm, namely $\|A\|:=\sup_{\|v\|=1}\|Av\|$ . In particular, if $A$ is square and symmetric, then $\lambda^{\max}_{{A}}=\|A\|$ and $\lambda^{\min}_{{A}}$ denote the maximum and minimum eigenvalue of $A$ , respectively. If $A$ is positive definite, denoted by $A\in\mathbb{S}^{n}_{+}$ , we use the notation $\|x\|_{A}:=\sqrt{x^{T}Ax}$ for $x\in\mathbb{R}^{n}$ to denote the norm on $\mathbb{R}^{n}$ induced by $A$ . A map $A:\mathbb{R}^{n}\rightarrow\mathbb{S}_{+}^{n}$ is called a metric on the space $\mathbb{R}^{n}$ , in the sense that it defines a (variable) norm $\|v\|_{A(x)}$ at every point $x\in\mathbb{R}^{n}$ and vector $v\in\mathbb{R}^{n}$ .

Let $\mathcal{X}\subset\mathbb{R}^{n}$ be open and consider a map $f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}$ . Unless noted otherwise, differentiability is understood in the usual sense (of Fréchet). Namely, $\nabla f(x)$ denotes the $m\times n$ matrix of partial derivatives of $f(x)$ evaluated at $x\in\mathcal{X}$ . If $x^{\prime}$ is a subset of variables, then $\nabla_{x^{\prime}}f(x)$ denotes the Jacobian with respect to $x^{\prime}$ . The map $f$ is $L$ -Lipschitz continuous if $\|f(x)-f(y)\|\leq L\|x-y\|$ for all $x,y\in\mathcal{X}$ . If $m=1$ we call $\boldsymbol{\nabla}f(x):=\nabla f(x)^{T}$ the gradient of $f$ at $x$ . In this case, $f(x)$ is $\mu$ -strongly convex if $f(y)-f(x)-\nabla f(x)(y-x)\geq\frac{\mu}{2}\|y-x\|^{2}$ for all $x,y\in\mathbb{R}^{n}$ . In particular, if $f$ is twice continuously differentiable, $f$ is $\mu$ -strongly convex if and only if $\lambda^{\min}_{{\nabla^{2}f(x)}}\geq\mu$ for all $x\in\mathbb{R}^{n}$ .

Dynamical Systems

Given a vector field $f:\mathbb{R}^{n}\times\mathbb{R}\rightarrow\mathbb{R}^{n}$ , consider the initial value problem

$$\dot{x}=f(x,t),\qquad x(0)=x_{0},\tag{1}$$

where $x_{0}\in\mathbb{R}^{n}$ is an initial condition. A function $x:\mathbb{R}_{+}\rightarrow\mathbb{R}^{n}$ is called a complete solution to (1) if $x$ is continuously differentiable, $x(0)=x_{0}$ , and $\dot{x}(t)=f(x(t),t)$ holds for all $t\in\mathbb{R}_{+}$ . A set $\mathcal{S}\subset\mathbb{R}^{n}$ is invariant if all solutions with $x_{0}\in\mathcal{S}$ remain in $\mathcal{S}$ for all $t$ . Given a differentiable function $V:\mathbb{R}^{n}\rightarrow\mathbb{R}$ , we denote its Lie derivative along the vector field $f$ (which is usually clear from the context) by $\dot{V}(x):=\nabla V(x)f(x)$ . Stability and asymptotic stability are understood in the sense of Lyapunov. That is, a set $\mathcal{X}\subset\mathbb{R}^{n}$ is stable, if for every neighborhood $\mathcal{V}$ of $\mathcal{X}$ there exists another neighborhood $\mathcal{W}$ such that all trajectories starting in $\mathcal{W}$ remain in $\mathcal{V}$ .

Nonlinear Optimization

Given two continuously differentiable functions $\xi:\mathbb{R}^{n}\rightarrow\mathbb{R}^{s}$ and $\zeta:\mathbb{R}^{n}\rightarrow\mathbb{R}^{r}$, let $\mathcal{V}:=\{v\in\mathbb{R}^{n}\,|\,\xi(v)=0,\,\zeta(v)\leq 0\}$ and let $\mathbf{I}(v):=\{j\,|\,\zeta_{j}(v)=0\}$ denote the set of active inequality constraints at $v$. We call $\mathcal{V}$ a regular set if for all $v\in\mathcal{V}$ the matrix $\begin{bmatrix}\nabla\xi(v)^{T}&\nabla\zeta_{\mathbf{I}(v)}(v)^{T}\end{bmatrix}^{T}$ has full row rank $s+|\mathbf{I}(v)|$. (Footnote 1: The term regular alludes to the fact that these sets are in fact Clarke regular (or tangentially regular) [26]. Furthermore, the requirement that $\operatorname{rank}\begin{bmatrix}\nabla\xi(v)^{T}&\nabla\zeta_{\mathbf{I}(v)}(v)^{T}\end{bmatrix}^{T}=s+|\mathbf{I}(v)|$ is known in the optimization literature as the linear independence constraint qualification (LICQ).) The tangent and normal cones of $\mathcal{V}$ at $v$ are, respectively,

[TABLE]

Namely, $T_{v}\mathcal{V}$ and $N_{v}\mathcal{V}$ are both closed convex cones, and they are polar cones to each other. For an optimization problem $\min\{\Phi(v)\,|\,v\in\mathcal{V}\}$ where $\Phi:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is continuously differentiable, a point $v^{\star}$ is critical if it satisfies the first-order optimality conditions (KKT conditions), namely, $v^{\star}\in\mathcal{V}$ and $-\boldsymbol{\nabla}\Phi(v^{\star})\in N_{v^{\star}}\mathcal{V}$. This is equivalent to the existence of $\lambda\in\mathbb{R}^{s}$ and $\mu\in\mathbb{R}^{r}_{+}$ such that

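$$\boldsymbol{\nabla}\Phi(v^{\star})+\nabla\xi(v^{\star})^{T}\lambda+\nabla\zeta(v^{\star})^{T}\mu=0\tag{2}$$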

and $\mu_{i}\zeta_{i}(v^{\star})=0$ for all $i=1,\ldots,r$. A point $v^{\star}$ is a local minimizer if for all $v\in\mathcal{V}$ in a neighborhood of $v^{\star}$ it holds that $\Phi(v^{\star})\leq\Phi(v)$. A local minimizer is strict if $\Phi(v^{\star})<\Phi(v)$ holds for all $v\neq v^{\star}$ in that neighborhood.

II-A Nonlinear Plant Dynamics

Throughout, we consider physical plants modeled as

$$\dot{x}=f(x,u),\qquad x(0)=x_{0},\tag{3}$$

where $x\in\mathbb{R}^{n}$ is the system state, $u:\mathbb{R}_{+}\rightarrow\mathbb{R}^{p}$ is a measurable control input, $x_{0}\in\mathbb{R}^{n}$ is an initial condition, and $f:\mathbb{R}^{n}\times\mathbb{R}^{p}\rightarrow\mathbb{R}^{n}$ is a locally Lipschitz continuous vector field. Hence, the existence of a local solution $x:[0,T)\rightarrow\mathbb{R}^{n}$ for some $T>0$ and any initial condition $x_{0}$ is guaranteed.

Assumption II.1.

The function $f$ in (3) is continuously differentiable, $\ell_{x}$-Lipschitz in $x$, and $\ell_{u}$-Lipschitz in $u$. There exists a differentiable, $\ell$-Lipschitz continuous map $h:\mathbb{R}^{p}\rightarrow\mathbb{R}^{n}$ such that $f(h(u),u)=0$ for all $u\in\mathbb{R}^{p}$. Finally, there exist $\tau,K>0$ such that for every initial condition $x_{0}\in\mathbb{R}^{n}$ and every constant $\hat{u}\in\mathbb{R}^{p}$ it holds that

$$\|x(t)-h(\hat{u})\|\leq K\|x_{0}-h(\hat{u})\|e^{-\tau t},$$

where $x(t)$ is a solution to (3) with $x(0)=x_{0}$ and $u(t)\equiv\hat{u}$ .

The existence of a well-defined steady-state map can for instance be guaranteed if $f$ is continuously differentiable and $\nabla_{x}f(x,u)$ is invertible for all $x\in\mathbb{R}^{n}$ and $u\in\mathbb{R}^{p}$. In this case the implicit function theorem guarantees the existence of $h:\mathbb{R}^{p}\rightarrow\mathbb{R}^{n}$ such that $f(h(u),u)=0$ for all $u\in\mathbb{R}^{p}$. Lipschitz continuity of $h$ is guaranteed if $f$ is Lipschitz continuous and all eigenvalues of $\nabla_{x}f(x,u)$ are bounded away from 0 with some minimal distance for all $(x,u)$. Note that Assumption II.1 implies that trajectories are complete, i.e., can be extended to $t\rightarrow\infty$.

*Remark 1**.*

For simplicity, we assume that $x$ and $u$ can take any value in $\mathbb{R}^{n}$ and $\mathbb{R}^{p}$ , respectively. However, if some subsets $\mathcal{X}\subset\mathbb{R}^{n}$ and $\mathcal{U}\subset\mathbb{R}^{p}$ are known to be invariant under given dynamics, Assumption II.1 can be weakened because it needs to be satisfied only on $\mathcal{X}$ and $\mathcal{U}$ . In Section III-C4 we illustrate this possibility for the example of projected gradient flows. $\blacksquare$

Assumption II.1 requires (3) to be exponentially stable with decay rate $\tau$. This, in turn, implies the existence of a Lyapunov function, as indicated by the following result.

Proposition II.1.

Let Assumption II.1 hold. Then, for any fixed $u\in\mathbb{R}^{p}$ there exists a Lyapunov function $W:\mathbb{R}^{n}\times\mathbb{R}^{p}\rightarrow\mathbb{R}$ for the system (3) and parameters $\alpha,\beta,\gamma,\zeta>0$ such that

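$$\alpha\|x-h(u)\|^{2}\leq W(x,u)\leq\beta\|x-h(u)\|^{2},\qquad\dot{W}(x,u)\leq-\gamma\|x-h(u)\|^{2},\qquad\|\nabla_{u}W(x,u)\|\leq\zeta\|x-h(u)\|.$$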

Proposition II.1 is a condensation of a standard converse Lyapunov theorem for exponentially stable systems [27, Th. 4.14]. Only the definition of $\zeta$ (which captures a Lipschitz-type property of $W$ with respect to $u$) is non-standard. A proof can be found in Appendix B.
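For a linear plant, such constants can be computed explicitly. The following is a minimal sketch, not taken from the paper, that assumes an LTI plant $\dot{x}=Ax+Bu$ with Hurwitz $A$, the quadratic candidate $W(x,u)=(x-h(u))^{T}P(x-h(u))$ with $A^{T}P+PA=-\mathbb{I}$, and that $\gamma$ and $\zeta$ enter through $\dot{W}\leq-\gamma\|x-h(u)\|^{2}$ and $\|\nabla_{u}W\|\leq\zeta\|x-h(u)\|$. The plant matrices and the helper name `lyapunov_constants` are hypothetical.

```python
import numpy as np

def lyapunov_constants(A, B):
    """Return (P, gamma, zeta) for x' = A x + B u with W = e^T P e, e = x - h(u)."""
    n = A.shape[0]
    I = np.eye(n)
    # Solve A^T P + P A = -I by vectorization (Kronecker-sum linear system).
    K = np.kron(I, A.T) + np.kron(A.T, I)
    P = np.linalg.solve(K, -I.reshape(-1)).reshape(n, n)
    P = 0.5 * (P + P.T)                      # remove round-off asymmetry
    grad_h = -np.linalg.solve(A, B)          # Jacobian of h(u) = -A^{-1} B u
    gamma = 1.0                              # dW/dt = -||e||^2 for constant u
    # grad_u W = -2 e^T P grad_h, hence ||grad_u W|| <= 2 ||P|| ||grad_h|| ||e||.
    zeta = 2.0 * np.linalg.norm(P, 2) * np.linalg.norm(grad_h, 2)
    return P, gamma, zeta

# Assumed toy plant: two cascaded first-order lags driven by a scalar input.
A = np.array([[-1.0, 1.0], [0.0, -1.0]])
B = np.array([[0.0], [1.0]])
P, gamma, zeta = lyapunov_constants(A, B)
print(P, gamma, zeta)
```

Other choices of the right-hand side in the Lyapunov equation give different, possibly less conservative, ratios $\gamma/\zeta$; the identity is used here only for simplicity.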

II-B Variable-Metric Gradient Flows

A gradient flow is a dynamical system on $\mathbb{R}^{p}$ defined as

$$\dot{u}=-Q(u)\nabla\tilde{\Phi}(u)^{T},\qquad u(0)=u_{0}\tag{4}$$

for some initial condition $u_{0}\in\mathbb{R}^{p}$ where $\tilde{\Phi}:\mathbb{R}^{p}\rightarrow\mathbb{R}$ is continuously differentiable with locally Lipschitz gradient, and $Q(u)$ is a locally Lipschitz continuous metric on $\mathbb{R}^{p}$, i.e., a map from $\mathbb{R}^{p}$ to $\mathbb{S}^{p}_{+}$. Local Lipschitz continuity of $\nabla\tilde{\Phi}(u)$ and $Q(u)$ guarantees the existence and uniqueness of local solution trajectories of (4) for any initial condition.

Although gradient flows are one of the most basic optimization dynamics, generally, one can only conclude the following:

Theorem II.2.

If $\tilde{\Phi}(u)$ has compact level sets, all trajectories of (4) are complete and converge to the set $\{u\,|\,\nabla\tilde{\Phi}(u)=0\}$ .

Theorem II.2 follows from the Invariance Principle [28, Prop 5.22]. The fact that trajectories are complete follows from the fact that level sets of $\tilde{\Phi}$ are compact and invariant.

The use of a variable metric generalizes the class of gradient flows to include, for instance, Newton gradient flows; see Section III-C. It modifies the solution trajectories, but does not change the qualitative convergence behavior.
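As a small numerical illustration of this point (a sketch, not part of the paper), the snippet below integrates the flow (4) with forward Euler for an assumed quadratic cost $\tilde{\Phi}(u)=\frac{1}{2}u^{T}Pu$, once with the identity metric and once with the constant Newton-type metric $Q=P^{-1}$. The cost, step size, horizon, and the helper name `gradient_flow` are illustrative assumptions.

```python
import numpy as np

def gradient_flow(grad, metric, u0, dt=1e-2, steps=2000):
    """Forward-Euler discretization of u' = -Q(u) grad(u)."""
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        u = u - dt * metric(u) @ grad(u)
    return u

# Assumed ill-conditioned quadratic cost 0.5 * u^T P u with minimizer u = 0.
P = np.array([[10.0, 0.0], [0.0, 1.0]])
grad = lambda u: P @ u

u_plain = gradient_flow(grad, lambda u: np.eye(2), [1.0, 1.0])
u_newton = gradient_flow(grad, lambda u: np.linalg.inv(P), [1.0, 1.0])
print(np.linalg.norm(u_plain), np.linalg.norm(u_newton))
```

Both runs approach the same minimizer; the metric only changes the path and the rate at which the individual coordinates decay.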

In general, and even if $Q(u)=\mathbb{I}_{p}$, it is not possible to conclude that trajectories converge to minimizers of $\tilde{\Phi}(u)$ [29]. One option is to assume convexity of $\tilde{\Phi}$, in which case convergence to the set of global minimizers follows immediately.

Without convexity it is still possible to identify minimizers based on their stability properties as dynamic equilibria.

Theorem II.3.

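[29] For a critical point of (4) the following relations hold:

$\mathrm{ASE}\Rightarrow\mathrm{SLM}\Rightarrow\mathrm{LM}$, $\mathrm{SLM}\Rightarrow\mathrm{SE}$, and $\mathrm{ASE}\Rightarrow\mathrm{SE}$,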

where (S)LM stands for (strict) local minimizer and (A)SE for (asymptotically) stable equilibrium.

In particular, a local minimizer of $\tilde{\Phi}$ is not necessarily a stable equilibrium and vice versa. A common remedy to avoid this kind of pathological behavior is to require the objective function (and the metric) to be real analytic [29].

Nevertheless, it is an important observation that, from the dynamical systems point-of-view, asymptotic stability of an equilibrium replaces the need for second-order optimality conditions such as positive definiteness of the Hessian of $\tilde{\Phi}(u)$ .

III Gradient-Based Feedback Controllers

We now show how gradient flows lend themselves to designing nonlinear feedback controllers that can steer physical systems to an optimal steady state. In particular, we derive a basic requirement for stability of the feedback interconnection with a physical plant. Finally, we discuss our results (and their limitations) in the context of three special classes of gradient-type controllers.

III-A Gradient-Based Feedback Control

As a starting point for our control design, we consider the optimization problem

$$\underset{x,u}{\text{minimize}}\quad\Phi(x,u)\qquad\text{subject to}\quad x=h(u),\tag{5}$$

where $h(u)$ is the steady-state map of a plant satisfying Assumption II.1 and $\Phi(x,u)$ is a differentiable cost function depending on the system state and the control input.

By substituting $x$ with $h(u)$ in the objective function, we arrive at the unconstrained optimization problem

$$\underset{u}{\text{minimize}}\quad\tilde{\Phi}(u)\tag{6}$$

where $\tilde{\Phi}(u):=\Phi(h(u),u)$ . Adopting singular perturbation terminology, we call (6) the reduced problem since it assumes that the physical system is at steady state.

Based on (6), we can formulate a gradient flow of $\tilde{\Phi}(u)$ as

$$\dot{u}=-Q(u)\boldsymbol{\nabla}\tilde{\Phi}(u)=-Q(u)H(u)^{T}\boldsymbol{\nabla}\Phi(h(u),u),\tag{7}$$

where $Q:\mathbb{R}^{p}\rightarrow\mathbb{S}^{p}_{+}$ is a Lipschitz continuous metric and where we have applied the chain rule and defined

$$H(u)^{T}:=\begin{bmatrix}\nabla h(u)^{T}&\mathbb{I}_{p}\end{bmatrix}.$$

A feedback controller can be obtained from (7) by replacing $h(u)$ in the evaluation of $\nabla\Phi(h(u),u)$ by the measured value of $x$ . The interconnection is hence defined by

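$$\begin{aligned}\dot{x}&=f(x,u)&&\text{(8a)}\\ \dot{u}&=-Q(u)H(u)^{T}\boldsymbol{\nabla}\Phi(x,u).&&\text{(8b)}\end{aligned}$$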

Existence and uniqueness of local solutions of (8) are guaranteed for any initial condition $(x_{0},u_{0})$ , since $f,Q,h$ , and $\nabla\Phi$ are locally Lipschitz continuous by assumption. Completeness of solutions will be shown jointly with stability. Independently, equilibria of (8) always coincide with the critical points of (5):

Proposition III.1.

Every minimizer $(x^{\star},u^{\star})$ of (5) is an equilibrium point of (8). Conversely, every equilibrium point of (8) is a critical point of (5).

Proof.

First, note that $\operatorname{gph}h:=\{(x,u)\,|\,x=h(u)\}$ is a regular set since $\operatorname{rank}\begin{bmatrix}\mathbb{I}_{n}&-\nabla h(u)\end{bmatrix}=n$ and hence first-order optimality conditions are applicable. Given an optimizer $(x^{\star},u^{\star})$ , we have $x^{\star}=h(u^{\star})$ and therefore $f(h(u^{\star}),u^{\star})=0$ . Further, there exists $\lambda^{\star}$ such that (2) holds, more specifically

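$$0=\boldsymbol{\nabla}\Phi(x^{\star},u^{\star})+\begin{bmatrix}\mathbb{I}_{n}\\-\nabla h(u^{\star})^{T}\end{bmatrix}\lambda^{\star}.$$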

Note that $\begin{bmatrix}\mathbb{I}_{n}&-\nabla h(u^{\star})\end{bmatrix}H(u^{\star})=0$ , and therefore (2) implies that $H(u^{\star})^{T}\boldsymbol{\nabla}\Phi(x^{\star},u^{\star})=0$ . It follows that $(x^{\star},u^{\star})$ is an equilibrium of (8). Conversely, let $(x^{\star},u^{\star})$ be an equilibrium and therefore $x^{\star}=h(u^{\star})$ and $\boldsymbol{\nabla}\Phi(x^{\star},u^{\star})\in\ker H(u^{\star})^{T}=\operatorname{im}H(u^{\star})^{\perp}$ . However, $\operatorname{im}H(u^{\star})^{\perp}$ is spanned by $\begin{bmatrix}\mathbb{I}_{n}&-\nabla h(u^{\star})\end{bmatrix}^{T}$ , and therefore (2) holds. ∎
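The statement of Proposition III.1 can be checked numerically on a toy example. The sketch below is an illustration under assumed data, not the paper's experiment: a scalar plant $\dot{x}=-2x+u$ (so $h(u)=u/2$ and $H(u)^{T}=\begin{bmatrix}\frac{1}{2}&1\end{bmatrix}$), the cost $\Phi(x,u)=\frac{1}{2}(x-1)^{2}+\frac{1}{2}u^{2}$, and a constant metric $Q=\epsilon$. The function name `simulate_closed_loop`, the step size, and the horizon are hypothetical choices.

```python
def simulate_closed_loop(eps, x0=0.0, u0=0.0, dt=1e-3, steps=40000):
    """Forward-Euler simulation of the interconnection (8) for a scalar toy plant."""
    x, u = x0, u0
    for _ in range(steps):
        # Plant: x' = -2 x + u, with steady-state map h(u) = u / 2.
        dx = -2.0 * x + u
        # Controller: u' = -eps * H^T grad Phi(x, u) with H^T = [0.5, 1] and
        # grad Phi = (x - 1, u); the measured x replaces h(u).
        du = -eps * (0.5 * (x - 1.0) + u)
        x, u = x + dt * dx, u + dt * du
    return x, u

x_final, u_final = simulate_closed_loop(eps=0.5)
# The reduced cost 0.5 * (u / 2 - 1)^2 + 0.5 * u^2 is minimized at u = 0.4,
# which corresponds to the steady state x = h(0.4) = 0.2.
print(x_final, u_final)
```

For this example the closed-loop equilibrium and the minimizer of the reduced problem coincide, as the proposition predicts.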

*Remark 2**.*

The feedback law (8b) does not need to be implemented as a state-feedback controller. Assume that only output measurements $y=g(x)$ are available, where $g:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}$ is continuously differentiable. This gives rise to a differentiable input-output steady-state map $h_{\mathrm{io}}(u):=g(h(u))$ . Further, instead of (5), consider the problem

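$$\underset{y,u}{\text{minimize}}\quad\Phi_{\mathrm{io}}(y,u)\qquad\text{subject to}\quad y=h_{\mathrm{io}}(u),$$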

where $\Phi_{\mathrm{io}}$ is a cost function only depending on the system output and the control input.

Then, by substituting $y$ with $h_{\mathrm{io}}(u)$ in the objective function as before and computing the gradient of the reduced cost function, one arrives at the output-feedback law

$$\dot{u}=-Q(u)H_{\mathrm{io}}^{T}(u)\boldsymbol{\nabla}\Phi_{\mathrm{io}}(y,u)$$

where $H_{\mathrm{io}}^{T}(u):=\begin{bmatrix}\nabla h_{\mathrm{io}}^{T}(u)&\mathbb{I}_{p}\end{bmatrix}$ .

If $g(x)=Cx+d$ is an affine map, this feedback controller is equivalent to (8). To see this, note that $\nabla h_{\mathrm{io}}(u)^{T}=\nabla h(u)^{T}\nabla g(h(u))^{T}$ with $\nabla g(h(u))^{T}=C^{T}$ and therefore

[TABLE]

where $\Phi(x,u):=\Phi_{\mathrm{io}}(g(x),u)$ .

Consequently, although (8b) is formulated in terms of the state $x$ , it is not necessarily a state feedback controller, and can be implemented as an output feedback law. The formulation in terms of the internal state is nevertheless important for the forthcoming stability analysis. $\blacksquare$

III-B Stability Analysis

Even though (3) and (7) are individually asymptotically stable by Assumption II.1 and Theorem II.2, respectively, the interconnection (8) is not guaranteed to be stable. However, under the following mild assumption we can derive conditions for the asymptotic stability of (8).

Assumption III.1.

For the objective function $\Phi(x,u)$ and the steady-state map $h(u)$ in (5) there exists $L>0$ such that

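$$\left\|H(u)^{T}\left(\boldsymbol{\nabla}\Phi(x^{\prime},u)-\boldsymbol{\nabla}\Phi(x,u)\right)\right\|\leq L\|x^{\prime}-x\|\tag{9}$$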

for all $x^{\prime},x\in\mathbb{R}^{n}$ and all $u\in\mathbb{R}^{p}$ .

*Remark 3**.*

Assumption III.1 is a weakened Lipschitz condition. It is for instance satisfied if $\nabla\Phi$ is $\tilde{L}$ -Lipschitz continuous, in which case $L$ can be chosen as $L:=\ell\tilde{L}$ where $\ell$ is the Lipschitz constant of $H(u)$ (which exists by Assumption II.1). However, in practice a tighter bound can often be established by exploiting the structure of $H(u)$ and $\Phi(x,u)$ . $\blacksquare$

Our first main result establishes a sufficient condition for the asymptotic stability of (8) where we consider the metric $Q(u)$ as a design parameter. In particular, the bound illustrates the trade-off between the decay properties of the fast physical system and the gain of the slow optimization dynamics. This behavior will also be illustrated in Section III-C with the help of numerical examples.

Theorem III.2.

Consider (8) and let Assumptions II.1 and III.1 hold. If $\Phi(h(u),u)$ has compact level sets, then all trajectories of (8) are complete and converge to the set of first-order optimal points of (5) whenever

$$\sup_{u\in\mathbb{R}^{p}}\|Q(u)\|<\frac{\gamma}{\zeta L},\tag{10}$$

where $L>0$ is a constant satisfying (9). Furthermore, $\gamma$ and $\zeta$ are constants associated with a Lyapunov function $W(x,h(u))$ for (3) according to Proposition II.1. Finally, asymptotically stable equilibrium points of (8) are strict local minimizers of (5), and strict local minimizers are stable equilibria.

In many practical applications the right-hand side of (10) can be estimated. The parameter $L$ can be derived from model information (see Remark 3) and the parameters $\gamma$ and $\zeta$ can often be estimated from measurements of the decay rate of the open-loop system without explicitly formulating a Lyapunov function [28, Thm 5.17].

If $Q(u)=\epsilon Q$ where $Q\succ 0$ is constant, the bound (10) expresses a design condition on the global control gain $\epsilon>0$ .

Corollary III.3.

Consider the same setup as in Theorem III.2 and assume $Q\equiv\epsilon\mathbb{I}_{p}$. Then, for all $\epsilon<\epsilon^{\star}:=\frac{\gamma}{\zeta L}$ the system (8) is asymptotically stable.
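To see why some limit on the gain $\epsilon$ is unavoidable, consider the following sketch, which is an assumed toy example and not taken from the paper. The plant consists of two cascaded first-order lags with $h(u)=(u,u)$, the cost is $\Phi(x,u)=\frac{1}{2}(x_{1}-r)^{2}$, and the controller reduces to $\dot{u}=-\epsilon(x_{1}-r)$. The closed-loop characteristic polynomial is $s(s+1)^{2}+\epsilon$, so by the Routh-Hurwitz criterion this particular loop is stable exactly for $0<\epsilon<2$. That threshold is specific to the example and is not the bound of the corollary, which is only sufficient. The helper names are hypothetical.

```python
import numpy as np

def closed_loop_matrix(eps):
    """State matrix of (x1, x2, u) for the toy plant with u' = -eps * (x1 - r)."""
    # Plant x' = A x + B u: two cascaded first-order lags.
    A = np.array([[-1.0, 1.0], [0.0, -1.0]])
    B = np.array([[0.0], [1.0]])
    # H^T grad Phi(x, u) = x1 - r; the constant r does not affect the eigenvalues.
    grad_row = np.array([[1.0, 0.0]])
    top = np.hstack([A, B])
    bottom = np.hstack([-eps * grad_row, np.zeros((1, 1))])
    return np.vstack([top, bottom])

def max_real_eig(eps):
    """Largest real part among the closed-loop eigenvalues."""
    return float(np.max(np.linalg.eigvals(closed_loop_matrix(eps)).real))

slow_gain = max_real_eig(0.5)   # optimization clearly slower than the plant
fast_gain = max_real_eig(5.0)   # optimization competes with the plant dynamics
print(slow_gain, fast_gain)
```

With the small gain all eigenvalues lie in the open left half-plane, whereas the large gain pushes a complex pair into the right half-plane even though plant and gradient flow are each stable on their own.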

*Remark 4**.*

If the integrator of the controller is grouped together with the plant in order to make the feedback law purely proportional, then $\frac{\zeta}{\gamma}$ is an estimate of the input-to-state (ISS) gain of the augmented plant and $\sup_{u\in\mathbb{R}^{p}}\|Q(u)\|\cdot L$ is the ISS gain of the proportional feedback law. Hence, the condition (10) can also be interpreted as a small gain result: The product of the two gains has to be less than unity. $\blacksquare$

It is immediate that under the additional assumption of convexity the following stronger conclusion can be drawn.

Corollary III.4.

Consider the same setup as in Theorem III.2, and assume that $\Phi$ is convex and $h(u)$ is linear. Then, if (10) holds, all trajectories converge to the global minimizers of

[TABLE]

Proof of Theorem III.2

Our proof is structured similarly to the one in [25] and is inspired by ideas from singular perturbation analysis [30, 27]. Namely, we work towards an application of the LaSalle invariance principle. For this, we consider a LaSalle function of the form

[TABLE]

where $0<\delta<1$ is a convex combination coefficient. In this context, note that $\tilde{\Phi}(u):=\Phi(h(u),u)$ and that $W(x,u)$ is essentially of the form $W(x,u):=V(x-h(u),u)$ (see Appendix B) where $x-h(u)$ is referred to as boundary-layer error coordinates in singular perturbation terminology and measures the deviation from the steady state.

First, we establish the requirement for $\Psi$ to be non-increasing along the trajectories of (8). We then show that the level sets of $\Psi$ are compact (and hence invariant) and therefore the invariance principle is applicable. Finally, we prove the connection between stability and optimality of equilibria.

Asymptotic Convergence

The following key lemma establishes an upper bound on the Lie derivative of $\Psi$ .

Lemma III.1.

If for some $\delta\in(0,1)$ , the 2-by-2-matrix

[TABLE]

is negative definite, then $\dot{\Psi}(x(t),u(t))\leq 0$ .

Furthermore, if $\Lambda$ is negative definite, then $\dot{\Psi}(x^{\star},u^{\star})=0$ implies that $x^{\star}=h(u^{\star})$ and $H(u^{\star})^{T}\boldsymbol{\nabla}\Phi(x^{\star},u^{\star})=0$ .

Proof.

The Lie derivative of $\Psi(x,u)$ along (8) is

[TABLE]

where $g(x,u):=-H(u)^{T}\boldsymbol{\nabla}\Phi(x,u)$ . Each of the terms in (12) can be bounded.

Namely, for the first term we can do a rearrangement, apply Cauchy-Schwarz and Assumption III.1 (first inequality below) and use the definition of $\|\cdot\|_{Q(u)}$ to write

[TABLE]

where $Q^{\frac{1}{2}}(u)$ is the unique positive definite square root of $Q(u)\in\mathbb{S}^{n}_{+}$ and $\kappa:=\sup_{u\in\mathbb{R}^{p}}\|Q^{\frac{1}{2}}(u)\|$ .

According to Proposition II.1, we have for the second term in (12) that $\nabla_{x}W(x,u)f(x,u)\leq-\gamma\|x-h(u)\|^{2}$ . Furthermore, for the third term we can apply Cauchy-Schwarz and the definition of $\|\cdot\|_{Q(u)}$ as in (13) to arrive at

[TABLE]

Therefore the Lie derivative of $\Psi$ is bounded by a quadratic function that can be rewritten in matricial form as

[TABLE]

where $\Lambda$ is given by (11). Clearly, if $\Lambda\prec 0$ , then $\dot{\Psi}(t)\leq 0$ .

Finally, we note that if $\Lambda\prec 0$ , then $\dot{\Psi}(x^{\star},u^{\star})=0$ holds only if $\|x^{\star}-h(u^{\star})\|=0$ and $\|g(x^{\star},u^{\star})\|=0$ . Hence the point $(x^{\star},u^{\star})$ is an equilibrium of (8), and satisfies the first-order optimality conditions of (5) by Proposition III.1. This completes the proof of Lemma III.1. ∎

In order to choose an appropriate $\delta$ that guarantees $\Lambda\prec 0$ and therefore $\dot{\Psi}(t)\leq 0$ , we use Lemma A.1 in the appendix. Namely, by setting $\alpha_{1}=1$ , $\alpha_{2}=\gamma$ , $\xi=0$ , $\beta_{1}=\kappa L$ and $\beta_{2}=\kappa\zeta$ , we conclude that $\Lambda\prec 0$ whenever we choose

[TABLE]

thus recovering the bound (10) in Theorem III.2.
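For completeness, the substitution into Lemma A.1 can be written out explicitly. The last step assumes that $\|\cdot\|$ denotes the induced 2-norm, for which $\|Q^{\frac{1}{2}}(u)\|^{2}=\|Q(u)\|$ holds for symmetric positive semidefinite $Q(u)$.

```latex
\frac{\alpha_{1}\alpha_{2}}{\alpha_{1}\xi+\beta_{1}\beta_{2}}
  = \frac{\gamma}{\kappa^{2}\zeta L} > 1
\quad\Longleftrightarrow\quad
\kappa^{2} = \sup_{u\in\mathbb{R}^{p}}\|Q(u)\| < \frac{\gamma}{\zeta L},
\qquad
\delta = \frac{\beta_{1}}{\beta_{1}+\beta_{2}} = \frac{L}{L+\zeta}.
```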

Finally, we apply Lemma A.2 to find that the sublevel sets of $\Psi$ are compact and therefore invariant. Consequently, all the requirements of the invariance principle are satisfied, and we conclude that all trajectories converge to the closure of the largest invariant subset for which $\dot{\Psi}=0$ . This, in turn, coincides with the set of critical points of (5).

Relation between Stability and Optimality

The fact that asymptotically stable equilibria are strict local minimizers has been shown in [25] for LTI plants and the standard metric. The proof extends to the present case without major modifications.

To show that strict local minimizers of (5) are stable, let $\mathcal{V}$ be any compact neighborhood of $(x^{\star},u^{\star})$ in which $u^{\star}$ is a strict minimizer of $\tilde{\Phi}(u)$ . We construct a neighborhood $\mathcal{W}\subset\mathbb{R}^{n}\times\mathbb{R}^{p}$ of $(x^{\star},u^{\star})$ such that every trajectory starting in $\mathcal{W}$ remains in $\mathcal{V}$ , thus proving stability.

Hence, consider the LaSalle function $\Psi$ in the previous section, and let $\alpha$ be such that $\Psi(x^{\star},u^{\star})<\alpha<\min_{(x,u)\in\partial\mathcal{V}}\Psi(x,u)$ where $\partial\mathcal{V}$ denotes the boundary of $\mathcal{V}$ . Define $\mathcal{W}:=\{(x,u)\in\mathbb{R}^{n}\times\mathbb{R}^{p}\,|\,\Psi(x,u)\leq\alpha\}\subset\mathcal{V}$ which has a non-empty interior because $\Psi(x^{\star},u^{\star})<\alpha$ . Furthermore, as a sublevel set of $\Psi$ , the set $\mathcal{W}$ is invariant since $\dot{\Psi}(x,u)\leq 0$ (with the proper choice of $\delta$ according to Lemma III.1). This establishes stability of $(x^{\star},u^{\star})$ .

III-C Examples of Gradient-Based Controllers

In the following we discuss three algorithms that, broadly speaking, can be considered variations or extensions of the basic gradient flow (4). In particular, we discuss their suitability for autonomous optimization and the limits of stability when interconnected with a dynamical system. Note that in Appendix D we also present a more specific result for LTI plants.

III-C1 Basic Gradient Flows

In general, the conservativeness of the bound (10) depends largely on the specific problem. Figs. 2 and 3 illustrate this fact based on two random problem instances. In both examples, we consider, for simplicity, the case where $Q\equiv\epsilon\mathbb{I}_{n}$ (i.e., as in Corollary III.3), the cost function $\Phi$ is convex quadratic, and the plant is LTI (and consequently $h$ is linear). In each case, we have $n=20$ (state dimension) and $p=5$ (input dimension).

In both cases, the interconnected gradient system (7) remains stable for values of $\epsilon$ larger than $\epsilon^{\star}=\frac{\gamma}{\zeta L}$ . For $\epsilon=\epsilon^{\star}$ , the feedback interconnection illustrated in Fig. 2 exhibits a convergence rate similar to that of the reduced system. However, for $\epsilon$ larger than $10\epsilon^{\star}$ the interconnected system becomes unstable.

For the second example (Fig. 3) the stability bound on $\epsilon$ is more conservative. For $\epsilon=200\epsilon^{\star}$ the interconnected system is stable; however, its convergence rate is significantly worse than that of the reduced system. For this problem instance, instability occurs for values of $\epsilon$ larger than $290\epsilon^{\star}$ .

These examples illustrate not only the variable degree of conservativeness of our stability bound, but also the gradual performance degradation as the stability limit of the interconnected system is reached.
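The existence of a finite stability limit can be reproduced on a small hand-made example. The sketch below is a hypothetical toy instance, not one of the random problems in Figs. 2 and 3: the plant consists of two cascaded first-order lags, $\dot{x}_{1}=-x_{1}+x_{2}$ and $\dot{x}_{2}=-x_{2}+u$ , with steady-state map $h(u)=(u,u)$ , and the cost is assumed to be $\Phi(x)=\frac{1}{2}x_{1}^{2}$ . With $Q\equiv\epsilon$ the feedback law becomes $\dot{u}=-\epsilon x_{1}$ , and the closed-loop characteristic polynomial is $s^{3}+2s^{2}+s+\epsilon$ , which by the Routh-Hurwitz criterion is stable exactly for $0<\epsilon<2$ .

```python
import numpy as np


def closed_loop_matrix(eps: float) -> np.ndarray:
    """State matrix for (x1, x2, u) of the assumed toy interconnection."""
    return np.array([
        [-1.0, 1.0, 0.0],   # x1' = -x1 + x2
        [0.0, -1.0, 1.0],   # x2' = -x2 + u
        [-eps, 0.0, 0.0],   # u'  = -eps * x1  (gradient feedback)
    ])


def spectral_abscissa(eps: float) -> float:
    """Largest real part of the closed-loop eigenvalues."""
    return float(np.max(np.linalg.eigvals(closed_loop_matrix(eps)).real))


for eps in (0.5, 1.0, 3.0):
    print(f"eps = {eps}: max Re(lambda) = {spectral_abscissa(eps):+.4f}")
```

The sweep only locates the true stability limit of this particular loop; evaluating the bound $\epsilon^{\star}$ would additionally require the constants $\gamma$ , $\zeta$ and $L$ .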

III-C2 Newton Gradient Flows

The classical Newton method finds widespread application in numerical optimization as a second-order method (i.e., requiring information about second-order derivatives) with superlinear convergence [31, Chap 3.3]. The continuous-time limit of the Newton method is given by a simple gradient flow of the form (4), namely,

[TABLE]

where $\epsilon>0$ serves to adjust the convergence rate.

For (14) to be well-defined, we may assume that $\tilde{\Phi}$ is $\mu$ -strongly convex and twice continuously differentiable such that the metric $(\nabla^{2}\tilde{\Phi}(u))^{-1}$ is well-defined for all $u\in\mathbb{R}^{p}$ . Hence, convergence to the unique equilibrium is exponential and moreover isotropic, i.e., trajectories approach the equilibrium from all directions with the same speed. In other words, the linearization around the equilibrium point $u^{\star}$ is given by $\dot{u}=-\epsilon(u-u^{\star})$ .
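The isotropy property can be checked numerically on a quadratic cost. The following sketch assumes that the Newton flow has the form $\dot{u}=-\epsilon(\nabla^{2}\tilde{\Phi}(u))^{-1}\nabla\tilde{\Phi}(u)$ with the gradient taken as a column vector; the cost matrix `P`, the minimizer `u_star` and the function name are hypothetical.

```python
import numpy as np


def newton_flow_rhs(u, grad, hess, eps=1.0):
    """Right-hand side -eps * Hessian(u)^{-1} * gradient(u) of the assumed Newton flow."""
    return -eps * np.linalg.solve(hess(u), grad(u))


# Hypothetical ill-conditioned quadratic: 0.5 * (u - u_star)^T P (u - u_star).
P = np.diag([1.0, 100.0])
u_star = np.array([2.0, -1.0])


def grad(u):
    return P @ (u - u_star)


def hess(u):
    return P


u = np.array([3.0, 4.0])
print("Newton flow:  ", newton_flow_rhs(u, grad, hess, eps=0.5))  # -0.5 * (u - u_star)
print("gradient flow:", -0.5 * grad(u))                           # scaled by P, anisotropic
```

For this quadratic the Newton direction points straight at `u_star` regardless of the conditioning of `P`, whereas the plain gradient direction is dominated by the stiff coordinate.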

In terms of stability, Newton flows are well-suited for implementation as feedback controllers, although the evaluation (or estimation) of the inverse Hessian of $\tilde{\Phi}$ can pose computational problems.

Theorem III.2 can be directly applied to give a condition for asymptotic stability in closed loop. Namely, since $\tilde{\Phi}$ is $\mu$ -strongly convex, we have that $\sup_{u\in\mathbb{R}^{p}}\|\epsilon(\nabla^{2}\tilde{\Phi}(u))^{-1}\|\leq\epsilon/\mu$ and therefore the following holds.

Corollary III.5.

Consider the same setup as in Theorem III.2 and assume that $\tilde{\Phi}$ is $\mu$ -strongly convex and twice continuously differentiable. With the metric $Q(u):=\epsilon(\nabla^{2}\tilde{\Phi}(u))^{-1}$ , the closed-loop system (8) is asymptotically stable and converges to the unique global minimizer of (5) whenever

[TABLE]

Compared to the previous results, the above bound on $\epsilon$ is invariant with respect to a uniform scaling of $\tilde{\Phi}$ by a constant $\alpha>0$ since this will scale both $L$ and $\mu$ by the same factor $\alpha$ . Furthermore, the requirement that $\tilde{\Phi}$ is strongly convex implies the uniqueness of the optimizer, but it does not necessarily require that the problem (5) is itself convex.

Fig. 4 illustrates, similarly to Figs. 2 and 3, the interconnection of an LTI plant with a Newton flow for a quadratic function. In this case $Q\equiv\epsilon(\nabla^{2}\Phi)^{-1}$ is constant. As before, the interconnected system is stable even for $\epsilon$ larger than the theoretical bound in Corollary III.5; however, the convergence rate gradually worsens compared to the reduced system.

III-C3 Subgradient Flow (Non-Example)

Subgradient flows are the continuous-time version of subgradient descent and generalize gradient flows to the case where $\tilde{\Phi}$ is not differentiable. Namely, assuming that $\tilde{\Phi}$ is convex, its subgradient at $u\in\mathbb{R}^{p}$ is defined as the set

[TABLE]

As a set-valued map, $\partial\tilde{\Phi}$ gives rise to a dynamical system in the form of a differential inclusion $\dot{u}\in-\partial\tilde{\Phi}$ .

Subgradient inclusions are well-defined (i.e., existence of generalized solutions is guaranteed under technical assumptions) and convergence to critical points is also assured. However, subgradient flows are in general not appropriate for feedback-based optimization.

Apart from issues relating to the physical implementability, Theorem III.2 is not applicable since Assumption III.1 is in general not satisfied. Namely, if $\tilde{\Phi}$ is not continuously differentiable, then its gradient cannot be Lipschitz continuous.

In fact, subgradient flows in closed loop with a dynamical system are in general not asymptotically stable. To see this, consider a one-dimensional physical system in the form

[TABLE]

with $a>0$ and steady-state map $x=h(u)=\frac{b}{a}u$ . Further, as an objective we consider the absolute value $\Phi(x):=|x|$ that gives rise to a subgradient control law

[TABLE]

It is easy to see that this control law exhibits a bang-bang behavior that will not allow the closed-loop system to converge to the optimizer $x^{\star}=0$ .
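The switching behavior can be observed in a simple forward-Euler simulation. The sketch below assumes that the control law reads $\dot{u}\in-\epsilon\frac{b}{a}\,\mathrm{sign}(x)$ , i.e., the steady-state sensitivity $\frac{b}{a}$ times a subgradient of $|x|$ evaluated at the measured state; all parameter values and the function name are hypothetical.

```python
import numpy as np


def simulate_subgradient_loop(a=1.0, b=1.0, eps=1.0, x0=1.0, u0=0.0,
                              dt=1e-2, steps=2000):
    """Forward-Euler simulation of x' = -a x + b u, u' = -eps (b/a) sign(x)."""
    x, u = x0, u0
    xs = np.empty(steps + 1)
    xs[0] = x
    for k in range(steps):
        du = -eps * (b / a) * np.sign(x)               # relay-type control derivative
        x, u = x + dt * (-a * x + b * u), u + dt * du  # both updates use the old (x, u)
        xs[k + 1] = x
    return xs


xs = simulate_subgradient_loop()
switches = int(np.sum(xs[:-1] * xs[1:] < 0))  # number of sign changes of x
print("sign changes of x:", switches)
```

In this discretization the control derivative takes only the values $\pm\epsilon\frac{b}{a}$ (or zero if $x$ is exactly zero) and keeps switching sign. The residual chattering amplitude depends on the step size, so the sketch is meant as a qualitative illustration only.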

Fig. 5 illustrates this behavior for a higher-dimensional setup where we minimize an objective function of the form $\Phi(x,u)+\rho\|x\|_{1}$ , i.e., a cost $\Phi$ augmented with an $\ell_{1}$ -regularization term in an attempt to promote sparsity of the minimizing state variables.

III-C4 Projected Gradient Flows

In order to model input saturation as part of the system (3), which enforces a constraint $u\in\mathcal{U}$ on the inputs that cannot be violated, we resort to the mathematical formalism of projected dynamical systems. For convenience, we have summarized the relevant key definitions in the appendix. Hence, instead of (5), we consider

[TABLE]

where $\mathcal{U}\subset\mathbb{R}^{p}$ is a regular set expressing constraints on the control inputs, e.g., limited actuation capacity. Given the gradient vector field $\nabla\tilde{\Phi}(u)$ where $\tilde{\Phi}(u):=\Phi(h(u),u)$ as before, a projected gradient flow is defined as

[TABLE]

where the projected gradient is defined according to (C.1). Existence of so-called Carathéodory solutions is guaranteed by Theorems C.1 and C.2 (which is also an invariance principle).

A feedback implementation of (16) takes the form

[TABLE]

where $\epsilon>0$ . For the sake of simplicity, we do not consider the use of a variable metric $Q(u)$ , since that would require a more general definition of the projection operator $[\cdot]_{\mathcal{U}}^{u}$ in order to take into account oblique projections.

In fact, by definition of projected dynamical systems, it must hold that $u(t)\in\mathcal{U}$ for all $t$ . Consequently, in the following, strictly speaking, Assumptions II.1 and III.1 have to hold only for $u\in\mathcal{U}$ instead of all $u\in\mathbb{R}^{p}$ .

Stability of (17) can be shown similarly to Theorem III.2:

Corollary III.6.

Consider the same setup as in Theorem III.2, but let the feedback control law be given by (17b). Then, the same conclusions as in Theorem III.2 hold. Namely, all trajectories of (17) are complete and converge to critical points of (15) whenever $\epsilon\leq\frac{\gamma}{\zeta L}$ .

Proof.

The proof is analogous to the proof of Theorem III.2 with $Q(u)=\epsilon\mathbb{I}_{p}$ , but slightly different, because non-differentiable Carathéodory solutions (and their possible non-uniqueness) have to be considered instead of standard (differentiable) solutions, and Theorem C.2 has to be applied instead of the standard invariance principle for continuous dynamics. Nevertheless, the final stability bound remains the same.

The difference lies in the proof of Lemma III.1. In particular, when deriving the bound for the term $\nabla\tilde{\Phi}(u)Q(u)g(x,u)$ with $Q(u)=\epsilon\mathbb{I}_{p}$ we make use of Lemma C.3 which states that

[TABLE]

where $v=-H(u)^{T}\boldsymbol{\nabla}\Phi(x,u)$ and $\eta\in N_{u}\mathcal{U}$ . Hence, instead of (13) we can establish the bound

[TABLE]

where we have used Lemma C.3 to establish the last inequality. Thus, the bound is the same bound as in (13). ∎

Hence, input saturation that can be modeled by a projected dynamical system does not pose an obstacle in our timescale separation analysis, other than the fact that a specialized notion of solution and existence results inherent to projected dynamical systems have to be used.
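For the common special case of box constraints, the projection onto the tangent cone has a simple componentwise form: components of the desired direction that point out of the box at an active bound are set to zero. The sketch below assumes $\mathcal{U}=\{u\,|\,\mathrm{lo}\leq u\leq\mathrm{hi}\}$ and the standard (non-oblique) projection; the function names and the tolerance are hypothetical.

```python
import numpy as np


def project_onto_tangent_cone_box(u, v, lo, hi, tol=1e-12):
    """Project direction v onto the tangent cone of the box [lo, hi] at the point u."""
    u = np.asarray(u, dtype=float)
    w = np.array(v, dtype=float)          # copy, so that v is left untouched
    at_lower = u <= np.asarray(lo) + tol  # active lower bounds
    at_upper = u >= np.asarray(hi) - tol  # active upper bounds
    w[at_lower & (w < 0)] = 0.0           # block motion below the lower bound
    w[at_upper & (w > 0)] = 0.0           # block motion above the upper bound
    return w


def projected_gradient_feedback(u, measured_gradient, eps, lo, hi):
    """Saturated feedback law: projection of -eps * H(u)^T grad Phi(x, u)."""
    return project_onto_tangent_cone_box(u, -eps * np.asarray(measured_gradient), lo, hi)


u = np.array([0.0, 0.5, 1.0])
print(projected_gradient_feedback(u, [1.0, 1.0, -2.0], eps=1.0, lo=0.0, hi=1.0))
```

Here `measured_gradient` stands for $H(u)^{T}\boldsymbol{\nabla}\Phi(x,u)$ evaluated at the measured state, which the caller is assumed to supply.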

IV Momentum-Based Controllers

We now consider a class of optimization dynamics known as momentum methods [32], which have recently gained renewed interest in the context of machine learning but have not yet been extensively considered for feedback-based optimization. In the following, we primarily consider a continuous-time generalization of Polyak’s heavy-ball method [33] interconnected with a physical system and derive a stability requirement analogous to Theorem III.2. With a counter-example at the end of this section we show that time-varying optimization dynamics are in general not suited for feedback-based optimization, in particular if they do not exhibit uniform asymptotic convergence. Namely, for a continuous-time version of Nesterov’s accelerated gradient method [34], which violates our analysis assumptions, we show that the interconnection with an exponentially decaying physical system is in general not asymptotically stable. This feature is not surprising since an online implementation of this algorithm is a time-varying controller with asymptotically infinite gain.

Given a continuous metric $Q(u)$ and a differentiable objective function $\tilde{\Phi}(u)$ , as before, we consider continuous-time heavy-ball dynamics of the form

[TABLE]

where $z\in\mathbb{R}^{n}$ denotes a momentum variable, and $D(u)\in\mathbb{S}^{n}_{+}$ is a positive definite damping matrix depending on $u$ .

Asymptotic convergence of the optimization dynamics (18) is guaranteed by the following result.

Theorem IV.1.

If $\Phi$ has compact level sets, then the dynamical system (18) is asymptotically stable, and all trajectories converge to the set of points $(u^{\star},z^{\star})$ such that $z^{\star}=0$ and $\nabla\tilde{\Phi}(u^{\star})=0$ . In particular, if $\tilde{\Phi}$ is convex, then convergence is to the set of global optimizers of the optimization problem

[TABLE]

Proof.

Consider the LaSalle function $V(u,z):=\tilde{\Phi}(u)+\frac{1}{2}z^{T}z$ . Its Lie derivative along the trajectories of (18) is

[TABLE]

Furthermore, note that the sublevel sets of $V$ are compact. This leads us to conclude that all trajectories of (18) converge to the largest invariant subset $\Omega$ for which $z=0$ . This, in turn, implies $\dot{u}=0$ and $u$ is constant on $\Omega$ . Furthermore, since $z$ is constant on $\Omega$ , we need $\dot{z}=0$ and consequently $\nabla\tilde{\Phi}(u)=0$ which corresponds to being a critical point of (19). ∎
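The energy argument can be illustrated numerically. The sketch below assumes that (18) has the form $\dot{u}=Q(u)z$ , $\dot{z}=-D(u)z-Q(u)\nabla\tilde{\Phi}(u)$ (with the gradient as a column vector), which is consistent with the vector fields $g$ and $k$ used later for the interconnected system. Under this assumption the Lie derivative of $V$ is $-z^{T}D(u)z\leq 0$ . The quadratic cost and the constant matrices `Q` and `D` are hypothetical.

```python
import numpy as np

P = np.diag([1.0, 4.0])  # hypothetical quadratic cost 0.5 * u^T P u
Q = np.eye(2)            # constant metric (assumption)
D = np.eye(2)            # constant damping matrix (assumption)


def phi(u):
    return 0.5 * u @ P @ u


def grad_phi(u):
    return P @ u


def energy(u, z):
    """LaSalle function V(u, z) = phi(u) + 0.5 * z^T z."""
    return phi(u) + 0.5 * z @ z


def simulate(u0, z0, dt=1e-3, steps=10_000):
    """Forward-Euler integration of the assumed heavy-ball dynamics."""
    u, z = np.array(u0, dtype=float), np.array(z0, dtype=float)
    for _ in range(steps):
        du = Q @ z
        dz = -D @ z - Q @ grad_phi(u)
        u, z = u + dt * du, z + dt * dz
    return u, z


u0, z0 = [1.0, 1.0], [0.0, 0.0]
u_end, z_end = simulate(u0, z0)
v_start = energy(np.array(u0), np.array(z0))
v_end = energy(u_end, z_end)
print(f"V(0) = {v_start:.3f}, V(T) = {v_end:.2e}")
```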

IV-A Control Design & Stability Analysis

As before, we are primarily interested in the stability of the interconnection between (18) with a physical system (3), that is, we consider systems of the form

[TABLE]

where $\Phi(x,u)$ and $H(u)$ are defined as before.

Similarly to Theorem III.2 we derive a requirement on $Q(u)$ and $D(u)$ that guarantees asymptotic stability of (20).

Theorem IV.2.

Consider (20) and let Assumptions II.1 and III.1 hold. If $\Phi(h(u),u)$ has compact sublevel sets, then all trajectories of (20) converge asymptotically to the set of points $(x^{\star},u^{\star},z^{\star})$ for which $z^{\star}=0$ and $(x^{\star},u^{\star})$ is a critical point of (5) whenever it holds that

[TABLE]

where $L>0$ is a constant satisfying (9). Further, $\gamma$ and $\zeta$ are constants associated with a Lyapunov function $W(x,u)$ for (3) according to Proposition II.1.

Theorem IV.2 gives a design condition on $Q$ and $D$ that expresses a trade-off between the two. Namely, $Q$ acts as a generalized gain in the same way as in Theorem III.2, whereas a large damping has a stabilizing effect.

Corollaries and facts analogous to those for Theorem III.2 can be developed for Theorem IV.2. For example, one can show that asymptotically stable equilibria of (20) are optimizers of $\tilde{\Phi}$ .

Proof.

As in the proof of Theorem III.2, we consider a LaSalle function of the form

[TABLE]

where $\delta\in[0,1]$ is a convex combination parameter and $V(u,z):=\tilde{\Phi}(u)+\frac{1}{2}z^{T}z$ . The Lie derivative of $\Psi$ is

[TABLE]

where $g(u,z):=Q(u)z$ and $k(x,u,z):=-D(u)z-Q(u)H(u)^{T}\boldsymbol{\nabla}\Phi(x,u)$ .

The four terms in the expression of $\dot{\Psi}$ can be bounded as follows. For the first two terms we have

[TABLE]

where we have used $\kappa:=\sup_{u\in\mathbb{R}^{p}}\lambda^{\max}_{{Q(u)}}$ and $\lambda=\inf_{u\in\mathbb{R}^{p}}\lambda^{\min}_{{D(u)}}$ . Note that in the fourth equation, the first and the last term cancel out.

For the third and fourth term we get as before

[TABLE]

With these bounds, we can upper-bound $\dot{\Psi}$ with a quadratic function $\begin{bmatrix}\|z\|&\|x-h(u)\|\end{bmatrix}\Lambda\begin{bmatrix}\|z\|&\|x-h(u)\|\end{bmatrix}^{T}$ where

[TABLE]

Thus, we can apply Lemma A.1 with $\alpha_{1}=\lambda$ , $\alpha_{2}=\gamma$ , $\xi=0$ , $\beta_{1}=\kappa L$ and $\beta_{2}=\kappa\zeta$ which yields that $\Lambda\prec 0$ whenever

[TABLE]

The remainder of the proof is analogous to the proof of Theorem III.2. Namely, Lemma A.2 serves to certify that $\Psi$ has compact (and hence invariant) sublevel sets for an appropriate choice of $\delta$ . Thus, solutions converge to the largest invariant subset for which $\dot{\Psi}=0$ , which, in turn, is equivalent to the points for which $z=0$ , $x=h(u)$ and $\nabla\tilde{\Phi}(u)=0$ . ∎
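Written out, the condition of Lemma A.1 with the parameters used in the proof above takes the following form, where $\kappa=\sup_{u}\lambda^{\max}_{Q(u)}$ and $\lambda=\inf_{u}\lambda^{\min}_{D(u)}$ as defined in the proof. This is a direct substitution and is offered only as a reading aid.

```latex
\frac{\alpha_{1}\alpha_{2}}{\alpha_{1}\xi+\beta_{1}\beta_{2}}
  = \frac{\lambda\gamma}{\kappa^{2}\zeta L} > 1
\quad\Longleftrightarrow\quad
\frac{\kappa^{2}}{\lambda} < \frac{\gamma}{\zeta L},
\qquad
\delta = \frac{\beta_{1}}{\beta_{1}+\beta_{2}} = \frac{L}{L+\zeta}.
```

The condition makes the trade-off explicit: increasing the smallest damping eigenvalue $\lambda$ enlarges the admissible range of the gain $\kappa$ .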

IV-B Non-Example: Accelerated Gradient Flows

A special and widely popular variation of (18) consists in making the damping decay over time. Namely, in [35, 34] the authors show that ODEs of the form

[TABLE]

can be interpreted as the continuous-time limit of Nesterov’s accelerated gradient descent.

As before, we can derive a feedback controller from (22). Strictly speaking, Theorem IV.2 does not apply to this type of time-varying control, but an extension is possible.

Nevertheless, as one can easily see, with a damping term that decays monotonically over time, the bound (21) eventually (i.e., for $t$ large enough) fails to hold and the feedback interconnection between a physical system and the accelerated gradient dynamics will become unstable. In other words, the feedback controller is time-varying with asymptotically infinite gain. This behavior is illustrated in Fig. 6 for a one-dimensional plant $\dot{x}=-ax+bu$ with $a>0$ , steady-state map $x=h(u)=\frac{b}{a}u$ , and $\tilde{\Phi}=\Phi(h(u))$ where $\Phi(x):=\|x\|^{2}$ .
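A rough way to see when the loop loses its stability margin is a frozen-time analysis of the one-dimensional example. The sketch below assumes a damping coefficient of the form $d(t)=r/t$ with $r=3$ , as in the well-known ODE model of Nesterov's method, and the closed loop $\dot{x}=-ax+bu$ , $\dot{u}=z$ , $\dot{z}=-d(t)z-\frac{b}{a}\,2x$ . Freezing $d$ gives the characteristic polynomial $s^{3}+(a+d)s^{2}+ad\,s+\frac{2b^{2}}{a}$ , which is Hurwitz if and only if $(a+d)\,ad>\frac{2b^{2}}{a}$ . Frozen-time stability is only a heuristic for time-varying systems, not a proof, and all names and values are hypothetical.

```python
def frozen_time_routh_margin(t: float, a: float = 1.0, b: float = 1.0, r: float = 3.0) -> float:
    """Routh-Hurwitz margin (a + d) * a * d - 2 b^2 / a for frozen damping d = r / t.

    A positive value means the frozen-time closed loop is Hurwitz.
    """
    d = r / t                    # time-varying damping coefficient (assumed form)
    loop_gain = 2.0 * b * b / a  # b * (b / a) * 2 from Phi(x) = x^2
    return (a + d) * a * d - loop_gain


for t in (1.0, 2.0, 3.0, 10.0):
    print(f"t = {t:5.1f}: margin = {frozen_time_routh_margin(t):+.3f}")
```

For $a=b=1$ the margin changes sign at $t=3$ , i.e., once the damping has decayed to $d=1$ .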

This example violates our assumptions (thus indicating the sharpness of our analysis) and fails in practice (showing a general limitation of autonomous optimization).

V General Optimization Dynamics

Next, we turn to more general optimization algorithms. A particular class that we cover with the subsequent analysis is that of saddle-point flows (see [36, 37] and references therein), which can be interpreted as controllers with memory.

Hence, in this section, we consider the general dynamics

[TABLE]

where $z\in\mathbb{R}^{r}$ is an internal variable of the controller, the maps $g:\mathbb{R}^{n+p+r}\rightarrow\mathbb{R}^{p}$ and $k:\mathbb{R}^{n+p+r}\rightarrow\mathbb{R}^{r}$ define the controller behavior, and $h(u)$ is the steady-state map of the plant.

For autonomous optimization, the dynamics (23) are chosen such that their equilibria correspond to critical points of a predefined optimization problem. As an example of (23), we later consider the case of primal-dual saddle-point flows that have been successfully applied to enforce constraints on the output variables of a physical system.

The only assumptions that we require are Lipschitz continuity and the existence of a Lyapunov function for (23).

Assumption V.1.

The vector field $(g,k)$ is $L$ -Lipschitz in $x$ , i.e., for all $x,x^{\prime}\in\mathbb{R}^{n}$ , and all $u\in\mathbb{R}^{p}$ and $z\in\mathbb{R}^{r}$ one has

[TABLE]

Assumption V.2.

The reduced vector field $(\tilde{g},\tilde{k})$ , where $\tilde{g}(u,z):=g(h(u),u,z)$ and $\tilde{k}(u,z):=k(h(u),u,z)$ , is $\ell$ -Lipschitz, i.e., for all $u^{\prime},u\in\mathbb{R}^{p}$ and $z^{\prime},z\in\mathbb{R}^{r}$ it holds that

[TABLE]

Assumptions V.1 and V.2 can be relaxed or combined in several ways. For instance, if the norm of the map $H(u):=\begin{bmatrix}\nabla h(u)^{T}&\mathbb{I}_{p}\end{bmatrix}$ is bounded by $\eta$ , then choosing $\ell$ such that $\ell=\eta L$ will satisfy Assumption V.2.

Further, Assumptions V.1 and V.2 guarantee the existence of complete solutions to both the reduced system (23) and the dynamic interconnection, which takes the form

[TABLE]

where $\epsilon>0$ is a control gain and tuning parameter.

Assumption V.3.

The system (23) has a unique equilibrium point $(u^{\star},z^{\star})$ , and there exists a positive definite Lyapunov function $V(u,z)$ according to Proposition II.1. Namely, there exist $\kappa,\mu>0$ such that

[TABLE]

where $e(u,z):=\left[\begin{smallmatrix}u-u^{\star}\\ z-z^{\star}\end{smallmatrix}\right]$ .

*Remark 5**.*

Assumption V.3 is in particular satisfied if the vector field $(\tilde{g},\tilde{k})$ is $\mu$ -strongly monotone, i.e.,

[TABLE]

holds for all $u^{\prime},u\in\mathbb{R}^{p}$ and $z,z^{\prime}\in\mathbb{R}^{r}$ , and it has a unique equilibrium point $(u^{\star},z^{\star})$ . In this case, we have $V(u,z)=\|e(u,z)\|^{2}$ and $\kappa=1$ . $\blacksquare$

In the same spirit as Theorems III.2 and IV.2 we can derive a requirement on $\epsilon$ that guarantees asymptotic stability of (24).

Theorem V.1.

Under Assumptions II.1, V.1, V.2, and V.3 all trajectories of (24) converge asymptotically to $(x^{\star},u^{\star},z^{\star})$ whenever $\epsilon>0$ is chosen such that

[TABLE]

Similarly to the bounds in Theorems III.2 and IV.2 the bound (25) contains the term $\frac{\gamma}{\zeta L}$ . However, the generality of the bound (25) comes at the expense of another factor $1/(1+\frac{\kappa\ell}{\mu})$ that deteriorates the stability bound depending on the conditioning of the reduced vector field.

Proof.

Analogously to the proofs of Theorems III.2 and IV.2, we consider a LaSalle function of the form

[TABLE]

The Lie derivative of $\Psi$ is given by

[TABLE]

The first two terms of $\dot{\Psi}$ can be bounded as

[TABLE]

For the third and fourth term we have

[TABLE]

Hence, we can establish a quadratic bound on $\dot{\Psi}$ as a function $\begin{bmatrix}\|e(u,z)\|\,\|x-h(u)\|\end{bmatrix}\Lambda\begin{bmatrix}\|e(u,z)\|\,\|x-h(u)\|\end{bmatrix}^{T}$ where

[TABLE]

Lemma A.1 with $\alpha_{1}=\epsilon\mu$ , $\alpha_{2}=\gamma$ , $\xi=\epsilon\zeta L$ , $\beta_{1}=\epsilon\kappa L$ and $\beta_{2}=\epsilon\zeta\ell$ guarantees negative definiteness of $\Lambda$ for

[TABLE]

which simplifies to (25). The remainder of the proof is the same as before in Theorems III.2 and IV.2. ∎

V-A Example: A weak bound for Convex Gradient Flows

Theorem V.1 can also be applied to the algorithms in the previous sections, but in this case the stability bound (25) is weaker than previous tailor-made conditions. To compare Theorem V.1 and Theorem III.2 we reconsider the case of a gradient-based feedback controller as given by the system (8) with the metric $Q(u)=\epsilon\mathbb{I}$ . Further, assume that $\tilde{\Phi}(u)$ has a $\ell$ -Lipschitz gradient and is $\mu$ -strongly convex. Thus, Assumptions V.2 and V.3 are satisfied with $\ell$ and $\mu$ , respectively.

Then, one can choose $V(u)=\frac{1}{2}\|u-u^{\star}\|^{2}$ as the Lyapunov function according to Assumption V.3 with $\kappa=1$ . The parameters $L$ of Assumption III.1 and Assumption V.1 coincide. It follows from Theorem V.1 that the feedback gradient system is asymptotically stable for $\epsilon<\frac{\gamma}{\zeta L(1+\frac{\ell}{\mu})}$ which is weaker than the bound in Theorem III.2 by at least a factor of 2 because $\ell$ is the Lipschitz constant of $\nabla\tilde{\Phi}(u)$ and $\mu$ is the modulus of strong convexity of $\tilde{\Phi}$ and therefore $\ell\geq\mu$ .
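The gap between the two bounds is easy to quantify. The sketch below assumes that (25) reads $\epsilon<\frac{\gamma}{\zeta L(1+\kappa\ell/\mu)}$ , which is what the parameter choice for Lemma A.1 in the proof of Theorem V.1 yields after simplification; the function names and the numerical values are hypothetical.

```python
def gradient_gain_limit(gamma: float, zeta: float, L: float) -> float:
    """Tailor-made limit gamma / (zeta * L) for gradient feedback."""
    return gamma / (zeta * L)


def general_gain_limit(gamma: float, zeta: float, L: float,
                       kappa: float, ell: float, mu: float) -> float:
    """Assumed form of (25): gamma / (zeta * L * (1 + kappa * ell / mu))."""
    return gamma / (zeta * L * (1.0 + kappa * ell / mu))


# Hypothetical constants; ell / mu plays the role of a condition number.
gamma, zeta, L = 2.0, 0.5, 4.0
for ell, mu in ((3.0, 3.0), (30.0, 3.0)):
    ratio = general_gain_limit(gamma, zeta, L, 1.0, ell, mu) / gradient_gain_limit(gamma, zeta, L)
    print(f"ell/mu = {ell / mu:5.1f}: general bound is {ratio:.3f} times the tailor-made bound")
```

In the best-conditioned case $\ell=\mu$ the general bound is exactly half of the tailor-made one, and it shrinks further as $\ell/\mu$ grows.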

V-B Example: Primal-Dual Saddle-Point Flow

A key requirement of many autonomous optimization scenarios is the satisfaction of constraints. As seen previously, constraints on the input variable $u$ can be (strictly) enforced, e.g., by projection. Incorporating constraints on the state (or output) variables is trickier and they need to be treated as constraints that can be violated during the transients. For this purpose, saddle-point flows have proven to be an adequate tool. As an illustrative example, instead of (5), we consider

[TABLE]

where $A\in\mathbb{R}^{r\times n}$ and $b\in\mathbb{R}^{r}$ . Namely, $Ax-b$ defines a constraint on the state variables that has to be satisfied asymptotically at steady state. After eliminating $x$ from (26), the augmented Lagrangian is given by

[TABLE]

where $\sigma$ is an augmentation parameter.

The corresponding augmented saddle point flow is given by

[TABLE]

Note that equilibria of (27) and critical points of (26) coincide.

In a feedback interconnection with a physical system we instead replace $h(u)$ with the measured value of $x$ to arrive at

[TABLE]

Intuitively, augmented saddle-point flows, implemented as feedback controllers, provide a proportional and integral feedback of the measured constraint violation $Ax-b$ , hence acting as PI-control (on top of the integral controller that defines the optimization dynamics). Namely, the augmentation term results in the proportional component, whereas the dual variable $\lambda$ yields the integral term.
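The proportional-integral structure can be made explicit in a short sketch of the controller right-hand side. It assumes the standard augmented Lagrangian $\tilde{\Phi}(u)+\lambda^{T}(Ah(u)-b)+\frac{\sigma}{2}\|Ah(u)-b\|^{2}$ , gradient descent in $u$ and ascent in $\lambda$ , a common gain $\epsilon$ on both, and the replacement of $h(u)$ by the measured state $x$ . The exact form of (28) may differ, and all argument names are hypothetical.

```python
import numpy as np


def saddle_point_feedback(x, u, lam, grad_x, grad_u, dh, A, b, sigma, eps):
    """Assumed feedback form of an augmented primal-dual saddle-point flow.

    x: measured state, u: input, lam: dual variable,
    grad_x, grad_u: partial gradients of Phi at (x, u),
    dh: Jacobian of the steady-state map h at u (shape n x p).
    """
    residual = A @ x - b                          # measured constraint violation
    multiplier = lam + sigma * residual           # integral term + proportional term
    du = -eps * (dh.T @ (grad_x + A.T @ multiplier) + grad_u)
    dlam = eps * residual                         # dual variable integrates the violation
    return du, dlam


# Hypothetical data with n = 2 states, p = 1 input, r = 1 constraint.
x = np.array([1.0, 2.0])
u = np.array([0.0])
A = np.array([[1.0, 1.0]])
dh = np.array([[1.0], [1.0]])
du, dlam = saddle_point_feedback(x, u, np.array([1.0]), np.array([1.0, 0.0]),
                                 np.array([0.5]), dh, A, np.array([2.0]),
                                 sigma=0.5, eps=2.0)
print("du =", du, "dlam =", dlam)
```

The input `u` is only carried along for readability; in this sketch it enters through the gradients and the Jacobian supplied by the caller.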

Clearly, (28) falls into the class of systems of the form (24). Furthermore, Assumptions V.1 and V.2 are in general satisfied and $\ell$ and $L$ depend on the optimization problem only.

The application of Theorem V.1 hinges on Assumption V.3 and therefore on the existence of an explicit Lyapunov function for the dynamics (23). For the special case (26), this assumption is satisfied [38]. Whether this setup can be generalized remains open and is a topic of active study [39, 40].

Nevertheless, numerical simulations of (28) on randomized problem instances suggest that the interconnection of a saddle-point flow and a dynamical plant has benign stability properties with a stability threshold on $\epsilon$ .

Fig. 7 illustrates, as before, the (stable) interconnection of an LTI plant (with $n=20$ and $p=10$ ) and the saddle-point flow (27) where $\Phi$ is a quadratic function and $r=5$ (number of output constraints). Namely, after an initial transient, the physical plant remains almost at steady state and the interconnected system closely tracks the trajectory of the reduced system and converges to the optimizer of (26).

VI Conclusion

We have studied the implementation of different types of optimization algorithms as feedback controllers with the goal of steering a physical system to a steady state that (locally) solves a predefined optimization problem. In particular, we have derived stability bounds inspired by singular perturbation analysis that guarantee closed-loop stability. We have illustrated the generality of our approach by treating three general classes of algorithms and several specific instances. In general, our approach only requires limited information about a Lyapunov function for plant dynamics and Lipschitz constants for the optimization problem.

Our results give immediate prescriptions for the design of feedback controllers that are easy to evaluate. The conservativeness of our bounds is domain-specific, but they are sometimes of practical relevance, for instance, in power systems [25].

While our work establishes stability conditions, it does not give quantitative results on the rate of convergence, the robustness against noise, or the tracking performance for time-varying problem setups. All of these questions remain open problems and are the subject of ongoing research. Further, it is unclear whether for the case of discrete-time systems corrupted by noise analogous stability results can be derived by using so-called stochastic approximations. Finally, from a practical perspective, it is highly desirable to solve the corresponding design problem, i.e., to optimize the metric $Q$ with respect to a given robustness objective. Although it is relatively simple to formulate such a problem, its solvability is unclear.

Appendix A Technical Results

Lemma A.1.

Consider a $2\times 2$ -matrix defined as

[TABLE]

where $\beta_{1},\beta_{2},\xi\in\mathbb{R}$ , $\delta\in(0,1)$ , and $\alpha_{1},\alpha_{2}>0$ . If $\frac{\alpha_{1}\alpha_{2}}{\alpha_{1}\xi+\beta_{1}\beta_{2}}>1$ and $\delta=\frac{\beta_{1}}{\beta_{1}+\beta_{2}}$ , then $\Lambda$ is negative definite.

The proof of Lemma A.1 is standard [30, p. 296].
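The statement of the lemma can be checked numerically. The sketch below assumes that $\Lambda$ has the composite-Lyapunov form familiar from singular perturbation analysis, with diagonal entries $-(1-\delta)\alpha_{1}$ and $-\delta(\alpha_{2}-\xi)$ and off-diagonal entries $\frac{1}{2}((1-\delta)\beta_{1}+\delta\beta_{2})$ . This form is a reconstruction that reproduces the condition stated in the lemma and is not copied from the original equation; the function names are hypothetical.

```python
import numpy as np


def composite_matrix(alpha1, alpha2, xi, beta1, beta2, delta):
    """Assumed 2x2 matrix Lambda of Lemma A.1 (reconstruction, see lead-in)."""
    off = 0.5 * ((1.0 - delta) * beta1 + delta * beta2)
    return np.array([[-(1.0 - delta) * alpha1, off],
                     [off, -delta * (alpha2 - xi)]])


def is_negative_definite(alpha1, alpha2, xi, beta1, beta2):
    """Test negative definiteness for the choice delta = beta1 / (beta1 + beta2)."""
    delta = beta1 / (beta1 + beta2)
    eigenvalues = np.linalg.eigvalsh(composite_matrix(alpha1, alpha2, xi, beta1, beta2, delta))
    return bool(np.all(eigenvalues < 0.0))


# alpha1 * alpha2 / (alpha1 * xi + beta1 * beta2) = 2 > 1: condition holds.
print(is_negative_definite(alpha1=1.0, alpha2=2.0, xi=0.0, beta1=1.0, beta2=1.0))
# Same ratio equals 0.5 < 1: condition violated.
print(is_negative_definite(alpha1=1.0, alpha2=0.5, xi=0.0, beta1=1.0, beta2=1.0))
```

For this assumed form and the stated choice of $\delta$ , the determinant of $\Lambda$ is positive exactly when $\alpha_{1}\alpha_{2}>\alpha_{1}\xi+\beta_{1}\beta_{2}$ , which matches the ratio condition of the lemma.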

Lemma A.2.

Consider a system satisfying Assumption II.1 with a Lyapunov function $W(x,u)$ and a steady-state map $h:\mathbb{R}^{p}\rightarrow\mathbb{R}^{n}$ . Further, let $Z(x,u)=V(u)+W(x,u)$ where $V:\mathbb{R}^{p}\rightarrow\mathbb{R}$ is continuous and has compact level sets. Then, $Z$ has compact sublevel sets.

Lemma A.2 is a straightforward extension of [25, Lem. 4]. We provide a proof for completeness.

Proof.

Consider a sublevel set $\Omega_{c}:=\{(x,u)\,|\,Z(x,u)\leq c\}$ , for some $c\in\mathbb{R}$ . Since we have $W(x,u)\geq 0$ , $(x,u)\in\Omega_{c}$ implies that $V(u)\leq c$ . But since $V(u)$ has compact sublevel sets, there exists $U$ such that $\|u\|\leq U$ for all $(x,u)\in\Omega_{c}$ .

On the compact set $\{u\,|\,\|u\|\leq U\}$ the continuous function $V(u)$ is also lower bounded by some value $L$ . We therefore have that $W(x,u)\leq c-L$ in $\Omega_{c}$ . As $W(x,u)$ is positive definite, we must have that $\|x-h(u)\|^{2}\leq(c-L)/\lambda_{\min}(P)$ . We then have

[TABLE]

where $\ell$ is the Lipschitz constant of $h$ , and therefore $\|x\|$ is also bounded for all $(x,u)\in\Omega_{c}$ . ∎

-B Proof of Proposition II.1

We use the change of coordinates $z:=x-h(u)$ such that (3) can be written as

[TABLE]

By Lipschitz continuity of $f$ (in $z$ and $u$ ) and $h$ , we have

[TABLE]

where the last inequality follows from the triangle inequality and Cauchy-Schwarz.

Let $\varphi(t,z,u)$ denote the solution of (-B.1) at time $t$ that starts in $z$ for fixed $u$ . Define

[TABLE]

with $T=\tfrac{1}{2\tau}\ln(2K^{2})$ . Analogously to the proof of [27, Th. 4.14], it can be shown that $V(z,u)$ satisfies

[TABLE]

with $\alpha=\tfrac{1}{2\ell_{x}}(1-e^{-2\ell_{x}T}),\beta=\tfrac{K^{2}}{2\tau}(1-e^{-2\tau T}),\gamma=1/2$ , and $\delta=\tfrac{2K}{\tau-\ell_{x}}(1-e^{(\ell_{x}-\tau)T})$ .
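To make these constants concrete, the following sketch evaluates $T$, $\alpha$, $\beta$, $\gamma$, and $\delta$ from the formulas above. The function name and the sample values of $K$, $\tau$, and $\ell_{x}$ are hypothetical. The guard $K\geq 1$ keeps $T$ positive, and $\tau\neq\ell_{x}$ avoids a division by zero in $\delta$.

```python
import math


def converse_lyapunov_constants(K, tau, ell_x):
    """Evaluate T, alpha, beta, gamma, delta from the closed-form expressions.

    K, tau : constants of the exponential bound ||phi|| <= K ||z|| exp(-tau t)
    ell_x  : Lipschitz constant with respect to the state
    """
    if K < 1.0 or tau <= 0.0 or ell_x <= 0.0 or tau == ell_x:
        raise ValueError("need K >= 1, tau > 0, ell_x > 0 and tau != ell_x")
    T = math.log(2.0 * K**2) / (2.0 * tau)
    alpha = (1.0 - math.exp(-2.0 * ell_x * T)) / (2.0 * ell_x)
    beta = K**2 * (1.0 - math.exp(-2.0 * tau * T)) / (2.0 * tau)
    gamma = 0.5
    delta = 2.0 * K * (1.0 - math.exp((ell_x - tau) * T)) / (tau - ell_x)
    return {"T": T, "alpha": alpha, "beta": beta, "gamma": gamma, "delta": delta}


# Hypothetical sample values.
constants = converse_lyapunov_constants(K=2.0, tau=1.0, ell_x=3.0)
for name, value in constants.items():
    print(f"{name} = {value:.6f}")
```

The choice of $T$ gives $K^{2}e^{-2\tau T}=\tfrac{1}{2}$, which is consistent with $\gamma=1/2$ and simplifies $\beta$ to $(2K^{2}-1)/(4\tau)$.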

Next, we proceed similarly to the proof of [27, Lem. 9.8]. The sensitivity function $\varphi_{u}:=\nabla_{u}\varphi(t,z,u)$ of the solution $\varphi$ with respect to changes in $u$ [27, Ch. 3.3] satisfies the ODE

[TABLE]

with $\varphi_{u}(0,z,u)=0$ . Using Lipschitz continuity of $g$ we get

[TABLE]

Applying a special case of the Gronwall inequality [41, Cor. 6.2] (because $\ell^{\prime}t$ is monotone increasing) yields the bound $\|\varphi_{u}(t,z,u)\|\leq\ell^{\prime}te^{\ell_{x}t}$ , and we have

[TABLE]

where we have used $\|\varphi(t,z,u)\|\leq K\|z\|e^{-\tau t}$ by exponential stability and $\zeta^{\prime}:=\tfrac{2K\ell^{\prime}}{(\ell_{x}-\tau)^{2}}\left((\ell_{x}T-\tau T-1)e^{(\ell_{x}-\tau)T}+1\right)$ .
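The closed form of $\zeta^{\prime}$ is what one obtains by integrating the product of the two bounds, $2\,(Ke^{-\tau t})\,(\ell^{\prime}te^{\ell_{x}t})$, over $[0,T]$. The sketch below compares the closed form against a midpoint-rule quadrature of that integrand. That this is exactly the integral bounded in the omitted display is an assumption, as are the function names and sample values.

```python
import math


def zeta_prime_closed_form(K, tau, ell_x, ell_prime, T):
    """Closed-form expression for zeta' as stated in the text (requires ell_x != tau)."""
    a = ell_x - tau
    return 2.0 * K * ell_prime / a**2 * ((a * T - 1.0) * math.exp(a * T) + 1.0)


def zeta_prime_quadrature(K, tau, ell_x, ell_prime, T, steps=200_000):
    """Midpoint rule for the integral of 2 K exp(-tau t) * ell' t exp(ell_x t) on [0, T]."""
    h = T / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += 2.0 * K * math.exp(-tau * t) * ell_prime * t * math.exp(ell_x * t)
    return total * h


# Hypothetical sample values; T follows the formula given earlier in the proof.
K, tau, ell_x, ell_prime = 2.0, 1.0, 3.0, 0.5
T = math.log(2.0 * K**2) / (2.0 * tau)
closed = zeta_prime_closed_form(K, tau, ell_x, ell_prime, T)
numeric = zeta_prime_quadrature(K, tau, ell_x, ell_prime, T)
print(closed, numeric)
```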

Finally, we can reverse the change of coordinates by defining $W(x,u):=V(x-h(u),u)$ . We immediately have the desired bounds

[TABLE]

and the time derivative of $W$ with respect to (3) as

[TABLE]

For the final bound, note that we have

[TABLE]

where $\zeta:=\delta\ell+\zeta^{\prime}$ . This completes the proof.

-C Projected Dynamical Systems

For convenience, we restrict ourselves to a simplified definition of projected dynamical systems that is centered around regular sets as defined in Section II. For a more comprehensive treatment the reader is referred to [42]. We define a projection operator for a regular set $\mathcal{U}\subset\mathbb{R}^{p}$, $u\in\mathcal{U}$, and $v\in\mathbb{R}^{p}$ as

[TABLE]

that is, $[v]_{\mathcal{U}}^{u}$ projects a vector $v$ onto the tangent cone of $\mathcal{U}$ at the point $u$ . Since $T_{u}\mathcal{U}$ is a closed convex set for any $u\in\mathcal{U}$ , the minimum norm projection of $v$ on $T_{u}\mathcal{U}$ exists and is unique, and $[v]_{\mathcal{U}}^{u}$ is well-defined. Furthermore, it holds that $\epsilon[v]_{\mathcal{U}}^{u}=[\epsilon v]_{\mathcal{U}}^{u}$ for all $\epsilon>0$ since $T_{u}\mathcal{U}$ is a cone. Further, we have the following crucial property [42, Lem. 4.5]:

Lemma C.3.

For a regular set $\mathcal{U}\subset\mathbb{R}^{p}$, $u\in\mathcal{U}$, and $v\in\mathbb{R}^{p}$, there exists a unique $\eta\in N_{u}\mathcal{U}$ such that $[v]_{\mathcal{U}}^{u}=v-\eta$. Further, it holds that $\eta^{T}(v-\eta)=0$ and $v^{T}(v-\eta)=\|v-\eta\|^{2}$.
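As an illustration of Lemma C.3, consider the nonnegative orthant $\mathcal{U}=\mathbb{R}^{p}_{\geq 0}$ with the Euclidean metric. This set is closed and convex, and we assume it qualifies as a regular set in the sense of Section II. The tangent cone at $u$ then only restricts the coordinates with $u_{i}=0$, so the projection has a simple coordinate-wise form. The function name, tolerance, and test vectors below are hypothetical.

```python
import numpy as np


def project_onto_tangent_cone(u, v, tol=1e-12):
    """Euclidean projection of v onto the tangent cone of the nonnegative orthant at u."""
    active = u <= tol                 # coordinates on the boundary u_i = 0
    projected = v.copy()
    # On active coordinates only nonnegative components are admissible.
    projected[active] = np.maximum(v[active], 0.0)
    return projected


u = np.array([0.0, 2.0, 0.0])
v = np.array([-1.5, -3.0, 0.7])

proj = project_onto_tangent_cone(u, v)   # plays the role of [v]_U^u
eta = v - proj                           # normal-cone component from Lemma C.3

print("projection:", proj)
print("eta:", eta)
print("eta^T (v - eta):", eta @ proj)
print("v^T (v - eta) - ||v - eta||^2:", v @ proj - proj @ proj)
```

Here $\eta$ is nonpositive on the active coordinates and zero elsewhere, which is the normal cone of the orthant at $u$, and both identities of the lemma hold.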

A projected dynamical system is thus defined by applying the projection operator to a standard vector field $F:\mathcal{U}\rightarrow\mathbb{R}^{p}$ at every point. This leads to the initial value problem

$$\dot{u}=[F(u)]_{\mathcal{U}}^{u},\qquad u(0)=u_{0},\tag{-C.2}$$

where $u_{0}\in\mathcal{U}$ denotes an initial condition.

In general, $[F(u)]_{\mathcal{U}}^{u}$ is not continuous in $u$, so standard existence results for ODEs do not apply. Instead, a (Carathéodory) solution to (-C.2) is defined as an absolutely continuous function $u:[0,T)\rightarrow\mathcal{U}$ for some $T>0$ with $u(0)=u_{0}$ for which $\dot{u}(t)=[F(u(t))]_{\mathcal{U}}^{u(t)}$ holds almost everywhere, i.e., for almost all $t\in[0,T)$. In particular, a solution has to remain in $\mathcal{U}$.

The following two theorems establish the existence of solutions to (-C.2). First, according to Corollary 5.2 in [42], we have local existence of Carathéodory solutions:

Theorem C.1 (Local Existence).

Let $\mathcal{U}\subset\mathbb{R}^{p}$ be a regular set and let $F:\mathcal{U}\rightarrow\mathbb{R}^{p}$ be a locally Lipschitz vector field. Then, for every $u_{0}\in\mathcal{U}$ there exists a local solution $u:[0,T)\rightarrow\mathcal{U}$ of (-C.2) for some $T>0$.

Second, Proposition 8.6 in [42] provides an invariance principle for projected dynamical systems that can also serve to certify the existence of complete solutions.

Theorem C.2 (Invariance Principle).

Consider (-C.2) with $\mathcal{U}$ regular and $F(u)$ locally Lipschitz. Furthermore, let $\Psi:\mathbb{R}^{p}\rightarrow\mathbb{R}$ be continuously differentiable with compact sublevel sets $\mathcal{S}_{\ell}:=\{u\in\mathcal{U}\,|\,\Psi(u)\leq\ell\}$ . If it holds that $\dot{\Psi}(u)\leq 0$ for all $u\in\mathcal{U}$ , then every solution to (-C.2) starting at $u_{0}\in\mathcal{S}_{\ell}$ is complete and will converge to the largest weakly invariant subset of $\operatorname{cl}\{u\in\mathcal{S}_{\ell}\,|\,\dot{\Psi}(u)=0\}$ .
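A small numerical experiment can illustrate the invariance principle. The sketch below simulates a projected gradient flow with $F(u)=-(u-c)$ on the nonnegative orthant and tracks $\Psi(u)=\tfrac{1}{2}\|u-c\|^{2}$. It uses a forward-Euler discretization with clipping, which is only an approximation of the continuous-time Carathéodory solution. The set, the vector field, the step size, and all names are assumptions chosen for illustration.

```python
import numpy as np


def tangent_cone_projection(u, v, tol=1e-12):
    """Projection of v onto the tangent cone of the nonnegative orthant at u."""
    active = u <= tol
    out = v.copy()
    out[active] = np.maximum(v[active], 0.0)
    return out


def simulate_projected_gradient_flow(u0, c, step=0.01, n_steps=2000):
    """Forward-Euler sketch of u' = [-(u - c)]_U^u on the nonnegative orthant."""
    u = np.array(u0, dtype=float)
    psi_values = [0.5 * float(np.sum((u - c) ** 2))]
    for _ in range(n_steps):
        direction = tangent_cone_projection(u, -(u - c))
        # Clipping keeps the discrete iterate inside the feasible set.
        u = np.maximum(u + step * direction, 0.0)
        psi_values.append(0.5 * float(np.sum((u - c) ** 2)))
    return u, np.array(psi_values)


c = np.array([2.0, -1.0])   # unconstrained minimizer lies outside the orthant
u_final, psi = simulate_projected_gradient_flow([1.0, 1.0], c)
print("final point:", u_final)
print("Psi non-increasing:", bool(np.all(np.diff(psi) <= 1e-12)))
```

In this example $\Psi$ is non-increasing along the iterates and the trajectory settles at the projection of $c$ onto the orthant, where the projected vector field vanishes.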

-D LTI systems

In the following, we show how for LTI plant dynamics, the previously developed stability bounds take a particularly easy form and can, in fact, be made independent of the internal state representation. This also allows us to formulate a simple example in which our stability bound is tight.

For simplicity, we limit ourselves to the case of gradient-based controllers, although the same ideas can be extended to other classes of optimizing feedback controllers.

Hence, instead of (3) we consider the special case

[TABLE]

where $A\in\mathbb{R}^{n\times n}$ , $B\in\mathbb{R}^{n\times p}$ , and $w$ is a fixed, but unknown, disturbance.

For a fixed $u$ , exponential stability of (-D.1) is equivalent to $A$ being Hurwitz (i.e., only having eigenvalues with negative real part) and consequently $A$ being invertible. Hence, the steady-state map takes the explicit form

[TABLE]

where $H:=-A^{-1}B$ and $R:=-A^{-1}$ .
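The steady-state map can be checked numerically. The sketch below assumes that (-D.1) reads $\dot{x}=Ax+Bu+w$, which is the form consistent with $H=-A^{-1}B$ and $R=-A^{-1}$. The matrices, input, and disturbance are hypothetical sample values.

```python
import numpy as np


def steady_state_maps(A, B):
    """Return H = -A^{-1} B and R = -A^{-1}; raise if A is not Hurwitz."""
    if np.max(np.linalg.eigvals(A).real) >= 0.0:
        raise ValueError("A must be Hurwitz (all eigenvalues with negative real part)")
    n = A.shape[0]
    H = -np.linalg.solve(A, B)
    R = -np.linalg.solve(A, np.eye(n))
    return H, R


# Hypothetical sample plant.
A = np.array([[-1.0, 4.0],
              [0.0, -2.0]])
B = np.array([[0.0],
              [1.0]])
H, R = steady_state_maps(A, B)

u = np.array([0.5])
w = np.array([0.1, -0.2])
x_ss = H @ u + R @ w                # h(u) for the fixed disturbance w
residual = A @ x_ss + B @ u + w     # right-hand side of the assumed plant at x_ss
print("steady state:", x_ss)
print("residual:", residual)
```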

Furthermore, let $P\succ 0$ be such that

[TABLE]

and hence $W(x,u)=\frac{1}{2}\|x-h(u)\|_{P}^{2}$ is a Lyapunov function satisfying the conditions of Proposition II.1. In particular, we have $\gamma=2\tau\lambda^{\min}_{{P}}$ since

[TABLE]

and

[TABLE]

This allows us to directly state the following corollary to Theorem III.2:

Corollary D.3.

Consider a plant of the form (-D.1) with $P\succ 0$ and $\tau>0$ satisfying (-D.2). Further, let Assumption III.1 hold and let $\Phi(h(u),u)$ have compact level sets. Then, the same conclusions as in Theorem III.2 hold whenever

[TABLE]

where $L$ satisfies (9).

In particular, (-D.3) is satisfied if it holds that

[TABLE]

where $\kappa_{{P}}:=\lambda^{\max}_{{P}}/\lambda^{\min}_{{P}}$ is the condition number of $P$ .

Remark 6.

In [25], instead of (-D.2), the dynamical system is only required to satisfy $A^{T}P+PA\preceq-\mathbb{I}_{n}$ . This is more easily solvable, but yields a suboptimal estimate of the decay rate and therefore a more conservative stability bound. $\blacksquare$
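The conservatism mentioned in Remark 6 can be observed on a small example. With $P$ solving $A^{T}P+PA=-\mathbb{I}_{n}$ and $V(x)=x^{T}Px$, one has $\dot{V}=-\|x\|^{2}\leq-V/\lambda^{\max}_{P}$, which certifies a decay rate of $1/(2\lambda^{\max}_{P})$ for $\|x\|$. The sketch below compares this certified rate with the slowest eigenvalue of $A$. The matrix and all function names are hypothetical, and the Lyapunov equation is solved through its Kronecker form rather than a dedicated solver.

```python
import numpy as np


def solve_lyapunov_identity(A):
    """Solve A^T P + P A = -I via the vectorized (Kronecker) linear system."""
    n = A.shape[0]
    eye = np.eye(n)
    kron_sum = np.kron(eye, A.T) + np.kron(A.T, eye)
    P = np.linalg.solve(kron_sum, -eye.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)   # symmetrize against round-off


# Hypothetical non-normal, Hurwitz sample matrix.
A = np.array([[-1.0, 4.0],
              [0.0, -2.0]])
P = solve_lyapunov_identity(A)

rate_from_P = 1.0 / (2.0 * np.linalg.eigvalsh(P).max())   # certified by V = x^T P x
true_rate = -np.linalg.eigvals(A).real.max()              # slowest mode of A

print("P =", P)
print("certified decay rate:", rate_from_P)
print("spectral decay rate: ", true_rate)
```

For this non-normal matrix the certified rate is well below the spectral rate, which illustrates why a tighter certificate can yield a less conservative stability bound.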

Bibliography

[1] Z. E. Nelson and E. Mallada, “An integral quadratic constraint framework for real-time steady-state optimization of linear time-invariant systems,” in 2018 Annual American Control Conference (ACC), Milwaukee, WI, Jun. 2018, pp. 597–603.
[2] M. Colombino, E. Dall’Anese, and A. Bernstein, “Online Optimization as a Feedback Controller: Stability and Tracking,” IEEE Trans. Control Netw. Syst., 2019.
[3] F. D. Brunner, H. Dürr, and C. Ebenbauer, “Feedback design for multi-agent systems: A saddle point approach,” in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), Dec. 2012, pp. 3783–3789.
[4] L. S. P. Lawrence, Z. E. Nelson, E. Mallada, and J. W. Simpson-Porco, “Optimal Steady-State Control for Linear Time-Invariant Systems,” in 2018 IEEE Conference on Decision and Control (CDC), Miami Beach, FL, Dec. 2018, pp. 3251–3257.
[5] C. E. Garcia and M. Morari, “Optimal operation of integrated processing systems: Part II: Closed-loop on-line optimizing control,” AIChE J., vol. 30, no. 2, pp. 226–234, Mar. 1984.
[6] S. Skogestad, “Self-optimizing control: The missing link between steady-state optimization and control,” Comput. Chem. Eng., vol. 24, no. 2-7, pp. 569–575, Jul. 2000.
[7] A. Marchetti, B. Chachuat, and D. Bonvin, “Modifier-Adaptation Methodology for Real-Time Optimization,” Ind. Eng. Chem. Res., vol. 48, no. 13, pp. 6022–6033, Jul. 2009.
[8] G. Francois and D. Bonvin, “Measurement-Based Real-Time Optimization of Chemical Processes,” Adv. Chem. Eng., vol. 43, pp. 1–508, 2013.
