A Fundamental Convergence Rate Bound for Gradient Based Online Optimization Algorithms with Exact Tracking

Alex Xinting Wu; Ian R. Petersen; Iman Shames

arXiv:2508.21335·math.OC·September 12, 2025


TL;DR

This paper establishes a fundamental convergence rate bound for gradient-based online optimization algorithms with integral action, demonstrating how to achieve optimal tracking of time-varying quadratic cost function optima.

Contribution

It introduces a convergence rate bound for linear gradient algorithms with integral action, based on the internal model principle, for tracking polynomially varying optima.

Findings

01

Derived a convergence rate bound depending on the condition number and polynomial order.

02

Constructed algorithms that attain the optimal convergence rate.

03

Achieved zero steady-state error in tracking the optimal point.

Abstract

In this paper, we consider algorithms with integral action for solving online optimization problems characterized by quadratic cost functions with a time-varying optimal point described by an $(n-1)$ th order polynomial. Using a version of the internal model principle, the optimization algorithms under consideration are required to incorporate a discrete time $n$ -th order integrator in order to achieve exact tracking. By using results on an optimal gain margin problem, we obtain a fundamental convergence rate bound for the class of linear gradient based algorithms exactly tracking a time-varying optimal point. This convergence rate bound is given by $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{\frac{1}{n}}$ , where $\kappa$ is the condition number for the set of cost functions under consideration. Using our approach, we also construct algorithms which achieve the optimal…

Equations

$$x^{*}(t)=\arg\min_{x}f(x,t),$$

$$x(t+1)=x(t)+\sum_{j=0}^{k-1}\beta_{j}\left(x(t-j)-x(t-j-1)\right)-\sum_{j=0}^{k}\alpha_{j}\nabla_{x}f(x(t-j),t-j),$$

$$\sum_{j=0}^{k}\alpha_{j}\neq 0.$$

$$f(x,t)=\frac{1}{2}\left(x-x^{*}(t)\right)^{T}\Delta\left(x-x^{*}(t)\right)+c(x^{*}(t)),$$

$$x^{*}(t)=a_{0}+a_{1}t+\dots+a_{n-1}t^{n-1},$$

$$mI\leq\Delta\leq LI,\quad\Delta=\Delta^{T}\in\mathbb{R}^{p\times p},\quad c(x^{*}(t))\in\mathbb{R}.$$

$$\sum_{j=0}^{k-1}\beta_{j}(\hat{k}-j-1)_{r}=(\hat{k})_{r},$$

$$x(t+1)=x(t)-\alpha\nabla f(x(t))+\beta\left(x(t)-x(t-1)\right).$$

$$\alpha=\alpha_{HB}=\frac{4}{(\sqrt{L}+\sqrt{m})^{2}},\qquad\beta=\beta_{HB}=\left(\frac{\sqrt{L}-\sqrt{m}}{\sqrt{L}+\sqrt{m}}\right)^{2}.$$

$$x(t+1)=\cdots-\sum_{j=0}^{l}\alpha_{j}\nabla_{x}f(y(t-j)),\qquad y(t)=\cdots$$

$$\sum_{j=0}^{l}\alpha_{j}\neq 0,\qquad\sum_{\nu=0}^{k-l}\gamma_{\nu}=1.$$

$$\rho_{HB}=\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}=\frac{\sqrt{L}-\sqrt{m}}{\sqrt{L}+\sqrt{m}}.$$

$$\tilde{x}(t+1)=\tilde{x}(t)+\sum_{j=0}^{k-1}\beta_{j}\left(\tilde{x}(t-j)-\tilde{x}(t-j-1)\right)-\sum_{j=0}^{k}\alpha_{j}\nabla_{x}\tilde{f}(\tilde{x}(t-j)),$$

$$\tilde{x}(t)=x(t)-a_{0}-a_{1}t-a_{2}t^{2}-\dots-a_{n-1}t^{n-1}.$$

$$f(x,t)=\tilde{f}(\tilde{x}(t))$$

$$\nabla_{x}f(x,t)=\nabla_{x}\tilde{f}(x(t)-x^{*}(t))=\nabla_{\tilde{x}}\tilde{f}(\tilde{x}(t)).$$

$$\sum_{j=0}^{k}\alpha_{j}\nabla_{x}f(x(t-j),t-j)=\sum_{j=0}^{k}\alpha_{j}\nabla_{x}\tilde{f}(\tilde{x}(t-j)).$$

$$\begin{aligned}\sum_{j=0}^{k-1}\beta_{j}\left(\tilde{x}(t-j)-\tilde{x}(t-j-1)\right)&=\sum_{j=0}^{k-1}\beta_{j}\left[\left(x(t-j)-x^{*}(t-j)\right)-\left(x(t-j-1)-x^{*}(t-j-1)\right)\right]\\&=\sum_{j=0}^{k-1}\beta_{j}\left(x(t-j)-x(t-j-1)\right)-\sum_{j=0}^{k-1}\beta_{j}\left(x^{*}(t-j)-x^{*}(t-j-1)\right).\end{aligned}$$

$$\tilde{x}(t+1)=x(t+1)-x^{*}(t+1).$$

$$\tilde{x}(t+1)=x(t)-x^{*}(t+1)+\sum_{j=0}^{k-1}\beta_{j}\left(x(t-j)-x(t-j-1)\right)-\sum_{j=0}^{k}\alpha_{j}\nabla_{x}f(x(t-j),t-j).$$

$$\tilde{x}(t+1)=\tilde{x}(t)+\sum_{j=0}^{k-1}\beta_{j}\left(\tilde{x}(t-j)-\tilde{x}(t-j-1)\right)-\sum_{j=0}^{k}\alpha_{j}\nabla_{x}\tilde{f}(\tilde{x}(t-j))+f_{0}(t),$$

$$f_{0}(t)=\sum_{j=0}^{k-1}\beta_{j}\left(x^{*}(t-j)-x^{*}(t-j-1)\right)-\left(x^{*}(t+1)-x^{*}(t)\right).$$


Full text

A Fundamental Convergence Rate Bound for Gradient Based Online Optimization Algorithms with Exact Tracking

Alex (Xinting) Wu, Ian R. Petersen, Iman Shames

This work was supported by the Australian Research Council under grants DP230102443 and DP210102454. A. Wu, I. R. Petersen and I. Shames are with the CIICADA Lab, School of Engineering, The Australian National University, Canberra, ACT 2601, Australia.

Abstract

In this paper, we consider algorithms with integral action for solving online optimization problems characterized by quadratic cost functions with a time-varying optimal point described by an $(n-1)$ th order polynomial. Using a version of the internal model principle, the optimization algorithms under consideration are required to incorporate a discrete time $n$ -th order integrator in order to achieve exact tracking. By using results on an optimal gain margin problem, we obtain a fundamental convergence rate bound for the class of linear gradient based algorithms exactly tracking a time-varying optimal point. This convergence rate bound is given by $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{\frac{1}{n}}$ , where $\kappa$ is the condition number for the set of cost functions under consideration. Using our approach, we also construct algorithms which achieve the optimal convergence rate as well as zero steady-state error when tracking a time-varying optimal point.

I INTRODUCTION

Online optimization [1, 2, 3] has emerged as an important area of research with implications across various fields, including real-time control systems, signal processing, and machine learning. An online optimization problem involves making a sequence of decisions in real time, where the objective function changes continually. The primary challenge lies in efficiently solving these problems while ensuring that the iterates converge asymptotically to the time-varying solutions. This motivates the development of algorithms capable of fast convergence and zero steady-state error. However, many existing methods for online optimization, such as those of [1, 3], do not achieve zero steady-state error.

In recent years, various methods have been employed for designing optimization algorithms and analyzing their convergence behavior, utilizing techniques from control theory, including the internal model principle [2], integral quadratic constraints (IQC) [4, 5], Lyapunov-based analysis [6] and dissipative systems [7]. More recently, [8, 9, 10] build on the connection between the optimal gain margin problem [11] and optimization algorithms, with a particular focus on the heavy ball method. This link has provided new insights into understanding the convergence behavior and stability of optimization algorithms by drawing parallels with control theory concepts.

In this paper, we consider a class of online optimization problems with quadratic cost functions where the optimal point varies polynomially with the time step. The internal model principle suggests that in order to achieve zero steady-state error, an algorithm needs to incorporate a suitable number of discrete time integrators [12]. The paper extends the results of [12] by using the optimal gain margin approach of [11] to establish a fundamental convergence rate bound of $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{\frac{1}{n}}$ for all linear gradient based algorithms which achieve zero steady-state error when tracking a polynomially varying optimal point of order $n-1$ . Here, $\kappa$ is the condition number for the class of cost functions, which will be defined in the sequel. In addition, any such algorithm which achieves zero steady-state error for a set of cost functions that includes the set of quadratic cost functions under consideration will have a convergence rate which satisfies this bound. Moreover, the paper shows that this convergence rate bound is tight in that we give a procedure to construct algorithms which achieve the bound.

A conference version of this paper was presented in [13]. In this paper, we extend the results of [13] by generalizing from a linearly varying optimal point to a polynomially varying optimal point and by establishing a fundamental convergence rate bound for gradient based optimization algorithms with exact tracking.

This paper is organized as follows: Section II presents the problem formulation and preliminary results. In Section III, we describe and prove our main results. Section IV presents an illustrative example. Finally, the paper is concluded in Section V.

II Problem formulation and Preliminary Results

In this section, we formulate the class of optimization problems under consideration and review some existing optimization methods and their convergence rates, alongside relevant preliminary results.

II-A Problem formulation

We consider unconstrained optimization problems of the form:

$$x^{*}(t)=\arg\min_{x}f(x,t), \tag{1}$$

where $f:\mathbb{R}^{p+1}\rightarrow\mathbb{R}$ is a quadratic cost function with a unique minimum attained at $x^{*}(t)$ .

To address this problem, we consider gradient based algorithms similar to the general algorithms considered in [8] of the following form:

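$$x(t+1)=x(t)+\sum_{j=0}^{k-1}\beta_{j}\left(x(t-j)-x(t-j-1)\right)-\sum_{j=0}^{k}\alpha_{j}\nabla_{x}f(x(t-j),t-j), \tag{2}$$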

where $k\in\mathbb{N},\alpha_{j}\in\mathbb{R},\beta_{j}\in\mathbb{R}$ and

$$\sum_{j=0}^{k}\alpha_{j}\neq 0. \tag{3}$$

In (2), $x(t)\in\mathbb{R}^{p}$ represents the current iteration point and $t\in\{0,1,\ldots\}$ denotes the iteration index. The number $k$ denotes the number of past iterates considered in the optimization algorithm and the number of gradient evaluations at each iteration. Since $k$ corresponds to the number of past iterates, the algorithm needs to be initialized with the values $x(0),x(1),\dots,x(k)$ . The parameters $\alpha_{j}$ and $\beta_{j}$ are the algorithm parameters, and $\nabla_{x}f(x(t),t)$ denotes the gradient of the cost function. It is straightforward to verify that this class of algorithms includes all linear gradient based optimization algorithms with constant step size.

We consider a class of time-varying quadratic cost functions defined as follows.

Definition 1

Given $L\geq m>0$ , let $\mathscr{F}^{q}_{m,L}$ denote the class of time-varying quadratic cost functions $f(x,t)$ of the following form:

$$f(x,t)=\frac{1}{2}\left(x-x^{*}(t)\right)^{T}\Delta\left(x-x^{*}(t)\right)+c(x^{*}(t)), \tag{4}$$

where the time-varying optimal point $x^{*}(t)$ is of the form

$$x^{*}(t)=a_{0}+a_{1}t+\dots+a_{n-1}t^{n-1}, \tag{5}$$

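and $a_{i}\in\mathbb{R}^{p}$ are constant vectors. In (4),

$$mI\leq\Delta\leq LI,\quad\Delta=\Delta^{T}\in\mathbb{R}^{p\times p},\quad c(x^{*}(t))\in\mathbb{R}. \tag{6}$$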

We also define $\kappa=\frac{L}{m}$ .

In the sequel, we will show that in order for the algorithm (2) to track the polynomially varying optimal point defined in (5), it is necessary that it satisfies the following condition.

Condition 1

The algorithm (2) is such that

$$\sum_{j=0}^{k-1}\beta_{j}(\hat{k}-j-1)_{r}=(\hat{k})_{r}, \tag{7}$$

for any $0\leq r\leq n-2$ and any $\hat{k}\geq k$ , where $(\cdot)_{r}$ denotes the falling factorial of order $r$ [14]; i.e., $(\hat{k})_{r}=\hat{k}(\hat{k}-1)...(\hat{k}-r+1)$ .

In the sequel, we will demonstrate that Condition 1 corresponds to the requirement that the algorithm includes at least $n$ discrete time integrators and so obtains zero steady-state error with an optimal point of the form (5) according to the internal model principle; e.g., see [2].
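As a minimal sketch of how Condition 1 can be checked numerically, the following Python snippet evaluates both sides of the falling-factorial identity for a given coefficient vector $\beta$ . The function names `falling_factorial` and `satisfies_condition_1` are hypothetical helpers introduced here for illustration, and the check is only a spot check over a finite range of $\hat{k}\geq k$ , under the assumption that $\beta=(\beta_{0},\dots,\beta_{k-1})$ .

```python
from fractions import Fraction


def falling_factorial(x, r):
    """Return (x)_r = x (x-1) ... (x-r+1), with (x)_0 = 1."""
    result = 1
    for i in range(r):
        result *= (x - i)
    return result


def satisfies_condition_1(beta, n, extra=5):
    """Spot-check sum_j beta_j (k_hat-j-1)_r == (k_hat)_r.

    The identity is tested for r = 0, ..., n-2 and for
    k_hat = k, ..., k + extra, where k = len(beta).
    Exact rational arithmetic avoids floating point tolerances.
    """
    k = len(beta)
    beta = [Fraction(b) for b in beta]
    for r in range(n - 1):
        for k_hat in range(k, k + extra + 1):
            lhs = sum(b * falling_factorial(k_hat - j - 1, r)
                      for j, b in enumerate(beta))
            if lhs != falling_factorial(k_hat, r):
                return False
    return True


# beta = (1) gives a double integrator (n = 2);
# beta = (2, -1) gives a triple integrator (n = 3).
print(satisfies_condition_1([1], 2))
print(satisfies_condition_1([2, -1], 3))
print(satisfies_condition_1([1], 3))
```

For $n=2$ the condition reduces to $\sum_{j}\beta_{j}=1$ , and for $n=3$ it additionally requires $\sum_{j}(j+1)\beta_{j}=0$ , which is why $\beta=(2,-1)$ passes the second check while $\beta=(1)$ fails the third.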

In this paper, we consider online optimization algorithms of the form (2) to track the minimum in (1) where $f(x,t)\in\mathscr{F}^{q}_{m,L}$ . The algorithms (2) can be reformulated as uncertain linear feedback systems, enabling us to analyze their convergence behavior through the lens of an optimal gain margin problem. This work extends the findings of [8] by considering online optimization problems in which the optimal point varies polynomially with time. It also extends the findings of [12] by considering the optimal convergence rate with respect to all algorithms of the form (2). In particular, we establish a fundamental convergence rate bound for all algorithms of the form (2) that achieve zero steady-state error for time-varying cost functions of the form $f(x,t)\in\mathscr{F}^{q}_{m,L}$ . This convergence rate bound is given by the formula $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{\frac{1}{n}}=\left(\frac{\sqrt{L}-\sqrt{m}}{\sqrt{L}+\sqrt{m}}\right)^{\frac{1}{n}}$ . In addition, we show that this bound is tight in the sense that we can construct an algorithm which achieves this bound. This algorithm is constructed using a Blaschke product solution to a corresponding Nevanlinna Pick interpolation problem [15, 16].

II-B The Heavy Ball Method

The heavy ball method [17] is one of the most well-known methods for solving a time-invariant version of optimization problem (1) in which the cost function $f(x,t)$ is independent of $t$ . This approach enhances the standard gradient descent algorithm by incorporating a momentum term, resulting in an optimization algorithm of the form:

$$x(t+1)=x(t)-\alpha\nabla f(x(t))+\beta\left(x(t)-x(t-1)\right). \tag{8}$$

The heavy ball method accelerates convergence by combining the current gradient with information from the previous step. With Polyak’s choice of parameters, this method exhibits the fastest convergence rate for time-invariant cost functions $f(x)\in\mathscr{F}^{q}_{m,L}$ [8]. These parameters are defined as:

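$$\alpha=\alpha_{\text{\scriptsize HB}}=\frac{4}{(\sqrt{L}+\sqrt{m})^{2}},\qquad\beta=\beta_{\text{\scriptsize HB}}=\left(\frac{\sqrt{L}-\sqrt{m}}{\sqrt{L}+\sqrt{m}}\right)^{2}. \tag{9}$$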

The paper [8] has shown that, for any time-invariant cost function $f(x)\in\mathscr{F}^{q}_{m,L}$ , the heavy ball method yields the optimal worst-case asymptotic convergence rate among all algorithms characterized by the following general form:

[TABLE]

where

$$\sum_{j=0}^{l}\alpha_{j}\neq 0,\qquad\sum_{\nu=0}^{k-l}\gamma_{\nu}=1.$$

The convergence rate of the heavy ball method with parameters (9) is calculated as in [8, 17]:

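$$\rho_{\text{\scriptsize HB}}=\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}=\frac{\sqrt{L}-\sqrt{m}}{\sqrt{L}+\sqrt{m}}. \tag{11}$$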

This formula then defines a bound on the convergence rate for all algorithms of the form (10) when applied to time-invariant cost functions; see [8, 18].
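The heavy ball rate can be checked numerically with a short sketch. For a quadratic cost, each eigenvalue $\lambda\in[m,L]$ of the Hessian gives the scalar recursion $x(t+1)=(1+\beta-\alpha\lambda)x(t)-\beta x(t-1)$ , whose characteristic polynomial is $z^{2}-(1+\beta-\alpha\lambda)z+\beta$ . The snippet below assumes the standard Polyak parameter choice $\alpha=4/(\sqrt{L}+\sqrt{m})^{2}$ , $\beta=\left((\sqrt{L}-\sqrt{m})/(\sqrt{L}+\sqrt{m})\right)^{2}$ , and the function names and the values $m=1$ , $L=16$ are illustrative choices, not taken from the paper.

```python
import numpy as np


def heavy_ball_params(m, L):
    """Polyak's heavy ball parameters (assumed standard form)."""
    sqrt_m, sqrt_L = np.sqrt(m), np.sqrt(L)
    alpha = 4.0 / (sqrt_L + sqrt_m) ** 2
    beta = ((sqrt_L - sqrt_m) / (sqrt_L + sqrt_m)) ** 2
    return alpha, beta


def heavy_ball_rate(alpha, beta, m, L, grid=201):
    """Largest root modulus of z^2 - (1 + beta - alpha*lam) z + beta
    over a grid of curvatures lam in [m, L]."""
    worst = 0.0
    for lam in np.linspace(m, L, grid):
        roots = np.roots([1.0, -(1.0 + beta - alpha * lam), beta])
        worst = max(worst, float(np.max(np.abs(roots))))
    return worst


m, L = 1.0, 16.0
alpha, beta = heavy_ball_params(m, L)
kappa = L / m
rho_hb = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
rate = heavy_ball_rate(alpha, beta, m, L)
print(rho_hb, rate)  # the two values should agree up to rounding
```

With these parameters the roots form a complex pair of modulus $\sqrt{\beta}$ for every $\lambda$ strictly between $m$ and $L$ , and a double real root at the endpoints, so the worst-case modulus over the grid matches $\rho_{\text{\scriptsize HB}}$ .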

II-C Preliminary results

Let $\Sigma(\alpha,\beta,m,L)$ denote an algorithm corresponding to the recursion (2) depending on parameters $\alpha=(\alpha_{0},\dots,\alpha_{k}),\beta=(\beta_{0},\dots,\beta_{k-1})$ and the set of cost functions $\mathscr{F}^{q}_{m,L}$ defined by (4), (5), (6) where $L>m>0$ .

The following lemma provides a time-invariant representation of the iterative algorithm (2) in the case that Condition 1 is satisfied. This provides an alternative approach to the internal model principle for exact tracking of polynomially varying signals.

Lemma II.1

Suppose $\alpha$ , $\beta$ , $m$ and $L$ are given with $L>m>0$ , and that the algorithm $\Sigma(\alpha,\beta,m,L)$ of the form (2) satisfies Condition 1. Then (2) can be rewritten as:

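$$\tilde{x}(t+1)=\tilde{x}(t)+\sum_{j=0}^{k-1}\beta_{j}\left(\tilde{x}(t-j)-\tilde{x}(t-j-1)\right)-\sum_{j=0}^{k}\alpha_{j}\nabla_{x}\tilde{f}(\tilde{x}(t-j)), \tag{12}$$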

where

$$\tilde{x}(t)=x(t)-a_{0}-a_{1}t-a_{2}t^{2}-\dots-a_{n-1}t^{n-1}. \tag{13}$$

Proof: To prove this lemma, note that it follows from (II.1) that

[TABLE]

where $\tilde{f}(\tilde{x})=\frac{1}{2}\tilde{x}^{T}\Delta\tilde{x}$ . Also,

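$$\nabla_{x}f(x,t)=\nabla_{x}\tilde{f}(x(t)-x^{*}(t))=\nabla_{\tilde{x}}\tilde{f}(\tilde{x}(t)). \tag{15}$$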

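It follows from (15) that

$$\sum_{j=0}^{k}\alpha_{j}\nabla_{x}f(x(t-j),t-j)=\sum_{j=0}^{k}\alpha_{j}\nabla_{x}\tilde{f}(\tilde{x}(t-j)). \tag{16}$$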

In addition,

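$$\begin{aligned}\sum_{j=0}^{k-1}\beta_{j}\left(\tilde{x}(t-j)-\tilde{x}(t-j-1)\right)&=\sum_{j=0}^{k-1}\beta_{j}\left[\left(x(t-j)-x^{*}(t-j)\right)-\left(x(t-j-1)-x^{*}(t-j-1)\right)\right]\\&=\sum_{j=0}^{k-1}\beta_{j}\left(x(t-j)-x(t-j-1)\right)-\sum_{j=0}^{k-1}\beta_{j}\left(x^{*}(t-j)-x^{*}(t-j-1)\right).\end{aligned} \tag{17}$$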

Also, it follows from (13) that

$$\tilde{x}(t+1)=x(t+1)-x^{*}(t+1). \tag{18}$$

Substituting (2) into (18), we obtain

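$$\tilde{x}(t+1)=x(t)-x^{*}(t+1)+\sum_{j=0}^{k-1}\beta_{j}\left(x(t-j)-x(t-j-1)\right)-\sum_{j=0}^{k}\alpha_{j}\nabla_{x}f(x(t-j),t-j). \tag{19}$$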

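Substituting (13), (16) and (17) into (19) yields

$$\tilde{x}(t+1)=\tilde{x}(t)+\sum_{j=0}^{k-1}\beta_{j}\left(\tilde{x}(t-j)-\tilde{x}(t-j-1)\right)-\sum_{j=0}^{k}\alpha_{j}\nabla_{x}\tilde{f}(\tilde{x}(t-j))+f_{0}(t), \tag{20}$$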

where

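$$f_{0}(t)=\sum_{j=0}^{k-1}\beta_{j}\left(x^{*}(t-j)-x^{*}(t-j-1)\right)-\left(x^{*}(t+1)-x^{*}(t)\right). \tag{21}$$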

Thus, we need to show that $f_{0}(t)\equiv 0$ . Indeed, it follows from (5) that $f_{0}(t)\equiv 0$ if

[TABLE]

for $\hat{n}=1,2,\dots,n-1$ . To verify (22), we introduce Stirling numbers of the second kind; e.g., see [14, Section 6.1] which gives the following relations

[TABLE]

where $\{\cdot\}$ denotes the Stirling numbers of the second kind and $(t)_{s}$ denotes the falling factorial. Therefore

[TABLE]

Similarly,

[TABLE]

It is straightforward to verify that $r=s-1$ and it follows from (7) that

[TABLE]

Since $\beta_{j}=0$ for $j>k$ , it follows that

[TABLE]

Substituting (II-C) into (II-C), we obtain

[TABLE]

Thus, it follows from (II-C) that the left hand side of (22) equals the right hand side. Hence, $f_{0}(t)\equiv 0$ and we can conclude that (2) is equivalent to (12). $\blacksquare$

Remark 1

It follows immediately from this lemma that an algorithm of the form (2) satisfying Condition 1 will achieve convergence with zero steady-state error for all $f(x,t)\in\mathscr{F}^{q}_{m,L}$ if and only if the corresponding algorithm (12) is such that $\tilde{x}(t)\rightarrow 0$ as $t\rightarrow\infty$ for all $\tilde{f}(\tilde{x})\in\mathscr{F}^{q}_{m,L}$ .
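The effect of integral action can be illustrated with a small scalar simulation. The sketch below tracks a linearly varying optimum $x^{*}(t)=a_{0}+a_{1}t$ (so $n=2$ ) for the cost $f(x,t)=\frac{\lambda}{2}(x-x^{*}(t))^{2}$ . It compares plain gradient descent with a two-step recursion $x(t+1)=x(t)+\beta_{0}(x(t)-x(t-1))-\alpha_{0}\nabla f(x(t),t)-\alpha_{1}\nabla f(x(t-1),t-1)$ using $\beta_{0}=1$ , which satisfies Condition 1 for $n=2$ . The recursion form and all numerical values ( $\alpha_{0}=1.5$ , $\alpha_{1}=-0.75$ , $\lambda=1$ ) are assumptions chosen for illustration, not parameters from the paper.

```python
def gradient_descent_error(a0, a1, lam, step, steps):
    """Tracking error x(T) - x*(T) of plain gradient descent."""
    x = 0.0
    for t in range(steps):
        x_star = a0 + a1 * t
        x = x - step * lam * (x - x_star)
    return x - (a0 + a1 * steps)


def integral_action_error(a0, a1, lam, alpha0, alpha1, steps):
    """Tracking error of a two-step recursion with beta_0 = 1,
    i.e. with two discrete time integrators."""
    def grad(x, t):
        return lam * (x - (a0 + a1 * t))

    x_prev, x = 0.0, 0.0  # initial iterates x(0) and x(1)
    for t in range(1, steps):
        x_next = (x + (x - x_prev)
                  - alpha0 * grad(x, t)
                  - alpha1 * grad(x_prev, t - 1))
        x_prev, x = x, x_next
    return x - (a0 + a1 * steps)


gd_err = gradient_descent_error(a0=1.0, a1=0.1, lam=1.0, step=0.5, steps=200)
ia_err = integral_action_error(a0=1.0, a1=0.1, lam=1.0,
                               alpha0=1.5, alpha1=-0.75, steps=200)
print(gd_err, ia_err)
```

In this example the gradient descent error obeys $e(t+1)=(1-0.5\lambda)e(t)-a_{1}$ and settles at $-a_{1}/(0.5\lambda)=-0.2$ , whereas the error of the two-step recursion obeys $\tilde{x}(t+1)=0.5\tilde{x}(t)-0.25\tilde{x}(t-1)$ and decays to zero geometrically.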

We can now reformulate (12) as an uncertain linear feedback system:

[TABLE]

where $\tilde{\chi}(t)=[\tilde{x}(t-k)^{T}\quad\dots\quad\tilde{x}(t)^{T}]^{T}\in\mathbb{R}^{(k+1)p}$ , $A\in\mathbb{R}^{(k+1)p\times(k+1)p},\tilde{B}\in\mathbb{R}^{(k+1)p\times(k+1)p},\tilde{C}\in\mathbb{R}^{(k+1)p\times(k+1)p}$ ,

[TABLE]

and

[TABLE]

Here,

[TABLE]

and $\boldsymbol{\Delta}$ is a block diagonal matrix with $k+1$ copies of the symmetric matrix $\Delta$ on its diagonal,

[TABLE]

where $\Delta$ satisfies (6), $\otimes$ denotes the Kronecker product, $I_{k}$ is the $k\times k$ identity matrix, and 0 denotes the zero matrix or vector of appropriate dimension.

Define $\hat{A}(\boldsymbol{\Delta})=A-\tilde{B}\boldsymbol{\Delta}\tilde{C}$ , where $A,\tilde{B},\tilde{C}$ are defined as in (30). We also define $\hat{A}_{0}(\lambda)=A_{0}-\lambda\tilde{B}_{0}\tilde{C}_{0}$ where $m\leq\lambda\leq L$ .

The following lemma, which follows from [19, Theorem 10.1.4, p. 301] and [8, Theorem 1], relates the convergence rate of the algorithm (12) to the spectral radius of $\hat{A}_{0}(\lambda)$ .

Lemma II.2

Suppose $\alpha$ , $\beta$ , $m$ and $L$ are such that the algorithm (12) is globally convergent with the convergence rate

[TABLE]

Then

[TABLE]

Here, $\rho(\cdot)$ denotes the spectral radius of a matrix.

In order to apply this lemma to calculate the convergence rate of the algorithm (12) and hence the algorithm (2), we first consider the characteristic polynomial of $\hat{A}_{0}(\lambda)$ . Indeed, we compute $\hat{A}_{0}(\lambda)$ as

[TABLE]

where

[TABLE]

Then we compute the characteristic equation of $\hat{A_{0}}(\lambda)$ as

[TABLE]

Equation (41) can be written as $1+\lambda\tilde{P}(z)\tilde{K}(z)=0$ , where

[TABLE]
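The role of the spectral radius in Lemma II.2 can be sketched numerically. For a scalar curvature $\lambda$ , the error recursion reads $\tilde{x}(t+1)=\tilde{x}(t)+\sum_{j=0}^{k-1}\beta_{j}(\tilde{x}(t-j)-\tilde{x}(t-j-1))-\lambda\sum_{j=0}^{k}\alpha_{j}\tilde{x}(t-j)$ , and the worst-case rate is the largest spectral radius of its companion matrix over $\lambda\in[m,L]$ . The snippet below is a hedged illustration: the recursion form is an assumption about (2), the state ordering (newest iterate first) differs from the paper's $\tilde{\chi}(t)$ but leaves the eigenvalues unchanged, and `closed_loop_matrix` and `worst_case_rate` are hypothetical names. The supremum over $\lambda$ is approximated on a finite grid.

```python
import numpy as np


def closed_loop_matrix(alpha, beta, lam):
    """Companion matrix of the scalar error recursion for curvature lam.

    alpha = (alpha_0, ..., alpha_k), beta = (beta_0, ..., beta_{k-1}).
    State ordering: [x(t), x(t-1), ..., x(t-k)].
    """
    k = len(alpha) - 1
    assert len(beta) == k
    c = np.zeros(k + 1)          # c[i] multiplies x(t - i)
    c[0] += 1.0
    for j, b in enumerate(beta):
        c[j] += b
        c[j + 1] -= b
    c -= lam * np.asarray(alpha, dtype=float)
    A = np.zeros((k + 1, k + 1))
    A[0, :] = c                  # new iterate
    A[1:, :-1] = np.eye(k)       # shift the history by one step
    return A


def worst_case_rate(alpha, beta, m, L, grid=201):
    """Grid approximation of sup over lam in [m, L] of the spectral radius."""
    worst = 0.0
    for lam in np.linspace(m, L, grid):
        eigs = np.linalg.eigvals(closed_loop_matrix(alpha, beta, lam))
        worst = max(worst, float(np.max(np.abs(eigs))))
    return worst


# Heavy ball with Polyak's parameters for m = 1, L = 16 (k = 1, alpha_1 = 0).
print(worst_case_rate([0.16, 0.0], [0.36], 1.0, 16.0))
```

For the heavy ball parameters in the usage line, the computed value should be close to $\rho_{\text{\scriptsize HB}}=0.6$ .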

The following lemma relates Condition 1 to the condition that the algorithm (2) incorporates at least $n$ discrete time integrations.

Lemma II.3

Condition 1 is satisfied if and only if the polynomial $\tilde{D}(z)$ defined in (42) has at least $n-1$ roots at $z=1$ .

Proof: Let $\hat{k}\geq k$ be given and let $\bar{k}=\hat{k}-k\geq 0$ . Also, define

[TABLE]

From this, it follows that $\tilde{D}(z)$ will have at least $n-1$ roots at $z=1$ if and only if $\hat{D}(z)$ has at least $n-1$ roots at $z=1$ .

We now consider the first $n-1$ derivatives of $\hat{D}(z)$ :

[TABLE]

for $r=0,1,\dots,n-2$ . Then $\hat{D}(z)$ will have at least $n-1$ roots at $z=1$ , if and only if

[TABLE]

for $r=0,1,\dots,n-2$ . This is equivalent to

[TABLE]

for $r=0,1,\dots,n-2$ . This is Condition 1, thereby verifying the lemma. $\blacksquare$

III Main Results

The main result of this paper, which is a fundamental convergence rate bound for algorithms of the form (2), is stated in the following theorem.

Theorem 1

Let $L>m>0$ and $\kappa=\frac{L}{m}$ be given. Then any algorithm $\Sigma(\alpha,\beta,m,L)$ of the form (2) which is globally convergent with zero steady-state error for any $f(x,t)\in\mathscr{F}_{m,L}^{q}$ will have a convergence rate $r(\alpha,\beta,m,L)$ which satisfies:

$$r(\alpha,\beta,m,L)\geq\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{\frac{1}{n}}.$$

The following corollary follows immediately from the above theorem.

Corollary 1.1

Let $L>m>0$ and $\kappa=\frac{L}{m}$ be given. Then any algorithm $\Sigma(\alpha,\beta,m,L)$ which is globally convergent with zero steady-state error for any set of cost functions $\mathscr{F}\supset\mathscr{F}_{m,L}^{q}$ will have a convergence rate which satisfies the bound of Theorem 1.
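To get a feel for how the bound $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{\frac{1}{n}}$ depends on the condition number $\kappa$ and the number of integrators $n$ , it can be evaluated directly. The helper name `rate_bound` and the sample values below are illustrative choices.

```python
import math


def rate_bound(kappa, n):
    """Evaluate ((sqrt(kappa) - 1) / (sqrt(kappa) + 1)) ** (1 / n)."""
    if kappa < 1 or n < 1:
        raise ValueError("require kappa >= 1 and n >= 1")
    s = math.sqrt(kappa)
    return ((s - 1.0) / (s + 1.0)) ** (1.0 / n)


for n in (1, 2, 3):
    print(n, rate_bound(16.0, n))
```

For a fixed $\kappa>1$ the bound increases towards one as $n$ grows, so exact tracking of higher order polynomial optima comes at the price of a slower guaranteed convergence rate. For $n=1$ the bound reduces to the heavy ball rate $\rho_{\text{\scriptsize HB}}$ .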

The following theorem shows that the bound given in Theorem 1 is tight and gives the construction of an algorithm $\Sigma(\alpha^{*},\beta^{*},m,L)$ which achieves the bound.

Theorem 2

Let $L>m>0$ and $\kappa=\frac{L}{m}$ be given. Then there exists an algorithm $\Sigma(\alpha^{*},\beta^{*},m,L)$ of the form (2) which is globally convergent with zero steady-state error for $f(x,t)\in\mathscr{F}^{q}_{m,L}$ and has the convergence rate

$$r(\alpha^{*},\beta^{*},m,L)=\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{\frac{1}{n}}.$$

In this algorithm, $k=2n-1$ and the corresponding values of $\alpha^{*}$ and $\beta^{*}$ are given by

[TABLE]

for $j=0,1,\dots,2n,$ and

[TABLE]

for $j=0,1,\dots,2n-1,$ where

[TABLE]

In order to prove the above theorems, we introduce a series of lemmas.

Lemma III.1

The signal $x^{*}(t)$ in (5) has a $z$ -transform of the form

[TABLE]

where $B_{r}(z)$ is a polynomial of order $r+1$ such that $B_{r}(1)=r!$ for $r=0,1,\dots,n-1$ .

Proof: The proof proceeds by induction on $r$ . By inspection,

[TABLE]

where $B_{0}(z)=z$ is a polynomial of order 1 and $B_{0}(1)=1$ . Hence, the base case holds.

Now assume that for a given $r\geq 0$ , the following formula holds

[TABLE]

where $B_{r}(z)$ is a polynomial of order $r+1$ and $B_{r}(1)=r!$ . We now consider the case of $t^{r+1}$ and use a property of $z$ -transforms; e.g., see [20, Equation 2.9, p. 16]:

[TABLE]

where $B^{\prime}_{r}(z)$ denotes the derivative of $B_{r}(z)$ . Using (52), and the fact that $B_{r}(z)$ is a polynomial of order $r+1$ , it follows that $B_{r+1}(z)$ is a polynomial of order $r+2$ and

[TABLE]

Thus by induction, (51) holds for all $r\geq 0$ . From this, the lemma follows. $\blacksquare$
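The induction step can be checked numerically. Applying the $z$ -transform property $\mathcal{Z}\{t\,x(t)\}=-z\frac{d}{dz}X(z)$ to $B_{r}(z)/(z-1)^{r+1}$ gives the recursion $B_{r+1}(z)=z\left[(r+1)B_{r}(z)-(z-1)B^{\prime}_{r}(z)\right]$ with $B_{0}(z)=z$ . The sketch below, with the hypothetical helper name `z_transform_numerators`, builds these numerator polynomials with NumPy and evaluates them at $z=1$ .

```python
from math import factorial

from numpy.polynomial import Polynomial


def z_transform_numerators(r_max):
    """Return [B_0, ..., B_{r_max}] where Z{t^r} = B_r(z) / (z - 1)^(r + 1)."""
    z = Polynomial([0.0, 1.0])
    numerators = [z]  # B_0(z) = z, since Z{1} = z / (z - 1)
    for r in range(r_max):
        B_r = numerators[-1]
        # B_{r+1}(z) = z * ((r + 1) * B_r(z) - (z - 1) * B_r'(z))
        numerators.append(z * ((r + 1) * B_r - (z - 1) * B_r.deriv()))
    return numerators


for r, B_r in enumerate(z_transform_numerators(4)):
    print(r, B_r(1.0), factorial(r))
```

Evaluating the recursion at $z=1$ removes the derivative term and leaves $B_{r+1}(1)=(r+1)B_{r}(1)$ , which is the factorial identity used in the proof.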

The following lemma provides a necessary condition for achieving zero steady-state error, and amounts to a version of the internal model principle for the problem under consideration.

Lemma III.2

Given any $L>m>0$ , suppose an algorithm $\Sigma(\alpha,\beta,m,L)$ of the form (2) is globally convergent with zero steady-state error for any $f(x,t)\in\mathscr{F}_{m,L}^{q}$ . Then

[TABLE]

has at least $n$ roots at $z=1$ . That is, $\tilde{D}(z)$ has at least $n-1$ roots at $z=1$ .

Proof: In order to prove this lemma, we write the iterative algorithm (2) in terms of a Lur'e system by letting

[TABLE]

be the state vector, and defining the reference signal

[TABLE]

Also, let $A$ , $B$ and $C$ be matrices defined by

[TABLE]

with $A_{0}\in\mathbb{R}^{(k+1)\times(k+1)}$ defined in (32),

[TABLE]

It follows that for any $f(x,t)\in\mathscr{F}_{m,L}^{q}$ , (2) can be rewritten as

[TABLE]

where $\boldsymbol{\Delta}$ is defined as in (37). Since we have assumed that the algorithm guarantees convergence with zero steady-state error for any $f(x,t)\in\mathscr{F}_{m,L}^{q}$ , it follows that we will obtain convergence with zero steady-state error for $f(x,t)\in\mathscr{F}_{m,L}^{q}$ with $\Delta=\lambda I_{p}$ and $m\leq\lambda\leq L$ . In this case, $\boldsymbol{\Delta}=\lambda I$ and we can rewrite (61) as

[TABLE]

This system can be rewritten in state space form as

[TABLE]

The system (III) is illustrated in Figure 1.

Let $G(z)$ denote the transfer function $G(z)=C(zI-A)^{-1}B$ , which takes the form

[TABLE]

where the scalar transfer function $G_{0}(z)$ is given by

[TABLE]

where the denominator polynomial is

[TABLE]

Note that (64) follows from the fact that the system (55), (60), (III) is in controllable canonical form.

It follows from Figure 1 that the error signal $e(t)$ is such that

[TABLE]

leading to

[TABLE]

where $\Xi(z)$ is the $z$ -transform of $\xi(t)$ and $E(z)$ is the $z$ -transform of $e(t)$ .

Thus,

[TABLE]

with

[TABLE]

Applying the $z$ -transform to $\xi(t)$ , we obtain

[TABLE]

It follows from Lemma III.1 that

[TABLE]

Hence,

[TABLE]

Multiplying the numerator and denominator of this expression by $(z-1)^{n}$ , we rewrite (III) as

[TABLE]

Using the Final Value Theorem [20, Theorem 2.1], the steady-state error is

[TABLE]

Therefore, to achieve zero steady-state error, it is required that

[TABLE]

Using (III), this implies that the denominator of $G_{0}(z)$ must contain at least $n$ poles at $z=1$ . That is, $D(z)$ has at least $n$ roots at $z=1$ . Hence, $\tilde{D}(z)$ has at least $n-1$ roots at $z=1$ . $\blacksquare$

The proofs of our main results use results on the optimal gain margin problem as established in [11, Problem 2.4] and incorporate a related special case from the theoretical development in [8]. Generalizing the case in [8], our approach considers a plant that includes a discrete time $n$ -th order integrator, enabling the solution of the polynomially time-varying optimization problem with zero steady-state error. The required result is contained in the following lemma.

Lemma III.3

Consider the uncertain linear feedback system depicted in Figure 2, which comprises the plant $P(z)=\frac{1}{(z-1)^{n}}$ , which is a discrete time $n$ -th order integrator, an uncertain constant gain $\lambda$ and a compensator $K(z)$ . Let $\rho\in(0,1)$ be given and suppose $P(z)K(z)$ is strictly proper. If the compensator $K(z)$ places all of the poles of this system in the disk $|z|<\rho$ for all $\lambda\in[m,L]$ then

[TABLE]

where $\rho_{\text{\scriptsize HB}}$ is the convergence rate of the heavy ball method (11).

Proof: The proof of the lemma follows along the same lines as the proof of [8, Lemma 2], with modifications to accommodate our problem setting.

We introduce the sensitivity function $S(z)$ defined as

[TABLE]

The $n$ poles of $P(z)$ at $z=1$ are $n$ zeros of $S(z)$ . Also, the zeros of $P(z)$ at $z=\infty$ correspond to a zero of $1-S(z)$ , since $P(z)K(z)$ is strictly proper. Therefore, we conclude that

[TABLE]

with multiplicity of $n$ , and

[TABLE]

Define $\mathscr{G}\triangleq\left(-\infty,\frac{2m}{m-L}\right]\cup\left(\frac{2L}{L-m},+\infty\right)$ and $\mathscr{G}^{\textbf{C}}=\mathbb{C}\setminus\mathscr{G}$ , where $\mathbb{C}$ is the set of complex numbers.

It follows from [8, Lemma 2] and [11, Lemma 2.3] that placing the poles of the closed loop system within the open disk $|z|<\rho$ for all $\lambda\in[m,L]$ is equivalent to the existence of a sensitivity function $S(z)$ of the form (68) which is analytic in $\mathscr{H}_{\rho}\triangleq\{|z|\geq\rho\}\cup\{\infty\}$ , satisfies the conditions (69) and (70), and satisfies $S(z)\in\mathscr{G}^{\textbf{C}}$ for all $z\in\mathscr{H}_{\rho}$ . To determine the existence of such an $S(z)$ , we reformulate the problem as a Nevanlinna Pick interpolation problem, as described in [8, 11], using the commutative diagram in Figure 3.

In this figure,

[TABLE]

and the function $\varphi(z)=\rho z^{-1}$ maps $\mathscr{H}_{\rho}$ into $\bar{\mathscr{D}}$ . Similarly, it follows from [8, 11] that the function

[TABLE]

is analytic in $\mathscr{G}^{\textbf{C}}$ and maps $\mathscr{G}^{\textbf{C}}$ into $\mathscr{D}$ . Also, the function

[TABLE]

should be constructed to be analytic in $\bar{\mathscr{D}}$ , and map the closed unit disk $\bar{\mathscr{D}}$ into the open unit disk $\mathscr{D}$ .

As illustrated in the commutative diagram in Figure 3, the existence of a function $S(z)$ satisfying (69) and (70) which is analytic in $\mathscr{H}_{\rho}$ , and maps $\mathscr{H}_{\rho}$ into $\mathscr{G}^{\textbf{C}}$ , is equivalent to a Nevanlinna Pick interpolation problem for the function $\hat{S}(z)$ .

The Nevanlinna Pick interpolation problem is defined as follows. Given a set of points $\{z_{i}\}$ in the closed unit disk $\bar{\mathscr{D}}$ and a corresponding set of target values $\{w_{i}\}$ , find a function $\hat{S}(z)$ which is analytic in $\mathscr{D}$ such that $|\hat{S}(z)|<1$ for all $z\in\mathscr{D}$ , where $\mathscr{D}$ is the open unit disk, and satisfying the interpolation conditions $\hat{S}(z_{i})=w_{i}$ with specified multiplicities.

It follows from (69) and (70) that the required interpolation values for $\hat{S}(z)$ are

[TABLE]

with multiplicity $n$ , and

[TABLE]

Note that it is straightforward to verify that $\theta(1)=\rho_{\text{\scriptsize HB}}$ .

Handling an interpolation condition with multiplicities requires imposing conditions on the derivatives of $\hat{S}(z)$ , which complicates the interpolation problem. To avoid this complexity, we introduce small perturbations to the plant’s poles by redefining the plant transfer function as

[TABLE]

where $n$ denotes the number of integrators with $n>1$ and the $\varepsilon_{i}$ are small positive parameters. As $\varepsilon_{i}\rightarrow 0$ , the plant $P(z)$ reduces to $\frac{1}{(z-1)^{n}}$ , which represents an $n$ -th order integrator. This modification allows us to avoid multiplicities in the interpolation conditions by ensuring that the poles and zeros of $P(z)$ are distinct, thereby simplifying the Nevanlinna-Pick interpolation problem. In this modified problem, the poles of $P(z)$ at $z=1+\varepsilon_{i}$ are zeros of $S(z)$ . Therefore, the interpolation conditions of $S(z)$ become

[TABLE]

all with multiplicity one. By perturbing the interpolation points in this manner, we obtain modified interpolation conditions on $\hat{S}(z)$ , which yields

[TABLE]

all with multiplicity one.

We are now in a position to determine the existence of such an $\hat{S}(z)$ . By [21, Theorem 2.2], there exists an interpolation function $\hat{S}(z)$ with $|\hat{S}(z)|<1$ for all $z\in\mathscr{D}$ that satisfies (78)-(79) if and only if the Pick matrix $M\in\mathbb{R}^{(n+1)\times(n+1)}$ written as

[TABLE]

is positive semi-definite. In (80), $P\in\mathbb{R}^{n\times n}$ ,

[TABLE]

$Q=\textbf{1}$ is a column vector of ones and $R$ is a scalar, $R=1-\theta(1)^{2}$ . To satisfy the positive semi-definiteness of the Pick matrix $M$ , a necessary condition is that its determinant be non-negative. To compute the determinant of the Pick matrix $M$ , we employ the Schur complement [22], which yields

[TABLE]

Since $R$ is a scalar

[TABLE]

and

[TABLE]

Let $T=P-QR^{-1}Q^{T}$ where $T\in\mathbb{R}^{n\times n}$ . Since $Q$ is a vector of ones, we obtain

[TABLE]

Let $T=UV$ , where $U\in\mathbb{R}^{n\times n}$ ,

[TABLE]

and $V\in\mathbb{R}^{n\times n}$ is a diagonal matrix whose diagonal entries are given by $\frac{1}{1-\theta(1)^{2}}$ . It is clear that

[TABLE]

and

[TABLE]

Also, according to [23, Theorem 1],

[TABLE]

where

[TABLE]

and

[TABLE]

Substituting (87), (88), (86) and (84) into (82) yields

[TABLE]

Substituting (III) into (III) yields

[TABLE]

Since $\rho\in(0,1)$ and $\theta(1)\in(0,1)$ , $P>0$ and $Q>0$ for all $\varepsilon_{i}>0$ . Hence, a necessary condition for the Pick matrix $M$ to be positive semi-definite is $\det(M)\geq 0$ . Letting $\varepsilon_{i},\varepsilon_{j}\rightarrow 0$ from the positive direction, we obtain

[TABLE]

It follows that a necessary condition for the Pick matrix $M$ to be positive semi-definite is

[TABLE]

which is equivalent to

[TABLE]

Since $\theta(1)=\rho_{\text{\scriptsize HB}}$ , the condition (95) is equivalent to (67). Thus, if there exists a compensator $K(z)$ satisfying the conditions of the lemma, then (67) must be satisfied. $\blacksquare$

Now, we can prove Theorem 1.

Proof of Theorem 1: Using Lemma III.2, a necessary condition for the algorithm $\Sigma(\alpha,\beta,m,L)$ to track a polynomially varying optimal point with zero steady-state error, for any $f(x,t)\in\mathscr{F}^{q}_{m,L}$ , is that $\tilde{D}(z)$ defined in (42) has at least $n-1$ roots at $z=1$ . Moreover, Lemmas II.2 and III.3 imply that the corresponding system in (II-C) must position all poles inside the disk defined by $|z|\leq\rho$ for every $\lambda\in[m,L]$ in order to achieve a convergence rate $\rho$ . In order to meet this requirement, condition (67) must hold. Thus, any algorithm $\Sigma(\alpha,\beta,m,L)$ which satisfies the conditions of the theorem will have a convergence rate which satisfies (1). $\blacksquare$

Also, we are now in the position to prove Theorem 2.

Proof of Theorem 2: In order to prove this theorem, we construct an algorithm $\Sigma(\alpha^{*},\beta^{*},m,L)$ satisfying the conditions of the theorem and such that (2) is satisfied. It follows from Lemma II.2 that we want to construct an algorithm $\Sigma(\alpha^{*},\beta^{*},m,L)$ such that the characteristic polynomial of $\tilde{A_{0}}(\lambda)$ has all its roots inside the circle of radius $\rho=\sqrt[n]{\rho_{\text{\scriptsize HB}}}\in(0,1)$ for all $\lambda\in[m,L]$ . We now define

[TABLE]

and

[TABLE]

where $\tilde{N}(z)$ and $\tilde{D}(z)$ are defined in (42). It follows from (42) that

[TABLE]

Therefore, $\sup_{m\leq\lambda\leq L}\rho(\tilde{A_{0}}(\lambda))$ is the radius of the smallest disk that contains the poles of the SISO feedback control system in Figure 2.
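The notion of the smallest disk containing the closed-loop poles over all $\lambda\in[m,L]$ can be checked numerically in a simple case. The sketch below uses the classical heavy-ball method rather than the algorithm of Theorem 2; its characteristic polynomial $z^{2}-(1+\beta-\alpha\lambda)z+\beta$ and the parameter choice $\alpha=\frac{4}{(\sqrt{L}+\sqrt{m})^{2}}$, $\beta=\rho_{\text{\scriptsize HB}}^{2}$ are the standard ones and are assumptions not taken from this section. The grid over $\lambda$ only approximates the supremum.

```python
import cmath
import math


def heavy_ball_parameters(m, L):
    """Standard heavy-ball step size and momentum for quadratics."""
    sqrt_m, sqrt_L = math.sqrt(m), math.sqrt(L)
    alpha = 4.0 / (sqrt_L + sqrt_m) ** 2
    beta = ((sqrt_L - sqrt_m) / (sqrt_L + sqrt_m)) ** 2
    return alpha, beta


def spectral_radius(lam, alpha, beta):
    """Largest root modulus of z^2 - (1 + beta - alpha*lam) z + beta."""
    b = -(1.0 + beta - alpha * lam)
    disc = cmath.sqrt(b * b - 4.0 * beta)
    return max(abs((-b + disc) / 2.0), abs((-b - disc) / 2.0))


def worst_case_radius(m, L, samples=201):
    """Grid approximation of the supremum of the spectral radius on [m, L]."""
    alpha, beta = heavy_ball_parameters(m, L)
    grid = [m + (L - m) * i / (samples - 1) for i in range(samples)]
    return max(spectral_radius(lam, alpha, beta) for lam in grid)


# Hypothetical values m = 1, L = 9, for which rho_HB = 0.5.
radius = worst_case_radius(1.0, 9.0)
```

For these parameters all roots have modulus $\sqrt{\beta}=\rho_{\text{\scriptsize HB}}$ across the whole interval, which is the single-integrator counterpart of the pole placement requirement used in the proof.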

We now construct the interpolation function $\hat{S}(z)$ defined in (72) using the finite Blaschke product approach, as described in [21, p. 53]. The Blaschke product is a classical method for constructing bounded analytic functions on the unit disk that satisfy prescribed interpolation conditions. The general form of a finite Blaschke product is given by:

$$B(z)=\psi\prod_{i=1}^{n}\frac{z-z_{i}}{1-\overline{z_{i}}z},$$

where $|\psi|=1$ , $z_{i}$ are the interpolation points inside the unit disk $\mathscr{D}$ and $\overline{z_{i}}$ denotes the complex conjugate of $z_{i}$ . Considering the interpolation conditions (78), the corresponding Blaschke product is given by:

[TABLE]

In addition, considering (79), evaluating the Blaschke product at $z=0$ , we obtain

[TABLE]

By letting $\varepsilon_{i}\rightarrow 0$ , we obtain

[TABLE]

where $|\psi|=\frac{\theta(1)}{\rho^{n}}=\frac{\theta(1)}{\rho_{\text{\scriptsize HB}}}=1$ . Then

[TABLE]

Therefore, the required analytic function $\hat{S}(z)$ is

[TABLE]

since $\rho^{n}=\rho_{\text{\scriptsize HB}}=\theta(1)$ . Thus, the function $\hat{S}(z)$ defined in (102) satisfies the interpolation conditions (73) and (74), $|\hat{S}(z)|<1$ for all $z\in\mathscr{D}$ , and inequality (67) is satisfied with equality.
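The properties of a finite Blaschke product used in this construction can be checked numerically. The sketch below assumes the standard form $\psi\prod_{i}\frac{z-z_{i}}{1-\overline{z_{i}}z}$ with $|\psi|=1$; the zeros chosen here are hypothetical and are not the interpolation points of (78).

```python
import cmath


def blaschke(z, zeros, psi=1.0):
    """Evaluate psi * prod_i (z - z_i) / (1 - conj(z_i) * z)."""
    z = complex(z)
    value = complex(psi)
    for zi in zeros:
        zi = complex(zi)
        value *= (z - zi) / (1 - zi.conjugate() * z)
    return value


zeros = [0.5, 0.3 + 0.2j]                            # hypothetical points in the disk
on_circle = abs(blaschke(cmath.exp(0.7j), zeros))    # modulus on |z| = 1
inside = abs(blaschke(0.1 + 0.1j, zeros))            # modulus inside the disk
at_zero = abs(blaschke(0.0, zeros))                  # equals the product of |z_i|
```

The value at the origin has modulus $\prod_{i}|z_{i}|$, which is why, when all $n$ interpolation points collapse to a single point of modulus $\rho$, the modulus of the product at $z=0$ becomes $\rho^{n}$.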

Since $\hat{S}(z)=\theta(S(\rho z^{-1}))$ , we can now construct

[TABLE]

In order to find $S(z)$ , we first calculate $\hat{S}(\rho z^{-1})$ :

[TABLE]

Also, we construct the inverse of the function $\theta(\cdot)$ as

[TABLE]

Substituting (104) and (105) into (103) yields

[TABLE]

Substituting (106) and $P(z)=\frac{1}{(z-1)^{n}}$ into (68) and rearranging yields

[TABLE]

where

[TABLE]

Substituting (107) into $\tilde{K}(z)=\frac{1}{z-1}K(z)$ yields

[TABLE]

Substituting $\theta(1)=\rho_{\text{\scriptsize HB}}=\frac{\sqrt{L}-\sqrt{m}}{\sqrt{L}+\sqrt{m}}$ into (III) yields

[TABLE]

where

[TABLE]

Substituting (III) into (III) and simplifying yields

[TABLE]

where $\tilde{K}_{1},\tilde{K}_{2},\tilde{K}_{3}$ are defined in (2). Note that the transfer function $\tilde{K}(z)$ is proper, since $\tilde{K}_{3}=\tilde{K}_{1}+\tilde{K}_{2}$ , which cancels out the leading term $z^{2n}$ in the numerator. In order to find the coefficients $\alpha^{*}_{j}$ and $\beta^{*}_{j}$ , we employ the binomial theorem [14], which yields

[TABLE]

Therefore the numerator of $\tilde{K}(z)$ can be rewritten as

[TABLE]

Similarly,

[TABLE]

and the denominator of $\tilde{K}(z)$ can be rewritten as

[TABLE]

Note that

[TABLE]

Therefore, we obtain the final expression for $\tilde{K}(z)$

[TABLE]

where the coefficients $\alpha_{j}$ and $\beta_{j}$ in the polynomial expansions of $\tilde{N}(z)$ and $\tilde{D}(z)$ precisely match the corresponding coefficients obtained from the binomial expansions of $\tilde{N}(z)$ and $\tilde{D}(z)$ in (III) and (118), respectively. Thus, we have constructed an algorithm $\Sigma(\alpha^{*},\beta^{*},m,L)$ satisfying the conditions of the theorem with convergence rate given by (2). $\blacksquare$
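The coefficient extraction by the binomial theorem used in this proof can be sketched in code. The example below expands a generic factor $(z-a)^{n}$ into its polynomial coefficients; the value of `a` and the function names are hypothetical, and the actual factors appearing in $\tilde{N}(z)$ and $\tilde{D}(z)$ are those given in the proof.

```python
from math import comb


def binomial_coefficients(a, n):
    """Coefficients of (z - a)^n in descending powers of z.

    By the binomial theorem, the coefficient of z^(n-k) is C(n, k) * (-a)^k.
    """
    return [comb(n, k) * (-a) ** k for k in range(n + 1)]


def evaluate(coefficients, z):
    """Horner evaluation of a polynomial given in descending powers."""
    value = 0.0
    for c in coefficients:
        value = value * z + c
    return value


integrator = binomial_coefficients(1, 3)     # expansion of (z - 1)^3
shifted = binomial_coefficients(0.5, 4)      # expansion of (z - 0.5)^4
check = evaluate(shifted, 2.0)               # should agree with (2 - 0.5)^4
```

Matching such expanded coefficients term by term against the polynomial forms of $\tilde{N}(z)$ and $\tilde{D}(z)$ is what yields the algorithm parameters.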

Remark 2

Note that it may be useful to define an alternative form for the algorithm (2) by using an alternative state space realization

[TABLE]

rather than using the formula (2), (48). This is an area for future research.

IV Illustrative Example

This section gives an illustrative example of the use of the algorithm $\Sigma(\alpha^{*},\beta^{*},m,L)$ of the form (2) defined in Theorem 2. Consider a quadratic cost function $f(x,t)\in\mathscr{F}^{q}_{m,L}$ ,

[TABLE]

where $x^{*}(t)\in\mathbb{R}$ is a real source location at time $t$ defined by

[TABLE]

Equation (122) has a Taylor series expansion [24, Chapter 4]

[TABLE]

To simplify the problem, we only consider the first two terms of the Taylor series (123). It follows from Lemma III.2 that to achieve exact tracking of a polynomially varying point of order $3$ , the iterative method must incorporate at least four integrators. In order to find the corresponding algorithm parameters, it follows from (42) and (III) that

[TABLE]

where $\tilde{K}_{1},\tilde{K}_{2},\tilde{K}_{3}$ are defined in (2) and $n=4$ . Solving these equations, we find the corresponding optimal parameters as follows:

[TABLE]

and

[TABLE]

where

[TABLE]

Figure 4 compares the true optimal point to its estimated value obtained by using Theorem 2 and by using the standard gradient descent method with step size $\frac{2}{L+m}$ , where $m=1$ and $L=5$ . The yellow star represents the true optimal point, moving according to the polynomial trajectory (123) of order $3$ . The red line shows the optimal point estimated by the optimal method of Theorem 2. It can be observed that the optimal method of Theorem 2 successfully tracks the optimal point with zero steady-state error. In addition, the black line in Figure 4 shows the performance of the gradient descent method, which fails to track the time-varying optimal point with zero steady-state error.
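The failure of gradient descent to track a moving optimum can be reproduced in a much simpler setting than the one in Figure 4. The sketch below assumes a scalar quadratic cost $f(x,t)=\frac{\lambda}{2}(x-x^{*}(t))^{2}$ with a linearly moving optimum $x^{*}(t)=ct$ and the values $m=1$ , $L=5$ ; the cost, the trajectory and all names are hypothetical and are not the example defined in (121) and (122).

```python
def gradient_descent_tracking_error(lam, m, L, slope, steps):
    """Run x(t+1) = x(t) - alpha * grad f(x(t), t) with alpha = 2 / (L + m)
    and return the final tracking error x(steps) - x*(steps).

    Assumed cost: f(x, t) = (lam / 2) * (x - slope * t) ** 2.
    """
    alpha = 2.0 / (L + m)
    x = 0.0
    for t in range(steps):
        x_star = slope * t
        gradient = lam * (x - x_star)
        x = x - alpha * gradient
    return x - slope * steps


# Hypothetical values: curvature lam = 1, ramp slope c = 0.1.
error = gradient_descent_tracking_error(lam=1.0, m=1.0, L=5.0,
                                        slope=0.1, steps=200)
# The error recursion e(t+1) = (1 - alpha*lam) e(t) - c has the
# fixed point -c / (alpha * lam).
predicted = -0.1 / ((2.0 / 6.0) * 1.0)
```

Even for a first order trajectory the error settles at a non-zero constant, since gradient descent contains only a single integrator; for higher order polynomial trajectories the error grows without bound.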

To further verify the performance of our approach, Figure 5 illustrates the tracking error, that is, the difference between the true optimal point and its estimate, for both the optimal algorithm obtained from Theorem 2 and the gradient descent method. Clearly, the optimal algorithm obtained from Theorem 2 achieves zero steady-state error, whereas the gradient descent method does not.

V Conclusions

In this paper, we have established a fundamental convergence rate bound for all linear gradient based optimization algorithms which track the optimal point of a time-varying cost function with zero steady-state error. We considered online optimization problems with time-varying quadratic cost functions whose optimal point varies polynomially in time. Our main results build on results on the optimal gain margin of linear uncertain systems. We also used a version of the internal model principle, which requires the corresponding algorithm to include discrete time integrators in order to achieve exact tracking of the time-varying optimal point.

Bibliography

[1] E. Hazan, Introduction to Online Convex Optimization, 2nd ed. USA: The MIT Press, 2022.
[2] N. Bastianello, R. Carli, and S. Zampieri, “Internal model-based online optimization,” IEEE Transactions on Automatic Control, vol. 69, no. 1, pp. 689–696, 2024.
[3] L. Madden, S. Becker, and E. Dall’Anese, “Bounds for the tracking error of first-order online optimization methods,” Journal of Optimization Theory and Applications, vol. 189, no. 2, pp. 437–457, 2021.
[4] L. Lessard, B. Recht, and A. Packard, “Analysis and design of optimization algorithms via integral quadratic constraints,” SIAM Journal on Optimization, vol. 26, no. 1, pp. 57–95, 2016.
[5] S. Michalowsky, C. Scherer, and C. Ebenbauer, “Robust and structure exploiting optimisation algorithms: an integral quadratic constraint approach,” International Journal of Control, vol. 94, no. 11, pp. 2956–2979, 2021.
[6] B. Van Scoy and L. Lessard, “A tutorial on a Lyapunov-based approach to the analysis of iterative optimization algorithms,” in 2023 62nd IEEE Conference on Decision and Control (CDC), 2023, pp. 3003–3008.
[7] L. Lessard, “The analysis of optimization algorithms, a dissipativity approach,” IEEE Control Systems Magazine, vol. 42, 05 2022.
[8] V. Ugrinovskii, I. R. Petersen, and I. Shames, “A robust control approach to asymptotic optimality of the heavy ball method for optimization of quadratic functions,” Automatica, vol. 155, pp. 111–129, 2023.