On the Convergence of the Inexact Running Krasnosel'skii-Mann Method

Emiliano Dall'Anese; Andrea Simonetto; Andrey Bernstein

arXiv:1904.08469 · math.OC · January 9, 2020 · IEEE Control Syst. Lett.


TL;DR

This paper analyzes the convergence of an inexact, evolving version of the Krasnosel'skii-Mann method, providing theoretical guarantees for fixed-point tracking under imperfect information and dynamic maps.

Contribution

It introduces a framework for analyzing inexact, running Krasnosel'skii-Mann algorithms with evolving maps and imperfect data, extending convergence results to these settings.

Findings

01. Convergence of the average fixed-point residual in non-expansive cases.

02. Linear convergence to a fixed-point trajectory under contractive operators.

03. Applicability to inexact gradient and forward-backward splitting methods.


Equations

$\textsf{F}_{k}:=\textsf{I}+\lambda_{k}(\textsf{T}-\textsf{I})$

$\frac{1}{K}\sum_{k=1}^{K}\|{\bf x}_{k}-\textsf{T}({\bf x}_{k})\|^{2}\leq\frac{\|{\bf x}_{1}-{\bf x}^{*}\|^{2}}{K\lambda(1-\lambda)}$

$\textsf{F}_{t}=(1-\alpha_{t})\textsf{I}+\alpha_{t}\textsf{T}_{t}$

${\bf x}_{t}=\textsf{F}_{t}({\bf x}_{t-1})=(1-\alpha_{t}){\bf x}_{t-1}+\alpha_{t}\textsf{T}_{t}({\bf x}_{t-1}).$

$\|{\bf x}^{\star}_{t+1}-{\bf x}^{\star}_{t}\|\leq\sigma_{t}.$

$\max_{{\bf x}\in{\cal D}}\|\textsf{T}_{t}({\bf x})-\hat{\textsf{T}}_{t}({\bf x})\|\leq e_{\textsf{T},t}.$

$\hat{\textsf{F}}_{t}({\bf x}):=(1-\alpha_{t}){\bf x}+\alpha_{t}\hat{\textsf{T}}_{t}({\bf x})$

${\bf x}_{t}=\hat{\textsf{F}}_{t}({\bf x}_{t-1})=(1-\alpha_{t}){\bf x}_{t-1}+\alpha_{t}\hat{\textsf{T}}_{t}({\bf x}_{t-1}).$

$\max_{{\bf x}\in{\cal D}}\|\textsf{F}_{t}({\bf x})\|\leq M_{t},\quad\max_{{\bf x}\in{\cal D}}\|\hat{\textsf{F}}_{t}({\bf x})\|\leq M_{t}.$

$\sum_{t=1}^{T}\alpha_{t}(1-\alpha_{t})\|{\bf x}_{t}-\textsf{T}_{t}({\bf x}_{t})\|^{2}\leq\|{\bf x}_{1}-{\bf x}^{\star}_{1}\|^{2}+\sum_{t=1}^{T}r_{t}$

$\frac{1}{T}\sum_{t=1}^{T}\alpha_{t}(1-\alpha_{t})\|{\bf x}_{t}-\textsf{T}_{t}({\bf x}_{t})\|^{2}\leq\frac{1}{T}\|{\bf x}_{1}-{\bf x}^{\star}_{1}\|^{2}+r$

$\frac{1}{T}\sum_{t=1}^{T}\frac{1-\alpha_{t}}{\alpha_{t}}\|{\bf x}_{t}-\textsf{F}_{t}({\bf x}_{t})\|^{2}\leq\frac{1}{T}\|{\bf x}_{1}-{\bf x}^{\star}_{1}\|^{2}+r$

$\limsup_{T\rightarrow+\infty}\frac{1}{T}\sum_{t=1}^{T}\|{\bf x}_{t}-\textsf{F}_{t}({\bf x}_{t})\|^{2}\leq r\,\bar{\alpha}^{-1}$

$\sum_{t=1}^{T}e_{\textsf{T},t}=o(T),$

$\limsup_{T\rightarrow\infty}\frac{1}{T}\sum_{t=1}^{T}\|{\bf x}_{t}-\textsf{T}_{t}({\bf x}_{t})\|^{2}\leq\check{\alpha}^{-1}\sigma(4M+\sigma)$

$\limsup_{T\rightarrow\infty}\frac{1}{T}\sum_{t=1}^{T}\|{\bf x}_{t}-\textsf{F}_{t}({\bf x}_{t})\|^{2}\leq\bar{\alpha}^{-1}\sigma(4M+\sigma)$

$\sum_{t=1}^{T}\sigma_{t}=o(T)$

$\|{\bf x}_{t+1}-{\bf x}^{\star}_{t+1}\|$

$+\sum_{\tau=1}^{t}c^{(t,\tau)}(\alpha_{\tau}e_{\textsf{T},\tau}+\sigma_{\tau})$

$c^{(t,\tau)}:=\begin{cases}\prod_{\ell=\tau+1}^{t}L_{\ell},&\text{if }\tau=0,\ldots,t-1\\1,&\text{if }\tau=t.\end{cases}$

$\limsup_{t\rightarrow\infty}\|{\bf x}_{t}-{\bf x}^{\star}_{t}\|\leq\frac{\gamma}{1-L}$

$\limsup_{t\rightarrow\infty}\|{\bf x}_{t}-{\bf x}^{\star}_{t}\|\leq\frac{\sigma}{1-L}.$

$(\textrm{P1}_{t}):\quad\min_{{\bf x}\in{\cal X}_{t}}f_{t}({\bf x})$

${\bf x}_{t+1}=\mathrm{proj}_{{\cal X}_{t}}\{{\bf x}_{t}-\nu\nabla f_{t}({\bf x}_{t})\}$

${\bf x}_{t+1}=\mathrm{proj}_{{\cal X}_{t}}\{{\bf x}_{t}-\nu{\bf y}_{t}\}.$

$\textsf{I}-\nu\nabla f_{t}=\left(1-\frac{\nu K_{t}}{2}\right)\textsf{I}+\frac{\nu K_{t}}{2}\left(\textsf{I}-\frac{2}{K_{t}}\nabla f_{t}\right)$

$\|\hat{\textsf{T}}_{t}({\bf x})-\textsf{T}_{t}({\bf x})\|$

$e_{\textsf{T},t}=2K_{t}^{-1}e_{y,t}.$

$e_{\textsf{T},t}=(2\nu-\nu^{2}K_{t}/2)\,e_{y,t}.$

$(\textrm{P2}_{t}):\quad\min_{{\bf x}\in{\cal X}_{t}}f_{t}({\bf x})+g_{t}({\bf x})$

${\bf x}_{t+1}=\mathrm{prox}_{g_{t},{\cal X}_{t},\nu}\{{\bf x}_{t}-\nu\nabla f_{t}({\bf x}_{t})\}$


Full text


Emiliano Dall’Anese¹, Andrea Simonetto², Andrey Bernstein³

¹E. Dall’Anese is with the University of Colorado Boulder. ²A. Simonetto is with IBM Research Ireland. ³A. Bernstein is with the National Renewable Energy Laboratory (NREL). The work of E. Dall’Anese was supported by NREL via APUP UGA-0-41026-109. Funds for A. Bernstein were provided by ARPA-e NODES.

Abstract

This paper leverages a framework based on averaged operators to tackle the problem of tracking fixed points associated with maps that evolve over time. In particular, the paper considers the Krasnosel’skiĭ-Mann method in a setting where: (i) the underlying map may change at each step of the algorithm, thus leading to a “running” implementation of the Krasnosel’skiĭ-Mann method; and, (ii) only imperfect information of the map may be available. Imperfect knowledge of the maps can capture cases where processors feature finite precision or quantization errors, or the case where (part of) the map is obtained from measurements. The analytical results are applicable to inexact running algorithms for solving optimization problems, whenever the algorithmic steps can be written in the form of (a composition of) averaged operators; examples are provided for inexact running gradient methods and the forward-backward splitting method. Convergence of the average fixed-point residual is investigated for the non-expansive case; linear convergence to a unique fixed-point trajectory is shown in the case of inexact running algorithms emerging from contractive operators.

I Introduction and Problem Formulation

The Banach-Picard method and its Krasnosel’skiĭ-Mann (KM) variant have been leveraged to establish convergence of a number of iterative algorithmic frameworks for solving convex optimization problems as well as problems associated with (non)linear systems [1, 2, 3, 4, 5]. Focusing on the KM method, recall that an operator $\textsf{T}:{\cal D}\rightarrow{\cal D}$ , where ${\cal D}$ is a nonempty convex subset of a finite-dimensional Hilbert space ${\cal H}$ with a given norm $\|\cdot\|$ , is non-expansive if it is $1$ -Lipschitz in ${\cal H}$ ; that is, $\forall\,{\bf x},{\bf y}\in{\cal D}$ one has that $\|\textsf{T}({\bf x})-\textsf{T}({\bf y})\|\leq\|{\bf x}-{\bf y}\|$ . The KM algorithm involves the sequential application of the following operator starting from a point in ${\cal D}$ (with $k$ the iteration index):

$$\textsf{F}_{k}:=\textsf{I}+\lambda_{k}(\textsf{T}-\textsf{I}) \tag{1}$$

with $\textsf{I}:{\cal H}\rightarrow{\cal H}$ the identity operator and $\{\lambda_{k}\}_{k\in\mathbb{N}}$ a sequence in $[0,1]$ satisfying $\sum_{k=1}^{\infty}\lambda_{k}(1-\lambda_{k})=\infty$ [1]. Based on (1), convergence of iterative algorithms for solving optimization problems can be cast as the problem of finding fixed points of a properly constructed non-expansive map T (which are also fixed points of F). As another example, the operator-based representation (1) can be utilized to investigate convergence of discrete-time linear systems [6].

The KM method (1) is known to converge weakly to a fixed point of T [1, 7, 6, 8]; that is, taking the case of a constant value of $\lambda_{k}=\lambda$ as an example, one has that the average fixed-point residual of the map T after $K$ iterations can be bounded as [1, 7]:

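$$\frac{1}{K}\sum_{k=1}^{K}\|{\bf x}_{k}-\textsf{T}({\bf x}_{k})\|^{2}\leq\frac{\|{\bf x}_{1}-{\bf x}^{*}\|^{2}}{K\lambda(1-\lambda)} \tag{2}$$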

with ${\bf x}^{*}$ a fixed point. See also the inexact [9] and stochastic [10] variants, as well as more results on convergence of algorithms involving averaged non-expansive operators [11].
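As a rough numerical illustration of the KM iteration and of the residual bound (2), the sketch below uses a hypothetical non-expansive map (a 90-degree rotation of the plane, whose only fixed point is the origin) with a constant $\lambda=1/2$. The map, the starting point, and the helper names are assumptions made for this example only. It also shows that the plain Banach-Picard iteration ($\lambda=1$) merely cycles on this map, while the averaged iteration shrinks towards the fixed point.

```python
import math

def rotate90(x):
    """Non-expansive map T: rotation of the plane by 90 degrees (fixed point: origin)."""
    return (-x[1], x[0])

def km_iterates(T, x1, lam, K):
    """Return x_1, ..., x_K generated by x_{k+1} = (1 - lam) x_k + lam T(x_k)."""
    xs = [x1]
    for _ in range(K - 1):
        x = xs[-1]
        Tx = T(x)
        xs.append(tuple((1 - lam) * a + lam * b for a, b in zip(x, Tx)))
    return xs

def avg_residual(T, xs):
    """Average squared fixed-point residual (1/K) sum_k ||x_k - T(x_k)||^2."""
    return sum(sum((a - b) ** 2 for a, b in zip(x, T(x))) for x in xs) / len(xs)

K, lam, x1 = 20, 0.5, (1.0, 0.0)
xs = km_iterates(rotate90, x1, lam, K)
lhs = avg_residual(rotate90, xs)
rhs = (x1[0] ** 2 + x1[1] ** 2) / (K * lam * (1 - lam))  # right-hand side of (2), x* = 0
print(lhs <= rhs, math.hypot(*xs[-1]))

# Plain Banach-Picard iteration (lam = 1) only cycles on the unit circle.
picard = km_iterates(rotate90, x1, 1.0, K)
print(math.hypot(*picard[-1]))
```

For this particular map the bound is nearly tight: the left-hand side approaches the right-hand side as $K$ grows, but never exceeds it.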

While (2) pertains to problems where the map T is “fixed” during the execution of the KM algorithm and it is known, this paper revisits the convergence of the KM method in case of time-varying and possibly inexact maps. This setting is motivated by recent efforts to address the design and analysis of running algorithms for time-varying optimization problems [12, 13, 4, 14], with particular emphasis on feedback-based online optimization [14, 15]; additional works along these lines are in the context of online optimization (see the representative works [16, 17, 18] and references therein) and learning in dynamic environments [19, 20]. In a time-varying optimization setting, the underlying cost, constraints, and problem inputs may change at every step (or every few steps) of the algorithm; therefore, pertinent tasks in this case involve the derivation of results for the tracking of optimal solution trajectories. Updates of the algorithms may be implemented inexactly due to finite precision [21] or because measurement feedback is utilized in lieu of model-based gradient computations [14]. Counterparts of (2) are of interest for inexact running algorithms for problems with time-varying cost functions that are (locally) convex but not strongly convex; in case of problems with (locally) strongly convex costs, contractive arguments can be leveraged.

To concretely outline the problem, consider discretizing the temporal index as $th$ , $t\in\mathbb{N}$ and with $h$ a given interval (that will coincide with the time required to evaluate a map). Taking the normed space $(\mathbb{R}^{m},\|\cdot\|)$ for the rest of the paper, consider a convex and closed set ${\cal D}\subseteq\mathbb{R}^{m}$ and a sequence of non-expansive mappings $\textsf{F}_{t}:{\cal D}\rightarrow{\cal D}$ . In particular, assume that $\textsf{F}_{t}$ is $\alpha_{t}$ -averaged; that is, it is a convex combination

$$\textsf{F}_{t}=(1-\alpha_{t})\textsf{I}+\alpha_{t}\textsf{T}_{t} \tag{3}$$

with $\alpha_{t}\in(0,1)$. Starting from ${\bf x}_{1}\in{\cal D}$, the running KM method amounts to the execution of the following step at each $t$:

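$${\bf x}_{t}=\textsf{F}_{t}({\bf x}_{t-1})=(1-\alpha_{t}){\bf x}_{t-1}+\alpha_{t}\textsf{T}_{t}({\bf x}_{t-1}). \tag{4}$$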

Different from the “batch” KM method – especially when a Mann sequence $\{\lambda_{k}\}_{k\in\mathbb{N}}$ is utilized – where (1) is executed within an interval $h$ until convergence, the running algorithm (4) boils down to a sequential application of time-varying $\alpha_{t}$ -averaged maps. Preliminary results for the convergence of (4) were provided in [4].

The paper investigates the ability of the running algorithm (4) to track fixed points of the sequence of mappings $\{\textsf{T}_{t}\}_{t\in\mathbb{N}}$, when an imperfect mapping $\hat{\textsf{T}}_{t}:{\cal D}\rightarrow{\cal D}$ is available. Notice that fixed points would be identified at each time $t$ only if the KM method (1) is executed to convergence at each $t$ (i.e., in a batch setting, instead of performing only one iteration) and the map $\textsf{T}_{t}$ is known. This paper derives results similar to (2) for the inexact running KM method; results are also provided for the case of vanishing errors and vanishing fixed-point dynamics. The paper further considers the case where the overall mappings $\{\textsf{F}_{t}\}_{t\in\mathbb{N}}$ are contractions, and establishes linear convergence to the unique fixed-point trajectory. The proposed framework is then exemplified for inexact running projected gradient and forward-backward splitting methods for solving time-varying convex optimization problems. Overall, the paper provides contributions over our previous work [22] on the running Banach-Picard method, where linear convergence results were established in case of time-varying contractive maps, possibly corrupted by errors. Stochastic time-varying fixed-point problems were considered in [20, Th. 20]; here, we focus on bounded errors on averaged operators, and leave stochastic errors as a follow-on research opportunity.

II Inexact Running Algorithm

Let ${\bf x}^{\star}_{t}$ be a fixed point of the self-mapping $\textsf{F}_{t}$ ; that is, ${\bf x}^{\star}_{t}=\textsf{F}_{t}({\bf x}^{\star}_{t})$ . If the vectors $\{{\bf x}^{\star}_{t}\}_{t\in\mathbb{N}}$ satisfy the equation ${\bf x}^{\star}_{t}=\textsf{F}_{t}({\bf x}^{\star}_{t})$ for each $t\in\mathbb{N}$ , then we refer to $\{{\bf x}^{\star}_{t}\}_{t\in\mathbb{N}}$ as a sequence of fixed points. If the mappings $\{\textsf{F}_{t}\}_{t\in\mathbb{N}}$ are averaged, multiple sequences $\{{\bf x}^{\star}_{t}\}_{t\in\mathbb{N}}$ may exist; since $\textsf{F}_{t}=(1-\alpha_{t})\textsf{I}+\alpha_{t}\textsf{T}_{t}$ , ${\bf x}^{\star}_{t}$ is also a fixed point of $\textsf{T}_{t}$ . When $\{\textsf{F}_{t}\}_{t\in\mathbb{N}}$ are contractions, only one sequence exists by the Banach fixed-point theorem. To characterize the variability of a fixed-point sequence, we assume that there exists a sequence of fixed points $\{{\bf x}^{\star}_{t}\}_{t\in\mathbb{N}}$ , for which there exists a finite and non-negative sequence of scalars $\{\sigma_{t}\}_{t\in\mathbb{N}}$ , such that

$$\|{\bf x}^{\star}_{t+1}-{\bf x}^{\star}_{t}\|\leq\sigma_{t} \tag{5}$$

for all $t$. If $\textsf{F}_{t+1}=\textsf{F}_{t}$, then one has that $\sigma_{t}=0$, and we recover the time-invariant case.

Consider now a mapping $\hat{\textsf{T}}_{t}:{\cal D}\rightarrow{\cal D}$ , which is an approximation of $\textsf{T}_{t}$ in the following sense.

Assumption 1 (Bounded approximation error)

For each $t\in\mathbb{N}$ and for all ${\bf x}\in{\cal D}$ , it holds that $\hat{\textsf{T}}_{t}({\bf x})\in{\cal D}$ . Further, there exists a scalar $e_{\textsf{T},t}<+\infty$ such that

$$\max_{{\bf x}\in{\cal D}}\|\textsf{T}_{t}({\bf x})-\hat{\textsf{T}}_{t}({\bf x})\|\leq e_{\textsf{T},t}. \tag{6}$$

The condition (6) simply asserts that the error in the map is bounded; it can be deterministic or stochastic (and i.i.d. over time), but with finite support. Accordingly, define the approximate $\alpha_{t}$-averaged map $\hat{\textsf{F}}_{t}$ as:

$$\hat{\textsf{F}}_{t}({\bf x}):=(1-\alpha_{t}){\bf x}+\alpha_{t}\hat{\textsf{T}}_{t}({\bf x}) \tag{7}$$

Based on (7), and given an initial point ${\bf x}_{1}\in{\cal D}$, the inexact running KM algorithm is given by [cf. (4)]:

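$${\bf x}_{t}=\hat{\textsf{F}}_{t}({\bf x}_{t-1})=(1-\alpha_{t}){\bf x}_{t-1}+\alpha_{t}\hat{\textsf{T}}_{t}({\bf x}_{t-1}). \tag{8}$$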

In the next section, tracking of a sequence of fixed points $\{{\bf x}^{\star}_{t}\}_{t\in\mathbb{N}}$ via (8) will be investigated.
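The inexact running KM update can be sketched in a few lines: at each step, one (possibly different, possibly perturbed) map $\hat{\textsf{T}}_{t}$ is applied once and averaged with the current iterate. The function names, the rotation-about-a-drifting-centre map, and the error values below are hypothetical choices for illustration, not part of the paper.

```python
def inexact_running_km(x1, approx_maps, alphas):
    """Apply x_t = (1 - alpha_t) x_{t-1} + alpha_t * T_hat_t(x_{t-1}) once per map.

    approx_maps: sequence of callables T_hat_t (one per time step)
    alphas:      sequence of averaging weights alpha_t in (0, 1)
    """
    traj = [tuple(x1)]
    for T_hat, alpha in zip(approx_maps, alphas):
        x = traj[-1]
        Tx = T_hat(x)
        traj.append(tuple((1 - alpha) * a + alpha * b for a, b in zip(x, Tx)))
    return traj

def make_map(center, error):
    """Hypothetical T_hat_t: rotation by 90 degrees about `center`, plus a bounded error."""
    def T_hat(x):
        u, v = x[0] - center[0], x[1] - center[1]
        return (center[0] - v + error[0], center[1] + u + error[1])
    return T_hat

# One map per step: the fixed point (the centre) drifts, and the map is perturbed.
maps = [make_map((0.1 * t, 0.0), (0.01, 0.0)) for t in range(1, 4)]
traj = inexact_running_km((1.0, 0.0), maps, [0.5, 0.5, 0.5])
print(traj[1])  # approximately (0.555, 0.45)
```

Only one evaluation of the approximate map is performed per time step, which is what distinguishes the running scheme from a batch execution of the KM method at each $t$.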

III Convergence

This section will characterize the performance of the inexact running KM method in two different settings:

i) The map $\textsf{T}_{t}$ is non-expansive and $\textsf{F}_{t}$ is $\alpha_{t}$ -averaged; and,

ii) The map $\textsf{F}_{t}$ is a contraction.

It is worth pointing out that for generic non-expansive maps, the sequence generated by the Banach-Picard iteration may fail to produce a fixed point even in a static case; the structure of (8) will however facilitate the derivation of convergence results. Regarding the second case, notice that if $\textsf{T}_{t}$ is contractive then $\textsf{F}_{t}$ is contractive; however, the converse is not necessarily true. We start by outlining the following standard assumptions [1, 7].

Assumption 2 (Lipschitz maps)

There exists a scalar $0\leq L_{t}\leq 1$ such that $\|\textsf{F}_{t}({\bf x})-\textsf{F}_{t}({\bf x}^{\prime})\|\leq L_{t}\|{\bf x}-{\bf x}^{\prime}\|$ for all ${\bf x},{\bf x}^{\prime}\in{\cal D}$ .

Assumption 3 (Bounded maps)

There exists a scalar $M_{t}<+\infty$ such that

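$$\max_{{\bf x}\in{\cal D}}\|\textsf{F}_{t}({\bf x})\|\leq M_{t},\qquad\max_{{\bf x}\in{\cal D}}\|\hat{\textsf{F}}_{t}({\bf x})\|\leq M_{t}. \tag{9}$$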

If ${\cal D}$ is compact, then $M_{t}$ can be taken, in the worst case, to be the radius of ${\cal D}$ . For subsequent developments, define $M:=\sup_{t}\{M_{t}\}$ , $\sigma:=\sup_{t}\{\sigma_{t}\}$ , $e_{\textsf{T}}:=\sup_{t}\{e_{\textsf{T},t}\}$ , and $\alpha:=\sup_{t}\{\alpha_{t}\}$ . The following result pertains to the case where $\textsf{F}_{t}$ is $\alpha_{t}$ -averaged.

Theorem 1

Consider a sequence of $\alpha_{t}$ -averaged operators $\textsf{F}_{t}=(1-\alpha_{t})I+\alpha_{t}\textsf{T}_{t}$ , $t=1,\ldots,T$ , and assume that there exists a sequence of vectors $\{{\bf x}^{\star}_{t}\}_{t=1}^{T}$ that satisfy the equation ${\bf x}^{\star}_{t}=\textsf{F}_{t}({\bf x}^{\star}_{t})$ for each $t=1,\ldots,T$ . Suppose that Assumptions 1–3 hold, and take ${\bf x}_{1}\in{\cal D}$ . Then, the following bound holds for the algorithm (8):

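$$\sum_{t=1}^{T}\alpha_{t}(1-\alpha_{t})\|{\bf x}_{t}-\textsf{T}_{t}({\bf x}_{t})\|^{2}\leq\|{\bf x}_{1}-{\bf x}^{\star}_{1}\|^{2}+\sum_{t=1}^{T}r_{t} \tag{10}$$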

where $r_{t}:=\alpha_{t}e_{\textsf{T},t}(4M_{t}+\alpha_{t}e_{\textsf{T},t})+\sigma_{t}(4M_{t}+\sigma_{t})$ . In particular, one has that:

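$$\frac{1}{T}\sum_{t=1}^{T}\alpha_{t}(1-\alpha_{t})\|{\bf x}_{t}-\textsf{T}_{t}({\bf x}_{t})\|^{2}\leq\frac{1}{T}\|{\bf x}_{1}-{\bf x}^{\star}_{1}\|^{2}+r \tag{11}$$

$$\frac{1}{T}\sum_{t=1}^{T}\frac{1-\alpha_{t}}{\alpha_{t}}\|{\bf x}_{t}-\textsf{F}_{t}({\bf x}_{t})\|^{2}\leq\frac{1}{T}\|{\bf x}_{1}-{\bf x}^{\star}_{1}\|^{2}+r \tag{12}$$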

with $r:=\alpha e_{\textsf{T}}(4M+\alpha e_{\textsf{T}})+\sigma(4M+\sigma)$ .

Proof. See Appendix -A

Bounds (11)–(12) imply convergence in mean of the fixed-point residual to a ball centered at [math]; the size of the ball depends on the bound on the variability of the fixed-point trajectories, on the size of the image of the operators, and on the approximation errors for the maps. An immediate follow-up from (11)–(12) is the following asymptotic result:

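$$\limsup_{T\rightarrow+\infty}\frac{1}{T}\sum_{t=1}^{T}\|{\bf x}_{t}-\textsf{F}_{t}({\bf x}_{t})\|^{2}\leq r\,\bar{\alpha}^{-1} \tag{13}$$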

where $\bar{\alpha}:=\inf_{t=1,\ldots T}\{(1-\alpha_{t})/\alpha_{t}\}$ . A similar result can be derived for the mean of $\|{\bf x}_{t}-\textsf{T}_{t}({\bf x}_{t})\|^{2}$ .
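To get a feel for the size of the error ball, the helper below evaluates $r=\alpha e_{\textsf{T}}(4M+\alpha e_{\textsf{T}})+\sigma(4M+\sigma)$ and the asymptotic value $r\,\bar{\alpha}^{-1}$ for given constants. It is only a sketch of the arithmetic; the function names and the numerical values in the usage line are made up for illustration.

```python
def error_term(alpha, e_T, M, sigma):
    """r = alpha*e_T*(4M + alpha*e_T) + sigma*(4M + sigma), as defined in Theorem 1."""
    return alpha * e_T * (4 * M + alpha * e_T) + sigma * (4 * M + sigma)

def asymptotic_bound(alphas, e_T, M, sigma):
    """r / alpha_bar, with alpha_bar = inf_t (1 - alpha_t) / alpha_t."""
    alpha = max(alphas)                              # alpha := sup_t alpha_t
    alpha_bar = min((1 - a) / a for a in alphas)
    return error_term(alpha, e_T, M, sigma) / alpha_bar

# Hypothetical constants: alpha_t in {0.5, 0.25}, e_T = 0.1, M = 1, sigma = 0.05.
print(asymptotic_bound([0.5, 0.25], 0.1, 1.0, 0.05))  # approximately 0.405
```

With exact maps and static fixed points ($e_{\textsf{T}}=0$, $\sigma=0$) the helper returns zero, in line with the time-invariant, error-free case.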

It is worth pointing out that, when $\sigma=0$ , the bound in (13) reduces to $\alpha e_{\textsf{T}}(4M+\alpha e_{\textsf{T}})\bar{\alpha}^{-1}$ , and the bounds therefore capture the effect of the approximate maps. In case of perfect mappings, (13) boils down to (2) [1, 7]. Motivated by this, the next results will deal with vanishing errors and fixed-point dynamics, which is increasingly motivated by learning in bandit settings (where the maps are learned online while the algorithm is running).

Corollary 1

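Suppose that for each $T$, one has that

$$\sum_{t=1}^{T}e_{\textsf{T},t}=o(T), \tag{14}$$

i.e., $\sum_{t=1}^{T}e_{\textsf{T},t}$ grows sublinearly in $T$. (A relation $f(n)=o(g(n))$ signifies that for every positive constant $\varphi$ there exists $N$ such that $|f(n)|\leq\varphi|g(n)|$ for all $n\geq N$.) If Assumptions 1–3 hold, then, for the algorithm (8), the average fixed-point residuals satisfy:

$$\limsup_{T\rightarrow\infty}\frac{1}{T}\sum_{t=1}^{T}\|{\bf x}_{t}-\textsf{T}_{t}({\bf x}_{t})\|^{2}\leq\check{\alpha}^{-1}\sigma(4M+\sigma) \tag{15}$$

$$\limsup_{T\rightarrow\infty}\frac{1}{T}\sum_{t=1}^{T}\|{\bf x}_{t}-\textsf{F}_{t}({\bf x}_{t})\|^{2}\leq\bar{\alpha}^{-1}\sigma(4M+\sigma) \tag{16}$$

where $\check{\alpha}:=\inf_{t=1,\ldots,T}\{\alpha_{t}(1-\alpha_{t})\}$.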

Proof. See Appendix -B.

Corollary 2

Suppose that for each $T$ , one has that

$$\sum_{t=1}^{T}\sigma_{t}=o(T) \tag{17}$$

i.e., $\sum_{t=1}^{T}\sigma_{t}$ grows sublinearly in $T$ . Assume further that (14) holds. Then, under Assumptions 1–3, for the algorithm (8) one has that $\lim_{t\rightarrow\infty}\|{\bf x}_{t}-\textsf{T}_{t}({\bf x}_{t})\|^{2}=0$ and $\lim_{t\rightarrow\infty}\|{\bf x}_{t}-\textsf{F}_{t}({\bf x}_{t})\|^{2}=0$ .
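The effect of vanishing errors and vanishing fixed-point dynamics can be observed on a toy problem. In the sketch below (an illustration under assumed data, not an experiment from the paper), the map is a 90-degree rotation about a drifting centre, with $\sigma_{t}=0.1/t$ and $e_{\textsf{T},t}=0.1/t$, so that both partial sums grow like $\log T=o(T)$. In this particular example the averaged map happens to be a contraction, which makes the decay of the residual easy to see.

```python
def run(T_steps):
    """Inexact running KM with alpha_t = 1/2 on a rotation about a drifting centre.

    Illustrative assumptions: the centre moves by 0.1/t per step and the map error
    has norm 0.1/t. Returns the squared residuals ||x_t - T_t(x_t)||^2 along the run.
    """
    x, c = (1.0, 0.0), (0.0, 0.0)
    residuals = []
    for t in range(1, T_steps + 1):
        u, v = x[0] - c[0], x[1] - c[1]
        Tx = (c[0] - v, c[1] + u)                      # exact map T_t
        residuals.append((x[0] - Tx[0]) ** 2 + (x[1] - Tx[1]) ** 2)
        Tx_hat = (Tx[0] + 0.1 / t, Tx[1])              # inexact map, error 0.1/t
        x = (0.5 * x[0] + 0.5 * Tx_hat[0], 0.5 * x[1] + 0.5 * Tx_hat[1])
        c = (c[0] + 0.1 / t, c[1])                     # fixed point drifts by 0.1/t
    return residuals

residuals = run(2000)
print(residuals[0], residuals[-1])
```

The first residual equals 2, while the last one is several orders of magnitude smaller, consistent with a residual that vanishes as $t$ grows.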

For completeness, we now turn the attention to convergence results for contractive operators. The following holds.

Theorem 2

Consider a sequence of contractive mappings of the form $\textsf{F}_{t}=(1-\alpha_{t})\textsf{I}+\alpha_{t}\textsf{T}_{t}$ , $t=1,\ldots,T$ and let $\{{\bf x}^{\star}_{t}\}_{t=1}^{T}$ be the trajectory of fixed points. Let $\{{\bf x}_{t}\}_{t=1}^{T}$ be a sequence generated by the algorithm (8), with ${\bf x}_{1}\in{\cal D}$ . Suppose that Assumptions 1–3 hold. Then, at each time $t$ , it holds that:

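$$\|{\bf x}_{t+1}-{\bf x}^{\star}_{t+1}\|\leq c^{(t,0)}\|{\bf x}_{1}-{\bf x}^{\star}_{1}\|+\sum_{\tau=1}^{t}c^{(t,\tau)}(\alpha_{\tau}e_{\textsf{T},\tau}+\sigma_{\tau}) \tag{18}$$

where

$$c^{(t,\tau)}:=\begin{cases}\prod_{\ell=\tau+1}^{t}L_{\ell},&\text{if }\tau=0,\ldots,t-1\\1,&\text{if }\tau=t.\end{cases} \tag{19}$$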

Suppose further that Assumption 2 holds with $L_{t}<1$ for all $t$ . Then, $\{{\bf x}^{\star}_{t}\}_{t=1}^{T}$ is unique and the following asymptotic bound holds for the algorithm (8):

$$\limsup_{t\rightarrow\infty}\|{\bf x}_{t}-{\bf x}^{\star}_{t}\|\leq\frac{\gamma}{1-L} \tag{20}$$

where $\gamma:=\alpha e_{\textsf{T}}+\sigma$ and $L:=\sup_{t}\{L_{t}\}$ .

Proof. See Appendix -C.
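The asymptotic bound $\gamma/(1-L)$ can be checked on a simple scalar example. The sketch below assumes hypothetical maps $\textsf{T}_{t}(x)=\rho x+(1-\rho)c_{t}$ with $c_{t}=\sin(0.1t)$, so that the fixed point is $c_{t}$, the drift satisfies $\sigma\leq 0.1$, and $\textsf{F}_{t}$ has Lipschitz constant $L=1-\alpha+\alpha\rho$; a bounded error of magnitude $e_{\textsf{T}}$ is added to each map evaluation. All names and values are assumptions for illustration.

```python
import math

def track(T_steps, alpha=0.5, rho=0.5, e_T=0.02, x1=5.0):
    """Inexact running KM for scalar maps T_t(x) = rho*x + (1 - rho)*c_t, c_t = sin(0.1 t).

    Returns the tracking errors |x_t - c_t| for t = 1, ..., T_steps.
    """
    x, errors = x1, []
    for t in range(1, T_steps + 1):
        c = math.sin(0.1 * t)
        errors.append(abs(x - c))
        T_hat = rho * x + (1 - rho) * c + e_T * (-1) ** t   # bounded map error
        x = (1 - alpha) * x + alpha * T_hat
    return errors

alpha, rho, e_T, sigma = 0.5, 0.5, 0.02, 0.1
L = 1 - alpha + alpha * rho                  # Lipschitz constant of F_t
bound = (alpha * e_T + sigma) / (1 - L)      # gamma / (1 - L)
errors = track(400)
print(max(errors[200:]) <= bound)
```

After the initial transient has decayed geometrically, the tracking error stays below the bound even though neither the error nor the drift vanishes.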

Bound (20) is similar to the one in [22], but customized for the operators considered here. In case of vanishing errors and dynamics, the following results readily hold.

Corollary 3

Suppose that (14) holds. Then, if Assumption 2 holds with $L_{t}<1$ for all $t$, one has that

$$\limsup_{t\rightarrow\infty}\|{\bf x}_{t}-{\bf x}^{\star}_{t}\|\leq\frac{\sigma}{1-L}. \tag{21}$$

Additionally, if (17) holds, then $\limsup_{t\rightarrow\infty}\|{\bf x}_{t}-{\bf x}^{\star}_{t}\|=0$.

Remark 1

When a predictable sequence is available, one could reduce the error ball $r$ to a sublinear function of $T$ by properly tuning the sequence $\{\alpha_{t}\}$ , even if $\sigma_{t}$ and $e_{\textsf{T}}$ do not vanish; see, for example, the framework in [23] for adaptive optimistic mirror descent methods. Due to space limitations, we leave the derivation of these results for future efforts.

Remark 2

Proof techniques in [24] presuppose particular sequences $\{\alpha_{t}\}$ and $\{e_{\textsf{T},t}\}$ to establish convergence results for, e.g., static non-expansive and strictly pseudocontractive maps (see, e.g., Theorems 6.1 and 6.2) as well as for (static) maps defined in Banach spaces (see, e.g., Theorem 6.8). Adopting the sequences $\{\alpha_{t}\}$ in [24] might not be possible in a time-varying setting, especially when $\alpha_{t}\rightarrow 0$ for $t\rightarrow\infty$; however, future efforts will look at possible extensions of the techniques in [24] to the time-varying case.

IV Examples of applications

The objective of this section is to show that a number of inexact running algorithms for time-varying optimization problems can be analyzed by leveraging the operator-based framework proposed in this paper. In particular, this section focuses on inexact running gradient methods and forward-backward splitting algorithms. Additional applications are possible [3], but are not included due to space limitations.

IV-A Running gradient method with errors

Recall that the temporal index is discretized as $th$ , $t\in\mathbb{N}$ , with $h$ a given interval (that can coincide with the time required to perform one algorithmic step). Consider the following time-varying optimization problem

$$(\textrm{P1}_{t}):\quad\min_{{\bf x}\in{\cal X}_{t}}f_{t}({\bf x}) \tag{22}$$

where $f_{t}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a convex, closed, and proper (CCP) function at each time $t$, and ${\cal X}_{t}$ is a convex and compact set at each time $t$. Assume that $f_{t}$ is strongly smooth with parameter $K_{t}>0$. Notice that solving the problem (22) is equivalent to finding the zeros of $\nabla f_{t}+{\cal N}_{{\cal X}_{t}}$, where ${\cal N}_{{\cal X}_{t}}$ is the normal cone operator for the set ${\cal X}_{t}$.

A running version of the projected gradient method for solving (22) is given by:

$${\bf x}_{t+1}=\mathrm{proj}_{{\cal X}_{t}}\{{\bf x}_{t}-\nu\nabla f_{t}({\bf x}_{t})\} \tag{23}$$

for a given step size $\nu>0$ . Let ${\bf y}_{t}$ be a measurement or an estimate of the gradient $\nabla f_{t}({\bf x}_{t})$ ; then, an inexact running projected gradient method is given by:

$${\bf x}_{t+1}=\mathrm{proj}_{{\cal X}_{t}}\{{\bf x}_{t}-\nu{\bf y}_{t}\}. \tag{24}$$

In this setting, the bounds (10) and (11) will be utilized to derive tracking results for (24) for the case where the function $f_{t}$ is convex, but not strongly convex; on the other hand, (20) will be utilized for the case where $f_{t}$ is strongly convex uniformly in time.

For simplicity, focus first on the case where ${\cal X}_{t}=\mathbb{R}^{m}$ . Take $\nu\in(0,2/K)$ , with $K:=\sup_{t}\{K_{t}\}$ , so that the operator $\textsf{I}-\nu\nabla f_{t}$ is averaged; that is,

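$$\textsf{I}-\nu\nabla f_{t}=\left(1-\frac{\nu K_{t}}{2}\right)\textsf{I}+\frac{\nu K_{t}}{2}\left(\textsf{I}-\frac{2}{K_{t}}\nabla f_{t}\right) \tag{25}$$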

which is in the form of (3) with $\alpha_{t}=\nu K_{t}/2$ and $\textsf{T}_{t}=\textsf{I}-\frac{2}{K_{t}}\nabla f_{t}$ [25]. On the other hand, the approximate map $\hat{\textsf{T}}_{t}$ is given by $\hat{\textsf{T}}_{t}({\bf x}_{t})={\bf x}_{t}-\frac{2}{K_{t}}{\bf y}_{t}$ . Therefore, for the case where ${\cal X}_{t}=\mathbb{R}^{m}$ , one has that:

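$$\|\hat{\textsf{T}}_{t}({\bf x})-\textsf{T}_{t}({\bf x})\|=\left\|{\bf x}-\frac{2}{K_{t}}{\bf y}-{\bf x}+\frac{2}{K_{t}}\nabla f_{t}({\bf x})\right\|=\frac{2}{K_{t}}\|\nabla f_{t}({\bf x})-{\bf y}\|. \tag{26}$$

Therefore, if there exists a scalar $e_{y,t}<+\infty$ so that $\|\nabla f_{t}({\bf x})-{\bf y}\|\leq e_{y,t}$ [14], $e_{\textsf{T},t}$ in (6) amounts to:

$$e_{\textsf{T},t}=2K_{t}^{-1}e_{y,t}. \tag{27}$$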
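The decomposition of the gradient step into an averaged operator can be verified numerically. The sketch below compares the plain step $\textsf{I}-\nu\nabla f_{t}$ with its averaged form, using $\alpha_{t}=\nu K_{t}/2$ and $\textsf{T}_{t}=\textsf{I}-\frac{2}{K_{t}}\nabla f_{t}$, on a hypothetical quadratic cost; it also evaluates the map error $2e_{y,t}/K_{t}$ induced by a gradient error $e_{y,t}$. The cost function and all names are assumptions made for this example.

```python
def grad_step(x, grad, nu):
    """Plain gradient step (I - nu * grad f)(x) on a vector given as a list."""
    return [xi - nu * gi for xi, gi in zip(x, grad(x))]

def averaged_form(x, grad, nu, K):
    """Same step written as (1 - alpha) x + alpha T(x), alpha = nu*K/2, T = I - (2/K) grad f."""
    alpha = nu * K / 2
    Tx = [xi - (2.0 / K) * gi for xi, gi in zip(x, grad(x))]
    return [(1 - alpha) * xi + alpha * ti for xi, ti in zip(x, Tx)]

def map_error_bound(e_y, K):
    """e_T = 2 e_y / K: a gradient error e_y is scaled by 2/K inside T_hat."""
    return 2.0 * e_y / K

# Hypothetical quadratic f(x) = 0.5 * (x_1^2 + 4 x_2^2), strongly smooth with K = 4.
grad = lambda x: [x[0], 4.0 * x[1]]
x, nu, K = [1.0, -2.0], 0.4, 4.0
print(grad_step(x, grad, nu), averaged_form(x, grad, nu, K))
```

Both expressions give the same point (approximately $[0.6, 1.2]$ here), which is the content of the decomposition.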

The results for the inexact running projected gradient method are presented in the following proposition.

Proposition 1

Let $\nu\in(0,2/K)$, and let $\{{\bf x}_{t}\}$ be a sequence generated by (24). Assume that there exists a scalar $e_{y,t}<+\infty$ so that $\|\nabla f_{t}({\bf x})-{\bf y}\|\leq e_{y,t}$. Then, one has that (24) is an inexact averaged operator with $\alpha_{t}=1/(2-\nu K_{t}/2)$ and

$$e_{\textsf{T},t}=(2\nu-\nu^{2}K_{t}/2)\,e_{y,t}. \tag{28}$$

For the algorithm (23):

(i) The bounds (10), (11), and (18) hold with $e_{\textsf{T},t}$ as in (28);

(ii) Suppose further that $f_{t}$ is strongly convex with constant $k_{t}$; then, (20) holds with $L_{t}=\max\{|1-\nu k_{t}|,|1-\nu K_{t}|\}$.

Proof. See Appendix -D.
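A minimal sketch of the inexact running projected gradient method on a made-up problem is given below. It assumes a separable quadratic cost $f_{t}({\bf x})=\frac{1}{2}(x_{1}-c_{t,1})^{2}+2(x_{2}-c_{t,2})^{2}$ (so $k_{t}=1$, $K_{t}=4$), a box constraint, a minimizer $c_{t}$ moving on a circle (drift at most $0.04$ per step), and a gradient estimate perturbed by an error of norm $e_{y}$. The contraction factor used is $\max\{|1-\nu k|,|1-\nu K|\}$, the standard Lipschitz constant of a gradient step, and the error entering the update is $\alpha_{t}e_{\textsf{T},t}=\nu e_{y}$; these modelling choices and all names are assumptions of the sketch.

```python
import math

def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n (a simple convex, compact set)."""
    return [min(max(xi, lo), hi) for xi in x]

def run_projected_gradient(T_steps, nu=0.4, e_y=0.05):
    """Inexact running projected gradient: x_{t+1} = proj{x_t - nu * y_t}.

    Returns the tracking errors ||x_t - x_t^*||, where x_t^* = c_t lies inside the box.
    """
    x, errors = [1.0, -1.0], []
    for t in range(1, T_steps + 1):
        c = [0.8 * math.cos(0.05 * t), 0.8 * math.sin(0.05 * t)]
        errors.append(math.hypot(x[0] - c[0], x[1] - c[1]))
        grad = [x[0] - c[0], 4.0 * (x[1] - c[1])]
        y = [grad[0] + e_y * (-1) ** t, grad[1]]        # noisy gradient, error norm e_y
        x = project_box([xi - nu * yi for xi, yi in zip(x, y)])
    return errors

nu, e_y, sigma, k, K = 0.4, 0.05, 0.04, 1.0, 4.0
L = max(abs(1 - nu * k), abs(1 - nu * K))     # contraction factor of I - nu * grad f_t
bound = (nu * e_y + sigma) / (1 - L)          # gamma / (1 - L)
errors = run_projected_gradient(300)
print(max(errors[200:]) <= bound)
```

In this example the steady-state tracking error remains below $\gamma/(1-L)=0.15$, in agreement with the asymptotic bound for the contractive case.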

IV-B Inexact forward-backward splitting method

Consider the following time-varying problem [19]

$$(\textrm{P2}_{t}):\quad\min_{{\bf x}\in{\cal X}_{t}}f_{t}({\bf x})+g_{t}({\bf x}) \tag{29}$$

where $f_{t}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ and $g_{t}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ are CCP functions at each time $t$ , and ${\cal X}_{t}$ is a convex and compact set at each time $t$ . Assume that $f_{t}$ is strongly smooth with parameter $K_{t}>0$ for all $t$ , and suppose that $g_{t}$ is not differentiable.

A running version of the forward-backward splitting method for solving (29) is given by:

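$${\bf x}_{t+1}=\mathrm{prox}_{g_{t},{\cal X}_{t},\nu}\{{\bf x}_{t}-\nu\nabla f_{t}({\bf x}_{t})\} \tag{30}$$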

where

[TABLE]

is the proximal operator. If $\nu\in(0,2/K)$, then the update (30) is given by the composition of a proximal operator and the operator $\textsf{I}-\nu\nabla f_{t}$. The proximal operator is $\frac{1}{2}$-averaged [25, 3], whereas $\textsf{I}-\nu\nabla f_{t}$ is an averaged operator with $\alpha_{t}=\nu K_{t}/2$, whenever $\nu\in(0,2/K)$. Therefore, since the composition of averaged operators is an averaged operator, it follows from [25] that (30) is an averaged operator with $\alpha_{t}=1/(2-\nu K_{t}/2)$.
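The value $\alpha_{t}=1/(2-\nu K_{t}/2)$ can be cross-checked with a short computation. The sketch below assumes the composition rule $\alpha=(\alpha_{1}+\alpha_{2}-2\alpha_{1}\alpha_{2})/(1-\alpha_{1}\alpha_{2})$ for the composition of an $\alpha_{1}$-averaged and an $\alpha_{2}$-averaged operator; this formula is brought in here as an assumption and is not quoted from the text above.

```python
def compose_averaged(a1, a2):
    """Averagedness constant of the composition of an a1- and an a2-averaged operator,
    using the (assumed) rule (a1 + a2 - 2*a1*a2) / (1 - a1*a2)."""
    return (a1 + a2 - 2 * a1 * a2) / (1 - a1 * a2)

def forward_backward_alpha(nu, K):
    """alpha_t = 1 / (2 - nu*K/2), as stated for the forward-backward update."""
    return 1.0 / (2.0 - nu * K / 2.0)

# Proximal operator: 1/2-averaged; gradient step: (nu*K/2)-averaged.
nu, K = 0.4, 4.0
print(compose_averaged(0.5, nu * K / 2), forward_backward_alpha(nu, K))
```

With $\alpha_{1}=1/2$ and $\alpha_{2}=\nu K_{t}/2$, the numerator reduces to $1/2$ and the denominator to $1-\nu K_{t}/4$, which gives the stated constant.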

An inexact version of the running forward-backward splitting method for solving (29) is given by:

[TABLE]

where ${\bf y}_{t}$ is a measurement or an estimate of $\nabla f_{t}({\bf x}_{t})$. Assuming that there exists a scalar $e_{y,t}<+\infty$ so that $\|\nabla f_{t}({\bf x})-{\bf y}\|\leq e_{y,t}$, results similar to Proposition 1 apply to the inexact running forward-backward splitting method (32). In particular, (10) and (11) bound the tracking error for (32) when the function $f_{t}$ is not strongly convex.
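A scalar sketch of an inexact forward-backward iteration is given below. It assumes the unconstrained case, a smooth term $f_{t}(x)=\frac{1}{2}(x-c_{t})^{2}$, a non-differentiable term $g(x)=\lambda|x|$ whose proximal operator is the soft-thresholding map, and the standard form $x_{t+1}=\mathrm{prox}_{\nu g}(x_{t}-\nu y_{t})$ with a noisy gradient $y_{t}$. These choices, and every name in the code, are illustrative assumptions rather than the paper's setup.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

def inexact_forward_backward(c_seq, lam=0.2, nu=0.5, e_y=0.0, x1=0.0):
    """x_{t+1} = prox_{nu*g}(x_t - nu*y_t) for f_t(x) = 0.5*(x - c_t)^2, g(x) = lam*|x|.

    y_t = (x_t - c_t) + e_y*(-1)^t is a gradient estimate with error magnitude e_y.
    """
    x, traj = x1, [x1]
    for t, c in enumerate(c_seq, start=1):
        y = (x - c) + e_y * (-1) ** t
        x = soft_threshold(x - nu * y, nu * lam)
        traj.append(x)
    return traj

exact = inexact_forward_backward([1.0] * 60)             # static cost, exact gradients
noisy = inexact_forward_backward([1.0] * 60, e_y=0.1)    # static cost, noisy gradients
print(exact[-1], noisy[-1])
```

With a static cost and exact gradients the iterates approach $0.8$, the soft-thresholded value of $c=1$; with gradient errors of magnitude $0.1$ they remain within $\nu e_{y}/(1-L)=0.1$ of it, where $L=|1-\nu|=0.5$ for this cost.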

V Illustrative Numerical Results

As an illustrative example, we consider the network in Fig. 1 with 6 nodes and 8 links. The routing matrix is based on the directed edges. Let $z(i,s)$ denote the rate generated at node $i$ for traffic $s$ and $r(ij,s)$ the flow between nodes $i$ and $j$ for traffic $s$. Consider then the following problem:

[TABLE]

where ${\bf z}$ and ${\bf r}$ stack the traffic rates and link rates for brevity, $\kappa(i,s)$ and ${\bf a}$ are given positive coefficients and the set ${\cal X}_{t}$ is built based on: i) the flow-conservation constraints ${\bf z}_{s}={\bf T}({\bf r}^{s}+{\bf w}^{s}_{t})$ per flow $s$ , where ${\bf T}$ is the routing matrix and ${\bf w}^{s}_{t}$ is a time-varying exogenous flow (of uncontrollable traffic); ii) the per-link capacity constraints, where the capacity of link $(i,j)$ is given by $\log(1+p(i,j)h(i,j))$ , with $p(i,j)$ the transmit power and $h(i,j)$ the normalized channel gain; and, iii) the non-negativity constraints on the traffic rates. Assume that two traffic flows are generated by nodes $1$ and $4$ , and they are received at nodes $3$ and $6$ , respectively.

We utilize (24). Errors and time variability of the problem are introduced as follows:

$\bullet$ Gradient errors: the gradient of the cost $\kappa(i,s)\log(1+z(i,s))$ for each exogenous traffic flow is estimated using a multi-point bandit feedback [26, 15]; the estimation error depends on the number of functional evaluations in constructing the proxy of the gradient in (24).

$\bullet$ Solution dynamics: at each time step, the channel gains of the links are generated by using a complex Gaussian random variable with mean $1+\jmath 1$ and a given variance $v_{c}$ for both real and imaginary parts; the transmit power for each node is a Gaussian random variable with mean $1$ and a variance $v_{p}$; the exogenous traffic flows are random with mean $[0.2,0.3,0.3,0.4,0.5,0.2,0.1,0.4]$ and a given variance; and, the cost is perturbed by modifying ${\bf a}_{t}$. Different values for $\sigma_{t}$ and $\sigma$ are obtained by varying the variance of these random variables.

Figure 2 illustrates the evolution of the fixed-point residual $(1/T)\sum_{t=1}^{T}\|{\bf x}_{t}-\textsf{F}_{t}({\bf x}_{t})\|^{2}$, for different values of $\sigma$ and of the normalized error in the gradient estimate $e_{y}$. Optimal rates are in the order of $0.6-1.7$; $\sigma=0.7$ implies a $20\%$ worst-case variation in the solution between consecutive time steps, while $\sigma=0.03$ leads to a $1\%$ variation. It can be seen that the fixed-point residual flattens, with an error that increases as $\sigma$ and $e_{y}$ increase, thus corroborating the proposed analytical results.

-A Proof of Theorem 1

Consider $\|{\bf x}_{t+1}-{\bf x}^{\star}_{t+1}\|^{2}$ , which can be bounded as follows by using the definition of $\sigma_{t}$ :

[TABLE]

The term $\|{\bf x}_{t+1}-{\bf x}^{\star}_{t}\|^{2}$ can be expanded as:

[TABLE]

Let ${\bf a}_{t}:=(1-\alpha_{t}){\bf x}_{t}+\alpha_{t}\textsf{T}_{t}({\bf x}_{t})-{\bf x}^{\star}_{t}$ for brevity. Then, (35c) can be further bounded as:

[TABLE]

To bound $\|{\bf a}_{t}\|^{2}$ , consider the following inequality, valid for any vectors ${\bf x}\in\mathbb{R}^{2}$ , ${\bf y}\in\mathbb{R}^{2}$ and scalar $\theta$ :

[TABLE]

Then, using (37) and the fact that ${\bf x}^{\star}_{t}=(1-\alpha_{t}){\bf x}^{\star}_{t}+\alpha_{t}\textsf{T}_{t}({\bf x}^{\star}_{t})$ , one has that:

[TABLE]

where the non-expansiveness of $\textsf{T}_{t}$ was used to obtain (38d). To bound $\|{\bf a}_{t}\|$ , it follows from Assumption 3 that:

[TABLE]

Regarding the third term on the right-hand-side of (34c), one can show that:

[TABLE]

Therefore, using (38d), (39b) in (36c) and (40d), one obtains the following bound:

[TABLE]

or, equivalently,

[TABLE]

Summing (42) over $t=1,2,\ldots,T$ yields (10).

-B Proof of Corollary 2

Note that (14) implies that $\lim_{T\rightarrow\infty}\frac{1}{T}\sum_{t=1}^{T}e_{\textsf{T},t}^{2}=0$ as:

[TABLE]

implying that $\sum_{t=1}^{T}e_{\textsf{T},t}^{2}\leq e_{\textsf{T}}o(T)$ . Then, (12) can be shown from Theorem 1.

-C Proof of Theorem 2

Bound $\|{\bf x}_{t+1}-{\bf x}^{\star}_{t+1}\|$ as:

[TABLE]

where the definition of $\sigma_{t}$ was used in (44c) and Assumption 2 was utilized to obtain (44e). Therefore,

[TABLE]

Applying (45) recursively for $\tau=1,\ldots,t$ yields (18).

Next, take $\gamma:=\sup_{t}\{\alpha_{t}e_{\textsf{T},t}\}+\sup_{t}\{\sigma_{t}\}$ and $L:=\sup_{t}\{L_{t}\}$ , where $L<1$ . Then, (18) is upper bounded by

[TABLE]

where $\bar{c}^{(t,\tau)}=1$ if $\tau=t$ and $\bar{c}^{(t,\tau)}=L^{t-\tau+1}$ if $\tau=1,\ldots,t-1$. The first term on the right-hand side of (46) vanishes as $t$ increases. The second term on the right-hand side is the sum of the first $t$ terms of a geometric series. Taking the limit for $t\rightarrow+\infty$, the result (20) follows.

-D Proof of Proposition 1

First, if $\nu\in(0,2/K_{t})$ for each time $t$, then the fact that $\alpha_{t}=1/(2-\nu K_{t}/2)$ is proved in [25, Proposition 2.4]. The exact and approximate maps $\textsf{T}_{t}$ and $\hat{\textsf{T}}_{t}$ can be expressed as:

[TABLE]

Therefore, using the non-expansive property of the projection operator, one has that:

[TABLE]

Using $\alpha_{t}=1/(2-\nu K_{t}/2)$ and the bound for $\|\nabla f_{t}({\bf x})-{\bf y}_{t}\|$ , the result (i) follows. The result for (ii) builds on the strong convexity and strong smoothness of $f_{t}$ ; when $\nu\in(0,2/K)$ , then the operator $\textsf{I}-\nu\nabla f_{t}$ is contractive, and the composition of a contractive operator and a non-expansive one is contractive [25].

Bibliography

[1] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2011, vol. 408.
[2] P. Combettes and T. Pennanen, “Generalized Mann iterates for constructing fixed points in Hilbert spaces,” Journal of Mathematical Analysis and Applications, vol. 275, no. 2, pp. 521–536, 2002.
[3] E. K. Ryu and S. Boyd, “Primer on monotone operator methods,” Appl. Comput. Math., vol. 15, no. 1, pp. 3–43, Jan. 2016.
[4] A. Simonetto, “Time-varying convex optimization via time-varying averaged operators,” 2017. [Online]. Available: https://arxiv.org/abs/1704.07338
[5] S. Mou, J. Liu, and A. S. Morse, “A distributed algorithm for solving a linear algebraic equation,” IEEE Trans. on Automatic Control, vol. 60, no. 11, pp. 2863–2878, Nov. 2015.
[6] G. Belgioioso, F. Fabiani, F. Blanchini, and S. Grammatico, “On the convergence of discrete-time linear systems: A linear time-varying Mann iteration converges iff its operator is strictly pseudocontractive,” IEEE Control Systems Letters, vol. 2, no. 3, pp. 453–458, July 2018.
[7] R. Cominetti, J. A. Soto, and J. Vaisman, “On the rate of convergence of Krasnoselski-Mann iterations and their connection with sums of Bernoullis,” Israel Journal of Mathematics, vol. 199, no. 2, pp. 757–772, 2014.
[8] A. Themelis and P. Patrinos, “SuperMann: a superlinearly convergent algorithm for finding fixed points of nonexpansive operators,” arXiv:1609.06955, 2016.
