On the Convergence of the Inexact Running Krasnosel'skii-Mann Method
Emiliano Dall'Anese, Andrea Simonetto, Andrey Bernstein

TL;DR
This paper analyzes the convergence of an inexact, evolving version of the Krasnosel'skii-Mann method, providing theoretical guarantees for fixed-point tracking under imperfect information and dynamic maps.
Contribution
It introduces a framework for analyzing inexact, running Krasnosel'skii-Mann algorithms with evolving maps and imperfect data, extending convergence results to these settings.
Findings
Convergence of the average fixed-point residual in non-expansive cases.
Linear convergence to a fixed-point trajectory under contractive operators.
Applicability to inexact gradient and forward-backward splitting methods.
Emiliano Dall’Anese (1), Andrea Simonetto (2), Andrey Bernstein (3). (1) E. Dall’Anese is with the University of Colorado Boulder. (2) A. Simonetto is with IBM Research Ireland. (3) A. Bernstein is with the National Renewable Energy Laboratory (NREL). The work of E. Dall’Anese was supported by NREL via APUP UGA-0-41026-109. Funds for A. Bernstein were provided by ARPA-e NODES.
Abstract
This paper leverages a framework based on averaged operators to tackle the problem of tracking fixed points associated with maps that evolve over time. In particular, the paper considers the Krasnosel’skiĭ-Mann method in a setting where: (i) the underlying map may change at each step of the algorithm, thus leading to a “running” implementation of the Krasnosel’skiĭ-Mann method; and, (ii) only imperfect information about the map may be available. Imperfect knowledge of the maps can capture cases where processors feature finite precision or quantization errors, or the case where (part of) the map is obtained from measurements. The analytical results are applicable to inexact running algorithms for solving optimization problems, whenever the algorithmic steps can be written in the form of (a composition of) averaged operators; examples are provided for inexact running gradient methods and the forward-backward splitting method. Convergence of the average fixed-point residual is investigated for the non-expansive case; linear convergence to a unique fixed-point trajectory is shown in the case of inexact running algorithms emerging from contractive operators.
I Introduction and Problem Formulation
The Banach-Picard method and its Krasnosel’skiĭ-Mann (KM) variant have been leveraged to establish convergence of a number of iterative algorithmic frameworks for solving convex optimization problems as well as problems associated with (non)linear systems [1, 2, 3, 4, 5]. Focusing on the KM method, recall that an operator $T : D \to D$, where $D$ is a nonempty convex subset of a finite-dimensional Hilbert space with a given norm $\|\cdot\|$, is non-expansive if it is $1$-Lipschitz in $D$; that is, one has that $\|T(x) - T(y)\| \le \|x - y\|$ for all $x, y \in D$. The KM algorithm involves the sequential application of the following operator starting from a point $x_0$ in $D$ (with $i$ the iteration index):
$$x_{i+1} = F_i(x_i), \qquad F_i := (1 - \lambda_i) I + \lambda_i T, \tag{1}$$
with $I$ the identity operator and $\{\lambda_i\}$ a sequence in $[0, 1]$ satisfying $\sum_i \lambda_i (1 - \lambda_i) = +\infty$ [1]. Based on (1), convergence of iterative algorithms for solving optimization problems can be cast as the problem of finding fixed points of a properly constructed non-expansive map T (which are also fixed points of F). As another example, the operator-based representation (1) can be utilized to investigate convergence of discrete-time linear systems [6].
The KM method (1) is known to converge weakly to a fixed point of T [1, 7, 6, 8]; in particular, taking the case of a constant value $\lambda_i = \lambda \in (0, 1)$ as an example, one has that the average fixed-point residual of the map T after $k$ iterations can be bounded as [1, 7]:
$$\frac{1}{k} \sum_{i=0}^{k-1} \|T(x_i) - x_i\|^2 \le \frac{\|x_0 - x^*\|^2}{k \, \lambda (1 - \lambda)}, \tag{2}$$
with $x^*$ a fixed point. See also the inexact [9] and stochastic [10] variants, as well as more results on convergence of algorithms involving averaged non-expansive operators [11].
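As a minimal numerical sketch of the static KM iteration with a constant relaxation parameter, the Python snippet below applies it to a 90-degree rotation of the plane, which is non-expansive and has the origin as its only fixed point. The map `rotate90`, the value `lam = 0.5`, the starting point, and all function names are illustrative assumptions rather than choices made in the paper. The quantity `bound` uses the standard telescoping estimate, in which the sum over iterations of lam * (1 - lam) * ||T(x_i) - x_i||^2 is at most ||x_0 - x*||^2; the paper's own indexing of the bound may differ.

```python
def km_iterates(T, x0, lam, num_iters):
    """Return [x_0, ..., x_num_iters] with x_{i+1} = (1 - lam) * x_i + lam * T(x_i)."""
    xs = [x0]
    for _ in range(num_iters):
        x = xs[-1]
        Tx = T(x)
        xs.append(tuple((1.0 - lam) * a + lam * b for a, b in zip(x, Tx)))
    return xs


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rotate90(x):
    """Rotation by 90 degrees: an isometry (hence non-expansive) fixing only the origin."""
    return (-x[1], x[0])


lam = 0.5            # constant relaxation parameter (assumed value)
k = 50               # number of KM iterations
x0 = (1.0, 0.0)
x_star = (0.0, 0.0)  # the unique fixed point of rotate90

xs = km_iterates(rotate90, x0, lam, k)

# Average squared fixed-point residual over the first k iterates.
avg_residual = sum(sq_dist(rotate90(x), x) for x in xs[:k]) / k
# Telescoping bound for a constant relaxation parameter.
bound = sq_dist(x0, x_star) / (k * lam * (1.0 - lam))

# For contrast, the plain Banach-Picard iteration x_{i+1} = T(x_i) on the same
# map cycles with period four and never approaches the fixed point.
picard_point = x0
for _ in range(4):
    picard_point = rotate90(picard_point)
```

In this particular example the bound is essentially tight, because the rotation preserves distances to the fixed point; for other non-expansive maps the average residual can be well below `bound`.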
While (2) pertains to problems where the map T is “fixed” during the execution of the KM algorithm and is known, this paper revisits the convergence of the KM method in the case of time-varying and possibly inexact maps. This setting is motivated by recent efforts to address the design and analysis of running algorithms for time-varying optimization problems [12, 13, 4, 14], with particular emphasis on feedback-based online optimization [14, 15]; additional works along these lines are in the context of online optimization (see the representative works [16, 17, 18] and references therein) and learning in dynamic environments [19, 20]. In a time-varying optimization setting, the underlying cost, constraints, and problem inputs may change at every step (or every few steps) of the algorithm; therefore, pertinent tasks in this case involve the derivation of results for the tracking of optimal solution trajectories. Updates of the algorithms may be implemented inexactly due to finite precision [21] or because measurement feedback is utilized in lieu of model-based gradient computations [14]. Counterparts of (2) are of interest for inexact running algorithms for problems with time-varying cost functions that are (locally) convex but not strongly convex; in the case of problems with (locally) strongly convex costs, contractive arguments can be leveraged.
To concretely outline the problem, consider discretizing the temporal index as $t_k = kh$, $k \in \mathbb{N}$, with $h > 0$ a given interval (that will coincide with the time required to evaluate a map). Taking the normed space $\mathbb{R}^n$ for the rest of the paper, consider a convex and closed set $D \subseteq \mathbb{R}^n$ and a sequence of non-expansive mappings $\{T_k : D \to D\}_{k \in \mathbb{N}}$. In particular, assume that $F_k$ is $\alpha$-averaged; that is, it is a convex combination
$$F_k := (1 - \alpha) I + \alpha T_k, \tag{3}$$
with $\alpha \in (0, 1)$. Starting from $x_0 \in D$, the running KM method amounts to the execution of the following step at each $t_k$:
$$x_{k+1} = F_k(x_k). \tag{4}$$
Different from the “batch” KM method (especially when a Mann sequence is utilized), where (1) is executed within an interval $h$ until convergence, the running algorithm (4) boils down to a sequential application of time-varying $\alpha$-averaged maps. Preliminary results for the convergence of (4) were provided in [4].
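The difference between the batch and the running implementation can be sketched in a few lines of Python. In the hypothetical example below, the map at step k is a 90-degree rotation about a moving center `c_k`, so each map is non-expansive with `c_k` as its unique fixed point, and a single averaged step is taken per time instant. The drift `delta`, the averaging parameter `alpha = 0.5`, and the function names are assumptions made only for illustration.

```python
import math


def rotate_about(center, x):
    """Rotate x by 90 degrees about `center`; non-expansive, unique fixed point `center`."""
    dx, dy = x[0] - center[0], x[1] - center[1]
    return (center[0] - dy, center[1] + dx)


def running_km(centers, x0, alpha):
    """Apply one alpha-averaged step per time instant, with a new map at every step."""
    x = x0
    errors = []
    for c in centers:
        Tx = rotate_about(c, x)
        x = ((1.0 - alpha) * x[0] + alpha * Tx[0],
             (1.0 - alpha) * x[1] + alpha * Tx[1])
        # Distance to the fixed point of the map that was just applied.
        errors.append(math.hypot(x[0] - c[0], x[1] - c[1]))
    return x, errors


delta = 0.01                                        # drift of the fixed point per step
centers = [(delta * k, 0.0) for k in range(200)]    # fixed points moving along a line
x_final, errors = running_km(centers, (1.0, 1.0), alpha=0.5)

# With alpha = 0.5 the averaged map shrinks distances to c_k by 1/sqrt(2), so
# the recursion e_k <= (e_{k-1} + delta) / sqrt(2) yields this asymptotic ball.
asymptotic_ball = delta / (math.sqrt(2.0) - 1.0)
```

Because the fixed point keeps moving, the tracking error does not vanish; in this example it settles at a level proportional to `delta`, which is the qualitative behavior that tracking results for running algorithms aim to quantify.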
The paper investigates the ability of the running algorithm (4) to track fixed points of the sequence of mappings $\{T_k\}$ when only an imperfect mapping is available. Notice that fixed points would be identified at each time $t_k$ only if the KM method (1) is executed to convergence at each $t_k$ (i.e., in a batch setting, instead of performing only one iteration) and the map $T_k$ is known. This paper derives results similar to (2) for the inexact running KM method; results are also provided for the case of vanishing errors and vanishing fixed-point dynamics. The paper further considers the case where the overall mappings are contractions, and establishes linear convergence to the unique fixed-point trajectory. The proposed framework is then exemplified for inexact running projected gradient and forward-backward splitting methods for solving time-varying convex optimization problems. Overall, the paper provides contributions over our previous work [22] on the running Banach-Picard method, where linear convergence results were established in the case of time-varying contractive maps, possibly corrupted by errors. Stochastic time-varying fixed-point problems were considered in [20, Th. 20]; here, we focus on bounded errors on averaged operators, and leave stochastic errors as a follow-on research opportunity.
II Inexact Running Algorithm
Let $x_k^*$ be a fixed point of the self-mapping $T_k$; that is, $x_k^* = T_k(x_k^*)$. If the vectors $\{x_k^*\}$ satisfy the equation $x_k^* = T_k(x_k^*)$ for each $k$, then we refer to $\{x_k^*\}$ as a sequence of fixed points. If the mappings are averaged, multiple sequences may exist; since $F_k = (1 - \alpha) I + \alpha T_k$, $x_k^*$ is also a fixed point of $F_k$. When $\{F_k\}$ are contractions, only one sequence exists by the Banach fixed-point theorem. To characterize the variability of a fixed-point sequence, we assume that there exists a sequence of fixed points $\{x_k^*\}$, for which there exists a finite and non-negative sequence of scalars $\{\delta_k\}$, such that
$$\|x_{k+1}^* - x_k^*\| \le \delta_k \tag{5}$$
for all $k$. If $\delta_k = 0$ for all $k$, then one has that $x_{k+1}^* = x_k^*$, and we recover the time-invariant case.
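Consider now a mapping $\hat{T}_k : D \to D$, which is an approximation of $T_k$ in the following sense.
Assumption 1 (Bounded approximation error)
For each $k$ and for all $x \in D$, it holds that $\hat{T}_k(x) \in D$. Further, there exists a scalar $e_k < +\infty$ such that
$$\|\hat{T}_k(x) - T_k(x)\| \le e_k \quad \text{for all } x \in D. \tag{6}$$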
The condition (6) simply asserts that the error in the map is bounded; it can be deterministic or stochastic (and i.i.d. over time), but with finite support. Accordingly, define the approximate $\alpha$-averaged map as:
$$\hat{F}_k := (1 - \alpha) I + \alpha \hat{T}_k. \tag{7}$$
Based on (7), and given an initial point $x_0 \in D$, the inexact running KM algorithm is given by [cf. (4)]:
$$x_{k+1} = \hat{F}_k(x_k). \tag{8}$$
In the next section, tracking of a sequence of fixed points via (8) will be investigated.
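To make the inexact running iteration concrete, the following Python sketch uses a hypothetical sequence of maps given by projections onto a disc whose center drifts by `delta` per step; each projection is non-expansive and has a whole disc of fixed points, so the maps are not contractions. The map evaluation is corrupted by a deterministic perturbation of norm `err`, standing in for a bounded approximation error. All names and numerical values are assumptions for illustration. For this specific example one can check directly that the residual r_k = ||T_k(x_k) - x_k|| obeys r_{k+1} <= (1 - alpha) * r_k + alpha * err + delta, which gives the asymptotic level `err + delta / alpha`; this is an example-specific estimate, not the bound of the paper.

```python
import math


def project_onto_disc(center, radius, x):
    """Euclidean projection of x onto the closed disc with the given center and radius."""
    dx, dy = x[0] - center[0], x[1] - center[1]
    dist = math.hypot(dx, dy)
    if dist <= radius:
        return x
    scale = radius / dist
    return (center[0] + scale * dx, center[1] + scale * dy)


def inexact_running_km(x0, alpha, radius, delta, err, num_steps):
    """One averaged step per time instant, using a perturbed evaluation of the map."""
    x = x0
    residuals = []
    for k in range(num_steps):
        center = (delta * k, 0.0)                   # disc center drifts by delta
        Tx = project_onto_disc(center, radius, x)   # exact map T_k(x_k)
        residuals.append(math.hypot(Tx[0] - x[0], Tx[1] - x[1]))
        # Approximate map: exact value plus a perturbation of norm exactly `err`.
        T_hat = (Tx[0] + err * math.cos(k), Tx[1] + err * math.sin(k))
        x = ((1.0 - alpha) * x[0] + alpha * T_hat[0],
             (1.0 - alpha) * x[1] + alpha * T_hat[1])
    return x, residuals


alpha, radius, delta, err = 0.5, 0.5, 0.01, 0.02
x_final, residuals = inexact_running_km((2.0, 2.0), alpha, radius, delta, err, 300)

residual_level = err + delta / alpha                # example-specific asymptotic level
tail = residuals[-100:]
mean_sq_tail = sum(r * r for r in tail) / len(tail)
```

The residual stops decreasing once it reaches a level set by the map error and the drift, and it would shrink to zero only if both `err` and `delta` vanished.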
III Convergence
This section will characterize the performance of the inexact running KM method in two different settings:
i) The map $T_k$ is non-expansive and $F_k$ is $\alpha$-averaged; and,
ii) The map $F_k$ is a contraction.
It is worth pointing out that for generic non-expansive maps, the sequence generated by the Banach-Picard iteration may fail to produce a fixed point even in a static case; the structure of (8) will however facilitate the derivation of convergence results. Regarding the second case, notice that if $T_k$ is contractive then $F_k$ is contractive; however, the converse is not necessarily true. We start by outlining the following standard assumptions [1, 7].
Assumption 2 (Lipschitz maps)
There exists a scalar such that for all .
Assumption 3 (Bounded maps)
There exists a scalar such that
[TABLE]
If is compact, then can be taken, in the worst case, to be the radius of . For subsequent developments, define , , , and . The following result pertains to the case where is -averaged.
Theorem 1
Consider a sequence of -averaged operators , , and assume that there exists a sequence of vectors that satisfy the equation for each . Suppose that Assumptions 1–3 hold, and take . Then, the following bound holds for the algorithm (8):
[TABLE]
where . In particular, one has that:
[TABLE]
[TABLE]
with .
Proof. See Appendix -A
Bounds (11)–(12) imply convergence in mean of the fixed-point residual to a ball centered at the origin; the size of the ball depends on the bound on the variability of the fixed-point trajectories, on the size of the image of the operators, and on the approximation errors for the maps. An immediate follow-up from (11)–(12) is the following asymptotic result:
[TABLE]
where . A similar result can be derived for the mean of .
It is worth pointing out that, when , the bound in (13) reduces to , and the bounds therefore capture the effect of the approximate maps. In case of perfect mappings, (13) boils down to (2) [1, 7]. Motivated by this, the next results will deal with vanishing errors and fixed-point dynamics, which is increasingly motivated by learning in bandit settings (where the maps are learned online while the algorithm is running).
Corollary 1
Suppose [footnote 1: A relation $f(k) = o(g(k))$ signifies that for every positive constant $c$ there exists $k_0$ such that $|f(k)| \le c \, |g(k)|$ for all $k \ge k_0$.] that for each , one has that
[TABLE]
i.e., grows sublinearly in . If Assumptions 1–3 hold, then, for the algorithm (8), the fixed-point residual converges to:
[TABLE]
where .
Proof. See Appendix -B.
Corollary 2
Suppose that for each , one has that
[TABLE]
i.e., grows sublinearly in . Assume further that (14) holds. Then, under Assumptions 1–3, for the algorithm (8) one has that and .
For completeness, we now turn the attention to convergence results for contractive operators. The following holds.
Theorem 2
Consider a sequence of contractive mappings of the form , and let be the trajectory of fixed points. Let be a sequence generated by the algorithm (8), with . Suppose that Assumptions 1–3 hold. Then, at each time , it holds that:
[TABLE]
for each , where
[TABLE]
Suppose further that Assumption 2 holds with for all . Then, is unique and the following asymptotic bound holds for the algorithm (8):
[TABLE]
where and .
Proof. See Appendix -C.
Bound (20) is similar to [22], but customized for the operators considered here. In case of vanishing errors and dynamics, the following results readily hold.
Corollary 3
Suppose that (14) holds. If Assumption 2 holds with for all , then
[TABLE]
Additionally, if (17) holds, then .
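The mechanism behind linear convergence to a neighborhood of the fixed-point trajectory can be illustrated with a scalar Python sketch. It assumes a hypothetical contraction with factor `rho` around a moving fixed point `sin(0.05 * k)`, a map error of magnitude at most `eps`, and a drift of at most `delta` per step; the triangle inequality then gives a_{k+1} <= rho * a_k + eps + delta for the tracking error a_k. The constants below come from this generic recursion and are not the constants of (18)–(20), which are not reproduced here.

```python
import math


def fixed_point(k):
    """Hypothetical fixed-point trajectory; consecutive values differ by at most 0.05."""
    return math.sin(0.05 * k)


def track_contraction(rho, eps, num_steps, x0=5.0):
    """Run x_{k+1} = F_hat_k(x_k) for a scalar contraction with a bounded map error."""
    x = x0
    errors = [abs(x - fixed_point(0))]
    for k in range(num_steps):
        c = fixed_point(k)
        exact = c + rho * (x - c)              # exact contraction toward c
        x = exact + eps * math.cos(3.0 * k)    # perturbation with magnitude <= eps
        errors.append(abs(x - fixed_point(k + 1)))
    return errors


def geometric_bound(a0, rho, eps, delta, k):
    """Closed form of the recursion a_{k+1} <= rho * a_k + eps + delta."""
    return rho ** k * a0 + (eps + delta) * (1.0 - rho ** k) / (1.0 - rho)


rho, eps, delta = 0.6, 0.02, 0.05
errors = track_contraction(rho, eps, num_steps=100)
bounds = [geometric_bound(errors[0], rho, eps, delta, k) for k in range(len(errors))]
asymptotic_radius = (eps + delta) / (1.0 - rho)
```

The first term of `geometric_bound` decays geometrically with `k`, while the second term saturates at `asymptotic_radius`; if both `eps` and `delta` vanish over time, the tracking error goes to zero.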
Remark 1
When a predictable sequence is available, one could reduce the error ball to a sublinear function of by properly tuning the sequence , even if and do not vanish; see, for example, the framework in [23] for adaptive optimistic mirror descent methods. Due to space limitations, we leave the derivation of these results for future efforts.
Remark 2
Proof techniques in [24] presuppose particular sequences and to establish convergence results for e.g., static non-expansive and strictly pseudocontractive maps (see, e.g., Theorems 6.1 and 6.2) as well as for (static) maps defined in Banach spaces (see, e.g., Theorem 6.8). Adopting the sequences in [24] might not be possible in a time-varying setting, especially when for ; however, future efforts will look at possible extensions of the techniques in [24] in the time-varying case.
IV Examples of applications
The objective of this section is to show that a number of inexact running algorithms for time-varying optimization problems can be analyzed by leveraging the operator-based framework proposed in this paper. In particular, this section focuses on inexact running gradient methods and forward-backward splitting algorithms. Additional applications are possible [3], but are not included due to space limitations.
IV-A Running gradient method with errors
Recall that the temporal index is discretized as , , with a given interval (that can coincide with the time required to perform one algorithmic step). Consider the following time-varying optimization problem
[TABLE]
where is a convex, closed, and proper (CCP) function at each time , and is a convex and compact set at each time . Assume that is strongly smooth with parameter . Notice that solving the problem (22) is equivalent to finding the zeros of , where is the normal cone operator for the set .
A running version of the projected gradient method for solving (22) is given by:
[TABLE]
for a given step size . Let be a measurement or an estimate of the gradient ; then, an inexact running projected gradient method is given by:
[TABLE]
In this setting, the bounds (10) and (11) will be utilized to derive tracking results for (24) for the case where the function is convex, but not strongly convex; on the other hand, (20) will be utilized for the case where is strongly convex uniformly in time.
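A minimal Python sketch of an inexact running projected gradient step is given below. It assumes a hypothetical time-varying quadratic cost 0.5 * (x - b_k)' Q (x - b_k) with Q = diag(1, 4), a box constraint [-1, 1]^2, a target `b_k` that stays inside the box (so that it is the constrained minimizer), and a gradient estimate corrupted by a perturbation of norm `grad_err`. The step size, the cost, and every name are illustrative assumptions. For this strongly convex example, the gradient step contracts with factor `rho = max(|1 - gamma * mu|, |1 - gamma * L|)`, and the projection is non-expansive, which gives the example-specific tracking level computed at the end.

```python
import math

Q_DIAG = (1.0, 4.0)   # Hessian diagonal: strong convexity mu = 1, smoothness L = 4


def clip(v, lo, hi):
    """Projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def target(k):
    """Minimizer b_k of the cost at time k; it stays inside the box [-1, 1]^2."""
    return (0.5 * math.cos(0.02 * k), 0.5 * math.sin(0.02 * k))


def inexact_projected_gradient(x0, gamma, grad_err, num_steps):
    """One projected gradient step per time instant, using a perturbed gradient."""
    x = x0
    errors = []
    for k in range(num_steps):
        b = target(k)
        grad = tuple(q * (xi - bi) for q, xi, bi in zip(Q_DIAG, x, b))
        noise = (grad_err * math.cos(k), grad_err * math.sin(k))   # norm = grad_err
        grad_hat = tuple(g + n for g, n in zip(grad, noise))
        x = tuple(clip(xi - gamma * gi, -1.0, 1.0) for xi, gi in zip(x, grad_hat))
        b_next = target(k + 1)
        errors.append(math.hypot(x[0] - b_next[0], x[1] - b_next[1]))
    return x, errors


gamma, grad_err, drift = 0.25, 0.1, 0.01   # drift bounds ||b_{k+1} - b_k||
x_final, errors = inexact_projected_gradient((1.0, -1.0), gamma, grad_err, 300)

rho = max(abs(1.0 - gamma * q) for q in Q_DIAG)            # contraction factor
tracking_level = (gamma * grad_err + drift) / (1.0 - rho)  # example-specific level
```

The gradient error enters the tracking level scaled by the step size `gamma`, because the projection cannot amplify the perturbation of the gradient step.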
For simplicity, focus first on the case where . Take , with , so that the operator is averaged; that is,
[TABLE]
which is in the form of (3) with and [25]. On the other hand, the approximate map is given by . Therefore, for the case where , one has that:
[TABLE]
Therefore, if there exists a scalar so that [14], in (6) amounts to:
[TABLE]
The results for the inexact running projected gradient method are presented in the following proposition.
Proposition 1
Let , and let be a sequence generated by (24). Assume that there exists a scalar so that . Then, one has that (24) is an inexact averaged operator with and
[TABLE]
For the algorithm (23):
(i) The bounds (10), (11), and (18) hold with as in (28);
(ii) Suppose further that is strongly convex with constant ; then, (20) holds with .
Proof. See Appendix -D.
IV-B Inexact forward-backward splitting method
Consider the following time-varying problem [19]
[TABLE]
where and are CCP functions at each time , and is a convex and compact set at each time . Assume that is strongly smooth with parameter for all , and suppose that is not differentiable.
A running version of the forward-backward splitting method for solving (29) is given by:
[TABLE]
where
[TABLE]
is the proximal operator. If , then the update (30) is given by the composition of a proximal operator and the operator . The proximal operator is -averaged [25, 3], whereas is an averaged operator with , whenever . Therefore, since the composition of averaged operators is an averaged operator, it follows from [25] that (30) is an averaged operator with .
An inexact version of the running forward-backward splitting method for solving (29) is given by:
[TABLE]
where is a measurement or an estimate of . Assuming that there exists a scalar so that , results similar to Proposition 1 apply to the inexact running forward-backward splitting method (32). In particular, (10) and (11) bound the tracking error for (32) when the function is not strongly convex.
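The inexact forward-backward step can be sketched in Python for one common special case, chosen here only as an assumption: an unconstrained problem with smooth part 0.5 * ||x - b_k||^2 and non-differentiable part `weight * ||x||_1`, whose proximal operator is componentwise soft-thresholding. The data `b_k`, the step size `gamma`, the error level `grad_err`, and all names are hypothetical. In this example the minimizer at time k is the soft-thresholding of `b_k` at level `weight`, and the exact update contracts with factor `1 - gamma`, which yields the example-specific tracking level at the end.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * |.|: shrink v toward zero by tau."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def data(k):
    """Hypothetical time-varying data b_k; it moves by at most 0.015 per step."""
    return (1.5 * math.sin(0.01 * k), 0.2 * math.cos(0.01 * k))


def inexact_forward_backward(x0, gamma, weight, grad_err, num_steps):
    """Forward (perturbed gradient) step followed by a backward (proximal) step."""
    x = x0
    errors = []
    for k in range(num_steps):
        b = data(k)
        noise = (grad_err * math.cos(7.0 * k), grad_err * math.sin(7.0 * k))
        grad_hat = tuple(xi - bi + ni for xi, bi, ni in zip(x, b, noise))
        x = tuple(soft_threshold(xi - gamma * gi, gamma * weight)
                  for xi, gi in zip(x, grad_hat))
        # Minimizer of the problem at the next time instant.
        x_opt = tuple(soft_threshold(bi, weight) for bi in data(k + 1))
        errors.append(math.hypot(x[0] - x_opt[0], x[1] - x_opt[1]))
    return x, errors


gamma, weight, grad_err, drift = 0.5, 0.3, 0.05, 0.015
x_final, errors = inexact_forward_backward((2.0, 2.0), gamma, weight, grad_err, 400)

# Recursion a_{k+1} <= (1 - gamma) * a_k + gamma * grad_err + drift.
tracking_level = grad_err + drift / gamma
```

The second component of `b_k` never exceeds `weight` in magnitude, so the corresponding optimal entry is zero at all times; the iterate reaches zero in that coordinate after a few steps and stays there despite the gradient error.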
V Illustrative Numerical Results
As an illustrative example, we consider the network in Fig. 1 with 6 nodes and 8 links. The routing matrix is based on the directed edges. Let denote the rate generated at node for traffic and the flow between nodes and for traffic . Consider then the following problem:
[TABLE]
where and stack the traffic rates and link rates for brevity, and are given positive coefficients and the set is built based on: i) the flow-conservation constraints per flow , where is the routing matrix and is a time-varying exogenous flow (of uncontrollable traffic); ii) the per-link capacity constraints, where the capacity of link is given by , with the transmit power and the normalized channel gain; and, iii) the non-negativity constraints on the traffic rates. Assume that two traffic flows are generated by nodes and , and they are received at nodes and , respectively.
We utilize (24). Errors and time variability of the problem are introduced as follows:
Gradient errors: the gradient of the cost for each exogenous traffic flow is estimated using a multi-point bandit feedback [26, 15]; the estimation error depends on the number of functional evaluations in constructing the proxy of the gradient in (24).
Solution dynamics: at each time step, the channel gains of the links are generated by using a complex Gaussian random variable with mean and a given variance for both real and imaginary parts; the transmit power for each node is a Gaussian random variable with mean and a variance ; the exogenous traffic flows are random with mean and a given variance; and, the cost is perturbed by modifying . Different values for and are obtained by varying the variance of these random variables. Figure 2 illustrates the evolution of the fixed-point residual , for different values of and the normalized error in the gradient estimate . Optimal rates are in the order of ; implies a worst-case variation in the solution between consecutive time steps, while leads to a variation. It can be seen that the fixed-point residual flattens, with an error that increases with and , thus corroborating the proposed analytical results.
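To illustrate how a gradient proxy can be built from functional evaluations only, the Python sketch below implements a two-point random-direction estimator, which is one common form of multi-point bandit feedback. The estimator actually used in [26, 15] is not reproduced in this text, so the construction, the test function `cost`, the smoothing radius, and the number of directions are all assumptions. For a quadratic cost the symmetric difference equals the directional derivative exactly, and the average over uniformly distributed unit directions is an unbiased estimate of the gradient.

```python
import math
import random


def random_unit_vector(rng, n):
    """Draw a direction uniformly from the unit sphere in R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def two_point_gradient_estimate(f, x, radius, num_directions, rng):
    """Average n * (f(x + r u) - f(x - r u)) / (2 r) * u over random unit directions u."""
    n = len(x)
    estimate = [0.0] * n
    for _ in range(num_directions):
        u = random_unit_vector(rng, n)
        f_plus = f([xi + radius * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - radius * ui for xi, ui in zip(x, u)])
        slope = (f_plus - f_minus) / (2.0 * radius)
        for i in range(n):
            estimate[i] += n * slope * u[i] / num_directions
    return estimate


def cost(x):
    """Hypothetical smooth cost, known to the estimator only through its values."""
    return x[0] ** 2 + 2.0 * x[1] ** 2 + 0.5 * x[2] ** 2 - x[0]


x = [1.0, -1.0, 2.0]
true_grad = [2.0 * x[0] - 1.0, 4.0 * x[1], x[2]]
estimate = two_point_gradient_estimate(
    cost, x, radius=1e-3, num_directions=5000, rng=random.Random(0)
)
```

Each direction costs two functional evaluations, and the variance of the estimate decreases as more directions are averaged, which matches the remark that the estimation error depends on the number of functional evaluations.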
-A Proof of Theorem 1
Consider , which can be bounded as follows by using the definition of :
[TABLE]
The term can be expanded as:
[TABLE]
Let for brevity. Then, (35c) can be further bounded as:
[TABLE]
To bound , consider the following inequality, valid for any vectors , and scalar :
[TABLE]
Then, using (37) and the fact that , one has that:
[TABLE]
where the non-expansiveness of was used to obtain (38d). To bound , it follows from Assumption 3 that:
[TABLE]
Regarding the third term on the right-hand-side of (34c), one can show that:
[TABLE]
Therefore, using (38d), (39b) in (36c) and (40d), one obtains the following bound:
[TABLE]
or, equivalently,
[TABLE]
Summing (42) over yields (10).
-B Proof of Corollary 1
Note that (14) implies that as:
[TABLE]
implying that . Then, (12) can be shown from Theorem 1.
-C Proof of Theorem 2
Bound as:
[TABLE]
where the definition of was used in (44c) and Assumption 2 was utilized to obtain (44e). Therefore,
[TABLE]
Applying (45) recursively for yields (18).
Next, take and , where . Then, (18) is upper bounded by
[TABLE]
where is and is . The first term on the right-hand-side of (46) vanishes as increases. The second term on the right-hand-side is the sum of the first terms of a geometric series. Taking the limit for , the result (20) follows.
-D Proof of Proposition 1
First, for each time , the fact that is proved in [25, Proposition 2.4]. The exact and approximate maps and can be expressed as:
[TABLE]
Therefore, using the non-expansive property of the projection operator, one has that:
[TABLE]
Using and the bound for , the result (i) follows. The result for (ii) builds on the strong convexity and strong smoothness of ; when , then the operator is contractive, and the composition of a contractive operator and a non-expansive one is contractive [25].
References
- [1] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2011, vol. 408.
- [2] P. Combettes and T. Pennanen, “Generalized Mann Iterates for Constructing Fixed Points in Hilbert Spaces,” Journal of Mathematical Analysis and Applications, vol. 275, no. 2, pp. 521–536, 2002.
- [3] E. K. Ryu and S. Boyd, “Primer on monotone operator methods,” Appl. Comput. Math., vol. 15, no. 1, pp. 3–43, Jan. 2016.
- [4] A. Simonetto, “Time-varying convex optimization via time-varying averaged operators,” 2017, [Online] Available at: https://arxiv.org/abs/1704.07338.
- [5] S. Mou, J. Liu, and A. S. Morse, “A distributed algorithm for solving a linear algebraic equation,” IEEE Trans. on Automatic Control, vol. 60, no. 11, pp. 2863–2878, Nov. 2015.
- [6] G. Belgioioso, F. Fabiani, F. Blanchini, and S. Grammatico, “On the convergence of discrete-time linear systems: A linear time-varying Mann iteration converges iff its operator is strictly pseudocontractive,” IEEE Control Systems Letters, vol. 2, no. 3, pp. 453–458, July 2018.
- [7] R. Cominetti, J. A. Soto, and J. Vaisman, “On the rate of convergence of Krasnoselski-Mann iterations and their connection with sums of Bernoullis,” Israel Journal of Mathematics, vol. 199, no. 2, pp. 757–772, 2014.
- [8] A. Themelis and P. Patrinos, “Super Mann: a superlinearly convergent algorithm for finding fixed points of nonexpansive operators,” arXiv:1609.06955, 2016.
