Stochastic Primal-Dual Coordinate Method with Large Step Size for Composite Optimization with Composite Cone-constraints
Daoli Zhu, Lei Zhao

Daoli Zhu and Lei Zhao. Manuscript received February 16, 2019; revised. Daoli Zhu was with the Antai College of Economics and Management and the Sino-US Global Logistics Institute, Shanghai Jiao Tong University, 200030 Shanghai, China (e-mail: [email protected]). Lei Zhao was with the Antai College of Economics and Management, Shanghai Jiao Tong University, 200030 Shanghai, China (e-mail: [email protected]).
Abstract
We introduce a stochastic coordinate extension of the first-order primal-dual method studied by Cohen and Zhu (1984) and Zhao and Zhu (2018) to solve Composite Optimization with Composite Cone-constraints (COCC). In this method, we randomly choose a block of variables based on the uniform distribution. Applying the linearization and a Bregman-like function (core function) to the randomly selected block yields a simple parallel primal-dual decomposition for COCC. We obtain almost sure convergence and an O(1/t) expected convergence rate. A high probability complexity bound is also derived.
Index Terms:
composite optimization with composite cone-constraints, stochastic primal-dual coordinate method with large step size, augmented Lagrangian.
I Introduction
Motivated by recent applications in big data analysis, there has been explosive growth in interest in the design and analysis of block coordinate descent type (BCD-type) methods for large-scale convex optimization (see [13, 24, 35, 39]). In these applications, the datasets used for computation are very large and are often distributed across different locations. It is often impractical to assume that optimization algorithms can traverse an entire dataset once in each iteration, because doing so is either time-consuming or unreliable, and often results in low resource utilization due to the necessary synchronization among different computing units (e.g., CPUs, GPUs, and cores) in a distributed computing environment. On the other hand, BCD-type algorithms can make progress by using information obtained from a randomly selected subset of the data and thus provide much flexibility for implementation in such distributed environments. The main advantage of BCD-type methods is that they reduce the complexity and memory requirements per iteration. These benefits are increasingly important for very large-scale problems.
In this paper, we consider the nonlinear convex cone-constrained optimization problem known as Composite Optimization with Composite Cone-constraints (COCC):
[TABLE]
where is a convex smooth function on the closed convex set and is a convex, possibly nonsmooth function on . is a smooth mapping and is a possibly nonsmooth mapping from to . and are -convex, and is a nonempty closed convex cone in with vertex at the origin, that is, , for . When (the interior of ) is nonempty, the constraint corresponds to an inequality constraint. The case corresponds to an equality constraint. denotes the conjugate cone, i.e., . We note that COCC has a fully composite structure.
Assume that both and are additive with respect to the following space decomposition:
[TABLE]
I-A Related works
For problems without constraints, two variations of BCD are discussed most often. The first concerns the block-choosing strategy. One common approach is the cyclic strategy. Tseng [32] proved the convergence of BCD with a cyclic strategy. Luo and Tseng [17] and Wang and Lin [33] proved local and global linear convergence, respectively, under specific assumptions. The other approach is the randomized strategy. Nesterov [19] studied the convergence rate of randomized BCD for convex smooth optimization. Richtárik and Takáč [25] and Lu and Xiao [16] extended Nesterov’s technique to composite optimization. The second variation concerns the point read to evaluate the gradient in each iteration. If the read points have different "ages", the method is called asynchronous BCD; otherwise, it is called synchronous BCD. All the variants of BCD reviewed above are synchronous. Liu and Wright [14] and Liu et al. [15] established the convergence rate of asynchronous BCD for composite optimization and convex smooth optimization without constraints, respectively.
For problems with constraints, there are only a few works. Gao et al. [10] proposed a coordinate-type method for problems with linear coupling constraints. Necoara and Patrascu [18] proposed a random coordinate descent algorithm for an optimization problem with one linear constraint. Xu and Zhang [37] analyzed a primal-dual coordinate-type method for a linearly constrained strongly convex problem. Moreover, Xu [38] proposed an asynchronous primal-dual coordinate-type method for linearly constrained problems. For problems with nonlinear constraints, Xu [36] proposed a coordinate-type method for problems with nonlinear inequality constraints. To the best of our knowledge, there are no convergence rate results for primal-dual coordinate methods for COCC.
I-B Main contributions and outline of this paper
In this paper, we propose a Stochastic Primal-Dual Coordinate method with Large step size (SPDCL) for COCC, based on the variant auxiliary problem principle (Zhao and Zhu [41]). In this method, we randomly update one block of variables based on the uniform distribution. The sequence generated by our algorithm is proved to converge to an optimal solution of problem (P) with probability 1. The expected convergence rate is also obtained for problem (P) under the convexity assumptions, and a high probability complexity bound is derived.
The rest of this paper is organized as follows. Section II is devoted to technical preliminaries. The updating scheme of SPDCL for (P) is presented in Section III. In Section IV, we establish almost sure convergence. In Section V, the expected sublinear convergence rate and the high probability complexity bound are established.
II Preliminaries
In this section, we first provide some preliminaries that are useful for our further discussions and then summarize the notation and assumptions to be used. We denote by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ the inner product and the Euclidean norm of vectors, respectively.
II-A Notations and assumptions
Throughout this paper, we make the following standard assumptions for Problem (P):
Assumption 1
- (i)
is a convex, l.s.c. function such that , and is not necessarily differentiable. is subdifferentiable and has linearly bounded subgradients in , that is
[TABLE]
- (ii)
is convex and differentiable, and its derivative is Lipschitz with constant .
- (iii)
is a -convex mapping from to , where , ,
[TABLE]
Moreover, the derivative of exists and meets the following condition: such that ,
[TABLE]
- (iv)
is a -convex mapping from to .
- (v)
is Lipschitz with constant on an open subset containing , where
[TABLE]
- (vi)
Constraint Qualification Condition. When , we assume that
[TABLE]
For the case , we assume that .
- (vii)
There exists at least one saddle point for the Lagrangian of (P).
Conditions (i)-(iv) guarantee that (P) is a convex problem. The constraint qualification condition (vi) implies that the Lagrangian dual function is coercive and the dual optimal solution set is bounded [8]. The following subsection presents the augmented Lagrangian and a first-order primal-dual decomposition algorithm for (P).
II-B Augmented Lagrangian and first-order primal-dual decomposition algorithm
In this subsection, the Lagrangian of (P) is defined as:
[TABLE]
and a saddle point is such that
[TABLE]
Under Assumption 1, there exist saddle points of on . The dual function is defined as
[TABLE]
The function is concave and sub-differentiable. Using dual function , we consider the primal-dual pair of nonlinear convex cone optimization:
[TABLE]
The following theorem characterizes a saddle point optimality condition for the primal and dual problem.
Theorem 1
A solution with and is a saddle point for the Lagrangian function if and only if
- (i)
or the following variational inequality holds: ,
[TABLE]
- (ii)
;
- (iii)
.
Moreover, is a saddle point if and only if and are, respectively, optimal solutions to the primal and dual problems (P) and (D) with no duality gap, that is, with .
We now introduce slack variables so that problem (P) can be rewritten as an equivalent problem with equality constraints:
[TABLE]
The augmented Lagrangian for this problem is
[TABLE]
The augmented Lagrangian associated with problem (P) is defined as
[TABLE]
where $\varphi(\Theta(u),p)=\big[\|\Pi\big(p+\gamma\Theta(u)\big)\|^{2}-\|p\|^{2}\big]/(2\gamma)$ and is the projection onto .
The augmented Lagrangian dual function is as follows:
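To make the augmented term concrete, the following minimal sketch evaluates $\varphi$ numerically. It assumes, purely for illustration, that the cone is the nonnegative orthant (which is self-dual), so the projection reduces to a componentwise maximum with zero. The function names `project_nonneg` and `phi` are hypothetical and not taken from the paper.

```python
import numpy as np

def project_nonneg(x):
    """Projection onto the nonnegative orthant (an assumed, self-dual cone)."""
    return np.maximum(x, 0.0)

def phi(theta, p, gamma):
    """Augmented term [||Pi(p + gamma*theta)||^2 - ||p||^2] / (2*gamma)."""
    q = project_nonneg(p + gamma * theta)
    return (q @ q - p @ p) / (2.0 * gamma)

# theta plays the role of Theta(u): first constraint violated, second satisfied.
theta = np.array([1.0, -1.0])
p = np.zeros(2)
value = phi(theta, p, gamma=2.0)            # only the violated constraint contributes
feasible_value = phi(np.array([-1.0, -3.0]), p, gamma=2.0)  # feasible point, zero multiplier
```

With a zero multiplier, the term acts like a quadratic penalty on the violated components and vanishes at feasible points, which is the behaviour one would expect from an augmented Lagrangian term.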
[TABLE]
Using , we obtain a new primal-dual pair of nonlinear convex cone optimization problems
[TABLE]
The following theorem shows that function , dual function and augmented Lagrangian have some useful properties.
Theorem 2
Suppose Assumption 1 holds for problem (P). Then we have
- (i)
The function is convex in and concave in .
- (ii)
is differentiable in and and one has
[TABLE]
- (iii)
is concave and differentiable in , and , where .
- (iv)
and have the same sets of saddle points respectively on and .
- (v)
is stable in , that is .
Moreover, the next lemma gives another property of the augmented Lagrangian term.
Lemma 1
For all , and , we have that
[TABLE]
or
[TABLE]
Proof.
[TABLE]
For the general COCC, the augmented Lagrangian method is an approach which can overcome the instability and nondifferentiability of the dual function of the Lagrangian. Furthermore, the augmented Lagrangian of a constrained convex program has the same solution set as the original constrained convex program. The augmented Lagrangian approach for equality-constrained optimization problems was introduced in Hestenes [11] and Powell [23], and then extended to inequality-constrained problems by Buys [4].
Although the augmented Lagrangian approach (Uzawa algorithm) has several advantages, it does not preserve separability, even when the initial problem is separable. One way to decompose the augmented Lagrangian is ADMM (Fortin and Glowinski [9]). ADMM can only handle convex problems with linear constraints and is not easily parallelizable. Another way to overcome this difficulty is the Auxiliary Problem Principle of augmented Lagrangian methods (APP-AL) (Cohen and Zhu [8]), a fairly general first-order primal-dual decomposition method based on linearization of the augmented Lagrangian in nonlinear convex cone programming with separable or nonseparable, smooth or nonsmooth constraints. Zhao and Zhu (2018) [41] extended the work of Cohen and Zhu (1984) [8] and proposed a first-order primal-dual augmented Lagrangian method for COCC, the Variant Auxiliary Problem Principle (VAPP).
**Variant Auxiliary Problem Principle for solving COCC (VAPP)**
Initialize and
for , do
[TABLE]
**end for**
where is a Bregman-like function and is strongly convex with a Lipschitz gradient. Zhao and Zhu (2018) show that the sequence generated by VAPP converges to a saddle point of over . Moreover, a convergence rate is also established. In the era of big data, there has been a surge of interest in redesigning VAPP so that it can solve huge-scale optimization problems with the available computing performance.
II-C The properties of projection on convex cone
In this subsection, we introduce some properties of the projection onto convex sets (resp. convex cones). These properties are used in the following sections.
Let be a nonempty closed convex set of . For , let denote the projection onto . Then is characterized by the following two conditions [6]:
[TABLE]
Furthermore, the following proposition gives another property of projection operator which is used for convergence and convergence rate analysis.
Proposition 1
For any , the projection operator satisfies
[TABLE]
Proof. See [41].
Next, we consider the properties of the projection onto a convex cone. Let be a nonempty closed convex cone in with vertex at the origin, and let denote its conjugate cone. Let denote the projection onto and denote the projection onto . The projection is characterized by the following conditions (see Wierzbicki [34]):
[TABLE]
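A standard consequence of this kind of characterization is Moreau's decomposition: any vector splits into its projection onto a closed convex cone and its projection onto the polar of that cone, and the two parts are orthogonal. The sketch below checks this numerically under the illustrative assumption that the cone is the nonnegative orthant, whose polar is the nonpositive orthant. The helper names are hypothetical.

```python
import numpy as np

def project_cone(x):
    """Projection onto the nonnegative orthant (illustrative self-dual cone)."""
    return np.maximum(x, 0.0)

def project_polar(x):
    """Projection onto the polar cone, here the nonpositive orthant."""
    return np.minimum(x, 0.0)

x = np.array([3.0, -2.0, 0.5, -0.1])
x_plus = project_cone(x)
x_minus = project_polar(x)

recombined = x_plus + x_minus      # should reproduce x
inner = float(x_plus @ x_minus)    # should be zero (orthogonality)
```

For a general cone the two projections are not componentwise operations, but the same two identities (sum and orthogonality) remain the properties to check.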
II-D The properties of differentiable functions and mappings
Lemma 2
Let the function be convex and differentiable on .
(i) If is strongly convex with constant , then
[TABLE]
(ii) If the derivative of is Lipschitz with constant , then
[TABLE]
(iii) Let be a -convex mapping from to . Suppose its derivative exists and meets the following condition: such that
[TABLE]
then we have
[TABLE]
Proof. The statements (i) and (ii) are classical; the proof is omitted (see Zhu and Marcotte [42]). For proof of (iii), see Cohen [7].
III Stochastic primal-dual coordinate method
In this section, we propose a stochastic primal-dual coordinate descent algorithm to solve (P). Firstly, we introduce the core function satisfying the following assumption:
Assumption 2
is strongly convex with parameter and differentiable with its gradient Lipschitz continuous with parameter on .
Additionally, let be a Bregman-like function (core function) [1, 8]. From Assumption 2 we have: .
Moreover, we assume that the parameters satisfy:
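As an illustration of how a core function induces a Bregman-like function, the sketch below uses the usual Bregman construction $D(u,v)=K(u)-K(v)-\langle\nabla K(v),u-v\rangle$ with a hypothetical quadratic core function $K(u)=\tfrac12 u^{\top}Mu$. The names `K`, `M`, `beta` and `B` are assumptions made for this example. For such a core function, strong convexity and gradient Lipschitz continuity sandwich $D(u,v)$ between two multiples of $\|u-v\|^{2}$.

```python
import numpy as np

# Hypothetical quadratic core function K(u) = 0.5 * u^T M u, M symmetric positive definite.
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def K(u):
    return 0.5 * u @ M @ u

def grad_K(u):
    return M @ u

def bregman(u, v):
    """D(u, v) = K(u) - K(v) - <grad K(v), u - v>."""
    return K(u) - K(v) - grad_K(v) @ (u - v)

# Strong convexity and gradient Lipschitz constants are the extreme eigenvalues of M.
eigs = np.linalg.eigvalsh(M)
beta, B = eigs[0], eigs[-1]

u = np.array([1.0, -2.0])
v = np.array([0.5, 0.3])
d = bregman(u, v)
sq = float((u - v) @ (u - v))
lower, upper = 0.5 * beta * sq, 0.5 * B * sq   # expected: lower <= d <= upper
```

Choosing $M$ as the identity recovers the plain squared Euclidean distance, in which case the lower and upper bounds coincide.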
[TABLE]
Let be a bound on the dual optimal solutions of (P), and denote . Let . An estimate of can be found in [41]. Using the projection onto , we introduce the Stochastic Primal-Dual Coordinate Method with Large step size (SPDCL) for solving (P):
**Stochastic Primal-Dual Coordinate Method with Large step size (SPDCL)**
Initialize , and
for , do
[TABLE]
**end for**
For the sake of brevity, let $q^{k}=\Pi\big(p^{k}+\gamma\Theta(u^{k})\big)$, $q^{k+1/2}=\Pi\big(p^{k}+\gamma\Theta(u^{k+1})\big)$ and . Then the primal problem of the algorithm can be expressed as
[TABLE]
If we choose a Bregman-like function (or core function) that is additive with respect to the space decomposition (2), that is,
[TABLE]
Then problem (APk) is just a small optimization problem for the selected block . Specifically, taking for (APk), we perform only a block proximal gradient update for block , in which we linearize the coupled function and the augmented Lagrangian term and add the proximal term. In the following sections, we establish the convergence, the convergence rate, and the probability complexity bounds of SPDCL.
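The flavour of such an iteration can be sketched on a toy problem. The code below is not the SPDCL scheme itself. It is a minimal illustration under several assumptions stated here and in the comments: a smooth objective $\tfrac12\|u-c\|^{2}$ with no nonsmooth term, linear inequality constraints $Au-b\le 0$ (so the cone is the nonnegative orthant), a quadratic core function, and a plain projected dual ascent step. The function name `block_primal_dual_step` and all data are hypothetical.

```python
import numpy as np

def block_primal_dual_step(u, p, c, A, b, blocks, eps, gamma, rng):
    """One randomized block step for: min 0.5*||u - c||^2  s.t.  A u - b <= 0.

    Illustrative assumptions: quadratic core function, no nonsmooth terms,
    nonnegative-orthant cone, and a simple projected dual ascent step.
    """
    q = np.maximum(p + gamma * (A @ u - b), 0.0)    # q^k = Pi(p^k + gamma * Theta(u^k))
    i = rng.integers(len(blocks))                   # block drawn uniformly at random
    idx = blocks[i]
    grad_i = (u[idx] - c[idx]) + A[:, idx].T @ q    # linearized objective + augmented term
    u_new = u.copy()
    u_new[idx] = u[idx] - eps * grad_i              # proximal step with K = 0.5*||.||^2
    p_new = np.maximum(p + gamma * (A @ u_new - b), 0.0)  # assumed dual update
    return u_new, p_new, i

rng = np.random.default_rng(0)
c = np.array([1.0, 2.0, 3.0, 4.0])
A = np.array([[1.0, 1.0, 1.0, 1.0]])
b = np.array([2.0])
blocks = [np.array([0, 1]), np.array([2, 3])]
u0, p0 = np.zeros(4), np.zeros(1)

u1, p1, i = block_primal_dual_step(u0, p0, c, A, b, blocks, eps=0.5, gamma=1.0, rng=rng)
changed = np.flatnonzero(u1 != u0)   # coordinates touched by this iteration
```

The point of the sketch is the per-iteration cost: only the coordinates of the sampled block are read and written in the primal step, while the multiplier estimate stays in the cone.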
IV Convergence analysis
In this section, we will establish results about convergence of SPDCL. Before proceeding, we first give the generalized equilibrium reformulation of saddle point formulation (8):
Find such that
[TABLE]
Clearly, the bifunction is convex in and linear in for given , .
In algorithm SPDCL, the indices , are random variables. After iterations, the SPDCL method generates a random output . We denote by the filtration generated by the random variable , i.e.,
[TABLE]
Additionally, we define as the conditional expectation w.r.t. , and the conditional expectation in terms of given as .
Knowing , we have:
[TABLE]
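The effect of uniform block sampling on conditional expectations can be checked by direct enumeration. If an iterate differs from the previous one only on a block chosen uniformly among $N$ blocks, then its conditional expectation is the previous iterate plus $1/N$ times the full (all-block) update direction. The sketch below verifies this identity for a hypothetical update map `full_update`, which is an assumption made for illustration only.

```python
import numpy as np

def full_update(u):
    """Hypothetical update applied to every block at once (a simple affine map here)."""
    return 0.5 * u + 1.0

def block_update(u, idx):
    """Apply the update to the selected block only; other blocks keep their values."""
    u_new = u.copy()
    u_new[idx] = full_update(u)[idx]
    return u_new

u = np.array([4.0, -2.0, 0.0, 6.0])
blocks = [np.array([0]), np.array([1, 2]), np.array([3])]
N = len(blocks)

# Exact conditional expectation: average over the N equally likely blocks.
expected = sum(block_update(u, idx) for idx in blocks) / N
closed_form = u + (full_update(u) - u) / N
```

This factor $1/N$ is the reason constants such as $N$ appear in the expected bounds for randomized block methods.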
Given , for any and , we construct the following function:
[TABLE]
Specifically, we can show the function value of at () provides an upper bound for .
[TABLE]
Additionally, since the SPDCL scheme guarantees that , we have that
[TABLE]
Before the convergence analysis, we need the following lemma.
Lemma 3
(Global estimation of bifunction values) Let Assumptions 1 and 2 hold, let be generated by SPDCL, and let the parameters satisfy (21). For all and , which could possibly be random, it holds that
[TABLE]
[TABLE]
where and .
Proof. The proof of this lemma is deferred to the Appendix.
Based on Lemma 3, we establish the following convergence analysis of SPDCL.
Theorem 3 **(Almost sure convergence)**
Let assumptions of Lemma 3 hold, then
- (i)
a.s. and a.s.;
- (ii)
The sequence generated by SPDCL is almost surely bounded;
- (iii)
Almost surely, every cluster point of is a saddle point of the Lagrangian of (P).
Proof.
- (i)
Taking and in statement (iii) of Lemma 3, we have
[TABLE]
By the definition of a saddle point and assumption (21), is a solution of (EP), so
$$S_{k}=\mathbb{E}_{i(k)}\bigg[\frac{\epsilon^{k}}{N}\big[L(u^{k+1},p^{*})-L(u^{*},q^{k})\big]+\frac{\beta}{4}\|u^{k}-u^{k+1}\|^{2}+\frac{\epsilon^{k}}{2N\gamma}\|q^{k}-p^{k}\|^{2}\bigg]$$
is nonnegative. From (30), we have that is nonnegative.
By the Robbins-Siegmund Lemma [26], we obtain that almost surely exists, a.s. and a.s.
- (ii)
Since almost surely exists, is almost surely bounded. By (30), this implies that the sequence is almost surely bounded.
- (iii)
From statement (ii), we have that the sequence is almost surely bounded. Moreover, the SPDCL scheme guarantees that the sequence is bounded. Therefore, there exists a positive number such that with probability 1. Then from statement (i) we have that
[TABLE]
and
[TABLE]
It follows that
[TABLE]
Since
[TABLE]
then from ((iii)), we have almost surely
[TABLE]
Let denote the subset such that is not bounded, and let denote the subset for which ((iii)) does not hold: . Pick some . Since the sequence is almost surely bounded and is bounded, the sequence has a cluster point. Considering a subsequence of converging almost surely toward , let (resp. ) be a neighbourhood of (resp. ). Together with statement (iv) of Lemma 3, the almost sure boundedness of the sequence , the boundedness of , almost surely and , we also have that there exist positive numbers and such that
[TABLE]
Passing to the limit of ((iii)), it follows that , . Therefore, is a saddle point of over . Since is convex in , then is a saddle point of over .
V Convergence rate analysis
In this section we provide the convergence rate of SPDCL. For the sequence generated from Algorithm SPDCL, and any we define the average sequence
[TABLE]
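An averaged (ergodic) sequence of this kind is typically a weighted mean of the iterates. The sketch below assumes, for illustration only, that the weights are the step sizes of the corresponding iterations. The function `weighted_average` and the sample data are hypothetical and are not taken from the paper.

```python
import numpy as np

def weighted_average(points, weights):
    """Weighted mean sum_k w_k x_k / sum_k w_k of a list of iterates."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ points / weights.sum()

# Three hypothetical iterates in R^2 and their step sizes.
iterates = [[0.0, 0.0], [2.0, 4.0], [4.0, 2.0]]
step_sizes = [1.0, 1.0, 2.0]
u_bar = weighted_average(iterates, step_sizes)
```

Because the average is a convex combination, it stays in any convex set containing the iterates, which is what allows convexity arguments to transfer bounds from the iterates to the averaged point.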
Theorem 4
**(Expected primal suboptimality and expected feasibility)**
Let Assumptions 1 and 2 hold, let be generated by SPDCL, and let the parameters satisfy condition (21). Then we have that
- (i)
Boundedness of the expected vector:
where ;
- (ii)
Global estimate of expected bifunction values:
$$\mathbb{E}_{\mathcal{F}_{t}}\big[L(\bar{u}_{t},p)-L(u,\bar{p}_{t})\big]\leq\frac{Nh_{3}(u,p)}{\underline{\epsilon}(t+1)},$$
where $h_{3}(u,p)=D(u,u^{0})+\frac{N-1}{N}D(u^{*},u^{0})+\frac{\epsilon^{0}}{\gamma}\|p-p^{0}\|^{2}+\frac{(2N-1)(N-1)\epsilon^{0}}{N^{2}}\big[\frac{\|p^{*}-p^{0}\|^{2}}{2\gamma}+L_{\gamma}(u^{0},p^{0})-L(u^{*},p^{*})\big]$, , , could possibly be random;
- (iii)
Expected feasibility:
,
where ;
- (iv)
Expected primal suboptimality:
.
Proof.
- (i)
From statement (iii) of Lemma 3, we obtain that
[TABLE]
Taking expectation with respect to , for above inequality, we obtain that
[TABLE]
Take and in ((i)) we have that
[TABLE]
Together with (30) and (37), we have
[TABLE]
From the convexity of and is almost surely bounded below with (by Theorem 3), we obtain that
[TABLE]
This gives the result.
- (ii)
Then from ((i)), we obtain that
[TABLE]
From (37), we have that , then by the definition of , it follows
[TABLE]
By Lemma 1 we have that
[TABLE]
Combine with , we have that
[TABLE]
Summing (40) over , it follows that
[TABLE]
where $h_{3}(u,p)=D(u,u^{0})+\frac{N-1}{N}D(u^{*},u^{0})+\frac{\epsilon^{0}}{\gamma}\|p-p^{0}\|^{2}+\frac{(2N-1)(N-1)\epsilon^{0}}{N^{2}}\big[\frac{\|p^{*}-p^{0}\|^{2}}{2\gamma}+L_{\gamma}(u^{0},p^{0})-L(u^{*},p^{*})\big]$.
On the other hand, from the definition of and , we have and . From the convexity of the set , and since the function is convex in and linear in , for all and , and since is almost surely bounded below with (by Theorem 3), we have that
[TABLE]
- (iii)
If , statement (ii) is obvious. Otherwise, there is a set such that . Let be a random vector:
[TABLE]
Note that for , we have and . Thus
[TABLE]
Otherwise, for , we have that
[TABLE]
Combining (46) and (47), we have
[TABLE]
Moreover, since and , we have . By (48), we have
[TABLE]
Moreover, by taking in the right hand side of saddle point inequality (8), we have
[TABLE]
Combining (49) and (50), we have that
[TABLE]
Taking expectations on both sides of the above inequality, we have that
[TABLE]
Since random variable , it follows that
[TABLE]
where . This proves statement (iii).
- (iv)
Again from (49), (50) and statement (iii), statement (iv) follows.
Theorem 4 shows that SPDCL has an expected convergence rate of $O(1/t)$. To obtain the dual suboptimality, we need the following additional assumption.
Assumption 3
is coercive on if is not bounded, that is, ,
[TABLE]
The following lemma states that for any given bounded set of dual points, the corresponding optimizer of the augmented Lagrangian is bounded.
Lemma 4
Suppose Assumption 1 holds. Let be a bounded set: . Then we have a positive constant , for any , there is an optimizer such that .
Proof. See [41].
By statement (i) of Theorem 4, we have one ball: such that is contained in . Furthermore, from Lemma 4 for we have that there exists such that and . Specifically, we construct a new ball as . The next proposition shows that the pair of expected vectors $\big(\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t}),\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\big)$ is an approximate saddle point. This assertion will be used to derive the estimate of the dual suboptimality for the average point .
Proposition 2
**(Approximate saddle points by expected point $\big(\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t}),\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\big)$)**
Suppose the assumptions of Theorem 4 hold.
- (i)
The expected point $\big(\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t}),\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\big)$ is an approximate saddle point for :
[TABLE]
where .
- (ii)
The expected point $\big(\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t}),\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\big)$ is an approximate saddle point for :
[TABLE]
where and .
Proof.
- (i)
From statement (ii) of Theorem 4 with and , we have that
[TABLE]
where . Since the bifunction is convex in and linear in for given , , we obtain
[TABLE]
Noting that , now with , (52) yields the right inequality of the approximate saddle point
[TABLE]
Now considering , with , (52) yields the left inequality
[TABLE]
This gives the result.
- (ii)
In the left-hand side of inequality in statement (i), taking , we get . Then, from (10), we have
[TABLE]
On the other hand, for , we have
[TABLE]
Therefore, we get the left-hand side of inequality in statement (ii):
[TABLE]
where . From (53) and ((ii)), it also has that
[TABLE]
which follows that
[TABLE]
Then, for , we have
[TABLE]
where . This gives the right-hand side of the inequality in statement (ii).
Theorem 5
(Dual suboptimality) Let the assumptions of Theorem 4 hold. Then we have that
[TABLE]
Proof. For saddle point of (or ) on , we have
[TABLE]
Substituting , in (58), and taking $u=\hat{u}\big(\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\big)$, in statement (ii) of Proposition 2, we obtain the following two inequalities:
[TABLE]
Combining the above two inequalities, it follows the desired inequality:
[TABLE]
or
[TABLE]
Next, we provide the high probability complexity bound for the constraint violation and the objective function values.
Remark 1
From Theorem 4, we immediately get the expected primal suboptimality for the average point
[TABLE]
Let and be chosen arbitrarily. For all , we have high probability complexity bound for obtaining an -optimal solution
[TABLE]
where
[TABLE]
This result is derived from the Markov inequality [3]. Another representation of this result is:
for any
[TABLE]
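The passage from an expected $O(1/t)$ bound to a high probability bound via the Markov inequality can be illustrated with a small calculation. The sketch assumes a nonnegative error quantity $X_t$ with $\mathbb{E}[X_t]\le C/(t+1)$ for some constant $C$, so that $P(X_t\ge\varepsilon)\le C/((t+1)\varepsilon)$. The function name and the numerical values are hypothetical, and the constant in the paper's bound is not reproduced here.

```python
import math

def iterations_for_confidence(bound_const, eps, eta):
    """Smallest t with bound_const / ((t + 1) * eps) <= eta.

    Assumes a nonnegative error X_t with E[X_t] <= bound_const / (t + 1), so that
    Markov's inequality gives P(X_t >= eps) <= bound_const / ((t + 1) * eps).
    """
    return max(0, math.ceil(bound_const / (eps * eta)) - 1)

# Hypothetical values: constant 8, accuracy 0.5, failure probability 0.25.
t = iterations_for_confidence(bound_const=8.0, eps=0.5, eta=0.25)
failure_prob_bound = 8.0 / ((t + 1) * 0.5)   # Markov bound at the returned t
```

The iteration count scales like $1/(\varepsilon\eta)$, which is the usual price of deriving a confidence statement from Markov's inequality alone.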
Remark 2
Here we remark that, for problem (P) with and , we modify the SPDCL scheme as follows:
**Stochastic Primal-Dual Coordinate Method with Large step size (SPDCL)**
Initialize , , and
for , do
[TABLE]
**end for**
Clearly, the new scheme does not require an estimate of the dual optimal bound . Additionally, with the constant parameter , the results of Lemma 3 still hold. Therefore, the convergence result (Theorem 3) and the convergence rate results (Theorems 4 and 5) for SPDCL still hold.
Appendix
Proof of Lemma 3:
(i) Firstly, for all , the unique solution of the primal problem (24) is characterized by the following variational inequality:
[TABLE]
which follows that
[TABLE]
Observing that , , and , from (Appendix), we have that
[TABLE]
By statement (ii) and (iii) of Lemma 2, we have that
[TABLE]
Simple algebra and Assumption 2 yield
[TABLE]
Combining (67) and (68), we obtain that
[TABLE]
Taking the expectation with respect to on both sides of (69), together with the conditional expectations (IV)-(29), we get
[TABLE]
It follows that
[TABLE]
By in Theorem 2, it follows that
[TABLE]
Together with statement (ii) of Lemma 2, we have that
[TABLE]
Combining (71) and (Appendix), we have that
[TABLE]
From the concavity of $\varphi\big(\Theta(u),p\big)$ in and statement (ii) of Theorem 2, the third term of (Appendix) satisfies
[TABLE]
Combining (Appendix) and inequality (75), we have that
[TABLE]
Multiplying both sides of (Appendix) by and using the definition of , statement (i) follows.
(ii) In order to prove statement (ii), we first derive two inequalities. By the property (12) of the projection with and , we have
[TABLE]
Using Proposition 1 with , and , we have
[TABLE]
For all , from (77), it follows:
[TABLE]
Combining (79) and (Appendix), we have
[TABLE]
Since , we have that: ,
[TABLE]
Combining (79) and (Appendix), we have that: ,
[TABLE]
Multiplying both sides of the above inequality by , and using , we obtain that:
[TABLE]
Statement (ii) follows by taking the expectation with respect to on both sides of inequality (83).
(iii) Summing the two inequalities in statement (i) and statement (ii), we have that
[TABLE]
Since the SPDCL scheme guarantees that
[TABLE]
then we have the statement (iii).
(iv) From (70), we have that
[TABLE]
From (82), we have that
[TABLE]
Since and , we have
[TABLE]
Taking the expectation with respect to on both sides of (86) and summing with (84), we obtain that
[TABLE]
where and .
Acknowledgment
The authors would like to thank…
References
- [1] Beck, A., & Teboulle, M. (2003). Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3), 167-175.
- [2] Bertsekas, D. P. (1999). Nonlinear Programming. Athena Scientific, Belmont, Massachusetts.
- [3] Bertsekas, D. P., & Tsitsiklis, J. N. (2002). Introduction to Probability (Vol. 1). Belmont, MA: Athena Scientific.
- [4] Buys, J. D. (1972). Dual algorithms for constrained optimization problems. Brondder-Offset NV-Rotterdam.
- [5] Chen, S. S., Donoho, D. L., & Saunders, M. A. (2001). Atomic decomposition by basis pursuit. SIAM Review, 43(1), 129-159.
- [6] Cheney, W., & Goldstein, A. A. (1959). Proximity maps for convex sets. Proceedings of the American Mathematical Society, 10(3), 448-450.
- [7] Cohen, G. (1980). Auxiliary problem principle and decomposition of optimization problems. Journal of Optimization Theory and Applications, 32(3), 277-305.
- [8] Cohen, G., & Zhu, D. L. (1984). Decomposition coordination methods in large scale optimization problems. The nondifferentiable case and the use of augmented Lagrangians. Advances in Large Scale Systems, 1, 203-266.
