Stochastic Primal-Dual Coordinate Method with Large Step Size for Composite Optimization with Composite Cone-constraints

Daoli Zhu; Lei Zhao

arXiv:1905.01020·math.OC·May 6, 2019

TL;DR

This paper proposes a stochastic primal-dual coordinate method with large step size for solving composite optimization problems with cone constraints, achieving convergence and high probability complexity bounds.

Contribution

It introduces a novel stochastic coordinate extension of primal-dual methods with parallel decomposition and large step size for COCC problems, providing convergence guarantees.

Findings

01. Almost sure convergence of the method

02. Expected convergence rate of O(1/t)

03. High probability complexity bounds

Abstract

We introduce a stochastic coordinate extension of the first-order primal-dual method studied by Cohen and Zhu (1984) and Zhao and Zhu (2018) to solve Composite Optimization with Composite Cone-constraints (COCC). In this method, we randomly choose a block of variables based on the uniform distribution. Applying linearization and a Bregman-like function (core function) to that randomly selected block yields a simple parallel primal-dual decomposition for COCC. We establish almost sure convergence and an O(1/t) expected convergence rate. A high probability complexity bound is also derived.

Equations

$$\begin{array}{lll}\mbox{(P):}&\min&G(u)+J(u)\\ &\mbox{s.t.}&\Theta(u)=\Omega(u)+\Phi(u)\in-\mathbf{C}\\ &&u\in\mathbf{U}\end{array}$$

$$\mathbf{U}=\mathbf{U}_{1}\times\mathbf{U}_{2}\times\cdots\times\mathbf{U}_{N},\quad u_{i}\in\mathbf{U}_{i}\subset\mathbf{R}^{n_{i}}\quad\mbox{and}\quad\sum_{i=1}^{N}n_{i}=n.$$

$$\exists c_{1}>0,\ c_{2}>0,\ \forall u\in\mathbf{U},\ \forall r\in\partial J(u),\quad\|r\|\leq c_{1}\|u\|+c_{2}.$$

$$\Omega(\alpha u+(1-\alpha)v)-\alpha\Omega(u)-(1-\alpha)\Omega(v)\in-\mathbf{C}.$$

$$\langle\nabla\Omega(u)-\nabla\Omega(v),u-v\rangle-\|u-v\|^{2}T\in-\mathbf{C}.$$

$$\forall u,v\in\mathcal{O},\quad\|\Theta(u)-\Theta(v)\|\leq\tau\|u-v\|.$$

$$\mbox{CQC}:\quad\Theta(\mathbf{U})\cap(-\mathring{\mathbf{C}})\neq\emptyset.$$

$$L(u,p)=(G+J)(u)+\langle p,\Theta(u)\rangle,$$

$$\forall u\in\mathbf{U},\ \forall p\in\mathbf{C}^{*}:\quad L(u^{*},p)\leq L(u^{*},p^{*})\leq L(u,p^{*}).$$

$$\psi(p)=\left\{\begin{array}{ll}\min\limits_{u\in\mathbf{U}}L(u,p)&\forall p\in\mathbf{C}^{*}\\ -\infty&\mbox{otherwise.}\end{array}\right.$$

$$\begin{array}{lllllll}\mbox{(P):}&\min&(G+J)(u)&&\mbox{(D):}&\max&\psi(p)\\ &\mbox{s.t.}&\Theta(u)\in-\mathbf{C}&&&\mbox{s.t.}&p\in\mathbf{C}^{*}\\ &&u\in\mathbf{U}&&&&\end{array}$$

$$\langle\nabla G(u^{*}),u-u^{*}\rangle+J(u)-J(u^{*})+\langle p^{*},\Theta(u)-\Theta(u^{*})\rangle\geq 0;$$

$$\begin{array}{lll}\mbox{(P$_{1}$):}&\min\limits_{\xi\in-\mathbf{C}}&(G+J)(u)\\ &\mbox{s.t.}&\Theta(u)-\xi=0\\ &&u\in\mathbf{U}\end{array}$$

$$\overline{L}_{\gamma}(u,\xi,p)=(G+J)(u)+\langle p,\Theta(u)-\xi\rangle+\frac{\gamma}{2}\|\Theta(u)-\xi\|^{2}$$

$$L_{\gamma}(u,p)\triangleq\min_{\xi\in-\mathbf{C}}\overline{L}_{\gamma}(u,\xi,p)=(G+J)(u)+\varphi(\Theta(u),p),$$

$$\forall p\in\mathbf{R}^{m},\quad\psi_{\gamma}(p)$$

$$\begin{array}{lllllll}\mbox{(P):}&\min&(G+J)(u)&&\mbox{(D$_{\gamma}$):}&\max&\psi_{\gamma}(p)\\ &\mbox{s.t.}&\Theta(u)\in-\mathbf{C}&&&\mbox{s.t.}&p\in\mathbf{R}^{m}\\ &&u\in\mathbf{U}&&&&\end{array}$$

$$\nabla_{\theta}\varphi(\theta,p)=\Pi(p+\gamma\theta),$$

$$\nabla_{p}\varphi(\theta,p)=[\Pi(p+\gamma\theta)-p]/\gamma,$$

$$\varphi(\theta,p)=[\|\Pi(p+\gamma\theta)\|^{2}-\|p\|^{2}]/2\gamma.$$

$$L(u,p)-L_{\gamma}(u,p^{\prime})\leq\frac{1}{2\gamma}\|p-p^{\prime}\|^{2},$$

$$\langle p,\Theta(u)\rangle-\varphi\big(\Theta(u),p^{\prime}\big)\leq\frac{1}{2\gamma}\|p-p^{\prime}\|^{2}.$$

$$L_{\gamma}(u,p^{\prime})-L(u,p)+\frac{1}{2\gamma}\|p-p^{\prime}\|^{2}$$

$$u^{k+1}\leftarrow\arg\min_{u\in\mathbf{U}}\ \langle\nabla G(u^{k}),u\rangle+J(u)+\langle\Pi_{M}(p^{k}+\gamma\Theta(u^{k})),\nabla\Omega(u^{k})u+\Phi(u)\rangle+\frac{1}{\epsilon^{k}}D(u,u^{k});$$

$$p^{k+1}\leftarrow\Pi_{M}\big(p^{k}+\gamma\Theta(u^{k+1})\big).$$

$$2\langle\Pi_{\mathcal{S}}(z+x)-\Pi_{\mathcal{S}}(z+y),x\rangle$$

$$\forall u,v\in\mathbf{U},\quad f(u)-f(v)\geq\langle\nabla f(v),u-v\rangle+\frac{\beta_{f}}{2}\|u-v\|^{2}.$$

$$\forall u,v\in\mathbf{U},\quad f(u)-f(v)\leq\langle\nabla f(v),u-v\rangle+\frac{B_{f}}{2}\|u-v\|^{2},$$

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Statistical Methods and Inference

Full text

Stochastic Primal-Dual Coordinate Method with Large Step Size for Composite Optimization with Composite Cone-constraints

Daoli Zhu and Lei Zhao

Manuscript received February 16, 2019; revised. Daoli Zhu was with Antai College of Economics and Management and Sino-US Global Logistics Institute, Shanghai Jiao Tong University, 200030 Shanghai, China (e-mail: [email protected]). Lei Zhao was with the Antai College of Economics and Management, Shanghai Jiao Tong University, 200030 Shanghai, China (e-mail: [email protected]).

Abstract

We introduce a stochastic coordinate extension of the first-order primal-dual method studied by Cohen and Zhu (1984) and Zhao and Zhu (2018) to solve Composite Optimization with Composite Cone-constraints (COCC). In this method, we randomly choose a block of variables based on the uniform distribution. Applying linearization and a Bregman-like function (core function) to that randomly selected block yields a simple parallel primal-dual decomposition for COCC. We establish almost sure convergence and an $O(1/t)$ expected convergence rate. A high probability complexity bound is also derived.

Index Terms:

composite optimization with composite cone-constraints, stochastic primal-dual coordinate method with large step size, augmented Lagrangian.

I Introduction

Motivated by recent applications in big data analysis, there has been an explosive growth of interest in the design and analysis of block coordinate descent type (BCD-type) methods for large-scale convex optimization (see [13, 24, 35, 39]). In these applications, the datasets used for computation are very large and are often distributed across different locations. It is often impractical to assume that optimization algorithms can traverse an entire dataset once in each iteration, because doing so is either time consuming or unreliable, and often results in low resource utilization due to the necessary synchronization among different computing units (e.g., CPUs, GPUs, and cores) in a distributed computing environment. On the other hand, BCD-type algorithms can make progress by using information obtained from a randomly selected subset of data and thus provide much flexibility for their implementation in such distributed environments. The main advantage of BCD-type methods is that they reduce the complexity and memory requirements per iteration. These benefits are increasingly important for very large-scale problems.

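In this paper, we consider the nonlinear convex cone-constrained optimization problem known as Composite Optimization with Composite Cone-constraints (COCC):

$$\begin{array}{lll}\mbox{(P):}&\min&G(u)+J(u)\\ &\mbox{s.t.}&\Theta(u)=\Omega(u)+\Phi(u)\in-\mathbf{C}\\ &&u\in\mathbf{U}\end{array}$$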

where $G$ is a convex smooth function on the closed convex set $\mathbf{U}\subset\mathbf{R}^{n}$ and $J$ is a convex, possibly nonsmooth function on $\mathbf{U}$ . $\Omega$ is a smooth mapping and $\Phi$ is a possibly nonsmooth mapping from $\mathbf{R}^{n}$ to $\mathbf{R}^{m}$ . Both $\Omega(u)$ and $\Phi(u)$ are $\mathbf{C}$ -convex, where $\mathbf{C}$ is a nonempty closed convex cone in $\mathbf{R}^{m}$ with vertex at the origin, that is, $\alpha\mathbf{C}+\beta\mathbf{C}\subset\mathbf{C}$ for $\alpha,\beta\geq 0$ . When $\mathring{\mathbf{C}}$ (the interior of $\mathbf{C}$ ) is nonempty, the constraint $\Theta(u)\in-\mathbf{C}$ corresponds to an inequality constraint. The case $\mathbf{C}=\{0\}$ corresponds to an equality constraint. $\mathbf{C}^{*}$ denotes the conjugate cone, i.e., $\mathbf{C}^{*}=\{y|\langle y,x\rangle\geq 0,\forall x\in\mathbf{C}\}$ . We note that COCC has a full composite structure: both the objective and the constraint mapping are the sum of a smooth part and a possibly nonsmooth part.
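As an illustration of the cone notation, consider the special case $\mathbf{C}=\mathbf{R}^{m}_{+}$ (the nonnegative orthant, chosen here purely as an example and not singled out in the paper). The conjugate cone and the cone constraint then take the familiar forms:

```latex
\mathbf{C}=\mathbf{R}^{m}_{+}
\quad\Longrightarrow\quad
\mathbf{C}^{*}=\{y \mid \langle y,x\rangle\geq 0,\ \forall x\geq 0\}=\mathbf{R}^{m}_{+},
\qquad
\Theta(u)\in-\mathbf{C}
\iff
\Theta_{j}(u)\leq 0,\quad j=1,\ldots,m .
```

In this case (P) is an ordinary inequality-constrained convex program and the multipliers $p\in\mathbf{C}^{*}$ are nonnegative vectors.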

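Assume that both $J(u)=\sum\limits_{i=1}\limits^{N}J_{i}(u_{i})$ and $\Phi(u)=\sum\limits_{i=1}\limits^{N}\Phi_{i}(u_{i})$ are additive with respect to the following space decomposition:

$$\mathbf{U}=\mathbf{U}_{1}\times\mathbf{U}_{2}\times\cdots\times\mathbf{U}_{N},\quad u_{i}\in\mathbf{U}_{i}\subset\mathbf{R}^{n_{i}}\quad\mbox{and}\quad\sum_{i=1}^{N}n_{i}=n.$$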

I-A Related works

For problems without constraints, two variations of BCD are discussed most often by researchers. The first variation concerns the block-choosing strategy. One common approach for block choosing is the cyclic strategy. Tseng [32] proved the convergence of BCD with a cyclic strategy. Luo and Tseng [17] and Wang and Lin [33] proved local and global linear convergence, respectively, under specific assumptions. The other approach is the randomized strategy. Nesterov [19] studied the convergence rate of randomized BCD for convex smooth optimization. Richtárik and Takáč [25] and Lu and Xiao [16] extended Nesterov's technique to composite optimization. The second variation concerns the point that is read to evaluate the gradient in each iteration. If the read points have different "ages", this type of BCD is called asynchronous BCD; otherwise, it is called synchronous BCD. All the variants of BCD reviewed above are synchronous. Liu and Wright [14] and Liu et al. [15] established the convergence rate of asynchronous BCD for composite optimization and convex smooth optimization without constraints, respectively.

For problems with constraints, there are only a few works. Gao et al. [10] proposed a coordinate-type method for problems with linear coupling constraints. Necoara and Patrascu [18] proposed a random coordinate descent algorithm for an optimization problem with one linear constraint. Xu and Zhang [37] analyzed a primal-dual coordinate-type method for a linearly constrained strongly convex problem. Moreover, Xu [38] proposed an asynchronous primal-dual coordinate-type method for linearly constrained problems. For problems with nonlinear constraints, Xu [36] proposed a coordinate-type method for problems with nonlinear inequality constraints. To the best of our knowledge, there are no convergence rate results for primal-dual coordinate methods for COCC.

I-B Main contributions and outline of this paper

In this paper, we propose a Stochastic Primal-Dual Coordinate with Large step size (SPDCL) method for COCC, based on the variant auxiliary problem principle (Zhao and Zhu [41]). In this method, we randomly update one block of variables based on the uniform distribution. The sequence generated by our algorithm is proved to converge to an optimal solution of problem (P) with probability $1$ . An expected $O(1/t)$ convergence rate is also obtained for problem (P) under the convexity assumptions. A high probability complexity bound is also derived.

The rest of this paper is organized as follows. Section II is devoted to technical preliminaries. The updating scheme of SPDCL for (P) is presented in Section III. In Section IV, we establish the convergence. In Section V, expected $O(1/t)$ sub-linear convergence rate and the high probability complexity bound are established.

II Preliminaries

In this section, we first provide some preliminaries that are useful for our further discussions and then summarize the notation and assumptions to be used. We denote by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ the inner product and the Euclidean norm of vectors, respectively.

II-A Notations and assumptions

Throughout this paper, we make the following standard assumptions for Problem (P):

Assumption 1

(i)

$J$ is a convex, l.s.c. function such that $\mathbf{dom}J\cap\mathbf{U}\neq\emptyset$ ; $J$ is not necessarily differentiable. $J$ is subdifferentiable and has linearly bounded subgradients in $\mathbf{U}$ , that is,

$$\exists c_{1}>0,\ c_{2}>0,\ \forall u\in\mathbf{U},\ \forall r\in\partial J(u),\quad\|r\|\leq c_{1}\|u\|+c_{2}.$$

(ii)

$G$ is convex and differentiable, and its derivative is Lipschitz with constant $B_{G}$ .

(iii)

$\Omega$ is a $\mathbf{C}$ -convex mapping from $\mathbf{U}$ to $\mathbf{C}$ , where $\forall u,v\in\mathbf{U}$ , $\forall\alpha\in[0,1]$ ,

$$\Omega(\alpha u+(1-\alpha)v)-\alpha\Omega(u)-(1-\alpha)\Omega(v)\in-\mathbf{C}.$$

Moreover, the derivative of $\Omega$ exists and meets the following condition: $\exists T\in\mathbf{C}$ such that $\forall u,v\in\mathbf{U}$ ,

$$\langle\nabla\Omega(u)-\nabla\Omega(v),u-v\rangle-\|u-v\|^{2}T\in-\mathbf{C}.$$

(iv)

$\Phi$ is a $\mathbf{C}$ -convex mapping from $\mathbf{U}$ to $\mathbf{C}$ .

(v)

$\Theta(u)$ is Lipschitz with constant $\tau$ on an open subset $\mathcal{O}$ containing $\mathbf{U}$ , where

$$\forall u,v\in\mathcal{O},\quad\|\Theta(u)-\Theta(v)\|\leq\tau\|u-v\|.$$

(vi)

Constraint Qualification Condition. When $\mathring{\mathbf{C}}\neq\emptyset$ , we assume that

$$\mbox{CQC}:\quad\Theta(\mathbf{U})\cap(-\mathring{\mathbf{C}})\neq\emptyset.$$

For the case $\mathbf{C}=\{0\}$ , we assume that $0\in\mbox{interior of}~{}\Theta(\mathbf{U})$ .

(vii)

There exists at least one saddle point for Lagrangian of (P).

Conditions (i)-(iv) guarantee that (P) is a convex problem. The CQC condition (vi) implies that the Lagrangian dual function is coercive and the dual optimal solution set is bounded [8]. The following subsection introduces the augmented Lagrangian and a first-order primal-dual decomposition algorithm for (P).

II-B Augmented Lagrangian and first-order primal-dual decomposition algorithm

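The Lagrangian of (P) is defined as

$$L(u,p)=(G+J)(u)+\langle p,\Theta(u)\rangle,$$

and a saddle point $(u^{*},p^{*})\in\mathbf{U}\times\mathbf{C}^{*}$ is such that

$$\forall u\in\mathbf{U},\ \forall p\in\mathbf{C}^{*}:\quad L(u^{*},p)\leq L(u^{*},p^{*})\leq L(u,p^{*}).$$

Under Assumption 1, there exist saddle points of $L$ on $\mathbf{U}\times\mathbf{C}^{*}$ . The dual function $\psi$ is defined as

$$\psi(p)=\left\{\begin{array}{ll}\min\limits_{u\in\mathbf{U}}L(u,p)&\forall p\in\mathbf{C}^{*}\\ -\infty&\mbox{otherwise.}\end{array}\right.$$

The function $\psi$ is concave and sub-differentiable. Using the dual function $\psi(p)$ , we consider the primal-dual pair of nonlinear convex cone optimization problems:

$$\begin{array}{lllllll}\mbox{(P):}&\min&(G+J)(u)&&\mbox{(D):}&\max&\psi(p)\\ &\mbox{s.t.}&\Theta(u)\in-\mathbf{C}&&&\mbox{s.t.}&p\in\mathbf{C}^{*}\\ &&u\in\mathbf{U}&&&&\end{array}$$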

The following theorem characterizes a saddle point optimality condition for the primal and dual problem.

Theorem 1

A solution $(u^{*},p^{*})$ with $u^{*}\in\mathbf{U}$ and $p^{*}\in\mathbf{C}^{*}$ is a saddle point for the Lagrangian function $L(u,p)$ if and only if

(i)

$L(u^{*},p^{*})=\min\limits_{u\in\mathbf{U}}L(u,p^{*})$

or the following variational inequality holds: $\forall u\in\mathbf{U}$ ,

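$$\langle\nabla G(u^{*}),u-u^{*}\rangle+J(u)-J(u^{*})+\langle p^{*},\Theta(u)-\Theta(u^{*})\rangle\geq 0;$$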

(ii)

$\Theta(u^{*})\in-\mathbf{C}$ ;

(iii)

$\langle p^{*},\Theta(u^{*})\rangle=0$ .

Moreover, $(u^{*},p^{*})$ is a saddle point if and only if $u^{*}$ and $p^{*}$ are, respectively, optimal solutions to the primal and dual problems (P) and (D) with no duality gap, that is, with $(G+J)(u^{*})=\psi(p^{*})$ .

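We now introduce slack variables to reduce problem (P) to a problem with equality constraints. Namely, problem (P) is converted into the following equivalent problem with equality constraints:

$$\begin{array}{lll}\mbox{(P$_{1}$):}&\min\limits_{\xi\in-\mathbf{C}}&(G+J)(u)\\ &\mbox{s.t.}&\Theta(u)-\xi=0\\ &&u\in\mathbf{U}\end{array}$$

The augmented Lagrangian for this problem is

$$\overline{L}_{\gamma}(u,\xi,p)=(G+J)(u)+\langle p,\Theta(u)-\xi\rangle+\frac{\gamma}{2}\|\Theta(u)-\xi\|^{2}$$

The augmented Lagrangian associated with problem (P) is defined as

$$L_{\gamma}(u,p)\triangleq\min_{\xi\in-\mathbf{C}}\overline{L}_{\gamma}(u,\xi,p)=(G+J)(u)+\varphi(\Theta(u),p),$$

where $\varphi(\Theta(u),p)=[\|\Pi\big{(}p+\gamma\Theta(u)\big{)}\|^{2}-\|p\|^{2}]/2\gamma$ and $\Pi$ is the projection onto $\mathbf{C}^{*}$ .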
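The closed form of $\varphi$ can be checked numerically against the explicit minimization over the slack variable $\xi\in-\mathbf{C}$. The minimal sketch below assumes the special case $\mathbf{C}=\mathbf{R}^{m}_{+}$, so that $\Pi$ is the componentwise positive part; the function names `phi` and `slack_min` are illustrative and do not come from the paper.

```python
def phi(theta, p, gamma):
    """Closed form (||Pi(p + gamma*theta)||^2 - ||p||^2) / (2*gamma),
    with Pi the positive part (assumes C = nonnegative orthant)."""
    proj_sq = sum(max(pi + gamma * ti, 0.0) ** 2 for ti, pi in zip(theta, p))
    return (proj_sq - sum(pi * pi for pi in p)) / (2.0 * gamma)


def slack_min(theta, p, gamma):
    """min over xi <= 0 of <p, theta - xi> + gamma/2 * ||theta - xi||^2.
    The problem is separable, so it is solved coordinate by coordinate."""
    total = 0.0
    for ti, pi in zip(theta, p):
        # Unconstrained minimizer is xi = theta + p/gamma; clip it to xi <= 0.
        xi = min(ti + pi / gamma, 0.0)
        s = ti - xi
        total += pi * s + 0.5 * gamma * s * s
    return total


if __name__ == "__main__":
    theta, p, gamma = [0.5, -2.0, 1.0], [1.0, 0.5, -3.0], 2.0
    print(phi(theta, p, gamma), slack_min(theta, p, gamma))
```

Both functions return the same value for any input in this orthant case, which is the identity $L_{\gamma}(u,p)-(G+J)(u)=\varphi(\Theta(u),p)$ evaluated at $\theta=\Theta(u)$.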

The augmented Lagrangian dual function is defined as follows:

[TABLE]

Using $\psi_{\gamma}(p)$ , we obtain new primal-dual pair of nonlinear convex cone optimization

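$$\begin{array}{lllllll}\mbox{(P):}&\min&(G+J)(u)&&\mbox{(D$_{\gamma}$):}&\max&\psi_{\gamma}(p)\\ &\mbox{s.t.}&\Theta(u)\in-\mathbf{C}&&&\mbox{s.t.}&p\in\mathbf{R}^{m}\\ &&u\in\mathbf{U}&&&&\end{array}$$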

The following theorem shows that function $\varphi(\theta,p)$ , dual function $\psi_{\gamma}(p)$ and augmented Lagrangian $L_{\gamma}(u,p)$ have some useful properties.

Theorem 2

Suppose Assumption 1 holds for problem (P). Then we have

(i)

The function $\varphi(\theta,p)$ is convex in $\theta$ and concave in $p$ .

(ii)

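$\varphi$ is differentiable in $\theta$ and $p$ and one has

$$\nabla_{\theta}\varphi(\theta,p)=\Pi(p+\gamma\theta),$$

$$\nabla_{p}\varphi(\theta,p)=[\Pi(p+\gamma\theta)-p]/\gamma.$$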

(iii)

$\psi_{\gamma}(p)$ is concave and differentiable in $p$ , and $\nabla\psi_{\gamma}(p)=[\Pi(p+\gamma\Theta(\hat{u}(p)))-p]/\gamma$ , where $\hat{u}(p)\in\hat{\mathbf{U}}(p)=\{u\in\mathbf{U}|u=\arg\min\limits_{u\in\mathbf{U}}L_{\gamma}(u,p)\}$ .

(iv)

$L$ and $L_{\gamma}$ have the same sets of saddle points $\mathbf{U}^{*}\times\mathbf{P}^{*}$ respectively on $\mathbf{U}\times\mathbf{C}^{*}$ and $\mathbf{U}\times\mathbf{R}^{m}$ .

(v)

$L_{\gamma}$ is stable in $u$ , that is $\forall p^{*}\in\mathbf{P}^{*},\hat{\mathbf{U}}(p^{*})=\mathbf{U}^{*}$ .
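The gradient formulas of statement (ii), $\nabla_{\theta}\varphi(\theta,p)=\Pi(p+\gamma\theta)$ and $\nabla_{p}\varphi(\theta,p)=[\Pi(p+\gamma\theta)-p]/\gamma$, can be compared with central finite differences. This is only a sketch under the assumption $\mathbf{C}=\mathbf{R}^{m}_{+}$ (so $\Pi$ is the positive part); all function names are hypothetical.

```python
def proj(x):
    """Projection onto C* = nonnegative orthant (assumed special case)."""
    return [max(xi, 0.0) for xi in x]


def phi(theta, p, gamma):
    q = proj([pi + gamma * ti for ti, pi in zip(theta, p)])
    return (sum(qi * qi for qi in q) - sum(pi * pi for pi in p)) / (2.0 * gamma)


def grad_theta(theta, p, gamma):
    """Formula from Theorem 2 (ii): Pi(p + gamma*theta)."""
    return proj([pi + gamma * ti for ti, pi in zip(theta, p)])


def grad_p(theta, p, gamma):
    """Formula from Theorem 2 (ii): [Pi(p + gamma*theta) - p] / gamma."""
    q = proj([pi + gamma * ti for ti, pi in zip(theta, p)])
    return [(qi - pi) / gamma for qi, pi in zip(q, p)]


def finite_diff(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at the list x."""
    grad = []
    for j in range(len(x)):
        up = list(x)
        down = list(x)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    theta, p, gamma = [0.5, -2.0, 1.0], [1.0, 0.5, -3.0], 2.0
    print(grad_theta(theta, p, gamma), finite_diff(lambda t: phi(t, p, gamma), theta))
    print(grad_p(theta, p, gamma), finite_diff(lambda q: phi(theta, q, gamma), p))
```

The analytic and numerical gradients agree up to discretization error at points like the one above, which is consistent with $\varphi$ being continuously differentiable even though $\Pi$ itself is not.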

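Moreover, the next lemma gives another property of the augmented Lagrangian term.

Lemma 1

For all $p\in\mathbf{C}^{*}$ , $p^{\prime}\in\mathbf{R}^{m}$ and $u\in\mathbf{R}^{n}$ , we have that

$$L(u,p)-L_{\gamma}(u,p^{\prime})\leq\frac{1}{2\gamma}\|p-p^{\prime}\|^{2},$$

or

$$\langle p,\Theta(u)\rangle-\varphi\big(\Theta(u),p^{\prime}\big)\leq\frac{1}{2\gamma}\|p-p^{\prime}\|^{2}.$$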

Proof.

[TABLE]

$\Box$
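The inequality of Lemma 1, $\langle p,\Theta(u)\rangle-\varphi(\Theta(u),p^{\prime})\leq\frac{1}{2\gamma}\|p-p^{\prime}\|^{2}$, can also be probed numerically. The sketch below again assumes $\mathbf{C}=\mathbf{R}^{m}_{+}$, treats $\theta=\Theta(u)$ as an arbitrary vector, and samples $p\in\mathbf{C}^{*}$ and $p^{\prime}\in\mathbf{R}^{m}$ at random; the names `lemma1_gap` and `min_gap` are illustrative only.

```python
import random


def phi(theta, p, gamma):
    """phi for C = nonnegative orthant: Pi is the positive part."""
    proj_sq = sum(max(pi + gamma * ti, 0.0) ** 2 for ti, pi in zip(theta, p))
    return (proj_sq - sum(pi * pi for pi in p)) / (2.0 * gamma)


def lemma1_gap(p, p_prime, theta, gamma):
    """Right-hand side minus left-hand side of the Lemma 1 inequality."""
    lhs = sum(pi * ti for pi, ti in zip(p, theta)) - phi(theta, p_prime, gamma)
    rhs = sum((pi - qi) ** 2 for pi, qi in zip(p, p_prime)) / (2.0 * gamma)
    return rhs - lhs


def min_gap(trials=1000, m=3, gamma=0.7, seed=0):
    """Smallest gap over random samples; it should never be negative."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        p = [rng.uniform(0.0, 5.0) for _ in range(m)]        # p in C*
        p_prime = [rng.uniform(-5.0, 5.0) for _ in range(m)]  # p' unrestricted
        theta = [rng.uniform(-5.0, 5.0) for _ in range(m)]
        worst = min(worst, lemma1_gap(p, p_prime, theta, gamma))
    return worst


if __name__ == "__main__":
    print(min_gap())
```

A random search of this kind is not a proof, but a negative gap would immediately expose a sign or scaling error in an implementation of $\varphi$.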

For the general COCC, the augmented Lagrangian method is an approach which can overcome the instability and nondifferentiability of the dual function of the Lagrangian. Furthermore, the augmented Lagrangian of a constrained convex program has the same solution set as the original constrained convex program. The augmented Lagrangian approach for equality-constrained optimization problems was introduced in Hestenes [11] and Powell [23], and then extended to inequality-constrained problems by Buys [4].

Although the augmented Lagrangian approach (Uzawa algorithm) has several advantages, it does not preserve separability, even when the initial problem is separable. One way to decompose the augmented Lagrangian is ADMM (Fortin and Glowinski [9]). ADMM can only handle convex problems with linear constraints and is not easily parallelizable. Another way to overcome this difficulty is the Auxiliary Problem Principle of augmented Lagrangian methods (APP-AL) (Cohen and Zhu [8]), which is a fairly general first-order primal-dual decomposition method based on linearization of the augmented Lagrangian in nonlinear convex cone programming with separable or nonseparable, smooth or nonsmooth constraints. Zhao and Zhu (2018) [41] extend Cohen and Zhu (1984) [8]’s work to propose first-order primal-dual augmented Lagrangian methods for COCC as an algorithm (VAPP).

Variant Auxiliary Problem Principle for solving COCC (VAPP)

Initialize $u^{0}\in\mathbf{U}$ and $p^{0}\in\mathbf{C^{*}}$

for $k=0,1,\cdots$ , do

$$u^{k+1}\leftarrow\arg\min_{u\in\mathbf{U}}\ \langle\nabla G(u^{k}),u\rangle+J(u)+\langle\Pi_{M}(p^{k}+\gamma\Theta(u^{k})),\nabla\Omega(u^{k})u+\Phi(u)\rangle+\frac{1}{\epsilon^{k}}D(u,u^{k});$$

$$p^{k+1}\leftarrow\Pi_{M}\big(p^{k}+\gamma\Theta(u^{k+1})\big).$$

end for

where $D(u,v)=K(u)-K(v)-\langle\nabla K(v),u-v\rangle$ is a Bregman-like function with $K$ strongly convex and gradient Lipschitz. Zhao and Zhu (2018) show that the sequence $\{(u^{k},p^{k})\}$ generated by VAPP converges to a saddle point $(u^{*},p^{*})$ of $L$ over $\mathbf{U}\times\mathbf{C}^{*}$ . Moreover, an $O(1/t)$ convergence rate is also established. In the era of big data, there has been a surge of interest in redesigning VAPP so that it is suitable for solving huge-scale optimization problems with the available computing resources.
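To make the VAPP iteration concrete, here is a minimal sketch on a toy problem: minimize $\frac{1}{2}\|u-c\|^{2}$ with $c=(1,1)$ subject to $u_{1}+u_{2}-1\leq 0$. Everything problem-specific is an assumption made for illustration: $J=0$, $\Phi=0$, $\mathbf{U}=\mathbf{R}^{2}$, $\mathbf{C}=\mathbf{R}_{+}$, $K(u)=\frac{1}{2}\|u\|^{2}$, the projection taken onto $\mathbf{C}^{*}=\mathbf{R}_{+}$, and a constant step size `eps` in place of the paper's step-size rule, which is not reproduced here.

```python
def vapp_toy(eps=0.2, gamma=1.0, iters=500):
    """Toy VAPP-style run for: min 0.5*||u - c||^2  s.t.  u1 + u2 - 1 <= 0."""
    c = [1.0, 1.0]
    a = [1.0, 1.0]  # Omega(u) = <a, u> - 1, so grad Omega(u) = a
    u = [0.0, 0.0]
    p = 0.0
    for _ in range(iters):
        theta = a[0] * u[0] + a[1] * u[1] - 1.0
        # q^k = Pi(p^k + gamma * Theta(u^k)), Pi = projection onto R_+
        q = max(p + gamma * theta, 0.0)
        # With K = 0.5*||u||^2, J = 0 and U = R^2 the primal subproblem has
        # the explicit solution u - eps * (grad G(u) + grad Omega(u)^T q).
        u = [u[i] - eps * ((u[i] - c[i]) + a[i] * q) for i in range(2)]
        theta = a[0] * u[0] + a[1] * u[1] - 1.0
        # Dual step uses the new primal point u^{k+1}.
        p = max(p + gamma * theta, 0.0)
    return u, p


if __name__ == "__main__":
    print(vapp_toy())
```

For this toy problem the saddle point is $u^{*}=(0.5,0.5)$ with multiplier $p^{*}=0.5$ (the projection of $c$ onto the half-space), and the iterates approach it for the parameter values chosen above.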

II-C The properties of projection on convex cone

In this subsection, we introduce some properties of projection on convex sets (resp. convex cone) as preparations. These properties are used in the following sections.

Let $\mathcal{S}$ be a nonempty closed convex subset of $\mathbf{R}^{m}$ . For $x\in\mathbf{R}^{m}$ , we denote by $\Pi_{\mathcal{S}}(x)$ the projection of $x$ onto $\mathcal{S}$ . Then $\Pi_{\mathcal{S}}(x)$ is characterized by the following two conditions [6]:

[TABLE]

Furthermore, the following proposition gives another property of projection operator which is used for convergence and convergence rate analysis.

Proposition 1

For any $x,y,z\in\mathbf{R}^{m}$ , the projection operator $\Pi_{\mathcal{S}}$ satisfies

[TABLE]

Proof. See [41]. $\Box$

Next, we consider the properties for projection on convex cone. Let $\mathbf{C}$ be a nonempty closed convex cone in $\mathbf{R}^{m}$ with vertex at the origin. $\mathbf{C}^{*}$ denotes the conjugate cone. Let $\Pi$ denote the projection on $\mathbf{C}^{*}$ and $\Pi_{-\mathbf{C}}$ denote the projection on $-\mathbf{C}$ . The projection is characterized by the following conditions. (see Wierzbicki [34]):

[TABLE]
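One standard fact about the pair of projections $\Pi$ (onto $\mathbf{C}^{*}$) and $\Pi_{-\mathbf{C}}$ is the Moreau decomposition $x=\Pi(x)+\Pi_{-\mathbf{C}}(x)$ with $\langle\Pi(x),\Pi_{-\mathbf{C}}(x)\rangle=0$. Whether this coincides with the conditions listed in the paper cannot be confirmed from the text above, so the sketch below is offered only as an illustration, under the assumption $\mathbf{C}=\mathbf{R}^{m}_{+}$; the function names are hypothetical.

```python
def proj_dual_cone(x):
    """Projection onto C* = nonnegative orthant (assumed special case)."""
    return [max(xi, 0.0) for xi in x]


def proj_neg_cone(x):
    """Projection onto -C = nonpositive orthant."""
    return [min(xi, 0.0) for xi in x]


def moreau_check(x):
    """Return (max |x - (Pi(x) + Pi_{-C}(x))|, <Pi(x), Pi_{-C}(x)>)."""
    a = proj_dual_cone(x)
    b = proj_neg_cone(x)
    residual = max(abs(xi - (ai + bi)) for xi, ai, bi in zip(x, a, b))
    inner = sum(ai * bi for ai, bi in zip(a, b))
    return residual, inner


if __name__ == "__main__":
    print(moreau_check([1.5, -2.0, 0.0]))
```

Both returned quantities are zero for the orthant, reflecting that every vector splits into orthogonal components in $\mathbf{C}^{*}$ and $-\mathbf{C}$.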

II-D The properties of differentiable functions and mappings

Lemma 2

Let the function $f$ be convex and differentiable on $\mathbf{U}$ .

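(i) If $f$ is strongly convex with constant $\beta_{f}$ , then

$$\forall u,v\in\mathbf{U},\quad f(u)-f(v)\geq\langle\nabla f(v),u-v\rangle+\frac{\beta_{f}}{2}\|u-v\|^{2}.$$

(ii) If the derivative of $f$ is Lipschitz with constant $B_{f}$ , then

$$\forall u,v\in\mathbf{U},\quad f(u)-f(v)\leq\langle\nabla f(v),u-v\rangle+\frac{B_{f}}{2}\|u-v\|^{2}.$$

(iii) Let $\Omega$ be a $\mathbf{C}$ -convex mapping from $\mathbf{U}$ to $\mathbf{C}$ . Suppose its derivative exists and meets the following condition: $\exists T\in\mathbf{C}$ such that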

[TABLE]

then $\forall u,v\in\mathbf{U},\forall p\in\mathbf{C}^{*}$ we have

[TABLE]

Proof. The statements (i) and (ii) are classical; the proof is omitted (see Zhu and Marcotte [42]). For proof of (iii), see Cohen [7]. $\Box$

III Stochastic primal-dual coordinate method

In this section, we propose a stochastic primal-dual coordinate descent algorithm to solve (P). Firstly, we introduce the core function $K(\cdot)$ satisfying the following assumption:

Assumption 2

$K$ is strongly convex with parameter $\beta$ and differentiable with its gradient Lipschitz continuous with parameter $B$ on $\mathbf{U}$ .

Additionally, let $D(u,v)=K(u)-K(v)-\langle\nabla K(v),u-v\rangle$ be a Bregman-like function (core function) [1, 8]. From Assumption 2 and Lemma 2 (i)-(ii) applied to $K$ , we have: $\frac{\beta}{2}\|u-v\|^{2}\leq D(u,v)\leq\frac{B}{2}\|u-v\|^{2}$ .
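The two-sided bound on $D$ is easy to check for a concrete core function. The sketch below assumes a hypothetical weighted quadratic $K(u)=\frac{1}{2}\sum_{i}w_{i}u_{i}^{2}$ with weights $w_{i}>0$ (not a choice made in the paper), for which $\beta=\min_{i}w_{i}$ and $B=\max_{i}w_{i}$.

```python
WEIGHTS = [1.0, 2.0, 4.0]  # hypothetical weights; beta = 1.0, B = 4.0


def K(u):
    """Core function K(u) = 0.5 * sum_i w_i * u_i^2."""
    return 0.5 * sum(w * x * x for w, x in zip(WEIGHTS, u))


def grad_K(u):
    return [w * x for w, x in zip(WEIGHTS, u)]


def bregman(u, v):
    """D(u, v) = K(u) - K(v) - <grad K(v), u - v>."""
    lin = sum(g * (ui - vi) for g, ui, vi in zip(grad_K(v), u, v))
    return K(u) - K(v) - lin


def bounds_hold(u, v, tol=1e-12):
    """Check beta/2 * ||u-v||^2 <= D(u,v) <= B/2 * ||u-v||^2."""
    d = bregman(u, v)
    dist_sq = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    beta, big_b = min(WEIGHTS), max(WEIGHTS)
    return 0.5 * beta * dist_sq - tol <= d <= 0.5 * big_b * dist_sq + tol


if __name__ == "__main__":
    u, v = [1.0, -1.0, 0.5], [0.0, 2.0, -1.0]
    print(bregman(u, v), bounds_hold(u, v))
```

With all weights equal to one, $D(u,v)$ reduces to $\frac{1}{2}\|u-v\|^{2}$ and both bounds hold with equality.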

Moreover, we assume that the parameter $\rho$ satisfies:

[TABLE]

Let $\mu_{0}$ be a bound of dual optimal solution of (P), denote $\mu=\mu_{0}+1$ . Let $\mathfrak{B}_{\mu}=\{p|\|p\|\leq\mu\}$ . The estimation of $\mu_{0}$ can be found in [41]. By using the projection $\mathcal{P}_{\mu}(\cdot)$ onto $\mathfrak{B}_{\mu}$ , we introduce Stochastic Primal-Dual Coordinate Method with Large step size (SPDCL) for solving (P):

Stochastic Primal-Dual Coordinate Method with Large step size (SPDCL)

Initialize $u^{0}\in\mathbf{U}$ , $p^{0}\in\mathbf{R}^{m}$ and $\epsilon^{-1}>0$

for $k=0,1,\cdots$ , do

[TABLE]

end for

For the sake of brevity, let $q^{k}=\Pi\big{(}p^{k}+\gamma\Theta(u^{k})\big{)}$ , $q^{k+1/2}=\Pi\big{(}p^{k}+\gamma\Theta(u^{k+1})\big{)}$ and $F=G+J$ . Then the primal problem of the algorithm can be expressed as

[TABLE]

Suppose we choose a Bregman-like function (or core function) that is additive with respect to the space decomposition (2), that is,

[TABLE]

Then problem (APk) is just a small optimization problem for selected block $i(k)$ . Specifically, taking $K(u)=\sum\limits_{i=1}\limits^{N}\frac{\|u_{i}\|^{2}}{2}$ for (APk), we perform only a block proximal gradient update for block $i(k)$ , where we linearize the coupled function $G(u)$ and augmented Lagrangian term $\varphi(\Theta(u),p)$ and add the proximal term to it. In the following sections, we will establish the convergence and convergence rate and probability complexity bounds of SPDCL.
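The block proximal gradient update described above can be sketched for a single iteration. This is not the full SPDCL scheme (its dual step and step-size rule are not reproduced in the text above); it only illustrates the primal block step with $K(u)=\sum_{i}\|u_{i}\|^{2}/2$. The problem data are assumptions chosen for illustration: $G(u)=\frac{1}{2}\|u-c\|^{2}$, affine $\Omega(u)=Au-b$, $J_{i}(u_{i})=\lambda|u_{i}|$, $\Phi=0$, $\mathbf{U}=\mathbf{R}^{n}$, and scalar blocks. All function names are hypothetical.

```python
import random


def soft_threshold(x, t):
    """Proximal map of t*|.| evaluated at x."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def block_step(u, q, i, eps, c, A, lam):
    """Update only block i: linearize G and the coupling term <q, Omega(u)>
    at the current u, keep J_i = lam*|u_i| exact, and add the proximal term
    (1 / (2*eps)) * (u_i - u_i^k)^2."""
    grad_g_i = u[i] - c[i]
    # i-th entry of grad Omega(u)^T q for the affine map Omega(u) = A u - b
    coupling_i = sum(A[j][i] * q[j] for j in range(len(q)))
    new_u = list(u)  # all other blocks are left untouched
    new_u[i] = soft_threshold(u[i] - eps * (grad_g_i + coupling_i), eps * lam)
    return new_u


def sample_block(rng, n_blocks):
    """Uniform random choice of the block index i(k)."""
    return rng.randrange(n_blocks)


if __name__ == "__main__":
    rng = random.Random(0)
    u = [1.0, -2.0]
    i = sample_block(rng, len(u))
    print(i, block_step(u, [0.5], i, 0.5, [0.0, 0.0], [[1.0, 1.0]], 0.2))
```

Only one coordinate of `u` changes per call, which is what keeps the per-iteration cost low; in a separable setting the step needs the $i$-th partial gradient of $G$ and the $i$-th column of $\nabla\Omega$ rather than the full quantities.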

IV Convergence analysis

In this section, we establish the convergence of SPDCL. Before proceeding, we first give the generalized equilibrium reformulation of the saddle point formulation (8):

Find $(u^{*},p^{*})\in\mathbf{U}\times\mathbf{C}^{*}$ such that

[TABLE]

Obviously, the bifunction $L(u^{\prime},p)-L(u,p^{\prime})$ is convex in $u^{\prime}$ and linear in $p^{\prime}$ for given $u\in\mathbf{U}$ , $p\in\mathbf{C}^{*}$ .

In algorithm SPDCL, the indices $i(k)$ , $k=0,1,2,\ldots$ , are random variables. After $k$ iterations, the SPDCL method generates a random output $(u^{k+1},p^{k+1})$ . We denote by $\mathcal{F}_{k}$ the filtration generated by the random variables $i(0),i(1),\ldots,i(k)$ , i.e.,

[TABLE]

Additionally, we define $\mathcal{F}=(\mathcal{F}_{k})_{k\in\mathbb{N}}$ ; $\mathbb{E}_{\mathcal{F}_{k+1}}=\mathbb{E}(\cdot|\mathcal{F}_{k})$ is the conditional expectation w.r.t. $\mathcal{F}_{k}$ , and the conditional expectation in terms of $i(k)$ given $i(0),i(1),\ldots,i(k-1)$ is denoted by $\mathbb{E}_{i(k)}$ .

Knowing $\mathcal{F}_{k-1}=\{i(0),i(1),\ldots,i(k-1)\}$ , we have:

[TABLE]

Given $(u^{*},p^{*})$ , for any $u,u^{\prime}\in\mathbf{U}$ and $p,p^{\prime}\in\mathbf{C}^{*}$ , we construct the following function:

[TABLE]

Specifically, we can show the function value of $\Lambda^{k}$ at ( $u^{*},p^{*}$ ) provides an upper bound for $\|u^{\prime}-u^{*}\|^{2}$ .

[TABLE]

Additionally, since the SPDCL scheme guarantees that $\epsilon^{k+1}\leq\epsilon^{k}$ , we have that

[TABLE]

Before the convergence analysis, we need the following lemma.

Lemma 3

(Global estimation of bifunction values) Let Assumptions 1 and 2 hold, let $\{(u^{k},p^{k})\}$ be generated by SPDCL, and let the parameter $\rho$ satisfy (21). For all $u\in\mathbf{U}$ and $p\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ , where $(u,p)$ could possibly be random, it holds that

[TABLE]

where

$h_{1}(\epsilon^{k},u,p,u^{k},u^{k+1},q^{k})=\frac{B}{\epsilon^{k}}\|u-u^{k+1}\|+[\|\nabla G(u^{k})\|+c_{1}\|u^{k}\|+c_{2}+\tau\|q^{k}\|]+\frac{\tau}{N}\|p-q^{k}\|$

and

$h_{2}(p,p^{k},p^{k+1})=\frac{1}{2N\rho}\|2p-p^{k+1}-p^{k}\|$ .

Proof. The proof of this lemma is left to the Appendix. $\Box$

Based on Lemma 3, we establish the following convergence analysis of SPDCL.

Theorem 3 (Almost surely convergence)

Let assumptions of Lemma 3 hold, then

(i)

$\sum\limits_{k=0}\limits^{+\infty}\mathbb{E}_{i(k)}\frac{\beta}{4}\|u^{k}-u^{k+1}\|^{2}<+\infty$ * a.s. and $\sum\limits_{k=0}\limits^{+\infty}\frac{\epsilon^{k}}{2N\gamma}\|q^{k}-p^{k}\|^{2}<+\infty$ a.s.;*

(ii)

The sequence $\{u^{k}\}$ generated by SPDCL is almost surely bounded;

(iii)

Every cluster point of $\{(u^{k},p^{k})\}$ almost surely is a saddle point of Lagrangian of (P).

Proof.

(i)

Take $u=u^{*}$ and $p=p^{*}$ in statement (iii) of Lemma 3, we have

[TABLE]

By the definition of a saddle point and assumption (21), $(u^{*},p^{*})$ is a solution of (EP), and hence

$S_{k}=\mathbb{E}_{i(k)}\bigg{[}\frac{\epsilon^{k}}{N}\big{[}L(u^{k+1},p^{*})-L(u^{*},q^{k})\big{]}+\frac{\beta}{4}\|u^{k}-u^{k+1}\|^{2}+\frac{\epsilon^{k}}{2N\gamma}\|q^{k}-p^{k}\|^{2}\bigg{]}$

is nonnegative. From (30), we have that $\Lambda^{k}(u^{*},p^{*},u^{k},p^{k})$ is nonnegative.

By the Robbins-Siegmund Lemma [26], we obtain that $\lim\limits_{k\rightarrow+\infty}\Lambda^{k}(u^{*},p^{*},u^{k},p^{k})$ almost surely exists, $\sum\limits_{k=0}\limits^{+\infty}\mathbb{E}_{i(k)}\frac{\beta}{4}\|u^{k}-u^{k+1}\|^{2}<+\infty$ a.s. and $\sum\limits_{k=0}\limits^{+\infty}\frac{\epsilon^{k}}{2N\gamma}\|q^{k}-p^{k}\|^{2}<+\infty$ a.s.

(ii)

Since $\lim\limits_{k\rightarrow+\infty}\Lambda^{k}(u^{*},p^{*},u^{k},p^{k})$ almost surely exists, $\Lambda^{k}(u^{*},p^{*},u^{k},p^{k})$ is almost surely bounded. By (30), this implies that the sequence $\{u^{k}\}$ is almost surely bounded.

(iii)

From statement (ii), the sequence $\{u^{k}\}$ is almost surely bounded. Moreover, the SPDCL scheme guarantees that the sequence $\{p^{k}\}$ is bounded. Therefore, there exists a positive number $\underline{\epsilon}$ such that $\epsilon^{k}\geq\underline{\epsilon}$ with probability 1. Then from statement (i) we have that

[TABLE]

and

[TABLE]

It follows that

[TABLE]

Since

[TABLE]

then from ((iii)), we have almost surely

[TABLE]

Let $\mathbb{W}_{0}$ denote the subset such that $\{u^{k}\}$ is not bounded, and let $\mathbb{W}_{1}$ denote the subset for which ((iii)) does not hold: $\mathbb{P}(\mathbb{W}_{0}\cup\mathbb{W}_{1})=0$ . Pick some $\omega\notin\mathbb{W}_{0}\cup\mathbb{W}_{1}$ . Since the sequence $\{u^{k}\}$ is almost surely bounded and $\{p^{k}\}$ is bounded, the sequence $\{(u^{k},p^{k})\}$ has a cluster point. Considering a subsequence of $\{(u^{k},p^{k})\}$ almost surely converging toward $(\bar{u}(\omega),\bar{p}(\omega))$ , let $\mathcal{N}(\bar{u})$ (resp. $\mathcal{N}(\bar{p})$ ) be a neighbourhood of $\bar{u}(\omega)$ (resp. $\bar{p}(\omega)$ ). Combining statement (iv) of Lemma 3 with the facts that $\{u^{k}\}$ is almost surely bounded, $\{p^{k}\}$ is bounded, and almost surely $\underline{\epsilon}\leq\epsilon^{k}\leq\epsilon^{0}$ , we also have that there exist positive numbers $d_{1}$ and $d_{2}$ such that

[TABLE]

Passing to the limit of ((iii)), it follows that $[L(\bar{u}(\omega),p)-L(u,\bar{p}(\omega))]\leq 0$ , $\forall(u,p)\in\mathcal{N}(\bar{u}(\omega))\times\mathcal{N}(\bar{p}(\omega))\subset\mathbf{U}\times\mathbf{C}^{*}$ . Therefore, $(\bar{u}(\omega),\bar{p}(\omega))$ is a saddle point of $L$ over $\mathcal{N}(\bar{u}(\omega))\times\mathcal{N}(\bar{p}(\omega))$ . Since $L(u^{\prime},p)-L(u,p^{\prime})$ is convex in $(u^{\prime},p^{\prime})$ , then $(\bar{u}(\omega),\bar{p}(\omega))$ is a saddle point of $L$ over $\mathbf{U}\times\mathbf{C}^{*}$ .

$\Box$

V Convergence rate analysis

In this section we provide the convergence rate of SPDCL. For the sequence $\{(u^{k},p^{k})\}$ generated from Algorithm SPDCL, and any $t>0$ we define the average sequence

[TABLE]

Theorem 4

(Expected primal suboptimality and expected feasibility)

Let Assumptions 1 and 2 hold, let $\{(u^{k},p^{k})\}$ be generated by SPDCL, and let the parameter $\rho$ satisfy condition (21). Then we have that

(i)

Boundedness of the expected vector:

$\|\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t})\|\leq\nu$

where $\nu=\left(\frac{2\epsilon^{0}}{\beta\underline{\epsilon}}\Lambda(u^{*},p^{*},u^{0},p^{0})\right)^{1/2}+\|u^{*}\|$ ;

(ii)

Global estimate of expected bifunction values:

$\mathbb{E}_{\mathcal{F}_{t}}\big{[}L(\bar{u}_{t},p)-L(u,\bar{p}_{t})\big{]}\leq\frac{Nh_{3}(u,p)}{\underline{\epsilon}(t+1)}$ ,

where $h_{3}(u,p)=D(u,u^{0})+\frac{N-1}{N}D(u^{*},u^{0})+\frac{\epsilon^{0}}{\gamma}\|p-p^{0}\|^{2}+\frac{(2N-1)(N-1)\epsilon^{0}}{N^{2}}\big{[}\frac{\|p^{*}-p^{0}\|^{2}}{2\gamma}+L_{\gamma}(u^{0},p^{0})-L(u^{*},p^{*})\big{]}$ , $\forall u\in\mathbf{U}$ , $p\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ , $(u,p)$ could possibly be random;

(iii)

Expected feasibility:

$\mathbb{E}_{\mathcal{F}_{t}}\|\Pi\left(\Theta(\bar{u}_{t})\right)\|\leq\frac{Nd_{3}}{(\mu-\|p^{*}\|)\underline{\epsilon}(t+1)}$ ,

where $d_{3}=\sup\limits_{\|p\|<\mu}h_{3}(u^{*},p)$ ;

(iv)

Expected primal suboptimality:

$-\frac{\|p^{*}\|Nd_{3}}{(\mu-\|p^{*}\|)\underline{\epsilon}(t+1)}\leq\mathbb{E}_{\mathcal{F}_{t}}\left[F(\bar{u}_{t})-F(u^{*})\right]\leq\frac{Nd_{3}}{\underline{\epsilon}(t+1)}$ .
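As a worked consequence of statements (iii) and (iv), one can read off how many iterations suffice for a target accuracy $\delta>0$ in expectation. The symbol $\delta$ is introduced here only for illustration; the step uses $\mu=\mu_{0}+1$ and $\|p^{*}\|\leq\mu_{0}$, so that $\mu-\|p^{*}\|\geq 1$.

```latex
t+1\;\geq\;\frac{N d_{3}}{\underline{\epsilon}\,\delta}
\quad\Longrightarrow\quad
\mathbb{E}_{\mathcal{F}_{t}}\big[F(\bar{u}_{t})-F(u^{*})\big]
\leq\frac{N d_{3}}{\underline{\epsilon}(t+1)}\leq\delta ,
\qquad
\mathbb{E}_{\mathcal{F}_{t}}\big\|\Pi\big(\Theta(\bar{u}_{t})\big)\big\|
\leq\frac{N d_{3}}{(\mu-\|p^{*}\|)\,\underline{\epsilon}(t+1)}
\leq\frac{N d_{3}}{\underline{\epsilon}(t+1)}\leq\delta .
```

The iteration count therefore grows linearly in the number of blocks $N$ and in $1/\delta$, which matches the $O(1/t)$ rate stated in the theorem.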

Proof.

(i)

From statement (iii) of Lemma 3, we obtain that

[TABLE]

Taking expectation with respect to $\mathcal{F}_{t}$ , $t>k$ for above inequality, we obtain that

[TABLE]

Take $u=u^{*}\in\mathbf{U}$ and $p=p^{*}\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ in ((i)) we have that

[TABLE]

Together with (30) and (37), we have

[TABLE]

By the convexity of $\|\cdot\|^{2}$ and the fact that $\epsilon^{k}$ is almost surely bounded below by $\underline{\epsilon}$ (Theorem 3), we obtain that

[TABLE]

This proves statement (i).

(ii)

Then from ((i)), we obtain that

[TABLE]

From (37), we have that $\mathbb{E}_{\mathcal{F}_{t}}[\Lambda^{k}(u^{*},p^{*},u^{k+1},p^{k+1})-\Lambda^{k}(u^{*},p^{*},u^{k},p^{k})]\leq 0$ , then by the definition of $\Lambda^{k}$ , it follows

[TABLE]

By Lemma 1 we have that

[TABLE]

Combining this with $\epsilon^{k+1}\leq\epsilon^{k}$ , we have that

[TABLE]

Summing (40) over $k=1,2,...,t$ , it follows that

[TABLE]

where $h_{3}(u,p)=D(u,u^{0})+\frac{N-1}{N}D(u^{*},u^{0})+\frac{\epsilon^{0}}{\gamma}\|p-p^{0}\|^{2}+\frac{(2N-1)(N-1)\epsilon^{0}}{N^{2}}\big{[}\frac{\|p^{*}-p^{0}\|^{2}}{2\gamma}+L_{\gamma}(u^{0},p^{0})-L(u^{*},p^{*})\big{]}$ .

On the other hand, by the definition of $\bar{u}_{t}$ and $\bar{p}_{t}$ and the convexity of the sets $\mathbf{U}$ and $\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ , we have $\bar{u}_{t}\in\mathbf{U}$ and $\bar{p}_{t}\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ . The function $L(u^{\prime},p)-L(u,p^{\prime})$ is convex in $u^{\prime}$ and linear in $p^{\prime}$ for all $u\in\mathbf{U}$ and $p\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ . Since $\epsilon^{k}$ is almost surely bounded below by $\underline{\epsilon}$ (Theorem 3), we have that

[TABLE]

(iii)

If $\mathbb{E}_{\mathcal{F}_{t}}\|\Pi\left(\Theta(\bar{u}_{t})\right)\|=0$ , statement (iii) is obvious. Otherwise, $\mathbb{E}_{\mathcal{F}_{t}}\|\Pi\left(\Theta(\bar{u}_{t})\right)\|\neq 0$ , i.e., there is a set $\mathbb{W}_{3}=\{\omega\,|\,\|\Pi\left(\Theta(\bar{u}_{t})\right)\|\neq 0\}$ with $\mathbb{P}(\mathbb{W}_{3})>0$ . Let $\hat{p}$ be the random vector:

[TABLE]

Note that for $\omega\notin\mathbb{W}_{3}$ , we have $\hat{p}(\omega)=0$ and $\|\Pi\left(\Theta(\bar{u}_{t})\right)\|=0$ . Thus

[TABLE]

Otherwise, for $\omega\in\mathbb{W}_{3}$ , we have that

[TABLE]

Combining (46) and (47), we have

[TABLE]

Moreover, since $\Theta(u^{*})\in-\mathbf{C}$ and $\bar{p}_{t}\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ , we have $\langle\bar{p}_{t},\Theta(u^{*})\rangle\leq 0$ . By (48), we have

[TABLE]

Moreover, by taking $u=\bar{u}_{t}$ in the right hand side of saddle point inequality (8), we have

[TABLE]

Combine (49) and (50), we have that

[TABLE]

Taking the expectation on both sides of the above inequality, we have that

[TABLE]

Since random variable $\hat{p}\in\mathfrak{B}_{\mu}$ , it follows that

[TABLE]

where $d_{3}=\sup\limits_{\|p\|<\mu}h_{3}(u^{*},p)$ . This proves statement (iii).

(iv)

Statement (iv) follows again from (49), (50) and statement (iii).

$\Box$
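The bounds in statements (iii) and (iv) of Theorem 4 all have the form $c/(t+1)$, so they can be inverted to estimate how many iterations make every bound at most a target tolerance. The following is a minimal sketch of that arithmetic, assuming the constants $N$, $d_{3}$, $\underline{\epsilon}$, $\mu$ and $\|p^{*}\|$ are known or estimated. The function name `iterations_for_tolerance` and its argument names are hypothetical and are not part of the paper.

```python
import math


def iterations_for_tolerance(N, d3, eps_lower, mu, p_star_norm, tol):
    """Smallest t >= 0 for which the three bounds of Theorem 4 (iii)-(iv)
    are all at most `tol`. All arguments are assumed known constants:
    N (number of blocks), d3, eps_lower (lower bound on the step size),
    mu (radius of the dual ball) and p_star_norm (norm of p*)."""
    if not mu > p_star_norm:
        raise ValueError("the bounds require mu > ||p*||")
    if tol <= 0:
        raise ValueError("tol must be positive")
    # Upper bound on expected suboptimality: N*d3 / (eps_lower*(t+1)).
    c_subopt = N * d3 / eps_lower
    # Expected feasibility: N*d3 / ((mu - ||p*||)*eps_lower*(t+1)).
    c_feas = N * d3 / ((mu - p_star_norm) * eps_lower)
    # Magnitude of the lower bound on expected suboptimality.
    c_lower = p_star_norm * c_feas
    c = max(c_subopt, c_feas, c_lower)
    # c/(t+1) <= tol  <=>  t >= c/tol - 1.
    return max(0, math.ceil(c / tol - 1))
```

For instance, with the illustrative values `N=4`, `d3=10.0`, `eps_lower=0.5`, `mu=2.0`, `p_star_norm=1.0` all three constants equal 80, and a tolerance of 0.25 requires $t+1\geq 320$.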

Theorem 4 shows that SPDCL has an $O(1/t)$ convergence rate in expectation. To obtain the dual suboptimality, we need the following additional assumption.

Assumption 3

$G+J$ is coercive on $\mathbf{U}$ if $\mathbf{U}$ is not bounded, that is, $\forall\{u^{k}|k\in\mathbb{N}\}\subset\mathbf{U}$ ,

[TABLE]

The following lemma states that for any given bounded set of dual points, the corresponding optimizer of the augmented Lagrangian is bounded.

Lemma 4

Suppose Assumption 1 holds. Let $\mathfrak{B}_{p}$ be a bounded set: $\mathfrak{B}_{p}=\{p\in\mathbf{R}^{m}|\|p\|\leq d_{p}\}$ . Then there exists a positive constant $d_{u}$ such that, for any $p\in\mathfrak{B}_{p}$ , there is an optimizer $\hat{u}(p)\in\arg\min\limits_{u\in\mathbf{U}}L_{\gamma}(u,p)$ with $\|\hat{u}(p)\|\leq d_{u}$ .

Proof. See [41]. $\Box$

By statement (i) of Theorem 4, there is a ball $\mathfrak{B}_{\nu}=\{u|\|u\|\leq\nu\}$ that contains $\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t})$ . Furthermore, by Lemma 4, there exists $\nu^{\prime}>0$ such that for every $p\in\mathfrak{B}_{\mu}$ there is an optimizer $\hat{u}(p)\in\arg\min\limits_{u\in\mathbf{U}}L_{\gamma}(u,p)$ with $\|\hat{u}(p)\|\leq\nu^{\prime}$ . We therefore construct a new ball $\mathfrak{B}_{{\nu}^{+}}=\{u|\|u\|\leq\overline{\nu}=\max(\nu,\nu^{\prime})\}$ . The next proposition shows that the pair of expected vectors $\big{(}\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t}),\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\big{)}$ is an approximate saddle point. This assertion will be used to derive the estimate of the dual suboptimality for the average point $\bar{p}_{t}$ .

Proposition 2

**(Approximate saddle points by expected point $\big{(}\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t}),\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\big{)}$ )**

Suppose the assumptions of Theorem 4 hold.

(i)

The expected point $\big{(}\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t}),\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\big{)}$ is an approximate saddle point for $L$ : $\forall(u,p)\in(\mathbf{U}\cap\mathfrak{B}_{{\nu}^{+}})\times(\mathbf{C}^{*}\cap\mathfrak{B}_{\mu})$

[TABLE]

where $d_{4}=\sup_{(u,p)\in(\mathbf{U}\cap\mathfrak{B}_{{\nu}^{+}})\times(\mathbf{C}^{*}\cap\mathfrak{B}_{\mu})}h_{3}(u,p)$ .

(ii)

The expected point $\big{(}\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t}),\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\big{)}$ is an approximate saddle point for $L_{\gamma}$ : $\forall(u,p)\in(\mathbf{U}\cap\mathfrak{B}_{{\nu}^{+}})\times(\mathbf{C}^{*}\cap\mathfrak{B}_{\mu})$

[TABLE]

where $\delta_{1}=\frac{\mu Nd_{3}+\left(\mu-\|p^{*}\|\right)Nd_{4}}{\left(\mu-\|p^{*}\|\right)\underline{\epsilon}(t+1)}+\frac{\gamma N^{2}(d_{3})^{2}}{2\left(\mu-\|p^{*}\|\right)^{2}\underline{\epsilon}^{2}(t+1)^{2}}$ and $\delta_{2}=\delta_{1}+\frac{Nd_{4}}{\underline{\epsilon}(t+1)}$ .

Proof.

(i)

From statement (ii) of Theorem 4 with $u\in\mathbf{U}\cap\mathfrak{B}_{{\nu}^{+}}$ and $p\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ , we have that

[TABLE]

where $d_{4}=\sup_{(u,p)\in(\mathbf{U}\cap\mathfrak{B}_{{\nu}^{+}})\times(\mathbf{C}^{*}\cap\mathfrak{B}_{\mu})}h_{3}(u,p)$ . Since the bifunction $L(u^{\prime},p)-L(u,p^{\prime})$ is convex in $u^{\prime}$ and linear in $p^{\prime}$ for given $u\in\mathbf{U}$ , $p\in\mathbf{C}^{*}$ , we obtain

[TABLE]

Note that $\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ . With $p=\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})$ , (52) yields the right-hand inequality of the approximate saddle point condition

[TABLE]

Now considering $\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t})\in\mathbf{U}\cap\mathfrak{B}_{{\nu}^{+}}$ , with $u=\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t})$ , (52) yields the left inequality

[TABLE]

This proves statement (i).

(ii)

Taking $p=0$ in the left-hand inequality of statement (i), we get $\langle\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t}),\Theta(\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t}))\rangle\geq-\frac{Nd_{4}}{\underline{\epsilon}(t+1)}$ . Then, from (10), we have

[TABLE]

On the other hand, for $p\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ , we have

[TABLE]

Therefore, we get the left-hand inequality in statement (ii):

[TABLE]

where $\delta_{1}=\frac{\mu Nd_{3}+\left(\mu-\|p^{*}\|\right)Nd_{4}}{\left(\mu-\|p^{*}\|\right)\underline{\epsilon}(t+1)}+\frac{\gamma N^{2}(d_{3})^{2}}{2\left(\mu-\|p^{*}\|\right)^{2}\underline{\epsilon}^{2}(t+1)^{2}}$ . From (53) and ((ii)), we also have that

[TABLE]

which implies that

[TABLE]

Then, for $u\in\mathbf{U}\cap\mathfrak{B}_{{\nu}^{+}}$ , we have

[TABLE]

where $\delta_{2}=\delta_{1}+\frac{Nd_{4}}{\underline{\epsilon}(t+1)}$ . This gives the right-hand inequality in statement (ii).

$\Box$

Theorem 5

**(Dual suboptimality)** Let the assumptions of Theorem 4 hold. Then we have that

[TABLE]

Proof. For saddle point $(u^{*},p^{*})$ of $L$ (or $L_{\gamma}$ ) on $\mathbf{U}\times\mathbf{R}^{m}$ , we have

[TABLE]

Substituting $u=\mathbb{E}_{\mathcal{F}_{t}}(\bar{u}_{t})$ , $p=\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})$ in (58), and taking $u=\hat{u}\big{(}\mathbb{E}_{\mathcal{F}_{t}}(\bar{p}_{t})\big{)}$ , $p=p^{*}$ in statement (ii) of Proposition 2, we obtain the following two inequalities:

[TABLE]

Combining the above two inequalities yields the desired inequality:

[TABLE]

or

[TABLE]

$\Box$

Next we provide a high-probability complexity bound for the constraint violation and the objective function value.

Remark 1

From Theorem 4, we immediately get the expected primal suboptimality for the average point $\bar{u}_{t}$

[TABLE]

Let $0<\varepsilon<|F(u^{0})-F(u^{*})|+\|\Pi(\Theta(u^{0}))\|$ and $\eta\in(0,1)$ be chosen arbitrarily. For all $t\geq T$ , we have the following high-probability complexity bound for obtaining an $\varepsilon$ -optimal solution

[TABLE]

where

[TABLE]

This result follows from the Markov inequality [3]. An equivalent formulation is: for any $t\geq T$ ,

[TABLE]
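The Markov-inequality step behind a bound of this kind can be sketched generically. Here $X_{t}\geq 0$ is a placeholder for a nonnegative error measure of $\bar{u}_{t}$ whose expectation is bounded by $C/(t+1)$, and $C$ is a placeholder for the constant in that expected bound; both symbols are assumptions introduced for illustration, and the threshold obtained below need not coincide with the $T$ of Remark 1.

```latex
% Assumption: X_t >= 0 and E[X_t] <= C/(t+1) for some constant C > 0.
\begin{align*}
\mathbb{P}\{X_{t}\geq\varepsilon\}
  &\leq \frac{\mathbb{E}[X_{t}]}{\varepsilon}
   \leq \frac{C}{\varepsilon(t+1)}
   \leq \eta
  \quad\text{whenever}\quad t+1\geq\frac{C}{\varepsilon\eta},\\
\text{hence}\quad
\mathbb{P}\{X_{t}<\varepsilon\}
  &\geq 1-\eta
  \quad\text{for all}\quad t\geq\frac{C}{\varepsilon\eta}-1 .
\end{align*}
```

The dependence on the confidence level is of order $1/\eta$, which is what a plain Markov argument gives for an $O(1/t)$ expected rate.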

Remark 2

Here we remark that, for problem (P) with $\Omega(\cdot)=0$ and $\Theta(\cdot)=\Phi(\cdot)$ , the SPDCL scheme can be modified as follows:

**Stochastic Primal-Dual Coordinate Method with Large step size (SPDCL)**

Initialize $u^{0}\in\mathbf{U}$ , $p^{0}\in\mathbf{R}^{m}$ , and $0<\epsilon<\frac{\beta}{B_{G}+\gamma\tau^{2}}$

**for** $k=0,1,\cdots$ , **do**

[TABLE]

**end for**

Clearly, the new scheme does not require an estimate of a bound on the dual optimal solution. Additionally, with the constant parameter $0<\epsilon<\frac{\beta}{B_{G}+\gamma\tau^{2}}$ , the results of Lemma 3 still hold. Therefore the convergence result (Theorem 3) and the convergence rate results (Theorems 4 and 5) of SPDCL still hold.
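To illustrate the kind of iteration described in Remark 2, the following is a minimal sketch of a generic randomized block primal-dual loop on a toy problem, $\min\frac{1}{2}\|u-a\|^{2}$ subject to $\sum_{i}u_{i}-b\leq 0$, with cone $\mathbf{C}=\mathbf{R}_{+}$. It is not the authors' exact scheme. The assumptions are: a quadratic core function, $J=0$, one scalar block per coordinate, the multiplier estimate $q^{k}=\Pi(p^{k}+\gamma\Theta(u^{k}))$, and a projected dual step with $\rho=\gamma/(2N-1)$. The function name `spdc_toy` and all parameter values are hypothetical.

```python
import random


def spdc_toy(a, b, gamma=0.5, eps=0.25, iters=3000, seed=0):
    """Randomized block primal-dual sketch (illustrative assumption, not the
    paper's exact update) for
        min 0.5*||u - a||^2   s.t.   sum(u) - b <= 0.
    Here G(u) = 0.5*||u - a||^2, Theta(u) = sum(u) - b, and the projection
    onto the dual cone C* = R_+ is max(0, .)."""
    rng = random.Random(seed)
    n = len(a)
    rho = gamma / (2 * n - 1)  # dual step size, mirroring rho = gamma/(2N-1)
    u = [0.0] * n
    p = 0.0
    for _ in range(iters):
        theta = sum(u) - b
        # Multiplier estimate q^k = Pi(p^k + gamma*Theta(u^k)).
        q = max(0.0, p + gamma * theta)
        # Choose one block uniformly at random.
        i = rng.randrange(n)
        # Linearized primal step on block i with a quadratic core function:
        # gradient of G in coordinate i plus q times dTheta/du_i (= 1).
        grad_i = u[i] - a[i]
        u[i] -= eps * (grad_i + q)
        # Projected dual step using the updated primal point.
        p = max(0.0, p + rho * (sum(u) - b))
    return u, p
```

With `a = [1.0, 1.0, 1.0, 1.0]` and `b = 2.0`, the saddle point is $u_{i}^{*}=0.5$ and $p^{*}=0.5$. For this toy problem $\beta=1$, $B_{G}=1$ and $\tau^{2}=N=4$, so the default `eps=0.25` satisfies $\epsilon<\beta/(B_{G}+\gamma\tau^{2})=1/3$.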

Appendix

*Proof of Lemma 3:*

(i) First, for all $u\in\mathbf{U}$ , the unique solution $u^{k+1}$ of the primal problem (24) is characterized by the following variational inequality:

[TABLE]

which implies that

[TABLE]

Observing that

$\langle\nabla_{i(k)}G(u^{k}),(u^{k}-u^{k+1})_{i(k)}\rangle=\langle\nabla G(u^{k}),u^{k}-u^{k+1}\rangle$ ,

$J_{i(k)}(u_{i(k)}^{k})-J_{i(k)}(u_{i(k)}^{k+1})=J(u^{k})-J(u^{k+1})$ ,

$\langle q^{k},\nabla_{i(k)}\Omega(u^{k})(u^{k}-u^{k+1})_{i(k)}\rangle=\langle q^{k},\nabla\Omega(u^{k})(u^{k}-u^{k+1})\rangle$

and $\langle q^{k},\Phi_{i(k)}(u_{i(k)}^{k})-\Phi_{i(k)}(u_{i(k)}^{k+1})\rangle=\langle q^{k},\Phi(u^{k})-\Phi(u^{k+1})\rangle$ , from (Appendix), we have that

[TABLE]

By statement (ii) and (iii) of Lemma 2, we have that

[TABLE]

Simple algebra together with Assumption 2 gives

[TABLE]

Combining (67) and (68), we obtain that

[TABLE]

Taking the expectation with respect to $i(k)$ on both sides of (69), together with the conditional expectation (IV)-(29), we get

[TABLE]

It follows that

[TABLE]

By Theorem 2, $\nabla_{\theta}\varphi(\Theta(u),p)=\Pi(p+\gamma\Theta(u))$ . It follows that

[TABLE]

Together with statement (ii) of Lemma 2, we have that

[TABLE]

Combining (71) and (Appendix), we have that

[TABLE]

By the concavity of $\varphi\big{(}\Theta(u),p\big{)}$ in $p$ and statement (ii) of Theorem 2, the third term of (Appendix) satisfies

[TABLE]

Combining (Appendix) and inequality (75), we have that

[TABLE]

Multiplying both sides of (Appendix) by $\epsilon^{k}$ and using the definition of $\Lambda(u,p,u^{k},p^{k})$ , we obtain statement (i).

(ii) To prove statement (ii), we first derive two inequalities. By property (12) of the projection with $y=p\in\mathbf{C}^{*}$ and $x=p^{k}+\gamma\Theta(u^{k+1})$ , we have

[TABLE]

Using Proposition 1 with $x=\gamma\Theta(u^{k+1})$ , $y=\gamma\Theta(u^{k})$ and $z=p^{k}$ , we have

[TABLE]

For all $p\in\mathbf{C}^{*}$ , from (77), it follows:

[TABLE]

Combining (79) and (Appendix), we have

[TABLE]

Since $p^{k},p^{k+1}\in\mathfrak{B}_{\mu}$ , we have that: $\forall p\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ ,

[TABLE]

Combining (79) and (Appendix), we have that: $\forall p\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ ,

[TABLE]

Multiplying both sides of the above inequality by $\frac{\epsilon^{k}}{N}$ and using $\rho=\frac{\gamma}{2N-1}$ , we obtain that: $\forall p\in\mathbf{C}^{*}\cap\mathfrak{B}_{\mu}$ ,

[TABLE]

Statement (ii) follows by taking the expectation with respect to $i(k)$ on both sides of inequality (83).

(iii) Summing the two inequalities in statement (i) and statement (ii), we have that

[TABLE]

Since the SPDCL scheme guarantees that

[TABLE]

then we have the statement (iii).

(iv) From (70), we have that

[TABLE]

From (82), we have that

[TABLE]

Since

$\|p-p^{k}\|^{2}-\|p-p^{k+1}\|^{2}=\langle 2p-p^{k+1}-p^{k},p^{k+1}-p^{k}\rangle\leq\|2p-p^{k+1}-p^{k}\|\cdot\|p^{k+1}-p^{k}\|$

and $\langle p-q^{k},\Theta(u^{k})-\Theta(u^{k+1})\rangle\leq\tau\|p-q^{k}\|\cdot\|u^{k}-u^{k+1}\|$ , we have

[TABLE]

Taking the expectation with respect to $i(k)$ on both sides of (86) and summing with (84), we obtain that

[TABLE]

where $h_{1}(\epsilon^{k},u,p,u^{k},u^{k+1},q^{k})=\frac{B}{\epsilon^{k}}\|u-u^{k+1}\|+[\|\nabla G(u^{k})\|+c_{1}\|u^{k}\|+c_{2}+\tau\|q^{k}\|]+\frac{\tau}{N}\|p-q^{k}\|$ and $h_{2}(p,p^{k},p^{k+1})=\frac{1}{2N\rho}\|2p-p^{k+1}-p^{k}\|$ . $\Box$

Acknowledgment

The authors would like to thank…

Bibliography


[1] Beck, A., & Teboulle, M. (2003). Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3), 167-175.
[2] Bertsekas, D. P. (1999). Nonlinear Programming. Athena Scientific, Belmont, Massachusetts.
[3] Bertsekas, D. P., & Tsitsiklis, J. N. (2002). Introduction to Probability (Vol. 1). Belmont, MA: Athena Scientific.
[4] Buys, J. D. (1972). Dual algorithms for constrained optimization problems. Brondder-Offset NV-Rotterdam.
[5] Chen, S. S., Donoho, D. L., & Saunders, M. A. (2001). Atomic decomposition by basis pursuit. SIAM Review, 43(1), 129-159.
[6] Cheney, W., & Goldstein, A. A. (1959). Proximity maps for convex sets. Proceedings of the American Mathematical Society, 10(3), 448-450.
[7] Cohen, G. (1980). Auxiliary problem principle and decomposition of optimization problems. Journal of Optimization Theory and Applications, 32(3), 277-305.
[8] Cohen, G., & Zhu, D. L. (1984). Decomposition coordination methods in large scale optimization problems. The nondifferentiable case and the use of augmented Lagrangians. Advances in Large Scale Systems, 1, 203-266.
