Convergence Revisit on Generalized Symmetric ADMM

Jianchao Bai; Xiaokai Chang; Jicheng Li; Fengmin Xu

arXiv:1906.07888 · math.NA · June 20, 2019

TL;DR

This paper revisits the convergence properties of the generalized symmetric ADMM algorithm, establishing sublinear and linear convergence rates under specific conditions, thereby enhancing understanding of its theoretical performance.

Contribution

It provides new convergence rate results for the generalized symmetric ADMM, including sublinear and linear rates under particular assumptions and parameter settings.

Findings

1. Sublinear nonergodic convergence rate established.
2. Linear convergence under piecewise linear sub-differentials and polyhedral constraints.
3. Convergence results are established for dual stepsize parameters within a specific isosceles triangle region.

Abstract

In this note, we show a sublinear nonergodic convergence rate for the algorithm developed in [Bai, et al. Generalized symmetric ADMM for separable convex optimization. Comput. Optim. Appl. 70, 129-170 (2018)], as well as its linear convergence under the assumptions that the sub-differential of each component objective function is piecewise linear and all the constraint sets are polyhedra. These remaining convergence results are established for stepsize parameters of the dual variables belonging to a special isosceles triangle region, and they aim to strengthen our understanding of the convergence of the generalized symmetric ADMM.

Equations

\begin{array}[]{lll}\min&\sum\limits_{i=1}^{p}f_{i}(x_{i})+\sum\limits_{j=1}^{q}g_{j}(y_{j})\\ \textrm{s.t. }&\sum\limits_{i=1}^{p}A_{i}x_{i}+\sum\limits_{j=1}^{q}B_{j}y_{j}=c,\\ &x_{i}\in\mathcal{X}_{i},\;i=1,\cdots,p,\\ &y_{j}\in\mathcal{Y}_{j},\;j=1,\cdots,q,\\ \end{array}

\mathcal{L}_{\beta}({\bf{x}},{\bf{y}},\lambda)=\mathcal{L}({\bf{x}},{\bf{y}},\lambda)+\frac{\beta}{2}\left\|\mathcal{A}{\bf{x}}+\mathcal{B}{\bf{y}}-c\right\|^{2},

\mathcal{L}({\bf{x}},{\bf{y}},\lambda)=\sum\limits_{i=1}^{p}f_{i}(x_{i})+\sum\limits_{j=1}^{q}g_{j}(y_{j})-\left\langle\lambda,\mathcal{A}{\bf{x}}+\mathcal{B}{\bf{y}}-c\right\rangle

\left\{\begin{array}[]{lll}\textrm{For}\ i=1,2,\cdots,p,\\ \quad x_{i}^{k+1}=\arg\min\limits_{x_{i}\in\mathcal{X}_{i}}\mathcal{L}_{\beta}(x_{1}^{k},\cdots,x_{i},\cdots,x_{p}^{k},{\bf{y}}^{k},\lambda^{k})+P_{i}^{k}(x_{i}),\\ \quad\textrm{where }P_{i}^{k}(x_{i})=\frac{\sigma_{1}\beta}{2}\left\|A_{i}(x_{i}-x_{i}^{k})\right\|^{2},\\ \lambda^{k+\frac{1}{2}}=\lambda^{k}-\tau\beta(\mathcal{A}{\bf{x}}^{k+1}+\mathcal{B}{\bf{y}}^{k}-c),\\ \\ \textrm{For}\ j=1,2,\cdots,q,\\ \quad y_{j}^{k+1}=\arg\min\limits_{y_{j}\in\mathcal{Y}_{j}}\mathcal{L}_{\beta}({\bf{x}}^{k+1},y_{1}^{k},\cdots,y_{j},\cdots,y_{q}^{k},\lambda^{k+\frac{1}{2}})+Q_{j}^{k}(y_{j}),\\ \quad\textrm{where }Q_{j}^{k}(y_{j})=\frac{\sigma_{2}\beta}{2}\left\|B_{j}(y_{j}-y_{j}^{k})\right\|^{2},\\ \lambda^{k+1}=\lambda^{k+\frac{1}{2}}-s\beta(\mathcal{A}{\bf{x}}^{k+1}+\mathcal{B}{\bf{y}}^{k+1}-c),\end{array}\right.

(\tau,s)\in\mathcal{G}=\left\{(\tau,s)\mid\tau+s>0,\ -\tau^{2}-s^{2}-\tau s+\tau+s+1>0\right\},

(\tau,s)\in\mathcal{D}:=\left\{(\tau,s)\mid\tau<1,\ s<1,\ \tau+s>0\right\}.

P_{1}^{k}(x_{1})=\frac{1}{2}\left\|x_{1}-x_{1}^{k}\right\|_{\mathcal{T}_{1}}^{2},\quad Q_{1}^{k}(y_{1})=\frac{1}{2}\left\|y_{1}-y_{1}^{k}\right\|_{\mathcal{T}_{2}}^{2},

\mathcal{M}=\mathcal{X}\times\mathcal{Y}\times\mathbb{R}^{n}

{\bf{u}}=\left(\begin{array}[]{c}{\bf{x}}\\ {\bf{y}}\\ \end{array}\right),\ {\bf{w}}=\left(\begin{array}[]{c}{\bf{x}}\\ {\bf{y}}\\ \lambda\end{array}\right),\ \mathcal{J}({\bf{w}})=\left(\begin{array}[]{c}-{\cal{A}}^{\sf T}\lambda\\ -{\cal{B}}^{\sf T}\lambda\\ {\cal{A}}{\bf{x}}+{\cal{B}}{\bf{y}}-c\end{array}\right),

\widetilde{{\bf{x}}}^{k}=\left(\begin{array}[]{c}\widetilde{x}_{1}^{k}\\ \widetilde{x}_{2}^{k}\\ \vdots\\ \widetilde{x}_{p}^{k}\end{array}\right)=\left(\begin{array}[]{c}x_{1}^{k+1}\\ x_{2}^{k+1}\\ \vdots\\ x_{p}^{k+1}\end{array}\right),\quad\widetilde{{\bf{y}}}^{k}=\left(\begin{array}[]{c}\widetilde{y}_{1}^{k}\\ \widetilde{y}_{2}^{k}\\ \vdots\\ \widetilde{y}_{q}^{k}\end{array}\right)=\left(\begin{array}[]{c}y_{1}^{k+1}\\ y_{2}^{k+1}\\ \vdots\\ y_{q}^{k+1}\end{array}\right),

\widetilde{{\bf{u}}}^{k}=\left(\begin{array}[]{c}\widetilde{{\bf{x}}}^{k}\\ \widetilde{{\bf{y}}}^{k}\\ \end{array}\right),\quad\widetilde{{\bf{w}}}^{k}=\left(\begin{array}[]{c}\widetilde{{\bf{x}}}^{k}\\ \widetilde{{\bf{y}}}^{k}\\ \widetilde{\lambda}^{k}\end{array}\right)=\left(\begin{array}[]{c}{\bf{x}}^{k+1}\\ {\bf{y}}^{k+1}\\ \widetilde{\lambda}^{k}\end{array}\right),

\widetilde{\lambda}^{k}=\lambda^{k}-\beta(\mathcal{A}{\bf{x}}^{k+1}+\mathcal{B}{\bf{y}}^{k}-c).

h({\bf{u}})-h(\widetilde{{\bf{u}}}^{k})+\left\langle{\bf{w}}-\widetilde{{\bf{w}}}^{k},\mathcal{J}(\widetilde{{\bf{w}}}^{k})+Q(\widetilde{{\bf{w}}}^{k}-{\bf{w}}^{k})\right\rangle\geq 0,\quad\forall{\bf{w}}\in\mathcal{M},

Q=\left[\begin{array}[]{cc}H_{\mathbf{x}}&\bf{0}\\ \bf{0}&\widetilde{Q}\end{array}\right]

H_{\mathbf{x}}=\beta\left[\begin{array}[]{ccccccc}\sigma_{1}A_{1}^{\sf T}A_{1}&&-A_{1}^{\sf T}A_{2}&&\cdots&&-A_{1}^{\sf T}A_{p}\\ -A_{2}^{\sf T}A_{1}&&\sigma_{1}A_{2}^{\sf T}A_{2}&&\cdots&&-A_{2}^{\sf T}A_{p}\\ \vdots&&\vdots&&\ddots&&\vdots\\ -A_{p}^{\sf T}A_{1}&&-A_{p}^{\sf T}A_{2}&&\cdots&&\sigma_{1}A_{p}^{\sf T}A_{p}\end{array}\right],

\widetilde{Q}=\left[\begin{array}[]{ccccccc|c}(\sigma_{2}+1)\beta B_{1}^{\sf T}B_{1}&&{\bf 0}&&\cdots&&{\bf 0}&-\tau B_{1}^{\sf T}\\ {\bf 0}&&(\sigma_{2}+1)\beta B_{2}^{\sf T}B_{2}&&\cdots&&{\bf 0}&-\tau B_{2}^{\sf T}\\ \vdots&&\vdots&&\ddots&&\vdots&\vdots\\ {\bf 0}&&{\bf 0}&&\cdots&&(\sigma_{2}+1)\beta B_{q}^{\sf T}B_{q}&-\tau B_{q}^{\sf T}\\ \hline\cr-B_{1}&&-B_{2}&&\cdots&&-B_{q}&\frac{1}{\beta}I\end{array}\right].

{\bf{w}}^{k+1}={\bf{w}}^{k}-M({\bf{w}}^{k}-\widetilde{{\bf{w}}}^{k}),

M=\left[\begin{array}[]{c|cccc}I&&&&\\ \hline\cr&I&&&\\ &&\ddots&&\\ &&&I&\\ &-s\beta B_{1}&\cdots&-s\beta B_{q}&(\tau+s)I\\ \end{array}\right].

G=Q+Q^{\sf T}-M^{\sf T}Q,

G=\left[\begin{array}[]{cc}H_{\mathbf{x}}&\bf{0}\\ \bf{0}&\widetilde{G}\end{array}\right],

\widetilde{G}=\left[\begin{array}[]{ccccc|c}(\sigma_{2}+1-s)\beta B_{1}^{\sf T}B_{1}&&\cdots&&-s\beta B_{1}^{\sf T}B_{q}&(s-1)B_{1}^{\sf T}\\ \vdots&&\ddots&&\vdots&\vdots\\ -s\beta B_{q}^{\sf T}B_{1}&&\cdots&&(\sigma_{2}+1-s)\beta B_{q}^{\sf T}B_{q}&(s-1)B_{q}^{\sf T}\\ \hline\cr(s-1)B_{1}&&\cdots&&(s-1)B_{q}&\frac{2-\tau-s}{\beta}I\end{array}\right].

\tau<1,\quad s<1,\quad\textrm{and}\quad\tau+s<2.

\widetilde{G}=\widetilde{D}^{\sf T}\widetilde{G}_{0}\widetilde{D},

\displaystyle\widetilde{G}_{0}=\left[\begin{array}[]{ccccccc|c}(\sigma_{2}+1-s)I&&-sI&&\cdots&&-sI&(s-1)I\\ -sI&&(\sigma_{2}+1-s)I&&\cdots&&-sI&(s-1)I\\ \vdots&&\vdots&&\ddots&&\vdots&\vdots\\ -sI&&-sI&&\cdots&&(\sigma_{2}+1-s)I&(s-1)I\\ \hline\cr(s-1)I&&(s-1)I&&\cdots&&(s-1)I&(2-\tau-s)I\\ \end{array}\right]=P^{\sf T}\widetilde{G}_{0,0}P.

P=\left[\begin{array}[]{ccccccc|c}I&&&&&&&\\ &&I&&&&&\\ &&&&\ddots&&&\\ &&&&&&I&\\ \hline\cr\frac{1-s}{2-\tau-s}I&&\frac{1-s}{2-\tau-s}I&&\cdots&&\frac{1-s}{2-\tau-s}I&I\\ \end{array}\right]

\widetilde{G}_{0,0}

E=\left[\begin{array}[]{c}I\\ I\\ \vdots\\ I\end{array}\right]\quad\mbox{ and }\quad H_{\mathbf{y},0}=\left[\begin{array}[]{cccc}\sigma_{2}I&-I&\cdots&-I\\ -I&\sigma_{2}I&\cdots&-I\\ \vdots&\vdots&\ddots&\vdots\\ -I&-I&\cdots&\sigma_{2}I\end{array}\right].

H_{\mathbf{y},0}+\left(1-s-\frac{(1-s)^{2}}{2-\tau-s}\right)EE^{\sf T}

1-s-\frac{(1-s)^{2}}{2-\tau-s}=\frac{(1-s)(1-\tau)}{2-\tau-s}>0,

\left\|{\bf{w}}^{k+1}-{\bf{w}}^{*}\right\|_{H}^{2}\leq\left\|{\bf{w}}^{k}-{\bf{w}}^{*}\right\|_{H}^{2}-\left\|{\bf{w}}^{k}-\widetilde{{\bf{w}}}^{k}\right\|_{G}^{2},\quad\forall{\bf{w}}^{*}\in\mathcal{M}^{*},

Full text

The work was supported by the National Natural Science Foundation of China (Nos. 11671318; 11571271; 11631013) and the Natural Science Foundation of Fujian Province (No. 2016J01028). The second author, Xiaokai Chang, was supported by the Hongliu Foundation of First-class Disciplines of Lanzhou University of Technology.

Jianchao Bai: Department of Applied Mathematics, Northwestern Polytechnical University, Xi’an 710129, China. Past address: School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China.

Xiaokai Chang: College of Science, Lanzhou University of Technology, Lanzhou 730050, China.

Jicheng Li: School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China.

Fengmin Xu: School of Economics and Finance, Xi’an Jiaotong University, Xi’an 710049, China.

Keywords: Convex optimization; Alternating direction method of multipliers; Symmetric parameter domain; Convergence rate

Mathematics Subject Classification (2010): 65K10; 68W40; 90C25

1 Introduction

We revisit the following prototype multi-block separable convex optimization problem:

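$$\begin{array}[]{lll}\min&\sum\limits_{i=1}^{p}f_{i}(x_{i})+\sum\limits_{j=1}^{q}g_{j}(y_{j})\\ \textrm{s.t. }&\sum\limits_{i=1}^{p}A_{i}x_{i}+\sum\limits_{j=1}^{q}B_{j}y_{j}=c,\\ &x_{i}\in\mathcal{X}_{i},\;i=1,\cdots,p,\\ &y_{j}\in\mathcal{Y}_{j},\;j=1,\cdots,q,\\ \end{array}$$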

where $f_{i}(x_{i}):\mathbb{R}^{m_{i}}\rightarrow{\mathbb{R}},g_{j}(y_{j}):\mathbb{R}^{d_{j}}\rightarrow\mathbb{R}$ are closed proper convex functions (possibly nonsmooth); $A_{i}\in\mathbb{R}^{n\times m_{i}},B_{j}\in\mathbb{R}^{n\times d_{j}}$ and $c\in\mathbb{R}^{n}$ are given matrices and a given vector, respectively; $\mathcal{X}_{i}\subset\mathbb{R}^{m_{i}}$ and $\mathcal{Y}_{j}\subset\mathbb{R}^{d_{j}}$ are polyhedra; and $p\geq 1$ and $q\geq 1$ are integers. Throughout, we assume that the solution set of problem (1) is nonempty and that all the matrices $A_{i}\,(i=1,\cdots,p)$ and $B_{j}\,(j=1,\cdots,q)$ have full column rank.

By denoting $\mathcal{A}=\left[A_{1},\cdots,A_{p}\right],\mathcal{B}=\left[B_{1},\cdots,B_{q}\right],{\bf{x}}=(x_{1},\cdots,x_{p})$ and ${\bf{y}}=(y_{1},\cdots,y_{q})$ , the augmented Lagrangian function of the problem (1) is written as

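$$\mathcal{L}_{\beta}({\bf{x}},{\bf{y}},\lambda)=\mathcal{L}({\bf{x}},{\bf{y}},\lambda)+\frac{\beta}{2}\left\|\mathcal{A}{\bf{x}}+\mathcal{B}{\bf{y}}-c\right\|^{2},$$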

where $\beta>0$ is a penalty parameter and

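$$\mathcal{L}({\bf{x}},{\bf{y}},\lambda)=\sum\limits_{i=1}^{p}f_{i}(x_{i})+\sum\limits_{j=1}^{q}g_{j}(y_{j})-\left\langle\lambda,\mathcal{A}{\bf{x}}+\mathcal{B}{\bf{y}}-c\right\rangle$$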

denotes the Lagrangian function associated with a Lagrange multiplier $\lambda\in\mathbb{R}^{n}$ . As studied in our recent work [2], the Generalized Symmetric Alternating Direction Method of Multipliers (GS-ADMM) reads the following updates:

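$$\left\{\begin{array}[]{lll}\textrm{For}\ i=1,2,\cdots,p,\\ \quad x_{i}^{k+1}=\arg\min\limits_{x_{i}\in\mathcal{X}_{i}}\mathcal{L}_{\beta}(x_{1}^{k},\cdots,x_{i},\cdots,x_{p}^{k},{\bf{y}}^{k},\lambda^{k})+P_{i}^{k}(x_{i}),\\ \quad\textrm{where }P_{i}^{k}(x_{i})=\frac{\sigma_{1}\beta}{2}\left\|A_{i}(x_{i}-x_{i}^{k})\right\|^{2},\\ \lambda^{k+\frac{1}{2}}=\lambda^{k}-\tau\beta(\mathcal{A}{\bf{x}}^{k+1}+\mathcal{B}{\bf{y}}^{k}-c),\\ \\ \textrm{For}\ j=1,2,\cdots,q,\\ \quad y_{j}^{k+1}=\arg\min\limits_{y_{j}\in\mathcal{Y}_{j}}\mathcal{L}_{\beta}({\bf{x}}^{k+1},y_{1}^{k},\cdots,y_{j},\cdots,y_{q}^{k},\lambda^{k+\frac{1}{2}})+Q_{j}^{k}(y_{j}),\\ \quad\textrm{where }Q_{j}^{k}(y_{j})=\frac{\sigma_{2}\beta}{2}\left\|B_{j}(y_{j}-y_{j}^{k})\right\|^{2},\\ \lambda^{k+1}=\lambda^{k+\frac{1}{2}}-s\beta(\mathcal{A}{\bf{x}}^{k+1}+\mathcal{B}{\bf{y}}^{k+1}-c),\end{array}\right.$$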

where $\tau$ and $s$ are stepsize parameters satisfying

$$(\tau,s)\in\mathcal{G}=\left\{(\tau,s)\mid\tau+s>0,\ -\tau^{2}-s^{2}-\tau s+\tau+s+1>0\right\},$$

and $\sigma_{1}\in(p-1,+\infty),\sigma_{2}\in(q-1,+\infty)$ are proximal parameters for the regularization terms $P_{i}^{k}(\cdot)$ and $Q_{j}^{k}(\cdot)$ , respectively.
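To make the update order concrete, here is a minimal NumPy sketch of one possible GS-ADMM implementation. It assumes a hypothetical quadratic test model, $f_i(x_i)=\frac{1}{2}\|x_i-a_i\|^2$ and $g_j(y_j)=\frac{1}{2}\|y_j-b_j\|^2$ with $\mathcal{X}_i$ and $\mathcal{Y}_j$ equal to the whole spaces, so that every subproblem reduces to a small linear solve. The function name `gs_admm`, the data `a_blocks`, `b_blocks` and the default parameters are illustrative choices and are not taken from [2].

```python
import numpy as np


def gs_admm(A_blocks, B_blocks, c, a_blocks, b_blocks,
            beta=1.0, tau=0.5, s=0.5, sigma1=None, sigma2=None, iters=3000):
    """GS-ADMM sketch for the hypothetical model f_i = 0.5||x_i - a_i||^2,
    g_j = 0.5||y_j - b_j||^2, with unconstrained blocks."""
    p, q = len(A_blocks), len(B_blocks)
    sigma1 = float(p) if sigma1 is None else sigma1  # any value > p - 1
    sigma2 = float(q) if sigma2 is None else sigma2  # any value > q - 1
    x = [np.zeros(Ai.shape[1]) for Ai in A_blocks]
    y = [np.zeros(Bj.shape[1]) for Bj in B_blocks]
    lam = np.zeros(c.shape[0])
    for _ in range(iters):
        Ax = sum(Ai @ xi for Ai, xi in zip(A_blocks, x))
        By = sum(Bj @ yj for Bj, yj in zip(B_blocks, y))
        # Jacobi-type x-updates: each block sees x^k of the other blocks.
        x_new = []
        for Ai, xi, ai in zip(A_blocks, x, a_blocks):
            r = Ax - Ai @ xi + By - c  # sum_{l != i} A_l x_l^k + B y^k - c
            K = np.eye(Ai.shape[1]) + (1.0 + sigma1) * beta * (Ai.T @ Ai)
            rhs = ai + Ai.T @ lam - beta * (Ai.T @ r) + sigma1 * beta * (Ai.T @ (Ai @ xi))
            x_new.append(np.linalg.solve(K, rhs))
        x = x_new
        Ax = sum(Ai @ xi for Ai, xi in zip(A_blocks, x))
        # Intermediate dual update with stepsize tau.
        lam_half = lam - tau * beta * (Ax + By - c)
        # Jacobi-type y-updates using x^{k+1} and lambda^{k+1/2}.
        y_new = []
        for Bj, yj, bj in zip(B_blocks, y, b_blocks):
            r = Ax + By - Bj @ yj - c  # A x^{k+1} + sum_{l != j} B_l y_l^k - c
            K = np.eye(Bj.shape[1]) + (1.0 + sigma2) * beta * (Bj.T @ Bj)
            rhs = bj + Bj.T @ lam_half - beta * (Bj.T @ r) + sigma2 * beta * (Bj.T @ (Bj @ yj))
            y_new.append(np.linalg.solve(K, rhs))
        y = y_new
        By = sum(Bj @ yj for Bj, yj in zip(B_blocks, y))
        # Final dual update with stepsize s.
        lam = lam_half - s * beta * (Ax + By - c)
    return x, y, lam
```

For this model the exact solution satisfies $x=a+\mathcal{A}^{\sf T}\lambda$, $y=b+\mathcal{B}^{\sf T}\lambda$ and $(\mathcal{A}\mathcal{A}^{\sf T}+\mathcal{B}\mathcal{B}^{\sf T})\lambda=c-\mathcal{A}a-\mathcal{B}b$ (with $a,b$ the stacked data), which makes a convenient correctness check. The default $(\tau,s)=(0.5,0.5)$ lies in $\mathcal{G}$, and the defaults $\sigma_1=p$, $\sigma_2=q$ satisfy $\sigma_1>p-1$, $\sigma_2>q-1$.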

In [2], by making use of a prediction-correction interpretation of GS-ADMM, we analyzed its global convergence, its sublinear convergence rate in the ergodic sense, and the convergence complexity of two special cases in which either $\sigma_{1}$ or $\sigma_{2}$ is allowed to be zero. However, as raised by previous reviewers, two questions remained open: (1) how to establish its worst-case $\mathcal{O}(1/t)$ convergence rate in the nonergodic sense, where $t$ denotes the iteration number; and (2) whether GS-ADMM converges linearly under some mild assumptions. This note answers both questions positively for the following subregion of $\mathcal{G}$ (shown in the right-hand panel of Fig. 1):

$$(\tau,s)\in\mathcal{D}:=\left\{(\tau,s)\mid\tau<1,\ s<1,\ \tau+s>0\right\}.$$

Notice that this region is much wider than the one ($\tau=s\in(0,1)$) in [8, Algorithm 3]. Moreover, the analysis below shows that the symmetric ADMM (S-ADMM, [9]) for two-block separable convex optimization also enjoys the worst-case $\mathcal{O}(1/t)$ convergence rate in the nonergodic sense, as well as a global linear convergence rate under the assumptions of Section 2.2, for parameters belonging to $\mathcal{D}$.
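The inclusion $\mathcal{D}\subseteq\mathcal{G}$ can be checked by hand: the quadratic $-\tau^{2}-s^{2}-\tau s+\tau+s+1$ vanishes at the three vertices $(1,1)$, $(1,-1)$ and $(-1,1)$ of the isosceles right triangle $\mathcal{D}$ and is strictly concave, so it is positive at every point of $\mathcal{D}$. The Python sketch below only adds a sampling sanity check of this fact (it is not a proof); the helper names `in_D`, `in_G` and `check_D_subset_G` are hypothetical.

```python
import numpy as np


def in_D(tau, s):
    """Membership test for D = {(tau, s) : tau < 1, s < 1, tau + s > 0}."""
    return (tau < 1) & (s < 1) & (tau + s > 0)


def in_G(tau, s):
    """Membership test for G = {(tau, s) : tau + s > 0, -tau^2 - s^2 - tau*s + tau + s + 1 > 0}."""
    return (tau + s > 0) & (-tau**2 - s**2 - tau * s + tau + s + 1 > 0)


def check_D_subset_G(num=200_000, seed=0):
    """Sample the square [-1, 1]^2, which contains D, and test the D-points against G."""
    rng = np.random.default_rng(seed)
    tau, s = rng.uniform(-1.0, 1.0, size=(2, num))
    mask = in_D(tau, s)
    return bool(np.all(in_G(tau[mask], s[mask]))), int(mask.sum())


ok, count = check_D_subset_G()
print(ok, count)  # ok is True: every sampled point of D also lies in G
print(in_G(0.0, 1.5), in_D(0.0, 1.5))  # True False: a point of G outside D
```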

1.1 Relationship of GS-ADMM to related works

GS-ADMM was originally proposed to extend the well-known S-ADMM [9] to the grouped multi-block separable convex optimization problem (1), while still ensuring its convergence and iteration complexity for a larger domain of dual stepsizes than that in [9]. On the technical side, convergence of GS-ADMM was analyzed by directly estimating a lower bound of $\left\|{\bf{w}}^{k}-\widetilde{{\bf{w}}}^{k}\right\|_{G}^{2}$ and by treating the domain of stepsize parameters as a whole, whereas convergence of S-ADMM was shown separately by splitting the domain of $(\tau,s)$ into several subdomains; here $\widetilde{{\bf{w}}}^{k}$ and ${\bf{w}}^{k+1}$ are called the predicted variable and the corrected variable, respectively. Note that by taking $\sigma_{1}=\sigma_{2}=0$, GS-ADMM with $p=q=1$ reduces to S-ADMM but still converges in the larger domain ${\cal{G}}$. In addition, the original S-ADMM only works for the two-block case and may not be convenient for large-scale problems, while GS-ADMM can handle large-scale multi-block problems because the block variables within each group are updated in a Jacobi scheme.

Without the intermediate dual update $\lambda^{k+\frac{1}{2}}$ (i.e., $\tau=0$), GS-ADMM becomes a proximal ADMM-type algorithm with $s\in(0,\frac{1+\sqrt{5}}{2})$. Moreover, in the two-block case without proximal regularization terms, it reduces to the classical ADMM proposed by Glowinski and Marrocco [7]. To the best of our knowledge, the first proximal ADMM was proposed by Eckstein [3]; it corresponds to GS-ADMM with $p=q=1$, $(\tau,s)=(0,1)$ and the following proximal terms

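$$P_{1}^{k}(x_{1})=\frac{1}{2}\left\|x_{1}-x_{1}^{k}\right\|_{\mathcal{T}_{1}}^{2},\quad Q_{1}^{k}(y_{1})=\frac{1}{2}\left\|y_{1}-y_{1}^{k}\right\|_{\mathcal{T}_{2}}^{2},$$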

where $\mathcal{T}_{i}=\frac{\mu_{i}^{2}}{\beta}I$ for any nonzero scalars $\mu_{i},i=1,2.$ Later, the convergence analysis of the classical ADMM was extended to GS-ADMM with $p=q=1$ and $\tau=0$ while allowing the stepsize $s$ to lie in $(0,\frac{1+\sqrt{5}}{2})$; see Xu and Wu [14] and Fazel et al. [5] for more details. Recently, He, Xu and Yuan [10] constructed a proximal ADMM for solving problem (1) with only the $p$ block variables $x_i$, and their algorithm can be regarded as a special case of GS-ADMM with $(\tau,s)=(0,1)$ without the $y_{j}$-updates. In particular, the partially proximal ADMM-type algorithm [12], whose regularization term $Q_{j}^{k}(y_{j})$ has the same form as ours, can be treated as GS-ADMM with $p=1,\sigma_{1}=0$ and $\tau=1.$ When the middle update $\lambda^{k+\frac{1}{2}}$ is retained (i.e., $\tau\neq 0$), the convergence domain of the dual stepsizes of GS-ADMM is still larger than that of the symmetric ADMM with indefinite proximal regularization [6, 13].

1.2 Notation and organization

Throughout the note, the symbols $\mathbb{R},\mathbb{R}^{n},\mathbb{R}^{m\times n}$ denote the sets of real numbers, $n$-dimensional real column vectors and $m\times n$ real matrices, respectively. For any $x,y\in\mathbb{R}^{n}$, $\langle x,y\rangle=x^{\sf T}y$ denotes their inner product and $\|x\|=\sqrt{\langle x,x\rangle}$ the Euclidean norm of $x$, where the superscript ${\sf T}$ denotes transposition. For any symmetric matrix $G$, we define $\|x\|_{G}^{2}=x^{\sf T}Gx$, which is not necessarily nonnegative unless $G$ is positive semidefinite. The symbols $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ denote the maximum and minimum eigenvalues of a square matrix, respectively. The notations $I$ and $\bf{0}$ stand for the identity matrix and the zero matrix of proper dimensions, respectively. We call $\phi(x)$ a piecewise linear multifunction if its graph $\{(x,y)|\ y\in\phi(x)\}$ is a union of finitely many polyhedra. For convenience, let

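$$\mathcal{M}=\mathcal{X}\times\mathcal{Y}\times\mathbb{R}^{n},\quad{\bf{u}}=\left(\begin{array}[]{c}{\bf{x}}\\ {\bf{y}}\\ \end{array}\right),\ {\bf{w}}=\left(\begin{array}[]{c}{\bf{x}}\\ {\bf{y}}\\ \lambda\end{array}\right),\ \mathcal{J}({\bf{w}})=\left(\begin{array}[]{c}-{\cal{A}}^{\sf T}\lambda\\ -{\cal{B}}^{\sf T}\lambda\\ {\cal{A}}{\bf{x}}+{\cal{B}}{\bf{y}}-c\end{array}\right),$$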

and let the corresponding solution set be $\mathcal{M}^{*},$ where ${\cal{X}}={\cal{X}}_{1}\times{\cal{X}}_{2}\times\cdots\times{\cal{X}}_{p}$ and ${\cal{Y}}={\cal{Y}}_{1}\times{\cal{Y}}_{2}\times\cdots\times{\cal{Y}}_{q}$. We also define

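$$\widetilde{{\bf{x}}}^{k}=\left(\begin{array}[]{c}\widetilde{x}_{1}^{k}\\ \widetilde{x}_{2}^{k}\\ \vdots\\ \widetilde{x}_{p}^{k}\end{array}\right)=\left(\begin{array}[]{c}x_{1}^{k+1}\\ x_{2}^{k+1}\\ \vdots\\ x_{p}^{k+1}\end{array}\right),\quad\widetilde{{\bf{y}}}^{k}=\left(\begin{array}[]{c}\widetilde{y}_{1}^{k}\\ \widetilde{y}_{2}^{k}\\ \vdots\\ \widetilde{y}_{q}^{k}\end{array}\right)=\left(\begin{array}[]{c}y_{1}^{k+1}\\ y_{2}^{k+1}\\ \vdots\\ y_{q}^{k+1}\end{array}\right),$$

$$\widetilde{{\bf{u}}}^{k}=\left(\begin{array}[]{c}\widetilde{{\bf{x}}}^{k}\\ \widetilde{{\bf{y}}}^{k}\\ \end{array}\right),\quad\widetilde{{\bf{w}}}^{k}=\left(\begin{array}[]{c}\widetilde{{\bf{x}}}^{k}\\ \widetilde{{\bf{y}}}^{k}\\ \widetilde{\lambda}^{k}\end{array}\right)=\left(\begin{array}[]{c}{\bf{x}}^{k+1}\\ {\bf{y}}^{k+1}\\ \widetilde{\lambda}^{k}\end{array}\right),$$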

and

$$\widetilde{\lambda}^{k}=\lambda^{k}-\beta(\mathcal{A}{\bf{x}}^{k+1}+\mathcal{B}{\bf{y}}^{k}-c).$$

The rest of this paper is organized as follows. In Section 2, by making use of some well-known identities, inequalities and matrix decomposition techniques, we first establish sublinear convergence rate of GS-ADMM in the nonergodic sense. Then, its global linear convergence rate, measured by an error function $\textrm{dist}^{2}_{H}({\bf{w}}^{k+1},\mathcal{M}^{*})$ or $\left\|{\bf{w}}^{k}-{\bf{w}}^{\infty}\right\|_{H}$ , is analyzed under mild assumptions. Finally, we briefly conclude the paper in Section 3.

2 Main results

In this section, we first establish the worst-case $\mathcal{O}(1/t)$ nonergodic convergence rate of GS-ADMM for any $(\tau,s)\in\mathcal{D}$. Then, using several well-known inequalities, we strengthen this to a linear rate under the assumption that the subdifferential of each component objective function is piecewise linear.

2.1 Sublinear nonergodic convergence rate

Let us recall the following two basic lemmas from [2], which interpret GS-ADMM as a prediction-correction procedure.

Lemma 2.1

For the iterates $\widetilde{{\bf{u}}}^{k},\widetilde{{\bf{w}}}^{k}$ defined in (5), we have $\widetilde{{\bf{w}}}^{k}\in\mathcal{M}$ and

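$$h({\bf{u}})-h(\widetilde{{\bf{u}}}^{k})+\left\langle{\bf{w}}-\widetilde{{\bf{w}}}^{k},\mathcal{J}(\widetilde{{\bf{w}}}^{k})+Q(\widetilde{{\bf{w}}}^{k}-{\bf{w}}^{k})\right\rangle\geq 0,\quad\forall{\bf{w}}\in\mathcal{M},$$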

where $h({\bf{u}})=\sum\limits_{i=1}^{p}f_{i}(x_{i})+\sum\limits_{j=1}^{q}g_{j}(y_{j})$ and

$$Q=\left[\begin{array}[]{cc}H_{\mathbf{x}}&\bf{0}\\ \bf{0}&\widetilde{Q}\end{array}\right]$$

with

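$$H_{\mathbf{x}}=\beta\left[\begin{array}[]{ccccccc}\sigma_{1}A_{1}^{\sf T}A_{1}&&-A_{1}^{\sf T}A_{2}&&\cdots&&-A_{1}^{\sf T}A_{p}\\ -A_{2}^{\sf T}A_{1}&&\sigma_{1}A_{2}^{\sf T}A_{2}&&\cdots&&-A_{2}^{\sf T}A_{p}\\ \vdots&&\vdots&&\ddots&&\vdots\\ -A_{p}^{\sf T}A_{1}&&-A_{p}^{\sf T}A_{2}&&\cdots&&\sigma_{1}A_{p}^{\sf T}A_{p}\end{array}\right],$$

$$\widetilde{Q}=\left[\begin{array}[]{ccccccc|c}(\sigma_{2}+1)\beta B_{1}^{\sf T}B_{1}&&{\bf 0}&&\cdots&&{\bf 0}&-\tau B_{1}^{\sf T}\\ {\bf 0}&&(\sigma_{2}+1)\beta B_{2}^{\sf T}B_{2}&&\cdots&&{\bf 0}&-\tau B_{2}^{\sf T}\\ \vdots&&\vdots&&\ddots&&\vdots&\vdots\\ {\bf 0}&&{\bf 0}&&\cdots&&(\sigma_{2}+1)\beta B_{q}^{\sf T}B_{q}&-\tau B_{q}^{\sf T}\\ \hline\cr-B_{1}&&-B_{2}&&\cdots&&-B_{q}&\frac{1}{\beta}I\end{array}\right].$$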

Lemma 2.2

For the sequences $\{{\bf{w}}^{k}\}$ and $\{\widetilde{{\bf{w}}}^{k}\}$ generated by GS-ADMM, the following equality holds

$${\bf{w}}^{k+1}={\bf{w}}^{k}-M({\bf{w}}^{k}-\widetilde{{\bf{w}}}^{k}),$$

where

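$$M=\left[\begin{array}[]{c|cccc}I&&&&\\ \hline\cr&I&&&\\ &&\ddots&&\\ &&&I&\\ &-s\beta B_{1}&\cdots&-s\beta B_{q}&(\tau+s)I\\ \end{array}\right].$$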
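Lemmas 2.1 and 2.2 can be sanity-checked numerically on a small instance. The sketch below assumes a hypothetical quadratic model $f({\bf x})=\frac{1}{2}\|{\bf x}-a\|^2$, $g({\bf y})=\frac{1}{2}\|{\bf y}-b\|^2$ with whole-space constraints, in which case the variational inequality (7) reduces to the equation $\nabla h(\widetilde{\bf u}^k)+\mathcal{J}(\widetilde{\bf w}^k)+Q(\widetilde{\bf w}^k-{\bf w}^k)=0$ (the $\lambda$-part of $\nabla h$ being zero). It performs one GS-ADMM step from a random point, builds $Q$ and $M$ from their block definitions, and returns the two residuals, which should be at round-off level. Problem sizes, parameter values and the helper names `block_diag` and `check_prediction_correction` are hypothetical.

```python
import numpy as np


def block_diag(blocks):
    """Place square matrices on the diagonal of a zero matrix (helper for this sketch)."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    k = 0
    for b in blocks:
        d = b.shape[0]
        out[k:k + d, k:k + d] = b
        k += d
    return out


def check_prediction_correction(seed=0, n=4, m=2, p=2, q=2, beta=1.3,
                                tau=0.4, s=0.3, sigma1=1.5, sigma2=1.5):
    """One GS-ADMM step on a random quadratic instance; returns the residuals
    of Lemma 2.1 (in equation form) and of Lemma 2.2."""
    rng = np.random.default_rng(seed)
    A = [rng.standard_normal((n, m)) for _ in range(p)]
    B = [rng.standard_normal((n, m)) for _ in range(q)]
    Acal, Bcal = np.hstack(A), np.hstack(B)
    Dx = block_diag([Ai.T @ Ai for Ai in A])  # Diag(A_i^T A_i)
    Dy = block_diag([Bj.T @ Bj for Bj in B])  # Diag(B_j^T B_j)
    a, b, c = rng.standard_normal(p * m), rng.standard_normal(q * m), rng.standard_normal(n)
    x, y, lam = rng.standard_normal(p * m), rng.standard_normal(q * m), rng.standard_normal(n)

    # One GS-ADMM step (Jacobi blocks), written for all blocks at once.
    rx = a + Acal.T @ lam - beta * Acal.T @ (Acal @ x + Bcal @ y - c) + (1 + sigma1) * beta * Dx @ x
    x_new = np.linalg.solve(np.eye(p * m) + (1 + sigma1) * beta * Dx, rx)
    lam_half = lam - tau * beta * (Acal @ x_new + Bcal @ y - c)
    ry = b + Bcal.T @ lam_half - beta * Bcal.T @ (Acal @ x_new + Bcal @ y - c) + (1 + sigma2) * beta * Dy @ y
    y_new = np.linalg.solve(np.eye(q * m) + (1 + sigma2) * beta * Dy, ry)
    lam_new = lam_half - s * beta * (Acal @ x_new + Bcal @ y_new - c)

    # Predictor (x^{k+1}, y^{k+1}, lambda~^k), current iterate and next iterate.
    lam_tilde = lam - beta * (Acal @ x_new + Bcal @ y - c)
    w = np.concatenate([x, y, lam])
    w_tilde = np.concatenate([x_new, y_new, lam_tilde])
    w_next = np.concatenate([x_new, y_new, lam_new])

    # Q = Diag(H_x, Q~) and the correction matrix M.
    H_x = beta * ((1 + sigma1) * Dx - Acal.T @ Acal)
    Q_tilde = np.block([[(1 + sigma2) * beta * Dy, -tau * Bcal.T],
                        [-Bcal, np.eye(n) / beta]])
    Q = block_diag([H_x, Q_tilde])
    M = np.eye(w.size)
    M[-n:, p * m:p * m + q * m] = -s * beta * Bcal
    M[-n:, -n:] = (tau + s) * np.eye(n)

    # Lemma 2.1 without constraints: grad h(u~) + J(w~) + Q (w~ - w) = 0.
    grad_h = np.concatenate([x_new - a, y_new - b, np.zeros(n)])
    J = np.concatenate([-Acal.T @ lam_tilde, -Bcal.T @ lam_tilde,
                        Acal @ x_new + Bcal @ y_new - c])
    vi_res = np.linalg.norm(grad_h + J + Q @ (w_tilde - w))
    # Lemma 2.2: w^{k+1} = w^k - M (w^k - w~^k).
    corr_res = np.linalg.norm(w_next - (w - M @ (w - w_tilde)))
    return float(vi_res), float(corr_res)
```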

Now, we give a lemma to guarantee the positive definiteness of $G$ , defined by

$$G=Q+Q^{\sf T}-M^{\sf T}Q,$$

which plays a key role in establishing the convergence rates of GS-ADMM.

Lemma 2.3

Let $Q,M$ be given by (8) and (11), respectively. Then, the matrix $G$ is symmetric positive definite for any $(\tau,s)\in\mathcal{D}$ .

**Proof.** By direct calculation, the matrix $G$ can be written explicitly as

$$G=\left[\begin{array}[]{cc}H_{\mathbf{x}}&\bf{0}\\ \bf{0}&\widetilde{G}\end{array}\right],$$

where $H_{\mathbf{x}}$ is defined in (9) and

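$$\widetilde{G}=\left[\begin{array}[]{ccccc|c}(\sigma_{2}+1-s)\beta B_{1}^{\sf T}B_{1}&&\cdots&&-s\beta B_{1}^{\sf T}B_{q}&(s-1)B_{1}^{\sf T}\\ \vdots&&\ddots&&\vdots&\vdots\\ -s\beta B_{q}^{\sf T}B_{1}&&\cdots&&(\sigma_{2}+1-s)\beta B_{q}^{\sf T}B_{q}&(s-1)B_{q}^{\sf T}\\ \hline\cr(s-1)B_{1}&&\cdots&&(s-1)B_{q}&\frac{2-\tau-s}{\beta}I\end{array}\right].$$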

Clearly, the matrix $G$ is symmetric positive definite if and only if both $H_{\mathbf{x}}$ and $\widetilde{G}$ are symmetric positive definite. The matrix $H_{\mathbf{x}}$ is symmetric, and its positive definiteness is guaranteed by the condition $\sigma_{1}>p-1$ together with the full column rank of the matrices $A_{i},i=1,2,\cdots,p$. Hence, it suffices to show that $\widetilde{G}$ is positive definite.
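The positivity of $H_{\mathbf{x}}$ can be made explicit with a factorization. In the sketch below, the notation $D_{\mathcal{A}}$ and $E_p$ is introduced only for illustration: $E_p$ stacks $p$ identity blocks, in the same way that $E$ does for the ${\bf y}$-blocks.

```latex
H_{\mathbf{x}}=\beta\,D_{\mathcal{A}}^{\sf T}\left[(\sigma_{1}+1)I-E_{p}E_{p}^{\sf T}\right]D_{\mathcal{A}},
\qquad
D_{\mathcal{A}}={\rm Diag}(A_{1},\cdots,A_{p}),
\qquad
E_{p}=\left[\begin{array}{c}I\\ \vdots\\ I\end{array}\right]\in\mathbb{R}^{pn\times n}.
```

Since $E_pE_p^{\sf T}$ has eigenvalues $p$ and $0$, the bracketed matrix has eigenvalues $\sigma_1+1-p$ and $\sigma_1+1$. It is therefore positive definite exactly when $\sigma_1>p-1$. Because each $A_i$ has full column rank, $D_{\mathcal{A}}$ does too, so $H_{\mathbf{x}}$ is positive definite.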

Note that, by the region shown in (3), we have

$$\tau<1,\quad s<1,\quad\textrm{and}\quad\tau+s<2.$$

Moreover, it follows that

$$\widetilde{G}=\widetilde{D}^{\sf T}\widetilde{G}_{0}\widetilde{D},$$

where $\widetilde{D}={\rm Diag}(\beta^{\frac{1}{2}}B_{1},\cdots,\beta^{\frac{1}{2}}B_{q},\beta^{-\frac{1}{2}}I)$ is a block diagonal matrix and

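$$\widetilde{G}_{0}=\left[\begin{array}[]{ccccccc|c}(\sigma_{2}+1-s)I&&-sI&&\cdots&&-sI&(s-1)I\\ -sI&&(\sigma_{2}+1-s)I&&\cdots&&-sI&(s-1)I\\ \vdots&&\vdots&&\ddots&&\vdots&\vdots\\ -sI&&-sI&&\cdots&&(\sigma_{2}+1-s)I&(s-1)I\\ \hline\cr(s-1)I&&(s-1)I&&\cdots&&(s-1)I&(2-\tau-s)I\\ \end{array}\right]=P^{\sf T}\widetilde{G}_{0,0}P.$$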

In the above decomposition, we have

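$$P=\left[\begin{array}[]{ccccccc|c}I&&&&&&&\\ &&I&&&&&\\ &&&&\ddots&&&\\ &&&&&&I&\\ \hline\cr\frac{1-s}{2-\tau-s}I&&\frac{1-s}{2-\tau-s}I&&\cdots&&\frac{1-s}{2-\tau-s}I&I\\ \end{array}\right]$$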

and

[TABLE]

where

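$$E=\left[\begin{array}[]{c}I\\ I\\ \vdots\\ I\end{array}\right]\quad\mbox{ and }\quad H_{\mathbf{y},0}=\left[\begin{array}[]{cccc}\sigma_{2}I&-I&\cdots&-I\\ -I&\sigma_{2}I&\cdots&-I\\ \vdots&\vdots&\ddots&\vdots\\ -I&-I&\cdots&\sigma_{2}I\end{array}\right].$$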

So, the matrix $\widetilde{G}$ is positive definite if and only if

$$H_{\mathbf{y},0}+\left(1-s-\frac{(1-s)^{2}}{2-\tau-s}\right)EE^{\sf T}$$

is positive definite. Notice that $H_{\mathbf{y},0}$, whose eigenvalues are $\sigma_{2}+1-q$ and $\sigma_{2}+1$, is positive definite if $\sigma_{2}>q-1$, and $\left(1-s-\frac{(1-s)^{2}}{2-\tau-s}\right)EE^{\sf T}$ is positive semidefinite if

$$1-s-\frac{(1-s)^{2}}{2-\tau-s}=\frac{(1-s)(1-\tau)}{2-\tau-s}>0,$$

which is guaranteed by the conditions (12), since they give $1-s>0$, $1-\tau>0$ and $2-\tau-s>0$. This completes the proof. $\ \ \ \diamondsuit$
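A quick numerical sanity check of Lemma 2.3 for the $({\bf y},\lambda)$ part is sketched below. It forms $\widetilde{Q}$ and $\widetilde{M}$ (a name introduced only for this sketch, denoting the $({\bf y},\lambda)$ block of $M$), computes $\widetilde{Q}+\widetilde{Q}^{\sf T}-\widetilde{M}^{\sf T}\widetilde{Q}$, compares it with the explicit $\widetilde{G}$, and reports its smallest eigenvalue. The random $B_j$, the values of $\beta$ and $\sigma_2$, and the sample points $(\tau,s)$ are hypothetical choices.

```python
import numpy as np


def g_tilde_pair(B, beta, tau, s, sigma2):
    """Return (Q~ + Q~^T - M~^T Q~, explicit G~) for the (y, lambda) part of GS-ADMM."""
    Bcal = np.hstack(B)
    n, dq = Bcal.shape
    Dy = np.zeros((dq, dq))  # Diag(B_1^T B_1, ..., B_q^T B_q)
    k = 0
    for Bj in B:
        d = Bj.shape[1]
        Dy[k:k + d, k:k + d] = Bj.T @ Bj
        k += d
    Q_t = np.block([[(1 + sigma2) * beta * Dy, -tau * Bcal.T],
                    [-Bcal, np.eye(n) / beta]])
    M_t = np.block([[np.eye(dq), np.zeros((dq, n))],
                    [-s * beta * Bcal, (tau + s) * np.eye(n)]])
    G_from_def = Q_t + Q_t.T - M_t.T @ Q_t
    G_explicit = np.block([
        [beta * ((1 + sigma2) * Dy - s * Bcal.T @ Bcal), (s - 1) * Bcal.T],
        [(s - 1) * Bcal, (2 - tau - s) / beta * np.eye(n)]])
    return G_from_def, G_explicit


def min_eig(G):
    """Smallest eigenvalue of the symmetric part of G."""
    return float(np.linalg.eigvalsh(0.5 * (G + G.T)).min())


rng = np.random.default_rng(1)
B = [rng.standard_normal((5, 2)) for _ in range(3)]  # q = 3 blocks, so sigma2 > 2 is needed
for tau, s in [(0.9, 0.9), (-0.5, 0.6), (0.95, -0.9)]:  # sample points of D
    G_def, G_exp = g_tilde_pair(B, beta=1.7, tau=tau, s=s, sigma2=2.5)
    print(tau, s, np.allclose(G_def, G_exp), min_eig(G_def) > 0)
```

As a contrast, take $(\tau,s)=(1.2,0.5)$, which lies in $\mathcal{G}$ but not in $\mathcal{D}$, with $B_j=I$. The same matrix then has a negative eigenvalue, so this particular positive-definiteness argument does not extend beyond $\mathcal{D}$ in general.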

Theorem 2.1

(See [2].) The sequences $\{{\bf{w}}^{k}\}$ and $\{\widetilde{{\bf{w}}}^{k}\}$ generated by GS-ADMM satisfy

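$$\left\|{\bf{w}}^{k+1}-{\bf{w}}^{*}\right\|_{H}^{2}\leq\left\|{\bf{w}}^{k}-{\bf{w}}^{*}\right\|_{H}^{2}-\left\|{\bf{w}}^{k}-\widetilde{{\bf{w}}}^{k}\right\|_{G}^{2},\quad\forall{\bf{w}}^{*}\in\mathcal{M}^{*},$$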

where $\mathcal{M}^{*}=\bigcap\limits_{{\bf{w}}\in\mathcal{M}}\left\{\widehat{{\bf{w}}}\in\mathcal{M}|\ h({\bf{u}})-h(\widehat{{\bf{u}}})+\left\langle{\bf{w}}-\widehat{{\bf{w}}},\mathcal{J}({\bf{w}})\right\rangle\geq 0\right\}$ and

[TABLE]

is symmetric positive definite for any $(\tau,s)\in\mathcal{D}$ .

In view of Lemma 2.3 and Theorem 2.1, the sequence $\{{\bf{w}}^{k}\}$ generated by GS-ADMM is contractive, which implies global convergence of GS-ADMM. In fact, by estimating a lower bound of $\left\|{\bf{w}}^{k}-\widetilde{{\bf{w}}}^{k}\right\|_{G}^{2}$, global convergence of GS-ADMM was proved in [2] for the larger region ${\cal{G}}(\supseteq\mathcal{D})$. Next, we show the sublinear nonergodic convergence rate of GS-ADMM for the stepsize region $\mathcal{D}$ under discussion.

Lemma 2.4

Let $Q,M,H$ be given by (8), (11) and (16), respectively. Then, the sequences $\{{\bf{w}}^{k}\}$ and $\{\widetilde{{\bf{w}}}^{k}\}$ generated by GS-ADMM satisfy

[TABLE]

**Proof.** Setting ${\bf{w}}=\widetilde{{\bf{w}}}^{k+1}$ in (7), we obtain

[TABLE]

Meanwhile, the inequality (7) with $k:=k+1$ also implies

[TABLE]

which, by letting ${\bf{w}}=\widetilde{{\bf{w}}}^{k}$ , gives

[TABLE]

Because of the skew-symmetric property of $\mathcal{J}({\bf{w}})$ , i.e.,

[TABLE]

we have from (17) and (18) that

[TABLE]

Thus, adding the identity

[TABLE]

to both sides of (19), we get

[TABLE]

which immediately completes the whole proof by the relationships in (10) and (16). $\ \ \ \diamondsuit$

Next, we establish the worst-case $\mathcal{O}(1/t)$ nonergodic convergence rate of GS-ADMM in terms of optimality errors based on the following theorem.

Theorem 2.2

Let the sequences $\{{\bf{w}}^{k}\}$ and $\{\widetilde{{\bf{w}}}^{k}\}$ be generated by GS-ADMM. Then, for any integer $t>0$ there exists a constant $\xi>0$ such that

[TABLE]

**Proof.** Combining Theorem 2.1 and Lemma 2.3, there exists a constant $\xi>0$ such that

[TABLE]

which suggests

[TABLE]

for any integer $t>0$. Meanwhile, substituting $a=M({\bf{w}}^{k}-\widetilde{{\bf{w}}}^{k})$ and $b=M({\bf{w}}^{k+1}-\widetilde{{\bf{w}}}^{k+1})$ into the following well-known identity

[TABLE]

we have

[TABLE]

where the above first inequality uses Lemma 2.4 and the final equality uses Lemma 2.3. Therefore, it holds by (21) that

[TABLE]

Substituting it into (20), the proof is completed. $\ \ \ \diamondsuit$

Theorem 2.3

For any integer $t>0$ , there exists a constant $\theta>0$ such that

[TABLE]

where $\mathbf{d}^{t}$ is defined by (23) satisfying (25), and $\theta$ depends on the problem data and the parameters of GS-ADMM.

**Proof.** Let

[TABLE]

defined componentwise as

[TABLE]

Then, according to the proof of [2, Lemma 2], that is, the first-order optimality conditions of the subproblems of GS-ADMM, we have

[TABLE]

which implies

[TABLE]

Here the notation $\mathcal{N}_{\mathcal{X}}(x)$ denotes the normal cone of $\mathcal{X}$ at $x$ . By (24) and Theorem 2.2, it can be deduced that

[TABLE]

where, here and in the rest of the proof, $\theta$ depends only on the problem data and the parameters of GS-ADMM.

We next prove the inequality in the right-hand side of (22). Since the equality (6) can be rewritten as

[TABLE]

we have

[TABLE]

Clearly, a nonergodic convergence rate is in general stronger than an ergodic one for GS-ADMM. Let $c_{0}=\inf\limits_{{\bf{w}}^{*}\in\mathcal{M}^{*}}\left\|{\bf{w}}^{0}-{\bf{w}}^{*}\right\|_{H}^{2}.$ Then, for any tolerance $\epsilon>0$, Theorem 2.2 shows that at most $\lfloor\frac{c_{0}}{\xi\epsilon}\rfloor$ iterations are needed to ensure $\left\|M({\bf{w}}^{t}-\widetilde{{\bf{w}}}^{t})\right\|_{H}^{2}\leq\epsilon.$ If ${\cal{A}}\widetilde{{\bf{x}}}^{t}+{\cal{B}}\widetilde{{\bf{y}}}^{t}=c$ and $\mathbf{d}^{t}=0$, then $\widetilde{{\bf{w}}}^{t}\in\mathcal{M}^{*}$. Hence, we can use $\widetilde{{\bf{w}}}^{t}$ (or, equivalently, the iterate ${\bf{w}}^{t}$, since $\lim\limits_{t\rightarrow\infty}(\widetilde{{\bf{w}}}^{t}-{\bf{w}}^{t})=0$ by the proof of [2, Theorem 6]) as an approximate solution of the problem once the right-hand sides of the inequalities in (22) are sufficiently small.
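As a small illustration of this complexity bound, the following sketch evaluates $\lfloor c_0/(\xi\epsilon)\rfloor$. The constants passed in are hypothetical, since $c_0$ and $\xi$ depend on the problem data and the parameters of GS-ADMM. The example mainly shows that halving the tolerance $\epsilon$ doubles the worst-case iteration count, as expected for an $\mathcal{O}(1/t)$ rate.

```python
import math


def nonergodic_iteration_bound(c0: float, xi: float, eps: float) -> int:
    """Worst-case iteration count floor(c0 / (xi * eps)) implied by the O(1/t) bound."""
    if c0 < 0 or xi <= 0 or eps <= 0:
        raise ValueError("need c0 >= 0, xi > 0 and eps > 0")
    return math.floor(c0 / (xi * eps))


print(nonergodic_iteration_bound(3.0, 0.25, 0.0625))   # 192
print(nonergodic_iteration_bound(3.0, 0.25, 0.03125))  # 384
```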

2.2 Linear convergence rate

Throughout this subsection, all subdifferentials of the functions $f_{i},g_{j}$ in (1) are assumed to be piecewise linear multifunctions. Under this hypothesis, we prove a global linear convergence rate of GS-ADMM with the aid of an error function

[TABLE]

If $H=I,$ we simply denote $\textrm{dist}^{2}_{I}({\bf{w}}^{k},\mathcal{M}^{*})$ by $\textrm{dist}^{2}({\bf{w}}^{k},\mathcal{M}^{*}).$
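A standard example of a convex function whose subdifferential is a piecewise linear multifunction is $|x|$. The graph of $\partial|\cdot|$ is the union of the three polyhedra $\{(x,-1):x\leq 0\}$, $\{(0,y):-1\leq y\leq 1\}$ and $\{(x,1):x\geq 0\}$, and the same holds componentwise for the $\ell_1$ norm. The NumPy sketch below evaluates this subdifferential and checks the optimality condition $(v-x)/t\in\partial|x|$ for the proximal map of $t|\cdot|$ (soft thresholding), which is itself piecewise linear in $v$. The helper names are hypothetical.

```python
import numpy as np


def subdiff_abs(x):
    """Subdifferential of |.| at a scalar x, returned as a closed interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)


def soft_threshold(v, t):
    """Proximal map of t*|.|, componentwise: argmin_x t*|x| + 0.5*(x - v)**2."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


v = np.array([-3.0, 0.5, 2.0])
x = soft_threshold(v, 1.0)
print(x)  # [-2.  0.  1.]
```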

Since each $\mathcal{X}_{i}$ in problem (1) is a polyhedron, it is convex, and the projection operator $\mathcal{P}_{\mathcal{X}_{i}}(x_{i}):=\arg\min\limits_{z\in\mathcal{X}_{i}}\|z-x_{i}\|$ is piecewise linear by [4, Proposition 4.1.4]. Moreover, $\mathcal{P}_{\mathcal{X}_{i}}$ is nonexpansive, that is, the following inequality holds:

[TABLE]
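For instance, a box $\{z:\underline{b}\leq z\leq\overline{b}\}$ is a polyhedron whose Euclidean projection is componentwise clipping, a piecewise linear map. The sketch below illustrates the nonexpansiveness inequality by estimating $\max\|\mathcal{P}(u)-\mathcal{P}(v)\|/\|u-v\|$ over random pairs. The box bounds, the sample sizes and the helper names are hypothetical.

```python
import numpy as np


def project_box(z, lower, upper):
    """Euclidean projection onto the box {z : lower <= z <= upper} (componentwise clipping)."""
    return np.clip(z, lower, upper)


def max_expansion_ratio(num=10_000, dim=4, seed=0):
    """Largest observed ||P(u) - P(v)|| / ||u - v|| over random pairs (u, v)."""
    rng = np.random.default_rng(seed)
    lower, upper = -np.ones(dim), 2.0 * np.ones(dim)
    u = 3.0 * rng.standard_normal((num, dim))
    v = 3.0 * rng.standard_normal((num, dim))
    diff_proj = np.linalg.norm(project_box(u, lower, upper) - project_box(v, lower, upper), axis=1)
    diff = np.linalg.norm(u - v, axis=1)
    return float(np.max(diff_proj / diff))


print(max_expansion_ratio() <= 1.0)  # True: the projection never increases distances
```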

Let $\partial f(x)$ be the sub-differential of a convex function $f(x):\mathbb{R}^{n}\rightarrow\mathbb{R}$ , defined as

[TABLE]

Then, for any saddle point ${\bf{w}}^{*}=(x_{1}^{*},\cdots,x_{p}^{*},y_{1}^{*},\cdots,y_{q}^{*},\lambda^{*})$ of (1), there exist $\eta_{i}\in\partial f_{i}(x_{i}^{*})\,(i=1,\cdots,p)$ and $\nu_{j}\in\partial g_{j}(y_{j}^{*})\,(j=1,\cdots,q)$ such that

[TABLE]

which can be characterized by solving the equation $\left\|e_{\mathcal{M}}({\bf{w}},\gamma)\right\|=0$ with

[TABLE]

Under the assumption that $\partial f_{i}$ and $\partial g_{j}$ are piecewise linear multifunctions, $e_{\mathcal{M}}({\bf{w}},\gamma)$ is also piecewise linear. Besides, ${\bf{w}}\in\mathcal{M}^{*}$ if and only if $\textbf{0}\in e_{\mathcal{M}}({\bf{w}},1)$. The following lemma, which follows from Robinson's continuity property [11] for polyhedral multifunctions, shows that $\textrm{dist}(\mathbf{0},e_{\mathcal{M}}({\bf{w}},1))$ provides a global error bound on the distance from ${\bf{w}}$ to the solution set $\mathcal{M}^{*}$.

Lemma 2.5

Under the assumption that $\partial f_{i}$ and $\partial g_{j}$ are piecewise linear multi-functions, there exists a constant $\zeta>0$ such that

[TABLE]

For convenience of analysis, let

[TABLE]

Define

[TABLE]

with

[TABLE]

Note that all the above $(p+q+1)$ quantities are positive since the matrices $A_{i},B_{j}$ have full column rank. Hence, $\delta$ is a positive number.

Theorem 2.4

Let $\delta$ be defined in (27) with $\widetilde{\mu}_{i},\widetilde{\nu}_{j}$ being defined in (26). Then, the sequences $\{{\bf{w}}^{k}\}$ and $\{\widetilde{{\bf{w}}}^{k}\}$ generated by GS-ADMM satisfy

[TABLE]

**Proof.** First, by equation (20) in [2], that is,

[TABLE]

there exists $\eta_{i}\in\partial f_{i}(x_{i}),i=1,\cdots,p,$ such that

[TABLE]

Therefore, we have from the definition of $\textrm{dist}(\mathbf{0},\cdot)$ and the nonexpansive property of the projection operator that

[TABLE]

where the second equality uses the fact

[TABLE]

Similarly, there exists $\nu_{j}\in\partial g_{j}(y_{j}),j=1,\cdots,q,$ such that

[TABLE]

Hence, we have

[TABLE]

Secondly, we can get by the update of $\lambda^{k+1}$ and $\lambda^{k+\frac{1}{2}}$ in GS-ADMM as well as (6) and (38) that

[TABLE]

which further shows

[TABLE]

Denote by

[TABLE]

Then, by combining (2.2), (2.2)-(50) with the following identity

[TABLE]

for any $d_{i}\in\mathbb{R}^{n}$, $i=1,2,\cdots,n$, and the fact that $\lambda_{\max}(A^{\sf T}A)=\lambda_{\max}(AA^{\sf T})$, we obtain

[TABLE]
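The fact $\lambda_{\max}(A^{\sf T}A)=\lambda_{\max}(AA^{\sf T})$ holds because $A^{\sf T}A$ and $AA^{\sf T}$ share the same nonzero eigenvalues. Their common largest eigenvalue is $\|A\|_2^2$, the squared spectral norm. The Python sketch below confirms this numerically with power iteration on a $3\times 2$ example. The matrix, the start vector, and the iteration parameters are illustrative assumptions.

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def lambda_max_psd(S, max_iter=1000, tol=1e-13):
    """Largest eigenvalue of a symmetric PSD matrix by power iteration.

    The estimate is the Rayleigh quotient v^T S v of the current unit vector v.
    Assumes the start vector is not orthogonal to the top eigenvector.
    """
    n = len(S)
    v = [1.0 / math.sqrt(n)] * n  # unit starting vector
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = sum(v[i] * w[i] for i in range(n))  # Rayleigh quotient
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space of S")
        v = [x / norm for x in w]
        if abs(new_lam - lam) <= tol * max(1.0, abs(new_lam)):
            return new_lam
        lam = new_lam
    return lam


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
At = transpose(A)
lam_AtA = lambda_max_psd(matmul(At, A))  # 2 x 2 matrix A^T A
lam_AAt = lambda_max_psd(matmul(A, At))  # 3 x 3 matrix A A^T
print(lam_AtA, lam_AAt)
```

For this $A$, $A^{\sf T}A=\begin{pmatrix}35&44\\44&56\end{pmatrix}$, whose largest eigenvalue is $(91+\sqrt{8185})/2\approx 90.74$. The matrix $AA^{\sf T}$ is $3\times 3$ and singular, but it has the same largest eigenvalue.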

Based on the above preparations, we now establish a global linear convergence rate for GS-ADMM.

Theorem 2.5

Let $\delta$ be defined in (27), with $\widetilde{\mu}_{i},\widetilde{\nu}_{j}$ defined in (26). Then, there exists a constant $\zeta>0$ such that the sequence $\{{\bf{w}}^{k}\}$ generated by GS-ADMM satisfies

[TABLE]

where

[TABLE]

**Proof** Since $\mathcal{M}^{*}$ is a nonempty closed convex set, there exists a ${\bf{w}}^{*}_{k}\in\mathcal{M}^{*}$ satisfying

[TABLE]

Then, by Lemma 2.5 and Theorem 2.4 there exists a constant $\zeta>0$ such that

[TABLE]

where $G$ and $H$ are defined in Lemma 2.3 and (16), respectively. The above inequality then yields

[TABLE]

This completes the proof. $\ \ \ \diamondsuit$
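A short generic derivation shows how an error bound turns a contraction inequality into a linear rate. It is a sketch under two assumptions and is not a transcription of the omitted displays. Assumption (i) is a Theorem 2.1-type inequality $\|{\bf w}^{k+1}-{\bf w}^{*}\|_H^2\le\|{\bf w}^{k}-{\bf w}^{*}\|_H^2-\|{\bf w}^{k}-\widetilde{\bf w}^{k}\|_G^2$ for every ${\bf w}^{*}\in\mathcal{M}^{*}$. Assumption (ii) is an error bound $\textrm{dist}_H^2({\bf w}^{k+1},\mathcal{M}^{*})\le\kappa\|{\bf w}^{k}-\widetilde{\bf w}^{k}\|_G^2$ for some $\kappa>0$, of the kind obtained by combining Lemma 2.5 with Theorem 2.4. Taking ${\bf w}^{*}={\bf w}^{*}_{k}$ gives

$$
\begin{aligned}
\textrm{dist}_H^2({\bf w}^{k+1},\mathcal{M}^{*})
&\le \|{\bf w}^{k+1}-{\bf w}^{*}_{k}\|_H^2
\le \|{\bf w}^{k}-{\bf w}^{*}_{k}\|_H^2-\|{\bf w}^{k}-\widetilde{\bf w}^{k}\|_G^2\\
&\le \textrm{dist}_H^2({\bf w}^{k},\mathcal{M}^{*})-\frac{1}{\kappa}\,\textrm{dist}_H^2({\bf w}^{k+1},\mathcal{M}^{*}),
\end{aligned}
$$

and therefore

$$
\textrm{dist}_H^2({\bf w}^{k+1},\mathcal{M}^{*})\le\frac{1}{1+\epsilon}\,\textrm{dist}_H^2({\bf w}^{k},\mathcal{M}^{*}),\qquad \epsilon:=\frac{1}{\kappa}>0.
$$

The first inequality holds because ${\bf w}^{*}_{k}\in\mathcal{M}^{*}$. The last step uses $\|{\bf w}^{k}-{\bf w}^{*}_{k}\|_H=\textrm{dist}_H({\bf w}^{k},\mathcal{M}^{*})$ together with (ii). The paper's exact constant may differ from this $\epsilon$. The derivation also shows why positive definiteness of $G$ matters: without it, $\|\cdot\|_G$ would not control the decrease.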

Next, we show that $\{{\bf{w}}^{k}\}$ generated by GS-ADMM converges R-linearly to a point ${\bf{w}}^{\infty}\in\mathcal{M}^{*}$.

Corollary 2.1

Let $\epsilon>0$ be defined in Theorem 2.5 and the sequence $\{{\bf{w}}^{k}\}$ be generated by GS-ADMM. Then, there exists a point ${\bf{w}}^{\infty}\in\mathcal{M}^{*}$ such that

[TABLE]

where

[TABLE]

**Proof** Select ${\bf{w}}_{k}^{*}\in\mathcal{M}^{*}$ such that $\textrm{dist}_{H}({\bf{w}}^{k},\mathcal{M}^{*})=\left\|{\bf{w}}^{k}-{\bf{w}}_{k}^{*}\right\|_{H}$ and let

[TABLE]

Then it follows from Theorem 2.1 that $\left\|{\bf{w}}^{k+1}-{\bf{w}}_{k}^{*}\right\|_{H}\leq\left\|{\bf{w}}^{k}-{\bf{w}}_{k}^{*}\right\|_{H}$, which implies

[TABLE]

where the last inequality follows from Theorem 2.5. By [2, Theorem 6], the sequence $\{{\bf{w}}^{k}\}$ generated by GS-ADMM converges to some ${\bf{w}}^{\infty}\in\mathcal{M}^{*}$. Hence, by (52), ${\bf{w}}^{\infty}={\bf{w}}^{k}+\sum_{j=k}^{\infty}d^{j}$, which, together with (53), shows

[TABLE]

Hence, assertion (51) holds; namely, $\{{\bf{w}}^{k}\}$ converges to ${\bf{w}}^{\infty}$ R-linearly. $\ \ \ \diamondsuit$
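The step from a Q-linear decrease of $\textrm{dist}_H({\bf w}^{k},\mathcal{M}^{*})$ to R-linear convergence of the iterates is the usual geometric-tail argument. The sketch below makes two assumptions. First, $d^{k}={\bf w}^{k+1}-{\bf w}^{k}$ in (52), which is consistent with ${\bf w}^{\infty}={\bf w}^{k}+\sum_{j\ge k}d^{j}$. Second, Theorem 2.5 yields a bound of the form $\textrm{dist}_H({\bf w}^{k+1},\mathcal{M}^{*})\le q\,\textrm{dist}_H({\bf w}^{k},\mathcal{M}^{*})$ with $q:=(1+\epsilon)^{-1/2}\in(0,1)$. Under these assumptions,

$$
\|d^{k}\|_H\le\|{\bf w}^{k+1}-{\bf w}^{*}_{k}\|_H+\|{\bf w}^{k}-{\bf w}^{*}_{k}\|_H\le 2\,\textrm{dist}_H({\bf w}^{k},\mathcal{M}^{*})\le 2q^{k}\,\textrm{dist}_H({\bf w}^{0},\mathcal{M}^{*}),
$$

$$
\|{\bf w}^{\infty}-{\bf w}^{k}\|_H\le\sum_{j=k}^{\infty}\|d^{j}\|_H\le\frac{2\,\textrm{dist}_H({\bf w}^{0},\mathcal{M}^{*})}{1-q}\,q^{k}.
$$

The second inequality in the first line uses the Fejér-type property from Theorem 2.1. The constant in (51) may be written differently in the paper. Only the geometric factor $q^{k}$ matters for R-linear convergence.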

3 Concluding remarks

In this note, we further study the iteration complexity of GS-ADMM for solving the prototype multi-block separable convex optimization model. We establish its sublinear nonergodic convergence rate. We also establish an R-linear convergence rate under the assumptions that the sub-differential of each component of the objective function is piecewise linear and all the constraint sets are polyhedra. By the fourth part of [2] and the analysis in this work, GS-ADMM with either $\sigma_{1}=0$ or $\sigma_{2}=0$ has convergence rates similar to those described in Theorem 2.3, Theorem 2.5 and Corollary 2.1. As the proof of Theorem 2.5 shows, the linear convergence analysis depends mainly on Theorem 2.4 and the positive definiteness of the matrix $G$. Hence, if the sequence generated by an algorithm has a property similar to Theorem 2.1, then one can prove that the algorithm converges linearly, provided that the weighting matrix $G$ is positive definite.

Bibliography

[1]
[2] Bai, J.C., Li, J.C., Xu, F.M., Zhang, H.C.: Generalized symmetric ADMM for separable convex optimization. Comput. Optim. Appl. 70, 129-170 (2018)
[3] Eckstein, J.: Some saddle-function splitting methods for convex programming. Optim. Methods Softw. 4, 75-83 (1994)
[4] Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer-Verlag, Berlin (2003)
[5] Fazel, M., Pong, T.K., Sun, D.F., Tseng, P.: Hankel matrix rank minimization with applications to system identification and realization. SIAM J. Matrix Anal. Appl. 34, 946-977 (2013)
[6] Gao, B., Ma, F.: Symmetric alternating direction method with indefinite proximal regularization for linearly constrained convex optimization. J. Optim. Theory Appl. 176, 178-204 (2018)
[7] Glowinski, R., Marrocco, A.: Approximation par éléments finis d'ordre un et résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. Rev. Fr. Autom. Inform. Rech. Opér. Anal. Numér. 2, 41-76 (1975)
[8] He, B.S., Yuan, X.M.: Block-wise alternating direction method of multipliers for multiple-block convex programming and beyond. SMAI J. Comput. Math. 1, 145-174 (2015)