Complexity Certification of a Distributed Augmented Lagrangian Method

Soomin Lee; Nikolaos Chatzipanagiotis; Michael M. Zavlanos

arXiv:1705.11119·math.OC·January 16, 2018·IEEE Trans. Autom. Control.


TL;DR

This paper provides complexity bounds for a distributed Augmented Lagrangian algorithm solving convex problems with coupled constraints, demonstrating an $O(1/\epsilon)$ iteration complexity for achieving near-optimal solutions.

Contribution

It provides explicit complexity bounds for the ADAL algorithm, an explicit upper bound on the optimal dual multiplier, and a rule for selecting the stepsize parameter, advancing distributed convex optimization theory.

Findings

01

ADAL achieves $O(1/\epsilon)$ complexity for $\epsilon$-optimal solutions.

02

Provides an explicit upper bound for the dual multiplier in distributed settings.

03

Demonstrates applicability to model predictive control problems with networked subsystems.




Soomin Lee*, Nikolaos Chatzipanagiotis, and Michael M. Zavlanos

Soomin Lee is with the Dept. of Industrial and Systems Engineering, Georgia Tech, Atlanta, GA, 30318, USA. Nikolaos Chatzipanagiotis and Michael M. Zavlanos are with the Dept. of Mechanical Engineering and Materials Science, Duke University, Durham, NC, 27708, USA, {n.chatzip,michael.zavlanos}@duke.edu. This work is supported by NSF under grant CNS #1261828, and by ONR under grant #N000141410479.

Abstract

In this paper we present complexity certification results for a distributed Augmented Lagrangian (AL) algorithm used to solve convex optimization problems involving globally coupled linear constraints. Our method relies on the Accelerated Distributed Augmented Lagrangian (ADAL) algorithm, which can handle the coupled linear constraints in a distributed manner based on local estimates of the AL. We show that the theoretical complexity of ADAL to reach an $\epsilon$ -optimal solution both in terms of suboptimality and infeasibility is $O(\frac{1}{\epsilon})$ iterations. Moreover, we provide a valid upper bound for the optimal dual multiplier which enables us to explicitly specify these complexity bounds. We also show how to choose the stepsize parameter to minimize the bounds on the convergence rates. Finally, we discuss a motivating example, a model predictive control (MPC) problem, involving a finite number of subsystems which interact with each other via a general network.

Index Terms:

Augmented Lagrangian methods, computational complexity, distributed model predictive control.

I Introduction

Distributed optimization methods decompose large-scale problems into more manageable subproblems that can be efficiently solved in parallel. Moreover, distributed algorithms allow for better load balancing among the available computational resources (inexpensive devices or subsystems) and they also alleviate drawbacks of centralized systems, such as the cost, fragility, and privacy associated with centralized coordination. For this reason, they are widely used to solve large-scale problems arising in areas as diverse as optimal control, wireless communications, machine learning, computational biology, finance and statistics, to name a few.

Classic decomposition algorithms utilize the separable structure of the dual function. These methods have low computational cost, but they suffer from slow convergence due to the non-differentiability of the dual functions induced by the ordinary Lagrangian [1, Chapter 2.6]. Although this drawback can be avoided by using the Augmented Lagrangian (AL) framework [2, Chapter 2.1], AL based methods lose the decomposable structure of the ordinary Lagrangian, which makes distributed computation difficult. This calls for the development of specialized AL decomposition techniques.

Early specialized techniques that allow for decomposition of the AL can be traced back to the works [3, 4, 5]. More recent literature involves the Diagonal Quadratic Approximation (DQA) algorithm [6, 7] and the Alternating Direction Method of Multipliers (ADMM) [8, 9, 10]. The DQA method replaces each minimization step in the augmented Lagrangian algorithm by a separable approximation of the AL function. The ADMM methods are based on the relations between splitting methods for monotone operators, such as Douglas-Rachford splitting, and the proximal point algorithm [11, 8]. Recently, the convergence rate of ADMM has been studied extensively; see e.g. [12] and references therein. Most of these results assume either smoothness, strong convexity, or strict convexity of the objective function. Although the results in [13, 14] do not require such properties, the convergence rates are given either in terms of the violation of optimality conditions [13] or the relative change in consecutive iterates [14].

The contributions of this paper are the following:

We revisit ADAL, a general-purpose AL method that relies on local estimates of the AL to handle globally coupled linear constraints in a distributed manner; it was first developed for convex optimization problems [15, 16] and later extended to non-convex problems [17] and to problems with noise [18]. We provide computational complexity certifications for ADAL in terms of primal suboptimality and primal infeasibility. Specifically, we show that the number of iterations needed to reach an $\epsilon$ -optimal and $\epsilon$ -feasible solution is $O(\frac{1}{\epsilon})$ , under the assumption that the objective function is convex and not necessarily differentiable. This analysis can benefit many practical applications, such as model predictive control (MPC), one of the most successful control frameworks implemented on embedded systems. Because the sampling times of embedded systems are very short, any iterative optimization algorithm running on such systems must bound its execution time in advance, i.e., provide an explicit number of iterations needed to obtain a reasonably good solution in terms of suboptimality and infeasibility. For this reason, there has recently been growing interest in equipping MPC methods with worst-case computational complexity guarantees [19, 20, 21, 22, 23].
Since the complexity bounds above depend on the optimal dual multiplier $\boldsymbol{\lambda}^{*}$ , we provide a valid upper bound for $\boldsymbol{\lambda}^{*}$ . Our bound holds for general convex problems with linear constraints whose objective functions have bounded subgradients (i.e., are Lipschitz continuous) on the feasible set. Tighter bounds for quadratic problems have been studied in [19, 20, 21].
We show how to select the algorithm parameter $\rho$ , which is the stepsize used in the dual gradient step. To the best of our knowledge, such parameter selection has been studied only when the objective function is quadratic or has special properties like strong convexity and smoothness [24, 25].

II Accelerated Distributed AL

This section describes the Accelerated Distributed Augmented Lagrangian (ADAL for short) method, a specialized Augmented Lagrangian (AL) decomposition technique which was proposed in [15], for solving optimization problems of the form:

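$$\begin{aligned}\min_{{\bf x}_{1},\dots,{\bf x}_{N}}\quad&\sum_{i=1}^{N}f_{i}({\bf x}_{i})\\ \text{subject to}\quad&\sum_{i=1}^{N}{\bf A}_{i}{\bf x}_{i}={\bf b},\\ &{\bf x}_{i}\in\mathcal{X}_{i},\quad i=1,2,\dots,N,\end{aligned}$$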

where ${\bf x}_{i}\in\mathbb{R}^{n_{i}}$ denotes the decision variables that belong to subsystem $i$ , and $f_{i}:\mathbb{R}^{n_{i}}\to\mathbb{R}$ is its local objective function. Problem (II) models situations where a set $\mathcal{I}=\{1,2,\dots,N\}$ of decision makers, henceforth referred to as agents, need to determine local decisions ${\bf x}_{i}\in\mathcal{X}_{i}$ that minimize the summation of the local functions $f_{i}({\bf x}_{i})$ , while respecting a set of affine coupling constraints $\sum_{i=1}^{N}{\bf A}_{i}{\bf x}_{i}={\bf b}$ . Here, we assume the functions $f_{i}:\mathbb{R}^{n_{i}}\rightarrow\mathbb{R}$ are convex (not necessarily differentiable) for all $i\in\mathcal{I}$ , the local sets $\mathcal{X}_{i}\subseteq\mathbb{R}^{n_{i}}$ for $i\in\mathcal{I}$ are convex, closed and bounded, ${\bf A}_{i}\in\mathbb{R}^{m\times n_{i}}$ , ${\bf b}\in\mathbb{R}^{m}$ , and $n=\sum_{i=1}^{N}n_{i}$ .

Furthermore, we let

$$F({\bf x}):=\sum_{i=1}^{N}f_{i}({\bf x}_{i}),$$

where $\mathbf{x}=[{\bf x}_{1}^{\top},\dots,{{\bf x}}_{N}^{\top}]^{\top}\in\mathbb{R}^{n}$ . Denoting ${\bf A}=[{\bf A}_{1}\dots\bf A_{N}]\in\mathbb{R}^{m\times n}$ , the constraint $\sum_{i=1}^{N}{{\bf A}}_{i}{{\bf x}}_{i}={{\bf b}}$ in problem (II) becomes ${\bf A}{\bf x}={\bf b}$ . Also, we define the maximum degree $q$ as a measure of sparsity of the matrix ${\bf A}$ , i.e., for each constraint $j=1,\ldots,m$ , we denote by $q_{j}$ the number of all $i\in\mathcal{I}$ such that $[{\bf A}_{i}]_{j}\neq\mathbf{0}$ , where $[{\bf A}_{i}]_{j}$ is the $j$ -th row of matrix ${\bf A}_{i}$ and $\mathbf{0}$ stands for a vector of all zeros. Then, $q$ is defined as:

$$q:=\max_{j=1,\dots,m}q_{j}.$$

It will be shown below that $q$ plays a critical role in the convergence properties of the proposed method.
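As a small illustration of this definition, the sketch below computes $q$ from the blocks ${\bf A}_{i}$ with NumPy. The helper name `max_degree` and the example matrices are hypothetical. Since ADAL requires $\tau\in(0,1/q)$ (Section II-B), the value of $q$ directly limits the admissible stepsize.

```python
import numpy as np


def max_degree(blocks):
    """Return q = max_j q_j, where q_j counts the blocks A_i whose j-th row is nonzero.

    `blocks` is a list of arrays A_i of shape (m, n_i), all with the same number of rows m.
    """
    # appears[i, j] is True when agent i takes part in constraint j.
    appears = np.stack([np.any(A_i != 0, axis=1) for A_i in blocks])  # shape (N, m)
    q_j = appears.sum(axis=0)  # number of agents involved in each constraint j
    return int(q_j.max())


# Three agents, two coupling constraints:
# constraint 1 involves agents 1 and 2, constraint 2 involves all three agents.
A1 = np.array([[1.0, 0.0], [2.0, 1.0]])
A2 = np.array([[0.0, 3.0], [1.0, 0.0]])
A3 = np.array([[0.0], [4.0]])
print(max_degree([A1, A2, A3]))  # 3, so tau must lie in (0, 1/3)
```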

II-A Preliminaries: AL Framework

Associating Lagrange multipliers $\boldsymbol{\lambda}\in\mathbb{R}^{m}$ with the affine constraint ${\bf A}{\bf x}={\bf b}$ , the Lagrangian for (II) is defined as

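$$L({\bf x},\boldsymbol{\lambda})=F({\bf x})+\langle\boldsymbol{\lambda},{\bf A}{\bf x}-{\bf b}\rangle=\sum_{i=1}^{N}L_{i}({\bf x}_{i},\boldsymbol{\lambda})-\langle{\bf b},\boldsymbol{\lambda}\rangle,\tag{3}$$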

where $L_{i}({\bf x}_{i},\boldsymbol{\lambda})=f_{i}({\bf x}_{i})+\langle\boldsymbol{\lambda},{{\bf A}}_{i}{{\bf x}}_{i}\rangle$ , and $\langle\cdot,\cdot\rangle$ denotes inner product. Then, the dual function is defined as

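$$g(\boldsymbol{\lambda})=\inf_{{\bf x}\in\mathcal{X}}L({\bf x},\boldsymbol{\lambda})=\sum_{i=1}^{N}g_{i}(\boldsymbol{\lambda})-\langle{\bf b},\boldsymbol{\lambda}\rangle,\tag{4}$$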

where $\mathcal{X}=\mathcal{X}_{1}\times\mathcal{X}_{2}\dots\times\mathcal{X}_{N}$ , and

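$$g_{i}(\boldsymbol{\lambda})=\inf_{{\bf x}_{i}\in\mathcal{X}_{i}}\Big[f_{i}({\bf x}_{i})+\langle\boldsymbol{\lambda},{\bf A}_{i}{\bf x}_{i}\rangle\Big].$$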

The dual function is decomposable with respect to ${\bf x}_{i}$ ’s and this gives rise to decomposition methods that address the dual problem [1, Chapter 2.6]

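$$\max_{\boldsymbol{\lambda}\in\mathbb{R}^{m}}\ \sum_{i=1}^{N}g_{i}(\boldsymbol{\lambda})-\langle{\bf b},\boldsymbol{\lambda}\rangle.\tag{5}$$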
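To make the separability of the dual function concrete, the sketch below evaluates $g(\boldsymbol{\lambda})$ through the local terms $g_{i}(\boldsymbol{\lambda})$ for a hypothetical special case: linear costs $f_{i}({\bf x}_{i})={\bf c}_{i}^{\top}{\bf x}_{i}$ and box sets $\mathcal{X}_{i}$. Each $g_{i}$ then has a closed form, because a linear function over a box is minimized coordinate-wise at one of the two bounds. A brute-force minimization of $L$ over all vertices of $\mathcal{X}$ serves as a cross-check. All names and data are illustrative.

```python
import itertools

import numpy as np


def g_i(lam, c_i, A_i, lo_i, hi_i):
    """Local dual term g_i(lam) = min over lo_i <= x_i <= hi_i of c_i^T x_i + <lam, A_i x_i>."""
    w = c_i + A_i.T @ lam  # coefficient vector of the (linear) local Lagrangian
    # Each coordinate is minimized independently at whichever bound gives the smaller value.
    return float(np.sum(np.minimum(w * lo_i, w * hi_i)))


def dual_function(lam, c, A, lo, hi, b):
    """g(lam) = sum_i g_i(lam) - <b, lam>; each g_i can be evaluated by agent i alone."""
    return sum(g_i(lam, c[i], A[i], lo[i], hi[i]) for i in range(len(A))) - float(b @ lam)


def dual_function_brute_force(lam, c, A, lo, hi, b):
    """Minimize L(x, lam) over all vertices of X (valid because L is linear in x here)."""
    lo_all, hi_all = np.concatenate(lo), np.concatenate(hi)
    c_all, A_all = np.concatenate(c), np.hstack(A)
    best = np.inf
    for choice in itertools.product([0, 1], repeat=lo_all.size):
        x = np.where(np.array(choice) == 1, hi_all, lo_all)
        best = min(best, float(c_all @ x + lam @ (A_all @ x - b)))
    return best


rng = np.random.default_rng(0)
A = [rng.standard_normal((3, 2)), rng.standard_normal((3, 2))]  # N = 2 agents, m = 3
c = [rng.standard_normal(2), rng.standard_normal(2)]
lo = [-np.ones(2), -np.ones(2)]
hi = [np.ones(2), 2 * np.ones(2)]
b = rng.standard_normal(3)
lam = rng.standard_normal(3)
print(np.isclose(dual_function(lam, c, A, lo, hi, b),
                 dual_function_brute_force(lam, c, A, lo, hi, b)))  # True
```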

Such dual methods suffer from well-documented disadvantages, the most notable one being their exceedingly slow convergence rates due to the nondifferentiability of the dual function (4). These drawbacks can be alleviated by the AL framework [2, Chapter 2.1]. The AL is obtained by adding a quadratic penalty term to the ordinary Lagrangian. The AL associated with problem (II) is

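$$\Lambda_{\rho}({\bf x},\boldsymbol{\lambda})=\sum_{i=1}^{N}f_{i}({\bf x}_{i})+\Big\langle\boldsymbol{\lambda},\sum_{i=1}^{N}{\bf A}_{i}{\bf x}_{i}-{\bf b}\Big\rangle+\frac{\rho}{2}\Big\|\sum_{i=1}^{N}{\bf A}_{i}{\bf x}_{i}-{\bf b}\Big\|^{2},\tag{6}$$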

where $\rho>0$ is a penalty parameter. We recall that the standard Augmented Lagrangian method is also referred to as the Method of Multipliers in the literature [2, Chapter 2.1]. A major drawback of the Augmented Lagrangian Method stems from the fact that (6) is not separable with respect to each ${\bf x}_{i}$ due to the additional quadratic penalty term.

II-B The ADAL Algorithm

The lack of decomposability of the AL calls for the development of specialized AL decomposition techniques. ADAL is a primal-dual iterative method utilizing a local AL function $\Lambda_{\rho}^{i}$ which is defined as:

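$$\Lambda_{\rho}^{i}({\bf x}_{i},{\bf x}_{-i}^{k},\boldsymbol{\lambda})=f_{i}({\bf x}_{i})+\langle\boldsymbol{\lambda},{\bf A}_{i}{\bf x}_{i}\rangle+\frac{\rho}{2}\Big\|{\bf A}_{i}{\bf x}_{i}+\sum_{j\in\mathcal{I}}^{j\neq i}{\bf A}_{j}{\bf x}_{j}^{k}-{\bf b}\Big\|^{2},\tag{7}$$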

where ${\bf x}_{-i}^{k}=[{\bf x}_{1}^{k},\ldots,{\bf x}_{i-1}^{k},{\bf x}_{i+1}^{k},\ldots,{\bf x}_{N}^{k}]^{\top}$ . The ADAL method is summarized in Alg. 1. ADAL has two parameters: a positive penalty parameter $\rho$ and a stepsize parameter $\tau\in(0,1/q)$ . Each iteration of ADAL consists of three steps: i) every agent solves a local subproblem in a parallel fashion based on the local approximation of the AL in (7); ii) the agents update and communicate their primal variables to neighboring agents; and iii) they update their dual variables based on the values of the communicated primal variables.
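For concreteness, the three steps of Alg. 1 read $\hat{\bf x}_{i}^{k}=\arg\min_{{\bf x}_{i}\in\mathcal{X}_{i}}\Lambda_{\rho}^{i}({\bf x}_{i},{\bf x}_{-i}^{k},\boldsymbol{\lambda}^{k})$ (the local subproblem (8)), ${\bf x}_{i}^{k+1}={\bf x}_{i}^{k}+\tau(\hat{\bf x}_{i}^{k}-{\bf x}_{i}^{k})$ , and $\boldsymbol{\lambda}^{k+1}=\boldsymbol{\lambda}^{k}+\rho\tau(\sum_{i=1}^{N}{\bf A}_{i}{\bf x}_{i}^{k+1}-{\bf b})$ (the dual update (10)). The NumPy sketch below runs these steps on a hypothetical special case chosen so that (8) is exactly solvable, as (A3) requires: each agent owns a scalar $x_{i}$ with $f_{i}(x_{i})=\frac{1}{2}(x_{i}-c_{i})^{2}$ and $\mathcal{X}_{i}=[lo,hi]$, so (8) is a one-dimensional convex quadratic whose minimizer over an interval is its unconstrained minimizer clipped to the interval. The function name and test problem are ours, and the subproblems are solved in a plain loop even though in ADAL they run in parallel.

```python
import numpy as np


def adal_scalar_quadratic(A, b, c, lo, hi, rho=1.0, tau=None, iters=2000):
    """ADAL (Alg. 1) for min sum_i 0.5*(x_i - c_i)^2  s.t.  A x = b,  lo <= x_i <= hi.

    Hypothetical special case: agent i owns a single scalar x_i and the column A[:, i].
    Returns the last iterate x^k, the multipliers lambda^k, and the ergodic average of x_hat.
    """
    m, N = A.shape
    q = int(np.max(np.count_nonzero(A, axis=1)))  # maximum number of agents per constraint
    if tau is None:
        tau = 0.9 / q  # ADAL requires tau in (0, 1/q)
    x, lam = np.zeros(N), np.zeros(m)
    x_hat_sum = np.zeros(N)
    for _ in range(iters):
        x_hat = np.empty(N)
        for i in range(N):  # independent subproblems (8); ADAL solves them in parallel
            a_i = A[:, i]
            s = A @ x - a_i * x[i] - b  # sum_{j != i} A_j x_j^k - b, held fixed by agent i
            # Zero of the derivative of 0.5(x-c_i)^2 + <lam, a_i x> + rho/2 ||a_i x + s||^2,
            # clipped to [lo, hi]: exact for a 1-D convex quadratic over an interval.
            x_free = (c[i] - a_i @ lam - rho * (a_i @ s)) / (1.0 + rho * (a_i @ a_i))
            x_hat[i] = np.clip(x_free, lo, hi)
        x_hat_sum += x_hat
        x = x + tau * (x_hat - x)            # primal update
        lam = lam + rho * tau * (A @ x - b)  # dual update (10)
    return x, lam, x_hat_sum / iters


x, lam, x_avg = adal_scalar_quadratic(np.array([[1.0, 1.0, 1.0]]), np.array([1.0]),
                                      np.array([1.0, 2.0, 3.0]), lo=-10.0, hi=10.0)
print(x, lam)  # should approach x = (-2/3, 1/3, 4/3) and lambda = 5/3
```

With the single constraint $x_{1}+x_{2}+x_{3}=1$ and ${\bf c}=(1,2,3)$ , the solution is $x_{i}=c_{i}-\lambda^{*}$ with $\lambda^{*}=5/3$ , and the box $[-10,10]$ is inactive. On this strongly convex toy problem the last iterate converges quickly, while the residual of the ergodic average `x_avg` (the quantity bounded in Theorems 1 and 2) shrinks roughly like $1/k$.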

We emphasize here that the quantities ${\bf A}_{j}{\bf x}_{j}^{k}$ , appearing in the penalty term of the local AL (7), correspond to the local primal variables of agent $j$ that are communicated to agent $i$ . With respect to agent $i$ , these are considered fixed parameters. The penalty term of each $\Lambda_{\rho}^{i}$ can be equivalently expressed as

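$$\Big\|{\bf A}_{i}{\bf x}_{i}+\sum_{j\in\mathcal{I}}^{j\neq i}{\bf A}_{j}{\bf x}_{j}^{k}-{\bf b}\Big\|^{2}=\sum_{l=1}^{m}\Big(\big[{\bf A}_{i}{\bf x}_{i}\big]_{l}+\sum_{j\in\mathcal{I}}^{j\neq i}\big[{\bf A}_{j}{\bf x}_{j}^{k}\big]_{l}-b_{l}\Big)^{2}.$$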

The above penalty term is present only in the minimization computation (8), in Alg. 1. Hence, for those $l$ such that $[{\bf A}_{i}]_{l}={\mathbf{0}}$ , the terms $\sum_{j\in\mathcal{I}}^{j\neq i}\big{[}{\bf A}_{j}{\bf x}_{j}^{k}\big{]}_{l}-b_{l}$ are just constant terms in the minimization step, and can be neglected. Here, $[{\bf A}_{i}]_{l}$ denotes the $l$ -th row of ${\bf A}_{i}$ and ${\bf 0}$ stands for a zero vector of proper dimension. This implies that agent $i$ needs access only to the decisions $\big{[}{\bf A}_{j}{\bf x}_{j}^{k}\big{]}_{l}$ from all agents $j\neq i$ that are present in the same constraints $l$ as $i$ . Moreover, regarding the term $\langle\boldsymbol{\lambda},{\bf A}_{i}{\bf x}_{i}\rangle$ in (7), we have that $\langle\boldsymbol{\lambda},{\bf A}_{i}{\bf x}_{i}\rangle~{}=~{}\sum_{j=1}^{m}\lambda_{j}[{\bf A}_{i}{\bf x}_{i}]_{j}$ . Hence, we see that, in order to compute (8), each agent $i$ needs access only to those $\lambda_{j}$ for which $[{\bf A}_{i}]_{j}\neq{\mathbf{0}}$ .
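This observation can be checked numerically. The NumPy sketch below (the helper `local_al` and all data are hypothetical) evaluates the local AL (7) once with all $m$ constraint rows and once with only the rows $l$ for which $[{\bf A}_{i}]_{l}\neq\mathbf{0}$ . The two versions differ by a constant in ${\bf x}_{i}$ , so their differences between any two candidate points coincide and the minimizers of (8) are the same.

```python
import numpy as np


def local_al(i, x_i, blocks, x_prev, lam, b, rho, f_i, rows=None):
    """Local AL (7) of agent i, optionally restricted to a subset of constraint rows.

    blocks[j] is A_j (shape m x n_j) and x_prev[j] is x_j^k; the other agents' terms are fixed.
    """
    others = sum(blocks[j] @ x_prev[j] for j in range(len(blocks)) if j != i) - b
    if rows is None:
        rows = np.arange(b.size)  # use every constraint row
    Ax_i = blocks[i][rows] @ x_i
    return f_i(x_i) + lam[rows] @ Ax_i + 0.5 * rho * np.sum((Ax_i + others[rows]) ** 2)


def f_0(x):
    """Any local cost works for this check; an l1 norm is used here."""
    return float(np.sum(np.abs(x)))


rng = np.random.default_rng(1)
m = 4
blocks = [rng.standard_normal((m, 2)) for _ in range(3)]
blocks[0][[1, 3], :] = 0.0  # agent 0 takes no part in constraints 1 and 3
x_prev = [rng.standard_normal(2) for _ in range(3)]
lam, b, rho = rng.standard_normal(m), rng.standard_normal(m), 2.0

rows_0 = np.flatnonzero(np.any(blocks[0] != 0, axis=1))  # rows l with [A_0]_l != 0
u, v = rng.standard_normal(2), rng.standard_normal(2)     # two candidate values of x_0
full = (local_al(0, u, blocks, x_prev, lam, b, rho, f_0)
        - local_al(0, v, blocks, x_prev, lam, b, rho, f_0))
restricted = (local_al(0, u, blocks, x_prev, lam, b, rho, f_0, rows_0)
              - local_al(0, v, blocks, x_prev, lam, b, rho, f_0, rows_0))
print(rows_0, np.isclose(full, restricted))  # [0 2] True
```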

II-C A Motivating Example: Distributed Model Predictive Control (DMPC) with linear coupling constraints

Consider a discrete-time linear dynamical system expressed in terms of the dynamics of a set $\mathcal{I}=\{1,\dots,N\}$ of individual subsystems as

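$$\begin{aligned}{\bf x}_{i}^{t+1}&=\sum_{j\in\mathcal{C}_{i}^{t}}\left({\bf A}_{ij}^{t}{\bf x}_{j}^{t}+\mathbf{B}_{ij}^{t}\mathbf{u}_{j}^{t}\right),\\ {\bf x}_{i}^{t}&\in\mathcal{X}_{i}^{t},\quad\mathbf{u}_{i}^{t}\in\mathcal{U}_{i}^{t},\quad\forall i\in\mathcal{I},\end{aligned}$$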

where ${\bf x}_{i}^{t}\in\mathcal{X}_{i}^{t}\subseteq\mathbb{R}^{n_{i}}$ and $\mathbf{u}_{i}^{t}\in\mathcal{U}_{i}^{t}\subseteq\mathbb{R}^{p_{i}}$ represent the local state and input of subsystem $i$ at time $t$ . We assume that the local constraint sets $\mathcal{X}_{i}^{t},~{}\mathcal{U}_{i}^{t}$ satisfy $\mathcal{X}^{t}=\mathcal{X}_{1}^{t}\times\cdots\times\mathcal{X}_{N}^{t}$ , $\mathcal{U}^{t}=\mathcal{U}_{1}^{t}\times\cdots\times\mathcal{U}_{N}^{t}$ , and $n=\sum_{i\in\mathcal{I}}n_{i}$ , $p=\sum_{i\in\mathcal{I}}p_{i}$ . The dynamic interconnections at time $t$ among the subsystems are modeled by a directed graph $\mathcal{G}^{t}=(\mathcal{I},\mathcal{E}^{t})$ . The set of edges $\mathcal{E}^{t}\subseteq\mathcal{I}\times\mathcal{I}$ contains a directed edge $(v_{i},v_{j})$ if the state or input of subsystem $i$ at time $t$ affects the dynamics of subsystem $j$ at time $t+1$ . More formally, $(v_{j},v_{i})\in\mathcal{E}^{t}$ if and only if ${\bf A}_{ij}^{t}\neq 0~{}\vee~{}\mathbf{B}_{ij}^{t}\neq 0$ , where the matrices ${\bf A}_{ij}^{t}\in\mathbb{R}^{n_{i}\times n_{j}}$ and $\mathbf{B}_{ij}^{t}\in\mathbb{R}^{n_{i}\times p_{j}}$ define the dynamic coupling between subsystems $i$ and $j$ at time $t$ . We define the coupling in-neighborhood $\mathcal{C}_{i}^{t}$ (resp. out-neighborhood $\tilde{\mathcal{C}}_{i}^{t}$ ) of subsystem $i$ at time $t$ as the set of subsystems $j$ whose states or inputs at time $t$ affect the evolution of subsystem $i$ (resp. whose evolution is affected by subsystem $i$ ), i.e., $\mathcal{C}_{i}^{t}=\{j\in\mathcal{I}:(v_{j},v_{i})\in\mathcal{E}^{t}\}$ (resp. $\tilde{\mathcal{C}}_{i}^{t}=\{j\in\mathcal{I}:(v_{i},v_{j})\in\mathcal{E}^{t}\}$ ).
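The in- and out-neighborhoods can be read off the sparsity pattern of the coupling blocks. The NumPy sketch below builds $\mathcal{C}_{i}^{t}$ and $\tilde{\mathcal{C}}_{i}^{t}$ for a single time step, following the rule that $(v_{j},v_{i})\in\mathcal{E}^{t}$ if and only if ${\bf A}_{ij}^{t}\neq 0$ or $\mathbf{B}_{ij}^{t}\neq 0$ . The function name and the data layout (nested lists of blocks, with `None` marking an absent block) are hypothetical. The example is a three-subsystem chain, indexed from 0, in which subsystem $i$ drives subsystem $i+1$ and every subsystem has its own input.

```python
import numpy as np


def neighborhoods(A_blocks, B_blocks):
    """In- and out-neighborhoods of every subsystem at a single time step t.

    A_blocks[i][j] and B_blocks[i][j] hold A_ij^t and B_ij^t, or None if the block is absent.
    Returns lists C, C_tilde with C[i] = {j : j affects i} and C_tilde[i] = {j : i affects j}.
    """
    N = len(A_blocks)

    def nonzero(M):
        return M is not None and bool(np.any(M != 0))

    # (v_j, v_i) is an edge iff A_ij^t != 0 or B_ij^t != 0, i.e. coupled[i, j] is True.
    coupled = np.array([[nonzero(A_blocks[i][j]) or nonzero(B_blocks[i][j])
                         for j in range(N)] for i in range(N)], dtype=bool)
    C = [set(np.flatnonzero(coupled[i]).tolist()) for i in range(N)]
    C_tilde = [set(np.flatnonzero(coupled[:, i]).tolist()) for i in range(N)]
    return C, C_tilde


# Chain of three subsystems: subsystem i drives i + 1, and each has its own input.
I2 = np.eye(2)
A_blocks = [[I2, None, None], [I2, I2, None], [None, I2, I2]]
B_blocks = [[I2 if i == j else None for j in range(3)] for i in range(3)]
C, C_tilde = neighborhoods(A_blocks, B_blocks)
print(C)        # [{0}, {0, 1}, {1, 2}]
print(C_tilde)  # [{0, 1}, {1, 2}, {2}]
```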

Determining optimal control sequences for (II-C) using MPC amounts to solving online a finite-horizon open-loop optimal control problem subject to the aforementioned system dynamics and to constraints on the states and control inputs. Specifically, the MPC problem for the dynamical system (II-C) is parametrized by the initial state ${\bf x}^{1}$ and can be formulated as

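$$\begin{aligned}\min_{{\bf x},\mathbf{u}}\quad&\sum_{i\in\mathcal{I}}\Big[\sum_{t=1}^{H-1}\ell_{i}^{t}({\bf x}_{i}^{t},\mathbf{u}_{i}^{t})+\mathcal{F}_{i}({\bf x}_{i}^{H})\Big]\\ \text{subject to}\quad&{\bf x}_{i}^{t+1}=\sum_{j\in\mathcal{C}_{i}^{t}}\left({\bf A}_{ij}^{t}{\bf x}_{j}^{t}+\mathbf{B}_{ij}^{t}\mathbf{u}_{j}^{t}\right),\\ &{\bf x}_{i}^{t+1}\in\mathcal{X}_{i}^{t+1},\quad\mathbf{u}_{i}^{t}\in\mathcal{U}_{i}^{t},\\ &\forall i\in\mathcal{I}\ \text{and}\ t\in\{1,\dots,H-1\},\end{aligned}$$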

where the functions $\ell^{t}_{i}({\bf x}_{i}^{t},\mathbf{u}_{i}^{t}):\mathbb{R}^{n_{i}}\times\mathbb{R}^{p_{i}}\to\mathbb{R}$ denote the running cost and the function $\mathcal{F}_{i}({\bf x}_{i}^{H}):\mathbb{R}^{n_{i}}\to\mathbb{R}$ denotes the terminal cost of subsystem $i$ .

To use the ADAL framework in Alg. 1 to solve (II-C), we introduce a local AL for each subsystem $i$ as

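$$\begin{aligned}\Lambda_{\rho}^{i}({\bf x}_{i},\mathbf{u}_{i},\boldsymbol{\lambda})={}&\sum_{t=1}^{H-1}\ell_{i}^{t}({\bf x}_{i}^{t},\mathbf{u}_{i}^{t})+\mathcal{F}_{i}({\bf x}_{i}^{H})\\ &+\sum_{t=1}^{H-1}\Bigg[(\boldsymbol{\lambda}_{i}^{t+1})^{T}{\bf x}_{i}^{t+1}-\sum_{j\in\tilde{\mathcal{C}}_{i}^{t}}(\boldsymbol{\lambda}_{j}^{t+1})^{T}\left({\bf A}_{ji}^{t}{\bf x}_{i}^{t}+\mathbf{B}_{ji}^{t}\mathbf{u}_{i}^{t}\right)\\ &\qquad+\frac{\rho}{2}\Big\|{\bf x}_{i}^{t+1}-{\bf A}_{ii}^{t}{\bf x}_{i}^{t}-\mathbf{B}_{ii}^{t}\mathbf{u}_{i}^{t}-\sum_{j\in\mathcal{C}_{i}^{t}\backslash\{i\}}\left({\bf A}_{ij}^{t}\tilde{\mathbf{x}}_{j}^{t}+\mathbf{B}_{ij}^{t}\tilde{\mathbf{u}}_{j}^{t}\right)\Big\|^{2}\\ &\qquad+\sum_{j\in\tilde{\mathcal{C}}_{i}^{t}}\frac{\rho}{2}\Big\|\tilde{\mathbf{x}}_{j}^{t+1}-{\bf A}_{ji}^{t}{\bf x}_{i}^{t}-\mathbf{B}_{ji}^{t}\mathbf{u}_{i}^{t}-\sum_{m\in\mathcal{C}_{j}^{t}\backslash\{i\}}\left({\bf A}_{jm}^{t}\tilde{\mathbf{x}}_{m}^{t}+\mathbf{B}_{jm}^{t}\tilde{\mathbf{u}}_{m}^{t}\right)\Big\|^{2}\Bigg],\end{aligned}\tag{13}$$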

where $\tilde{\mathbf{x}}_{j},\tilde{\mathbf{u}}_{j}$ denote the primal variables that are controlled by subsystem $j$ but communicated to subsystem $i$ for optimization of its local Lagrangian $\Lambda_{\rho}^{i}$ . With respect to subsystem $i$ , these are just considered as fixed parameters. That is, the local AL is created by taking all the terms involving ${\bf x}_{i}$ in the original AL and setting the remaining variables as fixed parameters, i.e., ${\bf x}_{j}$ as $\tilde{{\bf x}}_{j}$ for all $i\neq j$ .

Observe that the local AL (13) of each subsystem $i$ includes only locally available information. Regarding the dual variables, the necessary information includes $\boldsymbol{\lambda}_{i}^{t+1}$ and all $\boldsymbol{\lambda}_{j}^{t+1}$ for every $t\in\{1,\dots,H-1\}$ and $j\in\tilde{\mathcal{C}}_{i}^{t}$ , i.e., the dual variables corresponding to the dynamical constraints of $i$ and also those of the out-neighbors of subsystem $i$ in all coupling graphs $\mathcal{E}^{t}$ . Regarding the primal variables, the necessary information for the local AL of subsystem $i$ includes all $\tilde{\mathbf{x}}^{t}_{j},~{}\tilde{\mathbf{u}}^{t}_{j}$ for every $t\in\{1,\dots,H\}$ from the in-neighbors $j\in\mathcal{C}_{i}^{t}$ , the out-neighbors $j\in\tilde{\mathcal{C}}_{i}^{t}$ , and the in-neighbors of the out-neighbors of $i$ , namely $\{m\in\mathcal{I}:m\in\mathcal{C}_{j}^{t}\text{ for some }j\in\tilde{\mathcal{C}}_{i}^{t}\}$ for all the coupling graphs $\mathcal{E}^{t}$ . In other words, each subsystem $i$ needs to be able to exchange messages with all subsystems $j$ that belong to its 2-hop communication neighborhood $\mathcal{I}_{i}=\bigcup_{t=1}^{H}\left(\mathcal{C}_{i}^{t}\cup\tilde{\mathcal{C}}_{i}^{t}\cup\{m\in\mathcal{I}:m\in\mathcal{C}_{j}^{t}\text{ for some }j\in\tilde{\mathcal{C}}_{i}^{t}\}\right)$ .
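The communication set $\mathcal{I}_{i}$ can be assembled directly from the neighborhoods. In the plain-Python sketch below (the function name is hypothetical), the in-neighbors of the out-neighbors of $i$ are collected as a union over all $j\in\tilde{\mathcal{C}}_{i}^{t}$ , which matches the terms ${\bf A}_{jm}^{t}\tilde{\mathbf{x}}_{m}^{t}$ with $m\in\mathcal{C}_{j}^{t}\backslash\{i\}$ appearing in (13). In the example (indices from 0), subsystems 0 and 2 both drive subsystem 1, so subsystem 0 must also communicate with subsystem 2 although the two are not coupled directly; subsystem 3 is isolated.

```python
def two_hop_neighborhood(i, C_seq, C_tilde_seq):
    """Communication set I_i of subsystem i over the horizon.

    C_seq[t][j] and C_tilde_seq[t][j] are the in- and out-neighborhoods of subsystem j at step t.
    """
    I_i = set()
    for C, C_tilde in zip(C_seq, C_tilde_seq):
        I_i |= C[i] | C_tilde[i]  # direct in- and out-neighbors
        for j in C_tilde[i]:
            I_i |= C[j]           # in-neighbors of each out-neighbor j
    return I_i


# One time step: subsystems 0 and 2 both drive subsystem 1; subsystem 3 is isolated.
C_seq = [[{0}, {0, 1, 2}, {2}, {3}]]
C_tilde_seq = [[{0, 1}, {1}, {1, 2}, {3}]]
print(two_hop_neighborhood(0, C_seq, C_tilde_seq))  # {0, 1, 2}
```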

In practice (II-C) is solved repeatedly: after each solve, the first few inputs are applied to the system (II-C) and the horizon is shifted accordingly, providing a new initial condition for the subsequent solution of (II-C). In this framework, solving (II-C) until convergence is time consuming. Therefore, it is highly desirable to terminate the algorithm early while still ensuring a good-quality solution.

III Rate of Convergence

In this section we characterize the rate of convergence of the ADAL method. In what follows, we denote a subgradient of a convex function $f$ at a point ${\bf x}\in\mathcal{X}$ by $\mathbf{s}_{{\bf x}}$ , i.e., a vector $\mathbf{s}_{{\bf x}}\in\mathbb{R}^{n}$ is a subgradient of $f$ at ${\bf x}\in\mathcal{X}$ if

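$$f(\tilde{\bf x})\geq f({\bf x})+\langle\mathbf{s}_{{\bf x}},\tilde{\bf x}-{\bf x}\rangle,\quad\forall\tilde{\bf x}\in\mathcal{X}.$$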

We also denote the convex subdifferential of $f$ at ${\bf x}\in\mathcal{X}$ by $\partial f({\bf x})$ , which is the set of all subgradients $\mathbf{s}_{{\bf x}}$ .

The convergence of ADAL relies on the following three assumptions, which are typically required in the analysis of convex optimization methods:

(A1)

The functions $f_{i}$ are convex, and the sets $\mathcal{X}_{i}$ are convex, closed, and bounded for all $i\in\mathcal{I}$ .

(A2)

The Lagrangian function $L$ has a saddle point $({\bf x}^{*},\boldsymbol{\lambda}^{*})\in\mathbb{R}^{n}\times\mathbb{R}^{m}$ so that

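$$L({\bf x}^{*},\boldsymbol{\lambda})\leq L({\bf x}^{*},\boldsymbol{\lambda}^{*})\leq L({\bf x},\boldsymbol{\lambda}^{*}),\quad\forall{\bf x}\in\mathcal{X},\ \boldsymbol{\lambda}\in\mathbb{R}^{m}.$$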

(A3)

All subproblems (8) are exactly solvable at every iteration.

Assumption (A1) implies that there exists a constant $D_{\mathcal{X}}$ such that

$$D_{\mathcal{X}}:=\max_{{\bf x},\tilde{\bf x}\in\mathcal{X}}\|{\bf x}-\tilde{\bf x}\|,\tag{14}$$

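and also that the subgradients of the functions $f_{i}$ are bounded on the feasible set (equivalently, the $f_{i}$ are Lipschitz continuous there), because a real-valued convex function has bounded subdifferentials on bounded sets; i.e., there exists a constant $G$ such that for all $i\in\mathcal{I}$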

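$$\|\mathbf{s}_{{\bf x}}\|\leq G,\quad\forall\mathbf{s}_{{\bf x}}\in\partial f_{i}({\bf x}),\ {\bf x}\in\mathcal{X}.\tag{15}$$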
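For simple sets these constants are easy to compute. The NumPy sketch below assumes that the feasible set is a box $\{{\bf x}:{\bf l}\leq{\bf x}\leq{\bf u}\}$ (a product of boxes is again a box), for which $D_{\mathcal{X}}=\|{\bf u}-{\bf l}\|$ , and takes the illustrative quadratic $f({\bf x})=\frac{1}{2}\|{\bf x}-{\bf c}\|^{2}$ , whose only subgradient is ${\bf x}-{\bf c}$ ; the bound $G$ is then attained at the vertex of the box farthest from ${\bf c}$ . The function names and data are hypothetical.

```python
import numpy as np


def box_diameter(lo, hi):
    """D_X for the box {x : lo <= x <= hi}; two opposite corners are farthest apart."""
    return float(np.linalg.norm(np.asarray(hi) - np.asarray(lo)))


def quadratic_subgradient_bound(lo, hi, c):
    """G = max over the box of ||grad f(x)|| for f(x) = 0.5*||x - c||^2 (gradient x - c).

    ||x - c||^2 is separable, so each coordinate is maximized at the bound farther from c_k.
    """
    lo, hi, c = map(np.asarray, (lo, hi, c))
    return float(np.linalg.norm(np.maximum(np.abs(lo - c), np.abs(hi - c))))


print(box_diameter([0.0, 0.0], [1.0, 2.0]))                            # sqrt(5), about 2.236
print(quadratic_subgradient_bound([0.0, 0.0], [1.0, 2.0], [0.5, 0.0]))  # sqrt(4.25), about 2.062
```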

Assumption (A2) implies that the point ${\bf x}^{*}$ is a solution of problem (II) and the point $\boldsymbol{\lambda}^{*}$ is a solution of (5). Since (II) is a convex program with linear constraints, strong duality holds, i.e., the optimal values of the primal and dual problems are equal, as long as (II) is feasible, without the need for any constraint qualification. Assumption (A3) is satisfied for most MPC problems, see e.g., [19, Section V], or for general problems with simple constraint sets $\mathcal{X}$ , e.g., boxes or balls.

III-A Lemmas

In this subsection, we provide a few lemmas that will help us prove the convergence of ADAL. Our analysis relies on the ergodic average of the primal variables up to iteration $k$ :

$$\tilde{\bf x}^{k}:=\frac{1}{k}\sum_{p=0}^{k-1}\hat{\bf x}^{p}.$$

To avoid cluttering the notation, we will use $\sum_{i}$ to denote summation over all $i\in\mathcal{I}$ , i.e., $\sum_{i}=\sum_{i=1}^{N}$ , unless explicitly noted otherwise. We define the residual ${\bf r}({\bf x})\in\mathbb{R}^{m}$ as the vector containing the amount of all constraint violations with respect to primal variable ${\bf x}$ , i.e.,

$${\bf r}({\bf x})=\sum_{i}{\bf A}_{i}{\bf x}_{i}-{\bf b}.$$

We also define the auxiliary dual variable $\bar{\boldsymbol{\lambda}}^{k}$ as

$$\bar{\boldsymbol{\lambda}}^{k}:=\boldsymbol{\lambda}^{k}+\rho(1-\tau)\,{\bf r}({\bf x}^{k}).$$

In the next lemma, we obtain an iterative relation for $\bar{\boldsymbol{\lambda}}^{k}$ . The proof can be found in [15, Theorem 1].

Lemma 1

The dual update step (10) of ADAL is equivalent to the update rule

$$\bar{\boldsymbol{\lambda}}^{k+1}=\bar{\boldsymbol{\lambda}}^{k}+\tau\rho\,{\bf r}(\hat{\bf x}^{k}).$$
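Lemma 1 is a purely algebraic identity. Because ${\bf r}$ is affine, the primal step gives ${\bf r}({\bf x}^{k+1})=(1-\tau){\bf r}({\bf x}^{k})+\tau{\bf r}(\hat{\bf x}^{k})$ , so $\bar{\boldsymbol{\lambda}}^{k+1}=\boldsymbol{\lambda}^{k}+\rho{\bf r}({\bf x}^{k+1})=\boldsymbol{\lambda}^{k}+\rho(1-\tau){\bf r}({\bf x}^{k})+\rho\tau{\bf r}(\hat{\bf x}^{k})=\bar{\boldsymbol{\lambda}}^{k}+\tau\rho{\bf r}(\hat{\bf x}^{k})$ . The short NumPy check below confirms this numerically. Since the identity does not depend on how $\hat{\bf x}^{k}$ is computed, random vectors stand in for the subproblem solutions; this substitution is an assumption of the sketch, not part of ADAL.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, rho, tau = 3, 5, 1.5, 0.2
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)


def r(x):
    """Residual r(x) = A x - b."""
    return A @ x - b


x, lam = rng.standard_normal(n), rng.standard_normal(m)
lam_bar = lam + rho * (1 - tau) * r(x)  # auxiliary multiplier at k = 0
max_gap = 0.0
for k in range(20):
    x_hat = rng.standard_normal(n)                  # stands in for the minimizers of (8)
    x = x + tau * (x_hat - x)                       # primal step of Alg. 1
    lam = lam + rho * tau * r(x)                    # dual step (10)
    lam_bar_pred = lam_bar + tau * rho * r(x_hat)   # recursion of Lemma 1
    lam_bar = lam + rho * (1 - tau) * r(x)          # definition evaluated at k + 1
    max_gap = max(max_gap, float(np.max(np.abs(lam_bar - lam_bar_pred))))
print(max_gap < 1e-10)  # True
```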

In the next lemma, we utilize Lemma 1 and the first order optimality conditions for each local subproblem (8) to bound the function value at each iteration, which later will allow us to obtain a telescoping sum. For this, we make use of the Lyapunov/Merit function

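$$\phi^{k}(\boldsymbol{\lambda})=\sum_{i=1}^{N}\rho\|{\bf A}_{i}({\bf x}_{i}^{k}-{\bf x}_{i}^{*})\|^{2}+\frac{1}{\rho}\|\bar{\boldsymbol{\lambda}}^{k}-\boldsymbol{\lambda}\|^{2},\tag{18}$$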

for all $k\geq 0$ and any $\boldsymbol{\lambda}\in\mathbb{R}^{m}$ . A similar result whose Lyapunov/Merit function $\phi^{k}$ does not depend on $\boldsymbol{\lambda}$ can be found in [16]. Note that the dependence of $\phi^{k}(\boldsymbol{\lambda})$ on $\boldsymbol{\lambda}$ is key to obtaining the convergence rates presented in this paper.

Lemma 2

Assume (A1)–(A3). Then, for any $\boldsymbol{\lambda}\in\mathbb{R}^{m}$ and $k\geq 0$ , the following holds:

[TABLE]

The proof of this lemma can be found in Appendix -A.

III-B Primal Optimality and Feasibility

Using Lemma 2 and the properties of convex functions, we now provide two theorems regarding the convergence rate of ADAL. More specifically, in Theorem 1, we consider the objective value difference $F(\tilde{\bf x}^{k})-F({\bf x}^{*})$ and the constraint violation $\|{\bf A}\tilde{\bf x}^{k}-\bf b\|$ together and show that their sum decreases at a worst-case $O(1/k)$ rate. In Theorem 2, we upper bound the objective value difference and constraint violation separately, and show that each one of them decreases at a worst-case $O(1/k)$ rate.

Theorem 1

Assume (A1)–(A3). Recall that $\tilde{\bf x}^{k}=\frac{1}{k}\sum_{p=0}^{k-1}\hat{\mathbf{x}}^{p}$ denotes the ergodic average of the primal variable sequence generated by ADAL up to iteration $k$ and ${\bf r}({\bf x})={\bf A}{\bf x}-{\bf b}$ denotes the residual at ${\bf x}$ . Then, for all $k$

[TABLE]

where $\phi=\sum\nolimits_{i=1}^{N}{\rho}\|{\bf A}_{i}({\bf x}_{i}^{0}-{\bf x}_{i}^{*})\|^{2}+\frac{1}{\rho}(\|\bar{\boldsymbol{\lambda}}^{0}\|+1)^{2}$ .

Proof.

Summing the relation in Lemma 2 for all $p=0,\dots,k-1$ , we get

[TABLE]

By the convexity of $F$ , we have that

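$$F(\tilde{\bf x}^{k})=F\Big(\frac{1}{k}\sum_{p=0}^{k-1}\hat{\bf x}^{p}\Big)\leq\frac{1}{k}\sum_{p=0}^{k-1}F(\hat{\bf x}^{p}),$$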

which implies that $\sum\nolimits_{p=0}^{k-1}F(\hat{{\bf x}}^{p})~{}\geq~{}kF(\tilde{\bf x}^{k})$ . Since ${\bf r}$ is affine, the analogous relation for the residual holds with equality, i.e., $\sum_{p=0}^{k-1}{\bf r}(\hat{\mathbf{x}}^{p})=k{\bf r}(\tilde{\bf x}^{k})$ . We also have that $\sum_{p=0}^{k-1}F({\bf x}^{*})=kF({\bf x}^{*})$ . Hence, (III-B) can be expressed as

[TABLE]

or,

[TABLE]

because for any $\boldsymbol{\lambda}\in\mathbb{R}^{m}$ , we have $\phi^{k}(\boldsymbol{\lambda})\geq 0$ .

The above inequality is true for all $\boldsymbol{\lambda}\in\mathbb{R}^{m}$ , hence it must also hold for any point in the ball $\mathcal{B}=\{\boldsymbol{\lambda}\mid\|\boldsymbol{\lambda}\|\leq 1\}$ . We now let $\boldsymbol{\lambda}=\tilde{\boldsymbol{\lambda}}^{k}\triangleq\operatorname*{arg\,max}_{\boldsymbol{\lambda}\in\mathcal{B}}\langle\boldsymbol{\lambda},{\bf r}(\tilde{\bf x}^{k})\rangle$ and rewrite the above relation as

[TABLE]

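where we used $\langle\tilde{\boldsymbol{\lambda}}^{k},{\bf r}(\tilde{\bf x}^{k})\rangle=\|{\bf r}(\tilde{\bf x}^{k})\|$ , which follows from the Cauchy-Schwarz inequality: the maximizer is $\tilde{\boldsymbol{\lambda}}^{k}={\bf r}(\tilde{\bf x}^{k})/\|{\bf r}(\tilde{\bf x}^{k})\|$ when ${\bf r}(\tilde{\bf x}^{k})\neq\mathbf{0}$ , and every $\boldsymbol{\lambda}\in\mathcal{B}$ attains the value $0$ otherwise. Finally, the term on the right-hand side can be bounded as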

[TABLE]

which gives the desired result. ∎

The importance of this bound is that the computational complexity can be specified in advance as long as the diameters of the primal constraint sets $\mathcal{X}_{i}$ can be determined. However, when the primal solution $\tilde{\bf x}^{k}$ is not feasible, it is possible that $F(\tilde{\bf x}^{k})-F^{*}<0$ , where $F^{*}=F({\bf x}^{*})$ . In this case, the bound in (19) can still be useful if the primal residual can be tightly bounded as pointed out in [26], i.e., if $\|\mathbf{A}\tilde{\bf x}^{k}-\mathbf{b}\|<\delta$ for a relatively small $\delta>0$ , then a lower bound of $F(\tilde{\bf x}^{k})-F^{*}$ is given by

$$F(\tilde{\bf x}^{k})-F^{*}\geq-\|\boldsymbol{\lambda}^{*}\|\,\delta,$$

where $\boldsymbol{\lambda}^{*}$ is a component of the saddle point $(\mathbf{x}^{*},\boldsymbol{\lambda}^{*})$ of (3).

Theorem 2

Assume (A1)–(A3). Recall that $\tilde{\bf x}^{k}=\frac{1}{k}\sum_{p=0}^{k-1}\hat{\mathbf{x}}^{p}$ denotes the ergodic average of the primal variable sequence generated by ADAL up to iteration $k$ and ${\bf r}({\bf x})={\bf A}{\bf x}-{\bf b}$ denotes the residual at ${\bf x}$ . Let $({\bf x}^{*},\boldsymbol{\lambda}^{*})$ be a saddle point of (3). Then, for all $k$

(a)

[TABLE]

where $\phi^{0}(\boldsymbol{\lambda})=\sum\nolimits_{i=1}^{N}{\rho}\|{\bf A}_{i}({\bf x}_{i}^{0}-{\bf x}_{i}^{*})\|^{2}+\frac{1}{\rho}\|\bar{\boldsymbol{\lambda}}^{0}-\boldsymbol{\lambda}\|^{2}$ .

(b)

[TABLE]

Proof.

(a) The inequality (21) is true for all $\boldsymbol{\lambda}\in\mathbb{R}^{m}$ , hence letting $\boldsymbol{\lambda}=\mathbf{0}$ yields

[TABLE]

Let $\boldsymbol{\lambda}^{*}$ be a dual optimal solution. Then, from the saddle point inequality, we have

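$$F({\bf x}^{*})=L({\bf x}^{*},\boldsymbol{\lambda}^{*})\leq L(\tilde{\bf x}^{k},\boldsymbol{\lambda}^{*})=F(\tilde{\bf x}^{k})+\langle\boldsymbol{\lambda}^{*},{\bf r}(\tilde{\bf x}^{k})\rangle,$$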

which implies

[TABLE]

Next, we find an upper bound of the term $\langle\boldsymbol{\lambda}^{*},{\bf r}(\tilde{\bf x}^{k})\rangle$ . We add $\langle\boldsymbol{\lambda}^{*},{\bf r}(\tilde{\bf x}^{k})\rangle$ to both sides of (23) to obtain

[TABLE]

Using relation (21) again with $\boldsymbol{\lambda}=2\boldsymbol{\lambda}^{*}$ to bound the right-hand side of the above equation, we obtain

[TABLE]

Combining this with relation (24), we further obtain

[TABLE]

Combining this with relation (22), the desired result follows.

(b) We next bound the residual $\|{\bf r}(\tilde{\bf x}^{k})\|$ . Using relation (21) with $\boldsymbol{\lambda}=\boldsymbol{\lambda}^{*}+\frac{{\bf r}(\tilde{\bf x}^{k})}{\|{\bf r}(\tilde{\bf x}^{k})\|}$ , we have

[TABLE]

Using the saddle point inequality together with the fact that $({\bf x}^{*},\boldsymbol{\lambda}^{*})$ is a primal-dual optimal pair, we obtain

[TABLE]

which implies

[TABLE]

Combining this with relation (III-B), we obtain

[TABLE]

From the definition of the Lyapunov/Merit function $\phi^{k}(\boldsymbol{\lambda})$ in (18), the right-hand side can be represented as

[TABLE]

from which the desired result follows. ∎

Theorem 2 characterizes the suboptimality and infeasibility of the solution obtained when the algorithm is terminated before reaching the optimal solution. That is, the theoretical complexity for the algorithm to reach an $\epsilon$ -optimal solution both in terms of objective value and feasibility is $O(\frac{1}{\epsilon})$ iterations. This result is particularly useful for MPC applications, where re-optimization over shifted time horizons is frequently required in practice, as discussed in Section II-C. In order to explicitly specify the complexity in advance, however, these bounds require an estimate of the dual optimal solution $\boldsymbol{\lambda}^{*}$ .
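Once the constants on the right-hand sides of Theorem 2 are available (with $\boldsymbol{\lambda}^{*}$ replaced by an upper bound), an iteration budget can be fixed before the algorithm runs. The sketch below assumes, for illustration only, that both bounds have the form $C/k$ with known constants; `C_opt` and `C_feas` are placeholders, and the exact constants are the ones given in Theorem 2.

```python
import math


def iterations_for_accuracy(C_opt, C_feas, eps):
    """Smallest k such that C_opt / k <= eps and C_feas / k <= eps.

    C_opt and C_feas are placeholders for the constants of O(1/k) bounds such as those of
    Theorem 2; they must be computed (or upper bounded) from problem data beforehand.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return max(math.ceil(C_opt / eps), math.ceil(C_feas / eps), 1)


print(iterations_for_accuracy(5.0, 2.0, 0.5))   # 10
print(iterations_for_accuracy(5.0, 2.0, 0.25))  # 20: halving eps doubles the budget
```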

IV Certification of Complexity

In this section, we provide a valid upper bound on $\boldsymbol{\lambda}^{*}$ , the dual multiplier corresponding to the optimal solution ${\bf x}^{*}$ of problem (II).

Theorem 3

Assume (A1)-(A3). Let $({\bf x}^{*},\boldsymbol{\lambda}^{*})$ be a primal-dual optimal pair of (II) and (5). Then,

[TABLE]

where $\tilde{\sigma}_{\min}({\bf A})$ is the smallest nonzero singular value of ${\bf A}$ .

Proof.

Define a value function $\mathcal{V}:\mathbb{R}^{m}\to\mathbb{R}$ as

[TABLE]

By Lagrangian duality, this can be equivalently represented as

[TABLE]

Let the optimum in the expression above be attained at $\boldsymbol{\lambda}=\boldsymbol{\lambda}^{*}(\boldsymbol{\delta})$ . Then, $\boldsymbol{\lambda}^{*}(\boldsymbol{\delta})\in\partial\mathcal{V}(\boldsymbol{\delta})$ . Therefore, to bound the dual multiplier $\boldsymbol{\lambda}^{*}=\boldsymbol{\lambda}^{*}(\mathbf{0})$ , it suffices to bound the norm of every vector in $\partial\mathcal{V}(\mathbf{0})$ . Let $\mathbf{s}\in\partial\mathcal{V}(\mathbf{0})$ with $\mathbf{s}\neq\mathbf{0}$ (the case $\mathbf{s}=\mathbf{0}$ is trivial). Then, from the convexity of $\mathcal{V}(\cdot)$ , we have that for any $\epsilon>0$

[TABLE]

Let ${\bf x}_{\epsilon}^{*}$ be defined such that

[TABLE]

Then, we have ${\bf A}({\bf x}^{*}-{\bf x}_{\epsilon}^{*})=-\epsilon\frac{\mathbf{s}}{\|\mathbf{s}\|}$ , from which we obtain

[TABLE]

where $\tilde{\sigma}_{\min}({\bf A})$ is the smallest nonzero singular value of ${\bf A}$ . From (27) and (15), we obtain

[TABLE]

In view of this relation and (26), and the fact that $\mathbf{s}$ represents any arbitrary vector in $\partial\mathcal{V}(\mathbf{0})$ , we obtain the desired result. ∎
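The proof rests on two linear-algebra facts that are easy to check numerically: $\tilde{\sigma}_{\min}({\bf A})$ ignores zero singular values, and the minimum-norm solution ${\bf d}$ of ${\bf A}{\bf d}={\bf v}$ satisfies $\|{\bf d}\|\leq\|{\bf v}\|/\tilde{\sigma}_{\min}({\bf A})$ (the pseudoinverse has spectral norm $1/\tilde{\sigma}_{\min}({\bf A})$ ). The sketch below computes both with a truncated SVD; the relative tolerance `rtol` and all function names are our own assumptions. Because the display of Theorem 3 is missing from this excerpt, `dual_bound` encodes the form $\|\boldsymbol{\lambda}^{*}\|\leq G/\tilde{\sigma}_{\min}({\bf A})$ that the chain (26), (27), (15) suggests, with $G$ the Lipschitz constant appearing in Proposition 2; it is a hypothetical reading, not a quotation of the theorem. Whether ${\bf x}_{\epsilon}^{*}$ in the proof coincides with the minimum-norm preimage is likewise not visible here.

```python
import numpy as np


def svd_truncated(A, rtol=1e-10):
    """Thin SVD of A keeping only singular values > rtol * sigma_max (assumed tolerance)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s is sorted in descending order
    keep = s > rtol * s[0]
    return U[:, keep], s[keep], Vt[keep]


def sigma_tilde_min(A):
    """Smallest nonzero singular value of A."""
    return svd_truncated(A)[1][-1]


def min_norm_solution(A, v):
    """Minimum-norm d with A d = v, for v in range(A) (truncated pseudoinverse)."""
    U, s, Vt = svd_truncated(A)
    return Vt.T @ ((U.T @ v) / s)


def dual_bound(A, G):
    """HYPOTHETICAL reading of Theorem 3: ||lambda*|| <= G / sigma_tilde_min(A)."""
    return G / sigma_tilde_min(A)


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
A[2] = A[0] + A[1]                      # rank 2, so one singular value is (numerically) zero
for _ in range(5):
    v = A @ rng.standard_normal(6)      # any v in range(A)
    d = min_norm_solution(A, v)
    assert np.allclose(A @ d, v)
    assert np.linalg.norm(d) <= np.linalg.norm(v) / sigma_tilde_min(A) * (1 + 1e-9)
print("sigma_tilde_min =", sigma_tilde_min(A), " bound with G = 1:", dual_bound(A, 1.0))
```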

Using the bound above, the following two propositions give an explicit number of iterations for the ADAL method to obtain an $\epsilon$ -optimal solution, together with a choice of the algorithm parameter $\rho$ . Since the right-hand side of the bound in Theorem 1 depends on ${\bf x}^{*}$ , we further bound it from above using relation (14) as

[TABLE]

where we set $\bar{\boldsymbol{\lambda}}^{0}=\mathbf{0}$ .

Proposition 1

Assume (A1)-(A3). Let $\bar{\boldsymbol{\lambda}}^{0}=\mathbf{0}$ . Then, the parameter $\rho^{*}$ minimizing the bound in (28) is

$$\rho^{*}=\frac{1}{\sqrt{N}\sigma_{\max}({\bf A})D_{\mathcal{X}}}.$$

Furthermore, the number of iterations required to decrease the bound (19) below $\epsilon$ is

[TABLE]

Proof.

Note that the right-hand side of relation (28) is convex in $\rho>0$ . Setting its derivative with respect to $\rho$ to zero shows that it is minimized by $\rho^{*}=\frac{1}{\sqrt{N}\sigma_{\max}({\bf A})D_{\mathcal{X}}}$ . Using this parameter in the bound of Theorem 1, we obtain

[TABLE]

from which the desired result follows. ∎
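The convexity argument used in this proof (and again in the proof of Proposition 2) amounts to minimizing a function of the form $a\rho+b/\rho$ . Since displays (28) and (29) are not reproduced in this excerpt, the template below is an assumption about their $\rho$ -dependence rather than a restatement of them:

```latex
h(\rho) = a\rho + \frac{b}{\rho}, \qquad a, b > 0,\ \rho > 0,
\qquad h''(\rho) = \frac{2b}{\rho^{3}} > 0,
\qquad h'(\rho^{*}) = a - \frac{b}{(\rho^{*})^{2}} = 0
\;\Longrightarrow\; \rho^{*} = \sqrt{b/a}, \quad h(\rho^{*}) = 2\sqrt{ab}.
```

With $b/a=1/(N\sigma_{\max}^{2}({\bf A})D_{\mathcal{X}}^{2})$ this gives $\rho^{*}=1/(\sqrt{N}\sigma_{\max}({\bf A})D_{\mathcal{X}})$ , matching Proposition 1; if (29) has the same structure, the ratio $b/a=4G^{2}/(\tilde{\sigma}_{\min}^{2}({\bf A})\sigma_{\max}^{2}({\bf A})D_{\mathcal{X}}^{2})$ reproduces the $\rho^{*}$ of Proposition 2.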

This result shows that the number of required iterations depends on the number of agents $N$ , the diameter $D_{\mathcal{X}}$ of the constraint set $\mathcal{X}$ , the maximum singular value $\sigma_{\max}({\bf A})$ , and the sparsity of ${\bf A}$ , which is encoded in the parameter $\tau\in(0,1/q)$ (cf. Eq. (2)).
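The sparsity parameter $q$ , which the proof of Lemma 2 recalls as the maximum over rows $j$ of the number of nonzero blocks $[{\bf A}_{i}]_{j}$ , can be read off directly from the blocks ${\bf A}_{i}$ . The sketch below (function names are hypothetical) computes $q$ and returns a stepsize strictly inside $(0,1/q)$ ; the safety factor 0.9 is an arbitrary choice of ours, not a recommendation from the paper.

```python
import numpy as np


def sparsity_q(blocks, atol=0.0):
    """q = max over rows j of #{i : row j of A_i is nonzero}."""
    m = blocks[0].shape[0]
    counts = np.zeros(m, dtype=int)
    for A_i in blocks:
        counts += np.any(np.abs(A_i) > atol, axis=1)   # marks the rows in Q_i
    return int(counts.max())


def pick_tau(blocks, safety=0.9):
    """A stepsize strictly inside (0, 1/q)."""
    return safety / sparsity_q(blocks)


blocks = [np.array([[1.0], [0.0]]),   # A_1 touches row 1 only
          np.array([[1.0], [1.0]]),   # A_2 touches both rows
          np.array([[0.0], [1.0]]),
          np.array([[0.0], [1.0]])]
print(sparsity_q(blocks), pick_tau(blocks))
```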

Similarly, from relation (14) and Theorem 3, the right-hand side of Theorem 2(a), which is larger than that of Theorem 2(b), can be further upper bounded as

[TABLE]

where we set $\bar{\boldsymbol{\lambda}}^{0}=\mathbf{0}$ .

Proposition 2

Assume (A1)-(A3). Let $\bar{\boldsymbol{\lambda}}^{0}=\mathbf{0}$ . Then, the parameter $\rho^{*}$ minimizing the bound in (29) is

$$\rho^{*}=\frac{2G}{\tilde{\sigma}_{\min}({\bf A})\sigma_{\max}({\bf A})D_{\mathcal{X}}}.$$

Furthermore, the number of iterations required to obtain an $\epsilon$ -optimal and $\epsilon$ -feasible solution is

[TABLE]

Proof.

Since the right-hand side of relation (29) is convex in $\rho>0$ , setting its derivative with respect to $\rho$ to zero shows that it is minimized by $\rho^{*}=\frac{2G}{\tilde{\sigma}_{\min}({\bf A})\sigma_{\max}({\bf A})D_{\mathcal{X}}}$ . Using this parameter in the bound of Theorem 2(a), we obtain

[TABLE]

from which the desired result follows. ∎
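Given problem data, both parameter choices can be evaluated directly. The sketch below implements $\rho^{*}=1/(\sqrt{N}\sigma_{\max}({\bf A})D_{\mathcal{X}})$ from Proposition 1 and $\rho^{*}=2G/(\tilde{\sigma}_{\min}({\bf A})\sigma_{\max}({\bf A})D_{\mathcal{X}})$ from Proposition 2, exactly as written in the two proofs. The diameter $D_{\mathcal{X}}$ and the Lipschitz constant $G$ must be supplied by the user (the values below are made up), and the rank tolerance and the function name `rho_star` are our assumptions.

```python
import numpy as np


def rho_star(blocks, D_X, G, rtol=1e-10):
    """Return (rho* of Proposition 1, rho* of Proposition 2).

    blocks: list of the N matrices A_i; D_X: diameter of X; G: Lipschitz constant.
    """
    A = np.hstack(blocks)                    # A = [A_1, ..., A_N]
    N = len(blocks)
    s = np.linalg.svd(A, compute_uv=False)   # descending order
    s_max = s[0]
    s_tilde_min = s[s > rtol * s_max][-1]    # smallest nonzero singular value
    rho1 = 1.0 / (np.sqrt(N) * s_max * D_X)
    rho2 = 2.0 * G / (s_tilde_min * s_max * D_X)
    return rho1, rho2


# Illustrative data only: D_X and G are made-up numbers, not derived from a real problem.
blocks = [np.array([[1.0], [0.0]]), np.array([[1.0], [1.0]]),
          np.array([[0.0], [1.0]]), np.array([[0.0], [1.0]])]
print(rho_star(blocks, D_X=2.0, G=1.5))
```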

As expected, $k_{\epsilon,2}\geq k_{\epsilon,1}$ , since the conditions imposed by Theorem 2 are stricter. In particular, because these bounds depend on the optimal dual multiplier $\boldsymbol{\lambda}^{*}$ , $k_{\epsilon,2}$ also depends on the Lipschitz constant $G$ .

V Conclusions

In this paper, we presented an Augmented Lagrangian decomposition method (ADAL) and characterized its computational complexity. We showed that the ergodic average of the sequence of primal variables generated by the algorithm is an $\epsilon$ -optimal and $\epsilon$ -feasible solution under mild assumptions, such as convexity of the problem. We also provided an explicit upper bound on the optimal dual multiplier, from which the number of iterations can be given explicitly for general convex problems involving linear constraints. The results in this paper have the potential to significantly improve the performance of distributed MPC, where certifying the computational complexity in advance is important.

Appendix A Proof of Lemma 2

Let $\mathbf{s}_{i}^{k}$ be a subgradient of $f_{i}$ at $\hat{{\bf x}}_{i}^{k}$ , i.e., $\mathbf{s}_{i}^{k}\in\partial f_{i}(\hat{{\bf x}}_{i}^{k})$ . Then, the first-order optimality conditions [1, Proposition 4.7.1] for each local problem (8) imply that for any ${\bf x}_{i}\in\mathcal{X}_{i}$

[TABLE]

By letting ${\bf x}_{i}={\bf x}_{i}^{*}$ and substituting $\boldsymbol{\lambda}^{k}$ with $\hat{\boldsymbol{\lambda}}^{k}:=\boldsymbol{\lambda}^{k}+\rho{\bf r}(\hat{{\bf x}}^{k})$ in the above, we get

[TABLE]
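The first-order optimality condition invoked at the start of this proof [1, Proposition 4.7.1] states that, for a real-valued convex function $g$ and a convex set $\mathcal{X}$ , a point $\hat{\bf x}$ minimizes $g$ over $\mathcal{X}$ if and only if some subgradient ${\bf s}\in\partial g(\hat{\bf x})$ satisfies $\langle{\bf s},{\bf x}-\hat{\bf x}\rangle\geq0$ for all ${\bf x}\in\mathcal{X}$ . The toy check below uses a hypothetical scalar objective $g(x)=\frac{1}{2}(x-c)^{2}+\beta x$ on a box, where $\beta$ stands in for the linear multiplier term of a local AL; it illustrates the condition only and is not the paper's local problem (8).

```python
import numpy as np


def box_qp_argmin(c, beta, lo, hi):
    """argmin of g(x) = 0.5*(x - c)**2 + beta*x over [lo, hi] (clip the unconstrained minimizer)."""
    return float(np.clip(c - beta, lo, hi))


def first_order_holds(x_hat, c, beta, lo, hi, n_test=101):
    """Check g'(x_hat) * (x - x_hat) >= 0 for x on a grid of [lo, hi]."""
    grad = (x_hat - c) + beta                  # g'(x_hat); g is differentiable here
    xs = np.linspace(lo, hi, n_test)
    return bool(np.all(grad * (xs - x_hat) >= -1e-12))


for c, beta in [(2.0, 0.5), (0.2, 0.1)]:       # upper bound active / interior minimizer
    x_hat = box_qp_argmin(c, beta, -1.0, 1.0)
    print(c, beta, x_hat, first_order_holds(x_hat, c, beta, -1.0, 1.0))
print(first_order_holds(0.0, 2.0, 0.5, -1.0, 1.0))  # a non-minimizer violates the condition
```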

By the definition of $\mathbf{s}_{i}^{k}$ , we have the relation

[TABLE]

Substituting this into (30), we get

[TABLE]

Summing over all $i$ , we get

[TABLE]

Substituting $\sum_{i}{\bf A}_{i}\big{(}{\bf x}_{i}^{*}-\hat{{\bf x}}_{i}^{k}\big{)}={\bf b}-\sum_{i}{\bf A}_{i}\hat{{\bf x}}_{i}^{k}=-{\bf r}(\hat{\mathbf{x}}^{k})$ (which holds since $\sum_{i}{\bf A}_{i}{\bf x}_{i}^{*}={\bf b}$ ), adding and subtracting $\langle\boldsymbol{\lambda},{\bf r}(\hat{\mathbf{x}}^{k})\rangle$ , and rearranging terms in the above inequality, we get

[TABLE]

To avoid cluttering the notation, we temporarily disregard the term $F(\hat{{\bf x}}^{k})-F({\bf x}^{*})+\langle\boldsymbol{\lambda},{\bf r}(\hat{\mathbf{x}}^{k})\rangle$ , i.e., we consider only the terms

[TABLE]

Add the term ${\rho}\sum_{i}\big{\langle}{\bf A}_{i}(\hat{\mathbf{x}}_{i}^{k}-{\bf x}_{i}^{*}),{\bf A}_{i}({\bf x}_{i}^{k}-\hat{\mathbf{x}}_{i}^{k})\big{\rangle}$ to both sides of the above inequality, and group the terms on the right-hand side by their common factor to get

[TABLE]

where the last equality is from $\sum_{j}{\bf A}_{j}({\bf x}_{j}^{k}-\hat{{\bf x}}_{j}^{k})={\bf r}({\bf x}^{k})-{\bf r}(\hat{{\bf x}}^{k})$ . Next, we represent

[TABLE]

in the left-hand side of the preceding inequality to obtain

[TABLE]

Adding $-(1-\tau)\rho\big{\langle}{\bf r}({\bf x}^{k}),{\bf r}(\hat{{\bf x}}^{k})\big{\rangle}$ to both sides of the above inequality and recalling the definition of $\bar{\boldsymbol{\lambda}}^{k}$ in (17), we get

[TABLE]

Considering only the last two terms on the right-hand side of the above inequality, we can write

[TABLE]

We now consider the last term of the above equality. Each one of the summands in this term is bounded below by

[TABLE]

Note, however, that some of the rows of ${\bf A}_{i}$ might be zero. If $[{\bf A}_{i}]_{j}=\mathbf{0}$ , then it follows that $\big{[}{\bf r}(\hat{\mathbf{x}}^{k})\big{]}_{j}\big{[}{\bf A}_{i}({\bf x}_{i}^{k}-\hat{{\bf x}}_{i}^{k})\big{]}_{j}=0$ . Hence, denoting the set of nonzero rows of ${\bf A}_{i}$ as $\mathcal{Q}_{i}$ , i.e., $\mathcal{Q}_{i}=\{j=1,\dots,m:[{\bf A}_{i}]_{j}\neq\mathbf{0}\}$ , we can obtain a tighter lower bound for each $\tau\rho\Big{\langle}{\bf r}(\hat{\mathbf{x}}^{k}),{\bf A}_{i}({\bf x}_{i}^{k}-\hat{{\bf x}}_{i}^{k})\Big{\rangle}$ term as

[TABLE]

Recalling that $q$ denotes the maximum, over all rows $j$ , of the number of nonzero blocks $[{\bf A}_{i}]_{j}$ , and summing the above inequality over all $i$ , we observe that each quantity $[{\bf r}(\hat{\mathbf{x}}^{k})]_{j}^{2}$ appears in the summation at most $q$ times. This leads to the bound

[TABLE]
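The counting step can be checked numerically: for any vector ${\bf r}$ , $\sum_{i}\sum_{j\in\mathcal{Q}_{i}}[{\bf r}]_{j}^{2}=\sum_{j}|\{i:j\in\mathcal{Q}_{i}\}|\,[{\bf r}]_{j}^{2}\leq q\|{\bf r}\|^{2}$ , because row $j$ is counted once for each agent whose block has a nonzero $j$ -th row. The snippet below verifies this on a random sparsity pattern; the pattern and the sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N = 6, 5
# mask[j, i] is True when row j of A_i is nonzero, i.e., j belongs to Q_i (random pattern)
mask = rng.random((m, N)) < 0.4
q = int(mask.sum(axis=1).max())            # max number of nonzero blocks in any row j

r = rng.standard_normal(m)                  # stands in for r(x_hat^k)
lhs = sum(np.sum(r[mask[:, i]] ** 2) for i in range(N))   # sum_i sum_{j in Q_i} [r]_j^2
rhs = q * np.sum(r ** 2)                    # q * ||r||^2
assert lhs <= rhs + 1e-12
print("q =", q, " lhs =", lhs, " rhs =", rhs)
```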

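Substituting the equality for the last two terms, together with the lower bound derived above for its last term, back into the inequality obtained after adding $-(1-\tau)\rho\big{\langle}{\bf r}({\bf x}^{k}),{\bf r}(\hat{{\bf x}}^{k})\big{\rangle}$ to both sides, we arrive at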

[TABLE]

Recall that until now we have disregarded the term $F(\hat{{\bf x}}^{k})-F({\bf x}^{*})+\langle\boldsymbol{\lambda},{\bf r}(\hat{\mathbf{x}}^{k})\rangle$ . Reinstating this term in the above inequality, we get

[TABLE]

We now represent the right-hand side of the desired result using the definition of $\bar{\boldsymbol{\lambda}}^{k}$ in (17) and Lemma 1. For all $k$ , we have:

[TABLE]

Rearranging terms in the above equation, we get that

[TABLE]

where the last inequality follows from $\tau\in(0,\frac{1}{q})$ . Recall that $\tau$ is the stepsize parameter used in the second step of ADAL (cf. Eq. (9)). Therefore, combining this with (37), we arrive at the desired result.

Bibliography


[1] D. Bertsekas, A. Nedic, and A. Ozdaglar, “Convex analysis and optimization,” in Series in Computational Mathematics. Athena Scientific, 2003.
[2] D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods (Optimization and neural computation series). Athena Scientific, 1996.
[3] P. Tatjewski, “New dual-type decomposition algorithm for nonconvex separable optimization problems,” Automatica, vol. 25, no. 2, pp. 233–242, 1989.
[4] N. Watanabe, Y. Nishimura, and M. Matsubara, “Decomposition in large system optimization using the method of multipliers,” Journal of Optimiz. Theory and Applic., vol. 25, no. 2, pp. 181–193, 1978.
[5] G. Chen and M. Teboulle, “A proximal-based decomposition method for convex minimization problems,” Mathematical Programming, vol. 64, pp. 81–101, 1994.
[6] J. Mulvey and A. Ruszczyński, “A diagonal quadratic approximation method for large scale linear programs,” Operations Research Letters, vol. 12, pp. 205–215, 1992.
[7] A. Ruszczyński, “On convergence of an Augmented Lagrangian decomposition method for sparse convex optimization,” Mathematics of Operations Research, vol. 20, pp. 634–656, 1995.
[8] J. Eckstein and D. P. Bertsekas, “On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators,” Mathematical Programming, vol. 55, pp. 293–318, 1992.