Cluster-based Distributed Augmented Lagrangian Algorithm for a Class of Constrained Convex Optimization Problems

Hossein Moradian; Solmaz S. Kia

arXiv:1908.06634·cs.MA·April 6, 2021

TL;DR

This paper introduces a novel distributed continuous-time algorithm for constrained convex optimization over clustered networks, significantly reducing communication and computation costs while ensuring convergence under various convexity conditions.

Contribution

It presents a new cluster-based distributed augmented Lagrangian algorithm with proven convergence properties and explicit bounds for inequality constraint handling.

Findings

1. The algorithm converges asymptotically when the local costs are convex.
2. It converges exponentially when the local costs are strongly convex and have Lipschitz gradients.
3. A numerical example demonstrates its effectiveness.

Tables

Table 1: The values of the actual $\mu_{\text{bound}}$ and the bound in (17)

Case	1	2	3	4	5
$\mu_{\max}$	2.33	2.68	1.95	2.38	1.95
$\mu_{\text{bound}}$ in (17)	13.34	17.91	11.6	52.1	18.48

Table 2: The global cost value and the inequality constraint evaluation at $\boldsymbol{\mathsf{x}}_{p}^{\star}$ obtained by using the $\epsilon$-exact penalty function method

Setting	$x^{1} - x^{2} - 5$	$x^{2} - x^{3} - 5$	$x^{3} - x^{4} - 5$	$x^{4} - x^{5} - 5$	$f(\boldsymbol{\mathsf{x}}_{p}^{\star})$
$\epsilon = 0.01$	-5.8e-3	-2.06e-2	-2.63e-2	2.1e-4	680.4
$\epsilon = 0.001$	-5.92e-3	-3.46e-2	-3.75e-2	-3.5e-2	680.4
$\epsilon = 0.01$ and adjusted bounds	-1.25e-2	-1.3e-2	-3.92e-2	-8.2e-3	680.23

Equations

x^{\star}

[w^{1}]_{j} x^{1} + \dots + [w^{N}]_{j} x^{N} - b_{j} = 0, \quad j \in \{1, \dots, p\},

\underline{x}_{l}^{i} \leq x_{l}^{i}, \quad l \in \underline{\mathcal{B}}^{i} \subseteq \{1, \dots, n^{i}\}, \quad i \in \mathcal{V},

x_{l}^{i} \leq \bar{x}_{l}^{i}, \quad l \in \bar{\mathcal{B}}^{i} \subseteq \{1, \dots, n^{i}\}, \quad i \in \mathcal{V},

W = [w^{1}, \dots, w^{N}] \in \mathbb{R}^{p \times m}

X_{\text{fe}} = \{x \in \mathbb{R}^{m} \mid \text{(1b), (1c), (1d) hold}\}

x^{\star}

[w^{1}]_{k} x^{1} + \dots + [w^{N}]_{k} x^{N} = b_{k}, \quad k \in \mathbb{Z}_{1}^{p},

\nabla f^{i}(x^{i\star}) + w^{i\top} \nu^{\star} = 0,

[w^{1}]_{k} x^{1\star} + \dots + [w^{N}]_{k} x^{N\star} = b_{k}, \quad k \in \mathbb{Z}_{1}^{p}.

\dot{\nu}_{k}

\dot{x}^{i}

\rho w^{i\top} (w^{1} x^{1} + \dots + w^{N} x^{N} - b),

\dot{y}_{k}^{l} =

\dot{v}_{k}^{l} =

\dot{x}^{i} =

+ \rho^{i}

\sum_{l \in \mathcal{V}_{k}} \dot{y}_{k}^{l} = 0 \implies \sum_{l \in \mathcal{V}_{k}} y_{k}^{l}(t) = \sum_{l \in \mathcal{V}_{k}} y_{k}^{l}(0),

\sum_{l \in \mathcal{V}_{k}} \dot{v}_{k}^{l} = [w^{1}]_{k} x^{1} + \dots + [w^{N}]_{k} x^{N} - b_{k},

\mathcal{S}_{e} = \Big\{

\prod_{i=1}^{N} \mathbb{R}^{n^{i}} \,\Big|\, \mathbf{v}_{k} = \theta_{k} \mathbf{1}_{N_{k}}, \ \theta_{k} \in \mathbb{R}, \ \nabla f^{i}(x^{i}) + \sum_{j \in \mathcal{T}^{i}} [w^{i}]_{j}^{\top} \theta_{j} = \mathbf{0},

\sum_{j=1}^{N} [w^{j}]_{k} x^{j} = b_{k} + \sum_{j \in \mathcal{V}_{k}} y_{k}^{j}, \quad y_{k}^{l} = [w^{l}]_{k} x^{l} - \bar{b}_{k}^{l},

i \in \mathcal{V}, \ l \in \mathcal{V}_{k}, \ k \in \mathbb{Z}_{1}^{p} \Big\}.

r_{k}^{\top} R_{k} = 0, \quad R_{k}^{\top} R_{k} = I_{N_{k} - 1}, \quad R_{k} R_{k}^{\top} = \Pi_{N_{k}},

[r_{k} \ R_{k}]^{\top} L_{k} [r_{k} \ R_{k}] = \operatorname{Diag}([0, \lambda_{2k}, \dots, \lambda_{N_{k} k}]).

x_{p}^{\star} =

[w^{1}]_{j} x^{1} + \dots + [w^{N}]_{j} x^{N} = b_{j}, \quad j \in \mathbb{Z}_{1}^{p},

f_{p}^{i}(x^{i}) = f^{i}(x^{i}) + \gamma \Big( \sum_{l \in \underline{\mathcal{B}}^{i}} p_{\epsilon}(\underline{x}_{l}^{i} - x_{l}^{i}) + \sum_{l \in \bar{\mathcal{B}}^{i}} p_{\epsilon}(x_{l}^{i} - \bar{x}_{l}^{i}) \Big),

\nabla f^{i}(x^{i\star}) + w^{i\top} \nu^{\star} - \underline{\mu}^{i\star} + \bar{\mu}^{i\star} = 0,

W x^{\star} - b = 0,

\underline{\mu}_{l}^{i\star} (\underline{x}_{l}^{i} - x_{l}^{i\star}) = 0, \quad \underline{x}_{l}^{i} - x_{l}^{i\star} \leq 0, \quad \underline{\mu}_{l}^{i\star} \geq 0, \quad l \in \underline{\mathcal{B}}^{i},

\bar{\mu}_{l}^{i\star} (x_{l}^{i\star} - \bar{x}_{l}^{i}) = 0, \quad x_{l}^{i\star} - \bar{x}_{l}^{i} \leq 0, \quad \bar{\mu}_{l}^{i\star} \geq 0, \quad l \in \bar{\mathcal{B}}^{i},

X_{\text{fe}}^{\epsilon} = \big\{ x \in \mathbb{R}^{m} \mid

x_{j}^{i} - \bar{x}_{j}^{i} \leq \epsilon, \ j \in \bar{\mathcal{B}}^{i}, \ i \in \mathcal{V} \big\}.

x_{p}^{\star} \in X_{\text{fe}}^{\epsilon}, \quad 0 \leq f^{\star} - f(x_{p}^{\star}) \leq \epsilon \gamma N,

\max \big\{ \max\{\underline{\mu}_{l}^{i\star}\}_{l \in \underline{\mathcal{B}}^{i}}, \max\{\bar{\mu}_{l}^{i\star}\}_{l \in \bar{\mathcal{B}}^{i}} \big\}_{i=1}^{N} \leq \mu_{\text{bound}},

\max \big\{ \max\{\underline{\mu}_{l}^{i\star}\}_{l \in \underline{\mathcal{B}}^{i}}, \max\{\bar{\mu}_{l}^{i\star}\}_{l \in \bar{\mathcal{B}}^{i}} \big\}_{i=1}^{N} = \max \big\{ \max\{\underline{\mu}_{l}^{i\star}\}_{l \in \underline{\mathcal{A}}^{i}}, \max\{\bar{\mu}_{l}^{i\star}\}_{l \in \bar{\mathcal{A}}^{i}} \big\}_{i=1}^{N}.

\mu_{\text{bound}} \leq \Big(1 + \frac{\bar{\mathsf{w}}}{\underline{\mathsf{w}}}\Big) \max \Big\{ \max_{x^{i} \in X_{\text{ineq}}^{i}} \|\nabla f^{i}(x^{i})\|_{\infty} \Big\}_{i=1}^{N},

\nabla f_{l}^{i}(x_{l}^{i\star}) + w_{l}^{i} \nu^{\star} = 0, \quad l \in \mathbb{Z}_{1}^{n^{i}} \setminus \{\bar{\mathcal{A}}^{i} \cup \underline{\mathcal{A}}^{i}\},

\nabla f_{l}^{i}(x_{l}^{i\star}) + w_{l}^{i} \nu^{\star} + \bar{\mu}_{l}^{i\star} = 0, \quad l \in \bar{\mathcal{A}}^{i},
Full text

Hossein Moradian, Solmaz S. Kia

Department of Mechanical and Aerospace Engineering, University of California, Irvine

Abstract

We propose a distributed solution for a constrained convex optimization problem over a network of clustered agents, each consisting of a set of subagents. The communication range of the clustered agents is such that they can form a connected undirected graph topology. The total cost in this optimization problem is the sum of the local convex costs of the subagents of each cluster. We seek a minimizer of this cost subject to a set of affine equality constraints and a set of affine inequality constraints specifying the bounds on the decision variables, if such bounds exist. We design our distributed algorithm in a cluster-based framework, which results in a significant reduction in communication and computation costs. Our proposed distributed solution is a novel continuous-time algorithm that is linked to the augmented Lagrangian approach. It converges asymptotically when the local cost functions are convex and exponentially when they are strongly convex and have Lipschitz gradients. Moreover, we use an $\epsilon$-exact penalty function to address the inequality constraints and derive an explicit lower bound on the penalty function weight to guarantee convergence to an $\epsilon$-neighborhood of the global minimum value of the cost. A numerical example demonstrates our results.

Keywords: distributed constrained convex optimization, augmented Lagrangian, primal-dual solutions, optimal resource allocation, penalty function methods

Corresponding author: H. Moradian. This work was supported by NSF, United States of America, CAREER award ECCS-1653838. A preliminary version of this paper is presented in [1].

1 Introduction

We consider a group of $N$ clustered agents $\mathcal{V}=\{1,\cdots,N\}$ with communication and computation capabilities, whose communication range is such that they can form a connected undirected graph topology, see Fig. 1. These agents aim to solve, in a distributed manner, the optimization problem

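$$
\begin{aligned}
\boldsymbol{\mathsf{x}}^{\star}=\ &\underset{\mathbf{x}\in\mathbb{R}^{m}}{\arg\min}\ \sum\nolimits_{i=1}^{N}f^{i}(\mathbf{x}^{i}),\quad\text{subject to} &&\text{(1a)}\\
&[\boldsymbol{\mathsf{w}}^{1}]_{j}\mathbf{x}^{1}+\cdots+[\boldsymbol{\mathsf{w}}^{N}]_{j}\mathbf{x}^{N}-\mathsf{b}_{j}=0,\quad j\in\{1,\dots,p\}, &&\text{(1b)}\\
&\underline{\mathsf{x}}^{i}_{l}\leq x^{i}_{l},\quad l\in\underline{\mathcal{B}}^{i}\subseteq\{1,\dots,n^{i}\},\quad i\in\mathcal{V}, &&\text{(1c)}\\
&x^{i}_{l}\leq\bar{\mathsf{x}}^{i}_{l},\quad l\in\bar{\mathcal{B}}^{i}\subseteq\{1,\dots,n^{i}\},\quad i\in\mathcal{V}, &&\text{(1d)}
\end{aligned}
$$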

where $f^{i}(\mathbf{x}^{i})=\sum\nolimits_{l=1}^{n^{i}}f_{l}^{i}(x_{l}^{i})$. In this setting, each agent $i\in\mathcal{V}$ is a cluster of local ‘subagents’ $l\in\{1,\dots,n^{i}\}$ whose decision variable is $\mathbf{x}^{i}=[x^{i}_{1},\cdots,x^{i}_{n^{i}}]^{\top}\in\mathbb{R}^{n^{i}}$. The weighting factor matrix $\boldsymbol{\mathsf{w}}^{i}\in\mathbb{R}^{p\times n^{i}}$ of each agent $i\in\mathcal{V}$ is known only to agent $i$ itself. Moreover, $\underline{\mathsf{x}}^{i}_{l},\bar{\mathsf{x}}^{i}_{l}\in\mathbb{R}$, with $\underline{\mathsf{x}}^{i}_{l}<\bar{\mathsf{x}}^{i}_{l}$, are respectively the lower and upper bounds on the $l^{\text{th}}$ decision variable of agent $i\in\mathcal{V}$, if such a bound exists. In a distributed solution, each agent $i\in\mathcal{V}$ should obtain its respective component of $\boldsymbol{\mathsf{x}}^{\star}=[\boldsymbol{\mathsf{x}}^{1\star\top},\cdots,\boldsymbol{\mathsf{x}}^{N\star\top}]^{\top}$ by interacting only with the agents that are in its communication range. Problem (1), explicitly or implicitly, captures various in-network optimization problems. One example is optimal in-network resource allocation, which appears in many optimal decision-making tasks such as economic dispatch over power networks [2, 3], optimal routing [4, 5], and network resource allocation for wireless systems [6, 7]. In such problems, a group of agents with limited resources, e.g., a group of generators in a power network, add up their local resources to meet a demand in a way that the overall cost is optimal for the entire network. Another family of problems that can be modeled as (1) is in-network model predictive control over a finite horizon for a group of agents with linear dynamics [8, 9].
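As an illustration of how the economic dispatch example fits the problem class, one possible instance is sketched below. The quadratic cost coefficients $a^{i}>0$ and $c^{i}$ and the total demand $D$ are hypothetical symbols introduced here for illustration only; they do not appear in the original text. Each generator is a cluster with a single subagent ($n^{i}=1$), and there is one demand equation ($p=1$, $\mathsf{w}^{i}=1$, $\mathsf{b}=D$):

$$
\begin{aligned}
&\min_{x^{1},\dots,x^{N}}\ \sum\nolimits_{i=1}^{N}\big(a^{i}(x^{i})^{2}+c^{i}x^{i}\big),\quad\text{subject to}\\
&x^{1}+\cdots+x^{N}=D,\qquad \underline{\mathsf{x}}^{i}\leq x^{i}\leq\bar{\mathsf{x}}^{i},\quad i\in\mathcal{V}.
\end{aligned}
$$

The general formulation extends this instance to several coupled equality constraints ($p>1$), non-unit weights, and vector-valued local decision variables.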

In recent years, there has been a surge in the design of distributed algorithms for large-scale in-network optimization problems. The major developments have been in the unconstrained convex optimization setting, where the global cost is the sum of the local costs of the agents (see, e.g., [10, 11] for algorithms in discrete time, and [12, 13, 14] for algorithms in continuous time). In-network constrained convex optimization problems have also been studied in the literature. For example, in the context of the power generator economic dispatch problem, [15, 16, 17] offer distributed solutions that solve a special case of (1) with local quadratic costs subject to bounded decision variables and a single demand equation, $p=1$ and $\mathsf{w}^{i}=1$ for $i\in\mathcal{V}$. Distributed algorithms for special cases of (1) with non-quadratic costs are presented in [18, 19, 8] in discrete-time form, and in [20, 21, 22, 23, 24] in continuous-time form. Except for [19], all these algorithms consider the case where the local decision variable of each agent $i\in\mathcal{V}$ is a scalar. Moreover, with the exception of [21, 19, 8], these algorithms only solve (1) when the equality constraint is the unweighted sum of local decision variables, i.e., $p=1$ and $\mathsf{w}^{i}=1$ for $i\in\mathcal{V}$. Also, only [23] and [24] consider local inequality constraints, which are in the form of local box inequality constraints on all the decision variables of the problem. Lastly, the algorithms in [18, 23, 24] require the agents to communicate the gradient of their local cost functions to their neighbors. Such a requirement can be of concern for privacy-sensitive applications.

In this paper, we propose a novel distributed algorithm to solve the optimization problem (1). We start by considering the case where $\underline{\mathcal{B}}^{i}=\bar{\mathcal{B}}^{i}=\{\}$ for $i\in\mathcal{V}$, i.e., when there is no inequality constraint. For this problem, we propose a continuous-time distributed primal-dual algorithm. To induce robustness and also to yield convergence without strict convexity of the local cost functions, we adopt an augmented Lagrangian framework [25]. The augmented Lagrangian method has been used in [26], [27], and [19] to improve the transient response of distributed algorithms for, respectively, an unconstrained convex optimization problem, an online optimization problem, and a discrete-time constrained optimization problem. Unlike the customary practice of using a common augmented Lagrangian penalty parameter as in [19, 27, 26], our design allows each agent to choose its own penalty parameter locally, which reduces the coordination overhead among the agents. The structure of our distributed solution is inspired by the primal-dual centralized solution of [28] (see (6)), where the coupling in the differential solver is in the dual state dynamics. In decentralized primal-dual algorithms, e.g., [29, 22, 30], the adopted practice is to give every agent a copy of the dual variables and use a consensus mechanism to make the agents eventually arrive at the same dual variable. We follow the same approach, but in our design we pay particular attention to computation and communication resource management by adopting a cluster-based approach. First, we exploit the sparsity in the equality constraints and give a copy of a dual variable to an agent only if a decision variable of that agent is involved in the equality constraint corresponding to that dual variable. Then, only the cluster of agents that have a copy of the dual variable need to form a connected graph and use a consensus mechanism to arrive at agreement on their dual variable, see Fig. 1. Next, in our design, we assign only a single copy of the dual variable to an agent $i$ regardless of how many subagents it has. We note that if we use the algorithms in [18, 19, 8, 20, 21, 22, 23, 24] to solve problems where $\mathbf{x}^{i}\in\mathbb{R}^{n^{i}}$ of an agent $i\in\mathcal{V}$ is a vector ($n^{i}>1$), we need to treat each component of $\mathbf{x}^{i}$ as an agent and assign a copy of a dual variable to it. Such a treatment increases the local storage, computation, and communication costs of agent $i$. Our convergence analysis is based on the Lyapunov and LaSalle invariant set methods, together with semistability analysis [31], and shows that our algorithm is guaranteed to converge to a point in the set of optimal decision values when the local costs are convex. When the local cost functions are strongly convex and their local gradients are globally Lipschitz, the convergence of our proposed algorithm over connected graphs is exponential, and this guarantee can also be extended to dynamic graphs.

To address scenarios where all or some of the decision variables are bounded in (1), we use a variation of the exact penalty function method [32], called the $\epsilon$-exact penalty function method [33]. Unlike the exact penalty method, this method uses a smooth differentiable penalty function to converge to the $\epsilon$-neighborhood of the global minimum value of the cost. The advantage of exact penalty function methods is the possibility of using a finite penalty weight to arrive at a practical and numerically well-posed optimization solution. However, as shown in [32, 33], the penalty function weight is lower bounded by the bounds on the Lagrange multipliers. Since the Lagrange multipliers are generally unknown, the bound on the penalty function weight is not known either. Many works that use penalty function methods in a distributed optimization framework simply state that a large enough value for the weight is used [34, 35], with no guarantees on the feasibility of their choice. [36], [37], [24, Lemma 5.1], and [30, Proposition 4] are among the few results in the literature that address the problem of establishing an exact upper bound on the size of the Lagrange multipliers, which can be used to obtain a lower bound on the size of a valid penalty function weight. However, [36] considers problems with inequality constraints only, while [24, Lemma 5.1] and [30, Proposition 4] are developed for the resource allocation problem described by (1) when there exists only one equality constraint ($p=1$) with $\mathsf{w}^{i}=1$, $i\in\mathcal{V}$, and all the decision variables have box inequality constraints. On the other hand, [37] proposes a numerical procedure. As part of our contribution in this paper, we obtain an explicit closed-form upper bound on the Lagrange multipliers of problem (1), which enables determining the size of a suitable penalty function weight for both exact and $\epsilon$-exact penalty function methods.
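To make the idea of a smooth penalty concrete, the sketch below implements one commonly used smooth approximation of the exact penalty $\max\{0,y\}$. This particular piecewise form is an assumption made for illustration; the text above does not spell out which $p_{\epsilon}$ is used, and the function names `p_eps` and `dp_eps` are hypothetical.

```python
import numpy as np

def p_eps(y, eps):
    """Assumed smooth penalty: 0 for y <= 0, y^2/(2*eps) on [0, eps], y - eps/2 for y >= eps."""
    y = np.asarray(y, dtype=float)
    quadratic = y**2 / (2.0 * eps)
    linear = y - eps / 2.0
    return np.where(y <= 0.0, 0.0, np.where(y <= eps, quadratic, linear))

def dp_eps(y, eps):
    """Derivative of p_eps: ramps continuously from 0 to 1 on [0, eps]."""
    y = np.asarray(y, dtype=float)
    return np.clip(y / eps, 0.0, 1.0)

eps = 0.01
# Penalty on an upper-bound violation x - x_bar for a few sample values.
violations = np.array([-1.0, 0.0, eps, 1.0])
penalties = p_eps(violations, eps)
slopes = dp_eps(violations, eps)
```

Unlike $\max\{0,y\}$, this function is differentiable at $y=0$, which is what allows gradient-based dynamics to be applied to the penalized cost; the price is that points violating a bound by up to roughly $\epsilon$ are only mildly penalized.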

In summary, the contribution of this paper is twofold. (a) We propose a novel distributed algorithm to solve problem (1). This design uses an augmented Lagrangian approach, which, similar to the case of centralized solvers, extends the convergence guarantees of our proposed distributed algorithm to convex cost functions as well. Our design also incorporates a cluster-based approach to reduce computational and communication costs. (b) We establish a well-defined upper bound on the Lagrange multipliers of problem (1). This result is of fundamental importance and its impact is beyond our proposed algorithm. It is useful in identifying the value of the weight factor of exact and $\epsilon$-exact penalty functions that are used to address inequality constraints.

2 Preliminaries

Let $\mathbb{R}$, $\mathbb{R}_{\geq 0}$, $\mathbb{Z}$, and $\mathbb{Z}_{>0}$ be, respectively, the set of real, nonnegative real, integer, and positive integer numbers. For given $i,j\in\mathbb{Z}$, $i<j$, we define $\mathbb{Z}_{i}^{j}=\{x\in\mathbb{Z}\,|\,i\leq x\leq j\}$. We denote the cardinality of a set $\mathcal{A}$ by $|\mathcal{A}|$. For a matrix $\mathbf{A}=[\mathsf{a}_{ij}]\in\mathbb{R}^{n\times m}$, we denote its transpose by $\mathbf{A}^{\top}$, its $k^{\text{th}}$ row by $[\mathbf{A}]_{k}$, its $k^{\text{th}}$ column by $[\mathbf{A}]^{k}$, and its element-wise max-norm by $\|\mathbf{A}\|_{\max}$. We let $\mathbf{1}_{n}$ (resp. $\mathbf{0}_{n}$) denote the vector of $n$ ones (resp. $n$ zeros), $\boldsymbol{\mathsf{I}}_{n}$ denote the $n\times n$ identity matrix, and $\boldsymbol{\mathsf{\Pi}}_{n}=\boldsymbol{\mathsf{I}}_{n}-\frac{1}{n}\mathbf{1}_{n}\mathbf{1}_{n}^{\top}$. When clear from the context, we do not specify the matrix dimensions. For a vector $\mathbf{x}\in\mathbb{R}^{n}$ we denote the standard Euclidean and infinity norms by, respectively, $\|\mathbf{x}\|=\sqrt{\mathbf{x}^{\top}\mathbf{x}}$ and $\|\mathbf{x}\|_{\infty}=\max\{|x_{i}|\}_{i=1}^{n}$. Given a set of vectors, we use $[\{\mathbf{p}^{i}\}_{i\in\mathcal{M}}]$ to indicate the aggregate vector obtained from stacking the set of vectors $\{\mathbf{p}^{i}\}_{i\in\mathcal{M}}$ whose indices belong to the ordered set $\mathcal{M}\subset\mathbb{Z}_{>0}$. In a network of $N$ agents, to distinguish and emphasize that a variable is local to an agent $i\in\mathbb{Z}_{1}^{N}$, we use superscripts, e.g., $f^{i}(\mathbf{x}^{i})$ is the local function of agent $i\in\mathbb{Z}_{1}^{N}$ evaluated at its own local value $\mathbf{x}^{i}\in\mathbb{R}^{n^{i}}$. The $l^{\text{th}}$ element of a vector $\mathbf{x}^{i}\in\mathbb{R}^{n^{i}}$ at agent $i\in\mathbb{Z}_{1}^{N}$ is denoted by $x_{l}^{i}$. Moreover, if $\mathbf{p}^{i}\in\mathbb{R}^{d^{i}}$ is a variable of agent $i\in\mathcal{V}=\{1,\cdots,N\}$, the aggregate of the $\mathbf{p}^{i}$'s of the network is the vector $\mathbf{p}=[\{\mathbf{p}^{i}\}_{i\in\mathcal{V}}]=[{\mathbf{p}^{1}}^{\top},\cdots,{\mathbf{p}^{N}}^{\top}]^{\top}\in\mathbb{R}^{\bar{d}}$ and $\text{Blkdiag}(\mathbf{p})=\Big[\begin{smallmatrix}\mathbf{p}^{1}&\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\ddots&\mathbf{0}\\ \mathbf{0}&\mathbf{0}&\mathbf{p}^{N}\end{smallmatrix}\Big]\in\mathbb{R}^{\bar{d}\times N}$, with $\bar{d}=\sum\nolimits_{i=1}^{N}d^{i}$. For a differentiable function $f:\mathbb{R}^{d}\to\mathbb{R}$, $\nabla f(\mathbf{x})$ represents its gradient. A differentiable function $f:\mathbb{R}^{d}\to\mathbb{R}$ is convex (resp. $\alpha$-strongly convex, $\alpha\in\mathbb{R}_{>0}$) over a convex set $C\subseteq\mathbb{R}^{d}$ if and only if $(\boldsymbol{\mathsf{z}}-\boldsymbol{\mathsf{x}})^{\top}(\nabla f(\boldsymbol{\mathsf{z}})-\nabla f(\boldsymbol{\mathsf{x}}))\geq 0$ (resp. $\alpha\|\boldsymbol{\mathsf{z}}-\boldsymbol{\mathsf{x}}\|^{2}\leq(\boldsymbol{\mathsf{z}}-\boldsymbol{\mathsf{x}})^{\top}(\nabla f(\boldsymbol{\mathsf{z}})-\nabla f(\boldsymbol{\mathsf{x}}))$, which implies $\alpha\|\boldsymbol{\mathsf{z}}-\boldsymbol{\mathsf{x}}\|\leq\|\nabla f(\boldsymbol{\mathsf{z}})-\nabla f(\boldsymbol{\mathsf{x}})\|$) for all $\boldsymbol{\mathsf{x}},\boldsymbol{\mathsf{z}}\in C$. Moreover, it is strictly convex over a convex set $C\subseteq\mathbb{R}^{d}$ if and only if $(\boldsymbol{\mathsf{z}}-\boldsymbol{\mathsf{x}})^{\top}(\nabla f(\boldsymbol{\mathsf{z}})-\nabla f(\boldsymbol{\mathsf{x}}))>0$ for all $\boldsymbol{\mathsf{x}},\boldsymbol{\mathsf{z}}\in C$ with $\boldsymbol{\mathsf{x}}\neq\boldsymbol{\mathsf{z}}$.
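The notation above can be checked numerically. The following minimal sketch builds $\boldsymbol{\mathsf{\Pi}}_{n}$ and tests the strong convexity inequality on a hypothetical quadratic $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^{\top}Q\mathbf{x}$; the matrix `Q`, the helper names, and the random samples are illustrative assumptions, not part of the original text.

```python
import numpy as np

def centering_matrix(n):
    """Return Pi_n = I_n - (1/n) 1_n 1_n^T."""
    return np.eye(n) - np.ones((n, n)) / n

def is_strongly_monotone(Q, alpha, z, x):
    """Check alpha*||z-x||^2 <= (z-x)^T (grad f(z) - grad f(x)) for f(x) = 0.5 x^T Q x."""
    d = z - x
    return alpha * (d @ d) <= d @ (Q @ z - Q @ x) + 1e-12

Pi = centering_matrix(4)

Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical symmetric positive definite matrix
alpha = np.linalg.eigvalsh(Q).min()      # strong convexity modulus of this quadratic
rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 2, 2))   # 100 random pairs (z, x) in R^2
all_ok = all(is_strongly_monotone(Q, alpha, s[0], s[1]) for s in samples)
```

For a quadratic cost the inequality reduces to $\mathbf{d}^{\top}Q\mathbf{d}\geq\lambda_{\min}(Q)\|\mathbf{d}\|^{2}$, so the smallest eigenvalue of `Q` is a valid choice of $\alpha$.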

Next, we briefly review basic concepts from algebraic graph theory following [38]. A weighted graph is a triplet $\mathcal{G}=(\mathcal{V},\mathcal{E},\boldsymbol{\mathsf{A}})$, where $\mathcal{V}=\{1,\dots,N\}$ is the node set, $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the edge set, and $\boldsymbol{\mathsf{A}}=[\mathsf{a}_{ij}]\in\mathbb{R}^{N\times N}$ is a weighted adjacency matrix such that $\mathsf{a}_{ij}>0$ if $(i,j)\in\mathcal{E}$ and $\mathsf{a}_{ij}=0$ otherwise. An edge from $i$ to $j$, denoted by $(i,j)$, means that agent $j$ can send information to agent $i$. A graph is undirected if $(i,j)\in\mathcal{E}$ whenever $(j,i)\in\mathcal{E}$. An undirected graph, whose weights satisfy $\mathsf{a}_{ij}=\mathsf{a}_{ji}$ for all $i,j\in\mathcal{V}$, is called connected if there is a path from every node to every other node in the network. The (out-)Laplacian matrix of a graph is $\boldsymbol{\mathsf{L}}=\operatorname{Diag}(\boldsymbol{\mathsf{A}}\mathbf{1}_{N})-\boldsymbol{\mathsf{A}}$. Note that $\boldsymbol{\mathsf{L}}\mathbf{1}_{N}=\mathbf{0}$. For an undirected graph, $\mathbf{1}_{N}^{\top}\boldsymbol{\mathsf{L}}=\mathbf{0}$, and the graph is connected if and only if $\operatorname{rank}(\boldsymbol{\mathsf{L}})=N-1$. Therefore, for a connected graph, zero is a simple eigenvalue of $\boldsymbol{\mathsf{L}}$. For a connected graph, we denote the eigenvalues of $\boldsymbol{\mathsf{L}}$ by $\lambda_{1},\dots,\lambda_{N}$, where $\lambda_{1}=0$ and $\lambda_{i}\leq\lambda_{j}$ for $i<j$.
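These Laplacian properties are easy to verify on a small example. The sketch below uses a hypothetical unit-weight path graph on four nodes, chosen only for illustration:

```python
import numpy as np

# Hypothetical undirected path graph 1-2-3-4 with unit weights.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
N = A.shape[0]

L = np.diag(A @ np.ones(N)) - A            # L = Diag(A 1_N) - A
eigvals = np.sort(np.linalg.eigvalsh(L))   # lambda_1 <= ... <= lambda_N
rank_L = np.linalg.matrix_rank(L)
```

Because the graph is connected, the rank is $N-1$ and only the smallest eigenvalue is zero; removing an edge would disconnect the path and introduce a second zero eigenvalue.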

3 Distributed Continuous-Time Solvers

In this section, we present our distributed algorithm to first solve the constrained optimization problem (1) when there is no inequality constraint, i.e., $\underline{\mathcal{B}}^{i}=\bar{\mathcal{B}}^{i}=\{\}$ for $i\in\mathcal{V}$. Then, we extend our results to solve the constrained optimization problem (1) with inequality constraints. Our standing assumptions are given below.

Assumption 3.1.

*(Problem specifications):* The cost function $f_{l}^{i}:\mathbb{R}\to\mathbb{R}$ of the subagent $l\in\mathbb{Z}_{1}^{n^{i}}$ of each agent $i\in\mathcal{V}$ is convex and differentiable. Moreover, $\nabla f^{i}:\mathbb{R}^{n^{i}}\to\mathbb{R}^{n^{i}}$ of each agent $i\in\mathcal{V}$ is locally Lipschitz. Also,

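$$
\boldsymbol{\mathsf{W}}=[\boldsymbol{\mathsf{w}}^{1},\cdots,\boldsymbol{\mathsf{w}}^{N}]\in\mathbb{R}^{p\times m}
$$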

is full row rank and the feasible set

$$
X_{\text{fe}}=\big\{\mathbf{x}\in\mathbb{R}^{m}\,\big|\,\text{(1b), (1c), (1d) hold}\big\}
$$

is non-empty for local inequalities (1c) and (1d). Lastly, the optimization problem (1) has a finite optimum $f^{\star}=f(\boldsymbol{\mathsf{x}}^{\star})=\sum\nolimits_{i=1}^{N}f^{i}(\boldsymbol{\mathsf{x}}^{i\star})$. $\Box$
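The rank and feasibility requirements of the assumption can be checked numerically for a given instance. The sketch below stacks hypothetical weight blocks $\boldsymbol{\mathsf{w}}^{i}$ into a single matrix and tests one candidate point; all numerical values, the bounds, and the variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: N = 3 agents with n^1 = 2, n^2 = 1, n^3 = 2 and p = 2 constraints.
w = [np.array([[1.0, 1.0], [0.0, 0.0]]),
     np.array([[1.0], [1.0]]),
     np.array([[0.0, 0.0], [2.0, 1.0]])]
b = np.array([4.0, 3.0])

W = np.hstack(w)                           # stacked matrix [w^1, ..., w^N] in R^{p x m}
p, m = W.shape
full_row_rank = np.linalg.matrix_rank(W) == p

# Hypothetical box bounds on every decision variable.
lower, upper = np.zeros(m), 3.0 * np.ones(m)

# Exhibiting one point that meets all constraints shows the feasible set is non-empty.
x_candidate = np.array([1.0, 2.0, 1.0, 0.5, 1.0])
feasible = bool(np.allclose(W @ x_candidate, b)
                and np.all(lower <= x_candidate)
                and np.all(x_candidate <= upper))
```

In practice one would find a feasible point with a linear program rather than by guessing; a single hand-picked candidate is used here only to keep the sketch short.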

Local Lipschitzness of $\nabla f^{i}$, $i\in\mathcal{V}$, guarantees existence and uniqueness of the solution of our proposed algorithm (7), which is a differential equation.

To solve problem (1) subject to only the equality constraints, we consider the augmented cost function with a penalty term on violating the affine constraint, i.e.,

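$$
\begin{aligned}
\boldsymbol{\mathsf{x}}^{\star}=\ &\underset{\mathbf{x}\in\mathbb{R}^{m}}{\arg\min}\ \sum\nolimits_{i=1}^{N}f^{i}(\mathbf{x}^{i})+\frac{\rho}{2}\big\|\boldsymbol{\mathsf{w}}^{1}\mathbf{x}^{1}+\cdots+\boldsymbol{\mathsf{w}}^{N}\mathbf{x}^{N}-\boldsymbol{\mathsf{b}}\big\|^{2},\quad\text{subject to} &&\text{(4a)}\\
&[\boldsymbol{\mathsf{w}}^{1}]_{k}\mathbf{x}^{1}+\cdots+[\boldsymbol{\mathsf{w}}^{N}]_{k}\mathbf{x}^{N}=\mathsf{b}_{k},\quad k\in\mathbb{Z}_{1}^{p}, &&\text{(4b)}
\end{aligned}
$$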

where $\rho\in\mathbb{R}_{\geq 0}$ is the penalty parameter. This augmentation results in the so-called augmented Lagrangian formulation of iterative optimization algorithms. As stated in [10], augmented Lagrangian methods were developed in part to bring robustness to the dual ascent method, and in particular, to yield convergence without assumptions like strict convexity or finiteness of the cost function (see also [25]). As shown below, such positive effects are valid also for the continuous-time algorithms we study. Augmenting the cost with the penalty function as in (4a), however, presents a challenge in the design of distributed solutions, as the total cost in (4a) is no longer separable. Nevertheless, we are able to address this challenge in our distributed solution.

Lemma 3.1.

*(KKT conditions to characterize solution set of (4) [39]):* Consider the constrained optimization problem (4). Let Assumption 3.1 hold and $f^{i}:\mathbb{R}^{n^{i}}\to\mathbb{R}$, $i\in\mathcal{V}$, be a differentiable and convex function on $\mathbb{R}^{n^{i}}$. For any $\rho\in\mathbb{R}_{\geq 0}$, a point $\boldsymbol{\mathsf{x}}^{\star}\in\mathbb{R}^{m}$ is a solution of (4) if and only if there exists a $\boldsymbol{\nu}^{\star}\in\mathbb{R}^{p}$ such that, for $i\in\mathcal{V}$,

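$$
\begin{aligned}
&\nabla f^{i}(\boldsymbol{\mathsf{x}}^{i\star})+\boldsymbol{\mathsf{w}}^{i\top}\boldsymbol{\nu}^{\star}=\mathbf{0}, &&\text{(5a)}\\
&[\boldsymbol{\mathsf{w}}^{1}]_{k}\boldsymbol{\mathsf{x}}^{1\star}+\cdots+[\boldsymbol{\mathsf{w}}^{N}]_{k}\boldsymbol{\mathsf{x}}^{N\star}=\mathsf{b}_{k},\quad k\in\mathbb{Z}_{1}^{p}. &&\text{(5b)}
\end{aligned}
$$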

Moreover, $\boldsymbol{\nu}^{\star}$ corresponding to every $\boldsymbol{\mathsf{x}}^{\star}$ is unique and finite. If the local cost functions are strongly convex, then for any $\rho\in\mathbb{R}_{\geq 0}$ the KKT equation (5) has a unique solution $(\boldsymbol{\nu}^{\star},\boldsymbol{\mathsf{x}}^{\star})$, i.e., (4) has a unique solution. $\Box$
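For quadratic costs the KKT conditions of the lemma form a linear system, which gives a quick way to see the unique primal-dual pair in the strongly convex case. The sketch below assumes hypothetical separable costs $f_{l}(x)=\frac{1}{2}q_{l}x^{2}-c_{l}x$ with $q_{l}>0$; the data and names are illustrative and not taken from the original text.

```python
import numpy as np

# Hypothetical stacked weights (p = 2, m = 5) and right-hand side.
W = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 2.0, 1.0]])
b = np.array([4.0, 3.0])
q = np.array([1.0, 2.0, 1.0, 3.0, 1.0])    # curvature of each scalar cost (all positive)
c = np.array([1.0, 0.0, 2.0, 1.0, 0.5])    # linear coefficient of each scalar cost
p, m = W.shape

# Stationarity: diag(q) x - c + W^T nu = 0; primal feasibility: W x = b.
K = np.block([[np.diag(q), W.T],
              [W, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([c, b]))
x_star, nu_star = sol[:m], sol[m:]

stationarity = q * x_star - c + W.T @ nu_star
```

The block matrix `K` is nonsingular here because the costs are strongly convex and `W` has full row rank, which mirrors the uniqueness statement of the lemma.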

Let $L(\boldsymbol{\mathbf{\nu}},\boldsymbol{\mathbf{x}})\!=\!f(\boldsymbol{\mathbf{x}})+\frac{\rho}{2}\|\boldsymbol{\mathbf{\mathsf{w}}}^{1}\boldsymbol{\mathbf{x}}^{1}\!+\!\cdots\!+\!\boldsymbol{\mathbf{\mathsf{w}}}^{N}\boldsymbol{\mathbf{x}}^{N}\!\!-\!\boldsymbol{\mathbf{\mathsf{b}}}\|^{2}+\boldsymbol{\mathbf{\nu}}^{\top}(\boldsymbol{\mathbf{\mathsf{w}}}^{1}\boldsymbol{\mathbf{x}}^{1}\!+\!\cdots\!+\!\boldsymbol{\mathbf{\mathsf{w}}}^{N}\boldsymbol{\mathbf{x}}^{N}\!\!-\!\boldsymbol{\mathbf{\mathsf{b}}})$ be the augmented Lagrangian of the optimization problem (4). Following [28], a central solver for the optimal resource allocation problem (4) is

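$$
\begin{aligned}
\dot{\nu}_{k}&=[\boldsymbol{\mathsf{w}}^{1}]_{k}\mathbf{x}^{1}+\cdots+[\boldsymbol{\mathsf{w}}^{N}]_{k}\mathbf{x}^{N}-\mathsf{b}_{k}, &&\text{(6a)}\\
\dot{\mathbf{x}}^{i}&=-\nabla f^{i}(\mathbf{x}^{i})-\boldsymbol{\mathsf{w}}^{i\top}\boldsymbol{\nu}-\rho\,\boldsymbol{\mathsf{w}}^{i\top}(\boldsymbol{\mathsf{w}}^{1}\mathbf{x}^{1}+\cdots+\boldsymbol{\mathsf{w}}^{N}\mathbf{x}^{N}-\boldsymbol{\mathsf{b}}), &&\text{(6b)}
\end{aligned}
$$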

where $k\in\mathbb{Z}_{1}^{p}$ and $i\in\mathcal{V}$. The algorithm studied in [28] is for the un-augmented Lagrangian, i.e., $\rho=0$, and the guaranteed convergence holds only for a strictly convex cost function $f(\mathbf{x})$. However, we can show that the central solver (6) with $\rho>0$ is guaranteed to converge for a convex cost function $f(\mathbf{x})$ as well (the details are omitted for brevity). A numerical example demonstrating this positive role is presented in Appendix B.
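The role of $\rho$ can be illustrated with a toy simulation. The sketch below integrates, by forward Euler, the primal-dual gradient flow of the augmented Lagrangian $L$ defined above (gradient ascent in $\boldsymbol{\nu}$, gradient descent in $\mathbf{x}$); treating this flow as the central solver is an assumption based on that definition. The test problem, $\min x_{1}+x_{2}$ subject to $x_{1}+x_{2}=2$, is a hypothetical convex but not strictly convex instance, and the step size and horizon are arbitrary choices.

```python
import numpy as np

def simulate_central_solver(rho, steps=20000, h=1e-3):
    """Forward-Euler integration of the primal-dual flow for
    min x1 + x2  s.t.  x1 + x2 = 2 (convex but not strictly convex)."""
    W = np.array([[1.0, 1.0]])
    b = np.array([2.0])
    grad_f = np.ones(2)                    # gradient of the linear cost x1 + x2
    x, nu = np.zeros(2), np.zeros(1)
    violation = np.empty(steps)
    for t in range(steps):
        r = W @ x - b                      # constraint residual W x - b
        x_dot = -grad_f - W.T @ nu - rho * (W.T @ r)
        nu_dot = r
        x, nu = x + h * x_dot, nu + h * nu_dot
        violation[t] = abs(r[0])
    return x, nu, violation

x_aug, nu_aug, viol_aug = simulate_central_solver(rho=1.0)      # augmented
x_plain, nu_plain, viol_plain = simulate_central_solver(rho=0.0)  # un-augmented
```

With $\rho=0$ the residual and the multiplier form an undamped oscillator, so the constraint violation keeps cycling; with $\rho>0$ the extra term acts as damping and the residual decays. This is only a two-variable illustration and is not the example of Appendix B.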

The source of coupling in (4) is the set of the equality constraints (4b), which appear in the central solver (6), as well. To design our distributed algorithm, we adapt the structural constitution of (6), but aim to create the coupling terms $[\boldsymbol{\mathbf{\mathsf{w}}}^{1}]_{k}\boldsymbol{\mathbf{x}}^{1}\!\!+\!\cdots\!+\![\boldsymbol{\mathbf{\mathsf{w}}}^{N}]_{k}\boldsymbol{\mathbf{x}}^{N}\!-\mathsf{b}_{k}$ , $k\in\mathbb{Z}_{1}^{p}$ , in a distributed manner. We note that for every equality constraint $k\in\mathbb{Z}_{1}^{p}$ , the coupling is among the set of agents $\mathcal{C}_{k}=\{i\in\mathcal{V}\,|\,[\boldsymbol{\mathbf{\mathsf{w}}}^{i}]_{k}\neq\boldsymbol{\mathbf{0}}\}$ . To have an efficient communication and computation resource management, we seek an algorithm that handles every coupled equality constraint among only those agents that are involved. In this regards, for every equality constraint $k\in\mathbb{Z}_{1}^{p}$ , we let $\mathcal{G}_{k}(\mathcal{V}_{k},\mathcal{E}_{k})$ be a connected undirected subgraph of $\mathcal{G}$ that contains the set of agents $\mathcal{C}_{k}$ (see Fig. 1 for an example). We assume that $\mathcal{V}_{k}\subset\mathcal{V}$ is a monotonically increasing ordered set. It is very likely that the agents coupled through an equality constraint are geographically close, and thus in the communication range of each other. Nevertheless, $\mathcal{V}_{k}$ , $k\in\mathbb{Z}_{1}^{p}$ , may contain agents $i\in\mathcal{V}$ that have $[\boldsymbol{\mathbf{\mathsf{w}}}^{i}]_{k}=\boldsymbol{\mathbf{0}}$ but are needed to make $\mathcal{G}_{k}$ connected (see Fig. 1 for an example). We let $N_{k}=|\mathcal{V}_{k}|$ , $k\in\mathbb{Z}_{1}^{p}$ . In our distributed solution for (4), we also seek an algorithm that allows each agent to use a local penalty parameter $\rho^{i}\in_{>0}$ , so we can eliminate the need to coordinate among the agents to choose the penalty parameter $\rho$ . 
In what follows, we define $\mathcal{T}^{i}=\{j\in\mathbb{Z}_{1}^{p}|i\in\mathcal{V}_{j}\}$ , $i\in\mathcal{V}$ , and $\{\bar{\mathsf{b}}_{k}^{l}\}_{l\in\mathcal{V}_{k}}$ such that $\sum_{l\in\mathcal{V}_{k}}{\bar{\mathsf{b}}}_{k}^{l}=\mathsf{b}_{k}$ , for $k\in\mathbb{Z}_{1}^{p}$ (possible options include $\bar{\mathsf{b}}_{k}^{l}=\mathsf{b}_{k}/|\mathcal{C}_{k}|$ , $l\in\mathcal{C}_{k}$ while $\bar{\mathsf{b}}_{k}^{j}=0$ , $j\in\mathcal{V}\backslash\mathcal{C}_{k}$ , or $\bar{\mathsf{b}}^{j}_{k}=\mathsf{b}_{k}$ for a particular agent $j\in\mathcal{V}_{k}$ and $\bar{\mathsf{b}}^{l}_{k}=0$ for any $l\in\mathcal{V}\backslash\{j\}$ ).
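As an illustration of the definitions above, the following minimal sketch builds the coupled set $\mathcal{C}_{k}$ from the weight blocks and forms the two splittings of $\mathsf{b}_{k}$ mentioned in the text. The function names, the dictionary layout, and the numerical values are hypothetical and chosen only for this example.

```python
def coupled_agents(w_k, tol=0.0):
    """Return C_k: the sorted agents whose weight block [w^i]_k is nonzero."""
    return [i for i, block in sorted(w_k.items())
            if any(abs(c) > tol for c in block)]


def split_evenly(b_k, C_k, V_k):
    """b_bar^l = b_k/|C_k| for l in C_k, and 0 for the other members of V_k."""
    share = b_k / len(C_k)
    return {l: (share if l in C_k else 0.0) for l in V_k}


def split_single(b_k, j, V_k):
    """b_bar^j = b_k for one chosen agent j in V_k, and 0 for everyone else."""
    return {l: (b_k if l == j else 0.0) for l in V_k}


# Hypothetical data for one constraint k: agent 2 has zero weights but is
# kept in V_k as a relay so that the subgraph G_k stays connected.
w_k = {1: [1.0, 0.5], 2: [0.0], 3: [2.0], 4: [0.0, 0.0]}
V_k = [1, 2, 3]
b_k = 6.0

C_k = coupled_agents(w_k)            # agents 1 and 3
even = split_evenly(b_k, C_k, V_k)   # shares 3.0, 0.0, 3.0
single = split_single(b_k, 2, V_k)   # the relay holds all of b_k
```

Either splitting satisfies $\sum_{l\in\mathcal{V}_{k}}\bar{\mathsf{b}}_{k}^{l}=\mathsf{b}_{k}$ , which is the only property of the splitting that the definition requires.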

With the right notation at hand, our proposed distributed algorithm to solve optimization problem (4) is

[TABLE]

with $\beta_{k}\!\in\!{\mathbb{R}}_{>0}$ and $\rho^{i}\!\in\!{\mathbb{R}}_{\geq 0}$ for $i\!\in\!\mathcal{V}$ , $k\in\mathbb{Z}_{1}^{p}$ and $l\in\mathcal{V}_{k}$ . To see the connection with the centralized dynamical solver (6), sum (7a) and (7b) over the agents of every connected $\mathcal{G}_{k},\,k\in\mathbb{Z}_{1}^{p}$ , to obtain

[TABLE]

which shows that for any $k\in\mathbb{Z}_{1}^{p}$ , the dynamics of the sum of $v^{l}_{k}$ s duplicates the Lagrange multiplier dynamics (6a) of the central Augmented Lagrangian method. Therefore, in a convergent (7), ultimately for each $k\in\mathbb{Z}^{p}_{1}$ , all the $v^{l}_{k}$ s converge to the same value indicating that ultimately every agent obtains a local copy of (6a) for any $k\in\mathbb{Z}^{p}_{1}$ . On the other hand, if we factor out $(1+\rho^{i})$ from the right hand side of (7c) and exclude the third component, which is a technical term added to induce agreement between the agents, (7c) mimics the dynamics (6) of the central Augmented Lagrangian solver.

Remark 3.1.

*(Benefits of cluster-based approach)* First, we note that regardless of the size of $n^{i}$ , algorithm (7) associates at most one copy of the Lagrange multiplier generator dynamics, i.e., (7a) and (7b), per equality constraint with every agent $i\in\mathcal{V}$ . Specifically, every agent $i\in\mathcal{V}$ maintains $|\mathcal{T}^{i}|\leq p$ pairs of the dynamics (7a) and (7b) and consequently has to broadcast the same number of variables to the network. In comparison, if we use the algorithms in [18, 19, 8, 20, 21, 22, 23, 24], when $n^{i}>1$ for any $i\in\mathcal{V}$ , we need to treat each component of $\boldsymbol{\mathbf{x}}^{i}$ as an agent and assign a copy of the dynamics that generates the dual variable to every subagent $l\in\mathbb{Z}_{1}^{n^{i}}$ . This results in a storage, computation and communication cost of order $n^{i}\times p$ per agent $i\in\mathcal{V}$ . See our numerical examples for a comparison. Next, notice that algorithm (7) can always be implemented by using $\mathcal{G}_{k}=\mathcal{G}$ , $k\in\mathbb{Z}_{1}^{p}$ , where $\mathcal{G}=(\mathcal{V},\mathcal{E})$ is the connected interaction topology that all the agents form. However, the flexibility to use a smaller cyber-layer formed by only the cluster of agents that are coupled by an equality constraint reduces the communication and computational cost of implementing algorithm (7). Moreover, in some problems, similar to our numerical example in Section 4, the coupling equation is between neighboring agents. In such cases, the subgraphs $\mathcal{G}_{k}$ can be easily formed. Finally, as one can expect and our numerical example also highlights, using a smaller subgraph $\mathcal{G}_{k}$ can result in faster convergence of the (7a) and (7b) dynamics and, as a result, faster convergence of algorithm (7). $\Box$

The equilibrium points of algorithm (7) when every $\mathcal{G}_{k}$ , $k\in\mathbb{Z}_{1}^{p}$ , is a connected graph are given by

[TABLE]

Due to (8a), if algorithm (7) is initialized such that $\sum\nolimits_{l\in\mathcal{V}_{k}}y_{k}^{l}(0)=0$ , we have $\sum\nolimits_{l\in\mathcal{V}_{k}}{y}^{l}_{k}(t)=\sum\nolimits_{l\in\mathcal{V}_{k}}{y}^{l}_{k}(0)$ for $t\in{\mathbb{R}}_{\geq 0}$ . In that case, if algorithm (7) converges to an equilibrium point $(\{\bar{\boldsymbol{\mathbf{v}}}_{k}\}_{k=1}^{p},\{\bar{\boldsymbol{\mathbf{y}}}_{k}\}_{k=1}^{p},\{\bar{\boldsymbol{\mathbf{x}}}^{i}\}_{i=1}^{N})\in\mathcal{S}_{e}$ , we have $(\{\bar{\boldsymbol{\mathbf{v}}}_{k}\}_{k=1}^{p},\{\bar{\boldsymbol{\mathbf{y}}}_{k}\}_{k=1}^{p},\{\bar{\boldsymbol{\mathbf{x}}}^{i}\}_{i=1}^{N})\!=\!(\{[\{[\boldsymbol{\mathbf{\mathsf{w}}}^{l}]_{k}\boldsymbol{\mathbf{\mathsf{x}}}^{l\star}-\bar{\mathsf{b}}^{l}_{k}\}_{l\in\mathcal{V}_{k}}]\}_{k=1}^{p},\,\{\nu^{\star}_{k}\boldsymbol{\mathbf{1}}_{N_{k}}\}_{k=1}^{p},\{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}\}_{i=1}^{N})$ , where $(\{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}\}_{i=1}^{N},\{\nu^{\star}_{k}\}_{k=1}^{p})$ satisfies the KKT equation (5). The following theorem shows that indeed under the stated initialization, the algorithm (7) converges to a minimizer of optimization problem (4). To establish the proof of this theorem we use the following notation. We let $\boldsymbol{\mathbf{\mathsf{A}}}\in{\mathbb{R}}^{N\times N}$ be the adjacency matrix of $\mathcal{G}$ . 
Then, the adjacency matrix of $\mathcal{G}_{k}\subset\mathcal{G}$ , $k\in\mathbb{Z}_{1}^{p}$ , is $\boldsymbol{\mathbf{\mathsf{A}}}_{k}$ , which is the submatrix of $\boldsymbol{\mathbf{\mathsf{A}}}$ corresponding to the rows and the columns associated with the agents in $\mathcal{V}_{k}$ , i.e., $\boldsymbol{\mathbf{\mathsf{A}}}_{k}=\boldsymbol{\mathbf{M}}_{k}^{\top}\,\boldsymbol{\mathbf{\mathsf{A}}}\,\boldsymbol{\mathbf{M}}_{k}$ where $\boldsymbol{\mathbf{M}}_{k}\in{\mathbb{R}}^{N\times N_{k}}$ is defined such that $[\boldsymbol{\mathbf{M}}_{k}]^{l}=[\boldsymbol{\mathbf{\mathsf{I}}}]^{\mathcal{V}_{k}(l)}$ , $l\in\{1,\dots,N_{k}\}$ with $\mathcal{V}_{k}(l)$ being the $l^{th}$ element of the ordered set $\mathcal{V}_{k}$ . Then, $\boldsymbol{\mathbf{\mathsf{L}}}_{k}=\operatorname{Diag}(\boldsymbol{\mathbf{\mathsf{A}}}_{k}\boldsymbol{\mathbf{1}}_{N_{k}})-\boldsymbol{\mathbf{\mathsf{A}}}_{k}$ is the Laplacian matrix of $\mathcal{G}_{k}$ , $k\in\mathbb{Z}_{1}^{p}$ . Next, we define $\boldsymbol{\mathbf{\mathsf{r}}}_{k}=\frac{1}{\sqrt{N_{k}}}\boldsymbol{\mathbf{1}}_{N_{k}}$ and $\boldsymbol{\mathbf{\mathsf{R}}}_{k}=[\boldsymbol{\mathbf{v}}_{2k},\cdots,\boldsymbol{\mathbf{v}}_{N_{k}k}]$ with $(\boldsymbol{\mathbf{\mathsf{r}}}_{k},\{\boldsymbol{\mathbf{v}}_{jk}\}_{j=2}^{N_{k}})$ being the normalized eigenvectors of $\boldsymbol{\mathbf{\mathsf{L}}}_{k}$ . Note here that we have

[TABLE]

The eigenvectors are ordered such that $\lambda_{2k}$ and $\lambda_{N_{k}k}$ are, respectively, the smallest and the largest non-zero eigenvalues of $\boldsymbol{\mathbf{\mathsf{L}}}_{k}$ . The next two theorems whose proofs are given in Appendix A examine the stability and convergence of (7) over connected graphs.
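The construction of $\boldsymbol{\mathbf{\mathsf{A}}}_{k}$ and $\boldsymbol{\mathbf{\mathsf{L}}}_{k}$ described above can be sketched as follows. This is a minimal illustration rather than part of the algorithm: the function name and the 5-agent line graph are assumptions made for the example, and the submatrix is extracted by indexing, which is equivalent to forming $\boldsymbol{\mathbf{M}}_{k}^{\top}\boldsymbol{\mathbf{\mathsf{A}}}\boldsymbol{\mathbf{M}}_{k}$ with the selection matrix $\boldsymbol{\mathbf{M}}_{k}$ .

```python
def subgraph_laplacian(A, V_k):
    """Return (A_k, L_k) for the subgraph induced by the ordered agent set V_k.

    A is the N x N adjacency matrix of G as a list of lists; agents are
    numbered from 1, so agent v corresponds to row/column v - 1.
    """
    idx = [v - 1 for v in V_k]
    # Picking rows and columns of A is the same as computing M_k^T A M_k.
    A_k = [[A[r][c] for c in idx] for r in idx]
    n_k = len(idx)
    # L_k = Diag(A_k 1) - A_k
    L_k = [[(sum(A_k[r]) if r == c else 0) - A_k[r][c] for c in range(n_k)]
           for r in range(n_k)]
    return A_k, L_k


# Hypothetical example: line graph 1-2-3-4-5 and the cluster V_k = {2, 3, 4}.
N = 5
A = [[1 if abs(i - j) == 1 else 0 for j in range(N)] for i in range(N)]
A_k, L_k = subgraph_laplacian(A, [2, 3, 4])
row_sums = [sum(row) for row in L_k]
```

Every row of $\boldsymbol{\mathbf{\mathsf{L}}}_{k}$ sums to zero, which is why $\boldsymbol{\mathbf{\mathsf{r}}}_{k}$ is the normalized eigenvector associated with the zero eigenvalue.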

Theorem 3.1.

(Asymptotic convergence of (7) over connected graphs when the local costs are convex): Let every $\mathcal{G}_{k}$ , $k\in\mathbb{Z}_{1}^{p}$ , be a connected graph and Assumption 3.1 hold. For every $k\in\mathbb{Z}_{1}^{p}$ , suppose $\{\bar{\mathsf{b}}_{k}^{l}\}_{l\in\mathcal{V}_{k}}\subset{\mathbb{R}}$ is defined such that $\sum_{l\in\mathcal{V}_{k}}{\bar{\mathsf{b}}}_{k}^{l}=\mathsf{b}_{k}$ . Then, for each $i\in\mathcal{V}$ , $l\in\mathcal{V}_{k}$ , starting from $\boldsymbol{\mathbf{x}}^{i}(0)\in{\mathbb{R}}^{n^{i}}$ and $y^{l}_{k}(0),v^{l}_{k}(0)\in{\mathbb{R}}$ with $\sum_{l\in\mathcal{V}_{k}}y^{l}_{k}(0)\!=\!0$ , the algorithm (7) for any $\rho^{i}\in{\mathbb{R}}_{>0}$ , makes $t\mapsto(\{\boldsymbol{\mathbf{v}}_{k}(t)\}_{k=1}^{p},\{\boldsymbol{\mathbf{x}}^{i}(t)\}_{i=1}^{N})$ converge asymptotically to $(\,\{{{\nu}}_{k}^{\star}\boldsymbol{\mathbf{1}}_{N_{k}}\}_{k=1}^{p},\{{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}}\}_{i=1}^{N})$ , where $(\{{\nu}_{k}^{\star}\}_{k=1}^{p},\{{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}}\}_{i=1}^{N})$ is a point satisfying the KKT conditions (5) of problem (4). $\Box$

The initialization condition $\sum_{l\in\mathcal{V}_{k}}y^{l}_{k}(0)=0$ of Theorem 3.1 is trivially satisfied by every agent $l\in\mathcal{V}_{k}$ , $k\in\mathbb{Z}_{1}^{p}$ , using $y^{l}_{k}(0)=0$ . The asymptotic convergence guarantee for algorithm (7) in Theorem 3.1 is established for local convex cost functions. For such cost functions, similar to the centralized algorithm (6), (7) is not guaranteed to converge when $\rho^{i}=0$ for all $i\in\mathcal{V}$ . Next, we show that if the local costs are strongly convex and have Lipschitz gradients, then the convergence is in fact exponentially fast for $\rho^{i}\in{\mathbb{R}}_{>0}$ , $i\in\mathcal{V}$ . Recall that for strongly convex local cost functions, the minimizer of (4) is unique.

Theorem 3.2.

(Exponential convergence of (7) over connected graphs when the local costs are strongly convex and have Lipschitz gradients): Let every $\mathcal{G}_{k}$ , $k\!\in\!\mathbb{Z}_{1}^{p}$ , be connected and Assumption 3.1 hold. Also, assume each cost function $f^{i}_{l}$ , $l\!\in\!\mathbb{Z}_{1}^{n^{i}}$ , $i\!\in\!\mathcal{V}$ , is $m^{i}_{l}$ -strongly convex and has $M^{i}_{l}$ -Lipschitz gradient. Let $m\!=\!\min\{\{m^{i}_{l}\}_{l=1}^{n^{i}}\}_{i=1}^{N}\in{\mathbb{R}}_{>0}$ and $M\!=\!\max\{\{M^{i}_{l}\}_{l=1}^{n^{i}}\}_{i=1}^{N}\in{\mathbb{R}}_{>0}$ . Then, starting from $\boldsymbol{\mathbf{x}}^{i}(0)\!\in\!{\mathbb{R}}^{n^{i}}$ and $y_{k}^{l}(0),v^{l}_{k}(0)\!\in\!{\mathbb{R}}$ for each $i\!\in\!\mathcal{V}$ , $l\in\mathcal{V}_{k}$ , and given $\sum_{l\in\mathcal{V}_{k}}y^{l}_{k}(0)\!=\!0$ and $\sum\nolimits_{l\in\mathcal{V}_{k}}\bar{\mathsf{b}}_{k}^{l}\!=\!\mathsf{b}_{k}$ in (7), the algorithm (7) makes $t\mapsto(\{\boldsymbol{\mathbf{v}}_{k}(t)\}_{k=1}^{p},\{\boldsymbol{\mathbf{x}}^{i}(t)\}_{i=1}^{N})$ converge exponentially fast to $(\,\{{{\nu}}_{k}^{\star}\boldsymbol{\mathbf{1}}_{N_{k}}\}_{k=1}^{p},\{{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}}\}_{i=1}^{N})$ for any $\rho^{i}\in{\mathbb{R}}_{>0}$ , where $(\{{\nu}_{k}^{\star}\}_{k=1}^{p},\{{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}}\}_{i=1}^{N})$ is the unique solution of the KKT conditions (5) of problem (4). Moreover, when $\rho^{i}=0$ for an $i\in\mathcal{V}$ , the convergence to the unique solution of the KKT conditions (5) is asymptotic. $\Box$

The proof of Theorem 3.2 is given in Appendix A.

Remark 3.2.

*(The convergence of (7) over dynamically changing connected graphs)* The proof of Theorem 3.2 relies on a Lyapunov function that is independent of the system parameters and whose derivative for $\rho^{i}\in{\mathbb{R}}_{>0}$ , $i\in\mathcal{V}$ , is negative definite with a quadratic upper bound. Hence, we can also show that the algorithm (7), when $\rho^{i}\in{\mathbb{R}}_{>0}$ for $i\in\mathcal{V}$ , converges exponentially fast to the unique solution of the KKT conditions (5) of problem (4) over any time-varying topology $\mathcal{G}_{k}$ , $k\in\mathbb{Z}_{1}^{p}$ , that is connected at all times and whose adjacency matrix is uniformly bounded and piecewise constant.

3.1 Problem subject to both equality and inequality constraints

To address inequality constraints, we use a penalty function method to eliminate the local inequality constraints (1c) and (1d). That is, we seek to solve

[TABLE]

with

[TABLE]

$i\in\mathcal{V}$ , where $\gamma\in{\mathbb{R}}_{>0}$ is the weight of the smooth penalty function $p_{\epsilon}(y)=\begin{cases}0,&y\leq 0,\\ \frac{1}{2\epsilon}y^{2},&0\leq y\leq\epsilon,\\ y-\frac{1}{2}\epsilon,&y\geq\epsilon,\end{cases}$ for some $\epsilon\in{\mathbb{R}}_{>0}$ . This approach allows us to use algorithm (7) to solve the optimization problem (1) by using $f^{i}_{\text{p}}(\boldsymbol{\mathbf{x}}^{i})$ in place of $f^{i}(\boldsymbol{\mathbf{x}}^{i})$ in (7c). We note that $f^{i}_{\text{p}}(\boldsymbol{\mathbf{x}}^{i})$ is convex and differentiable if $f^{i}(\boldsymbol{\mathbf{x}}^{i})$ is a convex and differentiable function in ${\mathbb{R}}^{n^{i}}$ . Following this penalty method approach, when the global cost function of (1) is evaluated at the limit point of algorithm (7), it lies in an $\epsilon$ -order neighborhood of the global optimal value of the optimization problem (1) (see Proposition 3.1 below). In what follows, we investigate when the penalty function weight $\gamma$ has a finite value and give a well-defined admissible range for it.
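For concreteness, the piecewise definition of $p_{\epsilon}$ can be written as a short function. The sketch below is only illustrative: the function names are hypothetical, and the derivative is not stated in the text but is obtained here by differentiating each branch, which shows that the two pieces match at $y=0$ and $y=\epsilon$ .

```python
def p_eps(y, eps):
    """Smooth penalty: 0 for y <= 0, quadratic on [0, eps], affine for y >= eps."""
    if y <= 0.0:
        return 0.0
    if y <= eps:
        return y * y / (2.0 * eps)
    return y - 0.5 * eps


def dp_eps(y, eps):
    """Derivative of p_eps (derived branch by branch; continuous everywhere)."""
    if y <= 0.0:
        return 0.0
    if y <= eps:
        return y / eps
    return 1.0


eps = 0.1
values = [p_eps(y, eps) for y in (-1.0, 0.0, 0.05, 0.1, 1.0)]
slopes = [dp_eps(y, eps) for y in (-1.0, 0.0, 0.05, 0.1, 1.0)]
```

The derivative never exceeds one, so the penalty grows like the non-smooth function $\max\{0,y\}$ outside the $\epsilon$ -wide transition region.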

Given Assumption 3.1, the Slater condition [39] is satisfied. Thus, the KKT conditions below give a set of necessary and sufficient conditions that characterize the solution set of the convex optimization problem (1).

Lemma 3.2.

*(Solution set of (1) [39]):* Consider the constrained optimization problem (1) under Assumption 3.1. A point $\boldsymbol{\mathbf{\mathsf{x}}}^{\star}\in{\mathbb{R}}^{m}$ is a solution of (1) if and only if there exist ${\boldsymbol{\mathbf{\mathsf{\nu}}}}^{\star}\in{\mathbb{R}}^{p}$ , $\{\underline{{\mu}\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}\}_{l\in\underline{\mathcal{B}}^{i}}\subset{\mathbb{R}}_{\geq 0}$ and $\{\bar{{\mu}}_{l}^{i\star}\}_{l\in\bar{\mathcal{B}}^{i}}\subset{\mathbb{R}}_{\geq 0}$ , $i\in\mathcal{V}$ , such that

[TABLE]

where $\underline{\boldsymbol{\mathbf{\mu}}\mkern-2.0mu}\mkern 2.0mu^{i\star}=[\underline{\mu\mkern-2.0mu}\mkern 2.0mu_{1}^{i\star},\cdots,\underline{\mu\mkern-2.0mu}\mkern 2.0mu_{n^{i}}^{i\star}]^{\top}$ with $\underline{\mu\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}=0$ for $l\in\mathbb{Z}_{1}^{n^{i}}\backslash\underline{\mathcal{B}}^{i}$ and $\bar{\boldsymbol{\mathbf{\mu}}}^{i\star}=[\bar{\mu}_{1}^{i\star},\cdots,\bar{\mu}_{n^{i}}^{i\star}]^{\top}$ with $\bar{\mu}_{l}^{i\star}=0$ for $l\in\mathbb{Z}_{1}^{n^{i}}\backslash\bar{\mathcal{B}}^{i}$ . If the local cost functions are strongly convex, then the optimization problem (1) has a unique solution. $\Box$

Let $X^{\epsilon}_{\text{fe}}$ be the $\epsilon$ -feasible set of optimization problem (1),

[TABLE]

The result below states that for some admissible values of $\gamma$ , the minimizer of problem (11) belongs to the $\epsilon$ -feasible set $X^{\epsilon}_{\text{fe}}$ and the optimal value of the penalized problem (11) is in an $\epsilon$ -order neighborhood of the optimal value of the original optimization problem (1).

Proposition 3.1.

*(relationship between the solution of (1) and (11) [33]): *Let $(\boldsymbol{\mathbf{\mathsf{x}}}^{\star},{\boldsymbol{\mathbf{\mathsf{\nu}}}}^{\star},\{\underline{{\mu}\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}\}_{l\in\underline{\mathcal{B}}^{i}},\{\bar{{\mu}}_{l}^{i\star}\}_{l\in\bar{\mathcal{B}}^{i}})$ be any solution of the KKT equations (5). Let ${\boldsymbol{\mathbf{\mathsf{x}}}}_{\text{p}}^{\star}$ be a minimizer of optimization problem (11) for some $\gamma,\epsilon\in{\mathbb{R}}_{>0}$ . If $\gamma=\frac{1-N}{1-\sqrt{N}}\gamma^{\star}$ , where $\gamma^{\star}>\max\big{\{}\max\{\underline{\mu\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}\}_{l\in\underline{\mathcal{B}}^{i}},\max\{\bar{\mu}_{l}^{i\star}\}_{l\in\bar{\mathcal{B}}^{i}}\big{\}}_{i=1}^{N}$ , then

[TABLE]

where $f^{\star}=f(\boldsymbol{\mathbf{\mathsf{x}}}^{\star})$ is the optimal value of (1). $\Box$

We note that if $\epsilon\!\to\!0$ , we have $p_{\epsilon}(y)\!\to\!p(y)=\max\{0,y\}$ , where $p(y)$ is the well-known non-smooth penalty function [32] with exact equivalency guarantees when $\gamma\!>\!\gamma^{\star}$ in Proposition 3.1.

Remark 3.3.

*(Comment on the feasibility of the solution of (11))* The use of the $\epsilon$ -exact penalty function is motivated by keeping the cost smooth and differentiable, which is desirable from a practical perspective compared to the exact penalty method, whose penalty function is non-smooth. Using an $\epsilon$ -exact penalty function, we have the guarantee that the approximate solution $\boldsymbol{\mathbf{\mathsf{x}}}^{\star}_{\text{p}}$ belongs to the $\epsilon$ -feasible set $X^{\epsilon}_{\text{fe}}$ . Therefore, only the inequality constraints may be violated, and by at most an $\epsilon$ amount. Since the value of $\epsilon$ can be selected very small, the possible violation of the inequality constraints will be small too. One may select the value of $\epsilon$ in accordance with the expected accuracy of the algorithm. Note that by slightly tightening the inequality constraints according to ${x}^{i}_{l}\leq\bar{\mathsf{x}}^{i}_{l}-\epsilon$ and $\underline{\mathsf{x}\mkern-2.0mu}\mkern 2.0mu^{i}_{l}+\epsilon\leq{x}^{i}_{l}$ and using these adjusted inequalities in the penalty function, we can guarantee that ${\boldsymbol{\mathbf{\mathsf{x}}}}_{\text{p}}^{\star}\in X_{\text{fe}}$ . However, this may result in a slight increase in the optimality gap in (15).

Considering Proposition 3.1, a practical and numerically well-posed solution via the penalty optimization method (11) is achieved when the Lagrange multipliers are bounded. Thus, in what follows we seek a bound ${\mu}_{\text{bound}}$ in

[TABLE]

with the objective of choosing a penalty function weight $\gamma$ that satisfies the condition set by Proposition 3.1 by setting $\gamma\geq\frac{1-N}{1-\sqrt{N}}\,{\mu}_{\text{bound}}$ .
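The prefactor in this condition simplifies for $N>1$ , which may make its scaling with the number of agents easier to read. The step below is elementary algebra added for clarity:

```latex
\frac{1-N}{1-\sqrt{N}}
  = \frac{(1-\sqrt{N})(1+\sqrt{N})}{1-\sqrt{N}}
  = 1+\sqrt{N}, \qquad N>1,
\qquad\text{so that}\qquad
\gamma \geq (1+\sqrt{N})\,\mu_{\text{bound}} .
```

In other words, the admissible penalty weight grows only with the square root of the number of agents.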

For any solution of the KKT conditions (5), we let $\underline{\mathcal{A}\mkern-2.0mu}\mkern 2.0mu^{i}\subset\underline{\mathcal{B}\mkern-2.0mu}\mkern 2.0mu^{i}$ and $\bar{\mathcal{A}^{i}}\subset\bar{\mathcal{B}^{i}}$ respectively be the set of indices of the active lower bound and the active upper bound inequality constraints of agent $i\in\mathcal{V}$ . We note that $\underline{\mathcal{A}\mkern-2.0mu}\mkern 2.0mu^{i}\cap\bar{\mathcal{A}}^{i}=\{\}$ . Because for inactive inequalities $\bar{\mu}^{i\star}_{l}=0$ (resp. $\underline{\mu\mkern-2.0mu}\mkern 2.0mu^{i\star}_{l}=0$ ) for $l\in\bar{\mathcal{B}}^{i}\backslash\bar{\mathcal{A}^{i}}$ and $i\in\mathcal{V}$ (resp. $l\in\underline{\mathcal{B}\mkern-2.0mu}\mkern 2.0mu^{i}\backslash\underline{\mathcal{A}\mkern-2.0mu}\mkern 2.0mu^{i}$ ) [40], we obtain

[TABLE]

Therefore, to find ${\mu}_{\text{bound}}$ , it suffices to find an upper bound on $\max\big{\{}\max\{\underline{\mu\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}\}_{l\in\underline{\mathcal{A}}^{i}},\max\{\bar{\mu}_{l}^{i\star}\}_{l\in\bar{\mathcal{A}}^{i}}\big{\}}_{i=1}^{N}$ .

It is known that the set of the Lagrange multipliers of an optimization problem of form (1) is nonempty and bounded if and only if the Mangasarian-Fromovitz constraint qualification (MFCQ) holds [41]. It is straightforward to show that the MFCQ condition is satisfied for a resource allocation problem of form (1) with one equality constraint (i.e., $p=1$ ) and upper and lower bounded decision variables (i.e., $\underline{\mathcal{B}}^{i}=\bar{\mathcal{B}}^{i}=\mathbb{Z}_{1}^{n_{i}}$ ). For such a problem the following result specifies a ${\mu}_{\text{bound}}$ that satisfies (16).

Proposition 3.2.

*( ${\mu}_{\text{bound}}$ for the resource allocation problem with one equality constraint and bounded decision variables): *Consider problem (1) under Assumption 3.1 when $p=1$ , $\mathsf{w}^{i}_{l}>0$ for $l\in\{1,\cdots,n^{i}\}$ and $\underline{\mathcal{B}}^{i}=\bar{\mathcal{B}}^{i}=\mathbb{Z}_{1}^{n_{i}}$ , $i\in\mathcal{V}$ . Let $(\boldsymbol{\mathbf{\mathsf{x}}}^{\star},\nu^{\star},\{\underline{{\mu}\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}\}_{l\in\underline{\mathcal{B}}^{i}},\{\bar{{\mu}}_{l}^{i\star}\}_{l\in\bar{\mathcal{B}}^{i}})$ be an arbitrary solution of the KKT conditions (5) for this problem. Then, ${\mu}_{\text{bound}}$ in (16) satisfies

[TABLE]

where $X^{i}_{\text{ineq}}=\{\boldsymbol{\mathbf{x}}^{i}\in{\mathbb{R}}^{n^{i}}|\,\underline{\mathsf{x}}^{i}_{l}\leq{x}^{i}_{l}\leq\bar{\mathsf{x}}^{i}_{l},l\in\mathbb{Z}_{1}^{n^{i}}\}$ , $\underline{\mathsf{w}}=\min\{\{\mathsf{w}^{i}_{l}\}_{l=1}^{n_{i}}\}_{i=1}^{N}$ and $\bar{\mathsf{w}}=\max\{\{\mathsf{w}^{i}_{l}\}_{l=1}^{n_{i}}\}_{i=1}^{N}$ .

Proof.

For any given $(\boldsymbol{\mathbf{\mathsf{x}}}^{\star},\nu^{\star},\{\underline{{\mu}\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}\}_{l\in\underline{\mathcal{B}}^{i}},\{\bar{{\mu}}_{l}^{i\star}\}_{l\in\bar{\mathcal{B}}^{i}})$ , we note that the KKT conditions (5) can be written as

[TABLE]

Since $\{\mathsf{w}^{i}_{l}\}_{l=1}^{n^{i}}\subset{\mathbb{R}}_{>0}$ , it follows from Assumption 3.1, which states that the feasible set is non-empty for strict local inequalities, that the upper bounds (similarly the lower bounds) for all decision variables cannot be active simultaneously. Therefore, for any given minimizer, we have either (a) for at least one subagent $k\in\mathbb{Z}_{1}^{n^{i}}$ in an agent $i\in\mathcal{V}$ we have $\underline{\mathsf{x}}^{i}_{k}<\mathsf{x}^{i\star}_{k}<\bar{\mathsf{x}}^{i}_{k}$ or (b) some of the decision variables are equal to their upper bound and the remaining ones are equal to their lower bound. If case (a) holds, it follows from (19a) that $\nu^{\star}=\frac{-\nabla f^{i}_{k}(\mathsf{x}^{i\star}_{k})}{\mathsf{w}^{i}_{k}}$ , which guarantees that $|\nu^{\star}|\leq\frac{\max\{\|\nabla f^{i}(\boldsymbol{\mathbf{\mathsf{x}}}^{i\star})\|_{\infty}\}_{i=1}^{N}}{\underline{\mathsf{w}}}$ . On the other hand, if (b) holds, then there exists at least an agent $k\in\mathcal{V}$ with $\bar{\mathcal{A}}^{k}\neq\{\}$ and an agent $j\in\mathcal{V}$ with $\underline{\mathcal{A}}^{j}\neq\{\}$ ( $k=j$ is possible). Therefore, for $l\in\bar{\mathcal{A}}^{k}$ it follows from (19b) that $\nu^{\star}=\frac{1}{\mathsf{w}^{k}_{l}}(-\nabla f^{k}_{l}(\mathsf{x}^{k\star}_{l})-\bar{\mu}^{k\star}_{l})$ , and for $\bar{l}\in\underline{\mathcal{A}}^{j}$ it follows from (19c) that $\nu^{\star}=\frac{1}{\mathsf{w}^{j}_{\bar{l}}}(-\nabla f^{j}_{\bar{l}}(\mathsf{x}^{j\star}_{\bar{l}})+\underline{\mu\mkern-2.0mu}\mkern 2.0mu^{j\star}_{\bar{l}})$ . 
Consequently, because $\bar{\mu}^{k\star}_{l}\geq 0$ and $\underline{\mu\mkern-2.0mu}\mkern 2.0mu^{j\star}_{\bar{l}}\geq 0$ , we conclude that $-\frac{1}{\mathsf{w}^{j}_{\bar{l}}}\nabla f^{j}_{\bar{l}}(\mathsf{x}^{j\star}_{\bar{l}})\leq\nu^{\star}\leq-\frac{1}{\mathsf{w}^{k}_{l}}\nabla f^{k}_{l}(\mathsf{x}^{k\star}_{l})$ , which leads to $|\nu^{\star}|\leq\max\{|\frac{\nabla f^{j}_{\bar{l}}(\mathsf{x}^{j\star}_{\bar{l}})}{{\mathsf{w}}^{j}_{\bar{l}}}|,|\frac{\nabla f^{k}_{l}(\mathsf{x}^{k\star}_{l})}{{\mathsf{w}}^{k}_{l}}|\}\leq\frac{\max\{\|\nabla f^{i}(\boldsymbol{\mathbf{\mathsf{x}}}^{i\star})\|_{\infty}\}_{i=1}^{N}}{\underline{\mathsf{w}}}$ . Therefore, we conclude that for any given $(\boldsymbol{\mathbf{\mathsf{x}}}^{\star},\nu^{\star},\{\underline{{\mu}\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}\}_{l\in\underline{\mathcal{B}}^{i}},\{\bar{{\mu}}_{l}^{i\star}\}_{l\in\bar{\mathcal{B}}^{i}})$ , we have $|\nu^{\star}|\leq\frac{\max\{\|\nabla f^{i}(\boldsymbol{\mathbf{\mathsf{x}}}^{i\star})\|_{\infty}\}_{i=1}^{N}}{\underline{\mathsf{w}}}\leq\frac{\max\big{\{}\underset{\boldsymbol{\mathbf{x}}^{i}\in X^{i}_{\text{ineq}}}{\max}\,{\|\nabla f^{i}(\boldsymbol{\mathbf{x}}^{i})\|_{\infty}}\big{\}}_{i=1}^{N}}{{\underline{\mathsf{w}\mkern-2.0mu}\mkern 2.0mu}}$ . Consequently, it follows from (19b) that $\bar{\mu}^{i\star}_{l}\leq|\nabla f^{i}_{l}(\mathsf{x}^{i\star}_{l})|\!+|\mathsf{w}^{i}_{l}\,\nu^{\star}|\leq\|\nabla f^{i}_{l}(\mathsf{x}^{i\star}_{l})\|_{\infty}+\bar{\mathsf{w}}|\nu^{\star}|$ , and from (19c) that $\underline{\mu\mkern-2.0mu}\mkern 2.0mu^{i\star}_{l}\leq\|\nabla f^{i}_{l}(\mathsf{x}^{i\star}_{l})\|_{\infty}\!+|\mathsf{w}^{i}_{l}\,\nu^{\star}|\leq\|\nabla f^{i}_{l}(\mathsf{x}^{i\star}_{l})\|_{\infty}+\bar{\mathsf{w}}|\nu^{\star}|$ . Therefore, given (3.1), we have the guarantee that (18) holds.

To compute the upper-bound in (18) in a distributed manner, agents can run a set of max-consensus algorithms.
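The text does not specify which max-consensus protocol is meant, so the following is only a sketch of one common discrete-time variant, in which every agent repeatedly replaces its value with the maximum over itself and its neighbors. The function name, the graph, and the local values are assumptions made for this example.

```python
def max_consensus(values, neighbors, rounds):
    """Synchronous max-consensus over an undirected graph.

    values:    dict agent -> local scalar (e.g., a local gradient bound)
    neighbors: dict agent -> list of neighboring agents
    rounds:    number of synchronous update rounds to run
    """
    z = dict(values)
    for _ in range(rounds):
        # Each agent keeps the largest value seen among itself and its neighbors.
        z = {i: max([z[i]] + [z[j] for j in neighbors[i]]) for i in z}
    return z


# Hypothetical path graph 1-2-3-4 with arbitrary local values.
neighbors = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
local = {1: 0.7, 2: 2.5, 3: 1.1, 4: 0.2}
# On a connected graph, a number of rounds equal to the graph diameter suffices.
result = max_consensus(local, neighbors, rounds=3)
```

After the last round every agent holds the network-wide maximum, which is the kind of quantity that appears in the numerator and in $\bar{\mathsf{w}}$ of the bound; a min-consensus for $\underline{\mathsf{w}}$ follows by negating the values.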

To demonstrate the tightness of the bound in (18), consider the following numerical example

[TABLE]

in which the local cost functions are quadratic, $f^{i}({x}^{i})=\alpha_{i}({x}^{i})^{2}+\beta_{i}{x}^{i}+\gamma_{i}$ , where the parameters are chosen randomly according to $\alpha_{i}\in(0,1]$ , $\beta_{i}\in(0,3]$ , $\gamma_{i}\in(0,4]$ , $b\in(0,4]$ . The affine constraint weights are also chosen randomly according to $w_{i}\in(0,2]$ . For this problem, the exact values of the Lagrange multipliers can be found by solving the KKT equations. To do so, we use the fmincon function of MATLAB to obtain the optimal solution and then compute the corresponding Lagrange multipliers from the KKT conditions. Table 1 shows the values of $\mu_{\max}$ , the maximum of the Lagrange multipliers, and the values of $\mu_{\text{bound}}$ in (18) for five different runs. As we can see, for this problem the values of $\mu_{\text{bound}}$ are at most one order of magnitude larger than $\mu_{\max}$ .
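For scalar quadratic costs on box constraints, a bound of this type is simple to evaluate. The sketch below assumes the bound takes the form $(1+\bar{\mathsf{w}}/\underline{\mathsf{w}})\,G$ , with $G$ the largest gradient magnitude over the boxes, which is what the chain of inequalities in the proof of Proposition 3.2 yields; the exact expression in (18) should be checked against the statement. All numerical values are hypothetical and are not the ones used for Table 1.

```python
def grad_sup_on_box(alpha, beta, lo, hi):
    """Max of |2*alpha*x + beta| over [lo, hi]; attained at an endpoint
    because the gradient of a scalar quadratic is affine in x."""
    return max(abs(2.0 * alpha * lo + beta), abs(2.0 * alpha * hi + beta))


def mu_bound(alphas, betas, los, his, ws):
    """Assumed form of the bound: (1 + w_max / w_min) * G."""
    G = max(grad_sup_on_box(a, b, lo, hi)
            for a, b, lo, hi in zip(alphas, betas, los, his))
    return (1.0 + max(ws) / min(ws)) * G


# Hypothetical two-agent instance with f^i(x) = alpha_i x^2 + beta_i x + gamma_i.
alphas = [0.5, 1.0]
betas = [1.0, 2.0]
los = [0.0, 0.0]    # lower bounds on x^i
his = [1.0, 2.0]    # upper bounds on x^i
ws = [1.0, 2.0]     # positive constraint weights w_i

bound = mu_bound(alphas, betas, los, his, ws)
```

The constant terms $\gamma_{i}$ of the costs do not enter, since only gradients and constraint weights appear in the inequalities of the proof.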

Evaluating the MFCQ condition is generally challenging for other classes of optimization problems. A common sufficient condition for the MFCQ is the linear independence constraint qualification (LICQ), which also guarantees the uniqueness of the Lagrange multipliers for any solution of the optimization problem (1) [42] (see [12] and [43] for examples of optimization solvers that are developed under the assumption that the LICQ holds). For a constrained optimization problem we say that the LICQ holds at the optimal solution $\boldsymbol{\mathbf{\mathsf{x}}}^{\star}\in{\mathbb{R}}^{m}$ if the gradients of the equality constraints and the active inequality constraints at $\boldsymbol{\mathbf{\mathsf{x}}}^{\star}$ are linearly independent. The following result finds a ${\mu}_{\text{bound}}$ for problem (1) when the LICQ condition holds at the minimizers.

Theorem 3.3.

*(Bounds on the Lagrange multipliers corresponding to inequality constraints when the LICQ holds at the minimizers): *Consider problem (1) under Assumption 3.1. Assume also that the LICQ holds at the minimizers of (1). Let $(\boldsymbol{\mathbf{\mathsf{x}}}^{\star},{\boldsymbol{\mathbf{\mathsf{\nu}}}}^{\star},\{\underline{{\mu}\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}\}_{l\in\underline{\mathcal{B}}^{i}},\{\bar{{\mu}}_{l}^{i\star}\}_{l\in\bar{\mathcal{B}}^{i}})$ be an arbitrary solution of the KKT conditions (5) for this problem. Then, the bound ${\mu}_{\text{bound}}$ in (16) satisfies

[TABLE]

where $\bar{\mathsf{w}}\!=\!\|\boldsymbol{\mathbf{\mathsf{W}}}\|_{\max}=\max\{\|\boldsymbol{\mathbf{\mathsf{w}}}^{i}\|_{\max}\}_{i=1}^{N}$ , and $\omega\!=\!\min\{\sigma_{\min}(\boldsymbol{\mathbf{\mathsf{W}}}_{c})\,\big{|}\,\boldsymbol{\mathbf{\mathsf{W}}}_{c}\!\in\!\mathpzc{Q}(\boldsymbol{\mathbf{\mathsf{W}}}^{\top})\,\}$ . Here, $\mathpzc{Q}(\boldsymbol{\mathbf{\mathsf{W}}}^{\top})$ is the set of all the invertible $p\times p$ sub-matrices of $\boldsymbol{\mathbf{\mathsf{W}}}^{\top}\in{\mathbb{R}}^{m\times p}$ (recall (2)).

Proof.

For any $(\boldsymbol{\mathbf{\mathsf{x}}}^{\star},{\boldsymbol{\mathbf{\mathsf{\nu}}}}^{\star},\{\underline{{\mu}\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}\}_{l\in\underline{\mathcal{B}}^{i}},\{\bar{{\mu}}_{l}^{i\star}\}_{l\in\bar{\mathcal{B}}^{i}})$ , we note that the KKT conditions (5) can be written as

[TABLE]

$i\in\mathcal{V}$ . Under the LICQ assumption, the gradients of the equality constraints (a set of $p$ vectors in ${\mathbb{R}}^{m}$ ) and the active inequality constraints (a set of $\sum_{i=1}^{N}|\underline{\mathcal{A}}^{i}\cup\bar{\mathcal{A}}^{i}|$ vectors in ${\mathbb{R}}^{m}$ ) at the minimizer should be linearly independent. This necessitates that $\sum_{i=1}^{N}|\underline{\mathcal{A}}^{i}\cup\bar{\mathcal{A}}^{i}|\leq m-p$ . As a result, we can conclude that $q=\sum_{i=1}^{N}|\mathbb{Z}_{1}^{n^{i}}\backslash(\bar{\mathcal{A}}^{i}\cup{\underline{\mathcal{A}\mkern-2.0mu}\mkern 2.0mu^{i}})|\geq p$ . Thus, the number of KKT equations of the form (21a) is $q\geq p$ , and we can write all these $q$ equations as

[TABLE]

where $\boldsymbol{\mathbf{\mathsf{W}}}_{e}\in{\mathbb{R}}^{p\times q}$ is a sub-matrix of $\boldsymbol{\mathbf{\mathsf{W}}}\in{\mathbb{R}}^{p\times m}$ . Recall that under the LICQ assumption $({\boldsymbol{\mathbf{\mathsf{\nu}}}}^{\star}\in{\mathbb{R}}^{p},\{\underline{{\mu}\mkern-2.0mu}\mkern 2.0mu_{l}^{i\star}\}_{l\in\underline{\mathcal{B}}^{i}},\{\bar{{\mu}}_{l}^{i\star}\}_{l\in\bar{\mathcal{B}}^{i}})$ corresponding to every $\boldsymbol{\mathbf{\mathsf{x}}}^{\star}$ is unique. Thus, $\operatorname{rank}(\boldsymbol{\mathbf{\mathsf{W}}}_{e}^{\top})=p$ and there always exists a sub-matrix $\boldsymbol{\mathbf{\mathsf{W}}}_{se}\in{\mathbb{R}}^{p\times p}$ of $\boldsymbol{\mathbf{\mathsf{W}}}_{e}^{\top}\in{\mathbb{R}}^{q\times p}$ such that

[TABLE]

where $\boldsymbol{\mathbf{J}}$ collects the components of $[\{\{\nabla f^{i}_{l}(\mathsf{x}^{i\star}_{l})\}_{l=1}^{n_{i}}\}_{i=1}^{N}]$ associated with the rows of $\boldsymbol{\mathbf{\mathsf{W}}}_{se}$ . Therefore, we can write

[TABLE]

where $\omega$ is defined in the statement. Here, we used $|\nabla f_{l}^{i}(\mathsf{x}^{i\star})|\!\leq\!\max\big{\{}\!\!\underset{\boldsymbol{\mathbf{x}}^{i}\in X^{i}_{\text{ineq}}}{\max}{\|\nabla f^{i}(\boldsymbol{\mathbf{x}}^{i})\|_{\infty}}\big{\}}_{i=1}^{N}$ , $l\!\in\!\mathbb{Z}_{1}^{n^{i}}$ , $i\!\in\!\mathcal{V}$ . On the other hand, given (21b) and (21c) we can write

[TABLE]

where $\bar{\mathsf{w}}$ is defined in the statement. Therefore, given (3.1) we have the guarantees that (20) holds.
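For small problems, the constant $\omega$ of Theorem 3.3 can be evaluated by brute-force enumeration of the $p\times p$ sub-matrices of $\boldsymbol{\mathbf{\mathsf{W}}}^{\top}$ . The sketch below is a hedged illustration: the function name, the numerical tolerance used to decide invertibility, and the matrix are assumptions, and the enumeration visits $\binom{m}{p}$ sub-matrices, so it is practical only for modest $m$ and $p$ .

```python
from itertools import combinations

import numpy as np


def omega(W, tol=1e-12):
    """Smallest sigma_min over all invertible p x p sub-matrices of W^T.

    W is the p x m constraint matrix. A sub-matrix is treated as invertible
    when its smallest singular value exceeds tol (an assumed threshold).
    Returns None if no invertible sub-matrix exists.
    """
    W = np.atleast_2d(np.asarray(W, dtype=float))
    p, m = W.shape
    best = None
    for rows in combinations(range(m), p):
        sub = W.T[list(rows), :]                        # p rows of W^T
        s_min = np.linalg.svd(sub, compute_uv=False)[-1]
        if s_min > tol:
            best = s_min if best is None else min(best, s_min)
    return best


# Hypothetical constraint matrix with p = 2 equality constraints, m = 3 variables.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
w_val = omega(W)
```

In this instance the sub-matrix formed by the first and third rows of $\boldsymbol{\mathbf{\mathsf{W}}}^{\top}$ is singular and is skipped, while the other two sub-matrices both have smallest singular value one.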

4 Numerical examples

In what follows, we demonstrate the performance of algorithm (7) via two numerical examples.

As a first demonstrative example, we consider the in-network resource allocation problem described in Fig. 1. We choose the parameters of the costs and the limits of generation of the generators randomly from the table below, which lists the parameters of the generators of the IEEE 118 bus test model [44], located at buses $(4,10,18,26,54,69)$ .

[TABLE]

Figure 2 shows the time history of the $x^{i}_{l}$ ’s generated by implementing the distributed optimization algorithm (7) (using $f^{i}_{\text{p}}(\boldsymbol{\mathbf{x}}^{i})$ as defined in (12) in place of $f^{i}(\boldsymbol{\mathbf{x}}^{i})$ in (7c)) in comparison to the solution obtained using MATLAB’s constrained optimization solver ‘fmincon’. As expected, the decision variable $\boldsymbol{\mathbf{x}}^{i}$ of each agent $i\in\{1,\dots,6\}$ converges closely to its corresponding minimizer, using $\epsilon=0.001$ . Figure 3 depicts the time history of the equality constraint violation, which, as shown, vanishes over time. For this problem, to generate the dual dynamics, the agents $\{1,\cdots,6\}$ maintain and communicate variables of order $\{1,1,2,2,1,1\}$ , respectively, when we implement algorithm (7). In contrast, if we implement the algorithms of [18, 19, 8, 20, 21, 22, 23, 24], the corresponding variables to generate the dual dynamics are of order $\{4,2,6,6,4,2\}$ .

For the second example, we consider a simple distributed self-localizing deployment problem concerned with the optimal deployment of $3$ sensors labeled $\text{S}^{i}$ , $i\in\{1,3,5\}$ , on a line to monitor a set of events that are horizontally located at $\boldsymbol{\mathbf{P}}\!=\![\{p_{i}\}_{i=1}^{10}]=[12,11,9,3,2,-1,-2,-8,-11,-13]$ for $t\in[0,100)$ , and $\boldsymbol{\mathbf{P}}\!=\![\{p_{i}\}_{i=1}^{10}]\!=\![24,22,17,15,13,8,7,3,-2,-4]$ for $t\in[100,200)$ , see Fig. 4. Agent $1$ is monitoring $\{p_{i}\}_{i=1}^{3}$ , agent $3$ is monitoring $\{p_{i}\}_{i=4}^{7}$ , and agent $5$ is monitoring $\{p_{i}\}_{i=8}^{10}$ . The sensors should find their positions cooperatively so as to stay in the communication range of each other and to stay close to the targets to improve the detection accuracy. Due to the limited communication range, two relay nodes $\text{R}^{i}$ , $i\in\{2,4\}$ , as shown in Fig. 4, are used to guarantee the connectivity of the sensors during the operation. The problem is formulated as

[TABLE]

where $f^{i}(x^{i})=\sum_{j\in E^{i}}\|x^{i}-p_{j}\|^{2}$ for $i\in\{1,3,5\}$ with $E^{1}=\{1,\cdots,3\}$ , $E^{3}=\{4,\cdots,7\}$ and $E^{5}=\{8,\cdots,10\}$ and $f^{i}(x^{i})=0$ for $i\in\{2,4\}$ . Here, $x^{i}$ with $i\in\{1,3,5\}$ (resp. $i\in\{2,4\}$ ) is the horizontal position of sensor $\text{S}^{i}$ (resp. relay node $\text{R}^{i}$ ). To transform problem (24) to the standard form described in (1), we introduce slack variables $x^{i}_{2}\in{\mathbb{R}}$ with $i\in\{1,\cdots,4\}$ , to rewrite (24) as

[TABLE]

where $\boldsymbol{\mathbf{x}}^{i}\in{\mathbb{R}}^{2}$ for $i\in\{1,2,3,4\}$ , $\boldsymbol{\mathbf{x}}^{5}\in{\mathbb{R}}$ , and $f^{i}(\boldsymbol{\mathbf{x}}^{i})=f^{i}(x^{i}_{1})$ for any $i\in\{1,\cdots,5\}$ , i.e., $f^{i}(x^{i}_{2})=0$ . We can run algorithm (7) by choosing the cyber layer equivalent to the physical connected topology between all the agents, i.e., $\mathcal{G}_{k}=\mathcal{G}$ for $k\in\{1,2,3,4\}$ , where $\mathcal{G}$ is the line graph connecting all $5$ agents. However, as stated earlier, this configuration leads to extra computational and communication effort. Here, instead, we form $4$ cyber-layers $\mathcal{G}_{k}$ , $k\in\{1,2,3,4\}$ , where $\mathcal{V}_{1}=\{1,2\}$ , $\mathcal{V}_{2}=\{2,3\}$ , $\mathcal{V}_{3}=\{3,4\}$ and $\mathcal{V}_{4}=\{4,5\}$ . We note that our proposed approach of forming the cyber-layers in correspondence with the equality constraints leads to an efficient communication topology here. More specifically, to generate the dual dynamics, the agents $\{1,\cdots,5\}$ maintain and communicate variables of order $\{1,2,2,2,1\}$ , respectively. In contrast, if we implement the algorithms of [19, 21], the corresponding variables to generate the dual dynamics are of order $\{8,8,8,8,4\}$ .
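The two sets of counts quoted above follow directly from $|\mathcal{T}^{i}|$ and from $n^{i}\times p$ , as the short sketch below reproduces. The function and variable names are illustrative; the cluster sets and dimensions are the ones stated in the text.

```python
def clusters_per_agent(V, agents):
    """T^i = {k : i in V_k}, the constraints whose cyber-layer contains agent i."""
    return {i: [k for k, members in sorted(V.items()) if i in members]
            for i in agents}


# Cyber-layers of the deployment example: V_1 = {1,2}, ..., V_4 = {4,5}.
V = {1: {1, 2}, 2: {2, 3}, 3: {3, 4}, 4: {4, 5}}
agents = [1, 2, 3, 4, 5]
n = {1: 2, 2: 2, 3: 2, 4: 2, 5: 1}   # x^i in R^2 for i = 1..4, x^5 in R
p = len(V)                            # four equality constraints

T = clusters_per_agent(V, agents)
cluster_cost = [len(T[i]) for i in agents]        # dual copies under (7)
per_component_cost = [n[i] * p for i in agents]   # one copy per subagent and constraint
```

The first list is $\{1,2,2,2,1\}$ and the second is $\{8,8,8,8,4\}$ , matching the orders reported for the cluster-based algorithm and for the per-component alternative.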

Figure 5 shows the trajectory of the distributed optimization algorithm (7) (using $f^{i}_{\text{p}}(\boldsymbol{\mathbf{x}}_{1}^{i})$ as defined in (12) in place of $f^{i}(\boldsymbol{\mathbf{x}}^{i})$ in (7c)) for problem (25). As shown, the locations of the sensors remain within their communication range and converge to the optimal values during the execution of the algorithm (the optimal solution is shown by the grey lines and is obtained by MATLAB’s constrained optimization solver ‘fmincon’). Our smooth penalty function (12) uses $\gamma=200$ and $\epsilon=0.01$, which satisfy the condition of Proposition 3.1. What is interesting to note in Fig. 5 is how the convergence of the algorithm slows down when we use $\mathcal{G}_{k}=\mathcal{G}$ for $k\in\mathbb{Z}_{1}^{4}$. This is expected, as in this case the coordination to generate the dual variables has to happen over a larger graph.
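As a rough point of reference for the two time windows, each sensor cost $f^{i}(x^{i})=\sum_{j\in E^{i}}\|x^{i}-p_{j}\|^{2}$ taken in isolation is minimized at the mean of the event positions assigned to that sensor. The sketch below computes these unconstrained minimizers for both event sets. It ignores the coupling constraints of (24), so its output is not the constrained optimum reported by ‘fmincon’; the names `events`, `assigned` and `unconstrained_minimizers` are illustrative assumptions.

```python
# Event positions P for the two time windows, as given in the example.
events = {
    "first": [12, 11, 9, 3, 2, -1, -2, -8, -11, -13],   # t in [0, 100)
    "second": [24, 22, 17, 15, 13, 8, 7, 3, -2, -4],    # t in [100, 200)
}

# Zero-based index ranges for E^1 = {1..3}, E^3 = {4..7}, E^5 = {8..10}.
assigned = {1: range(0, 3), 3: range(3, 7), 5: range(7, 10)}


def unconstrained_minimizers(positions):
    """Mean of the assigned events: the minimizer of each f^i on its own."""
    result = {}
    for sensor, indices in assigned.items():
        targets = [positions[j] for j in indices]
        result[sensor] = sum(targets) / len(targets)
    return result


first = unconstrained_minimizers(events["first"])
second = unconstrained_minimizers(events["second"])
print(first)
print(second)
```

Comparing the two outputs shows how far the per-sensor targets shift at $t=100$, which is the change the algorithm has to track in the second window.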

Table 2 gives the global cost value and the inequality constraint evaluation at $\boldsymbol{\mathbf{\mathsf{x}}}_{p}^{\star}$ obtained by using our distributed algorithm with the $\epsilon$-exact penalty function method for three simulation scenarios. The first and the second scenarios use $\epsilon=0.01$ and $\epsilon=0.001$, respectively. As we can see, when $\epsilon=0.01$ only one of the inequalities is violated, and only slightly (by $2.1\times 10^{-4}$). When the smaller value $\epsilon=0.001$ is used, this violation is also removed. Table 2 also shows that if we use the ‘adjusted boxed inequalities’ that we introduced in Remark 3.3, the inequality constraints are all respected with only a negligible increase in the cost value.

5 Conclusions

We proposed a novel cluster-based distributed augmented Lagrangian algorithm for a class of constrained convex optimization problems. In the design of our distributed algorithm, we paid special attention to efficient management of communication and computation resources, and required only the agents that are coupled through an equality constraint to form a communication topology to address that coupling in a distributed manner. We showed that if the communication topology corresponding to each equality constraint is a connected graph, the proposed algorithm converges asymptotically when the local cost functions are convex, and exponentially when the local cost functions are strongly convex and have Lipschitz gradients. We invoked the $\epsilon$-exact penalty function method to address the inequality constraints and obtained an explicit lower bound on the penalty function weight to guarantee convergence to an $\epsilon$-neighborhood of the global minimum value of the cost. Simulations demonstrated the performance of our proposed algorithm. As future work, we will study the event-triggered communication implementation of our algorithm.

Appendix A

Proof of Theorem 3.1. Let $(\{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}\}_{i=1}^{N},{\boldsymbol{\mathbf{\mathsf{\nu}}}}^{\star})$ satisfy the KKT conditions (5) and let $\boldsymbol{\mathbf{y}}^{\star}_{k}=[\{[\boldsymbol{\mathbf{\mathsf{w}}}^{l}]_{k}\boldsymbol{\mathbf{\mathsf{x}}}^{l\star}-\bar{\mathsf{b}}^{l}_{k}\}_{l\in\mathcal{V}_{k}}]$. For convenience in analysis, we apply the change of variables

[TABLE]

to write the algorithm (7), under the stated initialization conditions, in the equivalent form

[TABLE]

where we used $\boldsymbol{\mathbf{q}}_{k}=(\hat{q}_{k},\bar{\boldsymbol{\mathbf{q}}}_{k})$ with $\hat{q}_{k}\in{\mathbb{R}}$, $\bar{\boldsymbol{\mathbf{q}}}_{k}\in{\mathbb{R}}^{(N_{k}-1)}$. Here, we also used $\boldsymbol{\mathbf{\mathsf{R}}}_{k}\boldsymbol{\mathbf{\mathsf{R}}}_{k}^{\top}\boldsymbol{\mathbf{\mathsf{L}}}_{k}=\boldsymbol{\mathbf{\mathsf{L}}}_{k}$, $\boldsymbol{\mathbf{\mathsf{\psi}}}_{k}=\text{Blkdiag}(\{[\boldsymbol{\mathbf{\mathsf{w}}}^{i}]_{k}\}_{i\in\mathcal{V}_{k}})$ and $\boldsymbol{\mathbf{\chi}}_{k}=[\{\boldsymbol{\mathbf{\chi}}^{i\top}\}_{i\in\mathcal{V}_{k}}]^{\top}$. Under the given initial condition, for any $t\in{\mathbb{R}}_{\geq 0}$ we obtain

[TABLE]

To study the stability in the other variables, we let $\hat{q}_{k}(t)=0$ in (A.27c) and (A.27d), and consider the radially unbounded candidate Lyapunov function

[TABLE]

where $\boldsymbol{\mathbf{\Gamma}}_{k}\!=\!\text{Blkdiag}(\{\rho^{i}\}_{i\in\mathcal{V}_{k}})$ . Note that $(\beta_{k}\boldsymbol{\mathbf{\mathsf{R}}}_{k}^{\top}\boldsymbol{\mathbf{\mathsf{L}}}_{k}\boldsymbol{\mathbf{\mathsf{R}}}_{k})^{-1}$ and $\boldsymbol{\mathbf{\Gamma}}_{k}+\boldsymbol{\mathbf{\mathsf{I}}}$ are positive definite diagonal matrices, thus $\bar{\boldsymbol{\mathbf{q}}}_{k}^{\top}(\boldsymbol{\mathbf{\Gamma}}_{k}+\boldsymbol{\mathbf{\mathsf{I}}})(\beta_{k}\boldsymbol{\mathbf{\mathsf{R}}}_{k}^{\top}\boldsymbol{\mathbf{\mathsf{L}}}_{k}\boldsymbol{\mathbf{\mathsf{R}}}_{k})^{-1}\bar{\boldsymbol{\mathbf{q}}}_{k}\!>\!\boldsymbol{\mathbf{0}}$ . Taking the derivative of $V$ along the trajectories of (A.27b)-(A.27d) gives

[TABLE]

Convexity of the local cost functions ensures $\boldsymbol{\mathbf{\chi}}^{i\top}(\boldsymbol{\mathbf{\nabla}}f^{i}(\boldsymbol{\mathbf{\chi}}^{i}+{\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star})-\boldsymbol{\mathbf{\nabla}}f^{i}({\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star}))=((\boldsymbol{\mathbf{\chi}}^{i}+{\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star})-{\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star})^{\top}(\boldsymbol{\mathbf{\nabla}}f^{i}(\boldsymbol{\mathbf{\chi}}^{i}+{\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star})-\boldsymbol{\mathbf{\nabla}}f^{i}({\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star}))\geq 0$, $i\in\mathcal{V}$. The connectivity of the sub-graph $\mathcal{G}_{k}$, $k\in\mathbb{Z}_{1}^{p}$, also ensures $-\boldsymbol{\mathbf{p}}_{k}^{\top}\boldsymbol{\mathbf{\mathsf{L}}}_{k}\boldsymbol{\mathbf{p}}_{k}\leq 0$. Thus, $\dot{V}\leq 0$, and consequently the trajectories of (A.27b)-(A.27d) starting from any initial condition are bounded.
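For completeness, the gradient inequality used above is the standard monotonicity property of the gradient of a differentiable convex function. A short derivation, written for generic points $\mathbf{x},\mathbf{y}$ (symbols introduced here for illustration only), is:

```latex
% First-order characterization of convexity, applied at x and at y:
\begin{align*}
f^{i}(\mathbf{y}) &\geq f^{i}(\mathbf{x}) + \nabla f^{i}(\mathbf{x})^{\top}(\mathbf{y}-\mathbf{x}),\\
f^{i}(\mathbf{x}) &\geq f^{i}(\mathbf{y}) + \nabla f^{i}(\mathbf{y})^{\top}(\mathbf{x}-\mathbf{y}).
\end{align*}
% Adding the two inequalities and cancelling f^i(x) + f^i(y) gives
\begin{equation*}
(\mathbf{x}-\mathbf{y})^{\top}\big(\nabla f^{i}(\mathbf{x})-\nabla f^{i}(\mathbf{y})\big)\geq 0 .
\end{equation*}
```

Setting $\mathbf{x}=\boldsymbol{\mathbf{\chi}}^{i}+{\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star}$ and $\mathbf{y}={\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star}$ recovers the inequality stated in the proof.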

Next, we invoke the invariant set stability results to prove that the trajectories of (A.27b)-(A.27d) converge to a point in the set of equilibrium points of this system. Let $\mathcal{S}=\{(\{\bar{\boldsymbol{\mathbf{q}}}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{p}}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{\chi}}^{i}\}_{i=1}^{N})\in\prod_{k=1}^{p}{\mathbb{R}}^{N_{k}-1}\times\prod_{k=1}^{p}{\mathbb{R}}^{N_{k}}\times\prod_{i=1}^{N}{\mathbb{R}}^{n^{i}}\,|\,\dot{V}\equiv 0\}$. Given (A.30), we have $\mathcal{S}=\Big\{(\{\bar{\boldsymbol{\mathbf{q}}}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{p}}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{\chi}}^{i}\}_{i=1}^{N})\in\prod_{k=1}^{p}{\mathbb{R}}^{N_{k}-1}\times\prod_{k=1}^{p}{\mathbb{R}}^{N_{k}}\times\prod_{i=1}^{N}{\mathbb{R}}^{n^{i}}\,\Big|\,\boldsymbol{\mathbf{p}}_{k}=\boldsymbol{\mathbf{0}},~\boldsymbol{\mathbf{\mathsf{\psi}}}_{k}\,\boldsymbol{\mathbf{\chi}}_{k}=\boldsymbol{\mathbf{\mathsf{R}}}_{k}\bar{\boldsymbol{\mathbf{q}}}_{k},~\boldsymbol{\mathbf{\chi}}^{i\top}(\nabla f^{i}(\boldsymbol{\mathbf{\chi}}^{i}+{\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star})-\nabla f^{i}({\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star}))=0,~i\in\mathcal{V},~k\in\mathbb{Z}_{1}^{p}\Big\}$. Since $\boldsymbol{\mathbf{\chi}}^{i\top}(\nabla f^{i}(\boldsymbol{\mathbf{\chi}}^{i}+{\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star})-\nabla f^{i}({\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star}))=\sum_{j=1}^{n^{i}}\chi^{i}_{j}(\nabla f^{i}_{j}({\chi^{i}_{j}}+\mathsf{x}^{i\star}_{j})-\nabla f^{i}_{j}(\mathsf{x}^{i\star}_{j}))$ and, due to convexity of the cost functions $f^{i}_{j}$, $j\in\mathbb{Z}_{1}^{n^{i}}$, $i\in\mathcal{V}$, every summand is nonnegative, from $\boldsymbol{\mathbf{\chi}}^{i\top}(\nabla f^{i}(\boldsymbol{\mathbf{\chi}}^{i}+{\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star})-\nabla f^{i}({\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star}))=0$ we conclude that, for every $j$, either ${\chi}_{j}^{i}=0$ or $\nabla f_{j}^{i}({\chi^{i}_{j}}+\mathsf{x}^{i\star}_{j})-\nabla f_{j}^{i}(\mathsf{x}^{i\star}_{j})=0$. In the first case the gradient difference vanishes trivially, so in both cases the points in $\mathcal{S}$ satisfy $\nabla f^{i}(\boldsymbol{\mathbf{\chi}}^{i}+{\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star})-\nabla f^{i}({\boldsymbol{\mathbf{\mathsf{x}}}}^{i\star})=\boldsymbol{\mathbf{0}}$. As a result, given (A.28), a trajectory $t\mapsto(\{\bar{\boldsymbol{\mathbf{q}}}_{k}(t)\}_{k=1}^{p},\{\boldsymbol{\mathbf{p}}_{k}(t)\}_{k=1}^{p},\{\boldsymbol{\mathbf{\chi}}^{i}(t)\}_{i=1}^{N})$ of (A.27b)-(A.27d) belonging to $\mathcal{S}$ for all $t\geq 0$ must satisfy $(\dot{\bar{\boldsymbol{\mathbf{q}}}}_{k}\equiv\boldsymbol{\mathbf{0}},\dot{\boldsymbol{\mathbf{p}}}_{k}\equiv\boldsymbol{\mathbf{0}},\dot{\boldsymbol{\mathbf{\chi}}}^{i}\equiv\boldsymbol{\mathbf{0}})$. Therefore, the largest invariant set in $\mathcal{S}$ is the set of equilibrium points of (A.27b)-(A.27d). Then, invoking the LaSalle invariance principle [31, Theorem 3.4], we conclude that the trajectories of (A.27b)-(A.27d) converge asymptotically to the set of its equilibrium points.

Next, we show that the convergence is indeed to a point in the equilibrium set. For that, by virtue of the semi-stability theorem [31, Theorem 4.20], we show that every equilibrium point of (A.27b)-(A.27d) is Lyapunov stable. Let $(\{\underline{\bar{\boldsymbol{\mathbf{q}}}}_{k}\}_{k=1}^{p},\{\underline{\boldsymbol{\mathbf{p}}}_{k}\}_{k=1}^{p},\{\underline{\boldsymbol{\mathbf{\chi}}}^{i}\}_{i=1}^{N})$ be an equilibrium point of (A.27b)-(A.27d) (recall that $\hat{q}_{k}(t)=0$ due to (A.28)). Now, consider the change of variables $\mathbf{\mathfrak{q}}_{k}=\bar{\boldsymbol{\mathbf{q}}}_{k}-\underline{\bar{\boldsymbol{\mathbf{q}}}}_{k}$ and $\mathbf{\mathfrak{p}}_{k}=\boldsymbol{\mathbf{p}}_{k}-\underline{\boldsymbol{\mathbf{p}}}_{k}$ for $k\in\mathbb{Z}_{1}^{p}$, and $\mathbf{\mathfrak{r}}^{i}=\boldsymbol{\mathbf{\chi}}^{i}-\underline{\boldsymbol{\mathbf{\chi}}}^{i}$ for $i\in\mathcal{V}$, to write (A.27b)-(A.27d) as

[TABLE]

Next, consider the Lyapunov function $V$ defined above, where $(\{\bar{\boldsymbol{\mathbf{q}}}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{p}}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{\chi}}^{i}\}_{i=1}^{N})$ is substituted by $(\{\mathbf{\mathfrak{q}}_{k}\}_{k=1}^{p},\{\mathbf{\mathfrak{p}}_{k}\}_{k=1}^{p},\{\mathbf{\mathfrak{r}}^{i}\}_{i=1}^{N})$. Following the same argument used to show $\dot{V}\leq 0$ in (A.30), we can show that the derivative of $V(\{\mathbf{\mathfrak{q}}_{k}\}_{k=1}^{p},\{\mathbf{\mathfrak{p}}_{k}\}_{k=1}^{p},\{\mathbf{\mathfrak{r}}^{i}\}_{i=1}^{N})$ along the trajectories of (A.27b)-(A.27d), when (A.28) holds, is also negative semi-definite. Thus, any equilibrium point $(\{\underline{\bar{\boldsymbol{\mathbf{q}}}}_{k}\}_{k=1}^{p},\{\underline{\boldsymbol{\mathbf{p}}}_{k}\}_{k=1}^{p},\{\underline{\boldsymbol{\mathbf{\chi}}}^{i}\}_{i=1}^{N})$ of (A.27b)-(A.27d) is Lyapunov stable (recall (A.28)). Therefore, since the trajectories of (A.27b)-(A.27d) starting from any initial condition approach the set of stable equilibrium points, they converge to a point in the equilibrium set. Consequently, given the change of variables (A.26), we conclude that, starting from the initial conditions stated in the theorem, the trajectories of (7) converge, as $t\to\infty$, to a point in its set of equilibrium points (3), where $(\{\dot{v}^{l}_{k}\}_{l\in\mathcal{V}_{k}}=\boldsymbol{\mathbf{0}},\{\dot{y}^{l}_{k}\}_{l\in\mathcal{V}_{k}}=\boldsymbol{\mathbf{0}},\{\dot{\boldsymbol{\mathbf{x}}}^{i}\}_{i=1}^{N}=\boldsymbol{\mathbf{0}})$. Therefore, under the stated initial condition, as $t\to\infty$, the limit point $(\{{v}^{l}_{k}\}_{k=1}^{p},\{{y}^{l}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{x}}^{i}\}_{i=1}^{N})$, $i\in\mathcal{V}$, $l\in\mathcal{V}_{k}$, which satisfies $(\{\dot{v}^{l}_{k}\}_{l\in\mathcal{V}_{k}}=\boldsymbol{\mathbf{0}},\{\dot{y}^{l}_{k}\}_{l\in\mathcal{V}_{k}}=\boldsymbol{\mathbf{0}},\{\dot{\boldsymbol{\mathbf{x}}}^{i}\}_{i=1}^{N}=\boldsymbol{\mathbf{0}})$ in (7), is equal to $(\nu^{\star}_{k}\boldsymbol{\mathbf{1}}_{N_{k}},\boldsymbol{\mathbf{\mathsf{y}}}^{\star},\{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}\}_{i=1}^{N})$, where $(\{{\nu}_{k}^{\star}\}_{k=1}^{p},\{{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}}\}_{i=1}^{N})$ is a point satisfying the KKT conditions (5) of problem (4) (this point is not necessarily the point used in the change of variables (A.26)).

Proof of Theorem 3.2. We follow the proof of Theorem 3.1 up to the choice of the candidate Lyapunov function, where we instead use the candidate function below, consisting of $V$ from the proof of Theorem 3.1 plus an extra positive quadratic term

[TABLE]

where $\phi_{k}\in{\mathbb{R}}_{>0}$ satisfies $\phi_{k}<\min\{\frac{2(1+\underline{\rho})m}{p(M^{2}(\bar{\rho}^{2}+1)^{2}+1)},\frac{2\beta_{k}\lambda_{2k}}{(\beta^{2}_{k}\lambda_{Nk}^{2}\bar{\rho}^{2}+\bar{\rho}+1)\|\boldsymbol{\mathbf{\psi}}_{k}\|^{2}}\}$, with $\underline{\rho}=\min\{\rho^{i}\}_{i=1}^{N}$ and $\bar{\rho}=\max\{\rho^{i}\}_{i=1}^{N}$. Here $\boldsymbol{\mathbf{\zeta}}=[\{\bar{\boldsymbol{\mathbf{q}}}_{k}^{\top}\}_{k=1}^{p},\{\boldsymbol{\mathbf{p}}_{k}^{\top}\}_{k=1}^{p},\{\boldsymbol{\mathbf{\chi}}^{i\top}\}_{i=1}^{N}]^{\top}$ and $\boldsymbol{\mathbf{E}}>0$ is the obvious matrix describing the coefficients of the quadratic terms of $\bar{V}$. When every $\mathcal{G}_{k}$, $k\in\mathbb{Z}_{1}^{p}$, is a connected graph, $\bar{V}$ is a radially unbounded and positive definite function. Then,

[TABLE]

where $\boldsymbol{\mathbf{h}}(\boldsymbol{\mathbf{\chi}}_{k})=\nabla f(\boldsymbol{\mathbf{\chi}}_{k}+\boldsymbol{\mathbf{\mathsf{x}}}^{\star}_{k})-\nabla f(\boldsymbol{\mathbf{\mathsf{x}}}^{\star}_{k})$. When $\rho^{i}\in{\mathbb{R}}_{>0}$ for all $i\in\mathcal{V}$, we can write

[TABLE]

Here, we used the $M^{i}_{l}$-Lipschitz continuity of the local gradients to write $\boldsymbol{\mathbf{h}}(\boldsymbol{\mathbf{\chi}}_{k})^{\top}(\boldsymbol{\mathbf{\Gamma}}_{k}+\boldsymbol{\mathbf{\mathsf{I}}})^{2}\boldsymbol{\mathbf{h}}(\boldsymbol{\mathbf{\chi}}_{k})\leq\sum\nolimits_{i=1}^{N_{k}}(\rho^{i}+1)^{2}\,M^{2}(\chi^{i})^{2}\leq M^{2}(\bar{\rho}+1)^{2}\boldsymbol{\mathbf{\chi}}^{\top}\boldsymbol{\mathbf{\chi}}$. We also used $-\sum_{i=1}^{N}(\rho^{i}+1)\boldsymbol{\mathbf{\chi}}_{i}^{\top}\boldsymbol{\mathbf{h}}(\boldsymbol{\mathbf{\chi}}_{i})\leq-m(\underline{\rho}+1)\boldsymbol{\mathbf{\chi}}^{\top}\boldsymbol{\mathbf{\chi}}$, which follows from the $m^{i}_{l}$-strong convexity of the local cost function $f^{i}_{l}$, and $-\boldsymbol{\mathbf{p}}^{\top}_{k}\boldsymbol{\mathbf{\mathsf{L}}}_{k}\boldsymbol{\mathbf{p}}_{k}\leq 0$, which is true because every $\mathcal{G}_{k}$, $k\in\mathbb{Z}_{1}^{p}$, is a connected graph. We also used $\|\boldsymbol{\mathbf{p}}_{k}^{\top}\boldsymbol{\mathbf{\mathsf{L}}}_{k}\boldsymbol{\mathbf{\Gamma}}_{k}\boldsymbol{\mathbf{\mathsf{\psi}}}_{k}\|^{2}\leq\lambda_{Nk}^{2}\bar{\rho}^{2}\|\boldsymbol{\mathbf{\psi}}_{k}\|^{2}\boldsymbol{\mathbf{p}}_{k}^{\top}\boldsymbol{\mathbf{p}}_{k}$, where $\lambda_{Nk}$ is the maximum eigenvalue of $\boldsymbol{\mathbf{\mathsf{L}}}_{k}$. We note that for $0<\phi_{k}<\min\{\frac{2(1+\underline{\rho})m}{p(M^{2}(\bar{\rho}^{2}+1)^{2}+1)},\frac{2\beta_{k}\lambda_{2k}}{(\beta^{2}_{k}\lambda_{Nk}^{2}\bar{\rho}^{2}+\bar{\rho}+1)\|\boldsymbol{\mathbf{\psi}}_{k}\|^{2}}\}$, we have $\dot{\bar{V}}<0$. Next, note that we can bound $\dot{\bar{V}}$ by a negative definite quadratic upper bound as

[TABLE]

where $\boldsymbol{\mathbf{F}}>0$ is the obvious matrix describing the coefficients of the quadratic terms of the upper bound of $\dot{\bar{V}}$. Because $\bar{V}$ is a positive definite quadratic function and the upper bound on $\dot{\bar{V}}$ is a negative definite quadratic function, by virtue of [45, Theorem 4.10], (A.27b)-(A.27d) is exponentially stable, and its trajectories converge to the origin at a rate no worse than $\frac{\lambda_{\min}(\boldsymbol{\mathbf{F}})}{2\lambda_{\max}(\boldsymbol{\mathbf{E}})}$, where $\lambda_{\min}(\boldsymbol{\mathbf{F}})$ is the minimum eigenvalue of $\boldsymbol{\mathbf{F}}$ and $\lambda_{\max}(\boldsymbol{\mathbf{E}})$ is the maximum eigenvalue of $\boldsymbol{\mathbf{E}}$. Consequently, starting from any initial condition given in the statement, the trajectories $t\mapsto(\{\boldsymbol{\mathbf{v}}_{k}(t)\}_{k=1}^{p},\{\boldsymbol{\mathbf{x}}^{i}(t)\}_{i=1}^{N})$ converge exponentially fast, with the rate given above, to $(\,{{\nu}}_{k}^{\star}\boldsymbol{\mathbf{1}}_{N_{k}},\{{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}}\}_{i=1}^{N})$ as $t\to\infty$.
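The rate $\frac{\lambda_{\min}(\mathbf{F})}{2\lambda_{\max}(\mathbf{E})}$ can be traced through the usual comparison argument. The sketch below assumes that the two quadratic forms are normalized as $\bar{V}=\boldsymbol{\zeta}^{\top}\mathbf{E}\boldsymbol{\zeta}$ and $\dot{\bar{V}}\leq-\boldsymbol{\zeta}^{\top}\mathbf{F}\boldsymbol{\zeta}$ with symmetric $\mathbf{E},\mathbf{F}$; this normalization is an assumption made here for illustration, since the displayed expressions are not reproduced in this text.

```latex
\begin{align*}
% Quadratic bounds on the Lyapunov function and on its derivative:
\lambda_{\min}(\mathbf{E})\,\|\boldsymbol{\zeta}\|^{2}
  \;\leq\; \bar{V} \;\leq\; \lambda_{\max}(\mathbf{E})\,\|\boldsymbol{\zeta}\|^{2},
\qquad
\dot{\bar{V}} \;\leq\; -\lambda_{\min}(\mathbf{F})\,\|\boldsymbol{\zeta}\|^{2}
  \;\leq\; -\frac{\lambda_{\min}(\mathbf{F})}{\lambda_{\max}(\mathbf{E})}\,\bar{V}. \\
% Comparison lemma applied to the scalar differential inequality:
\bar{V}(t) \;\leq\; e^{-\frac{\lambda_{\min}(\mathbf{F})}{\lambda_{\max}(\mathbf{E})}\,t}\,\bar{V}(0)
\;\;\Longrightarrow\;\;
\|\boldsymbol{\zeta}(t)\| \;\leq\;
  \sqrt{\frac{\lambda_{\max}(\mathbf{E})}{\lambda_{\min}(\mathbf{E})}}\;
  e^{-\frac{\lambda_{\min}(\mathbf{F})}{2\lambda_{\max}(\mathbf{E})}\,t}\,\|\boldsymbol{\zeta}(0)\| .
\end{align*}
```

The factor $2$ in the denominator appears because the bound on $\bar{V}$, which is quadratic in $\boldsymbol{\zeta}$, is converted into a bound on $\|\boldsymbol{\zeta}\|$ by taking a square root.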

If $\rho^{i}=0$ for some $i\in\mathcal{V}$, we can only guarantee that $\dot{\bar{V}}\leq 0$, with $\mathcal{S}=\{(\{\bar{\boldsymbol{\mathbf{q}}}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{p}}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{\chi}}^{i}\}_{i=1}^{N})\in\prod_{k=1}^{p}{\mathbb{R}}^{N_{k}-1}\times\prod_{k=1}^{p}{\mathbb{R}}^{N_{k}}\times\prod_{i=1}^{N}{\mathbb{R}}^{n^{i}}\,|\,\dot{\bar{V}}\equiv 0\}=\Big\{(\{\bar{\boldsymbol{\mathbf{q}}}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{p}}_{k}\}_{k=1}^{p},\{\boldsymbol{\mathbf{\chi}}^{i}\}_{i=1}^{N})\in\prod_{k=1}^{p}{\mathbb{R}}^{N_{k}-1}\times\prod_{k=1}^{p}{\mathbb{R}}^{N_{k}}\times\prod_{i=1}^{N}{\mathbb{R}}^{n^{i}}\,\Big|\,\boldsymbol{\mathbf{p}}_{k}=\boldsymbol{\mathbf{0}},~\boldsymbol{\mathbf{\chi}}^{i}=\boldsymbol{\mathbf{0}},~\boldsymbol{\mathbf{\Gamma}}_{k}\boldsymbol{\mathbf{\mathsf{R}}}_{k}\bar{\boldsymbol{\mathbf{q}}}_{k}=\boldsymbol{\mathbf{0}},~i\in\mathcal{V},~k\in\mathbb{Z}_{1}^{p}\Big\}$. Next, we note that since $\boldsymbol{\mathbf{\mathsf{R}}}_{k}$ is a full column rank matrix, given (A.28), the only trajectory $t\mapsto(\{\bar{\boldsymbol{\mathbf{q}}}_{k}(t)\}_{k=1}^{p},\{\boldsymbol{\mathbf{p}}_{k}(t)\}_{k=1}^{p},\{\boldsymbol{\mathbf{\chi}}^{i}(t)\}_{i=1}^{N})$ of (A.27b)-(A.27d) that belongs to $\mathcal{S}$ for all $t\in{\mathbb{R}}_{\geq 0}$ is $(\{\bar{\boldsymbol{\mathbf{q}}}_{k}(t)\equiv\boldsymbol{\mathbf{0}}\}_{k=1}^{p},\{\boldsymbol{\mathbf{p}}_{k}(t)\equiv\boldsymbol{\mathbf{0}}\}_{k=1}^{p},\{\boldsymbol{\mathbf{\chi}}^{i}(t)\equiv\boldsymbol{\mathbf{0}}\}_{i=1}^{N})$. Therefore, using the LaSalle invariant set analysis of [45, Corollary 4.1], and recalling the change of variables (A.26) and also (A.28), we can conclude that $t\mapsto(\{\boldsymbol{\mathbf{v}}_{k}(t)\}_{k=1}^{p},\{\boldsymbol{\mathbf{x}}^{i}(t)\}_{i=1}^{N})$ of (7) converges exponentially fast to $(\,{{\nu}}_{k}^{\star}\boldsymbol{\mathbf{1}}_{N_{k}},\{{\boldsymbol{\mathbf{\mathsf{x}}}^{i\star}}\}_{i=1}^{N})$.

Appendix B

Consider the optimization problem

[TABLE]

where $f^{i}(x^{i})\!=\!\begin{cases}0,&|x^{i}|\leq 2,\\ \,\frac{1}{2\alpha}(|x^{i}|-2)^{2},&2<|x^{i}|\leq 2+\alpha,\\ (|x^{i}|-2-\frac{1}{2}\alpha),&|x^{i}|>2+\alpha,\end{cases}$

with $\alpha\!=\!0.01$ . Here, the cost function is convex.
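The piecewise cost above is flat on $|x^{i}|\leq 2$, quadratic on a thin transition band of width $\alpha$, and linear beyond it. A minimal sketch of this function, useful for checking that the three branches join continuously and that the cost vanishes on the whole interval $[-2,2]$, is given below. The function name `dead_zone_cost` is an illustrative assumption, and the constraints of problem (B.1) are not modeled.

```python
def dead_zone_cost(x, alpha=0.01):
    """Evaluate the piecewise local cost f^i of Appendix B at a scalar x."""
    a = abs(x)
    if a <= 2:
        # Flat region: every point here attains the minimum value 0.
        return 0.0
    if a <= 2 + alpha:
        # Quadratic transition band of width alpha.
        return (a - 2) ** 2 / (2 * alpha)
    # Linear growth outside the band; matches the quadratic branch
    # in value (alpha / 2) and slope (1) at |x| = 2 + alpha.
    return a - 2 - alpha / 2


if __name__ == "__main__":
    for point in (0.0, 2.0, 2.005, 2.01, 3.0):
        print(point, dead_zone_cost(point))
```

Because the cost is zero on a whole interval rather than at a single point, it is convex but not strictly convex, which is consistent with the statement that follows about infinitely many minimizers.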

Note that the optimization problem (B.1) has an infinite number of minimizers that correspond to the minimum cost of $f^{\star}=0$. One of these minimizers is $(x^{1\star},x^{2\star})=(0,2)$. Figure 6 shows the $x^{i}$ trajectories of the central solver (6) over time. As shown, the algorithm does not converge when $\rho=0$, while convergence is achieved when we use the augmented Lagrangian with $\rho=1$.

Bibliography

[1] S. S. Kia, “An augmented lagrangian distributed algorithm for an in-network optimal resource allocation problem,” in American Control Conference, (WA, USA), 2017.
[2] A. J. Wood, F. Wollenberg, and G. B. Sheble, Power Generation, Operation and Control. New York: John Wiley, 3rd ed., 2013.
[3] A. Cherukuri and J. Cortés, “Initialization-free distributed coordination for economic dispatch under varying loads and generator commitment,” Automatica, vol. 74, pp. 183–193, 2016.
[4] L. Xiao, M. Johansson, and S. P. Boyd, “Simultaneous routing and resource allocation via dual decomposition,” IEEE Transactions on Communications, vol. 52, no. 7, pp. 1136–1144, 2004.
[5] R. Madan and S. Lall, “Distributed algorithms for maximum lifetime routing in wireless sensor networks,” IEEE Transactions on Wireless Communications, vol. 5, no. 8, pp. 2185–2193, 2006.
[6] J. Chen and V. K. N. Lau, “Convergence analysis of saddle point problems in time varying wireless systems – control theoretical approach,” IEEE Transactions on Signal Processing, vol. 60, no. 1, pp. 443–452, 2012.
[7] A. Ferragut and F. Paganini, “Network resource allocation for users with multiple connections: fairness and stability,” IEEE/ACM Transactions on Networking, vol. 22, no. 2, pp. 349–362, 2014.
[8] S. A. Alghunaim, K. Yuan, and A. H. Sayed, “Dual coupled diffusion for distributed optimization with affine constraints,” in IEEE Conf. on Decision and Control, (FL, USA), 2018.
