Decomposition of non-convex optimization via bi-level distributed ALADIN

Alexander Engelmann; Yuning Jiang; Boris Houska; Timm Faulwasser

arXiv:1903.11280·math.OC·March 28, 2019


TL;DR

This paper introduces a bi-level distributed framework for non-convex optimization that combines ALADIN with decentralized algorithms, providing convergence guarantees and practical case studies in power systems and robotics.

Contribution

It presents a novel bi-level distribution approach for decentralized non-convex optimization using ALADIN, with convergence analysis and implementation via decentralized algorithms.

Findings

01. Proves local convergence under certain conditions.

02. Demonstrates effectiveness in power systems and robotics case studies.

03. Shows how decentralized algorithms can solve the inner coordination problem.

Abstract

Decentralized optimization algorithms are important in different contexts, such as distributed optimal power flow or distributed model predictive control, as they avoid central coordination and enable decomposition of large-scale problems. In case of constrained non-convex optimization only a few algorithms are currently available; often their performance is limited, or they lack convergence guarantees. This paper proposes a framework for decentralized non-convex optimization via bi-level distribution of the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) algorithm. Bi-level distribution means that the outer ALADIN structure is combined with an inner distribution/decentralization level solving a condensed variant of ALADIN's convex coordination QP by decentralized algorithms. We prove sufficient conditions ensuring local convergence while allowing for inexact…

Tables

Table I: Convergence properties of decentralized CG and decentralized ADMM for (11).

conv. rate	CG	ADMM
theoretical	$n_{c}$-step	sublinear⁸
practical	linear/superlinear⁹	sublinear
tuning	no	yes

⁹ Analyzing the convergence rate of conjugate gradient methods seems quite complex, as there are different phases with different convergence rates during the iteration, cf. [30, 27]. However, the practically observed convergence rate often is superlinear [30].

Table II: Per-step forward communication (number of floats) for 2-assigned problems and different ALADIN variants.

variant	standard	cond.	ADMM	CG
local prep.	-	-	-	$2 n_{c}^{2}$
local iter.	-	-	$2 n_{c} n^{AD}$	$2 n_{c} n^{CG}$
global	$> \sum_{i = 1}^{N} \frac{{(n_{x_{i}} + n_{g_{i}})}^{2}}{2}$	$n_{c}^{2} + n_{c}$	-	$2 N n^{CG}$

Table III: Total iterations versus inner iterations for OPF.

method	$\epsilon$	$n^{\text{inner}}=80$	$n^{\text{inner}}=100$	$n^{\text{inner}}=200$	$n^{\text{inner}}=400$	$n^{\text{inner}}=1000$
CG	$10^{- 4}$	800	800	800	800	800
ADMM	$10^{- 2}$	-	7 000	7 000	7 600	11 000
	$10^{- 3}$	-	-	10 800	10 800	13 000
	$10^{- 4}$	-	-	-	14 800	16 000

Table IV: Forward comm. to $\epsilon=10^{-4}$ with $n^{\text{AD}}=400$, $n^{\text{CG}}=80$ for OPF and $n^{\text{AD}}=2\,400$, $n^{\text{CG}}=30$ for robot control.

	variant	standard	condensed	ADMM	CG
OPF	local prep.	-	-	-	$2\,048$
	local iter.	-	-	$25\,600$	$5\,120$
	global	$> 9\,858$	$1\,056$	-	$960$
	local tot.	-	-	$691\,200$	$53\,248$
	global tot.	$> 98\,580$	$10\,560$	-	$9\,600$
robots	local prep.	-	-	-	$80\,000$
	local iter.	-	-	$960\,000$	$12\,000$
	global	$> 824\,506$	$40\,200$	-	$120$
	local tot.	-	-	$9\,600$ k	$200$ k
	global tot.	$> 20\,613$ k	$1\,005$ k	-	$1$ k

Equations

$\min_{x \in \mathbb{R}^{n_{x}}}$

subject to $h_{i}(x_{i})$

$\sum_{i \in \mathcal{R}} A_{i} x_{i}$

$x_{i}^{k} = \arg\min_{x_{i}}$

s.t. $h_{i}(x_{i}) \leq 0 \;|\; \kappa_{i}^{k},$

$$\min_{\Delta x, s} \; \sum_{i \in \mathcal{R}} \frac{1}{2} \Delta x_{i}^{\top} H_{i}^{k} \Delta x_{i} + g_{i}^{k\top} \Delta x_{i} + \lambda^{k\top} s + \frac{\mu}{2} \|s\|_{2}^{2}$$

$$\text{s.t.} \quad \sum_{i \in \mathcal{R}} A_{i} (x_{i}^{k} + \Delta x_{i}) = s \;|\; \lambda^{\text{QP}}, \qquad C_{i}^{\text{act}\,k} \Delta x_{i} = 0, \quad \forall i \in \mathcal{R}.$$

$\min_{\Delta x, s}$

$C_{i}^{\text{act}} \Delta x_{i}$

$\sum_{i \in \mathcal{R}} A_{i} (x_{i} + \Delta x_{i})$

$$\operatorname{null}(C^{\text{act}}) = \{x \in \mathbb{R}^{n_{x}} \;|\; x = Z v, \; v \in \mathbb{R}^{n_{x} - n_{h}^{k}}\}$$

$\min_{\Delta v, s}$

$$\sum_{i \in \mathcal{R}} \bar{A}_{i} (v_{i} + \Delta v_{i}) = s \;|\; \lambda^{\text{QP}}.$$

$$\begin{pmatrix} \bar{H} & \bar{A}^{\top} \\ \bar{A} & -\frac{1}{\mu} I \end{pmatrix} \begin{pmatrix} \Delta v \\ \lambda^{\text{QP}} \end{pmatrix} = \begin{pmatrix} -\bar{g} \\ -\bar{A} v - \frac{1}{\mu} \lambda \end{pmatrix},$$

$$\left(\mu^{-1} I + \bar{A} \bar{H}^{-1} \bar{A}^{\top}\right) \lambda^{\text{QP}} = \bar{A} \left(v - \bar{H}^{-1} \bar{g}\right) + \mu^{-1} \lambda$$

$$\Big(\mu^{-1} I + \sum_{i \in \mathcal{R}} S_{i}\Big) \lambda^{\text{QP}} = \mu^{-1} \lambda + \sum_{i \in \mathcal{R}} s_{i}$$

$$\Delta x_{i} = Z_{i} \bar{H}_{i}^{-1} \left(-\bar{A}_{i}^{\top} \lambda^{\text{QP}} - \bar{g}_{i}\right).$$

$$\|p^{k} - p^{\star}\| \leq \chi \|q^{k} - p^{\star}\|$$

$$\underbrace{\begin{pmatrix} H & A^{\top} & C^{a\top} \\ A & -\frac{1}{\mu} I & 0 \\ C^{a} & 0 & 0 \end{pmatrix}}_{=: M(p^{k})} \Delta q^{k} = \underbrace{\begin{pmatrix} -g - C^{a\top} \kappa^{k} - A^{\top} \lambda^{k} \\ -A x^{k} + b \\ 0 \end{pmatrix}}_{=: m(p^{k})},$$

$$\|q^{k+1} - p^{\star}\| \leq \gamma \|p^{k} - p^{\star}\| + \frac{\omega}{2} \|p^{k} - p^{\star}\|_{2}^{2}$$

$$r_{p}^{k} := M(p^{k}) \Delta \bar{q}^{k} - m(p^{k}).$$

$$\|r_{p}^{k}\| \leq \eta^{k} \|m(p^{k})\|,$$

$$\begin{aligned} \|\bar{q}^{k+1} - p^{\star}\| &\leq \frac{\omega}{2} \|q^{k} - p^{\star}\|_{2}^{2} + \alpha \cdot \eta^{k} \|m(p^{k}) - m(p^{\star})\| \\ &\leq \frac{\omega}{2} \|q^{k} - p^{\star}\|_{2}^{2} + \alpha \cdot \beta \cdot \eta^{k} \|p^{k} - p^{\star}\| \\ &\leq \frac{\omega}{2} \|q^{k} - p^{\star}\|_{2}^{2} + \alpha \cdot \beta \cdot \chi \cdot \eta^{k} \|q^{k} - p^{\star}\|, \end{aligned}$$

$$r_{\lambda}^{k} := \Big(\mu^{-1} I + \sum_{i \in \mathcal{R}} S_{i}\Big) \lambda^{k} - \mu^{-1} \lambda - \sum_{i \in \mathcal{R}} s_{i}.$$

$$\begin{pmatrix} A_{1} & A_{2} \end{pmatrix} \tilde{x}_{1} + A_{3} x_{3} = 0, \qquad \begin{pmatrix} 0 & I \end{pmatrix} \tilde{x}_{1} - I x_{2} = 0.$$

$$\Big(\sum_{i \in \mathcal{R}} \tilde{S}_{i}\Big) \lambda^{\text{QP}} = \sum_{i \in \mathcal{R}} \tilde{s}_{i}$$

$$\tilde{S}_{i} := S_{i} + \sum_{j=1}^{n_{c}} \frac{\delta_{ij}}{|\mathcal{R}(j)| \, \mu} I_{j}, \qquad \tilde{s}_{i} := s_{i} + \sum_{j=1}^{n_{c}} \frac{\delta_{ij}}{|\mathcal{R}(j)| \, \mu} I_{j} \lambda,$$

$$\min_{\lambda} \; \frac{1}{2} \lambda^{\top} \sum_{i=1}^{N} \tilde{S}_{i} \lambda - \sum_{i=1}^{N} \tilde{s}_{i}^{\top} \lambda.$$

$r_{j}^{S k} = \sum_{j \in \mathcal{C}} r^{k\top} \tilde{S} e_{j} r_{j}^{k}$ and $r_{j}^{2 k} = (r_{j}^{k})^{2},$

$\lambda_{j}^{k+1}$


Full text

Decomposition of non-convex optimization via bi-level distributed ALADIN

Alexander Engelmann, Yuning Jiang, Boris Houska, and Timm Faulwasser

This work received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 730936. TF acknowledges further support from the Baden-Württemberg Stiftung under the Elite Programme for Postdocs. YJ and BH are supported by ShanghaiTech University, Grant-Nr. F-0203-14-012. AE and TF are with the Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany. YJ and BH are with the School of Information Science and Technology, ShanghaiTech University, Shanghai, China, $\{$jiangyn, borish$\}$@shanghaitech.edu.cn

Abstract

Decentralized optimization algorithms are important in different contexts, such as distributed optimal power flow or distributed model predictive control, as they avoid central coordination and enable decomposition of large-scale problems. In case of constrained non-convex optimization only a few algorithms are currently available; often their performance is limited, or they lack convergence guarantees. This paper proposes a framework for decentralized non-convex optimization via bi-level distribution of the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) algorithm. Bi-level distribution means that the outer ALADIN structure is combined with an inner distribution/decentralization level solving a condensed variant of ALADIN’s convex coordination QP by decentralized algorithms. We prove sufficient conditions ensuring local convergence while allowing for inexact decentralized/distributed solutions of the coordination QP. Moreover, we show how a decentralized variant of conjugate gradient or decentralized ADMM schemes can be employed at the inner level. We draw upon case studies from power systems and robotics to illustrate the performance of the proposed framework.

Index Terms:

Decentralized optimization, decomposition, ALADIN, ADMM, conjugate gradient, distributed optimal power flow, distributed nonlinear model predictive control.

I Introduction

Distributed optimization algorithms are of interest in many engineering applications due to their ability to solve large-scale problems efficiently and to enable the solution of optimization problems with limited information exchange [1].¹ These algorithms often employ a (usually simple) global coordination step while computationally expensive operations are executed in parallel or decentralized by local agents. Some algorithms avoid any kind of central coordination and communicate on a neighborhood basis only; they are commonly denoted as decentralized [4]. Decentralized algorithms are of significant application interest; yet they are difficult to design and to analyze.

¹ Note that there is no unified notion of distributed optimization: while the classical optimization literature allows for (preferably small) central coordination [2], in the power systems community optimization with any kind of centralized coordination is called hierarchical or hierarchically distributed [3].

The majority of results on distributed optimization investigates convex problems [1, 2, 5]. Many practically relevant problems, however, are inherently non-convex; examples range from non-linear model predictive control [6, 7] to power systems [8, 9, 10] and wireless sensor networks [11].

An approach to unconstrained non-convex problems via a push-sum algorithm can be found in [12]; [13] employs an alternating trust-region method with convergence guarantees for general non-convex problems. Algorithms based on distributing steps of centralized algorithms like Sequential Quadratic Programming (SQP) can be found in [14, 15]. A decomposition method for the linear algebra subproblems of an interior point method using the Schur complement is presented in [16]. Moreover, for special classes of non-convex problems, the Alternating Direction Method of Multipliers (ADMM) has convergence guarantees [17, 18]. Note, however, that only a few algorithms for decentralized non-convex optimization exist; to the best of our knowledge the only currently available algorithms are decentralized variants of the aforementioned algorithms [17] and [12].

The present paper proposes a design framework for general purpose decentralized algorithms applicable to constrained non-convex optimization defined over networks with generic topology. To this end, we build upon the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) method [19] which solves general non-convex problems to local optimality with guarantees. ALADIN exhibits advantageous local quadratic convergence under mild technical assumptions; however it requires solving a centralized Quadratic Program (QP) as coordination step.

Specifically, we propose to decentralize ALADIN by solving the coordination QP, which is the only centralized step, in a decentralized fashion.² Hence decentralizing the solution of the coordination QP provides an avenue towards a fully decentralized algorithm. To this end, we apply condensing techniques similar to [20, 21] to reduce the dimension of the coordination QP. Moreover, we prove that this coordination QP inherits the sparsity pattern from the original problem. We use this insight in the key step of our developments: the introduction of a second (inner) level of problem distribution to ALADIN. In other words, we show how the coordination QP can be solved efficiently in a decentralized fashion. To the latter end, we propose a decentralized variant of the Conjugate Gradient (CG) method. We also investigate the application of decentralized ADMM. The proposed framework is based on two consecutive layers of problem distribution: the general outer ALADIN structure is combined with a second inner layer. Hence we refer to it as bi-level distribution. As iterative methods (such as CG and ADMM) typically return inexact solutions, the original local convergence analysis for ALADIN [19] is not directly applicable. Accounting for this fact, we show that local convergence properties of ALADIN are preserved by enforcing bounds on the accuracy of the inner decentralized methods. These bounds are derived using arguments from inexact Newton methods [22]. This way we obtain, to the best of our knowledge, one of the first decentralized algorithms with local convergence guarantees for constrained non-convex problems.

² Note that the globalization routine in ALADIN also requires central coordination. However, as our goal for the present paper is developing a local algorithm, the globalization routine is not considered.

The remainder is structured as follows: Section II recalls ALADIN and condensing techniques for the coordination QP. Section III shows how the local convergence properties of ALADIN can be preserved while solving the coordination QP inexactly. Section IV provides details on how to solve the reduced system in a decentralized fashion using decentralized ADMM [5, 1] and decentralized conjugate gradient. Finally, examples from power systems and from robotics are presented in Section V.

Notation. If not explicitly stated differently, we use superscripts $(\cdot)^{k}$ for inner iterations and omit outer iteration indexes for simplicity. In optimization problems the Lagrange multiplier $\kappa$ associated to constraint $h$ is denoted as $h(x)\leq 0\;|\;\kappa$ . Given a matrix $S\in\mathbb{R}^{n\times m}$ , $S_{ij}$ denotes its $ij$ th entry.

II Preliminaries & Problem Statement

II-A Recalling ALADIN

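Distributed optimization aims at solving problems of the form³,⁴

$$\begin{aligned} \min_{x\in\mathbb{R}^{n_{x}}}\quad & \sum_{i\in\mathcal{R}} f_{i}(x_{i}) && \text{(1a)} \\ \text{subject to}\quad & h_{i}(x_{i})\leq 0 \;|\; \kappa_{i}, \quad \forall i\in\mathcal{R}, && \text{(1b)} \\ & \sum_{i\in\mathcal{R}} A_{i}x_{i} = 0 \;|\; \lambda, && \text{(1c)} \end{aligned}$$

³ Observe that (1) can be interpreted as a generalization of a consensus problem [1] in the sense that any consensus problem can be expressed in the form of (1) by appropriately choosing $A_{i}$.

⁴ For the sake of simplified notation we consider only inequality constraints $h_{i}$ here. Including equality constraints $g_{i}:\mathbb{R}^{n_{x_{i}}}\rightarrow\mathbb{R}^{n_{g_{i}}}$ does not pose any difficulty as $g_{i}$ can be reformulated as $0\leq g_{i}(x_{i})\leq 0$.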

with objective functions $f_{i}:\mathbb{R}^{n_{x_{i}}}\rightarrow\mathbb{R}$ and constraints $h_{i}:\mathbb{R}^{n_{x_{i}}}\rightarrow\mathbb{R}^{n_{h_{i}}}$ . In all subproblems $i\in\mathcal{R}=\{1,\dots,N\}$ the functions $f_{i}$ and $h_{i}$ are assumed to be twice continuously differentiable and possibly non-convex. The overall decision vector is $x:={(x_{1}^{\top},\dots,x^{\top}_{N})}^{\top}\in\mathbb{R}^{n_{x}}$ and the matrices $A_{i}\in\mathbb{R}^{n_{c}\times n_{x_{i}}}$ describe couplings between subproblems. Standard ALADIN is summarized in Algorithm 1; we refer to [19, 23, 24] for details including convergence proofs and application examples.

Two main steps in ALADIN require central coordination and thus render ALADIN distributed instead of decentralized: (i) the coordination QP in Step 2) and (ii) an additional globalization strategy which is neglected (for the sake of simplicity) in Step 3). Here, we focus on designing a local optimization algorithm. Hence, we use the full-step variant of ALADIN and focus on issue (i). Note that—upon solving Step 2) exactly in a decentralized fashion and modulo technical subtleties—one directly obtains a decentralized algorithm for constrained non-convex problems (1).

II-B Condensing the coordination QP

In ALADIN (Algorithm 1) the coordination QP (2) directly scales with the number of decision variables and constraints $(n_{x}+n_{h}+n_{c})$, which may be prohibitive in many applications. Hence we aim at reducing the size of (2) to the number of coupling constraints $n_{c}$, which is typically much smaller than $(n_{x}+n_{h})$. In the context of direct methods for numerical optimal control, a similar approach has been used in [21]. Subsequently, we derive the reduced QP based on the Schur complement, whereas the analysis in [21] is based on dualization. In contrast to [21], we consider slack variables $s$ as they are important in practice.

In Step 2) of ALADIN one solves the coordination QP

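$$\begin{aligned} \min_{\Delta x, s}\quad & \sum_{i\in\mathcal{R}} \frac{1}{2}\Delta x_{i}^{\top}H_{i}^{k}\Delta x_{i} + g_{i}^{k\top}\Delta x_{i} + \lambda^{k\top}s + \frac{\mu}{2}\|s\|_{2}^{2} \\ \text{s.t.}\quad & \sum_{i\in\mathcal{R}} A_{i}(x_{i}^{k}+\Delta x_{i}) = s \;|\; \lambda^{\text{QP}}, \\ & C_{i}^{\mathrm{act}\,k}\Delta x_{i} = 0, \quad \forall i\in\mathcal{R}, \end{aligned}$$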

where $H_{i}\in\mathbb{R}^{n_{x_{i}}\times n_{x_{i}}}$ are positive definite Hessian approximations of the local Lagrangians, $g_{i}\in\mathbb{R}^{n_{x_{i}}}$ are the gradients, $\lambda\in\mathbb{R}^{n_{c}}$ are Lagrange multiplier estimates for the consensus constraint, and $C^{\mathrm{act}}_{i}\in\mathbb{R}^{n^{k}_{h_{i}}\times n_{x_{i}}}$ are constraint linearizations of the active constraints, with $n^{k}_{h_{i}}$ being the number of active constraints in the $k$th ALADIN iteration in subproblem $i\in\mathcal{R}$. $A_{i}\in\mathbb{R}^{n_{c}\times n_{x_{i}}}$ describes linear coupling between the subproblems. The slack $s\in\mathbb{R}^{n_{c}}$ in combination with a sufficiently large penalty parameter $\mu\in\mathbb{R}_{+}$ fosters numerical stability.⁵ For the sake of readability, we suppress the ALADIN iteration superscripts $(\cdot)^{k}$ whenever possible without ambiguity.

⁵ Neglecting the slack variables $s$ simplifies condensing. However, these variables are essential for handling inconsistent constraint linearizations [19]. The examples of Section V fail to converge in absence of them.

Assumption 1 (Strong regularity).

For all ALADIN iterates $k\in\mathbb{N}$, for all $i\in\mathcal{R}$, and for all local minimizers of (1), linear independence constraint qualification (LICQ), strict complementarity condition (SCC) and second-order sufficient conditions (SOSC) are satisfied on the nullspace of $C_{i}$, cf. [25].⁶

⁶ Note that the SOSC assumption made here is slightly stronger than the general SOSC from [25]. Here we require positive definiteness of $H_{i}$ on the tangent space of the nonlinear constraints and do not consider the nullspace of the consensus constraints (1c).

We employ the nullspace method [25] to project the coordination QP onto the nullspace of $C_{i}^{\mathrm{act}}$. Assumption 1 implies that $C^{\mathrm{act}}:=\operatorname{diag}_{i\in\mathcal{R}}C_{i}^{\mathrm{act}}\in\mathbb{R}^{n^{k}_{h}\times n_{x}}$ has full row-rank. Hence parametrizing $\operatorname{null}(C^{\mathrm{act}})$ in terms of $v\in\mathbb{R}^{n_{x}-n^{k}_{h}}$ yields

$$\operatorname{null}(C^{\mathrm{act}}) = \{x\in\mathbb{R}^{n_{x}} \;|\; x = Zv,\; v\in\mathbb{R}^{n_{x}-n^{k}_{h}}\},$$

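where the columns of $Z\in\mathbb{R}^{n_{x}\times(n_{x}-n^{k}_{h})}$ form a basis of $\operatorname{null}(C^{\mathrm{act}})$. With $x_{i}:=Z_{i}v_{i}$ for all $i\in\mathcal{R}$, we write the coordination QP as

$$\begin{aligned} \min_{\Delta v, s}\quad & \sum_{i\in\mathcal{R}} \frac{1}{2}\Delta v_{i}^{\top}\bar{H}_{i}\Delta v_{i} + \bar{g}_{i}^{\top}\Delta v_{i} + \lambda^{\top}s + \frac{\mu}{2}\|s\|_{2}^{2} \\ \text{s.t.}\quad & \sum_{i\in\mathcal{R}} \bar{A}_{i}(v_{i}+\Delta v_{i}) = s \;|\; \lambda^{\text{QP}}, \end{aligned} \tag{5}$$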

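where $\bar{H}_{i}:=Z_{i}^{\top}H_{i}Z_{i}$, $\bar{g}_{i}:=Z_{i}^{\top}g_{i}$, and $\bar{A}_{i}:=A_{i}Z_{i}$. Upon eliminating the equation for $s$, the KKT conditions of (5) read

$$\begin{pmatrix} \bar{H} & \bar{A}^{\top} \\ \bar{A} & -\frac{1}{\mu}I \end{pmatrix} \begin{pmatrix} \Delta v \\ \lambda^{\text{QP}} \end{pmatrix} = \begin{pmatrix} -\bar{g} \\ -\bar{A}v - \frac{1}{\mu}\lambda \end{pmatrix}, \tag{6}$$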

where $\bar{H}:=\operatorname{diag}\left(\{\bar{H}_{i}\}_{i\in\mathcal{R}}\right)$ , $\bar{g}:=\left(\,\bar{g}_{1}^{\top},\;\dots,\;\bar{g}_{N}^{\top}\,\right)^{\top}$ and ${\bar{A}:=\left(\,\bar{A}_{1},\;\dots\;,\bar{A}_{N}\,\right)}$ .

We use the Schur-complement [25, Chap. 16] to further reduce (6). SOSC implies that $\bar{H}$ is positive definite and therefore invertible. Hence, we solve the first row of (6) as $\Delta v=\bar{H}^{-1}(-{\bar{A}}^{\top}\lambda^{\text{QP}}-\bar{g})$ and obtain

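$$\left(\mu^{-1}I + \bar{A}\bar{H}^{-1}\bar{A}^{\top}\right)\lambda^{\text{QP}} = \bar{A}\left(v - \bar{H}^{-1}\bar{g}\right) + \mu^{-1}\lambda, \tag{7}$$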

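which is a linear system of equations of dimension ${n_{c}}$. After solving (7), the solution $\Delta x$ to the coordination QP can be obtained by back substitution. Exploiting the block structure of $\bar{H}$, $\bar{g}$ and $\bar{A}$ we write (7) as

$$\Big(\mu^{-1}I + \sum_{i\in\mathcal{R}} S_{i}\Big)\lambda^{\text{QP}} = \mu^{-1}\lambda + \sum_{i\in\mathcal{R}} s_{i}, \tag{8}$$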

where $S_{i}=\bar{A}_{i}\bar{H}_{i}^{-1}\bar{A}_{i}^{\top}$ and $s_{i}=\bar{A}_{i}(v_{i}-\bar{H}_{i}^{-1}\bar{g}_{i})$ . Observe that the matrices $S_{i}$ and the vectors $s_{i}$ can be computed entirely locally. Furthermore, reverse application of the above formulas shows that the increments $\Delta x_{i}$ can be computed locally via

$$\Delta x_{i} = Z_{i}\bar{H}_{i}^{-1}\left(-\bar{A}_{i}^{\top}\lambda^{\text{QP}} - \bar{g}_{i}\right).$$

Doing so, we arrive at a variant of ALADIN requiring less communication compared to the standard one in Algorithm 1.
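The condensing steps of this subsection can be sketched in a few lines of NumPy. This is a minimal illustration on random data, not the authors' implementation: the helper names `nullspace_basis`, `condense` and `coordination_step` are hypothetical, the nullspace basis is taken from an SVD, and $\bar{A}_{i}v_{i}$ is evaluated as $A_{i}x_{i}$ (which holds when $x_{i}=Z_{i}v_{i}$).

```python
import numpy as np

def nullspace_basis(C):
    """Orthonormal basis Z of null(C) via SVD; C is assumed to have full row rank."""
    _, _, vt = np.linalg.svd(C)          # vt is square (n_x by n_x)
    return vt[C.shape[0]:].T             # last n_x - n_h right singular vectors

def condense(H, g, A, C, x):
    """Local step of one subproblem: reduced data and Schur complement S_i, s_i."""
    Z = nullspace_basis(C)
    Hbar = Z.T @ H @ Z
    gbar = Z.T @ g
    Abar = A @ Z
    S = Abar @ np.linalg.solve(Hbar, Abar.T)
    # Assumption: Abar_i v_i is evaluated as A_i x_i (valid for x_i = Z_i v_i).
    s = A @ x - Abar @ np.linalg.solve(Hbar, gbar)
    return Z, Hbar, gbar, Abar, S, s

def coordination_step(subproblems, lam, mu):
    """Solve the condensed n_c-dimensional system, then recover each dx_i locally."""
    data = [condense(*sp) for sp in subproblems]
    lhs = np.eye(lam.size) / mu + sum(d[4] for d in data)
    rhs = lam / mu + sum(d[5] for d in data)
    lam_qp = np.linalg.solve(lhs, rhs)
    dx = [Z @ np.linalg.solve(Hbar, -Abar.T @ lam_qp - gbar)
          for Z, Hbar, gbar, Abar, _, _ in data]
    return lam_qp, dx

# Random test data (purely illustrative): N subproblems, n_h active constraints each.
rng = np.random.default_rng(0)
N, n_x, n_c, n_h = 3, 4, 3, 1
subproblems = []
for _ in range(N):
    M = rng.standard_normal((n_x, n_x))
    H = M @ M.T + np.eye(n_x)            # positive definite Hessian approximation
    subproblems.append((H, rng.standard_normal(n_x),
                        rng.standard_normal((n_c, n_x)),
                        rng.standard_normal((n_h, n_x)),
                        rng.standard_normal(n_x)))
lam, mu = rng.standard_normal(n_c), 1e3
lam_qp, dx = coordination_step(subproblems, lam, mu)
```

Only the $n_{c}\times n_{c}$ matrices $S_{i}$ and the $n_{c}$-vectors $s_{i}$ leave a subproblem in this sketch; the Hessians and constraint Jacobians stay local.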

II-C Bi-level distributed ALADIN

Algorithm 2 summarizes the general algorithmic framework for bi-level distributed ALADIN. Note that, in all iterations $k$, the condensing of the coordination QP can be performed locally and that a coordination QP of reduced dimension is used for coordination. This distributed algorithm can in principle be applied as is. However, it still requires solving a (less complex) hierarchical coordination problem (11). Observe that, by solving (11) with a decentralized algorithm, one obtains a decentralized variant of ALADIN. In Section IV we propose two variants for doing so: one based on conjugate gradient and one based on ADMM. As, for conceptual and numerical reasons, these iterative algorithms do not yield exact values of $\lambda^{\text{QP}}$, the next section presents a convergence analysis of ALADIN for inexact solutions to (11).

III Local Convergence Analysis

Usually decentralized algorithms solving (11) achieve a finite precision only. Hence it is fair to ask whether it is possible to preserve local convergence guarantees under inexact solutions. We answer this question by combining properties of ALADIN [19] with classical results from inexact Newton methods [22].

Bi-level distributed ALADIN is composed of two main steps: the parallelizable Step 1) solving local NLPs and computing (condensed) sensitivities as well as the coordination Step 2) solving (11). In order to establish local convergence properties of bi-level distributed ALADIN, we aim at ensuring progress towards a local minimizer in both steps.

From [26, Lemma 3] we have that the mapping formed by Step 1) is locally Lipschitz, i.e.

$$\|p^{k}-p^{\star}\| \leq \chi\,\|q^{k}-p^{\star}\| \tag{12}$$

with $q^{k}=(z^{k},\lambda^{k},\kappa^{k-1})$ and $p^{k}:=(x^{k},\lambda^{k},\kappa^{k})$ for some $\chi<\infty$ . The superscript $(\cdot)^{\star}$ denotes optimal primal and dual variables of (1).

It remains to analyze the progress in the coordination problem (11). Eliminating $s$ , the optimality conditions of (2) read

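$$\underbrace{\begin{pmatrix} H & A^{\top} & C^{a\top} \\ A & -\frac{1}{\mu}I & 0 \\ C^{a} & 0 & 0 \end{pmatrix}}_{=:M(p^{k})} \Delta q^{k} = \underbrace{\begin{pmatrix} -g - C^{a\top}\kappa^{k} - A^{\top}\lambda^{k} \\ -Ax^{k} + b \\ 0 \end{pmatrix}}_{=:m(p^{k})}, \tag{13}$$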

with $\Delta q^{k}=q^{k+1}-p^{k}$ . Apart from the entry $-\frac{1}{\mu}I$ , (13) is equivalent to a Newton step for (1) if exact Hessians and Jacobians are used. Hence, we have the typical progress in Step 2) known from Newton-type methods [25]

$$\|q^{k+1}-p^{\star}\| \leq \gamma\,\|p^{k}-p^{\star}\| + \frac{\omega}{2}\|p^{k}-p^{\star}\|_{2}^{2},$$

where $\gamma=\|I-M(p^{k})^{-1}\nabla^{2}\mathcal{L}(p^{k})\|<1$ can be seen as a bound on the error of $\nabla^{2}\mathcal{L}(p^{k})$ with $\mathcal{L}(x,\lambda,\kappa):=f(x)+\lambda^{\top}Ax+\kappa^{\top}h(x)$ being the Lagrangian to (1). Yet this only holds for an exact solution to (13).

Denote the approximate solution by $\bar{q}^{k+1}$ and $\Delta\bar{q}^{k}=\bar{q}^{k+1}-p^{k}$ . We define the residual for (13) similar to inexact Newton methods as

$$r_{p}^{k} := M(p^{k})\,\Delta\bar{q}^{k} - m(p^{k}).$$

We assume that the residual is bounded by

$$\|r_{p}^{k}\| \leq \eta^{k}\,\|m(p^{k})\|, \tag{14}$$

which we have to guarantee during the ALADIN iterations. Now we have all the ingredients to prove the main result of this section.

Theorem 1 (Conv. of Bi-level decentralized ALADIN).

Consider bi-level distributed ALADIN (Algorithm 2). Suppose Assumption 1 holds. Let $\frac{1}{\mu^{k}}=O(\|q^{k}-p^{\star}\|)$ , let $\nabla^{2}\mathcal{L}$ and $\nabla\mathcal{L}$ be Lipschitz, and let the solution to the condensed QP (11) satisfy (14) in each iteration $k\in\mathbb{N}_{+}$ .

Then there exists $\eta\in(\eta^{k},\,\infty)$ such that bi-level distributed ALADIN converges locally to $(x^{\star},\lambda^{\star},\kappa^{\star})$

• at linear rate; and

• at quadratic rate if $\eta^{k}=O(\|q^{k}-p^{\star}\|)$.

Proof.

The inequalities (12), (14) and the Lipschitz property of $m$ with $\frac{1}{\mu^{k}}=O(\|q^{k}-p^{\star}\|)$ imply

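$$\begin{aligned} \|\bar{q}^{k+1}-p^{\star}\| &\leq \frac{\omega}{2}\|q^{k}-p^{\star}\|_{2}^{2} + \alpha\cdot\eta^{k}\,\|m(p^{k})-m(p^{\star})\| \\ &\leq \frac{\omega}{2}\|q^{k}-p^{\star}\|_{2}^{2} + \alpha\cdot\beta\cdot\eta^{k}\,\|p^{k}-p^{\star}\| \\ &\leq \frac{\omega}{2}\|q^{k}-p^{\star}\|_{2}^{2} + \alpha\cdot\beta\cdot\chi\cdot\eta^{k}\,\|q^{k}-p^{\star}\|, \end{aligned}$$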

where $\beta$ is the Lipschitz-constant of $m$ . The finiteness of $\alpha,\beta$ and $\chi$ shows linear convergence if ${\alpha\cdot\beta\cdot\chi\cdot\eta^{k}<1}$ . Quadratic convergence follows immediately from the above inequality if $\eta^{k}=O(\|q^{k}-p^{\star}\|)$ . ∎

The above result shows that inexact solutions to (11) do not jeopardize linear or even quadratic local convergence of bi-level distributed ALADIN.
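The role of the forcing sequence $\eta^{k}$ can be illustrated with a generic inexact Newton iteration on a toy root-finding problem. The sketch below is not ALADIN: the system `F`, its Jacobian `J`, the injected residual direction `u` and the function name `inexact_newton` are all assumptions made for illustration. Each step solves the Newton system only up to a residual $r$ with $\|r\| = \eta^{k}\|m\|$, mirroring the residual bound above with $m=-F$.

```python
import numpy as np

def F(x):
    """Toy nonlinear system with a root at (sqrt(2)/2, sqrt(2)/2)."""
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def J(x):
    """Jacobian of F."""
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

def inexact_newton(x0, forcing, tol=1e-10, max_iter=50):
    """Newton iteration whose linear solves leave a residual of norm eta * ||m||."""
    x = np.asarray(x0, dtype=float)
    u = np.array([1.0, 1.0]) / np.sqrt(2.0)   # fixed unit direction of the residual
    history = [np.linalg.norm(F(x))]
    while history[-1] > tol and len(history) <= max_iter:
        m = -F(x)                             # right-hand side of the Newton system
        eta = forcing(np.linalg.norm(m))
        r = eta * np.linalg.norm(m) * u       # injected residual, ||r|| = eta * ||m||
        d = np.linalg.solve(J(x), m + r)      # step satisfies J d - m = r
        x = x + d
        history.append(np.linalg.norm(F(x)))
    return x, history

# Constant forcing term versus a forcing term that shrinks with the residual.
x_lin, hist_lin = inexact_newton([1.0, 0.5], lambda norm_m: 0.1)
x_quad, hist_quad = inexact_newton([1.0, 0.5], lambda norm_m: min(0.1, norm_m))
print("iterations (constant eta):", len(hist_lin) - 1)
print("iterations (shrinking eta):", len(hist_quad) - 1)
```

With a constant $\eta^{k}$ the residual norm contracts by roughly that factor per step, whereas a forcing term proportional to the current residual needs noticeably fewer iterations, which matches the linear and quadratic cases of the theorem qualitatively.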

However, the question of how to evaluate (14) in a decentralized setting arises. To this end, we draw upon $r_{\lambda}^{k}$, the residual of (11),

$$r_{\lambda}^{k} := \Big(\mu^{-1}I + \sum_{i\in\mathcal{R}} S_{i}\Big)\lambda^{k} - \mu^{-1}\lambda - \sum_{i\in\mathcal{R}} s_{i}. \tag{15}$$

The structure of $S$, $s$, and (15) imply that $r_{\lambda}^{k}=\bar{A}\bar{H}^{-1}(-\bar{g}-\bar{A}^{\top}\lambda^{k})=\bar{A}\Delta v=\nabla_{\lambda}\mathcal{L}(p^{k})$. As we enforce $\nabla_{\Delta x}\mathcal{L}(p^{k})=0$ and $\nabla_{\kappa}\mathcal{L}(p^{k})=0$ by virtue of the nullspace method and the first row of (6), we obtain $r_{p}^{k}={(0^{\top}\ \ 0^{\top}\ \ r_{\lambda}^{k\top})}^{\top}$ and $\|r_{p}^{k}\|=\|r_{\lambda}^{k}\|$. Hence, one can evaluate (14) using only the residual of the reduced system $\|r^{k}_{\lambda}\|$.

IV Decentralized Solution of the Coord. QP (11)

Observe that the QP (11) inherits structural properties of problem (1); i.e. the Schur-complements $S_{i}$ inherit the sparsity pattern induced by the coupling matrices $A_{i}$. This sparsity can be exploited either to further reduce communication by using sparse matrix storage formats or to design decentralized algorithms. Here we focus on the latter. We first analyze the sparsity of the matrices $S_{i}$ and then we propose two decentralized algorithms exploiting this sparsity.

IV-A Sparsity of the Schur-complements

Usually, the consensus constraint (1c) describes couplings between two neighboring subproblems $i,j\in\mathcal{R}$ . This means that in the matrices $A_{i}$ and $A_{j}$ the $i$ th and $j$ th rows are nonzero.

Definition 1 (Assigned consensus constraints).

A subproblem $i\in\mathcal{R}$ is called assigned to consensus constraint $j\in{\mathcal{C}=\{1,\dots,n_{c}\}}$ , if the $j$ th row of $A_{i}$ is non-zero. Furthermore, all subproblems assigned to consensus constraint $j$ are collected in $\mathcal{R}(j):=\{i\in\mathcal{R}\;|\;i\;\text{assigned to}\;{j\in\mathcal{C}}\}$ .

A consensus constraint $j\in\mathcal{C}$ is called $n$ -assigned, if $|\mathcal{R}(j)|=n.$ Furthermore, if $|\mathcal{R}(j)|\leq n$ for all $j\in\mathcal{C}$ , problem (1) is called $n$ -assigned.

Observe that assigned consensus constraints generalize the usual consensus setting [1]. Moreover, they provide an effective framework to analyze the sparsity pattern of the Schur-complements. We remark that any generic consensus problem can be expressed in this form via appropriate choice of $A_{i}$ and using additional local variables.

Remark 1 (Reformulation as 2-assigned problem).

Without loss of generality any $n$-assigned problem can be reformulated as a $2$-assigned problem by introducing auxiliary decision variables. For example, consider a $3$-assigned problem with consensus constraint $A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}=0$ where $A_{1},A_{2},A_{3}\neq 0$. Introduce a copy of $x_{2}$ in subproblem $1$ as $\tilde{x}_{2}:=x_{2}$ and define an augmented decision vector $\tilde{x}_{1}:=(x_{1}^{\top}\;\tilde{x}_{2}^{\top})^{\top}$. This yields a $2$-assigned problem in terms of the augmented decision vectors $(\tilde{x}_{1},x_{2},x_{3})$:

$$\begin{pmatrix} A_{1} & A_{2} \end{pmatrix}\tilde{x}_{1} + A_{3}x_{3} = 0, \qquad \begin{pmatrix} 0 & I \end{pmatrix}\tilde{x}_{1} - I x_{2} = 0.$$

Lemma 2 (Sparsity of $S_{i}$ ).

For each $i\in\mathcal{R}$, the rows and columns of $S_{i}$ and the entries of $s_{i}$ that belong to consensus constraints $j$ to which subproblem $i$ is not assigned (i.e. all $j\notin\mathcal{C}(i):=\{j\in\mathcal{C}\;|\;i\in\mathcal{R}(j)\}$) are zero.

Proof.

We have $S_{i}=\bar{A}_{i}\bar{H}_{i}^{-1}\bar{A}_{i}^{\top}=A_{i}(Z_{i}\bar{H}_{i}^{-1}Z_{i}^{\top})A_{i}^{\top}$. All rows of $A_{i}$ with $j\notin\mathcal{C}(i)$ are zero by Definition 1. It follows immediately that the rows and columns of $S_{i}$ with $j\notin\mathcal{C}(i)$ are zero. The sparsity of $s_{i}=\bar{A}_{i}(v_{i}-\bar{H}_{i}^{-1}\bar{g}_{i})$ follows analogously. ∎

Lemma 2 shows that the matrices $S_{i}$ and vectors $s_{i}$ have non-zero entries only for neighboring subproblems.
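The sparsity statement can be checked numerically on a small chain of three subproblems. The sketch below uses assumed toy data: two consensus constraints, the hypothetical assignment map `assigned` (playing the role of $\mathcal{C}(i)$), random coupling rows, and no active inequality constraints, so that $Z_{i}=I$ and $S_{i}=A_{i}H_{i}^{-1}A_{i}^{\top}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_c = 3, 2

# Hypothetical chain topology: constraint 0 couples subproblems 0 and 1,
# constraint 1 couples subproblems 1 and 2. assigned[i] plays the role of C(i).
assigned = {0: [0], 1: [0, 1], 2: [1]}

S = {}
for i, rows in assigned.items():
    A_i = np.zeros((n_c, n_x))
    A_i[rows, :] = rng.standard_normal((len(rows), n_x))  # only assigned rows are non-zero
    M = rng.standard_normal((n_x, n_x))
    H_i = M @ M.T + np.eye(n_x)                            # positive definite
    # Assumption: no active inequality constraints, hence Z_i = I.
    S[i] = A_i @ np.linalg.solve(H_i, A_i.T)

for i in assigned:
    zero_rows = [j for j in range(n_c) if np.allclose(S[i][j, :], 0.0)]
    print(f"subproblem {i}: zero rows/columns of S_i -> {zero_rows}")
```

In this toy setup the zero rows and columns of each $S_{i}$ are exactly the constraints missing from `assigned[i]`, so only neighboring subproblems contribute to a given row of $\sum_{i}S_{i}$.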

IV-B Consensus reformulation

Now, we reformulate (11) as a strictly convex consensus problem such that the conjugate gradient method and ADMM are applicable. Specifically, we reformulate (11) as

$$\Big(\sum_{i\in\mathcal{R}} \tilde{S}_{i}\Big)\lambda^{\text{QP}} = \sum_{i\in\mathcal{R}} \tilde{s}_{i}, \tag{16}$$

where each $\tilde{S}_{i}$ and $\tilde{s}_{i}$ is constructed by local information only. Equation (11) implies that the reduced QP is separable as it involves sums of $S_{i}$ and $s_{i}$. However, the terms $\mu^{-1}I$ and $\mu^{-1}\lambda$ cannot directly be assigned to any of the subproblems.

One possibility is to introduce an additional subproblem which would serve as a coordinator. However, here we are interested in relying on neighborhood communication only. Hence we distribute $\mu^{-1}I$ and $\mu^{-1}\lambda$ uniformly to all subproblems assigned to the corresponding consensus constraint. This yields

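$$\tilde{S}_{i} := S_{i} + \sum_{j=1}^{n_{c}} \frac{\delta_{ij}}{|\mathcal{R}(j)|\,\mu}\, I_{j}, \qquad \tilde{s}_{i} := s_{i} + \sum_{j=1}^{n_{c}} \frac{\delta_{ij}}{|\mathcal{R}(j)|\,\mu}\, I_{j}\lambda, \tag{17}$$

where $I_{j}$ contains only zeros except for $I_{jj}=1$, and $\delta_{ij}:=1$ if $j\in\mathcal{C}(i)$ and $\delta_{ij}:=0$ else. This way (11) is expressed in the form of (16) without destroying its sparsity pattern.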
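A short numerical check makes the uniform distribution of $\mu^{-1}I$ and $\mu^{-1}\lambda$ concrete. The sketch below is illustrative only: `distribute_regularization` is a hypothetical helper, `assigned[i]` lists the consensus constraints of subproblem $i$, and the local data are random matrices with the sparsity pattern of Lemma 2.

```python
import numpy as np

def distribute_regularization(S_loc, s_loc, assigned, lam, mu):
    """Split mu^{-1} I and mu^{-1} lam evenly among the subproblems assigned to each constraint."""
    n_c = lam.size
    count = np.zeros(n_c)                    # count[j] = |R(j)|
    for rows in assigned:
        count[rows] += 1
    S_til, s_til = [], []
    for S_i, s_i, rows in zip(S_loc, s_loc, assigned):
        w = np.zeros(n_c)
        w[rows] = 1.0 / (count[rows] * mu)   # delta_ij / (|R(j)| mu)
        S_til.append(S_i + np.diag(w))
        s_til.append(s_i + w * lam)
    return S_til, s_til

# Toy data (assumed): chain of three subproblems, two consensus constraints.
rng = np.random.default_rng(3)
n_c, mu = 2, 100.0
assigned = [[0], [0, 1], [1]]
S_loc, s_loc = [], []
for rows in assigned:
    B = np.zeros((n_c, 3))
    B[rows, :] = rng.standard_normal((len(rows), 3))
    S_loc.append(B @ B.T)                    # symmetric, zero on unassigned rows/columns
    s_i = np.zeros(n_c)
    s_i[rows] = rng.standard_normal(len(rows))
    s_loc.append(s_i)
lam = rng.standard_normal(n_c)

S_til, s_til = distribute_regularization(S_loc, s_loc, assigned, lam, mu)
print(np.allclose(sum(S_til), sum(S_loc) + np.eye(n_c) / mu))
print(np.allclose(sum(s_til), sum(s_loc) + lam / mu))
```

Summing the local terms recovers the left- and right-hand sides of the condensed system, while unassigned rows and columns of each $\tilde{S}_{i}$ stay zero.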

The next result reformulates (16) as strictly convex QP.

Lemma 3 (Minimization to solve (8)).

The minimizer of

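$$\min_{\lambda}\; \frac{1}{2}\lambda^{\top}\sum_{i=1}^{N}\tilde{S}_{i}\,\lambda - \sum_{i=1}^{N}\tilde{s}_{i}^{\top}\lambda \tag{18}$$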

is unique and solves (16). Furthermore, (18) is strictly convex.

Proof.

The first-order necessary condition for (18) reads $\frac{1}{2}(\sum_{i=1}^{N}\tilde{S}_{i}+\sum_{i=1}^{N}\tilde{S}_{i}^{\top})\lambda-\sum_{i=1}^{N}\tilde{s}_{i}=0$ . From Lemma 4 (given in the Appendix) one has that $\tilde{S}_{i}=\tilde{S}_{i}^{\top}$ . This proves the first assertion. Moreover, Lemma 4 gives $\sum_{i=1}^{N}\tilde{S}_{i}\succ 0$ which implies strict convexity of (18). ∎

IV-C Decentralized conjugate gradient

Next, we propose a sparsity exploiting variant of the conjugate gradient algorithm. The usual centralized conjugate gradient method with $r^{0}=p^{0}=\tilde{s}-\tilde{S}\lambda^{0}$ reads [25]

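$$\begin{aligned} \alpha^{k} &= \frac{r^{k\top}r^{k}}{p^{k\top}\tilde{S}p^{k}}, && \text{(19a)} \\ \lambda^{k+1} &= \lambda^{k} + \alpha^{k}p^{k}, && \text{(19b)} \\ r^{k+1} &= r^{k} - \alpha^{k}\tilde{S}p^{k}, && \text{(19c)} \\ \beta^{k} &= \frac{r^{k+1\top}r^{k+1}}{r^{k\top}r^{k}}, && \text{(19d)} \\ p^{k+1} &= r^{k+1} + \beta^{k}p^{k}. && \text{(19e)} \end{aligned}$$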

Recall that $\tilde{S}=\sum_{i\in\mathcal{R}}\tilde{S}_{i}$ and let $e_{j}$ be the $j$ th unit vector. Then, from Lemma 2 and (17) we have

[TABLE]

i.e. the $j$th column of $\tilde{S}$, which belongs to consensus constraint $j\in\mathcal{C}$, is the sum of the respective columns of $\tilde{S}_{i}$ over only those subproblems $i\in\mathcal{R}(j)$ assigned to that constraint. Since the $\tilde{S}_{i}$ are symmetric (Lemma 4), the same holds for the rows. Therefore the rows of $\tilde{S}$ can be constructed locally based on neighborhood communication between the assigned subproblems. Furthermore, in (19a) we have to compute

[TABLE]

From (20) and Lemma 2 we know that $\tilde{S}_{i}e_{j}=0$ for $i\in\mathcal{C}\setminus\cup_{i\in\mathcal{R}(j)}\mathcal{C}(i)$ . Hence, the components of (21) are

[TABLE]

where $\tilde{S}_{ij}$ denotes the $ij$ th element of $\tilde{S}$ . Observe that it suffices to exchange $r^{k}_{i}$ and $\tilde{S}_{ji}$ locally between all $i\in\mathcal{R}(j)$ . As

[TABLE]

and $r^{k}_{j}$ is also known locally, all summands in (23) can be computed locally. The only centralized operation is evaluating one global sum. The same applies to

[TABLE]

where $(r_{j}^{k})^{2}$ can be computed locally. A similar analysis applies to (19b)-(19e), where (19d) requires an additional global sum; the conjugate gradient method therefore needs two global sums in each iteration. (Footnote 7: Although the sum is global, it can easily be decentralized by computing it via “hopping”, i.e. a round-robin protocol, from neighbor to neighbor.) Algorithm 3 summarizes the proposed decentralized variant of the conjugate gradient method.
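The round-robin evaluation of a global sum can be sketched as follows. The ring ordering of the subproblems, the function name `ring_sum` and the message count are illustrative assumptions; the paper does not specify the protocol in this detail.

```python
def ring_sum(local_values):
    """Accumulate a global sum by passing a running total around a ring.

    Pass 1 (N - 1 hops): each node adds its own value and forwards the total.
    Pass 2 (N - 1 hops): the final total is forwarded so that every node knows it.
    Returns the value known at each node and the number of messages sent.
    """
    n = len(local_values)
    running = 0.0
    messages = 0
    for node in range(n):              # accumulation pass
        running += local_values[node]
        if node < n - 1:
            messages += 1              # forward the partial sum to the next neighbor
    known = [running] * n              # distribution pass along the same ring
    messages += n - 1
    return known, messages

# Example: four subproblems, each holding one local summand.
values = [1.5, -0.5, 2.0, 4.0]
known, messages = ring_sum(values)
```

Each message carries a single float and travels only between neighbors in the ring, at the price of a latency that grows with the number of subproblems.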

Note that the decentralized conjugate gradient algorithm requires communication between all subproblems assigned to a specific consensus constraint. In other words, this algorithm can be executed in decentralized fashion if the coupling described by the $A_{i}$s refers to two subproblems only, i.e. if Problem (1) is $2$-assigned. The same holds for ADMM, as we will see in the next section.
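The structure of the method can be sketched in a few lines. The sketch below uses the standard conjugate gradient recursion in the form given in [25] and is not a transcription of Algorithm 3: the function names and toy data are assumptions, and for brevity all local products are added up in one place (`assemble`), whereas in the decentralized setting only subproblems sharing a consensus constraint would exchange the corresponding entries.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def assemble(local_vectors):
    """Add up local contributions component by component."""
    return [sum(col) for col in zip(*local_vectors)]

def decentralized_cg(S_loc, s_loc, lam0, max_iters):
    """CG for (sum_i S_i) lam = sum_i s_i using only local products S_i p.

    Returns the iterate and the number of global scalar sums that were used.
    """
    lam = list(lam0)
    S_lam = assemble([matvec(S_i, lam) for S_i in S_loc])
    r = [b - q for b, q in zip(assemble(s_loc), S_lam)]   # r0 = s - S lam0
    p = list(r)
    rr = dot(r, r)                      # global sum (initialization)
    global_sums = 1
    for _ in range(max_iters):
        if rr == 0:
            break                       # exact solution reached
        Sp = assemble([matvec(S_i, p) for S_i in S_loc])  # local products
        alpha = rr / dot(p, Sp)         # first global sum: p^T S p
        lam = [l + alpha * d for l, d in zip(lam, p)]
        r = [ri - alpha * q for ri, q in zip(r, Sp)]
        rr_new = dot(r, r)              # second global sum: r^T r
        beta = rr_new / rr
        p = [ri + beta * d for ri, d in zip(r, p)]
        rr = rr_new
        global_sums += 2
    return lam, global_sums

F = Fraction
# Toy data (exact rational arithmetic): three local blocks, two constraints.
S_loc = [
    [[F(2), F(0)], [F(0), F(0)]],
    [[F(1), F(1)], [F(1), F(3)]],
    [[F(0), F(0)], [F(0), F(4)]],
]
s_loc = [[F(1), F(0)], [F(0), F(1)], [F(1), F(1)]]
lam, global_sums = decentralized_cg(S_loc, s_loc, [F(0), F(0)], max_iters=10)
```

Because the toy problem is solved in exact rational arithmetic, the iteration stops after two steps, which is the dimension of $\lambda$ in this example; with floating point numbers this finite termination is generally lost.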

IV-D Decentralized ADMM

The decentralized conjugate gradient method proposed above still requires (very little) central coordination through the global sums in Step 2) and Step 4) of Algorithm 3. As an alternative, we consider a decentralized variant of ADMM for solving (11) without these centralized steps.

We apply decentralized ADMM in so-called consensus form to (18) [1, 2]. To this end, we introduce copies $\lambda_{1},\dots,\lambda_{N}$ of the variable $\lambda$ and write (18) as

[TABLE]

with $f_{i}(\lambda_{i}):=\lambda_{i}^{\top}S_{i}\lambda_{i}-s_{i}^{\top}\lambda_{i}$. The ADMM iteration rules can be derived from the method of multipliers combined with coordinate descent [2]. Decentralized ADMM is summarized in Algorithm 4. Observe that (25) is an entirely local step, (26) is a simple averaging step based on neighborhood communication, and (27) is again a local step. Furthermore, (25) requires solving a linear system whose right-hand side changes while its matrix does not, which means that $(\tilde{S}_{i}+\rho I)$ has to be factorized only once and the factorization can be reused in all ADMM iterations.
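A minimal sketch of consensus ADMM for a quadratic of this type is given below. It follows the standard global-consensus form described in [1] and is not a transcription of Algorithm 4: it assumes local objectives $\frac{1}{2}\lambda_{i}^{\top}\tilde{S}_{i}\lambda_{i}-\tilde{s}_{i}^{\top}\lambda_{i}$, averages over all subproblems instead of over neighbors only, and uses an explicit inverse in place of a factorization. All names and the toy data are illustrative.

```python
def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(M)
    A = [[float(v) for v in row] + [float(r == c) for c in range(n)]
         for r, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

def consensus_admm(S_loc, s_loc, rho, iters):
    N, n = len(S_loc), len(s_loc[0])
    # (S_i + rho I)^{-1} is formed once and reused in every iteration.
    K = [invert([[S_i[r][c] + (rho if r == c else 0.0) for c in range(n)]
                 for r in range(n)]) for S_i in S_loc]
    z = [0.0] * n                        # consensus variable
    y = [[0.0] * n for _ in range(N)]    # local multipliers
    for _ in range(iters):
        # Local step: solve (S_i + rho I) x_i = s_i - y_i + rho z.
        x = [matvec(K[i], [s_loc[i][j] - y[i][j] + rho * z[j] for j in range(n)])
             for i in range(N)]
        # Averaging step.
        z = [sum(x[i][j] + y[i][j] / rho for i in range(N)) / N for j in range(n)]
        # Local multiplier update.
        y = [[y[i][j] + rho * (x[i][j] - z[j]) for j in range(n)] for i in range(N)]
    return z

# Toy data: each S_i is only positive semidefinite, their sum is positive definite.
S_loc = [
    [[2.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 3.0]],
    [[0.0, 0.0], [0.0, 4.0]],
]
s_loc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z = consensus_admm(S_loc, s_loc, rho=1.0, iters=500)
```

At a fixed point all copies agree with `z` and the multipliers sum to zero, so `z` solves the summed linear system; for the toy data this solution is $(0.6,\,0.2)$.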

IV-E Comparison of CG and ADMM

The convergence properties of CG and ADMM are summarized in Table I. In theory, CG yields the exact solution in at most $n_{c}$ steps [25, Thm 5.1]. However, in practice the convergence is typically slower, as conjugate gradient is sensitive to errors caused by finite precision arithmetic. Practically one observes superlinear convergence [27]. The recent paper [28] shows sublinear convergence of ADMM for convex objectives $f_{i}$. (Footnote 8: For strongly convex $f_{i}$, linear convergence of ADMM can be shown [29, 4, 28]. In the present paper the $f_{i}$ of (24) are only convex but not strictly convex.) In case of (2), the $f_{i}$s are only convex, hence at least sublinear convergence can be expected, which is in line with our later numerical observations. Thus conjugate gradient is expected to outperform ADMM. An advantage of CG compared to ADMM is that no tuning of the step size is needed, as this is done “automatically” in Step 2) and Step 4) of CG.

As discussed in the previous section, satisfying (14) preserves the convergence properties in bi-level distributed ALADIN. Note that criterion (14) can be evaluated locally by computing $e_{j}^{\top}r^{k}_{\lambda}$ for each $j\in\mathcal{C}$ and calculating one additional global sum. However, in implementations it turns out that a fixed number of iterations for the coordination step combined with warm starting often suffices to satisfy (14).

Remark 2 (Related works on optimization over networks).

Results related to the developments above can be found in the context of distributed optimization over networks, see [31, 4] for recent overviews. The problems considered therein are in general more difficult: frequently, communication delays, a time-varying network topology and asynchronous operation are considered. Prominent algorithms tailored to distributed optimization over networks are, for example, EXTRA [32], NEXT and also the widely used decentralized variant of ADMM [33]. Linear systems of equations are considered in [34, 35]; gradient and subgradient-based algorithms can be found in [36, 37]. Indeed, most of the algorithms cited above can in principle be used to solve (18) in decentralized fashion. A potential pitfall is that the convergence rate of these algorithms is at most linear, and in many cases merely sublinear.

IV-F Communication analysis

We now analyze the forward communication needs of all ALADIN variants for 2-assigned problems. Forward means that, for the sake of simplicity, we consider the communication in Step 2) of the different ALADIN variants, where local sensitivities are communicated to the coordination QP. The backward communication in Step 3) is negligible compared to the forward one. Our analysis evaluates communication by counting the number of floating point numbers.

Moreover, we distinguish two different kinds of communication: The first one is global communication, i.e. the information sent to any central (coordinating) entity. The second kind is local communication between neighbors. We assess the local preparation steps, which are done only once per outer ALADIN iteration in a preprocessing phase between neighboring subproblems. (Footnote 10: Note that we analyze the communication under symmetric conditions; i.e. both regions assigned to a consensus constraint send and receive the values corresponding to the respective consensus constraint. In general, it would suffice to choose one of these two participating regions to take care of the computations. However, this would render the algorithm somehow asymmetric.)

The forward communication for solving the coordination problem (11) of bi-level distributed ALADIN once is shown in Table II. In its full variant, ALADIN communicates the first and second-order sensitivities of the objective and the first-order sensitivity of the constraints to the coordinator. Let the constraints $h_{i}$ (1b) consist of $n_{g_{i}}$ equalities (handled as per Footnote 4) and $n_{h_{i}}-n_{g_{i}}$ inequalities. Neglecting sparsity and counting the number of all entries of the sensitivity matrices/vectors yields the following lower bound $\sum_{i=1}^{N}\frac{(n_{x_{i}}+n_{g_{i}})(n_{x_{i}}+n_{g_{i}}+1)}{2}.$ Note that we do not count the communication of the $A_{i}$s here, as they have to be communicated only once and do not change during iterations. In case of active inequality constraints, $n_{g_{i}}$ is enlarged by the number of active inequality constraints, which is bounded by $n_{h_{i}}-n_{g_{i}}$. Hence, the above is a lower bound on the per-step communication, which may vary during the ALADIN outer iterations. For a detailed application-specific communication analysis for the standard ALADIN see [23].

In the condensed and sparsity-exploiting variant of ALADIN, i.e. Algorithm 2 without decentralization of (11), the global forward communication is $n_{c}(n_{c}+1)$, where $n_{c}$ is the number of coupling constraints. The number of coupling constraints is typically much smaller than the total number of decision variables, which reduces the necessary communication effectively. Note that the $2$ in the denominator disappears due to 2-assignment, as each row of $\tilde{S}$ is composed of the rows of exactly two $S_{i}$.

The bi-level distributed ALADIN ADMM variant (ALADIN ADMM) relies on purely local communication; i.e. in each iteration, the respective $\lambda_{i}$ ’s between two neighboring regions are exchanged. Hence, in ALADIN ADMM one communicates $2n_{c}\cdot n^{\mathrm{AD}}$ floats locally, where $n^{\mathrm{AD}}$ is the number of inner ADMM iterations.

Similarly, in the bi-level distributed ALADIN with conjugate gradient (ALADIN CG) one communicates $2n_{c}\cdot n^{\mathrm{CG}}$ floats locally and additionally $2\cdot n_{c}^{2}$ in the local preparation phase (the rows of the Schur-complements $e_{j}^{\top}S_{i}$ ). Finally, the global communication for computing $\alpha$ and $\beta$ is $2N\cdot n^{\mathrm{CG}}$ .
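The counting formulas of this subsection can be collected in a short sketch. The function names and the input values are illustrative assumptions; in particular, the local dimensions `n_x` and `n_g` below are hypothetical, and the printed numbers are not meant to reproduce Table II or Table IV.

```python
def full_aladin_lower_bound(n_x, n_g):
    """Lower bound on floats sent to the coordinator by full ALADIN."""
    return sum((nx + ng) * (nx + ng + 1) // 2 for nx, ng in zip(n_x, n_g))

def condensed_global(n_c):
    """Global forward communication of condensed ALADIN (2-assigned case)."""
    return n_c * (n_c + 1)

def admm_local(n_c, n_ad):
    """Local floats exchanged by ALADIN ADMM over n_ad inner iterations."""
    return 2 * n_c * n_ad

def cg_counts(n_c, n_cg, n_sub):
    """Local, preparation and global floats of ALADIN CG."""
    return {
        "local": 2 * n_c * n_cg,
        "preparation": 2 * n_c ** 2,
        "global": 2 * n_sub * n_cg,   # two global sums per inner iteration
    }

# Hypothetical local dimensions for two subproblems.
bound = full_aladin_lower_bound(n_x=[20, 30], n_g=[10, 15])

# Illustrative sizes: n_c = 32 coupling constraints, N = 4 subproblems.
condensed = condensed_global(32)
admm = admm_local(32, n_ad=400)
cg = cg_counts(32, n_cg=80, n_sub=4)

print(bound, condensed, admm, cg)
```

For these inputs the sketch gives 1500 floats for the full-ALADIN bound, 1056 for the condensed variant, 25600 local floats for ADMM, and 5120 local, 2048 preparation and 640 global floats for CG.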

V Numerical Case Studies

V-A AC Optimal Power Flow

Non-convex AC optimal power flow problems are of crucial interest in control of power systems. Specifically, we investigate the IEEE 30-bus system shown in Figure 1 with data from [38]. For details on how to formulate OPF problems in the form of (1) see [3, 39, 23]. Here we use the problem formulation and partitioning $\mathcal{P}$ from [23] with ALADIN parameters $\rho=10^{6}$, $\mu=10^{7}$ and the step size $\rho^{\text{AD}}=2\cdot 10^{-2}$ for the lower-level ADMM. In all cases we use warm-starting for CG and ADMM to accelerate convergence.

The 30-bus example has two physical interconnections between subproblems 1 and 2, shown in Figure 1. This leads to eight consensus constraints jointly assigned to subproblems 1 and 2 [23]. Figure 2 shows the resulting sparsity patterns of the corresponding Schur-complements $\tilde{S}_{1}\in\mathbb{R}^{32\times 32}$ and $\tilde{S}_{2}\in\mathbb{R}^{32\times 32}$. One can observe an overlap in the corresponding rows/columns of $\tilde{S}_{1}$ and $\tilde{S}_{2}$, as predicted by Lemma 2. The respective rows/columns of the remaining Schur-complements $\tilde{S}_{3}$ and $\tilde{S}_{4}$ are zero.

Figure 3(a) shows the behavior of standard ALADIN (exactly solved coordination QP) and of ALADIN CG. Figure 3(b) depicts the results for an inexactly solved coordination QP with different fixed numbers of inner iterations for ALADIN CG and ALADIN ADMM. Observe that there is almost no difference in the convergence rate of standard ALADIN compared with ALADIN CG with 80 inner iterations.

In contrast, different numbers of inner iterations influence the total convergence behavior of ALADIN ADMM, cf. Figure 3(b). Indeed the convergence speed varies greatly with $n^{\text{AD}}\in\{80,100,200,400,1000\}$; also the achievable accuracy of ALADIN ADMM seems to be limited by the number of inner ADMM iterations. Whereas for ALADIN CG a fixed number of inner iterations yields good performance, the number of inner iterations necessary for ALADIN ADMM depends on the desired solution accuracy and it affects the overall convergence speed (i.e. the number of outer ALADIN iterations).

This behavior is underpinned by the total number of inner iterations (# of inner iterations times # of outer iterations) shown in Table III.

Figure 4 depicts the convergence behavior of distributed conjugate gradient and ADMM for two different instances of (11). The left-hand side shows the results for ALADIN CG and ALADIN ADMM at one of the first iterations of ALADIN where $\tilde{S}$ is quite ill-conditioned. The right-hand side depicts the convergence of both algorithms when ALADIN is almost converged and therefore the condition number of $\tilde{S}$ is smaller. Observe the sublinear convergence rate of ADMM versus the practically superlinear convergence rate of conjugate gradient (cf. Table I) in both cases. Furthermore, note that the theoretical finite convergence of CG (here this would be 32 iterations) is not realized due to the conditioning of $\tilde{S}$ . However, the practical convergence rate of centralized CG appears to be superior to most other available decentralized methods [31].

V-B Distributed control of mobile robots

As a second example we consider an Optimal Control Problem (OCP) where two mobile robots should reach their final position while keeping a minimum distance to each other, cf. [7]. The centralized OCP reads

[TABLE]

where $z_{i}=(x_{i}\;y_{i}\;\theta_{i})^{\top}$ is the state of each robot $i\in\mathcal{R}$, $x_{i}$ and $y_{i}$ describe the robot's position in the $x$-$y$-plane, and $\theta_{i}$ is the yaw angle with respect to the $x$-axis (Fig. 5(a)). The stage cost (28a) is the sum of quadratic tracking costs with respect to the desired end position $z^{e}_{i}\in\mathbb{R}^{3}$ for all robots. Constraints (28b) are the continuous-time dynamics

[TABLE]

The inputs $u_{i}=(v_{i}\;\omega_{i})^{\top}$ are the velocity $v_{i}$ and the turning rate $\omega_{i}$. The terminal constraint (28c) and the stage cost (28a) are chosen having a distributed NMPC setting in mind [40].

In order to convert (28) into a partially separable NLP (1), we introduce auxiliary variables duplicating the predicted ($x$-$y$) trajectories of each robot pair and enforce consensus by the constraint (1c). Due to space limitations we do not elaborate on this in detail. We employ a direct solution approach and discretize (28) via the backward Euler method; the sampling period is $0.1\,$seconds and the horizon is $T=10\,$seconds. We consider $|\mathcal{R}|=2$ robots which should keep a distance of $d=5\,$m, with $Q=0.1\cdot\operatorname{diag}(10,\,10,\,1)$ and $R=\operatorname{diag}(1,\,1)$. We use $\rho=10^{2}$, $\mu=10^{6}$ and $\rho^{\text{AD}}=10^{-1}$ as tuning parameters for ALADIN.
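The discretization step can be illustrated with a short sketch. It assumes the standard unicycle kinematics $\dot{x}=v\cos\theta$, $\dot{y}=v\sin\theta$, $\dot{\theta}=\omega$ as a stand-in for the dynamics (28b), and an input that is held constant over each sampling interval; both are assumptions made for illustration, as are all function names.

```python
import math

def unicycle(z, u):
    """Assumed continuous-time kinematics of one robot."""
    x, y, theta = z
    v, omega = u
    return (v * math.cos(theta), v * math.sin(theta), omega)

def backward_euler_residual(z_k, z_next, u_k, h):
    """Defect of the implicit Euler step z_next = z_k + h * f(z_next, u_k).

    In a direct approach this defect is imposed as an equality constraint.
    """
    f = unicycle(z_next, u_k)
    return tuple(zn - zk - h * fi for zn, zk, fi in zip(z_next, z_k, f))

def backward_euler_step(z_k, u_k, h):
    """Closed-form implicit step: theta does not depend on x and y."""
    x, y, theta = z_k
    v, omega = u_k
    theta_next = theta + h * omega
    return (x + h * v * math.cos(theta_next),
            y + h * v * math.sin(theta_next),
            theta_next)

h, T = 0.1, 10.0                 # sampling period and horizon from the text
steps = round(T / h)             # number of discretization intervals

# Simulate a straight drive with unit velocity and record the largest defect.
z = (0.0, 0.0, 0.0)
worst = 0.0
for _ in range(steps):
    z_next = backward_euler_step(z, (1.0, 0.0), h)
    res = backward_euler_residual(z, z_next, (1.0, 0.0), h)
    worst = max(worst, max(abs(c) for c in res))
    z = z_next
```

With the stated sampling period and horizon this gives 100 intervals per robot, and the defect constraints couple consecutive states only, which keeps the resulting NLP sparse.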

Figure 5(b) shows the optimal open-loop trajectories for (28). One can observe that the goal of collision avoidance is satisfied while the robots move to their target positions. Interestingly, Problem (28) seems to be numerically quite different from the OPF problem. Here, $n^{\text{CG}}=30$ inner iterations for CG suffice for local convergence, although the problem size ($n_{x}=1\,200$) is much larger. At the same time, at least $n^{\text{AD}}=2\,400$ inner iterations were needed for ADMM to achieve an accuracy of $\epsilon=10^{-4}$.

V-C Numerical communication analysis

Finally, we evaluate the forward communication introduced in Section IV-F numerically. Table IV summarizes the forward communication for both examples. In addition, the last two rows in both parts of Table IV depict the total communication (per-step communication times # of outer ALADIN iterations) for a termination tolerance of $\epsilon=10^{-4}$.

As expected, ALADIN with condensing (Algorithm 2) needs much less communication than the standard ALADIN variant (Algorithm 1). Solving (11) with the decentralized variants of conjugate gradient or ADMM increases the total communication compared to the condensed ALADIN variant. Furthermore, the total communication of ALADIN CG is smaller than that of standard ALADIN. The comparably large local communication burden of ALADIN ADMM stems from the increased number of inner iterations, cf. Figure 3(b) and Table III.

Finally, it is worth noting that by investing the very limited global coordination and communication effort required by ALADIN CG, one can achieve much better performance than with entirely decentralized coordination, cf. the right-hand side columns of Table IV.

VI Summary & Outlook

This paper has proposed a framework for designing decentralized algorithms for non-convex constrained optimization problems via bi-level distribution of the ALADIN algorithm. The core idea is to add a second (inner) layer of distributed/decentralized computation to ALADIN, whereby the coordination QP is first condensed (as a post-processing step of solving the local non-convex subproblems) and then solved in decentralized fashion. We have presented sufficient conditions on the numerical solution accuracy necessary to preserve local quadratic convergence properties of ALADIN. Moreover, we have shown how this bound can be enforced by means of decentralized inner algorithms. Specifically, we have proposed a decentralized variant of the conjugate gradient method, which shows promising performance. We also compared it to using ADMM at the inner level. Simulation studies from power systems and robotics underpin the efficacy of the proposed scheme. These studies also indicate that decentralized conjugate gradient outperforms ADMM in terms of convergence speed and in terms of total communication effort.

We expect that the proposed bi-level distribution framework opens new avenues for future research, e.g., on decentralizing globalization strategies or on tailored decentralized algorithms for distributed non-linear model predictive control.

Appendix A

Lemma 4 ( $S_{i}\succeq 0,S\succ 0$ and $S_{i}^{\top}=S_{i}$ ).

The matrices $S_{i}$ are positive semidefinite, $S=\sum_{i=1}^{N}S_{i}$ is positive definite and $S_{i}=S_{i}^{\top}$ .

Proof.

By Assumption 1 we have that all $\bar{H}_{i}$s are positive definite, i.e. $x^{\top}\bar{H}_{i}x>0$ for all nonzero $x\in\mathbb{R}^{n_{i}}$. With $x:=\bar{H}_{i}^{-1}y$ we have $x^{\top}\bar{H}_{i}x=y^{\top}(\bar{H}_{i}^{-1})^{\top}\bar{H}_{i}\bar{H}_{i}^{-1}y=y^{\top}\bar{H}_{i}^{-1}y>0$. Furthermore let $y:=\bar{A}_{i}z$. Then $z^{\top}\bar{A}_{i}^{\top}\bar{H}_{i}^{-1}\bar{A}_{i}z=z^{\top}S_{i}z\geq 0$ for all $z\in\mathbb{R}^{n_{c}}$, where equality can occur as $\bar{A}_{i}$ may be rank deficient.

$S\succ 0:$ We know from Assumption 1 that $x^{\top}\bar{H}x>0$ . By defining $y:=Ax$ we have $x^{\top}A^{\top}\bar{H}Ax=x^{\top}Sx>0$ as $A$ has full rank by LICQ.

As $H_{i}=H_{i}^{\top}$ , we have $\bar{H}_{i}^{\top}=(Z_{i}^{\top}H_{i}Z_{i})^{\top}=(H_{i}Z_{i})^{\top}Z_{i}=Z_{i}^{\top}H_{i}^{\top}Z_{i}=\bar{H}_{i}$ and by the same argument $S_{i}^{\top}=(\bar{A}_{i}\bar{H}_{i}^{-1}\bar{A}_{i}^{\top})^{\top}=S_{i}$ . To obtain $\tilde{S}_{i}$ we add elements to the main diagonal only yielding $\tilde{S}_{i}=\tilde{S}_{i}^{\top}$ . ∎

Bibliography

[1] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato and Jonathan Eckstein “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers” In Found. Trends Mach. Learn. 3.1 Hanover, MA, USA: Now Publishers Inc., 2011, pp. 1–122
[2] Dimitri P Bertsekas and John N Tsitsiklis “Parallel and Distributed Computation: Numerical Methods” Prentice Hall Englewood Cliffs, NJ, 1989
[3] D. K. Molzahn, F. Dörfler, H. Sandberg, S. H. Low, S. Chakrabarti, R. Baldick and J. Lavaei “A Survey of Distributed Optimization and Control Algorithms for Electric Power Systems” In IEEE Trans Smart Grid 8.6, 2017, pp. 2941–2962 DOI: 10.1109/TSG.2017.2720471
[4] A. Nedić, A. Olshevsky and S. Wei “Decentralized Consensus Optimization and Resource Allocation” In Large-Scale and Distributed Optimization Springer, 2018, pp. 247–287
[5] Daniel Gabay and Bertrand Mercier “A dual algorithm for the solution of nonlinear variational problems via finite element approximation” In Computers & Mathematics with Applications 2.1, 1976, pp. 17–40 DOI: 10.1016/0898-1221(76)90003-1
[6] B. T. Stewart, S. J. Wright and J. B. Rawlings “Cooperative distributed model predictive control for nonlinear systems” In Journal of Process Control 21.5 Elsevier, 2011, pp. 698–704
[7] M. W. Mehrez, T. Sprodowski, K. Worthmann, G. Mann, R. G. Gosine, J. K. Sagawa and J. Pannek “Occupancy grid based distributed MPC for mobile robots” In Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on, 2017, pp. 4842–4847 IEEE
[8] T. Erseghe “Distributed Optimal Power Flow Using ADMM” In IEEE Trans Power Syst 29.5, 2014, pp. 2370–2380 DOI: 10.1109/TPWRS.2014.2306495
