Accelerated Distributed Primal-Dual Dynamics using Adaptive Synchronization

P. A. Bansode; K. C. Kosaraju; S. R. Wagh; R. Pasumarthy; N. M. Singh

arXiv:1905.00837·math.OC·May 3, 2019·IEEE Access


TL;DR

This paper introduces an adaptive primal-dual dynamics with synchronization for distributed optimization, achieving accelerated convergence and robustness in multi-agent systems, demonstrated through applications to least squares and SVM problems.

Contribution

It presents a novel adaptive synchronization law that accelerates convergence of primal-dual dynamics in distributed optimization, with proven stability and robustness properties.

Findings

01. Achieves faster convergence rates compared to non-adaptive methods

02. Proves stability and passivity of the proposed dynamics

03. Demonstrates effectiveness on distributed least squares and SVM problems


Equations

$$\dot{x}=F(x,u),\qquad y=G(x,u),$$

$$\begin{aligned}\min_{x\in\mathbb{R}^{ln}}\quad & f(x)=\sum_{i=1}^{n}f_{i}(x_{i})\\ \text{subject to}\quad & x_{ik}=x_{qk},\ \forall^{l}_{k=1},\ \forall i,q\in\mathcal{N},\\ & g_{j}(x_{ik})\leq 0,\ \forall^{m^{ik}_{g}}_{j=1},\ \forall^{l}_{k=1},\ \forall i\in\mathcal{N},\end{aligned}$$

$$\mu\|x_{1}-x_{2}\|^{2}\leq\langle\nabla f(x_{1})-\nabla f(x_{2}),x_{1}-x_{2}\rangle\leq\kappa\|x_{1}-x_{2}\|^{2}.$$

$$\begin{aligned}&\nabla_{x^{*}_{ik}}f_{i}(x^{*}_{ik})+\sum_{q\in\mathcal{N}_{i}}a_{iq}(\alpha^{*}_{ik}-\alpha^{*}_{qk})+\sum_{j=1}^{m^{ik}_{g}}(\theta^{ik}_{j})^{*}\nabla_{x^{*}_{ik}}g_{j}(x^{*}_{ik})=0,\ \forall^{l}_{k=1},\ \forall i\in\mathcal{N},\\ &g_{j}(x^{*}_{ik})\leq 0,\ (\theta^{ik}_{j})^{*}\geq 0,\ \forall^{l}_{k=1},\ \forall^{m^{ik}_{g}}_{j=1},\ \forall i\in\mathcal{N},\\ &(\theta^{ik}_{j})^{*}g_{j}(x^{*}_{ik})=0,\ \forall^{l}_{k=1},\ \forall^{m^{ik}_{g}}_{j=1},\ \forall i\in\mathcal{N},\\ &x^{*}_{ik}=x^{*}_{qk},\ \forall^{l}_{k=1},\ \forall(i,q)\in\mathcal{N}.\end{aligned}$$

$$\bar{\mathcal{L}}(x,\alpha,\theta)=\mathcal{L}(x,\alpha,\theta)+x^{T}(L\otimes I_{l})x.$$

$$\begin{aligned}&\nabla_{x^{*}_{ik}}f_{i}(x^{*}_{ik})+\sum_{q\in\mathcal{N}_{i}}a_{iq}(x^{*}_{ik}-x^{*}_{qk})+\sum_{q\in\mathcal{N}_{i}}a_{iq}(\alpha^{*}_{ik}-\alpha^{*}_{qk})+\sum_{j=1}^{m^{ik}_{g}}(\theta^{ik}_{j})^{*}\nabla_{x^{*}_{ik}}g_{j}(x^{*}_{ik})=0,\ \forall^{l}_{k=1},\ \forall i\in\mathcal{N},\\ &g_{j}(x^{*}_{ik})\leq 0,\ (\theta^{ik}_{j})^{*}\geq 0,\ \forall^{l}_{k=1},\ \forall^{m^{ik}_{g}}_{j=1},\ \forall i\in\mathcal{N},\\ &(\theta^{ik}_{j})^{*}g_{j}(x^{*}_{ik})=0,\ \forall^{l}_{k=1},\ \forall^{m^{ik}_{g}}_{j=1},\ \forall i\in\mathcal{N},\\ &x^{*}_{ik}=x^{*}_{qk},\ \forall^{l}_{k=1},\ \forall(i,q)\in\mathcal{N}.\end{aligned}$$

$$\begin{aligned}\dot{x}_{ik}&=-\nabla_{x_{ik}}\bar{\mathcal{L}}(x,\alpha,\theta),\\ \dot{\alpha}_{ik}&=\nabla_{\alpha_{ik}}\bar{\mathcal{L}}(x,\alpha,\theta),\\ \dot{\theta}^{ik}_{j}&=[\nabla_{\theta^{ik}_{j}}\bar{\mathcal{L}}(x,\alpha,\theta)]^{+}_{\theta^{ik}_{j}},\quad\forall^{l}_{k=1};\ \forall^{m^{ik}_{g}}_{j=1};\ \forall i\in\mathcal{N}.\end{aligned}$$

$$\dot{x}_{ik}=-\nabla_{x_{ik}}\bar{\mathcal{L}}(x,\alpha,\theta)$$

$$\begin{aligned}\dot{x}_{ik}&=-\nabla_{x_{ik}}f_{i}(x_{ik})-\sum_{q\in\mathcal{N}_{i}}a_{iq}(x_{ik}-x_{qk})\\ &\quad-\sum_{q\in\mathcal{N}_{i}}a_{iq}(\alpha_{ik}-\alpha_{qk})-\sum_{j=1}^{m^{ik}_{g}}\theta^{ik}_{j}\nabla_{x_{ik}}g_{j}(x_{ik}).\end{aligned}$$

$$u_{x_{ik}}=-\sum_{q\in\mathcal{N}_{i}}a_{iq}(x_{ik}-x_{qk}),\ \forall q\in\mathcal{N}_{i},$$

$$a_{iq}=a_{qi}=\begin{cases}\text{a positive scalar},&\text{for }(q,i)\in\mathcal{E},\\ 0,&\text{for }(q,i)\notin\mathcal{E}.\end{cases}$$

$$u_{x_{i}}=-\sum_{q\in\mathcal{N}_{i}}a_{iq}(x_{i}-x_{q}),\ \forall q\in\mathcal{N}_{i},$$

$$u_{x}=-(L\otimes I_{l})x$$

$$\dot{a}_{iq}=d_{iq}(e_{iq}^{T}e_{iq}+\dot{e}_{iq}^{T}\dot{e}_{iq}),$$

$H_{1}$

$H_{2}$

$H_{3}$

$$a^{*}_{iq}=\begin{cases}a^{*}&\text{if }i\text{ and }q\text{ are neighbors in }\mathcal{G},\\ 0&\text{if }i\text{ and }q\text{ are not neighbors in }\mathcal{G},\end{cases}$$

$$W=\frac{1}{2}\sum_{i=1}^{p}\sum_{q=1}^{p}\frac{1}{d_{iq}}\tilde{a}_{iq}^{2}.$$

$$\dot{W}=\sum_{i=1}^{p}\sum_{q=1}^{p}\tilde{a}_{iq}(e_{iq}^{T}e_{iq}+\dot{e}_{iq}^{T}\dot{e}_{iq}).$$

$$\begin{aligned}\dot{W}&=\dot{x}^{T}(L\otimes I_{l})\dot{x}-a^{*}\dot{x}^{T}(L\otimes I_{l})\dot{x}+x^{T}(L\otimes I_{l})x-a^{*}x^{T}(L\otimes I_{l})x,\\ &=(1-a^{*})\dot{x}^{T}(L\otimes I_{l})\dot{x}+(1-a^{*})x^{T}(L\otimes I_{l})x.\end{aligned}$$

$$V_{H_{1}}(x)=\frac{1}{2}\dot{x}^{T}\dot{x}+W.$$

$$\begin{aligned}\dot{V}_{H_{1}}(x)&=-\dot{x}^{T}\mathbb{H}\dot{x}-\dot{x}^{T}(L\otimes I_{l})\dot{x}+(1-a^{*})\dot{x}^{T}(L\otimes I_{l})\dot{x}+(1-a^{*})x^{T}(L\otimes I_{l})x+\dot{x}^{T}\dot{u}_{H_{1}},\\ &=-\dot{x}^{T}\mathbb{H}\dot{x}-a^{*}\dot{x}^{T}(L\otimes I_{l})\dot{x}+(1-a^{*})x^{T}(L\otimes I_{l})x+\dot{x}^{T}\dot{u}_{H_{1}},\\ &\leq-\big(\lambda_{min}(\mathbb{H})+a^{*}\lambda_{2}(L\otimes I_{l})\big)\|\dot{y}_{H_{1}}\|^{2}+(1-a^{*})\lambda_{2}(L\otimes I_{l})\|y_{H_{1}}\|^{2}+\dot{y}_{H_{1}}^{T}\dot{u}_{H_{1}}.\end{aligned}$$

$$V_{H_{2}}(\alpha)=\frac{1}{2}\dot{\alpha}^{T}\dot{\alpha}.$$

$\dot{V}_{H_{2}}(\alpha)$

$$V_{H_{2}}(\alpha(\tau))-V_{H_{2}}(\alpha(0))\leq\int_{0}^{\tau}\dot{y}_{H_{2}}^{T}\dot{u}_{H_{2}}\,dt.$$


Taxonomy

Topics: Distributed Control Multi-Agent Systems · Neural Networks Stability and Synchronization · Nonlinear Dynamics and Pattern Formation

Full text

Accelerated Distributed Primal-Dual Dynamics using Adaptive Synchronization

P. A. Bansode, K. C. Kosaraju, S. R. Wagh, R. Pasumarthy, N. M. Singh

P. A. Bansode is with the Department of Instrumentation Engineering, Ramrao Adik Institute of Technology, Mumbai, 400706 India. K. C. Kosaraju is with the Faculty of Science and Engineering, University of Groningen, AG Groningen, 9747 The Netherlands. S. R. Wagh is with the Department of Electrical Engineering, Veermata Jijabai Technological Institute, Mumbai, 400019 India. R. Pasumarthy is with the Department of Electrical Engineering, Indian Institute of Technology Madras, Madras, 600036 India. N. M. Singh is with the Department of Electrical Engineering, Veermata Jijabai Technological Institute, Mumbai, 400019 India.

Abstract

This paper proposes an adaptive primal-dual dynamics for distributed optimization in multi-agent systems. The proposed dynamics incorporates an adaptive synchronization law that reinforces the interconnection strength between the coupled agents. By strengthening the synchronization between the primal variables of the coupled agents, the given law accelerates the convergence of the proposed dynamics to the saddle-point solution. The resulting dynamics is represented as a feedback-interconnected networked system that proves to be passive. The passivity properties of the proposed dynamics are exploited, along with LaSalle’s invariance principle for hybrid systems, to establish asymptotic convergence and stability of the saddle-point solution. Further, the primal dynamics is analyzed for its rate of convergence and stronger convergence bounds are established; it is proved that the primal dynamics achieves accelerated convergence under the adaptive synchronization. The robustness of the proposed dynamics is quantified using $L_{2}$-gain analysis, and the correlation between the rate of convergence and the robustness of the proposed dynamics is presented. The effectiveness of the proposed dynamics is demonstrated by applying it to solve distributed least squares and distributed support vector machines problems.

Keywords Distributed optimization $\cdot$ Networked control system $\cdot$ Primal-dual dynamics $\cdot$ Adaptive synchronization

1 Introduction

Distributed optimization techniques have been the subject of substantial research for many years. Their applications include wireless sensor networks [1, 2, 3], power networks [4], and large-scale support vector machines [5, 6]. An exhaustive survey of these techniques can be found in [7]. Distributed optimization techniques are mainly categorized as either decomposition-based (see [8] and references therein) or consensus-based. Consensus-based distributed optimization techniques, which are the prime subject of this paper, have been explored extensively in recent years [9, 5, 4, 6, 10, 11, 12].

Many algorithms have been proposed to solve consensus-based distributed optimization problems arising in networked systems, such as the seminal work on distributed sub-gradient methods [13], distributed primal-dual dynamical algorithms [4], and distributed gradient descent algorithms [14, 10]. Of these, the algorithms based on distributed primal-dual dynamics deserve special attention because of their rich systems and control theoretic properties [15, 16, 17, 18, 19] and their ability to obtain both primal and dual optimal solutions simultaneously. The seminal work on the primal-dual dynamics, or saddle-point dynamics, dates back to the late 1950s [20, 21], while their application to solving optimization problems over a network first appeared in [15], with a focus on the asymptotic convergence and stability of these algorithms. This framework was later extended to distributed optimization over a network of communicating nodes in [22, 4]. The primal-dual dynamics in [22] combine the decomposition and the consensus-based methods to propose proportional-integral distributed optimization for equality constrained optimization problems and achieve a globally asymptotically stable saddle-point solution. The primal-dual gradient-based algorithm proposed in [4] achieves asymptotic convergence for a consensus-based distributed optimization problem with local inequality constraints and is implemented for load-sharing control in power networks. The notion of asymptotic convergence and stability of the (distributed) primal-dual dynamics for distributed optimization is thus well established.

From the perspective of online optimization, distributed optimization algorithms must be certified on the basis of their rate of convergence as well as their stability. The asymptotic convergence of the primal-dual dynamics implies only that the trajectories converge to the saddle-point solution as $t\rightarrow\infty$, which is not a sufficient guarantee when the algorithm solves the distributed optimization problem online. Lately, algorithms such as distributed gradient (sub-gradient) methods have been widely re-studied with the objective of improving the rate of convergence, see [14, 23, 24, 10, 25]. However, the distributed primal-dual dynamics have not yet been explored with the same objective, which could limit their application to large-scale distributed optimization problems. Existing methods for improving the rate of convergence of the primal-dual dynamics rely upon increasing the convexity of the objective function by using quadratic penalty terms (augmented Lagrangian techniques) [18], but using them to solve distributed optimization problems destroys the distributed structure of the objective function. Thus, increasing convexity by using quadratic penalties may not be a suitable way of improving the rate of convergence of the distributed primal-dual dynamics. An alternative route is to exploit the graph-Laplacian properties of the underlying network and use adaptive coupling gains between the nodes to improve the convergence results. Addressing this issue, the present work primarily contributes to the accelerated convergence of the distributed primal-dual dynamics.

1.1 Relevant literature and contributions

The work proposed in this paper is in the same spirit as the recent articles [4, 19]. In [4], the framework of primal-dual dynamics for network utility maximization [26], which uses a Krasovskii-type Lyapunov function to derive asymptotic convergence, is extended to distributed optimization with application to load sharing control in power systems. Our contribution differs significantly from [4] in that the proposed dynamics is first analyzed using passivity tools of dynamical systems, which then lead to its asymptotic stability when combined with LaSalle’s invariance principle for hybrid systems [27]. The advantage of passivity-based stability analysis is that the proposed dynamics can be realized as a negative feedback interconnection of the primal and the dual subsystems. This also helps in understanding the interaction between the primal and the dual dynamical subsystems through their inputs and outputs. Thus each subsystem also enjoys the $L_{2}$ stability properties of feedback connected dynamical systems. This feature later aids the robustness analysis of the proposed dynamics using $L_{2}$-gains. The fundamental results on passivity-based stability analysis of the primal-dual dynamics are established in [19]. Our work, in a way, extends these results to consensus-based distributed optimization problems. However, the primal dynamical subsystem derived in this paper does not use the Brayton-Moser framework [28] to arrive at the optimal solution.

The central theme of the paper, the adaptively coupled primal-dual dynamics, is derived by integrating the consensus protocol in the distributed primal-dual dynamics with the adaptive coupling laws motivated by the results in [29]. In [29], the adaptive synchronization technique is proved to guarantee synchronization between the trajectories of diffusively coupled agents of a multi-agent system. This technique is essentially based upon modifying the coupling weights of the diffusively coupled agents in accordance with the synchronization error between them. Larger synchronization errors result in larger coupling weights, and vice versa. In this paper, it is shown that the adaptation in the coupling weights strengthens the synchronization of the primal variables of the coupled agents. With this, the proposed work establishes results on the accelerated convergence of the proposed dynamics to the saddle-point solution. While the adaptive synchronization is proved to accelerate the convergence, it is also shown to affect the robustness of the proposed dynamics. By introducing exogenous inputs in the interconnected network dynamics of the primal-dual subsystems, the $L_{2}$-gain of the proposed dynamics is analyzed and the worst-case $L_{2}$-gain is quantified in correlation with the rate of convergence. Although it is well known that an interconnected network of passive dynamical systems is inherently robust to exogenous inputs [30], our results quantify the $L_{2}$-gain margins and establish a relation between these margins and the rate of convergence.

To summarize, the proposed work covers the following key points:

1. The proposed algorithm, designated hereafter as the adaptively synchronized distributed primal-dual dynamics (ADPDD), ensures synchronization of the network-wide primal variables to a common trajectory which is then driven to the optimal solution.

2. The ADPDD is posed as a negative feedback interconnection of the primal dynamical subsystem and the dual dynamical subsystems. It is proved that these subsystems remain individually passive, which subsequently ensures the passivity and the asymptotic stability of the proposed dynamics.

3. The convergence rate of the ADPDD is established and it is proved that the ADPDD converges faster than the distributed primal-dual dynamics (DPDD).

4. By allowing time-scale separation between the adaptive coupling laws and the primal-dual dynamics, the adaptively coupled distributed primal dynamics is proved to converge to the optimal solution faster than the conventional distributed primal dynamics.

5. The $L_{2}$-gain analysis of the proposed dynamics against exogenous disturbances is presented to show the correlation between the rate of convergence and the robustness of the proposed algorithm.

6. The applicability of the proposed algorithm to distributed least squares and distributed support vector machines problems is discussed.

1.2 Notations and Preliminaries

The set $\mathbb{R}$ (respectively $\mathbb{R}_{\geq 0}$ or $\mathbb{R}_{>0}$ ) is the set of real (respectively non-negative or positive) numbers. $I_{n}$ is the $n\times n$ identity matrix. $\mathbf{0}$ is a zero vector of appropriate dimensions. For a square matrix $A\in\mathbb{R}^{n\times n}$ , $\mathrm{eig}(A)=\{\lambda_{1}(A),\lambda_{2}(A),\ldots,\lambda_{n}(A)\}\in\mathbb{R}$ represents eigenvalues of $A$ in an ascending order. The smallest eigenvalue of $A$ is given by $\lambda_{1}(A)$ and the second smallest eigenvalue is given by $\lambda_{2}(A)$ . If $B\in\mathbb{R}^{m\times n}$ and $C\in\mathbb{R}^{p\times q}$ are real matrices, then $B\otimes C\in\mathbb{R}^{mp\times nq}$ is a block matrix that defines the Kronecker product of $B$ and $C$ .
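The eigenvalue ordering and the Kronecker product notation above can be illustrated numerically. The following is a minimal sketch using NumPy; the matrix `B` is a hypothetical example chosen only for illustration and does not come from the paper.

```python
import numpy as np

# Hypothetical symmetric matrix, used only to illustrate the notation.
B = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
I3 = np.eye(3)                      # identity matrix I_3

# Kronecker product: B is 2x2 and I_3 is 3x3, so B (x) I_3 is (2*3) x (2*3).
K = np.kron(B, I3)

# For symmetric matrices, eigvalsh returns eigenvalues in ascending order,
# so index 0 is lambda_1 (smallest) and index 1 is lambda_2 (second smallest).
eig_B = np.linalg.eigvalsh(B)
eig_K = np.linalg.eigvalsh(K)

print(K.shape, eig_B, eig_K)
```

In this example each eigenvalue of `B` appears three times in `B ⊗ I_3`, which is why quantities such as $\lambda_{2}(L\otimes I_{l})$ can be read off from the spectrum of $L$ itself.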

The interaction topology in a multi-agent system is represented using an undirected graph $\mathcal{G}=(\mathcal{N},\mathcal{E})$ with $\mathcal{N}=\{1,2,\ldots,n\}$ as the set of agents and $\mathcal{E}\subseteq\mathcal{N}\times\mathcal{N}$ as the set of edges. The neighbor set of the $i^{th}$ agent is $\mathcal{N}_{i}=\{q\in\mathcal{N}|(q,i)\in\mathcal{E}\}$, where $i\in\mathcal{N}$. The number of agents $n$ is the cardinality of $\mathcal{N}$. Let $D\in\mathbb{R}^{n\times n}$ be the degree matrix of $\mathcal{G}$ and $A\in\mathbb{R}^{n\times n}$ be the adjacency matrix of $\mathcal{G}$, with elements $a_{iq}=a_{qi}>0,\forall(i,q)\in\mathcal{E}$; then $L=D-A$ is the Laplacian matrix of $\mathcal{G}$. By definition, $L\in\mathbb{R}^{n\times n}$ is a symmetric positive semidefinite matrix that encodes the connectivity of the agents and their interaction topology in $\mathcal{G}$.
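The construction $L=D-A$ can be sketched as follows. The helper `graph_laplacian` and the 4-agent path graph are hypothetical illustrations (agents are 0-indexed in the code), not part of the paper.

```python
import numpy as np

def graph_laplacian(n, edges, weight=1.0):
    """Return (A, D, L) for an undirected graph on agents 0..n-1."""
    A = np.zeros((n, n))
    for i, q in edges:
        A[i, q] = A[q, i] = weight      # a_iq = a_qi > 0 on every edge
    D = np.diag(A.sum(axis=1))          # degree matrix
    return A, D, D - A                  # Laplacian L = D - A

# Hypothetical path graph 0 - 1 - 2 - 3 with unit coupling weights.
A, D, L = graph_laplacian(4, [(0, 1), (1, 2), (2, 3)])
eigs = np.linalg.eigvalsh(L)            # ascending order

print(eigs)
```

For a connected graph the smallest eigenvalue is zero (with the all-ones vector as eigenvector) and the second smallest eigenvalue $\lambda_{2}(L)$ is strictly positive, which is the property the later passivity bounds rely on.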

If $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is continuously differentiable in $x\in\mathbb{R}^{n}$ , then $\nabla_{x}f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ is the gradient of $f$ with respect to $x$ . If $f$ is twice continuously differentiable and strictly convex in $x$ then $\mathbb{H}=\nabla^{2}_{x}f\in\mathbb{R}^{n\times n}_{>0}$ is a symmetric positive definite matrix of second-order partial derivatives of $f$ with respect to $x$ .

Consider the following dynamical system

$$\dot{x}=F(x,u),\qquad y=G(x,u), \tag{1}$$

where state $x\in\mathbb{R}^{n}$ , input $u\in\mathbb{R}^{m}$ , and output $y\in\mathbb{R}^{m}$ , with $F,G$ (of appropriate dimensions) sufficiently smooth and satisfying $F(0)=G(0)=0$ .

Definition 1.1 ([31]).

The system (1) is said to be passive if there exists a positive semidefinite storage function (Lyapunov function) $V:\mathbb{R}^{n}\rightarrow\mathbb{R}$ , continuously differentiable in $x$ such that $\dot{V}\leq u^{T}y$ .
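As a minimal worked illustration of Definition 1.1 (the scalar system here is a hypothetical example, not taken from the paper), consider $\dot{x}=-x+u$, $y=x$ with the storage function $V(x)=\frac{1}{2}x^{2}$. Differentiating along trajectories gives

$$\dot{V}=x\dot{x}=-x^{2}+xu\leq u^{T}y,$$

so this system is passive in the sense of Definition 1.1.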

For scalars $x,y$ , $[x]^{+}_{y}:=x$ if $y>0$ or $x>0$ , and $[x]^{+}_{y}:=0$ otherwise.
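The projection operator $[x]^{+}_{y}$ defined above can be written directly as a small function. This is a minimal sketch; the name `positive_projection` is a hypothetical choice.

```python
def positive_projection(x: float, y: float) -> float:
    """[x]^+_y: return x if y > 0 or x > 0, and 0 otherwise."""
    return x if (y > 0 or x > 0) else 0.0

# y plays the role of a multiplier and x of its candidate rate of change.
print(positive_projection(-1.0, 2.0))   # multiplier positive: rate passes through
print(positive_projection(-1.0, 0.0))   # multiplier at zero, rate negative: blocked
print(positive_projection(3.0, 0.0))    # multiplier at zero, rate positive: allowed
```

The operator blocks a negative rate of change only when the multiplier is already at zero, which is what keeps a non-negative multiplier from leaving $\mathbb{R}_{\geq 0}$ in continuous time.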

The remainder of the paper is mainly divided into two sections. Section 2 discusses the main results of the paper and Section 3 presents examples to validate the proposed work. Section 2 is organized as follows: Subsection 2.1 describes the consensus-based distributed optimization problem. In Subsection 2.2.1 the adaptive synchronization technique is elaborated. Subsection 2.2.2 formulates the adaptive distributed primal-dual dynamical algorithm to solve the distributed optimization problem posed in Subsection 2.1. Subsections 2.3 and 2.4 present the passivity and stability analysis of the proposed dynamics. In Subsection 2.5 the convergence bounds of the proposed algorithm are obtained and the proof of its accelerated convergence is provided. Subsection 2.6 provides the $L_{2}$-gain analysis of the proposed dynamics and establishes a correlation between its robustness and its rate of convergence. Section 3 presents the application of the proposed dynamics to the distributed least squares and the distributed support vector machines problems. Some numerical examples of academic interest are also discussed. Section 4 concludes the paper.

2 Problem Formulation and main results

2.1 Distributed Optimization

Consider the following distributed optimization problem

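$$\begin{aligned}\min_{x\in\mathbb{R}^{ln}}\quad & f(x)=\sum_{i=1}^{n}f_{i}(x_{i})\\ \text{subject to}\quad & x_{ik}=x_{qk},\ \forall^{l}_{k=1},\ \forall i,q\in\mathcal{N},\\ & g_{j}(x_{ik})\leq 0,\ \forall^{m^{ik}_{g}}_{j=1},\ \forall^{l}_{k=1},\ \forall i\in\mathcal{N},\end{aligned} \tag{2}$$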

where $x_{i}=[x_{i1},\ldots,x_{il}]^{T}\in\mathbb{R}^{l}$ and $x=[x^{T}_{1},\ldots,x^{T}_{n}]^{T}\in\mathbb{R}^{ln}$. It is assumed that each function $f_{i}:\mathbb{R}^{l}\rightarrow\mathbb{R}$ is twice differentiable and strongly convex, and that each $g_{j}:\mathbb{R}\rightarrow\mathbb{R}$ is convex. The optimization problem (2) can be decomposed into $n$ subproblems wherein each subproblem minimizes the cost $f_{i}(x_{i})$ subject to the consensus constraint $x_{ik}=x_{qk}$ and the inequality constraints $g_{j}(x_{ik})\leq 0$. The problem (2) cannot be fully decoupled into a set of $n$ subproblems because of the consensus constraints, but it can be addressed as a network-based multi-agent optimization problem using graph theory as a tool. Let an undirected and connected graph $\mathcal{G}(\mathcal{N},\mathcal{E})$ describe the communication topology of the underlying network, where $\mathcal{N}$ denotes the set of agents or subproblems, and $\mathcal{E}$ denotes the set of communication links. Each agent minimizes a local cost function $f_{i}(x_{i})$ subject to the consensus constraints $x_{ik}=x_{qk},\forall^{l}_{k=1},\forall q\in\mathcal{N}_{i}$ and the local inequality constraints $g_{j}(x_{ik})\leq 0,\forall^{l}_{k=1}$. The global consensus corresponds to the optimal solution of (2), when $x^{*}_{1}=x^{*}_{2}=\ldots=x^{*}_{n}=x^{*}$. The index $m^{ik}_{g}$ is the number of inequality constraints associated with the scalar $x_{ik}$.

The strong duality of (2) is subject to the convexity of $f$ and the constraint satisfaction given by Slater’s condition (see [32]), which is as follows: if there exists an $x\in\mathrm{relint}\mathcal{D}$ such that $g_{j}(x_{ik})<0,x_{ik}=x_{qk},\forall^{l}_{k=1},\forall q\in\mathcal{N}_{i},\forall^{n}_{i=1}$, then $x$ is strictly feasible, where $\mathcal{D}$ is the domain of (2) defined as $\mathcal{D}=\mathrm{dom}f$. The strong convexity of $f$ implies the uniqueness of its optimal solution $x^{*}$.

Assumption 1.

$f$ is $\mu$-strongly convex and $\kappa$-smooth, i.e., for all $x_{1},x_{2}\in\mathbb{R}^{ln}$,

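$$\mu\|x_{1}-x_{2}\|^{2}\leq\langle\nabla f(x_{1})-\nabla f(x_{2}),x_{1}-x_{2}\rangle\leq\kappa\|x_{1}-x_{2}\|^{2}.$$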
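For intuition, Assumption 1 can be checked numerically on a quadratic objective. The sketch below assumes a hypothetical $f(x)=\frac{1}{2}x^{T}Qx$ with $Q$ symmetric positive definite, for which $\mu$ and $\kappa$ are the smallest and largest eigenvalues of $Q$; the matrix and the sample points are illustrative assumptions, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quadratic objective f(x) = 0.5 * x^T Q x, Q symmetric positive definite.
M = rng.standard_normal((4, 4))
Q = M @ M.T + np.eye(4)

def grad_f(x):
    """Gradient of the quadratic objective."""
    return Q @ x

eigs = np.linalg.eigvalsh(Q)            # ascending order
mu, kappa = eigs[0], eigs[-1]           # strong convexity and smoothness constants

def satisfies_assumption(x1, x2, tol=1e-9):
    """Check mu*||d||^2 <= <grad f(x1) - grad f(x2), d> <= kappa*||d||^2 with d = x1 - x2."""
    d = x1 - x2
    inner = (grad_f(x1) - grad_f(x2)) @ d
    sq = d @ d
    return mu * sq - tol <= inner <= kappa * sq + tol

checks = [satisfies_assumption(rng.standard_normal(4), rng.standard_normal(4))
          for _ in range(100)]
print(all(checks))
```

For a quadratic the inner product equals $d^{T}Qd$, so the two-sided bound is exactly the Rayleigh quotient bound on $Q$.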

The Lagrangian function ${\mathcal{L}}$ of the problem (2) is given by:

where $\alpha_{ik}\in\mathbb{R}$ is a Lagrange multiplier associated with the consensus constraint $x_{ik}=x_{qk}$ and $\theta^{ik}_{j}\in\mathbb{R}_{+}=\{\theta^{ik}_{j}\in\mathbb{R}\mid\theta^{ik}_{j}\geq 0,\ \forall^{l}_{k=1},\forall^{m^{ik}_{g}}_{j=1},\forall i\in\mathcal{N}\}$ is a Lagrange multiplier associated with the inequality constraint $g_{j}(x_{ik})\leq 0$. The vector notations of the respective Lagrange multipliers are $\theta\in\mathbb{R}^{m^{ik}_{g}ln}_{+}$ and $\alpha\in\mathbb{R}^{ln}$.

Remark 1.

Assuming that Slater’s condition is satisfied and strong duality holds, the saddle-point $(x^{*},\alpha^{*},\theta^{*})$ satisfies the Karush-Kuhn-Tucker (KKT) conditions derived from the Lagrangian $\mathcal{L}$, as follows:

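$$\begin{aligned}&\nabla_{x^{*}_{ik}}f_{i}(x^{*}_{ik})+\sum_{q\in\mathcal{N}_{i}}a_{iq}(\alpha^{*}_{ik}-\alpha^{*}_{qk})+\sum_{j=1}^{m^{ik}_{g}}(\theta^{ik}_{j})^{*}\nabla_{x^{*}_{ik}}g_{j}(x^{*}_{ik})=0,\ \forall^{l}_{k=1},\ \forall i\in\mathcal{N},\\ &g_{j}(x^{*}_{ik})\leq 0,\ (\theta^{ik}_{j})^{*}\geq 0,\ \forall^{l}_{k=1},\ \forall^{m^{ik}_{g}}_{j=1},\ \forall i\in\mathcal{N},\\ &(\theta^{ik}_{j})^{*}g_{j}(x^{*}_{ik})=0,\ \forall^{l}_{k=1},\ \forall^{m^{ik}_{g}}_{j=1},\ \forall i\in\mathcal{N},\\ &x^{*}_{ik}=x^{*}_{qk},\ \forall^{l}_{k=1},\ \forall(i,q)\in\mathcal{N}.\end{aligned} \tag{4}$$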

In order to ensure the global consensus of the states $x_{i},\forall i\in\mathcal{N}$, the Lagrangian function $\mathcal{L}$ is augmented with the term $x^{T}(L\otimes I_{l})x$. The augmented Lagrangian function is defined below:

$$\bar{\mathcal{L}}(x,\alpha,\theta)=\mathcal{L}(x,\alpha,\theta)+x^{T}(L\otimes I_{l})x. \tag{5}$$

Remark 2.

Note that augmenting the Lagrangian $\mathcal{L}$ with $x^{T}(L\otimes I_{l})x$ does not affect its convexity-concavity properties. This is because $x^{T}(L\otimes I_{l})x$ is a positive semidefinite function of the primal variable $x$ that vanishes whenever the consensus constraint holds. Thus the saddle-point satisfying (4) also satisfies the following KKT conditions for the Lagrangian (5):

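$$\begin{aligned}&\nabla_{x^{*}_{ik}}f_{i}(x^{*}_{ik})+\sum_{q\in\mathcal{N}_{i}}a_{iq}(x^{*}_{ik}-x^{*}_{qk})+\sum_{q\in\mathcal{N}_{i}}a_{iq}(\alpha^{*}_{ik}-\alpha^{*}_{qk})+\sum_{j=1}^{m^{ik}_{g}}(\theta^{ik}_{j})^{*}\nabla_{x^{*}_{ik}}g_{j}(x^{*}_{ik})=0,\ \forall^{l}_{k=1},\ \forall i\in\mathcal{N},\\ &g_{j}(x^{*}_{ik})\leq 0,\ (\theta^{ik}_{j})^{*}\geq 0,\ \forall^{l}_{k=1},\ \forall^{m^{ik}_{g}}_{j=1},\ \forall i\in\mathcal{N},\\ &(\theta^{ik}_{j})^{*}g_{j}(x^{*}_{ik})=0,\ \forall^{l}_{k=1},\ \forall^{m^{ik}_{g}}_{j=1},\ \forall i\in\mathcal{N},\\ &x^{*}_{ik}=x^{*}_{qk},\ \forall^{l}_{k=1},\ \forall(i,q)\in\mathcal{N}.\end{aligned} \tag{6}$$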
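The claim in Remark 2 that $x^{T}(L\otimes I_{l})x$ is positive semidefinite can be checked numerically through the standard identity $x^{T}(L\otimes I_{l})x=\frac{1}{2}\sum_{i}\sum_{q}a_{iq}\|x_{i}-x_{q}\|^{2}$. The sketch below uses a hypothetical 3-agent path graph with $l=2$ and random states; none of these values come from the paper.

```python
import numpy as np

n, l = 3, 2
# Hypothetical path graph 0 - 1 - 2 with unit weights.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A
K = np.kron(L, np.eye(l))               # L (x) I_l

rng = np.random.default_rng(1)
X = rng.standard_normal((n, l))         # row i holds x_i
x = X.reshape(-1)                       # stacked vector [x_1; ...; x_n]

quad = x @ K @ x
pairwise = 0.5 * sum(A[i, q] * np.sum((X[i] - X[q]) ** 2)
                     for i in range(n) for q in range(n))

# At consensus (all agents share the same state) the term vanishes.
x_cons = np.tile(np.array([1.0, -2.0]), n)
quad_cons = x_cons @ K @ x_cons

print(quad, pairwise, quad_cons)
```

Because the term is a weighted sum of squared disagreements, it is non-negative and equals zero exactly on the consensus set of a connected graph, so it does not move the saddle point.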

Using the augmented Lagrangian (5), the primal-dual dynamics is derived as follows:

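$$\begin{aligned}\dot{x}_{ik}&=-\nabla_{x_{ik}}\bar{\mathcal{L}}(x,\alpha,\theta),\\ \dot{\alpha}_{ik}&=\nabla_{\alpha_{ik}}\bar{\mathcal{L}}(x,\alpha,\theta),\\ \dot{\theta}^{ik}_{j}&=[\nabla_{\theta^{ik}_{j}}\bar{\mathcal{L}}(x,\alpha,\theta)]^{+}_{\theta^{ik}_{j}},\quad\forall^{l}_{k=1};\ \forall^{m^{ik}_{g}}_{j=1};\ \forall i\in\mathcal{N}.\end{aligned} \tag{7}$$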
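To make the gradient-descent and gradient-ascent structure of the primal-dual dynamics concrete, the following is a minimal sketch under several assumptions that are not from the paper: scalar agents ($l=1$), no inequality constraints, hypothetical local costs $f_{i}(x_{i})=\frac{1}{2}(x_{i}-c_{i})^{2}$, a consensus multiplier entering through $L\alpha$ in the primal equation and $Lx$ in the dual equation (consistent with the stationarity condition in (6)), and a forward-Euler discretization with a hand-picked step size.

```python
import numpy as np

# Hypothetical 3-agent path graph with unit coupling weights.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A

# Hypothetical local data: f_i(x_i) = 0.5 * (x_i - c_i)^2.
c = np.array([1.0, 2.0, 6.0])

def simulate(steps=5000, h=0.01):
    """Forward-Euler integration of the assumed primal-dual flow."""
    x = np.zeros(3)
    alpha = np.zeros(3)
    for _ in range(steps):
        x_dot = -(x - c) - L @ x - L @ alpha   # gradient descent in the primal variable
        alpha_dot = L @ x                      # gradient ascent in the consensus multiplier
        x, alpha = x + h * x_dot, alpha + h * alpha_dot
    return x, alpha

x_final, alpha_final = simulate()
print(x_final)
```

With these assumptions all agents should settle at the average of the `c` values, which is the minimizer of the summed cost under the consensus constraint, while `alpha_final` balances the residual local gradients.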

With the primal-dual dynamics derived as given in (7), the following subsection develops the ADPDD.

2.2 Adaptively Synchronized Distributed Primal-dual dynamics

The following subsection presents the adaptive synchronization mechanism which is later integrated with the dynamics defined in (7) to arrive at ADPDD.

2.2.1 Adaptive synchronization

The adaptive synchronization mechanism has been widely used in multi-agent systems to guarantee synchronization between the agents with respect to their state variables [29, 33], which is explained subsequently.

The primal variables associated with each agent evolve according to

$$\dot{x}_{ik}=-\nabla_{x_{ik}}\bar{\mathcal{L}}(x,\alpha,\theta) \tag{8}$$

as described in (7). By performing gradient descent on (5), the primal dynamics (8) can be further derived as:

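$$\begin{aligned}\dot{x}_{ik}&=-\nabla_{x_{ik}}f_{i}(x_{ik})-\sum_{q\in\mathcal{N}_{i}}a_{iq}(x_{ik}-x_{qk})\\ &\quad-\sum_{q\in\mathcal{N}_{i}}a_{iq}(\alpha_{ik}-\alpha_{qk})-\sum_{j=1}^{m^{ik}_{g}}\theta^{ik}_{j}\nabla_{x_{ik}}g_{j}(x_{ik}).\end{aligned} \tag{9}$$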

Let $u_{x_{ik}}\in\mathbb{R}$ correspond to the following term in (9):

$$u_{x_{ik}}=-\sum_{q\in\mathcal{N}_{i}}a_{iq}(x_{ik}-x_{qk}),\ \forall q\in\mathcal{N}_{i}, \tag{10}$$

where the interconnection strength or the coupling weight $a_{iq}$ belongs to the adjacency matrix $A$ such that

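$$a_{iq}=a_{qi}=\begin{cases}\text{a positive scalar},&\text{for }(q,i)\in\mathcal{E},\\ 0,&\text{for }(q,i)\notin\mathcal{E}.\end{cases}$$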

The equation (10) is widely regarded as the consensus protocol or the consensus law [29, 34]. Defining further $u_{x_{i}}\in\mathbb{R}^{l}$, the consensus protocol (10) can be modified to accommodate $x_{i}\in\mathbb{R}^{l}$ as given below:

$$u_{x_{i}}=-\sum_{q\in\mathcal{N}_{i}}a_{iq}(x_{i}-x_{q}),\ \forall q\in\mathcal{N}_{i}, \tag{11}$$

Similarly,

$$u_{x}=-(L\otimes I_{l})x \tag{12}$$

is a compact form representation of (11).
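The equivalence between the agent-wise consensus protocol and its compact Laplacian form can be verified numerically. The sketch below uses a hypothetical 3-agent graph with non-uniform symmetric weights and random states; these values are illustrative assumptions only.

```python
import numpy as np

n, l = 3, 2
# Hypothetical symmetric coupling weights (zero where agents are not neighbors).
A = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(2)
X = rng.standard_normal((n, l))         # row i holds x_i

# Agent-wise form: u_{x_i} = -sum_q a_iq (x_i - x_q).
# Summing over all q is equivalent to summing over neighbors, since a_iq = 0 otherwise.
U_agent = np.stack([-sum(A[i, q] * (X[i] - X[q]) for q in range(n))
                    for i in range(n)])

# Compact form: u_x = -(L (x) I_l) x with x = [x_1; ...; x_n].
u_compact = -np.kron(L, np.eye(l)) @ X.reshape(-1)

print(np.allclose(U_agent.reshape(-1), u_compact))
```

The agreement follows from expanding the sum: $-\sum_{q}a_{iq}(x_{i}-x_{q})=-(d_{i}x_{i}-\sum_{q}a_{iq}x_{q})$, which is the $i$-th block of $-(L\otimes I_{l})x$.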

If $i$ and $q$ are neighbors in $\mathcal{G}$ with $e_{iq}=x_{i}-x_{q}$ defined as the local synchronization error, then the coupling weight can be driven by $e_{iq}$, i.e. $\dot{a}_{iq}=h_{i}(e_{iq})$, where $h_{i}:\mathbb{R}^{l}\rightarrow\mathbb{R}$ monotonically increases in $e_{iq}$. This yields a stronger synchronization between the primal variables of the coupled agents, which motivates incorporating adaptive synchronization to improve the convergence rate of the distributed primal-dual dynamics. In line with this, the following coupling weight update rule is proposed:

$$\dot{a}_{iq}=d_{iq}(e_{iq}^{T}e_{iq}+\dot{e}_{iq}^{T}\dot{e}_{iq}), \tag{13}$$

where $d_{iq}=d_{qi}>0$ is the adaptive gain constant.

Remark 3.

Represent (13) in the form $\dot{a}_{iq}=h_{i}(e_{iq},\dot{e}_{iq})$. Throughout the rest of the paper it is assumed that the real-valued function $h_{i}:\mathbb{R}^{ln}\rightarrow\mathbb{R}$ is Lipschitz continuous.

The dynamics (13) addresses two questions, viz. how far from each other the local primal variables are and how fast they can be synchronized to a common trajectory. Since $e_{iq}$ and $\dot{e}_{iq}$ appear quadratically in (13) and $d_{iq}>0$, the right-hand side of (13) is non-negative, so the coupling weight $a_{iq}$ never decreases along trajectories.
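The right-hand side of the coupling weight update rule can be sketched as a small function. The name `coupling_weight_rate`, the default gain `d_iq=1.0`, and the sample states are hypothetical choices for illustration.

```python
import numpy as np

def coupling_weight_rate(x_i, x_q, xdot_i, xdot_q, d_iq=1.0):
    """Rate of change of a_iq: d_iq * (e^T e + e_dot^T e_dot), with e = x_i - x_q."""
    e = x_i - x_q                 # local synchronization error
    e_dot = xdot_i - xdot_q       # its time derivative
    return d_iq * (e @ e + e_dot @ e_dot)

# Agents far apart: the weight grows quickly.
rate_far = coupling_weight_rate(np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
                                np.zeros(2), np.zeros(2))
# Agents already synchronized and moving together: the weight stops growing.
rate_sync = coupling_weight_rate(np.ones(2), np.ones(2), np.zeros(2), np.zeros(2))

print(rate_far, rate_sync)
```

The rate is symmetric in the two agents and is zero only when both the error and its derivative vanish, so the weights stop adapting once the agents are synchronized.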

2.2.2 Integrating the adaptive coupling law (13) with the primal-dual dynamics (7)

Integrating the adaptive coupling law (13) with the PDD (7) and partitioning the resulting dynamics into three interconnected subsystems, i.e., $H_{1}$ (primal partition), $H_{2}$ (consensus dual partition), and $H_{3}$ (inequality dual partition) as shown in Fig. 1, yields:

[TABLE]

The system $H_{3}$ represents the $\theta^{ik}_{j}$ dynamics in the stacked vector form with $u_{H_{3}}$ and $y_{H_{3}}$ as its input and output respectively, as given below:

[TABLE]

where $y_{H_{1}},y_{H_{2}},y_{H_{3}}\in\mathbb{R}^{ln}$ and $u_{H_{1}}=-(L\otimes I_{l})y_{H_{2}}-y_{H_{3}},u_{H_{2}}=(L\otimes I_{l})y_{H_{1}}$ , and $u_{H_{3}}=y_{H_{1}}$ .

The ADPDD (14)-(16) is characterized as a feedback-interconnected networked system, as shown in Fig. 1. Each agent in the underlying network is diffusively coupled with its neighboring agents by virtue of the communication topology that defines the interaction between such agents on the graph $\mathcal{G}(\mathcal{N},\mathcal{E})$. Note that the network representation in Fig. 1 is independent of graph parameters such as the communication topology, the number of agents, and the interaction links. Irrespective of such parameters, if the graph $\mathcal{G}(\mathcal{N},\mathcal{E})$ is connected, one can arrive at the stability results of the underlying network by only verifying its passivity properties. Towards this end, the following subsection first presents the passivity analysis of the network shown in Fig. 1, which further leads to its closed-loop stability and robustness analysis.

2.3 Passivity based stability analysis of ADPDD

This section begins with the passivity analysis of the subsystems $H_{1}$ , $H_{2}$ , $H_{3}$ and their feedback interconnection shown in Fig. 1, and then moves to the stability and robustness analysis of that feedback interconnection. A Krasovskii-type storage function is defined for each subsystem (see [15]), which leads to a passivity property with differentiation at both ports [35, Proposition 2]. The intuition behind this proposition is to define the Krasovskii-type storage function $V(x)$ for the dynamical system defined in (1) such that $\dot{V}\leq\dot{u}^{T}\dot{y}$ , where $\dot{u}$ and $\dot{y}$ are considered as port variables. This inequality shows that the map from the port input $\dot{u}$ to the port output $\dot{y}$ is passive. Motivated by this result, it is subsequently shown that the ADPDD is a passive system.
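As a minimal illustration of the inequality $\dot{V}\leq\dot{u}^{T}\dot{y}$ , consider an assumed toy system (not one of the subsystems analyzed below): $\dot{x}=-\nabla f(x)+u$ , $y=x$ , with $f$ convex and twice differentiable. Taking the squared velocity as the storage function gives:

```latex
\begin{aligned}
V(x) &= \tfrac{1}{2}\,\dot{x}^{T}\dot{x},\\
\dot{V} &= \dot{x}^{T}\ddot{x}
         = \dot{x}^{T}\left(-\nabla^{2}f(x)\,\dot{x}+\dot{u}\right)\\
        &= -\dot{y}^{T}\nabla^{2}f(x)\,\dot{y}+\dot{u}^{T}\dot{y}
         \;\leq\; \dot{u}^{T}\dot{y}.
\end{aligned}
```

The last step uses $\dot{y}=\dot{x}$ and $\nabla^{2}f(x)\succeq 0$ , which holds for any convex $f$ .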

2.3.1 $H_{1}$ is passive

Proposition 2.1.

Assuming that the graph $G$ is connected and $f$ is strictly convex in $x$ , if there exists $x_{eq}\in\mathbb{R}^{ln}$ that satisfies (4), then the subsystem $H_{1}$ is passive with port variables $(\dot{{y}}_{H_{1}},\dot{u}_{H_{1}})$ .

Proof.

Let $\tilde{a}_{iq}=a_{iq}-a^{*}_{iq}$ with $a^{*}_{iq}>0$ defined as follows:

[TABLE]

where $a^{*}$ is a parameter to be selected.

Consider the following storage function for the update law (13) [29].

[TABLE]

Differentiating (18) with respect to time yields the following:

[TABLE]

Acknowledging the graph symmetry and substituting for $e_{iq}=x_{i}-x_{q}$ , (19) modifies to

[TABLE]

Now, consider the following storage function for $H_{1}$ , which is a sum of Krasovskii-type storage function of $x$ and (18):

[TABLE]

Differentiating (21) with respect to time and using (20) yields,

[TABLE]

Notice that $y_{H_{1}}=x$ and choosing $a^{*}>1$ makes the term $(1-a^{*})\lambda_{2}(L\otimes I_{l})\|y_{H_{1}}\|^{2}$ in (22) negative definite. Since $\lambda_{min}({\mathbb{H}})+a^{*}\lambda_{2}(L\otimes I_{l})>0$ for a non-negative value of $a^{*}$ , the inequality (22) implies that the subsystem $H_{1}$ is output strictly passive (“OSP”[30]) with respect to the port variables $\dot{u}_{H_{1}}$ and $\dot{y}_{H_{1}}$ . ∎

2.3.2 $H_{2}$ is passive

Proposition 2.2.

Assuming that the graph $G$ is connected and $f$ is strictly convex in $x$ , if there exists $\alpha_{eq}\in\mathbb{R}^{ln}$ satisfying (4), then the subsystem $H_{2}$ is passive with port variables $(\dot{y}_{H_{2}},\dot{u}_{H_{2}})$ .

Proof.

Consider a Krasovskii-type storage function for $H_{2}$ as given below:

[TABLE]

Differentiating (23) with respect to time yields,

[TABLE]

(24) yields the following inequality,

[TABLE]

Hence, the subsystem $H_{2}$ is passive with respect to port variables $\dot{u}_{H_{2}}$ and $\dot{y}_{H_{2}}$ . ∎

2.3.3 $H_{3}$ is passive

In the following, $H_{3}$ is modeled as a switched dynamical system.

The dynamics in (16) becomes discontinuous when $\theta^{ik}_{j}=0$ and $g_{j}(x_{ik})<0$ . The value of $g_{j}(x_{ik})^{+}$ switches from $g_{j}(x_{ik})$ to $0$ . To make this explicit, (16) is reformulated below, following Kose [20].

[TABLE]
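The switching behavior described above can be sketched as follows. This is a hedged illustration of the standard positive projection, assuming $\theta\geq 0$ ; the function names `projected_rate` and `euler_step` are hypothetical, and the clipping in the Euler step is an added discretization safeguard rather than part of the continuous-time dynamics.

```python
import numpy as np

def projected_rate(theta, g):
    """Elementwise projected rate: g where theta > 0 or g > 0, otherwise 0."""
    theta = np.asarray(theta, dtype=float)
    g = np.asarray(g, dtype=float)
    free = (theta > 0) | (g > 0)     # projection inactive on these entries
    return np.where(free, g, 0.0)

def euler_step(theta, g, dt):
    """One forward-Euler step; the clip keeps theta >= 0 after discretization."""
    return np.maximum(theta + dt * projected_rate(theta, g), 0.0)

# Assumed example values: three multipliers with their constraint values.
theta = np.array([0.0, 0.0, 0.5])
g = np.array([-1.0, 2.0, -1.0])
rate = projected_rate(theta, g)
theta_next = euler_step(theta, g, 1.0)
```

Only the first entry has an active projection here, since it combines $\theta=0$ with $g<0$ .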

From (26), the projection is seen to be active for the second case. Let $\mathcal{I}_{i}=\{1,\ldots,lm^{ik}_{g}\}$ and $\sigma_{i}:[0,\infty)\rightarrow\mathcal{I}_{i},\forall k=1,\ldots,l;j\in\mathcal{I}_{i}$ be an arbitrary switching signal. Then

[TABLE]

represents the switching time instances when there is an active projection. Considering (27), the inequality constraint dynamics given in (16) takes the form of a switched system:

[TABLE]

where $\sigma_{i}(t)\subset\sigma(t),\forall^{n}_{i=1}$ . Let $V_{H_{3}}$ be the Lyapunov function associated with $H_{3}$ . It is defined as given below:

[TABLE]

Proposition 2.3.

The subsystem $H_{3}$ is passive with port input $\dot{u}_{H_{3}}$ and port output $\dot{y}_{H_{3}}$ for each pair of switching time instances $(\tau^{+}_{\sigma_{i}},\tau^{-}_{\sigma_{i}})$ corresponding to (28), where $\tau^{-}_{{\sigma_{i}}}<\tau^{+}_{{\sigma_{i}}}$ , such that $\sigma_{i}(\tau^{+}_{\sigma_{i}})=\sigma_{i}(\tau^{-}_{\sigma_{i}})=\sigma_{i}\in\mathcal{I}_{i}$ and $\sigma_{i}(\tau)\neq\sigma_{i}$ for $\tau^{-}_{{\sigma_{i}}}<\tau<\tau^{+}_{\sigma_{i}}$ .

Proof.

Differentiating (29) with respect to time yields,

[TABLE]

Using $u_{H_{3}}$ and $y_{H_{3}}$ from (16) in (30),

[TABLE]

Thus,

[TABLE]

(32) ensures that the switched system (28) represents a finite family of passive systems. However, it must be ensured that the Lyapunov function $V_{H_{3}}$ does not increase during the switching events. In line with this, the following two cases have been considered:

1. It may happen for some $x_{ik}$ in (28) that the function $g_{j}(x_{ik})$ goes from negative to positive through $0$ . This will cause the Lyapunov function to change from $V_{k}(\theta^{ik}_{j}(\tau^{-}_{\sigma_{i}}))$ to $V_{k}(\theta^{ik}_{j}(\tau^{+}_{\sigma_{i}}))$ . If that happens, the Lagrangian multiplier $\theta^{ik}_{j}>0$ will add a new term to $V_{k}(\theta^{ik}_{j}(\tau_{\sigma_{i}}))$ . Since $V_{k}(\theta^{ik}_{j}(\tau_{\sigma_{i}}))$ is continuous in time, (32) holds for $\tau>\tau^{-}_{\sigma_{i}}$ as well as $\tau<\tau^{+}_{\sigma_{i}}$ . Hence, $V_{k}(\theta^{ik}_{j}(\tau^{+}_{\sigma_{i}}))=V_{k}(\theta^{ik}_{j}(\tau^{-}_{\sigma_{i}}))$ .

2. In this case the projection of the $k^{th}$ constraint for a given $j$ becomes active, i.e., $\theta^{ik}_{j}$ reaches $0$ from a positive value for the $k^{th}$ constraint of the $i^{th}$ machine. Hence, the corresponding $k^{th}$ term of the Lyapunov function $V_{k}(\theta^{ik}_{j})$ will disappear. In turn, the following inequality will be satisfied: $V_{k}(\theta^{ik}_{j}(\tau^{+}_{\sigma_{i}}))<V_{k}(\theta^{ik}_{j}(\tau^{-}_{\sigma_{i}}))$ .

Hence, in both the cases, the Lyapunov function $V_{k}(\theta^{ik}_{j}(\tau))$ will be non-increasing. ∎

2.4 Stability analysis of the feedback interconnection shown in Fig. 1

Proposition 2.4.

Let $a^{*}>1$ . Then the interconnected network dynamics (14)-(16) is passive from the input $u_{x}$ to the output $y_{H_{1}}$ .

Proof.

Let $V$ be the candidate Lyapunov function for the interconnected system represented in Fig. 1 such that

[TABLE]

Differentiating (33) and using (22), (24), (31) yields

[TABLE]

Thus, the interconnected network dynamics (14)-(16) is passive, if $a^{*}>1$ strictly holds.

The following result establishes the boundedness of the trajectories of (14)-(16).

Proposition 2.5.

The trajectories of (14)-(16) are bounded for any finite initial conditions.

Proof.

To show that the trajectories of (14)-(16) are bounded, consider the following storage function:

[TABLE]

where $W$ is the storage function defined in (18). Differentiating (36) with respect to time yields

[TABLE]

Note that $(\theta^{ik}_{j}-(\theta^{ik}_{j})^{*})g_{j}(x_{ik})\geq 0,\forall j\in\sigma_{i}(t)$ because $g_{j}(x_{ik})<0$ and $\theta^{ik}_{j}=0$ as confirmed by (28). Using first order condition of convexity-concavity of the Lagrangian function (5) and replacing $\dot{W}$ by right-hand side of (20), (38) modifies to the following:

[TABLE]

Since $(x^{*},\alpha^{*},\theta^{*})$ is the saddle-point of (5), choosing $a^{*}>1$ yields the following:

[TABLE]

which is sufficient to ensure that the trajectories of (14)-(16) are bounded. ∎

In what follows, the asymptotic stability of the saddle-point solution of (14)-(16) is established. To this end, the underlying networked dynamics is represented as a hybrid system wherein $H_{1}$ , $H_{2}$ are represented as continuous-time dynamical systems and $H_{3}$ is represented as a system with right-hand side discontinuity. The framework of LaSalle’s invariance principle for hybrid dynamical systems (see, [27]) is stated below, which in our case provides a useful result on the convergence of (14)-(16) to the saddle point solution that satisfies (4).

Proposition 2.6.

Consider the hybrid networked dynamics (14)-(16) and let $z=[x^{T},\alpha^{T},\theta^{T}]^{T}\in\mathcal{X}\subseteq\mathbb{R}^{ln(2+m^{ik}_{g})}$ , and $\Psi\subseteq\mathcal{X}$ be compact and positively invariant. Assuming that the Lyapunov function $V$ defined in (33) is continuously differentiable and $\dot{V}\leq 0$ along the trajectories of $z(t)\in\Psi$ , every trajectory in $\Psi$ converges to $\epsilon$ , where $\epsilon\subset\Psi$ is a maximal positive invariant set of $\Psi$ such that

1. $\dot{V}=0$ for a fixed $\sigma$ .

2. $V_{k}(\theta^{ik}_{j}(\tau^{+}_{\sigma_{i}}))=V_{k}(\theta^{ik}_{j}(\tau^{-}_{\sigma_{i}}))$ for a switching instance $\tau$ between $\tau^{-}_{\sigma_{i}}$ and $\tau^{+}_{\sigma_{i}}$ .

∎

Proposition 2.6 gives the next result on the convergence of (14)-(16) to the saddle point solution that satisfies the conditions in (4).

Proposition 2.7.

The hybrid network dynamics (14)-(16) converges to the saddle point solution $x^{*},\alpha^{*},\theta^{*}$ satisfying (4).

Proof.

From Proposition 2.6, for a fixed $\sigma$ , $\dot{V}=0$ . Thus the primal as well as dual dynamics in (14)-(16) converge to the saddle point solution contained within the set $\epsilon$ . If $g_{j}(x^{*}_{ik})<0$ then $(\theta^{ik}_{j})^{*}=0$ . However, if $g_{j}(x^{*}_{ik})>0$ , then $(\theta^{ik}_{j})^{*}$ will penalize the constraint violation by rising to a large value. Since all trajectories are bounded, it contradicts the continuity of $V$ , thus $\dot{\theta}^{ik}_{j}=0$ . To this end, the solutions of (14)-(16) also satisfy the KKT conditions (4) and yield the saddle point solution $(x^{*},\alpha^{*},\theta^{*})$ . ∎

Choosing $a^{*}>1$ and using (12), (35) modifies to

[TABLE]

Proposition 2.8.

The saddle point solution of (5) is asymptotically stable.

Proof.

The proof is straightforward from Proposition 2.4 and Proposition 2.7 and (40). ∎

In the recent article [36], the global asymptotic stability of the primal-dual dynamics is proved using a Lyapunov function similar to the sum of the Krasovskii-type Lyapunov function (33) and the Lyapunov function defined in (36). This result can be extended to the global asymptotic stability of the saddle-point of (5).

Remark 4.

Let $\tilde{V}:\mathbb{R}^{ln}\times\mathbb{R}^{ln}\times\mathbb{R}^{lnm^{ik}_{g}}\rightarrow\mathbb{R}$ denote the candidate Lyapunov function for the ADPDD (14)-(16), given as sum of the candidate Lyapunov functions (33) and (36) as follows:

[TABLE]

If Assumption 1 holds, then the trajectories of (14)-(16) converge to the saddle-point $(x^{*},\alpha^{*},\theta^{*})$ , which is globally asymptotically stable. The proof of this remark is similar to the proof of [36, Theorem 5.1] and is therefore omitted to avoid repetition.

With the global asymptotic stability of the proposed dynamics (14)-(16) established, the subsequent section addresses its rate of convergence and its comparison with the rate of convergence with the primal-dual dynamics without adaptive weights.

2.5 Accelerated convergence using ADPDD

Let $\mathcal{A}\subseteq\mathbb{R}^{|\mathcal{E}|}_{>0}$ define the set of coupling weights, and $|\mathcal{E}|$ define the cardinality of the edge set $\mathcal{E}$ . Then, in view of its definition, the Laplacian matrix $L\otimes I_{l}$ is a parameter varying, real and symmetric matrix, which is differentiable and uniformly continuous on $\mathcal{A}$ . As a consequence, the following hold:

Statement 1.

There exists $\Lambda>0$ such that the spectral norm $\|L\otimes I_{l}\|<\Lambda,\forall a_{iq}\in\mathcal{A},\forall q\in\mathcal{N}_{i},\forall i\in\mathcal{N}$ .

Statement 2.

The gradient of $L\otimes I_{l}$ with respect to $a_{iq}$ is bounded above by some scalar $\eta$ , $\left\|\nabla L\otimes I_{l}\right\|\leq\eta,a_{iq}\in\mathcal{A}$ .

Let $L_{0}\otimes I_{l}$ be the Laplacian matrix of $\mathcal{G}$ whose coupling weights are constant parameters, then $L_{0}\otimes I_{l}$ results in a constant matrix.

Proposition 2.9.

If the coupling weights evolve according to the law (13), then the following holds $\forall t>t_{0}$ :

[TABLE]

Proof.

If $E$ is the incidence matrix of the undirected graph $\mathcal{G}$ , then the Laplacian matrices $L_{0}$ and $L\otimes I_{l}$ can be written as:

[TABLE]

where $C(t)$ is a diagonal matrix containing the coupling weights. To prove (42), it is first proved that $x^{T}EC(t)E^{T}x\geq x^{T}EE^{T}x$ .

[TABLE]

For an undirected graph $\mathcal{G}$ , $a_{iq}(t_{0})\geq 1,\forall q\in\mathcal{N}_{i},\forall i\in\mathcal{N}$ . Then $\forall(q,i)\in\mathcal{E}$ , $C(t)\geq I_{|\mathcal{E}|}$ . Hence,

[TABLE]

in fact, $C(t_{0})$ is a diagonal matrix with the coupling weights $a_{iq}(t_{0})$ , thus $C(t)\geq C(t_{0}),\forall t>t_{0}$ . Thus from the above reasoning, and (46),

[TABLE]

From (43) and (44),

[TABLE]

Let $\lambda_{i}$ be the $i^{th}$ eigenvalue in the ordered-pair of eigenvalues represented below:

[TABLE]

Then according to Courant-Fischer theorem [37],

[TABLE]

where $v_{1}$ is the eigenvector (vector of all ones) corresponding to the eigenvalue $\lambda_{1}=0$ . Thus for $i=2$ ,

[TABLE]

∎
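The monotonicity argument used in this proof can be checked numerically. The sketch below assumes a 5-node cycle graph and randomly drawn edge weights that are all at least $1$ (both are illustrative assumptions); it compares the second-smallest eigenvalue of $EE^{T}$ with that of $ECE^{T}$ .

```python
import numpy as np

def incidence_cycle(n):
    """Oriented incidence matrix E (nodes x edges) of a cycle graph."""
    E = np.zeros((n, n))
    for e in range(n):
        E[e, e] = 1.0               # edge e leaves node e
        E[(e + 1) % n, e] = -1.0    # and enters node e+1 (mod n)
    return E

def lambda_2(M):
    """Second-smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(M)[1]

n = 5
E = incidence_cycle(n)
rng = np.random.default_rng(0)
c = 1.0 + rng.random(n)             # assumed coupling weights, all >= 1

L0 = E @ E.T                        # unit-weight Laplacian
L = E @ np.diag(c) @ E.T            # weighted Laplacian E C E^T
gap = L - L0                        # equals E (C - I) E^T, positive semidefinite
```

Because $C-I$ has nonnegative diagonal entries, the difference of the two Laplacians is positive semidefinite, and the Courant-Fischer characterization then orders every eigenvalue, including $\lambda_{2}$ .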

Proposition 2.10.

If the coupling weights evolve according to (13), then the following always holds:

[TABLE]

Proof.

The proof simply follows from the inequality (48). Taking the ratio of the ordered pair of eigenvalues of $L\otimes I_{l}$ and $L_{0}\otimes I_{l}$ , yields the following:

[TABLE]

But, for $t>t_{0}$ , the inequality (52) strictly holds. Thus

[TABLE]

∎

Propositions 2.9 and 2.10 can be further used to prove that the adaptive primal-dual dynamics has an accelerated, yet bounded, convergence rate compared to the conventional primal-dual dynamics.

Proposition 2.11.

If the inequality (50) holds, then the primal dynamics in (14), under the adaptive coupling law (13), achieves accelerated convergence.

Proof.

Below a timescale separation is enforced in the dynamics of the primal subsystem $H_{1}$ ,

[TABLE]

with $\epsilon\ll 1$ ensuring that the primal variable $x_{i}$ evolves faster than the coupling weights $a_{iq}$ .

The primal subsystem has two control inputs, $u_{H_{1}}$ and $u_{x}$ . To study the primal dynamics with respect to $u_{x}$ in (12), let us analyze the primal subsystem $H_{1}$ when $u_{H_{1}}$ is at steady state or equal to $0$ . With the assumption that the coupling weight dynamics is much slower, the primal dynamics is rewritten as:

[TABLE]

where $F(x)=\nabla_{x}f(x)+(L\otimes I_{l})x$ .

Using Assumption 1, it can be proved that the primal dynamics (56) is strongly monotone for all $x\in\mathbb{R}^{ln}$ by evaluating the Jacobian of $F(x)$ , i.e. $\nabla F(x)=\mathbb{H}+L\otimes I_{l}\geq\mu I$ , where $\mu$ is the modulus of convexity of $f$ (from Assumption 1). Since $\mu>0$ , the Jacobian $\nabla F(x)$ is symmetric and positive definite $\forall x\in\mathbb{R}^{ln}$ . This proves that $F(x)$ is strongly monotone, by virtue of which the primal dynamics (56) converges to the unique optimizer $x^{*}$ . Thus the uniqueness of the primal optimizer remains invariant under the adaptive coupling law (13).

The following result establishes the accelerated convergence of (56) with respect to the unique optimizer $x^{*}$ . Let $V_{H_{1}}$ define the Lyapunov candidate function as given below:

[TABLE]

Differentiating $V_{H_{1}}$ with respect to time $t$ ,

[TABLE]

where $\lambda_{m}=2(\lambda_{min}(\mathbb{H})+\lambda_{2}(L\otimes I_{l}))$ . Therefore,

[TABLE]

or

[TABLE]

Further, since the primal-dual dynamics has a bounded convergence with respect to the saddle point solution (see Proposition 39), using Assumption 3, and Remark 4, every initial condition $x(t_{0})\in\mathbb{R}^{ln}$ approaches the optimal solution $x^{*}$ faster than it does under constant coupling weights. Thus the accelerated convergence holds globally. Considering the upper bound on $\lambda_{2}(L\otimes I_{l})$ as given in (53), let $\lambda_{2}(L\otimes I_{l})=\frac{\lambda_{n}(L\otimes I_{l})}{\lambda_{n}(L_{0}\otimes I_{l})}\lambda_{2}(L_{0}\otimes I_{l})$ and $\lambda_{m_{0}}=2(\lambda_{min}(\mathbb{H})+\lambda_{2}(L_{0}\otimes I_{l}))$ . Then it is seen that $\lambda_{m}=2\left(\lambda_{min}(\mathbb{H})+\frac{\lambda_{n}(L\otimes I_{l})}{\lambda_{n}(L_{0}\otimes I_{l})}\lambda_{2}(L_{0}\otimes I_{l})\right)\gg\lambda_{m_{0}}$ . Hence proved. ∎
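To make the comparison between $\lambda_{m}$ and $\lambda_{m_{0}}$ concrete, the following sketch evaluates both constants for a 4-agent cycle graph. The values of $\lambda_{min}(\mathbb{H})$ and the adapted weights are assumptions chosen only for illustration, and $\lambda_{2}$ is computed on the graph Laplacian itself, i.e. as the smallest nonzero eigenvalue of a connected graph.

```python
import numpy as np

def rate_constant(hess_min_eig, L):
    """lambda_m = 2 * (lambda_min(H) + lambda_2(L)) for a connected graph."""
    lam2 = np.linalg.eigvalsh(L)[1]     # eigenvalues are returned in ascending order
    return 2.0 * (hess_min_eig + lam2)

# Unit-weight Laplacian of a 4-node cycle (eigenvalues 0, 2, 2, 4).
L0 = np.array([[ 2.0, -1.0,  0.0, -1.0],
               [-1.0,  2.0, -1.0,  0.0],
               [ 0.0, -1.0,  2.0, -1.0],
               [-1.0,  0.0, -1.0,  2.0]])
L_adapted = 3.0 * L0    # assumed: every coupling weight has grown from 1 to 3
mu = 0.5                # assumed value of lambda_min(H)

lam_m0 = rate_constant(mu, L0)
lam_m = rate_constant(mu, L_adapted)
```

With these assumed numbers the constant rises from $5$ to $13$ , which illustrates how larger coupling weights enter the rate constant through $\lambda_{2}$ .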

Remark 5.

The ADPDD (14)-(16) achieves accelerated convergence to the saddle point solution that satisfies the KKT conditions (2).

Proof.

The proof follows from Proposition 2.7 and Proposition 2.11. The occurrence of the primal optimizer $x^{*}$ and the dual optimizers is simultaneous.

Recall from (14)-(16) that $y_{H_{1}}=x,y_{H_{2}}=\alpha$ , and $y_{H_{3}}=\sum_{j=1}^{m^{ik}_{g}}\theta^{ik}_{j}\nabla_{x_{ik}}g_{j}(x_{ik}),\forall^{l}_{k=1};\forall^{m^{ik}_{g}}_{j=1}$ . $y_{H_{1}}=x^{*}\implies\dot{y}_{H_{1}}=0$ , which also implies that $\dot{\alpha}=0$ and $\dot{\theta}=0$ . Thus accelerated convergence of (14) to the primal optimizer $x^{*}$ implies accelerated convergence of both (15) and (16) to the dual optimizers $(\alpha^{*},\theta^{*})$ . ∎

Remark 6.

Since the adaptive synchronization in the primal variables results in an accelerated convergence to the primal optimizer $x^{*}$ , the synchronization error $x_{i}-x_{j},\forall j\in\mathcal{N}_{i},\forall i\in\mathcal{N}$ remains lower than that of DPDD for all time. This significantly reduces the consensus constraint violation, and thus the dual variable $\alpha$ pertaining to the consensus constraints does not become unnecessarily large for the ADPDD problem. The uniqueness of the primal optimizer for both Lagrangian functions (5) and (LABEL:dlag1) is due to the strongly monotone property of $F(x)$ , as discussed in Proposition 2.11. However, the dual dynamics (15) (which is solely a function of the adaptively coupled primal variables) is not strongly monotone, which implies that the Lagrangians (5) and (LABEL:dlag1) are not strongly concave with respect to $\alpha$ . Thus, the dual dynamics (15) with adaptively synchronized primal variables and the one without them settle to different equilibrium states. This further indicates that there exists a unique primal optimizer for both Lagrangian functions (5) and (LABEL:dlag1), but the same does not hold for the dual optimizer $\alpha^{*}$ . Since the dual variable $\theta^{*}$ pertains to the local inequality constraints, it remains unaffected by the adaptive synchronization in primal variables. Hence, the dual optimizer $\theta^{*}$ is also unique for both (5) and (LABEL:dlag1).

The convergence rate of the distributed primal-dual dynamics is improved under the influence of adaptive synchronization. However, it may adversely affect the robustness of the proposed dynamics. Thus, there arises a necessity to quantify the robustness of the proposed dynamics with respect to the rate of convergence. The analysis presented below obtains a relation between the convergence rate of the proposed dynamics and its $L_{2}$ -gain.

2.6 Robustness analysis of the network dynamics with respect to the exogenous inputs

Before proceeding with the robustness analysis of this section, it is worth noting the following remark on robustness property of the passive dynamical systems.

Remark 7.

From the inequalities (22), (25), and (32), it is apparent that the interconnected network dynamics comprising (14)-(16) is passive, and inherently robust to the perturbations arising in the primal and dual variables [see, Proposition 4.3.1, Remark 4.3.3 of [30]].

Remark 7 states the qualitative behavior of the proposed dynamics with respect to the notion of robustness. In the following, the robustness of the proposed dynamics against exogenous inputs is quantified in terms of the $L_{2}$ -gain.

Consider without loss of generality, the new inputs to (14)-(16) as

[TABLE]

respectively, where $\bigtriangleup u_{(.)}$ corresponds to the perturbations in the input $u_{(.)}\in\mathbb{R}^{ln}$ . As discussed in [38], $\bigtriangleup u_{(.)}$ represent additive uncertainties or disturbances such as the numerical error accumulated in the corresponding variables. In what follows, the robustness of the ADPDD is quantified using $L_{2}$ -gain analysis of dynamical systems. Let ${\tilde{u}}=[{\tilde{u}}^{T}_{H_{1}},{\tilde{u}}^{T}_{H_{2}},{\tilde{u}}^{T}_{H_{3}}]^{T}$ and ${y}=[{y}^{T}_{H_{1}},{y}^{T}_{H_{2}},{y}^{T}_{H_{3}}]^{T}$ .

Proposition 2.12.

The interconnected network dynamics (14), (15), and (16) with $a_{iq}$ updated according to (13), remains $L_{2}$ stable with the $L_{2}$ -gain, $\gamma\leq\frac{1}{\lambda_{min}({\mathbb{H}})+a^{*}\lambda_{2}(L\otimes I_{l})}$ , if $a^{*}>1$ strictly holds.

Proof.

Replacing the inputs in (14)-(16) by the new ones as defined in (62), the time differential of the Lyapunov function (33) modifies to the following:

[TABLE]

Acknowledging that $y_{H_{1}}=x$ and using (12) in (63) further yields

[TABLE]

where $\lambda_{min}({\mathbb{H}})+a^{*}\lambda_{2}(L\otimes I_{l})>0$ since $\mathbb{H}$ is positive definite. With $a^{*}>1$ , the $L_{2}$ -gain of the interconnected network dynamics from the port input $\dot{\tilde{u}}$ to the port output $\dot{y}$ can be calculated by setting $u_{x}$ to $0$ . From inequality (64), the map from the input $\dot{\tilde{u}}$ to the output $\dot{y}$ remains finite $L_{2}$ -gain stable around the saddle point $x^{*},\alpha^{*},\theta^{*}$ when the corresponding $L_{2}$ -gain satisfies

[TABLE]

∎

The inequality (65) clearly indicates that the $L_{2}$ -gain corresponding to the adaptive distributed primal-dual dynamics has a reduced margin compared to the $L_{2}$ -gain corresponding to the distributed primal-dual dynamics (without adaptive synchronization). Using (53), one can obtain the following expression for the $L_{2}$ -gain in the worst case:

[TABLE]

Comparing (65) and (66), it can be seen that the $L_{2}$ -gain for the ADPDD has a smaller margin than that of the DPDD. Thus the algorithm calls for a trade-off between the robustness and the accelerated convergence of the proposed dynamics. While the adaptive synchronization improves the rate of convergence of the primal-dual dynamics, it simultaneously degrades the robustness of the proposed algorithm, wherein the worst-case $L_{2}$ -gain is quantified by $\underline{\gamma}(<\gamma)$ in (66).

3 Applications and Numerical Examples

This section discusses the application of the proposed dynamics to distributed optimization problems concerning least squares [7, 39] and support vector machines [40]. These problems are solved online over a network of wireless sensors or computing devices; in such settings the rate of convergence is a vital factor. In the following, the proposed dynamics (14)-(16) is employed to solve the distributed least squares [41] and distributed support vector machines [5, 6] problems.

3.1 Distributed Least Squares

Distributed least squares problems have been widely studied over recent years[42, 43, 12]. These techniques have found applications in parameter estimation over wireless sensor networks [44], estimation of electro-mechanical oscillation modes of large power system networks [41, 45] etc. Each agent in the network is given a task to simultaneously and iteratively compute the same least squares solution to the linear equation $Ax=b$ where $A\in\mathbb{R}^{r_{1}\times r_{2}}$ with $r_{1}>r_{2}$ and $b\in\mathbb{R}^{r_{1}\times 1}$ .

Formally, the least squares problem is defined as given below[46]:

[TABLE]

3.1.1 Data partitioning

It is assumed that each agent in the network holds $n_{r}=r_{1}/n$ consecutive rows of $A$ and $b$ . For the sake of simplicity, equal partitioning of the rows of $A$ is considered. However, the proposed approach would hold even if the partitioning is uneven.

[TABLE]

where $A_{i}\in\mathbb{R}^{n_{r}\times{l}}$ and $b_{i}\in\mathbb{R}^{n_{r}\times 1}$ .
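The row partitioning can be sketched in a few lines. The example below assumes randomly generated data and a least squares cost of the form $\frac{1}{2}\|Ax-b\|^{2}$ (the scaling is an assumption and does not affect the identity being checked); `partition_rows` is a hypothetical helper name.

```python
import numpy as np

def partition_rows(A, b, n):
    """Split A and b into n consecutive row blocks (uneven sizes are tolerated)."""
    return list(zip(np.array_split(A, n, axis=0), np.array_split(b, n, axis=0)))

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 4))    # assumed example data
b = rng.standard_normal(20)
blocks = partition_rows(A, b, 4)    # four agents, five rows each

# The global cost separates into the sum of the agents' local costs.
x = rng.standard_normal(4)
global_cost = 0.5 * np.sum((A @ x - b) ** 2)
local_cost = sum(0.5 * np.sum((Ai @ x - bi) ** 2) for Ai, bi in blocks)
```

This separability is what allows each agent to work only with its own block $(A_{i},b_{i})$ once the local copies of $x$ are forced to agree.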

3.1.2 Distributed formulation of least squares problem

The consensus-based distributed optimization formulation of (67) would require the local estimates $x_{1},x_{2},\ldots,x_{n}$ to reach consensus on the global optimizer $x^{*}$ . With data partitioning as defined above, the distributed version of the least squares problem (67)[41] is defined as

[TABLE]

3.1.3 Solution to the distributed least squares problem (69) using ADPDD

The Lagrangian problem corresponding to (69) can be defined as

[TABLE]

Similarly to (7), the proposed dynamics can be derived from (70) as given below:

[TABLE]

where $u_{H_{1}}=-(L\otimes I_{l})y_{H_{2}}$ and $u_{H_{2}}=(L\otimes I_{l})y_{H_{1}}$ .

3.1.4 Simulations

The simulation parameters are a randomly generated matrix $A\in\mathbb{R}^{100\times 80}$ and vector $b\in\mathbb{R}^{100\times 1}$ . The network, with a cyclic graph topology, comprises $4$ agents, each of which holds the component $A_{i}\in\mathbb{R}^{25\times 80}$ of $A$ as well as the respective $b_{i}$ . Each agent in the network computes a local estimate $x_{i}\in\mathbb{R}^{80}$ and reaches consensus on the global solution $x^{*}$ , as shown in Fig. 2. The simulations were carried out using $d_{iq}=0.1$ . The rate of convergence of (71) is compared with that of the non-adaptive version of the distributed primal-dual dynamics employed to solve the problem (69); it is significantly improved, as shown in Fig. 3. The global solution to (69) is also compared with the solution of the least squares solver $\mathrm{lsqlin}$ in $\mathrm{MATLAB}$ . The global optimizer $x^{*}_{1}=x^{*}_{2}=x^{*}_{3}=x^{*}_{4}$ obtained using the proposed algorithm coincides with the optimal solution $x^{*}$ obtained using $\mathrm{lsqlin}$ , as shown in Fig. 4.
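A much smaller experiment of the same kind can be sketched with forward-Euler integration. This is not the authors' simulation: the problem size, step size, and horizon are assumptions, and the weight update `d * ||x_i - x_q||^2` is only an assumed stand-in for the coupling law (13), whose exact form is not reproduced here. Setting `d=0` gives the non-adaptive dynamics with unit weights.

```python
import numpy as np

def simulate(A_blocks, b_blocks, E, d=0.0, dt=0.01, steps=20000):
    """Euler integration of a consensus primal-dual flow for least squares."""
    n, l = len(A_blocks), A_blocks[0].shape[1]
    H = np.stack([Ai.T @ Ai for Ai in A_blocks])                    # local Hessians
    g = np.stack([Ai.T @ bi for Ai, bi in zip(A_blocks, b_blocks)])
    x = np.zeros((n, l))            # row i is agent i's estimate
    alpha = np.zeros((n, l))        # consensus dual variables
    c = np.ones(E.shape[1])         # one coupling weight per edge
    for _ in range(steps):
        L = E @ (c[:, None] * E.T)                   # L = E C E^T
        grad = np.einsum('ijk,ik->ij', H, x) - g     # A_i^T (A_i x_i - b_i)
        edge_err = E.T @ x                           # row e is x_i - x_q
        x_dot = -grad - L @ x - L @ alpha
        alpha_dot = L @ x
        c_dot = d * np.sum(edge_err ** 2, axis=1)    # assumed adaptive law
        x, alpha, c = x + dt * x_dot, alpha + dt * alpha_dot, c + dt * c_dot
    return x, c

rng = np.random.default_rng(0)
n, l = 4, 3
A = rng.standard_normal((20, l))
b = rng.standard_normal(20)
A_blocks, b_blocks = np.array_split(A, n), np.array_split(b, n)

E = np.zeros((n, n))                # incidence matrix of a 4-node cycle
for e in range(n):
    E[e, e], E[(e + 1) % n, e] = 1.0, -1.0

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]         # centralized reference
x_fixed, _ = simulate(A_blocks, b_blocks, E, d=0.0)
x_adapt, c_final = simulate(A_blocks, b_blocks, E, d=0.1)
```

Storing the estimates as the rows of an $n\times l$ array makes `L @ x` equivalent to applying $L\otimes I_{l}$ to the stacked vector. Both runs should agree with the centralized solution; comparing the transients of the two runs is left to the reader, since the outcome depends on the assumed weight law.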

3.2 Quadratic-inequality Constrained Distributed Least Squares

A box-constrained linear least squares problem is the one in which the upper and lower bounds on the estimated values are incorporated to handle limitations of the physical system. These methods are studied with applications to GPS positioning [47], geodesic applications [48, 49, 50] etc. The box-constrained least squares problem is generally defined as follows:

[TABLE]

where $x_{l}$ and $x_{u}$ are the lower and upper bounds of the variable $x$ . It is known that a quadratic constraint formulation of the box-constrained least squares problem is an efficient approach to obtain the optimal solution of (73) [39]. The quadratic-constrained equivalent formulation of the box-constrained least squares problem (74) is given as:

[TABLE]

where $\bar{x}_{i}$ is the midpoint of the interval $[x_{l},x_{u}]$ . It is computed as $\bar{x}_{i}=(x_{l}+x_{u})/2$ with $\rho_{i}=(x_{u}-x_{l})/2$ .
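The midpoint and radius defined above can be checked with a short sketch. It assumes the quadratic constraint takes the componentwise form $(x_{i}-\bar{x}_{i})^{2}\leq\rho_{i}^{2}$ , which is equivalent to the box $x_{l}\leq x\leq x_{u}$ ; the function names and the bounds are illustrative.

```python
import numpy as np

def box_to_ball(x_l, x_u):
    """Midpoint x_bar and half-width rho of each interval [x_l, x_u]."""
    x_l, x_u = np.asarray(x_l, dtype=float), np.asarray(x_u, dtype=float)
    return (x_l + x_u) / 2.0, (x_u - x_l) / 2.0

def quad_constraint(x, x_bar, rho):
    """g(x) = (x - x_bar)^2 - rho^2 elementwise; g <= 0 iff x lies in the box."""
    return (np.asarray(x, dtype=float) - x_bar) ** 2 - rho ** 2

# Assumed example bounds for a two-dimensional variable.
x_l, x_u = np.array([-1.0, 0.0]), np.array([3.0, 2.0])
x_bar, rho = box_to_ball(x_l, x_u)

inside = quad_constraint([0.0, 1.5], x_bar, rho)     # point inside the box
outside = quad_constraint([3.5, 1.0], x_bar, rho)    # first coordinate too large
```

A smooth function of this form supplies the gradients $\nabla g_{j}$ that appear in the output $y_{H_{3}}$ .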

A distributed framework for the quadratic-constrained least squares problem (74) can be obtained as:

[TABLE]

The ADPDD formulation of the problem (75) is similar to that of the proposed dynamics (14)-(16). Hence, it is omitted to avoid repetition of the equations.

3.2.1 Simulations

For the sake of simplicity and readability of the simulation results, a small problem of the form (75) is taken as a proof of concept with the parameters $A\in\mathbb{R}^{20\times 4}$ and $b\in\mathbb{R}^{20\times 1}$ . A network with a cyclic graph topology containing $4$ agents is considered, wherein each agent holds the component $A_{i}\in\mathbb{R}^{5\times 4}$ of the matrix $A$ . All agents iteratively reach global consensus on the optimizer value $x^{*}$ with $d_{iq}=2$ , as shown in Fig. 5. It can be observed that the trajectories $x_{1},x_{2},x_{3}$ , and $x_{4}$ synchronize to their respective common trajectories at around $t\approx 0.03$ seconds. The result is also compared with the solution of $\mathrm{lsqlin}$ , and it can be seen from Fig. 6 that the global optimizer of (75) coincides with the solution obtained using $\mathrm{lsqlin}$ . The accelerated convergence of the proposed algorithm employed to solve (75) is evident from Fig. 7.

Remark 8.

A strong synchronization between the trajectories of the agents implies guaranteed convergence to the global optimizer under sparse communication events. It also indicates that the communication between the agents need not be periodic. The proposed algorithm can be augmented with the event-triggered control framework.

3.3 Distributed Support Vector Machines

Support vector machines (SVMs) are supervised learning based paradigms in the machine learning domain, used for classification and regression analysis on raw data, (see [40]). For applications with a huge amount of data, there are often limitations with respect to bandwidth requirement, data storage and processing capability of the computing machine, response time, etc. As it turns out, a single computing machine is inefficient in dealing with the SVM algorithm with large datasets. Distributed versions of support vector machines have been proposed as an alternative method to overcome these limitations, as discussed in [5, 6]. With the aim of enabling accelerated convergence to the optimal solution, the distributed SVM problem is formulated in terms of the adaptive primal-dual dynamics. However, due to the complexity involved with simulations of large-scale SVM problems, the present work only considers the mathematical formulation and does not provide the simulation results for the same.

A problem formulation of the support vector machines for the case of non-separable data is given below:

[TABLE]

where $\frac{1}{\|w\|}$ is the margin that separates positive and negative observations, $(x_{j},y_{j})\in S$ is a paired observation sample, and $w,b$ are weight and bias variables, respectively. $1-\xi_{j}-y_{j}(w^{T}x_{j}+b)$ is called the hinge loss function. $C$ is used to trade off the sum over all slack variables $\xi$ against the size of the margin. $p>0$ is the scaling factor.

3.3.1 Data Partitioning

It is assumed that the set of observations $S$ is horizontally partitioned and distributed among computing nodes in $\mathcal{G}(\mathcal{N},\mathcal{E})$ [6], where now $\mathcal{N}=\{1,\ldots,n\}$ represents the computing nodes and the set of edges $\mathcal{E}$ describes communication links between them. Assuming that the graph is connected and enabling only one-hop neighborhood communication, each node $i$ communicates with its neighbors belonging to $\mathcal{N}_{i}$ . Each node $i\in\mathcal{N}$ stores a sample set of labeled observations, denoted by $S_{i}=\{(x_{i1},y_{i1}),\ldots,(x_{im_{i}},y_{im_{i}})\}$ . Note that:

1. $S_{i}$ is the set of labeled observations allocated to the $i^{th}$ computing node, $S_{i}\subseteq S$ , where $S$ is the full set of labeled observations.

2. $x_{i}\in\mathbb{R}^{m_{i}\times 1}$ .

3. $y_{ij}\in\{-1,+1\}$ is a class label.

In what follows, an adaptive primal-dual dynamics based formulation of distributed support vector machines is provided.

3.3.2 ADPDD formulation of Distributed Support Vector Machines

A distributed version of the support vector machines problem (76) is formulated as given below (see, [5]):

[TABLE]

The objective function in (77) is twice differentiable $(C^{2})$ and strongly convex in $w$ . The decision (primal) variables are $w,b\in\mathbb{R}^{m}$ , where $w_{i}=w_{q},b_{i}=b_{q}$ are the consensus constraints with $q$ as a neighbor of $i$ if and only if $q\in\mathcal{N}_{i}$ . Let $h_{ij}(\xi_{ij},w_{i},b_{i})=1-\xi_{ij}-y_{ij}(w_{i}^{T}x_{ij}+b_{i})$ .
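The constraint function $h_{ij}$ can be evaluated directly. The sketch below assumes that feasibility means $h_{ij}\leq 0$ together with $\xi_{ij}\geq 0$ (the usual soft-margin convention); the sample, weights, and helper names are illustrative.

```python
import numpy as np

def h(xi, w, b, x, y):
    """h = 1 - xi - y * (w^T x + b); assumed feasible when h <= 0."""
    return 1.0 - xi - y * (np.dot(w, x) + b)

def smallest_feasible_slack(w, b, x, y):
    """Smallest xi >= 0 with h <= 0, i.e. max(0, 1 - y * (w^T x + b))."""
    return max(0.0, 1.0 - y * (np.dot(w, x) + b))

# Assumed example: one observation with label +1 inside the margin.
w, b = np.array([1.0, -1.0]), 0.5
x, y = np.array([0.2, 0.4]), 1
xi = smallest_feasible_slack(w, b, x, y)    # w^T x + b = 0.3, so xi = 0.7
residual = h(xi, w, b, x, y)                # constraint is tight at this slack
```

An observation that already satisfies $y(w^{T}x+b)\geq 1$ needs no slack, in which case the function returns $0$ .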

The Lagrangian formulation of the problem (77) is given by

[TABLE]

where $\theta_{ij},\mu_{ij}$ are the Lagrange multipliers associated with inequality constraints $h_{ij}(\xi_{ij},w_{i},b_{i})$ and $\xi_{ij}\geq 0$ , of $i^{th}$ computing node, and $\alpha_{i},\beta_{i}$ are the Lagrange multipliers associated with coupling constraints of $i^{th}$ and $q^{th},\forall q\in\mathcal{N}_{i}$ nodes. $L$ is the Laplacian matrix of the undirected graph $G$ .

Let $z=[w^{T},b^{T}]^{T}$ (with $z_{i}=[w_{i},b_{i}]$ , $l=2$ ) then, $e_{iq}=z_{i}-z_{q}$ . The interconnected network dynamics for the distributed support vector machines problem (77) is represented as follows:

[TABLE]

The subsystem $H_{2}$ contains only consensus-dual variables, with $u_{H_{2}}$ and $y_{H_{2}}$ as its input and output respectively, as given below:

[TABLE]

The subsystem $H_{3}$ contains the slack variable, and the dual variables corresponding to the inequality constraints, with $u_{H_{3}}$ and $y_{H_{3}}$ as its input and output respectively, as given below:

[TABLE]

where $\zeta,\eta,\mu\in\mathbb{R}^{n}$ , and $\zeta_{i}=\sum_{j=1}^{m_{i}}\theta_{ij}(-y_{ij}x_{ij})$ with $\eta_{i}=\sum_{j=1}^{m_{i}}\theta_{ij}(-y_{ij})$ .
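The quantities $\zeta_{i}$ and $\eta_{i}$ are the partial derivatives of $\sum_{j}\theta_{ij}h_{ij}$ with respect to $w_{i}$ and $b_{i}$ , since $\partial h_{ij}/\partial w_{i}=-y_{ij}x_{ij}$ and $\partial h_{ij}/\partial b_{i}=-y_{ij}$ . A minimal sketch of their computation for one node follows; the helper name `zeta_eta` and the sample values are hypothetical, and $x_{ij}$ is treated as a feature vector.

```python
import numpy as np

def zeta_eta(theta, X, y):
    """Return (zeta_i, eta_i) for one node.

    zeta_i = sum_j theta_ij * (-y_ij * x_ij)
    eta_i  = sum_j theta_ij * (-y_ij)
    """
    zeta = -(theta * y) @ X
    eta = -np.sum(theta * y)
    return zeta, eta

# Illustrative multipliers and local data for a single node i.
theta = np.array([0.5, 0.0, 1.0])
X = np.array([[2.0, 0.0], [0.0, 2.0], [0.2, 0.0]])
y = np.array([1.0, -1.0, 1.0])

zeta_i, eta_i = zeta_eta(theta, X, y)
```

Samples with $\theta_{ij}=0$ contribute nothing, so only samples with active margin constraints influence the primal update.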

Thus, the proposed dynamics can be implemented for solving the distributed support vector machines problem (77) as shown in (79)-(81). The solution of the underlying dynamics will correspond to the saddle-point solution of (78), wherein the primal solution is the optimal solution of (77).

In the following, two different formulations of (2) are considered and the results of the proposed dynamics are compared with that of the non-adaptive version of the distributed primal-dual dynamics.

3.4 Numerical Example 1

Consider the following distributed optimization problem consisting of $\mathrm{3}$ agents, each having more than one variable and convex inequality constraints.

[TABLE]

where the objective function associated with each agent is given below

[TABLE]

with the following local inequality constraints

[TABLE]

The graph connectivity is assumed to be as follows: $\mathcal{N}_{1}=1$ , $\mathcal{N}_{2}=2$ , and $\mathcal{N}_{3}=1$ . The ADPDD algorithm is employed to solve problem (82), and the corresponding trajectories are shown in Fig. 8. The primal optimizers are $\mathrm{(1.4099,0.8966)}$ . In addition, Fig. 9 plots the steady-state eigenvalues of $L\otimes I_{l}$ along with the eigenvalues of $L_{0}\otimes I_{l}$ . The eigenvalue $\lambda_{2}(L\otimes I_{l})$ at steady state equals $\mathrm{192.8079}$ , compared with $\lambda_{2}(L_{0}\otimes I_{l})=1$ for the non-adaptive DPDD. In view of Proposition 2.9 and Proposition 2.11, the adaptive synchronization increases the rate of convergence of the ADPDD.
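The baseline value $\lambda_{2}(L_{0}\otimes I_{l})=1$ can be checked numerically under one reading of the connectivity data. The sketch below assumes that the listed values are node degrees $(1,2,1)$ , i.e. a path graph $1-2-3$ with unit edge weights, and that $l=2$ ; both are assumptions, and $\lambda_{2}$ is taken as the smallest nonzero eigenvalue.

```python
import numpy as np

# Assumed topology: path graph 1 - 2 - 3 with unit weights (degrees 1, 2, 1).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L0 = np.diag(A.sum(axis=1)) - A            # graph Laplacian, eigenvalues 0, 1, 3

l = 2                                      # assumed number of variables per agent
L0_kron = np.kron(L0, np.eye(l))           # L0 (x) I_l, each eigenvalue repeated l times

eigenvalues = np.sort(np.linalg.eigvalsh(L0_kron))
lambda2 = eigenvalues[l]                   # skip the l zero eigenvalues
```

Under these assumptions `lambda2` evaluates to 1, matching the value quoted for the non-adaptive DPDD.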

3.5 Numerical Example 2

In this subsection, the local inequality constraints associated with each agent are relaxed and the following optimization problem is considered on a random graph with $\mathrm{10}$ agents as shown in Fig. 10. Note that the degree of each agent is selected randomly.

[TABLE]

with a randomly generated Hessian $\mathbb{H}=\mathrm{diag}([136,439,355,298,302,350,327,398,353,294])$ . The proposed dynamics is employed to solve (89), first with $d_{iq}=0.001$ and then with $d_{iq}=0.01$ . Fig. 11 and Fig. 12 correspond to the case $d_{iq}=0.001$ , while Fig. 13 and Fig. 14 correspond to the case $d_{iq}=0.01$ . Convergence is much faster in the latter case. This is due to the difference between the resulting eigenvalues: for $d_{iq}=0.001$ , the second smallest eigenvalue $\lambda_{2}(L\otimes I_{l})$ is $\mathrm{10.72}$ , whereas for $d_{iq}=0.01$ it increases to $\mathrm{33310}$ . The eigenvalue results for both values of $d_{iq}$ are shown in Fig. 12 and Fig. 14.
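The link between stronger coupling and a larger $\lambda_{2}$ can be illustrated with a small sketch. It does not reproduce the adaptive coupling law or the random graph of Fig. 10; it assumes a hypothetical 10-node ring graph and only shows that scaling all edge weights by a factor $c$ scales $\lambda_{2}$ by the same factor.

```python
import numpy as np

def ring_laplacian(n, weight=1.0):
    """Laplacian of an n-node ring graph with uniform edge weight (illustrative topology)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = weight
        A[(i + 1) % n, i] = weight
    return np.diag(A.sum(axis=1)) - A

def lambda2(L):
    """Second smallest eigenvalue of a symmetric Laplacian."""
    return np.sort(np.linalg.eigvalsh(L))[1]

n = 10
lam_weak = lambda2(ring_laplacian(n, weight=1.0))
lam_strong = lambda2(ring_laplacian(n, weight=10.0))
ratio = lam_strong / lam_weak   # uniform scaling of weights scales lambda_2 linearly
```

In the adaptive setting the weights grow non-uniformly along the trajectories, so this linear scaling is only a qualitative guide to why the steady-state $\lambda_{2}$ exceeds its initial value.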

4 Conclusions

In this paper, an adaptive distributed primal-dual dynamics is proposed to solve inequality and consensus constrained distributed optimization problems. The adaptive synchronization of the primal variables is brought into play by allowing the coupling weights to update according to both the difference between the local trajectories (trajectories belonging to neighboring nodes or agents) and the difference between their rates of change. It is proved that the proposed dynamics represents a network of feedback-interconnected passive dynamical systems which are asymptotically stable. Further, by allowing a time-scale separation between the adaptive coupling law and the primal dynamics, stronger convergence bounds for the primal dynamics are derived, and it is proved that the adaptively coupled primal dynamics converges to the unique primal optimizer.

The performance of the proposed dynamics is quantified in terms of the induced $L_{2}$ -gain from the disturbance input to the output. The effect of adaptive synchronization on the $L_{2}$ -gain is discussed, and it is established that the adaptive distributed primal-dual dynamics is comparatively less robust to exogenous input disturbances than the distributed primal-dual dynamics. The analysis also reveals that accelerated convergence to the saddle-point solution requires a trade-off between the convergence and the robustness parameters.

Future work will be directed towards improving the rate of convergence of the proposed dynamics without compromising its robustness properties. Its applications to large-scale distributed optimization problems such as distributed support vector machines [6] and distributed least squares [12] will also be considered.

Bibliography


[1] Michael Rabbat and Robert Nowak. Distributed optimization in sensor networks. In Proceedings of the 3rd international symposium on Information processing in sensor networks, pages 20–27. ACM, 2004.
[2] Bjorn Johansson, Cesare Maria Carretti, and Mikael Johansson. On distributed optimization using peer-to-peer communications in wireless sensor networks. In 2008 5th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, pages 497–505. IEEE, 2008.
[3] Alexander Bertrand and Marc Moonen. Consensus-based distributed total least squares estimation in ad hoc wireless sensor networks. IEEE Transactions on Signal Processing, 59(5):2320–2330, 2011.
[4] Peng Yi, Yiguang Hong, and Feng Liu. Distributed gradient algorithm for constrained optimization with application to load sharing in power systems. Systems & Control Letters, 83:45–52, 2015.
[5] Pedro A Forero, Alfonso Cano, and Georgios B Giannakis. Consensus-based distributed support vector machines. Journal of Machine Learning Research, 11(May):1663–1707, 2010.
[6] Marco Stolpe, Kanishka Bhaduri, and Kamalika Das. Distributed support vector machines: an overview. In Solving Large Scale Learning Tasks. Challenges and Algorithms, pages 109–138. Springer, 2016.
[7] Angelia Nedić and Ji Liu. Distributed optimization for control. Annual Review of Control, Robotics, and Autonomous Systems, 1:77–103, 2018.
[8] Daniel Pérez Palomar and Mung Chiang. A tutorial on decomposition methods for network utility maximization. IEEE Journal on Selected Areas in Communications, 24(8):1439–1451, 2006.
