Accelerated Distributed Primal-Dual Dynamics using Adaptive Synchronization
P. A. Bansode, K. C. Kosaraju, S. R. Wagh, R. Pasumarthy, N. M. Singh

TL;DR
This paper introduces an adaptive primal-dual dynamics with synchronization for distributed optimization, achieving accelerated convergence and robustness in multi-agent systems, demonstrated through applications to least squares and SVM problems.
Contribution
It presents a novel adaptive synchronization law that accelerates convergence of primal-dual dynamics in distributed optimization, with proven stability and robustness properties.
Findings
Achieves faster convergence rates compared to non-adaptive methods
Proves stability and passivity of the proposed dynamics
Demonstrates effectiveness on distributed least squares and SVM problems
P. A. Bansode is with the Department of Instrumentation Engineering, Ramrao Adik Institute of Technology, Mumbai, 400706 India. K. C. Kosaraju is with the Faculty of Science and Engineering, University of Groningen, 9747 AG Groningen, The Netherlands. S. R. Wagh is with the Department of Electrical Engineering, Veermata Jijabai Technological Institute, Mumbai, 400019 India. R. Pasumarthy is with the Department of Electrical Engineering, Indian Institute of Technology Madras, Madras, 600036 India. N. M. Singh is with the Department of Electrical Engineering, Veermata Jijabai Technological Institute, Mumbai, 400019 India.
Abstract
This paper proposes an adaptive primal-dual dynamics for distributed optimization in multi-agent systems. The proposed dynamics incorporates an adaptive synchronization law that reinforces the interconnection strength between the coupled agents. By strengthening the synchronization between the primal variables of the coupled agents, the given law accelerates the convergence of the proposed dynamics to the saddle-point solution. The resulting dynamics is represented as a feedback-interconnected networked system that proves to be passive. The passivity properties of the proposed dynamics are exploited, along with LaSalle’s invariance principle for hybrid systems, to establish asymptotic convergence and stability of the saddle-point solution. Further, the primal dynamics is analyzed for its rate of convergence and stronger convergence bounds are established; it is proved that the primal dynamics achieves accelerated convergence under the adaptive synchronization. The robustness of the proposed dynamics is quantified using $\mathcal{L}_2$-gain analysis, and the correlation between the rate of convergence and the robustness of the proposed dynamics is presented. The effectiveness of the proposed dynamics is demonstrated by applying it to solve distributed least squares and distributed support vector machines problems.
Keywords: Distributed optimization · Networked control system · Primal-dual dynamics · Adaptive synchronization
1 Introduction
Distributed optimization techniques have been the subject of substantial research for many years. Their applications include wireless sensor networks [1, 2, 3], power networks [4], and large-scale support vector machines [5, 6]. An exhaustive survey of these techniques can be found in [7]. Distributed optimization techniques are broadly categorized as either decomposition-based (see [8] and references therein) or consensus-based. Consensus-based distributed optimization, which has been explored extensively of late [9, 5, 4, 6, 10, 11, 12], is the prime subject of this paper.
Many algorithms have been proposed to solve consensus-based distributed optimization problems arising in networked systems, such as the seminal work on distributed sub-gradient methods [13], distributed primal-dual dynamical algorithms [4], and distributed gradient descent algorithms [14, 10]. Of these, the distributed primal-dual dynamics based algorithms deserve special attention because of their rich systems and control theoretic properties [15, 16, 17, 18, 19] and their ability to obtain both primal and dual optimal solutions simultaneously. The seminal work on the primal-dual dynamics, or saddle-point dynamics, dates back to the late 1950s [20, 21], while their application to solving optimization problems over a network first appeared in [15], with a focus on asymptotic convergence and stability of these algorithms. This framework was later extended to distributed optimization over a network of communicating nodes in [22, 4]. The primal-dual dynamics in [22] combines the decomposition and the consensus-based methods to propose proportional-integral distributed optimization for equality constrained optimization problems, and achieves a globally asymptotically stable saddle-point solution. The primal-dual gradient-based algorithm proposed in [4] achieves asymptotic convergence for a consensus-based distributed optimization problem with local inequality constraints, and is implemented for load-sharing control in power networks. The notion of asymptotic convergence and stability of the (distributed) primal-dual dynamics for distributed optimization is thus well established.
From the perspective of online optimization, it is necessary to certify distributed optimization algorithms on the basis of their rate of convergence as well as their stability. The asymptotic convergence of the primal-dual dynamics implies that the trajectories converge to the saddle-point solution as $t \to \infty$, which is not a sufficient notion when the algorithm solves the distributed optimization problem online. Lately, algorithms such as distributed gradient (sub-gradient) methods have been widely re-studied with the objective of improving the rate of convergence; see [14, 23, 24, 10, 25]. However, the distributed primal-dual dynamics has not yet been explored with the same objective, which could limit its application to large-scale distributed optimization problems. Existing methods for improving the rate of convergence of the primal-dual dynamics rely upon increasing the convexity of the objective function by using quadratic penalty terms (augmented Lagrangian techniques) [18], but their use for solving distributed optimization problems destroys the distributed structure of the objective function. Thus, increasing convexity by using quadratic penalties may not be a suitable way of improving the rate of convergence of the distributed primal-dual dynamics. An alternative route is to exploit the graph-Laplacian properties of the underlying network and use adaptive coupling gains between the nodes to improve the convergence results. Addressing this issue, the present work primarily contributes to the accelerated convergence of the distributed primal-dual dynamics.
1.1 Relevant literature and contributions
The work proposed in this paper is in the same spirit as the recent articles [4, 19]. In [4], the framework of primal-dual dynamics for network utility maximization [26], which uses a Krasovskii-type Lyapunov function to derive asymptotic convergence, is extended to distributed optimization with application to load-sharing control in power systems. Our contribution differs significantly from [4] in that the proposed dynamics is first analyzed using passivity tools of dynamical systems, which then lead to its asymptotic stability when combined with LaSalle’s invariance principle for hybrid systems [27]. The advantage of passivity-based stability analysis is that the proposed dynamics can be realized as a negative feedback interconnection of the primal and the dual subsystems. This also helps in understanding the interaction between the primal and the dual dynamical subsystems through their inputs and outputs. Each subsystem thus also enjoys the stability properties of feedback-connected dynamical systems. This feature later aids the robustness analysis of the proposed dynamics using $\mathcal{L}_2$-gains. The fundamental results on passivity-based stability analysis of the primal-dual dynamics are established in [19]. Our work, in a way, extends these results to consensus-based distributed optimization problems. However, the primal dynamical subsystem derived in this paper does not use the Brayton-Moser framework [28] to arrive at the optimal solution.
The central theme of the paper, the adaptively coupled primal-dual dynamics, is derived by integrating the consensus protocol in the distributed primal-dual dynamics with the adaptive coupling laws motivated by the results in [29]. In [29], the adaptive synchronization technique is proved to guarantee synchronization between the trajectories of diffusively coupled agents of a multi-agent system. This technique is essentially based upon modifying the coupling weights of the diffusively coupled agents in accordance with the synchronization error between them. Larger synchronization errors increase the coupling weights, and vice versa. In this paper, it is shown that the adaptation of the coupling weights strengthens the synchronization of the primal variables of the coupled agents. With this, the proposed work establishes results on accelerated convergence of the proposed dynamics to the saddle-point solution. While the adaptive synchronization is proved to accelerate the convergence, it is also shown to affect the robustness of the proposed dynamics. By introducing exogenous inputs in the interconnected network dynamics of the primal-dual subsystems, the $\mathcal{L}_2$-gain of the proposed dynamics is analyzed and the worst-case $\mathcal{L}_2$-gain is quantified in correlation with the rate of convergence. Although it is well known that an interconnected network of passive dynamical systems is inherently robust to exogenous inputs [30], our results quantify the $\mathcal{L}_2$-gain margins and establish a relation between these margins and the rate of convergence.
To summarize, the proposed work encompasses the following key points:
1. The proposed algorithm, designated hereafter as the adaptively synchronized distributed primal-dual dynamics (ADPDD), ensures synchronization of the network-wide primal variables to a common trajectory, which is then driven to the optimal solution.
2. The ADPDD is posed as a negative feedback interconnection of the primal dynamical subsystem and the dual dynamical subsystems. It is proved that these subsystems remain individually passive, which subsequently ensures the passivity and the asymptotic stability of the proposed dynamics.
3. The convergence rate of the ADPDD is established, and it is proved that the ADPDD converges faster than the distributed primal-dual dynamics (DPDD).
4. By allowing time-scale separation between the adaptive coupling laws and the primal-dual dynamics, the adaptively coupled distributed primal dynamics is proved to converge to the optimal solution faster than the conventional distributed primal dynamics.
5. The $\mathcal{L}_2$-gain analysis of the proposed dynamics against exogenous disturbances is presented to show the correlation between the rate of convergence and the robustness of the proposed algorithm.
6. The applicability of the proposed algorithm to distributed least squares and distributed support vector machines problems is discussed.
1.2 Notations and Preliminaries
The set $\mathbb{R}$ (respectively $\mathbb{R}_{\geq 0}$ or $\mathbb{R}_{> 0}$) is the set of real (respectively non-negative or positive) numbers. $I$ is the identity matrix. $\mathbf{0}$ is a zero vector of appropriate dimensions. For a square matrix $M$, $\lambda_i(M)$ represents the eigenvalues of $M$ in ascending order. The smallest eigenvalue of $M$ is given by $\lambda_1(M)$ and the second smallest eigenvalue is given by $\lambda_2(M)$. If $P$ and $Q$ are real matrices, then $P \otimes Q$ is a block matrix that defines the Kronecker product of $P$ and $Q$.
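The interaction topology in a multi-agent system is represented using an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $\mathcal{V}$ as the set of agents and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ as the set of edges. The neighbor set of the agent $i$ is $\mathcal{N}_i = \{ j \in \mathcal{V} : (i, j) \in \mathcal{E} \}$, where $i \in \mathcal{V}$. The number of agents $n$ is the cardinality of $\mathcal{V}$. Let $D$ be the degree matrix of $\mathcal{G}$ and $A$ be the adjacency matrix of $\mathcal{G}$, with elements $a_{ij}$; then $L = D - A$ is the Laplacian matrix of $\mathcal{G}$. By definition, $L$ is a symmetric positive semidefinite matrix that encodes the connectivity of the agents and their interaction topology in $\mathcal{G}$.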
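As a concrete illustration of these definitions, the following minimal sketch builds the Laplacian of a small undirected graph and checks the stated properties numerically. The helper name `laplacian`, the 4-cycle example graph, and the unit edge weights are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def laplacian(n, edges, weights=None):
    """Weighted Laplacian L = D - A of an undirected graph on nodes 0..n-1.

    `edges` is a list of (i, j) pairs; `weights` defaults to all ones.
    (Hypothetical helper, for illustration only.)
    """
    if weights is None:
        weights = np.ones(len(edges))
    A = np.zeros((n, n))
    for (i, j), w in zip(edges, weights):
        A[i, j] = A[j, i] = w          # undirected graph: symmetric adjacency
    D = np.diag(A.sum(axis=1))         # degree matrix
    return D - A

cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-cycle
L_cycle = laplacian(4, cycle_edges)
eigs = np.linalg.eigvalsh(L_cycle)     # eigenvalues in ascending order
print(eigs)
```

The smallest eigenvalue is zero (with the all-ones eigenvector), and for a connected graph the second smallest eigenvalue is strictly positive.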
If $f(x)$ is continuously differentiable in $x$, then $\nabla_x f(x)$ is the gradient of $f$ with respect to $x$. If $f(x)$ is twice continuously differentiable and strictly convex in $x$, then $\nabla^2_x f(x)$ is a symmetric positive definite matrix of second-order partial derivatives of $f$ with respect to $x$.
Consider the following dynamical system
[TABLE]
where state , input , and output , with (of appropriate dimensions) sufficiently smooth and satisfying .
Definition 1.1 ([31]).
The system (1) is said to be passive if there exists a positive semidefinite storage function (Lyapunov function) $S(x)$, continuously differentiable in $x$, such that $\dot{S} \leq u^{\top} y$.
For scalars $a$ and $b$, $[a]^{+}_{b} = a$ if $b > 0$ or $a > 0$, and $[a]^{+}_{b} = 0$ otherwise.
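The scalar projection just described can be sketched in a few lines. The function name `positive_projection` and the argument order (first the quantity being projected, then the variable whose sign gates the projection) are assumptions made for illustration.

```python
def positive_projection(a, b):
    """Return a if b > 0 or a > 0, and 0 otherwise (scalar arguments)."""
    return a if (b > 0 or a > 0) else 0.0

# A variable sitting at the boundary (b = 0) is not pushed negative ...
print(positive_projection(-1.5, 0.0))
# ... while an interior variable (b > 0) may decrease freely,
print(positive_projection(-1.5, 2.0))
# and any variable may increase.
print(positive_projection(0.7, 0.0))
```

Used as the right-hand side of a multiplier update, this rule keeps a multiplier that starts non-negative from leaving the non-negative orthant in continuous time; a discrete-time implementation would typically also clip the state at zero.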
The remainder of the paper is mainly divided into two sections. Section 2 discusses the main results of the paper and Section 3 presents examples to validate the proposed work. Section 2 is organized as follows: Subsection 2.1 describes the consensus-based distributed optimization problem. In Subsection 2.2.1 the adaptive synchronization technique is elaborated. Subsection 2.2.2 formulates the adaptive distributed primal-dual dynamical algorithm to solve the distributed optimization problem posed in Subsection 2.1. Subsections 2.3 and 2.4 present the passivity and stability analysis of the proposed dynamics. In Subsection 2.5 the convergence bounds of the proposed algorithm are obtained and the proof of its accelerated convergence is provided. Subsection 2.6 provides the $\mathcal{L}_2$-gain analysis of the proposed dynamics and establishes a correlation between its robustness and its rate of convergence. Section 3 presents the application of the proposed dynamics to the distributed least squares and the distributed support vector machines problems. Some numerical examples of academic interest are also discussed. Section 4 concludes the paper.
2 Problem Formulation and main results
2.1 Distributed Optimization
Consider the following distributed optimization problem
[TABLE]
where and . It is assumed that each function is twice differentiable and strongly convex, and is convex. The optimization problem (2) can be decomposed into subproblems wherein each subproblem minimizes the cost subject to the consensus constraint and inequality constraints . The problem (2) cannot be fully decoupled into a set of subproblems because of the consensus constraints, but it can be addressed as a network-based multi-agent optimization problem using graph theory as a tool. Let an undirected and connected graph describe the communication topology of the underlying network, where denotes the set of agents or subproblems, and denotes the set of communication links. Each agent minimizes a local cost function subject to the consensus constraints and the local inequality constraints . The global consensus corresponds to the optimal solution of (2), when . The index is the number of inequality constraints associated with the scalar .
The strong duality of (2) is subject to the convexity of and the constraint satisfaction given by Slater’s condition (see [32]), which is as follows: Assuming that there exists an such that , then is strictly feasible, where is the domain of (2) defined as . The strong convexity of implies the uniqueness of its optimal solution .
Assumption 1.
is strongly convex and -smooth, i.e., for all ,
[TABLE]
The Lagrangian function of the problem (2) is given by:
where is a Lagrange multiplier associated with the consensus constraint and is a Lagrange multiplier associated with the inequality constraint . The vector notations of the respective Lagrange multipliers are and .
Remark 1.
Assuming that Slater’s condition is satisfied and strong duality holds, the saddle-point satisfies the Karush-Kuhn-Tucker (KKT) conditions derived from the Lagrangian of problem (2), as follows:
[TABLE]
In order to ensure the global consensus of the states , the Lagrangian function of problem (2) defined above is augmented with the term . The augmented Lagrangian function is defined below:
[TABLE]
Remark 2.
Note that augmenting the Lagrangian of problem (2) with does not affect its convexity-concavity properties. This owes to the fact that is a positive semidefinite function of the primal variable . Thus the saddle-point satisfying (4) also satisfies the following KKT conditions for the Lagrangian (5):
[TABLE]
Using the augmented Lagrangian (5), the primal-dual dynamics is derived as follows:
[TABLE]
With the primal-dual dynamics derived as given in (7), the following subsection develops the ADPDD.
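To make the structure of a consensus-based primal-dual flow concrete, the following minimal sketch integrates one with forward Euler on a toy problem. Everything specific here is an assumption for illustration: the quadratic local costs $f_i(x_i) = \tfrac{1}{2}(x_i - c_i)^2$, the 4-cycle network, the omission of inequality constraints, and the particular right-hand sides (gradient descent in $x$ on a Lagrangian with consensus constraint $Lx = 0$ and quadratic term $\tfrac{1}{2}x^{\top}Lx$, gradient ascent in the multiplier). It is not claimed to reproduce (7) term by term.

```python
import numpy as np

# Laplacian of a 4-cycle with unit weights (illustrative network).
L = np.array([[ 2., -1.,  0., -1.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [-1.,  0., -1.,  2.]])
c = np.array([1., 2., 3., 6.])   # local data; f_i(x_i) = 0.5 * (x_i - c_i)**2

def simulate_dpdd(L, c, dt=0.01, steps=3000):
    """Forward-Euler integration of an assumed consensus primal-dual flow."""
    x = np.zeros_like(c)      # primal variables, one per agent
    lam = np.zeros_like(c)    # multipliers for the consensus constraint L x = 0
    for _ in range(steps):
        grad_f = x - c                        # stacked local gradients
        x_dot = -grad_f - L @ lam - L @ x     # descent in the primal variable
        lam_dot = L @ x                       # ascent in the consensus multiplier
        x, lam = x + dt * x_dot, lam + dt * lam_dot
    return x, lam

x_final, lam_final = simulate_dpdd(L, c)
print(x_final, c.mean())
```

For these costs the consensus optimum is the average of the $c_i$, so all agents should end up close to `c.mean()`.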
2.2 Adaptively Synchronized Distributed Primal-dual dynamics
The following subsection presents the adaptive synchronization mechanism which is later integrated with the dynamics defined in (7) to arrive at ADPDD.
2.2.1 Adaptive synchronization
The adaptive synchronization mechanism has been widely used in multi-agent systems to guarantee synchronization between the agents with respect to their state variables [29, 33], which is explained subsequently.
The primal variables associated with each agent evolve according to
[TABLE]
as described in (7). By performing gradient descent on (5), the primal dynamics (8) can be further derived as:
[TABLE]
Let correspond to the following term in (9):
[TABLE]
where the interconnection strength or the coupling weight belongs to the adjacency matrix such that
[TABLE]
Equation (10) is widely regarded as the consensus protocol or the consensus law [29, 34]. Further, defining , the consensus protocol (10) can be modified to accommodate as given below:
[TABLE]
Similarly,
[TABLE]
is a compact form representation of (11).
If and are neighbors in with defined as the local synchronization error, then the coupling weight can be represented as a function of , i.e. , where monotonically increases in . This yields a stronger synchronization between the primal variables of the coupled agents, which motivates incorporating adaptive synchronization to address the convergence rate of the distributed primal-dual dynamics. In line with this, the following coupling weight update rule is proposed:
[TABLE]
where is the adaptive gain constant.
Remark 3.
Represent (13) in the form . Throughout the rest of the paper, it is assumed that the real-valued function is Lipschitz continuous.
The dynamics (13) addresses two questions, viz. how far from each other the local primal variables are and how fast they can be synchronized to a common trajectory. The quadratic appearance of and in (13) ensures that it is monotonically increasing in .
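The idea of an error-driven coupling weight can be sketched as follows. The specific update $\dot{a}_{ij} = \alpha\,(x_i - x_j)^2$ used below is an assumed stand-in, chosen only because it is quadratic in the primal variables and non-decreasing in the synchronization error; the exact right-hand side of (13) may differ. The helper name `update_weights` and the numerical values are hypothetical.

```python
import numpy as np

def update_weights(a, x, edges, alpha, dt):
    """One forward-Euler step of da_ij/dt = alpha * (x_i - x_j)**2 (assumed form)."""
    err = np.array([x[i] - x[j] for i, j in edges])   # local synchronization errors
    return a + dt * alpha * err**2

edges = [(0, 1), (1, 2), (0, 2)]
x = np.array([0.0, 1.0, 3.0])        # current primal variables of three agents
a0 = np.ones(len(edges))             # initial coupling weights
a1 = update_weights(a0, x, edges, alpha=0.5, dt=0.1)
print(a1)
```

Edges whose endpoints disagree more receive a larger weight increase, and an edge with zero synchronization error keeps its weight.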
2.2.2 Integrating the adaptive coupling law (13) with the primal-dual dynamics (7)
Integrating the adaptive coupling law (13) with the PDD (7) and partitioning the resulting dynamics into three interconnected subsystems, i.e., (primal partition), (consensus dual partition), and (inequality dual partition), as shown in Fig. 1, yields:
[TABLE]
The system represents the dynamics in the stacked vector form with and as its input and output respectively, as given below:
[TABLE]
where and , and .
The ADPDD (14)-(16) is characterized as a feedback-interconnected networked system, as shown in Fig. 1. Each agent in the underlying network is diffusively coupled with its neighboring agents by virtue of the communication topology that defines the interaction between such agents on the graph. Note that the network representation in Fig. 1 is independent of graph parameters such as the communication topology, the number of agents, and the interaction links. Irrespective of such parameters, if the graph is connected, one can arrive at the stability results of the underlying network by verifying only its passivity properties. To this end, the following subsection first motivates the passivity analysis of the network shown in Fig. 1, which further leads to its closed-loop stability and robustness analysis.
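A minimal simulation sketch of an adaptively coupled primal-dual flow is given below. It is a sketch under stated assumptions rather than the dynamics (14)-(16) themselves: local costs are $f_i(x_i) = \tfrac{1}{2}(x_i - c_i)^2$, inequality constraints are omitted, the adaptive weights enter only the primal coupling term while the dual coupling uses the initial Laplacian, and the weight law is the assumed form $\dot{a}_{ij} = \alpha (x_i - x_j)^2$. All function and variable names are hypothetical.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-cycle (illustrative network)
c = np.array([1.0, 2.0, 3.0, 6.0])         # f_i(x_i) = 0.5 * (x_i - c_i)**2

def weighted_laplacian(n, edges, a):
    """Laplacian of the graph whose k-th edge carries coupling weight a[k]."""
    L = np.zeros((n, n))
    for (i, j), w in zip(edges, a):
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def simulate_adpdd(edges, c, alpha=1.0, dt=0.01, steps=4000):
    """Forward-Euler integration of the assumed adaptively coupled flow."""
    n = len(c)
    x, lam = np.zeros(n), np.zeros(n)
    a = np.ones(len(edges))                   # initial coupling weights
    L0 = weighted_laplacian(n, edges, a)      # fixed Laplacian for the dual coupling
    weight_history = [a.copy()]
    for _ in range(steps):
        La = weighted_laplacian(n, edges, a)  # Laplacian with adapted weights
        err = np.array([x[i] - x[j] for i, j in edges])
        x_dot = -(x - c) - L0 @ lam - La @ x  # primal partition
        lam_dot = L0 @ x                      # consensus dual partition
        a_dot = alpha * err**2                # assumed adaptive coupling law
        x, lam, a = x + dt * x_dot, lam + dt * lam_dot, a + dt * a_dot
        weight_history.append(a.copy())
    return x, lam, np.array(weight_history)

x_ad, lam_ad, weights = simulate_adpdd(edges, c)
print(x_ad, weights[-1])
```

In this sketch the weights can only grow, and they stop growing once the agents reach consensus; the primal variables settle at the average of the $c_i$, which is the consensus optimum for these costs.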
2.3 Passivity based stability analysis of ADPDD
This section begins with the passivity analysis of the subsystems , , and their feedback interconnection shown in Fig. 1, and then moves towards the stability and robustness analysis of the said feedback interconnection. A Krasovskii-type storage function is defined for each subsystem (see [15]), which leads to a new passivity property with differentiation at both ports [35, Proposition 2]. The intuition behind this proposition is to define the Krasovskii-type storage function for the dynamical system defined in (1), such that , where and are considered as port variables. This inequality shows that the map from the port input to the port output is passive. Motivated by this result, it is subsequently shown that the ADPDD is a passive system.
2.3.1 is passive
Proposition 2.1.
Assuming that the graph is connected and is strictly convex in , if there exists that satisfies (4), then the subsystem is passive with port variables .
Proof.
Let with defined as follows:
[TABLE]
where is a parameter to be selected.
Consider the following storage function for the update law (13) [29].
[TABLE]
Differentiating (18) with respect to time yields the following:
[TABLE]
Acknowledging the graph symmetry and substituting for , (19) modifies to
[TABLE]
Now, consider the following storage function for , which is a sum of Krasovskii-type storage function of and (18):
[TABLE]
Differentiating (21) with respect to time and using (20) yields,
[TABLE]
Notice that and choosing makes the term in (22) negative definite. Since for a non-negative value of , the inequality (22) implies that the subsystem is output strictly passive (OSP) [30] with respect to the port variables and . ∎
2.3.2 is passive
Proposition 2.2.
Assuming that the graph is connected and is strictly convex in , if there exists satisfying (4), then the subsystem is passive with port variables .
Proof.
Consider a Krasovskii-type storage function for as given below:
[TABLE]
Differentiating (23) with respect to time yields,
[TABLE]
(24) yields the following inequality,
[TABLE]
Hence, the subsystem is passive with respect to port variables and . ∎
2.3.3 is passive
In the following, is modeled as a switched dynamical system.
The dynamics in (16) becomes discontinuous when and . The value of switches from to [math]. To further clarify that, (16) is reformulated below as given in Kose [20].
[TABLE]
From (26), the projection is seen to be active for the second case. Let and be an arbitrary switching signal. Then
[TABLE]
represents the switching time instances when there is an active projection. Considering (27), the inequality constraint dynamics given in (16) takes the form of a switched system:
[TABLE]
where . Let be the Lyapunov function associated with . It is defined as given below:
[TABLE]
Proposition 2.3.
The subsystem is passive with port input , and port output for each pair of switching time instances corresponding to (28) where such that and for .
Proof.
Differentiating (29) with respect to time yields,
[TABLE]
[TABLE]
Thus,
[TABLE]
(32) ensures that the switched system (28) represents a finite family of passive systems. However, it must be ensured that the Lyapunov function does not increase during the switching events. In line with this, the following two cases are considered:
1. It may happen for some in (28) that the function goes from negative to positive through $0$. This will cause the Lyapunov function to change from to . If that happens, the Lagrangian multiplier will add a new term to . Since is continuous in time, (32) holds for as well as . Hence, .
2. In this case the projection of the constraint for a given becomes active, i.e., reaches $0$ from a positive value for the constraint of the machine. Hence, the corresponding term of the Lyapunov function will disappear. In turn, the following inequality will be satisfied: .
Hence, in both cases, the Lyapunov function is non-increasing. ∎
2.4 Stability analysis of the feedback interconnection shown in Fig. 1
Proposition 2.4.
Let then the interconnected network dynamics (14)-(16) is passive from the input to the output .
Proof.
Let be the candidate Lyapunov function for the interconnected system represented in Fig. 1 such that
[TABLE]
Differentiating (33) and using (22), (24), (31) yields
[TABLE]
Thus, the interconnected network dynamics (14)-(16) is passive, if strictly holds.
The following result establishes the boundedness of the trajectories of (14)-(16).
Proposition 2.5.
The trajectories of (14)-(16) are bounded for any finite initial conditions.
Proof.
To show that the trajectories of (14)-(16) are bounded, consider the following storage function:
[TABLE]
where is the storage function defined in (18). Differentiating (36) with respect to time yields
[TABLE]
Note that because and as confirmed by (28). Using first order condition of convexity-concavity of the Lagrangian function (5) and replacing by right-hand side of (20), (38) modifies to the following:
[TABLE]
Since is the saddle-point of (5), with yields the following
[TABLE]
which is sufficient to ensure that the trajectories of (14)-(16) are bounded. ∎
In what follows, the asymptotic stability of the saddle-point solution of (14)-(16) is established. To this end, the underlying networked dynamics is represented as a hybrid system wherein , are represented as continuous-time dynamical systems and is represented as a system with right-hand side discontinuity. The framework of LaSalle’s invariance principle for hybrid dynamical systems (see, [27]) is stated below, which in our case provides a useful result on the convergence of (14)-(16) to the saddle point solution that satisfies (4).
Proposition 2.6.
Consider the hybrid networked dynamics (14)-(16) and let , and be compact and positively invariant. Assuming that the Lyapunov function defined in (33) is continuously differentiable and along the trajectories of , every trajectory in converges to , where is a maximal positive invariant set of such that
1. for a fixed .
2. for a switching instance between and .
∎
Proposition 2.6 gives the next result on the convergence of (14)-(16) to the saddle point solution that satisfies the conditions in (4).
Proposition 2.7.
The hybrid network dynamics (14)-(16) converges to the saddle point solution satisfying (4).
Proof.
From Proposition 2.6, for a fixed , . Thus the primal as well as dual dynamics in (14)-(16) converge to the saddle point solution contained within the set . If then . However, if , then will penalize the constraint violation by rising to a large value. Since all trajectories are bounded, it contradicts the continuity of , thus . To this end, the solutions of (14)-(16) also satisfy the KKT conditions (4) and yield the saddle point solution . ∎
Choosing and using (12), (35) modifies to
[TABLE]
Proposition 2.8.
The saddle point solution of (5) is asymptotically stable.
Proof.
The proof is straightforward from Proposition 2.4 and Proposition 2.7 and (40). ∎
In the recent article [36], the global asymptotic stability of the primal-dual dynamics is proved by using a Lyapunov function similar to the sum of the Krasovskii-type Lyapunov function (33) and the Lyapunov function defined in (36). This result can be extended to the global asymptotic stability of the saddle-point of (5).
Remark 4.
Let denote the candidate Lyapunov function for the ADPDD (14)-(16), given as the sum of the candidate Lyapunov functions (33) and (36) as follows:
[TABLE]
If Assumption 1 holds, then the trajectories of (14)-(16) converge to the saddle-point , which is globally asymptotically stable. The proof of this remark is similar to the proof of [36, Theorem 5.1] and is therefore omitted to avoid repetition.
With the global asymptotic stability of the proposed dynamics (14)-(16) established, the subsequent section addresses its rate of convergence and compares it with that of the primal-dual dynamics without adaptive weights.
2.5 Accelerated convergence using ADPDD
Let define the set of coupling weights, and define the cardinality of the edge set . Then, in view of its definition, the Laplacian matrix is a parameter varying, real and symmetric matrix, which is differentiable and uniformly continuous on . As a consequence, the following hold:
Statement 1.
There exists such that the spectral norm .
Statement 2.
The gradient of with respect to is bounded above by some scalar , .
Let be the Laplacian matrix of whose coupling weights are constant parameters, then results in a constant matrix.
Proposition 2.9.
If the coupling weights evolve according to the law (13), then the following holds :
[TABLE]
Proof.
If is the incidence matrix of the undirected graph , then the Laplacian matrices and can be written as:
[TABLE]
where is a diagonal matrix containing the coupling weights. To prove (42), it is first proved that .
[TABLE]
For an undirected graph , . Then , . Hence,
[TABLE]
in fact, is a diagonal matrix with the coupling weights , thus . Thus from the above reasoning, and (46),
[TABLE]
[TABLE]
Let be the eigenvalue in the ordered-pair of eigenvalues represented below:
[TABLE]
Then according to Courant-Fischer theorem [37],
[TABLE]
where is the eigenvector (vector of all ones) corresponding to the eigenvalue . Thus for ,
[TABLE]
∎
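The incidence-matrix factorization and the eigenvalue ordering used in this proof can be checked numerically. In the sketch below, the incidence matrix `B`, the 4-cycle graph, and the particular weight vectors are illustrative assumptions; the only structural assumption is that the adapted weights are elementwise no smaller than the constant ones, so that the difference of the two Laplacians is positive semidefinite.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]    # 4-cycle (illustrative network)
n, m = 4, len(edges)

# Oriented incidence matrix: column k has +1 and -1 at the endpoints of edge k.
B = np.zeros((n, m))
for k, (i, j) in enumerate(edges):
    B[i, k], B[j, k] = 1.0, -1.0

def laplacian_from_weights(w):
    """Laplacian written as B W B^T, with W the diagonal matrix of edge weights."""
    return B @ np.diag(w) @ B.T

w_const = np.ones(m)                        # constant coupling weights
w_adapt = np.array([1.5, 1.0, 2.0, 1.2])    # elementwise >= w_const (assumed)

L_const = laplacian_from_weights(w_const)
L_adapt = laplacian_from_weights(w_adapt)

gap_eigs = np.linalg.eigvalsh(L_adapt - L_const)   # should be non-negative
lam_const = np.linalg.eigvalsh(L_const)            # ascending order
lam_adapt = np.linalg.eigvalsh(L_adapt)
print(lam_const, lam_adapt)
```

Because the difference of the two Laplacians is positive semidefinite, every ordered eigenvalue of the adapted Laplacian, including the second smallest, is at least as large as its constant-weight counterpart.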
Proposition 2.10.
If the coupling weights evolve according to (13), then the following always holds:
[TABLE]
Proof.
The proof simply follows from the inequality (48). Taking the ratio of the ordered pair of eigenvalues of and , yields the following:
[TABLE]
But, for , the inequality (52) strictly holds. Thus
[TABLE]
∎
Propositions 2.9 and 2.10 can be further used to prove that the adaptive primal-dual dynamics has an accelerated, yet bounded, convergence rate as compared to the conventional primal-dual dynamics.
Proposition 2.11.
If the inequality (50) holds, then the primal dynamics in (14), under the adaptive coupling law (13), achieves accelerated convergence.
Proof.
Below a timescale separation is enforced in the dynamics of the primal subsystem ,
[TABLE]
with ensuring that the primal variable evolves faster than the coupling weights .
The primal subsystem has two control inputs . To study the primal dynamics with respect to in (12), consider the primal subsystem when is at steady state or equal to $0$. Assuming that the coupling weight dynamics is much slower, the primal dynamics is rewritten as:
[TABLE]
where .
Using Assumption 1, it can be proved that the primal dynamics (56) is strongly monotone for all by evaluating the Jacobian of , i.e. , where is the modulus of convexity of (from Assumption 1). Since , the Jacobian is symmetric and positive definite . This proves that is strongly monotone, by virtue of which the primal dynamics (56) converges to the unique optimizer . Thus, uniqueness of the primal optimizer is preserved under the adaptive coupling law (13).
The following result establishes the accelerated convergence of (56) with respect to the unique optimizer . Define the Lyapunov candidate function as given below:
[TABLE]
Differentiating with respect to time ,
[TABLE]
where . Therefore,
[TABLE]
or
[TABLE]
Further, since the primal-dual dynamics has a bounded convergence with respect to the saddle point solution (see Proposition 39), using Assumption 3 and Remark 4, every initial condition approaches the optimal solution faster than under the conventional (non-adaptive) dynamics. Thus the accelerated convergence holds globally. Considering the upper bound on as given in (53), let and . Then it is seen that . Hence proved. ∎
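The generic step behind a Lyapunov rate argument of this kind can be sketched as follows. The symbols $F$ and $c$ are introduced here only for illustration and are not taken from the paper: $F$ stands for a vector field that is strongly monotone with modulus $c > 0$ and vanishes at $x^*$. This is a sketch under those assumptions, not a reconstruction of the displayed equations in the proof.

```latex
\begin{aligned}
&\dot{x} = -F(x), \qquad F(x^*) = 0, \qquad
 (x-y)^\top\bigl(F(x)-F(y)\bigr) \ge c\,\|x-y\|^2 \quad \forall\, x,y, \\
&V(x) = \tfrac{1}{2}\|x-x^*\|^2
 \;\Rightarrow\;
 \dot V = -(x-x^*)^\top\bigl(F(x)-F(x^*)\bigr) \le -c\,\|x-x^*\|^2 = -2c\,V, \\
&V(t) \le e^{-2ct}\,V(0)
 \;\Rightarrow\;
 \|x(t)-x^*\| \le e^{-ct}\,\|x(0)-x^*\|.
\end{aligned}
```

Under these assumptions, a larger monotonicity modulus $c$ yields a faster exponential bound, which is the mechanism by which a strengthened coupling term can tighten a convergence estimate.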
Remark 5.
The ADPDD (14)-(16) achieves accelerated convergence to the saddle point solution that satisfies the KKT conditions (2).
Proof.
The proof follows from Proposition 2.7 and Proposition 2.11. The occurrence of the primal optimizer and the dual optimizers is simultaneous.
Recall from (14)-(16) that , and , which also implies that and . Thus accelerated convergence of (14) to the primal optimizer implies the accelerated convergence of both (15) and (16) to the dual optimizers . ∎
Remark 6.
Since the adaptive synchronization in the primal variables results in an accelerated convergence to the primal optimizer , the synchronization error remains lower than that of DPDD for all time. This significantly reduces the consensus constraint violation, and thus the dual variable pertaining to the consensus constraints does not become unnecessarily large for the ADPDD problem. The uniqueness of the primal optimizer for both Lagrangian functions (5) and (LABEL:dlag1) is due to the strongly monotone property of as discussed in Proposition 2.11. However, the dual dynamics (15) (which is solely a function of the adaptively coupled primal variables) is not strongly monotone, which implies that the Lagrangians (5) and (LABEL:dlag1) are not strongly concave with respect to . Thus, the dual dynamics (15) under the effect of adaptively synchronized primal variables and the one without adaptively synchronized primal variables settle to different equilibrium states. This further indicates that there exists a unique primal optimizer for both Lagrangian functions (5) and (LABEL:dlag1), but the same does not hold for the dual optimizer . Since the dual variable pertains to the local inequality constraints, it remains unaffected by the adaptive synchronization in primal variables. Hence, the dual optimizer is also unique for both (5) and (LABEL:dlag1).
The convergence rate of the distributed primal-dual dynamics is improved under the influence of adaptive synchronization. However, this may adversely affect the robustness of the proposed dynamics. It is therefore necessary to quantify the robustness of the proposed dynamics with respect to the rate of convergence. The analysis presented below obtains a relation between the convergence rate of the proposed dynamics and its L2-gain.
2.6 Robustness analysis of the network dynamics with respect to the exogenous inputs
Before proceeding with the robustness analysis of this section, it is worth noting the following remark on the robustness property of passive dynamical systems.
Remark 7.
From the inequalities (22), (25), and (32), it is apparent that the interconnected network dynamics comprising (14)-(16) is passive, and inherently robust to the perturbations arising in the primal and dual variables (see Proposition 4.3.1 and Remark 4.3.3 of [30]).
Remark 7 states the qualitative behavior of the proposed dynamics with respect to the notion of robustness. In the following, the robustness of the proposed dynamics against exogenous inputs is quantified in terms of the L2-gain.
Consider without loss of generality, the new inputs to (14)-(16) as
[TABLE]
respectively, where corresponds to the perturbations in the input . As discussed in [38], represent additive uncertainties or disturbances such as the numerical error accumulated in the corresponding variables. In what follows, the robustness of the ADPDD is quantified using L2-gain analysis of dynamical systems. Let and .
Proposition 2.12.
The interconnected network dynamics (14), (15), and (16) with updated according to (13), remains stable with the L2-gain, , if strictly holds.
Proof.
Replacing the inputs in (14)-(16) by the new ones as defined in (62), the time differential of the Lyapunov function (33) modifies to the following:
[TABLE]
Acknowledging that and using (12) in (63) further yields
[TABLE]
where since is positive definite. With , the L2-gain of the interconnected network dynamics, from the port input to the port output can be calculated by setting to [math]. From inequality (64), the map from the input to the output remains finite L2-gain stable around the saddle point , when the corresponding L2-gain, satisfies
[TABLE]
∎
Inequality (65) clearly indicates that the gain corresponding to the adaptive distributed primal-dual dynamics has a reduced margin compared to the gain corresponding to the distributed primal-dual dynamics (without adaptive synchronization). Using (53), one can obtain the following expression for the L2-gain in the worst case:
[TABLE]
Comparing (65) and (66), it can be seen that the L2-gain for the ADPDD has a smaller margin than that of the DPDD. Thus the algorithm calls for a trade-off between the robustness and the accelerated convergence of the proposed dynamics. While the adaptive synchronization improves the rate of convergence of the primal-dual dynamics, it simultaneously degrades the robustness of the proposed algorithm, wherein the worst-case L2-gain is quantified by in (66).
3 Applications and Numerical Examples
This section discusses the application of the proposed dynamics to distributed optimization problems concerning least squares [7, 39] and support vector machines [40]. These problems are solved online over a network of wireless sensors or computing devices, and in such settings the rate of convergence is a vital factor. In the following, the proposed dynamics (14)-(16) is employed to solve the distributed least squares [41] and distributed support vector machines [5, 6] problems.
3.1 Distributed Least Squares
Distributed least squares problems have been widely studied in recent years [42, 43, 12]. These techniques have found applications in parameter estimation over wireless sensor networks [44], estimation of electro-mechanical oscillation modes of large power system networks [41, 45], etc. Each agent in the network is given the task of simultaneously and iteratively computing the same least squares solution to the linear equation where with and .
Formally, the least squares problem is defined as given below [46]:
[TABLE]
3.1.1 Data partitioning
It is assumed that each agent in the network holds consecutive rows of and . For the sake of simplicity, equal partitioning of the rows of is considered. However, the proposed approach would hold even if the partitioning is uneven.
[TABLE]
where and .
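A minimal sketch of this row-wise partitioning is given below. The helper name `partition_rows` and the example sizes are assumptions made for illustration; `numpy.array_split` is used because it also handles the uneven case mentioned above, giving the first few agents one extra row.

```python
import numpy as np

def partition_rows(A, b, num_agents):
    """Split the rows of A and the entries of b into consecutive blocks,
    one (A_i, b_i) pair per agent. Uneven splits are allowed."""
    A_blocks = np.array_split(A, num_agents, axis=0)
    b_blocks = np.array_split(b, num_agents, axis=0)
    return list(zip(A_blocks, b_blocks))

# Hypothetical data: 10 equations, 3 unknowns, 4 agents.
A = np.arange(30.0).reshape(10, 3)
b = np.arange(10.0)
parts = partition_rows(A, b, 4)

for i, (A_i, b_i) in enumerate(parts):
    print(f"agent {i}: A_i shape {A_i.shape}, b_i shape {b_i.shape}")
```

Stacking the blocks in agent order recovers the original data, so the sum of the local squared residuals equals the global squared residual.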
3.1.2 Distributed formulation of least squares problem
The consensus-based distributed optimization formulation of (67) would require the local estimates to reach consensus on the global optimizer . With data partitioning as defined above, the distributed version of the least squares problem (67) [41] is defined as
[TABLE]
3.1.3 Solution to the distributed least squares problem (69) using ADPDD
The Lagrangian problem corresponding to (69) can be defined as
[TABLE]
Similarly to (7), the proposed dynamics can be derived from (70) as given below:
[TABLE]
where and .
3.1.4 Simulations
The simulation parameters are a randomly generated matrix and vector . The network, with a cyclic graph topology, is assumed to comprise agents, wherein each agent holds component of as well as the respective . Each agent in the network computes local estimates and reaches consensus on the global solution, as shown in Fig. 2. The simulations were carried out using . The rate of convergence of (71) is compared with that of the non-adaptive version of the distributed primal-dual dynamics employed to solve problem (69). The rate of convergence is significantly improved, as shown in Fig. 3. The global solution to (69) is also compared with the solution of the least squares solver in . The global optimizer obtained using the proposed algorithm coincides with the optimal solution obtained using , as shown in Fig. 4.
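The following minimal sketch shows how a consensus-constrained least squares problem of this kind can be integrated numerically and checked against a centralized solver. It is an illustration under stated assumptions, not the simulation reported in the paper. It uses a standard non-adaptive augmented saddle-point flow with constant coupling weights, integrated by forward Euler; the adaptive law (13) and the exact form of (71) are not reproduced. The data, step size, and the names `cycle_laplacian` and `dpdd_least_squares` are hypothetical, and the data are small and deterministic rather than random.

```python
import numpy as np

def cycle_laplacian(num_agents, weight=1.0):
    """Laplacian of an undirected cycle graph with a uniform coupling weight."""
    L = np.zeros((num_agents, num_agents))
    for i in range(num_agents):
        j = (i + 1) % num_agents
        L[i, i] += weight
        L[j, j] += weight
        L[i, j] -= weight
        L[j, i] -= weight
    return L

def dpdd_least_squares(A_blocks, b_blocks, L, step=0.01, num_steps=4000):
    """Forward-Euler integration of an assumed non-adaptive primal-dual flow:
       x_dot   = -grad f(x) - L x - L lam
       lam_dot =  L x
    where f_i(x_i) = 0.5 * ||A_i x_i - b_i||^2."""
    num_agents = len(A_blocks)
    dim = A_blocks[0].shape[1]
    x = np.zeros((num_agents, dim))    # row i is agent i's local estimate
    lam = np.zeros((num_agents, dim))  # dual variables of the consensus constraint
    for _ in range(num_steps):
        grad = np.stack([A.T @ (A @ xi - b)
                         for A, b, xi in zip(A_blocks, b_blocks, x)])
        x_dot = -grad - L @ x - L @ lam
        lam_dot = L @ x
        x = x + step * x_dot
        lam = lam + step * lam_dot
    return x, lam

# Illustrative data (assumption): 4 agents, each holding 2 rows of an 8 x 2 system.
scales = [1.0, 2.0, 1.0, 2.0]
A_blocks = [s * np.eye(2) for s in scales]
b_blocks = [np.array([1.0, 2.0]), np.array([3.0, 4.0]),
            np.array([5.0, 6.0]), np.array([7.0, 8.0])]

x_final, lam_final = dpdd_least_squares(A_blocks, b_blocks, cycle_laplacian(4))

# Centralized reference solution of min ||A x - b||^2.
A = np.vstack(A_blocks)
b = np.concatenate(b_blocks)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

print("local estimates:\n", np.round(x_final, 6))
print("centralized solution:", np.round(x_star, 6))
```

Each row of `x_final` is one agent's estimate; at convergence all rows agree with the centralized least squares solution. Passing a different `weight` to `cycle_laplacian` gives a simple way to experiment with the effect of coupling strength on this non-adaptive flow.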
3.2 Quadratic-inequality Constrained Distributed Least Squares
A box-constrained linear least squares problem is one in which upper and lower bounds on the estimated values are incorporated to handle limitations of the physical system. These methods have been studied with applications to GPS positioning [47], geodesic applications [48, 49, 50], etc. The box-constrained least squares problem is generally defined as follows:
[TABLE]
where and are the upper and lower bounds of the variable . It is known that a quadratic constraint formulation of the box-constrained least squares problem is an efficient approach to obtain the optimal solution of (73) [39]. The equivalent quadratic-constrained formulation (74) of the box-constrained least squares problem is given as:
[TABLE]
where is the midpoint of the interval . It is computed as with .
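The idea of replacing a box constraint by a quadratic one can be sketched with the elementwise identity that `l <= x <= u` holds exactly when `(x - c)^2 <= r^2`, with `c` the interval midpoint and `r` the half-width. The snippet below is a minimal illustration of that identity only. The names `box_to_quadratic`, `in_box`, and `satisfies_quadratic`, the symbol `r`, and the test points are assumptions, and the exact constraint used in (74) may be written differently.

```python
import numpy as np

def box_to_quadratic(lower, upper):
    """Return the midpoint c and half-width r of the box [lower, upper]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    center = 0.5 * (lower + upper)
    radius = 0.5 * (upper - lower)
    return center, radius

def in_box(x, lower, upper):
    """Elementwise box constraint lower <= x <= upper."""
    x = np.asarray(x, dtype=float)
    return bool(np.all((np.asarray(lower) <= x) & (x <= np.asarray(upper))))

def satisfies_quadratic(x, center, radius):
    """Elementwise quadratic constraint (x - c)^2 <= r^2."""
    x = np.asarray(x, dtype=float)
    return bool(np.all((x - center) ** 2 <= radius ** 2))

# Hypothetical bounds and test points.
lower, upper = [-1.0, 0.0], [1.0, 4.0]
center, radius = box_to_quadratic(lower, upper)
points = [[0.5, 3.0], [1.5, 3.0], [0.0, -0.1]]
box_flags = [in_box(p, lower, upper) for p in points]
quad_flags = [satisfies_quadratic(p, center, radius) for p in points]
print(box_flags, quad_flags)
```

The quadratic form is smooth in `x`, which is convenient when the constraint enters a Lagrangian that is then differentiated to obtain gradient-type dynamics.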
A distributed framework for the quadratic-constrained least squares problem (74) can be obtained as:
[TABLE]
The ADPDD formulation of the problem (75) is similar to that of the proposed dynamics (14)-(16). Hence, it is omitted to avoid repetition of the equations.
3.2.1 Simulations
For the sake of simplicity and readability of the simulation results, a small problem of the form (75) is taken as a proof of concept with the parameters and . A network with a cyclic graph topology containing agents is considered, wherein each agent holds component of the matrix . All agents iteratively reach the global consensus of the optimizer value with , as shown in Fig. 5. It can be observed that the trajectories , and synchronize to respective common trajectories at around . The result is also compared with the solution of , and it can be seen from Fig. 6 that the global optimizer of (75) coincides with the solution obtained using . The accelerated convergence of the proposed algorithm employed to solve (75) is evident from Fig. 7.
Remark 8.
A strong synchronization between the trajectories of the agents implies guaranteed convergence to the global optimizer under sparse communication events. It also indicates that the communication between the agents need not be periodic. The proposed algorithm can be augmented with the event-triggered control framework.
3.3 Distributed Support Vector Machines
Support vector machines (SVMs) are supervised learning methods in the machine learning domain, used for classification and regression analysis on raw data (see [40]). For applications with a huge amount of data, there are often limitations with respect to bandwidth requirement, data storage and processing capability of the computing machine, response time, etc. As it turns out, a single computing machine is inefficient in dealing with the SVM algorithm on large datasets. Distributed versions of support vector machines have been proposed as an alternative method to overcome these limitations, as discussed in [5, 6]. With the aim of enabling accelerated convergence to the optimal solution, the distributed SVM problem is formulated in terms of the adaptive primal-dual dynamics. However, due to the complexity involved in simulating large-scale SVM problems, the present work only considers the mathematical formulation and does not provide simulation results for it.
A problem formulation of the support vector machines for the case of non-separable data is given below:
[TABLE]
where is the margin that separates positive and negative observations, is a paired observation sample, and are weight and bias variables, respectively. is called the hinge loss function. C is used to trade off the sum over all slack variables against the size of the margin. is the scaling factor.
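For orientation, the snippet below evaluates a standard soft-margin SVM objective with a hinge loss on a tiny dataset. It is a sketch under assumptions: the textbook form `0.5 * ||w||^2 + C * sum(max(0, 1 - y_i (w^T x_i + b)))` is used, which may differ from (76) in scaling, and the function names and data are hypothetical.

```python
import numpy as np

def hinge_loss(X, y, w, b):
    """Per-sample hinge loss max(0, 1 - y_i (w^T x_i + b))."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)

def svm_objective(X, y, w, b, C):
    """Assumed soft-margin objective: 0.5 * ||w||^2 + C * sum of hinge losses."""
    return 0.5 * float(w @ w) + C * float(np.sum(hinge_loss(X, y, w, b)))

# Hypothetical data: three samples in the plane with labels +1 / -1.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0

losses = hinge_loss(X, y, w, b)
objective = svm_objective(X, y, w, b, C=2.0)
print(losses, objective)
```

Only the second sample lies inside the margin, so it is the only one that contributes a nonzero hinge loss. A larger `C` penalizes such margin violations more heavily relative to the norm of `w`.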
3.3.1 Data Partitioning
It is assumed that the set of observations is horizontally partitioned and distributed among computing nodes in [6], where now represents the computing nodes and the set of edges describes communication links between them. Assuming that the graph is connected and enabling only one-hop neighborhood communication, each node communicates with its neighbors belonging to . Each node stores a sample set of labeled observations, denoted by . Note that:
1. is a set of labeled observations allocated to computing node, , where is a superset of the labeled observations.
2. .
3. is a class label.
In what follows, an adaptive primal-dual dynamics based formulation of distributed support vector machines is provided.
3.3.2 ADPDD formulation of Distributed Support Vector Machines
A distributed version of the support vector machines problem (76) is formulated as given below (see, [5]):
[TABLE]
The objective function in (77) is differentiable and strongly convex in . The decision (primal) variables are , where are the consensus constraints with as a neighbor of if and only if . Let .
The Lagrangian formulation of the problem (77) is given by
[TABLE]
where are the Lagrange multipliers associated with inequality constraints and , of computing node, and are the Lagrange multipliers associated with coupling constraints of and nodes. is the Laplacian matrix of the undirected graph .
Let (with , ) then, . The interconnected network dynamics for the distributed support vector machines problem (77) is represented as follows:
[TABLE]
The subsystem contains only consensus-dual variables, with and as its input and output respectively, as given below:
[TABLE]
The subsystem contains the slack variable, and the dual variables corresponding to the inequality constraints, with and as its input and output respectively, as given below:
[TABLE]
where , and with .
Thus, the proposed dynamics can be implemented for solving the distributed support vector machines problem (77) as shown in (79)-(81). The solution of the underlying dynamics will correspond to the saddle-point solution of (78), wherein the primal solution is the optimal solution of (77).
In the following, two different formulations of (2) are considered and the results of the proposed dynamics are compared with that of the non-adaptive version of the distributed primal-dual dynamics.
3.4 Numerical Example 1
Consider the following distributed optimization problem consisting of agents having more than one variable and convex inequality constraints.
[TABLE]
where the objective function associated with each agent is given below
[TABLE]
with the following local inequality constraints
[TABLE]
The graph connectivity is assumed to be as follows: , , and . The ADPDD algorithm is employed to solve the problem (82), and the corresponding trajectories are shown in Fig. 8. The primal optimizers are . Besides that, in Fig. 9 the steady state eigenvalues of are plotted along with the eigenvalues of . The eigenvalue at steady state is equal to as compared to the eigenvalue corresponding to a non-adaptive DPDD, . From Proposition 2.9 and Proposition 2.11, it can be seen that the adaptive synchronization has sought to increase the rate of convergence of the ADPDD.
3.5 Numerical Example 2
In this subsection, the local inequality constraints associated with each agent are relaxed and the following optimization problem is considered on a random graph with agents as shown in Fig. 10. Note that the degree of each agent is selected randomly.
[TABLE]
with a randomly generated Hessian . The proposed dynamics is employed to solve (89), first considering and then . Fig. 11 and Fig. 12 correspond to the case of while Fig. 13 and Fig. 14 correspond to the case of . It can be seen that for the latter case the convergence is much faster. This is due to the difference between the resulting eigenvalues, i.e., for the case of , the second smallest eigenvalue is whereas the same for the case of increases to . The eigenvalue results for both values of are shown in Fig. 12 and Fig. 14.
4 Conclusions
In this paper, an adaptive distributed primal-dual dynamics is proposed to solve inequality and consensus constrained distributed optimization problems. The adaptive synchronization of the primal variables is brought into play by allowing the coupling weights to update according to the difference between the local trajectories (trajectories belonging to the neighboring nodes or agents) as well as the difference between the rates of change of the local trajectories. It is proved that the proposed dynamics represents a network of feedback-interconnected passive dynamical systems which are asymptotically stable. Further, by allowing a time-scale separation between the adaptive coupling law and the primal dynamics, stronger convergence bounds for the primal dynamics are derived, and it is proved that the adaptively coupled primal dynamics converges to the unique primal optimizer.
The performance of the proposed dynamics is quantified in terms of the induced L2-gain from the disturbance input to the output. The effect of adaptive synchronization on the L2-gain is discussed, and it is established that the adaptive distributed primal-dual dynamics is comparatively less robust to exogenous input disturbances than the distributed primal-dual dynamics. The analysis also reveals that achieving accelerated convergence to the saddle-point solution requires a trade-off between the convergence rate and the robustness of the proposed algorithm.
Future work will be directed towards improving the rate of convergence of the proposed dynamics without compromising its robustness properties. Its applications to large-scale distributed optimization problems such as distributed support vector machines [6], distributed least squares [12], etc., will be considered.
[1] Michael Rabbat and Robert Nowak. Distributed optimization in sensor networks. In Proceedings of the 3rd international symposium on Information processing in sensor networks, pages 20–27. ACM, 2004.
[2] Bjorn Johansson, Cesare Maria Carretti, and Mikael Johansson. On distributed optimization using peer-to-peer communications in wireless sensor networks. In 2008 5th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, pages 497–505. IEEE, 2008.
[3] Alexander Bertrand and Marc Moonen. Consensus-based distributed total least squares estimation in ad hoc wireless sensor networks. IEEE Transactions on Signal Processing, 59(5):2320–2330, 2011.
[4] Peng Yi, Yiguang Hong, and Feng Liu. Distributed gradient algorithm for constrained optimization with application to load sharing in power systems. Systems & Control Letters, 83:45–52, 2015.
[5] Pedro A Forero, Alfonso Cano, and Georgios B Giannakis. Consensus-based distributed support vector machines. Journal of Machine Learning Research, 11(May):1663–1707, 2010.
[6] Marco Stolpe, Kanishka Bhaduri, and Kamalika Das. Distributed support vector machines: an overview. In Solving Large Scale Learning Tasks. Challenges and Algorithms, pages 109–138. Springer, 2016.
[7] Angelia Nedić and Ji Liu. Distributed optimization for control. Annual Review of Control, Robotics, and Autonomous Systems, 1:77–103, 2018.
[8] Daniel Pérez Palomar and Mung Chiang. A tutorial on decomposition methods for network utility maximization. IEEE Journal on Selected Areas in Communications, 24(8):1439–1451, 2006.
