Resilient Distributed Optimization Algorithms for Resource Allocation
Cesar A. Uribe, Hoi-To Wai, Mahnoosh Alizadeh

TL;DR
This paper introduces a resilient distributed optimization algorithm that maintains convergence despite Byzantine attacks on communication channels, enhancing the security and robustness of resource allocation in cyber-physical systems.
Contribution
It develops a resilient primal-dual algorithm that uses robust statistics to counteract Byzantine attacks in distributed resource allocation.
Findings
Algorithm converges to a neighborhood of the robust model
Neighborhood size is proportional to attack fraction
Enhances security in distributed resource management
Abstract
Distributed algorithms provide flexibility over centralized algorithms for resource allocation problems, e.g., cyber-physical systems. However, the distributed nature of these algorithms often makes the systems susceptible to man-in-the-middle attacks, especially when messages are transmitted between price-taking agents and a central coordinator. We propose a resilient strategy for distributed algorithms under the framework of primal-dual distributed optimization. We formulate a robust optimization model that accounts for Byzantine attacks on the communication channels between agents and coordinator. We propose a resilient primal-dual algorithm using state-of-the-art robust statistics methods. The proposed algorithm is shown to converge to a neighborhood of the robust optimization model, where the neighborhood's radius is proportional to the fraction of attacked channels.
César A. Uribe†, Hoi-To Wai†, Mahnoosh Alizadeh
†CAU and HTW have contributed equally. CAU is with LIDS, MIT, Cambridge, MA, USA. HTW is with Dept. of SEEM, CUHK, Shatin, Hong Kong. MA is with Dept. of ECE, UCSB, Santa Barbara, CA, USA. This work is partially supported by UCOP Grant LFR-18-548175 and CUHK Direct Grant #4055113.
1 Introduction
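Consider the following multi-agent optimization problem involving the average of parameters in the constraints:
$$\min_{\bm{\theta}_i \in \mathcal{C}_i,\, \forall i} \; U(\bm{\theta}) := \frac{1}{N}\sum_{i=1}^{N} U_i(\bm{\theta}_i) \quad \text{s.t.} \quad g_t\Big(\frac{1}{N}\sum_{i=1}^{N} \bm{\theta}_i\Big) \leq 0,\; t = 1, \dots, T, \tag{1}$$
where both $U_i : \mathbb{R}^d \to \mathbb{R}$ and $g_t : \mathbb{R}^d \to \mathbb{R}$ are continuously differentiable, convex functions, and $\mathcal{C}_i$ is a compact convex set in $\mathbb{R}^d$. We let $\mathcal{C} := \mathcal{C}_1 \times \cdots \times \mathcal{C}_N$ and
$$R := \max_{i \in [N]} \; \max_{\bm{\theta}, \bm{\theta}' \in \mathcal{C}_i} \|\bm{\theta} - \bm{\theta}'\|, \tag{2}$$
such that $R$ is an upper bound on the diameters of $\mathcal{C}_i$.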
Problem (1) arises in many resource allocation problems with a set of potentially nonlinear constraints on the amount of allowable resources; see Section 1.1 for examples.
We consider a system with a central coordinator and $N$ agents. In this context, the function $U_i$ and parameter $\bm{\theta}_i$ are the utility of the $i$th agent and the resource controlled by agent $i$, respectively. As the agents work independently, it is desirable to design algorithms that allow the agents to solve (1) cooperatively through communication with the central coordinator. Among others, the primal-dual optimization methods [1] have been advocated as they naturally give rise to decomposable algorithms that favor distributed implementation [2]. In addition to their practical success, these methods are supported by strong theoretical guarantees: fast convergence to an optimal solution of (1) is well established. However, the distributed nature of these methods also exposes the system to vulnerabilities not faced by traditional centralized systems. Specifically, existing algorithms assume that the agents, and the communication links between the central coordinator and the agents, are completely trustworthy. An attacker can nevertheless take over a sub-system operated by the agents and deliberately edit the messages in these communication links, i.e., mount a Byzantine attack. This might result in an unstable system with possible damage to hardware and to the system overall.
In this paper, we propose strategies for securing primal-dual distributed algorithms, e.g., in [1], tailored to solving a relaxed version of the resource allocation problem (1). A key observation is that the existing algorithms depend on reliably computing the average of a set of parameter vectors, $\{\bm{\theta}_i\}_{i=1}^{N}$, transmitted by the agents. As a remedy, we apply robust statistics techniques as a subroutine, therefore proposing a resilient distributed algorithm that is proven to converge to a neighborhood of the optimal solution of a robust version of (1).
Vulnerabilities of various types of distributed algorithms have been identified and addressed in a number of recent studies. Relevant examples are [3, 4, 5, 6, 7], which study secure decentralized algorithms on a general network topology but consider consensus-based optimization models. Moreover, [8, 9, 10] consider a similar optimization architecture as this paper, yet they focus on securing distributed algorithms for machine learning tasks, which assume i.i.d. functions, a fundamentally different setting from the current paper. Our work is also related to the literature on robust statistics [11, 12], and particularly to the recently rekindled research efforts on high dimensional robust statistics [13, 14, 15]. These works will be the workhorse for our attack resilient algorithm.
Our contributions and organization are as follows. First, we derive a formal model for attack resilient resource allocation via a conservative approximation for the robust optimization problem [cf. Section 3]. Second, we apply and derive new robust estimation results to secure distributed resource allocation algorithms [cf. Section 4]. Third, we provide a non-asymptotic convergence guarantee of the proposed attack resilient algorithm [cf. Section 4.1]. In particular, our algorithm is shown to converge to a neighborhood of the optimal solution of (1) whose size is governed by $\alpha$, the fraction of attacked links.
Notations. Unless otherwise specified, $\|\cdot\|$ denotes the standard Euclidean norm. For any $N \in \mathbb{N}$, $[N]$ denotes the finite set $\{1, \dots, N\}$.
1.1 Motivating Examples
Our set-up can be employed in a wide range of optimization problems for resource allocation and networked control in multi-agent systems. Examples include the pioneering work on congestion control in data networks [16, 17]; determining the optimal price of electricity and enabling more efficient demand-supply balancing (a.k.a. demand response) in smart power distribution systems [18, 19]; managing user transmit powers and data rates in wireless cellular networks [20]; determining optimal caching policies in content delivery networks [21]; optimizing power consumption in wireless sensor networks with energy-restricted batteries [22, 23]; and designing congestion control systems in urban traffic networks [24].
These examples have different utility functions and constraint sets, all of which can be handled through our general formulation in (1). For example, in the power/rate control problem in data networks, the cost functions are usually logarithmic functions of the rate. In demand response applications in power distribution systems, the utilities capture the users’ benefits from operating their electric appliances under different settings; for example, the cost associated with the temperature set by a price-responsive air conditioner can be modeled as in [19]. In terms of constraints, our general nonlinear constraint formulation not only captures common linear resource constraints such as link capacity in data networks [16, 17], but also handles important nonlinear constraints arising in many applications. For example, in radial power distribution systems, nonlinear convexified power flow constraints can be included for distributed demand response optimization (for a description of distribution system power flow constraints, see, e.g., [25, 26]). This enables our algorithm to perform demand-supply balancing in power distribution systems in a distributed and resilient fashion.
2 Primal-dual Algorithm for Resource Allocation
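This section reviews the basic primal-dual algorithm for resource allocation. Let $\bm{\lambda} \in \mathbb{R}_+^T$ be the dual variable. We consider the Lagrangian function of (1):
$$\mathcal{L}(\bm{\theta}, \bm{\lambda}) := \frac{1}{N}\sum_{i=1}^{N} U_i(\bm{\theta}_i) + \sum_{t=1}^{T} \lambda_t \, g_t\Big(\frac{1}{N}\sum_{i=1}^{N} \bm{\theta}_i\Big).$$
Assuming strong duality holds (e.g., under Slater’s condition), solving problem (1) is equivalent to solving its dual problem:
$$\max_{\bm{\lambda} \in \mathbb{R}_+^T} \; \min_{\bm{\theta} \in \mathcal{C}} \; \mathcal{L}(\bm{\theta}, \bm{\lambda}). \tag{P}$$
For a given $\bm{\lambda}$, the inner minimization of (P) is known as the Lagrangian relaxation of (1), which can be interpreted as a penalized resource allocation problem [19].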
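In a distributed setting, the goal is to solve (1) where the agents only observe a pricing signal received from the central coordinator, and this pricing signal is updated iteratively at the central coordinator. As suggested in [1], we apply the primal-dual algorithm (PDA) to a regularized version of (P). Let us define
$$\mathcal{L}_{\upsilon}(\bm{\theta}, \bm{\lambda}) := \mathcal{L}(\bm{\theta}, \bm{\lambda}) + \frac{\upsilon}{2}\|\bm{\theta}\|^2 - \frac{\upsilon}{2}\|\bm{\lambda}\|^2,$$
such that $\mathcal{L}_{\upsilon}$ is $\upsilon$-strongly convex and $\upsilon$-strongly concave in $\bm{\theta}$ and $\bm{\lambda}$, respectively. Let $k \in \mathbb{Z}_+$ be the iteration index and $\gamma > 0$ be the step size. The PDA recursion is described by:
$$\bm{\theta}^{(k+1)} = \mathcal{P}_{\mathcal{C}}\big(\bm{\theta}^{(k)} - \gamma \nabla_{\bm{\theta}} \mathcal{L}_{\upsilon}(\bm{\theta}^{(k)}, \bm{\lambda}^{(k)})\big), \qquad \bm{\lambda}^{(k+1)} = \big[\bm{\lambda}^{(k)} + \gamma \nabla_{\bm{\lambda}} \mathcal{L}_{\upsilon}(\bm{\theta}^{(k)}, \bm{\lambda}^{(k)})\big]_+, \tag{4}$$
where $\mathcal{P}_{\mathcal{C}}(\cdot)$ is the Euclidean projection operator onto $\mathcal{C}$, $[\cdot]_+$ denotes $\max\{0, \cdot\}$, and the gradients are:
$$\nabla_{\bm{\theta}_i} \mathcal{L}_{\upsilon}(\bm{\theta}^{(k)}, \bm{\lambda}^{(k)}) = \frac{1}{N}\Big(\nabla U_i(\bm{\theta}_i^{(k)}) + \sum_{t=1}^{T} \lambda_t^{(k)} \nabla g_t\big(\overline{\bm{\theta}}^{(k)}\big)\Big) + \upsilon\, \bm{\theta}_i^{(k)}, \tag{5}$$
$$\big[\nabla_{\bm{\lambda}} \mathcal{L}_{\upsilon}(\bm{\theta}^{(k)}, \bm{\lambda}^{(k)})\big]_t = g_t\big(\overline{\bm{\theta}}^{(k)}\big) - \upsilon\, \lambda_t^{(k)}, \tag{6}$$
for all $i \in [N]$, $t \in [T]$, where $\overline{\bm{\theta}}^{(k)} := \frac{1}{N}\sum_{i=1}^{N} \bm{\theta}_i^{(k)}$. We denote by $\lambda_t^{(k)}$ the $t$th element of $\bm{\lambda}^{(k)}$. In particular, observe that (4) performs a projected gradient descent/ascent on the primal/dual variables.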
From the above, the gradients with respect to (w.r.t.) $\bm{\theta}_i$ and $\bm{\lambda}$ depend on the other agents' parameters only through the average parameter $\overline{\bm{\theta}}^{(k)}$. We summarize the primal dual distributed resource allocation (PD-DRA) procedure in Algorithm 1. In addition to solving the general problem (1), Algorithm 1 also serves as a general solution method to popular resource allocation problems [19].
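The procedure described above can be sketched in Python on a toy instance. Everything specific in this sketch is an assumption made for illustration and is not taken from Algorithm 1 itself: the quadratic utilities $U_i(x) = \frac{1}{2}(x - a_i)^2$, the box sets $[0, 1]$, the single linear constraint on the average, the regularizer $\frac{\upsilon}{2}\|\bm{\theta}\|^2 - \frac{\upsilon}{2}\lambda^2$, the hand-picked step size, and the name `pd_dra`.

```python
import numpy as np

def pd_dra(a, cap, upsilon=0.1, gamma=0.5, iters=2000):
    """Toy PD-DRA loop: U_i(x) = 0.5*(x - a_i)^2, C_i = [0, 1], g(avg) = avg - cap."""
    n = len(a)
    theta = np.zeros(n)   # one scalar resource per agent (d = 1)
    lam = 0.0             # price (dual variable) of the single constraint
    for _ in range(iters):
        avg = theta.mean()  # the coordinator only needs the average
        # agent side: gradient of the regularized Lagrangian w.r.t. theta_i
        # (the constraint is linear here, so its gradient is 1)
        grad_theta = ((theta - a) + lam) / n + upsilon * theta
        # coordinator side: gradient w.r.t. lambda
        grad_lam = (avg - cap) - upsilon * lam
        theta = np.clip(theta - gamma * grad_theta, 0.0, 1.0)  # projection onto C
        lam = max(lam + gamma * grad_lam, 0.0)                  # projection onto R_+
    return theta, lam

a = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
theta, lam = pd_dra(a, cap=0.3)
```

Both updates use the iterates of the previous round, so the only quantity the coordinator has to assemble from the agents is the average `avg`.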
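As the regularized primal-dual problem is strongly convex/concave in primal/dual variables, Algorithm 1 converges linearly to an optimal solution [1]. To study this, let us denote $\bm{z}^{(k)} := (\bm{\theta}^{(k)}, \bm{\lambda}^{(k)})$ as the primal-dual variable at the $k$th iteration, and define the map
$$\Phi(\bm{z}) := \begin{pmatrix} \nabla_{\bm{\theta}} \mathcal{L}_{\upsilon}(\bm{\theta}, \bm{\lambda}) \\ -\nabla_{\bm{\lambda}} \mathcal{L}_{\upsilon}(\bm{\theta}, \bm{\lambda}) \end{pmatrix}. \tag{7}$$
Fact 1.
[1, Theorem 3.5] Assume that the map $\Phi$ is $L_{\Phi}$-Lipschitz continuous. For all $k \geq 0$, we have
$$\|\bm{z}^{(k+1)} - \bm{z}^{\star}\|^2 \leq \big(1 - 2\gamma\upsilon + \gamma^2 L_{\Phi}^2\big) \|\bm{z}^{(k)} - \bm{z}^{\star}\|^2, \tag{8}$$
where $\bm{z}^{\star}$ is a saddle point to the regularized version of (P). Setting $\gamma = \upsilon / L_{\Phi}^2$ gives $\|\bm{z}^{(k+1)} - \bm{z}^{\star}\|^2 \leq \big(1 - \upsilon^2 / L_{\Phi}^2\big) \|\bm{z}^{(k)} - \bm{z}^{\star}\|^2$, $\forall\, k \geq 0$.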
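To get a feel for the linear rate $1 - \upsilon^2 / L_{\Phi}^2$ stated above, the following short sketch counts how many iterations are needed before the squared distance to the saddle point shrinks by a given factor. The numerical values of $\upsilon$ and $L_{\Phi}$ and the function name are hypothetical and chosen only for illustration.

```python
import math

def iterations_to_tolerance(upsilon, lipschitz, eps):
    """Smallest k with rho**k <= eps, where rho = 1 - upsilon**2 / lipschitz**2."""
    rho = 1.0 - (upsilon / lipschitz) ** 2
    if not 0.0 < rho < 1.0:
        raise ValueError("need 0 < upsilon < lipschitz")
    return math.ceil(math.log(eps) / math.log(rho))

# Hypothetical constants: upsilon = 0.1 and L_Phi = 1 give rho = 0.99.
k = iterations_to_tolerance(upsilon=0.1, lipschitz=1.0, eps=1e-6)  # k == 1375
```

A small regularization parameter relative to $L_{\Phi}$ therefore makes the contraction factor close to one, and the iteration count grows accordingly.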
3 Problem Formulation
Despite the simplicity and the strong theoretical guarantee, the PD-DRA method is susceptible to attacks on the channels between the central coordinator and the agents, as described below.
Attack Model. We consider a situation where the uplink channels between some agents and the central coordinator are compromised [see Fig. 1]. Let $\mathcal{A} \subset [N]$ be the set of compromised uplink channels, whose identities are unknown to the central coordinator. We define $\mathcal{H} := [N] \setminus \mathcal{A}$ as the set of trustworthy channels. At iteration $k$, instead of receiving $\bm{\theta}_i^{(k)}$ from each agent $i \in [N]$ [cf. Step 2(a)], the central coordinator receives the following messages:
[TABLE]
We focus on a Byzantine attack scenario in which the messages communicated on the attacked channels can be arbitrary. Under such a scenario, if the central coordinator forms the naive average of the received messages and computes the gradients accordingly, this may result in uncontrollable error, since the deviation of the naive average from the true average can be arbitrarily large. It is anticipated that the PD-DRA method would then fail to provide a solution to the regularized version of (P).
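A small numerical sketch makes the point about the naive average concrete. The data below are synthetic and the variable names are hypothetical: nine trustworthy messages lie in $[0, 1]^2$ and a single compromised channel delivers an arbitrary vector.

```python
import numpy as np

rng = np.random.default_rng(0)
honest = rng.uniform(0.0, 1.0, size=(9, 2))   # nine trustworthy messages in [0, 1]^2
byzantine = np.array([[1e6, -1e6]])           # one arbitrary message on an attacked uplink
received = np.vstack([honest, byzantine])

naive = received.mean(axis=0)                 # what the coordinator would compute
target = honest.mean(axis=0)                  # average over the trustworthy channels
error = np.linalg.norm(naive - target)        # grows without bound with the attack magnitude
```

One compromised channel out of ten is enough to move the naive average by an amount proportional to the magnitude of the injected message.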
Robust Optimization Model. In light of the Byzantine attack, it is impossible to optimize the original problem (P) since the contribution from the agents in $\mathcal{A}$ becomes unknown to the central coordinator. As a compromise, we focus on optimizing the cost function of agents with trustworthy uplinks and consider the following robust optimization problem as our target model:
[TABLE]
Note that the parameters of the agents with compromised uplinks are removed from the decision variables, and we have included (10b) to account for the worst case scenario for the resource usage of these agents. This ensures that the physical operation limit of the system will not be violated under attack. The following assumption is imposed throughout the paper:
H1.
For all $t \in [T]$, the gradient of $g_t$ is bounded with $\|\nabla g_t(\bm{\theta})\| \leq B$ and is $L$-Lipschitz continuous.
We define
[TABLE]
Lemma 1.
Under H1, the following problem yields a conservative approximation of (10), i.e., its feasible set is a subset of the feasible set of (10):
[TABLE]
Similar to PD-DRA, we define the regularized Lagrangian function of (12) as:
[TABLE]
Again, the regularized Lagrangian function is $\upsilon$-strongly convex in the primal variable and $\upsilon$-strongly concave in the dual variable $\bm{\lambda}$.
Our main task is to tackle the following modified problem of (P) under Byzantine attack on (some of) the uplinks:
[TABLE]
and we let be the optimal solution to (P’). Notice that (P’) bears a similar form as (P) and thus one may apply the PD-DRA method to the former naturally. However, such application requires the central coordinator to compute the sample average
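$$\overline{\bm{\theta}}_{\mathcal{H}}^{(k)} := \frac{1}{|\mathcal{H}|} \sum_{i \in \mathcal{H}} \bm{\theta}_i^{(k)} \tag{14}$$
at each iteration. This might not be computationally feasible under the attack model, since the central coordinator is oblivious to the identity of $\mathcal{H}$. Overcoming this limitation is the main objective in the design of our scheme.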
4 Robust Distributed Resource Allocation
In this section, we describe two estimators for approximating $\overline{\bm{\theta}}_{\mathcal{H}}^{(k)}$ [cf. (14)] from the received messages (9) without knowing the identity of links in $\mathcal{H}$. To simplify notation, we define $\alpha$ as a known upper bound on the fraction of compromised channels and assume $\alpha < 1/2$, i.e., less than half of the channels are compromised.
As discussed after (14), the problem at hand is robust mean estimation, whose applications to robust distributed optimization have been considered in the machine learning literature [9, 10, 14] under the assumption that the ‘trustworthy’ signals are drawn i.i.d. from a Gaussian distribution. Our setting is different since the signals $\bm{\theta}_i^{(k)}$, $i \in \mathcal{H}$, are variables from the previous iteration whose distribution is non-Gaussian in general. Our analysis will be developed without such an assumption on the distribution.
We first consider a simple median-based estimator applied to each coordinate $j \in [d]$. First, define the coordinate-wise median as:
[TABLE]
where ${\sf med}(\cdot)$ computes the median of the operand. Then, our estimator is computed as the mean of the nearest neighbors of $\big[\bm{\theta}_{\sf med}^{(k)}\big]_j$. To formally describe this, let us define:
[TABLE]
where is chosen as . Our estimator is:
[TABLE]
The following bounds the performance of (17).
Proposition 1.
Suppose that $\max_{i \in \mathcal{H}} \big\| \bm{\theta}_i^{(k)} - \overline{\bm{\theta}}_{\mathcal{H}}^{(k)} \big\|_{\infty} \leq r$, then for any $\alpha \in (0, 1/2)$, it holds that
[TABLE]
Under mild assumptions, the condition $\max_{i \in \mathcal{H}} \big\| \bm{\theta}_i^{(k)} - \overline{\bm{\theta}}_{\mathcal{H}}^{(k)} \big\|_{\infty} \leq r$ can be satisfied with $r$ of the order of $R$, as implied by the compactness of $\mathcal{C}_i$ [cf. (2)]. Moreover, for sufficiently small $\alpha$, the right hand side of (18) can be approximated by a term of order $\alpha r \sqrt{d}$. However, this median-based estimator may perform poorly for large $\alpha$ (especially when $\alpha \rightarrow 1/2$) or large dimension $d$. For these situations, a more sophisticated estimator is required, as detailed next.
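The median-based estimator described above can be sketched as follows. The exact neighborhood rule is stated in (16), which is not reproduced here, so this sketch assumes one natural reading: in each coordinate, discard the $\lfloor \alpha N \rfloor$ received entries farthest from the coordinate-wise median and average the rest. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def median_trimmed_mean(received, alpha):
    """Per coordinate: average the entries nearest to the coordinate-wise median.

    Assumption of this sketch: the floor(alpha * N) entries farthest from the
    median are discarded in each coordinate.
    """
    received = np.asarray(received, dtype=float)
    n, _ = received.shape
    keep = n - int(np.floor(alpha * n))
    med = np.median(received, axis=0)                      # coordinate-wise median
    dist = np.abs(received - med)                          # distance to the median, per coordinate
    idx = np.argsort(dist, axis=0, kind="stable")[:keep]   # nearest neighbors, per coordinate
    return np.take_along_axis(received, idx, axis=0).mean(axis=0)

rng = np.random.default_rng(1)
honest = rng.uniform(0.0, 1.0, size=(8, 3))   # trustworthy messages
attacked = np.full((2, 3), 1e6)               # two compromised uplinks
received = np.vstack([honest, attacked])
estimate = median_trimmed_mean(received, alpha=0.2)
```

In this example the outliers are far from the median in every coordinate, so the estimate coincides with the average over the trustworthy channels. An attacker that places its messages inside the range of the trustworthy ones is not filtered out, which is where the error bound of Proposition 1 comes into play.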
To derive the second estimator, we apply an auxiliary result from [15] which provides an algorithm for estimating $\overline{\bm{\theta}}_{\mathcal{H}}^{(k)}$, as summarized in Algorithm 2. We observe:
Proposition 2.
[15, Proposition 16] Suppose that . For any , Algorithm 2 produces an output such that .
Again, similar to Proposition 1, the required condition above can be satisfied with under mild conditions. Thus, Proposition 2 states that Algorithm 2 recovers up to an error of . Note that this bound is dimension free unlike the median estimator analyzed in Proposition 1.
The idea behind Algorithm 2 is to sequentially identify and remove the subset of points that cannot be reconstructed from the mean of the data points. The solution of the optimization problem in Line 3 measures how well we can recover the data points as an average of the other points. The bounded sample variance assumption guarantees that one can reconstruct any element in the set from its mean; thus, all points that introduce a large error can be safely removed. Line 5 quantifies the magnitude of the optimal point of Line 3 and, if this value is large, the points that introduce a large error are down-weighted. The process is repeated until the optimal solution of Line 3 is small enough, at which point a low-rank approximation of the optimal solution can be used to return the sample mean estimate.
Attack Resilient PD-DRA method. The preceding discussion provides the enabling tool for developing the resilient PD-DRA method, which we summarize in Algorithm 3. The algorithm behaves similarly to Algorithm 1 applied to (P’), with the exception that the central coordinator is oblivious to $\mathcal{H}$ and uses a robust mean estimator to find an approximate average of the signals sent through the trustworthy links. This approximate value is used to compute the new price signals, which are sent back to the agents. In particular, the primal-dual updates are described by
[TABLE]
[TABLE]
Lemma 2.
Algorithm 3 is a primal-dual algorithm [1] for (P’) with perturbed gradients:
[TABLE]
where we have used concatenated variable as and . Under H1 and assuming that for all , we have:
[TABLE]
[TABLE]
The assumption can be guaranteed since is bounded.
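The overall structure of the resilient method can be sketched by plugging a robust average into a plain primal-dual loop. This is a simplified illustration under stated assumptions, not Algorithm 3 itself: it reuses a hypothetical toy problem (quadratic utilities, box sets $[0, 1]$, one linear constraint), it uses a scalar trimmed mean around the median as the robust estimator, it omits the constraint tightening of (12), and the attacker simply overwrites the uplink messages with uniform noise. All names are hypothetical.

```python
import numpy as np

def trimmed_mean(msgs, alpha):
    """Scalar messages: drop the floor(alpha*N) values farthest from the median."""
    keep = len(msgs) - int(np.floor(alpha * len(msgs)))
    nearest = np.argsort(np.abs(msgs - np.median(msgs)), kind="stable")[:keep]
    return msgs[nearest].mean()

def resilient_pd_dra(a, cap, attacked, alpha, upsilon=0.1, gamma=0.5, iters=2000, seed=0):
    """Toy loop: U_i(x) = 0.5*(x - a_i)^2, C_i = [0, 1], g(avg) = avg - cap."""
    rng = np.random.default_rng(seed)
    attacked = np.asarray(attacked, dtype=int)
    n = len(a)
    theta, lam = np.zeros(n), 0.0
    for _ in range(iters):
        msgs = theta.copy()                                           # uplink messages
        msgs[attacked] = rng.uniform(-1e3, 1e3, size=attacked.size)   # Byzantine overwrite
        avg_hat = trimmed_mean(msgs, alpha)                           # robust average at the coordinator
        grad_theta = ((theta - a) + lam) / n + upsilon * theta        # agent-side gradient (grad g = 1)
        grad_lam = (avg_hat - cap) - upsilon * lam                    # coordinator-side gradient
        theta = np.clip(theta - gamma * grad_theta, 0.0, 1.0)         # projection onto C
        lam = max(lam + gamma * grad_lam, 0.0)                        # projection onto R_+
    return theta, lam

a = np.linspace(0.1, 1.0, 10)
theta, lam = resilient_pd_dra(a, cap=0.3, attacked=[0, 1], alpha=0.2)
```

Because fewer than half of the messages are overwritten, the median stays within the range of the trustworthy messages, so the estimate `avg_hat` and hence the price `lam` remain bounded regardless of what the attacker injects.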
4.1 Convergence Analysis
Finally, based on Lemma 2, we can analyze the convergence of Algorithm 3. Let be a saddle point of (P’) and define
[TABLE]
Theorem 1.
Assume the map is -Lipschitz continuous. For Algorithm 3, for all it holds
[TABLE]
where is the total perturbation at iteration . Moreover, if we choose and is upper bounded by for all , then
[TABLE]
Combining the results from the last subsection, the theorem shows the desired result that the resilient PD-DRA method converges to a neighborhood of the saddle point of (P’). The size of this neighborhood depends on whether the median-based estimator (17) or Algorithm 2 is used, and on $\alpha$, the fraction of attacked uplink channels. Moreover, it shows that the convergence rate to the neighborhood is linear, which is similar to the classical PDA analysis [1].
Interestingly, Theorem 1 illustrates a trade-off in the choice of the step size between convergence speed and accuracy. Specifically, (25) shows that the convergence rate factor is minimized at a particular positive step size $\gamma$. Meanwhile, the asymptotic upper bound in (26) is increasing with $\gamma$, so it is minimized by taking $\gamma$ as small as possible. This will be a design criterion to be explored in practical implementations.
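The notion of linear convergence to a neighborhood can be illustrated with a generic perturbed contraction. The sketch below iterates $\delta_{k+1} = \rho\, \delta_k + e$ with a hypothetical contraction factor $\rho$ and a hypothetical constant perturbation $e$; these are placeholders and not the constants appearing in (25) and (26).

```python
def perturbed_contraction(rho, err, delta0, iters):
    """Iterate delta <- rho * delta + err and return the whole trajectory."""
    traj = [delta0]
    for _ in range(iters):
        traj.append(rho * traj[-1] + err)
    return traj

rho, err = 0.9, 0.05            # hypothetical contraction factor and per-step perturbation
traj = perturbed_contraction(rho, err, delta0=10.0, iters=300)
radius = err / (1.0 - rho)      # fixed point of the recursion, about 0.5
```

The distance to the fixed point shrinks by the factor $\rho$ at every step, while the limit $e / (1 - \rho)$ is nonzero whenever the perturbation is. A recursion with a factor closer to one converges more slowly, and a larger perturbation enlarges the neighborhood.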
5 Conclusions
In this paper, we studied strategies for securing a primal-dual algorithm for distributed resource allocation. In particular, we proposed a resilient distributed algorithm based on primal-dual optimization and robust statistics. We derived bounds on the performance of the studied algorithm and showed that it converges to a neighborhood of the solution of a robustified resource allocation problem when the number of attacked channels is small.
Acknowledgement
The authors would like to thank the anonymous reviewers for feedback, and Mr. Berkay Turan (UCSB) for pointing out typos in the original submission of this paper.
Appendix A Proof of Lemma 1
Since $g_t$ is $L$-smooth, the following holds
[TABLE]
Furthermore, observe that the gradient of $g_t$ is uniformly bounded by $B$ and the diameter of $\mathcal{C}_i$ is bounded by $R$, then the right hand side of (27) can be upper bounded by
[TABLE]
As such, defining $c_t := \frac{|\mathcal{A}|}{N}\big(RB + \frac{1}{2} L R^2\big)$, it can be seen that
[TABLE]
implies the desired constraint in (10).
Appendix B Proof of Proposition 1
Fix any $j \in [d]$. The assumption implies that for all $i \in \mathcal{H}$, one has
[TABLE]
We observe that . Applying [8, Lemma 1] shows that the median estimator (at each coordinate, the median is the geometric median estimator of one dimension in [8]) satisfies
[TABLE]
The above implies that for all , we have
[TABLE]
This implies that $r_j^{(k)} \leq \Big(1 + \sqrt{\frac{(1-\alpha)^2}{1-2\alpha}}\Big) r$ since . We then bound the performance of :
[TABLE]
thus
[TABLE]
Notice that and thus . Gathering terms shows
[TABLE]
The above holds for all $j \in [d]$. Applying the norm equivalence shows the desired bound.
Appendix C Proof of Lemma 2
Comparing the equations in (21) with (19), (20), we identify that
[TABLE]
[TABLE]
where $\big[\bm{e}_{\bm{\theta}}^{(k)}\big]_i$ denotes the $i$th block of $\bm{e}_{\bm{\theta}}^{(k)}$. Using H1 and the said assumptions, we immediately see that
[TABLE]
which then implies (22). H1 implies that $\nabla g_t$ is $L$-Lipschitz continuous, therefore
[TABLE]
which implies (23).
Appendix D Proof of Theorem 1
Based on Proposition 2, our idea is to perform a perturbation analysis on the PDA algorithm. Without loss of generality, we assume and denote . To simplify notations, we also drop the subscript, denote the modified and regularized Lagrangian function as . Furthermore, we denote the saddle point to (P’) as .
Using the fact that $\bm{\theta}^{\star} = \mathcal{P}_{\mathcal{C}}(\bm{\theta}^{\star}) = \mathcal{P}_{\mathcal{C}}\big(\bm{\theta}^{\star} - \gamma \nabla_{\bm{\theta}} \mathcal{L}(\bm{\theta}^{\star}, \bm{\lambda}^{\star})\big)$, we observe that in the primal update:
[TABLE]
where (a) is due to the projection inequality . Furthermore, using the Young’s inequality, for any , we have
[TABLE]
Similarly, in the dual update we get,
[TABLE]
Summing up the two inequalities gives:
[TABLE]
where (a) uses the strong monotonicity and smoothness of the map . Setting yields
[TABLE]
Observe that we can choose such that . Moreover, the above inequality implies that evaluates to
[TABLE]
If for all , then converges to a neighborhood of of radius
[TABLE]
Setting concludes the proof.
References
- [1] J. Koshal, A. Nedić, and U. V. Shanbhag, “Multiuser optimization: Distributed algorithms and error analysis,” SIAM Journal on Optimization, vol. 21, no. 3, pp. 1046–1081, 2011.
- [2] D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” IEEE JSAC, vol. 24, no. 8, pp. 1439–1451, 2006.
- [3] S. Sundaram and C. N. Hadjicostis, “Distributed function calculation via linear iterative strategies in the presence of malicious agents,” IEEE TAC, vol. 56, no. 7, pp. 1495–1508, 2011.
- [4] F. Pasqualetti, A. Bicchi, and F. Bullo, “Consensus computation in unreliable networks: A system theoretic approach,” IEEE TAC, vol. 57, no. 1, pp. 90–104, 2012.
- [5] R. Gentz, S. X. Wu, H.-T. Wai, A. Scaglione, and A. Leshem, “Data injection attacks in randomized gossiping,” IEEE TSIPN, vol. 2, no. 4, pp. 523–538, 2016.
- [6] S. Sundaram and B. Gharesifard, “Distributed optimization under adversarial nodes,” IEEE TAC, 2018.
- [7] Y. Chen, S. Kar, and J. Moura, “Resilient distributed estimation: Sensor attacks,” IEEE Transactions on Automatic Control, 2018.
- [8] J. Feng, H. Xu, and S. Mannor, “Distributed robust learning,” arXiv preprint arXiv:1409.5937, 2014.
