Resilient Distributed Optimization Algorithms for Resource Allocation

Cesar A. Uribe; Hoi-To Wai; Mahnoosh Alizadeh

arXiv:1904.02638·math.OC·September 11, 2019·CDC

TL;DR

This paper introduces a resilient distributed optimization algorithm for resource allocation that, despite Byzantine attacks on the communication channels, converges to a neighborhood of the solution of a robust optimization model, enhancing the security and robustness of resource allocation in cyber-physical systems.

Contribution

It develops a robust primal-dual algorithm that uses robust statistics to counteract Byzantine attacks in distributed resource allocation.

Findings

1. The algorithm converges to a neighborhood of the solution of the robust model.

2. The neighborhood's size is proportional to the fraction of attacked channels.

3. The approach enhances security in distributed resource management.

Abstract

Distributed algorithms provide flexibility over centralized algorithms for resource allocation problems, e.g., cyber-physical systems. However, the distributed nature of these algorithms often makes the systems susceptible to man-in-the-middle attacks, especially when messages are transmitted between price-taking agents and a central coordinator. We propose a resilient strategy for distributed algorithms under the framework of primal-dual distributed optimization. We formulate a robust optimization model that accounts for Byzantine attacks on the communication channels between agents and coordinator. We propose a resilient primal-dual algorithm using state-of-the-art robust statistics methods. The proposed algorithm is shown to converge to a neighborhood of the robust optimization model, where the neighborhood's radius is proportional to the fraction of attacked channels.

Equations

\begin{array}[]{rl}\displaystyle\min_{\bm{\theta}_{i}\in\mathbb{R}^{d},\forall i}&U(\bm{\theta})\mathrel{\mathop{:}}=\frac{1}{N}\sum_{i=1}^{N}U_{i}(\bm{\theta}_{i})\\ {\rm s.t.}&g_{t}\left(\frac{1}{N}\sum_{i=1}^{N}\bm{\theta}_{i}\right)\leq 0,~{}t=1,...,T,\vspace{.1cm}\\ &\bm{\theta}_{i}\in{\mathcal{}C}_{i},~{}i=1,...,N,\end{array}

\max_{\bm{\theta},\bm{\theta}^{\prime}\in\mathcal{C}_{i}}\|\bm{\theta}-\bm{\theta}^{\prime}\|\leq R,~i=1,\dots,N,

\begin{split}&{\mathcal{}L}(\{\bm{\theta}_{i}\}_{i=1}^{N};\bm{\lambda})\mathrel{\mathop{:}}=\frac{1}{N}\sum_{i=1}^{N}U_{i}(\bm{\theta}_{i})+\sum_{t=1}^{T}\lambda_{t}\!~{}g_{t}\Big{(}\frac{1}{N}\sum_{i=1}^{N}\bm{\theta}_{i}\Big{)}.\end{split}\vspace{-.1cm}

\max_{\bm{\lambda}\in\mathbb{R}_{+}^{T}}\ \min_{\bm{\theta}_{i}\in\mathcal{C}_{i},\,\forall i}\ \mathcal{L}(\{\bm{\theta}_{i}\}_{i=1}^{N};\bm{\lambda}).

\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}\}_{i=1}^{N};\bm{\lambda}):=\mathcal{L}(\{\bm{\theta}_{i}\}_{i=1}^{N};\bm{\lambda})+\frac{\upsilon}{2N}\sum_{i=1}^{N}\|\bm{\theta}_{i}\|^{2}-\frac{\upsilon}{2}\|\bm{\lambda}\|^{2},

\bm{\theta}_{i}^{(k+1)}=\mathcal{P}_{\mathcal{C}_{i}}\big(\bm{\theta}_{i}^{(k)}-\gamma\,\nabla_{\bm{\theta}_{i}}\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})\big),~\forall\,i\in[N],

\bm{\lambda}^{(k+1)}=\big[\bm{\lambda}^{(k)}+\gamma\,\nabla_{\bm{\lambda}}\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})\big]_{+}

\begin{split}&{\nabla}_{\bm{\theta}_{i}}{\mathcal{}L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})=\textstyle\frac{1}{N}\Big{(}{\nabla}_{\bm{\theta}_{i}}U_{i}(\bm{\theta}_{i}^{(k)})+\upsilon\!~{}\bm{\theta}_{i}^{(k)}\\ &\hskip 45.52458pt\textstyle+\sum_{t=1}^{T}\lambda_{t}^{(k)}{\nabla}_{\bm{\theta}}g_{t}(\bm{\theta})\Big{|}_{\bm{\theta}=\frac{1}{N}\sum_{i=1}^{N}\bm{\theta}_{i}^{(k)}}\Big{)},\\[-17.07182pt] \end{split}

\begin{split}&\big{[}{\nabla}_{\bm{\lambda}}{\mathcal{}L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})\big{]}_{t}=g_{t}\Big{(}{\textstyle\frac{1}{N}\sum_{i=1}^{N}}\bm{\theta}_{i}^{(k)}\Big{)}-\upsilon\!~{}\lambda_{t}^{(k)},\end{split}

\bm{\Phi}(\bm{z}^{(k)}):=\left(\begin{array}{c}\nabla_{\bm{\theta}}\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})\\ -\nabla_{\bm{\lambda}}\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})\end{array}\right).

\|\bm{z}^{(k+1)}-\bm{z}^{\star}\|^{2}\leq\big(1-2\gamma\upsilon+\gamma^{2}L_{\Phi}^{2}\big)\|\bm{z}^{(k)}-\bm{z}^{\star}\|^{2},

\bm{r}_{i}^{(k)}=\begin{cases}\bm{\theta}_{i}^{(k)}, & \text{if } i\in\mathcal{H},\\ \bm{b}_{i}^{(k)}, & \text{if } i\in\mathcal{A}.\end{cases}

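\begin{array}{rl}\displaystyle\min_{\bm{\theta}_{i}\in\mathcal{C}_{i},\,i\in\mathcal{H}}&\displaystyle\frac{1}{|\mathcal{H}|}\sum_{i\in\mathcal{H}}U_{i}(\bm{\theta}_{i})\\ {\rm s.t.}&\displaystyle\max_{\bm{\theta}_{j}\in\mathcal{C}_{j},\,j\in\mathcal{A}}g_{t}\Big(\frac{1}{N}\sum_{i\in\mathcal{H}}\bm{\theta}_{i}+\frac{1}{N}\sum_{j\in\mathcal{A}}\bm{\theta}_{j}\Big)\leq 0,~\forall\,t\in[T],\end{array}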

\overline{g}_{t}(\bm{\theta})\mathrel{\mathop{:}}=g_{t}(\bm{\theta})+{\textstyle\frac{|{\mathcal{}A}|}{N}}\big{(}RB+{\textstyle\frac{1}{2}}LR^{2}\big{)},

\begin{array}[]{rl}\displaystyle\min_{\bm{\theta}_{i}\in{\mathcal{}C}_{i},i\in{\mathcal{}H}}&\frac{1}{|{\mathcal{}H}|}\sum_{i\in{\mathcal{}H}}U_{i}(\bm{\theta}_{i})\vspace{.1cm}\\ {\rm s.t.}&\displaystyle\overline{g}_{t}\left({\textstyle\frac{1}{N}\sum_{i\in{\mathcal{}H}}\bm{\theta}_{i}}\right)\leq 0,~{}\forall~{}t\in[T],\end{array}

\overline{\mathcal{L}}_{\upsilon}(\{\bm{\theta}_{i}\}_{i\in\mathcal{H}};\bm{\lambda};\mathcal{H}):=\frac{1}{|\mathcal{H}|}\sum_{i\in\mathcal{H}}U_{i}(\bm{\theta}_{i})+\sum_{t=1}^{T}\lambda_{t}\,\overline{g}_{t}\Big(\frac{1}{N}\sum_{i\in\mathcal{H}}\bm{\theta}_{i}\Big)+\frac{\upsilon}{2|\mathcal{H}|}\sum_{i\in\mathcal{H}}\|\bm{\theta}_{i}\|^{2}-\frac{\upsilon}{2}\|\bm{\lambda}\|^{2}.

\max_{\bm{\lambda}\in\mathbb{R}_{+}^{T}}\ \min_{\bm{\theta}_{i}\in\mathcal{C}_{i},\,\forall i\in\mathcal{H}}\ \overline{\mathcal{L}}_{\upsilon}(\{\bm{\theta}_{i}\}_{i\in\mathcal{H}};\bm{\lambda};\mathcal{H}),

\overline{\bm{\theta}}_{\mathcal{H}}^{(k)}:=\frac{1}{|\mathcal{H}|}\sum_{i\in\mathcal{H}}\bm{\theta}_{i}^{(k)},

\big{[}{\bm{\theta}}_{\sf med}^{(k)}\big{]}_{j}={\sf med}\big{(}\{[{\bm{r}}_{i}^{(k)}]_{j}\}_{i=1}^{N}\big{)},

{\mathcal{}N}_{j}^{(k)}=\{i\in[N]:\big{|}\big{[}{\bm{r}}_{i}^{(k)}-{\bm{\theta}}_{\sf med}^{(k)}\big{]}_{j}\big{|}\leq r_{j}^{(k)}\},

\big[\widehat{\bm{\theta}}_{\mathcal{H}}^{(k)}\big]_{j}=\frac{1}{(1-\alpha)N}\sum_{i\in\mathcal{N}_{j}^{(k)}}\big[\bm{r}_{i}^{(k)}\big]_{j}.

\big{\|}\widehat{\bm{\theta}}^{(k)}_{\mathcal{}H}-\overline{\bm{\theta}}_{\mathcal{}H}^{(k)}\big{\|}\leq\frac{\alpha}{1-\alpha}\Big{(}2+\sqrt{\frac{(1-\alpha)^{2}}{1-2\alpha}}\Big{)}r\sqrt{d}\;.\vspace{-.1cm}

\displaystyle\max_{\begin{subarray}{c}Y\succeq 0,\\ \text{tr}(Y)\leq 1\end{subarray}}\min_{\begin{subarray}{c}0\leq W_{ij},\\ W_{ij}\leq\frac{4-\alpha}{\alpha(2+\alpha)n},\\ \sum_{j}W_{ji}=1\end{subarray}}\sum_{i\in\mathcal{B}}c_{i}(\bm{\theta}_{i}^{(k)}-X_{\mathcal{B}}w_{i})^{\top}Y(\bm{\theta}_{i}^{(k)}-X_{\mathcal{B}}w_{i})

\bm{\theta}_{i}^{(k+1)}={\mathcal{}P}_{{\mathcal{}C}_{i}}\big{(}\bm{\theta}_{i}^{(k)}-{\textstyle\frac{\gamma}{N}}\big{(}\widehat{\bm{g}}^{(k)}_{\mathcal{}H}+{\nabla}U_{i}(\bm{\theta}_{i}^{(k)})+\upsilon\bm{\theta}_{i}^{(k)}\big{)}\big{)},

\lambda_{t}^{(k+1)}=\big{[}\lambda_{t}^{(k)}+\gamma\big{(}\overline{g_{t}}({\textstyle\frac{|{\mathcal{}H}|}{N}}\widehat{\bm{\theta}}_{\mathcal{}H}^{(k)})-\upsilon\lambda_{t}^{(k)}\big{)}\big{]}_{+}.

\bm{g}_{\bm{\theta}}^{(k)}

\bm{g}_{\bm{\lambda}}^{(k)}

\|\bm{e}_{\bm{\theta}}^{(k)}\|\leq\overline{\lambda}LT\,\|\widehat{\bm{\theta}}_{\mathcal{H}}^{(k)}-\overline{\bm{\theta}}_{\mathcal{H}}^{(k)}\|,

\|\bm{e}_{\bm{\lambda}}^{(k)}\|\leq BT\,\|\widehat{\bm{\theta}}_{\mathcal{H}}^{(k)}-\overline{\bm{\theta}}_{\mathcal{H}}^{(k)}\|.

\overline{\bm{\Phi}}({\bm{z}}^{(k)})\mathrel{\mathop{:}}=\left(\begin{array}[]{c}{\nabla}_{\bm{\theta}}\overline{\mathcal{}L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i\in{\mathcal{}H}},\bm{\lambda}^{(k)};{\mathcal{}H})\\ -{\nabla}_{\bm{\lambda}}\overline{\mathcal{}L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i\in{\mathcal{}H}},\bm{\lambda}^{(k)};{\mathcal{}H})\end{array}\right),

Full text

Resilient Distributed Optimization Algorithms for Resource Allocation

César A. Uribe†, Hoi-To Wai†, Mahnoosh Alizadeh

†CAU and HTW have contributed equally. CAU is with LIDS, MIT, Cambridge, MA, USA. HTW is with Dept. of SEEM, CUHK, Shatin, Hong Kong. MA is with Dept. of ECE, UCSB, Santa Barbara, CA, USA. This work is partially supported by UCOP Grant LFR-18-548175 and CUHK Direct Grant #4055113.

1 Introduction

Consider the following multi-agent optimization problem involving the average of parameters in the constraints:

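$$\begin{array}{rl}\displaystyle\min_{\bm{\theta}_{i}\in\mathbb{R}^{d},\,\forall i}&U(\bm{\theta}):=\displaystyle\frac{1}{N}\sum_{i=1}^{N}U_{i}(\bm{\theta}_{i})\\ {\rm s.t.}&g_{t}\Big(\displaystyle\frac{1}{N}\sum_{i=1}^{N}\bm{\theta}_{i}\Big)\leq 0,~t=1,\dots,T,\\ &\bm{\theta}_{i}\in\mathcal{C}_{i},~i=1,\dots,N,\end{array}\tag{1}$$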

where both $U_{i}:\mathbb{R}^{d}\rightarrow\mathbb{R}$ and $g_{t}:\mathbb{R}^{d}\rightarrow\mathbb{R}$ are continuously differentiable, convex functions, and ${\mathcal{}C}_{i}$ is a compact convex set in $\mathbb{R}^{d}$ . We let ${\bm{0}}\in{\mathcal{}C}_{i}$ and

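$$\max_{\bm{\theta},\bm{\theta}^{\prime}\in\mathcal{C}_{i}}\|\bm{\theta}-\bm{\theta}^{\prime}\|\leq R,\quad i=1,\dots,N,\tag{2}$$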

such that $R$ is an upper bound on the diameters of ${\mathcal{}C}_{i}$ .

Problem (1) arises in many resource allocation problems with a set of potentially nonlinear constraints on the amount of allowable resources; see Section 1.1 for examples.

We consider a system with a central coordinator and $N$ agents. In this context, $U_{i}(\bm{\theta}_{i})$ is the utility of the $i$th agent and $\bm{\theta}_{i}$ is the resource controlled by agent $i$. As the agents work independently, it is desirable to design algorithms that allow the $N$ agents to solve (1) cooperatively through communication with the central coordinator. Among other approaches, primal-dual optimization methods [1] have been advocated, as they naturally give rise to decomposable algorithms that favor distributed implementation [2]. In addition to their practical success, these methods are supported by strong theoretical guarantees, and fast convergence to an optimal solution of (1) is well established. However, the distributed nature of these methods also exposes the system to vulnerabilities not faced by traditional centralized systems. Specifically, existing algorithms assume that the agents, and the communication links between the central coordinator and the agents, are completely trustworthy. An attacker, however, can take over a sub-system operated by the agents and deliberately edit the messages on these communication links, i.e., launch a Byzantine attack. This may destabilize the system and damage the hardware and the system as a whole.

In this paper, we propose strategies for securing primal-dual distributed algorithms, such as those in [1], tailored to solving a relaxed version of the resource allocation problem (1). A key observation is that the existing algorithms depend on reliably computing the average of the parameter vectors $\{\bm{\theta}_{i}\}_{i=1}^{N}$ transmitted by the agents. As a remedy, we use robust statistics techniques as a subroutine, thereby obtaining a resilient distributed algorithm that provably converges to a neighborhood of the optimal solution of a robust version of (1).

Vulnerabilities of various types of distributed algorithms have been identified and addressed in a number of recent studies. Relevant examples are [3, 4, 5, 6, 7], which study secure decentralized algorithms on general network topologies but consider consensus-based optimization models. Moreover, [8, 9, 10] consider an optimization architecture similar to that of this paper, yet they focus on securing distributed algorithms for machine learning tasks with i.i.d. functions, a fundamentally different setting from the current paper. Our work is also related to the literature on robust statistics [11, 12], and in particular to the recently rekindled research efforts on high-dimensional robust statistics [13, 14, 15]. These works are the workhorse of our attack-resilient algorithm.

Our contributions and organization are as follows. First, we derive a formal model for attack-resilient resource allocation via a conservative approximation of the robust optimization problem [cf. Section 3]. Second, we apply existing robust estimation results, and derive new ones, to secure distributed resource allocation algorithms [cf. Section 4]. Third, we provide a non-asymptotic convergence guarantee for the proposed attack-resilient algorithm [cf. Section 4.1]. In particular, our algorithm is shown to converge to an ${\mathcal{}O}(\alpha^{2})$ neighborhood of the optimal solution of a robust version of (1), where $\alpha\in[0,\frac{1}{2})$ is the fraction of attacked links.

Notations. Unless otherwise specified, $\|\cdot\|$ denotes the standard Euclidean norm. For any $N\in\mathbb{N}$ , $[N]$ denotes the finite set $\{1,...,N\}$ .

1.1 Motivating Examples

Our set-up can be employed in a wide range of optimization problems for resource allocation and networked control in multi-agent systems, e.g., in the pioneering example of congestion control in data networks [16, 17]; in determining the optimal price of electricity and enabling more efficient demand-supply balancing (a.k.a. demand response) in smart power distribution systems [18, 19]; in managing user transmit powers and data rates in wireless cellular networks [20]; in determining optimal caching policies by content delivery networks [21]; in optimizing power consumption in wireless sensor networks with energy-restricted batteries [22, 23]; and in designing congestion control systems for urban traffic networks [24]. These examples involve different utility functions and constraint sets, all of which can be handled by our general formulation (1). For example, in the power/rate control problem in data networks, the cost functions are usually logarithmic functions of the rate $\theta_{i}$, e.g., $U_{i}(\theta_{i})=-\beta_{i}\log(\theta_{i})$. In demand response applications in power distribution systems, the utilities capture the users' benefits from operating their electric appliances under different settings. For example, the cost function of the temperature $\theta_{i}$ controlled by a price-responsive air conditioner can be modeled as $U_{i}(\theta_{i})=b_{i}(\theta_{i}-\theta_{\rm{comf}})^{2}-c_{i}$ [19]. In terms of constraints, our general nonlinear formulation can capture not only common linear resource constraints, such as link capacities in data networks [16, 17], but also important nonlinear constraints arising in many different applications. For example, in radial power distribution systems, nonlinear convexified power flow constraints can be included for distributed demand response optimization (for a description of distribution-system power flow constraints, see, e.g., [25, 26]). This enables our algorithm to perform demand-supply balancing in power distribution systems in a distributed and resilient fashion.

2 Primal-dual Algorithm for Resource Allocation

This section reviews the basic primal-dual algorithm for resource allocation. Let $\bm{\lambda}\in\mathbb{R}_{+}^{T}$ be the dual variable. We consider the Lagrangian function of (1):

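$$\mathcal{L}(\{\bm{\theta}_{i}\}_{i=1}^{N};\bm{\lambda}):=\frac{1}{N}\sum_{i=1}^{N}U_{i}(\bm{\theta}_{i})+\sum_{t=1}^{T}\lambda_{t}\,g_{t}\Big(\frac{1}{N}\sum_{i=1}^{N}\bm{\theta}_{i}\Big).$$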

Assuming strong duality holds (e.g., under Slater's condition), solving problem (1) is equivalent to solving its dual problem:

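$$\max_{\bm{\lambda}\in\mathbb{R}_{+}^{T}}\ \min_{\bm{\theta}_{i}\in\mathcal{C}_{i},\,\forall i}\ \mathcal{L}(\{\bm{\theta}_{i}\}_{i=1}^{N};\bm{\lambda}).\tag{P}$$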

For a given $\bm{\lambda}$ , the inner minimization of (P) is known as the Lagrangian relaxation of (1), which can be interpreted as a penalized resource allocation problem [19].

In a distributed setting, the goal is to solve (1) while the agents only observe a pricing signal received from the central coordinator, which updates this signal iteratively. As suggested in [1], we apply the primal-dual algorithm (PDA) to a regularized version of (P). Let us define

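$$\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}\}_{i=1}^{N};\bm{\lambda}):=\mathcal{L}(\{\bm{\theta}_{i}\}_{i=1}^{N};\bm{\lambda})+\frac{\upsilon}{2N}\sum_{i=1}^{N}\|\bm{\theta}_{i}\|^{2}-\frac{\upsilon}{2}\|\bm{\lambda}\|^{2},$$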

such that ${\mathcal{}L}_{\upsilon}(\cdot)$ is $\upsilon$-strongly convex in $\{\bm{\theta}_{i}\}_{i=1}^{N}$ and $\upsilon$-strongly concave in $\bm{\lambda}$. Let $k\in\mathbb{Z}_{+}$ be the iteration index and $\gamma>0$ the step size; the PDA recursion is:

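$$\begin{aligned}\bm{\theta}_{i}^{(k+1)}&=\mathcal{P}_{\mathcal{C}_{i}}\big(\bm{\theta}_{i}^{(k)}-\gamma\,\nabla_{\bm{\theta}_{i}}\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})\big),~\forall\,i\in[N],\\ \bm{\lambda}^{(k+1)}&=\big[\bm{\lambda}^{(k)}+\gamma\,\nabla_{\bm{\lambda}}\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})\big]_{+},\end{aligned}\tag{4}$$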

where ${\mathcal{}P}_{{\mathcal{}C}_{i}}(\cdot)$ is the Euclidean projection operator, $[\cdot]_{+}$ denotes $\max\{0,\cdot\}$ , and the gradients are:

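$$\begin{aligned}\nabla_{\bm{\theta}_{i}}\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})&=\tfrac{1}{N}\Big(\nabla U_{i}(\bm{\theta}_{i}^{(k)})+\upsilon\,\bm{\theta}_{i}^{(k)}+\textstyle\sum_{t=1}^{T}\lambda_{t}^{(k)}\nabla g_{t}\big(\tfrac{1}{N}\textstyle\sum_{j=1}^{N}\bm{\theta}_{j}^{(k)}\big)\Big),\\ \big[\nabla_{\bm{\lambda}}\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})\big]_{t}&=g_{t}\big(\tfrac{1}{N}\textstyle\sum_{i=1}^{N}\bm{\theta}_{i}^{(k)}\big)-\upsilon\,\lambda_{t}^{(k)},\end{aligned}$$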

for all $i$, $t$. Here, $[{\bm{x}}]_{t}$ denotes the $t$th element of ${\bm{x}}\in\mathbb{R}^{T}$. In particular, (4) performs projected gradient descent on the primal variables and projected gradient ascent on the dual variables.

From the above, the gradients with respect to (w.r.t.) $\bm{\theta}_{i}$ and $\lambda_{t}$ depend on the other agents' parameters only through the average $\overline{\bm{\theta}}^{(k)}\mathrel{\mathop{:}}=\frac{1}{N}\sum_{i=1}^{N}\bm{\theta}_{i}^{(k)}$. We summarize the primal-dual distributed resource allocation (PD-DRA) procedure in Algorithm 1. In addition to solving the general problem (1), Algorithm 1 also serves as a general solution method for popular resource allocation problems [19].
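To make the recursion (4) concrete, the following Python sketch runs PD-DRA on a small hypothetical instance that is not taken from the paper: scalar resources, quadratic costs $U_i(\theta_i)=\frac{1}{2}(\theta_i-a_i)^2$, boxes $\mathcal{C}_i=[0,R]$, and a single linear constraint $g(\overline{\theta})=\overline{\theta}-\mathrm{cap}\leq 0$. The names `pd_dra`, `a`, and `cap` are illustrative. The coordinator only needs the average $\overline{\theta}^{(k)}$, while each agent evaluates its own gradient locally; the step size is chosen small enough for the iteration to contract on this instance.

```python
def project_interval(x, lo, hi):
    """Euclidean projection onto the interval [lo, hi] (the set C_i in this toy)."""
    return min(max(x, lo), hi)


def pd_dra(a, cap, R=1.0, upsilon=0.1, gamma=0.3, iters=5000):
    """Sketch of the PD-DRA recursion (4) on a hypothetical instance:
    U_i(theta) = 0.5 * (theta - a_i)**2, C_i = [0, R], g(avg) = avg - cap."""
    n = len(a)
    theta = [0.0] * n
    lam = 0.0
    for _ in range(iters):
        # The coordinator only needs the average parameter.
        avg = sum(theta) / n
        # Primal step: grad = (1/N) * (U_i'(theta_i) + upsilon*theta_i + lam * g'(avg)), with g' = 1.
        new_theta = [
            project_interval(th - gamma * ((th - ai) + upsilon * th + lam) / n, 0.0, R)
            for th, ai in zip(theta, a)
        ]
        # Dual step: grad = g(avg) - upsilon*lam, followed by [.]_+.
        lam = max(0.0, lam + gamma * ((avg - cap) - upsilon * lam))
        theta = new_theta
    return theta, lam


theta, lam = pd_dra([0.9, 0.8, 0.7, 0.6], cap=0.5)
print(round(lam, 3), round(sum(theta) / len(theta), 3))
```

For this instance, setting both gradients to zero gives $\lambda^\star=(\bar a-\mathrm{cap}(1+\upsilon))/(1+\upsilon(1+\upsilon))=0.2/1.11\approx 0.180$ and an average allocation of about 0.518, slightly above `cap`: the regularization $-\frac{\upsilon}{2}\|\bm{\lambda}\|^2$ moves the saddle point of the regularized problem off the exact constraint boundary.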

As the regularized problem is strongly convex in the primal variables and strongly concave in the dual variables, Algorithm 1 converges linearly to its saddle point [1]. To study this, let us denote by ${\bm{z}}^{(k)}=(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N},\bm{\lambda}^{(k)})$ the primal-dual variable at the $k$th iteration and define

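$$\bm{\Phi}(\bm{z}^{(k)}):=\left(\begin{array}{c}\nabla_{\bm{\theta}}\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})\\ -\nabla_{\bm{\lambda}}\mathcal{L}_{\upsilon}(\{\bm{\theta}_{i}^{(k)}\}_{i=1}^{N};\bm{\lambda}^{(k)})\end{array}\right).$$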

Fact 1.

[1, Theorem 3.5] Assume that the map $\bm{\Phi}({\bm{z}}^{(k)})$ is $L_{\Phi}$-Lipschitz continuous. For all $k\geq 1$, we have

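$$\|\bm{z}^{(k+1)}-\bm{z}^{\star}\|^{2}\leq\big(1-2\gamma\upsilon+\gamma^{2}L_{\Phi}^{2}\big)\,\|\bm{z}^{(k)}-\bm{z}^{\star}\|^{2},$$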

where ${\bm{z}}^{\star}$ is a saddle point of the regularized version of (P). Setting $\gamma=\upsilon/L_{\Phi}^{2}$ gives $\|{\bm{z}}^{(k+1)}-{\bm{z}}^{\star}\|^{2}\leq\big(1-\upsilon^{2}/L_{\Phi}^{2}\big)\|{\bm{z}}^{(k)}-{\bm{z}}^{\star}\|^{2}$, $\forall~k\geq 1$.
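As a quick arithmetic check of Fact 1, the snippet below evaluates the contraction factor $1-2\gamma\upsilon+\gamma^2L_\Phi^2$ at the step size $\gamma=\upsilon/L_\Phi^2$ and counts how many iterations shrink the squared distance to $\bm{z}^\star$ by a factor of $10^{-6}$. The values $\upsilon=0.1$ and $L_\Phi=2$ are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math


def contraction_factor(gamma, upsilon, L_phi):
    """Factor multiplying ||z^(k) - z*||^2 in the bound of Fact 1."""
    return 1.0 - 2.0 * gamma * upsilon + gamma**2 * L_phi**2


def iterations_to_shrink(rho, ratio):
    """Smallest k with rho**k <= ratio (requires 0 < rho < 1 and 0 < ratio < 1)."""
    return math.ceil(math.log(ratio) / math.log(rho))


upsilon, L_phi = 0.1, 2.0        # illustrative values, not from the paper
gamma = upsilon / L_phi**2       # step size suggested after Fact 1
rho = contraction_factor(gamma, upsilon, L_phi)
# With this step size the factor simplifies to 1 - upsilon^2 / L_phi^2.
k = iterations_to_shrink(rho, 1e-6)
print(rho, k)
```

The factor approaches 1 as $\upsilon/L_\Phi$ shrinks, so a small regularization parameter $\upsilon$ translates directly into slow (though still linear) convergence.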

3 Problem Formulation

Despite its simplicity and strong theoretical guarantees, the PD-DRA method is susceptible to attacks on the channels between the central coordinator and the agents, as described below.

Attack Model. We consider a situation in which uplink channels between agents and the central coordinator are compromised [see Fig. 1]. Let ${\mathcal{}A}\subset[N]$ be the set of compromised uplink channels, whose identities are unknown to the central coordinator. We define ${\mathcal{}H}\mathrel{\mathop{:}}=[N]\setminus{\mathcal{}A}$ as the set of trustworthy channels. At iteration $k$, instead of receiving $\bm{\theta}_{i}^{(k)}$ from each agent $i\in[N]$ [cf. Step 2 (a)], the central coordinator receives the following messages:

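$$\bm{r}_{i}^{(k)}=\begin{cases}\bm{\theta}_{i}^{(k)}, & \text{if } i\in\mathcal{H},\\ \bm{b}_{i}^{(k)}, & \text{if } i\in\mathcal{A}.\end{cases}\tag{9}$$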

We focus on a Byzantine attack scenario in which the messages ${\bm{b}}_{i}^{(k)}$ communicated on the attacked channels can be arbitrary. In such a scenario, if the central coordinator forms the naive average $\widehat{\bm{\theta}}^{(k)}=\frac{1}{N}\sum_{i=1}^{N}{\bm{r}}_{i}^{(k)}$ and computes the gradients ${\nabla}g_{t}(\widehat{\bm{\theta}}^{(k)})$ accordingly, the resulting error is uncontrollable, since the deviation $\widehat{\bm{\theta}}^{(k)}-\frac{1}{N}\sum_{i=1}^{N}\bm{\theta}_{i}^{(k)}$ can be arbitrarily large. Consequently, the PD-DRA method cannot be expected to find a solution of the regularized version of (P).
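The following Python sketch illustrates why the naive average fails. With a single compromised uplink out of $N$, the error of the naive average equals $(b-\theta_4)/N$ and grows without bound as the injected message $b$ grows. The parameter values are hypothetical and scalar ($d=1$) for readability, and the function names are illustrative.

```python
def received_messages(theta, attacked, byzantine_value):
    """Messages (9): honest uplinks forward theta_i, attacked uplinks forward b_i."""
    return [byzantine_value if i in attacked else th for i, th in enumerate(theta)]


def naive_average_error(theta, attacked, byzantine_value):
    """Deviation of the naive average of the received messages from the true average."""
    r = received_messages(theta, attacked, byzantine_value)
    return sum(r) / len(r) - sum(theta) / len(theta)


theta = [0.2, 0.4, 0.6, 0.8, 0.5]   # hypothetical scalar parameters, N = 5
for b in (1.0, 10.0, 1000.0):
    # One compromised uplink (index 4); the error equals (b - theta[4]) / N.
    print(b, naive_average_error(theta, attacked={4}, byzantine_value=b))
```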

Robust Optimization Model. In light of the Byzantine attack, it is impossible to optimize the original problem (P), since the contributions of $U_{i}(\cdot)$, $i\in{\mathcal{}A}$, are unknown to the central coordinator. As a compromise, we focus on optimizing the cost functions of the agents with trustworthy uplinks and adopt the following robust optimization problem as our target model:

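$$\begin{array}{rl}\displaystyle\min_{\bm{\theta}_{i}\in\mathcal{C}_{i},\,i\in\mathcal{H}}&\displaystyle\frac{1}{|\mathcal{H}|}\sum_{i\in\mathcal{H}}U_{i}(\bm{\theta}_{i})\\ {\rm s.t.}&\displaystyle\max_{\bm{\theta}_{j}\in\mathcal{C}_{j},\,j\in\mathcal{A}}g_{t}\Big(\frac{1}{N}\sum_{i\in\mathcal{H}}\bm{\theta}_{i}+\frac{1}{N}\sum_{j\in\mathcal{A}}\bm{\theta}_{j}\Big)\leq 0,~\forall\,t\in[T],\end{array}\tag{10}$$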

Note that $\{\bm{\theta}_{j}\}_{j\in{\mathcal{}A}}$ is removed from the decision variables, and that we have included (10b) to account for the worst-case resource usage of the agents with compromised uplinks. This ensures that the physical operating limits of the system are not violated under attack. The following assumption is made throughout the paper:

H1.

For all $\bm{\theta}\in\mathbb{R}^{d}$ , the gradient of $g_{t}$ is bounded with $\|{\nabla}g_{t}(\bm{\theta})\|\leq B$ and is $L$ -Lipschitz continuous.

We define

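$$\overline{g}_{t}(\bm{\theta}):=g_{t}(\bm{\theta})+\frac{|\mathcal{A}|}{N}\Big(RB+\frac{1}{2}LR^{2}\Big),\tag{11}$$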

Lemma 1.

Under H1, the following problem is a conservative approximation of (10), i.e., its feasible set is a subset of the feasible set of (10):

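$$\begin{array}{rl}\displaystyle\min_{\bm{\theta}_{i}\in\mathcal{C}_{i},\,i\in\mathcal{H}}&\displaystyle\frac{1}{|\mathcal{H}|}\sum_{i\in\mathcal{H}}U_{i}(\bm{\theta}_{i})\\ {\rm s.t.}&\displaystyle\overline{g}_{t}\Big(\frac{1}{N}\sum_{i\in\mathcal{H}}\bm{\theta}_{i}\Big)\leq 0,~\forall\,t\in[T],\end{array}\tag{12}$$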
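The conservative margin in (11) can be sanity-checked numerically. The margin follows from $g_t(\bm{x}+\bm{\delta})\leq g_t(\bm{x})+B\|\bm{\delta}\|+\frac{L}{2}\|\bm{\delta}\|^2$ with $\|\bm{\delta}\|\leq|\mathcal{A}|R/N$ (since $\bm{0}\in\mathcal{C}_j$) and $|\mathcal{A}|/N\leq 1$. The sketch below uses a hypothetical scalar constraint $g(x)=\log(1+e^x)-\mathrm{cap}$, whose derivative is the logistic function, so H1 holds with $B=1$ and $L=1/4$, together with boxes $\mathcal{C}_j=[0,R]$ that contain $0$ and have diameter $R$. It grid-searches the attacked agents' resource usage and checks that the worst-case value of $g$ never exceeds the tightened $\overline{g}$; all names and values are illustrative.

```python
import itertools
import math


def g(x, cap=1.0):
    """Hypothetical constraint softplus(x) - cap; g' is the logistic function (B = 1, L = 1/4)."""
    return math.log1p(math.exp(x)) - cap


def g_bar(x, n_attacked, n, R, B=1.0, L=0.25, cap=1.0):
    """Tightened constraint (11): g(x) + |A|/N * (R*B + L*R^2/2)."""
    return g(x, cap) + (n_attacked / n) * (R * B + 0.5 * L * R**2)


def worst_case_excess(x_h, n_attacked, n, R, grid=21):
    """Max over a grid of theta_j in [0, R], j in A, of g(x_h + sum_j theta_j / N), minus g_bar(x_h).
    Here x_h plays the role of (1/N) * sum_{i in H} theta_i."""
    points = [R * k / (grid - 1) for k in range(grid)]
    worst = max(g(x_h + sum(combo) / n)
                for combo in itertools.product(points, repeat=n_attacked))
    return worst - g_bar(x_h, n_attacked, n, R)


# Two attacked uplinks out of N = 10, boxes of diameter R = 3.
excess = [worst_case_excess(x, n_attacked=2, n=10, R=3.0) for x in (-5.0, -1.0, 0.0, 1.0, 5.0)]
print(max(excess))  # non-positive: g_bar <= 0 implies the worst-case constraint holds
```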

Similar to PD-DRA, we define the regularized Lagrangian function of (12) as:

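$$\overline{\mathcal{L}}_{\upsilon}(\{\bm{\theta}_{i}\}_{i\in\mathcal{H}};\bm{\lambda};\mathcal{H}):=\frac{1}{|\mathcal{H}|}\sum_{i\in\mathcal{H}}U_{i}(\bm{\theta}_{i})+\sum_{t=1}^{T}\lambda_{t}\,\overline{g}_{t}\Big(\frac{1}{N}\sum_{i\in\mathcal{H}}\bm{\theta}_{i}\Big)+\frac{\upsilon}{2|\mathcal{H}|}\sum_{i\in\mathcal{H}}\|\bm{\theta}_{i}\|^{2}-\frac{\upsilon}{2}\|\bm{\lambda}\|^{2}.\tag{13}$$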

Again, the regularized Lagrangian function is $\upsilon$-strongly convex in $\bm{\theta}$ and $\upsilon$-strongly concave in $\bm{\lambda}$.

Our main task is to tackle the following modified problem of (P) under Byzantine attack on (some of) the uplinks:

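$$\max_{\bm{\lambda}\in\mathbb{R}_{+}^{T}}\ \min_{\bm{\theta}_{i}\in\mathcal{C}_{i},\,\forall i\in\mathcal{H}}\ \overline{\mathcal{L}}_{\upsilon}(\{\bm{\theta}_{i}\}_{i\in\mathcal{H}};\bm{\lambda};\mathcal{H}),\tag{P'}$$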

and we let $\widehat{\bm{z}}^{\star}=(\widehat{\bm{\theta}}^{\star},\widehat{\bm{\lambda}}^{\star})$ be the saddle point of (P’). Notice that (P’) has a form similar to (P), so one may naturally apply the PD-DRA method to it. However, doing so requires the central coordinator to compute the sample average

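$$\overline{\bm{\theta}}_{\mathcal{H}}^{(k)}:=\frac{1}{|\mathcal{H}|}\sum_{i\in\mathcal{H}}\bm{\theta}_{i}^{(k)},\tag{14}$$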

at each iteration. This average cannot be computed directly under the attack model, since the central coordinator does not know the identity of ${\mathcal{}H}$. Approximating it without this knowledge is the main challenge addressed by our scheme.

4 Robust Distributed Resource Allocation

In this section, we describe two estimators for approximating $\overline{\bm{\theta}}^{(k)}_{\mathcal{}H}$ [cf. (14)] from the received messages (9) without knowing which links belong to ${\mathcal{}H}$. To simplify notation, we let $\alpha\geq|{\mathcal{}A}|/N$ be a known upper bound on the fraction of compromised channels and assume $\alpha<1/2$, i.e., fewer than half of the channels are compromised.

As discussed after (14), the problem at hand is robust mean estimation, whose applications to robust distributed optimization have been considered in the machine learning literature [9, 10, 14] under the assumption that the ‘trustworthy’ signals are drawn i.i.d. from a Gaussian distribution. Our setting is different, since the signals $\bm{\theta}_{i}^{(k)}$, $i\in{\mathcal{}H}$, are variables from the previous iteration, whose distribution is in general non-Gaussian. Our analysis is developed without any such distributional assumption.

We first consider a simple median-based estimator applied to each coordinate $j=1,...,d$. To this end, define the coordinate-wise median as:

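$$\big[\bm{\theta}_{\sf med}^{(k)}\big]_{j}={\sf med}\big(\{[\bm{r}_{i}^{(k)}]_{j}\}_{i=1}^{N}\big),\tag{15}$$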

where ${\sf med}(\cdot)$ computes the median of the operand. Then, our estimator is computed as the mean of the nearest $(1-\alpha)N$ neighbors of $\big{[}{\bm{\theta}}_{\sf med}^{(k)}\big{]}_{j}$ . To formally describe this, let us define:

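$$\mathcal{N}_{j}^{(k)}=\big\{i\in[N]:\big|\big[\bm{r}_{i}^{(k)}-\bm{\theta}_{\sf med}^{(k)}\big]_{j}\big|\leq r_{j}^{(k)}\big\},\tag{16}$$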

where $r_{j}^{(k)}$ is chosen such that $|{\mathcal{}N}_{j}^{(k)}|=(1-\alpha)N$. Our estimator is:

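$$\big[\widehat{\bm{\theta}}_{\mathcal{H}}^{(k)}\big]_{j}=\frac{1}{(1-\alpha)N}\sum_{i\in\mathcal{N}_{j}^{(k)}}\big[\bm{r}_{i}^{(k)}\big]_{j}.\tag{17}$$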
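A minimal Python sketch of the coordinate-wise estimator (15)-(17), assuming $(1-\alpha)N$ is an integer and breaking ties among equidistant messages arbitrarily. Choosing the radius $r_j^{(k)}$ so that $|\mathcal{N}_j^{(k)}|=(1-\alpha)N$ is equivalent to keeping the $(1-\alpha)N$ received values closest to the coordinate-wise median, which is how the sketch implements it. The function name and data are illustrative.

```python
import statistics


def median_trimmed_mean(messages, alpha):
    """Coordinate-wise estimator (15)-(17).

    messages: list of N received vectors r_i (sequences of length d).
    alpha: assumed upper bound on the fraction of attacked uplinks.
    """
    n, d = len(messages), len(messages[0])
    keep = round((1 - alpha) * n)  # |N_j| = (1 - alpha) N, assumed integral
    estimate = []
    for j in range(d):
        column = [m[j] for m in messages]
        med = statistics.median(column)                              # (15)
        nearest = sorted(column, key=lambda v: abs(v - med))[:keep]  # N_j in (16)
        estimate.append(sum(nearest) / keep)                         # (17)
    return estimate


honest = [(0.4, 0.1), (0.45, 0.2), (0.5, 0.3), (0.55, 0.4),
          (0.6, 0.5), (0.5, 0.6), (0.5, 0.7), (0.5, 0.8)]
attacked = [(100.0, -100.0), (100.0, -100.0)]  # arbitrary Byzantine messages
est = median_trimmed_mean(honest + attacked, alpha=0.2)
print(est)
```

In this example the estimate coincides with the honest mean $(0.5, 0.45)$ because the injected messages lie far from the median; Proposition 1 is what covers the harder case where attackers place their messages inside the honest range.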

The following proposition bounds the estimation error of (17).

Proposition 1.

Suppose that $\max_{i\in{\mathcal{}H}}\big{\|}\bm{\theta}_{i}^{(k)}-\overline{\bm{\theta}}_{\mathcal{}H}^{(k)}\big{\|}_{\infty}\leq r$ , then for any $\alpha\in(0,\frac{1}{2})$ , it holds that

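$$\big\|\widehat{\bm{\theta}}_{\mathcal{H}}^{(k)}-\overline{\bm{\theta}}_{\mathcal{H}}^{(k)}\big\|\leq\frac{\alpha}{1-\alpha}\Big(2+\sqrt{\frac{(1-\alpha)^{2}}{1-2\alpha}}\Big)r\sqrt{d}.\tag{18}$$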

Under mild assumptions, the condition $\max_{i\in{\mathcal{}H}}\big{\|}\bm{\theta}_{i}^{(k)}-\overline{\bm{\theta}}_{\mathcal{}H}^{(k)}\big{\|}_{\infty}\leq r$ can be satisfied with $r=\Theta(R)$, as implied by the compactness of ${\mathcal{}C}_{i}$ [cf. (2)]. Moreover, for sufficiently small $\alpha$, the right-hand side of (18) can be approximated by ${\mathcal{}O}(\alpha R\sqrt{d})$. However, this median-based estimator may perform poorly for large $\alpha$ (especially when $\alpha\rightarrow 1/2$) or large dimension $d$. For these situations, a more sophisticated estimator is required, as detailed next.
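To see how the bound (18) behaves, the sketch below evaluates its right-hand side and compares it with the small-$\alpha$ approximation $3\alpha r\sqrt{d}$ (as $\alpha\to 0$, $\alpha/(1-\alpha)\to\alpha$ and the square root tends to 1). The ratio stays close to 1 for small $\alpha$ but grows without bound as $\alpha\to 1/2$, and the bound scales with $\sqrt{d}$, matching the limitations noted above. The function name is illustrative.

```python
import math


def median_bound(alpha, r, d):
    """Right-hand side of (18); requires 0 < alpha < 1/2."""
    return (alpha / (1 - alpha)
            * (2 + math.sqrt((1 - alpha) ** 2 / (1 - 2 * alpha)))
            * r * math.sqrt(d))


for alpha in (0.01, 0.1, 0.3, 0.45, 0.49):
    # Ratio to the small-alpha approximation 3 * alpha * r * sqrt(d), with r = d = 1.
    print(alpha, median_bound(alpha, r=1.0, d=1) / (3 * alpha))
```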

To derive the second estimator, we apply an auxiliary result from [15] which provides an algorithm for estimating $\overline{\bm{\theta}}_{\mathcal{}H}^{(k)}$ , as summarized in Algorithm 2. We observe:

Proposition 2.

[15, Proposition 16] Suppose that $\lambda_{\max}(\frac{1}{|{\mathcal{}H}|}\sum_{i\in{\mathcal{}H}}(\bm{\theta}_{i}^{(k)}-\overline{\bm{\theta}}_{\mathcal{}H}^{(k)})(\bm{\theta}_{i}^{(k)}-\overline{\bm{\theta}}_{\mathcal{}H}^{(k)})^{\top})\leq\sigma^{2}$. For any $\alpha\in[0,\frac{1}{4})$, Algorithm 2 produces an output such that $\|\overline{\bm{\theta}}_{\mathcal{}H}^{(k)}-\widehat{\bm{\theta}}_{\mathcal{}H}^{(k)}\|={\mathcal{}O}(\sigma\sqrt{\alpha})$.

Again, as for Proposition 1, the required condition above can be satisfied with $\sigma=\Theta(R)$ under mild conditions. Thus, Proposition 2 states that Algorithm 2 recovers $\overline{\bm{\theta}}_{\mathcal{}H}^{(k)}$ up to an error of ${\mathcal{}O}(\sqrt{\alpha}R)$. Note that, unlike the bound for the median-based estimator in Proposition 1, this bound is dimension-free.

The idea behind Algorithm 2 is to sequentially identify and remove the points that cannot be reconstructed from the mean of the data points. The solution of the optimization problem in Line 3 measures how well we can recover the data points as averages of the other $|{\mathcal{}H}|$ points. The bounded sample variance assumption guarantees that any element of ${\mathcal{}H}$ can be reconstructed from its mean; thus, all points that introduce a large error, as quantified by $c_{i}$, can be safely removed. Line 5 quantifies the magnitude of the optimal solution of Line 3; if this value is large, the points that introduce a large error are down-weighted. The process is repeated until the optimal solution of Line 3 is small enough, at which point a low-rank approximation of the optimal $W$ can be used to return the sample mean estimate.

Attack Resilient PD-DRA method. The estimators above are the enabling tool for developing the resilient PD-DRA method, which we summarize in Algorithm 3. The algorithm behaves like Algorithm 1 applied to (P’), except that the central coordinator, being oblivious to ${\mathcal{}H}$, uses a robust mean estimator to approximate the average of the signals sent through the trustworthy links. This approximate value is used to compute the new price signals, which are sent back to the agents. In particular, the primal-dual updates are described by

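$$\begin{aligned}\bm{\theta}_{i}^{(k+1)}&=\mathcal{P}_{\mathcal{C}_{i}}\Big(\bm{\theta}_{i}^{(k)}-\tfrac{\gamma}{N}\big(\widehat{\bm{g}}_{\mathcal{H}}^{(k)}+\nabla U_{i}(\bm{\theta}_{i}^{(k)})+\upsilon\,\bm{\theta}_{i}^{(k)}\big)\Big),\\ \lambda_{t}^{(k+1)}&=\Big[\lambda_{t}^{(k)}+\gamma\Big(\overline{g}_{t}\big(\tfrac{|\mathcal{H}|}{N}\widehat{\bm{\theta}}_{\mathcal{H}}^{(k)}\big)-\upsilon\,\lambda_{t}^{(k)}\Big)\Big]_{+}.\end{aligned}$$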
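The following Python sketch simulates the primal-dual updates of Algorithm 3 on a hypothetical scalar instance that is not from the paper: honest agents with $U_i(\theta)=\frac{1}{2}(\theta-a_i)^2$ and $\mathcal{C}_i=[0,R]$, and a linear constraint $g(x)=x-\mathrm{cap}$, so H1 holds with $B=1$, $L=0$ and the margin in (11) is $(|\mathcal{A}|/N)R$. Two assumptions are made explicit: the gradient signal $\widehat{\bm{g}}_{\mathcal{H}}^{(k)}$, which this text does not spell out, is taken to be $\sum_t\lambda_t\nabla\overline{g}_t$ (here simply $\lambda$), and the coordinator is assumed to know $|\mathcal{H}|/N=1-\alpha$ exactly. The robust mean is the scalar version of the median-based estimator (17), and attacked uplinks send the constant 1000. All names are illustrative.

```python
import statistics


def robust_mean(values, alpha):
    """Scalar version of the median-based estimator (15)-(17)."""
    keep = round((1 - alpha) * len(values))  # |N_j| = (1 - alpha) N, assumed integral
    med = statistics.median(values)
    nearest = sorted(values, key=lambda v: abs(v - med))[:keep]
    return sum(nearest) / keep


def resilient_pd_dra(a_honest, n_attacked, cap, R=1.0, upsilon=0.1, gamma=1.0,
                     byzantine_value=1000.0, iters=2000):
    """Toy run of the resilient primal-dual updates (hypothetical instance).

    Honest agents: U_i(theta) = 0.5 * (theta - a_i)**2 and C_i = [0, R].
    Constraint: g(x) = x - cap, so H1 holds with B = 1 and L = 0.
    Assumption: the signal g_hat in the primal update equals lambda * g_bar'(.) = lambda.
    """
    n_h = len(a_honest)
    n = n_h + n_attacked
    alpha = n_attacked / n                       # assumed known exactly, so |H|/N = 1 - alpha
    B, L = 1.0, 0.0
    margin = alpha * (R * B + 0.5 * L * R**2)    # |A|/N (R B + L R^2 / 2), cf. (11)
    theta = [0.0] * n_h
    lam = 0.0
    for _ in range(iters):
        # Messages (9): honest uplinks carry theta_i, attacked uplinks carry garbage.
        received = theta + [byzantine_value] * n_attacked
        theta_hat = robust_mean(received, alpha)
        g_hat = lam
        new_theta = [min(max(th - gamma / n * (g_hat + (th - ai) + upsilon * th), 0.0), R)
                     for th, ai in zip(theta, a_honest)]
        g_bar = (n_h / n) * theta_hat - cap + margin
        lam = max(0.0, lam + gamma * (g_bar - upsilon * lam))
        theta = new_theta
    return theta, lam


theta, lam = resilient_pd_dra([0.9, 0.8, 0.7, 0.6] * 2, n_attacked=2, cap=0.5)
print(round(lam, 4), round(sum(theta) / len(theta), 4))
```

In this toy the estimator recovers the honest average exactly, since the injected messages are far from the median, so the iterates approach the saddle point of (P’). For this instance that saddle point satisfies $\lambda^\star=(h\bar a-(\mathrm{cap}-\alpha R)(1+\upsilon))/(h+\upsilon(1+\upsilon))$ with $h=|\mathcal{H}|/N$, roughly 0.297. Attacks that place messages inside the honest range would instead leave the bounded perturbation analyzed in Lemma 2 and Theorem 1.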

Lemma 2.

Algorithm 3 is a primal-dual algorithm [1] for (P’) with perturbed gradients:

[TABLE]

where we use the concatenated variables $\bm{\theta}=(\bm{\theta}_{1},...,\bm{\theta}_{N})$ and $\bm{\lambda}=(\lambda_{1},...,\lambda_{T})$. Under H1, and assuming that $\lambda_{t}^{(k)}\leq\overline{\lambda}$ for all $k$, we have:

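$$\|\bm{e}_{\bm{\theta}}^{(k)}\|\leq\overline{\lambda}LT\,\|\widehat{\bm{\theta}}_{\mathcal{H}}^{(k)}-\overline{\bm{\theta}}_{\mathcal{H}}^{(k)}\|,\qquad\|\bm{e}_{\bm{\lambda}}^{(k)}\|\leq BT\,\|\widehat{\bm{\theta}}_{\mathcal{H}}^{(k)}-\overline{\bm{\theta}}_{\mathcal{H}}^{(k)}\|.$$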

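The assumption $\lambda_{t}^{(k)}\leq\overline{\lambda}$ can be guaranteed since $\overline{g_{t}}({\textstyle\frac{|{\mathcal{}H}|}{N}}\widehat{\bm{\theta}}_{\mathcal{}H}^{(k)})$ is bounded: if $|\overline{g_{t}}(\cdot)|\leq G$ along the iterates and $\gamma\upsilon\leq 1$, the dual update gives $\lambda_{t}^{(k+1)}\leq\max\{0,(1-\gamma\upsilon)\lambda_{t}^{(k)}+\gamma G\}$, so by induction $\lambda_{t}^{(k)}\leq\max\{\lambda_{t}^{(0)},G/\upsilon\}$ for all $k$.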
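The boundedness of the dual iterates can be checked by iterating the projected dual update directly. Assuming, hypothetically, that the constraint values satisfy $|\overline{g}_t|\leq G$ and that $\gamma\upsilon\leq 1$, each step satisfies $\lambda^{(k+1)}\leq\max\{0,(1-\gamma\upsilon)\lambda^{(k)}+\gamma G\}$, which keeps $\lambda^{(k)}\leq\max\{\lambda^{(0)},G/\upsilon\}$. The sketch feeds a long worst-case stretch followed by arbitrary values in $[-G,G]$ and records the largest iterate; the constants are illustrative only.

```python
import random


def dual_iterates(g_values, lam0, gamma, upsilon):
    """Projected dual update lambda <- [lambda + gamma * (g - upsilon * lambda)]_+ over a g sequence."""
    lam = lam0
    history = [lam]
    for g in g_values:
        lam = max(0.0, lam + gamma * (g - upsilon * lam))
        history.append(lam)
    return history


G, gamma, upsilon, lam0 = 2.0, 0.5, 0.1, 5.0  # illustrative; gamma * upsilon <= 1
rng = random.Random(0)
# Worst-case stretch (g = G), then arbitrary values in [-G, G].
g_values = [G] * 2000 + [rng.uniform(-G, G) for _ in range(8000)]
bound = max(lam0, G / upsilon)
hist = dual_iterates(g_values, lam0, gamma, upsilon)
print(max(hist), bound)
```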

4.1 Convergence Analysis

Finally, based on Lemma 2, we can analyze the convergence of Algorithm 3. Let $\widehat{\bm{z}}^{\star}=(\widehat{\bm{\theta}}^{\star},\widehat{\bm{\lambda}}^{\star})$ be a saddle point of (P’) and define

[TABLE]

Theorem 1.

Assume the map $\overline{\bm{\Phi}}(\cdot)$ is $L_{\Phi}$-Lipschitz continuous. Then, for Algorithm 3 and all $k\geq 0$, it holds that

[TABLE]

where $E_{k}\mathrel{\mathop{:}}=\|{\bm{e}}_{\bm{\theta}}^{(k)}\|^{2}+\|{\bm{e}}_{\bm{\lambda}}^{(k)}\|^{2}$ is the total perturbation at iteration $k$. Moreover, if we choose $\gamma<\upsilon/(2L_{\Phi}^{2})$ and $E_{k}$ is upper bounded by $\overline{E}$ for all $k$, then

[TABLE]

Combined with the results of the previous subsection, the theorem shows that the resilient PD-DRA method converges to an ${\mathcal{}O}(\alpha^{2}R^{2}d)$ neighborhood of the saddle point of (P’) if the median-based estimator (17) is used [or ${\mathcal{}O}(\alpha R^{2})$ if Algorithm 2 is used], where $\alpha$ is the fraction of attacked uplink channels. Moreover, the convergence rate to this neighborhood is linear, as in the classical PDA analysis [1].

Interestingly, Theorem 1 reveals a trade-off in the choice of the step size $\gamma$ between convergence speed and accuracy. Specifically, (25) shows that the convergence rate factor $1-\gamma\upsilon+2\gamma^{2}L_{\Phi}^{2}$ is minimized by setting $\gamma=\upsilon/(4L_{\Phi}^{2})$, where it equals $1-\upsilon^{2}/(8L_{\Phi}^{2})$. However, the asymptotic upper bound in (26) is increasing in $\gamma$ and is minimized by letting $\gamma\rightarrow 0$. Balancing the two is a design criterion to be explored in practical implementations.
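Since the rate factor in (25) is a convex quadratic in $\gamma$, its minimizer $\upsilon/(4L_{\Phi}^{2})$ and the admissible range $0<\gamma<\upsilon/(2L_{\Phi}^{2})$ (where the factor is below one) are easy to check numerically. In the Python sketch below, the values $\upsilon=0.5$ and $L_{\Phi}=2$ are arbitrary illustration choices.

```python
def rate_factor(gamma, upsilon, L_phi):
    """Contraction factor 1 - gamma*upsilon + 2*gamma^2*L_phi^2 from (25)."""
    return 1.0 - gamma * upsilon + 2.0 * gamma ** 2 * L_phi ** 2

def fastest_step(upsilon, L_phi):
    """Minimizer of the convex quadratic rate_factor(., upsilon, L_phi)."""
    return upsilon / (4.0 * L_phi ** 2)

def max_stable_step(upsilon, L_phi):
    """rate_factor < 1 exactly when 0 < gamma < upsilon / (2 L_phi^2)."""
    return upsilon / (2.0 * L_phi ** 2)

upsilon, L_phi = 0.5, 2.0  # arbitrary illustration values
g_fast = fastest_step(upsilon, L_phi)
print(g_fast, rate_factor(g_fast, upsilon, L_phi))  # 0.03125 0.9921875
print(max_stable_step(upsilon, L_phi))              # 0.0625
```

A smaller $\gamma$ than `fastest_step` slows convergence but, per (26), shrinks the asymptotic neighborhood.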

5 Conclusions

In this paper, we studied strategies for securing a primal-dual algorithm for distributed resource allocation. In particular, we proposed a resilient distributed algorithm based on primal-dual optimization and robust statistics. We derived bounds on the performance of this algorithm and showed that it converges to a neighborhood of a robustified resource allocation problem when the number of attacked channels is small.

Acknowledgement

The authors would like to thank the anonymous reviewers for feedback, and Mr. Berkay Turan (UCSB) for pointing out typos in the original submission of this paper.

Appendix A Proof of Lemma 1

Since $g_{t}$ is $L_{t}$ -smooth, the following holds

[TABLE]

Furthermore, since the gradient of $g_{t}$ is uniformly bounded by $B$ and the diameter of each ${\mathcal{}C}_{j}$ is $R$, the right-hand side of (27) can be upper bounded by

[TABLE]

As such, defining $c_{t}\mathrel{\mathop{:}}=\frac{|{\mathcal{}A}|}{N}\big{(}RB+\frac{1}{2}L_{t}R^{2}\big{)}$, it can be seen that

[TABLE]

implies the desired constraint in (10).
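A short calculation indicates how a constant of this form can arise. As an illustration, suppose (an assumption about the setting of (27)) that the two arguments of $g_{t}$ differ by $\bm{\delta}=\frac{1}{N}\sum_{j\in{\mathcal{}A}}(\bm{\theta}_{j}-\bm{\theta}_{j}')$ with $\bm{\theta}_{j},\bm{\theta}_{j}'\in{\mathcal{}C}_{j}$, so that $\|\bm{\delta}\|\leq\frac{|{\mathcal{}A}|}{N}R$. Smoothness of $g_{t}$, the Cauchy–Schwarz inequality with $\|\nabla g_{t}\|\leq B$, and $|{\mathcal{}A}|/N\leq 1$ then give

$$
\begin{aligned}
g_{t}(\bm{x}+\bm{\delta}) &\leq g_{t}(\bm{x})+\langle\nabla g_{t}(\bm{x}),\bm{\delta}\rangle+\tfrac{L_{t}}{2}\|\bm{\delta}\|^{2}\\
&\leq g_{t}(\bm{x})+B\,\tfrac{|{\mathcal{}A}|}{N}R+\tfrac{L_{t}}{2}\Big(\tfrac{|{\mathcal{}A}|}{N}\Big)^{2}R^{2}\\
&\leq g_{t}(\bm{x})+\tfrac{|{\mathcal{}A}|}{N}\Big(RB+\tfrac{1}{2}L_{t}R^{2}\Big),
\end{aligned}
$$

where the last term matches the definition of $c_{t}$.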

Appendix B Proof of Proposition 1

Fix any $j\in[d]$ . The assumption implies that for all $i\in{\mathcal{}H}$ , one has

[TABLE]

We observe that $|{\mathcal{}H}|\geq(1-\alpha)N$. Applying [8, Lemma 1] shows that the median estimator (at each coordinate, the median is the one-dimensional geometric median estimator of [8]) satisfies

[TABLE]

The above implies that for all $i\in{\mathcal{}H}$ , we have

[TABLE]

This implies that $r_{j}^{(k)}\leq\Big{(}1+\sqrt{\frac{(1-\alpha)^{2}}{1-2\alpha}}\Big{)}r$ since $|{\mathcal{}H}|\geq(1-\alpha)N$ . We then bound the performance of $\widehat{\bm{\theta}}^{(k)}$ :

[TABLE]

thus

[TABLE]

Notice that $|{\mathcal{}A}\cap{\mathcal{}N}_{j}^{(k)}|\leq\alpha N$ and thus $|{\mathcal{}H}\setminus{\mathcal{}N}_{j}^{(k)}|\leq\alpha N$ . Gathering terms shows

[TABLE]

The above holds for all $j\in[d]$ . Applying the norm equivalence shows the desired bound.
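Two ingredients of this proof can be checked numerically. In one dimension, the geometric median minimizes the sum of absolute deviations and the ordinary median is a minimizer, which is how the coordinate-wise median inherits [8, Lemma 1] coordinate by coordinate. The radius factor $1+\sqrt{(1-\alpha)^{2}/(1-2\alpha)}$ is finite only for $\alpha<1/2$ and grows with $\alpha$. The Python sketch below checks both on arbitrary sample values; the function names are illustrative.

```python
import math
import numpy as np

def coordinatewise_median(msgs):
    """Coordinate-wise median of an (N, d) array of received vectors."""
    return np.median(msgs, axis=0)

def l1_cost(m, x):
    """One-dimensional geometric median objective: sum_i |m - x_i|."""
    return float(np.abs(np.asarray(x, dtype=float) - m).sum())

def median_radius_factor(alpha):
    """Factor in r_j^(k) <= (1 + sqrt((1 - alpha)^2 / (1 - 2 alpha))) r; needs 0 <= alpha < 1/2."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("the bound requires 0 <= alpha < 1/2")
    return 1.0 + math.sqrt((1.0 - alpha) ** 2 / (1.0 - 2.0 * alpha))

# Five honest readings and two gross outliers (arbitrary values) in a single coordinate.
x = np.array([0.1, 0.4, 0.35, 0.2, 50.0, -30.0, 0.3])
m = coordinatewise_median(x[:, None])[0]
grid = np.linspace(-40.0, 60.0, 2001)
print(m, all(l1_cost(m, x) <= l1_cost(t, x) + 1e-9 for t in grid))
for alpha in (0.0, 0.1, 0.25, 0.45):
    print(alpha, median_radius_factor(alpha))
```

The median stays inside the range of the honest readings even though two of the seven values are gross outliers, and the factor equals $2$ at $\alpha=0$ and blows up as $\alpha\rightarrow 1/2$.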

Appendix C Proof of Lemma 2

Comparing the equations in (21) with (19), (20), we identify that

[TABLE]

where $\big{[}{\bm{e}}_{\bm{\theta}}^{(k)}\big{]}_{i}$ denotes the $i$th block of ${\bm{e}}_{\bm{\theta}}^{(k)}$. Using H1 and the assumptions of Lemma 2, we immediately see that

[TABLE]

which then implies (22). H1 implies that $\overline{g}_{t}$ is $B$ -Lipschitz continuous, therefore

[TABLE]

which implies (23).

Appendix D Proof of Theorem 1

Based on Proposition 2, our idea is to perform a perturbation analysis of the PDA algorithm. Without loss of generality, we assume $N=1$ and denote $\bm{\theta}=\bm{\theta}_{1}$. To simplify notation, we also drop the subscript and denote the modified and regularized Lagrangian function by ${\mathcal{}L}=\overline{\mathcal{}L}_{\upsilon}$. Furthermore, we denote the saddle point of (P’) by ${\bm{z}}^{\star}=(\bm{\theta}^{\star},\bm{\lambda}^{\star})$.

Using the fact that $\bm{\theta}^{\star}={\mathcal{}P}_{{\mathcal{}C}}(\bm{\theta}^{\star})={\mathcal{}P}_{{\mathcal{}C}}\big{(}\bm{\theta}^{\star}-\gamma{\nabla}_{\bm{\theta}}{\mathcal{}L}(\bm{\theta}^{\star},\bm{\lambda}^{\star})\big{)}$ , we observe that in the primal update:

[TABLE]

where (a) is due to the nonexpansiveness of the projection, $\|{\mathcal{}P}_{{\mathcal{}C}}({\bm{x}})-{\mathcal{}P}_{{\mathcal{}C}}({\bm{y}})\|\leq\|{\bm{x}}-{\bm{y}}\|$. Furthermore, using Young’s inequality, for any $c_{0},c_{1}>0$, we have

[TABLE]

Similarly, in the dual update we get,

[TABLE]

Summing up the two inequalities gives:

[TABLE]

where (a) uses the strong monotonicity and smoothness of the map $\bm{\Phi}$ . Setting $c_{1}=\upsilon/2$ yields

[TABLE]

Observe that we can choose $\gamma$ such that $1-\gamma\upsilon+\gamma^{2}(1+c_{0})L_{\Phi}^{2}<1$; any $0<\gamma<\upsilon/((1+c_{0})L_{\Phi}^{2})$ suffices. Moreover, applying the above inequality recursively shows that $\|{\bm{z}}^{(k)}-{\bm{z}}^{\star}\|^{2}$ is bounded by

[TABLE]

If $E_{k}\leq\overline{E}$ for all $k$ , then ${\bm{z}}^{(k)}$ converges to a neighborhood of ${\bm{z}}^{\star}$ of radius

[TABLE]

Setting $c_{0}=1$ concludes the proof.
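The standard facts used in this proof can be sanity-checked numerically: nonexpansiveness of the projection (with a box as a hypothetical choice of ${\mathcal{}C}$), the form $\|\bm{a}+\bm{b}\|^{2}\leq(1+c)\|\bm{a}\|^{2}+(1+1/c)\|\bm{b}\|^{2}$ of Young’s inequality (assumed here to be the variant behind the $(1+c_{0})$ factor), and the unrolling of a recursion $r_{k+1}\leq\rho r_{k}+\beta\overline{E}$ with $\rho<1$ into $r_{k}\leq\rho^{k}r_{0}+\beta\overline{E}/(1-\rho)$. Here $r_{k}$ stands for $\|{\bm{z}}^{(k)}-{\bm{z}}^{\star}\|^{2}$ and `beta` is a hypothetical placeholder for the $\gamma$-dependent coefficient of $E_{k}$. This Python sketch checks toy numbers only and is not part of the proof.

```python
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d (hypothetical choice of C)."""
    return np.clip(x, lo, hi)

def nonexpansive_on_samples(trials=1000, d=5, seed=0):
    """Check ||P(x) - P(y)|| <= ||x - y|| on random pairs."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(scale=3.0, size=d), rng.normal(scale=3.0, size=d)
        if np.linalg.norm(project_box(x) - project_box(y)) > np.linalg.norm(x - y) + 1e-9:
            return False
    return True

def young_on_samples(c, trials=1000, d=5, seed=1):
    """Check ||a + b||^2 <= (1 + c)||a||^2 + (1 + 1/c)||b||^2 for a given c > 0."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        a, b = rng.normal(size=d), rng.normal(size=d)
        lhs = np.sum((a + b) ** 2)
        rhs = (1.0 + c) * np.sum(a ** 2) + (1.0 + 1.0 / c) * np.sum(b ** 2)
        if lhs > rhs + 1e-9:
            return False
    return True

def iterate_recursion(r0, rho, beta, E_bar, iters):
    """Iterates of r_{k+1} = rho * r_k + beta * E_bar (the worst case of the inequality)."""
    r, hist = r0, [r0]
    for _ in range(iters):
        r = rho * r + beta * E_bar
        hist.append(r)
    return hist

def unrolled_bound(r0, rho, beta, E_bar, k):
    """rho^k * r0 + beta * E_bar / (1 - rho); valid for 0 <= rho < 1."""
    return rho ** k * r0 + beta * E_bar / (1.0 - rho)

print(nonexpansive_on_samples(), all(young_on_samples(c) for c in (0.1, 1.0, 10.0)))
hist = iterate_recursion(r0=10.0, rho=0.9, beta=0.05, E_bar=2.0, iters=200)
print(hist[-1])  # approaches the radius beta * E_bar / (1 - rho) = 1.0
```

The first term of the bound decays linearly at rate $\rho$, while the second term is the neighborhood radius that remains as $k\rightarrow\infty$.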

Bibliography

[1] J. Koshal, A. Nedić, and U. V. Shanbhag, “Multiuser optimization: Distributed algorithms and error analysis,” SIAM Journal on Optimization, vol. 21, no. 3, pp. 1046–1081, 2011.
[2] D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” IEEE JSAC, vol. 24, no. 8, pp. 1439–1451, 2006.
[3] S. Sundaram and C. N. Hadjicostis, “Distributed function calculation via linear iterative strategies in the presence of malicious agents,” IEEE TAC, vol. 56, no. 7, pp. 1495–1508, 2011.
[4] F. Pasqualetti, A. Bicchi, and F. Bullo, “Consensus computation in unreliable networks: A system theoretic approach,” IEEE TAC, vol. 57, no. 1, pp. 90–104, 2012.
[5] R. Gentz, S. X. Wu, H.-T. Wai, A. Scaglione, and A. Leshem, “Data injection attacks in randomized gossiping,” IEEE TSIPN, vol. 2, no. 4, pp. 523–538, 2016.
[6] S. Sundaram and B. Gharesifard, “Distributed optimization under adversarial nodes,” IEEE TAC, 2018.
[7] Y. Chen, S. Kar, and J. Moura, “Resilient distributed estimation: Sensor attacks,” IEEE TAC, 2018.
[8] J. Feng, H. Xu, and S. Mannor, “Distributed robust learning,” arXiv preprint arXiv:1409.5937, 2014.
