Robust Group LASSO Over Decentralized Networks

Manxi Wang; Yongcheng Li; Xiaohan Wei; Qing Ling

arXiv:1701.03043·cs.DC·January 12, 2017


TL;DR

This paper develops decentralized algorithms for robustly recovering group sparse signals over multi-agent networks with sparse errors, using dynamic consensus strategies to replace centralized processing.

Contribution

It introduces a decentralized approach for robust group LASSO signal recovery that avoids reliance on a central fusion center, utilizing dynamic average consensus techniques.

Findings

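01 In simulations, the proposed algorithms recover both the group sparse signal matrix and the sparse error matrix.

02 With static average consensus, the decentralized method coincides with the centralized one; the dynamic strategies yield slightly degraded estimates.

03 Dynamic average consensus tracks the global row-support detector with a single round of communication per time slot.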

Abstract

This paper considers the recovery of group sparse signals over a multi-agent network, where the measurements are subject to sparse errors. We first investigate the robust group LASSO model and its centralized algorithm based on the alternating direction method of multipliers (ADMM), which requires a central fusion center to compute a global row-support detector. To implement it in a decentralized network environment, we then adopt dynamic average consensus strategies that enable dynamic tracking of the global row-support detector. Numerical experiments demonstrate the effectiveness of the proposed algorithms.

Tables

Table 1. Algorithm 1: Centralized Robust Group LASSO

Given: measurement $\mathbf{M}$; sensing matrices $\mathbf{A}_{(l)}$; parameters $\beta$ and $\tau$

Initialize: signal $\mathbf{Y}(0)=\mathbf{0}$; error $\mathbf{S}(0)=\mathbf{0}$; multiplier $\mathbf{Z}(0)=\mathbf{0}$

while not converged ($t=0,1,\dots$), for all $l$ do

    for $p=0,1,\dots,P-1$

        $\mathbf{v}_{l}(t+\frac{p}{P})=\mathbf{A}_{(l)}^{T}\big(\mathbf{A}_{(l)}\mathbf{y}_{l}(t+\frac{p}{P})+\mathbf{s}_{l}(t)-\mathbf{m}_{l}-\frac{\mathbf{z}_{l}(t)}{\beta}\big)$

        $u_{nl}(t+\frac{p}{P})=y_{nl}(t+\frac{p}{P})-\tau v_{nl}(t+\frac{p}{P})$, $\forall n$

        $y_{nl}(t+\frac{p+1}{P})=\frac{u_{nl}(t+\frac{p}{P})}{\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}}\max\big(0,\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}-\frac{\tau}{\beta}\big)$, $\forall n$

    end for

    $\mathbf{w}_{l}(t+1)=\mathbf{m}_{l}-\mathbf{A}_{(l)}\mathbf{y}_{l}(t+1)+\frac{\mathbf{z}_{l}(t)}{\beta}$

    $s_{ml}(t+1)=\textrm{sgn}(w_{ml}(t+1))\max\big(0,|w_{ml}(t+1)|-\frac{\lambda}{\beta}\big)$, $\forall m$

    $\mathbf{z}_{l}(t+1)=\mathbf{z}_{l}(t)-\beta\big(\mathbf{A}_{(l)}\mathbf{y}_{l}(t+1)+\mathbf{s}_{l}(t+1)-\mathbf{m}_{l}\big)$

end while

Table 2. Algorithm 2: Decentralized Robust Group LASSO

Given: measurement $\mathbf{M}$; sensing matrices $\mathbf{A}_{(l)}$; parameters $\beta$ and $\tau$

Initialize: signal $\mathbf{Y}(0)=\mathbf{0}$; error $\mathbf{S}(0)=\mathbf{0}$; multiplier $\mathbf{Z}(0)=\mathbf{0}$

while not converged ($t=0,1,\dots$), agent $l$ do

    for $p=0,1,\dots,P-1$

        $\mathbf{v}_{l}(t+\frac{p}{P})=\mathbf{A}_{(l)}^{T}\big(\mathbf{A}_{(l)}\mathbf{y}_{l}(t+\frac{p}{P})+\mathbf{s}_{l}(t)-\mathbf{m}_{l}-\frac{\mathbf{z}_{l}(t)}{\beta}\big)$

        $u_{nl}(t+\frac{p}{P})=y_{nl}(t+\frac{p}{P})-\tau v_{nl}(t+\frac{p}{P})$, $\forall n$

        $h_{nl}(t+\frac{p}{P})$ is updated through an average consensus strategy

        $y_{nl}(t+\frac{p+1}{P})=\frac{u_{nl}(t+\frac{p}{P})}{\sqrt{Lh_{nl}(t+\frac{p}{P})}}\max\big(0,\sqrt{Lh_{nl}(t+\frac{p}{P})}-\frac{\tau}{\beta}\big)$, $\forall n$

    end for

    $\mathbf{w}_{l}(t+1)=\mathbf{m}_{l}-\mathbf{A}_{(l)}\mathbf{y}_{l}(t+1)+\frac{\mathbf{z}_{l}(t)}{\beta}$

    $s_{ml}(t+1)=\textrm{sgn}(w_{ml}(t+1))\max\big(0,|w_{ml}(t+1)|-\frac{\lambda}{\beta}\big)$, $\forall m$

    $\mathbf{z}_{l}(t+1)=\mathbf{z}_{l}(t)-\beta\big(\mathbf{A}_{(l)}\mathbf{y}_{l}(t+1)+\mathbf{s}_{l}(t+1)-\mathbf{m}_{l}\big)$

end while

Equations

$$\mathbf{m}_{l}=\mathbf{A}_{(l)}\mathbf{y}_{l}+\mathbf{s}_{l},$$

$$\mathbf{M}=[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}.$$

$$\min_{\mathbf{Y}}~\|\mathbf{Y}\|_{2,1}+\lambda\|\mathbf{M}-[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]\|_{F}^{2}.$$

$$\begin{aligned}\min_{\mathbf{Y},\mathbf{S}}~&\|\mathbf{Y}\|_{2,1}+\lambda\|\mathbf{S}\|_{1}\\ \text{s.t.}~~&\mathbf{M}=[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}.\end{aligned}$$

$$\begin{aligned}&\|\mathbf{Y}\|_{2,1}+\lambda\|\mathbf{S}\|_{1}-\big\langle\mathbf{Z},[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}-\mathbf{M}\big\rangle\\&+\frac{\beta}{2}\|[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}-\mathbf{M}\|_{F}^{2},\end{aligned}$$

$$\mathbf{Y}(t+1)=\arg\min_{\mathbf{Y}}~\|\mathbf{Y}\|_{2,1}+\frac{\beta}{2}\Big\|[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}(t)-\mathbf{M}-\frac{\mathbf{Z}(t)}{\beta}\Big\|_{F}^{2}.$$

$$\mathbf{S}(t+1)=\arg\min_{\mathbf{S}}~\lambda\|\mathbf{S}\|_{1}+\frac{\beta}{2}\Big\|[\mathbf{A}_{(1)}\mathbf{y}_{1}(t+1),\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}(t+1)]+\mathbf{S}-\mathbf{M}-\frac{\mathbf{Z}(t)}{\beta}\Big\|_{F}^{2}.$$

$$s_{ml}(t+1)=\textrm{sgn}(w_{ml}(t+1))\max\big(0,|w_{ml}(t+1)|-\frac{\lambda}{\beta}\big),$$

$$\mathbf{Z}(t+1)=\mathbf{Z}(t)-\beta\big([\mathbf{A}_{(1)}\mathbf{y}_{1}(t+1),\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}(t+1)]+\mathbf{S}(t+1)-\mathbf{M}\big).$$

$$\min_{\mathbf{Y}}~\|\mathbf{Y}\|_{2,1}+\beta\big\langle\mathbf{V}(t+\frac{p}{P}),\mathbf{Y}\big\rangle+\frac{\beta}{2\tau}\|\mathbf{Y}-\mathbf{Y}(t+\frac{p}{P})\|_{F}^{2},$$

$$\mathbf{v}_{l}(t+\frac{p}{P})=\mathbf{A}_{(l)}^{T}\big(\mathbf{A}_{(l)}\mathbf{y}_{l}(t+\frac{p}{P})+\mathbf{s}_{l}(t)-\mathbf{m}_{l}-\frac{\mathbf{z}_{l}(t)}{\beta}\big).$$

$$\min_{\mathbf{Y}}~\|\mathbf{Y}\|_{2,1}+\frac{\beta}{2\tau}\|\mathbf{Y}-\mathbf{Y}(t+\frac{p}{P})+\tau\mathbf{V}(t+\frac{p}{P})\|_{F}^{2},$$

$$\mathbf{y}^{n}(t+\frac{p+1}{P})=\frac{\mathbf{u}^{n}(t+\frac{p}{P})}{\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}}\max\big(0,\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}-\frac{\tau}{\beta}\big).$$

$$\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}=L^{\frac{1}{2}}\Big(\frac{1}{L}\sum_{l=1}^{L}u_{nl}^{2}(t+\frac{p}{P})\Big)^{\frac{1}{2}}=\Big(Lh_{nl}(t+\frac{p}{P})\Big)^{\frac{1}{2}},$$

$$h_{nl}(t+\frac{p}{P})\triangleq\frac{1}{L}\sum_{l=1}^{L}u_{nl}^{2}(t+\frac{p}{P})$$

$$\mathbf{h}^{n}(t+\frac{p}{P})=\bm{\Sigma}^{K}\left(\mathbf{u}^{n}(t+\frac{p}{P})\right)^{2},$$

$$\sigma_{rl}=\begin{cases}\min\{\frac{1}{d_{r}},\frac{1}{d_{l}}\},&\text{if }(r,l)\in\mathcal{E};\\ \sum_{(r,l)\in\mathcal{E}}\max\left\{0,\frac{1}{d_{r}}-\frac{1}{d_{l}}\right\},&\text{if }r=l;\\ 0,&\text{else.}\end{cases}$$

$$\begin{aligned}h_{nl}(t+\frac{p}{P})=&\sum_{r\neq l}\sigma_{rl}\Big(h_{nr}(t+\frac{p-1}{P})-h_{nl}(t+\frac{p-1}{P})\Big)\\&+h_{nl}(t+\frac{p-1}{P})+u_{nl}^{2}(t+\frac{p}{P})-u_{nl}^{2}(t+\frac{p-1}{P}).\end{aligned}$$

$$\begin{aligned}\tilde{h}_{nl}(t+\frac{p}{P})=~&u_{nl}^{2}(t+\frac{p}{P})-2u_{nl}^{2}(t+\frac{p-1}{P})+u_{nl}^{2}(t+\frac{p-2}{P})\\&+\tilde{h}_{nl}(t+\frac{p-1}{P})+\sum_{r\neq l}\sigma_{rl}\Big(\tilde{h}_{nr}(t+\frac{p-1}{P})-\tilde{h}_{nl}(t+\frac{p-1}{P})\Big),\\ h_{nl}(t+\frac{p}{P})=~&\tilde{h}_{nl}(t+\frac{p}{P})\\&+h_{nl}(t+\frac{p-1}{P})+\sum_{r\neq l}\sigma_{rl}\Big(h_{nr}(t+\frac{p-1}{P})-h_{nl}(t+\frac{p-1}{P})\Big).\end{aligned}$$


Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Distributed Sensor Networks and Detection Algorithms · Target Tracking and Data Fusion in Sensor Networks

Full text


**Index Terms:** Decentralized optimization, dynamic average consensus, group sparsity, alternating direction method of multipliers (ADMM)

1 Introduction

Suppose that $L$ distributed agents constitute a bidirectionally connected network and sense correlated signals under sparse measurement errors. The measurement equation of agent $l$ is

$$\mathbf{m}_{l}=\mathbf{A}_{(l)}\mathbf{y}_{l}+\mathbf{s}_{l}, \tag{1}$$

where $\mathbf{m}_{l}\in\mathcal{R}^{M}$ is the measurement vector, $\mathbf{A}_{(l)}$ is the sensing matrix, $\mathbf{y}_{l}\in\mathcal{R}^{N}$ is the unknown signal vector, and $\mathbf{s}_{l}\in\mathcal{R}^{M}$ is the unknown sparse error vector. We are particularly interested in a certain correlation pattern of the signal vectors, where the signal matrix $\mathbf{Y}=[\mathbf{y}_{1},\ldots,\mathbf{y}_{L}]\in\mathcal{R}^{N\times L}$ is group sparse, meaning that $\mathbf{Y}$ is sparse and its nonzero entries appear in a small number of common rows. Defining $\mathbf{M}\in\mathcal{R}^{M\times L}$ as the measurement matrix and $\mathbf{S}\in\mathcal{R}^{M\times L}$ as the sparse error matrix, we can write the agents' measurement equations in matrix form as

$$\mathbf{M}=[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}. \tag{2}$$

Given $\mathbf{M}$ and the $\mathbf{A}_{(l)}$'s, the goal of the network is to recover $\mathbf{Y}$ and $\mathbf{S}$ from the linear measurement equation (2).

1.1 Robust Group LASSO Model

The recovery of group sparse (also known as block sparse [3] or jointly sparse [4]) signals finds a variety of applications such as direction-of-arrival estimation [5, 6], collaborative spectrum sensing [7, 8, 9] and motion detection [10]. A well-known model to recover group sparse signals is group LASSO (least absolute shrinkage and selection operator) [11], which solves

$$\min_{\mathbf{Y}}~\|\mathbf{Y}\|_{2,1}+\lambda\|\mathbf{M}-[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]\|_{F}^{2}. \tag{3}$$

Here $\lambda$ is a nonnegative trade-off parameter. A key assumption behind the success of such a model is the sub-Gaussianity of the errors. However, in many applications, the measurements of the agents may be seriously contaminated or even missing due to uncertainties such as sensor failures or transmission errors. Measurement errors of this kind are often sparse [12]. Hence, a natural extension of (3) is to exploit the structures of both the signal matrix $\mathbf{Y}$ and the sparse error matrix $\mathbf{S}$ by solving

$$\begin{aligned}\min_{\mathbf{Y},\mathbf{S}}~&\|\mathbf{Y}\|_{2,1}+\lambda\|\mathbf{S}\|_{1}\\ \text{s.t.}~~&\mathbf{M}=[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}.\end{aligned} \tag{4}$$

This model is termed robust group LASSO, and its performance guarantee is given in [13]. Under mild conditions, the robust group LASSO model is able to simultaneously recover the true values of $\mathbf{Y}$ and $\mathbf{S}$ with high probability.

1.2 Our Contributions

This paper develops efficient algorithms to solve the robust group LASSO model (4). Our contributions are as follows.

(i) We propose a centralized algorithm that is based on the alternating direction method of multipliers (ADMM), a powerful operator-splitting technique. One subproblem of the centralized algorithm is the traditional group LASSO model, which is approximately solved by a block coordinate descent (BCD) approach through successively estimating the row-support of the signal matrix $\mathbf{Y}$.

(ii) We develop decentralized versions of the above algorithm that are suitable for autonomous computation over large-scale networks. Since estimating the row-support of the signal matrix $\mathbf{Y}$ requires collaborative information fusion of all the agents, we propose to achieve inexact information fusion through dynamic average consensus techniques, which only require information exchange among neighboring agents.

1.3 Notations

Matrices are denoted by bold uppercase letters and vectors are denoted by bold lowercase letters. For a matrix $\mathbf{D}$ , $\mathbf{d}^{i}$ denotes its $i$ -th row, $\mathbf{d}_{j}$ denotes its $j$ -th column, while $d_{ij}$ denotes its $(i,j)$ -th element. The $\ell_{2,1}$ -norm of $\mathbf{D}$ is $\|\mathbf{D}\|_{2,1}\triangleq\sum_{i}(\sum_{j}d_{ij}^{2})^{1/2}$ , the $\ell_{1}$ -norm is $\|\mathbf{D}\|_{1}\triangleq\sum_{i}\sum_{j}|d_{ij}|$ , and the Frobenius norm is $\|\mathbf{D}\|_{F}\triangleq(\sum_{i}\sum_{j}d_{ij}^{2})^{1/2}$ .

The multi-agent network is described as a bidirectional graph $(\mathcal{L},\mathcal{E})$ . If two agents $r,l\in\mathcal{L}$ are neighbors, then they can communicate with each other within one hop, and $(r,l)\in\mathcal{E}$ is a bidirectional communication edge.

2 Centralized Robust Group LASSO

Optimally solving (4) is nontrivial since the objective function is a weighted summation of two nonsmooth functions $\|\mathbf{Y}\|_{2,1}$ and $\|\mathbf{S}\|_{1}$ , where $\mathbf{Y}$ and $\mathbf{S}$ are entangled in the constraint. Therefore we resort to the alternating direction method of multipliers (ADMM) to split the two entangled variables $\mathbf{Y}$ and $\mathbf{S}$ such that the resulting subproblems are easier to solve.

2.1 Using ADMM to Solve (4)

The augmented Lagrangian function of (4) is

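$$\begin{aligned}&\|\mathbf{Y}\|_{2,1}+\lambda\|\mathbf{S}\|_{1}-\big\langle\mathbf{Z},[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}-\mathbf{M}\big\rangle\\&+\frac{\beta}{2}\|[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}-\mathbf{M}\|_{F}^{2},\end{aligned}$$

where $\mathbf{Z}\in\mathcal{R}^{M\times L}$ is the Lagrange multiplier and $\beta$ is a positive penalty parameter. The ADMM alternately minimizes the augmented Lagrangian function with respect to $\mathbf{Y}$ and $\mathbf{S}$, and then updates the Lagrange multiplier $\mathbf{Z}$ [14]. At time $t$, the ADMM works as follows.

First, fixing $\mathbf{S}=\mathbf{S}(t)$ and $\mathbf{Z}=\mathbf{Z}(t)$, we minimize the augmented Lagrangian function with respect to $\mathbf{Y}$ to get $\mathbf{Y}(t+1)$. Simple manipulation shows that it is equivalent to

$$\mathbf{Y}(t+1)=\arg\min_{\mathbf{Y}}~\|\mathbf{Y}\|_{2,1}+\frac{\beta}{2}\Big\|[\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}(t)-\mathbf{M}-\frac{\mathbf{Z}(t)}{\beta}\Big\|_{F}^{2}. \tag{5}$$

Note that (5) is a standard group LASSO problem that generally does not have a closed-form solution. We will develop an efficient algorithm to solve (5) later in this section.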
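The "simple manipulation" behind (5) is a completion of the square. The sketch below assumes the augmented Lagrangian carries the multiplier term $-\langle\mathbf{Z},\cdot\rangle$, which is the sign convention consistent with the multiplier update (8); the residual shorthand $\mathbf{R}$ is introduced here only for illustration.

```latex
\begin{aligned}
\mathbf{R} &\triangleq [\mathbf{A}_{(1)}\mathbf{y}_{1},\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}]+\mathbf{S}(t)-\mathbf{M},\\
-\langle\mathbf{Z}(t),\mathbf{R}\rangle+\frac{\beta}{2}\|\mathbf{R}\|_{F}^{2}
&=\frac{\beta}{2}\Big\|\mathbf{R}-\frac{\mathbf{Z}(t)}{\beta}\Big\|_{F}^{2}-\frac{1}{2\beta}\|\mathbf{Z}(t)\|_{F}^{2}.
\end{aligned}
```

The last term does not depend on $\mathbf{Y}$, and neither does $\lambda\|\mathbf{S}(t)\|_{1}$, so dropping both leaves exactly the objective in (5).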

Second, fixing $\mathbf{Y}=\mathbf{Y}(t+1)$ and $\mathbf{Z}=\mathbf{Z}(t)$, we minimize the augmented Lagrangian function with respect to $\mathbf{S}$ to get $\mathbf{S}(t+1)$. Again, combining the linear term with the quadratic term of $\mathbf{S}$ yields

$$\mathbf{S}(t+1)=\arg\min_{\mathbf{S}}~\lambda\|\mathbf{S}\|_{1}+\frac{\beta}{2}\Big\|[\mathbf{A}_{(1)}\mathbf{y}_{1}(t+1),\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}(t+1)]+\mathbf{S}-\mathbf{M}-\frac{\mathbf{Z}(t)}{\beta}\Big\|_{F}^{2}. \tag{6}$$

Denoting $\mathbf{W}(t+1)=\mathbf{M}-[\mathbf{A}_{(1)}\mathbf{y}_{1}(t+1),\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}(t+1)]+\mathbf{Z}(t)/\beta$, so that the quadratic term in (6) equals $\frac{\beta}{2}\|\mathbf{S}-\mathbf{W}(t+1)\|_{F}^{2}$, (6) has a closed-form solution given by

$$s_{ml}(t+1)=\textrm{sgn}(w_{ml}(t+1))\max\big(0,|w_{ml}(t+1)|-\frac{\lambda}{\beta}\big), \tag{7}$$

where $\textrm{sgn}(\cdot)$ is the sign function; $s_{ml}(t+1)$ and $w_{ml}(t+1)$ denote the $(m,l)$-th entries of $\mathbf{S}(t+1)$ and $\mathbf{W}(t+1)$, respectively. Note that the term $|w_{ml}(t+1)|$ can be viewed as the support detector of the $(m,l)$-th element of $\mathbf{S}$. If $|w_{ml}(t+1)|$ is smaller than the threshold $\lambda/\beta$, then $s_{ml}(t+1)$ is set to zero.
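The element-wise rule in (7) is the usual soft-thresholding operator. A minimal NumPy sketch follows; the function name `soft_threshold` and the sample values of $\mathbf{W}(t+1)$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(w, thresh):
    """Element-wise soft-thresholding: sgn(w) * max(0, |w| - thresh)."""
    return np.sign(w) * np.maximum(0.0, np.abs(w) - thresh)

# Hypothetical entries of W(t+1), thresholded with lambda / beta = 1.
W = np.array([[3.0, -0.5],
              [-2.0, 1.0]])
S_next = soft_threshold(W, thresh=1.0)
print(S_next)
```

Entries whose magnitude is at most the threshold are zeroed; the others are pulled toward zero by exactly the threshold.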

Finally, given $\mathbf{Y}=\mathbf{Y}(t+1)$ and $\mathbf{S}=\mathbf{S}(t+1)$ , the Lagrange multiplier $\mathbf{Z}$ is updated according to the following formula

$$\mathbf{Z}(t+1)=\mathbf{Z}(t)-\beta\big([\mathbf{A}_{(1)}\mathbf{y}_{1}(t+1),\cdots,\mathbf{A}_{(L)}\mathbf{y}_{L}(t+1)]+\mathbf{S}(t+1)-\mathbf{M}\big). \tag{8}$$

Since the update of $\mathbf{S}$ in (7) and the update of $\mathbf{Z}$ in (8) are both simple, we now focus on the update of $\mathbf{Y}$ in (5), which is the bottleneck of the ADMM. Observe that in (5) the $\ell_{2,1}$-norm term is separable with respect to the rows $\mathbf{y}^{i}$ but nonsmooth, while the Frobenius term is smooth but nonseparable with respect to the rows $\mathbf{y}^{i}$. Therefore, in this paper we solve (5) with the block coordinate descent (BCD) algorithm, which has been shown to be an efficient tool for handling this special problem structure [15, 16, 17].

2.2 Using BCD to Solve (5)

To set up the iterative BCD algorithm that solves (5) at time $t$ , we divide time $t$ into $P$ slots. At time $t$ slot $p$ ( $p=0,1,\cdots,P-1$ ), we linearize the Frobenius norm term in (5) with respect to $\mathbf{Y}(t+\frac{p}{P})$ and add an extra quadratic regularization term, which gives

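$$\min_{\mathbf{Y}}~\|\mathbf{Y}\|_{2,1}+\beta\big\langle\mathbf{V}(t+\frac{p}{P}),\mathbf{Y}\big\rangle+\frac{\beta}{2\tau}\|\mathbf{Y}-\mathbf{Y}(t+\frac{p}{P})\|_{F}^{2}, \tag{9}$$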

where $\tau$ is a positive proximal parameter and the $l$ -th column of $\mathbf{V}(t+\frac{p}{P})\in\mathcal{R}^{N\times L}$ is defined as

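$$\mathbf{v}_{l}(t+\frac{p}{P})=\mathbf{A}_{(l)}^{T}\big(\mathbf{A}_{(l)}\mathbf{y}_{l}(t+\frac{p}{P})+\mathbf{s}_{l}(t)-\mathbf{m}_{l}-\frac{\mathbf{z}_{l}(t)}{\beta}\big). \tag{10}$$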

Note that (9) is equivalent to

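$$\min_{\mathbf{Y}}~\|\mathbf{Y}\|_{2,1}+\frac{\beta}{2\tau}\|\mathbf{Y}-\mathbf{Y}(t+\frac{p}{P})+\tau\mathbf{V}(t+\frac{p}{P})\|_{F}^{2}, \tag{11}$$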

which has a closed-form solution given by the soft-thresholding operator [18]. Denote $\mathbf{U}(t+\frac{p}{P})=\mathbf{Y}(t+\frac{p}{P})-\tau\mathbf{V}(t+\frac{p}{P})\in\mathcal{R}^{N\times L}$ whose $n$ -th row is given by $\mathbf{u}^{n}(t+\frac{p}{P})=\mathbf{y}^{n}(t+\frac{p}{P})-\tau\mathbf{v}^{n}(t+\frac{p}{P})$ . Also denote $\mathbf{Y}(t+\frac{p+1}{P})\in\mathcal{R}^{N\times L}$ as the solution of (11). The $n$ -th row of $\mathbf{Y}(t+\frac{p+1}{P})$ is

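$$\mathbf{y}^{n}(t+\frac{p+1}{P})=\frac{\mathbf{u}^{n}(t+\frac{p}{P})}{\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}}\max\big(0,\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}-\frac{\tau}{\beta}\big). \tag{12}$$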

Again, note that the term $\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}$ can be viewed as the row-support detector of the $n$ -th row of $\mathbf{Y}$ . If $\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}$ is smaller than the threshold $\tau/\beta$ , then $\mathbf{y}^{n}(t+\frac{p+1}{P})$ is set to be zero.
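The row-wise rule in (12) can be sketched in a few lines of NumPy. The helper name `row_soft_threshold`, the guard against all-zero rows, and the sample matrix are assumptions made for illustration.

```python
import numpy as np

def row_soft_threshold(U, thresh):
    """Shrink the l2-norm of every row of U by thresh; rows with norm <= thresh become zero."""
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    # Dividing by max(norm, tiny) avoids 0/0 for all-zero rows, which map to zero anyway.
    scale = np.maximum(0.0, norms - thresh) / np.maximum(norms, np.finfo(float).tiny)
    return U * scale

# Hypothetical U(t + p/P) with row norms 5, 0.5 and 0, and threshold tau / beta = 1.
U = np.array([[3.0, 4.0],
              [0.3, 0.4],
              [0.0, 0.0]])
Y_next = row_soft_threshold(U, thresh=1.0)
print(Y_next)
```

Only the first row survives: its norm drops from 5 to 4 while its direction is unchanged, which is why $\|\mathbf{u}^{n}\|_{2}$ acts as a row-support detector.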

2.3 Implementation of Centralized Robust Group LASSO

The centralized ADMM to solve the robust group LASSO model (4) is summarized in Table 1. Each iteration of the ADMM includes an inner-loop BCD subroutine that updates $\mathbf{Y}$ through solving (5), the update of $\mathbf{S}$ that has a closed-form solution (7), and the update of $\mathbf{Z}$ in (8). The ADMM parameter $\beta$ can be any positive value, though its choice may influence the convergence rate. The BCD parameter $\tau$ is set to be the minimum of largest eigenvalues of $\mathbf{A}_{(l)}^{T}\mathbf{A}_{(l)},~l=1,2,\cdots,L$, which guarantees the convergence of the BCD subroutine [15, 16, 17]. As long as $\tau$ is properly chosen and $P$ is large enough, the BCD subroutine is able to solve the subproblem (5) accurately enough for the ADMM to converge to the global minimum of the convex program (4).
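The loop structure of Algorithm 1 can be sketched compactly in NumPy. This is an illustrative reading of the updates (5) to (12), not the authors' code. Three choices are assumptions made here: the sensing matrices are stacked in one array of shape `(L, M, N)`; the step size is taken as the reciprocal of the largest eigenvalue over all $\mathbf{A}_{(l)}^{T}\mathbf{A}_{(l)}$, a conservative value that keeps the proximal-gradient step from overshooting; and $\mathbf{W}$ is formed with $+\mathbf{Z}/\beta$, the sign obtained by minimizing (6) directly.

```python
import numpy as np

def soft_threshold(W, thresh):
    return np.sign(W) * np.maximum(0.0, np.abs(W) - thresh)

def row_soft_threshold(U, thresh):
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U * np.maximum(0.0, norms - thresh) / np.maximum(norms, np.finfo(float).tiny)

def centralized_robust_group_lasso(M, A, lam=1.0, beta=1.0, P=50, T=100):
    """M: (m, L) measurements; A: (L, m, N) stacked sensing matrices A_(l)."""
    L, m, N = A.shape
    # Assumed step size: 1 / max_l lambda_max(A_l^T A_l).
    tau = 1.0 / max(np.linalg.norm(A[l], 2) ** 2 for l in range(L))
    Y, S, Z = np.zeros((N, L)), np.zeros((m, L)), np.zeros((m, L))
    for _ in range(T):                      # outer ADMM iterations
        for _ in range(P):                  # inner BCD slots
            AY = np.einsum('lmn,nl->ml', A, Y)              # column l is A_l y_l
            V = np.einsum('lmn,ml->nl', A, AY + S - M - Z / beta)
            Y = row_soft_threshold(Y - tau * V, tau / beta)
        AY = np.einsum('lmn,nl->ml', A, Y)
        W = M - AY + Z / beta
        S = soft_threshold(W, lam / beta)
        Z = Z - beta * (AY + S - M)
    return Y, S, Z

# Tiny synthetic run with arbitrary sizes, only to exercise the code path.
rng = np.random.default_rng(0)
A_demo = rng.standard_normal((4, 8, 12))
M_demo = rng.standard_normal((8, 4))
Y_hat, S_hat, Z_hat = centralized_robust_group_lasso(M_demo, A_demo, P=5, T=3)
print(Y_hat.shape, S_hat.shape, Z_hat.shape)
```

Every step except the row norm inside `row_soft_threshold` uses only agent $l$'s own column, which is what makes the algorithm a candidate for decentralization.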

The algorithm outlined in Table 1 is centralized, which means that a fusion center is necessary to gather information from all the agents and conduct optimization. This centralized scheme is sensitive to the failure of the fusion center, requires multi-hop communication within the network, and hence scales poorly with the network size. Given the need for decentralized optimization in large-scale networks, the next section discusses how to implement the algorithm in a decentralized manner.

3 Decentralized Robust Group LASSO

Observe that Algorithm 1 is naturally distributed, except for the update of $y_{nl}(t+\frac{p+1}{P})$ , which involves calculating the global row-support detector $\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}$ across agents. Hence, given the vector $\mathbf{u}^{n}(t+\frac{p}{P})$ , the key to the decentralized implementation of Algorithm 1 is how to calculate its $\ell_{2}$ -norm $\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}$ in a decentralized manner. Recall that

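$$\|\mathbf{u}^{n}(t+\frac{p}{P})\|_{2}=L^{\frac{1}{2}}\Big(\frac{1}{L}\sum_{l=1}^{L}u_{nl}^{2}(t+\frac{p}{P})\Big)^{\frac{1}{2}}=\Big(Lh_{nl}(t+\frac{p}{P})\Big)^{\frac{1}{2}},$$

where

$$h_{nl}(t+\frac{p}{P})\triangleq\frac{1}{L}\sum_{l=1}^{L}u_{nl}^{2}(t+\frac{p}{P})$$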

is the average of the squares. Therefore, the problem becomes: Suppose each agent $l$ holds the value of $u_{nl}^{2}(t+\frac{p}{P})$ , how can we design efficient strategies to (exactly or inexactly) calculate their mean $h_{nl}(t+\frac{p}{P})$ in a decentralized manner? Below we consider three approaches to obtain the average.

3.1 Static Average Consensus

The first strategy comes from the classic average consensus algorithm [19]. Calculate

$$\mathbf{h}^{n}(t+\frac{p}{P})=\bm{\Sigma}^{K}\left(\mathbf{u}^{n}(t+\frac{p}{P})\right)^{2},$$

where $\mathbf{h}^{n}(t+\frac{p}{P})\in\mathcal{R}^{1\times L}$ is a row vector containing all $h_{nl}(t+\frac{p}{P})$ , $\left(\mathbf{u}^{n}(t+\frac{p}{P})\right)^{2}$ means element-wise squares of $\mathbf{u}^{n}(t+\frac{p}{P})$ , $K$ is a large iteration number, and $\bm{\Sigma}$ is the mixing matrix. The mixing matrix $\bm{\Sigma}$ is doubly stochastic, and its $(r,l)$ -th element $\sigma_{rl}$ is nonzero if and only if $(r,l)\in\mathcal{E}$ or $r=l$ . A typical choice of $\bm{\Sigma}$ follows the Metropolis-Hastings rule [19],

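$$\sigma_{rl}=\begin{cases}\min\{\frac{1}{d_{r}},\frac{1}{d_{l}}\},&\text{if }(r,l)\in\mathcal{E};\\ \sum_{(r,l)\in\mathcal{E}}\max\left\{0,\frac{1}{d_{r}}-\frac{1}{d_{l}}\right\},&\text{if }r=l;\\ 0,&\text{else.}\end{cases}$$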

Here $d_{l}$ is the degree of agent $l$ .

Obviously, the graph-sparse structure of the mixing matrix $\bm{\Sigma}$ enables decentralized computation of $\mathbf{h}^{n}(t+\frac{p}{P})$. According to the theory of average consensus [19], if $K$ goes to infinity, then all the elements of $\mathbf{h}^{n}(t+\frac{p}{P})$ converge to the expected average $(1/L)\sum_{l=1}^{L}u_{nl}^{2}(t+\frac{p}{P})$, in which case the decentralized implementation is equivalent to its centralized counterpart. However, increasing $K$ means introducing more rounds of communication and computation, so setting $K$ large is inefficient. On the other hand, setting $K$ small (say, $K=1$) often leads to unsatisfactory results.
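The trade-off in $K$ is easy to see on a toy graph. The sketch below builds the Metropolis-Hastings mixing matrix and runs $K$ mixing rounds; the 5-agent path graph, the sample values of $u_{nl}^{2}$, and the function names are assumptions chosen for illustration. The diagonal is filled so that each row sums to one, which gives the same value as the $r=l$ case of the rule.

```python
import numpy as np

def metropolis_weights(adj):
    """Mixing matrix from a symmetric 0/1 adjacency matrix (no self-loops, no isolated nodes)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # On an edge (r, l): min(1/d_r, 1/d_l) = 1 / max(d_r, d_l); zero elsewhere.
    Sigma = adj / np.maximum.outer(deg, deg)
    # Diagonal entries make every row (and, by symmetry, every column) sum to one.
    Sigma[np.diag_indices_from(Sigma)] = 1.0 - Sigma.sum(axis=1)
    return Sigma

def static_average_consensus(x, Sigma, K):
    """K rounds of neighbor averaging; each agent only combines its neighbors' values."""
    h = np.array(x, dtype=float)
    for _ in range(K):
        h = Sigma @ h   # Sigma is symmetric, so this equals the row-vector product h Sigma
    return h

# Hypothetical path graph 0-1-2-3-4 and hypothetical squared entries u_nl^2 held by the agents.
adj = np.zeros((5, 5))
idx = np.arange(4)
adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0
Sigma = metropolis_weights(adj)
u_sq = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

h_one_round = static_average_consensus(u_sq, Sigma, K=1)
h_many_rounds = static_average_consensus(u_sq, Sigma, K=200)
print(h_one_round, h_many_rounds, u_sq.mean())
```

After one round the agents still disagree strongly, while after many rounds every entry is numerically equal to the true mean.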

3.2 Dynamic Average Consensus

The above-mentioned dilemma motivates us to introduce a new scheme to dynamically calculate the row-support detector. To simplify the algorithmic protocol, we allow neighboring agents to exchange only one round of information. Under this setting, every agent holds a dynamic value $u_{nl}^{2}(t+\frac{p}{P})$ , while all the agents manage to track their dynamic average with one round of communication. Apparently, if the values of $u_{nl}^{2}(t+\frac{p}{P})$ change irregularly, the agents have no chance to reach their exact dynamic average. Nevertheless, observe that if the values of $u_{nl}^{2}(t+\frac{p}{P})$ converge to their steady states, convergence of the dynamic average will be possible. We consider two dynamic average consensus strategies proposed by [20].

First-order dynamic average consensus. Calculate

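$$\begin{aligned}h_{nl}(t+\frac{p}{P})=&\sum_{r\neq l}\sigma_{rl}\Big(h_{nr}(t+\frac{p-1}{P})-h_{nl}(t+\frac{p-1}{P})\Big)\\&+h_{nl}(t+\frac{p-1}{P})+u_{nl}^{2}(t+\frac{p}{P})-u_{nl}^{2}(t+\frac{p-1}{P}).\end{aligned}$$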

Second-order dynamic average consensus. Calculate

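$$\begin{aligned}\tilde{h}_{nl}(t+\frac{p}{P})=~&u_{nl}^{2}(t+\frac{p}{P})-2u_{nl}^{2}(t+\frac{p-1}{P})+u_{nl}^{2}(t+\frac{p-2}{P})\\&+\tilde{h}_{nl}(t+\frac{p-1}{P})+\sum_{r\neq l}\sigma_{rl}\Big(\tilde{h}_{nr}(t+\frac{p-1}{P})-\tilde{h}_{nl}(t+\frac{p-1}{P})\Big),\\ h_{nl}(t+\frac{p}{P})=~&\tilde{h}_{nl}(t+\frac{p}{P})\\&+h_{nl}(t+\frac{p-1}{P})+\sum_{r\neq l}\sigma_{rl}\Big(h_{nr}(t+\frac{p-1}{P})-h_{nl}(t+\frac{p-1}{P})\Big).\end{aligned}$$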
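Both dynamic strategies can be tried on a slowly settling input. In the sketch below, the vector `x` plays the role of the locally held values $u_{nl}^{2}$ for a fixed row $n$. The path graph, the test signal, and the initialization ($h=x$ at the first slot, $\tilde{h}=0$, and a repeated first sample standing in for the missing history) are assumptions, since the text does not specify them. Because each row of $\bm{\Sigma}$ sums to one, $h_{l}+\sum_{r\neq l}\sigma_{rl}(h_{r}-h_{l})$ equals the $l$-th entry of $\bm{\Sigma}\mathbf{h}$, which is the form used in the code.

```python
import numpy as np

def metropolis_weights(adj):
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    Sigma = adj / np.maximum.outer(deg, deg)
    Sigma[np.diag_indices_from(Sigma)] = 1.0 - Sigma.sum(axis=1)
    return Sigma

def first_order_step(h_prev, x_new, x_prev, Sigma):
    """One communication round: mix neighbors' h, then add the local increment."""
    return Sigma @ h_prev + x_new - x_prev

def second_order_step(h_prev, ht_prev, x_new, x_prev, x_prev2, Sigma):
    """One slot exchanging both h and h-tilde with neighbors."""
    ht = Sigma @ ht_prev + x_new - 2.0 * x_prev + x_prev2
    h = Sigma @ h_prev + ht
    return h, ht

# Hypothetical 5-agent path graph and a signal that settles at x_inf.
adj = np.zeros((5, 5))
idx = np.arange(4)
adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0
Sigma = metropolis_weights(adj)
x_inf = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
c = np.array([5.0, -3.0, 2.0, 0.0, -1.0])

def signal(k):
    return x_inf + c * 0.5 ** k

# Assumed initialization: h(0) = x(0), h_tilde(0) = 0, x(-1) = x(0).
h1 = signal(0).copy()
h2 = signal(0).copy()
ht = np.zeros(5)
x_prev2 = x_prev = signal(0)
for k in range(1, 301):
    x_new = signal(k)
    h1 = first_order_step(h1, x_new, x_prev, Sigma)
    h2, ht = second_order_step(h2, ht, x_new, x_prev, x_prev2, Sigma)
    x_prev2, x_prev = x_prev, x_new
print(h1, h2, x_new.mean())
```

With this initialization the network-wide mean of `h1` and `h2` equals the mean of the current input at every step, and once the input settles the individual entries agree on that mean.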

3.3 Implementation of Decentralized Robust Group LASSO

The decentralized robust group LASSO algorithm is outlined in Table 2. It is very close to the centralized algorithm in Table 1, except that the row-support detector is successively approximated through a static or dynamic average consensus strategy.

If the static average consensus strategy is adopted, then at time $t$ slot $p$, the network needs $K$ rounds of information exchange. The number of rounds reduces to one in the two dynamic average consensus strategies. Observe that in each round of first-order dynamic average consensus, agent $l$ requires $h_{nr}$ from all of its neighbors $r$. However, in each round of second-order dynamic average consensus, agent $l$ requires both $h_{nr}$ and $\tilde{h}_{nr}$ from all of its neighbors $r$. Therefore, the second-order strategy doubles the communication cost per time slot, compared to its first-order counterpart.

Notably, when $K$ is set to be large enough in the static average consensus strategy, the average consensus is exact. Therefore, the resulting decentralized algorithm enjoys the same convergence guarantee as the centralized one, but at an unaffordable communication cost. Embedding the two dynamic average consensus strategies saves considerable communication, but makes convergence analysis a challenging task, which we leave as future work.

In addition, to avoid possible computational instability, we also set safeguards on the value of $h_{nl}(t+\frac{p}{P})$. If it goes beyond the interval $[h_{\min},h_{\max}]$, its value is set to the nearest boundary.
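Putting the pieces together, the sketch below is one possible reading of the decentralized algorithm with first-order dynamic average consensus and the safeguards. Several details are assumptions rather than the authors' choices: the tracker is initialized with the local squares at the first slot and runs continuously across outer iterations, the step size is the reciprocal of the largest eigenvalue over all $\mathbf{A}_{(l)}^{T}\mathbf{A}_{(l)}$, $\mathbf{W}$ uses the sign $+\mathbf{Z}/\beta$ that follows from minimizing (6), and the demo uses a uniform mixing matrix on a complete graph.

```python
import numpy as np

def decentralized_robust_group_lasso(M, A, Sigma, lam=1.0, beta=1.0, P=50, T=100,
                                     h_min=1.0, h_max=np.inf):
    """M: (m, L); A: (L, m, N); Sigma: (L, L) symmetric doubly stochastic mixing matrix."""
    L, m, N = A.shape
    tau = 1.0 / max(np.linalg.norm(A[l], 2) ** 2 for l in range(L))  # assumed step size
    Y, S, Z = np.zeros((N, L)), np.zeros((m, L)), np.zeros((m, L))
    H, U2_prev = None, None
    for _ in range(T):
        for _ in range(P):
            AY = np.einsum('lmn,nl->ml', A, Y)
            V = np.einsum('lmn,ml->nl', A, AY + S - M - Z / beta)
            U = Y - tau * V
            U2 = U ** 2                      # column l holds agent l's local squares
            if H is None:
                H = U2.copy()                # assumed initialization of the tracker
            else:
                # First-order dynamic consensus: one exchange with neighbors per slot.
                H = H @ Sigma + U2 - U2_prev
            U2_prev = U2
            H = np.clip(H, h_min, h_max)     # safeguard: project onto [h_min, h_max]
            norm_est = np.sqrt(L * H)        # agent l's estimate of ||u^n||_2
            Y = U * np.maximum(0.0, norm_est - tau / beta) / norm_est
        AY = np.einsum('lmn,nl->ml', A, Y)
        W = M - AY + Z / beta
        S = np.sign(W) * np.maximum(0.0, np.abs(W) - lam / beta)
        Z = Z - beta * (AY + S - M)
    return Y, S, Z

# Tiny synthetic run; the uniform mixing matrix is a hypothetical complete-graph choice.
rng = np.random.default_rng(1)
A_demo = rng.standard_normal((4, 8, 12))
M_demo = rng.standard_normal((8, 4))
Sigma_demo = np.full((4, 4), 0.25)
Y_dec, S_dec, Z_dec = decentralized_robust_group_lasso(M_demo, A_demo, Sigma_demo, P=5, T=3)
print(Y_dec.shape, S_dec.shape, Z_dec.shape)
```

Since `h_min` is positive, the estimated norm never reaches zero, so the division in the shrinkage step is always well defined.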

4 Numerical Experiments

In the numerical experiments, we consider a network of $L=30$ agents. The dimension of every signal vector is $N=200$ , while the dimension of every measurement vector is $M=30$ . The group sparse signal matrix $\mathbf{Y}\in\mathcal{R}^{200\times 30}$ has $10$ nonzero rows (row sparsity ratio is $5\%$ ), whose positions are uniformly randomly chosen. The amplitudes of the nonzero elements follow i.i.d. uniform distribution within $[-50,50]$ . Elements of every sensing matrix $\mathbf{A}_{(l)}\in\mathcal{R}^{30\times 200}$ follow i.i.d. standard normal distribution. The sparse error matrix $\mathbf{S}\in\mathcal{R}^{30\times 30}$ has $90$ nonzero elements (sparsity ratio is $10\%$ ), whose positions are uniformly randomly chosen and the amplitudes follow i.i.d. uniform distribution within $[-50,50]$ .
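The setup described above can be reproduced with a short generator. The function name `make_problem`, the stacked `(L, M, N)` layout of the sensing matrices, and the random seed are assumptions; the sizes and distributions follow the text.

```python
import numpy as np

def make_problem(rng, L=30, N=200, m=30, nonzero_rows=10, nonzero_errors=90, amp=50.0):
    """Draw (M, A, Y, S) following the experimental description."""
    # Group sparse signal: a few common nonzero rows with uniform amplitudes.
    Y = np.zeros((N, L))
    rows = rng.choice(N, size=nonzero_rows, replace=False)
    Y[rows] = rng.uniform(-amp, amp, size=(nonzero_rows, L))
    # One standard normal sensing matrix per agent, stacked along the first axis.
    A = rng.standard_normal((L, m, N))
    # Sparse errors at uniformly random positions.
    S = np.zeros((m, L))
    flat_idx = rng.choice(m * L, size=nonzero_errors, replace=False)
    S[np.unravel_index(flat_idx, S.shape)] = rng.uniform(-amp, amp, size=nonzero_errors)
    # Column l of the measurement matrix is A_l y_l + s_l.
    M = np.einsum('lmn,nl->ml', A, Y) + S
    return M, A, Y, S

rng = np.random.default_rng(0)  # arbitrary seed
M, A, Y, S = make_problem(rng)
print(M.shape, A.shape, Y.shape, S.shape)
```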

In the robust group LASSO model, the weight parameter $\lambda=1$ . The ADMM parameter $\beta$ is also set as $1$ . The BCD parameter $\tau$ is set to be the minimum of largest eigenvalues of $\mathbf{A}_{(l)}^{T}\mathbf{A}_{(l)},~{}l=1,2,\cdots,L$ . Every iteration of the ADMM algorithm is divided into $P=50$ slots so as to run the BCD subroutine. For the static average consensus strategy, we let $K=50$ , meaning that each slot requires $50$ rounds of communication. For the dynamic average consensus strategies, we let the safeguards $h_{\min}=1$ and $h_{\max}=\infty$ . The performance metric is relative error, defined as the Frobenius distance between the true $[\mathbf{Y}^{T}~{}\mathbf{S}^{T}]$ solving (4) and the estimated one by ADMM, normalized by the Frobenius norm of $[\mathbf{Y}^{T}~{}\mathbf{S}^{T}]$ .
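Two quantities from this paragraph are simple to compute: the largest eigenvalues of $\mathbf{A}_{(l)}^{T}\mathbf{A}_{(l)}$ that enter the choice of $\tau$, and the relative error. The sketch below assumes the sensing matrices are stacked in an array of shape `(L, M, N)`; the function names and the toy inputs are illustrative.

```python
import numpy as np

def largest_eigenvalues(A):
    """lambda_max(A_l^T A_l) for each l, computed as the squared spectral norm of A_l."""
    return np.array([np.linalg.norm(A_l, 2) ** 2 for A_l in A])

def relative_error(Y_hat, S_hat, Y, S):
    """Frobenius distance between stacked estimates and truth, normalized by the truth."""
    truth = np.vstack([Y, S])        # Y (N x L) stacked on S (M x L)
    estimate = np.vstack([Y_hat, S_hat])
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)

# Toy inputs with eigenvalues that can be read off by hand (9 and 4).
A_toy = np.array([[[3.0, 0.0], [0.0, 1.0]],
                  [[0.0, 2.0], [0.0, 0.0]]])
eigs = largest_eigenvalues(A_toy)

Y_true = np.array([[1.0, 2.0], [0.0, 0.0]])
S_true = np.array([[0.0, 3.0]])
err_exact = relative_error(Y_true, S_true, Y_true, S_true)
err_zero = relative_error(np.zeros_like(Y_true), np.zeros_like(S_true), Y_true, S_true)
print(eigs, err_exact, err_zero)
```

A relative error of 1 corresponds to the all-zero estimate, which gives a convenient baseline when reading convergence curves.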

We first compare the centralized algorithm and the three decentralized ones, as depicted in Fig. 1. The connectivity ratio of the network (the percentage of randomly connected edges out of all possible ones) is $50\%$. The curve of the centralized algorithm coincides with that using static average consensus. Recall that static average consensus incurs $50$ rounds of communication at every time slot, and is hence expensive. In contrast, the dynamic average consensus strategies demonstrate satisfactory convergence properties, though they yield slightly degraded estimates. In particular, the second-order dynamic average consensus is close to the centralized algorithm in terms of the relative error.

In the second set of numerical experiments, we vary the connectivity ratio to observe its impact on the decentralized algorithms, as shown in Fig. 2. When the connectivity ratio decreases, the performance of the static average consensus degrades significantly. The reason is that a lower connectivity ratio reduces the speed of network information fusion, and hence makes the static average consensus less accurate under a given $K$ . The two dynamic average consensus strategies, on the other hand, are not very sensitive to the variation of connectivity ratio.

The numerical experiments validate the effectiveness of using dynamic average consensus to decentralize computation over networks. Though its theoretical properties in tracking problems have been investigated [20], its interplay with the overall optimization scheme is still unclear, and shall be our future research focus.

Acknowledgement. Qing Ling is supported in part by NSF China grant 61573331 and NSF Anhui grant 1608085QF130.

Bibliography

[1]
[2]
[3] Y. Eldar, P. Kuppinger, and H. Bölcskei, “Block-sparse signals: Uncertainty relations and efficient recovery,” IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3042–3054, 2010.
[4] M. E. Davies and Y. C. Eldar, “Rank awareness in joint sparse recovery,” IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1135–1146, 2012.
[5] D. Malioutov, M. Çetin, and A. S. Willsky, “A sparse signal reconstruction perspective for source localization with sensor arrays,” IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 3010–3022, 2005.
[6] X. Wei, Y. Yuan, and Q. Ling, “DOA estimation using a greedy block coordinate descent algorithm,” IEEE Transactions on Signal Processing, vol. 60, no. 12, pp. 6382–6394, 2012.
[7] F. Zeng, C. Li, and Z. Tian, “Distributed compressive spectrum sensing in cooperative multihop cognitive networks,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 2, pp. 37–48, 2011.
[8] J. Meng, W. Yin, H. Li, E. Hossain, and Z. Han, “Collaborative spectrum sensing from sparse observations in cognitive radio networks,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 2, pp. 327–337, 2011.