Linearly Constrained Smoothing Group Sparsity Solvers in Off-grid Model

Cheng-Yu Hung; Mostafa Kaveh

arXiv:1903.07164·eess.SP·June 4, 2019


TL;DR

This paper develops efficient algorithms for off-grid DoA estimation in compressed sensing, addressing matrix perturbations with various optimization formulations and convergence analyses.

Contribution

It introduces novel group-sparsity solvers using ADMM, Nesterov smoothing, and primal-dual methods tailored for off-grid model perturbations.

Findings

1. Algorithms demonstrate high accuracy in numerical simulations.

2. Proposed methods converge efficiently with reduced computational cost.

3. Effective handling of off-grid effects in compressed sensing scenarios.


Equations

$${\bf y}={\bf A}{\bf s}+{\bf n},\tag{1}$$

$${\bf y}=({\bf A}+{\bf B}\Gamma){\bf s}+{\bf n},\tag{2}$$

$$\min_{{\bf x}\in{\mathbb{R}}^{n}}F({\bf x}),\quad F({\bf x}):=\sum_{i=1}^{n}f_{i}({\bf x}),\tag{3}$$

$$\min_{{\bf x}\in{\mathbb{R}}^{n}}F({\bf x})=\{f({\bf x})+h({\bf x})+i({\bf x})\},\tag{4}$$

$${\bf v}(t)=\sum_{k=1}^{K}\tilde{s}_{k}(t){\bf a}(\theta_{k})+{\bf n}(t)=\tilde{\bf A}(\boldsymbol{\theta})\tilde{\bf s}(t)+{\bf n}(t),\tag{5}$$

$${\bf R}_{\bf v}=E[{\bf v}{\bf v}^{H}]=\sum_{k=1}^{K}\sigma_{k}^{2}{\bf a}(\theta_{k}){\bf a}(\theta_{k})^{H}+\sigma_{n}^{2}{\bf I},\tag{6}$$

$$\tilde{\bf v}(t)=(\tilde{\bf A}(\boldsymbol{\phi})+\tilde{\bf B}\Gamma)\bar{\bf s}(t)+{\bf n}(t),\tag{7}$$

$${\bf y}=({\bf A}(\boldsymbol{\phi}){\bf s}+{\bf B}{\bf p})+\sigma_{n}{\bf 1}_{n}=[{\bf A}(\boldsymbol{\phi}),{\bf B}]{\bf x}+\sigma_{n}{\bf 1}_{n},\tag{8}$$

$$\begin{aligned}&\arg\min_{{\bf x}\in{\mathcal{X}}}\frac{1}{2}\|{\bf y}-{\bf G}{\bf x}\|_{2}^{2}+\eta\|{\bf x}\|_{2,1},\\ &\text{s.t. }{\mathcal{X}}:=\{{\bf x}=[{\bf s}^{T},{\bf p}^{T}]^{T}:{\bf s}\geq 0,\ -r{\bf s}\leq{\bf p}\leq r{\bf s}\}.\end{aligned}\tag{9}$$

$$\arg\min_{{\bf x}\in{\mathbb{R}}^{2N\times 1}}F({\bf x})=\left\{\frac{1}{2}\|{\bf y}-{\bf G}{\bf x}\|_{2}^{2}+\eta\|{\bf x}\|_{2,1}+\iota_{\mathcal{X}}({\bf x})\right\},\tag{10}$$

$$\begin{aligned}&\arg\min_{{\bf x},{\bf z}_{i}}\frac{1}{2}\|{\bf y}-{\bf G}{\bf z}_{1}\|_{2}^{2}+\eta\|{\bf z}_{2}\|_{2,1}+\iota_{\mathcal{X}}({\bf z}_{3})\\ &\text{s.t. }{\bf z}_{1}={\bf x},\ {\bf z}_{2}={\bf x},\ {\bf z}_{3}={\bf x}.\end{aligned}\tag{11}$$

$$L_{\rho}({\bf z},{\bf x},{\bf u})=\sum_{i=1}^{3}\left(f_{i}({\bf z}_{i})+{\bf u}_{i}^{T}({\bf z}_{i}-{\bf x})+\frac{\rho}{2}\|{\bf z}_{i}-{\bf x}\|_{2}^{2}\right),\tag{12}$$

$$L_{0}({\bf z}^{\ast},{\bf x}^{\ast},{\bf u})\leq L_{0}({\bf z}^{\ast},{\bf x}^{\ast},{\bf u}^{\ast})\leq L_{0}({\bf z},{\bf x},{\bf u}^{\ast})\tag{13}$$

$$h({\bf x})=\max_{{\bf u}\in{\mathcal{U}}_{l_{2}}}\sum_{g_{i}\in\Omega}\eta\langle{\bf x}_{g_{i}},{\bf u}_{g_{i}}\rangle=\max_{{\bf u}\in{\mathcal{U}}_{l_{2}}}\eta\langle{\bf x},{\bf u}\rangle,\tag{14}$$

$${\mathcal{U}}_{l_{2}}=\{{\bf u}\in{\mathbb{R}}^{2N\times 1}:\|{\bf u}_{g_{i}}\|_{2}\leq 1,\ \forall g_{i}\in\Omega\}\tag{15}$$

$$h_{\mu}^{l_{2}}({\bf x}):=\max_{{\bf u}\in{\mathcal{U}}_{l_{2}}}\{\eta\langle{\bf x},{\bf u}\rangle-\mu d_{l_{2}}({\bf u})\}\tag{16}$$

$$h({\bf x})=\eta\sum_{i=1}^{|\Omega|}\nu_{i}=\eta\|\boldsymbol{\nu}\|_{1}\tag{17}$$

$$\bar{h}(\boldsymbol{\nu})=\eta\|\boldsymbol{\nu}\|_{1}=\max_{{\bf u}\in{\mathcal{U}}_{l_{1}}}\eta\langle\boldsymbol{\nu},{\bf u}\rangle,\tag{18}$$

$${\mathcal{U}}_{l_{1}}=\{{\bf u}\in{\mathbb{R}}^{N\times 1}:\|{\bf u}\|_{\infty}\leq 1\}\tag{19}$$

$$h_{\mu}^{l_{1}}(\boldsymbol{\nu}):=\max_{{\bf u}\in{\mathcal{U}}_{l_{1}}}\{\eta\langle\boldsymbol{\nu},{\bf u}\rangle-\mu d_{l_{1}}({\bf u})\}\tag{20}$$

$$\nabla h_{\mu}^{l_{2}}({\bf x})=\eta{\bf u}^{l_{2}},\quad\nabla h_{\mu}^{l_{1}}(\boldsymbol{\nu})=\eta{\bf u}^{l_{1}}\tag{21}$$

$${\mathcal{S}}_{2}({\bf a})=\begin{cases}\frac{{\bf a}}{\|{\bf a}\|_{2}},&\text{if }\|{\bf a}\|_{2}>1,\\ {\bf a},&\text{if }\|{\bf a}\|_{2}\leq 1.\end{cases}\tag{22}$$

$${\mathcal{S}}_{1}({\bf a})=\begin{cases}1,&\text{if }a_{i}>1,\ \forall i,\\ a_{i},&\text{if }|a_{i}|\leq 1,\ \forall i,\\ -1,&\text{if }a_{i}<-1,\ \forall i,\end{cases}\tag{23}$$

$$\arg\min_{{\bf x}\in{\mathbb{R}}^{n}}\{H_{i}({\bf x})+\iota_{\mathcal{X}}({\bf x})\},\quad i=1\text{ or }2.\tag{24}$$

$$\text{prox}_{\iota}({\bf y})=\arg\min_{{\bf x}\in{\mathbb{R}}^{n}}\left\{\frac{1}{2}\|{\bf y}-{\bf x}\|^{2}+\iota({\bf x})\right\}.\tag{25}$$

$$F({\bf x}^{k})-F({\bf x}^{*})\leq\frac{\epsilon}{2}+\frac{2\left(L_{f}+\frac{2D_{i}}{\epsilon\sigma}\right)\|{\bf x}^{0}-{\bf x}^{*}\|^{2}}{(k+1)^{2}},\tag{26}$$

$$\sqrt{\frac{4\|{\bf x}^{0}-{\bf x}^{*}\|^{2}}{\epsilon}\left(L_{f}+\frac{2D_{i}}{\epsilon\sigma}\right)}-1\tag{27}$$

$$\begin{aligned}&\arg\min_{{\bf x}\in{\mathcal{X}}}F({\bf x})=\{f({\bf x})+h({\bf x})\},\\ &\text{s.t. }{\mathcal{X}}=\{{\bf x}=[{\bf s}^{T},{\bf p}^{T}]^{T}:{\bf s}\geq 0,\ -r{\bf s}\leq{\bf p}\leq r{\bf s}\}.\end{aligned}\tag{28}$$


Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Indoor and Outdoor Localization Technologies · Microwave Imaging and Scattering Analysis

Full text


Cheng-Yu Hung and Mostafa Kaveh

C. Y. Hung was with the Department of Electrical and Computer Engineering, University of Minnesota - Twin Cities, Minneapolis, MN, 55455 USA (e-mail: [email protected]). Kaveh is with the University of Minnesota.

Abstract

In compressed sensing, the sensing matrix is assumed perfectly known. In reality, however, the sensing matrix is perturbed by sensor offsets or noise disturbance. Direction-of-arrival (DoA) estimation with off-grid effects is an instance of this situation, and can be formulated as a (non)convex optimization problem with linear inequality constraints, which can be solved by the interior point method (using the CVX tools), but at a large computational cost. In this work, in order to design efficient algorithms, we consider various alternative formulations, such as the unconstrained formulation, the primal-dual formulation, and the conic formulation, to develop group-sparsity-promoting solvers. First, the consensus alternating direction method of multipliers (C-ADMM) is applied. Then, iterative algorithms for the BPDN formulation are proposed by combining the Nesterov smoothing technique with the accelerated proximal gradient method, and the convergence of the method is analyzed. We also develop a variant of the EGT (Excessive Gap Technique)-based primal-dual method to systematically reduce the smoothing parameter. Finally, we propose algorithms for the quadratically constrained $\ell_{2}$-$\ell_{1}$ mixed-norm minimization problem by using smoothed dual conic optimization (SDCO) and a continuation technique. The accuracy and convergence of all the proposed methods are demonstrated in numerical simulations.

Index Terms:

The Nesterov smoothing, Basis pursuit denoising (BPDN), Group Lasso, Alternating direction method of multipliers (ADMM), Conic optimization.

I Introduction

In compressed sensing [1, 2], an underdetermined linear system is considered

$${\bf y}={\bf A}{\bf s}+{\bf n},\tag{1}$$

where ${\bf y}\in{\mathbb{C}}^{M\times 1}$ is an observation vector, ${\bf A}\in{\mathbb{C}}^{M\times N}(M\ll N)$ is a known dictionary matrix, ${\bf n}\in{\mathbb{C}}^{M\times 1}$ is a measurement error or additive noise vector, and ${\bf s}\in{\mathbb{C}}^{N\times 1}$ is a $K$-sparse signal vector of interest. There are only $K$ nonzero entries in $\bf s$, and $K\ll N$. As long as the dictionary matrix $\bf A$ satisfies the Restricted Isometry Property (RIP) [2, 3, 4], the sparse vector $\bf s$ can be reconstructed from only a few measurements by many solvers, such as group Lasso (least absolute shrinkage and selection operator) [5], basis pursuit denoising (BPDN) [6], or the Dantzig selector [7]. The performance analysis and computable performance bounds of these sparse recovery solvers are given in [8, 9]. However, the dictionary matrix $\bf A$ may not be known perfectly due to noise or modeling perturbations. In [10], the sensitivity to basis mismatch in the dictionary matrix is analyzed. For instance, the compressed sensing approach for DoA estimation may assume a known dictionary formed from the array responses at a grid of candidate directions [11]. In practice, however, the DoAs are unlikely to lie exactly on the model grid, leading to the now well-known off-grid DoA estimation problem, for which a number of model approximations and solutions have been proposed, for example [12, 13, 14, 15, 16, 17, 18]. A commonly used observation model for off-grid DoAs is the noisy structured perturbation model:

$${\bf y}=({\bf A}+{\bf B}\Gamma){\bf s}+{\bf n},\tag{2}$$

where ${\bf A}\in{\mathbb{C}}^{M\times N}$ is known, and ${\bf B}\in{\mathbb{C}}^{M\times N}$ is known as part of the off-grid approximation. $\Gamma=diag(\boldsymbol{\beta})\in{\mathbb{R}}^{N\times N}$, where ${\boldsymbol{\beta}}=[\beta_{1},\dots,\beta_{N}]^{T}$ is the unknown coefficient vector of the approximation, and ${\bf s}\in{\mathbb{R}}^{N\times 1}$ is the sparse vector associated with the grid points nearest the true DoAs. Equation (2) can be solved by formulating a sparsity-promoting constrained nonconvex minimization problem that estimates $\bf s$ and $\boldsymbol{\beta}$ alternately [12, 13], but this approach converges slowly. The alternating direction method of multipliers (ADMM) [19] is a popular method that can also be applied to this problem.

Furthermore, many inverse problems in signal processing, data mining, and statistical machine learning can be cast as composite optimization problems, which involve minimizing a sum of differentiable functions and nonsmooth ones. The off-grid DoA estimation problem of (2) can be formulated in this composite form. Subgradient algorithms [20] have been developed for nonsmooth optimization problems, but they converge very slowly. Instead of using subgradient methods, we design algorithms that solve nonsmooth optimization (NSO) problems efficiently by replacing the original problems with a sequence of smoothed approximations. The core of the techniques considered is to make the nondifferentiable functions smooth without introducing substantial approximation errors. Several different smoothing techniques have been proposed to solve NSO problems [21, 22, 23]. A primal-dual symmetric method derived from the excessive gap condition for nonsmooth convex optimization is proposed in [24]. In [15], the nondifferentiable function is approximated by the Moreau envelope function [23] and used in the column-wise mismatch problem. In [25], the overlapping group-lasso penalty is smoothed by the Nesterov smoothing technique [21]. A unified framework of smoothing approximation with fast gradient schemes is proposed in [26]. In [27], an adaptive Nesterov-based smoothing method is developed to dynamically choose the smoothing parameter at each iteration of the update. In [28], a number of primal-dual iterative approaches for solving large-scale nonsmooth optimization problems, such as the M+LFBF (Monotone+Lipschitz Forward Backward Forward) algorithm, are reviewed. In [29, 30], subgradient methods are proposed, but their convergence rate cannot be better than ${\mathcal{O}}(\frac{1}{\sqrt{k}})$, where $k$ is the number of iterations. Alternatively, smoothing as presented in [21] can be applied to mitigate the nonsmoothness of the objective function. In [31], a proximal iterative smoothing algorithm was proposed to solve convex nonsmooth optimization problems.

In this work, an unconstrained off-grid DoA estimator is first discussed. Its objective consists of one differentiable function and two nonsmooth ones, namely a regularized group-sparsity penalty and an indicator function. First, the consensus ADMM (C-ADMM) [19] is applied to this unconstrained optimization problem by using a common global variable that forces all the local variables of the objective functions to be equal, but it can be very slow to converge to high accuracy. In order to reach a low DoA reconstruction error quickly, the Nesterov smoothing methodology [21, 25] is used to reformulate the group-sparsity penalty as a "max"-structure function and then smooth it by adding a strongly convex term. We propose two reformulations of the group-sparsity penalty, since the $\ell_{2}$-$\ell_{1}$ mixed norm has a two-layer norm structure. Then, the accelerated proximal gradient method [32] is applied to the smoothed problem. Note that our first proposed Nesterov smoothing method is equivalent to the one in [15], as can be deduced from the results of [31]. The second Nesterov smoothing method is based on the dual-norm property of the $\ell_{1}$ norm. The fixed smoothing parameter has to be chosen empirically in this method. However, [33] shows that the accuracy improves as the smoothing parameter decreases. Thus, in order to reduce the smoothing parameter sequentially, we develop a variant of the primal-dual method based on the excessive gap technique (EGT) [24], in which a surrogate of the cost function is introduced. Furthermore, inspired by [34, 33], a variant of the conic formulation for quadratically constrained $\ell_{2}$-$\ell_{1}$ mixed-norm minimization with linear inequalities is proposed and solved by using smoothed dual conic optimization and a continuation technique. The accuracy and convergence of the proposed methods are demonstrated and compared with the interior point method (CVX) [35], MUSIC [36], M+LFBF [28], and the CRLB [37].

This paper is organized as follows. In Section II, some mathematical preliminaries and the off-grid DoA model with its C-ADMM solver are introduced. In Section III, the Nesterov smoothing technique is employed to reformulate the group-sparsity penalty in two ways. Then, the accelerated smoothing proximal gradient (ASPG) method is used to solve the reformulated optimization problems, and its convergence behavior is analyzed. In Section IV, the EGT-based approach is used to provide a systematic way to reduce the smoothing parameter. In Section V, the smoothing technique is applied to the conic formulation of off-grid DoA estimation. Finally, Section VI presents numerical results to verify the performance in terms of DoA resolution ability, estimation accuracy, and convergence behavior.

Notation: Throughout the paper, vectors and matrices are represented by boldface lowercase and uppercase letters, respectively. $E(\cdot)$ denotes the expectation operator. For any given matrix $\bf X$, ${\bf X}^{H}$ denotes the Hermitian transpose, and $vec({\bf X})$ is the vectorization of the matrix. $diag({\bf x})$ represents a diagonal square matrix with the elements of vector $\bf x$ on the diagonal. $\odot$ denotes the Hadamard product. $\otimes$ denotes the Kronecker product. For any two vectors ${\bf x},{\bf y}$, $({\bf x},{\bf y})$ denotes the vector obtained by stacking $\bf x$ on top of $\bf y$, and $\langle{\bf x},{\bf y}\rangle$ denotes the inner product. $Proj_{\mathcal{X}}({\bf x})$ denotes the projection of a vector $\bf x$ onto a set ${\mathcal{X}}$.

II Preliminaries, DoA Model with Structured Perturbations, and C-ADMM solver

II-A Preliminaries

Consider the following unconstrained separable convex optimization problem [38]:

$$\min_{{\bf x}\in{\mathbb{R}}^{n}}F({\bf x}),\quad F({\bf x}):=\sum_{i=1}^{n}f_{i}({\bf x}),\tag{3}$$

where $\{f_{1}({\bf x}),\cdots,f_{n}({\bf x})\}$ is a sequence of convex functions from ${\mathbb{R}}^{n}$ to $\mathbb{R}$.

In this paper, specifically, an unconstrained convex optimization problem is considered:

$$\min_{{\bf x}\in{\mathbb{R}}^{n}}F({\bf x})=\{f({\bf x})+h({\bf x})+i({\bf x})\},\tag{4}$$

where $f$, $h$, and $i$ satisfy the following assumptions and definitions:

Assumption 1.

(i) $f:{\mathbb{R}}^{n}\rightarrow{\mathbb{R}}\cup\{+\infty\}$ is a proper, closed, convex, and continuously differentiable function. Its gradient is Lipschitz continuous with parameter $L_{f}$.

(ii) $h:{\mathbb{R}}^{n}\rightarrow{\mathbb{R}}\cup\{+\infty\}$ is a proper, closed, and convex $\rho_{h}$-Lipschitz continuous function. It is not necessarily differentiable.

(iii) $i:{\mathbb{R}}^{n}\rightarrow{\mathbb{R}}\cup\{+\infty\}$ is a proper, lower semicontinuous, and convex function, but possibly nonsmooth. For instance, the indicator function of a closed set is lower semicontinuous.

Definition 1 (Lipschitz Continuous).

A function $f:{\mathbb{R}}^{n}\rightarrow{\mathbb{R}}$ is $\rho$-Lipschitz continuous if there exists $\rho>0$ such that $|f({\bf x})-f({\bf y})|\leq{\rho}\|\bf x-y\|$, $\forall{\bf x,y}\in\mathbb{R}^{n}$.

Definition 2 (Lipschitz Continuous Gradient).

The gradient of a differentiable convex function $f:{\mathbb{R}}^{n}\rightarrow{\mathbb{R}}$ is Lipschitz continuous with parameter $L>0$ if $\|{\nabla f({\bf x})}-{\nabla f({\bf y})}\|\leq L\|\bf x-y\|$ , $\forall{\bf x,y}\in\mathbb{R}^{n}$ .

Definition 3 (Strongly Convex).

The function $f:{\mathcal{X}}\rightarrow{\mathbb{R}}$ is $\sigma$ -strongly convex on a closed convex set $\mathcal{X}$ with parameter $\sigma>0$ if $f({\bf y})\geq f({\bf x})+\nabla f({\bf x})^{T}({\bf y-x})+\frac{\sigma}{2}\|\bf y-x\|_{2}^{2}$ , $\forall{\bf x,y}\in\mathcal{X}$ .

In the next subsection, we will show that the DoA estimation problem with structured perturbations can be reformulated into the form of (4).

II-B DoA Model with Structured Perturbations

Consider an array of $M$ sensors and suppose that $K$ far-field narrowband sources impinge on the array from angles $\theta_{1},\dots,\theta_{K}$. The measurement model and its covariance are described by

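$${\bf v}(t)=\sum_{k=1}^{K}\tilde{s}_{k}(t){\bf a}(\theta_{k})+{\bf n}(t)=\tilde{\bf A}(\boldsymbol{\theta})\tilde{\bf s}(t)+{\bf n}(t),\tag{5}$$

$${\bf R}_{\bf v}=E[{\bf v}{\bf v}^{H}]=\sum_{k=1}^{K}\sigma_{k}^{2}{\bf a}(\theta_{k}){\bf a}(\theta_{k})^{H}+\sigma_{n}^{2}{\bf I},\tag{6}$$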

where

• ${\bf v}(t)\in{\mathbb{C}}^{M\times 1}$ is the observation vector.

• ${\tilde{s}}_{k}(t)$ is the $k$-th received signal with power $\sigma^{2}_{k}$.

• ${\bf a}(\theta_{k})$ denotes the steering vector for direction $\theta_{k}$ with $m$-th entry $e^{-j2\pi\frac{d_{m}}{\lambda}\sin\theta_{k}}$, where $\lambda$ is the wavelength, and $\tilde{\bf A}(\boldsymbol{\theta})=[{\bf a}(\theta_{1}),\dots,{\bf a}(\theta_{K})]$.

In compressed sensing, $\boldsymbol{\phi}=[\phi_{1},\dots,\phi_{N}]$ is defined as uniformly discretized grid atoms for the dictionary matrix. The off-grid DoA is denoted by $\beta_{i}=\theta_{k}-\phi_{i}$ if $\phi_{i}$ is closest to $\theta_{k},\forall k$ ; otherwise, $\beta_{i}=0$ . We assume that $0\leq|\beta_{i}|\leq r$ and $r=\frac{|{\phi_{i}-\phi_{i+1}}|}{2}$ .
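As a small illustration of the off-grid definition above, the following sketch computes the offsets $\beta_{i}$ for a given grid. The 2-degree grid, the two source angles, and the helper name `offgrid_offsets` are assumptions made for this example only; they do not come from the paper.

```python
import numpy as np

def offgrid_offsets(theta, phi):
    """Return beta with beta[i] = theta_k - phi[i] at the grid point nearest
    each source angle theta_k, and beta[i] = 0 at all other grid points."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    phi = np.asarray(phi, dtype=float)
    beta = np.zeros_like(phi)
    for th in theta:
        i = int(np.argmin(np.abs(phi - th)))  # index of the nearest grid point
        beta[i] = th - phi[i]
    return beta

phi = np.arange(-90.0, 91.0, 2.0)   # hypothetical uniform grid, 2-degree spacing
r = abs(phi[1] - phi[0]) / 2.0      # r = |phi_i - phi_{i+1}| / 2
beta = offgrid_offsets([-10.3, 24.6], phi)
```

With this grid, the two sources map to the grid points at -10 and 24 degrees, and both offsets stay within the assumed bound $|\beta_{i}|\leq r$.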

By using Taylor series, the first-order approximate measurement model [39] is

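$$\tilde{\bf v}(t)=(\tilde{\bf A}(\boldsymbol{\phi})+\tilde{\bf B}\Gamma)\bar{\bf s}(t)+{\bf n}(t),\tag{7}$$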

where $\tilde{{\bf B}}=[\frac{\partial{\bf a}(\phi_{1})}{\partial\phi_{1}},\dots,\frac{\partial{\bf a}(\phi_{N})}{\partial\phi_{N}}]\in{\mathbb{C}^{M\times N}}$ , ${\boldsymbol{\beta}}=[\beta_{1},\dots,\beta_{N}]^{T}$ , $\Gamma=diag(\boldsymbol{\beta})$ , and $\bar{\bf s}$ is a $\mathbb{C}^{N\times 1}$ sparse vector. $\tilde{\bf A}(\boldsymbol{\phi})=[{\bf a}(\phi_{1}),\dots,{\bf a}(\phi_{N})]$ . By vectorizing the covariance of (7), we have
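The effect of the first-order term can be checked numerically. The sketch below compares a true off-grid steering vector with its on-grid and first-order Taylor approximations. It assumes a uniform linear array with half-wavelength spacing ($d_{m}=m\lambda/2$) and angles measured in degrees; the text does not fix either choice, and the function names are hypothetical.

```python
import numpy as np

def steering(theta_deg, M):
    """Steering vector of an assumed ULA with d_m = m * lambda / 2."""
    m = np.arange(M)
    return np.exp(-1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))

def steering_derivative(phi_deg, M):
    """Derivative of the steering vector with respect to the angle in degrees."""
    m = np.arange(M)
    phi = np.deg2rad(phi_deg)
    # d/dphi exp(-j*pi*m*sin(phi)) = -j*pi*m*cos(phi) * a(phi);
    # the factor pi/180 converts the derivative from per-radian to per-degree.
    return (-1j * np.pi * m * np.cos(phi)) * steering(phi_deg, M) * (np.pi / 180.0)

M = 8
phi_i, beta_i = 24.0, 0.6            # grid point and off-grid offset (degrees)
a_true = steering(phi_i + beta_i, M)
a_grid = steering(phi_i, M)
a_taylor = a_grid + beta_i * steering_derivative(phi_i, M)

err_grid = np.linalg.norm(a_true - a_grid)      # error when the offset is ignored
err_taylor = np.linalg.norm(a_true - a_taylor)  # error of the first-order model
```

For these values the first-order model leaves a much smaller residual than the on-grid vector alone, which is the motivation for carrying the extra term $\tilde{\bf B}\Gamma$.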

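$${\bf y}=({\bf A}(\boldsymbol{\phi}){\bf s}+{\bf B}{\bf p})+\sigma_{n}{\bf 1}_{n}=[{\bf A}(\boldsymbol{\phi}),{\bf B}]{\bf x}+\sigma_{n}{\bf 1}_{n},\tag{8}$$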

where

• ${\bf y}=vec({\bf R_{\tilde{v}}})$.

• ${\bf A}(\boldsymbol{\phi})=[{\bf a}(\phi_{1})^{H}\otimes{\bf a}(\phi_{1}),\dots,{\bf a}(\phi_{N})^{H}\otimes{\bf a}(\phi_{N})]\in{\mathbb{C}^{M^{2}\times N}}$.

• ${{\bf B}}=[\frac{\partial{\bf a}(\phi_{1})}{\partial\phi_{1}}\otimes\frac{\partial{\bf a}(\phi_{1})}{\partial\phi_{1}},\dots,\frac{\partial{\bf a}(\phi_{N})}{\partial\phi_{N}}\otimes\frac{\partial{\bf a}(\phi_{N})}{\partial\phi_{N}}]\in{\mathbb{C}^{M^{2}\times N}}$.

• ${\bf s}$ is a $\mathbb{R}^{N\times 1}$ sparse vector with $K$ nonzero terms $\sigma_{k}^{2}$.

${\bf 1}_{n}=[e_{1}^{T},\dots,e_{M}^{T}]^{T}$ where $e_{i}\in{\mathbb{R}}^{M\times 1}$ is an all-zero vector except with 1 at $i$ -th entry. ${\bf x}=[{\bf s}^{T},{\bf p}^{T}]^{T}\in{\mathbb{R}^{2N\times 1}}$ , and ${\bf p}={\boldsymbol{\beta}}\odot{\bf s}$ . Let ${\bf G}=[{\bf A}(\boldsymbol{\phi}),{\bf B}]$ be a fat matrix for the following sections. Note that if $r$ is less than or equal to $0.5$ , then ${\bf s}\gg{\bf p}$ since the value of $\beta_{k}$ is much smaller than $\sigma^{2}_{k}$ at mild SNRs.

Since ${\bf s,p}$ have the same sparsity pattern (non-zero entries), we can solve (8) over a closed convex set $\mathcal{X}$ by the group Lasso :

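$$\begin{aligned}&\arg\min_{{\bf x}\in{\mathcal{X}}}\frac{1}{2}\|{\bf y}-{\bf G}{\bf x}\|_{2}^{2}+\eta\|{\bf x}\|_{2,1},\\ &\text{s.t. }{\mathcal{X}}:=\{{\bf x}=[{\bf s}^{T},{\bf p}^{T}]^{T}:{\bf s}\geq 0,\ -r{\bf s}\leq{\bf p}\leq r{\bf s}\},\end{aligned}\tag{9}$$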

where $\eta>0$ is a regularization parameter, and $r$ is defined above. Because the constraint set ${\mathcal{X}}$ is defined by linear inequalities, we can transform the problem into an unconstrained one by using an indicator function, which is also known as the basis pursuit denoising (BPDN) formulation:

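$$\arg\min_{{\bf x}\in{\mathbb{R}}^{2N\times 1}}F({\bf x})=\left\{\frac{1}{2}\|{\bf y}-{\bf G}{\bf x}\|_{2}^{2}+\eta\|{\bf x}\|_{2,1}+\iota_{\mathcal{X}}({\bf x})\right\},\tag{10}$$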

where ${\iota}_{\mathcal{X}}(\bf x)=0$ if ${\bf x}\in{\mathcal{X}}$, and $\infty$ otherwise. Let $f({\bf x}):=\frac{1}{2}||{\bf y}-{{\bf G}}{\bf x}||_{2}^{2}$, $h({\bf x}):=\eta||{\bf x}||_{2,1}$, and $i({\bf x}):={\iota}_{\mathcal{X}}(\bf x)$, so that (10) fits the framework of (4). Our goal is to find an optimal solution of problem (10) efficiently. However, the two nonsmooth functions in the objective, $||{\bf x}||_{2,1}$ and ${\iota}_{\mathcal{X}}(\bf x)$, make this problem difficult to solve. Thus, C-ADMM is applied to handle this difficulty.

II-C Consensus Alternating Direction Method of Multipliers (C-ADMM)

Let us consider the unconstrained problem (10). This problem can be solved by C-ADMM, which uses a consensus global variable ${\bf x}$ and local variables ${\bf z}_{i}$ :

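$$\begin{aligned}&\arg\min_{{\bf x},{\bf z}_{i}}\frac{1}{2}\|{\bf y}-{\bf G}{\bf z}_{1}\|_{2}^{2}+\eta\|{\bf z}_{2}\|_{2,1}+\iota_{\mathcal{X}}({\bf z}_{3})\\ &\text{s.t. }{\bf z}_{1}={\bf x},\ {\bf z}_{2}={\bf x},\ {\bf z}_{3}={\bf x}.\end{aligned}\tag{11}$$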

We call this a "consensus problem" since the constraint forces all the local variables to be equal.

C-ADMM of this problem can be derived from the augmented Lagrangian

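$$L_{\rho}({\bf z},{\bf x},{\bf u})=\sum_{i=1}^{3}\left(f_{i}({\bf z}_{i})+{\bf u}_{i}^{T}({\bf z}_{i}-{\bf x})+\frac{\rho}{2}\|{\bf z}_{i}-{\bf x}\|_{2}^{2}\right),\tag{12}$$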

where $f_{1}({\bf z}_{1})=\frac{1}{2}||{\bf y}-{{\bf G}}{\bf z}_{1}||_{2}^{2}$, $f_{2}({\bf z}_{2})=\eta||{\bf z}_{2}||_{2,1}$, $f_{3}({\bf z}_{3})={\iota}_{\mathcal{X}}({\bf z}_{3})$, and $\rho$ is a penalty parameter. The resulting consensus ADMM is summarized in Algorithm 1.

The convergence of C-ADMM rests on the following two assumptions:

Assumption 2.

The extended-real-valued functions $f_{i}({\bf z}_{i}):{\mathbb{R}}^{n}\rightarrow{\mathbb{R}}\cup\{+\infty\}$ are closed, proper, and convex.

Assumption 3.

The unaugmented Lagrangian $L_{0}$ has a saddle point. Namely, there exists a not necessarily unique solution $({\bf z}^{\ast},{\bf x}^{\ast},{\bf u}^{\ast})$ such that

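$$L_{0}({\bf z}^{\ast},{\bf x}^{\ast},{\bf u})\leq L_{0}({\bf z}^{\ast},{\bf x}^{\ast},{\bf u}^{\ast})\leq L_{0}({\bf z},{\bf x},{\bf u}^{\ast}).\tag{13}$$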

In [19], under Assumptions 2 and 3, the C-ADMM iterates are shown to satisfy residual convergence, objective convergence, and dual variable convergence. The update steps of C-ADMM are summarized in Algorithm 1.
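Algorithm 1 is not reproduced in this text, so the following is only a minimal sketch of consensus ADMM updates obtained by minimizing the augmented Lagrangian (12) block by block; it may differ in detail from the paper's algorithm. It assumes real-valued `G` and `y`, a fixed penalty `rho`, and a fixed iteration count, and the helper names `project_X` and `group_soft_threshold` are hypothetical.

```python
import numpy as np

def project_X(x, r):
    """Projection onto X = {[s; p] : s >= 0, |p| <= r*s}, one (s_i, p_i) pair at a time."""
    N = x.size // 2
    s, p = x[:N], x[N:]
    # Coefficient along the nearest boundary ray (1, +/- r); zero inside the polar cone.
    t = np.maximum(s + r * np.abs(p), 0.0) / (1.0 + r * r)
    feasible = np.abs(p) <= r * s
    s_out = np.where(feasible, s, t)
    p_out = np.where(feasible, p, np.sign(p) * r * t)
    return np.concatenate([s_out, p_out])

def group_soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_{2,1} with groups g_i = {i, i+N}."""
    N = v.size // 2
    pairs = v.reshape(2, N)                      # column i holds (s_i, p_i)
    norms = np.linalg.norm(pairs, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-300), 0.0)
    return (pairs * scale).reshape(-1)

def c_admm(G, y, eta, r, rho=1.0, n_iter=200):
    """Consensus ADMM sketch for problem (11); returns (z_3, x)."""
    n = G.shape[1]
    x = np.zeros(n)
    z = np.zeros((3, n))
    u = np.zeros((3, n))
    H = G.T @ G + rho * np.eye(n)                # formed once, reused in every z_1 update
    Gty = G.T @ y
    for _ in range(n_iter):
        z[0] = np.linalg.solve(H, Gty + rho * x - u[0])         # least-squares block f_1
        z[1] = group_soft_threshold(x - u[1] / rho, eta / rho)  # group-sparsity block f_2
        z[2] = project_X(x - u[2] / rho, r)                     # constraint block f_3
        x = np.mean(z + u / rho, axis=0)                        # consensus (global) variable
        u += rho * (z - x)                                      # dual update
    return z[2], x
```

Returning `z[2]` alongside `x` is a design choice of this sketch: `z[2]` is always feasible because it is the output of a projection, whereas the averaged variable `x` is only feasible in the limit.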

III The Smoothing Techniques

In the following sections, we will show how to deal with problem (10) by combining the accelerated proximal gradient (APG) algorithm with the Nesterov smoothing technique. We aim to smooth the group-sparsity penalty $h({\bf x})=\eta||{\bf x}||_{2,1}$ so that the APG method can be used. A variant of EGT-based primal-dual method and smoothed dual conic optimization method will be described in the following sections. In order to present the idea more clearly, we introduce the notation $||{\bf x}||_{2,1}=\sum_{g_{i}\in{\Omega}}\|{\bf x}_{g_{i}}\|_{2}$ , where ${\bf x}_{g_{i}}\in{\mathbb{R}}^{|g_{i}|}$ denotes the subvector of ${\bf x}\in{\mathbb{R}}^{2N\times 1}$ having the same sparse pattern in group $g_{i}$ , where $|\cdot|$ is the cardinality of a set. Each group $g_{i}$ represents a subset of index set $\{1,\cdots,2N\}$ and is disjoint from the others. Denote $\Omega=\{g_{1},\dots,g_{|\Omega|}\}$ as the set of groups, and $2N=\sum_{i=1}^{|\Omega|}|g_{i}|$ . In our case, $|\Omega|=N$ , $|g_{i}|=2,g_{i}=\{i,i+N\},\forall i=1,\cdots,N$ , ${\bf x}_{g_{i}}=[{x}_{i},{x}_{i+N}]^{T}\in{\mathbb{R}}^{2}$ where ${x}_{i}={s}_{i}$ and ${x}_{i+N}={p}_{i}$ . Denote ${x}_{i}$ , ${s}_{i}$ , and ${p}_{i}$ as the $i$ -th entry of $\bf x,s$ , and $\bf p$ , respectively.
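The grouping $g_{i}=\{i,i+N\}$ can be made concrete with a few lines of code. The sketch below evaluates $\|{\bf x}\|_{2,1}$, tests membership in $\mathcal{X}$, and evaluates the objective of (10); it assumes real-valued data, and the function names and the small test vector are illustrative only.

```python
import numpy as np

def group_l21_norm(x):
    """||x||_{2,1} for x = [s; p] with groups g_i = {i, i+N}."""
    N = x.size // 2
    pairs = x.reshape(2, N)                # row 0 is s, row 1 is p
    return float(np.sum(np.linalg.norm(pairs, axis=0)))

def in_constraint_set(x, r, tol=1e-12):
    """Check x in X = {[s; p] : s >= 0, -r*s <= p <= r*s}."""
    N = x.size // 2
    s, p = x[:N], x[N:]
    return bool(np.all(s >= -tol) and np.all(np.abs(p) <= r * s + tol))

def objective(x, G, y, eta, r):
    """F(x) = 0.5*||y - Gx||^2 + eta*||x||_{2,1} + indicator of X, as in (10)."""
    if not in_constraint_set(x, r):
        return np.inf
    return 0.5 * np.linalg.norm(y - G @ x) ** 2 + eta * group_l21_norm(x)

s = np.array([3.0, 0.0, 1.0])
p = np.array([4.0, 0.0, 0.0])
x = np.concatenate([s, p])                 # groups: (3, 4), (0, 0), (1, 0)
norm_21 = group_l21_norm(x)                # 5 + 0 + 1
```

The zero group contributes nothing to the penalty, which is how the mixed norm encourages $s_{i}$ and $p_{i}$ to vanish together.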

III-A Two Reformulations of Group-sparsity Penalty

Since $h({\bf x})$ is an $\ell_{2}$-$\ell_{1}$ mixed norm with two layers, i.e., the inner is the $\ell_{2}$ norm and the outer is the $\ell_{1}$ norm, we can use the dual norm property to reformulate it as a maximization of a linear function over an auxiliary variable with "simple" constraints in two different ways.

First, inspired by [25], by using the convex conjugate function and the fact that the dual norm of $\ell_{2}$ norm is $\ell_{2}$ norm, $\|{\bf x}_{g_{i}}\|_{2}$ has the max-structure as $\max_{\|{\bf u}_{g_{i}}\|_{2}\leq 1}{\bf u}_{g_{i}}^{T}{\bf x}_{g_{i}}$ where ${\bf u}_{g_{i}}\in{\mathbb{R}}^{|g_{i}|}$ denotes an auxiliary vector. Then, $h({\bf x})$ can be written as

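$$h({\bf x})=\max_{{\bf u}\in{\mathcal{U}}_{l_{2}}}\sum_{g_{i}\in\Omega}\eta\langle{\bf x}_{g_{i}},{\bf u}_{g_{i}}\rangle=\max_{{\bf u}\in{\mathcal{U}}_{l_{2}}}\eta\langle{\bf x},{\bf u}\rangle,\tag{14}$$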

where

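$${\mathcal{U}}_{l_{2}}=\{{\bf u}\in{\mathbb{R}}^{2N\times 1}:\|{\bf u}_{g_{i}}\|_{2}\leq 1,\ \forall g_{i}\in\Omega\}\tag{15}$$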

is the set of vectors in the space of the Cartesian product of $\ell_{2}$ norm unit ball. In the Nesterov smoothing technique, if a nonsmooth convex function has the max-structure, then we have its corresponding smoothed function

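$$h_{\mu}^{l_{2}}({\bf x}):=\max_{{\bf u}\in{\mathcal{U}}_{l_{2}}}\{\eta\langle{\bf x},{\bf u}\rangle-\mu d_{l_{2}}({\bf u})\}\tag{16}$$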

with a smoothing parameter $\mu>0$, where the prox-function $d_{l_{2}}({\bf u})$ [21] is continuous and strongly convex on ${\mathcal{U}}_{l_{2}}$ with strong convexity parameter $\sigma$. The prox-center of $d_{l_{2}}(\bf u)$ is denoted by ${\bf u}_{0}=\arg\min_{{\bf u}\in{\mathcal{U}}_{l_{2}}}\{d_{l_{2}}({\bf u})\}$. By the definition of strong convexity, $d_{l_{2}}({\bf u})\geq\frac{\sigma}{2}\|{\bf u}-{\bf u}_{0}\|^{2}_{2}$. Since $d_{l_{2}}({\bf u})$ is strongly convex, the maximizer in the definition of $h^{l_{2}}_{\mu}({\bf x})$ is unique, so $h^{l_{2}}_{\mu}({\bf x})$ is a smooth and convex function whose gradient can be computed easily.

Second, inspired by the fact that the dual norm of $\ell_{1}$ norm is $\ell_{\infty}$ norm, $\|{\bf x}\|_{1}$ has the max-structure as $\max_{\|{\bf u}\|_{\infty}\leq 1}{\bf u}^{T}{\bf x}$ , where ${\bf u}$ denotes an auxiliary vector. Therefore, we propose a second reformulation. Let us define $\nu_{i}:=\|{\bf x}_{g_{i}}\|_{2}$ and ${\boldsymbol{\nu}}=[\nu_{1},\dots,\nu_{{|\Omega|}}]^{T}\in{\mathbb{R}}^{N\times 1}$ , and then $h({\bf x})$ can be rewritten as

$$h({\bf x})=\eta\sum_{i=1}^{|\Omega|}\nu_{i}=\eta\|\boldsymbol{\nu}\|_{1}.\tag{17}$$

We define a new function ${\bar{h}}({\boldsymbol{\nu}})$ as

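$$\bar{h}(\boldsymbol{\nu})=\eta\|\boldsymbol{\nu}\|_{1}=\max_{{\bf u}\in{\mathcal{U}}_{l_{1}}}\eta\langle\boldsymbol{\nu},{\bf u}\rangle,\tag{18}$$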

where

$${\mathcal{U}}_{l_{1}}=\{{\bf u}\in{\mathbb{R}}^{N\times 1}:\|{\bf u}\|_{\infty}\leq 1\}\tag{19}$$

is the set of vectors in the space of $\ell_{\infty}$ norm unit ball. Since it has the max-structure, we have the corresponding smoothed function of ${\bar{h}}({\boldsymbol{\nu}})$ as

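$$h_{\mu}^{l_{1}}(\boldsymbol{\nu}):=\max_{{\bf u}\in{\mathcal{U}}_{l_{1}}}\{\eta\langle\boldsymbol{\nu},{\bf u}\rangle-\mu d_{l_{1}}({\bf u})\}\tag{20}$$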

with a smoothing parameter $\mu>0$. Then, $h^{l_{1}}_{\mu}({\boldsymbol{\nu}})$ is also a smooth and convex function if a strongly convex function $d_{l_{1}}({\bf u})$ is chosen. Note that the dimension of $\bf x$ is twice that of $\boldsymbol{\nu}$.

Since both $h^{l_{2}}_{\mu}({\bf x})$ and $h^{l_{1}}_{\mu}({\boldsymbol{\nu}})$ are smooth and convex, their gradients can be formed by the following modified theorem [21]

Theorem 1.

For any $\mu>0$ , the functions $h^{l_{2}}_{\mu}({\bf x})$ and $h^{l_{1}}_{\mu}({\boldsymbol{\nu}})$ are well-defined and continuously differentiable in $\bf x$ and ${\boldsymbol{\nu}}$ , respectively. Moreover, both functions are convex and their gradients:

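$$\nabla h_{\mu}^{l_{2}}({\bf x})=\eta{\bf u}^{l_{2}},\quad\nabla h_{\mu}^{l_{1}}(\boldsymbol{\nu})=\eta{\bf u}^{l_{1}}\tag{21}$$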

are Lipschitz continuous with the same constant $L_{\mu}=\frac{1}{\mu\sigma}$ , where ${\bf u}^{l_{2}}$ and ${\bf u}^{l_{1}}$ are the optimal solutions to (16) and (20), respectively.

For all ${\bf u}\in{\mathcal{U}}_{l_{2}}$, we choose $d_{l_{2}}({\bf u})=\frac{1}{2}\|{\bf u}\|^{2}_{2}$, which has strong convexity parameter $\sigma=1$. Then, for every group $g_{i}$, the subvector ${\bf u}^{l_{2}}_{g_{i}}$ of ${\bf u}^{l_{2}}$ can be calculated as ${\bf u}^{l_{2}}_{g_{i}}={\mathcal{S}}_{2}(\frac{\eta}{\mu}{\bf x}_{g_{i}})$, where ${\mathcal{S}}_{2}(\cdot)$ denotes the projection of a vector $\bf a$ onto the $\ell_{2}$ unit ball:

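$${\mathcal{S}}_{2}({\bf a})=\begin{cases}\frac{{\bf a}}{\|{\bf a}\|_{2}},&\text{if }\|{\bf a}\|_{2}>1,\\ {\bf a},&\text{if }\|{\bf a}\|_{2}\leq 1.\end{cases}\tag{22}$$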

Similarly, $\forall{\bf u}\in{\mathcal{U}}_{l_{1}},$ if we choose $d_{l_{1}}({\bf u})=\frac{1}{2}\|{\bf u}\|^{2}_{2}$ , then ${\bf u}^{l_{1}}$ can be computed as ${\bf u}^{l_{1}}={\mathcal{S}}_{1}(\frac{\eta}{\mu}{\boldsymbol{\nu}})$ where ${\mathcal{S}}_{1}(\cdot)$ denotes the projection operator of projecting a vector $\bf a$ to an $\ell_{\infty}$ unit ball

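$${\mathcal{S}}_{1}({\bf a})=\begin{cases}1,&\text{if }a_{i}>1,\ \forall i,\\ a_{i},&\text{if }|a_{i}|\leq 1,\ \forall i,\\ -1,&\text{if }a_{i}<-1,\ \forall i,\end{cases}\tag{23}$$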

where $a_{i}$ is the $i$ -th entry of ${\bf a}$ .

Note that the dimension of ${\boldsymbol{\nu}}$ is half that of $\bf x$. Therefore, for the case of $\nabla h^{l_{1}}_{\mu}({\boldsymbol{\nu}})$, zero-padding is performed such that $\nabla h^{l_{1}}_{\mu}({\bf x}):=[\nabla h^{l_{1}}_{\mu}({\boldsymbol{\nu}})^{T},{\bf 0}^{T}]^{T}\in{\mathbb{R}}^{2N\times 1}$, where ${\bf 0}$ is the zero vector in ${\mathbb{R}}^{N\times 1}$, so that the new gradient $\nabla h^{l_{1}}_{\mu}({\bf x})$ can be used in the accelerated proximal gradient method. This is acceptable only when the parameter $r$ is small enough. Since ${\bf p}\ll{\bf s}$ holds in this case, the value of $\nu_{i}$ comes mainly from the contribution of $\bf s$, so the zero vector can be assigned as the partial derivative with respect to $\bf p$.
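The two smoothed penalties and their gradients can be sketched as follows, using the prox-function $d({\bf u})=\frac{1}{2}\|{\bf u}\|_{2}^{2}$ with $\sigma=1$ as in the text. The function names are hypothetical, the input is assumed real-valued with groups $g_{i}=\{i,i+N\}$, and the second function returns the zero-padded vector described above rather than an exact gradient with respect to ${\bf x}$.

```python
import numpy as np

def smoothed_group_norm(x, eta, mu):
    """Value and gradient of the first smoothed penalty h_mu^{l2}(x)."""
    N = x.size // 2
    pairs = x.reshape(2, N)                      # column i is x_{g_i} = (s_i, p_i)
    a = (eta / mu) * pairs
    # S_2: project each column onto the unit l2 ball.
    u = a / np.maximum(np.linalg.norm(a, axis=0), 1.0)
    value = eta * np.sum(pairs * u) - 0.5 * mu * np.sum(u * u)
    return float(value), eta * u.reshape(-1)

def smoothed_l1_of_group_norms(x, eta, mu):
    """Value of the second smoothed penalty h_mu^{l1}(nu) and its zero-padded gradient."""
    N = x.size // 2
    nu = np.linalg.norm(x.reshape(2, N), axis=0)  # nu_i = ||x_{g_i}||_2
    u = np.clip((eta / mu) * nu, -1.0, 1.0)       # S_1: project onto the l_inf unit ball
    value = eta * (nu @ u) - 0.5 * mu * (u @ u)
    grad = np.concatenate([eta * u, np.zeros(N)]) # zeros assigned to the p-block
    return float(value), grad
```

With this prox-function, the maximum of $d$ over ${\mathcal{U}}_{l_{2}}$ is $N/2$, so the smoothed value of the first penalty lies within $\mu N/2$ below $\eta\|{\bf x}\|_{2,1}$; shrinking $\mu$ tightens the approximation at the cost of a larger gradient Lipschitz constant.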

III-B Accelerated Smoothing Proximal Gradient (ASPG)

Now, we solve two "smoothed" versions of problem (10)

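$$\arg\min_{{\bf x}\in{\mathbb{R}}^{n}}\{H_{i}({\bf x})+\iota_{\mathcal{X}}({\bf x})\},\quad i=1\text{ or }2,\tag{24}$$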

where $H_{i}({\bf x}):=f({\bf x})+h_{\mu}^{l_{i}}({\bf x})$ , $i=1$ or $2$ , and then its gradient is computed as $\nabla{H_{i}}({\bf x})=\nabla{f(\bf x)}+\eta{\bf u}^{l_{i}}$ .

Problem (24) can be solved by the accelerated proximal gradient method [32] in which a proximal operator is used:

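$$\text{prox}_{\iota}({\bf y})=\arg\min_{{\bf x}\in{\mathbb{R}}^{n}}\left\{\frac{1}{2}\|{\bf y}-{\bf x}\|^{2}+\iota({\bf x})\right\}.\tag{25}$$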

In fact, the proximal operator $\text{prox}_{\iota_{\mathcal{X}}}(\bf y)$ of the indicator function $\iota_{\mathcal{X}}({\bf x})$ is the projection operator onto the set $\mathcal{X}$, $\Pi_{\mathcal{X}}(\bf y)$. The ASPG method is summarized in Algorithm 2.
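Algorithm 2 is not reproduced in this text. The sketch below shows one plausible reading of the scheme: a standard FISTA-type accelerated projected gradient iteration applied to $H_{1}({\bf x})+\iota_{\mathcal{X}}({\bf x})$. Several things are assumptions of this sketch and not statements from the paper: the data are real-valued, the momentum rule is the usual $t_{k+1}=\frac{1}{2}(1+\sqrt{1+4t_{k}^{2}})$, the step size is $1/(L_{f}+\eta^{2}/\mu)$ with $L_{f}=\|{\bf G}\|_{2}^{2}$, and all function names are hypothetical.

```python
import numpy as np

def project_X(x, r):
    """Projection onto X = {[s; p] : s >= 0, |p| <= r*s}, pair by pair."""
    N = x.size // 2
    s, p = x[:N], x[N:]
    t = np.maximum(s + r * np.abs(p), 0.0) / (1.0 + r * r)
    feasible = np.abs(p) <= r * s
    return np.concatenate([np.where(feasible, s, t),
                           np.where(feasible, p, np.sign(p) * r * t)])

def smoothed_penalty(x, eta, mu):
    """Value and gradient of h_mu^{l2}(x) with d(u) = 0.5*||u||^2."""
    N = x.size // 2
    pairs = x.reshape(2, N)
    a = (eta / mu) * pairs
    u = a / np.maximum(np.linalg.norm(a, axis=0), 1.0)
    value = eta * np.sum(pairs * u) - 0.5 * mu * np.sum(u * u)
    return float(value), eta * u.reshape(-1)

def smoothed_objective(x, G, y, eta, mu):
    """H_1(x) = f(x) + h_mu^{l2}(x)."""
    return 0.5 * np.linalg.norm(y - G @ x) ** 2 + smoothed_penalty(x, eta, mu)[0]

def aspg(G, y, eta, r, mu=1e-2, n_iter=300):
    """Accelerated smoothing proximal gradient sketch, started at x = 0."""
    n = G.shape[1]
    L = np.linalg.norm(G, 2) ** 2 + eta ** 2 / mu   # assumed Lipschitz bound for grad H_1
    x = np.zeros(n)
    w = x.copy()                                    # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        grad = G.T @ (G @ w - y) + smoothed_penalty(w, eta, mu)[1]
        x_new = project_X(w - grad / L, r)          # prox of the indicator = projection
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        w = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

Every iterate returned by `aspg` is feasible because it is the output of a projection. The only tuning parameter beyond $\eta$ and $r$ is the fixed smoothing level `mu`, which trades approximation accuracy against step size.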

III-C Convergence Analysis

The convergence rate of Algorithm 2 is given in Lemma 2.

Lemma 2.

Suppose ${\bf x}^{k}$ is the $k$ -th iterative solution in Algorithm 2, and $\bf x^{*}$ is the optimal solution of problem (10). Assume that $\epsilon$ -approximation is required, i.e., $F({\bf x}^{k})-F({\bf x}^{*})\leq\epsilon$ . If we set $\mu=\frac{\epsilon}{2D_{i}}$ , where $D_{i}=\max_{{\bf u}\in{\mathcal{U}}_{l_{i}}}d_{l_{i}}({\bf u})$ , then

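$$F({\bf x}^{k})-F({\bf x}^{*})\leq\frac{\epsilon}{2}+\frac{2\left(L_{f}+\frac{2D_{i}}{\epsilon\sigma}\right)\|{\bf x}^{0}-{\bf x}^{*}\|^{2}}{(k+1)^{2}},\tag{26}$$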

where $L_{f}$ is the Lipschitz constant of the gradient of $f(\bf x)$. The number of iterations $k$ is upper bounded by

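$$\sqrt{\frac{4\|{\bf x}^{0}-{\bf x}^{*}\|^{2}}{\epsilon}\left(L_{f}+\frac{2D_{i}}{\epsilon\sigma}\right)}-1.\tag{27}$$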

This lemma implies its convergence rate is ${\mathcal{O}}(\frac{1}{k})$ . We cannot achieve convergence rate ${\mathcal{O}}(\frac{1}{k^{2}})$ of the accelerated proximal gradient method due to the smoothing process, but the convergence rate is better than that for subgradient methods with ${\mathcal{O}}(\frac{1}{\sqrt{k}})$ [20, 29].
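To see the ${\mathcal{O}}(\frac{1}{k})$ behavior numerically, one can require the second term of the bound in Lemma 2 to be at most $\frac{\epsilon}{2}$, which gives $(k+1)^{2}\geq\frac{4\|{\bf x}^{0}-{\bf x}^{*}\|^{2}}{\epsilon}(L_{f}+\frac{2D_{i}}{\epsilon\sigma})$. The sketch below evaluates the resulting iteration count; the function name and the numerical values of $L_{f}$, $D_{i}$, and $\|{\bf x}^{0}-{\bf x}^{*}\|$ are made up for illustration.

```python
import math

def aspg_iteration_bound(eps, L_f, D, R, sigma=1.0):
    """Smallest integer k with 2*(L_f + 2*D/(eps*sigma))*R**2/(k+1)**2 <= eps/2,
    where R stands for ||x^0 - x^*||."""
    L = L_f + 2.0 * D / (eps * sigma)
    return max(0, math.ceil(math.sqrt(4.0 * R * R * L / eps) - 1.0))

# Hypothetical constants: L_f = 1, D_i = 5, ||x^0 - x^*|| = 1.
k_coarse = aspg_iteration_bound(1e-2, L_f=1.0, D=5.0, R=1.0)
k_fine = aspg_iteration_bound(1e-3, L_f=1.0, D=5.0, R=1.0)
```

When the smoothing term $\frac{2D_{i}}{\epsilon\sigma}$ dominates $L_{f}$, tightening $\epsilon$ by a factor of ten raises the iteration count by roughly a factor of ten, which matches the ${\mathcal{O}}(\frac{1}{k})$ rate stated above.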

IV The EGT-based Primal-Dual Method

In ASPG, the smoothing parameter $\mu$ is chosen empirically and kept fixed, which reduces the practical efficiency of ASPG. Thus, the excessive gap technique [24] is employed to choose $\mu$ systematically in the framework of primal-dual gradient symmetric formulations.

Let us consider the constrained optimization problem (9) as follows:

[TABLE]

where $f({\bf x})=\frac{1}{2}||{\bf y}-{{\bf G}}{\bf x}||_{2}^{2}$ , and $h({\bf x})=\eta||{\bf x}||_{2,1}=\max_{{\bf u}_{2}\in{\mathcal{U}}_{l_{2}}}\{\eta\langle{\bf x},{\bf u}_{2}\rangle\}$ . (Note that there are two reformulations of $h({\bf x})$ proposed in subsection III.A. The first one will be used for convenience to express the idea in this subsection.)

We know that $h({\bf x})$ is not strongly convex. Since ${\bf G}$ is a fat matrix, the error fitting function $f({\bf x})$ is not strongly convex either. We therefore use $f_{r}({\bf x})=||{\bf y}-{{\bf G}}{\bf x}||_{2}$ as a surrogate for $f({\bf x})$: although $f_{r}({\bf x})$ is not differentiable everywhere, it can be expressed in a max-structure form and smoothed by a strongly convex function. Thus, instead of solving (9), we propose

[TABLE]

Then, we will smooth not only the regularization term $h({\bf x})$ , but also the new error fitting function $f_{r}({\bf x})$ . This will lead to a closed form solution. Next, we will show how to achieve this goal by the excessive gap technique.

We can rewrite (29) into the following primal problem by using the dual norm definition:

[TABLE]

and its dual problem as

[TABLE]

where ${\bf u}$ is a dual variable vector composed of ${\bf u}_{1}$ and ${\bf u}_{2}$ , which belong to ${\mathcal{U}}_{2}$ , and ${\mathcal{U}}_{l_{2}}$ , respectively, where

[TABLE]

Since both $F({\bf x})$ and $\Phi({\bf u})$ are nondifferentiable, we can construct a smoothing approximation of primal-dual problem as follows

[TABLE]

by using two strongly convex functions $d_{1}({\bf x})=\frac{1}{2}\|{\bf x}\|^{2}_{2}$ , and $d_{2}({\bf u})=\frac{1}{2}\|{\bf u}\|^{2}_{2}$ with two smoothing parameters $\mu_{1}$ , and $\mu_{2}$ .

For the primal problem, denote ${\bf u}_{1,\mu_{2}},{\bf u}_{2,\mu_{2}}$ as the unique optimal solution of $F_{\mu_{2}}({\bf x})$ , which can be derived in closed forms as

[TABLE]

By Danskin’s theorem [40], the gradient of $F_{\mu_{2}}({\bf x})$ is computed as

[TABLE]

with Lipschitz constant $L_{1}(F_{\mu_{2}}({\bf x}))=\frac{1}{{\mu_{2}}}\|[{\bf G},\eta{\bf I}]^{H}\|^{2}$.
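To make the smoothing step concrete, the sketch below smooths a plain $\ell_{2}$ norm, written in max-structure form as $\|{\bf z}\|_{2}=\max_{\|{\bf u}\|_{2}\leq 1}\langle{\bf u},{\bf z}\rangle$, with the prox-function $d_{2}({\bf u})=\frac{1}{2}\|{\bf u}\|_{2}^{2}$. It assumes that the dual feasible set is the unit $\ell_{2}$ ball, and the function name is hypothetical. The maximizer is the projection of ${\bf z}/\mu$ onto the ball, and by Danskin's theorem it equals the gradient of the smoothed function with respect to ${\bf z}$.

```python
import numpy as np

def smoothed_l2_norm(z, mu):
    """Nesterov-smoothed l2 norm: max_{||u|| <= 1} <u, z> - (mu / 2) * ||u||^2.

    Returns the smoothed value and the unique maximizer u*, which is also the
    gradient with respect to z. Assumes the dual set is the unit l2 ball.
    """
    z = np.asarray(z, dtype=float)
    norm_z = np.linalg.norm(z)
    u_star = z / max(mu, norm_z)   # projection of z / mu onto the unit ball
    value = u_star @ z - 0.5 * mu * (u_star @ u_star)
    return value, u_star

if __name__ == "__main__":
    value, grad = smoothed_l2_norm([3.0, 4.0], mu=0.1)
    print(value, grad)  # value is ||z|| - mu/2 because ||z|| > mu
```

The smoothed value never exceeds $\|{\bf z}\|_{2}$ and differs from it by at most $\mu/2$, which is the kind of gap that the smoothing parameter controls. For a residual term one would take ${\bf z}={\bf y}-{\bf G}{\bf x}$ and apply the chain rule.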

Similarly, for the dual problem, denote ${\bf x}_{\mu_{1}}$ as the unique optimal solution of $\Phi_{\mu_{1}}({\bf u})$ , which can be derived in a closed form as

[TABLE]

And the gradient of $\Phi_{\mu_{1}}({\bf u})$ is

[TABLE]

with Lipschitz constant $L_{2}(\Phi_{\mu_{1}}({\bf u}))$ $=\frac{1}{{\mu_{1}}}\|[{\bf G}^{H},\eta{\bf I}]^{H}\|^{2}$ by Danskin’s theorem.

We know that

• $\Phi({\bf u})\leq F({\bf x})$;

• by definition, $F_{\mu_{2}}({\bf x})\leq F({\bf x})$ and $\Phi({\bf u})\leq\Phi_{\mu_{1}}({\bf u})$;

• the excessive gap condition (EGC) [24] holds when, for certain ${\bf x}\in{\mathcal{X}}$ and ${{\bf u}=[{\bf u}_{1}^{T},{\bf u}_{2}^{T}]^{T},{\bf u}_{1}\in{\mathcal{U}}_{2},{\bf u}_{2}\in{\mathcal{U}}_{l_{2}}}$ with sufficiently large $\mu_{1},\mu_{2}$, the following inequality is satisfied:

[TABLE]

Then, the following modified lemma can be derived:

Lemma 3.

Let ${\bf x}\in{\mathcal{X}}$ and ${\bf u}=[{\bf u}_{1}^{T},{\bf u}_{2}^{T}]^{T},{\bf u}_{1}\in{\mathcal{U}}_{2},{\bf u}_{2}\in{\mathcal{U}}_{l_{2}}$ satisfy EGC. Then,

[TABLE]

where $D_{1}=\max_{{\bf x}\in{\mathcal{X}}}\|{\bf x}\|^{2}$ , $D_{2}=\max_{{\bf u}_{1}\in{\mathcal{U}}_{2}}\|{\bf u}_{1}\|^{2}$ , $D_{3}=\max_{{\bf u}_{2}\in{\mathcal{U}}_{l_{2}}}\|{\bf u}_{2}\|^{2}$ .

By this modified lemma, EGC provides an upper bound for the primal-dual pair $({\bf x,u})$, so that we can update the pair iteratively while it keeps satisfying EGC as $\mu_{1},\mu_{2}$ approach zero. We also apply the primal gradient mapping [24]:

[TABLE]

and the dual gradient mapping:

[TABLE]

to choose a starting point that satisfies the EGC. In our case, both mappings simplify to closed forms:

[TABLE]

By choosing feasible initial points for the primal and dual variables, the modified lemma for the primal part of the iterative algorithm is stated as follows:

Lemma 4.

For a starting point ${\bf x}_{0}$ , define

[TABLE]

for an arbitrary $\mu_{2}>0$ , and any $\mu_{1}\geq L_{1}(F_{\mu_{2}})$ . Fix $\tau\in(0,1)$ and choose $\mu_{1}^{+}=(1-\tau)\mu_{1}$ ,

[TABLE]

Then $(\bar{\bf x}_{+},\bar{\bf u}_{+})$ satisfies EGC (39) with smoothing parameters $\mu_{1}^{+},\mu_{2}^{+}$, provided that $\tau$ satisfies $\frac{\tau^{2}}{1-\tau}\leq\frac{\mu_{1}}{L_{1}(F_{\mu_{2}})}$.

Thus, if EGC is satisfied for a certain primal-dual pair, the pair can be updated iteratively while it keeps satisfying the EGC as $\mu_{1}$ and $\mu_{2}$ go to zero. In other words, we decrease $\mu_{1}$ with $\mu_{2}$ fixed for the primal problem, and decrease $\mu_{2}$ with $\mu_{1}$ fixed for the dual problem. The updates of the primal-dual pair are summarized in Algorithm 3. The convergence rate is of order ${\mathcal{O}}(\frac{1}{k})$, as shown in [24].
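The way the two smoothing parameters shrink can be illustrated with a short sketch. It uses $\tau_{k}=\frac{2}{k+3}$, the value reported in Section VI, and checks the step condition of Lemma 4 written as $\frac{\tau^{2}}{1-\tau}\leq\frac{\mu_{1}\mu_{2}}{\|\cdot\|^{2}}$. Several details are assumptions of ours rather than content of Algorithm 3: the parity rule (primal update on even $k$, dual update on odd $k$), the initial choice $\mu_{1}=2L_{1}(F_{\mu_{2}})$, the use of the symmetric condition for the dual step, and the function name.

```python
def egt_smoothing_schedule(norm_sq, mu2_init, num_iters):
    """Track (mu1, mu2) under tau_k = 2 / (k + 3) and check the step condition.

    norm_sq stands for the squared operator norm in L1 = norm_sq / mu2.
    Assumptions (not from the paper): mu1 starts at 2 * L1, and mu1 / mu2 are
    reduced on even / odd iterations respectively.
    """
    mu1 = 2.0 * norm_sq / mu2_init
    mu2 = mu2_init
    history = []
    for k in range(num_iters):
        tau = 2.0 / (k + 3)
        lhs = tau ** 2 / (1.0 - tau)
        rhs = mu1 * mu2 / norm_sq          # equals mu1 / L1(F_mu2)
        history.append((k, tau, mu1, mu2, lhs <= rhs))
        if k % 2 == 0:
            mu1 *= 1.0 - tau               # primal step: shrink mu1, keep mu2
        else:
            mu2 *= 1.0 - tau               # dual step: shrink mu2, keep mu1
    return history

if __name__ == "__main__":
    for k, tau, mu1, mu2, ok in egt_smoothing_schedule(4.0, 1.0, 6):
        print(k, round(tau, 3), round(mu1, 4), round(mu2, 4), ok)
```

Under these assumptions the product $\mu_{1}\mu_{2}$ decays like $1/k^{2}$ while the step condition keeps holding, which matches the ${\mathcal{O}}(\frac{1}{k})$ behaviour quoted from [24].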

V Extension: Smoothed Dual Conic Formulation

In the previous approaches for solving the constrained BPDN problem (9), there is no natural way to select a proper regularization parameter $\eta$. However, an estimate $\epsilon$ of the error level of the fitting term $f({\bf x})$ may be available from the SNR. It is therefore preferable to keep only the nonsmooth penalty function $h({\bf x})$ as the objective and to move $f({\bf x})$ into a constraint. This leads to reformulating (9) as a conic convex optimization problem.

V-A Primal-Dual Conic Formulations and the Smoothing

Instead of solving the linear-inequality-constrained BPDN problem (9)

[TABLE]

where $f({\bf x}):=\frac{1}{2}||{\bf y}-{{\bf G}}{\bf x}||_{2}^{2}$, $h({\bf x}):=\eta||{\bf x}||_{2,1}$, and ${\mathcal{X}}=\{{\bf x}=[{\bf s}^{T},{\bf p}^{T}]^{T}:{\bf s}\geq 0,-r{\bf s}\leq{\bf p}\leq r{\bf s}\}$, we consider, inspired by [34], a quadratically constrained problem with linear inequality constraints

[TABLE]

since it is more natural to select an appropriate $\epsilon$ rather than an appropriate regularization parameter $\eta$ .

Note that ${\mathcal{X}}$ is a set of elements satisfying linear inequalities, so it can be replaced by a matrix form representation ${\bf C}{\bf x}\leq{\bf 0}$ . Then, let us consider the conic form of the primal problem

[TABLE]

and derive its dual by Lagrange multipliers

[TABLE]

where $g({\bf z},{\bf w})=\inf_{\bf x}||{\bf x}||_{2,1}-\langle{\bf z},{\bf y}-{\bf G}{\bf x}\rangle-\epsilon\|{\bf z}\|_{2}+\langle{\bf w},{\bf C}{\bf x}\rangle$ .
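The matrix form ${\bf C}{\bf x}\leq{\bf 0}$ of the set ${\mathcal{X}}$ used above can be written out explicitly. The sketch below stacks the three families of inequalities $-{\bf s}\leq 0$, ${\bf p}-r{\bf s}\leq 0$ and $-{\bf p}-r{\bf s}\leq 0$. The row ordering and the helper names are our own choice; any row permutation describes the same set.

```python
import numpy as np

def build_constraint_matrix(n, r):
    """Return C (3n x 2n) such that C @ [s; p] <= 0 encodes
    s >= 0 and -r*s <= p <= r*s. Row ordering is an arbitrary choice."""
    eye = np.eye(n)
    zero = np.zeros((n, n))
    return np.vstack([
        np.hstack([-eye, zero]),      # -s <= 0
        np.hstack([-r * eye, eye]),   #  p - r*s <= 0
        np.hstack([-r * eye, -eye]),  # -p - r*s <= 0
    ])

def is_feasible(x, C, tol=1e-12):
    """Check membership in {x : C x <= 0} up to a small tolerance."""
    return bool(np.all(C @ np.asarray(x, dtype=float) <= tol))

if __name__ == "__main__":
    C = build_constraint_matrix(2, 0.25)
    print(is_feasible([1.0, 2.0, 0.2, -0.5], C))  # every |p_i| <= r * s_i
    print(is_feasible([1.0, 2.0, 0.3, 0.0], C))   # p_1 exceeds r * s_1
```

The first block of rows is implied by the other two when $r>0$, so it can be dropped if a smaller ${\bf C}$ is preferred.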

Note that the objectives of both the primal and the dual formulation are nonsmooth. We therefore smooth $||{\bf x}||_{2,1}$ by adding the strongly convex prox-function $d({\bf x})=\frac{\sigma\mu}{2}\|{\bf x}-{\bf x}_{0}\|^{2}_{2}$ with a smoothing parameter $\mu$ and a strong convexity parameter $\sigma=1$, where ${\bf x}_{0}$ denotes the prox-center of $d({\bf x})$:

[TABLE]

In this way, the smoothed dual problem is given by

[TABLE]

where

[TABLE]

is a smooth function over $\bf x$ . The optimal solution of $g_{\mu}({\bf z},{\bf w})$ is unique because of the strong convexity of $d({\bf x})$ . Define ${\bf x}({\bf z},{\bf w})$ as the optimal solution of $g_{\mu}({\bf z},{\bf w})$ which is computed as

[TABLE]

where a group-soft-thresholding operator $GST({\bf x},t)$ of ${\bf x}=[{\bf s}^{T},{\bf p}^{T}]^{T}\in{\mathbb{R}}^{2N}$ is defined as

[TABLE]
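For illustration, the sketch below implements a group-soft-thresholding operator in its usual form, $[GST({\bf x},t)]_{g_{i}}=\max(1-t/\|{\bf x}_{g_{i}}\|_{2},0)\,{\bf x}_{g_{i}}$. It assumes that the $i$-th group consists of the pair $(s_{i},p_{i})$ and that the operator in the paper has this standard form. The function name is hypothetical.

```python
import numpy as np

def group_soft_threshold(x, t):
    """Group soft-thresholding of x = [s; p] with groups (s_i, p_i), t >= 0.

    Each group is scaled by max(1 - t / ||group||_2, 0). The standard form is
    assumed here; the grouping follows x = [s^T, p^T]^T in R^{2N}.
    """
    x = np.asarray(x, dtype=float)
    n = x.size // 2
    s, p = x[:n], x[n:]
    norms = np.sqrt(s ** 2 + p ** 2)
    keep = norms > t
    safe_norms = np.where(keep, norms, 1.0)        # avoid division by zero
    scale = np.where(keep, 1.0 - t / safe_norms, 0.0)
    return np.concatenate([scale * s, scale * p])

if __name__ == "__main__":
    x = np.array([3.0, 0.1, 4.0, 0.1])   # groups (3, 4) and (0.1, 0.1)
    print(group_soft_threshold(x, 1.0))  # first group shrinks, second vanishes
```

Groups whose norm is below the threshold are set to zero as a whole, which is how the $\ell_{2,1}$ penalty couples the on-grid amplitude $s_{i}$ with its off-grid term $p_{i}$.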

We rewrite the smoothed dual problem as

[TABLE]

where

[TABLE]

V-B Smoothed Dual Conic Optimization (SDCO) Solver

Problem (49) is in composite form, with a smooth part $g_{sm}$ and a nonsmooth part $h$. The smooth part $g_{sm}({\bf z},{\bf w})$ is differentiable, and by Danskin’s theorem its gradient is $\nabla g_{sm}({\bf z},{\bf w})=\begin{bmatrix}{\bf y}-{\bf G}{\bf x}({\bf z},{\bf w})\\ -{\bf C}{\bf x}({\bf z},{\bf w})\end{bmatrix}$.

Then, the generalized gradient projection method [41, 42] is applied to solve (49) by updating

[TABLE]

where $L_{k}$ is the inverse of step size $t_{k}$ . References [30, 43] show that $\epsilon$ -optimality can be achieved in ${\mathcal{O}}(1/\epsilon)$ iterations if $t_{k}$ is selected properly. Actually, a closed form solution for $({\bf z}_{k+1},{\bf w}_{k+1})$ can be derived as

[TABLE]

where an $l_{2}$ -shrinkage operation $Shrink({\bf x},t)$ is defined as

[TABLE]
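As a point of reference, the sketch below implements the usual $\ell_{2}$-shrinkage operator, $Shrink({\bf x},t)=\max(1-t/\|{\bf x}\|_{2},0)\,{\bf x}$, which is the proximal operator of $t\|\cdot\|_{2}$. We assume this standard form for the operator referred to above. The function names are our own.

```python
import numpy as np

def shrink(x, t):
    """l2 shrinkage: scale x toward zero by t in norm, or return zero.

    Standard form assumed: max(1 - t / ||x||_2, 0) * x, for t >= 0.
    """
    x = np.asarray(x, dtype=float)
    norm_x = np.linalg.norm(x)
    if norm_x <= t:
        return np.zeros_like(x)
    return (1.0 - t / norm_x) * x

def prox_objective(v, x, t):
    """Objective minimized by shrink(x, t): t * ||v|| + 0.5 * ||v - x||^2."""
    return t * np.linalg.norm(v) + 0.5 * np.sum((v - x) ** 2)

if __name__ == "__main__":
    x = np.array([3.0, 4.0])
    v = shrink(x, 1.0)
    print(v, prox_objective(v, x, 1.0))
```

Unlike group soft-thresholding, this operator acts on the whole vector at once, which is what a term of the form $\epsilon\|{\bf z}\|_{2}$ in the dual objective calls for.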

The right-hand side is a first-order approximation of (V-B), and satisfies an upper bound property

[TABLE]

which holds for sufficiently large $L_{k}$. Typically, if $L_{k}\geq L$ for all $k$, then the upper bound (56) holds, where $L$ is the Lipschitz constant of the gradient. Under these assumptions, $\epsilon$-optimality can be achieved in ${\mathcal{O}}(L/\epsilon)$ iterations by performing (V-B). Instead of (V-B), we use a variation of the generalized gradient projection method proposed by Nesterov, which is an optimal first-order method requiring ${\mathcal{O}}(\sqrt{L/\epsilon})$ iterations. The approach is summarized in Algorithm 4.

The smaller the smoothing parameter $\mu$, the better the accuracy. On the other hand, the continuation scheme proposed in NESTA [33] improves the convergence rate. Accordingly, a sequence of subproblems is solved by Algorithm 4 with decreasing smoothing parameters $\mu_{k}$, and the result of each subproblem is used to initialize the next one. This is the standard continuation scheme, applied here to Algorithm 4.
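A minimal sketch of a continuation loop of this kind is given here on a toy problem. It is not the paper's listing, and Algorithm 4 is replaced by plain gradient descent. The toy objective $\frac{1}{2}\|{\bf x}-{\bf b}\|_{2}^{2}+\lambda\sum_{i}\mathrm{huber}_{\mu}(x_{i})$, the function names, and the iteration counts are all assumptions made for illustration. The initial value $\mu=1$ and the reduction factor $0.5$ follow the settings reported in Section VI.

```python
import numpy as np

def huber_grad(x, mu):
    """Gradient of the Huber-smoothed absolute value with parameter mu."""
    return np.clip(x / mu, -1.0, 1.0)

def inner_solver(x_init, b, lam, mu, num_iters=50):
    """Stand-in for the inner algorithm: gradient descent on
    0.5 * ||x - b||^2 + lam * sum_i huber_mu(x_i)."""
    step = 1.0 / (1.0 + lam / mu)   # inverse Lipschitz constant of the gradient
    x = x_init.copy()
    for _ in range(num_iters):
        x = x - step * ((x - b) + lam * huber_grad(x, mu))
    return x

def continuation(b, lam, mu0=1.0, factor=0.5, num_stages=10):
    """Solve a sequence of smoothed problems with decreasing mu,
    warm-starting each stage from the previous solution."""
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    mu = mu0
    for _ in range(num_stages):
        x = inner_solver(x, b, lam, mu)
        mu *= factor
    return x

if __name__ == "__main__":
    # As mu decreases, the result approaches soft-thresholding of b by lam.
    print(continuation([3.0, 0.5, -2.5], lam=1.0))
```

The early stages use a large $\mu$, for which the smoothed problem is well conditioned and cheap to solve, and the later stages only refine a solution that is already close.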

VI Numerical Results

In this section, off-grid DoA estimation is conducted to demonstrate the performance of the proposed methods. The two proposed accelerated smoothing proximal gradient methods are designated as ASPG-L2 (using $h^{l_{2}}_{\mu}({\bf x})$) and ASPG-L1 (using $\nabla h^{l_{1}}_{\mu}({\boldsymbol{\nu}})$), and the consensus ADMM method is designated as C-ADMM. The variant of the excessive gap technique is called EGT-based, and the variant of the smoothed dual conic optimization method with continuation is called SDCO-Ct. We also solve problem (9) with the CVX package. The CVX method, implemented by the interior point method, serves as a benchmark for evaluating the estimation performance degradation caused by smoothing in the proposed methods. The estimation errors of these methods are compared with those of the MUSIC estimator and M+LFBF, and with the CRLB.

Consider $K=2$ uncorrelated source signals from DoAs ${\boldsymbol{\theta}}=[13.2220,28.6022]$ degrees impinging on a uniform linear array of $M=8$ sensors with half-wavelength interelement spacing. The two sources are randomly generated from a normal distribution with zero mean and variance $\sigma_{s}^{2}$. The noise term is i.i.d. AWGN with zero mean and variance $\sigma_{n}^{2}$. We use one hundred snapshots to estimate the covariance matrix. The size $N$ of the search grid is set to 360 with $r=0.25$ degree, which is used for all methods. One hundred realizations are performed at each SNR.

In the ASPG method, the decreasing factor is $\gamma=0.5$, and the smoothing parameter is chosen as $\mu=10^{-8}$. In the EGT-based method, the two smoothing parameters $\mu_{1},\mu_{2}$ are controlled by $\tau=\frac{2}{k+3}$, where $k$ is the iteration number. In the SDCO-Ct method, the initial value of the smoothing parameter is set to one, and it is multiplied by $0.5$ at each step of the outer loop. All other parameter settings are given in the algorithm blocks.
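The data model of this experiment can be sketched as follows. The array size, number of snapshots, and DoAs follow the text. The phase sign convention of the steering vector, the use of circular complex Gaussian sources and noise, the definition of SNR as $\sigma_{s}^{2}/\sigma_{n}^{2}$ with $\sigma_{n}^{2}=1$, and the function names are assumptions made for illustration.

```python
import numpy as np

def steering_matrix(theta_deg, num_sensors):
    """ULA steering matrix for half-wavelength spacing.
    Assumed convention: a_m(theta) = exp(j * pi * m * sin(theta))."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    m = np.arange(num_sensors)[:, None]
    return np.exp(1j * np.pi * m * np.sin(theta)[None, :])

def simulate_snapshots(theta_deg, num_sensors=8, num_snapshots=100,
                       snr_db=0.0, seed=None):
    """Generate Y = A S + N and the sample covariance R = Y Y^H / T.
    Assumes unit noise variance and SNR = sigma_s^2 / sigma_n^2."""
    rng = np.random.default_rng(seed)
    A = steering_matrix(theta_deg, num_sensors)
    num_sources = A.shape[1]
    sigma_s2 = 10.0 ** (snr_db / 10.0)

    def complex_gaussian(shape, variance):
        real = rng.standard_normal(shape)
        imag = rng.standard_normal(shape)
        return np.sqrt(variance / 2.0) * (real + 1j * imag)

    S = complex_gaussian((num_sources, num_snapshots), sigma_s2)
    noise = complex_gaussian((num_sensors, num_snapshots), 1.0)
    Y = A @ S + noise
    R = Y @ Y.conj().T / num_snapshots
    return Y, R

if __name__ == "__main__":
    Y, R = simulate_snapshots([13.2220, 28.6022], snr_db=0.0, seed=0)
    print(Y.shape, R.shape)
```

With $N=360$ grid points and $r=0.25$ degree, neighbouring grid points are $0.5$ degree apart, so the two DoAs above fall between grid points and exercise the off-grid model.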

VI-A DoA Resolution of Two Reformulated Group-sparsity Penalties

The resolution ability of the two reformulated group-sparsity penalties (using $h^{l_{2}}_{\mu}({\bf x})$ and $\nabla h^{l_{1}}_{\mu}({\boldsymbol{\nu}})$) is verified with the ASPG method. Figure 1 presents the estimated power spectra of the ASPG methods at SNR $=0$ dB. Due to the smoothing process, both have lost their sparsity. However, the two peaks of ASPG-L1 are more clearly separated than those of ASPG-L2. In other words, the ASPG-L1 estimator has higher DoA resolution. In Figure 2, at SNR $=4$ dB, the resolution of the ASPG-L1 estimator improves compared with the case of SNR $=0$ dB, while that of the ASPG-L2 estimator does not. As can be seen in Figure 2, the two major peaks of ASPG-L1 are sharper and much more clearly separated.

VI-B Accuracy of Off-Grid DoA Estimation

The accuracy of off-grid DoA estimation for the proposed methods is measured by the root-mean-square error (RMSE) of the DoA estimates, defined as $(E[\frac{1}{K}\|\hat{{\boldsymbol{\theta}}}-{\boldsymbol{\theta}}\|_{2}^{2}])^{\frac{1}{2}}$. Note that since the DoA resolution of the second reformulation ($\nabla h^{l_{1}}_{\mu}({\boldsymbol{\nu}})$) was shown to be better than that of the first, we run the EGT-based method with the second reformulated group-sparsity penalty in order to obtain better performance. As seen in Figure 3, the RMSEs of CVX, C-ADMM, ASPG-L1, ASPG-L2, EGT-based, and SDCO-Ct are almost the same and better than that of MUSIC at SNR $=0$ dB. When the SNR is low, the performance degradation mainly comes from poor estimation of the locations of the nonzero entries in the sparse vector ${\bf x}=[{\bf s}^{T},{\bf p}^{T}]^{T}\in{\mathbb{R}^{2N\times 1}}$, where ${\bf p}={\boldsymbol{\beta}}\odot{\bf s}$. We notice that the RMSEs of SDCO-Ct and M+LFBF get worse at SNR $=-2$ dB, which also indicates that their resolution ability becomes weaker.

When the SNR is high, an RMSE that does not approach the CRLB means that the estimate of the off-grid DoA vector ${\boldsymbol{\beta}}$ is not satisfactory. At SNR $=2$ and $4$ dB, ASPG-L1, CVX, C-ADMM, EGT-based, SDCO-Ct, and MUSIC perform better than ASPG-L2 and M+LFBF. The poor performance of ASPG-L2 arises because the sparsity-promoting property of the group-sparsity penalty $\|{\bf x}_{g_{i}}\|_{2}$ is lost during the smoothing process: this reformulation only uses the fact that the dual norm of the $\ell_{2}$ norm is again the $\ell_{2}$ norm, so sparsity is not promoted. Thus, a satisfactory estimate of ${\boldsymbol{\beta}}$ cannot be obtained.
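The RMSE defined at the start of this subsection can be estimated from Monte Carlo trials by replacing the expectation with a sample mean. The sketch below assumes that each row of `theta_hat` holds the $K$ estimates of one realization, already sorted so that they pair up with the true DoAs. The function name is our own.

```python
import numpy as np

def doa_rmse(theta_hat, theta_true):
    """Sample RMSE: sqrt(mean over trials of (1/K) * ||theta_hat - theta||^2).

    theta_hat: array of shape (num_trials, K), one row per realization.
    theta_true: array of shape (K,). Estimates are assumed to be matched
    to the true DoAs by position.
    """
    theta_hat = np.atleast_2d(np.asarray(theta_hat, dtype=float))
    theta_true = np.asarray(theta_true, dtype=float)
    num_sources = theta_true.size
    per_trial = np.sum((theta_hat - theta_true) ** 2, axis=1) / num_sources
    return float(np.sqrt(np.mean(per_trial)))

if __name__ == "__main__":
    truth = np.array([13.2220, 28.6022])
    estimates = truth + np.array([[0.1, -0.1], [0.0, 0.2], [-0.1, 0.0]])
    print(doa_rmse(estimates, truth))
```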

VI-C DoA Resolution Performance

In this numerical experiment, a resolution test is performed to demonstrate the ability of the proposed methods to detect two closely located DoAs at SNR $=0$ dB by inspecting the normalized spectra. In Figure 4, the DoA resolution of MUSIC is worse than that of all the others because it can hardly detect the second DoA. Due to the smoothing process, ASPG-L1, EGT-based, and SDCO-Ct lose the sparsity-promoting property of the group-sparsity penalty, so their two major peaks are not as sharp as those of C-ADMM. However, instead of using a fixed smoothing parameter as in the ASPG method, the EGT-based and SDCO-Ct methods use different approaches to reduce the smoothing parameters sequentially, so that the resolution ability is improved. The sharpness of the two peaks of SDCO-Ct is closer to that of C-ADMM than that of any other method.

VI-D Convergence Performance Comparisons

The convergence of the proposed methods is examined in this numerical simulation in terms of the reconstruction error or the objective function value. The reconstruction error is defined as $E[\frac{\|\hat{{\boldsymbol{\theta}}}-{\boldsymbol{\theta}}\|_{2}}{\|{\boldsymbol{\theta}}\|_{2}}]$. First, we inspect the convergence of the EGT-based method, in which the smoothing parameters for the primal and dual problems are chosen as functions of the iteration number, similarly to the diminishing step size rule [40]. As shown in Figure 5, the duality gap becomes very small once the iteration number reaches 50. Second, we compare the convergence of SDCO with and without continuation. In Figure 6, the convergence rate of SDCO with continuation is almost the same as that without continuation. However, it achieves a lower objective function value, which leads to better accuracy, since the smoothing parameter is reduced gradually by the continuation technique.

Finally, we inspect the convergence of C-ADMM, M+LFBF, ASPG-L1, ASPG-L2, EGT-based, and SDCO-Ct. In Figure 7, at SNR $=0$ dB, M+LFBF, ASPG-L1, ASPG-L2, EGT-based, and SDCO-Ct converge after about 100 iterations, while C-ADMM converges after about 300 iterations. Only SDCO-Ct, EGT-based, and C-ADMM reach the lowest reconstruction error among them, but SDCO-Ct appears unstable in this case. In Figure 8, at SNR $=2$ dB, the convergence rate of C-ADMM improves but is still slower than that of all the others. The SDCO-Ct method is the fastest to converge to the lowest reconstruction error, and its instability is much smaller than in the previous case.

VII Conclusion

In this paper, several iterative methods based on the Nesterov smoothing technique were proposed for the estimation of off-grid DoAs. First, the C-ADMM method was applied. To improve on the convergence rate of C-ADMM, two reformulations of the group-sparsity penalty were introduced and smoothed by the Nesterov smoothing technique so that their gradients can be calculated easily. Then, the accelerated proximal gradient method was used to solve the optimization problem consisting of the smoothed objective function plus the nonsmooth indicator function. In this method, the smoothing parameter is selected empirically. Thus, a variant of the EGT-based method was employed so that the smoothing parameter can be chosen systematically. Instead of heuristically choosing a regularization parameter in the BPDN problem formulation, a variant of the SDCO method was proposed, whose smoothing parameter can be set by the continuation technique. The accuracy and convergence of the proposed methods were verified by a numerical example of DoA estimation.

Appendix A Proof of Lemma 2

Proof.

Denote the smoothed version of the objective function $F({\bf x})$ as

[TABLE]

with gradient Lipschitz constant $L=L_{f}+\frac{1}{\mu\sigma}$. Following a proof scheme similar to that in [44], we decompose

[TABLE]

Then, based on the theorem from [45], we have the following bound for an optimal solution $\bf x^{*}$ :

[TABLE]

Also, by the definition of $h_{\mu}^{l_{i}}({\bf x})$ , we have

[TABLE]

This implies that

[TABLE]

Thus,

[TABLE]

Let $\mu=\frac{\epsilon}{2D_{i}}$ , then

[TABLE]

If we let $\frac{\epsilon}{2}+\frac{2(L_{f}+\frac{2D_{i}}{\epsilon\sigma})\|{\bf x}^{0}-{\bf x}^{*}\|^{2}}{(k+1)^{2}}=\epsilon$ , then we have the upper bound in (27). ∎
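As a worked step (our own algebra on the final condition above, with the shorthand $R=\|{\bf x}^{0}-{\bf x}^{*}\|$ introduced only for this illustration), the equality can be solved for $k$ explicitly:

```latex
\frac{2\left(L_{f}+\frac{2D_{i}}{\epsilon\sigma}\right)R^{2}}{(k+1)^{2}}=\frac{\epsilon}{2}
\quad\Longrightarrow\quad
(k+1)^{2}=\frac{4R^{2}}{\epsilon}\left(L_{f}+\frac{2D_{i}}{\epsilon\sigma}\right)
\quad\Longrightarrow\quad
k=2R\sqrt{\frac{L_{f}}{\epsilon}+\frac{2D_{i}}{\epsilon^{2}\sigma}}-1 .
```

For small $\epsilon$ the second term under the square root dominates, so $k$ grows like $1/\epsilon$, which is consistent with the ${\mathcal{O}}(\frac{1}{k})$ rate stated after Lemma 2.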

Bibliography

[1] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[2] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[3] M. F. Duarte and Y. C. Eldar, “Structured compressed sensing: From theory to applications,” IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4053–4085, 2011.
[4] Y. C. Eldar and G. Kutyniok, Compressed sensing: theory and applications. Cambridge University Press, 2012.
[5] M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.
[6] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Review, vol. 43, no. 1, pp. 129–159, 2001.
[7] E. Candès and T. Tao, “The Dantzig selector: Statistical estimation when p is much larger than n,” The Annals of Statistics, pp. 2313–2351, 2007.
[8] G. Tang and A. Nehorai, “Performance analysis for sparse support recovery,” IEEE Transactions on Information Theory, vol. 56, no. 3, pp. 1383–1399, 2010.
