Linearly Constrained Smoothing Group Sparsity Solvers in Off-grid Model
Cheng-Yu Hung, Mostafa Kaveh

TL;DR
This paper develops efficient algorithms for off-grid DoA estimation in compressed sensing, addressing matrix perturbations with various optimization formulations and convergence analyses.
Contribution
It introduces novel group-sparsity solvers using ADMM, Nesterov smoothing, and primal-dual methods tailored for off-grid model perturbations.
Findings
Algorithms demonstrate high accuracy in numerical simulations.
Proposed methods converge efficiently with reduced computational cost.
Effective handling of off-grid effects in compressed sensing scenarios.
C. Y. Hung was with the Department of Electrical and Computer Engineering, University of Minnesota - Twin Cities, Minneapolis, MN, 55455 USA (e-mail: [email protected]). M. Kaveh is with the University of Minnesota.
Abstract
In compressed sensing, the sensing matrix is assumed to be perfectly known. In practice, however, the sensing matrix is perturbed by sensor offsets or noise disturbance. Direction-of-arrival (DoA) estimation with the off-grid effect is one such case. It can be formulated as a (non)convex optimization problem with linear inequality constraints, which can be solved by the interior point method (using the CVX tools), but at a large computational cost. In this work, in order to design efficient algorithms, we consider various alternative formulations, such as the unconstrained formulation, the primal-dual formulation, or the conic formulation, to develop group-sparsity promoting solvers. First, the consensus alternating direction method of multipliers (C-ADMM) is applied. Then, iterative algorithms for the BPDN formulation are proposed by combining the Nesterov smoothing technique with the accelerated proximal gradient method, and a convergence analysis of the method is provided as well. We also develop a variant of the EGT (Excessive Gap Technique)-based primal-dual method to systematically reduce the smoothing parameter. Finally, we propose algorithms for the quadratically constrained $\ell_2$-$\ell_1$ mixed norm minimization problem by using smoothed dual conic optimization (SDCO) and a continuation technique. The accuracy and convergence of all the proposed methods are demonstrated in numerical simulations.
Index Terms:
The Nesterov smoothing, Basis pursuit denoising (BPDN), Group Lasso, Alternating direction method of multipliers (ADMM), Conic optimization.
I Introduction
In compressed sensing [1, 2], an underdetermined linear system is considered
$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}, \tag{1}$$
where $\mathbf{y}$ is an observation measurement vector, $\mathbf{A}$ is a known dictionary matrix, $\mathbf{n}$ is a measurement error or additive noise vector, and $\mathbf{x}$ is a $K$-sparse signal vector of interest. There are only $K$ nonzero entries in $\mathbf{x}$, with $K$ much smaller than the dimension of $\mathbf{x}$. As long as the dictionary matrix satisfies the Restricted Isometry Property (RIP) [2, 3, 4], the sparse vector can be reconstructed from only a few measurements by many solvers, such as group Lasso (least absolute shrinkage and selection operator) [5], basis pursuit denoising (BPDN) [6], or the Dantzig selector [7]. The performance analysis and computable performance bounds of these sparse recovery solvers are given in [8, 9]. However, the dictionary matrix may not be known perfectly due to noise or modeling perturbations. In [10], the sensitivity to basis mismatch in the dictionary matrix is analyzed. For instance, the compressed sensing approach for DoA estimation may assume a known dictionary formed from the array responses at a grid of candidate directions [11]. In practice, however, the DoAs are most likely not located on the model grid, leading to the now well-known off-grid DoA estimation problem, for which a number of model approximations and solutions have been proposed, for example [12, 13, 14, 15, 16, 17, 18]. A commonly used observation model for off-grid DoAs follows the noisy structured perturbation model given by:
$$\mathbf{y} = (\mathbf{A} + \mathbf{B}\boldsymbol{\Delta})\mathbf{x} + \mathbf{n}, \tag{2}$$
where $\mathbf{A}$ is known, and $\mathbf{B}$ is known as part of the off-grid approximation. $\boldsymbol{\Delta} = \mathrm{diag}(\boldsymbol{\beta})$, and $\boldsymbol{\beta}$ denotes the unknown coefficient vector for the approximation. $\mathbf{x}$ is the sparse vector associated with the grid points nearest the true DoAs. Equation (2) can be solved by formulating a sparsity-promoting constrained nonconvex minimization problem to estimate $\mathbf{x}$ and $\boldsymbol{\beta}$ sequentially by the alternating method [12, 13], but with slow convergence. The alternating direction method of multipliers (ADMM) [19] is a very popular method, which can be applied to solve this problem.
Furthermore, many inverse problems in signal processing, data mining, or statistical machine learning can be cast as a composite optimization problem, which involves the minimization of a sum of differentiable functions and nonsmooth ones. The off-grid DoA estimation problem of (2) can be formulated in this composite form. Subgradient algorithms [20] have been developed to deal with nonsmooth optimization problems, but they have a very slow convergence rate. Instead of using subgradient methods, we attempt to design algorithms for solving nonsmooth optimization (NSO) problems efficiently by using a sequence of smoothed approximate problems in place of the original ones. The core of the techniques considered is to make the nondifferentiable functions smooth without introducing substantial approximation errors through the smoothing process. Several different smoothing techniques have been proposed to solve NSO problems [21, 22, 23]. A primal-dual symmetric method derived from the excessive gap condition for nonsmooth convex optimization is proposed in [24]. In [15], the nondifferentiable function is approximated by the Moreau envelope function [23] and used in the column-wise mismatch problem. In [25], the overlapping group-lasso penalty is smoothed by the Nesterov smoothing technique [21]. A unified framework of smoothing approximation with fast gradient schemes is proposed in [26]. In [27], an adaptive Nesterov-based smoothing method is developed to dynamically choose the smoothing parameter at each iteration of the update. In [28], a number of primal-dual iterative approaches for solving large-scale nonsmooth optimization problems, such as the M+LFBF (Monotone+Lipschitz Forward Backward Forward) algorithm, are reviewed. In [29, 30], subgradient methods are proposed, but their complexity cannot be better than $O(1/\sqrt{k})$, where $k$ is the number of iterations.
Alternatively, smoothing as presented in [21] can be applied to mitigate non-smoothness of the objective function. In [31], a proximal iterative smoothing algorithm was proposed to solve convex nonsmooth optimization problems.
In this work, an unconstrained off-grid DoA estimator is first discussed. Its objective consists of one differentiable function and two nonsmooth ones, which are a regularized group-sparsity penalty and an indicator function. First, the consensus ADMM (C-ADMM) [19] is applied to solve this unconstrained optimization problem by using a common global variable that forces all the local variables of the objective functions to be equal, but it can be very slow to converge to high accuracy. In order to reach a low reconstruction error of DoA estimation quickly, the Nesterov smoothing methodology [21, 25] is used to reformulate the group-sparsity penalty into a "max"-structure function, which is then smoothed by adding a strongly convex term. We propose two reformulations for the group-sparsity penalty since the $\ell_2$-$\ell_1$ mixed norm has a two-layer norm structure. Then, the accelerated proximal gradient method [32] is applied to the smoothed optimization problem. Note that our first proposed Nesterov smoothing method is equivalent to the one in [15], as can be deduced from the results of [31]. The second Nesterov smoothing method is obtained from the dual norm property of the $\ell_1$ norm. Note that the fixed smoothing parameter has to be chosen empirically in this method. However, [33] shows that the accuracy improves when the smoothing parameter decreases. Thus, to reduce the smoothing parameter sequentially, we develop a variant of the primal-dual method based on the excessive gap technique (EGT) [24], in which a surrogate of the cost function is introduced. Furthermore, inspired by [34, 33], a variant of the conic formulation for quadratically constrained $\ell_2$-$\ell_1$ mixed norm minimization with linear inequalities is proposed and solved by using smoothed dual conic optimization and a continuation technique. The accuracy and convergence of the proposed methods are demonstrated and compared with the interior point method (CVX) [35], MUSIC [36], M+LFBF [28], and the CRLB [37].
This paper is organized as follows. In Section II, some mathematical preliminaries and the off-grid DoA model with its C-ADMM solver are introduced. In Section III, the Nesterov smoothing technique is employed to reformulate the group-sparsity penalty in two ways. Then, the accelerated smoothing proximal gradient (ASPG) method is used to solve the reformulated optimization problems. The convergence behavior is analyzed as well. In Section IV, the EGT-based approach is utilized to provide a systematic way to reduce the smoothing parameter. Finally, in Section V, the smoothing technique is applied to the conic formulation of the off-grid DoA estimation problem. Section VI presents numerical results to verify the performance in terms of DoA resolution ability, estimation accuracy, and convergence behavior.
Notation: Throughout the paper, vectors and matrices are represented by boldface lowercase and uppercase letters, respectively. $\mathbb{E}[\cdot]$ denotes the expectation operator. For any given matrix $\mathbf{A}$, $\mathbf{A}^H$ denotes the Hermitian transpose matrix, and $\mathrm{vec}(\mathbf{A})$ is the vectorization operator of the matrix. $\mathrm{diag}(\mathbf{x})$ represents a diagonal square matrix with the elements of vector $\mathbf{x}$ on the diagonal. $\odot$ denotes the Hadamard product. $\otimes$ denotes the Kronecker product. For any two vectors $\mathbf{x}$ and $\mathbf{y}$, $[\mathbf{x}; \mathbf{y}]$ denotes the new vector formed by stacking $\mathbf{x}$ on top of $\mathbf{y}$, and $\langle \mathbf{x}, \mathbf{y} \rangle$ means the inner product. $\mathcal{P}_{\mathcal{C}}(\cdot)$ denotes the operator that projects a vector onto a set $\mathcal{C}$.
II Preliminaries, DoA Model with Structured Perturbations, and C-ADMM solver
II-A Preliminaries
Consider the following unconstrained separable convex optimization problem [38]:
[TABLE]
where is a sequence of convex functions from to .
In this paper, specifically, an unconstrained convex optimization problem is considered:
$$\min_{\mathbf{x}} \; f(\mathbf{x}) + g(\mathbf{x}) + h(\mathbf{x}), \tag{4}$$
whose terms satisfy the following assumptions and definitions:
Assumption 1.
- (i) $f$ is a proper, closed, convex and continuously differentiable function. Its gradient is Lipschitz continuous with parameter $L_f$.
- (ii) $g$ is a proper, closed, and convex $L_g$-Lipschitz continuous function. It is not necessarily differentiable.
- (iii) $h$ is a proper, lower semicontinuous, and convex function but possibly nonsmooth. For instance, the indicator function of a closed set is lower semicontinuous.
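Definition 1 (Lipschitz Continuous).
A function $g$ is $L_g$-Lipschitz continuous if there exists $L_g > 0$ such that $|g(\mathbf{x}) - g(\mathbf{y})| \le L_g \|\mathbf{x} - \mathbf{y}\|_2$, $\forall \mathbf{x}, \mathbf{y}$.
Definition 2 (Lipschitz Continuous Gradient).
The gradient of a differentiable convex function $f$ is Lipschitz continuous with parameter $L_f > 0$ if $\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\|_2 \le L_f \|\mathbf{x} - \mathbf{y}\|_2$, $\forall \mathbf{x}, \mathbf{y}$.
Definition 3 (Strongly Convex).
The function $d$ is $\sigma$-strongly convex on a closed convex set $\mathcal{Q}$ with parameter $\sigma > 0$ if $d(\mathbf{y}) \ge d(\mathbf{x}) + \langle \nabla d(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle + \frac{\sigma}{2}\|\mathbf{y} - \mathbf{x}\|_2^2$, $\forall \mathbf{x}, \mathbf{y} \in \mathcal{Q}$.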
In the next subsection, we will show that the DoA estimation problem with structured perturbations can be reformulated into the form of (4).
II-B DoA Model with Structured Perturbations
Consider an array of sensors and suppose that there are far-field narrowband sources impinging on the array from angles . The measurement model, and its covariance are described by
[TABLE]
where
- •
is the observation vector.
- •
is the -th received signal with power .
- •
denotes the steering vector for direction with -th entry , where is wavelength. .
In compressed sensing, is defined as uniformly discretized grid atoms for the dictionary matrix. The off-grid DoA is denoted by if is closest to ; otherwise, . We assume that and .
By using Taylor series, the first-order approximate measurement model [39] is
[TABLE]
where , , , and is a sparse vector. . By vectorizing the covariance of (7), we have
[TABLE]
where
- •
.
- •
.
- •
.
- •
is a sparse vector with nonzero terms ’s.
where is an all-zero vector except with 1 at -th entry. , and . Let be a fat matrix for the following sections. Note that if is less than or equal to , then since the value of is much smaller than at mild SNRs.
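The first-order (Taylor) off-grid approximation used above can be checked numerically. The following is a minimal sketch, assuming a half-wavelength uniform linear array; the function names, the phase sign convention, the zero-based sensor index, and the numerical values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def steering(theta, num_sensors, spacing=0.5):
    """Steering vector of a uniform linear array.

    theta is in radians and spacing is the element spacing in wavelengths.
    The phase sign and the zero-based sensor index are assumptions.
    """
    m = np.arange(num_sensors)
    return np.exp(1j * 2.0 * np.pi * spacing * m * np.sin(theta))

def steering_derivative(theta, num_sensors, spacing=0.5):
    """Derivative of the steering vector with respect to theta."""
    m = np.arange(num_sensors)
    phase_rate = 1j * 2.0 * np.pi * spacing * m * np.cos(theta)
    return phase_rate * steering(theta, num_sensors, spacing)

def first_order_steering(grid_theta, offset, num_sensors, spacing=0.5):
    """First-order Taylor approximation of steering(grid_theta + offset)."""
    return (steering(grid_theta, num_sensors, spacing)
            + offset * steering_derivative(grid_theta, num_sensors, spacing))

# Compare the on-grid (zeroth-order) and first-order approximations.
grid_theta = np.deg2rad(10.0)   # nearest grid point (illustrative)
offset = np.deg2rad(0.2)        # off-grid deviation (illustrative)
num_sensors = 8

exact = steering(grid_theta + offset, num_sensors)
err_on_grid = np.linalg.norm(exact - steering(grid_theta, num_sensors))
err_first_order = np.linalg.norm(
    exact - first_order_steering(grid_theta, offset, num_sensors))
print(err_on_grid, err_first_order)
```

For a small offset the first-order error scales with the square of the offset, which is why the linearized model is a reasonable surrogate inside one grid cell.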
Since have the same sparsity pattern (non-zero entries), we can solve (8) over a closed convex set by the group Lasso :
[TABLE]
where is a regularization parameter, and is defined previously. Because the constraint set consists of linear inequality constraints, we can transform the problem into an unconstrained one by using an indicator function, which is also known as the basis pursuit denoising (BPDN) formulation:
[TABLE]
where if ; otherwise, . Let , , and such that (10) fits the framework of (4). Our goal is to find an optimal solution of problem (10) efficiently. However, the two nonsmooth functions, and , in the objective make this problem difficult to solve. Thus, the C-ADMM is applied to overcome this difficulty.
II-C Consensus Alternating Direction Method of Multipliers (C-ADMM)
Let us consider the unconstrained problem (10). This problem can be solved by C-ADMM, which uses a consensus global variable and local variables :
[TABLE]
We call this a "consensus problem" since the constraint forces all the local variables to be equal.
C-ADMM of this problem can be derived from the augmented Lagrangian
[TABLE]
where , , , and is a penalty parameter. The resulting consensus ADMM is summarized in Algorithm 1.
The convergence of C-ADMM relies on the following two assumptions:
Assumption 2.
The extended-real-valued functions are closed, proper, and convex.
Assumption 3.
The unaugmented Lagrangian has a saddle point. Namely, there exists a not necessarily unique solution such that
[TABLE]
In [19], under Assumptions 2 and 3, the C-ADMM iterations are shown to satisfy residual convergence, objective convergence, and dual variable convergence.
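The consensus structure described in this subsection can be sketched in a few lines. This is not the paper's Algorithm 1. It is a minimal illustration in scaled-dual form, assuming real-valued data, a least-squares fitting term, groups formed by the entry pairs `(i, i+n)`, and a simple box standing in for the linear inequality constraints. The names `consensus_admm` and `group_soft_threshold` are hypothetical.

```python
import numpy as np

def group_soft_threshold(v, tau, n):
    """Prox of tau * sum_i ||(v[i], v[i+n])||_2 (assumed pairing of entries)."""
    pairs = v.reshape(2, n)                      # column i holds (v[i], v[i+n])
    norms = np.linalg.norm(pairs, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return (pairs * scale).reshape(-1)

def consensus_admm(Phi, y, lam, lower, upper, rho=1.0, iters=500):
    """Consensus ADMM for 0.5*||y - Phi x||^2 + lam*group_norm(x) + box indicator."""
    dim = Phi.shape[1]
    n = dim // 2
    z = np.zeros(dim)            # global consensus variable
    x = np.zeros((3, dim))       # one local copy per objective term
    u = np.zeros((3, dim))       # scaled dual variables
    gram = Phi.T @ Phi + rho * np.eye(dim)
    Phi_t_y = Phi.T @ y
    for _ in range(iters):
        # Local updates: each term only sees its own copy of the variable.
        x[0] = np.linalg.solve(gram, Phi_t_y + rho * (z - u[0]))
        x[1] = group_soft_threshold(z - u[1], lam / rho, n)
        x[2] = np.clip(z - u[2], lower, upper)
        # Global averaging step, then dual update on the consensus constraint.
        z = np.mean(x + u, axis=0)
        u += x - z
    return z, x

# Toy check with Phi = I, where the solution is a group soft-threshold of y.
y = np.array([3.0, 0.0, 4.0, 0.0])
z, x_local = consensus_admm(np.eye(4), y, lam=1.0, lower=-10.0, upper=10.0)
print(z)
```

Each local update is cheap, but agreement between the three copies is only reached in the limit, which is consistent with the slow high-accuracy convergence noted in the introduction.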
III The Smoothing Techniques
In the following sections, we will show how to deal with problem (10) by combining the accelerated proximal gradient (APG) algorithm with the Nesterov smoothing technique. We aim to smooth the group-sparsity penalty so that the APG method can be used. A variant of EGT-based primal-dual method and smoothed dual conic optimization method will be described in the following sections. In order to present the idea more clearly, we introduce the notation , where denotes the subvector of having the same sparse pattern in group , where is the cardinality of a set. Each group represents a subset of index set and is disjoint from the others. Denote as the set of groups, and . In our case, , , where and . Denote , , and as the -th entry of , and , respectively.
III-A Two Reformulations of Group-sparsity Penalty
Since is an $\ell_2$-$\ell_1$ mixed norm with two layers, i.e., the inner one is the $\ell_2$ norm and the outer one is the $\ell_1$ norm, we can utilize the dual norm property to reformulate it as a maximization of a linear function over an auxiliary variable with "simple" constraints in two different ways.
First, inspired by [25], by using the convex conjugate function and the fact that the dual norm of the $\ell_2$ norm is the $\ell_2$ norm, has the max-structure as where denotes an auxiliary vector. Then, can be written as
[TABLE]
where
[TABLE]
is the set of vectors in the Cartesian product of $\ell_2$-norm unit balls. In the Nesterov smoothing technique, if a nonsmooth convex function has the max-structure, then we have its corresponding smoothed function
[TABLE]
with a smoothing parameter , where a prox-function [21] is continuous and strongly convex on with a strong convexity parameter . The prox-center of is denoted by . By the definition of strong convexity, . Since is strongly convex, is a smooth and convex function, so that the solution of the inner maximization is unique and its gradient can be computed easily.
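As a worked instance of this smoothing step, consider a single real-valued group $\mathbf{x}$ and assume the prox-function $d(\boldsymbol{\alpha}) = \tfrac{1}{2}\|\boldsymbol{\alpha}\|_2^2$ on the unit $\ell_2$ ball. The symbols $g_\mu$, $\boldsymbol{\alpha}$, and $\mu$ below are illustrative and may differ from the paper's notation. The inner maximization then has a closed form and gives a Huber-type function:

```latex
g_\mu(\mathbf{x})
  = \max_{\|\boldsymbol{\alpha}\|_2 \le 1}
    \left\{ \boldsymbol{\alpha}^{T}\mathbf{x} - \frac{\mu}{2}\|\boldsymbol{\alpha}\|_2^2 \right\}
  = \begin{cases}
      \dfrac{\|\mathbf{x}\|_2^2}{2\mu}, & \|\mathbf{x}\|_2 \le \mu, \\[2mm]
      \|\mathbf{x}\|_2 - \dfrac{\mu}{2}, & \|\mathbf{x}\|_2 > \mu,
    \end{cases}
\qquad
\boldsymbol{\alpha}^{\star} = \frac{\mathbf{x}}{\max(\mu, \|\mathbf{x}\|_2)} .
```

Since $0 \le d(\boldsymbol{\alpha}) \le \tfrac{1}{2}$ on the ball, the smoothed function satisfies $g_\mu(\mathbf{x}) \le \|\mathbf{x}\|_2 \le g_\mu(\mathbf{x}) + \mu/2$, so the approximation error per group is at most $\mu/2$ and vanishes as $\mu \to 0$.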
Second, inspired by the fact that the dual norm of the $\ell_1$ norm is the $\ell_\infty$ norm, has the max-structure as , where denotes an auxiliary vector. Therefore, we propose a second reformulation. Let us define and , and then can be rewritten as
[TABLE]
We define a new function as
[TABLE]
where
[TABLE]
is the set of vectors in the $\ell_\infty$-norm unit ball. Since it has the max-structure, we have the corresponding smoothed function of as
[TABLE]
with a smoothing parameter . Then, is also a smooth and convex function if a strongly convex function is chosen. Note that the dimension of is twice that of .
Since both and are smooth and convex, their gradients can be obtained from the following modified theorem [21]:
Theorem 1.
For any , the functions and are well-defined and continuously differentiable in and , respectively. Moreover, both functions are convex and their gradients:
[TABLE]
are Lipschitz continuous with the same constant , where and are the optimal solutions to (16) and (20), respectively.
Suppose that ; we choose with a strong convexity parameter . Then , , which is a subvector of , can be calculated as where denotes the projection operator of projecting a vector to a unit ball
[TABLE]
Similarly, if we choose , then can be computed as where denotes the operator projecting a vector onto an $\ell_\infty$ unit ball
[TABLE]
where is the -th entry of .
Note that the dimension of is a half of that for . Therefore, for the case of , zero-padding is performed such that , where is a zero vector, so that a new gradient can be used in the accelerated proximal gradient. This is acceptable only when parameter is taken small enough. Since holds in this case, the value of mainly comes from the contribution of , so that zero vector can be assigned as the partial derivative of .
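The two projections and the resulting smoothed gradients can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact construction: data are real-valued, groups are the entry pairs `(i, i+n)`, the prox-function is `0.5*||alpha||^2`, and the function names are hypothetical. The second function smooths a plain $\ell_1$ norm to show the $\ell_\infty$-ball projection. How the paper builds the stacked vector it is applied to is not reproduced here.

```python
import numpy as np

def project_l2_ball(v):
    """Project each column of v onto the unit l2 ball."""
    norms = np.linalg.norm(v, axis=0)
    return v / np.maximum(norms, 1.0)

def project_linf_ball(v):
    """Project v onto the unit l-infinity ball (entrywise clipping)."""
    return np.clip(v, -1.0, 1.0)

def smoothed_group_norm(x, mu, n):
    """Value and gradient of the smoothed sum of group l2 norms."""
    pairs = x.reshape(2, n)                      # column i holds (x[i], x[i+n])
    alpha = project_l2_ball(pairs / mu)          # maximizer, one column per group
    value = np.sum(alpha * pairs) - 0.5 * mu * np.sum(alpha ** 2)
    return value, alpha.reshape(-1)              # the gradient equals the maximizer

def smoothed_l1_norm(v, mu):
    """Value and gradient of the smoothed l1 norm via the l-infinity ball."""
    alpha = project_linf_ball(v / mu)
    value = np.sum(alpha * v) - 0.5 * mu * np.sum(alpha ** 2)
    return value, alpha

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
value, grad = smoothed_group_norm(x, mu=0.1, n=3)
print(value, grad)
```

In both cases the gradient is simply the maximizing auxiliary vector, in line with Theorem 1, and its Lipschitz constant grows like the inverse of the smoothing parameter.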
III-B Accelerated Smoothing Proximal Gradient (ASPG)
Now, we solve two "smoothed" versions of problem (10)
[TABLE]
where , or , and then its gradient is computed as .
Problem (24) can be solved by the accelerated proximal gradient method [32] in which a proximal operator is used:
[TABLE]
In fact, the proximal operator of the indicator function is the projection operator onto the set , . The ASPG method is summarized in Algorithm 2.
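A compact sketch of this scheme is given below. It is not the paper's Algorithm 2. It assumes real-valued data, a least-squares fitting term, the first (unit $\ell_2$ ball) smoothing with groups `(i, i+n)`, a box as the constraint set, and the standard FISTA extrapolation sequence. The name `aspg` and all parameter values are hypothetical.

```python
import numpy as np

def smoothed_penalty_grad(x, mu, n):
    """Gradient of the smoothed group norm: x/mu projected onto unit l2 balls."""
    pairs = x.reshape(2, n) / mu
    norms = np.linalg.norm(pairs, axis=0)
    return (pairs / np.maximum(norms, 1.0)).reshape(-1)

def aspg(Phi, y, lam, mu, lower, upper, iters=3000):
    """Accelerated proximal gradient on the smoothed objective.

    Minimizes 0.5*||y - Phi x||^2 + lam*g_mu(x) subject to lower <= x <= upper,
    where g_mu is the smoothed group norm. The box stands in for the
    constraint set, which is an assumption of this sketch.
    """
    dim = Phi.shape[1]
    n = dim // 2
    # Lipschitz constant of the gradient of the smooth part.
    lipschitz = np.linalg.norm(Phi, 2) ** 2 + lam / mu
    x = np.zeros(dim)
    w = x.copy()                 # extrapolated point
    t = 1.0
    for _ in range(iters):
        grad = Phi.T @ (Phi @ w - y) + lam * smoothed_penalty_grad(w, mu, n)
        # The prox of the indicator function is the projection onto the box.
        x_next = np.clip(w - grad / lipschitz, lower, upper)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        w = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

# Toy check with Phi = I; the nonzero group has norm well above mu,
# so the smoothed and unsmoothed solutions coincide here.
y = np.array([3.0, 0.0, 4.0, 0.0])
x_hat = aspg(np.eye(4), y, lam=1.0, mu=0.05, lower=-10.0, upper=10.0)
print(x_hat)
```

The step size is tied to `lam / mu`, so a smaller smoothing parameter gives a more faithful penalty but shorter steps, which is the trade-off analyzed in the convergence result that follows.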
III-C Convergence Analysis
We show the convergence rate of Algorithm 2 in Lemma 2.
Lemma 2.
Suppose is the -th iterative solution in Algorithm 2, and is the optimal solution of problem (10). Assume that -approximation is required, i.e., . If we set , where , then
[TABLE]
where is the Lipschitz constant of the gradient of . The number of iterations is upper bounded by
[TABLE]
This lemma implies that the convergence rate is $O(1/\epsilon)$. We cannot achieve the $O(1/\sqrt{\epsilon})$ convergence rate of the accelerated proximal gradient method due to the smoothing process, but the convergence rate is better than that of subgradient methods with $O(1/\epsilon^2)$ [20, 29].
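The way smoothing turns the accelerated rate into an $O(1/\epsilon)$ iteration count can be traced with a short calculation. The constants below are placeholders for this sketch and are not the paper's exact expressions: $D$ is the maximum of the prox-function, $L_f$ the Lipschitz constant of the gradient of the fitting term, $R$ the distance from the starting point to a solution, and $c$ a constant depending on the regularization parameter and the grouping.

```latex
% Smoothing error: choose \mu so that smoothing costs at most \epsilon/2.
0 \le g(\mathbf{x}) - g_\mu(\mathbf{x}) \le \mu D
\quad\Longrightarrow\quad
\mu = \frac{\epsilon}{2D}.

% Accelerated proximal gradient on the smoothed problem, with L_\mu = L_f + c/\mu:
F_\mu(\mathbf{x}^{k}) - F_\mu(\mathbf{x}_\mu^{\star})
  \le \frac{2 L_\mu R^{2}}{k^{2}} \le \frac{\epsilon}{2}
\quad\Longrightarrow\quad
k \ge 2R\sqrt{\frac{L_f + 2cD/\epsilon}{\epsilon}}
  = O\!\left(\frac{1}{\epsilon}\right).
```

The $1/\mu$ growth of the Lipschitz constant is what prevents the $O(1/\sqrt{\epsilon})$ count of the unsmoothed accelerated method from carrying over.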
IV The EGT-based Primal-Dual Method
In ASPG, the smoothing parameter is chosen empirically and fixed, which decreases the practical efficiency of ASPG. Thus, the excessive gap technique [24] is employed to choose systematically in the framework of primal-dual gradient symmetric formulations.
Let us consider the constrained optimization problem (9) as follows:
[TABLE]
where , and . (Note that there are two reformulations of proposed in subsection III.A. The first one will be used for convenience to express the idea in this subsection.)
We know that is not strongly convex. Since is a fat matrix, the error fitting function is not strongly convex either. Thus, we use as a surrogate of such that it can be expressed in a max-structure form and smoothed by using a strongly convex function, although is not differentiable everywhere. Thus, instead of solving (9), we propose
[TABLE]
Then, we will smooth not only the regularization term , but also the new error fitting function . This will lead to a closed form solution. Next, we will show how to achieve this goal by the excessive gap technique.
We can rewrite (29) into the following primal problem by using the dual norm definition:
[TABLE]
and its dual problem as
[TABLE]
where is a dual variable vector composed of and , which belong to , and , respectively, where
[TABLE]
Since both and are nondifferentiable, we can construct a smoothing approximation of primal-dual problem as follows
[TABLE]
[TABLE]
by using two strongly convex functions , and with two smoothing parameters , and .
For the primal problem, denote as the unique optimal solution of , which can be derived in closed forms as
[TABLE]
By Danskin’s theorem [40], the gradient of is computed as
[TABLE]
with Lipschitz-continuous constant .
Similarly, for the dual problem, denote as the unique optimal solution of , which can be derived in a closed form as
[TABLE]
And the gradient of is
[TABLE]
with Lipschitz constant by Danskin’s theorem.
Since we know that
- •
- •
By definition, ,
- •
Excessive gap condition (EGC) [24] holds when, for certain and with sufficiently large , this inequality occurs
[TABLE]
Then, the following modified lemma can be derived:
Lemma 3.
Let and satisfy EGC. Then,
[TABLE]
where , , .
By this modified lemma, EGC provides an upper bound on the gap of the primal-dual pair, so that we can iteratively update the primal-dual pair and keep satisfying EGC as approach zero. We also apply the primal gradient mapping [24]:
[TABLE]
and the dual gradient mapping:
[TABLE]
to choose some starting point when satisfying the EGC. In our case, they can be simplified in closed forms:
[TABLE]
By choosing feasible initial points for the primal and dual variables, the modified lemma for the primal part of the iterative algorithm is proposed as follows:
Lemma 4.
For a starting point , define
[TABLE]
for an arbitrary , and any . Fix and choose ,
[TABLE]
Then satisfies EGC (39) with smoothness parameter provided that is chosen by .
Thus, if EGC is satisfied for a certain primal-dual pair, then the pair can be updated iteratively while still satisfying the EGC as and go to zero. In other words, we can decrease with fixed for the primal problem, and decrease with fixed for the dual problem. The updates for the primal-dual pair are summarized in Algorithm 3. The convergence rate is of order given in [24].
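For reference, the unmodified excessive gap result of [24] can be written as follows. Here $f_{\mu_2}$ and $\phi_{\mu_1}$ stand for the smoothed primal and dual objectives, and $D_1$, $D_2$ for the maxima of the two prox-functions. This notation follows [24] and is an assumption with respect to this paper, whose modified Lemma 3 may differ in symbols and constants.

```latex
f_{\mu_2}(\bar{\mathbf{x}}) \le \phi_{\mu_1}(\bar{\mathbf{u}})
\quad\Longrightarrow\quad
0 \le f(\bar{\mathbf{x}}) - \phi(\bar{\mathbf{u}}) \le \mu_1 D_1 + \mu_2 D_2 .
```

The duality gap is therefore controlled directly by the two smoothing parameters, so any update rule that preserves the condition while shrinking both parameters drives the gap to zero.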
V Extension: Smoothed Dual Conic Formulation
In the previous approaches for solving the constrained BPDN problem (9), there is no natural way to select a proper regularization parameter . However, an error estimate for the error fitting term might be known based on SNRs. Thus, it is preferable to keep only the nonsmooth penalty function as the objective and to move the error fitting term into a constraint. This leads to reformulating (9) as a conic convex optimization problem.
V-A Primal-Dual Conic Formulations and the Smoothing
Instead of solving the BPDN problem (9) with linear inequality constraints
[TABLE]
where , , , and inspired by [34], we consider a quadratically constrained problem with linear inequality constraints
[TABLE]
since it is more natural to select an appropriate rather than an appropriate regularization parameter .
Note that is a set of elements satisfying linear inequalities, so it can be replaced by a matrix form representation . Then, let us consider the conic form of the primal problem
[TABLE]
and derive its dual by Lagrange multipliers
[TABLE]
where .
Note that both objectives are nonsmooth in the primal and dual formulation. So, we smooth by adding the strongly convex prox-function with a smoothing parameter and a strong convexity parameter . is denoted as the prox-center of
[TABLE]
In this way, the smoothed dual problem is given by
[TABLE]
where
[TABLE]
is a smooth function over . The optimal solution of is unique because of the strong convexity of . Define as the optimal solution of which is computed as
[TABLE]
where a group-soft-thresholding operator of is defined as
[TABLE]
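For readers without access to the displayed definition, the group-soft-thresholding operator is commonly written in the following form for a group subvector $\mathbf{v}_g$ and threshold $\tau > 0$. The symbols $\mathcal{T}_\tau$, $\mathbf{v}_g$, and $\tau$ are illustrative, and the paper's version may include additional scaling.

```latex
\mathcal{T}_{\tau}(\mathbf{v}_g)
  = \max\!\left(1 - \frac{\tau}{\|\mathbf{v}_g\|_2},\, 0\right)\mathbf{v}_g ,
\qquad
\mathcal{T}_{\tau}(\mathbf{0}) = \mathbf{0} .
```

For example, with $\mathbf{v}_g = (3, 4)$ and $\tau = 1$, the norm is 5, the scale factor is 0.8, and the result is $(2.4, 3.2)$. Any group whose norm is at most $\tau$ is set to zero, which is how the operator promotes group sparsity.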
We rewrite the smoothed dual problem as
[TABLE]
where
[TABLE]
V-B Smoothed Dual Conic Optimization (SDCO) Solver
The problem (49) we try to solve is in a composite form with a smooth part and a nonsmooth part . The smooth part is differentiable and its gradient is computed as in accordance with Danskin’s theorem.
Then, the generalized gradient projection method [41, 42] is applied to solve (49) by updating
[TABLE]
where is the inverse of step size . References [30, 43] show that -optimality can be achieved in iterations if is selected properly. Actually, a closed form solution for can be derived as
[TABLE]
[TABLE]
where an -shrinkage operation is defined as
[TABLE]
The right-hand side is first-order approximation of (V-B), and satisfies an upper bound property
[TABLE]
which holds for sufficiently large . Typically, if , then the upper bound (56) holds, where is Lipschitz constant. Under those assumptions, -optimality can be achieved in iterations by performing (V-B). A variation of the generalized gradient projection method proposed by Nesterov, which is an optimal first-order method with iterations, is used instead of (V-B). The approach is summarized in Algorithm 4.
Note that the smaller the smoothing parameter , the better the accuracy. On the other hand, the continuation scheme, which was proposed in NESTA [33], improves the convergence rate. Accordingly, a sequence of subproblems is solved by Algorithm 4 with decreasing smoothing parameters . The result of each subproblem feeds into the next round. The standard continuation scheme combined with Algorithm 4 is listed below:
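The authors' listing is not reproduced in this copy. The generic idea of continuation, solving a sequence of smoothed problems with a shrinking parameter and warm starts, can be sketched as follows. The inner solver is a toy stand-in (gradient descent on a smoothed $\ell_1$-regularized quadratic) and is not Algorithm 4. The names `continuation` and `toy_inner_solver`, the shrink factor, and the data are hypothetical.

```python
import numpy as np

def continuation(inner_solver, x0, mu0=1.0, shrink=0.5, outer_rounds=5):
    """Solve a sequence of smoothed problems with decreasing mu and warm starts."""
    x = np.asarray(x0, dtype=float)
    mu = mu0
    history = []
    for _ in range(outer_rounds):
        x = inner_solver(x, mu)          # warm start from the previous round
        history.append((mu, x.copy()))
        mu *= shrink                     # tighten the smoothing for the next round
    return x, history

def toy_inner_solver(x_start, mu, b=np.array([3.0, 0.5]), lam=1.0, iters=200):
    """Gradient descent on 0.5*||x - b||^2 + lam * smoothed_l1(x; mu)."""
    step = 1.0 / (1.0 + lam / mu)        # inverse of the gradient Lipschitz constant
    x = x_start.copy()
    for _ in range(iters):
        grad = (x - b) + lam * np.clip(x / mu, -1.0, 1.0)
        x = x - step * grad
    return x

x_final, history = continuation(toy_inner_solver, np.zeros(2))
for mu, x_round in history:
    print(mu, x_round)
```

In this toy problem the exact (unsmoothed) solution is `[2.0, 0.0]`. The second entry moves toward zero as the smoothing parameter shrinks, which mirrors the accuracy gain attributed to continuation above.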
VI Numerical Results
In this section, off-grid DoA estimation is conducted to demonstrate the performance of the proposed methods. The two proposed accelerated smoothing proximal gradient methods are designated as ASPG-L2 (using ) and ASPG-L1 (using ), the consensus ADMM method is designated as C-ADMM, the variant of the excessive gap technique method is called EGT-based, and the variant of the smoothed dual conic optimization method with continuation is called SDCO-Ct. We also solve problem (9) by using the CVX package. The CVX method, implemented by the interior point method, can be viewed as a benchmark, which is used to evaluate the estimation performance degradation caused by smoothing in the proposed methods. The estimation errors of these methods are compared with those of the MUSIC estimator and M+LFBF, and with the CRLB. Consider uncorrelated source signals from DoAs degree impinging on a uniform linear array of sensors with half-wavelength interelement spacing. The two sources are randomly generated from a normal distribution with zero mean and variance . The noise term is i.i.d. AWGN with zero mean and variance . We use one hundred snapshots to estimate the covariance matrix. The size of the search grid is set to 360 with degree, which is used for all methods. One hundred realizations are performed at each SNR. In the ASPG method, the decreasing factor is , and the smoothing parameter is chosen as . In the EGT-based method, the two smoothing parameters are controlled by , where is the iteration number. In the SDCO-Ct method, the initial value of the smoothing parameter is set to one and sequentially reduced by multiplying with at each step in the outer loop. All other parameter settings can be found in the algorithm blocks.
VI-A DoA Resolution of Two Reformulated Group-sparsity Penalties
The resolution ability of the two reformulated group-sparsity penalties (using , and ) is verified with the ASPG method. In Figure 1, the estimated power spectrum of the ASPG methods is presented at SNR dB. Due to the smoothing process, both have lost their sparsity. However, the two peaks of ASPG-L1 are more separated than those of ASPG-L2. In other words, the ASPG-L1 estimator has higher DoA resolution. In Figure 2, at SNR dB, the resolution ability of the ASPG-L1 estimator improves compared with the case of SNR dB, while that of the ASPG-L2 estimator does not. As can be seen in Figure 2, the two major peaks of ASPG-L1 are sharper and much more separated.
VI-B Accuracy of Off-Grid DoA Estimation
The accuracy of off-grid DoA estimation for the proposed methods is measured by the root-mean-square error (RMSE) of DoA estimation, which is defined as . Note that, since the DoA resolution of the second reformulation () is better than that of the first one, we run the EGT-based method with the second reformulated group-sparsity penalty in order to get better performance. As seen in Figure 3, the RMSEs of CVX, C-ADMM, ASPG-L1, ASPG-L2, EGT-based, and SDCO-Ct are almost the same and better than that of MUSIC at SNR dB. When SNRs are low, the performance degradation mainly comes from poor estimation of the nonzero term locations in the sparse vector , where . We notice that the RMSEs of SDCO-Ct and M+LFBF get worse at SNR dB, which also indicates that their resolution ability becomes weaker.
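An RMSE of this kind is typically computed over all realizations and sources. The sketch below is one common convention, not necessarily the paper's exact definition: it averages the squared angular error over every realization and source, and it pairs estimates with true DoAs by sorting, which is an assumption. The function name `doa_rmse` and the sample values are hypothetical.

```python
import numpy as np

def doa_rmse(estimates_deg, true_deg):
    """RMSE over realizations (rows) and sources (columns), in degrees.

    Estimates are matched to the true DoAs by sorting each row, which
    assumes the sources are resolved and ordered consistently.
    """
    estimates = np.sort(np.asarray(estimates_deg, dtype=float), axis=1)
    truth = np.sort(np.asarray(true_deg, dtype=float))
    return float(np.sqrt(np.mean((estimates - truth) ** 2)))

# Two realizations of a two-source scenario (illustrative numbers only).
estimates = [[10.1, 20.2],
             [19.8, 9.9]]
print(doa_rmse(estimates, [10.0, 20.0]))
```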
When SNRs are high, an RMSE that cannot approach the CRLB means that the estimation of the off-grid DoA vector is not satisfactory. At SNR , and 4 dB, the performance of ASPG-L1, CVX, C-ADMM, EGT-based, SDCO-Ct, and MUSIC is better than that of ASPG-L2 and M+LFBF. The reason for the poor performance of ASPG-L2 is that the sparsity-promoting property of the group-sparsity penalty is lost during the smoothing process, which uses only the property that the dual norm of the $\ell_2$ norm is also the $\ell_2$ norm, so sparsity is not promoted in this way. Thus, a satisfactory estimate of cannot be obtained.
VI-C DoA Resolution Performance
In this numerical experiment, the resolution test is performed to demonstrate the ability of the proposed methods to detect two closely located DoAs at SNR dB by checking the normalized spectra. In Figure 4, the DoA resolution of MUSIC is worse than that of all the others because it can hardly detect the second DoA. Due to the smoothing process, ASPG-L1, EGT-based, and SDCO-Ct lose the sparsity property of the group-sparsity penalty, so that their two major detected peaks are not as sharp as those of C-ADMM. However, instead of using fixed smoothing parameters as in the ASPG method, the EGT-based and SDCO-Ct methods use different approaches to sequentially reduce the smoothing parameters so that the resolution ability is improved. The sharpness of the two peaks of SDCO-Ct is closer to that of C-ADMM than that of all the others.
VI-D Convergence Performance Comparisons
The convergence performance of the proposed methods is verified in this numerical simulation in terms of reconstruction error or objective function value. The reconstruction error is defined as . First, we inspect the convergence of the EGT-based method, in which the smoothing parameters for the primal and dual problems are chosen with respect to the iteration number, similar to the diminishing step size rule [40]. As shown in Figure 5, the duality gap becomes very small after the iteration number reaches 50. Second, the convergence of the SDCO with and without continuation is compared. In Figure 6, the convergence rate of the SDCO with continuation is almost the same as the one without continuation. However, it achieves a lower objective function value, which leads to better accuracy, since the smoothing parameter is reduced gradually by the continuation technique.
Finally, we inspect the convergence of C-ADMM, M+LFBF, ASPG-L1, ASPG-L2, EGT-based, and SDCO-Ct. In Figure 7, at SNR dB, M+LFBF, ASPG-L1, ASPG-L2, EGT-based, and SDCO-Ct converge after about 100 iterations, while C-ADMM converges after about 300 iterations. Only SDCO-Ct, EGT-based, and C-ADMM reach the lowest reconstruction error among them, but SDCO-Ct appears unstable in this case. In Figure 8, at SNR dB, the convergence rate of C-ADMM improves but is still slower than that of all the others. The SDCO-Ct method is the fastest to converge to the lowest reconstruction error, and its instability is much less pronounced than in the previous case.
VII Conclusion
In this paper, several iterative methods based on the Nesterov smoothing technique were proposed for the estimation of off-grid DoAs. First, the C-ADMM method was applied. To improve on the convergence rate of C-ADMM, two reformulations of the group-sparsity penalty were introduced and smoothed by the Nesterov smoothing technique so that their gradients can be calculated easily. Then, the accelerated proximal gradient method was used to solve the unconstrained optimization problem consisting of the smoothed objective function plus the nonsmooth indicator function. In that approach, the smoothing parameter is selected empirically. Therefore, a variant of the EGT-based method was employed so that the smoothing parameter can be chosen systematically. Instead of heuristically choosing a regularization parameter in the BPDN problem formulation, a variant of the SDCO method was proposed, whose smoothing parameter can also be determined by using the continuation technique. The accuracy and convergence of the proposed methods were verified by a numerical example of DoA estimation.
Appendix A Proof of Lemma 2
Proof.
Denote the smoothed version of the objective function as
[TABLE]
with the Lipschitz constant of its gradient . Following a proof scheme similar to that in [44], we decompose
[TABLE]
Then, based on the theorem from [45], we have the following bound for an optimal solution :
[TABLE]
Also, by the definition of , we have
[TABLE]
This implies that
[TABLE]
Thus,
[TABLE]
Let , then
[TABLE]
If we let , then we have the upper bound in (27). ∎
References
- [1] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
- [2] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
- [3] M. F. Duarte and Y. C. Eldar, “Structured compressed sensing: From theory to applications,” IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4053–4085, 2011.
- [4] Y. C. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
- [5] M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.
- [6] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Review, vol. 43, no. 1, pp. 129–159, 2001.
- [7] E. Candès and T. Tao, “The Dantzig selector: Statistical estimation when p is much larger than n,” The Annals of Statistics, pp. 2313–2351, 2007.
- [8] G. Tang and A. Nehorai, “Performance analysis for sparse support recovery,” IEEE Transactions on Information Theory, vol. 56, no. 3, pp. 1383–1399, 2010.
