Sparse Bayesian learning with uncertainty models and multiple dictionaries

Santosh Nannuru, Kay L. Gemba, Peter Gerstoft, William S. Hodgkiss, Christoph Mecklenbräuker

arXiv:1704.00436·stat.AP·May 29, 2019


TL;DR

This paper enhances Sparse Bayesian Learning by incorporating models for dictionary mismatch and errors, processing multiple dictionaries, and estimating noise variances, leading to improved performance in direction-of-arrival estimation.

Contribution

It introduces a novel signal model that accounts for mismatch and errors, derives fixed point updates incorporating these factors, and extends SBL to multiple dictionaries with noise variance estimation.

Findings

1. SBL with mismatch models improves DoA estimation accuracy.
2. Processing multiple dictionaries enhances robustness against aliasing.
3. Experimental data shows qualitative performance gains of the proposed method.


Equations

Signal model with uncertainty:

$${\mathbf{y}}={\mathbf{A}}^{o}{\mathbf{x}}^{o}+{\mathbf{A}}^{e}{\mathbf{x}}^{o}+{\mathbf{A}}^{o}{\mathbf{x}}^{e}+{\mathbf{A}}^{e}{\mathbf{x}}^{e}+{\mathbf{n}}$$

Covariance of the modified noise ${\bm{\eta}}$:

$$\begin{aligned}{\bm{\Sigma}}_{{\bm{\eta}}}&=\sum_{m,n}\Big[{\mathsf{E}}(x^{o}_{m}x^{oH}_{n}){\mathsf{E}}({\mathbf{a}}^{e}_{m}{\mathbf{a}}^{eH}_{n})+{\mathsf{E}}(x^{e}_{m}x^{eH}_{n}){\mathbf{a}}^{o}_{m}{\mathbf{a}}^{oH}_{n}+{\mathsf{E}}(x^{e}_{m}x^{eH}_{n}){\mathsf{E}}({\mathbf{a}}^{e}_{m}{\mathbf{a}}^{eH}_{n})\Big]+\sigma^{2}{\mathbf{I}}_{N}\\&=\sum_{m}\Big[\gamma_{m}{\bm{\Sigma}}_{m}^{e}+\gamma^{e}_{m}{\mathbf{a}}^{o}_{m}{\mathbf{a}}^{oH}_{m}+\gamma^{e}_{m}{\bm{\Sigma}}_{m}^{e}\Big]+\sigma^{2}{\mathbf{I}}_{N}\end{aligned}$$

Evidence:

$$\begin{aligned}p({\mathbf{Y}})&=\int p({\mathbf{Y}}|{\mathbf{X}}^{o})\,p({\mathbf{X}}^{o})\,d{\mathbf{X}}^{o}\\&=\int\prod_{l=1}^{L}{\cal C}{\cal N}({\mathbf{y}}_{l};{\mathbf{A}}^{o}{\mathbf{x}}_{l}^{o},{\bm{\Sigma}}_{{\bm{\eta}}})\,{\cal C}{\cal N}({\mathbf{x}}_{l}^{o};{\mathbf{0}},{\bm{\Gamma}})\,d{\mathbf{X}}^{o}\\&=\prod_{l=1}^{L}{\cal C}{\cal N}({\mathbf{y}}_{l};{\mathbf{0}},{\bm{\Sigma}}_{{\bm{\eta}}}+{\mathbf{A}}^{o}{\bm{\Gamma}}{\mathbf{A}}^{oH})=\prod_{l=1}^{L}{\cal C}{\cal N}({\mathbf{y}}_{l};{\mathbf{0}},{\bm{\Sigma}}_{{\mathbf{y}}})\end{aligned}$$

$$\log p({\mathbf{Y}})\propto-L\log|{\bm{\Sigma}}_{{\mathbf{y}}}|-\text{Tr}({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}})$$

Estimate of ${\bm{\gamma}}$:

$$\hat{{\bm{\gamma}}}=\underset{{\bm{\gamma}}}{\arg\,\min}\,\Big\{L\log|{\bm{\Sigma}}_{{\mathbf{y}}}|+\text{Tr}({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}})\Big\}$$

Derivative relations:

$$\frac{\partial{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}}{\partial\gamma_{m}}=-{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}\frac{\partial{\bm{\Sigma}}_{{\mathbf{y}}}}{\partial\gamma_{m}}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}$$

$$\frac{\partial}{\partial\gamma_{m}}\Big\{L\log|{\bm{\Sigma}}_{{\mathbf{y}}}|+\text{Tr}({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}})\Big\}=L\,\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]\Big)-\text{Tr}\Big({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}}\Big)$$

Stationarity condition and fixed point form:

$$1=\frac{1}{L}\frac{\text{Tr}\Big({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}}\Big)}{\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]\Big)}$$

$$\frac{\gamma_{m}}{\gamma_{m}}=\Bigg(\frac{1}{L}\frac{\text{Tr}\Big({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}}\Big)}{\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]\Big)}\Bigg)^{b}$$

Update rule:

$$\gamma_{m}^{\text{new}}=\gamma_{m}^{\text{old}}\Bigg(\frac{\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{S}}_{{\mathbf{y}}}\Big)}{\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]\Big)}\Bigg)^{b}$$


Santosh Nannuru, Kay L. Gemba, Peter Gerstoft, and William S. Hodgkiss are with the Scripps Institution of Oceanography at University of California, San Diego. Christoph Mecklenbräuker is with the Institute of Telecommunications, Vienna University of Technology, 1040 Vienna, Austria.

Abstract

Sparse Bayesian learning (SBL) has emerged as a fast and competitive method to perform sparse processing. The SBL algorithm, which is developed using a Bayesian framework, approximately solves a non-convex optimization problem using fixed point updates. It provides comparable performance and is significantly faster than convex optimization techniques used in sparse processing. We propose a signal model which accounts for dictionary mismatch and the presence of errors in the weight vector at low signal-to-noise ratios. A fixed point update equation is derived which incorporates the statistics of mismatch and weight errors. We also process observations from multiple dictionaries. Noise variances are estimated using stochastic maximum likelihood. The derived update equations are studied quantitatively using beamforming simulations applied to direction-of-arrival (DoA). Performance of SBL using single- and multi-frequency observations, and in the presence of aliasing, is evaluated. SwellEx-96 experimental data demonstrates qualitatively the advantages of SBL.

Index Terms:

Sparse Bayesian learning, sparse processing, compressive sensing, beamforming, direction of arrival estimation, multiple dictionaries, multi frequency, aliasing, wide band

I Introduction and motivation

Compressed sensing, or sparse processing, is the process of estimating sparse vectors from significantly fewer measurements than unknowns. Mathematically, this corresponds to solving an underdetermined system of linear equations under the constraint that the solution is sparse. Finding the exact solution has combinatorial complexity, which is impractical for high-dimensional problems. The most popular approximate and computationally feasible sparse processing method is basis pursuit [1], implemented using the LASSO [2] algorithm. Basis pursuit relaxes the sparsity criterion, and the solution is obtained by solving a convex optimization problem. Though feasible, solving this optimization problem in high dimensions is still computationally slow. One of the faster alternatives is the matching pursuit algorithm [3], but matching pursuit is a greedy approach and can lead to suboptimal support detection. Another alternative, which is not greedy and is significantly faster than basis pursuit, is sparse Bayesian learning (SBL) [4, 5, 6, 7, 8, 9, 10].

In SBL, the sparse weight vector in the underdetermined system of linear equations is treated as a random vector with Gaussian prior. Explicit sparsity constraints are not imposed on the weight vectors. Unlike traditional prior models, the parameters of the Gaussian prior are assumed unknown and are estimated by performing evidence maximization. The objective function for performing evidence maximization is non-convex and an approximate solution is obtained by formulating a fixed point update equation. The solution at convergence gives a parameter estimate which is sparse and hence the weight vectors are also sparse.

A significant advantage of SBL over basis pursuit is that it can determine automatically the sparsity without any user input. Being a probabilistic approach, SBL computes the posterior distribution of the sparse weight vectors and hence provides estimates of their covariance along with the mean. Computationally, SBL can significantly outperform LASSO [10].

Most of the literature on sparse processing assumes that the sensing matrix or dictionary is deterministic and known. This assumption does not hold in many applications such as beamforming [11, 12] and matched-field processing [13, 14]. Also, at low signal-to-noise ratio (SNR), the identified solution can contain false or spurious entries not present in the true solution. These false entries often mask true entries and introduce errors in parameter estimation.

The three main contributions of this work are the following:

1) SBL for uncertainty models: We propose modifications to SBL to address sensing matrix mismatch and to reduce errors in the weight vector which occur in the presence of noise. The linear-Gaussian signal model is modified and transformed into a linear non-Gaussian model. Using approximations, the model remains linear-Gaussian and hence the regular SBL methodology can be applied. We focus on statistical modeling and integrating out of the error parameters rather than their estimation. This approach has the advantage that a large class of errors can be modeled and the resulting algorithm has a simple formulation. A portion of this work addressing uncertainty in sensing matrix was published in [15].

2) Multi-snapshot and multi-dictionary SBL: We derive an SBL algorithm for multiple snapshots using a fixed-point update [10]. This gives unbiased noise estimates and has better convergence properties especially for high SNR [10]. We then consider multi-dictionary observations with common sparsity profiles. When available, combining multi-dictionary observations using SBL provides a processing gain especially at low SNR as demonstrated with multi-frequency dictionaries [16].

3) Simulations and real data analysis: The proposed algorithms are demonstrated and verified using beamforming simulations for estimating the directions of arrival (DoAs) of multiple plane waves. Data from the SwellEx-96 experiment demonstrate the application to real data and the ability of the algorithms to reduce aliasing when processing multiple frequencies.

The remainder of the paper is organized as follows. A brief literature review is provided in Sect. I-A. The signal model along with assumptions on priors and likelihoods are discussed in Sect. II. The SBL algorithm is derived in Sect. III for uncertainty models and multiple dictionaries. The derived algorithms are studied using simulations and real data in Sect. IV. Conclusions are provided in Sect. V.

I-A Related literature

SBL was introduced for regression and classification problems in the context of machine learning [4]. It has been used since for signal processing [5, 7] with various modifications and extensions [6, 8, 9].

Since SBL does not explicitly impose any sparsity constraints but determines sparsity automatically, various explanations have been discussed. The SBL solution can be obtained by solving an iteratively reweighted LASSO problem and hence sparsity is expected [17, 18]. Under certain conditions on the sensing matrix, SBL can identify sparse solutions without any explicit sparsity constraints [19]. Cramér-Rao bounds for the SBL solution are discussed in [20]. Various sparse signal recovery solutions including LASSO and SBL are unified within the Bayesian framework in [21].

Beamforming can estimate the DoAs of multiple plane waves from sensor array observations. By formulating beamforming as an underdetermined linear problem, compressed sensing can estimate DoAs [11, 12, 22]. The problem of mismatch and robustness of traditional beamforming algorithms has been studied extensively [23, 24, 25, 26, 27].

Perturbations and mismatch also have been addressed in the compressed sensing literature for basis pursuit [28, 29, 13], matching pursuit [30], and approximate message passing [31]. For SBL, beamforming in the presence of array imperfections is addressed in [32, 33]. Robustness of SBL to outliers in the image processing application is studied in [34].

I-B Notation

Scalar quantities are denoted by lowercase letters. A bold lowercase letter denotes a vector and a bold uppercase letter denotes a matrix. A vector or matrix of all zeros is denoted by ${\mathbf{0}}$ where appropriate dimensions are assumed. An identity matrix of dimension $N\times N$ is denoted ${\mathbf{I}}_{N}$ . The notation ${\mathbf{M}}^{H}$ denotes the Hermitian (conjugate transpose). The transpose operation is denoted ${\mathbf{M}}^{T}$ . The field of complex numbers is denoted ${\mathbb{C}}$ .

II Signal model

In this section, we discuss the signal model used in SBL and the assumptions made in this paper. Let ${\mathbf{y}}\in{\mathbb{C}}^{N}$ be the complex signal which is expressed as

$${\mathbf{y}}={\mathbf{A}}{\mathbf{x}}+{\mathbf{n}},\tag{1}$$

where the noise ${\mathbf{n}}\in{\mathbb{C}}^{N}$ is zero mean circularly symmetric complex Gaussian with density ${\cal C}{\cal N}({\mathbf{n}};{\mathbf{0}},\sigma^{2}{\mathbf{I}}_{N})$ ; ${\mathbf{A}}\in{\mathbb{C}}^{N\times M}$ is the sensing matrix; ${\mathbf{x}}\in{\mathbb{C}}^{M}$ is the weight vector. In sparse problem formulations, ${\mathbf{x}}$ is assumed sparse with at most $K$ non-zero entries where $K\ll M$ . Sparsity level $K$ is not required explicitly or modeled by SBL. The vector ${\mathbf{x}}$ acts as a selection operator identifying columns of ${\mathbf{A}}$ that best explain the signal ${\mathbf{y}}$ . We assume ${\mathbf{A}}$ has the maximal column rank $N$ .

Error in sensing matrix: Often ${\mathbf{A}}$ is assumed known. This does not hold when there is uncertainty in the model or parameters used to construct ${\mathbf{A}}$ . For example, in plane wave beamforming entries of ${\mathbf{A}}$ depend on array positions and wave speed which may be uncertain or can change over time. To account for perturbations we express

$${\mathbf{A}}={\mathbf{A}}^{o}+{\mathbf{A}}^{e},\tag{2}$$

where ${\mathbf{A}}^{o}$ is known and ${\mathbf{A}}^{e}$ is a random perturbation matrix [35, 28, 29, 30, 13, 27]. For beamforming, sensing matrix perturbations have been studied in [24, 32]. An example where multiplicative noise gives rise to such perturbations in the sensing matrix is discussed in Appendix A. Though the component ${\mathbf{A}}^{e}$ is random and unknown, its statistics are known. The prior model for ${\mathbf{A}}^{e}$ is discussed in Sect. II-A.

Error in weights: We assume ${\mathbf{x}}$ consists of two components

$${\mathbf{x}}={\mathbf{x}}^{o}+{\mathbf{x}}^{e},\tag{3}$$

where the first component ${\mathbf{x}}^{o}$ is sparse and the second component ${\mathbf{x}}^{e}$ may be sparse. The vector ${\mathbf{x}}^{o}$ consists of the true complex weights whereas ${\mathbf{x}}^{e}$ is composed of errors in ${\mathbf{x}}$ due to noise or modeling mismatch. Likely ${\mathbf{x}}^{e}$ is sparse but we cannot uniquely distinguish the support of ${\mathbf{x}}^{e}$ from that of ${\mathbf{x}}^{o}$ . Also, the support of ${\mathbf{x}}^{e}$ might vary because the noise realization changes over time. To overcome this limitation we assume that the statistics of ${\mathbf{x}}^{e}$ are known without knowledge of its support. Here both ${\mathbf{x}}^{o}$ and ${\mathbf{x}}^{e}$ are random and their prior models are discussed in Sect. II-A.

Signal model with uncertainty: Including the perturbed quantities from (2) and (3), the signal model (1) is

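$${\mathbf{y}}={\mathbf{A}}^{o}{\mathbf{x}}^{o}+{\mathbf{A}}^{e}{\mathbf{x}}^{o}+{\mathbf{A}}^{o}{\mathbf{x}}^{e}+{\mathbf{A}}^{e}{\mathbf{x}}^{e}+{\mathbf{n}},$$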

where the first and the last terms are the regular linear model in SBL. The terms ${\mathbf{A}}^{e}{\mathbf{x}}^{o}$ , ${\mathbf{A}}^{o}{\mathbf{x}}^{e}$ and ${\mathbf{A}}^{e}{\mathbf{x}}^{e}$ are additional “noise” terms. We develop our theory for the general case and assume ${\mathbf{x}}^{o}$ , ${\mathbf{x}}^{e}$ , ${\mathbf{A}}^{e}$ , and ${\mathbf{n}}$ are mutually independent. Since the simulations (Sect. IV) consider either ${\mathbf{A}}^{e}={\mathbf{0}}$ or ${\mathbf{x}}^{e}={\mathbf{0}}$ , the independence assumption of ${\mathbf{x}}^{e}$ and ${\mathbf{A}}^{e}$ is not crucial.

II-A Prior models

Prior model for ${\mathbf{x}}^{o}$ : In SBL ${\mathbf{x}}^{o}$ is modeled as a zero mean circularly symmetric complex Gaussian with prior density $p({\mathbf{x}}^{o})={\cal C}{\cal N}({\mathbf{x}}^{o};{\mathbf{0}},{\bm{\Gamma}})$ , where the unknown covariance matrix ${\bm{\Gamma}}$ is assumed diagonal, ${\bm{\Gamma}}=\text{diag}({\bm{\gamma}})$ , ${\bm{\gamma}}=[\gamma_{1}\ldots\gamma_{M}]$ . The covariance ${\bm{\Gamma}}$ is estimated by SBL.

We assume the error terms ${\mathbf{x}}^{e}$ and ${\mathbf{A}}^{e}$ are stochastic and define statistics over them. These statistics make it easy to integrate over all possible error realizations while computing the evidence and allow us to study their effect on average. An alternate approach could be to estimate ${\mathbf{x}}^{e}$ and ${\mathbf{A}}^{e}$ from the data. This would significantly increase the dimensionality of the problem and is not pursued here.

Prior model for ${\mathbf{x}}^{e}$ : The term ${\mathbf{x}}^{e}$ was introduced to account for errors in ${\mathbf{x}}$ . We model ${\mathbf{x}}^{e}$ to have zero mean and known diagonal covariance ${\bm{\Gamma}}^{e}=\text{diag}({\bm{\gamma}}^{e})$ . It quantifies the prior knowledge of errors in ${\mathbf{x}}$ . We can choose ${\bm{\gamma}}^{e}$ empirically based on the specific application. The term ${\mathbf{x}}^{e}$ establishes a noise floor for ${\mathbf{x}}$ and helps in strengthening weaker weights (see Sect. IV). In this sense it is similar to the concept of stochastic resonance [36, 37] where adding noise into a non-linear system improves its detection performance.

Prior model for ${\mathbf{A}}^{e}$ : Let $p({\mathbf{A}}^{e})$ be the density function of the error matrix ${\mathbf{A}}^{e}=[{\mathbf{a}}^{e}_{1}\ldots{\mathbf{a}}^{e}_{M}]$ . For computational tractability we assume that the $m$ th column ${\mathbf{a}}_{m}^{e}$ has known covariance ${\bm{\Sigma}}_{m}^{e}$ . No assumption is made about the mean. Also, let the columns of ${\mathbf{A}}^{e}$ be statistically orthogonal. Hence

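$${\mathsf{E}}({\mathbf{a}}_{m}^{e}{\mathbf{a}}_{n}^{eH})=\begin{cases}{\bm{\Sigma}}_{m}^{e},&m=n,\\{\mathbf{0}},&m\neq n.\end{cases}$$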

In [13] the perturbation vectors ${\mathbf{a}}_{m}^{e}$ are assumed stochastic and an elastic net regression is formulated by averaging out the perturbations. The perturbations are assumed to be complex Gaussian random vectors in [27]. Parametric modeling of the perturbations ${\mathbf{a}}_{m}^{e}$ is considered in [32] for plane wave beamforming. The parameters are estimated within the iterative framework of SBL but only specific perturbations are considered and cannot be generalized to include a broader class of errors.

II-B Approximate likelihood

Combining all the “noise” terms together as ${\bm{\eta}}={\mathbf{A}}^{e}{\mathbf{x}}^{o}+{\mathbf{A}}^{o}{\mathbf{x}}^{e}+{\mathbf{A}}^{e}{\mathbf{x}}^{e}+{\mathbf{n}}$ gives

$${\mathbf{y}}={\mathbf{A}}^{o}{\mathbf{x}}^{o}+{\bm{\eta}}.\tag{7}$$

The modified noise ${\bm{\eta}}$ is not Gaussian since ${\bm{\eta}}$ is composed of terms ${\mathbf{A}}^{e}$ and ${\mathbf{x}}^{e}$ whose densities are not known in general (from the prior models in Sect. II-A). To move forward within the SBL framework, we approximate ${\bm{\eta}}$ to be Gaussian. Note that a Gaussian assumption on the variables ${\mathbf{A}}^{e}$ and ${\mathbf{x}}^{e}$ still will not simplify the distribution of ${\bm{\eta}}$ as the terms ${\mathbf{A}}^{e}{\mathbf{x}}^{o}$ and ${\mathbf{A}}^{e}{\mathbf{x}}^{e}$ involve products of Gaussian random variables which do not have closed form distributions.

To simplify the likelihood model, we compute the mean and covariance of ${\bm{\eta}}$ :

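$$\begin{aligned}{\mathsf{E}}({\bm{\eta}})&={\mathbf{0}},\\{\bm{\Sigma}}_{{\bm{\eta}}}&={\mathsf{E}}({\mathbf{A}}^{e}{\mathbf{x}}^{o}{\mathbf{x}}^{oH}{\mathbf{A}}^{eH})+{\mathsf{E}}({\mathbf{A}}^{o}{\mathbf{x}}^{e}{\mathbf{x}}^{eH}{\mathbf{A}}^{oH})+{\mathsf{E}}({\mathbf{A}}^{e}{\mathbf{x}}^{e}{\mathbf{x}}^{eH}{\mathbf{A}}^{eH})+{\mathsf{E}}({\mathbf{n}}{\mathbf{n}}^{H})\\&=\sum_{m,n}\Big[{\mathsf{E}}(x^{o}_{m}x^{oH}_{n}){\mathsf{E}}({\mathbf{a}}^{e}_{m}{\mathbf{a}}^{eH}_{n})+{\mathsf{E}}(x^{e}_{m}x^{eH}_{n}){\mathbf{a}}^{o}_{m}{\mathbf{a}}^{oH}_{n}+{\mathsf{E}}(x^{e}_{m}x^{eH}_{n}){\mathsf{E}}({\mathbf{a}}^{e}_{m}{\mathbf{a}}^{eH}_{n})\Big]+\sigma^{2}{\mathbf{I}}_{N}\\&=\sum_{m}\Big[\gamma_{m}{\bm{\Sigma}}_{m}^{e}+\gamma^{e}_{m}{\mathbf{a}}^{o}_{m}{\mathbf{a}}^{oH}_{m}+\gamma^{e}_{m}{\bm{\Sigma}}_{m}^{e}\Big]+\sigma^{2}{\mathbf{I}}_{N}.\end{aligned}\tag{11}$$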

We have used the independence of ${\mathbf{x}}^{o}$ , ${\mathbf{x}}^{e}$ , ${\mathbf{A}}^{e}$ , and ${\mathbf{n}}$ in the above simplification. While computing the covariance of ${\bm{\eta}}$ , the error terms ${\mathbf{x}}^{e}$ and ${\mathbf{A}}^{e}$ are integrated out and the covariance matrix ${\bm{\Sigma}}_{{\bm{\eta}}}$ depends on their statistics ${\bm{\gamma}}^{e},{\bm{\Sigma}}_{m}^{e}$ along with ${\bm{\gamma}}$ and $\sigma^{2}$ . This integration circumvents the need to estimate explicitly the unknowns ${\mathbf{x}}^{e}$ and ${\mathbf{A}}^{e}$ .
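The covariance of the modified noise can be assembled directly from the error statistics. The following is a minimal NumPy sketch, assuming the final sum form ${\bm{\Sigma}}_{{\bm{\eta}}}=\sum_{m}[\gamma_{m}{\bm{\Sigma}}_{m}^{e}+\gamma_{m}^{e}{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}+\gamma_{m}^{e}{\bm{\Sigma}}_{m}^{e}]+\sigma^{2}{\mathbf{I}}_{N}$; the function name, argument layout, and toy values are illustrative choices, not taken from the paper.

```python
import numpy as np

def modified_noise_covariance(A_o, gamma, gamma_e, Sigma_e, sigma2):
    """Sigma_eta for known dictionary A_o (N x M), source variances gamma (M,),
    weight-error variances gamma_e (M,), column error covariances
    Sigma_e (M x N x N), and noise variance sigma2 (hypothetical interface)."""
    N, M = A_o.shape
    # sum_m (gamma_m + gamma_m^e) Sigma_m^e
    mismatch = np.einsum("m,mij->ij", gamma + gamma_e, Sigma_e)
    # sum_m gamma_m^e a_m a_m^H = A_o diag(gamma_e) A_o^H
    weight_err = (A_o * gamma_e) @ A_o.conj().T
    return mismatch + weight_err + sigma2 * np.eye(N)

# Toy values (assumptions for illustration only)
rng = np.random.default_rng(0)
N, M = 4, 6
A_o = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
Sigma_e = np.tile(0.1 * np.eye(N), (M, 1, 1))
Sigma_eta = modified_noise_covariance(A_o, np.ones(M), 0.01 * np.ones(M), Sigma_e, 0.5)
# With no errors the expression reduces to the usual white noise covariance
Sigma_plain = modified_noise_covariance(A_o, np.ones(M), np.zeros(M), np.zeros((M, N, N)), 0.5)
```

Note that ${\bm{\Sigma}}_{{\bm{\eta}}}$ depends on ${\bm{\gamma}}$, so in an iterative scheme it would be recomputed whenever ${\bm{\gamma}}$ changes.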

For analytical simplification, we approximate the density of ${\bm{\eta}}$ to be Gaussian with mean zero and covariance ${\bm{\Sigma}}_{{\bm{\eta}}}$

$$p({\bm{\eta}})={\cal C}{\cal N}({\bm{\eta}};{\mathbf{0}},{\bm{\Sigma}}_{{\bm{\eta}}}).\tag{12}$$

To justify this approximation expand the modified noise as: ${\bm{\eta}}=\displaystyle\sum_{m}{(x^{o}_{m}{\mathbf{a}}^{e}_{m}+x^{e}_{m}{\mathbf{a}}^{o}_{m}+x^{e}_{m}{\mathbf{a}}^{e}_{m})}+{\mathbf{n}}$ . Thus ${\bm{\eta}}$ is a sum of a large number of random vectors. From the central limit theorem, ${\bm{\eta}}$ converges to a Gaussian distribution as $M\rightarrow\infty$ . When ${\mathbf{x}}^{o}$ is $K$ -sparse, the error in the Gaussian approximation (12) decreases with $\frac{1}{\sqrt{K}}$ . The likelihood for the signal model (7) is approximately

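$$p({\mathbf{y}}|{\mathbf{x}}^{o})={\cal C}{\cal N}({\mathbf{y}};{\mathbf{A}}^{o}{\mathbf{x}}^{o},{\bm{\Sigma}}_{{\bm{\eta}}}).\tag{13}$$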

Once the modified noise ${\bm{\eta}}$ is approximated as Gaussian, we treat ${\bm{\eta}}$ and ${\mathbf{x}}^{o}$ as independent (which is not necessarily true from the expression for ${\bm{\eta}}$ ). This assumption is necessary to evaluate analytically the evidence in Sect. III-A.

II-C Multiple snapshots

To increase the SNR, we process multiple observations (snapshots) simultaneously. Let ${\mathbf{Y}}=[{\mathbf{y}}_{1}\ldots{\mathbf{y}}_{L}]\in{\mathbb{C}}^{N\times L}$ denote $L$ consecutive snapshots arranged column-wise in a matrix. The multi snapshot analogue of (1) is

$${\mathbf{Y}}={\mathbf{A}}^{o}{\mathbf{X}}^{o}+\underline{{\bm{\eta}}},$$

where ${\mathbf{X}}^{o}=[{\mathbf{x}}_{1}^{o}\ldots{\mathbf{x}}_{L}^{o}]$ and $\underline{{\bm{\eta}}}=[{\bm{\eta}}_{1}\ldots{\bm{\eta}}_{L}]$ . The ${\mathbf{x}}_{l}^{o}$ are assumed i.i.d. Gaussian across snapshots

$$p({\mathbf{X}}^{o})=\prod_{l=1}^{L}{\cal C}{\cal N}({\mathbf{x}}_{l}^{o};{\mathbf{0}},{\bm{\Gamma}}).$$

The error terms ${\mathbf{A}}^{e}$ , ${\mathbf{x}}^{e}$ , and the noise ${\mathbf{n}}$ are assumed independent across snapshots. The multi-snapshot likelihood is

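$$p({\mathbf{Y}}|{\mathbf{X}}^{o})=\prod_{l=1}^{L}p({\mathbf{y}}_{l}|{\mathbf{x}}_{l}^{o};{\mathbf{A}}^{o})=\prod_{l=1}^{L}{\cal C}{\cal N}({\mathbf{y}}_{l};{\mathbf{A}}^{o}{\mathbf{x}}_{l}^{o},{\bm{\Sigma}}_{{\bm{\eta}}}),\tag{16}$$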

where the single snapshot likelihood $p({\mathbf{y}}_{l}|{\mathbf{x}}_{l}^{o};{\mathbf{A}}^{o})$ is in (13).

II-D Multiple dictionaries

We assume observations generated by a set of dictionaries are available simultaneously and a portion of the support is common for all the weights. We are interested in recovering this shared sparsity structure. A physical example is a set of observations recorded at several frequencies but generated by the same sparse set of sources (see Sect. IV-C2).

Let the observation vectors recorded by $F$ dictionaries be ${\mathbf{Y}}_{1:F}\equiv\{{\mathbf{Y}}_{1}\ldots{\mathbf{Y}}_{F}\}$ with the corresponding sparse weights ${\mathbf{X}}_{1:F}^{o}\equiv\{{\mathbf{X}}_{1}^{o}\ldots{\mathbf{X}}_{F}^{o}\}$ . We have

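$${\mathbf{Y}}_{f}={\mathbf{A}}_{f}^{o}{\mathbf{X}}_{f}^{o}+\underline{{\bm{\eta}}}_{f},\qquad f=1,\ldots,F,$$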

where ${\mathbf{A}}_{f}^{o}$ are the sensing matrices and $\underline{{\bm{\eta}}}_{f}$ are (modified) noise contributions. The noise $\underline{{\bm{\eta}}}_{f}$ and the weights ${\mathbf{X}}_{f}^{o}$ are assumed independent. The multi-dictionary likelihood is then

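$$p({\mathbf{Y}}_{1:F}|{\mathbf{X}}_{1:F}^{o})=\prod_{f=1}^{F}p({\mathbf{Y}}_{f}|{\mathbf{X}}_{f}^{o}),\tag{18}$$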

where $p({\mathbf{Y}}_{f}|{\mathbf{X}}_{f}^{o})$ is given by (16). We have two possibilities for the joint multi-dictionary prior over ${\mathbf{X}}_{1:F}^{o}$ .

Multiple covariance (MC) prior: In this model, the joint prior is given by

[TABLE]

where the prior covariance ${\bm{\Gamma}}_{f}=\text{diag}({\bm{\gamma}}_{f})$ depends on the dictionary. This model has been used in the context of multi-frequency beamforming in [38].

Common covariance (CC) prior: This model assumes the prior for all dictionaries is governed by the same statistical distribution

[TABLE]

i.e. ${\bm{\Gamma}}_{1}=\cdots={\bm{\Gamma}}_{F}=\text{diag}({\bm{\gamma}})$ . This imposes identical sparsity constraints on ${\mathbf{X}}_{1}^{o}\ldots{\mathbf{X}}_{F}^{o}$ . A common covariance matrix in multi-frequency beamforming was used in [9].

III Sparse Bayesian learning

III-A Evidence

In the SBL framework [4, 6], the prior parameter ${\bm{\gamma}}$ is assumed unknown and estimated using the observed signal ${\mathbf{Y}}$ . It is estimated by maximizing the evidence (also called Type-II maximum likelihood). We first consider the single dictionary case. The evidence $p({\mathbf{Y}})$ is obtained by averaging over all realizations of ${\mathbf{X}}^{o}$

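$$\begin{aligned}p({\mathbf{Y}})&=\int p({\mathbf{Y}}|{\mathbf{X}}^{o})\,p({\mathbf{X}}^{o})\,d{\mathbf{X}}^{o}\\&=\int\prod_{l=1}^{L}{\cal C}{\cal N}({\mathbf{y}}_{l};{\mathbf{A}}^{o}{\mathbf{x}}_{l}^{o},{\bm{\Sigma}}_{{\bm{\eta}}})\,{\cal C}{\cal N}({\mathbf{x}}_{l}^{o};{\mathbf{0}},{\bm{\Gamma}})\,d{\mathbf{X}}^{o}\\&=\prod_{l=1}^{L}{\cal C}{\cal N}({\mathbf{y}}_{l};{\mathbf{0}},{\bm{\Sigma}}_{{\bm{\eta}}}+{\mathbf{A}}^{o}{\bm{\Gamma}}{\mathbf{A}}^{oH})=\prod_{l=1}^{L}{\cal C}{\cal N}({\mathbf{y}}_{l};{\mathbf{0}},{\bm{\Sigma}}_{{\mathbf{y}}}),\end{aligned}$$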

where ${\bm{\Sigma}}_{{\mathbf{y}}}={\bm{\Sigma}}_{{\bm{\eta}}}+{\mathbf{A}}^{o}{\bm{\Gamma}}{\mathbf{A}}^{oH}$ and it depends on the parameters $\sigma^{2}$ and ${\bm{\gamma}}$ . Ignoring the terms independent of $\sigma^{2}$ and ${\bm{\gamma}}$

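$$\log p({\mathbf{Y}})\propto-L\log|{\bm{\Sigma}}_{{\mathbf{y}}}|-\text{Tr}({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}}),$$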

where $\text{Tr}()$ denotes the trace of a matrix.
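As a small illustration, the objective $L\log|{\bm{\Sigma}}_{{\mathbf{y}}}|+\text{Tr}({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}})$, i.e. the negative log evidence up to constants, can be evaluated numerically as sketched below. The function name and the toy inputs are assumptions made for this example.

```python
import numpy as np

def neg_log_evidence(Y, Sigma_y):
    """L log|Sigma_y| + Tr(Y^H Sigma_y^{-1} Y), constants dropped."""
    L = Y.shape[1]
    # slogdet avoids overflow or underflow in the determinant
    _, logdet = np.linalg.slogdet(Sigma_y)
    # solve() applies Sigma_y^{-1} without forming the inverse explicitly
    quad = np.trace(Y.conj().T @ np.linalg.solve(Sigma_y, Y)).real
    return L * logdet + quad

# Toy check: one snapshot, Sigma_y = 2 I gives 2 log 2 + 1/2
value = neg_log_evidence(np.array([[1.0], [0.0]]), 2.0 * np.eye(2))
```

Monitoring this quantity across iterations is one simple way to check that a chosen fixed point update actually decreases the objective.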

III-B Fixed point update

The estimate $\hat{{\bm{\gamma}}}$ maximizes the evidence

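$$\hat{{\bm{\gamma}}}=\underset{{\bm{\gamma}}}{\arg\,\min}\,\Big\{L\log|{\bm{\Sigma}}_{{\mathbf{y}}}|+\text{Tr}({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}})\Big\}.\tag{26}$$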

One approach to solve this problem is to use the EM algorithm [39] but the resulting update equations have slow convergence [4, 6]. We perform differentiation of the objective function (26) to obtain a local minimum. We have the following derivative relations for ${\bm{\Sigma}}_{{\mathbf{y}}}$

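$$\frac{\partial\log|{\bm{\Sigma}}_{{\mathbf{y}}}|}{\partial\gamma_{m}}=\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}\frac{\partial{\bm{\Sigma}}_{{\mathbf{y}}}}{\partial\gamma_{m}}\Big),\qquad\frac{\partial{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}}{\partial\gamma_{m}}=-{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}\frac{\partial{\bm{\Sigma}}_{{\mathbf{y}}}}{\partial\gamma_{m}}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1},$$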

Differentiating (26) with respect to the $m$ th diagonal element $\gamma_{m}$

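$$\frac{\partial}{\partial\gamma_{m}}\Big\{L\log|{\bm{\Sigma}}_{{\mathbf{y}}}|+\text{Tr}({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}})\Big\}=L\,\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]\Big)-\text{Tr}\Big({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}}\Big).$$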

Equating the derivative of the objective function to zero

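$$\begin{aligned}1&=\frac{1}{L}\frac{\text{Tr}\Big({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}}\Big)}{\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]\Big)},\\\frac{\gamma_{m}}{\gamma_{m}}&=\Bigg(\frac{1}{L}\frac{\text{Tr}\Big({\mathbf{Y}}^{H}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{Y}}\Big)}{\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]\Big)}\Bigg)^{b},\end{aligned}$$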

where we introduced $\gamma_{m}$ terms to obtain an iterative update equation. Since the fixed point update is not unique, the exponent term $b$ is introduced to include a broad range of update rules. Different update equations introduced in the literature can be obtained using different values of $b$ . The update then is

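$$\gamma_{m}^{\text{new}}=\gamma_{m}^{\text{old}}\Bigg(\frac{\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{S}}_{{\mathbf{y}}}\Big)}{\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}}^{-1}[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]\Big)}\Bigg)^{b},\tag{32}$$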

where ${\mathbf{S}}_{{\mathbf{y}}}$ is the sample covariance matrix ${\mathbf{S}}_{{\mathbf{y}}}=\frac{1}{L}{\mathbf{Y}}{\mathbf{Y}}^{H}$ . The SBL update (32) incorporates statistics ( ${\bm{\Sigma}}_{m}^{e}$ and ${\bm{\gamma}}^{e}$ ) of uncertainty models.

Remark: There are multiple ways to formulate a fixed point update equation. Our formulation is inspired by some of the equations used in the literature [4, 6, 10] and convergence properties of the simulation results. It is not clear for what values of $b$ , if any, convergence of (32) is guaranteed. For ${\bm{\Sigma}}_{m}^{e}={\mathbf{0}}$ and ${\bm{\gamma}}^{e}={\mathbf{0}}$ , a value of $b=1$ gives the update equation used in [4, 6] and $b=0.5$ gives the update equation in [10].
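One iteration of the fixed point update can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it uses ${\bm{\Sigma}}_{{\mathbf{y}}}=\sum_{m}(\gamma_{m}+\gamma_{m}^{e})[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}]+\sigma^{2}{\mathbf{I}}_{N}$, which follows from adding ${\mathbf{A}}^{o}{\bm{\Gamma}}{\mathbf{A}}^{oH}$ to the modified noise covariance, and the function name and argument layout are hypothetical.

```python
import numpy as np

def sbl_fixed_point_step(gamma, A_o, S_y, sigma2, gamma_e=None, Sigma_e=None, b=1.0):
    """One update gamma_new = gamma_old * (numerator / denominator)^b."""
    N, M = A_o.shape
    gamma_e = np.zeros(M) if gamma_e is None else gamma_e
    Sigma_e = np.zeros((M, N, N)) if Sigma_e is None else Sigma_e
    g = gamma + gamma_e
    # Sigma_y = sum_m (gamma_m + gamma_m^e) D_m + sigma^2 I, D_m = Sigma_m^e + a_m a_m^H
    Sigma_y = (A_o * g) @ A_o.conj().T + np.einsum("m,mij->ij", g, Sigma_e) + sigma2 * np.eye(N)
    Sy_inv = np.linalg.inv(Sigma_y)
    B = Sy_inv @ S_y @ Sy_inv  # Sigma_y^{-1} S_y Sigma_y^{-1}

    def trace_with_D(X):
        # Tr(X D_m) for all m: a_m^H X a_m + Tr(X Sigma_m^e)
        quad = np.einsum("im,ij,jm->m", A_o.conj(), X, A_o)
        return (quad + np.einsum("ij,mji->m", X, Sigma_e)).real

    return gamma * (trace_with_D(B) / trace_with_D(Sy_inv)) ** b

# Sanity check: if S_y equals the model covariance, gamma is a fixed point
rng = np.random.default_rng(0)
N, M = 5, 8
A_o = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
gamma0 = rng.uniform(0.5, 2.0, M)
Sigma_y0 = (A_o * gamma0) @ A_o.conj().T + 0.1 * np.eye(N)
gamma1 = sbl_fixed_point_step(gamma0, A_o, Sigma_y0, sigma2=0.1)
```

The cyclic property of the trace is used to write the numerator as $\text{Tr}({\mathbf{B}}\,[{\bm{\Sigma}}_{m}^{e}+{\mathbf{a}}_{m}^{o}{\mathbf{a}}_{m}^{oH}])$ with ${\mathbf{B}}={\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{S}}_{{\mathbf{y}}}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}$, so a single matrix product serves all $M$ entries.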

III-C Multi-dictionary SBL

We have two multi-dictionary update rules based on the priors for ${\mathbf{X}}_{1:F}^{o}$ in either (19) or (20).

III-C1 SBL-MC

With the prior (19) that is dictionary-dependent, the likelihood (18), and the independence assumptions, the joint evidence $p({\mathbf{Y}}_{1:F})$ is

[TABLE]

where ${\bm{\Sigma}}_{{\mathbf{y}}_{f}}={\bm{\Sigma}}_{{\bm{\eta}}_{f}}+{\mathbf{A}}_{f}^{o}{\bm{\Gamma}}_{f}{\mathbf{A}}_{f}^{oH}$ . Since the different dictionary components are decoupled, maximizing the joint evidence corresponds to maximizing the evidence for each dictionary individually. Thus the update rule for $f$ th dictionary is

[TABLE]

We can combine ${\bm{\gamma}}_{f}$ to obtain a multi-dictionary estimate

[TABLE]

If the sparsity of ${\bm{\gamma}}_{f}$ is the same across dictionaries, the averaging above will enhance the sparsity of the estimate ${\bm{\gamma}}$ in presence of noise. The summation (35) is inspired by traditional multi-frequency processing in conventional beamforming where the beamformer outputs at each frequency are combined incoherently [16].

III-C2 SBL-CC

With the prior (20) that is common across dictionaries, the likelihood (18), and the independence assumptions, the joint evidence $p({\mathbf{Y}}_{1:F})$ is given by (33) where ${\bm{\Sigma}}_{{\mathbf{y}}_{f}}={\bm{\Sigma}}_{{\bm{\eta}}_{f}}+{\mathbf{A}}_{f}^{o}{\bm{\Gamma}}{\mathbf{A}}_{f}^{oH}$ . Taking the logarithm and ignoring constant terms we have

[TABLE]

To estimate $\hat{{\bm{\gamma}}}$ we maximize the joint evidence:

[TABLE]

To obtain a minimum, we apply the derivative results as before and equate the derivative of this objective function to zero giving the update rule

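$$\gamma_{m}^{\text{new}}=\gamma_{m}^{\text{old}}\Bigg(\frac{\sum_{f=1}^{F}\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}_{f}}^{-1}[{\bm{\Sigma}}_{f,m}^{e}+{\mathbf{a}}_{f,m}^{o}{\mathbf{a}}_{f,m}^{oH}]{\bm{\Sigma}}_{{\mathbf{y}}_{f}}^{-1}{\mathbf{S}}_{{\mathbf{y}}_{f}}\Big)}{\sum_{f=1}^{F}\text{Tr}\Big({\bm{\Sigma}}_{{\mathbf{y}}_{f}}^{-1}[{\bm{\Sigma}}_{f,m}^{e}+{\mathbf{a}}_{f,m}^{o}{\mathbf{a}}_{f,m}^{oH}]\Big)}\Bigg)^{b}.\tag{40}$$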

In this multi-dictionary formulation, a unified update rule is obtained that combines all the observations together from different dictionaries. The single dictionary update rule (32) is obtained using $F=1$ .

III-D Special cases

We consider special cases of (7) with ${\mathbf{x}}^{e}={\mathbf{0}}$ and/or ${\mathbf{A}}^{e}={\mathbf{0}}$ :

•

SBL: when both ${\mathbf{x}}^{e}={\mathbf{0}}$ and ${\mathbf{A}}^{e}={\mathbf{0}}$ we get the regular SBL [4, 6], Eq (1)

•

SBL-A: when only ${\mathbf{A}}^{e}$ is non-zero ( ${\mathbf{x}}^{e}={\mathbf{0}}$ ) gives

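$${\mathbf{y}}={\mathbf{A}}^{o}{\mathbf{x}}^{o}+{\mathbf{A}}^{e}{\mathbf{x}}^{o}+{\mathbf{n}}={\mathbf{A}}{\mathbf{x}}^{o}+{\mathbf{n}},\tag{41}$$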

signifying errors in the sensing matrix ${\mathbf{A}}$ .

•

SBL-x: when only ${\mathbf{x}}^{e}$ is non-zero ( ${\mathbf{A}}^{e}={\mathbf{0}}$ ) gives

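$${\mathbf{y}}={\mathbf{A}}^{o}{\mathbf{x}}^{o}+{\mathbf{A}}^{o}{\mathbf{x}}^{e}+{\mathbf{n}}={\mathbf{A}}^{o}{\mathbf{x}}+{\mathbf{n}},\tag{42}$$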

signifying errors in the weights ${\mathbf{x}}$ .

Both SBL-A and SBL-x can be combined with the multi-dictionary SBL formulations SBL-MC and SBL-CC.

III-E Noise estimate

Similar to $\gamma_{m}$ , an update equation for $\sigma^{2}$ can be obtained using the derivative of the evidence with respect to $\sigma^{2}$ . But this update is biased towards zero [6, 9, 10]. Hence we use a stochastic maximum likelihood based method to estimate $\sigma^{2}$ . Let ${\mathbf{A}}_{{\cal M}}$ be formed by $K$ columns of ${\mathbf{A}}$ indexed by ${\cal M}$ , where the set ${\cal M}$ indicates the location of non-zero entries of ${\mathbf{x}}$ with cardinality $|{\cal M}|=K$ . We can estimate ${\cal M}$ using ${\bm{\gamma}}$ through thresholding or picking its highest entries. The noise variance estimate for $f$ th dictionary is then [40, 9, 10]

[TABLE]

where ${\mathbf{A}}_{{\cal M}}^{+}$ denotes the Moore-Penrose pseudo-inverse. In [9] a common noise estimate is used for all dictionaries (i.e. frequencies).
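A minimal sketch of such a noise estimate is given below. It assumes the stochastic maximum likelihood form commonly used in the multi-snapshot SBL literature, $\hat{\sigma}^{2}=\text{Tr}[({\mathbf{I}}_{N}-{\mathbf{A}}_{{\cal M}}{\mathbf{A}}_{{\cal M}}^{+}){\mathbf{S}}_{{\mathbf{y}}}]/(N-K)$; that exact form, the function name, and the toy data are assumptions of this example.

```python
import numpy as np

def noise_variance_estimate(S_y, A, support):
    """Average power of S_y outside the span of the active columns A[:, support]."""
    N = A.shape[0]
    K = len(support)
    A_M = A[:, support]
    # Orthogonal projector onto span(A_M) via the Moore-Penrose pseudo-inverse
    P = A_M @ np.linalg.pinv(A_M)
    return np.trace((np.eye(N) - P) @ S_y).real / (N - K)

# Toy check: exact covariance of two sources plus white noise of variance 0.3
rng = np.random.default_rng(1)
N, M = 6, 10
A = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
support = [2, 7]
A_src = A[:, support]
S_y = (A_src * np.array([1.0, 0.5])) @ A_src.conj().T + 0.3 * np.eye(N)
sigma2_hat = noise_variance_estimate(S_y, A, support)
```

When the support is correct the signal part is annihilated by the projector, so the estimate recovers the noise variance from the remaining $N-K$ dimensions.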

III-F Posterior

Applying Bayes rule, the posterior for ${\mathbf{X}}$ is expressed as

[TABLE]

Since the prior is a Gaussian, the likelihood is approximated to be Gaussian, and the snapshots are independent, the posterior approximately is Gaussian with density given by

[TABLE]

The posterior mean ${\bm{\mu}}_{l}$ provides an estimate of the amplitude and phase of the weight vector at the $l$ th snapshot and also is sparse. The posterior covariance matrix ${\bm{\Sigma}}_{{\mathbf{x}}}$ provides an estimate of uncertainty in the weights.
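The posterior moments can be computed as in the sketch below. It assumes the standard linear-Gaussian result ${\bm{\mu}}_{l}={\bm{\Gamma}}{\mathbf{A}}^{oH}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{y}}_{l}$ and ${\bm{\Sigma}}_{{\mathbf{x}}}={\bm{\Gamma}}-{\bm{\Gamma}}{\mathbf{A}}^{oH}{\bm{\Sigma}}_{{\mathbf{y}}}^{-1}{\mathbf{A}}^{o}{\bm{\Gamma}}$ with ${\bm{\Sigma}}_{{\mathbf{y}}}={\bm{\Sigma}}_{{\bm{\eta}}}+{\mathbf{A}}^{o}{\bm{\Gamma}}{\mathbf{A}}^{oH}$; the paper's exact expressions may be arranged differently, and the function name is hypothetical.

```python
import numpy as np

def sbl_posterior(Y, A_o, gamma, Sigma_eta):
    """Posterior means (M x L, one column per snapshot) and shared covariance (M x M)."""
    G_AH = gamma[:, None] * A_o.conj().T      # Gamma A^H
    Sigma_y = Sigma_eta + A_o @ G_AH          # Sigma_eta + A Gamma A^H
    W = G_AH @ np.linalg.inv(Sigma_y)         # Gamma A^H Sigma_y^{-1}
    mu = W @ Y
    Sigma_x = np.diag(gamma) - (W @ A_o) * gamma[None, :]
    return mu, Sigma_x

# Scalar check: A = 1, gamma = 2, Sigma_eta = 1, y = 3 gives mean 2 and variance 2/3
mu, Sigma_x = sbl_posterior(np.array([[3.0]]), np.array([[1.0]]),
                            np.array([2.0]), np.array([[1.0]]))
```

Rows of the mean with $\gamma_{m}=0$ are exactly zero, which is how sparsity in ${\bm{\gamma}}$ carries over to the weight estimates.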

IV Simulations and experimental data

IV-A SBL implementation

This section discusses the algorithmic implementation of the SBL update rules developed in Sect. III. Pseudocode for the SBL-CC algorithm is given in Algorithm 1. A similar algorithm can be obtained for SBL-MC by replacing (40) with (34)-(35). In either case, the single dictionary algorithm is obtained by setting $F=1$.

Parameters $\epsilon$ and $N_{t}$ determine the convergence criterion and the maximum number of iterations, respectively. We choose the power exponent in the update rule (40) to be $b=1$ as used in [4, 6].

The inputs to the algorithm are the sample covariance matrices ${\mathbf{S}}_{{\mathbf{y}}_{f}}$, the sensing matrices ${\mathbf{A}}_{f}^{o}$, and the tuning parameters $\gamma_{m}^{e}$ and ${\bm{\Sigma}}_{m}^{e}$. The parameters to be estimated, $\gamma_{m}$ and $\sigma_{f}^{2}$, are initialized to constant non-zero values. The parameter $\gamma_{m}$ can be dictionary-dependent, see Sect. III-C SBL sum-MF, in which case there is an additional loop over all the dictionaries (not shown here). The $\gamma_{m}$ are updated using (40). $K$ peak locations are identified from ${\bm{\gamma}}^{\text{new}}$ to construct ${\mathbf{A}}_{{\cal M}}$ and compute the dictionary-dependent noise estimate (43). Though we assume $K$ to be known for estimating $\hat{\sigma}^{2}$, this can be avoided by using model order identification methods [9].
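The overall loop structure can be sketched as follows for a single dictionary ($F=1$). This is not Algorithm 1: the update used here is a generic multiplicative fixed-point rule of the type found in the cited SBL literature, without the mismatch terms and with the noise variance held fixed, so its exact form is an assumption. The names `sbl_fixed_point`, `eps`, `n_iter` and `gamma0` are hypothetical stand-ins for the quantities $\epsilon$ and $N_{t}$ and the initial value.

```python
import numpy as np

def sbl_fixed_point(A, S_y, sigma2, b=1.0, eps=1e-3, n_iter=500, gamma0=None):
    """Iterate a multiplicative gamma update until the relative change drops below eps."""
    N, M = A.shape
    gamma = np.ones(M) if gamma0 is None else np.array(gamma0, dtype=float)
    for _ in range(n_iter):
        Sigma_y = sigma2 * np.eye(N) + (A * gamma) @ A.conj().T   # A diag(gamma) A^H + noise
        B = np.linalg.solve(Sigma_y, A)                           # Sigma_y^{-1} A
        num = np.real(np.sum(B.conj() * (S_y @ B), axis=0))       # a^H Sy^-1 S_y Sy^-1 a
        den = np.real(np.sum(A.conj() * B, axis=0))               # a^H Sy^-1 a
        gamma_new = gamma * (num / den) ** b
        change = np.linalg.norm(gamma_new - gamma, 1) / max(np.linalg.norm(gamma, 1), 1e-300)
        gamma = gamma_new
        if change < eps:                                          # convergence test
            break
    return gamma

# Toy data (assumed): exact covariance generated by two active columns.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 12)) + 1j * rng.standard_normal((6, 12))
gamma_true = np.zeros(12)
gamma_true[[1, 8]] = [2.0, 5.0]
S_y = 0.3 * np.eye(6) + (A * gamma_true) @ A.conj().T
gamma_hat = sbl_fixed_point(A, S_y, sigma2=0.3)
```

A useful sanity check of such a rule is that the generating ${\bm{\gamma}}$ is a fixed point: when the model covariance equals ${\mathbf{S}}_{{\mathbf{y}}}$, the numerator and denominator coincide and the update leaves ${\bm{\gamma}}$ unchanged.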

We use beamforming to demonstrate the benefits of the proposed SBL algorithms. Sparsity of SBL is measured by ${\bm{\gamma}}$ . Since the beamforming dictionary has high coherence among neighboring columns, we only consider local peaks. A local peak is defined as an element which is larger than its adjacent elements. Since ${\bm{\gamma}}$ corresponds to the source power, it is treated as the angular power spectrum.
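The local-peak rule can be sketched as below. The function name `top_local_peaks` and the toy spectrum are hypothetical, and excluding the two end points of the grid (which have only one neighbor) is an assumption not stated in the text.

```python
import numpy as np

def top_local_peaks(gamma, K):
    """Indices of the K largest interior local maxima of gamma, strongest first."""
    g = np.asarray(gamma, dtype=float)
    interior = np.arange(1, len(g) - 1)
    is_peak = (g[interior] > g[interior - 1]) & (g[interior] > g[interior + 1])
    peaks = interior[is_peak]
    order = np.argsort(g[peaks])[::-1]        # sort peaks by descending power
    return peaks[order[:K]]

angles = np.arange(-90, 91)                   # 1 degree grid, M = 181
gamma = np.zeros(181)
gamma[[89, 90, 91]] = [1.0, 5.0, 1.2]         # broadened peak around 0 degrees
gamma[165] = 3.0                              # peak at 75 degrees
gamma[115] = 0.4                              # weak false peak at 25 degrees
doa = angles[top_local_peaks(gamma, K=2)]     # -> angles 0 and 75
```

Only the center of the broadened peak counts as a local peak, so its shoulders do not displace the weaker source at $75^{\circ}$ from the top-$K$ list.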

We consider the special cases in Section III-D, SBL-A (41) and SBL-x (42). Additionally we assume

[TABLE]

This reduces the number of free parameters and allows for a systematic study. The use of constants $\phi^{e}$ and $\gamma^{e}$ is justified when all the errors have similar statistics. Substituting (48) in (11) with $\gamma^{e}=0$, both the noise covariance $\sigma^{2}{\mathbf{I}}_{N}$ and the error covariance ${\bm{\Sigma}}_{m}^{e}$ are diagonal. Hence it is difficult to estimate both $\phi^{e}$ and $\sigma^{2}$. In contrast, substituting (49) in (11) with $\phi^{e}=0$ results in structurally different covariances, and hence an estimate of $\gamma^{e}$ might be possible from data. In this paper we explore a range of tuning parameter values.

Ideally the actual values of $\phi^{e}$ and $\gamma^{e}$ would depend on the application of interest. Since $\phi^{e}$ corresponds to the variance of the additive errors in the dictionary, a good choice of $\phi^{e}$ could be obtained by studying the variability of the underlying physical processes generating the dictionary. Since $\gamma^{e}$ is the variance of the errors in ${\mathbf{x}}$ (which is significant at low SNR), its value can be tuned based on the SNR. Care should be taken not to choose relatively high values for $\phi^{e}$ and $\gamma^{e}$, as they tend to smooth out ${\bm{\gamma}}$ and could suppress weaker sources.

IV-B Beamforming

In beamforming, the observed signal model is a linear combination of plane waves. Since the number of sources (arrival angles) is small, finely dividing the angle space results in a sparse ${\mathbf{x}}$ of complex amplitudes. SBL is used to recover these arrival angles.

For a narrow-band signal of wavelength $\lambda$ and uniform sensor array separation $d$ , the sensing matrix columns are

[TABLE]

for $m=1,\ldots,M$, where $\theta_{m}$ is the $m$th discretized angle. The angle space $[-90,90]^{\circ}$ is discretized with $1^{\circ}$ separation giving $M=181$. We model an $N=20$ sensor array. The array SNR per snapshot is defined as

[TABLE]

where the subscript $ws$ denotes weak source. In this section we use a single frequency (a single dictionary) with sensor separation $d=\frac{\lambda}{2}$ . $L=30(>N)$ snapshots are processed.
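A minimal sketch of how such a beamforming dictionary can be built is given below. The sign of the phase, the absence of column normalization, and the choice of the first sensor as phase reference are assumptions, since the steering vector expression is not reproduced here; `ula_dictionary` is a hypothetical helper name.

```python
import numpy as np

def ula_dictionary(n_sensors=20, d_over_lambda=0.5, step_deg=1.0):
    """N x M matrix whose m-th column is the plane-wave response at angle theta_m."""
    theta = np.deg2rad(np.arange(-90.0, 90.0 + step_deg / 2, step_deg))
    n = np.arange(n_sensors)[:, None]                       # sensor index 0..N-1
    return np.exp(-2j * np.pi * d_over_lambda * n * np.sin(theta)[None, :])

A = ula_dictionary()       # N = 20 sensors, d = lambda/2, 1 degree grid
N, M = A.shape             # (20, 181)
```

With $1^{\circ}$ spacing over $[-90,90]^{\circ}$ the grid has $M=181$ columns, and the column at $0^{\circ}$ is a vector of ones under the assumed convention.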

IV-B1 Two source example

Consider two sources present at angles $[0,75]^{\circ}$ with powers $[22,20]$ dB. The magnitudes are assumed constant and their phases are random and distributed uniformly per snapshot.
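The data generation for this example can be sketched as follows. The noise variance is passed in directly because the SNR definition is not reproduced here; interpreting the dB powers as $10\log_{10}$ of the linear source power, the steering vector convention, and the helper name `two_source_snapshots` are assumptions.

```python
import numpy as np

def two_source_snapshots(A, src_idx, power_db, n_snapshots, sigma2, rng):
    """Y = A X + noise with constant-magnitude, uniformly random-phase sources."""
    N, M = A.shape
    amp = np.sqrt(10.0 ** (np.asarray(power_db) / 10.0))         # magnitudes from dB power
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(len(src_idx), n_snapshots))
    X = np.zeros((M, n_snapshots), dtype=complex)
    X[src_idx, :] = amp[:, None] * np.exp(1j * phase)            # new phase per snapshot
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal((N, n_snapshots))
                                   + 1j * rng.standard_normal((N, n_snapshots)))
    return A @ X + noise, X

theta = np.deg2rad(np.arange(-90, 91))                            # 1 degree grid
A = np.exp(-1j * np.pi * np.outer(np.arange(20), np.sin(theta)))  # N = 20, d = lambda/2
rng = np.random.default_rng(0)
Y, X = two_source_snapshots(A, [90, 165], [22.0, 20.0], 30, sigma2=1.0, rng=rng)
S_y = Y @ Y.conj().T / Y.shape[1]                                 # sample covariance, L = 30
```

Grid indices 90 and 165 correspond to $0^{\circ}$ and $75^{\circ}$ on the $1^{\circ}$ grid.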

Fig. 1 shows ${\bm{\gamma}}$ for one run of the simulation in which SBL fails to localize the peak at $75^{\circ}$ correctly; changing the convergence parameters $\epsilon$ and $N_{t}$ in Algorithm 1 does not change this. Due to high column coherence, the peak at $75^{\circ}$ broadens and its energy is redistributed. Using SBL-x ($\gamma^{e}=0.75$), the false peak is suppressed and the peak at $75^{\circ}$ is identified.

These improvements in peak localization by SBL-x are illustrated using percentiles of the second strongest peak location obtained from 2000 Monte Carlo runs in Fig. 2a. When $\gamma^{e}=0.75$ , the shaded area between the 1-99 percentiles shrinks, indicating better localization ability of SBL-x at low SNR. This reduction in the shaded area between the percentiles is due to fewer outlier points (one of these simulation runs was shown in Fig. 1 where SBL-x is able to correctly localize the source at $75^{\circ}$ and avoid the outlier estimate at $25^{\circ}$ ). The localization improves with SNR as expected. Histograms of the second strongest peak location for SNR $3$ dB and $7$ dB are shown in Fig. 2b, 2c, 2d and 2e. Fewer outliers are observed for $\gamma^{e}=0.75$ and the spread of the histogram is reduced around $75^{\circ}$ which is the true location of the weaker source.

IV-B2 Three source example

We consider three sources ($K=3$) located at angles $[-20,-15,75]^{\circ}$ with powers $[10,22,20]$ dB. Following the model in Sect. II-A, the source amplitudes are now randomly sampled from a complex Gaussian with mean zero and variance equal to the source power. The weaker source ($10$ dB) close to the strongest source ($22$ dB) makes this challenging. In low SNR scenarios, this source can get masked by false peaks as seen in Fig. 1.

SBL is compared with traditional DoA estimation methods such as minimum variance distortionless response (MVDR) and MUSIC in Fig. 3a. SBL outperforms MVDR while its performance is comparable to that of MUSIC. The root mean square error (RMSE) is

$$\text{RMSE}=\sqrt{\mathbb{E}\left[\left(\theta_{ws}-\hat{\theta}_{ws}\right)^{2}\right]},$$

where $\theta_{ws}$ is the true and $\hat{\theta}_{ws}$ the estimated source angle of the weakest source. The expectation is computed from 2000 Monte Carlo runs. Since the weakest source likely fails first, it is appropriate to restrict the RMSE metric to this source only. For traditional DoA methods, the estimated source angles ($\hat{\theta}_{k}$) are the top $3$ peaks in the angular power spectrum while, for SBL, they correspond to the top $3$ peaks of ${\bm{\gamma}}$. The weakest of the top $3$ peaks is assigned to $\hat{\theta}_{ws}$.
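The metric can be sketched as follows, with the expectation replaced by an average over runs. The helper names and the four toy estimates are hypothetical; they only illustrate how a single outlier run dominates the RMSE.

```python
import numpy as np

def weakest_peak_angle(angles, gamma, peak_idx):
    """Angle of the weakest peak among the supplied peak indices."""
    peak_idx = np.asarray(peak_idx)
    return angles[peak_idx[np.argmin(gamma[peak_idx])]]

def rmse(theta_hat, theta_true):
    """Root mean square error over Monte Carlo runs, in degrees."""
    err = np.asarray(theta_hat, dtype=float) - theta_true
    return float(np.sqrt(np.mean(err ** 2)))

angles = np.arange(-90, 91)
gamma = np.zeros(181)
gamma[[70, 75, 165]] = [10.0, 158.0, 100.0]             # peaks at -20, -15 and 75 degrees
theta_hat_ws = weakest_peak_angle(angles, gamma, [75, 165, 70])   # -> -20

# Four hypothetical runs: three near the truth and one outlier at 25 degrees.
rmse_val = rmse([-20.0, -20.0, -21.0, 25.0], theta_true=-20.0)
```

Here the single outlier contributes $45^{2}$ to the sum of squared errors, which is why a reduction in outliers translates directly into a lower RMSE.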

Fig. 3a compares the SBL-A and SBL-x algorithms with $\phi^{e}=0.03$ and $\gamma^{e}=0.75$. SBL-A shows lower RMSE than SBL at low SNR, indicating improved DoA estimation ability even though there is no perturbation in ${\mathbf{A}}$ (i.e. ${\mathbf{A}}^{e}={\mathbf{0}}$). Also shown is the exhaustive search, which finds the best DoA estimate ${\cal M}_{0}$ by exhaustively solving the minimization problem

[TABLE]

where $||\cdot||_{\cal F}$ is the Frobenius norm and $\tilde{{\mathbf{X}}}_{{\cal M}}={\mathbf{A}}_{{\cal M}}^{+}{\mathbf{Y}}$. The objective function (53) is different from the SBL objective function (26) and hence we expect different solutions. The SBL-A and SBL-x algorithms are able to outperform the exhaustive search method. An explanation for this is that, at low SNR, the uncertainty models used by SBL-A and SBL-x better explain the noise in the solution, allowing superior localization of the peaks.
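A brute-force search of this kind can be sketched as below, assuming the objective is the least-squares residual $||{\mathbf{Y}}-{\mathbf{A}}_{{\cal M}}\tilde{{\mathbf{X}}}_{{\cal M}}||_{\cal F}$ minimized over all supports of size $K$; the objective (53) itself is not reproduced here, so this form is an assumption. A coarse $10^{\circ}$ grid, $N=8$ sensors and noise-free data are used to keep the toy example fast; all names are hypothetical.

```python
import itertools
import numpy as np

def exhaustive_support(A, Y, K):
    """Return the K-column subset with the smallest least-squares residual."""
    best_subset, best_cost = None, np.inf
    for subset in itertools.combinations(range(A.shape[1]), K):
        A_M = A[:, list(subset)]
        X_tilde = np.linalg.pinv(A_M) @ Y               # least-squares amplitudes
        cost = np.linalg.norm(Y - A_M @ X_tilde, "fro")
        if cost < best_cost:
            best_subset, best_cost = subset, cost
    return best_subset, best_cost

rng = np.random.default_rng(3)
theta = np.deg2rad(np.arange(-90, 91, 10))              # coarse grid, M = 19
A = np.exp(-1j * np.pi * np.outer(np.arange(8), np.sin(theta)))
true_support = (7, 16)                                  # -20 and 70 degrees
X = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))
Y = A[:, list(true_support)] @ X                        # noise-free snapshots
found, cost = exhaustive_support(A, Y, K=2)
```

The cost grows combinatorially: on the $M=181$ grid with $K=3$ there are $\binom{181}{3}=971{,}970$ candidate supports, each requiring a pseudo-inverse.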

The performance of the SBL-A and SBL-x algorithms for a range of $\phi^{e}$ and $\gamma^{e}$ is illustrated in Fig. 3b and 3c. SBL-A and SBL-x show little sensitivity to the choice of parameters within $\phi^{e}\in[0,0.03]$ and $\gamma^{e}\in[0,0.75]$, respectively. As $\phi^{e}$ and $\gamma^{e}$ are increased further, the performance degrades because the assumed model deviates significantly from the model generating the data.

Fig. 4 demonstrates the angular power spectrum for one run of the simulation. For the conventional beamformer (CBF), the power spectrum is ${\mathbf{a}}_{m}^{oH}{\mathbf{S}}_{{\mathbf{y}}}{\mathbf{a}}_{m}^{o}$ , and for SBL it is ${\bm{\gamma}}$ . The CBF has broad peaks and the weaker peak at $-20^{\circ}$ is poorly identified. In regular SBL, many false peaks are present since the SNR is low. The strongest peak is split into two peaks in Fig. 4b. These false peaks compete with the weaker peak and represent errors in ${\mathbf{x}}$ . SBL-A (Fig. 4c) and SBL-x (Fig. 4d) give improved performance and the false peaks are reduced.
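The CBF power spectrum ${\mathbf{a}}_{m}^{oH}{\mathbf{S}}_{{\mathbf{y}}}{\mathbf{a}}_{m}^{o}$ can be evaluated for all grid angles at once, as in this minimal sketch. The unnormalized steering vectors with the phase convention below and the single-source covariance are assumptions made for the example.

```python
import numpy as np

def cbf_spectrum(A, S_y):
    """P_m = a_m^H S_y a_m for every dictionary column a_m."""
    return np.real(np.sum(A.conj() * (S_y @ A), axis=0))

theta = np.deg2rad(np.arange(-90, 91))
A = np.exp(-1j * np.pi * np.outer(np.arange(20), np.sin(theta)))   # N = 20, d = lambda/2
a0 = A[:, [90]]                                                    # column at 0 degrees
S_y = a0 @ a0.conj().T                                             # one unit-power source
P = cbf_spectrum(A, S_y)
```

For this noise-free single source the spectrum is the squared beampattern $|{\mathbf{a}}_{m}^{H}{\mathbf{a}}_{0}|^{2}$, which peaks at $N^{2}=400$ at $0^{\circ}$ and has the broad main lobe that limits CBF resolution.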

IV-B3 Mismatch analysis

SBL-A performance when the data is generated with mismatched dictionaries is studied by corrupting the dictionary with multiplicative noise, see Appendix A. The data is generated using multiplicative noise and processed using SBL-A which assumes additive Gaussian noise. Dictionaries are generated using the model

[TABLE]

The multiplicative noise parameter $\delta_{m}$ is the same for each column. Each run of the simulation has a different ${\mathbf{A}}^{e}$. The RMSE performance of SBL-A versus the parameter $\delta_{0}$ is shown in Fig. 5. Though the simulation scenario deviates from the modeling assumptions, SBL-A provides improvements.

IV-C Aliasing suppression using multi-dictionary SBL

SBL can be used to process multi-frequency spatial data in the presence of aliasing. Each frequency has a different dictionary and the multi-dictionary analysis in Sect. III-C is used to process multi-frequency observations. Ref. [41] discusses aliasing suppression for wideband signals using basis pursuit and orthogonal matching pursuit. We demonstrate the aliasing suppression ability of SBL using both simulated and experimental data.

IV-C1 Simulation analysis

A large array aperture and hence a large sensor array spacing is desirable to obtain high resolution beamforming. A drawback of large array spacing is that it limits the highest frequency that can be processed without encountering aliasing. This drawback can be partially overcome by multi-dictionary SBL.

The Gram matrices $({\mathbf{A}}^{H}{\mathbf{A}})$ for two array spacings are shown in Fig. 6 for $N=20$. For a uniform linear array (ULA) spacing of $d=\frac{\lambda}{2}$ there is one main lobe for each angle. When the spacing is doubled, i.e. $d=\lambda$, grating (side) lobes appear which are a manifestation of aliasing.
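The appearance of grating lobes can be checked numerically with a short sketch. The steering vector convention (unnormalized columns, first sensor as phase reference) is an assumption, and `gram_matrix` is a hypothetical helper.

```python
import numpy as np

def gram_matrix(n_sensors, d_over_lambda):
    """A^H A for a ULA dictionary on a 1 degree grid over [-90, 90] degrees."""
    theta = np.deg2rad(np.arange(-90, 91))
    n = np.arange(n_sensors)[:, None]
    A = np.exp(-2j * np.pi * d_over_lambda * n * np.sin(theta)[None, :])
    return A.conj().T @ A

G_half = np.abs(gram_matrix(20, 0.5))      # d = lambda/2
G_full = np.abs(gram_matrix(20, 1.0))      # d = lambda
broadside, endfire = 90, 180               # grid indices of 0 and 90 degrees
coherence_half = G_half[broadside, endfire]
coherence_full = G_full[broadside, endfire]
```

For $d=\frac{\lambda}{2}$ the columns at $0^{\circ}$ and $90^{\circ}$ are orthogonal, whereas for $d=\lambda$ they coincide and the off-diagonal entry reaches the diagonal value $N=20$. This full-height off-diagonal entry is the grating lobe.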

Consider the three source example in Sect. IV-B2. Let $f_{1}$ and $f_{2}=2f_{1}$ be two frequencies with wavelengths $\lambda_{1}$ and $\lambda_{2}=\frac{\lambda_{1}}{2}$. The signal power is the same at each frequency for a given source. The histograms of the top three peaks obtained from ${\bm{\gamma}}$ are shown in Fig. 7 when observations from each frequency are processed independently using SBL. Aliasing is absent in Fig. 7a since $d=\frac{\lambda_{1}}{2}$. Doubling the signal frequency with the same sensor spacing, Fig. 7b, gives aliased peaks. Higher frequency gives higher resolution but with additional aliased peaks. Thus SBL (and its variants SBL-A and SBL-x) cannot avoid aliasing when only a single frequency is used.

We now combine the observations from the two frequencies using multi-dictionary SBL when the sensor spacing is fixed at $d=\frac{\lambda_{1}}{2}=\lambda_{2}$ . The two multi-dictionary SBL formulations are discussed in Sect. III-C. In SBL-MC, observations from each frequency are processed independently and the multi-frequency ${\bm{\gamma}}$ is obtained by summation (35). Fig. 7c shows the histogram when SBL-MC is used. The bin count is significant at aliased locations and hence SBL-MC cannot suppress aliasing. The second multi-dictionary approach, SBL-CC, enforces a common sparsity profile by requiring ${\bm{\gamma}}$ to be the same across frequencies. The histogram obtained using SBL-CC is shown in Fig. 7d. Since aliased peak locations are not shared across frequencies, they are suppressed by jointly processing multi-frequency observations using (40).
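Why aliased locations are not shared across frequencies can be seen from the standard grating-lobe condition for a ULA, $\sin\theta=\sin\theta_{0}+k\lambda/d$ for integer $k$. The sketch below lists, for each source of the three source example, the angles that are indistinguishable from the true angle at each frequency. It illustrates the geometry only and is not an implementation of the SBL-MC or SBL-CC updates; `alias_angles` is a hypothetical helper.

```python
import numpy as np

def alias_angles(theta0_deg, d_over_lambda):
    """Angles in [-90, 90] degrees whose ULA steering vector equals that of theta0_deg."""
    s0 = np.sin(np.deg2rad(theta0_deg))
    k_max = int(np.ceil(2 * d_over_lambda))
    s = s0 + np.arange(-k_max, k_max + 1) / d_over_lambda    # sin(theta) + k * lambda / d
    s = s[np.abs(s) <= 1 + 1e-12]                            # keep physical angles only
    return np.sort(np.rad2deg(np.arcsin(np.clip(s, -1.0, 1.0))))

sources = [-20.0, -15.0, 75.0]
at_f1 = {th: alias_angles(th, 0.5) for th in sources}        # d = lambda_1 / 2
at_f2 = {th: alias_angles(th, 1.0) for th in sources}        # d = lambda_2
```

At $f_{1}$ each source maps to its true angle only. At $f_{2}$ each source gains one aliased angle (about $41^{\circ}$, $48^{\circ}$ and $-2^{\circ}$ for the three sources), and these locations have no counterpart at $f_{1}$, so a sparsity profile that must hold at both frequencies can retain only the true angles.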

IV-C2 Experimental data analysis

The high-resolution performance of SBL compared to CBF is validated with experimental data in a complex multi-path, shallow-water environment. The aliasing suppression ability of multi-dictionary SBL is demonstrated by processing a subset array.

The data is from the Shallow Water evaluation cell Experiment $1996$ (SWellEx-96) Event S5 [14], collected on a $64$-element vertical line array. Element $43$ is excluded from processing. The array spans the lower part of the $212$ m water column from $94$ to $212$ m with inter-sensor spacing $d=1.875$ m. During the $77$ min Event S5, a deep source submerged at $60$ m was towed from $9$ km southwest to $3$ km northeast of the array at $5$ kn ($2.5$ m/s).

The source transmitted a set of ten frequencies with constant source levels, of which the three frequencies $\{166,283,388\}$ Hz are used. The data are split into $2257$ overlapping segments, each of 2.7 s duration. Snapshots are computed continuously from the data before being assigned to a segment. An FFT length of $2048$ samples (1.35 s) with $50$ % overlap results in $L=3$ snapshots for each segment with an FFT bin width of $0.75$ Hz. To accommodate Doppler shift, we search two adjacent FFT bins and extract the bin with maximum power.
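The snapshot count and bin width follow from the stated durations, as the short sketch below shows. It uses only the quoted values of 2.7 s, 1.35 s and $50$ % overlap; the helper name is hypothetical.

```python
import math

def snapshots_per_segment(segment_s, fft_s, overlap):
    """Number of FFT windows of length fft_s that fit in a segment with given overlap."""
    hop_s = fft_s * (1.0 - overlap)                       # time step between windows
    return math.floor((segment_s - fft_s) / hop_s + 1e-9) + 1

n_snap = snapshots_per_segment(2.7, 1.35, 0.5)            # windows start at 0, 0.675, 1.35 s
bin_width_hz = 1.0 / 1.35                                 # reciprocal of the window length
```

The three windows give $L=3$, and the reciprocal of the 1.35 s window length is about 0.74 Hz, consistent with the quoted bin width after rounding.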

Both the full array (64 elements, Array-1) and a subset (21 elements, Array-2) are used for processing. Array-2 is obtained by including every third element from Array-1 (Array-1 spacing $d$ and Array-2 spacing $3d$ ). By design, Array-1 suffers no aliasing whereas Array-2 suffers aliasing for frequencies above 133 Hz.
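The aliasing limits of the two arrays follow from the half-wavelength condition $d\leq\lambda/2$, i.e. $f\leq c/(2d)$. The sketch below assumes a nominal sound speed of $c=1500$ m/s, which is not stated in the text.

```python
def alias_free_limit_hz(spacing_m, sound_speed=1500.0):
    """Highest frequency with spacing <= lambda/2 (sound_speed is an assumed value)."""
    return sound_speed / (2.0 * spacing_m)

d = 1.875                                        # Array-1 element spacing in meters
limit_array1 = alias_free_limit_hz(d)            # 400 Hz
limit_array2 = alias_free_limit_hz(3 * d)        # about 133 Hz
```

Under this assumption all three processed frequencies lie below the Array-1 limit of 400 Hz and above the Array-2 limit of about 133 Hz.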

Single frequency (388 Hz) data is processed using both Array-1 and Array-2. Fig. 8(a) shows CBF output power (top row) and ${\bm{\gamma}}$ for SBL (bottom row) as the source moves over time. Array-1 processing does not suffer from aliasing (Fig. 8(a), left) and multi-path arrivals can be seen. SBL provides finer angular resolution than CBF. Significant aliasing (Fig. 8(a), right) is present in both the SBL and CBF outputs when Array-2 is used. This aliasing is due to insufficient spatial sampling. Significant energy is redistributed into aliased locations, causing ambiguities in DoA estimation.

The result of combining the three frequencies $\{166,283,388\}$ Hz and processing them with Array-1 and Array-2 is shown in Fig. 8(b). Along with CBF output power (top row), the ${\bm{\gamma}}$ surfaces are shown for SBL-MC (middle row) and SBL-CC (bottom row). Neither SBL nor CBF shows any aliasing when Array-1 (Fig. 8(b), left) data is processed. For Array-2 (Fig. 8(b), right), CBF and SBL-MC both exhibit aliasing since the single frequency surfaces are averaged across frequencies. The relatively steep true arrivals around $\pm 20^{\circ}$ can easily be masked by the aliased arrivals, causing DoA estimation errors. In comparison, SBL-CC shows no aliasing with Array-2 and the multi-path structure is preserved. We note that in general slightly fewer peaks are identified than in the corresponding Array-1 results because of the reduced array gain of Array-2.

V Conclusions

The underdetermined system of linear equations in sparse processing is extended to account for errors in the sensing matrix and weights. The resulting non-Gaussian model was approximated as Gaussian to solve for the prior weight covariance using SBL. An SBL update rule was developed which takes into account the statistics of uncertainty models. To estimate the noise variance a stochastic maximum likelihood based method was used.

We also developed SBL to process observations from multiple dictionaries when a portion of the support is common for all the weights. The first multi-dictionary SBL has dictionary-dependent priors which are summed to obtain a combined prior. The second multi-dictionary SBL requires the prior to be shared across dictionaries giving a unified update rule.

Beamforming simulations for DoA estimation are used to demonstrate that false solutions can be removed at low SNR by explicitly accounting for errors in the sensing matrix and weights. Multi-frequency simulated and experimental data are processed using multi-dictionary SBL to recover DoAs in the presence of spatial aliasing. The multi-dictionary formulation with shared prior is able to avoid aliasing.

VI Acknowledgement

This work was supported by the Office of Naval Research Grant Nos. N00016-1-2341 and N00014-13-1-0632.

Appendix A - Multiplicative noise

Perturbations in the sensing matrix can arise from multiplicative noise [42, 43, 44]

[TABLE]

where ${\mathbf{A}}^{o}$ is a deterministic matrix, and ${\mathbf{A}}^{e}$ represents the multiplicative error in ${\mathbf{A}}$ . The notation $\circ$ denotes the Schur-Hadamard product of two matrices of same dimensions, i.e. the element-wise product of matrices given by

$$[{\mathbf{A}}^{o}\circ{\mathbf{A}}^{e}]_{nm}=[{\mathbf{A}}^{o}]_{nm}\,[{\mathbf{A}}^{e}]_{nm}.$$

A first order expansion of the above multiplicative model (56) is

[TABLE]

where ${\mathbf{1}}$ denotes a matrix of all ones and ${\mathbf{A}}^{e_{2}}={\mathbf{A}}^{o}\circ{\mathbf{A}}^{e_{1}}$ . The model in (58) has been studied in [42, 43, 44] and the model in (59) has been studied in [45, 46, 28, 29, 30, 27].
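The relation between the multiplicative and additive forms can be checked numerically. The sketch below assumes, purely for illustration, that the multiplicative error is a small phase error $e^{j\delta}$ so that ${\mathbf{A}}^{e_{1}}=j\delta$ element-wise; this error type, the error size, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
A_o = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
delta = 0.05 * rng.standard_normal((4, 6))     # small phase errors in radians (assumed)
A_e1 = 1j * delta                              # first-order error term
A_mult = A_o * np.exp(1j * delta)              # element-wise (Hadamard) perturbation
A_e2 = A_o * A_e1                              # additive error, A_o Hadamard A_e1
A_first = A_o + A_e2                           # equals A_o * (1 + A_e1) element-wise

rel_err = np.max(np.abs(A_mult - A_first) / np.abs(A_o))
bound = 0.5 * np.max(np.abs(delta)) ** 2       # second-order Taylor remainder
```

The additive form reproduces the multiplicative model up to an error of second order in the perturbation, which is why a small multiplicative mismatch can be treated with an additive error model.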

Bibliography

[1] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Review, vol. 43, no. 1, pp. 129–159, 2001.
[2] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Royal Stat. Soc. Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
[3] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Sig. Proc., vol. 41, no. 12, pp. 3397–3415, 1993.
[4] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Machine Learning Research, vol. 1, pp. 211–244, Jun. 2001.
[5] D. P. Wipf and B. D. Rao, “Sparse Bayesian learning for basis selection,” IEEE Trans. Sig. Proc., vol. 52, no. 8, pp. 2153–2164, 2004.
[6] ——, “An empirical Bayesian strategy for solving the simultaneous sparse approximation problem,” IEEE Trans. Sig. Proc., vol. 55, no. 7, pp. 3704–3716, Jul. 2007.
[7] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Sig. Proc., vol. 56, no. 6, pp. 2346–2356, Jun. 2008.
[8] Z. Zhang and B. D. Rao, “Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning,” IEEE J. Sel. Topics Sig. Proc., vol. 5, no. 5, pp. 912–926, 2011.