Joint nonstationary blind source separation and spectral analysis

Adrien Meynard

arXiv:1812.01399·eess.SP·December 5, 2018


TL;DR

This paper presents a novel algorithm for jointly separating nonstationary sources from mixed signals and estimating their spectral properties and nonstationarity, demonstrated on synthetic data.

Contribution

It introduces a new method for simultaneous blind source separation and spectral analysis in nonstationary environments.

Findings

01. Successful separation of nonstationary sources in synthetic tests

02. Effective estimation of spectral properties and deformations

03. Potential applicability to real-world nonstationary signal processing

Abstract

We address a nonstationary blind source separation (BSS) problem in which both the sources and the mixing are nonstationary. We introduce an algorithm that jointly performs BSS and estimates the stationarity-breaking deformations and the spectra. Its performance is evaluated on a synthetic example.

Tables

Table 1. Comparison of the performance of BSS algorithms: standard SOBI, piecewise SOBI (p-SOBI) and the proposed algorithm (JEFAS-BSS).

Criterion	SOBI	p-SOBI	JEFAS-BSS
SIR (dB)	$28.55$	$15.04$	$46.55$
SDR (dB)	$16.60$	$-4.53$	$37.69$
Amari index	$4.63\times 10^{-2}$	$1.74\times 10^{-2}$	$1.40\times 10^{-4}$

Equations

$$y(t)=\mathcal{D}_{\gamma}x(t)=\sqrt{\gamma^{\prime}(t)}\,x(\gamma(t)),\tag{1}$$

$$\mathcal{W}_{x}(s,\tau)=\int_{\mathbb{R}}x(t)\,q^{-s/2}\,\overline{\psi}\left(\frac{t-\tau}{q^{s}}\right)\mathrm{d}t\quad\text{with }q>1.\tag{2}$$

$$\mathcal{W}_{y}(s,\tau)\approx\mathcal{W}_{x}\left(s+\log_{q}(\gamma^{\prime}(\tau)),\gamma(\tau)\right).\tag{3}$$

$${\mathbf{z}}(t)={\mathbf{A}}(t){\mathbf{y}}(t),\tag{4}$$

$${\mathbf{w}}_{{\mathbf{z}},\tau}\approx{\mathbf{A}}(\tau){\mathbf{w}}_{{\mathbf{y}},\tau}.\tag{5}$$

$$[\boldsymbol{\Sigma}_{i}(\theta_{i,\tau})]_{kk^{\prime}}=q^{\frac{s_{k}+s_{k^{\prime}}}{2}}\int_{\mathbb{R}}\mathscr{S}_{X_{i}}(q^{-\theta_{i,\tau}}\xi)\,\overline{\hat{\psi}}(q^{s_{k}}\xi)\,\hat{\psi}(q^{s_{k^{\prime}}}\xi)\,\mathrm{d}\xi.$$

$$\ell_{\tau}({\mathbf{B}}_{\tau},\boldsymbol{\theta}_{\tau})\overset{\Delta}{=}\cdots=\cdots+\frac{1}{2}\sum_{i=1}^{N}[{\mathbf{B}}_{\tau}{\mathbf{w}}_{{\mathbf{z}},\tau}]_{i\cdot}\,\boldsymbol{\Sigma}_{i}(\theta_{i,\tau})^{-1}\,[{\mathbf{B}}_{\tau}{\mathbf{w}}_{{\mathbf{z}},\tau}]_{i\cdot}^{H},$$

$$\tilde{\mathbf{B}}_{\tau}=\arg\min_{{\mathbf{B}}_{\tau}}\ell_{\tau}({\mathbf{B}}_{\tau},\boldsymbol{\theta}_{\tau})\quad\text{s.t.}\quad\|{\mathbf{B}}_{\tau}-{\mathbf{B}}_{\tau-\Delta_{\tau}}\|_{\infty}\leq\epsilon_{B}\Delta_{\tau}.\tag{6}$$


Code & Models

Repositories

AdMeynard/JEFAS (official)


Taxonomy

Topics: Blind Source Separation Techniques · Spectroscopy and Chemometric Analyses · Speech and Audio Processing

Full text

Adrien Meynard (1)

(1) Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France.


1 Introduction: model and background

The BSS problem, originally introduced in a stationary context, has also been discussed in nonstationary situations. Extensions to nonstationary signals have been proposed, based on time-frequency analysis (see [1], chap. 9 in [2] and references therein) or on mutual information [3]. The BSS of nonstationary mixtures of stationary signals has also been studied; for instance, the authors of [4] explore the convolutive BSS problem. In the following, we tackle a doubly nonstationary BSS problem, and propose a demultiplexing algorithm adapted to a specific class of nonstationary signals mixed by an instantaneous nonstationary mixing matrix.

1.1 Nonstationarity

The nonstationary signals of interest here are deformed versions of stationary signals.

Let $x$ denote a stationary signal, modeled as a realization of a stationary random process with power spectrum denoted by $\mathscr{S}_{X}$ . Acting on $x$ with a stationarity-breaking operator yields a nonstationary signal denoted by $y$ . Various classes of stationarity-breaking operators are relevant to model physical phenomena (e.g. frequency modulation [5], amplitude modulation [6]). We focus here on the time warping operator denoted by $\mathcal{D}_{\gamma}$ and defined by:

$$y(t)=\mathcal{D}_{\gamma}x(t)=\sqrt{\gamma^{\prime}(t)}\,x(\gamma(t)),\tag{1}$$

where $\gamma\in C^{2}$ is a strictly increasing smooth function. Such deformations can model nonstationary physical phenomena as diverse as Doppler effect, speed variations of an engine, animal vocalization or speech [7, 6].
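As an illustration, the warping operator can be applied to a sampled signal by interpolation. The sketch below is a minimal NumPy example and not the paper's implementation: the function name `time_warp`, the test signal, the particular $\gamma$ and the use of linear interpolation are all assumptions. The amplitude factor is taken as $\sqrt{\gamma^{\prime}(t)}$, the normalization that preserves the signal energy.

```python
import numpy as np

def time_warp(x, t, gamma, dgamma):
    """Sampled version of y(t) = sqrt(gamma'(t)) * x(gamma(t)).

    x      : samples of the stationary signal on the uniform grid t
    gamma  : callable, strictly increasing warping function (hypothetical choice)
    dgamma : callable, derivative of gamma
    """
    # Linear interpolation stands in for an exact evaluation of x(gamma(t)).
    x_warped = np.interp(gamma(t), t, x)
    return np.sqrt(dgamma(t)) * x_warped

t = np.linspace(0.0, 1.0, 2048)
x = np.sin(2 * np.pi * 50 * t)                       # toy signal
gamma = lambda u: u + 0.05 * np.sin(2 * np.pi * u)   # maps [0, 1] onto [0, 1]
dgamma = lambda u: 1.0 + 0.1 * np.pi * np.cos(2 * np.pi * u)  # stays positive
y = time_warp(x, t, gamma, dgamma)
```

Because $\gamma^{\prime}$ varies between roughly 0.69 and 1.31 here, the instantaneous frequency of `y` sweeps around the 50 Hz of `x`, which is the kind of stationarity-breaking behavior the model targets.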

The wavelet transform is a natural tool to analyze such signals. The wavelet transform $\mathcal{W}_{x}$ of the signal $x$, computed with an analysis wavelet $\psi$, is defined by:

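$$\mathcal{W}_{x}(s,\tau)=\int_{\mathbb{R}}x(t)\,q^{-s/2}\,\overline{\psi}\left(\frac{t-\tau}{q^{s}}\right)\mathrm{d}t\quad\text{with }q>1.\tag{2}$$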

In that framework, it can be shown that the respective wavelet transforms $\mathcal{W}_{y}$ and $\mathcal{W}_{x}$ of $y$ and $x$ are approximately related by

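$$\mathcal{W}_{y}(s,\tau)\approx\mathcal{W}_{x}\left(s+\log_{q}(\gamma^{\prime}(\tau)),\gamma(\tau)\right).\tag{3}$$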

In the following, we make the assumption that $x$ is a realization of a stationary random process $X$ . In such a setting, the approximation error can be controlled thanks to the decay properties of the wavelet $\psi$ , and the variations of $\gamma^{\prime}$ . In [5, 6], corresponding quantitative error bounds are given.
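The scale-shift relation between $\mathcal{W}_{y}$ and $\mathcal{W}_{x}$ can be checked numerically in the one case where it is exact, a linear warping $\gamma(t)=ct$, for which $\log_{q}\gamma^{\prime}$ is constant. The sketch below is a hedged illustration: the Morlet-like wavelet, the toy signal, the value of $c$ and the Riemann-sum quadrature are assumptions, and the amplitude factor of the warping is taken as $\sqrt{\gamma^{\prime}}$.

```python
import numpy as np

def morlet(u, f0=1.0):
    # Hypothetical analysis wavelet (Morlet-like); the text does not fix psi.
    return np.exp(-0.5 * u**2) * np.exp(2j * np.pi * f0 * u)

def wavelet_transform(x, t, scales, taus, q=2.0):
    """Riemann-sum approximation of
    W_x(s, tau) = int x(t) q^{-s/2} conj(psi((t - tau) / q^s)) dt."""
    dt = t[1] - t[0]
    W = np.empty((len(scales), len(taus)), dtype=complex)
    for k, s in enumerate(scales):
        a = q**s
        # One row per tau: dilated and shifted wavelets sampled on the grid t.
        atoms = np.conj(morlet((t[None, :] - taus[:, None]) / a)) / np.sqrt(a)
        W[k] = atoms @ x * dt
    return W

t = np.linspace(-8.0, 8.0, 4001)
x_fun = lambda u: np.exp(-u**2) * np.cos(4 * np.pi * u)   # toy signal
c = 1.5                                                   # gamma(t) = c * t
scales = np.array([-1.0, 0.0])
taus = np.array([-0.5, 0.3])

# y = D_gamma x with the sqrt(gamma') amplitude factor.
W_y = wavelet_transform(np.sqrt(c) * x_fun(c * t), t, scales, taus)
# Right-hand side: shift the scales by log_q(c) and warp the times.
W_x = wavelet_transform(x_fun(t), t, scales + np.log2(c), c * taus)
max_gap = np.max(np.abs(W_y - W_x))   # only quadrature error remains
```

For a nonlinear $\gamma$ the two sides no longer coincide, and the gap is the approximation error discussed above.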

1.2 Blind source separation

The problem we consider is the BSS of nonstationary signals modeled by equation (1).

We investigate the case where the number of sources and the number of observations are equal and denoted by $N$ . The sources are additionally assumed to be independent. Let ${\mathbf{y}}(t),{\mathbf{z}}(t)\in\mathbb{R}^{N}$ denote the column vectors containing respectively all the sources and observations at time $t$ . Then, the mixture is written as

$${\mathbf{z}}(t)={\mathbf{A}}(t){\mathbf{y}}(t),\tag{4}$$

where ${\mathbf{A}}(t)\in\mathbb{R}^{N\times N}$ denotes the time varying mixing matrix, assumed to be invertible. This model generalizes the amplitude modulation model in the case $N=1$ detailed in [6]. For example, this model can be appropriate in bioacoustics to describe the BSS of a howling wolf pack [8, 9].
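A small sketch of this instantaneous, time-varying mixture for sampled signals follows. The array layout, the function name `mix` and the particular sinusoidal coefficients are illustrative assumptions rather than the setup used in the paper.

```python
import numpy as np

def mix(y, A):
    """z(t) = A(t) y(t) for sampled signals.

    y : (N, T) array of sources
    A : (T, N, N) array, one mixing matrix per time sample
    """
    return np.einsum("tij,jt->it", A, y)

T = 1000
t = np.arange(T) / T
rng = np.random.default_rng(0)
y = rng.standard_normal((2, T))          # placeholder sources

A = np.empty((T, 2, 2))
A[:, 0, 0] = 1.0
A[:, 0, 1] = 0.5 * np.cos(2 * np.pi * t)
A[:, 1, 0] = 0.5 * np.sin(2 * np.pi * t)
A[:, 1, 1] = 1.0                          # det = 1 - 0.125 sin(4 pi t) > 0

z = mix(y, A)
# With A(t) known, the sources are recovered sample by sample.
y_back = np.einsum("tij,jt->it", np.linalg.inv(A), z)
```

In the blind setting $\mathbf{A}(t)$ is of course unknown, so the inverse used in the last line is exactly what the estimation procedure has to provide.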

Our goal is to determine jointly the mixing matrix ${\mathbf{A}}(t)$ , the time warping functions $\gamma_{i}(t)$ , and the spectra of the stationary sources $\mathscr{S}_{X_{i}}$ for $i=1,\ldots,N$ from the observations ${\mathbf{z}}(t)$ .

Let us consider a fixed time $\tau$. For each observation $z_{i}$, we denote by ${\mathbf{w}}_{z_{i},\tau}=\mathcal{W}_{z_{i}}({\mathbf{s}},\tau)$ the row vector containing the values of the wavelet transform for a vector of scales ${\mathbf{s}}$ (of size $M_{s}$). These vectors are gathered into an $N\times M_{s}$ matrix ${\mathbf{w}}_{{\mathbf{z}},\tau}=\left({\mathbf{w}}_{z_{1},\tau}^{T}\cdots{\mathbf{w}}_{z_{N},\tau}^{T}\right)^{T}$. The same operation is applied to the wavelet transforms of the sources. The matrix ${\mathbf{A}}(t)$ is assumed to vary slowly with respect to the oscillations of the signals. It can then be shown that the linear relation (4) becomes a relationship between the wavelet transforms of ${\mathbf{y}}$ and ${\mathbf{z}}$ of the form

$${\mathbf{w}}_{{\mathbf{z}},\tau}\approx{\mathbf{A}}(\tau){\mathbf{w}}_{{\mathbf{y}},\tau}.\tag{5}$$

Aside from the terms controlling the error bound in (3), the error bound in (5) is also controlled by the variations of the mixing matrix coefficients.

2 Estimation procedure

Approximation equations (3) and (5) allow us to write an approximate likelihood in the Gaussian case (see [10] for more details on this approach).

The estimation procedure is based upon discrete wavelet transforms; time-varying parameters are therefore estimated on a discrete time grid $D$. In the following, the estimation procedure is described for a given $\tau\in D$. For the sake of simplicity, we introduce the following notation: ${\mathbf{B}}_{\tau}={\mathbf{A}}(\tau)^{-1}$, $\theta_{i,\tau}=\log_{q}\left(\gamma^{\prime}_{i}(\tau)\right)$ and $\boldsymbol{\theta}_{\tau}=(\theta_{1,\tau}\cdots\theta_{N,\tau})^{T}$.

2.1 Probabilistic setting

It follows from the Gaussianity assumption on $X$ that ${\mathbf{w}}_{y_{i},\tau}\sim\mathcal{N}(\mathbf{0},\boldsymbol{\Sigma}_{i}(\theta_{i,\tau}))$ , where

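$$[\boldsymbol{\Sigma}_{i}(\theta_{i,\tau})]_{kk^{\prime}}=q^{\frac{s_{k}+s_{k^{\prime}}}{2}}\int_{\mathbb{R}}\mathscr{S}_{X_{i}}(q^{-\theta_{i,\tau}}\xi)\,\overline{\hat{\psi}}(q^{s_{k}}\xi)\,\hat{\psi}(q^{s_{k^{\prime}}}\xi)\,\mathrm{d}\xi.$$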
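The covariance matrix can be approximated by numerical quadrature on a frequency grid. The sketch below is illustrative only: the function name `covariance`, the Gaussian-shaped $\hat{\psi}$ and power spectrum, and the frequency grid are assumptions. It also checks a consequence of the formula, namely that a deformation parameter $\theta$ acts as a shift of the scales by $\theta$.

```python
import numpy as np

def covariance(theta, scales, spectrum, psi_hat, xi, q=2.0):
    """Riemann-sum approximation of
    [Sigma(theta)]_{kk'} = q^{(s_k + s_k')/2}
        * int S_X(q^{-theta} xi) conj(psi_hat(q^{s_k} xi)) psi_hat(q^{s_k'} xi) dxi.
    """
    dxi = xi[1] - xi[0]
    # Row k holds q^{s_k / 2} * psi_hat(q^{s_k} xi) sampled on the grid xi.
    atoms = np.sqrt(q**scales)[:, None] * psi_hat(q**scales[:, None] * xi[None, :])
    weights = spectrum(q**(-theta) * xi) * dxi
    return (np.conj(atoms) * weights) @ atoms.T

# Hypothetical wavelet and power spectrum, both given in the Fourier domain.
psi_hat = lambda xi: np.exp(-0.5 * ((xi - 1.0) / 0.2) ** 2)
spectrum = lambda xi: np.exp(-0.5 * ((xi - 2.0) / 0.5) ** 2)

xi = np.linspace(0.0, 20.0, 20001)
scales = np.array([-1.5, -1.0, -0.5])

Sigma_theta = covariance(0.3, scales, spectrum, psi_hat, xi)
Sigma_shifted = covariance(0.0, scales + 0.3, spectrum, psi_hat, xi)
```

Here `Sigma_theta` and `Sigma_shifted` agree up to quadrature error, which is the covariance-level counterpart of the scale shift in relation (3).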

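Let $p_{V}$ denote generically the probability density function of a random vector $V$. Then, the source independence hypothesis gives the following negative log-likelihood:

$$\ell_{\tau}({\mathbf{B}}_{\tau},\boldsymbol{\theta}_{\tau})\overset{\Delta}{=}\cdots=\cdots+\frac{1}{2}\sum_{i=1}^{N}[{\mathbf{B}}_{\tau}{\mathbf{w}}_{{\mathbf{z}},\tau}]_{i\cdot}\,\boldsymbol{\Sigma}_{i}(\theta_{i,\tau})^{-1}\,[{\mathbf{B}}_{\tau}{\mathbf{w}}_{{\mathbf{z}},\tau}]_{i\cdot}^{H},$$

where $[{\mathbf{M}}]_{i\cdot}$ denotes the $i$-th row of the matrix ${\mathbf{M}}$, and ${\mathbf{M}}^{H}$ is its conjugate transpose. Maximum likelihood (ML) estimates, i.e. minimizers of $\ell_{\tau}({\mathbf{B}}_{\tau},\boldsymbol{\theta}_{\tau})$, can be evaluated numerically.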
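To make the objective concrete, here is a hedged sketch of how such a negative log-likelihood could be evaluated at one time $\tau$. Only the quadratic term is taken from the text. The two log-determinant terms are an assumption of this sketch: they are what a standard change of variables ${\mathbf{w}}_{{\mathbf{y}},\tau}={\mathbf{B}}_{\tau}{\mathbf{w}}_{{\mathbf{z}},\tau}$ and a Gaussian normalization with the same $\frac{1}{2}$ convention would give, up to an additive constant. The function name is hypothetical.

```python
import numpy as np

def neg_log_likelihood(B, w_z, Sigmas):
    """Negative log-likelihood at one time tau, up to an additive constant.

    B      : (N, N) real unmixing matrix
    w_z    : (N, M_s) complex wavelet coefficients of the observations
    Sigmas : list of N Hermitian positive definite (M_s, M_s) covariance matrices
    """
    N, M_s = w_z.shape
    w_y = B @ w_z                                   # estimated source coefficients
    _, logabsdet_B = np.linalg.slogdet(B)
    value = -M_s * logabsdet_B                      # assumed Jacobian term
    for i in range(N):
        _, logdet_S = np.linalg.slogdet(Sigmas[i])  # assumed normalization term
        # Quadratic form [B w_z]_i Sigma_i^{-1} [B w_z]_i^H from the text.
        quad = np.real(w_y[i] @ np.linalg.solve(Sigmas[i], w_y[i].conj()))
        value += 0.5 * logdet_S + 0.5 * quad
    return value

# Tiny usage example with one source and two scales.
w = np.array([[1.0, 1.0j]])
value_identity = neg_log_likelihood(np.eye(1), w, [np.eye(2)])
```

Using `np.linalg.solve` instead of forming $\boldsymbol{\Sigma}_{i}^{-1}$ explicitly is a numerical design choice, not something prescribed by the paper.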

However, to account for the smoothness of the mixing matrix with respect to time, we switch to the Bayesian framework and introduce a prior $p_{{\mathbf{B}}_{\tau}}$ on the unmixing matrix ${\mathbf{B}}_{\tau}$ (assuming i.i.d. matrix coefficients). We choose for $p_{{\mathbf{B}}_{\tau}}$ a uniform distribution centered on ${\mathbf{B}}_{\tau-\Delta_{\tau}}$, whose support has width $2\epsilon_{B}\Delta_{\tau}$ for each coefficient. Then, the maximum a posteriori (MAP) estimate $\tilde{\mathbf{B}}_{\tau}$ can be written as the solution of the problem

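$$\tilde{\mathbf{B}}_{\tau}=\arg\min_{{\mathbf{B}}_{\tau}}\ell_{\tau}({\mathbf{B}}_{\tau},\boldsymbol{\theta}_{\tau})\quad\text{s.t.}\quad\|{\mathbf{B}}_{\tau}-{\mathbf{B}}_{\tau-\Delta_{\tau}}\|_{\infty}\leq\epsilon_{B}\Delta_{\tau}.\tag{6}$$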

This problem is consistent with the smoothness hypothesis on ${\mathbf{B}}_{\tau}$ . Indeed, assuming $\Delta_{\tau}$ is small, the constraint in equation (6) is almost equivalent to $\|{\mathbf{B}}_{\tau}^{\prime}\|_{\infty}\leq\epsilon_{B}$ .
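Since the prior is uniform with i.i.d. coefficients, the constraint is an entrywise box around the previous estimate, and one simple way to handle it is projected gradient descent with clipping. The sketch below uses that technique as a stand-in: the paper only states that the problem is solved numerically, so the solver, the function names, the step-halving rule and the $-M_{s}\log|\det{\mathbf{B}}|$ term of the objective are all assumptions.

```python
import numpy as np

def objective(B, Cs, M_s):
    # B-dependent part of the negative log-likelihood (assumed normalization):
    # -M_s log|det B| + 1/2 sum_i b_i C_i b_i^T, where b_i is the i-th row of B
    # and C_i = Re(w_z Sigma_i^{-1} w_z^H).
    _, logabsdet = np.linalg.slogdet(B)
    quad = sum(B[i] @ Cs[i] @ B[i] for i in range(B.shape[0]))
    return -M_s * logabsdet + 0.5 * quad

def gradient(B, Cs, M_s):
    G = -M_s * np.linalg.inv(B).T
    for i in range(B.shape[0]):
        G[i] += Cs[i] @ B[i]          # C_i is symmetric
    return G

def map_unmixing(w_z, Sigmas, B_prev, eps_B, delta_tau, n_iter=100):
    """Projected gradient descent inside the box
    max_ij |B_ij - B_prev_ij| <= eps_B * delta_tau."""
    N, M_s = w_z.shape
    radius = eps_B * delta_tau
    Cs = [np.real(w_z @ np.linalg.solve(S, w_z.conj().T)) for S in Sigmas]
    B, step = B_prev.copy(), 1.0
    for _ in range(n_iter):
        G = gradient(B, Cs, M_s)
        # Gradient step followed by projection onto the box (entrywise clipping).
        candidate = np.clip(B - step * G, B_prev - radius, B_prev + radius)
        if objective(candidate, Cs, M_s) < objective(B, Cs, M_s):
            B = candidate             # accept only steps that decrease the objective
        else:
            step *= 0.5               # otherwise shrink the step
    return B
```

Because the previous estimate is the starting point and is always feasible, the returned matrix satisfies the constraint and never has a larger objective than ${\mathbf{B}}_{\tau-\Delta_{\tau}}$.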

Concerning the time warping estimation, we choose not to put a prior on $\boldsymbol{\theta}_{\tau}$. Thus, $\tilde{\boldsymbol{\theta}}_{\tau}$ is the ML estimate of $\boldsymbol{\theta}_{\tau}$.

2.2 Estimation algorithm

The estimation strategy is to alternate between the estimation of ${\mathbf{B}}_{\tau}$, $\boldsymbol{\theta}_{\tau}$ and the spectra. Algorithm 1 (named JEFAS-BSS) summarizes the estimation steps, which are described below.

• Mixing matrix estimation. In practice, we solve problem (6) numerically. Moreover, because the matrix coefficients are assumed to vary slowly, we approximate ${\mathbf{B}}_{\tau}$ as constant on the interval $I_{\tau}=[\tau-\Delta_{\tau}/2,\ \tau+\Delta_{\tau}/2)$. The estimated sources $\tilde{\mathbf{y}}_{\tau}$ are then obtained via $\tilde{\mathbf{y}}_{\tau}(t)=\tilde{\mathbf{B}}_{\tau}{\mathbf{z}}(t)$ for $t\in I_{\tau}$. Notice that a new matrix $\tilde{\mathbf{B}}_{\tau}$ is applied to the observations on each interval $I_{\tau}$. Because of the source ordering indeterminacy, a reordering step is needed to connect consecutive segments of each source signal. For this we use the Gale-Shapley stable marriage algorithm [11], which constructs stable matchings between the source estimates of consecutive time slices. The ranking criterion is the dot product between the normalized Fourier spectra of these slices.

• Deformations and spectra estimation. For each source, the joint estimate of $\left\{\theta_{i,\tau}\right\}_{\tau\in D}$ and $\mathscr{S}_{X_{i}}$ is obtained via the JEFAS algorithm (which is detailed in [6]). For this purpose, the input wavelet transform ${\mathbf{w}}_{y_{i}}$ of the source $y_{i}$ is replaced with its estimate $\left\{[\tilde{\mathbf{B}}_{\tau}{\mathbf{w}}_{{\mathbf{z}},\tau}]_{i\cdot}\right\}_{\tau\in D}$.
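The reordering step can be sketched as follows. This is a minimal illustration of stable matching between the source estimates of two consecutive slices, scored by dot products of normalized Fourier magnitude spectra. The function names, the choice of the previous slice as the proposing side and the requirement that both slices have the same length are assumptions of the sketch.

```python
import numpy as np

def normalized_spectrum(x):
    spec = np.abs(np.fft.rfft(x))
    return spec / np.linalg.norm(spec)

def stable_matching(score):
    """Gale-Shapley with rows as proposers and columns as receivers.

    score[i, j] is the affinity between row i and column j (higher is better).
    Returns match such that match[i] is the column matched to row i.
    """
    n = score.shape[0]
    prefs = np.argsort(-score, axis=1)   # columns ranked by each row, best first
    next_choice = [0] * n                # next column each row will propose to
    holder = [-1] * n                    # holder[j] = row currently held by column j
    free = list(range(n))
    while free:
        i = free.pop()
        j = prefs[i, next_choice[i]]
        next_choice[i] += 1
        if holder[j] == -1:
            holder[j] = i
        elif score[i, j] > score[holder[j], j]:
            free.append(holder[j])       # column j trades up, old row is free again
            holder[j] = i
        else:
            free.append(i)               # proposal rejected
    match = np.empty(n, dtype=int)
    for j, i in enumerate(holder):
        match[i] = j
    return match

def reorder_slice(prev_slice, cur_slice):
    """Permute the rows of cur_slice so that row i continues row i of prev_slice."""
    P = np.array([normalized_spectrum(x) for x in prev_slice])
    C = np.array([normalized_spectrum(x) for x in cur_slice])
    return cur_slice[stable_matching(P @ C.T)]
```

Magnitude spectra make the score insensitive to the sign and scale indeterminacies of each estimated source, so only the ordering is resolved at this stage.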

Regarding initialization, a basic approach is to apply a stationary BSS method to the observations to obtain a first unmixing matrix estimate. For instance, SOBI [12] is a stationary BSS algorithm that can provide an initial unmixing matrix. A better initial matrix can be obtained from piecewise SOBI estimates on non-overlapping segments (called p-SOBI), where the stationarity assumption makes more sense.

Convergence is monitored using the Source to Interference Ratio (SIR) introduced in [13]. For a given estimated source, the SIR quantifies the presence of interference from the other true sources. As we do not have access to the ground truth sources, we use as stopping criterion the SIR between $\tilde{\mathbf{y}}^{(k-1)}$ and $\tilde{\mathbf{y}}^{(k)}$ (instead of ${\mathbf{y}}$), which evaluates the BSS update and is therefore a relevant convergence assessment.

3 Results

We construct a synthetic example to evaluate the performance of the algorithm. The two sources are band-pass filtered white noise, with time-varying bandwidth. The mixing matrix coefficients vary sinusoidally over time. On the left of figure 1, the wavelet transforms of both observations are displayed.

The evolution of the convergence criterion through the iterations of JEFAS-BSS is displayed in figure 1 (top-right). Empirically, the algorithm converges in a small number of iterations: after 15 iterations, the convergence criterion is around 100 dB, meaning that the BSS update is negligible.

Finally, we evaluate the performance of the BSS algorithms (we refer to [6] for the evaluation of the deformation and spectrum estimates). The Amari index [14] measures the divergence between the matrix $\tilde{\mathbf{B}}_{\tau}{\mathbf{A}}_{\tau}$ and the identity matrix: the closer the Amari index is to zero, the better. On the bottom-right of figure 1, we display the evolution of the Amari index through time for each BSS algorithm. In table 1, we also compare the SIR, the SDR (Source to Distortion Ratio [13]), and the time-averaged Amari index of the BSS algorithms. These criteria show that JEFAS-BSS outperforms SOBI and p-SOBI: it reaches an SIR of 46.55 dB, against 28.55 dB for SOBI and 15.04 dB for p-SOBI. On average, p-SOBI gives a better Amari index than SOBI, which is understandable because it takes into account the nonstationarity of the mixing matrix. Nonetheless, the SIR and SDR of p-SOBI are worse than those of SOBI. Indeed, because p-SOBI does not take into account the regularity of ${\mathbf{B}}_{\tau}$, the connections between slices are sensitive to discontinuities and create distortion in the estimated sources.
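For reference, a common form of the Amari index can be computed as below. Several normalizations exist in the literature, and the one used here, which maps the index to $[0,1]$, is an assumption: it may differ from the convention behind the values in table 1 by a constant factor. The function name is hypothetical.

```python
import numpy as np

def amari_index(B, A):
    """Amari index of P = B A, normalized to lie in [0, 1].

    It vanishes exactly when P is a permutation matrix up to nonzero row
    scalings, i.e. when B separates the sources up to order and scale.
    """
    P = np.abs(B @ A)
    N = P.shape[0]
    rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return (rows.sum() + cols.sum()) / (2.0 * N * (N - 1))

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])          # invertible toy mixing matrix
perfect = amari_index(np.linalg.inv(A), A)
worst = amari_index(np.eye(2), np.ones((2, 2)))
```

Averaging `amari_index(B_tilde[tau], A[tau])` over the time grid would give a time-averaged figure of the kind reported in table 1, under the normalization caveat above.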

Bibliography

[1] A. Belouchrani, M. G. Amin, N. Thirion-Moreau, and Y. D. Zhang, “Source separation and localization using time-frequency distributions: An overview,” IEEE Signal Processing Magazine, vol. 30, pp. 97–107, Nov 2013.
[2] C. Jutten and P. Comon, Séparation de sources 2. Au-delà de l’aveugle et applications. Hermes, 2007.
[3] D.-T. Pham and J.-F. Cardoso, “Blind separation of instantaneous mixtures of nonstationary sources,” IEEE Transactions on Signal Processing, vol. 49, pp. 1837–1848, Sep 2001.
[4] L. Parra and C. Spence, “Convolutive blind separation of non-stationary sources,” IEEE Transactions on Speech and Audio Processing, vol. 8, pp. 320–327, May 2000.
[5] A. Meynard and B. Torrésani, “Spectral estimation for non-stationary signal classes,” in Sampling Theory and Applications, Proceedings of SampTA 17, (Tallinn, Estonia), July 2017.
[6] A. Meynard and B. Torrésani, “Spectral analysis for nonstationary audio,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018.
[7] D. Stowell, Computational Analysis of Sound Scenes and Events, ch. Computational Bioacoustic Scene Analysis, pp. 303–333. Springer, 2018.
[8] D. Passilongo, L. Mattioli, E. Bassi, L. Szabó, and M. Apollonio, “Visualizing sound: counting wolves by using a spectral view of the chorus howling,” Frontiers in Zoology, vol. 12, p. 22, Sep 2015.