Adaptive Blind Separation of Two Dependent Sources

George V. Moustakides; Feeby Salib; Kalliopi Basioti

arXiv:1906.10004·eess.SP·August 8, 2019·Allerton


TL;DR

This paper explores adaptive blind source separation for two sources that may be dependent, identifying conditions under which separation is feasible and providing simulations to demonstrate successful separation of dependent sources.

Contribution

It extends blind source separation techniques to dependent sources by analyzing symmetry conditions, broadening the scope beyond independent sources.

Findings

01. Separation is possible for dependent sources with certain symmetry in their joint pdf.

02. Theoretical analysis identifies classes of dependent sources that are separable.

03. Simulations confirm the practical effectiveness of the proposed approach.

Abstract

We consider the problem of adaptive blind separation of two sources from their instantaneous mixtures. We focus on the case where the two sources are not necessarily independent. By analyzing a general form of adaptive algorithms we show that separation is possible not only for independent sources but also for sources that are dependent provided their joint pdf satisfies certain symmetry conditions. A very interesting problem consists in identifying the class of dependent sources that are non-separable, namely, the counterpart of Gaussian sources of the independent case. We corroborate our theoretical analysis with a number of simulations and give examples of dependent sources that can be easily separated.

Equations

$$X_{t}=\mathbf{A}S_{t},$$

$$\begin{aligned}\hat{S}_{t}&=\mathbf{B}_{t-1}X_{t}\\ \mathbf{B}_{t}&=\mathbf{B}_{t-1}-\mu\mathbf{H}\big(\hat{S}_{t}\big)\mathbf{B}_{t-1},\quad\mathbf{B}(0)=\mathbf{I},\end{aligned}$$

$$\mathbf{H}(Z)=\big[ZZ^{\intercal}-\mathbf{I}\big]+\big[ZG^{\intercal}(Z)-G(Z)Z^{\intercal}\big],$$

$$\begin{aligned}\hat{S}_{t}&=\mathbf{C}_{t-1}S_{t}\\ \mathbf{C}_{t}&=\mathbf{C}_{t-1}-\mu\mathbf{H}\big(\hat{S}_{t}\big)\mathbf{C}_{t-1},\quad\mathbf{C}(0)=\mathbf{A},\end{aligned}$$

$$\mathbf{C}_{t}=\mathbf{C}_{t-1}-\mu\mathbf{H}(\mathbf{C}_{t-1}S_{t})\mathbf{C}_{t-1},\quad\mathbf{C}(0)=\mathbf{A},$$

$$\mathbf{C}=\begin{bmatrix}\pm c_{1}&0\\ 0&\pm c_{2}\end{bmatrix},\quad\text{or}\quad\mathbf{C}=\begin{bmatrix}0&\pm c_{2}\\ \pm c_{1}&0\end{bmatrix},$$

$$1+\kappa_{1}>0,\quad 1+\kappa_{2}>0,\quad(1+\kappa_{1})(1+\kappa_{2})>1.$$

$$\mathsf{f}(s_{1},s_{2})=(1-\epsilon)\mathsf{f}_{1}(s_{1})\mathsf{f}_{2}(s_{2})+\epsilon\,\mathsf{g}_{1}(s_{1})\mathsf{g}_{2}(s_{2}).$$

$$\mathbf{H}(Z)=\begin{bmatrix}\mathsf{h}_{11}(z_{1},z_{2})&\mathsf{h}_{12}(z_{1},z_{2})\\ \mathsf{h}_{21}(z_{1},z_{2})&\mathsf{h}_{22}(z_{1},z_{2})\end{bmatrix}.$$

$$\bar{\mathbf{C}}_{t}=\bar{\mathbf{C}}_{t-1}-\mu\,\mathsf{E}_{S}\big[\mathbf{H}(\bar{\mathbf{C}}_{t-1}S_{t})\big]\bar{\mathbf{C}}_{t-1}$$

$$\mathsf{E}[\mathbf{C}_{t}]=\bar{\mathbf{C}}_{t}+o(\mu),$$

$$\mathsf{E}_{S}\big[\mathbf{H}(\bar{\mathbf{C}}_{\infty}S_{t})\big]=0.$$

$$\mathsf{E}[\mathsf{h}_{ij}(\pm c_{1}s_{1},\pm c_{2}s_{2})]=0$$

$$\mathsf{E}[\mathsf{h}_{ij}(\pm c_{2}s_{2},\pm c_{1}s_{1})]=0$$

$$\mathsf{f}(-s_{1},s_{2})=\mathsf{f}(s_{1},-s_{2})=\mathsf{f}(s_{1},s_{2}),$$

$$\begin{aligned}\mathsf{h}_{11}(-z_{1},z_{2})&=\mathsf{h}_{11}(z_{1},-z_{2})=\mathsf{h}_{11}(z_{1},z_{2})\\ \mathsf{h}_{22}(-z_{1},z_{2})&=\mathsf{h}_{22}(z_{1},-z_{2})=\mathsf{h}_{22}(z_{1},z_{2})\\ \mathsf{h}_{12}(-z_{1},z_{2})&=\mathsf{h}_{12}(z_{1},-z_{2})=-\mathsf{h}_{12}(z_{1},z_{2})\\ \mathsf{h}_{21}(-z_{1},z_{2})&=\mathsf{h}_{21}(z_{1},-z_{2})=-\mathsf{h}_{21}(z_{1},z_{2}).\end{aligned}$$

$$\mathsf{E}[\mathsf{h}_{11}(c_{1}s_{1},c_{2}s_{2})]=0,\quad\mathsf{E}[\mathsf{h}_{22}(c_{1}s_{1},c_{2}s_{2})]=0,$$

$$\mathsf{E}[\mathsf{h}_{11}(c_{2}s_{2},c_{1}s_{1})]=0,\quad\mathsf{E}[\mathsf{h}_{22}(c_{2}s_{2},c_{1}s_{1})]=0,$$

$$\mathbf{\Delta}_{t}=\begin{bmatrix}\alpha_{t}&\delta_{t}\\ \gamma_{t}&\beta_{t}\end{bmatrix}$$

$$\begin{bmatrix}\alpha_{t}\\ \beta_{t}\end{bmatrix}=\mathbf{C}(\mathbf{I}+\mu\mathbf{F})\mathbf{C}^{-1}\begin{bmatrix}\alpha_{t-1}\\ \beta_{t-1}\end{bmatrix},\qquad\begin{bmatrix}\gamma_{t}\\ \delta_{t}\end{bmatrix}=\mathbf{C}(\mathbf{I}+\mu\mathbf{G})\mathbf{C}^{-1}\begin{bmatrix}\gamma_{t-1}\\ \delta_{t-1}\end{bmatrix},$$

$$\lim_{s_{1}\to\pm\infty}\mathsf{h}_{ij}(c_{1}s_{1},\cdot)\,s_{1}\mathsf{f}(s_{1},\cdot)=0,\qquad\lim_{s_{2}\to\pm\infty}\mathsf{h}_{ij}(\cdot,c_{2}s_{2})\,s_{2}\mathsf{f}(\cdot,s_{2})=0.$$

$$\mathrm{tr}\{\mathbf{F}\}<0,\quad\mathrm{det}\{\mathbf{F}\}>0,\quad\mathrm{tr}\{\mathbf{G}\}<0,\quad\mathrm{det}\{\mathbf{G}\}>0,$$

$$\mathsf{f}(s_{1},s_{2})=\omega(K_{2}s_{1}^{2}+K_{1}s_{2}^{2}),$$

$$\begin{aligned}\mathsf{f}_{1}^{\prime}(s_{1})\mathsf{f}_{2}(s_{2})&=2K_{2}s_{1}\,\omega_{z}(K_{2}s_{1}^{2}+K_{1}s_{2}^{2})\\ \mathsf{f}_{1}(s_{1})\mathsf{f}_{2}^{\prime}(s_{2})&=2K_{1}s_{2}\,\omega_{z}(K_{2}s_{1}^{2}+K_{1}s_{2}^{2})\end{aligned}$$

$$\frac{\mathsf{f}_{1}^{\prime}(s_{1})\mathsf{f}_{2}(s_{2})}{2K_{2}s_{1}}=\frac{\mathsf{f}_{1}(s_{1})\mathsf{f}_{2}^{\prime}(s_{2})}{2K_{1}s_{2}}$$

$$\frac{\mathsf{f}_{1}^{\prime}(s_{1})}{\mathsf{f}_{1}(s_{1})\,2K_{2}s_{1}}=\frac{\mathsf{f}_{2}^{\prime}(s_{2})}{\mathsf{f}_{2}(s_{2})\,2K_{1}s_{2}}=K.$$

$$s_{1}=r\cos\theta;\quad s_{2}=r\big\{\sin\theta+d\,(\sin\theta)^{2}\,\mathrm{sgn}(\sin\theta)\big\}.$$


Full text

Adaptive Blind Separation of Two Dependent Sources∗

George V. Moustakides¹, Feeby Salib² and Kalliopi Basioti³

∗This work was supported by the National Science Foundation under Grant CIF 1513373, through Rutgers University.

¹G.V. Moustakides is faculty with ECE, University of Patras, Rion, Greece, [email protected] and with CS, Rutgers University, New Brunswick, NJ, USA, [email protected].

²F. Salib is graduate student with ECE, Rutgers University, New Brunswick, NJ, USA, [email protected].

³K. Basioti is graduate student with CS, Rutgers University, New Brunswick, NJ, USA, [email protected].


I Introduction and Background

Blind source separation (BSS) is the problem of recovering unobserved signals (sources) from their observed mixtures. BSS finds applications in a number of areas, such as biomedical signal processing, speech and image processing, data mining, and communications [1].

The simplest and most common version of the BSS problem consists in estimating two source signals $s_{1}(t),s_{2}(t)$ from two observations $x_{1}(t),x_{2}(t)$ that are instantaneous linear mixtures of the sources of the form $x_{i}(t)=a_{i1}s_{1}(t)+a_{i2}s_{2}(t),~i=1,2$ . Using matrix notation, we can write

$$X_{t}=\mathbf{A}S_{t},\tag{1}$$

where $X_{t}=[x_{1}(t),x_{2}(t)]^{\intercal}$ is the observation vector, $S_{t}=[s_{1}(t),s_{2}(t)]^{\intercal}$ the source signal vector and $\mathbf{A}$ a constant matrix comprising the mixing coefficients $a_{ij},~i,j=1,2$ . We assume that the observation sequence $\{X_{t}\}$ becomes available sequentially and we are interested in the on-line estimation of the source sequence $\{S_{t}\}$ . It is clear that, to solve this problem, it is sufficient to estimate the matrix $\mathbf{B}=\mathbf{A}^{-1}$ , since then $S_{t}$ can be recovered as $\mathbf{B}X_{t}$ . Our results can be extended to cover multiple sources, but we reserve the corresponding analysis for the extended journal version of our work.

For the solution of the BSS problem we concentrate on adaptive algorithms. Therefore we will assume that every time a new sample of the vector process $\{X_{t}\}$ becomes available we update an estimate $\mathbf{B}_{t}$ of the inverse $\mathbf{A}^{-1}$ . We focus on adaptive algorithms of the form

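$$\begin{aligned}\hat{S}_{t}&=\mathbf{B}_{t-1}X_{t}\\ \mathbf{B}_{t}&=\mathbf{B}_{t-1}-\mu\mathbf{H}\big(\hat{S}_{t}\big)\mathbf{B}_{t-1},\quad\mathbf{B}(0)=\mathbf{I},\end{aligned}\tag{2}$$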

where (see [1, 2, 3]) the most common form of the matrix function $\mathbf{H}(Z)$ is

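$$\mathbf{H}(Z)=\big[ZZ^{\intercal}-\mathbf{I}\big]+\big[ZG^{\intercal}(Z)-G(Z)Z^{\intercal}\big],\tag{3}$$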

with $Z=[z_{1}\,z_{2}]^{\intercal}$ , $G(Z)=[g_{1}(z_{1})\,g_{2}(z_{2})]^{\intercal}$ , $g_{i}(z)$ univariate functions, $\mathbf{I}$ the identity matrix, and $\mu>0$ is a scalar step size that controls the convergence behavior of the algorithm. Vector $\hat{S}_{t}$ plays the role of the estimate of the source vector.
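The update in (2) with the classical $\mathbf{H}(Z)$ of (3) can be sketched in a few lines. This is only an illustrative sketch: the cubic nonlinearity `cube` (i.e. $g_{i}(z)=z^{3}$) and the helper names `h_classical`, `matmul` and `update_b` are assumptions made for the example and are not taken from the paper.

```python
def cube(z):
    """Example odd nonlinearity g(z) = z**3 (an assumption for illustration)."""
    return z ** 3


def h_classical(z, g=cube):
    """Return H(Z) = [Z Z^T - I] + [Z G(Z)^T - G(Z) Z^T] as a 2x2 nested list."""
    gz = [g(z[0]), g(z[1])]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            whitening = z[i] * z[j] - (1.0 if i == j else 0.0)  # second-order term
            skew = z[i] * gz[j] - gz[i] * z[j]                  # antisymmetric nonlinear term
            h[i][j] = whitening + skew
    return h


def matmul(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def update_b(b_prev, x, mu, g=cube):
    """One step of (2): s_hat = B_{t-1} x, then B_t = B_{t-1} - mu * H(s_hat) B_{t-1}."""
    s_hat = [b_prev[0][0] * x[0] + b_prev[0][1] * x[1],
             b_prev[1][0] * x[0] + b_prev[1][1] * x[1]]
    hb = matmul(h_classical(s_hat, g), b_prev)
    b_new = [[b_prev[i][j] - mu * hb[i][j] for j in range(2)] for i in range(2)]
    return s_hat, b_new
```

Because the second bracket in (3) is antisymmetric, its diagonal vanishes and the diagonal of $\mathbf{H}(Z)$ reduces to $z_{i}^{2}-1$, which is the whitening constraint.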

Although the algorithm defined by (2) is the one we apply in practice, for its analysis it is more convenient to adopt the following normalized version

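$$\begin{aligned}\hat{S}_{t}&=\mathbf{C}_{t-1}S_{t}\\ \mathbf{C}_{t}&=\mathbf{C}_{t-1}-\mu\mathbf{H}\big(\hat{S}_{t}\big)\mathbf{C}_{t-1},\quad\mathbf{C}(0)=\mathbf{A},\end{aligned}\tag{4}$$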

with $\mathbf{C}_{t}=\mathbf{B}_{t}\mathbf{A}$ and where matrix $\mathbf{A}$ appears now only as initial condition. Substituting the first equation in (4) into the second yields the final recursion

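$$\mathbf{C}_{t}=\mathbf{C}_{t-1}-\mu\mathbf{H}(\mathbf{C}_{t-1}S_{t})\mathbf{C}_{t-1},\quad\mathbf{C}(0)=\mathbf{A},\tag{5}$$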

which will be used in our subsequent analysis.
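The equivalence between the practical recursion on $\mathbf{B}_{t}$ and the normalized recursion on $\mathbf{C}_{t}=\mathbf{B}_{t}\mathbf{A}$ can be checked numerically. The sketch below runs both side by side on the same source samples and reports the largest entrywise gap between $\mathbf{B}_{t}\mathbf{A}$ and $\mathbf{C}_{t}$. The cubic nonlinearity, the uniform sources, the step size and the function names are all assumptions chosen for the illustration, and the check says nothing about convergence.

```python
import random


def h_classical(z):
    """Classical H(Z) of (3) with the assumed nonlinearity g_i(z) = z**3."""
    g = [z[0] ** 3, z[1] ** 3]
    return [[z[i] * z[j] - (1.0 if i == j else 0.0) + z[i] * g[j] - g[i] * z[j]
             for j in range(2)] for i in range(2)]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(p, v):
    return [p[0][0] * v[0] + p[0][1] * v[1], p[1][0] * v[0] + p[1][1] * v[1]]


def step(m, z, mu):
    """Return M - mu * H(z) M, the common form of the updates on B and C."""
    hm = matmul(h_classical(z), m)
    return [[m[i][j] - mu * hm[i][j] for j in range(2)] for i in range(2)]


def max_gap(a, mu=1e-3, steps=50, seed=0):
    """Run the B-recursion and the C-recursion together; return max |B_t A - C_t|."""
    rng = random.Random(seed)
    b = [[1.0, 0.0], [0.0, 1.0]]   # B(0) = I
    c = [row[:] for row in a]      # C(0) = A
    for _ in range(steps):
        s = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        x = matvec(a, s)                   # observed mixture X_t = A S_t
        b = step(b, matvec(b, x), mu)      # driven by observations only
        c = step(c, matvec(c, s), mu)      # driven by the (unobserved) sources
    ba = matmul(b, a)
    return max(abs(ba[i][j] - c[i][j]) for i in range(2) for j in range(2))
```

For a mixing matrix such as `[[1.0, 0.5], [0.3, 1.0]]`, the gap returned by `max_gap` should stay at the level of floating-point rounding, since $\mathbf{B}_{t-1}X_{t}=\mathbf{C}_{t-1}S_{t}$ at every step.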

We will say that the adaptive algorithm solves the BSS problem if $\mathbf{C}_{t}$ tends in the mean to a non-mixing matrix $\mathbf{C}$ with the following possible forms

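$$\mathbf{C}=\begin{bmatrix}\pm c_{1}&0\\ 0&\pm c_{2}\end{bmatrix},\quad\text{or}\quad\mathbf{C}=\begin{bmatrix}0&\pm c_{2}\\ \pm c_{1}&0\end{bmatrix},\tag{6}$$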

where $c_{1},c_{2}$ are positive quantities. In other words, $\mathbf{C}$ must be either diagonal or anti-diagonal with nonzero elements. These limits leave an ambiguity in the ordering, power and sign of the estimated sources. Fortunately, in most applications these uncertainties can either be tolerated or corrected by simple means such as pilot signals, where periodically, and at known time instants, the source signals are synthetic and known beforehand.

Remarkably, the algorithm defined in (5) can converge even when very limited information about the statistical description of the sources is available. In fact, it is sufficient [2, 4] that the functions $g_{i}(z)$ and the probability density functions (pdf) of the sources satisfy certain symmetry properties. The following theorem summarizes the existing results for the two-source case.

Theorem 1.

Let the sources $\{s_{1}(t),s_{2}(t)\}$ satisfy the following assumptions:

A1. For every $t$ , $s_{1}(t),s_{2}(t)$ are independent random variables with symmetric densities and at most one source can be Gaussian.

A2. For $\kappa_{i}=\mathsf{E}[g_{i}^{\prime}(s_{i})]\mathsf{E}[s_{i}^{2}]-\mathsf{E}[s_{i}g_{i}(s_{i})],~i=1,2,$ we have

$$1+\kappa_{1}>0,\quad 1+\kappa_{2}>0,\quad(1+\kappa_{1})(1+\kappa_{2})>1.$$

Then the adaptive scheme defined by (5) with $\mathbf{H}(Z)$ defined in (3) can converge in the mean to a non-mixing matrix and the corresponding limit is locally stable.

Proof.

The proof can be found in [4].∎

Theorem 1 does not guarantee global convergence because of the nonlinear form of (5). Worth mentioning is also the fact that in (3) the first term in $\mathbf{H}(Z)$ , which uses only second order moments, plays the role of a whitener of the observation vector $X_{t}$ , whereas the second term, with the help of nonlinear statistics, imposes the final independence and achieves separation. The literature on BSS is very rich. One can find a detailed review of the existing methodologies for the case of independent sources in [1].

II Proposed Algorithmic Scheme

In this work, we extend the above result in two major directions. Specifically:

• We show that there exists a rich class of adaptive algorithms that can be applied to the BSS problem with the same success as the algorithm defined in (2), (3). This algorithmic class not only separates independent sources but also sources that are dependent, provided that some simple symmetry condition applies to the joint pdf of the source signals. It is in fact this symmetry that guarantees separation and not independence.

• We identify the type of dependent random sources that cannot be separated under our proposed general algorithmic scheme, hence extending the non-Gaussianity requirement of the independent case.

The motivation for considering dependent sources, apart from the obvious theoretical challenge, is the fact that even when sources are independent under nominal conditions, independence is easily lost once we consider simple contamination models. For example, if $\mathsf{f}(s_{1},s_{2})$ denotes the joint pdf of the two sources, the following $\epsilon$ -contamination model does not correspond to independent sources

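$$\mathsf{f}(s_{1},s_{2})=(1-\epsilon)\mathsf{f}_{1}(s_{1})\mathsf{f}_{2}(s_{2})+\epsilon\,\mathsf{g}_{1}(s_{1})\mathsf{g}_{2}(s_{2}).$$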

We see that with probability $1-\epsilon$ the two sources are independent following the pdfs $\mathsf{f}_{1}(s_{1}),\mathsf{f}_{2}(s_{2})$ and with probability $\epsilon$ they are again independent but following the alternative pair of pdfs $\mathsf{g}_{1}(s_{1}),\mathsf{g}_{2}(s_{2})$ . It is a simple exercise to verify that $\mathsf{f}(s_{1},s_{2})$ does not correspond to independent sources. This raises the natural question of whether the BSS algorithms will break under such mild divergence from the nominal conditions. There are of course applications [5, 6] where the source signals are genuinely dependent and we are interested in their separation. The existing literature on BSS methods for dependent sources is considerable. Here we only mention some representative articles for each available methodology. Off-line techniques proposed to solve the problem include Dependent Component Analysis [7, 8], contrast functions [9], time-frequency ratio of mixtures [10] and Kullback-Leibler divergence for copula densities [11]. For on-line methods we find a technique based on nonnegative matrix factorization and the Kullback-Leibler divergence in [12]. For a more detailed list of references please consult: www.springeropen.com/collections/DCA.
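The "simple exercise" can be made concrete with a short computation. As an illustration only, write $m_{\mathsf{f}_{i}}=\int s^{2}\mathsf{f}_{i}(s)\,ds$ and $m_{\mathsf{g}_{i}}=\int s^{2}\mathsf{g}_{i}(s)\,ds$ for the second moments of the nominal and contaminating densities (these symbols are introduced here, not in the paper, and are assumed finite). Under the $\epsilon$ -contamination model the covariance of the squared sources is

$$\begin{aligned}\mathsf{E}[s_{1}^{2}s_{2}^{2}]-\mathsf{E}[s_{1}^{2}]\,\mathsf{E}[s_{2}^{2}]&=(1-\epsilon)m_{\mathsf{f}_{1}}m_{\mathsf{f}_{2}}+\epsilon\,m_{\mathsf{g}_{1}}m_{\mathsf{g}_{2}}-\big[(1-\epsilon)m_{\mathsf{f}_{1}}+\epsilon\,m_{\mathsf{g}_{1}}\big]\big[(1-\epsilon)m_{\mathsf{f}_{2}}+\epsilon\,m_{\mathsf{g}_{2}}\big]\\ &=\epsilon(1-\epsilon)\,(m_{\mathsf{f}_{1}}-m_{\mathsf{g}_{1}})(m_{\mathsf{f}_{2}}-m_{\mathsf{g}_{2}}).\end{aligned}$$

Independence would force this covariance to vanish. Hence, for $0<\epsilon<1$ , the sources are dependent whenever the contamination changes the power of both sources, even though each mixture component is itself a product density.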

In this work, we adopt a purely algorithmic approach. Starting with the adaptive algorithm in (2), we examine what type of matrix functions $\mathbf{H}(Z)$ can be employed, and combined with what type of dependent sources, in order for the algorithm in (2) to be successful, namely, for the algorithm in (5) to converge to one of the non-mixing matrices. The goal is for whatever results we develop to be applicable to a wide variety of signals without requiring exact knowledge of the statistical description of the sources.

Let us now introduce the adaptation we propose as a general alternative to the existing algorithm in (2) and (3). Our scheme also follows (2) but with the matrix function $\mathbf{H}(Z)$ replaced by the more general version

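$$\mathbf{H}(Z)=\begin{bmatrix}\mathsf{h}_{11}(z_{1},z_{2})&\mathsf{h}_{12}(z_{1},z_{2})\\ \mathsf{h}_{21}(z_{1},z_{2})&\mathsf{h}_{22}(z_{1},z_{2})\end{bmatrix}.\tag{8}$$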

For the analysis of the corresponding adaptive algorithm we will use the equivalent adaptation introduced in (5) with $\mathbf{H}(Z)$ replaced by the expression introduced in (8). With our analysis we target the discovery of suitable constraints on $\mathbf{H}(Z)$ that will guarantee the correct performance of the corresponding algorithm, namely its convergence to one of the non-mixing matrices depicted in (6).

III Limits and Stability

Adaptive algorithms can be analyzed using Stochastic Approximation theory [13] when the step size $\mu$ is “small”. Our main interest lies with the convergence in the mean which we consider next.

III-A Limit in the Mean

The mean field $\{\mathsf{E}[\mathbf{C}_{t}]\}$ of the algorithm in (5), according to the Stochastic Approximation theory [3], can be efficiently approximated by the sequence $\{\bar{\mathbf{C}}_{t}\}$ defined by the recursion

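$$\bar{\mathbf{C}}_{t}=\bar{\mathbf{C}}_{t-1}-\mu\,\mathsf{E}_{S}\big[\mathbf{H}(\bar{\mathbf{C}}_{t-1}S_{t})\big]\bar{\mathbf{C}}_{t-1}\tag{9}$$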

and the quality of the approximation is of the form

$$\mathsf{E}[\mathbf{C}_{t}]=\bar{\mathbf{C}}_{t}+o(\mu),$$

where $\mathsf{E}_{S}[\cdot]$ denotes expectation only with respect to the source signal vector $S_{t}$ . If we let $t\to\infty$ and assume that $\bar{\mathbf{C}}_{t}\to\bar{\mathbf{C}}_{\infty}$ with $\bar{\mathbf{C}}_{\infty}$ being an invertible matrix we obtain the following equation for $\bar{\mathbf{C}}_{\infty}$

$$\mathsf{E}_{S}\big[\mathbf{H}(\bar{\mathbf{C}}_{\infty}S_{t})\big]=0.\tag{10}$$

All matrices $\bar{\mathbf{C}}_{\infty}$ that satisfy (10) are equilibrium points of the recursion in (9) and potential limits in the mean of the adaptive algorithm in (5). Whether a specific equilibrium can actually become the limit of the recursion in (9) is, of course, a question of stability of the particular equilibrium point.

Let us ignore for the moment the stability issue and focus on the problem of imposing a specific matrix as a possible equilibrium. We simply have to make sure that this matrix satisfies (10) when it replaces $\bar{\mathbf{C}}_{\infty}$ . To assure that the non-mixing matrices introduced in (6) are equilibrium points, we need for $i,j=1,2$ the following equations, corresponding to (10), to be satisfied

$$\mathsf{E}[\mathsf{h}_{ij}(\pm c_{1}s_{1},\pm c_{2}s_{2})]=0\tag{11}$$

for the diagonal case, or

$$\mathsf{E}[\mathsf{h}_{ij}(\pm c_{2}s_{2},\pm c_{1}s_{1})]=0\tag{12}$$

for the anti-diagonal. We observe that, for simplicity, we have dropped the subscript “ $S$ ” in the expectation $\mathsf{E}_{S}[\cdot]$ since, from now on, expectation is only with respect to the two sources.

We note that once the functions $\mathsf{h}_{ij}(z_{1},z_{2})$ are specified and the type of non-mixing matrix selected, then (11) or (12) constitutes a system of four equations in two unknowns ( $c_{1},c_{2}$ ). To have a solution of the desired form (diagonal or anti-diagonal) it is clear that two of the four equations must be satisfied automatically and, most importantly, without exact knowledge of the statistical description of the sources. It turns out that this is indeed possible if we assume that the joint pdf $\mathsf{f}(s_{1},s_{2})$ of the two sources exhibits the following property

$$\mathsf{f}(-s_{1},s_{2})=\mathsf{f}(s_{1},-s_{2})=\mathsf{f}(s_{1},s_{2}),\tag{13}$$

corresponding to quadrantal symmetry. We should point out that (13) is not an unfamiliar constraint. Indeed in the case of independent sources where $\mathsf{f}(s_{1},s_{2})=\mathsf{f}_{1}(s_{1})\mathsf{f}_{2}(s_{2})$ we recall from Theorem 1, Condition A1, that we need both marginal pdfs to be symmetric, which implies (13). Consequently we require the same symmetry to hold for the joint density when the two sources are dependent.

If we now impose some additional symmetries, this time on the functions $\mathsf{h}_{ij}(z_{1},z_{2})$ , we can easily guarantee that the desired non-mixing matrices defined in (6) become equilibrium points. Specifically, we ask that the following conditions hold

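$$\begin{aligned}\mathsf{h}_{11}(-z_{1},z_{2})&=\mathsf{h}_{11}(z_{1},-z_{2})=\mathsf{h}_{11}(z_{1},z_{2})\\ \mathsf{h}_{22}(-z_{1},z_{2})&=\mathsf{h}_{22}(z_{1},-z_{2})=\mathsf{h}_{22}(z_{1},z_{2})\\ \mathsf{h}_{12}(-z_{1},z_{2})&=\mathsf{h}_{12}(z_{1},-z_{2})=-\mathsf{h}_{12}(z_{1},z_{2})\\ \mathsf{h}_{21}(-z_{1},z_{2})&=\mathsf{h}_{21}(z_{1},-z_{2})=-\mathsf{h}_{21}(z_{1},z_{2}).\end{aligned}\tag{14}$$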

In other words, the two diagonal elements of the matrix $\mathbf{H}(Z)$ must be even functions in each of their arguments, while the two anti-diagonal elements must be odd functions in each of their arguments. There are two desirable consequences when these properties are combined with the quadrantal symmetry of the joint pdf $\mathsf{f}(s_{1},s_{2})$ .

• For any $c_{1},c_{2}$ , we have $\mathsf{E}[\mathsf{h}_{12}(\pm c_{1}s_{1},\pm c_{2}s_{2})]=\mathsf{E}[\mathsf{h}_{12}(\pm c_{2}s_{2},\pm c_{1}s_{1})]=0$ and the same property is true for $\mathsf{h}_{21}(z_{1},z_{2})$ . This suggests that two out of the four equations in (11) or (12) are satisfied for free and for all possible signs of the non-mixing matrix.

• If $(c_{1},c_{2})$ are roots of the system of the two equations

$$\mathsf{E}[\mathsf{h}_{11}(c_{1}s_{1},c_{2}s_{2})]=0,\quad\mathsf{E}[\mathsf{h}_{22}(c_{1}s_{1},c_{2}s_{2})]=0,\tag{15}$$

corresponding to a diagonal non-mixing matrix, or of the system

$$\mathsf{E}[\mathsf{h}_{11}(c_{2}s_{2},c_{1}s_{1})]=0,\quad\mathsf{E}[\mathsf{h}_{22}(c_{2}s_{2},c_{1}s_{1})]=0,\tag{16}$$

corresponding to an anti-diagonal non-mixing matrix, then so is any combination of signs $(\pm c_{1},\pm c_{2})$ .

Both observations are very simple to demonstrate since they are a direct consequence of the symmetries imposed on $\mathsf{h}_{ij}(z_{1},z_{2})$ and $\mathsf{f}(s_{1},s_{2})$ . Regarding $c_{1},c_{2}$ we should point out that we only need their existence since the exact values of these two quantities depend on the actual joint pdf $\mathsf{f}(s_{1},s_{2})$ which is assumed to be unknown.
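Both observations can also be checked on a finite sample. The sketch below builds a dependent sample whose empirical distribution is exactly quadrantally symmetric (each drawn point is stored together with its three reflections) and averages an odd and an even entry of $\mathbf{H}(Z)$ over it. The entries `h11` and `h12` are those of the classical $\mathbf{H}(Z)$ with the assumed nonlinearity $g_{i}(z)=z^{3}$, and the source model and function names are hypothetical choices made only for this illustration.

```python
import math
import random


def h11(z1, z2):
    """Diagonal entry of the classical H(Z): even in each argument."""
    return z1 * z1 - 1.0


def h12(z1, z2):
    """Off-diagonal entry of the classical H(Z) with g(z) = z**3: odd in each argument."""
    return z1 * z2 + z1 * z2 ** 3 - z1 ** 3 * z2


def symmetric_dependent_sample(n, seed=0):
    """Draw n dependent points and store each with its three quadrant reflections."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        s1 = rng.uniform(-1.0, 1.0)
        s2 = (0.5 + abs(s1)) * rng.uniform(-1.0, 1.0)  # scale of s2 depends on s1
        points += [(s1, s2), (-s1, s2), (-s1, -s2), (s1, -s2)]
    return points


def mean_h(h, points, a, b, swap=False):
    """Average h at (a*s1, b*s2), or at (a*s2, b*s1) when swap is True."""
    if swap:
        values = [h(a * s2, b * s1) for s1, s2 in points]
    else:
        values = [h(a * s1, b * s2) for s1, s2 in points]
    return math.fsum(values) / len(values)
```

With `pts = symmetric_dependent_sample(200)`, the averages `mean_h(h12, pts, c1, c2)` and `mean_h(h12, pts, c2, c1, swap=True)` cancel to rounding level for any scalings, while `mean_h(h11, pts, c1, c2)` is unchanged when the signs of `c1` or `c2` are flipped.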

So far, through the symmetries imposed on $\mathsf{f}(s_{1},s_{2})$ in (13) and on $\mathsf{h}_{ij}(z_{1},z_{2})$ in (14), we can guarantee that the desired non-mixing matrices are equilibrium points for the mean field adaptation (9). However, in order for these equilibria to be actually accessible as limits by the adaptation we also need to establish some form of stability.

III-B Local Stability

The next step consists in examining under what conditions the non-mixing equilibrium points are in fact stable limits of (9). We will present our analysis for $\mathbf{C}=\mathrm{diag}\{c_{1},c_{2}\}$ ; similar steps apply for the anti-diagonal case.

Establishing global stability in nonlinear updates is, unfortunately, very difficult and not always possible. We therefore limit ourselves, as is very common in adaptive algorithms, to testing only for local stability. This means that we write $\bar{\mathbf{C}}_{t}=\mathbf{C}+\mathbf{\Delta}_{t}$ where $\mathbf{\Delta}_{t}$ is a perturbation matrix with “small” elements and analyze the evolution of $\mathbf{\Delta}_{t}$ with $t$ using linear system approximation. Stability requires $\mathbf{\Delta}_{t}\to 0$ as $t\to\infty$ . Specifically, assuming that the perturbation matrix is of the form

$$\mathbf{\Delta}_{t}=\begin{bmatrix}\alpha_{t}&\delta_{t}\\ \gamma_{t}&\beta_{t}\end{bmatrix}\tag{17}$$

we have the following lemma that captures the evolution of $\mathbf{\Delta}_{t}$ .

Lemma 1.

The elements of the perturbation matrix $\mathbf{\Delta}_{t}$ satisfy the following recursions

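$$\begin{bmatrix}\alpha_{t}\\ \beta_{t}\end{bmatrix}=\mathbf{C}(\mathbf{I}+\mu\mathbf{F})\mathbf{C}^{-1}\begin{bmatrix}\alpha_{t-1}\\ \beta_{t-1}\end{bmatrix},\qquad\begin{bmatrix}\gamma_{t}\\ \delta_{t}\end{bmatrix}=\mathbf{C}(\mathbf{I}+\mu\mathbf{G})\mathbf{C}^{-1}\begin{bmatrix}\gamma_{t-1}\\ \delta_{t-1}\end{bmatrix},\tag{18}$$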

where

[TABLE]

$\mathsf{f}_{s_{i}}(s_{1},s_{2})=\frac{\partial\mathsf{f}(s_{1},s_{2})}{\partial s_{i}}$ and expectation in both formulas is with respect to the joint source pdf $\mathsf{f}(s_{1},s_{2})$ .

Proof.

If we assume that $\mathsf{f}(s_{1},s_{2})$ is uniformly bounded then in order for its two marginal densities to exist we need $\lim_{s_{1}\to\pm\infty}\mathsf{f}(s_{1},\cdot)=\lim_{s_{2}\to\pm\infty}\mathsf{f}(\cdot,s_{2})=0$ . In fact we need to strengthen this property slightly so that the two expectations in (19) and (20) are bounded. In particular, for any fixed constants $c_{1},c_{2}$ we require

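$$\lim_{s_{1}\to\pm\infty}\mathsf{h}_{ij}(c_{1}s_{1},\cdot)\,s_{1}\mathsf{f}(s_{1},\cdot)=0,\qquad\lim_{s_{2}\to\pm\infty}\mathsf{h}_{ij}(\cdot,c_{2}s_{2})\,s_{2}\mathsf{f}(\cdot,s_{2})=0.\tag{21}$$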

Details of the proof are given in the Appendix.∎

From the recursions in (18) we can find conditions that assure local stability of the desired equilibrium. The next lemma discusses exactly this point.

Lemma 2.

The equilibrium point $\mathbf{C}$ is locally stable if and only if the following inequalities hold

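$$\mathrm{tr}\{\mathbf{F}\}<0,\quad\mathrm{det}\{\mathbf{F}\}>0,\quad\mathrm{tr}\{\mathbf{G}\}<0,\quad\mathrm{det}\{\mathbf{G}\}>0,\tag{22}$$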

where $\mathrm{tr}\{\cdot\},\mathrm{det}\{\cdot\}$ denote trace and determinant respectively.

Proof.

For local stability we need the two matrices $\mathbf{I}+\mu\mathbf{F}$ , $\mathbf{I}+\mu\mathbf{G}$ to have their eigenvalues in the interior of the unit circle. This can happen for all sufficiently small step sizes $\mu>0$ if and only if the two matrices $\mathbf{F},\mathbf{G}$ have eigenvalues with strictly negative real parts. The two inequalities applied to each matrix correspond to the Routh-Hurwitz criterion that assures this fact.∎
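The test in Lemma 2 is easy to apply to any given $2\times 2$ matrix. The following sketch is an illustration with hypothetical helper names and made-up example matrices (the paper's actual $\mathbf{F},\mathbf{G}$ depend on the unknown source pdf). It checks the trace and determinant conditions and, separately, whether the eigenvalues of $\mathbf{I}+\mu\mathbf{M}$ lie strictly inside the unit circle for a given step size.

```python
import cmath


def eigenvalues_2x2(m):
    """Eigenvalues of a real 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


def is_hurwitz(m):
    """Routh-Hurwitz test for 2x2 matrices: trace < 0 and determinant > 0."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return tr < 0 and det > 0


def contracts(m, mu):
    """True if both eigenvalues of I + mu*M lie strictly inside the unit circle."""
    return all(abs(1.0 + mu * lam) < 1.0 for lam in eigenvalues_2x2(m))
```

For example, `[[-2.0, 1.0], [-1.0, -1.0]]` passes `is_hurwitz` and `contracts(..., 0.1)` holds, whereas the same matrix with the large step `mu = 2.0` fails `contracts`, which illustrates why the lemma is stated for sufficiently small $\mu$ .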

Remark 1: From our local analysis we observe that the mean estimates, when they are close to the limit, converge to the equilibrium exponentially fast in the form of $(\mathbf{I}+\mu\mathbf{F})^{t}$ and $(\mathbf{I}+\mu\mathbf{G})^{t}$ . In other words we have an exponential rate of convergence which is proportional to $\mu$ . Of course this is true, provided that the conditions of Lemma 2 apply. As we can see, a smaller $\mu$ reduces the convergence speed towards the desired limit.
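The dependence of the rate on $\mu$ can be made explicit with a short calculation. Let $\lambda$ denote an eigenvalue of $\mathbf{F}$ or $\mathbf{G}$ (a symbol introduced here for illustration). The iteration matrices $\mathbf{C}(\mathbf{I}+\mu\mathbf{F})\mathbf{C}^{-1}$ and $\mathbf{C}(\mathbf{I}+\mu\mathbf{G})\mathbf{C}^{-1}$ are similar to $\mathbf{I}+\mu\mathbf{F}$ and $\mathbf{I}+\mu\mathbf{G}$ , so the corresponding eigenvalue is $1+\mu\lambda$ , and

$$|1+\mu\lambda|^{2}=1+2\mu\,\mathrm{Re}\{\lambda\}+\mu^{2}|\lambda|^{2}<1\quad\Longleftrightarrow\quad 0<\mu<\frac{-2\,\mathrm{Re}\{\lambda\}}{|\lambda|^{2}}.$$

Such a range of step sizes exists only when $\mathrm{Re}\{\lambda\}<0$ . For small $\mu$ we have $|1+\mu\lambda|^{t}\approx e^{\mu\,\mathrm{Re}\{\lambda\}\,t}$ , so the perturbation decays with an exponent proportional to $\mu$ .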

Remark 2: We devoted all our efforts to assure convergence in the mean of the algorithmic scheme in (5) to the desired non-mixing equilibrium. However, mean convergence by itself cannot guarantee satisfactory estimates. It is equally important that the variance of the corresponding estimates is small. Fortunately, regarding this point, Stochastic Approximation comes to our rescue. Specifically, it is known [13] that when the limit in the mean is stable the corresponding covariance matrix of the estimates in (5), at steady-state, is proportional to $\mu$ . Actually, there are even formulas that can compute the steady-state covariance matrix up to a first order approximation in $\mu$ . Since the step size is selected to be small this suggests that, at steady-state, our estimates will differ from the desired non-mixing matrix by a random amount that has small power. Decreasing $\mu$ provides better steady-state estimates but, as mentioned in the previous remark, results in longer convergence periods of the mean field toward its desired limit.

IV Non-Separable Sources

One of the main issues in BSS is to identify the type of sources that cannot be separated. When the two sources are independent it is well known that the only combination that is non-separable by any off- or on-line method is the case of two Gaussians. If we allow the sources to be dependent with a joint pdf satisfying the symmetry in (13) then the class of sources that are non-separable may increase. Unfortunately, under dependency it is very difficult to develop results of the same generality as in the independent case. Consequently, in order to come up with something meaningful, we propose a more modest characterization of the non-separable sources which, we believe, is equally interesting.

Definition: Two sources will be called non-separable if there is no algorithm of the form of (5) for which a non-mixing equilibrium point $\mathbf{C}$ defined in (6) is stable. In other words, instead of referring to any on- or off-line method, we relate the separability property to our general algorithmic scheme. If our algorithm is unable to converge to a non-mixing matrix no matter which functions $\mathsf{h}_{ij}(z_{1},z_{2})$ we employ, then we regard the corresponding sources as non-separable.

The equilibrium point is unstable if at least one of the two matrices $\mathbf{F},\mathbf{G}$ has at least one eigenvalue with positive or zero real part. Clearly this fact must be shared by all combinations of functions $\mathsf{h}_{ij}(z_{1},z_{2})$ with symmetries specified in (14). We have the following theorem that identifies the joint probability density of sources that are non-separable, according to our definition.

Theorem 2.

Two dependent sources $s_{1},s_{2}$ are non-separable by any version of the algorithm in (5) if and only if their joint pdf $\mathsf{f}(s_{1},s_{2})$ is of the following form

$$\mathsf{f}(s_{1},s_{2})=\omega(K_{2}s_{1}^{2}+K_{1}s_{2}^{2}),\tag{23}$$

where $\omega(z)$ is a univariate function of $z$ and $K_{1},K_{2}$ are positive constants.

Proof.

The proof is very interesting and requires several steps. All details are given in the Appendix.∎

Theorem 2 identifies as non-separable the sources whose joint pdf $\mathsf{f}(s_{1},s_{2})$ exhibits elliptical quadrantal symmetry (due to the term $K_{2}s_{1}^{2}+K_{1}s_{2}^{2}$ ). Fig. 1 captures the typical form of the contour lines of the corresponding joint pdf. An interesting question is what happens when we apply our definition of non-separability to the independent case. In particular, we would like to know whether our definition generates any non-separable sources in addition to the classical Gaussian pair. The next corollary provides the necessary answer.

Corollary.

When the two sources $s_{1},s_{2}$ are independent the only combination which is non-separable by the algorithm in (5) is the classical case of Gaussian sources.

Proof.

When the two sources are independent then $\mathsf{f}(s_{1},s_{2})=\mathsf{f}_{1}(s_{1})\mathsf{f}_{2}(s_{2})$ . If we use this fact in (23) and take the derivative with respect to $s_{1}$ and $s_{2}$ we obtain the following equalities

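$$\begin{aligned}\mathsf{f}_{1}^{\prime}(s_{1})\mathsf{f}_{2}(s_{2})&=2K_{2}s_{1}\,\omega_{z}(K_{2}s_{1}^{2}+K_{1}s_{2}^{2})\\ \mathsf{f}_{1}(s_{1})\mathsf{f}_{2}^{\prime}(s_{2})&=2K_{1}s_{2}\,\omega_{z}(K_{2}s_{1}^{2}+K_{1}s_{2}^{2})\end{aligned}$$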

where $\omega_{z}(a)=\frac{d\omega(z)}{dz}|_{z=a}$ . From the two equations we conclude that

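$$\frac{\mathsf{f}_{1}^{\prime}(s_{1})\mathsf{f}_{2}(s_{2})}{2K_{2}s_{1}}=\frac{\mathsf{f}_{1}(s_{1})\mathsf{f}_{2}^{\prime}(s_{2})}{2K_{1}s_{2}}$$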

which suggests

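$$\frac{\mathsf{f}_{1}^{\prime}(s_{1})}{\mathsf{f}_{1}(s_{1})\,2K_{2}s_{1}}=\frac{\mathsf{f}_{2}^{\prime}(s_{2})}{\mathsf{f}_{2}(s_{2})\,2K_{1}s_{2}}=K.$$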

$K$ must be a function solely of $s_{1}$ and at the same time a function solely of $s_{2}$ , therefore it is necessarily a constant. The previous expression gives rise to two differential equations in $s_{1}$ and $s_{2}$ , with solutions $\mathsf{f}_{1}(s_{1})=A_{1}e^{KK_{2}s_{1}^{2}}$ , $\mathsf{f}_{2}(s_{2})=A_{2}e^{KK_{1}s_{2}^{2}}$ , i.e. Gaussian pdfs. The corresponding function $\omega(z)$ has the form $\omega(z)=A_{1}A_{2}e^{Kz}$ . ∎
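The integration step behind these solutions can be spelled out as follows (a worked sketch; the variance symbol $\sigma_{1}^{2}$ is introduced here only for illustration). For the first source,

$$\frac{d}{ds_{1}}\ln\mathsf{f}_{1}(s_{1})=\frac{\mathsf{f}_{1}^{\prime}(s_{1})}{\mathsf{f}_{1}(s_{1})}=2KK_{2}s_{1}\quad\Longrightarrow\quad\ln\mathsf{f}_{1}(s_{1})=KK_{2}s_{1}^{2}+\ln A_{1},$$

and the same steps with $K_{1}$ in place of $K_{2}$ give $\mathsf{f}_{2}(s_{2})$ . For $\mathsf{f}_{1}$ to integrate to one we need $KK_{2}<0$ , that is $K<0$ since $K_{2}>0$ , in which case $\mathsf{f}_{1}$ is a zero-mean Gaussian density with variance $\sigma_{1}^{2}=-1/(2KK_{2})$ .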

The Corollary guarantees that, even if we limit ourselves to separation algorithms of the form of (5), this does not augment the class of non-separable sources when the sources are independent. This result was, in a sense, expected, since from the literature we know that adaptive algorithms of the form of (2), with $\mathbf{H}(Z)$ as in (3), have been observed in simulations to separate independent sources except, of course, Gaussian pairs. Since our model for $\mathbf{H}(Z)$ in (8), with the particular symmetries imposed in (14), is more general than (3), the corresponding adaptive algorithm will also be capable of separating the same class of independent sources. Of course, the main value of Theorem 2 comes from the fact that it identifies non-separable dependent sources, which is clearly not a straightforward extension of the Gaussian pair of the independent case.

In the next section we give examples of classical and non-classical $\mathbf{H}(Z)$ matrices and we test, using simulations, their capability to separate dependent sources. We also give examples of sources with elliptical quadrantal symmetry and verify that the algorithm in (5) is unable to perform separation.

V Examples

Let us start with the example where the pair $(s_{1},s_{2})$ is a mixture of independent Gaussian random variables. Specifically, with probability 0.5 the two sources $s_{1}$ and $s_{2}$ are independent $\mathcal{N}(0,1)$ , $\mathcal{N}(0,4)$ , while with probability 0.5 they are again independent $\mathcal{N}(0,4)$ , $\mathcal{N}(0,1)$ respectively. We consider two cases for the $\mathbf{H}(Z)$ matrix

[TABLE]

The first matrix corresponds to the classical version introduced in (3) and, as we can see, it whitens the observations. On the other hand, the second matrix does not contain any whitening part. Both selections satisfy the symmetry properties set in (14). Furthermore, the analysis of the corresponding matrices $\mathbf{F},\mathbf{G}$ assures validity of (22) for stability.

Fig. 2 depicts the simulation results. In Fig. 2(a) we can see the contour lines of the corresponding joint pdf. We observe that we have quadrantal symmetry which is not elliptical, consequently the sources can be separated. In Fig. 2(b) and (c) we plot the elements of the normalized estimates $\mathbf{C}_{t}$ as they evolve in time for the two choices of $\mathbf{H}(Z)$ . We recall that $\mathbf{C}_{0}=\mathbf{A}$ where $\mathbf{A}$ is unknown. We therefore initialized $\mathbf{C}_{0}$ with a random matrix corresponding to a random selection of $\mathbf{A}$ and $\mathbf{B}_{0}=\mathbf{I}$ . Blue and magenta lines depict the diagonal elements of $\mathbf{C}_{t}$ whereas yellow and orange the anti-diagonal. As we can see, in both algorithms we have convergence towards a non-mixing matrix.

Let us now test the validity of Theorem 2. We are going to generate dependent sources with their pdf controlled by a parameter $d$ . When $d\neq 0$ the joint pdf will have quadrantal symmetry but not elliptical. For $d=0$ the quadrantal symmetry will also become elliptical. This means that in the former case we expect source separability while in the latter the sources will be non-separable. The source model we propose is the following: We start with $r,\theta$ independent random variables with $r$ uniformly distributed in [0,1] and $\theta$ uniformly distributed in $[-\pi,\pi]$ . We apply the following transformations to produce the two signals

[TABLE]

As we mentioned, $d=0$ is the only value that generates elliptical (actually cyclic) symmetry. We use the classical $\mathbf{H}(Z)$ matrix in order to demonstrate that the classical algorithms can also separate dependent sources. Fig. 3(a) depicts the contour lines of the pdf and (b) the evolution of the elements of the normalized estimates for the case $d=1$ .

As we can see the adaptive algorithm converges to a non-mixing matrix.

In Fig. 4, we present the simulation for $d=0$ . Fig. 4(a) has the contours which have indeed cyclic symmetry.

In (b), we have the evolution of $\mathbf{C}_{t}$ which, as predicted by our analysis, does not converge to a non-mixing matrix.
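One way to see why circular symmetry prevents separation is that rotating the source pair leaves its distribution unchanged, so no statistic of the output can tell a mixing rotation from the identity. The sketch below illustrates this numerically with a fourth-order moment. The circular pair built here from $r$ and $\theta$ is an illustrative construction and is not claimed to coincide with the transformation used for the case $d=0$; the helper names are assumptions as well. The Gaussian mixture of the first example is included for contrast.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Circularly symmetric pair (illustrative construction, an assumption of this sketch).
r = rng.uniform(0.0, 1.0, n)
theta = rng.uniform(-np.pi, np.pi, n)
S_circ = np.stack([r * np.cos(theta), r * np.sin(theta)])

# Gaussian mixture pair of the first example (quadrantal but not elliptical symmetry).
swap = rng.random(n) < 0.5
S_mix = np.stack([np.where(swap, 2.0, 1.0) * rng.standard_normal(n),
                  np.where(swap, 1.0, 2.0) * rng.standard_normal(n)])

def rotate(S, angle):
    """Apply a planar rotation to every sample (column) of S."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]]) @ S

def fourth_moment(S):
    """Empirical E[s1^4] of the first component."""
    return np.mean(S[0] ** 4)

m_circ = fourth_moment(S_circ)
m_circ_rot = fourth_moment(rotate(S_circ, np.pi / 4))
m_mix = fourth_moment(S_mix)
m_mix_rot = fourth_moment(rotate(S_mix, np.pi / 4))

print(f"circular: {m_circ:.4f} -> {m_circ_rot:.4f} after rotation")
print(f"mixture : {m_mix:.2f} -> {m_mix_rot:.2f} after rotation")
```

In theory the circular pair has $\mathsf{E}[s_{1}^{4}]=0.075$ before and after rotation, whereas for the mixture the moment drops from $25.5$ to $18.75$ under a rotation by $\pi/4$, so the mixing is detectable in the second case only.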

VI Conclusion

We considered adaptive algorithms that are capable of blindly separating dependent sources. We showed that if the sources exhibit a quadrantal symmetry in their statistical behavior, then simple adaptive algorithms can be employed to separate them. This result indicates that source separability is not a property due to “independence” but rather due to “symmetric statistical behavior”. With our analysis we were also able to identify the dependent sources that are not separable thus extending the Gaussian case known for independent sources.

Appendix

Proof of Lemma 1

The proof is somewhat involved but presents no particular analytical challenges. Since $\mathbf{C}$ is an equilibrium point satisfying $\mathsf{E}[\mathbf{H}(\mathbf{C}S)]=0$ , where expectation is with respect to $\mathsf{f}(s_{1},s_{2})$ , it is not very difficult to verify that the study of local stability of the mean field (9) at $\mathbf{C}$ is the same as studying the local stability of

[TABLE]

at the same equilibrium. Consider now the perturbation $\bar{\mathbf{C}}_{t}=\mathbf{C}+\mathbf{\Delta}_{t}$ . Next we will present the complete computation of the recursion for the element $\alpha_{t}$ defined in (17) for the perturbation matrix. Similar steps can be applied for the other three terms to show the validity of the lemma. Without loss of generality assume that $\mathbf{C}=\mathrm{diag}\{c_{1},c_{2}\}$ , then for $\alpha_{t}$ we have

[TABLE]

Applying first order Taylor expansion we obtain

[TABLE]

where $\partial_{z_{i}}$ denotes the partial derivative with respect to $z_{i}$ . Since $\mathsf{h}_{11}(z_{1},z_{2})$ is even symmetric in $z_{1}$ , its derivative $\partial_{z_{1}}\mathsf{h}_{11}(z_{1},z_{2})$ is odd in $z_{1}$ ; consequently $\partial_{z_{1}}\mathsf{h}_{11}(c_{1}s_{1},c_{2}s_{2})s_{2}$ is odd in both arguments $s_{1},s_{2}$ . Because $\mathsf{f}(s_{1},s_{2})$ exhibits quadrantal symmetry, this implies that $\mathsf{E}[\partial_{z_{1}}\mathsf{h}_{11}(c_{1}s_{1},c_{2}s_{2})s_{2}]=0$ . A similar conclusion can be drawn for $\mathsf{E}[\partial_{z_{2}}\mathsf{h}_{11}(c_{1}s_{1},c_{2}s_{2})s_{1}]$ . Because of this observation we can write
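The vanishing of these expectations is the standard odd-integrand argument. As a short sketch, with $g$ a generic placeholder symbol (not notation from the paper) for any integrable function that is odd in $s_{1}$, and using only the evenness of $\mathsf{f}$ in $s_{1}$ implied by quadrantal symmetry, the change of variable $s_{1}\to-s_{1}$ gives

$$
\mathsf{E}[g(s_{1},s_{2})]
=\iint g(-s_{1},s_{2})\,\mathsf{f}(-s_{1},s_{2})\,ds_{1}\,ds_{2}
=-\iint g(s_{1},s_{2})\,\mathsf{f}(s_{1},s_{2})\,ds_{1}\,ds_{2}
=-\mathsf{E}[g(s_{1},s_{2})],
$$

so that $\mathsf{E}[g(s_{1},s_{2})]=0$. The same reasoning applies with $s_{2}$ in place of $s_{1}$.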

[TABLE]

Let us now find a more convenient expression for the two expectations. First note that $\partial_{z_{1}}\mathsf{h}_{11}(c_{1}s_{1},c_{2}s_{2})=c_{1}^{-1}\partial_{s_{1}}\mathsf{h}_{11}(c_{1}s_{1},c_{2}s_{2})$ . Using this equality we can write

[TABLE]

For the second equality we used integration by parts and (21), and for the last we used (15). Similar computations can be performed for the second expectation in the recursion for $\alpha_{t}$ and for the corresponding terms in the recursions for $\beta_{t},\gamma_{t},\delta_{t}$ . This establishes the validity of the formulas for the two matrices $\mathbf{F},\mathbf{G}$ in (19) and (20) and completes the proof of the lemma.∎

Proof of Theorem 2

As we mentioned, for non-separation we need at least one of the two matrices $\mathbf{F},\mathbf{G}$ to have an eigenvalue which is either positive or zero. This property must be true for all functions $\mathsf{h}_{ij}(z_{1},z_{2})$ with symmetries as in (14). Note that a possible selection of $\mathsf{h}_{ij}(z_{1},z_{2})$ is the following

[TABLE]

which satisfies the system of equations (11) with $c_{1}=c_{2}=1$ . Denote the corresponding $\mathbf{F},\mathbf{G}$ matrices as $\mathbf{F}_{*},\mathbf{G}_{*}$ then, using (19), (20) we obtain

[TABLE]

Both matrices are symmetric and nonpositive definite, so their nonzero eigenvalues are necessarily negative. Instability can therefore occur if and only if at least one of the two matrices has an eigenvalue equal to 0. The latter can happen only when we can find constants $K_{1},K_{2}$ such that $[K_{1}\,-\!\!K_{2}]^{\intercal}$ is an eigenvector to a 0 eigenvalue for $\mathbf{F}_{*}$ or $\mathbf{G}_{*}$ . Because both matrices are symmetric and nonpositive definite, this is possible if and only if at least one of the following two equations is satisfied for all $(s_{1},s_{2})$

[TABLE]

with not necessarily the same constants $K_{1},K_{2}$ . Summarizing: For the specific selection of the $\mathsf{h}_{ij}(z_{1},z_{2})$ functions, at least one of the two equations (25), (26) is required to be true if the sources are non-separable.

Moreover, if (25) or (26) is valid, then for any other selection of $\mathsf{h}_{ij}(z_{1},z_{2})$ at least one of the corresponding $\mathbf{F}$ or $\mathbf{G}$ matrices will also have an eigenvalue equal to 0.

This can be seen from (19), (20) where we have the expression for $\mathbf{F},\mathbf{G}$ for arbitrary $\mathsf{h}_{ij}(z_{1},z_{2})$ . If for example (26) is true then $\mathbf{G}$ in (20) will have the same $[K_{1}\,-\!\!K_{2}]^{\intercal}$ as a right eigenvector corresponding to a 0 eigenvalue. We can therefore conclude that if at least one of (25), (26) is true then $\mathsf{f}(s_{1},s_{2})$ corresponds to non-separable sources.

Let us now examine what type of joint densities $\mathsf{f}(s_{1},s_{2})$ can satisfy (25), (26). We start with (25). Due to the quadrantal symmetry we can limit ourselves to the first quadrant with $s_{1},s_{2}$ nonnegative. Define $z=s_{1}^{K_{2}}s_{2}^{K_{1}}$ then we can express $s_{1}$ in terms of $z$ and $s_{2}$ as $s_{1}=z^{\frac{1}{K_{2}}}s_{2}^{-\frac{K_{1}}{K_{2}}}$ . Call $\omega(z,s_{2})=\mathsf{f}(z^{\frac{1}{K_{2}}}s_{2}^{-\frac{K_{1}}{K_{2}}},s_{2})$ and compute its partial derivative with respect to $s_{2}$ , we have

[TABLE]

with the last equality coming from (25). From $\omega_{s_{2}}(z,s_{2})=0$ we conclude that $\omega(z,s_{2})=\omega(z)$ . Recalling the relationship between $\omega(z,s_{2})$ and $\mathsf{f}(s_{1},s_{2})$ and replacing $z$ with its definition we prove that $\mathsf{f}(s_{1},s_{2})=\omega(|s_{1}|^{K_{2}}|s_{2}|^{K_{1}})$ . It turns out that functions of this form cannot be legitimate joint pdfs. This can be seen by integrating the equality over $s_{2}$ in order to identify the marginal pdf $\mathsf{f}_{1}(s_{1})$ . We note that

[TABLE]

where constant $A$ is defined as $A=\frac{2}{K_{1}}\int_{0}^{\infty}z^{\frac{1}{K_{1}}-1}\omega(z)dz$ . The resulting form of $\mathsf{f}_{1}(s_{1})$ is not an integrable function over the whole real line for any value of the ratio $\frac{K_{2}}{K_{1}}$ and therefore cannot play the role of the marginal $\mathsf{f}_{1}(s_{1})$ . Consequently (25) cannot be satisfied by any joint pdf $\mathsf{f}(s_{1},s_{2})$ .
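The marginal computation can be retraced as follows. This is a sketch that assumes $K_{1}>0$ for the change of variable and uses $p$ as shorthand (an assumption of this sketch) for the ratio $\frac{K_{2}}{K_{1}}$. Substituting $z=|s_{1}|^{K_{2}}s_{2}^{K_{1}}$, that is $s_{2}=z^{\frac{1}{K_{1}}}|s_{1}|^{-\frac{K_{2}}{K_{1}}}$, in the integral over $s_{2}$ gives

$$
\mathsf{f}_{1}(s_{1})
=2\int_{0}^{\infty}\omega\big(|s_{1}|^{K_{2}}s_{2}^{K_{1}}\big)\,ds_{2}
=|s_{1}|^{-\frac{K_{2}}{K_{1}}}\,\frac{2}{K_{1}}\int_{0}^{\infty}z^{\frac{1}{K_{1}}-1}\omega(z)\,dz
=A\,|s_{1}|^{-\frac{K_{2}}{K_{1}}}.
$$

A power law of this form cannot integrate to a finite value over the real line, because

$$
\int_{0}^{1}s^{-p}\,ds<\infty\iff p<1,
\qquad
\int_{1}^{\infty}s^{-p}\,ds<\infty\iff p>1,
$$

and the two conditions are never satisfied simultaneously.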

Let us now analyze in the same way (26). If in this case we define $z=K_{2}s_{1}^{2}+K_{1}s_{2}^{2}$ , solve for $s_{1}$ and follow exactly the same steps as in the previous case, we end up with $\mathsf{f}(s_{1},s_{2})=\omega(K_{2}s_{1}^{2}+K_{1}s_{2}^{2})$ . What is left to show is that $K_{1},K_{2}$ must be of the same sign which, without loss of generality, can be considered positive. Note that if $K_{2}>0$ and $K_{1}<0$ then $K_{2}s_{1}^{2}+K_{1}s_{2}^{2}=r^{2}$ , for fixed $r$ , corresponds to a hyperbola. We can then express $s_{1},s_{2}$ in terms of two alternative variables $r$ and $\theta$ as follows

[TABLE]

where $r\geq 0$ and $\theta$ can be any real. If the joint pdf of $s_{1},s_{2}$ satisfies $\mathsf{f}(s_{1},s_{2})=\omega(K_{2}s_{1}^{2}+K_{1}s_{2}^{2})$ , then we can find the corresponding pdf of $r$ and $\theta$ by applying standard methodology for transformations of random variables, this yields

[TABLE]

The previous equation suggests that $r$ and $\theta$ are independent, that $r$ has a pdf of the form $A\omega(r^{2})r$ , where $A$ is a suitable constant, and that $\theta$ has a pdf equal to $A^{-1}$ , namely, a uniform density. The latter, however, is not possible since $\theta$ takes values on the whole real line and there is no uniform distribution that can support this unbounded range. Regarding this last point, one might argue that the density we are seeking exists and is known as “degenerate uniform distribution”. However, we recall that this function is not an actual density but rather a limit of a regular uniform density whose support increases without limit. The actual limiting function is not a pdf since it is 0 everywhere on the real line. Consequently we cannot have $K_{2}>0$ and $K_{1}<0$ and the only legitimate choice for the joint pdf of a non-separable pair is a function with elliptical quadrantal symmetry. This completes the proof of the theorem. ∎
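For concreteness, one parametrization consistent with the description above is sketched below. It is an assumption and covers only the branch $s_{1}>0$; the exact expressions and normalization used in the proof may differ.

$$
s_{1}=\frac{r\cosh\theta}{\sqrt{K_{2}}},\qquad
s_{2}=\frac{r\sinh\theta}{\sqrt{-K_{1}}},\qquad
K_{2}s_{1}^{2}+K_{1}s_{2}^{2}=r^{2}\big(\cosh^{2}\theta-\sinh^{2}\theta\big)=r^{2}.
$$

The Jacobian of this map is

$$
\left|\frac{\partial(s_{1},s_{2})}{\partial(r,\theta)}\right|
=\frac{r\big(\cosh^{2}\theta-\sinh^{2}\theta\big)}{\sqrt{-K_{1}K_{2}}}
=\frac{r}{\sqrt{-K_{1}K_{2}}},
$$

so the joint density of $(r,\theta)$ is proportional to $\omega(r^{2})\,r$ and does not depend on $\theta$, which is what forces $\theta$ to be uniform over an unbounded range.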

Bibliography

[1] E. Kofidis, Blind source separation: Fundamentals and recent advances, arXiv:1603.03089, 2016.
[2] J-F. Cardoso, Blind signal separation: Statistical principles, Proceedings of the IEEE, vol. 86, no. 10, pp. 2009–2025, 1998.
[3] J-F. Cardoso, High-order contrasts for independent component analysis, Neural Computation, vol. 11, no. 1, pp. 157–192, 1999.
[4] J-F. Cardoso, On the stability of source separation algorithms, Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol. 26, no. 1-2, pp. 7–14.
[5] A. Quiros, S. P. Wilson, Dependent Gaussian mixture models for source separation, EURASIP Journal on Advances in Signal Processing, vol. 2012, no. 1, pp. 239, 2012.
[6] C. A. Estombelo-Montesco, et al., Dependent component analysis for the magnetogastrographic detection of human electrical response activity, Physiological Measurement, vol. 28, pp. 1029–1044, 2007.
[7] R. Li, H. Li, F. Wang, Dependent component analysis: Concepts and main algorithms, Journal of Computers, vol. 5, no. 4, pp. 589–597, 2010.
[8] H. Stögbauer, et al., Least dependent component analysis based on mutual information, arXiv:physics/0405044, 2004.
