The Generalized Complex Kernel Least-Mean-Square Algorithm

Rafael Boloix-Tortosa, Juan José Murillo-Fuentes, Sotirios A. Tsaftaris

arXiv:1902.08692·stat.ML·October 2, 2019

TL;DR

This paper introduces the generalized complex kernel LMS (gCKLMS), an advanced adaptive regression method for complex signals that improves convergence and representation by incorporating a pseudo-kernel, outperforming previous algorithms in nonlinear channel equalization.

Contribution

The paper presents the gCKLMS algorithm, which includes a pseudo-kernel for better complex signal modeling, unifies previous complex KLMS variants, and offers improved convergence and flexibility.

Findings

1. gCKLMS outperforms previous algorithms in convergence speed.
2. The pseudo-kernel enhances modeling of complex signals with different real and imaginary properties.
3. Experimental results confirm significant performance improvements in nonlinear channel equalization.

Abstract

We propose a novel adaptive kernel based regression method for complex-valued signals: the generalized complex-valued kernel least-mean-square (gCKLMS). We borrow from the new results on widely linear reproducing kernel Hilbert space (WL-RKHS) for nonlinear regression and complex-valued signals, recently proposed by the authors. This paper shows that in the adaptive version of the kernel regression for complex-valued signals we need to include another kernel term, the so-called pseudo-kernel. This new solution is endowed with better representation capabilities in complex-valued fields, since it can efficiently decouple the learning of the real and the imaginary part. Also, we review previous realizations of the complex KLMS algorithm and its augmented version to prove that they can be rewritten as particular cases of the gCKLMS. Furthermore, important conclusions on the kernels design…

Tables

Table I: Conditions imposed on the kernel and pseudo-kernel by the algorithms

Algorithm	Kernel term	Pseudo-kernel term	Conditions
gCKLMS	$k = k_{rr} + k_{jj} + j (k_{jr} - k_{rj})$ in (15)	$\tilde{k} = k_{rr} - k_{jj} + j (k_{jr} + k_{rj})$ in (16)	None
CKLMS2	$k = 2 k_{rr} - j 2 k_{rj}$	$\tilde{k} = 0$	$k_{rr} = k_{jj}$, $k_{jr} = - k_{rj}$ in (22)
ACKLMS/CKLMS1	$k = 2 k_{rr} \in \mathbb{R}$	$\tilde{k} = 0$	$k_{rr} = k_{jj}$, $k_{jr} = - k_{rj} = 0$

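Equations

$$[\mathbf{K}(\mathbf{x},\mathbf{x}^{\prime})]_{l,q}=\langle\mathbf{K}_{\mathbf{x}}\mathbf{e}_{l},\mathbf{K}_{\mathbf{x}^{\prime}}\mathbf{e}_{q}\rangle_{\mathcal{H}},\tag{1}$$

$$\mathbf{K}(\mathbf{x},\mathbf{x}^{\prime})=\bar{\boldsymbol{\upphi}}(\mathbf{x})\boldsymbol{\upphi}(\mathbf{x}^{\prime}),\tag{2}$$

$$[\mathbf{K}(\mathbf{x},\mathbf{x}^{\prime})]_{l,q}=\sum_{i\in\{1,\dots,m\}}\phi_{il}(\mathbf{x})\phi_{iq}(\mathbf{x}^{\prime}).\tag{3}$$

$$J=\mathbb{E}\left[(\mathbf{y}_{\mathbb{R}}(i)-\mathbf{f}_{\mathbb{R}}(\mathbf{x}(i)))^{\top}(\mathbf{y}_{\mathbb{R}}(i)-\mathbf{f}_{\mathbb{R}}(\mathbf{x}(i)))\right],\tag{4}$$

$$\mathbf{f}_{\mathbb{R}}(\mathbf{x})=\boldsymbol{\upphi}^{\top}(\mathbf{x})\mathbf{w}=\begin{bmatrix}\boldsymbol{\Phi}_{\textrm{r}}^{\top}(\mathbf{x})\mathbf{w}\\ \boldsymbol{\Phi}_{\textrm{j}}^{\top}(\mathbf{x})\mathbf{w}\end{bmatrix}.\tag{5}$$

$$J(\mathbf{w})=\mathbb{E}\left[(\mathbf{y}_{\mathbb{R}}(i)-\boldsymbol{\upphi}^{\top}(\mathbf{x}(i))\mathbf{w})^{\top}(\mathbf{y}_{\mathbb{R}}(i)-\boldsymbol{\upphi}^{\top}(\mathbf{x}(i))\mathbf{w})\right].\tag{6}$$

$$\frac{\partial J(\mathbf{w})}{\partial\mathbf{w}}=-2\,\mathbb{E}\left[\boldsymbol{\upphi}(\mathbf{x}(i))\mathbf{e}_{\mathbb{R}}(i)\right],\tag{7}$$

$$\mathbf{w}(i)=\mathbf{w}(i-1)+2\mu\,\boldsymbol{\upphi}(\mathbf{x}(i))\mathbf{e}_{\mathbb{R}}(i).\tag{8}$$

$$\mathbf{w}(i)=2\mu\sum_{l=1}^{i}\boldsymbol{\upphi}(\mathbf{x}(l))\mathbf{e}_{\mathbb{R}}(l).\tag{9}$$

$$\begin{aligned}\hat{\mathbf{y}}_{\mathbb{R}}(i)&=2\mu\sum_{l=1}^{i-1}\boldsymbol{\upphi}^{\top}(\mathbf{x}(i))\boldsymbol{\upphi}(\mathbf{x}(l))\mathbf{e}_{\mathbb{R}}(l)\\ &=2\mu\sum_{l=1}^{i-1}\mathbf{K}(\mathbf{x}(i),\mathbf{x}(l))\mathbf{e}_{\mathbb{R}}(l),\end{aligned}\tag{10}$$

$$\begin{aligned}\mathbf{K}(\mathbf{x}(i),\mathbf{x}(l))&=\begin{bmatrix}\boldsymbol{\Phi}_{\textrm{r}}^{\top}(\mathbf{x}(i))\\ \boldsymbol{\Phi}_{\textrm{j}}^{\top}(\mathbf{x}(i))\end{bmatrix}\begin{bmatrix}\boldsymbol{\Phi}_{\textrm{r}}(\mathbf{x}(l))&\boldsymbol{\Phi}_{\textrm{j}}(\mathbf{x}(l))\end{bmatrix}\\ &=\begin{bmatrix}\boldsymbol{\Phi}_{\textrm{r}}^{\top}(\mathbf{x}(i))\boldsymbol{\Phi}_{\textrm{r}}(\mathbf{x}(l))&\boldsymbol{\Phi}_{\textrm{r}}^{\top}(\mathbf{x}(i))\boldsymbol{\Phi}_{\textrm{j}}(\mathbf{x}(l))\\ \boldsymbol{\Phi}_{\textrm{j}}^{\top}(\mathbf{x}(i))\boldsymbol{\Phi}_{\textrm{r}}(\mathbf{x}(l))&\boldsymbol{\Phi}_{\textrm{j}}^{\top}(\mathbf{x}(i))\boldsymbol{\Phi}_{\textrm{j}}(\mathbf{x}(l))\end{bmatrix}\\ &=\begin{bmatrix}k_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))&k_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l))\\ k_{\textrm{jr}}(\mathbf{x}(i),\mathbf{x}(l))&k_{\textrm{jj}}(\mathbf{x}(i),\mathbf{x}(l))\end{bmatrix}.\end{aligned}\tag{11}$$

$$\mathbf{T}_{n}=\begin{bmatrix}\mathbf{I}&\textrm{j}\mathbf{I}\\ \mathbf{I}&-\textrm{j}\mathbf{I}\end{bmatrix}\in\mathbb{C}^{2n\times 2n},\tag{12}$$

$$\begin{aligned}\hat{\underline{\mathbf{y}}}(i)&=2\mu\sum_{l=1}^{i-1}\mathbf{T}_{1}\mathbf{K}(\mathbf{x}(i),\mathbf{x}(l))\left(\tfrac{1}{2}\mathbf{T}_{1}^{\mathrm{H}}\mathbf{T}_{1}\right)\mathbf{e}_{\mathbb{R}}(l)\\ &=\mu\sum_{l=1}^{i-1}\mathbf{K}_{A}(\mathbf{x}(i),\mathbf{x}(l))\underline{\mathbf{e}}(l),\end{aligned}\tag{13}$$

$$\mathbf{K}_{A}(\mathbf{x}(i),\mathbf{x}(l))=\begin{bmatrix}k(\mathbf{x}(i),\mathbf{x}(l))&\tilde{k}(\mathbf{x}(i),\mathbf{x}(l))\\ \tilde{k}^{*}(\mathbf{x}(i),\mathbf{x}(l))&k^{*}(\mathbf{x}(i),\mathbf{x}(l))\end{bmatrix},\tag{14}$$

$$\begin{aligned}k(\mathbf{x}(i),\mathbf{x}(l))&=k_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))+k_{\textrm{jj}}(\mathbf{x}(i),\mathbf{x}(l))\\ &\quad+\textrm{j}\left(k_{\textrm{jr}}(\mathbf{x}(i),\mathbf{x}(l))-k_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l))\right),\end{aligned}\tag{15}$$

$$\begin{aligned}\tilde{k}(\mathbf{x}(i),\mathbf{x}(l))&=k_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))-k_{\textrm{jj}}(\mathbf{x}(i),\mathbf{x}(l))\\ &\quad+\textrm{j}\left(k_{\textrm{jr}}(\mathbf{x}(i),\mathbf{x}(l))+k_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l))\right).\end{aligned}\tag{16}$$

$$\hat{y}(i)=\mu\sum_{l=1}^{i-1}e(l)\,k(\mathbf{x}(i),\mathbf{x}(l))+\mu\sum_{l=1}^{i-1}e^{*}(l)\,\tilde{k}(\mathbf{x}(i),\mathbf{x}(l)).\tag{17}$$

$$\hat{y}(i)=\mu\sum_{l=1}^{i-1}e(l)\,k(\mathbf{x}(i),\mathbf{x}(l)).\tag{18}$$

$$\hat{y}(i)=\mu\sum_{l=1}^{i-1}2e(l)\,k_{\mathcal{R}}(\mathbf{x}(i),\mathbf{x}(l)).\tag{19}$$

$$\hat{y}(i)=\mu\sum_{l=1}^{i-1}\left(e(l)\,k(\mathbf{x}(i),\mathbf{x}(l))+e(l)\,k^{*}(\mathbf{x}(i),\mathbf{x}(l))\right).\tag{20}$$

$$\begin{aligned}\hat{y}(i)&=\mu\sum_{l=1}^{i-1}e(l)\left(k(\mathbf{x}(i),\mathbf{x}(l))+k^{*}(\mathbf{x}(i),\mathbf{x}(l))\right)\\ &=\mu\sum_{l=1}^{i-1}2e(l)\,\mathcal{R}\left\{k(\mathbf{x}(i),\mathbf{x}(l))\right\}.\end{aligned}\tag{21}$$

$$\begin{aligned}k_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))&=k_{\textrm{jj}}(\mathbf{x}(i),\mathbf{x}(l)),\\ k_{\textrm{jr}}(\mathbf{x}(i),\mathbf{x}(l))&=-k_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l)),\end{aligned}\tag{22}$$

$$k(\mathbf{x}(i),\mathbf{x}(l))=2k_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))-\textrm{j}2k_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l)).\tag{23}$$

$$k_{\mathbb{C}G}(\mathbf{x},\mathbf{x}^{\prime})$$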

Full text

The Generalized Complex Kernel Least-Mean-Square Algorithm

Rafael Boloix-Tortosa*, Juan José Murillo-Fuentes, Sotirios A. Tsaftaris

R. Boloix-Tortosa and J. J. Murillo-Fuentes are with the Dep. de Teoría de la Señal y Comunicaciones, Escuela Técnica Superior de Ingeniería, Universidad de Sevilla, Camino de los Descubrimientos sn, 41092 Sevilla, Spain. E-mail: {rboloix,murillo}@us.es. S. A. Tsaftaris is with the School of Engineering, The University of Edinburgh, Edinburgh EH8 3FB, U.K., and also with The Alan Turing Institute, London NW1 2DB, U.K. Thanks to the Spanish Government (Ministerio de Economía y Competitividad, TEC2016-78434-C3-02-R, and Ministerio de Educación, Cultura y Deporte, Subprograma Estatal de Movilidad (PRX18/00523), del Plan Estatal de I+D+I) and European Union (FEDER) for funding.

Abstract

We propose a novel adaptive kernel based regression method for complex-valued signals: the generalized complex-valued kernel least-mean-square (gCKLMS). We borrow from the new results on widely linear reproducing kernel Hilbert space (WL-RKHS) for nonlinear regression and complex-valued signals, recently proposed by the authors. This paper shows that in the adaptive version of the kernel regression for complex-valued signals we need to include another kernel term, the so-called pseudo-kernel. This new solution is endowed with better representation capabilities in complex-valued fields, since it can efficiently decouple the learning of the real and the imaginary part. Also, we review previous realizations of the complex KLMS algorithm and its augmented version to prove that they can be rewritten as particular cases of the gCKLMS. Furthermore, important conclusions on the kernels design are drawn that help to greatly improve the convergence of the algorithms. In the experiments, we revisit the nonlinear channel equalization problem to highlight the better convergence of the gCKLMS compared to previous solutions. Also, the flexibility of the proposed generalized approach is tested in a second experiment with non-independent real and imaginary parts. The results illustrate the significant performance improvements of the gCKLMS approach when the complex-valued signals have different properties for the real and imaginary parts.

Index Terms:

LMS, complex-valued, RKHS, kernel methods.

I Introduction

Complex-valued signals model many systems in diverse applications such as electromagnetism, telecommunications, optics or acoustics, among others. Complex-valued signal processing is thus of fundamental interest as it provides a natural way to represent some signals and transformations involved in those systems. While the linear case has been widely studied (see for example [1] and references therein), nonlinear processing still remains an open problem. Nonlinear processing of complex-valued signals has been tackled, among others, from the point of view of neural networks [2], [3], nonlinear adaptive filtering [4], or reproducing kernel Hilbert spaces (RKHS) [5]. This latter field is gaining increasing interest within the signal processing community as it provides a simple but elegant way to treat nonlinearities. Complex kernel-based algorithms have been lately proposed for regression [6, 7, 8], kernel principal component analysis [9] or classification [10].

Regarding complex-valued regression within the RKHS framework, we have recently highlighted in [11] the need for a new term: the pseudo-kernel. We redefined the kernel-based regularized least squares regression to include the pseudo-kernel, and the resulting structure resembles that of the widely linear (WL) solutions, being capable of learning any complex-valued function effectively. As discussed in [11], the need for a pseudo-kernel can be justified in cases where the real and imaginary parts are correlated and learning them independently is, at best, suboptimal. Also, a pseudo-kernel is needed when the real and imaginary parts are not best represented by the same kernel, i.e., the same measure of similarity. Furthermore, we analyzed in [11] the structure of the kernel and pseudo-kernel, and discussed how to design these functions and when they should be real or complex-valued. As a result, two important remarks were made. First, if the real and imaginary parts of the output are independent, then the kernel and pseudo-kernel should be real-valued. Second, if the real and imaginary parts of the output have different properties in terms of similarity, the pseudo-kernel is needed. Conversely, the pseudo-kernel vanishes if the real and imaginary parts of the output are independent but have the same properties in terms of similarity, i.e., the same kernel can be used for the real and imaginary parts.

In the design of adaptive nonlinear approaches, the authors in [6] address the problem of adaptive filtering of complex signals and calculate the gradient of cost functions by using Wirtinger’s derivatives. Two alternatives are described. The first alternative proposes using real kernels, by means of the technique called complexification of real RKHSs. The second one proposes the use of complex kernels, in particular, the complex Gaussian kernel [10]. By means of these two alternatives they develop two realizations of the kernel least-mean-square (KLMS) algorithm [12]. The same complex Gaussian kernel is also adopted in [13] and in [14], where they propose to introduce WL adaptive filters in complex RKHS to solve a nonlinear filtering task. Augmented or WL filters consider both the original values of the signal data and their conjugates [15], and are able to capture the full second-order statistical characteristics of the signal [16, 17]. This is highlighted in [14] as a key starting point to develop the augmented complex KLMS. The authors remark that “the natural choice for kernels, in the context of the WL filtering structure, are the pure complex kernels”. Other augmented complex kernel algorithms have also been proposed [18, 19, 20].

In this paper, we propose a novel generalized formulation for the adaptive complex KLMS algorithm. In light of the findings in [11], we herein develop the generalized complex KLMS algorithm, which we call gCKLMS and which includes a kernel and a pseudo-kernel term. We show that the kernel and pseudo-kernel in the gCKLMS have the same structures found in [11], and we can use the analysis in that work to design these two functions. Unlike in [14], we conclude that a complex kernel is not always the best choice for the adaptive complex KLMS algorithm, as we will show later in the experiments in Section VI. We also show that previously proposed complex or augmented complex KLMS algorithms are limited, as they are just particular simplifications of the more general formulation proposed in this paper.

Our starting point is the definition of the RKHS for the composite representation of complex-valued functions. This is the representation of a complex-valued function as a two-dimensional real-valued vector function, obtained by stacking the real part of the function over the imaginary part. We devote Section II to reviewing the theory of kernels for multi-task learning [21] as a suitable feature map representation of the composite function. In Section III we develop the KLMS algorithm for the composite representation. The composite representation of a complex-valued function is related to its augmented representation [17]. This is the representation as a two-dimensional complex vector with the complex-valued function on top of its complex conjugate. By using this relationship, the formulation for the gCKLMS algorithm is found in Section IV. In this section we also show the equations of the kernel and pseudo-kernel terms. In Section V we compare the gCKLMS with other complex KLMS algorithms in the literature to show that they are particular cases of the gCKLMS. Experiments are included in Section VI, where the gCKLMS algorithm is tested first in the context of a nonlinear channel equalization task, and then in the learning of samples of a filtered random process. These experiments show that the gCKLMS outperforms other KLMS algorithms, as it has both a kernel and a pseudo-kernel term. By making use of the remarks in [11], we have more suitable designs for the kernel and the pseudo-kernel that greatly improve the predictions. We end the paper with some conclusions in Section VII.

In the notation used throughout the paper, bold lower-case letters denote vectors, while bold upper-case letters denote matrices. For a matrix $\mathbf{A}$ , $[{\mathbf{A}}]_{{l},{q}}$ is its $(l,q)$ entry. To denote the $i$ -th sample of a vector or signal we use, respectively, $\mathbf{a}(i)$ and $a(i)$ . ${\mathcal{R}}\left\{a\right\}$ is the real part of $a$ . The transpose is denoted by $^{\top}$ , the Hermitian (conjugate transpose) by $^{\mathrm{H}}$ , and complex conjugation by $^{*}$ . $\mathbb{E}[\cdot]$ is the expectation operator.

II RKHS of composite vector-valued functions

A complex function $f(\mathbf{x})=f_{\textrm{r}}(\mathbf{x})+\textrm{j}f_{\textrm{j}}(\mathbf{x})$ can be represented as a composite vector-valued function $\mathbf{f}_{\mathbb{R}}{(\mathbf{x})}=[f_{\textrm{r}}(\mathbf{x})\;f_{\textrm{j}}(\mathbf{x})]^{\top}\in\mathbb{R}^{2}$ , also known as the dual real channel (DRC) formulation, by stacking its real part on its imaginary part. The definition of the RKHS for vector-valued functions [21] parallels the one for scalar functions [22], with the main difference that the reproducing kernel is now matrix-valued [23], [21].

Let $\mathcal{H}$ be a Hilbert space of functions $\mathbf{f}$ on a set $\mathcal{X}$ with values in $\mathcal{Y}$ . $\mathcal{H}$ is a RKHS when for any $\mathbf{x}\in\mathcal{X}$ and any $\mathbf{{y}}\in\mathcal{Y}$ the linear functional which maps $\mathbf{f}$ to $(\mathbf{{y}},\mathbf{f}(\mathbf{x}))_{\mathcal{Y}}$ is continuous on $\mathcal{H}$ [21]. Here, $(\cdot,\cdot)_{\mathcal{Y}}$ represents the inner product in the Hilbert space $\mathcal{Y}$ , while $\langle\cdot,\cdot\rangle_{\mathcal{H}}$ is the inner product in $\mathcal{H}$ .

From the Riesz Lemma, for every $\mathbf{x}\in\mathcal{X}$ and $\mathbf{{y}}\in\mathcal{Y}$ there is a linear operator ${\mathbf{K}}_{\mathbf{x}}:\mathcal{Y}\rightarrow\mathcal{H}$ , such that $(\mathbf{{y}},\mathbf{f}(\mathbf{x}))_{\mathcal{Y}}=\langle{\mathbf{K}}_{\mathbf{x}}\mathbf{{y}},\mathbf{f}\rangle_{\mathcal{H}}$ . Let us now introduce the linear operator ${\mathbf{K}}(\mathbf{x},\mathbf{x}^{\prime}):\mathcal{Y}\rightarrow\mathcal{Y}$ , for every $\mathbf{x},\mathbf{x}^{\prime}\in\mathcal{X}$ , defined by ${\mathbf{K}}(\mathbf{x},\mathbf{x}^{\prime})\mathbf{{y}}:=({\mathbf{K}}_{\mathbf{x}^{\prime}}\mathbf{{y}})(\mathbf{x})$ .

We say that ${\mathbf{K}}:\mathcal{X}\times\mathcal{X}\rightarrow\mathcal{L(Y)}$ , where $\mathcal{L(Y)}$ denotes the set of all bounded linear operators from $\mathcal{Y}$ to itself, is a matrix-valued kernel [21] (or operator-valued kernel if $\mathcal{Y}$ is not finite dimensional [24]) if it satisfies the following properties for every $\mathbf{x},\mathbf{x}^{\prime}\in\mathcal{X}$ :

(a) For every $\mathbf{{y}},\mathbf{{y}}^{\prime}\in\mathcal{Y}$ , we have $(\mathbf{{y}},{\mathbf{K}}(\mathbf{x},\mathbf{x}^{\prime})\mathbf{{y}}^{\prime})_{\mathcal{Y}}=\langle{\mathbf{K}}_{\mathbf{x}}\mathbf{{y}},{\mathbf{K}}_{\mathbf{x}^{\prime}}\mathbf{{y}}^{\prime}\rangle_{\mathcal{H}}$ .

(b) ${\mathbf{K}}(\mathbf{x},\mathbf{x}^{\prime})=\bar{{\mathbf{K}}}(\mathbf{x}^{\prime},\mathbf{x})$ , and ${\mathbf{K}}(\mathbf{x},\mathbf{x})\in\mathcal{L_{+}(Y)}$ , where $\bar{{\mathbf{K}}}$ denotes the adjoint and $\mathcal{L_{+}(Y)}$ the set of all positive semi-definite bounded linear operators, i.e., $(\mathbf{{y}},{\mathbf{K}}(\mathbf{x},\mathbf{x})\mathbf{{y}})_{\mathcal{Y}}\geqslant 0$ for any $\mathbf{{y}}\in\mathcal{Y}$ .

(c) For any positive integer $m$ , we have that $\sum_{l,q\in\{1,\cdots,m\}}(\mathbf{{y}}_{q},{\mathbf{K}}(\mathbf{x}_{q},\mathbf{x}_{l})\mathbf{{y}}_{l})_{\mathcal{Y}}\geqslant 0$ , for any $\mathbf{x}_{l},\mathbf{x}_{q}\in\mathcal{X}$ , $\mathbf{{y}}_{l},\mathbf{{y}}_{q}\in\mathcal{Y}$ .

Proof of these properties can be found in [21]. Also, it can be shown that if ${\mathbf{K}}$ is a kernel then there exists a unique (up to an isometry) RKHS of functions from $\mathcal{X}$ to $\mathcal{Y}$ which admits ${\mathbf{K}}$ as the reproducing kernel.

In the case of $\mathcal{Y}=\mathbb{R}^{2}$ , the kernel function ${\mathbf{K}}$ takes values as $2\times 2$ matrices and, from property (a), the matrix elements can be found as:

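$$[\mathbf{K}(\mathbf{x},\mathbf{x}^{\prime})]_{l,q}=\langle\mathbf{K}_{\mathbf{x}}\mathbf{e}_{l},\mathbf{K}_{\mathbf{x}^{\prime}}\mathbf{e}_{q}\rangle_{\mathcal{H}},\tag{1}$$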

where $\mathbf{e}_{l},\mathbf{e}_{q}$ are the standard coordinate bases in $\mathbb{R}^{2}$ , for ${l,q\in\{1,2\}}$ .

II-A Feature map

We next define a suitable feature map representation for the matrix-valued kernel that will be later useful in deriving the gCKLMS algorithm.

Every kernel ${\mathbf{K}}$ admits a feature map representation. A feature map is a continuous function $\boldsymbol{\upphi}:\mathcal{X}\rightarrow\mathcal{L(Y,W)}$ , where $\mathcal{L(Y,W)}$ denotes all bounded linear operators from $\mathcal{Y}$ into the feature Hilbert space $\mathcal{W}$ [24]. If $\bar{\boldsymbol{\upphi}}(\mathbf{x})$ is the adjoint of ${\boldsymbol{\upphi}}(\mathbf{x})$ , it is in $\mathcal{L(W,Y)}$ , and

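$$\mathbf{K}(\mathbf{x},\mathbf{x}^{\prime})=\bar{\boldsymbol{\upphi}}(\mathbf{x})\boldsymbol{\upphi}(\mathbf{x}^{\prime}),\tag{2}$$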

for any $\mathbf{x},\mathbf{x}^{\prime}\in\mathcal{X}$ .

In the case of finite-dimensional Hilbert spaces $\mathcal{Y}=\mathbb{R}^{2}$ and $\mathcal{W}=\mathbb{R}^{m}$ , relative to the standard bases of both spaces, $\boldsymbol{\upphi}(\mathbf{x})$ is an $m\times 2$ matrix. Each entry of this matrix, $[{\boldsymbol{\upphi}(\mathbf{x})}]_{{p},{q}}=\phi_{pq}(\mathbf{x})$ , is a scalar-valued continuous function of $\mathbf{x}\in\mathcal{X}$ , and each entry of the kernel is

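$$[\mathbf{K}(\mathbf{x},\mathbf{x}^{\prime})]_{l,q}=\sum_{i\in\{1,\dots,m\}}\phi_{il}(\mathbf{x})\phi_{iq}(\mathbf{x}^{\prime}).\tag{3}$$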

Note that when $\mathcal{Y}=\mathbb{R}$ , then ${\boldsymbol{\upphi}}(\mathbf{x})\in\mathcal{W}$ , but this is not the case here.
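As an illustration of how a matrix-valued kernel arises from a feature map, the following minimal sketch builds the $2\times 2$ kernel entry by entry as a sum over the rows of $\boldsymbol{\upphi}$. The three-row feature map `feature_map` and the real scalar inputs are hypothetical choices made only for this example; they are not taken from the paper.

```python
import math

def feature_map(x):
    """Hypothetical 3x2 feature map phi(x) for a real scalar input x.

    Column 0 plays the role of Phi_r(x), column 1 the role of Phi_j(x).
    """
    return [
        [1.0, x],
        [x, math.sin(x)],
        [x * x, math.cos(x)],
    ]

def matrix_kernel(x, x_prime):
    """2x2 kernel with entries [K]_{l,q} = sum_i phi_{il}(x) * phi_{iq}(x')."""
    phi_x, phi_xp = feature_map(x), feature_map(x_prime)
    m = len(phi_x)
    return [
        [sum(phi_x[i][l] * phi_xp[i][q] for i in range(m)) for q in range(2)]
        for l in range(2)
    ]

K = matrix_kernel(0.5, -1.2)          # K(x, x')
K_swapped = matrix_kernel(-1.2, 0.5)  # K(x', x)
K_same = matrix_kernel(0.5, 0.5)      # K(x, x)
```

In this sketch `K` equals the transpose of `K_swapped`, and `K_same` is positive semi-definite, in line with property (b).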

III The composite KLMS algorithm

Consider the training sequence of input-output pairs $\{(\mathbf{x}(1),{y}(1)),...,(\mathbf{x}(N),{y}(N))\}$ where ${y}(n)\in\mathbb{C}$ and $\mathbf{x}(n)\in\mathbb{C}^{d}$ . The goal is to uncover the underlying complex-valued function $f(\mathbf{x}(i))$ based on these examples, so as to minimize the mean square error $J=\mathbb{E}[|{y}(i)-f(\mathbf{x}(i))|^{2}]=\mathbb{E}[|e(i)|^{2}]$ . By using the composite notation, this can be written as

$$J=\mathbb{E}\left[(\mathbf{y}_{\mathbb{R}}(i)-\mathbf{f}_{\mathbb{R}}(\mathbf{x}(i)))^{\top}(\mathbf{y}_{\mathbb{R}}(i)-\mathbf{f}_{\mathbb{R}}(\mathbf{x}(i)))\right],\tag{4}$$

where $\mathbf{f}_{\mathbb{R}}{(\mathbf{x})}=[f_{\textrm{r}}(\mathbf{x})\;f_{\textrm{j}}(\mathbf{x})]^{\top}$ and $\mathbf{{y}}_{\mathbb{R}}=[{y}_{\textrm{r}}\;{y}_{\textrm{j}}]^{\top}$ .

The least-mean-square (LMS) algorithm would consider a linear input-output mapping, i.e., $f(\mathbf{x}(i))=\mathbf{w}^{\mathrm{H}}\mathbf{x}(i)$ , and compute the weight vector $\mathbf{w}$ adaptively using stochastic gradient descent updates [25]. However, instead of a direct linear input-output mapping, the KLMS [12] is performed on the transformed inputs by using the feature map. We propose here to use the composite notations and the theory for RKHS of composite vector-valued functions described in the previous section. Therefore, we use the feature map $\boldsymbol{\upphi}:\mathcal{X}\rightarrow\mathcal{L(Y,W)}$ and set $\mathbf{f}_{\mathbb{R}}(\mathbf{x})=\bar{\boldsymbol{\upphi}}(\mathbf{x})\mathbf{w}$ , where $\mathbf{w}\in\mathcal{W}$ .

Note that in the general case $\mathcal{W}$ could be an infinite dimensional Hilbert space. For the particular case of $\mathcal{W}=\mathbb{R}^{m}$ , since $\mathcal{Y}=\mathbb{R}^{2}$ then ${\boldsymbol{\upphi}}(\mathbf{x})=[\boldsymbol{\Phi}_{\textrm{r}}(\mathbf{x})\;\boldsymbol{\Phi}_{\textrm{j}}(\mathbf{x})]$ is an $m\times 2$ matrix, where $\boldsymbol{\Phi}_{\textrm{r}}(\mathbf{x})$ and $\boldsymbol{\Phi}_{\textrm{j}}(\mathbf{x})$ are its first and second column, respectively, and $\bar{\boldsymbol{\upphi}}(\mathbf{x})={\boldsymbol{\upphi}}^{\top}(\mathbf{x})$ :

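$$\mathbf{f}_{\mathbb{R}}(\mathbf{x})=\boldsymbol{\upphi}^{\top}(\mathbf{x})\mathbf{w}=\begin{bmatrix}\boldsymbol{\Phi}_{\textrm{r}}^{\top}(\mathbf{x})\mathbf{w}\\ \boldsymbol{\Phi}_{\textrm{j}}^{\top}(\mathbf{x})\mathbf{w}\end{bmatrix}.\tag{5}$$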

The objective is now the minimization of

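$$J(\mathbf{w})=\mathbb{E}\left[(\mathbf{y}_{\mathbb{R}}(i)-\boldsymbol{\upphi}^{\top}(\mathbf{x}(i))\mathbf{w})^{\top}(\mathbf{y}_{\mathbb{R}}(i)-\boldsymbol{\upphi}^{\top}(\mathbf{x}(i))\mathbf{w})\right].\tag{6}$$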

It is easy to show that the gradient is

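$$\frac{\partial J(\mathbf{w})}{\partial\mathbf{w}}=-2\,\mathbb{E}\left[\boldsymbol{\upphi}(\mathbf{x}(i))\mathbf{e}_{\mathbb{R}}(i)\right],\tag{7}$$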

and the update equation for $\mathbf{w}$ using the stochastic gradient yields

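$$\mathbf{w}(i)=\mathbf{w}(i-1)+2\mu\,\boldsymbol{\upphi}(\mathbf{x}(i))\mathbf{e}_{\mathbb{R}}(i).\tag{8}$$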

If we set $\mathbf{w}(0)=\mathbf{0}$ , the repeated application of the weight-update equation (8) yields

$$\mathbf{w}(i)=2\mu\sum_{l=1}^{i}\boldsymbol{\upphi}(\mathbf{x}(l))\mathbf{e}_{\mathbb{R}}(l).\tag{9}$$

At instant $i$ the output can be estimated using the last updated weights, $\mathbf{w}(i-1)$ , as $\hat{\mathbf{{y}}}_{\mathbb{R}}(i)=\mathbf{f}_{\mathbb{R}}(\mathbf{x}(i))=\boldsymbol{\upphi}^{\top}(\mathbf{x}(i))\mathbf{w}(i-1)$ . Therefore, the input-output operation of the composite KLMS algorithm can be expressed as

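$$\begin{aligned}\hat{\mathbf{y}}_{\mathbb{R}}(i)&=2\mu\sum_{l=1}^{i-1}\boldsymbol{\upphi}^{\top}(\mathbf{x}(i))\boldsymbol{\upphi}(\mathbf{x}(l))\mathbf{e}_{\mathbb{R}}(l)\\ &=2\mu\sum_{l=1}^{i-1}\mathbf{K}(\mathbf{x}(i),\mathbf{x}(l))\mathbf{e}_{\mathbb{R}}(l),\end{aligned}\tag{10}$$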

where the matrix-valued kernel yields:

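$$\begin{aligned}\mathbf{K}(\mathbf{x}(i),\mathbf{x}(l))&=\begin{bmatrix}\boldsymbol{\Phi}_{\textrm{r}}^{\top}(\mathbf{x}(i))\\ \boldsymbol{\Phi}_{\textrm{j}}^{\top}(\mathbf{x}(i))\end{bmatrix}\begin{bmatrix}\boldsymbol{\Phi}_{\textrm{r}}(\mathbf{x}(l))&\boldsymbol{\Phi}_{\textrm{j}}(\mathbf{x}(l))\end{bmatrix}\\ &=\begin{bmatrix}\boldsymbol{\Phi}_{\textrm{r}}^{\top}(\mathbf{x}(i))\boldsymbol{\Phi}_{\textrm{r}}(\mathbf{x}(l))&\boldsymbol{\Phi}_{\textrm{r}}^{\top}(\mathbf{x}(i))\boldsymbol{\Phi}_{\textrm{j}}(\mathbf{x}(l))\\ \boldsymbol{\Phi}_{\textrm{j}}^{\top}(\mathbf{x}(i))\boldsymbol{\Phi}_{\textrm{r}}(\mathbf{x}(l))&\boldsymbol{\Phi}_{\textrm{j}}^{\top}(\mathbf{x}(i))\boldsymbol{\Phi}_{\textrm{j}}(\mathbf{x}(l))\end{bmatrix}\\ &=\begin{bmatrix}k_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))&k_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l))\\ k_{\textrm{jr}}(\mathbf{x}(i),\mathbf{x}(l))&k_{\textrm{jj}}(\mathbf{x}(i),\mathbf{x}(l))\end{bmatrix}.\end{aligned}\tag{11}$$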

Notice that this kernel matrix follows the structure introduced in [11] for the WL-RKHS, and is composed of four scalar real functions.
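The input-output operation of the composite KLMS, in which the estimate at instant $i$ is $2\mu$ times the sum over past samples of the $2\times 2$ kernel matrix applied to the stored composite error, can be sketched as follows. This is a minimal illustration under stated assumptions: scalar complex inputs, a real Gaussian similarity of the form $\exp(-|x-x^{\prime}|^{2}/\gamma)$ for $k_{\textrm{rr}}$ and $k_{\textrm{jj}}$, and $k_{\textrm{rj}}=k_{\textrm{jr}}=0$. The function names and parameter values are hypothetical.

```python
import math

def gauss(x, xp, gamma):
    """Assumed real Gaussian similarity exp(-|x - x'|^2 / gamma) for complex scalars."""
    return math.exp(-abs(x - xp) ** 2 / gamma)

def kernel_matrix(x, xp):
    """2x2 real kernel [[k_rr, k_rj], [k_jr, k_jj]]; cross terms set to 0 (assumption)."""
    return [[gauss(x, xp, 1.0), 0.0], [0.0, gauss(x, xp, 4.0)]]

def composite_klms(xs, ys, mu):
    """Run y_hat_R(i) = 2*mu * sum_{l<i} K(x(i), x(l)) e_R(l); return the stored errors."""
    errors = []  # each e_R(l) is stored as a (real part, imaginary part) pair
    for i, (x, y) in enumerate(zip(xs, ys)):
        y_hat = [0.0, 0.0]
        for l in range(i):
            K = kernel_matrix(x, xs[l])
            e = errors[l]
            y_hat[0] += 2 * mu * (K[0][0] * e[0] + K[0][1] * e[1])
            y_hat[1] += 2 * mu * (K[1][0] * e[0] + K[1][1] * e[1])
        # The a priori error is computed before sample i joins the expansion.
        errors.append((y.real - y_hat[0], y.imag - y_hat[1]))
    return errors

# Repeating the same sample: with mu = 0.25 the error halves at every step.
errs = composite_klms([0j] * 4, [1 + 2j] * 4, mu=0.25)
```

Note that the weight vector $\mathbf{w}$ never appears explicitly: only the past inputs and errors are stored, which is what makes the kernel formulation usable when $\mathcal{W}$ is infinite dimensional.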

IV The proposed generalized complex KLMS algorithm

Any real-valued composite vector representation $\mathbf{{y}}_{\mathbb{R}}=[\mathbf{{y}}^{\top}_{\textrm{r}}\;\mathbf{{y}}^{\top}_{\textrm{j}}]^{\top}\in\mathbb{R}^{2n}$ of any complex-valued vector $\mathbf{{y}}=\mathbf{{y}}_{\textrm{r}}+\textrm{j}\mathbf{{y}}_{\textrm{j}}\in\mathbb{C}^{n}$ , can be related to the complex augmented vector $\underline{\mathbf{{y}}}=[\mathbf{{y}}^{\top}\;\mathbf{{y}}^{\mathrm{H}}]^{\top}\in\mathbb{C}^{2n}$ representation, which is obtained by stacking $\mathbf{{y}}$ on top of its complex conjugate $\mathbf{{y}}^{*}$ . The relation is $\underline{\mathbf{{y}}}=\mathbf{T}_{n}\mathbf{{y}}_{\mathbb{R}}$ , where

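$$\mathbf{T}_{n}=\begin{bmatrix}\mathbf{I}&\textrm{j}\mathbf{I}\\ \mathbf{I}&-\textrm{j}\mathbf{I}\end{bmatrix}\in\mathbb{C}^{2n\times 2n},\tag{12}$$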

which is a unitary matrix up to a factor of 2: $\mathbf{T}_{n}\mathbf{T}_{n}^{\mathrm{H}}=\mathbf{T}_{n}^{\mathrm{H}}\mathbf{T}_{n}=2\mathbf{I}$ , where $\mathbf{I}$ is the identity matrix.

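We can now apply this relation to (10) to calculate:

$$\begin{aligned}\hat{\underline{\mathbf{y}}}(i)&=2\mu\sum_{l=1}^{i-1}\mathbf{T}_{1}\mathbf{K}(\mathbf{x}(i),\mathbf{x}(l))\left(\tfrac{1}{2}\mathbf{T}_{1}^{\mathrm{H}}\mathbf{T}_{1}\right)\mathbf{e}_{\mathbb{R}}(l)\\ &=\mu\sum_{l=1}^{i-1}\mathbf{K}_{A}(\mathbf{x}(i),\mathbf{x}(l))\underline{\mathbf{e}}(l),\end{aligned}\tag{13}$$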

Here we have the augmented error vector $\underline{\mathbf{e}}(l)=\mathbf{T}_{1}\mathbf{e}_{\mathbb{R}}(l)=[e(l)\;e^{*}(l)]^{\top}$ , and the augmented kernel matrix

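$$\mathbf{K}_{A}(\mathbf{x}(i),\mathbf{x}(l))=\begin{bmatrix}k(\mathbf{x}(i),\mathbf{x}(l))&\tilde{k}(\mathbf{x}(i),\mathbf{x}(l))\\ \tilde{k}^{*}(\mathbf{x}(i),\mathbf{x}(l))&k^{*}(\mathbf{x}(i),\mathbf{x}(l))\end{bmatrix},\tag{14}$$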

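where, by using (11), the complex kernel and complex pseudo-kernel can be identified, respectively, as

$$\begin{aligned}k(\mathbf{x}(i),\mathbf{x}(l))&=k_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))+k_{\textrm{jj}}(\mathbf{x}(i),\mathbf{x}(l))\\ &\quad+\textrm{j}\left(k_{\textrm{jr}}(\mathbf{x}(i),\mathbf{x}(l))-k_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l))\right),\end{aligned}\tag{15}$$

$$\begin{aligned}\tilde{k}(\mathbf{x}(i),\mathbf{x}(l))&=k_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))-k_{\textrm{jj}}(\mathbf{x}(i),\mathbf{x}(l))\\ &\quad+\textrm{j}\left(k_{\textrm{jr}}(\mathbf{x}(i),\mathbf{x}(l))+k_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l))\right).\end{aligned}\tag{16}$$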

Notice that this kernel and pseudo-kernel follow the structure introduced in [11].
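The identification of the kernel and pseudo-kernel can be checked numerically. The sketch below forms $\mathbf{T}_{1}\mathbf{K}\mathbf{T}_{1}^{\mathrm{H}}$ for a $2\times 2$ real matrix with rows $[k_{\textrm{rr}}\;k_{\textrm{rj}}]$ and $[k_{\textrm{jr}}\;k_{\textrm{jj}}]$ and compares its entries with $k=k_{\textrm{rr}}+k_{\textrm{jj}}+\textrm{j}(k_{\textrm{jr}}-k_{\textrm{rj}})$ and $\tilde{k}=k_{\textrm{rr}}-k_{\textrm{jj}}+\textrm{j}(k_{\textrm{jr}}+k_{\textrm{rj}})$. The four numeric values are arbitrary and chosen only for illustration; the helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2)]
            for r in range(2)]

def conj_transpose(A):
    """Hermitian (conjugate transpose) of a 2x2 complex matrix."""
    return [[A[c][r].conjugate() for c in range(2)] for r in range(2)]

# T_1 maps the composite vector [y_r, y_j] to the augmented vector [y, y*].
T1 = [[1 + 0j, 1j], [1 + 0j, -1j]]

# Arbitrary illustrative values for the four real kernel entries.
k_rr, k_rj, k_jr, k_jj = 0.9, 0.2, -0.3, 0.4
K = [[k_rr + 0j, k_rj + 0j], [k_jr + 0j, k_jj + 0j]]

K_A = matmul(matmul(T1, K), conj_transpose(T1))
TTh = matmul(T1, conj_transpose(T1))  # expected to equal 2 * identity

k = k_rr + k_jj + 1j * (k_jr - k_rj)
k_tilde = k_rr - k_jj + 1j * (k_jr + k_rj)
```

Here `K_A[0][0]` and `K_A[0][1]` reproduce `k` and `k_tilde`, while the second row holds their conjugates.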

The first entry of $\hat{\underline{\mathbf{{y}}}}(i)$ in (13) yields the proposed generalized complex KLMS (gCKLMS):

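$$\hat{y}(i)=\mu\sum_{l=1}^{i-1}e(l)\,k(\mathbf{x}(i),\mathbf{x}(l))+\mu\sum_{l=1}^{i-1}e^{*}(l)\,\tilde{k}(\mathbf{x}(i),\mathbf{x}(l)).\tag{17}$$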
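A minimal sketch of the gCKLMS recursion follows: each prediction is $\mu$ times the sum over past samples of the stored error times the kernel, plus the conjugated error times the pseudo-kernel. It assumes scalar complex inputs, a real Gaussian similarity of the form $\exp(-|x-x^{\prime}|^{2}/\gamma)$ for $k_{\textrm{rr}}$ and $k_{\textrm{jj}}$ with different widths, and $k_{\textrm{rj}}=k_{\textrm{jr}}=0$. The function names, widths and data are hypothetical and serve only to illustrate the structure.

```python
import math

def gauss(x, xp, gamma):
    """Assumed real Gaussian similarity exp(-|x - x'|^2 / gamma) for complex scalars."""
    return math.exp(-abs(x - xp) ** 2 / gamma)

def kernel_pair(x, xp, g_rr=1.0, g_jj=4.0):
    """Return (k, k_tilde) built from four real kernels, with k_rj = k_jr = 0 (assumption)."""
    k_rr, k_jj = gauss(x, xp, g_rr), gauss(x, xp, g_jj)
    k_rj = k_jr = 0.0
    k = k_rr + k_jj + 1j * (k_jr - k_rj)
    k_tilde = k_rr - k_jj + 1j * (k_jr + k_rj)
    return k, k_tilde

def gcklms(xs, ys, mu):
    """Return (predictions, errors) of the kernel plus pseudo-kernel recursion."""
    preds, errors = [], []
    for i, (x, y) in enumerate(zip(xs, ys)):
        y_hat = 0j
        for l in range(i):
            k, k_tilde = kernel_pair(x, xs[l])
            y_hat += mu * (errors[l] * k + errors[l].conjugate() * k_tilde)
        preds.append(y_hat)
        errors.append(y - y_hat)
    return preds, errors

preds, errors = gcklms([0j, 1 + 0j], [1 + 1j, 0j], mu=0.5)
```

In this example the second prediction has real part $k_{\textrm{rr}}$ and imaginary part $k_{\textrm{jj}}$ evaluated at the two inputs, so the real and imaginary parts are learned with different similarity measures. This decoupling is only possible because the pseudo-kernel is non-zero.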

V Connection with other algorithms

In [6], two realizations of the complex-valued KLMS (CKLMS) algorithm were developed by following two methodologies. The first approach is based on using a complex-valued kernel for a complex RKHS through the associated feature map. In this approach, denoted in [6] as CKLMS2, the output yields:

$$\hat{y}(i)=\mu\sum_{l=1}^{i-1}e(l)\,k(\mathbf{x}(i),\mathbf{x}(l)).\tag{18}$$

The second alternative is the complexification approach of real RKHSs. This approach defines the space of complex functions $f(\mathbf{x})=f_{1}(\mathbf{x})+\textrm{j}f_{2}(\mathbf{x})$ where $f_{1}(\mathbf{x})$ and $f_{2}(\mathbf{x})$ are in a RKHS of real functions with real kernel $k_{{\mathcal{R}}}$ . Then, the complexified real kernel trick allows one to construct a kernel adaptive algorithm denoted in [6] as CKLMS1:

$$\hat{y}(i)=\mu\sum_{l=1}^{i-1}2e(l)\,k_{\mathcal{R}}(\mathbf{x}(i),\mathbf{x}(l)).\tag{19}$$

Notice that the kernel used in this CKLMS1 algorithm is a real-valued function.

In [14], the framework of [6] is employed to develop widely linear adaptive filters in complex RKHS. Two realizations of the augmented CKLMS (ACKLMS) were proposed. First, using the complexification approach, the authors obtain exactly the same formula (19) as for the CKLMS1 algorithm (except for a rescaling) [14]. On the other hand, when a pure complex-valued kernel is used, the ACKLMS algorithm yields

$$\hat{y}(i)=\mu\sum_{l=1}^{i-1}\left(e(l)\,k(\mathbf{x}(i),\mathbf{x}(l))+e(l)\,k^{*}(\mathbf{x}(i),\mathbf{x}(l))\right).\tag{20}$$

At this point it is interesting to note that (20) and (19) are the same. If we take $e(l)$ as a common factor in (20), it follows:

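$$\begin{aligned}\hat{y}(i)&=\mu\sum_{l=1}^{i-1}e(l)\left(k(\mathbf{x}(i),\mathbf{x}(l))+k^{*}(\mathbf{x}(i),\mathbf{x}(l))\right)\\ &=\mu\sum_{l=1}^{i-1}2e(l)\,\mathcal{R}\left\{k(\mathbf{x}(i),\mathbf{x}(l))\right\}.\end{aligned}\tag{21}$$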

Hence (20) and (19) provide the same learning process, since in both cases the kernel is real. In fact, they yield the same formula with ${\mathcal{R}}\left\{{k}\right\}={k}_{{\mathcal{R}}}$ .
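The equivalence can be verified on a single term of the sum. The short sketch below compares one ACKLMS term, $e\,k+e\,k^{*}$, with one CKLMS1 term, $2e\,k_{\mathcal{R}}$, when $k_{\mathcal{R}}$ is taken as the real part of $k$. The numeric values of `e` and `k` are arbitrary illustrations and the function names are hypothetical.

```python
import cmath

def acklms_term(e, k):
    """One summand of the ACKLMS with a pure complex kernel: e*k + e*conj(k)."""
    return e * k + e * k.conjugate()

def cklms1_term(e, k_real):
    """One summand of the CKLMS1 with a real kernel: 2*e*k_R."""
    return 2 * e * k_real

e = 0.3 - 0.7j              # arbitrary complex error sample
k = cmath.exp(0.4 + 1.1j)   # arbitrary complex kernel value

lhs = acklms_term(e, k)
rhs = cklms1_term(e, k.real)
```

The imaginary part of the complex kernel cancels, so `lhs` and `rhs` agree up to rounding.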

Next, we show that algorithms CKLMS1, CKLMS2 [6] and ACKLMS [14] are particular cases of our proposed gCKLMS algorithm in (17). They yield a subset of the cases the gCKLMS algorithm presented in this paper can represent.

First, these approaches do not have a pseudo-kernel term; they are therefore simplified versions that reduce the flexibility the general algorithm provides. It is easy to check that if we set the pseudo-kernel equal to zero in (17) the gCKLMS reduces to the CKLMS2 in (18). However, to have $\tilde{{k}}(\mathbf{x}(i),\mathbf{x}(l))=0$ in (16) the following conditions must be satisfied:

$$\begin{aligned}k_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))&=k_{\textrm{jj}}(\mathbf{x}(i),\mathbf{x}(l)),\\ k_{\textrm{jr}}(\mathbf{x}(i),\mathbf{x}(l))&=-k_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l)),\end{aligned}\tag{22}$$

and the kernel in (15) yields

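$$k(\mathbf{x}(i),\mathbf{x}(l))=2k_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))-\textrm{j}2k_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l)).\tag{23}$$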

Second, if in addition to $\tilde{{k}}(\mathbf{x}(i),\mathbf{x}(l))=0$ we now set ${k}_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l))=0$ , then the kernel in (23) becomes a real-valued function ${k}(\mathbf{x}(i),\mathbf{x}(l))=2{k}_{\textrm{rr}}(\mathbf{x}(i),\mathbf{x}(l))$ , and the gCKLMS simplifies to the CKLMS1 in (19) or the ACKLMS in (21).

In Table I we summarize the algorithms and the conditions they impose on the kernel and pseudo-kernel terms.
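The three rows of Table I can be reproduced with a few lines of code. The sketch below evaluates the kernel and pseudo-kernel from the four real functions at a single pair of inputs and shows how the conditions of each algorithm collapse them. The numeric values are arbitrary and the function name is hypothetical.

```python
def kernel_and_pseudo(k_rr, k_jj, k_rj, k_jr):
    """Return (k, k_tilde) from the four real kernel values at one input pair."""
    k = k_rr + k_jj + 1j * (k_jr - k_rj)
    k_tilde = k_rr - k_jj + 1j * (k_jr + k_rj)
    return k, k_tilde

# gCKLMS: no conditions, both terms are generally non-zero.
general = kernel_and_pseudo(k_rr=0.9, k_jj=0.4, k_rj=0.2, k_jr=-0.3)

# CKLMS2: k_rr = k_jj and k_jr = -k_rj, so the pseudo-kernel vanishes
# and k = 2*k_rr - 2j*k_rj.
cklms2 = kernel_and_pseudo(k_rr=0.9, k_jj=0.9, k_rj=0.2, k_jr=-0.2)

# ACKLMS/CKLMS1: additionally k_rj = 0, so k = 2*k_rr is real-valued.
cklms1 = kernel_and_pseudo(k_rr=0.9, k_jj=0.9, k_rj=0.0, k_jr=0.0)
```

Only the unconstrained case keeps a non-zero pseudo-kernel. Each additional condition removes one degree of freedom from the pair of functions.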

V-A Kernel design

The conditions that the algorithms impose on the kernel and pseudo-kernel terms must be carefully analyzed in order to choose the best algorithm and kernels for a given learning problem.

The kernel in a RKHS learning algorithm encodes our assumptions about the function that is being learned [5] and provides a measure of similarity between the inputs. In [11] the kernel and pseudo-kernel in (15)-(16) are analyzed, and several remarks are provided to help design them and decide when they should be real or complex-valued. We use that analysis here to understand the implications of the conditions that each algorithm imposes.

We start with the conditions imposed when the pseudo-kernel is null, i.e., the conditions in (22) that yield the complex-valued kernel in (23). For any two inputs $\mathbf{x}$ and $\mathbf{x}^{\prime}$ , the first condition ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})={k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ implies that the same measure of similarity must be used with the real and the imaginary parts of the function [11]. Hence, if we impose a null pseudo-kernel, we cannot use one kernel for the real part, ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ , and a different design for the imaginary part, ${k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ . The second condition is ${k}_{\textrm{jr}}(\mathbf{x},\mathbf{x}^{\prime})=-{k}_{\textrm{rj}}(\mathbf{x},\mathbf{x}^{\prime})$ . But we also have ${k}_{\textrm{jr}}(\mathbf{x},\mathbf{x}^{\prime})={k}_{\textrm{rj}}(\mathbf{x}^{\prime},\mathbf{x})$ , because the kernel matrix ${\mathbf{K}}(\mathbf{x},\mathbf{x}^{\prime})$ in (11) satisfies ${\mathbf{K}}(\mathbf{x},\mathbf{x}^{\prime})={\mathbf{K}}^{\top}(\mathbf{x}^{\prime},\mathbf{x})$ by property (b). Therefore, ${k}_{\textrm{rj}}(\mathbf{x},\mathbf{x}^{\prime})=-{k}_{\textrm{rj}}(\mathbf{x}^{\prime},\mathbf{x})$ . This imposes a skew-symmetry in the measure of similarity between the real and the imaginary parts of the function.

As an example, the complex Gaussian kernel proposed in [6] for the CKLMS2 algorithm:

[TABLE]

follows the form given in (23) and fulfils the conditions in (22), i.e., the symmetries that yield a null pseudo-kernel. However, this kernel measures similarities between the real parts of the inputs with $|\mathbf{x}_{\textrm{r}}-\mathbf{x}_{\textrm{r}}^{\prime}|^{2}$ , while for the imaginary ones it uses $|\mathbf{x}_{\textrm{j}}+\mathbf{x}_{\textrm{j}}^{\prime}|^{2}$ , where $|\cdot|$ is the $\ell^{2}$ -norm. Also, it is not stationary, has an oscillatory behavior, and the exponent in the kernel may easily grow large and positive [26]. This might cause numerical problems and, as we show later in the experiments, it does not yield the best performance.

The skew-symmetry ${k}_{\textrm{rj}}(\mathbf{x},\mathbf{x}^{\prime})=-{k}_{\textrm{rj}}(\mathbf{x}^{\prime},\mathbf{x})$ imposed for the CKLMS2 algorithm may not be a property satisfied by many to-be-learned functions and, in such a case, enforcing a complex-valued kernel can be counterproductive. Algorithms CKLMS1 [6] and ACKLMS [14] avoid this problem by adding another condition: ${k}_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l))=0$ . Therefore, these algorithms use a real-valued kernel ${k}(\mathbf{x},\mathbf{x}^{\prime})=2{k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ . The condition ${k}_{\textrm{rj}}(\mathbf{x}(i),\mathbf{x}(l))=0$ implies that the real and the imaginary parts are not related and that neither provides information to learn the other [11].

We conclude that algorithms CKLMS1, CKLMS2 [6] and ACKLMS [14] cannot represent every possible complex-valued function, and yield a subset of the cases that the gCKLMS algorithm proposed in this paper can represent. The gCKLMS, with the kernel and the pseudo-kernel terms, provides more flexibility to model the learning problem by means of the four real-valued functions ${k}_{\textrm{rr}}$ , ${k}_{\textrm{jj}}$ , ${k}_{\textrm{rj}}$ and ${k}_{\textrm{jr}}$ . Hence, the gCKLMS will provide the best result if the conditions described above are not suitable for our learning problem, i.e., when the real and imaginary parts are better represented with different kernels, or they are not independent, or the skew-symmetry imposed by ${k}_{\textrm{rj}}(\mathbf{x},\mathbf{x}^{\prime})=-{k}_{\textrm{rj}}(\mathbf{x}^{\prime},\mathbf{x})$ does not hold.

We end this discussion about the kernels by bringing here a suitable real-valued function for ${k}_{\textrm{rr}}$ , ${k}_{\textrm{jj}}$ , ${k}_{\textrm{rj}}$ and ${k}_{\textrm{jr}}$ proposed in [11]. This is the adaptation to complex-valued inputs of the real-valued Gaussian kernel:

[TABLE]

where $\gamma$ is the kernel parameter. This real function provides a measure of similarity between the complex-valued inputs that is simple but effective for complex-valued signals: inputs that are closer to each other in the complex field are considered more similar than inputs that are further apart [11]. We will use it in our experiments. For further analysis of the selection of suitable kernels for complex-valued applications see [11, 26].

VI Experiments

We consider two experiments in which we compare the performance of our proposal, the gCKLMS in (17), with the CKLMS2 in (18) [6] and the ACKLMS algorithm in (21) [14].

In the first experiment we reproduce the nonlinear channel equalization task in [14]. In this experiment the complex-valued signals have independent real and imaginary parts, and they are better represented with different kernels. We show that in such a case the best choice is a real kernel and a real pseudo-kernel.

In the second experiment we propose learning a filtered two-dimensional random process. At the output of the filter the real and imaginary parts are not independent, and we show that we can use the imaginary part of the pseudo-kernel to improve the performance.

As in [14], we use the complex Gaussian kernel $k_{\mathbb{C}G}(\mathbf{x},\mathbf{x}^{\prime})$ in (24) for both the CKLMS2 and the ACKLMS. In fact, for the ACKLMS the real part of this kernel is used, as was shown in (21). We use the code available in [27] to run the algorithms.

For our proposed gCKLMS we use the general kernel and pseudo-kernel in (15)-(16). For ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ , ${k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ , ${k}_{\textrm{jr}}(\mathbf{x},\mathbf{x}^{\prime})$ and ${k}_{\textrm{rj}}(\mathbf{x},\mathbf{x}^{\prime})$ we propose to use the real-valued Gaussian kernel ${k}_{G}(\mathbf{x},\mathbf{x}^{\prime})$ in (25) with parameter $\gamma=\gamma_{\textrm{r}}$ for ${k}_{\textrm{rr}}$ , $\gamma=\gamma_{\textrm{j}}$ for ${k}_{\textrm{jj}}$ , $\gamma=\gamma_{\textrm{rj}}$ for ${k}_{\textrm{rj}}$ , and $\gamma=\gamma_{\textrm{jr}}$ for ${k}_{\textrm{jr}}$ , respectively. The kernel and pseudo-kernel can be simplified if the signals meet any of the conditions discussed in Section V-A. For example, if the real and imaginary parts of the signals are independent we can set ${k}_{\textrm{jr}}(\mathbf{x},\mathbf{x}^{\prime})={k}_{\textrm{rj}}(\mathbf{x},\mathbf{x}^{\prime})=0$ and the kernel and pseudo-kernel are real-valued:

[TABLE]

We use this simplification in the first experiment. Notice that if we also assume that the real and imaginary parts of the output use the same kernel, ${k}_{\textrm{rr}}={k}_{\textrm{jj}}$ , then we should set $\gamma_{\textrm{r}}=\gamma_{\textrm{j}}$ and the pseudo-kernel term in (27) cancels. In such a case, as explained in Section V, the gCKLMS approach simplifies to the ACKLMS with real-valued kernel ${k}(\mathbf{x},\mathbf{x}^{\prime})=2{k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ , where ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ is as in (25) with $\gamma=\gamma_{\textrm{r}}$ . We will refer to this case as ACKLMS with kernel (25) in the experiments.
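The cancellation described above can be checked numerically. The sketch below assumes that, with null cross terms, the kernel and pseudo-kernel reduce to $k=k_{\textrm{rr}}+k_{\textrm{jj}}$ and $\tilde{k}=k_{\textrm{rr}}-k_{\textrm{jj}}$, which is consistent with the statements that the pseudo-kernel cancels and $k=2k_{\textrm{rr}}$ when $\gamma_{\textrm{r}}=\gamma_{\textrm{j}}$. The Gaussian form and the helper names are assumptions of this example.

```python
import numpy as np

def gaussian_kernel(x, x_prime, gamma):
    """Assumed Gaussian form exp(-||x - x'||^2 / gamma) for complex inputs."""
    diff = np.asarray(x, dtype=complex) - np.asarray(x_prime, dtype=complex)
    return float(np.exp(-np.vdot(diff, diff).real / gamma))

def real_kernel_pair(x, x_prime, gamma_r, gamma_j):
    """Kernel and pseudo-kernel when k_rj = k_jr = 0 (assumed structure)."""
    k_rr = gaussian_kernel(x, x_prime, gamma_r)
    k_jj = gaussian_kernel(x, x_prime, gamma_j)
    kernel = k_rr + k_jj          # real-valued kernel
    pseudo_kernel = k_rr - k_jj   # real-valued pseudo-kernel
    return kernel, pseudo_kernel

x = np.array([0.3 + 0.1j, -0.2 + 0.4j])
xp = np.array([0.1 - 0.3j, 0.0 + 0.2j])

# Different parameters for the real and imaginary parts: non-null pseudo-kernel.
k, k_tilde = real_kernel_pair(x, xp, gamma_r=6.5, gamma_j=5.5)

# Equal parameters: the pseudo-kernel cancels and the kernel equals 2 * k_rr.
k_eq, k_tilde_eq = real_kernel_pair(x, xp, gamma_r=5.0, gamma_j=5.0)
print(k_tilde, k_tilde_eq)
```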

VI-A Nonlinear channel equalization

We face the problem of nonlinear channel equalization, as in [6] and [14], for ease of comparison and continuity. The channel consists of a linear filter and a memoryless nonlinearity. The two nonlinear channels in [14] have been considered here. The first channel is the soft nonlinear channel, with linear filter

[TABLE]

followed by the nonlinearity

[TABLE]

The second one is the strong nonlinear channel, with linear filter

[TABLE]

and nonlinearity

[TABLE]

At the receiver, the signal $q(n)$ is corrupted by additive white circular Gaussian noise with an SNR of 15 dB to yield the received signal $r(n)$ . The inputs to the equalizer are the sets of samples $\mathbf{x}=[r(n+D),r(n+D-1),\cdots,r(n+D-L+1)]^{\top}$ , where $L>0$ is the filter length and $D$ is the equalization time delay. Here we set $L=5$ and $D=2$ , as in [14]. The goal is to estimate the original input signal $s(n)$ .
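The construction of the equalizer inputs can be sketched as follows. The channel output `q` is a random stand-in here, not one of the two channels above, and the helper names `add_circular_noise` and `equalizer_inputs` are hypothetical; only the SNR, $L$ and $D$ values come from the text.

```python
import numpy as np

def add_circular_noise(q, snr_db, rng):
    """Add white circular Gaussian noise at the requested SNR (in dB)."""
    signal_power = np.mean(np.abs(q) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Equal variance on the real and imaginary parts gives circular noise.
    noise = np.sqrt(noise_power / 2.0) * (
        rng.standard_normal(q.shape) + 1j * rng.standard_normal(q.shape)
    )
    return q + noise

def equalizer_inputs(r, L=5, D=2):
    """Row i of X is [r(n+D), r(n+D-1), ..., r(n+D-L+1)] for n = n_idx[i]."""
    n_start = max(0, L - 1 - D)   # oldest tap index n + D - L + 1 must be >= 0
    n_stop = len(r) - D           # newest tap index n + D must be <= len(r) - 1
    n_idx = np.arange(n_start, n_stop)
    X = np.stack([r[n + D - np.arange(L)] for n in n_idx])
    return X, n_idx

rng = np.random.default_rng(0)
# Stand-in for the channel output q(n); the actual channels are defined above.
q = rng.standard_normal(20000) + 1j * rng.standard_normal(20000)
r = add_circular_noise(q, snr_db=15.0, rng=rng)
X, n_idx = equalizer_inputs(r, L=5, D=2)

measured_snr_db = 10.0 * np.log10(np.mean(np.abs(q) ** 2) / np.mean(np.abs(r - q) ** 2))
print(X.shape, round(measured_snr_db, 1))
```

Each row of `X` is paired with the target $s(n)$ for the corresponding entry of `n_idx`, so the equalizer sees $D$ future samples and $L-D-1$ past samples of the received signal.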

VI-A1 Gaussian distributed inputs

We first set the input signals as in [14]: $s(n)=0.7(\sqrt{1-\rho^{2}}\cdot X(n)+\textrm{j}\rho\cdot Y(n))$ , where $X(n)$ and $Y(n)$ are independent Gaussian random variables, with $\rho=1/\sqrt{2}$ for circular signals, and $\rho=0.1$ for noncircular signals.
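To see why these two values of $\rho$ give circular and noncircular inputs, the sketch below draws samples of $s(n)$ and estimates the ratio $|E[s^{2}]|/E[|s|^{2}]$. It assumes that $X(n)$ and $Y(n)$ have zero mean and unit variance, in which case this ratio equals $|1-2\rho^{2}|$; the function names are illustrative.

```python
import numpy as np

def gaussian_inputs(n_samples, rho, rng):
    """Draw s(n) = 0.7 * (sqrt(1 - rho^2) * X(n) + j * rho * Y(n)).

    X and Y are assumed here to be independent N(0, 1) variables.
    """
    x = rng.standard_normal(n_samples)
    y = rng.standard_normal(n_samples)
    return 0.7 * (np.sqrt(1.0 - rho**2) * x + 1j * rho * y)

def circularity_coefficient(s):
    """Estimate |E[s^2]| / E[|s|^2]: near 0 for circular signals."""
    return np.abs(np.mean(s**2)) / np.mean(np.abs(s) ** 2)

rng = np.random.default_rng(0)
s_circ = gaussian_inputs(200_000, 1.0 / np.sqrt(2.0), rng)  # circular case
s_nonc = gaussian_inputs(200_000, 0.1, rng)                 # noncircular case

c_circ = circularity_coefficient(s_circ)
c_nonc = circularity_coefficient(s_nonc)
print(round(c_circ, 3), round(c_nonc, 3))
```

With $\rho=1/\sqrt{2}$ the real and imaginary parts have equal variance and the ratio is close to zero, while $\rho=0.1$ concentrates almost all the power in the real part and the ratio approaches one.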

The real and imaginary parts of the signals are independent and, therefore, we can set ${k}_{\textrm{jr}}(\mathbf{x},\mathbf{x}^{\prime})={k}_{\textrm{rj}}(\mathbf{x},\mathbf{x}^{\prime})=0$ and use the real-valued kernel and pseudo-kernel terms in (26)-(27) for our proposed gCKLMS.

Experiments were conducted on 100 independent sets of 5000 samples of the input signal. For all the approaches, the novelty criterion [28, 29] is used for sparsification with $\delta_{1}=0.15$ and $\delta_{2}=0.2$ , as in [14].
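A minimal sketch of the novelty criterion, as it is commonly stated, is given below: a new input is added to the dictionary only if it is farther than $\delta_{1}$ from every stored center and the magnitude of the current prediction error exceeds $\delta_{2}$. The function name `is_novel`, the use of the Euclidean distance and the handling of the empty dictionary are assumptions of this example; the details in [28, 29] and [14] may differ.

```python
import numpy as np

def is_novel(x, error, centers, delta1=0.15, delta2=0.2):
    """Decide whether the input x should become a new dictionary center.

    x       : complex-valued input vector
    error   : current (complex) prediction error for x
    centers : list of input vectors already stored in the dictionary
    """
    if len(centers) == 0:
        return True  # assumed here: the first sample is always stored
    # Distance from x to its nearest stored center.
    nearest = min(np.linalg.norm(x - c) for c in centers)
    return bool(nearest > delta1 and abs(error) > delta2)

centers = [np.array([0.0 + 0.0j, 1.0 + 1.0j])]
x_close = centers[0] + 0.01             # too close to an existing center
x_far = centers[0] + (1.0 + 1.0j)       # far from every center

print(is_novel(x_close, 1.0 + 0.0j, centers))   # rejected by the distance test
print(is_novel(x_far, 0.05 + 0.05j, centers))   # rejected by the error test
print(is_novel(x_far, 0.5 - 0.5j, centers))     # passes both tests
```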

Figs. 1 and 2 show the averaged mean square errors (MSE) for the soft nonlinear channel. The circular input case is shown in Fig. 1, and the noncircular input case is shown in Fig. 2. For the CKLMS2 and the ACKLMS algorithms we set $\gamma_{\mathbb{C}G}=10$ and $\mu=1/8$ , as in [14]. For the gCKLMS algorithm we set $\gamma_{\textrm{r}}=6.5$ and $\gamma_{\textrm{j}}=5.5$ , and $\mu=1/7$ . For the ACKLMS with kernel (25) we set $\gamma_{\textrm{r}}=5$ and $\mu=1/10$ .

Figs. 3 and 4 include the MSE for the strong nonlinear channel and the circular input and noncircular input cases, respectively. For the CKLMS2 and the ACKLMS algorithms $\gamma_{\mathbb{C}G}=15$ and $\mu=1/6$ , as in [14]. For the gCKLMS algorithm we set $\gamma_{\textrm{r}}=5$ and $\gamma_{\textrm{j}}=3$ , and $\mu=1/7$ . For the ACKLMS with kernel (25) we set $\gamma_{\textrm{r}}=5$ and $\mu=1/10$ .

In all the examples, the proposed gCKLMS outperforms the other algorithms. The main advantage of the gCKLMS is that by introducing a pseudo-kernel we can use a different kernel for the real and the imaginary parts, ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ and ${k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ . This extra degree of freedom, which is not present in the other algorithms, is the key to obtaining a better estimation. In this experiment, the gain in MSE is small, because ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ and ${k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ are very similar. That is the reason why the ACKLMS with kernel (25), i.e., setting $\gamma_{\textrm{r}}=\gamma_{\textrm{j}}$ , performs well and close to the general case. In any case, it can be observed that, to achieve a given error, the faster convergence of the gCKLMS saves 10%-30% of the samples and time.

With the complex Gaussian kernel, both the ACKLMS in [14] and the CKLMS2 in [6] perform poorly compared to the gCKLMS. Therefore, the experiments show that the complex Gaussian kernel is not the best choice for this equalization task and, as shown in Fig. 2, it sometimes yields undesired spikes in the learning curves.

VI-A2 Unbalanced digital modulated signals

In digital communications, inputs are discrete. For discrete and unbalanced digital modulated signals, the difference between ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ and ${k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ is greater, and the proposed gCKLMS algorithm is a better choice than the previous proposals, which have null pseudo-kernels. To illustrate this point, we repeat here the equalization experiment for the soft nonlinear channel, where the input signals are now $s(n)=0.2X(n)+\textrm{j}0.1Y(n)$ , where $X(n)$ and $Y(n)$ are independent binary $\{-1,+1\}$ data streams.

For the proposed gCKLMS algorithm we use again the real-valued Gaussian kernel (25) with parameters $\gamma_{\textrm{r}}=0.59$ for ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ and $\gamma_{\textrm{j}}=1.63$ for ${k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ , respectively. For the ACKLMS with kernel (25) we set $\gamma_{\textrm{r}}=1.52$ . The learning parameter is set to $\mu=0.5$ for both approaches.

We generate 100 independent test trials with a set of 10000 samples to test the algorithms. The mean square errors (MSE) of the estimation are compared in Fig. 5 versus the number of input samples. It can be observed that the proposed gCKLMS outperforms the ACKLMS with kernel (25), i.e., the case with null pseudo-kernel.

Again, the key to obtaining a better estimation with the gCKLMS in this experiment is the possibility of defining a different kernel for the real part, ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ , and the imaginary part, ${k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ . Figs. 6 and 7 are included to highlight this. Fig. 6 shows the estimation MSE of the imaginary part of the signals, while Fig. 7 shows the estimation MSE of the real part. In this experiment the real and imaginary parts require different kernels to be accurately learnt. However, the ACKLMS algorithm uses the same kernel for both parts. The parameter value $\gamma_{\textrm{r}}$ with the best performance when learning the imaginary part of the output in Fig. 6 yields the worst estimation of the real part in Fig. 7. Conversely, the best parameter value to learn the real part of the output in Fig. 7 provides the worst performance in Fig. 6. Remarkably, the estimation error of the proposed gCKLMS is always low for both the imaginary and the real parts, since it allows different parameter values to be set for ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ and ${k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ .

VI-B A filtered random process

In this experiment, we show the performance of the proposed gCKLMS when the signals do not have independent real and imaginary parts. We generate a complex-valued signal with correlated real and imaginary parts by filtering a real-valued random process with a complex-valued filter. We define the complex-valued filter $h(x)=\alpha\cdot\left(2\exp(-|x|^{2}/3)+\textrm{j}\exp(-|x|^{2}/0.5)\right)$ , where $x=x_{\textrm{r}}+\textrm{j}x_{\textrm{j}}$ , with $x_{\textrm{r}}\in[-5,5]$ and $x_{\textrm{j}}\in[-5,5]$ , and $\alpha=0.0228$ to ensure unit norm. Then we define a real Gaussian process $s(x_{\textrm{r}},x_{\textrm{j}})$ with zero mean and unit variance, and we pass this process through the filter. We show in Figs. 8 and 9 the real and imaginary parts of one sample of the filtered process in $x_{\textrm{r}}\in[-5,5]$ and $x_{\textrm{j}}\in[-5,5]$ . We adaptively learn this filtered process with our proposed gCKLMS algorithm and compare its performance with the ACKLMS with kernel (25).
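The generation of one sample of the filtered process can be sketched as follows. The sketch assumes a uniform $100\times 100$ grid on $[-5,5]\times[-5,5]$ (which gives 10000 points), a white real Gaussian field as the input process, unit $\ell_{2}$ norm of the sampled filter, and circular convolution via the FFT; none of these details is stated explicitly in the text. Under these assumptions the normalization constant comes out close to the quoted $\alpha=0.0228$.

```python
import numpy as np

n_grid = 100  # assumed: a 100 x 100 grid, i.e. 10000 points per sample
axis = np.linspace(-5.0, 5.0, n_grid)
xr, xj = np.meshgrid(axis, axis, indexing="ij")
sq_mod = xr**2 + xj**2  # |x|^2 with x = xr + j * xj

# Complex-valued filter h(x) before normalization.
h_unscaled = 2.0 * np.exp(-sq_mod / 3.0) + 1j * np.exp(-sq_mod / 0.5)
# alpha chosen so that the sampled filter has unit l2 norm on this grid.
alpha = 1.0 / np.sqrt(np.sum(np.abs(h_unscaled) ** 2))
h = alpha * h_unscaled

# Real, zero-mean, unit-variance Gaussian field (assumed white).
rng = np.random.default_rng(0)
s = rng.standard_normal((n_grid, n_grid))

# Circular 2-D convolution via the FFT; ifftshift moves the filter peak
# (approximately) to index (0, 0) so the output is not shifted.
y = np.fft.ifft2(np.fft.fft2(s) * np.fft.fft2(np.fft.ifftshift(h)))

# Correlation coefficient between Re{y} and Im{y} implied by the filter
# when the input field is white.
rho_parts = np.sum(h.real * h.imag) / np.sqrt(
    np.sum(h.real**2) * np.sum(h.imag**2)
)
print(round(alpha, 4), round(rho_parts, 2))
```

Since the real and imaginary parts of `y` are two different smoothings of the same real field, they are strongly correlated, which is the property exploited in this experiment.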

For the gCKLMS, we make use of the terms ${k}_{\textrm{jr}}(\mathbf{x},\mathbf{x}^{\prime})$ and ${k}_{\textrm{rj}}(\mathbf{x},\mathbf{x}^{\prime})$ in (15)-(16). Since there is a relationship between the real and the imaginary parts of the filtered process, we do not assume these terms to be zero in this experiment, but we assume that ${k}_{\textrm{jr}}(\mathbf{x},\mathbf{x}^{\prime})={k}_{\textrm{rj}}(\mathbf{x},\mathbf{x}^{\prime})$ . Hence, (15)-(16) yield:

[TABLE]

and we use a real-valued kernel and a complex-valued pseudo-kernel [11]. For ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ and ${k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ we again use the real-valued Gaussian kernel ${k}_{G}(\mathbf{x},\mathbf{x}^{\prime})$ in (25) with parameters $\gamma_{\textrm{r}}=1.73$ and $\gamma_{\textrm{j}}=0.58$ . For the imaginary part of the pseudo-kernel, we use ${k}_{\textrm{jr}}(\mathbf{x},\mathbf{x}^{\prime})=v\cdot{k}_{G}(\mathbf{x},\mathbf{x}^{\prime})$ with parameter $\gamma_{\textrm{jr}}=1.11$ . Note that the variable $v$ controls the amplitude of the imaginary part of the pseudo-kernel. It was set to $v=0.09$ . The learning step was set to $\mu=1/4$ .
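The role of $v$ can be illustrated with a short sketch. It assumes that (15)-(16) have the structure $k=k_{\textrm{rr}}+k_{\textrm{jj}}+\textrm{j}(k_{\textrm{jr}}-k_{\textrm{rj}})$ and $\tilde{k}=k_{\textrm{rr}}-k_{\textrm{jj}}+\textrm{j}(k_{\textrm{jr}}+k_{\textrm{rj}})$, so that $k_{\textrm{jr}}=k_{\textrm{rj}}$ gives a real kernel and a pseudo-kernel with imaginary part $2k_{\textrm{jr}}$. This structure, the Gaussian form and the function names are assumptions of this example.

```python
import numpy as np

def gaussian_kernel(x, x_prime, gamma):
    """Assumed Gaussian form exp(-||x - x'||^2 / gamma) for complex inputs."""
    diff = np.asarray(x, dtype=complex) - np.asarray(x_prime, dtype=complex)
    return float(np.exp(-np.vdot(diff, diff).real / gamma))

def kernel_pair_with_cross_term(x, x_prime, gamma_r, gamma_j, gamma_jr, v):
    """Kernel and pseudo-kernel when k_jr = k_rj = v * k_G (assumed structure)."""
    k_rr = gaussian_kernel(x, x_prime, gamma_r)
    k_jj = gaussian_kernel(x, x_prime, gamma_j)
    k_jr = v * gaussian_kernel(x, x_prime, gamma_jr)
    kernel = k_rr + k_jj                              # j(k_jr - k_rj) cancels
    pseudo_kernel = complex(k_rr - k_jj, 2.0 * k_jr)  # imaginary part scaled by v
    return kernel, pseudo_kernel

x = np.array([0.4 - 0.3j])
xp = np.array([0.1 + 0.2j])

# Parameter values quoted in the text for this experiment.
k, k_tilde = kernel_pair_with_cross_term(x, xp, 1.73, 0.58, 1.11, v=0.09)
# Setting v = 0 removes the imaginary part of the pseudo-kernel.
k0, k_tilde0 = kernel_pair_with_cross_term(x, xp, 1.73, 0.58, 1.11, v=0.0)
print(k, k_tilde, k_tilde0)
```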

For the ACKLMS we again propose ${k}(\mathbf{x},\mathbf{x}^{\prime})=2{k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ , with ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ the real-valued Gaussian kernel ${k}_{G}(\mathbf{x},\mathbf{x}^{\prime})$ in (25), since it yields better performance than the complex Gaussian kernel. The kernel parameter is set to $\gamma_{\textrm{r}}=0.76$ and $\mu=1/2$ .

We generate $100$ independent samples of the filtered process in $x_{\textrm{r}}\in[-5,5]$ and $x_{\textrm{j}}\in[-5,5]$ , each with $10000$ data points. White circular Gaussian noise is added to the samples. The averaged MSEs of the estimation are compared in Fig. 10 versus the number of input points for two values of SNR, $15$ and $50$ dB. The proposed gCKLMS algorithm greatly outperforms the ACKLMS algorithm. We also include in the figure the performance of the gCKLMS when $v=0$ , i.e., when both the kernel and pseudo-kernel are real-valued. In such a case, the kernel parameters have been slightly modified to $\gamma_{\textrm{r}}=1.62$ and $\gamma_{\textrm{j}}=0.59$ in order to reach the best result. The figure shows that the imaginary part of the pseudo-kernel, i.e., the ${k}_{\textrm{jr}}(\mathbf{x},\mathbf{x}^{\prime})$ term, helps to improve the prediction accuracy.

VII Conclusions

In this paper, we have developed a novel generalized formulation for the complex-valued KLMS algorithm. Based on the ideas recently presented in [11] for the WL-RKHS, we have developed the gCKLMS algorithm that includes both a kernel and a pseudo-kernel. We reviewed the theory of RKHS of vector-valued functions to define the feature map for the RKHS of composite vector-valued functions. Based on this definition, we were able to develop the composite KLMS algorithm, to later rewrite it in augmented notation and, finally, to obtain the proposed gCKLMS algorithm. Also, in this process we were able to identify the equations that define the kernel and pseudo-kernel. These equations follow the structure introduced in [11], and include four real-valued functions: ${k}_{\textrm{rr}}(\mathbf{x},\mathbf{x}^{\prime})$ , ${k}_{\textrm{jj}}(\mathbf{x},\mathbf{x}^{\prime})$ , ${k}_{\textrm{rj}}(\mathbf{x},\mathbf{x}^{\prime})$ and ${k}_{\textrm{jr}}(\mathbf{x},\mathbf{x}^{\prime})$ . We can use the analysis in [11] to design these real-valued functions and set the kernel and pseudo-kernel for a given application. Another important contribution of the paper is to show that previously proposed complex-valued KLMS algorithms are just particular simplifications of the gCKLMS proposed in this paper. The experiments included reveal that the gain of using the gCKLMS algorithm, which provides more flexibility than the previously proposed algorithms, can be significant.

Bibliography

[1] P. J. Schreier and L. L. Scharf, Statistical Signal Processing of Complex-Valued Data. The Theory of Improper and Noncircular Signals. Cambridge, UK: Cambridge University Press, 2010.
[2] A. Hirose, Complex-Valued Neural Networks: Advances and Applications, ser. IEEE Press Series on Computational Intelligence. Wiley, 2013.
[3] M. E. Valle, “Complex-valued recurrent correlation neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 9, pp. 1600–1612, Sept 2014.
[4] D. Mandic and V. S. L. Goh, Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models. Wiley Publishing, 2009.
[5] B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, ser. Adaptive computation and machine learning. Cambridge, Massachusetts: MIT Press, 2002.
[6] P. Bouboulis and S. Theodoridis, “Extension of Wirtinger’s calculus to reproducing kernel Hilbert spaces and the complex kernel LMS,” IEEE Trans. Signal Processing, vol. 59, no. 3, pp. 964–978, 2011.
[7] S. Van Vaerenbergh, M. Lázaro-Gredilla, and I. Santamaría, “Kernel recursive least-squares tracker for time-varying regression,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 8, pp. 1313–1326, Aug 2012.
[8] R. Boloix-Tortosa, J. J. Murillo-Fuentes, F. J. Payán-Somet, and F. Pérez-Cruz, “Complex Gaussian processes for regression,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 11, pp. 5499–5511, Nov 2018.