Study of Robust Distributed Diffusion RLS Algorithms with Side Information for Adaptive Networks
Y. Yu, H. Zhao, R. C. de Lamare, Y. Zakharov, L. Lu

TL;DR
This paper introduces robust diffusion RLS algorithms for adaptive networks that effectively handle impulsive noise by incorporating side information and reducing computational complexity, with proven convergence and superior performance.
Contribution
The paper presents novel diffusion RLS algorithms with side information and reduced complexity, enhancing robustness against impulsive noise in adaptive networks.
Findings
Algorithms outperform existing methods in impulsive noise scenarios.
Proposed methods demonstrate mean-square convergence.
Reduced complexity makes the algorithms practical for real-world applications.
Table I: The R-dRLS algorithm with the NC method (parameters and initialization for the R-dRLS and NC parts; for each iteration and node, the R-dRLS update followed by NC Step 1, computing the change statistic, and NC Step 2, resetting the bound). Table body not recoverable.
Table II: The DCD method (parameters, initialization, and bit-wise iterations with an early break). Table body not recoverable.
Table: The DCD-R-dRLS algorithm (parameters and initialization; for each node, DCD solves a normal equation, yielding a solution and a residual vector, and if the constraint is violated DCD solves a second equation). Table body not recoverable.
Complexity table: numbers of multiplications, additions, divisions, and square roots per iteration for dLMS, dRLS, DCD-dRLS (without and with shift structure in the input), R-dRLS, and DCD-R-dRLS (without and with shift structure in the input). Table entries not recoverable.
Yi Yu, Haiquan Zhao, Rodrigo C. de Lamare, Yuriy Zakharov, and Lu Lu
This work is partially supported by the National Natural Science Foundation of P.R. China (Nos: 61871461, 61571374, 61433011, 11472297). The work of Y. Zakharov is partly supported by the UK Engineering and Physical Sciences Research Council (EPSRC) through Grant EP/R003297/1. The work of L. Lu is supported by the China Postdoctoral Science Foundation Funded Project under Grant 2018M640916. An earlier version of this work was presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Alberta, Canada, April 2018 [1].
Y. Yu is with the School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, China.
H. Zhao is with the School of Electrical Engineering, Southwest Jiaotong University, Chengdu, 610031, China.
R. C. de Lamare is with the CETUC, PUC-Rio, Rio de Janeiro 22451-900, Brazil, and also with the Department of Electronic Engineering, University of York, York YO10 5DD, U.K.
Y. Zakharov is with the Department of Electronic Engineering, University of York, York YO10 5DD, U.K.
L. Lu is with the School of Electronics and Information Engineering, Sichuan University, Chengdu, China.
Abstract
This work develops robust diffusion recursive least squares algorithms to mitigate the performance degradation often experienced in networks of agents in the presence of impulsive noise. The first algorithm minimizes an exponentially weighted least-squares cost function subject to a time-dependent constraint on the squared norm of the intermediate update at each node. A recursive strategy for computing the constraint is proposed using side information from the neighboring nodes to further improve the robustness. We also analyze the mean-square convergence behavior of the proposed algorithm. The second proposed algorithm is a modification of the first one based on the dichotomous coordinate descent iterations. Its performance is similar to that of the first algorithm, but its complexity is significantly lower, especially when the input regressors of the agents have a shift structure, and it is well suited to practical implementation. Simulations show the superiority of the proposed algorithms over previously reported techniques in various impulsive noise scenarios.
Index Terms:
Distributed algorithms, diffusion cooperation, dichotomous coordinate-descent, impulsive noises, recursive least squares algorithms.
I Introduction
Over the past decade, distributed parameter estimation over wireless sensor networks with multiple nodes (agents) has attracted much attention. It relies only on local data exchange between interconnected nodes, which removes the need for a powerful central processor and reduces the communication bandwidth required by traditional centralized estimation, while retaining similar estimation performance [2, 3]. Distributed estimation has been applied to target localization [4], clustering [5], frequency estimation [6], and spectrum estimation in cognitive radio (CR) [7, 8].
I-A Prior and Related Work
According to the cooperation strategy between interconnected nodes, existing algorithms can be categorized as incremental [9], consensus [10], and diffusion [11, 12, 13, 14] types. Among these, the diffusion strategy is popular: it does not require the Hamiltonian cycle path needed by the incremental type, which makes it more robust to node and link failures, and it is stable and converges faster, with a lower mean-square error, than the consensus approach. Several diffusion algorithms have been proposed, e.g., the diffusion least mean square (dLMS) algorithm [11] and its variable step-size variants [15, 16].
In practice, the measurements can be corrupted by impulsive non-Gaussian noise. Impulsive noise occurs with small probability but has a much higher amplitude than the nominal measurements. It may be caused by atmospheric phenomena or be man-made, e.g., by electric machinery in the operating environment [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]. Other examples are keyboard clicking or pen dropping in teleconferencing [44], double-talk in echo cancellation [45], biological noise [46] or ice cracking [47] in various underwater signals, and out-of-band spectral leakage in CR [48]. In such scenarios, conventional algorithms designed for Gaussian noise, such as the dLMS, suffer significant performance deterioration. To address this, many robust distributed algorithms have been proposed. Some apply the instantaneous gradient-descent method to different robust criteria, for instance, the diffusion error nonlinearity (dEN) [49], diffusion least mean p-th power (dLMP) [50], diffusion sign error LMS (dSE-LMS) [51], and diffusion maximum correntropy [52] algorithms. Moreover, because correntropy is insensitive to impulsive noise, the maximum total correntropy diffusion algorithm was proposed in [53] for the case of large outliers in communication links. Nevertheless, the main limitation of these algorithms is slow convergence, especially when the nodes' input signals are colored (highly correlated). As shown in [49], the dEN algorithm converges more slowly than the dSE-LMS algorithm. In [54], a robust diffusion algorithm based on the adaptive projected subgradient method was developed; it projects the output errors onto halfspaces defined by Huber's error function at each node, thereby speeding up convergence. However, setting the parameters that control its robustness requires prior knowledge of the noise distribution, which is often unavailable.
It is well known that, owing to the exponentially weighted least squares (EWLS) criterion, the diffusion recursive LS (dRLS) algorithm converges fast even for colored signals [55, 56]. By using the alternating direction method of multipliers to solve the EWLS problem, Mateos et al. proposed another type of distributed RLS algorithm [57]. Building on this algorithm, variants that censor observations with small innovations were presented to reduce computation and communication costs [58]. These algorithms may likewise experience convergence issues in impulsive noise environments, because impulsive noise samples enter the adaptation directly through the output errors of the nodes. For the single-agent case, many RLS algorithms robust against impulsive noise have been proposed, e.g., [59, 60]. However, distributed RLS-based techniques that are robust to impulsive noise have not been well investigated. The study in [61] develops the diffusion recursive least p-th power (dRLP) algorithm, but, as with the dLMP, its robustness depends on the value of p.
Like the RLS algorithm, distributed RLS algorithms have high computational complexity. They may also suffer from numerical instability due to the accumulation of round-off errors in finite-precision implementations [62]. An efficient way to address these problems is the dichotomous coordinate-descent (DCD) method, which solves the system of normal equations associated with RLS-type algorithms [63, 62, 64, 65]. Since the DCD method involves only shift and addition operations, DCD-based RLS algorithms reduce the computational cost and improve numerical stability compared with their original RLS counterparts, while preserving comparable estimation performance. For this reason, reference [66] explored the use of the DCD in distributed networks and developed the DCD-dRLS algorithm. However, DCD-based algorithms for impulsive noise environments have not yet been studied in either single-agent or multi-agent scenarios.
I-B Contributions
The focus of this paper is on developing robust distributed RLS algorithms for scenarios with impulsive noise. Specifically, our contributions are listed as follows:
-
A robust dRLS (R-dRLS) algorithm is developed by extending the framework of [59] to multi-agent scenarios with a diffusion distributed strategy. To ensure that the proposed R-dRLS algorithm has good convergence performance after an abrupt change in the set of parameters to be estimated, we also propose a diffusion-based non-stationary control (NC) method.
-
Theoretical insights into the mean square steady-state and evolution behaviors of the R-dRLS algorithm in impulsive noise environment are presented.
-
We employ the DCD method to develop the recursions used in the adaptation step of the R-dRLS algorithm, resulting in the DCD-R-dRLS algorithm with similar learning performance. Notably, the DCD-R-dRLS algorithm reduces the computational complexity; in particular, for shift-structured input regressors, it reduces the order of complexity from $O(M^2)$ to $O(M)$, where $M$ is the length of the estimated vector.
-
Simulation examples are presented to demonstrate the performance of the proposed algorithms in impulsive noise scenarios described by either Bernoulli-Gaussian (BG) or α-stable processes.
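The complexity reduction claimed for shift-structured input regressors can be illustrated with one standard ingredient of DCD-based RLS: when each regressor is a tapped delay line of a scalar input sequence, most of the exponentially weighted correlation matrix at time i is a shifted copy of the matrix at time i-1, so only its first column needs arithmetic. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes pre-windowed data (zero samples before the start), a correlation matrix initialized to zero with no regularization term, and plain nested lists where a real implementation would use circular indexing. The helper names `update_corr_direct`, `update_corr_shift`, and `max_mismatch` are hypothetical.

```python
def update_corr_direct(R_prev, u, lam):
    """Reference update with O(M^2) arithmetic: R_i = lam * R_{i-1} + u u^T."""
    M = len(u)
    return [[lam * R_prev[r][c] + u[r] * u[c] for c in range(M)] for r in range(M)]


def update_corr_shift(R_prev, u, lam):
    """Same update with O(M) arithmetic for u = [x(i), x(i-1), ..., x(i-M+1)].

    Assumes pre-windowed data and R_0 = 0 (no regularization term).
    """
    M = len(u)
    R = [[0.0] * M for _ in range(M)]
    # First column (and, by symmetry, first row): the only entries needing arithmetic.
    for r in range(M):
        R[r][0] = lam * R_prev[r][0] + u[r] * u[0]
        R[0][r] = R[r][0]
    # Lower-right (M-1)x(M-1) block equals the upper-left block of R_{i-1}:
    # pure data movement, no multiplications or additions.
    for r in range(1, M):
        for c in range(1, M):
            R[r][c] = R_prev[r - 1][c - 1]
    return R


def max_mismatch(x, M, lam):
    """Run both updates over the scalar sequence x and return the largest difference."""
    R_d = [[0.0] * M for _ in range(M)]
    R_s = [[0.0] * M for _ in range(M)]
    buf = [0.0] * M  # tapped delay line, zero before the first sample
    worst = 0.0
    for sample in x:
        buf = [sample] + buf[:-1]
        R_d = update_corr_direct(R_d, buf, lam)
        R_s = update_corr_shift(R_s, buf, lam)
        worst = max(worst, max(abs(R_d[r][c] - R_s[r][c])
                               for r in range(M) for c in range(M)))
    return worst
```

The two updates agree up to floating-point rounding; with a nonzero regularization term the shifted block would need a small diagonal correction, which this sketch deliberately leaves out.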
Compared with the preliminary results in [1], the current version additionally provides the second and third contributions listed above. We also slightly improve the NC method with a smoothing operation, as shown in (15). Moreover, the effectiveness of the proposed algorithms is also verified in an application to distributed spectrum estimation.
This paper is organized as follows. In Section II, the estimation problem is described. The R-dRLS algorithm is derived in Section III. Analyses of its mean square behavior are presented in Section IV. In Section V, we review the DCD algorithm and propose the DCD-R-dRLS algorithm. In Section VI, extensive simulations are presented to verify the proposed algorithms. Finally, conclusions are given in Section VII.
Notation: Throughout the paper, all vectors are column vectors. We use the parenthesis on to denote matrices and vectors, and the subscript on to denote scalars. The superscript denotes the transpose, denotes the -norm of a vector, and denotes the expectation of random variables. We use to denote an enlarged column vector structured by stacking its columns on top of each other, to yield a diagonal matrix with its arguments, and to denote the trace of a matrix. is the identity matrix of size , is the Kronecker product, and is the column vector of length with all entries being one. For symmetric matrices and , the notation stands for , meaning that the matrix difference is positive semi-definite.
II Problem Formulation
Let us consider a diffusion network with $N$ nodes located at different positions in space, as shown in Fig. 1, where each node communicates only with its neighboring nodes through a link (single-hop communication). All nodes connected directly to node $k$ (including node $k$ itself) are referred to as its neighborhood, denoted as $\mathcal{N}_k$. At every time instant $i$, every node $k$ has access to an input regressor vector $\mathbf{u}_{k,i}$ and an output measurement $d_k(i)$, which are related as:
$$d_k(i) = \mathbf{u}_{k,i}^{T}\mathbf{w}^{o} + v_k(i), \qquad (1)$$
where $\mathbf{w}^{o}$ is a parameter vector of size $M \times 1$ to be estimated, and $v_k(i)$ is the additive noise at node $k$. The additive noises $v_k(i)$ and $v_l(j)$ are spatially and temporally independent for $k \neq l$ and $i \neq j$. Moreover, any $v_k(i)$ is independent of any $\mathbf{u}_{l,j}$. The model (1) is used in many applications [3, 67]. The objective of the in-network processing is to estimate $\mathbf{w}^{o}$ using the data collected at the nodes.
For this purpose, the global EWLS estimation problem is described as [55]:
[TABLE]
where is a regularization constant and is the forgetting factor. The dRLS algorithm solves (2) in a diffusion-based distributed manner [55]. As mentioned in the introduction, the noise may be impulsive and non-Gaussian, in which case algorithms derived from (2), such as the dRLS algorithm, may converge poorly or even diverge. When studying robust adaptive algorithms, both contaminated-Gaussian (CG) [51, 68] and α-stable [69, 70] random processes are commonly used to model impulsive noise.
III Proposed R-dRLS Algorithm
In this section, we derive the R-dRLS algorithm and propose a control method that gives it tracking capability. The diffusion strategy has two variants: adapt-then-combine (ATC) and combine-then-adapt (CTA). We focus only on the ATC policy, which performs the adaptation step first and then the combination step, because the extension to CTA is straightforward: the order of the adaptation and combination steps is simply reversed [2, 5]. In what follows, we omit the ATC label for brevity.
III-A dRLS Algorithm
To simplify the development of the R-dRLS algorithm, we re-derive the dRLS algorithm here by the following method instead of solving (2) directly.
In the adaptation step, every node , at time instant , finds an intermediate estimate of by minimizing the individual local cost function:
[TABLE]
with , where
[TABLE]
is the time-averaged correlation matrix for the input vector at node , and is an estimate of at node at time instant . Notice that the quadratic form in (3) defines the Riemannian distance between vectors and , where is a Riemannian metric tensor reflecting that distance properties are not uniform across the parameter space [71, 72].
Setting the derivative of with respect to to zero, we obtain
[TABLE]
where
[TABLE]
stands for the output error at node , and
[TABLE]
with initialized as . The recursion (7) is the result of applying the matrix inversion lemma [67].
At the combination step, the intermediate estimates from the neighborhood of node are linearly weighted, yielding a combined estimate [3]:
[TABLE]
where the combination coefficients are non-negative, and satisfy:
[TABLE]
Note that is a weight that node assigns to the intermediate estimate received from its neighbor node . If one assumes in (5), where , (5) is a standard RLS update for node . In summary, (5)-(8) formulate the dRLS algorithm. It is noteworthy that the term in (5) provides the decorrelating ability for colored inputs, thus speeding up the convergence.
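To make the adapt-then-combine structure of (5)-(8) concrete, the Python sketch below implements one local adaptation (gain vector, a priori output error, intermediate estimate, and the matrix-inversion-lemma update of the inverse correlation matrix) followed by the linear combination of the neighbors' intermediate estimates. It is a minimal sketch assuming the standard exponentially weighted RLS form for (5)-(7), a per-node inverse matrix initialized as a scaled identity, and, in the demo only, a fully connected network with uniform weights and noise-free data. The names `rls_adapt`, `combine`, and `noise_free_demo` are hypothetical.

```python
import random


def rls_adapt(w_prev, P, u, d, lam):
    """Local adaptation at one node (standard exponentially weighted RLS form).

    w_prev: combined estimate from the previous iteration
    P:      inverse of the node's time-averaged correlation matrix (symmetric)
    Returns the intermediate estimate psi, the updated P, the error e, and the gain g.
    """
    M = len(u)
    Pu = [sum(P[r][c] * u[c] for c in range(M)) for r in range(M)]
    denom = lam + sum(u[r] * Pu[r] for r in range(M))
    g = [val / denom for val in Pu]                    # gain vector
    e = d - sum(u[r] * w_prev[r] for r in range(M))    # a priori output error
    psi = [w_prev[r] + g[r] * e for r in range(M)]     # intermediate estimate
    # Matrix inversion lemma: P <- (P - g u^T P) / lam, using u^T P = (P u)^T.
    P_new = [[(P[r][c] - g[r] * Pu[c]) / lam for c in range(M)] for r in range(M)]
    return psi, P_new, e, g


def combine(psis, C, k):
    """Combination step at node k: w_k = sum over l of C[l][k] * psi_l."""
    M = len(psis[0])
    return [sum(C[l][k] * psis[l][r] for l in range(len(psis))) for r in range(M)]


def noise_free_demo(N=3, lam=0.95, iters=400, seed=1):
    """Largest deviation from the true vector after `iters` noise-free iterations."""
    rng = random.Random(seed)
    w_o = [0.5, -1.0, 0.25]
    M = len(w_o)
    C = [[1.0 / N] * N for _ in range(N)]              # uniform weights, columns sum to 1
    w = [[0.0] * M for _ in range(N)]
    P = [[[100.0 if r == c else 0.0 for c in range(M)] for r in range(M)]
         for _ in range(N)]
    for _ in range(iters):
        psis = []
        for k in range(N):
            u = [rng.gauss(0.0, 1.0) for _ in range(M)]
            d = sum(a * b for a, b in zip(u, w_o))
            psi, P[k], _, _ = rls_adapt(w[k], P[k], u, d, lam)
            psis.append(psi)
        w = [combine(psis, C, k) for k in range(N)]
    return max(abs(w[k][r] - w_o[r]) for k in range(N) for r in range(M))
```

Each node keeps its own inverse matrix, while only the intermediate estimates are exchanged in the combination step.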
Remark 1: In general, the coefficients in (9) are determined by one of many static rules (e.g., the Metropolis rule [73], which we adopt in this paper) that keep them constant during the estimation. Considering that nodes may operate under different signal-to-noise ratios (SNRs), several adaptive rules have been proposed to optimize the algorithm behavior [73, 74, 75]. However, these adaptive rules are severely degraded when impulsive noise samples appear, since the output errors at the nodes directly participate in the adaptation of the coefficients. Designing a robust adaptive rule is an alternative, but it is not the focus of this paper. In another approach, based on the detection of impulsive noise, Ahn et al. proposed a robust variable weighting coefficients dLMS (RVWC-dLMS) algorithm, which sets the weighting coefficients to zero at nodes disturbed by impulsive noise [76]. The RVWC scheme can likewise be extended to dRLS in a straightforward way, resulting in the RVWC-dRLS algorithm, which is robust against impulsive noise, as can be seen in the simulations later on.¹
¹In the literature, the RVWC scheme was presented for more general diffusion strategies (namely, also exchanging information among nodes in the adaptation step). However, we do not consider this case here, for a fair comparison. Besides, such general diffusion strategies require higher computational complexity and communication load [5].
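The Metropolis rule mentioned in Remark 1 needs only the network topology. The Python sketch below uses the common form in which node k assigns weight 1/max(n_k, n_l) to each neighbor l ≠ k, where n_k is the neighborhood size including node k, and keeps the remaining weight for itself; this yields non-negative coefficients whose columns sum to one. It assumes an undirected graph given as neighbor sets, and the function name `metropolis_weights` is hypothetical.

```python
def metropolis_weights(neighbors):
    """Metropolis combination matrix for an undirected graph.

    neighbors[k]: set of nodes linked to node k (excluding k itself).
    Returns C with C[l][k] = weight that node k assigns to node l.
    """
    N = len(neighbors)
    n = [len(neighbors[k]) + 1 for k in range(N)]  # neighborhood size, including k
    C = [[0.0] * N for _ in range(N)]
    for k in range(N):
        for l in neighbors[k]:
            C[l][k] = 1.0 / max(n[k], n[l])
        # Self-weight absorbs the remainder so that column k sums to one.
        C[k][k] = 1.0 - sum(C[l][k] for l in neighbors[k])
    return C
```

For a three-node line graph, `metropolis_weights([{1}, {0, 2}, {1}])` gives each end node a self-weight of 2/3 and the middle node equal weights of 1/3; the resulting matrix is symmetric, so its rows also sum to one.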
III-B R-dRLS Algorithm
An impulsive noise sample at time instant might cause the dRLS algorithm to diverge through the output error in (5), owing to its large amplitude and the propagation of its effect; this degradation can last for many iterations. To make the algorithm robust in impulsive noise scenarios, we propose to minimize (3) subject to the following constraint:
[TABLE]
where is a positive bound. A similar constraint was used in an adaptive filter for the single-agent scenario [59]; when generalizing to the distributed case with multiple agents, the constraint can be imposed on the adaptation at every node. The constraint means that the energy (squared norm) of the update at every node never exceeds the bound, regardless of the type of noise (possibly impulsive), which guarantees the robustness of the algorithm. Accordingly, if (5) satisfies (10), i.e.,
[TABLE]
where represents the Kalman gain vector, then (5) is a solution of the above constrained minimization problem. Conversely, if (10) is not satisfied (usually when an impulsive noise sample occurs), i.e., when (11) does not hold, we propose to replace the update (5) with a normalized form that satisfies (10) with equality:
[TABLE]
where is the sign function. Thus, combining (5), (11) and (12), we obtain the adaptation step for each node as:
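The switch in (13) can be sketched as follows. We assume, as the text around (11) and (12) indicates, that the RLS increment is the gain vector times the output error, and that the normalized branch keeps the direction of this increment while scaling its squared norm to the bound, i.e., sqrt(delta) * sign(e) * g / ||g||. The function name `robust_increment` and its arguments are hypothetical; the returned scale factor corresponds to the variable "step-size" discussed in Remark 2.

```python
import math


def robust_increment(w_prev, g, e, delta):
    """Adaptation switch in the spirit of (13).

    w_prev: previous combined estimate, g: gain vector, e: output error,
    delta:  current bound on the squared norm of the update (delta > 0).
    Returns (psi, scale) with psi = w_prev + scale * g * e.
    """
    g_norm = math.sqrt(sum(x * x for x in g))
    if e * e * g_norm * g_norm <= delta:
        # The plain RLS increment already satisfies the constraint.
        scale = 1.0
    else:
        # Equivalent to sqrt(delta) * sign(e) * g / ||g||: the squared norm of the
        # increment is clipped to delta while its direction is kept.
        scale = math.sqrt(delta) / (abs(e) * g_norm)
    psi = [w + scale * gi * e for w, gi in zip(w_prev, g)]
    return psi, scale
```

For example, with `g = [3.0, 4.0]`, `e = 2.0` and `delta = 1.0`, the RLS increment `[6.0, 8.0]` has squared norm 100, so it is scaled by 0.1 to `[0.6, 0.8]`, whose norm is exactly 1. A large error from an impulsive sample therefore moves the estimate by at most sqrt(delta).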
[TABLE]
Evidently, the crucial problem is how to choose the bound properly, as it controls the robustness of the algorithm against impulsive noise and influences its dynamic behavior. More specifically, we want large values at the early adaptation stage to provide fast initial convergence, but the values cannot be too large if good robustness against impulsive noise is to be maintained. We also want a small estimation error at steady state, so the bound should decrease to a small value. Based on these requirements, we use the equality in (10) to propose a recursive method for adjusting the bound:
[TABLE]
where is a memory factor with . At every node , can be initialized as , where is a positive integer, and and are the powers of the output measurement and the input regressor , respectively. As (14) shows, every node not only uses its own adaptive rule to update , but also exploits the side information transmitted from its neighboring nodes through diffusion cooperation. As a result, the proposed R-dRLS algorithm produces more consistent estimates across the nodes, as will be observed in Section VI-A. Table I details the proposed R-dRLS algorithm together with the NC method.
Remark 2: From (13), the adaptation step of the proposed R-dRLS algorithm operates as follows. At each time instant, if the squared norm of the RLS increment does not exceed the bound, the classical RLS update is performed; otherwise, the squared norm of the RLS increment is limited to the bound as in (12) to guarantee robustness against impulsive noise. In the early iterations, the bound can be large compared with the squared norm of the increment, so the algorithm behaves like the dRLS algorithm and converges fast initially. On the other hand, whenever an impulsive noise sample appears, its large magnitude makes the R-dRLS algorithm work as a dRLS update multiplied by a very small scaling factor. It has been shown in [77, 78] that multiplying the update term by a small scaling factor reduces the negative influence of impulsive noise on the estimate, which indirectly supports the robustness of the R-dRLS algorithm. Moreover, the robustness is maintained over the iterations thanks to the decreasing property of the bound given by (14), and the diminishing bound also reduces the steady-state error of the algorithm. In summary, the R-dRLS algorithm can be viewed as an improved dRLS algorithm with a variable 'step-size' that switches automatically between 1 and the scaling factor of (12), as can be observed in (13).
III-C NC Method
Because of the diminishing bound, the R-dRLS algorithm tracks poorly (i.e., re-converges slowly) after the unknown vector undergoes an abrupt change. To overcome this problem, inspired by the idea in [45] for the single-agent scenario, we propose a diffusion-based NC method, summarized in Table I. The NC method is implemented in the following two steps.
Step 1: A variable at node is computed once every iterations to judge whether the unknown vector has changed. In this step, with denoting the arrangement of its arguments in ascending order. With being a vector whose first elements are set to one, where is a positive integer with , the product removes the effect of outliers (e.g., impulsive noise samples) when computing . To avoid large fluctuations, we apply a smoothing filter when computing (see Table I), as follows:
[TABLE]
where , , is a memory factor. Note that every node also combines information from its neighboring nodes through diffusion cooperation when computing this quantity; stores the value of at the last time instant.
From Step 1, one can see that with a larger , the algorithm has a lower steady-state error but a longer tracking delay. Moreover, for a high occurrence probability of impulsive noise, the value of should be increased to better discard the impulsive noise samples in the computation of . From our extensive simulations, we found that for both and , good choices are with and [45].
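The outlier-trimming idea in Step 1, taking the inner product of the ascending-sorted samples with a vector whose leading entries are one and whose remaining entries are zero, amounts to summing the smallest values in the window. The Python sketch below shows only this trimming; it assumes the sorted quantities are squared output errors over the most recent window, which we have not confirmed, and it does not reproduce the diffusion smoothing of (15) or the threshold test of Step 2. The function name `trimmed_statistic` is hypothetical.

```python
def trimmed_statistic(window, n_keep):
    """Inner product of sort(window) with [1, ..., 1, 0, ..., 0] (n_keep leading ones).

    Large values (e.g., squared errors hit by impulsive noise) fall into the
    discarded tail of the ascending arrangement, so they do not affect the result.
    """
    if not 0 < n_keep <= len(window):
        raise ValueError("n_keep must satisfy 0 < n_keep <= len(window)")
    ascending = sorted(window)       # ascending arrangement of the samples
    return sum(ascending[:n_keep])   # keep the n_keep smallest values
```

Replacing the outlier in `[0.1, 0.2, 100.0, 0.15, 0.05]` by any larger value leaves `trimmed_statistic(window, 3)` at 0.3, which is why a larger number of discarded samples is needed when impulses occur more often.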
Step 2: If , where is a predefined threshold, a change of the unknown vector is declared. The bound is then reset to its initial value so that the R-dRLS algorithm can track this change rapidly. Meanwhile, should also be re-initialized with .
Note that in this scheme the parameters , and do not affect each other, which simplifies their selection.
IV Mean Square Performance Analyses
IV-A Steady-state Behavior
In this section, we discuss the steady-state behavior of the R-dRLS algorithm in impulsive noise. Assuming that the unknown vector is time-invariant, we define the estimate deviation and intermediate estimate deviation vectors, respectively, as:
[TABLE]
Using these definitions and (14), it is easy to rearrange (13) and (8), respectively, as:
[TABLE]
and
[TABLE]
Equating the squared -norm of both sides of (17) and then taking the expectation, we obtain
[TABLE]
Treating (18) likewise and applying Jensen's inequality [79, p.77], we obtain
[TABLE]
Typically, is close to 1 so that the variances of and given in (14) would be small enough. Accordingly, it can be assumed that
[TABLE]
Then, with this approximation, (19) is changed to
[TABLE]
To deal with the (a) term in (22), some assumptions are helpful.
Assumption 1: The input regressors are zero-mean with covariance matrices and spatially independent.
Assumption 2: The regressors are independent of the estimates for and all . This independence assumption is widely used in the analysis of adaptive algorithms [67] and distributed estimation algorithms [3, 80].
Assumption 3: There is an iteration number such that for all , the time-averaged matrix at every node can be replaced by its expected value . This is an ergodicity assumption since , and from (4) we have
[TABLE]
Correspondingly, we can also replace the random matrix by for a sufficiently large number of iterations . Note that such replacements are commonly used in the performance analysis of RLS-type algorithms, see [55, 67, 81, 56] and the references therein.
Applying assumption 3, we are able to represent the term (a) in (22) as:
[TABLE]
In light of assumption 1, if the dimension of is large, i.e., , the fluctuation of the denominator term in (24) from one iteration to the next can be assumed to be small. We can therefore make the following approximation (which is also verified in Appendix A):
[TABLE]
where
[TABLE]
Considering the presence of impulsive noise, we need the following assumptions to continue the analysis.
Assumption 4: At every node , the additive noise is drawn from a CG random process, , where is the background noise assumed to be zero-mean white Gaussian with variance . The impulsive part is described as , where is drawn from a Bernoulli random process with the probability , and is drawn from a white Gaussian random process with zero-mean and variance , . Usually, is also called the appearance probability of an impulsive noise sample.
Then, the noise has zero mean, and its variance equals the background-noise variance plus the appearance probability times the variance of the Gaussian impulsive component. Note that the noise is Gaussian only when the appearance probability is 0 or 1; otherwise, it is non-Gaussian. Also, conditioned on the Bernoulli variable, the noise is Gaussian [68]. Although the α-stable process is more appropriate for modeling impulsive noise in practice [44, 69, 70], it is generally not used in the analysis of algorithms because its probability density function has no closed-form expression in general. Accordingly, the above assumption has been used frequently for the performance analysis of adaptive algorithms in impulsive noise environments, as it provides mathematical tractability [51, 45, 82, 68].
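Assumption 4 is straightforward to simulate. The Python sketch below draws contaminated-Gaussian (Bernoulli-Gaussian) noise as a background Gaussian sample plus a Bernoulli-gated Gaussian impulse, and compares the sample variance with sigma_g^2 + p * sigma_z^2, which follows from the three components being independent and zero-mean. The labels `sigma_g`, `sigma_z`, and `p` and the function names are our own, chosen for illustration.

```python
import random


def bg_noise(n, sigma_g, sigma_z, p, rng):
    """Draw n contaminated-Gaussian samples v = g + b * z."""
    samples = []
    for _ in range(n):
        g = rng.gauss(0.0, sigma_g)            # background white Gaussian noise
        b = 1.0 if rng.random() < p else 0.0   # Bernoulli switch with P(b = 1) = p
        z = rng.gauss(0.0, sigma_z)            # high-variance impulsive amplitude
        samples.append(g + b * z)
    return samples


def sample_variance(x):
    """Unbiased sample variance."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / (len(x) - 1)


def bg_variance(sigma_g, sigma_z, p):
    """Theoretical variance sigma_g^2 + p * sigma_z^2."""
    return sigma_g ** 2 + p * sigma_z ** 2
```

With `sigma_g = 1`, `sigma_z = 10` and `p = 0.1`, roughly one sample in ten carries an impulse about ten times larger than the background noise, and the theoretical variance is 11.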
Furthermore, as pointed out in [83], when , then by using the central limit theorem, it can be assumed that and are zero mean Gaussian variables for any constant matrix . Then, we can employ the following Lemma.
Lemma: Let and be jointly Gaussian zero-mean random variables. Let , where is a zero-mean CG random variable with variance , and is independent of and . If and , where and are zero-mean Gaussian random variables with variances and , and are independent of and , then
[TABLE]
This lemma has been commonly used in the past for analyzing sign-based algorithms [82, 51]. Based on Price's theorem [84], the Lemma, and assumption 2, we can establish the following equation
[TABLE]
where
[TABLE]
and the notation accounts for the expectation of conditioned on . Subsequently, the right-hand term in equality (25) becomes
[TABLE]
where the equality (a) is the result of using (26) under assumption 2.
Substituting (25) and (29) into (22), it is rearranged as
[TABLE]
Next, we introduce the following network global vectors:
[TABLE]
and the network global matrices
[TABLE]
Also, we define the matrix to collect the combination coefficients, i.e., . Following (31) and (32), we can formulate (20) and (30) for all nodes as follows:
[TABLE]
Taking the -norm for both sides of (33) leads to
[TABLE]
where equality (a) uses the fact that each column of the combination matrix sums to 1. Since and are diagonal matrices with positive entries, (34) can be equivalently expressed as [12]:
[TABLE]
for . When the algorithm reaches the steady-state, i.e., as , from (35) we will get:
[TABLE]
In view of the result that and converge approximately to 0 as (see Appendix B), as well as and , we can deduce from (36) that
[TABLE]
As a result, (37) shows that, under the given assumptions, the R-dRLS algorithm converges to the true parameter vector in the mean-square sense after sufficiently many iterations, even in impulsive noise environments.
IV-B Analysis of Evolution Behavior
Because it relies on the upper bound (20), the result (37) is qualitative and does not predict the steady-state performance of the algorithm. In this subsection, we establish a recursive model that describes the evolution behavior of the algorithm in impulsive noise. We start by defining the following network vectors collected from all nodes:
[TABLE]
where
[TABLE]
for nodes . With these vectors, (17) and (18) at all nodes can be combined as:
[TABLE]
where . Post-multiplying (40) by its transpose and taking the expectation, we have
[TABLE]
where denotes the covariance matrix of the deviation vector , and its -th diagonal block of size , i.e., , represents the covariance matrix of the deviation vector at node .
To evaluate terms I-III in (41), in addition to the spatial independence in assumption 1, we also require the input regressors to be statistically independent over time, an assumption often used in the analysis of distributed estimation algorithms [3, 2]. Performing manipulations on the expectations similar to those in Section IV-A, under assumptions 2-4, the Lemma, and Price's theorem, we can compute these three terms. Specifically, term I in (41) becomes
[TABLE]
where we rewrite contained in as
[TABLE]
The term II in (41) is the transpose of (43). For any and belonging to the set , we define the -th matrix as follows:
[TABLE]
When , (44) represents the -th diagonal block of , which is described as
[TABLE]
where is the -th element of . When , the off-diagonal blocks will be simplified as
[TABLE]
From (45) and (46), we obtain term III in (41):
[TABLE]
where
[TABLE]
By substituting (42) and (47) into (41), we obtain the recursive expression for :
[TABLE]
The mean square deviation (MSD) at node is defined as , and the network MSD over all the nodes is defined as [2]. Equation (49) models the MSD evolution of the algorithm. Note that to implement the model (49), and defined in still need to be evaluated. However, as shown in (14), and at interconnected nodes affect each other and involve a comparison operation, so it is difficult to derive evolution expressions for them. In this paper, we obtain and by ensemble averaging over simulations. Consequently, (49) is a semi-analytic result, but it can still be used to evaluate the convergence of the proposed algorithm.
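Since ensemble averages from simulations are needed both to feed the semi-analytic model and to check it, the Python sketch below shows how the per-node MSD and the network MSD are typically estimated at one time instant: the per-node MSD averages the squared deviation from the true vector over independent runs, and the network MSD averages the per-node values over the nodes, reported in dB. The data layout (runs, then nodes, then vector entries) and the function name `network_msd_db` are assumptions for illustration.

```python
import math


def network_msd_db(w_o, estimates):
    """Per-node MSD and network MSD (dB) at one time instant.

    w_o:       true parameter vector
    estimates: estimates[run][k] is node k's estimate in one independent run
    """
    runs, nodes = len(estimates), len(estimates[0])
    msd_k = []
    for k in range(nodes):
        total = 0.0
        for run in range(runs):
            total += sum((a - b) ** 2 for a, b in zip(w_o, estimates[run][k]))
        msd_k.append(total / runs)           # ensemble average over runs
    network = sum(msd_k) / nodes             # average over nodes
    return msd_k, 10.0 * math.log10(network)
```

Repeating this at every time instant yields the learning curves that are compared with the recursion (49).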
V DCD-Based Algorithms
In this section, we review the DCD-dRLS algorithm from [66], and then develop a robust DCD-dRLS algorithm.
V-A The Original DCD-dRLS Algorithm
Since the dRLS algorithm involves matrix operations of size in the computation of and (7) at every node, its computational complexity scales quadratically with in terms of both additions and multiplications per iteration . To reduce the complexity, the DCD-dRLS algorithm uses the following adaptation step [66]:
[TABLE]
where the increment is obtained by solving the normal equation:
[TABLE]
[TABLE]
defines the residual vector at node at time instant :
[TABLE]
To reduce the complexity of computing and , the DCD method presented in Table II is used; see [63, 62, 64] for details. In Table II, is the -th entry of a vector , and and are the -th entry and the -th column of , respectively.
The accuracy and complexity of the DCD method depend on three parameters: , , and . In general, is chosen as a power of two; is the number of bits sufficient for a fixed-point representation of within the amplitude range ; and defines the maximum number of elements in that can be updated at a time instant. The DCD method requires at most additions at each time instant and no multiplications [62]. A larger makes the solution closer to the optimal solution of (51), but increases the number of additions. It follows that if , the DCD-based algorithm implements a selective partial update [85].
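Table II is not reproduced in this text. As a sketch of what a DCD solver does, the code below implements the leading-element DCD iteration described in the DCD literature [62, 63] for the normal equation R dx = beta. The names `H` (amplitude range), `Mb` (number of bits) and `Nu` (maximum number of updates) stand in for the three lost parameter symbols and are assumptions. Floating point replaces fixed-point arithmetic, so the products with `alpha` stand in for the bit shifts of a hardware implementation. The sketch may differ from Table II in details.

```python
def dcd_solve(R, beta, H=1.0, Mb=16, Nu=8):
    """Leading-element DCD sketch: approximately solve R dx = beta.

    R:    symmetric positive-definite matrix (list of lists)
    beta: right-hand side (list)
    H:    amplitude range, assumed a power of two, |dx_j| <= H
    Mb:   number of bits, i.e. allowed halvings of the step alpha
    Nu:   maximum number of successful coordinate updates
    Returns (dx, r) with residual r = beta - R dx.
    """
    n = len(beta)
    dx = [0.0] * n
    r = list(beta)
    alpha = H / 2.0
    m = 1
    for _ in range(Nu):
        # leading element: coordinate with the largest residual magnitude
        p = max(range(n), key=lambda j: abs(r[j]))
        # halve the step until a move along p decreases the quadratic cost
        while abs(r[p]) <= (alpha / 2.0) * R[p][p]:
            m += 1
            alpha /= 2.0
            if m > Mb:
                return dx, r  # finest resolution reached
        s = 1.0 if r[p] > 0 else -1.0
        dx[p] += s * alpha
        for j in range(n):
            r[j] -= s * alpha * R[j][p]  # keep r = beta - R dx up to date
    return dx, r


dx, r = dcd_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], H=1.0, Mb=16, Nu=200)
# dx is close to the exact solution [1/11, 7/11]
```

With a small `Nu`, only a few coordinates are touched. This is the selective partial-update behaviour mentioned above. For example, `Nu=1` updates a single entry of `dx`.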
However, like the dRLS algorithm, the DCD-dRLS algorithm also suffers from performance deterioration when impulsive noise occurs.
V-B Proposed DCD-R-dRLS Algorithm
To achieve robustness against impulsive noise, we present here the DCD-R-dRLS algorithm.
Step 1: At every node , we first use the DCD method to solve the normal equation (51) with (4) and (52), yielding a solution and a residual vector . To cope with impulsive noise, we also impose a constraint similar to that in (10):
[TABLE]
Step 2: If , we set and and then perform the update (50). Otherwise, we need to recalculate in (51) as:
[TABLE]
Subsequently, based on the DCD method, we obtain the solution and the residual vector from the normal equation (51) under (4) and (55), thereby performing the update (50) with the increment
[TABLE]
and .
Step 3: The combination step (12) is performed.
Step 4: The bound parameter in the DCD-R-dRLS algorithm is updated according to
[TABLE]
Table III summarizes the DCD-R-dRLS algorithm.
Remark 3: An impulsive noise sample appearing at time instant would yield a mismatched solution such that . In this case, the scaling factor in (55) is small enough to suppress the impulsive noise hidden in . A similar scaling factor in (56) ensures that the increment satisfies the constraint (54). Consequently, the DCD-R-dRLS algorithm is more robust to impulsive noise than the DCD-dRLS algorithm. Moreover, the decreasing sequence shown in (57) further guarantees the robustness. It is worth noting that, due to , the DCD-R-dRLS algorithm is a DCD-based variant of the R-dRLS algorithm. In contrast to the R-dRLS algorithm, we re-initialize with based on the NC method, which endows the DCD-R-dRLS algorithm with tracking capability when changes suddenly.
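Equations (54)-(56) are not reproduced in this text. The sketch below therefore illustrates only the generic mechanism behind Steps 1-2: test whether a candidate increment violates a squared-norm bound and, if so, shrink it by a scalar so that the bound holds. The rule `sqrt(bound / ||dw||^2)` and the function name are assumptions made for illustration. They are not the paper's exact scaling factors in (55) and (56).

```python
import math

def constrain_increment(dw, bound):
    """Return (dw_c, scale) such that ||dw_c||^2 <= bound.

    Hypothetical stand-in for the scaling in Steps 1-2: an increment that
    already satisfies the squared-norm constraint is kept unchanged;
    otherwise it is shrunk uniformly onto the constraint boundary.
    """
    energy = sum(x * x for x in dw)
    if energy <= bound:
        return list(dw), 1.0
    scale = math.sqrt(bound / energy)  # 0 < scale < 1
    return [scale * x for x in dw], scale


# A large (impulse-driven) increment is shrunk; a small one is untouched.
big, s_big = constrain_increment([3.0, 4.0], 1.0)      # s_big = 0.2
small, s_small = constrain_increment([0.1, 0.1], 1.0)  # s_small = 1.0
```

An impulsive error produces a candidate increment whose energy far exceeds the bound, so the scale becomes small. Qualitatively, this is the behaviour described in Remark 3.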
Remark 4: Let denote the number of additions required by the DCD algorithm (which requires no multiplications), with . In Table IV, we provide the computational complexity of the existing dLMS, dRLS, and DCD-dRLS algorithms and of the proposed R-dRLS and DCD-R-dRLS algorithms at node k per time instant , where denotes the cardinality of . For a shift-structured input regressor at node [54, 9], i.e., , where is an input sample at time instant , computing in (4) is greatly simplified. In this situation, the lower-right block of is obtained by copying the upper-left block of . The remaining part of that needs to be updated is the first row and the first column. Owing to the symmetry of , it suffices to calculate the first column, which is given by:
[TABLE]
Note that, in the DCD-R-dRLS algorithm, represents the case at time instant (which leads to the maximum complexity), otherwise . The comparisons required in the R-dRLS and DCD-R-dRLS algorithms are counted as additions.
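The shift-structure shortcut can be checked numerically. The sketch below assumes that (4) is the usual exponentially weighted update R_i = lam * R_{i-1} + u_i u_i^T. It also assumes prewindowed data (zero samples before time 0) and a zero initial matrix; under these conditions, copying the block and computing only the first column is exact. With a regularized initialization such as a scaled identity it is only approximate. All names are hypothetical.

```python
import random

def full_update(R_prev, u, lam):
    """Direct O(L^2) update R_i = lam * R_{i-1} + u_i u_i^T."""
    L = len(u)
    return [[lam * R_prev[j][l] + u[j] * u[l] for l in range(L)] for j in range(L)]

def shift_update(R_prev, u, lam):
    """O(L)-multiplication update for u_i = [u(i), u(i-1), ..., u(i-L+1)].

    The lower-right (L-1)x(L-1) block is copied from the upper-left block of
    R_{i-1}; only the first column is computed, and the first row follows by
    symmetry. Exact for prewindowed data with a zero initial matrix.
    """
    L = len(u)
    R = [[0.0] * L for _ in range(L)]
    for j in range(1, L):
        for l in range(1, L):
            R[j][l] = R_prev[j - 1][l - 1]  # pure copy, no arithmetic
    for j in range(L):
        R[j][0] = lam * R_prev[j][0] + u[j] * u[0]  # first column: L products
        R[0][j] = R[j][0]  # symmetry gives the first row
    return R


rng = random.Random(1)
L, lam, n = 4, 0.99, 50
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
R_full = [[0.0] * L for _ in range(L)]
R_shift = [[0.0] * L for _ in range(L)]
for i in range(n):
    u = [x[i - j] if i - j >= 0 else 0.0 for j in range(L)]  # prewindowed
    R_full = full_update(R_full, u, lam)
    R_shift = shift_update(R_shift, u, lam)
max_err = max(abs(R_full[j][l] - R_shift[j][l]) for j in range(L) for l in range(L))
```

`max_err` stays at rounding level, which confirms that the two updates coincide under these assumptions.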
Consider an example with , and . Fig. 2 depicts the number of operations, in terms of multiplications and additions, of some diffusion algorithms at node at each time instant versus . Clearly, the computational complexity of the dLMS algorithm, of order , is much lower than that of the dRLS algorithm. As expected, since , the DCD versions of the standard dRLS and R-dRLS algorithms achieve about reduction in both multiplications and additions for general input regressors. Furthermore, for shift-structured input regressors, the computational cost is drastically reduced from order to , which is more pronounced in scenarios with large . Moreover, the number of multiplications required by the DCD-based algorithms does not depend on . On the other hand, compared with the existing dRLS and DCD-dRLS algorithms, the additional complexity of the proposed R-dRLS and DCD-R-dRLS algorithms, resulting from the computation of the scaling factor and the bound parameter, is small. In addition to this computational cost, in both proposed algorithms each node incurs an additional communication cost of numbers for transmitting to its neighbors.
Remark 5: From the DCD-R-dRLS algorithm, we can directly obtain its special form for a single-agent scenario, which we refer to as the DCD-R-RLS algorithm. In other words, the DCD-R-RLS algorithm is the DCD implementation of the algorithm presented in [59].
VI Simulation Results
Simulation examples of distributed parameter estimation and distributed spectrum estimation are presented for a diffusion network with nodes. The network topology adopted for all simulations is shown in Fig. 3(a), unless otherwise specified. Here, measurement sharing in the adaptation step is not considered for any of the diffusion algorithms. The Metropolis rule [73], used to compute the combination coefficients in the combination step, is expressed as:
[TABLE]
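The Metropolis expression itself is lost above. The sketch below therefore uses the common form of the rule from [2, 73]: a_lk = 1/max(n_k, n_l) for neighbours l != k, and a_kk = 1 minus the sum of the other weights. Here n_k counts node k itself. This convention and the adjacency-set input format are assumptions.

```python
def metropolis_weights(neighbors):
    """Combination matrix A (A[l][k] = a_lk) from the Metropolis rule.

    neighbors[k]: set of neighbours of node k, excluding k itself
    (undirected graph). Assumed convention: n_k = |N_k| includes node k.
    """
    N = len(neighbors)
    deg = [len(neighbors[k]) + 1 for k in range(N)]
    A = [[0.0] * N for _ in range(N)]
    for k in range(N):
        for l in neighbors[k]:
            A[l][k] = 1.0 / max(deg[k], deg[l])
        # self-weight makes every column sum to one
        A[k][k] = 1.0 - sum(A[l][k] for l in neighbors[k])
    return A


# Path graph 0 - 1 - 2
A = metropolis_weights([{1}, {0, 2}, {1}])
```

For an undirected graph, the resulting matrix is symmetric and doubly stochastic, which is why this rule is a popular default for the combination step.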
VI-A Distributed Parameter Estimation
The vector to be estimated has a length of and a unit norm; it is generated randomly from a zero-mean uniform distribution. The input regressor has a shift structure, where is colored and generated by a second-order autoregressive system:
[TABLE]
where is a zero-mean white Gaussian process with variance . The background noise is zero-mean white Gaussian noise with variance . Variances and are shown in Fig. 3(b) and (c), respectively, for all the nodes. We use the network MSD to assess the performance of the algorithms. All results are averages over 200 independent trials.
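The data model above can be sketched as follows. A colored AR(2) input is formed into shift-structured regressors, and noisy measurements are generated under the usual linear model d(i) = u_i^T w^o + v(i). The AR coefficients used in the paper are not recoverable here, so `a1` and `a2` are hypothetical parameters. The values in the example (0.4, -0.2) are merely a stable choice. Function names are also assumptions.

```python
import random

def ar2_regressors(n, L, a1, a2, sigma_z, seed=0):
    """Colored AR(2) input u(i) = a1*u(i-1) + a2*u(i-2) + z(i).

    z is white Gaussian with standard deviation sigma_z; a1, a2 are
    hypothetical. Returns (u, regs) with regs[i] = [u(i), ..., u(i-L+1)],
    prewindowed with zeros before time 0 (shift structure).
    """
    rng = random.Random(seed)
    u = []
    for i in range(n):
        prev1 = u[i - 1] if i >= 1 else 0.0
        prev2 = u[i - 2] if i >= 2 else 0.0
        u.append(a1 * prev1 + a2 * prev2 + rng.gauss(0.0, sigma_z))
    regs = [[u[i - j] if i - j >= 0 else 0.0 for j in range(L)] for i in range(n)]
    return u, regs

def measurements(regs, w_true, sigma_v, seed=1):
    """d(i) = regs[i]^T w_true + v(i), v white Gaussian (std sigma_v)."""
    rng = random.Random(seed)
    return [sum(a * b for a, b in zip(r, w_true)) + rng.gauss(0.0, sigma_v)
            for r in regs]


u, regs = ar2_regressors(100, 4, 0.4, -0.2, 1.0)
d = measurements(regs, [0.5, 0.5, 0.5, 0.5], 0.1)  # unit-norm example vector
```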
Example 1: In addition to the background noise , a cluster of impulses of length is added to corrupt at iteration (such a scenario is similar to double-talk in echo cancellation). The cluster is drawn from a zero-mean white Gaussian process, but with a large variance to generate impulsive samples, where denotes the power of . Fig. 4 compares the performance of the proposed R-dRLS algorithm with that of the dRLS algorithm and of the LTVFF-dRLS and LCTVFF-dRLS algorithms presented in [81]. The algorithm parameters are set to obtain comparable convergence rates. The regularization constant for all RLS-type algorithms is chosen as . Clearly, with a small forgetting factor , the conventional dRLS algorithm converges faster but has a higher estimation error; conversely, increasing the forgetting factor lowers the estimation error but slows convergence. In particular, with a large forgetting factor , the dRLS algorithm needs more time to reconverge after a cluster of impulses forces it to diverge. Owing to their variable forgetting factor schemes, the LTVFF-dRLS and LCTVFF-dRLS algorithms alleviate this trade-off to a certain extent. As stated in Remark 2, the R-dRLS algorithm also overcomes this trade-off, since it employs a variable 'step-size' factor in the adaptation step. Moreover, unlike the dRLS, LTVFF-dRLS and LCTVFF-dRLS algorithms, the R-dRLS algorithm does not diverge when the cluster of impulses occurs after the algorithms have reached the steady state. This is because the R-dRLS algorithm uses (11) to detect whether impulses occur and updates accordingly.
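The impulse cluster of Example 1 can be sketched as follows. A burst of zero-mean Gaussian samples is added to the background noise over a window of consecutive iterations. Its variance is a multiple of the output power. The paper's multiplier, start iteration and cluster length are lost in this text, so `factor`, `start` and `length` are hypothetical parameters.

```python
import random

def add_impulse_cluster(v, y, start, length, factor, seed=0):
    """Add a burst of Gaussian impulses to a noise sequence.

    v: background noise samples; y: clean output samples (to measure power).
    Samples start .. start+length-1 receive an extra zero-mean Gaussian term
    with variance factor * power(y); 'factor' is a hypothetical value.
    """
    rng = random.Random(seed)
    power_y = sum(s * s for s in y) / len(y)
    std = (factor * power_y) ** 0.5
    out = list(v)
    for i in range(start, min(start + length, len(v))):
        out[i] += rng.gauss(0.0, std)
    return out


noisy = add_impulse_cluster([0.0] * 100, [1.0] * 100, start=40, length=10, factor=1000.0)
```

Only the samples inside the window are affected. The rest of the noise sequence is returned unchanged.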
Example 2: The additive noise is the CG process given in Assumption 4. At every node , we set as a random number in the range of and . For a fair comparison of RLS-type algorithms, we choose the same forgetting factor =0.985 and regularization constant =0.01, except =0.5 in the dRLP and RVWC-dRLS algorithms.
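Assumption 4 is not reproduced in this text. The sketch below uses a common form of contaminated-Gaussian (CG) noise: v(i) = g(i) + b(i) eta(i). Here g is the Gaussian background, b is a Bernoulli process with P(b = 1) = p, and eta is a high-variance Gaussian with variance kappa times that of g. Treat this form and the parameter names `p` and `kappa` as assumptions.

```python
import random

def cg_noise(n, sigma_g2, p, kappa, seed=0):
    """Contaminated-Gaussian noise v(i) = g(i) + b(i) * eta(i) (assumed form).

    g:   white Gaussian background, variance sigma_g2
    b:   Bernoulli, P(b = 1) = p (an impulse occurs)
    eta: white Gaussian, variance kappa * sigma_g2 with kappa >> 1
    Returns (samples, number_of_impulses).
    """
    rng = random.Random(seed)
    std_g = sigma_g2 ** 0.5
    std_eta = (kappa * sigma_g2) ** 0.5
    samples, impulses = [], 0
    for _ in range(n):
        v = rng.gauss(0.0, std_g)
        if rng.random() < p:  # Bernoulli trial
            v += rng.gauss(0.0, std_eta)
            impulses += 1
        samples.append(v)
    return samples, impulses


samples, impulses = cg_noise(100000, 1.0, 0.05, 1000.0, seed=3)
# impulses / 100000 is close to p = 0.05
```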
Fig. 5 checks the validity of the semi-analytic result (49), where we plot at node 1 (similar results are obtained at the other nodes). To comply with the assumption on the input regressors made in the analysis, their entries are generated here from a white Gaussian process . To compute (49), we use the same impulsive noise parameters at all the nodes: or 0.05, and . As one can see, the theoretical results agree well with the simulated results. Moreover, obtained by ensemble averaging over simulations is a decreasing function of the iteration , which further supports the theory in Appendix B.
Fig. 6 investigates the effect of the NC method on the R-dRLS algorithm. It can be seen that the R-dRLS algorithm does not reconverge after changes to at iteration . In this scenario, all algorithms exhibit a large, abrupt jump in MSD due to the mismatch between and its estimate at that moment. The NC method endows the R-dRLS algorithm with good tracking capability for such a change of . Benefiting from the smoothing operation (15), the NC () only slightly degrades the steady-state performance of the R-dRLS algorithm compared with the non-smooth version in [1] (i.e., ).
In Fig. 7, we compare the performance of the dRLS, dSE-LMS, dLMP, RVWC-dRLS, and dRLP algorithms with that of the proposed R-dRLS with NC algorithm. Note that in the R-dRLS (no cooperation) case, each node runs the standalone adaptive algorithm presented in [59]. As expected, the dRLS algorithm performs poorly in the presence of impulsive noise, while the other algorithms are robust. Among these robust algorithms, the dSE-LMS and dLMP algorithms converge slowly. Thanks to the decorrelation property of the dRLS, the RVWC-dRLS, dRLP, and R-dRLS with NC algorithms converge fast. In particular, the proposed R-dRLS with NC algorithm also achieves a much lower steady-state MSD. This is mainly because its update energy, described by (10) and (14), diminishes with iterations.
Example 3: The additive noise here is generated by the α-stable process, also called α-stable noise. Its characteristic function is given by [72, 69], where the characteristic exponent α describes the impulsiveness of the noise (a smaller α leads to more impulsive noise samples) and represents the dispersion level of the noise. In particular, when α = 2, it reduces to Gaussian noise. In practice, α-stable noise with is rare [72, 69]; thus, in this example, we set and . The learning performance of the algorithms is shown in Fig. 8. Fig. 9 shows the node-wise steady-state MSD of the robust algorithms (i.e., excluding the dRLS), obtained by averaging the MSD values from iteration 2400 to 2500. As can be seen from Figs. 8 and 9, the proposed R-dRLS algorithm with NC outperforms the known robust diffusion algorithms in terms of convergence rate, steady-state accuracy and tracking capability. As shown in Figs. 7 to 9, owing to the cooperation among interconnected nodes, the R-dRLS algorithm improves the estimation performance compared with its non-cooperative counterpart.
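The paper does not say how its α-stable samples were drawn. A standard way to generate symmetric α-stable noise is the Chambers-Mallows-Stuck (CMS) method, which is swapped in here as a plainly labeled substitute. The sketch assumes the characteristic function exp(-gamma |t|^alpha) with 0 < alpha <= 2, so `gamma` plays the role of the dispersion. For alpha = 2, the samples are Gaussian with variance 2 * gamma. For alpha = 1, they are Cauchy.

```python
import math
import random

def sas_noise(n, alpha, gamma, seed=0):
    """Symmetric alpha-stable samples via the Chambers-Mallows-Stuck method.

    Assumed characteristic function: exp(-gamma * |t|**alpha), 0 < alpha <= 2.
    """
    rng = random.Random(seed)
    scale = gamma ** (1.0 / alpha)  # dispersion gamma -> scale gamma^(1/alpha)
    out = []
    for _ in range(n):
        v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
        w = 0.0
        while w == 0.0:  # exponential variable, guard against w = 0
            w = rng.expovariate(1.0)
        if alpha == 1.0:
            x = math.tan(v)  # Cauchy case
        else:
            x = (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
                 * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
        out.append(scale * x)
    return out


gauss_like = sas_noise(100000, 2.0, 0.5)  # variance close to 2 * 0.5 = 1
impulsive = sas_noise(1000, 1.2, 0.1)     # heavy-tailed samples
```

Smaller `alpha` gives heavier tails and hence more impulsive samples, which matches the role of the characteristic exponent described above.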
We also perform the simulations on the network in Fig. 10, which has fewer connections among nodes. Fig. 11 shows the node-wise steady-state MSD of the algorithms considered in Fig. 9. Comparing these two figures, we see that the proposed R-dRLS algorithm is more likely to reach the same estimates at all nodes.
Example 4: Comparison of DCD-based algorithms. Figs. 12 and 13 compare the DCD-R-dRLS algorithm using different values with its standard version in CG-noise and α-stable noise scenarios (the curves of the dRLS and DCD-dRLS algorithms are omitted because they diverge in impulsive noise). The DCD parameters are and . It can be seen that the proposed DCD-R-dRLS algorithm is also robust to impulsive noise and approaches the R-dRLS performance as increases. In this example, the DCD-R-dRLS algorithm with closely approximates the R-dRLS algorithm, while its complexity is significantly lower. Moreover, many simulations have been carried out in different impulsive noise scenarios with a much larger number of iterations than in Fig. 5, e.g., , using MATLAB R2013a on an Intel(R) Core(TM) i5-4590 CPU @ 3.30 GHz processor. We did not observe any numerical instability in these simulations for either of the proposed R-dRLS and DCD-R-dRLS algorithms.
VI-B Application: Distributed Spectrum Estimation
We have also tested the performance of the proposed algorithms in a distributed spectrum estimation application in CR, where the objective is to estimate the spectrum of a signal transmitted by a source, using a network with nodes [7, 8, 81]. We use to denote the power spectral density (PSD) of the signal at frequency , where is a vector consisting of basis functions evaluated at normalized frequency , and collects the powers with which the signal is transmitted over each of the basis functions; these powers are to be estimated. Such a basis expansion can accurately model the spectrum of the signal for large enough . Letting denote the transfer function of the channel between the station emitting the signal and receiver node at time instant , the PSD of the received signal at node can be expressed as
[TABLE]
where , and is the received noise power at node .
At time instant , each node observes the received PSD expressed in (58) over frequency samples for ; accordingly, the output measurements of node obey the following relation:
[TABLE]
where denotes the observation noise at frequency . The noise power can be estimated with high accuracy before the spectrum estimation, using, for example, an energy estimator over an idle band, and then subtracted from (59) [7, 8, 81]. Then, by collecting the output measurements over frequencies, we obtain a data model at every node for distributed spectrum estimation:
[TABLE]
where , , and .
Based on this model, we estimate the unknown spectrum of the signal using different diffusion algorithms over the network given in Fig. 3(a). As in [8, 81], we use nonoverlapping rectangular basis functions with amplitude equal to one to model the PSD of the signal (other basis functions are also possible, e.g., raised cosines or Gaussian bells [7]). The nodes scan frequencies over the normalized frequency axis between 0 and 1. We assume that has only 8 non-zero elements, meaning that the unknown spectrum is transmitted over 8 basis functions, and the power transmitted over each of them is set to 0.7. The observation noise is an α-stable process, as in Example 3 [48]. In Fig. 14, we compare the network MSD performance of the different algorithms for distributed spectrum estimation. As depicted, the dRLS algorithm cannot identify the spectrum coefficients because it diverges in the α-stable noise environment. Compared with the dSE-LMS, dLMP, RVWC-dRLS and dRLP algorithms, the proposed R-dRLS and DCD-R-dRLS (with ) algorithms still obtain better estimation performance. We also notice from this figure that the DCD-R-dRLS algorithm, with its lower computational complexity, approaches the R-dRLS performance. In Fig. 15, we select the robust dRLS-type algorithms to show their performance in terms of the PSD at node . The proposed R-dRLS and DCD-R-dRLS algorithms have lower side lobes in the PSD curves than the other two algorithms, and thus fit the true spectrum much better.
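The basis expansion used here can be sketched with unit-amplitude, non-overlapping rectangular basis functions on the normalized frequency axis [0, 1). The rows of the regression matrix at a node are assumed to take the form |H_k(f_j)|^2 b(f_j)^T, following the data model described above. The number of basis functions (16) and the positions of the 8 active bands are hypothetical, because the paper's values are lost in this text. Only the power level 0.7 comes from the text.

```python
def rect_basis(f, B):
    """Unit-amplitude, non-overlapping rectangular basis on [0, 1).

    Returns b(f) of length B, with b_m(f) = 1 if f falls in the m-th band.
    """
    b = [0.0] * B
    m = min(int(f * B), B - 1)  # band index; f = 1 maps to the last band
    b[m] = 1.0
    return b

def psd(f, w):
    """Signal PSD S(f) = b(f)^T w."""
    return sum(bi * wi for bi, wi in zip(rect_basis(f, len(w)), w))

def regressor_matrix(freqs, channel_gain2, B):
    """Rows |H_k(f_j)|^2 * b(f_j)^T at one node (assumed form of the model)."""
    return [[g * x for x in rect_basis(f, B)] for f, g in zip(freqs, channel_gain2)]


B = 16                   # hypothetical number of basis functions
w = [0.0] * B
for m in range(4, 12):   # 8 active bands (positions hypothetical)
    w[m] = 0.7
U = regressor_matrix([0.3, 0.9], [2.0, 0.5], B)
```

Because the basis functions do not overlap, each frequency sample excites exactly one coefficient of w. This is why only 8 of the entries are non-zero in this example.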
VII Conclusion
In this paper, we have derived a new dRLS algorithm that is robust to impulsive noise, based on the minimization of a local RLS cost function with a time-dependent constraint on the squared norm of the intermediate estimate update. Following the diffusion strategy, the constraint is dynamically adjusted with the help of side information from the neighboring nodes. We also analyzed the convergence of the proposed algorithm in the mean-square sense under impulsive noise. Then, its DCD version was developed to reduce the computational complexity. Moreover, to adapt the proposed algorithms to an abrupt change of the unknown parameter vector, a non-stationary control approach has also been designed. Simulation results have verified that the proposed algorithms outperform the known algorithms in impulsive noise scenarios.
Appendix A Verification of (25)
Fig. 16 shows that the left-hand side of (25) agrees well with its right-hand side (similar results at other nodes are not shown because of page limitations). This indicates that the simplification from (24) to (25) is reasonable.
Appendix B Convergence of to 0
It is evident from (14) that , as a function of , is positive and non-increasing during adaptation. Hence, the limit of at exists. Applying the expectation operator to (14), we obtain
[TABLE]
[TABLE]
Again using the assumption that the variance of is small enough since is close to 1, we can make the approximation
[TABLE]
where means that both and have the same distribution, denotes the probability of event in the argument, and denotes the distribution function of at time instant .
Let us define the network global vectors as follows:
[TABLE]
Therefore, according to (B.3) and (B.4), we reformulate (B.1) and (B.2) for all nodes as:
[TABLE]
where
[TABLE]
and
[TABLE]
Taking the -norm of both sides of (B.5) and recalling , we obtain the following inequality:
[TABLE]
Based on the diagonal definition in (B.6), we deduce an equivalent form from (B.8), i.e., for ,
[TABLE]
Assuming that has a limit when , (B.9) further reduces to
[TABLE]
In (B.10), we have also used the relation as . Here, we consider the equality case in (B.10), i.e.,
[TABLE]
It is shown in Appendix A of [45] that the solution of an equation similar to (B.11) is . Since (B.11) is an upper bound of (B.10), we can conclude that given by (14) also converges to zero. Moreover, as its intermediate quantity, also converges to zero.
[1] Y. Yu, H. Zhao, R. C. de Lamare, and Y. Zakharov, “Robust diffusion recursive least squares estimation with side information for networked agents,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018, pp. 4099–4103.
[2] A. H. Sayed, “Adaptive networks,” Proceedings of the IEEE, vol. 102, no. 4, pp. 460–497, 2014.
[3] A. H. Sayed, “Adaptation, learning, and optimization over networks,” Foundations and Trends in Machine Learning, vol. 7, no. 4-5, pp. 311–801, 2014.
[4] S.-Y. Tu and A. H. Sayed, “Mobile adaptive networks,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 4, pp. 649–664, 2011.
[5] A. H. Sayed, S.-Y. Tu, J. Chen, X. Zhao, and Z. J. Towfic, “Diffusion strategies for adaptation and learning over networks: an examination of distributed strategies and network behavior,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 155–171, 2013.
[6] S. Kanna, D. H. Dini, Y. Xia, S. Hui, and D. P. Mandic, “Distributed widely linear Kalman filtering for frequency estimation in power networks,” IEEE Transactions on Signal and Information Processing over Networks, vol. 1, no. 1, pp. 45–57, 2015.
[7] P. Di Lorenzo, S. Barbarossa, and A. H. Sayed, “Distributed spectrum estimation for small cell networks based on sparse diffusion adaptation,” IEEE Signal Processing Letters, vol. 20, no. 12, pp. 1261–1265, 2013.
[8] T. G. Miller, S. Xu, R. C. de Lamare, and H. V. Poor, “Distributed spectrum estimation based on alternating mixed discrete-continuous adaptation,” IEEE Signal Processing Letters, vol. 23, no. 4, pp. 551–555, 2016.
