Feasibility study on the least square method for fitting non-Gaussian noise data
Wei Xu, Wen Chen, Yingjie Liang

TL;DR
This paper evaluates the effectiveness of the least squares method in fitting data contaminated with non-Gaussian noise, specifically Lévy and stretched Gaussian noise, revealing limitations at higher noise levels.
Contribution
It provides a systematic analysis of least squares fitting performance on non-Gaussian noise data, highlighting its limitations and comparative performance between different noise types.
Findings
Least squares fitting is less accurate with non-Gaussian noise.
Stretched Gaussian noise is fitted better than Lévy noise.
The method fails when the noise level exceeds 5%.
Abstract
This study investigates the feasibility of the least squares method in fitting non-Gaussian noise data. We add different levels of two typical non-Gaussian noises, Lévy and stretched Gaussian noise, to the exact values of selected functions, including linear, polynomial and exponential equations, and calculate the maximum absolute and mean square errors for each case. Lévy and stretched Gaussian distributions have many applications in fractional and fractal calculus. Data with non-Gaussian noise are fitted less accurately than data with Gaussian noise, but the stretched Gaussian cases perform better than the Lévy cases. The least squares method is found to be inapplicable to the non-Gaussian noise cases when the noise level is larger than 5%.
Feasibility study on the least square method for fitting non-Gaussian noise data
Wei Xu, Wen Chen, Yingjie Liang
Institute of Soft Matter Mechanics, College of Mechanics and Materials, Hohai University, Nanjing, China
State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering
Corresponding authors:
Wen Chen, Email address: [email protected]
Yingjie Liang, Email address: [email protected]
**Keywords:** least squares method, non-Gaussian noise, Lévy distribution, stretched Gaussian distribution, least squares fitting
**1. Introduction**
Non-Gaussian noise is universal in nature and engineering [1-3]. In recent decades, non-Gaussian noise has been widely studied, especially in signal detection and processing [4-5], theoretical model analysis [6], and error statistics [7]. It is known that a Gaussian error distribution is the mathematical precondition for using the least squares method. However, the method is often applied directly to non-Gaussian noise data, which may give wrong estimates [8]. This study therefore quantitatively examines the applicability of the least squares method to non-Gaussian noise data.
Generally, non-Gaussian noise has a detrimental influence on the stability of power systems, and it can also stimulate systems to generate ordered patterns [9-10]. To the best of our knowledge, Lévy and stretched Gaussian noises are two typical kinds of non-Gaussian noise, which are frequently used in fractional and fractal calculus [11-12]. Lévy noise has been observed in many complex systems, such as turbulent fluid flows [13], signal processing [14], financial time series [15-16], and neural networks [17]. Parameter estimation for stochastic differential equations driven by small Lévy noise has also been investigated [18]. Compared with Lévy noise, stretched Gaussian noise is less studied, but the corresponding stretched Gaussian distribution has been explored [19], for example in the motion of flagellate protozoa [20], scrape-off layer (SoL) interchange turbulence simulation [21], and anomalous diffusion of particles under an external force [22]. It should also be pointed out that the processing of non-sinusoidal signals and sound textures has become an important research topic [23], and the derived algorithms significantly improve the perceptual quality of stretched noise signals.
It is well known that the least squares method is a standard regression approach for approximating the solutions of overdetermined systems, and it is the approach most frequently used in data fitting and estimation [24]. The core concept of the least squares method is to identify the best match for the system by minimizing the square error [25]. Suppose that the data points are $(x_i, y_i)$, $i = 1, 2, \ldots, n$, where $x_i$ represents the independent variable and $y_i$ is the dependent variable. The fitting error $e_i$ characterizes the distance between $y_i$ and the estimated curve $f(x_i)$, i.e., $e_i = y_i - f(x_i)$. The best fitting curve minimizes the square error $\sum_{i=1}^{n} e_i^2$, where the errors are usually modeled by a Gaussian distribution [26].
Field data are often polluted by noise [27], and Gaussian noise is the classical case, whose probability density function obeys the Gaussian distribution. As mentioned above, several types of noise data obey non-Gaussian distributions [28]. To examine the feasibility of the least squares method in fitting non-Gaussian noise data, we generate non-Gaussian random numbers as the noise and then add different levels of this noise to the exact values of the selected functions (linear, polynomial and exponential equations) to form the observed values. Using the least squares method, the maximum absolute and mean square errors are calculated and compared for the Gaussian and non-Gaussian cases.
The rest of the paper is organized as follows. In Section 2, we introduce the Gaussian distribution, Lévy distribution, stretched Gaussian distribution, and the methods we use to analyze the noise data. In Section 3, we give the results and discussion. Finally, a brief summary is provided.
**2. Theory and methods**
**2.1 Gaussian distribution**
The Gaussian distribution, also called the normal distribution, is often encountered in mathematics, physics and engineering. Its probability density function is:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{1}$$
where $\mu$ and $\sigma$ respectively represent the mean and the standard deviation. When $\mu = 0$ and $\sigma = 1$, it reduces to the standard Gaussian distribution. Figure 1 gives four different cases of the Gaussian density function.
**2.2 Lévy distribution**
The Lévy distribution, named after Paul Lévy, is a rich class of probability distributions, with the Gaussian and Cauchy distributions as special cases. It is usually defined by its characteristic function [29]:
[TABLE]
where
[TABLE]
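stability index $\alpha \in (0, 2]$, skewness parameter $\beta \in [-1, 1]$, scale parameter $\sigma > 0$, and location parameter $\mu \in \mathbb{R}$. The parameters $\alpha$ and $\beta$ respectively determine the asymptotic decay and the symmetry of the distribution. The standard Lévy distribution can be obtained by the following transformation.
[TABLE]
When $\alpha = 1/2$ and $\beta = 1$, the probability density function of the Lévy distribution is:
$$f(x) = \sqrt{\frac{\sigma}{2\pi}}\, \frac{1}{(x-\mu)^{3/2}} \exp\left(-\frac{\sigma}{2(x-\mu)}\right) \tag{5}$$
where $x > \mu$, $\mu$ is the location parameter and $\sigma$ is the scale parameter. Different cases of Eq. (5) are illustrated in Figure 2.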
**2.3 Stretched Gaussian distribution**
The stretched Gaussian distribution has been widely used to describe anomalous diffusion and turbulence, especially in porous media with fractal structure [30]. The solution of the fractal derivative equation characterizing fractal media has the form of a stretched Gaussian distribution [31], whose probability density function is defined as:
[TABLE]
where is the stretched exponent. When and , it reduces to the standard Gaussian distribution. Figure 3 shows three cases of the stretched Gaussian density function.
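For orientation, one widely used parametrization of a stretched Gaussian density is the generalized normal form sketched below. The symbols $a$ (scale) and $\beta$ (stretched exponent) are notation assumed here for illustration and need not coincide with the normalization adopted in this paper. With $\beta = 2$ and $a = \sqrt{2}$ the expression reduces to the standard Gaussian density.

```latex
f(x) = \frac{\beta}{2 a\, \Gamma(1/\beta)}
       \exp\!\left(-\left|\frac{x}{a}\right|^{\beta}\right),
\qquad
\beta = 2,\ a = \sqrt{2}: \quad
f(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^{2}}{2}\right).
```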
**2.4 Generation of noise data**
In this study, the noise data are obtained from the above-mentioned Gaussian and non-Gaussian random variables, which can be generated by the inverse function method [32] and the selection method [33]. Specifically, Chambers, Mallows and Stuck proposed the CMS method for simulating Lévy random variables [34], which is the fastest and most accurate method. To use the CMS method, some variables need to be defined first.
[TABLE]
[TABLE]
[TABLE]
[TABLE]
where and are two independent uniform distribution on interval .
When , the Lévy random number is
[TABLE]
When ,
[TABLE]
General Lévy random numbers can then be obtained from known properties of the distribution [35].
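The generation step can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard Chambers-Mallows-Stuck formulas in the form commonly given in the computational statistics literature, for a standard variable with unit scale and zero location. The function name `cms_stable` and the intermediate names `v`, `w`, `b` and `s` are our own and are not taken from this paper.

```python
import numpy as np

def cms_stable(alpha, beta, size, rng=None):
    """Draw standard alpha-stable numbers (scale 1, location 0) with the
    Chambers-Mallows-Stuck construction (assumed standard form)."""
    if not (0 < alpha <= 2) or not (-1 <= beta <= 1):
        raise ValueError("need 0 < alpha <= 2 and -1 <= beta <= 1")
    rng = np.random.default_rng() if rng is None else rng
    u1 = rng.uniform(0.0, 1.0, size)   # two independent uniforms on (0, 1)
    u2 = rng.uniform(0.0, 1.0, size)
    v = np.pi * (u1 - 0.5)             # uniform on (-pi/2, pi/2)
    w = -np.log1p(-u2)                 # exponential with mean 1
    if alpha != 1:
        t = beta * np.tan(np.pi * alpha / 2)
        b = np.arctan(t) / alpha
        s = (1 + t ** 2) ** (1 / (2 * alpha))
        return (s * np.sin(alpha * (v + b)) / np.cos(v) ** (1 / alpha)
                * (np.cos(v - alpha * (v + b)) / w) ** ((1 - alpha) / alpha))
    # alpha == 1 needs its own formula
    half_pi = np.pi / 2
    return (2 / np.pi) * ((half_pi + beta * v) * np.tan(v)
                          - beta * np.log(half_pi * w * np.cos(v) / (half_pi + beta * v)))

# Example: alpha = 1/2, beta = 1 gives the one-sided Levy distribution.
levy_noise = cms_stable(0.5, 1.0, 1000, np.random.default_rng(0))
```

A variable with scale $\sigma$ and location $\mu$ then follows, for $\alpha \neq 1$, by computing $\sigma X + \mu$ from a standard draw $X$.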
For the stretched Gaussian distribution, we use the acceptance-rejection method to generate its random numbers [33].
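An acceptance-rejection sampler of this kind can be sketched as follows. The sketch assumes a target density proportional to $\exp(-|x/a|^{\beta})$ and a Laplace proposal, which dominates the target only for $\beta \geq 1$. The parametrization, the choice of proposal and the name `stretched_gaussian_rvs` are assumptions made for illustration, not the procedure of [33].

```python
import numpy as np

def stretched_gaussian_rvs(beta, scale, size, rng=None):
    """Acceptance-rejection sampling from the density proportional to
    exp(-|x / scale|**beta), using a Laplace proposal (requires beta >= 1)."""
    if beta < 1:
        raise ValueError("the Laplace proposal only dominates the target for beta >= 1")
    rng = np.random.default_rng() if rng is None else rng
    # Maximum over x >= 0 of x - x**beta; it fixes the envelope constant.
    if beta == 1:
        log_m = 0.0
    else:
        x_star = beta ** (-1.0 / (beta - 1.0))
        log_m = x_star - x_star ** beta
    out = np.empty(0)
    while out.size < size:
        x = rng.laplace(0.0, 1.0, size)                      # proposal draws
        log_ratio = np.abs(x) - np.abs(x) ** beta - log_m    # log acceptance probability, <= 0
        keep = rng.uniform(0.0, 1.0, size) < np.exp(log_ratio)
        out = np.concatenate([out, x[keep]])
    return scale * out[:size]

# Example: beta = 2 with scale sqrt(2) corresponds to a standard Gaussian.
samples = stretched_gaussian_rvs(2.0, np.sqrt(2.0), 1000, np.random.default_rng(0))
```

For $\beta < 1$ the tails of the target are heavier than those of the Laplace proposal, so a heavier-tailed proposal would be needed.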
**2.5 Methods**
Both linear and nonlinear function estimation are considered using the least squares method, in which the model function is estimated as
[TABLE]
where is the number of observations, the response variable, the explanatory variable, the noise
[TABLE]
In this study, we consider the case; the observed values are then constructed by adding the random numbers to the exact values of the selected functions (linear, polynomial and exponential equations). Finally, the maximum absolute error and the mean square error are calculated for the different cases using the least squares method.
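The construction of observed values and the subsequent fit can be sketched as below. Several choices are assumptions made for illustration only: the test line $y = 2x + 1$, the helper `add_noise`, and the convention that a noise level $p$ means the largest perturbation equals $p$ times the largest exact value in magnitude. The definition of noise level used for the tables may differ.

```python
import numpy as np

def add_noise(y_exact, raw_noise, level):
    """Scale raw random numbers so that the largest perturbation equals
    level * max|y_exact| (an assumed convention), then add them."""
    raw_noise = np.asarray(raw_noise, dtype=float)
    amplitude = level * np.max(np.abs(y_exact))
    return y_exact + amplitude * raw_noise / np.max(np.abs(raw_noise))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y_exact = 2.0 * x + 1.0                       # hypothetical linear test function
y_obs = add_noise(y_exact, rng.standard_normal(x.size), level=0.05)

# Ordinary least squares: minimise ||A p - y_obs||^2 over p = (slope, intercept).
A = np.column_stack([x, np.ones_like(x)])
params, *_ = np.linalg.lstsq(A, y_obs, rcond=None)
slope, intercept = params
```

The same `add_noise` call accepts Lévy or stretched Gaussian random numbers in place of the Gaussian draw, so the three noise types can be compared at an identical level.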
For convenience, we use the following abbreviations in noise data processing:
FA: Gaussian noise least square error fitting,
FB: Lévy noise least square error fitting,
FC: Stretched Gaussian noise least square error fitting,
Rerr1: Maximum absolute error: .
Rerr2: Mean square error: .
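The two error measures can be computed as in the following sketch. It assumes the plain definitions, namely the largest absolute difference and the average squared difference between fitted and exact values. Any normalization that Rerr1 and Rerr2 may carry is not reproduced here, and the function names are our own.

```python
import numpy as np

def max_abs_error(y_fit, y_exact):
    """Largest absolute difference between fitted and exact values."""
    return float(np.max(np.abs(np.asarray(y_fit) - np.asarray(y_exact))))

def mean_square_error(y_fit, y_exact):
    """Average of the squared differences between fitted and exact values."""
    return float(np.mean((np.asarray(y_fit) - np.asarray(y_exact)) ** 2))

# Small worked example with differences 0, 0.5 and -0.5.
y_fit = [1.0, 2.5, 3.0]
y_exact = [1.0, 2.0, 3.5]
rerr1 = max_abs_error(y_fit, y_exact)      # 0.5
rerr2 = mean_square_error(y_fit, y_exact)  # (0 + 0.25 + 0.25) / 3
```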
**3. Results and discussion**
In this section, we apply the least squares method to fit noise-polluted data obtained by adding different levels of Gaussian and non-Gaussian noise to the exact values of the selected functions (linear, polynomial and exponential equations), and give a brief discussion.
a) The simplest typical model is the linear function.
[TABLE]
Here we select the following case as an example:
[TABLE]
Tables 1 to 5 give the estimated parameters and the errors for five different levels of noise in the linear case. We can observe that the Gaussian noise fitting data maximum absolute error is in the range of , the mean square error is in the range of . The maximum absolute and the mean square errors of Gaussian noise are the least and Lévy distribution noise are the largest, with the relationship expressed as:
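$$\mathrm{Rerr1}(\mathrm{FA}) < \mathrm{Rerr1}(\mathrm{FC}) < \mathrm{Rerr1}(\mathrm{FB}) \tag{13}$$
$$\mathrm{Rerr2}(\mathrm{FA}) < \mathrm{Rerr2}(\mathrm{FC}) < \mathrm{Rerr2}(\mathrm{FB}) \tag{14}$$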
The corresponding fitting curves are depicted in Figures 4-8. We can find that the results of Gaussian noise fitting have the best accuracy, and the stretched Gaussian noise fitting curves are closer to those of the Gaussian noise compared with the results of Lévy noise data fitting.
Table 1. The estimated results for 1% noise in the linear case.
[TABLE]
Table 2. The estimated results for 5% noise in the linear case.
[TABLE]
Table 3. The estimated results for 10% noise in the linear case.
[TABLE]
Table 4. The estimated results for 15% noise in the linear case.
[TABLE]
Table 5. The estimated results for 20% noise in the linear case.
[TABLE]
b) A polynomial can be constructed by means of addition, multiplication and exponentiation to a non-negative power, which is usually written as the following form with a single variable ,
[TABLE]
where are constants. We select a three-parameter polynomial function.
[TABLE]
Here the following case is used as an example:
[TABLE]
Tables 6 to 10 give the estimated parameters and the errors for five different levels of noise in the polynomial case. The corresponding fitting curves are depicted in Figures 9-13. Gaussian noise fitting maximum absolute error is in the range of , and the mean square error is in the range of . We notice that Eq. (13) and Eq. (14) are also satisfied here. As Figures 9-13 show, the Gaussian noise fitting gives the best results, and the stretched Gaussian noise fitting curves are better than those of the Lévy noise fitting.
Table 6. The estimated results for 1% noise in the polynomial case.
[TABLE]
Table 7. The estimated results for 5% noise in the polynomial case.
[TABLE]
Table 8. The estimated results for 10% noise in the polynomial case.
[TABLE]
Table 9. The estimated results for 15% noise in the polynomial case.
[TABLE]
Table 10. The estimated results for 20% noise in the polynomial case.
[TABLE]
c) Non-linear equations can be divided into two categories: polynomial equations and non-polynomial equations. In this part, we select a four-parameter exponential function.
[TABLE]
Here the following case is used as an example:
[TABLE]
Tables 11 to 15 give the estimated parameters and the errors for five different levels of noise in the exponential function case. The corresponding fitting curves are shown in Figures 14-18. The Gaussian noise fitting data maximum absolute error is in the range of , the mean square error is in the range of . The results for the exponential case show patterns similar to those in the linear and polynomial cases.
Table 11. The estimated results for 1% noise in the exponential case.
[TABLE]
Table 12. The estimated results for 5% noise in the exponential case.
[TABLE]
Table 13. The estimated results for 10% noise in the exponential case.
[TABLE]
Table 14. The estimated results for 15% noise in the exponential case.
[TABLE]
Table 15. The estimated results for 20% noise in the exponential case.
[TABLE]
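To summarize the above results, the maximum absolute and mean square errors are smallest for the Gaussian noise cases and largest for the Lévy noise cases, i.e.,
$$\mathrm{Rerr1}(\mathrm{FA}) < \mathrm{Rerr1}(\mathrm{FC}) < \mathrm{Rerr1}(\mathrm{FB}) \tag{20}$$
$$\mathrm{Rerr2}(\mathrm{FA}) < \mathrm{Rerr2}(\mathrm{FC}) < \mathrm{Rerr2}(\mathrm{FB}) \tag{21}$$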
It can be observed from Figures 4 to 18 that the Gaussian noise fitting has the best accuracy, and that the stretched Gaussian noise fitting curves are closer to those of the Gaussian noise than the Lévy noise fitting curves are. Thus, the least squares method is less accurate for non-Gaussian noise data than for Gaussian noise data, especially when the noise level is larger than 5%.
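The qualitative trend can be illustrated with a small numerical experiment. The sketch below is not the experiment reported in the tables. It assumes a hypothetical test line $y = 2x + 1$, uses Cauchy noise (the symmetric stable law with stability index 1) as a stand-in for heavy-tailed noise, and applies the same scale factor to both noise types. The function name `slope_errors` is our own.

```python
import numpy as np

def slope_errors(noise_sampler, trials=200, n=50, scale=0.05, seed=0):
    """Absolute error of the least squares slope over repeated noisy fits."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    y_exact = 2.0 * x + 1.0                 # hypothetical test line
    errs = np.empty(trials)
    for k in range(trials):
        y_obs = y_exact + scale * noise_sampler(rng, n)
        slope, _ = np.polyfit(x, y_obs, 1)  # degree-1 least squares fit
        errs[k] = abs(slope - 2.0)
    return errs

gauss_errs = slope_errors(lambda rng, n: rng.standard_normal(n))
cauchy_errs = slope_errors(lambda rng, n: rng.standard_cauchy(n))
print(np.median(gauss_errs), np.median(cauchy_errs))
```

Medians are compared rather than means because the slope error under Cauchy noise is itself heavy-tailed, so its sample mean does not settle as the number of trials grows.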
This study mainly verifies that the least squares method is inapplicable to non-Gaussian noise when the noise level is high. To extend the results to more complicated systems, a mathematical proof of this conclusion should be derived in a future study.
The second goal of our further work is to modify the least square method in fitting non-Gaussian noises. Actually the core concept of the least square method is to minimize the square error , i.e., in the linear case,
[TABLE]
To compute the minimum of Eq. (22), the first-order derivatives with respect to the parameters are set to zero:
[TABLE]
The solutions of Eq. (23) are the target values of the parameters and . Combining our previous work on fractional and fractal derivatives [30, 36], we can employ the fractional and fractal derivatives to generalize Eq. (23), and the corresponding fitting errors can be defined by using the following power law transform:
[TABLE]
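For the linear case, the minimization described around Eqs. (22) and (23) can be written out explicitly. The sketch below assumes the model $y = a x + b$ with data $(x_i, y_i)$, $i = 1, \ldots, n$. The symbols $a$, $b$ and $S$ are notation chosen here for illustration.

```latex
\begin{aligned}
S(a, b) &= \sum_{i=1}^{n} \left( y_i - a x_i - b \right)^2, \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} x_i \left( y_i - a x_i - b \right) = 0, \qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - a x_i - b \right) = 0, \\
a &= \frac{n \sum_{i} x_i y_i - \sum_{i} x_i \sum_{i} y_i}{n \sum_{i} x_i^2 - \left( \sum_{i} x_i \right)^2}, \qquad
b = \frac{1}{n} \left( \sum_{i} y_i - a \sum_{i} x_i \right).
\end{aligned}
```

The stationarity conditions are linear in $a$ and $b$ only because the residual is squared. If the square is replaced by a different power of the residual, the conditions become nonlinear and generally have to be solved iteratively.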
**4. Conclusions**
This study examines the feasibility of the least squares method in fitting noise data obtained by adding different levels of Gaussian and non-Gaussian noise to the exact values of selected functions, including linear, polynomial and exponential equations. The maximum absolute error and the mean square error are calculated and compared for the different cases. Based on the foregoing results and discussion, the following conclusions can be drawn:
- The fitting results for the non-Gaussian noise are less accurate than those of the Gaussian noise, but the stretched Gaussian cases appear to perform better than the Lévy noise cases.
- The least-squares method is inapplicable to the non-Gaussian noise data when the noise level is larger than 5%.
- A theoretical proof and improved least mean square methods for non-Gaussian noise data are under intense study.
**Acknowledgments**
This paper was supported by the National Science Funds for Distinguished Young Scholars of China (Grant No. 11125208) and the 111 project (Grant No. B12032).
**References**
1. D. Middleton. Non-Gaussian noise models in signal processing for telecommunications: new methods and results for class A and class B noise models. *IEEE Transactions on Information Theory* 1999; 45(4): 1129-1149.
2. A. Nasri, R. Schober, Y. Ma. Unified asymptotic analysis of linearly modulated signals in fading, non-Gaussian noise and interference. *IEEE Transactions on Communications* 2008; 56(6): 980-990.
3. X. Wang, R. Chen. Blind turbo equalization in Gaussian and impulsive noise. *IEEE Transactions on Vehicular Technology* 2001; 50(4): 1092-1105.
4. R. Blum, R. Kozick, B. Sadler. An adaptive spatial diversity receiver for non-Gaussian interference and noise. *IEEE Trans. Signal Processing* 1999; 47: 2100-2111.
5. L. He, Y. Cui, T. Zhang. Analysis of weak signal detection based on tri-stable system under Levy noise. *Chinese Physics B: English* 2016; 6: 85-94.
6. Y. Zhao, X. Zhuang, S. J. Ting. Gaussian mixture density modeling of non-Gaussian source for autoregressive process. *IEEE Transactions on Signal Processing* 1995; 43(4): 894-903.
7. S. Chen, B. Mulgrew, L. Hanzo. Least bit error rate adaptive nonlinear equalizers for binary signaling. *IEEE Proceedings Communications* 2003; 150(1): 29-36.
8. C. H. Chapman, J. A. Orcutt. Least-square fitting of marine seismic refraction data. *Geophysical Journal International* 1985; 82(3): 339-374.
9. A. Nasri, R. Schober. Performance of BICM-SC and BICM-OFDM systems with diversity reception in non-Gaussian noise and interference. *IEEE Transactions on Communications* 2009; 57(11): 3316-3327.
10. A. A. Faisal, L. P. J. Selen, D. M. Wolpert. Noise in the nervous system. *Nature Reviews Neuroscience* 2008; 9(4): 292-303.
11. E. Dobierzewska-Mozrzymas, G. Szymczak, P. Biegański, E. Pieciul. Lévy's distributions for statistical description of fractal structures; discontinuous metal films on dielectric substrates. *Physica B Condensed Matter* 2003; 337(1-4): 79-86.
12. F. Ren, Y. Xu, W. Qiu, J. Liang. Universality of stretched Gaussian asymptotic diffusion behavior on biased heterogeneous fractal structure in external force fields. *Chaos Solitons & Fractals* 2005; 24(1): 273-278.
13. T. Solomon, E. Weeks, H. Swinney. Observation of anomalous diffusion and Lévy flights in a two-dimensional rotating flow. *Physical Review Letters* 1993; 71(24): 3975-3978.
14. L. Chrysostomos. Signal processing with alpha-stable distributions and applications. *John Wiley & Sons, Inc.* 1995; 22(3): 333-334.
15. R. Gomory, B. Mandelbrot. Fractals and scaling in finance: discontinuity, concentration, risk. New York: Springer, 1997.
16. I. Sokolov, W. Ebeling, B. Dybiec. Harmonic oscillator under Lévy noise: Unexpected properties in the phase space. *Phys. Rev. E* 2011; 83: 041118.
17. R. Segev, M. Benveniste, E. Hulata, et al. Long term behavior of lithographically prepared in vitro neuronal networks. *Phys. Rev. Lett.* 2002; 88: 11-18.
18. H. Long, C. Ma, Y. Shimizu. Least squares estimators for stochastic differential equations driven by small Lévy noises. *Stochastic Processes & Their Applications* 2016; 8(6): 1-21.
19. W. H. Liao, A. Roebel, W. Y. Su. On stretching Gaussian noises with the phase vocoder. *Proc Int Conf on Digital Audio Effects* 2012; 9(15): 17-21.
20. L. G. Alves, D. B. Scariot, R. R. Guimarães, et al. Transient superdiffusion and long-range correlations in the motility patterns of trypanosomatid flagellate protozoa. *PLoS One* 2016; 11(3): e0152092.
21. S. Sugita, M. Yagi, S. Itoh, K. Itoh. Bohm-like dependence of transport in scrape-off layer plasmas. *Journal of the Physical Society of Japan* 2012; 81(4): 69-69.
22. F. Ren, J. Wang, L. Lv, H. Pan, W. Qiu. Effect of different waiting time processes with memory to anomalous diffusion dynamics in an external force fields. *Physica A* 2015; 417: 202-214.
23. N. N. Kolchigin, S. N. Pivnenko. Numerical modeling of measurements of dielectric material characteristics using non-sinusoidal signals. *Proc Int Conf on Digital Audio Effects* 2012; 9: 17-21.
24. P. Lancaster, K. Šalkauskas. Curve and Surface Fitting: An Introduction. London: Academic Press, 1986.
25. D. N. Lehmer. Review: E. T. Whittaker and G. Robinson, the calculus of observations. A treatise on numerical mathematics. *Phys. Rev. Lett.* 1925; 98(6): 068102-068102.
26. X. Chen. Concise history of statistics. Changsha: Hunan Education Publishing House, 2002.
27. H. Xian. Study on ANN Noise Adaptability in Application of Industry Process Characteristics Mining. *Advanced Materials Research* 2012; 462: 635-640.
28. G. A. Tsihrintzis, C. L. Nikias. Fast estimation of the parameters of alpha-stable impulsive interference. *IEEE Transactions on Signal Processing* 1996; 44(6): 1492-1503.
29. W. Chen, H. Sun, X. Li. The fractional derivative model of mechanics and engineering problems. Beijing: Science Press, 2010.
30. W. Chen, H. Sun, X. Zhang, D. Korošak. Anomalous diffusion modeling by fractal and fractional derivatives. *Computers & Mathematics with Applications* 2010; 59(5): 1754-1758.
31. W. Chen. Time-space fabric underlying anomalous diffusion. *Chaos Solitons & Fractals* 2005; 28(4): 923-929.
32. B. Wang, Y. Wei, Y. Zhang. Generate random number by using inverse function and transform sampling method. *Journal of Ningxia Teachers University* 2012; 33(3): 24-28.
33. B. Wang, Y. Wei, Y. Sun. Generate random number by using acceptance rejection method. *Journal of Chongqing Normal University (Natural Science)* 2013; 30(6): 86-91.
34. J. M. Chambers, C. L. Mallows, B. W. Stuck. A method for simulating stable random variables. *Journal of the American Statistical Association* 1976; 71(354): 340-344.
35. R. Weron. Computationally intensive value at risk calculations. Handbook of Computational Statistics. Berlin: Springer, 2004.
36. W. Chen, L. Ye, H. Su. Fractional diffusion equations by the Kansa method. *Computers and Mathematics with Applications* 2010; 59: 1614-1620.
