Parameter-free quantification of stochastic and chaotic signals
Sergio Roberto Lopes, Thiago de Lima Prado, Gilberto Corso, Gustavo Zampier dos Santos Lima, Jürgen Kurths

TL;DR
This paper introduces a parameter-free entropy measure based on recurrence microstates that effectively characterizes the complexity and correlation properties of stochastic and chaotic signals without requiring parameter tuning.
Contribution
It presents a novel, parameter-free quantifier of time series complexity that distinguishes different types of stochastic and chaotic signals and reveals attractor properties.
Findings
Max(S) effectively quantifies time correlation in stochastic signals.
Max(S) distinguishes signals with different power-law spectra.
The method provides new insights into attractor properties and chaos degree.
Sergio Roberto Lopes
Universidade Federal do Paraná, Departamento de Física, Curitiba, 81531-980, Brazil
Thiago de Lima Prado
Universidade Federal dos Vales do Jequitinhonha e Mucuri, Instituto de Engenharia, Ciência e Tecnologia, Janaúba, 39440-146, Brazil
Gilberto Corso
Universidade Federal do Rio Grande do Norte, Departamento de Biofísica e Farmacologia, Natal, 59078-970, Brazil
Gustavo Zampier dos Santos Lima
Universidade Federal do Rio Grande do Norte, Escola de Ciências e Tecnologia, Natal, 59078-970, Brazil
Universidade Federal do Rio Grande do Norte, Departamento de Biofísica e Farmacologia, Natal, 59078-970, Brazil
Jürgen Kurths
Potsdam Institute for Climate Impact Research - Telegraphenberg A 31, Potsdam, 14473, Germany
Humboldt University Berlin, Department of Physics, Berlin, 12489, Germany
Abstract
Recurrence entropy $S$ is a novel time series complexity quantifier based on recurrence microstates. Here we show that $\max(S)$ is a parameter-free quantifier of the time correlation of stochastic and chaotic signals that, at the same time, evaluates changes in the properties of the probability distribution function (PDF) of the entire data set. $\max(S)$ can distinguish distinct temporal correlations of stochastic signals following a power-law spectrum, even when shuffled versions of the signals are used. Such behavior is related to its ability to quantify distinct subsets embedded in a time series. Applied to a deterministic system, the method brings new evidence about attractor properties and the degree of chaoticity. The development of a new parameter-free quantifier of stochastic and chaotic time series opens new perspectives for the analysis of stochastic data and deterministic time series and may find applications in many areas of science.
Keywords: recurrence entropy, stochastic signals, chaotic signals
I Introduction
Two of the foremost characteristics of a stochastic signal are its possible temporal correlation, which preserves memory over some interval of time Beran (2017), and the details of its probability distribution function (PDF), which carry information about how common an element or a set of elements of the signal can be. Both characteristics are related to the concept of the complexity of a signal, which we define as a measure of the degree to which the elements of a stochastic or dynamical system engage in organized, structured interactions. High complexity is achieved in systems that exhibit a mixture of order and disorder and that have a high capacity to generate emergent phenomena or, in other words, the ability of a system as a whole to display behaviors that cannot be reduced to the properties of its constituent parts. Despite the importance of the concept of signal complexity, no general and widely accepted means of measuring it currently exists Ziemelis (2001); Albert and Barabási (2002).
A common approach to characterizing signal complexity is to use entropy-like quantities, which describe the amount of data needed to identify the state of a system Shannon (1948). Entropy is also a fundamental concept for understanding chaotic dynamics Kantz and Schreiber (2004) and can be related to the level of chaos, or chaoticity, of the system, mainly measured by the Lyapunov exponent Kantz and Schreiber (2004); Corso et al. (2018). Distinct time-correlated stochastic signals are characterized by a frequency spectrum following a power-law distribution $P(f) \propto 1/f^{\alpha}$, where $\alpha$ quantifies the time correlation Akaike (1974); Beran (2017). Specific values of $\alpha$ are associated with colors, e.g. $\alpha = 0$ for “white”, $\alpha = 1$ for “pink”, or $\alpha = 2$ for “red”, in this case also known as Brownian noise. Stochastic processes with $1/f^{\alpha}$ power spectra are ubiquitous in science, finding applications in all its subareas, such as physics Bak et al. (1987); Weissman (1988); Press (1978); dos Santos Lima et al. (2012), engineering Hooge et al. (1981), biology Glass (2001); Kobayashi and Musha (1982); West and Shlesinger (1990); dos Santos Lima et al. (2014), cognition Gilden et al. (1995), astrophysics Press (1978); Weissman (1988), geophysics Weissman (1988); Matthaeus and Goldstein (1986), economics Granger and Ding (1996), psychology Gilden (2001), language and music Voss and Clarke (1975).
A reliable method for estimating long-time correlations from finite time series is a key issue and, hitherto, an open question in time series analysis Simonsen et al. (1998); Carbone (2007); Weron (2002); Podobnik and Stanley (2008). Many existing methods are based on time correlation quantifications, such as the computation of the Hurst exponent Carbone (2007); Weron (2002), detrended fluctuation analysis Podobnik and Stanley (2008) or rescaled range analysis Weron (2002). Others are computed in the frequency or wavelet domains, like periodogram or wavelet methods Simonsen et al. (1998). The non-stationarity imposed by long-range dependence, combined with the finite duration of the signal, makes the characterization of correlation via those traditional methods a delicate task. Often the analyses lead to parameter-dependent results. Empirical time series are always finite, so long-range correlations are unavoidably partly suppressed. Conversely, the local dynamical characteristics of small temporal windows tend to be overestimated.
On the other hand, the quantification of special properties of the PDF of a signal and its relation to the complexity are also open questions. In fact, many attempts to quantify signal complexity have been developed in order to evaluate properties of the set of points composing a time series, usually employing the measure of entropy Shannon (1948); Pincus (1991); Bandt and Pompe (2002); Eroglu et al. (2014), but they do not evaluate time correlations or are parameter dependent.
In this context, the evaluation of the recurrence entropy $S$ Corso et al. (2018) of a signal and our definition of $\max(S)$ prove to be a powerful parameter-free tool to examine time series correlations. We show that the new approach can evaluate short- and long-time correlations, is in good agreement with traditional methods and, going further, provides information about characteristics of the entire set of points of a signal.
The article is organized as follows: The recurrence entropy concept is introduced in section II, section III is devoted to the analyses and discussions of time-correlated and non-correlated stochastic signals; section IV presents results and discussions of the deterministic signal problem; our conclusions and final remarks are shown in section V.
II The recurrence entropy
A visual tool to display the recurrences of a time series $x_i$ of length $K$ is defined as a binary matrix Marwan et al. (2007)
$$R_{ij}(\varepsilon) = \begin{cases} 1, & |x_i - x_j| \le \varepsilon \\ 0, & |x_i - x_j| > \varepsilon \end{cases} \qquad i, j = 1, \dots, K, \qquad (1)$$
where $\varepsilon$ is the vicinity parameter. $R_{ij}$ summarizes visually, in a binary pattern, how many recognizable subsets are embedded in a sequence of data, showing how distinct the recurrence patterns (sequences of zeros and ones) of consecutive points are. The most explored structures of $R_{ij}$ are diagonal lines of “ones”, representing the mutual recurrences of a sequence of points. However, other structures of $R_{ij}$ also have dynamical interpretations: vertical/horizontal lines are associated with stationary points, and an abundance of isolated points is indicative of chaotic or stochastic dynamics Marwan et al. (2007). We generalize these concepts by defining recurrence microstates as all possible cross-recurrence states between two randomly selected short sequences of $N$ consecutive points in a time series of length $K$ ($K \gg N$; only small values of $N$ are used here); that is, the microstates are small $N \times N$ binary matrices. For example, given a time series of $K$ elements and using $N = 2$, we randomly select two sequences of two elements, say $(x_i, x_{i+1})$ and $(x_j, x_{j+1})$. For $N = 2$ the microstates are binary numbers composed of four elements (0 or 1), namely a $2 \times 2$ binary matrix expressing the cross-recurrences between $x_i$ and $x_j$, $x_i$ and $x_{j+1}$, $x_{i+1}$ and $x_j$ and, finally, $x_{i+1}$ and $x_{j+1}$. For a large enough number of randomly selected samples $\bar{N}$, the recurrence entropy can be adequately computed by Corso et al. (2018)
$$S = -\sum_{i=1}^{N^{*}} P_i \ln P_i, \qquad (2)$$
where $P_i = n_i/\bar{N}$ is the probability of occurrence of a specific microstate $i$, estimated from the number of times $n_i$ it is observed among the $\bar{N}$ random samples, and $N^{*}$ is the number of possible microstates. In general, $S$ depends on the parameter $\varepsilon$, as Eq. 1 suggests, but this dependence is eliminated by observing that $S$ vanishes for sufficiently large or small $\varepsilon$, owing to the absence of diversity of microstates in both cases. So we impose the natural condition of a maximum for $S$ Jaynes (1957), turning $\max(S)$ and the corresponding $\varepsilon$ into parameter-free quantities.
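The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the natural logarithm, the default sample count, and the uniform grid used to scan the vicinity parameter (`eps`) are all choices made here for the example.

```python
import math
import random
from collections import Counter


def recurrence_entropy(x, eps, n=2, samples=10000, seed=0):
    """Estimate the recurrence entropy for one vicinity parameter `eps`.

    Randomly samples pairs of n-point windows, builds the n-by-n binary
    cross-recurrence pattern (the microstate) for each pair, and returns
    the Shannon entropy of the observed microstate frequencies.
    """
    rng = random.Random(seed)
    last_start = len(x) - n  # largest valid start index of a window
    counts = Counter()
    for _ in range(samples):
        i = rng.randint(0, last_start)
        j = rng.randint(0, last_start)
        # Flatten the n*n recurrence bits into a hashable tuple.
        state = tuple(
            1 if abs(x[i + a] - x[j + b]) <= eps else 0
            for a in range(n)
            for b in range(n)
        )
        counts[state] += 1
    return -sum((c / samples) * math.log(c / samples) for c in counts.values())


def max_recurrence_entropy(x, n=2, samples=10000, n_eps=40, seed=0):
    """Scan `eps` on a uniform grid (an assumption) and keep the maximum."""
    span = max(x) - min(x)
    best_s, best_eps = -1.0, 0.0
    for k in range(1, n_eps + 1):
        eps = span * k / n_eps
        s = recurrence_entropy(x, eps, n, samples, seed)
        if s > best_s:
            best_s, best_eps = s, eps
    return best_s, best_eps
```

For a very large `eps` every pair of points is recurrent, and for a negative `eps` none is, so in both limits a single microstate is populated and the estimate is zero; the maximum lies in between.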
At first sight, the number of random samples $\bar{N}$ should be larger than the number of all possible microstates, $N^{*} = 2^{N^{2}}$ for $N \times N$ binary microstates, but as observed in Corso et al. (2018) the number of microstates effectively populated is small and the convergence of Eq. 2 is fast. A much smaller number of randomly selected microstates is therefore enough for good results in a large variety of cases and, in particular, for all cases treated here, making the method fast even for moderate microstate sizes $N$.
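To give a rough sense of scale, and assuming $N \times N$ binary microstates as described above, the number of possible microstates and the generic Shannon upper bound on any entropy defined over them can be worked out directly. The sizes 2, 3, and 4 are only illustrative, and the bound is reached only if every microstate is equally likely, so it need not coincide with the value that $\max(S)$ attains for a real signal.

```latex
\begin{align*}
N^{*} &= 2^{N^{2}}, & S &\le \ln N^{*} = N^{2}\ln 2,\\
N = 2:\quad N^{*} &= 2^{4} = 16, & \ln N^{*} &\approx 2.77,\\
N = 3:\quad N^{*} &= 2^{9} = 512, & \ln N^{*} &\approx 6.24,\\
N = 4:\quad N^{*} &= 2^{16} = 65536, & \ln N^{*} &\approx 11.09.
\end{align*}
```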
III Time correlated stochastic signals analyses
First, we consider time series of Gaussian distributed stochastic signals Kasdin (1995), characterized by a $1/f^{\alpha}$ power spectrum. Examples of the mean power spectrum obtained from time series are plotted in Fig. 1(a) for distinct values of $\alpha$. Corresponding individual time series examples are plotted in Figs. 1(b-f).
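One common way to produce such signals is to filter Gaussian white noise with a fractional integrator, whose impulse response follows a simple recursion. The sketch below assumes this time-domain approach; the text does not state which generator was actually used, and the function name, the exponent name `alpha`, and the seed handling are hypothetical. The cost is quadratic in the series length, so it is meant for short illustrative series only.

```python
import random


def power_law_noise(length, alpha, seed=0):
    """Gaussian noise with an approximately 1/f**alpha power spectrum.

    White noise is convolved with the impulse response of the filter
    (1 - z^-1)^(-alpha/2), given by h[0] = 1 and
    h[k] = h[k-1] * (k - 1 + alpha/2) / k.
    """
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(length)]
    h = [1.0]
    for k in range(1, length):
        h.append(h[-1] * (k - 1 + alpha / 2.0) / k)
    # Causal convolution truncated to the first `length` samples.
    return [
        sum(h[k] * white[t - k] for k in range(t + 1))
        for t in range(length)
    ]
```

With `alpha = 0` the filter reduces to the identity and the output is the white noise itself; with `alpha = 2` every coefficient equals one and the output is the running sum of the white noise, that is, a Brownian path.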
All properties of the stochastic signal are kept constant in the following analyses, but larger values of the spectral exponent $\alpha$ impose a finite degree of non-stationarity due to long-term correlations, as observed in Figs. 1(e, f). For such cases, correlation-based methods overestimate short-term and underestimate long-term correlations. Fig. 2(a) depicts the results of $\max(S)$ computed for distinct colored stochastic signals, for several microstate sizes $N$ and time series lengths $K$. In general, $\max(S)$ displays a typical logistic-shaped curve as a function of $\alpha$. For vanishing values of $\alpha$, $\max(S)$ asymptotically approaches its maximum theoretical value, obtained for uncorrelated stochastic signals and infinite time series lengths. Similar results for distinct $N$ show that the variability of $\max(S)$ as a function of $\alpha$ is measurable even for the smallest microstate matrix size. Another important conclusion is that, for a fixed $\alpha$, longer time series lead to smaller values of $\max(S)$, since longer time series provide a better evaluation of long-term time correlations. An error bar analysis indicates that shorter time series, associated with larger $\alpha$ and larger microstate sizes, result in a larger dispersion of $\max(S)$. This behavior reveals the natural dispersion expected for the quantification of long-term correlations when only finite time series are used. The results for the smaller microstate sizes are less sensitive to this natural dispersion, since the number of possible microstates is also smaller, such that tiny changes of the time correlation are not captured. All these features, explored at the same time, bring useful results when signals from unknown sources are analyzed. Fig. 2(b) displays all curves depicted in Fig. 2(a), each normalized by its respective maximum. This data collapse reveals that the shapes of $\max(S)$ for all time series lengths and all microstate sizes are equivalent, despite the small differences and details discussed above.
Another important question in time series characterization concerns the PDF of the signal. To evaluate properties of the entire set of points in a time series, we make use of surrogate data analysis Hair Jr et al. (2010). One of the main and simplest surrogate algorithms consists of shuffling the data, so that the surrogate preserves the amplitude distribution and mean of the original, but any correlation is destroyed, keeping only the collective properties of the set of points. In surrogate data methods, the same analysis is carried out on the original data and on the surrogate data to identify any distinguishable features between them. Traditional methods such as Hurst exponents and detrended fluctuation analysis only quantify the time correlation Beran (2017); Podobnik and Stanley (2008); Simonsen et al. (1998) and are not suitable for surrogate data.
Fig. 3 depicts the results of $\max(S)$ applied to the same data used in Fig. 2, but shuffled into a random sequence (Fisher-Yates algorithm Fisher and Yates (1963)). The results reveal a new feature: even when the sequence of points in the time series is randomly reorganized, distinct stochastic signals lead to distinct values of $\max(S)$. So the behavior of $\max(S)$, in this case, is due only to properties of the set of points of the time series. Despite the fluctuations observed in some cases, the results point to a clear distinction between all our Gaussian PDFs of the time series. We observe that long-term correlations imposed by larger spectral exponents result in a smaller value of $\max(S)$, reflecting a more restrictive and more organized set of points, due to restrictions imposed by the correlation. Time correlations impose a limit on all possible sequences of subsets in the time series, and some combinations will not be allowed.
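The shuffled surrogate used here is straightforward to build. The sketch below is a plain Fisher-Yates shuffle written out explicitly; the function name and the seed argument are assumptions made for the example.

```python
import random


def fisher_yates_shuffle(series, seed=0):
    """Return a shuffled copy of `series` using the Fisher-Yates algorithm.

    The set of values (and hence the amplitude distribution) is preserved
    exactly, while the temporal ordering is randomized.
    """
    rng = random.Random(seed)
    out = list(series)  # work on a copy, leave the input untouched
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform index in [0, i], both ends included
        out[i], out[j] = out[j], out[i]
    return out
```

Because the surrogate contains exactly the same values as the original, any difference between quantifiers computed on the two series can be attributed to the ordering, and any feature that survives the shuffle to the set of points itself.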
To make this point clear, we analyze the results of $\max(S)$ obtained for the time series produced by
$$x_t = \mathrm{shuffle}\left[\sin(\omega t)\right] + \sigma\,\mathrm{rand}, \qquad (3)$$
where the shuffle process follows Fisher and Yates (1963), “rand” is an uncorrelated Gaussian noise and $\sigma$ measures the level of uncorrelated noise superposed on the shuffled harmonic signal. Thus, the stochasticity of this example comes from two sources: the shuffling of the sine signal and the random noise generator. Fig. 4 depicts the results of $\max(S)$ as a function of $\sigma$, with one panel for each of three settings. The black lines in all panels indicate $\max(S)$ obtained for an uncorrelated stochastic signal using the same time series length. In the absence of added noise, the signal is an uncorrelated set of points, but its set of points is very restrictive, namely those points obtained from the sine function. In this case, $\max(S)$ is consistently smaller than the value expected for an uncorrelated stochastic signal. As the noise level increases, $\max(S)$ grows monotonically, pointing to an increasing number of distinct recurrence microstates in the data set, since the stochastic perturbation amplitude is being increased. For sufficiently large noise levels, the uncorrelated stochasticity is large enough to turn the time series into an uncorrelated stochastic time series, and the PDF will also reflect this situation. Consequently, $\max(S)$ asymptotically reaches the expected value for uncorrelated noise. So $\max(S)$ captures progressive increases of the complexity (measured by an increasing number of distinct microstates) imposed on the PDF, even when all analyzed time series are completely uncorrelated.
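A test signal of this kind, a shuffled sine wave with Gaussian noise added on top, can be generated as follows. The period of the sine, the parameter name `noise_level`, and the function name are assumptions made for illustration, since the text does not fix them.

```python
import math
import random


def shuffled_sine_with_noise(length, noise_level, period=50.0, seed=0):
    """Shuffled harmonic signal plus uncorrelated Gaussian noise.

    `period` (in samples) and `noise_level` are illustrative parameters.
    """
    rng = random.Random(seed)
    sine = [math.sin(2.0 * math.pi * t / period) for t in range(length)]
    rng.shuffle(sine)  # destroys the temporal order, keeps the values
    return [s + noise_level * rng.gauss(0.0, 1.0) for s in sine]
```

With `noise_level = 0` the output contains only values of the sine function, in random order, which is the restrictive set of points discussed above; increasing `noise_level` progressively fills in the rest of the real line.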
IV Deterministic signals analyses
To demonstrate the ability of $\max(S)$ to capture distinct characteristics of even more complex PDFs, we analyze time series obtained from the generalized Bernoulli chaotic map
$$x_{n+1} = \beta x_n \pmod 1. \qquad (4)$$
For $\beta > 1$ the level of chaoticity can be evaluated by the Lyapunov exponent $\lambda = \ln \beta$ Alligood et al. (1996). Entropy measures are expected to be related to $\lambda$, but not necessarily directly proportional to it, since the entropy is also a function of the PDF of the attractor (the invariant measure). The invariant measure generated by Eq. 4 is homogeneous for integer $\beta$, but becomes inhomogeneous for non-integer values Góra (2009), due to the discontinuities observed in the PDF; the result is an inhomogeneous measure of the attractor and a correspondingly more complex signal. Figs. 5(a-d) display the PDFs of the Eq. 4 map, showing more complex PDFs for non-integer values of $\beta$ (a-c) that collapse into a homogeneous one for integer $\beta$ (d). The entropy as a function of $\beta$ is therefore governed by two factors, namely the continuously growing chaoticity associated with the parameter $\beta$, superposed on the complex behavior of the PDF. Panel (e) shows the behavior of $\max(S)$ as a function of $\beta$ (blue curve). The shaded tone of blue represents the standard deviation of $\max(S)$ over initial conditions for each $\beta$. To evaluate just the effect of changes in the PDF, the black curve in panel (e) depicts $\max(S)$ computed for shuffled time series (an out-of-scale magnification is also shown). Again, the shaded black tone indicates the standard deviation over initial conditions. In this case, $\max(S)$ depicts a complex oscillatory pattern due to the behavior of the PDF, as discussed. In general, $\max(S)$ grows as a function of $\beta$. However, the growth rate is faster for values of $\beta$ just above an integer, diminishing as $\beta$ approaches the next integer. Such behavior can be explained as follows: for values immediately larger than each integer, the simultaneous increases of chaoticity and of the complexity of the PDF lead to a (local) maximum growth rate of $\max(S)$. As $\beta$ approaches an integer, the complexity diminishes and, consequently, the growth rate of $\max(S)$ also diminishes.
As the Lyapunov exponent increases with the map parameter $\beta$, the chaoticity level increases, in contrast to the complexity of the PDF, which decreases, leading to smaller values of the growth rate of $\max(S)$. For large values of $\beta$, the growth rate can even be negative, since a progressively less complex PDF can outweigh the effect of the increasing chaoticity. The value of $\max(S)$ also reflects specific, more dramatic changes of the PDF, such as the example highlighted by the arrow, where a clear change in the PDF revealed by the shuffled time series analysis (black curve) leads to small local changes in the growth rate of $\max(S)$ (blue curve). In summary, the complex changes of the PDF lead to a rich fine structure in $\max(S)$ computed over the shuffled time series, revealing a strongly nonstationary time series (due to parameter changes in this example).
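The inhomogeneity of the invariant measure for non-integer slopes is easy to see numerically. The sketch below assumes the map has the standard form $x \mapsto \beta x \bmod 1$, for which the derivative equals $\beta$ everywhere and the Lyapunov exponent is therefore $\ln \beta$; the function names, the initial condition, and the transient length are illustrative choices. Note that in double-precision arithmetic an even integer slope such as 2 shifts the binary digits out and drives the orbit to zero, so the example uses a non-integer slope.

```python
import math


def bernoulli_map_series(beta, length, x0=0.1234, transient=1000):
    """Iterate x -> beta * x (mod 1) and return `length` points after a transient."""
    x = x0
    out = []
    for step in range(transient + length):
        x = (beta * x) % 1.0
        if step >= transient:
            out.append(x)
    return out


def lyapunov_exponent(beta):
    """Lyapunov exponent of the map: the slope is beta at every point."""
    return math.log(beta)


def histogram(series, bins=20):
    """Normalized histogram on [0, 1) as a crude estimate of the PDF."""
    counts = [0] * bins
    for v in series:
        counts[min(int(v * bins), bins - 1)] += 1
    return [c / len(series) for c in counts]
```

For a slope of 1.5, for instance, the histogram puts visibly more weight on the lower half of the unit interval than on the upper half, a simple signature of the discontinuous, inhomogeneous PDF discussed above.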
V Conclusions
In conclusion, we have shown that $\max(S)$ is a parameter-free quantifier that, in addition to quantifying the time correlation of stochastic and chaotic signals, goes further and evaluates subtle properties of the PDF of a signal, which can be computed using a simple shuffling of the points. When time correlation is evaluated, $\max(S)$ yields results similar to those obtained with more traditional but parameter-dependent quantifiers, such as the Hurst exponent. However, the use of $\max(S)$ makes clear a more complex interrelation between properties of the PDF and the complexity of the time series, bringing a new perspective to stochastic and chaotic data analyses. Our results can be useful in the analysis of experimental noisy data, such as seismic, paleontological, and economic data, where the possibility of evaluating properties of the entire data set together with the quantification of time correlation is important.
We have analyzed stochastic and deterministic signals. For both cases we conclude that the new method identifies and quantifies a new cause/effect relation, in which changes occurring in the time series PDF can be related directly to variations of complex behavior, including the possibility of displaying short- and long-time correlations. Finally, it is worth mentioning that, due to its computation methodology Corso et al. (2018), recurrence entropy can be evaluated quickly for arbitrarily long real-world time series, leading to a robust parameter-free way to process data.
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001 and through project number 88881.119252/2016-01, Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq - Brazil, grant number 302785/2017-5, and Financiadora de Estudos e Projetos (FINEP).
References
- [1] Beran (2017): J. Beran, Statistics for long-memory processes (Routledge, 2017).
- [2] Ziemelis (2001): K. Ziemelis, Nature 410, 241 (2001).
- [3] Albert and Barabási (2002): R. Albert and A.-L. Barabási, Reviews of Modern Physics 74, 47 (2002).
- [4] Shannon (1948): C. E. Shannon, Bell System Tech. J. 27, 218 (1948).
- [5] Kantz and Schreiber (2004): H. Kantz and T. Schreiber, Nonlinear time series analysis, Vol. 7 (Cambridge University Press, 2004).
- [6] Corso et al. (2018): G. Corso, T. d. L. Prado, G. Z. d. S. Lima, J. Kurths, and S. R. Lopes, Chaos: An Interdisciplinary Journal of Nonlinear Science 28, 083108 (2018).
- [7] Akaike (1974): H. Akaike, IEEE Transactions on Automatic Control 19, 716 (1974).
- [8] Bak et al. (1987): P. Bak, C. Tang, and K. Wiesenfeld, Physical Review Letters 59, 381 (1987).
