Permutation entropy revisited

Stuart J Watt; Antonio Politi

arXiv:1812.05075·nlin.CD·February 20, 2019


TL;DR

This paper introduces a generalized permutation entropy measure for time-series analysis that depends on two window lengths, providing insights into the structure of invariant measures and aiding in estimating Kolmogorov-Sinai entropy.

Contribution

It extends permutation entropy by incorporating a second window parameter, enabling a more detailed analysis of time-series dynamics and invariant measure structures.

Findings

1. The $w$-dependence reveals invariant measure structure.
2. The $L$-dependence aids in estimating Kolmogorov-Sinai entropy.
3. Partition structure becomes elongated with increasing $w$.

Abstract

Time-series analysis in terms of ordinal patterns is revisited by introducing a generalized permutation entropy $H_{p}(w,L)$, which depends on two different window lengths: $w$, implicitly defining the resolution of the underlying partition; $L$, playing the role of an embedding dimension, analogously to standard nonlinear time-series analysis. The $w$-dependence provides information on the structure of the corresponding invariant measure, while the $L$-dependence helps determine the Kolmogorov-Sinai entropy. We finally investigate the structure of the partition with the help of principal component analysis, finding that, upon increasing $w$, the individual atoms become increasingly elongated.

Equations

$h_{KS}=\lim_{\varepsilon\to 0}\lim_{L\to\infty}\frac{H_{KS}(L)}{L}$

$H_{p}(w,L)=-\sum_{i}p_{i}\log p_{i}$

$\Delta H_{p}(w,L)=H_{p}(w,L)-H_{p}(w,L-1)$

$\delta H_{p}(L)=H_{p}(L,L)-H_{p}(L-1,L-1)$

$q_{n+1}(s_{j})=\sum_{i}M_{ji}\,q_{n}(s_{i})$

$K=-\sum_{i}q(s_{i})\log q(s_{i})=\Delta H_{p}^{(L)}(L,L+1)$

$R^{2}(w,L)=\left\langle\frac{\lambda_{1}(w,L)}{\lambda_{2}(w,L)}\right\rangle$


Full text


Institute of Pure and Applied Mathematics, University of Aberdeen, Aberdeen, UK


time series, entropy, embedding, complexity, PCA, fractal dimension

I Introduction

The development of effective procedures to encode irregular time series is an important research topic, tightly related to the compression of information or, equivalently, to the identification and removal of irrelevant details within given signals. Powerful tools have been developed for the case where the underlying model is known and low-dimensional. The state of the art is (unsurprisingly) much less satisfactory when either prior knowledge is not available or the dynamics is high-dimensional. The reason can be traced back to the difficulty of explicitly partitioning the phase space into non-overlapping cells (atoms).

The approach proposed by Bandt and Pompe bandt02a is the most powerful, if not the only, zero-knowledge method that can be effectively implemented above dimension two. Chunks of trajectories of length $L$ (“windows”, as we refer to them from now on) are encoded according to the corresponding ordinal pattern (see the next section for a precise definition). The so-called permutation entropy $H_{p}$ is then determined from the probabilities of the different ordinal patterns. In this context, a partition atom corresponds to the smallest box which contains all trajectories encoded with the same ordinal pattern. The simplicity of the procedure has led to many applications in different fields, ranging from engineering to medicine cao04 ; weck15 ; masoller15 .

An additional reason to work with $H_{p}$ is its relationship with the Kolmogorov-Sinai entropy $H_{KS}$, the most important indicator of dynamical complexity sinai09 . $H_{KS}$ is a dynamical invariant, independent of the parametrization adopted to describe the underlying evolution. $H_{p}$ is expected to coincide with $H_{KS}$ for sufficiently long windows, although the convergence is typically rather slow. Recently, it has been understood that the large deviations of $H_{p}$ from $H_{KS}$ are “finite-size” effects associated with the window-length dependence of the partition induced by the ordinal encoding. These deviations can be substantially eliminated by introducing an effective permutation entropy $\tilde{H}_{p}=H_{p}+D\overline{\ln\sigma}$, where $\sigma$ is the spread among trajectories characterized by the same pattern and $D$ is the dimension of the underlying attractor. $\tilde{H}_{p}$ turns out to be a very accurate proxy of $H_{KS}$ politi17 .

In this paper, we revisit the concept of permutation entropy by introducing the dependence of $H_{p}$ on the window length $w$ used to encode the underlying trajectory, while $L$ is still used to determine the entropy growth rate. Explicit calculations of the (average) partition size confirm the intuition that the size is controlled by $w$. This new approach reduces the finite-size effects that affect the standard $H_{p}$, without the need to determine the spread itself. The spread is nevertheless investigated, with the goal of characterizing the way the phase space is filled by the observed time series. This is done with the help of principal component analysis, by studying the scaling properties of the eigenvalues of the correlation matrix.

The paper is organized as follows. The general formalism is introduced in Section II. Section III is devoted to the implementation of the two-length entropy, while in Section IV we discuss the spread of the trajectories encoded by the same symbolic sequence. Finally, in Section V, we briefly discuss possible future directions.

II Formalism

Given the generic time series $(x_{1},x_{2},\ldots,x_{n})$ (we assume it to have been properly sampled; see Ref. kantz04 for a discussion), a meaningful characterization requires three steps: (i) the time series must be embedded into a suitable phase space; (ii) the corresponding space has to be properly partitioned into non-overlapping cells; (iii) the information contained in the resulting symbolic sequences is computed for different lengths.

The first step is typically tackled by building an $L$-dimensional space, made of the $L$-tuples $(u_{1},\ldots,u_{L})=(x_{m},x_{m+1},\ldots,x_{m+L-1})$. Takens' theorem ensures that the underlying attractor is correctly reproduced, provided that $L$ is large enough takens81 .
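As a minimal illustration of this first step, the Python sketch below builds the unit-delay $L$-tuples from a scalar series. The helper name `delay_windows` is our own (hypothetical), and a delay of one sampling step is assumed, matching the definition above.

```python
def delay_windows(series, L):
    """Return every overlapping L-tuple (x_m, x_{m+1}, ..., x_{m+L-1}) of a scalar series."""
    if L < 1:
        raise ValueError("L must be a positive integer")
    # a series shorter than L yields no windows
    return [tuple(series[m:m + L]) for m in range(len(series) - L + 1)]


print(delay_windows([0.1, 0.5, 0.2, 0.9, 0.3], L=3))
# [(0.1, 0.5, 0.2), (0.5, 0.2, 0.9), (0.2, 0.9, 0.3)]
```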

Once the window length $L$ has been set, the next step consists in partitioning the embedding space into cells of size $\varepsilon$ , so that the time series can be encoded as a sequence of symbols, each symbol corresponding to a different cell. The Kolmogorov-Sinai entropy rate $h_{KS}$ is then formally obtained as

$$h_{KS}=\lim_{\varepsilon\to 0}\lim_{L\to\infty}\frac{H_{KS}(L)}{L},$$

where the limit $\varepsilon\to 0$ is taken to ensure that the encoding is one-to-one, i.e. to prevent any two different, infinitely long trajectories from being encoded in the same way eckmann85 . If the partition is generating, the limit $\varepsilon\to 0$ is not needed. In general, however, there is no guarantee that a given partition is generating, and the special approaches developed to construct generating partitions work only in two dimensions grassberger85 ; christiansen97 .

In the context of permutation entropy, the $L$ -tuple $(u_{1},u_{2},\ldots,u_{L})$ is encoded as $S=(s_{1},s_{2},\ldots,s_{L})$ , where $s_{k}$ is the ordinal position of $u_{k}$ within the $L$ -tuple. For instance, the quadruplet $(1.3,6.1,2.5,0.7)$ is encoded as $S=(2,4,3,1)$ , meaning that the first element is the second smallest value, and so on. Accordingly, the phase space is automatically partitioned into cells, each containing all $L$ -tuples encoded in the same way. The cell size $\varepsilon$ is nothing but the spread among sequences encoded in the same way; the spread depends on the symbolic sequence.
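The encoding rule can be sketched in a few lines of Python. `ordinal_pattern` is a hypothetical helper of ours; ties, which the construction above does not consider, are broken here by position, which is an assumption.

```python
def ordinal_pattern(window):
    """Rank of each element inside the tuple, 1 = smallest (the encoding S above).

    Equal values are ranked by position, a tie-breaking rule assumed here.
    """
    # indices sorted by value, so order[0] is the position of the smallest element
    order = sorted(range(len(window)), key=lambda k: (window[k], k))
    ranks = [0] * len(window)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return tuple(ranks)


print(ordinal_pattern((1.3, 6.1, 2.5, 0.7)))  # (2, 4, 3, 1)
```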

We now illustrate the process with reference to the Hénon map, written in delay form as $x_{n+1}=a-x_{n}^{2}+bx_{n-1}$, for the standard parameter values $a=1.4$ and $b=0.3$. In this case, the embedding dimension $L=2$ suffices to reproduce the behavior of the dynamical system. We consider $L=6$ and project the partition onto a two-dimensional space: more precisely, for each 6-tuple obtained by iterating the Hénon map, we plot its last two coordinates.

The results are presented in Fig. 1. In the left panel we provide the standard representation of the Hénon attractor; in the right panel we plot the points belonging to 10 of the 63 symbolic sequences obtained by iterating the map (notice that the maximum possible number of different sequences is, in principle, $6!=720$). The figure shows a large diversity of cell structures; some cells are very thin and quite elongated. There is also a large diversity in the corresponding frequencies, which are only vaguely proportional to the cell size.
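The following sketch counts the distinct ordinal patterns of length $L=6$ along a Hénon trajectory generated in the delay form $x_{n+1}=a-x_{n}^{2}+bx_{n-1}$. The initial condition $(0,0)$, the transient length and the series length are arbitrary choices of ours, and the helper names are hypothetical, so the printed count need not coincide with the 63 sequences reported above; it is only expected to fall well below $6!=720$.

```python
import math


def henon_series(n, a=1.4, b=0.3, transient=1000):
    """Delay form of the Henon map, x_{n+1} = a - x_n**2 + b*x_{n-1}, started from (0, 0)."""
    x_prev, x = 0.0, 0.0
    out = []
    for step in range(n + transient):
        x_prev, x = x, a - x * x + b * x_prev
        if step >= transient:  # discard the approach to the attractor
            out.append(x)
    return out


def ordinal_pattern(window):
    """Ranks (1 = smallest) of the elements of a tuple; ties broken by position."""
    order = sorted(range(len(window)), key=lambda k: (window[k], k))
    ranks = [0] * len(window)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return tuple(ranks)


def distinct_patterns(series, L):
    """Set of ordinal patterns observed over all overlapping L-windows."""
    return {ordinal_pattern(series[m:m + L]) for m in range(len(series) - L + 1)}


x = henon_series(100_000)
patterns = distinct_patterns(x, L=6)
print(len(patterns), "observed out of", math.factorial(6), "possible")
```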

The beauty and, at the same time, the limitation of permutation entropy is that $\varepsilon$ depends on $L$ (specifically, $\varepsilon$ decreases as $L$ increases). As a result, it is sufficient to take the limit $L\to\infty$, since it automatically implies $\varepsilon\to 0$. The relationship between $L$ and $\varepsilon$ is advantageous when a quick analysis is required, since one has to deal with only one scaling parameter.

On the other hand, the dependence of $H_{p}$ on $L$ induces a dependence on $\varepsilon$ as well. These finite-size corrections eventually vanish (in the limit $L\to\infty$ ), but are typically non-negligible for the numerically accessible $L$ values. Moreover, the relationship between $L$ and $\varepsilon$ might represent a hindrance whenever there is no actual need to increase the spatial resolution, while it would instead be worth considering longer temporal windows.

In this paper, we revisit the definition of $H_{p}$ by introducing a second length, $w\leq L$, used to encode the signal; this way one can control the resolution $\varepsilon$ independently of $L$.

III Two-length approach

Given the $L$-tuple $(u_{1},u_{2},\ldots,u_{L})$, we start by encoding the first $w\leq L$ elements $(u_{1},u_{2},\ldots,u_{w})$ according to their ordinal pattern, as in the standard implementation of permutation entropy. We then encode each following element $u_{m}$, up to $m=L$, according to its ordinal position within the window $(u_{m-w+1},u_{m-w+2},\ldots,u_{m})$. Since each of the $L-w$ subsequent elements can occupy one of $w$ positions, the maximum number of symbolic sequences of length $L$ for a given pair $(w,L)$ is $w!\,w^{L-w}$, a number much smaller than the number $L!$ allowed by the standard approach (when $w\ll L$). This is an advantage whenever a given $w$ value provides a high-enough resolution to ensure a meaningful encoding.
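A minimal sketch of the two-length encoding, with hypothetical helper names (`rank_of_last`, `two_length_code`); ties are again broken arbitrarily, which is an assumption. Enumerating all orderings of five distinct values checks the count $w!\,w^{L-w}$ for $w=3$, $L=5$: the first three elements contribute $3!=6$ patterns, and each of the two later elements one of $w=3$ ranks.

```python
import math
from itertools import permutations


def rank_of_last(window):
    """1-based rank of the last element within the window (1 = smallest)."""
    last = window[-1]
    return 1 + sum(1 for v in window[:-1] if v < last)


def ordinal_pattern(window):
    """Ranks (1 = smallest) of the elements of a tuple; ties broken by position."""
    order = sorted(range(len(window)), key=lambda k: (window[k], k))
    ranks = [0] * len(window)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return tuple(ranks)


def two_length_code(u, w):
    """Encode an L-tuple u with ordinal windows of length w <= L.

    The first w elements get their full ordinal pattern; each later element
    u_m (m = w+1..L, 1-based) gets its rank inside (u_{m-w+1}, ..., u_m).
    """
    L = len(u)
    if not 1 <= w <= L:
        raise ValueError("need 1 <= w <= L")
    head = ordinal_pattern(u[:w])
    # 0-based index m covers elements w+1..L; u[m-w+1:m+1] is the length-w window ending at u[m]
    tail = tuple(rank_of_last(u[m - w + 1:m + 1]) for m in range(w, L))
    return head + tail


codes = {two_length_code(p, w=3) for p in permutations(range(5))}
print(len(codes), math.factorial(3) * 3 ** (5 - 3))  # 54 54
```

With $w=L$ the tail is empty and the code reduces to the standard ordinal pattern.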

Let us now denote by $p_{i}(w,L)$ the probability (relative frequency) of the symbolic sequence $s_{i}$ of length $L$, computed using an ordinal pattern of length $w$. The corresponding generalized permutation entropy is defined as

$$H_{p}(w,L)=-\sum_{i}p_{i}\log p_{i}.$$

$H_{p}(L,L)$ coincides with the standard permutation entropy introduced by Bandt and Pompe bandt02a . The incremental entropy

$$\Delta H_{p}(w,L)=H_{p}(w,L)-H_{p}(w,L-1)$$

is the variation of the information required to characterize the time series when the window length is increased by one unit for a fixed partition structure (here and in the following, we assume that the sampling time $T$ is one unit; otherwise, the right-hand side should be divided by $T$). $\Delta H_{p}(w,L)$ generalizes the formula

$$\delta H_{p}(L)=H_{p}(L,L)-H_{p}(L-1,L-1)$$

used in the context of the standard definition of permutation entropy.
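The three entropies can be estimated from relative frequencies as sketched below, using natural logarithms so that values are in nats, the units in which $\lambda_{1}=0.4169$ is quoted. This is a plain plug-in estimator without bias correction; the function names are ours, the encoding helpers are included so that the block runs on its own, and the Hénon series uses an arbitrary initial condition $(0,0)$ and transient.

```python
import math
from collections import Counter


def ordinal_pattern(window):
    """Ranks (1 = smallest) of the elements of a tuple; ties broken by position."""
    order = sorted(range(len(window)), key=lambda k: (window[k], k))
    ranks = [0] * len(window)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return tuple(ranks)


def two_length_code(u, w):
    """Full ordinal pattern of u[:w], then the rank of each later element among the last w values."""
    head = ordinal_pattern(u[:w])
    tail = tuple(1 + sum(v < u[m] for v in u[m - w + 1:m]) for m in range(w, len(u)))
    return head + tail


def perm_entropy(series, w, L):
    """Plug-in estimate of H_p(w, L) = -sum_i p_i ln p_i over all overlapping L-windows."""
    counts = Counter(two_length_code(series[m:m + L], w)
                     for m in range(len(series) - L + 1))
    n = sum(counts.values())
    return -sum(c / n * math.log(c / n) for c in counts.values())


def delta_H(series, w, L):
    """Incremental entropy H_p(w, L) - H_p(w, L-1); requires w <= L-1."""
    return perm_entropy(series, w, L) - perm_entropy(series, w, L - 1)


def small_delta_H(series, L):
    """Standard increment H_p(L, L) - H_p(L-1, L-1)."""
    return perm_entropy(series, L, L) - perm_entropy(series, L - 1, L - 1)


def henon_series(n, a=1.4, b=0.3, transient=1000):
    """Delay form of the Henon map, x_{n+1} = a - x_n**2 + b*x_{n-1}."""
    x_prev, x = 0.0, 0.0
    out = []
    for step in range(n + transient):
        x_prev, x = x, a - x * x + b * x_prev
        if step >= transient:
            out.append(x)
    return out


x = henon_series(20_000)
for L in range(4, 8):
    print(L, round(delta_H(x, L - 1, L), 4), round(small_delta_H(x, L), 4))
```

Here $w=L-1$ is an arbitrary choice for illustration; comparing the two printed columns with $\lambda_{1}=0.4169$ mirrors the comparison of Fig. 2 without claiming to reproduce it.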

In Fig. 2, we compare the two quantities with reference to the Hénon map. There, we see that for increasing $L$ (and $w$ ), $\Delta H_{p}$ converges faster than $\delta H_{p}$ to $h_{KS}$ , which coincides, in this case, with the positive Lyapunov exponent of the map, $\lambda_{1}=0.4169$ .

$\Delta H_{p}(w,L)$ performs better than $\delta H_{p}$ , since it corresponds to a Markov process (of order $L-w$ ), while $\delta H_{p}$ is a hybrid observable, being the difference between two terms, $H_{p}(L,L)$ and $H_{p}(L-1,L-1)$ , which refer to different partitions and thereby to a different symbolic encoding.

Researchers who do not want to implement the full two-length approach can obtain a genuine and correct first-order Markov approximation as follows. Let $M_{ji}=p(s_{j}|s_{i})$ denote the conditional probability of observing the sequence $s_{j}$ after shifting the $L$-tuple (encoded by $s_{i}$) forward by one time unit. $M_{ji}$ can be easily estimated as the fraction of observed $i\to j$ transitions among all transitions out of $s_{i}$.

Let us then introduce the recursive relation

$$q_{n+1}(s_{j})=\sum_{i}M_{ji}\,q_{n}(s_{i}),$$

where $q_{n}$ is a vector of probabilities (i.e., with non-negative entries summing to 1). If the underlying dynamics were a memory-1 Markov process, the empirically determined pattern probabilities would be a fixed point of the above relation. In general, this is not the case. One can, nevertheless, iterate the above equation (starting from a generic initial condition) until a fixed point is reached, i.e. a vector $q(s_{j})$ that is left invariant by the above transformation.

The corresponding entropy

$$K=-\sum_{i}q(s_{i})\log q(s_{i})$$

coincides by construction with the first order Markov approximation $\Delta H_{p}(L-1,L)=H_{p}(L-1,L)-H_{p}(L-1,L-1)$ of the permutation entropy.
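A sketch of this Markov shortcut, assuming the input is the sequence of symbols (for instance, the ordinal patterns of consecutive $L$-windows). `transition_matrix`, `stationary_vector` and `entropy` are our own hypothetical helpers. The fixed point is sought by plain power iteration from a uniform start, renormalized at each step (a safeguard of ours for states with no observed successor); convergence is not guaranteed for periodic chains, hence the iteration cap. The last helper evaluates $-\sum_{i}q(s_{i})\ln q(s_{i})$ on the fixed point, following the definition of $K$.

```python
import math
from collections import Counter


def transition_matrix(symbols):
    """Estimate M[j][i] = p(s_j | s_i) from consecutive pairs in a symbol sequence."""
    states = sorted(set(symbols))
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    pair_counts = Counter(zip(symbols, symbols[1:]))
    out_counts = Counter(symbols[:-1])  # number of transitions leaving each state
    M = [[0.0] * n for _ in range(n)]
    for (si, sj), c in pair_counts.items():
        M[index[sj]][index[si]] = c / out_counts[si]
    return states, M


def stationary_vector(M, tol=1e-13, max_iter=100_000):
    """Iterate q_{n+1}(s_j) = sum_i M[j][i] q_n(s_i) from a uniform start until it stops changing."""
    n = len(M)
    q = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(M[j][i] * q[i] for i in range(n)) for j in range(n)]
        norm = sum(new)
        new = [v / norm for v in new]  # keep q a probability vector
        if max(abs(a - b) for a, b in zip(new, q)) < tol:
            return new
        q = new
    raise RuntimeError("no fixed point reached; the chain may be periodic")


def entropy(q):
    """-sum_i q_i ln q_i, skipping zero entries."""
    return -sum(p * math.log(p) for p in q if p > 0)


states, M = transition_matrix([0, 0, 1, 0, 0, 1, 0])
q = stationary_vector(M)
print([round(v, 6) for v in q], round(entropy(q), 6))  # [0.666667, 0.333333] 0.636514
```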

We conclude this section by discussing the dependence of $H_{p}(w,L)$ on $w$ for fixed $L$. Since $L$ is kept constant, we always refer to the same embedding dimension $L$, and the variation of the entropy is due to the refinement of the partition implicitly induced by $w$. In other words, the entropy variation is the kind of observable that is computed when a fractal dimension is determined within a given embedding space kantz04 .

In order to give direct evidence of this dependence, we have estimated the spread $\varepsilon$ associated with each symbolic sequence by computing the standard deviation of the last variable in the corresponding $L$-tuple (in other words, we have followed the same strategy adopted in Ref. politi17 ). The logarithm of the spread has then been averaged over all symbolic sequences for given values of $w$ and $L$. The variation of $H_{p}(w,L)$ with $w$ is plotted in Fig. 3 for $L=14$, where, instead of $w$ itself, we use $\langle\varepsilon\rangle(w,L)$ as the independent variable. The entropy increases as $\varepsilon$ decreases, as expected, since upon increasing $w$ the resolution used to partition a space of dimension $L$ increases as well. A fractal structure would imply a linear dependence of the entropy on $\ln\varepsilon$, as indeed seen in Fig. 3, where the slope (from a fit over the largest $w$-values, i.e. the smallest $\varepsilon$-values) gives an exponent approximately equal to 1.5, relatively close to, but different from, the fractal dimension of the Hénon map, $D=1.26$.
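The spread estimate can be sketched as follows. The choices here are our assumptions: the population standard deviation is used, the average of $\ln\varepsilon$ is unweighted over symbolic sequences, and sequences seen fewer than twice (or with zero spread) are skipped. The helper `slope` is an ordinary least-squares fit; the exponent discussed above corresponds to minus the slope of $H_{p}(w,L)$ against the averaged $\ln\varepsilon$, restricted to the largest $w$ values. All names are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def mean_log_spread(windows, codes):
    """Unweighted average over symbolic sequences of ln(std of the last tuple component)."""
    last_values = defaultdict(list)
    for u, s in zip(windows, codes):
        last_values[s].append(u[-1])
    logs = []
    for values in last_values.values():
        if len(values) < 2:
            continue  # a single point carries no spread information
        sigma = statistics.pstdev(values)
        if sigma > 0:
            logs.append(math.log(sigma))
    return sum(logs) / len(logs)


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# toy data: sequence "a" has spread 1, sequence "b" has spread e
windows = [(0.0, 0.0), (0.0, 2.0), (1.0, 0.0), (1.0, 2.0 * math.e)]
codes = ["a", "a", "b", "b"]
print(round(mean_log_spread(windows, codes), 6))  # 0.5
```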

We suspect that the quantitative difference is to be attributed to the fact that the cells induced by the ordinal patterns are not isotropic (i.e. characterized by a single linear size), as implicitly assumed in the definition of the fractal dimension. We elaborate more on this point in the next section.

IV Partition structure

In the previous section we have shown that it is possible to improve the characterization of a complex time-series by generalizing the encoding strategy and including the spread among equally-coded $L$ -tuples into the analysis.

In this section we analyse the distribution of points within each partition atom with the help of principal component analysis (PCA), also known as orthogonal decomposition broomhead86 . PCA is a linear tool and, as such, cannot provide an accurate representation of an invariant measure distributed over a nonlinear manifold. Nevertheless, if the analysis is restricted to tiny regions, such as the atoms of the partition, the nonlinear effects are relatively small and the outcome more meaningful. This approach has already been implemented in past studies of the fractal dimension of high-dimensional systems politi92 , with reference to a predetermined homogeneous partition. Here we consider the atoms induced by the ordinal representation of the Hénon map, for $w=L=6$. PCA consists of first computing the covariance matrix $C_{ij}=\langle u_{i}u_{j}\rangle-\langle u_{i}\rangle\langle u_{j}\rangle$, where $u_{i}$ denotes the $i$th component of an $L$-tuple and the average is performed over all points lying within the same cell (i.e. encoded in the same way). The resulting eigenvalues $\mu_{k}$ represent the variance of the distribution along the so-called principal axes (the eigenvalues are ordered from the largest to the smallest). Given this information, we further average the logarithm of $\mu_{k}$ for each given $k$ over all cells (more precisely, we consider the $70\%$ most populated cells, to avoid including $\mu_{k}$-values from poorly populated ones). The outcome is presented in Fig. 4 on a logarithmic scale (see the black solid curve at the bottom of the figure).
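A minimal pure-Python sketch of the per-cell PCA (no linear-algebra library assumed), using a cyclic Jacobi eigenvalue routine for the symmetric covariance matrix. `windows` are the $L$-tuples and `codes` their symbolic sequences; all helper names are hypothetical. Our reading of the "70% most populated" rule (70% of the cells, ranked by population) is an assumption, as is skipping cells with a non-positive eigenvalue.

```python
import math
from collections import defaultdict


def covariance(points):
    """Covariance matrix <u_i u_j> - <u_i><u_j>, computed in two passes (mean first)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(d)] for i in range(d)]


def jacobi_eigenvalues(a, sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [list(row) for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        total = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
        if off <= 1e-26 * total:  # off-diagonal part negligible: done
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # columns p and q of A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q of J^T (A J)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def mean_log_eigenvalues(windows, codes, keep_fraction=0.7):
    """Average ln(mu_k), k = 1..L, over the most populated cells."""
    cells = defaultdict(list)
    for u, s in zip(windows, codes):
        cells[s].append(u)
    ranked = sorted(cells.values(), key=len, reverse=True)
    kept = ranked[:max(1, int(keep_fraction * len(ranked)))]
    logs = []
    for pts in kept:
        mu = jacobi_eigenvalues(covariance(pts))
        if len(pts) > 1 and min(mu) > 0:
            logs.append([math.log(m) for m in mu])
    return [sum(col) / len(logs) for col in zip(*logs)]


windows = [(0, 0), (2, 0), (0, 2), (2, 2), (0, 0), (1, 1)]
codes = ["a", "a", "a", "a", "b", "b"]
print(mean_log_eigenvalues(windows, codes))  # only cell "a" is kept: [0.0, 0.0]
```

Where NumPy is available, `numpy.linalg.eigvalsh` could replace the hand-written Jacobi routine.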

If one could neglect the curved nonlinear structure of the underlying attractor, only two eigenvalues should be different from zero (due to the two-dimensional nature of the Hénon map), while the remaining four eigenvalues should strictly vanish. Any deviation from zero of the third to sixth eigenvalue is therefore a manifestation of nonlinear effects over the scale of the cell size. In practice we see that all six eigenvalues are different from zero although their amplitude decreases very rapidly with the index $k$ (see the bottom solid curve).

In order to interpret this outcome, we turn our attention to a simple case that can be handled analytically. We consider a single cell in a three-dimensional space (i.e. we assume $L=3$), filled by statistically independent triplets. Each triplet is generated by iterating the recursive relation $x_{n+1}=x_{n}+x_{n}^{2}$ twice, starting from a randomly chosen initial condition $x_{1}$, uniformly distributed within the interval $\left[-\Delta,\Delta\right]$. Averages are then performed over different choices of $x_{1}$ (rather than being time averages). The resulting triplets are by definition aligned along a one-dimensional pseudo-parabolic curve. The elements of the covariance matrix $C_{ij}$ can be determined analytically by performing suitable integrals, and one can also obtain analytical expressions for the three eigenvalues. Rather than reporting the resulting cumbersome expressions, we plot the $\mu$ values in Fig. 5 for different $\Delta$ values on doubly logarithmic scales (see full circles, crosses and triangles). Additionally, we superpose the expected scaling behavior obtained from a perturbative calculation, which yields $\mu_{1}=\Delta^{2}$, $\mu_{2}=8\Delta^{4}/45$, and $\mu_{3}\approx 8\Delta^{6}/525$; these agree very well with the numerical results.
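The perturbative predictions can be checked numerically with the sketch below. Instead of performing the integrals analytically, it replaces the average over uniformly distributed $x_{1}$ by a midpoint grid of $n$ points (our choice), then computes the three eigenvalues with a cyclic Jacobi routine; helper names are hypothetical. The printed ratios to the quoted formulas are expected to approach 1 as $\Delta$ shrinks, if the expansion holds.

```python
import math


def covariance(points):
    """Covariance matrix <u_i u_j> - <u_i><u_j>, computed in two passes (mean first)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(d)] for i in range(d)]


def jacobi_eigenvalues(a, sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [list(row) for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        total = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
        if off <= 1e-26 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # columns p and q of A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q of J^T (A J)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def toy_triplets(delta, n=2000):
    """Triplets (x1, x2, x3), x_{k+1} = x_k + x_k**2, x1 on a midpoint grid in [-delta, delta]."""
    points = []
    for k in range(n):
        x1 = -delta + (k + 0.5) * 2.0 * delta / n
        x2 = x1 + x1 * x1
        x3 = x2 + x2 * x2
        points.append((x1, x2, x3))
    return points


for delta in (0.1, 0.03, 0.01):
    mu = jacobi_eigenvalues(covariance(toy_triplets(delta)))
    # ratios to the perturbative predictions quoted in the text
    print(delta, mu[0] / delta**2, mu[1] / (8 * delta**4 / 45), mu[2] / (8 * delta**6 / 525))
```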

In practice, the (quadratic) nonlinearity of the initial set of points induces nonzero eigenvalues besides the first one. Interestingly, the higher the order $k$ of the eigenvalue, the smaller its size: since $\mu_{k}$ scales as $\Delta^{2k}$, the eigenvalues decrease exponentially with $k$, the decay rate of $\ln\mu_{k}$ being approximately $2|\ln\Delta|$ (actually, it might be even larger, because of the multiplicative contribution of the prefactors). In other words, in the presence of weak nonlinearities (i.e. small $\Delta$), PCA acts as a sort of perturbative expansion, the eigenvalues being probes which detect nonlinearities of increasing order.

Returning to the Hénon map, it is reasonable to interpret the pseudo-exponential behavior of the eigenvalues reported in Fig. 4 as a manifestation of the nonlinear structure of the two-dimensional manifold containing the Hénon attractor. Interpretative doubts, however, persist about the values of the first two eigenvalues, which both correspond to directions actually spanned by the invariant measure. In order to partially clarify this point, we have computed

$$R^{2}(w,L)=\left\langle\frac{\mu_{1}(w,L)}{\mu_{2}(w,L)}\right\rangle.$$

Since $\mu_{1}\geq\mu_{2}$, $R(w,L)$ is by definition at least 1; it measures the degree of anisotropy of the cells induced by the ordinal patterns. In Fig. 3, we plot $R(w,L)$ for the same $w$ and $L$ values used in the computation of $H_{p}$ (see triangles). Its divergence for $\varepsilon\to 0$ shows that the cells become increasingly elongated. We suspect that this might be the origin of the overestimation of the fractal dimension. A more quantitative analysis is, however, required to relate the anisotropy of the covering to the scaling behavior of the corresponding entropy.
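Given per-cell PCA eigenvalues sorted from largest to smallest, $R$ is the square root of the cell-averaged ratio $\mu_{1}/\mu_{2}$. A minimal sketch; the name `anisotropy_R` is hypothetical, and skipping cells with $\mu_{2}\leq 0$ is our assumption.

```python
import math


def anisotropy_R(eigenvalues_per_cell):
    """R(w, L) from R^2 = <mu_1 / mu_2>, averaging over cells.

    Each entry holds one cell's eigenvalues sorted from largest to smallest.
    """
    ratios = [mu[0] / mu[1] for mu in eigenvalues_per_cell if mu[1] > 0]
    return math.sqrt(sum(ratios) / len(ratios))


print(round(anisotropy_R([[4.0, 1.0, 0.1], [16.0, 1.0, 0.2]]), 4))  # 3.1623
```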

We finally explore the role of observational noise. In Fig. 4, we report the six eigenvalues for increasing levels of noise. On the one hand, the noise has an obvious implication: it induces a saturation of the exponential-like decrease, the smallest eigenvalue approximately scaling as $\Delta^{2}$, where $\Delta$ now denotes the noise amplitude. On the other hand, we see a counterintuitive phenomenon: the average leading eigenvalue decreases upon increasing $\Delta$. This effect is presumably due to changes in the symbolic representation that are more likely to occur in certain regions of the phase space than in others.

V Conclusions and open problems

In this paper, we have revisited the definition of permutation entropy by generalizing the approach proposed in Ref. bandt02a with the introduction of a second window length $w$, which allows controlling the partition size. This strategy increases the flexibility of the ordinal-pattern analysis of generic time series; in particular, when combined with the measure of trajectory spreading, it allows extracting additional information on the structure of the invariant measure and provides hints on the presence of noise.

We have based our analysis exclusively on a prototypical example of low-dimensional chaos: the Hénon map. It is certainly desirable to extend the method to higher dimensions: this is, in fact, one of the greatest challenges in the analysis of realistic time series. As a preliminary step in this direction, here we present results for the so-called generalized Hénon map (GH): $x_{n+1}=a-x_{n}^{2}+bx_{n-2}$. This model has already been discussed in Ref. politi17 , where it was found to be relatively nasty (exhibiting a rather slow convergence, even compared to the higher-dimensional attractor generated by the Mackey-Glass equation). For $a=1.5$ and $b=0.29$, the GH map has two positive Lyapunov exponents, so that the KS entropy is equal to 0.1756 (the sum of the two positive Lyapunov exponents).

The results presented in Fig. 6 confirm that the two-length approach is superior to the computation of the standard permutation entropy. However, the convergence to the asymptotic value is slower and, more importantly, it seems to follow an unusual pattern: smaller $w$ values seem to yield better results (compare, for instance, full circles ($w=3$) to triangles ($w=7$)). As can be seen from the inset, where the deviation from the asymptotic value is plotted versus $L$ on doubly logarithmic scales, all sets of measurements are compatible with the final value. The reason for the lower performance of the supposedly more accurate partitions needs to be further clarified. In any case, this “anomalous” scenario is consistent with the slow convergence reported in Ref. politi17 .

Altogether, the method proposed in this paper is significantly more accurate than the standard one, but there are many issues that require additional investigations: what is the reason for the slow convergence exhibited by the GH map? Is it a peculiarity of the model itself, or a general feature of some broad class of high-dimensional dynamics? Moreover, can we quantify the effect of noise so as to distinguish genuine deterministic from stochastic contributions?

Acknowledgements.

One of us (SJW) wishes to acknowledge financial support from the Carnegie Trust for his summer project.

Bibliography


[1] C. Bandt and B. Pompe, Phys. Rev. Lett. 88, 174102 (2002).
[2] Y. Cao, W. Tung, J.B. Gao, V.A. Protopopescu, and L.M. Hively, Phys. Rev. E 70, 046217 (2004).
[3] P.J. Weck, D.A. Schaffner, M.R. Brown, and R.T. Wicks, Phys. Rev. E 91, 023101 (2015).
[4] C. Quintero-Quiroz, S. Pigolotti, M.C. Torrent, and C. Masoller, New J. Phys. 17, 093002 (2015).
[5] Ya. Sinai, Scholarpedia 4(3), 2034 (2009).
[6] A. Politi, Phys. Rev. Lett. 11, 144101 (2017).
[7] H. Kantz and Th. Schreiber, Nonlinear Time Series Analysis (CUP, Cambridge, 2004).
[8] F. Takens, in Dynamical Systems and Turbulence, edited by D.A. Rand and L.-S. Young, p. 366 (Springer, London, 1981).