Eigenvalue and Eigenvector Statistics in Time Series Analysis
Paolo Barucca, Mario Kieburg, Alexander Ossipov

TL;DR
This paper develops a supersymmetric theoretical framework to analyze the eigenvalue and eigenvector statistics of cross-correlation matrices in correlated time series, providing a universal benchmark for complex system analysis.
Contribution
It introduces a novel supersymmetric approach to derive analytical results for eigenvector statistics in correlated time series, filling a gap in existing theoretical understanding.
Findings
Analytical expressions for eigenvector statistics derived
Benchmark results for correlated signal analysis established
Framework applicable to diverse complex systems
Abstract
The study of correlated time-series is ubiquitous in statistical analysis, and the matrix decomposition of the cross-correlations between time series is a universal tool to extract the principal patterns of behavior in a wide range of complex systems. Despite this fact, no general result is known for the statistics of eigenvectors of the cross-correlations of correlated time-series. Here we use supersymmetric theory to provide novel analytical results that will serve as a benchmark for the study of correlated signals for a vast community of researchers.
Paolo Barucca
Department of Computer Science, University College London, London WC1E 6EA, United Kingdom
Mario Kieburg
Faculty of Physics, Bielefeld University, P.O. Box 100131, D-33501 Bielefeld, Germany
Alexander Ossipov
School of Mathematical Sciences, University of Nottingham, Nottingham NG7 2RD, United Kingdom
Introduction
The theory of complex systems ultimately deals with identifying patterns of simple behaviour that account for the emergence of universal dynamics in time series measured across a vast range of disciplines, including condensed matter physics, medicine, finance, signal transmission, biology, and more recently computational social sciences Strogatz (2018). A time series is a sequence of values of a given observable of a system recorded over time Chatfield (2018), such as the sea level Rahmstorf (2007), the temperature of a lake Sharma et al. (2015), the neuron activity in electroencephalography (EEG) Šeba (2003); Müller et al. (2006), the response in a unit of volume of a magnetic resonance imaging experiment Kwong et al. (1992), the gross domestic product of a country Lee (2005), the price or return of a stock Plerou et al. (1999); Barucca (2014), the volume of an order in the market Bouchaud et al. (2009); Chiarella et al. (2009), the number of infected individuals in a region affected by an epidemic Grenfell et al. (2001), and the online activity of a user O’Connor et al. (2010).
The basic analyses ubiquitously performed on multiple time series are covariance and correlation analysis, typically with the aim of identifying the main factors that account for time variability and of representing the state space of the system parsimoniously through denoising and dimensionality reduction. The generality of this statistical approach constitutes the basis for Principal Component Analysis (PCA) Pearson (1901); Jolliffe (2011), clustering analysis, and many other data mining algorithms Lloyd (1982). In these techniques one distinguishes between eigenvalue and eigenvector statistics, and both carry important information, as we know, for example, from the theory of quantum disordered systems. It is therefore all the more surprising that only a few results are available for the cross-statistics between eigenvalues and eigenvectors of the covariance and correlation matrices of noisy time series.
The spectral density of the eigenvalues is so far the main quantity for which the theory provides robust and general results Laloux et al. (1999); Lillo and Mantegna (2005); Allez et al. (2012); Majumdar and Vivo (2012). For instance, the Marčenko–Pastur distribution (MPD) Marčenko and Pastur (1967) usually serves as a blueprint for describing the influence of white noise in the time series on the spectral density. Any deviation from the MPD, for instance outliers, can be considered as system-specific information, so that the MPD serves as a filter. However, some eigenvalues encoding relevant information might be obscured by the bulk of the spectrum described by the MPD, in which case PCA may remove relevant data that should be taken into account. To distinguish those system-specific eigenvalues from the eigenvalues of the MPD, one needs to take the eigenvector statistics into consideration. An important step in this direction is made in the present Letter. We derive an analytical formula for the first moment of a fixed eigenvector component conditioned on a chosen eigenvalue. Moreover, we state a conjecture on the general moments and distributions of such components for a correlation matrix of noisy time series. Our results pave the way for a much more informative spectral decomposition in time series analysis, one that focuses not only on the spectral density but also on the individual contribution of each component to the spectrum, leading to a deeper understanding of a system’s dynamics.
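As a rough illustration of how the MPD acts as a white-noise benchmark, the following Python sketch compares the eigenvalues of a pure-noise sample covariance matrix with the Marčenko–Pastur density. The sizes `N` and `T`, the unit noise variance and the normalisation `Y @ Y.T / T` are assumptions made for this example and are not taken from the Letter.

```python
import numpy as np

def marchenko_pastur_pdf(x, q, sigma2=1.0):
    """Marchenko-Pastur density for the eigenvalues of Y Y^T / T, where Y is
    N x T with i.i.d. entries of variance sigma2 and q = N / T <= 1."""
    lam_minus = sigma2 * (1.0 - np.sqrt(q)) ** 2
    lam_plus = sigma2 * (1.0 + np.sqrt(q)) ** 2
    x = np.asarray(x, dtype=float)
    pdf = np.zeros_like(x)
    inside = (x > lam_minus) & (x < lam_plus)
    xi = x[inside]
    pdf[inside] = np.sqrt((lam_plus - xi) * (xi - lam_minus)) / (2.0 * np.pi * sigma2 * q * xi)
    return pdf

rng = np.random.default_rng(0)
N, T = 200, 800                      # assumed sizes, q = N / T = 0.25
q = N / T
Y = rng.standard_normal((N, T))      # white noise only, no signal
eigvals = np.linalg.eigvalsh(Y @ Y.T / T)

# Edges of the Marchenko-Pastur bulk for unit noise variance
lam_minus, lam_plus = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2

# Empirical histogram versus the analytical density at the bin centres
hist, edges = np.histogram(eigvals, bins=20, range=(lam_minus, lam_plus), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
mp_at_centres = marchenko_pastur_pdf(centres, q)

# Sanity check: the density integrates to one (simple Riemann sum)
grid = np.linspace(lam_minus, lam_plus, 20001)
mp_norm = np.sum(marchenko_pastur_pdf(grid, q)) * (grid[1] - grid[0])

print("MP normalisation:", mp_norm)
print("max |histogram - MPD|:", np.max(np.abs(hist - mp_at_centres)))
```

Eigenvalues of real data that fall clearly outside `[lam_minus, lam_plus]` would be the outliers mentioned above; eigenvalues inside the bulk cannot be separated from noise by this filter alone.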
Random Matrix Model
Specifically, we study the statistics of the eigenvectors and the eigenvalues of the matrix , with representing time series of length or, in the case of PCA, descriptors with variants, and being the transpose of . Thus can be interpreted as the covariance matrix between the time series aggregated in or the covariance between the descriptors respectively. The real rectangular matrix in our model is composed of four matrices
[TABLE]
where is a deterministic real matrix and is a Gaussian random matrix distributed by
[TABLE]
The two real symmetric matrices and are positive definite and represent a spatio-temporal correlation between the various time series. Here, the matrix can be identified with a time correlation, the matrix with the spatial correlations, and additionally, in contrast to many common models, we include an offset . Hence is a non-centred and doubly correlated Gaussian random matrix. This form allows the model to capture in detail the case of factor models, which are ubiquitous in statistics and econometrics.
Though our model is quite general, it is still not the most general Gaussian random matrix model. We assume that the spatio-temporal correlations of the multivariate time series factorize in the two matrices, and . Therefore time-dependent spatial correlations, like the two epoch model Akemann et al. (2016), are not considered here.
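A non-centred, doubly correlated Gaussian matrix of this kind can be sampled numerically. The sketch below is only one possible parametrisation and is an assumption of this example: the correlation matrices enter through their Cholesky factors on either side of a white-noise matrix, and the names `X0`, `C_space`, `C_time` and `sigma` are hypothetical labels for the offset, the spatial correlation, the time correlation and the noise strength.

```python
import numpy as np

def sample_data_matrix(X0, C_space, C_time, sigma, rng):
    """Draw X = X0 + sigma * L_space @ Y @ L_time.T (assumed parametrisation),
    with Y an N x T white Gaussian matrix and C = L L^T Cholesky factorisations."""
    N, T = X0.shape
    L_space = np.linalg.cholesky(C_space)
    L_time = np.linalg.cholesky(C_time)
    Y = rng.standard_normal((N, T))
    return X0 + sigma * L_space @ Y @ L_time.T

rng = np.random.default_rng(1)
N, T, sigma = 3, 6, 0.5
X0 = np.outer(np.ones(N), np.linspace(-1.0, 1.0, T))      # deterministic offset
C_space = np.array([[1.0, 0.3, 0.0],
                    [0.3, 1.0, 0.3],
                    [0.0, 0.3, 1.0]])                     # spatial correlations
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
C_time = 0.6 ** lags                                      # AR(1)-like time correlations

# Monte Carlo check: E[X] = X0 and E[(X - X0)(X - X0)^T] = sigma^2 Tr(C_time) C_space
n_samples = 5000
mean_X = np.zeros((N, T))
second = np.zeros((N, N))
for _ in range(n_samples):
    X = sample_data_matrix(X0, C_space, C_time, sigma, rng)
    mean_X += X / n_samples
    Z = X - X0
    second += Z @ Z.T / n_samples
C_space_est = second / (sigma ** 2 * np.trace(C_time))

print(np.round(C_space_est, 2))
```

Because the noise covariance factorises into a spatial and a temporal part, this construction cannot represent time-dependent spatial correlations, in line with the restriction stated above.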
The random matrix model defined above can also be considered as a simple deformation of the standard real Wishart ensemble of random matrices, in which the orthogonal invariance is broken in several ways. Such non-invariant deformations of the standard random matrix ensembles were introduced and studied in different contexts including wireless communication Couillet and Debbah (2011), vibration analysis Soize (2003), signal processing Nadakuditi and Edelman (2008) and neural networks Ahmadian et al. (2015). There is a growing interest in the statistical properties of the eigenvectors in these ensembles. While there are some recent results about the statistics of the eigenvectors in the deformed Gaussian Orthogonal and Unitary ensembles Allez et al. (2014); Truong and Ossipov (2016a, b); Bun et al. (2017); Bourgade and Yau (2017); Benigni (2017); Truong and Ossipov (2018), we are not aware of similar results for the Wishart ensemble except for Ref. Bourgade and Yau (2017), in which the ergodicity of the eigenvectors was proven for the special case , .
In the following, we will not simply focus on the computation of the spectral density of the eigenvalues, analysed in Recher et al. (2010, 2012) with the same supersymmetric (SUSY) approach as in the present work, but also calculate detailed eigenvector statistics of the matrix , whose eigenvalues are represented by the diagonal matrix and whose eigenvectors are represented by the columns of the matrix . The full information about the statistics of the eigenvector components is contained in the conditional density
[TABLE]
where refers to a particular eigenvector component and
[TABLE]
is the mean density of the eigenvalues and stands for the ensemble average over the distribution of . In the case of a factorisation of the eigenvector and eigenvalue statistics, as in the Wishart ensemble, one finds the Porter–Thomas distribution Porter and Thomas (1956)
[TABLE]
which is independent of the component and the eigenvalue due to the Haar distributed eigenvectors. This simplification cannot be expected to hold in our non-trivial model as well as in a realistic situation. The computation of (3) or its arbitrary moments
[TABLE]
where is a positive integer, is technically a very challenging problem. In this Letter we focus on the analytical derivation of the first moment and make a conjecture about an arbitrary moment and in the conclusions.
The moments of the eigenvectors are also a standard tool to characterise properties of complex quantum systems and are used to distinguish different phases in condensed matter physics Evers and Mirlin (2008). Hence, we expect that it may give valuable insights for time series as well.
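The Porter–Thomas benchmark for the undeformed Wishart ensemble can be checked numerically. The sketch below assumes the standard form of the Porter–Thomas law for real eigenvectors, namely that the rescaled squared component `y = N * u**2` follows a chi-squared distribution with one degree of freedom for large `N`; the sizes and the normalisation of the matrix are choices made for this example.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
N, T = 200, 600                          # assumed sizes
Y = rng.standard_normal((N, T))
eigvals, U = np.linalg.eigh(Y @ Y.T / T)   # columns of U are the eigenvectors

# Rescaled squared components of all eigenvectors
y = (N * U ** 2).ravel()

def porter_thomas_pdf(y):
    """Chi-squared density with one degree of freedom."""
    return np.exp(-y / 2.0) / np.sqrt(2.0 * np.pi * y)

# Moments: the first is 1 by normalisation, the second is close to 3
first_moment = y.mean()
second_moment = np.mean(y ** 2)

# Fraction of components with y < 1 versus the Porter-Thomas prediction
frac_below_one = np.mean(y < 1.0)
pt_cdf_at_one = erf(1.0 / sqrt(2.0))

# Coarse density comparison away from the integrable singularity at y = 0
edges = np.linspace(0.5, 4.0, 8)
counts, _ = np.histogram(y, bins=edges)
emp_density = counts / (y.size * np.diff(edges))
centres = 0.5 * (edges[:-1] + edges[1:])

print(first_moment, second_moment)
print(frac_below_one, pt_cdf_at_one)
print(np.round(emp_density, 3), np.round(porter_thomas_pdf(centres), 3))
```

In this invariant case the statistics do not depend on which component or which eigenvalue is selected, which is why all components of all eigenvectors can be pooled into a single sample.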
Before we start with the analytical calculation of , we want to point out that the eigenvector components are basis dependent. Thus the conditional distribution strongly depends on the reference frame. In this work such a frame is chosen as the eigenbasis of , allowing us to investigate the broadening of the eigenvectors due to the white noise and its strength . Another natural and valuable reference frame could be the eigenbasis of which we do not consider here for simplicity.
Eigenvector Statistics with SUSY
The first moment of the eigenvectors, see (6) for , can be computed by taking the imaginary part and the limit of a regularization of the quantity
[TABLE]
where . Defining the -dimensional unit vector with unity at the position and zero otherwise, this quantity can be generated by differentiating
[TABLE]
with respect to the auxiliary parameter , at . is chosen to be real to guarantee convergence later on. Following the standard steps of the SUSY method Recher et al. (2010, 2012), we first represent the generating function by a supersymmetric Gaussian integral, then average over the random matrix and finally apply the Hubbard-Stratonovich transformation Fyodorov et al. (2008). In this way, we derive the following supersymmetric representation for (see the Supplemental Material for details),
[TABLE]
where , . The supermatrices are symmetric in the boson-boson block and self-dual in the fermion-fermion block, and their eigenvalues run along complex contours that are detailed in the Supplemental Material. The supersymmetric Green function has the form
[TABLE]
with . The representation (9) is exact, but rather involved and technical. An expression for the mean level density can be obtained by summing over and should be compared with the corresponding result in Recher et al. (2010, 2012); Waltner et al. (2015). The above expression simplifies considerably in the limit , which is considered next.
Macroscopic level density and limiting statistics
In most applications, one is interested in the limit . In this limit the integral in Eq. (9) can be evaluated using the saddle-point approximation. To derive the saddle-point equation, it is convenient to introduce the supermatrices and , which can be considered as independent. The saddle-point solution contributing most to the integral is given by the diagonal matrices and with the complex parameters and that satisfy the coupled equations (see the Supplemental Material)
[TABLE]
The mean level density is up to a normalisation constant given by
[TABLE]
where we assume without loss of generality. The case only yields an additional Dirac delta function at the origin. The formula (12) reduces to the MPD Marčenko and Pastur (1967) in the case of the Wishart ensemble, i.e., , and . We illustrate the result for in Fig. 1 for the one-factor model, which is described in the next subsection.
The result for can be expressed in terms of the same matrix and reads
[TABLE]
which constitutes the main result of the present Letter. The normalisation is fixed by the condition . We note that for a Haar distributed vector one has .
One-factor model
To illustrate our findings we apply our general results to the one-factor model supplemented with Gaussian noise. Specifically, we set , where and are column vectors of length and , respectively. The correlation matrices are chosen to be diagonal and . The vector represents a common factor, e.g. the market mode in financial time series analysis, and the component quantifies the relative weight of the common factor on the th time series, before normalization.
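We plug the matrices of the one-factor model into the saddle-point equation (11) and simplify the resulting expression via the Sherman-Morrison identity for the inverse matrices Sherman and Morrison (1950), i.e. $(A+uv^{T})^{-1}=A^{-1}-A^{-1}uv^{T}A^{-1}/(1+v^{T}A^{-1}u)$ for an invertible matrix $A$ and column vectors $u$, $v$ with $1+v^{T}A^{-1}u\neq 0$. This leads to the coupled equations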
[TABLE]
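The Sherman-Morrison step can be verified numerically. The following sketch checks the identity for a diagonal matrix plus a symmetric rank-one term, which is the structure that arises when diagonal correlation matrices are combined with a single factor; the function name and the vector `a` are hypothetical and chosen only for this example.

```python
import numpy as np

def sherman_morrison_inverse(A_inv, u, v):
    """Inverse of A + u v^T given A^{-1}, valid when 1 + v^T A^{-1} u != 0."""
    A_inv_u = A_inv @ u
    v_A_inv = v @ A_inv
    denom = 1.0 + v @ A_inv_u
    return A_inv - np.outer(A_inv_u, v_A_inv) / denom

rng = np.random.default_rng(3)
n = 6
d = rng.uniform(1.0, 2.0, size=n)     # positive diagonal entries
a = rng.standard_normal(n)            # hypothetical factor-loading vector

# The rank-one update costs O(n^2) instead of a full O(n^3) inversion
fast = sherman_morrison_inverse(np.diag(1.0 / d), a, a)
direct = np.linalg.inv(np.diag(d) + np.outer(a, a))
max_err = np.max(np.abs(fast - direct))

print("max deviation from direct inverse:", max_err)
```

With `u = v = a` and a positive diagonal matrix the denominator is always larger than one, so the identity applies without further conditions.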
Solving these saddle-point equations we can derive the spectral density and the moments of the eigenvectors, simply by plugging the following matrix elements in Eqs. (12)-(13)
[TABLE]
We illustrate these results in Figs. 1 and 2, where we also compare them with Monte-Carlo simulations. The deviations from the Porter-Thomas distribution (5), which yields for the first moment the constant , can be readily seen for some components of the eigenvectors. They indicate that the corresponding eigenvalues still carry substantial information on the matrix , although these eigenvalues are evidently inside the bulk of the spectrum, cf. Fig. 1. This simple example demonstrates the strength of the combined statistics of eigenvalues and eigenvectors.
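A Monte-Carlo experiment of this type can be sketched in a few lines. The version below is a simplified stand-in and not the simulation behind the figures: it assumes white noise with identity correlation matrices, a factor that loads on the first time series only (so that the first canonical basis vector belongs to the eigenbasis of the deterministic part), and a hypothetical factor strength `theta` chosen small enough that no eigenvalue is expected to separate clearly from the bulk.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, sigma = 100, 400, 1.0
theta = 0.3                  # assumed factor strength (hypothetical value)
n_trials = 20

a = np.zeros(N)
a[0] = 1.0                                     # factor loads on series 1 only
b = rng.standard_normal(T)
b *= np.sqrt(theta * T) / np.linalg.norm(b)    # fixes |b|^2 / T = theta
X0 = np.outer(a, b)                            # deterministic rank-one part

lam_all, y_all = [], []
for _ in range(n_trials):
    X = X0 + sigma * rng.standard_normal((N, T))
    lam, U = np.linalg.eigh(X @ X.T / T)
    lam_all.append(lam)
    y_all.append(N * U[0, :] ** 2)   # N |u_1|^2 for every eigenvector
lam_all = np.concatenate(lam_all)
y_all = np.concatenate(y_all)

# Conditional first moment: average of N |u_1|^2 within eigenvalue deciles
edges = np.quantile(lam_all, np.linspace(0.0, 1.0, 11))
bin_idx = np.clip(np.searchsorted(edges, lam_all, side="right") - 1, 0, 9)
cond_mean = np.array([y_all[bin_idx == j].mean() for j in range(10)])

print("largest eigenvalue seen:", lam_all.max())
print("N <|u_1|^2> per eigenvalue decile:", np.round(cond_mean, 2))
```

For Haar-distributed eigenvectors every decile would give a value close to one. In this sketch the weight of the first component is expected to shift towards the upper part of the spectrum, which illustrates how eigenvectors can reveal a factor whose eigenvalue is hidden in the bulk.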
Conclusions
The general result in Eq. (13) provides a powerful analytical methodology to quantify the expected value of the square of specific components in a given eigenvalue interval for a wide range of random matrices. We tested these analytical results numerically in detail for the one-factor model (see Figs. 1-2). Our general formulation allows an arbitrary number of factors to be added in the matrix . Although our analytical results were derived in the limit , they show very good agreement with the results of numerical simulations at finite and . The rate of convergence to the limiting statistics will generally depend on the input , , and .
In the present work we derived analytically a closed result only for the first moment of an eigenvector under the condition of a fixed eigenvalue. However we conjecture that all higher moments are related to the first moment as follows:
[TABLE]
which corresponds to a locally rescaled Porter-Thomas distribution
[TABLE]
A similar result has also been found for the conditioned eigenvector statistics of the deformed Gaussian Unitary Ensemble (GUE) in Truong and Ossipov (2016a, b). The only difference is the prefactor in (16), which is equal to in our case and given by for the complex eigenvectors in the deformed GUE Truong and Ossipov (2016a, b). These numerical values result from the averaged moments of real and complex normalized vectors, respectively. We have tested this conjecture numerically for and found good agreement, see Fig. 3.
We are confident that our analytical results are of general relevance for the spectral decomposition of time series and could lead to a much improved understanding of the full statistics of the eigen-components in signal analysis. A strong deviation of the moment from the constant hints at an eigenvector-eigenvalue pair that contains system-specific information. This knowledge can improve PCA and other techniques for reducing high-dimensional data without losing relevant information.
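The averaged moments of real and complex normalized vectors can be estimated by direct sampling. The sketch below takes as an assumption the standard large-dimension values, $(2k-1)!!$ for real and $k!$ for complex vectors, for the $k$-th moment of the rescaled squared component `N * |u|**2`; the dimension and the number of sampled vectors are arbitrary choices for this example.

```python
import numpy as np
from math import factorial

def odd_double_factorial(k):
    """(2k - 1)!! = 1 * 3 * ... * (2k - 1)."""
    out = 1
    for j in range(1, 2 * k, 2):
        out *= j
    return out

rng = np.random.default_rng(5)
N, n_vectors = 200, 2000

# Uniformly distributed real unit vectors: normalised Gaussian vectors
g = rng.standard_normal((n_vectors, N))
u_real = g / np.linalg.norm(g, axis=1, keepdims=True)

# Uniformly distributed complex unit vectors
z = rng.standard_normal((n_vectors, N)) + 1j * rng.standard_normal((n_vectors, N))
u_cplx = z / np.linalg.norm(z, axis=1, keepdims=True)

y_real = (N * u_real ** 2).ravel()
y_cplx = (N * np.abs(u_cplx) ** 2).ravel()

# Compare sampled moments with the assumed large-N values
for k in (1, 2, 3):
    print(k,
          round(float(np.mean(y_real ** k)), 3), odd_double_factorial(k),
          round(float(np.mean(y_cplx ** k)), 3), factorial(k))
```

At finite dimension the sampled moments lie slightly below the limiting values, for example $3N/(N+2)$ instead of 3 for the second real moment.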
Acknowledgements.
MK acknowledges financial support by the German research council (DFG) through CRC 1283: “Taming uncertainty and profiting from randomness and low regularity in analysis, stochastics and their applications”. PB acknowledges support from the London Institute for Mathematical Sciences (LIMS).
I Derivation of the supersymmetric integral representation for the moments of the eigenvectors
The quantity defined in Eq.(7) can be computed by differentiating the generating function (8)
[TABLE]
with respect to and setting . The normalization is given as . In order to construct a representation of in terms of the supersymmetric integral we use the identity
[TABLE]
where we employed the matrix which is an dimensional matrix of real Grassmann variables and is an dimensional ordinary real matrix. The two matrices are introduced in order to cancel the resulting determinants from the Gaussian integral. To ensure integrability we have introduced the constant matrix , where is the second Pauli matrix.
To simplify the notation, we define the diagonal supermatrix and the rectangular supermatrix . Moreover, we rearrange the matrices and in the supermatrix and the supermatrix as follows
[TABLE]
Both matrices are two real rectangular supermatrices and with dimensions and respectively. The first two columns of and are real variables while the last two columns are Grassmann variables. In this way, we find
[TABLE]
The average over yields
[TABLE]
Since the action contains a quartic term in the matrices and , the next step is to perform the Hubbard-Stratonovich transformation, which allows one to decouple such terms. Up to the normalization the result reads
[TABLE]
The parametrization of the two supermatrices needs to be chosen carefully to guarantee the convergence of the integral. They are given by
[TABLE]
equipped with the flat Berezinian measure
[TABLE]
The ordinary matrices and are negative definite and symmetric and can be diagonalized with orthogonal matrices as follows
[TABLE]
with two positive definite diagonal matrices. The matrix has the form
[TABLE]
The matrices and are Hermitian self-dual matrices and and are two rectangular matrices whose entries are independent real Grassmann variables.
The shift of in by the imaginary part solves a convergence problem in the Gaussian terms in (S6). In particular the Gaussian integrals over the supermatrices and are absolutely convergent and yield
[TABLE]
where is defined as in (10). Hence, has four indices with and . To fix the normalization we take and notice that becomes approximately . Therefore we end up with the intermediate result
[TABLE]
Coming back to our original problem we notice that we are interested in the first derivative with respect to at . In particular, the quantity is given by
[TABLE]
which coincides with Eq.(9).
II Saddle-point equation
To derive the saddle-point equation we only need to consider the exponential function and the superdeterminant in the integral (S13). The term is only a polynomial prefactor which does not influence the saddle-point solution. It is easier to study the saddle point by introducing the supermatrices and , which can be considered to be independent. Then the action, i.e. the function that needs to be minimised, is
[TABLE]
Differentiating it with respect to and yields two coupled equations
[TABLE]
The operator is the partial trace over the first tensor space which is here the space of ordinary and matrices, respectively.
The saddle-point equation is rotation invariant, i.e., when is a solution then this is also true for as well as and any kind of combination. This can be seen by multiplying both equations from the left and the right with and , which is equivalent to replacing by . Assuming that the saddle-point solution is unique, we conclude then that and must commute. The uniqueness of the solution should follow from two facts: the contour of integration, which was shifted by the term , cannot cross the poles, and the Berezinian (the Jacobian in superspace), that is for , is not suppressed only when the multiplicity of the eigenvalues in the fermion-fermion blocks is equal to that in the boson-boson blocks. The fermion-fermion blocks are doubly degenerate due to their Hermitian self-duality. Thus the boson-boson blocks are also doubly degenerate, which implies for supermatrices that we can diagonalize and simultaneously and that the solution has to be diagonal and degenerate, i.e., and . Substituting this ansatz into Eq. (S15) we derive Eq. (11), which is
[TABLE]
The regularization only determines which saddle-point has to be chosen, especially which sign the imaginary part carries. Assuming the correct sign of the imaginary part we neglected this regularization in Eq. (11).
References
- [1] Strogatz (2018): S. H. Strogatz, Nonlinear dynamics and chaos: with applications to physics, biology, chemistry, and engineering (CRC Press, 2018).
- [2] Chatfield (2018): C. Chatfield, Introduction to multivariate analysis (Routledge, 2018).
- [3] Rahmstorf (2007): S. Rahmstorf, Science 315, 368 (2007).
- [4] Sharma et al. (2015): S. Sharma, D. K. Gray, J. S. Read, C. M. O’Reilly, P. Schneider, A. Qudrat, C. Gries, S. Stefanoff, S. E. Hampton, S. Hook, et al., Scientific Data 2, 150008 (2015).
- [5] Šeba (2003): P. Šeba, Physical Review Letters 91, 198104 (2003).
- [6] Müller et al. (2006): M. Müller, Y. L. Jiménez, C. Rummel, G. Baier, A. Galka, U. Stephani, and H. Muhle, Physical Review E 74, 041119 (2006).
- [7] Kwong et al. (1992): K. K. Kwong, J. W. Belliveau, D. A. Chesler, I. E. Goldberg, R. M. Weisskoff, B. P. Poncelet, D. N. Kennedy, B. E. Hoppel, M. S. Cohen, and R. Turner, Proceedings of the National Academy of Sciences 89, 5675 (1992).
- [8] Lee (2005): C.-C. Lee, Energy Economics 27, 415 (2005).
