Wavelet Spectra for Multivariate Point Processes
Edward A.K. Cohen, Alexander J. Gibberd

TL;DR
This paper introduces a wavelet-based statistical framework for analyzing multivariate point processes, enabling detection of non-stationarity and time-varying dependencies, with applications to neural spike train data.
Contribution
It develops a temporally smoothed wavelet periodogram with asymptotic distributional properties and applies it to test stationarity and analyze dependencies in multivariate point processes.
Findings
Wavelet periodogram is asymptotically Wishart distributed under stationarity.
Method effectively detects non-stationarity in neural spike trains.
Wavelet coherence measures inter-process correlation over time and scale.
Abstract
Wavelets provide the flexibility to analyse stochastic processes at different scales. Here, we apply them to multivariate point processes as a means of detecting and analysing unknown non-stationarity, both within and across data streams. To provide statistical tractability, a temporally smoothed wavelet periodogram is developed and shown to be equivalent to a multi-wavelet periodogram. Under a stationary assumption, the distribution of the temporally smoothed wavelet periodogram is demonstrated to be asymptotically Wishart, with the centrality matrix and degrees of freedom readily computable from the multi-wavelet formulation. Distributional results extend to wavelet coherence; a time-scale measure of inter-process correlation. This statistical framework is used to construct a test for stationarity in multivariate point-processes. The methodology is applied to neural spike train data,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Wavelet Spectra for Multivariate Point Processes
Edward A. K. Cohen
Department of Mathematics, Imperial College London, South Kensington Campus, London SW7 2AZ, U.K.
Alexander J. Gibberd
Department of Mathematics and Statistics, Lancaster University, Bailrigg, Lancaster LA1 4YF, U.K.
Abstract
Wavelets provide the flexibility to analyse stochastic processes at different scales. Here, we apply them to multivariate point processes as a means of detecting and analysing unknown non-stationarity, both within and across data streams. To provide statistical tractability, a temporally smoothed wavelet periodogram is developed and shown to be equivalent to a multi-wavelet periodogram. Under a stationary assumption, the distribution of the temporally smoothed wavelet periodogram is demonstrated to be asymptotically Wishart, with the centrality matrix and degrees of freedom readily computable from the multi-wavelet formulation. Distributional results extend to wavelet coherence; a time-scale measure of inter-process correlation. This statistical framework is used to construct a test for stationarity in multivariate point-processes. The methodology is applied to neural spike train data, where it is shown to detect and characterise time-varying dependency patterns.
1 Introduction
We adopt the construction of Hawkes (1971) which presents a -dimensional multivariate point process () as a counting vector where the random element states the number of events of type over the interval . Its first order properties are characterized by its rate , defined as where , and its second order properties at times and characterized by its covariance density matrix
[TABLE]
Process is second-order stationary (henceforth referred to simply as “stationary”) if is constant for all and depends only on . In this setting we will denote the covariance density matrix .
The spectral domain provides a rich environment for representing this second order structure and is based on the fact that stationary stochastic processes can be considered a composite of subprocesses operating at different frequencies. The spectral density matrix of a stationary point process is the Fourier transform of its covariance density matrix (Bartlett, 1963), namely
[TABLE]
A fundamental summary of the second order relationship between a pair of component processes, and say, is their coherence defined as
[TABLE]
This provides a normalized measure on of the correlation structure between the processes in the frequency domain. For time series data, it has been used extensively in several disciplines, including climatology, oceanography and medicine. For event data, it has been an important tool in neuroscience for the analysis of neuron spike train data.
Estimation of the coherence can be achieved by substituting smoothed spectral estimators into (S1). Failure to smooth (i.e. simply using the periodogram) will result in a coherence estimate of one for all frequencies, irrespective of whether correlation exists between the pair of processes or not. Tractability of the coherence estimator’s distribution is crucial for principled statistical testing and dependent on the smoothing procedure used (Walden, 2000).
Often, stochastic processes do not conform to the assumptions of stationarity. This might occur through simple first-order trends in the underlying data generating process, or more typically, complex changes in the second (or higher) order structure of the process. This renders classical Fourier methods obsolete and demands more flexible non-parametric methodology, with wavelets forming a natural basis with which to analyse non-stationary behaviour at different scales.
For a wavelet , the continuous wavelet transform at scale and translation (or time) of , observed on the interval , is defined by Brillinger (1996) as
[TABLE]
where ∗ denotes the complex conjugate. The th element of this stochastic integral is computed as , where are the ordered event times of and . Thus, working with the continuous time process is possible if the finite set of event times are known. The wavelet periodogram is subsequently defined as where H denotes the complex conjugate transpose.
As is the case with the Fourier periodogram, smoothing is required for two reasons. Firstly to control variance, and secondly to give meaningful values of the wavelet coherence estimator. Wavelet coherence is an analogue of coherence which provides a normalized measure on of the correlation between a pair of processes in time-scale space. It is defined as
[TABLE]
where is a smoothed version of . In the time series setting, wavelet coherence has been extensively applied in a wide range of disciplines (e.g. Torrence and Webster, 1999; Grinsted et al., 2004). Understanding the distributional properties of these smoothed coherence estimators is vital for rigorous statistical analysis and testing. In the Gaussian discrete-time setting the asymptotic distribution of coherence is widely studied (Cohen and Walden, 2010a, b), however, the point-process case has received little attention.
There are a wide range of ways in which non-stationarity can occur. Hence, rather than assume a specific model of non-stationarity, we here propose to study the properties of the temporally smoothed wavelet periodogram and coherence for stationary point-processes, thus providing a framework in which we deploy methods for exploratory data analysis and formal tests for detecting non-stationary.
2 Temporally smoothed wavelet periodogram
2.1 Formulation
Assumption 1**.**
Wavelet is a real or complex valued continuous function that satisfies (i) , (ii) , and (iii) the admissibility condition , where is the Fourier transform of .
Assumption 2**.**
Smoothing function is a non-negative, symmetric function supported and continuous on , and normalized such that .
Let and satisfy Assumptions 1 and 2, respectively. We define the temporally smoothed wavelet periodogram as
[TABLE]
where with controlling the level of smoothing. This is a wavelet analogue to Welch’s weighted overlapping sample averaging spectral estimator for stationary time series (Welch, 1967; Carter, 1987). It will prove convenient for the level of smoothing to scale with , and we therefore let , with .
For a particular choice of , and defining the Hermitian kernel function (at scale ) as
[TABLE]
the temporally smoothed wavelet periodogram in (S3) can be expressed as
[TABLE]
where . The th element of is computed as
[TABLE]
Given a choice for and , the form of will depend on . Throughout this paper, we use the examples of the complex valued Morlet wavelet and the real valued Mexican hat wavelet. These are examples of wavelets for which is analytically tractable.
2.2 Practical implementation
For continuous time wavelet analysis, the wavelets themselves are often non-compactly supported. However, the region of significant support is typically well localized and a close approximation to can be obtained through utilising the approximating wavelet
[TABLE]
For example, the Morlet wavelet shown in Fig. 1 has infinite support but can be well approximated by for . In practice, to speed up computation, it can make sense to use the approximating wavelet as only a subset of the data is required to compute the wavelet transform. From herein, to simplify notation, will we use to represent both the original and approximating wavelet, assuming that is chosen appropriately.
In a finite data setting we are restricted to regions of the time-scale space in which we can fairly evaluate (S2) without the consequences of edge effects at either ends of the data. These issues are compounded when smoothing across time, for a smoothing window with , the effective size of support for is , therefore we restrict ourselves to values of and for which . This defines an isosceles triangle with vertices , and , where . This is an adaptation to the cone of influence (Mallat and Peyré, 2008, p. 215) that also mitigates for smoothing distances. In practice, a positive minimum value of should be imposed to ensure a reasonable amount of event data exists in the smoothing range.
3 Multi-wavelet representation
3.1 Formulation
Given is continuous and non-negative definite by construction, associated with kernel is the Hermitian linear operator defined as It follows from Mercer’s Theorem (Mercer, 1909) that where are the orthonormal eigenfunctions of with non-zero eigenvalues ordered in decreasing size. Noting that , it follows that . From here on, we refer to as the eigenfunctions of . The following proposition shows that these orthonormal eigenfunctions are themselves wavelets.
Proposition 1**.**
Let satisfy Assumption 1, satisfy Assumption 2, and for the corresponding non-negative definite kernel have eigenfunctions . Every eigenfunction with a non-zero eigenvalue is a wavelet that satisfies the conditions of Assumption 1.
We adopt the term eigen-wavelets for the functions .
Turning our attention back to the temporally smoothed wavelet periodogram, it is straightforward to show
[TABLE]
Thus, the scaled and shifted versions , of the eigen-wavelets are themselves the eigenfunctions of , and again from Mercer’s theorem The temporally smoothed wavelet periodogram can thus be represented as
[TABLE]
where is the continuous wavelet transform of at scale and translation with respect to eigen-wavelet . Therefore the temporally smoothed wavelet periodogram is equivalent to the weighted sum of wavelet spectra arising from the orthonormal eigen-wavelet system. This is analogous to multitapering (Thomson, 1982) and comparisons can also be drawn with the multi-wavelet spectrum of Cohen and Walden (2010b). In that setting, multiple orthogonal wavelets were derived in Olhede and Walden (2002) from a time-frequency concentration problem, whereas here we have shown they can be generated by any arbitrary wavelet and smoothing window .
The representation in (S5) will be crucial for deriving the distributional results in Section 4, as well as offering computational speed-up. In particular, we will make use of the following proposition which shows the effective frequency response of the eigen-wavelet system is equal to the frequency response of the generating wavelet .
Proposition 2**.**
Let satisfy Assumption 1, satisfy Assumption 2, and for the corresponding non-negative definite kernel have eigenfunctions and eigenvalues . It holds that where and are the Fourier transforms of and , respectively.
In general, closed form expressions for the eigen-wavelets will be unobtainable and numerical procedures need to be used to find the solutions of Details for an implementation of the Nystrom method for doing just this can be found in Appendix 1.
3.2 Worked example
The Morlet wavelet can be seen as a complex sinusoid enveloped with a Gaussian window, and therefore the wavelet transform at scale and translation is the Fourier transform of the tapered process, localized at and evaluated at frequency . The temporally smoothed wavelet periodogram using a rectangular smoothing function
[TABLE]
emits kernel where
[TABLE]
and is the Gauss error function. The real part of this kernel is shown in Fig. 2a.
The function is itself a real valued non-negative kernel with its own set of real valued orthonormal eigenfunctions and associated eigenvalues . It follows that is an eigenfunction of with corresponding eigenvalue and hence is the eigen-wavelet system emitted by the Morlet wavelet with a rectangular smoothing function. The first five of these eigen-wavelets for are shown in Fig. 2b. This eigen-wavelet system follows the same spirit of the generating Morlet wavelet, with themselves being complex sinusoids enveloped by a taper. Thus, performing a continuous wavelet transform with one of the eigen-wavelets is equivalent to a time localized tapered Fourier transform evaluated at frequency , and the temporally smoothed wavelet periodogram as represented in (S5) is equivalent to a time localized multitaper spectral estimator. As means of a comparison, the kernel and associated eigen-wavelets of the Mexican hat wavelet using a rectangular smoothing function are shown in Fig. 2c and Fig. 2d, respectively.
4 Statistical Properties under Stationarity
4.1 Preliminaries
Let us define the th order cumulant of the differential process as
[TABLE]
The following mixing condition (Assumption 2.2 in Brillinger (1972)) is sufficient for the asymptotic results that follow. It ensures that dependency structure in the point process decays at a sufficient rate for central limit arguments to be invoked.
Assumption 3**.**
The -dimensional point process is strictly stationary, i.e. , and we set . Furthermore, all moments exist, the cumulant function satisfies
[TABLE]
for and , and
[TABLE]
for .
The distributional results differ slightly depending on whether a real valued wavelet (e.g. Mexican hat) or complex valued wavelet (e.g. Morlet) is chosen. We present the results for a complex valued wavelet and relegate the derivation for a real valued wavelet to the Supplementary Material.
Assumption 4**.**
Wavelet is complex valued, satisfies Assumption 1 and has approximating support for some finite . Furthermore, there exists a finite such that for all real , and it is orthogonal to its complex conjugate, i.e. .
The Morlet wavelet is an example of a complex valued wavelet that satisfies Assumption 4.
Assumption 5**.**
Smoothing function satisfies Assumption 2 and furthermore there exists a finite such that .
For wavelet with Fourier transform , its central frequency is defined as (Cohen and Walden, 2010a). The central frequency of is therefore and can be interpreted as the central analysing frequency of the wavelet at scale . For example, the Morlet wavelet has a central frequency of and the Mexican hat wavelet has a central frequency of (approx.) . It immediately follows from Proposition 2 that the central frequency of the eigen-wavelet system is .
4.2 Asymptotic distributional results
We allow the wavelet to scale with by defining , and appropriately normalize the scale and translation parameters as and , respectively. Under this rescaling (S2) becomes
[TABLE]
and the normalized temporally smoothed wavelet periodogram is defined as
[TABLE]
where . For any , the valid region of analysis is normalized to , an isosceles triangle with vertices , and whose interior contains all valid pairs of . Asymptotic results are presented for any fixed point as . In doing so, we define the frequency .
Proposition 3**.**
Let be a -dimensional stationary process with spectral density matrix . Let be a wavelet satisfying Assumption 1 and let be a smoothing function satisfying Assumption 2. For all and for all ,
[TABLE]
and as .
In the following theorem, denotes the (circular) -dimensional complex normal distribution with mean and covariance matrix .
Theorem 1**.**
Let be a -dimensional stationary process satisfying Assumption 3 with spectral density matrix , and let be a wavelet with central frequency satisfying Assumption 4. The continuous wavelet transform is asymptotically as , for all .
Let denote the -dimensional complex Wishart distribution with degrees of freedom and centrality matrix .
Theorem 2**.**
Let be a -dimensional stationary process satisfying Assumption 3 with spectral density matrix . Let be a wavelet with central frequency satisfying Assumption 4, let be a smoothing function satisfying Assumption 5, and for let be the eigenvalues of the kernel defined in (S4). The temporally smoothed wavelet periodogram is asymptotically as for all , where .
The following distributional result for the wavelet coherence is now immediate from Theorem 2 and Goodman (1963). We let denote the hypergeometric function with 2 and 1 parameters , and and scalar argument .
Corollary 1**.**
Under the conditions of Theorem 2, the temporally smoothed wavelet coherence between component processes and () asymptotically has density function
[TABLE]
where is shorthand for , the spectral coherence between and at frequency .
In the case of the rectangular smoothing function given in (S6), the effective degrees of freedom scale linearly with according to the following proposition.
Proposition 4**.**
Let satisfying Assumption 4, let be the rectangular smoothing function given in (S6), and for let corresponding kernel have ordered eigenvalues . Provided , then where .
5 Test for stationarity
Consider testing the null hypothesis that states is a stationary process, against the alternative hypothesis that states is not true. Under and from Proposition 3 it is true that is constant in . We therefore consider testing for stationarity at different scales.
Consider a smoothing parameter where and . From Proposition 4 we have degrees of freedom in Theorem 2 being . With a slight reworking of the normalized framework of Section 4, we set and appropriately normalize the scale and translation parameters as and . This again normalizes the valid region of analysis to for all .
For convenience, we perform a dyadic partition of the time-scales space, performing a test at each scale in the set . At scale , we partition time into non-overlapping equal size segments, each centred at time points and each the width of the approximate support of the wavelet at that scale.
Proposition 5**.**
Let satisfy Assumption 4 and satisfy Assumption 5. Then, for any and , are asymptotically independent.
Our test at scale therefore becomes a test of the null hypothesis
,
where is unspecified. We construct a likelihood ratio test based on the asymptotic distribution of stated in Theorem 1.
Proposition 6**.**
Let be independent samples where (). The likelihood ratio test statistic for the null hypothesis , with unspecified , is
[TABLE]
Furthermore, when is true, is asymptotically where .
In the following proposition, we let be a random variable that is equal in distribution to under the null hypothesis.
Proposition 7**.**
Let where and , and define the test statistic for as
[TABLE]
where and is as given in Theorem 2. Under , , where .
Theorem 3**.**
Let where and . Under , is asymptotically where . Specifically, where .
Thus, the rate of convergence to is optimal when .
Let denote the wavelet at the th scale and th translation (; ). Provided for all (this is only an approximation for the Morlet wavelet), the likelihood ratio test statistics will be independent of each other. Combining them for , namely , forms a test statistic for . It follows that is asymptotically , where .
6 Real data example
To give an example of the methods in practice, we analyse signalling regions within the lateral geniculate nucleus of a mouse. Specifically, we consider a set of neurons examined in Tang et al. (2015), where the authors are primarily concerned with analysing firing properties in order to understand how visual signals are encoded and transferred throughout the brain. To demonstrate the ability of our smoothed coherence estimator to operate with a single trial we consider only a single firing sequence from the paper. In this case, the mouse is shown a visual stimulus in the form of an liquid crystal display screen showing a sinusoidal monochromatic drifting grating with spatial stimulus at a frequency of 0.04 cycles per second and temporal flicker of 1Hz. The firing pattern is 7 seconds in length and represents data for cells 108 and 117 (these cells were picked for the example as they demonstrate relatively high firing rates). We use the Morlet wavelet with temporal smoothing parameter and approximating support . For completeness, this example was performed using exact kernel sampling, however, an approximate computation based on the Nystrom method based method (see Appendix 1) provides visually indistinguishable results and p-values.
The analysis of the experimental data is provided in Fig. 3. Tests for stationarity were performed at scale levels , with dyadic sampling points as marked by the crosses. With a p-value of 0.032, there is strong evidence that the process demonstrates non-stationary behaviour at the coarsest scale (, corresponding to a scale of 0.25s and frequency of 4Hz through the relationship for a Morlet wavelet that ). However, there is little evidence to support non-stationarity existing at the finer scales (). With the parameters specified, the 95th percentile of the distribution for zero coherence is 0.593 and is represented by the black contour line. This indicates that the non-stationarity at involves a change in the correlation between the two data streams half way through the experiment, with significant coherent signalling becoming present in the latter half. It is worth noting that whilst we can also see some peaks in the wavelet coherence at higher frequencies (scale 0.025s), the number of data points within the support of the kernel is limited. Thus, at this level, we should be careful to make inferences based on the asymptotic results.
Acknowledgement
This work is funded by EPSRC grant EP/P011535/1. The authors would like to thank Leigh Shlomovich, Department of Mathematics, Imperial College London for developing the Hawkes process simulation code, and Heather Battey, Dean Bodenham, and Andrew Walden, Department of Mathematics, Imperial College London for stimulating conversations.
Supplementary material
Supplementary Material Section 1 contains the proofs to propositions and theorems presented here. Supplementary Material Section 2 provides the results for real valued wavelets. Supplementary Material Section 3 provides verification of the results via simulation, as well as further supporting figures. It also contains a link to a MATLAB package for implementing the presented methods.
Appendix 1
Computing eigen-wavelets and eigenvalues
The Nystrom method (Kythe and Puri, 2001, Chapter 1) is an efficient method for computing the eigenfunctions of kernel for the multiwavelet representation described in Section 3. We can approximate the integral using the quadrature rule to solve the approximate eigen-problem for a discrete set of values for . The quadrature points ( large) are regularly spaced across and the weights are set to be . For simplicity, the Nystrom points are set to equal . In matrix form, the eigen-problem now becomes
[TABLE]
where is the matrix , , and . Solving the above gives approximations to the first eigenvalues and eigen-wavelets of kernel .
Should it be required, the Nystrom extension of the sampled vector is the function
[TABLE]
The sum in (S5) is over an infinite set of (eigen-)wavelet periodograms. However, in practice, the size of the eigenvalues drop away rapidly indicating that the kernel can be accurately reconstructed using only a small number of its eigen-wavelets, hence (S5) can be approximated with only a small number of terms. For example, in the case of the , the first nine eigenvalues contain 99.9% (3.s.f.) of the overall energy.
Correspondence
Correspondence should be addressed to
Edward Cohen
Department of Mathematics
Imperial College London
London SW7 2AZ.
Email: [email protected]
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bartlett (1963) Bartlett, M. S. (1963). The spectral analysis of point processes. Journal of the Royal Statistical Society. Series B 25 (2), 264–296.
- 2Brillinger (1972) Brillinger, D. R. (1972). The spectral analysis of stationary interval functions. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Theory of Statistics , 483–513.
- 3Brillinger (1996) Brillinger, D. R. (1996). Some uses of cumulants in wavelet analysis. Journal of Nonparametric Statistics 6 , 93–114.
- 4Carter (1987) Carter, G. (1987). Coherence and time delay estimation. Proceedings of the IEEE 75 (2), 236–255.
- 5Cohen and Walden (2010 a) Cohen, E. A. K. and A. T. Walden (2010 a). A statistical analysis of Morse wavelet coherence. IEEE Transactions on Signal Processing 58 (3 PART 1), 980–989.
- 6Cohen and Walden (2010 b) Cohen, E. A. K. and A. T. Walden (2010 b). A statistical study of temporally smoothed wavelet coherence. IEEE Transactions on Signal Processing 58 (6), 2964–2973.
- 7Goodman (1963) Goodman, N. (1963). Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction). Annals of Mathematical Statistics 34 (1), 152–177.
- 8Grinsted et al. (2004) Grinsted, A, J., C. Moore, and S. Jevrejeva (2004). Application of the cross wavelet transform and wavelet coherence to geophysical time series. Nonlinear Processes in Geophysics 11 (5), 561–566.
