Wavelet Spectra for Multivariate Point Processes

Edward A.K. Cohen; Alexander J. Gibberd

arXiv:1908.02634·stat.ME·November 4, 2020


TL;DR

This paper introduces a wavelet-based statistical framework for analyzing multivariate point processes, enabling detection of non-stationarity and time-varying dependencies, with applications to neural spike train data.

Contribution

It develops a temporally smoothed wavelet periodogram with asymptotic distributional properties and applies it to test stationarity and analyze dependencies in multivariate point processes.

Findings

01. The temporally smoothed wavelet periodogram is asymptotically complex Wishart distributed under stationarity.

02. The resulting stationarity test detects non-stationary behaviour in neural spike train data.

03. Wavelet coherence measures inter-process correlation over time and scale.


Equations

\Gamma(s,t)=E\{{\rm d}N(s)\,{\rm d}N^{\rm T}(t)\}/({\rm d}t\,{\rm d}s)-\lambda(s)\lambda^{\rm T}(t).

S(f)={\rm diag}(\lambda)+\int_{-\infty}^{\infty}\Gamma(\tau){\rm e}^{-{\rm i}2\pi f\tau}{\rm d}\tau,\quad -\infty<f<\infty.

\rho_{ij}^{2}(f)=\frac{|S_{ij}(f)|^{2}}{S_{ii}(f)S_{jj}(f)}.

w(a,b)=a^{-1/2}\int_{0}^{T}\psi^{*}\{(t-b)/a\}{\rm d}N(t),

\gamma_{ij}^{2}(a,b)=\frac{|\Omega_{ij}(a,b)|^{2}}{\Omega_{ii}(a,b)\Omega_{jj}(a,b)},

\Omega(a,b)

K(s,t)=\int_{-\infty}^{\infty}h_{\kappa}(u)\psi(s-u)\psi^{*}(t-u){\rm d}u,

\Omega(a,b)\equiv\int_{0}^{T}\int_{0}^{T}K_{a,b}(s,t)\,{\rm d}N(t)\,{\rm d}N^{\rm T}(s),

\Omega_{ij}(a,b)=\sum_{k=1}^{N_{i}(T)}\sum_{k^{\prime}=1}^{N_{j}(T)}K_{a,b}(s_{i,k},s_{j,k^{\prime}}).

\bar{\psi}(t)=\left\{\begin{array}{cc}\psi(t)&|t|<\alpha/2\\ 0&\text{otherwise.}\end{array}\right.

\int_{-\infty}^{\infty}K_{a,b}(s,t)\varphi_{l}\{(t-b)/a\}{\rm d}t=\eta_{l}\varphi_{l}\{(s-b)/a\}.

\Omega(a,b)=\sum_{l=0}^{\infty}\eta_{l}v_{l}(a,b)v_{l}^{\rm H}(a,b),

h(t)=\left\{\begin{array}{ll}1&-1/2<t<1/2\\ 0&\text{otherwise,}\end{array}\right.

k(s,t)=(2\kappa)^{-1}{\rm e}^{-(t-s)^{2}/4}\left[\mathrm{erf}\{(\kappa-(t+s))/2\}+\mathrm{erf}\{(\kappa+(t+s))/2\}\right]

q_{i_{1},...,i_{k}}(t_{1},...,t_{k})\,{\rm d}t_{1}\cdots{\rm d}t_{k}\equiv{\rm cum}\{{\rm d}N_{i_{1}}(t_{1}),...,{\rm d}N_{i_{k}}(t_{k})\}.

\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}|r_{i_{1},...,i_{k}}(u_{1},...,u_{k-1})|\,{\rm d}u_{1}\cdots{\rm d}u_{k-1}<\infty,

\int_{-\infty}^{\infty}|u||r_{i_{1},i_{2}}(u)|\,{\rm d}u<\infty,

w(a,b)=w^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})=\tilde{a}^{-1/2}\int_{0}^{T}\psi^{*\scriptscriptstyle{(T)}}\{(t-\tilde{b}T)/\tilde{a}\}{\rm d}N(t),

\Omega^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})=\int_{-\infty}^{\infty}h_{\kappa\tilde{a}}^{\scriptscriptstyle{(T)}}(u)W^{\scriptscriptstyle{(T)}}(\tilde{a},u){\rm d}u,

E\{\Omega^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})\}=E\{W^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})\}=\int_{-\infty}^{\infty}\tilde{a}|\Psi^{\scriptscriptstyle{(T)}}(\tilde{a}f)|^{2}S(f){\rm d}f

g_{\gamma^{2}}(x)=(n-1)(1-\rho^{2})^{n}(1-x)^{n-2}\,{}_{2}F_{1}(n,n;1;\rho^{2}x),

\tilde{\Lambda}=K^{pKn}\frac{\prod_{i=1}^{K}\det(B_{i})^{n}}{\det(\sum_{i=1}^{K}B_{i})^{Kn}}.

\Lambda_{j}=K^{pKn}\frac{\prod_{i=1}^{K}\det\{\Omega^{\scriptscriptstyle{(T)}}(\tilde{a}_{j},\tilde{b}_{i})\}^{n}}{\det\{\sum_{i=1}^{K}\Omega^{\scriptscriptstyle{(T)}}(\tilde{a}_{j},\tilde{b}_{i})\}^{Kn}},

KW\tilde{\varphi}=\tilde{\eta}\tilde{\varphi},

\tilde{\varphi}_{l}(x)=\tilde{\lambda}_{l}\sum_{j=1}^{n}w_{j}K(x,s_{j})\tilde{\varphi}_{l}(s_{j}).


Full text

Wavelet Spectra for Multivariate Point Processes

Edward A. K. Cohen

Department of Mathematics, Imperial College London, South Kensington Campus, London SW7 2AZ, U.K.

Alexander J. Gibberd

Department of Mathematics and Statistics, Lancaster University, Bailrigg, Lancaster LA1 4YF, U.K.

Abstract

Wavelets provide the flexibility to analyse stochastic processes at different scales. Here, we apply them to multivariate point processes as a means of detecting and analysing unknown non-stationarity, both within and across data streams. To provide statistical tractability, a temporally smoothed wavelet periodogram is developed and shown to be equivalent to a multi-wavelet periodogram. Under a stationary assumption, the distribution of the temporally smoothed wavelet periodogram is demonstrated to be asymptotically Wishart, with the centrality matrix and degrees of freedom readily computable from the multi-wavelet formulation. Distributional results extend to wavelet coherence, a time-scale measure of inter-process correlation. This statistical framework is used to construct a test for stationarity in multivariate point processes. The methodology is applied to neural spike train data, where it is shown to detect and characterise time-varying dependency patterns.

1 Introduction

We adopt the construction of Hawkes (1971), which presents a $p$ -dimensional multivariate point process ( $p\geq 1$ ) as a counting vector $N(t)\equiv\{N_{1}(t),\ldots,N_{p}(t)\}^{\rm T}$ where the random element $N_{i}(t)$ $(i=1,...,p)$ counts the number of events of type $i$ over the interval $(0,t]$ . Its first order properties are characterized by its rate $\lambda(t)\in\mathbb{R}^{p}$ , defined as $\lambda(t)\equiv E\{{\rm d}N(t)\}/{\rm d}t$ where ${\rm d}N(t)=N(t+{\rm d}t)-N(t)$ , and its second order properties at times $s$ and $t$ are characterized by its covariance density matrix

$$\Gamma(s,t)=E\{{\rm d}N(s)\,{\rm d}N^{\rm T}(t)\}/({\rm d}t\,{\rm d}s)-\lambda(s)\lambda^{\rm T}(t).$$

Process $N(t)$ is second-order stationary (henceforth referred to simply as “stationary”) if $\lambda(t)$ is constant for all $t$ and $\Gamma(t,s)$ depends only on $\tau=s-t$ . In this setting we will denote the covariance density matrix $\Gamma(\tau)$ .

The spectral domain provides a rich environment for representing this second order structure and is based on the fact that stationary stochastic processes can be considered a composite of subprocesses operating at different frequencies. The spectral density matrix of a stationary point process is the Fourier transform of its covariance density matrix (Bartlett, 1963), namely

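$$S(f)={\rm diag}(\lambda)+\int_{-\infty}^{\infty}\Gamma(\tau){\rm e}^{-{\rm i}2\pi f\tau}{\rm d}\tau,\quad -\infty<f<\infty.$$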

A fundamental summary of the second order relationship between a pair of component processes, $N_{i}(t)$ and $N_{j}(t)$ say, is their coherence defined as

$$\rho_{ij}^{2}(f)=\frac{|S_{ij}(f)|^{2}}{S_{ii}(f)S_{jj}(f)}.\tag{S1}$$

This provides a normalized measure on $[0,1]$ of the correlation structure between the processes in the frequency domain. For time series data, it has been used extensively in several disciplines, including climatology, oceanography and medicine. For event data, it has been an important tool in neuroscience for the analysis of neuron spike train data.

Estimation of the coherence can be achieved by substituting smoothed spectral estimators into (S1). Failure to smooth (i.e. simply using the periodogram) will result in a coherence estimate of one for all frequencies, irrespective of whether correlation exists between the pair of processes or not. Tractability of the coherence estimator’s distribution is crucial for principled statistical testing and dependent on the smoothing procedure used (Walden, 2000).

Often, stochastic processes do not conform to the assumptions of stationarity. This might occur through simple first-order trends in the underlying data generating process, or more typically, complex changes in the second (or higher) order structure of the process. This renders classical Fourier methods obsolete and demands more flexible non-parametric methodology, with wavelets forming a natural basis with which to analyse non-stationary behaviour at different scales.

For a wavelet $\psi(t)$ , the continuous wavelet transform at scale $a>0$ and translation (or time) $b\in\mathbb{R}$ of $N(t)$ , observed on the interval $(0,T]$ , is defined by Brillinger (1996) as

$$w(a,b)=a^{-1/2}\int_{0}^{T}\psi^{*}\{(t-b)/a\}{\rm d}N(t),\tag{S2}$$

where ∗ denotes the complex conjugate. The $i$ th element of this stochastic integral is computed as $w_{i}(a,b)=\sum_{k=1}^{N_{i}(T)}\psi^{*}_{a,b}(s_{i,k})$ , where $s_{i,1},...,s_{i,N_{i}(T)}$ are the ordered event times of $N_{i}(t)$ and $\psi_{a,b}(t)\equiv a^{-1/2}\psi\{(t-b)/a\}$ . Thus, working with the continuous time process is possible if the finite set of event times is known. The wavelet periodogram is subsequently defined as $W(a,b)=w(a,b)w^{\rm H}(a,b),\;$ where H denotes the complex conjugate transpose.
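The sum over event times can be sketched in a few lines. The following is a minimal illustration, assuming the Morlet wavelet $\pi^{-1/4}{\rm e}^{-t^{2}/2}{\rm e}^{{\rm i}2\pi t}$; the function names and the event times are hypothetical and chosen only for the example.

```python
import cmath
import math

def morlet(t):
    """Morlet wavelet pi^(-1/4) exp(-t^2/2) exp(i 2 pi t) (assumed form)."""
    return math.pi ** -0.25 * math.exp(-0.5 * t * t) * cmath.exp(2j * math.pi * t)

def cwt_events(event_times, a, b, psi=morlet):
    """w_i(a, b): sum of conj(psi_{a,b}(s)) over the event times s of one component."""
    return sum((psi((s - b) / a) / math.sqrt(a)).conjugate() for s in event_times)

def wavelet_periodogram(events, a, b, psi=morlet):
    """W(a, b) = w w^H for a list of event-time lists, one list per component."""
    w = [cwt_events(ev, a, b, psi) for ev in events]
    return [[wi * wj.conjugate() for wj in w] for wi in w]

# Hypothetical event times (in seconds) for two component processes.
events = [[0.8, 1.1, 1.9, 2.4, 3.0], [0.9, 1.6, 2.2, 2.9]]
W = wavelet_periodogram(events, a=0.5, b=2.0)

# W is rank one, so the unsmoothed coherence is one up to rounding error.
raw_coherence = abs(W[0][1]) ** 2 / (W[0][0].real * W[1][1].real)
print(raw_coherence)
```

Because $W(a,b)$ is an outer product, the ratio computed at the end is one whatever the data, which is the same degeneracy noted for the unsmoothed Fourier periodogram.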

As is the case with the Fourier periodogram, smoothing is required for two reasons: firstly to control variance, and secondly to give meaningful values of the wavelet coherence estimator. Wavelet coherence is an analogue of coherence which provides a normalized measure on $[0,1]$ of the correlation between a pair of processes in time-scale space. It is defined as

$$\gamma_{ij}^{2}(a,b)=\frac{|\Omega_{ij}(a,b)|^{2}}{\Omega_{ii}(a,b)\Omega_{jj}(a,b)},$$

where $\Omega$ is a smoothed version of $W$ . In the time series setting, wavelet coherence has been extensively applied in a wide range of disciplines (e.g. Torrence and Webster, 1999; Grinsted et al., 2004). Understanding the distributional properties of these smoothed coherence estimators is vital for rigorous statistical analysis and testing. In the Gaussian discrete-time setting the asymptotic distribution of coherence is widely studied (Cohen and Walden, 2010a, b); however, the point-process case has received little attention.

There is a wide range of ways in which non-stationarity can occur. Hence, rather than assume a specific model of non-stationarity, we propose to study the properties of the temporally smoothed wavelet periodogram and coherence for stationary point processes, thus providing a framework in which we can deploy methods for exploratory data analysis and formal tests for detecting non-stationarity.

2 Temporally smoothed wavelet periodogram

2.1 Formulation

Assumption 1.

Wavelet $\psi(t)$ is a real or complex valued continuous function that satisfies (i) $\int_{-\infty}^{\infty}\psi(t){\rm d}t=0$ , (ii) $\|\psi\|=1$ , and (iii) the admissibility condition $\int_{-\infty}^{\infty}f^{-1}|\Psi(f)|^{2}{\rm d}f<\infty$ , where $\Psi$ is the Fourier transform of $\psi(t)$ .

Assumption 2.

Smoothing function $h(t)$ is a non-negative, symmetric function supported and continuous on $(-1/2,1/2)$ , and normalized such that $\int_{-\infty}^{\infty}h(t){\rm d}t=1$ .

Let $\psi(t)$ and $h(t)$ satisfy Assumptions 1 and 2, respectively. We define the temporally smoothed wavelet periodogram as

$$\Omega(a,b)=\int_{-\infty}^{\infty}h_{\xi}(u-b)W(a,u){\rm d}u,\tag{S3}$$

where $h_{\xi}(t)\equiv\xi^{-1}h(t/\xi)$ with $\xi>0$ controlling the level of smoothing. This is a wavelet analogue to Welch’s weighted overlapping sample averaging spectral estimator for stationary time series (Welch, 1967; Carter, 1987). It will prove convenient for the level of smoothing to scale with $a$ , and we therefore let $\xi=\kappa a$ , with $\kappa>0$ .

For a particular choice of $\kappa$ , and defining the Hermitian kernel function (at scale $a=1$ ) as

$$K(s,t)=\int_{-\infty}^{\infty}h_{\kappa}(u)\psi(s-u)\psi^{*}(t-u){\rm d}u,\tag{S4}$$

the temporally smoothed wavelet periodogram in (S3) can be expressed as

$$\Omega(a,b)\equiv\int_{0}^{T}\int_{0}^{T}K_{a,b}(s,t)\,{\rm d}N(t)\,{\rm d}N^{\rm T}(s),$$

where $K_{a,b}(s,t)=a^{-1}K\{(s-b)/a,(t-b)/a\}$ . The $(i,j)$ th element of $\Omega(a,b)$ is computed as

$$\Omega_{ij}(a,b)=\sum_{k=1}^{N_{i}(T)}\sum_{k^{\prime}=1}^{N_{j}(T)}K_{a,b}(s_{i,k},s_{j,k^{\prime}}).$$
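The step from temporal smoothing to the kernel form can be written out as a short worked derivation. This is a sketch under an assumption: it takes the smoothed periodogram to be the convolution $\int h_{\kappa a}(u-b)W(a,u){\rm d}u$, which should be checked against the displayed definition; the change of variable $u=b+av$ is introduced here only for the calculation.

```latex
\begin{aligned}
\Omega(a,b)
 &= \int_{-\infty}^{\infty} h_{\kappa a}(u-b)\, w(a,u)\, w^{\rm H}(a,u)\,{\rm d}u \\
 &= \int_{0}^{T}\!\!\int_{0}^{T}\Big[\int_{-\infty}^{\infty} h_{\kappa a}(u-b)\,
    \psi_{a,u}(s)\,\psi^{*}_{a,u}(t)\,{\rm d}u\Big]\,{\rm d}N(t)\,{\rm d}N^{\rm T}(s), \\
\text{and, with } u=b+av:\quad
 &\int_{-\infty}^{\infty} h_{\kappa a}(u-b)\,\psi_{a,u}(s)\,\psi^{*}_{a,u}(t)\,{\rm d}u \\
 &= a^{-1}\int_{-\infty}^{\infty} h_{\kappa}(v)\,
    \psi\{(s-b)/a-v\}\,\psi^{*}\{(t-b)/a-v\}\,{\rm d}v \\
 &= a^{-1}K\{(s-b)/a,(t-b)/a\} = K_{a,b}(s,t).
\end{aligned}
```

The second line uses $w(a,u)=\int_{0}^{T}\psi^{*}_{a,u}(t)\,{\rm d}N(t)$, and the substitution uses $h_{\kappa a}(av)=a^{-1}h_{\kappa}(v)$ together with $\psi_{a,u}(s)=a^{-1/2}\psi\{(s-u)/a\}$.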

Given a choice for $h(t)$ and $\kappa$ , the form of $K(s,t)$ will depend on $\psi(t)$ . Throughout this paper, we use the examples of the complex valued Morlet wavelet and the real valued Mexican hat wavelet. These are examples of wavelets for which $K(s,t)$ is analytically tractable.

2.2 Practical implementation

For continuous time wavelet analysis, the wavelets themselves are often non-compactly supported. However, the region of significant support is typically well localized and a close approximation to $w(a,b)$ can be obtained through utilising the approximating wavelet

$$\bar{\psi}(t)=\left\{\begin{array}{cc}\psi(t)&|t|<\alpha/2\\ 0&\text{otherwise.}\end{array}\right.$$

For example, the Morlet wavelet $\psi(t)=\pi^{-1/4}{\rm e}^{-t^{2}/2}{\rm e}^{{\rm i}2\pi t}$ shown in Fig. 1 has infinite support but can be well approximated by $\bar{\psi}(t)$ for $\alpha=8$ . In practice, to speed up computation, it can make sense to use the approximating wavelet as only a subset of the data is required to compute the wavelet transform. From here on, to simplify notation, we will use $\psi(t)$ to represent both the original and approximating wavelet, assuming that $\alpha$ is chosen appropriately.

In a finite data setting we are restricted to regions of the time-scale space in which we can fairly evaluate (S2) without the consequences of edge effects at either end of the data. These issues are compounded when smoothing across time. For a smoothing window $h_{\kappa}(t)$ with ${\rm supp}(h_{\kappa})=(-\kappa/2,\kappa/2)$ , the effective size of support for $K(s,t)$ is $\alpha+\kappa$ . We therefore restrict ourselves to values of $a$ and $b$ for which ${\rm supp}(K_{a,b})=(b-a(\alpha+\kappa)/2,b+a(\alpha+\kappa)/2)\times(b-a(\alpha+\kappa)/2,b+a(\alpha+\kappa)/2)\subseteq(0,T]\times(0,T]$ . This defines an isosceles triangle $\mathcal{T}_{\alpha,\kappa,T}\subset\mathbb{R}^{2}$ in the $(a,b)$ plane with vertices $(0,0)$ , $(0,T)$ and $(a_{\rm max}(T),T/2)$ , where $a_{\rm max}(T)=T/(\alpha+\kappa)$ . This is an adaptation of the cone of influence (Mallat and Peyré, 2008, p. 215) that also accounts for the smoothing distance. In practice, a positive minimum value of $a$ should be imposed to ensure a reasonable amount of event data exists in the smoothing range.
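The triangular region can be checked point by point. The following is a small sketch of that membership test; the function names and the optional `a_min` argument are hypothetical, and the numbers in the example (a 7 second record with $\alpha=4$ and $\kappa=10$) are used purely for illustration.

```python
def a_max(T, alpha, kappa):
    """Largest usable scale: the apex of the valid triangle, T / (alpha + kappa)."""
    return T / (alpha + kappa)

def in_valid_region(a, b, T, alpha, kappa, a_min=0.0):
    """True if the support of K_{a,b} lies inside the observation window (0, T].

    a_min is an assumed user-chosen lower bound on the scale.
    """
    half_width = 0.5 * a * (alpha + kappa)
    return a > a_min and b - half_width >= 0.0 and b + half_width <= T

T, alpha, kappa = 7.0, 4.0, 10.0
print(a_max(T, alpha, kappa))
print(in_valid_region(0.5, 3.5, T, alpha, kappa))   # apex of the triangle
print(in_valid_region(0.5, 3.0, T, alpha, kappa))   # same scale, shifted off centre
```

At the largest scale only the midpoint $b=T/2$ is admissible, and the admissible range of $b$ widens linearly as $a$ decreases.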

3 Multi-wavelet representation

3.1 Formulation

Given $K(s,t)$ is continuous and non-negative definite by construction, associated with kernel $K(s,t)$ is the Hermitian linear operator $T_{K}$ defined as $[T_{K}f](s)=\int_{-\infty}^{\infty}K(s,t)f(t){\rm d}t.$ It follows from Mercer’s Theorem (Mercer, 1909) that $K(s,t)=\sum_{l=0}^{\infty}\eta_{l}\varphi_{l}(s)\varphi^{*}_{l}(t)$ where $\{\varphi_{l}(t);\ l=0,1,...\}$ are the orthonormal eigenfunctions of $T_{K}$ with non-zero eigenvalues $\{\eta_{l};\ l=0,1,...\}$ ordered in decreasing size. Noting that ${\rm tr}(T_{K}):=\int_{-\infty}^{\infty}K(t,t){\rm d}t=1$ , it follows that $\sum_{l=0}^{\infty}\eta_{l}=1$ . From here on, we refer to $\{\varphi_{l}(t);\ l=0,1,...\}$ as the eigenfunctions of $K(s,t)$ . The following proposition shows that these orthonormal eigenfunctions are themselves wavelets.

Proposition 1.

Let $\psi(t)$ satisfy Assumption 1, $h(t)$ satisfy Assumption 2, and for $\kappa>0$ the corresponding non-negative definite kernel $K(s,t)$ have eigenfunctions $\{\varphi_{l}(t);\ l=0,1,...\}$ . Every eigenfunction $\varphi_{l}(t)$ with a non-zero eigenvalue is a wavelet that satisfies the conditions of Assumption 1.

We adopt the term eigen-wavelets for the functions $\{\varphi_{l}(t);\ l=0,1,...\}$ .

Turning our attention back to the temporally smoothed wavelet periodogram, it is straightforward to show

$$\int_{-\infty}^{\infty}K_{a,b}(s,t)\varphi_{l}\{(t-b)/a\}{\rm d}t=\eta_{l}\varphi_{l}\{(s-b)/a\}.$$

Thus, the scaled and shifted versions $\varphi_{l,a,b}(t)=a^{-1/2}\varphi_{l}\{(t-b)/a\}$ , $l=0,1,\ldots$ of the eigen-wavelets are themselves the eigenfunctions of $K_{a,b}$ , and again from Mercer’s theorem $K_{a,b}(s,t)=\sum_{l=0}^{\infty}\eta_{l}\varphi_{l,a,b}(s)\varphi^{*}_{l,a,b}(t).$ The temporally smoothed wavelet periodogram can thus be represented as

$$\Omega(a,b)=\sum_{l=0}^{\infty}\eta_{l}v_{l}(a,b)v_{l}^{\rm H}(a,b),\tag{S5}$$

where $v_{l}(a,b)=\int_{0}^{T}\varphi_{l,a,b}(t){\rm d}N(t)$ is the continuous wavelet transform of $N(t)$ at scale $a$ and translation $b$ with respect to eigen-wavelet $\varphi_{l}(t)$ . Therefore the temporally smoothed wavelet periodogram is equivalent to the weighted sum of wavelet spectra arising from the orthonormal eigen-wavelet system. This is analogous to multitapering (Thomson, 1982) and comparisons can also be drawn with the multi-wavelet spectrum of Cohen and Walden (2010b). In that setting, multiple orthogonal wavelets were derived in Olhede and Walden (2002) from a time-frequency concentration problem, whereas here we have shown they can be generated by any arbitrary wavelet $\psi(t)$ and smoothing window $h(t)$ .

The representation in (S5) will be crucial for deriving the distributional results in Section 4, as well as offering computational speed-up. In particular, we will make use of the following proposition which shows the effective frequency response of the eigen-wavelet system is equal to the frequency response of the generating wavelet $\psi(t)$ .

Proposition 2.

Let $\psi(t)$ satisfy Assumption 1, $h(t)$ satisfy Assumption 2, and for $\kappa>0$ the corresponding non-negative definite kernel $K(s,t)$ have eigenfunctions $\{\varphi_{l}(t);\ l=0,1,...\}$ and eigenvalues $\{\eta_{l};\ l=0,1,...\}$ . It holds that $\sum_{l}\eta_{l}|\Phi_{l}(f)|^{2}=|\Psi(f)|^{2}$ where $\Phi_{l}$ and $\Psi(f)$ are the Fourier transforms of $\varphi_{l}(t)$ and $\psi(t)$ , respectively.

In general, closed form expressions for the eigen-wavelets $\{\varphi_{l}(t);\ l=0,1,\ldots\}$ will be unobtainable and numerical procedures need to be used to find the solutions of $\int_{-\infty}^{\infty}K(s,t)\varphi(t){\rm d}t=\eta\varphi(s).$ Details for an implementation of the Nyström method for doing just this can be found in Appendix 1.
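As a rough illustration of the numerical route, the eigenproblem can be discretised on a uniform grid. The sketch below is not the implementation of Appendix 1: it assumes NumPy is available, builds the kernel by midpoint quadrature of $\int h_{\kappa}(u)\psi(s-u)\psi^{*}(t-u){\rm d}u$ with a rectangular $h$ and a Morlet wavelet, and uses equal quadrature weights. The function name `nystrom_eigen` and the grid step are hypothetical choices.

```python
import numpy as np

def morlet(t):
    """Morlet wavelet pi^(-1/4) exp(-t^2/2) exp(i 2 pi t) (assumed form)."""
    return np.pi ** -0.25 * np.exp(-0.5 * t ** 2) * np.exp(2j * np.pi * t)

def nystrom_eigen(psi, kappa, alpha=8.0, step=0.05):
    """Approximate eigenvalues and eigenfunctions of K on a uniform grid."""
    half = 0.5 * (alpha + kappa)                 # effective support is alpha + kappa
    s = np.arange(-half, half + step / 2, step)  # quadrature nodes for s and t
    m = int(round(kappa / step))
    du = kappa / m
    u = -0.5 * kappa + (np.arange(m) + 0.5) * du  # midpoint nodes on supp(h_kappa)
    Psi = psi(s[:, None] - u[None, :])
    # K[i, j] approximates the integral of h_kappa(u) psi(s_i - u) conj(psi(s_j - u)).
    K = (du / kappa) * Psi @ Psi.conj().T
    # Nystrom step with equal weights: eigen-decompose the weighted kernel matrix.
    eta, vec = np.linalg.eigh(step * K)
    order = np.argsort(eta)[::-1]
    eta = eta[order]
    phi = vec[:, order] / np.sqrt(step)          # unit L2 norm on the grid
    return s, eta, phi

s, eta, phi = nystrom_eigen(morlet, kappa=10.0)
n_eff = 1.0 / np.sum(eta ** 2)   # reciprocal of the sum of squared eigenvalues
print(eta[:5], eta.sum(), n_eff)
```

Two quick sanity checks on the output are that the eigenvalues sum to approximately one, matching ${\rm tr}(T_{K})=1$, and that each leading eigenfunction integrates to approximately zero, as Proposition 1 requires of a wavelet.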

3.2 Worked example

The Morlet wavelet can be seen as a complex sinusoid enveloped with a Gaussian window, and therefore the wavelet transform at scale $a>0$ and translation $b$ is the Fourier transform of the tapered process, localized at $b$ and evaluated at frequency $1/a$ . The temporally smoothed wavelet periodogram using a rectangular smoothing function

$$h(t)=\left\{\begin{array}{ll}1&-1/2<t<1/2\\ 0&\text{otherwise,}\end{array}\right.\tag{S6}$$

emits kernel $K(s,t)=k(s,t){\rm e}^{-{\rm i}2\pi(t-s)},$ where

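$$k(s,t)=(2\kappa)^{-1}{\rm e}^{-(t-s)^{2}/4}\left[\mathrm{erf}\{(\kappa-(t+s))/2\}+\mathrm{erf}\{(\kappa+(t+s))/2\}\right]$$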

and $\mathrm{erf}(x)=\pi^{-1/2}\int_{-x}^{x}\exp(-t^{2}){\rm d}t$ is the Gauss error function. The real part of this kernel is shown in Fig. 2a.

The function $k(s,t)$ is itself a real valued non-negative kernel with its own set of real valued orthonormal eigenfunctions $\{\phi_{l}(t);\ l=0,1,...\}$ and associated eigenvalues $\{\eta_{l};\ l=0,1,...\}$ . It follows that $\varphi_{l}(t)={\rm e}^{{\rm i}2\pi t}\phi_{l}(t)$ is an eigenfunction of $K(s,t)$ with corresponding eigenvalue $\eta_{l}$ and hence $\{\varphi_{l}(t)={\rm e}^{{\rm i}2\pi t}\phi_{l}(t);l=0,1,...\}$ is the eigen-wavelet system emitted by the Morlet wavelet with a rectangular smoothing function. The first five of these eigen-wavelets for $\kappa=10$ are shown in Fig. 2b. This eigen-wavelet system follows the same spirit as the generating Morlet wavelet, each eigen-wavelet being a complex sinusoid enveloped by a taper. Thus, performing a continuous wavelet transform with one of the eigen-wavelets is equivalent to a time localized tapered Fourier transform evaluated at frequency $1/a$ , and the temporally smoothed wavelet periodogram as represented in (S5) is equivalent to a time localized multitaper spectral estimator. By way of comparison, the kernel and associated eigen-wavelets of the Mexican hat wavelet using a rectangular smoothing function are shown in Fig. 2c and Fig. 2d, respectively.
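The claim that modulating the real eigenfunctions of $k(s,t)$ gives the eigenfunctions of $K(s,t)$ can be verified in one line. The following worked step uses only the factorisation $K(s,t)=k(s,t){\rm e}^{-{\rm i}2\pi(t-s)}$ and the eigen-relation $\int k(s,t)\phi_{l}(t){\rm d}t=\eta_{l}\phi_{l}(s)$.

```latex
\begin{aligned}
\int_{-\infty}^{\infty} K(s,t)\,\varphi_{l}(t)\,{\rm d}t
 &= \int_{-\infty}^{\infty} k(s,t)\,{\rm e}^{-{\rm i}2\pi(t-s)}\,{\rm e}^{{\rm i}2\pi t}\,\phi_{l}(t)\,{\rm d}t \\
 &= {\rm e}^{{\rm i}2\pi s}\int_{-\infty}^{\infty} k(s,t)\,\phi_{l}(t)\,{\rm d}t
  = \eta_{l}\,{\rm e}^{{\rm i}2\pi s}\phi_{l}(s) = \eta_{l}\,\varphi_{l}(s), \\
\int_{-\infty}^{\infty}\varphi_{l}(t)\,\varphi^{*}_{m}(t)\,{\rm d}t
 &= \int_{-\infty}^{\infty}\phi_{l}(t)\,\phi_{m}(t)\,{\rm d}t = \delta_{lm}.
\end{aligned}
```

The second line confirms that orthonormality carries over, because the modulation factors cancel in the inner product.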

4 Statistical Properties under Stationarity

4.1 Preliminaries

Let us define the $k$ th order cumulant $q$ of the differential process as

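$$q_{i_{1},...,i_{k}}(t_{1},...,t_{k})\,{\rm d}t_{1}\cdots{\rm d}t_{k}\equiv{\rm cum}\{{\rm d}N_{i_{1}}(t_{1}),...,{\rm d}N_{i_{k}}(t_{k})\}.$$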

The following mixing condition (Assumption 2.2 in Brillinger (1972)) is sufficient for the asymptotic results that follow. It ensures that dependency structure in the point process decays at a sufficient rate for central limit arguments to be invoked.

Assumption 3.

The $p$ -dimensional point process $N(t)$ is strictly stationary, i.e. $q_{i_{1},...,i_{k}}(t_{1}+t,...,t_{k}+t)=q_{i_{1},...,i_{k}}(t_{1},...,t_{k})$ , and we set $r_{i_{1},...,i_{k}}(u_{1},...,u_{k-1})=q_{i_{1},...,i_{k}}(u_{1},...,u_{k-1},0)$ . Furthermore, all moments exist, the cumulant function satisfies

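$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}|r_{i_{1},...,i_{k}}(u_{1},...,u_{k-1})|\,{\rm d}u_{1}\cdots{\rm d}u_{k-1}<\infty,$$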

for $i_{1},...,i_{k}=1,...,p$ and $k=2,3,...$ , and

$$\int_{-\infty}^{\infty}|u||r_{i_{1},i_{2}}(u)|\,{\rm d}u<\infty,$$

for $i_{1},i_{2}=1,...,p$ .

The distributional results differ slightly depending on whether a real valued wavelet (e.g. Mexican hat) or complex valued wavelet (e.g. Morlet) is chosen. We present the results for a complex valued wavelet and relegate the derivation for a real valued wavelet to the Supplementary Material.

Assumption 4.

Wavelet $\psi(t)$ is complex valued, satisfies Assumption 1 and has approximating support $(-\alpha/2,\alpha/2)$ for some finite $\alpha>0$ . Furthermore, there exists a finite $C$ such that $\int|\psi(t+u)-\psi(t)|{\rm d}t<C|u|$ for all real $u$ , and it is orthogonal to its complex conjugate, i.e. $\int_{-\infty}^{\infty}\psi^{2}(t){\rm d}t=0$ .

The Morlet wavelet is an example of a complex valued wavelet that satisfies Assumption 4.

Assumption 5.

Smoothing function $h(t)$ satisfies Assumption 2 and furthermore there exists a finite $C^{\prime}$ such that $\int|h(t+u)-h(t)|{\rm d}t<C^{\prime}|u|$ .

For wavelet $\psi(t)$ with Fourier transform $\Psi(f)$ , its central frequency is defined as $f_{0}:=\int_{0}^{\infty}f|\Psi(f)|^{2}{\rm d}f$ (Cohen and Walden, 2010a). The central frequency of $\psi_{a,b}$ is therefore $f_{0}/a$ and can be interpreted as the central analysing frequency of the wavelet at scale $a$ . For example, the Morlet wavelet has a central frequency of $f_{0}=1$ and the Mexican hat wavelet has a central frequency of (approx.) $f_{0}=0.21$ . It immediately follows from Proposition 2 that the central frequency of the eigen-wavelet system is $f_{0}$ .

4.2 Asymptotic distributional results

We allow the wavelet to scale with $T$ by defining $\psi^{\scriptscriptstyle{(T)}}(t)=\{(\alpha+\kappa)/T\}^{-1/2}\psi\{t(\alpha+\kappa)/T\}$ , and appropriately normalize the scale and translation parameters as $\tilde{a}=a(\alpha+\kappa)/T$ and $\tilde{b}=b/T$ , respectively. Under this rescaling (S2) becomes

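$$w(a,b)=w^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})=\tilde{a}^{-1/2}\int_{0}^{T}\psi^{*\scriptscriptstyle{(T)}}\{(t-\tilde{b}T)/\tilde{a}\}{\rm d}N(t),$$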

and the normalized temporally smoothed wavelet periodogram is defined as

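$$\Omega^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})=\int_{-\infty}^{\infty}h_{\kappa\tilde{a}}^{\scriptscriptstyle{(T)}}(u)W^{\scriptscriptstyle{(T)}}(\tilde{a},u){\rm d}u,$$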

where $h^{\scriptscriptstyle{(T)}}(t)=T^{-1}h(t/T)$ . For any $T$ , the valid region of analysis is normalized to $\tilde{\mathcal{T}}_{\alpha,\kappa}$ , an isosceles triangle with vertices $(0,0)$ , $(0,1)$ and $(1,1/2)$ whose interior contains all valid pairs of $(\tilde{a},\tilde{b})$ . Asymptotic results are presented for any fixed point $(\tilde{a},\tilde{b})\in\tilde{\mathcal{T}}_{\alpha,\kappa}$ as $T\rightarrow\infty$ . In doing so, we define the frequency $f_{\tilde{a}}=f_{0}/(\tilde{a}T)=f_{0}/a$ .

Proposition 3.

Let $N(t)$ be a $p$ -dimensional stationary process with spectral density matrix $S(f)$ . Let $\psi(t)$ be a wavelet satisfying Assumption 1 and let $h(t)$ be a smoothing function satisfying Assumption 2. For all $\kappa>0$ and for all $(\tilde{a},\tilde{b})\in\tilde{\mathcal{T}}_{\alpha,\kappa}$ ,

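$$E\{\Omega^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})\}=E\{W^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})\}=\int_{-\infty}^{\infty}\tilde{a}|\Psi^{\scriptscriptstyle{(T)}}(\tilde{a}f)|^{2}S(f){\rm d}f$$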

and $E\{\Omega^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})\}=S(f_{\tilde{a}})+O(T^{-2})$ as $T\rightarrow\infty$ .

In the following theorem, $\mathcal{N}^{\mathcal{C}}_{p}(\mu,\Sigma)$ denotes the (circular) $p$ -dimensional complex normal distribution with mean $\mu$ and covariance matrix $\Sigma$ .

Theorem 1.

Let $N(t)$ be a $p$ -dimensional stationary process satisfying Assumption 3 with spectral density matrix $S(f)$ , and let $\psi(t)$ be a wavelet with central frequency $f_{0}$ satisfying Assumption 4. The continuous wavelet transform $w^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})$ is asymptotically $\mathcal{N}^{\mathcal{C}}_{p}\{0,S(f_{\tilde{a}})\}$ as $T\rightarrow\infty$ , for all $(\tilde{a},\tilde{b})\in\tilde{\mathcal{T}}_{\alpha,\kappa}$ .

Let $\mathcal{W}^{\mathcal{C}}_{p}(n,\Sigma)$ denote the $p$ -dimensional complex Wishart distribution with $n$ degrees of freedom and centrality matrix $\Sigma$ .

Theorem 2.

Let $N(t)$ be a $p$ -dimensional stationary process satisfying Assumption 3 with spectral density matrix $S(f)$ . Let $\psi(t)$ be a wavelet with central frequency $f_{0}$ satisfying Assumption 4, let $h(t)$ be a smoothing function satisfying Assumption 5, and for $\kappa>0$ let $\{\eta_{l};\ l=0,1,...\}$ be the eigenvalues of the kernel $K(s,t)$ defined in (S4). The temporally smoothed wavelet periodogram $\Omega^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})$ is asymptotically $(1/n)\mathcal{W}^{\mathcal{C}}_{p}\{n,S(f_{\tilde{a}})\}$ as $T\rightarrow\infty$ for all $(\tilde{a},\tilde{b})\in\tilde{\mathcal{T}}_{\alpha,\kappa}$ , where $n=1/\left(\sum_{l=0}^{\infty}\eta_{l}^{2}\right)$ .

The following distributional result for the wavelet coherence is now immediate from Theorem 2 and Goodman (1963). We let ${}_{2}F_{1}(\alpha_{1},\alpha_{2};\beta_{1};z)$ denote the hypergeometric function with 2 and 1 parameters $\alpha_{1}$ , $\alpha_{2}$ and $\beta_{1}$ and scalar argument $z$ .

Corollary 1.

Under the conditions of Theorem 2, the temporally smoothed wavelet coherence $\gamma_{ij}^{2}(\tilde{a},\tilde{b})=|\Omega^{\scriptscriptstyle{(T)}}_{ij}(\tilde{a},\tilde{b})|^{2}/\{\Omega^{\scriptscriptstyle{(T)}}_{ii}(\tilde{a},\tilde{b})\Omega^{\scriptscriptstyle{(T)}}_{jj}(\tilde{a},\tilde{b})\}$ between component processes $N_{i}(t)$ and $N_{j}(t)$ ( $i\neq j$ ) asymptotically has density function

$$g_{\gamma^{2}}(x)=(n-1)(1-\rho^{2})^{n}(1-x)^{n-2}\,{}_{2}F_{1}(n,n;1;\rho^{2}x),$$

where $\rho^{2}$ is shorthand for $\rho_{ij}^{2}(f_{\tilde{a}})$ , the spectral coherence between $N_{i}(t)$ and $N_{j}(t)$ at frequency $f_{\tilde{a}}$ .
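The density of Corollary 1 is straightforward to evaluate numerically. The sketch below assumes the form $(n-1)(1-\rho^{2})^{n}(1-x)^{n-2}\,{}_{2}F_{1}(n,n;1;\rho^{2}x)$ on $0\leq x<1$, sums the hypergeometric series directly (valid because $\rho^{2}x<1$), and adds the closed-form quantile available when $\rho^{2}=0$. All function names are hypothetical.

```python
import math

def hyp2f1_series(a, b, c, z, tol=1e-14, max_terms=5000):
    """Gauss hypergeometric function by its power series; requires |z| < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def coherence_density(x, n, rho2):
    """Asymptotic density of the smoothed wavelet coherence at x in [0, 1)."""
    return ((n - 1) * (1 - rho2) ** n * (1 - x) ** (n - 2)
            * hyp2f1_series(n, n, 1.0, rho2 * x))

def null_quantile(prob, n):
    """Quantile under zero true coherence, where the density is (n-1)(1-x)^(n-2)."""
    return 1.0 - (1.0 - prob) ** (1.0 / (n - 1.0))

# Example: density at a few points for n = 4 and true coherence 0.5,
# and the 95% critical value for declaring non-zero coherence.
print([coherence_density(x, 4.0, 0.5) for x in (0.1, 0.5, 0.9)])
print(null_quantile(0.95, 4.0))
```

Under zero coherence the distribution function is $1-(1-x)^{n-1}$, which is why the critical value has a closed form and depends on the data only through the degrees of freedom $n$.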

In the case of the rectangular smoothing function given in (S6), the effective degrees of freedom $n$ scale linearly with $\kappa$ according to the following proposition.

Proposition 4.

Let $\psi(t)$ satisfy Assumption 4, let $h(t)$ be the rectangular smoothing function given in (S6), and for $\kappa>0$ let corresponding kernel $K(s,t)$ have ordered eigenvalues $\{\eta_{l};\ l=0,1,...\}$ . Provided $\kappa>\alpha$ , then $n=(\sum_{l=0}^{\infty}\eta_{l}^{2})^{-1}=\kappa\{\int^{\infty}_{-\infty}|\mathcal{P}(x)|^{2}{\rm d}x\}^{-1},$ where $\mathcal{P}(x)\equiv\int_{-\infty}^{\infty}\psi(t)\psi^{*}(t-x){\rm d}t$ .
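The expression in Proposition 4 can be evaluated by simple quadrature. The sketch below takes the expression exactly as stated, uses the untruncated Morlet wavelet as an assumed example, and approximates both integrals by Riemann sums; the function names, integration limits and step sizes are hypothetical choices.

```python
import cmath
import math

def morlet(t):
    """Morlet wavelet pi^(-1/4) exp(-t^2/2) exp(i 2 pi t) (assumed form)."""
    return math.pi ** -0.25 * math.exp(-0.5 * t * t) * cmath.exp(2j * math.pi * t)

def autocorrelation(psi, x, half_width=8.0, dt=0.05):
    """P(x): integral of psi(t) conj(psi(t - x)) dt, by a Riemann sum."""
    steps = int(round(2 * half_width / dt))
    total = 0j
    for k in range(steps + 1):
        t = -half_width + k * dt
        total += psi(t) * psi(t - x).conjugate()
    return total * dt

def dof_rectangular(psi, kappa, x_max=12.0, dx=0.1):
    """Return (n, energy) with energy the integral of |P(x)|^2 and n = kappa / energy."""
    steps = int(round(2 * x_max / dx))
    energy = sum(abs(autocorrelation(psi, -x_max + k * dx)) ** 2
                 for k in range(steps + 1)) * dx
    return kappa / energy, energy

n, energy = dof_rectangular(morlet, kappa=10.0)
print(n, energy)
```

For this wavelet the autocorrelation has the closed form $\mathcal{P}(x)={\rm e}^{{\rm i}2\pi x}{\rm e}^{-x^{2}/4}$, so the quadrature value of $\int|\mathcal{P}(x)|^{2}{\rm d}x$ can be compared against $\sqrt{2\pi}$.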

5 Test for stationarity

Consider testing the null hypothesis $H_{0}$ that states $N(t)$ is a stationary process, against the alternative hypothesis $H_{A}$ that states $H_{0}$ is not true. Under $H_{0}$ and from Proposition 3 it is true that $E\{\Omega(a,b)\}$ is constant in $b$ . We therefore consider testing for stationarity at different scales.

Consider a smoothing parameter $\tilde{\kappa}=\kappa T^{c}$ where $\kappa>0$ and $0<c<1/2$ . From Proposition 4 we have degrees of freedom $n$ in Theorem 2 being $O(T^{c})$ . With a slight reworking of the normalized framework of Section 4, we set $\psi^{\scriptscriptstyle{(T)}}(t)=\{(\alpha+\tilde{\kappa})/T\}^{-1/2}\psi\{t(\alpha+\tilde{\kappa})/T\}$ and appropriately normalize the scale and translation parameters as $\tilde{a}=a(\alpha+\tilde{\kappa})/T$ and $\tilde{b}=b/T$ . This again normalizes the valid region of analysis to $\tilde{\mathcal{T}}_{\alpha,\kappa}$ for all $T$ .

For convenience, we perform a dyadic partition of the time-scale space, performing a test at each scale in the set $\{\tilde{a}_{j}=2^{-j};j=1,...,J\}$ . At scale $\tilde{a}_{j}$ , we partition time into $2^{j}$ non-overlapping equal size segments, each centred at time points $\{\tilde{b}_{j,k}=(2k-1)/(2^{j+1});k=1,...,2^{j}\}$ and each the width of the approximate support of the wavelet at that scale.
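The dyadic sampling grid is easy to enumerate. The sketch below uses exact rational arithmetic so the centres can be inspected directly; the function name `dyadic_grid` is hypothetical.

```python
from fractions import Fraction

def dyadic_grid(J):
    """Map each level j = 1..J to (normalized scale, list of normalized centres)."""
    grid = {}
    for j in range(1, J + 1):
        scale = Fraction(1, 2 ** j)
        centres = [Fraction(2 * k - 1, 2 ** (j + 1)) for k in range(1, 2 ** j + 1)]
        grid[j] = (scale, centres)
    return grid

for j, (scale, centres) in dyadic_grid(3).items():
    print(j, scale, [str(b) for b in centres])
```

At every level the first and last centres sit exactly half a scale from the ends of the unit interval, so the segments tile the normalized time axis without overlap.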

Proposition 5.

Let $\psi(t)$ satisfy Assumption 4 and $h(t)$ satisfy Assumption 5. Then, for any $\kappa>0$ and $j>0$ , $\Omega^{\scriptscriptstyle{(T)}}(\tilde{a}_{j},\tilde{b}_{j,1}),...,\Omega^{\scriptscriptstyle{(T)}}(\tilde{a}_{j},\tilde{b}_{j,2^{j}})$ are asymptotically independent.

Our test at scale $\tilde{a}_{j}$ therefore becomes a test of the null hypothesis

$H_{j}:$ $E\{\Omega^{\scriptscriptstyle{(T)}}(\tilde{a}_{j},\tilde{b}_{j,1})\}=...=E\{\Omega^{\scriptscriptstyle{(T)}}(\tilde{a}_{j},\tilde{b}_{j,2^{j}})\}=\Omega_{j}$ ,

where $\Omega_{j}$ is unspecified. We construct a likelihood ratio test based on the asymptotic distribution of $\Omega^{\scriptscriptstyle{(T)}}(\tilde{a},\tilde{b})$ stated in Theorem 2.

Proposition 6.

Let $B_{1},...,B_{K}$ be independent samples where $B_{i}\sim(1/n)\mathcal{W}^{\mathcal{C}}_{p}(n,\Sigma_{i})$ ( $i=1,...,K$ ). The likelihood ratio test statistic for the null hypothesis $H:\Sigma_{1}=...=\Sigma_{K}=\Sigma$ , with unspecified $\Sigma$ , is

$$\tilde{\Lambda}=K^{pKn}\frac{\prod_{i=1}^{K}\det(B_{i})^{n}}{\det(\sum_{i=1}^{K}B_{i})^{Kn}}.$$

Furthermore, when $H$ is true, $-2\log(\tilde{\Lambda})$ is asymptotically $\chi^{2}_{f}$ where $f=(K-1)p^{2}$ .
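The statistic of Proposition 6 is best computed on the log scale. The sketch below assumes the form $\tilde{\Lambda}=K^{pKn}\prod_{i}\det(B_{i})^{n}/\det(\sum_{i}B_{i})^{Kn}$ and Hermitian positive definite inputs; the helper names and the example matrices are hypothetical.

```python
import math

def log_abs_det(mat):
    """log|det| of a square complex matrix by Gaussian elimination with pivoting."""
    m = [[complex(v) for v in row] for row in mat]
    p, logdet = len(m), 0.0
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) == 0.0:
            raise ValueError("singular matrix")
        m[c], m[piv] = m[piv], m[c]
        logdet += math.log(abs(m[c][c]))
        for r in range(c + 1, p):
            factor = m[r][c] / m[c][c]
            for k in range(c, p):
                m[r][k] -= factor * m[c][k]
    return logdet

def neg2_log_lrt(mats, n):
    """-2 log of the likelihood ratio statistic for equality of K centrality matrices."""
    K, p = len(mats), len(mats[0])
    total = [[sum(B[r][c] for B in mats) for c in range(p)] for r in range(p)]
    log_lambda = (p * K * n * math.log(K)
                  + n * sum(log_abs_det(B) for B in mats)
                  - K * n * log_abs_det(total))
    return -2.0 * log_lambda

def lrt_dof(K, p):
    """Degrees of freedom (K - 1) p^2 of the limiting chi-squared distribution."""
    return (K - 1) * p ** 2

# Hypothetical 2 x 2 Hermitian positive definite estimates at two time points.
B1 = [[2, 0.5 + 0.5j], [0.5 - 0.5j, 1]]
B2 = [[1, 0], [0, 3]]
print(neg2_log_lrt([B1, B2], n=4), lrt_dof(K=2, p=2))
```

When all the matrices coincide the statistic is exactly zero, and it grows as the matrices diverge, since the log-determinant is concave.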

In the following proposition, we let $\tilde{\Lambda}_{0}(\Sigma)$ be a random variable that is equal in distribution to $\tilde{\Lambda}$ under the null hypothesis.

Proposition 7.

Let $\tilde{\kappa}=\kappa T^{c}$ where $\kappa>0$ and $0<c<1/2$ , and define the test statistic for $H_{j}$ as

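$$\Lambda_{j}=K^{pKn}\frac{\prod_{i=1}^{K}\det\{\Omega^{\scriptscriptstyle{(T)}}(\tilde{a}_{j},\tilde{b}_{i})\}^{n}}{\det\{\sum_{i=1}^{K}\Omega^{\scriptscriptstyle{(T)}}(\tilde{a}_{j},\tilde{b}_{i})\}^{Kn}},$$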

where $K=2^{j}$ and $n$ is as given in Theorem 2. Under $H_{j}$ , $\Lambda_{j}\stackrel{{\scriptstyle\rm d}}{{=}}\tilde{\Lambda}_{0}(\Sigma)+o(1)$ , where $\Sigma=E\{\Omega(a,b)\}$ .

Theorem 3.

Let $\tilde{\kappa}=\kappa T^{c}$ where $\kappa>0$ and $0<c<1/2$ . Under $H_{j}$ , $-2\log(\Lambda_{j})$ is asymptotically $\chi^{2}_{\nu_{j}}$ where $\nu_{j}=(2^{j}-1)p^{2}$ . Specifically, ${\rm pr}\{-2\log(\Lambda_{j})\leq x\}={\rm pr}(\chi^{2}_{\nu_{j}}\leq x)+O(T^{-\beta})$ where $\beta=\min\{c,1/2-c\}$ .

Thus, the rate of convergence to $\chi^{2}_{\nu_{j}}$ is optimal when $c=1/4$ .

Let $\psi^{\scriptscriptstyle{(T)}}_{j,k}(t)$ denote the wavelet at the $j$ th scale and $k$ th translation ( $j=1,...,J$ ; $k=1,...,K$ ). Provided $\int_{-\infty}^{\infty}\psi_{j,k}^{\scriptscriptstyle{(T)}}(t)\psi_{l,m}^{*\scriptscriptstyle{(T)}}(t){\rm d}t=0$ for all $(j,k)\neq(l,m)$ (this is only an approximation for the Morlet wavelet), the likelihood ratio test statistics will be independent of each other. Combining them for $H_{1},...,H_{J}$ , namely $\Lambda\equiv\prod_{j=1}^{J}\Lambda_{j}$ , forms a test statistic for $H_{0}$ . It follows that $-2\log(\Lambda)$ is asymptotically $\chi^{2}_{\nu}$ , where $\nu=\sum_{j=1}^{J}\nu_{j}=p^{2}(2^{J+1}-2-J)$ .
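The combination step can be sketched as follows, under the stated orthogonality condition so that the per-scale statistics may be treated as independent. Since $\Lambda=\prod_{j}\Lambda_{j}$ , the combined statistic is the sum of the per-scale values of $-2\log(\Lambda_{j})$ , and the degrees of freedom add. The function name `combine_scales` is hypothetical.

```python
def combine_scales(neg2_log_lambdas, p):
    """Combine per-scale statistics -2 log(Lambda_j), j = 1..J, into the overall test.

    Returns (-2 log Lambda, nu) with nu = sum_j (2^j - 1) p^2.
    """
    J = len(neg2_log_lambdas)
    nu = sum((2 ** j - 1) * p ** 2 for j in range(1, J + 1))
    return sum(neg2_log_lambdas), nu
```

For instance, with $p=2$ and $J=3$ the degrees of freedom are $4(1+3+7)=44$ , in agreement with $p^{2}(2^{J+1}-2-J)$ .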

6 Real data example

To give an example of the methods in practice, we analyse signalling regions within the lateral geniculate nucleus of a mouse. Specifically, we consider a set of neurons examined in Tang et al. (2015), where the authors are primarily concerned with analysing firing properties in order to understand how visual signals are encoded and transferred throughout the brain. To demonstrate that our smoothed coherence estimator can operate on a single trial, we consider only a single firing sequence from that paper. In this case, the mouse is shown a visual stimulus in the form of a liquid crystal display screen showing a sinusoidal monochromatic drifting grating with spatial stimulus at a frequency of 0.04 cycles per second and temporal flicker of 1Hz. The firing pattern is 7 seconds in length and represents data for cells 108 and 117 (these cells were picked for the example as they demonstrate relatively high firing rates). We use the Morlet wavelet with temporal smoothing parameter $\kappa=10$ and approximating support $\alpha=4$ . For completeness, this example was performed using exact kernel sampling; however, an approximate computation based on the Nystrom method (see Appendix 1) provides visually indistinguishable results and p-values.

The analysis of the experimental data is provided in Fig. 3. Tests for stationarity were performed at scale levels $j=1,2,3$ , with dyadic sampling points as marked by the crosses. With a p-value of 0.032, there is strong evidence that the process demonstrates non-stationary behaviour at the coarsest scale ( $j=1$ , corresponding to a scale of 0.25s and a frequency of 4Hz through the Morlet wavelet relationship $f=1/a$ ). However, there is little evidence to support non-stationarity at the finer scales ( $j=2,3$ ). With the parameters specified, the 95th percentile of the distribution for zero coherence is 0.593 and is represented by the black contour line. This indicates that the non-stationarity at $j=1$ involves a change in the correlation between the two data streams halfway through the experiment, with significant coherent signalling becoming present in the latter half. It is worth noting that whilst we can also see some peaks in the wavelet coherence at higher frequencies (scale 0.025s), the number of data points within the support of the kernel is limited. Thus, at this level, we should be cautious about making inferences based on the asymptotic results.

Acknowledgement

This work is funded by EPSRC grant EP/P011535/1. The authors would like to thank Leigh Shlomovich, Department of Mathematics, Imperial College London for developing the Hawkes process simulation code, and Heather Battey, Dean Bodenham, and Andrew Walden, Department of Mathematics, Imperial College London for stimulating conversations.

Supplementary material

Supplementary Material Section 1 contains the proofs of the propositions and theorems presented here. Supplementary Material Section 2 provides the results for real-valued wavelets. Supplementary Material Section 3 provides verification of the results via simulation, as well as further supporting figures. It also contains a link to a MATLAB package for implementing the presented methods.

Appendix 1

Computing eigen-wavelets and eigenvalues

The Nystrom method (Kythe and Puri, 2001, Chapter 1) is an efficient method for computing the eigenfunctions of the kernel $K(s,t)$ for the multi-wavelet representation described in Section 3. We approximate the integral with a quadrature rule and solve the approximate eigen-problem $\sum_{j=1}^{n}w_{j}K(s,t_{j})\tilde{\varphi}_{l}(t_{j})=\tilde{\eta}_{l}\tilde{\varphi}_{l}(s)$ for a discrete set of values of $s$ . The quadrature points $\{t_{1},...,t_{n}\}$ ( $n$ large) are regularly spaced across $(-(\alpha+\kappa)/2,(\alpha+\kappa)/2)$ and the weights are set to $w_{j}=(\alpha+\kappa)/n$ . For simplicity, the Nystrom points $\{s_{1},...,s_{n}\}$ are set equal to $\{t_{1},...,t_{n}\}$ . In matrix form, the eigen-problem now becomes

$$KW\tilde{\varphi}=\tilde{\eta}\tilde{\varphi}\;,$$

where $K$ is the $\mathbb{R}^{n\times n}$ matrix $(K(s_{i},t_{j}))$ , $\tilde{\varphi}\equiv[\tilde{\varphi}(t_{1}),\ldots,\tilde{\varphi}(t_{n})]^{\rm T}$ , and $W\equiv\mathrm{diag}(w_{1},\ldots,w_{n})$ . Solving the above gives approximations to the first $n$ eigenvalues and eigen-wavelets of kernel $K(s,t)$ .

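Should it be required, the Nystrom extension of the sampled vector $\tilde{\varphi}_{l}=[\tilde{\varphi}_{l}(s_{1}),\ldots,\tilde{\varphi}_{l}(s_{n})]^{\rm T}$ , with associated eigenvalue $\tilde{\eta}_{l}$ , is the function

$$\tilde{\varphi}_{l}(s)=\frac{1}{\tilde{\eta}_{l}}\sum_{j=1}^{n}w_{j}K(s,t_{j})\tilde{\varphi}_{l}(t_{j})\;.$$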
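The procedure above can be sketched in a few lines. This is a minimal illustration and not the paper's MATLAB implementation: `stand_in_kernel` is a hypothetical symmetric kernel used only so the code runs (the kernel $K(s,t)$ of Section 3 would be substituted), the quadrature points are taken as interval midpoints, and the eigenvectors are rescaled so that the quadrature approximation to $\int|\tilde{\varphi}_{l}(t)|^{2}{\rm d}t$ equals one. All three are choices made here.

```python
import numpy as np

def stand_in_kernel(s, t):
    # Hypothetical symmetric kernel used only so the sketch runs;
    # substitute the kernel K(s, t) of Section 3.
    return np.exp(-0.5 * (s - t) ** 2)

def nystrom_eigen(kernel, alpha, kappa, n=200):
    """Approximate eigenvalues and sampled eigenfunctions of `kernel`."""
    width = alpha + kappa
    w = width / n                                  # equal quadrature weights w_j
    t = -width / 2.0 + w * (np.arange(n) + 0.5)    # regularly spaced midpoints (assumed rule)
    K = kernel(t[:, None], t[None, :])             # K[i, j] = K(s_i, t_j) with s_i = t_i
    # With equal weights, K W = w K, which is Hermitian whenever K is,
    # so a Hermitian eigen-solver can be used.
    eta, phi = np.linalg.eigh(w * K)
    order = np.argsort(eta)[::-1]                  # largest eigenvalue first
    eta, phi = eta[order], phi[:, order]
    phi = phi / np.sqrt(w)                         # so that sum_j w |phi(t_j)|^2 = 1
    return t, w, eta, phi

def nystrom_extend(kernel, t, w, eta_l, phi_l, s):
    """Evaluate the Nystrom extension of one sampled eigenfunction at points s."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return (w / eta_l) * (kernel(s[:, None], t[None, :]) @ phi_l)

t, w, eta, phi = nystrom_eigen(stand_in_kernel, alpha=4.0, kappa=10.0)
off_grid = nystrom_extend(stand_in_kernel, t, w, eta[0], phi[:, 0], [0.0, 0.123])
```

Evaluating the extension at the quadrature points themselves returns the sampled eigenvector, which gives a quick consistency check.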

The sum in (S5) is over an infinite set of (eigen-)wavelet periodograms. However, in practice, the size of the eigenvalues drops away rapidly, indicating that the kernel can be accurately reconstructed using only a small number of its eigen-wavelets; hence (S5) can be approximated with only a small number of terms. For example, in the case of $\kappa=10$ , the first nine eigenvalues contain 99.9% (3 s.f.) of the overall energy.
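One way to choose the truncation point is sketched below. It assumes that "energy" means the sum of the eigenvalue magnitudes, which the text does not state explicitly, and the function name `terms_for_energy` is hypothetical.

```python
def terms_for_energy(eigenvalues, fraction=0.999):
    """Smallest number of leading eigenvalues whose summed magnitude reaches `fraction` of the total.

    Assumption: energy is measured as the sum of eigenvalue magnitudes.
    """
    vals = sorted((abs(v) for v in eigenvalues), reverse=True)
    total = sum(vals)
    running = 0.0
    for count, v in enumerate(vals, start=1):
        running += v
        if running >= fraction * total:
            return count
    return len(vals)
```

Applied to the approximate eigenvalues from the Nystrom step, the returned count is the number of eigen-wavelet periodograms retained in the truncated sum.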

Correspondence

Correspondence should be addressed to

Edward Cohen

Department of Mathematics

Imperial College London

London SW7 2AZ.

Email: [email protected]

Bibliography

Bartlett, M. S. (1963). The spectral analysis of point processes. Journal of the Royal Statistical Society. Series B 25 (2), 264–296.
Brillinger, D. R. (1972). The spectral analysis of stationary interval functions. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Theory of Statistics, 483–513.
Brillinger, D. R. (1996). Some uses of cumulants in wavelet analysis. Journal of Nonparametric Statistics 6, 93–114.
Carter, G. (1987). Coherence and time delay estimation. Proceedings of the IEEE 75 (2), 236–255.
Cohen, E. A. K. and A. T. Walden (2010a). A statistical analysis of Morse wavelet coherence. IEEE Transactions on Signal Processing 58 (3), 980–989.
Cohen, E. A. K. and A. T. Walden (2010b). A statistical study of temporally smoothed wavelet coherence. IEEE Transactions on Signal Processing 58 (6), 2964–2973.
Goodman, N. (1963). Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction). Annals of Mathematical Statistics 34 (1), 152–177.
Grinsted, A., J. C. Moore, and S. Jevrejeva (2004). Application of the cross wavelet transform and wavelet coherence to geophysical time series. Nonlinear Processes in Geophysics 11 (5), 561–566.
