Bayesian spectral density estimation using P-splines with quantile-based knot placement
Patricio Maturana-Russel, Renate Meyer

TL;DR
This paper introduces a Bayesian spectral density estimation method using P-splines with a novel, data-driven knot placement strategy, improving computational efficiency while maintaining accuracy in capturing spectral features.
Contribution
It proposes a new Bayesian spectral density estimation approach with a data-driven knot placement scheme using P-splines, reducing computational costs compared to existing methods.
Findings
Accurately estimates spectral peaks with the new knot placement.
Reduces computational complexity significantly.
Maintains flexibility in modeling sharp spectral features.
Abstract
This article proposes a Bayesian approach to estimating the spectral density of a stationary time series using a prior based on a mixture of P-spline distributions. Our proposal is motivated by the B-spline Dirichlet process prior of Edwards et al. (2019) in combination with Whittle's likelihood and aims at reducing the high computational complexity of its posterior computations. The strength of the B-spline Dirichlet process prior over the Bernstein-Dirichlet process prior of Choudhuri et al. (2004) lies in its ability to estimate spectral densities with sharp peaks and abrupt changes due to the flexibility of B-splines with variable number and location of knots. Here, we suggest to use P-splines of Eilers and Marx (1996) that combine a B-spline basis with a discrete penalty on the basis coefficients. In addition to equidistant knots, a novel strategy for a more expedient placement of…
| AR(1) | |||
|---|---|---|---|
| B-spline | 0.870 | 0.714 | 0.594 |
| Equidistant knots | |||
| P-spline | 0.769 | 0.655 | 0.527 |
| P-spline | 0.698 | 0.609 | 0.629 |
| Q-spaced knots | |||
| P-spline | 0.848 | 0.771 | 0.614 |
| P-spline | 0.920 | 0.840 | 0.648 |
| AR(4) | |||
| B-spline | 2.990 | 2.202 | 1.800 |
| Equidistant knots | |||
| P-spline | 2.905 | 2.389 | 2.306 |
| P-spline | 3.149 | 2.566 | 2.387 |
| Q-spaced knots | |||
| P-spline | 2.517 | 1.843 | 1.486 |
| P-spline | 2.750 | 1.989 | 1.673 |
| AR(1) | |||
|---|---|---|---|
| B-spline | 1.000 | 1.000 | 0.997 |
| Equidistant | |||
| P-spline | 1.000 | 1.000 | 0.997 |
| P-spline | 0.993 | 1.000 | 0.993 |
| Q-spaced knots | |||
| P-spline | 1.000 | 1.000 | 0.997 |
| P-spline | 0.933 | 0.527 | 0.713 |
| AR(4) | |||
| B-spline | 0.907 | 0.970 | 0.897 |
| Equidistant knots | |||
| P-spline | 0.547 | 0.343 | 0.010 |
| P-spline | 0.320 | 0.170 | 0.007 |
| Q-spaced knots | |||
| P-spline | 0.800 | 0.833 | 0.803 |
| P-spline | 0.313 | 0.363 | 0.277 |
| AR(1) | |||
|---|---|---|---|
| B-spline | 1.000 | 1.000 | 1.000 |
| Equidistant & | |||
| Q-spaced knots | |||
| P-spline | 1.000 | 1.000 | 1.000 |
| AR(4) | |||
| B-spline | 1.000 | 1.000 | 1.000 |
| Equidistant knots | |||
| P-spline | 1.000 | 0.992 | 0.984 |
| P-spline | 0.984 | 0.984 | 0.980 |
| Q-spaced knots | |||
| P-spline | 1.000 | 1.000 | 1.000 |
| P-spline | 0.984 | 0.992 | 0.992 |
| AR(1) | |||
|---|---|---|---|
| B-spline | 24.323 | 28.258 | 34.093 |
| Q-spaced knots | |||
| P-spline | 4.859 | 7.450 | 9.606 |
| P-spline | 4.538 | 6.863 | 8.783 |
| B-spline/P-spline | 6.090 | 4.368 | 4.493 |
| AR(4) | |||
| B-spline | 24.983 | 29.796 | 36.913 |
| Q-spaced knots | |||
| P-spline | 4.883 | 7.453 | 9.665 |
| P-spline | 4.543 | 6.887 | 8.827 |
| B-spline/P-spline | 6.062 | 4.591 | 5.523 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
11institutetext: Patricio Maturana-Russel 22institutetext: Department of Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand
22email: [email protected] 33institutetext: Renate Meyer 44institutetext: Department of Statistics, University of Auckland, Auckland, New Zealand
Bayesian spectral density estimation using P-splines with quantile-based knot placement
Patricio Maturana-Russel
Renate Meyer
Abstract
This article proposes a Bayesian approach to estimating the spectral density of a stationary time series using a prior based on a mixture of P-spline distributions. Our proposal is motivated by the B-spline Dirichlet process prior of Edwards et al. (2019) in combination with Whittle’s likelihood and aims at reducing the high computational complexity of its posterior computations. The strength of the B-spline Dirichlet process prior over the Bernstein-Dirichlet process prior of Choudhuri et al. (2004) lies in its ability to estimate spectral densities with sharp peaks and abrupt changes due to the flexibility of B-splines with variable number and location of knots. Here, we suggest to use P-splines of Eilers and Marx (1996) that combine a B-spline basis with a discrete penalty on the basis coefficients. In addition to equidistant knots, a novel strategy for a more expedient placement of knots is proposed that makes use of the information provided by the periodogram about the steepness of the spectral power distribution. We demonstrate in a simulation study and two real case studies that this approach retains the flexibility of the B-splines, achieves similar ability to accurately estimate peaks due to the new data-driven knot allocation scheme but significantly reduces the computational costs.
Keywords:
P-splines B-splines Bernstein-Dirichlet process prior spectral density estimation Whittle likelihood
1 Introduction
The power spectral density (psd), or simply spectral density, describes the distribution of the power or variance over the individual frequency components of a time series. It embodies useful information for the study of stationary time series. For a zero-mean weakly stationary time series, an absolutely summable autocovariance function, i.e. , guarantees the existence of a continuous and bounded spectral density function given by
[TABLE]
where is the angular frequency.
Methods to estimate this function range from parametric to nonparametric approaches. The former are mainly based on fitting autoregressive moving average (ARMA) models. However, the corresponding spectral density will be biased if the ARMA model does not adequately describe the dependence structure of the time series. This is a downside of all parametric approaches. They are efficient if the model fits well but because they make restrictive assumptions about the data-generating mechanism, their inference is sensitive to model misspecifications and may be severely biased. Bayesian nonparametric approaches, on the other hand, put a prior distribution on the set of all possible densities that could have generated the data not just a small parametric subset thereof. They thus avoid the problem of choosing between different parametric models by an intrinsic data-driven determination of the model complexity.
As the periodogram fluctuates around the true spectral density, most nonparametric methods for spectral density estimation hinge on smoothing the periodogram
[TABLE]
in one way or another, where for are the Fourier frequencies and \mbox{\boldmathY}=(Y_{1},\ldots,Y_{n}) is a stationary time series of length . The periodogram ordinates evaluated at the Fourier frequencies are asymptotically independent and follow exponential distributions with mean . Thus, they are asymptotically unbiased estimates of but not consistent (Brockwell and Davis, 1991).
Most Bayesian approaches to psd estimation use the so-called Whittle likelihood, introduced by Whittle (1957) as an approximation to the likelihood of a Gaussian stationary time series. Based on the limiting independent exponential distribution of the periodogram values, the Whittle likelihood for a mean centred weakly stationary time series of length is defined as
[TABLE]
Advantages of this approximation are that it depends directly on the spectral density, unlike the true Gaussian likelihood function, and gives a good approximation for non-Gaussian time series under certain conditions as shown by Shao and Wu (2007).
Bayesian nonparametric prior specifications for the psd include Carter and Kohn (1997) who used a smoothing prior on the log psd. Rosen et al. (2012) segmented a nonstationary time series into stationary segments. For each stationary segment, they decomposed the log psd into a linear and nonlinear component and put a linear smoothig spline prior on the nonlinear part. Pensky et al. (2007) propose Bayesian wavelet-smoothing of the log psd and recently, Cadonna et al. (2017) modelled the log psd as a mixture of Gaussian distributions with frequency-dependent weights and mean functions. Gangopadhyay et al. (1999) modelled the log-spectral density function by piecewise polynomials of low order between strategically placed knots on the support of the psd, allowing the data to select the number and location of the knots. A reversible jump Markov chain Monte Carlo (RJMCMC; Green, 1995) algorithm was implemented to sample from the posterior distribution. Similarly, frequentist approaches such as Rodríguez-Álvarez et al. and Wood and Fasiolo (2017) model the log spectral density by a linear combination of splines with variable knots and combine these with adaptive smoothing.
Penalized splines have been used for a time-domain analysis of time series data e.g. to model the mean as a smooth function of time by Krivobokova et al. (2006) or for fitting and forecasting univariate nonlinear time series by Wegener and Kauermann (2017), but not in the frequency domain for estimating the spectral density. The P-spline prior on the spectral density function that we are going to propose is related to the Bernstein-Dirichlet process (BDP) prior introduced by Choudhuri et al. (2004) and combined with the Whittle likelihood. The BDP prior is a mixture of beta densities with weights induced by a Dirichlet process. The number of components, which is a smoothing parameter, is given a discrete prior. Edwards et al. (2019) extended the method by replacing the beta densities with B-spline densities. Similar to Gangopadhyay et al. (1999), the specification of the number and location of the B-spline densities is data-driven, however it avoids a RJMCMC algorithm by putting a second Dirichlet process prior on the cdf that induces the inter-knot distances. Choudhuri et al. (2004) already demonstrated in a comprehensive simulation study that the Bayesian psd estimates based on the BDP prior outperforms in terms of the -error the smoothed periodogram estimates based on Bartlett-Priestley quadratic kernel, the penalized MLE of Pawitan and O’sullivan (1994) and autoregressive spectral estimates in most scenarios but noted that the BDP-based estimates detect peaks correctly but often underestimate the magnitude of sharp peaks. Unlike the beta densities which have global support on the interval [0,1], B-spline densities have local support, thereby increasing their flexibility and allowing better modelling of spectral densities with sharp peaks and abrupt changes. As demonstrated in Edwards et al. (2019), the B-spline–Dirichlet process (BspDP) prior outperforms the BDP prior in estimating complex psds. However, this flexibility comes at a high computational price.
With the aim of reducing the computational cost of the BspDP prior approach, we suggest to use the idea of P-splines of Eilers and Marx (1996) that combines a large but fixed number of B-splines with a simple difference penalty that controls the degree of smoothness of the spectral density. P-splines, i.e. equally-spaced B-splines, have been successfully implemented for frequentist nonparametric regression but to the best of our knowledge not yet for Bayesian spectral density estimation. We suggest two novel Bayesian approaches in this paper, both based on the Whittle likelihood and on a mixture of B-spline densities. Both approaches fix the number of B-spline densities. The first approach uses equally spaced knots with a P-spline prior which we define in Section III. We demonstrate that this approach yields significant savings in computational time without sacrificing accuracy over the BspDP prior approach for simple spectral structures. The second approach has a particular focus on spectral densities with sharp peaks. Instead of an equidistant spacing, the knots are spread based on the distribution of peaks in the periodogram. Thus the knot allocation is also data-driven as with the BspDP prior but the knots remain fixed once allocated. This novel knot allocation scheme avoids the computational complexity of the BspDP prior. We show that with this knot allocation scheme, the estimates of spectral densities with spikes and abrupt changes improves significantly without an increase in computing time.
The paper is organized as follows: In order to set notation and allow for a direct comparison, we start by reviewing the B-spline densities and the BspDP prior in Section II. Section III specifies the P-spline prior for the psd, the novel knot allocations scheme and the Gibbs sampler used to sample from the posterior distribution based on the P-spline prior and Whittle likelihood. In Section IV, we test our P-spline-based approaches to spectral density estimation and compare results to those based on the BspDP prior in a simulation study and the application to two real time series. The paper concludes with a discussion of the relative merits of the new approaches and avenues for future research.
2 Notation and review of the B-spline–Dirichlet process prior
By an application of the Weierstrass theorem it can be seen that a mixture of beta densities with only integer parameters can uniformly approximate any continuous density on the interval (Choudhuri et al., 2004). Let be a cumulative distribution function (cdf), with continuous density , then
[TABLE]
converges uniformly to , where , is the beta density with parameters and , and are the weights of the mixture. The Bernstein polynomial prior, which is used to describe a nonparametric prior for probability densities in the unit interval, is based on this approximation (Petrone, 1999a, b). Choudhuri et al. (2004) proposed it as a prior on the spectral density function.
As shown by Perron and Mengersen (2001) for distributions on the interval [0,1], both mixtures of B-spline and beta distributions can approximate these arbitrarily well by increasing the number of mixture components, i.e. increasing the polynomial order for the beta distributions and the number of knots for suitable knot locations for the B-splines. But the rate of approximation is faster for B-splines than for beta distributions. This work motivated Edwards et al. (2019) to propose the BspDP prior on the spectral density function which is based on the representation (2) but with the beta densities replaced by B-spline densities as defined in the following.
B-splines and BspDP Prior
A spline of order is a function defined piecewise by polynomials of degree , which meet at points called knots where the function is continuous. Any of these spline functions can be described by basis functions, known as B-splines, in other words, all spline functions can be represented as a unique combination of B-splines with the same order and over the same partition. Without loss of generality, the global domain of interest is assumed to be the unit interval .
The B-spline basic functions of any order can be recursively defined as
[TABLE]
where
[TABLE]
and
[TABLE]
is the knot vector. This vector is a non-decreasing sequence that contains knots, which can be divided in external and {\color[rgb]{0,0,0}K^{*}}=K-r+1 internal knots. The latter must be .
The B-spline densities, i.e. normalized B-spline functions, are defined by
[TABLE]
The BspDP prior proposed by Edwards et al. (2019) involves two Dirichlet processes, one placed on inducing the weights of the B-spline densities and the other on the knot spacings. In practice, these are implemented using the stick-breaking representation (Sethuraman, 1994), for details see Edwards et al. (2019). For computational reasons, this infinite series representation of and is truncated at a large but finite positive integer, and , respectively. These truncations control the quality of the approximations. Longer series improve the approximation, but increase the number of calculations and consequently the computation time.
As shown by Edwards et al. (2019), the approach based on the BspDP prior outperforms the BDP prior in estimating spiky spectral densities. In the case of smooth spectral densities, there is no a significant difference in the performance of these two approaches. However, in practice, the true psd function is unknown, which makes it necessary to use methods such as the BspDP that will work well in all conceivable cases.
To implement the BDP prior, one only has to calculate the beta densities once and they can be stored and re-utilized across MCMC steps. On the other hand, the BspDP prior requires the calculation of the B-spline densities at each iteration, since these vary due to varying knot locations. Moreover, calculations are needed in the truncated stick-breaking representation of the two Dirichlet processes at each iteration resulting in a significantly increased complexity of the BspDP algorithm.
This has prompted us to explore the performance of two new algorithms, both based on the P-spline prior algorithm. Both use a fixed number of B-spline densities and knot locations, avoiding the Dirichlet processes and the recalculation of the B-spline densities. Whereas the first approach uses equidistant knots, the second algorithm proposes a new fast knot allocation strategy. Both methods preserve the flexibility of the B-spline densities and their potential ability to estimate spectral densities with sharp peaks and abrupt changes.
3 P-spline prior
As the spectral density function is defined on the interval , it is reparametrized and defined as
[TABLE]
where is the normalizing constant and
[TABLE]
with fixed number of B-spline densities of fixed degree , weight vector \mbox{\boldmathw}=(w_{1},\cdots,w_{K}), and knot sequence .
contains fixed equidistant internal knots on in our basic P-spline prior. A more judicious scheme for the placement of knots that improves the fit for peaked spectral densities is described in section “Quantile-based knot placement”.
Smoothing splines in the frequentist context are based on approximating a given function by a linear combination of B-splines and putting a penalty on the integrated squared second derivative as a measure of roughness. P-splines avoid derivatives by expressing the penalty as the sum of squares of differences of the B-spline coefficients. In a Bayesian context, this penalty can be transformed into a prior distribution for the order difference of successive coefficients, see e.g. Lang and Brezger (2004).
In our approach, the B-spline coefficients are weights, so positive and sum to one. We therefore reparametrize to the vector with
[TABLE]
After some calculations, it can be shown that the weights are given by
[TABLE]
The last weight is defined as , thus they all sum to 1. Then, an indirect prior is placed on the weights via
[TABLE]
where \mbox{\boldmathv}=(v_{1},\dots,v_{K-1})^{\top} is a dimensional parameter vector, is the smoothing or penalty parameter, is the penalty matrix (which is a full matrix rank matrix for any small quantity , for instance, ) with D the order difference matrix (, and are shape parameters, and are rate parameters, and denotes a gamma distribution with mean and variance . We used first and second order difference penalties in the simulation study. The order difference matrix D is defined as
[TABLE]
and the order difference matrix as
[TABLE]
The rationale for using discrete second order roughness penalties as an approximation to the usual continuous roughness measure based on the integral over the second derivatives is given in Eilers and Marx (1996).
The parameter vector for the P-spline prior is \mbox{\boldmath\theta}=(\mbox{\boldmathv}^{\top},\phi,\delta,\tau)^{\top}. We use a robust specification for the prior distribution of the penalty parameters as suggested by Jullion and Lambert (2007) by choosing small values for and , for instance, . The choice of and do not affect the spectral density estimate. Here we set and equal to 1 as used by Bremhorst and Lambert (2016). The scale parameter is given by an inverse gamma prior IG(), which can be considered as a noninformative prior for (Edwards et al., 2019). These are also the prior specifications used in the simulations and examples in Section 4.
Posterior Computation
The joint posterior distribution of the parameter vector \mbox{\boldmath\theta}=(\mbox{\boldmathv}^{\top},\phi,\delta,\tau)^{\top} is given by
[TABLE]
where L(\mbox{\boldmathY}|f) denotes the Whittle likelihood defined in (1). This posterior is proper since all the prior distributions are proper. We use the Gibbs sampler to sample from the joint posterior distribution.
Sampling from the full conditional posterior distributions of , , and can be performed directly. These are given by
[TABLE]
However, to sample from the full conditional distribution of , we use the Metropolis algorithm. Since the weights in general are larger in those areas of the frequency domain where the psd has peaks, we specify their starting value proportionally to the periodogram. Applying the corresponding transformation we obtain the starting value for the vector . This strategy speeds up the MCMC process by shortening the burn-in period.
In order to improve the mixing of the chains, we apply the Metropolis algorithm on a reparametrized posterior distribution (Lambert, 2007). For this, we need a pilot posterior sample for from which we calculate its mean vector and covariance matrix S. Then we define the following re-parametrization
[TABLE]
where \mbox{\boldmath\beta}=(\beta_{1},\dots,\beta_{K-1})^{\top}.
We can update by modifying according to a proposal distribution. In this work, we propose a univariate proposal value to update given by
[TABLE]
where \mbox{\boldmath\beta}^{*}=(\beta_{1},\dots,\beta_{k}^{*},\dots,\beta_{K-1})^{\top}, , and controls the length of the proposals. We vary across iterations in order to get an acceptance rate between 0.3 and 0.5. Note that even though the proposal is univariate on , it is multivariate on \mbox{\boldmathv}^{*}. Alternatively to the Metropolis step used for , a data augmentation step based on Pólya-Gamma auxiliary variables can be used which would allow for a full Gibbs sampler (Polson et al., 2013).
The cycle of the Gibbs sampler is given by
- •
Draw \mbox{\boldmathv}^{m} from p(\mbox{\boldmathv}|\mbox{\boldmath\beta}^{m-1},\phi^{m-1},\delta^{m-1},\tau^{m-1}) using univariate Metropolis steps for , with , according to the reparametrization described above;
- •
Draw from
[TABLE]
- •
Draw from
- •
Draw from
[TABLE]
The influence of the penalty is determined by . The larger this value is, the smoother is the resulting estimate. It is interesting to note the role of the penalty matrix P in the rate parameter on the full conditional posterior distribution of . When the quadratic form \mbox{\boldmathv}^{\top}\textbf{P}\mbox{\boldmathv} tends to infinity, this distribution becomes concentrated at . This limiting behaviour yields rougher results. Note that in this case the prior rate becomes irrelevant in the full conditional posterior density. On the other hand, when \mbox{\boldmathv}^{\top}\textbf{P}\mbox{\boldmathv} tends to , the distribution favours large values, yielding smoother estimates. The magnitude of this quadratic form is controlled by the penalty order and the number of B-spline densities. In general, large order penalties produce lower values for this quantity, causing smoother results. In this way the penalty matrix penalizes the inclusion of B-spline densities.
Ruppert (2002) explained that the degree of smoothness implied by a certain choice of prior parameters also depends on the scale of the responses . To avoid the problem, the author suggested to standardize before the sampling process and apply the inverse operation afterwards. This also allows to avoid numerical problems in the MCMC process. We follow his proposal.
Choosing the number of B-splines
The number of B-spline densities plays a critical role in the model fit and may result in under- or oversmoothing. Even though the penalty parameter controls the smoothness of the fit, a large number of B-splines yields rougher and a low number smoother spectral density estimates. There is a smallest sufficient number which results in a psd estimate that fits the features of the data whereas choosing a larger number will have an insignificant effect on the fit. Unexpectedly, Ruppert (2002) found some cases in which too many B-splines degrades the fitting in terms of mean square error. Unfortunately, there is no consensus in how to find the optimal value of the number of B-splines.
Eilers et al. (2015) consider the use of B-splines a wise choice, unless computational constraints are evident. Some algorithms that explore different number of knots in a trial sequence and find an optimal value with respect to a certain goodness-of-fit criterion have been proposed. Ruppert (2002) suggested a full-search and myopic algorithms which are based on the generalized cross-validation statistic. However, their results are not conclusive and this criterion only applies if the penalty parameter is treated as a tuning constant (Kauermann and Opsomer, 2011). Likhachev (2017) also considered a knot sequence and proposed the selection of the number of knots based on Akaike’s, corrected Akaike’s and the Bayesian Information Criterion.
Ruppert (2002) discussed a heuristic rule of thumb , which is simpler and, in our experience, seems to work very well in general. This criterion allocates a knot between observations, where stands for the floor function. We employ this approach in the application Section for selecting the number of B-spline densities.
Quantile-based knot placement
The knots are often equally spaced in P-spline methods. This scheme works quite well for smooth psds without any abrupt changes. However, for spectral densities with spikes, the algorithm will require a large number of knots to adequately estimate the peaks. The B-spline prior method is able to handle abrupt changes because the knot location is variable and driven by the data. We propose a new selective method that allocates the knots according to quantiles of a periodogram-related distribution function but once allocated, the knots stay fixed. The idea is to concentrate more effort in those regions that have potential peaks as detected by the periodogram. It is similar in spirit to the knot placement of O-splines (Wand and Ormerod, 2008) in the regression context that make use of the empirical quantiles of the covariates.
Our knot location proposal works as follow. We calculate the periodogram and apply a square root transformation in order to have more regular magnitudes and eliminate a potential trend. Second, we standardize these values, apply the absolute value function, and normalise them. We treat this transformed periodogram as a probability mass function (pmf) and interpolate its distribution function by the continuous cumulative distribution function . Finally, the knots are allocated according to the quantiles of . Thus, in those areas in which the periodogram has sharp and abrupt changes, our procedure will assign more knots in proportion to their magnitudes. Hereafter, we will refer to this scheme as quantile-spaced (Q-spaced) knots.
The procedure can be summarized as follows:
Let \mbox{\boldmathx}=\left(\sqrt{I_{n}(\lambda_{1})},\dots,\sqrt{I_{n}(\lambda_{\nu})}\right) be the square root of the periodogram values. 2. 2.
Define , for , where denotes the average and the standard deviation of . 3. 3.
Define , , and let denote the continuous cdf that interpolates the . 4. 4.
Let \mbox{\boldmathq}=\left(q_{1},\dots,q_{K^{*}}\right) be a vector of equally spaced points in , that is , for . 5. 5.
The Q-spaced internal knot vector is given by the corresponding quantiles \mbox{\boldmath\xi}^{*}=\left(F^{-1}(q_{1}),\dots,F^{-1}(q_{K^{*}})\right).
Note that this knot allocation algorithm is applied only once at the beginning of the MCMC process. Therefore, its computational cost is negligible.
For non-equidistant knots, it does not make sense any longer to base the definition of the covariance matrix of the Gaussian prior on the difference matrices as defined in (4) and (5) because this is an adequate approximation of the derivative-based roughness measures – defined as the integral of the first and second order derivatives of the P-splines – for equally spaced knots only. The definition of the difference matrices could be adjusted using divided differences, but here we use the derivative-based penalties as described by Wood (2017). We make use of the implementation of the derivative-based penalty matrix by the function bsplinepen in the R-package fda (Ramsay et al., 2020). This penalty matrix contains the inner products of the respective first and second order derivatives of the basis functions. In addition, we normalise this matrix by dividing it by the maximum absolute column sum.
The quantile based knot allocation can be regarded as an empirical Bayes prior since the data are used to specify the prior distribution. However, only the locations of the B-splines are determined by the data, not the parameters of the prior distribution which in this case are the weights associated with each B-spline . The main criticism of an empirical Bayes approach is that the data are being used twice, once to specify the prior and then again to update the prior to the posterior. Here, the data are used only once to specify the knot locations but not again to update these through the likelihood.
4 Application
We first assess the properties of our P-spline spectral density estimates in a simulation study and compare them to estimates based on the BspDP prior. We furthermore apply all approaches to the analysis of the classical sunspot dataset and the S. Carinae variable star light intensities, previously analysed by Carter and Kohn (1997); Huerta and West (1999); Kirch et al. (2018). As the psd of the S. Carinae time series contains sharp peaks, its analysis allows to assess the impact of the periodogram-spaced knots in contrast to the equidistant knots. For all the analyses, we use cubic B-splines ().
4.1 Simulation study
The setup of our simulation study mirrors that in Edwards et al. (2019) who compared spectral density estimates based on the BspDP and the BDP priors. We generated 300 autoregressive time series of order 1 and 4 with unit variance Gaussian innovations of length . For the AR(1) model, the first order autocorrelation was chosen, which has a relatively simple spectral density (see Figure 1). The AR(4) time series with , , and have a spectral density with two large peaks (see Figure 2).
We estimated the spectral density functions using the BspDP and P-spline priors. The BspDP-based analysis was performed via the gibbs_bspline function in the R package bsplinePsd 0.6.0 (Edwards et al., 2018). The algorithm was executed for 100,000 iterations with a burn-in period of 25,000 and thinning factor of 10, resulting in 7,500 samples used for posterior inference. Pilot analyses showed that these specifications are suitable in these examples in terms of convergence (results not shown).
The P-spline analysis was performed using the gibbs_pspline function implemented in the R package psplinePsd (Maturana-Russel and Meyer, 2020). A total of 100,000 MCMC samples were generated using a pilot run of 20,000 iterations with a burn-in period of 5,000 and thinning factor of 10 to calibrate the proposals. The final sample consisted of 80,000 samples with a burn-in period of 5,000 and thinning factor of 10, resulting in 7,500 samples used for posterior inferences. This procedure was replicated for both equidistant and Q-spaced knots considering first and second order penalties.
The theoretical spectral density for an AR() model is given by
[TABLE]
where is the variance of the white noise innovations and () are the model parameters. In order to compare the psd estimates to the true function, we use the integrated absolute error (IAE) or -error which is defined as
[TABLE]
where is the pointwise posterior median of the spectral density . The IAE is computed for each of the 300 replications and its median is calculated. The results are displayed in Table 1.
For the AR(1) time series, the P-spline prior with equidistant knots and penalty order yields better psd estimates than the BspDP prior. A larger penalty order does not necessarily result in more accurate psd estimates. In this case of a relatively smooth spectral density, the Q-spaced knots do not yield an improvement over the equidistant knots and the BspDP prior.
For the AR(4) time series, the fit based on the P-spline prior with Q-spaced knots is better than the fit based on the B-spline Dirichlet process prior and the one based on the P-spline prior with equidistant knots. In general, the BspDP algorithm achieves higher accuracy than the P-spline method with equidistant knots. As shown by Edwards et al. (2019), the posterior distribution based on the BspDP prior yields better results than posterior based on the BDP in terms of median IAE in this example. Consequently, our P-spline algorithm with Q-spaced knots outperforms the BDP method as well.
The IAE measures the accuracy of the point estimates of the spectral density. In the following, we will also compare the coverage of posterior uniform credible bands. We define the uniform credible band as
[TABLE]
where is the pointwise posterior median spectral density, is the median absolute deviation of the posterior samples and is such that
[TABLE]
The idea is to measure the proportion of times that the true spectral density is fully contained by the uniform credible band in the 300 replications. The results are displayed in Table 2. In addition, we calculate the proportion of times that the uniform credible band contains the value of the true psd at each of the Fourier frequencies. The median of these pointwise coverage proportions are displayed in Table 3.
For the AR(1) model, the BspDP and first order penalty P-spline 90% credible bands cover the whole true spectral density function, i.e. for all frequencies , in almost all the 300 independent analyses. In general, the latter method shows diminishing performance as the penalty degree increases. For the AR(4) model, the P-spline method with equidistant knots performs poorly. On the other hand, the quantile-based knot scheme yields evidently much better results, which are comparable (in the case of the first order penalty) to those obtained via the BspDP algorithm. For the P-spline priors, the coverage probabilities decrease as the penalty degree order increases, which can be explained by the forced smoothness imposed by the higher penalty order. As shown by Edwards et al. (2019), the BspDP algorithm produces better results than the BDP prior in the AR(4) case, which coverage probability is zero across . Consequently, our P-spline algorithm with Q-spaced knots outperforms the BDP method in this case.
Even though the P-spline method with equidistant knots for the AR(4) case performs poorly in terms of uniform coverage, i.e. the proportion of times that the credible band covers the true psd completely in the 300 runs, it is important to note that this is only because the uniform band does not cover the psd for just very few frequencies. As is evident from Table 3, only for a small fraction of the frequencies the psd is not covered by the uniform credible band. In this context, BspDP and Q-spaced knot P-spline algorithms perform equally well.
As Edwards et al. (2019) noted, one of the drawbacks of the BspDP prior is its computational complexity relative to the BDP. This is directly reflected in the run-time, which is approximately 2-3 times higher in this example (see Table 3 in Edwards et al. (2019)). The median run-time in our analyses for the BspDP prior and the P-spline with Q-spaced knots are displayed in Table 4. Note that the P-spline method based on equidistant knots has similar run-times to the Q-spaced knots. The P-spline method is approximately 4-6 times faster than BspDP approach in these examples. This is due to a reduced number of calculations per iteration because the B-spline densities are calculated only once at the start of the algorithm and there are no Dirichlet process approximations via the stick-breaking method.
In general terms, the P-spline algorithm with our novel quantile-based knot allocation scheme offers the best trade-off between accuracy and computation cost. The rule-of-thumb criterion, , for the number of B-spline densities in the P-spline prior algorithm worked perfectly well in this example.
4.2 Sunspot data analysis
Sunspots are regions of reduced surface temperature that are visible as darker spots on the sun’s photosphere. We analyse the average annual mean sunspot numbers for the year 1700-1987 consisting of 288 observations. This is a classic dataset used to assess spectral density estimation methods. A square root transformation is applied in order to make the observations more symmetrical and stationary.
As in Edwards et al. (2019), the MCMC algorithm based on the BspDP prior is run for 100,000 iterations, with a burn-in period of 50,000 and thinning factor of 10. Thus, 5,000 samples are used for the estimation. This took about 22 minutes. The other specifications are the same as those used in the simulation study.
The MCMC algorithm based on the P-spline prior is run in two stages. In the first stage, the proposal is calibrated for the final run. It consists of 25,000 iterations with a burn-in period of 10,000 and thinning factor of 10, resulting in 1,500 samples. Then the algorithm is run for 75,000 iterations with a burn-in period of 25,000 and thinning factor of 10, resulting in 5,000 samples. The analysis is performed using the first order penalty for equidistant and Q-spaced knots. All other specifications are the same as those used in the simulation study. A single analysis takes about 4.5 minutes.
The results for the BspDP and P-spline prior with first order penalty and Q-spaced knots are displayed in Figure 3 (p-spline with equidistant knot approach results are not shown due to their similarity to the Q-spaced knot estimates). Both methods reveal a large peak at about 0.09. In other words, there is a periodic solar cycle of length years. These results are consistent with those obtained via the BDP prior (Choudhuri et al., 2004). However, the BspDP estimation takes approximately 22 minutes compared to only 4.5 minutes for the the P-spline estimation. Thus the P-spline estimation is approximately five times more efficient.
4.3 Variable star S. Carinae data analysis
This dataset contains 1,189 visual observations of the S. Carinae, a variable star in the southern hemisphere sky. These are daily observations collected by the Royal Astronomical Society of New Zealand, which correspond to 10 day averages of light intensities over several years. This data set has been analysed in the literature previously (see for instance Carter and Kohn (1997); Huerta and West (1999); Kirch et al. (2018)). It contains 40 missing observations that in this work have been replaced by the mean, thus not affecting the general features of the data. In addition, the data has been squared root transformed and mean centred. The first 150 observations are displayed in Figure 4, where the missing values are replaced by the mean.
Kirch et al. (2018) provided a psd estimate via a nonparametrically corrected (NPC) approach, which is able to detect several peaks in the function. These features make this data set suitable to assess the impact of the knot location on the P-spline estimates.
Again, the P-spline algorithm is run in two stages. First, a Markov chain of 5,000 samples with a burn-in period of 1,000 and thinning factor of 10, resulting in 400 points, is used to calibrate the proposals of the final run. Then, a Markov chain of 10,000 samples with a burn-in period of 2,000 and thinning factor of 10, that is 800 points, is used for the psd estimation. We use 40 B-spline densities, based on the rule-of-thumb criterion discussed above, and the first order penalty. The whole process takes less than 1.5 minutes to run and is replicated for equidistant and Q-spaced knots. The results are displayed in Figure 5.
The periodogram shows several peaks with the main one located at around 0.42. When the knots are equally spaced, the algorithm is unable to capture the sharpness of this peak, in contrast to the Q-spaced knots. This is because the periodogram-based knot allocation puts more knots in those regions which contain significant peaks as indicated by the periodogram. The rest of peaks, which are smaller in comparison to the main one, are slightly better captured by the equidistant knots, since a large proportion of the Q-spaced knots have been allocated around the main peak. However, this can easily be corrected by increasing the number of knots. The psd estimate based on Q-spaced knots is comparable to the one obtained via the NPC approach (Kirch et al., 2018).
5 Discussion
As shown by Edwards et al. (2019), the BspDP prior outperforms the BDP prior in terms of IAE and uniform coverage probabilities for spiky psds. This is explained by the local support of the B-splines and its better approximation properties (Edwards et al., 2019). However, its performance is only achieved at a much higher computational cost. These characteristics have motivated our P-spline approach.
Although the P-spline prior is very similar to the B-spline prior as both are based on mixtures of B-spline distributions, there is quite a significant difference. The latter works with a variable number and location of knots, which are driven by the data. The P-spline prior assumes a fixed number of knots with fixed locations, thus almost looks like a special case. However, the multivariate prior based on the difference penalty of the basis coefficients controls the smoothness of the spectral density, a feature that cannot be achieved with a univariate Dirichlet process prior. Most importantly, fixing the number and location of knots reduces the computational effort significantly, without sacrificing accuracy as shown in the application section. To deal with abrupt peaks in the psd, we propose to locate the knots according to the peaks detected via the periodogram. This approach does not affect the computational time and improves the estimates dramatically.
In our simulation study, we showed that the equidistant knot scheme for the P-spline method works quite well in the case of simple spectral structures (AR(1) model). However, its performance is surpassed by the quantile-based knot location scheme for complex structures (AR(4) model). In this scenario particularly, we can highlight the good performance of the quantile-based knot scheme. This method outperforms the BspDP prior in terms of IAE and run-times, particularly for the complex psds. Regarding uniform coverage, a first order penalty P-spline coupled with Q-spaced knots works best. For this combination, the uniform coverage almost reaches the nominal level.
We also assessed our proposal in the classic sunspot dataset, in which the posterior distribution based on the BspDP prior is able to estimate correctly the solar cycle that occurs every 11.07 years. The posterior distribution based on the P-spline prior yields almost identical results, but in significantly less computational time.
Finally, we assess the fit of the posterior distribution based on the P-spline prior a under two different knot location schemes: equally spaced and Q-spaced knots. For this, we estimated the psd for the S. Carinae time series which has a spectral density with several sharp peaks. Equidistant knots failed to capture the main peak of this spectral density whereas Q-spaced knots significantly improved the results without affecting the computational time.
The number of B-spline densities was selected according to the criterion proposed by Ruppert (2002). It seems to work well for simple spectral densities. However, more complex functions could require larger number of B-spline densities. This can be evaluated studying the peaks of the periodogram. Our knot location proposal is very useful in these situations. Future research will explore whether the choice of the number of knots can be related to the entropy of the periodogram.
As already reviewed in the introduction, many Bayesian nonparametric approaches to spectral density estimation propose to put a prior on the log-spectral density. This could be easily implemented by modifying (3) to
[TABLE]
and it could be interesting to examine for what class of spectral densities this prior specification would yield better posterior inference.
Another interesting question is whether the P-spline prior can be extended to enable a Bayesian analysis of multivariate time series. In this case, the spectral density is a Hermitian positive definite matrix-valued function. A novel Bayesain nonparametric approach to multivariate spectral density estimation was proposed by Meier et al. (2020) based on a generalization of the Bernstein-Dirchlet process prior where Bernstein polynomials are used to smooth a matrix-Gamma process. An avenue for future research will be to explore whether a computational speed-up might be possible by making use of P-splines in smoothing the elements of the Cholesky decomposition of the spectral density matrix.
In conclusion, the P-spline prior with the quantile-based knot placement scheme offers a more computationally viable alternative to the BspDP prior for spectral density estimation. As shown in this work, it can handle spectral densities with several peaks in a reasonable run-time.
The R package psplinePsd can be downloaded from GitHub.
Acknowledgements.
We thank Thomas Yee for helpful discussions on P-splines. We also thank the New Zealand eScience Infrastructure (NeSI) for their high performance computing facilities, and the Centre for eResearch at the University of Auckland for their technical support. PM’s and RM’s work is supported by Grant 3714568 from the University of Auckland Faculty Research Development Fund and the DFG Grant KI 1443/3-1. RM gratefully acknowledges support by a James Cook Fellowship from Government funding, administered by the Royal Society Te Apārangi. All analysis was conducted in R, an open-source statistical software available on CRAN (cran.r-project.org).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bremhorst and Lambert [2016] Vincent Bremhorst and Philippe Lambert. Flexible estimation in cure survival models using Bayesian p-splines. Computational Statistics & Data Analysis , 93:270 – 284, 2016. ISSN 0167-9473. doi: https://doi.org/10.1016/j.csda.2014.05.009 . URL http://www.sciencedirect.com/science/article/pii/S 0167947314001492 .
- 2Brockwell and Davis [1991] Peter J Brockwell and Richard A Davis. Time Series: Theory and Methods . Springer-Verlag New York, New York, USA, 2 edition, 1991. ISBN 978-0-387-97429-3. doi: 10.1007/978-1-4419-0320-4 . URL https://doi.org/10.1007/978-1-4419-0320-4 . · doi ↗
- 3Cadonna et al. [2017] A. Cadonna, A. Kottas, and R. Prado. Bayesian mixture modeling for spectral density estimation. Statistics & Probability Letters , 125:189–195, 2017. ISSN 0167-7152. doi: https://doi.org/10.1016/j.spl.2017.02.008 . URL http://www.sciencedirect.com/science/article/pii/S 0167715217300573 .
- 4Carter and Kohn [1997] C. K. Carter and R. Kohn. Semiparametric Bayesian inference for time series with mixed spectra. Journal of the Royal Statistical Society, Series B , 59(1):255–268, 1997. doi: 10.1111/1467-9868.00067 . URL https://doi.org/10.1111/1467-9868.00067 . · doi ↗
- 5Choudhuri et al. [2004] Nidhan Choudhuri, Subhashis Ghosal, and Anindya Roy. Bayesian estimation of the spectral density of a time series. Journal of the American Statistical Association , 99(468):1050–1059, 2004. doi: 10.1198/016214504000000557 . URL https://doi.org/10.1198/016214504000000557 . · doi ↗
- 6Edwards et al. [2019] Matthew Edwards, Renate Meyer, and Nelson Christensen. Bayesian nonparametric spectral density estimation using b-spline priors. Statistics and Computing , 29(1):67–78, 2019. ISSN 0960-3174. URL https://doi.org/10.1007/s 11222-017-9796-9 . · doi ↗
- 7Edwards et al. [2018] Matthew C. Edwards, Renate Meyer, and Nelson Christensen. bspline Psd: Bayesian nonparametric spectral density estimation using b-spline priors , 2018. URL https://CRAN.R-project.org/package=bspline Psd . R package version 0.6.0.
- 8Eilers and Marx [1996] Paul H. C. Eilers and Brian D. Marx. Flexible smoothing with b -splines and penalties. Statist. Sci. , 11(2):89–121, 05 1996. doi: 10.1214/ss/1038425655 . URL https://doi.org/10.1214/ss/1038425655 . · doi ↗
