On estimation of the effect lag of predictors and prediction in functional linear model
Haiyan Liu, Georgios Aivaliotis, Jeanine Houwing-Duistermaat

TL;DR
This paper introduces a functional linear model that predicts responses using multiple functional predictors, estimates their effect lags, and evaluates the model's properties and performance through simulations.
Contribution
It presents a novel method for estimating predictor effect lags in a functional linear model using basis expansions and penalized optimization.
Findings
Effective estimation of predictor effect lags demonstrated
Model shows strong predictive performance in simulations
Mathematical properties of estimators are established
Abstract
We propose a functional linear model to predict a response using multiple functional and longitudinal predictors and to estimate the effect lags of predictors. The coefficient functions are written as the expansion of a basis system (e.g. functional principal components, splines), and the coefficients of the fixed basis functions are estimated via optimizing a penalization criterion. Then time lags are determined by simultaneously searching on a prior grid mesh based on minimization of prediction error criterion. Moreover, mathematical properties of the estimated parameters and predicted responses are studied and performance of the method is evaluated by extensive simulations.
| n | 50 | 100 | 150 | 200 |
|---|---|---|---|---|
| NPE | 2.08 | 1.95 | 1.86 | 1.79 |
Department of Statistics
University of Leeds
Keywords: lag functional linear model, functional principal component analysis, sparse and irregular functional data.
1 Introduction
Temporal (time-stamped) data are collected both routinely and ad hoc for various processes related to human activities and the natural world. In their two extreme forms, such data can be sampled densely and regularly in time (we call this dense data) or can consist of only a few measurements recorded at irregular time intervals (we call this sparse longitudinal data). Intermediate situations also occur. Examples of dense data are hourly pollution and climate measurements at a particular site, or financial time series. Sparse datasets can arise from medical records (e.g. visits to a GP) and from other ad hoc observations, for example measurements on wild species that are not easy to access.
Relationships between temporal data are often not synchronous and involve a delay in the effects. For example, it may take some time before high temperatures result in a lower growth rate of trees, and historical exposure to high temperatures might no longer affect growth after a certain period. Similarly, exposure to a factor might take some time to affect a person's health, and the effect might fade away some time after the exposure ceases (e.g. after stopping smoking).
In this paper, we consider estimation and prediction in a functional regression model where the dense functional predictor trajectory and the sparse longitudinal predictor trajectory over certain intervals of the past affect the sparse response trajectory. We estimate these intervals through the corresponding lags of the effect of the predictors on the response. In our motivating example, we estimate the influence of dense functional temperature on sparse longitudinal tree diameters. Moreover, we want to estimate the effect lags of temperature on tree diameter, i.e. from when the predictors start to influence the response until when this influence disappears.
The classical function-on-function linear model reads as follows:
$$Y(t) = \beta_0(t) + \int_{0}^{T} \beta(s,t)\, X(s)\, ds + \varepsilon(t), \qquad t \in [0,T],$$
where $Y$ is the response trajectory, $X$ is the predictor trajectory, $\varepsilon$ is the error process, $\beta_0$ is the intercept process, and $\beta$ is the two-dimensional regression coefficient function which shows the influence of $X(s)$ on $Y(t)$. This model was first introduced by Ramsay and Dalzell (1991). For reviews of functional data analysis, see Ramsay and Silverman (2005), Horvath and Kokoszka (2012) and the references therein. Notice that in this model the entire predictor trajectory, including the future values, i.e. $X(s)$ when $s > t$, is assumed to influence the current value of the response trajectory at time $t$. Clearly this is not appropriate in many applications.
As a result, the historical functional linear model has been investigated by Malfait and Ramsay (2003), Harezlak et al. (2007) and Kim et al. (2009, 2011), where only the past of the predictor trajectory influences the response at the current time:
$$Y(t) = \beta_0(t) + \int_{\delta_1}^{\delta_2} \beta(s,t)\, X(t-s)\, ds + \varepsilon(t),$$
where $\delta_1$ and $\delta_2$ ($0 \le \delta_1 < \delta_2$) are the lags for the influence of the predictor trajectory on the response trajectory. For one dense functional predictor, Malfait and Ramsay (2003) consider a triangular basis expansion of the coefficient function, which is estimated at each observation point. A penalized approach which allows varying lags for the historical functional linear model has been developed by Harezlak et al. (2007). Kim et al. (2011) consider the situation in which both the predictor process and the response process are sparsely and irregularly observed. Pomann et al. (2016) have extended the historical functional linear model to multiple homogeneous predictors, where the response is influenced by the predictors from a fixed starting effect time to the current time.
The contributions of this paper are threefold: multiple heterogeneous (sparse longitudinal or dense functional) predictors are included; time lags (both starting and ending points) that are fixed but unknown are determined; and the asymptotic properties of the estimators are investigated. To be precise, this paper addresses the historical functional linear model with multiple heterogeneous predictors, in which the response is influenced by the predictors from a fixed starting effect time to a fixed ending effect time. We estimate the coefficient functions and the effect lags, and predict the response. Moreover, the asymptotic behavior of the estimated coefficient functions and of the predicted response curve is investigated.
The paper is organized as follows. In section 2, the historical function-on-function linear model for multiple heterogeneous predictors is introduced. In section 3, we consider the estimation of the coefficient functions and establish the uniform consistency of our estimators. In section 4, the prediction of the response trajectories is proposed and the asymptotic property of the predicted trajectories is established. The determination of the lags is proposed in section 5. Extensive numerical examples are considered in section 6 to show the finite-sample properties of our proposed estimators. In section 7, the Amazonian rainforest dataset is analysed and the lags are determined. We finish the paper with a conclusion and discussion.
2 Model
Suppose our observations are $(T_{ij}, Y_{ij})$, $(S_{1ik}, U_{1ik})$ and $(S_{2il}, U_{2il})$, where $i = 1, \dots, n$ indexes the subjects and $j$, $k$, $l$ index the measurements of the response and of the two predictors. For example, the response $Y_{ij}$ corresponds to the tree diameter for subject $i$ at time $T_{ij}$. The predictor $U_{1ik}$ corresponds to the temperature for subject $i$ at time $S_{1ik}$. The predictor $U_{2il}$ corresponds to the climatic water deficit for subject $i$ at time $S_{2il}$.
Let $X_{1i}$ and $X_{2i}$, $i = 1, \dots, n$, be independent copies of underlying square-integrable random functions $X_1$ and $X_2$ over $[0,T]$. Without loss of generality, we assume $E[X_1(t)] = 0$ and $E[X_2(t)] = 0$. We denote by $G_1(s,t)$ the covariance of $X_1$ and by $G_2(s,t)$ the covariance of $X_2$. We assume that the first predictor curves $X_{1i}$ are observed on a dense and regular grid of points $S_{1ik}$. The observations $U_{1ik}$ are the discrete version of $X_{1i}$, contaminated with iid mean-zero, finite-variance noise which is independent of $X_{1i}$. However, the second predictor curves $X_{2i}$ are observed on a sparse and irregular grid of points $S_{2il}$. The observations $U_{2il}$ are likewise the discrete version of $X_{2i}$, contaminated with iid mean-zero, finite-variance noise which is independent of $X_{2i}$. The responses $Y_{ij}$ are observed on a sparse and irregular grid of points $T_{ij}$.
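We define the lag historical functional linear model with two heterogeneous covariates $X_{1i}$ and $X_{2i}$ for the response $Y_i$ as
$$Y_i(t) = \beta_0(t) + \int_{\delta_{11}}^{\delta_{12}} \beta_1(s,t)\, X_{1i}(t-s)\, ds + \int_{\delta_{21}}^{\delta_{22}} \beta_2(s,t)\, X_{2i}(t-s)\, ds + \varepsilon_i(t), \tag{1}$$
where $0 \le \delta_{11} < \delta_{12}$ and $0 \le \delta_{21} < \delta_{22}$ are the lags, $\beta_0$ is the intercept function, $\beta_1$ and $\beta_2$ are continuous two-dimensional coefficient functions, and $\varepsilon_i$ are independent measurement errors with mean zero and finite variance $\sigma^2$. Errors are assumed to be independent of $X_{1i}$ and $X_{2i}$.
Notice that (1) is equivalent to
$$Y_i(t) = \beta_0(t) + \int_{t-\delta_{12}}^{t-\delta_{11}} \beta_1(t-s,t)\, X_{1i}(s)\, ds + \int_{t-\delta_{22}}^{t-\delta_{21}} \beta_2(t-s,t)\, X_{2i}(s)\, ds + \varepsilon_i(t), \tag{2}$$
so model (1) means that, given the entire predictor curves $X_{1i}$ and $X_{2i}$, the response for subject $i$ at time $t$ is only affected by the values of $X_{1i}$ over the time window $[t-\delta_{12}, t-\delta_{11}]$ and by the values of $X_{2i}$ over the time window $[t-\delta_{22}, t-\delta_{21}]$. That is, $t-\delta_{12}$ is the starting effective time and $t-\delta_{11}$ is the ending effective time for $X_{1i}$ to have an effect on $Y_i$ at time $t$; $t-\delta_{22}$ is the starting effective time and $t-\delta_{21}$ is the ending effective time for $X_{2i}$ to have an effect on $Y_i$ at time $t$. The coefficient functions $\beta_1$ and $\beta_2$ weigh the values of $X_{1i}$ and $X_{2i}$ over these time windows, and thereby quantify the effect of $X_{1i}$ and $X_{2i}$ on the response $Y_i$.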
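As a numerical illustration of the lag window, the sketch below approximates a single historical term of the form $\int_{\delta_1}^{\delta_2} \beta(s,t)\, X(t-s)\, ds$ with the trapezoidal rule. The function name `lagged_effect`, the grid size and the toy choices of the weight and the predictor are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def lagged_effect(beta, x, t, delta1, delta2, n_grid=201):
    """Approximate int_{delta1}^{delta2} beta(s, t) * x(t - s) ds (trapezoidal rule).

    beta: callable (s, t) -> weight; x: callable time -> predictor value.
    Only the values of x on the window [t - delta2, t - delta1] enter the result.
    """
    s = np.linspace(delta1, delta2, n_grid)
    f = beta(s, t) * x(t - s)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(s)) / 2.0)

def ones(s, t):
    # Hypothetical constant weight function, used only for the checks below.
    return np.ones_like(s)

# Constant weight and constant predictor: the result is the window length.
window_length = lagged_effect(ones, lambda u: np.ones_like(u), t=5.0, delta1=1.0, delta2=3.0)

# A predictor that is non-zero only after time 4.5 lies outside the window [2, 4].
outside = lagged_effect(ones, lambda u: (u > 4.5).astype(float), t=5.0, delta1=1.0, delta2=3.0)
```

The second call shows the defining property of the model: predictor values outside the effective window contribute nothing to the response at time `t`.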
3 Estimation
Let $\{\phi_{1k}\}$ and $\{\phi_{2l}\}$ be two pre-specified functional bases on the lag windows $[\delta_{11}, \delta_{12}]$ and $[\delta_{21}, \delta_{22}]$ of the first and second predictor. Then the two-dimensional coefficient functions $\beta_1$ and $\beta_2$ are assumed to be represented as
$$\beta_1(s,t) = \sum_{k=1}^{K_1} b_{1k}(t)\, \phi_{1k}(s)$$
and
$$\beta_2(s,t) = \sum_{l=1}^{K_2} b_{2l}(t)\, \phi_{2l}(s)$$
respectively, where $K_1$ and $K_2$ capture the resolution of the fit and should be chosen accordingly, and $b_{1k}$ and $b_{2l}$ are the unknown time-varying coefficient functions defined on the time domain of the response. As Kim et al. (2011) reported for the case of a single sparse predictor, “the estimation is not sensitive to the choice of $K$ provided that there are enough number of basis functions used in the estimation, since the penalized solution (defined later in this section) prevents over-fitting”. Clearly, various basis functions such as Fourier, B-spline or wavelet bases can be used, depending on the specific features of the coefficient functions. Since we cannot assume any prior knowledge about the coefficients, and B-spline bases are computationally fast and have good properties, we use B-spline functions of order 4 (cubic) with 10 equally spaced interior knots over $[\delta_{11}, \delta_{12}]$ and $[\delta_{21}, \delta_{22}]$, so that the number of basis functions is 14. For details on B-spline bases, see for example Fan and Gijbels (1996) and Ramsay and Silverman (2005).
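The basis size can be checked numerically. Under the usual convention that the number of B-spline basis functions equals the number of interior knots plus the order, 10 interior knots and 14 basis functions correspond to cubic splines (order 4, polynomial degree 3). The sketch below builds such a basis with SciPy; the interval $[0,1]$ and the helper name `bspline_basis` are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, lower, upper, n_interior=10, degree=3):
    """Evaluate a clamped B-spline basis at x; returns an array of shape (len(x), n_basis)."""
    interior = np.linspace(lower, upper, n_interior + 2)[1:-1]
    knots = np.concatenate([
        np.repeat(lower, degree + 1),   # repeated boundary knots
        interior,
        np.repeat(upper, degree + 1),
    ])
    n_basis = len(knots) - degree - 1
    # Identity coefficients: column k of the result is the k-th basis function.
    return BSpline(knots, np.eye(n_basis), degree)(x)

x = np.linspace(0.0, 1.0, 52)[1:-1]     # 50 evaluation points strictly inside [0, 1]
basis = bspline_basis(x, 0.0, 1.0)
```

Here `basis` has 14 columns, and each row sums to one because B-splines form a partition of unity.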
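Plugging the basis expansions of $\beta_1$ and $\beta_2$ into equation (1), we have
$$Y_i(t) = \beta_0(t) + \mathbf{b}_1(t)^\top \tilde{\mathbf{X}}_{1i}(t) + \mathbf{b}_2(t)^\top \tilde{\mathbf{X}}_{2i}(t) + \varepsilon_i(t),$$
where $\mathbf{b}_1(t) = (b_{11}(t), \dots, b_{1K_1}(t))^\top$, $\mathbf{b}_2(t) = (b_{21}(t), \dots, b_{2K_2}(t))^\top$, $\boldsymbol{\phi}_1(s) = (\phi_{11}(s), \dots, \phi_{1K_1}(s))^\top$, $\boldsymbol{\phi}_2(s) = (\phi_{21}(s), \dots, \phi_{2K_2}(s))^\top$, $\tilde{\mathbf{X}}_{1i}(t) = \int_{\delta_{11}}^{\delta_{12}} \boldsymbol{\phi}_1(s)\, X_{1i}(t-s)\, ds$, and $\tilde{\mathbf{X}}_{2i}(t) = \int_{\delta_{21}}^{\delta_{22}} \boldsymbol{\phi}_2(s)\, X_{2i}(t-s)\, ds$. Note that the observed times $T_{ij}$ depend on the subject $i$. Then model (1) reduces to a varying coefficient model with induced predictors $\tilde{\mathbf{X}}_{1i}$ and induced predictors $\tilde{\mathbf{X}}_{2i}$.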
At first, notice that implies , so can be estimated by smoothing via local smoothing method based on the pooled data, see for example Yao et al. (2005), Beran and Liu (2014) and Liu and Houwing-Duistermaat (2018). We denote by , where is an estimator of evaluated at time .
In order to derive the estimator of and , we assume only in this paragraph, i.e. the observation times for different subject are the same. We then estimate and by minimizing:
[TABLE]
where is the Euclidean norm of a vector, , and are the regularization parameters which are assumed to be constants for any time in order to reduce the high variability if they vary for each time. The penalization does not only prevent over-fitting but also guarantee the inverse of matrix while solving the minimization problem. Then the minimizer of (3) is
[TABLE]
where is the identity matrix and
[TABLE]
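The closed-form minimizer has the familiar ridge-regression shape. A minimal sketch, assuming a generic design matrix `Z` of induced predictors at one time point, a centred response vector `y` and a single penalty `lam` (all hypothetical names; the criterion in the paper uses its own matrices and separate penalties for the two predictors):

```python
import numpy as np

def ridge_coefficients(Z, y, lam):
    """Minimise ||y - Z b||^2 + lam * ||b||^2, i.e. b = (Z'Z + lam I)^{-1} Z'y."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 14))      # fewer subjects than basis functions
y = rng.normal(size=8)
lam = 0.5
b_hat = ridge_coefficients(Z, y, lam)

# First-order condition of the penalised criterion; it should vanish at b_hat.
gradient = -2.0 * Z.T @ (y - Z @ b_hat) + 2.0 * lam * b_hat
```

With 8 rows and 14 columns, `Z.T @ Z` is singular, so the unpenalized normal equations have no unique solution; adding `lam` times the identity makes the system solvable, which is the second role of the penalty mentioned in the text.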
Therefore, by using the probability limits of the covariance structure, for arbitrary , we have
[TABLE]
where is a matrix with an estimator of , is a matrix with an estimator of , is a matrix with an estimator of , is a matrix with an estimator of , is a vector and is estimator of , and is a vector and is an estimator of .
To obtain the necessary quantities in (4), we consider the covariances:
- For , we have
[TABLE]
where is the covariance between and . Since predictor is densely observed, can be estimated by bivariate kernel smoothing, see Beran and Liu (2014):
[TABLE]
where is a bandwidth and is a bivariate kernel function.
- For , we have
[TABLE]
where is the covariance between and . Since predictor is sparsely observed, can be estimated by local linear surface smoother (Yao et al. 2015) which is defined through minimizing
[TABLE]
with respect to , where is a bandwidth and is a bivariate kernel function. And .
- For , we have
[TABLE]
where is the covariance between and . Since predictor is densely observed and is sparsely observed, can be estimated by local surface smoothing.
- For , it is similar to .
- For , we have
[TABLE]
where is the covariance between and . Since is densely observed and is sparsely observed, can be estimated by local linear surface smoothing.
- For , it is similar to .
Once and are obtained (for given lags ’s and regularization parameters ’s), we can estimate coefficient functions by
[TABLE]
and
[TABLE]
Theorem 1
Under assumptions in Beran and Liu (2014) and Yao et al. (2005a, 2005b), denote ,
[TABLE]
[TABLE]
**Proof:** Uniform consistency of is given in Theorem 4 of Beran and Liu (2014), uniform consistency of is given in Lemma 1 of Yao et al. (2005b), uniform consistency of is given in Theorem 1 of Yao et al. (2005a). Then the uniform consistency of can be obtained. Therefore the uniform consistency of and follows and thus that of and can be obtained.
4 Prediction
Suppose we observe a new discrete response curve , discrete dense predictor trajectory and discrete sparse predictor trajectory . From the original model (1), the predicted response curve is
[TABLE]
However, the lags and regularization parameters and have to be determined and the functional representation of the predictor trajectories and have to be recovered from data.
For , it can be easily recovered by kernel smoothing, since the sampling is dense.
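For a densely sampled curve, this recovery step can be sketched with a Nadaraya-Watson smoother. The Gaussian kernel, the bandwidth value and the test curve below are assumptions for illustration; the paper does not specify them at this point.

```python
import numpy as np

def kernel_smooth(t_obs, y_obs, t_eval, bandwidth):
    """Nadaraya-Watson estimate of a curve at t_eval, using a Gaussian kernel."""
    u = (np.asarray(t_eval)[:, None] - np.asarray(t_obs)[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)
    return (weights @ np.asarray(y_obs)) / weights.sum(axis=1)

t_obs = np.linspace(0.0, 1.0, 100)            # dense, regular sampling grid
rng = np.random.default_rng(1)
noisy = np.sin(2 * np.pi * t_obs) + rng.normal(scale=0.1, size=t_obs.size)

t_eval = np.linspace(0.1, 0.9, 9)
recovered = kernel_smooth(t_obs, noisy, t_eval, bandwidth=0.03)
flat = kernel_smooth(t_obs, np.ones_like(t_obs), t_eval, bandwidth=0.03)
```

Because the weights are normalised, a constant curve is reproduced exactly (`flat`), while `recovered` follows the underlying sine curve up to smoothing bias and residual noise.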
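However, for $X_{2i}$, since the sampling is sparse and irregular, we use functional principal component analysis (FPCA). As discussed, we assume that $X_2$ has mean zero and is square-integrable. Denote the covariance of $X_2$ by $G_2(s,t)$; then Mercer's theorem gives the following spectral decomposition of the covariance:
$$G_2(s,t) = \sum_{k=1}^{\infty} \rho_k\, \psi_k(s)\, \psi_k(t),$$
where $\rho_1 \ge \rho_2 \ge \dots \ge 0$ are the eigenvalues and $\psi_k$ are the orthonormal eigenfunctions. By the Karhunen-Loève (KL) expansion, $X_{2i}$ can be represented as
$$X_{2i}(t) = \sum_{k=1}^{\infty} \xi_{ik}\, \psi_k(t),$$
where $\xi_{ik} = \int X_{2i}(t)\, \psi_k(t)\, dt$ are the functional principal component scores, which are uncorrelated random variables with mean 0 and variance $\rho_k$. In practice, the expansion is often truncated by only including the first $K$ terms, i.e.
$$X_{2i}(t) \approx \sum_{k=1}^{K} \xi_{ik}\, \psi_k(t).$$
The covariance $G_2$ can be estimated as discussed in the last section, and the eigenfunctions can be estimated from the spectral decomposition of the estimated covariance. However, the scores cannot be approximated by numerical integration as is usually done for dense functional data. Instead, under the Gaussian assumption, denote by $\mathbf{U}_{2i}$ the vector of noisy observations of $X_{2i}$ and by $\boldsymbol{\psi}_{ik}$ the vector of values of $\psi_k$ at the observation times of subject $i$; the best linear predictor for $\xi_{ik}$ is (see Mardia et al. 1978, Yao et al. 2005 or the application in Liu et al. 2018):
$$\tilde{\xi}_{ik} = E[\xi_{ik} \mid \mathbf{U}_{2i}] = \rho_k\, \boldsymbol{\psi}_{ik}^\top\, \Sigma_{\mathbf{U}_{2i}}^{-1}\, \mathbf{U}_{2i},$$
where $\Sigma_{\mathbf{U}_{2i}} = \mathrm{cov}(\mathbf{U}_{2i}, \mathbf{U}_{2i})$. Then the estimate of $\xi_{ik}$ can be defined as
$$\hat{\xi}_{ik} = \hat{\rho}_k\, \hat{\boldsymbol{\psi}}_{ik}^\top\, \hat{\Sigma}_{\mathbf{U}_{2i}}^{-1}\, \mathbf{U}_{2i}.$$
The number of eigenfunctions $K$ can be selected to be the number of eigenfunctions that explain 95% of the functional covariance. Once the estimates of the eigenfunctions $\hat{\psi}_k$, the scores $\hat{\xi}_{ik}$ and $K$ are obtained, $X_{2i}$ can be recovered as
$$\hat{X}_{2i}(t) = \sum_{k=1}^{K} \hat{\xi}_{ik}\, \hat{\psi}_k(t).$$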
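The score prediction for a sparsely observed curve and the 95% rule can be sketched as follows. This is a minimal illustration of the conditional-expectation approach of Yao et al. (2005) described above, assuming the eigenvalues, the eigenfunctions at the observation times and the noise variance are already available; the function names and the toy eigenfunctions are hypothetical.

```python
import numpy as np

def pace_scores(obs, eigfun_at_obs, eigvals, noise_var):
    """Best linear predictor of the scores for one mean-zero, sparsely observed curve.

    obs: (m,) noisy observations; eigfun_at_obs: (m, K) eigenfunctions at the
    observation times; eigvals: (K,) eigenvalues; noise_var: measurement noise variance.
    """
    cov_obs = eigfun_at_obs @ np.diag(eigvals) @ eigfun_at_obs.T
    cov_obs = cov_obs + noise_var * np.eye(len(obs))
    return eigvals * (eigfun_at_obs.T @ np.linalg.solve(cov_obs, obs))

def n_components(eigvals, threshold=0.95):
    """Smallest number of leading eigenvalues explaining at least `threshold` of the variance."""
    explained = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(explained, threshold) + 1)

# Toy curve with two components observed at five irregular time points.
times = np.array([0.05, 0.3, 0.45, 0.7, 0.9])
psi = np.column_stack([np.sqrt(2) * np.sin(np.pi * times),
                       np.sqrt(2) * np.sin(2 * np.pi * times)])
eigvals = np.array([2.0, 0.5])
true_scores = np.array([1.2, -0.4])
scores = pace_scores(psi @ true_scores, psi, eigvals, noise_var=1e-4)

k_selected = n_components(np.array([6.0, 3.0, 0.6, 0.4]))
```

With almost noise-free observations the predicted `scores` are close to `true_scores`; with larger noise variance they are shrunk towards zero, which is the intended behaviour of the conditional expectation.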
After plugging the functional representation of the predictor curves and into (5), we have
[TABLE]
Define
[TABLE]
and
[TABLE]
Theorem 2
Under assumptions in Beran and Liu (2014) and Yao et al. (2005a, 2005b), denote , for all , we have
[TABLE]
**Proof:** For fixed , we have
[TABLE]
For , from the uniform consistency of established in Theorem 1 and the uniform consistency of kernel smoother, we have as .
For , from the uniform consistency of established in Theorem 1, the uniform consistency of for from Theorem 3 in Yao et al. (2005a), and the uniform consistency of from Theorem 2 in Yao et al. (2005a), we have as .
For , following Lemma A.3 in Yao et al. (2005a), we have as .
Therefore, Theorem 2 follows.
5 Implementation
The final question is how to estimate the time lags $\delta$'s, which are of great importance in our application. For selecting the $\delta$'s and the $\lambda$'s, we consider the Normalized Prediction Error (NPE) criterion and the $M$-fold cross validation criterion. Specifically, NPE in this situation is defined as
[TABLE]
where $\hat{Y}_{ij}$ is the predicted value for the $j$th measurement on the $i$th response trajectory obtained using the given $\delta$'s and $\lambda$'s. Divide the data into $M$ equal parts; for each $m = 1, \dots, M$, fit the model with parameters $(\delta, \lambda)$ to the other $M-1$ parts, giving the estimates of the coefficient functions and hence the predictions in the $m$th part, and then compute the prediction error in the $m$th part. The $M$-fold cross validation score is defined as
[TABLE]
Similar criteria are considered in Kim et al. (2011) and Pomann et al. (2016).
Then the $\delta$'s and $\lambda$'s are chosen in a hierarchical manner. Let $D_1$ and $D_2$ be the sets of potential lags for the first and second predictor, i.e. $(\delta_{11}, \delta_{12}) \in D_1$ and $(\delta_{21}, \delta_{22}) \in D_2$, respectively. Let $\Lambda$ be the set of potential regularization parameters $(\lambda_1, \lambda_2)$. Firstly, for a fixed point of $D_1 \times D_2$, NPE values are calculated for all $(\lambda_1, \lambda_2) \in \Lambda$. The $(\lambda_1, \lambda_2)$ that achieves the smallest NPE value is then chosen as the optimal one for the given lags. Secondly, this optimal $(\lambda_1, \lambda_2)$ is used for calculating the cross validation score for the given lags. Finally, we repeat the above steps for all points of $D_1 \times D_2$, so that the cross validation score is obtained for every candidate lag combination. The optimal lag combination is then chosen to be the one with the smallest cross validation score. In practice, $D_1$ and $D_2$ are grid meshes of candidate lags that are chosen empirically; $\Lambda$ is also chosen empirically.
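The hierarchical search can be sketched on a deliberately simplified problem: a scalar response that depends on a single predictor at one unknown discrete lag, fitted by a one-parameter ridge regression. Everything here is an assumption for illustration, including the helper names, the normalisation used in `normalized_error` and the squared-error cross validation score; the paper applies the same three steps to the functional model with its own NPE and cross validation definitions.

```python
import numpy as np

def ridge_slope(x, y, lam):
    # One-parameter ridge fit of y on x.
    return float(x @ y / (x @ x + lam))

def lagged_pairs(x, y, lag):
    # Pair y[t] with x[t - lag].
    return x[:-lag], y[lag:]

def normalized_error(y, y_hat):
    # Assumed normalisation: total absolute error relative to total |y|.
    return float(np.sum(np.abs(y - y_hat)) / np.sum(np.abs(y)))

def cv_score(x, y, lam, n_folds=10):
    errors = []
    for test_idx in np.array_split(np.arange(len(y)), n_folds):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        slope = ridge_slope(x[train], y[train], lam)
        errors.append(np.mean((y[test_idx] - slope * x[test_idx]) ** 2))
    return float(np.mean(errors))

def select_lag(x, y, lag_grid, lam_grid):
    results = {}
    for lag in lag_grid:
        xl, yl = lagged_pairs(x, y, lag)
        # Step 1: for this lag, choose the penalty with the smallest normalised error.
        best_lam = min(lam_grid,
                       key=lambda lam: normalized_error(yl, ridge_slope(xl, yl, lam) * xl))
        # Step 2: score this lag by cross validation at the chosen penalty.
        results[lag] = (cv_score(xl, yl, best_lam), best_lam)
    # Step 3: keep the lag with the smallest cross validation score.
    best_lag = min(results, key=lambda lag: results[lag][0])
    return best_lag, results[best_lag][1]

rng = np.random.default_rng(2)
x = rng.normal(size=300)
true_lag = 2
y = np.zeros_like(x)
y[true_lag:] = 2.0 * x[:-true_lag]
y += rng.normal(scale=0.1, size=x.size)

chosen_lag, chosen_lam = select_lag(x, y, lag_grid=[1, 2, 3], lam_grid=[0.01, 0.1, 1.0])
```

In this toy setting a wrong lag pairs the response with an unrelated predictor value, so its cross validation error is far larger than at the true lag.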
6 Simulations
We study efficiency of the NPE criterion for selecting the time lags ’s and regularization parameters ’s.
For subjects, we first generate the response curve and two predictor curves and on a dense and equally spaced time points over , i.e. . The number of measurements made on the th response is randomly selected from 20 to 50, the number of measurements made on the th predictor is 100 and the number of measurements made on the th predictor is randomly selected from 30 to 50.
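The sparse sampling step of such a simulation can be sketched as below: a curve is generated on a dense grid and a random number of grid points is retained. The interval $[0,1]$, the grid of 100 points, the placeholder sine curve and the helper name `sparsify` are assumptions for illustration; only the ranges 20 to 50 and 30 to 50 come from the text.

```python
import numpy as np

def sparsify(times, values, n_min, n_max, rng):
    """Keep a random number of points (between n_min and n_max, inclusive) of a dense curve."""
    n_keep = rng.integers(n_min, n_max + 1)
    idx = np.sort(rng.choice(len(times), size=n_keep, replace=False))
    return times[idx], values[idx]

rng = np.random.default_rng(3)
dense_t = np.linspace(0.0, 1.0, 100)        # assumed dense, equally spaced grid
dense_curve = np.sin(2 * np.pi * dense_t)   # placeholder trajectory

resp_t, resp_y = sparsify(dense_t, dense_curve, 20, 50, rng)   # sparse response
pred_t, pred_y = sparsify(dense_t, dense_curve, 30, 50, rng)   # sparse second predictor
```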
Define with and , with . We take the same time lags for both and , i.e. , . For coefficient functions, we take , , . The measurement errors are taken to be independent normal with signal to noise ratio 20 for the predictors and response.
Figure 1 shows the simulated data with .
The estimation is based on the B-spline (B-spline functions of degree 4 with 10 equally spaced interior knots over ) expansion of the coefficients. The number of functional principal components is chosen based on leave-one-curve cross validation criterion and 99% variation is kept. The penalized parameters and are chosen on the dense grid of . We use NPE criterion and 10-fold cross validation criterion to determine the regularization parameters and the lags. Notice that in order to check the estimation performance, the estimation procedure is done under the correct lags, i.e. , . Figure 2 shows the result of one simulation, where is chosen as , is chosen as and the corresponding NPE is . From Figure 2, we conclude that our model successfully reveals the structure of coefficient functions.
Table 1 shows the asymptotic properties of our estimation. The NPE values are shown for different numbers of observations $n$; again, the estimation is based on the correct lags. As we can see, the NPE decreases as $n$ increases, which is consistent with Theorem 1.
For evaluating the performance of our model on selecting the effect lags, the s are determined based on the NPE criterion and the s are determined based on 10-fold cross-validation score. Since the true and , in order to save computational time, we fix the ending point i.e. and search the starting point . That is we have three combinations but there is only one correct combination. Our model has 65 correct choices out of 100 simulations.
References
- [1] Beran, J. and Liu, H. (2014). On estimation of mean and covariance functions in repeated time series with long-memory errors. Lithuanian Mathematical Journal, 54(1), 8-34.
- [2] Fan, J. and Gijbels, I. (1996). Local polynomial modeling and its applications. CRC Press.
- [3] Harezlak, J., Coull, B. A., Laird, N. M., Magari, S. R., and Christiani, D. C. (2007). Penalized solutions to functional regression problems. Computational Statistics and Data Analysis, 51(10), 4911-4925.
- [4] Horvath, L. and Kokoszka, P. (2012). Inference for functional data with applications. Springer Science and Business Media.
- [5] Kim, K., Sentürk, D., and Li, R. (2011). Recent history functional linear models for sparse longitudinal data. Journal of Statistical Planning and Inference, 141(4), 1554-1566.
- [6] Liu, H. and Houwing-Duistermaat, J. (2018). On trend and its derivative estimation in repeated unevenly spaced time series with long-range dependent errors. arXiv:1803.05411.
- [7] Liu, H., Del Galdo, F. and Houwing-Duistermaat, J. (2018). Functional principal component analysis in predicting Scleroderma disease based on patients historical data.
- [8] Lopez-Gonzalez, G., Lewis, S. L., Burkitt, M. and Phillips, O. L. (2011). ForestPlots.net: a web application and research tool to manage and analyse tropical forest plot data. Journal of Vegetation Science, 22, 610-613. doi: 10.1111/j.1654-1103.2011.01312.x
