Robust functional ANOVA model with t-process
Chen Zhang, Zimu Chen, Zhanfeng Wang, Yaohua Wu

TL;DR
This paper introduces a robust functional ANOVA model using t-processes to improve estimation and prediction in the presence of outliers, with proven statistical properties and demonstrated effectiveness through simulations and real data.
Contribution
It develops a novel robust estimation method for functional ANOVA models utilizing t-processes, enhancing robustness and prediction accuracy.
Findings
Method performs well in simulations
Effective in real data applications
Ensures robustness and consistency
Abstract
Robust estimation approaches are of fundamental importance for statistical modelling. To reduce susceptibility to outliers, we propose a robust estimation procedure with t-process under functional ANOVA model. Besides common mean structure of the studied subjects, their personal characters are also informative, especially for prediction. We develop a prediction method to predict the individual effect. Statistical properties, such as robustness and information consistency, are studied. Numerical studies including simulation and real data examples show that the proposed method performs well.
| Data | TP | GP | TP() |
|---|---|---|---|
| Precipitation | 0.0416(0.0029) | 0.0488(0.0046) | 0.0490(0.0045) |
| Freezer | 0.1362(0.0060) | 0.1480(0.0057) | 0.1487(0.0055) |
| Hi-Fi stereo | 0.0583(0.0149) | 0.0641(0.0144) | 0.0645(0.0144) |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Optimal Experimental Design Methods
Robust functional ANOVA model with t-process
Chen Zhang
Department of Statistics and Finance, University of Science and Technology of China, Hefei, China.
Zimu Chen
Zhanfeng Wang 111Corresponding author: Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China. E-mail: [email protected]
Yaohua Wu
Abstract
Robust estimation approaches are of fundamental importance for statistical modelling. To reduce susceptibility to outliers, we propose a robust estimation procedure with t-process under functional ANOVA model. Besides common mean structure of the studied subjects, their personal characters are also informative, especially for prediction. We develop a prediction method to predict the individual effect. Statistical properties, such as robustness and information consistency, are studied. Numerical studies including simulation and real data examples show that the proposed method performs well.
keywords:
Extended process, Information consistency, Robustness,
††journal: Computational Statistics & Data Analysis
1 Introduction
The analysis of variance (ANOVA) functional models are frequently applied to analyze data in many fields, such as econometrics, sociology and medical follow-up studies, such as application to the breast cancer data (Huang et al., 2000), the angles data of the hip over walking cycles (Rice and Wu, 2001), the vehicle road load data acquisition computer model (Dorin, 2010) and so on.
Functional ANOVA model is an adaptation of linear model or analysis of variance to analyze functional observations, which are also widely studied, such as Woods et al. (1996), Huang (1998), Brumback and Rice (1998) and therein reference. To the best of our knowledge, functional ANOVA models in literature rarely consider individual effect of the studied subject (each observed functional curve). However, for market penetration of new products (Ashish et al., 2009), data are collected from 86 countries in 7 regions. Market penetrations from different countries, such as the developing and developed countries, have dissimilar growth trend, because of discrepant economical levels. Hence, market penetration of each country has personal character besides the common mean structure. This paper not only considers the main structure of market penetration, but also focuses on individual effect of each country via a nonparametric random effect. Without loss of generality, we consider a one-way functional ANOVA concurrent regression model,
[TABLE]
where is the observed time, is a functional response for batch under level , is a grand function, measures the main effect of level , is a nonparametric random effect depicting personal character for the -th subject at level , is a functional covariate with dimension , and is an error term. For identification of , it needs
[TABLE]
When random effect and has Gaussian distribution, model (1.1) becomes the common one way functional ANOVA model. For example, Brumback and Rice (1998) and Huang et al. (2000), respectively, used smoothing spline model and B-splines method to build this functional ANOVA model; Rice and Wu (2001) studied this model based on unequally sampled noisy curves.
When follows Gaussian process (GP) with mean function 0 and kernel function , denoted by and has Gaussian distribution, we denote model (1.1) by Gaussian process ANOVA model, saying GP model. Section 5.5 in Shi and Choi (2011) mentions the GP model. However, the GP is sensitive to outliers. For robustness, this paper assumes has heavy-tail process, such as t-process. For theoretical studies and application of heavy-tail processes, please see Yu et al. (2007), Zhang and Yeung (2010), Shah et al. (2014) and Wang et al. (2017) and so on. To illustrate the proposed method, this paper employs an extended t-process (ETP) (Wang et al., 2017) to depict the error term. Our method can be directly extended to other t-process.
This paper combines t-process error with Gaussian process random effect in model (1.1), which leads to a robust functional ANOVA model. Independence of random effect and error term in this paper makes computation procedure more complicated compared to Wang et al. (2017) which uses a joint ETP process to build their model. We propose a computation method based on Gaussian approximation. Prediction of random effect is also an important for studying personal character. We develop a prediction method for functional ANOVA model. Statistical properties of the estimation method, such as robustness and information consistence, are investigated. Numerical studies including simulation and real examples show that the proposed method has robustness against outliers.
The remainder of this paper is organized as follows. Section 2 proposes a robust functional ANOVA model with t-process, and builds prediction and parameter estimation procedure. Asymptotic properties of estimation method are discussed in Section 3. Simulation studies and real examples are reported in Section 4, followed by the conclusion in Section 5. All the proofs are in Appendix.
2 Robust functional ANOVA model
Considering model (1.1), we assume that , where is a kernel function. Hereafter, for convenience of notations, let . For robustness to outlier, let be from the extended t-process with for (Wang et al., 2017), where evaluated values of at any data points , follows an extended multivariate t-distribution (EMTD) with the density function
[TABLE]
Denoted by , where the first and -th elements are 1 and the others are 0. Model (1.1) is rewritten as
[TABLE]
From this model, it follows that
[TABLE]
2.1 Prediction
Let be observed data for the th curve under level , and at the observed times . Denoted by . For a new point , is used to predict at point , denoted by . From model (2.2), we show that
[TABLE]
Since the conditional distribution of for given is involved in intractable integration, (2.3) is complicated to be computed. Gaussian approximation method is employed to calculate (2.3). It follows that
[TABLE]
where is the Gaussian approximation of (details seen in Section 2.2).
Since , we have
[TABLE]
where is the covariance matrix of , . It follows that is normal distribution with mean and variance . From Section 2.2, we have
[TABLE]
where and are defined in the next subsection.
It easily shows that has a normal density function which indicates that can be estimated by
[TABLE]
We know that
[TABLE]
Easily, we show
[TABLE]
Hence, is used to estimate the variance of .
2.2 Parameter estimation
To apply the proposed method, it is necessary to estimate the unknown kernel function , and the function parameter . For kernel function, a function family such as a squared exponential kernel and Matérn class kernel can be applied (Shi and Choi, 2011). This paper takes a combination of a square exponential kernel and a non-stationary linear kernel,
[TABLE]
where is hyper-parameter. We use B-spline to approximate the functional parameter , that is , where is a set of known basis functions and is an unknown coefficient matrix with dimension .
Let , with the first and -th elements are 1 and the others are 0. Then, we have
[TABLE]
where represents the Kronecker product. Then a marginal log-likelihood is
[TABLE]
However, the marginal log-likelihood can not be computed directly, because of intractable integration. To avoid the complicated integration, some methods, such as Laplace approximation and the Markov Chain Monte Carlo(MCMC) method, are used to approximate the marginal log-likelihood. But the error rate of the Laplace approximation may be as the dimension of increases with the sample size (Rue et al., 2009), and the MCMC method is too complicated to solve the problem conveniently, see Section 7.1.4 in Shi and Choi (2011). This paper uses a Gaussian approximation method to compute as follows
[TABLE]
where is the Gaussian approximation to the full conditional density and is the mode of the full conditional density for given , and . Let . We have
[TABLE]
From Taylor expansion, we approximate to the second order
[TABLE]
where
[TABLE]
Let From density function of EMTD, it gives
[TABLE]
Therefore,
[TABLE]
where and .
Next, for some initial value , we can use Fisher scoring algorithm to find the Gaussian approximation, and the th iteration is given by
(i) Find the solution from ;
(ii) Update and using and repeat (i).
After the process converges, say at , we can get the Gaussian approximation which is the density function of the normal distribution .
Considering smoothness of , we need to add a penalty into the log-likelihood function, denoted by
[TABLE]
where is a tuning parameter. This paper takes the penalized function,
[TABLE]
where and is a derivative operator. The form of depends on the property of function , such as for the second order continuously derivative function, , where the weights may be either constant or function , and mean the first and second derivative of .
From (2.6), (2.7) and (2.8),we obtain an estimation procedure for parameters , and . For initial values , and ,
(I) For given , and , obtain by the Fisher scoring algorithm above-mentioned.
(II) For given , we update estimates of , and by maximizing the penalized log-likelihood (2.7) with (2.6) and (2.8).
(III) Repeat (I)-(II) until some convergence criteria hold.
3 Asymptotic properties
3.1 Robustness
Influence function is an important index to measure the robustness of estimation method to outliers. It is known that Gaussian distribution to fit data tends to be susceptive to outliers and has unbounded influence function, while t-distribution has heavy-tail and bound influence function. This paper shows that the proposed functional ANOVA models with t-process error (TP model) is more robust than those with Gaussian process error (GP model), see the next theorem (its proof seen in Appendix A).
Theorem 1. *If kernel function is bounded for the -th curve under level , , and continuously differentiable on , then for given , the estimators and from the TP model have bound influence functions, while that from the GP model does not. *
3.2 Information consistency
The consistency related to the estimation approach for model (2.2) involves two aspects: the common mean parameter function and the random effect curve (or the curve ). The common mean function is estimated by collected data from all curves and its consistency is already proved in many different functional linear models under suitable conditions, see Yao et al. (2005), Ramsay and Silverman (1997). Therefore, this paper focuses on the second issue, the information consistency of to . Information consistency of the estimation method is also an important issue. For Gaussian process regression model and the extended t-process regression model, this properties are obtained in Seeger et al. (2008), Wang and Shi (2014) and Wang et al. (2017). Here, we show this property for model (2.2).
Without loss of generality, suppose the mean function about the th curve under level , denoted by , has already been known. Let covariate values where are independently drawn from a distribution . Let be a measure induced by the process on space
Denote
[TABLE]
[TABLE]
where is the Bayesian predictive distribution of based on the t-process functional ANOVA model, and is based on the true underlying function of . It is said that achieves information consistency if
[TABLE]
where denotes the expectation under the distribution of and is the Kullback-Leibler divergence between functions and , that is,
[TABLE]
Theorem 2. Under the appropriate conditions in Lemma 1 and condition (A) in Appendix B, we have for
[TABLE]
*where the expectation is taken over the distribution . *
The proof of Theorem 2 is in Appendix B.
4 Numerical studies
4.1 Simulation studies
Numerical simulations are constructed to evaluate the robustness of the t-process functional ANOVA model (2.2) (TP). The performance of TP is investigated by comparison of model (1.1) with error term having Gaussian process (GP). In addition, we also compute model (2.2) ignoring the random effect , denoted by TP().
Simulation data are generated from three different models:
Model 1: The ANOVA model (1.1) with and , where , , and , and , .
- 2.
Model 2: , the other setups are the same as Model 1.
- 3.
Model 3: , the other setups are the same as Model 1.
Model 1 is the true model for the TP method, while Model 2 is the true model for the GP method. Model 3 does not consider the random effect. Let observed time takes 61 points evenly spaced in [0,2]. Take sample points as training data with and , the left are test data. Furthermore, to illustrate the performance of robustness, we randomly choose one point from the training data and disturb it by adding an extra error: a constant error 2 or random errors: and , where is normal distribution with mean 0 and variance 2, and is student distribution with degree of freedom 3. All simulation are repeated 500 times.
Prediction errors (PE) and mean squared errors (MSE) of prediction from the TP, GP and TP() methods are presented in Tables 1 - 3 for Model 1 - 3, respectively. From these tables, it follows that the TP method has the smallest PE and MSE, while the GP has comparable results with the TP(). It shows that the TP has more robustness against outliers than the GP, and better prediction than the TP().
4.2 Real data examples
The robust functional ANOVA model is applied to two datasets, Canada weather data and market penetration of new products. Canadian weather data includes temperature and precipitation of weather at 35 Canadian weather observation stations, which is available in the R-package fda (Ramsay et al., 2011). These stations are divided into 4 regions: Arctic, Atlantic, Continental and Pacific. We study the week-average precipitation for 4 regions, where each region is a level and the week-average temperature is variable for random effect . For the market penetration data (Ashish et al., 2009), we take the market penetration of Hi-Fi stereo and freezer as examples to illustrate our method. There are 86 countries in 7 continents from 1977 to 2015. Due to the missing data, we finally take data of 36 countries from 1983 to 2012 for the Hi-Fi stereo, and 35 countries from 1986 to 2015 for the freezer. With respect to relation with the Hi-Fi stereo and freezer, for random effect , we take the penetration of personal computer as variable for the Hi-Fi stereo, and that of microwave oven for the freezer. Because 36 countries come from seven regions: Asia Pacific, Australasia, Eastern Europe, Latin America, Middle East and Africa, North America and Western Europe. So, the number of levels is 7.
Figures 1, 2 and 3 present the main effects and examples of the personal effect of some subjects for the weather data, the market penetration of Hi-Fi stereo and freezer, respectively. We can see that besides different main effects for the studied levels, each subject (weather station or country) also has different personal character.
For these two datasets, we respectively randomly choose 30 and 20 time points as training data and the rest as test data. Three estimation methods, TP, GP and TP(), are applied to fit the training data and then predict the test data. These procedures are repeated for 500 times. Table 4 shows prediction errors from TP, GP and TP(). We can see the TP has the smallest prediction errors, while GP and TP() have the comparable ones.
5 Conclusion
The functional ANOVA model with Gaussian process offers a model for data with multi-dimensional covariances, and the specification of covariance kernels gives the ability to analyze a wide class of different type data to the model. However, the model with Gaussian process is not robust to outliers. Some heavy-tail processes can be used to overcome this shortcoming. This paper uses Gaussian process and t process to fit the personal random effect and error term, respectively, which results in a robust functional ANOVA model with t-process. Robustness property and information consistency of the proposed method are obtained. Numerical examples show that the robust functional ANOVA model has more robustness than that with Gaussian process.
Furthermore, it is a natural way to use t process to fit the random effect instead of Gaussian process, which leads to both of the personal random effect and error term having t processes. This extension makes estimation procedure more complicated because of more intractable integration.
Appendix A Robustness
**Proof of Theorem 1:
**From the log-likelihood function defined in (2.7), we get the score functions of ,, as follows,
[TABLE]
[TABLE]
where for ,
[TABLE]
[TABLE]
So, we can get that when for -th object under level , the score functions of , and are bounded, while that from the GPR model tend to .
Let be an estimate of (in this subsection let ), in which is the empirical distribution of and is a functional on some subset of all distributions. Let , according to Hampel et al. (1986), influence function of at is defined as
[TABLE]
where put mass 1 on point and 0 on others.
Following Hampel et al. (1986) , we can get that estimator of has the influence function
[TABLE]
The matrix is bounded according to , therefore the influence functions of are bounded under the propsoed functional ANOVA model. However, those under the GPR model are unbounded.
Appendix B Information consistency
Lemma 1**.**
Suppose are independent samples from model (2.2) and has a Gaussian process with zero mean and bounded kernel function . Let be continuous in and be one consistent estimator of . Then we have
[TABLE]
where is the reproducing kernel Hilbert space(RKHS) norm of associated with , is covariance matrix of over , is the identity matrix and are some positive constants.
Proof: Let is the reproducing kernel Hilbert space(RKHS) associated with , and is the span of , that is, . First, we assume that , then we can express as
[TABLE]
where and . With the property of RKHS, we can get , and .
By the Fenchel-Legendre duality relationship, we can get
[TABLE]
where is the measure induced by , therefore its finite dimensional distribution at is , where is defined in the same way as but with being replaced by its estimator , and is the posterior distribution of with a prior distribution and normal likelihood , where .
Therefore, by the Gaussian process regression, the posterior distribution of is , where . Then we have
[TABLE]
In addition, we show that
[TABLE]
and
[TABLE]
Hence, it follow from (B.1), (B.2), (B.3) and (B) that
[TABLE]
Since the covariance function is bounded and the covariance function is continous in and , we have as . Hence, there exist positive constants and such that for large enough, and which plugs in (B), there exist positive constant , we have the inequality
[TABLE]
From the Representer Theorem, it can obtain
[TABLE]
for all . The proof is over.
To prove Theorem 2, it need condition (A) is bounded and .
**Proof of Theorem 2.
**From the definition of the information consistency, we get that
[TABLE]
From Lemma 1, we obtain that
[TABLE]
where and are two positive constants. Because is bounded and expected regret term , Theorem 2 holds.
Reference
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Ashish et al. (2009) Ashish, S., Gareth, M. J., and Gerard, J. T. (2009). Functional regression: a new model for predicting market penetration of new products. Marketing Science 28 , pp. 36–51.
- 2Brumback and Rice (1998) Brumback, B., and Rice, J. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association 93 , pp. 961–976.
- 3Dorin (2010) Dorin, D. (2010). Functional ANOVA in Computer Models With Time Series Output. Technometrics 52 , pp. 430–437.
- 4Gray (1992) Gray, R. J. (1992). Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. Journal of the American Statistical Association 87 , pp. 942–951.
- 5Hampel et al. (1986) Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley.
- 6Huang (1998) Huang, J. Z. (1998). Projection estimation in multiple regression with application to functional ANOVA models. Annals of Statistics 26 , pp. 242–272.
- 7Huang et al. (2000) Huang, J., Kooperberg, C., Stone, C. J., and Truong, Y. K. (2000). Functional ANOVA models for proportional hazards regression. Annals of Statistics 28 , pp. 961–999.
- 8Kawaguchi et al. (2008) Kawaguchi, A., Yonemoto, K., Tanizaki, Y., Kiyohara, Y., Yanagawa, T., and Truong, Y. K. (2008). Application of functional ANOVA models for hazard regression to the Hisayama data. Statistics in Medicine 27 , pp. 3515–3527.
