Estimating variances in time series linear regression models using empirical BLUPs and convex optimization
Martina Han\v{c}ov\'a, Gabriela Voz\'arikov\'a, Andrej Gajdo\v{s},, Jozef Han\v{c}

TL;DR
This paper introduces a novel two-stage method for estimating variance components in time series linear mixed models using empirical BLUPs and convex optimization, improving computational efficiency and theoretical understanding.
Contribution
It develops a new estimation approach for variance components in FDSLRMs, establishing theoretical existence, equivalence of estimators, and a fast, accurate algorithm for maximum likelihood estimation.
Findings
The method provides invariant, non-negative quadratic estimators applicable to any absolutely continuous distribution.
A new $ ext{O}(n)$ algorithm for (RE)MLE computation is 10^7 times more accurate and 100 times faster than existing packages.
Validated on real datasets: electricity, tourism, and cybersecurity, demonstrating practical effectiveness.
Abstract
We propose a two-stage estimation method of variance components in time series models known as FDSLRMs, whose observations can be described by a linear mixed model (LMM). We based estimating variances, fundamental quantities in a time series forecasting approach called kriging, on the empirical (plug-in) best linear unbiased predictions of unobservable random components in FDSLRM. The method, providing invariant non-negative quadratic estimators, can be used for any absolutely continuous probability distribution of time series data. As a result of applying the convex optimization and the LMM methodology, we resolved two problems theoretical existence and equivalence between least squares estimators, non-negative (M)DOOLSE, and maximum likelihood estimators, (RE)MLE, as possible starting points of our method and a practical lack of computational implementation for FDSLRM. As for…
| characteristic | NE estimators | BLUP-NE estimators |
|---|---|---|
| least squares method in FDSLRM | maximum likelihood method in FDSLRM |
|---|---|
| double ordinary least squares estimators | maximum likelihood estimators |
| (DOOLSE) ref. stulajter_estimation_2004 , stulajter_predictions_2002 | (MLE) ref. stulajter_estimation_2004 , stulajter_predictions_2002 |
| modified (unbiased) DOOLSE | restricted (residual) MLE |
| (MDOOLSE) ref. stulajter_estimation_2004 | (REMLE) ref. stulajter_estimation_2004 |
| natural estimators (NE) ref. hancova_natural_2008 |
| Input: Form the matrix , the vector from vectors and . |
| For each auxiliary vector do: 1. Set the KKT-conditions matrix : ; For do: If then . 2. Calculate the auxiliary vector : . 3. Test non-negativity of : If then quit. |
| Output: Use the last to form NN-(M)DOOLSE of : |
| ; |
| For do: If then . |
| electricity consumption - toy model 1 | ||||
|---|---|---|---|---|
| method | estimate | EBLUP-NE | ||
| NE | ||||
| MLE | ||||
| REMLE | ||||
| electricity consumption - toy model 2 | ||||
| method | estimate | EBLUP-NE | ||
| NE | ||||
| MLE | ||||
| REMLE | ||||
| tourism | ||||
| NE | ||||
| MLE | ||||
| REMLE | ||||
| cyber attacks | ||||
| NE | ||||
| MLE | ||||
| REMLE | ||||
| acronym | explanation | specification |
|---|---|---|
| BLUE | best linear unbiased estimator | (3.2), p. 3.2 |
| BLUP | best linear unbiased predictor | (3.2), p. 3.2 |
| BLUP-NE | natural estimators based on BLUP | (3.1), p. 3.1 |
| CVXPY | Python library for convex optimization | p. 4.2 |
| CVXR | R version of CVXPY | p. 4.2 |
| DOOLSE |
double ordinary least squares estimator
without non-negativity constraints |
(4.15), p. 4.15 |
| EBLUP | empirical (plug-in) BLUP | p. 4.1 |
| EBLUP-NE | natural estimators based on EBLUP | p. 4.1 |
| FDSLRM |
finite discrete spectrum
linear regression model |
(1.1), p. 1.1 |
| KKT | Karush-Kuhn-Tucker | p. 4.2 |
| LMM | linear mixed model | p. 2.1 |
| LRM | linear regression model | p. 1 |
| MDOOLSE |
modified (unbiased) DOOLSE
without non-negativity constraints |
(4.15), p. 4.15 |
| (M)DOOLSE | considering both DOOLSE and MDOOLSE | Remark 5, p. 5 |
| MLE | maximum likelihood estimators | (4.25), p. 4.25 |
| MME | Henderson’s mixed model equations | (3.2), p. 3.2 |
| MMEinR | R version of MATLAB function mixed | p. 4.3 |
| MSE | mean squared error | p. 1 |
| NE | natural estimators | p. 3.1, (4.1), p. 4.1 |
| nlme |
R package for (non) linear
mixed(-effects) models |
p. 4.3 |
| NN-DOOLSE | non-negative DOOLSE | (4.10), p. 4.10 |
| NN-MDOOLSE | non-negative modified DOOLSE | (4.14), p. 4.14 |
| NN-(M)DOOLSE |
considering both NN-DOOLSE
and NN-MDOOLSE |
Remark 5, p. 5 |
| OLS | ordinary least squares | p. 4.1 |
| REMLE |
residual (restricted)
maximum likelihood estimator |
(4.29), p. 4.29 |
| (RE)MLE | considering both MLE and REMLE | Remark 5, p. 5 |
| SageMath | free Python-based mathematics software | p. 2 |
| sommer |
R package for multivariate LMMs
solving MME |
p. 4.3 |
| SciPy |
Scientific Python,
Python-based ecosystem of open software |
p. 2 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Statistical Methods and Models · Statistical Methods and Inference · Gaussian Processes and Bayesian Inference
∎
11institutetext: M. Hančová 22institutetext: G. Vozáriková 33institutetext: A. Gajdoš 44institutetext: Institute of Mathematics, Pavol Jozef Šafárik University, Košice, Slovakia
Tel.: +421-55-2342213, Fax: +421-55-6222124
44email: [email protected] 55institutetext: J. Hanč66institutetext: Institute of Physics, Pavol Jozef Šafárik University, Košice, Slovakia
Estimating variances in time series
linear regression models using empirical BLUPs
and convex optimization ††thanks: This work was supported by the Slovak Research and Development Agency under the contract No. APVV-17-0568, the Scientific Grant Agency of the Slovak Republic (VEGA), VEGA grant No.1/0311/18 and the Internal Research Grant System of Faculty of Science, P. J. Šafárik University in Košice (VVGS PF UPJŠ) — project VVGS-PF-2018-792.
Martina Hančová
Gabriela Vozáriková
Andrej Gajdoš
Jozef Hanč
(Received: date / Accepted: date)
Abstract
We propose a two-stage estimation method of variance components in time series models known as FDSLRMs, whose observations can be described by a linear mixed model (LMM). We based estimating variances, fundamental quantities in a time series forecasting approach called kriging, on the empirical (plug-in) best linear unbiased predictions of unobservable random components in FDSLRM. The method, providing invariant non-negative quadratic estimators, can be used for any absolutely continuous probability distribution of time series data. As a result of applying the convex optimization and the LMM methodology, we resolved two problems — theoretical existence and equivalence between least squares estimators, non-negative (M)DOOLSE, and maximum likelihood estimators, (RE)MLE, as possible starting points of our method and a practical lack of computational implementation for FDSLRM. As for computing (RE)MLE in the case of observed time series values, we also discovered a new algorithm of order , which at the default precision is times more accurate and times faster than the best current Python(or R)-based computational packages, namely CVXPY, CVXR, nlme, sommer and mixed. We illustrate our results on three real data sets — electricity consumption, tourism and cyber security — which are easily available, reproducible, sharable and modifiable in the form of interactive Jupyter notebooks.
Keywords:
finite discrete spectrum linear regression model linear mixed model double ordinary least squares estimators maximum likelihood estimators efficient computational algorithms
MSC:
62M10 62J12 91B84 90C25
1 Introduction
The need to obtain sufficiently accurate predictions for facilitating and improving decision making becomes an integral part not only of science, industry or economy but also in many other human activities. Our recent article gajdos_kriging_2017 summarizes the guiding methodology and corresponding references dealing with kriging for time series econometric forecasting as one of the advanced alternative approaches to the most popular Box-Jenkins methodology (box_time_2015 , shumway_time_2017 ).
The key idea of the time series kriging is to model the given time series data in an appropriate general class of linear regression models (LRMs) and subsequently finding the best linear unbiased predictor — the BLUP which minimizes the mean squared error (MSE) of prediction among all linear unbiased predictors, see e.g. stulajter_predictions_2002 , kedem_regression_2005 , christensen_plane_2011 , brockwell_introduction_2016 .
In the frame of kriging, we investigate theoretical features and econometric applications of a class of time series models called finite discrete spectrum linear regression models or shortly FDSLRMs. The FDSLRM class was introduced in 2002-2003 by Štulajter (stulajter_predictions_2002 , stulajter_mse_2003 ) as a direct extension of classical (ordinary) regression models (see e.g. christensen_plane_2011 , hyndman_forecasting:_2018 ).
The FDSLRM has mean values (trend) given by linear regression and random components (error terms) are represented as a sum of a linear combination of uncorrelated zero-mean random variables and white noise which can be interpreted in terms of the finite discrete spectrum priestley_spectral_2004 .
Formally the FDSLRM can be presented as
[TABLE]
where
representing the time domain is a countable subset of the real line ,
and are some fixed non-negative integers, i.e. ,
is a vector of regression parameters,
is an unobservable random vector with zero mean vector , and with an diagonal covariance matrix , where are non-negative real numbers,
and are real functions defined on ,
stands for white noise uncorrelated with and having a positive dispersion .
Typically, due to the nature of time series data collection, the most frequently considered time domain is the set of natural numbers . FDSLRM variance parameters, which are fundamental quantities in kriging, are commonly described by one vector , an element of the parametric space or for short.
The principal goal of our paper is to present an alternative estimation method for FDSLRM variances, one of the time series kriging steps gajdos_kriging_2017 . Our approach represents an improved two-stage modification of the previously developed method of natural estimators (NE) hancova_natural_2008 which is now based on the idea of empirical (plug-in) BLUPs. We will refer to our new method as EBLUP-NE for short.
The structure of the paper is organized as follows. Section 2 includes several important notes about theoretical methods and computational tools for FDSLRM kriging offered by closely connected mathematical branches and computational research. This background became the strong basis for our research.
Section 3 describes the first stage of our EBLUP-NE method. It develops the definition and computational form of EBLUP-NE at known variance parameters using two special matrices known in linear algebra as the Schur complement and Gram matrix zhang_schur_2005 . The method requires very minimal distributional assumptions on a finite FDLSRM observation . It must have only an absolutely continuous probability distribution with respect to some -finite measure. In section 3 we also derive the basic statistical properties of EBLUP-NE at known .
Section 4 is devoted to the second stage of EBLUP-NE method which consists in replacing true, in practice unknown variance parameters by another, appropriate FDSLRM estimates of obtained by maximum likelihood or least squares gajdos_kriging_2017 . Since the theory and computational aspects of these estimation procedures in FDSLRM fitting based on the projection theory in Hilbert spaces are more than 10 years old (stulajter_predictions_2002 , stulajter_estimation_2004 , hancova_natural_2008 ), we revisit and update them in the light of recent advances in closely related linear mixed modeling and convex optimization.
In the following fifth section, we illustrate theoretical results of the paper and the performance of EBLUP-NE on three real time series data sets — electricity consumption, tourism and cyber security. Section 6 presents the conclusions of the paper. For the sake of paper readability, we moved very technical details and proofs to the Appendix. In the Appendix we also report a list of acronyms and abbreviations used in the paper (tab. 5).
Since the paper is aimed at statisticians, analysts, econometricians and data scientists applying time series and forecasting methods, our notation is standard for time series analysis and prediction using linear regression models (stulajter_predictions_2002 , kedem_regression_2005 , brockwell_time_2009 ). Sets like are labeled as it is common in convex optimization boyd_convex_2009 .
2 Theoretical and computational bases for FDSLRM kriging
Let us recall some important notes about time series FDSLRM kriging and its connection to other math branches and current computational technology.
From a modeling point of view, if we consider any time series model in the additive form whose mean value function is a real function on expressible by a functional series (e.g. Taylor or Fourier series) and error term is a mean-zero stationary process, then according to the spectral representation theory of time series (priestley_spectral_2004 ,percival_spectral_2009 ) can be approximated arbitrarily closely by FDSLRM. Therefore, from a practice perspective, FDSLRMs can be potentially applied in many practical situations.
In econometric applications of FDSLRM kriging, we almost always have only one realization of time series and we do not know the mean-value parameters nor the variance parameters . From the view of time series analysis, the problem of estimating variances belongs to the class of problems concerning covariance matrix estimation with one realization wu_covariance_2012 .
In addition, any finite FDSLRM observation satisfies a special type of linear mixed model (LMM) of the form
[TABLE]
where design matrices ; ; and random vector is a finite -dimensional white noise observation.
The fundamental property (2.1) of FDSLRM allows us to apply convenient LMM mathematical techniques for FDSLRM fitting and forecasting (pinheiro_mixed-effects_2009 , christensen_plane_2011 , witkovsky_estimation_2012 , demidenko_mixed_2013 , rao_small_2015 , covarrubias-pazaran_genome-assisted_2016 , singer_graphical_2017 ) together with up-to-date LMM software packages — nlme, sommer written in R (pinheiro_nlme:_2018 , covarrubias-pazaran_sommer:_2019 ), MATLAB function mixed witkovsky_mixed_2018 or SAS package Proc MIXED sasinstituteinc._sas/stat_2018 . As the first practical consequence of the LMM theory, to have identifiable FDSRLMs under assumptions of multivariate normal distribution (demidenko_mixed_2013, , sec. 3.2, thm 11), in the paper, we will assume for the block matrix and for the number of data in , a realization of in real situations, the following sufficient condition
[TABLE]
On the other hand, the standard maximum likelihood or least squares estimates of in FDSLRM are optimization problems. Our motivation to explore estimating FDSLRM variances in the frame of convex optimization lays on the fact that its mathematical tools and efficient, very reliable computational interior-point methods became important or fundamental tools in many other branches of mathematics boyd_convex_2009 like the design of experiments, high-dimensional data, machine learning or data mining.
Inspired by essential and recent works on convex optimization (boyd_convex_2009 , bertsekas_convex_2009 , koenker_convex_2014 , cornuejols_optimization_2018 , agrawal_rewriting_2018 ) we prove new theoretical relations among existing FDSLRM estimators in the extended parametric space for the orthogonal version of FDSLRM, most used in real time series applications. Moreover, we show how to apply the latest convex optimization packages CVXPY and CVXR based on disciplined convex programming, written in Python diamond_cvxpy:_2016 and R fu_cvxr:_2019 , for estimating . Finally, we also focus on the development of a new fast and accurate computational optimization algorithm for estimating variances in FDSLRM.
As for computational technology, our time series calculations are carried out using free open source software based on the R statistical language and packages (rdevelopmentcoreteam_r:_2019 , mcleod_time_2012 ), the Scientific Python with the SciPy ecosystem (jones_scipy:_2001 , oliphant_python_2007 ) and free Python-based mathematics software SageMath (stein_sage_2019 , hogben_sage_2013 , zimmermann_computational_2018 ), an open source alternative to the well-known commercial computer algebra systems Mathematica or Maple. The simultaneous use of R, SciPy and SageMath provides us valuable cross-checking of our computational results.
All our computational algorithms and results are easily readable, sharable, reproducible, and modifiable thanks to open-source Jupyter technology kluyver_jupyter_2016 . In particular, we present our results in the form of Jupyter notebooks, dynamic HTLM documents integrating prose, code and results similarly as Mathematica notebooks, stored in our free available GitHub repository (gajdos_fdslrm_2019 , adar_coding_2014 ).
The Jupyter notebooks, detailed records of our computing with explaining narratives, can be seen or studied as static HTML pages via Jupyter nbviewer (https://nbviewer.jupyter.org/, kluyver_jupyter_2016 ) or interactively as live HTML documents using Binder (https://mybinder.org/, projectjupyter_binder_2018 ) where the code is executable. The way and presentation of our computing with real data are inspired by works brieulle_computing_2019 , weiss_scientific_2017 .
3 The BLUP estimation method for
3.1 Definition and computational form of estimators
In the paper hancova_natural_2008 , we proposed the method of always non-negative natural estimators (NE) of FDSLRM variances . The main idea behind NE came from the fact that . Therefore, if random vector in (1.1) was known, the natural estimate of would be just . This initial consideration was identical with Rao’s estimates known as MINQUE ((rao_estimation_1988, , sec. 5.1), (christensen_plane_2011, , sec. 12.7)).
To get sufficiently simple, explicit analytic expressions available for further theoretical study, we predicted the random vector by the ordinary least squares method leading to the following linear predictor of based on
[TABLE]
Matrix represents (hancova_natural_2008 ) the orthogonal projector onto the orthogonal complement of the column space of and matrix ( means positive definiteness) is known in linear algebra and statistics (zhang_schur_2005, , chap. 6) as the Schur complement of in the block matrix .
However, from perspective of the general theory of the best linear unbiased prediction in LMM (see e.g. (christensen_plane_2011, , chap. 12)), we could consider natural estimators of in FDSLRM based on the BLUP of . Then intuitively, it seems reasonable to assume that the BLUP as the unbiased linear predictor with minimal MSE could lead to better estimators of . This intuition will be confirmed by our following derivations.
The previous heuristic consideration motivates the following definition. Let us consider FDSLRM (1.1) and its observation (2.1). Then estimators in the form
[TABLE]
are called natural estimators of based on the BLUP of or BLUP-NE for short.
Remark 1
Following the analogy with original NE in hancova_natural_2008 , for the first component of we could take as a BLUP based natural estimator the sum of squares of white noise residuals divided by the number of degrees of freedom in FDSLRM
[TABLE]
where is the best linear unbiased estimator (BLUE) of . In the LMM framework, FDSLRM white noise residuals are also called conditional residuals singer_graphical_2017 .
But now our intuition fails. The computational form and properties of depend on all variance components . That dependency also appears in the proposed estimator which causes its biasedness even in a simpler special case of FDSLRM (orthogonality condition, see (3.9)). Therefore, further we do not pay any special attention to its form and properties. As a natural estimator of we will keep the original unbiased invariant quadratic NE of , eq. (2.2) in hancova_natural_2008 .
The BLUP of together with the BLUE of can be obtained from celebrated Henderson’s mixed model equations, MME for short (for MME in the current framework of LMM see e.g. (christensen_plane_2011, , sec. 12.3) or (witkovsky_estimation_2012, , sec. 2)). MME have the following form in the case of FDSLRM
[TABLE]
where and is the Gram matrix ((boyd_introduction_2018, , sec. 10.1)) for columns of design matrix .
Under the assumption of the full column rank of matrix , when FDSRLM is identifiable, and using the so-called Banachiewicz inversion formula for the inverse of a partitioned (block) matrix (see e.g. (zhang_schur_2005, , sec. 6.0.2) or (hancova_natural_2008, , sec. 2.1)), we can easily prove the existence and the following form of the inverse to the block matrix in (3.2)
[TABLE]
where is again the Schur complement of but now in the block matrix . We also call matrix more specifically as the extended Schur complement determined by variances .
Substituting the last result for the inverse into MME (3.2) and rearranging, we get for (symbols denote blocks not needed in deriving)
[TABLE]
[TABLE]
Denoting matrix as , we just derived the following computational form of BLUP-NE estimators.
Proposition 1 (the computational form of BLUP-NE)
Let us consider the following LMM model for FDSLRM observation
[TABLE]
Then BLUP-NE estimators of parameters are given by
[TABLE]
*where are rows of matrix and with .
Remark 2
It is straightforward to extend our computational form to the case when some variances are zero or in other words when matrix is singular. For a singular , MME (3.2) have the alternative form witkovsky_estimation_2012
[TABLE]
where and .
Applying the same argument with the Banachiewicz inversion formula, we get the second version of the computational form of determining
[TABLE]
As opposed to , matrices always exist under identifiability assumptions of our FDSLRM. Simultaneously both of are also always invertible, since both of are nonzero for any in our FDSLRM as it is shown in the Appendix. In the case of a nonsingular the mentioned matrices are connected via following relationships
[TABLE]
The version (3.5) of MME is also preferred in numerical calculations witkovsky_estimation_2012 , since it can handle not only a singular but also a very ill-conditioned appearing when has very small positive components .
It is worth to mention that originally MME were derived under the normality assumptions (see the original work henderson_estimation_1959 or witkovsky_estimation_2012 ), but from the viewpoint of least squares both versions (3.2), (3.5) of MME describe the BLUP and BLUE ((christensen_plane_2011, , sec. 12.3)) with no need to restrict distributions of and to be normal.
3.2 Statistical properties at known variance parameters
The derivation of theoretical properties of BLUP-NE estimates under the assumption of known , regarding the first and second order moment characteristics, can be significantly facilitated by the following lemma describing the properties of matrix determining the BLUP in (3.3).
Its proof can be accomplished in a similar way as the proof of Lemma 3.1 in hancova_natural_2008 by a direct routine computation employing formulas (3.7), , , properties of orthogonal projectors (orthogonality, idempotency, symmetry) and Schur complements (symmetry, positive definiteness).
Lemma 1 (basic properties of )
** 2.
* and * 3.
**
The computational forms (3.4) of BLUP-NE are also quadratic forms of . In addition, result (2) implies that . Such a condition leads to the conclusion that BLUP natural estimators are translation invariant or shortly invariant quadratic estimators (stulajter_predictions_2002, , sec. 1.5) or (hancova_natural_2008, , sec. 3.1). The following theorem summarizes theoretical properties of BLUP-NE.
Theorem 3.1 (statistical properties of )
Natural estimators of based on the BLUP of are invariant quadratic estimators having the following properties
* *
If , then 2.
* * 3.
* ,* 4.
* *
Proof
See the Appendix.
Remark 3
If some components in are zero, there is a need to use expression instead of . The first property, biasedness of , is in full accordance with the Ghosh theorem (ghosh_nonexistence_1996 , gajdos_kriging_2017 ) about the incompatibility between simultaneous non-negativity and unbiasedness of estimators for variances of random components in LMM. As for bias, defined as , we have the following expression given by the extended Schur complement
[TABLE]
Now we can theoretically compare quality of the BLUP natural estimators with original natural estimators (NE) proposed in our previous paper hancova_natural_2008 . As we can see from summarizing tab. 1, their quality is determined by Schur complements and where both of them are symmetric and positive definite matrices.
Using elementary properties of the Löwner partial ordering (puntanen_formulas_2013, , chap. 24), we have immediately relations which imply a smaller absolute value of bias, dispersion and MSE for BLUP-NE in comparison with original NE . This conclusion directly supports our heuristic idea leading to the BLUP-NE definition (3.1).
Orthogonal FDSLRM
In real time series analysis using FDSLRMs, the fundamental modeling procedure is based on spectral analysis of time series (priestley_spectral_2004 , brockwell_time_2009 ). We identify the significant Fourier frequencies by periodogram (stulajter_estimation_2004 , gajdos_kriging_2017 ) which restrict the form of design matrices an leading to the orthogonality condition for FDSLRM stulajter_estimation_2004
[TABLE]
where is j-th column of . Such a FDSLRM, satisfying condition (3.9), is called orthogonal stulajter_estimation_2004 . So in practice, we mainly work with the orthogonal version of FDSLRM.
Under the condition of orthogonality (3.9), we can write for matrices , , , and BLUP-NE
[TABLE]
where we introduced .
Applying these results, we get the following direct corollary of Theorem 3.1 for any orthogonal FDSLRM.
Corollary 1 (properties of in orthogonal FDSLRM)
In an orthogonal FDSLRM natural estimators based on the BLUP have the following properties
; where .
If , then 2.
; 3.
; , 4.
;
Finally, if we look at original natural estimators from our paper hancova_natural_2008 , then in orthogonal FDSLRM it can be easily shown that . If we introduce , then we obtain the complete orthogonal version of (3.4) for computing BLUP-NE :
[TABLE]
4 Mathematical and computational tools of convex optimization
4.1 Empirical BLUPs in the BLUP-NE estimation method
The computational form of our BLUP natural estimators, their first and second order moment characteristics were derived at known variances . But if we really knew variances , we would not need to estimate them. The practical estimation procedure for cannot depend on the parameters which are to be estimated.
However, such situation is not rare at all in LMMs, also in the FDSLRM theory, when we consider e.g. double weighted least squares estimators (DOWELSE) or maximum likelihood estimators (MLE) for . In FDSRLM fitting these estimators are computed by iterative numerical procedures using projections in Hilbert spaces ((stulajter_predictions_2002, , sec. 3.4) or stulajter_estimation_2004 ). Generally, in the LMM framework, similar iterative numerical estimation procedures are described in (searle_variance_2009, , chap. 8), witkovsky_estimation_2012 .
In our case, the simplest reasonable solution of the mentioned situation is to use an empirical version of BLUP (EBLUP) based on the computational form of BLUP-NE (3.4), or (3.10) in an orthogonal case, as it is usual in the general theory of empirical BLUPs in LMMs ((stulajter_predictions_2002, , chap. 5), witkovsky_estimation_2012 , (rao_small_2015, , chap. 5)). In particular, this step means replacing unknown true parameters with other ”initial” values or estimates of in FDSLRM. In the light of iterative approaches, our EBLUP-NE can be viewed as a two-stage iterative method with one step in the iteration.
If we think again heuristically as during the formulation of the BLUP-NE definition (3.1), it seems reasonable to expect as good starting values of any, in some sense optimal estimates of , e.g. estimates obtained by maximum likelihood or least squares in FDSLRM.
On the top of that, from the theoretical perspective we should prefer estimates with the simplest explicit form and under less restrictive assumptions on the structure or distribution of FDSLRM. On the other hand, from the practical point of view, it will be sufficient to have at least initial estimates which can be obtained by reliable and time efficient computational methods.
Therefore, in the following sections we investigate, theoretically and practically with real data sets, five estimation methods providing initial estimates for under different assumptions on the structure and distribution of FDSLRM observation . Our candidates for the second stage of EBLUP-NE are summarized in tab. 2.
Since theoretical and computational aspects of these estimation procedures in FDSLRM fitting based on the projection theory in Hilbert spaces are more than 10 years old (stulajter_predictions_2002 , stulajter_estimation_2004 , hancova_natural_2008 ), we revisit and update them in the light of the current theoretical and computational tools of convex optimization.
According to agrawal_rewriting_2018 , boyd_convex_2009 , to formulate any mathematical problem as an optimization problem, we need to identify three attributes of the problem: optimization variable, constraints that the variable must satisfy and the objective function depending on the variable whose optimal value we want to achieve.
In all our estimation methods, optimization variable is satisfying constraints . In order to avoid any incompatibility problems in the convex optimization theoretical or computational framework, we extend constraints into the form of standard nonnegativity constraints or using generalized inequality.
Finally, we also consider using present theoretical and computational tools used in linear mixed modeling, since in the LMM framework, the problem of estimating variances has a long and rich history with many essential reference works (e.g. rao_estimation_1988 , searle_variance_2009 , christensen_plane_2011 , demidenko_mixed_2013 , rao_small_2015 , see also a review paper witkovsky_estimation_2012 ).
4.2 General case of the orthogonal FDSLRM
First of all, we formulate the basic assumptions required in our investigations on the structure and distribution of FDSLRM. As we mentioned in subsection 3, in real (practical) FDSLRM analysis we mainly work with orthogonal FDSLRMs (3.9). Therefore, in the rest of the paper we will focus primarily on this type of FDSLRMs.
The second assumption deals with a distribution of the FDSLRM observation . For now, we assume satisfying (2.1) and having any absolutely continuous probability distribution with respect to some -finite measure.
Under these two assumptions, we can apply in the second stage of EBLUP-NE three estimation methods: natural estimators (NE), double ordinary least squares estimators (DOOLSE) and modified double ordinary least squares estimators (MDOOLSE).
Original natural estimators – NE
In the case of NE, we know the analytic solution of the estimation problem hancova_natural_2008 , which gives us always required non-negative estimates of FDSLRM variances. Employing results of hancova_natural_2008 , orthogonality condition (3.9) leads to the following form of NE
[TABLE]
where is nothing else than the vector of ordinary least squares (OLS) residuals in FDSLRM, is the orthogonal projector onto the orthogonal element of the column space of and is an arbitrary realization of the FDSLRM observation . Vectors are columns of design matrix . Moreover, in orthogonal FDSLRM BLUE from MME (3.2) is identical with ordinary least squares estimate not depending on stulajter_estimation_2004 .
As for computational complexity of NE, using elementary theory of complexity boyd_introduction_2018 , we get the complexity operations for the residual vector and for NE from (4.1). Therefore, computing NE represents an algorithm with the complexity having order with respect to the realization length .
Remark 4
For the purposes of software cross-checking of results and evaluating numerical precision of convex optimization algorithms, it is worth to formulate the calculation of NE as a convex optimization problem. Since NE employ the least-squares method, it is not complicated to show that in orthogonal FDSLRMs, NE can be obtained as a unique, always existing, non-negative solution of the following convex optimization problem
[TABLE]
where matrices in the objective function are defined by expressions
[TABLE]
In convex optimization, this kind of optimization problems belongs to the convex quadratic optimization problems or the norm approximation problems (boyd_convex_2009, , sec. 4.4, sec. 6.1).
Double least squares estimators – DOOLSE, MDOOLSE
From the optimization viewpoint, assuming general (not necessarily orthogonal) FDSLRM observation (2.1), DOOLSE and MDOOLSE (stulajter_predictions_2002 , stulajter_estimation_2004 ) can be viewed as the following optimization problems for at given OLS residuals (agrawal_rewriting_2018 , boyd_convex_2009 )
[TABLE]
where are again OLS residuals as in the case of NE and is the covariance matrix of . We call matrix matrix of residual products or more compactly the residual products matrix.
Remark 5
In the next text, some theoretical results can be written in one form for both of DOOLSE and MDOOLSE problems. To emphasize this fact, we will use one abbreviation with parenthesis , e.g (M)DOOLSE, assuming that given results or considerations hold for both problems.
Using the geometrical language of Hilbert spaces (brockwell_time_2009 , stulajter_predictions_2002 ), (M)DOOLSE determine a covariance matrix with the parametric structure as in but having the smallest Euclidean distance to the matrix . In such case, (M)DOOLSE for orthogonal FDSLRMs could be computed geometrically as the orthogonal projection of onto the linear span using the Gram matrix of generated by the inner product . Matrices are given by (4.6), for DOOLSE: and for MDOOLSE: .
According to stulajter_estimation_2004 , gajdos_kriging_2017 , the mentioned projection is given by
[TABLE]
where , where for DOOLSE: , for MDOOLSE: , are again OLS residuals, are columns of .
However, the projection method can produce estimates out of the parametric space at a relatively high probability gajdos_kriging_2017 , i.e. in many cases it produces negative estimates. The proposed method (4.15) is not able to handle constraints given by reliably and in some simple way.
To distinguish the projection estimates from real DOOLSE (4.10), MDOOLSE (4.14) which are always non-negative, we suggest to be more accurate and use full names for real DOOLSE and MDOOLSE: non-negative DOOLSE (NN-DOOLSE) and non-negative MDOOLSE (NN-MDOOLSE) This specific way is similarly used e.g. in the case of Rao’s MINQUE (rao_estimation_1988, , chap. 5). For the projection-based DOOLSE and MDOOLSE, without considering nonnegativity constraints, we leave the original acronyms DOOLSE, MDOOLSE.
Applying the basic convex quadratic optimization theory (cornuejols_optimization_2018 , bertsekas_convex_2009 ) in orthogonal FDSLRM, we can rewrite NN-(M)DOOLSE as strictly convex quadratic problems, whose solutions always exist and are unique global minimizers on (see Propositions 2 and 4 in the Appendix)
[TABLE]
The most important result of convex optimization for NN-(M)DOOLSE problems are fundamental Karush-Kuhn-Tucker (KKT) optimality conditions, which in orthogonal FDSLRM provide a necessary and sufficient condition for optimality of solutions (see Proposition 3 in the Appendix). In practice, finding solutions of convex optimization problems is nothing else than solving the corresponding KKT conditions analytically or in general numerically.
In our case of NN-(M)DOOLSE, the KKT conditions can be represented by a set of nonsingular linear systems of equations, where each of them is given by matrix derived from and by vector from (4.15). Our theoretical results in the Appendix show that only one of the linear systems always gives us all required non-negative elements of .
On this basis, we can establish a simple KKT optimization algorithm which run through given linear systems, compute their analytical solution and stops when the solution appears non-negative. In other words, the proposed KKT algorithm always founds the required NN-(M)DOOLSE in at most steps. The algorithm is summarized in tab. 3, whereas the proof and details are in the Appendix.
As for computational complexity of the KKT optimization algorithm, all calculations of input have complexity , whereas the body of the algorithm (steps 1-3) is . Since fixed number of variance parameters in matrix is usually much smaller than (typically is of ), then the complete algorithm has the leading order with respect to .
It is the same as in the case of NE and generally typical only for computationally fastest analytical solutions of the KKT conditions.
Computational tools for orthogonal FDSLRM
In nonlinear optimization, there is a variety of highly efficient, fast and reliable open-source and commercial software packages (cornuejols_optimization_2018, , chap. 20). We have chosen one of the most well-known open source libraries for solving convex optimization tasks – CVXPY diamond_cvxpy:_2016 and its R version CVXR fu_cvxr:_2019 .
CVXPY is a scientific Python library but also a language with very simple, readable syntax not requiring any expertise in convex optimization and its PC implementation. CVXPY allows the user to specify the mathematical optimization problem naturally following normal mathematical notation as we can see in computing NN-DOOLSE (fig. 1) where a code easily mimics the non-negative DOOLSE mathematical formulation (4.10) with .
CVXPY, implements not only convex optimization solvers using interior-point numerical methods which are extremely reliable and fast, but also verifies convexity of the given problem using rules of disciplined convex programming grant_disciplined_2006 . In FDSLRM, interior-points numerical methods have complexity for one iteration or times bigger for the complete computation with a precision of the required optimal solution boyd_convex_2009 .
Remark 6
It is worth to mention that both CVXPY and CVXR are able to solve non-negative DOOLSE and MDOOLSE in a general FDSLRM without the orthogonality condition. CVXPY was inspired by MATLAB optimization package CVX grant_cvx:_2018 still used in many references, e.g. cornuejols_optimization_2018 . However, in CVXPY (CVXR) the user can easily combine convex optimization and simplicity of Python (R) language together with its high-level features such as object-oriented design or parallelism.
4.3 Gaussian orthogonal FDSLRM
In the LMM framework, the most usual distribution used in practice is represented by the multivariate normal (Gaussian) distribution. Therefore, in addition to the orthogonality of FDSLRM, we will require in this section the distributional assumption . We will refer to FDSLRM under these assumptions as to Gaussian orthogonal FDSLRM.
In the case of Gaussian orthogonal FDSLRM, we can add to our investigation last two estimation methods from tab. 2: maximum likelihood estimators (MLE) and residual maximum likelihood estimators (REMLE).
Maximum likelihood estimators – MLE, REMLE
Both MLE and REMLE of variances provide estimates maximizing ML and REML loglikelihood functions (logarithms of likelihoods). Using simple and clear arguments from Lamotte’s paper lamotte_direct_2007 , we can easily form the ML loglikelihood function and REML loglikelihood function , assuming general (not necessarily orthogonal) FDSLRM observation (2.1), as
[TABLE]
where are FDSLRM residuals, known as marginal residuals in LMM singer_graphical_2017 , is an arbitrary realization of the FDSLRM observation with covariance matrix , is BLUE of in MME (3.2) and is a generalized vector norm given by .
Remark 7
In Lammote’s paper lamotte_direct_2007 , which presents a direct derivation of REML likelihood function in LMM using familiar linear algebra operations, we can find that the formulation of ML and REML likelihood in LMM under the assumption of normality was originally done in Harville’s works harville_bayesian_1974 , harville_maximum_1977 . However, LMM references up to 2007 show the explicit form of REML likelihood for LMM rarely. In Štulajter’s FDSLRM reference works dealing with MLE stulajter_predictions_2002 , stulajter_estimation_2004 REML likelihood for FDSLRM is also absent. According to Lammote lamotte_direct_2007 , the reason why seems to be Harville’s too sophisticated, difficult and indirect derivation.
Maximum likelihood and residual maximum likelihood estimation of variance components in LMMs produce, in general, no analytical expressions for the estimators (searle_variance_2009, , chap. 6). They exist only for . According to (demidenko_mixed_2013, , sec. 2.5, thm 4) MLE and REMLE estimates do not exist in general FDSLRM with LMM observation (2.1) if and only if a realization of belongs to , linear span of columns and . This condition is also equivalent with .
In orthogonal FDSLRM, the MLE and REMLE can be rewritten as the optimization problems for (agrawal_rewriting_2018 , boyd_convex_2009 ) at given OLS residuals (for orthogonal FDSLRM , stulajter_estimation_2004 )
[TABLE]
[TABLE]
where and is the residual products matrix used in definitions (4.10), (4.14) of (M)DOOLSE.
Concerning convex optimization, (RE)MLE problems (4.29), (4.25) are generally not convex problems. However in orthogonal FDSLRM, according to (boyd_convex_2009, , sec. 4.1.3), using an appropriate bijective transformation, we can reformulate (RE)MLE as equivalent convex optimization problems (see Proposition 5).
Employing the theory of convex optimization, we can prove that equivalent (RE)MLE convex problems are strictly convex, for have always a unique solution, unique global minimizer satisfying corresponding KKT conditions (see Propositions 7, 6 and their proofs in the Appendix). These conditions are again necessary and sufficient for optimal solutions.
Combining all our theoretical results, we arrive to the main theoretical result of the section. It can be shown that for (or ) the KKT conditions for NN-(M)DOOLSE are equivalent with the KKT conditions for (RE)MLE and the bijective transformation. Or in other words, we can formulate the following theorem.
Theorem 4.1 (equivalence between NN-(M)DOOLSE and (RE)MLE)
In a Gaussian orthogonal FDSLRM non-negative (M)DOOLSE are almost sure equal to (RE)MLE, i.e.
[TABLE]
Proof
See the Appendix.
Computational tools for Gaussian orthogonal FDSLRM
The most important and useful consequence of Theorem 4.1 is that we can compute MLE or REMLE in orthogonal FDSLRM as quadratic NN-(M)DOOLSE. Therefore, we can use our KKT algorithm of order or computational tools CVXPY (CVXR) with described in the previous section. Moreover, for NN-(M)DOOLSE do not fail as (RE)MLE, but naturally extend them.
Since a Gaussian orthogonal FDSLRM is also Gaussian LMM, we can also apply current LMM packages. In our previous paper gajdos_kriging_2017 , we stated that no current package in R is directly and effectively suitable for FDSLRM. Thanks to a detailed study of demidenko_mixed_2013 , galecki_linear_2013 we were successfully directed and instructed how to implement the FDSLRM variance structure into one of the best R packages for LMM known as nlme (pinheiro_mixed-effects_2009 , pinheiro_nlme:_2018 ). After that, inspired by nlme, we found another R package called sommer, also suitable for FDSLRM fitting (covarrubias-pazaran_genome-assisted_2016 , cornuejols_optimization_2018 ).
Simultaneously, thanks to online computing environment CoCalc for SageMath stein_sage_2019 with the possibility to run computations and codes for free in many other programming languages and open softwares, we are also able to run and test MATLAB function mixed (witkovsky_mixed_2018 , witkovsky_estimation_2012 ) primarily intended for estimating variances in LMMs. On the basis of successful tests, we wrote its R version called MMEinR, which is also available in our GitHub Repository gajdos_mmeinr:_2019 .
Remark 8
The mentioned LMM packages can also handle (RE)MLE in the general FDSLRM without orthogonality restriction. The packages use iterative methods based on EM algorithm or Henderson MME together with some version of the Newton-Raphson method whose complexity is generally at least .
We are fully aware that another efficient implementation for estimating variances in LMMs provides lme4 package bates_fitting_2015 or SAS package PROC MIXED stroup_sas_2018 , sasinstituteinc._sas/stat_2018 . As for lme4, which is much faster than nlme, we found that the package does not allow implementing FDSLRM using lme4 standard input procedure.
Finally, as SAS laymans, we also tried a university edition of SAS, free from 2014, but we left it after running into initial problems with Windows installation process in a virtual machine environment and subsequently with a more sophisticated programming language than Python or R. Therefore our knowledge with SAS remained only at the theoretical level.
5 Application in real data examples
5.1 Three real data sets and their FDSLRMs
We illustrate the obtained theoretical results and the performance of the proposed EBLUP-NE method of variances in the FDSLRM (1.1) using three real data sets: electricity consumption, tourism and cyber security. For all data sets, we identified the most parsimonious structure of the FDSLRM using an iterative process of the model building and selection based on exploratory tools of spectral analysis and LMM theory. Details of our analysis and modeling can be found in our easy reproducible Jupyter notebooks freely available at our GitHub repository gajdos_fdslrm_2019 .
Electricity consumption
As the first real data example, we have the econometric time series data set, representing hourly observations of the consumption of the electric energy (in kWh) in an department store. The number of time series observations is . The data was adapted from stulajter_estimation_2004 . The consumption data can be fitted by the following Gaussian orthogonal FDSLRM stulajter_estimation_2004 :
[TABLE]
with and . Frequencies are suitable Fourier frequencies from the time series periodogram in spectral analysis (priestley_spectral_2004 , stulajter_estimation_2004 , gajdos_kriging_2017 ).
Since the time series data set contains only 24 observations and we have to estimate three regression and five variance parameters of the FDSLRM, this real data FDSLRM example should be considered only as a toy example. Taking two sets of Fourier frequencies or , we get two toy models previously introduced in gajdos_kriging_2017 and stulajter_estimation_2004 . These models give us the opportunity to check our numerical results and to demonstrate how in principle FDSLRM estimation methods and computational tools work.
Tourism
In this econometric FDSLRM application, we consider the time series data set, called visnights, representing total quarterly visitor nights (in millions) from 1998-2016 in one of the regions of Australia – inner zone of Victoria state. The number of time series observations is . The data was adapted from hyndman_forecasting:_2018 .
The Gaussian orthogonal FDSLRM fitting the tourism data has the following form (for details see our Jupyter notebook tourism.ipynb):
[TABLE]
with and .
Cyber attacks
Our final FDSLRM application describes the real time series data set representing the total weekly number of cyber attacks against a honeynet – an unconventional tool which mimics real systems connected to Internet, like business or school computers intranets, to study methods, tools and goals of cyber attackers. Data, taken from sokol_prediction_2018 , were collected from November 2014 to May 2016 in CZ.NIC honeynet consisting of Kippo honeypots in medium-interaction mode. The number of time series observations is .
The suitable FDSLRM, after a preliminary logarithmic transformation of data , is again Gaussian orthogonal (for details see our Jupyter notebook cyberattacks.ipynb) and in comparison with previous models (5.30), (5.31) has the simplest structure:
[TABLE]
with and .
5.2 Numerical results of estimating
For cross-checking purposes, we realized our numerical computations in both Python and R based software tools (or packages). As we described in previous sections, we implemented own algorithms and methods in SciPy, SageMath and R, particularly analytical expressions (4.1) for NE and the KKT algorithm for NN-(M)DOOLSE (tab. 3) in orthogonal FDSLRM.
Simultaneously, we confirmed the same estimations using CVXPY (or CVXR) package based on convex optimization and analogically using up-to-date standard LMM R packages nlme, MMEinR and sommer.
Detailed computational results for all three data sets using all corresponding relevant tools can be found and reproduced in our collection of Jupyter notebooks (asterisk * represents a name of data and specific computational tool): PY-estimation-.ipynb and R-estimation-.ipynb.
Table 4 summarizes all types of considered initial estimates and their corresponding EBLUP-NE in all four models. The computations also confirmed Theorem 4.1 therefore we wrote results for MLE and REMLE only.
Finally, we point out that thanks to SageMath and our very fast KKT algorithm, we were able to compute (in real time) NN-(M)-DOOLSE in toy examples with infinite precision – as the exact closed-form (algebraic) numbers. To get an explicit idea, e.g. for our first toy model, we got these results for NN-MDOOLSE identical with REMLE
[TABLE]
It means that we can compute real errors in results for all used computational tools to explore their quality from the viewpoint of numerical precision. At the same time, we also watched a run time for particular computational tools using Jupyter notebook extension ExecuteTime, which provided a preliminary comparison of real execution times.
6 Conclusions
We suggested and investigated an alternative, new method EBLUP-NE based on empirical BLUPs for estimating variances in time series modeled by FDSLRM. The estimation method can be also viewed, analogously like EBLUP (rao_small_2015 , witkovsky_estimation_2012 ), as a two-stage iterative method with one step in the iteration.
EBLUP-NE are invariant quadratic, non-negative estimators whose simple computational form and subsequent first and second-moment statistical properties are given by two special matrices – the Schur complement and Gram matrix zhang_schur_2005 . The method can be used not only in the case of normally distributed time series data, but for any absolutely continuous probability distribution of time series data.
As initial starting estimates for EBLUP-NE, we can principally employ any of the previously used methods based on least squares (NE, NN-DOOLSE, NN-MDOOLSE) or maximum likelihood (MLE, REMLE). Applications of FDSLRM with the EBLUP-NE on three real data sets (electricity consumption, tourism, cyber attacks) providing two toy and two real models indicate that the method computationally gives at least comparable results with REMLE or NN-MDOOLSE, but in faster run time (approximately 10-1000 times on the standard PC).
Due to lack of computational implementation for FDSLRM modeling, which would be generally available and readily applicable, and the fact that FDSLRM least squares and maximum likelihoods estimation procedures are more than 10 years old, we revisited and updated theoretical and computational knowledge dealing with these methods. Specifically, applying the convex optimization theory, we reformulated all estimation methods as convex optimization problems in the so-called orthogonal FDSLRM, the most usual form of FDSLRM used in practice.
We formulated the KKT optimality conditions, which, unlike likelihood equations, are necessary and sufficient conditions for optimal solutions of (RE)MLE or NN-(M)DOOLSE on extended parametric space . KKT optimality conditions dictate not only the exact existence conditions of estimates but they also solve the well-known problem dealing with standardly used likelihood equations (christensen_plane_2011, , chap. 12), (searle_variance_2009, , chap. 6) in LMM where their solutions for (RE)MLE or NN-(M)DOOLSE may not be required estimates — they may be out of the parameter space or they may be other than the maximum or minimum.
Moreover, using KKT optimality conditions in orthogonal FDSLRM, we proved the equivalence of NN-(M)DOOLSE and (RE)MLE with propability 1. This most important theoretical result of the paper is the stronger and more general result than in stulajter_estimation_2004 proved only for interior points of .
Simultaneously, the convex optimization theory brought us to the new KKT algorithm for computing NN-(M)DOOLSE, equivalent to (RE)MLE, with double floating-point precision as the default precision of outputs and with computational complexity . Such an algorithm which we implemented in SciPy, SageMath and R, which at the default precision level is times more accurate and approximately faster than the best current Python- or R-based standard computational tools, can be used in effective computational time series research to study properties of FDSLRM (Monte Carlo and bootstrap methods, kreiss_bootstrap_2012 ).
Regarding computational aspects, we were also successful in the identification and demonstration of consistent results for the real data applications of FDSLRM in several free, open-source current standard computational tools — namely CVXPY, CVXR (R version of CVXPY) packages for convex optimization and LMM R packages nlme, sommer and MMEinR (our R version of Witkovsky’s MATLAB mixed function). These results and procedures can be freely viewed in our 16 Jupyter notebooks which are easily readable, sharable, reproducible and modifiable directly in our GitHub repository (gajdos_fdslrm_2019 ). Open-source Jupyter technology with Python and R packages also solved our problem stated in gajdos_kriging_2017 that no current package in R was directly and effectively suitable for FDSLRM.
Finally, our investigation has also brought new questions for further research. There is definitely a need for more exact and detailed analysis based on a simulation study and general EBLUP theory (zadlo_mse_2009 , witkovsky_estimation_2012 , rao_small_2015 ) focusing on EBLUP-NE quality with respect to previously used estimation methods and the performance of the EBLUP-NE method itself with different initial starting points, using different computational tools in various probability distributions. These research questions are under our investigation and will be published in the near future.
In connection with our current computational research in FDSLRM, but also for real time series data analysis and forecasting, we started to build our own R package (see a preliminary, fully functional version at gajdos_fdslrmR_2019 ) on mentioned LMM R packages to manipulate readily with FDSLRM concepts and procedures.
Finally, our results in the paper can be seen reciprocally as contributions to convex optimization and LMM methodology. Particularly, our convex optimization application in the context of time series modeling has become another one from a wide variety of application areas of convex optimization boyd_convex_2009 . Since FDSLRM describing observed time series values is also a special type of LMM, our EBLUP-NE and the very fast, accurate KKT algorithm to compute (RE)MLE may have potential to be used in computational research and applications dealing with LMMs.
Acknowledgements.
We are very grateful especially to František Štulajter (Comenius University in Bratislava, SK) for his guidance during our early contact and study of FDSLRM concepts and to Viktor Witkovský (Slovak Academy of Science, SK) for his recommendations and deep insights dealing with the LMM methodology and its computational tools connected to FDSLRM. We would also like to thank Ivan Žezula and Daniel Klein (both from P. J. Šafárik University in Košice, SK) for a valuable feedback dealing with considered estimation methods. Concerning applied computational tools, we would also like to acknowledge involvement and recommendations of Giovanny Covarrubias-Pazaran (University of Wisconsin, US) in using sommer package, Steve Diamond and Stephen Boyd (Stanford University, US) in using CVXPY and CVXR. Finally, in connection with using SageMath, Jupyter and GitHub, our specials thanks go namely to William Stein (University of Washington, US), Harald Schilly (Universität Wien, AT), Erik Bray (Université Paris-Sud, FR), Luca de Feo (Université de Versailles Saint-Quentin-en-Yvelines, FR) and Charles J. Weiss (Augustana University, US).
Appendix
Acronyms and abbreviations
The BLUP-NE method
Proof (Theorem 3.1)
Employing Lemma 1 and the well-known standard expressions for mean values and covariances of invariant quadratic estimators (see e.g. christensen_plane_2011 , puntanen_formulas_2013 )) and if then , we have
.
According to (3) of Lema 1 . 2.
As a consequence of (1) . 3.
In a similar way as in (2)
. 4.
.
∎
Existence of inversions for ,
To prove nonsingularity of , , it is sufficient to show that both matrices have non-zero determinants. Using idempotence of and expression (searle_matrix_2017, , sec. 6.8)
[TABLE]
we can write for determinants of ,
[TABLE]
Now we can see that the sum of positive definite matrix and each of positive semidefinite matrices is a positive definite matrix, whose determinant is always positive.
Double least squares estimators NN-(M)DOOLSE
Using the standard inner product of matrices defined as , generating the Euclidean norm of a matrix, and basic properties of the trace function, we can easily rewrite expressions (4.10), (4.14) for the objective functions as quadratic forms. The particular form of these quadratic forms in optimization problems (M)DOOLSE is described by the following proposition.
Proposition 2 (NN-(M)DOOLSE as quadratic optimization problems)
In an orthogonal FDSLRM the NN-(M)DOOLSE problems are convex quadratic optimization problems in the form
[TABLE]
\qquad\boldsymbol{q}=\left(\begin{array}[]{c}\boldsymbol{e}^{\prime}\boldsymbol{e}\\ (\boldsymbol{e}^{\prime}\boldsymbol{v}_{1})^{2}\\ (\boldsymbol{e}^{\prime}\boldsymbol{v}_{2})^{2}\\ \vdots\\ (\boldsymbol{e}^{\prime}\boldsymbol{v}_{l})^{2}\end{array}\right)*, \quad\mathrm{G}=\left(\begin{array}[]{ccccc}n^{*}&\left\|\boldsymbol{v}_{1}\right\|^{2}&\left\|\boldsymbol{v}_{2}\right\|^{2}&\ldots&\left\|\boldsymbol{v}_{l}\right\|^{2}\\ \left\|\boldsymbol{v}_{1}\right\|^{2}&\left\|\boldsymbol{v}_{1}\right\|^{4}&0&\ldots&0\\ \left\|\boldsymbol{v}_{2}\right\|^{2}&0&\left\|\boldsymbol{v}_{2}\right\|^{4}&\ldots&0\\ \vdots&\vdots&\vdots&\ldots&\vdots\\ \left\|\boldsymbol{v}_{l}\right\|^{2}&0&0&\ldots&\left\|\boldsymbol{v}_{l}\right\|^{4}\end{array}\right)\succ 0
where for *NN-DOOLSE: * , *NN-MDOOLSE: ,
According to the fundamental KKT theorem of convex optimization (boyd_convex_2009, , chap.5) which handles convex optimization problems with constraints in the form of inequalities giving a list of the so-called Karush-Kuhn-Tucker (KKT) optimality conditions, we consider our optimization problem NN-(M)DOOLSE in the Lagrange multiplier form with Lagrangian as a sum of the objective function and a linear combination of multipliers and constraints .
Then KKT optimality conditions for (6.37) become necessary and sufficient conditions of optimality, satisfied at any local optimal solution, being represented by three sets of conditions: (1) primal and dual feasibility: , , (2) stationarity of the Lagrangian: and (3) complementary slackness: . Computing the gradient of the Lagrangian, we arrive at the following proposition.
Proposition 3 (KKT optimality conditions for NN-(M)DOOLSE)
In an orthogonal FDSLRM consider the non-negative NN-(M)DOOLSE problems in the Lagrange multiplier form with the Lagrangian
[TABLE]
Then, a necessary and sufficient condition for to be problems’ optimal solution is
** 2.
** 3.
.
As for the existence of optimal solutions, we can apply the well-known basic results for minimizing quadratic forms (bertsekas_convex_2009, , chap. 3). Since the Hessian of (6.37) equal to is a positive definite matrix, like , then is strictly convex and also coercive proper function. These two properties are sufficient conditions for NN-(M)DOOLSE (bertsekas_convex_2009, , Weierstrass’ theorem, p. 119) to have always a unique global optimal solution .
Employing the familiar Bessel’s inequality , where equality holds if and only if , it is also not difficult to see that KKT conditions (1-3) rewritten as the following system
[TABLE]
imply an optimal solution with if and only if the vector of OLS residuals belongs to the column space of . Since probability of is zero, our existence conclusions can be summarized in the next proposition.
Proposition 4 (Existence of NN-(M)DOOLSE)
In an orthogonal FDSLRM the following holds
NN-(M)DOOLSE* problems are strictly quadratic optimization problems.* 2.
Their solutions always exist and they are unique global minimizers. 3.
. 4.
.
KKT algorithm for NN-(M)DOOLSE
If we consider which occurs if and only if in KKT conditions (6.38)-(6.40) then the optimal solution for NN-(M)DOOLSE is trivial: .
For implying , and cannot be simultaneous zero otherwise it would lead to contradiction between (6.38) and (6.39). Therefore, in the case , the complementary slackness condition (3) can be rewritten in the form , where is an auxiliary indicator, which is zero if and one if .
Using vector , the derived KKT conditions (2) in Proposition 3 can be described by a matrix function and a vector function as
[TABLE]
where
[TABLE]
Applying the Banachiewicz formula, we can write for the inverse of the following analytic expression
[TABLE]
where
, ,
, .
Finally, Proposition 4 guarantees the existence of unique auxiliary vectors :
[TABLE]
[TABLE]
Based on vectors , the NN-(M)DOOLSE of as a solution of KKT conditions has the final form
[TABLE]
Thanks to Proposition 4, it is worth to mention that the matrix system (6.41) includes also solutions with .
Maximum likelihood estimators (RE)MLE
Generally, (RE)MLE in FDSLRM are not convex optimization problems with respect to and they do not exist for which occurs if and only if . If we apply the following bijective transformation defined on
[TABLE]
we can convert the (RE)MLE problems in the form of equivalent convex problems whose solutions can be readily converted to a (RE)MLE solutions by the inverse transformation to (6.42). Here are more detailed steps of the conversion.
In orthogonal FDSLRM , (stulajter_mse_2003, , lem 2.1) and are equal to
[TABLE]
Using orthogonality conditions (3.9) and expression (6.33), we get for determinants in (RE)ML loglikelihoods (4.25), (4.29)
[TABLE]
Since the chosen bijective transformation (6.42) also transforms convex constraints for to convex constraints for , substituting (6.43), (6.44) into objective functions (4.25), (4.29) of (RE)MLE, we can formulate the following proposition.
Proposition 5 (Equivalent (RE)MLE convex problems)
Let assume and consider a bijective transformation in the following form:
[TABLE]
Then, in a Gaussian orthogonal FDSLRM the (RE)MLE problems are equivalent to convex problems:
[TABLE]
where for and .
Once again, we can write the equivalent (RE)MLE problem with constraints in the corresponding Lagrangian multiplier form based on the following Lagrangian with multipliers
[TABLE]
Then by the direct computation, we obtain KKT conditions for equivalent (RE)MLE describing (1) primal and dual feasibility for and , (2) stationarity of the Lagrangian () and (3) complementary slackness, all described by the next proposition.
Proposition 6 (KKT optimality conditions for equivalent (RE)MLE)
Consider equivalent convex optimization problems to (RE)MLE in the Lagrangian multiplier form
[TABLE]
Then, a necessary and sufficient condition for to be problems’ optimal solution is
** 2.
* *
** 3.
.
Similarly as in the case of NN-(M)-DOOLSE, the Hessian equal to
[TABLE]
leads to strict convexity of the problem. Since is also coercive, we can summarize the existence conditions of optimal solutions in the final proposition as we did for NN-(M)DOOLSE problems.
Proposition 7 (Existence of equivalent (RE)MLE)
Let assume . Then in a Gaussian orthogonal FDSLRM the following holds
Equivalent (RE)MLE problems are strictly convex optimization problems. 2.
The objective function is coercive with respect to constraints. 3.
Their solutions always exist and they are unique global minimizers.
Proof (Theorem 4.1)
The bijection (6.42) implies which dictates to satisfy the complementary slackness conditions in KKT (3) and simultaneously forms the new version of KKT (1-3) of Proposition 6
[TABLE]
But for , which occurs with probability 1, the system (6.49)-(6.51) is the same as the system (6.38)-(6.40), which is what we set out to prove.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1) Agrawal, A., Verschueren, R., Diamond, S., Boyd, S.: A rewriting system for convex optimization problems. Journal of Control and Decision 5 (1), 42–60 (2018)
- 2(2) Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting Linear Mixed-Effects Models Using lme 4. Journal of Statistical Software 67 (1), 1–48 (2015)
- 3(3) Beezer, R.A., Bradshaw, R., Grout, J., Stein, W.A.: Sage. In: L. Hogben (ed.) Handbook of Linear Algebra, 2nd edn., pp. 91–1–91–26. Chapman and Hall/CRC, Boca Raton (2013)
- 4(4) Bertsekas, D.P.: Convex Optimization Theory, 1st edn. Athena Scientific, Belmont (2009)
- 5(5) Box, G.E.P., Jenkins, G.M., Reinsel, G.C., Ljung, G.M.: Time Series Analysis: Forecasting and Control, 5th edn. Wiley, Hoboken (2015)
- 6(6) Boyd, S., Vandenberghe, L.: Convex Optimization. 7th printing. Cambridge University Press, Cambridge (2009)
- 7(7) Boyd, S., Vandenberghe, L.: Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Cambridge University Press, Cambridge (2018)
- 8(8) Brieulle, L., De Feo, L., Doliskani, J., Flori, J.P., Schost, É.: Computing isomorphisms and embeddings of finite fields. Math. Comp. 88 (317), 1391–1426 (2019)
