Covariance structure associated with an equality between two general ridge estimators
Koji Tsukuda, Hiroshi Kurata

TL;DR
This paper establishes a necessary and sufficient condition for when two general ridge estimators are equal in a linear model, based on the error dispersion matrix, extending classical results on estimator equality.
Contribution
It generalizes existing theorems by characterizing the covariance structure that ensures the equality of two broad classes of ridge estimators.
Findings
Derived a condition based on the dispersion matrix for estimator equality
Extended classical theorems to a broader class of estimators
Explored related problems on residual sums of squares and matrix classification
Abstract
In a general linear model, this paper derives a necessary and sufficient condition under which two general ridge estimators coincide with each other. The condition is given as a structure of the dispersion matrix of the error term. Since the class of estimators considered here contains linear unbiased estimators such as the ordinary least squares estimator and the best linear unbiased estimator, our result can be viewed as a generalization of the well-known theorems on the equality between these two estimators, which have been fully studied in the literature. Two related problems are also considered: equality between two residual sums of squares, and classification of dispersion matrices by a perturbation approach.
Koji Tsukuda (Graduate School of Arts and Sciences, the University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan; mail: [email protected]) and Hiroshi Kurata (Graduate School of Arts and Sciences, the University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan; mail: [email protected])
MSC 2010. Primary: 62J05. Secondary: 62F10, 62J07.
Key words and phrases: Best linear unbiased estimator; General linear model; Least squares estimator; Perturbation approach
1 Introduction
In a general linear model, this paper derives a necessary and sufficient condition under which two general ridge estimators coincide with each other. To state the problem more precisely, let us consider
[TABLE]
where is an vector, is an matrix () satisfying , is an unknown positive constant and is a known positive definite matrix. As is well-known, the estimator of the form
[TABLE]
which will be called the Gauss–Markov estimator in the sequel, is the best linear unbiased estimator of , that is, it has the smallest covariance matrix (in terms of positive semidefiniteness) among linear unbiased estimators. This estimator is also optimal with respect to the following quadratic risk functions:
[TABLE]
where is an arbitrary positive semidefinite matrix. However, if we broaden the class of estimators to that of linear but not necessarily unbiased estimators, it is no longer optimal and general ridge estimators play an essential role instead. Here, a general ridge estimator is defined to be an estimator of the form
[TABLE]
(Rao (1976)), where and denote the sets of positive definite and positive semidefinite matrices, respectively. As proved by Rao (1976) and Markiewicz (1996), the general ridge estimators are linearly sufficient and linearly admissible, and conversely, any linearly sufficient and linearly admissible estimator belongs to the class of general ridge estimators. Moreover, they are linearly complete. For other properties of general ridge estimators, see, for example, Arnold and Stahlecker (2000), Gross (1998) and Groß and Markiewicz (2004).
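To make the definition concrete, here is a minimal NumPy sketch. Because the display is not reproduced in this copy, it assumes the form usually attributed to Rao (1976), namely $(X'\Psi^{-1}X + K)^{-1}X'\Psi^{-1}y$ with $\Psi$ positive definite and $K$ positive semidefinite. The function name `general_ridge`, the symbols and the simulated data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def general_ridge(X, y, Psi, K):
    """Assumed form: (X' Psi^{-1} X + K)^{-1} X' Psi^{-1} y."""
    Psi_inv_X = np.linalg.solve(Psi, X)            # Psi^{-1} X (Psi is symmetric)
    lhs = X.T @ Psi_inv_X + K                      # X' Psi^{-1} X + K
    return np.linalg.solve(lhs, Psi_inv_X.T @ y)   # coefficient vector

rng = np.random.default_rng(0)                     # illustrative simulated data
n, k = 8, 3
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)
B = rng.standard_normal((n, n))
Omega = B @ B.T + np.eye(n)                        # a hypothetical positive definite dispersion matrix

# Special cases under the assumed form:
beta_ols = general_ridge(X, y, np.eye(n), np.zeros((k, k)))    # Psi = I, K = 0
beta_gm = general_ridge(X, y, Omega, np.zeros((k, k)))         # Psi = Omega, K = 0
beta_ridge = general_ridge(X, y, np.eye(n), 0.5 * np.eye(k))   # Psi = I, K = 0.5 I
```

Under this assumed form, setting $K = 0$ recovers the unbiased members of the class, while a nonzero $K$ shrinks the estimate toward the origin.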
On the other hand, it is also well-known that there are some cases in which two linear unbiased estimators coincide with each other. Perhaps most important is the one in which the Gauss–Markov estimator is identically equal to the ordinary least squares estimator , which does not depend on . Conditions for the equality between the two estimators have been studied by many authors (see, for example, Baksalary and Trenkler (2009), Chapter 7 of Kariya and Kurata (2004), Puntanen and Styan (1989) and Zyskind (1967)). Among others, Rao (1967) proved that for a given , the equality holds for all if and only if is of the form
[TABLE]
where is an matrix satisfying and , and will be fixed throughout. In this paper, we discuss an identical equality between two general ridge estimators. More precisely, we derive a necessary and sufficient condition for to guarantee that, for given , the equality
[TABLE]
holds. This result, which will be presented in Section 2, can be regarded as an extension of (1.3), since the class of general ridge estimators includes the Gauss–Markov and the ordinary least squares estimators. Indeed we can readily see
[TABLE]
The class also contains the ordinary ridge estimators and with and shrinkage estimators of the form and with .
In Sections 3 and 4, two related problems are considered. The first is the problem of deriving a condition on under which an identical equality between two generalized residual sums of squares holds. To state it precisely, let
[TABLE]
Then the ordinary residual sum of squares and its Gauss–Markov version are given respectively by
[TABLE]
and
[TABLE]
In the literature, Kariya (1980) derived a necessary and sufficient condition under which in the context of estimation of . He also derived a condition for the two equalities and to hold simultaneously. The latter result was generalized by Kurata (1998). See also Groß (1997). In this paper, we generalize their result by considering the case in which
[TABLE]
hold for given and . Needless to say, the above equalities do not generally hold. Moreover, Kurata (1998) used
[TABLE]
to measure the extent to which deviates from (1.3). In Section 4, we extend his result to the case including general ridge estimators.
As has been widely recognized, the simple ordinary ridge estimator performs better in practice than the ordinary least squares estimator when the explanatory variables are multicollinear (Hoerl and Kennard (1970)). Moreover, some previous works such as Frank and Friedman (1993) have reported that works well in many cases. Hence, it is also valuable to discuss the case from the practical viewpoint.
2 Equality between two general ridge estimators
In this section, we derive a necessary and sufficient condition for the dispersion matrix to guarantee an identical equality between two general ridge estimators. We use the fact that the condition (1.3) is equivalent to .
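The equivalence used here can be checked numerically. The sketch below is hedged: since the displays are missing in this copy, it assumes that (1.3) is the Rao-type structure $\Omega = X\Gamma X' + Z\Delta Z'$ with $X'Z = 0$, and that the equivalent condition is $X'\Omega^{-1}Z = 0$. The helper names `random_pd` and `gauss_markov` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 2
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

# Z: orthonormal basis of the orthogonal complement of the column space of X.
Q, _ = np.linalg.qr(X, mode="complete")
Z = Q[:, k:]

def random_pd(m):
    """Hypothetical helper: a random m x m positive definite matrix."""
    B = rng.standard_normal((m, m))
    return B @ B.T + np.eye(m)

def gauss_markov(X, y, Omega):
    """(X' Omega^{-1} X)^{-1} X' Omega^{-1} y."""
    W = np.linalg.solve(Omega, X)                 # Omega^{-1} X
    return np.linalg.solve(X.T @ W, W.T @ y)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Assumed Rao-type structure: Omega = X Gamma X' + Z Delta Z'.
Omega_rao = X @ random_pd(k) @ X.T + Z @ random_pd(n - k) @ Z.T
cross_rao = X.T @ np.linalg.solve(Omega_rao, Z)   # X' Omega^{-1} Z
beta_gm_rao = gauss_markov(X, y, Omega_rao)

# A generic positive definite Omega without that structure.
Omega_gen = random_pd(n)
cross_gen = X.T @ np.linalg.solve(Omega_gen, Z)
beta_gm_gen = gauss_markov(X, y, Omega_gen)
```

In this example the cross term vanishes and the Gauss–Markov and ordinary least squares estimates agree for the structured matrix, while neither happens for the generic one.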
Theorem 1.
For , the equality holds if and only if the dispersion matrix is of the form (1.3) with some and satisfying
[TABLE]
Proof.
The equality can be rewritten as
[TABLE]
which is further equivalent to the following two equalities:
[TABLE]
and
[TABLE]
since and the matrix is nonsingular. As is remarked in Section 1, the condition (2.2) is equivalent to (1.3), which can also be expressed as
[TABLE]
Substituting it into (2.3) shows that, under (2.2), the condition (2.3) is equivalent to
[TABLE]
This completes the proof. ∎
Using Theorem 1 with , we have the following corollary.
Corollary 2.
For , the equality holds if and only if is of the form (1.3) with satisfying In particular, if is nonsingular, then for some .
Remark 1.
Let . Then Corollary 2 implies that is equivalent to for some .
Remark 2.
Suppose that satisfies (1.3). Let and with . Then the two matrices satisfy the condition (2.1). In fact, by (3.5), we see that and hence , implying . Thus Theorem 1 applies and hence the equality holds. More specifically, the condition (1.3) is necessary and sufficient for . This conclusion itself is obvious from the forms of the shrinkage estimators.
Next we clarify when there exists satisfying the condition (2.1) for given . For this purpose, let
[TABLE]
Needless to say, and are in one-to-one correspondence with and , respectively.
Proposition 3.
There exists satisfying (2.1) if and only if and satisfy
[TABLE]
where denotes the range of . In this case, is of the form
[TABLE]
where denotes the Moore–Penrose inverse of .
Proof.
Suppose first that the condition (2.1) holds, which is equivalent to
[TABLE]
Here, the two matrices on the left-hand side commute, i.e., , since is symmetric. Due to the nonsingularity of , the matrices ’s must satisfy . Hence, by letting , they can be commonly expressed as
[TABLE]
with and a matrix satisfying . Since , we can write for some , and a matrix satisfying and . Furthermore, and can be taken as diagonal matrices. Hence the equality implies which can be rewritten as . Therefore, must also be diagonal, which implies that and commute. Thus we have (2.5) and
[TABLE]
where the last expression is equivalent to (2.6), since and .
Conversely, suppose that (2.5) holds. Then, by letting as in (2.6), we have
[TABLE]
since implies . This shows the existence of that satisfies (2.7), which is equivalent to (2.1). This completes the proof. ∎
Remark 3.
When is unknown, it is often assumed that to make the model identifiable (Kariya (1980)). In this case, in order that holds, the matrices and should satisfy
[TABLE]
as well as (2.1). In particular, when and are positive definite, the matrix should satisfy
[TABLE]
3 Equality between residual sums of squares
In this section, we discuss a condition under which the identical equality
[TABLE]
holds in addition to , where the general residual sum of squares is defined in (1.4). To make notations simpler, let us denote
[TABLE]
where is positive definite and is positive semidefinite.
Theorem 4.
For , the two identical equalities
[TABLE]
simultaneously hold if and only if the following three conditions
[TABLE]
hold for some .
Proof.
From Theorem 1, is equivalent to with . In this case, it holds that
[TABLE]
and . This implies that
[TABLE]
with given in (3.1), and
[TABLE]
Thus the problem is to find a condition under which holds for arbitrary , which is clearly equivalent to . The quantities and are calculated respectively as
[TABLE]
and
[TABLE]
where (3.5) is used. Since
[TABLE]
holds, the equality
[TABLE]
is equivalent to
[TABLE]
and
[TABLE]
The equality (3.6) can be rewritten as
[TABLE]
and (3.7) is the same as
[TABLE]
This completes the proof. ∎
Remark 4.
The above theorem can be viewed as an extension of the Corollary of Kariya (1980), in which it is shown that (3.2) is a necessary and sufficient condition under which and simultaneously hold. In fact, in Theorem 4, let . Then the conditions (3.3) and (3.4) vanish, since they hold for all . Hence, the conditions in the above theorem reduce to (3.2).
Corollary 5.
Let . The two equalities and simultaneously hold if and only if .
Proof.
Letting , we will use Theorem 4. From (3.3), . Since which yields (3.4), Theorem 4 implies that both and hold if and only if
[TABLE]
This completes the proof. ∎
Remark 5.
Let . Then Corollary 5 implies that the two equalities and simultaneously hold if and only if .
4 Classification criterion of dispersion matrices
As observed in the previous sections, the condition in Theorem 1 on rarely holds, and hence the estimators and do not coincide in most cases. In the context of comparing the Gauss–Markov and the ordinary least squares estimators, Kurata (1998) used
[TABLE]
as a criterion to measure the difference between them. The rank ranges from $0$ to and equals zero if and only if is of the form (1.3). Hence this criterion can also be regarded as a measure of the extent to which the structure of deviates from (1.3), or equivalently, a criterion to classify . This section is devoted to deriving a generalization of his result to the case including general ridge estimators.
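The display (4.1) is not reproduced in this copy, so the following sketch does not claim to implement Kurata's exact criterion. It illustrates, under stated assumptions, two closely related rank quantities: the rank of $\mathrm{Cov}(\text{OLS}) - \mathrm{Cov}(\text{Gauss–Markov})$ and the rank of $X'\Omega^{-1}Z$. Both are zero for an assumed Rao-type $\Omega = X\Gamma X' + Z\Delta Z'$ and become one after adding a rank-one cross term. For simplicity $X$ and $Z$ are taken with orthonormal columns; all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X, Z = Q[:, :k], Q[:, k:]     # orthonormal columns with X'Z = 0 (simplifying assumption)

def cov_gap_rank(Omega, tol=1e-8):
    """Rank of Cov(OLS) - Cov(Gauss-Markov), up to the factor sigma^2."""
    XtX_inv = np.linalg.inv(X.T @ X)
    cov_ols = XtX_inv @ X.T @ Omega @ X @ XtX_inv
    cov_gm = np.linalg.inv(X.T @ np.linalg.solve(Omega, X))
    return int(np.linalg.matrix_rank(cov_ols - cov_gm, tol=tol))

def cross_rank(Omega, tol=1e-8):
    """Rank of X' Omega^{-1} Z."""
    return int(np.linalg.matrix_rank(X.T @ np.linalg.solve(Omega, Z), tol=tol))

G = rng.standard_normal((k, k))
D = rng.standard_normal((n - k, n - k))
Omega_rao = (X @ (G @ G.T + 2 * np.eye(k)) @ X.T
             + Z @ (D @ D.T + 2 * np.eye(n - k)) @ Z.T)

# Rank-one cross term with spectral norm 0.5, so Omega_pert stays positive definite
# (the smallest eigenvalue of Omega_rao is at least 2 in this construction).
Xi = np.zeros((k, n - k))
Xi[0, 0] = 0.5
Omega_pert = Omega_rao + X @ Xi @ Z.T + Z @ Xi.T @ X.T

ranks_rao = (cov_gap_rank(Omega_rao), cross_rank(Omega_rao))
ranks_pert = (cov_gap_rank(Omega_pert), cross_rank(Omega_pert))
```

An explicit tolerance is passed to `np.linalg.matrix_rank` because the default tolerance scales with the largest singular value and would misreport the rank of a matrix that is zero up to rounding error.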
Since the quantity (4.1) is the same as
[TABLE]
it is natural to use the rank of the difference matrix
[TABLE]
as a measure that is applicable to general ridge estimators. Since we have
[TABLE]
where
[TABLE]
is the mean square error matrix (see, for example, (3.1) of Gross (1998)), it is also natural to use the rank of
[TABLE]
We adopt the above two quantities in the sequel.
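As a hedged illustration of the second quantity, the sketch below computes the mean square error matrix of a linear estimator $Ay$ through the standard decomposition $\sigma^2 A\Omega A' + bb'$ with bias $b = (AX - I)\beta$, and then the rank of a difference of two such matrices. The exact normalization used in (4.3) is not visible in this copy; the general ridge form $A = (X'\Psi^{-1}X + K)^{-1}X'\Psi^{-1}$, the function names and the numerical values are assumptions for illustration.

```python
import numpy as np

def ridge_matrix(X, Psi, K):
    """Matrix A such that the estimator is A y (assumed general ridge form)."""
    W = np.linalg.solve(Psi, X)                     # Psi^{-1} X
    return np.linalg.solve(X.T @ W + K, W.T)        # (X'Psi^{-1}X + K)^{-1} X'Psi^{-1}

def mse_matrix(A, X, Omega, beta, sigma2=1.0):
    """E[(Ay - beta)(Ay - beta)'] = sigma^2 A Omega A' + bias bias'."""
    bias = (A @ X - np.eye(X.shape[1])) @ beta
    return sigma2 * A @ Omega @ A.T + np.outer(bias, bias)

rng = np.random.default_rng(3)
n, k = 8, 3
X = rng.standard_normal((n, k))
B = rng.standard_normal((n, n))
Omega = B @ B.T + np.eye(n)                         # hypothetical dispersion matrix
beta = np.array([1.0, -2.0, 0.5])                   # hypothetical true coefficients

A_gm = ridge_matrix(X, Omega, np.zeros((k, k)))     # unbiased case: A X = I
A_ridge = ridge_matrix(X, np.eye(n), 0.5 * np.eye(k))

M_gm = mse_matrix(A_gm, X, Omega, beta)
M_ridge = mse_matrix(A_ridge, X, Omega, beta)
diff_rank = int(np.linalg.matrix_rank(M_ridge - M_gm, tol=1e-8))
```

For an unbiased member of the class the bias term vanishes, so its mean square error matrix coincides with its covariance matrix. For a biased member the result depends on the unknown coefficient vector, which is why a value of `beta` has to be supplied here.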
However, since it is in general not easy to analyze them unless , we limit our consideration to the case in which both and are small. More precisely, we fix , in and use the perturbation approach by letting , with a small positive constant . Note that can be expressed as
[TABLE]
for some , and .
Theorem 6.
Fix and , and write as in (4.4). Consider general ridge estimators and with
[TABLE]
and a positive constant satisfying
[TABLE]
where denotes a matrix norm. Then the quantities in (4.2) and in (4.3) are evaluated as
[TABLE]
and
[TABLE]
respectively, as .
Proof.
First we prove (4.5). Clearly, is equal to
[TABLE]
As for the first term of (4.7), we have
[TABLE]
The four terms on the right-hand side are further calculated as
[TABLE]
respectively. As for the second term of (4.7), we obtain
[TABLE]
and
[TABLE]
From the definition of matrix functions, equations (4.8)–(4.11) are evaluated as
[TABLE]
respectively. On the other hand, from the definition of matrix functions again, it holds that and , which implies that
[TABLE]
Thus we have
[TABLE]
Next we prove (4.6). Clearly, is equal to
[TABLE]
As for the first and second terms of (4.12), it holds that
[TABLE]
and
[TABLE]
Moreover, recall that and . From the definition of matrix functions, it follows that
[TABLE]
and hence
[TABLE]
which is equal to
[TABLE]
This completes the proof. ∎
Remark 6.
Since
[TABLE]
the quantity
[TABLE]
can be written in original notation as
[TABLE]
If is of the form (1.3), then (4.13) is simplified as
[TABLE]
In particular, when , the above quantity is further reduced to
[TABLE]
If we consider estimators such that and , then the matrices and are given by and , where the constant is absorbed into . In this case, the quantity (4.13) takes the form
[TABLE]
From Theorem 6, when is small, the major part of the deviation of the simple estimator from the good estimator is characterized by the first term
[TABLE]
Moreover, when (i.e., is of the form (1.3)), the first term vanishes and the second term becomes
[TABLE]
which is the coefficient of . If further , then holds and hence also vanishes, implying and .
From this observation, if both and are small, then the criterion proposed by Kurata (1998) still works even in the case including general ridge estimators. As its extension, based on (4.6), we propose the following two-step criterion for classification of dispersion matrices:
1. Classify according to
[TABLE]
2. Make a finer classification by using
[TABLE]
Remark 7.
When , then and our criterion reduces to that of Kurata (1998).
Remark 8.
Let . Then and . As is seen in this example, even if we consider the same dispersion matrix, classification may vary according to the choice of and . Besides, when , and , then .
Remark 9.
If is of the form (1.3), that is , then
[TABLE]
Remark 10.
When and with ,
[TABLE]
Acknowledgments
Part of the second author’s work was supported by Japan Society for the Promotion of Science KAKENHI Grant Number JP26330035.
This is a pre-print of an article published in Statistical Papers. The final authenticated version is available online at: https://doi.org/10.1007/s00362-017-0975-8.
References
- Arnold BF, Stahlecker P (2000) Another view of the Kuks–Olman estimator. Journal of Statistical Planning and Inference 89, no. 1–2, 169–174.
- Baksalary OM, Trenkler G (2009) A projector oriented approach to the best linear unbiased estimator. Statistical Papers 50, no. 4, 721–733.
- Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35, no. 2, 109–135.
- Groß J (1997) A note on equality of MINQUE and simple estimator in the general Gauss–Markov model. Statistics & Probability Letters 35, no. 4, 335–339.
- Gross J (1998) On contractions in linear regression. Journal of Statistical Planning and Inference 74, no. 2, 343–351.
- Groß J (2004) The general Gauss–Markov model with possibly singular dispersion matrix. Statistical Papers 45, no. 3, 311–336.
- Groß J, Markiewicz A (2004) Characterizations of admissible linear estimators in the linear model. Linear Algebra and its Applications 388, 239–248.
- Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, no. 1, 55–67.
