Covariance structure associated with an equality between two general ridge estimators

Koji Tsukuda; Hiroshi Kurata

arXiv:1705.02761 · math.ST · March 29, 2022

TL;DR

This paper establishes a necessary and sufficient condition for when two general ridge estimators are equal in a linear model, based on the error dispersion matrix, extending classical results on estimator equality.

Contribution

It generalizes existing theorems by characterizing the covariance structure that ensures the equality of two broad classes of ridge estimators.

Findings

1. Derived a condition based on the dispersion matrix for estimator equality
2. Extended classical theorems to a broader class of estimators
3. Explored related problems on residual sums of squares and matrix classification

Abstract

In a general linear model, this paper derives a necessary and sufficient condition under which two general ridge estimators coincide with each other. The condition is given as a structure of the dispersion matrix of the error term. Since the class of estimators considered here contains linear unbiased estimators such as the ordinary least squares estimator and the best linear unbiased estimator, our result can be viewed as a generalization of the well-known theorems on the equality between these two estimators, which have been fully studied in the literature. Two related problems are also considered: equality between two residual sums of squares, and classification of dispersion matrices by a perturbation approach.

Equations

$y=X\beta+\varepsilon,\quad {\sf E}[\varepsilon]=0,\quad {\sf E}[\varepsilon\varepsilon^{\top}]=\sigma^{2}\Omega,$

$\tilde{\beta}_{GM}=(X^{\top}\Omega^{-1}X)^{-1}X^{\top}\Omega^{-1}y,$

$R(\tilde{\beta},\beta)={\sf E}[(\tilde{\beta}-\beta)^{\top}W(\tilde{\beta}-\beta)],$

$\hat{\beta}(\Psi,K)=(X^{\top}\Psi^{-1}X+K)^{-1}X^{\top}\Psi^{-1}y$ with $\Psi\in\mathcal{S}^{+}(n)$ and $K\in\mathcal{S}^{N}(k)$

$\Omega=X\Gamma X^{\top}+Z\Delta Z^{\top}$ for some $\Gamma\in\mathcal{S}^{+}(k)$ and $\Delta\in\mathcal{S}^{+}(n-k)$,

$\hat{\beta}(\Omega,K_{1})=\hat{\beta}(I,K_{2})$ for any $y\in R^{n}$

$\tilde{\beta}_{GM}=\hat{\beta}(\Omega,0)$ and $\tilde{\beta}_{OLS}=\hat{\beta}(I,0).$

$\mbox{GR}(\Psi,K)=(y-X\hat{\beta}(\Psi,K))^{\top}\Psi^{-1}(y-X\hat{\beta}(\Psi,K))\quad(\Psi\in\mathcal{S}^{+}(n);\ K\in\mathcal{S}^{N}(k)).$

$\mbox{GR}(I,0)=(y-X\hat{\beta}(I,0))^{\top}(y-X\hat{\beta}(I,0))$

$\mbox{GR}(\Omega,0)=(y-X\hat{\beta}(\Omega,0))^{\top}\Omega^{-1}(y-X\hat{\beta}(\Omega,0)).$

$\hat{\beta}(\Omega,K_{1})=\hat{\beta}(I,K_{2})$ and $\mbox{GR}(\Omega,K_{1})=\mbox{GR}(I,K_{2})$

$\mbox{\rm rank}(\mbox{\rm cov}(\hat{\beta}(I,0)-\hat{\beta}(\Omega,0)))=\mbox{\rm rank}(X^{\top}\Omega Z)$

$X^{\top}X\Gamma K_{1}=K_{2}.$

$X^{\top}\Omega^{-1}=(X^{\top}\Omega^{-1}X+K_{1})(X^{\top}X+K_{2})^{-1}X^{\top},$

$X^{\top}\Omega^{-1}Z=0$

$X^{\top}\Omega^{-1}X=(X^{\top}\Omega^{-1}X+K_{1})(X^{\top}X+K_{2})^{-1}X^{\top}X,$

$\Omega^{-1}=X(X^{\top}X)^{-1}\Gamma^{-1}(X^{\top}X)^{-1}X^{\top}+Z(Z^{\top}Z)^{-1}\Delta^{-1}(Z^{\top}Z)^{-1}Z^{\top}.$

$\Gamma^{-1}=(\Gamma^{-1}+K_{1})(X^{\top}X+K_{2})^{-1}X^{\top}X$

$\bar{K}_{i}=(X^{\top}X)^{-1/2}K_{i}(X^{\top}X)^{-1/2}\ (i=1,2),$

$\bar{\Gamma}=(X^{\top}X)^{1/2}\Gamma(X^{\top}X)^{1/2}.$

$\mathcal{R}(\bar{K}_{1})=\mathcal{R}(\bar{K}_{2})$ and $\bar{K}_{1}\bar{K}_{2}=\bar{K}_{2}\bar{K}_{1},$

$\Gamma=(X^{\top}X)^{-1/2}\{\bar{K}_{2}\bar{K}_{1}^{+}+(I-\bar{K}_{1}\bar{K}_{1}^{+})H(I-\bar{K}_{1}\bar{K}_{1}^{+})\}(X^{\top}X)^{-1/2}$ for some $H\in\mathcal{S}^{+}(k)$,

$\bar{\Gamma}\bar{K}_{1}=\bar{K}_{2}.$

$\bar{K}_{i}=VD_{i}V^{\top}\ (i=1,2)$

$\bar{\Gamma}$

$\bar{\Gamma}\bar{K}_{1}$

$\det(\Omega)=\det\left((X,Z)\begin{pmatrix}\Gamma&0\\0&\Delta\end{pmatrix}\begin{pmatrix}X^{\top}\\Z^{\top}\end{pmatrix}\right)=(\det((X,Z)))^{2}\det(\Gamma)\det(\Delta)=1$

$\det(\Delta)=\frac{\det(X^{\top}X)\det(K_{1})}{(\det((X,Z)))^{2}\det(K_{2})}.$

$\mbox{GR}(\Omega,K_{1})=\mbox{GR}(I,K_{2})$

$A=(X^{\top}X)^{-1}\Gamma^{-1}(X^{\top}X)^{-1},$
Full text

Covariance structure associated with an equality between two general ridge estimators

Koji Tsukuda [1]; Hiroshi Kurata [2]

[1] Graduate School of Arts and Sciences, the University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan. mail: [email protected]

[2] Graduate School of Arts and Sciences, the University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan. mail: [email protected]

MSC 2010. Primary: 62J05. Secondary: 62F10, 62J07.

Key words and phrases: Best linear unbiased estimator; General linear model; Least squares estimator; Perturbation approach

1 Introduction

In a general linear model, this paper derives a necessary and sufficient condition under which two general ridge estimators coincide with each other. To state the problem more precisely, let us consider

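$$y=X\beta+\varepsilon,\quad {\sf E}[\varepsilon]=0,\quad {\sf E}[\varepsilon\varepsilon^{\top}]=\sigma^{2}\Omega,$$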

where $y$ is an $n\times 1$ vector, $X$ is an $n\times k$ matrix ($n>k$) satisfying $\mbox{\rm rank}(X)=k$, $\sigma^{2}$ is an unknown positive constant and $\Omega$ is a known positive definite matrix. As is well known, the estimator of the form

$$\tilde{\beta}_{GM}=(X^{\top}\Omega^{-1}X)^{-1}X^{\top}\Omega^{-1}y,$$

which will be called the Gauss–Markov estimator in the sequel, is the best linear unbiased estimator of $\beta$ , that is, it has the smallest covariance matrix (in terms of positive semidefiniteness) among linear unbiased estimators. This estimator is also optimal with respect to the following quadratic risk functions:

$$R(\tilde{\beta},\beta)={\sf E}[(\tilde{\beta}-\beta)^{\top}W(\tilde{\beta}-\beta)],$$

where $W$ is an arbitrary positive semidefinite matrix. However, if we broaden the class of estimators to that of linear but not necessarily unbiased estimators, it is no longer optimal and general ridge estimators play an essential role instead. Here, a general ridge estimator is defined to be an estimator of the form

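$$\hat{\beta}(\Psi,K)=(X^{\top}\Psi^{-1}X+K)^{-1}X^{\top}\Psi^{-1}y\quad\mbox{with}\ \Psi\in\mathcal{S}^{+}(n)\ \mbox{and}\ K\in\mathcal{S}^{N}(k)$$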

(Rao (1976)), where $\mathcal{S}^{+}(m)$ and $\mathcal{S}^{N}(m)$ denote the sets of $m\times m$ positive definite and positive semidefinite matrices, respectively. As proved by Rao (1976) and Markiewicz (1996), the general ridge estimators are linearly sufficient and linearly admissible, and conversely, any linearly sufficient and linearly admissible estimator belongs to the class of general ridge estimators. Moreover, they are linearly complete. For other properties of general ridge estimators, see, for example, Arnold and Stahlecker (2000), Gross (1998) and Groß and Markiewicz (2004).

On the other hand, it is also well known that there are some cases in which two linear unbiased estimators coincide with each other. Perhaps the most important is the one in which the Gauss–Markov estimator $\tilde{\beta}_{GM}$ is identically equal to the ordinary least squares estimator $\tilde{\beta}_{OLS}=(X^{\top}X)^{-1}X^{\top}y$, which does not depend on $\Omega$. Conditions for the equality between the two estimators have been studied by many authors (see, for example, Baksalary and Trenkler (2009), Chapter 7 of Kariya and Kurata (2004), Puntanen and Styan (1989) and Zyskind (1967)). Among others, Rao (1967) proved that for a given $X$, the equality $\tilde{\beta}_{GM}=\tilde{\beta}_{OLS}$ holds for all $y$ if and only if $\Omega$ is of the form

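$$\Omega=X\Gamma X^{\top}+Z\Delta Z^{\top}\quad\mbox{for some}\ \Gamma\in\mathcal{S}^{+}(k)\ \mbox{and}\ \Delta\in\mathcal{S}^{+}(n-k),\tag{1.3}$$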

where $Z$ is an $n\times(n-k)$ matrix satisfying $X^{\top}Z=0$ and $\mbox{\rm rank}(Z)=n-k$, and will be fixed throughout. In this paper, we discuss an identical equality between two general ridge estimators. More precisely, we derive a necessary and sufficient condition on $\Omega$ that guarantees that, for given $K_{1},K_{2}\in\mathcal{S}^{N}(k)$, the equality

$$\hat{\beta}(\Omega,K_{1})=\hat{\beta}(I,K_{2})\quad\mbox{for any}\ y\in R^{n}$$

holds. This result, which will be presented in Section 2, can be regarded as an extension of (1.3), since the class of general ridge estimators includes the Gauss–Markov and the ordinary least squares estimators. Indeed, we can readily see that

$$\tilde{\beta}_{GM}=\hat{\beta}(\Omega,0)\quad\mbox{and}\quad\tilde{\beta}_{OLS}=\hat{\beta}(I,0).$$

The class also contains the ordinary ridge estimators $\hat{\beta}(I,\lambda I)=(X^{\top}X+\lambda I)^{-1}X^{\top}y$ and $\hat{\beta}(\Omega,\lambda I)=(X^{\top}\Omega^{-1}X+\lambda I)^{-1}X^{\top}\Omega^{-1}y$ with $\lambda>0$, and shrinkage estimators of the form $\hat{\beta}(I,\rho X^{\top}X)=(1+\rho)^{-1}\tilde{\beta}_{OLS}$ and $\hat{\beta}(\Omega,\rho X^{\top}\Omega^{-1}X)=(1+\rho)^{-1}\tilde{\beta}_{GM}$ with $\rho>0$.
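The special cases listed above can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper name `general_ridge`, the random test data and the particular values of `lam` and `rho` are illustrative choices that do not come from the paper.

```python
import numpy as np

def general_ridge(X, y, Psi, K):
    """General ridge estimator (X' Psi^{-1} X + K)^{-1} X' Psi^{-1} y."""
    Pi_X = np.linalg.solve(Psi, X)              # Psi^{-1} X (Psi is symmetric)
    return np.linalg.solve(X.T @ Pi_X + K, Pi_X.T @ y)

rng = np.random.default_rng(0)
n, k = 8, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
M = rng.normal(size=(n, n))
Omega = M @ M.T + n * np.eye(n)                 # an arbitrary positive definite matrix
I_n, zero_k = np.eye(n), np.zeros((k, k))

# K = 0 recovers the Gauss-Markov and the ordinary least squares estimators.
Oi_X = np.linalg.solve(Omega, X)
beta_gm = np.linalg.solve(X.T @ Oi_X, Oi_X.T @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
gm_matches = np.allclose(general_ridge(X, y, Omega, zero_k), beta_gm)
ols_matches = np.allclose(general_ridge(X, y, I_n, zero_k), beta_ols)

# K = lam * I gives the ordinary ridge estimator.
lam, rho = 0.5, 0.25
ridge = general_ridge(X, y, I_n, lam * np.eye(k))
ridge_direct = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# K = rho * X'X: (X'X + rho X'X)^{-1} X'y = OLS / (1 + rho).
shrunk = general_ridge(X, y, I_n, rho * X.T @ X)
```

Using `np.linalg.solve` instead of explicit inverses is a numerical convenience only; the quantities computed are those of the defining formula.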

In Sections 3 and 4, two related problems are considered. The first is the problem of deriving a condition on $\Omega$ under which an identical equality between two generalized residual sums of squares holds. To state it precisely, let

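$$\mbox{GR}(\Psi,K)=(y-X\hat{\beta}(\Psi,K))^{\top}\Psi^{-1}(y-X\hat{\beta}(\Psi,K))\quad(\Psi\in\mathcal{S}^{+}(n);\ K\in\mathcal{S}^{N}(k)).\tag{1.4}$$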

Then the ordinary residual sum of squares and its Gauss–Markov version are given respectively by

$$\mbox{GR}(I,0)=(y-X\hat{\beta}(I,0))^{\top}(y-X\hat{\beta}(I,0))$$

and

$$\mbox{GR}(\Omega,0)=(y-X\hat{\beta}(\Omega,0))^{\top}\Omega^{-1}(y-X\hat{\beta}(\Omega,0)).$$

In the literature, Kariya (1980) derived a necessary and sufficient condition under which $\mbox{GR}(\Omega,0)=\mbox{GR}(I,0)$ in the context of estimation of $\sigma^{2}$ . He also derived a condition for the two equalities $\hat{\beta}(\Omega,0)=\hat{\beta}(I,0)$ and $\mbox{GR}(\Omega,0)=\mbox{GR}(I,0)$ to hold simultaneously. The latter result was generalized by Kurata (1998). See also Groß (1997). In this paper, we generalize their result by considering the case in which

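$$\hat{\beta}(\Omega,K_{1})=\hat{\beta}(I,K_{2})\quad\mbox{and}\quad\mbox{GR}(\Omega,K_{1})=\mbox{GR}(I,K_{2})$$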

hold for given $K_{1}$ and $K_{2}$ . Needless to say, the above equalities do not generally hold. Moreover, Kurata (1998) used

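$$\mbox{\rm rank}(\mbox{\rm cov}(\hat{\beta}(I,0)-\hat{\beta}(\Omega,0)))=\mbox{\rm rank}(X^{\top}\Omega Z)$$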

to measure the extent to which $\Omega$ deviates from (1.3). In Section 4, we extend his result to the case including general ridge estimators.

As has been widely recognized, the simple ordinary ridge estimator $\hat{\beta}(I,\lambda I)$ performs better in practice than the ordinary least squares estimator $\hat{\beta}(I,0)$ when there is multicollinearity among the explanatory variables (Hoerl and Kennard (1970)). Moreover, some previous works such as Frank and Friedman (1993) have reported that $\hat{\beta}(I,\lambda I)$ works well in many cases. Hence, it is also valuable from a practical viewpoint to discuss the case $K\neq 0$.

2 Equality between two general ridge estimators

In this section, we derive a necessary and sufficient condition for the dispersion matrix $\Omega$ to guarantee an identical equality between two general ridge estimators. We use the fact that the condition (1.3) is equivalent to $X^{\top}\Omega^{-1}Z=0$ .
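The equivalence between the structure (1.3) and $X^{\top}\Omega^{-1}Z=0$ can be illustrated numerically. This is a sketch under stated assumptions: NumPy is used, $Z$ is taken as an orthonormal basis of the orthogonal complement of the column space of $X$ (one admissible choice among many), and the helper `random_pd` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 3
X = rng.normal(size=(n, k))

# Z: last n - k columns of the full QR factor of X, so that X'Z = 0.
Q, _ = np.linalg.qr(X, mode="complete")
Z = Q[:, k:]

def random_pd(m):
    """Hypothetical helper: a random m x m positive definite matrix."""
    A = rng.normal(size=(m, m))
    return A @ A.T + m * np.eye(m)

Gamma, Delta = random_pd(k), random_pd(n - k)
Omega = X @ Gamma @ X.T + Z @ Delta @ Z.T       # structure (1.3)

cross_term = X.T @ np.linalg.solve(Omega, Z)    # X' Omega^{-1} Z

# Under (1.3) the Gauss-Markov and OLS estimators agree for any y.
y = rng.normal(size=n)
Oi_X = np.linalg.solve(Omega, X)
beta_gm = np.linalg.solve(X.T @ Oi_X, Oi_X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

The check only covers the direction from (1.3) to $X^{\top}\Omega^{-1}Z=0$; the converse is part of the cited result and is not tested here.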

Theorem 1.

For $K_{1},K_{2}\in\mathcal{S}^{N}(k)$, the equality $\hat{\beta}(\Omega,K_{1})=\hat{\beta}(I,K_{2})$ holds if and only if the dispersion matrix $\Omega$ is of the form (1.3) with some $\Gamma\in\mathcal{S}^{+}(k)$ and $\Delta\in\mathcal{S}^{+}(n-k)$ satisfying

$$X^{\top}X\Gamma K_{1}=K_{2}.\tag{2.1}$$

Proof.

The equality $\hat{\beta}(\Omega,K_{1})=\hat{\beta}(I,K_{2})$ can be rewritten as

$$X^{\top}\Omega^{-1}=(X^{\top}\Omega^{-1}X+K_{1})(X^{\top}X+K_{2})^{-1}X^{\top},$$

which is further equivalent to the following two equalities:

$$X^{\top}\Omega^{-1}Z=0\tag{2.2}$$

and

$$X^{\top}\Omega^{-1}X=(X^{\top}\Omega^{-1}X+K_{1})(X^{\top}X+K_{2})^{-1}X^{\top}X,\tag{2.3}$$

since $X^{\top}Z=0$ and the matrix $(X,Z)$ is nonsingular. As is remarked in Section 1, the condition (2.2) is equivalent to (1.3), which can also be expressed as

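$$\Omega^{-1}=X(X^{\top}X)^{-1}\Gamma^{-1}(X^{\top}X)^{-1}X^{\top}+Z(Z^{\top}Z)^{-1}\Delta^{-1}(Z^{\top}Z)^{-1}Z^{\top}.$$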

Substituting it into (2.3) shows that, under (2.2), the condition (2.3) is equivalent to

$$\Gamma^{-1}=(\Gamma^{-1}+K_{1})(X^{\top}X+K_{2})^{-1}X^{\top}X$$

This completes the proof. ∎
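Theorem 1 can be illustrated with a case in which $K_{1}\neq K_{2}$. The sketch below assumes NumPy and builds $K_{1}$, $K_{2}$ and $\Gamma$ from a common orthogonal matrix `V` so that $X^{\top}X\Gamma K_{1}=K_{2}$ holds by construction; this construction, the diagonal values `d1`, `d2` and the helper `general_ridge` are illustrative assumptions, not the paper's.

```python
import numpy as np

def general_ridge(X, y, Psi, K):
    """(X' Psi^{-1} X + K)^{-1} X' Psi^{-1} y for symmetric positive definite Psi."""
    Pi_X = np.linalg.solve(Psi, X)
    return np.linalg.solve(X.T @ Pi_X + K, Pi_X.T @ y)

rng = np.random.default_rng(2)
n, k = 7, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
Z = np.linalg.qr(X, mode="complete")[0][:, k:]   # X'Z = 0, rank n - k
XtX = X.T @ X

# Symmetric square root S of X'X and its inverse.
w, U = np.linalg.eigh(XtX)
S = U @ np.diag(np.sqrt(w)) @ U.T
S_inv = U @ np.diag(1.0 / np.sqrt(w)) @ U.T

# K1, K2 share eigenvectors after scaling by S; Gamma maps one to the other.
V = np.linalg.qr(rng.normal(size=(k, k)))[0]
d1 = np.array([1.0, 2.0, 3.0])
d2 = np.array([0.5, 4.0, 1.5])
K1 = S @ V @ np.diag(d1) @ V.T @ S
K2 = S @ V @ np.diag(d2) @ V.T @ S
Gamma = S_inv @ V @ np.diag(d2 / d1) @ V.T @ S_inv
condition_21 = np.allclose(XtX @ Gamma @ K1, K2)

B = rng.normal(size=(n - k, n - k))
Delta = B @ B.T + np.eye(n - k)
Omega = X @ Gamma @ X.T + Z @ Delta @ Z.T        # structure (1.3) with this Gamma

equal_structured = np.allclose(general_ridge(X, y, Omega, K1),
                               general_ridge(X, y, np.eye(n), K2))

# A generic positive definite matrix is not expected to give equality.
C = rng.normal(size=(n, n))
Omega_generic = C @ C.T + n * np.eye(n)
equal_generic = np.allclose(general_ridge(X, y, Omega_generic, K1),
                            general_ridge(X, y, np.eye(n), K2))
```

A single random `y` is of course only a spot check of an identity that the theorem asserts for every $y$.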

Using Theorem 1 with $K_{1}=K_{2}=K$ , we have the following corollary.

Corollary 2.

For $K\in\mathcal{S}^{N}(k)$ , the equality $\hat{\beta}(\Omega,K)=\hat{\beta}(I,K)$ holds if and only if $\Omega$ is of the form (1.3) with $\Gamma$ satisfying $X^{\top}X\Gamma K=K.$ In particular, if $K$ is nonsingular, then $\hat{\beta}(\Omega,K)=\hat{\beta}(I,K)\Leftrightarrow\Omega=X(X^{\top}X)^{-1}X^{\top}+Z\Delta Z^{\top}$ for some $\Delta\in\mathcal{S}^{+}(n-k)$ .

Remark 1.

Let $K=\lambda I\in\mathcal{S}^{+}(k)$ . Then Corollary 2 implies that $\hat{\beta}(\Omega,\lambda I)=\hat{\beta}(I,\lambda I)$ is equivalent to $\Omega=X(X^{\top}X)^{-1}X^{\top}+Z\Delta Z^{\top}$ for some $\Delta\in\mathcal{S}^{+}(n-k)$ .
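Remark 1 says that, for ordinary ridge estimators with a common $\lambda$, the block $\Gamma$ in (1.3) is forced to equal $(X^{\top}X)^{-1}$. A minimal numerical sketch, assuming NumPy; the helper `general_ridge`, the value `lam = 0.8` and the comparison matrix `Omega_other` are illustrative choices.

```python
import numpy as np

def general_ridge(X, y, Psi, K):
    """(X' Psi^{-1} X + K)^{-1} X' Psi^{-1} y for symmetric positive definite Psi."""
    Pi_X = np.linalg.solve(Psi, X)
    return np.linalg.solve(X.T @ Pi_X + K, Pi_X.T @ y)

rng = np.random.default_rng(3)
n, k, lam = 7, 3, 0.8
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
Z = np.linalg.qr(X, mode="complete")[0][:, k:]   # X'Z = 0, rank n - k
B = rng.normal(size=(n - k, n - k))
Delta = B @ B.T + np.eye(n - k)                  # arbitrary positive definite Delta
XtX_inv = np.linalg.inv(X.T @ X)
K = lam * np.eye(k)

# Gamma = (X'X)^{-1}: the structure singled out by Remark 1.
Omega_good = X @ XtX_inv @ X.T + Z @ Delta @ Z.T
# Gamma = 2 (X'X)^{-1}: still of the form (1.3), but X'X Gamma K = 2K, not K.
Omega_other = X @ (2.0 * XtX_inv) @ X.T + Z @ Delta @ Z.T

ridge_identity = general_ridge(X, y, np.eye(n), K)
equal_under_good = np.allclose(general_ridge(X, y, Omega_good, K), ridge_identity)
equal_under_other = np.allclose(general_ridge(X, y, Omega_other, K), ridge_identity)
# By Theorem 1 with K1 = K and K2 = 2K, Omega_other matches the ridge with 2 * lam.
matches_doubled = np.allclose(general_ridge(X, y, Omega_other, K),
                              general_ridge(X, y, np.eye(n), 2.0 * K))
```

The second case shows that the Gauss–Markov/OLS structure (1.3) alone is not enough once $K\neq 0$.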

Remark 2.

Suppose that $\Omega$ satisfies (1.3). Let $K_{1}=\rho X^{\top}\Omega^{-1}X$ and $K_{2}=\rho X^{\top}X$ with $\rho>0$ . Then the two matrices satisfy the condition (2.1). In fact, by (3.5), we see that $X^{\top}\Omega^{-1}X=\Gamma^{-1}$ and hence $K_{1}=\rho\Gamma^{-1}$ , implying $X^{\top}X\Gamma K_{1}=X^{\top}X\Gamma(\rho\Gamma^{-1})=\rho X^{\top}X=K_{2}$ . Thus Theorem 1 applies and hence the equality $\hat{\beta}(\Omega,\rho X^{\top}\Omega^{-1}X)=\hat{\beta}(I,\rho X^{\top}X)$ holds. More specifically, the condition (1.3) is necessary and sufficient for $\hat{\beta}(\Omega,\rho X^{\top}\Omega^{-1}X)=\hat{\beta}(I,\rho X^{\top}X)$ . This conclusion itself is obvious from the forms of the shrinkage estimators.
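The closing observation of Remark 2 can be spelled out in two lines. The following worked derivation is a sketch that uses only the definition of the general ridge estimator; the factor $(1+\rho)^{-1}$ is obtained by collecting $X^{\top}\Omega^{-1}X+\rho X^{\top}\Omega^{-1}X$ and $X^{\top}X+\rho X^{\top}X$.

```latex
\begin{align*}
\hat{\beta}(\Omega,\rho X^{\top}\Omega^{-1}X)
  &=\{(1+\rho)X^{\top}\Omega^{-1}X\}^{-1}X^{\top}\Omega^{-1}y
   =(1+\rho)^{-1}\tilde{\beta}_{GM},\\
\hat{\beta}(I,\rho X^{\top}X)
  &=\{(1+\rho)X^{\top}X\}^{-1}X^{\top}y
   =(1+\rho)^{-1}\tilde{\beta}_{OLS}.
\end{align*}
```

Hence the two shrinkage estimators coincide for all $y$ exactly when $\tilde{\beta}_{GM}=\tilde{\beta}_{OLS}$ for all $y$, that is, when $\Omega$ is of the form (1.3).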

Next we clarify when there exists $\Gamma$ satisfying the condition (2.1) for given $K_{1},K_{2}\in\mathcal{S}^{N}(k)$ . For this purpose, let

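$$\bar{K}_{i}=(X^{\top}X)^{-1/2}K_{i}(X^{\top}X)^{-1/2}\ (i=1,2),\qquad\bar{\Gamma}=(X^{\top}X)^{1/2}\Gamma(X^{\top}X)^{1/2}.$$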

Needless to say, $\bar{K}_{i}$ and $\bar{\Gamma}$ are in one-to-one correspondence with $K_{i}$ and $\Gamma$, respectively.

Proposition 3.

There exists $\Gamma\in\mathcal{S}^{+}(k)$ satisfying (2.1) if and only if $\bar{K}_{1}$ and $\bar{K}_{2}$ satisfy

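$$\mathcal{R}(\bar{K}_{1})=\mathcal{R}(\bar{K}_{2})\quad\mbox{and}\quad\bar{K}_{1}\bar{K}_{2}=\bar{K}_{2}\bar{K}_{1},\tag{2.5}$$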

where $\mathcal{R}(\bar{K}_{i})$ denotes the range of $\bar{K}_{i}$ . In this case, $\Gamma$ is of the form

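$$\Gamma=(X^{\top}X)^{-1/2}\{\bar{K}_{2}\bar{K}_{1}^{+}+(I-\bar{K}_{1}\bar{K}_{1}^{+})H(I-\bar{K}_{1}\bar{K}_{1}^{+})\}(X^{\top}X)^{-1/2}\quad\mbox{for some}\ H\in\mathcal{S}^{+}(k),\tag{2.6}$$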

where $\bar{K}_{1}^{+}$ denotes the Moore–Penrose inverse of $\bar{K}_{1}$ .

Proof.

Suppose first that the condition (2.1) holds, which is equivalent to

$$\bar{\Gamma}\bar{K}_{1}=\bar{K}_{2}.\tag{2.7}$$

Here, the two matrices on the left-hand side commute, i.e., $\bar{K}_{1}\bar{\Gamma}=\bar{\Gamma}\bar{K}_{1}$, since $\bar{K}_{2}$ is symmetric. Due to the nonsingularity of $\bar{\Gamma}$, the matrices $\bar{K}_{1}$ and $\bar{K}_{2}$ must satisfy $\mathcal{R}(\bar{K}_{1})=\mathcal{R}(\bar{K}_{2})$. Hence, by letting $\mbox{\rm rank}(\bar{K}_{1})=\mbox{\rm rank}(\bar{K}_{2})=r$, they can be commonly expressed as

$$\bar{K}_{i}=VD_{i}V^{\top}\quad(i=1,2)$$

with $D_{i}\in\mathcal{S}^{+}(r)$ and $V$ a $k\times r$ matrix satisfying $V^{\top}V=I$. Since $\bar{\Gamma}\bar{K}_{1}=\bar{K}_{1}\bar{\Gamma}$, we can write $\bar{\Gamma}=VFV^{\top}+WGW^{\top}$ for some $F\in\mathcal{S}^{+}(r)$, $G\in\mathcal{S}^{+}(k-r)$ and $W$ a $k\times(k-r)$ matrix satisfying $W^{\top}W=I$ and $V^{\top}W=0$. Furthermore, $D_{1}$ and $F$ can be taken as diagonal matrices. Hence the equality $\bar{\Gamma}\bar{K}_{1}=\bar{K}_{2}$ implies $VFD_{1}V^{\top}=VD_{2}V^{\top},$ which can be rewritten as $FD_{1}=D_{2}$. Therefore, $D_{2}$ must also be diagonal, which implies that $\bar{K}_{1}$ and $\bar{K}_{2}$ commute. Thus we have (2.5) and

[TABLE]

where the last expression is equivalent to (2.6), since $\bar{K}_{1}^{+}=VD_{1}^{+}V^{\top}$ and $WW^{\top}=I-\bar{K}_{1}\bar{K}_{1}^{+}$ .

Conversely, suppose that (2.5) holds. Then, taking $\Gamma$ as in (2.6), we have

[TABLE]

since $\mathcal{R}(\bar{K}_{1})=\mathcal{R}(\bar{K}_{2})$ implies $\bar{K}_{1}^{+}\bar{K}_{1}=\bar{K}_{2}^{+}\bar{K}_{2}$. This shows the existence of $\Gamma$ that satisfies (2.7), which is equivalent to (2.1). This completes the proof. ∎
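The construction in Proposition 3 can be tried out with singular $\bar{K}_{1}$ and $\bar{K}_{2}$. The sketch below assumes NumPy and works directly with the barred quantities, so the outer factors $(X^{\top}X)^{-1/2}$ of the formula for $\Gamma$ are omitted; the dimensions, the eigenvalues and the matrix `H` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
k, r = 4, 2

# Rank-r matrices with a common range and common eigenvectors, so they commute.
Q = np.linalg.qr(rng.normal(size=(k, k)))[0]
V = Q[:, :r]
K1_bar = V @ np.diag([1.0, 3.0]) @ V.T
K2_bar = V @ np.diag([2.0, 0.5]) @ V.T

K1_pinv = np.linalg.pinv(K1_bar, rcond=1e-10)    # Moore-Penrose inverse
P = K1_bar @ K1_pinv                              # orthogonal projector onto R(K1_bar)
N = np.eye(k) - P                                 # projector onto the complement

A = rng.normal(size=(k, k))
H = A @ A.T + k * np.eye(k)                       # any positive definite H

# Expression inside the braces of the formula for Gamma, i.e. Gamma_bar.
Gamma_bar = K2_bar @ K1_pinv + N @ H @ N

same_range = np.allclose(P, K2_bar @ np.linalg.pinv(K2_bar, rcond=1e-10))
commute = np.allclose(K1_bar @ K2_bar, K2_bar @ K1_bar)
solves_equation = np.allclose(Gamma_bar @ K1_bar, K2_bar)
symmetric = np.allclose(Gamma_bar, Gamma_bar.T)
min_eig = np.linalg.eigvalsh((Gamma_bar + Gamma_bar.T) / 2.0).min()
```

Different choices of `H` change `Gamma_bar` only on the orthogonal complement of the common range, which is why the solution is not unique when $\bar{K}_{1}$ is singular.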

Remark 3.

When $\Omega$ is unknown, it is often assumed that $\det(\Omega)=1$ to make the model identifiable (Kariya (1980)). In this case, in order that $\hat{\beta}(\Omega,K_{1})=\hat{\beta}(I,K_{2})$ holds, the matrices $\Gamma$ and $\Delta$ should satisfy

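$$\det(\Omega)=\det\left((X,Z)\begin{pmatrix}\Gamma&0\\0&\Delta\end{pmatrix}\begin{pmatrix}X^{\top}\\Z^{\top}\end{pmatrix}\right)=(\det((X,Z)))^{2}\det(\Gamma)\det(\Delta)=1$$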

as well as (2.1). In particular, when $K_{1}$ and $K_{2}$ are positive definite, the matrix $\Delta$ should satisfy

$$\det(\Delta)=\frac{\det(X^{\top}X)\det(K_{1})}{(\det((X,Z)))^{2}\det(K_{2})}.$$
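The step from the normalization $\det(\Omega)=1$ to the condition on $\det(\Delta)$ in Remark 3 can be written out as follows. This is a sketch of the intermediate algebra, assuming $K_{1}$ and $K_{2}$ are positive definite as in the remark; it combines the determinant of $X^{\top}X\Gamma K_{1}=K_{2}$ with $\det(\Omega)=(\det((X,Z)))^{2}\det(\Gamma)\det(\Delta)$.

```latex
\begin{align*}
\det(X^{\top}X)\det(\Gamma)\det(K_{1})=\det(K_{2})
  &\;\Longrightarrow\;
  \det(\Gamma)=\frac{\det(K_{2})}{\det(X^{\top}X)\det(K_{1})},\\
(\det((X,Z)))^{2}\det(\Gamma)\det(\Delta)=1
  &\;\Longrightarrow\;
  \det(\Delta)=\frac{1}{(\det((X,Z)))^{2}\det(\Gamma)}
  =\frac{\det(X^{\top}X)\det(K_{1})}{(\det((X,Z)))^{2}\det(K_{2})}.
\end{align*}
```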

3 Equality between residual sums of squares

In this section, we discuss a condition under which the identical equality

$$\mbox{GR}(\Omega,K_{1})=\mbox{GR}(I,K_{2})$$

holds in addition to $\hat{\beta}(\Omega,K_{1})=\hat{\beta}(I,K_{2})$, where the general residual sum of squares $\mbox{GR}(\Omega,K_{1})$ is defined in (1.4). To simplify the notation, let us denote

[TABLE]

where $A$ is positive definite and $B$ is positive semidefinite.

Theorem 4.

For $K_{1},K_{2}\in\mathcal{S}^{N}(k)$, the two identical equalities

[TABLE]

simultaneously hold if and only if the following three conditions

[TABLE]

hold for some $\Gamma\in\mathcal{S}^{+}(k)$ .

Proof.

From Theorem 1, $\hat{\beta}(\Omega,K_{1})=\hat{\beta}(I,K_{2})$ is equivalent to $\Omega=X\Gamma X^{\top}+Z\Delta Z^{\top}$ with $X^{\top}X\Gamma K_{1}=K_{2}$ . In this case, it holds that

[TABLE]

and $y-X\hat{\beta}(\Omega,K_{1})=y-X\hat{\beta}(I,K_{2})$ . This implies that

[TABLE]

with $B$ given in (3.1), and

[TABLE]

Thus the problem is to find a condition under which $y^{\top}(B\Omega^{-1}B-BB)y=0$ holds for arbitrary $y$ , which is clearly equivalent to $B\Omega^{-1}B=BB$ . The quantities $B\Omega^{-1}B$ and $BB$ are calculated respectively as

[TABLE]

and

[TABLE]

where (3.5) is used. Since

[TABLE]

holds, the equality

[TABLE]

is equivalent to

[TABLE]

and

[TABLE]

The equality (3.6) can be rewritten as

[TABLE]

and (3.7) is the same as

[TABLE]

This completes the proof. ∎

Remark 4.

The above theorem can be viewed as an extension of Kariya (1980) (Corollary), in which it is shown that (3.2) is a necessary and sufficient condition under which $\hat{\beta}(\Omega,0)=\hat{\beta}(I,0)$ and $\mbox{GR}(\Omega,0)=\mbox{GR}(I,0)$ simultaneously hold. In fact, in Theorem 4, let $K_{1}=K_{2}=0$. Then the conditions (3.3) and (3.4) vanish, since they hold for all $\Gamma\in\mathcal{S}^{+}(k)$. Hence, the conditions in the above theorem reduce to (3.2).

Corollary 5.

Let $K\in\mathcal{S}^{+}(k)$ . The two equalities $\hat{\beta}(\Omega,K)=\hat{\beta}(I,K)$ and $\mbox{GR}(\Omega,K)=\mbox{GR}(I,K)$ simultaneously hold if and only if $\Omega=I$ .

Proof.

Letting $K_{1}=K_{2}=K$, we use Theorem 4. From (3.3), $\Gamma=(X^{\top}X)^{-1}$. Since then $A=(X^{\top}X)^{-1}$, which yields (3.4), Theorem 4 implies that both $\hat{\beta}(\Omega,K)=\hat{\beta}(I,K)$ and $\mbox{GR}(\Omega,K)=\mbox{GR}(I,K)$ hold if and only if

[TABLE]

This completes the proof. ∎

Remark 5.

Let $K=\lambda I\in\mathcal{S}^{+}(k)$ . Then Corollary 5 implies that the two equalities $\hat{\beta}(\Omega,\lambda I)=\hat{\beta}(I,\lambda I)$ and $\mbox{GR}(\Omega,\lambda I)=\mbox{GR}(I,\lambda I)$ simultaneously hold if and only if $\Omega=I$ .
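The gap between equality of the estimators and equality of the residual sums of squares can be seen numerically. The sketch below assumes NumPy and an orthonormal $Z$ (so $Z^{\top}Z=I$); the helpers `general_ridge` and `gr`, the choice $K=0.7I$ and the two test matrices are illustrative assumptions.

```python
import numpy as np

def general_ridge(X, y, Psi, K):
    """(X' Psi^{-1} X + K)^{-1} X' Psi^{-1} y for symmetric positive definite Psi."""
    Pi_X = np.linalg.solve(Psi, X)
    return np.linalg.solve(X.T @ Pi_X + K, Pi_X.T @ y)

def gr(X, y, Psi, K):
    """Generalized residual sum of squares GR(Psi, K) as defined in (1.4)."""
    resid = y - X @ general_ridge(X, y, Psi, K)
    return resid @ np.linalg.solve(Psi, resid)

rng = np.random.default_rng(5)
n, k = 7, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
Z = np.linalg.qr(X, mode="complete")[0][:, k:]   # orthonormal, so Z'Z = I
P_X = X @ np.linalg.solve(X.T @ X, X.T)          # X (X'X)^{-1} X'
K = 0.7 * np.eye(k)

Omega_a = P_X + Z @ Z.T            # Delta = (Z'Z)^{-1}: this is the identity matrix
Omega_b = P_X + 2.0 * Z @ Z.T      # Delta = 2 (Z'Z)^{-1}: estimators still agree

same_beta_b = np.allclose(general_ridge(X, y, Omega_b, K),
                          general_ridge(X, y, np.eye(n), K))
gr_identity = gr(X, y, np.eye(n), K)
gr_b = gr(X, y, Omega_b, K)
gap = gr_identity - gr_b           # equals 0.5 * ||Z'y||^2 for this Omega_b
```

With `Omega_b` the estimators coincide but the residual sums of squares differ, which is consistent with Corollary 5 requiring $\Omega=I$ for both equalities.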

4 Classification criterion of dispersion matrices

As observed in the previous sections, the condition in Theorem 1 on $\Omega$ rarely holds, and hence the estimators $\hat{\beta}(\Omega,K_{1})$ and $\hat{\beta}(I,K_{2})$ do not coincide in most cases. In the context of comparing the Gauss–Markov and the ordinary least squares estimators, Kurata (1998) used

[TABLE]

as a criterion to measure the difference between them. The rank ranges from $0$ to $\min(k,n-k)$ and takes zero if and only if $\Omega$ is of the form (1.3). Hence this criterion can also be regarded as a measure of the extent to which the structure of $\Omega$ deviates from (1.3), or equivalently, a criterion to classify $\Omega$. This section is devoted to deriving a generalization of his result to the case including general ridge estimators.
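The rank criterion can be explored numerically by building dispersion matrices whose cross block between the column spaces of $X$ and $Z$ has a prescribed rank. This is a sketch under stated assumptions: NumPy is used, $Z$ is orthonormal, the block construction in `omega_with_cross_rank` and the tolerance `1e-9` are illustrative choices, and the common factor $\sigma^{2}$ is dropped because it does not affect the rank.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 7, 3
X = rng.normal(size=(n, k))
Z = np.linalg.qr(X, mode="complete")[0][:, k:]   # X'Z = 0, Z'Z = I
T = np.hstack([X, Z])                            # nonsingular n x n matrix (X, Z)

def omega_with_cross_rank(q):
    """Hypothetical helper: Omega = (X, Z) M (X, Z)' with a rank-q cross block."""
    Xi = np.zeros((k, n - k))
    Xi[:q, :q] = 0.5 * np.eye(q)
    M = np.block([[2.0 * np.eye(k), Xi], [Xi.T, 2.0 * np.eye(n - k)]])
    return T @ M @ T.T                           # positive definite since M is

def rank_of_difference_cov(Omega):
    """Rank of cov(OLS - GM), computed as D Omega D' with D the coefficient gap."""
    A_ols = np.linalg.solve(X.T @ X, X.T)
    Oi_X = np.linalg.solve(Omega, X)
    A_gm = np.linalg.solve(X.T @ Oi_X, Oi_X.T)
    D = A_ols - A_gm
    return np.linalg.matrix_rank(D @ Omega @ D.T, tol=1e-9)

ranks = []
for q in range(min(k, n - k) + 1):
    Omega = omega_with_cross_rank(q)
    cross_rank = np.linalg.matrix_rank(X.T @ Omega @ Z, tol=1e-9)
    ranks.append((q, int(cross_rank), int(rank_of_difference_cov(Omega))))
```

Each triple in `ranks` records the prescribed rank, the rank of $X^{\top}\Omega Z$ and the rank of the covariance of the difference; the case `q = 0` corresponds to the structure (1.3).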

Since the quantity (4.1) is the same as

[TABLE]

it is natural to use the rank of the $L^{2}$ difference matrix

[TABLE]

as a measure that is applicable to general ridge estimators. Since we have

[TABLE]

where

[TABLE]

is the mean square error matrix (see, for example, (3.1) of Gross (1998)), it is also natural to use the rank of

[TABLE]

We adopt the above two quantities in the sequel.

However, since it is in general not easy to analyze them unless $K_{i}=0$ , we limit our consideration to the case in which both $K_{1}$ and $K_{2}$ are small. More precisely, we fix $L_{1}$ , $L_{2}$ in $\mathcal{S}^{N}(k)$ and use the perturbation approach by letting $K_{1}=\epsilon L_{1}$ , $K_{2}=\epsilon L_{2}$ with a small positive constant $\epsilon$ . Note that $\Omega\in\mathcal{S}^{+}(n)$ can be expressed as

[TABLE]

for some $\Gamma\in\mathcal{S}^{+}(k)$ , $\Delta\in\mathcal{S}^{+}(n-k)$ and $\Xi:k\times(n-k)$ .

Theorem 6.

Fix $L_{1},L_{2}\in\mathcal{S}^{N}(k)$ and $\Omega\in\mathcal{S}^{+}(n)$ , and write $\Omega$ as in (4.4). Consider general ridge estimators $\hat{\beta}(\Omega,K_{1})$ and $\hat{\beta}(I,K_{2})$ with

[TABLE]

and $\epsilon$ a positive constant satisfying

[TABLE]

where $\|\cdot\|$ denotes a matrix norm. Then the quantities ${\rm d_{1}}$ in (4.2) and ${\rm d_{2}}$ in (4.3) are evaluated as

[TABLE]

and

[TABLE]

respectively, as $\epsilon\downarrow 0$ .

Proof.

First we prove (4.5). Clearly, ${\rm d_{1}}(\Omega,K_{1},K_{2})$ is equal to

[TABLE]

As for the first term of (4.7), we have

[TABLE]

The four terms in the right-hand side are further calculated as

[TABLE]

respectively. As for the second term of (4.7), we obtain

[TABLE]

and

[TABLE]

From the definition of matrix functions, equations (4.8)–(4.11) are evaluated as

[TABLE]

respectively. On the other hand, from the definition of matrix functions again, it holds that ${\sf E}[\hat{\beta}(I,K_{2})]=\beta+O(\epsilon)$ and ${\sf E}[\hat{\beta}(\Omega,K_{1})]=\beta+O(\epsilon)$ , which implies that

[TABLE]

Thus we have

[TABLE]

Next we prove (4.6). Clearly, ${\rm d_{2}}(\Omega,K_{1},K_{2})$ is equal to

[TABLE]

As for the first and second terms of (4.12), it holds that

[TABLE]

and

[TABLE]

Moreover, recall that ${\sf E}[\hat{\beta}(I,K_{2})]=\{I+(X^{\top}X)^{-1}K_{2}\}^{-1}\beta$ and ${\sf E}[\hat{\beta}(\Omega,K_{1})]=\{I+(\Gamma-\Xi\Delta^{-1}\Xi^{\top})K_{1}\}^{-1}\beta$. From the definition of matrix functions, it follows that

[TABLE]

and hence

[TABLE]

which is equal to

[TABLE]

This completes the proof. ∎

Remark 6.

Since

[TABLE]

the quantity

[TABLE]

can be written in original notation as

[TABLE]

If $\Omega$ is of the form (1.3), then (4.13) is simplified as

[TABLE]

In particular, when $\Omega=I$ , the above quantity is further reduced to

[TABLE]

If we consider estimators such that $K_{1}=\rho X^{\top}\Omega^{-1}X$ and $K_{2}=\rho X^{\top}X$ , then the matrices $L_{1}$ and $L_{2}$ are given by $L_{1}=X^{\top}\Omega^{-1}X$ and $L_{2}=X^{\top}X$ , where the constant $\rho$ is absorbed into $\epsilon$ . In this case, the quantity (4.13) takes the form

[TABLE]

From Theorem 6, when $\epsilon$ is small, the major part of the deviation of the simple estimator $\hat{\beta}(I,K_{2})$ from the good estimator $\hat{\beta}(\Omega,K_{1})$ is characterized by the first term

[TABLE]

Moreover, when $\Xi=0$ (i.e., $\Omega$ is of the form (1.3)), the first term vanishes and the second term becomes

[TABLE]

which is the coefficient of $\epsilon$ . If further $K_{2}=X^{\top}X\Gamma K_{1}$ , then $L_{2}=X^{\top}X\Gamma L_{1}=L_{1}\Gamma X^{\top}X$ holds and hence $(X^{\top}X)^{-1}L_{2}\Gamma+\Gamma L_{2}(X^{\top}X)^{-1}-2\Gamma L_{1}\Gamma$ also vanishes, implying ${\rm d_{1}}(\Omega,K_{1},K_{2})=O(\epsilon^{2})$ and ${\rm d_{2}}(\Omega,K_{1},K_{2})=O(\epsilon^{2})$ .

From this observation, if both $\det(K_{1})$ and $\det(K_{2})$ are small, then the criterion proposed by Kurata (1998) still works even in the case including general ridge estimators. As its extension, based on (4.6), we propose the following two-step criterion for classification of dispersion matrices:

1. Classify according to

[TABLE]

2. Make a finer classification by using

[TABLE]

Remark 7.

When $K_{1}=K_{2}=0$, we have $v_{2}=0$ and our criterion reduces to that of Kurata (1998).

Remark 8.

Let $\Omega=I$ . Then $v_{1}=0$ and $v_{2}=\mbox{\rm rank}(K_{2}-K_{1})$ . As is seen in this example, even if we consider the same dispersion matrix, classification may vary according to the choice of $K_{1}$ and $K_{2}$ . Besides, when $K_{1}=\lambda_{1}I$ , $K_{2}=\lambda_{2}I$ and $\lambda_{1}\neq\lambda_{2}>0$ , then $v_{2}=k$ .

Remark 9.

If $\Omega$ is of the form (1.3), that is $\Xi=0$ , then

[TABLE]

Remark 10.

When $K_{1}=\rho X^{\top}\Omega^{-1}X$ and $K_{2}=\rho X^{\top}X$ with $\rho>0$ ,

[TABLE]

Acknowledgments

A portion of the second author’s work was supported by Japan Society for the Promotion of Science KAKENHI Grant Number JP26330035.

This is a pre-print of an article published in Statistical Papers. The final authenticated version is available online at: https://doi.org/10.1007/s00362-017-0975-8.

Bibliography

1. Arnold BF, Stahlecker P (2000) Another view of the Kuks–Olman estimator. Journal of Statistical Planning and Inference 89, no. 1–2, 169–174.
2. Baksalary OM, Trenkler G (2009) A projector oriented approach to the best linear unbiased estimator. Statistical Papers 50, no. 4, 721–733.
3. Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35, no. 2, 109–135.
4. Groß J (1997) A note on equality of MINQUE and simple estimator in the general Gauss–Markov model. Statistics & Probability Letters 35, no. 4, 335–339.
5. Gross J (1998) On contractions in linear regression. Journal of Statistical Planning and Inference 74, no. 2, 343–351.
6. Groß J (2004) The general Gauss–Markov model with possibly singular dispersion matrix. Statistical Papers 45, no. 3, 311–336.
7. Groß J, Markiewicz A (2004) Characterizations of admissible linear estimators in the linear model. Linear Algebra and its Applications 388, 239–248.
8. Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, no. 1, 55–67.
