Test for homogeneity with unordered paired observations

Jiahua Chen; Pengfei Li; Jing Qin; and Tao Yu

arXiv:1905.01402·math.ST·May 7, 2019

TL;DR

This paper develops likelihood ratio tests for homogeneity with unordered paired observations, relaxing the equal-variance and zero-correlation assumptions of earlier methods and improving finite-sample accuracy with Bartlett corrections; the tests are supported by simulations and real-data examples.

Contribution

It introduces likelihood ratio test procedures for unordered paired data that do not rely on equal-variance or independence assumptions, with improved finite-sample accuracy.

Findings

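1. The LRTs that treat $\rho$ as unknown ($R_{n,1}^{*}$ and $R_{n,2}^{*}$) keep their type I errors close to the nominal levels for $\rho$ between $-0.5$ and $0.5$ once the limiting distributions are adjusted.

2. Bartlett-type corrections bring the type I errors close to the nominal levels for samples as small as $n=25$.

3. The methods are checked through simulations and two real-data examples (karyotype data and C-band chromosome data).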

Abstract

In some applications, an experimental unit is composed of two distinct but related subunits. The response from such a unit is $(X_{1},X_{2})$ but we observe only $Y_{1}=\min\{X_{1},X_{2}\}$ and $Y_{2}=\max\{X_{1},X_{2}\}$, i.e., the subunit identities are not observed. We call $(Y_{1},Y_{2})$ unordered paired observations. Based on unordered paired observations $\{(Y_{1i},Y_{2i})\}_{i=1}^{n}$, we are interested in whether the marginal distributions for $X_{1}$ and $X_{2}$ are identical. Testing methods are available in the literature under the assumptions that $\mbox{\sc var}(X_{1})=\mbox{\sc var}(X_{2})$ and $\mbox{\sc cov}(X_{1},X_{2})=0$. However, by extensive simulation studies, we observe that when one or both assumptions are violated, these methods have inflated type I errors or much lower powers. In this paper, we study the likelihood ratio test statistics for various scenarios and explore their limiting distributions…

Tables

Table 1: Values of $p_{n}$, $r_{n}$, $p_{n}^{*}$, and $r_{n}^{*}$ via computer experiments

$n$	10	20	30	40	50	60	70	80	90	100
$p_{n}$	0.809	0.681	0.634	0.627	0.596	0.587	0.585	0.587	0.568	0.568
$r_{n}$	1.312	1.150	1.092	1.070	1.046	1.028	1.030	1.032	1.016	1.012
$p_{n}^{*}$	0.932	0.801	0.749	0.721	0.687	0.674	0.669	0.651	0.649	0.645
$r_{n}^{*}$	1.417	1.194	1.129	1.090	1.062	1.040	1.038	1.028	1.022	1.018

Table 2: Simulated type I errors (%) of LRTs based on limiting distributions/adjusted limiting distributions

Levels	10%	5%	1%	10%	5%	1%
	$n = 25$			$n = 75$
	$ρ = 0$
$R_{n, 1}$	13.7/10.7	7.3/5.7	1.8/1.4	11.3/ 9.9	5.9/5.1	1.3/1.2
$R_{n, 2}$	12.9/10.6	6.9/5.2	1.6/1.0	10.8/10.2	5.6/5.2	1.2/1.0
$R_{n, 1}^{*}$	15.9/10.5	8.1/5.5	1.8/1.1	13.4/10.4	7.0/5.5	1.5/1.1
$R_{n, 2}^{*}$	13.5/10.1	7.4/5.0	1.8/1.1	11.1/10.1	5.9/5.2	1.2/1.0
	$ρ = 0.25$
$R_{n, 1}$	1.2/0.8	0.5/0.3	0.1/0.1	0.1/0.0	0.0/0.0	0.0/0.0
$R_{n, 2}$	3.8/3.0	1.9/1.4	0.4/0.3	1.8/1.7	0.7/0.7	0.1/0.1
$R_{n, 1}^{*}$	15.9/10.5	8.1/5.5	1.8/1.1	13.4/10.4	7.0/5.5	1.5/1.1
$R_{n, 2}^{*}$	13.5/10.1	7.4/5.0	1.8/1.1	11.1/10.1	5.9/5.2	1.2/1.0
	$ρ = 0.5$
$R_{n, 1}$	0.0/0.0	0.0/0.0	0.0/0.0	0.0/0.0	0.0/0.0	0.0/0.0
$R_{n, 2}$	0.7/0.5	0.3/0.2	0.0/0.0	0.1/0.1	0.0/0.0	0.0/0.0
$R_{n, 1}^{*}$	15.9/10.5	8.1/5.5	1.8/1.1	13.4/10.4	7.0/5.5	1.5/1.1
$R_{n, 2}^{*}$	13.5/10.1	7.4/5.0	1.8/1.1	11.1/10.1	5.9/5.2	1.2/1.0
	$ρ = - 0.25$
$R_{n, 1}$	53.7/47.2	38.6/33.0	15.2/12.7	83.1/80.9	71.6/69.1	43.6/41.3
$R_{n, 2}$	39.0/34.0	25.5/21.2	8.6/6.2	67.6/66.3	53.6/52.0	27.3/25.6
$R_{n, 1}^{*}$	15.9/10.5	8.1/5.5	1.8/1.1	13.4/10.4	7.0/5.5	1.5/1.1
$R_{n, 2}^{*}$	13.5/10.1	7.4/5.0	1.8/1.1	11.1/10.1	5.9/5.2	1.2/1.0
	$ρ = - 0.5$
$R_{n, 1}$	92.6/89.9	84.5/80.5	57.5/52.4	100.0/99.9	99.9/99.8	98.5/98.3
$R_{n, 2}$	80.1/76.2	67.1/61.3	37.3/30.2	99.7/99.6	99.0/98.9	94.5/93.9
$R_{n, 1}^{*}$	15.9/10.5	8.1/5.5	1.8/1.1	13.4/10.4	7.0/5.5	1.5/1.1
$R_{n, 2}^{*}$	13.5/10.1	7.4/5.0	1.8/1.1	11.1/10.1	5.9/5.2	1.2/1.0

Table 3: Powers (%) of $R_{n,1}$, $R_{n,2}$, $R_{n,1}^{*}$, and $R_{n,2}^{*}$ at the 5% significance level

$σ$	$μ$	$n = 25$				$n = 75$
		$R_{n, 1}$	$R_{n, 2}$	$R_{n, 1}^{*}$	$R_{n, 2}^{*}$	$R_{n, 1}$	$R_{n, 2}$	$R_{n, 1}^{*}$	$R_{n, 2}^{*}$
		$ρ = 0$
1.0	1.0	28.1	18.3	8.3	6.3	57.6	41.8	11.2	8.0
1.0	1.5	67.0	49.7	19.2	11.3	97.5	93.0	40.2	24.8
0.5	1.0	46.9	85.2	12.3	70.5	88.2	99.9	21.7	99.6
0.5	1.5	92.2	99.2	39.2	90.6	100.0	100.0	79.7	100.0
		$ρ = 0.25$
1.0	1.0	7.2	6.2	10.4	7.2	6.7	6.0	16.7	10.5
1.0	1.5	38.8	27.0	29.6	17.5	70.9	56.9	63.9	44.8
0.5	1.0	22.4	77.3	16.4	78.2	43.2	99.8	32.5	99.9
0.5	1.5	80.9	98.5	54.0	95.5	99.7	100.0	93.5	100.0
		$ρ = 0.5$
1.0	1.0	1.0	1.9	15.8	9.8	0.1	1.0	32.8	20.0
1.0	1.5	17.7	13.1	54.7	34.6	22.4	16.6	93.7	83.2
0.5	1.0	8.4	71.8	24.3	91.3	7.6	99.6	53.6	100.0
0.5	1.5	66.0	98.1	76.4	99.5	95.7	100.0	99.5	100.0
		$ρ = - 0.25$
1.0	1.0	65.1	45.6	7.3	5.9	97.7	93.1	9.0	6.8
1.0	1.5	90.0	76.1	14.2	9.0	100.0	99.9	27.1	16.5
0.5	1.0	75.7	92.1	10.2	68.3	99.7	100.0	16.6	99.5
0.5	1.5	97.9	99.7	29.5	87.8	100.0	100.0	64.5	100.0
		$ρ = - 0.5$
1.0	1.0	93.8	81.0	6.7	5.7	100.0	100.0	8.1	6.4
1.0	1.5	99.0	94.3	11.3	7.9	100.0	100.0	19.8	12.2
0.5	1.0	94.9	97.8	9.0	73.9	100.0	100.0	13.3	99.8
0.5	1.5	99.7	100.0	23.3	90.6	100.0	100.0	50.3	100.0

Equations

\left(\begin{array}{c}X_{1i}\\ X_{2i}\end{array}\right)\sim N\left(\left(\begin{array}{c}\mu_{1}\\ \mu_{2}\end{array}\right),\left(\begin{array}{cc}\sigma_{1}^{2}&\rho\sigma_{1}\sigma_{2}\\ \rho\sigma_{1}\sigma_{2}&\sigma_{2}^{2}\end{array}\right)\right).

H_{0}:(\mu_{1},\sigma_{1}^{2})=(\mu_{2},\sigma_{2}^{2})\mbox{ versus }H_{a}:(\mu_{1},\sigma_{1}^{2})\neq(\mu_{2},\sigma_{2}^{2}).

P(Y_{1}\leq y_{1},Y_{2}\leq y_{2})

\phi(y_{1},y_{2};\mbox{\boldmath$\theta$})+\phi(y_{2},y_{1};\mbox{\boldmath$\theta$}),

\ell_{n}(\mbox{\boldmath$\theta$})=\sum_{i=1}^{n}\log\{\phi(Y_{1i},Y_{2i};\mbox{\boldmath$\theta$})+\phi(Y_{2i},Y_{1i};\mbox{\boldmath$\theta$})\}.

\hat{\mbox{\boldmath$\theta$}}

\tilde{\mbox{\boldmath$\theta$}}

\check{\mbox{\boldmath$\theta$}}

R_{n,1}=2\{\ell_{n}(\tilde{\mbox{\boldmath$\theta$}})-\ell_{n}(\check{\mbox{\boldmath$\theta$}})\},\quad R_{n,2}=2\{\ell_{n}(\hat{\mbox{\boldmath$\theta$}})-\ell_{n}(\check{\mbox{\boldmath$\theta$}})\}.

R_{n,1}\stackrel{\cal D}{\to}0.5\chi_{0}^{2}+0.5\chi_{1}^{2};

R_{n,2}\stackrel{\cal D}{\to}R\equiv\sup_{x_{1},x_{2}}\{2\mbox{\bf x}^{\tau}\mbox{\bf w}-\mbox{\bf x}^{\tau}\mbox{\bf x}\},

\frac{\partial\ell_{n}(\mu+\Delta,\mu-\Delta,\sigma_{1},\sigma_{2},\rho)}{\partial\Delta}\Big|_{\Delta=0,\sigma_{1}=\sigma_{2}}=0.

\hat{\mbox{\boldmath$\theta$}}^{*}

\tilde{\mbox{\boldmath$\theta$}}^{*}

\check{\mbox{\boldmath$\theta$}}^{*}

R^{*}_{n,1}=2\{\ell_{n}(\tilde{\mbox{\boldmath$\theta$}}^{*})-\ell_{n}(\check{\mbox{\boldmath$\theta$}}^{*})\},\quad R^{*}_{n,2}=2\{\ell_{n}(\hat{\mbox{\boldmath$\theta$}}^{*})-\ell_{n}(\check{\mbox{\boldmath$\theta$}}^{*})\}.

R_{n,1}^{*}\stackrel{\cal D}{\to}0.5\chi_{0}^{2}+0.5\chi_{1}^{2};

R_{n,2}^{*}\stackrel{\cal D}{\to}R^{*}\equiv\max\{w_{1}^{2}+(w_{2}^{+})^{2},w_{1}^{2}+(w_{3}^{+})^{2}\},

P(R^{*}\leq x)=P\Big(\max\{w_{1}^{2}+(w_{2}^{+})^{2},w_{1}^{2}+(w_{3}^{+})^{2}\}\leq x\Big)=\int_{0}^{x}\Phi^{2}(\sqrt{x-y})(2\pi y)^{-1/2}\exp(-y/2)\,dy

F_{n1}

F_{n2}

F_{n1}^{*}

F_{n2}^{*}

p_{n}

r_{n}

p_{n}^{*}

r_{n}^{*}

(0.5-1.440n^{-0.676})\chi_{0}^{2}+(0.5+1.440n^{-0.676})\chi_{1}^{2}

(1+4.589n^{-1.163})R

(0.5-1.332n^{-0.492})\chi_{0}^{2}+(0.5+1.332n^{-0.492})\chi_{1}^{2}

(1+6.325n^{-1.176})R^{*}

(\hat{\mu}_{1}^{*},\hat{\mu}_{2}^{*},\hat{\sigma}_{1}^{*},\hat{\sigma}_{2}^{*},\hat{\rho}^{*})=(62.05,65.55,3.50,8.20,-0.73).

(\hat{\mu}_{1}^{*},\hat{\mu}_{2}^{*},\hat{\sigma}_{1}^{*},\hat{\sigma}_{2}^{*},\hat{\rho}^{*})=(86.75,68.58,10.55,8.29,0.46).

\ell_{n}^{*}(\mbox{\boldmath$\theta$})

E(Z_{1i})=(\mu_{1}+\mu_{2})/2=\mu,

E(Z_{2i})=(\mu_{1}-\mu_{2})/2=\Delta,

\mbox{\sc var}(Z_{1i})=(1/4)(\sigma_{1}^{2}+\sigma_{2}^{2}+2\rho\sigma_{1}\sigma_{2})=\sigma_{+}^{2},

\mbox{\sc var}(Z_{2i})=(1/4)(\sigma_{1}^{2}+\sigma_{2}^{2}-2\rho\sigma_{1}\sigma_{2})=\sigma_{-}^{2},

Taxonomy

Topics: Statistical Methods in Clinical Trials · Optimal Experimental Design Methods · Statistical Methods and Inference

Full text

Test for homogeneity with unordered paired observations

Jiahua Chen, Pengfei Li, Jing Qin, and Tao Yu

Abstract

In some applications, an experimental unit is composed of two distinct but related subunits. The response from such a unit is $(X_{1},X_{2})$ but we observe only $Y_{1}=\min\{X_{1},X_{2}\}$ and $Y_{2}=\max\{X_{1},X_{2}\}$ , i.e., the subunit identities are not observed. We call $(Y_{1},Y_{2})$ unordered paired observations. Based on unordered paired observations $\{(Y_{1i},Y_{2i})\}_{i=1}^{n}$ , we are interested in whether the marginal distributions for $X_{1}$ and $X_{2}$ are identical. Testing methods are available in the literature under the assumptions that $\mbox{\sc var}(X_{1})=\mbox{\sc var}(X_{2})$ and ${\mbox{\sc cov}}(X_{1},X_{2})=0$ . However, by extensive simulation studies, we observe that when one or both assumptions are violated, these methods have inflated type I errors or much lower powers. In this paper, we study the likelihood ratio test statistics for various scenarios and explore their limiting distributions without these restrictive assumptions. Furthermore, we develop Bartlett correction formulae for these statistics to enhance their precision when the sample size is not large. Simulation studies and real-data examples are used to illustrate the efficacy of the proposed methods.

1 Introduction

In some applications, an experimental unit is made of two distinct but related subunits. The response from such a unit is $(X_{1},X_{2})$ but we observe only $Y_{1}=\min\{X_{1},X_{2}\}$ and $Y_{2}=\max\{X_{1},X_{2}\}$ ; that is, the subunit identities are not observed or unobservable. We call $(Y_{1},Y_{2})$ unordered paired observations. We assume that $(X_{1i},X_{2i})^{\tau}$ , for $i=1,\ldots,n$ , are independent and identically distributed (i.i.d.) normal random vectors:

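$$\left(\begin{array}{c}X_{1i}\\ X_{2i}\end{array}\right)\sim N\left(\left(\begin{array}{c}\mu_{1}\\ \mu_{2}\end{array}\right),\left(\begin{array}{cc}\sigma_{1}^{2}&\rho\sigma_{1}\sigma_{2}\\ \rho\sigma_{1}\sigma_{2}&\sigma_{2}^{2}\end{array}\right)\right).\qquad(1)$$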

We say that $\{(Y_{1i},Y_{2i})\}_{i=1}^{n}$ are uncorrelated when $\rho=0$ and correlated when $\rho\neq 0$ . This paper studies the homogeneity testing of the marginal distributions of $X_{1i}$ and $X_{2i}$ :

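$$H_{0}:(\mu_{1},\sigma_{1}^{2})=(\mu_{2},\sigma_{2}^{2})\mbox{ versus }H_{a}:(\mu_{1},\sigma_{1}^{2})\neq(\mu_{2},\sigma_{2}^{2}).\qquad(2)$$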

Unordered paired data occur in many applications, and there is a long research history. For instance, Hinkley (1973) analyzed such a data set from human genetics. The genetic blueprint of an individual is contained in 23 pairs of chromosomes. Each member of the pair is inherited from the corresponding chromosome pair of a parent. If we do not know the chromosome correspondences between the offspring and the parents, we lose the parental identities and end up with unordered paired observations. Olkin and Viana (1995) provide more examples. In visual acuity studies, we may record only a subject’s extreme acuities (the “best” and “worst” acuities) without recording the corresponding eyes. In twin experiments, we obtain unordered paired observations without a label for each member of a twin pair; see Ernst et al. (1996) and Shekar et al. (2006) and the references therein. Furthermore, unordered data of a higher dimension are collected in various scientific disciplines. For example, Davies and Phillips (1988) provided an example of unordered data of dimension $k$ . In the interim analysis of a double-blinded clinical trial of $k$ treatments, we get the $k$ order statistics without knowledge of the corresponding treatments; see also van der Meulen (2005) and Miller et al. (2009). In diffusion tensor (DT) brain imaging (see Yu et al. (2013) and the references therein), the eigenvalues of the DT estimates for each brain voxel are viewed as unordered triples.

With unordered paired observations, a fundamental question is whether or not $X_{1i}$ and $X_{2i}$ have the same distribution. Under Model (1), this is equivalent to testing the hypothesis specified in (2). Hinkley (1973) proposed a likelihood ratio test (LRT) procedure under the assumption that $\rho=0$ and $\sigma_{1}^{2}=\sigma_{2}^{2}$ . Li and Qin (2011) investigated this problem in a semiparametric setup. Other approaches can be found in Moore II (1973), Lauder (1977), Moore II et al. (1979), Carothers (1981), Efron et al. (1971), and Qin and Zhang (2005), among others. All these works assume that $X_{1i}$ and $X_{2i}$ are independent with equal variance. These assumptions may not hold in applications, and they can be severely violated, as evidenced by the examples in Section 5. Ignoring the dependence structure and/or imposing an incorrect equal-variance assumption can lead to unreliable inference conclusions: the type I error may be severely inflated or the power markedly decreased.

This paper focuses on tests for (2). In particular, we study the LRT in four scenarios: (1) $\rho=0$ and $\sigma_{1}^{2}=\sigma_{2}^{2}$ ; (2) $\rho=0$ ; (3) $\sigma_{1}^{2}=\sigma_{2}^{2}$ ; and (4) no assumption on $\rho$ , $\sigma_{1}^{2}$ , and $\sigma_{2}^{2}$ .

Investigating the asymptotic behavior of these LRT statistics is technically challenging. The well-developed theory (Wilks, 1938; Chernoff, 1954; Self and Liang, 1987; Drton, 2009) is not applicable because of the undesirable mathematical properties (see (5) in Section 2) of the log-likelihood function. In addition, an important byproduct of the theory for the corresponding LRT statistics is the asymptotic behavior of the maximum likelihood estimators (MLEs) for $(\mu_{1},\mu_{2},\sigma^{2}_{1},\sigma_{2}^{2})$ . Interestingly, we have shown that the asymptotic behavior depends on whether $\rho=0$ is known or $\rho$ is unknown. The convergence rates of these parameter estimates depend on the scenario.

We observe that the limiting distributions of the LRT statistics under $H_{0}$ are not sufficiently accurate approximations to their finite-sample distributions when $n$ is not large. To enhance the approximation precision of the limiting distributions, we adjust the statistics based on the Bartlett correction (Bartlett, 1937; Lawley, 1956). Simulation results confirm the efficacy of the adjustment.

We organize the rest of the paper as follows. Section 2 introduces the LRT statistics for (2) and studies their asymptotic behavior under $H_{0}$ . Section 3 presents the adjusted limiting distributions of our statistics for data of limited sample size. Section 4 contains simulation studies, and Section 5 gives real-data examples. The technical details are relegated to Section 6.

2 Main Results

The LRT is an essential tool in statistical inference, especially under the parametric model assumption; see Wilks (1938); Chernoff (1954); Self and Liang (1987); Drton (2009), and the references therein. In this section, we present LRT statistics and study their properties for testing (2) under model assumptions on $\rho$ and whether or not $\sigma_{1}^{2}=\sigma_{2}^{2}$ .

We first derive the log-likelihood function with unordered paired observations. For any $y_{1}<y_{2}$ , we have

[TABLE]

Therefore, the joint density function of $(Y_{1},Y_{2})$ is given by

$$\phi(y_{1},y_{2};\mbox{\boldmath$\theta$})+\phi(y_{2},y_{1};\mbox{\boldmath$\theta$}),$$

where $\phi(x_{1},x_{2};\mbox{\boldmath$ \theta $})$ denotes the bivariate normal density function with parameters $\mbox{\boldmath$ \theta $}=(\mu_{1},\mu_{2},\sigma_{1},\sigma_{2},\rho)^{\tau}$ specified in (1). The log-likelihood function based on $\{(Y_{1i},Y_{2i})\}_{i=1}^{n}$ and Model (1) is:

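$$\ell_{n}(\mbox{\boldmath$\theta$})=\sum_{i=1}^{n}\log\{\phi(Y_{1i},Y_{2i};\mbox{\boldmath$\theta$})+\phi(Y_{2i},Y_{1i};\mbox{\boldmath$\theta$})\}.$$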

This likelihood function is the basis for our subsequent development.
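A minimal Python sketch of this log-likelihood may help make the symmetrized density concrete. The function names `bvn_logpdf` and `unordered_loglik` are illustrative choices, not taken from the authors' R package, and the bivariate normal density is written out by hand so that only NumPy is assumed.

```python
import numpy as np

def bvn_logpdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Log density of the bivariate normal in Model (1), evaluated elementwise."""
    z1 = (x1 - mu1) / s1
    z2 = (x2 - mu2) / s2
    q = (z1**2 - 2.0 * rho * z1 * z2 + z2**2) / (1.0 - rho**2)
    return -np.log(2.0 * np.pi * s1 * s2) - 0.5 * np.log(1.0 - rho**2) - 0.5 * q

def unordered_loglik(theta, y1, y2):
    """Sum over i of log{phi(Y1i, Y2i; theta) + phi(Y2i, Y1i; theta)}."""
    mu1, mu2, s1, s2, rho = theta
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    a = bvn_logpdf(y1, y2, mu1, mu2, s1, s2, rho)  # pair kept in observed order
    b = bvn_logpdf(y2, y1, mu1, mu2, s1, s2, rho)  # pair with identities swapped
    # logaddexp computes log(exp(a) + exp(b)) without underflow
    return float(np.sum(np.logaddexp(a, b)))
```

Because both orderings enter the sum, the value is unchanged when the labels of the two subunits are exchanged, that is, when $(\mu_{1},\sigma_{1})$ and $(\mu_{2},\sigma_{2})$ are swapped.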

2.1 Unordered uncorrelated paired data

In this section, we assume that $\rho=0$ is known; problem (2) is reduced to $H_{0}:\mu_{1}=\mu_{2},\sigma_{1}=\sigma_{2}$ . We define

[TABLE]

and we use the notational convention that the entries of $\hat{\mbox{\boldmath$\theta$}}$ are $\hat{\mu}_{1}$, $\hat{\mu}_{2}$, and so on. Note that $\hat{\mbox{\boldmath$\theta$}}$, $\tilde{\mbox{\boldmath$\theta$}}$, and $\check{\mbox{\boldmath$\theta$}}$ are MLEs of $\mbox{\boldmath$\theta$}$ under various constraints. The LRT statistics for testing the null hypothesis (2) against two alternatives, specified by $\sigma_{1}=\sigma_{2}$ and $\sigma_{1}\neq\sigma_{2}$ respectively, are given by

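$$R_{n,1}=2\{\ell_{n}(\tilde{\mbox{\boldmath$\theta$}})-\ell_{n}(\check{\mbox{\boldmath$\theta$}})\},\quad R_{n,2}=2\{\ell_{n}(\hat{\mbox{\boldmath$\theta$}})-\ell_{n}(\check{\mbox{\boldmath$\theta$}})\}.\qquad(4)$$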

Theorem 1 below establishes the asymptotic distributions of $R_{n,1}$ and $R_{n,2}$ as well as the convergence rates of $\tilde{\mbox{\boldmath$\theta$}}$ and $\hat{\mbox{\boldmath$\theta$}}$ under $H_{0}$. For presentational continuity, we relegate its proof to Section 6. Let $\stackrel{\cal D}{\to}$ denote “convergence in distribution.” We use $0.5\chi^{2}_{0}+0.5\chi^{2}_{1}$ for an equal mixture of $\chi_{0}^{2}$ and $\chi_{1}^{2}$, with $\chi_{0}^{2}$ being the distribution with a point mass at zero.

Theorem 1.

Assume Model (1) and $\rho=0$ . Under $H_{0}$ , as $n\to\infty$ , we have

(a)

$(\tilde{\mu}_{1}-\mu_{0})^{2},(\tilde{\mu}_{2}-\mu_{0})^{2}$ , and $\tilde{\sigma}-\sigma_{0}$ are all of order $O_{p}(n^{-1/2})$ , and

$$R_{n,1}\stackrel{\cal D}{\to}0.5\chi_{0}^{2}+0.5\chi_{1}^{2};$$

(b)

$(\hat{\mu}_{j}-\mu_{0})^{2}$ , $(\hat{\sigma}_{j}-\sigma_{0})^{2}$ for $j=1,2$ are all of order $O_{p}(n^{-1/2})$ , and

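$$R_{n,2}\stackrel{\cal D}{\to}R\equiv\sup_{x_{1},x_{2}}\{2\mbox{\bf x}^{\tau}\mbox{\bf w}-\mbox{\bf x}^{\tau}\mbox{\bf x}\},$$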

where $\mbox{\bf x}^{\tau}=(x_{1}^{2},x_{2}^{2},2x_{1}x_{2})$ and $\mbox{\bf w}^{\tau}=(w_{1},w_{2},w_{3})$ with $w_{1},w_{2},w_{3}$ being three i.i.d. $N(0,1)$ random variables.
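As a worked illustration of how the mixture limit in part (a) turns into a p-value, the following short derivation uses only the definition of $0.5\chi_{0}^{2}+0.5\chi_{1}^{2}$ and the fact that a $\chi_{1}^{2}$ variable is the square of a standard normal; $\Phi$ denotes the standard normal c.d.f.

```latex
\begin{align*}
P\{0.5\chi_{0}^{2}+0.5\chi_{1}^{2}>x\}
  &= 0.5\cdot 0 + 0.5\,P(\chi_{1}^{2}>x) \\
  &= 0.5\cdot 2\{1-\Phi(\sqrt{x})\} \\
  &= 1-\Phi(\sqrt{x}), \qquad x>0 .
\end{align*}
```

Under this limit, a level-$\alpha$ test with $\alpha<0.5$ rejects when the statistic exceeds $z_{1-\alpha}^{2}$, the squared upper-$\alpha$ standard normal quantile; at the 5% level this is about $1.645^{2}\approx 2.71$.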

Deriving the asymptotic null distributions of $R_{n,1}$ and $R_{n,2}$ is technically challenging. We make the following comments. Let $\mu=(\mu_{1}+\mu_{2})/2$ and $\Delta=(\mu_{1}-\mu_{2})/2$ so that $\mu_{1}=\mu+\Delta$ and $\mu_{2}=\mu-\Delta$ ; we have

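$$\frac{\partial\ell_{n}(\mu+\Delta,\mu-\Delta,\sigma_{1},\sigma_{2},\rho)}{\partial\Delta}\Big|_{\Delta=0,\sigma_{1}=\sigma_{2}}=0.\qquad(5)$$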

This fact implies that the Fisher information matrix of $\mbox{\boldmath$\theta$}$ under the null hypothesis degenerates and undermines the basis for the elegant classical results (Wilks, 1938; Chernoff, 1954; Self and Liang, 1987; Drton, 2009). The crucial step in obtaining the asymptotic null distribution of the LRT is a quadratic approximation in $\hat{\mbox{\boldmath$\theta$}}-\mbox{\boldmath$\theta$}$ to the log-likelihood ratio function. Following this path, we need to consider a fourth-order Taylor expansion to obtain a quadratic approximation in $(\hat{\mbox{\boldmath$\theta$}}-\mbox{\boldmath$\theta$})^{2}$ and so on. Fortunately, we find that the sandwich technique of Chen and Chen (2001) and Chen et al. (2001) overcomes the technical obstacles caused by (5).

2.2 Unordered correlated paired data

In this section, we study the LRTs for (2) with $\rho$ being an unknown parameter. Define

[TABLE]

Similarly to the strategy for (4), we define the LRT statistics for (2) with $\rho$ being an unknown parameter:

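$$R^{*}_{n,1}=2\{\ell_{n}(\tilde{\mbox{\boldmath$\theta$}}^{*})-\ell_{n}(\check{\mbox{\boldmath$\theta$}}^{*})\},\quad R^{*}_{n,2}=2\{\ell_{n}(\hat{\mbox{\boldmath$\theta$}}^{*})-\ell_{n}(\check{\mbox{\boldmath$\theta$}}^{*})\}.$$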

Theorem 2 below establishes the asymptotic distributions of $R_{n,1}^{*}$ and $R_{n,2}^{*}$ as well as the convergence rates of $\tilde{\mbox{\boldmath$\theta$}}^{*}$ and $\hat{\mbox{\boldmath$\theta$}}^{*}$ under their respective $H_{0}$. The proof is given in Section 6.

Theorem 2.

Assume Model (1) but do not assume $\rho=0$ . Under $H_{0}$ , as $n\to\infty$ , we have

(a)

$(\tilde{\mu}_{1}^{*}-\mu_{0})^{2}$ , $(\tilde{\mu}_{2}^{*}-\mu_{0})^{2}$ , $(\tilde{\sigma}^{*}-\sigma_{0})$ , and $(\tilde{\rho}^{*}-\rho_{0})$ are all of order $O_{p}\left(n^{-1/4}\right)$ , and

$$R_{n,1}^{*}\stackrel{\cal D}{\to}0.5\chi_{0}^{2}+0.5\chi_{1}^{2};$$

(b)

$(\hat{\mu}_{1}^{*}-\mu_{0})^{2}$ , $(\hat{\mu}_{2}^{*}-\mu_{0})^{2}$ , $\hat{\sigma}^{*}_{1}-\sigma_{0}$ , $\hat{\sigma}^{*}_{2}-\sigma_{0}$ , and $\hat{\rho}^{*}-\rho_{0}$ are all of order $O_{p}\left(n^{-1/4}\right)$ , and

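$$R_{n,2}^{*}\stackrel{\cal D}{\to}R^{*}\equiv\max\{w_{1}^{2}+(w_{2}^{+})^{2},w_{1}^{2}+(w_{3}^{+})^{2}\},$$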

where $w_{1}$ , $w_{2}$ , and $w_{3}$ are three i.i.d. $N(0,1)$ random variables.

The limiting cumulative distribution function (c.d.f.) of $R^{*}_{n,2}$ is given by:

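$$P(R^{*}\leq x)=P\Big(\max\{w_{1}^{2}+(w_{2}^{+})^{2},w_{1}^{2}+(w_{3}^{+})^{2}\}\leq x\Big)=\int_{0}^{x}\Phi^{2}(\sqrt{x-y})(2\pi y)^{-1/2}\exp(-y/2)\,dy$$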

for $x\geq 0$ with $\Phi(\cdot)$ being the c.d.f. of the standard normal distribution. We use this expression to evaluate the asymptotic quantile and the p-value for the corresponding test.
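The integral above has an integrable singularity at $y=0$, so a direct quadrature can be awkward. One possible way to evaluate it, sketched below under the substitution $y=x\sin^{2}t$ (a choice made here for illustration, not taken from the paper), turns the integrand into a smooth function on $[0,\pi/2]$ that composite Simpson's rule handles well. The names `rstar_cdf` and `rstar_pvalue` are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Standard normal c.d.f. Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rstar_cdf(x, m=400):
    """P(R* <= x) by composite Simpson's rule with m (even) subintervals.

    With y = x sin^2(t): u = sqrt(y) = sqrt(x) sin(t), v = sqrt(x - y) = sqrt(x) cos(t),
    and (2 pi y)^(-1/2) exp(-y/2) dy becomes 2 * normal_pdf(u) * v dt.
    """
    if x <= 0.0:
        return 0.0
    s = math.sqrt(x)

    def g(t):
        u = s * math.sin(t)
        v = s * math.cos(t)
        pdf_u = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        return 2.0 * pdf_u * std_normal_cdf(v) ** 2 * v

    h = (math.pi / 2.0) / m
    total = g(0.0) + g(math.pi / 2.0)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return total * h / 3.0

def rstar_pvalue(x):
    """Asymptotic p-value P(R* > x) for an observed statistic x."""
    return 1.0 - rstar_cdf(x)
```

A quantile can then be found by bisection on `rstar_cdf`, since the c.d.f. is increasing in $x$.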

3 Adjusted Limiting Distributions

One drawback of the general asymptotic results is that they may offer poor approximations to the corresponding finite-sample distributions. The convergence rates of the parameter estimators given in Theorems 1 and 2 are much lower than those of the MLEs from the regular parametric models. This adversely affects the approximation accuracy of the asymptotic distributions to the finite-sample distributions of the LRT statistics. To improve the approximation precision when $n$ is not very large, we use the Bartlett correction. Suppose the limiting distribution of a statistic $T_{n}$ is given by $F(x)$ . We may search for a sequence of c.d.f.s $F_{n}(x)\to F(x)$ such that $F_{n}(x)$ and $T_{n}$ have the same first moment up to order $O\left(n^{-1}\right)$ . This idea was pioneered by Bartlett (1937) and generalized by Lawley (1956).

In this spirit, we search for accurate approximate distributions for $R_{n,1}$ , $R_{n,2}$ , $R_{n,1}^{*}$ , and $R_{n,2}^{*}$ as follows. Recall that $R$ and $R^{*}$ are the limiting distributions of $R_{n,2}$ and $R^{*}_{n,2}$ . Let

[TABLE]

We need to find $p_{n}$, $r_{n}$, $p^{*}_{n}$, and $r^{*}_{n}$ so that the above distributions have first moments very close to the first moments of their corresponding test statistics for a wide range of $n$ values. High-order asymptotic techniques can be used, but they may involve complicated analytical tools with little assurance of the quality of the end products. The computer experiment approach of Chen and Li (2011) is more effective and practical, and it matches the spirit of data science.

The experiment works as follows. We consider a sufficiently wide range of values for $n$ . For each $n$ , we simulate a large number of data sets, with each data set composed of $n$ i.i.d. unordered paired observations. Due to the invariance property of the LRT statistics, each data set is generated from the standard bivariate normal distribution. Based on these data sets, we obtain the simulated first moments of $R_{n,1}$ , $R_{n,2}$ , $R_{n,1}^{*}$ , and $R_{n,2}^{*}$ . We choose $p_{n}$ so that the simulated first moment of $R_{n,1}$ matches the first moment of $F_{n1}$ . We then look for a regression model for $p_{n}$ versus $n$ . Similar procedures are applied to obtain regression models for $r_{n}$ , $p^{*}_{n}$ , and $r^{*}_{n}$ .

Specifically, let us take $R_{n,1}$ for ease of illustration:

Step 1. For every $n$ in $\{10,20,\ldots,100\}$ , generate $N=50,000$ data sets of size $n$ .
Step 2. Obtain $N$ values of $R_{n,1}$ and therefore its simulated first moment, denoted $\hat{p}_{n}$ . Match $\hat{p}_{n}$ with the first moment of $F_{n1}$ to find $p_{n}=\hat{p}_{n}$ .
Step 3. Fit a regression model to $(n,p_{n})$ with $p_{n}$ being the response and $n$ being the covariate.

We postulate the following nonlinear but parametric regression models:

[TABLE]

with $a$ and $b$ being regression parameters, and $\epsilon_{n}$ accounting for imperfect fit. Applying Steps 1–2 outlined above leads to the $p_{n}$ , $r_{n}$ , $p_{n}^{*}$ , and $r_{n}^{*}$ values in Table 1. Fitting the nonlinear regression models (6)–(9) to the data in Table 1 gives us the fitted values of $a$ and $b$ . With these values, we calculate the approximate p-values with the following adjusted limiting distributions:

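$$R_{n,1}:\ (0.5-1.440n^{-0.676})\chi_{0}^{2}+(0.5+1.440n^{-0.676})\chi_{1}^{2};\qquad R_{n,2}:\ (1+4.589n^{-1.163})R;$$

$$R_{n,1}^{*}:\ (0.5-1.332n^{-0.492})\chi_{0}^{2}+(0.5+1.332n^{-0.492})\chi_{1}^{2};\qquad R_{n,2}^{*}:\ (1+6.325n^{-1.176})R^{*}.$$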
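Step 3 can be illustrated with the $p_{n}$ row of Table 1. The sketch below assumes the model form $p_{n}=0.5+a\,n^{-b}$, which is read off from the adjusted limiting distribution reported for $R_{n,1}$ rather than from the regression models themselves, and it swaps in an ordinary least-squares fit on the log scale for whatever nonlinear fitting procedure the authors used. The estimates therefore land near, but not exactly on, the reported $a=1.440$ and $b=0.676$.

```python
import numpy as np

# p_n values from Table 1 for n = 10, 20, ..., 100.
n = np.arange(10, 101, 10, dtype=float)
p_n = np.array([0.809, 0.681, 0.634, 0.627, 0.596,
                0.587, 0.585, 0.587, 0.568, 0.568])

# Assumed model: p_n = 0.5 + a * n**(-b), so that
# log(p_n - 0.5) = log(a) - b * log(n) is linear in log(n).
slope, intercept = np.polyfit(np.log(n), np.log(p_n - 0.5), deg=1)
a_hat = float(np.exp(intercept))
b_hat = float(-slope)

# Largest absolute gap between the fitted curve and the tabulated values.
fitted = 0.5 + a_hat * n ** (-b_hat)
max_abs_resid = float(np.max(np.abs(fitted - p_n)))
print(a_hat, b_hat, max_abs_resid)
```

Fitting on the log scale weights the small-$n$ points differently from a least-squares fit on the original scale, which is one reason the two sets of estimates need not agree.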

We have implemented the four LRT statistics with the proposed adjusted limiting distributions in an R package; it is available upon request.

4 Simulation Studies

4.1 Data generation

Because of the invariance property, we need only study the LRTs based on data generated from distributions with standardized parameter values.

To examine the sizes of the tests, we simulate at $\mu_{1}=\mu_{2}=0$ and $\sigma_{1}=\sigma_{2}=1$ in (1). We study five cases corresponding to $\rho=-0.5,-0.25,0,0.25$ , and $0.5$ . To compare the powers of the tests, we set $\mu_{1}=0$ , $\sigma_{1}=1$ , and form 20 cases as combinations of $\mu_{2}=1.0,1.5$ , $\sigma_{2}=1.0,0.5$ and $\rho=-0.5,-0.25,0,0.25,0.5$ .

In each case, we generate $(X_{1},X_{2})$ from model (1) with one of the above parameter settings. Then, we obtain $Y_{1}=\min\{X_{1},X_{2}\}$ and $Y_{2}=\max\{X_{1},X_{2}\}$ . We repeat the process to obtain $n$ unordered pairs $(Y_{1},Y_{2})$ .
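The data-generation step just described can be sketched in a few lines of Python. The function name `simulate_unordered_pairs` and the seed are arbitrary choices made for this illustration; the parameter values in the usage line are one of the alternative settings listed above.

```python
import numpy as np

def simulate_unordered_pairs(n, mu1, mu2, sigma1, sigma2, rho, rng):
    """Draw n pairs (X1, X2) from Model (1) and return (Y1, Y2) = (min, max)."""
    mean = np.array([mu1, mu2], dtype=float)
    cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    x = rng.multivariate_normal(mean, cov, size=n)  # shape (n, 2)
    # Taking the row-wise min and max discards the subunit identities.
    return x.min(axis=1), x.max(axis=1)

rng = np.random.default_rng(2019)
y1, y2 = simulate_unordered_pairs(25, 0.0, 1.0, 1.0, 0.5, -0.25, rng)
```

Only `y1` and `y2` would be passed on to the test statistics; the original columns of `x` play the role of the unobserved $(X_{1},X_{2})$.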

Based on each set of $n$ unordered pairs, we compute the values of $R_{n,1}$ , $R_{n,2}$ , $R_{n,1}^{*}$ , and $R_{n,2}^{*}$ and carry out the tests for $H_{0}$ without checking that the model for generating the data satisfies the conditions for the tests. We record the rejection rates based on $50,000$ repetitions; the results are presented in the next section.

4.2 Results

We calculate the rejection rate of each test at the significance levels $\alpha=10\%,5\%$ , and $1\%$ . The rejection percentages under the null models are summarized in Table 2.

When $\rho=0$ , $X_{1}$ and $X_{2}$ are simulated to be independent. The assumptions for all the LRTs, $R_{n,1}$ , $R_{n,2}$ , $R_{n,1}^{*}$ , and $R_{n,2}^{*}$ , are satisfied. However, as shown in the first section of Table 2, if their limiting distributions are applied without adjustment, the resulting tests are inaccurate: their type I errors markedly exceed the nominal significance levels. The adjustment proposed in Section 3 is very helpful. After the adjustment, the type I errors of all the tests are close to the nominal levels. The precision is impressive since the adjustment works well even when $n$ is as small as $25$ .

When $\rho=\pm 0.25$ or $\pm 0.5$ , the model assumptions for $R_{n,1}$ and $R_{n,2}$ are violated. When we apply the tests, the type I errors are either near zero when $\rho=0.25$ or $0.5$ or seriously inflated when $\rho=-0.25$ or $-0.5$ . In contrast, because of their invariance property, $R_{n,1}^{*}$ and $R_{n,2}^{*}$ continue to perform well: with their limiting distributions adjusted, they have satisfactory precision in the type I errors.

To further illustrate the effects of the adjustment on the limiting distributions, Figure 1 presents the type I errors (%) of our LRTs at the 5% significance level when $100\leq n\leq 1500$ and $\rho=0$ . The trends for the 10% and 1% significance levels are similar and are omitted. The plots show that the type I errors of $R_{n,1}$ , $R_{n,2}$ after the adjustment are within a $0.2\%$ band of the nominal level for large $n$ and a $0.4\%$ band otherwise; similar results are observed for $R_{n,1}^{*}$ . For $R_{n,1}^{*}$ , the approximation accuracy shows no clear improvement as $n$ increases, but the type I errors are between 5% and 5.4%, which is sufficiently accurate for typical applications.

Next, we compare the powers of $R_{n,1}$ , $R_{n,2}$ , $R_{n,1}^{*}$ , and $R_{n,2}^{*}$ under the alternatives. All combinations of $n$ , $\rho$ , $\mu$ , and $\sigma$ are incorporated, as described in Section 4.1. Their powers, summarized in Table 3, are computed at the 5% significance level based on the adjusted limiting distributions. We observe that when $\rho=0$ , $R_{n,1}$ and $R_{n,2}$ have higher powers than $R_{n,1}^{*}$ and $R_{n,2}^{*}$ ; when $\rho=0.25$ , $R_{n,1}$ and $R_{n,2}$ have higher powers in most cases; when $\rho$ is increased to 0.5, $R_{n,1}^{*}$ and $R_{n,2}^{*}$ are much more powerful; when $\rho=-0.25$ and $-0.5$ , $R_{n,1}$ and $R_{n,2}$ are more powerful, but at the cost of the inflated type I errors reported in Table 2; a test with a markedly inflated type I error is generally not recommended.

5 Real-Data Examples

5.1 Data from karyotype analysis

This example considers 40 unordered pairs of the lengths of the longer and shorter arms of chromosome II of Larix decidua from 40 specimens; so $n=40$ . The data are available in Table 1 of Matérn and Simak (1968). The test results from $R_{n,1}$ , $R_{n,2}$ , $R_{n,1}^{*}$ , and $R_{n,2}^{*}$ for (2) are as follows:

•

$R_{n,1}=14.91$ and $R_{n,2}=17.71$ . Calibrated by the adjusted limiting distributions, the asymptotic $p$ -values of $R_{n,1}$ and $R_{n,2}$ are $7\times 10^{-5}$ and $2\times 10^{-4}$ .

•

$R_{n,1}^{*}=1.08$ and $R_{n,2}^{*}=16.69$ . Calibrated by the adjusted limiting distributions, the asymptotic $p$ -values of $R_{n,1}^{*}$ and $R_{n,2}^{*}$ are $0.21$ and $4\times 10^{-4}$ .
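The p-values reported for $R_{n,1}$ and $R_{n,1}^{*}$ can be approximately reproduced from the adjusted mixture distributions of Section 3. The sketch below is a hedged illustration: the helper names are hypothetical, and the coefficients $(1.440, 0.676)$ and $(1.332, 0.492)$ are the fitted values quoted in this paper.

```python
import math

def chi2_1_sf(x):
    """P(chi^2_1 > x) for x >= 0, via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))

def adjusted_mixture_pvalue(stat, n, a, b):
    """p-value under (0.5 - a n^-b) chi^2_0 + (0.5 + a n^-b) chi^2_1."""
    if stat <= 0.0:
        return 1.0  # the mixture is nonnegative, so every draw is >= stat
    weight = 0.5 + a * n ** (-b)  # adjusted weight on the chi^2_1 component
    return weight * chi2_1_sf(stat)

# Karyotype example, n = 40.
p_rn1 = adjusted_mixture_pvalue(14.91, 40, 1.440, 0.676)
p_rn1_star = adjusted_mixture_pvalue(1.08, 40, 1.332, 0.492)
print(f"{p_rn1:.1e}, {p_rn1_star:.2f}")
```

The two results should fall close to the reported $7\times 10^{-5}$ and $0.21$. The p-values for $R_{n,2}$ and $R_{n,2}^{*}$ need the distributions of $R$ and $R^{*}$ and are not covered by this helper.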

The maximum likelihood estimate of $(\mu_{1},\mu_{2},\sigma_{1},\sigma_{2},\rho)$ is found to be

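$$(\hat{\mu}_{1}^{*},\hat{\mu}_{2}^{*},\hat{\sigma}_{1}^{*},\hat{\sigma}_{2}^{*},\hat{\rho}^{*})=(62.05,65.55,3.50,8.20,-0.73).$$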

Note that $\hat{\rho}^{*}=-0.73$ suggests strong negative correlation between $X_{1i}$ and $X_{2i}$ . As revealed in the simulation studies reported in the bottom section of Table 2, $R_{n,1}$ and $R_{n,2}$ are therefore not reliable because they are designed for $\rho=0$ . Moreover, the fitted values $\hat{\mu}_{1}^{*}$ and $\hat{\mu}_{2}^{*}$ are very close, but $\hat{\sigma}_{1}^{*}$ and $\hat{\sigma}_{2}^{*}$ are significantly different. Hence, $R_{n,1}^{*}$ is unsuitable because it is designed for the case where $\sigma_{1}=\sigma_{2}$ . We recommend $R_{n,2}^{*}$ , which is designed to detect departures from either equal-mean or equal-variance hypotheses.

5.2 C-band area of human chromosome data

This example consists of normalized measurements of the C-band area on the No. 9 chromosome pair (Mason et al., 1975). The measurements are based on three groups: the father, mother, and offspring. These groups respectively have 40, 18, and 31 unordered pairs of normalized measurements of the C-band area. The data are available in Table 1 of Lauder (1977). We analyze the group of fathers as an example; the analysis of the other groups is similar. We constructed $R_{n,1}$ , $R_{n,2}$ , $R_{n,1}^{*}$ , and $R_{n,2}^{*}$ and the corresponding $p$ -values from the adjusted limiting distributions. The results are as follows:

• $R_{n,1}=6.51$ and $R_{n,2}=9.47$ with $n=40$. Calibrated by the adjusted limiting distributions, the asymptotic $p$-values of $R_{n,1}$ and $R_{n,2}$ are $6.6\times 10^{-3}$ and $8.9\times 10^{-3}$.

• $R_{n,1}^{*}=10.74$ and $R_{n,2}^{*}=13.48$ with $n=40$. Calibrated by the adjusted limiting distributions, the asymptotic $p$-values of $R_{n,1}^{*}$ and $R_{n,2}^{*}$ are $7.5\times 10^{-4}$ and $1.9\times 10^{-3}$.

The maximum likelihood estimate of $(\mu_{1},\mu_{2},\sigma_{1},\sigma_{2},\rho)$ is found to be

[TABLE]

Note that $\hat{\rho}^{*}=0.46$ suggests strong positive correlation between $X_{1i}$ and $X_{2i}$. Moreover, $\hat{\mu}_{1}^{*}$ and $\hat{\mu}_{2}^{*}$ are quite different whereas $\hat{\sigma}_{1}^{*}\approx\hat{\sigma}_{2}^{*}$. These observations suggest that $R_{n,1}^{*}$ is the most suitable test, while $R^{*}_{n,2}$ is also a possibility. Note that $R_{n,1}^{*}$ is sharper than $R_{n,2}^{*}$, with a smaller $p$-value.

6 Technical Details

6.1 Reparameterization and preparation lemmas

Recall that $(Y_{1i},Y_{2i})$ is the unordered pair of $(X_{1i},X_{2i})$ and the latter has a bivariate normal distribution with parameter vector $\boldsymbol{\theta}=(\mu_{1},\mu_{2},\sigma_{1},\sigma_{2},\rho)^{\tau}$. The log-likelihood function based on $\{(Y_{1i},Y_{2i})\}_{i=1}^{n}$ is

[TABLE]

Let $Z_{1i}=(X_{1i}+X_{2i})/2$ and $Z_{2i}=(X_{1i}-X_{2i})/2$ . We introduce notation for the following quantities:

[TABLE]

Further, let $\beta_{0}=\Delta-\mu({\sigma_{-}}/{\sigma_{+}})\xi,~{}~{}\beta_{1}=({\sigma_{-}}/{\sigma_{+}})\xi,~{}~{}\eta^{2}=(1-\xi^{2})\sigma_{-}^{2},$ and

[TABLE]

Note that we use $\phi(x;\mu,\sigma)$ to denote the density function of $N(\mu,\sigma^{2})$, matching $\phi(x_{1},x_{2};\boldsymbol{\theta})$ for the bivariate normal distribution.

With these, we obtain the following decomposition of the likelihood function:

[TABLE]

We use a generic $\theta$ for the parameters, which may be interpreted as $\boldsymbol{\theta}=(\mu,\sigma_{+},\beta_{0},\beta_{1},\eta)^{\tau}$ when necessary.
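The decomposition can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that the density of an unordered pair is the bivariate normal density summed over both orderings, and that $\mu$, $\Delta$, $\sigma_{+}$, $\sigma_{-}$, and $\xi$ denote the means, standard deviations, and correlation of $(Z_{1i},Z_{2i})$; the display defining them is not reproduced here, so that reading is an assumption. The function names are hypothetical. The constant $\log 2$ is the Jacobian of the map from $(X_{1},X_{2})$ to $(Z_{1},Z_{2})$ and does not depend on the parameters.

```python
import math


def norm_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bvn_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Bivariate normal density with sds s1, s2 and correlation rho."""
    u = (x1 - mu1) / s1
    v = (x2 - mu2) / s2
    q = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (
        2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho)
    )


def loglik_direct(pairs, theta):
    """Sum of log{phi(y1, y2; theta) + phi(y2, y1; theta)} over pairs."""
    return sum(
        math.log(bvn_pdf(y1, y2, *theta) + bvn_pdf(y2, y1, *theta))
        for y1, y2 in pairs
    )


def reparameterize(theta):
    """Map (mu1, mu2, s1, s2, rho) to (mu, sigma_plus, beta0, beta1, eta).

    Assumed meaning: moments of Z1 = (X1 + X2)/2 and Z2 = (X1 - X2)/2.
    """
    mu1, mu2, s1, s2, rho = theta
    mu = (mu1 + mu2) / 2.0
    delta = (mu1 - mu2) / 2.0
    cov = rho * s1 * s2
    sigma_plus = math.sqrt((s1 * s1 + s2 * s2 + 2.0 * cov) / 4.0)
    sigma_minus = math.sqrt((s1 * s1 + s2 * s2 - 2.0 * cov) / 4.0)
    xi = (s1 * s1 - s2 * s2) / (4.0 * sigma_plus * sigma_minus)
    beta1 = (sigma_minus / sigma_plus) * xi
    beta0 = delta - mu * beta1
    eta = math.sqrt(1.0 - xi * xi) * sigma_minus
    return mu, sigma_plus, beta0, beta1, eta


def loglik_decomposed(pairs, theta):
    """Marginal term for Z1 plus symmetrized conditional term for Z2."""
    mu, sigma_plus, beta0, beta1, eta = reparameterize(theta)
    total = 0.0
    for y1, y2 in pairs:
        z1 = (y1 + y2) / 2.0
        z2 = (y1 - y2) / 2.0
        cond_mean = beta0 + beta1 * z1
        total += math.log(norm_pdf(z1, mu, sigma_plus))
        total += math.log(
            norm_pdf(z2, cond_mean, eta) + norm_pdf(-z2, cond_mean, eta)
        )
        total -= math.log(2.0)  # Jacobian of (x1, x2) -> (z1, z2)
    return total


demo_pairs = [(0.3, 1.2), (-0.5, 0.4), (2.0, 2.1)]  # made-up values
demo_theta = (0.5, 1.0, 1.2, 0.8, -0.3)
print(loglik_direct(demo_pairs, demo_theta)
      - loglik_decomposed(demo_pairs, demo_theta))
```

Only $Z_{1i}$ and $|Z_{2i}|$ are recoverable from an unordered pair, which is why the second term symmetrizes over $\pm Z_{2i}$.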

Under $H_{0}$ in Theorem 1 which includes the assumption that $\rho=0$ , suppose the true parameter values of the data-generating distribution are $\mu_{1}=\mu_{2}=\mu_{*}$ , $\sigma^{2}_{1}=\sigma^{2}_{2}=\sigma^{2}_{*}$ . We may then, in our proofs, work with the transformed data

[TABLE]

After the transformation, the algebraic form of the likelihood does not change but the true parameter values of the data-generating distribution become $\mu_{1}=\mu_{2}=0$ and $\sigma_{1}^{2}=\sigma_{2}^{2}=2$ . Without loss of generality, based on the above invariance property, we may assume that the true parameters $\mu_{1}=\mu_{2}=0$ and $\sigma_{1}^{2}=\sigma_{2}^{2}=2$ under $H_{0}$ .

Under $H_{0}$ in Theorem 2, without loss of generality, the same assumption is applicable to $\mu$ and $\sigma$. We now show that, by the same invariance principle, we may also assume $\rho=0$ as long as the true value $\rho_{*}\neq\pm 1$. When $\rho_{*}\neq\pm 1$, we simply let

[TABLE]

The data-generating distribution of $(X_{1}^{**},X_{2}^{**})$ now has the true parameter values $\mu_{1}=\mu_{2}=0$, $\sigma_{1}^{2}=\sigma_{2}^{2}=2$, and $\rho=0$ under $H_{0}$.

With the above standardization operation, for both Theorems 1 and 2, we study the asymptotic null properties under the assumption that $Z_{1i}$ and $Z_{2i}$ are independent normal random variables with the standard parameter values:

[TABLE]
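As a sanity check on the standardization, the moments of $Z_{1i}$ and $Z_{2i}$ under $\mu_{1}=\mu_{2}=0$, $\sigma_{1}^{2}=\sigma_{2}^{2}=2$, and $\rho=0$ follow directly from their definitions. The calculation below is an illustrative sketch that uses only those definitions:

```latex
\begin{aligned}
\mathbb{E}(Z_{1i}) &= (\mu_{1}+\mu_{2})/2 = 0, \qquad
\mathbb{E}(Z_{2i}) = (\mu_{1}-\mu_{2})/2 = 0,\\
\mathrm{Var}(Z_{1i}) &= (\sigma_{1}^{2}+\sigma_{2}^{2}+2\rho\sigma_{1}\sigma_{2})/4 = (2+2+0)/4 = 1,\\
\mathrm{Var}(Z_{2i}) &= (\sigma_{1}^{2}+\sigma_{2}^{2}-2\rho\sigma_{1}\sigma_{2})/4 = (2+2-0)/4 = 1,\\
\mathrm{Cov}(Z_{1i},Z_{2i}) &= (\sigma_{1}^{2}-\sigma_{2}^{2})/4 = 0.
\end{aligned}
```

Since $(Z_{1i},Z_{2i})$ is jointly normal, zero covariance gives independence, and both components are standard normal. This agrees with the null value $(0,1,0,0,1)^{\tau}$ of $(\mu,\sigma_{+},\beta_{0},\beta_{1},\eta)^{\tau}$ used in Lemma 2.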

We first establish three preparatory lemmas.

Lemma 1.

As $n\to\infty$ , we have, almost surely,

[TABLE]

where $\mathbbm{1}(\cdot)$ is the indicator function.

Proof.

Note that

[TABLE]

is the empirical measure of the two-dimensional stripe formed by the inequality

[TABLE]

This class of stripes can divide $n$ points in two-dimensional space into at most a polynomial number of different subsets. By Pollard (1990), this property implies the uniform strong law of large numbers:

[TABLE]

almost surely.

The distribution of $Z_{2}-\beta_{0}-\beta_{1}Z_{1}$ is normal with variance at least 1. Based on this, we have $P(|Z_{2}-\beta_{0}-\beta_{1}Z_{1}|\leq 1/4)\leq{0.2}$ for any $\beta_{0},\beta_{1}$ . Hence, almost surely,

[TABLE]

This completes the proof. ∎
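The probability bound used in this proof can be verified numerically. The sketch below is an added illustration with hypothetical function names. It evaluates $P(|W|\leq 1/4)$ for $W\sim N(m,s^{2})$ over a grid of means and of standard deviations $s\geq 1$, which covers $W=Z_{2}-\beta_{0}-\beta_{1}Z_{1}$ under the standardized null model, where $W$ has mean $-\beta_{0}$ and variance $1+\beta_{1}^{2}$.

```python
import math


def prob_in_band(mean, sd, half_width=0.25):
    """P(|W| <= half_width) for W ~ N(mean, sd^2), via the error function."""
    scale = sd * math.sqrt(2.0)
    upper = (half_width - mean) / scale
    lower = (-half_width - mean) / scale
    return 0.5 * (math.erf(upper) - math.erf(lower))


# Grid search over means in [-5, 5] and standard deviations in [1, 6].
worst_case = max(
    prob_in_band(m / 10.0, 1.0 + s / 10.0)
    for m in range(-50, 51)
    for s in range(0, 51)
)
print(worst_case, prob_in_band(0.0, 1.0))
```

The maximum over the grid is attained at mean $0$ and standard deviation $1$, where the probability is about $0.197$, which is below the bound $0.2$ used above.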

Lemma 2.

Suppose an estimator $\bar{\boldsymbol{\theta}}$ satisfies

[TABLE]

for some constant $C$. Then under the null model, $\bar{\boldsymbol{\theta}}=\boldsymbol{\theta}_{0}+o_{p}(1)=(0,1,0,0,1)^{\tau}+o_{p}(1)$.

Proof.

Note that we have decomposed $\ell_{n}(\bar{\boldsymbol{\theta}})-\ell_{n}(\boldsymbol{\theta}_{0})$ into a sum of two terms. For the first term, according to the classical result about the LRT under regular models, it is clear that

[TABLE]

When in the second term the variance parameter $\eta>M_{0}=\exp(4)$ , we have

[TABLE]

By the law of large numbers, we have

[TABLE]

almost surely. This implies that

[TABLE]

and subsequently, uniformly for $\eta$ in this range,

[TABLE]

Together with (12), we have, whenever $\eta>M_{0}=\exp(4)$ ,

[TABLE]

in probability. Since the lemma condition clearly states that $\bar{\eta}$ does not have the above property, it cannot be in this range. That is, we conclude that $\bar{\eta}\leq M_{0}$ .

Suppose $\eta<\epsilon_{0}$ and $\epsilon_{0}$ is a very small positive value. In this case, for all $i$ , we have

[TABLE]

For $i$ such that

[TABLE]

we have

[TABLE]

By Lemma 1, uniformly in $\beta_{0}$ and $\beta_{1}$ and almost surely, at least $(1/2)n$ of the $i$ ’s satisfy (13). Therefore,

[TABLE]

as $n\to\infty$ and $\eta\to 0$ . Namely, for all $\eta<\epsilon_{0}$ sufficiently small, we also have

[TABLE]

In conclusion, the $\bar{\eta}$ value satisfying the lemma condition must almost surely fall within the interval $[\epsilon_{0},M_{0}]$ for some sufficiently small $\epsilon_{0}>0$ and sufficiently large $M_{0}<\infty$ .

Within the parameter space $[\epsilon_{0},M_{0}]\times\mathbb{R}^{2}$ , the density function

[TABLE]

satisfies the conditions for the consistency of the MLE specified in Wald (1949). For instance, it is a continuous density function with its limit being 0 whenever $\beta_{0}$ or $\beta_{1}$ goes to infinity. For a sufficiently small $\epsilon>0$ , let

[TABLE]

be a ball centered at the true value. The side conclusion as stated in Wald (1949) is

[TABLE]

for some $\delta>0$. Again, by the lemma condition on $\bar{\boldsymbol{\theta}}$, we must have $\bar{\beta}_{0},\bar{\beta}_{1},\bar{\eta}$ within $\epsilon$ of the true parameter value for any $\epsilon>0$ as $n\to\infty$. This proves part of the lemma.

It is now apparent that we also have

[TABLE]

By the same argument based on the assumed property of $\bar{\boldsymbol{\theta}}$, we must have

[TABLE]

This is sufficient for the proof of the consistency of $(\bar{\mu},\bar{\sigma}_{+})$ . Combined with the proof of the other parts, this completes the proof of the lemma. ∎

Next, we strengthen the results of Lemma 2. We first define some notation for the next lemma. Let

[TABLE]

It can be seen that $\mathbb{E}(A_{i})=0$ , $\mathbb{E}(B_{i})=0$ , $A_{i}$ and $B_{i}$ are uncorrelated, and

[TABLE]

Further, we introduce two parameter vectors of lengths 2 and 4:

[TABLE]

In the following, we use $|{\bf x}|$ and $\|{\bf x}\|$ to denote the $L_{1}$ and $L_{2}$ norms of the vector ${\bf x}$ , respectively.

Lemma 3.

Under the conditions of Lemma 2 and the null hypothesis, we have

[TABLE]

Proof.

We first prove (a). By Lemma 2, we have $(\bar{\mu},\bar{\sigma}_{+})=(0,1)+o_{p}(1)$ . We obtain (a) by expanding $\ell_{n,1}^{*}(\bar{\mu},\bar{\sigma}_{+})$ at $(\bar{\mu},\bar{\sigma}_{+})=(0,1)$ to the second order and then assessing the asymptotic orders via the weak law of large numbers.

To prove (b), we first denote

[TABLE]

and then write

[TABLE]

Applying the inequality $\log(1+x)\leq x-x^{2}/2+x^{3}/3$ , we have

[TABLE]
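The inequality $\log(1+x)\leq x-x^{2}/2+x^{3}/3$ holds for every $x>-1$. A short verification, added here as a sketch with $g$ as a hypothetical helper function:

```latex
\begin{aligned}
g(x) &= x-\frac{x^{2}}{2}+\frac{x^{3}}{3}-\log(1+x), \qquad g(0)=0,\\
g'(x) &= 1-x+x^{2}-\frac{1}{1+x}
       = \frac{(1-x+x^{2})(1+x)-1}{1+x}
       = \frac{x^{3}}{1+x}.
\end{aligned}
```

For $x>-1$ the derivative $g'(x)$ has the same sign as $x$, so $g$ attains its minimum at $x=0$ and $g(x)\geq g(0)=0$, which is the stated inequality.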

Next, we delineate $\delta_{i}(\beta_{0},\beta_{1},\eta)$ given $(\bar{\beta}_{0},\bar{\beta}_{1},\bar{\eta})=(0,0,1)+o_{p}(1)$ as proved in Lemma 2. We perform two main steps. In the first step, we obtain the fourth-order Taylor expansion of $\delta_{i}(\beta_{0},\beta_{1},\eta)$ ; in the second step, we assess the asymptotic orders of the terms in the expansion and put them into appropriate order expressions.

We start with the first step. Let the partial derivatives be

[TABLE]

Expanding both $\phi(\pm Z_{2i};\beta_{0}+\beta_{1}Z_{1i},\eta)$ to the fourth order at $(\beta_{0},\beta_{1},\eta)=(0,0,1)$ , we get

[TABLE]

where the summation is over all non-negative integer combinations of $s,t,k$ with $1\leq s+t+k\leq 4$ and $\epsilon_{in}^{(1)}$ is the remainder term in the Taylor expansion. Let $\epsilon_{n}^{(1)}=\sum_{i=1}^{n}\epsilon_{in}^{(1)}$; then

[TABLE]

In the second step, we first show that every term in the summation part of (16) satisfying $s+2t+2k\geq 5$ is of order $o_{p}(n^{1/2}){|\mbox{\bf s}_{2}|}$ . For instance, when $s=t=k=1$ , we have

[TABLE]

helped by the fact that we are investigating the region of $\beta_{0}=o_{p}(1)$ . For notational simplicity, let $\delta^{(s,t,k)}_{i}=\delta^{(s,t,k)}_{i}(0,0,1)$ . It is easy to check that $\delta^{(s,t,k)}_{i}$ has zero mean and finite variance, so

[TABLE]

Therefore, we have

[TABLE]

The proofs for the other $s+2t+2k\geq 5$ terms are similar. Hence, we may write

[TABLE]

and still have

[TABLE]

By straightforward algebra, we find

[TABLE]

where the unwanted term $B_{i}[4]$ is the fourth element of vector $B_{i}$ . Its coefficient is easily verified to be $\{\beta_{0}^{2}+(\eta^{2}-1)\}^{2}=o_{p}(|\mbox{\bf s}_{2}|)$ . This allows us to obtain a neater expression by absorbing it into the higher-order term, concluding that

[TABLE]

such that

[TABLE]

In short, we have shown that

[TABLE]

The above algebraic manipulations are typical of the techniques employed in Chen and Chen (2001) and Chen et al. (2001). The same techniques, which are tedious but not sophisticated, give

[TABLE]

Together with the weak law of large numbers these lead to

[TABLE]

Combining (22)–(24) with (15), we have

[TABLE]

Recall that $(\bar{\beta}_{0},\bar{\beta}_{1},\bar{\eta})=(0,0,1)+o_{p}(1)$ , so the above upper bound is applicable to $\ell_{n,2}^{*}(\bar{\beta}_{0},\bar{\beta}_{1},\bar{\eta})-\ell_{n,2}^{*}(0,0,1)$ . This completes the proof of (b).

Finally, we come to (c). Combining (a) and (b) and the conditions in Lemma 2, we have

[TABLE]

which is possible only if both $\bar{\mbox{\bf s}}_{1}=O_{p}(n^{-1/2})$ and $\bar{\mbox{\bf s}}_{2}=O_{p}(n^{-1/2})$. This leads to the order assessments in (c) and completes the proof of the entire lemma. ∎

6.2 Proof of Theorem 1

The difference between Theorems 1 and 2 is that in the former we consider $\rho_{0}=0$ to be known when formulating the test statistic. This makes it helpful to reorganize the entries of $A_{i}$ and $B_{i}$ and the corresponding entries of $\mbox{\bf s}_{1}$ and $\mbox{\bf s}_{2}$ .

When $\rho_{0}=0$ is known, we have $\sigma_{+}=\sigma_{-}$ . Let

[TABLE]

Every entry of $\mbox{\bf s}_{1}$ and $\mbox{\bf s}_{2}$ is a linear combination of the entries of $\mbox{\bf t}$, possibly with an $O_{p}(\|\mbox{\bf t}\|^{2})$ difference when these parameter values approach their default null values. We enumerate these entries as follows. The first entry of $\mbox{\bf s}_{1}$ is $\mbox{\bf s}_{1}[1]=\mbox{\bf t}[1]$, and the second is $\mbox{\bf s}_{1}[2]=\mbox{\bf t}[2]-\mbox{\bf t}[3]/2$. For the entries of $\mbox{\bf s}_{2}$, we have

[TABLE]

For the others, $\mbox{\bf s}_{2}[2]=\mbox{\bf t}[4]$ , $\mbox{\bf s}_{2}[3]=\mbox{\bf t}[5]$ , and $\mbox{\bf s}_{2}[4]=(\mbox{\bf t}[3])^{2}=O_{p}(\|\mbox{\bf t}\|^{2})$ .

Because every entry of $\mbox{\bf s}_{1}$ and $\mbox{\bf s}_{2}$ is virtually a linear combination of the entries of $\mbox{\bf t}$, we can reorganize the entries of $A_{i}$ and $B_{i}$ into a vector $D_{i}$ such that

[TABLE]

Naturally, we have $\mathbb{E}(D_{i})=0$ and some algebra shows that $\mathrm{Var}(D_{i})=\Sigma_{D}=\mbox{diag}(1,1,1/4,1,2)$. The following result is immediate.

Lemma 4.

Assume the conditions of Lemma 3 and let $\bar{\rho}=0$ . If, under the null model,

[TABLE]

we then have

[TABLE]

We are now ready for Theorem 1. The order conclusions of the MLEs in both Theorem 1(a) and 1(b) have been established in Lemma 4. We now derive the limiting distributions.

We rewrite $R_{n,1}$ defined in (4) as

[TABLE]

with $\check{\boldsymbol{\theta}}$ being the maximum point of the reduced model where $(\mu_{1},\sigma_{1})=(\mu_{2},\sigma_{2})$. Since the reduced model is regular, standard techniques such as those in Serfling (2000) give

[TABLE]

where $D_{i}[1],D_{i}[2]$ denote the first two entries of vector $D_{i}$ .

Next, note that $\tilde{\boldsymbol{\theta}}$ is the maximum point of the reduced model where $\sigma_{1}=\sigma_{2}=\sigma$. This makes $\beta_{1}=\xi=0$ and subsequently, for $\mbox{\bf t}$ under the reduced model,

[TABLE]

Nevertheless, Lemma 4 is applicable to the above form of $\mbox{\bf t}$ as long as it is close to its counterpart in the null model. Hence,

[TABLE]

Note that the range of the supremum conforms to the form of $\mbox{\bf t}$ in the reduced model and the fact that $\mbox{\bf t}[3]=\beta_{0}^{2}\geq 0$. The specific coefficient values are due to the value of $\Sigma_{D}$.

The upper bound in (27) is attained if we put

[TABLE]

With some straightforward algebra, the corresponding $\theta$ values of $\mbox{\bf t}$ exist and satisfy

[TABLE]

Applying the Taylor expansion, with $\theta$ being the above $\theta$ , we get

[TABLE]

Since $\tilde{\boldsymbol{\theta}}$ is the maximum point of $\ell_{n}(\boldsymbol{\theta})$, $2\{\ell_{n}(\tilde{\boldsymbol{\theta}})-\ell_{n}(\boldsymbol{\theta}_{0})\}$ is not smaller than the value in (27). The sandwich technique of Chen and Chen (2001) and Chen et al. (2001) or the squeeze theorem can be applied to obtain

[TABLE]

Combining (26) and (30) gives

[TABLE]

which has the limiting distribution $0.5\chi^{2}_{0}+0.5\chi^{2}_{1}$ . This completes the proof of part (a).
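To illustrate the mixture $0.5\chi^{2}_{0}+0.5\chi^{2}_{1}$, recall that $\{\max(Z,0)\}^{2}$ with $Z\sim N(0,1)$ has exactly this law: it equals $0$ with probability $1/2$ and is distributed as $\chi^{2}_{1}$ otherwise. The sketch below, with hypothetical function names, computes the tail probability of the mixture and compares it with a small simulation. It uses the unadjusted limiting distribution only, so it is not expected to reproduce the $p$-values in Section 5, which are calibrated by the adjusted limiting distributions.

```python
import math
import random


def chi2_1_tail(x):
    """P(chi^2_1 > x) for x >= 0, using P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def mixture_pvalue(r):
    """P(W >= r) when W follows 0.5 * chi^2_0 + 0.5 * chi^2_1."""
    if r <= 0.0:
        return 1.0  # the point mass at zero is included
    return 0.5 * chi2_1_tail(r)


def simulate_truncated_square(n_draws, seed=0):
    """Draw {max(Z, 0)}^2 with Z standard normal."""
    rng = random.Random(seed)
    return [max(rng.gauss(0.0, 1.0), 0.0) ** 2 for _ in range(n_draws)]


draws = simulate_truncated_square(20000)
frac_zero = sum(d == 0.0 for d in draws) / len(draws)
frac_tail = sum(d >= 2.0 for d in draws) / len(draws)
print(frac_zero, frac_tail, mixture_pvalue(2.0))
```

The simulated fraction of zeros should be close to $1/2$, and the simulated tail frequency should be close to the value returned by `mixture_pvalue`.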

We now prove conclusion (b). In this case, the range of $\mbox{\bf t}$ has only an intrinsic restriction as seen in the expression

[TABLE]

Let $\mbox{\bf t}_{1}=(\mu,\beta_{0}^{2}/2+(\sigma_{+}^{2}-1))^{\tau}$ and $\mbox{\bf t}_{2}=(\beta_{0}^{2},\beta_{1}^{2},\beta_{0}\beta_{1})^{\tau}$ . It can be seen that $\mbox{\bf t}_{2}$ lies on a two-dimensional manifold. Nonetheless, the upper bound developed in Lemma 4 remains valid. We partition $D_{i}$ into $D_{i1}$ and $D_{i2}$ with covariance matrices $\Sigma_{D1}$ and $\Sigma_{D2}$ . With these preparations, we have

[TABLE]

The supremum is taken over $\mbox{\bf t}_{2}$ with the intrinsic restriction respected. Similarly to (30), the upper bound (31) is attained at some feasible parameter value. Hence,

[TABLE]

Combining (26) and (32), we get

[TABLE]

The intrinsic restriction due to the specific form of $\mbox{\bf t}_{2}=(\beta_{0}^{2},\beta_{1}^{2},\beta_{0}\beta_{1})^{\tau}$ leads to the nonstandard form of the limiting distribution in the theorem.

6.3 Proof of Theorem 2

The test problem in Theorem 2 is different from that of Theorem 1 because we do not assume knowledge of the $\rho_{0}$ value. The parameter vector is now $\boldsymbol{\theta}=(\mu_{1},\mu_{2},\sigma_{1},\sigma_{2},\rho)^{\tau}$ including the correlation coefficient $\rho$. Because of the invariance argument, we need consider only the case where $\boldsymbol{\theta}_{0}=(0,0,1,1,0)^{\tau}$ under the null hypothesis for the asymptotic properties in this theorem.

With the introduction of $\rho$ , it helps to redefine $\mbox{\bf s}_{1}$ , $\mbox{\bf s}_{2}$ , and so on as follows:

[TABLE]

and the corresponding $A_{i}$ , $B_{i}$ as

[TABLE]

These are almost the quantities with the same names defined above Lemma 3. The difference is that the first entry of $\mbox{\bf s}_{2}$ is now the third entry of $\mbox{\bf s}_{1}$ . That is, we partition the vector differently here.

When $(\mu_{1},\sigma_{1})=(\mu_{2},\sigma_{2})$ in Theorem 2, the asymptotic expansion of the likelihood ratio is an expansion for regular models:

[TABLE]

The result of Lemma 3 remains applicable:

[TABLE]

Since $\sigma_{1}=\sigma_{2}$ in Theorem 2(a), we have

[TABLE]

This leads to

[TABLE]

where we have $(\sum_{i=1}^{n}B_{i}[3])^{+}$ instead of $(\sum_{i=1}^{n}B_{i}[3])$ because of the intrinsic constraint $\mbox{\bf s}_{2}[3]=\beta^{4}_{0}\geq 0$ . We skip the step of showing that the above upper bound is attainable, since this is now routine.

Combining (33) and (34) gives

[TABLE]

which converges to $0.5\chi^{2}_{0}+0.5\chi^{2}_{1}$ in distribution. This is conclusion (a).

For $R_{n,2}^{*}$ in (b), we are not helped by $\sigma_{1}=\sigma_{2}$ . Yet

[TABLE]

remains true for $\theta$ in a small neighborhood of $\boldsymbol{\theta}_{0}$. Similarly, we still have

[TABLE]

We skip the proof that this upper bound is attained. Hence,

[TABLE]

The challenge is to provide an analytical description of the limiting distribution when

[TABLE]

For this purpose, we highlight the fact that $n^{-1/2}\sum_{i=1}^{n}B_{i}$ is asymptotically multivariate normal with mean 0 and covariance matrix $\Sigma_{B}=\mbox{diag}(1,2,1/6)$ . The supremum is hence attained in the range of $\mbox{\bf s}_{2}=O_{p}(n^{-1/2})$ . In the subregion where $|\beta_{0}|<n^{-1/7}=o(n^{-1/8})$ , we have $\mbox{\bf s}_{2}[3]=\beta_{0}^{8}<n^{-8/7}=o(n^{-1})$ . Hence,

[TABLE]

In the other subregion where $|\beta_{0}|\geq n^{-1/7}$ , combined with the restriction $\beta_{0}\beta_{1}=O_{p}(n^{-1/2})$ , we must have $\beta_{1}=O_{p}(n^{-1/3})$ . Consequently, in this region, $\mbox{\bf s}_{2}[1]=\beta_{1}^{2}=O(n^{-2/3})$ . This leads to

[TABLE]

Hence,

[TABLE]
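The rate claimed for $\beta_{1}$ in the second subregion follows from exponent arithmetic. The worked step below is added for illustration and uses only the two restrictions $|\beta_{0}|\geq n^{-1/7}$ and $\beta_{0}\beta_{1}=O_{p}(n^{-1/2})$ stated above:

```latex
\begin{aligned}
|\beta_{1}| &= \frac{|\beta_{0}\beta_{1}|}{|\beta_{0}|}
  \leq n^{1/7}\,O_{p}(n^{-1/2}) = O_{p}(n^{-5/14}),\\
\beta_{1}^{2} &= O_{p}(n^{-5/7}).
\end{aligned}
```

Because $5/14>1/3$ and $5/7>2/3$, these bounds imply $\beta_{1}=O_{p}(n^{-1/3})$ and $\beta_{1}^{2}=O_{p}(n^{-2/3})$, which are the rates used in this subregion.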

Combining (35)–(37), we find

[TABLE]

Therefore, $R_{n,2}^{*}$ has the limiting distribution as claimed.

Acknowledgements

The research is supported in part by NSERC Grants RGPIN-2014-03743 and RGPIN-2015-06592, by the Singapore Ministry of Education Academic Research Fund Tier 1, and by the Ministry of Education of Singapore grant MOE2014-T2-1-072.

Bibliography

Bartlett, M. S. (1937), ‘Properties of sufficiency and statistical tests’, Proceedings of the Royal Society A 160, 268–282.
Carothers, A. D. (1981), ‘On determining the parental origins of homologous chromosomes’, Annals of Human Genetics 45, 367–374.
Chen, H. and Chen, J. (2001), ‘The likelihood ratio test for homogeneity in finite mixture models’, The Canadian Journal of Statistics 29, 201–215.
Chen, H., Chen, J. and Kalbfleisch, J. D. (2001), ‘A modified likelihood ratio test for homogeneity in finite mixture models’, Journal of the Royal Statistical Society: Series B 63, 19–29.
Chen, J. and Li, P. (2011), ‘Tuning the EM-test for finite mixture models’, The Canadian Journal of Statistics 39(3), 389–404.
Chernoff, H. (1954), ‘On the distribution of the likelihood ratio’, The Annals of Mathematical Statistics 25, 573–578.
Davies, P. and Phillips, A. J. (1988), ‘Nonparametric tests of population differences and estimation of the probability of misidentification with unidentified paired data’, Biometrika 75, 753–760.