Effect Inference from Two-Group Data with Sampling Bias

Dave Zachariah; Petre Stoica

arXiv:1902.09923·math.ST·November 12, 2019·IEEE Signal Process. Lett.


TL;DR

This paper introduces a new inference method that accurately compares two populations despite sampling biases, maintaining low false positive rates where standard methods fail.

Contribution

The authors develop a bias-resilient inference technique that controls false positives under moderate sampling biases, improving reliability over traditional methods.

Findings

1. Method performs well on synthetic data
2. Effective on real biomarker datasets
3. Reduces false positives under bias

Abstract

In many applications, different populations are compared using data that are sampled in a biased manner. Under sampling biases, standard methods that estimate the difference between the population means yield unreliable inferences. Here we develop an inference method that is resilient to sampling biases and is able to control the false positive errors under moderate bias levels in contrast to the standard approach. We demonstrate the method using synthetic and real biomarker data.

Equations

$$b=\operatorname{E}[\widehat{\delta}]-\delta,$$

$$\mathbf{y}=\begin{bmatrix}\mathbf{y}_{0}\\ \mathbf{y}_{1}\end{bmatrix}\sim\mathcal{N}\left(\begin{bmatrix}\mathbf{1}&\mathbf{0}\\ \mathbf{1}&\mathbf{1}\end{bmatrix}\begin{bmatrix}\mu\\ \delta\end{bmatrix},\begin{bmatrix}v_{0}\mathbf{I}&\mathbf{0}\\ \mathbf{0}&v_{1}\mathbf{I}\end{bmatrix}\right) \tag{1}$$

$$\delta\sim\mathcal{N}(0,\lambda), \tag{2}$$

$$\Pr\Big\{\delta\in C_{\alpha}(\mathbf{y})\Big\}\;\geq\;1-\alpha. \tag{3}$$

$$\boldsymbol{\theta}=\operatorname{col}\{\mu,\lambda,v_{0},v_{1}\}.$$

$$\begin{aligned}\widehat{\delta}(\mathbf{y})&=\operatorname{E}_{\theta}[\delta|\mathbf{y}]\,\Big|_{\theta=\widehat{\theta}}\\ &=\frac{\rho n_{1}}{\rho n_{1}+1}(\overline{y}_{1}-\mu)\,\Big|_{\theta=\widehat{\theta}},\end{aligned} \tag{4}$$

$$c_{\theta}^{2}=\frac{\rho v_{1}}{\rho n_{1}+1}+\frac{\rho^{2}n_{1}^{2}}{(\rho n_{1}+1)^{2}}\left(\frac{n_{0}}{v_{0}}+\frac{n_{1}}{v_{1}}\frac{1}{\rho n_{1}+1}\right)^{-1}. \tag{5}$$

$$C_{\alpha}(\mathbf{y})=\big\{\delta^{\prime}:|\delta^{\prime}-\widehat{\delta}(\mathbf{y})|<\alpha^{-1/2}c_{\theta}\big\}. \tag{6}$$

$$p_{\theta}(\mathbf{y})=\int p_{\theta}(\mathbf{y}|\delta)\,p_{\theta}(\delta)\,d\delta \tag{7}$$

$$\operatorname{Cov}_{\theta}[\mathbf{y}]=\operatorname{diag}\left(v_{0}\mathbf{I},\;\lambda\mathbf{1}\mathbf{1}^{\top}+v_{1}\mathbf{I}\right),$$

$$\widehat{\boldsymbol{\theta}}=\operatorname*{arg\,max}_{\theta}\;p_{\theta}(\mathbf{y}), \tag{8}$$

$$\begin{aligned}\alpha&=\mathbf{y}_{1}^{\top}\mathbf{y}_{1}-\mu n_{1}(2\overline{y}_{1}-\mu)\\ \beta&=n_{1}^{2}(\overline{y}_{1}-\mu)^{2}\\ \gamma&=\alpha n_{1}-\beta.\end{aligned}$$

$$\widehat{v}_{0}=\frac{1}{n_{0}}\mathbf{y}_{0}^{\top}\mathbf{y}_{0}-\mu(2\overline{y}_{0}-\mu), \tag{9}$$

$$\widehat{v}_{1}=\frac{1}{n_{1}}\frac{\alpha+\widehat{\rho}\gamma}{1+\widehat{\rho}n_{1}}, \tag{10}$$

$$\widehat{\rho}=\begin{cases}\dfrac{\beta-\alpha}{\gamma},&\beta-\alpha\geq 0,\\ 0,&\text{otherwise.}\end{cases} \tag{11}$$

$$f(\mu)=n_{0}\ln\widehat{v}_{0}+n_{1}\ln(\alpha+\widehat{\rho}\gamma)-(n_{1}-1)\ln(1+\widehat{\rho}n_{1}) \tag{12}$$

$$\begin{aligned}\operatorname{E}\left[|\delta-\widehat{\delta}|^{2}\right]&=\operatorname{E}_{y}\left[\operatorname{E}_{\delta|y}\left[|\delta-\overline{\delta}+\overline{\delta}-\widehat{\delta}|^{2}\right]\right]\\ &=\operatorname{E}_{y}\left[\operatorname{Var}[\delta|\mathbf{y}]+|\overline{\delta}-\widehat{\delta}|^{2}\right]\\ &=\frac{\lambda v_{1}}{\lambda n_{1}+v_{1}}+\operatorname{E}_{y}\left[|\overline{\delta}-\widehat{\delta}|^{2}\right].\end{aligned} \tag{13}$$

$$\boldsymbol{\phi}\triangleq\partial_{\theta}\ln p_{\theta}(\mathbf{y})\quad\text{and}\quad\mathbf{J}=\operatorname{E}_{y}[\boldsymbol{\phi}\boldsymbol{\phi}^{\top}]. \tag{14}$$

$$\mathbf{J}=\begin{bmatrix}J_{1,1}&\mathbf{0}\\ \mathbf{0}&*\end{bmatrix}, \tag{15}$$

$$\begin{aligned}J_{1,1}&=\mathbf{1}^{\top}\begin{bmatrix}v_{0}^{-1}\mathbf{I}&\mathbf{0}\\ \mathbf{0}&\mathbf{V}_{1}^{-1}\end{bmatrix}\mathbf{1}\\ &=v_{0}^{-1}\mathbf{1}^{\top}\mathbf{1}+\mathbf{1}^{\top}\mathbf{V}_{1}^{-1}\mathbf{1}\\ &=\frac{n_{0}}{v_{0}}+\frac{n_{1}}{\lambda n_{1}+v_{1}}\end{aligned} \tag{16}$$

$$0\leq\operatorname{E}_{y}\left[|(\overline{\delta}-\widehat{\delta})-\mathbf{g}^{\top}\mathbf{J}^{-1}\boldsymbol{\phi}|^{2}\right]=\operatorname{E}_{y}\left[|\overline{\delta}-\widehat{\delta}|^{2}\right]-\mathbf{g}^{\top}\mathbf{J}^{-1}\mathbf{g}. \tag{17}$$

$$\begin{aligned}\mathbf{g}&=\int[\partial_{\theta}\ln p_{\theta}](\overline{\delta}-\widehat{\delta})\,p_{\theta}\,d\mathbf{y}\\ &=\int\partial_{\theta}[p_{\theta}(\overline{\delta}-\widehat{\delta})]-p_{\theta}[\partial_{\theta}(\overline{\delta}-\widehat{\delta})]\,d\mathbf{y}\\ &=\partial_{\theta}[\text{bias}(\theta)]-\operatorname{E}_{y}[\partial_{\theta}(\overline{\delta}-\widehat{\delta})]\\ &=-\operatorname{E}_{y}\left[\partial_{\theta}\left(\frac{\lambda}{\lambda n_{1}+v_{1}}\mathbf{1}^{\top}(\mathbf{y}_{1}-\mu\mathbf{1})\right)\right]\\ &=\begin{bmatrix}-\frac{\lambda}{\lambda n_{1}+v_{1}}\mathbf{1}^{\top}\mathbf{1}+0\\ \partial_{\lambda}\frac{\lambda}{\lambda n_{1}+v_{1}}\mathbf{1}^{\top}\operatorname{E}_{y}[\mathbf{y}_{1}-\mu\mathbf{1}]\\ 0\\ \partial_{v_{1}}\frac{\lambda}{\lambda n_{1}+v_{1}}\mathbf{1}^{\top}\operatorname{E}_{y}[\mathbf{y}_{1}-\mu\mathbf{1}]\end{bmatrix}=\begin{bmatrix}-\frac{\lambda n_{1}}{\lambda n_{1}+v_{1}}\\ 0\\ 0\\ 0\end{bmatrix},\end{aligned} \tag{18}$$

$$\operatorname{E}\left[|\delta-\widehat{\delta}|^{2}\right]\geq\frac{\lambda v_{1}}{\lambda n_{1}+v_{1}}+\left(\frac{\lambda n_{1}}{\lambda n_{1}+v_{1}}\right)^{2}J_{1,1}^{-1}. \tag{19}$$

$$\delta\not\in C_{\alpha}(\mathbf{y})\;\Leftrightarrow\;\alpha c_{\theta}^{-2}|\delta-\widehat{\delta}(\mathbf{y})|^{2}\geq 1.$$

$$\begin{aligned}\Pr\big\{\delta\not\in C_{\alpha}(\mathbf{y})\big\}&=\int_{\delta\not\in C_{\alpha}(\mathbf{y})}p(\mathbf{y},\delta)\,d\delta\,d\mathbf{y}\\ &\leq\int_{\delta\not\in C_{\alpha}(\mathbf{y})}\alpha c^{-2}_{\theta}|\delta-\widehat{\delta}(\mathbf{y})|^{2}\,p(\mathbf{y},\delta)\,d\delta\,d\mathbf{y}\\ &\leq\alpha c^{-2}_{\theta}\operatorname{E}\left[|\delta-\widehat{\delta}(\mathbf{y})|^{2}\right]=\alpha\frac{\text{MSE}}{c^{2}_{\theta}}.\end{aligned}$$

$$f(\boldsymbol{\theta})=\underbrace{n_{0}\ln v_{0}+\frac{1}{v_{0}}\|\mathbf{y}_{0}-\mu\mathbf{1}\|^{2}}_{f_{0}(\mu,v_{0})}+\underbrace{\ln|\mathbf{V}_{1}|+\|\mathbf{y}_{1}-\mu\mathbf{1}\|^{2}_{\mathbf{V}_{1}^{-1}}}_{f_{1}(\mu,\lambda,v_{1})}. \tag{20}$$

$$\widehat{v}_{0}=\|\mathbf{y}_{0}-\mu\mathbf{1}\|^{2}/n_{0} \tag{21}$$

$$f_{0}(\mu,\widehat{v}_{0})=n_{0}\ln\widehat{v}_{0}+n_{0} \tag{22}$$

$$f_{1}(\mu,\rho,v)=\ln(1+\rho n)+\ln v^{n}+\frac{1}{v}\left(\|\mathbf{y}_{1}-\mu\mathbf{1}\|^{2}-\frac{\rho|\mathbf{1}^{\top}(\mathbf{y}_{1}-\mu\mathbf{1})|^{2}}{1+\rho n}\right) \tag{23}$$

$$f_{1}(\mu,\rho,\widehat{v})=\ln\frac{(\alpha+\rho\gamma)^{n}}{(1+\rho n)^{n-1}}+n. \tag{24}$$


Code & Models

Repositories

dzachariah/two-groups-data (official)


Full text

Effect Inference from Two-Group Data with Sampling Bias

Dave Zachariah and Petre Stoica

This work has been partly supported by the Swedish Research Council (VR) under contract 2018-05040.


I Introduction

In many applications of statistical inference, the aim is to compare data from different populations. Specifically, given $n_{0}$ and $n_{1}$ samples from two groups, collected in vectors $\mathbf{y}_{0}$ and $\mathbf{y}_{1}$ , the target quantity is often the difference between their means, denoted $\delta$ , which we call the effect. For instance, in randomized trials and A/B testing, the data are outcomes from two populations and $\delta$ is the average causal effect of assigning subjects to a test group ‘ $1$ ’ as compared to a control group ‘ $0$ ’ [1, 2]. The standard approach is to use the difference between sample averages in each group, viz. $\widehat{\delta}=\overline{y}_{1}-\overline{y}_{0}$ , where $\overline{y}_{i}=\mathbf{1}^{\top}\mathbf{y}_{i}/n_{i}$ . Confidence intervals for $\widehat{\delta}$ can be obtained using Welch’s method, which employs an approximating t-distribution [3, 4, 5]. Inferring $\delta\neq 0$ is equivalent to detecting that the means of two distributions differ, which is a classical problem in statistical signal processing [6, 7].
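As a point of reference, the standard approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `welch_interval` is hypothetical, and the interval uses a normal quantile as a stand-in for the t quantile (an assumption made so that only the standard library is needed; the Welch-Satterthwaite degrees of freedom are still computed and returned).

```python
import math
from statistics import NormalDist, fmean, variance


def welch_interval(y0, y1, alpha=0.05):
    """Difference of sample means with an approximate (1 - alpha) interval."""
    n0, n1 = len(y0), len(y1)
    s0, s1 = variance(y0), variance(y1)      # unbiased sample variances
    delta_hat = fmean(y1) - fmean(y0)        # standard effect estimate
    se = math.sqrt(s0 / n0 + s1 / n1)        # standard error of the difference
    # Welch-Satterthwaite degrees of freedom of the approximating t-distribution.
    dof = (s0 / n0 + s1 / n1) ** 2 / (
        (s0 / n0) ** 2 / (n0 - 1) + (s1 / n1) ** 2 / (n1 - 1)
    )
    # ASSUMPTION: normal quantile used in place of the t quantile with `dof`
    # degrees of freedom; the two are close when dof is large.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return delta_hat, (delta_hat - z * se, delta_hat + z * se), dof
```

For small samples the t quantile is noticeably larger than the normal one, so a faithful Welch interval would be somewhat wider than what this sketch returns.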

Ideally, the samples from both groups are representative of their target populations. Then the bias of the estimator,

$$b=\operatorname{E}[\widehat{\delta}]-\delta,$$

is zero. However, in nonideal conditions with finite samples this is not the case, e.g., when some units of the intended populations are less likely to be included than others. Under such conditions, $b$ decreases with sample sizes $n_{0}$ and $n_{1}$ but will nevertheless be nonzero. Sampling biases increase the risk of inferring spurious effects when using standard inference methods.

In this paper, we develop an inference method that is resilient to sampling biases. In contrast to the standard approach, the proposed method reduces the risk of reporting spurious effect estimates and is capable of controlling the false positive errors under moderate biases. The method relies on an effect estimator using a fully automatic and data-adaptive regularization. We demonstrate its performance on both synthetic and real data.

Remark 1.

Code for the method can be found at https://github.com/dzachariah/two-groups-data

II Problem formulation

We model the dataset as

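$$\mathbf{y}=\begin{bmatrix}\mathbf{y}_{0}\\ \mathbf{y}_{1}\end{bmatrix}\sim\mathcal{N}\left(\begin{bmatrix}\mathbf{1}&\mathbf{0}\\ \mathbf{1}&\mathbf{1}\end{bmatrix}\begin{bmatrix}\mu\\ \delta\end{bmatrix},\begin{bmatrix}v_{0}\mathbf{I}&\mathbf{0}\\ \mathbf{0}&v_{1}\mathbf{I}\end{bmatrix}\right). \tag{1}$$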

The model based on the Gaussian distribution yields the least favourable distribution for estimating the unknown effect $\delta$ [8]. We model the effect as a random variable, where different ranges of values of $\delta$ have different probabilities. To achieve resilience to sampling biases, we adopt a conservative approach in which nonexistent or negligible effects are considered to be more probable. Specifically, we employ the following model:

$$\delta\sim\mathcal{N}(0,\lambda), \tag{2}$$

where $\lambda$ is an unknown parameter.

Our aim is to derive a confidence interval $C_{\alpha}(\mathbf{y})$ that contains the unknown $\delta$ with a coverage probability of at least $1-\alpha$ . That is,

$$\Pr\Big\{\delta\in C_{\alpha}(\mathbf{y})\Big\}\;\geq\;1-\alpha. \tag{3}$$

The confidence interval is to be centered on an estimator $\widehat{\delta}(\mathbf{y})$ and should be resilient to sampling biases. That is, even if $b\neq 0$ the interval must not indicate nonzero effects with a probability greater than $\alpha$ . Fig. 1 illustrates the ability of the method proposed below to ensure (3) under a range of biases, provided $b$ does not greatly exceed the dispersion of sample averages, i.e., $\sqrt{v_{i}/n_{i}}$ .

We will derive a confidence interval using model (1) and (2), with nuisance parameters

$$\boldsymbol{\theta}=\operatorname{col}\{\mu,\lambda,v_{0},v_{1}\}.$$

III Proposed method

Let $\operatorname{E}_{\theta}[\delta|\mathbf{y}]$ be the conditional mean of the effect given the data. Using an estimate $\widehat{\boldsymbol{\theta}}$ of the nuisance parameters, we propose the following effect estimator

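$$\begin{aligned}\widehat{\delta}(\mathbf{y})&=\operatorname{E}_{\theta}[\delta|\mathbf{y}]\,\Big|_{\theta=\widehat{\theta}}\\ &=\frac{\rho n_{1}}{\rho n_{1}+1}(\overline{y}_{1}-\mu)\,\Big|_{\theta=\widehat{\theta}},\end{aligned} \tag{4}$$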

where we introduce the variable $\rho\equiv\lambda/v_{1}$ that can be interpreted as a signal-to-noise ratio, see [9] for a derivation.
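The estimator is a shrunken version of the centered test-group average: the factor $\rho n_{1}/(\rho n_{1}+1)$ lies in $[0,1)$ and approaches 1 as the signal-to-noise ratio grows. A minimal Python sketch, with the hypothetical function name `effect_estimate` and with $\mu$ and $\rho$ passed in as already-estimated values:

```python
from statistics import fmean


def effect_estimate(y1, mu, rho):
    """Shrinkage estimate rho*n1/(rho*n1 + 1) * (mean(y1) - mu)."""
    n1 = len(y1)
    shrink = rho * n1 / (rho * n1 + 1.0)   # in [0, 1); equals 0 when rho == 0
    return shrink * (fmean(y1) - mu)
```

With `rho = 0` the estimate is exactly zero regardless of the data, which is the regularizing behaviour the method relies on.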

Result 1 (Cramér-Rao bound).

When the systematic error of $\widehat{\delta}(\mathbf{y})$ is invariant with respect to $\boldsymbol{\theta}$ , then the mean-squared error over all possible effects and data has a Cramér-Rao bound $\operatorname{E}\left[|\delta-\widehat{\delta}(\mathbf{y})|^{2}\right]\geq c^{2}_{\theta},$ where

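$$c_{\theta}^{2}=\frac{\rho v_{1}}{\rho n_{1}+1}+\frac{\rho^{2}n_{1}^{2}}{(\rho n_{1}+1)^{2}}\left(\frac{n_{0}}{v_{0}}+\frac{n_{1}}{v_{1}}\frac{1}{\rho n_{1}+1}\right)^{-1}. \tag{5}$$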

Proof.

See Appendix -A. ∎

Result 2 (Confidence interval).

Let

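$$C_{\alpha}(\mathbf{y})=\big\{\delta^{\prime}:|\delta^{\prime}-\widehat{\delta}(\mathbf{y})|<\alpha^{-1/2}c_{\theta}\big\}. \tag{6}$$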

When using an efficient estimator that attains the bound (5), the interval in (6) satisfies the specified coverage probability (3).

Proof.

See Appendix -B. ∎
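Results 1 and 2 translate directly into two small functions. The sketch below is illustrative only; the names `crb` and `confidence_interval` are hypothetical, and the parameters are assumed to be given (in practice they would be the estimates $\widehat{\rho}$, $\widehat{v}_{0}$, $\widehat{v}_{1}$). The interval is open with half-width $\alpha^{-1/2}c_{\theta}$, so it is empty when $c_{\theta}=0$; the sketch signals this case by returning `None`.

```python
import math


def crb(rho, v0, v1, n0, n1):
    """Bound c_theta^2: posterior variance plus a term for estimating mu."""
    a = rho * n1 + 1.0
    j11 = n0 / v0 + n1 / (v1 * a)          # information about mu
    return rho * v1 / a + (rho * n1 / a) ** 2 / j11


def confidence_interval(delta_hat, c_sq, alpha=0.05):
    """Open interval of half-width c / sqrt(alpha), or None if it is empty."""
    if c_sq <= 0.0:
        return None                        # |d - delta_hat| < 0 has no solution
    half = math.sqrt(c_sq / alpha)
    return delta_hat - half, delta_hat + half
```

For example, with $\rho=1$, $v_{0}=v_{1}=1$ and $n_{0}=n_{1}=10$ the bound evaluates to $1/11+5/66=1/6$, giving a half-width of about 1.83 at $\alpha=0.05$.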

Evaluating $\widehat{\delta}(\mathbf{y})$ and $C_{\alpha}(\mathbf{y})$ requires estimates of the nuisance parameters $\boldsymbol{\theta}$ . Here we adopt the maximum likelihood approach and estimate $\boldsymbol{\theta}$ using the marginalized data distribution,

$$p_{\theta}(\mathbf{y})=\int p_{\theta}(\mathbf{y}|\delta)\,p_{\theta}(\delta)\,d\delta. \tag{7}$$

It can be shown that (7) is a Gaussian distribution [9] with mean $\operatorname{E}_{\theta}[\mathbf{y}]=\mu\mathbf{1}$ and covariance

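$$\operatorname{Cov}_{\theta}[\mathbf{y}]=\operatorname{diag}\left(v_{0}\mathbf{I},\;\lambda\mathbf{1}\mathbf{1}^{\top}+v_{1}\mathbf{I}\right).$$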
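The marginal moments can be checked directly from the model. The short derivation below is a sketch that introduces noise vectors $\mathbf{e}_{0}$ and $\mathbf{e}_{1}$ (notation not used in the paper) and assumes they are independent of $\delta$:

```latex
\begin{aligned}
\mathbf{y}_{0} &= \mu\mathbf{1} + \mathbf{e}_{0}, \qquad
\mathbf{y}_{1} = (\mu+\delta)\mathbf{1} + \mathbf{e}_{1}, \qquad
\mathbf{e}_{i} \sim \mathcal{N}(\mathbf{0}, v_{i}\mathbf{I}), \quad
\delta \sim \mathcal{N}(0,\lambda), \\
\operatorname{E}_{\theta}[\mathbf{y}_{1}] &= \mu\mathbf{1} + \operatorname{E}[\delta]\mathbf{1} = \mu\mathbf{1}, \\
\operatorname{Cov}_{\theta}[\mathbf{y}_{1}] &= \operatorname{E}[\delta^{2}]\,\mathbf{1}\mathbf{1}^{\top} + \operatorname{E}[\mathbf{e}_{1}\mathbf{e}_{1}^{\top}]
 = \lambda\mathbf{1}\mathbf{1}^{\top} + v_{1}\mathbf{I}, \\
\operatorname{Cov}_{\theta}[\mathbf{y}_{0},\mathbf{y}_{1}] &= \mathbf{0}.
\end{aligned}
```

The random effect therefore shows up only as a rank-one term that correlates all samples within the test group.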

The estimated parameters are given by

$$\widehat{\boldsymbol{\theta}}=\operatorname*{arg\,max}_{\theta}\;p_{\theta}(\mathbf{y}), \tag{8}$$

which can be shown to yield an asymptotically efficient estimator (4) [10, corr. 9].

Interestingly, the problem (8) can be solved by a one-dimensional numerical search. Begin by defining the variables

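$$\begin{aligned}\alpha&=\mathbf{y}_{1}^{\top}\mathbf{y}_{1}-\mu n_{1}(2\overline{y}_{1}-\mu)\\ \beta&=n_{1}^{2}(\overline{y}_{1}-\mu)^{2}\\ \gamma&=\alpha n_{1}-\beta.\end{aligned}$$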

Note that $\gamma\geq 0$ . Then the following result holds.

Result 3 (Nuisance parameter estimates).

The estimated variances are given by

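$$\widehat{v}_{0}=\frac{1}{n_{0}}\mathbf{y}_{0}^{\top}\mathbf{y}_{0}-\mu(2\overline{y}_{0}-\mu), \tag{9}$$

$$\widehat{v}_{1}=\frac{1}{n_{1}}\frac{\alpha+\widehat{\rho}\gamma}{1+\widehat{\rho}n_{1}}, \tag{10}$$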

which are ensured to be nonnegative, and $\widehat{\lambda}=\widehat{\rho}\widehat{v}_{1}$ , where

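$$\widehat{\rho}=\begin{cases}\dfrac{\beta-\alpha}{\gamma},&\beta-\alpha\geq 0,\\ 0,&\text{otherwise.}\end{cases} \tag{11}$$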

All variables in (9)-(11) are functions of the mean $\mu$ , whose estimate $\widehat{\mu}$ is obtained by minimizing the one-dimensional function

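$$f(\mu)=n_{0}\ln\widehat{v}_{0}+n_{1}\ln(\alpha+\widehat{\rho}\gamma)-(n_{1}-1)\ln(1+\widehat{\rho}n_{1}). \tag{12}$$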

Proof.

See Appendix -C. ∎
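The estimates in Result 3 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the names `concentrated_terms` and `fit_nuisance` are hypothetical, the one-dimensional search is done with a plain grid over the range of the data (the paper does not specify the search method), and the data are assumed non-degenerate so that all logarithms are finite.

```python
import math


def concentrated_terms(y0, y1, mu):
    """Return (rho, v0, v1, cost) for a given mu, following (9)-(12)."""
    n0, n1 = len(y0), len(y1)
    a = sum((y - mu) ** 2 for y in y1)       # alpha = ||y1 - mu 1||^2
    b = sum(y - mu for y in y1) ** 2         # beta  = |1^T (y1 - mu 1)|^2
    g = a * n1 - b                           # gamma >= 0 by Cauchy-Schwarz
    rho = (b - a) / g if (b - a >= 0 and g > 0) else 0.0
    v0 = sum((y - mu) ** 2 for y in y0) / n0
    v1 = (a + rho * g) / (n1 * (1.0 + rho * n1))
    cost = (n0 * math.log(v0) + n1 * math.log(a + rho * g)
            - (n1 - 1) * math.log(1.0 + rho * n1))
    return rho, v0, v1, cost


def fit_nuisance(y0, y1, num=2001):
    """Grid search for mu (ASSUMPTION: search range = range of the data)."""
    data = list(y0) + list(y1)
    lo, hi = min(data), max(data)
    grid = [lo + (hi - lo) * k / (num - 1) for k in range(num)]
    mu_hat = min(grid, key=lambda m: concentrated_terms(y0, y1, m)[3])
    rho, v0, v1, _ = concentrated_terms(y0, y1, mu_hat)
    return mu_hat, rho, v0, v1
```

A grid is used here because the cost switches branches where $\beta=\alpha$, so a local optimizer could stall; a finer grid or a local refinement around the best grid point would improve accuracy at modest cost.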

By plugging in $\widehat{\mu}$ , $\widehat{\rho}$ , $\widehat{v}_{0}$ and $\widehat{v}_{1}$ into (4) and (6), we obtain estimates $\widehat{\delta}(\mathbf{y})$ and $C_{\alpha}(\mathbf{y})$ , respectively. We note that the overall mean $\mu$ is fitted to the data in a nonstandard manner using (12), which yields a fully automatic and data-adaptive regularization of the effect estimator (4). If the minimizing $\widehat{\mu}$ is such that $\beta<\alpha$ , then the estimated signal-to-noise ratio is $\widehat{\rho}=0$ . In this case, the method indicates that the data is not sufficiently informative to discriminate any systematic difference from noise. Consequently, $\widehat{\delta}(\mathbf{y})$ collapses to zero and $C_{\alpha}(\mathbf{y})=\emptyset$ , indicating a case in which the effect cannot be reliably inferred.

IV Experimental results

We demonstrate the proposed inference method using both synthetic and real data.

IV-A Synthetic data

We generate two-group data using the model (1) and add a negative bias $b$ to the test group, using the setup parameters described in Fig. 1. The adaptive regularization of $\widehat{\delta}$ is illustrated in Fig. 2: when the unknown effect is nonexistent, $\delta=0$ , the estimates are concentrated at zero, despite the bias $b$ . As $\delta$ exceeds the dispersion of the sample averages, however, the regularized and standard estimators become nearly identical.
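A data generator of this kind might look as follows. This is a hedged sketch: the function name `generate_two_groups` and all parameter values in any call are hypothetical (the actual setup is the one given in Fig. 1), and the bias is assumed to act as a shift $b$ of the test-group mean.

```python
import random


def generate_two_groups(n0, n1, mu, delta, v0, v1, bias=0.0, seed=None):
    """Draw y0 ~ N(mu, v0) and y1 ~ N(mu + delta + bias, v1), i.i.d. per group."""
    rng = random.Random(seed)
    y0 = [rng.gauss(mu, v0 ** 0.5) for _ in range(n0)]
    # ASSUMPTION: sampling bias modelled as a constant shift of the test group.
    y1 = [rng.gauss(mu + delta + bias, v1 ** 0.5) for _ in range(n1)]
    return y0, y1
```

Setting `delta=0.0` with a negative `bias` gives data in which any apparent difference between the groups is spurious, which is the scenario used to assess false positives.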

We report a significant effect estimate when a nonempty interval $C_{\alpha}(\mathbf{y})$ excludes the zero effect. Fig. 3 illustrates the ability of the proposed method to control the false positive error probability as $n_{0}$ increases, in contrast to the standard method. This is achieved while incurring a loss of statistical power that vanishes as the number of samples increases.
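The reporting rule stated above is simple to express in code. In this sketch the function name `is_significant` is hypothetical, and an empty interval is represented by `None` (an implementation choice, not something prescribed by the paper):

```python
def is_significant(interval):
    """True if a nonempty open interval (lo, hi) excludes the zero effect."""
    if interval is None:          # empty interval: effect cannot be inferred
        return False
    lo, hi = interval
    return not (lo < 0.0 < hi)
```

An empty interval is never reported as significant, which is how the method abstains when the estimated signal-to-noise ratio is zero.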

IV-B Prostate cancer data

We now consider real data from $n_{0}=50$ healthy individuals and $n_{1}=52$ individuals with prostate cancer [11, 12]. The data contain 6033 different biomarker responses. The inferred effects are shown in Fig. 4. For 6 markers, the effects were found to be significant at the $\alpha=0.05$ level. By contrast, the standard approach using Welch’s t-intervals reports 478 genes as significant, but these inferences are less reliable under sampling biases.

V Conclusions

We developed a method for inferring effects in two-group data that, unlike the standard approach, is resilient to sampling biases. The method is able to control the false positive errors under moderate bias levels and its performance was demonstrated using both synthetic and real biomarker data.

-A The derivation of the Cramér-Rao bound

The mean-square error can be decomposed as

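$$\begin{aligned}\operatorname{E}\left[|\delta-\widehat{\delta}|^{2}\right]&=\operatorname{E}_{y}\left[\operatorname{E}_{\delta|y}\left[|\delta-\overline{\delta}+\overline{\delta}-\widehat{\delta}|^{2}\right]\right]\\ &=\operatorname{E}_{y}\left[\operatorname{Var}[\delta|\mathbf{y}]+|\overline{\delta}-\widehat{\delta}|^{2}\right]\\ &=\frac{\lambda v_{1}}{\lambda n_{1}+v_{1}}+\operatorname{E}_{y}\left[|\overline{\delta}-\widehat{\delta}|^{2}\right],\end{aligned} \tag{13}$$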

where $\overline{\delta}$ is the conditional mean. Next, define the score function and the information matrix,

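$$\boldsymbol{\phi}\triangleq\partial_{\theta}\ln p_{\theta}(\mathbf{y})\quad\text{and}\quad\mathbf{J}=\operatorname{E}_{y}[\boldsymbol{\phi}\boldsymbol{\phi}^{\top}]. \tag{14}$$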

Since the marginal pdf is Gaussian, we can compute $\mathbf{J}$ using Slepian-Bangs formula [13]. It has a block diagonal form

$$\mathbf{J}=\begin{bmatrix}J_{1,1}&\mathbf{0}\\ \mathbf{0}&*\end{bmatrix}, \tag{15}$$

where

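$$\begin{aligned}J_{1,1}&=\mathbf{1}^{\top}\begin{bmatrix}v_{0}^{-1}\mathbf{I}&\mathbf{0}\\ \mathbf{0}&\mathbf{V}_{1}^{-1}\end{bmatrix}\mathbf{1}\\ &=v_{0}^{-1}\mathbf{1}^{\top}\mathbf{1}+\mathbf{1}^{\top}\mathbf{V}_{1}^{-1}\mathbf{1}\\ &=\frac{n_{0}}{v_{0}}+\frac{n_{1}}{\lambda n_{1}+v_{1}}\end{aligned} \tag{16}$$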

and $\mathbf{V}_{1}=\lambda\mathbf{1}\mathbf{1}^{\top}+v_{1}\mathbf{I}$ .
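The closed form of the quadratic term $\mathbf{1}^{\top}\mathbf{V}_{1}^{-1}\mathbf{1}$ entering $J_{1,1}$ follows from the observation that $\mathbf{1}$ (here of length $n_{1}$) is an eigenvector of $\mathbf{V}_{1}$. A short worked step, added for clarity:

```latex
\begin{aligned}
\mathbf{V}_{1}\mathbf{1} &= \lambda\mathbf{1}(\mathbf{1}^{\top}\mathbf{1}) + v_{1}\mathbf{1}
 = (\lambda n_{1} + v_{1})\,\mathbf{1}
 \;\;\Longrightarrow\;\;
 \mathbf{V}_{1}^{-1}\mathbf{1} = \frac{1}{\lambda n_{1}+v_{1}}\,\mathbf{1}, \\
\mathbf{1}^{\top}\mathbf{V}_{1}^{-1}\mathbf{1} &= \frac{\mathbf{1}^{\top}\mathbf{1}}{\lambda n_{1}+v_{1}}
 = \frac{n_{1}}{\lambda n_{1}+v_{1}} .
\end{aligned}
```

Adding the control-group contribution $v_{0}^{-1}\mathbf{1}^{\top}\mathbf{1}=n_{0}/v_{0}$ gives $J_{1,1}=n_{0}/v_{0}+n_{1}/(\lambda n_{1}+v_{1})$.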

Let $\mathbf{g}\triangleq\operatorname{E}_{y}[\boldsymbol{\phi}(\overline{\delta}-\widehat{\delta})]$ denote the correlation between the score function and estimation error. Then we have the general bound

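$$0\leq\operatorname{E}_{y}\left[|(\overline{\delta}-\widehat{\delta})-\mathbf{g}^{\top}\mathbf{J}^{-1}\boldsymbol{\phi}|^{2}\right]=\operatorname{E}_{y}\left[|\overline{\delta}-\widehat{\delta}|^{2}\right]-\mathbf{g}^{\top}\mathbf{J}^{-1}\mathbf{g}. \tag{17}$$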

In our case, we obtain

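$$\begin{aligned}\mathbf{g}&=\int[\partial_{\theta}\ln p_{\theta}](\overline{\delta}-\widehat{\delta})\,p_{\theta}\,d\mathbf{y}\\ &=\int\partial_{\theta}[p_{\theta}(\overline{\delta}-\widehat{\delta})]-p_{\theta}[\partial_{\theta}(\overline{\delta}-\widehat{\delta})]\,d\mathbf{y}\\ &=\partial_{\theta}[\text{bias}(\theta)]-\operatorname{E}_{y}[\partial_{\theta}(\overline{\delta}-\widehat{\delta})]\\ &=-\operatorname{E}_{y}\left[\partial_{\theta}\left(\frac{\lambda}{\lambda n_{1}+v_{1}}\mathbf{1}^{\top}(\mathbf{y}_{1}-\mu\mathbf{1})\right)\right]\\ &=\begin{bmatrix}-\frac{\lambda}{\lambda n_{1}+v_{1}}\mathbf{1}^{\top}\mathbf{1}+0\\ \partial_{\lambda}\frac{\lambda}{\lambda n_{1}+v_{1}}\mathbf{1}^{\top}\operatorname{E}_{y}[\mathbf{y}_{1}-\mu\mathbf{1}]\\ 0\\ \partial_{v_{1}}\frac{\lambda}{\lambda n_{1}+v_{1}}\mathbf{1}^{\top}\operatorname{E}_{y}[\mathbf{y}_{1}-\mu\mathbf{1}]\end{bmatrix}=\begin{bmatrix}-\frac{\lambda n_{1}}{\lambda n_{1}+v_{1}}\\ 0\\ 0\\ 0\end{bmatrix},\end{aligned} \tag{18}$$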

where the fourth line follows under the constant bias assumption. Inserting this expression for $\mathbf{g}$ in (17) yields

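$$\operatorname{E}\left[|\delta-\widehat{\delta}|^{2}\right]\geq\frac{\lambda v_{1}}{\lambda n_{1}+v_{1}}+\left(\frac{\lambda n_{1}}{\lambda n_{1}+v_{1}}\right)^{2}J_{1,1}^{-1}. \tag{19}$$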

This completes the proof.

-B The derivation of the confidence interval

We have that

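$$\delta\not\in C_{\alpha}(\mathbf{y})\;\Leftrightarrow\;\alpha c_{\theta}^{-2}|\delta-\widehat{\delta}(\mathbf{y})|^{2}\geq 1.$$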

Let $p(\mathbf{y},\delta)=p_{\theta}(\mathbf{y}|\delta)p_{\theta}(\delta)$ , then

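$$\begin{aligned}\Pr\big\{\delta\not\in C_{\alpha}(\mathbf{y})\big\}&=\int_{\delta\not\in C_{\alpha}(\mathbf{y})}p(\mathbf{y},\delta)\,d\delta\,d\mathbf{y}\\ &\leq\int_{\delta\not\in C_{\alpha}(\mathbf{y})}\alpha c^{-2}_{\theta}|\delta-\widehat{\delta}(\mathbf{y})|^{2}\,p(\mathbf{y},\delta)\,d\delta\,d\mathbf{y}\\ &\leq\alpha c^{-2}_{\theta}\operatorname{E}\left[|\delta-\widehat{\delta}(\mathbf{y})|^{2}\right]=\alpha\frac{\text{MSE}}{c^{2}_{\theta}}.\end{aligned}$$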

Thus $\Pr\big{\{}\delta\in C_{\alpha}(\mathbf{y})\big{\}}\geq 1-\alpha$ when the estimator is efficient.
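The Chebyshev-type argument above is conservative, which a small Monte Carlo experiment can illustrate. The sketch below makes a strong simplifying assumption that is not the paper's setting: the nuisance parameters are treated as known, so the estimator is the exact conditional mean and depends on the data only through $\overline{y}_{1}$. The function name `coverage_known_theta` and the parameter values in any call are hypothetical.

```python
import math
import random


def coverage_known_theta(mu, lam, v0, v1, n0, n1, alpha=0.05,
                         trials=2000, seed=0):
    """Empirical coverage of the interval when theta is known (idealized)."""
    rng = random.Random(seed)
    rho = lam / v1
    a = rho * n1 + 1.0
    j11 = n0 / v0 + n1 / (v1 * a)
    c = math.sqrt(rho * v1 / a + (rho * n1 / a) ** 2 / j11)   # bound c_theta
    half = c / math.sqrt(alpha)                               # interval half-width
    hits = 0
    for _ in range(trials):
        delta = rng.gauss(0.0, math.sqrt(lam))                # effect from its prior
        # Given delta, the test-group average is N(mu + delta, v1 / n1).
        ybar1 = rng.gauss(mu + delta, math.sqrt(v1 / n1))
        delta_hat = rho * n1 / a * (ybar1 - mu)               # conditional mean
        hits += abs(delta - delta_hat) < half
    return hits / trials
```

In this idealized case the empirical coverage is typically far above $1-\alpha$, reflecting that the bound holds for any error distribution with the given mean-squared error.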

-C The derivation of the concentrated cost

Problem (8) can be formulated equivalently as the minimization of:

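$$f(\boldsymbol{\theta})=\underbrace{n_{0}\ln v_{0}+\frac{1}{v_{0}}\|\mathbf{y}_{0}-\mu\mathbf{1}\|^{2}}_{f_{0}(\mu,v_{0})}+\underbrace{\ln|\mathbf{V}_{1}|+\|\mathbf{y}_{1}-\mu\mathbf{1}\|^{2}_{\mathbf{V}_{1}^{-1}}}_{f_{1}(\mu,\lambda,v_{1})}. \tag{20}$$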

The minimizer

$$\widehat{v}_{0}=\|\mathbf{y}_{0}-\mu\mathbf{1}\|^{2}/n_{0} \tag{21}$$

is inserted back to yield a concentrated cost function

$$f_{0}(\mu,\widehat{v}_{0})=n_{0}\ln\widehat{v}_{0}+n_{0}. \tag{22}$$

Next, using the Sherman-Morrison and matrix determinant lemmas we can reparametrize $f_{1}$ as

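$$f_{1}(\mu,\rho,v)=\ln(1+\rho n)+\ln v^{n}+\frac{1}{v}\left(\|\mathbf{y}_{1}-\mu\mathbf{1}\|^{2}-\frac{\rho|\mathbf{1}^{\top}(\mathbf{y}_{1}-\mu\mathbf{1})|^{2}}{1+\rho n}\right), \tag{23}$$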

where we dropped the subindices for notational convenience.
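For reference, the two lemmas are applied to $\mathbf{V}_{1}=v(\mathbf{I}+\rho\mathbf{1}\mathbf{1}^{\top})$, which is the covariance $\lambda\mathbf{1}\mathbf{1}^{\top}+v_{1}\mathbf{I}$ written with $\rho=\lambda/v_{1}$ and subindices dropped. The intermediate identities, spelled out as a worked step:

```latex
\begin{aligned}
\mathbf{V}_{1}^{-1} &= \frac{1}{v}\left(\mathbf{I} - \frac{\rho\,\mathbf{1}\mathbf{1}^{\top}}{1+\rho n}\right)
 && \text{(Sherman-Morrison)}, \\
|\mathbf{V}_{1}| &= v^{n}\,(1+\rho\,\mathbf{1}^{\top}\mathbf{1}) = v^{n}(1+\rho n)
 && \text{(matrix determinant lemma)}, \\
\ln|\mathbf{V}_{1}| + \|\mathbf{y}_{1}-\mu\mathbf{1}\|^{2}_{\mathbf{V}_{1}^{-1}}
 &= \ln(1+\rho n) + \ln v^{n}
 + \frac{1}{v}\left(\|\mathbf{y}_{1}-\mu\mathbf{1}\|^{2}
 - \frac{\rho\,|\mathbf{1}^{\top}(\mathbf{y}_{1}-\mu\mathbf{1})|^{2}}{1+\rho n}\right).
\end{aligned}
```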

Using the identities $\alpha=\|\mathbf{y}_{1}-\mu\mathbf{1}\|^{2}$ , $\beta=|\mathbf{1}^{\top}(\mathbf{y}_{1}-\mu\mathbf{1})|^{2}$ and $\gamma=\alpha n-\beta$ , the minimizing $v$ of (23) is found as (10). Inserting the variance estimate back yields a concentrated cost function

$$f_{1}(\mu,\rho,\widehat{v})=\ln\frac{(\alpha+\rho\gamma)^{n}}{(1+\rho n)^{n-1}}+n. \tag{24}$$

To find the minimizing $\rho\geq 0$ , we first consider the stationary point of

$$\ln\frac{(\alpha+\rho\gamma)^{n}}{(1+\rho n)^{n-1}}=n\ln(\alpha+\rho\gamma)-(n-1)\ln(1+\rho n).$$

Taking the derivative with respect to $\rho$ yields the following condition for a stationary point:

$$\frac{n\gamma}{\alpha+\rho\gamma}-\frac{n(n-1)}{1+\rho n}=0,$$

or equivalently $\gamma(1+\rho n)-(n-1)(\alpha+\rho\gamma)=0$ . Solving for $\rho\geq 0$ , we obtain the estimate (11).
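The step from the stationarity condition to the closed-form estimate is short; it is spelled out here as a worked derivation using only $\gamma=\alpha n-\beta$:

```latex
\begin{aligned}
\gamma(1+\rho n) - (n-1)(\alpha+\rho\gamma) &= 0 \\
\gamma + \rho\gamma\,\big(n-(n-1)\big) &= (n-1)\alpha \\
\rho\gamma &= (n-1)\alpha - \gamma = (n-1)\alpha - (\alpha n - \beta) = \beta - \alpha \\
\rho &= \frac{\beta-\alpha}{\gamma}.
\end{aligned}
```

This value is admissible only when $\beta-\alpha\geq 0$; otherwise the constraint $\rho\geq 0$ is active and the estimate is set to zero.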

By evaluating the second derivative at this point, we verify that it is a minimum. Inserting (11) back into (24) and combining with (22), we can write (20) in the concentrated form (12) after omitting irrelevant constants.

Bibliography

[1] G. Imbens and D. Rubin, Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, 2015.
[2] J. Pearl, M. Glymour, and N. Jewell, Causal Inference in Statistics: A Primer. Wiley, 2016.
[3] C. Rao, Linear Statistical Inference and its Applications. Wiley Series in Probability and Statistics, Wiley, 1973.
[4] B. L. Welch, “The significance of the difference between two means when the population variances are unequal,” Biometrika, vol. 29, no. 3/4, pp. 350–362, 1938.
[5] S.-H. Kim and A. S. Cohen, “On the Behrens-Fisher problem: a review,” Journal of Educational and Behavioral Statistics, vol. 23, no. 4, pp. 356–377, 1998.
[6] H. Van Trees, K. Bell, and Z. Tian, Detection Estimation and Modulation Theory, Part I: Detection, Estimation, and Filtering Theory. Detection Estimation and Modulation Theory, Wiley, 2013.
[7] S. Kay, Fundamentals of Statistical Signal Processing: Detection Theory. Fundamentals of Statistical Signal Processing, PTR Prentice-Hall, 1993.
[8] P. Stoica and P. Babu, “The Gaussian data assumption leads to the largest Cramér-Rao bound [lecture notes],” IEEE Signal Processing Magazine, vol. 28, no. 3, pp. 132–133, 2011.
