Conditional bias reduction can be dangerous: a key example from sequential analysis

Ben Berckmoes; Anna Ivanova; Geert Molenberghs

arXiv:1812.06046 · math.ST · December 18, 2018

TL;DR

This paper demonstrates that applying conditional bias reduction in sequential analysis can lead to infinite mean absolute error, highlighting potential dangers of this approach.

Contribution

It provides a critical example showing that conditional bias reduction may have unintended harmful effects in sequential analysis.

Findings

1. Conditional bias reduction can cause infinite mean absolute error.
2. The paper offers a key example illustrating this danger.
3. It highlights the need for caution in bias reduction methods.

Abstract

We present a key example from sequential analysis, which illustrates that conditional bias reduction can cause infinite mean absolute error.

Equations

\mathbb{P}[N_{n}=n\mid X_{1},\ldots,X_{n}]=\psi\left(K_{n}/n^{\gamma}\right),

L(\theta;X_{1},\ldots,X_{N_{n}})=\frac{1}{\sigma^{N_{n}}}\prod_{k=1}^{N_{n}}\phi\left(\frac{X_{k}-\theta}{\sigma}\right),

\widehat{\mu}_{N_{n}}=\frac{1}{N_{n}}\sum_{k=1}^{N_{n}}X_{k}.

L(\theta;X_{1},\ldots,X_{m}\mid N_{n}=m)=\frac{1}{\sigma^{m}}\prod_{k=1}^{m}\phi\left(\frac{X_{k}-\theta}{\sigma}\right)\frac{\mathbb{P}_{\theta}[N_{n}=m\mid X_{1},\ldots,X_{m}]}{\mathbb{P}_{\theta}[N_{n}=m]}.

\lim_{n\to\infty}\mathbb{E}\left[\left|\widehat{\mu}_{N_{n}}\right|\right]=0\quad\textrm{and}\quad\forall n\in\mathbb{N}_{0}:\ \mathbb{E}\left[\left|\widehat{\mu}_{c,N_{n}}\right|\right]=\infty.

\mathbb{P}[N_{n}=n\mid X_{1},\ldots,X_{n}]=\begin{cases}1&\textrm{if }K_{n}\geq 0,\\0&\textrm{if }K_{n}<0.\end{cases}

f_{N_{n},K_{N_{n}}}(n,k)=\frac{1}{\sqrt{n}}\,\phi\left(\frac{k}{\sqrt{n}}\right)1_{[0,\infty[}(k)

f_{N_{n},K_{N_{n}}}(2n,k)

\mathbb{E}\left[\left|\widehat{\mu}_{N_{n}}\right|\right]

\lim_{n\to\infty}\mathbb{E}\left[\left|\widehat{\mu}_{N_{n}}\right|\right]=0.

\frac{1}{\sqrt{n}}K_{n}=\psi_{1}(\sqrt{n}\,\theta),

\frac{1}{\sqrt{2n}}K_{2n}=\psi_{2}(\sqrt{n}\,\theta),

\widehat{\mu}_{c,n}=\frac{1}{\sqrt{n}}\psi_{1}^{-1}\left(\frac{1}{\sqrt{n}}K_{n}\right)

\widehat{\mu}_{c,2n}=\frac{1}{\sqrt{n}}\psi_{2}^{-1}\left(\frac{1}{\sqrt{2n}}K_{2n}\right)

\mathbb{E}\left[\left|\widehat{\mu}_{c,N_{n}}\right|\right]\geq-\frac{1}{\sqrt{n}}\int_{\psi_{1}(-N)}^{\psi_{1}(0)}\psi_{1}^{-1}(u)\,\phi(u)\,du\geq-\frac{1}{\sqrt{n}}\,\phi(\psi_{1}(0))\int_{\psi_{1}(-N)}^{\psi_{1}(0)}\psi_{1}^{-1}(u)\,du,

=\frac{1}{\sqrt{n}}\,\phi(\psi_{1}(0))\left(\int_{-N}^{0}\psi_{1}(u)\,du-N\psi_{1}(-N)\right),

=\frac{1}{\sqrt{n}}\,\phi(\psi_{1}(0))\left(\log(1/2)+\frac{N^{2}}{2}-\log\Phi(-N)-N\frac{\phi(-N)}{\Phi(-N)}\right).

\forall n\in\mathbb{N}_{0}:\ \mathbb{E}\left[\left|\widehat{\mu}_{c,N_{n}}\right|\right]=\infty.

Taxonomy

Topics: Statistical Methods in Clinical Trials · Optimal Experimental Design Methods · Advanced Statistical Process Monitoring

Full text

Key words and phrases:

conditional MLE, marginal MLE, group sequential trial, mean absolute error

Ben Berckmoes is a postdoctoral fellow at the Fund for Scientific Research of Flanders (FWO).

Financial support from the IAP research network #P7/06 of the Belgian Government (Belgian Science Policy) is gratefully acknowledged.

1. Introduction

The following group sequential paradigm has been studied extensively in the literature; see, e.g., [BIM18, C89, EF90, FDL00, HP88, LH99, MKA14, W92].

Let $X_{1},X_{2},\ldots$ be independent and identically distributed observations with normal law $N(\mu,\sigma^{2})$, and, for each $n\in\mathbb{N}_{0}$, let $N_{n}$ be an $\{n,2n\}$-valued random sample size that depends on $X_{1},\ldots,X_{n}$ only through the stopping rule

$$\mathbb{P}[N_{n}=n\mid X_{1},\ldots,X_{n}]=\psi\left(K_{n}/n^{\gamma}\right),\tag{1}$$

where $\psi$ is a Borel measurable map of $\mathbb{R}$ into $[0,1]$, $\gamma\in\mathbb{R}^{+}_{0}$ is a shape parameter, and, for $m\in\mathbb{N}_{0}$, $K_{m}=\sum_{k=1}^{m}X_{k}$. The choice $\gamma=1/2$ leads to Pocock boundaries ([P77]) and the choice $\gamma=0$ to O’Brien-Fleming boundaries ([OF79]).

The above setting models the idea that, after the data $X_{1},\ldots,X_{n}$ have been collected, it is decided, based on the stopping rule (1), whether the trial is stopped (that is, the final sample size is $N_{n}=n$) or continued (that is, the additional data $X_{n+1},\ldots,X_{2n}$ are collected and the final sample size is $N_{n}=2n$).
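The stopping rule (1) translates directly into a simulation. The following is a minimal sketch, not the authors' code: `simulate_trial` and `indicator_nonneg` are hypothetical names, and the randomized stop/continue decision is drawn as a Bernoulli variable with success probability $\psi(K_{n}/n^{\gamma})$, which is one way to realize (1).

```python
import random


def simulate_trial(n, mu, sigma, psi, gamma, rng):
    """Draw one realization of (N_n, K_{N_n}) under the stopping rule (1).

    psi   -- function from the reals into [0, 1]: probability of stopping at n
    gamma -- shape parameter of the statistic K_n / n**gamma
    """
    # First stage: observe X_1, ..., X_n and form the partial sum K_n.
    k_n = sum(rng.gauss(mu, sigma) for _ in range(n))
    # Stop with probability psi(K_n / n**gamma); the decision uses the data only through K_n.
    if rng.random() < psi(k_n / n ** gamma):
        return n, k_n
    # Otherwise observe X_{n+1}, ..., X_{2n} and report the final sum K_{2n}.
    k_2n = k_n + sum(rng.gauss(mu, sigma) for _ in range(n))
    return 2 * n, k_2n


def indicator_nonneg(x):
    """psi = 1_[0, inf[: stop exactly when the statistic is non-negative."""
    return 1.0 if x >= 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(2018)
    draws = [simulate_trial(10, 0.0, 1.0, indicator_nonneg, 0.5, rng) for _ in range(20_000)]
    stopped = sum(1 for size, _ in draws if size == 10) / len(draws)
    print(f"fraction of trials stopped at n: {stopped:.3f}")
```

With $\psi=1_{[0,\infty[}$ and $\mu=0$, the trial stops at $n$ with probability $\mathbb{P}[K_{n}\geq 0]=1/2$, so the printed fraction should be close to $0.5$; any other map $\psi$ into $[0,1]$ can be passed in as an ordinary function.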

Assuming $\sigma$ is known, the following estimators for the location parameter $\mu$ are often discussed in the literature ([FDL00], [MKA14]):

(a) the marginal MLE, defined by the parameter value that maximizes the marginal likelihood

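$$L(\theta;X_{1},\ldots,X_{N_{n}})=\frac{1}{\sigma^{N_{n}}}\prod_{k=1}^{N_{n}}\phi\left(\frac{X_{k}-\theta}{\sigma}\right),$$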

where $\phi$ is the standard normal density. Of course, the marginal MLE is the ordinary sample mean

$$\widehat{\mu}_{N_{n}}=\frac{1}{N_{n}}\sum_{k=1}^{N_{n}}X_{k}.$$

This approach is simple because it is based on the likelihood of the collected data only, without taking the stopping mechanism into account. The marginal MLE has been criticized in the literature because it can have a large bias ([EF90]). However, it was shown in [BIM18] that in many cases the bias vanishes quickly as $n$ grows.

(b) the conditional MLE $\widehat{\mu}_{c,N_{n}}$, defined by the parameter value that, for $N_{n}=m$, maximizes the conditional likelihood

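$$L(\theta;X_{1},\ldots,X_{m}\mid N_{n}=m)=\frac{1}{\sigma^{m}}\prod_{k=1}^{m}\phi\left(\frac{X_{k}-\theta}{\sigma}\right)\frac{\mathbb{P}_{\theta}[N_{n}=m\mid X_{1},\ldots,X_{m}]}{\mathbb{P}_{\theta}[N_{n}=m]}.\tag{4}$$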

This approach is more complex because, contrary to the marginal MLE, it also models the stopping mechanism. No explicit expression for the conditional MLE is available, so one has to rely on numerical methods to compute it. Nevertheless, the conditional MLE, also known as the conditional bias reduction estimate ([FDL00]), is favored in the literature because it is claimed to reduce bias by taking all information, including the stopping rule, into account.

In this paper, we show that if we take $\mu=0$, $\sigma=1$, $\psi=1_{\left[0,\infty\right[}$, and $\gamma$ arbitrary, then

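$$\lim_{n\to\infty}\mathbb{E}\left[\left|\widehat{\mu}_{N_{n}}\right|\right]=0\quad\textrm{and}\quad\forall n\in\mathbb{N}_{0}:\ \mathbb{E}\left[\left|\widehat{\mu}_{c,N_{n}}\right|\right]=\infty.$$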

That is, conditional bias reduction can cause infinite mean absolute error.

2. Mean absolute error

We keep the setting of the previous section and take $\mu=0$, $\sigma=1$, $\psi=1_{\left[0,\infty\right[}$, and $\gamma$ arbitrary, so that the stopping rule (1) becomes

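$$\mathbb{P}[N_{n}=n\mid X_{1},\ldots,X_{n}]=\begin{cases}1&\textrm{if }K_{n}\geq 0,\\0&\textrm{if }K_{n}<0.\end{cases}$$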

That is, after having collected the $N(0,1)$-data $X_{1},\ldots,X_{n}$, the trial is stopped if $K_{n}\geq 0$ and continued otherwise.
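Under this rule, the estimating equation for the conditional MLE in the stopped case $N_{n}=n$ (stated further on with $\psi_{1}(x)=x+\phi(x)/\Phi(x)$) follows from a short calculation. The derivation below is our own sketch rather than one reproduced from the paper. It writes $\theta$ for the candidate mean and uses that $K_{n}\sim N(n\theta,n)$ when $\sigma=1$. It also uses that the factor $\mathbb{P}[N_{n}=n\mid X_{1},\ldots,X_{n}]$ in (4) equals $1$ on $\{K_{n}\geq 0\}$ and does not depend on $\theta$.

```latex
\begin{align*}
\mathbb{P}_{\theta}[N_{n}=n] &= \mathbb{P}_{\theta}[K_{n}\geq 0] = \Phi(\sqrt{n}\,\theta),\\
\ell(\theta) &= -\tfrac{1}{2}\sum_{k=1}^{n}(X_{k}-\theta)^{2}-\log\Phi(\sqrt{n}\,\theta)+\text{const},\\
\ell'(\theta) &= K_{n}-n\theta-\sqrt{n}\,\frac{\phi(\sqrt{n}\,\theta)}{\Phi(\sqrt{n}\,\theta)} = 0
\iff \frac{K_{n}}{\sqrt{n}} = \sqrt{n}\,\theta+\frac{\phi(\sqrt{n}\,\theta)}{\Phi(\sqrt{n}\,\theta)} = \psi_{1}(\sqrt{n}\,\theta).
\end{align*}
```

Because $\psi_{1}$ is strictly increasing, the root is unique, and solving for $\theta$ gives $\widehat{\mu}_{c,n}=\psi_{1}^{-1}(K_{n}/\sqrt{n})/\sqrt{n}$.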

We first focus on the marginal MLE $\widehat{\mu}_{N_{n}}=K_{N_{n}}/N_{n}$. Let $\phi$ be the standard normal density and $\Phi$ the standard normal cumulative distribution function. Following [BIM18], we see that the joint density of $N_{n}$ and $K_{N_{n}}$ is given by

$$f_{N_{n},K_{N_{n}}}(n,k)=\frac{1}{\sqrt{n}}\,\phi\left(\frac{k}{\sqrt{n}}\right)1_{[0,\infty[}(k)$$

and

[TABLE]

We learn from (5) and (2) that

[TABLE]

with $\xi$ a standard normally distributed random variable. It clearly follows from (2) that

$$\lim_{n\to\infty}\mathbb{E}\left[\left|\widehat{\mu}_{N_{n}}\right|\right]=0.\tag{8}$$

That is, the mean absolute error of $\widehat{\mu}_{N_{n}}$ with respect to the true parameter $\mu=0$ vanishes as $n\to\infty$.
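To see this decay numerically, the sketch below estimates $\mathbb{E}[|\widehat{\mu}_{N_{n}}|]$ by Monte Carlo under $\mu=0$, $\sigma=1$, $\psi=1_{[0,\infty[}$. It draws $K_{n}\sim N(0,n)$ and the second-stage increment $K_{2n}-K_{n}\sim N(0,n)$ directly instead of individual observations, which is equivalent in distribution. The function name `mean_abs_error_marginal` is hypothetical. As a cross-check it uses a closed form that we computed ourselves (it is not stated in the paper). Splitting on $\{K_{n}\geq 0\}$ gives $\sqrt{n}\,\mathbb{E}[|\widehat{\mu}_{N_{n}}|]=1/\sqrt{2\pi}+1/(2\sqrt{\pi})\approx 0.681$ for every $n$. The mean absolute error therefore decays like $n^{-1/2}$.

```python
import math
import random


def mean_abs_error_marginal(n, reps, rng):
    """Monte Carlo estimate of E|mu_hat_{N_n}| for mu = 0, sigma = 1, psi = 1_[0, inf[."""
    root_n = math.sqrt(n)
    total = 0.0
    for _ in range(reps):
        # K_n ~ N(0, n), drawn in one step instead of summing n standard normals.
        k_n = root_n * rng.gauss(0.0, 1.0)
        if k_n >= 0:
            # Stopped: N_n = n and mu_hat = K_n / n (non-negative here).
            total += k_n / n
        else:
            # Continued: add the independent second-stage sum K_2n - K_n ~ N(0, n).
            k_2n = k_n + root_n * rng.gauss(0.0, 1.0)
            total += abs(k_2n) / (2 * n)
    return total / reps


# Our own closed form for sqrt(n) * E|mu_hat_{N_n}| under this rule (not from the paper);
# it does not depend on n.
ROOT_N_MAE = 1.0 / math.sqrt(2.0 * math.pi) + 1.0 / (2.0 * math.sqrt(math.pi))

if __name__ == "__main__":
    rng = random.Random(7)
    for n in (1, 10, 100, 1000):
        mae = mean_abs_error_marginal(n, 50_000, rng)
        print(f"n={n:5d}  MAE={mae:.4f}  sqrt(n)*MAE={math.sqrt(n) * mae:.3f}"
              f"  (closed form {ROOT_N_MAE:.3f})")
```

The scaled column should stay close to the closed-form value for every $n$, while the unscaled mean absolute error shrinks toward $0$.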

We now turn to the conditional MLE $\widehat{\mu}_{c,N_{n}}$, which maximizes the conditional likelihood (4). It is easily seen that this estimator is obtained by solving the equation

$$\frac{1}{\sqrt{n}}K_{n}=\psi_{1}(\sqrt{n}\,\theta),$$

with $\psi_{1}(x)=x+\frac{\phi(x)}{\Phi(x)}$, in the case $N_{n}=n$, and the equation

$$\frac{1}{\sqrt{2n}}K_{2n}=\psi_{2}(\sqrt{n}\,\theta),$$

with $\psi_{2}(x)=x\sqrt{2}+\frac{1}{\sqrt{2}}\frac{\phi(x)}{1-\Phi(x)}$, in the case $N_{n}=2n$. One checks numerically that the map $\psi_{1}$ strictly increases on $\mathbb{R}$ from $0$ to $\infty$ and that the map $\psi_{2}$ strictly increases on $\mathbb{R}$ from $-\infty$ to $\infty$. In particular, $\psi_{1}$ and $\psi_{2}$ are bijective, from which it follows that $\widehat{\mu}_{c,N_{n}}$ is uniquely defined by

$$\widehat{\mu}_{c,n}=\frac{1}{\sqrt{n}}\psi_{1}^{-1}\left(\frac{1}{\sqrt{n}}K_{n}\right)$$

if $N_{n}=n$, and

$$\widehat{\mu}_{c,2n}=\frac{1}{\sqrt{n}}\psi_{2}^{-1}\left(\frac{1}{\sqrt{2n}}K_{2n}\right)$$

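if $N_{n}=2n$. Restricting to the event $\{N_{n}=n\}$, applying the Transformation Theorem, and using (5) and (9), together with the facts that $\psi_{1}^{-1}\leq 0$ on $[\psi_{1}(-N),\psi_{1}(0)]$ and that $\phi$ decreases on $[0,\infty[$, we get, for each $n\in\mathbb{N}_{0}$ and each $N\in\mathbb{N}_{0}$,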

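$$\mathbb{E}\left[\left|\widehat{\mu}_{c,N_{n}}\right|\right]\geq-\frac{1}{\sqrt{n}}\int_{\psi_{1}(-N)}^{\psi_{1}(0)}\psi_{1}^{-1}(u)\,\phi(u)\,du\geq-\frac{1}{\sqrt{n}}\,\phi(\psi_{1}(0))\int_{\psi_{1}(-N)}^{\psi_{1}(0)}\psi_{1}^{-1}(u)\,du,$$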

which, by the well-known integral identity $\int_{a}^{b}f(x)dx+\int_{f(a)}^{f(b)}f^{-1}(x)dx=bf(b)-af(a)$ for continuous, strictly increasing $f$, applied with $f=\psi_{1}$, $a=-N$ and $b=0$,

$$=\frac{1}{\sqrt{n}}\,\phi(\psi_{1}(0))\left(\int_{-N}^{0}\psi_{1}(u)\,du-N\psi_{1}(-N)\right),$$

which, plugging in the definition of $\psi_{1}$ and calculating the integral,

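$$=\frac{1}{\sqrt{n}}\,\phi(\psi_{1}(0))\left(\log(1/2)+\frac{N^{2}}{2}-\log\Phi(-N)-N\frac{\phi(-N)}{\Phi(-N)}\right).\tag{11}$$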
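The conditional MLE on the event $\{N_{n}=n\}$ can be computed by inverting $\psi_{1}$ numerically. The sketch below does this by bisection. The bracket $[-35,40]$, the tolerance and the function names are our own choices, not the authors'. The sketch also exposes what drives the bound above. Our own asymptotic check (not from the paper) is that $\psi_{1}(x)\approx -1/x$ as $x\to-\infty$, so $\psi_{1}^{-1}(u)\approx -1/u$ for small $u>0$. Under $\mu=0$, $K_{n}/\sqrt{n}$ is standard normal, so on $\{N_{n}=n\}$ it has sub-density $\phi(u)$ on $[0,\infty[$. Near $u=0$, $|\widehat{\mu}_{c,n}|$ then behaves like $1/(\sqrt{n}\,u)$, and $\int_{0}^{1}du/u$ diverges. This is a heuristic reading consistent with the result, not a replacement for the proof.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF, written with erfc so the left tail keeps its precision."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def psi1(x):
    """psi_1(x) = x + phi(x) / Phi(x); strictly increasing from 0 to infinity."""
    return x + phi(x) / Phi(x)


def psi1_inv(u, lo=-35.0, hi=40.0, tol=1e-12):
    """Invert psi_1 by bisection on the bracket [lo, hi].

    The bracket is our assumption: far below x = -35 the density phi(x)
    underflows in double precision, so u < psi1(lo) (roughly 1/35) is rejected.
    """
    if not psi1(lo) <= u <= psi1(hi):
        raise ValueError("u lies outside the bracketed range of psi_1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi1(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conditional_mle_stopped(k_n, n):
    """Conditional MLE on the event N_n = n (so K_n >= 0): psi1_inv(K_n / sqrt(n)) / sqrt(n)."""
    return psi1_inv(k_n / math.sqrt(n)) / math.sqrt(n)


if __name__ == "__main__":
    n = 4
    for k_n in (3.0, 1.0, 0.2):
        print(f"K_n = {k_n}: sample mean {k_n / n:.3f}, "
              f"conditional MLE {conditional_mle_stopped(k_n, n):.3f}")
```

For a small positive $K_{n}$ the conditional MLE is pushed to a large negative value while the sample mean stays near zero, which is the mechanism the lower bound exploits.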

It can be checked numerically that, for fixed $n$, expression (11) tends to $\infty$ if $N\to\infty$. We conclude that

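$$\forall n\in\mathbb{N}_{0}:\ \mathbb{E}\left[\left|\widehat{\mu}_{c,N_{n}}\right|\right]=\infty.\tag{12}$$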

We infer from (8) and (12) that conditional bias reduction can cause infinite mean absolute error.
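The numerical check that expression (11) diverges as $N\to\infty$ can be reproduced along the following lines. The sketch evaluates the bracket in (11) in closed form. As a consistency check of the integral-identity step, it also evaluates $\int_{-N}^{0}\psi_{1}(u)\,du-N\psi_{1}(-N)$ by Simpson quadrature. All function names are hypothetical, and the prefactor $\phi(\psi_{1}(0))/\sqrt{n}$ follows our reading of (11). The sketch also compares with an asymptotic expansion that we derived ourselves from the standard Mills-ratio expansion (it is not stated in the paper): the bracket equals $\log N+\tfrac{1}{2}\log(\pi/2)-1+O(N^{-2})$. If that expansion is right, the bound grows without limit in $N$, but only logarithmically.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF via erfc (accurate in the left tail)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def psi1(x):
    """psi_1(x) = x + phi(x) / Phi(x)."""
    return x + phi(x) / Phi(x)


def bracket_closed_form(N):
    """The bracket of (11): log(1/2) + N^2/2 - log Phi(-N) - N phi(-N) / Phi(-N)."""
    return math.log(0.5) + 0.5 * N * N - math.log(Phi(-N)) - N * phi(-N) / Phi(-N)


def bracket_by_quadrature(N, steps=20000):
    """The same bracket before integrating: int_{-N}^0 psi_1(u) du - N psi_1(-N).

    The integral is approximated with composite Simpson's rule (steps must be even).
    """
    h = N / steps
    total = psi1(-N) + psi1(0.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * psi1(-N + i * h)
    return total * h / 3.0 - N * psi1(-N)


def lower_bound(n, N):
    """Expression (11): phi(psi_1(0)) / sqrt(n) times the bracket."""
    return phi(psi1(0.0)) / math.sqrt(n) * bracket_closed_form(N)


def bracket_asymptotic(N):
    """Our own large-N approximation: log N + log(pi/2)/2 - 1."""
    return math.log(N) + 0.5 * math.log(math.pi / 2.0) - 1.0


if __name__ == "__main__":
    for N in (1, 2, 5, 10, 20, 30):
        print(N, round(bracket_closed_form(N), 4), round(bracket_by_quadrature(N), 4),
              round(bracket_asymptotic(N), 4), round(lower_bound(100, N), 5))
```

The closed form and the quadrature columns should agree, both increase with $N$, and for large $N$ they approach the logarithmic approximation.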

Bibliography


[BIM18] Berckmoes, B.; Ivanova, A.; Molenberghs, G. (2018). On the sample mean after a group sequential trial. Computational Statistics & Data Analysis 125, 104–118. https://arxiv.org/pdf/1706.01291.pdf
[C89] Chang, M. N. (1989). Confidence intervals for a normal mean following a group sequential test. Biometrics 45, no. 1, 247–254.
[EF90] Emerson, S. S.; Fleming, T. R. (1990). Parameter estimation following group sequential hypothesis testing. Biometrika 77, 875–892.
[FDL00] Fan, X. F.; DeMets, D. L.; Lan, G. (2000). Bias point of estimation following a group sequential test. Technical report. https://www.biostat.wisc.edu/sites/default/files/tr_157.pdf
[HP88] Hughes, M. D.; Pocock, S. J. (1988). Stopping rules and estimation problems in clinical trials. Statistics in Medicine 7, 1231–1242.
[LH99] Liu, A.; Hall, W. J. (1999). Unbiased estimation following a group sequential test. Biometrika 86, 71–78.
[MKA14] Molenberghs, G.; Kenward, M. G.; Aerts, M.; Verbeke, G.; Tsiatis, A. A.; Davidian, M.; Rizopoulos, D. (2014). On random sample size, ignorability, ancillarity, completeness, separability, and degeneracy: sequential trials, random sample sizes, and missing data. Stat. Methods Med. Res. 23, no. 1, 11–41.
[S78] Siegmund, D. (1978). Estimation following sequential tests. Biometrika 64, 191–199.