On the conditional distribution of the mean of the two closest among a set of three observations

I.J.H. Visagie; F. Lombard

arXiv:1906.12106·stat.ME·July 1, 2019

TL;DR

This paper investigates the statistical properties of a new estimation method for chemical assay values, which adaptively combines two or three measurements based on their differences, analyzing its distribution under normal and Laplace assumptions.

Contribution

It introduces a novel adaptive estimator for chemical measurements and derives its conditional distribution under different distributional assumptions.

Findings

01 Conditional distributions differ significantly between normal and Laplace models.

02 The proposed method improves estimation accuracy when initial measurements differ greatly.

03 Analytical expressions for the estimator's distribution are provided.

Equations

$$f(x)=\frac{1}{\sqrt{2}}\exp\left(-\sqrt{2}\,|x-\mu|\right).$$

$$|X_{1}-X_{2}|>r(\alpha),$$

$$P\left[|X_{1}-X_{2}|>r(\alpha)\right]=\alpha$$

$$\widehat{\mu}=\begin{cases}\dfrac{X_{1}+X_{3}}{2} & \text{if } |X_{1}-X_{3}|<|X_{2}-X_{3}|,\\[2ex] \dfrac{X_{2}+X_{3}}{2} & \text{if } |X_{2}-X_{3}|<|X_{1}-X_{3}|.\end{cases}$$

$$G(x,\alpha)=P\left[\frac{X_{1}+X_{3}}{2}\leq x,\ X_{1}-X_{2}>r,\ X_{3}>\frac{X_{1}+X_{2}}{2}\right],$$

$$g(x,\alpha)=\frac{d}{dx}G(x,\alpha).$$

$$h(x,\alpha)=\frac{2}{\alpha}\left[g(x,\alpha)+g(-x,\alpha)\right].$$

$$g(x,\alpha)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}2f(2x-x_{1})\,\mathbb{J}(x,x_{1},x_{2})\,f(x_{1})f(x_{2})\,dx_{1}\,dx_{2},$$

$$\mathbb{J}(x,x_{1},x_{2})=\mathbb{I}\left(x>\frac{3x_{1}+x_{2}}{4},\ x_{1}-x_{2}>r\right),$$

$$P\left(\widehat{\mu}>x\,\middle|\,|X_{1}-X_{2}|>r\right)$$

$$Z_{j}=\frac{X_{1,j}-X_{2,j}}{0.4}.$$

$$T_{n}=\max_{1\leq j\leq n}\frac{F(Z_{j})-F_{n}(Z_{j})}{F(Z_{j})\left(1-F(Z_{j})\right)},$$

$$F_{n}(x)=\frac{1}{n}\sum_{j=1}^{n}\mathbb{I}(Z_{j}\leq x).$$

$$P\left[\widehat{\mu}\in dx\,\middle|\,|X_{1}-X_{2}|>r\right]$$

$$P\left[\widehat{\mu}\in dx,\ |X_{1}-X_{2}|>r\right]=2P\left[\widehat{\mu}\in dx,\ X_{1}-X_{2}>r\right].$$

$$P\left[\widehat{\mu}\in dx,\ X_{1}-X_{2}>r\right]$$

$$X_{1}-X_{2}>r\ \text{and}\ |X_{1}-X_{3}|<|X_{2}-X_{3}|$$

$$X_{1}-X_{2}>r\ \text{and}\ |X_{1}-X_{3}|>|X_{2}-X_{3}|$$

$$P\left[\frac{X_{2}+X_{3}}{2}\in dx,\ X_{1}-X_{2}>r,\ X_{3}<\frac{X_{1}+X_{2}}{2}\right]$$

$$\frac{P\left[\widehat{\mu}\in dx\,\middle|\,|X_{1}-X_{2}|>r\right]}{2\delta}$$

$$P\left[\widehat{\mu}\in dx\,\middle|\,|X_{1}-X_{2}|>r\right]=\frac{2}{\alpha}\left(g(x,\alpha)+g(-x,\alpha)\right).$$

$$P\left(\frac{X_{1}+X_{3}}{2}\leq x,\ X_{1}-X_{2}>r,\ X_{3}>\frac{X_{1}+X_{2}}{2}\,\middle|\,X_{1}=x_{1},\ X_{2}=x_{2}\right)$$

$$G(x,\alpha)$$

$$g(x,\alpha)$$
I.J.H. Visagie

Department of Statistics, University of Pretoria, South Africa

F. Lombard

Department of Statistics, University of Johannesburg, South Africa

Abstract

Chemical analyses of raw materials are often repeated in duplicate or triplicate. The assay values obtained are then combined using a predetermined formula to obtain an estimate of the true value of the material of interest. When duplicate observations are obtained, their average typically serves as an estimate of the true value. On the other hand, the “best of three” method involves taking three measurements and using the average of the two closest ones as the estimate of the true value.

In this paper, we consider another method which potentially involves three measurements. Initially two measurements are obtained and, if their difference is sufficiently small, their average is taken as the estimate of the true value. However, if the difference is too large, then a third independent measurement is obtained. The estimator is then defined as the average of the third observation and whichever of the first two is closest to it.

Our focus in the paper is the conditional distribution of the estimate in cases where the initial difference is too large. We find that this conditional distribution is markedly different under a normal error distribution than under a Laplace error distribution.

Keywords: Conditional density, normal distribution, Laplace distribution, closest two out of three.

1 Introduction

Chemical analyses of raw materials are often repeated in duplicate or triplicate. The assay values obtained are then combined using a predetermined formula to obtain an estimate of the true value, $\mu$ , of the material of interest. When duplicate observations $X_{1}$ and $X_{2}$ are obtained, their average typically serves as an estimate of the true value. On the other hand, the “best of three” method involves taking three measurements $X_{1}$ , $X_{2}$ , and $X_{3}$ and using the average of the two closest of these values as the estimate of the true value. The statistical properties of this estimator were worked out by Seth (1950) and Lieblein (1952).

In this paper, we consider another method which potentially involves three measurements. Initially two measurements, $X_{1}$ and $X_{2}$ , are obtained. If the difference between $X_{1}$ and $X_{2}$ is sufficiently small, their average is taken as the estimate. If the difference is too large, then a third independent measurement, $X_{3}$ , is obtained. The estimator, henceforth denoted by $\widehat{\mu}$ , is then the average of $X_{3}$ and whichever of $X_{1}$ and $X_{2}$ is closest to $X_{3}$ . The rationale underlying the method is that whichever of $X_{1}$ and $X_{2}$ is closest to $X_{3}$ is the less likely of the two to contain a large measurement error.

The usual assumption made in standards documents is that the measurement error is normally distributed. However, Wilson (1923) draws attention to the fact that in some instances there are strong grounds for assuming that the errors follow a Laplace distribution. In the context of a series of observations that estimate the true value of a given parameter, Keynes (1911) asks the following question: “If the most probable value (maximum likelihood estimate in modern terminology) of the quantity is equal to the arithmetic mean of the measurements, what law of error does this imply?” Under the additional assumption that the resulting law of error is symmetric, Keynes shows that it is necessarily normal. Interestingly, he also shows that when the question is restated to enquire about the median instead of the mean, then the resulting law of error is the Laplace distribution which, in standardised form, has density function

$$f(x)=\frac{1}{\sqrt{2}}\exp\left(-\sqrt{2}\,|x-\mu|\right).$$

These facts provide motivation for studying the behaviour of the estimator, $\widehat{\mu}$ , under both the normal and Laplace distribution assumptions.

Even if both $X_{1}$ and $X_{2}$ are unbiased estimators of $\mu$ , the measurement errors attached to each will result in a fixed proportion $\alpha\in\left(0,1\right)$ of unacceptably large differences. In other words, a type I error will be made with probability $\alpha$ . In this paper, we investigate the conditional distribution of $\widehat{\mu}$ given that a type I error has occurred. On a purely intuitive level, one would expect this conditional distribution to be symmetric around $\mu$ . This is indeed the case. However, the form of the symmetry is quite surprising. For realistic values of $\alpha$ , the normal distribution gives $\widehat{\mu}$ a bimodal conditional distribution with modes to the left and the right of $\mu$ , whereas the Laplace distribution gives $\widehat{\mu}$ a unimodal conditional distribution with mode $\mu$ .

The remainder of the paper is structured as follows. In Section 2, we define the estimator and derive its conditional density function in the general case where $X_{1}$ , $X_{2}$ and $X_{3}$ are independent and identically distributed (i.i.d.) observations from a symmetric distribution. The conditional density function of the estimator is then computed specifically in the normal and Laplace cases, the surprising difference between the two is illustrated, and its possible consequences are discussed. In Section 3, we consider a dataset and demonstrate that the Laplace rather than the normal distribution provides an acceptable fit to the observed data.

2 Conditional distribution of the estimator

In the application sketched in the Introduction, the difference between $X_{1}$ and $X_{2}$ is regarded as unacceptably large if

$$\left|X_{1}-X_{2}\right|>r(\alpha),$$

where $r(\alpha)$ satisfies

$$P\left[\left|X_{1}-X_{2}\right|>r(\alpha)\right]=\alpha$$

for an a priori given small positive $\alpha$ . In the following, the argument $\alpha$ in $r\left(\alpha\right)$ is suppressed in cases where this is unlikely to lead to confusion. Thus, in the absence of any change in the population mean or standard deviation, the type I error rate will be $\alpha$ . There are two possibilities, namely

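(i) $\left|X_{1}-X_{2}\right|\leq r$ , in which case the estimate is $\widehat{\mu}=(X_{1}+X_{2})/2$ ;

(ii) $\left|X_{1}-X_{2}\right|>r$ , in which case a third observation $X_{3}$ is obtained and

$$\widehat{\mu}=\begin{cases}\dfrac{X_{1}+X_{3}}{2} & \text{if } |X_{1}-X_{3}|<|X_{2}-X_{3}|,\\[2ex] \dfrac{X_{2}+X_{3}}{2} & \text{if } |X_{2}-X_{3}|<|X_{1}-X_{3}|.\end{cases}$$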

Since $\mu$ and the standard deviation of the error distribution, $\sigma$ , are assumed to be fixed and known, we may assume without loss of generality that $\mu=0$ and $\sigma=1$ .
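The two cases (i) and (ii) can be sketched in a few lines of Python. This is only an illustration of the rule described above: the function name `two_stage_estimate` and the callback `draw_third`, which stands in for obtaining the third measurement, are hypothetical names introduced here.

```python
from typing import Callable


def two_stage_estimate(x1: float, x2: float, r: float,
                       draw_third: Callable[[], float]) -> float:
    """Return the estimate of mu from the two-stage rule (illustrative sketch)."""
    # Case (i): the first two measurements agree closely enough.
    if abs(x1 - x2) <= r:
        return (x1 + x2) / 2.0
    # Case (ii): obtain a third measurement only when it is needed.
    x3 = draw_third()
    # Pair x3 with whichever of x1, x2 is closer to it. Exact ties have
    # probability zero for continuous errors; this sketch then picks x2.
    closest = x1 if abs(x1 - x3) < abs(x2 - x3) else x2
    return (closest + x3) / 2.0
```

Passing the third measurement as a callback mirrors the procedure in the text, where $X_{3}$ is measured only if the first two values differ by more than $r$ .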

Our interest centers on (ii), hence on the conditional distribution of $\widehat{\mu}$ given that $\left|X_{1}-X_{2}\right|>r$ . Let

$$G(x,\alpha)=P\left[\frac{X_{1}+X_{3}}{2}\leq x,\ X_{1}-X_{2}>r,\ X_{3}>\frac{X_{1}+X_{2}}{2}\right],$$

and

$$g(x,\alpha)=\frac{d}{dx}G(x,\alpha).$$

We show in Appendix 1 that the conditional density function of $\widehat{\mu}$ , given $\left|X_{1}-X_{2}\right|>r$ , is

$$h(x,\alpha)=\frac{2}{\alpha}\left[g(x,\alpha)+g(-x,\alpha)\right].\tag{5}$$

The density $h$ is symmetric around $x=0$ which is what one would expect a priori. However, from a practitioner’s point of view, it is the shape of this density that turns out to be the most interesting and important aspect of the conditional distribution. Given a density function $f$ of the $X_{i}$ , $g\left(x,\alpha\right)$ is given by the expression

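$$g(x,\alpha)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}2f(2x-x_{1})\,\mathbb{J}(x,x_{1},x_{2})\,f(x_{1})f(x_{2})\,dx_{1}\,dx_{2},\tag{6}$$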

where

$$\mathbb{J}(x,x_{1},x_{2})=\mathbb{I}\left(x>\frac{3x_{1}+x_{2}}{4},\ x_{1}-x_{2}>r\right),\tag{7}$$

with $\mathbb{I}\left(\cdot\right)$ the indicator function. Substituting the normal or Laplace density function into (6) does not lead to any substantial algebraic simplification of the expression for $h\left(x,\alpha\right)$ . Therefore, we obtain $g(x,\alpha)$ by numerical integration over a fine grid of $x$ values using the Matlab function “integral2.m” (see Appendix 2).
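As a rough cross-check of this numerical step, the following Python sketch evaluates $g$ and $h$ with the standard library only. It is not the authors' Matlab code. It relies on two assumptions that are not taken from the paper: (a) the inner integral over $x_{2}$ in (6) is done analytically, since the indicator restricts $x_{2}$ to $x_{2}<\min(x_{1}-r,\,4x-3x_{1})$ , leaving a single integral with the factor $F(\min(x_{1}-r,\,4x-3x_{1}))$ ; (b) for unit-variance Laplace errors the tail probability of the difference is taken to be $P(|X_{1}-X_{2}|>r)=(1+s/2)e^{-s}$ with $s=\sqrt{2}\,r$ . The grid limits, step counts and the value `R_LAPLACE` are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / SQRT2))


def laplace_pdf(x):
    # Unit-variance Laplace density (scale 1/sqrt(2)).
    return math.exp(-SQRT2 * abs(x)) / SQRT2


def laplace_cdf(x):
    half_tail = 0.5 * math.exp(-SQRT2 * abs(x))
    return half_tail if x < 0 else 1.0 - half_tail


def g(x, r, pdf, cdf, lo=-12.0, hi=12.0, n=1200):
    """Trapezoidal approximation of g(x, alpha) in (6), with x2 integrated out."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x1 = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        # The indicator in (7) holds exactly when x2 < min(x1 - r, 4x - 3*x1).
        upper_x2 = min(x1 - r, 4.0 * x - 3.0 * x1)
        total += weight * 2.0 * pdf(2.0 * x - x1) * pdf(x1) * cdf(upper_x2)
    return total * step


def h(x, r, alpha, pdf, cdf):
    """Conditional density (5) of the estimator given |X1 - X2| > r."""
    return (2.0 / alpha) * (g(x, r, pdf, cdf) + g(-x, r, pdf, cdf))


# Normal errors: X1 - X2 ~ N(0, 2), so r(alpha) = sqrt(2) * z_{1 - alpha/2}.
ALPHA_NORMAL = 0.05
R_NORMAL = SQRT2 * NormalDist().inv_cdf(1.0 - ALPHA_NORMAL / 2.0)

# Laplace errors: fix r and compute alpha from the assumed tail formula (b).
R_LAPLACE = 2.9
ALPHA_LAPLACE = (1.0 + 0.5 * SQRT2 * R_LAPLACE) * math.exp(-SQRT2 * R_LAPLACE)

if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 1.5):
        print(x,
              h(x, R_NORMAL, ALPHA_NORMAL, normal_pdf, normal_cdf),
              h(x, R_LAPLACE, ALPHA_LAPLACE, laplace_pdf, laplace_cdf))
```

Comparing the values of `h` at $x=0$ and $x=1$ gives a quick check of the modality discussed in the text, and integrating `h` over a wide grid should return a value close to one.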

Figure 1 shows the conditional densities (5) of $\widehat{\mu}$ under the normal and Laplace distributions. The density is bimodal under the normal distribution and unimodal under the Laplace distribution. In both cases, the estimator is centered around the population mean. Nevertheless, a process engineer is bound to be somewhat perplexed upon seeing the bimodal form under the normal distribution. This phenomenon can, to some extent, be explained as follows. First, the Laplace distribution differs from the normal distribution in some important respects. For instance, the Laplace density has a sharp peak at its point of symmetry and hence is not differentiable there. The tails of the Laplace density are also substantially thicker than those of the normal density. This is perhaps not obvious from visual inspection of Figure 2, which plots the two standardised density functions.

In order to better appreciate the differences between the tails of the distributions, consider Table 1, which lists the values $r(\alpha)$ satisfying $P\left(|X_{1}-X_{2}|>r(\alpha)\right)=\alpha$ for a range of values of $\alpha$ . The table indicates that the Laplace distribution has substantially heavier tails than the normal distribution. In fact, the kurtosis of the Laplace distribution is $6$ , twice that of the normal distribution.

[TABLE]
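Thresholds of this kind can be computed along the following lines. This is a sketch under stated assumptions, not a reproduction of Table 1: for unit-variance normal errors $X_{1}-X_{2}\sim N(0,2)$ , so $r(\alpha)=\sqrt{2}\,z_{1-\alpha/2}$ ; for unit-variance Laplace errors the tail probability of the difference is assumed to be $(1+s/2)e^{-s}$ with $s=\sqrt{2}\,r$ , and it is inverted by bisection. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def r_normal(alpha):
    """r(alpha) when X1, X2 are i.i.d. N(0, 1), so that X1 - X2 ~ N(0, 2)."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(1.0 - alpha / 2.0)


def laplace_diff_tail(r):
    """Assumed P(|X1 - X2| > r) for i.i.d. unit-variance Laplace X1, X2."""
    s = math.sqrt(2.0) * r
    return (1.0 + 0.5 * s) * math.exp(-s)


def r_laplace(alpha, tol=1e-10):
    """Solve laplace_diff_tail(r) = alpha by bisection (the tail decreases in r)."""
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if laplace_diff_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(alpha, r_normal(alpha), r_laplace(alpha))
```

Under these assumptions, the Laplace threshold exceeds the normal one at small levels such as $\alpha=0.05$ and $\alpha=0.01$ , which is in line with the heavier tails noted in the text.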

Second, we now argue that, as a consequence of the preceding remark, the resulting density is bimodal in the case where the separation between $g\left(x,\alpha\right)$ and $g\left(-x,\alpha\right)$ per unit standard deviation is large and unimodal when this separation is small.

Figure 3 shows plots of $g\left(x,\alpha\right)$ and $g\left(-x,\alpha\right)$ for the normal distribution while Figure 4 shows the corresponding plots for the Laplace distribution. The figures clearly indicate that the separation between $g\left(x,\alpha\right)$ and $g\left(-x,\alpha\right)$ is substantially larger under the normal distribution than under the Laplace distribution.

We now discuss some possible consequences of this difference between the two conditional distributions. The quality of coal is determined, in part, by its ash content. The lower the ash content, the greater is the release of energy when the coal is burnt. As a result, the price of coal is often linked to its ash content. Typically, two determinations, $X_{1}$ and $X_{2}$ , of the ash content of a batch of coal are made and the estimate, $\widehat{\mu}$ , is computed as shown above. As pointed out above, even if both determinations are unbiased estimators of $\mu$ , unacceptably large deviations would occur in a proportion $\alpha$ of batches. If $\mu$ denotes the contractual ash content, then ash contents in excess of $\mu$ could attract penalties, i.e., a lower price than that originally agreed upon.

Figure 5 shows conditional exceedance probabilities

$$P\left(\widehat{\mu}>x\,\middle|\,\left|X_{1}-X_{2}\right|>r\right)$$

over a range of $x$ values for the normal and Laplace distributions.
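Exceedance probabilities of this kind can also be approximated by simulation. The sketch below is a Monte Carlo stand-in, not the numerical integration used for Figure 5: it draws pairs, keeps those with $|X_{1}-X_{2}|>r$ , applies the estimator and counts how often it exceeds $x$ . The function names, seeds and sample sizes are illustrative assumptions. Unit-variance Laplace variates are generated as the difference of two independent exponentials with rate $\sqrt{2}$ .

```python
import math
import random


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_laplace(rng):
    # The difference of two i.i.d. exponentials with rate sqrt(2) is Laplace
    # with mean 0 and variance 1.
    rate = math.sqrt(2.0)
    return rng.expovariate(rate) - rng.expovariate(rate)


def conditional_estimates(draw, r, n_pairs, seed=0):
    """Simulate the estimator on the event |X1 - X2| > r (rejection sampling)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_pairs):
        x1, x2 = draw(rng), draw(rng)
        if abs(x1 - x2) > r:
            x3 = draw(rng)
            closest = x1 if abs(x1 - x3) < abs(x2 - x3) else x2
            estimates.append((closest + x3) / 2.0)
    return estimates


def exceedance(estimates, x):
    """Empirical P(estimate > x | |X1 - X2| > r)."""
    return sum(e > x for e in estimates) / len(estimates)
```

For example, `exceedance(conditional_estimates(draw_normal, 2.77, 200_000), 1.0)` approximates the conditional probability that the estimate exceeds the true value by more than one standard deviation under normal errors with a threshold near the 5% level.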

From the figure it is clear that deviations of up to $1.5$ standard deviations will tend to attract larger penalties under a normal distribution than under a Laplace distribution. This is also rather clear from Figure 1. The economic implications of this are greater than they may seem at first glance. A batch of coal could consist of several hundred tons, which means that a penalty of, for example, $1\%$ of the contractual price could involve hundreds of thousands of dollars.

3 Application to some data

If an enormous amount of data were available, it would be possible to assess empirically which of the conditional densities seen in Figure 1 is the valid one. In the absence of a large amount of data we will have to be satisfied with something less, namely a test of sorts to decide which of the normal or Laplace error distributions is applicable. To this end, Figure 6 shows the differences $X_{1,j}-X_{2,j}$ , $j=1,\ldots,199$ , for 199 batches of coal. Typically, a prescribed value of $\sigma$ , the common standard deviation of $X_{1}$ and $X_{2}$ , is attained by following a standard operating procedure. In the present instance, the prescribed value was $\sigma=0.4$ . Thus, we standardise the observed differences as follows:

$$Z_{j}=\frac{X_{1,j}-X_{2,j}}{0.4}.$$

The resulting sample mean and standard deviation are $-0.06$ and $1.40\ (\approx\sqrt{2})$ respectively.

In order to determine which of the two distributions is most appropriate we use the standardised Kolmogorov-Smirnov statistic:

$$T_{n}=\max_{1\leq j\leq n}\frac{F(Z_{j})-F_{n}(Z_{j})}{F(Z_{j})\left(1-F(Z_{j})\right)},$$

where $F$ denotes the cumulative distribution function of $Z$ and $F_{n}$ denotes the usual empirical distribution function

$$F_{n}(x)=\frac{1}{n}\sum_{j=1}^{n}\mathbb{I}(Z_{j}\leq x).$$

The observed values of $T_{n}$ in the dataset are $T_{n}=0.27$ and $T_{n}=0.21$ when $F$ is based on the normal and Laplace error distributions respectively. The corresponding $p$ -values obtained from $100\ 000$ Monte Carlo simulations are $0.09$ and $0.21$ respectively. These $p$ -values suggest more support for the Laplace assumption than for the normal in this particular instance.
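The mechanics of such a test can be sketched as follows, taking the displayed expression for $T_{n}$ as given. The code is illustrative only: the coal data are not reproduced here, the function names are hypothetical, and the two distribution functions of $Z$ are assumptions derived for unit-variance errors ( $Z\sim N(0,2)$ in the normal case, and the difference of two unit-variance Laplace variables in the Laplace case). The paper reports $100\ 000$ Monte Carlo replications; the default below is much smaller to keep the run time short.

```python
import bisect
import math
import random

SQRT2 = math.sqrt(2.0)


def cdf_diff_normal(z):
    # Z = X1 - X2 with unit-variance normal errors, so Z ~ N(0, 2).
    return 0.5 * (1.0 + math.erf(z / 2.0))


def cdf_diff_laplace(z):
    # Assumed cdf of the difference of two unit-variance Laplace variables.
    s = SQRT2 * abs(z)
    tail = 0.5 * (1.0 + 0.5 * s) * math.exp(-s)
    return tail if z < 0 else 1.0 - tail


def sample_diff_normal(rng):
    return rng.gauss(0.0, SQRT2)


def sample_diff_laplace(rng):
    def laplace():
        return rng.expovariate(SQRT2) - rng.expovariate(SQRT2)
    return laplace() - laplace()


def t_statistic(z_values, cdf):
    """Weighted discrepancy between F and the empirical distribution F_n."""
    data = sorted(z_values)
    n = len(data)
    best = -math.inf
    for z in data:
        big_f = cdf(z)
        f_n = bisect.bisect_right(data, z) / n  # share of observations <= z
        best = max(best, (big_f - f_n) / (big_f * (1.0 - big_f)))
    return best


def monte_carlo_p_value(z_values, cdf, sampler, reps=1000, seed=0):
    """Share of simulated null samples with a statistic at least as large."""
    rng = random.Random(seed)
    observed = t_statistic(z_values, cdf)
    n = len(z_values)
    hits = 0
    for _ in range(reps):
        simulated = [sampler(rng) for _ in range(n)]
        if t_statistic(simulated, cdf) >= observed:
            hits += 1
    return hits / reps
```

With standardised differences stored in a list `z`, a call such as `monte_carlo_p_value(z, cdf_diff_laplace, sample_diff_laplace)` would give a p-value under the Laplace assumption.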

4 Appendix 1: Derivation of (5)

Let $X_{1}$ and $X_{2}$ denote the first two observations and let $X_{3}$ denote the third sample observation. Given $x$ and a small $\delta>0$ , let $dx$ denote the interval $(x-\delta,x+\delta)$ . Then

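$$P\left[\widehat{\mu}\in dx\,\middle|\,|X_{1}-X_{2}|>r\right]=\frac{1}{\alpha}P\left[\widehat{\mu}\in dx,\ |X_{1}-X_{2}|>r\right].$$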

Furthermore, since $(X_{1},X_{2},X_{3})$ has the same distribution as $(X_{2},X_{1},X_{3})$ ,

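$$P\left[\widehat{\mu}\in dx,\ |X_{1}-X_{2}|>r\right]=2P\left[\widehat{\mu}\in dx,\ X_{1}-X_{2}>r\right].$$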

Now,

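$$\begin{aligned}P\left[\widehat{\mu}\in dx,\ X_{1}-X_{2}>r\right]&=P\left[\frac{X_{1}+X_{3}}{2}\in dx,\ X_{1}-X_{2}>r,\ |X_{1}-X_{3}|<|X_{2}-X_{3}|\right]+P\left[\frac{X_{2}+X_{3}}{2}\in dx,\ X_{1}-X_{2}>r,\ |X_{1}-X_{3}|>|X_{2}-X_{3}|\right]\\&=P\left[\frac{X_{1}+X_{3}}{2}\in dx,\ X_{1}-X_{2}>r,\ X_{3}>\frac{X_{1}+X_{2}}{2}\right]+P\left[\frac{X_{2}+X_{3}}{2}\in dx,\ X_{1}-X_{2}>r,\ X_{3}<\frac{X_{1}+X_{2}}{2}\right]\\&=\left[G(x+\delta,\alpha)-G(x-\delta,\alpha)\right]+P\left[\frac{X_{2}+X_{3}}{2}\in dx,\ X_{1}-X_{2}>r,\ X_{3}<\frac{X_{1}+X_{2}}{2}\right],\end{aligned}$$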

with the next to last equality following because

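$$X_{1}-X_{2}>r\ \text{and}\ |X_{1}-X_{3}|<|X_{2}-X_{3}|\iff X_{1}-X_{2}>r\ \text{and}\ X_{3}>\frac{X_{1}+X_{2}}{2}$$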

and

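$$X_{1}-X_{2}>r\ \text{and}\ |X_{1}-X_{3}|>|X_{2}-X_{3}|\iff X_{1}-X_{2}>r\ \text{and}\ X_{3}<\frac{X_{1}+X_{2}}{2}.$$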

Next, the second term in (4) is

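$$\begin{aligned}P\left[\frac{X_{2}+X_{3}}{2}\in dx,\ X_{1}-X_{2}>r,\ X_{3}<\frac{X_{1}+X_{2}}{2}\right]&=P\left[-\frac{X_{1}+X_{3}}{2}\in dx,\ X_{1}-X_{2}>r,\ X_{3}>\frac{X_{1}+X_{2}}{2}\right]\\&=G(-x+\delta,\alpha)-G(-x-\delta,\alpha),\end{aligned}$$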

with the next to last equality following because $(-X_{2},-X_{1},-X_{3})$ has the same distribution as $(X_{1},X_{2},X_{3})$ . Putting (9), (10), (4) and (12) together, we see that

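$$\frac{P\left[\widehat{\mu}\in dx\,\middle|\,|X_{1}-X_{2}|>r\right]}{2\delta}=\frac{2}{\alpha}\left[\frac{G(x+\delta,\alpha)-G(x-\delta,\alpha)}{2\delta}+\frac{G(-x+\delta,\alpha)-G(-x-\delta,\alpha)}{2\delta}\right].$$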

Letting $\delta\downarrow 0$ gets us to (5):

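$$h(x,\alpha)=\lim_{\delta\downarrow 0}\frac{P\left[\widehat{\mu}\in dx\,\middle|\,|X_{1}-X_{2}|>r\right]}{2\delta}=\frac{2}{\alpha}\left(g(x,\alpha)+g(-x,\alpha)\right).$$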

5 Appendix 2: Derivation of (6)

Let $X_{1}$ , $X_{2}$ and $X_{3}$ be independent random variables with common distribution function $F$ and density function $f$ . Then, for fixed $x_{1}$ and $x_{2}$ ,

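$$\begin{aligned}P\left(\frac{X_{1}+X_{3}}{2}\leq x,\ X_{1}-X_{2}>r,\ X_{3}>\frac{X_{1}+X_{2}}{2}\,\middle|\,X_{1}=x_{1},\ X_{2}=x_{2}\right)&=P\left(\frac{x_{1}+x_{2}}{2}<X_{3}\leq 2x-x_{1}\right)\mathbb{I}\left(x_{1}-x_{2}>r\right)\\&=\left[F(2x-x_{1})-F\left(\frac{x_{1}+x_{2}}{2}\right)\right]\mathbb{J}(x,x_{1},x_{2}),\end{aligned}$$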

where $\mathbb{J}$ is defined in (7). Consequently,

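$$G(x,\alpha)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left[F(2x-x_{1})-F\left(\frac{x_{1}+x_{2}}{2}\right)\right]\mathbb{J}(x,x_{1},x_{2})\,f(x_{1})f(x_{2})\,dx_{1}\,dx_{2}.$$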

Taking the derivative with respect to $x$ , we obtain

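$$g(x,\alpha)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}2f(2x-x_{1})\,\mathbb{J}(x,x_{1},x_{2})\,f(x_{1})f(x_{2})\,dx_{1}\,dx_{2}.$$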

Bibliography

[1] Keynes, J.M. (1911). The principal averages and the laws of error which lead to them. Journal of the Royal Statistical Society, 74, 322-331.
[2] Lieblein, J. (1952). Properties of certain statistics involving the closest pair in a sample of three observations. Journal of Research of the National Bureau of Standards, 48 (3), 255-268.
[3] MATLAB Release 2018b, The MathWorks, Inc., Natick, Massachusetts, United States.
[4] Seth, G.R. (1950). On the distribution of the two closest among a set of three observations. The Annals of Mathematical Statistics, 21 (2), 298-301.
[5] Wilson, E.B. (1923). First and second laws of error. Journal of the American Statistical Association, 18, 841-851.