Overlap Coefficients Based on Kullback-Leibler Divergence: Exponential Populations Case

Hamza Dhaker; Papa Ngom; Malick Mbodj

arXiv:1704.02671·stat.ME·April 11, 2017


TL;DR

This paper introduces a new overlap coefficient based on Kullback-Leibler divergence for exponential populations, compares it with existing measures, and discusses statistical inference methods including confidence intervals and estimator properties.

Contribution

A novel overlap measure $\Lambda$ based on Kullback-Leibler divergence is proposed for exponential populations, along with inference techniques and property analyses.

Findings

01

The new measure $\Lambda$ is invariant and effective.

02

Confidence intervals for overlap measures are constructed using Taylor series.

03

Simulation studies evaluate bias and mean square error of estimators.

Abstract

This article is devoted to the study of overlap measures of the densities of two exponential populations. Three existing overlapping coefficients are considered: Matusita's measure $\rho$, Morisita's measure $\lambda$ and Weitzman's measure $\Delta$. A new overlap measure $\Lambda$ based on the Kullback-Leibler measure is proposed. The invariance property and a method of statistical inference for these coefficients are also presented. Taylor series approximations are used to construct confidence intervals for the overlap measures. The bias and mean square error properties of the estimators are studied through a simulation study.

Tables

OVL	Lower limit $L'$	Upper limit $U'$
$\Delta$	$1 - L^{\frac{1}{1-L}}\left|1 - \frac{1}{L}\right|$	$1 - U^{\frac{1}{1-U}}\left|1 - \frac{1}{U}\right|$
$\rho$	$\frac{2\sqrt{L}}{L+1}$	$\frac{2\sqrt{U}}{U+1}$
$\lambda$	$\frac{4L}{(L+1)^{2}}$	$\frac{4U}{(U+1)^{2}}$
$\Lambda$	$\frac{L}{L^{2}-L+1}$	$\frac{U}{U^{2}-U+1}$

	$\widehat{\rho}$			$\widehat{\lambda}$			$\widehat{\Delta}$			$\widehat{\Lambda}$
$n$	Bias	MSE	Ratio	Bias	MSE	Ratio	Bias	MSE	Ratio	Bias	MSE	Ratio
$R=0.2$	$\rho = 0.745$			$\lambda = 0.556$			$\Delta = 0.465$			$\Lambda = 0.24$
20	-0.029	0.007	-0.36	-0.030	0.016	-0.25	-0.0180	0.008	-0.061	0.0060	0.0080	0.067
50	-0.011	0.003	-0.22	-0.012	0.006	-0.15	-0.0070	0.007	-0.030	0.0020	0.0030	0.041
100	-0.055	0.001	-0.15	-0.056	0.003	-0.11	-0.0034	0.0015	-0.017	0.0011	0.0015	0.029
200	-0.003	0.000*	-0.11	-0.003	0.001	-0.07	-0.0020	0.027	0.010	0.000*	0.000*	0.020
500	-0.001	0.000*	-0.07	-0.001	0.000*	-0.05	0.000*	0.000*	-0.039	0.000*	0.000*	0.013
$R=0.5$	$\rho = 0.943$			$\lambda = 0.889$			$\Delta = 0.750$			$\Lambda = 0.667$
20	-0.036	0.0040	-0.71	-0.640	0.0140	-0.66	-0.031	0.014	-0.092	0.048	0.0500	0.22
50	-0.014	0.0010	-0.44	-0.024	0.0040	-0.41	-0.012	0.005	-0.045	0.018	0.0190	0.013
100	-0.007	0.000*	-0.31	-0.012	0.0020	-0.28	-0.006	0.0024	-0.026	0.009	0.0090	0.095
200	-0.003	0.000*	-0.27	-0.006	0.000*	-0.20	-0.003	0.001	-0.015	0.004	0.0045	0.067
500	-0.001	0.000*	-0.13	-0.002	0.000*	-0.13	-0.001	0.000*	-0.05	-0.0018	0.0018	-0.042
$R=0.8$	$\rho = 0.994$			$\lambda = 0.988$			$\Delta = 0.918$			$\Lambda = 0.952$
20	-0.032	0.001	-0.87	-0.063	0.005	-0.87	-0.037	0.016	-0.3	-0.20	0.061	-0.84
50	-0.012	0.000*	-0.74	-0.024	0.0011	-0.73	-0.014	0.006	-0.19	-0.079	0.013	-0.69
100	-0.006	0.000*	-0.61	-0.012	0.000*	-0.6	-0.007	0.0027	-0.133	-0.039	0.005	-0.56
200	-0.003	0.000*	-0.47	-0.006	0.000*	-0.47	-0.003	0.001	-0.09	-0.019	0.002	-0.43
500	-0.001	0.000*	-0.32	-0.002	0.000*	-0.32	-0.001	0.000*	-0.06	-0.008	0.000*	-0.28

Equations

$\Delta = \int \min[f_{1}(x), f_{2}(x)]\,dx$

$\rho = \int \sqrt{f_{1}(x) f_{2}(x)}\,dx$

$\lambda = \frac{2\int f_{1}(x) f_{2}(x)\,dx}{\int [f_{1}(x)]^{2}\,dx + \int [f_{2}(x)]^{2}\,dx}$

$\Lambda = \frac{1}{1 + KL(f_{1}\|f_{2})}$

$f_{i}(x;\theta_{i}) = \theta_{i}\exp(-\theta_{i}x), \quad x \in (0,\infty)$

$\Delta = 1 - \left|1 - \frac{1}{R}\right| R^{\frac{1}{1-R}}, \quad R \neq 1$

$\rho = \frac{2\sqrt{R}}{1+R}$

$\lambda = \frac{4R}{(1+R)^{2}}$

$\Lambda = \frac{R}{R^{2}-R+1}$

$f_{1}(x) = \frac{1}{\theta_{1}}\exp\left\{-\frac{x}{\theta_{1}}\right\}, \quad x > 0$

$f_{2}(x) = \frac{1}{\theta_{2}}\exp\left\{-\frac{x}{\theta_{2}}\right\}, \quad x > 0$

$\widehat{\theta}_{1} = \overline{X}_{1} = \frac{\sum_{i=1}^{n_{1}} X_{1i}}{n_{1}}$

$\widehat{\theta}_{2} = \overline{X}_{2} = \frac{\sum_{i=1}^{n_{2}} X_{2i}}{n_{2}}$

$\widehat{\theta}_{1} \sim G\left(n_{1}, \frac{\theta_{1}}{n_{1}}\right), \quad \widehat{\theta}_{2} \sim G\left(n_{2}, \frac{\theta_{2}}{n_{2}}\right)$

$Var(\widehat{R}^{*}) = R^{2}\frac{(n_{1}+n_{2}-1)}{n_{1}(n_{2}-2)}$

$\widehat{\Delta} = 1 - \left|1 - \frac{1}{\widehat{R}^{*}}\right| (\widehat{R}^{*})^{\frac{1}{1-\widehat{R}^{*}}}$

$\widehat{\rho} = \frac{2\sqrt{\widehat{R}^{*}}}{1+\widehat{R}^{*}}$

$\widehat{\lambda} = \frac{4\widehat{R}^{*}}{(1+\widehat{R}^{*})^{2}}$

$\widehat{\Lambda} = \frac{\widehat{R}^{*}}{(\widehat{R}^{*})^{2}-\widehat{R}^{*}+1}$

$Var(\widehat{\Delta}) = \frac{(n_{1}+n_{2}-1)\,R^{\frac{2}{1-R}}(\log R)^{2}}{n_{1}(n_{2}-2)(1-R)^{2}}$

$Var(\widehat{\rho}) = \frac{R(1-R)^{2}(n_{1}+n_{2}-1)}{n_{1}(n_{2}-2)(1+R)^{4}}$

$Var(\widehat{\lambda}) = \frac{16R^{2}(1-R)^{2}(n_{1}+n_{2}-1)}{n_{1}(n_{2}-2)(1+R)^{6}}$

$Var(\widehat{\Lambda}) = \frac{(n_{1}+n_{2}-1)}{n_{1}(n_{2}-2)}\,\frac{R^{2}(1-R^{2})^{2}}{(R^{2}-R+1)^{4}}$

$Bias(\widehat{\Delta}) = \begin{cases}\frac{(n_{1}+n_{2}-1)R^{2}}{n_{1}(n_{2}-2)}\,\frac{R^{\frac{2R-1}{1-R}}\left[R(2R-\log(R)-2)\log(R)-(R-1)^{2}\right]}{(R-1)^{3}} & \text{if } R>1,\\[1ex] \frac{(n_{1}+n_{2}-1)R^{2}}{n_{1}(n_{2}-2)}\,\frac{R^{\frac{2R-1}{1-R}}\left[R(2R-\log(R)-2)\log(R)-(R-1)^{2}\right]}{(1-R)^{3}} & \text{if } R<1\end{cases}$

$Bias(\widehat{\rho}) = \frac{(n_{1}+n_{2}-1)R}{n_{1}(n_{2}-2)}\,\frac{3R(R-2)-1}{2(R+1)^{3}}$

$Bias(\widehat{\lambda}) = \frac{n_{1}+n_{2}-1}{n_{1}(n_{2}-2)}\,\frac{8R^{2}(R-2)}{(R+1)^{4}}$

$Bias(\widehat{\Lambda}) = -\frac{n_{1}+n_{2}-1}{n_{1}(n_{2}-2)}\,\frac{R^{2}(2R^{3}-6R+2)}{(R^{2}-R+1)^{3}}$

$\mathbb{P}\left(F^{\alpha/2}_{(2n_{1},2n_{2})} < \frac{\theta_{2}}{\theta_{1}}\widehat{R} < F^{1-\alpha/2}_{(2n_{1},2n_{2})}\right) = 1-\alpha$

$L = \frac{\widehat{R}}{F^{1-\alpha/2}_{(2n_{1},2n_{2})}} \quad\text{and}\quad U = \frac{\widehat{R}}{F^{\alpha/2}_{(2n_{1},2n_{2})}}$


Full text

Overlap Coefficients Based on Kullback-Leibler Divergence: Exponential Populations Case

Hamza Dhaker


Papa Ngom

Malick Mbodj

LMDAN, Université Cheikh Anta Diop, Dakar, Senegal

LMA, Université Cheikh Anta Diop, Dakar, Senegal

Bowie State University, Maryland, USA

Abstract

This article is devoted to the study of overlap measures of the densities of two exponential populations. Three existing overlapping coefficients are considered: Matusita’s measure $\rho$, Morisita’s measure $\lambda$ and Weitzman’s measure $\Delta$. A new overlap measure $\Lambda$ based on the Kullback-Leibler measure is proposed. The invariance property and a method of statistical inference for these coefficients are also presented. Taylor series approximations are used to construct confidence intervals for the overlap measures. The bias and mean square error properties of the estimators are studied through a simulation study.

keywords:

Kullback-Leibler divergence; Matusita’s measure; Morisita’s measure; Weitzman’s measure; overlap coefficients; Taylor expansion.


1 Introduction

The similarity between two densities can be considered as the commonality shared by both populations. Generally it is measured on a scale from $0$ to $1$: values close to $0$ correspond to distributions whose supports do not intersect, and the value $1$ to a perfect match of the two distributions. Scientists from different disciplines have proposed different measures of similarity serving different purposes.

Using the delta method, Smith [20] derived approximate formulas for estimating the mean and the variance of the discrete version of Weitzman’s measure $\Delta$ (Weitzman [21]), also known as the overlap coefficient. Mishra et al. [12] gave the small and large sample properties of the sampling distributions for a function of this overlap measure estimator, under the assumption of homogeneity of variances for the case of two normal distributions. Mulekar and Mishra [14] simulated the sampling distribution of estimators of the overlap measures when the two densities correspond to the normal case with equal means and obtained approximate expressions for the bias and variance of their estimators. Recently, several authors including Bradley and Piantadosi [4], Inman and Bradley [8], Clemons [5], Reiser and Faraggi [18], Clemons and Bradley [6], Mulekar and Mishra [15], Al-Saidy et al. [1], Al-Saleh and Samawi [2], and Samawi and Al-Saleh [19] considered this measure.

Dixon [7] described the use of bootstrap and jackknife techniques for the Gini coefficient of size hierarchy, a commonly used measure of similarity between income distributions of two ethnic, gender, or geographical groups, and the Jaccard index of community similarity. Al-Saidy et al. [1] considered the problem of drawing inference about the three overlap measures under the Weibull distribution with equal shape parameter. Ning et al. [16] compared mixtures of generalized lambda distributions (GLDs) with normal mixtures by using the Kullback-Leibler $(KL)$ distance and the overlapping coefficient $(\delta)$.

The main objective of this paper is to propose a new $OVL$ based on the Kullback-Leibler divergence [9] for two exponential distributions: starting from a measure of divergence, or dissimilarity, we construct a measure of similarity denoted $\Lambda$ and defined in (1). We also provide its maximum likelihood estimator.

The coefficients and their properties are given in Section 2. The expressions for the approximate bias and variance of the $OVL$ estimates are included in Section 3. A method for making statistical inferences about the $OVLs$ is discussed in Section 4. The results of a simulation study are described in Section 5, along with an example demonstrating the usefulness of the $OVLs$. Finally, the conclusion and perspectives are presented in Section 6.

2 Overlap Coefficients

We consider four different similarity measures (the overlap coefficients, $OVL$): Matusita’s measure $\rho$, Morisita’s measure $\lambda$, Weitzman’s measure $\Delta$ and the measure $\Lambda$ based on the Kullback-Leibler divergence. The overlap measure ($OVL$) is defined as the area of intersection of the graphs of two probability density functions. It measures the similarity, that is, the agreement or closeness, of the two probability distributions.

Let $F_{1}(x)$ and $F_{2}(x)$ be two distribution functions with corresponding density functions $f_{1}(x)$ and $f_{2}(x)$ with respect to the Lebesgue measure. Four measures that describe the closeness between $F_{1}(x)$ and $F_{2}(x)$ are described below:

1. Weitzman’s Measure [21]: The overlapping coefficient $\Delta$ is the area lying under both density functions simultaneously, defined as

$\Delta=\int\min[f_{1}(x),f_{2}(x)]\,dx.$

2. Matusita’s Measure [11]: The second measure studied here is Matusita’s measure $\rho$, defined as

$\rho=\int\sqrt{f_{1}(x)f_{2}(x)}\,dx.$

This measure is based on the distance between two functions (Matusita [11]). Matusita actually developed a discrete version of $\rho$ , which is also known as the Freeman-Tukey measure (FT). This measure is related to the Hellinger distance (Rao [17] and Beran [3]).

3. Morisita’s Measure [13]: Morisita proposed an index of similarity between communities. Consider an ecological study involving two populations, from each of which a random sample is taken. The index is defined as

$\lambda=\frac{2\int f_{1}(x)f_{2}(x)\,dx}{\int[f_{1}(x)]^{2}\,dx+\int[f_{2}(x)]^{2}\,dx}.$

4. Kullback-Leibler Measure [9]: The Kullback-Leibler divergence was originally introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions. It is discussed in Kullback’s historic text, Information Theory and Statistics.

The overlap coefficient $\Lambda$ is obtained from the Kullback-Leibler divergence as

$\Lambda=\frac{1}{1+KL(f_{1}\|f_{2})}$ (1)

with $KL(f_{1}\|f_{2})=\int(f_{1}-f_{2})\log\left(\frac{f_{1}}{f_{2}}\right)dx$, the symmetrized form of the divergence.
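The four definitions can be checked numerically for any pair of densities. The following is a minimal sketch, not part of the paper: the helper names (`exp_pdf`, `integrate`, `overlap_measures`), the trapezoidal rule, the truncation of the integrals at 40, and the example rates 1 and 2 are all assumptions made for illustration. Matusita's measure is taken as $\int\sqrt{f_{1}f_{2}}\,dx$ and $KL$ as the symmetric form $\int(f_{1}-f_{2})\log(f_{1}/f_{2})\,dx$.

```python
import math

def exp_pdf(rate):
    """Density of an exponential distribution with the given hazard rate."""
    return lambda x: rate * math.exp(-rate * x)

def integrate(g, a, b, n=20000):
    """Composite trapezoidal rule for g on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for k in range(1, n):
        total += g(a + k * h)
    return total * h

def overlap_measures(f1, f2, a, b):
    """Return (Delta, rho, lambda, Lambda) by numerical integration on [a, b]."""
    delta = integrate(lambda x: min(f1(x), f2(x)), a, b)
    rho = integrate(lambda x: math.sqrt(f1(x) * f2(x)), a, b)
    cross = integrate(lambda x: f1(x) * f2(x), a, b)
    sq1 = integrate(lambda x: f1(x) ** 2, a, b)
    sq2 = integrate(lambda x: f2(x) ** 2, a, b)
    lam = 2.0 * cross / (sq1 + sq2)
    # Symmetric Kullback-Leibler divergence, as used in the definition of Lambda.
    kl = integrate(lambda x: (f1(x) - f2(x)) * math.log(f1(x) / f2(x)), a, b)
    big_lambda = 1.0 / (1.0 + kl)
    return delta, rho, lam, big_lambda

# Illustrative choice: hazard rates 1 and 2; the upper limit 40 truncates a negligible tail.
f1, f2 = exp_pdf(1.0), exp_pdf(2.0)
delta, rho, lam, big_lambda = overlap_measures(f1, f2, 0.0, 40.0)
print(delta, rho, lam, big_lambda)
```

For these two rates the integrals can also be done by hand, which gives a convenient cross-check on the numerical values.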

2.1 Overlap measures (OVL) for Exponential Distribution

The simplest and most commonly used distribution in survival and reliability analysis is the one-parameter exponential distribution. Let $f_{i}(x;\theta_{i})$ denote the densities of two exponential populations with respective hazard rates $\theta_{i}>0$ $(i=1,2)$, that is,

$f_{i}(x;\theta_{i})=\theta_{i}\exp(-\theta_{i}x),\quad x\in(0,\infty).$

The overlapping coefficients are shown graphically in Figure 2.

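Let $R=\frac{\theta_{1}}{\theta_{2}}$ be the ratio of hazard rates. These measures can then be shown to be functions of $R$ as follows:

$\Delta=1-\left|1-\frac{1}{R}\right|R^{\frac{1}{1-R}},\quad R\neq 1,$

$\rho=\frac{2\sqrt{R}}{1+R},$

$\lambda=\frac{4R}{(1+R)^{2}}$

and

$\Lambda=\frac{R}{R^{2}-R+1}.$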
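As a worked check of the closed form for $\Lambda$, here is a short derivation sketch. It assumes the symmetric form $KL(f_{1}\|f_{2})=\int(f_{1}-f_{2})\log(f_{1}/f_{2})\,dx$ and the hazard-rate parameterization $f_{i}(x)=\theta_{i}\exp(-\theta_{i}x)$ with $R=\theta_{1}/\theta_{2}$; it uses only $\int_{0}^{\infty}f_{i}\,dx=1$ and $\int_{0}^{\infty}x f_{i}\,dx=1/\theta_{i}$.

```latex
\begin{align*}
\log\frac{f_{1}(x)}{f_{2}(x)} &= \log R-(\theta_{1}-\theta_{2})\,x,\\
KL(f_{1}\|f_{2}) &= \int_{0}^{\infty}\bigl(f_{1}(x)-f_{2}(x)\bigr)\bigl[\log R-(\theta_{1}-\theta_{2})\,x\bigr]\,dx\\
 &= -(\theta_{1}-\theta_{2})\left(\frac{1}{\theta_{1}}-\frac{1}{\theta_{2}}\right)
  = \frac{(\theta_{1}-\theta_{2})^{2}}{\theta_{1}\theta_{2}}
  = \frac{(R-1)^{2}}{R},\\
\Lambda &= \frac{1}{1+\frac{(R-1)^{2}}{R}} = \frac{R}{R^{2}-R+1}.
\end{align*}
```

The $\log R$ term drops out because both densities integrate to one, so only the linear term in $x$ contributes.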

Lemma 1

For the OVLs defined earlier,

a) $0\leq OVL\leq 1$ for all $R\geq 0$;

b) $OVL=1$ iff $R=1$;

c) $OVL=0$ iff $R=0$ or $R=\infty$.

All four OVLs possess the properties of reciprocity, invariance, and piecewise monotonicity:

a) $OVL(R)=OVL(1/R)$;

b) the $OVLs$ are monotonically increasing in $R$ for $0\leq R\leq 1$ and decreasing in $R$ for $R>1$.
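The closed forms and the properties above are easy to verify numerically. The sketch below is illustrative only: the function names are hypothetical, Matusita's measure is taken as $2\sqrt{R}/(1+R)$, the KL-based measure as $R/(R^{2}-R+1)$, and the value of $\Delta$ at $R=1$ is set to its limit 1.

```python
import math

def weitzman(r):
    """Delta = 1 - |1 - 1/R| * R**(1/(1-R)); the limiting value at R = 1 is 1."""
    if r == 1.0:
        return 1.0
    return 1.0 - abs(1.0 - 1.0 / r) * r ** (1.0 / (1.0 - r))

def matusita(r):
    """rho = 2*sqrt(R) / (1 + R)."""
    return 2.0 * math.sqrt(r) / (1.0 + r)

def morisita(r):
    """lambda = 4R / (1 + R)^2."""
    return 4.0 * r / (1.0 + r) ** 2

def kl_overlap(r):
    """Lambda = R / (R^2 - R + 1)."""
    return r / (r * r - r + 1.0)

OVLS = {"Delta": weitzman, "rho": matusita, "lambda": morisita, "Lambda": kl_overlap}

def is_reciprocal(ovl, r, tol=1e-9):
    """Check OVL(R) == OVL(1/R) up to floating-point tolerance."""
    return abs(ovl(r) - ovl(1.0 / r)) < tol

def is_increasing_on_unit_interval(ovl, steps=200):
    """Check monotone increase on a grid of R values in (0, 1)."""
    grid = [k / steps for k in range(1, steps)]
    values = [ovl(r) for r in grid]
    return all(a < b for a, b in zip(values, values[1:]))

for r in (0.2, 0.5, 0.8):
    print(r, {name: round(f(r), 3) for name, f in OVLS.items()})
```

Evaluating the four functions at $R=0.2, 0.5, 0.8$ gives the true OVL values against which the simulated estimators are compared.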

3 Bias and Variance of Estimates

As noted earlier, the overlap coefficients are functions of the ratio $R$. In the estimation of ratios, estimators that are convenient and easy to understand are most commonly found to be biased. As noted by Lu et al. [10], the OVLs in this study are no exception. The amount of bias is $B(\widehat{OVL})=\mathbb{E}(\widehat{OVL})-OVL$. To examine the effects of bias, approximate expressions for the mean and the variance of the estimates are obtained.

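Suppose that $(X_{ij};\,j=1,\ldots,n_{i};\,i=1,2)$ denote the observations of two independent random samples drawn from $f_{1}(x)$ and $f_{2}(x)$ respectively, where

$f_{1}(x)=\frac{1}{\theta_{1}}\exp\left\{-\frac{x}{\theta_{1}}\right\},\quad x>0,$

and

$f_{2}(x)=\frac{1}{\theta_{2}}\exp\left\{-\frac{x}{\theta_{2}}\right\},\quad x>0,$

so that $\theta_{i}$ is here the mean of population $i$.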

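The maximum likelihood estimators (MLEs) based on the two samples are given by:

1) From the first sample:

$\widehat{\theta}_{1}=\overline{X}_{1}=\frac{\sum_{i=1}^{n_{1}}X_{1i}}{n_{1}}.$

2) From the second sample:

$\widehat{\theta}_{2}=\overline{X}_{2}=\frac{\sum_{i=1}^{n_{2}}X_{2i}}{n_{2}}.$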

It is easy to show that

$\widehat{\theta}_{1}\sim G\left(n_{1},\frac{\theta_{1}}{n_{1}}\right),\quad\widehat{\theta}_{2}\sim G\left(n_{2},\frac{\theta_{2}}{n_{2}}\right),$

where $G(.,.)$ stands for the gamma distribution. Hence, the variances of these MLEs are respectively $Var(\widehat{\theta}_{1})=\frac{\theta_{1}^{2}}{n_{1}}$ and $Var(\widehat{\theta}_{2})=\frac{\theta_{2}^{2}}{n_{2}}$. An estimate of $R$ is then $\widehat{R}=\frac{\widehat{\theta}_{1}}{\widehat{\theta}_{2}}$.

Therefore, using the relationship between the gamma and chi-square distributions and the fact that the two samples are independent, it is easy to show that $\frac{\theta_{2}}{\theta_{1}}\widehat{R}$ has an $F$-distribution, namely $F(2n_{1},2n_{2})$. Hence, the variance of $\widehat{R}$ is $Var(\widehat{R})=R^{2}\frac{n_{2}^{2}(n_{1}+n_{2}-1)}{n_{1}(n_{2}-1)^{2}(n_{2}-2)}$. Also, an unbiased estimate of $R$ is given by $\widehat{R}^{*}=\frac{\widehat{\theta}_{1}}{\widehat{\theta}_{2}}\frac{(n_{2}-1)}{n_{2}}=\frac{(n_{2}-1)}{n_{2}}\widehat{R}$ with

$Var(\widehat{R}^{*})=R^{2}\frac{(n_{1}+n_{2}-1)}{n_{1}(n_{2}-2)}.$ (2)

Clearly, $\widehat{R}^{*}$ has less variance than $\widehat{R}$.

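The OVL estimates are then

$\widehat{\Delta}=1-\left|1-\frac{1}{\widehat{R}^{*}}\right|(\widehat{R}^{*})^{\frac{1}{1-\widehat{R}^{*}}},\quad\widehat{\rho}=\frac{2\sqrt{\widehat{R}^{*}}}{1+\widehat{R}^{*}},\quad\widehat{\lambda}=\frac{4\widehat{R}^{*}}{(1+\widehat{R}^{*})^{2}}$

and

$\widehat{\Lambda}=\frac{\widehat{R}^{*}}{(\widehat{R}^{*})^{2}-\widehat{R}^{*}+1}.$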
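The estimation steps of this section can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the samples are plain Python lists of positive numbers, and the OVL estimates are obtained by plugging the unbiased ratio estimate $\widehat{R}^{*}=\frac{n_{2}-1}{n_{2}}\,\overline{X}_{1}/\overline{X}_{2}$ into the closed forms $\Delta(R)$, $\rho(R)=2\sqrt{R}/(1+R)$, $\lambda(R)=4R/(1+R)^{2}$ and $\Lambda(R)=R/(R^{2}-R+1)$.

```python
import math

def ratio_estimates(x1, x2):
    """Return (R_hat, R_hat_star) from two samples of positive observations."""
    n2 = len(x2)
    theta1_hat = sum(x1) / len(x1)  # MLE of theta_1: the sample mean
    theta2_hat = sum(x2) / n2       # MLE of theta_2: the sample mean
    r_hat = theta1_hat / theta2_hat
    r_star = (n2 - 1) / n2 * r_hat  # bias-corrected ratio estimate
    return r_hat, r_star

def plug_in_ovls(r):
    """Evaluate the four closed-form OVLs at a ratio value r > 0."""
    if r == 1.0:
        delta = 1.0  # limiting value of Delta at R = 1
    else:
        delta = 1.0 - abs(1.0 - 1.0 / r) * r ** (1.0 / (1.0 - r))
    return {
        "Delta": delta,
        "rho": 2.0 * math.sqrt(r) / (1.0 + r),
        "lambda": 4.0 * r / (1.0 + r) ** 2,
        "Lambda": r / (r * r - r + 1.0),
    }

# Tiny made-up samples, chosen only so the arithmetic is easy to follow.
sample1 = [1.0, 2.0, 3.0]
sample2 = [2.0, 4.0, 6.0]
r_hat, r_star = ratio_estimates(sample1, sample2)
estimates = plug_in_ovls(r_star)
print(r_hat, r_star, estimates)
```

With these toy samples the sample means are 2 and 4, so $\widehat{R}=1/2$ and the correction factor $2/3$ gives $\widehat{R}^{*}=1/3$.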

Theorem 1

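Suppose $\widehat{\Delta}$, $\widehat{\rho}$, $\widehat{\lambda}$ and $\widehat{\Lambda}$ are the estimates of $\Delta$, $\rho$, $\lambda$ and $\Lambda$ respectively, obtained by replacing $R$ by $\widehat{R}^{*}$. The approximate sampling variances of the $OVL$ measures can be obtained as follows:

$Var(\widehat{\Delta})=\frac{(n_{1}+n_{2}-1)\,R^{\frac{2}{1-R}}(\log R)^{2}}{n_{1}(n_{2}-2)(1-R)^{2}},$

$Var(\widehat{\rho})=\frac{R(1-R)^{2}(n_{1}+n_{2}-1)}{n_{1}(n_{2}-2)(1+R)^{4}},$

$Var(\widehat{\lambda})=\frac{16R^{2}(1-R)^{2}(n_{1}+n_{2}-1)}{n_{1}(n_{2}-2)(1+R)^{6}},$

$Var(\widehat{\Lambda})=\frac{(n_{1}+n_{2}-1)}{n_{1}(n_{2}-2)}\,\frac{R^{2}(1-R^{2})^{2}}{(R^{2}-R+1)^{4}}.$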

Proof 1

Since each $OVL$ is a function of $R$, the expressions are obtained using the first-order Taylor series expansion about $R$, that is $Var(g(\widehat{R}^{*}))\approx[g^{\prime}(R)]^{2}\,Var(\widehat{R}^{*})$, together with the $Var(\widehat{R}^{*})$ given in equation (2).
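The first-order Taylor (delta method) argument can be checked numerically by comparing the stated variance formulas with $[g^{\prime}(R)]^{2}\,Var(\widehat{R}^{*})$, where the derivative is approximated by a central finite difference. This is a sketch under assumptions: the function names, the step size `h`, and the example values $R=0.5$, $n_{1}=n_{2}=50$ are illustrative choices, and the closed forms used for the OVLs are $\rho=2\sqrt{R}/(1+R)$, $\lambda=4R/(1+R)^{2}$, $\Lambda=R/(R^{2}-R+1)$ and $\Delta=1-|1-1/R|\,R^{1/(1-R)}$.

```python
import math

def var_r_star(r, n1, n2):
    """Var(R_hat_star) = R^2 (n1 + n2 - 1) / (n1 (n2 - 2)); requires n2 > 2."""
    return r * r * (n1 + n2 - 1) / (n1 * (n2 - 2))

# Closed forms of the four OVLs as functions of R (R != 1 for Delta).
OVLS = {
    "Delta": lambda r: 1.0 - abs(1.0 - 1.0 / r) * r ** (1.0 / (1.0 - r)),
    "rho": lambda r: 2.0 * math.sqrt(r) / (1.0 + r),
    "lambda": lambda r: 4.0 * r / (1.0 + r) ** 2,
    "Lambda": lambda r: r / (r * r - r + 1.0),
}

def stated_variances(r, n1, n2):
    """Approximate variances in closed form (R != 1)."""
    c = (n1 + n2 - 1) / (n1 * (n2 - 2))
    return {
        "Delta": c * r ** (2.0 / (1.0 - r)) * math.log(r) ** 2 / (1.0 - r) ** 2,
        "rho": c * r * (1.0 - r) ** 2 / (1.0 + r) ** 4,
        "lambda": c * 16.0 * r * r * (1.0 - r) ** 2 / (1.0 + r) ** 6,
        "Lambda": c * r * r * (1.0 - r * r) ** 2 / (r * r - r + 1.0) ** 4,
    }

def delta_method_variance(g, r, n1, n2, h=1e-6):
    """First-order Taylor approximation: g'(R)^2 * Var(R_hat_star)."""
    slope = (g(r + h) - g(r - h)) / (2.0 * h)  # central finite difference
    return slope ** 2 * var_r_star(r, n1, n2)

closed = stated_variances(0.5, 50, 50)
numeric = {name: delta_method_variance(g, 0.5, 50, 50) for name, g in OVLS.items()}
for name in OVLS:
    print(name, closed[name], numeric[name])
```

In practice $R$ is unknown, so the same functions would be evaluated at $\widehat{R}^{*}$ to obtain estimated variances.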

Theorem 2

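The approximate sampling bias of the $OVL$ measures can be obtained as follows:

$Bias(\widehat{\Delta})=\begin{cases}\frac{(n_{1}+n_{2}-1)R^{2}}{n_{1}(n_{2}-2)}\,\frac{R^{\frac{2R-1}{1-R}}\left[R(2R-\log(R)-2)\log(R)-(R-1)^{2}\right]}{(R-1)^{3}} & \text{if } R>1,\\[1ex] \frac{(n_{1}+n_{2}-1)R^{2}}{n_{1}(n_{2}-2)}\,\frac{R^{\frac{2R-1}{1-R}}\left[R(2R-\log(R)-2)\log(R)-(R-1)^{2}\right]}{(1-R)^{3}} & \text{if } R<1,\end{cases}$

$Bias(\widehat{\rho})=\frac{(n_{1}+n_{2}-1)R}{n_{1}(n_{2}-2)}\,\frac{3R(R-2)-1}{2(R+1)^{3}},$

$Bias(\widehat{\lambda})=\frac{n_{1}+n_{2}-1}{n_{1}(n_{2}-2)}\,\frac{8R^{2}(R-2)}{(R+1)^{4}},$

$Bias(\widehat{\Lambda})=-\frac{n_{1}+n_{2}-1}{n_{1}(n_{2}-2)}\,\frac{R^{2}(2R^{3}-6R+2)}{(R^{2}-R+1)^{3}}.$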

Proof 2

Using the second order Taylor series expansion the desired results are obtained.

Remark 1

Reasonable estimates of the above variances and biases can be obtained by replacing $R$ with its consistent estimator $\widehat{R}^{*}$ in the above formulas.

4 Confidence Interval Estimation of Overlap

From Section 3, $\frac{\widehat{R}}{R}\sim F(2n_{1},2n_{2})$, and hence $\frac{\theta_{2}n_{2}}{\theta_{1}(n_{2}-1)}\widehat{R}^{*}\sim F(2n_{1},2n_{2})$. Let $L$ and $U$ be the lower and upper confidence limits of $R$ corresponding to the probability $1-\alpha$, i.e., $\mathbb{P}(L<R<U)=1-\alpha$. Then $L$ and $U$ can be determined by solving for $R$ the equation

$\mathbb{P}\left(F^{\alpha/2}_{(2n_{1},2n_{2})}<\frac{\theta_{2}}{\theta_{1}}\widehat{R}<F^{1-\alpha/2}_{(2n_{1},2n_{2})}\right)=1-\alpha,$

where $F^{\alpha/2}_{(2n_{1},2n_{2})}$ and $F^{1-\alpha/2}_{(2n_{1},2n_{2})}$ are the lower and upper $\alpha/2$ quantiles of the $F(2n_{1},2n_{2})$ distribution respectively. Thus

$L=\frac{\widehat{R}}{F^{1-\alpha/2}_{(2n_{1},2n_{2})}}\quad\text{and}\quad U=\frac{\widehat{R}}{F^{\alpha/2}_{(2n_{1},2n_{2})}}.$

The lower $(L')$ and upper $(U')$ limits of the OVLs can be obtained using the appropriate transformation, so that $1-\alpha=Pr(L'<OVL(R)<U')$. Here $L'=OVL(L)$ and $U'=OVL(U)$. The confidence limits for the OVLs are as follows:

OVL	Lower limit $L'$	Upper limit $U'$
$\Delta$	$1 - L^{\frac{1}{1-L}}\left|1 - \frac{1}{L}\right|$	$1 - U^{\frac{1}{1-U}}\left|1 - \frac{1}{U}\right|$
$\rho$	$\frac{2\sqrt{L}}{L+1}$	$\frac{2\sqrt{U}}{U+1}$
$\lambda$	$\frac{4L}{(L+1)^{2}}$	$\frac{4U}{(U+1)^{2}}$
$\Lambda$	$\frac{L}{L^{2}-L+1}$	$\frac{U}{U^{2}-U+1}$

If $(L,U)\subset(1,\infty)$, then $L'$ and $U'$ interchange their roles and the confidence interval for the OVL becomes $(U',L')$. If 1 is enclosed in the interval $(L,U)$, the OVL attains its maximum value within the interval, so the upper limit is $OVL=1$.
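The mapping from an interval for $R$ to an interval for an OVL can be sketched as follows. Several things here are assumptions rather than part of the paper: the function names are hypothetical, the two $F$ quantiles are passed in by the caller (they could come from statistical tables or from a routine such as `scipy.stats.f.ppf`), the quantile values 0.8 and 1.25 in the example are placeholders and not real $F$ quantiles, and when the interval for $R$ contains 1 the lower limit is taken as the smaller of $OVL(L)$ and $OVL(U)$, which follows from the piecewise monotonicity of the OVLs.

```python
def ratio_interval(r_hat, f_low, f_high):
    """Return (L, U) = (R_hat / F_{1-alpha/2}, R_hat / F_{alpha/2})."""
    if not 0.0 < f_low < f_high:
        raise ValueError("expected 0 < f_low < f_high")
    return r_hat / f_high, r_hat / f_low

def ovl_interval(ovl, low, high):
    """Map an interval (low, high) for R to an interval for OVL(R).

    Assumes ovl increases on (0, 1], decreases on [1, inf) and equals 1 at R = 1.
    """
    at_low, at_high = ovl(low), ovl(high)
    if high <= 1.0:   # whole interval below 1: OVL is increasing there
        return at_low, at_high
    if low >= 1.0:    # whole interval above 1: OVL is decreasing, so swap the limits
        return at_high, at_low
    # The interval contains R = 1, where the OVL reaches its maximum value 1.
    return min(at_low, at_high), 1.0

def morisita(r):
    """lambda = 4R / (1 + R)^2, used here as the example OVL."""
    return 4.0 * r / (1.0 + r) ** 2

# Placeholder quantiles for illustration only (not taken from an F table).
F_LOW, F_HIGH = 0.8, 1.25
low, high = ratio_interval(0.5, F_LOW, F_HIGH)
print((low, high), ovl_interval(morisita, low, high))
```

Passing the quantiles in as arguments keeps the sketch free of third-party dependencies; the same `ovl_interval` function works for any of the four measures.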

5 Simulation Study

A Monte Carlo study was conducted to evaluate the performance of the approximations to the bias and variance of the four overlap coefficients. From each population, $1000$ samples of $20$, $50$, $100$, $200$ and $500$ observations were generated. $\widehat{\rho}$, $\widehat{\lambda}$, $\widehat{\Delta}$ and $\widehat{\Lambda}$ were computed for each pair of samples. The bias and variance of the estimates were computed using the actual OVLs and the estimates. The bias and MSE for $R=0.2,0.5,0.8$ are reported in Table 1.
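A Monte Carlo experiment of this kind can be sketched with the standard library alone. The code below is an illustration, not the authors' simulation: the function names, the seed, the choice $\theta_{2}=1$ with $\theta_{1}=R$ (population means), equal sample sizes $n_{1}=n_{2}=n$, and the restriction to two of the four measures are all assumptions made to keep the example short. Each replicate computes $\widehat{R}^{*}=\frac{n-1}{n}\,\overline{X}_{1}/\overline{X}_{2}$ and plugs it into the closed form of the OVL.

```python
import math
import random

def matusita(r):
    """rho = 2*sqrt(R) / (1 + R)."""
    return 2.0 * math.sqrt(r) / (1.0 + r)

def kl_overlap(r):
    """Lambda = R / (R^2 - R + 1)."""
    return r / (r * r - r + 1.0)

def simulate(ovl, r, n, reps=1000, seed=0):
    """Estimate the bias and MSE of ovl(R_hat_star) from `reps` pairs of samples."""
    rng = random.Random(seed)
    theta1, theta2 = r, 1.0  # population means, so that R = theta1 / theta2
    truth = ovl(r)
    errors = []
    for _ in range(reps):
        # random.expovariate takes the rate, i.e. the reciprocal of the mean.
        mean1 = sum(rng.expovariate(1.0 / theta1) for _ in range(n)) / n
        mean2 = sum(rng.expovariate(1.0 / theta2) for _ in range(n)) / n
        r_star = (n - 1) / n * mean1 / mean2
        errors.append(ovl(r_star) - truth)
    bias = sum(errors) / reps
    mse = sum(e * e for e in errors) / reps
    return bias, mse

for n in (20, 50, 100):
    for name, ovl in (("rho", matusita), ("Lambda", kl_overlap)):
        bias, mse = simulate(ovl, 0.5, n)
        print(n, name, round(bias, 4), round(mse, 4))
```

Because the draws depend on the seed and on the random number generator, the printed numbers will not reproduce the reported table exactly; only the qualitative behaviour (bias and MSE shrinking as $n$ grows) should be expected to match.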

The following conclusions are drawn from these computations, in which only values of $R<1$ are considered. Since, for the overlap measures, the case $R<1$ is symmetric to the case $R>1$, the comments given below in terms of $R$ can also be interpreted in terms of $1/R$.

For sample sizes larger than 50, the bias is fairly close to zero. Weitzman’s measure has less bias than the others, whereas Morisita’s measure has the largest bias.

As expected, the bias decreases as the sample size increases, and the MSE goes to zero for each OVL. $\widehat{\Lambda}$ tends to be more biased, and its sampling distribution shows larger variability.

The actual OVLs are generally underestimated (Figure $3$), although for very small values of $R$ and small sample sizes they are observed to be overestimated. The bias approaches $0$ very fast: for $n\geq 50$, the amount of bias is negligible. Although $\widehat{\Lambda}$ has less bias than the others when $R=0.2$, it has the largest bias for $R=0.8$; the bias of $\widehat{\Delta}$ approaches $0$ faster than that of the other three, and the bias of $\widehat{\lambda}$ is the slowest in approaching $0$.

A marked increase in standard deviations for small values of $R$ is observed for $\rho$ and $\lambda$. For $\Delta$, the standard deviation increases as $R$ approaches $1$. For $\Lambda$, a marked increase in standard deviation is observed for moderate values of $R$ (Figure $4$). The standard deviations decrease fast as $n$ increases, and from $n=100$ onwards they are negligible. The difference between the $MSE$ of $\rho$ and that of $\Lambda$ is almost nil for small values of $R$, but it increases as $R$ becomes large, with $\rho$ giving the lowest $MSE$ values and $\Lambda$ the highest.

The estimates of the MSE are plotted in Figure 5 for all four overlap coefficients. As the sample size increases, the MSE reduces considerably.

6 Conclusion

The problem of estimation of four commonly used measures of overlap for two exponential densities with heterogeneous variances is considered and relations between them are studied. Overlap coefficients are used frequently to describe the degree of interspecific encounter or crowdedness of two species in their resource utilization.

Relations between three commonly used measures of overlap with our measure of overlap are studied and approximate expressions for the bias and the variance of the estimates are presented. The invariance property and a method of statistical inference of these coefficients also are presented. Monte Carlo evaluations are used to study the bias and precision of the proposed overlap measures.

References

[1] Al-Saidy, O., Samawi, H. M., and Al-Saleh, M. F. (2005). Inference on overlap coefficients under the Weibull distribution: Equal Shape Parameter. ESAIM: PS, 9, 206–219.
[2] Al-Saleh, M. F. O., and Samawi, H. (2007). Inference on Overlapping Coefficients in Two Exponential Populations. Journal of Modern Applied Statistical Methods, Vol. 6, No. 2, 503–516.
[3] Beran, R. (1977). Minimum Hellinger distance estimates for parametric models, Ann. Statist. 5, 455–463.
[4] Bradley, E. L., and Piantadosi, S. (1982). The overlapping coefficient as a measure of agreement between distributions. Technical Report, Department of Biostatistics and Biomathematics, University of Alabama at Birmingham, Birmingham, AL.
[5] Clemons. T. E. (1996). The overlapping coefficient for two normal probability functions with unequal variances. Unpublished Thesis, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL.
[6] Clemons, T. E., and Bradley Jr. (2000). A nonparametric measure of the overlapping coefficient. Comp. Statist. And Data Analysis, 34, 51–61.
[7] Dixon, P.M., The Bootstrap and the Jackknife: describing the precision of ecological Indices, in Design and Analysis of Ecological Experiments, S.M. Scheiner and J. Gurevitch Eds. Chapman and Hall, New York (1993) 209–318.
[8] Inman, H. F. , and Bradley, E. L. (1989). The Overlapping coefficient as a measure of agreement between probability distributions and point estimation of the overlap of two normal densities. Comm. Statist. Theory and Methods, 18, 3851-3874.
[9] Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics 22, 79–86.
[10] Lu, R., Smith, E. P., and Good, I. J. (1989). Multivariate measures of similarity and niche overlap, Theoretical Population Biology, 35, 1-21.
[11] Matusita, K. (1955). Decision rules based on distance, for problems of fit, two samples and applications, Annals of Inst. of Math. Statist, 19, 181-192.
[12] Mishra, S. N., Shah, A. K., and Lefante, J. J. (1986). Overlapping coefficient: the generalized t approach. Commun. Statist.-Theory and Methods, 15, 123-128.
[13] Morisita, M. (1959). Measuring interspecific association and similarity between communities. Memoirs of the faculty of Kyushu University. Series E, Biology, 3, 65-8.
[14] Mulekar, M. S., and Mishra, S. N. (1994). Overlap Coefficient of two normal densities: equal means case. J. Japan Statist. Soc., 24, 169-

[15] Mulekar, M. S., and Mishra, S. N. (2000). Confidence interval estimation of overlap: equal means case. Comp. Statist .and Data Analysis, 34, 121-137.
[16] Ning, W., Gao, Y. and Dudewicz, E. (2008). Fitting Mixture Distributions Using Generalized Lambda Distributions and Comparison with Normal Mixtures. AMERICAN JOURNAL OF MATHEMATICAL AND MANAGEMENT SCIENCES, 28, 81–99.
[17] Rao, C. R. (1963). Criteria of estimation in large samples, Sankhya, Series A, 25, 189-206
[18] Reiser, B. and Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: the normal equal variance case. The statistician, 48, Part 3, 413-418.
[19] Al-Saidy, O., Samawi, H. M. (2008). Inference on Overlapping Coefficients in Two Exponential Populations Using Ranked Set Sampling. Communications of the Korean Statistical Society, Vol. 15, No. 2, 147–159.
[20] Smith, E. P. (1982). Niche breadth, resource availability, and inference. Ecology, 63, 1675-1681
[21] Weitzman, M. S. (1970). Measures of overlap of income distributions of white and Negro families in the United States. Technical paper No. 22, Department of Commerce, Bureau of Census, Washington, D. C.
