Overlap Coefficients Based on Kullback-Leibler Divergence: Exponential Populations Case
Hamza Dhaker, Papa Ngom, Malick Mbodj

TL;DR
This paper introduces a new overlap coefficient based on Kullback-Leibler divergence for exponential populations, compares it with existing measures, and discusses statistical inference methods including confidence intervals and estimator properties.
Contribution
A novel overlap measure $\Lambda$ based on Kullback-Leibler divergence is proposed for exponential populations, along with inference techniques and property analyses.
Findings
The new measure $\Lambda$ is invariant and effective.
Confidence intervals for overlap measures are constructed using Taylor series.
Simulation studies evaluate bias and mean square error of estimators.
Abstract
This article is devoted to the study of overlap measures of the densities of two exponential populations. Three existing overlapping coefficients are considered, namely Matusita's measure, Morisita's measure and Weitzman's measure, and a new overlap measure based on the Kullback-Leibler divergence is proposed. The invariance property and a method of statistical inference for these coefficients are also presented. Taylor series approximations are used to construct confidence intervals for the overlap measures. The bias and mean square error of the estimators are studied through a simulation study.
| OVL | lower limit | upper limit | ||
|---|---|---|---|---|
| c=0.2 | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | -0.029 | 0.007 | -0.36 | -0.030 | 0.016 | -0.25 | -0.0180 | 0.008 | -0.061 | 0.0060 | 0.0080 | 0.067 |
| 50 | -0.011 | 0.003 | -0.22 | -0.012 | 0.006 | -0.15 | -0.0070 | 0.007 | -0.030 | 0.0020 | 0.0030 | 0.041 |
| 100 | -0.055 | 0.001 | -0.15 | -0.056 | 0.003 | -0.11 | -0.0034 | 0.0015 | -0.017 | 0.0011 | 0.0015 | 0.029 |
| 200 | -0.003 | 0.000∗ | -0.11 | -0.003 | 0.001 | -0.07 | -0.0020 | 0.027 | 0.010 | 0.000∗ | 0.000∗ | 0.020 |
| 500 | -0.001 | 0.000∗ | -0.07 | -0.001 | 0.000∗ | -0.05 | 0.000∗ | 0.000∗ | -0.039 | 0.000∗ | 0.000∗ | 0.013 |
| c=0.5 | ||||||||||||
| 20 | -0.036 | 0.0040 | -0.71 | -0.640 | 0.0140 | -0.66 | -0.031 | 0.014 | -0.092 | 0.048 | 0.0500 | 0.22 |
| 50 | -0.014 | 0.0010 | -0.44 | -0.024 | 0.0040 | -0.41 | -0.012 | 0.005 | -0.045 | 0.018 | 0.0190 | 0.013 |
| 100 | -0.007 | 0.000∗ | -0.31 | -0.012 | 0.0020 | -0.28 | -0.006 | 0.0024 | -0.026 | 0.009 | 0.0090 | 0.095 |
| 200 | -0.003 | 0.000∗ | -0.27 | -0.006 | 0.000∗ | -0.20 | -0.003 | 0.001 | -0.015 | 0.004 | 0.0045 | 0.067 |
| 500 | -0.001 | 0.000∗ | -0.13 | -0.002 | 0.000∗ | -0.13 | -0.001 | 0.000∗ | -0.05 | -0.0018 | 0.0018 | -0.042 |
| c=0.8 | ||||||||||||
| 20 | -0.032 | 0.001 | -0.87 | -0.063 | 0.005 | -0.87 | -0.037 | 0.016 | -0.3 | -0.20 | 0.061 | -0.84 |
| 50 | -0.012 | 0.000∗ | -0.74 | -0.024 | 0.0011 | -0.73 | -0.014 | 0.006 | -0.19 | -0.079 | 0.013 | -0.69 |
| 100 | -0.006 | 0.000∗ | -0.61 | -0.012 | 0.000∗ | -0.6 | -0.007 | 0.0027 | -0.133 | -0.039 | 0.005 | -0.56 |
| 200 | -0.003 | 0.000∗ | -0.47 | -0.006 | 0.000∗ | -0.47 | -0.003 | 0.001 | -0.09 | -0.019 | 0.002 | -0.43 |
| 500 | -0.001 | 0.000∗ | -0.32 | -0.002 | 0.000∗ | -0.32 | -0.001 | 0.000∗ | -0.06 | -0.008 | 0.000∗ | -0.28 |
Hamza Dhaker
Papa Ngom
Malick Mbodj
LMDAN,Université Cheikh Anta Diop, Dakar, Senegal
LMA,Université Cheikh Anta Diop, Dakar, Senegal
Bowie State University, Maryland, USA
keywords:
Kullback-Leibler divergence; Matusita’s measure; Morisita’s measure; Weitzman’s measure; overlap coefficients; Taylor expansion.
1 Introduction
The similarity between two densities can be regarded as the commonality shared by the two populations. It is generally measured on a scale from 0 to 1: values close to 0 correspond to distributions whose supports do not intersect, and a value of 1 corresponds to a perfect match of the two distributions. Scientists from different disciplines have proposed different measures of similarity serving different purposes.
Using the delta method, Smith [20] derived approximate formulas for estimating the mean and the variance of the discrete version of Weitzman's measure (Weitzman [21]), also known as the overlap coefficient. Mishra et al. [12] gave the small and large sample properties of the sampling distributions for a function of this overlap measure estimator, under the assumption of homogeneity of variances for the case of two normal distributions. Mulekar and Mishra [14] simulated the sampling distribution of estimators of the overlap measures when the two densities correspond to the normal case with equal means, and obtained approximate expressions for the bias and variance of their estimators. More recently, several authors, including Bradley and Piantadosi [4], Inman and Bradley [8], Clemons [5], Reiser and Faraggi [18], Clemons and Bradley [6], Mulekar and Mishra [15], Al-Saidy et al. [1], Al-Saleh and Samawi [2], and Samawi and Al-Saleh [19], have considered this measure.
Dixon [7] described the use of bootstrap and jackknife techniques for the Gini coefficient of size hierarchy, a commonly used measure of similarity between income distributions of two ethnic, gender, or geographical groups, and the Jaccard index of community similarity. Al-Saidy et al. [1] considered the problem of drawing inference about the three overlap measures under the Weibull distribution with equal shape parameter. Ning et al. [16] compared mixtures of generalized lambda distributions (GLDs) with normal mixtures by using the Kullback-Leibler distance and the overlapping coefficient.
The main objective of this paper is to propose a new overlap coefficient based on the Kullback-Leibler divergence [9] for two exponential distributions; that is, starting from a measure of divergence or dissimilarity, we construct a measure of similarity, denoted $\Lambda$ and defined in (1). We also provide its maximum likelihood estimator.
The coefficients and their properties are given in Section 2. Expressions for the approximate bias and variance of the estimates are given in Section 3. A method for making statistical inferences about the coefficients is discussed in Section 4. The results of a simulation study are described in Section 5, along with an example demonstrating the usefulness of the coefficients. Finally, conclusions and perspectives are presented in Section 6.
2 Overlap Coefficients
We consider four different similarity measures (overlap coefficients, OVLs): Matusita's measure, Morisita's measure, Weitzman's measure and the measure based on the Kullback-Leibler divergence. An overlap measure (OVL) is defined as the area of intersection of the graphs of two probability density functions. It measures the similarity, that is, the agreement or closeness, of the two probability distributions.
Consider two distribution functions with corresponding density functions with respect to the Lebesgue measure. The four measures used to describe the closeness between the two distributions are described below.
- 1.
Weitzman's measure [21]: The overlapping coefficient is the area lying under both density functions simultaneously, defined as
[TABLE]
- 2.
Matusita's measure [11]: The second measure studied here is known as Matusita's measure, which is defined as
[TABLE]
This measure is based on the distance between two functions (Matusita [11]). Matusita actually developed a discrete version of it, which is also known as the Freeman-Tukey measure (FT). This measure is related to the Hellinger distance (Rao [17] and Beran [3]).
- 3.
Morisita's measure [13]: Morisita proposed an index of similarity between communities. Consider an ecological study involving two populations, from each of which a random sample is taken. The index is defined as
[TABLE]
- 4.
Kullback-Leibler measure [9]: The Kullback-Leibler divergence was originally introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions. It is discussed in Kullback's historic text, Information Theory and Statistics. The overlap coefficient is the complement of the Kullback-Leibler divergence,
[TABLE]
with
2.1 Overlap measures (OVL) for Exponential Distribution
The simplest and most commonly used distribution in survival and reliability analysis is the one-parameter exponential distribution. Suppose the two densities are those of exponential populations with respective hazard rates, that is
[TABLE]
The overlapping coefficients are shown graphically in Figure 2.
In terms of the ratio of the hazard rates, these measures can be shown to be the following functions of that ratio:
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
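As a sanity check on closed-form expressions of this kind, the sketch below evaluates the three classical coefficients for two exponential densities in two ways: by numerical integration of their textbook definitions (area under the pointwise minimum for Weitzman, integral of the square root of the product for Matusita, and twice the integral of the product over the sum of the integrated squares for Morisita), and through the closed forms in the hazard-rate ratio that follow from those definitions. The function names, the symbol `r` for the ratio, and the simple trapezoidal rule are illustrative assumptions, not the paper's notation. The Kullback-Leibler based coefficient is left out because its defining formula is not reproduced in the text above.

```python
import math

def exp_pdf(rate, x):
    """Density of an exponential distribution with the given hazard rate."""
    return rate * math.exp(-rate * x)

def trapezoid(func, upper, steps=50_000):
    """Trapezoidal rule on [0, upper]; the tail beyond `upper` is ignored."""
    h = upper / steps
    total = 0.5 * (func(0.0) + func(upper))
    for i in range(1, steps):
        total += func(i * h)
    return total * h

def overlaps_numeric(rate1, rate2, upper=50.0):
    """Weitzman, Matusita and Morisita coefficients from their definitions."""
    f1 = lambda x: exp_pdf(rate1, x)
    f2 = lambda x: exp_pdf(rate2, x)
    weitzman = trapezoid(lambda x: min(f1(x), f2(x)), upper)
    matusita = trapezoid(lambda x: math.sqrt(f1(x) * f2(x)), upper)
    cross = trapezoid(lambda x: f1(x) * f2(x), upper)
    sq1 = trapezoid(lambda x: f1(x) ** 2, upper)
    sq2 = trapezoid(lambda x: f2(x) ** 2, upper)
    morisita = 2.0 * cross / (sq1 + sq2)
    return weitzman, matusita, morisita

def overlaps_closed_form(r):
    """The same three coefficients as functions of the hazard-rate ratio r."""
    matusita = 2.0 * math.sqrt(r) / (1.0 + r)
    morisita = 4.0 * r / (1.0 + r) ** 2
    if r == 1.0:
        weitzman = 1.0  # identical densities overlap completely
    else:
        weitzman = 1.0 - r ** (1.0 / (1.0 - r)) * abs(1.0 - 1.0 / r)
    return weitzman, matusita, morisita

if __name__ == "__main__":
    rate1, rate2 = 1.0, 2.0
    print("numeric    :", overlaps_numeric(rate1, rate2))
    print("closed form:", overlaps_closed_form(rate1 / rate2))
```

The two rows should agree to several decimal places; a disagreement would point to a mistake in either the closed form or the integration range.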
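Lemma 1
For the OVLs defined earlier,
- a)
$0 \le \mathrm{OVL} \le 1$ for all values of the ratio;
- b)
$\mathrm{OVL} = 1$ iff the ratio equals $1$;
- c)
$\mathrm{OVL} = 0$ iff the ratio equals $0$ or $\infty$.
All four OVLs possess the properties of reciprocity, invariance, and piecewise monotonicity:
- a)
each OVL takes the same value at the ratio and at its reciprocal;
- b)
the OVLs are monotonically increasing in the ratio on the interval from $0$ to $1$ and decreasing in the ratio beyond $1$.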
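The reciprocity and piecewise monotonicity properties can be checked numerically for any coefficient that is available as a function of the ratio. The sketch below does this for the standard closed forms of Matusita's and Weitzman's measures in the exponential case. The helper names and the grid are illustrative assumptions, and a finite grid check is only a plausibility test, not a proof.

```python
import math

def matusita(r):
    """Matusita's coefficient for two exponential densities with rate ratio r."""
    return 2.0 * math.sqrt(r) / (1.0 + r)

def weitzman(r):
    """Weitzman's coefficient for two exponential densities with rate ratio r."""
    if r == 1.0:
        return 1.0
    return 1.0 - r ** (1.0 / (1.0 - r)) * abs(1.0 - 1.0 / r)

def has_reciprocity(ovl, ratios, tol=1e-12):
    """True if ovl(r) equals ovl(1/r), up to rounding, for every listed ratio."""
    return all(abs(ovl(r) - ovl(1.0 / r)) <= tol for r in ratios)

def is_piecewise_monotone(ovl, grid):
    """True if ovl increases along the grid up to 1 and decreases beyond it."""
    below = [ovl(r) for r in grid if r <= 1.0]
    above = [ovl(r) for r in grid if r >= 1.0]
    increasing = all(a < b for a, b in zip(below, below[1:]))
    decreasing = all(a > b for a, b in zip(above, above[1:]))
    return increasing and decreasing

if __name__ == "__main__":
    ratios = [0.1, 0.25, 0.5, 0.8, 3.0]
    grid = [k / 20 for k in range(1, 61)]  # 0.05, 0.10, ..., 3.00
    for name, ovl in (("Matusita", matusita), ("Weitzman", weitzman)):
        print(name, has_reciprocity(ovl, ratios), is_piecewise_monotone(ovl, grid))
```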
3 Bias and Variance of Estimates
As noted earlier, the overlap coefficients are functions of the ratio of the hazard rates. In the estimation of ratios, estimators that are convenient and easy to understand are most commonly found to be biased. As noted by Lu et al. [10], the OVLs in this study are no exception. The amount of bias is . To examine the effects of bias, approximate expressions for the mean and the variance of the estimates are obtained.
Suppose that we have independent observations from two independent random samples drawn from the two populations, respectively, where
[TABLE]
and
[TABLE]
The maximum likelihood estimators (MLEs) based on the two samples are given by:
- 1)
From the first sample:
[TABLE]
- 2)
From the second sample:
[TABLE]
Note that it is easy to show that
[TABLE]
where the notation on the right stands for the gamma distribution. Hence, the variances of these MLEs are respectively and . We may then define an estimate of the ratio from the two MLEs.
Therefore, using the relationship between the gamma and chi-square distributions and the fact that the two samples are independent, it is easy to show that the estimate of the ratio, suitably scaled, has an $F$-distribution. Hence, the variance of the estimate is . Also, an unbiased estimate of the ratio is given by with
[TABLE]
Clearly, the unbiased estimate has less variance than the MLE-based estimate.
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
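The estimation steps of this section can be illustrated with a short simulation. The sketch below assumes, for illustration only, that the ratio is the first hazard rate divided by the second, that each rate is estimated by its MLE (sample size over sample sum), and that the plug-in ratio is then distributed as the true ratio times an F variable with `2*n2` and `2*n1` degrees of freedom, whose mean is `n1/(n1-1)`. Under those assumptions the factor `(n1-1)/n1` removes the bias. All names are hypothetical.

```python
import random
import statistics

def rate_mle(sample):
    """MLE of the hazard rate of an exponential sample: n / sum(x)."""
    return len(sample) / sum(sample)

def ratio_estimates(sample1, sample2):
    """Plug-in and bias-corrected estimates of rate1 / rate2 (assumed orientation)."""
    n1 = len(sample1)
    r_mle = rate_mle(sample1) / rate_mle(sample2)
    r_unbiased = (n1 - 1) / n1 * r_mle  # undoes the F-mean factor n1 / (n1 - 1)
    return r_mle, r_unbiased

def simulate_ratio(rate1, rate2, n1, n2, reps, seed=0):
    """Average both ratio estimates over `reps` simulated pairs of samples."""
    rng = random.Random(seed)
    mle_values, unbiased_values = [], []
    for _ in range(reps):
        s1 = [rng.expovariate(rate1) for _ in range(n1)]
        s2 = [rng.expovariate(rate2) for _ in range(n2)]
        r_mle, r_unbiased = ratio_estimates(s1, s2)
        mle_values.append(r_mle)
        unbiased_values.append(r_unbiased)
    return statistics.fmean(mle_values), statistics.fmean(unbiased_values)

if __name__ == "__main__":
    mean_mle, mean_unbiased = simulate_ratio(1.0, 2.0, n1=20, n2=20, reps=20_000)
    print("true ratio 0.5, mean plug-in:", mean_mle, "mean corrected:", mean_unbiased)
```

With 20 observations per sample, the plug-in average should sit near `0.5 * 20 / 19` and the corrected average near `0.5`, up to Monte Carlo noise.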
Theorem 1
Suppose the estimates of the four OVLs are obtained by replacing the ratio with its estimate. Then the approximate sampling variances of the measures can be obtained as follows:
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Proof 1
Since each of the OVLs is a function of the ratio, the expressions are obtained using a first-order Taylor series expansion about the true ratio together with the variance given in equation (2).
Theorem 2
The approximate sampling biases of the measures can be obtained as follows:
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Proof 2
The desired results are obtained using a second-order Taylor series expansion.
Remark 1
Reasonable estimates of the above variances and biases can be obtained by substituting the ratio by its consistent estimator in the above formulas.
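The Taylor series arguments behind the two theorems can be sketched generically. For a smooth coefficient `g` of the ratio, a first-order expansion gives the variance approximation `g'(r)^2 * Var(r_hat)`, and a second-order expansion gives the bias approximation `g'(r) * E[r_hat - r] + 0.5 * g''(r) * E[(r_hat - r)^2]`. The code below assumes the plug-in ratio is distributed as `r` times an F variable with `2*n2` and `2*n1` degrees of freedom, uses finite differences in place of analytic derivatives, and applies the result to the standard closed form of Matusita's measure. These choices and all names are illustrative assumptions and need not match the paper's exact expressions.

```python
import math

def f_variance(d1, d2):
    """Variance of an F(d1, d2) variable; requires d2 > 4."""
    return 2.0 * d2**2 * (d1 + d2 - 2.0) / (d1 * (d2 - 2.0) ** 2 * (d2 - 4.0))

def ratio_moments(r, n1, n2):
    """Bias and variance of the plug-in ratio, assuming r_hat = r * F(2*n2, 2*n1)."""
    bias = r / (n1 - 1.0)  # from E[F(d1, d2)] = d2 / (d2 - 2) with d2 = 2 * n1
    var = r**2 * f_variance(2.0 * n2, 2.0 * n1)
    return bias, var

def derivatives(g, r, h=1e-4):
    """Central finite-difference approximations of g'(r) and g''(r)."""
    first = (g(r + h) - g(r - h)) / (2.0 * h)
    second = (g(r + h) - 2.0 * g(r) + g(r - h)) / h**2
    return first, second

def delta_method(g, r, n1, n2):
    """Approximate bias and variance of g(r_hat) from Taylor expansions about r."""
    bias_r, var_r = ratio_moments(r, n1, n2)
    g1, g2 = derivatives(g, r)
    approx_var = g1**2 * var_r
    approx_bias = g1 * bias_r + 0.5 * g2 * (var_r + bias_r**2)
    return approx_bias, approx_var

def matusita(r):
    """Matusita's coefficient for two exponential densities with rate ratio r."""
    return 2.0 * math.sqrt(r) / (1.0 + r)

if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, delta_method(matusita, 0.5, n, n))
```

In practice the unknown ratio `r` would be replaced by its estimate, as Remark 1 suggests.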
4 Confidence Interval Estimation of Overlap
From Section 3, the estimate of the ratio, suitably scaled, follows an $F$-distribution. Consider the lower and upper confidence limits of the ratio corresponding to a given coverage probability. These limits can be determined by solving the equation
[TABLE]
where the two constants are the lower and the upper quantiles of the $F$-distribution, respectively. Thus
[TABLE]
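The lower and upper limits of the OVLs can be obtained from those of the ratio using the appropriate transformation. The confidence limits for the OVLs are as follows:
If the ratio exceeds 1, then the lower and upper limits interchange their roles in the confidence interval for the OVL. If 1 is enclosed in the confidence interval for the ratio, then the hypothesis that the ratio equals 1 is retained at the corresponding significance level.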
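A minimal sketch of this interval construction follows. It assumes that the plug-in ratio divided by the true ratio follows an F distribution with `2*n2` and `2*n1` degrees of freedom, and it approximates the F quantiles by Monte Carlo using gamma variates from the standard library rather than exact tables. The treatment of an interval that straddles 1 (upper OVL limit set to 1) is an additional assumption made here for illustration, and all function names are hypothetical.

```python
import math
import random

def f_quantiles_mc(d1, d2, alpha, draws=100_000, seed=0):
    """Monte Carlo estimates of the alpha/2 and 1 - alpha/2 quantiles of F(d1, d2)."""
    rng = random.Random(seed)
    # chi-square(d) / d is a gamma variable with shape d/2 and scale 2/d
    values = sorted(
        rng.gammavariate(d1 / 2.0, 2.0 / d1) / rng.gammavariate(d2 / 2.0, 2.0 / d2)
        for _ in range(draws)
    )
    lower = values[int(draws * alpha / 2.0)]
    upper = values[int(draws * (1.0 - alpha / 2.0)) - 1]
    return lower, upper

def ratio_interval(r_hat, n1, n2, alpha=0.05):
    """Confidence limits for the ratio from the pivot r_hat / r ~ F(2*n2, 2*n1)."""
    q_lower, q_upper = f_quantiles_mc(2 * n2, 2 * n1, alpha)
    return r_hat / q_upper, r_hat / q_lower

def ovl_interval(ovl, lower, upper):
    """Map limits for the ratio to limits for an OVL that peaks at ratio 1."""
    a, b = ovl(lower), ovl(upper)
    if upper <= 1.0:  # OVL is increasing on this side of 1
        return a, b
    if lower >= 1.0:  # OVL is decreasing here, so the limits swap roles
        return b, a
    return min(a, b), 1.0  # the interval contains ratio 1, where the OVL equals 1

def matusita(r):
    """Matusita's coefficient for two exponential densities with rate ratio r."""
    return 2.0 * math.sqrt(r) / (1.0 + r)

if __name__ == "__main__":
    lower, upper = ratio_interval(0.5, n1=30, n2=30)
    print("ratio limits:", lower, upper)
    print("Matusita limits:", ovl_interval(matusita, lower, upper))
```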
5 Simulation Study
A Monte Carlo study was conducted to evaluate the performance of the approximations to the bias and variance of the four overlap coefficients. From each population, samples of several sizes were generated, and the four OVL estimates were computed for each pair of samples. The bias and variance of the estimates were computed using the actual OVLs and the estimates. The bias and MSE are reported in Table 1.
The following conclusions are drawn from these computations, where only values of the ratio below 1 are considered. Since, for the overlap measures, the case of a ratio above 1 is symmetric to the case of a ratio below 1, the comments given below can also be interpreted for ratios above 1.
For sample sizes larger than 50, the bias is fairly close to zero. Weitzman’s measure has less bias than others but Morisita’s measure has the largest bias.
The bias decreases as the sample size increases, as expected, and the MSE goes to zero for each OVL. tend to be more biased and the sampling distributions show larger variability.
The actual OVLs are clearly underestimated (Figure ), although for very small values of the ratio and small sample sizes they are observed to be overestimated. The bias approaches 0 very fast. For , the amount of bias is negligible and fairly close to 0. Although has less bias than the others in the case and has the largest bias for , the bias of $\Delta$ approaches 0 faster than that of the other three. The bias of is the slowest in approaching 0.
An important increase in standard deviations for small values of is observed for and . For standard deviation increases as approaches . But a remarkable increase in standard deviations for moderate values of in the case (Figure ). They decrease fast as increases, from the standard deviations are negligible. The difference between the of and is almost nil for small values of , but the difference increases as becomes large with giving lowest values and the highest.
The estimates of MSE are plotted in Figure 5 for all four overlap coefficients. As the sample size increases, the MSE reduces considerably.
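A Monte Carlo experiment of the kind described in this section can be sketched as follows. The example estimates the bias and MSE of the plug-in estimate of Weitzman's measure for equal sample sizes, using the standard closed form of that measure for exponential densities. The rates, replication count, seed and function names are illustrative assumptions, so the output is not meant to reproduce Table 1.

```python
import random

def weitzman(r):
    """Weitzman's coefficient for two exponential densities with rate ratio r."""
    if r == 1.0:
        return 1.0
    return 1.0 - r ** (1.0 / (1.0 - r)) * abs(1.0 - 1.0 / r)

def simulate_bias_mse(ovl, rate1, rate2, n, reps=2000, seed=1):
    """Monte Carlo bias and MSE of ovl(r_hat) with n observations per sample."""
    rng = random.Random(seed)
    true_value = ovl(rate1 / rate2)
    errors = []
    for _ in range(reps):
        s1 = sum(rng.expovariate(rate1) for _ in range(n))
        s2 = sum(rng.expovariate(rate2) for _ in range(n))
        r_hat = s2 / s1  # ratio of the rate MLEs (n / s1) / (n / s2)
        errors.append(ovl(r_hat) - true_value)
    bias = sum(errors) / reps
    mse = sum(e * e for e in errors) / reps
    return bias, mse

if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        bias, mse = simulate_bias_mse(weitzman, 1.0, 2.0, n)
        print(f"n={n:4d}  bias={bias:+.4f}  mse={mse:.5f}")
```

The same loop can be rerun with any other coefficient passed as `ovl` to compare how quickly the bias and MSE shrink with the sample size.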
6 Conclusion
The problem of estimation of four commonly used measures of overlap for two exponential densities with heterogeneous variances is considered and relations between them are studied. Overlap coefficients are used frequently to describe the degree of interspecific encounter or crowdedness of two species in their resource utilization.
Relations between three commonly used measures of overlap with our measure of overlap are studied and approximate expressions for the bias and the variance of the estimates are presented. The invariance property and a method of statistical inference of these coefficients also are presented. Monte Carlo evaluations are used to study the bias and precision of the proposed overlap measures.
References
- [1] Al-Saidy, O., Samawi, H. M., and Al-Saleh, M. F. (2005). Inference on overlap coefficients under the Weibull distribution: Equal shape parameter. ESAIM: PS, 9, 206–219.
- [2] Al-Saleh, M. F. O., and Samawi, H. (2007). Inference on Overlapping Coefficients in Two Exponential Populations. Journal of Modern Applied Statistical Methods, Vol. 6, No. 2, 503–516.
- [3] Beran, R. (1977). Minimum Hellinger distance estimates for parametric models, Ann. Statist. 5, 455–463.
- [4] Bradley, E. L., and Piantadosi, S. (1982). The overlapping coefficient as a measure of agreement between distributions. Technical Report, Department of Biostatistics and Biomathematics, University of Alabama at Birmingham, Birmingham, AL.
- [5] Clemons, T. E. (1996). The overlapping coefficient for two normal probability functions with unequal variances. Unpublished Thesis, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL.
- [6] Clemons, T. E., and Bradley Jr. (2000). A nonparametric measure of the overlapping coefficient. Comp. Statist. And Data Analysis, 34, 51–61.
- [7] Dixon, P.M., The Bootstrap and the Jackknife: describing the precision of ecological Indices, in Design and Analysis of Ecological Experiments, S.M. Scheiner and J. Gurevitch Eds. Chapman and Hall, New York (1993) 209–318.
- [8] Inman, H. F. , and Bradley, E. L. (1989). The Overlapping coefficient as a measure of agreement between probability distributions and point estimation of the overlap of two normal densities. Comm. Statist. Theory and Methods, 18, 3851-3874.
- [9] Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics 22, 79–86.
- [10] Lu, R., Smith, E. P., and Good, I. J. (1989). Multivariate measures of similarity and niche overlap, Theoretical Population Biology, 35, 1-21.
- [11] Matusita, K. (1955). Decision rules based on distance, for problems of fit, two samples and applications, Annals of Inst. of Math. Statist, 19, 181-192.
- [12] Mishra, S. N., Shah, A. K., and Lefante, J. J. (1986). Overlapping coefficient: the generalized t approach. Commun. Statist.-Theory and Methods, 15, 123-128.
- [13] Morisita, M. (1959). Measuring interspecific association and similarity between communities. Memoirs of the faculty of Kyushu University. Series E, Biology, 3, 65-8.
- [14] Mulekar, M. S., and Mishra, S. N. (1994). Overlap Coefficient of two normal densities: equal means case. J. Japan Statist. Soc., 24, 169-
- [15] Mulekar, M. S., and Mishra, S. N. (2000). Confidence interval estimation of overlap: equal means case. Comp. Statist .and Data Analysis, 34, 121-137.
- [16] Ning, W., Gao, Y. and Dudewicz, E. (2008). Fitting Mixture Distributions Using Generalized Lambda Distributions and Comparison with Normal Mixtures. AMERICAN JOURNAL OF MATHEMATICAL AND MANAGEMENT SCIENCES, 28, 81–99.
- [17] Rao, C. R. (1963). Criteria of estimation in large samples, Sankhya, Series A, 25, 189-206
- [18] Reiser, B. and Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: the normal equal variance case. The statistician, 48, Part 3, 413-418.
- [19] Al-Saidy, O., Samawi, H. M. (2008). Inference on Overlapping Coefficients in Two Exponential Populations Using Ranked Set Sampling. Communications of Korean Statistical Society, Vol. 15, No. 2, 147–159.
- [20] Smith, E. P. (1982). Niche breadth, resource availability, and inference. Ecology, 63, 1675-1681
- [21] Weitzman, M. S. (1970). Measures of overlap of income distributions of white and Negro families in the United States. Technical paper No. 22, Department of Commerce, Bureau of Census, Washington, D. C.
