Relative variation indexes for multivariate continuous distributions on $[0,\infty)^k$ and extensions
Célestin C. Kokonendji, Aboubacar Y. Touré, Amadou Sawadogo

TL;DR
This paper introduces new multivariate variation indexes for non-negative distributions, useful for comparing and discriminating between different models based on their deviation from a reference distribution.
Contribution
The paper proposes novel scalar indexes based on quadratic forms of mean and covariance, extending relative variation measures to multivariate continuous distributions on $[0,\infty)^k$.
Findings
Indexes effectively discriminate between positive distributions.
Asymptotic properties of indexes are established.
Numerical examples demonstrate practical applications.
Abstract
We introduce some new indexes to measure the departure of any multivariate continuous distribution on the non-negative orthant from a given reference one, such as the uncorrelated exponential model, similar to the relative Fisher dispersion indexes of multivariate count models. The proposed multivariate variation indexes are scalar quantities, defined as ratios of two quadratic forms of the mean vector and the covariance matrix. They can be used to discriminate between continuous positive distributions. Generalized and multiple marginal variation indexes with and without correlation structure, respectively, and their relative extensions are discussed. The asymptotic behavior and other properties are studied. Illustrative examples and numerical applications are analyzed under several scenarios, leading to appropriate choices of multivariate models. Some concluding remarks and possible…
| 0.1 | 0.3 | 0.5 | 0.8 | 1 | 2 | 4 | 10 | 100 | |
|---|---|---|---|---|---|---|---|---|---|
| 92378 | 15.12 | 3 | 1.29 | 1 | 0.64 | 0.54 | 0.5072 | 0.5001 | |
| 184755 | 29.24 | 5 | 1.59 | 1 | 0.27 | 0.08 | 0.0145 | 0.0002 |
| Dataset | MV | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| No 1 | 50 | 2.58 | 3.80 | 13.36 | 31.08 | 2.02 | 2.15 | O/O | 2.11 | 1.00 | |
| No 2 | 80 | 1.34 | 1.31 | 3.98 | 4.05 | 2.21 | 2.36 | O/O | 2.14 | 0.76 | |
| No 3 | 100 | 1.94 | 1.96 | 3.74 | 3.76 | 0.99 | 0.98 | 0.20 | E/E | 0.99 | 1.98 |
| No 4 | 120 | 5.76 | 5.04 | 34.97 | 29.47 | 1.02 | 1.01 | E/E | 1.00 | 0.34 | |
| No 5 | 150 | 0.27 | 0.26 | 0.02 | 0.04 | 0.24 | 0.51 | 0.56 | U/U | 0.39 | 2.28 |
| No 6 | 300 | 8.77 | 8.55 | 22.87 | 20.82 | 0.30 | 0.28 | 0.03 | U/U | 0.25 | 0.15 |
| No 7 | 500 | 5.60 | 2.10 | 299.03 | 20.32 | 9.53 | 4.63 | O/O | 7.40 | 7.32 | |
| No 8 | 800 | 10.59 | 10.39 | 110.44 | 111.03 | 0.98 | 1.02 | 0.05 | E/E | 0.98 | 1.01 |
| No 9 | 1000 | 2.66 | 1.76 | 2.35 | 0.41 | 0.33 | 0.13 | 0.78 | U/U | 0.17 | 1.02 |
| No 10 | 1000 | 6.10 | 1.74 | 323.00 | 0.52 | 8.68 | 0.17 | 0.49 | O/U | 7.42 | 7.50 |
| No 11 | 1500 | 1.77 | 3.41 | 0.44 | 17.88 | 0.14 | 1.55 | U/O | 0.96 | 0.87 | |
| No 12 | 3000 | 0.75 | 0.62 | 1.62 | 0.11 | 2.86 | 0.28 | O/U | 1.06 | 0.99 | |
| No 13 | 3000 | 1.00 | 1.09 | 1.00 | 3.95 | 1.00 | 3.33 | E/O | 1.19 | 0.82 | |
| No 14 | 5000 | 0.68 | 1.00 | 0.42 | 1.02 | 0.89 | 1.02 | 0.80 | U/E | 0.98 | 2.23 |
| No 15 | 8000 | 1.98 | 3.37 | 3.83 | 17.97 | 0.98 | 1.58 | E/O | 1.33 | 0.99 |
| (MVj) | |||||||
|---|---|---|---|---|---|---|---|
| 1 | 4.1476 | 1.9630 | 0.1141 (U) | 1.0000 | 0.9579 | 0.9905 | 0.3926 |
| 2 | 3.1709 | 0.6049 | 0.0602 (U) | 0.9579 | 1.0000 | 0.9552 | 0.6002 |
| 3 | 2.2610 | 0.6330 | 0.1238 (U) | 0.9905 | 0.9552 | 1.0000 | 0.4331 |
| 4 | 4.5547 | 8.4074 | 0.4053 (U) | 0.3926 | 0.6002 | 0.4331 | 1.0000 |
| MVj | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.2245 | 3.2031 | O | 1.0000 | 0.5572 | 0.2939 | |||
| 2 | 0.4929 | 0.2324 | E | 1.0000 | 0.3293 | 0.0102 | |||
| 3 | 1.1548 | 0.4834 | U | 0.5572 | 1.0000 | 0.3136 | 0.3116 | ||
| 4 | 0.9507 | 0.9236 | E | 0.1074 | 0.3136 | 1.0000 | 0.1451 | 0.0264 | |
| 5 | 4.3871 | 28.6346 | O | 0.2939 | 0.3293 | 0.3116 | 0.1451 | 1.0000 | 0.0310 |
| 6 | 0.9093 | 0.1039 | U | 0.0102 | 0.0264 | 0.0310 | 1.0000 |
| 50 | 0.2030 | 19163.96 | 19056.40 | 1.2533 38.3712 | 1.2395 38.2634 |
|---|---|---|---|---|---|
| 100 | 0.1571 | 36433.89 | 36299.33 | 0.9571 37.4111 | 0.9189 37.3420 |
| 300 | 0.2092 | 28413.48 | 28229.35 | 1.2442 19.0743 | 1.1788 19.0124 |
| 500 | 0.2050 | 21789.08 | 21618.11 | 1.1487 12.9385 | 1.0753 12.8876 |
| 1 000 | 0.1958 | 14589.39 | 14448.38 | 1.0147 7.4863 | 0.9366 7.4500 |
| 3 000 | 0.2017 | 17982.94 | 17800.34 | 1.1648 4.7986 | 1.0888 4.7742 |
| 5 000 | 0.2067 | 18892.96 | 18688.16 | 1.2188 3.8099 | 1.1316 3.7892 |
| 10 000 | 0.2067 | 17558.26 | 17354.58 | 1.1714 2.5971 | 1.0818 2.5820 |
| 0.3209 | 6524.25 | 3150.98 | 4.1477 22.3887 | 2.4510 15.5592 | |
| 0.3452 | 1697.28 | 1190.33 | 3.1632 8.0747 | 1.8721 6.7621 | |
| 0.6915 | 5014.56 | 3877.69 | 3.5238 8.0132 | 2.7728 7.0465 | |
| 0.7071 | 6547.90 | 1803.12 | 2.8285 7.0927 | 2.1544 3.7220 | |
| 0.6490 | 5911.86 | 1631.04 | 2.7014 4.7655 | 1.9614 2.5031 | |
| 0.6582 | 4498.76 | 0901.81 | 2.4906 2.4001 | 1.7832 1.0746 | |
| 0.5998 | 5239.97 | 1542.03 | 2.8828 2.0064 | 1.9242 1.0885 | |
| 0.6069 | 5274.03 | 1200.05 | 2.7298 1.4234 | 1.8337 0.6790 |
| 50 | 0.9174 | 200.3427 | 180.0472 | 0.7795 3.9233 | 0.8794 3.7193 |
|---|---|---|---|---|---|
| 100 | 0.9634 | 77.9392 | 67.7172 | 0.7354 1.7303 | 0.8242 1.6129 |
| 300 | 0.9551 | 76.8087 | 69.3033 | 0.6743 0.9917 | 0.7955 0.9420 |
| 500 | 0.9446 | 65.1680 | 58.5276 | 0.6174 0.7076 | 0.7309 0.6706 |
| 1 000 | 0.9281 | 49.6498 | 44.1097 | 0.5368 0.4367 | 0.6490 0.4116 |
| 3 000 | 0.9262 | 34.0762 | 29.2322 | 0.4619 0.2089 | 0.5675 0.1935 |
| 5 000 | 0.9221 | 32.6305 | 28.0190 | 0.4529 0.1583 | 0.5661 0.1467 |
| 10 000 | 0.9195 | 38.7897 | 33.6378 | 0.4980 0.1221 | 0.6161 0.1137 |
| Boots() | Boots() | |||
|---|---|---|---|---|
| 30 | 1.0154 38.1119 | 0.9603 0.0869 | 0.9798 37.9062 | 0.9656 0.0851 |
| 50 | 1.0110 36.1508 | 0.9604 0.0498 | 1.0407 36.0743 | 1.0149 0.0486 |
| 100 | 1.0241 26.1398 | 1.0119 0.0359 | 0.9950 26.0580 | 0.9620 0.0352 |
| 300 | 0.9715 23.5589 | 1.0416 0.0310 | 1.0703 23.4215 | 1.0409 0.0305 |
| 500 | 1.1679 15.6334 | 1.0229 0.0172 | 1.1648 15.5693 | 1.0150 0.0169 |
| 1 000 | 1.1994 09.0242 | 1.0315 0.0095 | 1.1952 08.9738 | 1.0278 0.0093 |
Relative variation indexes for multivariate continuous distributions on $[0,\infty)^k$ and extensions
Célestin C. Kokonendji
Laboratoire de mathématiques de Besançon, Université Bourgogne Franche-Comté, Besançon, France
Aboubacar Y. Touré
Amadou Sawadogo
UFR de Mathématiques et Informatique, Université Félix Houphouët Boigny, 22 BP 582 Abidjan 22, Côte d’Ivoire
Abstract
We introduce some new indexes to measure the departure of any multivariate continuous distribution on the non-negative orthant from a given reference one, such as the uncorrelated exponential model, similar to the relative Fisher dispersion indexes of multivariate count models. The proposed multivariate variation indexes are scalar quantities, defined as ratios of two quadratic forms of the mean vector and the covariance matrix. They can be used to discriminate between continuous positive distributions. Generalized and multiple marginal variation indexes with and without correlation structure, respectively, and their relative extensions are discussed. The asymptotic behavior and other properties are studied. Illustrative examples and numerical applications are analyzed under several scenarios, leading to appropriate choices of multivariate models. Some concluding remarks and possible extensions are made.
keywords:
Dependence; Equi-variation; Multivariate exponential distribution; Over-variation; Under-variation.
2010 Mathematics Subject Classification: 62E10, 62F10, 62H05, 62H12, 62H99, 62-07.
journal: arXiv and then for publication in impacted journal.
1 Introduction
The choice of a multivariate model from a dataset is not an easy task (e.g., Kotz et al., 2000; Joe, 2014). In practice, we sometimes need simple and effective indicators of multivariate distribution classes in this jungle. They must be appropriate summaries of the multivariate dataset.
After the Gaussian distribution, and in the same way as the Poisson distribution for count models (e.g., Kokonendji, 2014), the exponential distribution is probably the most common probability distribution on the positive half real line. It is a particular case of many others, for instance the gamma and Weibull distributions, and it also has a wide range of statistical applications in many fields such as reliability; see, e.g., the monograph of Balakrishnan and Basu (1995) for a review. In the multivariate setting, there is no unique way to define a multivariate exponential distribution; see, e.g., Basu (1988) and Cuenin et al. (2016).
Recently, Abid et al. (2019abc) introduced the variation index (VI) for measuring the departure of any absolutely continuous probability distribution concentrated on the non-negative half real line from the equi-varied exponential model. Defined as the ratio of the variance to the squared mean, it can be seen as the square of the well-known coefficient of variation (Pearson, 1896). The so-called Jørgensen variation index (or simply VI) makes it possible to classify univariate continuous distributions as over- or under-varied with respect to the exponential distribution and to make inference; see Touré et al. (2019). Since the univariate concepts of VI and of the well-known Fisher (1934) dispersion index with respect to the equidispersed Poisson model are similar (e.g., Touré et al., 2019), we first suggest here a useful and appropriate definition of multivariate over-, equi- and under-variation, following the multivariate dispersion indexes of Kokonendji and Puig (2018). Then, we mainly propose an extension for unifying multivariate dispersion and multivariate variation indexes in the framework of natural exponential families.
The rest of the paper is organized as follows. Section 2 presents notations, generalized and relative variation indexes with their interpretation and properties for practical handling. Section 3 illustrates calculations of these measures on some usual bi- and multi-variate continuous positive orthant distributions such as beta, exponential and Weibull. Section 4 provides asymptotic properties of the corresponding estimators. Section 5 presents example applications from real life and simulated continuous (non-negative orthant) datasets under several scenarios, and produces some simulation studies. Section 6 concludes with some remarks and a unified variability index which includes all multivariate dispersion and variation indexes. To make the paper self-contained and more understandable, three appendices are added: (A) a broader multivariate exponential distribution derived from Cuenin et al. (2016), (B) a construction of the generalized VI deduced from Albert and Zhang (2010), and (C) proofs of the asymptotic results adapted from Kokonendji and Puig (2018).
2 Multivariate variation indexes
Let $\boldsymbol{Y}=(Y_1,\ldots,Y_k)^\top$ be a non-negative continuous $k$-variate random vector on $[0,\infty)^k$, $k\geq 1$. We consider the following notations: $\sqrt{\mathrm{var}\boldsymbol{Y}}=(\sqrt{\mathrm{var}Y_1},\ldots,\sqrt{\mathrm{var}Y_k})^\top$ is the elementwise square root of the variance vector of $\boldsymbol{Y}$; $\mathrm{diag}\sqrt{\mathrm{var}\boldsymbol{Y}}$ is the $k\times k$ diagonal matrix with diagonal entries $\sqrt{\mathrm{var}Y_j}$ and $0$ elsewhere; and $\mathrm{cov}\boldsymbol{Y}$ denotes the covariance matrix of $\boldsymbol{Y}$, which is a $k\times k$ symmetric matrix with entries $\mathrm{cov}(Y_i,Y_j)$ such that $\mathrm{cov}(Y_j,Y_j)=\mathrm{var}Y_j$ is the variance of $Y_j$. Then
$$\mathrm{cov}\boldsymbol{Y}=(\mathrm{diag}\sqrt{\mathrm{var}\boldsymbol{Y}})\,\boldsymbol{\rho}_{\boldsymbol{Y}}\,(\mathrm{diag}\sqrt{\mathrm{var}\boldsymbol{Y}}), \qquad (1)$$
where $\boldsymbol{\rho}_{\boldsymbol{Y}}=\boldsymbol{\rho}(\boldsymbol{Y})$ is the correlation matrix of $\boldsymbol{Y}$; see, e.g., Johnson and Wichern (2007, Eq. 2-36). Note that there are infinitely many multivariate distributions with exponential margins. A generic $k$-variate exponential distribution is specified here by a positive mean vector and a correlation matrix; see, e.g., Appendix A for a broader one. The uncorrelated or independent $k$-variate exponential corresponds to the case where the correlation matrix is the unit matrix.
2.1 Basic definitions
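Proceeding along similar lines as Albert and Zhang (2010) and also as Kokonendji and Puig (2018), we define the generalized variation index of $\boldsymbol{Y}$ by
$$\mathrm{GVI}(\boldsymbol{Y})=\frac{\mathbb{E}\boldsymbol{Y}^\top(\mathrm{cov}\boldsymbol{Y})\,\mathbb{E}\boldsymbol{Y}}{(\mathbb{E}\boldsymbol{Y}^\top\mathbb{E}\boldsymbol{Y})^2}; \qquad (2)$$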
see Appendix B for its construction. Note that when $k=1$, GVI reduces to the univariate variation index VI (Abid et al., 2019b). The relative (generalized) variation index is defined, for two continuous random vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$ on the same support with and , by
[TABLE]
i.e., the over- (equi- and under-variation) of compared to , and denoted by ( and ), is realized if ( and , respectively). In the framework of the natural exponential family on (e.g., Chapter 54 in Kotz et al. 2000) , generated by the distribution of and characterized by its variance function , the index can also be rewritten via and . As for (2), the “generalized variation function” defined on the mean domain to by
[TABLE]
appears to be very useful through this parameterization.
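As a rough numerical companion to these definitions, the sketch below evaluates a generalized and a relative variation index on plain Python lists. It assumes that the generalized index has the Albert and Zhang (2010) type form $m^\top \Sigma m/(m^\top m)^2$, with $m$ the mean vector and $\Sigma$ the covariance matrix, and that the relative index is the ratio of two such indexes. The function names `quad_form`, `gvi` and `rvi` and the numerical inputs are hypothetical.

```python
def quad_form(vec, mat):
    """Return vec^T mat vec for a list vector and a list-of-lists matrix."""
    n = len(vec)
    return sum(vec[i] * mat[i][j] * vec[j] for i in range(n) for j in range(n))


def gvi(mean, cov):
    """Assumed generalized index: (m^T S m) / (m^T m)^2."""
    norm_sq = sum(m * m for m in mean)
    return quad_form(mean, cov) / norm_sq ** 2


def rvi(mean_y, cov_y, mean_x, cov_x):
    """Relative index of Y with respect to X, taken here as gvi(Y) / gvi(X)."""
    return gvi(mean_y, cov_y) / gvi(mean_x, cov_x)


# Univariate check: an exponential variable with mean 2 has variance 4.
print(gvi([2.0], [[4.0]]))

# Hypothetical bivariate inputs, chosen only for illustration.
mean_y, cov_y = [1.0, 2.0], [[1.0, 0.5], [0.5, 4.0]]
mean_x, cov_x = [1.0, 2.0], [[1.0, 0.0], [0.0, 4.0]]
print(gvi(mean_y, cov_y), rvi(mean_y, cov_y, mean_x, cov_x))
```

In the univariate case the sketch reduces to variance over squared mean, so the exponential input returns the equi-variation value 1.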
2.2 Interpretation and properties
Concerning an interpretation of GVI, we first express the denominator of (2) as , using then (1) to rewrite , obtaining
[TABLE]
From (5), it is clear that makes it possible to compare the full variability of (in the numerator) with respect to its expected uncorrelated exponential variability (in the denominator) which depends only on .
Next, the index can be considered in itself as a notion of -variate over-, equi- and under-variation.
Proposition 1.
For all positive continuous random vector on , , and then . Furthermore, one has
[TABLE]
with .
Proof. It is trivial from (5), with if and only if , i.e., .
From Proposition 1, the multivariate exponential model can be over-, equi- or under-varied (with respect to the uncorrelated exponential) according to its correlation structure. For instance, if then (6) clearly gives the one-to-one relationship, viz.
[TABLE]
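Finally, if we only want to take into account the variation information coming from the margins, we can modify GVI by replacing in (2) $\mathrm{cov}\boldsymbol{Y}$ with $\mathrm{diag}\,\mathrm{var}\boldsymbol{Y}$, that is $\boldsymbol{\rho}_{\boldsymbol{Y}}=\boldsymbol{I}_k$ in (1), obtaining the “multiple marginal variation index”, viz.
$$\mathrm{MVI}(\boldsymbol{Y})=\frac{\mathbb{E}\boldsymbol{Y}^\top(\mathrm{diag}\,\mathrm{var}\boldsymbol{Y})\,\mathbb{E}\boldsymbol{Y}}{(\mathbb{E}\boldsymbol{Y}^\top\mathbb{E}\boldsymbol{Y})^2}=\sum_{j=1}^{k}\frac{(\mathbb{E}Y_j)^4}{(\mathbb{E}\boldsymbol{Y}^\top\mathbb{E}\boldsymbol{Y})^2}\,\mathrm{VI}(Y_j). \qquad (7)$$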
The expression on the right-hand side of (7) provides a representation of MVI as a weighted average of the univariate variation indexes VI of the components. MVI could be used for exploring profile distributions in multiple positive response regression models (Bonat and Jørgensen, 2016) or in multivariate continuous time series. Similarly to (4), the corresponding “multiple marginal variation function” is defined on the mean domain to by
[TABLE]
In the same way as (3), the relative versions of MVI can be introduced.
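The marginal index can be illustrated in the same spirit. The sketch below assumes that MVI is obtained from the quadratic-form ratio $m^\top \Sigma m/(m^\top m)^2$ by keeping only the diagonal of the covariance matrix, and checks numerically that this equals a weighted sum of the univariate indexes $\mathrm{var}\,Y_j/(\mathbb{E}Y_j)^2$. The function names and inputs are hypothetical.

```python
def mvi(mean, variances):
    """Assumed marginal index: sum_j var_j * m_j^2 / (sum_j m_j^2)^2."""
    norm_sq = sum(m * m for m in mean)
    return sum(v * m * m for m, v in zip(mean, variances)) / norm_sq ** 2


def mvi_weighted(mean, variances):
    """Same quantity as a weighted sum of the univariate indexes var_j / m_j^2."""
    norm_sq = sum(m * m for m in mean)
    return sum((m ** 4 / norm_sq ** 2) * (v / m ** 2)
               for m, v in zip(mean, variances))


# Hypothetical margins with variance equal to squared mean (each univariate index is 1).
mean, variances = [1.0, 2.0], [1.0, 4.0]
print(mvi(mean, variances), mvi_weighted(mean, variances))
```

Under this assumed form the weights add up to less than 1 as soon as there are two or more components, so margins that are each equi-varied give a value below 1.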
3 Illustrations and comments
We will illustrate our variation indexes with two bivariate models and two families of general -variate ones; it will then be seen how the marginal VIs interplay with the correlation structure in the multivariate variation measures discussed previously. Considering and using (7), we explicitly write
[TABLE]
which points out that GVI is not a weighted average of VIs, as MVI is. Note that accordingly to , with for . Similar remarks hold for the -variate cases, where the correlation matrix is reduced to .
3.1 Bivariate beta distribution of Arnold and Tony Ng (2011)
The flexible bivariate beta of Arnold and Tony Ng (2011) which exhibits both positive and negative correlation between random variables can be defined as follows. Suppose that and are independent gamma random variables with common unit scale parameter, i.e., , , and with , and . Then, for and , one has
[TABLE]
Since the covariance of and cannot be expressed in closed form, it has been numerically shown that the correlation lies in . In fact, from the proposed construction, the positive correlations are obtained when . For negative correlations, one can consider with and fixed, and it gets closer to as and get larger. See Arnold and Tony Ng (2011) for more details and connected references.
Thus, for given of , the direct calculations of GVI() through (9) and MVI() via (7) are obtained from the following first moments and VIs of the univariate beta random variables , :
[TABLE]
and
[TABLE]
Since and are over-, equi- and under-varied, then can be too in the bivariate sense and according to the values of . A $k$-variate extension of this bivariate beta distribution is also available, at the cost of tedious calculations of its multivariate variation indexes.
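To make the marginal behavior concrete, the following sketch computes the variation index of a generic univariate beta distribution from its standard mean and variance. The parameter names `a` and `b` denote the two shape parameters of a generic Beta(a, b) variable and are not tied to the parameters of the bivariate construction above; the numerical values are hypothetical.

```python
def beta_moments(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def beta_vi(a, b):
    """Variation index var / mean^2, which simplifies to b / (a * (a + b + 1))."""
    mean, var = beta_moments(a, b)
    return var / mean ** 2


# Hypothetical shape parameters covering the three regimes.
for a, b in [(0.5, 2.0), (0.5, 1.5), (1.0, 1.0)]:
    print(a, b, beta_vi(a, b))
```

The three pairs give an index above 1, equal to 1 and below 1 respectively, which illustrates that a beta margin can be over-, equi- or under-varied.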
3.2 Bivariate Weibull distribution of Teimouri and Gupta (2011)
Consider the bivariate Weibull of Teimouri and Gupta (2011) built by a copula and having the next density
[TABLE]
with and , , and . For one gets the uncorrelated bivariate Weibull distribution which depends on both scale parameters and shape parameters , .
Since we explicitly have the first, second and product moments of and (Teimouri and Gupta, 2011), the calculations of GVI() using (9) and MVI() through (7) are derived from the following first moments, VIs and correlation of , :
[TABLE]
[TABLE]
depending only on shape parameter , and
[TABLE]
where is the classical gamma function. The above univariate variation indexes , , satisfy the following equivalences:
[TABLE]
for all fixed. Indeed, the first equivalence of (10) is derived from the gamma duplication formula or Legendre’s doubling formula - revisited (e.g., Abramowitz and Stegun, 1972; Chap. 6), and the second one stems from the function study (see, e.g., Table 1). Note that conditions (10) on the shape parameter of the univariate Weibull distribution are known, in the opposite sense of with respect to , for the failure rate (bathtub) curve in reliability; one can refer to Barlow and Proschan (1981) using the standard coefficient of variation. Hence, according to its parameters this bivariate Weibull distribution can be over-, equi- and under-varied with respect to the uncorrelated bivariate exponential distribution.
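The dependence of the univariate Weibull variation index on the shape parameter alone can be checked numerically. The sketch below uses the standard Weibull moments, which give the index $\Gamma(1+2/\beta)/\Gamma(1+1/\beta)^2-1$ for shape $\beta$; the function name `weibull_vi` is hypothetical.

```python
from math import gamma


def weibull_vi(shape):
    """Variation index of a Weibull variable; the scale parameter cancels out."""
    return gamma(1 + 2 / shape) / gamma(1 + 1 / shape) ** 2 - 1


# Shape below 1, equal to 1 (exponential case) and above 1.
for shape in (0.5, 0.8, 1.0, 2.0, 4.0):
    print(shape, round(weibull_vi(shape), 4))
```

The index exceeds 1 for shapes below 1, equals 1 at shape 1 and falls below 1 for larger shapes.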
3.3 Multivariate exponential distribution of Marshall and Olkin (1967)
The -variate exponential of Marshall and Olkin (1967) is constructed as follows. Let and be univariate exponential random variables with parameters and respectively. Then, by setting for , one easily has and for all . Note that each correlation lies in if and only if .
Thus, using (6) appropriately we obtain
[TABLE]
and through (7) we easily have
[TABLE]
Hence, this multivariate exponential model is always under-varied with respect to the MVI and over- or equi-varied with respect to GVI. If then this -variate exponential distribution is reduced to with . However, the assumption of non-negative correlations between components is sometimes insufficient for some analyses. We refer to Appendix A for a more extensive exponential model, which is derived as a particular case of the full multivariate Tweedie (1984) models with flexible dependence structure (Cuenin et al., 2016).
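A small simulation can help to visualize this model. The sketch below assumes the standard common-shock construction, in which each component is the minimum of its own exponential variable and a shared exponential variable, so that each margin is exponential with rate equal to the sum of the two rates and the pairwise correlation is the shared rate divided by the sum of the three rates involved. The function names, rates and seed are hypothetical.

```python
import random


def marshall_olkin_sample(rates, shared_rate, n, seed=0):
    """Draw n vectors with components min(X_j, X_0), X_0 being the shared shock."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        shock = rng.expovariate(shared_rate)
        sample.append([min(rng.expovariate(rate), shock) for rate in rates])
    return sample


def mo_correlation(rate_i, rate_j, shared_rate):
    """Pairwise correlation under the common-shock construction."""
    return shared_rate / (rate_i + rate_j + shared_rate)


rates, shared_rate = [1.0, 2.0], 1.0
sample = marshall_olkin_sample(rates, shared_rate, n=20000, seed=1)
means = [sum(row[j] for row in sample) / len(sample) for j in range(len(rates))]
print(means)  # compare with 1 / (rate_j + shared_rate)
print(mo_correlation(rates[0], rates[1], shared_rate))
```

The correlation is non-negative by construction, which is the limitation mentioned above.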
3.4 Multiple stable Tweedie (MST) models
Consider the huge $k$-variate MST class of families of models, which was introduced in Boubacar Maïnassara and Kokonendji (2014) to extend the so-called normal stable Tweedie (NST) with for . The normal inverse Gaussian (NIG) model is a common particular case of NST with (Barndorff-Nielsen, 1997). This MST class contains infinitely many subclasses of multivariate distributions, among others the gamma-MST with and inverse Gaussian-MST with . We also have some particular models such as the multiple gamma with , multiple inverse Gaussian with and the gamma-Gaussian with of Casalis (1996). In fact, the MST models are composed of a fixed univariate stable Tweedie (1984) variable having a positive mean domain and random variables that, given the fixed one, are real independent stable Tweedie variables, possibly different, with the same dispersion parameter equal to the fixed component.
Precisely and for short, within the framework of natural exponential families Kokonendji and Moypemna Sembona (2018) have completely characterized the MST models through their variance functions as follows. Let with and for . Then, the variance function of is given by for all and . Therefore, from (4) and (8), one has
[TABLE]
and
[TABLE]
According to different classifications of Kokonendji and Moypemna Sembona (2018), several scenarios occur for -variate over-, equi- and under-variation with respect to GVI and MVI. For instance, let and for the exponential-MST subclass then one has for all with . In order to investigate both indexes GVI and MVI for -variate (semi-)continuous models on , we finally exclude cases for the Poisson-MST and also for all and , , related to the normal and Poisson components, respectively. Hence, the NST class is removed from this study.
4 Estimation and asymptotic properties
Let be a random sample from with support on , where for each , . It is common to consider the empirical versions
[TABLE]
of the mean vector and covariance matrix of , respectively. An estimator of directly derived from (11) is given by
[TABLE]
Since all the univariate positive continuous variables take positive values, we deduce from Cramér (1974, pp. 357-358) that is an asymptotically unbiased estimator, i.e., . As for the theoretical variance of , we would need at least the moments of fourth order of the components of .
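The plug-in estimation step can be sketched as follows: compute the sample mean vector and sample covariance matrix, then substitute them into the indexes. The sketch assumes the quadratic-form ratio $m^\top \Sigma m/(m^\top m)^2$ for GVI and its diagonal-only version for MVI, and it uses the divisor $n-1$ in the sample covariance, which is an assumption. The function names and the tiny dataset are hypothetical.

```python
def sample_mean_cov(data):
    """Sample mean vector and covariance matrix (divisor n - 1) of a list of rows."""
    n, k = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(k)]
    cov = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in data) / (n - 1)
            for j in range(k)] for i in range(k)]
    return mean, cov


def gvi_hat(data):
    """Plug-in generalized index using the full sample covariance matrix."""
    mean, cov = sample_mean_cov(data)
    k = len(mean)
    quad = sum(mean[i] * cov[i][j] * mean[j] for i in range(k) for j in range(k))
    return quad / sum(m * m for m in mean) ** 2


def mvi_hat(data):
    """Plug-in marginal index using only the sample variances."""
    mean, cov = sample_mean_cov(data)
    k = len(mean)
    quad = sum(mean[j] ** 2 * cov[j][j] for j in range(k))
    return quad / sum(m * m for m in mean) ** 2


# Tiny hypothetical dataset of three bivariate observations.
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
print(gvi_hat(data), mvi_hat(data))
```

Both functions only require positive observations stored row by row, so they can be applied unchanged to larger datasets.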
More interestingly, we establish the following central limit and strong consistency results of and . The proofs are given in Appendix C.
Proposition 2.
Let be a positive continuous -variate random vector on , , such that . Let also be a random sample from .
- (i)
As ,
[TABLE]
where stands for convergence in distribution and is the centered normal distribution with variance . The vector and the four-block symmetric matrix
[TABLE]
are such that, for all ,
[TABLE]
, for , , for and for ;
- (ii)
As ,
[TABLE]
with . The vector and the four-block symmetric matrix
[TABLE]
are such that, for ,
[TABLE]
, , and .
Note that Parts (i) and (ii) of Proposition 2 provide the same result for with
[TABLE]
see Touré et al. (2019, Part (i) of Section 4.1 with . Also, an asymptotic confidence interval for is expressed as
[TABLE]
where is the th percentile of the standard normal distribution and is the corresponding empirical version of (Proposition 2). A similar result also holds for the intuitive index MVI. Finally, we state the following results for strong consistency.
Proposition 3.
Let be a positive continuous $k$-variate random vector on , , such that . If is a random sample from , then
[TABLE]
where stands for almost sure convergence.
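The asymptotic confidence interval described before Proposition 3 can be sketched numerically. The code assumes the usual normal-approximation form, the estimate plus or minus a standard normal quantile times the estimated asymptotic standard deviation divided by the square root of the sample size. The function name `asymptotic_ci` and the input values are hypothetical, and the estimated standard deviation is taken as given rather than computed from Proposition 2.

```python
from math import sqrt
from statistics import NormalDist


def asymptotic_ci(estimate, sigma_hat, n, alpha=0.05):
    """Two-sided normal-approximation interval at level 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma_hat / sqrt(n)
    return estimate - half_width, estimate + half_width


# Hypothetical values: index estimate 1.0, estimated asymptotic sd 2.0, n = 400.
print(asymptotic_ci(1.0, 2.0, 400))
```

The half-width shrinks at the rate of the inverse square root of the sample size, which matches the decrease of the standard errors reported in the numerical section.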
In finite samples, we suggest using the bootstrap method to approximate the variance of all the estimators and the lengths of their corresponding confidence intervals.
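A minimal sketch of such a bootstrap approximation is given below, assuming a nonparametric bootstrap that resamples the observed rows with replacement and assuming the quadratic-form ratio $m^\top \Sigma m/(m^\top m)^2$ for the index. The function names, the number of replicates, the seeds and the simulated data are hypothetical.

```python
import random
from statistics import stdev


def gvi_hat(data):
    """Plug-in index m^T S m / (m^T m)^2 from rows of positive observations."""
    n, k = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(k)]
    # Sample covariance with divisor n - 1 (the divisor is an assumption).
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in data) / (n - 1)
            for j in range(k)] for i in range(k)]
    quad = sum(mean[i] * cov[i][j] * mean[j] for i in range(k) for j in range(k))
    return quad / sum(m * m for m in mean) ** 2


def bootstrap_se(data, statistic, n_boot=200, seed=0):
    """Standard deviation of the statistic over resamples of the rows."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return stdev(replicates)


# Hypothetical data: 100 independent pairs of unit-mean exponential draws.
rng = random.Random(1)
data = [[rng.expovariate(1.0), rng.expovariate(1.0)] for _ in range(100)]
print(gvi_hat(data), bootstrap_se(data, gvi_hat))
```

Any other estimator taking a list of rows, for instance a marginal index, can be passed as `statistic` without changing the resampling loop.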
5 Numerical applications
All computations have been done with Python (Python Software Foundation, 2019) and the R software (R Core Team, 2018). To generate a $k$-variate continuous positive orthant distribution given (over-, equi- and under-varied) marginals and correlation matrix $\boldsymbol{\rho}$, we have used the NORmal To Anything (NORTA) method (e.g., Su, 2015).
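The idea behind NORTA can be sketched in the bivariate case: draw correlated standard normal variables, map them to uniforms through the normal distribution function, then apply the quantile function of each target margin. The sketch below is a simplified illustration and not the authors' code. In particular, it takes the correlation of the underlying normals as input, whereas a full NORTA implementation solves for the normal correlation that yields a target output correlation. All names and parameter values are hypothetical.

```python
import random
from math import log, sqrt
from statistics import NormalDist


def exponential_quantile(mean):
    """Quantile function of an exponential distribution with the given mean."""
    return lambda u: -mean * log(1.0 - u)


def weibull_quantile(scale, shape):
    """Quantile function of a Weibull distribution with the given scale and shape."""
    return lambda u: scale * (-log(1.0 - u)) ** (1.0 / shape)


def norta_pairs(n, normal_corr, quantile_1, quantile_2, seed=0):
    """Draw n pairs with the given margins from correlated standard normals."""
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = normal_corr * z1 + sqrt(1.0 - normal_corr ** 2) * rng.gauss(0.0, 1.0)
        # Clamp below 1 so that log(1 - u) stays finite.
        u1 = min(cdf(z1), 1.0 - 1e-12)
        u2 = min(cdf(z2), 1.0 - 1e-12)
        pairs.append((quantile_1(u1), quantile_2(u2)))
    return pairs


pairs = norta_pairs(4000, 0.7, exponential_quantile(2.0), weibull_quantile(1.0, 2.0), seed=3)
print(sum(p[0] for p in pairs) / len(pairs))  # should be close to the target mean 2
```

Because the marginal transforms are increasing, a positive normal correlation produces a positive correlation between the generated components, although its value generally differs from the input.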
In practice, we consider the exponential, lognormal and Weibull distributions (e.g., Dey and Kundu, 2009). They have been used quite effectively in analyzing positively skewed data, which play an important role in reliability analysis. Recall here that the univariate exponential distribution is always equi-varied for all and, both univariate lognormal and Weibull models are over- (equi- and under-) varied for with and, from (10) for with , respectively. All these theoretical behaviors are well reproduced on simulated datasets of univariate exponential, lognormal and Weibull distributions, which we omit here.
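The lognormal case can be checked with a few lines. The sketch uses the standard lognormal moments, for which the ratio of the variance to the squared mean is $e^{\sigma^2}-1$ with $\sigma$ the standard deviation on the log scale, so the equi-variation threshold is $\sigma^2=\log 2$. The function names and the tolerance are hypothetical.

```python
from math import exp, log, sqrt


def lognormal_vi(sigma):
    """Variation index of a lognormal variable; the log-scale mean cancels out."""
    return exp(sigma ** 2) - 1.0


def classify(vi, tol=1e-9):
    """Label an index value relative to the exponential reference value 1."""
    if vi > 1.0 + tol:
        return "over"
    if vi < 1.0 - tol:
        return "under"
    return "equi"


for sigma in (0.5, sqrt(log(2.0)), 1.0):
    print(round(sigma, 4), round(lognormal_vi(sigma), 4), classify(lognormal_vi(sigma)))
```

The same `classify` helper applies to any univariate index value, for instance the constant value 1 of an exponential margin.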
5.1 Some scenarios of bivariate cases and a real -variate dataset
We first consider Table 2 consisting of fifteen simulated bivariate datasets from exponential, lognormal and Weibull distributions, presenting several scenarios of correlation (positive or negative) and marginal over-, equi- or under-variation. The table presents a summary of these datasets, along with the sample values of the indexes GVI and MVI.
In order to measure the departure from the bivariate uncorrelated exponential of the considered datasets our estimated index provides a very good summary of the bivariate variation by taking into account both marginal variation and the non-null correlation value . Indeed, the bivariate equi-variation () is significantly obtained here for both over-varied marginals with negative correlation (No 1), for both under-varied marginals with large positive correlation (No 9), for both equi-varied marginals with very weak correlation (No 8), and either for one marginal over-varied and the other under-varied (No 12) or equi-varied (No 15) with negative correlations. The bivariate over-variation () is pointed out for both over-varied marginals with weak negative correlation (No 7), for both equi-varied marginals with positive correlation (No 3), for both under-varied marginals with positive correlation (No 5), and either for one marginal under-varied and the other over-varied (No 10) or equi-varied (No 14) with positive correlation. Concerning the bivariate under-variation (), this is pointed out for both over-varied marginals with negative correlation (No 2), for both equi-varied marginals with negative correlation (No 4), for both under-varied marginals with weak positive correlation (No 6), and either for one marginal over-varied and the other under-varied (No 11) or equi-varied (No 13) with negative correlation. In the common sense, we always have the bivariate over-/under-variation for both over-/under-varied marginals with positive/negative correlation. The values of provide the corresponding degree of (over-/under-) variation with respect to the reference value 1 of the bivariate equi-variation. For instance, we detect a higher degree of over-variation in No 7 and No 10 than in No 3, No 5 or No 14; similarly, we detect a weaker degree of bivariate under-variation (close to 1) in No 11 and No 13 than in Nos. 4 or 6.
Similarly, the marginal index also works very well, summarizing both marginal variations (without correlation). Both indexes and are close when the correlation is quasi-null (Nos. 7 or 8). For the sake of brevity, we omit here an analysis of the standard errors of the estimated indexes of these datasets; a complete analysis will be done in the next section for -variate datasets.
In summary, the multivariate variation indexes MVI and GVI are meaningful because they summarize the variation behavior of each individual variable. In addition, GVI also contains information about their correlation. They can be used for descriptive analysis, for clustering, for comparing different datasets and for testing departures from known multivariate distributions, as in Touré et al. (2019) for the univariate case.
Secondly, we consider the real 4-variate dataset which refers to the annual observations from 1900 to 1989 of the United States. It is reported by Hayashi (2000): the first variable is the natural log of the money M1, the second is the natural log of the net national product price deflator, the third is the natural log of the net national product and the fourth is the commercial paper rate in percent at an annual rate.
To measure the departure from the 4-variate uncorrelated exponential distribution of the considered dataset, our estimated indexes provide very good summaries through and . Indeed, both indexes strongly show a 4-variate under-variation with . Since is very close to [math] than , each of the four marginal distributions must be univariate under-varied with the correlation matrix having only positive coefficients. Table 3 confirms this analysis only from results of and . Thus, one can choose an appropriate theoretical 4-variate distribution for modelling this dataset and adjust its parameters (of interest) directly by estimation.
5.2 Other multivariate cases and simulation studies
In this section we first study a -variate simulated dataset. We then analyze the behavior of the asymptotic variances and confidence intervals by simulation. Finally, we compare the asymptotic standard errors of GVI and MVI to those obtained from the bootstrap method.
The -variate dataset of size is simulated following this scenario. We have considered two over-, two equi- and two under-variations as univariate marginals with the theoretical correlation matrix such that
[TABLE]
Table 4 shows the summary needed to compute the variation indexes and . As commented before for Table 2, we also observe a different behavior of the two variation indexes in this -variate example. We obtain here and , both indicating a -variate phenomenon of quasi-equi-variation.
Table 5 depicts an evolution of the asymptotic variances and confidence intervals of and from subsamples of a simulated -variate dataset with a maximum size , having the same parameters as those for Table 4. We observe that both estimated standard errors decrease when sample size increases, and similarly in this context of -variate quasi-equi-variation for GVI and also for MVI. The stable behavior of the variances agrees with Proposition 2.
Similar studies have also been performed simulating a -variate over-varied distribution (Table 6) and a trivariate under-varied distribution (Table 7). The results shown in Table 6 have been obtained by simulating four marginal (over-, equi-, and under-varied) Weibull distributions, with the cross correlation matrix such that
[TABLE]
For the results shown in Table 7, we have simulated one marginal Weibull distribution, one marginal exponential distribution and one marginal lognormal distribution, with the correlation matrix such that
[TABLE]
We also notice that all estimated standard errors decrease when sample size increases, but more slowly for GVI than for MVI in Table 6 of the -variate phenomenon of over-variation. Figure 1 clearly points out typical behaviors of boxplots related to Tables 5, 6 and 7. However, the estimated variances in Table 6 of the -variate over-variation are much larger than those in Table 7 of the -variate under-variation. Therefore, for small and moderate sample sizes one can use a bootstrapped approach or a robust version for reducing the estimated variances.
Table 8 presents behaviors of both asymptotic and bootstrap confidence intervals for GVI and MVI in the situations of small and moderate sample sizes (e.g., Angelo and Brian, 2019). For these -variate equi-varied datasets, we still observe that all estimated standard errors decrease when sample size increases, but more sharply and very weakly through the bootstrap method.
6 Concluding remarks and extensions
From the univariate case of the variation index (Abid et al., 2019b) and the multivariate dispersion indexes for count models (Kokonendji and Puig, 2018), we have first introduced the multivariate variation indexes GVI, MVI and RVI for continuous distributions on the non-negative orthant. All these proposed indexes are easy to handle from a theoretical and practical point of view. Unlike the intuitive marginal variation index MVI, the index GVI takes into account the correlations between variables. The ratio of two GVIs provides the index RVI for changing the reference distribution of the measure of over-, equi- and under-variation in the multivariate framework. The interpretation and some properties of GVI and MVI are provided. Also, the asymptotic variances of GVI and MVI obtained from Proposition 2 seem to provide large standard errors for small and moderate sample sizes; they can be improved, for instance, through a bootstrap method. An example of real data analysis is presented, helping to select an appropriate multivariate model.
Then, from given in (3) one exactly obtains its equivalent (i.e., relative dispersion index) for count models by changing the support of and (Formula (9) of Kokonendji and Puig, 2018). Concerning a generalization of the basic GVI of (2), which is also considered as a particular RVI with respect to the uncorrelated exponential model, the recent univariate unification of dispersion and variation indexes by Touré et al. (2019) is used in the multivariate framework of natural exponential families as follows. Let and be two random vectors on the same support and assume , and fixed; then the relative variability index of with respect to can be defined as
[TABLE]
where is the unique Moore-Penrose inverse of the associated matrix to ; see Appendix B for GVI. Thus, we unify the construction of GDI and GVI by choosing and , respectively. Note that one can consider as a particular case of the MST variance function of Section 3.4; but, it will be equivalent to the proposed GVI via RVI for supports of distributions. Tests of hypothesis relying on the corresponding estimators as test statistics with their asymptotic normality distributions should be deduced.
Finally, let us note the following problems, which are under advanced discussion. Is it possible to characterize first the univariate over-/under-variation with respect to the exponential distribution through the weighted exponential distribution, as in the count case of Kokonendji et al. (2008)? See also Kokonendji (2014) for some references. Then, how can one investigate the multivariate connections to over-, equi- and under-variation through or ? How, for instance, can one discriminate between some close distributions using these indexes? See, e.g., Dey and Kundu (2009) for a univariate case. Statistical tests of these multivariate variation indexes can be produced in the direction of Aerts and Haesbroeck (2017); see also Feltz and Miller (1996).
Appendix A. On a broader multivariate exponential distribution
According to Cuenin et al. (2016), taking in their multivariate Tweedie (1984) models of flexible dependence structure, another way to define a -variate exponential distribution is given by . The symmetric variation matrix is such that , the mean of the marginal exponential is , and the nonnegative correlation terms satisfy
[TABLE]
with . The construction of Cuenin et al. (2016) is perfectly defined having parameters as in . Furthermore, we attain the exact bounds of the correlation terms in (13). The main fact is that Cuenin et al. (2016) pointed out the construction and simulation of the negative correlation structure from the positive one of (13) by using the inversion method.
The negativity of a correlation component is important for the rare phenomenon of under-variation in a bivariate/multivariate positive continuous model. Figure 2 (right) plots a limit shape of any bivariate positive continuous distribution with very strong negative correlation (in red), which is not the diagonal line of the upper bound () of positive correlation (in blue); see, e.g., Cuenin et al. (2016) for a bivariate count model. In contrast, Figure 2 (left) represents the classic lower () and upper () bounds of correlations on or finite support.
Appendix B. Construction of GVI
In order to extend appropriately the univariate VI to the -dimensional one for any positive continuous random vector on having positive (elementwise) mean vector and covariance matrix , we consider the product of two matrices, namely , where is the matrix outer product of and which is well-defined. According to the singularity of , the unique Moore-Penrose inverse of is therefore
[TABLE]
Then, we have . Since the rank of is equal to 1, then is also of rank 1 and has only one positive eigenvalue:
[TABLE]
where “” stands for the trace operator.
This quantity does not depend on the number of variables and it is numerically comparable to the univariate VI . Also, it characterizes uniquely the matrix, leading to the following definition of GVI. Note finally that if then we easily deduce , and conversely. We thus have the natural ordering of the half nonnegative real line for .
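The algebra of this appendix can be checked numerically. The sketch below, in plain Python, uses illustrative names that are not the paper's notation (`mu` for the mean vector, `sigma` for the covariance matrix, `w` for the outer product of the mean vector with itself) and hypothetical numerical values. It verifies that the rank-one matrix mu mu^T / (mu^T mu)^2 satisfies the four Penrose conditions, hence is the Moore-Penrose inverse of mu mu^T, and that the trace of the product of `sigma` with this inverse equals the ratio of quadratic forms mu^T sigma mu / (mu^T mu)^2.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def close(a, b, tol=1e-12):
    """Elementwise comparison of two matrices of equal shape."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def outer_pinv(mu):
    """Candidate Moore-Penrose inverse of mu mu^T: mu mu^T / (mu^T mu)^2."""
    s = sum(v * v for v in mu)
    return [[vi * vj / s ** 2 for vj in mu] for vi in mu]


def gvi(mu, sigma):
    """Closed form mu^T sigma mu / (mu^T mu)^2."""
    k = len(mu)
    quad = sum(mu[i] * sigma[i][j] * mu[j] for i in range(k) for j in range(k))
    return quad / sum(v * v for v in mu) ** 2


mu = [1.0, 2.0, 0.5]              # hypothetical positive mean vector
sigma = [[1.0, 0.3, 0.0],         # hypothetical covariance matrix
         [0.3, 4.0, -0.2],
         [0.0, -0.2, 0.25]]

w = [[vi * vj for vj in mu] for vi in mu]
w_plus = outer_pinv(mu)

# The four Penrose conditions characterise the Moore-Penrose inverse.
penrose_ok = (close(matmul(matmul(w, w_plus), w), w)
              and close(matmul(matmul(w_plus, w), w_plus), w_plus)
              and close(transpose(matmul(w, w_plus)), matmul(w, w_plus))
              and close(transpose(matmul(w_plus, w)), matmul(w_plus, w)))

# Trace of sigma times the inverse, to be compared with the closed form.
product = matmul(sigma, w_plus)
trace = sum(product[i][i] for i in range(len(mu)))
trace_matches = abs(trace - gvi(mu, sigma)) < 1e-12
print(penrose_ok, trace_matches)
```

With a single variable the same closed form reduces to the variance divided by the squared mean, which is the univariate VI; for instance `gvi([2.0], [[4.0]])` gives 1.0, the value of an exponential distribution with mean 2.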
Appendix C. Proofs of the asymptotic results
Proof of Proposition 2. Part (i): Let , , for , and the map given through and ; i.e., for , , where is the mean vector of and is the covariance matrix of with . Since is differentiable at , the multivariate delta method (e.g., Serfling, 1980, Theorem A of Section 3.3) allows one to deduce that, as ,
[TABLE]
To check that of the proposition under the assumption on the fourth order moments of , one can rewrite in the following order: with and such that and
[TABLE]
Then, the three main block matrices of are successively found to be
[TABLE]
To see that , we first expand as follows:
[TABLE]
Then, direct calculations provide all components of : for , one has
[TABLE]
and while for , . This ends the proof of Part (i).
Part (ii): Introduce , for and the map defined by with . Then, one has and . The function is differentiable at the point and, therefore, a straightforward application of the multivariate delta method leads to the conclusion that, as ,
[TABLE]
Here, it is now trivial that of the theorem under the assumption of the finite moments on and also that with and for all . This concludes the proof.
Proof of Proposition 3. According to both continuous maps defined through and and such that and in the proof of Proposition 2, the desired result is easily deduced from and , respectively.
References
- [1] Abid, R., Kokonendji, C.C., Masmoudi, A. (2019a). Geometric dispersion models with real quadratic v-functions. Statistics and Probability Letters 145, 197-204.
- [2] Abid, R., Kokonendji, C.C., Masmoudi, A. (2019b). Geometric Tweedie regression models for continuous and semicontinuous data with variation phenomenon. AStA Advances in Statistical Analysis, DOI:10.1007/s10182-019-00350-8.
- [3] Abid, R., Kokonendji, C.C., Masmoudi, A. (2019c). Poisson-exponential-Tweedie regression models for ultra-overdispersed count data and applications. Submitted for publication.
- [4] Abramowitz, M., Stegun, I.A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, New York.
- [5] Aerts, S., Haesbroeck, G. (2017). Robust asymptotic tests for the equality of multivariate coefficients of variation, TEST 26, 163–187.
- [6] Albert, A., Zhang, L. (2010). A novel definition of the multivariate coefficient of variation, Biometrical Journal 52, 667–675.
- [7] Angelo, C., Brian, R. (2019). Package boot, https://cran.r-project.org/web/packages/boot/
- [8] Arnold, B.C., Tony Ng, H.K. (2011). Flexible bivariate beta distributions, Journal of Multivariate Analysis 102, 1194–1202.
- [9] Balakrishnan, N., Basu, A.P. (1995). The Exponential Distribution: Theory, Models and Applications, Gordon and Breach, Amsterdam.
- [10] Barlow, R.A., Proschan, F. (1981). Statistical Theory of Reliability and Life Testing: Probability Models, To begin with, Silver Springs, Maryland.
- [11] Barndorff-Nielsen, O.E. (1997). Normal inverse Gaussian distribution and stochastic volatility modelling, Scandinavian Journal of Statistics 24, 1–13.
- [12] Basu, A.P. (1988). Multivariate exponential distributions and their applications in reliability. In: Handbook of Statistics, vol. 7, Quality Control and Reliability, P.R. Krishnaiah and C.R. Rao (eds), Elsevier, Amsterdam, 467–477.
- [13] Bonat, W.H., Jørgensen, B. (2016). Multivariate covariance generalized linear models, Journal of the Royal Statistical Society Series C (Appl. Statist.) 65, 649–675.
- [14] Boubacar Maïnassara, Y., Kokonendji, C.C. (2014). On normal stable Tweedie models and power-generalized variance functions of only one component. TEST 23, 585-606.
- [15] Casalis, M. (1996). The $2d+4$ simple quadratic natural exponential families on $\mathbb{R}^d$, Annals of Statistics 24, 1828–1854.
- [16] Cramér, H. (1974). Mathematical Methods of Statistics, Princeton University Press, Princeton.
- [17] Cuenin, J., Jørgensen, B., Kokonendji, C.C. (2016). Simulations of full multivariate Tweedie with flexible dependence structure, Computational Statistics 31, 1477–1492.
- [18] Dey, A.K., Kundu, D. (2009). Discriminating among the log-normal, Weibull, and generalized exponential distributions, IEEE Transactions on Reliability 58, 416–424.
- [19] Feltz, C.J., Miller, G.E. (1996). An asymptotic test for the equality of coefficients of variation from populations, Statistics in Medicine 15, 647–658.
- [20] Fisher, R.A. (1934). The effects of methods of ascertainment upon the estimation of frequencies, Annals of Eugenics 6, 13-25.
- [21] Hayashi, F. (2000). Econometrics, Princeton University Press, URL:http://fhayashi.fc2web.com/hayashi_econometrics.htm, Chapter 10, 665–667.
- [22] Joe, H. (2014). Dependence Modeling with Copulas, Monographs on Statistics and Applied Probability 134, Chapman & Hall - CRC Press, London.
- [23] Johnson, R.A., Wichern, D.W. (2007). Applied Multivariate Statistical Analysis, 6th Edition, Pearson Prentice Hall, New Jersey.
- [24] Jørgensen, B., Kokonendji, C.C. (2016). Discrete dispersion models and their Tweedie asymptotics, AStA Advances in Statistical Analysis 100, 133–153.
- [25] Kokonendji, C.C. (2014). Over- and underdispersion models. In: N. Balakrishnan (Ed.) The Wiley Encyclopedia of Clinical Trials - Methods and Applications of Statistics in Clinical Trials, Vol. 2 (Chap. 30), Wiley, New York, pp. 506-526.
- [26] Kokonendji, C.C., Mizère, D., Balakrishnan, N. (2008). Connections of the Poisson weight function to overdispersion and underdispersion, Journal of Statistical Planning and Inference 138, 1287–1296.
- [27] Kokonendji, C.C., Moypemna Sembona, C.C. (2018). Characterization and classification of multiple stable Tweedie models. Lithuanian Mathematical Journal 58, 441-456.
- [28] Kokonendji, C.C., Puig, P. (2018). Fisher dispersion index for multivariate count distributions: A review and a new proposal, Journal of Multivariate Analysis 165, 180–193.
- [29] Kotz, S., Balakrishnan, N., Johnson, N.L. (2000). Continuous Multivariate Distributions, Wiley, Chichester.
- [30] Marshall, A.W., Olkin, I. (1967). A multivariate exponential distribution, Journal of the American Statistical Association 62, 30–44.
- [31] Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia, Philosophical Transactions of the Royal Society, Series A, 187, 253–318.
- [32] Python Software Foundation. (2019). Python Language Reference, Version 3.7.3, Available at http://www.python.org
- [33] R Core Team. (2018). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna. http://cran.r-project.org/
- [34] Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics, Wiley, New York.
- [35] Su, P. (2015). Generation of Multivariate Data with Arbitrary Marginals: Package, https://cran.r-project.org/web/packages/NORTARA/.
- [36] Teimouri, M., Gupta, A.K. (2011). On a bivariate Weibull distribution, Advances and Applications in Statistics 22, 77–106.
- [37] Touré, A.Y., Dossou-Gbété, S., Kokonendji, C.C. (2019). Asymptotic normality of the test statistics for relative dispersion and relative variation indexes, Submitted for publication.
- [38] Tweedie, M.C.K. (1984). An index which distinguishes between some important exponential families. In: Ghosh, J.K., Roy, J. (eds.) Statistics: Applications and New Directions. Proceedings of the Indian Statistical Golden Jubilee International Conference, Calcutta, pp. 579–604.
