Relative variation indexes for multivariate continuous distributions on $[0,\infty)^k$ and extensions

Célestin C. Kokonendji; Aboubacar Y. Touré; Amadou Sawadogo

arXiv:1906.09485·math.ST·June 25, 2019

TL;DR

This paper introduces new multivariate variation indexes for non-negative distributions, useful for comparing and discriminating between different models based on their deviation from a reference distribution.

Contribution

The paper proposes novel scalar indexes based on quadratic forms of mean and covariance, extending relative variation measures to multivariate continuous distributions on $[0,\infty)^k$.

Findings

1. Indexes effectively discriminate between positive distributions.
2. Asymptotic properties of indexes are established.
3. Numerical examples demonstrate practical applications.

Tables

Table 1: Some values of equivalences (10) for the univariate Weibull distribution.

$\beta$	0.1	0.3	0.5	0.8	1	2	4	10	100
$\beta\,\Gamma(2/\beta)\{\Gamma(1/\beta)\}^{-2}$	92378	15.12	3	1.29	1	0.64	0.54	0.5072	0.5001
$\mathrm{VI}(\beta)$	184755	29.24	5	1.59	1	0.27	0.08	0.0145	0.0002

Table 2: Some scenarios from fifteen simulated bivariate datasets with $\widehat{\mathrm{VI}}_{j}=\widehat{\sigma}_{j}^{2}/\bar{y}_{j}^{2}$ and marginal variations (MV): Over- (O), Equi- (E) and Under-variation (U).

Dataset	$n$	${\bar{y}}_{1}$	${\bar{y}}_{2}$	${\hat{σ}}_{1}^{2}$	${\hat{σ}}_{2}^{2}$	${\hat{VI}}_{1}$	${\hat{VI}}_{2}$	${\hat{ρ}}_{12}$	MV	$\hat{MVI}$	$\hat{GVI}$
No 1	50	2.58	3.80	13.36	31.08	2.02	2.15	$- 0.23$	O/O	2.11	1.00
No 2	80	1.34	1.31	3.98	4.05	2.21	2.36	$- 0.33$	O/O	2.14	0.76
No 3	100	1.94	1.96	3.74	3.76	0.99	0.98	0.20	E/E	0.99	1.98
No 4	120	5.76	5.04	34.97	29.47	1.02	1.01	$- 0.40$	E/E	1.00	0.34
No 5	150	0.27	0.26	0.02	0.04	0.24	0.51	0.56	U/U	0.39	2.28
No 6	300	8.77	8.55	22.87	20.82	0.30	0.28	0.03	U/U	0.25	0.15
No 7	500	5.60	2.10	299.03	20.32	9.53	4.63	$- 0.06$	O/O	7.40	7.32
No 8	800	10.59	10.39	110.44	111.03	0.98	1.02	0.05	E/E	0.98	1.01
No 9	1000	2.66	1.76	2.35	0.41	0.33	0.13	0.78	U/U	0.17	1.02
No 10	1000	6.10	1.74	323.00	0.52	8.68	0.17	0.49	O/U	7.42	7.50
No 11	1500	1.77	3.41	0.44	17.88	0.14	1.55	$- 0.57$	U/O	0.96	0.87
No 12	3000	0.75	0.62	1.62	0.11	2.86	0.28	$- 0.29$	O/U	1.06	0.99
No 13	3000	1.00	1.09	1.00	3.95	1.00	3.33	$- 0.29$	E/O	1.19	0.82
No 14	5000	0.68	1.00	0.42	1.02	0.89	1.02	0.80	U/E	0.98	2.23
No 15	8000	1.98	3.37	3.83	17.97	0.98	1.58	$- 0.38$	E/O	1.33	0.99

Table 3: Summary of real $4$-variate data with $\widehat{\mathrm{VI}}_{j}=\widehat{\sigma}_{j}^{2}/\bar{y}_{j}^{2}$ and the marginal variation (MV$_j$): Under-variation (U); size $n=90$, $\det\widehat{\boldsymbol{\rho}}=0.0003$, $\widehat{\mathrm{GVI}}=0.1397$ and $\widehat{\mathrm{MVI}}=0.0771$.

$j$	${\bar{y}}_{j}$	${\hat{σ}}_{j}^{2}$	${\hat{VI}}_{j}$ (MV_j)	${\hat{ρ}}_{j 1}$	${\hat{ρ}}_{j 2}$	${\hat{ρ}}_{j 3}$	${\hat{ρ}}_{j 4}$
1	4.1476	1.9630	0.1141 (U)	1.0000	0.9579	0.9905	0.3926
2	3.1709	0.6049	0.0602 (U)	0.9579	1.0000	0.9552	0.6002
3	2.2610	0.6330	0.1238 (U)	0.9905	0.9552	1.0000	0.4331
4	4.5547	8.4074	0.4053 (U)	0.3926	0.6002	0.4331	1.0000

Table 4: Summary of simulated $6$-variate data with marginal variations (MV): Over- (O), Equi- (E) and Under-variation (U); size $n=560$ and $\det\widehat{\boldsymbol{\rho}}=0.2063$, such that $\widehat{\mathrm{GVI}}=1.0572\approx 1$ and $\widehat{\mathrm{MVI}}=0.9637\approx 1$.

$j$	${\bar{y}}_{j}$	${\hat{σ}}_{j}^{2}$	MV_j	${\hat{ρ}}_{j 1}$	${\hat{ρ}}_{j 2}$	${\hat{ρ}}_{j 3}$	${\hat{ρ}}_{j 4}$	${\hat{ρ}}_{j 5}$	${\hat{ρ}}_{j 6}$
1	1.2245	3.2031	O	1.0000	$- 0.0197$	0.5572	$0.1074$	0.2939	$- 0.5586$
2	0.4929	0.2324	E	$- 0.0197$	1.0000	$- 0.0683$	$- 0.1078$	0.3293	0.0102
3	1.1548	0.4834	U	0.5572	$- 0.0683$	1.0000	0.3136	0.3116	$- 0.5946$
4	0.9507	0.9236	E	0.1074	$- 0.1078$	0.3136	1.0000	0.1451	0.0264
5	4.3871	28.6346	O	0.2939	0.3293	0.3116	0.1451	1.0000	0.0310
6	0.9093	0.1039	U	$- 0.5586$	0.0102	$- 0.5946$	0.0264	0.0310	1.0000

Table 5: Asymptotic variances and confidence intervals ($u=u_{0.975}=1.96$) from subsamples of simulated $6$-variate data with $n=10\,000$, having the same parameters as for Table 4.

$n$	$\det\hat{\boldsymbol{\rho}}$	$\hat{\sigma}_{gvi}^{2}$	$\hat{\sigma}_{mvi}^{2}$	$\widehat{\mathrm{GVI}}_{n}\pm u\,\hat{\sigma}_{gvi}/\sqrt{n}$	$\widehat{\mathrm{MVI}}_{n}\pm u\,\hat{\sigma}_{mvi}/\sqrt{n}$
50	0.2030	19163.96	19056.40	1.2533 $\pm$ 38.3712	1.2395 $\pm$ 38.2634
100	0.1571	36433.89	36299.33	0.9571 $\pm$ 37.4111	0.9189 $\pm$ 37.3420
300	0.2092	28413.48	28229.35	1.2442 $\pm$ 19.0743	1.1788 $\pm$ 19.0124
500	0.2050	21789.08	21618.11	1.1487 $\pm$ 12.9385	1.0753 $\pm$ 12.8876
1 000	0.1958	14589.39	14448.38	1.0147 $\pm$ 7.4863	0.9366 $\pm$ 7.4500
3 000	0.2017	17982.94	17800.34	1.1648 $\pm$ 4.7986	1.0888 $\pm$ 4.7742
5 000	0.2067	18892.96	18688.16	1.2188 $\pm$ 3.8099	1.1316 $\pm$ 3.7892
10 000	0.2067	17558.26	17354.58	1.1714 $\pm$ 2.5971	1.0818 $\pm$ 2.5820

Table 6: Asymptotic variances and confidence intervals ($u=u_{0.975}=1.96$) from subsamples of simulated over-varied $4$-variate data with $n=10\,000$.

$n$	$\det\hat{\boldsymbol{\rho}}$	$\hat{\sigma}_{gvi}^{2}$	$\hat{\sigma}_{mvi}^{2}$	$\widehat{\mathrm{GVI}}_{n}\pm u\,\hat{\sigma}_{gvi}/\sqrt{n}$	$\widehat{\mathrm{MVI}}_{n}\pm u\,\hat{\sigma}_{mvi}/\sqrt{n}$
50	0.3209	6524.25	3150.98	4.1477 $\pm$ 22.3887	2.4510 $\pm$ 15.5592
100	0.3452	1697.28	1190.33	3.1632 $\pm$ 8.0747	1.8721 $\pm$ 6.7621
300	0.6915	5014.56	3877.69	3.5238 $\pm$ 8.0132	2.7728 $\pm$ 7.0465
500	0.7071	6547.90	1803.12	2.8285 $\pm$ 7.0927	2.1544 $\pm$ 3.7220
1 000	0.6490	5911.86	1631.04	2.7014 $\pm$ 4.7655	1.9614 $\pm$ 2.5031
3 000	0.6582	4498.76	901.81	2.4906 $\pm$ 2.4001	1.7832 $\pm$ 1.0746
5 000	0.5998	5239.97	1542.03	2.8828 $\pm$ 2.0064	1.9242 $\pm$ 1.0885
10 000	0.6069	5274.03	1200.05	2.7298 $\pm$ 1.4234	1.8337 $\pm$ 0.6790

Table 7: Asymptotic variances and confidence intervals ($u=u_{0.975}=1.96$) from subsamples of simulated under-varied trivariate data with $n=10\,000$.

$n$	$\det\hat{\boldsymbol{\rho}}$	$\hat{\sigma}_{gvi}^{2}$	$\hat{\sigma}_{mvi}^{2}$	$\widehat{\mathrm{GVI}}_{n}\pm u\,\hat{\sigma}_{gvi}/\sqrt{n}$	$\widehat{\mathrm{MVI}}_{n}\pm u\,\hat{\sigma}_{mvi}/\sqrt{n}$
50	0.9174	200.3427	180.0472	0.7795 $\pm$ 3.9233	0.8794 $\pm$ 3.7193
100	0.9634	77.9392	67.7172	0.7354 $\pm$ 1.7303	0.8242 $\pm$ 1.6129
300	0.9551	76.8087	69.3033	0.6743 $\pm$ 0.9917	0.7955 $\pm$ 0.9420
500	0.9446	65.1680	58.5276	0.6174 $\pm$ 0.7076	0.7309 $\pm$ 0.6706
1 000	0.9281	49.6498	44.1097	0.5368 $\pm$ 0.4367	0.6490 $\pm$ 0.4116
3 000	0.9262	34.0762	29.2322	0.4619 $\pm$ 0.2089	0.5675 $\pm$ 0.1935
5 000	0.9221	32.6305	28.0190	0.4529 $\pm$ 0.1583	0.5661 $\pm$ 0.1467
10 000	0.9195	38.7897	33.6378	0.4980 $\pm$ 0.1221	0.6161 $\pm$ 0.1137

Table 8: Asymptotic and bootstrap confidence intervals for the GVI and MVI indexes from samples of size $n$, with $u=u_{0.975}=1.96$, using the dataset of Table 4.

$n$	$\widehat{\mathrm{GVI}}_{n}\pm u\,\hat{\sigma}_{gvi}/\sqrt{n}$	Bootstrap ($\widehat{\mathrm{GVI}}\pm u\,\hat{\sigma}/\sqrt{n}$)	$\widehat{\mathrm{MVI}}_{n}\pm u\,\hat{\sigma}_{mvi}/\sqrt{n}$	Bootstrap ($\widehat{\mathrm{MVI}}\pm u\,\hat{\sigma}/\sqrt{n}$)
30	1.0154 $\pm$ 38.1119	0.9603 $\pm$ 0.0869	0.9798 $\pm$ 37.9062	0.9656 $\pm$ 0.0851
50	1.0110 $\pm$ 36.1508	0.9604 $\pm$ 0.0498	1.0407 $\pm$ 36.0743	1.0149 $\pm$ 0.0486
100	1.0241 $\pm$ 26.1398	1.0119 $\pm$ 0.0359	0.9950 $\pm$ 26.0580	0.9620 $\pm$ 0.0352
300	0.9715 $\pm$ 23.5589	1.0416 $\pm$ 0.0310	1.0703 $\pm$ 23.4215	1.0409 $\pm$ 0.0305
500	1.1679 $\pm$ 15.6334	1.0229 $\pm$ 0.0172	1.1648 $\pm$ 15.5693	1.0150 $\pm$ 0.0169
1 000	1.1994 $\pm$ 9.0242	1.0315 $\pm$ 0.0095	1.1952 $\pm$ 8.9738	1.0278 $\pm$ 0.0093

Equations

\mathrm{cov}\boldsymbol{Y}=(\mathrm{diag}\!\sqrt{\mathrm{var}\boldsymbol{Y}})\,\boldsymbol{\rho}_{\boldsymbol{Y}}\,(\mathrm{diag}\!\sqrt{\mathrm{var}\boldsymbol{Y}}),

GVI (Y) = \frac{E Y ^{⊤} ( cov Y ) E Y}{( E Y ^{⊤} E Y ) ^{2}};

RVI_{X} (Y) := \frac{GVI ( Y )}{GVI ( X )} ⪌ 1;

m \mapsto GVI_{F_{Y}} (m) = \frac{m ^{⊤} { V _{F_{Y}} ( m )} m}{( m ^{⊤} m ) ^{2}}

\mathrm{GVI}(\boldsymbol{Y})=\frac{\{(\mathrm{diag}\!\sqrt{\mathrm{var}\boldsymbol{Y}})\mathbb{E}\boldsymbol{Y}\}^{\top}\boldsymbol{\rho}_{\boldsymbol{Y}}\,\{(\mathrm{diag}\!\sqrt{\mathrm{var}\boldsymbol{Y}})\mathbb{E}\boldsymbol{Y}\}}{\left\{[(\mathrm{diag}\sqrt{\mathbb{E}\boldsymbol{Y}})\!\sqrt{\mathbb{E}\boldsymbol{Y}}]^{\top}\boldsymbol{I}_{k}[(\mathrm{diag}\sqrt{\mathbb{E}\boldsymbol{Y}})\!\sqrt{\mathbb{E}\boldsymbol{Y}}]\right\}^{2}}.

Y \sim E_{k} (μ, ρ) \Rightarrow GVI (Y) = 1 + \frac{μ ^{- ⊤} ( ρ - I _{k} ) μ ^{- 1}}{( μ ^{- ⊤} μ ^{- 1} ) ^{2}} ⪌ 1,

GVI (Y) = 1 + \frac{2 ρ μ _{1}^{- 2} μ _{2}^{- 2}}{( μ _{1}^{- 2} + μ _{2}^{- 2} ) ^{2}} ⪌ 1 \Leftrightarrow ρ ⪌ 0.

MVI (Y) = \frac{E Y ^{⊤} ( diag var Y ) E Y}{( E Y ^{⊤} E Y ) ^{2}} = \sum_{j = 1}^{k} \frac{( E Y _{j} ) ^{4}}{( E Y ^{⊤} E Y ) ^{2}} VI (Y_{j}) .

m \mapsto MVI_{F_{Y}} (m) = \frac{m ^{⊤} { diag V _{F_{Y}} ( m )} m}{( m ^{⊤} m ) ^{2}} .

GVI (Y_{1}, Y_{2}) = MVI (Y_{1}, Y_{2}) + ρ \frac{2 ( E Y _{1} ) ^{2} ( E Y _{2} ) ^{2} \sqrt{VI ( Y _{1} ) VI ( Y _{2} )}}{\{( E Y _{1} ) ^{2} + ( E Y _{2} ) ^{2} \} ^{2}},

Y_{j} := \frac{U _{j} + V _{j}}{U _{j} + V _{1} + V _{2} + W} \sim B_{1} (α_{j} + α_{j}^{'}, α_{0} + α_{1}^{'} + α_{2}^{'} - α_{j}^{'}) .

E Y_{j} = \frac{α _{j} + α _{j}^{'}}{α _{0} + α _{1}^{'} + α _{2}^{'} + α _{j}} > 0

VI (Y_{j})

f_{Y} (y_{1}, y_{2})

E Y_{j} = α_{j} Γ (1 + 1/ β_{j}),

VI (Y_{j}) = \frac{Γ ( 1 + 2/ β _{j} )}{{ Γ ( 1 + 1/ β _{j} ) } ^{2}} - 1

ρ = ρ (Y_{1}, Y_{2})

0 < VI (Y_{j}) ⪌ 1 ⟺ (i) 0 < β_{j} Γ (2/ β_{j}) {Γ (1/ β_{j})}^{- 2} ⪌ 1 ⟺ (ii) 0 < β_{j} ⪋ 1,

GVI (Y) = 1 + \frac{μ _{0} \sum _{j = 1}^{k} ( μ _{j} + μ _{0} ) ^{- 1} \{ \sum _{ℓ \neq j} ( μ _{j} + μ _{ℓ} + μ _{0} ) ^{- 1} ( μ _{ℓ} + μ _{0} ) ^{- 1} \}}{\{( μ _{1} + μ _{0} ) ^{- 2} + \dots + ( μ _{k} + μ _{0} ) ^{- 2} \} ^{2}} \geq 1 (\Leftrightarrow μ_{0} \geq 0),

MVI (Y) = \frac{\sum _{j = 1}^{k} ( μ _{j} + μ _{0} ) ^{- 4}}{\sum _{j = 1}^{k} ( μ _{j} + μ _{0} ) ^{- 4} + 2 \sum _{1 \leq j < ℓ \leq k} ( μ _{j} + μ _{0} ) ^{- 2} ( μ _{ℓ} + μ _{0} ) ^{- 2}} < 1.

GVI_{F_{p}} (m) = λ^{1 - p_{1}} m_{1}^{p_{1} - 2} + \frac{\sum _{j = 2}^{k} m _{1}^{1 - p_{j}} m _{j}^{p_{j} + 2}}{( m ^{⊤} m ) ^{2}} > 0

MVI_{F_{p}} (m) = \frac{λ ^{1 - p_{1}} m _{1}^{p_{1} - 2} \sum _{j = 1}^{k} m _{j}^{4} + \sum _{j = 2}^{k} m _{1}^{1 - p_{j}} m _{j}^{p_{j} + 2}}{\sum _{j = 1}^{k} m _{j}^{4} + \sum _{1 \leq j < ℓ \leq k} m _{j}^{2} m _{ℓ}^{2}} > 0.

\overline{Y}_{n} = \frac{1}{n} \sum_{i = 1}^{n} Y_{i} = (\overline{Y}_{1}, \dots, \overline{Y}_{k})^{⊤} \quad\text{and}\quad \widehat{cov} Y = \frac{1}{n - 1} \sum_{i = 1}^{n} (Y_{i} - \overline{Y}_{n}) (Y_{i} - \overline{Y}_{n})^{⊤}

GVI_{n} (Y) = \frac{\overline{Y} _{n}^{⊤} ( \widehat{cov} Y ) \overline{Y} _{n}}{( \overline{Y} _{n}^{⊤} \overline{Y} _{n} ) ^{2}} .

\sqrt{n} \{GVI_{n} (Y) - GVI (Y)\} ⇝ N (0, σ_{g v i}^{2}),

Γ = \begin{pmatrix} Σ & Γ_{3} \\ Γ_{3}^{⊤} & Γ_{4} \end{pmatrix}

Δ_{j} = \left\{ 2 \sum_{j^{'} = 1}^{k} E Y_{j^{'}} cov (Y_{j}, Y_{j^{'}}) - 4 E Y_{j} \sum_{j^{'} = 1}^{k} (E Y_{j^{'}})^{2} GVI (Y) \right\} / (E Y^{⊤} E Y)^{2},

\sqrt{n} \{MVI_{n} (Y) - MVI (Y)\} ⇝ N (0, σ_{m v i}^{2}),

Π = \begin{pmatrix} Σ & Π_{3} \\ Π_{3}^{⊤} & Π_{4} \end{pmatrix}

Λ_{j} = \left\{ 2 E Y_{j} var Y_{j} - 4 E Y_{j} \sum_{j^{'} = 1}^{k} (E Y_{j^{'}})^{2} MVI (Y) \right\} / (E Y^{⊤} E Y)^{2},

Full text

Célestin C. Kokonendji

Laboratoire de mathématiques de Besançon, Université Bourgogne Franche-Comté, Besançon, France

Aboubacar Y. Touré

Amadou Sawadogo

UFR de Mathématiques et Informatique, Université Félix Houphouët Boigny, 22 BP 582 Abidjan 22, Côte d’Ivoire

Abstract

We introduce some new indexes to measure the departure of any multivariate continuous distribution on the non-negative orthant from a given reference one, such as the uncorrelated exponential model, similar to the relative Fisher dispersion indexes of multivariate count models. The proposed multivariate variation indexes are scalar quantities, defined as ratios of two quadratic forms of the mean vector and the covariance matrix. They can be used to discriminate between continuous positive distributions. Generalized and multiple marginal variation indexes, with and without correlation structure respectively, and their relative extensions are discussed. The asymptotic behavior and other properties are studied. Illustrative examples and numerical applications are analyzed under several scenarios, leading to appropriate choices of multivariate models. Some concluding remarks and possible extensions are given.

Keywords:

Dependence; Equi-variation; Multivariate exponential distribution; Over-variation; Under-variation.

2010 Mathematics Subject Classification: 62E10, 62F10, 62H05, 62H12, 62H99, 62-07.

Journal: arXiv and then for publication in impacted journal.

1 Introduction

The choice of a multivariate model from a dataset is not an easy task (e.g., Kotz et al., 2000; Joe, 2014). In practice, we sometimes need simple and effective indicators of multivariate distribution classes in this jungle. They must be appropriate summaries of the multivariate dataset.

After the Gaussian distribution, and similarly to the Poisson distribution for count models (e.g., Kokonendji, 2014), the exponential distribution on the positive half real line is probably the most common probability distribution for this support. It is a particular case of many families, for instance the gamma and Weibull distributions, and it has a wide range of statistical applications in many fields such as reliability; see, e.g., the monograph of Balakrishnan and Basu (1995) for a review. In the multivariate setting, there is no unique way to define a multivariate exponential distribution; see, e.g., Basu (1988) and Cuenin et al. (2016).

Recently, Abid et al. (2019a,b,c) introduced the variation index (VI) for measuring the departure of any absolutely continuous probability distribution concentrated on the non-negative half real line from the equi-varied exponential model. Defined as the ratio of the variance to the squared mean, and thus equal to the square of the well-known coefficient of variation (Pearson, 1896), the so-called Jørgensen variation index (or simply VI) makes it possible to classify univariate continuous distributions as over- or under-varied with respect to the exponential distribution, and to make inference; see Touré et al. (2019). Since the univariate concepts of VI and of the well-known Fisher (1934) dispersion index with respect to the equidispersed Poisson model are similar (e.g., Touré et al., 2019), we first suggest a useful and appropriate definition of multivariate over-, equi- and under-variation, following the multivariate dispersion indexes of Kokonendji and Puig (2018). Then, we propose an extension unifying multivariate dispersion and multivariate variation indexes in the framework of natural exponential families.
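As a quick numerical illustration of the univariate VI (a sketch, not the authors' code; the sample size, seed and gamma shapes are arbitrary choices), the empirical VI is the sample variance divided by the squared sample mean. It should be close to 1 for exponential data whatever the scale, and close to $1/a$ for gamma data with shape $a$, i.e., over-varied for $a<1$ and under-varied for $a>1$.

```python
import numpy as np

def variation_index(sample):
    """Empirical variation index: sample variance divided by the squared sample mean."""
    x = np.asarray(sample, dtype=float)
    return x.var(ddof=1) / x.mean() ** 2

rng = np.random.default_rng(2019)
n = 200_000

# Exponential data are equi-varied: VI = 1 for any scale.
vi_exp = variation_index(rng.exponential(scale=2.5, size=n))
# Gamma data with shape a have VI = 1/a.
vi_over = variation_index(rng.gamma(shape=0.5, scale=1.0, size=n))   # theoretical VI = 2
vi_under = variation_index(rng.gamma(shape=4.0, scale=1.0, size=n))  # theoretical VI = 0.25

print(f"exponential: {vi_exp:.3f}, gamma(0.5): {vi_over:.3f}, gamma(4): {vi_under:.3f}")
```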

The rest of the paper is organized as follows. Section 2 presents notation, generalized and relative variation indexes, and their interpretation and properties for practical handling. Section 3 illustrates calculations of these measures on some usual bivariate and multivariate continuous distributions on the positive orthant, such as the beta, exponential and Weibull distributions. Section 4 provides asymptotic properties of the corresponding estimators. Section 5 presents applications to real-life and simulated continuous (non-negative orthant) datasets under several scenarios, together with some simulation studies. Section 6 concludes with some remarks and a unified variability index which includes all multivariate dispersion and variation indexes. To make the paper self-contained and more understandable, three appendices are added: (A) a broader multivariate exponential distribution derived from Cuenin et al. (2016), (B) a construction of the generalized VI deduced from Albert and Zhang (2010), and (C) proofs of the asymptotic results adapted from Kokonendji and Puig (2018).

2 Multivariate variation indexes

Let $\boldsymbol{Y}=(Y_{1},\ldots,Y_{k})^{\top}$ be a non-negative continuous $k$-variate random vector on $[0,\infty)^{k}$, $k\geq 1$. We use the following notation: $\sqrt{\mathrm{var}\boldsymbol{Y}}=(\sqrt{\mathrm{var}Y_{1}},\ldots,\sqrt{\mathrm{var}Y_{k}})^{\top}$ is the elementwise square root of the variance vector of $\boldsymbol{Y}$; $\mathrm{diag}\sqrt{\mathrm{var}\boldsymbol{Y}}=\mathrm{diag}_{k}(\sqrt{\mathrm{var}Y_{j}})$ is the $k\times k$ diagonal matrix with diagonal entries $\sqrt{\mathrm{var}Y_{j}}$ and $0$ elsewhere; and $\mathrm{cov}\boldsymbol{Y}=(\mathrm{cov}(Y_{i},Y_{j}))_{i,j\in\{1,\ldots,k\}}$ denotes the covariance matrix of $\boldsymbol{Y}$, a $k\times k$ symmetric matrix with entries $\mathrm{cov}(Y_{i},Y_{j})$ such that $\mathrm{cov}(Y_{i},Y_{i})=\mathrm{var}Y_{i}$ is the variance of $Y_{i}$. Then

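$$\mathrm{cov}\boldsymbol{Y}=(\mathrm{diag}\sqrt{\mathrm{var}\boldsymbol{Y}})\,\boldsymbol{\rho}_{\boldsymbol{Y}}\,(\mathrm{diag}\sqrt{\mathrm{var}\boldsymbol{Y}}),\tag{1}$$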

where $\boldsymbol{\rho}_{\boldsymbol{Y}}=\boldsymbol{\rho}(\boldsymbol{Y})$ is the correlation matrix of $\boldsymbol{Y}$; see, e.g., Johnson and Wichern (2007, Eq. 2-36). Note that there are infinitely many multivariate distributions with exponential margins. We denote a generic $k$-variate exponential distribution by $\mathscr{E}_{k}(\boldsymbol{\mu},\boldsymbol{\rho})$, with given positive mean vector $\boldsymbol{\mu}^{-1}:=(\mu_{1}^{-1},\ldots,\mu_{k}^{-1})^{\top}$ and correlation matrix $\boldsymbol{\rho}=(\rho_{ij})_{i,j\in\{1,\ldots,k\}}$; see, e.g., Appendix A for a broader one. The uncorrelated or independent $k$-variate exponential distribution will be written as $\mathscr{E}_{k}(\boldsymbol{\mu})$, i.e., with $\boldsymbol{\rho}=\boldsymbol{I}_{k}$, the $k\times k$ identity matrix.
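The decomposition of $\mathrm{cov}\boldsymbol{Y}$ into marginal standard deviations and the correlation matrix $\boldsymbol{\rho}_{\boldsymbol{Y}}$ can be checked numerically. The sketch below (illustrative values; the helper names `cov_from_corr` and `corr_from_cov` are hypothetical) builds $\mathrm{cov}\boldsymbol{Y}$ from a correlation matrix and marginal variances, then recovers $\boldsymbol{\rho}_{\boldsymbol{Y}}$ by inverting the diagonal scaling.

```python
import numpy as np

def cov_from_corr(corr, variances):
    """cov Y = diag(sqrt var Y) @ rho_Y @ diag(sqrt var Y)."""
    d = np.diag(np.sqrt(np.asarray(variances, dtype=float)))
    return d @ np.asarray(corr, dtype=float) @ d

def corr_from_cov(cov):
    """rho_Y = D^{-1} @ cov Y @ D^{-1}, with D = diag(sqrt var Y)."""
    cov = np.asarray(cov, dtype=float)
    d_inv = np.diag(1.0 / np.sqrt(np.diag(cov)))
    return d_inv @ cov @ d_inv

rho = np.array([[1.0, 0.3], [0.3, 1.0]])
variances = np.array([4.0, 0.25])      # standard deviations 2 and 0.5
cov = cov_from_corr(rho, variances)    # off-diagonal entry: 0.3 * 2 * 0.5 = 0.3
print(cov)
print(np.allclose(corr_from_cov(cov), rho))
```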

2.1 Basic definitions

Proceeding along similar lines as Albert and Zhang (2010) and also as Kokonendji and Puig (2018), we define the generalized variation index of $\boldsymbol{Y}$ by

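$$\mathrm{GVI}(\boldsymbol{Y})=\frac{\mathbb{E}\boldsymbol{Y}^{\top}(\mathrm{cov}\boldsymbol{Y})\,\mathbb{E}\boldsymbol{Y}}{(\mathbb{E}\boldsymbol{Y}^{\top}\mathbb{E}\boldsymbol{Y})^{2}};\tag{2}$$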

see Appendix B for its construction. Note that when $k=1$, $\mathrm{GVI}$ reduces to the univariate variation index VI (Abid et al., 2019b). For two continuous random vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$ on the same support $\mathbb{S}=[0,\infty)^{k}$ with $\mathbb{E}\boldsymbol{X}=\mathbb{E}\boldsymbol{Y}$ and $\mathrm{GVI}(\boldsymbol{X})>0$, the relative (generalized) variation index is defined by

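$$\mathrm{RVI}_{\boldsymbol{X}}(\boldsymbol{Y}):=\frac{\mathrm{GVI}(\boldsymbol{Y})}{\mathrm{GVI}(\boldsymbol{X})}\gtreqqless 1;\tag{3}$$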

that is, $\boldsymbol{Y}$ is over-varied (equi-varied, under-varied) compared to $\boldsymbol{X}$, denoted by $\boldsymbol{Y}\succ\boldsymbol{X}$ ($\boldsymbol{Y}\asymp\boldsymbol{X}$, $\boldsymbol{Y}\prec\boldsymbol{X}$), if $\mathrm{GVI}(\boldsymbol{Y})>\mathrm{GVI}(\boldsymbol{X})$ ($\mathrm{GVI}(\boldsymbol{Y})=\mathrm{GVI}(\boldsymbol{X})$, $\mathrm{GVI}(\boldsymbol{Y})<\mathrm{GVI}(\boldsymbol{X})$, respectively). In the framework of the natural exponential family $F_{\boldsymbol{Y}}$ on $[0,\infty)^{k}$ (e.g., Chapter 54 in Kotz et al., 2000) generated by the distribution of $\boldsymbol{Y}$ and characterized by its variance function $\boldsymbol{m}\mapsto\mathbf{V}_{F_{\boldsymbol{Y}}}(\boldsymbol{m})$, the index $\mathrm{GVI}$ can also be rewritten in terms of $\boldsymbol{m}$ and $\mathbf{V}_{F_{\boldsymbol{Y}}}(\boldsymbol{m})$. As for (2), the “generalized variation function”, defined from the mean domain $\mathbf{M}_{F_{\boldsymbol{Y}}}\subseteq(0,\infty)^{k}$ to $(0,\infty)$ by

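$$\boldsymbol{m}\mapsto\mathrm{GVI}_{F_{\boldsymbol{Y}}}(\boldsymbol{m})=\frac{\boldsymbol{m}^{\top}\mathbf{V}_{F_{\boldsymbol{Y}}}(\boldsymbol{m})\,\boldsymbol{m}}{(\boldsymbol{m}^{\top}\boldsymbol{m})^{2}},\tag{4}$$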

appears to be very useful through this parameterization.
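A minimal sketch of definitions (2) and (3) follows, taking the mean vector and covariance matrix as given (in practice they would be estimated). The function names `gvi` and `rvi` are illustrative and the numbers are made up. The first call checks that GVI reduces to $\mathrm{var}\,Y/(\mathbb{E}Y)^2$ when $k=1$; the second compares two vectors with the same means and marginal variances, one of which has a positive correlation.

```python
import numpy as np

def gvi(mean, cov):
    """Generalized variation index (2): m^T (cov Y) m / (m^T m)^2, with m = E Y."""
    m = np.asarray(mean, dtype=float)
    c = np.atleast_2d(np.asarray(cov, dtype=float))
    return float(m @ c @ m) / float(m @ m) ** 2

def rvi(mean, cov_y, cov_x):
    """Relative variation index (3) of Y with respect to X, assuming E X = E Y = mean."""
    return gvi(mean, cov_y) / gvi(mean, cov_x)

# k = 1: var / mean^2 = 3 / 2^2 = 0.75.
print(gvi([2.0], [[3.0]]))

mean = [1.0, 2.0]
cov_x = [[1.0, 0.0], [0.0, 4.0]]   # uncorrelated components
cov_y = [[1.0, 1.0], [1.0, 4.0]]   # same variances, correlation 0.5
print(rvi(mean, cov_y, cov_x))     # (17 + 4) / 17 > 1: Y is over-varied relative to X
```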

2.2 Interpretation and properties

Concerning the interpretation of GVI, we first express the denominator of (2) through $\mathbb{E}\boldsymbol{Y}^{\top}\mathbb{E}\boldsymbol{Y}=\sqrt{\mathbb{E}\boldsymbol{Y}}^{\top}\,(\mathrm{diag}\,\mathbb{E}\boldsymbol{Y})\sqrt{\mathbb{E}\boldsymbol{Y}}$ and then use (1) to rewrite $\mathrm{cov}\boldsymbol{Y}$, obtaining

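$$\mathrm{GVI}(\boldsymbol{Y})=\frac{\{(\mathrm{diag}\sqrt{\mathrm{var}\boldsymbol{Y}})\,\mathbb{E}\boldsymbol{Y}\}^{\top}\boldsymbol{\rho}_{\boldsymbol{Y}}\,\{(\mathrm{diag}\sqrt{\mathrm{var}\boldsymbol{Y}})\,\mathbb{E}\boldsymbol{Y}\}}{\left\{[(\mathrm{diag}\sqrt{\mathbb{E}\boldsymbol{Y}})\sqrt{\mathbb{E}\boldsymbol{Y}}]^{\top}\boldsymbol{I}_{k}\,[(\mathrm{diag}\sqrt{\mathbb{E}\boldsymbol{Y}})\sqrt{\mathbb{E}\boldsymbol{Y}}]\right\}^{2}}.\tag{5}$$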

From (5), it is clear that $\mathrm{GVI}(\boldsymbol{Y})$ makes it possible to compare the full variability of $\boldsymbol{Y}$ (in the numerator) with respect to its expected uncorrelated exponential variability (in the denominator) which depends only on $\mathbb{E}\boldsymbol{Y}$ .
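The passage from (2) to (5) is purely algebraic: by (1), $\mathbb{E}\boldsymbol{Y}^{\top}(\mathrm{cov}\boldsymbol{Y})\mathbb{E}\boldsymbol{Y}=\{(\mathrm{diag}\sqrt{\mathrm{var}\boldsymbol{Y}})\mathbb{E}\boldsymbol{Y}\}^{\top}\boldsymbol{\rho}_{\boldsymbol{Y}}\{(\mathrm{diag}\sqrt{\mathrm{var}\boldsymbol{Y}})\mathbb{E}\boldsymbol{Y}\}$, and $(\mathrm{diag}\sqrt{\mathbb{E}\boldsymbol{Y}})\sqrt{\mathbb{E}\boldsymbol{Y}}=\mathbb{E}\boldsymbol{Y}$ elementwise. The sketch below checks both facts, together with the denominator identity used above, on a randomly generated mean vector and positive-definite covariance matrix (arbitrary test values, not data from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
m = rng.uniform(0.5, 3.0, size=k)            # arbitrary mean vector E Y
a = rng.normal(size=(k, k))
cov = a @ a.T + k * np.eye(k)                # arbitrary positive-definite cov Y

sd = np.sqrt(np.diag(cov))
rho = cov / np.outer(sd, sd)                 # correlation matrix rho_Y

# Ratio form (2).
gvi_2 = (m @ cov @ m) / (m @ m) ** 2

# Rewritten form (5): numerator uses (diag sqrt var Y) E Y and rho_Y;
# denominator uses (diag sqrt E Y) sqrt E Y, which equals E Y elementwise.
u = np.diag(sd) @ m
w = np.diag(np.sqrt(m)) @ np.sqrt(m)
gvi_5 = (u @ rho @ u) / (w @ np.eye(k) @ w) ** 2

# Denominator identity: E Y^T E Y = sqrt(E Y)^T diag(E Y) sqrt(E Y).
lhs = m @ m
rhs = np.sqrt(m) @ np.diag(m) @ np.sqrt(m)
print(np.isclose(gvi_2, gvi_5), np.isclose(lhs, rhs))   # True True
```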

Next, the $\mathrm{GVI}$ index can be considered in itself as a notion of $k$ -variate over-, equi- and under-variation.

Proposition 1.

For any positive continuous random vector $\boldsymbol{Y}$ on $(0,\infty)^{k}$, $k\geq 1$, and $\boldsymbol{X}\sim\mathscr{E}_{k}(\boldsymbol{\mu})$, one has $\mathrm{RVI}_{\boldsymbol{X}}(\boldsymbol{Y})=\mathrm{GVI}(\boldsymbol{Y})$. Furthermore, one has

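$$\boldsymbol{Y}\sim\mathscr{E}_{k}(\boldsymbol{\mu},\boldsymbol{\rho})\Rightarrow\mathrm{GVI}(\boldsymbol{Y})=1+\frac{\boldsymbol{\mu}^{-\top}(\boldsymbol{\rho}-\boldsymbol{I}_{k})\boldsymbol{\mu}^{-1}}{(\boldsymbol{\mu}^{-\top}\boldsymbol{\mu}^{-1})^{2}}\gtreqqless 1,\tag{6}$$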

with $\boldsymbol{\mu}^{-\top}:=(\boldsymbol{\mu}^{-1})^{\top}=(\mu_{1}^{-1},\ldots,\mu_{k}^{-1})$ .

Proof. It is trivial from (5), with $\mathrm{GVI}(\boldsymbol{Y})=1$ if and only if $\boldsymbol{\rho}=\boldsymbol{I}_{k}$ , i.e., $\boldsymbol{Y}\sim\mathscr{E}_{k}(\boldsymbol{\mu})$ . $\blacksquare$

From Proposition 1, the multivariate exponential model $\boldsymbol{Y}\sim\mathscr{E}_{k}(\boldsymbol{\mu},\boldsymbol{\rho})$ can be over-, equi- or under-varied (with respect to the uncorrelated exponential) according to its correlation structure. For instance, if $k=2$ then (6) clearly gives the one-to-one relationship, viz.

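$$\mathrm{GVI}(\boldsymbol{Y})=1+\frac{2\rho\,\mu_{1}^{-2}\mu_{2}^{-2}}{(\mu_{1}^{-2}+\mu_{2}^{-2})^{2}}\gtreqqless 1\Leftrightarrow\rho\gtreqqless 0.$$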

Finally, if we only want to take into account the variation information coming from the margins, we can modify GVI by replacing $\mathrm{cov}\boldsymbol{Y}$ in (2) with $\mathrm{diag}\,\mathrm{var}\boldsymbol{Y}$ , that is $\boldsymbol{\rho}=\boldsymbol{I}_{k}$ in (1), obtaining the “multiple marginal variation index”, viz.

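$$\mathrm{MVI}(\boldsymbol{Y})=\frac{\mathbb{E}\boldsymbol{Y}^{\top}(\mathrm{diag}\,\mathrm{var}\boldsymbol{Y})\,\mathbb{E}\boldsymbol{Y}}{(\mathbb{E}\boldsymbol{Y}^{\top}\mathbb{E}\boldsymbol{Y})^{2}}=\sum_{j=1}^{k}\frac{(\mathbb{E}Y_{j})^{4}}{(\mathbb{E}\boldsymbol{Y}^{\top}\mathbb{E}\boldsymbol{Y})^{2}}\,\mathrm{VI}(Y_{j}).\tag{7}$$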

The expression on the right-hand side of (7) provides a representation of MVI as a weighted average of the univariate variation indexes VI of the components. MVI could be used for exploring profile distributions in multiple positive response regression models (Bonat and Jørgensen, 2016) or in multivariate continuous time series. Similarly to (4), the corresponding “multiple marginal variation function” is defined on the mean domain $\mathbf{M}_{F_{\boldsymbol{Y}}}\subseteq(0,\infty)^{k}$ to $(0,\infty)$ by

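$$\boldsymbol{m}\mapsto\mathrm{MVI}_{F_{\boldsymbol{Y}}}(\boldsymbol{m})=\frac{\boldsymbol{m}^{\top}\{\mathrm{diag}\,\mathbf{V}_{F_{\boldsymbol{Y}}}(\boldsymbol{m})\}\,\boldsymbol{m}}{(\boldsymbol{m}^{\top}\boldsymbol{m})^{2}}.\tag{8}$$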

In the same way as (3), the relative versions of MVI can be introduced.
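Both sides of (7) can be computed directly. The sketch below (hypothetical function names, made-up moments) evaluates the quadratic form with $\mathrm{diag}\,\mathrm{var}\boldsymbol{Y}$ and the weighted sum of the marginal indexes $\mathrm{VI}(Y_j)=\mathrm{var}Y_j/(\mathbb{E}Y_j)^2$ with weights $(\mathbb{E}Y_j)^4/(\mathbb{E}\boldsymbol{Y}^{\top}\mathbb{E}\boldsymbol{Y})^2$, and confirms that they agree. Only marginal moments enter, so correlations play no role in MVI.

```python
import numpy as np

def mvi(mean, variances):
    """Left-hand form of (7): m^T (diag var Y) m / (m^T m)^2."""
    m = np.asarray(mean, dtype=float)
    v = np.asarray(variances, dtype=float)
    return float(m @ np.diag(v) @ m) / float(m @ m) ** 2

def mvi_weighted(mean, variances):
    """Right-hand form of (7): sum_j (E Y_j)^4 / (E Y^T E Y)^2 * VI(Y_j)."""
    m = np.asarray(mean, dtype=float)
    vi = np.asarray(variances, dtype=float) / m ** 2   # marginal VIs
    weights = m ** 4 / float(m @ m) ** 2
    return float(weights @ vi)

mean = [1.0, 2.0, 0.5]
variances = [3.0, 1.0, 0.25]
print(mvi(mean, variances), mvi_weighted(mean, variances))
```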

3 Illustrations and comments

We will illustrate our variation indexes with two bivariate models and two families of general $k$ -variate ones; it will then be seen how the marginal VIs interplay with the correlation structure in the multivariate variation measures discussed previously. Considering $\boldsymbol{Y}=(Y_{1},Y_{2})^{\top}$ and using (7), we explicitly write

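$$\mathrm{GVI}(Y_{1},Y_{2})=\mathrm{MVI}(Y_{1},Y_{2})+\rho\,\frac{2(\mathbb{E}Y_{1})^{2}(\mathbb{E}Y_{2})^{2}\sqrt{\mathrm{VI}(Y_{1})\,\mathrm{VI}(Y_{2})}}{\{(\mathbb{E}Y_{1})^{2}+(\mathbb{E}Y_{2})^{2}\}^{2}},\tag{9}$$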

which shows that GVI, unlike MVI, is not a weighted average of VIs. Note that $\mathrm{GVI}\gtreqqless\mathrm{MVI}$ according as $\rho\gtreqqless 0$, with $\mathrm{GVI}=\mathrm{MVI}$ for $\rho=0$. Similar remarks hold in the $k$-variate case; in particular, $\mathrm{GVI}=\mathrm{MVI}$ when the correlation matrix $\boldsymbol{\rho}$ reduces to $\boldsymbol{I}_{k}$.
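For $k=2$, subtracting (7) from (2) leaves only the cross term, $\mathrm{GVI}-\mathrm{MVI}=2\rho\,\mathbb{E}Y_1\,\mathbb{E}Y_2\sqrt{\mathrm{var}Y_1\,\mathrm{var}Y_2}/\{(\mathbb{E}Y_1)^2+(\mathbb{E}Y_2)^2\}^2$, whose sign is that of $\rho$. The sketch below checks this numerically for a few values of $\rho$, with arbitrary (hypothetical) means and variances.

```python
import numpy as np

def gvi(m, cov):
    """Ratio (2) with the full covariance matrix."""
    return float(m @ cov @ m) / float(m @ m) ** 2

def mvi(m, cov):
    """Ratio (7): same as (2) with cov Y replaced by diag var Y."""
    return float(m @ np.diag(np.diag(cov)) @ m) / float(m @ m) ** 2

m = np.array([1.5, 0.8])   # E Y_1, E Y_2
v = np.array([2.0, 0.3])   # var Y_1, var Y_2
results = []
for rho in (-0.6, 0.0, 0.6):
    c12 = rho * np.sqrt(v[0] * v[1])
    cov = np.array([[v[0], c12], [c12, v[1]]])
    gap = gvi(m, cov) - mvi(m, cov)
    cross = 2 * rho * m[0] * m[1] * np.sqrt(v[0] * v[1]) / float(m @ m) ** 2
    results.append((rho, gap, cross))
    print(f"rho={rho:+.1f}  GVI-MVI={gap:+.5f}  cross term={cross:+.5f}")
```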

3.1 Bivariate beta distribution of Arnold and Tony Ng (2011)

The flexible bivariate beta $\boldsymbol{Y}=(Y_{1},Y_{2})^{\top}\sim\mathscr{B}_{2}(\boldsymbol{\alpha})$ of Arnold and Tony Ng (2011) which exhibits both positive and negative correlation between random variables can be defined as follows. Suppose that $U_{1},U_{2},V_{1},V_{2}$ and $W$ are independent gamma random variables with common unit scale parameter, i.e., $U_{j}\sim\mathscr{G}_{1}(\alpha_{j},1)$ , $V_{j}\sim\mathscr{G}_{1}(\alpha_{j}^{\prime},1)$ , $j=1,2$ and $W\sim\mathscr{G}_{1}(\alpha_{0},1)$ with $\alpha_{j}>0$ , $\alpha_{j}^{\prime}>0$ and $\alpha_{0}>0$ . Then, for $j=1,2$ and $\boldsymbol{\alpha}:=(\alpha_{0},\alpha_{1},\alpha_{2},\alpha_{1}^{\prime},\alpha_{2}^{\prime})$ , one has

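$$Y_{j}:=\frac{U_{j}+V_{j}}{U_{j}+V_{1}+V_{2}+W}\sim\mathscr{B}_{1}(\alpha_{j}+\alpha_{j}^{\prime},\,\alpha_{0}+\alpha_{1}^{\prime}+\alpha_{2}^{\prime}-\alpha_{j}^{\prime}).$$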

Although the covariance of $Y_{1}$ and $Y_{2}$ cannot be expressed in closed form, it has been shown numerically that the correlation $\rho=\rho(Y_{1},Y_{2})=\rho(\boldsymbol{\alpha})$ ranges over $[-1,1]$. In fact, with this construction, positive correlations are obtained when $\alpha_{1}^{\prime}=\alpha_{2}^{\prime}=0$. For negative correlations, one can take $\alpha_{0}=0$ with $\alpha_{1}$ and $\alpha_{2}$ fixed; the correlation then gets closer to $-1$ as $\alpha_{1}^{\prime}$ and $\alpha_{2}^{\prime}$ get larger. See Arnold and Tony Ng (2011) for more details and related references.

Thus, for given $\rho(\boldsymbol{\alpha})\in[-1,1]$ of $\boldsymbol{Y}=(Y_{1},Y_{2})^{\top}\sim\mathscr{B}_{2}(\boldsymbol{\alpha})$ , the direct calculations of GVI( $\boldsymbol{Y}$ ) through (9) and MVI( $\boldsymbol{Y}$ ) via (7) are obtained from the following first moments and VIs of the univariate beta random variables $Y_{j}$ , $j=1,2$ :

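$$\mathbb{E}Y_{j}=\frac{\alpha_{j}+\alpha_{j}^{\prime}}{\alpha_{0}+\alpha_{1}^{\prime}+\alpha_{2}^{\prime}+\alpha_{j}}>0$$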

and

[TABLE]

Since $Y_{1}$ and $Y_{2}$ can each be over-, equi- or under-varied, $\boldsymbol{Y}=(Y_{1},Y_{2})^{\top}\sim\mathscr{B}_{2}(\boldsymbol{\alpha})$ can also be over-, equi- or under-varied in the bivariate sense, according to the values of $\boldsymbol{\alpha}$. A $k$-variate extension of this bivariate beta distribution is also available, although the calculation of its multivariate variation indexes is tedious.
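The sketch below simulates the gamma construction of $\mathscr{B}_2(\boldsymbol{\alpha})$, assuming $Y_j=(U_j+V_j)/(U_j+V_1+V_2+W)$ with independent unit-scale gamma variables as described above, and compares sample means with $\mathbb{E}Y_j=(\alpha_j+\alpha_j^{\prime})/(\alpha_0+\alpha_1^{\prime}+\alpha_2^{\prime}+\alpha_j)$ in the two limiting regimes mentioned for the sign of the correlation. Parameter values and seed are arbitrary; a zero shape parameter is treated as a variable degenerate at 0. From such a sample, GVI and MVI can be estimated by plugging sample moments into (2) and (7).

```python
import numpy as np

def sample_bivariate_beta(alpha0, alpha, alpha_prime, n, rng):
    """Draw n pairs Y_j = (U_j + V_j) / (U_j + V_1 + V_2 + W), j = 1, 2.

    U_j ~ Gamma(alpha_j, 1), V_j ~ Gamma(alpha'_j, 1), W ~ Gamma(alpha_0, 1), all independent;
    a zero shape gives a variable degenerate at 0 (the limiting cases in the text).
    """
    def g(shape):
        return rng.gamma(shape, 1.0, size=n) if shape > 0 else np.zeros(n)

    u = [g(a) for a in alpha]
    v = [g(a) for a in alpha_prime]
    w = g(alpha0)
    return np.column_stack([(u[j] + v[j]) / (u[j] + v[0] + v[1] + w) for j in range(2)])

def beta_mean(alpha0, alpha, alpha_prime, j):
    """E Y_j = (alpha_j + alpha'_j) / (alpha_0 + alpha'_1 + alpha'_2 + alpha_j)."""
    return (alpha[j] + alpha_prime[j]) / (alpha0 + alpha_prime[0] + alpha_prime[1] + alpha[j])

rng = np.random.default_rng(2011)
n = 100_000
settings = {
    "alpha'_1 = alpha'_2 = 0": (2.0, (1.0, 1.0), (0.0, 0.0)),
    "alpha_0 = 0": (0.0, (1.0, 1.0), (5.0, 5.0)),
}
summary = {}
for label, pars in settings.items():
    y = sample_bivariate_beta(*pars, n, rng)
    means = y.mean(axis=0)
    theory = np.array([beta_mean(*pars, j) for j in range(2)])
    corr = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
    summary[label] = (means, theory, corr)
    print(label, np.round(means, 3), np.round(theory, 3), round(float(corr), 3))
```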

3.2 Bivariate Weibull distribution of Teimouri and Gupta (2011)

Consider the bivariate Weibull distribution $\boldsymbol{Y}=(Y_{1},Y_{2})^{\top}\sim\mathscr{W}_{2}(\alpha_{1},\alpha_{2},\beta_{1},\beta_{2},\gamma,\delta)$ of Teimouri and Gupta (2011), built from a copula and having density

[TABLE]

with $\alpha_{j}>0$ and $\beta_{j}>0$ , $j=1,2$ , $\gamma>1$ and $\delta\in[0,1]$ . For $\delta=0$ one gets the uncorrelated bivariate Weibull distribution which depends on both scale parameters $\alpha_{j}$ and shape parameters $\beta_{j}$ , $j=1,2$ .

Since we explicitly have the first, second and product moments of $Y_{1}$ and $Y_{2}$ (Teimouri and Gupta, 2011), the calculations of GVI( $\boldsymbol{Y}$ ) using (9) and MVI( $\boldsymbol{Y}$ ) through (7) are derived from the following first moments, VIs and correlation of $Y_{j}$ , $j=1,2$ :

[TABLE]

the VIs depending only on the shape parameter $\beta_{j}$ , and

[TABLE]

where $\Gamma(\cdot)$ is the classical gamma function. The above univariate variation indexes $\mathrm{VI}(Y_{j})$ , $j=1,2$ , satisfy the following equivalences:

[TABLE]

for any fixed $\alpha_{j}>0$ . Indeed, equivalence $(i)$ of (10) follows from the gamma duplication formula, also known as Legendre's duplication formula (e.g., Abramowitz and Stegun, 1972, Chap. 6), and equivalence $(ii)$ follows from a study of the function $\beta_{j}\mapsto\mathrm{VI}(Y_{j})$ (see, e.g., Table 1). Note that conditions (10) on the shape parameter of the univariate Weibull distribution are known in reliability, with the roles of $\beta_{j}<1$ and $\beta_{j}>1$ reversed, for the failure-rate (bathtub) curve; see Barlow and Proschan (1981), who use the standard coefficient of variation. Hence, depending on its parameters, this bivariate Weibull distribution can be over-, equi- or under-varied with respect to the uncorrelated bivariate exponential distribution.
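The equivalences (10) can be checked numerically. Assuming the usual scale-shape parameterization, in which $\mathbb{E}Y_{j}=\alpha_{j}\Gamma(1+1/\beta_{j})$ , the scale cancels and $\mathrm{VI}(Y_{j})=\Gamma(1+2/\beta_{j})/\Gamma(1+1/\beta_{j})^{2}-1$ . The sketch below evaluates this expression for a few illustrative shape values, in the spirit of Table 1.

```python
from math import gamma


def weibull_vi(beta: float) -> float:
    """VI = var/mean^2 of a Weibull variable with shape beta; the scale cancels."""
    g1 = gamma(1.0 + 1.0 / beta)
    g2 = gamma(1.0 + 2.0 / beta)
    return g2 / g1 ** 2 - 1.0


for beta in [0.5, 0.8, 1.0, 1.5, 2.0, 3.5]:
    vi = weibull_vi(beta)
    regime = "over" if vi > 1 else ("equi" if abs(vi - 1) < 1e-12 else "under")
    print(f"beta = {beta}: VI = {vi:.4f} ({regime}-varied)")
```

For instance, $\beta=1/2$ gives $\Gamma(5)/\Gamma(3)^{2}-1=24/4-1=5$ , and $\beta=1$ recovers the equi-varied exponential case.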

3.3 Multivariate exponential distribution of Marshall and Olkin (1967)

The $k$ -variate exponential $\boldsymbol{Y}=(Y_{1},\ldots,Y_{k})^{\top}\sim\mathscr{E}_{k}(\mu_{1},\ldots,\mu_{k},\mu_{0})$ of Marshall and Olkin (1967) is constructed as follows. Let $X_{1},\ldots,X_{k}$ and $Z$ be independent univariate exponential random variables with rates $\mu_{1}>0,\ldots,\mu_{k}>0$ and $\mu_{0}\geq 0$ , respectively. Then, by setting $Y_{j}:=\min(X_{j},Z)$ for $j=1,\ldots,k$ , one easily has $\mathbb{E}Y_{j}=1/(\mu_{j}+\mu_{0})=\sqrt{\mathrm{var}Y_{j}}$ and $\mathrm{cov}(Y_{j},Y_{\ell})=\mu_{0}/\{(\mu_{j}+\mu_{0})(\mu_{\ell}+\mu_{0})(\mu_{j}+\mu_{\ell}+\mu_{0})\}$ for all $j\neq\ell$ . Note that each correlation $\rho(Y_{j},Y_{\ell})=\mu_{0}/(\mu_{j}+\mu_{\ell}+\mu_{0})$ lies in $[0,1]$ if and only if $\mu_{0}\geq 0$ .

Thus, using (6) appropriately we obtain

[TABLE]

and through (7) we easily have

[TABLE]

Hence, this multivariate exponential model is always under-varied with respect to MVI and over- or equi-varied with respect to GVI. If $\mu_{0}=0$ , this $k$ -variate exponential distribution reduces to $\mathscr{E}_{k}(\boldsymbol{\mu})$ with $\mathrm{GVI}(\boldsymbol{Y})=1$ . However, restricting to non-negative correlations between components is too limiting for some analyses. We refer to Appendix A for a more general exponential model, derived as a particular case of the full multivariate Tweedie (1984) models with flexible dependence structure (Cuenin et al., 2016).
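Marshall and Olkin's common-shock construction can be checked by simulation. The sketch below draws $Y_{j}=\min(X_{j},Z)$ , the form that yields the stated moments $\mathbb{E}Y_{j}=1/(\mu_{j}+\mu_{0})$ , and compares the sample means and one sample correlation with $1/(\mu_{j}+\mu_{0})$ and $\mu_{0}/(\mu_{j}+\mu_{\ell}+\mu_{0})$ . The rates, seed and sample size are arbitrary illustrative choices.

```python
import numpy as np


def simulate_mo_exponential(mu, mu0, n, rng):
    """Draw n vectors with Y_j = min(X_j, Z), X_j ~ Exp(rate mu_j), Z ~ Exp(rate mu0)."""
    mu = np.asarray(mu, dtype=float)
    x = rng.exponential(scale=1.0 / mu, size=(n, mu.size))
    if mu0 > 0:
        z = rng.exponential(scale=1.0 / mu0, size=(n, 1))
    else:
        z = np.full((n, 1), np.inf)  # mu0 = 0: the common shock never occurs
    return np.minimum(x, z)


rng = np.random.default_rng(2019)
mu, mu0 = np.array([1.0, 2.0, 0.5]), 1.5
y = simulate_mo_exponential(mu, mu0, 200_000, rng)

mean_theory = 1.0 / (mu + mu0)
rho12_theory = mu0 / (mu[0] + mu[1] + mu0)
rho12_hat = np.corrcoef(y, rowvar=False)[0, 1]
print("sample means:", y.mean(axis=0), "theory:", mean_theory)
print("rho_12 sample:", rho12_hat, "theory:", rho12_theory)
```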

3.4 Multiple stable Tweedie (MST) models

Consider the large $k$ -variate MST class of families $F_{\mathbf{p}}=F_{\mathbf{p}}(m_{1},\ldots,m_{k},\lambda)$ introduced by Boubacar Maïnassara and Kokonendji (2014) to extend the so-called normal stable Tweedie (NST) models, which correspond to $\mathbf{p}=(p_{1},0,\ldots,0)$ with $p_{1}\geq 1$ . The normal inverse Gaussian (NIG) model is a common particular case of NST with $p_{1}=3$ (Barndorff-Nielsen, 1997). This MST class contains infinitely many subclasses of multivariate distributions, including the gamma-MST with $\mathbf{p}=(2,p_{2},\ldots,p_{k})$ and the inverse Gaussian-MST with $\mathbf{p}=(3,p_{2},\ldots,p_{k})$ . It also contains particular models such as the multiple gamma with $\mathbf{p}=(2,\ldots,2)$ , the multiple inverse Gaussian with $\mathbf{p}=(3,\ldots,3)$ and the gamma-Gaussian with $\mathbf{p}=(2,0,\ldots,0)$ of Casalis (1996). In fact, an MST model is composed of a fixed univariate stable Tweedie (1984) variable with a positive mean domain and of random variables that, given the fixed one, are independent real stable Tweedie variables, possibly different, whose common dispersion parameter equals the fixed component.

More precisely, within the framework of natural exponential families, Kokonendji and Moypemna Sembona (2018) completely characterized the MST models through their variance functions as follows. Let $\mathbf{p}=(p_{1},\ldots,p_{k})$ with $p_{1}\geq 1$ and $p_{j}\in\{0\}\cup[1,\infty)$ for $j=2,\ldots,k$ . Then, the variance function of $F_{\mathbf{p}}$ is given by $\mathbf{V}_{F_{\boldsymbol{p}}}(\boldsymbol{m})=\lambda^{1-p_{1}}m_{1}^{p_{1}-2}\boldsymbol{m}\boldsymbol{m}^{\top}+\mathrm{diag}\,(0,m_{1}^{1-p_{2}}m_{2}^{p_{2}},\ldots,m_{1}^{1-p_{k}}m_{k}^{p_{k}})$ for all $\lambda>0$ and $\boldsymbol{m}=(m_{1},\ldots,m_{k})^{\top}\in\mathbf{M}_{F_{\boldsymbol{p}}}=(0,\infty)\times M_{F_{p_{2}}}\times\cdots\times M_{F_{p_{k}}}$ . Therefore, from (4) and (8), one has

[TABLE]

and

[TABLE]

According to the different classifications of Kokonendji and Moypemna Sembona (2018), several scenarios of $k$ -variate over-, equi- and under-variation occur with respect to GVI and MVI. For instance, taking $\lambda=1$ and $p_{1}=2$ (the exponential-MST subclass), one has $\mathrm{GVI}_{F_{\boldsymbol{p}}}(\boldsymbol{m})>1$ for all $p_{j}>1$ with $j=2,\ldots,k$ . Since we investigate GVI and MVI for $k$ -variate (semi-)continuous models on $[0,\infty)^{k}$ , we exclude the case $p_{1}=1$ (the Poisson-MST) as well as all $p_{j}=0$ and $p_{j}=1$ , $j=2,\ldots,k$ , which correspond to normal and Poisson components, respectively. Hence, the NST class is excluded from this study.
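The exponential-MST example can be reproduced numerically. The sketch below builds the variance function $\mathbf{V}_{F_{\boldsymbol{p}}}(\boldsymbol{m})$ as written above and evaluates GVI and MVI, assuming that $\mathrm{GVI}_{F}(\boldsymbol{m})$ and $\mathrm{MVI}_{F}(\boldsymbol{m})$ are obtained by substituting $\mathbf{V}_{F_{\boldsymbol{p}}}(\boldsymbol{m})$ for the covariance matrix in the ratios $\Phi$ and $\Psi$ of Appendix C. Because the variance function is a rank-one term plus a diagonal term, $\boldsymbol{m}^{\top}\mathbf{V}\boldsymbol{m}=\lambda^{1-p_{1}}m_{1}^{p_{1}-2}(\boldsymbol{m}^{\top}\boldsymbol{m})^{2}+\sum_{j\geq 2}m_{1}^{1-p_{j}}m_{j}^{p_{j}+2}$ , so with $\lambda=1$ and $p_{1}=2$ every positive diagonal term pushes GVI above 1. The mean vector and powers below are hypothetical.

```python
import numpy as np


def mst_variance(m, p, lam=1.0):
    """V(m) = lam^(1-p1) m1^(p1-2) m m^T + diag(0, m1^(1-p_j) m_j^(p_j)), j >= 2."""
    m = np.asarray(m, dtype=float)
    p = np.asarray(p, dtype=float)
    m1, p1 = m[0], p[0]
    d = np.zeros_like(m)
    d[1:] = m1 ** (1.0 - p[1:]) * m[1:] ** p[1:]
    return lam ** (1.0 - p1) * m1 ** (p1 - 2.0) * np.outer(m, m) + np.diag(d)


def gvi(m, cov):
    """Ratio m^T cov m / (m^T m)^2 (the map Phi of Appendix C)."""
    m = np.asarray(m, dtype=float)
    return float(m @ cov @ m / (m @ m) ** 2)


def mvi(m, cov):
    """Ratio sum_j m_j^2 cov_jj / (sum_j m_j^2)^2 (the map Psi of Appendix C)."""
    m = np.asarray(m, dtype=float)
    return float(np.sum(m ** 2 * np.diag(cov)) / np.sum(m ** 2) ** 2)


m = np.array([1.5, 0.7, 2.0])   # hypothetical mean vector
p = np.array([2.0, 1.5, 3.0])   # exponential-MST: lambda = 1, p1 = 2
V = mst_variance(m, p)
closed = 1.0 + np.sum(m[0] ** (1 - p[1:]) * m[1:] ** (p[1:] + 2)) / (m @ m) ** 2
print("GVI =", gvi(m, V), "closed form =", closed, "MVI =", mvi(m, V))
```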

4 Estimation and asymptotic properties

Let $\boldsymbol{Y}_{1},\ldots,\boldsymbol{Y}_{n}$ be a random sample from $\boldsymbol{Y}$ with support on $(0,\infty)^{k}$ , where for each $i\in\{1,\ldots,n\}$ , $\boldsymbol{Y}_{i}=(Y_{i1},\ldots,Y_{ik})^{\top}$ . It is common to consider the empirical versions

[TABLE]

of the mean vector and covariance matrix of $\boldsymbol{Y}$ , respectively. An estimator of $\mathrm{GVI}(\boldsymbol{Y})$ directly derived from (11) is given by

[TABLE]

Since all components of $\boldsymbol{Y}$ take positive values, it follows from Cramér (1974, pp. 357–358) that $\widehat{\mathrm{GVI}}_{n}(\boldsymbol{Y})$ is asymptotically unbiased, i.e., $\mathbb{E}\{\widehat{\mathrm{GVI}}_{n}(\boldsymbol{Y})\}\approx\mathrm{GVI}(\boldsymbol{Y})$ for large $n$ . Computing the theoretical variance of $\widehat{\mathrm{GVI}}_{n}$ , however, requires at least the fourth-order moments of the components of $\boldsymbol{Y}$ .
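A minimal plug-in sketch of the estimators: it computes $\widehat{\mathrm{GVI}}_{n}$ and $\widehat{\mathrm{MVI}}_{n}$ by replacing the mean vector and covariance matrix in the ratios $\Phi$ and $\Psi$ of Appendix C with their empirical versions. The $1/n$ divisor for the covariance is an assumption that matches the plug-in $\Phi(n^{-1}\sum_{i}\mathbf{Z}_{i})$ used in the proofs; an $(n-1)$ divisor has the same limit. Function names are hypothetical.

```python
import numpy as np


def gvi_hat(y):
    """Plug-in GVI from an (n, k) sample: mbar^T S mbar / (mbar^T mbar)^2."""
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]  # treat a 1-D array as a univariate sample
    m = y.mean(axis=0)
    s = np.atleast_2d(np.cov(y, rowvar=False, bias=True))  # 1/n divisor
    return float(m @ s @ m / (m @ m) ** 2)


def mvi_hat(y):
    """Plug-in MVI: sum_j mbar_j^2 s_jj / (sum_j mbar_j^2)^2 (correlations ignored)."""
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    m = y.mean(axis=0)
    v = y.var(axis=0)  # 1/n divisor: the diagonal of S
    return float(np.sum(m ** 2 * v) / np.sum(m ** 2) ** 2)


# Toy check: rows (1, 2) and (3, 4) give mbar = (2, 3) and S = [[1, 1], [1, 1]].
toy = np.array([[1.0, 2.0], [3.0, 4.0]])
print(gvi_hat(toy), 25 / 169)   # (2 + 3)^2 / 13^2
print(mvi_hat(toy), 13 / 169)   # (4 + 9) / 13^2
```

For $k=1$ both estimators reduce to the sample VI, so an exponential sample should give values close to 1.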

More interestingly, we establish the following central limit and strong consistency results of $\widehat{\mathrm{GVI}}_{n}$ and $\widehat{\mathrm{MVI}}_{n}$ . The proofs are given in Appendix C.

Proposition 2.

Let $\boldsymbol{Y}=(Y_{1},\ldots,Y_{k})^{\top}$ be a positive continuous $k$ -variate random vector on $(0,\infty)^{k}$ , $k\geq 1$ , such that $\mathbb{E}(Y_{\ell_{1}}Y_{\ell_{2}}Y_{\ell_{3}}Y_{\ell_{4}})<\infty$ for all $\ell_{1},\ell_{2},\ell_{3},\ell_{4}\in\{1,\ldots,k\}$ . Let also $\boldsymbol{Y}_{1},\ldots,\boldsymbol{Y}_{n}$ be a random sample from $\boldsymbol{Y}$ .

(i)

As $n\to\infty$ ,

[TABLE]

where $\rightsquigarrow$ stands for convergence in distribution and $\mathcal{N}(0,\sigma_{gvi}^{2})$ is the centered normal distribution with variance $\sigma_{gvi}^{2}=\boldsymbol{\Delta}^{\top}\boldsymbol{\Gamma}\boldsymbol{\Delta}$ . The $\{k+k(k+1)/2\}\times 1$ vector $\boldsymbol{\Delta}=(\ldots,\Delta_{j},\ldots;\ldots,\Delta_{j\ell},\ldots)_{j\in\{1,\ldots,k\};\ell\in\{j,\ldots,k\}}^{\top}$ and the $\{k+k(k+1)/2\}\times\{k+k(k+1)/2\}$ four-block symmetric matrix

[TABLE]

are such that, for all $j,j^{\prime},j^{\prime\prime}\in\{1,\ldots,k\}$ ,

[TABLE]

$\Delta_{jj}=(\mathbb{E}Y_{j})^{2}/(\mathbb{E}\boldsymbol{Y}^{\top}\mathbb{E}\boldsymbol{Y})^{2}$ , $\Delta_{j\ell}=2{\mathbb{E}Y_{j}\mathbb{E}Y_{\ell}}/(\mathbb{E}\boldsymbol{Y}^{\top}\mathbb{E}\boldsymbol{Y})^{2}$ for $\ell\in\{j+1,\ldots,k\}$ , $\boldsymbol{\Sigma}(j;j^{\prime})=\mathrm{cov}(Y_{j},Y_{j^{\prime}})$ , $\boldsymbol{\Gamma}_{3}^{\top}(j;j^{\prime},\ell^{\prime})=\mathrm{cov}(Y_{j},Y_{j^{\prime}}Y_{\ell^{\prime}})$ for $\ell^{\prime}\in\{j^{\prime},\ldots,k\}$ and $\boldsymbol{\Gamma}_{4}(j^{\prime},\ell^{\prime};j^{\prime\prime},\ell^{\prime\prime})=\mathrm{cov}(Y_{j^{\prime}}Y_{\ell^{\prime}},Y_{j^{\prime\prime}}Y_{\ell^{\prime\prime}})$ for $\ell^{\prime\prime}\in\{j^{\prime\prime},\ldots,k\}$ ;

(ii)

As $n\to\infty$ ,

[TABLE]

with $\sigma_{mvi}^{2}=\boldsymbol{\Lambda}^{\top}\boldsymbol{\Pi}\,\boldsymbol{\Lambda}$ . The $2k\times 1$ vector $\boldsymbol{\Lambda}=(\Lambda_{1},\ldots,\Lambda_{k};\Lambda_{11},\ldots,\Lambda_{kk})^{\top}$ and the $2k\times 2k$ four-block symmetric matrix

[TABLE]

are such that, for $j,j^{\prime}\in\{1,\ldots,k\}$ ,

[TABLE]

$\Lambda_{jj}=(\mathbb{E}Y_{j})^{2}/(\mathbb{E}\boldsymbol{Y}^{\top}\mathbb{E}\boldsymbol{Y})^{2}$ , $\boldsymbol{\Sigma}(j;j^{\prime})=\mathrm{cov}(Y_{j},Y_{j^{\prime}})$ , $\boldsymbol{\Pi}_{3}^{\top}(j;j^{\prime})=\mathrm{cov}(Y_{j},Y_{j^{\prime}}^{2})$ and $\boldsymbol{\Pi}_{4}(j;j^{\prime})=\mathrm{cov}(Y_{j}^{2},Y_{j^{\prime}}^{2})$ .

Note that Parts (i) and (ii) of Proposition 2 provide the same result for $k=1$ with

[TABLE]

see Touré et al. (2019, Part (i) of Section 4.1 with $\lambda=1$ ). Also, an asymptotic confidence interval for $\mathrm{GVI}(\boldsymbol{Y})$ is given by

[TABLE]

where $u_{p}$ is the $p$ th percentile of the standard normal distribution $\mathcal{N}(0,1)$ and $\widehat{\sigma}_{gvi}^{2}=\widehat{\boldsymbol{\Delta}}_{n}^{\top}\widehat{\boldsymbol{\Gamma}}_{n}\widehat{\boldsymbol{\Delta}}_{n}$ is the corresponding empirical version of $\sigma_{gvi}^{2}$ (Proposition 2). A similar result also holds for the intuitive index MVI. Finally, we state the following results for strong consistency.

Proposition 3.

Let $\boldsymbol{Y}=(Y_{1},\ldots,Y_{k})^{\top}$ be a positive continuous $k$ -variate random vector on $(0,\infty)^{k}$ , $k\geq 1$ , such that $\mathbb{E}(Y_{\ell_{1}}Y_{\ell_{2}})<\infty$ for all $\ell_{1},\ell_{2}\in\{1,\ldots,k\}$ . If $\boldsymbol{Y}_{1},\ldots,\boldsymbol{Y}_{n}$ is a random sample from $\boldsymbol{Y}$ , then

[TABLE]

where $\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}$ stands for almost sure convergence.

For finite samples, we suggest using the bootstrap to approximate the variances of all the estimators and the lengths of their corresponding confidence intervals.
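The asymptotic interval and the bootstrap can be compared with the sketch below. It does not use the closed form of Proposition 2: $\widehat{\sigma}_{gvi}^{2}$ is obtained by a numerical delta method that differentiates the GVI at the empirical raw moments $n^{-1}\sum_{i}\mathbf{Z}_{i}$ (with $\mathbf{Z}_{i}$ as in Appendix C) by central finite differences, so that the gradient and $\widehat{\boldsymbol{\Gamma}}_{n}$ refer to the same coordinates; `u` plays the role of $u_{1-\alpha/2}$ . The bootstrap part is a plain nonparametric resampling of rows with percentile limits. Function names, the step size and the number of replicates are illustrative choices.

```python
import numpy as np


def _raw_moment_rows(y):
    """Rows Z_i = (Y_ij ; Y_ij * Y_il for l >= j), ordered as in Appendix C."""
    iu = np.triu_indices(y.shape[1])
    products = (y[:, :, None] * y[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([y, products])


def _gvi_from_raw(z, k):
    """GVI written as a function of raw moments z = (E Y_j ; E Y_j Y_l, l >= j)."""
    m = z[:k]
    second = np.zeros((k, k))
    second[np.triu_indices(k)] = z[k:]
    second = second + np.triu(second, 1).T  # fill the lower triangle
    cov = second - np.outer(m, m)
    return float(m @ cov @ m / (m @ m) ** 2)


def gvi_asymptotic_ci(y, u=1.959964, h=1e-6):
    """Estimate, sigma2_hat (numerical delta method) and estimate -/+ u*sqrt(sigma2_hat/n)."""
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    zs = _raw_moment_rows(y)
    zbar = zs.mean(axis=0)
    grad = np.empty_like(zbar)
    for i in range(zbar.size):  # central finite differences
        e = np.zeros_like(zbar)
        e[i] = h * max(1.0, abs(zbar[i]))
        grad[i] = (_gvi_from_raw(zbar + e, k) - _gvi_from_raw(zbar - e, k)) / (2 * e[i])
    sigma2 = float(grad @ np.cov(zs, rowvar=False) @ grad)
    est = _gvi_from_raw(zbar, k)
    half = u * np.sqrt(sigma2 / n)
    return est, sigma2, (est - half, est + half)


def gvi_bootstrap(y, n_boot=1000, level=0.95, seed=0):
    """Nonparametric bootstrap standard error and percentile interval of the plug-in GVI."""
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    zs = _raw_moment_rows(y)
    rng = np.random.default_rng(seed)
    stats = np.array([_gvi_from_raw(zs[rng.integers(0, n, size=n)].mean(axis=0), k)
                      for _ in range(n_boot)])  # rows resampled with replacement
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(stats.std(ddof=1)), (float(lo), float(hi))


rng = np.random.default_rng(1)
y = rng.exponential(size=(2000, 1))  # univariate Exp(1): GVI = VI = 1
est, sigma2, ci = gvi_asymptotic_ci(y)
se_boot, ci_boot = gvi_bootstrap(y, n_boot=500)
print(est, np.sqrt(sigma2 / len(y)), ci)
print(se_boot, ci_boot)
```

For a univariate Exp(1) sample, $\mathbb{E}Y^{r}=r!$ gives an asymptotic variance of 4 for the estimated VI, so both standard errors should be near $2/\sqrt{n}\approx 0.045$ here.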

5 Numerical applications

All computations were done with Python (Python Software Foundation, 2019) and R (R Core Team, 2018). To generate samples from a $k$ -variate continuous distribution on the positive orthant with $k$ given (over-, equi- or under-varied) marginals and correlation matrix $\boldsymbol{\rho}$ , we used the NORmal To Anything (NORTA) method (e.g., Su, 2015).
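A minimal NORTA-style sketch, assuming a user-supplied correlation matrix for the underlying normal vector: correlated standard normals are mapped to uniforms and then through each marginal's quantile function. Full NORTA additionally solves for the normal correlation that reproduces a target correlation $\boldsymbol{\rho}$ of the output; that calibration step is omitted here, so the realized Pearson correlations differ somewhat from `corr_z`. The helper functions and parameter values are hypothetical, and only NumPy and the standard library are used.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)


def normal_sf(z):
    """Upper tail 1 - Phi(z), computed via erfc to avoid cancellation near 1."""
    return 0.5 * _erfc(np.asarray(z) / math.sqrt(2.0))


# Quantile maps applied to a standard normal z, i.e. F^{-1}(Phi(z)).
def exponential_from_normal(z, rate=1.0):
    return -np.log(normal_sf(z)) / rate


def weibull_from_normal(z, scale=1.0, shape=1.0):
    return scale * (-np.log(normal_sf(z))) ** (1.0 / shape)


def lognormal_from_normal(z, m=0.0, sigma2=1.0):
    return np.exp(m + math.sqrt(sigma2) * z)


def norta_sample(n, corr_z, marginals, seed=0):
    """Draw n rows whose j-th column is marginals[j](z_j), with z ~ N(0, corr_z)."""
    chol = np.linalg.cholesky(np.asarray(corr_z, dtype=float))
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, chol.shape[0])) @ chol.T
    return np.column_stack([f(z[:, j]) for j, f in enumerate(marginals)])


corr_z = [[1.0, 0.6, -0.4],
          [0.6, 1.0, -0.2],
          [-0.4, -0.2, 1.0]]
marginals = [
    lambda z: exponential_from_normal(z, rate=1.0),          # equi-varied
    lambda z: lognormal_from_normal(z, m=0.0, sigma2=1.0),   # over-varied (1 > log 2)
    lambda z: weibull_from_normal(z, scale=1.0, shape=2.0),  # under-varied (shape > 1)
]
y = norta_sample(100_000, corr_z, marginals, seed=42)
print(y.mean(axis=0))
print(np.corrcoef(y, rowvar=False).round(3))
```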

In practice, we consider the exponential, lognormal and Weibull distributions (e.g., Dey and Kundu, 2009). They have been used quite effectively to analyze positively skewed data, which play an important role in reliability analysis. Recall that the univariate exponential distribution $\mathscr{E}_{1}(\mu)$ is always equi-varied for all $\mu>0$ , while the univariate lognormal $\mathscr{L\!\!N}_{1}(m,\sigma^{2})$ and Weibull $\mathscr{W}_{1}(\alpha,\beta)$ models are over- (equi-, under-) varied for $0<\sigma^{2}\gtreqqless\log 2$ with $m\in\mathbb{R}$ and, from (10), for $0<\beta\lesseqqgtr 1$ with $\alpha>0$ , respectively. These theoretical behaviors are confirmed on simulated univariate exponential, lognormal and Weibull datasets, which we omit here.
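The lognormal threshold $\log 2$ follows from $\mathrm{VI}=\mathrm{var}/\mathrm{mean}^{2}=\exp(\sigma^{2})-1$ , which does not depend on $m$ . The sketch below evaluates this expression around $\sigma^{2}=\log 2$ and, as a rough check in the spirit of the omitted univariate simulations, compares it with the plug-in estimate from a simulated sample; the sample size and seed are arbitrary.

```python
import math
import numpy as np


def lognormal_vi(sigma2: float) -> float:
    """VI of LN(m, sigma2): var/mean^2 = exp(sigma2) - 1, for every m."""
    return math.exp(sigma2) - 1.0


def simulated_vi(sigma2: float, n: int = 200_000, seed: int = 0, m: float = 0.0) -> float:
    """Plug-in VI (1/n variance over squared mean) of a simulated LN(m, sigma2) sample."""
    rng = np.random.default_rng(seed)
    y = rng.lognormal(mean=m, sigma=math.sqrt(sigma2), size=n)
    return float(y.var() / y.mean() ** 2)


for sigma2 in [0.3, math.log(2.0), 1.0]:
    print(f"sigma^2 = {sigma2:.4f}: VI = {lognormal_vi(sigma2):.4f}, "
          f"simulated = {simulated_vi(sigma2):.4f}")
```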

5.1 Some scenarios of bivariate cases and a real $4$ -variate dataset

We first consider Table 2 consisting of fifteen simulated bivariate datasets from exponential, lognormal and Weibull distributions, presenting several scenarios of correlation (positive or negative) and marginal over-, equi- or under-variation. The table presents a summary of these datasets, along with the sample values of the indexes GVI and MVI.

To measure the departure of the considered datasets from the bivariate uncorrelated exponential, our estimated index $\widehat{\mathrm{GVI}}$ provides a very good summary of the bivariate variation, since it accounts for both the marginal variations and the non-null correlation $\widehat{\rho}_{12}$ . Indeed, bivariate equi-variation ( $\widehat{\mathrm{GVI}}=1$ ) is obtained here for both marginals over-varied with negative correlation (No 1), for both marginals under-varied with large positive correlation (No 9), for both marginals equi-varied with very weak correlation (No 8), and for one marginal over-varied and the other either under-varied (No 12) or equi-varied (No 15), with negative correlations. Bivariate over-variation ( $\widehat{\mathrm{GVI}}>1$ ) is found for both marginals over-varied with weak negative correlation (No 7), for both marginals equi-varied with positive correlation (No 3), for both marginals under-varied with positive correlation (No 5), and for one marginal under-varied and the other either over-varied (No 10) or equi-varied (No 14), with positive correlation. Bivariate under-variation ( $\widehat{\mathrm{GVI}}\in(0,1)$ ) is found for both marginals over-varied with negative correlation (No 2), for both marginals equi-varied with negative correlation (No 4), for both marginals under-varied with weak positive correlation (No 6), and for one marginal over-varied and the other either under-varied (No 11) or equi-varied (No 13), with negative correlation. As expected, both marginals over-varied (resp. under-varied) with positive (resp. negative) correlation always yield bivariate over-variation (resp. under-variation). The values of $\widehat{\mathrm{GVI}}$ give the corresponding degree of (over-/under-) variation relative to the reference value 1 of bivariate equi-variation. 
For instance, we detect a higher degree of over-variation in No 7 and No 10 than in No 3, No 5 or No 14; similarly, we detect a weaker degree of bivariate under-variation (close to 1) in No 11 and No 13 than in Nos. 4 or 6.

Similarly, the marginal index $\widehat{\mathrm{MVI}}$ also works very well, summarizing both marginal variations (without correlation). Both indexes $\widehat{\mathrm{MVI}}$ and $\widehat{\mathrm{GVI}}$ are close when the correlation is quasi-null (Nos. 7 or 8). For the sake of brevity, we omit here an analysis of the standard errors of the estimated indexes of these datasets; a complete analysis will be done in the next section for $6$ -variate datasets.

In summary, the multivariate variation indexes MVI and GVI are meaningful because they summarize the variation behavior of each individual variable. In addition, GVI also accounts for their correlations. They can be used for descriptive analysis, clustering, comparing different datasets and testing departures from known multivariate distributions, as in Touré et al. (2019) for the univariate case.

Secondly, we consider a real $4$ -variate dataset of annual observations for the United States from 1900 to 1989, reported by Hayashi (2000): the first variable $(m)$ is the natural log of the M1 money stock, the second $(p)$ is the natural log of the net national product price deflator, the third $(y)$ is the natural log of the net national product and the fourth $(r)$ is the commercial paper rate in percent at an annual rate.

To measure the departure of this dataset from the 4-variate uncorrelated exponential distribution, our estimated indexes provide very good summaries through $\widehat{\mathrm{GVI}}=0.1397$ and $\widehat{\mathrm{MVI}}=0.0771$ . Indeed, both indexes clearly indicate 4-variate under-variation, with $0<\widehat{\mathrm{MVI}}<\widehat{\mathrm{GVI}}<1$ . Since $\widehat{\mathrm{MVI}}$ is much closer to 0 than $\widehat{\mathrm{GVI}}$ , each of the four marginal distributions must be univariate under-varied, with a correlation matrix having only positive coefficients. Table 3 confirms this analysis, which was drawn from $\mathrm{MVI}$ and $\mathrm{GVI}$ alone. Thus, one can choose an appropriate theoretical 4-variate distribution for modelling this dataset and estimate its parameters of interest directly.

5.2 Other multivariate cases and simulation studies

In this section we first study a $6$ -variate simulated dataset. We then analyze the behavior of the asymptotic variances and confidence intervals by simulation. Finally, we compare the asymptotic standard errors of GVI and MVI to those obtained from the bootstrap method.

The $6$ -variate dataset of size $n=560$ is simulated under the following scenario: two over-varied, two equi-varied and two under-varied univariate marginals, with the theoretical correlation matrix

[TABLE]

Table 4 shows the summary needed to compute the variation indexes $\widehat{\mathrm{GVI}}$ and $\widehat{\mathrm{MVI}}$ . As commented before for Table 2, we also observe a different behavior of the two variation indexes in this $6$ -variate example. We obtain here $\widehat{\mathrm{GVI}}=1.0572\approx 1$ and $\widehat{\mathrm{MVI}}=0.9637\approx 1$ , both indicating a $6$ -variate phenomenon of quasi-equi-variation.

Table 5 depicts the evolution of the asymptotic variances and confidence intervals of $\mathrm{GVI}$ and $\mathrm{MVI}$ over subsamples of a simulated $6$ -variate dataset of maximum size $n=\mbox{10 000}$ , with the same parameters as for Table 4. Both estimated standard errors $u\widehat{\sigma}/\!\sqrt{n}$ decrease as the sample size $n$ increases, for GVI as well as for MVI, in this context of $6$ -variate quasi-equi-variation. The stable behavior of the variances agrees with Proposition 2.

Similar studies have also been performed simulating a $4$ -variate over-varied distribution (Table 6) and a trivariate under-varied distribution (Table 7). The results shown in Table 6 have been obtained by simulating four marginal (over-, equi-, and under-varied) Weibull distributions, with the cross correlation matrix such that

[TABLE]

For the results shown in Table 7, we have simulated one marginal Weibull distribution, one marginal exponential distribution and one marginal lognormal distribution, with the correlation matrix such that

[TABLE]

We also notice that all estimated standard errors $u\widehat{\sigma}/\!\sqrt{n}$ decrease as the sample size $n$ increases, but more slowly for GVI than for MVI in Table 6 (the $4$ -variate over-variation case). Figure 1 displays the boxplots associated with Tables 5, 6 and 7, which show these typical behaviors. Moreover, the estimated variances in Table 6 for $4$ -variate over-variation are much larger than those in Table 7 for $3$ -variate under-variation. Therefore, for small and moderate sample sizes, one can use a bootstrap approach or a robust version to reduce the estimated variances.

Table 8 presents behaviors of both asymptotic and bootstrap confidence intervals for GVI and MVI in the situations of small and moderate sample sizes (e.g., Angelo and Brian, 2019). For these $6$ -variate equi-varied datasets, we still observe that all estimated standard errors decrease when sample size $n$ increases, but more sharply and very weakly through the bootstrap method.

6 Concluding remarks and extensions

Building on the univariate variation index (Abid et al., 2019b) and the multivariate dispersion indexes for count models (Kokonendji and Puig, 2018), we have introduced the multivariate variation indexes GVI, MVI and RVI for continuous distributions on the non-negative orthant. All these indexes are easy to handle from both theoretical and practical points of view. Unlike the intuitive marginal variation index MVI, the index GVI takes the correlations between variables into account. The ratio of two GVIs gives the index RVI, which changes the reference distribution for measuring over-, equi- and under-variation in the multivariate framework. The interpretation and some properties of GVI and MVI are provided. The asymptotic variances of GVI and MVI obtained from Proposition 2 appear to produce large standard errors for small and moderate sample sizes; these can be improved, for instance, by the bootstrap. A real-data example shows how the indexes help select an appropriate multivariate model.

Then, from $\mathrm{RVI}_{\boldsymbol{X}}(\boldsymbol{Y})$ given in (3), one obtains exactly its count-data counterpart, the relative dispersion index $\mathrm{RDI}_{\boldsymbol{X}}(\boldsymbol{Y})$ , by changing the support of $\boldsymbol{X}$ and $\boldsymbol{Y}$ to $\mathbb{S}=\mathbb{N}^{k}$ (Formula (9) of Kokonendji and Puig, 2018). To generalize the basic GVI of (2), which can also be viewed as a particular RVI with respect to the uncorrelated exponential model, we carry the recent univariate unification of dispersion and variation indexes by Touré et al. (2019) over to the multivariate framework of natural exponential families as follows. Let $\boldsymbol{X}$ and $\boldsymbol{Y}$ be two random vectors on the same support $\mathbb{S}\subseteq\mathbb{R}^{k}$ , and assume that $\boldsymbol{m}:=\mathbb{E}\boldsymbol{X}=\mathbb{E}\boldsymbol{Y}$ , $\boldsymbol{\Sigma}_{\boldsymbol{Y}}:=\mathrm{cov}\boldsymbol{Y}$ and $\mathbf{V}_{F_{\boldsymbol{X}}}(\boldsymbol{m}):=\mathrm{cov}(\boldsymbol{X})$ are fixed; then the relative variability index of $\boldsymbol{Y}$ with respect to $\boldsymbol{X}$ can be defined as

[TABLE]

where $\mathbf{W}^{+}_{F_{\boldsymbol{X}}}(\boldsymbol{m})$ is the unique Moore–Penrose inverse of the matrix $\mathbf{W}_{F_{\boldsymbol{X}}}(\boldsymbol{m}):=[\mathbf{V}_{F_{\boldsymbol{X}}}(\boldsymbol{m})]^{1/2}[\mathbf{V}_{F_{\boldsymbol{X}}}(\boldsymbol{m})]^{\top/2}$ associated with $\mathbf{V}_{F_{\boldsymbol{X}}}(\boldsymbol{m})$ ; see Appendix B for GVI. Thus, the constructions of GDI and GVI are unified by choosing $\mathbf{W}_{F_{\boldsymbol{X}}}(\boldsymbol{m})=\sqrt{\boldsymbol{m}}\sqrt{\boldsymbol{m}}^{\top}$ and $\mathbf{W}_{F_{\boldsymbol{X}}}(\boldsymbol{m})=\boldsymbol{m}\boldsymbol{m}^{\top}$ , respectively. Note that one can take $\mathbf{V}_{F_{\boldsymbol{X}}}(\boldsymbol{m})$ as a particular case of the MST variance function of Section 3.4, but it will then be equivalent to the proposed GVI via RVI for distributions supported on $\mathbb{S}=[0,\infty)^{k}$ . Hypothesis tests using the corresponding estimators as test statistics, together with their asymptotic normal distributions, can then be derived.

Finally, we note the following open problems, which are under active investigation. First, is it possible to characterize univariate over-/under-variation with respect to the exponential distribution through the weighted exponential distribution, as done for the count case by Kokonendji et al. (2008)? See also Kokonendji (2014) for some references. Then, how can the multivariate connections to over-, equi- and under-variation be investigated through $\boldsymbol{m}\mapsto\mathrm{GVI}_{F}(\boldsymbol{m})$ or $\mathrm{MVI}_{F}(\boldsymbol{m})$ ? How, for instance, can close distributions be discriminated with these indexes? See, e.g., Dey and Kundu (2009) for a univariate case. Statistical tests based on these multivariate variation indexes can be developed along the lines of Aerts and Haesbroeck (2017); see also Feltz and Miller (1996).

Appendix A. On a broader multivariate exponential distribution

According to Cuenin et al. (2016), taking $p=2$ in their multivariate Tweedie (1984) models of flexible dependence structure, another way to define a $k$ -variate exponential distribution is given by $\mathscr{E}_{k}(\boldsymbol{\Lambda})$ . The $k\times k$ symmetric variation matrix $\boldsymbol{\Lambda}=(\lambda_{ij})_{i,j\in\{1,\ldots,k\}}$ is such that $\lambda_{ij}=\lambda_{ji}\geq 0$ , the mean of the marginal exponential is $\lambda_{ii}>0$ , and the nonnegative correlation terms satisfy

[TABLE]

with $R(i,j)=\sqrt{\lambda_{ii}/\lambda_{jj}}\,(1-\lambda_{ii}^{-1}\sum_{\ell\neq i,j}\lambda_{i\ell})\in(0,1)$ . The construction of Cuenin et al. (2016) is well defined, with $k(k+1)/2$ parameters as in $\mathscr{E}_{k}(\boldsymbol{\mu},\boldsymbol{\rho})$ . Furthermore, the exact bounds of the correlation terms in (13) are attained. Most importantly, Cuenin et al. (2016) showed how to construct and simulate the negative correlation structure from the positive one of (13) by using the inversion method.

Negative correlation components matter for the rare phenomenon of under-variation in bivariate/multivariate positive continuous models. Figure 2 (right) plots a limiting shape of a bivariate positive continuous distribution with very strong negative correlation (in red), which is not the diagonal line of the upper bound ( $+1$ ) of positive correlation (in blue); see, e.g., Cuenin et al. (2016) for the bivariate count model. By contrast, Figure 2 (left) represents the classic lower ( $-1$ ) and upper ( $+1$ ) correlation bounds on $\mathbb{R}^{2}$ or on a finite support.

Appendix B. Construction of GVI

To extend the univariate VI $=\sigma^{2}m^{-2}$ appropriately to $k$ dimensions, for any positive continuous random vector $\boldsymbol{Y}$ on $(0,\infty)^{k}$ with (elementwise) positive mean vector $\boldsymbol{m}=(m_{1},\ldots,m_{k})^{\top}$ and covariance matrix $\boldsymbol{\Sigma}$ , we consider the matrix product $\boldsymbol{\Sigma}\boldsymbol{M}^{-1}$ , where $\boldsymbol{M}=\boldsymbol{m}\boldsymbol{m}^{\top}$ is the $k\times k$ outer product of $\boldsymbol{m}$ . Since $\boldsymbol{M}=\boldsymbol{m}\boldsymbol{m}^{\top}$ has rank 1 and is therefore singular, $\boldsymbol{M}^{-1}$ is replaced by the unique Moore–Penrose inverse $\boldsymbol{M}^{+}$ of $\boldsymbol{M}$ , which is

[TABLE]

Then, we have $\boldsymbol{\Sigma}\boldsymbol{M}^{+}=(\boldsymbol{M}^{+}\boldsymbol{\Sigma})^{\top}$ , since $\boldsymbol{\Sigma}$ and $\boldsymbol{M}^{+}$ are symmetric. Since $\boldsymbol{M}$ has rank 1, $\boldsymbol{M}^{+}\boldsymbol{\Sigma}$ also has rank 1 (for positive definite $\boldsymbol{\Sigma}$ ) and thus only one nonzero eigenvalue, which is positive:

[TABLE]

where “ $\mathrm{tr}(\cdot)$ ” stands for the trace operator.

This quantity $\lambda$ does not depend on the number $k$ of variables, and it is numerically comparable to the univariate VI $=\sigma^{2}m^{-2}$ . It also uniquely characterizes the matrix $\boldsymbol{M}^{+}\boldsymbol{\Sigma}$ , which leads to the definition of GVI. Note finally that if $\boldsymbol{\Sigma}=\boldsymbol{0}$ then $\lambda=0$ , and conversely. Thus $\lambda\geq 0$ inherits the natural ordering of the nonnegative half-line.
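The construction can be checked numerically. The sketch below compares NumPy's pseudoinverse of $\boldsymbol{M}=\boldsymbol{m}\boldsymbol{m}^{\top}$ with the rank-one expression $\boldsymbol{m}\boldsymbol{m}^{\top}/(\boldsymbol{m}^{\top}\boldsymbol{m})^{2}$ , a standard identity for the Moore–Penrose inverse of a rank-one outer product, and verifies that the only nonzero eigenvalue of $\boldsymbol{M}^{+}\boldsymbol{\Sigma}$ equals its trace $\boldsymbol{m}^{\top}\boldsymbol{\Sigma}\boldsymbol{m}/(\boldsymbol{m}^{\top}\boldsymbol{m})^{2}$ . The mean vector and covariance matrix are hypothetical.

```python
import numpy as np

m = np.array([0.8, 1.5, 2.2])          # hypothetical positive mean vector
sigma = np.array([[0.9, 0.2, -0.1],    # hypothetical covariance matrix:
                  [0.2, 1.7, 0.4],     # symmetric and diagonally dominant,
                  [-0.1, 0.4, 3.0]])   # hence positive definite

M = np.outer(m, m)
M_plus = np.linalg.pinv(M)
rank_one_formula = M / (m @ m) ** 2

A = M_plus @ sigma
eigenvalues = np.sort(np.abs(np.linalg.eigvals(A)))  # two (numerical) zeros, one positive
lam = np.trace(A)
print(np.allclose(M_plus, rank_one_formula))
print(eigenvalues, lam, m @ sigma @ m / (m @ m) ** 2)
```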

Appendix C. Proofs of the asymptotic results

Proof of Proposition 2. Part (i): Let $\mathbf{Z}=(\ldots,Y_{j},\ldots;\ldots,Y_{j}Y_{\ell},\ldots)_{j\in\{1,\ldots,k\};\ell\in\{j,\ldots,k\}}^{\top}$ , $\mathbf{Z}_{i}=(\ldots,Y_{ij},\ldots;\ldots,Y_{ij}Y_{i\ell},\ldots)_{j\in\{1,\ldots,k\};\ell\in\{j,\ldots,k\}}^{\top}$ , for $i\in\{1,\ldots,n\}$ , and the map $\Phi:(0,\infty)^{k}\times\mathbb{R}^{k(k+1)/2}\to(0,\infty)$ given through $\Phi(\mathbb{E}\mathbf{Z})=\mathrm{GVI}(\boldsymbol{Y})$ and $\Phi(n^{-1}\sum_{i=1}^{n}\mathbf{Z}_{i})=\widehat{\mathrm{GVI}}_{n}(\boldsymbol{Y})$ ; i.e., for $\boldsymbol{\theta}=(\ldots,m_{j},\ldots;\ldots,\sigma_{j\ell},\ldots)_{j\in\{1,\ldots,k\};\ell\in\{j,\ldots,k\}}^{\top}$ , $\Phi(\boldsymbol{\theta})=({\boldsymbol{m}}^{\top}\boldsymbol{\Sigma}{\boldsymbol{m}})/(\boldsymbol{m}^{\top}\boldsymbol{m})^{2}$ , where $\boldsymbol{m}=(m_{1},\ldots,m_{k})^{\top}$ is the mean vector of $\boldsymbol{Y}$ and $\boldsymbol{\Sigma}=(\sigma_{j\ell})_{j,\ell\in\{1,\ldots,k\}}$ is the covariance matrix of $\boldsymbol{Y}$ with $\sigma_{j\ell}=\sigma_{\ell j}$ . Since $\Phi$ is differentiable at $\boldsymbol{\theta}$ , the multivariate delta method (e.g., Serfling, 1980, Theorem A of Section 3.3) allows one to deduce that, as $n\to\infty$ ,

[TABLE]

To check that $\mathrm{cov}\boldsymbol{\mathbf{Z}}=\boldsymbol{\Gamma}$ of the proposition under the assumption on the fourth order moments of $Y_{j}$ , one can rewrite $\mathbf{Z}$ in the following order: $\mathbf{Z}=(\mathbf{Y},\widetilde{\mathbf{Y}})$ with $\boldsymbol{Y}=(Y_{1},\ldots,Y_{k})^{\top}$ and $\widetilde{\boldsymbol{Y}}=(\widetilde{Y}_{1},\ldots,\widetilde{Y}_{\widetilde{k}})^{\top}$ such that $\widetilde{k}=k+(k-1)+\cdots+1=k(k+1)/2$ and

[TABLE]

Then, the three main block matrices of $\boldsymbol{\Gamma}$ are successively found to be

[TABLE]

To see that $\partial\Phi(\boldsymbol{\theta})/\partial\boldsymbol{\theta}=\boldsymbol{\Delta}$ , we first expand $\Phi$ as follows:

[TABLE]

Then, direct calculations provide all components of $\boldsymbol{\Delta}$ : for $j\in\{1,\ldots,k\}$ , one has

[TABLE]

and $\Delta_{jj}=\partial\Phi(\boldsymbol{\theta})/\partial\sigma_{jj}=m_{j}^{2}/\left(\boldsymbol{m}^{\top}\boldsymbol{m}\right)^{2}$ while for $\ell\in\{j+1,\ldots,k\}$ , $\Delta_{j\ell}=\partial\Phi(\boldsymbol{\theta})/\partial\sigma_{j\ell}=2{m_{j}m_{\ell}}/\left(\boldsymbol{m}^{\top}\boldsymbol{m}\right)^{2}$ . This ends the proof of Part (i).
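The entries $\Delta_{jj}$ and $\Delta_{j\ell}$ can be checked by finite differences. The sketch below treats $\Phi$ as a function of the upper-triangular covariance entries $\sigma_{j\ell}$ ( $\ell\geq j$ ), each $\sigma_{j\ell}$ filling both symmetric positions, at a hypothetical point $\boldsymbol{\theta}$ ; since $\Phi$ is linear in these entries, central differences reproduce the derivatives up to rounding.

```python
import numpy as np


def phi(m, sig_upper):
    """Phi = m^T Sigma m / (m^T m)^2 with Sigma rebuilt from its upper triangle."""
    k = m.size
    s = np.zeros((k, k))
    s[np.triu_indices(k)] = sig_upper
    s = s + np.triu(s, 1).T  # sigma_{lj} = sigma_{jl}
    return m @ s @ m / (m @ m) ** 2


m = np.array([0.8, 1.5, 2.2])                          # hypothetical means
sig_upper = np.array([0.9, 0.2, -0.1, 1.7, 0.4, 3.0])  # (s11, s12, s13, s22, s23, s33)
h = 1e-6
numeric = np.array([(phi(m, sig_upper + h * e) - phi(m, sig_upper - h * e)) / (2 * h)
                    for e in np.eye(sig_upper.size)])
denom = (m @ m) ** 2
closed = np.array([(m[j] ** 2 if j == l else 2 * m[j] * m[l]) / denom
                   for j, l in zip(*np.triu_indices(m.size))])
print(np.max(np.abs(numeric - closed)))
```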

Part (ii): Introduce $\mathbf{W}=(Y_{1},\ldots,Y_{k};Y_{1}^{2},\ldots,Y_{k}^{2})^{\top}$ , $\mathbf{W}_{i}=(Y_{i1},\ldots,Y_{ik};Y_{i1}^{2},\ldots,Y_{ik}^{2})^{\top}$ for $i\in\{1,\ldots,n\}$ and the map $\Psi:(0,\infty)^{2k}\to(0,\infty)$ defined by $\Psi(\boldsymbol{\theta})=(\sum_{j=1}^{k}m_{j}^{2}\sigma_{jj})/(\sum_{j=1}^{k}m_{j}^{2})^{2}$ with $\boldsymbol{\theta}=(m_{1},\ldots,m_{k};\sigma_{11},\ldots,\sigma_{kk})^{\top}$ . Then, one has $\Psi(\mathbb{E}\mathbf{W})=\mathrm{MVI}(\boldsymbol{Y})$ and $\Psi(n^{-1}\sum_{i=1}^{n}\mathbf{W}_{i})=\widehat{\mathrm{MVI}}_{n}(\boldsymbol{Y})$ . The function $\Psi$ is differentiable at the point $\boldsymbol{\theta}$ and, therefore, a straightforward application of the multivariate delta method leads to the conclusion that, as $n\to\infty$ ,

[TABLE]

It is then straightforward to check that $\mathrm{cov}\boldsymbol{W}=\boldsymbol{\Pi}$ of the proposition, under the assumption of finite moments on $Y_{j}$ , and that $\partial\Psi(\boldsymbol{\theta})/\partial\boldsymbol{\theta}=\boldsymbol{\Lambda}$ with $\Lambda_{j}=\partial\Psi(\boldsymbol{\theta})/\partial m_{j}=\{2m_{j}\sigma_{jj}-4m_{j}\left(\sum_{j^{\prime}=1}^{k}m_{j^{\prime}}^{2}\right)\,\Psi(\boldsymbol{\theta})\}\left(\boldsymbol{m}^{\top}\boldsymbol{m}\right)^{-2}$ and $\Lambda_{jj}=\partial\Psi(\boldsymbol{\theta})/\partial\sigma_{jj}=m_{j}^{2}\left(\boldsymbol{m}^{\top}\boldsymbol{m}\right)^{-2}$ for all $j\in\{1,\ldots,k\}$ . This concludes the proof. $\blacksquare$
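Similarly, the gradient $\boldsymbol{\Lambda}=\partial\Psi(\boldsymbol{\theta})/\partial\boldsymbol{\theta}$ stated above can be compared with central finite differences at a hypothetical point $\boldsymbol{\theta}=(m_{1},\ldots,m_{k};\sigma_{11},\ldots,\sigma_{kk})^{\top}$ ; the helper names are illustrative.

```python
import numpy as np


def psi(theta, k):
    """Psi(theta) = sum_j m_j^2 sigma_jj / (sum_j m_j^2)^2."""
    m, s = theta[:k], theta[k:]
    return np.sum(m ** 2 * s) / np.sum(m ** 2) ** 2


def lambda_closed_form(theta, k):
    """Lambda_j and Lambda_jj as given in the proof of Part (ii)."""
    m, s = theta[:k], theta[k:]
    d = np.sum(m ** 2)  # m^T m
    lam_m = (2 * m * s - 4 * m * d * psi(theta, k)) / d ** 2
    lam_s = m ** 2 / d ** 2
    return np.concatenate([lam_m, lam_s])


def central_difference(f, theta, h=1e-6):
    """Numerical gradient of f at theta by central differences."""
    grad = np.empty_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return grad


k = 3
theta = np.array([0.8, 1.5, 2.2, 0.6, 2.1, 1.3])  # hypothetical (m; sigma_jj)
numeric = central_difference(lambda t: psi(t, k), theta)
print(np.max(np.abs(numeric - lambda_closed_form(theta, k))))
```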

Proof of Proposition 3. Consider the two continuous maps $\Phi:(0,\infty)^{k}\times\mathbb{R}^{k(k+1)/2}\to(0,\infty)$ and $\Psi:(0,\infty)^{2k}\to(0,\infty)$ of the proof of Proposition 2, which satisfy $\Phi(\mathbb{E}\mathbf{Z})=\mathrm{GVI}(\boldsymbol{Y})$ , $\Phi(n^{-1}\sum_{i=1}^{n}\mathbf{Z}_{i})=\widehat{\mathrm{GVI}}_{n}(\boldsymbol{Y})$ , $\Psi(\mathbb{E}\mathbf{W})=\mathrm{MVI}(\boldsymbol{Y})$ and $\Psi(n^{-1}\sum_{i=1}^{n}\mathbf{W}_{i})=\widehat{\mathrm{MVI}}_{n}(\boldsymbol{Y})$ . Under the second-order moment assumption, $\mathbf{Z}$ and $\mathbf{W}$ have finite means, so the strong law of large numbers gives $n^{-1}\sum_{i=1}^{n}\mathbf{Z}_{i}\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}\mathbb{E}\mathbf{Z}$ and $n^{-1}\sum_{i=1}^{n}\mathbf{W}_{i}\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}\mathbb{E}\mathbf{W}$ . The desired result then follows from the continuity of $\Phi$ and $\Psi$ . $\blacksquare$

References

[1] Abid, R., Kokonendji, C.C., Masmoudi, A. (2019a). Geometric dispersion models with real quadratic v-functions. Statistics and Probability Letters 145, 197-204.
[2] Abid, R., Kokonendji, C.C., Masmoudi, A. (2019b). Geometric Tweedie regression models for continuous and semicontinuous data with variation phenomenon. AStA Advances in Statistical Analysis, DOI:10.1007/s10182-019-00350-8.
[3] Abid, R., Kokonendji, C.C., Masmoudi, A. (2019c). Poisson-exponential-Tweedie regression models for ultra-overdispersed count data and applications. Submitted for publication.
[4] Abramowitz, M., Stegun, I.A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, New York.
[5] Aerts, S., Haesbroeck, G. (2017). Robust asymptotic tests for the equality of multivariate coefficients of variation, TEST 26, 163–187.
[6] Albert, A., Zhang, L. (2010). A novel definition of the multivariate coefficient of variation, Biometrical Journal 52, 667–675.
[7] Canty, A., Ripley, B. (2019). Package boot, https://cran.r-project.org/web/packages/boot/
[8] Arnold, B.C., Tony Ng, H.K. (2011). Flexible bivariate beta distributions, Journal of Multivariate Analysis 102, 1194–1202.
[9] Balakrishnan, N., Basu, A.P. (1995). The Exponential Distribution: Theory, Models and Applications, Gordon and Breach, Amsterdam.
[10] Barlow, R.E., Proschan, F. (1981). Statistical Theory of Reliability and Life Testing: Probability Models, To Begin With, Silver Spring, Maryland.
[11] Barndorff-Nielsen, O.E. (1997). Normal inverse Gaussian distribution and stochastic volatility modelling, Scandinavian Journal of Statistics 24, 1–13.
[12] Basu, A.P. (1988). Multivariate exponential distributions and their applications in reliability. In: Handbook of Statistics, vol. 7, Quality Control and Reliability, P.R. Krishnaiah and C.R. Rao (eds), Elsevier, Amsterdam, 467–477.
[13] Bonat, W.H., Jørgensen, B. (2016). Multivariate covariance generalized linear models, Journal of the Royal Statistical Society Series C (Appl. Statist.) 65, 649–675.
[14] Boubacar Maïnassara, Y., Kokonendji, C.C. (2014). On normal stable Tweedie models and power-generalized variance functions of only one component. TEST 23, 585–606.
[15] Casalis, M. (1996). The $2d+4$ simple quadratic natural exponential families on $\mathbb{R}^{d}$, Annals of Statistics 24, 1828–1854.
[16] Cramér, H. (1974). Mathematical Methods of Statistics, Princeton University Press, Princeton.
[17] Cuenin, J., Jørgensen, B., Kokonendji, C.C. (2016). Simulations of full multivariate Tweedie with flexible dependence structure, Computational Statistics 31, 1477–1492.
[18] Dey, A.K., Kundu, D. (2009). Discriminating among the log-normal, Weibull, and generalized exponential distributions, IEEE Transactions on Reliability 58, 416–424.
[19] Feltz, C.J., Miller, G.E. (1996). An asymptotic test for the equality of coefficients of variation from $k$ populations, Statistics in Medicine 15, 647–658.
[20] Fisher, R.A. (1934). The effects of methods of ascertainment upon the estimation of frequencies, Annals of Eugenics 6, 13–25.
[21] Hayashi, F. (2000). Econometrics, Princeton University Press, URL:http://fhayashi.fc2web.com/hayashi_econometrics.htm, Chapter 10, 665–667.
[22] Joe, H. (2014). Dependence Modeling with Copulas, Monographs on Statistics and Applied Probability 134, Chapman & Hall/CRC Press, London.
[23] Johnson, R.A., Wichern, D.W. (2007). Applied Multivariate Statistical Analysis, 6th Edition, Pearson Prentice Hall, New Jersey.
[24] Jørgensen, B., Kokonendji, C.C. (2016). Discrete dispersion models and their Tweedie asymptotics, AStA Advances in Statistical Analysis 100, 133–153.
[25] Kokonendji, C.C. (2014). Over- and underdispersion models. In: N. Balakrishnan (Ed.) The Wiley Encyclopedia of Clinical Trials - Methods and Applications of Statistics in Clinical Trials, Vol. 2 (Chap. 30), Wiley, New York, pp. 506–526.
[26] Kokonendji, C.C., Mizère, D., Balakrishnan, N. (2008). Connections of the Poisson weight function to overdispersion and underdispersion, Journal of Statistical Planning and Inference 138, 1287–1296.
[27] Kokonendji, C.C., Moypemna Sembona, C.C. (2018). Characterization and classification of multiple stable Tweedie models. Lithuanian Mathematical Journal 58, 441–456.
[28] Kokonendji, C.C., Puig, P. (2018). Fisher dispersion index for multivariate count distributions: A review and a new proposal, Journal of Multivariate Analysis 165, 180–193.
[29] Kotz, S., Balakrishnan, N., Johnson, N.L. (2000). Continuous Multivariate Distributions, Wiley, Chichester.
[30] Marshall, A.W., Olkin, I. (1967). A multivariate exponential distribution, Journal of the American Statistical Association 62, 30–44.
[31] Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia, Philosophical Transactions of the Royal Society, Series A, 187, 253–318.
[32] Python Software Foundation. (2019). Python Language Reference, Version 3.7.3, Available at http://www.python.org
[33] R Core Team. (2018). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna. http://cran.r-project.org/
[34] Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics, Wiley, New York.
[35] Su, P. (2015). NORTARA: Generation of Multivariate Data with Arbitrary Marginals, R package, https://cran.r-project.org/web/packages/NORTARA/.
[36] Teimouri, M., Gupta, A.K. (2011). On a bivariate Weibull distribution, Advances and Applications in Statistics 22, 77–106.
[37] Touré, A.Y., Dossou-Gbété, S., Kokonendji, C.C. (2019). Asymptotic normality of the test statistics for relative dispersion and relative variation indexes, Submitted for publication.
[38] Tweedie, M.C.K. (1984). An index which distinguishes between some important exponential families. In: Ghosh, J.K., Roy, J. (eds.) Statistics: Applications and New Directions. Proceedings of the Indian Statistical Golden Jubilee International Conference, Calcutta, pp. 579–604.

Relative variation indexes for multivariate continuous distributions on $[0,\infty)^{k}$ and extensions

Abstract

keywords:

1 Introduction

2 Multivariate variation indexes

2.1 Basic definitions

2.2 Interpretation and properties

Proposition 1.

3 Illustrations and comments

3.1 Bivariate beta distribution of Arnold and Tony Ng (2011)

3.2 Bivariate Weibull distribution of Teimouri and Gupta (2011)

3.3 Multivariate exponential distribution of Marshall and Olkin (1967)

3.4 Multiple stable Tweedie (MST) models

4 Estimation and asymptotic properties

Proposition 2.

Proposition 3.

5 Numerical applications

5.1 Some scenarios of bivariate cases and a real $4$-variate dataset

5.2 Other multivariate cases and simulation studies

6 Concluding remarks and extensions

Appendix A. On a broader multivariate exponential distribution

Appendix B. Construction of GVI

Appendix C. Proofs of the asymptotic results

References
