A Bayesian Semiparametric Gaussian Copula Approach to a Multivariate Normality Test

Luai Al-Labadi; Forough Fazeli Asl; Zahra Saberi

arXiv:1907.01736·stat.ME·July 5, 2019

TL;DR

This paper introduces a Bayesian semiparametric method combining Dirichlet processes and Gaussian copulas to test multivariate normality, demonstrating strong performance on simulated and real data.

Contribution

It develops a novel Bayesian multivariate normality test using a copula approach with theoretical insights and practical validation.

Findings

01 Excellent performance on simulated data

02 Effective detection of non-normality in real data

03 Theoretical properties established for the method

Abstract

In this paper, a Bayesian semiparametric copula approach is used to model the underlying multivariate distribution $F_{true}$. First, the Dirichlet process is constructed on the unknown marginal distributions of $F_{true}$. Then a Gaussian copula model is utilized to capture the dependence structure of $F_{true}$. As a result, a Bayesian multivariate normality test is developed by combining the relative belief ratio and the Energy distance. Several interesting theoretical results of the approach are derived. Finally, through several simulated examples and a real data set, the proposed approach reveals excellent performance.

Tables (7)

Table 1: The mean of the Energy distance between the true and the posterior-based model based on Kendall's $\tau$ for $a=1,\,5,\,10$ and $n=1000$.

True distribution	$a$	$\overline{d}_{\mathcal{E},N}(F^{\ast},F_{true})$	True distribution	$a$	$\overline{d}_{\mathcal{E},N}(F^{\ast},F_{true})$
$N_{2}(\mathbf{0}_{2},A_{2})$	1	0.00524	$(\mathcal{P}_{VII}(1,1,1))^{2}$	1	0.00528
$N_{2}(\mathbf{0}_{2},A_{2})$	5	0.00559	$(\mathcal{P}_{VII}(1,1,1))^{2}$	5	0.00551
$N_{2}(\mathbf{0}_{2},A_{2})$	10	0.00581	$(\mathcal{P}_{VII}(1,1,1))^{2}$	10	0.00573
$t_{5}(\mathbf{0}_{2},I_{2})$	1	0.00517	$E(0.5)\otimes E(0.25)$	1	0.00533
$t_{5}(\mathbf{0}_{2},I_{2})$	5	0.00538	$E(0.5)\otimes E(0.25)$	5	0.00575
$t_{5}(\mathbf{0}_{2},I_{2})$	10	0.00563	$E(0.5)\otimes E(0.25)$	10	0.00581
$LN_{2}(\mathbf{0}_{2},B_{2})$	1	0.00564	$B(1,2)\otimes B(2,1)$	1	0.00520
$LN_{2}(\mathbf{0}_{2},B_{2})$	5	0.00568	$B(1,2)\otimes B(2,1)$	5	0.00557
$LN_{2}(\mathbf{0}_{2},B_{2})$	10	0.00597	$B(1,2)\otimes B(2,1)$	10	0.00563

Table 2: $\overline{d}_{\mathcal{E},N}(F^{\ast},F_{true})$ based on the Gaussian rank, Kendall, and Spearman correlation coefficients for $n=1000$ and $a=1$.

True distribution	Gaussian rank	Kendall's $\tau$	Spearman's $\rho$
$(\mathcal{P}_{VII}(1,1,1))^{2}$	0.00557	0.00528	0.00567
$N_{2}(\mathbf{0}_{2},A_{2})$	0.00542	0.00524	0.00556

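Table 3: Relative belief ratios ($RB$) and strengths ($str$) for testing the bivariate normality assumption with various alternatives and choices of $a$ based on Kendall's $\tau$ with $n=50$. The E-test's p-value is reported once per distribution and is repeated on each of its rows.

Alternative distribution	$a$	$RB$	$str$	E-test's p-value	Alternative distribution	$a$	$RB$	$str$	E-test's p-value
$N_{2}(\mathbf{0}_{2},I_{2})$	1	3.54	0.823	0.8794	$E(0.5)\otimes E(0.25)$	1	0.04	0.005	$2.2\times 10^{-16}$
$N_{2}(\mathbf{0}_{2},I_{2})$	5	3.26	0.832	0.8794	$E(0.5)\otimes E(0.25)$	5	0.00	0.000	$2.2\times 10^{-16}$
$N_{2}(\mathbf{0}_{2},I_{2})$	10	2.48	0.997	0.8794	$E(0.5)\otimes E(0.25)$	10	0.00	0.000	$2.2\times 10^{-16}$
$N_{2}(\mathbf{0}_{2},A_{2})$	1	3.76	0.999	0.8442	$\mathcal{S}^{2}(LN(0,0.25))$	1	0.80	0.110	$2.2\times 10^{-16}$
$N_{2}(\mathbf{0}_{2},A_{2})$	5	2.82	0.859	0.8442	$\mathcal{S}^{2}(LN(0,0.25))$	5	0.62	0.151	$2.2\times 10^{-16}$
$N_{2}(\mathbf{0}_{2},A_{2})$	10	2.22	0.884	0.8442	$\mathcal{S}^{2}(LN(0,0.25))$	10	0.60	0.240	$2.2\times 10^{-16}$
$LN_{2}(\mathbf{0}_{2},B_{2})$	1	0.18	0.017	$2.2\times 10^{-16}$	$(\mathcal{P}_{VII}(1,1,10))^{2}$	1	2.92	0.854	0.7035
$LN_{2}(\mathbf{0}_{2},B_{2})$	5	0.18	0.033	$2.2\times 10^{-16}$	$(\mathcal{P}_{VII}(1,1,10))^{2}$	5	2.74	0.999	0.7035
$LN_{2}(\mathbf{0}_{2},B_{2})$	10	0.06	0.007	$2.2\times 10^{-16}$	$(\mathcal{P}_{VII}(1,1,10))^{2}$	10	2.26	0.878	0.7035
$NMIX1$	1	0.10	0.000	0.0050	$(\chi_{5}^{2})^{2}$	1	0.34	0.017	0.03518
$NMIX1$	5	0.02	0.002	0.0050	$(\chi_{5}^{2})^{2}$	5	0.22	0.026	0.03518
$NMIX1$	10	0.00	0.000	0.0050	$(\chi_{5}^{2})^{2}$	10	0.04	0.002	0.03518
$NMIX2$	1	3.64	1.000	0.8744	$N(0,1)\otimes\chi_{5}^{2}$	1	0.58	0.099	0.4020
$NMIX2$	5	3.32	0.834	0.8744	$N(0,1)\otimes\chi_{5}^{2}$	5	0.40	0.070	0.4020
$NMIX2$	10	2.34	0.874	0.8744	$N(0,1)\otimes\chi_{5}^{2}$	10	0.30	0.095	0.4020
$t_{5}(\mathbf{0}_{2},I_{2})$	1	0.80	0.210	0.0452	$N(0,1)\otimes t_{3}$	1	0.63	0.160	0.0502
$t_{5}(\mathbf{0}_{2},I_{2})$	5	0.60	0.181	0.0452	$N(0,1)\otimes t_{3}$	5	0.44	0.092	0.0502
$t_{5}(\mathbf{0}_{2},I_{2})$	10	0.20	0.010	0.0452	$N(0,1)\otimes t_{3}$	10	0.40	0.020	0.0502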

Table 4: The proportion of rejecting (POR) $\mathcal{H}_{0}$ out of 1000 replications based on Kendall's $\tau$ for $a=1$ and a sample of size $n=50$.

Distribution	POR $\mathcal{H}_{0}$	Distribution	POR $\mathcal{H}_{0}$
$N_{2}(\mathbf{0}_{2},I_{2})$	0.057^†	$NMIX1$	0.794^‡
$N_{2}(\mathbf{0}_{2},A_{2})$	0.070^†	$NMIX2$	0.077^‡
$LN_{2}(\mathbf{0}_{2},B_{2})$	0.801^‡	$(\mathcal{P}_{VII}(1,1,10))^{2}$	0.106^‡
$\mathcal{S}^{2}(LN(0,0.25))$	0.798^‡	$(\chi_{5}^{2})^{2}$	0.782^‡
$t_{5}(\mathbf{0}_{2},I_{2})$	0.499^‡	$N(0,1)\otimes\chi_{5}^{2}$	0.657^‡
$E(0.5)\otimes E(0.25)$	0.999^‡	$N(0,1)\otimes t_{3}$	0.551^‡

Table 5: RB (strength) for a sample of size 50 generated from $(E(0.5))^{2}$ when there is prior-data conflict (a tiny overlap between the effective support regions).

$H$	$a=1$	$a=5$	$a=10$
$F_{\boldsymbol{\theta}_{\mathbf{x}}}$	0.080(0.004)	0.040(0.006)	0.00(0.000)
$N_{2}(\mathbf{x},I_{2})$	18.46(1.000)	9.180(0.989)	9.08(0.969)
$N_{2}(\mathbf{3}_{2},S_{\mathbf{x}})$	19.00(1.000)	10.71(1.000)	8.32(0.582)

Table 6: The mean of the prior distance $\overline{d}_{\mathcal{E},N,a}(F_{N},H)$ for various choices of $H$ when $\mathbf{x}_{2\times 50}\sim(E(0.5))^{2}$, based on Kendall's $\tau$ with $N=1000$ and $a=1$.

$H$	$\overline{d}_{\mathcal{E},N,a}(F,H)$	$H$	$\overline{d}_{\mathcal{E},N,a}(F,H)$
$F_{\boldsymbol{\theta}_{\mathbf{x}}}$	0.00672	$N_{2}(\mathbf{3}_{2},A_{2})$	0.00633
$N_{2}(\mathbf{0}_{2},I_{2})$	0.00661	$NMIX2$	0.00658

Table 7: Description of notation

Notation: Description
1. $\mathbf{c}_{2}:=(c,c)^{T}$, $I_{2}:=\begin{pmatrix}1&0\\0&1\end{pmatrix}$, $A_{2}:=\begin{pmatrix}1&0.2\\0.2&1\end{pmatrix}$ and $B_{2}:=\begin{pmatrix}0.25&0.2\\0.2&0.025\end{pmatrix}$.
2. $E(\lambda)$: An exponential distribution with rate $\lambda$.
3. $t_{r}$: A Student's $t$ distribution with $r$ degrees of freedom.
4. $B(\alpha,\beta)$: A Beta distribution with first shape parameter $\alpha$ and second shape parameter $\beta$.
5. $\chi^{2}_{r}$: A chi-square distribution with $r$ degrees of freedom.
6. $\mathcal{P}_{VII}(1,1,r)^{\star}$: A Pearson type $VII$ (aka Student's $t$) distribution with location parameter 1, scale parameter 1 and $r$ degrees of freedom.
7. $F_{1}\otimes F_{2}$: A bivariate distribution with two independent marginal distributions $F_{1}$ and $F_{2}$.
8. $t_{r}(\mathbf{0}_{2},I_{2})^{\dagger}$: A bivariate Student's $t$ distribution with location parameter $\mathbf{0}_{2}$, scale parameter $I_{2}$ and $r$ degrees of freedom.
9. $LN_{2}(\mathbf{0}_{2},B_{2})^{\ddagger}$: A bivariate lognormal distribution with mean vector $\mathbf{0}_{2}$ and covariance matrix $B_{2}$.
10. $\mathcal{S}^{2}(LN(0,0.25))^{\dagger}$: A bivariate spherical distribution with lognormal distribution $LN(0,0.25)$ for radii.
11. $NMIX1^{\dagger}$: $0.9N_{2}(\mathbf{0}_{2},I_{2})+0.1N_{2}(\mathbf{3}_{2},I_{2})$.
12. $NMIX2^{\dagger}$: $0.9N_{2}(\mathbf{0}_{2},A_{2})+0.1N_{2}(\mathbf{0}_{2},I_{2})$.

Equations

\big|C(\mathbf{u})-C(\mathbf{v})\big|\leq\sum_{i=1}^{m}|u_{i}-v_{i}|.

F(t_{1},\ldots,t_{m})=C(F_{1}(t_{1}),\ldots,F_{m}(t_{m})),

C(\mathbf{u})=F(F_{1}^{-1}(u_{1}),\ldots,F_{m}^{-1}(u_{m})),

\{C_{R}(\mathbf{u}):=\Phi_{R}(\Phi^{-1}(u_{1}),\ldots,\Phi^{-1}(u_{m}))\mid r_{ij}\in[-1,1]\},

H^{\ast}=a(a+n)^{-1}H+n(a+n)^{-1}F_{n},

P=\sum_{i=1}^{\infty}L^{-1}(\Gamma_{i})\delta_{Y_{i}}\Big/\sum_{i=1}^{\infty}L^{-1}(\Gamma_{i}),

P_{N}=\sum_{i=1}^{N}J_{i,N}\delta_{Y_{i}},

RB_{\Psi}(\psi\,|\,x)=\pi_{\Psi}(\psi\,|\,x)/\pi_{\Psi}(\psi),

\Pi_{\Psi}\left[RB_{\Psi}(\psi\,|\,x)\leq RB_{\Psi}(\psi_{0}\,|\,x)\,|\,x\right].

d_{\mathcal{E}}(F,G)=2E\|\mathbf{X}-\mathbf{Y}\|-E\|\mathbf{X}-\mathbf{X}^{\prime}\|-E\|\mathbf{Y}-\mathbf{Y}^{\prime}\|,

d_{\mathcal{E},n}(F,G)=\frac{2}{n}\sum_{i=1}^{n}E\|\mathbf{x}_{i}-\mathbf{Y}\|-\frac{1}{n^{2}}\sum_{\ell,m=1}^{n}\|\mathbf{x}_{\ell}-\mathbf{x}_{m}\|-E\|\mathbf{Y}-\mathbf{Y}^{\prime}\|,

F^{\ast}(t_{1},\ldots,t_{m})=C_{R^{\ast}}(F^{\ast}_{1}(t_{1}),\ldots,F^{\ast}_{m}(t_{m})).

\big|C_{R^{\ast}}(F^{\ast}_{1}(t_{1}),\ldots,F^{\ast}_{m}(t_{m}))-F_{true}(\mathbf{t})\big|\leq\big|C_{R^{\ast}}(F^{\ast}_{1}(t_{1}),\ldots,F^{\ast}_{m}(t_{m}))-H^{\ast}(\mathbf{t})\big|+\big|H^{\ast}(\mathbf{t})-F_{true}(\mathbf{t})\big|=I_{1}+I_{2}.

I_{1}=\big|C_{R^{\ast}}(F^{\ast}_{1}(t_{1}),\ldots,F^{\ast}_{m}(t_{m}))-C_{R^{\ast}}(H^{\ast}_{1}(t_{1}),\ldots,H^{\ast}_{m}(t_{m}))\big|\leq\sum_{i=1}^{m}\big|F^{\ast}_{i}(t_{i})-H^{\ast}_{i}(t_{i})\big|.

Pr\left\{\big|F^{\ast}_{i}(t_{i})-H^{\ast}_{i}(t_{i})\big|\geq\epsilon\right\}\leq\dfrac{H^{\ast}_{i}(t_{i})\left(1-H^{\ast}_{i}(t_{i})\right)}{(a+n+1)\epsilon^{2}}.

Pr\left\{\big|F^{\ast}_{i}(t_{i})-H^{\ast}_{i}(t_{i})\big|\geq\epsilon\right\}\leq\dfrac{1}{4\,k^{2}\epsilon^{2}}.

\mathcal{H}_{0}:F_{true}\in\mathcal{F},

\mathcal{H}_{0}:F_{true}=F_{\boldsymbol{\theta}_{\mathbf{x}}}.

F(t_{1},\ldots,t_{m})=C_{R_{\mathbf{x}}}(F_{1}(t_{1}),\ldots,F_{m}(t_{m})),

d_{\mathcal{E},N,a}(F_{N},H)=2\sum_{i=1}^{N}J_{i,N}E_{H}\|\mathbf{x}_{i}-\mathbf{Y}\|-\sum_{i,j=1}^{N}J_{i,N}J_{j,N}\|\mathbf{x}_{i}-\mathbf{x}_{j}\|-E_{H}\|\mathbf{Y}-\mathbf{Y}^{\prime}\|,

E_{F_{N}}(d_{\mathcal{E},N,a}(F_{N},H))\xrightarrow{a.s.}d_{\mathcal{E},N}(F_{N},H),

E_{F_{N}}(d_{\mathcal{E},N,a}(F_{N},H))=\frac{2}{N}\sum_{i=1}^{N}E_{H}\|\mathbf{x}_{i}-\mathbf{Y}\|-\frac{a}{(a+1)N^{2}}\sum_{i,j=1}^{N}\|\mathbf{x}_{i}-\mathbf{x}_{j}\|-E_{H}\|\mathbf{Y}-\mathbf{Y}^{\prime}\|.

\big|C_{R}(F_{N1}(x_{1}),\ldots,F_{Nm}(x_{m}))-C_{R}(F_{1}(x_{1}),\ldots,F_{m}(x_{m}))\big|\leq\sum_{i=1}^{m}\big|F_{Ni}(x_{i})-F_{i}(x_{i})\big|.

\big|C_{R^{\ast}}(F^{\ast}_{1}(t_{1}),\ldots,F^{\ast}_{m}(t_{m}))-F_{\boldsymbol{\theta}_{\mathbf{x}}}(t_{1},\ldots,t_{m})\big|\geq\big|H^{\ast}(\mathbf{t})-F_{\boldsymbol{\theta}_{\mathbf{x}}}(\mathbf{t})\big|-I_{1}.

RB_{D_{\mathcal{E}}}(d\,|\,\mathbf{x})=M\{\hat{F}_{D_{\mathcal{E}}}(\hat{d}_{(i+1)/M}\,|\,\mathbf{x})-\hat{F}_{D_{\mathcal{E}}}(\hat{d}_{i/M}\,|\,\mathbf{x})\},

\sum_{\{i\geq i_{0}:\widehat{RB}_{D}(\hat{d}_{i/M}\,|\,\mathbf{x})\leq\widehat{RB}_{D}(0\,|\,\mathbf{x})\}}\big(\hat{F}_{D_{\mathcal{E}}}(\hat{d}_{(i+1)/M}\,|\,\mathbf{x})-\hat{F}_{D_{\mathcal{E}}}(\hat{d}_{i/M}\,|\,\mathbf{x})\big).

Full text

A Bayesian Semiparametric Gaussian Copula Approach to a Multivariate Normality Test

Luai Al-Labadi (corresponding author), Department of Mathematical and Computational Sciences, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6, Canada.

Forough Fazeli Asl, Department of Mathematical Sciences, Isfahan University of Technology, Isfahan 84156-83111, Iran.

Zahra Saberi, Department of Mathematical Sciences, Isfahan University of Technology, Isfahan 84156-83111, Iran.

Abstract

In this paper, a Bayesian semiparametric copula approach is used to model the underlying multivariate distribution $F_{true}$ . First, the Dirichlet process is constructed on the unknown marginal distributions of $F_{true}$ . Then a Gaussian copula model is utilized to capture the dependence structure of $F_{true}$ . As a result, a Bayesian multivariate normality test is developed by combining the relative belief ratio and the Energy distance. Several interesting theoretical results of the approach are derived. Finally, through several simulated examples and a real data set, the proposed approach reveals excellent performance.

Keywords: Dirichlet process, Energy distance, Multivariate normality test, Relative belief inferences, Semiparametric Gaussian copula model.

MSC 2010 62F15, 62G10, 62H15

1 Introduction

Semiparametric copulas are useful tools in multivariate data analysis. They are used for modelling a multivariate distribution whose dependence structure is induced by a known copula and whose marginal distributions are estimated; see, for example, Sancetta and Satchell (2004), Segers et al. (2014) and the references therein. We point to the interesting work of Rosen and Thompson (2015), who proposed a semiparametric methodology for modeling a multivariate distribution whose dependence structure is induced by a Gaussian copula and whose marginal distributions are estimated nonparametrically via mixtures of B-spline densities. The authors take a Bayesian approach, using Markov chain Monte Carlo methods for inference.

In the present paper, a Bayesian semiparametric copula approach based on the Dirichlet process and the Gaussian copula is proposed to model the underlying multivariate distribution $F_{true}$. In addition, recognizing that many recent applications are developed under the assumption of multivariate normality (Fernandez, 2010 and Zhu et al., 2014), a test to assess this assumption is developed. Recent procedures tackling this problem can be found in Kim and Park (2018), Madukaife and Okafor (2018), Henze and Visagie (2019) and Al-Labadi et al. (2019a). We highlight that, while most available work on hypothesis testing using copula approaches concerns assessing independence (Genest and Rémillard, 2004; Kojadinovic and Holmes, 2009; Medovikov, 2016; Belalia et al., 2017), the proposed test is Bayesian and models the dependence structure and the marginal behavior of the data separately to assess the multivariate normality assumption. Briefly, each univariate marginal distribution of $F_{true}$ is assigned a Dirichlet process prior to define posterior-based and prior-based models of $F_{true}$. A Gaussian copula model is then utilized to induce the dependence structure. The test follows by comparing the concentration of the posterior-based model to the concentration of the prior-based model about the family of multivariate normal distributions (hypothesized model) via the so-called relative belief ratio. In this comparison, a Bayesian counterpart of the Energy distance is developed. The proposed test is easy to implement, performs powerfully, and allows one to state evidence for or against the null hypothesis. Also, unlike the test presented in Al-Labadi et al. (2019a), which is restricted to assessing the family of multivariate normal distributions as the hypothesized model, the proposed test can be extended to assess all families of multivariate distributions (model checking problems).

The rest of the paper is structured as follows. Relevant background, containing some definitions and generic properties, is reviewed in Section 2. In Section 3, a Bayesian semiparametric Gaussian copula approach based on the Dirichlet process is proposed for modeling multivariate distributions. The choice of the hyperparameter of the Dirichlet process and the estimation method of the parameter of the Gaussian copula are discussed in Section 4. In Section 5, a Bayesian multivariate normality (MVN) test based on the proposed approach and the Energy distance is developed. The main steps of a computational algorithm to implement the MVN test are outlined in Section 6. The performance of the approach and its application to the MVN test are illustrated through simulation studies and a real data example in Section 7. The results show that the proposed test works well in all covered cases and is very powerful. Finally, Section 8 concludes the paper with a summary of the results. Some notation related to Section 7 is given in the Appendix.

2 Relevant background

2.1 Copula-based Model

In multivariate analysis, copula models were introduced by Sklar (1959) as a common tool to model multivariate distributions. Following Nelsen (2006), an $m$-dimensional copula ($m$-copula) is a nondecreasing and right continuous function $C$ from $[0,1]^{m}$ into $[0,1]$ such that, for every $\mathbf{u}=(u_{1},\ldots,u_{m})\in[0,1]^{m}$:

(i) $C(u_{1},\ldots,u_{i-1},0,u_{i+1},\ldots,u_{m})=0$, for $i=1,\ldots,m$ ($C$ is grounded).

(ii) $C(1,\ldots,1,u_{i},1,\ldots,1)=u_{i}$, for $i=1,\ldots,m$ ($C$ has margins).

(iii) For every $\mathbf{v}=(v_{1},\ldots,v_{m})\in[0,1]^{m}$ such that $u_{i}\leq v_{i}$ for all $i$, the $C$-volume $V\left([\mathbf{u},\mathbf{v}]\right)\geq 0$ ($C$ is $m$-increasing), where the $m$-cube $[0,1]^{m}$ is the $m$-fold product of $[0,1]$ and $[\mathbf{u},\mathbf{v}]$ is the $m$-box $[u_{1},v_{1}]\times\cdots\times[u_{m},v_{m}]$.

From (iii), it follows that every $m$-copula $C$ is nondecreasing in each variable and satisfies the Lipschitz condition. That is, for every pair of points $\mathbf{u},\mathbf{v}\in[0,1]^{m}$,

$$\big|C(\mathbf{u})-C(\mathbf{v})\big|\leq\sum_{i=1}^{m}|u_{i}-v_{i}|. \tag{1}$$

Hence, any $m$ -copula $C$ is uniformly continuous on $[0,1]^{m}$ .

The following key theorem of Sklar (1959) illustrates the role of the $m$ -copulas to model the multivariate distribution functions through their univariate margins.

Theorem 1

(Sklar’s theorem) Let $F$ be an $m$ -variate distribution function with marginal distribution functions $F_{1},\ldots,F_{m}$ . Then there exists an $m$ -copula $C$ such that for all $\mathbf{t}\in\overline{\mathbb{R}}^{m}$

$$F(t_{1},\ldots,t_{m})=C(F_{1}(t_{1}),\ldots,F_{m}(t_{m})), \tag{2}$$

where $\overline{\mathbb{R}}^{m}$ is the $m$-fold product of the extended real line $[-\infty,\infty]$. If $F_{1},\ldots,F_{m}$ are all continuous, then $C$ is unique and can be written as

$$C(\mathbf{u})=F(F_{1}^{-1}(u_{1}),\ldots,F_{m}^{-1}(u_{m})),$$

for any $\mathbf{u}\in[0,1]^{m}$, where $F^{-1}_{i}(u_{i})=\inf\{t\in\mathbb{R}\,|\,F_{i}(t)\geq u_{i}\}$; otherwise, $C$ is uniquely determined on $Ran(F_{1})\times\cdots\times Ran(F_{m})$, where $Ran(F_{i})$ denotes the range of the distribution function $F_{i}$ for $i=1,\ldots,m$. Conversely, if $C$ is an $m$-copula and $F_{1},\ldots,F_{m}$ are distribution functions, then $F$, defined by (2), is an $m$-variate distribution function with marginal distribution functions $F_{1},\ldots,F_{m}$.

In general, the dependence structure of multivariate distributions is modeled by the copula. For this purpose, several families of copulas have been developed. See, for example, the work of Joe (1997), Trivedi and Zimmer (2005), and Nelsen (2006). One instance of special interest is the family of Gaussian copulas. Besides satisfying both the Fréchet-Hoeffding lower and upper bounds (Nelsen 2006, Theorem 2.10.2), it has only one dependence parameter per pair of variables, restricted to the symmetric interval $[-1,1]$, which makes it simple to apply. Formally, let $\Phi^{-1}$ be the inverse of the cumulative distribution function (cdf) of the univariate standard normal distribution and $\Phi_{R}$ be the cdf of the multivariate normal distribution with zero mean vector $\mathbf{0}_{m}$ and correlation matrix $R=(r_{ij})$ for $1\leq i,j\leq m$. Then the family of Gaussian copulas is defined by

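$$\left\{C_{R}(\mathbf{u}):=\Phi_{R}(\Phi^{-1}(u_{1}),\ldots,\Phi^{-1}(u_{m}))\mid r_{ij}\in[-1,1]\right\},$$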

where $\mathbf{u}\in[0,1]^{m}$. Following Chen et al. (2006), assume that, for any $\mathbf{u}\in[0,1]^{m}$, $C(\mathbf{u})=C_{R}(\mathbf{u})$; then the multivariate distribution $F$ follows a semiparametric Gaussian copula model $F(t_{1},\ldots,t_{m})=C_{R}(F_{1}(t_{1}),\ldots,F_{m}(t_{m}))$, with unknown parameter $R$ and unknown marginal cdf $F_{i}$, for $i=1,\ldots,m$. A detailed discussion of the semiparametric Gaussian copula model based on the Dirichlet process is presented in Section 3.

The following algorithm shows the steps of generating a sample of random vectors from an $m$ -variate distribution $F$ with marginal cdf’s $F_{1},\ldots,F_{m}$ using a Gaussian copula model with correlation matrix $R$ .
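The standard construction behind such a sampler can be sketched in Python as follows. This is a minimal illustration assuming NumPy and SciPy, not the authors' listing: the function name `sample_gaussian_copula`, the argument `inv_cdfs`, and the example marginals are hypothetical choices made for this sketch.

```python
import numpy as np
from scipy import stats


def sample_gaussian_copula(n, R, inv_cdfs, rng=None):
    """Draw n vectors whose dependence is given by the Gaussian copula C_R.

    inv_cdfs is a list of m quantile functions F_i^{-1} (hypothetical argument name).
    """
    rng = np.random.default_rng(rng)
    R = np.asarray(R, dtype=float)
    m = R.shape[0]
    # Step 1: z ~ N_m(0, R), a multivariate normal with correlation matrix R
    z = rng.multivariate_normal(np.zeros(m), R, size=n)
    # Step 2: u_i = Phi(z_i), so each coordinate is Uniform(0, 1)
    u = stats.norm.cdf(z)
    # Step 3: x_i = F_i^{-1}(u_i), which gives coordinate i the marginal cdf F_i
    return np.column_stack([inv_cdfs[i](u[:, i]) for i in range(m)])


# Illustrative marginals: E(0.5) (rate 0.5, i.e. scale 2) and B(1, 2)
R = np.array([[1.0, 0.2], [0.2, 1.0]])
margins = [stats.expon(scale=2.0).ppf, stats.beta(1, 2).ppf]
x = sample_gaussian_copula(500, R, margins, rng=0)
```

Passing quantile functions rather than distribution objects keeps the sketch usable with discrete or empirical marginals, for which only a generalized inverse $F_{i}^{-1}$ is available.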

2.2 Dirichlet Process

The Dirichlet process prior, introduced by Ferguson (1973), is the most commonly used prior in Bayesian nonparametric inference. A large body of nonparametric inference has been devoted to this prior. Here we only present the most relevant definitions and properties of this prior. Consider a space $\mathfrak{X}$ with a $\sigma$-algebra $\mathcal{A}$ of subsets of $\mathfrak{X}$, let $H$ be a fixed probability measure on $(\mathfrak{X},\mathcal{A})$, called the base measure, and let $a$ be a positive number, called the concentration parameter. A random probability measure $P=\left\{P(A):A\in\mathcal{A}\right\}$ is called a Dirichlet process on $(\mathfrak{X},\mathcal{A})$ with parameters $a$ and $H$, denoted by $P\sim DP(a,H)$, if for every measurable partition $A_{1},\ldots,A_{k}$ of $\mathfrak{X}$ with $k\geq 2$, the joint distribution of the vector $\left(P(A_{1}),\ldots,P(A_{k})\right)$ is the Dirichlet distribution with parameters $aH(A_{1}),\ldots,aH(A_{k})$. Also, it is assumed that $H(A_{j})=0$ implies $P(A_{j})=0$ with probability one. Consequently, for any $A\in\mathcal{A}$, $P(A)\sim\text{beta}(aH(A),a(1-H(A)))$, $E(P(A))=H(A)$ and $Var(P(A))=H(A)(1-H(A))/(1+a)$. Accordingly, the base measure $H$ plays the role of the center of $P$ while the concentration parameter $a$ controls the variation of $P$ around $H$. One of the most well-known properties of the Dirichlet process is the conjugacy property. That is, when the sample $x=(x_{1},\ldots,x_{n})$ is drawn from $P\sim DP(a,H)$, the posterior distribution of $P$ given $x$, denoted by $P^{\ast}$, is also a Dirichlet process with concentration parameter $a+n$ and base measure

$$H^{\ast}=a(a+n)^{-1}H+n(a+n)^{-1}F_{n}, \tag{3}$$

where $F_{n}$ denotes the empirical cumulative distribution function (cdf) of the sample $x$. Note that $H^{\ast}$ is a convex combination of the base measure $H$ and the empirical cdf $F_{n}$. Therefore, $H^{\ast}\rightarrow H$ as $a\rightarrow\infty$ while $H^{\ast}\rightarrow F_{n}$ as $a\rightarrow 0$. A guideline for choosing the hyperparameters $a$ and $H$ is given in Section 4. Following Ferguson (1973), $P\sim DP(a,H)$ can be represented as
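To make the role of $a$ in the posterior base measure concrete, the convex combination $H^{\ast}=a(a+n)^{-1}H+n(a+n)^{-1}F_{n}$ can be evaluated numerically. The sketch below assumes NumPy and SciPy; the helper name `posterior_base_cdf`, the three-point sample, and the standard normal base cdf are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy import stats


def posterior_base_cdf(t, x, a, base_cdf):
    """Evaluate H*(t) = a/(a+n) H(t) + n/(a+n) F_n(t) for a univariate sample x."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x = np.asarray(x, dtype=float)
    n = x.size
    # Empirical cdf F_n evaluated at each point of t
    F_n = (x[:, None] <= t[None, :]).mean(axis=0)
    return (a * base_cdf(t) + n * F_n) / (a + n)


x = np.array([0.1, 0.4, 0.7])  # toy sample, so F_n(0.5) = 2/3
for a in (0.01, 1.0, 100.0):
    # Small a keeps H* near F_n(0.5); large a pulls it toward H(0.5)
    print(a, posterior_base_cdf(0.5, x, a, stats.norm.cdf))
```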

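$$P=\sum_{i=1}^{\infty}L^{-1}(\Gamma_{i})\delta_{Y_{i}}\Big/\sum_{i=1}^{\infty}L^{-1}(\Gamma_{i}), \tag{4}$$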

where $\Gamma_{i}=E_{1}+\cdots+E_{i}$ with $E_{i}\overset{i.i.d.}{\sim}\text{exponential}(1)$, $Y_{i}\overset{i.i.d.}{\sim}H$ independent of the $\Gamma_{i}$, $L^{-1}(y)=\inf\{x>0:L(x)\geq y\}$ with $L(x)=a\int_{x}^{\infty}t^{-1}e^{-t}dt$, $x>0$, and $\delta_{a}$ the Dirac delta measure at the point $a$. The series representation (4) implies that the Dirichlet process is a discrete probability measure even when the base measure $H$ is absolutely continuous. Note that, by imposing the weak topology, the support of the Dirichlet process could be quite large. Recognizing the complexity of working with (4), Zarepour and Al-Labadi (2012) proposed the following finite representation as an efficient method to simulate the Dirichlet process. They showed that the Dirichlet process $P\sim DP(a,H)$ can be approximated by

$$P_{N}=\sum_{i=1}^{N}J_{i,N}\delta_{Y_{i}}, \tag{5}$$

with the monotonically decreasing weights $J_{i,N}=\frac{G_{a/N}^{-1}(\Gamma_{i}/\Gamma_{N+1})}{\sum_{j=1}^{N}G_{a/N}^{-1}(\Gamma_{j}/\Gamma_{N+1})}$, where $\Gamma_{i}$ and $Y_{i}$ are defined as before, $N$ is a large positive integer and $G_{a/N}$ denotes the complementary cdf of the $\text{gamma}(a/N,1)$ distribution. Note that $G^{-1}_{a/N}(p)$ is the $(1-p)$-th quantile of the $\text{gamma}(a/N,1)$ distribution. The following algorithm describes how the approximation (5) can be used to generate a sample from $DP(a,H)$.
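One way to code the approximation (5) is sketched below. It is a hedged illustration assuming NumPy and SciPy, in which $G_{a/N}^{-1}$ is computed with `scipy.stats.gamma.isf` (the inverse survival function); the function names `dp_weights_zal` and `sample_dp` and the callable `base_sampler` are hypothetical, and the paper's own algorithm listing may differ in detail.

```python
import numpy as np
from scipy import stats


def dp_weights_zal(a, N, rng=None):
    """Weights J_{1,N} >= ... >= J_{N,N} of the finite approximation (5)."""
    rng = np.random.default_rng(rng)
    # Gamma_i = E_1 + ... + E_i with E_i i.i.d. exponential(1), for i = 1, ..., N + 1
    gammas = np.cumsum(rng.exponential(1.0, size=N + 1))
    p = gammas[:N] / gammas[N]
    # G_{a/N}^{-1}: inverse of the complementary cdf of gamma(a/N, 1)
    raw = stats.gamma.isf(p, a / N)
    return raw / raw.sum()


def sample_dp(a, N, base_sampler, rng=None):
    """Return atoms Y_i ~ H and weights J_{i,N} of one approximate draw from DP(a, H)."""
    rng = np.random.default_rng(rng)
    weights = dp_weights_zal(a, N, rng)
    atoms = base_sampler(N, rng)  # hypothetical callable drawing N values from H
    return atoms, weights


# Illustrative use with a standard normal base measure H
atoms, weights = sample_dp(5.0, 200, lambda size, rng: rng.normal(size=size), rng=2)
```

The realized random cdf at a point $t$ is then the total weight of the atoms not exceeding $t$, for example `weights[atoms <= t].sum()`.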

The Dirichlet process can also be obtained from the following finite mixture model developed by Ishwaran and Zarepour (2002). Let $P_{N}$ have the form given in (5) with $(J_{i,N})_{1\leq i\leq N}\sim\text{Dirichlet}(a/N,\ldots,a/N)$. Then $E_{P_{N}}(g)\rightarrow E_{P}(g)$ in distribution as $N\rightarrow\infty$, for any measurable function $g:\mathbb{R}\rightarrow\mathbb{R}$ with $\int_{\mathbb{R}}|g(x)|\,H(dx)<\infty$ and $P\sim DP(a,H)$. In particular, $(P_{N})_{N\geq 1}$ converges in distribution to $P$, where $P_{N}$ and $P$ are random elements of the space $M_{1}(\mathbb{R})$ of probability measures on $\mathbb{R}$ endowed with the topology of weak convergence. To generate $(J_{i,N})_{1\leq i\leq N}$, put $J_{i,N}=G_{i,N}/\sum_{i=1}^{N}G_{i,N}$, where $(G_{i,N})_{1\leq i\leq N}$ is a sequence of i.i.d. $\text{gamma}(a/N,1)$ random variables independent of $(Y_{i})_{1\leq i\leq N}$. This form of approximation leads to some results in Section 5.
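The gamma normalization described above translates directly into code. The following sketch assumes NumPy; the function names `dp_weights_finite_mixture` and `dp_functional` are hypothetical, and the standard normal base measure is only an example.

```python
import numpy as np


def dp_weights_finite_mixture(a, N, rng=None):
    """Weights (J_{1,N}, ..., J_{N,N}) ~ Dirichlet(a/N, ..., a/N) via normalized gammas."""
    rng = np.random.default_rng(rng)
    # G_{i,N} i.i.d. gamma(a/N, 1); dividing by their sum gives the Dirichlet vector
    g = rng.gamma(shape=a / N, scale=1.0, size=N)
    return g / g.sum()


def dp_functional(g_fun, atoms, weights):
    """E_{P_N}(g) = sum_i J_{i,N} g(Y_i) for one realization of P_N."""
    return float(np.sum(weights * g_fun(atoms)))


rng = np.random.default_rng(0)
weights = dp_weights_finite_mixture(1.0, 500, rng)
atoms = rng.normal(size=500)  # Y_i i.i.d. from an illustrative base measure H = N(0, 1)
mean_under_PN = dp_functional(lambda y: y, atoms, weights)
```

Unlike the weights in the previous representation, these weights are exchangeable rather than ordered, which is often convenient when computing expectations with respect to $P_{N}$.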

2.3 Relative Belief Inferences

The relative belief ratio, developed by Evans (2015), has become a widely used measure of statistical evidence. See, for example, the work of Al-Labadi and Evans (2018), Al-Labadi et al. (2017, 2018), and Al-Labadi et al. (2019a, 2019b) for implementations of the relative belief ratio in various univariate hypothesis testing problems. In detail, let $\{f_{\theta}:\theta\in\Theta\}$ be a collection of densities on a sample space $\mathfrak{X}$ and let $\pi$ be a prior on the parameter space $\Theta$. Note that the densities may represent discrete or continuous probability measures but they are all with respect to the same support measure $d\theta$. After observing the data $x$, the posterior distribution of $\theta$, denoted by $\pi(\theta\,|\,x)$, is a revised prior and is given by the density $\pi(\theta\,|\,x)=\pi(\theta)f_{\theta}(x)/m(x)$, where $m(x)=\int_{\Theta}\pi(\theta)f_{\theta}(x)\,d\theta$ is the prior predictive density of $x$. For a parameter of interest $\psi=\Psi(\theta)$, let $\Pi_{\Psi}$ be the marginal prior probability measure and $\Pi_{\Psi}(\cdot\,|\,x)$ be the marginal posterior probability measure. It is assumed that $\Psi$ satisfies regularity conditions so that the prior density $\pi_{\Psi}$ and the posterior density $\pi_{\Psi}(\cdot\,|\,x)$ of $\psi$ exist with respect to some support measure on the range space for $\Psi$. The relative belief ratio for a value $\psi$ is then defined by $RB_{\Psi}(\psi\,|\,x)=\lim_{\delta\rightarrow 0}\Pi_{\Psi}(N_{\delta}(\psi)\,|\,x)/\Pi_{\Psi}(N_{\delta}(\psi))$, where $N_{\delta}(\psi)$ is a sequence of neighborhoods of $\psi$ converging nicely to $\psi$ as $\delta\rightarrow 0$ (Evans, 2015). When $\pi_{\Psi}$ and $\pi_{\Psi}(\cdot\,|\,x)$ are continuous at $\psi$, the relative belief ratio is given by

$$RB_{\Psi}(\psi\,|\,x)=\pi_{\Psi}(\psi\,|\,x)/\pi_{\Psi}(\psi),$$

the ratio of the posterior density to the prior density at $\psi.$ Therefore, $RB_{\Psi}(\psi\,|\,x)$ measures the change in the belief of $\psi$ being the true value from a priori to a posteriori.

Since $RB_{\Psi}(\psi\,|\,x)$ is a measure of the evidence that $\psi$ is the true value, if $RB_{\Psi}(\psi\,|\,x)>1$, then the probability of $\psi$ being the true value has increased from a priori to a posteriori; consequently, there is evidence based on the data that $\psi$ is the true value. If $RB_{\Psi}(\psi\,|\,x)<1$, then the probability of $\psi$ being the true value has decreased from a priori to a posteriori; accordingly, there is evidence based on the data against $\psi$ being the true value. For the case $RB_{\Psi}(\psi\,|\,x)=1$, there is no evidence either way.

Obviously, $RB_{\Psi}(\psi_{0}\,|\,x)$ measures the evidence for the hypothesis $\mathcal{H}_{0}:\Psi(\theta)=\psi_{0}$. A large value of $RB_{\Psi}(\psi_{0}\,|\,x)=c$ provides strong evidence in favor of $\psi_{0}$. However, there may also exist other values of $\psi$ that had even larger increases. Thus, it is also necessary to calibrate whether this is strong or weak evidence for or against $\mathcal{H}_{0}$. A typical calibration of $RB_{\Psi}(\psi_{0}\,|\,x)$ is given by the strength

$$\Pi_{\Psi}\left[RB_{\Psi}(\psi\,|\,x)\leq RB_{\Psi}(\psi_{0}\,|\,x)\,|\,x\right]. \tag{6}$$

The value in (6) is the posterior probability that the true value of $\psi$ has a relative belief ratio no greater than that of the hypothesized value $\psi_{0}$. Note that (6) is not a p-value, as it has a very different interpretation. When $RB_{\Psi}(\psi_{0}\,|\,x)<1$, there is evidence against $\psi_{0}$; a small value of (6) then indicates strong evidence against $\psi_{0}$, whereas a large value of (6) indicates weak evidence against $\psi_{0}$. Similarly, when $RB_{\Psi}(\psi_{0}\,|\,x)>1$, there is evidence in favor of $\psi_{0}$; a small value of (6) then indicates weak evidence in favor of $\psi_{0}$, while a large value of (6) indicates strong evidence in favor of $\psi_{0}$.
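As a toy illustration of the relative belief ratio and the strength (6), consider a binomial model with a conjugate beta prior, where both densities are available in closed form. This example is not from the paper: the function name `relative_belief`, the Beta(1, 1) prior, and the data ($k=12$ successes in $n=20$ trials) are assumptions chosen for the sketch, and the strength is approximated by Monte Carlo draws from the posterior.

```python
import numpy as np
from scipy import stats


def relative_belief(psi0, k, n, alpha=1.0, beta=1.0, draws=100_000, rng=None):
    """RB at psi0 and its strength for a binomial success probability with a beta prior."""
    rng = np.random.default_rng(rng)
    prior = stats.beta(alpha, beta)
    post = stats.beta(alpha + k, beta + n - k)  # conjugate posterior

    def rb(psi):
        # Relative belief ratio: posterior density over prior density
        return post.pdf(psi) / prior.pdf(psi)

    rb0 = rb(psi0)
    # Strength (6): posterior probability of the set {psi : RB(psi | x) <= RB(psi0 | x)}
    theta = post.rvs(size=draws, random_state=rng)
    strength = float(np.mean(rb(theta) <= rb0))
    return float(rb0), strength


rb_mid, str_mid = relative_belief(0.5, k=12, n=20, rng=0)  # value near the data
rb_far, str_far = relative_belief(0.1, k=12, n=20, rng=0)  # value far from the data
```

Here `rb_mid` exceeds 1 (evidence in favor of $\psi_{0}=0.5$) while `rb_far` falls below 1 with a strength close to 0 (strong evidence against $\psi_{0}=0.1$), matching the interpretation given above.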

2.4 Energy Distance

The Energy distance, presented by Székely (2003), is an appropriate tool to assess the equality of distributions. In general, the Energy distance between two $m$-variate distribution functions $F$ and $G$ is defined by

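$$d_{\mathcal{E}}(F,G)=2E\|\mathbf{X}-\mathbf{Y}\|-E\|\mathbf{X}-\mathbf{X}^{\prime}\|-E\|\mathbf{Y}-\mathbf{Y}^{\prime}\|, \tag{7}$$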

where $\mathbf{X},\mathbf{X}^{\prime}\overset{i.i.d.}{\sim}F$, $\mathbf{Y},\mathbf{Y}^{\prime}\overset{i.i.d.}{\sim}G$ and $\|\mathbf{a}\|=\sqrt{\mathbf{a}^{T}\mathbf{a}}$ denotes the Euclidean norm of the vector $\mathbf{a}=(a_{1},\ldots,a_{m})$. Székely and Rizzo (2013) showed that $d_{\mathcal{E}}(F,G)\geq 0$, with equality if and only if $F=G$. Note that, from Székely (2003), the Energy distance (7) is rotation invariant. This property makes it appropriate for goodness-of-fit testing in higher dimensions. Specifically, let $G$ be the hypothesized distribution and $\mathbf{x}_{m\times n}=(\mathbf{x}_{1},\ldots,\mathbf{x}_{n})$ be the observed sample from $F$. Then, the one-sample Energy distance corresponding to (7) is defined by

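$$d_{\mathcal{E},n}(F,G)=\frac{2}{n}\sum_{i=1}^{n}E\|\mathbf{x}_{i}-\mathbf{Y}\|-\frac{1}{n^{2}}\sum_{\ell,m=1}^{n}\|\mathbf{x}_{\ell}-\mathbf{x}_{m}\|-E\|\mathbf{Y}-\mathbf{Y}^{\prime}\|, \tag{8}$$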

where $\mathbf{x}_{i}\in\mathbb{R}^{m}$, for $i=1,\ldots,n$, and the expectations are taken with respect to the distribution $G$. An important special case occurs when $G$ is a multivariate normal distribution, for which the $\mathsf{R}$ package energy is usually used to implement (8). For further discussion of the Energy distance, consult Székely (2003) and Székely and Rizzo (2013).
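For readers working outside $\mathsf{R}$, the one-sample statistic (8) can be approximated in Python by replacing the expectations under $G$ with Monte Carlo averages. This is a sketch under that assumption, not the closed-form multivariate normal computation used by the energy package; the function name `one_sample_energy_distance` and the callable `sample_G` are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist


def one_sample_energy_distance(x, sample_G, n_mc=2000, rng=None):
    """Monte Carlo approximation of (8); x is (n, m), sample_G(size, rng) draws from G."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    y = sample_G(n_mc, rng)
    y_prime = sample_G(n_mc, rng)
    # (2/n) sum_i E||x_i - Y||, with the expectation replaced by an average over y
    term_xy = 2.0 * cdist(x, y).mean()
    # (1/n^2) sum_{l,m} ||x_l - x_m||; pdist lists each unordered pair once
    term_xx = 2.0 * pdist(x).sum() / n**2
    # E||Y - Y'|| from independent pairs of draws
    term_yy = np.linalg.norm(y - y_prime, axis=1).mean()
    return float(term_xy - term_xx - term_yy)


def std_normal(size, rng):
    # Hypothesized G: bivariate standard normal
    return rng.standard_normal((size, 2))


rng = np.random.default_rng(0)
x_null = rng.standard_normal((200, 2))  # sample that follows G
x_shift = x_null + 3.0                  # sample with a shifted mean
d_null = one_sample_energy_distance(x_null, std_normal, rng=1)
d_shift = one_sample_energy_distance(x_shift, std_normal, rng=1)
```

Because of Monte Carlo error, `d_null` can be slightly negative even though the population quantity (7) is nonnegative; increasing `n_mc` reduces this effect.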

3 A Bayesian Semiparametric Gaussian Copula Approach for Modeling Multivariate Distributions

In this section, we propose a Bayesian semiparametric copula approach based on the Gaussian copula as a flexible model for multivariate distributions. For brevity, we refer to this procedure as the BSPGC (Bayesian semiparametric Gaussian copula) approach. Specifically, let $\mathbf{x}_{m\times n}=(\mathbf{x}_{1},\ldots,\mathbf{x}_{n})$ be a sample of size $n$ from an unknown $m$-variate distribution $F_{true}$ with marginal cdf's $F_{1},\ldots,F_{m}$. Note that the subscript $m\times n$ may be omitted whenever it is clear from the context. To model $F_{true}$ based on the BSPGC approach, we use the prior $F_{i}\sim DP(a,H_{i})$, where $H_{i}$ is the $i$-th marginal cdf of a given $m$-variate distribution $H$. So, by (3), for a given choice of $a$, $F^{\ast}_{i}=F_{i}|\mathbf{x}_{i}\sim DP(a+n,H_{i}^{\ast})$ for $i=1,\ldots,m$. Consider the joint cdf $H^{\ast}$ corresponding to marginal cdf's $H_{1}^{\ast},\ldots,H_{m}^{\ast}$ with correlation matrix $R^{\ast}$. Then, the posterior-based model is defined by

[TABLE]
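To make the construction concrete, the following sketch draws from a Gaussian copula model with a given correlation matrix and given marginal quantile functions. It is only an illustration under stated assumptions: the marginals here are smooth parametric quantile functions rather than draws from a Dirichlet process, the helper names are hypothetical, and the clamping constant is a numerical convenience that does not come from the paper.

```python
import math
import random
from statistics import NormalDist


def cholesky(R):
    """Lower-triangular L with L L^T = R, for a small positive definite matrix R."""
    m = len(R)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L


def sample_gaussian_copula(R, quantile_fns, size, rng):
    """Draw `size` points whose dependence is a Gaussian copula with correlation R
    and whose i-th margin has quantile function quantile_fns[i]."""
    m = len(R)
    L = cholesky(R)
    std = NormalDist()
    eps = 1e-12  # keeps uniforms strictly inside (0, 1); an assumption of this sketch
    draws = []
    for _ in range(size):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        correlated = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        u = [min(max(std.cdf(c), eps), 1.0 - eps) for c in correlated]
        draws.append(tuple(q(ui) for q, ui in zip(quantile_fns, u)))
    return draws


rng = random.Random(0)
R = [[1.0, 0.8], [0.8, 1.0]]
# Hypothetical margins: a normal margin and an exponential margin with rate 0.5.
margins = [NormalDist(1.0, 2.0).inv_cdf, lambda u: -math.log(1.0 - u) / 0.5]
points = sample_gaussian_copula(R, margins, 500, rng)
```

The copula carries the dependence and the quantile functions carry the margins, so either ingredient can be swapped without touching the other.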

The next lemma shows that $F^{\ast}$ approaches the true distribution $F_{true}$ as the sample size increases.

Lemma 2

Let $\mathbf{x}_{m\times n}$ be a sample from $m$ -variate distribution function $F_{true}$ with unknown marginal cdf’s $F_{1},\ldots,F_{m}$ . Assume that $F^{\ast}_{i}\sim DP(a+n,H_{i}^{\ast})$ , for $i=1,\ldots,m$ . For any $\mathbf{t}=(t_{1},\ldots,t_{m})\in\mathbb{R}^{m}$ , $C_{R^{\ast}}(F^{\ast}_{1}(t_{1}),\ldots,F^{\ast}_{m}(t_{m}))\xrightarrow{a.s.}F_{true}(\mathbf{t})$ as $n\rightarrow\infty$ .

Proof. For any $\mathbf{t}\in\mathbb{R}^{m}$ , the triangle inequality implies

[TABLE]

From (3), for any $\mathbf{t}\in\mathbb{R}^{m}$ , $H^{\ast}(\mathbf{t})\xrightarrow{a.s.}F_{true}(\mathbf{t})$ as $n\rightarrow\infty$ . Hence, the continuous mapping theorem implies $I_{2}\xrightarrow{a.s.}0$ as $n\rightarrow\infty$ . On the other hand, from Sklar’s theorem (Theorem 1) and the Lipschitz condition (1), we have

[TABLE]

Note that, from the property of the Dirichlet process, for any $t_{i}\in\mathbb{R}$ and $\epsilon>0$ , Chebyshev’s inequality implies

[TABLE]

Substituting $n=k^{2}+b$ , for $k\in\mathbb{N}$ and $b\in\{0,1,\ldots\}$ , gives

[TABLE]

Since $\sum_{k=1}^{\infty}k^{-2}$ converges, $\sum_{k=1}^{\infty}Pr\left\{\big{|}F^{\ast}_{i}(t_{i})-H^{\ast}_{i}(t_{i})\big{|}\geq\epsilon\right\}<\infty.$ Hence, by the first Borel–Cantelli lemma, $\big{|}F^{\ast}_{i}(t_{i})-H^{\ast}_{i}(t_{i})\big{|}\xrightarrow{a.s.}0$ as $k\rightarrow\infty$ , that is, as $n\rightarrow\infty$ . This completes the proof.
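For readers who want the Chebyshev step spelled out, the following worked bound is one way it can be written. It relies only on the standard fact that, for $F\sim DP(c,G)$ , $F(t)$ has a Beta distribution with mean $G(t)$ and variance $G(t)(1-G(t))/(c+1)$ ; the exact constants used in the paper’s displays may differ, so this should be read as a sketch rather than a reproduction.

```latex
\begin{align*}
Pr\left\{\big|F^{\ast}_{i}(t_{i})-H^{\ast}_{i}(t_{i})\big|\geq\epsilon\right\}
  &\leq \frac{Var\big(F^{\ast}_{i}(t_{i})\big)}{\epsilon^{2}}
   = \frac{H^{\ast}_{i}(t_{i})\big(1-H^{\ast}_{i}(t_{i})\big)}{(a+n+1)\,\epsilon^{2}} \\
  &\leq \frac{1}{4\,(a+n+1)\,\epsilon^{2}}
   \leq \frac{1}{4\,k^{2}\epsilon^{2}}
   \qquad \text{for } n=k^{2}+b .
\end{align*}
```

The last inequality uses $a+n+1>k^{2}$ , which is what makes the series over $k$ summable.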

4 Selecting $a$ , $H$ and the Method of Estimation of $R^{\ast}$ in the BSPGC Approach

The proposed method for modeling multivariate distributions depends on $a$ , $H$ and $R^{\ast}$ . Hence, it is necessary to look carefully at the impact of these parameters on the approach. For instance, from (3), a large value of $a$ can increase the effect of the $m$ -variate distribution $H$ instead of $F_{n}$ in the posterior-based model (9). The following lemma shows the effect of the value of $a$ on the model (9).

Lemma 3

Let $\mathbf{x}_{m\times n}$ be a sample from $m$ -variate distribution function $F_{true}$ with unknown marginal cdf’s $F_{1},\ldots,F_{m}$ . Also, let $H$ be a known $m$ -variate cdf with marginal cdf’s $H_{1},\ldots,H_{m}$ and $H^{\ast}$ be the $m$ -variate cdf, defined in (3), with marginal cdf’s $H^{\ast}_{1},\ldots,H^{\ast}_{m}$ . Assume that $F^{\ast}_{i}\sim DP(a+n,H^{\ast}_{i})$ , for $1\leq i\leq m$ . Then, for any $\mathbf{t}\in\mathbb{R}^{m}$ , $C_{R^{\ast}}(F^{\ast}_{1}(t_{1}),\ldots,F^{\ast}_{m}(t_{m}))\xrightarrow{a.s.}H(\mathbf{t})$ as $a\rightarrow\infty$ .

Proof. The proof is similar to the proof of Lemma 2. Assume that $a=k^{2}c$ for $k\in\{0,1,\ldots\}$ and a fixed positive number $c$ . For any fixed $n$ , replace $F_{true}(\mathbf{t})$ and $a$ by $H(\mathbf{t})$ and $k^{2}c$ , respectively, in the proof of Lemma 2. Then the result follows.

It follows from Lemma 3 that an overly large value of $a$ pulls the posterior-based model toward $H$ regardless of the data, which can lead to errors. To avoid this issue, we propose choosing $a$ to be at most $0.5\,n$ , as recommended in Al-Labadi and Zarepour (2017).
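The role of the bound on $a$ can be seen from a short calculation. The sketch below assumes that (3) is the standard conjugate update of the Dirichlet process, in which the posterior base measure is a mixture of $H$ and the empirical cdf $F_{n}$ ; the two numerical cases are illustrations only.

```latex
\begin{align*}
H^{\ast} &= \frac{a}{a+n}\,H + \frac{n}{a+n}\,F_{n}, \\
a = 0.5\,n &\;\Longrightarrow\; \frac{a}{a+n} = \frac{1}{3}, \\
a = 1,\ n = 50 &\;\Longrightarrow\; \frac{a}{a+n} = \frac{1}{51} \approx 0.0196 .
\end{align*}
```

Under this reading, the cap $a\leq 0.5\,n$ keeps the weight on the data at no less than two thirds.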

The choice of $H$ is also very significant, and there are two main issues to consider. The first is the independence of the approach from the choice of $H$ (invariance property). As shown in Table 6, the approach is invariant to the choice of any continuous multivariate distribution. The second issue is the compatibility between $H$ and the data. A lack of compatibility is typically called prior-data conflict (Evans and Moshonov, 2006; Al-Labadi and Evans, 2017; Al-Labadi and Evans, 2018; Al-Labadi and Wang, 2019). As illustrated in Section 7.2, the existence of prior-data conflict leads to a failure of the approach and thus should be avoided. Since $E(F_{i})=H_{i}$ , where $F_{i}\sim DP(a,H_{i})$ for $i=1,\ldots,m$ , a reasonable choice of $H$ that avoids prior-data conflict is the $m$ -variate normal distribution $N_{m}(\overline{\mathbf{x}},S_{\mathbf{x}})$ , where $\overline{\mathbf{x}}=1/n\sum_{i=1}^{n}\mathbf{x}_{i}$ and $S_{\mathbf{x}}=1/(n-1)\sum_{i=1}^{n}(\mathbf{x}_{i}-\overline{\mathbf{x}})(\mathbf{x}_{i}-\overline{\mathbf{x}})^{T}$ .
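The data-driven choice $H=N_{m}(\overline{\mathbf{x}},S_{\mathbf{x}})$ only requires the sample mean vector and the sample covariance matrix. A minimal sketch of that computation is given below; the function name and the toy data are hypothetical, and the marginal normal cdf’s are built from the diagonal of $S_{\mathbf{x}}$ .

```python
import math
from statistics import NormalDist


def sample_mean_and_covariance(xs):
    """Return the sample mean vector and the (n-1)-denominator covariance matrix."""
    n = len(xs)
    m = len(xs[0])
    mean = [sum(x[j] for x in xs) / n for j in range(m)]
    cov = [
        [sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in xs) / (n - 1) for j in range(m)]
        for i in range(m)
    ]
    return mean, cov


# Toy bivariate data with n = 3 observations.
data = [(1.0, 2.0), (3.0, 6.0), (5.0, 10.0)]
x_bar, s_x = sample_mean_and_covariance(data)

# Marginal normal distributions H_i with mean x_bar[i] and variance s_x[i][i].
marginals = [NormalDist(x_bar[i], math.sqrt(s_x[i][i])) for i in range(len(x_bar))]
```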

To carry out the approach, it is essential to estimate the correlation matrix $R^{\ast}$ . For this, we first generate a sample from the mixture distribution in (3). Then, based on the generated sample, $R^{\ast}$ is estimated by one of the following three common procedures: the Gaussian rank correlation, Kendall’s $\tau$ or Spearman’s $\rho$ . In Section 7, we perform a simulation study to compare the effect of these three methods on the quality of the approach. As a result, we recommend using Kendall’s $\tau$ correlation coefficients with $a=1$ in the proposed model.
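The two steps of this paragraph can be sketched as follows. The code assumes that the mixture in (3) puts weight $a/(a+n)$ on $H$ and weight $n/(a+n)$ on the empirical distribution of the data, and it converts Kendall’s $\tau$ into a Gaussian copula correlation through $\sin(\pi\tau/2)$ , a common convention for elliptical copulas that the text does not confirm. All function names are hypothetical.

```python
import math
import random


def sample_posterior_base(data, draw_from_h, a, size, rng):
    """Draw from the mixture: H with probability a/(a+n), otherwise a resampled data point."""
    n = len(data)
    weight_h = a / (a + n)
    return [draw_from_h(rng) if rng.random() < weight_h else rng.choice(data)
            for _ in range(size)]


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


def tau_to_gaussian_correlation(tau):
    """Map Kendall's tau to the correlation of a Gaussian copula (assumed convention)."""
    return math.sin(math.pi * tau / 2.0)


rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
# Hypothetical H: independent standard normal coordinates.
draws = sample_posterior_base(data, lambda g: (g.gauss(0.0, 1.0), g.gauss(0.0, 1.0)), 1.0, 200, rng)
tau = kendall_tau([d[0] for d in draws], [d[1] for d in draws])
r_star_12 = tau_to_gaussian_correlation(tau)
```

With $a=1$ almost every draw is a resampled data point, so the estimated correlation is driven by the observed sample.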

5 A MVN Test Based on the BSPGC Approach

Let $\mathbf{x}_{m\times n}$ be a sample of size $n$ from an unknown $m$ -variate distribution $F_{true}$ . The problem to be addressed in this section is to test the hypothesis

[TABLE]

where $\mathcal{F}=\{N_{m}(\boldsymbol{\mu}_{m},\Sigma_{m}):\boldsymbol{\mu}_{m}\in\mathbb{R}^{m},\det(\Sigma_{m})>0\}$ . Note that whenever $\boldsymbol{\mu}_{m}$ and $\Sigma_{m}$ are unknown, they are estimated by the sample mean vector $\overline{\mathbf{x}}$ and the sample covariance matrix $S_{\mathbf{x}}$ , respectively. Thus, for $\boldsymbol{\theta}_{\mathbf{x}}=(\overline{\mathbf{x}},S_{\mathbf{x}})$ , $F_{\boldsymbol{\theta}_{\mathbf{x}}}=N_{m}(\overline{\mathbf{x}},S_{\mathbf{x}})$ is the best representative of the family $\mathcal{F}$ to compare with the distribution $F_{true}$ . Hence, testing (11) is equivalent to testing

[TABLE]

Now, we continue as follows. Let $H=F_{\boldsymbol{\theta}_{\mathbf{x}}}$ with marginal cdf’s $H_{1}=F_{\boldsymbol{\theta}_{\mathbf{x}_{1}}},\ldots,H_{m}=F_{\boldsymbol{\theta}_{\mathbf{x}_{m}}}$ . Here, for $i=1,\ldots,m$ , $F_{\boldsymbol{\theta}_{\mathbf{x}_{i}}}$ is the cdf of the univariate normal distribution with mean $\overline{x}_{i}$ and variance $s^{2}_{i}$ , where $\overline{x}_{i}$ and $s^{2}_{i}$ are the $i$ -th element of $\overline{\mathbf{x}}$ and $i$ -th diagonal element of $S_{\mathbf{x}}$ , respectively. Assume that $F_{i}\sim DP(a,H_{i})$ . For any $\mathbf{t}\in\mathbb{R}^{m}$ , let

[TABLE]

be the prior-based model, where $R_{\mathbf{x}}=\left(r_{\mathbf{x},ij}\right)$ is the correlation matrix of the $m$ -variate distribution $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ , to be estimated as discussed in Section 4. Note that, as pointed out earlier, setting $H_{i}=F_{\boldsymbol{\theta}_{\mathbf{x}_{i}}}$ ensures compatibility between the data and the prior, which avoids prior-data conflict. More details about the effect of prior-data conflict on the approach are given in Section 7, where it is shown that the existence of prior-data conflict leads to erroneous test results.

Recalling the posterior-based model as defined in Section 3, to proceed with the approach, the energy distance is used to compute the distance between this model and $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ (posterior distance) and the distance between the prior-based model and $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ (prior distance). The next lemma proposes a Bayesian counterpart of the distance (8) as an appropriate tool to measure dissimilarities between the proposed models and the null distribution. This is considered a very convenient tool for assessing MVN in high dimensional problems ( $m>n$ ).

Lemma 4

Let $F_{true}$ be an $m$ -variate distribution function with unknown marginal cdf’s $F_{1},\ldots,F_{m}$ and $H$ be a known $m$ -variate distribution function with marginal cdf’s $H_{1},\ldots,H_{m}$ . Assume that $F_{i}\sim DP(a,H_{i})$ , and $F_{N_{i}}$ ’s are the approximation of Dirichlet process $F_{i}$ ’s, given by Ishwaran and Zarepour (2002). Consider the Energy distance between $F_{N}=C_{R}(F_{N_{1}}(t_{1}),\ldots,F_{N_{m}}(t_{m}))$ and $H$ as

[TABLE]

where $(J_{i,N})_{1\leq i\leq N}\sim Dirichlet(a/N,\ldots,a/N)$ , $\mathbf{Y},\mathbf{Y}^{\prime}\overset{i.i.d.}{\sim}H$ , $(\mathbf{x}_{1},\ldots,\mathbf{x}_{N})$ is an observed sample from $F_{N}$ and $R$ is the correlation matrix of $H$ . Then, as $a\rightarrow\infty$

[TABLE]

where $d_{\mathcal{E},N}(F_{N},H)$ is defined in (8) with $F=F_{n}$ , $G=H$ and $n=N.$

Proof. From properties of Dirichlet distribution, $E_{F_{N}}(J_{i,N})=1/N$ and $E_{F_{N}}(J_{i,N}J_{j,N})$ $=a/((a+1)N^{2})$ . Then

[TABLE]

The result follows immediately by letting $a\rightarrow\infty$ in (15).
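A small sketch may help to see what the quantities in Lemma 4 look like in practice. It draws the weights $(J_{i,N})$ from a symmetric Dirichlet distribution by normalizing gamma variates, and evaluates a weighted Energy distance in which the expectations under $H$ are replaced by averages over a reference sample. The weighted form used here, $2\sum_{i}J_{i,N}E\|\mathbf{x}_{i}-\mathbf{Y}\|-E\|\mathbf{Y}-\mathbf{Y}^{\prime}\|-\sum_{i}\sum_{j}J_{i,N}J_{j,N}\|\mathbf{x}_{i}-\mathbf{x}_{j}\|$ , is an assumption about (14), and all names are hypothetical.

```python
import math
import random


def dirichlet_weights(a, N, rng):
    """Weights (J_1, ..., J_N) ~ Dirichlet(a/N, ..., a/N) via normalized gamma variates."""
    gammas = [rng.gammavariate(a / N, 1.0) for _ in range(N)]
    total = sum(gammas)
    if total == 0.0:
        # Numerical underflow for very small a/N: fall back to a single random atom.
        weights = [0.0] * N
        weights[rng.randrange(N)] = 1.0
        return weights
    return [g / total for g in gammas]


def weighted_energy_distance(atoms, weights, reference):
    """Weighted Energy distance between a discrete distribution and a reference sample."""
    k = len(reference)
    cross = sum(w * sum(math.dist(x, y) for y in reference) / k
                for x, w in zip(atoms, weights))
    within_ref = sum(math.dist(y, z) for y in reference for z in reference) / (k * k)
    within_atoms = sum(wi * wj * math.dist(xi, xj)
                       for xi, wi in zip(atoms, weights)
                       for xj, wj in zip(atoms, weights))
    return 2.0 * cross - within_ref - within_atoms


rng = random.Random(1)
N, a = 50, 5.0
atoms = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(N)]      # atoms drawn from H
reference = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(N)]  # second sample from H
J = dirichlet_weights(a, N, rng)
d_weighted = weighted_energy_distance(atoms, J, reference)
```

With equal weights $1/N$ the function reduces to the unweighted plug-in distance, which is the regime the weights approach as $a\rightarrow\infty$ .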

The next lemma allows us to use the approximation of the Dirichlet process in the prior-based and posterior-based models for approximating the distribution of the posterior and the prior distances computed by (14).

Lemma 5

Let $F_{true}$ be an $m$ -variate distribution function with unknown marginal cdf’s $F_{1},\ldots,F_{m}$ and $H$ be a known $m$ -variate distribution function with marginal cdf’s $H_{1},\ldots,H_{m}$ . Assume that $F_{i}\sim DP(a,H_{i})$ and $F_{Ni}$ ’s are the approximation of the Dirichlet process $F_{i}$ ’s, given in (5). Then, for any $(t_{1},\ldots,t_{m})\in\mathbb{R}^{m}$ , $C_{R}(F_{N1}(t_{1}),\ldots,$ $F_{Nm}(t_{m}))\xrightarrow{a.s.}C_{R}(F_{1}(t_{1}),\ldots,F_{m}(t_{m}))$ , as $N\rightarrow\infty$ , where $R$ is the correlation matrix of $H$ .

Proof. From Lipschitz condition (1), we have

[TABLE]

Since $F_{Ni}(x_{i})\xrightarrow{a.s.}F_{i}(x_{i})$ , for $1\leq i\leq m$ (Zarepour and Al-Labadi, 2012), the result follows.

The procedure is continued by considering $d_{\mathcal{E},N,a}(F_{N},F_{\boldsymbol{\theta}_{\mathbf{x}}})$ as the Energy distance between the prior-based model (13) and the null distribution $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ using formula (14). Similarly, consider $d_{\mathcal{E},N,a}(F^{\ast}_{N},F_{\boldsymbol{\theta}_{\mathbf{x}}})$ for the posterior-based model (9) and the null distribution. Then, the relative belief ratio is used to compare the concentration of the posterior distribution $d_{\mathcal{E},N,a}(F^{\ast}_{N},F_{\boldsymbol{\theta}_{\mathbf{x}}})$ to the prior distribution $d_{\mathcal{E},N,a}(F_{N},F_{\boldsymbol{\theta}_{\mathbf{x}}})$ about zero. As shown in the next lemma, if $\mathcal{H}_{0}$ is true, the distribution of the posterior distance should be more concentrated about 0 than the distribution of the prior distance; otherwise, the distribution of the prior distance should be more concentrated at 0 than the distribution of posterior distance. The comparison is made by computing the relative belief ratio with the interpretation as discussed in Section 2.

Lemma 6

Let $\mathbf{x}_{m\times n}$ be a sample from $m$ -variate distribution function $F_{true}$ with unknown marginal cdf’s $F_{1},\ldots,F_{m}$ . Assume that $\boldsymbol{\theta}_{\mathbf{x}}\xrightarrow{a.s.}\boldsymbol{\theta}_{0}$ and $F_{\boldsymbol{\theta}_{\mathbf{x}}}\xrightarrow{a.s.}F_{\boldsymbol{\theta}_{0}}$ as $n\rightarrow\infty$ . Let $F^{\ast}_{i}\sim DP(a+n,H^{\ast}_{i})$ , for $i=1,\ldots,m$ . For any $\mathbf{t}=(t_{1},\ldots,t_{m})\in\mathbb{R}^{m}$ as $n\rightarrow\infty$

(i) $\big{|}C_{R^{\ast}}(F^{\ast}_{1}(t_{1}),\ldots,F^{\ast}_{m}(t_{m}))-F_{\boldsymbol{\theta}_{\mathbf{x}}}(t_{1},\ldots,t_{m})\big{|}\xrightarrow{a.s.}0,$ when $\mathcal{H}_{0}$ is true.

(ii) $\liminf\big{|}C_{R^{\ast}}(F^{\ast}_{1}(t_{1}),\ldots,F^{\ast}_{m}(t_{m}))-F_{\boldsymbol{\theta}_{\mathbf{x}}}(t_{1},\ldots,t_{m})\big{|}\displaystyle{\overset{a.s.}{>}}0,$ when $\mathcal{H}_{0}$ is not true,

where $R^{\ast}$ is the correlation matrix of $H^{\ast}$ , defined in (3).

Proof. To prove (i), substitute $F_{true}$ in (3) by $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ . From (3), for any $\mathbf{t}\in\mathbb{R}^{m}$ , $H^{\ast}(\mathbf{t})\xrightarrow{a.s.}F_{true}(\mathbf{t})$ as $n\rightarrow\infty$ . If $\mathcal{H}_{0}$ is true, then $F_{true}(\mathbf{t})=F_{\boldsymbol{\theta}_{0}}(\mathbf{t})$ . Hence, the proof of (i) follows immediately from the proof of Lemma 2. To prove (ii), consider $I_{1}$ as in (3). Applying the triangle inequality gives

[TABLE]

Similar to the proof of Lemma 2, $I_{1}\xrightarrow{a.s.}0$ and $\big{|}H^{\ast}(\mathbf{t})-F_{\boldsymbol{\theta}_{\mathbf{x}}}(\mathbf{t})\big{|}\xrightarrow{a.s.}\big{|}F_{true}(\mathbf{t})-F_{\boldsymbol{\theta}_{0}}(\mathbf{t})\big{|}$ as $n\rightarrow\infty$ . Since $\mathcal{H}_{0}$ is not true, $\big{|}F_{true}(\mathbf{t})-F_{\boldsymbol{\theta}_{0}}(\mathbf{t})\big{|}\displaystyle{\overset{a.s.}{>}}0$ which completes the proof of (ii).

The effect of the value of $a$ on the posterior-based model was considered in Lemma 3. It is also interesting to consider the effect of the value of $a$ on the proposed MVN test.

Lemma 7

Let $\mathbf{x}_{m\times n}$ be a sample from $m$ -variate distribution function $F_{true}$ with unknown marginal cdf’s $F_{1},\ldots,F_{m}$ . Let $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ be the cdf of $N_{m}(\overline{\mathbf{x}},S_{\mathbf{x}})$ with marginal cdf’s $F_{\boldsymbol{\theta}_{\mathbf{x}_{1}}},\ldots,F_{\boldsymbol{\theta}_{\mathbf{x}_{m}}}$ . If $F_{i}\sim DP(a,F_{\boldsymbol{\theta}_{\mathbf{x}_{i}}})$ , for $i=1,\ldots,m$ , then $C_{R_{\mathbf{x}}}(F_{1}(t_{1}),\ldots,F_{m}(t_{m}))$ $\xrightarrow{a.s.}F_{\boldsymbol{\theta}_{\mathbf{x}}}(\mathbf{t})$ as $a\rightarrow\infty$ , where $R_{\mathbf{x}}$ is the correlation matrix of $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ .

Proof. The proof is similar to the proof of Lemma 3 and is omitted.

Lemmas 3 and 7 show that, for a value of $a$ that is too large relative to the sample size, both the posterior-based and prior-based models approach the null model $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ . Hence, comparing the posterior and prior distances to detect normality can lead to errors: we may accept $\mathcal{H}_{0}$ when it is not true and reject $\mathcal{H}_{0}$ when it is true. As recommended in Section 7, we should take $a$ to be at most $0.5\,n$ .

At the end of this section, it is worth pointing out that the proposed test can be extended to assess any family of multivariate distributions. For this, it is enough to consider a different family of multivariate distributions in the hypothesis (11) and use its best representative distribution as $H$ in the methodology, which may be more challenging for some multivariate models.

6 Main Steps for Testing the MVN

The following computational algorithm summarizes the main steps to test $\mathcal{H}_{0}$ . This algorithm can be viewed as a generalized version of Algorithm B of Al-Labadi and Evans (2018). Observe that, since closed forms of the densities of $D_{\mathcal{E}}=d_{\mathcal{E},N,a}(F_{N},F_{\boldsymbol{\theta}_{\mathbf{x}}})$ and $D_{\mathcal{E}}|\mathbf{x}=d_{\mathcal{E},N,a}(F^{\ast}_{N},F_{\boldsymbol{\theta}_{\mathbf{x}}})$ are typically not available, relative belief ratios need to be approximated via simulation.

Algorithm 3 Relative belief algorithm based on the BSPGC approach for testing MVN

1. Use Algorithm 2 to generate (approximately) marginal cdf’s $F_{i}$ from $DP(a,F_{\boldsymbol{\theta}_{\mathbf{x}_{i}}})$ , for $1\leq i\leq m$ .

2. Generate a sample of $N$ values from the $m$ -variate distribution $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ and estimate the correlation matrix $R_{\mathbf{x}}$ , denoted by $\widehat{R}_{\mathbf{x}}$ , as discussed in Section 4.

3. Use the generated marginal cdf’s $F_{i}$ and set $R=\widehat{R}_{\mathbf{x}}$ in Algorithm 1 to get a sample of $N$ values from the prior-based model (13).

4. Use (14) for the sample generated in step 3 to compute the prior distance $d_{\mathcal{E},N,a}(F_{N},F_{\boldsymbol{\theta}_{\mathbf{x}}})$ .

5. Repeat steps (1)-(4) to obtain a sample of $r$ values from the prior of $D_{\mathcal{E}}$ .

6. Repeat steps (1)-(5) by replacing $a$ by $a+n$ , $F_{i}$ by $F^{\ast}_{i}$ , $F_{\boldsymbol{\theta}_{\mathbf{x}_{i}}}$ by $H^{\ast}_{i}$ , $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ by $H^{\ast}$ , $R_{\mathbf{x}}$ by $R^{\ast}$ , $F_{N}$ by $F^{\ast}_{N}$ , $D_{\mathcal{E}}$ by $D_{\mathcal{E}}|\mathbf{x}$ and prior by posterior. This yields a sample of $r$ values from the posterior of $D_{\mathcal{E}}|\mathbf{x}$ .

7. Let $M$ be a positive number. Let $\hat{F}_{D_{\mathcal{E}}}$ denote the empirical cdf of $D_{\mathcal{E}}$ based on the prior sample in step (5) and for $i=0,\ldots,M,$ let $\hat{d}_{i/M}$ be the estimate of $d_{i/M},$ the $(i/M)$ -th prior quantile of $D_{\mathcal{E}}.$ Here $\hat{d}_{0}=0$ , and $\hat{d}_{1}$ is the largest value of $d_{\mathcal{E}}$ . Let $\hat{F}_{D_{\mathcal{E}}}(\cdot\,|\mathbf{x})$ denote the empirical cdf of $D_{\mathcal{E}}|\mathbf{x}$ based on the posterior sample in step (6). For $d\in[\hat{d}_{i/M},\hat{d}_{(i+1)/M})$ , estimate $RB_{D_{\mathcal{E}}}(d\,|\,\mathbf{x})={\pi_{D_{\mathcal{E}}}(d|\mathbf{x})}/{\pi_{D_{\mathcal{E}}}(d)}$ by

[TABLE]

the ratio of the estimates of the posterior and prior contents of $[\hat{d}_{i/M},\hat{d}_{(i+1)/M}).$ Thus, we estimate $RB_{D_{\mathcal{E}}}(0\,|\,\mathbf{x})=\frac{\pi_{D_{\mathcal{E}}}(0|\mathbf{x})}{\pi_{D_{\mathcal{E}}}(0)}$ by $\widehat{RB}_{D_{\mathcal{E}}}(0\,|\,\mathbf{x})=M\widehat{F}_{D_{\mathcal{E}}}(\hat{d}_{p_{0}}\,|\,\mathbf{x})$ where $p_{0}=i_{0}/M$ and $i_{0}$ is chosen so that $i_{0}/M$ is not too small (typically $i_{0}/M\approx 0.05)$ .

8. Estimate the strength $DP_{D_{\mathcal{E}}}\big{(}RB_{D_{\mathcal{E}}}(d\,|\,\mathbf{x})\leq RB_{D_{\mathcal{E}}}(0\,|\,\mathbf{x})\,|\,\mathbf{x}\big{)}$ by the finite sum

[TABLE]

For fixed $M,$ as $N\rightarrow\infty$ and $r\rightarrow\infty,$ $\hat{d}_{i/M}$ converges almost surely to $d_{i/M}$ , and (16) and (17) converge almost surely to $RB_{D_{\mathcal{E}}}(d\,|\,\mathbf{x})$ and $DP_{D_{\mathcal{E}}}\big{(}RB_{D_{\mathcal{E}}}(d\,|\,\mathbf{x})\leq RB_{D_{\mathcal{E}}}(0\,|\,\mathbf{x})\,|\,\mathbf{x}\big{)}$ , respectively. The consistency of the proposed test follows from Proposition 6 of Al-Labadi and Evans (2018). As a recommendation, one should try different values of $a$ to make sure the right conclusion has been obtained. However, we found that setting $a=1$ gives adequate results. More details about implementing the approach are discussed in the following section.
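The last two steps of the algorithm, which turn the simulated prior and posterior distances into a relative belief ratio at zero and a strength, can be sketched as below. This is a discretization under stated assumptions rather than the paper’s exact formulas: the first bin is taken to be $[0,\hat{d}_{i_{0}/M})$ with prior content $i_{0}/M$ , the last bin is left open on the right so that posterior draws beyond the largest prior draw are not lost, and the strength sums the posterior content of every bin whose ratio does not exceed the ratio at zero. The function name is hypothetical.

```python
from bisect import bisect_left


def relative_belief_summary(prior_draws, posterior_draws, M=20, i0=1):
    """Estimate RB(0 | x) and its strength from prior and posterior distance samples."""
    prior_sorted = sorted(prior_draws)
    post_sorted = sorted(posterior_draws)
    r = len(prior_sorted)

    def prior_quantile(i):
        # Empirical (i/M)-th prior quantile; the rank ceil(i*r/M) uses integer arithmetic.
        rank = max(1, -(-i * r // M))
        return prior_sorted[rank - 1]

    def post_mass_below(d):
        # Fraction of posterior draws strictly below d.
        return bisect_left(post_sorted, d) / len(post_sorted)

    # Bin edges: 0, then prior quantiles i0/M, ..., (M-1)/M, then an open right end.
    edges = [0.0] + [prior_quantile(i) for i in range(i0, M)] + [float("inf")]
    prior_content = [i0 / M] + [1.0 / M] * (M - i0)
    post_content = [post_mass_below(hi) - post_mass_below(lo)
                    for lo, hi in zip(edges, edges[1:])]
    rb = [q / p for q, p in zip(post_content, prior_content)]

    rb_zero = rb[0]  # equals M * F_hat(d_hat_{p0} | x) when i0 = 1
    strength = sum(q for q, ratio in zip(post_content, rb) if ratio <= rb_zero)
    return rb_zero, strength


# Toy distances: the prior draws spread over (0, 1], the posterior draws pile up near zero.
prior = [i / 1000 for i in range(1, 1001)]
posterior_near_zero = [j / 10000 for j in range(1, 101)]
rb_h0, strength_h0 = relative_belief_summary(prior, posterior_near_zero)

# Posterior draws far from zero, as expected when the null hypothesis fails.
posterior_far = [2.0] * 100
rb_h1, strength_h1 = relative_belief_summary(prior, posterior_far)
```

In the first toy case all posterior mass falls in the first bin, which gives a ratio well above 1; in the second case none does, which gives a ratio of 0, in line with the interpretation that a ratio above 1 is evidence in favor of $\mathcal{H}_{0}$ .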

7 Simulation Studies

This section is divided into two subsections. In the first subsection, the quality of the approach to modeling multivariate distributions is investigated, where different choices of $a$ , $H$ and $R^{\ast}$ are considered. The evaluation relies on the mean of the Energy distance $\overline{d}_{\mathcal{E},N}(F^{\ast},F_{true})$ based on $r$ replications. Note that, from Lemma 4, one may consider using the package energy available in $\mathsf{R}$ to compute the distance. We generated samples, each of size $n=1000$ , from a variety of bivariate distributions. The notation for these distributions is listed in Table 7 in Appendix A. In this study, we set $N=r=1000$ in Algorithm 3 with steps (1)-(6). Note that, for the methodology to work well, we expect $\overline{d}_{\mathcal{E},N}(F^{\ast},F_{true})$ to be close to zero. In the second subsection, the proposed test is illustrated through several examples.

7.1 Checking the Quality of the Posterior-based Model

The performance of the posterior-based model (i.e. the quality of estimating the model) is illustrated by considering the bivariate distributions given in Table 1 with some choices of $a$ . The results are reported based on Kendall’s correlation coefficients. In Table 1, the values of $\overline{d}_{\mathcal{E},N}(F^{\ast},F_{true})$ are close to zero, which indicates that the methodology models bivariate distributions well, particularly when $a=1$ . Note that, as mentioned in Lemma 3, the accuracy of the methodology decreases as the value of $a$ increases. As a further illustration, part (a) of Figure 1 gives the boxplots of the energy distance between $F_{true}=N_{2}(\mathbf{0}_{2},A_{2})$ and its corresponding $F^{\ast}$ for $a=1,\,5,\,10.$ Boxplots of marginal distributions are also given in part (b) of Figure 1. Also, the marginal densities of $N_{2}(\mathbf{0}_{2},A_{2})$ and its $F^{\ast}$ are given in Figure 2. The bivariate scatter plots are shown below the diagonal, histograms on the diagonal and the Kendall correlation above the diagonal. Correlation ellipses and loess smooths (red lines) are also shown.

Next, we inspect the effect of choosing different correlation coefficients, namely the Gaussian rank correlation, Kendall’s $\tau$ and Spearman’s $\rho$ , on the posterior-based model. Consider $(\mathcal{P}_{VII}(1,1,1))^{2}$ and $N_{2}(\mathbf{0}_{2},A_{2})$ as two true distributions (consult Table 7 in Appendix A for the notation). Table 2 reports the results for $a=1$ . Note that the $\mathsf{R}$ package rococo is used to estimate $R^{\ast}$ based on the Gaussian rank correlation coefficients. It follows from Table 2 that the performance of the methodology is approximately the same for the different correlation coefficients.

7.2 Checking MVN Based on the BSPGC Approach

The proposed normality test is illustrated through some examples discussed in Henze and Visagie (2019). Note that $NMIX1$ is a skewed heavy-tailed distribution and $NMIX2$ is a symmetric heavy-tailed distribution. Also, $(\mathcal{P}_{VII}(1,1,r))^{2}$ , for $r\geq 10$ , is a symmetric distribution that behaves very similarly to a bivariate normal distribution. For a given sample of size $n=50$ generated from each distribution in Table 3, the bivariate normality assumption is checked. For all cases, we set $N=r=1000$ and $M=20$ in Algorithm 3. To study the sensitivity of the approach, various values of $a$ are considered. The results of the proposed test are reported in Table 3. The results are also compared to the Energy (E)-test (Székely and Rizzo, 2013). Recall that we want $RB>1$ and the strength close to 1 when $\mathcal{H}_{0}$ is true, and $RB<1$ and the strength close to 0 when $\mathcal{H}_{0}$ is false. It is seen from Table 4 that the proposed test performs very well in accepting or rejecting the bivariate normality assumption. The type I error and the power of the test are also reported in Table 4. They show that the proposed test is powerful in both accepting and rejecting $\mathcal{H}_{0}$ .

The next example uses a real data set.

Real data example (Swiss Heads): In this example, we consider the data of six readings on the dimensions of the heads of 200 twenty-year-old Swiss soldiers given by Flury and Riedwyl (1988). The variables are minimal frontal breadth, breadth of angulus mandibulae, true facial height, length from glabella to apex nasi, length from tragion to nasion, and length from tragion to gnathion. The problem is to assess the six-variate normality assumption for this data set. The E-test’s p-value is $2.2\times 10^{-16}$ , which shows strong evidence to reject the six-variate normality assumption. The proposed test gives $RB=0$ and strength $=0$ based on Kendall’s $\tau$ and $a=1$ , so the methodology also presents strong evidence to reject the six-variate normality assumption.

We end this subsection by investigating the effect of prior-data conflict on the approach. This in fact highlights the effect of the choice of $H$ in the prior-based model. For this, consider the results of the MVN test when $F_{true}=(Exp(0.5))^{2}$ for different choices of $H$ in Table 5. Clearly, when $H=F_{\boldsymbol{\theta}_{\mathbf{x}}}$ the results are correct; otherwise, they are incorrect. Another concern is the effect of the double use of the data that arises from taking $H$ as $F_{\boldsymbol{\theta}_{\mathbf{x}}}$ in the prior distance. In particular, Table 6 gives the mean of the prior distance $\overline{d}_{\mathcal{E},N,a}(F_{N},H)$ for various choices of $H$ . It is clear from this table that the prior distance is invariant with respect to the choice of $H$ .

8 Concluding Remarks

A BSPGC approach and its application to the MVN test have been proposed. In this procedure, a Gaussian copula model has been used to induce the dependence structure of the underlying multivariate distribution $F_{true}$ . The Dirichlet process has then been placed on the unknown margins of $F_{true}$ to define the prior-based and posterior-based models. The test has been developed by using the relative belief ratio to compare, at zero, the concentration of the distribution of the distance between the posterior-based model and the null distribution with the concentration of the distribution of the distance between the prior-based model and the null distribution. The Energy distance has been used to compute distances, as it is an appropriate tool especially in high-dimensional problems. The methodology has been examined in a simulation study that demonstrates its excellent performance. Finally, an application of the test to a real data example has been presented. A main advantage of the procedure is that it takes into account the dependence structure of the data in the MVN test. Extending the procedure to other areas of multivariate data analysis by considering various families of copulas will be part of future research.

Appendix A Relevant Notations

Bibliography

[1] Al-Labadi, L., Baskurt, Z., and Evans, M. (2017). Goodness of fit for the logistic regression model using relative belief. Journal of Statistical Distributions and Applications, 4(1), 1.
[2] Al-Labadi, L., Baskurt, Z., and Evans, M. (2018). Statistical reasoning: choosing and checking the ingredients, inferences based on a measure of statistical evidence with some applications. Entropy, 20(4), 289.
[3] Al-Labadi, L. and Evans, M. (2017). Optimal robustness results for relative belief inferences and the relationship to prior-data conflict. Bayesian Analysis, 12(3), 705–728.
[4] Al-Labadi, L. and Evans, M. (2018). Prior-based model checking. Canadian Journal of Statistics, 46(3), 380–398.
[5] Al-Labadi, L., Fazeli Asl, F., and Saberi, Z. (2019a). A Bayesian nonparametric test for assessing multivariate normality. Technical Report arXiv:1904.02415.
[6] Al-Labadi, L., Patel, V., Vakiloroayaei, K., and Wan, C. (2019b). Kullback-Leibler divergence for Bayesian nonparametric model checking. Technical Report arXiv:1903.00669.
[7] Al-Labadi, L. and Wang, C. (2019). Measuring Bayesian robustness using Rényi’s divergence and relationship with Prior-Data conflict. Technical Report arXiv:1905.05945.
[8] Al-Labadi, L. and Zarepour, M. (2017). Two-sample Kolmogorov-Smirnov test using a Bayesian nonparametric approach. Mathematical Methods of Statistics, 26(3), 212–225.
