Objective Bayesian analysis for the multivariate skew-t model

Antonio Parisi; Brunero Liseo

arXiv:1705.01282·stat.ME·May 4, 2017·Stat. Methods Appl.

TL;DR

This paper introduces a Bayesian approach for the multivariate skew-t model, including a new parameterization, priors, and a sampler, with extensions to regression and frontier models, and provides an R package for implementation.

Contribution

It presents a novel Bayesian framework for the multivariate skew-t model, extending previous skew-normal models and offering practical tools like an R package.

Findings

1. Successful Bayesian inference for multivariate skew-t models

2. Extension to regression and stochastic frontier models demonstrated

3. Implementation via the mvst R package available

Abstract

We perform a Bayesian analysis of the p-variate skew-t model, providing a new parameterization, a set of non-informative priors and a sampler specifically designed to explore the posterior density of the model parameters. Extensions, such as the multivariate regression model with skewed errors and the stochastic frontiers model, are easily accommodated. A novelty introduced in the paper is the extension of the bivariate skew-normal model given in Liseo & Parisi (2013) to a more realistic p-variate skew-t model. We also introduce the R package mvst, which allows users to estimate the multivariate skew-t model.

Tables

Table 1: Models’ posterior probabilities.

Model	$N$	Student-$t$	SN	ST
$\hat{\pi}(M \mid \boldsymbol{y})$	6.60e-14	2.22e-01	3.87e-11	7.78e-01

Equations

$$f_{W}(\boldsymbol{w};\boldsymbol{\alpha},\boldsymbol{\Omega},\nu)=2\,t_{p}(\boldsymbol{w};\nu)\,T_{1}\left(\boldsymbol{\alpha}^{\prime}\boldsymbol{w}\left(\frac{\nu+p}{Q_{\boldsymbol{w}}+\nu}\right)^{1/2};\nu+p\right),$$

$$f_{Y}(\boldsymbol{y};\boldsymbol{\xi},\boldsymbol{\alpha},\Sigma,\nu)=2\,t_{p}(\boldsymbol{y};\nu)\,T_{1}\left(\boldsymbol{\alpha}^{\prime}\boldsymbol{\omega}^{-1}(\boldsymbol{y}-\boldsymbol{\xi})\left(\frac{\nu+p}{Q_{\boldsymbol{y}}+\nu}\right)^{1/2};\nu+p\right),$$

$Q_{\boldsymbol{y}}$

$t_{p}(\boldsymbol{y};\nu)$

$$\boldsymbol{\delta}=\frac{1}{(1+\boldsymbol{\alpha}^{\prime}\Omega\boldsymbol{\alpha})^{1/2}}\,\Omega\boldsymbol{\alpha}$$

$$\binom{Z}{\boldsymbol{X}}\sim N_{p+1}\left[\binom{0}{\boldsymbol{0}},\begin{pmatrix}1&\boldsymbol{\delta}^{\prime}\\ \boldsymbol{\delta}&\Omega\end{pmatrix}\right],$$

$$\boldsymbol{U}=(-1)^{I_{(-\infty,0)}(Z)}\boldsymbol{X},\qquad V\sim\Gamma(\nu/2,\nu/2),$$

$$\boldsymbol{Y}=\boldsymbol{\xi}+\boldsymbol{\omega}\boldsymbol{U}V^{-1/2}\sim ST_{p}(\boldsymbol{\xi},\boldsymbol{\alpha},\Sigma,\nu)$$

$f_{p+2}(\boldsymbol{y},z,v)$

$\boldsymbol{\psi}$

$G$

$L(\boldsymbol{\theta};\boldsymbol{y},\boldsymbol{z},\boldsymbol{v})$

$\hat{\boldsymbol{\psi}}_{CML}$

$\hat{\boldsymbol{\xi}}_{CML}$

$\hat{G}_{CML}$

$$\hat{\varepsilon}_{i}=\boldsymbol{y}_{i}-\hat{\boldsymbol{\xi}}_{CML}-\hat{\boldsymbol{\psi}}_{CML}\frac{|z_{i}|}{\sqrt{v_{i}}}.$$

$$n\log(\hat{\nu}_{CML}/2)-n\psi(\hat{\nu}_{CML}/2)=\sum_{i=1}^{n}v_{i}-\sum_{i=1}^{n}\log(v_{i})-n,$$

$$\pi(\boldsymbol{\theta}^{\star})=\pi(\boldsymbol{\xi})\,\pi(\boldsymbol{\delta},\Sigma)\,\pi(\nu).$$

$$\pi(\boldsymbol{\delta},\Sigma)=\pi(\boldsymbol{\delta}\mid\Sigma)\,\pi(\Sigma)$$

$$\begin{array}{l}\pi(\boldsymbol{\xi})\propto 1\\ \Sigma\sim IW(m,\Lambda)\end{array}$$

$$\boldsymbol{\delta}^{\prime}\Omega^{-1}\boldsymbol{\delta}<1,$$

$$\pi(\boldsymbol{\delta}\mid\Sigma)=\left(\frac{\pi^{p/2}}{\Gamma(p/2+1)}|\Omega|\right)^{-1}.$$

$$|J|=\prod_{j=1}^{p}\left(G_{jj}+\psi_{j}^{2}\right)^{-1/2}.$$

$\pi(\boldsymbol{\theta}_{\star}\mid\boldsymbol{y})$

$$\pi(\boldsymbol{\theta}_{\star}\mid\boldsymbol{y})\leq\bar{\pi}(\boldsymbol{\theta}_{\star}\mid\boldsymbol{y})=\pi(\boldsymbol{\xi})\,\pi(\Sigma)\,\pi(\boldsymbol{\alpha}\mid\Sigma)\,\pi(\nu)\prod_{i=1}^{n}\left[2\,t_{p}(\boldsymbol{y}_{i};\nu)\right].$$

$$\bar{\pi}(\boldsymbol{\xi},\Sigma,\nu\mid\boldsymbol{y})=\pi(\boldsymbol{\xi})\,\pi(\Sigma)\,\pi(\nu)\prod_{i=1}^{n}\left[2\,t_{p}(\boldsymbol{y}_{i};\nu)\right].$$

$\tilde{\zeta}_{j}^{(t)}$

$\zeta_{j}^{(t)}$

$$H^{(t)}=-\sum_{j=1}^{N}\zeta_{j}^{(t)}\log(\zeta_{j}^{(t)}),$$

$$\hat{p}(\boldsymbol{y})\approx\frac{\sum_{t=1}^{T}H^{(t)}\sum_{j=1}^{N}\tilde{\zeta}_{j}^{(t)}}{N\sum_{t=1}^{T}H^{(t)}}.$$

$$\pi(\nu\mid\dots)\propto\frac{(\nu/2)^{n\nu/2}}{(\Gamma(\nu/2))^{n}}\left(\prod_{i=1}^{n}v_{i}\right)^{\frac{\nu}{2}-1}\exp\left\{-\frac{\sum_{i=1}^{n}v_{i}}{2}\,\nu\right\}.$$

$$\pi(v_{i}\mid\dots)=\frac{1}{k_{v_{i}}}\,v_{i}^{C-1}\exp\left\{-A_{i}v_{i}-B_{i}\sqrt{v_{i}}\right\},\qquad v_{i}>0$$

Full text

Objective Bayesian analysis for the multivariate skew-t model

Preprint submitted to Statistical Methods and Applications.

Antonio Parisi

DEF, University of Rome Tor Vergata

Brunero Liseo

MEMOTEF, Sapienza University of Rome

(April 30, 2017)

Keywords: Multivariate skew-t model, Multivariate skew-normal model, Objective Bayes inference, Population Monte Carlo sampler, skewness.

1 Introduction

In the last two decades there has been an explosion of interest in constructing models which generalize the Gaussian distribution in terms of skewness and excess kurtosis. This interest can be partially explained by empirical observations, in different disciplines, of phenomena which cannot be easily represented via Gaussian distributions. See Genton (2004) and Azzalini & Capitanio (2014) for general accounts. In this perspective, several skew-Student $t$ distributions have been proposed, and they now play a prominent role as empirical models for heavy-tailed data, particularly in finance (Rachev et al., 2008).

Among the various proposals we mention the skew-t distribution obtained as a scale mixture of skew-normal densities (Azzalini & Capitanio, 2003); the “two-piece” $t$ distributions of Hansen (1994) and Fernandez & Steel (1999); the skew-t distribution arising from a conditioning argument (Branco & Dey, 2001; Azzalini & Capitanio, 2003); the skew-t distribution of Jones & Faddy (2003), obtained by transforming a beta random variable, and the skew-t distribution arising from a sinh-arcsinh transformation (Rosco et al., 2011). In practice, the most used of these are the Azzalini-type skew-t distribution, in the form arising from scale mixing Azzalini’s skew-normal distribution (Azzalini & Capitanio, 2003) and the “two-piece” $t$ distribution.

In this paper we concentrate on the Azzalini-type skew-t distribution. For a Bayesian analysis of the “two-piece” $t$ distribution, see Rubio et al. (2015) and Leisen et al. (2016), where a new objective prior is introduced for the degrees of freedom parameter.

Following Azzalini & Capitanio (2014), their version of the multivariate skew-t distribution can be obtained as a scale mixture of multivariate skew-normal distributions. Let $W_{0}\sim SN_{p}(\boldsymbol{0},\boldsymbol{\Omega},\boldsymbol{\alpha})$ , where $\boldsymbol{\Omega}$ is the correlation matrix of the multivariate normal density appearing in the density of $W_{0}$ , and $V\sim\Gamma(\nu/2,\nu/2)$ .

Let $W=V^{-\frac{1}{2}}W_{0}$ ; integrating out $V$ , one obtains the density of a $p$ -variate skew-t random vector as

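$$f_{W}(\boldsymbol{w};\boldsymbol{\alpha},\boldsymbol{\Omega},\nu)=2\,t_{p}(\boldsymbol{w};\nu)\,T_{1}\left(\boldsymbol{\alpha}^{\prime}\boldsymbol{w}\left(\frac{\nu+p}{Q_{\boldsymbol{w}}+\nu}\right)^{1/2};\nu+p\right),\tag{1}$$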

where $Q_{\boldsymbol{w}}=\boldsymbol{w}^{\prime}\boldsymbol{\Omega}^{-1}\boldsymbol{w}$ .

The joint estimation of the skewness vector $\alpha$ and the degrees of freedom parameter $\nu$ is hard even in the scalar case. For the symmetric Student’s $t$ distribution, it is known that the likelihood function tends to infinity when $\nu$ goes to zero (Fernandez & Steel, 1999). Fonseca et al. (2008) gave a condition for the existence of the MLE of $\nu$ in that case. For the skew-t distribution, the deviance approach has been implemented in Azzalini & Genton (2008), where the replacement of the MLE of $(\alpha,\nu)$ is based on the null hypothesis $H_{0}:(\alpha,\nu)=(\alpha_{0},\nu_{0})$ and on a $\chi_{2}^{2}$ distribution. However, simulation results have shown that this procedure provides only a partial solution to the problem. Alternatively, the modified score function approach has been applied to the skew-t distribution by Sartori (2006), although no proof of the finiteness of the resulting shape estimator has been provided; besides, this method requires the degrees of freedom parameter $\nu$ to be fixed. Branco et al. (2011) provide an objective Bayesian solution to this problem in the scalar case.

In this paper we propose a method which generalizes both the results in Branco et al. (2011) and Liseo & Parisi (2013). In fact we describe a Bayesian analysis of the $p$ -variate skew-t (ST) model, providing a parameterization, a set of non-informative priors and a sampler specifically designed to explore the posterior density of the parameters of the model. Extensions of the model, such as the multivariate regression model with skewed errors and the stochastic frontiers model, are straightforward.

The main novelty of the present paper is given by the extension of the bivariate skew-normal (SN) model given in Liseo & Parisi (2013) to a more realistic $p$ -variate ST model. Several issues arise in this extension, the most important of which is related to the elicitation of the prior distribution for the shape parameter and the sampling strategy for an additional set of latent variables.

This paper also introduces the R (R Core Team, 2015) package mvst, which is available in the CRAN repository.

Several other packages are available for dealing with skew-symmetric distributions; among others, the R packages sn (Azzalini, 2015), EMMIXuskew (Lee & McLachlan, 2013), mixsmsn (Prates et al., 2013) and the Stata (StataCorp., 2015) suite of commands st0207 by Marchenko & Genton (2010): however, most of them only rely upon the frequentist approach.

The rest of the paper is organized as follows: the second section introduces the model and the notation, along with the complete likelihood function and complete maximum likelihood estimators. It finally provides the prior distributions and the proof that the posterior distribution is proper.

The third section introduces the sampler and describes a set of proposal distributions.

Results from a simulation study are given in section four.

Throughout the paper, we will switch between three different parameterizations, characterized by the sets of parameters $\boldsymbol{\theta}_{\star}$ , $\boldsymbol{\theta}^{\star}$ and $\boldsymbol{\theta}$ ; the first allows us to provide the proofs of our main results, the second is the most sensible for eliciting the prior distributions, while the third is useful for the sampling strategy.

2 The model

The density of the multivariate skew-t random vector has been given in (1). For inferential purposes it is often necessary to introduce location and scale parameters, via the transformation $\boldsymbol{Y}=\boldsymbol{\xi}+\boldsymbol{\omega}\boldsymbol{W}$ . We then say that a random vector $\boldsymbol{Y}$ follows a $p$ -variate skew-t distribution, denoted by $\boldsymbol{Y}\sim ST_{p}(\boldsymbol{\xi},\boldsymbol{\alpha},\Sigma,\nu)$ , if its pdf is given by

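$$f_{Y}(\boldsymbol{y};\boldsymbol{\xi},\boldsymbol{\alpha},\Sigma,\nu)=2\,t_{p}(\boldsymbol{y};\nu)\,T_{1}\left(\boldsymbol{\alpha}^{\prime}\boldsymbol{\omega}^{-1}(\boldsymbol{y}-\boldsymbol{\xi})\left(\frac{\nu+p}{Q_{\boldsymbol{y}}+\nu}\right)^{1/2};\nu+p\right),\tag{2}$$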

where $\boldsymbol{\xi}$ and $\boldsymbol{\alpha}$ are $p$ -dimensional location and shape parameters, $\boldsymbol{\omega}$ is a diagonal matrix with the marginal scale parameters, so that $\Sigma=\boldsymbol{\omega}\Omega\boldsymbol{\omega}$ represents the scale matrix and $\nu$ represents the number of degrees of freedom. Moreover,

[TABLE]

There exists a useful stochastic representation of the random vector $\boldsymbol{Y}$ , which is given in the following proposition.

Proposition 2.0.1

Let

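$$\binom{Z}{\boldsymbol{X}}\sim N_{p+1}\left[\binom{0}{\boldsymbol{0}},\begin{pmatrix}1&\boldsymbol{\delta}^{\prime}\\ \boldsymbol{\delta}&\Omega\end{pmatrix}\right],$$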

and let $I_{A}(\cdot)$ be the indicator function of the set $A$ ; define

$$\boldsymbol{U}=(-1)^{I_{(-\infty,0)}(Z)}\boldsymbol{X}$$

and

$$V\sim\Gamma(\nu/2,\nu/2),$$

with $V$ independent of $U$ . Then, (a) the random vector

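$$\boldsymbol{Y}=\boldsymbol{\xi}+\boldsymbol{\omega}\boldsymbol{U}V^{-1/2}\sim ST_{p}(\boldsymbol{\xi},\boldsymbol{\alpha},\Sigma,\nu)$$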

and (b) the joint density of $(\boldsymbol{Y},Z,V)$ is given by

[TABLE]

Proof: the result is a direct consequence of the definition of the skew-t distribution. Details can be found in Appendix A.
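The stochastic representation in Proposition 2.0.1 translates directly into a simulation recipe. The following is a minimal sketch in plain Python, not the mvst implementation: the function names (`cholesky`, `delta_from_alpha`, `sample_skew_t`) and the argument `scales` (the diagonal of $\boldsymbol{\omega}$) are hypothetical, and $\boldsymbol{\delta}$ is computed from $\boldsymbol{\alpha}$ as $\Omega\boldsymbol{\alpha}/(1+\boldsymbol{\alpha}^{\prime}\Omega\boldsymbol{\alpha})^{1/2}$.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def delta_from_alpha(alpha, omega_corr):
    """delta = Omega alpha / sqrt(1 + alpha' Omega alpha)."""
    p = len(alpha)
    oa = [sum(omega_corr[i][j] * alpha[j] for j in range(p)) for i in range(p)]
    quad = sum(alpha[i] * oa[i] for i in range(p))
    return [x / math.sqrt(1.0 + quad) for x in oa]


def sample_skew_t(n, xi, alpha, omega_corr, scales, nu, rng):
    """Draw n vectors Y = xi + omega * U * V^(-1/2) (hypothetical helper)."""
    p = len(xi)
    delta = delta_from_alpha(alpha, omega_corr)
    # Covariance of (Z, X): [[1, delta'], [delta, Omega]].
    cov = [[1.0] + delta] + [[delta[i]] + list(omega_corr[i]) for i in range(p)]
    low = cholesky(cov)
    draws = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(p + 1)]
        zx = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(p + 1)]
        z, x = zx[0], zx[1:]
        sign = -1.0 if z < 0 else 1.0             # U = (-1)^{I(Z < 0)} X
        v = rng.gammavariate(nu / 2.0, 2.0 / nu)  # shape nu/2, scale 2/nu (rate nu/2)
        draws.append([xi[j] + scales[j] * sign * x[j] / math.sqrt(v)
                      for j in range(p)])
    return draws
```

For example, `sample_skew_t(300, [0.0] * 4, [4.0] * 4, omega, [1.0] * 4, 10.0, random.Random(0))` would mimic the sample size and shape vector used in the simulation study, for a user-supplied 4 x 4 correlation matrix `omega`.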

2.1 Augmented likelihood function

The above stochastic representation suggests expressing the density of a skew-t random vector as the marginal density of the augmented vector given in (2.0.1).

It is useful to define the parameter vectors $\boldsymbol{\theta}^{\star}=(\boldsymbol{\xi},\boldsymbol{\delta},\Sigma,\nu)$ and $\boldsymbol{\theta}=(\boldsymbol{\xi},\boldsymbol{\psi},G,\nu)$ , where

[TABLE]

Using the new parameterization $\boldsymbol{\theta}$ , and in the presence of a sample of $n$ i.i.d. observations $\boldsymbol{y}_{i}$ from a $p$ -dimensional $ST(\boldsymbol{\xi},\boldsymbol{\psi},G,\nu)$ , the augmented likelihood function is

[TABLE]

where $\boldsymbol{z}=(z_{1},\dots,z_{n})^{\prime}$ , $\boldsymbol{v}=(v_{1},\dots,v_{n})^{\prime}$ , $\varepsilon_{i}=\boldsymbol{y}_{i}-\boldsymbol{\xi}-\boldsymbol{\psi}\frac{|z_{i}|}{\sqrt{v_{i}}}$ .

2.1.1 Complete maximum likelihood estimators

The complete maximum likelihood (CML hereafter) estimators are obtained as if we had observed the values of the latent variables $Z_{i}$ ’s and $V_{i}$ ’s. We will make use of the CML estimates for the initialization of the sampling strategy, described below. They incorporate an additional piece of information, hence they could also be useful as a benchmark to evaluate and compare different estimators in a simulation experiment.

Given $\boldsymbol{z}$ and $\boldsymbol{v}$ , the likelihood (2.1) gets transformed into

[TABLE]

After straightforward calculations, the CML estimators are obtained as:

[TABLE]

where

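$$\hat{\varepsilon}_{i}=\boldsymbol{y}_{i}-\hat{\boldsymbol{\xi}}_{CML}-\hat{\boldsymbol{\psi}}_{CML}\frac{|z_{i}|}{\sqrt{v_{i}}}.$$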

The estimator for $\nu$ has no closed-form expression: it is the solution of the following equation

$$n\log(\hat{\nu}_{CML}/2)-n\psi(\hat{\nu}_{CML}/2)=\sum_{i=1}^{n}v_{i}-\sum_{i=1}^{n}\log(v_{i})-n,$$

where $\psi(\cdot)$ denotes the digamma function.
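Since $\log(x)-\psi(x)$ decreases monotonically from $+\infty$ to $0$ on $(0,\infty)$, the equation for $\hat{\nu}_{CML}$ can be solved by simple bisection. The sketch below is one way to do this in plain Python and is not taken from the mvst package: the function names, the search bracket and the tolerance are assumptions, and the digamma function is approximated by the standard recurrence plus asymptotic series because the Python standard library does not provide it.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))
    return result


def cml_nu(v, lo=1e-3, hi=1e4, tol=1e-10):
    """Solve log(nu/2) - digamma(nu/2) = mean(v) - mean(log v) - 1 by bisection.

    The bracket [lo, hi] and the tolerance are illustrative choices.
    """
    n = len(v)
    rhs = sum(v) / n - sum(math.log(x) for x in v) / n - 1.0

    def g(nu):
        return math.log(nu / 2.0) - digamma(nu / 2.0) - rhs

    if g(hi) > 0.0:
        return math.inf  # the latent v's are compatible with nu -> infinity
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:  # g is decreasing, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With latent values drawn as `rng.gammavariate(nu / 2, 2 / nu)` for a known `nu`, the returned value should be close to that `nu` when the sample is large.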

2.2 Prior distributions

We assume the following prior structure for the parameters

$$\pi(\boldsymbol{\theta}^{\star})=\pi(\boldsymbol{\xi})\,\pi(\boldsymbol{\delta},\Sigma)\,\pi(\nu).$$

As pointed out in Liseo & Parisi (2013), when $p>1$ , even following an objective Bayesian approach, $\boldsymbol{\delta}$ and $\Sigma$ cannot be considered a priori independent of each other. This depends on the expression of $G=\boldsymbol{\omega}(\Omega-\boldsymbol{\delta}\boldsymbol{\delta}^{\prime})\boldsymbol{\omega}$ : in order to guarantee the positive definiteness of $G$ , one should consider, both in the analytical expression and in the computations, the constraint $\Omega-\boldsymbol{\delta}\boldsymbol{\delta}^{\prime}\succ 0$ .

We further consider the decomposition

$$\pi(\boldsymbol{\delta},\Sigma)=\pi(\boldsymbol{\delta}\mid\Sigma)\,\pi(\Sigma)$$

and we assume a flat prior for $\boldsymbol{\xi}$ and a conjugate Inverse Wishart prior for $\Sigma$ . This way we adopt the “usual” objective priors for the location and scale parameters as in the multivariate Normal model, which is nested in the multivariate ST model, as $\boldsymbol{\delta}=\boldsymbol{0}$ and $1/\nu\to 0$ . In practice, we set

$$\begin{array}{l}\pi(\boldsymbol{\xi})\propto 1\\ \Sigma\sim IW(m,\Lambda)\end{array}$$

In real applications, we will take $m=0$ and $\Lambda=\boldsymbol{0}$ . In §2.3, we prove that the use of an improper prior on $(\boldsymbol{\xi},\Sigma)$ produces proper posterior distributions, provided that the prior on the degrees of freedom parameter $\nu$ is proper and discrete over $\mathbb{N}$ . Then we assume a uniform prior for $\nu$ over a set of 20 values ranging from 1 to 100.

Finally, we need to specify $\pi(\boldsymbol{\delta}|\Sigma)$ . For each value of $\Sigma$ , the parameter $\boldsymbol{\delta}$ lies in a $p$ -dimensional region whose shape only depends on $\Sigma$ or $\Omega$ . In particular, given the expression of $\delta$ , it is easy to verify that

$$\boldsymbol{\delta}^{\prime}\Omega^{-1}\boldsymbol{\delta}<1,\tag{6}$$

must hold, so the conditional parameter space is an ellipsoid, say $\Delta_{\Sigma}$ , given by expression (6), centered at the origin and contained in the hyper-cube $(-1,1)^{p}$ . In any simulation based approach care must be taken that the proposed values actually satisfy (6). For computational convenience we prefer to directly include this constraint on the prior. In the bivariate case, Liseo & Parisi (2013) used an approximation of the Jeffreys’ prior, normalized over $\Delta_{\Sigma}$ . This normalization step, for large $p$ , may become computationally demanding. For this reason, we propose to adopt a uniform prior over $\Delta_{\Sigma}$ , whose volume can be evaluated in a closed form, so the normalizing constant is analytically tractable. Then we assume: $(\boldsymbol{\delta}|\Sigma)\sim U(\Delta_{\Sigma}),$ that is

$$\pi(\boldsymbol{\delta}\mid\Sigma)=\left(\frac{\pi^{p/2}}{\Gamma(p/2+1)}|\Omega|\right)^{-1}.$$

In the practical application of the ST model, we will use the $\boldsymbol{\theta}$ parameterization for our sampling strategy. Hence, we need to compute the Jacobian of the transformation $\boldsymbol{\theta}^{\star}\rightarrow\boldsymbol{\theta}$ , which is given by

$$|J|=\prod_{j=1}^{p}\left(G_{jj}+\psi_{j}^{2}\right)^{-1/2}.$$
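To make the uniform prior on $\Delta_{\Sigma}$ concrete, the sketch below draws $\boldsymbol{\delta}$ uniformly from the ellipsoid $\boldsymbol{\delta}^{\prime}\Omega^{-1}\boldsymbol{\delta}<1$ and checks the constraint. It is an illustration only, in plain Python: the function names are hypothetical, and the construction (a uniform point in the unit ball mapped through the Cholesky factor of $\Omega$) is a standard device rather than something described in the paper.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_delta_uniform(omega_corr, rng):
    """Uniform draw from {delta : delta' Omega^{-1} delta < 1}."""
    p = len(omega_corr)
    low = cholesky(omega_corr)
    # Uniform point in the unit ball: random direction times radius U^(1/p).
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in g))
    radius = rng.random() ** (1.0 / p)
    u = [radius * x / norm for x in g]
    # delta = L u, so that delta' Omega^{-1} delta = u'u < 1.
    return [sum(low[i][k] * u[k] for k in range(i + 1)) for i in range(p)]


def in_ellipsoid(delta, omega_corr):
    """Check delta' Omega^{-1} delta < 1 via forward substitution with L."""
    low = cholesky(omega_corr)
    w = []
    for i in range(len(delta)):
        w.append((delta[i] - sum(low[i][k] * w[k] for k in range(i))) / low[i][i])
    return sum(x * x for x in w) < 1.0
```

Because $\Omega$ is a correlation matrix, every draw also satisfies $|\delta_{j}|<1$, in line with the statement that the ellipsoid is contained in the hyper-cube $(-1,1)^{p}$. A check such as `in_ellipsoid` is also what a sampler needs in order to assign zero weight to proposed values outside $\Delta_{\Sigma}$.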

2.3 Posterior propriety

Proposition 2.3.1

The posterior distribution of the model is proper.

Proof: Let $\boldsymbol{\theta}_{\star}=(\boldsymbol{\xi},\boldsymbol{\alpha},\Sigma,\nu)$ , using the parameterization in (2),

[TABLE]

Since the c.d.f. $T_{1}(\cdot)$ is bounded by 1, one obtains

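$$\pi(\boldsymbol{\theta}_{\star}\mid\boldsymbol{y})\leq\bar{\pi}(\boldsymbol{\theta}_{\star}\mid\boldsymbol{y})=\pi(\boldsymbol{\xi})\,\pi(\Sigma)\,\pi(\boldsymbol{\alpha}\mid\Sigma)\,\pi(\nu)\prod_{i=1}^{n}\left[2\,t_{p}(\boldsymbol{y}_{i};\nu)\right].$$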

Notice that the parameter $\boldsymbol{\alpha}$ only appears in the prior distribution; then it can be integrated out to obtain

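$$\bar{\pi}(\boldsymbol{\xi},\Sigma,\nu\mid\boldsymbol{y})=\pi(\boldsymbol{\xi})\,\pi(\Sigma)\,\pi(\nu)\prod_{i=1}^{n}\left[2\,t_{p}(\boldsymbol{y}_{i};\nu)\right].$$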

The above expression is proportional to the posterior density of the parameters of a multivariate Student- $t$ model, with priors given as in §2.2. Theorem 1 in Fernandez & Steel (1999) then guarantees that the posterior distribution of our ST model is proper as soon as the prior on $\nu$ is proper and $n\geq p+1$ except, possibly, for a set of Lebesgue measure zero in $\mathbb{R}^{n\times p}$ . The finite precision of the data recording process can lead, under some choices for the prior distributions, to improper posterior distributions. However, it is possible to verify this condition for any given dataset, and we refer to the cited article for details.

3 The sampler

In the following, we describe the sampling strategy. We have used a Population Monte Carlo algorithm (PMC hereafter, see Cappé et al., 2004), which improves and generalizes the one used in Liseo & Parisi (2013) for the bivariate SN model.

As a Monte Carlo method, the PMC sampler doesn’t rely on convergence arguments, hence it can overcome the problem of multimodality of the posterior distribution; moreover, it offers a great flexibility in choosing the proposal density functions. For example, we use (approximations of) the full conditional distributions as proposal densities.

The outline of the algorithm for the ST model is as follows:

• At iteration 0, a population of $N$ particles $\boldsymbol{\eta}^{(0)}_{1:N}$ , containing the values of $\boldsymbol{\theta}^{(0)}_{1:N}$ , $\boldsymbol{z}^{(0)}_{1:N}$ and $\boldsymbol{v}^{(0)}_{1:N}$ , is initialized. A possible initialization is described in §3.1.

• At a generic iteration $t$ :

– new values for the particles are proposed following a proposal distribution $q(\boldsymbol{\eta}^{(t)})$ , whose parameters possibly depend on the populations of particles in the previous iterations;

– the importance weights are computed as

[TABLE]

where $\tilde{\pi}$ and $\tilde{\zeta}$ denote the unnormalized posterior density function and importance weights;

– a set of quantities is obtained on the basis of the current particles and weights. This set includes the estimates of the parameters $\boldsymbol{\eta}^{(t)}$ , a quantity related to the performance of the sampler in the $t$ -th iteration (the entropy of the normalized weights),

$$H^{(t)}=-\sum_{j=1}^{N}\zeta_{j}^{(t)}\log(\zeta_{j}^{(t)}),$$

and all the other objects of interest;

– the particles $\boldsymbol{\eta}^{(t)}_{1:N}$ are multinomially resampled using the weights $\boldsymbol{\zeta}^{(t)}$ .

• After $T$ iterations, the final estimates are obtained as a weighted mean of the estimates $\tilde{\boldsymbol{\eta}}^{(1:T)}$ with (unnormalized) weights given by $H^{(1:T)}$ .

A quantity of special interest which can be easily obtained using the PMC is the marginal likelihood of each model. It can be estimated as

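$$\hat{p}(\boldsymbol{y})\approx\frac{\sum_{t=1}^{T}H^{(t)}\sum_{j=1}^{N}\tilde{\zeta}_{j}^{(t)}}{N\sum_{t=1}^{T}H^{(t)}}.\tag{7}$$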
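The PMC outline and the marginal likelihood estimate can be illustrated on a toy problem. The sketch below is not the sampler for the ST model: it targets a one-dimensional unnormalized density $5\exp\{-(x-3)^{2}/2\}$, whose normalizing constant $5\sqrt{2\pi}$ plays the role of the marginal likelihood, and it uses a Gaussian random-walk proposal centred on each resampled particle. The target, the proposal, the tuning values and all function names are assumptions made for illustration.

```python
import math
import random


def log_target(x):
    """Unnormalised toy target: 5 * exp(-(x - 3)^2 / 2), so Z = 5 * sqrt(2 * pi)."""
    return math.log(5.0) - 0.5 * (x - 3.0) ** 2


def log_normal_pdf(x, mean, sd):
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sd)
            - 0.5 * ((x - mean) / sd) ** 2)


def pmc(n_particles=5000, n_iter=6, proposal_sd=3.0, seed=0):
    rng = random.Random(seed)
    # Iteration 0: initial population (here simply N(0, 2^2) draws).
    particles = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]
    entropies, means, mean_weights = [], [], []
    for _ in range(n_iter):
        # Propose new values, one per current particle.
        proposed = [rng.gauss(x, proposal_sd) for x in particles]
        # Unnormalised importance weights: target / proposal density.
        raw = [math.exp(log_target(y) - log_normal_pdf(y, x, proposal_sd))
               for x, y in zip(particles, proposed)]
        total = sum(raw)
        norm = [w / total for w in raw]
        # Entropy of the normalised weights, H^(t).
        entropies.append(-sum(w * math.log(w) for w in norm if w > 0.0))
        # Iteration-level estimate of the target mean.
        means.append(sum(w * y for w, y in zip(norm, proposed)))
        mean_weights.append(total / n_particles)
        # Multinomial resampling with the normalised weights.
        particles = rng.choices(proposed, weights=norm, k=n_particles)
    h_sum = sum(entropies)
    # Final estimates: averages over iterations weighted by H^(t).
    estimate = sum(h * m for h, m in zip(entropies, means)) / h_sum
    marginal = sum(h * w for h, w in zip(entropies, mean_weights)) / h_sum
    return estimate, marginal
```

Calling `pmc()` returns an estimate of the target mean (true value 3) and of the normalizing constant (true value $5\sqrt{2\pi}\approx 12.53$); the second quantity is computed exactly as in the entropy-weighted formula for $\hat{p}(\boldsymbol{y})$.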

3.1 Initial values for parameters

The initial points are sampled by mimicking the stochastic representation of the model:

1. the values of $\nu^{(0)}_{1:N}$ are sampled from the prior distribution;

2. given $\nu^{(0)}_{1:N}$ , the values of the latent variables $\boldsymbol{z}^{(0)}_{1:N}$ and $\boldsymbol{v}^{(0)}_{1:N}$ are sampled from the respective sampling distributions described in Proposition 2.0.1;

3. given $\nu^{(0)}$ , $\boldsymbol{z}^{(0)}_{1:N}$ and $\boldsymbol{v}^{(0)}_{1:N}$ , the parameters $\boldsymbol{\xi}^{(0)}_{1:N}$ , $\boldsymbol{\psi}^{(0)}_{1:N}$ and $G^{(0)}_{1:N}$ are obtained as the CML estimates of the parameters, as described in §2.1.1.

3.2 Proposals

For the common parameters of SN and ST models, the proposal distributions are similar to those reported in Liseo & Parisi (2013); our versions are given in appendix B. The ST model, however, also includes the parameter $\nu$ and the latent variables $V_{i}$ ’s.

The parameter $\nu$ assumes values on a finite set, hence it is easy to simulate from its full conditional distribution

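$$\pi(\nu\mid\dots)\propto\frac{(\nu/2)^{n\nu/2}}{(\Gamma(\nu/2))^{n}}\left(\prod_{i=1}^{n}v_{i}\right)^{\frac{\nu}{2}-1}\exp\left\{-\frac{\sum_{i=1}^{n}v_{i}}{2}\,\nu\right\}.$$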
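Because $\nu$ lives on a finite set, its full conditional can be evaluated point by point and normalized. The sketch below does this on the log scale to avoid overflow. The grid `NU_GRID` is a hypothetical set of 20 values between 1 and 100 (the text does not list the values actually used), the prior is taken to be uniform on that grid, and the function names are illustrative.

```python
import math
import random

# Hypothetical support for nu: 20 values between 1 and 100 (an assumption).
NU_GRID = [1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100]


def nu_full_conditional(v, grid=NU_GRID):
    """Normalised full conditional of nu on a finite grid, given latent v_1..v_n."""
    n = len(v)
    sum_v = sum(v)
    sum_log_v = sum(math.log(x) for x in v)
    log_w = []
    for nu in grid:
        h = nu / 2.0
        # log of (nu/2)^(n nu/2) / Gamma(nu/2)^n * (prod v)^(nu/2 - 1) * exp(-nu sum(v)/2)
        log_w.append(n * h * math.log(h) - n * math.lgamma(h)
                     + (h - 1.0) * sum_log_v - h * sum_v)
    top = max(log_w)  # subtract the maximum before exponentiating
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def sample_nu(v, rng, grid=NU_GRID):
    """Draw one value of nu from its full conditional."""
    probs = nu_full_conditional(v, grid)
    return rng.choices(grid, weights=probs, k=1)[0]
```

With a large number of latent values generated under a given $\nu$ on the grid, most of the probability mass should fall on that value.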

Instead, to our knowledge, there is no simple way to draw values from the full conditional distribution of $V_{i}$ , which is given by

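$$\pi(v_{i}\mid\dots)=\frac{1}{k_{v_{i}}}\,v_{i}^{C-1}\exp\left\{-A_{i}v_{i}-B_{i}\sqrt{v_{i}}\right\},\qquad v_{i}>0$$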

where

[TABLE]

and $k_{v_{i}}$ is the normalizing constant.

When $B_{i}=0$ (for example in the symmetric case, where $\boldsymbol{\psi}$ is a null vector), the full conditional for $v_{i}$ is a Gamma distribution. Otherwise, the sign of $B_{i}$ determines the right tail behaviour: since $B_{i}$ enters the exponent through the term $-B_{i}\sqrt{v_{i}}$ , when $B_{i}$ is negative (positive) the right tail of the full conditional distribution is thicker (lighter) than the right tail of a Gamma distribution.

Hence, we cannot propose values from a Gamma distribution, as it could jeopardize the validity of the method when $B_{i}<0$ . On the other hand, proposing from a distribution with a thick tail could represent a huge loss in the efficiency of the sampler. For these reasons, we propose values using a rejection sampler (see, for example, Robert & Casella, 2004, §2.3) having the full conditional distribution as target density. We will

1. define the distribution of the instrumental variable of the rejection sampler;

2. choose the parameters of this distribution by minimizing the Kullback-Leibler divergence with respect to the target distribution;

3. obtain the constant $M$ required by the rejection sampling algorithm;

4. obtain the normalizing constant $k_{v_{i}}$ , required by the PMC algorithm.

Details are as follows:

1. Define $W=R^{2}$ , with $R\sim\Gamma(\alpha_{v},\beta_{v})$ ; the instrumental density function is

[TABLE]

This density has a right tail which is thicker than that of the target distribution.

2. If we set $\alpha^{\star}_{v}=2C$ (see Appendix C), the $KL(f||\pi_{v_{i}})$ divergence, as a function of $\beta_{v}$ , has a minimum (in $\mathbb{R}^{+}$ ) at

[TABLE]

Using the parameters $\alpha_{v}^{\star}$ and $\beta_{v}^{\star}$ optimises the efficiency of the rejection sampler.

3. The rejection sampling algorithm requires a constant $M$ for which

[TABLE]

The value of $M$ can be found by defining the ratio $m(v_{i})=\pi(v_{i}|\cdots)/f(v_{i})$ ; given the parameters of the instrumental density, this function has a maximum at

[TABLE]

The value of $M$ is then obtained as $m(v_{i}^{\star})$ .

4. To obtain the value of

[TABLE]

we use eq. 3.462 (1) in Gradshteyn & Ryzhik (1994, GR hereafter), with $\nu=2C>0$ , $\beta=A_{i}>0$ , $\gamma=B_{i}$ ,

[TABLE]

where $D_{p}(z)$ is the parabolic cylinder function (GR, eq. 9.240) with $p=-2C$ and $z=B_{i}/\sqrt{2A_{i}}$ , hence

[TABLE]

where $\Upsilon(\alpha,\gamma;z)$ denotes the confluent hypergeometric function (GR, eq. 9.210).
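The rejection step for $V_{i}$ can be sketched as follows, under explicit assumptions. The full conditional is taken to be proportional to $v^{C-1}\exp\{-Av-B\sqrt{v}\}$ with $A>0$ and $C>0$, the form to which GR eq. 3.462 (1) applies after the substitution $v=x^{2}$. With $\alpha_{v}=2C$, working in $s=\sqrt{v}$, the ratio of target to instrumental density is proportional to $\exp\{-As^{2}+(\beta_{v}-B)s\}$, so the acceptance probability reduces to $\exp\{-A(s-s^{\star})^{2}\}$ with $s^{\star}=(\beta_{v}-B)/(2A)$. The rate used below is the positive root of $\beta^{2}-B\beta-2A(2C+1)=0$, obtained by differentiating the KL divergence under these assumptions; it should be checked against the expression in the paper, which is not reproduced in this text. Function names are hypothetical.

```python
import math
import random


def kl_optimal_rate(a, b, c):
    """Rate minimising KL(f || pi) for alpha_v = 2C (derived under the stated assumptions)."""
    return 0.5 * (b + math.sqrt(b * b + 8.0 * a * (2.0 * c + 1.0)))


def sample_v(a, b, c, rng):
    """Draw from the density proportional to v^(C-1) exp(-A v - B sqrt(v)), v > 0."""
    beta = kl_optimal_rate(a, b, c)
    # Maximiser of target / instrumental in s = sqrt(v); beta > b, so s_star > 0.
    s_star = (beta - b) / (2.0 * a)
    while True:
        s = rng.gammavariate(2.0 * c, 1.0 / beta)  # R ~ Gamma(shape 2C, rate beta)
        # Accept with probability target / (M * instrumental) = exp(-A (s - s_star)^2).
        if rng.random() < math.exp(-a * (s - s_star) ** 2):
            return s * s  # W = R^2
```

The same function works for either sign of $B$, because the instrumental density of $W=R^{2}$ decays like $\exp\{-\beta_{v}\sqrt{w}\}$ and therefore always dominates the target in the right tail.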

4 Simulation study

In this section we use simulated data to evaluate the performance of the proposed approach. Since the multivariate ST model may be considered an encompassing model including, as special cases, the multivariate Student- $t$ model, the multivariate SN model and the multivariate normal one, it is of primary importance to verify the ability of the proposed approach to discriminate among these nested models.

For each of the four models, we have generated 50 samples; for each sample, we compute the posterior probabilities of each candidate model. These posterior probabilities are estimated using (7) together with a uniform prior over the model space.
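Under a uniform prior over the model space, the posterior probability of each model is its estimated marginal likelihood divided by the sum over all candidate models. A small sketch of this normalization, working with log marginal likelihoods for numerical stability, is given below; the function name and the numerical values in the usage note are hypothetical and are not results from the paper.

```python
import math


def posterior_model_probs(log_marginals):
    """Posterior model probabilities under a uniform prior over models.

    log_marginals maps a model label to its estimated log marginal likelihood.
    """
    top = max(log_marginals.values())  # subtract the maximum to avoid underflow
    weights = {m: math.exp(v - top) for m, v in log_marginals.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}
```

For instance, `posterior_model_probs({"N": -1030.0, "t": -1001.25, "SN": -1023.0, "ST": -1000.0})` returns four probabilities that sum to one, with the largest assigned to the model with the largest log marginal likelihood.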

In our simulations, each sample consists of $n=300$ observations with $p=4$ and

[TABLE]

Samples from the SN and ST models have been generated using $\boldsymbol{\alpha}=(4,4,4,4)^{\prime}$ . Data generated from the Student- $t$ and ST models have $\nu=10$ .

For each sample we have run the PMC algorithm using 20000 particles for each of 6 iterations. Results are summarized in the following four plots.

The barplot in Fig. 1 depicts the results for the Normally distributed samples. Each column stacks the posterior probabilities of the 4 candidate models estimated for a single sample. To improve the readability of the plot, bars have been rearranged in order to have a decreasing probability for the true model. Here, the true model is correctly identified in 47 cases; in the remaining cases (3 out of 50), the Student- $t$ model is preferred. The posterior probabilities for the remaining models are always very low.

The situation is even more extreme when data come from a Student- $t$ distribution (Fig. 2): here the true model is always correctly identified, with small to negligible probabilities for the other models.

The worst performance of our approach occurs when the data are generated from an SN distribution. Fig. 3 shows that the procedure detects the correct SN model in about 25% of the cases, and that it more often prefers the multivariate normal model: this can be justified by the fact that the multivariate SN model is notoriously the most difficult to deal with, because of the multimodality phenomenon described in Liseo & Parisi (2013).

Also in the ST case (Fig. 4), the true model has been correctly identified in 44 cases. In almost all the other cases, the SN model has been preferred.

4.1 The mvst package

The simulation results have been obtained in R, using the package mvst. It contains functions to estimate the parameters of the ST (and nested) models, and to simulate data from them. It uses the model and the proposals described above, although it also allows users to define customized prior and proposal distributions.

It makes use of the GNU Scientific Library (see Gough, 2009) to speed up the heaviest parts of the code and, in particular, for the computation of (8). Besides, it requires three R packages: mvtnorm (Genz et al., 2015), MCMCpack (Martin et al., 2011) and mnormt (Azzalini & Genz, 2016). It also makes use of three scripts available in the RcppGSL package (Eddelbuettel & Romain, 2015).

5 A real dataset

As a final illustration of the proposed algorithm, we consider the wine data of the Grignolino cultivar, used in §6.2.6 of Azzalini & Capitanio (2014). The dataset contains 71 observations on 3 variables (chloride, glycerol and magnesium). Data are available in the sn package.

We have run the PMC sampler for 6 iterations, with 20000 particles each. The posterior probabilities for the four models are given in Table 1. Models with light tails have negligible probabilities, while the preferred model is the skew-t.

Given this model, the posterior mean for $\nu$ is approximately equal to 3.22, while the ML estimate in Azzalini & Capitanio (2014) is equal to 3.4.

Appendix A Proof of Proposition 2.0.1

(a): From one of the possible definitions of a multivariate ST r.v., it is known that $\boldsymbol{U}\sim SN_{p}(\boldsymbol{0},\boldsymbol{\alpha},\boldsymbol{\Omega},\nu)$ ; since $\boldsymbol{Y}$ is a simple transformation of $\boldsymbol{U}$ , its distribution is readily obtained.

(b): Start from $f(y,z,v)=f(v)f(z)f(y\mid z,v)$ . By assumption, $f(z)$ is a standard Gaussian density, and

[TABLE]

Then, by using simple results on conditional Gaussian densities, one gets

[TABLE]

Hence the result in (2.0.1).

Appendix B Proposal distributions

We use the full conditional distributions as proposals for the latent variables $\boldsymbol{Z}$ and $\boldsymbol{\xi}$ : each $Z_{i}$ has the following full conditional distribution

[TABLE]

where

[TABLE]

The variables $Z_{i}$ can be drawn as the product of $Z^{+}_{i}$ , a normal r.v. with parameters $m_{i}$ and $v_{\theta}$ truncated at 0, and the sign $S_{i}$ , uniform on $\{-1,1\}$ . To generate values of $Z^{+}$ a rejection sampler has been employed (see Robert, 1995).
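A draw of $Z_{i}$ along these lines can be sketched as follows. The truncated normal step uses the translated-exponential rejection sampler of Robert (1995) when the standardized truncation point is positive, and plain rejection from the normal otherwise. The sketch assumes that $v_{\theta}$ is a variance and that $Z^{+}_{i}$ is restricted to the positive half-line; the function names are hypothetical.

```python
import math
import random


def sample_lower_truncated_std_normal(a, rng):
    """Draw from N(0, 1) restricted to [a, infinity)."""
    if a <= 0.0:
        # At least half of the mass is kept, so naive rejection is cheap.
        while True:
            x = rng.gauss(0.0, 1.0)
            if x >= a:
                return x
    # Translated exponential proposal with the optimal rate of Robert (1995).
    rate = 0.5 * (a + math.sqrt(a * a + 4.0))
    while True:
        x = a + rng.expovariate(rate)
        if rng.random() <= math.exp(-0.5 * (x - rate) ** 2):
            return x


def sample_z(m, var, rng):
    """Z = S * Z_plus, with Z_plus ~ N(m, var) truncated to (0, infinity)."""
    sd = math.sqrt(var)
    z_plus = m + sd * sample_lower_truncated_std_normal(-m / sd, rng)
    sign = 1.0 if rng.random() < 0.5 else -1.0  # S uniform on {-1, 1}
    return sign * z_plus
```

The exponential branch matters when $m_{i}$ is well below zero, where naive rejection from the untruncated normal would discard most proposals.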

The parameter $\boldsymbol{\xi}$ has the following full conditional density:

[TABLE]

The parameters $\boldsymbol{\psi}$ and $G$ have intractable full conditional distributions. To obtain a proposal distribution, they are approximated using only the contribution of the likelihood to the full conditional density.

The parameter $\boldsymbol{\psi}$ has the following full conditional distribution

[TABLE]

where $\mathbbm{1}_{x}(\cdot)$ denotes the indicator function. By ignoring the first two factors, we obtain the following proposal distribution

[TABLE]

The proposal distribution has a positive density on $\mathbb{R}^{p}$ , while the full conditional is bounded on $\Delta_{\Sigma}$ . This feature improves the ability of the sampler to explore the parameter space; moreover, particles which don’t respect the constraint (6) will be automatically discarded, as they have null prior (and posterior) probability density, hence a null importance weight.

The parameter $G$ has the following full conditional density

[TABLE]

Ignoring the prior term we obtain

[TABLE]

Appendix C Details about the Rejection Sampler

For a generic latent variable $V_{i}$ , the Kullback-Leibler divergence $KL(f||\pi_{v})$ is given by

[TABLE]

which has an analytical solution for $\alpha_{v}^{\star}=2C$ :

[TABLE]

This divergence always has exactly one minimum in $\mathbb{R}^{+}$ , given by

[TABLE]

Bibliography

1. Azzalini & Capitanio (2014): Azzalini, A. (2014). The Skew-Normal and related families (with the collaboration of A. Capitanio). Cambridge: Cambridge University Press.
2. Azzalini (2015): Azzalini, A. (2015). The R package sn: The Skew-Normal and Skew-t distributions (version 1.3-0). Università di Padova, Italia.
3. Azzalini & Capitanio (2003): Azzalini, A. & Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. Journal of the Royal Statistical Society, B, 65, 367–389.
4. Azzalini & Genton (2008): Azzalini, A. & Genton, M. (2008). Robust likelihood methods based on the skew-t and related distributions. International Statistical Review, 76, 106–119.
5. Azzalini & Genz (2016): Azzalini, A. & Genz, A. (2016). The R package mnormt: The multivariate normal and t distributions (version 1.5-4). http://azzalini.stat.unipd.it/SW/Pkg-mnormt
6. Branco & Dey (2001): Branco, M. D. & Dey, D. (2001). A general class of multivariate skew-elliptical distributions. Journal of Multivariate Analysis, 77, 1–15.
7. Branco et al. (2011): Branco, M. D., Genton, M. G. & Liseo, B. (2011). Objective Bayesian Analysis of Skew-t Distributions. Scandinavian Journal of Statistics, 40 (1), 63–85.
8. Cappé et al. (2004): Cappé, O., Guillin, A., Marin, J. M. & Robert, C. P. (2004). Population Monte Carlo. J. Comput. Graph. Statist., 13, 907–929.