Testing for normality in any dimension based on a partial differential equation involving the moment generating function
Norbert Henze, Jaco Visagie

TL;DR
This paper introduces a new class of affine invariant tests for multivariate normality based on a PDE involving the moment generating function, demonstrating strong power and consistency.
Contribution
It develops a novel testing method using PDEs for the moment generating function, applicable in any dimension, with proven null distribution and power against alternatives.
Findings
Tests are affine invariant and consistent.
Strong power against heavy-tailed distributions.
Connection with multivariate skewness measures.
Abstract
We use a system of first-order partial differential equations that characterize the moment generating function of the -variate standard normal distribution to construct a class of affine invariant tests for normality in any dimension. We derive the limit null distribution of the resulting test statistics, and we prove consistency of the tests against general alternatives. In the case , a certain limit of these tests is connected with two measures of multivariate skewness. The new tests show strong power performance when compared to well-known competitors, especially against heavy-tailed distributions, and they are illustrated by means of a real data set.
| 2.5 | 3 | 4 | 5 | 7 | |
|---|---|---|---|---|---|
| 2.6013 | 0.7787 | 0.2022 | 0.0861 | 0.0277 | |
| 4.7153 | 0.5430 | 0.0458 | 0.0094 | 0.0011 |
| 20 | |||||||
|---|---|---|---|---|---|---|---|
| 50 | |||||||
| 100 | |||||||
| 200 | |||||||
| 300 | |||||||
| 20 | |||||||
| 50 | |||||||
| 100 | |||||||
| 200 | |||||||
| 300 | |||||||
| 20 | |||||||
| 50 | |||||||
| 100 | |||||||
| 200 | |||||||
| 300 | |||||||
| 20 | |||||||
| 50 | |||||||
| 100 | |||||||
| 200 | |||||||
| 300 |
| NMIX1 | ||||||||||
| NMIX2 | ||||||||||
| NMIX1 | ||||||||||
| NMIX2 | ||||||||||
| NMIX1 | ||||||||||
| NMIX2 | ||||||||||
| NMIX1 | ||||||||||
| NMIX2 | ||||||||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Testing for normality in any dimension based on a partial differential equation involving the moment generating function
Norbert Henze1, Jaco Visagie2
*1**Institute of Stochastics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
*2**Department of Statistics, University of Pretoria, South Africa
**Abstract. We use a system of first-order partial differential equations that characterize the moment generating function of the -variate standard normal distribution to construct a class of affine invariant tests for normality in any dimension. We derive the limit null distribution of the resulting test statistics, and we prove consistency of the tests against general alternatives. In the case , a certain limit of these tests is connected with two measures of multivariate skewness. The new tests show strong power performance when compared to well-known competitors, especially against heavy-tailed distributions, and they are illustrated by means of a real data set.
Keywords. Moment generating function; test for multivariate normality; direct sum of Hilbert spaces; multivariate skewness; weighted -statistic
2010 AMS classification: 62H15, 62G20.**
1 Introduction
It is often of interest to check whether an observed -dimensional dataset is compatible with the assumption of coming from a multivariate normal distribution. Such a model check is of practical interest to researchers, as many multivariate techniques rely on the assumption of multivariate normality (for short: multinormality). As a consequence, it is not surprising that there is ongoing interest in testing for multinormality, as can be witnessed by various recent papers on the subject, see for example [28], [33], [10] and [23] as well as the references therein. Research into the practical implementation of these tests is also undertaken regularly, see, e.g., [25] and [23] regarding the implementation in the statistical software package R.
The purpose of this paper is not to review the multitude of tests that hitherto has been proposed (for an account of the state of the art regarding affine invariant procedures before 2002, see [16]), but to introduce and study a new class of tests that is based on a certain partial differential equation. To be specific, let be independent and identically distributed (i.i.d.) copies of a -dimensional random (column) vector , the distribution of which is assumed to be absolutely continuous with respect to -dimensional Lebesgue measure. All random vectors are defined on a common probability space .
Writing N for the -dimensional normal distribution with mean vector and non-degenerate covariance matrix and for the class of all non-degenerate -variate normal distributions, a classical problem is to test the null hypothesis
[TABLE]
against general alternatives. Since the class is closed with respect to full rank affine transformations, any genuine test statistic based on should also be invariant with respect to such transformations, i.e., we should have for each nonsingular ()-matrix and each , see [16] for an account of the importance of affine invariance in connection with testing for multivariate normality. Writing for the sample mean and for the sample covariance matrix of , where the superscript denotes the transpose of vectors and matrices, a necessary and sufficient condition for a test statistic to be affine invariant is that it is based on the scalar products
[TABLE]
where , , are the so-called scaled residuals of , see [16]. Here, denotes the unique symmetric square root of which, due to the absolute continuity of the distribution of , exists with probability one if , see [9]. The latter condition is tacitly assumed to hold in what follows.
The novel idea for constructing a test of is the following: Suppose is a -dimensional random vector, and assume that the moment generating function (MGF) exists for each and satisfies the system of partial differential equations
[TABLE]
Writing for the gradient of a function at , (1.2) may be succinctly written as , . Taking into account that , the unique solution of this equation in the case is , which is the MGF of the standard normal distribution. If , and we fix , the solution of (1.2) for is
[TABLE]
for some function . We thus have , , which shows that is differentiable. Moreover,
[TABLE]
Using (1.2) with then gives
[TABLE]
the solution of which is for some function . Inserting this expression into (1.3) and continuing this way, we finally obtain
[TABLE]
where denotes the Euclidean norm in . Notice that this unique solution of (1.2) is the MGF of the standard normal distribution N, where is the unit matrix of order .
If has some non-degenerate normal distribution, the scaled residuals should be approximately independent, with a distribution close to N, at least for large . Hence, a natural approach for testing is to consider the empirical MGF
[TABLE]
of , and to employ the weighted -statistic
[TABLE]
where
[TABLE]
Rejection of is for large values of . The role of will be discussed later.
Using the relations
[TABLE]
and putting , the test statistic defined in (1.5) takes the form
[TABLE]
which is amenable to computational purposes. Notice that is affine invariant.
The remainder of this paper is structured as follows: Section 2 deals with the convergence in distribution of under , and Section 3 is devoted to the problem of consistency of the new test (which seems to be new even in the univariate case). In Section 4 we let tend to infinity while keeping fixed. Under this setting, converges to a certain linear combination of two well-known measures of multivariate skewness. In Section 5 we compare the finite-sample power behavior of the new test to that of several classical and recent tests for both univariate and multivariate normality. Section 6 illustrates the procedures by means of a real data set. Section 7 presents some conclusions, while Section 8 contains several technical proofs which do not form part of the main text.
2 The limit null distribution of
In this section, we derive the limit distribution of the test statistic defined in (1.5) under the null hypothesis (1.1). In view of affine invariance, we will assume and . Put
[TABLE]
and let denote the separable Hilbert space of (equivalence classes of) measurable functions that are square integrable with respect to the finite measure on the -field of Borel sets of , given by the weight function . The inner product and the resulting norm on will be denoted by
[TABLE]
respectively. Since in (2.1) is -valued, we consider the Hilbert space , which is the -fold (orthogonal) direct sum , consisting of all ordered -tuples , equipped with the inner product
[TABLE]
where , . Notice that the norm on satisfies
[TABLE]
Since is a random element of and , the aim is to prove for some centred Gaussian random element of . By the Continuous Mapping Theorem (CMT), we would then have . Here and in the sequel, denotes convergence in distribution of random elements (especially: of random variables). Moreover, refers to convergence in probability to zero of random elements . The main result of this section is as follows.
Theorem 2.1
(Convergence of )*
Suppose that has some non-degenerate -variate normal distribution. If the weight function in (1.6) satisfies , there is some centred Gaussian random element of having covariance (matrix) kernel K(s,t)=\mathbb{E}\big{[}W(s)W(t)^{\top}\big{]}, where*
[TABLE]
, so that as .
Corollary 2.2
The limit distribution of as under is that of
[TABLE]
Proof of Theorem 2.1. To highlight the main idea of the proof, let
[TABLE]
and put
[TABLE]
Notice that are i.i.d. (centred) random elements of . Moreover, the condition implies . By a Hilbert space central limit theorem (CLT), see e.g. [26], there is some centred Gaussian random element of such that . The idea now is to approximate by a suitable random element of so that as , and , where are i.i.d. centred random elements of satisfying . The assertion would then follow from the CLT in Hilbert spaces and Slutzky’s lemma. To this end, put
[TABLE]
A Taylor expansion gives
[TABLE]
where . By some algebra, it follows that
[TABLE]
where
[TABLE]
Notice that the first term on the right-hand side of (2.6) is , as given in (2.3). By Proposition 8.1, we have , and Proposition 8.2 yields
[TABLE]
In view of Proposition 8.3, (2.6) implies , where
[TABLE]
A straightforward calculation shows , . Moreover, due to the condition , we have . Hence, are i.i.d. centred random elements of , and the CLT in Hilbert spaces yields the assertion, since the kernel figuring in (2.2) is given by (for details in computing , see Proposition 8.4).
The following result provides some information on the limit null distribution of .
Theorem 2.3
Let be a random variable with the distribution of , where is given in Theorem 2.1. We then have
[TABLE]
Proof. By Fubini’s theorem, it follows that
[TABLE]
Writing for the trace of a square matrix , we have
[TABLE]
Now, some straightforward algebra yields the assertion.
In the univariate case , we also obtained an explicit expression for the variance of , by using the relation
[TABLE]
By tedious calculations, it follows that
[TABLE]
where , and .
Table 1 contains expectation and variance of in the univariate case for various values of .
3 Consistency
In this section, let have an absolutely continuous distribution, and suppose that exists for each . In view of affine invariance, let w.l.o.g. and .
Theorem 3.1
We have
[TABLE]
Proof. Fix , and put . Let be the Banach space of continuous functions , equipped with the norm . Recall from (1.4), and put
[TABLE]
Let , where is given in (2.4). From (2.4) we obtain
[TABLE]
Since the existence of implies , Theorem 5.2 of [3] yields
[TABLE]
From and Kolmogorov’s variance criterion for averages (see [24], p. 73), we have for each . Hence Proposition 8.5 implies
[TABLE]
Using the notation also for a function , (2.5) gives
[TABLE]
By the strong law of large numbers (SLLN) in , the first factor on the right-hand side converges almost surely to , and thus (3.4) entails
[TABLE]
Putting
[TABLE]
the triangle inequality gives
[TABLE]
Invoking (3.5), (3.2), (3.3) and Proposition 8.5, we have
[TABLE]
Writing id for the identity function on , the triangle inequality yields
[TABLE]
From (3.6), (3.8) and the SLLN in it follows that
[TABLE]
Likewise, we have
[TABLE]
and thus
[TABLE]
Upon combining (3.9) and (3.10), and using Fatou’s lemma, we obtain
[TABLE]
Since was arbitrary, the assertion follows.
Remark 3.2
If has some non-degenerate non-normal distribution with existing moment generating function, then for at least one . Consequently, the right-hand side of (3.1) is strictly positive, and thus
[TABLE]
Hence, the test is consistent against each such alternative. We conjecture that (3.11) holds for any non-normal alternative distribution. A proof of such a result, however, remains an open problem.
4 The limit
In this section, we will show that, for fixed , the statistic , after a suitable scaling, approaches a linear combination of two well-known measures of multivariate skewness as .
Theorem 4.1
We have (pointwise on the underlying probability space)
[TABLE]
where
[TABLE]
is nonnegative invariant sample skewness in the sense of [27], and
[TABLE]
denotes sample skewness in the sense of Móri, Rohatgi and Székely, as defined in [29].
Proof. We start with (1.7) and use
[TABLE]
as . Multiplying this expression with the term within the rightmost bracket of (1.7), and using the relations , , ,
[TABLE]
as well as
[TABLE]
the result follows by tedious but straightforward algebra.
Remark 4.2
It is interesting to note a similarity between Theorem 4.1 and Theorem 2.1 of [15], who showed that the BHEP-statistic for testing for multivariate normality, after suitable rescaling, approaches the linear combination as a smoothing parameter (called in that paper) tends to [math]. Since and are related by , this corresponds to letting tend to infinity. The same linear combination also showed up as a limit statistic in [18]. Notice that, in the univariate case, the limit statistic figuring in Theorem 4.1 is nothing but three times squared sample skewness. It should be stressed that tests for multivariate normality based on or (or on related measures of multivariate skewness and kurtosis) lack consistency against general alternatives, see, e.g., [4], [5], [13], and [14].
5 Monte Carlo results
In this section we compare the finite-sample power performance of the newly proposed test to those of several competing tests for normality, both for the univariate and the multivariate case. In the case the competing procedures are
- a)
the Cramér-von Mises () test, 2. b)
the Anderson-Darling () test, 3. c)
the Shapiro-Wilk () test, 4. d)
the Jarque-Bera () test, 5. e)
the Zghoul () test.
The R package contains the functions cvm.test and ad.test, which can be used in order to calculate the test statistic and the associated p-value for each of the first two tests mentioned above, see [12]. The Shapiro-Wilk test can be performed using the function included in the package. The R package contains a function called , which can be used in order to calculate the test statistic and p-value associated with the fourth test mentioned above, see [34]. Each of these tests are well-known and will not be discussed further.
The test of Zghoul (see [36]) is based on the empirical moment generating function. [36] includes a Monte Carlo study which indicates that the finite-sample power performance of the test compares favorably to that of its competitors, especially against symmetric alternatives with kurtosis higher than that of the normal distribution. However, the mentioned paper fails to provide the mathematical theory underlying this test. [19] more recently provided this theory, including a proof that the test is consistent against general alternatives.
The test statistic of Zghoul is a weighted -statistic, given by
[TABLE]
where is a smoothing parameter. Based on the finite-sample performance reported in [36], the author recommended setting equal to either or . The numerical results presented below include the powers obtained using both of these values for the smoothing parameter; the resulting tests are denoted by and respectively. The test statistic can be rewritten in the computationally amenable form
[TABLE]
This test rejects normality for large values of the test statistic.
Turning our attention to the multivariate case, we consider the finite-sample power performance of the newly proposed test to those of some prominent competing tests, and to two very recent tests for multinormality. These procedures are
- a)
Mardia’s tests based on skewness and kurtosis, 2. b)
the Henze-Zirkler test, 3. c)
the energy test of Székely and Rizzo, 4. d)
a recent test of Henze, Jiménez-Gamero and Meintanis, 5. e)
a recent test of Henze and Jiménez-Gamero.
a): **Mardia’s tests based on skewness and kurtosis
** Mardia’s test for multinormality based on sample skewness rejects for large values of , where is given in (4.1). Notice that is a consistent estimator of . Under normality we have , and the limit distribution of as is , see [27]. The limit distribution of under elliptical symmetriy has been derived by [5].
Sample kurtosis in the sense of Mardia is given by
[TABLE]
which is an estimator of .
Under normality, we have , and the limit null distribution of is the normal distribution , see [27]. The test based on rejects for large and small values of .
The R package contains a function , which calculates both Mardia’s skewness and kurtosis measures as well as the p-values associated with the corresponding tests from multivariate normality, see [11]. Below we denote the tests based on Mardia’s skewness and kurtosis by and , respectively.
We stress that there are several other measures of skewness and kurtosis, see Sections 3 and 4 of [16]. The deficiences of such measures as statistics for purportly “directed tests” for multivariate normality have been addressed in [4], [5], [13] as well as [14].
b): **The Henze-Zirkler test
** Writing for the empirical characteristic function of the scaled residuals , [21] proposed the test statistic
[TABLE]
for some fixed . The test statistic can be rewritten as
[TABLE]
The Henze-Zirkler test is obtained by setting
[TABLE]
This choice of corresponds to the optimal bandwidth for a multivariate nonparametric density estimator with a Gaussian kernel. The Henze-Zirkler test is included because of its impressive power performance reported in previous studies, see, e.g., [28].
The Henze-Zirkler test (denoted below) is programmed in the function in the R package , see [25]. This test rejects normality for large values of the test statistic. is equivalent to a test by [7], as remarked in [16].
c): **The energy test
** [33] proposed the test statistic
[TABLE]
Here, the first expectation is with respect to a random vector , which is independent of and has the distribution , and
[TABLE]
This test, denoted by , is known as the energy test. Rejection of is for large values of . The function in the R package calculates this test statistic as well as the corresponding p-value, see [31]. The energy test is also reported to have excellent power performance, see [23].
d): **The Henze–Jiménez-Gamero–Meintanis test
** By generalizing a characterization of normality involving both the characteristic and the moment generating function (see [35]) to the multivariate case, [18] proposed to base a test of on the weighted -statistic
[TABLE]
for some parameter . Putting , can be rewritten as
[TABLE]
The test based on uses an upper rejection region.
A disadvantage of this test is the need to calculate a four-fold sum in order to evaluate the test statistic. Hence, the amount of computer time required in order to perform this test is substantially more than the time required for the other tests under consideration.
e): **The Henze–Jiménez-Gamero test
** [17] present a multivariate generalization of a recent class of tests for univariate normality (see [19]) based on the empirical moment generating function. The test statistic is
[TABLE]
where is a fixed parameter. An alternative representation of is
[TABLE]
which is amenable to computation. This test rejects for large values of the test statistic.
The numerical results below were obtained using the software package R, see [30]. Monte Carlo simulation was used in order to estimate the upper percentiles of the distribution of for the sample sizes and dimensions . Table 2 provides the empirical -percentiles of obtained by simulating one million samples from a distribution using . Notice that, for ease of comparison, the values associated with in the rightmost column of Table 2 have been multiplied by 100, since these values are quite close to 0. The tables presented were obtained using the R package , see [22]. From Table 2, we see that the speed of convergence to the asymptotic null distribution is slow and depends on the value of as well as on the dimension of the data.
5.1 Power results
In the power study presented in this section we simulate from several univariate and multivariate distributions. The critical values presented in Table 2 are used in order to calculate the power results for the newly proposed test. For the remaining tests, the critical values used are also calculated based on simulations under . The power estimates presented below are based on 10 000 random samples. Notice that, due to the computational complexity of , the number of simulations used in order to obtain both critical values and power estimates for this test are reduced by a factor of 10. This step is necessary in order to ensure that the numerical results can be obtained within a reasonable time.
Power results are reported for a sample size of and data dimensions . A nominal significance level of is used throughout. As is pointed out in [28], the maximum possible standard error for each power estimate is . Thus, the 99% confidence interval for each reported power estimate is contained in the interval obtained by subtracting 1% from, and adding 1% to, the stated power.
The results for the univariate case are given in Table 3. The entries are percentages of rejection of , rounded to the nearest integer. The power against the standard normal distribution shows that the nominal level is maintained very closely. As for the alternative distributions, NMIX1 and NMIX2 denote mixtures of the normal distributions and . The mixture NMIX1 gives equal weight to these distributions, while NMIX2 is obtained when the probability of sampling from the standard normal distribution is 0.75. The remaining alternative distributions considered are -distributions with 3, 5 and 10 degress of freedom, the lognormal distributions with parameters and (denoted by ), the -distributions with 5 and 15 degrees of freedom, the standard logistic distribution, the Weibull distributions with shape parameters 10 and 20, the Pearson type VII distributions with 5 and 10 degrees of freedom (denoted by ), and the skew-normal law with skewness parameters 3 and 5 (denoted by ), see [1].
The R Package contains the function , which can be employed to simulate random variables from this distribution, see [6]. The R Package disposes of the function , which can be used to simulate random variates from the skew normal distribution, see [2].
Tables 3,4, 5 and 6 report the powers calculated in the cases where equals 1, 2, 3 and 5 respectively. Note that the subscript in the names of the test statistics is omitted in the tables in order to save space.
The results shown in Table 3 indicate that the newly proposed class of tests exhibit substantial power against the distributions considered. outperforms each of the competing tests in terms of the estimated powers against the , and laws, as well as both of the Weibull distributions considered. The newly proposed tests also proves to be serious competitors against each of the remaining distributions considered, especially for small values of (in which case is often only outperformed by the Jarque-Bera test).
We now turn our attention to the case . As was the case for the univariate tests, the powers against 16 alternative distributions are reported for each of the data dimensions considered. The powers of each of the tests against the standard normal distribution are also included in the relevant tables. The alternative distributions considered include mixtures of normal laws, distributions with independent marginals, distributions where each marginal is normal except for one, a spherically symmetric distribution, and a distribution for which every marginal is normal, but the joint distribution does not follow the normal law.
The parameter combinations used for the mixtures of normal distributions were taken from [33]. Let denote a normal mixture, where the probability of sampling from is and the probability of sampling from is . Let and denote -dimensional column vectors of 0’s and 3’s, respectively, and let denote a ()-matrix containing 1’s on the main diagonal and 0.9’s on each off diagonal entry. The normal mixtures are constructed by combining , and . The first mixture, denoted by NMIX1, is . This distribution is skewed with heavy tails. The second mixture, denoted by NMIX2, is . This is a symmetric, heavy-tailed distribution. In addition, we included two multivariate -distributions; the -distribution for and . Next, we included distributions with independent marginals, the latter being the -distribution with 15 and 20 degrees of freedom respectively, the distribution, the gamma distributions with parameters and , as well as the Pearson Type VII distributions with 10 and 20 degrees of freedom.
Three -dimensional distributions are obtained by combining independent standard normal marginals with one non-normal distribution. This distribution is denoted by , where denotes the non-normal marginal distribution. The three alternatives considered for are the -distributions with 5 and 10 degrees of freedom, respectively, as well as the -distribution with 3 degrees of freedom.
Spherically symmetric distributions can be defined in R using the function from the R package , see [32]. Tables 4–6 display the estimated powers of the various tests considered against the -dimensional spherically symmetric distribution, where the radius of the distribution follows a lognormal distribution with parameters 0 and 0.5. This distribution is denoted by .
Let and denote positive definite ()-matrices with 1’s on the main diagonal, where has the constant and the constant on each off diagonal entry. The final distribution considered is the mixture . This distribution is a non-normal -variate distribution with normal marginals.
As was the case in the univariate setting, the newly proposed test is associated with several high powers reported in Tables 4, 5, and 6. When comparing the results for the distributions with independent marginals, we see that outperforms each of the competitors against both of the distributions with marginals as well as both of the distributions with gamma marginals considered. This is also the case aginst the \textrm{N}(0,1)^{d-1}\otimes$$\chi^{2}(5) and \textrm{N}(0,1)^{d-1}\otimes$$\chi^{2}(10) distributions. The mentioned predominance is for each of the data dimensions considered. Furthermore, the newly proposed test statistic shows high power against the remaining distributions for finite values of . Specifically, when the new test outperforms its competitors against the spherically symmetric distribution. In the case , none of the competing tests are able to outperform the newly proposed class of tests against the second normal mixture considered, the -distribution with 10 degrees of freedom or the distribution. The corresponding list of distributions in the case where is obtained by substituting the -distribution with 10 degrees of freedom for the -distribution with 5 degrees of freedom and adding the distribution. Finally, none of the competing tests is able to outperform the newly proposed class of tests against the distribution.
In most of the cases considered, the power of the newly proposed class of tests is a monotone function of . Based on the numerical results presented, it is recommeded that be used when performing the test as this value results in reasonably high power against the majority of the alternatve distributions considered.
6 A real data example
The payoff function of certain types of financial derivatives depends on the joint behaviour of multiple stocks or indexes; an important example is the class of basket options. When calculating the price of a basket option, it is often assumed that the log-returns of the stocks or indexes considered are realized from a multivariate normal distribution (this assumption is an extention of the celebrated Black-Merton-Scholes model for options on a single stock or index). As a result, testing the hypothesis that observed financial log-returns follow a multivariate normal law is of interest when pricing basket options. For more details regarding the pricing of these options, the interested reader is referred to [8].
As a practical application, we consider the log-returns associated with three major indexes traded in the financial market of the United States. 50 daily log-returns were calculated for the period ending 29 December 2017, the relevant prices were downloaded from http://finance.yahoo.com. The first index considered is the Dow Jones Industrial Average (DJIA) index; this index is comprised of a price-weighted average of 30 large publicly owned companies. The second is the Standard & Poor 500 (S&P 500); a market-capitalization weighted index comprising 500 large companies. Finally, we consider the log-returns of the National Association of Securities Dealers Automated Quotations (NASDAQ) composite index. We are interested in testing the hypothesis that the log-returns are realized from a multivariate normal distribution using the newly proposed test.
Table 7 shows the estimated p-values associated with the newly proposed tests for various values of , the reported p-values were obtained using one million Monte Carlo simulations in each case. The results indicate that the hypothesis of multivariate normality is rejected at a 1% significance level for each value of considered.
7 Conclusion
We proposed and studied a new class of affine invariant tests for normality in any dimension that are based on a partial differential equation involving the moment generating function. Some properties of the limit null distribution of the test statistic have been derived, and the consistency of this class of tests against general alternatives has been proved under some mild conditions. For fixed , the test statistic , after suitable scaling, approaches a linear combination of two measures of multivariate skewness as .
A Monte Carlo study investigates the finite-sample performance of compared to those of competing tests in the univariate and multivariate settings. The competing tests considered for univariate normality comprise four well-known tests, while, in the multivariate case, we include four prominent classical tests for multinormality and two very recent tests. The numerical results indicate that often exhibits power greater than those associated with several of its competitors, both in univariate and multivariate settings. Based on the numerical results obtained, it is recommended that is used when performing the test.
Acknowledgement: The authors thank Tobias Jahnke for providing the proof of Proposition 8.5. The second author’s work is based on research supported by the National Research Foundation, South Africa (Research chair: Non-parametric, Robust Statistical Inference and Statistical Process Control, Grant number 71199). Opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the NRF.
8 Appendix: Technical proofs
Proposition 8.1
We have , where is given in (2.9).
Proof. Putting , we have
[TABLE]
Recall from (3.7) and put . We have . Furthermore, using and
[TABLE]
as well as
[TABLE]
we obtain
[TABLE]
where . Expanding the curly brackets, the leading terms are those that do not involve any of . We concentrate on
[TABLE]
which originates from choosing the first term within each of the curly brackets. The other terms are treated similarly. Notice that
[TABLE]
and that , . By Proposition 10.2 of [18], the integral is of order (notice that in that paper corresponds to (our) ). From display (10.6) and display (10.7) of [18] we have and . Since , it follows that
[TABLE]
Proposition 8.2
For given in (2.8), we have
[TABLE]
Proof. Observe that , where
[TABLE]
Use
[TABLE]
and with given in (3.7) together with (see Prop. 10.1. of [18]) and
[TABLE]
to show . Next, where
[TABLE]
Since
[TABLE]
and , one may use (8.1) and Fubini’s theorem to show . To proceed, rewrite in the form
[TABLE]
Since replacing the first factor with its expectation means adding a term that is asymptotically negligible, we have
[TABLE]
To tackle , we rewrite its transpose in the form
[TABLE]
and use display (2.13) of [20], according to which
[TABLE]
where . Putting
[TABLE]
and , we have
[TABLE]
Abbreviating the matrix within curly brackets by , we have . Since , it follows that
[TABLE]
Now, observe that is a mean of centred random vectors, and invoking Fubini’s theorem the expectation of the integral is seen to be of order . Thus, , and hence (since the matrix figuring in (8.5) is asymptotically negligible) we have
[TABLE]
Upon combing this result with (8.2) and (8.4), the assertion follows.
Proposition 8.3
For given in (2.7), we have
[TABLE]
Proof. Notice that , where
[TABLE]
To show , use n\big{\|}S_{n}^{-1/2}-\textrm{I}_{d}\big{\|}^{2}_{2}\,\|\overline{X}_{n}\|^{2}=O_{\mathbb{P}}(n^{-1}) together with
[TABLE]
and Fubini’s theorem, since \mathbb{E}\big{\|}n^{-1}\sum_{j=1}^{n}\exp(t^{\top}X_{j})(X_{j}-t)\big{\|}^{2}=O(n^{-1}), due to the fact that the summands are centred random vectors. Likewise,
[TABLE]
Since each of the summands is a product of centred random vectors or random variables, we have , and Fubini’s theorem yields . To conclude the proof, observe that, by (8.3),
[TABLE]
where , . Notice that is a mean of i.i.d. random matrices, and that
[TABLE]
Straightforward calculations show that replacing with the right-hand side of the last equation means adding a term that is asymptotically negligible. Hence, the first term on the right-hand side of (8.6) is
[TABLE]
The second term is since .
Proposition 8.4
(Calculation of )*
Recall from (2.10). Putting and , we have*
[TABLE]
Writing for the zero matrix of order , we have
[TABLE]
The assertion now follows from straightforward calculations.
Proposition 8.5
Let be a sequence of symmetric positive definite ()-matrices and an increasing sequence of positive real numbers satisfying so that
[TABLE]
We then have .
Proof. Let be the diagonal matrix consisting of the positive eigenvalues of so that and
[TABLE]
Since the assumptions imply , choose so large that for each . Putting , we have
[TABLE]
and thus . Now, implies , whence
[TABLE]
In view of (8.7) and , the assertion follows.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Azzalini, A. (1985). A class of distributions which includes the normal ones. Scand. J. Stat. , 12 , 171–178.
- 2[2] Azzalini, A. (2017). The R package ’sn’: The skew-normal and skew-t distributions. R package version 1.5-0. http://azzalini.stat.unipd.it/SN
- 3[3] Barndorff-Nielsen, O. (1963). On the behaviour of extreme order statistics. Ann. Math. Stat. , 34 , 992–1002.
- 4[4] Baringhaus, L., and Henze, N. (1991). Limit distributions for measures of multivariate skewness and kurtosis based on projections. J. Multiv. Anal. , 38 , 51–69.
- 5[5] Baringhaus, L., and Henze, N. (1992). Limit distributions for Mardia’s measure of multivariate skewness. Ann. Stat. , 20 , 1889–1902.
- 6[6] Becker, M, and Klößner, S. (2017). Pearson DS: Pearson Distribution System. R package version 1.0. https://CRAN.R-project.org/package=Pearson DS
- 7[7] Bowman, A.W., and Foster, P.J. (1993). Adaptive smoothing and density based tests of multivariate normality. J. Amer. Stat. Ass. , 88 , 529–537.
- 8[8] Caldana, R., Fusai, G., Gnoatto, A., and Grasselli, M. (2016). General closed-form basket option pricing bounds. Quant. Finance , 16 , 535–554.
