Semiparametric estimation in the normal variance-mean mixture model
Denis Belomestny, Vladimir Panov

TL;DR
This paper introduces a semiparametric estimation method for variance-mean mixture models, focusing on estimating the normal mean and the mixing distribution density, with demonstrated effectiveness on simulated and real data.
Contribution
It presents a novel two-step semiparametric estimation procedure for variance-mean mixtures, combining parametric mean estimation with nonparametric mixing density recovery.
Findings
Effective estimation demonstrated on simulated data
Successful application to real financial data
Improved understanding of mixture model parameters
Abstract
In this paper we study the problem of statistical inference on the parameters of the semiparametric variance-mean mixtures. This class of mixtures has recently become rather popular in statistical and financial modelling. We design a semiparametric estimation procedure that first estimates the mean of the underlying normal distribution and then recovers nonparametrically the density of the corresponding mixing distribution. We illustrate the performance of our procedure on simulated and real data.
Denis Belomestny
University of Duisburg-Essen
Thea-Leymann-Str. 9, 45127 Essen, Germany
and
Laboratory of Stochastic Analysis and its Applications
National Research University Higher School of Economics
Shabolovka, 26, 119049 Moscow, Russia
Vladimir Panov
Laboratory of Stochastic Analysis and its Applications
National Research University Higher School of Economics
Shabolovka, 26, 119049 Moscow, Russia
Keywords: variance-mean mixture model, semiparametric inference, Mellin transform, generalized hyperbolic distribution
This work has been funded by the Russian Academic Excellence Project “5-100”.
1 Introduction and set-up
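A normal variance-mean mixture is defined as
$$p(x; \mu, G) = \int_{\mathbb{R}_+} \varphi_{N(\mu s,\, s)}(x)\, G(ds), \tag{1}$$
where $\varphi_{N(\mu s,\, s)}$ stands for the density of a normal distribution with mean $\mu s$ and variance $s$, and $G$ is a mixing distribution on $\mathbb{R}_+$. As can be easily seen, a random variable $X$ has the distribution (1) if and only if
$$X \stackrel{d}{=} \mu\,\xi + \sqrt{\xi}\,\eta, \qquad \eta \sim N(0,1), \quad \xi \sim G, \quad \xi \text{ and } \eta \text{ independent}. \tag{2}$$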
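The stochastic representation of the mixture is convenient for simulation. The following minimal sketch assumes the standard form $X = \mu\xi + \sqrt{\xi}\,\eta$ with independent $\xi \sim G$ and $\eta \sim N(0,1)$, and uses a gamma mixing distribution purely for illustration. The function name `sample_mixture` and all parameter values are hypothetical, not taken from the paper.

```python
import math
import random


def sample_mixture(n, mu, shape, rate, seed=0):
    """Draw n values of X = mu*xi + sqrt(xi)*eta, xi ~ Gamma(shape, rate), eta ~ N(0, 1)."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        xi = rng.gammavariate(shape, 1.0 / rate)  # second argument is the scale = 1/rate
        eta = rng.gauss(0.0, 1.0)
        xs.append(mu * xi + math.sqrt(xi) * eta)
    return xs


# Illustrative (assumed) parameters: mu = 0.5 and a Gamma(2, 1) mixing distribution.
xs = sample_mixture(50_000, mu=0.5, shape=2.0, rate=1.0)
sample_mean = sum(xs) / len(xs)
sample_var = sum((x - sample_mean) ** 2 for x in xs) / len(xs)

# Theory: E[X] = mu*E[xi] = 1.0 and Var[X] = E[xi] + mu^2*Var[xi] = 2.5.
print(f"sample mean {sample_mean:.3f}, sample variance {sample_var:.3f}")
```

The two moment identities in the final comment follow from conditioning on $\xi$, so they give a quick sanity check of any sampler for this model.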
Variance-mean mixture models play an important role in statistical modelling and have many applications. In particular, such mixtures appear as limit distributions in the asymptotic theory for dependent random variables, and they are also useful for modelling data stemming from heavy-tailed and skewed distributions, see, e.g., Barndorff-Nielsen, Kent and Sørensen [6], Barndorff-Nielsen [4], Bingham and Kiesel [9], Bingham, Kiesel and Schmidt [10]. If the mixing distribution is the generalized inverse Gaussian distribution, then the normal variance-mean mixture distribution coincides with the so-called generalized hyperbolic distribution. The latter distribution has the important property that the logarithm of its density function is a smooth unimodal curve approaching linear asymptotes. This type of distribution was used to model the sizes of sand particles (Bagnold [2], Barndorff-Nielsen and Christensen [5]) and the diamond sizes in marine deposits in South West Africa (Barndorff-Nielsen [3]).
In this paper we study the problem of statistical inference for the mixing distribution and the mean parameter, based on a sample from the distribution with density (1). This problem has already been considered in the literature, but mainly in parametric settings. For example, in the case of the generalized hyperbolic distributions some parametric approaches can be found in Jørgensen [12], and Karlis and Lillestöl [13]. There are also a few papers dealing with the general semiparametric case. For example, Korsholm [14] considered the statistical inference for a more general model of the form
[TABLE]
and proved the consistency of the nonparametric maximum likelihood estimator for the finite-dimensional parameters, whereas the mixing distribution was treated as a nuisance probability distribution. Although the maximum likelihood (ML) approach of Korsholm is rather general, its practical implementation would meet serious computational difficulties, since one would need to solve a rather challenging optimization problem. Note that the ML approach for similar models was also considered by van der Vaart [19]. Among other papers on related topics, let us mention the paper by Tjetjep and Seneta [18], where the method of moments was used for some special cases of the model (1), and the paper by Zhang [20], which is devoted to the problem of estimating the mixing density in location (mean) mixtures.
The main contribution of this paper is a new computationally efficient estimation approach which can be used to estimate both the mean parameter and the mixing distribution in a consistent way. This approach employs the Mellin transform technique and does not involve any type of high-dimensional optimisation. We show that while our estimator of the mean parameter converges at the parametric rate, the nonparametric estimator of the mixing density has much slower convergence rates.
The paper is organized as follows. In Section 2 the problem of statistical inference for the mean parameter is studied. Section 3 is devoted to the estimation of the mixing density under a known mean parameter, and Section 4 extends the results of Section 3 to the case of an unknown mean parameter. A simulation study is presented in Section 5 and a real data example can be found in Section 6.
2 Estimation of $\mu$
First note that the density in the normal variance-mean model can be represented in the following form
[TABLE]
where
[TABLE]
This observation in particular implies that
[TABLE]
and therefore, dividing (4) by (5), we get
[TABLE]
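The factorization behind (4)-(6) can be spelled out as follows. This is a sketch that assumes the mixture components are $N(\mu s, s)$ densities, as in model (1). The auxiliary name $q$ is introduced here only for illustration and is not necessarily the paper's notation.

```latex
\begin{align*}
\varphi_{N(\mu s,\, s)}(x)
  &= \frac{1}{\sqrt{2\pi s}} \exp\Bigl(-\frac{(x-\mu s)^2}{2s}\Bigr)
   = e^{\mu x}\,\frac{1}{\sqrt{2\pi s}}
     \exp\Bigl(-\frac{x^2}{2s}-\frac{\mu^2 s}{2}\Bigr),\\
p(x) &= e^{\mu x}\, q(x), \qquad
q(x) := \int_{\mathbb{R}_+} \frac{1}{\sqrt{2\pi s}}
     \exp\Bigl(-\frac{x^2}{2s}-\frac{\mu^2 s}{2}\Bigr)\, G(ds) = q(-x),\\
p(-x) &= e^{-\mu x}\, q(x), \qquad
\frac{p(x)}{p(-x)} = e^{2\mu x}, \qquad
\mu = \frac{1}{2x}\,\log\frac{p(x)}{p(-x)}, \quad x \neq 0 .
\end{align*}
```

The key point is that $q$ is an even function, so the whole asymmetry of $p$ is carried by the exponential tilt $e^{\mu x}$.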
Formula (6) represents the mean parameter in terms of the density. The representation (6) can also be written in the form
[TABLE]
which looks similar to the entropy of the density,
[TABLE]
For a comprehensive overview of methods for estimating the entropy, we refer to [7]. Note also that the estimation of functionals like (7) was considered in [16]. Typically, parametric convergence rates for estimators of such functionals can be achieved only under very restrictive assumptions on the density. In the approach presented below, we avoid these restrictive conditions and prove square-root convergence under very mild assumptions. Let the weighting function be Lipschitz continuous on the real line and satisfy
[TABLE]
for some Set
[TABLE]
then the function is monotone and This property suggests the following method to estimate Without loss of generality we may assume that for some Set
[TABLE]
with
[TABLE]
Note that since and
[TABLE]
for all the function is monotone and is unique. The following theorem describes the convergence properties of in terms of the norm where
Theorem 2.1.
Let and be such that
[TABLE]
Then
[TABLE]
with a constant depending on and only.
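The estimation idea of this section can be sketched in code. Under model (1) the density factorizes as $e^{\mu x}$ times an even function, so for an odd weight $w$ the expectation of $e^{-\rho X} w(X)$ vanishes at $\rho = \mu$, and it is monotone in $\rho$ when $w \ge 0$ on the positive half-line. The sketch below finds the root of the empirical counterpart by bisection. The particular weight `w`, the support bound `A`, the search interval and the gamma mixing distribution are all hypothetical choices for illustration; the paper's own weight function and truncation are not reproduced here.

```python
import math
import random

A = 3.0  # hypothetical half-width of the support of the weight function


def w(x):
    """Hypothetical odd, Lipschitz weight: non-negative on [0, A], zero outside [-A, A]."""
    return math.sin(math.pi * x / A) if abs(x) <= A else 0.0


def sample_mixture(n, mu, shape, rate, rng):
    """Draw X = mu*xi + sqrt(xi)*eta with xi ~ Gamma(shape, rate) and eta ~ N(0, 1)."""
    out = []
    for _ in range(n):
        xi = rng.gammavariate(shape, 1.0 / rate)  # second argument is the scale
        out.append(mu * xi + math.sqrt(xi) * rng.gauss(0.0, 1.0))
    return out


def estimate_mu(xs, lo=-5.0, hi=5.0, tol=1e-8):
    """Root of W_n(rho) = mean(exp(-rho*X_i) * w(X_i)), found by bisection on [lo, hi]."""
    # Only observations inside the support of w contribute to W_n.
    pts = [(x, w(x)) for x in xs if abs(x) <= A]

    def W_n(rho):
        return sum(math.exp(-rho * x) * wx for x, wx in pts) / len(xs)

    # W_n is non-increasing in rho because x*w(x) >= 0, so a sign change brackets the root.
    if not (W_n(lo) > 0.0 > W_n(hi)):
        raise ValueError("root is not bracketed; widen [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if W_n(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(1)
xs = sample_mixture(20_000, mu=0.5, shape=2.0, rate=1.0, rng=rng)
mu_hat = estimate_mu(xs)
print(f"estimated mu: {mu_hat:.3f} (true value 0.5)")
```

Because the empirical criterion is a monotone function of a scalar argument, a one-dimensional root search suffices and no high-dimensional optimisation is involved.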
3 Estimation of $G$ with known $\mu$
In this section, we assume that the distribution function has a Lebesgue density and our aim is to estimate from the i.i.d. observations of the random variable with the density provided that the parameter is known. The idea of the estimation procedure is based on the following observation. Due to the representation (2), the characteristic function of has the form:
[TABLE]
where is the Laplace transform of the r.v. and is the characteristic exponent of the normal r.v. with mean and variance Our approach is based on the use of the Mellin transform technique. Set
[TABLE]
then by the integral Cauchy theorem
[TABLE]
where is the curve on the complex plane defined as the image of by mapping from to that is, is the set of points satisfying Therefore, we get
[TABLE]
so that the Mellin transform can be estimated from data via
[TABLE]
where
[TABLE]
and is a sequence of positive numbers tending to infinity as the sample size grows. This choice of the estimate for the Mellin transform is motivated by the fact that the function
[TABLE]
is bounded for any iff , and the function
[TABLE]
is bounded for any iff . Therefore, both integrals in (10) converge. Moreover, note that this estimate possesses the property which also holds for the original Mellin transform
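The representation of the characteristic function through the Laplace transform of the mixing variable, stated at the beginning of this section, can be checked by simulation. The sketch below assumes the convention $\mathcal{L}_\xi(z) = \mathsf{E}\,e^{-z\xi}$ and $\psi(u) = \mathrm{i}\mu u - u^2/2$, so that $\mathsf{E}\,e^{\mathrm{i}uX} = \mathcal{L}_\xi(-\psi(u))$. The gamma mixing distribution, for which $\mathcal{L}_\xi(z) = (1 + z/\text{rate})^{-\text{shape}}$, and all function names are illustrative assumptions.

```python
import cmath
import math
import random


def sample_mixture(n, mu, shape, rate, rng):
    """Draw X = mu*xi + sqrt(xi)*eta with xi ~ Gamma(shape, rate) and eta ~ N(0, 1)."""
    out = []
    for _ in range(n):
        xi = rng.gammavariate(shape, 1.0 / rate)  # second argument is the scale
        out.append(mu * xi + math.sqrt(xi) * rng.gauss(0.0, 1.0))
    return out


def empirical_cf(u, xs):
    """Empirical characteristic function: average of exp(i*u*X_k)."""
    return sum(cmath.exp(1j * u * x) for x in xs) / len(xs)


def gamma_laplace(z, shape, rate):
    """E[exp(-z*xi)] for xi ~ Gamma(shape, rate); valid for Re(z) > -rate."""
    return (1.0 + z / rate) ** (-shape)


def mixture_cf(u, mu, shape, rate):
    """Characteristic function of X as the Laplace transform of xi evaluated at -psi(u)."""
    psi = 1j * mu * u - 0.5 * u * u  # characteristic exponent of N(mu, 1)
    return gamma_laplace(-psi, shape, rate)


rng = random.Random(2)
mu, shape, rate = 0.5, 2.0, 1.0  # illustrative values
xs = sample_mixture(20_000, mu, shape, rate, rng)
errors = [abs(empirical_cf(u, xs) - mixture_cf(u, mu, shape, rate)) for u in (0.5, 1.0, 2.0)]
print("absolute deviations:", [f"{e:.4f}" for e in errors])
```

The deviations should be of the order of the inverse square root of the sample size, which is the accuracy with which the empirical characteristic function can replace the true one in the estimator.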
The Mellin transform is closely connected to the Mellin transform of the density. Indeed,
[TABLE]
Therefore, the Mellin transform of the density of the r.v. can be represented as
[TABLE]
Using the last expression and taking into account (10), we define the estimate of by
[TABLE]
Finally, we apply the inverse Mellin transform to estimate the mixing density. Since the inverse Mellin transform is given by
[TABLE]
for any we define the estimate of the mixing density via
[TABLE]
for some and a sequence as The convergence rates of the estimate crucially depend on the asymptotic behavior of the Mellin transform of the true density function In order to specify this behavior, we introduce two classes of probability densities:
[TABLE]
where , . For instance, the gamma distribution belongs to the first class and the beta distribution to the second, see [8].
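To make the role of the decay of the Mellin transform concrete, the following sketch inverts a Mellin transform numerically along a vertical line, truncating the integral at a finite level in the same spirit as the estimator above. It assumes the standard convention $\mathcal{M}[p](z) = \int_0^\infty p(x)\,x^{z-1}\,dx$ with inverse $p(x) = \frac{1}{2\pi \mathrm{i}}\int_{\gamma-\mathrm{i}\infty}^{\gamma+\mathrm{i}\infty} \mathcal{M}[p](z)\,x^{-z}\,dz$, and uses the Beta(1,2) density $p(x) = 2(1-x)$ on $(0,1)$, whose transform $2/(z(z+1))$ decays polynomially. The function names, the line $\mathrm{Re}\,z = 1$, the cutoff `U` and the step `h` are illustrative choices.

```python
import cmath
import math


def mellin_beta12(z):
    """Mellin transform of p(x) = 2(1 - x) on (0, 1): 2/z - 2/(z + 1) = 2 / (z (z + 1))."""
    return 2.0 / (z * (z + 1.0))


def inverse_mellin(mellin, x, gamma=1.0, U=200.0, h=0.01):
    """Truncated inversion (1/pi) * int_0^U Re[M(gamma + iv) * x^(-gamma - iv)] dv.

    For a real-valued density the integrand at -v is the complex conjugate of the
    integrand at v, so the integral over [-U, U] reduces to twice the real part on [0, U].
    The integral is approximated by the trapezoid rule with step h.
    """
    n_steps = int(round(U / h))
    log_x = math.log(x)
    total = 0.0
    for k in range(n_steps + 1):
        z = complex(gamma, k * h)
        term = (mellin(z) * cmath.exp(-z * log_x)).real
        total += 0.5 * term if k in (0, n_steps) else term
    return total * h / math.pi


for x in (0.25, 0.5, 1.5):
    true_value = 2.0 * (1.0 - x) if 0.0 < x < 1.0 else 0.0
    print(f"x = {x}: recovered {inverse_mellin(mellin_beta12, x):.4f}, true {true_value:.4f}")
```

With the exact transform the truncation error is small already for moderate cutoffs. In the statistical problem the transform is estimated with noise, so the cutoff has to balance the truncation bias against the variance, and a faster decay of the true transform allows a smaller cutoff.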
The following convergence rates are proved in Section 7.2.
Theorem 3.1.
Let and for some
(i) If for some , then under the choice it holds for any
[TABLE]
for any where stands for an inequality up to some positive finite constant depending on and
(ii) If for some , then for any and any
[TABLE]
for any where stands for an inequality up to some positive finite constant depending on and
4 Estimation of $G$ with unknown $\mu$
Using the same strategy as in the previous section and substituting the true value by the estimate we arrive at the following estimate of the density function in the case of an unknown
[TABLE]
where The next theorem shows that the difference between and is basically of order
Theorem 4.1.
Let the assumptions of Theorem 3.1 be fulfilled, for some and . Furthermore, let be a consistent estimate of Then for any
[TABLE]
for any where
[TABLE]
and are positive deterministic sequences such that
[TABLE]
as where stands for an inequality with some positive finite constant depending on the parameters of the corresponding class. In particular, in the setup of Theorem 3.1(i),
Corollary 4.2.
In the setup of Theorem 2.1, it holds
[TABLE]
for any
5 Numerical example
In this section, we illustrate the performance of our estimation algorithm in the case when the mixing distribution is the so-called generalized inverse Gaussian distribution with density
[TABLE]
where and is the Bessel function of the third kind. Trivially, is an exponential class of distributions. Furthermore, it is interesting to note that these distributions are self-decomposable, see [11], and therefore infinitely divisible.
With this choice of the mixing distribution , the random variable defined by (2), has the so-called generalized hyperbolic distribution, GH (), with a density function, which can be explicitly computed via (1). In particular, in the case the density function is of the form
[TABLE]
It is interesting to note that the plot of the log-density has two linear asymptotes, see Figure 1. For some other properties of this distribution, we refer to [9].
The aim of this simulation study is to estimate and based on the observations of the r.v. . Following the idea of Section 2, we first choose the odd weighting function
[TABLE]
Note that is bounded and supported on For our numerical study, we take and The boxplots of the estimate based on simulation runs are presented in Figure 2.
Next, we estimate the density function for where constitute an equidistant grid on To this end, we use the estimate constructed in Section 3,
[TABLE]
where is the empirical characteristic function of the random variable . The error of estimation is measured by
[TABLE]
We take and the parameters and are chosen by numerical optimization of the functional which yields in our case the values and Following the ideas of Section 4, we consider also the estimate which is obtained from by replacing with its estimate see (14). The difference between and (which was theoretically considered in Theorem 4.1) is illustrated by boxplots in Figure 3, which shows that the quality of these estimates is essentially the same.
6 Real data example
In this section, we provide an example of the application of our model (1) for describing the diamond sizes in marine deposits in South West Africa. The motivation for using this model in this problem can be found in the paper by Sichel [17]: “According to one geological theory diamonds were transported from inland down the Orange River… One would expect then that the diamond traps would catch the larger stones preferentially and that the average stone weights would decrease as distance from the river mouth increased. Intensive sampling has actually proved this hypothesis to be correct…”
Later, Sichel claims that although “for relatively small mining areas, and particular for a single-trench unit, the size distributions appear to follow the two-parameter lognormal law,” for large mining areas, one should expect that the parameters of the lognormal law depend on the distance from the mouth of the river. Moreover, taking into account the geological studies, it is reasonable to assume that these parameters are related inversely to the distance from the mouth of the river and related directly to each other. Based on these ideas, Sichel proposes to use the model (1) (or, more precisely, a slightly more general model (3)) with corresponding to the gamma distribution. Later, Barndorff-Nielsen [3] applied the same model with corresponding to the generalized inverse Gaussian distribution, which was presented above in Section 5.
Below we apply our approach to the same data, which can be found (in aggregated form) both in [3] (p. 409) and [17] (p. 242). We have 1022 observations of stone sizes, measured in carats, and aim to fit the model (1) to the density of the logarithms of these sizes. The estimation scheme consists of 2 steps.
1. First, we estimate the parameter by defined in (8). In this example, we got an estimated value of the parameter equal to . Note that the positive sign of this estimate is important, since the parameters are required to be directly related.
2. Second, we estimate the density by defined in (14) for from the equidistant grid on with step . The plot of this function is given in Figure 4.
To illustrate the performance of our procedure, we also estimate the density fitted by the model (1):
[TABLE]
The performance of this estimate can be checked visually in Figure 5.
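The fitted density is presumably obtained by plugging the estimated mean parameter and the estimated mixing density into model (1) and integrating numerically. The sketch below shows such a plug-in computation on an equidistant grid. Since the diamond data and the fitted values are not reproduced here, a Gamma(2, 1) density stands in for the estimated mixing density and 0.5 for the estimated mean parameter; these stand-ins, the grids and all function names are assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gamma_pdf(s, shape, rate):
    """Density of Gamma(shape, rate) at s > 0 (stand-in for an estimated mixing density)."""
    return rate ** shape * s ** (shape - 1.0) * math.exp(-rate * s) / math.gamma(shape)


def mixture_density(x, mu, s_grid, g_vals, h):
    """Riemann sum for the integral of phi_{N(mu*s, s)}(x) * g(s) ds on a grid with step h."""
    return h * sum(normal_pdf(x, mu * s, s) * g for s, g in zip(s_grid, g_vals))


mu_hat = 0.5                                  # stand-in for the estimated mean parameter
h = 0.02
s_grid = [h * k for k in range(1, 2001)]      # grid on (0, 40]
g_hat = [gamma_pdf(s, 2.0, 1.0) for s in s_grid]

x_step = 0.1
x_grid = [-15.0 + x_step * k for k in range(401)]   # grid on [-15, 25]
p_hat = [mixture_density(x, mu_hat, s_grid, g_hat, h) for x in x_grid]

mass = x_step * sum(p_hat)                    # should be close to 1
ratio = (mixture_density(1.0, mu_hat, s_grid, g_hat, h)
         / mixture_density(-1.0, mu_hat, s_grid, g_hat, h))
print(f"total mass {mass:.4f}, log(p(1)/p(-1))/2 = {0.5 * math.log(ratio):.4f}")
```

The second printed quantity recovers the mean parameter from the ratio of the density at $x$ and $-x$, which is the relation used in Section 2; it holds for every term of the Riemann sum, so it is preserved by the discretization up to rounding.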
7 Proofs
7.1 Proof of Theorem 2.1
Note that
[TABLE]
The first summand in the r.h.s. can be bounded by taking into account that
[TABLE]
for some and hence on the event it holds
[TABLE]
Note that the function
[TABLE]
is positive on and attains its minimum at with
[TABLE]
Hence
[TABLE]
and continuing the line of reasoning in (16), we get
[TABLE]
Furthermore, due to the monotonicity of and the fact that on the event we get
[TABLE]
Set then
[TABLE]
with By the Rosenthal and Lyapunov inequalities
[TABLE]
where the second inequality follows from Next, applying the maximal inequality (see Theorem 8.4 in [15]), we get
[TABLE]
for some constants and depending on and Hence
[TABLE]
Finally, we get the result
[TABLE]
7.2 Proof of Theorem 3.1
1. The bias of
[TABLE]
Taking into account (11), we get that the last term in this representation can be written as
[TABLE]
where
[TABLE]
Substituting these expressions into (17) and taking into account that and , we derive
[TABLE]
where
[TABLE]
The upper bound for follows directly from our assumption on the asymptotic behavior of the Mellin transform. In the exponential case,
[TABLE]
whereas in the polynomial case To show the asymptotic behavior of , , we first derive the upper bound for the characteristic function
[TABLE]
where the last asymptotic inequality follows from the integration by parts:
[TABLE]
for all Second, it holds
[TABLE]
for any such that and some constant , see [1]. Therefore, we get
[TABLE]
2. Next, we consider the variance of the estimate
[TABLE]
where
[TABLE]
The last inequality follows from the fact that for any integrable function and any random variable it holds
[TABLE]
It is worth mentioning that
[TABLE]
and moreover it holds
[TABLE]
where we use the assumption . Finally, we conclude that
[TABLE]
where we use that
[TABLE]
3. Set then we have the following bias-variance decomposition:
[TABLE]
For instance, in the exponential case, this decomposition yields
[TABLE]
The last expression suggests the choice under which
[TABLE]
Choosing in the form we arrive at the desired result.
7.3 Proof of Theorem 4.1
It is easy to see that
[TABLE]
where
[TABLE]
Below we consider the first term in detail; the treatment of the second term follows the same lines. Denote
[TABLE]
In this notation,
[TABLE]
Note that
[TABLE]
Next, applying the Taylor theorem for the function in the vicinity of zero with we get
[TABLE]
where
[TABLE]
where and Note that uniformly on and it holds
[TABLE]
and moreover From (20) it follows then
[TABLE]
and
[TABLE]
In the sequel we assume for simplicity that at the second stage (estimation of the mixing density) we use another sample, independent of the one used for the estimation of the mean parameter. Substituting (21) into (19), we get (15) with
[TABLE]
Note that due to the Minkowski inequality,
[TABLE]
Taking into account that and moreover
[TABLE]
we get that
[TABLE]
where we use the inequality (18). Therefore, under our choice of and , we get Analogously,
[TABLE]
This observation completes the proof.
[1] Andrews, G.E., Askey, R., and Roy, R. Special functions, volume 71 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1999.
[2] Bagnold, R.A. The physics of blown sand and desert dunes. Methuen, London, 1941.
[3] Barndorff-Nielsen, O. Exponentially decreasing distributions for the logarithm of particle size. Proc. R. Soc. London, A(353):401–419, 1977.
[4] Barndorff-Nielsen, O. Normal inverse Gaussian distributions and stochastic volatility modelling. Scandinavian Journal of Statistics, 24(1):1–13, 1997.
[5] Barndorff-Nielsen, O. and Christensen, C. Erosion, deposition, and size distributions of sand. Proc. R. Soc. London, A(417):335–352, 1988.
[6] Barndorff-Nielsen, O., Kent, J., and Sørensen, M. Normal variance-mean mixtures and z distributions. International Statistical Review, 50:145–159, 1982.
[7] Beirlant, J., Dudewicz, E. J., Györfi, L., and van der Meulen, E. C. Nonparametric entropy estimation: an overview. Int. J. Math. Stat. Sci., 6(1):17–39, 1997.
[8] Belomestny, D. and Panov, V. Statistical inference for generalized Ornstein-Uhlenbeck processes. Electron. J. Statist., 9:1974–2006, 2015.
