Tails and probabilities for $p$-outside values
Pavlina Jordanova

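Table 1. Student's t-distribution with n degrees of freedom: probabilities for an observation to be a right (by symmetry, equivalently, a left) extreme outlier.

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Probability | 0.0452 | 0.0146 | 0.0064 | 0.0033 | 0.0019 | 0.0012 | 0.0008 | 0.0006 | 0.0004 | 0.0003 |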
Faculty of Mathematics and Informatics, Konstantin Preslavsky University of Shumen,
Universitetska 115, Shumen, Bulgaria
Abstract
No satisfactory general and useful classification of the tails of probability distributions exists yet. Because there is no information outside the range of the data, the tails of a distribution should be described by many characteristics. The index of regular variation is a good characteristic, but it places too many distributions with very different tail behaviour in one and the same class. Consider, for example, the Pareto, Fréchet and Hill-horror distributions with one and the same fixed parameter. When we speak about the tails of a distribution, the main disadvantage of VaR, expectiles and hazard functions is that they depend on the center of the distribution and on the scaling factor. They are therefore very appropriate for predicting "big losses", but only after a correct characterization of the distributional type of "the payoff". When analyzing the tail of the observed distribution, we need a characteristic that does not depend on the moments, because in the most important cases of heavy-tailed distributions the theoretical moments do not exist and the corresponding empirical moments fluctuate too much. In this paper, we show that probabilities for different types of outside values can be very appropriate characteristics of the tails of the observed distribution. They are invariant under increasing affine transformations and do not require the existence of moments. The idea originates from Tukey's box plots. It yields one and the same characteristic of the tail of the observed distribution for the whole distributional type, that is, for all increasing affine transformations. These characteristics answer the question:
To what extent can we observe "unexpected" values?
Keywords: Tail inference, Point estimators
MSC (2010): 60E15, 60E05, 62G32
The author is grateful for the support of the bilateral projects Bulgaria - Austria, 2016-2019, "Feasible statistical modelling for extremes in ecology and finance", BNSF, Contract number 01/8, 23/08/2017, and WTZ Project No. BG 09/2017. Corresponding author: [email protected], [email protected]
1 Motivation and history of the problem
A general and useful classification of probability distributions with respect to their tails still seems to be an open problem. Embrechts et al. (1997) [20] give a very useful figure of the relations between different subclasses of distributions that are heavy-tailed in the sense that their moment generating function is infinite for all positive arguments. However, this classification places too many distributions with very different tail behaviour in one and the same class. For example, the Pareto, Fréchet and Hill-horror distributions with one and the same fixed parameter all belong to the class of distributions with regularly varying tails with that parameter. Yet the chance to observe an "unexpected" value is very different in these three cases, especially for the Hill-horror distribution. See Figure 2. Comparing the corresponding hazard rate functions, cumulative distribution functions (c.d.fs.) and probability density functions (p.d.fs.) for a fixed parameter, close to the ends of the support of the corresponding distributions, is a relatively good approach. However, all these characteristics depend on the center of the distribution and on the scale parameter. To remove these dependencies, we usually center the considered random variable (r.v.) by its mean and normalize it by its standard deviation. This requires the existence of both the first and the second moment of the observed distribution. In the most important cases of heavy-tailed distributions these moments do not exist, and the approach is not applicable. We therefore need characteristics that describe the left and the right tail of the distribution separately and do not depend on the moments. The index of regular variation, in the cases where it is meaningful, is not enough. This leads us to quantiles, VaR and expectiles, described e.g. in Daouia et al. (2018) [12] or Marinelli et al. (2007) [38].
They are good characteristics, but they also depend on the center of the distribution and on the scaling factor. They are therefore very appropriate for predicting "big losses" within a fixed distributional type, but first one needs a correct characterization of the type of the "profit and loss" distribution. A detailed critical review of kurtosis can be found in Balanda and MacGillivray (1988) [4]. Harter (1959) [29] uses sample quasi-ranges to estimate the population standard deviation. Mosteller (1946) [39] and Sarhan (1954) [52] propose estimators of the mean and the standard deviation that are functions of order statistics. These works suggest working with quantities related to the quantile spread.
Because there is no information outside the range of the data, the tails of the distribution should be described by many characteristics. Throughout the paper, we show that probabilities for different orders of outside values can be appropriate characteristics for this task. For classifying distributions with respect to the tail of the observed distribution, their properties are superior to those of the kurtosis, the tail index and the hazard function. Their main advantages are that they do not depend on the center or the scaling factor of the distribution and do not require the existence of moments. They are useful for answering the question:
To what extent should we expect to observe "unexpected" values?
The idea originates from Tukey's box plots (1977) [58] and from Balanda and MacGillivray's (1990) [5] spread-spread plot. Here, however, we use probabilities instead of quantiles. This gives one and the same characteristic of the tail of the observed distribution for the whole distributional type, that is, for all increasing affine transformations.
In Section 2 we define probabilities for p-outside values and investigate their general properties. In Section 3 we calculate and plot their explicit forms for the most popular probability distributions. Section 4 investigates the asymptotic properties of the empirical left and right p-fences and of the estimators of the probabilities for left and right p-outside values. That section closes with a result on the strong consistency of the relative frequency estimators.
Different estimators of the exponent of regular variation are proposed in Hill (1975) [30], Pickands (1975) [47], Dekkers, Einmahl and de Haan (1989) [15], Einmahl and Guillou (2008) [19], Stehlik et al. (2010) [57] (the t-Hill estimator), Pancheva and Jordanova (2012) [44, 34] and Jordanova et al. (2016) [33], among others. The mean-of-order generalization of the t-Hill and Hill statistics is introduced by Beran et al. (2014) [6], Caeiro et al. (2016) [8] and Paulauskas and Vaiciulis (2017) [46]. Huisman et al. (2001) [31] take another approach: they recommend correcting the small-sample bias of Hill estimators with weighted averages of their values over different thresholds. In Section 5 we apply the previous results and demonstrate a completely new approach for estimating the parameter of the heaviness of the tail of the observed distribution. Four of the examples consider c.d.fs. with regularly varying right tails: the Pareto, Fréchet, log-logistic and Hill-horror cases. It is easy to see that, for small samples, only distribution-sensitive estimators can be useful for estimating their indices of regular variation. The main idea of that section is to show that our approach also works when the c.d.f. of the observed r.v. does not have a regularly varying right tail. We illustrate this with distribution (30), whose tail is not regularly varying. The paper ends with some concluding remarks. The proofs are given in the Appendix. All plots and computations are made with the R software (2018) [48].
In this work we do not use the second-order regular variation introduced in de Haan and Stadtmueller (1996) [14] for distributions with regularly varying tails, because it is applicable only to very large samples. A very comprehensive study of extreme value analysis under the second-order regular variation condition can be found, e.g., in de Haan and Ferreira (2006) [13]. Geluk and de Haan (1997) [24] obtain the corresponding properties of convolutions and the central limit theorem.
Throughout the paper we use the following notation: is for equality in distribution, means convergence in distribution, denotes almost sure convergence. means that the considered r.v. belongs to the corresponding probability type. is asymptotic equivalence. is for the beta function, and denotes the Beta distribution with parameters and . , means that a r.v. has the Pareto c.d.f.
[TABLE]
More general definitions of Pareto distributions can be found, e.g., in Arnold (2015) [3], together with very useful descriptions of their relations to the other most important distributions.
, means that a r.v. has the Fréchet c.d.f.
[TABLE]
is an abbreviation of the fact that the r.v. has a Negative Weibull c.d.f.
[TABLE]
We consider only absolutely continuous distributions. For , the theoretical quantile function of the c.d.f. is defined as
[TABLE]
Let be a sample of independent observations on a r.v. with c.d.f. . We denote the corresponding order statistics by . Different definitions of empirical quantiles can be found in Parzen (1979) [45], Hyndman et al. (1996) [32] and Langford (2006) [37], among others. We use the following one: , where means the integer part of . (As noted in Chu (1957) [10], these definitions are equivalent for large samples, because we consider only absolutely continuous distributions.)
2 Using probabilities for p-outside values for characterising the tails of the observed distribution
The idea of classifying distributions by means of quartiles and box plots comes from Tukey (1977) [58], and Devore (2015) [18] and Jordanova and Petkova (2017) [35] have recently revisited it. Here we generalize this concept by adding one more parameter to the definition of outside values. This parameter lets researchers decide to what extent atypical observations should be called "outside values".
Denote by
[TABLE]
and by
[TABLE]
respectively, the empirical p-right fence and the empirical p-left fence.
Their sum is equal to . The difference between these quantiles is very well known. It is called the empirical quantile spread (quasi-range), and it is considered, e.g., in Gumbel (1944) [26], Mosteller (1946) [39], and Balanda and MacGillivray (1990) [5]. The meaning of these values comes from the expression
[TABLE]
Analogously to Tukey's box plot (1977) [58], one can use an empirical box plot of order p. Its borders are determined by the values
[TABLE]
The most frequent choice is . This case is partially investigated in the supplementary material of Soza et al. (2019) [56].
Sample right or left p-outside values are the observations that fall outside the interval . Their absolute frequencies strongly depend on the sample size.
Definition 1
Assume . We call an observation a sample (empirical)
1. right p-outside value, if it is greater than the empirical p-right fence;
2. left p-outside value, if it is less than the empirical p-left fence.
For p = 3, the definition coincides with the one in Devore (2015) [18], who calls such observations "extreme right" and "extreme left" outliers.
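The empirical fences and the counts of sample outside values can be sketched in Python. Two things in this sketch are assumptions, because the corresponding formulas cannot be recovered from this text. First, the fences are taken to be quartile-based, as in Tukey's box plot: the p-right fence is Q3 + p(Q3 - Q1) and the p-left fence is Q1 - p(Q3 - Q1). With p = 3 this choice reproduces the Cauchy entry 0.0452 of Table 1. Second, the empirical u-quantile is taken to be the order statistic whose index is the integer part of nu. The helper names are hypothetical.

```python
import math
import random


def empirical_quantile(sorted_xs, u):
    """Empirical u-quantile: the order statistic X_(k) with k = [n*u] (at least 1).

    Assumption: one common convention; the paper's exact choice is not shown here.
    """
    n = len(sorted_xs)
    k = max(1, math.floor(n * u))
    return sorted_xs[k - 1]  # 1-based order statistic -> 0-based list index


def empirical_fences(xs, p):
    """(left fence, right fence) of order p, assuming quartile-based fences."""
    s = sorted(xs)
    q1 = empirical_quantile(s, 0.25)
    q3 = empirical_quantile(s, 0.75)
    spread = q3 - q1  # empirical interquartile range (quantile spread)
    return q1 - p * spread, q3 + p * spread


def count_outside(xs, p):
    """Numbers of left and right p-outside values in the sample xs."""
    left, right = empirical_fences(xs, p)
    n_left = sum(1 for x in xs if x < left)
    n_right = sum(1 for x in xs if x > right)
    return n_left, n_right


random.seed(1)
sample = [random.paretovariate(1.0) for _ in range(1000)]
print(count_outside(sample, 3.0))

# An increasing affine map a + b*x (b > 0) moves the fences together with the
# data, so the counts of outside values do not change.
transformed = [5.0 + 2.0 * x for x in sample]
print(count_outside(transformed, 3.0) == count_outside(sample, 3.0))
```

The counts grow with the sample size, which is why the theoretical probabilities, rather than the counts, are used to compare tails.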
Denote by , and the numbers of these outside values in a sample of independent observations. By Kolmogorov's zero-one law, however small the probability is, as long as it is strictly positive, such outside values are almost surely observed in a large enough sample from many light-tailed distributions, e.g. the Gaussian one. The number of outside values is therefore not very informative. To classify distributions with respect to their tail behaviour, we propose to compare their theoretical probabilities for an observation to be an outside value of the considered type. Denote by
[TABLE]
the probabilities for the observed r.v. to be a left or right p-outside value. Here, analogously to and , we denote by
[TABLE]
theoretical p-right fence and by
[TABLE]
theoretical p-left fence. Their properties are analogous to the properties of empirical p-right- and p-left-fences.
Let us note that for any absolutely continuous c.d.f.
[TABLE]
is the median and
It is not difficult to check that and are monotone. Therefore by monotonicity of probability measures, the characteristics and are also monotone.
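The theoretical probabilities can be computed from any quantile function and c.d.f. The Python sketch below assumes quartile-based fences (the paper's own definitions are not recoverable here). It also assumes an absolutely continuous c.d.f., so that the probability of landing exactly on a fence is zero. For the standard Cauchy law, i.e. the t-distribution with one degree of freedom, and p = 3 it gives approximately 0.0452 on both sides, which agrees with the first entry of Table 1. The function names are illustrative.

```python
import math


def theoretical_outside_probs(quantile, cdf, p):
    """Return (p_L, p_R): probabilities of a left/right p-outside value.

    Assumed fences (hypothetical reconstruction):
        L(p) = Q(1/4) - p * (Q(3/4) - Q(1/4))
        R(p) = Q(3/4) + p * (Q(3/4) - Q(1/4))
    For an absolutely continuous c.d.f. F: p_L = F(L(p)) and p_R = 1 - F(R(p)).
    """
    q1, q3 = quantile(0.25), quantile(0.75)
    spread = q3 - q1
    left, right = q1 - p * spread, q3 + p * spread
    return cdf(left), 1.0 - cdf(right)


def cauchy_quantile(u):
    """Quantile function of the standard Cauchy law."""
    return math.tan(math.pi * (u - 0.5))


def cauchy_cdf(x):
    """C.d.f. of the standard Cauchy law."""
    return 0.5 + math.atan(x) / math.pi


p_left, p_right = theoretical_outside_probs(cauchy_quantile, cauchy_cdf, 3.0)
print(round(p_left, 4), round(p_right, 4))  # 0.0452 0.0452
```

Because only quantiles and the c.d.f. enter, replacing Q by a + bQ and F(x) by F((x - a)/b) with b > 0 leaves both probabilities unchanged.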
Theorem 1
For a fixed if there exist , and , then,
a) is increasing in ;
b) is decreasing in ;
c) and are non-decreasing in .
For p = 3, and are, respectively, the probabilities for an observation to be a left or right extreme outlier. Jordanova and Petkova (2018) [36] and Soza et al. (2019) [56] denote these probabilities by and . For this case, these authors obtain them for the Pareto, Fréchet, , , and Hill-horror distributions. Below, we generalize these results.
Theorem 2
Assume , and and are strictly monotone, well-defined and continuous functions on the considered values. The characteristics , , have the following properties:
a) and .
b)
, , .
c)
If , then , .
d)
If , then , .
e)
If is continuous and strictly increasing
[TABLE]
and
[TABLE]
[TABLE]
f)
If is continuous and strictly decreasing
[TABLE]
and
[TABLE]
[TABLE]
g)
h)
i)
For all ,
[TABLE]
k)
Let , and . Denote the right-, left- and double-truncated r.vs. by , , , respectively. If , and , then their probabilities for p-outside values, , and , are related as follows:
[TABLE]
If , and
, then
[TABLE]
, then
[TABLE]
For any r.v. , such that , if
If , then
[TABLE]
If , then
[TABLE]
If
, then
[TABLE]
If , then
[TABLE]
Remark 1
Note that in Theorem 2, l), the expressions for and do not depend on the exact value of , but only on whether or . This means that, within this classification of absolutely continuous probability distributions with respect to the tails of their c.d.fs., if we transform the data by taking a logarithm, the exact base of the logarithm does not affect the probabilities for outside values of the transformed distribution. Only whether the base is bigger or less than 1 can influence and .
Corollary of f): Let be fixed.
1. If , then
2. If , then
The next corollary agrees with the well-known experience that taking the logarithm of the data with a base bigger than one decreases the chance to observe right p-outside values and increases the chance to observe left p-outside values. Together with Theorem 3, these corollaries show once again that the characteristics are appropriate for describing the tail of the observed distribution.
Corollary of e): Let be fixed. Suppose ,
if , then
[TABLE]
[TABLE]
If , then
[TABLE]
[TABLE]
According to these characteristics, raising the data to a power bigger than 1 increases the chance to observe right p-outside values and decreases the chance to observe left p-outside values in the observed distribution.
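The effect of strictly increasing transformations described above can be checked numerically. For a continuous, strictly increasing g, the quantiles of g(X) are g applied to the quantiles of X. The outside-value probabilities of g(X) can therefore be computed on the scale of X. The Python sketch below assumes quartile-based fences and uses a standard exponential X purely as an illustrative choice. With p = 3, squaring the data increases the right probability. Taking the natural logarithm decreases the right probability and increases the left one. All names are hypothetical.

```python
import math


def exp_quantile(u):
    """Quantile function of the standard exponential law."""
    return -math.log1p(-u)


def exp_cdf(x):
    """C.d.f. of the standard exponential law."""
    return -math.expm1(-x) if x > 0 else 0.0


def outside_probs_after_transform(g, g_inv, p):
    """(p_L, p_R) of Y = g(X) for X standard exponential and g strictly increasing.

    Quantiles of Y are g(Q_X(u)); events {Y < l} and {Y > r} are mapped back to
    {X < g_inv(l)} and {X > g_inv(r)}. Fences are assumed quartile-based.
    """
    q1, q3 = g(exp_quantile(0.25)), g(exp_quantile(0.75))
    spread = q3 - q1
    left, right = q1 - p * spread, q3 + p * spread
    return exp_cdf(g_inv(left)), 1.0 - exp_cdf(g_inv(right))


def identity(x):
    return x


def square(x):
    return x * x


def sqrt_or_zero(y):
    # Inverse of x -> x**2 on (0, inf); values y <= 0 lie below the support of Y.
    return math.sqrt(y) if y > 0 else 0.0


base = outside_probs_after_transform(identity, identity, 3.0)
squared = outside_probs_after_transform(square, sqrt_or_zero, 3.0)
logged = outside_probs_after_transform(math.log, math.exp, 3.0)
print(base, squared, logged)
```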
Theorem 3
For , and ,
a)
, then
[TABLE]
b)
If , and , then and
[TABLE]
c)
If , and , then and
[TABLE]
Applying these probabilities requires knowing their values for different distributions. We therefore compute some of their explicit forms in the next section.
3 The most important particular cases
To choose the most appropriate class for modeling the tails of the c.d.f. of the observed r.v., we can first calculate the probabilities for left and right p-outside values for as many distributional types as possible. We then compare these probabilities with the corresponding estimators. This approach is analogous to comparing means when we are interested in the center of the distribution. Let us now present the exact values of these characteristics for some of the most popular probability distributions used in practice for modeling heavy tails. Until the end of this section, we assume that .
Consider the cases when is , or Fréchet, Pareto, Stable, Weibull positive, , , log-Pareto, Hill-horror or Burr distributed. The dependence of on the parameter that characterises the tail of the corresponding distribution is depicted in Figure 2 and Figure 2. It can also be found in Jordanova and Petkova (2018) [36] and in the supplementary material of Soza et al. (2019) [56].
Exponential distribution. Let , and be Exponential with mean .
Since is a scale parameter of the exponential distribution, Theorem 2, c), allows us to assume without loss of generality (w.l.g.) that , and this does not change the values of and . In this case,
[TABLE]
where is the solution of the equation and .
[TABLE]
In particular, for , the empirical right fence is an asymptotically unbiased and efficient estimator of the theoretical right fence
[TABLE]
[TABLE]
where is the polygamma function (for the last limit, see Guo and Feng (2013) [28]), and . See also Jordanova and Petkova (2017, 2018) [35, 36].
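Assume quartile-based fences; this is an assumption, since the displayed formulas are missing here. Under it, the exponential case reduces to a closed form that does not depend on the rate λ. R(p) = (ln 4 + p ln 3)/λ gives P(X > R(p)) = 1/(4·3^p). The left fence is positive only when 3^(p+1) < 4, and in that case P(X < L(p)) = 1 - 3^(p+1)/4. The Python sketch checks this derivation against a direct computation at an arbitrary rate. The rate 2.5 and the function names are illustrative.

```python
import math


def exponential_outside_probs(p):
    """Closed-form (p_L, p_R) for an exponential law under quartile-based fences.

    Q(1/4) = ln(4/3)/lam and Q(3/4) = ln(4)/lam, so the rate lam cancels:
    p_R = exp(-(ln 4 + p ln 3)) = 1 / (4 * 3**p),
    p_L = 1 - 3**(p + 1) / 4 when that is positive, and 0 otherwise.
    """
    p_right = 1.0 / (4.0 * 3.0 ** p)
    p_left = max(0.0, 1.0 - 3.0 ** (p + 1) / 4.0)
    return p_left, p_right


def exponential_outside_probs_numeric(p, lam=2.5):
    """The same probabilities computed directly from Q and F at rate lam."""
    def quantile(u):
        return -math.log1p(-u) / lam

    def cdf(x):
        return -math.expm1(-lam * x) if x > 0 else 0.0

    q1, q3 = quantile(0.25), quantile(0.75)
    left, right = q1 - p * (q3 - q1), q3 + p * (q3 - q1)
    return cdf(left), 1.0 - cdf(right)


print(exponential_outside_probs(3.0))  # the right probability equals 1/108
```

For p = 3 the left probability is zero, so an exponential sample has no left extreme outliers, while right ones occur with probability 1/108.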
For the rest of this section, by properties b) and c) of Theorem 2, we assume w.l.g. that and .
Generalized Pareto distribution (GPD). Consider and .
[TABLE]
We have already considered the case . In that case, as is well known, the GPD coincides with the exponential distribution. So here we assume that . Then the quantile function is . Substituting it into the formula for , and then into the definition of , we obtain
[TABLE]
To substitute (22) into the last probability, we need to consider the following two cases separately:
Case . In this case . The last expression is equal to , therefore .
Case . In this case , therefore
[TABLE]
Analogously, substituting the quantile function into the definition of , and then into , we obtain
[TABLE]
To calculate this expression, we need to determine the sign of . We therefore again consider two cases.
Case . Because , we have , and
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Therefore
[TABLE]
In case , and this expression coincides with the one in Jordanova and Petkova (2018) [36]
[TABLE]
Case . Because , we have that , and
In case
[TABLE]
[TABLE]
[TABLE]
[TABLE]
we have that .
If ,
[TABLE]
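The same quartile-based construction can be evaluated numerically for the GPD; the construction is again an assumption about the lost formulas. The sketch uses the standard parametrization F(x) = 1 - (1 + ξx)^(-1/ξ) on its support. It writes the formulas with log1p and expm1 so that small |ξ| stays accurate, and it returns zero when a fence falls beyond the upper end point -1/ξ of a GPD with ξ < 0. For ξ = 1 and p = 3 the right fence is 11 and the probability is 1/12. As ξ tends to 0 the value approaches the exponential value 1/108. Function names are hypothetical.

```python
import math


def gpd_quantile(u, xi):
    """Quantile of the standard GPD with shape xi (xi = 0 is the exponential law)."""
    if xi == 0.0:
        return -math.log1p(-u)
    return math.expm1(-xi * math.log1p(-u)) / xi


def gpd_cdf(x, xi):
    """C.d.f. F(x) = 1 - (1 + xi*x)**(-1/xi) of the standard GPD."""
    if x <= 0.0:
        return 0.0
    if xi == 0.0:
        return -math.expm1(-x)
    if 1.0 + xi * x <= 0.0:  # beyond the upper end point -1/xi (only for xi < 0)
        return 1.0
    return -math.expm1(-math.log1p(xi * x) / xi)


def gpd_survival(x, xi):
    """1 - F(x), computed directly to avoid cancellation in the far tail."""
    if x <= 0.0:
        return 1.0
    if xi == 0.0:
        return math.exp(-x)
    if 1.0 + xi * x <= 0.0:
        return 0.0
    return math.exp(-math.log1p(xi * x) / xi)


def gpd_outside_probs(xi, p):
    """(p_L, p_R) for the GPD under assumed quartile-based fences."""
    q1, q3 = gpd_quantile(0.25, xi), gpd_quantile(0.75, xi)
    spread = q3 - q1
    left, right = q1 - p * spread, q3 + p * spread
    return gpd_cdf(left, xi), gpd_survival(right, xi)


for xi in (-0.5, 1e-8, 0.5, 1.0):
    print(xi, gpd_outside_probs(xi, 3.0))
```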
When speaking about heavy tails, we cannot ignore the extreme value distributions (up to affine transformations), so we consider them in the next three items. In the first half of the last century, Fisher and Tippett (1928) [23], Gnedenko (1943) [25] and Gumbel (1958) [27] showed that they appear as limiting distributions of maxima of i.i.d. r.vs. after appropriate affine transformations.
Fréchet distribution. Let . W.l.g. we assume that in (2) .
Therefore for , .
[TABLE]
and because of
[TABLE]
[TABLE]
when , and , for .
Analogously
[TABLE]
Now we need to consider the expression after the inequality. Since for all the following four expressions are equivalent
[TABLE]
[TABLE]
[TABLE]
[TABLE]
we have
[TABLE]
Figure 2 represents the dependence of on in case . The explicit formula for this case can be seen also in Jordanova and Petkova (2018) [36].
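The Fréchet law has F(x) = exp(-(x/σ)^(-α)) and quantile function Q(u) = σ(-ln u)^(-1/α). For it, the quartile-based probabilities can be evaluated directly; the quartile-based fences are an assumed reconstruction of the missing formulas. The sketch checks that the scale σ drops out. For α = 1 and p = 3 it gives a right probability of about 0.0817. This is close to the value 1/12 that the same construction gives for the Pareto law with survival function x^(-1), x ≥ 1. The closeness is in line with the remark below that these right tails are very similar. Function names are illustrative.

```python
import math


def frechet_outside_probs(alpha, p, scale=1.0):
    """(p_L, p_R) for Frechet(alpha, scale) under assumed quartile-based fences."""
    def quantile(u):
        return scale * (-math.log(u)) ** (-1.0 / alpha)

    def cdf(x):
        return math.exp(-((x / scale) ** -alpha)) if x > 0 else 0.0

    q1, q3 = quantile(0.25), quantile(0.75)
    spread = q3 - q1
    left, right = q1 - p * spread, q3 + p * spread
    # 1 - exp(-z), written with expm1 for accuracy in the right tail
    p_right = -math.expm1(-((right / scale) ** -alpha))
    return cdf(left), p_right


def pareto_right_prob(alpha, p):
    """p_R for the Pareto law with survival x**(-alpha), x >= 1, for comparison."""
    q1, q3 = 0.75 ** (-1.0 / alpha), 0.25 ** (-1.0 / alpha)
    right = q3 + p * (q3 - q1)
    return right ** -alpha


print(frechet_outside_probs(1.0, 3.0), pareto_right_prob(1.0, 3.0))
```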
Weibull negative distribution. Consider . W.l.g. . See (3). Therefore for ,
Note that its positive version coincides in distribution with a standard exponentially distributed r.v. raised to the power . Therefore, for , according to our classification, this distribution has a heavier right tail than the exponential one, and for the opposite holds. This can also be seen in Figure 2, where the standard exponential distribution is depicted as . More precisely
[TABLE]
and because of for all we have , therefore
[TABLE]
Figure 2 depicts the dependence of (where is Weibull positive) on .
Analogously
[TABLE]
Since for all
[TABLE]
[TABLE]
[TABLE]
[TABLE]
we have
[TABLE]
Gumbel distribution. Let , and
[TABLE]
W.l.g. and . ,
[TABLE]
[TABLE]
Jordanova and Petkova (2018) [36] have calculated that and .
Logistic distribution. Assume , and
[TABLE]
W.l.g. and . ,
[TABLE]
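Under quartile-based fences, which are assumed as before, the logistic law has a simple closed form. For the standard law Q(3/4) = ln 3 and Q(1/4) = -ln 3, so the right fence is (2p + 1) ln 3. By symmetry, the left and right probabilities both equal 1/(1 + 3^(2p+1)), which is 1/2188 for p = 3. The sketch checks the closed form against a direct computation with an arbitrary location and scale. The values 1.0 and 0.7 and the function names are illustrative.

```python
import math


def logistic_outside_prob_closed(p):
    """Closed form of p_L = p_R for the logistic law under quartile-based fences."""
    return 1.0 / (1.0 + 3.0 ** (2 * p + 1))


def logistic_outside_probs_numeric(p, mu=1.0, s=0.7):
    """(p_L, p_R) from the logistic quantile function and c.d.f. directly."""
    def quantile(u):
        return mu + s * math.log(u / (1.0 - u))

    def cdf(x):
        return 1.0 / (1.0 + math.exp(-(x - mu) / s))

    q1, q3 = quantile(0.25), quantile(0.75)
    spread = q3 - q1
    left, right = q1 - p * spread, q3 + p * spread
    survival = 1.0 / (1.0 + math.exp((right - mu) / s))  # 1 - F(right)
    return cdf(left), survival


print(logistic_outside_prob_closed(3.0))  # 1/2188
print(logistic_outside_probs_numeric(3.0))
```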
Log-logistic distribution. Assume , and
[TABLE]
W.l.g. and . Since , and for
[TABLE]
therefore by definition
[TABLE]
The plot of this function of for can be seen in Figure 4. For a fixed , its right tail is very similar to the corresponding tails of the Pareto and Fréchet distributions and heavier than that of the Stable one. Analogously
[TABLE]
therefore for , and
[TABLE]
and otherwise.
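Take the log-logistic law with c.d.f. F(x) = 1/(1 + (x/a)^(-β)), x > 0, where a is used here as an illustrative name for the scale. Its quartiles are a·3^(1/β) and a·3^(-1/β). With quartile-based fences, which are assumed, the scale cancels. The right probability is 1/(1 + r^β) with r = (1 + p)3^(1/β) - p·3^(-1/β), and the left fence is positive only when (1 + p)3^(-1/β) > p·3^(1/β). For β = 2 and p = 3 this gives r = 3√3 and a right probability of 1/28, and the left probability is zero. For a large β such as 10, the left probability becomes positive. The sketch below implements this; the function name is hypothetical.

```python
def log_logistic_outside_probs(beta, p):
    """(p_L, p_R) for the log-logistic law with shape beta (the scale cancels).

    Assumes quartile-based fences; Q(u) = (u / (1 - u)) ** (1 / beta) for unit scale.
    """
    q3 = 3.0 ** (1.0 / beta)   # Q(3/4)
    q1 = 3.0 ** (-1.0 / beta)  # Q(1/4)
    right = q3 + p * (q3 - q1)
    left = q1 - p * (q3 - q1)
    p_right = 1.0 / (1.0 + right ** beta)
    # The support is (0, inf), so a non-positive left fence gives probability 0.
    p_left = 1.0 / (1.0 + left ** -beta) if left > 0 else 0.0
    return p_left, p_right


print(log_logistic_outside_probs(2.0, 3.0))   # right probability 1/28
print(log_logistic_outside_probs(10.0, 3.0))  # the left fence is positive here
```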
In the next two cases we assume that and . W.l.g. , because it is a location parameter, and and because they are scale parameters. See Burr (1942) [7] or Einmahl et al. (2008) [19].
Burr distribution. Let
[TABLE]
For , the quantile function is
[TABLE]
(see also Nair et al. (2013) [40]). Therefore in our context from (27) we have that if
[TABLE]
then
[TABLE]
and otherwise.
For and the inequality
[TABLE]
the definition of , and (27) entail
[TABLE]
The dependence of on and for different values of is depicted in Figure 2. When increases, the chance to observe p-outside values in the considered distribution decreases. This means that for the Burr distribution, not only but also influences the tail behaviour.
Reverse Burr distribution.
[TABLE]
For , the corresponding quantile function is
[TABLE]
We are interested in the case when . It guarantees that
[TABLE]
Therefore from (28) we have
[TABLE]
The dependence of on and for different values of is depicted in Figure 4. For , the tail behaviour of the Reverse Burr distribution is much more sensitive to than to .
If
[TABLE]
and otherwise.
Gompertz distribution. Assume , , , and
[TABLE]
For , and ,
For we have that is equal to
[TABLE]
and otherwise. In particular .
Since for all , from (29) and the definition of we have that it is equal to
[TABLE]
Note that for all fixed , this distribution has the smallest value of among the distributions with considered in this study.
The next two distributions that we consider belong to the class of so-called p-max-stable laws. They can be generalized by a strictly increasing affine transformation without changing the characteristics and . Using power normalizations, Pancheva (1985) [43] obtained them as limiting laws of power-transformed maxima. Falk et al. (2004) [22] describe the domains of attraction of these laws under power normalization. We have already seen in Corollary 1 of e), Theorem 2, that these transformations increase the values of . Ravi and Saeb (2012) [49] have obtained their entropies.
type. (See Pancheva (1985) [43]). Let , and , i.e.
[TABLE]
The quantile function of this distribution is . (It can be seen, e.g., in the supplementary material of Soza et al. (2019) [56], who consider the case .) Therefore
[TABLE]
and for , is equal to
[TABLE]
otherwise, . This means that for large , left outside values can be observed as well.
Now let us consider the right tail. Since for all
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
[TABLE]
Figure 2 shows their dependence on in case . This case is considered in the supplementary material of Soza et al. (2019) [56].
type. Let ,
[TABLE]
It is one of the limiting distributions of power-transformed maxima obtained in Pancheva (1985) [43]. Ravi and Saeb (2012) [49] calculate its Shannon entropy. type is also known as the log-Weibull law, and it is one of the p-max-stable laws. The quantile function has the form . Therefore
[TABLE]
And for we have that is equal to
[TABLE]
and otherwise.
For the right tail, since for all
[TABLE]
[TABLE]
[TABLE]
[TABLE]
The last expression is equal to , therefore is equal to
[TABLE]
As in the previous case, for large values of both left and right outside values can be observed. The dependence of on is depicted in Figure 2. We call the distribution of " positive", and denote it by .
Note that the function is decreasing in and therefore for
[TABLE]
See the supplementary material of Soza et al. [56].
type. The log-Pareto law with parameter seems to have been introduced by Cormann and Reiss (2009) [11]. More precisely, here we assume that
[TABLE]
This distribution belongs to the class considered, e.g., in de Haan and Ferreira (2006) [13] or Embrechts et al. (1997) [20]. Ravi and Saeb (2012) [49] investigate their entropies. By Corollary 1 of Theorem 2, its tail is heavier than the tail of the Fréchet distribution. The quantile function of this distribution is . Therefore for
[TABLE]
otherwise .
For the right tail, since for all
[TABLE]
[TABLE]
[TABLE]
therefore from the definition of we obtain
[TABLE]
The dependence of on can be seen in Figure 2. Its tail behaviour almost coincides with that of the log-Fréchet law, i.e. . For fixed , according to , these last two distributions have the highest probabilities of extreme outside values among the considered distributions. The Fréchet and Pareto distributions are already heavy-tailed without transformation, so Cl. Neves et al. (2008) [41], Cormann and Reiss (2009) [11] and Falk et al. (2004) [22] call these transformed laws "super heavy-tailed".
In the rest of this section, we consider distributions whose quantile functions have no explicit form. We therefore use the R software (2018) [48] to obtain and , and then substitute them into the well-known formulas for the c.d.f. to obtain the characteristics.
Normal distribution. Assume , and .
W.l.g. and . Due to the symmetry of this distribution with respect to (w.r.t.) Oy, for all we have that . In particular .
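Python's standard library provides the normal quantile function and c.d.f. through statistics.NormalDist, so the normal case can be sketched without extra packages. Under the assumed quartile-based fences and p = 3, the right fence of the standard normal law is about 4.72. Both probabilities are about 1.2·10^(-6), and they do not change with the mean and standard deviation. The function name is illustrative.

```python
from statistics import NormalDist


def normal_outside_probs(p, mu=0.0, sigma=1.0):
    """(p_L, p_R) for N(mu, sigma^2) under assumed quartile-based fences."""
    dist = NormalDist(mu, sigma)
    q1, q3 = dist.inv_cdf(0.25), dist.inv_cdf(0.75)
    spread = q3 - q1
    left, right = q1 - p * spread, q3 + p * spread
    return dist.cdf(left), 1.0 - dist.cdf(right)


print(normal_outside_probs(3.0))
print(normal_outside_probs(3.0, mu=10.0, sigma=3.0))
```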
t-distribution. Assume and .
The symmetry of the p.d.fs. of these distributions w.r.t. implies , for all . The values of these characteristics for are presented in Table 1. They can also be found in Jordanova and Petkova [36] and in the supplementary material of Soza et al. [56].
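Student's t-distribution with two degrees of freedom has an explicit quantile function, Q(u) = (2u - 1)/sqrt(2u(1 - u)), and c.d.f. F(x) = 1/2 + x/(2 sqrt(x^2 + 2)). Under the assumed quartile-based fences with p = 3, the right fence equals 7Q(3/4) = 7 sqrt(2/3). This gives the probability 1/2 - 7/(4 sqrt(13)), approximately 0.0146, which agrees with the entry for n = 2 in Table 1. The function names are hypothetical.

```python
import math


def t2_quantile(u):
    """Quantile function of Student's t with 2 degrees of freedom."""
    return (2.0 * u - 1.0) / math.sqrt(2.0 * u * (1.0 - u))


def t2_survival(x):
    """1 - F(x) for Student's t with 2 degrees of freedom."""
    return 0.5 - x / (2.0 * math.sqrt(x * x + 2.0))


def t2_right_outside_prob(p):
    """p_R (equal to p_L by symmetry) under assumed quartile-based fences."""
    q1, q3 = t2_quantile(0.25), t2_quantile(0.75)
    right = q3 + p * (q3 - q1)
    return t2_survival(right)


print(round(t2_right_outside_prob(3.0), 4))  # 0.0146
```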
Gamma distribution. Assume , and which means that
[TABLE]
W.l.g. we assume that . The dependence of on is depicted in Figure 2.
Hill-horror distribution. For Embrechts et al. (1997) [20] define it via its quantile function
[TABLE]
Then . For the values of see Figure 2. We observe that this distribution has one of the heaviest right tails within the considered probability types.
Following this approach, we can find explicit values or plots of the and characteristics for many other distributions and in this way compare their tails. Examples are the log-positive-Weibull, log-Gumbel, inverse-gamma, log-gamma and beta prime distributions, or powers bigger than one of these and other distributions.
4 Properties of the estimators
In the previous section, we considered some particular probability laws and showed which parameter governs the heaviness of the tail of the corresponding distribution according to our classification. For distributions with regularly varying tails, this parameter coincides with the very well-known index of regular variation. In this section, we obtain different asymptotic properties of the estimators of the corresponding parameters of tail heaviness. The general formula for the joint distribution of order statistics is very well known. It can be found, together with the formula for their conditional distributions, e.g., in Nevzorov (2001) [42] or in Arnold et al. (1992) [2]. The following lemma is an immediate corollary of these formulas. Its first part summarises the same results in terms of equality in distribution. In vi) we express the bivariate vector of order statistics as a bivariate function of independent r.vs. This allows us to do the same with the fences in the next property. Finally, we present two explicit formulae for the probability mass functions of the numbers of left and right p-outside values in a sample of independent observations. Because these formulae are complicated, the rest of the section focuses on asymptotic results.
Lemma 1
If , , , is some c.d.f. of a r.v. and is a constant, then
i) where is the -th order statistic in a sample of independent observations on i.i.d. exponential r.vs. with parameter .
ii) , where is the -th order statistic in a sample of independent observations on i.i.d. Pareto distributed r.vs. with parameters and .
iii) , where is the -th order statistic in a sample of independent observations on i.i.d. Fréchet distributed r.vs. with parameter and scale parameter .
iv) and , where is the -th order statistic of a sample of independent observations on a r.v. with absolutely continuous c.d.f. , .
v) For
[TABLE]
[TABLE]
where is the -th order statistic in a sample of independent observations on i.i.d. r.vs. with c.d.f. , , is the -th order statistic in a sample of independent observations on i.i.d. r.vs. with c.d.f. , , , and .
vi) Assume and are independent, and . Denote by , and . (Then and .) Then for , , ,
(a)
[TABLE]
Moreover .
(b) The empirical p-right fences
[TABLE]
(c) The empirical p-left fences
[TABLE]
vii) For ,
[TABLE]
[TABLE]
Remark 2
As an additional result, we can use the above lemma to obtain different univariate and bivariate distributions and new relations between them. This approach is analogous to the one applied by Eugene et al. (2002) [21] and Cordeiro et al. (2012) [1], among others, who consider generalized beta-generated distributions.
Remark 3
Using the general formula for the moments of order statistics, for , and
[TABLE]
which can be found, e.g., in the books of Arnold et al. (1992) [2] or Nevzorov (2001) [42], we can easily obtain the general formulae for the mean and the variance of and whenever they exist. For example, in the Pareto(1) case, for , , and , are asymptotically unbiased estimators of and , respectively. More precisely
[TABLE]
[TABLE]
The following result is an immediate corollary of four facts: the definition of convergence in probability, the quantile transform, the a.s. convergence of empirical quantiles to the corresponding theoretical ones, and Slutsky's theorem about continuous functions. See, e.g., Embrechts et al. (1997) [20].
Theorem 4
Given a sample of independent observations, for any fixed
[TABLE]
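Theorem 4 concerns the convergence of empirical probabilities of $p$-outside values. The sketch below assumes Tukey-type fences $Q_1 - p(Q_3 - Q_1)$ and $Q_3 + p(Q_3 - Q_1)$, with sample quartiles obtained by linear interpolation between order statistics (R's default "type 7" rule); the paper's empirical fences may use specific order statistics instead, and all names are hypothetical. For the Exponential law with any rate $\lambda$, $Q_1 = \ln(4/3)/\lambda$ and $Q_3 = \ln 4/\lambda$, so $P(X > Q_3 + p(Q_3 - Q_1)) = e^{-\ln 4 - p\ln 3} = 3^{-p}/4$, which gives a target for the empirical share and does not depend on the scale.

```python
import math
import random

def quantile7(sorted_x, u):
    """Sample quantile by linear interpolation between order statistics
    (the 'type 7' rule); sorted_x must be sorted in increasing order."""
    h = (len(sorted_x) - 1) * u
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])

def empirical_outside_probs(sample, p):
    """Return (left fence, right fence, share below left fence, share above right fence),
    assuming Tukey-type fences Q1 - p*IQR and Q3 + p*IQR."""
    xs = sorted(sample)
    q1, q3 = quantile7(xs, 0.25), quantile7(xs, 0.75)
    iqr = q3 - q1
    left, right = q1 - p * iqr, q3 + p * iqr
    n = len(xs)
    p_left = sum(1 for x in xs if x < left) / n
    p_right = sum(1 for x in xs if x > right) / n
    return left, right, p_left, p_right

def exp_right_prob(p):
    """Exact P(X > Q3 + p*IQR) for an Exponential r.v.; equals 3**(-p)/4 for every rate."""
    return 0.25 * 3.0 ** (-p)
```

Because sample quartiles of this type are equivariant under increasing affine maps, applying `empirical_outside_probs` to `[a * x + b for x in sample]` with `a > 0` gives the same shares up to floating-point rounding.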
Cadwell (1953) [9] finds the distribution of quasi-ranges in samples from a normal population; his work motivates the next result. Rider (1959) [50] obtains their exact distribution for samples from an exponential population, and Sarhan et al. (1963) [53] propose simplified estimates in this case. The asymptotic normality of the appropriately normalized univariate distributions of the central order statistics is investigated in Smirnov (1949) [54]. Note that in the next theorem, because of the special choice of the indices of the order statistics, and his conditions and are satisfied. Moreover, in our case . The theorem about the joint distribution of the central order statistics of i.i.d. observations can be found e.g. in Nair et al. (2013) [40], p. 330, or Arnold et al. (1992) [2], p. 226, among others. The multivariate delta method is a powerful technique for obtaining confidence intervals in such cases; see e.g. Sobel (1982) [55]. In the next theorem we use these results to obtain the limiting distribution of the empirical fences built from central order statistics.
Theorem 5
Consider a sample of , observations on a r.v. with c.d.f. and p.d.f. . Suppose that there exist and . Then
[TABLE]
[TABLE]
and for
[TABLE]
[TABLE]
where and
These results allow us to compute asymptotic confidence intervals for these estimators for fixed , , and . Denote and . If the conditions of Theorem 5 are satisfied, then, given ,
[TABLE]
[TABLE]
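Theorem 5 and the confidence intervals above rest on the joint asymptotic normality of central order statistics and the multivariate delta method. As a sketch, assume the fences are built from the sample quartiles, i.e. $\hat R = (1+p)\hat Q_3 - p\hat Q_1$ and $\hat L = (1+p)\hat Q_1 - p\hat Q_3$, and that the density $f$ is positive and continuous at the quartiles. Then the asymptotic covariance of $\sqrt n(\hat Q_u, \hat Q_v)$, $u \le v$, is $u(1-v)/(f(Q_u)f(Q_v))$, and the delta method yields the variance of $\sqrt n(\hat R - R)$ and $\sqrt n(\hat L - L)$. The function names and the use of a known density are illustrative; this is not a transcription of the formulas of Theorem 5.

```python
import math

def fence_asym_var(p, q1, q3, density, side="right"):
    """Asymptotic variance of sqrt(n)*(fence_hat - fence) for Tukey-type fences
    built from the sample quartiles (multivariate delta method)."""
    f1, f3 = density(q1), density(q3)
    # Asymptotic covariances of sqrt(n)*(Q1_hat, Q3_hat): u(1-v) / (f(Q_u) f(Q_v)), u <= v
    s11 = 0.25 * 0.75 / f1 ** 2
    s33 = 0.75 * 0.25 / f3 ** 2
    s13 = 0.25 * 0.25 / (f1 * f3)
    # Gradient of the fence with respect to (Q1, Q3)
    g1, g3 = (-p, 1.0 + p) if side == "right" else (1.0 + p, -p)
    return g1 * g1 * s11 + 2.0 * g1 * g3 * s13 + g3 * g3 * s33

def fence_ci(fence_hat, asym_var, n, z=1.959964):
    """Asymptotic normal confidence interval (95% for the default z)."""
    half = z * math.sqrt(asym_var / n)
    return fence_hat - half, fence_hat + half
```

In practice $Q_1$, $Q_3$ and $f$ are unknown; a plug-in version would replace them by estimates, e.g. from a fitted parametric model, which adds its own error to the interval.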
The next theorem explains why different probabilities for outside values can be useful for estimating the tail behaviour of the observed distribution. For fixed and , we apply the approach of Dembinska (2012) [17] for the bivariate case and obtain that and are strongly consistent estimators of and , respectively. Moreover, Dembinska (2017) [16] shows that this approach works not only for i.i.d. observations, but also for strictly stationary and ergodic sequences.
Theorem 6
Let be fixed. Assume and .
1. If , then
[TABLE]
2. If , then
[TABLE]
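The following sketch illustrates, under stated assumptions, why probabilities of outside values for several $p$ carry tail information. If $1-F$ is regularly varying with index $-\alpha$, then $R(2p)/R(p) \to 2$ as $p \to \infty$ for the right fence $R(p) = Q_3 + p(Q_3 - Q_1)$, hence $P(X > R(2p))/P(X > R(p)) \to 2^{-\alpha}$. This is a heuristic consequence of the definition of regular variation, not a restatement of Theorem 6. The code also shows that Pareto and Fréchet laws with the same $\alpha$ have different right outside probabilities at a moderate $p$. The parametrizations $F(x) = 1 - x^{-\alpha}$, $x \ge 1$, and $F(x) = e^{-x^{-\alpha}}$, $x > 0$, and all function names are assumptions of the sketch; a scale parameter would not change these probabilities.

```python
import math

def right_outside_prob(quantile, survival, p):
    """P(X > Q3 + p*(Q3 - Q1)) computed from the quantile and survival functions of X."""
    q1, q3 = quantile(0.25), quantile(0.75)
    return survival(q3 + p * (q3 - q1))

def pareto(alpha):
    """Assumed parametrization F(x) = 1 - x**(-alpha), x >= 1: (quantile, survival)."""
    return (lambda u: (1.0 - u) ** (-1.0 / alpha),
            lambda x: x ** (-alpha))

def frechet(alpha):
    """Assumed parametrization F(x) = exp(-x**(-alpha)), x > 0: (quantile, survival)."""
    return (lambda u: (-math.log(u)) ** (-1.0 / alpha),
            lambda x: -math.expm1(-(x ** (-alpha))))

def ratio_index(dist, p):
    """-log2 of P_R(2p) / P_R(p); tends to alpha as p grows when 1 - F is
    regularly varying with index -alpha, because R(2p) / R(p) -> 2."""
    quantile, survival = dist
    return -math.log2(right_outside_prob(quantile, survival, 2 * p)
                      / right_outside_prob(quantile, survival, p))
```

With these parametrizations, at $p = 1.5$ the Pareto and Fréchet laws with $\alpha = 2$ give right outside probabilities of roughly 0.094 and 0.083, although their tails are regularly varying with the same index.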
5 Simulation study
In this section we assume that are independent realizations of , with c.d.f. . Jordanova and Petkova (2017-2018) [35, 36] assume that has a regularly varying tail. More precisely, they consider only the cases when there exists , such that for all ,
[TABLE]
The number is called the "index of regular variation of the tail of the c.d.f."; see e.g. de Haan and Ferreira (2006) [13] or Resnick (1987) [51]. Using the explicit form of the probabilities of extreme outliers in the sense of Devore (2015) [18] (i.e., -outliers), Jordanova and Petkova (2017-2018) [35, 36] obtain distribution-sensitive estimators of the unknown parameter that governs the tail of the considered distributional type. The algorithm consists of the following three main steps.
1. Using the results of the previous two sections, the practitioner chooses the most appropriate probability type (let us call it ) for modeling the tail of the distribution of the observed r.v.
2. Using the formula for in case , one expresses the unknown parameter .
3. Replace the theoretical characteristics in the previous step with the corresponding estimators to obtain a new estimator of the parameter that governs the tail behavior.
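The three steps above can be sketched numerically when the probability in step 2 has no convenient inverse. Assuming the Pareto parametrization $F(x) = 1 - (x/\delta)^{-\alpha}$, $x \ge \delta$, the right probability $P(X > Q_3 + p(Q_3 - Q_1))$ does not depend on $\delta$ and decreases in $\alpha$ (from $1/4$ as $\alpha \to 0$ to $3^{-p}/4$ as $\alpha \to \infty$), so the equation "theoretical probability = empirical share" can be solved by bisection. This generic plug-in solver is our illustration, not the closed-form estimator of Jordanova and Petkova (2018) [36]; the function names and the search bracket are hypothetical.

```python
def pareto_right_prob(alpha, p):
    """P(X > Q3 + p*IQR) for F(x) = 1 - (x/delta)**(-alpha); delta cancels, so take delta = 1."""
    q1 = (4.0 / 3.0) ** (1.0 / alpha)
    q3 = 4.0 ** (1.0 / alpha)
    return (q3 + p * (q3 - q1)) ** (-alpha)

def invert_decreasing(func, target, lo, hi, tol=1e-10):
    """Bisection for func(theta) = target, assuming func is continuous and decreasing on [lo, hi]."""
    if not (func(hi) <= target <= func(lo)):
        raise ValueError("target outside the range of func on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) > target:
            lo = mid  # probability still too large: the root has a larger alpha
        else:
            hi = mid
    return 0.5 * (lo + hi)

def plug_in_alpha(empirical_right_share, p, lo=0.05, hi=50.0):
    """Steps 2 and 3: solve pareto_right_prob(alpha, p) = empirical share for alpha."""
    return invert_decreasing(lambda a: pareto_right_prob(a, p), empirical_right_share, lo, hi)
```

Here `empirical_right_share` is the observed proportion of right $p$-outside values; if it lies outside the probabilities attainable on the bracket, a `ValueError` is raised instead of returning a spurious estimate.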
In their work, Jordanova and Petkova (2017-2018) [35, 36] compare the estimators obtained in this way in the Pareto, Fréchet, and Hill-horror cases with the Hill, t-Hill, Pickands, and Dekkers-Einmahl-de Haan estimators and illustrate the results via a simulation study. Here we consider two more cases: the Log-logistic case (25) and the case (30). Although in the latter case the right tail of the c.d.f. is not regularly varying, the study below shows that the approach still gives very good results.
In each of the following five examples, using functions implemented in R (2018) [48], we simulated samples of independent observations separately on . Then, for each fixed and each fixed sample, we computed the estimators , , and . Here is the number of right extreme outside values in the considered sample of independent observations, and stands for one of the abbreviations , , , or explained below. Finally, we fixed one of the last estimators and averaged the corresponding values of over the considered . Figures 6-14 depict the dependence of these values, together with the corresponding asymptotic normal 95% confidence intervals, on the true type of the observed r.v. and on the sample size, for , or . We have chosen only the cases when is small because our observations show that, for a fixed sample size, the more outside values there are, the heavier the tail of the c.d.f. is and the better the corresponding estimator performs.
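The simulation design described above can be sketched as follows, assuming Tukey-type fences with quartiles from `statistics.quantiles(..., method="inclusive")`, i.e. linear interpolation between order statistics. For each replication it counts the right $p$-outside values, converts the count to a share, and reports the average over replications together with a normal-approximation 95% confidence interval. The original study was carried out in R; this Python version, its function names and its defaults are illustrative only.

```python
import math
import random
import statistics

def right_outside_count(sample, p):
    """Number of observations above Q3 + p*(Q3 - Q1), with sample quartiles obtained by
    linear interpolation between order statistics (method='inclusive')."""
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    fence = q3 + p * (q3 - q1)
    return sum(1 for x in sample if x > fence)

def monte_carlo_share(draw, n, p, reps, seed=0):
    """Average share of right p-outside values over `reps` simulated samples of size n,
    with a normal-approximation 95% confidence interval for that average.
    `draw(rng)` must return one observation using the random.Random instance rng."""
    rng = random.Random(seed)
    shares = []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        shares.append(right_outside_count(sample, p) / n)
    mean = statistics.fmean(shares)
    half = 1.959964 * statistics.stdev(shares) / math.sqrt(reps)
    return mean, (mean - half, mean + half)
```

For instance, `monte_carlo_share(lambda r: r.paretovariate(2.0), 50, 3.0, 1000)` averages the share of right 3-outside values over 1000 Pareto samples of size 50 (`random.Random.paretovariate` has minimum 1).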
Let us illustrate this approach with some examples. In each of them we suppose that and .
Example 1
Assume ; see (1). Given a sample of independent observations on , analogously to the generalized method of moments and following the above algorithm, Jordanova and Petkova (2018) [36] obtain
[TABLE]
Example 2
If (see (2)), Jordanova and Petkova (2018) [36] propose
[TABLE]
Example 3
Let be Hill-horror distributed. This distribution is usually defined via its quantile function (32). Given , Jordanova and Petkova (2018) [36] use
[TABLE]
The next two estimators seem to be new. They show that this approach can be applied well beyond the regularly varying case.
Example 4
Let (see (30)). Since it is difficult to solve (31) with respect to , we solve the equation
[TABLE]
Expressing and replacing the theoretical characteristics with the corresponding empirical ones, we obtain the estimator
[TABLE]
Example 5
Suppose follows the Log-logistic probability law (25). Equation (26) has no explicit solution for ; therefore, we solve the equation
[TABLE]
Replacing the theoretical characteristics with the corresponding empirical ones, we obtain the estimator
[TABLE]
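For the Log-logistic case, a numerical alternative to solving an approximating equation is to invert the exact right probability. Assuming the parametrization $F(x) = 1/(1 + (x/a)^{-\beta})$, $x > 0$ (whether (25) uses the same one is an assumption), we have $Q_1 = a3^{-1/\beta}$, $Q_3 = a3^{1/\beta}$ and $P(X > Q_3 + p(Q_3 - Q_1)) = 1/\left(1 + 3\,(1 + p(1 - 3^{-2/\beta}))^{\beta}\right)$, which does not depend on $a$ and decreases in $\beta$. The sketch solves this equation by bisection; it is not the estimator of Example 5, and the function names and bracket are hypothetical.

```python
def loglogistic_right_prob(beta, p):
    """P(X > Q3 + p*IQR) for F(x) = 1 / (1 + (x/a)**(-beta)); independent of the scale a."""
    # Q3/a = 3**(1/beta), Q1/a = 3**(-1/beta), so (R/a)**beta = 3 * (1 + p*(1 - 3**(-2/beta)))**beta
    return 1.0 / (1.0 + 3.0 * (1.0 + p * (1.0 - 3.0 ** (-2.0 / beta))) ** beta)

def estimate_beta(empirical_right_share, p, lo=0.05, hi=100.0, tol=1e-10):
    """Bisection for loglogistic_right_prob(beta, p) = empirical share (decreasing in beta)."""
    if not (loglogistic_right_prob(hi, p) <= empirical_right_share <= loglogistic_right_prob(lo, p)):
        raise ValueError("empirical share outside the attainable range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if loglogistic_right_prob(mid, p) > empirical_right_share:
            lo = mid  # probability too large: the root has a larger beta
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As $\beta$ ranges over $(0, \infty)$ the probability decreases from $1/4$ to $1/(1 + 3^{1+2p})$, so an observed share outside this range signals that the Log-logistic type may be inappropriate.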
Figures 6-14 depict the dependence of these estimators, together with their empirical 95% confidence intervals, on the sample size, the probability law of the simulated r.v., and the estimated parameter . The names of the estimators in these figures are abbreviated as follows: , , , , and .
The above simulation study shows that, within the considered set of distributions and for small samples, the considered estimators outperform the well-known estimators proposed by Hill (1975) [30], Pickands (1975) [47], and Dekkers, Einmahl and de Haan (1989) [15]. Within the correct probability type, the accuracy of each of them improves as the sample size increases and as decreases. However, according to our investigation, these estimators are highly distribution sensitive. Their biggest advantage is that they are applicable to relatively small samples.
6 Conclusive remarks
To the best of the author's knowledge, a universal numerical characteristic of the tail of a c.d.f. that is invariant within a distributional type (i.e., with respect to increasing affine transformations) is still not known. Here we show that the probabilities that an observation is a -outside value can be very useful in this sense. They can be used for a reasonable classification of the tails of probability distributions. They are better suited than, e.g., the excess kurtosis for characterizing the tail of the observed distribution, because they do not depend on the moments of the observed r.v. and can be applied also when moments do not exist. Their estimators are appropriate for preliminary statistical analysis in the presence of the corresponding outside values. They can help practitioners find the most appropriate classes of probability laws for modeling the tails of the distribution of the observed r.v. Within such a class, the parameter that influences the tails needs further estimation. According to our simulation study, the proposed algorithm for constructing estimators gives better results when decreases. The fast rate of convergence allows one to apply these estimators also to relatively small samples. However, the main disadvantage of all these estimators is that they are distribution sensitive: their good properties may disappear if the distributional type is not correctly determined.
7 Acknowledgements
The author would like to thank Prof. Milan Stehlik for drawing her attention to the problem of statistical modeling of extremes given small samples.
8 Appendix
Proof of Theorem 1: a) By the definition of and the formula for the derivative of the inverse function, we obtain
[TABLE]
is a density function of the r.v. , therefore it is non-negative. The difference because . Therefore .
b) By the definition of and the same formula for the derivative of the inverse function, we have that
[TABLE]
Now the difference because . Therefore .
c) follows from a), b), and the monotonicity of probability measures. Q.E.D.
Proof of Theorem 2: b) For from the definition of the quantile function we have that . Therefore
[TABLE]
c) For again from the definition of the quantile function . Therefore
[TABLE]
d) In this case , therefore and
[TABLE]
e) Since is a strictly increasing and continuous function, we have that and . Therefore
[TABLE]
These relations, together with the definition of and the monotonicity of probability measures, entail (6).
(7) is a corollary of (10), applied for .
f) The equalities and entail
[TABLE]
Now the definitions of , and , and the monotonicity of probability measures entail (10).
(11) follows from (6) when we replace the function with and take into account that .
g) Since
[TABLE]
[TABLE]
h) Using we obtain
[TABLE]
[TABLE]
i) Consider . The relation between the quantile function of the exceedances and the c.d.f. of (see e.g. Nair et al. (2013) [40]) entails
[TABLE]
Analogously for . (Footnote 4: This property can also be obtained as a corollary of b).)
k) We obtain this property by substituting the relations between the quantile functions of left-, right-, and double-truncated r.vs., and , i.e.
[TABLE]
into the definitions of and . These equalities are not difficult to derive and can be found e.g. in Nair et al. (2013) [40].
Finally we use the definitions of and .
l) Assume .
- Case . In this case , and the function is increasing in , therefore
[TABLE]
[TABLE]
- Case . Here we use the fact that ; therefore, our computations in the previous case imply
[TABLE]
[TABLE]
m) Case . In this case , and the function is increasing in , therefore we apply Theorem 1, e) (4) and (5) and obtain the desired result.
Case . Now , and the function is decreasing in ; therefore, we apply Theorem 1, f), and (8) and (9) complete the proof.
n) In case we take into account that , and the functions and are decreasing in , then we apply Theorem 1, f) (8) and (9) and finish the proof of this case.
Analogously, if then , and the function is increasing in . Therefore we apply Theorem 1, e) and after some algebra complete the proof.
Q.E.D.
Proof of Corollary of f): Assume . Consider the case . Theorem 1, f), applied to , entails
[TABLE]
By the definition of and the monotonicity of probability measures, in order to prove that we need to show that
[TABLE]
The last inequality is equivalent to
[TABLE]
For , , therefore , which completes the proof of this part.
The assertion for the case follows by the fact that
[TABLE]
Q.E.D.
Proof of Corollary of e): Assume .
- Case . Since , in order to apply Theorem 2, e) to , which is increasing in , we need to prove that
[TABLE]
The function is also increasing in , therefore the above inequality is equivalent to
[TABLE]
The function is increasing in , and for , . Therefore, for , proves (48) and completes the proof of (13).
By Theorem 2, f), applied to (which is possible because is also increasing in ), in order to compare and in (12) we have to show that
[TABLE]
which is the same as
[TABLE]
The function is decreasing in and, since by assumption , for , . Therefore, for , proves (49) and (12).
- Case . The fact that , the previous case, and the definitions of and entail
[TABLE]
which completes the proof of (14).
The fact that and (13) entail
[TABLE]
which proves the inequalities in (15). Q.E.D.
Proof of Theorem 3: a) Consider and an almost surely positive r.v. , i.e. . By the monotonicity of probability measures, the definition of , and the equalities , we need to show that
[TABLE]
This is equivalent to
[TABLE]
Denote by . The last inequality holds because, for any fixed and , the function
[TABLE]
is decreasing in . Therefore (50) is also true, and the proof of (16) is completed.
b) The r.v. is almost surely positive, and so is . Therefore, by the definition of , it is enough to show that in this case . Now we use the equality in the definition of and obtain that, given the condition in c), the value of
[TABLE]
c) It is enough to prove the second inequality in (18). It is equivalent to
[TABLE]
and using the equality it is the same as
[TABLE]
The monotonicity of probability measures entails that the above inequality would be true if
[TABLE]
[TABLE]
[TABLE]
Again, denote by and consider the function
[TABLE]
Given , it is decreasing for and . Therefore , for . This entails (51) and completes the proof of (18). Q.E.D.
Proof of Theorem 5: Let us fix . By the theorem on the joint asymptotic normality of the order statistics, since the limit exists, we have that for any subsequence ,
[TABLE]
where the asymptotic covariance matrix of this bivariate distribution is
[TABLE]
and the asymptotic correlation between these two order statistics is .
1.) Consider the function . For and it is continuously differentiable. The asymptotic mean is
[TABLE]
The Jacobian of the transformation is
[TABLE]
Applying the multivariate delta method (Sobel (1982) [55]), we obtain that the asymptotic variance of is
[TABLE]
2.) Analogously, because of
[TABLE]
we consider the function .
For and it is continuously differentiable.
The asymptotic mean is
[TABLE]
In order to obtain the asymptotic variance of we calculate the Jacobian of the transformation. It is
[TABLE]
Applying the multivariate delta method again (see e.g. Sobel (1982) [55]), we obtain that the asymptotic variance of is
[TABLE]
Q.E.D.
References
- [1] Alexander, C., Cordeiro, G. M., Ortega, E. M., Sarabia, J. M.: Generalized beta-generated distributions. Computational Statistics and Data Analysis. 56(6), (2012) 1880–1897.
- [2] Arnold, B. C., Balakrishnan, N., Nagaraja, H. N.: A first course in order statistics. In Applied Mathematics SIAM, 54, Philadelphia (1992).
- [3] Arnold, B. C.: Pareto distributions, Second Edition, Chapman and Hall/CRC Press, Taylor & Francis Group, Boca Raton, London, New York (2015).
- [4] Balanda, K. P., MacGillivray, H. L.: Kurtosis: a critical review. The American Statistician. 42(2), (1988) 111–119.
- [5] Balanda, K. P., MacGillivray, H. L.: Kurtosis and spread. Canadian Journal of Statistics. 18(1), (1990) 17–30.
- [6] Beran, J., Schell, D., Stehlik, M.: The harmonic moment tail index estimator: asymptotic distribution and robustness. Annals of the Institute of Statistical Mathematics. 66(1), (2014) 193–220.
- [7] Burr, I. W.: Cumulative frequency functions. Annals of Mathematical Statistics. 13(2), (1942) 215–232.
- [8] Caeiro, F., Gomes, M. I., Beirlant, J., de Wet, T.: Mean-of-order p reduced-bias extreme value index estimation under a third-order framework. Extremes. 19(4), (2016) 561–589.
