Logarithm of ratios of two order statistics and regularly varying tails
Pavlina K. Jordanova, Milan Stehlík

TL;DR
This paper develops an estimator for the tail index of distributions with regularly varying tails, based on logarithms of ratios of two order statistics, and reports simulations on Pareto samples in which it outperforms several established estimators.
Contribution
It introduces an estimator for the tail index built from ratios of order statistics; in the Pareto case it is unbiased, asymptotically efficient, and asymptotically normal, and in the simulated cases it outperforms existing methods.
Findings
The proposed estimator outperforms the Hill, t-Hill, Pickands, and Dekkers-Einmahl-de Haan estimators in the simulated cases.
Derived explicit formulas for the mean and variance of the estimator.
Validated the estimator's effectiveness for Pareto distributed data.
Abstract
Here we suppose that the observed random variable has cumulative distribution function $F$ with regularly varying tail, i.e. $1 - F \in RV_{-\alpha}$, $\alpha > 0$. Using the results about exponential order statistics we investigate logarithms of ratios of two order statistics of a sample of independent observations on a Pareto distributed random variable with parameter $\alpha$. Short explicit formulae for its mean and variance are obtained. Then we transform this function so as to obtain an unbiased, asymptotically efficient, and asymptotically normal estimator for $\alpha$. Finally we simulate Pareto samples and show that in the considered cases the proposed estimator outperforms the well known Hill, t-Hill, Pickands and Dekkers-Einmahl-de Haan estimators.
Faculty of Mathematics and Informatics, Konstantin Preslavsky University of Shumen, 115 "Universitetska" str., 9712 Shumen, Bulgaria.
Institute of Statistics, Universidad de Valparaíso, Valparaíso, Chile.
Department of Applied Statistics and Linz Institute of Technology, Johannes Kepler University, Altenbergerstrasse 69, 4040 Linz, Austria.
1 HISTORY OF THE PROBLEM
The usefulness of regularly varying (RV) functions in economics seems to have been discussed for the first time in the modeling of wealth in society by the Pareto distribution, named after Vilfredo Pareto (1897). J. Karamata (1933) provides their definition and integral representation. Later on the Convergence to types theorem, proved by R. A. Fisher and L. H. C. Tippett (1928) and B. V. Gnedenko (1948), plays a key role in their further applications. It is well known that this class of distributions describes very well the domain of attraction of stable distributions (see Mandelbrot (1960) [15]) and the max-domain of attraction of the Fréchet distribution (see M. Fréchet (1927)). Laurens de Haan (1970) and co-authors [3, 4, 5] develop the main machinery for working with cumulative distribution functions (c.d.fs.) with such tail behaviour. Let us recall that the c.d.f. $F$ has regularly varying right tail with parameter $\alpha > 0$, if for all $x > 0$
$$\lim_{t \to \infty} \frac{1 - F(xt)}{1 - F(t)} = x^{-\alpha}.$$
After their works the topic spread over the world very fast and many estimators of the index of regular variation have been proposed, see e.g. Hill (1975) [10], Pickands (1975) [19], Dekkers-Einmahl-de Haan (1989) [6], and t-Hill (Stehlík and co-authors (2010) [24, 9, 25], and Pancheva and Jordanova (2012) [11, 14]), among others.
Here we show the usefulness of functions of two central order statistics in estimating the parameter of regular variation. Under very general settings we show that the logarithm of the fraction of two specific central order statistics is a weakly consistent and asymptotically normal estimator of the logarithm of the corresponding theoretical quantiles. Then we use these functions and obtain our estimator for $\alpha$. Its main advantage is that it is very flexible and provides useful accuracy for mid-range and small samples. The Pareto case, considered in Section 3, motivates our investigation. First we define a biased form of the estimator. Then, using results about order statistics, which can be found e.g. in Nevzorov (2001) [18], we obtain explicit formulae for its mean and variance. This allows us to define an unbiased correction which is asymptotically efficient. Then we prove asymptotic normality and obtain large sample confidence intervals. Our simulation study depicts the advantages of the considered estimators over the Hill, t-Hill, and Dekkers-Einmahl-de Haan estimators. The paper finishes with some concluding remarks.
Throughout the paper we assume that $X_1, X_2, \dots, X_n$ are independent observations on a random variable (r.v.) $X$, and denote by $X_{(1,n)} \le X_{(2,n)} \le \dots \le X_{(n,n)}$ the corresponding increasing order statistics.
$$H_{n,m} = 1 + \frac{1}{2^m} + \dots + \frac{1}{n^m}$$
denotes the $n$-th generalized harmonic number of power $m$, and $H_n = H_{n,1}$ stands for the well-known $n$-th harmonic number.
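As a minimal sketch, assuming the usual definition $H_{n,m} = \sum_{i=1}^{n} i^{-m}$ with the convention $H_{0,m} = 0$, the generalized harmonic numbers can be computed as follows. The function name `generalized_harmonic` is hypothetical and does not come from the paper.

```python
import math


def generalized_harmonic(n: int, m: int = 1) -> float:
    """Return H_{n,m} = 1 + 1/2^m + ... + 1/n^m, with H_{0,m} = 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # math.fsum keeps the floating-point round-off of the long sum small.
    return math.fsum(1.0 / i**m for i in range(1, n + 1))


if __name__ == "__main__":
    print(generalized_harmonic(4))      # H_4 = 25/12
    print(generalized_harmonic(3, 2))   # H_{3,2} = 1 + 1/4 + 1/9
```

The default `m = 1` gives the ordinary harmonic number $H_n$.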
The main objects of interest here are the statistics
[TABLE]
The estimator is obtained in Jordanova et al. [13] via a quantile matching procedure. On this procedure see e.g. Sgouropoulos et al. (2015) [21].
Throughout the paper $\stackrel{d}{\to}$ denotes convergence in distribution.
2 GENERAL RESULTS
In 1933 - 1949 Smirnoff [22] shows that in the case of central order statistics, and more precisely for and such that and , the asymptotic distribution of is standard normal. Moreover, it seems that he obtained similar results about bivariate order statistics. These can be found e.g. in Arnold et al. (1992) [1], p. 226, Mosteller (1946) [16], p. 338, Nair [17], p. 330, or Wilks [27], among others. The multivariate delta method is a very powerful technique for obtaining confidence intervals in such cases. In the next theorem we apply them and obtain the limiting distribution of the logarithmic differences of central order statistics.
Smirnoff’s theorem. Assume for , , , , and . Then
[TABLE]
where the covariance matrix
[TABLE]
We apply this theorem together with the Multivariate delta method and obtain asymptotic normality of the estimators, discussed in this paper.
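The one-dimensional case of this result can be checked numerically. The sketch below assumes the textbook form of the theorem, namely that for $k/n \to p$ the quantity $\sqrt{n}\,(X_{(k,n)} - x_p)$ is asymptotically normal with mean $0$ and variance $p(1-p)/f(x_p)^2$, and it uses Exp(1) data, for which $x_p = -\log(1-p)$ and $f(x_p) = 1-p$. The function names and the choice $k = \lceil np \rceil$ are illustrative assumptions.

```python
import math
import random
import statistics


def scaled_quantile_errors(n, p, reps, seed=0):
    """Simulate sqrt(n) * (X_(k,n) - x_p) for Exp(1) samples, k = ceil(n p)."""
    rng = random.Random(seed)
    k = math.ceil(n * p)
    x_p = -math.log(1.0 - p)            # theoretical p-quantile of Exp(1)
    errors = []
    for _ in range(reps):
        sample = sorted(rng.expovariate(1.0) for _ in range(n))
        errors.append(math.sqrt(n) * (sample[k - 1] - x_p))
    return errors


def textbook_variance(p):
    """p (1 - p) / f(x_p)^2 for Exp(1), where f(x_p) = 1 - p."""
    return p / (1.0 - p)


if __name__ == "__main__":
    errs = scaled_quantile_errors(n=400, p=0.5, reps=1500, seed=1)
    print("simulated variance:", statistics.variance(errs))
    print("limiting variance: ", textbook_variance(0.5))
```

For moderate `n` the simulated variance should already be close to the limiting value; the agreement is only approximate because the statement is asymptotic.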
Theorem 1. Consider a sample of , independent observations on a r.v. with c.d.f. and p.d.f. . If there exists and , then for
[TABLE]
The variance in (1) is , where , and
Proof: We apply Smirnoff's theorem for and and the multivariate delta method.
By assumption the conditions , are satisfied. Moreover, for we have , , and ; therefore Smirnoff's theorem on the joint asymptotic normality of the order statistics states that
[TABLE]
where the asymptotic covariance matrix of this bivariate distribution is
[TABLE]
and the asymptotic correlation between these two order statistics is .
Consider the function . For and it is continuously differentiable.
The Jacobian of the transformation is
[TABLE]
The asymptotic mean is
[TABLE]
Now we apply the Multivariate Delta method, which could be seen e.g. in Sobel (1982) [23], and obtain that the asymptotic variance of is
[TABLE]
Q.A.D.
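The delta-method step of the proof can be written out explicitly. The following is a sketch in assumed notation, which need not coincide with the symbols of Theorem 1: $p_1 < p_2$ are the limiting ranks, $x_i > 0$ is the $p_i$-quantile of the c.d.f., $f_i = f(x_i) > 0$, $\Sigma$ is the textbook asymptotic covariance matrix of two central order statistics, and $g$ is the log-ratio map.

```latex
\[
\Sigma=\begin{pmatrix}
\dfrac{p_1(1-p_1)}{f_1^{2}} & \dfrac{p_1(1-p_2)}{f_1 f_2}\\[2ex]
\dfrac{p_1(1-p_2)}{f_1 f_2} & \dfrac{p_2(1-p_2)}{f_2^{2}}
\end{pmatrix},\qquad
g(x,y)=\log y-\log x,\qquad
\nabla g(x_1,x_2)=\Bigl(-\frac{1}{x_1},\ \frac{1}{x_2}\Bigr),
\]
\[
V=\nabla g\,\Sigma\,\nabla g^{\top}
=\frac{p_1(1-p_1)}{x_1^{2}f_1^{2}}
-\frac{2\,p_1(1-p_2)}{x_1x_2f_1f_2}
+\frac{p_2(1-p_2)}{x_2^{2}f_2^{2}} .
\]
% Pareto check, assuming 1 - F(x) = x^{-\alpha} for x >= 1,
% so that x_i f_i = \alpha (1 - p_i):
\[
V=\frac{1}{\alpha^{2}}\Bigl(\frac{p_2}{1-p_2}-\frac{p_1}{1-p_1}\Bigr)
=\frac{p_2-p_1}{\alpha^{2}(1-p_1)(1-p_2)} .
\]
```

In the assumed Pareto case the quantiles and the density cancel, so the limiting variance of the log-ratio depends only on $\alpha$ and the two ranks.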
Slutsky's theorem about continuous functions, together with the definition of convergence in probability, an application of the quantile transform, and Smirnoff's theorem about a.s. convergence of empirical quantiles to the corresponding theoretical ones, leads us to the following result. Without loss of generality we consider only a.s. positive r.vs; however, the result could be easily transformed for or , .
Theorem 2. Assume . If , , then for
[TABLE]
3 PARETO CASE
In this section we assume that are independent observations on a r.v. with Pareto c.d.f.
[TABLE]
Briefly we will denote this by . Different generalizations of this distribution can be found in Arnold (2015) [2]. The number $\alpha$ is called the "index of regular variation of the tail of the c.d.f.". It determines the tail behaviour of the c.d.f. See e.g. de Haan and Ferreira [5], Resnick [20], or Jordanova [12].
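Pareto samples of this kind are easy to generate by inverse transform. The sketch below assumes the common parametrisation $P(X > x) = (\text{scale}/x)^{\alpha}$ for $x \ge \text{scale}$, which may differ from the parametrisation used in the paper; the names `sample_pareto` and `scale` are hypothetical.

```python
import random


def sample_pareto(alpha, n, scale=1.0, rng=None):
    """Draw n values with P(X > x) = (scale / x)**alpha for x >= scale."""
    if alpha <= 0 or scale <= 0:
        raise ValueError("alpha and scale must be positive")
    rng = rng or random.Random()
    # Inverse transform: if U is uniform on (0, 1], then scale * U**(-1/alpha)
    # has the Pareto tail above. 1 - random() lies in (0, 1], so U is never 0.
    return [scale * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


if __name__ == "__main__":
    xs = sample_pareto(alpha=2.0, n=10, rng=random.Random(0))
    print(min(xs), max(xs))
```

Smaller values of `alpha` give heavier tails, so the largest observations grow quickly as `alpha` decreases.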
Denote by , the fact that the r.v. has c.d.f.
[TABLE]
The results in the following theorem allow us later on, in Corollaries 1 and 2, to obtain unbiased, consistent, and asymptotically efficient estimators of the parameter .
Theorem 3. Assume are order statistics of independent observations on a r.v. , , , and are integers.
i)
Denote by a Beta distributed r.v. with parameters , and . Then
[TABLE]
where is the -th order statistic in a sample of independent observations on i.i.d. Exponential r.vs. with parameter , and is the -th order statistic of a sample of independent observations on an exponentially distributed r.v. with parameter . Its probability density function is
[TABLE]
ii)
and
Proof: Let us fix , integers. Because is a strictly increasing function, the probability quantile transform entails
[TABLE]
where are order statistics of independent identically distributed (i.i.d.) r.vs. with . Then, because of the multiplicative property of the exponential distribution
[TABLE]
where are order statistics of i.i.d. r.vs. with . See e.g. de Haan and Ferreira [5]. Denote the logarithm with base by log. Because , , is an increasing function,
[TABLE]
The last equality could be seen e.g. in de Haan and Ferreira [5] or Arnold et al. (1992) [1].
i) Follows by the equality , the well known relation and the formula for the probability density function (p.d.f.) of order statistics of a sample of i.i.d. r.vs. See e.g. Nevzorov [18], p. 7.
ii) The mean and the variance of the last order statistics are well investigated. See e.g. Nevzorov [18], p. 23. Using his results and the main properties of the expectation and the variance we obtain:
[TABLE]
Q.A.D.
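The mechanism behind the proof can be checked by simulation. The sketch below assumes a pure Pareto tail $P(X > x) = x^{-\alpha}$, $x \ge 1$, and uses generic ranks $1 \le i < j \le n$ (our illustrative indexing, not necessarily the one in Theorem 3). Under this assumption $\log X$ is exponential with rate $\alpha$, and the Rényi representation of exponential spacings gives $E \log (X_{(j,n)}/X_{(i,n)}) = (H_{n-i} - H_{n-j})/\alpha$ and $\mathrm{Var} \log (X_{(j,n)}/X_{(i,n)}) = (H_{n-i,2} - H_{n-j,2})/\alpha^2$.

```python
import math
import random
import statistics


def harmonic(n, m=1):
    """Generalized harmonic number H_{n,m}, with H_{0,m} = 0."""
    return math.fsum(1.0 / r**m for r in range(1, n + 1))


def log_ratio(sample, i, j):
    """log(X_(j,n) / X_(i,n)) for 1-based ranks 1 <= i < j <= n."""
    ordered = sorted(sample)
    return math.log(ordered[j - 1] / ordered[i - 1])


def simulate_log_ratios(alpha, n, i, j, reps, seed=0):
    """Monte Carlo draws of the log-ratio for Pareto(alpha) samples of size n."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        sample = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]
        draws.append(log_ratio(sample, i, j))
    return draws


def exact_mean_var(alpha, n, i, j):
    """Mean and variance implied by the Renyi representation (assumed setup)."""
    mean = (harmonic(n - i) - harmonic(n - j)) / alpha
    var = (harmonic(n - i, 2) - harmonic(n - j, 2)) / alpha**2
    return mean, var


if __name__ == "__main__":
    draws = simulate_log_ratios(alpha=2.0, n=20, i=5, j=15, reps=20000, seed=3)
    print(statistics.mean(draws), statistics.variance(draws))
    print(exact_mean_var(2.0, 20, 5, 15))
```

Dividing the log-ratio by $H_{n-i} - H_{n-j}$ therefore yields, in this assumed setup, a statistic whose mean is exactly $1/\alpha$ for every sample size.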
The next corollary is useful when working with finite samples. We obtain that for any , and for fixed the estimators are unbiased for . The accuracy of these estimators in that case is explicitly calculated. However, these estimators are also applicable for large enough samples, because for they are weakly consistent and asymptotically efficient.
Corollary 1. Assume , are order statistics of independent observations on a r.v. , , . Then, for all , and ,
i)
Denote by a Beta distributed r.v. with parameters , and . Then
[TABLE]
where is the -th order statistic in a sample of independent observations on i.i.d. Exponential r.vs. with parameter . is the -th order statistic of a sample of independent observations on an exponentially distributed r.v. with parameter . Its probability density function is
[TABLE]
ii)
and
iii)
For all ,
[TABLE]
iv)
The estimator is asymptotically efficient. For ,
[TABLE]
v)
The estimator is weakly consistent. More precisely, for all ,
Proof: i) and ii) follow by Theorem 1, definition of and the relations
[TABLE]
iii) is corollary of ii) and Chebyshev’s inequality.
iv) It is well known that where is the Euler-Mascheroni constant, , and is the Digamma function. By ii) for any fixed , we have
[TABLE]
In the last equality we have used the well known solution of the Basel problem, and more precisely the limit
v) is a consequence of ii), iii) and iv). Q.A.D.
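The two classical limits used in part iv) are easy to check numerically. The sketch below relies only on standard facts: $H_n - \log n$ tends to the Euler-Mascheroni constant, with $H_n = \log n + \gamma + \frac{1}{2n} + O(n^{-2})$, and $H_{n,2}$ tends to $\pi^2/6$ (the Basel problem), with a remainder of order $1/n$. The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def harmonic(n, m=1):
    """Generalized harmonic number H_{n,m}."""
    return math.fsum(1.0 / i**m for i in range(1, n + 1))


def harmonic_gap(n):
    """H_n - log(n), which tends to the Euler-Mascheroni constant."""
    return harmonic(n) - math.log(n)


def basel_gap(n):
    """pi^2 / 6 - H_{n,2}: the tail of the Basel series, roughly 1 / n."""
    return math.pi**2 / 6 - harmonic(n, 2)


if __name__ == "__main__":
    for n in (10, 100, 10000):
        print(n, harmonic_gap(n) - EULER_GAMMA, basel_gap(n))
```

Both gaps shrink at rate $1/n$, which is why differences of harmonic numbers can be replaced by logarithms in the large-sample statements.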
In the previous proof we have seen that for any fixed , . Therefore, although are biased, they are asymptotically unbiased, asymptotically normal, weakly consistent and asymptotically efficient estimators for . The next conclusions follow by the relation , and the main properties of the mean and the variance.
Corollary 2. Assume , are order statistics of independent observations on a r.v. , , .
i)
Denote by a Beta distributed r.v. with parameters , and . Then, for all ,
[TABLE]
where is the -th order statistic in a sample of independent observations on i.i.d. Exponential r.vs. with parameter . is the -th order statistic of a sample of independent observations on an exponentially distributed r.v. with parameter . Its probability density function is
[TABLE]
ii)
For all , and
iii)
For all , and ,
[TABLE]
iv)
estimator is asymptotically unbiased and asymptotically efficient. More precisely
[TABLE]
v)
estimator is weakly consistent. For all ,
Applications of the previous results require knowledge about confidence intervals. Therefore, in the next theorem, we obtain asymptotic normality of these estimators, which allows us later on to construct large sample confidence intervals.
Theorem 4. If , , , then for all , and ,
[TABLE]
[TABLE]
[TABLE]
Proof: In this case , and . Therefore, , ,
[TABLE]
For we have , , , and therefore we can apply Smirnoff’s theorem about the joint asymptotic normality of the order statistics and Theorem 1. In order to determine and let us note that . Therefore
[TABLE]
The equalities
[TABLE]
lead us to (9). When we multiply the numerator in (9) by , and the denominator by , and use that we obtain (10). If we multiply both sides of (9) by , and use that we obtain (11). Q.A.D.
Now we are ready to compute the corresponding confidence intervals. Let us choose and denote by , quantile of the standard normal distribution. Using (11), and the definition of we obtain
[TABLE]
[TABLE]
Therefore for any fixed , the corresponding asymptotic confidence intervals for when are:
[TABLE]
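To illustrate how a large-sample interval of this type can be computed, here is a hedged sketch. It is not a transcription of formula (12): it assumes a pure Pareto tail $P(X > x) = (\delta/x)^{\alpha}$, generic ranks $i < j < n$ with $p_1 = i/n$, $p_2 = j/n$, and the delta-method limit $\sqrt{n}\,(\alpha T - 1) \to N(0, c)$, where $T = \log(X_{(j,n)}/X_{(i,n)}) / \log\frac{1-p_1}{1-p_2}$ and $c = \frac{p_2 - p_1}{(1-p_1)(1-p_2)\log^2\frac{1-p_1}{1-p_2}}$. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def log_ratio_ci(sample, i, j, level=0.95):
    """Return (lower, estimate, upper) for alpha from two order statistics.

    Assumes continuous Pareto-tailed data, so that X_(i,n) < X_(j,n).
    """
    n = len(sample)
    if not 1 <= i < j < n:
        raise ValueError("need 1 <= i < j < n")
    ordered = sorted(sample)
    p1, p2 = i / n, j / n
    centre = math.log((1.0 - p1) / (1.0 - p2))     # limit of alpha * log-ratio
    t = math.log(ordered[j - 1] / ordered[i - 1]) / centre   # estimates 1/alpha
    c = (p2 - p1) / ((1.0 - p1) * (1.0 - p2) * centre**2)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)    # standard normal quantile
    half = z * math.sqrt(c / n)
    # Invert |alpha * t - 1| <= half for alpha; the lower end is clipped at 0.
    return max(0.0, (1.0 - half) / t), 1.0 / t, (1.0 + half) / t


if __name__ == "__main__":
    rng = random.Random(7)
    xs = [(1.0 - rng.random()) ** (-1.0 / 2.0) for _ in range(200)]
    print(log_ratio_ci(xs, 50, 150))
```

Because the limiting variance of $\alpha T$ does not depend on $\alpha$ under the stated assumption, the interval needs no plug-in variance estimate.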
Simulation study
Let us now depict the rate of convergence of for different values of and . Figures 1-2 represent the dependence of and the corresponding confidence intervals, on . They are plotted using the software R [26]. The true values of are plotted as a solid straight line. In order to visualise the values of the estimators, for each of the lines we have simulated 100 samples of realizations of a Pareto distributed r.v. with c.d.f. (5), correspondingly for . Separately for each fixed and the values of are averaged over these 100 samples and presented correspondingly by solid , dashed , dash-dot , and dotted lines. Then, for each fixed , and we have also computed and plotted -confidence intervals (red lines) for , calculated by formula (12) using the averaged values of instead of the separate estimators . We observe that when decreases and increases, the accuracy of the estimators improves. However, because the sample size increases with , we cannot choose too big for small samples.
If we compare these results with those for the well known Hill [10], t-Hill [24, 11], Dekkers-Einmahl-de Haan [6, 7], or Pickands [19] estimators, described very well in Embrechts et al. [8], we observe that in this case the estimator has better properties, especially given a small sample.
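A comparison of this kind can be set up along the following lines. This is only a sketch and does not reproduce the paper's simulation design or its results: it assumes Pareto data with $P(X > x) = x^{-\alpha}$, uses the standard Hill estimator of $1/\alpha$ based on the $k$ largest observations, and uses a generic harmonic-number-normalised log-ratio of two order statistics with hypothetical ranks $i < j$.

```python
import math
import random
import statistics


def hill(sample, k):
    """Hill estimate of 1/alpha from the k largest observations, 1 <= k < n."""
    ordered = sorted(sample)
    n = len(ordered)
    threshold = ordered[n - k - 1]                 # X_(n-k, n)
    return sum(math.log(x / threshold) for x in ordered[n - k:]) / k


def harmonic(n):
    return math.fsum(1.0 / r for r in range(1, n + 1))


def log_ratio_estimate(sample, i, j):
    """log(X_(j,n) / X_(i,n)) / (H_{n-i} - H_{n-j}), an estimate of 1/alpha."""
    ordered = sorted(sample)
    n = len(ordered)
    norm = harmonic(n - i) - harmonic(n - j)
    return math.log(ordered[j - 1] / ordered[i - 1]) / norm


def compare(alpha, n, k, i, j, reps=100, seed=0):
    """Average both estimators of 1/alpha over `reps` simulated samples."""
    rng = random.Random(seed)
    hills, ratios = [], []
    for _ in range(reps):
        xs = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]
        hills.append(hill(xs, k))
        ratios.append(log_ratio_estimate(xs, i, j))
    return {
        "hill_mean": statistics.mean(hills),
        "hill_sd": statistics.stdev(hills),
        "log_ratio_mean": statistics.mean(ratios),
        "log_ratio_sd": statistics.stdev(ratios),
    }


if __name__ == "__main__":
    print(compare(alpha=2.0, n=30, k=10, i=5, j=25, reps=100, seed=5))
```

Which estimator has the smaller spread depends on the ranks and on `k`, so the printed standard deviations should be read for the chosen configuration only.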
4 CONCLUSIONS
The paper points out good properties of couples of central order statistics for obtaining consistent and asymptotically normal estimators of the parameter of regular variation of the tail of the c.d.f. of the observed r.v. We consider more thoroughly the Pareto case, where we transform the logarithm of the fraction of the order statistics so as to obtain an at least asymptotically unbiased and asymptotically efficient estimator. However, our results about the general case show that an analogous approach could be applied in many other cases of distributions with regularly varying tails of the c.d.f., for example Fréchet, Pareto, Log-logistic, and Hill-horror, among others. The biggest advantage of the proposed estimators is that they can be very useful for working with relatively small samples.
5 ACKNOWLEDGMENTS
The authors are grateful to the bilateral projects Bulgaria - Austria, 2016-2019, Feasible statistical modelling for extremes in ecology and finance, BNSF, Contract number 01/8, 23/08/2017.
[1] Arnold, B.C., Balakrishnan, N., Nagaraja, H.N.: A first course in order statistics. 54, SIAM (1992)
[2] Arnold, B.C.: Pareto distributions. Second Edition, Chapman and Hall/CRC Press, Taylor & Francis Group, Boca Raton, London, New York (2015)
[3] de Haan, L.: On Regular Variation and Its Application to the Weak Convergence of Sample Extremes, Mathematical Centre Tract, 32, Mathematics Centre, Amsterdam, Holland (1970)
[4] de Haan, L., Stadtmüller, U.: Generalized regular variation of second order, Journal of the Australian Mathematical Society, 61(3), 381–395 (1996)
[5] de Haan, L., Ferreira, A.: Extreme Value Theory: An introduction, Springer Series in Operations Research and Financial Engineering, Springer, New York (2006)
[6] Dekkers, A.L.M., Einmahl, J.H.J., de Haan, L.: A moment estimator for the index of an extreme-value distribution, The Annals of Statistics, 1833–1855 (1989)
[7] Einmahl, J.H.J., Fils-Villetard, A., Guillou, A.: Statistics of extremes under random censoring, Bernoulli, 14(1), 207–227, Bernoulli Society for Mathematical Statistics and Probability (2008)
[8] Embrechts, P., Klüppelberg, Cl., Mikosch, Th.: Modelling extremal events: for insurance and finance, Springer Science & Business Media, 33 (2013)
