Alpha Estimation via Sample Splitting: A Two-Sample Framework for Stable-like Distributions
Cornelis J. Potgieter, Jacques van Appel, Sudharshan Samaratunga

TL;DR
This paper introduces a novel semiparametric estimator for the stability index alpha in stable distributions, using a two-sample approach via random splitting to improve robustness and computational efficiency.
Contribution
It proposes a new alpha estimator leveraging sample splitting and empirical quantiles, avoiding complex likelihood calculations and enhancing robustness and efficiency.
Findings
Estimator is consistent and asymptotically normal.
Performs well in small samples and heavy-tailed scenarios.
Offers significant computational advantages over maximum likelihood methods.
Abstract
Stable distributions provide a flexible framework for modeling heavy-tailed and skewed data, with the stability index $\alpha$ quantifying tail heaviness. We propose a new semiparametric estimator for $\alpha$ that leverages the two-sum closure property of stable distributions within a location-scale framework. The method transforms a single sample into two pseudo-independent samples via repeated random splitting and estimates $\alpha$ using weighted least squares applied to empirical quantiles. This approach avoids intractable likelihood calculations, offers computational advantages over maximum likelihood estimation, and remains robust to skewness. We establish consistency and asymptotic properties of the estimator and assess its finite-sample performance via simulation. Results indicate competitive accuracy, particularly in small samples and heavy-tailed settings, with substantial…
| Bias | RMSE | Bias | RMSE | Bias | RMSE | ||||
|---|---|---|---|---|---|---|---|---|---|
| 0.102 | 0.515 | 0.102 | 0.515 | 0.102 | 0.515 | ||||
| -0.036 | 0.201 | 0.090 | 0.209 | 0.005 | 0.223 | ||||
| -0.064 | 0.148 | 0.091 | 0.153 | -0.025 | 0.139 | ||||
| -0.065 | 0.145 | 0.092 | 0.149 | -0.026 | 0.134 | ||||
| -0.066 | 0.144 | 0.092 | 0.148 | -0.027 | 0.132 | ||||
| 0.105 | 0.398 | 0.105 | 0.398 | 0.105 | 0.398 | ||||
| -0.010 | 0.131 | 0.097 | 0.174 | 0.020 | 0.156 | ||||
| -0.025 | 0.090 | 0.095 | 0.131 | 0.003 | 0.092 | ||||
| -0.026 | 0.087 | 0.094 | 0.127 | 0.002 | 0.087 | ||||
| -0.026 | 0.086 | 0.094 | 0.126 | 0.001 | 0.085 | ||||
| 0.041 | 0.253 | 0.041 | 0.253 | 0.041 | 0.253 | ||||
| -0.009 | 0.086 | 0.049 | 0.110 | 0.006 | 0.110 | ||||
| -0.013 | 0.061 | 0.051 | 0.082 | 0.001 | 0.063 | ||||
| -0.012 | 0.058 | 0.052 | 0.080 | 0.001 | 0.059 | ||||
| -0.012 | 0.057 | 0.052 | 0.079 | 0.001 | 0.057 | ||||
| Bias | RMSE | Bias | RMSE | Bias | RMSE | ||||
|---|---|---|---|---|---|---|---|---|---|
| -0.285 | 0.520 | -0.285 | 0.520 | -0.285 | 0.520 | ||||
| -0.127 | 0.234 | -0.285 | 0.321 | -0.148 | 0.251 | ||||
| -0.092 | 0.155 | -0.281 | 0.289 | -0.084 | 0.154 | ||||
| -0.088 | 0.147 | -0.281 | 0.286 | -0.076 | 0.140 | ||||
| -0.087 | 0.143 | -0.280 | 0.285 | -0.074 | 0.133 | ||||
| -0.174 | 0.338 | -0.174 | 0.338 | -0.174 | 0.338 | ||||
| -0.076 | 0.162 | -0.173 | 0.200 | -0.075 | 0.162 | ||||
| -0.046 | 0.100 | -0.173 | 0.180 | -0.027 | 0.091 | ||||
| -0.044 | 0.094 | -0.174 | 0.179 | -0.021 | 0.081 | ||||
| -0.043 | 0.092 | -0.174 | 0.179 | -0.019 | 0.077 | ||||
| -0.129 | 0.254 | -0.129 | 0.254 | -0.129 | 0.254 | ||||
| -0.044 | 0.113 | -0.122 | 0.143 | -0.050 | 0.121 | ||||
| -0.027 | 0.072 | -0.124 | 0.129 | -0.015 | 0.069 | ||||
| -0.024 | 0.066 | -0.124 | 0.128 | -0.011 | 0.061 | ||||
| -0.024 | 0.065 | -0.124 | 0.128 | -0.011 | 0.059 | ||||
| 2.67 | 12.67 | 0.15 | 5.81 | 0.00 | 1.10 | |||
| 2.04 | 27.93 | 0.15 | 22.43 | 0.00 | 13.90 | |||
| 2.65 | 45.53 | 0.27 | 47.30 | 0.00 | 45.97 | |||
| Estimator | Symmetric | Asymmetric | Symmetric | Asymmetric | Symmetric | Asymmetric | ||
|---|---|---|---|---|---|---|---|---|
| MLE | 0.089 | 0.090 | 0.059 | 0.064 | 0.042 | 0.045 | ||
| MQE | 0.124 | 0.176 | 0.083 | 0.121 | 0.060 | 0.088 | ||
| SSE3 | 0.127 | 0.157 | 0.090 | 0.110 | 0.065 | 0.074 | ||
| SSE9 | 0.144 | 0.127 | 0.087 | 0.081 | 0.058 | 0.053 | ||
| SSE19 | N/A | N/A | 0.121 | 0.112 | 0.062 | 0.057 | ||
| Monte Carlo Relative Efficiency | |||||||
A Split-Sample Approach for Estimating the Stability Index of a Stable Distribution
Sudharshan Samaratunga and Cornelis J Potgieter Corresponding Author: [email protected]
Department of Statistical Science, Southern Methodist University
Abstract
The class of stable distributions is used in practice to model data that exhibit heavy tails and/or skewness. The stability index of a stable distribution is a measure of tail heaviness and is often of primary interest. Existing methods for estimating the index parameter include maximum likelihood and methods based on the sample quantiles. In this paper, a new approach for estimating the index parameter of a stable distribution is proposed. This new approach relies on the location-scale family representation of the class of stable distributions and involves repeatedly partitioning the single observed sample into two independent samples. An asymptotic likelihood method based on sample order statistics, previously used for estimating location and scale parameters in two independent samples, is adapted for estimating the stability index. The properties of the proposed method of estimation are explored and the resulting estimators are evaluated using a simulation study.
Some Key Words: Stable Distributions, Tail Index, Characteristic Exponent, Location-Scale Model, Split-Sample Estimation, Data Permutation.
1 Introduction
The stable probability law was introduced by Paul Lévy in 1924 in his work on sums of independent and identically distributed (iid) random variables. Originating from his attempt to generalize the central limit theorem, the class of stable distributions is defined as follows: A non-degenerate random variable $X$ is said to have a stable distribution if and only if for all $n > 1$, there exist constants $a_n > 0$ and $b_n$ such that $X_1 + \cdots + X_n \stackrel{d}{=} a_n X + b_n$. Here $X_1, \ldots, X_n$ are independent copies of $X$ and the symbol $\stackrel{d}{=}$ is used to denote equality in distribution. The random variable $X$ is called strictly stable if and only if $b_n = 0$ for all values of $n$. See Feller (2008) and Nolan (2013) for a comprehensive overview of existing results on stable distributions.
Stable distributions are typically described in terms of their characteristic functions. The random variable $X$ is said to have a stable distribution if the characteristic function of $X$, for all real $t$, is given by
[TABLE]
where $\alpha \in (0, 2]$ is commonly referred to as the stability index, $\beta \in [-1, 1]$ is a skewness parameter, and the remaining two parameters are a scale parameter and a location parameter. The index parameter $\alpha$, also called the stability index or characteristic exponent, measures the heaviness of the tails of the distribution. As $\alpha$ decreases, the tail heaviness of the distribution increases. For $\alpha \leq 1$, the mean of the distribution does not exist, while for $\alpha < 2$, the variance of the distribution does not exist either. In general, a stable random variable $X$ with index parameter $\alpha < 2$ possesses absolute moments of order $p$ where $p < \alpha$; that is, $E|X|^p < \infty$ for $p < \alpha$.
A sum of iid stable random variables with common characteristic exponent $\alpha$ is again stable, retaining the characteristic exponent of the original distribution. This property is termed stability. From a practical point of view, stable distributions are an attractive option for modeling data that exhibit heavy tails and skewness, as these features can easily be captured by stable distributions. With three exceptions discussed below, stable distributions do not have closed-form density functions. However, all non-degenerate stable distributions are continuous with infinitely differentiable density functions. The three special cases in which there exists a closed-form expression for the density function are the normal, Cauchy and Lévy distributions. When setting $\alpha = 2$, (1) corresponds to the characteristic function of a normal distribution. Similarly, upon setting $\alpha = 1$ and $\beta = 0$, the distribution is Cauchy, while upon setting $\alpha = 1/2$ and $\beta = 1$, the resulting distribution is Lévy, with the scale and location parameters of (1) serving as its scale and location parameters respectively.
The lack of a closed-form density function together with the non-existence of moments has made parameter estimation a historically challenging task. While some applications require the estimation of all four parameters, in many instances the parameter of greatest interest is the stability index, which determines the tail heaviness of the distribution. This paper focuses only on the estimation of the parameter $\alpha$. Many methods have been proposed to estimate $\alpha$, all of which fall into three categories: maximum likelihood, quantile methods and characteristic function methods. Quantile methods include both methods based on sample quantiles and methods based on extreme order statistics.
DuMouchel (1973, 1975) did extensive work on using a maximum likelihood type method to estimate the parameters of a stable distribution. His method relied on grouping the data into bins and numerically maximizing an approximate log-likelihood function. Mittnik et al. (1999) used the fast Fourier transform to estimate the parameters, while Nolan (2001) developed routines for numerical computation of the integrals involved in the Fisher information matrix, which can then also be used for maximum likelihood estimation. Lombardi (2007) proposed an MCMC method to estimate the stable parameters. Buckle (1995) developed Bayesian methodology for inference in stable distributions, while Peters et al. (2012) proposed likelihood-free Bayesian inference for stable models. Still, direct maximization of the likelihood function presents many challenges in practice and it is therefore not a popular approach when estimating the tail index.
Fama & Roll (1968, 1971) did early work on estimating $\alpha$ using order statistics, but their method only applied to symmetric stable distributions for $\alpha \geq 1$. McCulloch (1986) developed the quantile method now popular in application, which works for both symmetric and skew stable distributions and $\alpha \in [0.6, 2]$. The Hill estimator, see Hill (1975), is a popular measure of tail heaviness for distributions with Pareto-like tails, and can also be adapted to estimate $\alpha$. However, the Hill estimator tends to have large bias in small to moderately sized samples.
Authors that have considered estimators making direct use of the characteristic function include Press (1972), Paulson et al. (1975), Koutrouvelis (1980) and Brockwell & Brown (1981). These methods have good performance properties, but some still shy away from models that rely on inference in the complex domain.
In this paper, the problem of estimating $\alpha$ is viewed through the lens of representing stable random variables as members of a location-scale family of distributions. The general framework presented relies on repeatedly partitioning the observed data into two independent samples, say an $X$-sample and a $Y$-sample. The $X$-sample is created by selecting a subset of values from the observed data without replacement. Thereafter, the remaining observations are randomly paired and the $Y$-sample is created by adding the data values in each pair. This “split-sample” approach allows the estimation of $\alpha$ to be treated as a two-sample estimation problem. In particular, a quantile-based method of estimation is proposed in this paper. This quantile method is a variation of a method due to Potgieter & Lombard (2012), who consider estimating location and scale parameters for two independent samples belonging to the same location-scale family.
In Section 2, the location-scale representation of stable distributions is discussed and a connection between the scale parameter in the two-sample setting and the stability index of a single sample is established. In Section 3, the split-sample estimator is formally defined. Additionally, the two-sample quantile method of Potgieter and Lombard is reviewed and extended to the present framework for estimating $\alpha$. Section 4 presents results from a simulation study carried out to investigate the number of sample partitions needed to give good RMSE performance of the proposed estimator. Section 5 deals with some practical considerations, such as choosing the number of quantiles to use in the Potgieter & Lombard method. Section 6 compares the proposed estimator to other existing methods, and recommendations for implementing the method are discussed in Section 7.
2 Stable Distribution Location–Scale Representation
Suppose that $X_1$ and $X_2$ are iid stable random variables and define $Y = X_1 + X_2$. As the sum of two stable random variables with common parameters $\alpha$ and $\beta$ is again stable, it follows that there are constants $\mu$ and $\sigma > 0$ such that $Y \stackrel{d}{=} \mu + \sigma X_1$, where $\stackrel{d}{=}$ indicates equality in distribution. Thus, the distribution of $Y$ has two equivalent representations,
[TABLE]
and
[TABLE]
where is given by
[TABLE]
Here, relation (3) is derived as the characteristic function of the sum of two iid stable variables, $Y = X_1 + X_2$, while (2) is the characteristic function of the random variable $\mu + \sigma X_1$ and follows from the properties of a location-scale transformation. As (2) and (3) are equivalent, the parameters in the two cases must also be equal. Specifically, by equating the scale parameters in the two formulations of the characteristic function of $Y$, it follows that $\sigma^{\alpha} = 2$. Solving for $\alpha$ gives
$$\alpha = \frac{\log 2}{\log \sigma}. \tag{4}$$
Therefore, the problem of estimating $\alpha$ is equivalent to that of estimating $\sigma$, the scale parameter relating $X_1$ and $Y$. This relation forms the basis of the estimation procedure proposed in this paper.
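The link between the scale of a two-term sum and the stability index can be checked numerically. The sketch below is illustrative only and is not the estimation method of this paper: it draws symmetric stable variates with the Chambers-Mallows-Stuck formula, estimates the scale ratio crudely from interquartile ranges, and converts it to an index value. The names `sym_stable`, `iqr` and `alpha_from_sigma`, the sample size and the seed are all assumptions made for the example.

```python
import math
import random

def sym_stable(alpha, rng):
    """One draw from a standard symmetric stable law (Chambers-Mallows-Stuck, beta = 0)."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)  # Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1 / alpha)
            * (math.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

def iqr(values):
    """Crude interquartile range based on order statistics."""
    s = sorted(values)
    n = len(s)
    return s[(3 * n) // 4] - s[n // 4]

def alpha_from_sigma(sigma):
    """Stability index implied by the scale ratio sigma = 2 ** (1 / alpha)."""
    return math.log(2) / math.log(sigma)

rng = random.Random(0)   # fixed seed, chosen arbitrarily
alpha = 1.5              # hypothetical true index
x = [sym_stable(alpha, rng) for _ in range(20000)]
y = [sym_stable(alpha, rng) + sym_stable(alpha, rng) for _ in range(20000)]

# In the symmetric case the location shift is zero, so IQR(Y) / IQR(X) estimates sigma.
sigma_hat = iqr(y) / iqr(x)
print(sigma_hat, 2 ** (1 / alpha), alpha_from_sigma(sigma_hat))
```

The interquartile-range ratio is used here only because it is simple; it ignores the efficiency gains that a method using many quantiles would provide.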
Now, let $X$ and $Y$ be two independent random variables such that $Y \stackrel{d}{=} \mu + \sigma X$ for appropriate constants $\mu$ and $\sigma > 0$. For the time being, assume that independent samples of $X$ and $Y$ are observed. Potgieter & Lombard (2012) proposed a nonparametric method called asymptotic likelihood (AL) for estimating $\mu$ and $\sigma$ from the two independent samples. Their method only assumes that the random variables $X$ and $Y$ have continuous and strictly increasing distribution functions with differentiable density functions. Despite the general lack of closed-form expressions for the density and distribution functions, stable distributions do satisfy these assumptions. The AL method can therefore be applied to the stable setting to estimate $\sigma$, and subsequently $\alpha$, from the independent $X$- and $Y$-samples. The question of obtaining these samples is further addressed in Section 3, while the remainder of this section gives a brief overview of the implementation of the AL method in the present setting.
Let and (correspondingly and ) denote the respective distribution (density) functions of random variables and . For iid random samples and , denote the respective order statistics by and . Let and denote, respectively, empirical distribution functions of and made continuous using linear interpolation. That is, let for , for ,
[TABLE]
and for , define to be the interpolated value between the pairs and , . A similar definition holds for . The continuous empirical quantile functions and are uniquely defined by the relation .
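A minimal Python sketch of a linearly interpolated empirical quantile function follows. The plotting positions $(k - 0.5)/n$ assigned to the order statistics are an assumption made for illustration and may differ from those in the definition above; the function name `interp_quantile` is likewise hypothetical.

```python
def interp_quantile(sample, p):
    """Quantile of the linearly interpolated empirical distribution function.

    Assumption: the k-th order statistic is assigned probability (k - 0.5) / n.
    Outside the range of these probabilities the extreme order statistics are returned.
    """
    s = sorted(sample)
    n = len(s)
    h = n * p + 0.5          # fractional 1-based rank implied by the plotting positions
    if h <= 1:
        return s[0]
    if h >= n:
        return s[-1]
    k = int(h)               # lower neighbouring order statistic (1-based)
    frac = h - k
    return s[k - 1] + frac * (s[k] - s[k - 1])

print(interp_quantile([3.0, 1.0, 4.0, 2.0], 0.5))
```

Because the interpolated distribution function is continuous and strictly increasing between the order statistics, its inverse is well defined, which is the property the two-sample method relies on.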
Now, when the relation holds, the quantile functions and satisfy
[TABLE]
Define and . Potgieter & Lombard (2012) show that for fixed , the independent random vectors , and , converge in distribution to multivariate normal distributions with common covariance matrix as .
Now, let , and define parameter vector . The parameters in can be estimated using the established asymptotic normality. Define vectors
[TABLE]
and
[TABLE]
where and are kernel density estimators of and respectively. It then follows that as , converges in distribution to a multivariate normal distribution with zero mean and covariance matrix given by
[TABLE]
where and the component of the asymptotic log-likelihood of involving the parameters is . Now define,
[TABLE]
where is with replaced by . The estimator that minimizes , cannot be expressed in closed form but can easily be found using standard numerical optimization routines. The component estimators and are called the AL estimators of and .
3 Split–sample estimator
The method for estimating $\alpha$ outlined in Section 2 assumes the availability of two independent samples satisfying the specified location-scale relationship, while only the equivalent of an $X$-sample is observed in practice. Suppose this sample consists of $n$ observations from a stable distribution with unknown index parameter $\alpha$. It is possible to create two independent samples from the single observed sample using the following method: First, select a subset of the observations at random and treat them as the $X$-sample. Next, randomly form pairs from the remaining observations and sum the observations in each pair. Treat these sums as the $Y$-sample. By the properties of stable distributions, the relation $Y \stackrel{d}{=} \mu + \sigma X$ holds for these constructed samples. Additionally, the $X$- and $Y$-samples constructed in this way are independent and therefore the AL method proposed by Potgieter & Lombard can be used to estimate $\sigma$ and then, subsequently, $\alpha$ using (4). It should be noted that although the method guarantees $\hat{\sigma} > 0$, the estimator resulting from applying (4) to $\hat{\sigma}$ may not be in the interval $(0, 2]$. It is therefore reasonable to define
[TABLE]
The perceived discontinuity in the definition of the estimator results from a discontinuity in (4) when $\sigma = 1$. Since this method involves splitting the sample, we refer to the estimator as the split-sample estimator (SSE) of $\alpha$.
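The splitting step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `split_sample` and the argument `n_x` (the size of the first sample) are hypothetical, and the sketch simply requires that the observations left over after the first selection can be paired evenly.

```python
import random

def split_sample(data, n_x, rng):
    """Randomly split data into an X-sample and a Y-sample of pair sums."""
    if (len(data) - n_x) % 2 != 0:
        raise ValueError("the remaining observations must pair up evenly")
    shuffled = list(data)
    rng.shuffle(shuffled)              # one random permutation of the data
    x = shuffled[:n_x]                 # first n_x values form the X-sample
    rest = shuffled[n_x:]
    # adjacent values of the shuffled remainder form the random pairs
    y = [rest[i] + rest[i + 1] for i in range(0, len(rest), 2)]
    return x, y

rng = random.Random(1)                 # arbitrary seed
x, y = split_sample(list(range(1, 13)), 4, rng)
print(x, y)
```

No observation is used in both samples, which is what makes the two constructed samples independent when the original observations are iid.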
The estimator proposed above, of course, uses only one random permutation of the data. Ideally, all possible sample permutations would be constructed, and each permutation would be used to construct an estimate of and/or . Finally, these estimators would then be combined to create some ensemble estimator of . Specifically, let be a function that is permutation-invariant in the first arguments and also permutation-invariant in the last arguments . For a random permutation of the integers , define
[TABLE]
to be the statistic calculated for the permutation . Letting denote the total number of possible permutations of the data, define
[TABLE]
where denotes the value of statistic for the kth permutation of the data. Note that the effect here is that of creating a U-statistic with a symmetric kernel. Whereas is not symmetric in its arguments, is such. In the present setting, the statistic evaluated for one permutation represents an AL estimate of $\sigma$ based on a single permutation of the data, whereas the ideal is to calculate by evaluating for all possible data permutations. However, this is not realistic, as there are unique possible $X$- and $Y$-samples that can be created using the proposed data-splitting method. Even when and are only moderately large, this constitutes too large a number of sample permutations to practically evaluate all of them. For example, considering the scenario with sample size , there are approximately such sample permutations and when , there are approximately such sample permutations. The number of possible sample permutations grows at a super-exponential rate.
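To give a feel for how quickly the number of distinct splits grows, the sketch below counts them under a simple combinatorial argument: choose which observations form the first sample, then partition the rest into unordered pairs. The count $n!/(n_x!\, n_y!\, 2^{n_y})$ used here is derived from that argument and is an assumption of this example rather than a formula quoted from the text; the names `n_splits`, `n_x` and `n_y` are hypothetical.

```python
import math

def n_splits(n_x, n_y):
    """Number of distinct (X-sample, Y-sample) splits of n = n_x + 2 * n_y observations.

    Choose n_x observations for the X-sample, then partition the remaining
    2 * n_y observations into n_y unordered pairs.
    """
    n = n_x + 2 * n_y
    return math.factorial(n) // (math.factorial(n_x) * math.factorial(n_y) * 2 ** n_y)

# Hypothetical sample sizes, with n_x = n_y in each case.
for size in (4, 10, 20):
    print(size, n_splits(size, size))
```

Even for the small sizes in the loop the count is far beyond what could be enumerated, which motivates using a manageable number of random splits instead.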
Of course, the inability to evaluate all possible data permutations should not steer one towards the other extreme where only a single data permutation is used to estimate $\sigma$, as a big loss in efficiency could result. A compromise is proposed, in that the data permutation process creating the $X$- and $Y$-samples is repeated some “large” number of times, and the resulting estimates of $\sigma$ can then be combined in an appropriate manner to estimate $\alpha$.
Let denote the estimates of resulting from randomly splitting the sample times. The question of how to combine these to create an estimate is now considered. Proposed here are three ways of combining the estimates to find an estimate of :
- (i) Average the estimates of $\sigma$ over the random splits and apply transformation (7) to this average to obtain the first estimator of $\alpha$,
[TABLE]
- (ii) Apply transformation (7) to the estimate of $\sigma$ from each random split, giving one estimate of $\alpha$ per split, and define the second estimator as the average of these values.
- (iii) Define the third estimator as the median of the per-split estimates of $\alpha$ from (ii). That is, instead of taking the average as in (ii), the median of the estimates of $\alpha$ is evaluated.
Here, the estimators in (i) and (ii) fall within the outlined framework of approximating a U-statistic with symmetric kernel. On the other hand, the estimator in (iii) is outside this framework, but is included as a robust alternative. The performance of the three estimators, as well as the number of splits to be used, are investigated in the simulation study presented in the next section.
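The three ways of combining the per-split scale estimates can be sketched as follows. The truncation rule in `to_alpha` is an assumption standing in for transformation (7), whose exact form is not reproduced here: values of the scale estimate at or below 1 are mapped to a lower bound (taken as 0 for illustration), and values that would give an index above 2 are mapped to 2. The names `to_alpha` and `combine` are hypothetical.

```python
import math
from statistics import mean, median

def to_alpha(sigma, lower=0.0, upper=2.0):
    """Map a scale estimate to the index scale with truncation (assumed rule)."""
    if sigma <= 1.0:
        return lower   # log(sigma) <= 0, so log(2) / log(sigma) is not a valid index
    return min(upper, math.log(2) / math.log(sigma))

def combine(sigma_hats):
    """Return the three combined estimates described in (i), (ii) and (iii)."""
    a1 = to_alpha(mean(sigma_hats))          # (i) average on the sigma scale, then transform
    alphas = [to_alpha(s) for s in sigma_hats]
    a2 = mean(alphas)                        # (ii) transform each split, then average
    a3 = median(alphas)                      # (iii) median of the per-split index estimates
    return a1, a2, a3

# Hypothetical per-split scale estimates.
print(combine([2.0, 1.2, 4.0]))
```

In this toy input one of the three scale estimates is truncated to the upper boundary, which pulls the average in (ii) upward while leaving the median in (iii) unaffected.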
4 Simulation Study
The accuracy of the estimators defined in the previous section will depend on the number of random splits used. Too small a number results in an estimate with large variability, while a very large number detracts from the practical viability of the approach due to computational cost. Therefore, a good choice of the number of splits is essential. A simulation study has been performed to assess how this choice affects the defined estimators. In the simulation, samples of size were drawn from standardized stable distributions ( and ) for values and . The samples were split such that when evaluating the AL estimator. In addition, the AL method was implemented using equally spaced $p$-values, . The three estimators , were evaluated for . A total of random samples were drawn for each configuration of . The mean value of the estimate and the Monte Carlo RMSE were evaluated for each configuration.
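For reference, the Monte Carlo summaries reported in the tables can be computed from a list of simulated estimates as in the short sketch below. The function name `bias_rmse` and the toy inputs are assumptions made for illustration.

```python
import math

def bias_rmse(estimates, true_value):
    """Monte Carlo bias and root mean squared error of a list of estimates."""
    errors = [e - true_value for e in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err * err for err in errors) / len(errors))
    return bias, rmse

# Toy example: three hypothetical estimates of a true index of 1.5.
print(bias_rmse([1.4, 1.5, 1.7], 1.5))
```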
Table 1 presents results for the case and Table 2 presents results for . Generally, as the number of sample splits increases, both the bias and the RMSE tend to decrease. Generally, there is a steady decrease in RMSE when going from to , but there is only a small decrease in RMSE when going from to , while going from to , the decrease seems to be negligible. Simulation results for parameter configurations not presented here all show a similar pattern to those seen in Tables 1 and 2. It is clear that RMSE continues to decrease as increases, but that this reduction diminishes for . To illustrate, in the context of Table 2, when further increasing to , the RMSE of decreases from at to at and the RMSE of similarly decreases from to . For practical purposes, a recommendation is made to use . This value ensures fast computation, but already shows good performance. A practitioner who wanted to see any further improvement in RMSE would have to choose a whole order of magnitude larger.
When comparing the three estimators, it should be noted that the estimator in (ii) consistently performs much worse (in terms of bias and RMSE) than both of the others. This can be explained, at least in part, by the truncation that occurs when applying transformation (7) from the $\sigma$-scale to the $\alpha$-scale. When a large proportion of the split-sample estimates of $\sigma$ fall outside the range that maps into the parameter space, that same proportion of the resulting estimates of $\alpha$ lie on the boundaries of the parameter space. As the estimator in (ii) is calculated by averaging on the $\alpha$-scale, a large proportion of boundary values can increase the bias of the estimate. On the other hand, the estimator in (i) is calculated by averaging on the $\sigma$-scale and the truncation only comes into play when the average itself is outside this range. Similarly, since the estimator in (iii) is the median of the split-sample estimates, the truncation only comes into play if more than 50% of the values are truncated to a specific boundary.
To illustrate the occurrence of boundary values, Table 3 reports the results of a simulation study in which samples were generated from a standard stable distribution with and with . For each simulated set of data, values were calculated from random splits. The table reports the average percentage of split-sample estimates that were truncated to each of the two boundaries.
Table 3 About Here
The content of Table 3 is unsurprising. As the sample size increases, the occurrence of truncation to the boundaries decreases. The one exception is when . This value is very close to the boundary and a very large sample size would have to be observed before there will be substantial decrease in boundary truncation.
In terms of a “best estimator”, there does not appear to be a clear choice between and . In Tables 1 and 2, the RMSE of is generally smaller than that of , but the RMSE values are very close to one another. Figure 1 shows the RMSE of the three estimators for with , and based on samples for each value of . A simple smoother was applied to the RMSE values to enhance readability of the plot. Similar plots (not shown here) were produced for settings with and also ; the same general trends were visible in these.
Figure 1 About Here
Inspection of Figure 1 shows that and perform better than over a large part of the parameter space. The estimator performs better in the approximate range , but performs very poorly outside this range. As the true becomes smaller and the underlying data distribution has heavier tails, has the best performance among the three estimators. When , performs better than the other two estimators. Generally, the RMSE of and are very similar with having smaller RMSE over a large range of .
5 Choice of $k$ and the $p$-values
The companion questions of how large to choose $k$ and how to choose the values $p_1, \ldots, p_k$ have not yet been addressed. Intuition might suggest that one should choose $k$ as large as possible. However, the estimator can perform very poorly when $k$ is chosen too large. To illustrate, samples of size were generated from a Cauchy distribution ($\alpha = 1$) and estimators were calculated for each sample using equi-spaced , for and . Based on simulated samples, , and . The accuracy of the estimator decreases as $k$ increases. The AL method of Potgieter & Lombard (2012) is similar to a GLS method described by Hsieh (1995). A result from the latter shows to be the optimal sample-size dependent rate for $k$. Practically, this means $k$ should not be chosen too large. Based on extensive simulation work, it is recommended that choices or be used for most applications. These values performed well across a wide range of sample sizes. If the underlying distribution has very heavy tails, say , the choice tends to be more robust to outliers.
Once a choice of has been made, it is still unclear what the best approach is to choosing the values . This will be investigated in terms of the asymptotic distribution of the estimation of based on a single random split. Potgieter & Lombard (2012) derive an expression for the asymptotic covariance matrix of where and denote the AL estimators based on two independent samples. Denote this covariance matrix . The elements of the matrix are somewhat tedious expressions involving the points , as well as the density and quantile functions and of the underlying distribution. These expressions are omitted for brevity. For , a standard application of the delta method gives
[TABLE]
where indicates the element in the row, column of .
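The delta-method step can be illustrated with a small Python sketch. It assumes the index is obtained from the scale through $\alpha = \log 2 / \log \sigma$, so that the derivative is $-\log 2 / (\sigma (\log \sigma)^2)$, and it takes the asymptotic variance of the scale estimator as a given number `var_sigma`. That argument stands in for the relevant diagonal element of the covariance matrix and is a hypothetical input here, as are the function names; any sample-size normalization in the paper's expression is not modeled.

```python
import math

def dalpha_dsigma(sigma):
    """Derivative of alpha = log(2) / log(sigma) with respect to sigma."""
    return -math.log(2) / (sigma * math.log(sigma) ** 2)

def avar_alpha(alpha, var_sigma):
    """Delta-method variance of the index estimator given the variance of the scale estimator."""
    sigma = 2 ** (1 / alpha)            # scale implied by the index
    return dalpha_dsigma(sigma) ** 2 * var_sigma

# Hypothetical inputs: Cauchy case with a unit variance for the scale estimator.
print(avar_alpha(1.0, 1.0))
```

Since the squared derivative equals $\alpha^4 / (\sigma^2 (\log 2)^2)$ at $\sigma = 2^{1/\alpha}$, the same uncertainty in the scale translates into more uncertainty in the index as the index approaches 2.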
The asymptotic variance in (10) does provide a view of the difficulty inherent in choosing “optimal” $p$-values. Noting that is an implicit function and for the underlying stable distribution, one could choose the $p$-values such that is minimized. Of course, in practice this is not possible as and are unknown. Simulation studies not reported here were done to see how such optimal values would compare against the simple choice of equi-spaced values $p_j = j/(k+1)$, $j = 1, \ldots, k$. While improvement in RMSE was observed in some instances, the simple choice of equi-spaced $p$-values was usually very competitive with (and in a few instances outperformed) the asymptotically optimal values. Therefore, the recommendation is made here to use equally spaced values. Further study is recommended to develop an adaptive approach for choosing the $p$-values. For example, after finding an initial estimate of $\alpha$, this initial estimate can be used to update the $p$-values and then re-estimate $\alpha$.
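A short Python sketch of the equally spaced design points is given below. The spacing $j/(k+1)$ is an assumption; it is consistent with the later remark that 19 points correspond to the 5th through 95th sample percentiles. The function name `equispaced_p` is hypothetical.

```python
def equispaced_p(k):
    """Equally spaced probabilities j / (k + 1) for j = 1, ..., k (assumed spacing)."""
    return [j / (k + 1) for j in range(1, k + 1)]

for k in (3, 9, 19):
    p = equispaced_p(k)
    print(k, p[0], p[-1])
```

Larger values of `k` push the outermost design points further into the tails, where sample quantiles of heavy-tailed data are most variable.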
6 Comparison of estimators
In this section, the split-sample AL approach developed is compared with two other existing methods for estimating the stability index. The first of these, maximum likelihood (ML), is computationally expensive. However, it is included here as a performance benchmark, as maximum likelihood estimators are asymptotically unbiased and minimum variance estimators. Next, McCulloch’s quantile estimator (MQE) is also considered. The MQE estimator is widely used in practice and, similar to the split-sample estimator, is based on sample order statistics. The split-sample estimator was evaluated based on random partitions and using equi-spaced $p$-values for $k = 3$, $9$ and $19$ (hereafter referred to as SSE$k$). Samples were drawn from a distribution with and for sample sizes . The RMSE was estimated from samples drawn for each parameter configuration. The results for the case are reported below in Table 4. In this table, the boldface entries correspond to the estimates with the smallest and second smallest RMSE.
Table 4 About Here
Several interesting observations can be made upon inspection of Table 4. Consider first the symmetric case where $\beta = 0$. As one would expect, the ML estimator has the smallest RMSE among all the estimators considered. When the sample size is small (), the MQE has the second smallest RMSE. It should be noted, however, that it only performs marginally better than the split-sample estimator. Specifically, SSE3 is competitive at sample size , as is SSE9 at sample size . For sample size , SSE9 outperforms MQE. A few additional simulations were done at even larger sample sizes and this trend was also observed there. In larger samples, SSE9 always has smaller RMSE than MQE. It should also be noted that no numerical values for RMSE were reported for estimator SSE19 and sample size , as convergence problems were frequently encountered when calculating the estimator. This is likely an artifact of using $p_1 = 0.05$ and $p_{19} = 0.95$ (corresponding to the 5th and 95th sample percentiles) to do estimation in a small sample drawn from a heavy-tailed distribution. These sample percentiles have large variance and will often be close to what some might label extreme observations. In the asymmetric case (), the estimator SSE9 always performs better than the MQE.
These simulation results are fairly representative of what was observed for other values of . Generally, SSE19 becomes the preferred estimator over SSE9 as either the value of increases or as the sample size increases. Additionally, the relative efficiency of the methods was also estimated in the simulation study. Table 5 reports the ratios for a sample size of . Here, a value less than $1$ indicates superiority of SSE9 relative to MQE.
Table 5 About Here
In Table 5, it is evident that SSE9 generally performs much better than MQE. The estimated relative efficiency is often well below $1$. There are two notable exceptions to this general statement. In the symmetric case, when ranges from to , the MQE is very competitive, even outperforming SSE9 slightly at . In the asymmetric case, MQE performs better than SSE9 when , but nowhere else.
7 Recommendations
The split-sample approach shows promise as a method for estimating the stability index $\alpha$. In a side-by-side comparison with the McCulloch quantile estimator, the split-sample approach frequently outperforms the McCulloch estimator. The split-sample approach could conceivably be further improved by choosing the design points in some adaptive way as suggested at the end of Section 5. Additionally, the problem of estimating the standard error of the estimator has not been considered here. The bootstrap is one option, but does suffer from computational cost in that it becomes a nested problem involving a first-level bootstrap sampling procedure and a second-level data splitting procedure. Both of these questions are being considered by the authors in ongoing research.
As a final remark, the question of computational cost does arise when considering the practical implementation of the split-sample approach. Specifically, the process of permuting the data does become more time-consuming as the sample size gets large. However, the estimator proposed is highly parallelizable. In large sample situations, this will more than compensate for any computing time required to permute the data repeatedly.
Tables and Figures
References
- Brockwell, P & Brown, B (1981), ‘High-efficiency estimation for the positive stable laws,’ Journal of the American Statistical Association, 76 (375), pp. 626–631.
- Buckle, D (1995), ‘Bayesian inference for stable distributions,’ Journal of the American Statistical Association, 90 (430), pp. 605–613.
- Chambers, J, Mallows, C & Stuck, B (1976), ‘A method for simulating stable random variables,’ Journal of the American Statistical Association, 71, pp. 340–344.
- Copas, J (1975), ‘On the unimodality of the likelihood for the Cauchy distribution,’ Biometrika, 62 (3), pp. 701–704.
- Csorgo, S (1981), ‘Limit behaviour of the empirical characteristic function,’ The Annals of Probability, pp. 130–144.
- de Haan, L & Resnick, S (1980), ‘A simple asymptotic estimate for the index of a stable distribution,’ Journal of the Royal Statistical Society Series B (Methodological), 42, pp. 83–87.
- Du Mouchel, WH (1973), ‘Stable distributions in statistical inference: 1. Symmetric stable distributions compared to other symmetric long-tailed distributions,’ Journal of the American Statistical Association, 68 (342), pp. 469–477.
- Du Mouchel, WH (1975), ‘Stable distributions in statistical inference: 2. Information from stably distributed samples,’ Journal of the American Statistical Association, 70 (350), pp. 386–393.
