Limit Laws for Sums of Logarithms of k-Spacings
Paul Deheuvels

TL;DR
This paper proves that the sum of logarithms of k-spacings in a sample, suitably centered and scaled, is asymptotically normal as the sample size grows.
Contribution
The paper establishes asymptotic normality for sums of logarithms of k-spacings under general conditions.
Findings
The sum of logarithms of k-spacings is asymptotically normal.
The result holds for a wide class of distribution functions with Riemann integrable densities.
The findings extend and complete prior research on k-spacings.
Abstract
Let $Z = Z_1, \ldots, Z_n$ be an i.i.d. sample from the absolutely continuous distribution function $F(z) := P(Z \le z)$, with density $f(z) := \frac{d}{dz}F(z)$. Let $Z_{1,n} < \cdots < Z_{n,n}$ be the order statistics generated by $Z_1, \ldots, Z_n$. Let $Z_{0,n} = a := \inf\{z : F(z) > 0\}$ and $Z_{n+1,n} = b := \sup\{z : F(z) < 1\}$ denote the end-points of the common distribution of these observations, and assume that the density $f$ is Riemann integrable and bounded away from 0 over each interval $[a', b'] \subset (a, b)$. For a specified $k \ge 1$, we establish the asymptotic normality of the sum of logarithms of the k-spacings $Z_{i+k-1,n} - Z_{i-1,n}$ for $i = 1, \ldots, n-k+2$. Our results complete previous investigations in the literature conducted by Blumenthal, Cressie, Shao and Hahn, and the references therein.
1. Introduction and Results
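Let $X_1, X_2, \ldots$ be a sequence of independent replicæ of a non-degenerate random variable $X$, with distribution function $F(x) := P(X \le x)$, defined for $x \in \mathbb{R}$. We denote by $a := \inf\{x : F(x) > 0\}$ and $b := \sup\{x : F(x) < 1\}$ the distribution end-points, and we assume that a version $f(x) := \frac{d}{dx}F(x)$ of the density of $X$ exists for $x \in (a, b)$ and is Riemann integrable and bounded away from 0 over each interval $[a', b'] \subset (a, b)$. For each $n \ge 1$, we set $X_{0,n} := a$ and $X_{n+1,n} := b$, and we denote the order statistics of $X_1, \ldots, X_n$ by $X_{1,n}, \ldots, X_{n,n}$, which fulfill almost surely the strict inequalities
$$a = X_{0,n} < X_{1,n} < \cdots < X_{n,n} < X_{n+1,n} = b. \tag{1}$$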
Given a specified integer , we are concerned with the limiting behavior as of the sums of the logarithms of the k-spacings , defined for by
Since the first and last among the k-spacings in (2), namely, and , are possibly infinite, we set
and consider only the finite k-spacings .
Our goal is to investigate the limiting behavior, as , of the statistic
defined for each set of integers , , , and . In the present paper, we establish the asymptotic normality of when k, p, and q are fixed, and . The motivation for these statistics is to provide goodness-of-fit tests of the null hypothesis that X is uniformly distributed on with , against the alternative. Darling [1] introduced the statistic , and, later, Blumenthal [2] showed that, under the assumption that f is continuous on and bounded away from 0 on this interval, we have, as
Here, and in the sequel, we write “ ” to denote weak convergence and “ ” to denote equality in distribution. We let stand for the Gaussian distribution with mean and variance . Throughout, we use the convention that and . We denote by Euler’s constant (see, e.g., (14) below) and set for the value taken by the Riemann zeta function for (we refer to Remark 2 in the sequel for some basic facts concerning these mathematical objects).
Under the null hypothesis that X is uniformly distributed on , which implies that a.s., (6) reduces to
For , denote by the upper quantile of the distribution. It follows from (6) that the test rejecting when is asymptotically consistent with size when . This result may be refined by using the exact distribution of , which was obtained in tractable form by Deheuvels and Derzko [3]. The corresponding results allow for the practical use of the so-called Darling–Blumenthal test of uniformity.
Some practical problems arise for the use of the above-described test in the presence of ties (see, e.g., pp. 118–124 in Hájek and Šidák [4]) when some of the observed spacings in the sequence are null, in which case is not properly defined. In practice, the use of the k-spacings for the choice of that is sufficiently large allows us to overcome this difficulty. This motivates the study of the limiting behavior for a specified choice of the integer .
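The effect of ties can be illustrated with a minimal Python sketch. It assumes the k-spacings are the differences $X_{i+k-1,n} - X_{i-1,n}$ for $i = 1, \ldots, n-k+2$, with $X_{0,n} = a$ and $X_{n+1,n} = b$, as in the abstract. The function names `k_spacings` and `sum_log_k_spacings` are hypothetical, and skipping infinite spacings and returning `-inf` on a tie are illustrative conventions, not the paper's definition of its statistic.

```python
import math

def k_spacings(sample, k, a, b):
    """Return X_{i+k-1,n} - X_{i-1,n} for i = 1..n-k+2, with X_{0,n}=a, X_{n+1,n}=b."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    xs = [a] + sorted(sample) + [b]          # extended order statistics, length n + 2
    return [xs[i + k - 1] - xs[i - 1] for i in range(1, len(xs) - k + 1)]

def sum_log_k_spacings(sample, k, a=0.0, b=1.0):
    """Sum of logarithms of the finite k-spacings (illustrative conventions only)."""
    total = 0.0
    for d in k_spacings(sample, k, a, b):
        if math.isinf(d):
            continue                         # assumption: skip spacings touching an infinite end-point
        if d <= 0.0:
            return -math.inf                 # a tie gives a null spacing, so log is undefined
        total += math.log(d)
    return total

tied = [0.2, 0.5, 0.5, 0.9]                  # one tie at 0.5
print(sum_log_k_spacings(tied, k=1))         # a null 1-spacing breaks the sum
print(sum_log_k_spacings(tied, k=2))         # every 2-spacing is positive here
```

With a single tie, every 2-spacing spans at least one non-null gap, which is why a larger k restores a finite sum in this example.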
We work under the assumptions listed in (F.1)–(F.3) below:
- (F.1) ;
- (F.2) Either and f is Riemann integrable and bounded away from 0 in a right neighborhood of a, or and f is monotone in a right neighborhood of a;
- (F.3) Either and f is Riemann integrable and bounded away from 0 in a left neighborhood of b, or and f is monotone in a left neighborhood of b.
Our main result is stated in Theorem 1 below. We set for the digamma function (see Remark 2 in the sequel for details on Euler’s constant , Riemann’s function, the Gamma function , and the polygamma functions for ).
Theorem 1. Under (F.1)–(F.3), for each specified set of integers , , and , we have, as ,
Remark 1. Under the assumptions above, for any specified set of integers , , and , we have
Therefore, when the conclusion of Theorem 1 holds for some specified pairs of integers with and , it holds for all other pairs of integers , fulfilling this condition. Because of this, we set and give a proof of the theorem for and .
The proof of Theorem 1 is given in the next section, together with additional results of interest. We mention at this point some historical details about the sums of the logarithms of spacings and related topics. The study of spacings has received considerable attention in the literature ever since the pioneering work of Darling [1] (see, e.g., Pyke [5,6], Deheuvels [7], and the references therein). To the best of our knowledge, the result closest to Theorem 1 was obtained by Cressie [8,9] for and . Cressie established a variant of (8) under the assumption that the density f of X is a bounded step function (see Theorem 5.1, p. 352 in [8]). For and , a version of Theorem 1 was given by Blumenthal [2] under rather stringent conditions on f, assumed to be at least twice differentiable. Shao and Hahn [10] substantially improved Blumenthal’s theorem by showing that (8) holds for , , and when under the assumption that f is Riemann integrable and bounded away from 0 on . Recently, Deheuvels and Derzko [11] (see also [3]) relaxed the assumptions of Blumenthal and of Shao and Hahn by giving a version of Theorem 1 for , allowing a and b to be possibly infinite. The present paper improves these results by covering the case of an arbitrary under less stringent conditions on f. We refer to Shao and Hahn [10], del Pino [12,13], Czekała [14], and the references therein for discussions and further references on the statistical applications of this theorem.
Remark 2.( ) The following relations and definitions hold, relating Euler’s constant to the Riemann zeta function in (6) (see, e.g., Spanier and Oldham [15] and Gradshteyn and Ryzhik [16], p. xxix). We have, for each and ,
where $B_n$ is the n-th Bernoulli number. In particular, for r = 2 and m = 1,
The Bernoulli numbers $\{B_n : n \ge 0\}$ are defined as the constants in the expansion
The Euler constant γ may be defined by either one of the relations
( ) The Euler Gamma function , digamma function , and polygamma function are respectively defined via the relations (see, e.g., §§6.3–6.4 in Abramowitz and Stegun [17]) for and ,
which fulfills
In particular, when is an integer, we obtain (see, e.g., Formulas 6.3.2 and 6.4.2 in [17])
which fulfills
( ) Routine computations show that, as ,
whence, as ,
( ) In view of (12) and (20), we readily obtain that, for ,
This, in turn, implies that, as ,
so that the limiting variance of in (8) equals
as . ( ) Likewise, we infer from (14) that, as ,
so that the limiting centering factor of in (8) equals
as . By all this, for large specified values of k, follows approximately, as , a normal distribution, with expectation and variance . This gives a heuristic motivation for the use of the statistic (taken with specified large values of k) to estimate the factor .
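The integer-argument values of the digamma and trigamma functions, and their behavior for large k, can be checked numerically. The sketch below relies only on the classical identities $\psi(k) = -\gamma + \sum_{j=1}^{k-1} 1/j$ and $\psi'(k) = \pi^2/6 - \sum_{j=1}^{k-1} 1/j^2$, together with the standard leading-order approximations $\psi(k) \approx \log k - 1/(2k)$ and $\psi'(k) \approx 1/k + 1/(2k^2)$. The displayed expansions of Remark 2 are not legible in this copy, so these need not match them term by term; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded to double precision

def digamma_int(k):
    """psi(k) = -gamma + sum_{j=1}^{k-1} 1/j for an integer k >= 1."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))

def trigamma_int(k):
    """psi'(k) = pi^2/6 - sum_{j=1}^{k-1} 1/j^2 for an integer k >= 1."""
    return math.pi ** 2 / 6.0 - sum(1.0 / j ** 2 for j in range(1, k))

# Compare exact integer values with the leading-order large-k approximations.
for k in (1, 2, 10, 100):
    psi_approx = math.log(k) - 0.5 / k
    tri_approx = 1.0 / k + 0.5 / k ** 2
    print(k, digamma_int(k), psi_approx, trigamma_int(k), tri_approx)
```

The approximations are poor for k = 1 and improve quickly as k grows, which is consistent with the large-k heuristic described in Remark 2.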
2. Proofs
2.1. Properties of the Gauss Hypergeometric Function
In our proofs, we make use of a series of identities related to hypergeometric functions, which are of independent interest. For any and , define the Pochhammer function by
We note that, whenever and ,
In particular, we have for each . We refer to 18:3:1, 18:3:2, and 18:10:1 in Spanier and Oldham [15] for additional properties of the Pochhammer function. Recalling (16), (23) and (24), we readily obtain that, for , and
The usual Gauss hypergeometric function is defined for and with by
The function is defined for when (see, e.g., Ch. 4 in Rainville). In particular (see, e.g., 60:7:2 in [15]), when this condition holds,
In particular, we have
The general hypergeometric function of order is defined for integer by
The following identity relates higher-order hypergeometric functions to lower-order ones. We have
In particular,
For , , , and , we have (see, e.g., 60:10:3 in [15])
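The objects introduced above can be made concrete with a short Python sketch. It uses the standard definitions of the Pochhammer symbol $(a)_n = a(a+1)\cdots(a+n-1)$ and of the Gauss series $\sum_n (a)_n (b)_n z^n / ((c)_n\, n!)$, and checks them against two classical closed forms: $_2F_1(1,1;2;z) = -\log(1-z)/z$ and Gauss's summation $_2F_1(a,b;c;1) = \Gamma(c)\Gamma(c-a-b)/(\Gamma(c-a)\Gamma(c-b))$ for $c-a-b>0$. Whether these are exactly the identities displayed in this subsection cannot be confirmed from this copy; the function names are hypothetical.

```python
import math

def pochhammer(a, n):
    """Rising factorial (a)_n = a (a+1) ... (a+n-1), with (a)_0 = 1."""
    result = 1.0
    for j in range(n):
        result *= a + j
    return result

def hyp2f1_series(a, b, c, z, terms=200):
    """Partial sum of the Gauss series, built from the ratio of consecutive terms."""
    total = 0.0
    term = 1.0                                   # n = 0 term
    for n in range(terms):
        total += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return total

def gauss_sum_at_one(a, b, c):
    """Gauss's closed form for 2F1(a, b; c; 1), valid when c - a - b > 0."""
    return math.gamma(c) * math.gamma(c - a - b) / (math.gamma(c - a) * math.gamma(c - b))

print(pochhammer(2.5, 3), math.gamma(5.5) / math.gamma(2.5))   # (a)_n = Gamma(a+n)/Gamma(a)
print(hyp2f1_series(1, 1, 2, 0.5), -math.log(0.5) / 0.5)
print(hyp2f1_series(1, 1, 3, 1.0, terms=20000), gauss_sum_at_one(1, 1, 3))
```

At z = 1 the series converges only algebraically, so the partial sum needs many terms; the closed form is the practical way to evaluate it.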
Proposition 1. We have, for , ,
and
Proof. By combining (27) with (28), taken with , so that and , we obtain
which is (32). By combining (25) with (32), we obtain, in turn, that
which is (33). The proof of (34) follows along the same lines with the formal replacement of by . Namely, we obtain
which is (34). □
Proposition 2. We have, for ,
and
Proof. By (27) and (28), we have, for ,
whence by letting and making use of (28),
When , we have, for each ,
so that
Next, we make use of (16), which yields the expansions
and
It follows readily that
By all this, we obtain
which is (35). Given (35), we infer from (25) and (35) that
which is (36). Likewise, we infer from (25) and (35) that
which yields (37). □
2.2. Preliminary Results and Moment Calculations
The special case where X follows a uniform distribution on will play an instrumental role in our proofs. For a general F, keeping in mind that the existence of implies that F is continuous, we set , and we observe that these random variables are independent, each with a uniform distribution on . For each , we denote by
the order statistics of , with the convention that and for . We note that the inequalities in (38) hold a.s. We therefore assume, without loss of generality, that they are fulfilled on the probability space on which is defined. The uniform k-spacings are then given for by
For , denote by a random variable following a Gamma distribution with mean r. Namely, , and, for , has density on , given by
where for . When , is exponentially distributed with a unit mean. In this special case, we use the alternative notation . In general, for , we denote by an exponentially distributed r.v. Z with mean , fulfilling . For and , we denote by a random variable following a Beta distribution with parameters p and q, meaning that has density given by
The functions and are related by Euler’s formula. For any and , we have
We extend this definition when either or by setting
We refer to Ch. 17, 19, and 25 in Johnson, Kotz, and Balakrishnan [18,19] for useful details concerning the Gamma, exponential, and Beta distributions. In particular, we have the following useful distributional identity (see, e.g., p. 12 in David [20]). For any ,
In particular, we have the distributional identity, for any ,
The following lemma plays an instrumental role in our proofs.
Lemma 1. For , , and , let , , and be three independent Gamma-distributed r.v.’s. Then, the r.v.’s
are independent and follow and distributions, respectively. Set further
and
Then, the r.v. is independent of the random pair .
Proof. Several variants of the above results have been given in the literature (see, e.g., §25.2, p. 212 in Johnson, Kotz and Balakrishnan [19]). As the proofs are simple, we give details, limiting ourselves to (46). By the change of the variables and ⇔ and , the joint density of is given by
which is sufficient for our needs. □
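A Monte Carlo sketch can make the classical two-variable version of this kind of statement tangible: for independent Gamma variables with shapes p and q and unit scale, the ratio of the first to their sum follows a Beta(p, q) law and is independent of the sum. The exact ratios used in Lemma 1 are not legible in this copy, so the check below covers only this standard special case. The helper names are hypothetical, and a near-zero sample correlation is merely consistent with independence; it does not prove it.

```python
import math
import random

def simulate_ratio_and_sum(p, q, reps, seed=0):
    """Draw (G_p / (G_p + G_q), G_p + G_q) for independent unit-scale Gamma variables."""
    rng = random.Random(seed)
    ratios, sums = [], []
    for _ in range(reps):
        g_p = rng.gammavariate(p, 1.0)
        g_q = rng.gammavariate(q, 1.0)
        ratios.append(g_p / (g_p + g_q))
        sums.append(g_p + g_q)
    return ratios, sums

def mean(xs):
    return sum(xs) / len(xs)

def correlation(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

ratios, sums = simulate_ratio_and_sum(p=2.0, q=3.0, reps=20000)
print(mean(ratios))               # Beta(2, 3) has mean 2 / (2 + 3) = 0.4
print(mean(sums))                 # Gamma with shape 5 has mean 5
print(correlation(ratios, sums))  # should be close to 0
```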
In view of (38), set , and observe that constitutes a sequence of independent , unit mean exponential random variables. For each , the order statistics of fulfill the relations
Set, for convenience, for . We will need the following useful fact, closely related to Lemma 1 (refer to Sukhatme [21] and Malmquist [22], and see, e.g., pp. 20–21 in David [20]):
Fact 1. For each ,
are independent, each following an exponential distribution.
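The classical Sukhatme and Malmquist representation states that, for the order statistics $Y_{1,n} \le \cdots \le Y_{n,n}$ of n independent unit-mean exponential variables, the normalized spacings $(n-i+1)(Y_{i,n} - Y_{i-1,n})$, with $Y_{0,n} = 0$, are independent unit-mean exponentials. The sketch below assumes that Fact 1 is this statement, which cannot be confirmed from the present copy. It checks the means by simulation and verifies the deterministic inversion $Y_{i,n} = \sum_{j \le i} E_j/(n-j+1)$. Function names are hypothetical.

```python
import random

def normalized_spacings(sorted_ys):
    """Return (n - i + 1) * (Y_{i,n} - Y_{i-1,n}) for i = 1..n, with Y_{0,n} = 0."""
    n = len(sorted_ys)
    out, prev = [], 0.0
    for i, y in enumerate(sorted_ys, start=1):
        out.append((n - i + 1) * (y - prev))
        prev = y
    return out

def order_stats_from_spacings(spacings):
    """Invert the map: Y_{i,n} = sum_{j <= i} E_j / (n - j + 1)."""
    n = len(spacings)
    out, acc = [], 0.0
    for j, e in enumerate(spacings, start=1):
        acc += e / (n - j + 1)
        out.append(acc)
    return out

rng = random.Random(1)
n, reps = 5, 20000
totals = [0.0] * n
for _ in range(reps):
    ys = sorted(rng.expovariate(1.0) for _ in range(n))
    for i, e in enumerate(normalized_spacings(ys)):
        totals[i] += e
print([t / reps for t in totals])   # each average should be close to 1
```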
It will be convenient, later on, to make use of the relation following from (51),
In Lemma 2 below, we evaluate the moments of the logarithms of Gamma-distributed random variables, which will play an instrumental role later on. As usual, we make use of the convention that .
Lemma 2. Let be an integer, and let be a Gamma-distributed random variable with mean k. Then, for each ,
and
Proof. Recalling the definition (40) of for and the definition (53) of , we obtain that, by integrating by parts, for ,
For , these relations reduce to (see, e.g., Formula 4.331, p. 573, in Gradshteyn and Ryzhik [16])
Recalling the definition (18) of for , and the definition (53) of , by a straightforward induction on k, we infer from the above relations that, for an arbitrary (integer) ,
which is (53). Likewise, in view of (54) and (56), by integrating by parts, we see that, for ,
which is (54). In the same spirit, to establish (56), we integrate by parts to obtain the recursion formula, for ,
By combining (58) with (60), we readily obtain that
For , we combine (57) with the fact that (see, e.g., Formula 4.335, p. 574, in Gradshteyn and Ryzhik [16])
In view of (53), (60) and (61), the relation (56) is straightforward. □
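The first two moments of the logarithm of a Gamma variable can be checked by numerical integration. The sketch below uses the classical facts that, for a unit-scale Gamma variable with integer shape k, the logarithm has mean $\psi(k) = -\gamma + \sum_{j<k} 1/j$ and variance $\psi'(k) = \pi^2/6 - \sum_{j<k} 1/j^2$; for k = 1 these reduce to the integrals cited from Gradshteyn and Ryzhik. The displayed formulas of Lemma 2 are not legible in this copy, so this is a consistency check of the standard identities rather than a transcription. The substitution t = log x, the trapezoid rule, the integration limits, and the function name are illustrative choices.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded to double precision

def log_gamma_moments(k, lo=-40.0, hi=6.0, step=1e-3):
    """Mean and variance of log(G), G ~ Gamma(shape=k, scale=1), via the trapezoid rule.

    With t = log x, the density of log(G) is exp(k*t - exp(t)) / Gamma(k).
    """
    n = int(round((hi - lo) / step))
    log_norm = math.lgamma(k)
    m1 = m2 = 0.0
    for i in range(n + 1):
        t = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0     # trapezoid end-point weights
        dens = weight * math.exp(k * t - math.exp(t) - log_norm)
        m1 += dens * t
        m2 += dens * t * t
    mean = m1 * step
    return mean, m2 * step - mean ** 2

for k in (1, 3):
    mean, var = log_gamma_moments(k)
    psi = -EULER_GAMMA + sum(1.0 / j for j in range(1, k))
    psi1 = math.pi ** 2 / 6.0 - sum(1.0 / j ** 2 for j in range(1, k))
    print(k, mean, psi, var, psi1)
```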
Lemma 3. Let denote a Gamma-distributed random variable with expectation k. Then, for each integer , we have
Proof. Although relation (62) is a direct consequence of (53) and (56), we give below an alternative proof of this statement based upon Lemma 1. The corresponding arguments will be instrumental for the proof of the forthcoming Proposition 3. We may write, making use of (46) in Lemma 1, for each integer and ,
where
with , and
with , are independent r.v.’s. Following the arguments of Lemma 1, we note that, in the above relations, and are two independent Gamma-distributed random variables, with expectations equal to m and ℓ, respectively. In view of (50), by combining (44) with (45) and (46), we readily obtain that
Set for convenience , with for ; we infer from (63) and (64) that
By combining (57) and (61) with (62), we see that . Therefore, (62) follows readily from (66), taken with and . □
Lemma 4. Let , where and are integers. Then,
and
Proof. We may write, by (19) and (20),
and
as sought. □
Lemma 5. Let denote an i.i.d. sequence of exponentially distributed random variables. For any , set and . Then, for any , we have
and
Proof. We have
Since and are independent, it follows that
Making use of (67) and (57), we see that , , , and . By all this, we obtain
which is (69). Given (69), the proof of (70) follows from the relations and . We note that, when , (69) yields
which is in agreement with (56). □
Proposition 3. Let be an integer, and let , , and be independent Gamma-distributed random variables. Then, we have
Proof. Set for convenience . When , we have , and, therefore, by (7), , which is in agreement with (71). Likewise, when , , and, hence, , which is also in agreement with (71). In fact, when , (71) may be rewritten into
where we have made use of (35), (36) and (37), taken with and . Given that the values of and are in agreement with (71), we may limit ourselves to establishing this relation when ℓ and j fulfill . In the remainder of our proof, we therefore assume that this condition is fulfilled.
We make use of the notation and conclusions of Lemma 1 to write that
where the random pair
is independent of
Now, since , we infer from (53) and (56) that
and
Next, set
and
We infer from (74) and (75), in combination with (67) and (68), taken with the formal change of into , that
By all this, we infer from (74), (77) and (78) that
Next, we observe that the joint distribution of coincides with that of , where
We then observe that
is independent of . Given this fact, we make use of the Taylor expansion of
to obtain that
Recall Euler’s formula and the definition of the Pochhammer symbol when and , and, in general,
Recalling that , we infer from (81) that
which, when combined with (80), readily yields (71). □
Let be fixed, and assume that X is uniformly distributed on . Set to be as in (5).
Proposition 4. Under the assumptions above, we have
Proof. Denote by an i.i.d. sequence of exponentially distributed r.v.’s, with mean 1. For each , set
and set, in view of Lemma 1, for each ,
We keep in mind that and that is independent of . Set further
and
Set, likewise,
Observe that . Moreover, the r.v.’s and are independent. Note further that, for each , and . By (53), it follows that, for each ,
whence
and
We have, therefore,
Next, we note that the -valued r.v.’s form a stationary k-dependent sequence. Since, by (53) and (56), for all ,
the partial sums of this sequence are asymptotically normal in . It follows readily that, as ,
Here, we have made use of the fact that, as ,
so that, in (88),
Likewise, making use of (70), we see that
Making use of (69), an easy argument shows that, in turn,
In view of (87), we readily obtain (83) from this last relation. □
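The construction behind this proof can be sketched numerically. It relies on the classical representation of the n + 1 uniform spacings as $E_i / S_{n+1}$, where $E_1, \ldots, E_{n+1}$ are independent unit-mean exponentials and $S_{n+1}$ is their sum, so that the uniform k-spacings are moving sums of k consecutive exponentials divided by $S_{n+1}$. Moving sums whose windows do not overlap are independent, which is the source of the k-dependence. The sketch assumes this is the representation used in the proof, whose displayed formulas are not legible in this copy; the function name is hypothetical.

```python
import math
import random

def uniform_k_spacings_from_exponentials(n, k, rng):
    """Simulate the n - k + 2 uniform k-spacings as moving sums of exponentials over their total."""
    e = [rng.expovariate(1.0) for _ in range(n + 1)]
    total = sum(e)
    windows = [sum(e[i:i + k]) for i in range(n - k + 2)]   # k consecutive terms each
    return [w / total for w in windows]

rng = random.Random(2)
one_spacings = uniform_k_spacings_from_exponentials(n=10, k=1, rng=rng)
two_spacings = uniform_k_spacings_from_exponentials(n=10, k=2, rng=rng)
print(len(one_spacings), sum(one_spacings))      # 11 spacings summing to 1
print(len(two_spacings))                          # 10 overlapping 2-spacings
print(sum(math.log(d) for d in two_spacings))     # one draw of the sum of log 2-spacings
```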
Remark 3. Let for denote the quantile function of X. Assume that both and are continuous. In this case, we may define the quantile density function of X by , which is continuous for . We may then set, for ,
Having proved Theorem 1 for , the conclusion for a general f follows by routine arguments based on this observation, relating uniform spacings to general spacings. We omit the details.
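The link between general and uniform spacings mentioned above can be written out as follows. This is a sketch under the assumptions of Remark 3, namely that the quantile function Q and the density f are continuous, with $U_i = F(X_i)$ as in Section 2.2 and for indices whose end-points lie strictly inside (0, 1). The averaged quantile density $\bar q_{i,n}$ is a notation introduced here for illustration and is not taken from the paper.

```latex
\begin{align*}
X_{i,n} &= Q(U_{i,n}), \qquad q(u) = \frac{d}{du}Q(u) = \frac{1}{f(Q(u))}, \\
X_{i+k-1,n} - X_{i-1,n}
  &= \int_{U_{i-1,n}}^{U_{i+k-1,n}} q(u)\,du
   = \bar q_{i,n}\,\bigl(U_{i+k-1,n} - U_{i-1,n}\bigr), \\
\log\bigl(X_{i+k-1,n} - X_{i-1,n}\bigr)
  &= \log\bigl(U_{i+k-1,n} - U_{i-1,n}\bigr) + \log \bar q_{i,n}.
\end{align*}
```

By the mean value theorem for integrals, $\bar q_{i,n} = q(u^*)$ for some $u^*$ between $U_{i-1,n}$ and $U_{i+k-1,n}$, so the second term is approximately $-\log f$ evaluated near the corresponding order statistic.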
References
- 1. Darling, D.A. On a class of problems related to the random division of an interval. Ann. Math. Stat. 1953, 24, 239–253. doi:10.1214/aoms/1177729030
- 2. Blumenthal, S. Logarithms of sample spacings. SIAM J. Appl. Math. 1968, 16, 1184–1191. doi:10.1137/0116099
- 3. Deheuvels, P.; Derzko, G. Exact laws for products of uniform spacings. Austrian J. Stat. 2003, 32, 29–47.
- 4. Hájek, J.; Šidák, Z. Theory of Rank Tests; Academic Press: New York, NY, USA, 1967.
- 5. Pyke, R. Spacings. J. R. Stat. Soc. B 1965, 27, 395–436. doi:10.1111/j.2517-6161.1965.tb00602.x
- 6. Pyke, R. Spacings revisited. In Proceedings of the 6th Berkeley Symposium; University of California Press: Berkeley, CA, USA, 1972; Volume 1, pp. 417–427.
- 7. Deheuvels, P. Spacings and applications. In Proceedings of the 4th Pannonian Symposium on Mathematical Statistics, Bad Tatzmannsdorf, Austria, 4–10 September 1983; pp. 1–30.
- 8. Cressie, N. On the logarithms of high-order spacings. Biometrika 1976, 63, 343–355. doi:10.1093/biomet/63.2.343
