Efficient Calculation of the Joint Distribution of Order Statistics
Jonathan von Schroeder, Thorsten Dickhaus

TL;DR
This paper introduces novel recursive methods for efficiently computing the joint distribution of order statistics of independent variables, enabling exact and rounded calculations, with applications in multiple hypothesis testing.
Contribution
It generalizes existing recursive formulas to improve numerical computation of joint distributions of order statistics, both exactly and approximately.
Findings
Developed generalized recursive formulas for joint distribution calculation.
Achieved exact and faithfully rounded numerical results.
Discussed applications in stepwise multiple hypothesis testing.
Abstract
We consider the problem of computing the joint distribution of order statistics of stochastically independent random variables in one- and two-group models. While recursive formulas for evaluating the joint cumulative distribution function of such order statistics exist in the literature for a longer time, their numerical implementation remains a challenging task. We tackle this task by presenting novel generalizations of known recursions which we utilize to obtain exact results (calculated in rational arithmetic) as well as faithfully rounded results. Finally, some applications in stepwise multiple hypothesis testing are discussed.
| Exact Probability | Steck | Rel. Err. (Steck) | Bolshev | Rel. Err. (Bolshev) | |
|---|---|---|---|---|---|
| 2 | 9.76562E-04 | 9.76562E-04 | 0.00000E+00 | 9.76562E-04 | 0.00000E+00 |
| 3 | 9.53674E-07 | 9.53674E-07 | 0.00000E+00 | 9.53674E-07 | 0.00000E+00 |
| 4 | 9.31323E-10 | 9.31323E-10 | 0.00000E+00 | 9.31323E-10 | 0.00000E+00 |
| 5 | 9.09495E-13 | 9.09495E-13 | 0.00000E+00 | 9.09495E-13 | 0.00000E+00 |
| 6 | 8.88178E-16 | 8.88178E-16 | 0.00000E+00 | 8.88178E-16 | 0.00000E+00 |
| 7 | 8.67362E-19 | 8.67362E-19 | 0.00000E+00 | 1.73472E-18 | 1.00000E+00 |
| 8 | 8.47033E-22 | 8.47033E-22 | 0.00000E+00 | 1.10114E-20 | 1.20000E+01 |
| 9 | 8.27181E-25 | 8.27181E-25 | 0.00000E+00 | 6.85071E-21 | 8.28100E+03 |
| 10 | 8.07794E-28 | 8.07794E-28 | 0.00000E+00 | -2.70517E-20 | 3.34884E+07 |
| 11 | 7.88861E-31 | 7.88861E-31 | 0.00000E+00 | -1.12683E-16 | 1.42842E+14 |
| 12 | 4.33103E-30 | -1.75898E-20 | 4.06134E+09 | 2.83880E-16 | 6.55456E+13 |
| 2 | 0.56539 | 0.50342 | ||||
|---|---|---|---|---|---|---|
| 3 | 0.54576 | 0.49842 | 0.44439 | |||
| 4 | 0.53446 | 0.49583 | 0.45256 | 0.40451 | ||
| 5 | 0.52712 | 0.49440 | 0.45819 | 0.41837 | 0.37494 | |
| 6 | 0.52201 | 0.49357 | 0.46241 | 0.42840 | 0.39148 | 0.35175 |
| 7 | 0.51827 | 0.49310 | 0.46574 | 0.43606 | 0.40399 | 0.36955 |
| 8 | 0.51543 | 0.49285 | 0.46846 | 0.44214 | 0.41383 | 0.38349 |
| 9 | 0.51320 | 0.49273 | 0.47073 | 0.44710 | 0.42178 | 0.39473 |
| 10 | 0.51142 | 0.49270 | 0.47266 | 0.45124 | 0.42836 | 0.40398 |
| 11 | 0.50997 | 0.49272 | 0.47434 | 0.45475 | 0.43390 | 0.41174 |
| 12 | 0.50877 | 0.49278 | 0.47580 | 0.45777 | 0.43863 | 0.41833 |
| 13 | 0.50776 | 0.49286 | 0.47709 | 0.46039 | 0.44271 | 0.42401 |
| 14 | 0.50690 | 0.49295 | 0.47823 | 0.46268 | 0.44627 | 0.42895 |
| 15 | 0.50616 | 0.49306 | 0.47925 | 0.46472 | 0.44941 | 0.43329 |
| 16 | 0.50552 | 0.49316 | 0.48018 | 0.46653 | 0.45219 | 0.43712 |
| 17 | 0.50497 | 0.49327 | 0.48101 | 0.46815 | 0.45467 | 0.44053 |
| 18 | 0.50447 | 0.49338 | 0.48177 | 0.46962 | 0.45690 | 0.44358 |
| 19 | 0.50404 | 0.49348 | 0.48246 | 0.47094 | 0.45891 | 0.44634 |
| 20 | 0.50365 | 0.49358 | 0.48309 | 0.47215 | 0.46074 | 0.44883 |
| 21 | 0.50330 | 0.49368 | 0.48367 | 0.47325 | 0.46240 | 0.45109 |
| 22 | 0.50298 | 0.49378 | 0.48421 | 0.47427 | 0.46392 | 0.45316 |
| 23 | 0.50269 | 0.49387 | 0.48471 | 0.47520 | 0.46532 | 0.45505 |
| 24 | 0.50243 | 0.49395 | 0.48517 | 0.47605 | 0.46660 | 0.45679 |
| 25 | 0.50219 | 0.49404 | 0.48559 | 0.47685 | 0.46779 | 0.45840 |
| 26 | 0.50198 | 0.49412 | 0.48599 | 0.47759 | 0.46889 | 0.45988 |
| 27 | 0.50178 | 0.49420 | 0.48637 | 0.47828 | 0.46991 | 0.46126 |
| 28 | 0.50159 | 0.49427 | 0.48671 | 0.47892 | 0.47086 | 0.46254 |
| 29 | 0.50142 | 0.49434 | 0.48704 | 0.47952 | 0.47175 | 0.46373 |
| 30 | 0.50126 | 0.49441 | 0.48735 | 0.48008 | 0.47258 | 0.46485 |
| 31 | 0.50111 | 0.49447 | 0.48764 | 0.48060 | 0.47336 | 0.46589 |
| 32 | 0.50097 | 0.49453 | 0.48791 | 0.48110 | 0.47409 | 0.46687 |
| 33 | 0.50084 | 0.49459 | 0.48817 | 0.48157 | 0.47478 | 0.46779 |
| 34 | 0.50072 | 0.49465 | 0.48841 | 0.48201 | 0.47542 | 0.46866 |
| 35 | 0.50060 | 0.49470 | 0.48864 | 0.48242 | 0.47604 | 0.46948 |
| 36 | 0.50050 | 0.49475 | 0.48886 | 0.48282 | 0.47661 | 0.47025 |
| 37 | 0.50040 | 0.49480 | 0.48907 | 0.48319 | 0.47716 | 0.47098 |
| 38 | 0.50030 | 0.49485 | 0.48926 | 0.48354 | 0.47768 | 0.47167 |
| 39 | 0.50021 | 0.49489 | 0.48945 | 0.48388 | 0.47817 | 0.47233 |
| 40 | 0.50012 | 0.49494 | 0.48963 | 0.48420 | 0.47864 | 0.47295 |
| 41 | 0.50004 | 0.49498 | 0.48980 | 0.48451 | 0.47909 | 0.47354 |
| 42 | 0.49996 | 0.49502 | 0.48996 | 0.48480 | 0.47951 | 0.47411 |
| 43 | 0.49989 | 0.49506 | 0.49012 | 0.48508 | 0.47992 | 0.47465 |
| 44 | 0.49982 | 0.49509 | 0.49027 | 0.48534 | 0.48031 | 0.47516 |
| 45 | 0.49975 | 0.49513 | 0.49041 | 0.48559 | 0.48068 | 0.47565 |
| 46 | 0.49969 | 0.49516 | 0.49055 | 0.48584 | 0.48103 | 0.47612 |
| 47 | 0.49963 | 0.49520 | 0.49068 | 0.48607 | 0.48137 | 0.47657 |
| 48 | 0.49957 | 0.49523 | 0.49081 | 0.48629 | 0.48169 | 0.47700 |
| 49 | 0.49951 | 0.49526 | 0.49093 | 0.48651 | 0.48201 | 0.47741 |
| 50 | 0.49946 | 0.49529 | 0.49104 | 0.48672 | 0.48230 | 0.47781 |
| Eff Sz. | n | est. -pwr | -pwr | Diff in std | ||
|---|---|---|---|---|---|---|
| 1 | 0.60000 | 5 | 70 | 0.24900 | 0.26691 | 1.23987 |
| 2 | 0.60000 | 5 | 80 | 0.39600 | 0.39977 | 0.24081 |
| 3 | 0.60000 | 5 | 90 | 0.53800 | 0.53479 | 0.18527 |
| 4 | 0.60000 | 5 | 100 | 0.65700 | 0.65626 | 0.05057 |
| 5 | 0.60000 | 20 | 50 | 0.02800 | 0.02538 | 0.48379 |
| 6 | 0.60000 | 20 | 60 | 0.15700 | 0.13890 | 1.69096 |
| 7 | 0.60000 | 20 | 70 | 0.37800 | 0.36864 | 0.61103 |
| 8 | 0.60000 | 20 | 80 | 0.59900 | 0.62231 | 1.50514 |
| 9 | 0.60000 | 60 | 40 | 0.00200 | 0.00143 | 0.43439 |
| 10 | 0.60000 | 60 | 50 | 0.09900 | 0.08584 | 1.49443 |
| 11 | 0.60000 | 60 | 60 | 0.49200 | 0.49307 | 0.06522 |
| 12 | 0.60000 | 100 | 30 | 0.00000 | 0.00000 | Inf |
| 13 | 0.60000 | 100 | 40 | 0.00600 | 0.00658 | 0.22041 |
| 14 | 0.60000 | 100 | 50 | 0.27000 | 0.30726 | 2.80406 |
| 15 | 0.80000 | 5 | 40 | 0.25200 | 0.26037 | 0.59552 |
| 16 | 0.80000 | 5 | 50 | 0.50200 | 0.49870 | 0.19588 |
| 17 | 0.80000 | 5 | 60 | 0.70400 | 0.70951 | 0.39928 |
| 18 | 0.80000 | 20 | 30 | 0.03600 | 0.03969 | 0.60858 |
| 19 | 0.80000 | 20 | 40 | 0.35700 | 0.36732 | 0.64187 |
| 20 | 0.80000 | 60 | 20 | 0.00000 | 0.00004 | 0.18826 |
| 21 | 0.80000 | 60 | 30 | 0.14700 | 0.15775 | 0.88238 |
| 22 | 0.80000 | 100 | 20 | 0.00300 | 0.00013 | 8.59402 |
| 23 | 0.80000 | 100 | 30 | 0.50600 | 0.48563 | 1.42549 |
| 24 | 1.00000 | 5 | 30 | 0.39200 | 0.39421 | 0.14658 |
| 25 | 1.00000 | 20 | 20 | 0.04500 | 0.04534 | 0.05420 |
| 26 | 1.00000 | 20 | 30 | 0.61400 | 0.63660 | 1.40588 |
| 27 | 1.00000 | 60 | 20 | 0.22500 | 0.19941 | 2.00156 |
| 28 | 1.00000 | 100 | 20 | 0.58600 | 0.57569 | 0.64138 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Efficient Calculation of the Joint Distribution of Order Statistics
Jonathan von Schroeder
Thorsten Dickhaus
Institute for Statistics, University of Bremen, Germany
Abstract
We consider the problem of computing the joint distribution of order statistics of stochastically independent random variables in one- and two-group models. While recursive formulas for evaluating the joint cumulative distribution function of such order statistics exist in the literature for a longer time, their numerical implementation remains a challenging task. We tackle this task by presenting novel generalizations of known recursions which we utilize to obtain exact results (calculated in rational arithmetic) as well as faithfully rounded results. Finally, some applications in stepwise multiple hypothesis testing are discussed.
keywords:
Bolshev’s recursion , faithful rounding , multiple testing , Noe’s recursion , rational arithmetic , Steck’s recursion
MSC:
[2010] 62G30 , 65C50 , 65C60
\floatsetup
[table]font=scriptsize,style=plaintop
1 Introduction
The joint distribution of order statistics of stochastically independent random variables plays a pivotal role in the theory of empirical processes and in nonparametric statistics; see, e. g., Shorack and Wellner (2009) and Dickhaus (2018). For instance, the exact finite-sample null distributions of classical goodness-of-fit tests like the Kolmogorov-Smirnov and the Cramér-von Mises test as well as those of modern ”higher criticism” goodness-of-fit tests rely on such joint distributions; cf. Gontscharuk et al. (2015), Gontscharuk et al. (2016a), and Finner and Gontscharuk (2018) for recent developments and further references. In simultaneous statistical inference, the joint distribution of ordered -values is needed to analyze the type I and type II error behaviour of stepwise rejective multiple test procedures; cf. Chapter 5 of Dickhaus (2014).
In the case that are identically distributed (we refer to this case as a one-group model), classical recursive methods like Bolshev’s recursion, Noe’s recursion, and Steck’s recursion allow for computing the joint cumulative distribution function (cdf) of exactly; cf. Section 9.3 of Shorack and Wellner (2009). A generalization of Steck’s recursion to two-group models has been introduced by Blanchard et al. (2014). The other aforementioned recursions can be generalized in an analogous manner, as we will demonstrate in Section 3 of the present work.
While conceptually appealing, numerical properties of the aforementioned recursions are not well understood yet, and existing implementations into computer software often refer to rule-of-thumb-type upper bounds on such that the respective implementation is trustworthy. For example, Art B. Owen reports in his implementation of the two-sided version of Noe’s recursion in C (see https://www.stat.washington.edu/jaw/RESEARCH/SOFTWARE/BERKJONES/BJ-RBJ-C-Code/noe.c) that the recursion works well for but ”For larger n (eg 1800 or more) […] unexplained odd behavior.” Similarly, in the R Package mutoss (cf. Blanchard et al. (2010)) the following comment is made on the implementation of Bolshev’s recursion: ”Because of numerical issues should not be greater than 100.” Recently, Moscovich and Nadler (2017) introduced a computational method for one-group models. However, they do not consider the numerical accuracy of their approach rigorously.
In this work, we contribute to the analysis of the numerical accuracy and the computational complexity of existing approaches for computing the joint distribution of in a mathematically rigorous manner. Furthermore, we provide novel computational techniques for one- and two-group models which are guaranteed to provide accurate results for arbitrary sample size . The rest of the material is structured as follows. In Section 2, we introduce the relevant quantities. The (generalized) recursions for one- and two-group models are provided in Section 3, together with a rigorous analysis of their computational complexities and their numerical properties. Our proposed exact computational methods rely on rational arithmetic (Section 4) and on faithful rounding (Section 5), respectively. Applications in multiple hypothesis testing are given in Section 6, and we conclude with a discussion in Section 7. Lengthy proofs as well as pseudo code for the considered algorithms are deferred to the Appendix.
2 Order Statistics
Throughout the following sections, we let for a natural number . Consider stochastically independent, real-valued random variables , which are all driven by the same probability measure . Let , and recursively define
[TABLE]
for . Then we call the order statistics of , which we will denote by in the remainder. The random variable will be called the -th order statistic of the random vector . Let denote the marginal cdf of for . This paper will present methods for the quick and numerically stable calculation of
[TABLE]
assuming that where are two continuous distribution functions on and with denoting the number of ’s distributed according to , . Since it holds that
[TABLE]
it follows that , where . Therefore, it is sufficient to consider the calculation of for an arbitrary continuous distribution function and argument . In the sequel, we suppress the dependence on and notationally, and write for notational convenience.
As outlined in the Introduction, for there exist many well known recursions (see e. g. Section 9.3 of Shorack and Wellner (2009)) for computing . There are also newer approaches based on numerical integration(see Moscovich et al. (2016)) or based on the Poisson process (see Moscovich and Nadler (2017)). Unfortunately, the former cannot be easily generalized to the case , since the Lebesgue density of an order statistic is in general not piece-wise constant. The latter is very fast due to usage of the Fourier transform, but numerically unstable for small values of the ’s. This can for instance be demonstrated using the thresholds of the well-known linear step-up test (cf. Benjamini and Hochberg (1995)) for control of the false discovery rate (FDR); see Figure 1. Glueck et al. (2008a) proposed an algorithm with exponential complexity (cf. (Glueck et al., 2008a, Theorem 4.2)), resulting in a very high computational effort for moderate or large values of . However, the method of Glueck et al. (2008a) can be used to compute -variate marginal distributions for , because in such cases the complexity of their approach reduces to .
Since we are mostly concerned with the full joint distribution, we extend the approach suggested by Blanchard et al. (2014) and provide generalizations of Bolshev’s and Noe’s recursions. We compare them to the generalization of Steck’s recursion proposed by Blanchard et al. (2014) and demonstrate that the Bolshev recursion is suitable for exact computations in rational arithmetic, whereas Noe’s recursion is numerically stable when computed in fixed-precision floating point arithmetic.
All our numerical calculations were performed on a Windows 7 machine with an Intel(R) Core(TM) i7-4790 CPU with 32 gigabytes of RAM.
3 The Generalized Recursions
Let , . Furthermore, let and be jointly stochastically independent. Let for and
[TABLE]
where , , denotes the -th order statistic of , and is an increasing sequence with values in . The following subsections provide formulas for efficiently calculating , and we discuss their computational and numerical properties.
3.1 Generalization of Bolshev’s Recursion
Lemma 1** (Generalization of Bolshev’s Recursion).**
The function from (1) satisfies the recursion
[TABLE]
where
[TABLE]
Moreover, we have the following recursive relationships for .
[TABLE]
For or this is simply the well-known Bolshev recursion.
3.2 Generalization of Steck’s Recursion
Lemma 2** (Generalization of Steck’s Recursion).**
Let . Then from (1) satisfies the recursion
[TABLE]
where
[TABLE]
Letting
[TABLE]
we can write
[TABLE]
Furthermore, we have the following recursion for .
[TABLE]
for and .
Proof.
See (Blanchard et al., 2014, Proposition 1) ∎
3.3 Generalization of Noe’s Recursion
Lemma 3** (Generalization of Noe’s Recursion).**
Let , , and for
[TABLE]
for .
Then the function from (1) satisfies
[TABLE]
for and .
Letting
[TABLE]
we can write
[TABLE]
3.4 Computational Complexity and Numerical Properties
The computational complexity (defined to be the number of elementary arithmetic operations on floating point numbers) of each of the aforementioned recursions is given by the following lemma.
Lemma 4**.**
The proposed recursions can be implemented using
Bolshev
**
Steck
**
Noe
**
elementary arithmetic operations (addition, subtraction, multiplication, division) and memory (assuming fixed-precision storage of all results).
The results of Lemma 4 suggest that Noe’s recursion might not be the best choice. However, for small values of the ’s, Bolshev’s recursion and Steck’s recursion are inherently numerically unstable. Consider for example and
[TABLE]
Then both recursions, when implemented in double precision floating point arithmetic, result in negative values and huge relative errors (cf. Table 1) which can be explained by inaccurate or catastrophic cancellation, respectively.
Noe’s recursion, if implemented in a reasonable manner, never results in negative values. Furthermore by (Jeannerod and Rump, 2018, Equation (3)) the relative error is bounded (if the coefficients are computed with a bounded relative error) since all summands are non-negative.
Remark 1**.**
Noe’s recursion can be easily parallelized since the can be, for any fixed , computed in parallel.
4 Exact Evaluation of Bolshev’s Recursion
If only elementary arithmetic operations are utilized and the number of such operations is not too large it is feasible to exactly evaluate expressions using rational arithmetic.111Our C++ implementation is based on The GNU Multiple Precision Arithmetic Library. We show that this is indeed the case for the Bolshev recursion as well its generalization for the two-group case presented in Section 3.1.
First we consider the case where , hence : Even though Bolshev’s recursion involves binomial coefficients our proposed Algorithm 1 (cf. the Appendix) for the one-group case evaluates it using only
[TABLE]
elementary arithmetic operations (addition, subtraction, multiplication, division). For an illustration, considering the sequence we observed the execution times depicted in Figure 2.
Remark 2**.**
Our proposed Algorithm 2 (cf. the Appendix) implements the two-group case in elementary arithmetic operations. Consequently, for equal sample sizes the number of operations is of . Notice that this is a marked improvement over the exponential complexity reported by (Glueck et al., 2008a, Theorem 4.2). Figure 3 illustrates the observed execution time for calculating and , where .
For not necessarily equal sample sizes and , our implementation of Algorithm 2 needs arithmetic operations.
Since the cdf of many interesting distributions is not available in a closed form the thresholds might either not be exactly calculable or simply not exactly representable as rational numbers. Lemma 5 analyzes the error propagation when and / or are inexact.
Lemma 5**.**
Let
[TABLE]
and denote by approximations thereof, which are obtained by replacing and by approximations and . If for it holds that for the it follows that for all
[TABLE]
where denotes the approximation of obtained by using and instead of and .
5 Faithfully Rounded Evaluation of Noe’s Recursion
We implemented faithfully rounded222That is, the result is either exact (if the exact value is a floating point number) or it is one of the two closest floating point numbers. floating-point computations as described by Rump and Lange (2017) as a portable single-header C++11 library.333Available at https://github.com/jvschroeder/PairArithmetic/. Utilizing this library we implemented the generalization of Noe’s recursion presented in Section 3.3 obtaining faithfully rounded results if no underflow occurs.444In our experience this is usually the case if the values of are not too close to the smallest (in absolute value) normal double, which equals on most computer architectures. In case of an underflow the results are smaller than the true values of , but never less than zero. Parallelization was implemented using Intel®Threading Building Blocks (TBB).
Notice that Noe’s recursion (and our generalization thereof) satisfies the NIC principle (No Inaccurate Cancellation, cf. (Rump and Lange, 2017, Definition 2.2)), that is there are no sums where at least one summand is not an input to the algorithm and the summands have opposite signs. Thus, by examining the evaluation tree (cf. (Rump and Lange, 2017, Definition 2.2)) of a concrete implementation, it is possible to calculate a number according to Equation (11) of Rump and Lange (2017). The result will be faithfully rounded if no under- or overflow occurs, and (when utilising a double precision floating point numbers). For our concrete implementation we obtain , provided that . Thus (assuming that no over- or underflow occurs) the result is guaranteed to be faithfully rounded if . Our implementation could be, in terms of , significantly improved by using binary summation. For example, for we obtain , while the corresponding number of in the case of binary summation would equal . The latter improvement however comes at an additional computational cost, and may be considered mostly of theoretical interest since the calculation for already takes approximately minutes on a core Intel CPU. Figure 3 compares the runtime of our implementation of our generalization of Noe’s recursion to that of the algorithm from the previous section. It becomes apparent that Noe’s recursion with faithful rounding is much faster then Bolshev’s recursion implemented in rational arithmetic. For practical applications, we therefore recommend Noe’s recursion with faithful rounding, at least if a fixed numerical precision is sufficient.
6 Applications in Multiple Hypothesis Testing
As discussed by Roquain and Villers (2011), the values of for all and are important building blocks for calculating the joint distribution of the number of rejections and the number of false rejections for step-up multiple tests. The random variables and play an important role when analyzing the type I and type II error behavior of such multiple tests. One important observation is, that the previously discussed recursions for calculating also calculate all such ’s as intermediate results.
Following Blanchard et al. (2014) we consider null hypotheses which are simultaneously under consideration under one and the same statistical model. We assume that associated -values are available on which the multiple test operates. Furthermore, we assume that (regarded as random variables) are jointly distributed according to one of the following models
The ’s are stochastically independent with marginal distributions
[TABLE]
where denotes the number of true null hypotheses among and is a given continuous cdf on .
Let denote a binomially distributed random variable, . Conditionally on , the ’s are jointly distributed according to .
A multiple test operating on is a measurable mapping , where hypothesis is rejected iff . Under denote by a constant random variable. Then the (random) number of rejections of the multiple test is given by , and is the (random) number of false rejections (type I errors).
In the following we will consider step-up procedures with critical values such that . The corresponding decision rule can be written as
[TABLE]
Summarizing results of Roquain and Villers (2011), the joint distribution of and for any step-up procedure has the following properties.
Lemma 6**.**
Let .
- (i)
Under the unconditional model it holds that
[TABLE]
where and . 2. (ii)
Under the conditional model it holds that
[TABLE]
where and .
Combining Lemma 6 with the previously discussed efficient evaluation of it is possible to calculate various summary statistics pertaining to the joint distribution of under the above models.
Definition 1**.**
**
- (a)
The FDR of is given by the expectation of the false discovery proportion (FDP) of , which is given by
[TABLE]
- (b)
Considering the number of correct rejections the average power of is given by
[TABLE]
where the convention is utilized and where (under ) or (under ), respectively.
- (c)
*The *power is the probability of rejecting at least of the false hypotheses:
[TABLE]
where, again, the convention is utilised and where (under ) or (under ), respectively.
In order to provide some numerical illustrations, we first consider the average power (cf. (8)) under where
[TABLE]
and . This is the setting considered in (Glueck et al., 2008b, Table 2) where the average power for was calculated for the Benjamini-Hochberg procedure (controlling the FDR at ) for independent two-sided one sample z-tests. In our notation, the Benjamini-Hochberg (linear step-up test) procedure equals with for . Table 2 illustrates the results obtained for . Due to space constraints only the first six columns (corresponding to ) are presented. The calculation of the full table (not presented here) took less than a second for an one magnitude larger (50 instead of 5) than the one considered by Glueck et al. (2008b). Figure 4 illustrates the time needed to calculate one row of such a table corresponding to some when utilizing our proposed algorithms.
As a second example, we consider the computation of from (9). Again, we choose as in the Benjamini-Hochberg case. An asymptotic approximation of this quantity for
[TABLE]
where denotes the distribution function of a non-central chi-squared random variable with degrees of freedom and non-centrality parameter , is given in (Izmirlian, 2018, Table 3). Our results can be used to calculate . Table 3 gives the faithfully rounded values for the -power for the parameters considered in (Izmirlian, 2018, Table 3).
We conclude by giving an example for the exact distribution of the FDP which shows why the FDR is not always an appropriate summary statistic. Consider again the multiple two-sided z-test described in Glueck et al. (2008b), that is given by (10), for and . It is clear that in Figure 5 the distribution of the FDP is neither symmetric about its mean (the FDR, which is depicted as dotted vertical line) nor concentrated around the FDR. A similar argumentation has been used by, among others, Blanchard et al. (2014) and Delattre and Roquain (2015) in order to motivate the computation of the full distribution of the FDP and to control its quantiles. The latter task is inherently computationally demanding.
7 Discussion
We have presented computationally efficient and numerically stable methods for calculating the joint distribution of order statistics. Such joint distributions have a multitude of important applications that require their repeated evaluation (to numerically solve optimization problems). Apart from the applications that we have presented in Section 6, they include, among others, the calibration goodness-of-fit tests with equal local levels (see Section 1.4 of Gontscharuk et al. (2016b)) and the adjustment of the asymptotically optimal rejection curve as proposed by Finner et al. (2012), see Equation (19) in their paper, and (Finner et al., 2009, Equation (6.1)) with the goal of obtaining valid critical values for a step-up-down procedure (guaranteeing strict FDR control). The latter applications have not been considered explicitly in the present work, because they merely refer to the one-group case. For this case, the methods of Moscovich et al. (2016) are already sufficiently accurate and fast.
Future extensions to our methods could include a normalization of the exponents (in Noe’s recursion) to avoid underflows and the exploration of potential efficiency gains in the exact computation of Bolshev’s recursion by a trade-off between the memory consumption and the frequency of normalizations of the intermediate rational numbers.
A preliminary version of our planned package (which utilizes RCPP, cf. Eddelbuettel (2013)) for the R language (R Core Team (2017)) is available at https://github.com/jvschroeder/OrdStat/ and can be installed using the devtools package:
Ψinstall.pacakges("devtools") Ψdevtools::install_github("jvschroeder/OrdStat")
The code used to generate the graphics and numerical examples is available at https://github.com/jvschroeder/OrdStatExamples/. The graphics were created using ggplot2 (Wickham (2016)) and the R package tikzDevice.
Acknowledgments
Jonathan von Schroeder is supported by the Deutsche Forschungsgemeinschaft (DFG) within the framework of RTG 2224 ”: Parameter Identification - Analysis, Algorithms, Applications”.
Appendix A Proofs
- Proof of Lemma 1.
Let , , and . Then, mimicking the approach of (Shorack and Wellner, 2009, p. 367 ff.) and (Blanchard et al., 2014, Proposition 1), it holds that
[TABLE]
Since this holds for any , it follows that
[TABLE]
where is given by (2).
The recursions for follow from the definition of the binomial coefficient and routine calculations. ∎
- Proof of Lemma 3.
Let be an increasing sequence in such that . Using notation similar to that of (Shorack and Wellner, 2009, p. 362 ff.) let and
[TABLE]
where the are the boundaries arranged in any ascending order. To extend the induction in (Shorack and Wellner, 2009, p. 364 ff.) to the two-group case we only need to mimick the the approach of (Blanchard et al., 2014, Proposition 1) to provide a recursive formula for , given that (where are given by (Shorack and Wellner, 2009, p. 362, Eq. (17) and Eq. (18))). To this end note that in this case
[TABLE]
and denotes the complement of .
If all , then it holds that
[TABLE]
which implies
[TABLE]
From the (11) it follows that . Thus and for it holds that
[TABLE]
which needs to be calculated for .
Let , , and for
[TABLE]
(which needs to be calculated for ). Then it holds that
[TABLE]
for all . ∎
- Proof of Lemma 4.
Counting the numer of operations in the loops of Algorithm 2 it follows that
[TABLE]
holds. For the space complexity simply note that, to use the recursions for , we need to keep track of at most and (which is pessimistic - cf. algorithm 2).
For Steck’s recursion first note that, using exponentiation by squaring, one can calculate in multiplications (where , c.f. (Knuth, 1981, p. 442, Algorithm A)). Thus the first row and last column of (cf. equations (4) (5)) can be calculated in
[TABLE]
Thus to calculate all the coefficient matrices we need at most
[TABLE]
arithmetic operations (since we can calculate for in using (2)). It remains to note that (3) needs at most arithmetic operations. For the space complexity simply note that we do not need to keep track of the previous coefficient matrices.
For Noe’s recursion first note that, for every , we can calculate for in . Furthermore, using (2), the binomial coefficients in (6) can be calculated in . Thus can be calculated in . Thus (assuming the necessary have already been calculated) is . Therefore the overall computational complexity is at most
[TABLE]
For the space complexity simply note, again, that we do not need to keep track of the previous coefficient matrices. ∎
- Proof of Lemma 5.
By Noe’s recursion (cf. lemma 3) the probability can be obtained by evaluating a polynomial of degree with only positive coefficients at (where is given by (7)). It is therefore sufficient to show that the statement is true for such polynomials when applied to non-negative arguments. To provide a concise proof we utilise interval arithmetic (cf. Kearfott (1996)). Due to linearity (since all coefficients and inputs are non-negative) it is sufficient to show the claim for monomials with , . Since and it follows that
[TABLE]
which implies
[TABLE]
due to (Kearfott, 1996, Equation (4)). ∎
- Proof of Lemma 6.
By (Roquain and Villers, 2011, Theorem 3.1) under the unconditional model and for a step-up procedure :
[TABLE]
where
[TABLE]
and denotes .
Furthermore under the conditional model and for a step-up procedure it holds (by (Roquain and Villers, 2011, Section 5.3)) that (where ):
[TABLE]
where denotes . ∎
Appendix B Algorithms
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Benjamini and Hochberg (1995) Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological), 289–300.
- 2Blanchard et al. (2010) Blanchard, G., Dickhaus, T., Hack, N., Konietschke, F., Rohmeyer, K., Rosenblatt, J., Scheer, M., Werft, W., 2010. μ 𝜇 \mu TOSS - Multiple hypothesis testing in an open software system. Journal of Machine Learning Research: Workshop and Conference Proceedings 11, 12–19.
- 3Blanchard et al. (2014) Blanchard, G., Dickhaus, T., Roquain, E., Villers, F., 2014. On least favorable configurations for step-up-down tests. Statistica Sinica 24 (1), 1–23.
- 4Delattre and Roquain (2015) Delattre, S., Roquain, E., 2015. New procedures controlling the false discovery proportion via Romano-Wolf’s heuristic. Ann. Stat. 43 (3), 1141–1177.
- 5Dickhaus (2014) Dickhaus, T., 2014. Simultaneous Statistical Inference with Applications in the Life Sciences. Springer-Verlag Berlin Heidelberg.
- 6Dickhaus (2018) Dickhaus, T., 2018. Theory of nonparametric tests. Cham: Springer.
- 7Eddelbuettel (2013) Eddelbuettel, D., 2013. Seamless R and C++ Integration with Rcpp. Springer, New York.
- 8Finner et al. (2009) Finner, H., Dickhaus, T., Roters, M., 2009. On the false discovery rate and an asymptotically optimal rejection curve. Ann. Stat. 37 (2), 596–618.
