Intermediate efficiency of some weighted goodness-of-fit statistics

Bogdan Ćmiel; Tadeusz Inglot; Teresa Ledwina

arXiv:1906.09143·math.ST·June 24, 2019

TL;DR

This paper compares weighted goodness-of-fit tests, like Anderson-Darling and Eicker-Jaeschke, to the classical Kolmogorov-Smirnov test, focusing on tail detection and providing a quantitative, analytic evaluation of their efficiency.

Contribution

It introduces a tractable method for comparing weighted tests to the classical test and proposes a modified statistic within the Eicker-Jaeschke class as a competitive alternative.

Findings

01. Weighted tests show improved tail detection over classical tests.

02. Analytic comparison confirms the efficiency of proposed modifications.

03. Finite sample results support theoretical conclusions.

Equations

$${\cal S}_{n}=\sqrt{n}\sup_{0<t<1}\frac{|\hat{F}_{n}(t)-t|}{\sqrt{t(1-t)}}$$

$${\cal E}_{n}={\cal E}_{n}(\kappa_{n})=\sqrt{n}\sup_{\kappa_{n}\leq t\leq 1-\kappa_{n}}\frac{|\hat{F}_{n}(t)-t|}{\sqrt{t(1-t)}},\quad\kappa_{n}\in(0,1/2),\quad\kappa_{n}\to 0\;\;\mbox{as}\;\;n\to\infty,$$

$${\cal G}_{n}={\cal G}_{n}(\kappa)=\sqrt{n}\sup_{\kappa\leq t\leq 1-\kappa}\frac{|\hat{F}_{n}(t)-t|}{\sqrt{t(1-t)}},\quad\kappa\in(0,1/2),$$

$$\mathbb{H}_{0}:F=F_{0}$$

$$\mathbb{H}_{1}:F\neq F_{0}.$$

$$F_{n}^{*}(t)=[(1-\vartheta_{n})F_{0}+\vartheta_{n}F_{1}]\circ F_{0}^{-1}(t)=t+\vartheta_{n}[F_{1}\circ F_{0}^{-1}(t)-t].$$

$$F_{n}(t)=t+\vartheta_{n}A(t),\quad t\in(0,1),$$

$${\cal K}_{n}=\sqrt{n}\sup_{0<t<1}|\hat{F}_{n}(t)-t|,$$

$$||A||_{\infty}=\sup_{0<t<1}|A(t)|\quad\mbox{and}\quad b_{{\cal K}}(P_{\vartheta_{n}}^{n})=\sqrt{n}\,\vartheta_{n}||A||_{\infty}.$$

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal K}_{n}\geq\sqrt{n}w_{n})=c_{{\cal K}}=2$$

$$\lim_{n\to\infty}P_{\vartheta_{n}}^{n}\Bigl(\Bigl|\frac{{\cal K}_{n}}{b_{{\cal K}}(P_{\vartheta_{n}}^{n})}-1\Bigr|\leq\epsilon\Bigr)=1$$

$${\cal C}_{n}={\cal C}_{n}(\tau)=\sqrt{n}\sup_{0<t<1}\frac{|\hat{F}_{n}(t)-t|}{[t(1-t)]^{\tau}},\quad\tau\in(0,1/2),$$

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal T}_{n}\geq\sqrt{n}w_{n})=c_{{\cal T}}$$

$${\cal M}_{n}=\sqrt{\log({\cal S}_{n}+1)}$$

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal G}_{n}\geq\sqrt{n}w_{n})=c_{{\cal G}}=1/2.$$

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal E}_{n}\geq\sqrt{n}w_{n})=0.$$

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal E}_{n}\geq\sqrt{n}w_{n})=c_{{\cal E}}=1/2.$$

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal S}_{n}\geq\sqrt{n}w_{n})=0.$$

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal M}_{n}\geq\sqrt{n}w_{n})=c_{{\cal M}}=2.$$

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal C}_{n}\geq\sqrt{n}w_{n})=0\quad\mbox{and}\quad-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}\bigl(\sqrt{\log({\cal C}_{n}+1)}\geq\sqrt{n}w_{n}\bigr)=\frac{1}{\tau}.$$

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal C}_{n}\geq\sqrt{n}w_{n})=c_{{\cal C}}=2^{1-4\tau}.$$

$${\cal E}_{n}=\sqrt{n}\max\Bigl\{\max_{i:F_{0}(X_{(i)})\in[\kappa_{n},1-\kappa_{n}]}\frac{\max\{|F_{0}(X_{(i)})-\frac{i}{n}|,|F_{0}(X_{(i)})-\frac{i-1}{n}|\}}{\sqrt{F_{0}(X_{(i)})(1-F_{0}(X_{(i)}))}},\frac{T_{n}}{\sqrt{\kappa_{n}(1-\kappa_{n})}}\Bigr\},$$

$$F_{n}(t)=t+\theta_{n}A(t),\quad t\in(0,1),$$

$$A^{*}(t)=\frac{A(t)}{\sqrt{t(1-t)}}.$$

$$\lim_{t\to 0^{+}}A^{*}(t)=\lim_{t\to 1^{-}}A^{*}(t)=0,$$

$$\sup_{t\notin[\delta,1-\delta]}|A^{*}(t)|=\frac{1}{2}\sup_{0<t<1}|A^{*}(t)|.$$

$$b_{{\cal E}}(P_{\theta_{n}}^{n})=\sqrt{n}\,\theta_{n}\sup_{0<t<1}|A^{*}(t)|.$$

$$\sup_{t\in(0,1)}\frac{|A(t)|}{[t(1-t)]^{1-\varpi}}<\infty\;\;\;\mbox{for some}\;\;\;\varpi\in[0,1/2).$$

$$\sup_{t\notin[\delta,1-\delta]}|A^{*}(t)|=\frac{1}{2}\sup_{0<t<1}|A^{*}(t)|,$$

$$\sup_{w\in\mathbb{R}}\frac{|F_{1}(w)-F_{0}(w)|}{\bigl\{F_{0}(w)[1-F_{0}(w)]\bigr\}^{1-\varpi}}<\infty\;\;\;\mbox{for some}\;\;\;\varpi\in[0,1/2).$$

Full text

Intermediate efficiency of some weighted goodness-of-fit statistics

**Bogdan Ćmiel**

*Faculty of Applied Mathematics, AGH University of Science and Technology,*

*Al. Mickiewicza 30, 30-059 Cracow, Poland*

*e-mail*: [email protected]

**Tadeusz Inglot**

*Faculty of Pure and Applied Mathematics, Wrocław University of Science and Technology,*

*Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland*

*e-mail*: [email protected]

**Teresa Ledwina**

*Institute of Mathematics, Polish Academy of Sciences,*

*ul. Kopernika 18, 51-617 Wrocław, Poland*

*e-mail*: [email protected]

Abstract: This paper compares the Anderson-Darling and some Eicker-Jaeschke statistics to the classical unweighted Kolmogorov-Smirnov statistic. The goal is to provide a quantitative comparison of such tests and to study real possibilities of using them to detect departures from the hypothesized distribution that occur in the tails. This contribution covers the case when under the alternative a moderately large portion of probability mass is allocated towards the tails. It is demonstrated that the approach allows for tractable, analytic comparison between the given test and the benchmark, and for reliable quantitative evaluation of weighted statistics. Finite sample results illustrate the proposed approach and confirm the theoretical findings. In the course of the investigation we also prove that a slight and natural modification of the solution proposed by Borovkov and Sycheva (1968) leads to a statistic which is a member of Eicker-Jaeschke class and can be considered an attractive competitor of the very popular supremum-type Anderson-Darling statistic.

MSC 2010 subject classifications: Primary 62G10; secondary 62G20, 60E15.

Key words and phrases: Anderson-Darling tests, asymptotic relative efficiency, Eicker-Jaeschke statistics, higher criticism, local alternatives, moderate deviations.

**1. Introduction**

Weighted Kolmogorov-Smirnov-type goodness-of-fit tests have received renewed interest in recent years; cf. Jager and Wellner (2004, 2007), Chicheportiche and Bouchaud (2012), Greenshtein and Park (2012), Charmpi and Ycart (2015), Gontscharuk et al. (2016), and Stepanova and Pavlenko (2018) for some illustrations. This renaissance in research has to a large extent been driven by an application of a supremum version of the Anderson-Darling statistic in detecting sparse heterogeneous mixtures, invented and developed by Donoho and Jin (2004, 2015). Obviously, weighted statistics of supremum-type are useful in many other problems as well. The renewed interest raises many unsolved questions for such structures; cf. the list of open problems on p. 2032 in Jager and Wellner (2007), and Section 5 in Ditzhaus (2018), for example. One of the questions concerns the power behavior of the considered statistics under nearby alternatives. Another one involves a better understanding of the advantages and limitations of popular classes of nonparametric statistics, reconsidered recently in the context of detection of some mixtures. The aim of the present paper is to provide some tools and at least partial answers to these challenging questions.

For an exemplification of our approach, we study some selected Eicker-Jaeschke-type statistics and compare them with the classical Kolmogorov-Smirnov and the integral Anderson-Darling statistics. We focus on uniformity testing and restrict our attention to two representatives of the class:

$${\cal S}_{n}=\sqrt{n}\sup_{0<t<1}\frac{|\hat{F}_{n}(t)-t|}{\sqrt{t(1-t)}}\tag{1.1}$$

and its truncated variant

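$${\cal E}_{n}={\cal E}_{n}(\kappa_{n})=\sqrt{n}\sup_{\kappa_{n}\leq t\leq 1-\kappa_{n}}\frac{|\hat{F}_{n}(t)-t|}{\sqrt{t(1-t)}},\quad\kappa_{n}\in(0,1/2),\quad\kappa_{n}\to 0\;\;\mbox{as}\;\;n\to\infty,\tag{1.2}$$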

where $\hat{F}_{n}(t)$ is the empirical distribution function of $n$ independent random variables with values in (0,1). ${\cal S}_{n}$ was proposed by Anderson and Darling (1952) while ${\cal E}_{n}$ is a consistent variant of the statistic

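$${\cal G}_{n}={\cal G}_{n}(\kappa)=\sqrt{n}\sup_{\kappa\leq t\leq 1-\kappa}\frac{|\hat{F}_{n}(t)-t|}{\sqrt{t(1-t)}},\quad\kappa\in(0,1/2),\tag{1.3}$$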

introduced and studied by Borovkov and Sycheva (1968).

Borovkov and Sycheva (1968) have shown that if the type I error tends to 0 slower than exponentially, as $n\to\infty$, then the uniform weight function $1/{\sqrt{t(1-t)}}$ ensures that ${\cal G}_{n}$ is asymptotically uniformly most powerful, in a certain sense, in some class of weighted statistics. A similar result for an exponentially decreasing type I error is contained in Borovkov and Sycheva (1970). Eicker (1979) and Jaeschke (1979) have obtained Darling-Erdős-type results for ${\cal S}_{n}$ and ${\cal E}_{n}$, under the null model, and suggested that ${\cal S}_{n}$ is sensitive in detecting moderate tails while, in contrast, the classical unweighted Kolmogorov-Smirnov test, say ${\cal K}_{n}$, is asymptotically sensitive in detecting changes in the central range of the null distribution. Révész (1982) provided some illustrative results supporting such statements, while Mason and Schuenemeyer (1983, 1992) defined and studied some formalization of the ability to detect central and local tail departures. They also studied a class of Rényi-type tests, which are also weighted statistics, but with heavier weights than the uniform one. Jager and Wellner (2007) studied, among others, the optimal detection boundary of ${\cal S}_{n}$ for a sparse heterogeneous mixture model. Ditzhaus (2018) extended the results in Jager and Wellner (2007) in many directions. Based on the findings of these two papers, one sees that, from the point of view of complete detectability of specific signals, a very large class of tests achieves the same completely detectable region, under very general signal models, as the very popular higher criticism test, related to the supremum-type Anderson-Darling statistic. It should also be strongly emphasized that all the above-mentioned results on different forms of detectability were phrased in terms of the presence or absence of power consistency under some convergent sequences of alternatives.

We would like to propose some quantitative results to study local power of some representatives of currently popular statistics from another perspective. Namely, an interesting question is how many observations are needed for these tests to attain a given power lying in the interval $(0,1)$ . Therefore, we shall compare the related numbers of observations via an appropriate asymptotic relative efficiency (ARE) notion. Moreover, we would like to show that careful introduction of the uniform weight, in a way proposed in (1.2), results in a stable and highly efficient solution. Surprisingly enough, this member of the Eicker-Jaeschke class has thus far received much less attention than ${\cal S}_{n}$ . To complete the picture of the sup-type Anderson-Darling statistic, we also consider its integral variant.

Our approach to computing the efficiency of the considered statistics relies on a pathwise variant of Kallenberg’s intermediate ARE. The variant, elaborated in Inglot et al. (2018), is flexible enough to be applicable to some cases which lack high regularity. Weighted goodness-of-fit statistics, based on the classical empirical process, fall into this category. The characteristic features of the intermediate efficiency are: type I error tending to 0 slower than exponentially; local alternatives converging to the null distribution slower than $1/\sqrt{n}$; and, in contrast to the above-mentioned developments on different forms of distinguishability, non-degenerate asymptotic powers under local alternatives. The efficiency shares the advantages of Bahadur’s and Pitman’s approaches, but is much more widely applicable. In particular, the intermediate efficiency exploits moderate deviations of the test statistic under the null model, instead of the large deviations inherent in Bahadur’s theory. For many weighted statistics large deviations are degenerate while moderate deviations are not. For a more detailed discussion, see Inglot et al. (2018).

In our efficiency calculations the classical unweighted Kolmogorov-Smirnov statistic ${\cal K}_{n}$ shall play the role of a benchmark with respect to which other statistics shall be compared. Basically, to get the efficiency, one has to guarantee non-degenerate asymptotic powers of test statistics under a given sequence of alternatives and non-degenerate moderate deviations under the corresponding null model. The latter requirement calls, for example, for using ${\cal M}_{n}=\sqrt{\log({\cal S}_{n}+1)}$ in place of ${\cal S}_{n}$. Sequences of alternatives are described in Section 2. In principle, they are defined via a fixed alternative distribution and a sequence of real parameters shrinking it to the null distribution. The efficiency allows for tractable analytic comparisons between two tests.

We give a sufficient condition on tails of local sequences under which the intermediate efficiency of ${\cal E}_{n}$ with respect to ${\cal K}_{n}$ exists and is positive. Under this condition, ${\cal E}_{n}$ is always at least as efficient as ${\cal K}_{n}$ and the efficiency of ${\cal E}_{n}$ with respect to ${\cal K}_{n}$ is always greater than or equal to the efficiency of ${\cal G}_{n}$ with respect to ${\cal K}_{n}$. Moreover, we provide a sufficient condition, slightly stronger than that needed for ${\cal E}_{n}$, under which the efficiency of ${\cal M}_{n}$ with respect to ${\cal K}_{n}$ exists and is 0. In such situations, ${\cal E}_{n}$ does much better than ${\cal M}_{n}$, as a rule. Both sufficient conditions define local alternatives which do not shift too much mass towards one or both ends of $(0,1)$ and provide clear hints on which departures from the null model can or cannot be detected by ${\cal E}_{n}$ and ${\cal M}_{n}$, respectively. Besides, the values of the efficiency nicely reflect the finite sample powers. We illustrate this in Section 8, where testing for the standard Gaussian distribution is considered. In Section 9 we study the case when the tails of the alternative are heavier than assumed in Section 8. There we compare the above-mentioned tests via simulations and state a result saying that the so-called weak variant of the intermediate efficiency of ${\cal E}_{n}$ with respect to ${\cal K}_{n}$ is infinite. The outcomes, along with the results of Section 8, show that ${\cal E}_{n}$ with a relatively small smoothing parameter $\kappa_{n}$ is a well-balanced solution working nicely under different kinds of tails of alternatives.

The structure of the paper is as follows: In Section 3 we restate slightly generalized results of Inglot and Ledwina (2006) related to the Kolmogorov-Smirnov statistic ${\cal K}_{n}$ . Sections 4 and 5 collect necessary technical results on ${\cal M}_{n}$ and ${\cal E}_{n}$ . Section 6 presents respective results on the integral Anderson-Darling statistic ${\cal I}_{n}$ . Section 7 gives analytical formulas for the Kallenberg efficiencies of ${\cal M}_{n}$ , ${\cal E}_{n}$ , and ${\cal I}_{n}$ with respect to ${\cal K}_{n}$ , and discusses the results. Section 8 reports outcomes of some simulation experiments. Section 9 contains some preliminary study of efficiency of ${\cal E}_{n}$ with respect to ${\cal K}_{n}$ under heavy-tailed alternatives. We close with Section 10 containing some discussion of our results. All proofs are collected in the Appendix.

**2. Testing problem and sequences of alternatives**

Throughout we rely on the setup and results of Inglot et al. (2018). As is typical in the one-sample case, we denote the sample size by $n$ instead of $N$, which was used in the general problem considered there. Let $X_{1},...,X_{n}$ be independent random variables with continuous distribution function $F$. Denote by $F_{0}$ the null distribution function. Consider testing

$$\mathbb{H}_{0}:F=F_{0}$$

against the unrestricted alternative

$$\mathbb{H}_{1}:F\neq F_{0}.$$

To introduce a class of sequences of alternatives approaching $F_{0}$, consider first a fixed alternative $F_{1}$, a parameter $\vartheta_{n}\in(0,1)$, the combination $(1-\vartheta_{n})F_{0}+\vartheta_{n}F_{1}$ and its transformation to (0,1) via $F_{0}$. This yields the following alternative to the uniform distribution on (0,1)

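$$F_{n}^{*}(t)=[(1-\vartheta_{n})F_{0}+\vartheta_{n}F_{1}]\circ F_{0}^{-1}(t)=t+\vartheta_{n}[F_{1}\circ F_{0}^{-1}(t)-t].\tag{2.1}$$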

The function $F_{1}\circ F_{0}^{-1}$ is called the comparison distribution function or the ordinal dominance curve. If $F_{1}$ is absolutely continuous with respect to $F_{0}$ then the density, say $f^{*}$ , of $F_{1}\circ F_{0}^{-1}(t)$ with respect to the Lebesgue measure on $(0,1)$ exists. The density is labeled as the comparison density, the relative density or the grade density. In terms of densities, (2.1) reads as $f_{n}^{*}(t)=(1-\vartheta_{n}){\bf 1}_{(0,1)}(t)+\vartheta_{n}f^{*}(t)$ , where ${\bf 1}_{(0,1)}(t)$ stands for the uniform density on $(0,1)$ . See Handcock and Morris (1999), and Thas (2010) for details.
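The construction in (2.1) can be illustrated by simulation. The following is a minimal sketch, not taken from the paper: it assumes Gaussian $F_{0}$ and $F_{1}$ (any continuous pair would do), and the helper names `sample_transformed_mixture` and `transformed_cdf` are hypothetical. It draws from the mixture $(1-\vartheta_{n})F_{0}+\vartheta_{n}F_{1}$, maps the draws to $(0,1)$ via $F_{0}$, and evaluates the right-hand side of (2.1) for comparison.

```python
import random
from statistics import NormalDist


def sample_transformed_mixture(n, theta, f0, f1, rng):
    """Draw n points from (1 - theta) * F0 + theta * F1 and map them to (0, 1) via F0.

    f0 and f1 are NormalDist objects (an illustrative assumption).
    """
    out = []
    for _ in range(n):
        # With probability theta the observation comes from the alternative F1.
        dist = f1 if rng.random() < theta else f0
        x = rng.gauss(dist.mean, dist.stdev)
        out.append(f0.cdf(x))  # probability integral transform under the null
    return out


def transformed_cdf(t, theta, f0, f1):
    """Right-hand side of (2.1): t + theta * [F1(F0^{-1}(t)) - t]."""
    return t + theta * (f1.cdf(f0.inv_cdf(t)) - t)


if __name__ == "__main__":
    rng = random.Random(0)
    f0, f1 = NormalDist(0.0, 1.0), NormalDist(1.0, 1.0)
    u = sample_transformed_mixture(20000, 0.5, f0, f1, rng)
    empirical = sum(v <= 0.5 for v in u) / len(u)
    print(empirical, transformed_cdf(0.5, 0.5, f0, f1))
```

The empirical distribution function of the transformed sample should be close to `transformed_cdf` at any fixed `t`, up to sampling noise.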

The above motivates us to consider the observations from $[0,1]$ , $\mathbb{H}_{0}$ : $F(t)=t,\;t\in(0,1)$ , and nearly null distribution functions of the form

$$F_{n}(t)=t+\vartheta_{n}A(t),\quad t\in(0,1),\tag{2.2}$$

where $A(t)$ is continuous, $A(0)=A(1)=0,\;A\not\equiv 0$ , while $\vartheta_{n}\to 0$ as $n\to\infty$ . In many standard situations the function $A$ is absolutely continuous with a derivative $a$ , which is unbounded. For an illustration see Section 8. This is in sharp contrast to the situation we considered in the two-sample problem, treated in Inglot et al. (2018).

In what follows, by $P_{\vartheta_{n}}$ we denote the probability measure related to $F_{n}$ in (2.2) while $P_{0}$ stands for the uniform distribution on $(0,1)$ . Moreover, $P_{\vartheta_{n}}^{n}$ and $P_{0}^{n}$ denote $n$ fold products of $P_{\vartheta_{n}}$ and $P_{0}$ , respectively.

**3. The intermediate slope of the classical Kolmogorov-Smirnov statistic ${\cal K}_{n}$**

We have

$${\cal K}_{n}=\sqrt{n}\sup_{0<t<1}|\hat{F}_{n}(t)-t|,$$

where $\hat{F}_{n}$ is the empirical distribution function of the sample. The intermediate slope of ${\cal K}_{n}$ , under (2.2) with $\vartheta_{n}\to 0$ in such a way that $\sqrt{n}\vartheta_{n}\to\infty$ , can be deduced from Inglot and Ledwina (2006). However, it should be noted that in that paper the corresponding sequences of alternatives were defined via densities. This forced an unnecessary assumption that the related $A$ should be absolutely continuous. Moreover, for convenience, it was assumed that $a=A^{\prime}$ is bounded. Under (2.2) no extra assumptions are needed. For completeness, we restate here the corresponding results. In particular, (3.4), below, follows immediately from the proof of Theorem 6.1 in Inglot and Ledwina (2006).

Define

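$$||A||_{\infty}=\sup_{0<t<1}|A(t)|\quad\mbox{and}\quad b_{{\cal K}}(P_{\vartheta_{n}}^{n})=\sqrt{n}\,\vartheta_{n}||A||_{\infty}.$$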

Proposition 1. For any positive $\{w_{n}\}$ , such that $w_{n}\to 0$ and $nw_{n}^{2}\to\infty$ , as $n\to\infty$ , it holds that

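$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal K}_{n}\geq\sqrt{n}w_{n})=c_{{\cal K}}=2\tag{3.3}$$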

and

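$$\lim_{n\to\infty}P_{\vartheta_{n}}^{n}\Bigl(\Bigl|\frac{{\cal K}_{n}}{b_{{\cal K}}(P_{\vartheta_{n}}^{n})}-1\Bigr|\leq\epsilon\Bigr)=1\tag{3.4}$$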

for every $\epsilon>0$. Consequently, the intermediate slope of ${\cal K}_{n}$ is $\;c_{{\cal K}}[b_{{\cal K}}(P_{\vartheta_{n}}^{n})]^{2}=2n\vartheta_{n}^{2}||A||_{\infty}^{2}$.

Note that for ${\cal K}_{n}$ we have the moderate deviations (3.3) in the full range of $w_{n}$ ’s and (3.4) holds without any further assumptions on $A$ . As said before, ${\cal K}_{n}$ shall play the role of a benchmark procedure in our comparisons.
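To make the slope $2n\vartheta_{n}^{2}||A||_{\infty}^{2}$ concrete, here is a minimal sketch under an assumption that is not from the paper: a Gaussian location shift, $F_{0}=N(0,1)$ and $F_{1}=N(\mu,1)$, so that $A(t)=\Phi(\Phi^{-1}(t)-\mu)-t$. For this choice $||A||_{\infty}=2\Phi(\mu/2)-1$, which the grid search below reproduces. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, playing the role of F0 (illustrative assumption)


def comparison_deviation(t, mu):
    """A(t) = F1(F0^{-1}(t)) - t for F0 = N(0, 1) and F1 = N(mu, 1)."""
    return STD.cdf(STD.inv_cdf(t) - mu) - t


def sup_norm_a(mu, grid_size=20000):
    """Grid approximation of ||A||_inf = sup_{0<t<1} |A(t)|."""
    return max(abs(comparison_deviation(k / grid_size, mu)) for k in range(1, grid_size))


def ks_intermediate_slope(n, theta, sup_a, c_k=2.0):
    """c_K * b_K^2 with b_K = sqrt(n) * theta * ||A||_inf, as in Proposition 1."""
    b_k = math.sqrt(n) * theta * sup_a
    return c_k * b_k * b_k


if __name__ == "__main__":
    sup_a = sup_norm_a(1.0)
    print(sup_a, 2 * STD.cdf(0.5) - 1)
    print(ks_intermediate_slope(n=1000, theta=0.1, sup_a=sup_a))
```

A grid search is used only for transparency; for other choices of $F_{1}$ the same two functions apply once `comparison_deviation` is replaced.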

In the next section we list some weighted variants of ${\cal K}_{n}$ , which we shall further study, and present their moderate deviations under the null model. It should be emphasized that, in contrast to the benchmark procedure, the competitors do not need to have non-zero moderate deviations in the full range of $w_{n}$ ’s. This is very useful, as we shall see that it is a natural and an unavoidable restriction in the case of some weighted statistics.

To calculate the efficiencies of weighted statistics, with respect to ${\cal K}_{n}$ , we need for them some results analogous to (3.3) and (3.4) and, additionally, we have to identify sequences of alternatives for which asymptotic powers of these competitors of ${\cal K}_{n}$ are non-degenerate. These questions are solved in Sections 4 and 5. To get such asymptotic results, we shall consider some subclasses of functions $A$ in (2.2). The requirements are not very restrictive and many commonly used models fulfill them.

**4. Some weighted variants of ${\cal K}_{n}$ and their moderate deviations under $\mathbb{H}_{0}$**

In addition to the statistics ${\cal S}_{n}$ and ${\cal E}_{n}$ , which are central in our study, for the purpose of some discussion we consider two additional statistics: ${\cal G}_{n}$ , defined in (1.3), and

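$${\cal C}_{n}={\cal C}_{n}(\tau)=\sqrt{n}\sup_{0<t<1}\frac{|\hat{F}_{n}(t)-t|}{[t(1-t)]^{\tau}},\quad\tau\in(0,1/2),$$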

extensively investigated in the probabilistic literature; see Shorack and Wellner (1986) for some evidence.

For any of the above weighted statistics, say ${\cal T}_{n}$ , we study for which sequences $\{w_{n}\}$ , such that $w_{n}\to 0$ and $nw_{n}^{2}\to\infty$ , the limit

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal T}_{n}\geq\sqrt{n}w_{n})=c_{{\cal T}}$$

exists. The number $c_{{\cal T}}$ is called the index of moderate deviations. Depending on whether $c_{{\cal T}}>0$ or $c_{{\cal T}}=0$ , we speak of non-degenerate or degenerate moderate deviations.

Obviously, the simplest solution is ${\cal G}_{n}$. For this statistic, as for ${\cal K}_{n}$, moderate deviations exist and are non-degenerate in the whole range of $w_{n}$’s; cf. Lemma 1, below. As with (3.3), the result is obtained by matching the KMT strong approximations and the asymptotic behavior of the corresponding suprema of a weighted Brownian bridge. The latter question is well studied; see Sec. II of Adler (1990) for some basic results and Lifschits (1995), Sec. 14, for further developments. The proof is skipped, as it is very similar to that for ${\cal K}_{n}$; cf. Inglot and Ledwina (1990) for details on ${\cal K}_{n}$.

The statistic ${\cal E}_{n}$ can be seen as a refined variant of ${\cal G}_{n}$. In this case the situation is much more complex. Namely, if $\kappa_{n}$ tends to 0 relatively slowly, then, using again the strong approximation technique, we get non-degenerate moderate deviations. However, if $\kappa_{n}$ converges too fast, then the index of moderate deviations is 0 for a large class of sequences $\{w_{n}\}$; see Lemma 2. An even more extreme situation occurs in the case of ${\cal C}_{n}$, for which the moderate deviations are non-degenerate only for a very restricted class of sequences $\{w_{n}\}$; cf. (ii) of Lemma 4. For ${\cal S}_{n}$ the index of moderate deviations is 0 for all allowable sequences $\{w_{n}\}$; see (i) of Lemma 3. In such circumstances, as in the Bahadur approach to efficiency, one can search for a monotonic function (or a sequence of functions) which, when applied to a given statistic, leads to tails commensurable with those of ${\cal K}_{n}$. Obviously, such a monotonic transformation gives an equivalent test. It turns out that in the case of ${\cal S}_{n}$ the transformation $x\to\sqrt{\log(1+x)}$ does the job and

$${\cal M}_{n}=\sqrt{\log({\cal S}_{n}+1)}$$

exhibits a quantifiable moderate deviation behavior. The result is due to Mason (1985); cf. (ii) of Lemma 3, below. Similarly, the second statement in (i) of Lemma 4 is due to Mason (1985).
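As a small illustration of ${\cal S}_{n}$ and its monotone transform ${\cal M}_{n}$, the sketch below evaluates both for a sample in $(0,1)$. It rests on an observation that is ours rather than the paper's: for a constant $c\in[0,1]$ the map $t\mapsto(t-c)/\sqrt{t(1-t)}$ is nondecreasing, so on each interval where $\hat{F}_{n}$ is constant the weighted deviation is largest at an order statistic or just to its left. The function names are hypothetical.

```python
import math


def sup_anderson_darling(u):
    """S_n = sqrt(n) * sup_{0<t<1} |F_hat_n(t) - t| / sqrt(t (1 - t)).

    u is a sample with all values strictly inside (0, 1). The supremum is
    evaluated at the order statistics and their left limits.
    """
    x = sorted(u)
    n = len(x)
    best = 0.0
    for i, t in enumerate(x, start=1):
        # |t - i/n| is the deviation at t, |t - (i-1)/n| the one just before t.
        dev = max(abs(t - i / n), abs(t - (i - 1) / n))
        best = max(best, dev / math.sqrt(t * (1.0 - t)))
    return math.sqrt(n) * best


def transformed_statistic(s_n):
    """M_n = sqrt(log(S_n + 1)); strictly increasing in S_n, so the test is equivalent."""
    return math.sqrt(math.log(s_n + 1.0))


if __name__ == "__main__":
    sample = [0.1, 0.4, 0.45, 0.9]
    s_n = sup_anderson_darling(sample)
    print(s_n, transformed_statistic(s_n))
```

Because the transformation is strictly increasing, ranking samples by `transformed_statistic` gives the same ordering, and hence the same rejection decisions at matched critical values, as ranking by `sup_anderson_darling`.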

Lemma 1. *For any $w_{n}\to 0$ such that $nw_{n}^{2}\to\infty$, it holds that*

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal G}_{n}\geq\sqrt{n}w_{n})=c_{{\cal G}}=1/2.$$

Lemma 2.

(i) Assume that $n\kappa_{n}\to\infty$. Then for any $w_{n}\to 0$ such that $w_{n}/\sqrt{\kappa_{n}}\to\infty$, it holds that

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal E}_{n}\geq\sqrt{n}w_{n})=0.$$

(ii) *Suppose $\liminf_{n\to\infty}n\kappa_{n}/\log^{2}n>0$. Then for any $w_{n}\to 0$ such that $w_{n}=o(\sqrt{\kappa_{n}})$ and $nw_{n}^{2}/\log\log n\to\infty$, it holds that*

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal E}_{n}\geq\sqrt{n}w_{n})=c_{{\cal E}}=1/2.$$

Lemma 3.

(i) If $w_{n}\to 0$ and $nw_{n}^{2}\to\infty$ then

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal S}_{n}\geq\sqrt{n}w_{n})=0.$$

(ii) For any $w_{n}\to 0$ , and such that $nw_{n}^{2}/\log\log n\to\infty$ , we have

$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal M}_{n}\geq\sqrt{n}w_{n})=c_{{\cal M}}=2.$$

Lemma 4.

(i) Suppose that $w_{n}\to 0$ and $nw_{n}^{2}/\log n\to\infty$. Then for any $\tau\in(0,1/2)$

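$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal C}_{n}\geq\sqrt{n}w_{n})=0\quad\mbox{and}\quad-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}\bigl(\sqrt{\log({\cal C}_{n}+1)}\geq\sqrt{n}w_{n}\bigr)=\frac{1}{\tau}.$$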

(ii) For any $w_{n}\to 0$ , and such that $nw_{n}^{2}\to\infty,\;w_{n}=o(\sqrt{\log n/n})$ , we have for $\tau\in(0,1/2)$

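$$-\lim_{n\to\infty}\frac{1}{nw_{n}^{2}}\log P_{0}^{n}({\cal C}_{n}\geq\sqrt{n}w_{n})=c_{{\cal C}}=2^{1-4\tau}.$$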
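The constants $c_{{\cal K}}=2$, $c_{{\cal G}}=1/2$ and $c_{{\cal C}}=2^{1-4\tau}$ can be cross-checked by a Gaussian-tail heuristic. This is only a sketch and not the argument used in the proofs: it assumes that, where the index is non-degenerate, it equals $1/(2\sigma^{2})$, with $\sigma^{2}$ the largest variance of the limiting weighted Brownian bridge $B(t)/[t(1-t)]^{\tau}$. Since $\mathrm{Var}\,B(t)=t(1-t)$,

$$\mathrm{Var}\Bigl(\frac{B(t)}{[t(1-t)]^{\tau}}\Bigr)=[t(1-t)]^{1-2\tau},\qquad\sigma^{2}=\sup_{0<t<1}[t(1-t)]^{1-2\tau}=4^{-(1-2\tau)}\;\;\mbox{for}\;\;\tau<1/2,\qquad\frac{1}{2\sigma^{2}}=\frac{4^{1-2\tau}}{2}=2^{1-4\tau}.$$

Taking $\tau=0$ gives the value $2$ of $c_{{\cal K}}$. For the weight $1/\sqrt{t(1-t)}$ on a fixed interval $[\kappa,1-\kappa]$ the variance is identically 1, which gives the value $1/2$ of $c_{{\cal G}}$.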

Remark 1. With probability 1 it holds that

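$${\cal E}_{n}=\sqrt{n}\max\Bigl\{\max_{i:F_{0}(X_{(i)})\in[\kappa_{n},1-\kappa_{n}]}\frac{\max\{|F_{0}(X_{(i)})-\frac{i}{n}|,|F_{0}(X_{(i)})-\frac{i-1}{n}|\}}{\sqrt{F_{0}(X_{(i)})(1-F_{0}(X_{(i)}))}},\frac{T_{n}}{\sqrt{\kappa_{n}(1-\kappa_{n})}}\Bigr\},$$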

where $T_{n}=\max\{|I_{1n}/n-1/n-\kappa_{n}|,|I_{2n}/n-1+\kappa_{n}|\}$ , $I_{1n}=\min\{1\leq i\leq n+1:F_{0}(X_{(i)})>\kappa_{n}\},\;I_{2n}=\max\{0\leq i\leq n:F_{0}(X_{(i)})<1-\kappa_{n}\}$ , $X_{(1)}\leq...\leq X_{(n)}$ are order statistics of the sample $X_{1},...,X_{n}$ while for convenience we additionally set $F_{0}(X_{(0)})=0,\;F_{0}(X_{(n+1)})=1$ . Lemma 2 (ii) and the above exhibit that abandoning some fraction of smallest and largest transformed observations in the sample allows for non-degenerate moderate deviations when using the uniform weight. The above shows also that the construction of the statistic ${\cal E}_{n}$ follows a similar idea as the modified higher criticism statistic $HC_{n}^{+}$ defined in Section 3 of Donoho and Jin (2004), where a slightly smaller fraction of smallest transformed observations was abandoned. Some simulated powers of $HC_{n}^{+}$ are reported and discussed in Li and Siegmund (2015).
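The representation in Remark 1 translates directly into a computation. The sketch below is a hedged illustration rather than the authors' implementation: it takes the already transformed values $F_{0}(X_{i})\in(0,1)$ as input, computes ${\cal E}_{n}$ from the order statistics together with $I_{1n}$, $I_{2n}$ and $T_{n}$, and also returns the benchmark ${\cal K}_{n}$. Function names are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def ks_statistic(u):
    """K_n = sqrt(n) * sup_t |F_hat_n(t) - t| for transformed observations u in (0, 1)."""
    x = sorted(u)
    n = len(x)
    return math.sqrt(n) * max(
        max(i / n - t, t - (i - 1) / n) for i, t in enumerate(x, start=1)
    )


def truncated_weighted_statistic(u, kappa):
    """E_n(kappa) computed via the order-statistic representation of Remark 1."""
    x = sorted(u)
    n = len(x)
    interior = 0.0
    for i, t in enumerate(x, start=1):
        if kappa <= t <= 1.0 - kappa:
            dev = max(abs(t - i / n), abs(t - (i - 1) / n))
            interior = max(interior, dev / math.sqrt(t * (1.0 - t)))
    # I_1n: smallest index i with x_(i) > kappa (n + 1 if there is none).
    i1 = bisect_right(x, kappa) + 1
    # I_2n: largest index i with x_(i) < 1 - kappa (0 if there is none).
    i2 = bisect_left(x, 1.0 - kappa)
    t_n = max(abs(i1 / n - 1.0 / n - kappa), abs(i2 / n - 1.0 + kappa))
    return math.sqrt(n) * max(interior, t_n / math.sqrt(kappa * (1.0 - kappa)))


if __name__ == "__main__":
    sample = [0.1, 0.4, 0.45, 0.9]
    print(ks_statistic(sample), truncated_weighted_statistic(sample, kappa=0.2))
```

The term involving $T_{n}$ accounts for the two endpoints $\kappa_{n}$ and $1-\kappa_{n}$ of the truncated range, where the empirical distribution function equals $(I_{1n}-1)/n$ and $I_{2n}/n$ with probability 1.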

The proof of Lemma 2 is provided in the Appendix. Also there, we justify the index [math] appearing in Lemmas 3 and 4. The statement (ii) of Lemma 4 is a consequence of Proposition 2.5 in Inglot and Ledwina (1993). As mentioned earlier, (4.5) and the moderate deviations for $\sqrt{\log({\cal C}_{n}+1)}$ follow from Mason (1985). The above shows that even such standard weighted statistics behave very differently, and this illustrates the “irregularities” we mentioned in Section 1. Nevertheless, for each of the considered examples there are sequences $\{w_{n}\}$ for which the respective index of moderate deviations is positive. This makes it possible to apply the pathwise variant of intermediate efficiency elaborated in Inglot et al. (2018). The next step in this direction is to study the asymptotic behavior of the statistics under sequences of alternatives. This question is studied below. To avoid repetitions of similar statements, we restrict our attention to presenting in full form only the respective results on ${\cal E}_{n}$ and ${\cal S}_{n}$.

**5. An asymptotic behavior of ${\cal E}_{n}$ and ${\cal S}_{n}$ under sequences of alternatives and their intermediate slopes**

We follow the scheme and notation of the definition of the pathwise variant of intermediate efficiency elaborated in Inglot et al. (2018). Therefore, we consider a particular sequence $\{\theta_{n}\},\;\theta_{n}\in(0,1),$ where $\theta_{n}\to 0$ , as $n\to\infty$ , and the related $F_{n}$ in (2.2), is given by

$$F_{n}(t)=t+\theta_{n}A(t),\quad t\in(0,1),\tag{5.1}$$

where $A(t)$ is continuous and $A(0)=A(1)=0,\;A\not\equiv 0$ . As in Section 2, we set $P_{\theta_{n}}$ for the distribution of $F_{n}$ and $P_{\theta_{n}}^{n}$ for its $n$ -fold product. Additionally, introduce

$$A^{*}(t)=\frac{A(t)}{\sqrt{t(1-t)}}.\tag{5.2}$$

In the case of ${\cal E}_{n}$ , assume that $A$ satisfies

$$\lim_{t\to 0^{+}}A^{*}(t)=\lim_{t\to 1^{-}}A^{*}(t)=0,\tag{5.3}$$

where $A^{*}$ is defined in (5.2). Then there exists $\delta=\delta_{\cal E}(A)\in(0,1/2)$ such that

$$\sup_{t\notin[\delta,1-\delta]}|A^{*}(t)|=\frac{1}{2}\sup_{0<t<1}|A^{*}(t)|.\tag{5.4}$$

Set

$$b_{{\cal E}}(P_{\theta_{n}}^{n})=\sqrt{n}\,\theta_{n}\sup_{0<t<1}|A^{*}(t)|.\tag{5.5}$$

Throughout $\Phi(w),\;w\in\mathbb{R}$ , stands for the standard normal distribution function.

Theorem 1. *Consider (5.1) with $A(t)$ satisfying (5.3) and $\theta_{n}\in(0,1),\;\theta_{n}=o(\sqrt{\kappa_{n}})$, and $n\theta_{n}^{2}/\log\log n\to\infty$. Then*

(i) $\displaystyle\limsup_{n\to\infty}P_{\theta_{n}}^{n}({\cal E}_{n}-b_{\cal E}(P_{\theta_{n}}^{n})\leqslant w)\leqslant E_{2}(w),\;\;w\in\mathbb{R}$ ;

(ii) $\displaystyle\liminf_{n\to\infty}P_{\theta_{n}}^{n}({\cal E}_{n}-b_{\cal E}(P_{\theta_{n}}^{n})\leqslant w)\geqslant E_{1}(w),\;\;w>0,$

*where $E_{2}(w)=\Phi(w)$ is the standard normal distribution function, $E_{1}(w)$ is the distribution function of $\;\sup_{[\delta,1-\delta]}\bigl\{|B(t)|/\sqrt{t(1-t)}\bigr\}$ with $\delta$ defined in (5.4), while $B$ is a Brownian bridge.*

*Hence, ${\cal E}_{n}/b_{\cal E}(P_{\theta_{n}}^{n})\stackrel{P_{\theta_{n}}^{n}}{\longrightarrow}1$, and the intermediate slope of ${\cal E}_{n}$ under $\{P_{\theta_{n}}\}$ has the form $c_{\cal E}[b_{\cal E}(P_{\theta_{n}}^{n})]^{2},$ where $c_{\cal E}=1/2.$*

Remark 2. In the case of ${\cal G}_{n}$ an analogue of Theorem 1 holds true for any $A$ in (2.2). The only difference is that in the description of $E_{1}(w)$ one should use $\kappa$ in the place of $\delta$ . Hence we get the following: For (5.1) with $\theta_{n}\in(0,1),\;\theta_{n}\to 0$ , and $n\theta_{n}^{2}\to\infty$ , the intermediate slope of ${\cal G}_{n}$ , under $\{P_{\theta_{n}}\}$ , has the form $c_{\cal G}[b_{\cal G}(P_{\theta_{n}}^{n})]^{2},$ where $c_{\cal G}=1/2,$ while $b_{\cal G}(P_{\theta_{n}}^{n})=\sqrt{n}\theta_{n}\sup_{\kappa\leq t\leq 1-\kappa}|A^{*}(t)|$ . A comparison of $b_{\cal G}(P_{\theta_{n}}^{n})$ with $b_{\cal E}(P_{\theta_{n}}^{n})$ supports the statement that ${\cal E}_{n}$ is a natural refinement of ${\cal G}_{n}$ .
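The comparison of $b_{\cal G}$ with $b_{\cal E}$, and of both slopes with that of ${\cal K}_{n}$, can be explored numerically. The sketch below makes two assumptions that are not taken from this part of the paper: the alternative is a Gaussian location shift, $A(t)=\Phi(\Phi^{-1}(t)-\mu)-t$, and the efficiency with respect to ${\cal K}_{n}$ is read as the ratio of intermediate slopes, in which the common factor $n\theta_{n}^{2}$ cancels (the paper's own efficiency formulas are given in its Section 7). All function names are hypothetical, and the suprema are grid approximations.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def deviation(t, mu):
    """A(t) for the illustrative shift alternative F1 = N(mu, 1) against F0 = N(0, 1)."""
    return STD.cdf(STD.inv_cdf(t) - mu) - t


def a_star(t, mu):
    """A*(t) = A(t) / sqrt(t (1 - t)), as in (5.2)."""
    return deviation(t, mu) / math.sqrt(t * (1.0 - t))


def sup_abs(func, lo, hi, grid_size=20000):
    """Grid approximation of sup_{lo <= t <= hi} |func(t)|."""
    step = (hi - lo) / grid_size
    return max(abs(func(lo + k * step)) for k in range(grid_size + 1))


def slope_ratio_vs_ks(mu, kappa=None, eps=1e-6):
    """Ratio (1/2) sup|A*|^2 / (2 ||A||_inf^2).

    kappa=None takes the supremum of |A*| over (0, 1), as for E_n; a fixed
    kappa restricts it to [kappa, 1 - kappa], as for G_n.
    """
    lo, hi = (eps, 1.0 - eps) if kappa is None else (kappa, 1.0 - kappa)
    sup_star = sup_abs(lambda t: a_star(t, mu), lo, hi)
    sup_a = sup_abs(lambda t: deviation(t, mu), eps, 1.0 - eps)
    return (0.5 * sup_star ** 2) / (2.0 * sup_a ** 2)


if __name__ == "__main__":
    print(slope_ratio_vs_ks(1.0), slope_ratio_vs_ks(1.0, kappa=0.25))
```

Since $\sqrt{t(1-t)}\leq 1/2$, one has $|A^{*}(t)|\geq 2|A(t)|$ for every $t$, so the full-range ratio is at least 1; restricting the supremum to $[\kappa,1-\kappa]$ can only decrease it.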

We have also considered an analogue of Theorem 1 for ${\cal C}_{n}$ with fixed $\tau\in(0,1/2)$ . The result, together with Lemma 4 (ii), shows that the intermediate slope of ${\cal C}_{n}$ is smaller than the related slope of ${\cal E}_{n}$ . Hence, under fixed $\tau$ , ${\cal C}_{n}$ is less efficient than ${\cal E}_{n}$ . Therefore, we skip the presentation of the relevant details.

We have also derived the intermediate slope of a recent modification of ${\cal S}_{n}$ introduced by Stepanova and Pavlenko (2018). The results do not differ substantially from those on ${\cal S}_{n}$ . Therefore, we present here our results only for the classical case of ${\cal S}_{n}$ .

For ${\cal M}_{n}=\sqrt{\log({\cal S}_{n}+1)}$ suppose that $A(t)$ satisfies

[TABLE]

The assumption (5.6) implies that there exists $\delta=\delta_{\cal M}(A)\in(0,1/2)$ such that

[TABLE]

where $A^{*}(t)$ is defined in (5.2). In terms of an alternative $F_{1}(w)$ in (2.1), the condition (5.6) means that

[TABLE]

Put

[TABLE]

Theorem 2. *Suppose that $A(t)$ satisfies (5.6) with some $\varpi\in[0,1/2)$ . Consider (5.1) with $\theta_{n}\in(0,1),\;\theta_{n}=o(n^{-\varpi})$ and $(\log n\theta_{n}^{2})/\log\log n\to\infty$ as $n\to\infty.$ Then

*(i) $\displaystyle\limsup_{n\to\infty}P_{\theta_{n}}^{n}({\cal S}_{n}-b_{\cal E}(P_{\theta_{n}}^{n})\leqslant w)\leqslant S_{2}(w),\;\;w\in\mathbb{R}$ ;

(ii) $\displaystyle\liminf_{n\to\infty}P_{\theta_{n}}^{n}({\cal S}_{n}-b_{\cal E}(P_{\theta_{n}}^{n})\leqslant w)\geqslant S_{1}(w),\;\;w>0,$

*where $S_{2}(w)=\Phi(w)$ , $S_{1}(w)$ is the distribution function of $\;\sup_{[\delta,1-\delta]}\bigl{\{}|B(t)|/\sqrt{t(1-t)}\bigr{\}}$ with $\delta$ defined in (5.7), $b_{\cal E}(P_{\theta_{n}}^{n})$ is defined in (5.5), while $B$ is a Brownian bridge.

Hence*, ${\cal M}_{n}/b_{\cal M}(P_{\theta_{n}}^{n})\stackrel{{\scriptstyle P_{\theta_{n}}^{n}}}{{\longrightarrow}}1$ , *and the intermediate slope of ${\cal M}_{n}$ under $\{P_{\theta_{n}}\}$ has the form $c_{\cal M}[b_{\cal M}(P_{\theta_{n}}^{n})]^{2},$ where $c_{\cal M}=2.$

Remark 3. The restriction (5.6) on $A$ , imposed in Theorem 2, is obviously stronger than the related condition (5.3) needed for ${\cal E}_{n}$ . When $A$ is absolutely continuous with a derivative $a$ and for some $\epsilon\in[0,1/2)$ it holds that $\displaystyle\limsup_{t\to 0^{+}}t^{\epsilon}|a(t)|<\infty$ and $\displaystyle\limsup_{t\to 1^{-}}(1-t)^{\epsilon}|a(t)|<\infty$ then the condition (5.6) is satisfied with $\varpi=\epsilon$ . In particular, when $a$ is bounded then (5.6) holds with $\varpi=0$ . The case $\varpi\in(0,1/2)$ admits unbounded $a$ .

Consider the alternative (5.1) with $A$ of the form $A(t)=t^{\delta}-t,\;\delta\in(0,1/2)$ . Then (5.3) and (5.6) do not hold. This $A$ corresponds to a heavy-tailed departure. When the null distribution is $F_{0}(x)=\Phi(x)$ then such $A$ corresponds to the Lehmann (1953) alternative $F_{1}(x)=\Lambda(x;\delta)=[\Phi(x)]^{\delta}$ in (2.1). For further discussion of some examples see Sections 8 and 9.

**6. The integral Anderson-Darling statistic ${\cal I}_{n}$ and the related asymptotic results**

Set

[TABLE]

By Proposition 2.2 and Remarks 2.2-2.4 in Inglot and Ledwina (1993) we infer the following.

Lemma 5. *For any $w_{n}\to 0$ , and such that $nw_{n}^{2}\to\infty$ , it holds that *

[TABLE]

Now, consider alternatives of the form (5.1), with $A$ such that for some $\ell\in(0,1/2)$ it holds that

[TABLE]

Observe that under (6.3) for $A^{*}(t)$ defined in (5.2) it holds that

[TABLE]

Note that both conditions (5.3) and (5.6) imply (6.3). Asymptotic behavior of ${\cal I}_{n}$ under the sequence of alternatives (5.1) with $A$ satisfying (6.3) is described below.

Theorem 3. Suppose $A(t)$ satisfies (6.3). Consider ${P_{\theta_{n}}}$ obeying (5.1) with $\theta_{n}\in(0,1),$ and such that $\;\theta_{n}\to 0$ , $n\theta_{n}^{2}\to\infty$ as $n\to\infty$ . Then

[TABLE]

where

[TABLE]

Hence, the intermediate slope of ${\cal I}_{n}$ has the form

[TABLE]

The result (6.4) was reported in Inglot et al. (2000) for the case of $A(t)$ absolutely continuous with a bounded derivative $a(t)=A^{\prime}(t)$ . Its proof was very briefly sketched in Inglot et al. (1998). Here, for completeness, we provide detailed justification of (6.4). In fact, a result like (6.4) with the corresponding (6.3) can be immediately generalized to Hilbertian norms on $D[0,1]$ imposed on the empirical process. We omit the details. Such a result, along with the technique developed in Inglot and Ledwina (1993), allows us to calculate intermediate slopes of a family of integral test statistics.

Remark 4. Assume that $A$ in (2.2) is absolutely continuous and $a=A^{\prime}$ . If $a\in L_{r}(0,1)$ for some $r>1$ then (6.3) holds. If $a\in L_{2}(0,1)$ then (5.3) is satisfied. In the case $a\in L_{r}(0,1)$ for some $r>2$ we have (5.6) with $\varpi\geq 1/r$ .

For testing $F_{0}(x)=\Phi(x)$ consider the alternative distribution function $F_{1}$ , parametrized by $\zeta>0$ , and given by $F_{1}(x)=\Pi(x;\zeta)$ , where $\Pi(x;\zeta)=|x|^{-\zeta}/2$ if $x<-1$ , $\Pi(x;\zeta)=1/2$ if $-1\leq x\leq 1$ , and $\Pi(x;\zeta)=1-x^{-\zeta}/2$ if $x>1$ . $F_{1}$ is a member of the symmetric Pareto family considered in Grabchak and Samorodnitsky (2010). Such an $F_{1}$ , via (2.1), corresponds to $A(t)=\Pi(\Phi^{-1}(t);\zeta)-t$ in (2.2). A simple calculation shows that (6.3) is satisfied for $\ell>2/\zeta$ when $\zeta>4$ while (5.3) does not hold for any $\zeta>0$ . Moreover, for each $\zeta>0$ it holds that $a=A^{\prime}\notin L_{r}(0,1)$ for any $r>1$ . In such a sense, $F_{1}$ has the heaviest possible tails which can appear in (2.1) when $F_{0}(x)=\Phi(x)$ .
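The symmetric Pareto alternative above can be written down directly. The following is a minimal sketch that only transcribes the piecewise definition of $\Pi(x;\zeta)$ and the induced $A(t)=\Pi(\Phi^{-1}(t);\zeta)-t$ ; the function names `pareto_cdf` and `departure_A` are illustrative and do not come from the paper.

```python
from statistics import NormalDist

# Standard normal distribution; provides cdf() and inv_cdf().
_STD_NORMAL = NormalDist()


def pareto_cdf(x: float, zeta: float) -> float:
    """Symmetric Pareto distribution function Pi(x; zeta), zeta > 0."""
    if zeta <= 0:
        raise ValueError("zeta must be positive")
    if x < -1.0:
        return 0.5 * abs(x) ** (-zeta)
    if x <= 1.0:
        return 0.5  # no mass on [-1, 1]
    return 1.0 - 0.5 * x ** (-zeta)


def departure_A(t: float, zeta: float) -> float:
    """A(t) = Pi(Phi^{-1}(t); zeta) - t for t in (0, 1)."""
    return pareto_cdf(_STD_NORMAL.inv_cdf(t), zeta) - t
```

Evaluating `departure_A` at small `t` shows the excess mass that $F_{1}$ places in the lower tail relative to the normal null; by symmetry, `departure_A(1 - t, zeta)` equals `-departure_A(t, zeta)` up to rounding.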

**7. Intermediate efficiencies of $\;{\cal G}_{n},\;{\cal E}_{n},\;{\cal I}_{n},\;$ and $\;{\cal M}_{n}\;$ with respect to $\;{\cal K}_{n}$**

Exploiting the results collected in Sections 2 - 6 and using Theorem 1 from Inglot et al. (2018), we immediately obtain the following results.

Theorem 4. *Consider a sequence of alternatives $\{P_{\theta_{n}}\}$ defined by (5.1) with $n\theta_{n}^{2}\to\infty$ .

(i) The intermediate efficiency of ${\cal G}_{n}$ with respect to ${\cal K}_{n}$ , under the sequence $\{P_{\theta_{n}}^{n}\}$ , exists and equals

[TABLE]

(ii) Suppose $\liminf_{n\to\infty}n\kappa_{n}/\log^{2}n>0$ . If $A$ satisfies (5.3), $\theta_{n}=o(\sqrt{\kappa_{n}})$ , and $n\theta_{n}^{2}/\log\log n\to\infty$ , then the intermediate efficiency of ${\cal E}_{n}$ with respect to ${\cal K}_{n}$ , under the sequence $\{P_{\theta_{n}}^{n}\}$ , exists and equals

[TABLE]

(iii) If $A$ satisfies (6.3) then the intermediate efficiency of ${\cal I}_{n}$ with respect to ${\cal K}_{n}$ under the sequence $\{P_{\theta_{n}}^{n}\}$ , exists and equals

[TABLE]

Theorem 5. *Consider a sequence of alternatives $\{P_{\theta_{n}}\}$ defined by (5.1) with $A$ satisfying (5.6) for some $\varpi\in[0,1/2)$ and $\theta_{n}=o(n^{-\varpi}),\;(\log n\theta_{n}^{2})/\log\log n\to\infty$ .

Then the intermediate efficiency of ${\cal M}_{n}$ with respect to ${\cal K}_{n}$ , under the sequence $\{P_{\theta_{n}}^{n}\}$ , exists and equals*

[TABLE]

Remark 5. We have chosen ${\cal K}_{n}$ as a benchmark since, first of all, it seems to be a natural reference statistic when some weighting is considered. Moreover, in view of the approach elaborated in Inglot et al. (2018), it is applicable in such a role since it obeys moderate deviations in the full range. Alternatively, in view of Lemmas 1 and 5, ${\cal G}_{n}$ and ${\cal I}_{n}$ can be used as benchmarks, as well. Perhaps the most natural candidate for a benchmark procedure could be the Neyman-Pearson test statistic for uniformity against $F_{n}$ , cf. (2.2), defined when $A$ is absolutely continuous with derivative $a$ . To justify such a choice, again one should know that moderate deviations for this statistic hold for all sequences $\{w_{n}\}$ such that $w_{n}\to 0$ and $nw_{n}^{2}\to\infty.$ This is the case when $a$ is bounded. However, for unbounded $a$ such a question seems to remain open. Results of Merlevède and Peligrad (2009) suggest that for unbounded $a$ the speed $a_{n}=1/nw_{n}^{2}$ , using their and our notations, needs to be adjusted to $\vartheta_{n}$ .

Remark 6. The results (7.1), (7.2) and (7.3) show that, under appropriate assumptions, the sample sizes needed for the Kolmogorov-Smirnov test to be, given $\{P_{\theta_{n}}\}$ , as good as the tests based on ${\cal G}_{n}$ , ${\cal E}_{n}$ and ${\cal I}_{n}$ are approximately equal to $ne_{{\cal G}{\cal K}}$ , $ne_{{\cal E}{\cal K}}$ and $ne_{{\cal I}{\cal K}}$ , respectively. Thus, they are approximately proportional to $n$ .
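The sample-size reading of the efficiency can be sketched as a one-line computation. The helper name `ks_equivalent_sample_size` and the numerical efficiency used in the usage note are hypothetical and serve only as an illustration; they are not values reported in the paper.

```python
import math


def ks_equivalent_sample_size(n: int, efficiency: float) -> int:
    """Approximate number of observations n * e that the Kolmogorov-Smirnov
    test needs to match a competitor that uses n observations, where e is the
    intermediate efficiency of the competitor with respect to K_n."""
    if n <= 0 or efficiency <= 0:
        raise ValueError("n and efficiency must be positive")
    return math.ceil(n * efficiency)
```

For instance, with a hypothetical efficiency of 2.5 and $n=100$ the helper returns 250. The interpretation applies only when the efficiency is positive and finite, so it does not cover ${\cal M}_{n}$ .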

The relation (7.4) reveals that, under (5.6), for the Kolmogorov-Smirnov test the sample size sufficient to attain, given $P_{\theta_{n}}$ , the power as good as that of the test based on ${\cal M}_{n}$ is of smaller order than $n$ . A similar result to (7.4) can be formulated on $e_{{\cal M}{\cal I}}$ .

The statement (7.4) deserves some more detailed comments. First of all, it should be emphasized that the intermediate efficiency concerns the situation when asymptotic powers of the corresponding tests are kept in $(0,1)$ . Therefore, the result (7.4) does not contradict consistency of ${\cal M}_{n}$ under fixed or convergent alternatives. Observe that our approach exhibits that the functions $b_{\cal M}(\cdot)$ and $b_{\cal K}(\cdot)$ , defining the intermediate slopes, are related to the respective shifts in the limiting theorems, which ensure non-degenerate asymptotic powers. Since $b_{\cal M}(\cdot)\ll b_{\cal K}(\cdot)$ , it can be expected that, in a finite sample comparison, the power function of ${\cal M}_{n}$ should be much smaller than the corresponding power function of ${\cal K}_{n}$ . This tendency is quantitatively measured by the intermediate slopes and the intermediate efficiency. Since in the intermediate approach the alternatives are not very close to the null one and the levels do not decrease very fast, we can expect that a similar tendency shall be seen in empirical powers under fixed alternatives, which satisfy (5.6). In Section 8.2 we present a small simulation study which confirms such intuitions.

Next, compare (7.4) with the findings of Lockhart (1991), which are consistent with it. In that paper it was shown that, under the usual types of contiguous alternatives, the power and the level of ${\cal S}_{n}$ have the same limit, and the related ARE of the test with respect to the corresponding Neyman-Pearson test (NP) is 0. The same conclusion holds true for the ARE of ${\cal S}_{n}$ with respect to any other test with a nonzero asymptotic efficiency relative to the NP test. In our opinion, in this application, the intermediate approach, resulting in a non-zero shift, better explains the observed empirical powers of ${\cal S}_{n}$ than the conclusion on the shift 0 under Pitman’s approach.

The results on ${\cal S}_{n}$ in Lockhart (1991) were formulated in the case when $A$ is absolutely continuous and the corresponding function $a=A^{\prime}$ belongs to $L_{2}(0,1)$ . This assumption is standard in the classical approach to investigation of an asymptotic power and the asymptotic relative efficiency of tests under alternatives of order $1/\sqrt{n}$ . For an illustration see the insightful results on ${\cal K}_{n}$ proved by Milbrodt and Strasser (1990), and Janssen (1995). On the other hand, note that we have shown that the intermediate slope of ${\cal K}_{n}$ is well defined for any $a\in L_{1}(0,1)$ . This opens a possibility of comparisons of some competitors to ${\cal K}_{n}$ for some interesting alternatives with $a\notin L_{2}(0,1)$ . Typically, alternatives with heavy tails lead, via $F_{1}\circ F_{0}^{-1}$ , to a corresponding $a\notin L_{2}(0,1)$ . Tails heavier than Gaussian are common in many current applications. For related discussion, see Cont (2001). Examples of such alternatives along with some preliminary results on a weak variant of the intermediate efficiency are presented in Section 9. It turns out that in such a setting the asymptotic behavior of ${\cal E}_{n}$ changes dramatically. Namely, the weak intermediate efficiency of ${\cal E}_{n}$ with respect to ${\cal K}_{n}$ is infinite. In light of recent results on the intermediate efficiency of the Neyman-Pearson statistic ${\cal V}_{n}$ with respect to ${\cal K}_{n}$ , in the case when $a\in L_{p}(0,1),\;p\in(1,2),$ contained in Inglot (2019), this is not surprising. It turns out that, in contrast to ${\cal E}_{n}$ , ${\cal K}_{n}$ is completely inefficient in such situations.

Remark 7. An easy calculation shows that $e_{{\cal E}{\cal K}}\geq 1$ , $e_{{\cal E}{\cal K}}\geq e_{{\cal G}{\cal K}}$ and $e_{{\cal I}{\cal K}}\leq 2e_{{\cal E}{\cal K}}$ for any $A$ satisfying (5.3). Moreover, $e_{{\cal G}{\cal K}}$ can be arbitrarily close to 0 (take $A^{\prime}(t)=a(t)={\bf 1}_{[0,(\kappa+\delta)/2]}(t)-{\bf 1}_{((\kappa+\delta)/2,\kappa+\delta]}(t)$ for small $\delta>0$ , where ${\bf 1}_{E}$ denotes the indicator of the set $E$ ). On the other hand, $e_{{\cal I}{\cal K}}$ can take any positive value (for small $\delta>0$ take $a(t)=(1/\delta-1){\bf 1}_{[0,\delta]}(t)-{\bf 1}_{(\delta,1]}(t)$ or $a(t)={\bf 1}_{[1/2-\delta,1/2)}(t)-{\bf 1}_{[1/2,1/2+\delta]}(t)$ ). Also $e_{{\cal I}{\cal K}}$ can be arbitrarily close to $2e_{{\cal E}{\cal K}}$ (for small $\delta>0$ take $A(t)=\sqrt{t(1-t)}\{t^{\delta}{\bf 1}_{[0,1/2]}(t)+(1-t)^{\delta}{\bf 1}_{(1/2,1]}(t)\}$ ).

Remark 8. To give some insight into asymptotic levels of the tests considered in Theorems 1 - 3 and Remark 1, set $\theta_{n}=cn^{-q},$ where $\;q\in(0,1/2),$ while $c$ is a positive constant. Recall that the Kallenberg efficiency is characterized by levels $\alpha_{n}$ tending to 0 and asymptotic powers in $(0,1)$ . According to (i) of Theorem 1 in Inglot et al. (2018), for any of the statistics, say ${\cal U}_{n}$ , being compared to ${\cal K}_{n}$ , it holds that $\log\alpha_{n}\sim-c_{\cal U}[b_{\cal U}(P_{\theta_{n}}^{n})]^{2}$ , where $c_{\cal U}[b_{\cal U}(P_{\theta_{n}}^{n})]^{2}$ is the intermediate slope of ${\cal U}_{n}$ .

For ${\cal G}_{n}$ and any $q\in(0,1/2)$ the allowable levels are of the form $\log\alpha_{n}\sim-\frac{c^{2}}{2}(\sup_{\kappa\leq t\leq 1-\kappa}|A^{*}(t)|)^{2}\times n^{1-2q}$ .

For ${\cal E}_{n}$ take $\kappa_{n}\asymp n^{-\epsilon},\;\epsilon\in(0,1)$ , and $A(t)$ satisfying (5.3). Then (7.2) holds for any $q\in(\epsilon/2,1/2)$ and the allowable levels take the form $\log\alpha_{n}\sim-\frac{c^{2}}{2}(\sup_{t}|A^{*}(t)|)^{2}\times n^{1-2q}$ .

For ${\cal M}_{n}$ take $A$ satisfying (5.6) with some $\varpi\in[0,1/2)$ . Then (7.4) holds true for any $q\in(\varpi,1/2)$ and the allowable levels take the form $\log\alpha_{n}\sim-\log\left[c^{2}n^{1-2q}(\sup_{t}|A^{*}(t)|)^{2}\right]\sim\log n^{2q-1}$ .

For ${\cal I}_{n}$ the situation is much more regular. For any $q\in(0,1/2)$ in $\theta_{n}=cn^{-q}$ and $A$ satisfying (6.3) the statement (7.3) holds and the allowable levels take the form $\log\alpha_{n}\sim-c^{2}||A^{*}||_{2}^{2}\times n^{1-2q}$ .
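The allowable levels listed above can be compared numerically. The sketch below only transcribes the three asymptotic expressions for $\log\alpha_{n}$ under $\theta_{n}=cn^{-q}$ ; the suprema and the $L_{2}$ norm of $A^{*}$ are passed in as plain numbers, and all function names are illustrative.

```python
import math


def log_level_sup_type(c: float, q: float, n: int, sup_a_star: float) -> float:
    """G_n and E_n: log(alpha_n) ~ -(c^2 / 2) * (sup|A*|)^2 * n^(1 - 2q)."""
    return -0.5 * c**2 * sup_a_star**2 * n ** (1 - 2 * q)


def log_level_m(c: float, q: float, n: int, sup_a_star: float) -> float:
    """M_n: log(alpha_n) ~ -log(c^2 * n^(1 - 2q) * (sup|A*|)^2)."""
    return -math.log(c**2 * n ** (1 - 2 * q) * sup_a_star**2)


def log_level_i(c: float, q: float, n: int, l2_norm_a_star: float) -> float:
    """I_n: log(alpha_n) ~ -c^2 * ||A*||_2^2 * n^(1 - 2q)."""
    return -(c**2) * l2_norm_a_star**2 * n ** (1 - 2 * q)
```

With $c=1$ , $q=1/4$ , $n=10^{4}$ and unit supremum and norm, the three expressions give $-50$ , about $-4.6$ and $-100$ , respectively. This makes visible that the levels for ${\cal M}_{n}$ decay only polynomially in $n$ , while those for the other statistics decay exponentially in $n^{1-2q}$ .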

As for the asymptotic power under $F_{n}$ with the above $\theta_{n}$ , we have the following situation, which is a consequence of Lemma 1 of Appendix B in Inglot et al. (2018). In the case of ${\cal I}_{n}$ , any fixed asymptotic power from $(0,1)$ is attainable by an appropriate choice of $w$ in (6.4). In contrast, for ${\cal G}_{n},\;{\cal E}_{n}$ and ${\cal M}_{n}$ we do not show that the asymptotic power exists; we can only say that, taking any $w>0$ in the present Theorems 1 and 2, the resulting sequences of powers are bounded away from 0 and 1.

Though the above conclusions, contained in Remark 8, may look complicated and abstract, it turns out that, under standard circumstances, the value of the efficiency helps to predict the empirical power of a test being compared to a benchmark. The reason is that on the not very extreme tails of the test statistic, which are characteristic of the intermediate approach, the asymptotics work well for relatively small sample sizes. Hence, the approach gives a good approximation for standard significance levels. A similar conclusion can also be found in Ermakov (2004), p. 624. Below, we demonstrate to what extent, for selected statistics with non-zero intermediate efficiency with respect to ${\cal K}_{n}$ , our results explain empirical powers under fixed levels and fixed alternatives.

**8. Simulation and efficiencies**

**8.1. Examples of departures from the standard Gaussian model**

We start with three simple classical situations related to detecting lack-of-fit to the standard normal distribution $N(0,1)$ . To be specific, $F_{0}(x)=\Phi(x)$ , and the alternatives are: $H_{1}(x;\mu)=\Phi(x-\mu),\;H_{2}(x;\sigma)=\Phi(x/\sigma)$ and $H_{3}(x;\mu,p)=(1-p)\Phi(x)+p\Phi(x-\mu).$ In all simulations here and in Section 9 we consider fixed alternatives. To clearly distinguish this case from the combination $(1-\vartheta_{n})F_{0}+\vartheta_{n}F_{1}$ , used in theoretical considerations, we use the notation $H_{j},\;j=1,2,...$ for the fixed alternative. This is especially useful in Section 9, where $F_{1}$ itself corresponds to some mixtures. For some simulated powers of ${\cal S}_{n}$ under the shift and scale models see Moscovich et al. (2016). The location-contaminated alternative $H_{3}(x;\mu,p)$ comes from the paper by Pearson et al. (1977). The alternative $H_{3}(x;\mu,p)$ was exploited for comparison of powers in Li and Siegmund (2015). In recent years this model with $p=p_{n},\;p_{n}\to 0$ , and $\mu=\mu_{n},\;\mu_{n}\to\infty$ , has been popularized under the label “sparse heterogeneous mixtures”; cf. Donoho and Jin (2004) and related papers.

After the transformation $\Phi(X_{i}),\;i=1,...,n$ , these alternatives have densities $h_{j}$ on $(0,1)$ which can always be written in the form $1+a^{[j]}(t)$ , where $\int a^{[j]}(t)dt=0,\;j=1,2,3.$ Since we would like to present the $a^{[j]}$ ’s in our figures in a normalized form, we introduce the following parametrization. By $||\cdot||_{1}$ we denote the $L_{1}$ norm on $(0,1)$ with the Lebesgue measure, and we put $\varphi=\Phi^{\prime}$ , $\theta^{[j]}=||a^{[j]}||_{1}$ and $a_{j}=a^{[j]}/\theta^{[j]},\;j=1,2,3$ . This yields the following alternative models:

$\mathbb{M}_{1}$ : $\displaystyle h_{1}(t;\mu)=1+\theta^{[1]}a_{1}(t;\mu),\;\;\;\mbox{with}\;\;\;a^{[1]}(t;\mu)=\frac{\varphi(\Phi^{-1}(t)-\mu)}{\varphi(\Phi^{-1}(t))}-1,\;\;\mu\in\mathbb{R},\;\mu\neq 0,$

$\mathbb{M}_{2}$ : $\displaystyle h_{2}(t;\sigma)=1+\theta^{[2]}a_{2}(t;\sigma),\;\;\;\mbox{with}\;\;\;a^{[2]}(t;\sigma)=\frac{\varphi(\frac{1}{\sigma}\Phi^{-1}(t))}{\sigma\varphi(\Phi^{-1}(t))}-1,\;\;\sigma\in\mathbb{R}_{+},\;\sigma\neq 1,$

$\mathbb{M}_{3}$ : $\displaystyle h_{3}(t;p,\mu)=1+\theta^{[3]}a_{3}(t;p,\mu),\;\;\;\mbox{with}\;\;\;a^{[3]}(t;p,\mu)=p\Bigl{\{}\frac{\varphi(\Phi^{-1}(t)-\mu)}{\varphi(\Phi^{-1}(t))}-1\Bigr{\}},\\ \hskip 28.45274ptp\in(0,1),\;\mu\in\mathbb{R},\;\mu\neq 0.$

The functions $a_{1}$ and $a_{3}$ are unbounded while $a_{2}$ is bounded for $\sigma\leq 1$ and unbounded otherwise. It holds that $a_{j}(u;\cdot)\in L_{2}(0,1),\;j=1,2,3$ . Set $A_{j}(t;\cdot)=\int_{0}^{t}a_{j}(u;\cdot)du.$ We have $A_{1}(t;\mu)=[\Phi(\Phi^{-1}(t)-\mu)-t]/\theta^{[1]},\;A_{2}(t;\sigma)=[\Phi(\frac{1}{\sigma}\Phi^{-1}(t))-t]/\theta^{[2]},\;A_{3}(t;p,\mu)=p[\Phi(\Phi^{-1}(t)-\mu)-t]/\theta^{[3]}=A_{1}(t;\mu),$ since $\theta^{[3]}=p\,\theta^{[1]}$ . The last relation implies that the intermediate efficiency of the mixture does not depend on $p$ . In contrast, the efficiency is influenced by a change of the “direction” of the noise in the mixture; i.e. $\Phi(x-\mu)$ in this particular case. More examples of mixtures are discussed in Section 9.
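The independence of $A_{3}$ from $p$ can be checked numerically. The sketch below relies on one fact that is not stated in the text and is used here as an assumption of the illustration: for the shift model the $L_{1}$ norm has the closed form $\theta^{[1]}=\int|\varphi(z-\mu)-\varphi(z)|dz=2\{2\Phi(|\mu|/2)-1\}$ , which follows because the two normal densities cross at $z=\mu/2$ . The function names are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal: cdf() is Phi, inv_cdf() is Phi^{-1}


def theta_shift(mu: float) -> float:
    """L1 norm of a^{[1]}(.; mu), i.e. the integral of |phi(z - mu) - phi(z)|."""
    return 2.0 * (2.0 * PHI.cdf(abs(mu) / 2.0) - 1.0)


def A1(t: float, mu: float) -> float:
    """Normalized A_1(t; mu) = [Phi(Phi^{-1}(t) - mu) - t] / theta^{[1]}."""
    return (PHI.cdf(PHI.inv_cdf(t) - mu) - t) / theta_shift(mu)


def A3(t: float, p: float, mu: float) -> float:
    """Normalized A_3(t; p, mu) for the location-contaminated model."""
    theta3 = p * theta_shift(mu)  # ||a^{[3]}||_1 = p * ||a^{[1]}||_1
    return p * (PHI.cdf(PHI.inv_cdf(t) - mu) - t) / theta3
```

For any $p\in(0,1)$ the values `A3(t, p, mu)` and `A1(t, mu)` agree up to rounding, which is the cancellation behind the statement that the efficiency of the mixture does not depend on $p$ .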

Similarly as in Section 5, given $A_{j}$ , set

[TABLE]

Note that for the functions $A_{1}$ and $A_{3}$ and all related parameters under consideration (5.6) holds with any $\varpi\in(0,1/2)$ and hence (5.3) and (6.3) hold, as well (cf. Remark 3). For $A_{2}$ , if $\sigma<1$ then (5.6) holds with $\varpi=0$ ; if $\sigma\in(1,\sqrt{2})$ then (5.6) holds with $\varpi\in[1-\sigma^{-2},1/2)$ ; if $\sigma=\sqrt{2}$ then (5.3) holds while (5.6) does not. For all $\sigma>0$ (6.3) is satisfied.

**8.2. Alternatives from $\mathbb{M}_{1}$ , $\mathbb{M}_{2}$ and $\mathbb{M}_{3}$ satisfying (5.3), (5.6) and (6.3), corresponding efficiencies and simulated powers**

We restrict our attention to ${\cal I}_{n}$ , ${\cal M}_{n}$ , ${\cal K}_{n}$ , and two selected members of the class of statistics ${\cal E}_{n}={\cal E}_{n}(\kappa_{n})$ , indexed by $\kappa_{n}$ satisfying (ii) of Lemma 2. It is intuitively clear that using a relatively small parameter $\kappa_{n}$ can be profitable when under an alternative a considerable amount of a probability mass is shifted towards one or both tails, while a larger $\kappa_{n}$ is expected to be more useful in detecting centrally located changes. For an illustration we took

[TABLE]

and

[TABLE]

In the simulation experiments the significance level was set to $\alpha=0.01$ . The number of Monte Carlo (MC) runs was $10^{5}$ for estimating sizes and $10^{4}$ for estimating powers. The programs were written in C#.
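The structure of such an experiment can be sketched as follows. This is not the authors' C# program; it is a minimal Python illustration for the benchmark only, assuming that ${\cal K}_{n}$ is the classical $\sqrt{n}$-scaled Kolmogorov-Smirnov statistic (its definition lies outside this section) and using the shift alternative $H_{1}(x;\mu)$ . All function names are illustrative, and the run counts in the usage note are far smaller than those reported above.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()


def ks_statistic(u):
    """sqrt(n) * sup_t |F_n(t) - t| for a sample u with values in (0, 1)."""
    n = len(u)
    s = sorted(u)
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(s))
    return math.sqrt(n) * d


def mc_critical_value(n, alpha, runs, rng):
    """Empirical (1 - alpha)-quantile of the statistic under uniformity."""
    stats = sorted(
        ks_statistic([rng.random() for _ in range(n)]) for _ in range(runs)
    )
    k = min(runs - 1, math.ceil((1 - alpha) * runs) - 1)
    return stats[k]


def mc_power_shift(n, mu, crit, runs, rng):
    """Fraction of rejections when X_i ~ N(mu, 1) and Phi(X_i) is tested."""
    rejections = 0
    for _ in range(runs):
        u = [PHI.cdf(rng.gauss(mu, 1.0)) for _ in range(n)]
        if ks_statistic(u) > crit:
            rejections += 1
    return rejections / runs
```

A typical call sequence is `rng = random.Random(1)`, then `crit = mc_critical_value(50, 0.01, 2000, rng)`, then `mc_power_shift(50, 0.15, crit, 1000, rng)`. Replacing `ks_statistic` by another statistic gives the corresponding size and power estimates for that test.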

We have considered $\mathbb{M}_{1}$ with $\mu=0.15$ , $\mathbb{M}_{2}$ with $\sigma=0.75$ and $\sigma=1.25$ and $\mathbb{M}_{3}$ with $p=0.05,\;\mu=2.00$ . For all the cases the assumptions (5.3), (5.6) and (6.3) are satisfied. Hence our theoretical results on the intermediate efficiencies are applicable.

The selected models, the corresponding efficiencies and the related empirical powers are presented in Figures 1 and 2. In the first row of the figures we display graphs of $a_{j}$ and $A^{*}_{j}$ , $j=1,2,3$ , and the corresponding values of $t_{0}$ , $m_{0}$ , where $t_{0}=\arg\max|A_{j}^{*}(t)|$ and $m_{0}=|A_{j}^{*}(t_{0})|$ .

The middle rows show empirical powers of ${\cal E}_{n}^{o}$ , ${\cal E}_{n}^{\star}$ , ${\cal I}_{n}$ , ${\cal M}_{n}$ and ${\cal K}_{n}$ , against $n$ .

The bottom rows show the above power curves for sample sizes not exceeding the first value for which the empirical power of ${\cal E}_{n}^{o}$ attains a value in $[0.99,1]$ . We additionally display here the values of the efficiencies $e_{{\cal E}{\cal K}}$ and $e_{{\cal I}{\cal K}}$ . In all four cases $e_{{\cal E}{\cal K}}>1$ as well as $e_{{\cal I}{\cal K}}>1.$ In the last row we also present the corresponding simulation results for ${\cal K}_{n\cdot e_{{\cal E}{\cal K}}}$ and ${\cal K}_{n\cdot e_{{\cal I}{\cal K}}}$ , i.e. the empirical powers for the Kolmogorov-Smirnov test based on the corrected sample sizes ${n}\cdot{e_{{\cal E}{\cal K}}}$ and ${n}\cdot{e_{{\cal I}{\cal K}}}$ , respectively. The zoom applied here makes it easy to see how the corrected sample sizes influence the empirical powers of ${\cal K}_{n}$ .

The results show that the finite sample interpretation of the intermediate efficiency indeed reflects very well the behavior of empirical powers of ${\cal K}_{n}$ . For very large values of the efficiency $e_{{\cal E}{\cal K}}$ and relatively small sample sizes, as is the case for the model ${\mathbb{M}}_{3}$ in Figure 2, the empirical powers of ${\cal K}_{n\cdot e_{{\cal E}{\cal K}}}$ considerably overestimate the powers of ${\cal E}_{n}^{o}$ and ${\cal E}_{n}^{\star}$ . However, it is hard to expect very accurate small sample results in such an extreme situation. In any case, the message is informative. The results of simulations also indicate that the 0 efficiency of ${\cal M}_{n}$ with respect to ${\cal K}_{n}$ should not be surprising. Shapes of empirical powers of ${\cal M}_{n}$ , as functions of $n$ , are very different from those for ${\cal K}_{n}$ . For the alternatives under consideration one needs a relatively huge number of observations to achieve a high power of the test based on ${\cal M}_{n}$ . Similar pictures are expected to be valid for many other classical alternative distribution models.

**9. On the behavior of ${\cal E}_{n}$ and ${\cal K}_{n}$ when (5.3) is violated**

The above part of the paper gives quite reliable insight into the behavior of powers of the Kolmogorov-Smirnov test ${\cal K}_{n}$ and the selected Eicker-Jaeschke statistics ${\cal E}_{n}$ and ${\cal M}_{n}$ , in the case when the tails of an alternative are relatively light; i.e. the conditions (5.3) and (5.6) are satisfied. Under these conditions $e_{{\cal E}{\cal K}}\geq 1$ and $e_{{\cal M}{\cal K}}=0$ , respectively. From previous developments it follows that one should expect much worse power behavior of ${\cal K}_{n}$ in the case of alternatives with relatively heavy tails. We shall study this question in the present section by contrasting the behavior of ${\cal K}_{n}$ with ${\cal E}_{n}$ , in the case when the condition (5.3) is violated. Since we are not aware of an extension of Theorem 1 to this case, we are able to calculate only a so-called weak variant of the intermediate efficiency. Let us denote it by $\hat{e}_{{\cal E}{\cal K}}$ . This weak variant is defined as a limit of the ratio of the slopes, as $n$ tends to infinity. The difference between $\hat{e}_{{\cal E}{\cal K}}$ and $e_{{\cal E}{\cal K}}$ resembles to some extent the difference between the approximate and the exact Bahadur efficiency. The weak variant of the intermediate efficiency was already studied in Ivchenko and Mirakhmedov (1995), and Inglot (1999).

To calculate $\hat{e}_{{\cal E}{\cal K}}$ for a local sequence of alternatives $F_{n}(t)=t+\theta_{n}A(t)$ , when (5.3) is violated, set

[TABLE]

and denote by $t_{n}$ any point at which the supremum in (9.1) is attained.

Lemma 7. *Suppose that $m_{n}\to\infty$ and $t_{n}\to 0$ or $t_{n}\to 1$ , as $n\to\infty$ . Assume that $\liminf_{n\to\infty}n\kappa_{n}/\log^{2}n>0$ , $\lim_{n\to\infty}\log\kappa_{n}/\log n<0,\;n\theta_{n}^{2}/\log\log(1/\kappa_{n})\to\infty$ , and $\theta_{n}^{2}m_{n}^{2}/\kappa_{n}\to 0$ . Then one gets *

[TABLE]

Hence, the intermediate slope ${c_{\cal E}[b_{\cal E}(P_{\theta_{n}}^{n})]^{2}}$ of ${\cal E}_{n}$ under $\{P_{\theta_{n}}\}$ has the form $n\theta_{n}^{2}m_{n}^{2}/2$ .

Corollary 1. Under the assumptions of Lemma 7 it holds that

[TABLE]

The relation (9.3) suggests that perhaps the intermediate efficiency $e_{{\cal E}{\cal K}}$ of ${\cal E}_{n}$ with respect to ${\cal K}_{n}$ equals $+\infty$ , as well. However, verifying this would require non-trivial investigations of the question on non-degeneracy of the asymptotic power of ${\cal E}_{n}$ under the above described local alternatives. This is a challenging open question. Note that non-degenerate asymptotic power of ${\cal E}_{n}$ is needed to have the interpretation of the intermediate efficiency in terms of the limiting ratio of appropriate sample sizes; cf. Theorem 1 in Inglot et al. (2018).

We show below that even this weak variant $\hat{e}_{{\cal E}{\cal K}}$ of the efficiency gives the right indication of the empirical power behavior of ${\cal E}_{n}$ and ${\cal K}_{n}$ , when (5.3) fails.

We shall study the empirical behavior of ${\cal E}_{n}^{o}$ , ${\cal E}_{n}^{\star}$ , ${\cal K}_{n}$ , as well as ${\cal M}_{n}$ and ${\cal I}_{n}$ under the following alternative models

${\mathbb{M}}_{4}:H_{4}(t;\beta,\pi)=\{\pi^{(\beta-1)/\beta}{t^{1/\beta}}\}{\bf 1}_{[0,\pi)}(t)+t{\bf 1}_{[\pi,1-\pi]}(t)+\{1-\pi^{(\beta-1)/\beta}(1-t)^{1/\beta}\}{\bf 1}_{(1-\pi,1]}(t)$ , where $\beta>0,\;\pi\in[0,0.5]$ , and $t\in[0,1]$ ,

${\mathbb{M}}_{5}:H_{5}(x;\delta,p)=(1-p)\Phi(x)+p\Lambda(x;\delta),\;\delta>0,\;p\in[0,1],\;x\in\mathbb{R},$ where $\Lambda(x;\delta)=[\Phi(x)]^{\delta}$ is the Lehmann distribution; cf. Remark 3,

${\mathbb{M}}_{6}:H_{6}(x;\gamma,p)=(1-p)\Phi(x)+p\Sigma(x;\gamma),\;\gamma>0,\;p\in[0,1],\;x\in\mathbb{R},$ where $\Sigma(x;\gamma)$ is the symmetric Subbotin distribution function obeying the density $C_{\gamma}\exp\{-|x|^{\gamma}/\gamma\},\;x\in\mathbb{R},$

${\mathbb{M}}_{7}:H_{7}(x;\zeta,p)=(1-p)\Phi(x)+p\Pi(x;\zeta),\;\zeta>0,\;p\in[0,1],\;x\in\mathbb{R},$ where $\Pi(x;\zeta)$ is the distribution function of the symmetric Pareto distribution with the parameter $\zeta$ ; cf. Remark 4.
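As an illustration of how such alternatives can be simulated, the following minimal sketch transcribes $H_{4}(t;\beta,\pi)$ from ${\mathbb{M}}_{4}$ and adds its inverse, which allows sampling by the inverse transform. The inverse is derived here and is not given in the text; the function names are illustrative, and the sketch assumes $\pi\in(0,0.5]$ to avoid evaluating a negative power of zero at $\pi=0$ .

```python
def h4_cdf(t: float, beta: float, pi: float) -> float:
    """Distribution function H_4(t; beta, pi) on [0, 1]."""
    if not (beta > 0 and 0 < pi <= 0.5 and 0 <= t <= 1):
        raise ValueError("need beta > 0, 0 < pi <= 0.5 and 0 <= t <= 1")
    scale = pi ** ((beta - 1) / beta)
    if t < pi:
        return scale * t ** (1 / beta)
    if t <= 1 - pi:
        return t  # uniform part on [pi, 1 - pi]
    return 1 - scale * (1 - t) ** (1 / beta)


def h4_quantile(u: float, beta: float, pi: float) -> float:
    """Inverse of h4_cdf in its first argument, for u in [0, 1]."""
    if not (beta > 0 and 0 < pi <= 0.5 and 0 <= u <= 1):
        raise ValueError("need beta > 0, 0 < pi <= 0.5 and 0 <= u <= 1")
    if u < pi:
        return u**beta * pi ** (1 - beta)
    if u <= 1 - pi:
        return u
    return 1 - (1 - u) ** beta * pi ** (1 - beta)
```

Applying `h4_quantile` to independent uniform draws, for example `h4_quantile(random.random(), 2.0, 0.1)`, yields observations from $H_{4}$ . For $\beta>1$ the value `h4_cdf(t, beta, pi)` exceeds `t` for small `t`, which reflects the additional mass in the lower tail.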

The model ${\mathbb{M}}_{4}$ comes from Mason and Schuenemeyer (1983). If $\beta\in(0,1)$ then $H_{4}(t;\beta,\pi)$ has lighter tails than the uniform (0,1) distribution, say $U(0,1)$ . When $\beta>1$ then $H_{4}(t;\beta,\pi)$ has heavier lower and upper tails than $U(0,1)$ . For ${\mathbb{M}}_{4}$ the condition (5.3) does not hold if $\beta\geq 2$ . ${\mathbb{M}}_{4}$ defines alternatives with an allocation of the probability mass only on the tails.

${\mathbb{M}}_{5}$ - ${\mathbb{M}}_{7}$ were chosen as mixtures. Detection of mixtures is of vital interest. Lehmann’s model, used in ${\mathbb{M}}_{5}$ , is popular in the statistical literature. The Subbotin distribution is discussed in Donoho and Jin (2004). The mixture ${\mathbb{M}}_{7}$ has been inspired by Jin et al. (2005), where an additive model with disturbances with algebraically decreasing tails was considered. For ${\mathbb{M}}_{5}$ with $\delta>0$ , ${\mathbb{M}}_{6}$ with $\gamma\in(0,2)$ , and ${\mathbb{M}}_{7}$ with $\zeta>0$ the condition (5.3) does not hold.
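A mixture such as ${\mathbb{M}}_{5}$ can be sampled by first choosing the component and then drawing from it. The sketch below assumes the inverse-transform representation $x=\Phi^{-1}(u^{1/\delta})$ for the Lehmann component, which follows from $\Lambda(x;\delta)=[\Phi(x)]^{\delta}$ ; the function names and the clamping of extreme probabilities are choices of this illustration, not of the paper.

```python
import random
import sys
from statistics import NormalDist

PHI = NormalDist()


def lehmann_quantile(u: float, delta: float) -> float:
    """Quantile of Lambda(x; delta) = Phi(x)^delta, i.e. Phi^{-1}(u^(1/delta))."""
    q = u ** (1.0 / delta)
    # Keep q strictly inside (0, 1), since inv_cdf rejects 0 and 1.
    q = min(max(q, sys.float_info.min), 1.0 - 2.0**-53)
    return PHI.inv_cdf(q)


def sample_m5(n: int, delta: float, p: float, rng: random.Random) -> list:
    """Draw n observations from H_5 = (1 - p) * Phi + p * Lambda(.; delta)."""
    out = []
    for _ in range(n):
        if rng.random() < p:
            out.append(lehmann_quantile(rng.random(), delta))
        else:
            out.append(rng.gauss(0.0, 1.0))
    return out
```

The same two-stage scheme applies to ${\mathbb{M}}_{6}$ and ${\mathbb{M}}_{7}$ once a sampler for the Subbotin or the symmetric Pareto component is available.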

Each of the models ${\mathbb{M}}_{j},\;j=4,...,7,$ can be equivalently rewritten in the form $1+\theta^{[j]}a_{j}(t;\cdot).$ The functions $a_{j},\;j=4,6,7$ , are symmetric with respect to 1/2 and unbounded at 0 and 1 while $a_{5}$ is unbounded at 0. For $t$ close to 0 the functions $a_{4},...,a_{7}$ behave like: $t^{(1-\beta)/\beta},\;t^{\delta-1},\;t^{-1}\exp\{-\frac{1}{\gamma}[\log(1/t^{2}\log(1/t^{2}))]^{\gamma/2}-\frac{1}{2}[\log\log(1/t)]\}$ , $\;t^{-1}[\log(1/t)]^{-1-\zeta/2},$ respectively. Note also that $a_{4},...,a_{7}$ do not belong to $L_{2}(0,1)$ for $\beta\geq 2,\;\delta\leq 1/2,\;\gamma<2,\;\zeta>0$ , respectively.
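The thresholds for $a_{4}$ , $a_{5}$ and $a_{7}$ can be checked directly from the stated behavior near 0. The following worked computation is a sketch in which the exponent $s$ and the cutoff $\varepsilon$ are auxiliary symbols introduced only for this illustration:

$\displaystyle\int_{0}^{\varepsilon}t^{2s}\,dt<\infty\;\Longleftrightarrow\;2s>-1,\qquad\varepsilon\in(0,1).$

With $s=(1-\beta)/\beta$ the condition $2(1-\beta)/\beta>-1$ is equivalent to $\beta<2$ , so $a_{4}\notin L_{2}(0,1)$ exactly when $\beta\geq 2$ . With $s=\delta-1$ the condition reads $\delta>1/2$ , so $a_{5}\notin L_{2}(0,1)$ exactly when $\delta\leq 1/2$ . For $a_{7}$ the squared function behaves like $t^{-2}[\log(1/t)]^{-2-\zeta}$ , which exceeds $t^{-1}$ for all sufficiently small $t$ and is therefore not integrable at 0 for any $\zeta>0$ .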

In Figure 3 we plot empirical powers of the considered tests, under $\alpha=0.01$ and some selected $n$ and $p$ , against the parameters $\pi,\;\beta,\;\delta,\;\gamma,$ and $\zeta$ of the considered models. The outcomes show that, when (5.3) is violated, empirical behavior of ${\cal K}_{n}$ is very poor and resembles the behavior of ${\cal M}_{n}$ in previous figures. In contrast, now ${\cal M}_{n}$ does very well. Obviously, the imposed lack of (5.3) implies the violation of (5.6), as well. Moreover, except for the cases when a very large amount of probability mass is shifted to the ends of $(0,1)$ , ${\cal E}_{n}^{\star}$ also works very well. In all situations shown in Figure 3 the variant ${\cal E}_{n}^{\star}$ dominates ${\cal E}_{n}^{o}$ considerably. The empirical behavior of ${\cal I}_{n}$ is not impressive in comparison to ${\cal M}_{n}$ and ${\cal E}_{n}^{\star}$ .

It should be emphasized that we have not conducted an extensive search for $\kappa_{n}^{o}$ and $\kappa_{n}^{\star}$ defining ${\cal E}_{n}^{o}$ and ${\cal E}_{n}^{\star}$ . We simply took the two candidates which satisfy the assumption (ii) of Lemma 2, i.e. $\kappa_{n}$ satisfying $\liminf_{n}n\kappa_{n}/\log^{2}n>0$ . In spite of this, from the outcomes in Figures 1 - 3, it can be seen that ${\cal E}_{n}^{\star}$ is a reasonably well balanced solution. At any rate, some search for a data-driven choice of the smoothing parameter $\kappa_{n}$ would be very welcome.

**10. Discussion**

The present paper illustrates the advantages of using the pathwise variant of the Kallenberg efficiency to study goodness-of-fit to a completely known continuous distribution function. In Inglot et al. (2018) the paths were defined as mixtures of a big fraction of the null distribution and a small fraction of an alternative one. Consequently, we consider $(1-\vartheta_{n})F_{0}(x)+\vartheta_{n}F_{1}(x)$ , where $F_{0}$ is the null distribution, $F_{1}$ represents the alternative, and $\vartheta_{n}\to 0$ as $n\to\infty$ . For convenience, in this paper we have transformed the observations to (0,1) via $F_{0}$ , cf. (2.1), but it is not essential to the interpretation of the results. Moreover, to increase the readability of the results, we introduced (2.2). Anyway, in essence the pathwise variant of the efficiency evaluates the quality of tests by measuring their ability to detect (local) mixtures. On the other hand, the mixtures define “directions” along which we approach the null model and, as a rule, the corresponding results on the efficiency are valid for many “directions”. Moreover, in the intermediate approach $\vartheta_{n}$ decreases relatively slowly. The above implies that the resulting, asymptotic in nature, expression for the efficiency gives reliable results on empirical powers under fixed alternatives which are not necessarily mixtures, fixed sample sizes, and standard significance levels.

At first glance, our approach resembles detecting mixtures under the dense regime; cf. Cai et al. (2011) for the terminology and an insightful introduction to the problem. However, we focus on a goodness-of-fit context, and our goal is not to study if and when a procedure can detect or fail to detect a given mixture; rather, we would like to investigate how well a selected test can distinguish some classes of alternatives from the null model. Therefore, in contrast to the signal detection approach, we insist on having the error of the second kind in $(0,1)$. Moreover, the distribution function $F_{1}$ is fixed, independent of $n$. So, our setting differs from the typical approach in studies of detectable and undetectable regions, originated by Ingster (1997) and extensively developed in recent years; cf. Ditzhaus (2018) for the most general setting and historical details. Also, the outcomes of both approaches are qualitatively different. A typical feature of Ingster’s approach is that whole big classes of tests have the same detection boundaries; cf. Jager and Wellner (2007), and Ditzhaus (2018) for an illustration. In contrast, the Kallenberg efficiency allows for catching some subtle differences between test statistics. It seems that some further investigations on this approach could result in a better understanding of the advantages and limitations of popular classes of modern goodness-of-fit statistics. In particular, some more work on the asymptotic distribution of test statistics under the regime $\vartheta_{n}\to 0$ and $n\vartheta_{n}^{2}\to\infty$ is necessary. Moreover, moderate deviations for the whole classes of test statistics, which were recently considered, should be developed. As illustrated by our analysis of ${\cal I}_{n}$ and related discussion, for sufficiently smooth functionals of the weighted empirical process deriving the intermediate efficiency is relatively easy. Sup-type functionals are less regular and more difficult to handle.
Anyway, in our opinion, the present paper shows that such work is worthy of further consideration. In particular, it would be interesting to close our investigations on ${\cal M}_{n}$ and ${\cal E}_{n}$ by showing if and when their intermediate efficiencies with respect to ${\cal K}_{n}$ exist in the situation when (5.6) and (5.3), respectively, are violated.

**Appendix: Proofs**

**A.1. Proof of Lemma 2**

Let $U_{1},...,U_{n}$ be independent uniform (0,1) random variables and let $U_{(1)}\leq...\leq U_{(n)}$ denote their order statistics.

(i) Let $i_{n}=\lfloor 3nw_{n}\sqrt{\kappa_{n}}\rfloor$ . Then, by the assumption $w_{n}/\sqrt{\kappa_{n}}\to\infty$ , we have for sufficiently large $n$

[TABLE]

Since $j!\geq j^{j}e^{-j}$ for all $j\geq 1$, we have

[TABLE]

Hence and from the relation $en\kappa_{n}/i_{n}<1/2$ for sufficiently large $n$ we get

[TABLE]

Moreover, since $i_{n}/n\to 0$, for sufficiently large $n$ we have $P_{0}^{n}(U_{(i_{n})}>1-\kappa_{n})\leq P_{0}^{n}(U_{(i_{n})}<\kappa_{n})$. On the other hand, since $j!\leq j^{j+1}e^{-j}$ holds for $j\geq 7$, we have

[TABLE]
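The two elementary factorial bounds used in this part of the proof, $j!\geq j^{j}e^{-j}$ for $j\geq 1$ and $j!\leq j^{j+1}e^{-j}$ for $j\geq 7$, are easy to check numerically. The sketch below, which is not part of the paper, compares both sides on the logarithmic scale via `math.lgamma` to avoid overflow; the function names and the range of $j$ are arbitrary choices, and a finite check is of course only a sanity test, not a proof.

```python
import math


def lower_bound_holds(j):
    """Check j! >= j^j e^{-j} on the log scale (log j! = lgamma(j + 1))."""
    return math.lgamma(j + 1) >= j * math.log(j) - j


def upper_bound_holds(j):
    """Check j! <= j^{j+1} e^{-j} on the log scale."""
    return math.lgamma(j + 1) <= (j + 1) * math.log(j) - j


if __name__ == "__main__":
    print(all(lower_bound_holds(j) for j in range(1, 5001)))
    print(all(upper_bound_holds(j) for j in range(7, 5001)))
    print(upper_bound_holds(6))   # the upper bound already fails at j = 6
```

By Stirling's formula the ratio $j!/(j^{j}e^{-j})$ behaves like $\sqrt{2\pi j}$, which is at most $j$ exactly when $j\geq 2\pi$; this explains why the threshold in the upper bound is $j\geq 7$.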

As $1-i_{n}/n>2/3$ for sufficiently large $n$ , the above inequality and the definition of $i_{n}$ imply for sufficiently large $n$

[TABLE]

Combining (A.1), (A.2) and (A.3), again by the definition of $i_{n}$ and the assumption $w_{n}/\sqrt{\kappa_{n}}\to\infty$ , we obtain for sufficiently large $n$

[TABLE]

Taking logarithms in (A.4), dividing by $-nw_{n}^{2}$, and using again the assumption $w_{n}/\sqrt{\kappa_{n}}\to\infty$, we get

[TABLE]

and the proof is complete. $\Box$

(ii) Let $u_{n}(t)$ be the uniform empirical process and denote

[TABLE]

Since $nw_{n}^{2}\to\infty$, the assumption on $\kappa_{n}$ implies $n^{2}w_{n}^{2}\kappa_{n}/\log^{2}n\to\infty$. Let $\varepsilon_{n}>0,\;\varepsilon_{n}\to 0$, be such that $w_{n}^{2}/(\kappa_{n}\varepsilon_{n}^{2})\to 0$ and $n^{2}w_{n}^{2}\kappa_{n}\varepsilon_{n}^{2}/\log^{2}n\to\infty$. Then for any fixed $c\in(0,1)$ and sufficiently large $n$ we have

[TABLE]

and

[TABLE]

Moreover, from the KMT inequality we have

[TABLE]

where $l,L,C$ are universal positive constants.

If we show that for any $w_{n}\to 0$ such that $nw_{n}^{2}\to\infty$ it holds

[TABLE]

then, by the choice of $\varepsilon_{n}$, the first component in the exponent on the right hand side of (A.7) dominates the second one and, simultaneously, the first component on the right hand side of (A.5) and (A.6) dominates the second one, so that (4.4) follows from (A.8).

To prove (A.8) recall that from the Darling-Erdős theorem (cf. Csörgő and Horváth, 1993, pp. 257-258) it follows that

[TABLE]

where $a_{n}=2\log\log(1/\kappa_{n}-1)$. Denote by $\mu_{n}$ the median of $Z_{n}$. Then from the above relation $\sqrt{a_{n}}\mu_{n}-a_{n}-(\log(a_{n}/(2\pi)))/2\to\mu$, where $\mu=\log(2/\log 2)$ is the median of the limiting distribution. Hence $\mu_{n}=\sqrt{a_{n}}+o(1)$, and $\mu_{n}$ tends to infinity. By a straightforward application of the Borell inequality for $Z_{n}$ (see e.g. van der Vaart and Wellner, 2000, p. 438) we get for every $n$ and $y>0$

[TABLE]
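The two facts about the medians $\mu$ and $\mu_{n}$ stated before the Borell inequality can be verified by a short computation. The sketch below is an added check, not part of the original proof; it assumes that the limit law in the Darling-Erdős relation is $\exp(-2e^{-x})$, which is the form consistent with the stated value of $\mu$, and that $\kappa_{n}\to 0$, so that $a_{n}\to\infty$.

$$\exp\bigl(-2e^{-\mu}\bigr)=\tfrac{1}{2}\iff 2e^{-\mu}=\log 2\iff\mu=\log\frac{2}{\log 2},$$

$$\mu_{n}=\sqrt{a_{n}}+\frac{\tfrac{1}{2}\log\bigl(a_{n}/(2\pi)\bigr)+\mu+o(1)}{\sqrt{a_{n}}}=\sqrt{a_{n}}+o(1).$$

The second line follows on dividing the relation for $\sqrt{a_{n}}\mu_{n}$ by $\sqrt{a_{n}}$ and using $\log a_{n}/\sqrt{a_{n}}\to 0$.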

Since the assumptions imply $nw_{n}^{2}/\mu_{n}^{2}\to\infty$, inserting $y=\sqrt{n}w_{n}-\mu_{n}$ into the last inequality we get

[TABLE]

On the other hand, for any $\epsilon\in(0,1/2)$ and sufficiently large $n$ we have $\kappa_{n}<\epsilon$ and consequently

[TABLE]

The last two relations complete the proof of (A.8). $\Box$

**A.2. Proof of Lemma 4 (i)**

The proof goes along the lines of that of Lemma 1 in Mason (1985). For any $n\geq 1$ the function $h(y)=y+w_{n}y^{\tau}-1/n$ is increasing on $(0,\infty)$ and $h((2nw_{n})^{-1/\tau})<0$ due to $nw_{n}^{1/(1-\tau)}\geq nw_{n}^{2}\to\infty$ . This gives the following estimate

[TABLE]

By the inequality $1-(1-y)^{n}>ny/e$ holding for $y<1/n$ we infer that for some positive $c$

[TABLE]
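The inequality $1-(1-y)^{n}>ny/e$ for $0<y<1/n$ invoked above follows from concavity of $y\mapsto 1-(1-y)^{n}$ together with $1-(1-1/n)^{n}\geq 1-1/e>1/e$. As a sanity check that is not part of the paper, the following sketch evaluates the gap between the two sides on a grid; the function names and the grid size are arbitrary choices.

```python
import math


def inequality_gap(n, y):
    """Return 1 - (1 - y)**n - n*y/e, computed stably for small y."""
    one_minus_power = -math.expm1(n * math.log1p(-y))   # equals 1 - (1 - y)^n
    return one_minus_power - n * y / math.e


def holds_on_grid(n, points=200):
    """Check the inequality at `points` equally spaced y inside (0, 1/n)."""
    return all(inequality_gap(n, k / ((points + 1) * n)) > 0
               for k in range(1, points + 1))


if __name__ == "__main__":
    print(all(holds_on_grid(n) for n in range(1, 301)))
```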

Taking logarithms of both sides of (A.9), dividing by $-nw_{n}^{2}$ and using the assumption $nw_{n}^{2}/\log n\to\infty$ we get

[TABLE]

which completes the proof. $\Box$

**A.3. Proof of Lemma 3 (i)**

Observe that for $\tau=1/2$ the above proof is valid. The only difference is that in (A.10) the first component on the right hand side vanishes and the assumption $nw_{n}^{2}/\log n\to\infty$ becomes superfluous. So, Lemma 3 (i) holds true. $\Box$

**A.4. Proof of Theorem 1**

Let $u_{n}(t),\;t\in(0,1)$ , be the uniform empirical process and set $v(t)=\sqrt{t(1-t)}$ . By (5.3) there exists $t_{0}\in(0,1)$ such that $\displaystyle|A^{*}(t_{0})|=\sup_{(0,1)}|A^{*}(t)|=m_{0}$ .

(i) It holds that

[TABLE]

When $A(t_{0})>0$ then (A.11) is majorized by ${Pr}\left(u_{n}(F_{n}(t_{0}))/\sqrt{t_{0}(1-t_{0})}\leq w\right)$ converging to ${\Phi}(w)$ . If $A(t_{0})<0$ then the majorant $Pr\left(u_{n}(F_{n}(t_{0}))/\sqrt{t_{0}(1-t_{0})}\geq-w\right)$ of (A.11) converges to ${\Phi}(w)$ as well. This proves (i).

(ii) The key step is to show that for $\delta=\delta_{\cal E}(A)$ appearing in (5.4)

[TABLE]

Indeed, having (A.12), for positive $w$ the triangle inequality and (5.4) imply that

[TABLE]

Since $u_{n}\circ F_{n}$ converges in distribution to a Brownian bridge, (ii) follows.

Now, by the definitions of $m_{0}$ and $\delta$ , using the triangle inequality we infer that

[TABLE]

For $t\in(0,1)$ we have

[TABLE]

So, by (5.3) and the assumption $\theta_{n}^{2}/\kappa_{n}\to 0$ for $t\in[\kappa_{n},1-\kappa_{n}]$ and sufficiently large $n$ the right hand side of (A.14) can be estimated by

[TABLE]

Hence, for $t\in[\kappa_{n},1-\kappa_{n}]$ and $n$ sufficiently large we have

[TABLE]

and the right hand side in (A.13) is majorized by

[TABLE]

which, in view of the assumption $(n\theta_{n}^{2})/\log\log n\to\infty$ , as $n\to\infty$ , and an application of the main result of Mason (1985), tends to 0. This concludes the proof of (A.12). $\Box$

**A.5. Proof of Theorem 2**

As previously, let $u_{n}(t),\;t\in(0,1)$ , be the uniform empirical process and set $v(t)=\sqrt{t(1-t)}$ . By (5.6) there exists $t_{0}\in(0,1)$ such that $\displaystyle|A^{*}(t_{0})|=\sup_{(0,1)}|A^{*}(t)|=m_{0}$ . We can write

[TABLE]

Proof of (i). Since

[TABLE]

then we proceed exactly in the same way as in the proof of (i) in Theorem 1.

Proof of (ii). The key step is to show that for $\delta=\delta_{\cal M}(A)$ appearing in (5.7)

[TABLE]

Indeed, having (A.15) and arguing as in the proof of (ii) of Theorem 1, we infer from (5.7) that for positive $w$

[TABLE]

Since $u_{n}\circ F_{n}$ converges in distribution to a Brownian bridge, (ii) follows.

To prove (A.15) let $(\theta_{n})$ be such that $n^{\varpi}\theta_{n}\to 0$ as $n\to\infty$ . Let $(\iota_{n})$ be a sequence such that $\iota_{n}\leq\log n$ and $\iota_{n}\to\infty$ as $n\to\infty$ . Let $U_{(1)}\leq...\leq U_{(n)}$ be order statistics of $n$ i.i.d. $U(0,1)$ random variables. Set

[TABLE]

Then, due to (5.6) and the assumption $n^{\varpi}\theta_{n}\to 0$ ,

[TABLE]

Now, by the definitions of $m_{0}$ and $\delta$ in (5.7) and (A.16), we infer in the same way as in (A.13) that

[TABLE]

On the event $\mathbb{E}_{n}$ , for $t\in(0,F^{-1}_{n}(U_{(1)}))$ and $n$ sufficiently large, by (5.6) and $n^{\varpi}\theta_{n}\to 0$ , it holds that

[TABLE]

The same estimate holds on $\mathbb{E}_{n}$ for $\;t\in(F^{-1}_{n}(U_{(n)}),1)$. On the other hand, on the event $\mathbb{E}_{n}$, for $t\in[F_{n}^{-1}(U_{(1)}),F_{n}^{-1}(U_{(n)})]$ and $n$ sufficiently large, by (A.14) and (5.6),

[TABLE]

provided that $\iota_{n}\to\infty$ is chosen in such a way that $(n\iota_{n})^{\varpi}\theta_{n}\to 0$. The relations (A.18) and (A.19) allow us to majorize the right hand side of (A.17) by

[TABLE]

In view of the assumption $(\log n\theta_{n}^{2})/\log\log n\to\infty$ as $n\to\infty$ , we have $\sqrt{\iota_{n}}/(\sqrt{n}\theta_{n})\to 0$ . An application of the main result of Mason (1985) concludes the proof of (ii).

By (i) and (ii), ${\cal S}_{n}-\theta_{n}\sqrt{n}m_{0}$ is bounded in $P_{\theta_{n}}$-probability and, in consequence, ${\cal S}_{n}/(\theta_{n}\sqrt{n}m_{0})\stackrel{{\scriptstyle P_{\theta_{n}}}}{{\longrightarrow}}1$. Hence, for $b_{\cal M}(P_{\theta_{n}}^{n})=\sqrt{\log(\theta_{n}\sqrt{n}m_{0})}$ it holds that ${\cal M}_{n}-b_{\cal M}(P_{\theta_{n}}^{n})\stackrel{{\scriptstyle P_{\theta_{n}}}}{{\longrightarrow}}0$. Therefore, for $\{P_{\theta_{n}}\}$ satisfying the assumptions of Theorem 2, we infer that ${\cal M}_{n}/b_{\cal M}(P_{\theta_{n}}^{n})\stackrel{{\scriptstyle P_{\theta_{n}}}}{{\longrightarrow}}1$. $\Box$
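The last step of this proof can be spelled out as follows. This is an added sketch resting on an assumption: it takes ${\cal M}_{n}=\sqrt{\log(1+{\cal S}_{n})}$, which is the relation suggested by the identity between the events $\{{\cal M}_{n}\geq w_{n}^{*}\}$ and $\{{\cal S}_{n}-\sqrt{n}\theta_{n}m_{0}\geq w\}$ used in A.7, and it writes $c_{n}=\theta_{n}\sqrt{n}m_{0}$ as shorthand. Then

$${\cal M}_{n}-\sqrt{\log c_{n}}=\frac{\log\bigl((1+{\cal S}_{n})/c_{n}\bigr)}{\sqrt{\log(1+{\cal S}_{n})}+\sqrt{\log c_{n}}}.$$

Since ${\cal S}_{n}/c_{n}\to 1$ in probability and $c_{n}\to\infty$, the numerator tends to 0 in probability while the denominator is at least $\sqrt{\log c_{n}}\to\infty$, so the difference tends to 0 in probability.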

**A.6. Proof of Theorem 3**

Let us start with a useful elementary result.

Lemma A.1. Let $\{T_{n}\}$ be a sequence of non-negative random variables defined on a probability space with a measure $P$ which can depend on $n$ . Moreover, let $\{\mu_{n}\}$ be a sequence of positive numbers tending to infinity as $n\to\infty$ .

Then the following conditions are equivalent:

(i) $\;T_{n}-\mu_{n}\stackrel{{\scriptstyle D}}{{\longrightarrow}}T$ ;

(ii) $\displaystyle\frac{T_{n}^{2}-\mu_{n}^{2}}{2\mu_{n}}\stackrel{{\scriptstyle D}}{{\longrightarrow}}T$ ;

and each of them implies $T_{n}/\mu_{n}\stackrel{{\scriptstyle P}}{{\longrightarrow}}1.$
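A brief sketch of why the two conditions are equivalent, supplied here for the reader's convenience and not taken from the paper, rests on two algebraic identities:

$$\frac{T_{n}^{2}-\mu_{n}^{2}}{2\mu_{n}}=(T_{n}-\mu_{n})+\frac{(T_{n}-\mu_{n})^{2}}{2\mu_{n}},\qquad T_{n}-\mu_{n}=\frac{T_{n}^{2}-\mu_{n}^{2}}{2\mu_{n}}\cdot\frac{2\mu_{n}}{T_{n}+\mu_{n}}.$$

Under (i), $T_{n}-\mu_{n}$ is bounded in probability, so the last term in the first identity tends to 0 in probability because $\mu_{n}\to\infty$, and (ii) follows by Slutsky's lemma. Under (ii), non-negativity of $T_{n}$ gives $0<2\mu_{n}/(T_{n}+\mu_{n})\leq 2$, hence $T_{n}-\mu_{n}$ is bounded in probability, $T_{n}/\mu_{n}\to 1$ in probability, the factor $2\mu_{n}/(T_{n}+\mu_{n})$ tends to 1 in probability, and (i) follows from the second identity.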

Set $T_{n}={\cal I}_{n}$ and $\mu_{n}=\sqrt{n}\theta_{n}||A^{*}||_{2}$ . Recall that under $P_{\theta_{n}}^{n}$ the empirical process $\sqrt{n}(\hat{F}_{n}(t)-t)$ has the same distribution as $u_{n}(F_{n}(t))+\sqrt{n}\theta_{n}A(t)$ , where $u_{n}(t)$ denotes the uniform empirical process. Hence

[TABLE]

where for $f\in D[0,1]$

[TABLE]

By the inequality $(a+b)^{r}\leq a^{r}+b^{r},\;a,b>0,\;0<r<1,$ we have for sufficiently large $n$

[TABLE]

Applying the above estimate to $r=2\ell$ and $r=\ell$, by (6.3) and the Lebesgue Dominated Convergence Theorem it follows that

[TABLE]

Moreover, for any sequence $f_{n}(t)\in D[0,1]$ converging in $D[0,1]$ to $f\in C[0,1]$ it holds $d_{n}(f_{n})\to d(f)$ and $l_{n}(f_{n})\to l(f)$. As the process $B(t)/(t(1-t))^{\ell}$ has continuous trajectories a.s., by Theorem 5.5 of Billingsley (1968) the above implies

[TABLE]

and

[TABLE]

The right hand side of (A.22) is a mean-zero Gaussian random variable with the variance $\rho^{2}_{A}$ . Using this, the assumption $n\theta_{n}^{2}\to\infty$ and (6.3) the proof follows from (A.20) - (A.22). $\Box$

**A.7. Proof of Theorem 5**

To prove (7.4) we shall exploit throughout the relation between ${\cal M}_{n}$ and ${\cal S}_{n}$, and the corresponding results for ${\cal S}_{n}$. Take any $w>0$ and define $w_{n}^{*}=\sqrt{\log(1+w+\theta_{n}\sqrt{n}m_{0})}$. Set $\alpha_{n}=P_{0}^{n}({\cal M}_{n}\geq w_{n}^{*})$. Since ${\cal M}_{n}$ has a continuous and increasing distribution function, $t_{\alpha_{n}n}=w_{n}^{*}$ is the critical value of ${\cal M}_{n}$ corresponding to the level $\alpha_{n}$.

By the assumption we have $(w_{n}^{*})^{2}/\log\log n\geq[\log\theta_{n}\sqrt{n}+\log m_{0}]/\log\log n\to\infty$ and Lemma 3 implies that

[TABLE]

This yields $\alpha_{n}\to 0$ and $-[\log\alpha_{n}]/n\to 0$ and means that $\{\alpha_{n}\}$ is an admissible significance level and the assumptions of Theorem 1 in Inglot et al. (2018) hold with $\gamma_{n}=\log\log n$ and $\lambda_{n}=n$ .

On the other hand, the power of ${\cal M}_{n}$ under $P_{\theta_{n}}^{n}$ equals $P_{\theta_{n}}^{n}({\cal M}_{n}\geq w_{n}^{*})=P_{\theta_{n}}^{n}({\cal S}_{n}-\sqrt{n}\theta_{n}m_{0}\geq w)$. By Theorem 2 we have

[TABLE]

This proves that under $\{\alpha_{n}\}$ the test ${\cal M}_{n}$ has non-degenerate asymptotic power. By Proposition 1 of the present contribution and Theorem 1 in Inglot et al. (2018) the proof of (7.4) is concluded. $\Box$

**A.8. Proof of Lemma 7**

By (A.14) and the assumption $\theta_{n}^{2}m_{n}^{2}/\kappa_{n}\to 0$ we have for $t\in[\kappa_{n},1-\kappa_{n}]$

[TABLE]

From Chebyshev’s inequality and (A.23) we get for arbitrary $\epsilon\in(0,1)$

[TABLE]

where $t_{n}$ is defined in (9.1). On the other hand, since $F_{n}(\kappa_{n})\sim\kappa_{n},\;F_{n}(1-\kappa_{n})\sim 1-\kappa_{n}$, by the triangle inequality for sufficiently large $n$ we have

[TABLE]

where $a_{n}=2\log(\log(2/\kappa_{n}-1))$ is the normalizing sequence in the Darling-Erdős theorem. The last expression tends to 0 due to the assumption $n\theta_{n}^{2}/\log\log(1/\kappa_{n})\to\infty$ and the theorem of Jaeschke (1979). Combining (A.24) and (A.25) we obtain (9.2). $\Box$

Acknowledgements. The work of B. Ćmiel was partially supported by the Faculty of Applied Mathematics AGH UST dean grant for PhD students and young researchers within subsidy of Ministry of Science and Higher Education.

Bibliography


[1] Adler, R. J. (1990). An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes. Institute of Mathematical Statistics. Lecture Notes-Monograph Series. Vol. 12, Hayward, California.
[2] Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann. Math. Statist. 23 193-212.
[3] Billingsley, P. (1968). Convergence of Probability Measures. Wiley.
[4] Borovkov, A. A. and Sycheva, N. M. (1968). On asymptotically optimal non-parametric criteria. Theory Probab. Appl. 13 359-393.
[5] Borovkov, A. A. and Sycheva, N. M. (1970). On asymptotically optimal nonparametric criteria. In Nonparametric Techniques in Statistical Inference (M. L. Puri, ed.) 259-266. Cambridge Univ. Press.
[6] Cai, T. T., Jeng, J. and Jin, J. (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. J. R. Statist. Soc. B 73 629-662.
[7] Chicheportiche, R. and Bouchaud, J.-P. (2012). Weighted Kolmogorov-Smirnov test: Accounting for the tails. Physical Review E 86 041115-1 - 041115-6.
[8] Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quant. Fin. 1 223-236.