Weak Convergence of Stationary Empirical Processes
Dragan Radulovic, Marten Wegkamp

TL;DR
This paper extends the weak convergence results of empirical processes to more general, function-indexed processes under broad dependence conditions, providing a unified framework for stationary empirical processes.
Contribution
It introduces a general weak convergence result for empirical processes indexed by functions of bounded variation, applicable under alpha mixing dependence.
Findings
Weak convergence established for processes indexed by functions of bounded variation.
Applicable to stationary sequences with alpha mixing dependence.
Extends classical empirical process results to more general settings.
Abstract
We offer an umbrella type result which extends weak convergence of the classical empirical process on the line to that of more general processes indexed by functions of bounded variation. This extension is not contingent on the type of dependence of the underlying sequence of random variables. As a consequence we establish weak convergence for stationary empirical processes indexed by general classes of functions under alpha mixing conditions.
Weak Convergence of Stationary Empirical Processes
Dragan Radulović
Department of Mathematics, Florida Atlantic University
Marten Wegkamp
Department of Mathematics & Department of Statistical Science, Cornell University
Abstract
We offer an umbrella type result which extends weak convergence of classical empirical processes on the line to that of more general processes indexed by functions of bounded variation. This extension is not contingent on the type of dependence of the underlying sequence of random variables. As a consequence we establish weak convergence for stationary empirical processes indexed by general classes of functions under $\alpha$-mixing conditions.
Running title: Weak convergence of stationary empirical processes
MSC2000 Subject classification: Primary 60F17; secondary 60G99.
Keywords and phrases: Integration by parts, bounded variation, empirical processes, stationary distributions, weak convergence.
1 Introduction
We consider the empirical process
[TABLE]
indexed by some class of functions . It is an obvious generalization of the classical process
[TABLE]
in which case the indexing class is . If the underlying sequence is i.i.d., then the limiting behavior (as ) of the empirical processes is well understood. The same is true for the bootstrap counterpart based on an i.i.d. bootstrap sample , see, for instance, Van der Vaart and Wellner (1996) and Dudley (1999). The theory of weak convergence of empirical processes based on independent sequences has yielded a wealth of statistical applications and, in particular, it was instrumental for establishing the weak convergence of numerous novel statistics. Often the limiting distributions of these statistics do not allow for closed form solutions, in which case the bootstrap version of the process is utilized. As is standard practice nowadays, see the thorough exposition in the first chapter of Van der Vaart and Wellner (1996), weak convergence in this paper should be understood in the sense of Hoffmann-Jørgensen and expectations should be interpreted as outer expectations.
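For concreteness, the two processes introduced above can be written in standard form. The symbols $\mathbb{G}_n$, $\mathbb{F}_n$, $F$ and $\mathcal{G}$ below are labels assumed for this sketch (with $F$ denoting the stationary distribution function of the $X_i$); they need not match the notation used elsewhere in the paper.

```latex
\[
  \mathbb{G}_n(g) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n}
    \bigl\{ g(X_i) - \mathbb{E}\, g(X_i) \bigr\}, \qquad g \in \mathcal{G},
\]
\[
  \mathbb{F}_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n}
    \bigl\{ \mathbf{1}\{X_i \le t\} - F(t) \bigr\}, \qquad t \in \mathbb{R}.
\]
```

The second display is the first one restricted to the indicator functions $g_t = \mathbf{1}_{(-\infty, t]}$, $t \in \mathbb{R}$.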
For empirical processes based on stationary sequences the situation is rather different. The classical process has been treated by numerous authors, who established weak convergence under sharp mixing assumptions, see, for instance, Rio (2000). However, nothing similar exists for more general processes. The only work that we could find in the literature, Andrews and Pollard (1994), treats more general indexing classes, but imposes rather restrictive assumptions on the decay of $\alpha$-mixing coefficients. This discrepancy between the conditions needed for and those for is due to the fact that the typical approach for proving uniform limit theorems relies heavily on the estimation of entropy numbers, which in turn requires good exponential maximal inequalities. Only $\beta$-mixing, via decoupling, allows for such an estimate. This is the reason why one can find in the literature the treatment of only for $\beta$-mixing sequences, see Arcones and Yu (1994) and Doukhan, Massart and Rio (1995).
The situation for the bootstrap of stationary empirical processes is even worse. Although the bootstrap for stationary data was introduced more than twenty years ago, we found only three papers that study it for stationary empirical processes indexed by general classes , see Künsch (1989), Liu and Singh (1992) and Sengupta, Volgushev and Shao (2016). All these results operate under $\beta$-mixing assumptions. Moreover, only in the case of VC-classes do we have sharp conditions, see Radulović (1996). Bracketing classes were considered by Bühlmann (1995), but this was established under restrictive assumptions on both the $\beta$-mixing coefficients and the bracketing numbers. It is worth mentioning that the covariance function of the limiting Gaussian process is unknown, and consequently in most actual applications of these results, we rely heavily on the bootstrap version of the process, for which adequate results are sorely lacking. In short, the most general $\alpha$-mixing sequences have never been considered for stationary bootstrap processes , while for the non-bootstrap version we have only one example in the literature.
In what follows we prove two general results that allow us to extend weak convergence of to that of , where is a class of functions of uniformly bounded total variation (called in this paper). We would like to point out that although there are examples of Donsker classes of infinite variation (for instance, with , , such cases are rather the exception than the norm. The majority of examples of bounded Donsker classes that are given in the literature are subsets of the class .
This enlargement from to is not contingent on the dependence structure of the underlying sequence (or ) and the only requirement is that converges weakly to a Gaussian process. This allows us to derive weak convergence of and for $\alpha$-mixing sequences. The same extension applies for the short memory causal processes considered in Doukhan and Surgailis (1998), as well as processes treated in Dehling, Durieu and Volny (2009). The technique we employ is a simple application of the integration-by-parts formula, followed by the continuous mapping theorem. Arguably this approach may have been known before, although we could not find any communication of it in the literature, and certainly not among the research related to stationary empirical processes. A follow-up paper (Radulović, Wegkamp and Zhao (2017)) extends the results obtained in this work to classes indexed by multivariate functions of bounded variation, with an emphasis on empirical copula processes. An important technical difference with the follow-up paper is that here we allow for general stationary distribution functions of the underlying . The proof for general, non-continuous processes is not trivial, owing to technical complications related to the interplay between the atoms of the limiting process and the discontinuities of .
The paper is organized as follows. Section 2 contains the statement of the main result (Theorem 1) and related discussions, while Section 3 presents the bootstrap version (Theorem 6) of this result. The proofs of these two main results are collected in Section 4. For completeness, the well-known integration by parts formula can be found in the appendix.
2 Main results
For the total variation norm of a function , we use the notation
[TABLE]
The supremum is taken over all countable partitions of . We set, for ,
[TABLE]
We let be the class of all functions in that are right-continuous. Finally, we let be an arbitrary stochastic process such that
- A1: ;
- A2: The sample paths of are right-continuous and of bounded variation.
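As a numerical illustration of the total variation norm defined above, the sketch below approximates the total variation of a function over a compact interval by summing absolute increments over a fine equispaced partition. The function name `total_variation_on_grid`, the intervals and the grid size are choices made for this example only; a finite grid yields a lower bound on the supremum over all partitions, which is tight for the two simple functions used here.

```python
import numpy as np

def total_variation_on_grid(g, a, b, num_points=100_001):
    """Sum of |g(x_i) - g(x_{i-1})| over an equispaced partition of [a, b]."""
    grid = np.linspace(a, b, num_points)
    return float(np.sum(np.abs(np.diff(g(grid)))))

# sin rises by 1, falls by 2 and rises by 1 again on [0, 2*pi].
tv_sin = total_variation_on_grid(np.sin, 0.0, 2.0 * np.pi)

# A right-continuous indicator with a single unit jump at 0.3.
tv_step = total_variation_on_grid(lambda x: (x >= 0.3).astype(float), 0.0, 1.0)
```

Both functions are bounded with finite total variation, so they would belong to a class of uniformly bounded variation for a suitable constant; the indicator is also right-continuous.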
Clearly, both requirements A1 and A2 are met for the canonical empirical process . In this paper, we study the limit distribution (as ) of the sequence of processes
[TABLE]
for some class , for some finite .
Although the motivation as well as the most notable applications of our results are related to the canonical case , the actual proof carries over for any process as long as assumptions A1 and A2 are satisfied.
Our main result is the following theorem.
Theorem 1
*Assume that converges weakly, as , to a Gaussian process , that has uniformly continuous sample paths with respect to the distance for some distribution function .
Then, for any and , converges weakly, as , to a Gaussian process in , that has uniformly -continuous sample paths.*
Remark.
Usually, an envelope condition and is imposed. Since such a condition, coupled with , implies that the functions are uniformly bounded, we assume in our definition of .
Theorem 1 allows us to derive weak convergence of via , regardless of the structure of the latter process. For instance, taking as the standard empirical processes based on stationary sequences , we obtain the following corollary as an immediate consequence of Theorem 1.
Corollary 2
Let be a stationary sequence of random variables with distribution and $\alpha$-mixing coefficients satisfying , for some and all . Then
- for any $\mathcal{G}\subseteq BV_T'$, converges weakly to a Gaussian process with uniformly -continuous sample paths in ;
- for continuous , $BV_T'$ can be enlarged to $BV_T$.
Proof. It is well known, see Theorem 7.2, page 96 in Rio (2000), that with , implies that the standard empirical process converges weakly to a Brownian bridge process with uniformly continuous paths with respect to the distance , for the stationary distribution of . The result for $\mathcal{G}\subseteq BV_T'$ now follows trivially from Theorem 1. If is continuous, then the limiting Brownian bridge has uniformly continuous sample paths with respect to the Lebesgue measure on . This, combined with Lemma 3 below, implies Corollary 2.
We used the following lemma.
Lemma 3
We have, for based on the canonical process ,
[TABLE]
Proof. Let be an arbitrary function in . We denote its countably many discontinuities by . Let be the right-continuous version of , that is, for all and for all . Then
[TABLE]
and the conclusion follows.
We would like to point out that $\alpha$-mixing is the least restrictive of the mixing assumptions available in the literature. To the best of our knowledge, there are actually very few results that treat processes indexed by functions, and they all require very stringent conditions on the entropy numbers of and on the rate of decay for . See, for instance, Andrews and Pollard (1994). This is due to the fact that $\alpha$-mixing does not allow for sharp exponential inequalities for partial sums. Consequently, the only known cases for which we have sharp conditions are under the more restrictive $\beta$-mixing dependence. Indeed, $\beta$-mixing allows for decoupling and it does yield exponential inequalities not unlike the i.i.d. case. The current state-of-the-art results, Arcones and Yu (1994), Doukhan, Massart and Rio (1995), applied to bounded sequences, require .
While it is correct that $\alpha$-mixing is the weakest of the known mixing concepts, a referee pointed out that functionals of mixing processes constitute a much weaker concept of short range dependence. For instance, Billingsley (1968, Theorem 22.2) establishes the empirical process CLT for certain functionals of $\varphi$-mixing processes. The same referee made us aware that Theorem 1 also applies to the empirical process of long-range dependent data, established by Dehling and Taqqu (1989) for subordinate Gaussian processes, and by Ho and Hsing (1996) for linear processes. We provide an example to demonstrate that Theorem 1 goes beyond dependence defined via mixing conditions. It applies to short memory causal linear sequences that are defined by based on i.i.d. random variables and constants . While the form a stationary sequence, they do not necessarily satisfy any mixing condition. Weak convergence of the empirical processes was established under sharp conditions in Doukhan and Surgailis (1998) on the weights ( for some ), characteristic function of ( for all , some and ), and a moment condition on (). To the best of our knowledge, there are no extensions to the more general processes . Theorem 1 and the Doukhan and Surgailis (1998) result combined imply the following:
Corollary 4
Let be such that conditions of Doukhan and Surgailis (1998, pp 87–88) are satisfied and let be the stationary distribution of . Then, for any , converges weakly to a Gaussian process with uniformly -continuous sample paths in .
Proof. Doukhan and Surgailis (1998) proved that converges weakly to a Gaussian process in the Skorohod space. Since is continuous under their assumptions, see Doukhan and Surgailis (1998, p. 88), the sample paths of the limiting process of are uniformly continuous with respect to the distance based on the stationary distribution . The proof now follows trivially from Theorem 1 and Lemma 3.
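To make the causal linear sequences of Corollary 4 concrete, the sketch below simulates a truncated moving average $X_i = \sum_{j=0}^{J} a_j \xi_{i-j}$ driven by i.i.d. innovations. The symbols $a_j$ and $\xi_i$, the polynomially decaying weights, the Gaussian innovations and the truncation level $J$ are all illustrative assumptions; they are not claimed to satisfy the conditions of Doukhan and Surgailis (1998).

```python
import numpy as np

def causal_linear_sequence(n, weights, rng):
    """Return X_0..X_{n-1} with X_i = sum_{j=0}^{J} weights[j] * xi_{i-j}, plus the innovations."""
    J = len(weights) - 1
    xi = rng.standard_normal(n + J)          # i.i.d. innovations (assumed Gaussian here)
    # In 'valid' mode, entry k equals sum_j weights[j] * xi[k + J - j].
    x = np.convolve(xi, weights, mode="valid")
    return x, xi

rng = np.random.default_rng(0)
weights = (1.0 + np.arange(200)) ** -2.0     # hypothetical summable weights a_j = (j + 1)^(-2)
x, xi = causal_linear_sequence(1_000, weights, rng)
```

Each `x[k]` depends only on current and past innovations, which is what "causal" refers to; the sequence is stationary because the weights do not depend on `k`.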
The recent papers by Dehling, Durieu and Volny (2009) and Dehling, Durieu and Tusche (2014) offer yet another, clever way to prove the weak limit of the standard empirical processes based on stationary sequences that are not necessarily mixing. Their technique uses finite dimensional convergence coupled with a bound on the higher moments of partial sums, which in turn controls the dependence structure. Dehling, Durieu and Volny (2009) establish the weak convergence of , while Dehling, Durieu and Tusche (2014) extend this idea to indexed by more general classes of functions. However, these authors impose cumbersome entropy conditions and only manage to extend the classes marginally. For example, they prove weak convergence of the process , indexed by functions which constitute a one-dimensional monotone class (under the restrictive requirement that ). Theorem 1, applied in their setting, yields a more general result.
Corollary 5
Let and let be the stationary distribution of the sequence . Under assumptions (i) and (ii) in Section 1 of Dehling, Durieu and Volny (2009), converges weakly to a Gaussian process with uniformly -continuous sample paths in .
Proof. The underlying distribution function of in Dehling, Durieu and Volny (2009) is continuous, see their display (3) on page 3702. Moreover, these authors establish weak convergence of to a Gaussian process that has uniformly continuous sample paths with respect to the distance . The proof follows trivially from Theorem 1 and Lemma 3.
3 Bootstrap
The weak limit of based on in Theorem 1 is a Gaussian process with complicated covariance structure
[TABLE]
for . A closed form solution is seldom available, so that actual applications of weak limit results of are hard to implement. This situation calls for the bootstrap principle. Given the sample of first observations , we let be the bootstrap empirical process
[TABLE]
based on a bootstrap sample . We stress that no additional assumption on the structure of the variables is required for our purposes. Analogous to , we define for any with . We recall that the bounded Lipschitz distance
[TABLE]
between two processes and metrizes weak convergence. The symbol stands for the expectation over the randomness of the bootstrap sample , conditionally given the original sample , while denotes the space of Lipschitz functionals with and for all . As is customary in the literature, we speak of weak convergence in probability if the random variable converges to zero in probability; if it converges to zero almost surely, we speak of weak convergence almost surely.
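In the notation of Van der Vaart and Wellner (1996), the bounded Lipschitz distance described above takes the following standard form. The symbols $X_n^*$, $X$, $\mathbb{E}^*$ and $BL_1$ are labels assumed for this sketch, and the norm on the path space is taken to be the supremum norm.

```latex
\[
  d_{BL}(X_n^*, X) = \sup_{h \in BL_1}
    \bigl| \mathbb{E}^* h(X_n^*) - \mathbb{E}\, h(X) \bigr|,
\]
\[
  BL_1 = \Bigl\{ h : \ \|h\|_\infty \le 1, \ \
    |h(x) - h(y)| \le \|x - y\|_\infty \ \text{for all } x, y \Bigr\}.
\]
```

Here $\mathbb{E}^*$ is the expectation over the bootstrap randomness given the original sample, so that $d_{BL}(X_n^*, X)$ is itself a random variable; convergence of this random variable to zero in probability or almost surely gives the two modes of conditional weak convergence.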
Theorem 6
Let . Assume that, conditionally on , converges weakly to a Gaussian process, in probability, that has uniformly continuous sample paths with respect to the distance based on the stationary distribution of . The following three statements hold true:
1. converges weakly to a Gaussian process with uniformly -continuous sample paths in .
2. If the weak convergence of holds almost surely, then the conclusion that converges weakly in holds almost surely as well.
3. If and converge to the same limit, then so do and .
The literature offers numerous bootstrapping techniques for stationary data, such as the moving block bootstrap, stationary bootstrap, sieved bootstrap and Markov chain bootstrap, to name a few, but their validity is proved for specific cases/statistics only. Due to complications with entropy calculations for dependent triangular arrays, almost all results treat the standard empirical processes, with a few notable exceptions. The moving block bootstrap was justified for VC-type classes, but only under rather restrictive $\beta$-mixing conditions on (Radulović, 1996). Bracketing classes were considered by Bühlmann (1995), but his conditions are even more restrictive. In contrast, the process is rather easy to bootstrap. This, coupled with Theorem 6, offers the following result.
Corollary 7
Let be a stationary sequence of random variables with continuous stationary distribution function and $\alpha$-mixing coefficients satisfying , for some and . Let be the bootstrapped standard empirical process based on the moving block bootstrap, with block sizes , specified in Peligrad (1998, page 882). Then, for , converges weakly to a Gaussian process, almost surely, with uniformly -continuous sample paths.
Proof. Theorem 2.3 of Peligrad (1998) establishes the convergence of to a Gaussian process with uniformly continuous sample paths with respect to the distance based on the stationary distribution of . Invoke Theorem 6 and Lemma 3 to conclude the proof.
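The moving block bootstrap used in Corollary 7 can be sketched as follows. This is a minimal illustration under stated assumptions: the block length is passed in by the user rather than chosen as in Peligrad (1998), the function names are hypothetical, and the bootstrap process is centred at the sample distribution function for simplicity (centring at the conditional bootstrap expectation is the other common choice).

```python
import numpy as np

def moving_block_bootstrap(x, block_len, rng):
    """Resample x by concatenating randomly chosen overlapping blocks of length block_len."""
    n = len(x)
    num_blocks = -(-n // block_len)                          # ceil(n / block_len)
    starts = rng.integers(0, n - block_len + 1, size=num_blocks)
    idx = (starts[:, None] + np.arange(block_len)).ravel()[:n]
    return x[idx]

def bootstrap_empirical_process(x, x_star, t):
    """sqrt(n) * (F*_n(t) - F_n(t)) evaluated on the grid t."""
    n = len(x)
    F_n = np.mean(x[:, None] <= t, axis=0)                   # sample distribution function
    F_star = np.mean(x_star[:, None] <= t, axis=0)           # bootstrap distribution function
    return np.sqrt(n) * (F_star - F_n)

rng = np.random.default_rng(1)
x = np.arange(20.0)                                          # toy data, so blocks are easy to inspect
x_star = moving_block_bootstrap(x, block_len=5, rng=rng)
z_star = bootstrap_empirical_process(x, x_star, np.linspace(0.0, 19.0, 5))
```

Resampling whole blocks, rather than single observations, preserves the dependence between neighbouring observations within each block, which is the reason block schemes are used for stationary data.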
Just as for weak convergence of the empirical process based on stationary sequences, there are numerous results that consider bootstrap for stationary, non-mixing sequences. For example, El Ktaibi, Ivanoff and Weber (2014) study short memory causal linear sequences, and prove weak convergence of under conditions (on the growth of the weights , the characteristic function of and moments of ) akin to the ones required for its non-bootstrap counterpart (Doukhan and Surgailis, 1998). Again, Theorem 6 easily extends their result.
Corollary 8
Let be a sequence of random variables with stationary distribution such that conditions of El Ktaibi, Ivanoff and Weber (2014) are satisfied. Then, for any , converges weakly to a Gaussian limit with uniformly -continuous sample paths, almost surely.
Proof. El Ktaibi, Ivanoff and Weber (2014) establish weak convergence of in the Skorohod space, for continuous which in turn implies that the limiting process has uniformly continuous sample paths with respect to distance . Again, invoke Theorem 6 and Lemma 3 to conclude the proof.
4 Proofs for Theorem 1 and Theorem 6
We first give a short proof of Theorem 1 if the limit of has continuous sample paths with respect to the Lebesgue measure. Let be the bounded Lipschitz metric that metrizes weak convergence, see, e.g., Van der Vaart and Wellner (1996, page 73) for the definition. Set for any . Assumptions A1 and A2 imply that the Lebesgue-Stieltjes integrals
[TABLE]
are well defined. Next, by the integration by parts formula in Lemma A, we have
[TABLE]
with
[TABLE]
in probability, as , since has uniformly continuous sample paths. For fixed , converges weakly to by the continuous mapping theorem and weak convergence of . The continuous mapping theorem also guarantees that the limit is tight in as the map , , is continuous. By the triangle inequality,
[TABLE]
The second inequality follows since the map is linear and Lipschitz with Lipschitz constant and the suprema are taken over all Lipschitz functionals with and and with and , respectively. Together with the tightness of the limit , this implies that the empirical process indexed by converges weakly.
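The integration-by-parts step at the heart of this argument can be checked numerically. The sketch below, a minimal illustration under stated assumptions, takes a stationary sequence with Uniform(0,1) marginals (a Gaussian AR(1) sequence pushed through the normal distribution function, chosen here only as an example), writes $Z_n(t) = \sqrt{n}\,(F_n(t) - t)$ for the classical empirical process on $[0,1]$, and compares $\int g\, dZ_n$ computed directly with $-\int Z_n\, dg$ for the continuous function $g(t) = t^2$. All function names are hypothetical.

```python
from math import erf, sqrt
import numpy as np

def ar1_uniform(n, rho, rng):
    """Stationary Gaussian AR(1) sequence mapped to Uniform(0,1) marginals via the normal cdf."""
    y = np.empty(n)
    y[0] = rng.standard_normal()                         # start in the stationary law N(0, 1)
    eps = rng.standard_normal(n) * sqrt(1.0 - rho**2)    # innovations keeping the variance at 1
    for i in range(1, n):
        y[i] = rho * y[i - 1] + eps[i]
    return np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in y])

def direct(x, g, mean_g):
    """int g dZ_n = sqrt(n) * (average of g(X_i) minus the stationary mean of g)."""
    return sqrt(len(x)) * (np.mean(g(x)) - mean_g)

def by_parts(x, g, int_t_dg):
    """-int Z_n dg on [0, 1]; the boundary terms vanish because Z_n(0) = Z_n(1) = 0."""
    n = len(x)
    knots = np.concatenate(([0.0], np.sort(x), [1.0]))
    levels = np.arange(n + 1) / n                        # value of F_n on [knots[k], knots[k+1])
    int_Fn_dg = np.sum(levels * np.diff(g(knots)))       # exact, since F_n is piecewise constant
    return -sqrt(n) * (int_Fn_dg - int_t_dg)

rng = np.random.default_rng(0)
x = ar1_uniform(500, rho=0.6, rng=rng)
g = lambda t: t**2                                       # continuous, of bounded variation on [0, 1]
lhs = direct(x, g, mean_g=1.0 / 3.0)                     # E g(U) = 1/3 for U uniform on [0, 1]
rhs = by_parts(x, g, int_t_dg=2.0 / 3.0)                 # int_0^1 t dg(t) = int_0^1 2 t^2 dt = 2/3
```

The two values agree up to floating-point error for every sample path, whatever the dependence between the observations; the dependence structure only enters through the weak limit of $Z_n$.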
The above proof for continuous limit processes (that is, processes with uniformly continuous sample paths) is rather simple. Nevertheless, we could not find an actual publication of this trick. We would like to stress that the extension of this proof to general, non-continuous processes, is not trivial. Technical complications related to the interplay between the atoms of the limiting process and discontinuities of require some care.
Lemma 9
Assume that converges weakly to a Gaussian process , as . Then, for any right continuous function of bounded variation = is well defined, and converges to a normal distribution on
Proof. Let be an arbitrary right-continuous function of bounded variation. First, we notice that and are indeed well defined as Lebesgue-Stieltjes integrals. Recall that can have only countably many discontinuities which we will denote By the integration by parts formula, Lemma A in the appendix, we have
[TABLE]
with operators
[TABLE]
Since has finite variation, it is bounded with for the jumps . Hence, the operator , given by
[TABLE]
is linear and continuous. We conclude the proof by observing that the continuity of the operators and and the weak convergence of to a Gaussian process ensures that these sequences converge weakly to a normal distribution via the continuous mapping theorem.
For any distribution function and any , we define
[TABLE]
with the convention that the supremum taken over the empty set is equal to zero.
Clearly, for all if is continuous. In general, for arbitrary , the quantity is bounded in probability, for all , by the continuous mapping theorem, as long as converges weakly. The following lemma plays an instrumental role and it could be perhaps of independent interest.
Lemma 10 (Decoupling Lemma)
For any distribution function on , any right-continuous function and any , we have
[TABLE]
Here and satisfies assumptions A1 and A2.
Proof. Without loss of generality we can assume that Since is a distribution function we can construct, for any a finite grid such that
[TABLE]
and
[TABLE]
leaving the possible jumps unspecified. Based on this grid we approximate by
[TABLE]
and we set We observe that by construction
[TABLE]
Since the process inherits the bounded variation properties of
[TABLE]
is well defined for any right-continuous function of bounded variation. Using the integration by parts formula (Lemma A in the appendix) we obtain for the last term on the right
[TABLE]
Since is of bounded variation, it has countably many discontinuities . Using (3), we obtain
[TABLE]
Consequently,
[TABLE]
Next we deal with the finite dimensional approximation
[TABLE]
Clearly,
[TABLE]
For the first term in (4) we introduce the step function
[TABLE]
designed to approximate Clearly and
[TABLE]
Hence, by (3) we have
[TABLE]
For the second term in (4), we have
[TABLE]
using for the last inequality
[TABLE]
and
[TABLE]
Lemma 10 now follows by combining the estimates (3), (4) and (5).
An immediate corollary is the following result.
Corollary 11
For any distribution function and for all , and , we have
[TABLE]
where the is taken over all right-continuous functions with and .
Proof. The proof follows trivially from Lemma 10 by taking and observing that for
Proof of Theorem 1.
First we recall, see, for instance, Chapter 1.5 in Van der Vaart and Wellner (1996), that , converges weakly to a tight limit in , provided
- (a) the marginals converge weakly for every finite subset , and
- (b) there exists a semi-metric on such that is totally bounded and is -stochastically equicontinuous, that is,
[TABLE]
for all .
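For reference, the asymptotic equicontinuity requirement in (b) is usually stated as follows (see Section 1.5 of Van der Vaart and Wellner, 1996). The symbols $X_n$ for the generic sequence of processes, $\rho$ for the semi-metric, $\mathcal{G}$ for the index class and $\mathbb{P}^*$ for outer probability are labels assumed for this sketch.

```latex
\[
  \lim_{\delta \downarrow 0} \ \limsup_{n \to \infty} \
  \mathbb{P}^*\Bigl( \sup_{f, g \in \mathcal{G} :\, \rho(f, g) < \delta}
    \bigl| X_n(f) - X_n(g) \bigr| > \varepsilon \Bigr) = 0
  \qquad \text{for all } \varepsilon > 0 .
\]
```

Because the processes considered here are linear in the index function, the increment $X_n(f) - X_n(g)$ equals $X_n(f - g)$, so it suffices to control the process over differences of functions with small $\rho$-norm.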
The finite dimensional convergence (a) follows trivially from Lemma 9, linearity of the process and the Cramér-Wold device. As for the stochastic equicontinuity (b) of , it is sufficient to show that, for every
[TABLE]
where the supremum is taken over all differences with and Since is also right-continuous and , Corollary 11 implies that
[TABLE]
Let so that and
[TABLE]
and observe that
[TABLE]
The weak convergence of , or equivalently, , to a continuous (with respect to ) process implies that
[TABLE]
Moreover, the weak convergence of implies that is bounded in probability, so as and
Summarizing, converges for each to a Gaussian limit and is uniformly -equicontinuous, in probability. Moreover, is totally bounded (for any distribution ). This well-known fact can be found in Example 2.6.21 in Van der Vaart and Wellner (1996, page 149). Together, these facts imply the weak convergence of to a process with uniformly continuous sample paths (see Theorems 1.5.4 and 1.5.7 in Van der Vaart and Wellner 1996).
Proof of Theorem 6.
Here we only prove the “in probability” case. The almost sure statement follows after straightforward changes in the proof of Theorem 1. A simple modification of Lemma 9 yields
[TABLE]
for the same operators and as defined in Lemma 9. Since converges to a Gaussian limit the finite dimensional convergence follows by repeating the computation presented in the proof of Lemma 9 after replacing with . As for stochastic equicontinuity of we find, analogous to Corollary 11, that for
[TABLE]
Now the weak convergence of follows from the convergence of , as in the proof of Theorem 1, conditionally given the sample . Moreover, if and converge to the same Gaussian process, then Lemma 9 coupled with the Cramér-Wold device implies that the finite dimensional distributions of the limiting processes of and are the same. This concludes the proof.
Appendix A
For completeness, we state the following classical result and give a simple elementary proof which was communicated to us by David Pollard. Theorem 18.4 in Billingsley (1986) states the result for functions on a bounded interval, yet its proof can be easily extended to the entire real line.
Lemma A. Let and be right-continuous functions of bounded variation and define measures and as and . Then
[TABLE]
Moreover, if either or , then
[TABLE]
Proof. Set and observe that by the very definition of the Lebesgue integral
[TABLE]
and
[TABLE]
Hence
[TABLE]
Next we apply Fubini
[TABLE]
to prove the lemma.
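For orientation, the classical integration by parts formula for Lebesgue-Stieltjes integrals on a bounded interval reads as follows. The symbols $f$, $g$, $\mu_f$ and $\mu_g$ are labels assumed for this sketch: $f$ and $g$ are right-continuous functions of bounded variation, $\mu_f((s,t]) = f(t) - f(s)$, $\mu_g((s,t]) = g(t) - g(s)$, and $g(x-)$ denotes the left limit of $g$ at $x$.

```latex
\[
  \int_{(a,b]} f(x)\, d\mu_g(x) + \int_{(a,b]} g(x-)\, d\mu_f(x)
    = f(b)\, g(b) - f(a)\, g(a),
\]
\[
  \int_{(a,b]} f\, d\mu_g + \int_{(a,b]} g\, d\mu_f
    = f(b)\, g(b) - f(a)\, g(a)
      + \sum_{a < x \le b} \Delta f(x)\, \Delta g(x),
  \qquad \Delta f(x) = f(x) - f(x-).
\]
```

Both identities follow from Fubini's theorem applied to the product measure $\mu_f \times \mu_g$ on $(a,b]^2$, split along the diagonal. The correction term in the second display vanishes when $f$ and $g$ have no common discontinuities, which is why atoms shared by the two functions need separate treatment.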
Acknowledgements.
We would like to thank both referees for their careful reading and many constructive remarks.
The research of Wegkamp was supported in part by NSF grants DMS-1310119 and DMS-1712709.
- [1] D.W.K. Andrews and D.P. Pollard (1994). An Introduction to Functional Central Limit Theorems for Dependent Stochastic Processes. International Statistical Review, 62, 119–132.
- [2] M.A. Arcones and B. Yu (1994). Central limit theorems for empirical and U-processes of stationary mixing sequences. Journal of Theoretical Probability, 7, 47–71.
- [3] P. Billingsley (1968). Convergence of Probability Measures. J. Wiley & Sons, New York.
- [4] P. Billingsley (1986). Probability and Measure, 2nd edition. J. Wiley & Sons, New York.
- [5] P. Bühlmann (1995). The blockwise bootstrap for general empirical processes of stationary sequences. Stochastic Processes and their Applications, 58, 247–265.
- [6] R.M. Dudley (1999). Uniform Central Limit Theorems. Cambridge University Press.
- [7] H. Dehling, O. Durieu and D. Volny (2009). New techniques for empirical processes of dependent data. Stochastic Processes and their Applications, 119, 3699–3718.
- [8] H. Dehling, O. Durieu and M. Tusche (2014). Approximating class approach for empirical processes of dependent sequences indexed by functions. Bernoulli, 20, 1372–1403.
