Functional central limit theorems for conditional Poisson sampling
Leo Pasquazzi

| 0.841 | 0.916 | 0.983 |
| (0.4233; 0.4541) | (0.4755; 0.5121) | (0.5779; 0.6481) |
| 0.866 | 0.913 | 0.990 |
| (0.3022; 0.3185) | (0.3378; 0.3610) | (0.4079; 0.4662) |
| 0.877 | 0.937 | 0.991 |
| (0.3102; 0.3264) | (0.3470; 0.3676) | (0.4198; 0.4656) |
| 0.875 | 0.928 | 0.981 |
| (0.2190; 0.2310) | (0.2444; 0.2602) | (0.2942; 0.3296) |
| 0.873 | 0.938 | 0.993 |
| (0.2247; 0.2362) | (0.2509; 0.2669) | (0.3019; 0.3374) |
| 0.885 | 0.949 | 0.991 |
| (0.1574; 0.1666) | (0.1755; 0.1903) | (0.2110; 0.2384) |
| 0.860 | 0.928 | 0.991 |
| (0.4319; 0.4562) | (0.4838; 0.5100) | (0.5864; 0.6480) |
| 0.875 | 0.925 | 0.989 |
| (0.3062; 0.3203) | (0.3418; 0.3652) | (0.4127; 0.4523) |
| 0.887 | 0.940 | 0.993 |
| (0.3145; 0.3344) | (0.3514; 0.3785) | (0.4243; 0.4737) |
| 0.880 | 0.935 | 0.982 |
| (0.2212; 0.2332) | (0.2465; 0.2623) | (0.2962; 0.3313) |
| 0.878 | 0.944 | 0.994 |
| (0.2269; 0.2416) | (0.2529; 0.2686) | (0.3049; 0.3387) |
| 0.886 | 0.948 | 0.994 |
| (0.1585; 0.1684) | (0.1765; 0.1899) | (0.2116; 0.2370) |
Functional Central Limit Theorems for Conditional Poisson Sampling Designs
(This work was supported by the grant 2016-ATE-0459 and the grant 2017-ATE-0402 from Università degli Studi di Milano-Bicocca.)
Leo Pasquazzi (Dipartimento di Statistica e Metodi Quantitativi, Università degli Studi di Milano-Bicocca, Edificio U7, Via Bicocca degli Arcimboldi 8, 20126 – Milano)
Abstract
This paper provides refined versions of some known functional central limit theorems for conditional Poisson sampling which are more suitable for applications. The theorems presented in this paper are generalizations of some results that have been recently published by Bertail, Chautru, and Clémençon [1]. The asymptotic equicontinuity part of the proofs presented in this paper is based on the same idea as in [1] but some of the missing details are provided. On the way to the functional central limit theorems, this paper provides a detailed discussion of what must be done in order to prove conditional and unconditional weak convergence in bounded function spaces in the context of survey sampling. The results from this discussion can be useful to prove further weak convergence results.
Keywords: weak convergence, empirical process, conditional Poisson sampling, uniform entropy condition
Mathematics Subject Classification (2010): 62A05, 60F05, 60F17
1 Introduction
Bertail, Chautru, and Clémençon [1] have recently published a paper in which they proposed some functional central limit theorems (FCLTs) for Poisson sampling designs as well as for conditional Poisson sampling designs (henceforth CPS designs or rejective sampling designs). The author of the present paper has already published a draft manuscript which provides quite substantial generalizations of the results for the Poisson sampling case (see [11]). In fact, [1] considers only empirical processes indexed by function classes which satisfy the uniform entropy condition, while [11] extends these results to arbitrary Donsker classes with uniformly bounded means. The proofs of the more general results given in [11] are based on the symmetrization technique and differ substantially from those given in [1], which use the Hoeffding inequality (see [7]). Unfortunately, the symmetrization trick cannot be applied in the conditional Poisson sampling case, and this prevents one from generalizing the weak convergence results given in [1] for that case by means of the symmetrization technique as in [11]. In any case, the results given in [1] are somewhat unsatisfactory because the assumptions about the sequence of conditional Poisson sampling designs are unnecessarily restrictive. In fact, perhaps in order to simplify the proofs, [1] assumes that the first order sample inclusion probabilities of the underlying (approximately canonical) Poisson sampling designs are realizations of i.i.d. random variables which are bounded away from zero. As a consequence, the assumptions of the theorems presented in [1] imply that the sample sizes of the rejective sampling designs must be random. Moreover, the theorems cannot be applied to cases where there is dependence among the first order sample inclusion probabilities, or to cases where the sample inclusion probabilities are proportional to some size variable which can take on values arbitrarily close to zero.
The results given in the present paper overcome these shortcomings.
This work is organized as follows. Section 2 introduces the probabilistic framework within which the FCLTs will be derived. The probabilistic model and all other definitions given in Section 2 are identical to those given in Section 2 of [11]. Section 3 provides some general definitions and theorems which are very useful for showing conditional weak convergence results in the context of survey sampling. The definitions and theorems provided in this section are conditional analogues of the definitions and theorems given in Chapter 1.5 on pages 34-41 in [17]. They are completely general and can be used to prove other weak convergence results in the context of survey sampling as well. Section 4 reviews the relevant conditional Poisson sampling theory, which is due to Hájek (see [6]). In Section 5 the FCLTs for the conditional Poisson sampling case will be derived. Section 6 provides extensions for the Hájek empirical process and Section 7 concludes this work with a simulation study.
2 Notation and Definitions
Let $Y_1, Y_2, \ldots, Y_N$ denote the values taken on by a study variable $Y$ on the $N$ units of a finite population and let $Z_1, Z_2, \ldots, Z_N$ denote the corresponding values of an auxiliary variable $Z$. In this paper it will be assumed that the ordered pairs $(Y_1, Z_1), (Y_2, Z_2), \ldots, (Y_N, Z_N)$ corresponding to a given finite population of interest are the first $N$ realizations of an infinite sequence of i.i.d. random variables $(Y_1, Z_1), (Y_2, Z_2), \ldots$ which take on values in the Cartesian product of two measurable spaces which will be denoted by $(\mathcal{Y}, \mathcal{A})$ and $(\mathcal{Z}, \mathcal{C})$, respectively. Moreover, as usual in finite population sampling theory, it will be assumed that the values taken on by the auxiliary variable are known in advance for all the population units, while the values taken on by the study variable are only known for the population units that have been selected into a random sample. The corresponding vector of sample inclusion indicator functions will be denoted by $\mathbf{S}_N := (S_{1N}, S_{2N}, \ldots, S_{NN})$, and it will be assumed that the vectors $\mathbf{S}_N$ and $\mathbf{Y}_N := (Y_1, Y_2, \ldots, Y_N)$ are conditionally independent given $\mathbf{Z}_N := (Z_1, Z_2, \ldots, Z_N)$. With reference to the sample design, probability and expectation will be denoted by $P_d$ and $E_d$, respectively. With this notation, the vector of first order sample inclusion probabilities is given by
$$\boldsymbol{\pi}_N := (\pi_{1N}, \pi_{2N}, \ldots, \pi_{NN}) := E_d \mathbf{S}_N,$$
and from the conditional independence assumption it follows that $\boldsymbol{\pi}_N$ must be a deterministic function of $\mathbf{Z}_N$.
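Now, with reference to the measurable space $(\mathcal{Y}, \mathcal{A})$, consider the random empirical measure given by
$$\mathbb{G}'_N := \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( \frac{S_{iN}}{\pi_{iN}} - 1 \right) \delta_{Y_i}, \qquad (1)$$
where $\delta_y$ denotes the Dirac measure at $y$. For a given measurable function $f: \mathcal{Y} \to \mathbb{R}$, the integral of $f$ with respect to $\mathbb{G}'_N$ can be written as
$$\mathbb{G}'_N f = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( \frac{S_{iN}}{\pi_{iN}} - 1 \right) f(Y_i),$$
so that, for any given class of measurable functions $\mathcal{F}$, the random empirical measure $\mathbb{G}'_N$, as a real-valued function of $f \in \mathcal{F}$, can be interpreted as a stochastic process indexed by the set $\mathcal{F}$. For obvious reasons $\mathbb{G}'_N$ will be called the Horvitz-Thompson empirical process (henceforth HTEP). Depending on the values taken on by the study variable and on the class of functions $\mathcal{F}$, a sample path of $\mathbb{G}'_N$ may or may not be bounded. In the former case it is an element of $\ell^\infty(\mathcal{F})$, the space of all bounded real-valued functions with domain $\mathcal{F}$. In what follows $\ell^\infty(\mathcal{F})$ will be considered as a metric space with distance function induced by the norm $\|z\|_{\mathcal{F}} := \sup_{f \in \mathcal{F}} |z(f)|$.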
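To make the definition concrete, here is a minimal Python sketch that evaluates the HTEP on a finite class of indicator functions. It assumes the design-centred form $\mathbb{G}'_N f = N^{-1/2}\sum_{i=1}^N (S_{iN}/\pi_{iN} - 1) f(Y_i)$. The helper names `htep`, `indicator` and `sup_norm`, as well as the toy data, are hypothetical, and the supremum over $\mathcal{F}$ is replaced by a maximum over a finite class.

```python
import math


def htep(y, s, pi, f):
    """G'_N f = N^{-1/2} * sum_i (s_i / pi_i - 1) * f(y_i).

    Assumed design-centred HTEP form; y, s and pi are lists of length N
    and f is any real-valued function on the study variable's space.
    """
    n_pop = len(y)
    total = sum((si / p - 1.0) * f(yi) for yi, si, p in zip(y, s, pi))
    return total / math.sqrt(n_pop)


def indicator(t):
    """Hypothetical index function f_t(y) = 1{y <= t}."""
    return lambda v: 1.0 if v <= t else 0.0


def sup_norm(y, s, pi, fs):
    """||G'_N||_F for a finite class fs (stand-in for the sup over F)."""
    return max(abs(htep(y, s, pi, f)) for f in fs)


y = [0.2, 1.5, 0.7, 2.1]    # study variable values (N = 4)
pi = [0.5, 0.5, 0.25, 1.0]  # first order inclusion probabilities
s = [1, 0, 1, 1]            # one realized vector of inclusion indicators
fs = [indicator(t) for t in (0.5, 1.0, 2.0)]
print([htep(y, s, pi, f) for f in fs])  # [0.5, 2.0, 1.5]
print(sup_norm(y, s, pi, fs))           # 2.0
```

In a census ($S_{iN} = \pi_{iN} = 1$ for every unit) every weight $S_{iN}/\pi_{iN} - 1$ vanishes, so the process is identically zero.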
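As already mentioned in the introduction, the present paper provides FCLTs for conditional Poisson sampling designs (or rejective sampling designs). To be precise, the present paper investigates conditions under which
$$\mathbb{G}'_N \rightsquigarrow \mathbb{G}' \quad \text{in } \ell^\infty(\mathcal{F}),$$
where $\mathbb{G}'$ is a Borel measurable and tight (in $\ell^\infty(\mathcal{F})$) Gaussian process. Both unconditional and conditional (on the realized values of $\mathbf{Y}_N$ and $\mathbf{Z}_N$) weak convergence will be considered. Recall that unconditional weak convergence is defined as
$$E^* h(\mathbb{G}'_N) \to E h(\mathbb{G}') \quad \text{for every } h \in C_b(\ell^\infty(\mathcal{F})),$$
where $E^*$ denotes outer expectation and $C_b(\ell^\infty(\mathcal{F}))$ is the class of all real-valued, bounded and continuous functions on $\ell^\infty(\mathcal{F})$. If the realizations of $\mathbb{G}'$ lie in a separable subset of $\ell^\infty(\mathcal{F})$ almost surely, this is equivalent to
$$\sup_{h \in BL_1} \left| E^* h(\mathbb{G}'_N) - E h(\mathbb{G}') \right| \to 0,$$
where $BL_1$ is the set of all functions $h: \ell^\infty(\mathcal{F}) \to [-1, 1]$ such that $|h(z_1) - h(z_2)| \le \|z_1 - z_2\|_{\mathcal{F}}$ for every $z_1, z_2 \in \ell^\infty(\mathcal{F})$ (see Chapter 1.12 in [17]). Based on this observation, [17] provides two definitions of conditional weak convergence: conditional weak convergence in outer probability (henceforth opCWC), which in the context of this paper translates to the condition
$$\sup_{h \in BL_1} \left| E_d h(\mathbb{G}'_N) - E h(\mathbb{G}') \right| \to 0 \quad \text{in outer probability}$$
(see page 181 in [17]), and outer almost sure conditional weak convergence (henceforth oasCWC), which in the context of this paper translates to the condition
$$\sup_{h \in BL_1} \left| E_d h(\mathbb{G}'_N) - E h(\mathbb{G}') \right| \to 0 \quad \text{outer almost surely}.$$
As expected, oasCWC implies opCWC (see Lemma 1.9.2 on page 53 in [17]). However, it seems that oasCWC is not strong enough to imply asymptotic measurability (cf. Theorem 2.9.6 on page 182 in [17] and the comments thereafter), which is a necessary condition for unconditional weak convergence (see Lemma 1.3.8 on page 21 in [17]).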
Since the very definition of weak convergence relies on the concept of outer expectation, some assumptions about the underlying probability space will be necessary for what follows. Throughout this paper it will be assumed that the latter is a product space of the form
[TABLE]
and that the elements of the random sequence $\{(Y_i, Z_i)\}_{i=1}^\infty$ are the coordinate projections on the first infinite coordinates of the sample points $\omega$. On the other hand, the sample inclusion indicators are allowed to depend on all the coordinates. As suggested by the notation, it will be assumed that for each value of $N$ the corresponding sample inclusion indicator functions $S_{1N}, S_{2N}, \ldots, S_{NN}$ are the elements of one row of a triangular array of random variables. This assumption is needed in order to make sure that for each value of $N$ the sample design can be readapted according to all the (known) values taken on by the auxiliary variable as the population size increases. To make sure that the conditional independence assumption holds, it will be assumed that for each value of $N$ the corresponding vector $\mathbf{S}_N$ is defined as a function of the random vector $\mathbf{Z}_N$ and of random variables $U_1, U_2, \ldots$ which are functions of the last coordinate of the sample points only, i.e. of the coordinate that takes on values in the last factor of the product space (instead of a random sequence $\{U_i\}$ one could also consider a stochastic process with an arbitrary index set, but this will not be of interest in the present paper). For example, in the case of a Poisson sampling design with a given vector of first order sample inclusion probabilities $\boldsymbol{\pi}_N$ (which could be a function of $\mathbf{Z}_N$), one could define $\{U_i\}$ as a sequence of i.i.d. uniform-$[0,1]$ random variables and define for each value of $N$ the corresponding row of sample inclusion indicators by
$$S_{iN} := \mathbf{1}\{U_i \le \pi_{iN}\}, \qquad i = 1, 2, \ldots, N.$$
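The Poisson construction can be sketched in a few lines of Python. The function names are hypothetical. The Monte Carlo loop only checks that each unit is selected with a frequency close to its $\pi_{iN}$, and it also makes visible that under Poisson sampling the realized sample size is random.

```python
import random


def poisson_sample(pi, uniforms):
    """S_iN = 1{U_i <= pi_iN}: Poisson sampling driven by given uniforms."""
    return [1 if u <= p else 0 for u, p in zip(uniforms, pi)]


def inclusion_frequencies(pi, reps, seed=0):
    """Monte Carlo check that P(S_iN = 1) = pi_iN for each unit."""
    rng = random.Random(seed)
    counts = [0] * len(pi)
    for _ in range(reps):
        s = poisson_sample(pi, [rng.random() for _ in pi])
        counts = [c + si for c, si in zip(counts, s)]
    return [c / reps for c in counts]


print(poisson_sample([0.3, 0.7], [0.25, 0.71]))            # [1, 0]
print(inclusion_frequencies([0.1, 0.5, 0.9], reps=20000))  # close to [0.1, 0.5, 0.9]
```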
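Of course, the above probability space does not only work for Poisson sampling designs: it can accommodate any non-informative sampling design. In fact, it is not difficult to show that *for any non-informative sampling design the vector of sample inclusion indicators $\mathbf{S}_N$ can be defined as a function of $\mathbf{Z}_N$ and of a single uniform-$[0,1]$ random variable $U$ that depends on the last coordinate of the sample points only*. To this aim let
$$p_N(\mathbf{s}; \mathbf{Z}_N) \ge 0, \quad \mathbf{s} \in \{0, 1\}^N, \qquad \text{with} \quad \sum_{\mathbf{s} \in \{0,1\}^N} p_N(\mathbf{s}; \mathbf{Z}_N) = 1, \qquad (3)$$
denote the probability to select a given sample $\mathbf{s}$. Note that the definition of the function $p_N$ specifies a desired sampling design. Since the values taken on by the auxiliary variable are assumed to be known before the sample is drawn, the sample selection probabilities are allowed to depend on $\mathbf{Z}_N$. Now, let $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_{2^N}$ denote the elements of $\{0, 1\}^N$ arranged in some fixed order (for example, according to the order determined by the binary expansion corresponding to the finite sequence of zeros and ones in $\mathbf{s}$), and put $F_0 := 0$ and $F_k := \sum_{j=1}^{k} p_N(\mathbf{s}_j; \mathbf{Z}_N)$, $k = 1, 2, \ldots, 2^N$. Then, define the vector of sample inclusion indicators by
$$\mathbf{S}_N := \mathbf{s}_k \quad \text{if } F_{k-1} < U \le F_k, \qquad k = 1, 2, \ldots, 2^N,$$
and note that for every $\mathbf{s} \in \{0, 1\}^N$ this vector satisfies $P_d\{\mathbf{S}_N = \mathbf{s}\} = p_N(\mathbf{s}; \mathbf{Z}_N)$, as desired. This concludes the proof of the assertion in italics above.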
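The single-uniform construction is an inverse-CDF lookup over the $2^N$ possible samples. The sketch below (hypothetical names, toy design for $N = 2$) returns $\mathbf{s}_k$ whenever $F_{k-1} < u \le F_k$. Enumerating all $2^N$ samples is, of course, only feasible for very small $N$.

```python
from itertools import accumulate, product


def select_with_single_uniform(u, samples, probs):
    """Return samples[k] such that F_{k-1} < u <= F_k.

    samples: all vectors s in {0,1}^N in a fixed order;
    probs:   the corresponding selection probabilities p_N(s; Z_N).
    """
    for s, upper in zip(samples, accumulate(probs)):
        if u <= upper:
            return s
    return samples[-1]  # reached only through rounding when u is close to 1


# N = 2, samples ordered by their binary expansion: 00, 01, 10, 11.
samples = list(product((0, 1), repeat=2))
# Hypothetical design: exactly one of the two units is selected, each with probability 1/2.
probs = [0.0, 0.5, 0.5, 0.0]

print(select_with_single_uniform(0.3, samples, probs))  # (0, 1)
print(select_with_single_uniform(0.7, samples, probs))  # (1, 0)
```

Evaluating the function on a fine grid of values of $u$ shows that the set of $u$ values mapped to each sample has length equal to its selection probability, which is exactly the property used above.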
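Next, observe that in the above construction the sample selection probabilities $p_N(\mathbf{s}; \mathbf{Z}_N)$ are functions of $\mathbf{Z}_N$. If for a given $\mathbf{s}$ the corresponding sample selection probability $p_N(\mathbf{s}; \mathbf{Z}_N)$ is a measurable function of $\mathbf{Z}_N$ (this depends on the sampling design), then, with reference to the probability space of this paper, $p_N(\mathbf{s}; \mathbf{Z}_N)$ can be interpreted as a conditional probability in the proper sense. Otherwise, $p_N(\mathbf{s}; \mathbf{Z}_N)$ will just be a non-measurable (random) function of $\mathbf{Z}_N$. More generally, the expectation with respect to the uniform random variable $U$ with $\mathbf{Y}_N$ and $\mathbf{Z}_N$ kept fixed, which can be interpreted as design expectation and will therefore be denoted by $E_d$, can be applied to any function $g(\mathbf{Y}_N, \mathbf{Z}_N, \mathbf{S}_N)$ of $\mathbf{Y}_N$, $\mathbf{Z}_N$ and $\mathbf{S}_N$. In fact, the expectation $E_d\, g(\mathbf{Y}_N, \mathbf{Z}_N, \mathbf{S}_N)$ is given by
$$E_d\, g(\mathbf{Y}_N, \mathbf{Z}_N, \mathbf{S}_N) = \sum_{\mathbf{s} \in \{0,1\}^N} g(\mathbf{Y}_N, \mathbf{Z}_N, \mathbf{s})\, p_N(\mathbf{s}; \mathbf{Z}_N)$$
and is thus a function of $\mathbf{Y}_N$ and $\mathbf{Z}_N$. If for every fixed $\mathbf{s}$ the corresponding function $g(\cdot, \cdot, \mathbf{s})$ is a measurable function of $\mathbf{Y}_N$ and $\mathbf{Z}_N$ and the function $p_N(\mathbf{s}; \cdot)$ is a measurable function of $\mathbf{Z}_N$, then, with respect to the probability space of this paper, $E_d\, g(\mathbf{Y}_N, \mathbf{Z}_N, \mathbf{S}_N)$ can be interpreted as a conditional expectation in the proper sense (and in this case it is obviously a measurable function of $\mathbf{Y}_N$ and $\mathbf{Z}_N$), while otherwise it could be either a measurable or a non-measurable function of $\mathbf{Y}_N$ and $\mathbf{Z}_N$.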
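For a finite population the design expectation is a finite weighted sum over samples and can be evaluated exactly. The sketch below assumes a Poisson design with hypothetical inclusion probabilities, where a sample's selection probability is the product of $\pi_{iN}$ over selected units and $1 - \pi_{iN}$ over the others. It uses this design to check two textbook facts: $E_d S_{iN} = \pi_{iN}$, and the Horvitz-Thompson estimator of a population total is design-unbiased. All names are illustrative.

```python
from itertools import product
from math import prod


def poisson_design(pi):
    """All pairs (s, p_N(s)) for a Poisson design with inclusion probabilities pi."""
    return [(s, prod(p if si else 1.0 - p for si, p in zip(s, pi)))
            for s in product((0, 1), repeat=len(pi))]


def design_expectation(g, y, z, design):
    """E_d g(Y_N, Z_N, S_N) = sum over s of g(Y_N, Z_N, s) * p_N(s; Z_N)."""
    return sum(g(y, z, s) * p for s, p in design)


y = [3.0, 1.0, 4.0]   # study variable (population total 8.0)
z = [1.0, 2.0, 5.0]   # auxiliary variable (not used by this toy design)
pi = [0.2, 0.5, 0.8]  # hypothetical inclusion probabilities
design = poisson_design(pi)


def ht_total(y, z, s):
    """Horvitz-Thompson estimator of the total of y for the realized sample s."""
    return sum(si * yi / p for si, yi, p in zip(s, y, pi))


print(design_expectation(ht_total, y, z, design))  # approximately 8.0
```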
Throughout this paper it will be assumed that all the vectors of sample inclusion indicators $\mathbf{S}_N$ are defined as described in the above construction (the one which involves a single uniform-$[0,1]$ random variable $U$). Of course, in this way the random vectors $\mathbf{S}_N$ will be dependent for different values of $N$, but for the purposes of this paper this dependence structure is irrelevant. Moreover, in what follows only measurable sample designs will be considered, i.e. sample designs such that for every fixed $\mathbf{s}$ the corresponding sample selection probability $p_N(\mathbf{s}; \mathbf{Z}_N)$ in (3) is a measurable function of $\mathbf{Z}_N$. Note that this is a very mild restriction that should be satisfied in virtually every practical setting. However, it entails three important consequences which will be relevant for the proofs presented in this paper: (i) the vectors of sample inclusion indicators $\mathbf{S}_N$ are measurable functions of $\mathbf{Z}_N$ and of the uniform-$[0,1]$ random variable $U$; (ii) for every $\mathbf{s}$ the corresponding probability $P_d\{\mathbf{S}_N = \mathbf{s}\}$ is a conditional probability in the proper sense; and (iii) for a measurable function $g$ of $\mathbf{Y}_N$, $\mathbf{Z}_N$ and $\mathbf{S}_N$ the corresponding expectation $E_d\, g(\mathbf{Y}_N, \mathbf{Z}_N, \mathbf{S}_N)$ is a conditional expectation in the proper sense.
3 Weak convergence in bounded function spaces in the context of survey sampling
This section contains a detailed discussion of general methods for proving unconditional and conditional weak convergence in bounded function spaces in the context of survey sampling. Throughout this section it will be assumed that $T$ is an arbitrary set and that $\{X_N\}$ is a sequence of mappings from the probability space (2) into $\ell^\infty(T)$, where each $X_N$ depends on the sample points only through $\mathbf{Y}_N$, $\mathbf{Z}_N$ and $\mathbf{S}_N$. Moreover, it will be assumed that for every $N$ and for every $t \in T$ the corresponding coordinate projection $X_N(t)$ is measurable. The aim of this section is to provide necessary and sufficient conditions for opCWC (oasCWC) in $\ell^\infty(T)$, i.e. for
$$\sup_{h \in BL_1} \left| E_d h(X_N) - E h(X) \right| \to 0 \quad \text{in outer probability (outer almost surely)}, \qquad (4)$$
with $X$ a Borel measurable and tight mapping from some probability space into $\ell^\infty(T)$. In order to avoid repetitions, in what follows a symbol for convergence in probability and a symbol for almost sure convergence will be used together in order to express two versions of a convergence condition. This notation will often appear in the assumptions and in the conclusions of lemmas, theorems and corollaries. In these cases it is understood that the "probability convergence versions" of the assumptions imply the "probability convergence versions" of the conclusions and that the "almost sure convergence versions" of the assumptions imply the "almost sure convergence versions" of the conclusions.
What needs to be done in order to prove opCWC (oasCWC) seems clear from the general (unconditional) weak convergence theory laid out in [17]. In fact, according to Theorem 1.5.4 on page 35 in [17], if the sequence $\{X_N\}$ is asymptotically tight and all its finite-dimensional marginals $(X_N(t_1), \ldots, X_N(t_k))$ converge weakly (in $\mathbb{R}^k$) to the corresponding marginals of some stochastic process $\{X(t) : t \in T\}$, then there exists a version of $X$ which is a Borel measurable and tight mapping from some probability space into $\ell^\infty(T)$ such that
$$E^* h(X_N) \to E h(X) \quad \text{for every } h \in C_b(\ell^\infty(T)). \qquad (5)$$
Since the realizations of $X$ lie in a separable subset of $\ell^\infty(T)$ almost surely (this follows from tightness), condition (5) is equivalent to
$$\sup_{h \in BL_1} \left| E^* h(X_N) - E h(X) \right| \to 0 \qquad (6)$$
(see the comments at the top of page 73 in [17]). On the other hand, Theorem 1.5.4 on page 35 in [17] also says that if $X$ is a Borel measurable and tight mapping from some probability space into $\ell^\infty(T)$ and if condition (5), or equivalently condition (6), holds, then the sequence $\{X_N\}$ is necessarily asymptotically tight and its finite-dimensional marginals converge weakly to the corresponding marginals of $X$.
Since the only difference between condition (6) and condition (4) is that the unconditional (outer) expectation is replaced by the sample design expectation (which is not necessarily a conditional expectation in the proper sense), one would expect that opCWC (oasCWC) is equivalent to the joint occurrence of some form of conditional asymptotic tightness and of some form of conditional weak convergence of the finite-dimensional marginals. In this section it will be shown that this is indeed true. The first step towards this goal is to provide a clear definition of what "conditional asymptotic tightness" and "conditional weak convergence of the finite-dimensional marginals" mean. To this aim recall that according to Definition 1.3.7 on pages 20-21 in [17] the sequence $\{X_N\}$ is asymptotically tight in the usual unconditional sense if for every $\varepsilon > 0$ there exists a compact set $K \subset \ell^\infty(T)$ such that
$$\liminf_{N \to \infty} P_*\{X_N \in K^\delta\} \ge 1 - \varepsilon \quad \text{for every } \delta > 0,$$
where $K^\delta := \{z \in \ell^\infty(T) : \inf_{k \in K} \|z - k\|_T < \delta\}$. Of course this condition is satisfied if and only if for every $\varepsilon > 0$ there exists a compact set $K$ such that for every $\delta > 0$ there exists a sequence of real numbers $\eta_N \to 0$ for which
$$P_*\{X_N \in K^\delta\} \ge 1 - \varepsilon - \eta_N \quad \text{for every } N.$$
Based on this observation we can define two versions of conditional asymptotic tightness (henceforth CAT): a probability version, by requiring that for every $\varepsilon > 0$ there exists a compact set $K$ such that for every $\delta > 0$ there exists a sequence of random variables $\{\eta_N\}$ with $\eta_N \to 0$ in probability for which
$$P_d\{X_N \in K^\delta\} \ge 1 - \varepsilon - \eta_N \quad \text{for every } N, \qquad (7)$$
and an almost sure version of CAT, by requiring $\eta_N \to 0$ almost surely instead of $\eta_N \to 0$ in probability.
The following theorem provides two characterizations of CAT which are analogous to the characterizations of (unconditional) asymptotic tightness given in Theorem 1.5.6 and Theorem 1.5.7 on pages 36-37 in [17].
Theorem 1**.**
The following three conditions are equivalent:
- (i)
the sequence $\{X_N\}$ is CAT;
- (ii)
the marginals of the sequence $\{X_N\}$ are CAT in the sense that for every $t \in T$ and every $\varepsilon > 0$ there exists a constant $M$ such that
[TABLE]
for some sequence of random variables $\{\eta_N\}$ which goes to zero in probability (almost surely), and there exists a semimetric $\rho$ on $T$ for which $T$ is totally bounded and for which the sequence $\{X_N\}$ is conditionally asymptotically $\rho$-equicontinuous (henceforth conditionally AEC w.r.t. $\rho$) in the sense that for every $\varepsilon, \eta > 0$ there exists a $\delta > 0$ such that
[TABLE]
for some sequence of random variables which goes to zero in probability (almost surely);
- (iii)
the marginals of the sequence $\{X_N\}$ are CAT and the following conditional probability version of finite approximation holds: for every $\varepsilon, \eta > 0$ there exists a finite partition $T = \bigcup_{j=1}^{k} T_j$ such that
[TABLE]
for some sequence of random variables which goes to zero in probability (almost surely).
Proof.
(i)$\Rightarrow$(ii). If condition (7) holds, then it follows that
[TABLE]
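with a constant $M$ that depends only on $K$ and $\delta$ and with the same sequence $\{\eta_N\}$ as in (7). This shows that the marginals of the sequence $\{X_N\}$ are CAT. Next, consider a sequence of compact subsets $K_1, K_2, \ldots$ of $\ell^\infty(T)$ such that, for every fixed $m$, condition (7) holds with $\varepsilon = 1/m$ and $K = K_m$. Note that the sequence $\{\eta_N\}$ in (7) depends on $m$ (through $\varepsilon$ and $K$) and on $\delta$, and hence we shall write $\eta_{N,m,\delta}$ instead of $\eta_N$. By part (b) of Theorem 6.2 on page 88 in [9], to every $K_m$ there corresponds a semimetric $\rho_m$ which makes $T$ totally bounded and for which $K_m$ is a subset of $UC(T, \rho_m)$, i.e. of the class of all real-valued functions on $T$ which are uniformly $\rho_m$-continuous. Based on the sequence of semimetrics $\{\rho_m\}$, define a new semimetric $\rho$ by
[TABLE]
Then it follows that $K_m \subset UC(T, \rho)$ for every $m$, and moreover it is not difficult to show that $T$ is totally bounded w.r.t. $\rho$ too. Now, choose $\varepsilon > 0$ arbitrarily and note that $z \in K_m$ implies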
[TABLE]
for some small enough , and hence that implies
[TABLE]
for the same set of values of . For small enough values of it follows therefore that condition (9) is satisfied with and .
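The display defining $\rho$ did not survive extraction. A common way to combine countably many semimetrics is $\rho(s, t) = \sum_m 2^{-m} (\rho_m(s, t) \wedge 1)$; assuming this form (not verified against the original), the sketch below builds a truncated version from a finite, hypothetical list of semimetrics. Because $\rho \ge 2^{-m}(\rho_m \wedge 1)$ for every $m$, a small $\rho$-distance forces a small $\rho_m$-distance, which is how uniform $\rho_m$-continuity carries over to uniform $\rho$-continuity.

```python
def combined_semimetric(rhos):
    """rho(s, t) = sum_m 2^{-m} * min(rho_m(s, t), 1) over a finite list rhos.

    A truncation of the usual countable combination; each capped term
    min(rho_m, 1) is itself a semimetric, so rho is one as well.
    """
    def rho(s, t):
        return sum(2.0 ** -(m + 1) * min(r(s, t), 1.0) for m, r in enumerate(rhos))
    return rho


# Two hypothetical semimetrics on the real line.
rho = combined_semimetric([lambda s, t: abs(s - t), lambda s, t: abs(s * s - t * t)])
print(rho(0.0, 1.0))  # 0.75
print(rho(0.0, 0.5))  # 0.3125
```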
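(ii)$\Rightarrow$(iii). Fix $\varepsilon, \eta > 0$, let $\delta > 0$ correspond to $\varepsilon$ and $\eta$ as in (9), choose a finite collection of open balls of $\rho$-radius $\delta/2$ which cover $T$, disjointify them and create a finite partition $T = \bigcup_{j=1}^{k} T_j$ such that each $T_j$ is a subset of an open ball of $\rho$-radius $\delta/2$. Then (9) implies (10) with the sequence $\{\eta_N\}$ equal to the sequence from the definition of conditional AEC corresponding to $\varepsilon$, $\eta$ and $\delta$.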
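The disjointification step can be illustrated on a finite stand-in for $T$. In this sketch (hypothetical names; $T$ replaced by a grid in $[0, 1]$ and $\rho(s, t) = |s - t|$), each point is assigned to the first ball that contains it, so every partition set lies inside a single ball and has $\rho$-diameter smaller than twice the radius.

```python
def partition_from_cover(points, centers, rho, radius):
    """Disjointify a cover by open balls: T_j = B(c_j) minus the earlier balls.

    Each non-empty T_j lies inside one ball, so its rho-diameter is < 2 * radius.
    """
    parts = [[] for _ in centers]
    for t in points:
        for j, c in enumerate(centers):
            if rho(t, c) < radius:
                parts[j].append(t)
                break
        else:
            raise ValueError(f"point {t!r} is not covered by the balls")
    return [part for part in parts if part]


def dist(s, t):
    return abs(s - t)


points = [i / 10 for i in range(11)]  # finite stand-in for T = [0, 1]
parts = partition_from_cover(points, centers=[0.25, 0.75], rho=dist, radius=0.3)
print([len(part) for part in parts])  # [6, 5]
```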
(iii)$\Rightarrow$(i). This part of the proof is essentially the same as the proof of Theorem 1.5.6 on page 36 in [17]. First, it will be shown that (iii) implies that $\|X_N\|_T$ is CAT, i.e. that for every $\varepsilon > 0$ there exists a constant $M$ such that
[TABLE]
for some sequence of random variables $\{\eta_N\}$ which goes to zero in probability (almost surely). To this aim, choose $\varepsilon > 0$ arbitrarily and let $T = \bigcup_{j=1}^{k} T_j$ be a corresponding partition for which (10) holds. Then choose one index $t_j$ from each partition set $T_j$ and note that
[TABLE]
implies $\|X_N\|_T \le \max_{1 \le j \le k} |X_N(t_j)| + \varepsilon$. Now, from the assumptions in (iii) it follows that there exist sequences of random variables $\{\eta'_N\}$ and $\{\eta''_N\}$ which go to zero in probability (almost surely) such that, for every $N$,
[TABLE]
and
[TABLE]
and from this it follows that condition (11) holds with $\eta_N := \eta'_N + \eta''_N$.
Next, consider an arbitrary $\zeta > 0$ and put $\eta_m := 2^{-m}\zeta$ for an arbitrary sequence $\varepsilon_m \downarrow 0$. Let $\{T^m_j\}$ be a corresponding sequence of partitions such that
[TABLE]
for some sequences of non-negative random variables $\{\eta_{N,m}\}$ which go to zero in probability (almost surely) when $m$ is kept fixed and $N$ goes to infinity. The conditional probability version of finite approximation ensures that for every fixed $m$ there exist such a partition and such a sequence of random variables $\{\eta_{N,m}\}$. Now, denote by $z_1, \ldots, z_{p_m}$ the functions which are constant on each partition set $T^m_j$ and which can take on only the values $0, \pm\varepsilon_m, \ldots, \pm\lfloor M/\varepsilon_m \rfloor \varepsilon_m$. Note that for each $m$ there are only finitely many such functions, i.e. that $p_m < \infty$ for every $m$. Let $K_m$ denote the union of the closed balls of radius $\varepsilon_m$ around each function $z_i$. Then,
[TABLE]
implies that $X_N \in K_m$. Consider the set $K := \bigcap_{m=1}^{\infty} K_m$ and note that $K$ is closed and totally bounded and hence compact (since $\ell^\infty(T)$ is complete). Moreover, it can be shown that for every $\delta > 0$ there is a finite $m$ such that $K^\delta \supset \bigcap_{i=1}^{m} K_i$ (the proof of this claim is given in the proof of Theorem 1.5.6 on pages 36 and 37 in [17]). Using this fact along with condition (11) and condition (12) yields
[TABLE]
which completes the proof. ∎
Remark 1**.**
Consider the case where $T = \mathcal{F}$ is a class of measurable functions and where $\{X_N\}$ is a sequence of HTEPs as defined in (1). It is not difficult to show that in this case $\{X_N\}$ is conditionally AEC w.r.t. a given semimetric $\rho$ (see (ii) in the statement of the previous theorem) if and only if
[TABLE]
where
[TABLE]
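For a finite class one can evaluate the maximal increment of the HTEP over pairs $f, g$ with $\rho(f, g) < \delta$, the kind of quantity that conditional AEC controls. Since $\mathbb{G}'_N$ is linear in $f$, $\mathbb{G}'_N(f - g) = \mathbb{G}'_N f - \mathbb{G}'_N g$. The sketch below makes two assumptions that are not taken from the paper: it uses the design-centred HTEP form, and it indexes indicator functions $f_t = \mathbf{1}\{y \le t\}$ with the illustrative semimetric $\rho(f_s, f_t) = |s - t|$. The function names are hypothetical.

```python
import math


def htep_at_thresholds(y, s, pi, thresholds):
    """G'_N f_t for f_t(y) = 1{y <= t}, assuming the design-centred HTEP form."""
    n_pop = len(y)
    return {t: sum((si / p - 1.0) * (1.0 if yi <= t else 0.0)
                   for yi, si, p in zip(y, s, pi)) / math.sqrt(n_pop)
            for t in thresholds}


def modulus(values, rho, delta):
    """max |G'_N f - G'_N g| over pairs with rho(f, g) < delta (finite class)."""
    keys = list(values)
    return max(abs(values[a] - values[b])
               for a in keys for b in keys if rho(a, b) < delta)


values = htep_at_thresholds([0.2, 1.5, 0.7, 2.1], [1, 0, 1, 1],
                            [0.5, 0.5, 0.25, 1.0], [0.5, 1.0, 2.0])


def threshold_gap(a, b):
    return abs(a - b)


print(modulus(values, threshold_gap, 0.1))  # 0.0 (only pairs with f = g)
print(modulus(values, threshold_gap, 0.6))  # 1.5
```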
The next theorem shows that CAT is a necessary condition for conditional weak convergence.
Theorem 2**.**
Let $X$ be a Borel measurable and tight mapping from some probability space into $\ell^\infty(T)$ and assume that condition (4) holds. Then the sequence $\{X_N\}$ satisfies the probability (almost sure) version of CAT.
Proof.
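Let $B$ be an arbitrary Borel subset of $\ell^\infty(T)$ and let
$$d(z, B) := \inf_{b \in B} \|z - b\|_T$$
be the distance between $z \in \ell^\infty(T)$ and the set $B$. Note that for $\delta \in (0, 1]$ the function
$$f_\delta(z) := \max\{0,\ 1 - d(z, B)/\delta\}$$
has Lipschitz constant $1/\delta$ and that $\mathbf{1}_B \le f_\delta \le \mathbf{1}_{B^\delta}$, where $B^\delta$ is the open $\delta$-enlargement of the set $B$. Thus, $\delta f_\delta \in BL_1$ and therefore it follows that
[TABLE]
Since opCWC (oasCWC) implies that the supremum goes to zero in outer probability (outer almost surely), this shows that for every $\delta \in (0, 1]$ there exists a sequence of random variables $\{\eta_N\}$, going to zero in probability (almost surely), such that
[TABLE]
for every Borel subset $B$ of $\ell^\infty(T)$ and for every $N$. Since $X$ is tight by assumption, the conclusion of the theorem follows from this. ∎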
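The proof of Theorem 2 relies on a bounded Lipschitz function squeezed between $\mathbf{1}_B$ and $\mathbf{1}_{B^\delta}$. A standard choice with these properties is $f_\delta(z) = \max\{0, 1 - d(z, B)/\delta\}$; whether this is exactly the function in the lost display is an assumption. The sketch replaces $\ell^\infty(T)$ by $\mathbb{R}^2$ with the sup norm (i.e. $T$ with two points) and takes a finite set $B$. All names are hypothetical.

```python
def sup_dist(z, w):
    """Sup-norm distance between two finite-dimensional 'sample paths'."""
    return max(abs(a - b) for a, b in zip(z, w))


def dist_to_set(z, points):
    """d(z, B) for a finite set B of paths."""
    return min(sup_dist(z, b) for b in points)


def soft_indicator(z, points, delta):
    """f_delta(z) = max(0, 1 - d(z, B) / delta): 1 on B, 0 outside B^delta."""
    return max(0.0, 1.0 - dist_to_set(z, points) / delta)


B = [(0.0, 0.0), (1.0, 1.0)]
delta = 0.5
print(soft_indicator((0.0, 0.0), B, delta))  # 1.0
print(soft_indicator((0.5, 0.5), B, delta))  # 0.0
```

Multiplying by $\delta \le 1$ gives a function with values in $[0, 1]$ and Lipschitz constant $1$, i.e. an element of $BL_1$.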
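Now, consider "conditional weak convergence of the finite-dimensional marginals". Perhaps the most obvious way to define this concept is to require pointwise convergence in probability (pointwise almost sure convergence) of the sample design characteristic function $\mathbf{u} \mapsto E_d \exp(i\, \mathbf{u}^\top \mathbf{X}_N)$ for every sequence of finite-dimensional vectors $\mathbf{X}_N := (X_N(t_1), \ldots, X_N(t_k))$ with $k \in \mathbb{N}$ and $t_1, \ldots, t_k \in T$. Since we are assuming that the components of the vectors $\mathbf{X}_N$ are measurable, conditional weak convergence of the finite-dimensional marginals (henceforth CWCM) can therefore be defined as
$$E_d \exp(i\, \mathbf{u}^\top \mathbf{X}_N) \to E \exp(i\, \mathbf{u}^\top \mathbf{X}) \quad \text{in probability (almost surely)}$$
for every $k \in \mathbb{N}$, every $t_1, \ldots, t_k \in T$ and every $\mathbf{u} \in \mathbb{R}^k$, where $\mathbf{X} := (X(t_1), \ldots, X(t_k))$ is the finite-dimensional vector of random variables corresponding to some $T$-indexed stochastic process $\{X(t) : t \in T\}$.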
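CWCM can be made tangible in the simplest case of Poisson sampling. There the inclusion indicators are independent given $\mathbf{Z}_N$, so the design characteristic function of $\mathbb{G}'_N f$ factorizes over units. The sketch compares it with the characteristic function of a centred normal variable whose variance is the design variance $N^{-1} \sum_i (1 - \pi_{iN})/\pi_{iN}\, f(Y_i)^2$. This is an illustration for Poisson sampling only, not for CPS. It assumes the design-centred HTEP form, and the values $f(Y_i)$ are hypothetical.

```python
import cmath
import math


def poisson_design_cf(u, f_vals, pi):
    """Exact E_d exp(i u G'_N f) under Poisson sampling (independent S_iN).

    Unit i contributes (1/pi_i - 1) f(Y_i)/sqrt(N) with probability pi_i
    and -f(Y_i)/sqrt(N) otherwise; f_vals[i] = f(Y_i).
    """
    n_pop = len(f_vals)
    value = complex(1.0)
    for fv, p in zip(f_vals, pi):
        a = fv / math.sqrt(n_pop)
        value *= (p * cmath.exp(1j * u * (1.0 / p - 1.0) * a)
                  + (1.0 - p) * cmath.exp(-1j * u * a))
    return value


def gaussian_cf(u, f_vals, pi):
    """exp(-u^2 sigma^2 / 2), sigma^2 = N^{-1} sum_i (1 - pi_i)/pi_i f(Y_i)^2."""
    n_pop = len(f_vals)
    var = sum((1.0 - p) / p * fv * fv for fv, p in zip(f_vals, pi)) / n_pop
    return math.exp(-0.5 * u * u * var)


f_vals = [(i % 10) / 10 for i in range(2000)]  # hypothetical values f(Y_i)
pi = [0.5] * 2000
for u in (0.5, 1.0, 2.0):
    print(u, abs(poisson_design_cf(u, f_vals, pi) - gaussian_cf(u, f_vals, pi)))
```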
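It is not difficult to prove that CWCM is a necessary condition for opCWC (oasCWC). The next theorem says that CWCM is already equivalent to opCWC (oasCWC) if the index set $T$ is finite. In its statement $z|_{T_0}$ denotes the restriction of a function $z$ on $T$ to a subset $T_0$ of $T$.
Theorem 3**.**
Let $T_0$ be a finite subset of $T$. Under the assumptions made at the beginning of this section
$$\sup_{h \in BL_1(\ell^\infty(T_0))} \left| E_d h(X_N|_{T_0}) - E h(X|_{T_0}) \right|$$
is measurable, and CWCM is equivalent to
$$\sup_{h \in BL_1(\ell^\infty(T_0))} \left| E_d h(X_N|_{T_0}) - E h(X|_{T_0}) \right| \to 0 \quad \text{in probability (almost surely)}.$$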
Proof.
The proof is the same as the proof of Corollary 3.1 in [11]. ∎
Up to now it has been shown that CAT and CWCM are necessary conditions for opCWC (oasCWC) and that CWCM is equivalent to opCWC (oasCWC) when $T$ is finite. The next theorem says that CAT and CWCM together are sufficient conditions for opCWC (oasCWC) regardless of the cardinality of $T$.
Theorem 4**.**
Assume that $\{X_N\}$ satisfies CAT and CWCM for some stochastic process $\{X(t) : t \in T\}$. Then there exists a version of the stochastic process $X$ which is a Borel measurable and tight mapping from some probability space into $\ell^\infty(T)$ such that opCWC (oasCWC) as defined in (4) holds.
Proof.
Consider the stochastic process $X$ from CWCM. The first step of the proof is to show that CAT and CWCM imply that there exists a version of $X$ with the following properties:
- a)
$X$ is a Borel measurable and tight mapping into $\ell^\infty(T)$;
- b)
the sample paths $t \mapsto X(t)$ are uniformly $\rho$-continuous with probability $1$.
To this aim note that by the characterization of CAT given in (ii) of Theorem 1 there must exist a semimetric $\rho$ for which $T$ is totally bounded and for which $\{X_N\}$ is conditionally AEC. Now, consider a countable and dense (w.r.t. $\rho$) subset $T_\infty$ of $T$ and note that the random variable
[TABLE]
is measurable. Conclude that
[TABLE]
is a conditional probability in the proper sense and use this fact along with (9) in order to show that for every $\varepsilon, \eta > 0$ there exists a $\delta > 0$ such that
[TABLE]
i.e. that the sequence of restrictions $\{X_N|_{T_\infty}\}$ is asymptotically uniformly $\rho$-equicontinuous in probability as defined on page 37 in [17]. By Theorem 1.5.7 on the same page in [17] it follows that $\{X_N|_{T_\infty}\}$ is asymptotically tight in $\ell^\infty(T_\infty)$, and since CWCM implies unconditional convergence of the marginal distributions of $\{X_N|_{T_\infty}\}$, it follows by Theorem 1.5.4 on page 35 in [17] that there exists a Borel measurable and tight mapping from some probability space into $\ell^\infty(T_\infty)$, call it $X_\infty$, such that $X_N|_{T_\infty} \rightsquigarrow X_\infty$ in $\ell^\infty(T_\infty)$. Moreover, by Addendum 1.5.8 on page 37 in [17] the sample paths of $X_\infty$ are uniformly $\rho$-continuous with probability $1$. Let $\phi: \ell^\infty(T_\infty) \to \ell^\infty(T)$ be the mapping which carries the uniformly $\rho$-continuous functions in $\ell^\infty(T_\infty)$ to their uniformly $\rho$-continuous extensions in $\ell^\infty(T)$, and which transforms all other functions in $\ell^\infty(T_\infty)$ into the zero function in $\ell^\infty(T)$. Then the mapping $\phi$ is certainly measurable, and its restriction to the set of uniformly $\rho$-continuous functions, which contains the sample paths of $X_\infty$ with probability $1$, is an isometry and hence continuous. It follows that $\tilde{X} := \phi(X_\infty)$ is a Borel measurable and tight mapping into $\ell^\infty(T)$ whose sample paths are uniformly $\rho$-continuous with probability $1$. In order to prove a) and b) it now suffices to show that the finite-dimensional distributions of $\tilde{X}$ are the same as those of the limit process $X$ from CWCM, i.e. that every finite-dimensional vector $(t_1, \ldots, t_k) \in T^k$ satisfies the condition
[TABLE]
where $C_b(\mathbb{R}^k)$ is the set of all continuous and bounded functions $g: \mathbb{R}^k \to \mathbb{R}$. If all the components of $(t_1, \ldots, t_k)$ are elements of $T_\infty$, then (15) follows directly from CWCM and the definition of $\tilde{X}$. Otherwise, if some or all of the components of $(t_1, \ldots, t_k)$ are elements of $T$ but not of $T_\infty$, then there exists a sequence $\{(t_{1m}, \ldots, t_{km})\}_{m}$ in $T_\infty^k$ such that $\max_j \rho(t_{jm}, t_j) \to 0$. Since the sample paths of $\tilde{X}$ are uniformly $\rho$-continuous with probability $1$, it follows that $(\tilde{X}(t_{1m}), \ldots, \tilde{X}(t_{km}))$ converges almost surely, and hence in distribution, to $(\tilde{X}(t_1), \ldots, \tilde{X}(t_k))$. Now, enlarge $T_\infty$ by adding the points $t_1, \ldots, t_k$ and define a process $\tilde{X}'$ in terms of the enlarged countable set in the same way as $\tilde{X}$ has been defined above in terms of $T_\infty$. Then, as before, it follows by CWCM that (15) holds with $\tilde{X}'$ in place of $\tilde{X}$, and since the sample paths of $\tilde{X}'$ are uniformly $\rho$-continuous with probability $1$, $(\tilde{X}'(t_{1m}), \ldots, \tilde{X}'(t_{km}))$ converges in distribution to $(\tilde{X}'(t_1), \ldots, \tilde{X}'(t_k))$. Since for every $m$ the three random vectors $(\tilde{X}(t_{jm}))_{j}$, $(\tilde{X}'(t_{jm}))_{j}$ and $(X(t_{jm}))_{j}$ all have the same distribution, and since the distributions of $(\tilde{X}'(t_j))_{j}$ and $(X(t_j))_{j}$ are the same as well, this implies
[TABLE]
and hence that condition (15) holds also when some or all of the components of $(t_1, \ldots, t_k)$ are in $T$ but not in $T_\infty$. This shows that the marginal distributions of $\tilde{X}$ are the same as those of the stochastic process $X$ and hence that there exists a version of the latter process which satisfies a) and b).
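Next, it will be shown that if $X$ is a version of the limit process which satisfies a) and b), then opCWC (oasCWC) as defined in (4) holds (this part of the proof is essentially the same as the proof of Theorem 2.9.6 on page 182 in [17]). To this aim define for each $\delta > 0$ a corresponding set $T_\delta \subset T$ which contains the centers of a collection of open balls of $\rho$-radius $\delta$ which cover $T$. Since $T$ is totally bounded w.r.t. $\rho$, each $T_\delta$ can be chosen to be finite. Then define for each fixed $\delta$ a mapping $\Pi_\delta: T \to T_\delta$ which maps $t$ to the element of $T_\delta$ which is closest to $t$. If more than one element of $T_\delta$ minimizes the $\rho$-distance to $t$, $\Pi_\delta(t)$ can be defined to be any such element. Since the sample paths of $X$ are uniformly $\rho$-continuous with probability $1$, it follows that $\|X \circ \Pi_\delta - X\|_T \to 0$ almost surely as $\delta \downarrow 0$ and hence that
[TABLE]
Next, it will be shown that
[TABLE]
To this aim, define for each $\delta > 0$ the mapping $\pi_\delta: \ell^\infty(T_\delta) \to \ell^\infty(T)$ by $\pi_\delta(z) := z \circ \Pi_\delta$ and note that $\pi_\delta$ transforms a function $z \in \ell^\infty(T_\delta)$ into a function $\pi_\delta(z) \in \ell^\infty(T)$ by extending the domain from $T_\delta$ to $T$: for $t \in T_\delta$ the new function remains the same (in fact, $\Pi_\delta(t) = t$ for $t \in T_\delta$), and the new function is constant on each level set of $\Pi_\delta$ (since $T_\delta$ is finite, there is only a finite number of such level sets and the range of the new function must therefore be finite as well). Then, for $h \in BL_1$ and an arbitrary $T$-indexed stochastic process $W$ it follows that $h(W \circ \Pi_\delta) = (h \circ \pi_\delta)(W|_{T_\delta})$. Moreover, if $z_1, z_2 \in \ell^\infty(T_\delta)$, then
[TABLE]
and the composition $h \circ \pi_\delta$ is therefore a member of $BL_1(\ell^\infty(T_\delta))$, i.e. of the set of all functions $g: \ell^\infty(T_\delta) \to [-1, 1]$ such that $|g(z_1) - g(z_2)| \le \|z_1 - z_2\|_{T_\delta}$ for every $z_1, z_2 \in \ell^\infty(T_\delta)$. It follows that the supremum on the left side in (16) is bounded by
[TABLE]
which, by Theorem 3, is measurable and goes to zero in probability (almost surely). This proves (16).
Finally, in order to complete the proof it remains to show that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
[TABLE]
for some sequence of random variables $\{\eta_N\}$ which goes to zero in probability (almost surely). To this aim note that the left side in the last display is bounded by
[TABLE]
and that by (ii) of Theorem 1 there must exist a $\delta > 0$ such that
[TABLE]
for some sequence of non-negative random variables $\{\eta_N\}$ which goes to zero in probability (almost surely). The proof of the theorem is now complete. ∎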
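The nearest-centre mapping from $T$ onto a finite set of centres, and the induced discretization of sample paths used in the second part of the proof, are easy to mimic on a grid. In the sketch below (hypothetical names; $T = [0, 1]$ with $\rho(s, t) = |s - t|$), `nearest_center` sends each point to a closest centre, with Python's `min` keeping the first centre on ties. `discretize` replaces a sample path by a path that is constant on the level sets of that mapping. For a uniformly continuous path, the sup distance between the two is controlled by the path's modulus of continuity.

```python
def nearest_center(t, centers, rho):
    """A centre minimising rho(t, c); min() keeps the first one on ties."""
    return min(centers, key=lambda c: rho(t, c))


def discretize(path, centers, rho):
    """Return t -> path(nearest_center(t)), constant on each level set."""
    return lambda t: path(nearest_center(t, centers, rho))


def dist(s, t):
    return abs(s - t)


centers = [0.0, 0.5, 1.0]  # every t in [0, 1] is within 0.25 of a centre
def path(t):
    return t * t           # a uniformly continuous sample path


coarse = discretize(path, centers, dist)
grid = [i / 100 for i in range(101)]
print(max(abs(path(t) - coarse(t)) for t in grid))  # below 0.5 = 2 * 0.25
```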
Corollary 1**.**
Assume that $\{X_N\}$ satisfies CWCM for some stochastic process $X$, and assume that there exists a semimetric $\rho$ for which $T$ is totally bounded and for which $\{X_N\}$ is conditionally AEC. Then it follows that
- (i)
there exists a version of $X$ which is a Borel measurable and tight mapping from some probability space into $\ell^\infty(T)$ such that opCWC (oasCWC) as defined in (4) holds;
- (ii)
the sample paths of $X$ are uniformly $\rho$-continuous with probability $1$.
Proof.
It is easy to see that CWCM implies that the marginals of the sequence are CAT as required by (ii) in the statement of Theorem 1. Thus, it follows from Theorem 1 that is CAT and the above proof of Theorem 4 yields the two conclusions of the corollary. ∎
Remark 2.
As already pointed out in Section 2, conditional weak convergence is apparently not strong enough to imply asymptotic measurability of which is a necessary condition for unconditional weak convergence. However, if for every the corresponding supremum
[TABLE]
is measurable, then it will certainly be true that conditional weak convergence is stronger than unconditional weak convergence. In fact, if the suprema in the above display are measurable, then it follows that the probabilities on the left sides in (9) and in (10) are conditional probabilities in the proper sense and hence that CAT implies asymptotic tightness in the usual unconditional sense. Since conditional weak convergence implies CWCM and since CWCM is certainly stronger than unconditional convergence of the marginal distributions, it follows by Theorem 1.5.4 on page 35 in [17] that conditional weak convergence implies unconditional weak convergence in this case.
Theorem 5.
Let be a sequence of mappings from the probability space (2) into and assume that each depends on the sample points only through and . Moreover, assume that in with a Borel measurable and tight mapping into , and assume that opCWC as defined in (4) holds. Then it follows that
[TABLE]
with and independent.
Proof.
The proof is the same as the proof of Corollary 3.2 in [11]. ∎
4 Conditional Poisson sampling (or rejective sampling)
This section reviews some theoretical results about rejective sampling. The basic theory for this sampling design was developed by [6]. Some of the results contained in his paper will be needed in the next section. The relevant ones will be singled out in what follows.
Recall that a rejective sampling design is a conditional Poisson sampling design where the final sample is rejected unless its size equals a given natural number . Equivalently, rejective sampling can also be defined as random sampling with replacement of a fixed number of units according to specified selection probabilities where the final sample is rejected and the sampling procedure is repeated until a sample of different units is obtained. Rejective sampling is of great interest to researchers and practitioners because it provides the largest possible entropy subject to the constraints of a fixed sample size and given first order sample inclusion probabilities (see Theorem 3.4 in Hájek, 1981). By the definition of rejective sampling as conditional Poisson sampling it follows that the pdf of the vector of sample inclusion indicators is given by
[TABLE]
where is the vector of first order sample inclusion probabilities of the underlying Poisson sampling design, and where is the set of all possible realizations of the vector of sample inclusion indicators that give rise to samples of size . Of course, the definition of rejective sampling, and hence of , can also be extended to the case where some or even all of the ’s are 0 or 1. However, if for some , then the corresponding population unit will be excluded from the sample with probability and unbiased estimation of population characteristics is hence impossible. Since the Horvitz-Thompson estimator is not well-defined in this case, we shall henceforth consider only rejective sampling designs such that for every . On the other hand, if for some , then the corresponding population unit will be included in the sample with probability and rejective sampling with sample size from a population of size will be equivalent to rejective sampling with sample size from a population of size . In this case the pdf of the vector of sample inclusion indicators will still be defined as in (17) provided that is at least as large as the number of population units for which . For smaller values of a corresponding rejective sampling design obviously does not exist.
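The conditional Poisson definition above translates directly into a rejection sampler, and for a small population the pdf in (17) can be computed by enumerating all samples of size n. The Python sketch below is an illustration only: the Poisson probabilities are hypothetical values in (0, 1), and `rejective_sample` and `rejective_pdf` are our own helper names. It checks that Monte Carlo frequencies from the rejection procedure agree with the exact renormalized Poisson probabilities.

```python
import random
from collections import Counter
from itertools import combinations
from math import prod
from typing import Dict, Sequence, Tuple

Sample = Tuple[int, ...]


def rejective_sample(p: Sequence[float], n: int, rng: random.Random) -> Sample:
    """Draw Poisson samples (unit i included independently w.p. p[i]) and
    reject them until the realized sample size equals n."""
    while True:
        s = tuple(i for i, pi in enumerate(p) if rng.random() < pi)
        if len(s) == n:
            return s


def rejective_pdf(p: Sequence[float], n: int) -> Dict[Sample, float]:
    """Exact design: the Poisson probability of each size-n sample,
    renormalized over all samples of size n."""
    weights = {}
    for s in combinations(range(len(p)), n):
        inside = set(s)
        weights[s] = prod(pi if i in inside else 1 - pi for i, pi in enumerate(p))
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


p = [0.2, 0.3, 0.4, 0.5, 0.6]  # hypothetical Poisson probabilities, sum = 2
n = 2
exact = rejective_pdf(p, n)
rng = random.Random(12345)
draws = 20_000
counts = Counter(rejective_sample(p, n, rng) for _ in range(draws))
largest_error = max(abs(counts[s] / draws - prob) for s, prob in exact.items())
```

The rejection loop is simple, but its acceptance rate equals the Poisson probability that the sample size is exactly n, so it can be slow when that probability is small. The enumeration also makes visible that only the odds p_i/(1 − p_i) matter up to a common factor, which is the invariance discussed next.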
Note that in general there are infinitely many underlying Poisson sampling designs which give rise to the same rejective sampling design. In fact, if the vector of first order inclusion probabilities corresponding to the underlying Poisson sampling design is changed to in such a way that for some fixed constant
[TABLE]
then for every sample . The underlying Poisson sampling design is called canonical if its first order sample inclusion probabilities are chosen so that , i.e. so that the expected sample size of the underlying Poisson sampling design equals the fixed sample size of the rejective sampling plan. Of course, the first order sample inclusion probabilities corresponding to a rejective sampling design are in general different from those corresponding to any of the underlying Poisson sampling designs. However, Hájek (1964) showed that, in some asymptotic sense, they are uniformly close to those corresponding to the underlying canonical Poisson sampling design (see Theorem 5.1 in Hájek’s paper). In fact, Hájek proved the following result:
Result 1.
Let be the vector of first order sample inclusion probabilities for a rejective sampling design and let be the vector of first order sample inclusion probabilities of the corresponding canonical Poisson sampling design. Then it follows that
- (i)
[TABLE]
(note that is the variance of the random sample size corresponding to the canonical Poisson sampling design);
- (ii)
[TABLE]
- (iii)
[TABLE]
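For small populations the first order inclusion probabilities of a rejective design can be computed exactly, which makes both the invariance under rescaling of the odds and the closeness asserted in Result 1 easy to observe. The Python sketch below is illustrative and rests on our own choices: it uses the standard identity π_i = w_i e_{n−1}(w without i) / e_n(w), where w_i = p_i/(1 − p_i) and e_k denotes the k-th elementary symmetric polynomial; the probability values and the helper names (`cps_inclusion_probs`, `rescale_odds`, `max_gap`) are hypothetical.

```python
from typing import List, Sequence


def elementary_symmetric(w: Sequence[float], k: int) -> List[float]:
    """Return [e_0(w), ..., e_k(w)] using the standard one-pass recursion."""
    e = [1.0] + [0.0] * k
    for x in w:
        for j in range(k, 0, -1):  # run downwards so each x enters a product once
            e[j] += x * e[j - 1]
    return e


def cps_inclusion_probs(p: Sequence[float], n: int) -> List[float]:
    """First order inclusion probabilities of the size-n rejective design built
    on Poisson probabilities p: P(s) is proportional to the product of the odds
    w_i = p_i / (1 - p_i) over i in s, so pi_i = w_i e_{n-1}(w without i) / e_n(w)."""
    w = [pi / (1 - pi) for pi in p]
    e_n = elementary_symmetric(w, n)[n]
    return [w[i] * elementary_symmetric(w[:i] + w[i + 1:], n - 1)[n - 1] / e_n
            for i in range(len(w))]


def rescale_odds(p: Sequence[float], c: float) -> List[float]:
    """Poisson probabilities whose odds are c times the odds of p."""
    odds = [pi / (1 - pi) for pi in p]
    return [c * o / (1 + c * o) for o in odds]


def max_gap(block: Sequence[float], copies: int) -> float:
    """max_i |pi_i - p_i| when p repeats `block` `copies` times and n = sum(p),
    i.e. p is canonical for the rejective design of size n."""
    p = list(block) * copies
    n = round(sum(p))
    return max(abs(a - b) for a, b in zip(cps_inclusion_probs(p, n), p))


block = [0.2, 0.4, 0.6, 0.8]  # hypothetical canonical probabilities, sum = 2
pis = cps_inclusion_probs(block, 2)
pis_rescaled = cps_inclusion_probs(rescale_odds(block, 3.0), 2)
gap_small, gap_large = max_gap(block, 1), max_gap(block, 10)
```

In this toy example the gap between rejective and canonical Poisson inclusion probabilities shrinks as the population (and hence the variance of the Poisson sample size) grows, in line with Result 1; the code does not attempt to reproduce the rates stated there.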
Actually, Hájek did not use the double subscript notation to indicate the ’s and the ’s. However, it is easily checked that all the proofs given in his paper are actually meant for sequences of rejective sampling designs and corresponding canonical Poisson sampling designs where all the first order sample inclusion probabilities can be redefined as the population size increases. With respect to Result 1 it is worth noting that for every properly scaled vector of first order sample inclusion probabilities there exists a corresponding rejective sampling design. In other words, for every such that for some there exists a corresponding vector such that
[TABLE]
This result has been shown by [5] but it can also be viewed as a consequence of a well-known theorem about exponential families (see Theorem 5 on page 67 in [15]). Fast algorithms to recover the vector corresponding to a given vector and vice versa can be found in [3].
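To make the inverse problem concrete, the following Python sketch recovers canonical Poisson probabilities from given rejective inclusion probabilities by the simple fixed-point iteration p ← p + (π* − π(p)). This is a swapped-in illustration, not necessarily one of the algorithms in [3]; the target vector and all helper names are hypothetical. Because the rejective inclusion probabilities always sum to n, the iterates keep the sum of p equal to n, so the limit (when the iteration converges) is the canonical vector.

```python
from typing import List, Sequence


def elementary_symmetric(w: Sequence[float], k: int) -> List[float]:
    """Return [e_0(w), ..., e_k(w)] using the standard one-pass recursion."""
    e = [1.0] + [0.0] * k
    for x in w:
        for j in range(k, 0, -1):
            e[j] += x * e[j - 1]
    return e


def cps_inclusion_probs(p: Sequence[float], n: int) -> List[float]:
    """Rejective inclusion probabilities pi_i = w_i e_{n-1}(w without i) / e_n(w)."""
    w = [pi / (1 - pi) for pi in p]
    e_n = elementary_symmetric(w, n)[n]
    return [w[i] * elementary_symmetric(w[:i] + w[i + 1:], n - 1)[n - 1] / e_n
            for i in range(len(w))]


def canonical_from_cps(target: Sequence[float], n: int, tol: float = 1e-10,
                       max_iter: int = 1000) -> List[float]:
    """Fixed-point iteration p <- p + (target - pi(p)), started at p = target.
    Assumes sum(target) == n; the iterates then keep sum(p) == n."""
    p = list(target)
    for _ in range(max_iter):
        if not all(0 < q < 1 for q in p):
            raise ValueError("iterate left (0, 1); a damped update may be needed")
        pis = cps_inclusion_probs(p, n)
        if max(abs(t - q) for t, q in zip(target, pis)) < tol:
            return p
        p = [q + (t - r) for q, t, r in zip(p, target, pis)]
    raise RuntimeError("no convergence within max_iter iterations")


target = [0.2, 0.4, 0.6, 0.8] * 3  # hypothetical rejective inclusion probabilities, sum = 6
p_canonical = canonical_from_cps(target, 6)
```

The undamped update works well here because the rejective and canonical probabilities are close (Result 1); for very small populations or extreme probabilities a damped or Newton-type update may be preferable.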
Having shown Result 1 and some similar approximation results (also for second order sample inclusion probabilities), [6] moves on to show asymptotic normality for the sequence of Horvitz-Thompson estimators. To this aim, he introduces a new sampling design, call it , which approximates the canonical Poisson sampling design associated to the rejective sampling design of interest. The sampling design can be implemented in three steps according to the following procedure. At the first step a sample of size is selected using the rejective sampling design of interest. As before, let denote the vector of first order sample inclusion probabilities of the corresponding canonical Poisson sampling design. Then, independently of the outcome of the first step, a new random experiment is performed to ascertain the sample size of a Poisson sampling design with sample inclusion probabilities given by . If , then the final sample according to will be the rejective sample obtained at the first step. However, if , rejective sampling is used again to select a sample of size from the population units that were not included in the first rejective sample, and this new sample is added to the sample obtained at the first step in order to obtain the final sample according to . The first order sample inclusion probabilities of the canonical Poisson sampling design that underlies the second rejective sampling plan are proportional to the -values of the population units that are not included in the first rejective sample. On the other hand, if , rejective sampling is used again to select a sample of size from the population units that were already included in the rejective sample obtained at the first step, and the final sample according to is obtained by removing the units that are included in the second rejective sample from those which were already included in the first one.
In this case, the first order sample inclusion probabilities of the underlying canonical Poisson sampling design will be proportional to the values of corresponding to the population units that were already included in the first rejective sample.
Now, to give a formal statement of the sense in which the sampling design provides an approximation to the canonical Poisson sampling design which underlies the rejective sample plan of interest, it will be convenient to introduce some notation. So, let be the vector of sample inclusion indicators that describe the outcome of the rejective sampling design that is used at the first step of the experiment (i.e. the rejective sampling design of interest), and let denote the vector of sample inclusion indicators that identify the final sample according to . Moreover, denote their joint pdf by , and the marginal pdfs corresponding to and by and , respectively. Finally, let be the pdf of the vector of sample inclusion indicators corresponding to the Poisson sampling design with first order inclusion probabilities given by the components of (i.e. the canonical Poisson sampling design which underlies the rejective sampling design of interest). With this notation, the approximation result contained in Lemma 4.3 of Hájek’s paper can now be stated as follows:
Result 2.
The total variation distance
[TABLE]
converges to zero as .
Based on Result 1 and Result 2, Hájek [6] proved asymptotic normality of the Horvitz-Thompson estimators corresponding to a sequence of rejective sampling designs as follows. First, he considered the sequence of underlying canonical Poisson sampling designs and showed asymptotic normality for a corresponding sequence of auxiliary statistics . As pointed out by [4] and by [1], the auxiliary statistic can be viewed as the residual of the projection of the Horvitz-Thompson estimator corresponding to the canonical Poisson sampling design on the random sample size from the latter design. Thus, if the goal is to estimate for some given real-valued function , then the corresponding auxiliary statistic can be written as
[TABLE]
where is the vector of sample inclusion indicators for the canonical Poisson sampling design, and where
[TABLE]
Note that the design expectation of coincides with , and that the design variance of must be smaller than that of unless . However, is not an estimator because it depends on the unknown value of . At this point, having proved asymptotic normality for the sequence , Hájek deduces asymptotic normality for the corresponding sequence by using Result 2. From this he gets asymptotic normality for the sequence by proving the following result (note that because ):
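The two properties just stated (same design expectation, smaller design variance) can be checked exactly on a toy population by enumerating all Poisson samples. The displayed formula for the auxiliary statistic is not legible in this copy, so the Python sketch below uses the standard least-squares residual HT − β(N_s − n), with β = Cov(HT, N_s)/Var(N_s), which matches the verbal description of a residual of the projection on the random sample size; the notation, the data and the helper names are our own assumptions.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, Sequence, Tuple


def poisson_samples(p: Sequence[float]) -> Iterator[Tuple[Tuple[int, ...], float]]:
    """Enumerate every vector of inclusion indicators with its Poisson probability."""
    for s in product((0, 1), repeat=len(p)):
        yield s, prod(pi if si else 1 - pi for pi, si in zip(p, s))


def projection_coefficient(y: Sequence[float], p: Sequence[float]) -> Tuple[float, float]:
    """Return (beta, d): beta = Cov(HT, size) / Var(size), d = Var(size)."""
    d = sum(pi * (1 - pi) for pi in p)
    beta = sum(yi * (1 - pi) for yi, pi in zip(y, p)) / d
    return beta, d


def ht_and_residual_moments(y: Sequence[float],
                            p: Sequence[float]) -> Dict[str, Tuple[float, float]]:
    """Exact design mean and variance of the Horvitz-Thompson estimator and of
    its residual after projecting on the random sample size."""
    beta, _ = projection_coefficient(y, p)
    n = sum(p)  # expected sample size of the Poisson design
    m1 = {"ht": 0.0, "residual": 0.0}
    m2 = {"ht": 0.0, "residual": 0.0}
    for s, prob in poisson_samples(p):
        ht = sum(si * yi / pi for si, yi, pi in zip(s, y, p))
        values = {"ht": ht, "residual": ht - beta * (sum(s) - n)}
        for key, v in values.items():
            m1[key] += prob * v
            m2[key] += prob * v * v
    return {key: (m1[key], m2[key] - m1[key] ** 2) for key in m1}


y = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical study variable, total = 15
p = [0.2, 0.4, 0.5, 0.6, 0.3]  # hypothetical canonical probabilities, sum = 2
moments = ht_and_residual_moments(y, p)
beta, d = projection_coefficient(y, p)
```

The variance reduction equals β²d exactly, since Var(HT − βN_s) = Var(HT) − 2βCov(HT, N_s) + β²Var(N_s) and Cov(HT, N_s) = βd; it vanishes only when β = 0.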
Result 3.
Let
[TABLE]
Then,
[TABLE]
where the expectation refers to .
Actually, in his paper [6], Hájek did not single out Result 3 in a dedicated lemma or theorem, but he proved it in the course of the proof of his Theorem 7.1 which establishes asymptotic normality for the sequence . Note that the statement of the latter theorem actually considers only the case where the ’s are proportional to some size variable, but that the proof of Result 3 given in Hájek’s paper goes through for any sequence of vectors such that . From Result 3 one can finally deduce asymptotic normality for the Horvitz-Thompson estimators corresponding to a sequence of rejective sampling designs by using Result 1.
5 Weak convergence theorems for CPS designs
In this section the functional central limit theorems for the rejective sampling case given in [1] will be proven again in somewhat greater generality. Since this requires some assumptions which involve the marginal distribution of the component in , it will be convenient to denote the latter distribution by . As usual in the empirical process literature, the symbol will also be used to indicate an operator on the function class or on related function classes. For example, will also be used to indicate the real-valued function .
The first result of this section settles a measurability issue. It will be used in the rest of this paper without explicitly mentioning it.
Lemma 1.
For let be the sample selection probabilities for a CPS sampling design, and let be the sample selection probabilities for the corresponding canonical Poisson sampling design. Moreover, let be the vector of first order sample inclusion probabilities for the CPS design and let be the vector of first order sample inclusion probabilities for the canonical Poisson sampling design. The following statements are equivalent:
- (i)
The CPS design is measurable, i.e. for every the corresponding function is a measurable function from into ;
- (ii)
the vector is measurable, i.e. the function is a measurable function from into ;
- (iii)
the vector is measurable, i.e. the function is a measurable function from into ;
- (iv)
The canonical Poisson sampling design is measurable, i.e. for every the corresponding function is a measurable function from into ;
Proof.
The proofs of the implications (i)(ii), (iii)(iv) and (iv)(i) are easy, and the implication (ii)(iii) follows from Theorem 5 on page 67 in [15], which is a special case of a well-known result about exponential families (see, for example, [2]). ∎
The next lemma establishes conditional convergence of the marginal distributions of the sequence of HTEPs.
Lemma 2 (CWCM).
Let be the sequence of vectors of sample inclusion indicators corresponding to a sequence of measurable CPS designs and let be the sequence of vectors of first order sample inclusion probabilities of the corresponding sequence of canonical Poisson sampling designs. Let be a class of measurable functions and let be the sequence of HTEPs corresponding to and .
Assume that:
- A0)
the sequence is such that
[TABLE]
- A1)
there exists a function such that
[TABLE]
for every ,
- A2)
for every finite-dimensional vector and for every
[TABLE]
where is the Euclidean norm on , and .
Then it follows that the function is a positive semidefinite covariance function, and for every finite-dimensional vector and for every
[TABLE]
where , and where is the covariance matrix whose elements are given by , .
Proof.
Assume WLOG that and the two sequences and of Section 4 are defined in such a way that the sequence of pdfs corresponding to is given by , and such that the sequence of joint pdfs corresponding to is given by . This can be done in many ways by defining each vector , and as a measurable function of and a single uniform- random variable as described in Section 2. In what follows the sequence of joint pdfs corresponding to will not be relevant.
Now, consider first the sequence of stochastic processes with defined as
[TABLE]
Note that for every , and that the left side of the display in condition A1 is the sequence of covariances . Now, for consider the triangular array of rowwise conditionally independent random vectors
[TABLE]
[TABLE]
Observe that the random vector can be written as
[TABLE]
Using the fact that along with condition A2 it is not difficult to show that the Lindeberg condition
[TABLE]
must be satisfied whenever and are such that . Therefore it follows that
[TABLE]
Next, consider the sequence of stochastic processes with defined in the same way as but with in place of . Use assumption A0 along with Result 2 in Section 4 to show that
[TABLE]
Note that this does not require to know the joint distributions of the vectors and .
Third, consider the sequence of stochastic processes with defined in the same way as but with in place of . Use Result 3 in Section 4 to conclude that
[TABLE]
as well.
Finally, note that the definition of coincides with the one of except for the fact that the former contains the first order sample inclusion probabilities corresponding to in place of those corresponding to , i.e. contains in place of . However, this problem can be easily fixed by using Result 1. ∎
Remark 3.
If each vector and every are measurable (in their respective senses), then condition A2 will be certainly satisfied if for every and
- A2∗)
there exists a constant such that with probability tending to (eventually almost surely).
Now, Lemma 2 provides sufficient conditions for convergence of the finite-dimensional marginal distributions of the sequence of HTEPs, but in order to establish (conditional) weak convergence in for infinite function classes it must still be shown that the sequence of HTEPs is (conditionally) asymptotically tight in (for unconditional weak convergence this follows from Theorem 1.5.4 on page 35 in [17], while for conditional weak convergence this follows from Theorem 4 in Section 3). By Theorem 1.5.7 on page 37 in [17] (Theorem 1 in Section 3) this can be done by showing that there exists a semimetric for which is totally bounded and for which the HTEP sequence is (conditionally) asymptotically equicontinuous (henceforth AEC). In this paper the choice of the semimetric will depend on the definition of the first order sample inclusion probabilities. In the next subsection it will be seen that if the first order sample inclusion probabilities are bounded away from zero, it is convenient to consider the -semimetric
[TABLE]
The subsequent subsection will then treat the case where the first order sample inclusion probabilities are proportional to some size variable which might take on arbitrarily small values. For that case another semimetric will be used.
5.1 CPS designs with a positive lower bound on the first order sample inclusion probabilities
The next lemma provides sufficient conditions which make sure that is totally bounded w.r.t. the semimetric and that the sequence of HTEPs is conditionally AEC w.r.t. .
Lemma 3 (Total boundedness and conditional AEC).
Let and be defined as in Lemma 2, let be a class of measurable functions and let be the sequence of HTEPs corresponding to and . Assume that condition A2∗ holds and that assumptions
- GC)
is an outer almost sure -Glivenko-Cantelli class;
- F1)
has an envelope function such that and such that the uniform entropy condition
[TABLE]
holds. In the last display the supremum is taken over all finitely discrete probability measures on such that .
Then it follows that
- (i)
is totally bounded w.r.t. ;
- (ii)
[TABLE]
where
[TABLE]
Proof.
Part (i) of the conclusion follows from condition F1 (see Problem 2.5.1 on page 133 in [17]).
The proof of part (ii) of the conclusion is essentially the same as the proof of conditional AEC for the rejective sampling case given in [1] (see pages 12-13 in the supplement to that paper) but it corrects a little mistake in the final part of that proof. First it will be shown that for arbitrary the corresponding stochastic processes are, with probability tending to (or eventually almost surely), conditionally subgaussian w.r.t. the empirical semimetric
[TABLE]
i.e. it will be shown that there exists a constant (which does not depend on the sample points and neither on ) such that, with probability tending to (or eventually almost surely),
[TABLE]
To this aim, write
[TABLE]
so that, for every ,
[TABLE]
by Markov’s inequality. Now, note that by Theorem 2.8 in [8] the components of are negatively associated, and hence it follows that
[TABLE]
Since for every and for every , and since
[TABLE]
it follows from Hoeffding’s lemma (see [7]) that
[TABLE]
By assumption A2∗ and Result 1 from Hájek’s paper the right side does not exceed
[TABLE]
with probability tending to one (or eventually almost surely). In combination with (21) and (22) this shows that
[TABLE]
with probability tending to (or eventually almost surely). Combining this inequality with the same inequality for shows that
[TABLE]
with probability tending to (or eventually almost surely). Finally, optimizing the right side w.r.t. yields the subgaussian inequality in (20) with .
Now, note that
[TABLE]
and that the uniform entropy condition in assumption F1 implies that for every the square of the corresponding entropy number on the far right must be finite. From this it follows that contains a countable subset (note that this subset depends on ) such that
[TABLE]
As a consequence, the stochastic process is separable in the sense required for an application of Corollary 2.2.8 on page 101 in [17] with respect to the sample design distribution of the process. Since it has already been shown that the subgaussian inequality (20) holds with probability tending to (or eventually almost surely), it follows by the second part of the conclusion of the just cited corollary that there exists a constant (which does not depend on the sample points and neither on ) such that
[TABLE]
with inner probability tending to one (or eventually inner almost surely), where denotes the packing number, i.e. the cardinality of the largest subset of such that for every . Since
[TABLE]
it follows that the right side in (23) is bounded by a constant multiple of
[TABLE]
The proof can now be completed by using assumptions GC and F1 in order to show that the latter integral goes to zero outer almost surely. This can be done as in the proof of Theorem 2.5.2 on page 127 in [17] (see the lines following display (2.5.3) on page 128 in [17]; see also Remark 4 below in order to see that assumption GC can be replaced with a measurability condition). ∎
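The subgaussian step in the proof above combines two generic ingredients: Hoeffding's lemma, which bounds the moment generating function of a bounded zero-mean variable, and the Markov (Chernoff) bound optimized over the free parameter. The Python sketch below checks both numerically for a centered Bernoulli indicator; it does not reproduce the paper's constants, and the function names and numerical values are our own illustrative choices.

```python
import math


def centered_bernoulli_mgf(lam: float, p: float) -> float:
    """E exp(lam * (S - p)) for S ~ Bernoulli(p); S - p lies in [-p, 1 - p]."""
    return p * math.exp(lam * (1 - p)) + (1 - p) * math.exp(-lam * p)


def hoeffding_mgf_bound(lam: float, a: float, b: float) -> float:
    """Hoeffding's lemma: E exp(lam X) <= exp(lam^2 (b - a)^2 / 8) for a
    zero-mean X taking values in [a, b]."""
    return math.exp(lam ** 2 * (b - a) ** 2 / 8)


def chernoff_bound(lam: float, x: float, sigma2: float) -> float:
    """Markov bound exp(-lam x) * exp(lam^2 sigma2 / 2) on P(X > x), valid when
    E exp(lam X) <= exp(lam^2 sigma2 / 2)."""
    return math.exp(-lam * x + lam ** 2 * sigma2 / 2)


def optimized_tail_bound(x: float, sigma2: float) -> float:
    """Minimizing over lam > 0 gives lam = x / sigma2 and exp(-x^2 / (2 sigma2))."""
    return chernoff_bound(x / sigma2, x, sigma2)


probs = [0.05, 0.3, 0.5, 0.9]
lams = [-5.0, -1.0, -0.1, 0.0, 0.1, 1.0, 5.0]
hoeffding_holds = all(
    centered_bernoulli_mgf(lam, p) <= hoeffding_mgf_bound(lam, -p, 1 - p) * (1 + 1e-12)
    for p in probs for lam in lams
)
```

For a sum of such terms, negative association (as used in the proof) allows the moment generating function of the sum to be bounded by the product of the individual ones, and the optimization in `optimized_tail_bound` is what turns the resulting bound into a Gaussian-type tail.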
Remark 4.
In the proof of Theorem 2.5.2 on page 127 in [17] it is shown that condition F1 together with condition
- M1)
is a -measurable class of functions (see Definition 2.3.3 on page 110 in [17]), i.e. the function
[TABLE]
is measurable on the completion of for every and for every
imply condition GC. Moreover, in the proof of Theorem 2.5.2 on page 127 in [17] assumption M1 is used only for this purpose. Thus, the proof of Theorem 2.5.2 on page 127 in [17] does actually show that assumptions F1, GC and
- M2)
for every the corresponding function class is a -measurable class of functions, i.e. the function
[TABLE]
is measurable on the completion of for every and for every
imply that is a -Donsker class.
Remark 5.
It is not difficult to show that condition
- PM)
is a pointwise measurable class of functions, i.e. contains a countable subset such that for every there exists a sequence of functions such that is the pointwise limit of (see Example 2.3.4 on page 110 in [17])
implies condition M1 as well as condition M2.
Remark 6.
The FCLT for the rejective sampling case given in [1] (Theorem 3.2 on page 105 of that paper) imposes neither assumption M1 nor assumption GC. However, there is a mistake in the proof of conditional AEC given in [1]. In fact, inequality (S3) on page 13 in the supplement to [1] is false in general. According to the first inequality in the conclusion of Corollary 2.2.8 on page 101 in [17], which was used by the authors of [1] in order to obtain inequality (S3), the left hand side of inequality (S3) should actually be
[TABLE]
rather than
[TABLE]
with the function class in place of . As a consequence, the proof of conditional AEC given in [1] shows actually that
[TABLE]
which does not imply conditional AEC. In order to obtain conditional AEC, the authors of [1] should have used the second inequality in the conclusion of Corollary 2.2.8 on page 101 in [17] rather than the first one. In this way they would have obtained inequality (23) instead of inequality (S3), and in order to prove that under condition F1 the right side of (23) goes to zero outer almost surely some additional assumption seems to be necessary (cf. the proof of Theorem 2.5.2 on page 127 in [17]).
As already pointed out in [11] (cf. Remark 2), neither conditional AEC w.r.t. a given semimetric nor even the conclusion of Lemma 3 (which is certainly stronger than conditional AEC) seems to be strong enough to imply unconditional AEC, which for the HTEP sequence can be defined as
[TABLE]
where is the product probability measure (cf. the equivalent definition of conditional AEC given in Remark 1). The problem is that for uncountable function classes the random functions , , might be non-measurable and that might therefore be strictly smaller than with positive inner probability. As a consequence, the -probabilities on the left side in (13) might not be conditional probabilities in the proper sense and condition (24) might therefore fail even though condition (13) is satisfied (note that this is consistent with the conjecture that oasCWC does not imply unconditional weak convergence; see Remark 2). To be safe, this paper uses condition PM in order to deduce unconditional AEC from conditional AEC. In fact, condition PM makes sure that the random functions , , are all measurable and that the -probabilities on the left side in (13) are therefore conditional probabilities in the proper sense.
Now, combining the sufficient conditions for CWCM with those for total boundedness and conditional AEC yields the following weak convergence results:
Theorem 6 (Conditional weak convergence).
Let , , and be defined as in Lemma 2. Assume that conditions A0, A1, A2∗, GC and F1 are satisfied. Then it follows that
- (i)
there exists a zero-mean Gaussian process with covariance function given by which is a Borel measurable and tight mapping from some probability space into such that
[TABLE]
- (ii)
the sample paths are uniformly continuous w.r.t. the semimetric with probability .
Proof.
Assumptions A0, A1, A2∗ make sure that CWCM holds for some zero-mean Gaussian limit process with covariance function given by (see Lemma 2 and Remark 3), while assumptions A2∗, GC and F1 imply that is totally bounded w.r.t. and that is conditionally AEC w.r.t. (see Lemma 3). Both conclusions of the theorem now follow from Corollary 1. ∎
Theorem 7 (Unconditional weak convergence).
Let , , and be defined as in Lemma 2. Assume that conditions A0, A1, A2∗, F1 and PM are satisfied. Then it follows that
- (i)
there exists a zero-mean Gaussian process with covariance function given by which is a Borel measurable and tight mapping from some probability space into such that
[TABLE]
- (ii)
the sample paths are uniformly continuous w.r.t. the semimetric with probability .
Proof.
Remark 4 and Remark 5 show that assumption F1 together with assumption PM implies assumption GC. The conditions of the present theorem are therefore stronger than those of Theorem 6, and the conclusion follows from Remark 2 (note that condition PM implies measurability of the suprema in Remark 2). ∎
The following corollary establishes joint weak convergence for the sequence of HTEPs and the classical sequence of -indexed i.i.d. empirical processes given by
[TABLE]
Corollary 2 (Joint weak convergence).
Under the assumptions of Theorem 7 it follows that
[TABLE]
where is defined as in Theorem 6 (or Theorem 7), is the classical -indexed empirical process defined in (25), and where is a Borel measurable and tight -Brownian Bridge which is independent of .
Proof.
The assumptions of Theorem 7 are stronger than those of Theorem 6 (which imply opCWC) and they imply that is a -Donsker class (see Remark 4 and Remark 5). The proof of the corollary follows now from an application of Theorem 5. ∎
5.2 CPS designs with first order sample inclusion probabilities proportional to some size variable which might take on arbitrarily small values
This subsection treats the case where the first order sample inclusion probabilities are proportional to some size variable which can take on values arbitrarily close to zero. Note that this case is not covered by the theorems given in the previous subsection because assumptions A0 and A2∗ imply that the first order sample inclusion probabilities are bounded away from zero with probability tending to (or eventually almost surely) (see Result 1 in Section 4). So, let be a mapping such that can be interpreted as the "size" of the th population unit. Throughout this subsection it will be assumed that the first order sample inclusion probabilities are defined as
[TABLE]
where is a function which makes sure that the expected sample size equals the value taken on by some other integer-valued function (in many applications is simply a deterministic sequence of positive integers), i.e. makes sure that
[TABLE]
It is not difficult to show that the function is well defined, i.e. that for every there exists a unique positive constant such that equation (27) holds. Moreover, under the assumptions
- B0)
is a measurable function and the sequence of expected sample sizes is such that
[TABLE]
- B1)
is a measurable function such that ,
it can also be shown that is measurable and that in probability (almost surely), where is the unique (positive) constant such that
[TABLE]
The details of the proof of the latter claim are left to the reader.
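Equation (26) is not legible in this copy, so the following Python sketch does not reproduce it. It only illustrates, under a loudly hypothetical assumption, why a unique scaling constant exists: suppose the inclusion probabilities have the form h(c·x_i) for a continuous nondecreasing link h with h(0) = 0 and h(u) → 1, positive sizes x_i and 0 < n < N. Then the expected sample size is continuous and nondecreasing in c, so the constant solving the analogue of (27) can be found by bisection. The two links `capped_linear` and `odds_link`, the sizes and the helper `solve_scale` are all hypothetical.

```python
from typing import Callable, Sequence


def solve_scale(x: Sequence[float], n: int, link: Callable[[float], float],
                rel_tol: float = 1e-13, max_iter: int = 200) -> float:
    """Bisection for the c > 0 with sum(link(c * x_i)) == n.

    Hypothetical assumptions: link is continuous and nondecreasing with
    link(0) = 0 and link(u) -> 1, all sizes are positive and 0 < n < len(x)."""
    if not 0 < n < len(x) or min(x) <= 0:
        raise ValueError("need positive sizes and 0 < n < len(x)")

    def expected_size(c: float) -> float:
        return sum(link(c * xi) for xi in x)

    lo, hi = 0.0, 1.0
    while expected_size(hi) < n:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if expected_size(mid) < n:
            lo = mid
        else:
            hi = mid
        if hi - lo <= rel_tol * hi:
            break
    return 0.5 * (lo + hi)


def capped_linear(u: float) -> float:
    return min(1.0, u)


def odds_link(u: float) -> float:
    return u / (1.0 + u)


sizes = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0]  # hypothetical size variable
c_capped = solve_scale(sizes, 3, capped_linear)
c_odds = solve_scale(sizes, 3, odds_link)
```

With the capped link and these sizes, only the largest unit is capped at the solution, so the constant solves 11.5c + 1 = 3, i.e. c = 4/23. Uniqueness holds because the expected size is strictly increasing as long as some c·x_i lies below the cap, which must be the case when n < N.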
Now, in order to obtain weak convergence theorems for the case where the first order sample inclusion probabilities are defined as in (26), it will be convenient to proceed as in Subsection 3.2 of [11] and to place restrictions on the class of functions
[TABLE]
where is the original class of interest, and where
[TABLE]
Note that the domain of the members of the class is the range of the random vectors (which is assumed to be ), and that the value taken on by at a given realization of the random vector is given by .
The following lemma establishes CWCM for the HTEP sequence for the case where the first order sample inclusion probabilities are defined as in (26).
Lemma 4 (CWCM).
Let be the sequence of vectors of first order sample inclusion probabilities for a sequence of CPS designs and let be the sequence of vectors of first order sample inclusion probabilities for the corresponding sequence of canonical Poisson sampling designs. Assume that the components of each vector are defined as in (26) and that conditions B0, B1 and condition
- B2)
the members of are square integrable, i.e. for every
hold. Then it follows that conditions A0, A1, and A2 of Lemma 2 are satisfied and that the covariance function in condition A1 is given by
[TABLE]
with
[TABLE]
Proof.
Define
[TABLE]
and note that
[TABLE]
Next, note that
[TABLE]
and that the right hand side is positive only if lies between and , in which case it is bounded by
[TABLE]
Since this bound does not depend on , and since under assumptions B0 and B1 it goes to zero in probability (almost surely), it follows that
[TABLE]
In combination with (30) this yields
[TABLE]
Using this result it is easily seen that
[TABLE]
Since , the limiting constant in the last line in the last display must be strictly positive unless with probability . However, in the latter case it would follow that which contradicts assumption B0. This proves that the limiting constant on the right side in (33) is positive and hence that assumption A0 holds with in place of . From (i) in Result 1 in Section 4 it follows that assumption A0 in its original form must be satisfied as well. Actually, the previous argument shows more than that. In fact, it shows that under assumptions B0 and B1
[TABLE]
Next, consider assumption A1. Using (32), (34) and assumption B2 it is not difficult to show that assumption A1 is also satisfied with the limiting covariance function defined as in the statement of the lemma (the details of the proof are left to the reader).
Finally, it remains to show that the Lindeberg condition in assumption A2 holds as well. Also this can be easily shown by using (32) and assumption B2 (the details are left to the reader). ∎
Having established sufficient conditions for conditional convergence of the marginal distributions it remains to deal with AEC and total boundedness. The next lemma deals with both issues. As already mentioned above, the underlying semimetric will be different from the -semimetric which was used in the previous subsection. In fact, in the present setting it seems more convenient to use the semimetric
[TABLE]
in place of . Note that can be viewed as the -semimetric on the function class .
**Lemma 5 (Total boundedness and conditional AEC).**
Let be the sequence of vectors of sample inclusion indicators for a sequence of measurable CPS designs, let be a class of measurable functions and let be the sequence of HTEPs corresponding to and . Assume that the first order sample inclusion probabilities corresponding to each vector are defined as in (26) and that conditions B0 and B1 hold. Moreover, assume that conditions
- GC∗) is an outer almost sure -Glivenko-Cantelli class;
- F1∗) has an envelope function such that and such that the uniform entropy condition
[TABLE]
holds, where the supremum is taken over all finitely discrete probability measures on such that [footnote 2: Note the abuse of notation: and should actually be written as and , respectively, with and defined as and for .]
[TABLE]
hold. Then it follows that
- (i) is totally bounded w.r.t. ;
- (ii)
[TABLE]
where .
Proof.
Part (i) of the conclusion follows from condition F1∗ (see Problem 2.5.1 on page 133 in [17]).
The proof of part (ii) of the conclusion is almost the same as the proof of Lemma 3. The first step is to show that for arbitrary the corresponding stochastic processes are, with probability tending to (eventually almost surely), conditionally subgaussian w.r.t. the empirical semimetric
[TABLE]
i.e. to show that there exists a constant (which depends neither on the sample points nor on ) such that, with probability tending to (eventually almost surely),
[TABLE]
(cf. display (20)). To this end, note that the difference can be written as
[TABLE]
where is defined as in (29). Then, note that for ,
[TABLE]
and that with probability tending to (eventually almost surely) the right side in the last inequality is bounded by
[TABLE]
where is a positive constant which depends only on (use (31) along with the fact that assumptions B0 and B1 imply ). Thus, it follows by Hoeffding’s lemma (see [7]) that, with probability tending to (eventually almost surely),
[TABLE]
Now, as in the proof of Lemma 3, use the fact that the components of are negatively associated to conclude that
[TABLE]
with probability tending to (eventually almost surely). Optimizing the right side w.r.t. then yields the subgaussian tail inequality (36) with .
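The last two steps (Hoeffding's lemma, then optimization over the free parameter) follow the standard Chernoff-bound pattern. The sketch below illustrates that pattern generically; it does not reproduce the specific constants in (36), which are not shown here. It assumes a centred random variable Z with E exp(λZ) ≤ exp(λ²v/2) for all λ; Markov's inequality then gives P(Z > x) ≤ exp(λ²v/2 − λx), and the minimizing choice λ* = x/v yields exp(−x²/(2v)). It also checks Hoeffding's lemma numerically for a centred Bernoulli variable, whose range has length 1, so that v = 1/4. The function names are illustrative only.

```python
import math


def chernoff_bound(x, v, lambdas):
    """Minimum over a grid of lam > 0 of exp(lam**2 * v / 2 - lam * x)."""
    return min(math.exp(lam * lam * v / 2 - lam * x) for lam in lambdas)


def subgaussian_tail(x, v):
    """Closed-form optimum: lam* = x / v gives exp(-x**2 / (2 * v))."""
    return math.exp(-x * x / (2 * v))


def bernoulli_log_mgf(lam, p):
    """log E exp(lam * (X - p)) for X ~ Bernoulli(p), a variable with range [0, 1]."""
    return math.log(p * math.exp(lam * (1 - p)) + (1 - p) * math.exp(-lam * p))


if __name__ == "__main__":
    grid = [k / 1000 for k in range(1, 20001)]  # lam in (0, 20]
    # Hoeffding's lemma with b - a = 1: log-mgf <= lam**2 / 8, i.e. v = 1/4
    print(chernoff_bound(1.5, 0.25, grid), subgaussian_tail(1.5, 0.25))
```

The grid search and the closed form agree because the exponent is a convex quadratic in λ, so the grid point λ = x/v is the exact minimizer.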
Next, note that Corollary 2.2.8 on page 101 in [17] also applies in the present case, and conclude that, with probability tending to one (eventually almost surely),
[TABLE]
for some constant (which depends neither on the sample points nor on ). Now note that
[TABLE]
where , so that the integral in the second-to-last display is bounded by a constant multiple of
[TABLE]
The proof can now be completed by using assumptions GC∗ and F1∗ in order to show that this last integral goes to zero outer almost surely. Again, this can be done by the method used in the proof of Theorem 2.5.2 on page 127 in [17] (see the lines following display 2.5.3 on page 128 in [17]; see also Remark 7 to see that assumption GC∗ can be replaced by a measurability condition). ∎
**Remark 7.**
Assume that is any measurable and uniformly bounded function. Then condition F1∗ together with condition
- M1’) is a -measurable class of functions (see Definition 2.3.3 on page 110 in [17]), i.e. the function
[TABLE]
is measurable on the completion of for every and for every
imply condition GC∗ (cf. Remark 4). Of course, condition PM also implies condition M1’.
**Remark 8.**
Assume that is any measurable and uniformly bounded function. Then, the uniform entropy conditions (19) and (35) are equivalent. To prove this claim, define the projections and as in footnote 2, define and let for any probability measure on . Then note that for every measurable and deduce that
[TABLE]
where the supremum on the left side ranges over the set of all finitely discrete probability measures on such that , and where the supremum on the right side ranges over the set of all finitely discrete probability measures on such that .
Next, define for each finitely discrete probability measure on a corresponding finitely discrete measure by setting
[TABLE]
Since this density is strictly positive, it follows that the supports of and must be the same. Moreover, it follows that
[TABLE]
which shows that the mapping is a bijection between the set of all finitely discrete probability measures on and the set of all finitely discrete measures on . Obviously, this bijection satisfies for every . Conclude that
[TABLE]
where the supremum over ranges over the set of all finitely discrete probability measures on such that , and where the supremum over ranges over the set of all finitely discrete measures on such that . Next, note that
[TABLE]
where the supremum over ranges over the set of all finitely discrete measures on such that (see Problem 2.10.5 on page 204 in [17]). Now combine equations (37), (38) and (39) to obtain
[TABLE]
where, by an abuse of notation, the right side can be written as the integrand on the left side in (35). This shows that the uniform entropy integrals in (19) and (35) are actually the same.
**Remark 9.**
Assume that is any measurable and uniformly bounded function. Then condition F1∗ obviously implies condition B2. Moreover, from Remark 8 it follows that condition F1∗ also implies condition F1. Since condition GC∗ is stronger than condition GC, it follows further that conditions F1∗, GC∗ and M2 imply that is a -Donsker class (cf. Remark 4).
**Theorem 8 (Conditional weak convergence).**
Let be the sequence of vectors of sample inclusion indicators corresponding to a sequence of measurable CPS designs, let be a class of measurable functions and let be the sequence of HTEPs corresponding to and . Assume that the first order sample inclusion probabilities corresponding to each vector are defined as in (26) and assume that conditions B0, B1, F1∗ and GC∗ are satisfied. Then it follows that
- (i)
there exists a zero-mean Gaussian process with covariance function given by as defined in (28) (or in assumption A1) which is a Borel measurable and tight mapping from some probability space into such that
[TABLE]
- (ii)
the sample paths are uniformly continuous w.r.t. the semimetric with probability .
Proof.
Assumptions B0 and B1 imply that the function is well defined and that it is measurable and uniformly bounded, and together with assumption F1∗ they also imply condition B2. It therefore follows from Lemma 4 and Lemma 2 that satisfies CWCM for some zero-mean Gaussian limit process with covariance function given by as defined in (28) (or in assumption A1). Moreover, Lemma 5 shows that is totally bounded w.r.t. and that is conditionally AEC w.r.t. . The two conclusions of the theorem now follow from Corollary 1. ∎
**Theorem 9 (Unconditional weak convergence).**
Let , and be defined as in Theorem 8. Assume that the first order sample inclusion probabilities corresponding to each vector are defined as in (26) and assume that conditions B0, B1, F1∗ and PM are satisfied. Then it follows that
- (i)
there exists a zero-mean Gaussian process with covariance function given by as defined in (28) (or in assumption A1) which is a Borel measurable and tight random element of such that
[TABLE]
- (ii)
the sample paths are uniformly continuous w.r.t. the semimetric with probability .
Proof.
Remark 7 shows that conditions F1∗ and PM imply assumption GC∗. The conditions of the present theorem are therefore stronger than those of Theorem 8, and its conclusions therefore follow from Theorem 8 and Remark 2 (note that assumption PM implies that the suprema in Remark 2 are measurable). ∎
**Corollary 3 (Joint weak convergence).**
Under the assumptions of Theorem 9 it follows that
[TABLE]
where is defined as in Theorem 9, is the classical -indexed empirical process defined in (25), and where is a Borel measurable and tight -Brownian Bridge which is independent of .
Proof.
In the proof of Theorem 9 it has already been shown that the assumptions of Theorem 9 are stronger than those of Theorem 8, which imply opCWC. Moreover, from Remark 7, Remark 9 and Remark 5 it follows that is a -Donsker class. The corollary now follows from an application of Theorem 5. ∎
6 Extensions for Hájek empirical processes
This section is very similar to Section 4 in [11]. It extends the weak convergence results for HTEP sequences to the corresponding Hájek empirical processes (henceforth HEP). Given a class of functions , the HEP is defined as
[TABLE]
with the Horvitz-Thompson estimator of the population size . Note that the value taken on by is undefined when . However, this will not be a problem here, since the assumptions in the forthcoming theory will always imply that
[TABLE]
In fact, this condition allows us to consider, in place of the HEP as defined in (40), the closely related empirical process given by
[TABLE]
where is the empirical measure on . To see why, under condition (41), we can consider in place of the HEP, it suffices to observe that
[TABLE]
and that this, together with condition (41), implies that any one of the three weak convergence results in for the sequence carries over immediately to the corresponding sequence of HEPs, and vice versa.
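To make the distinction between the two estimators concrete, the sketch below evaluates them for the indicator functions f_t(y) = 1{y ≤ t}, the case used in the simulation study of Section 7. It assumes the standard textbook forms F̂_HT(t) = N⁻¹ Σ S_i 1{y_i ≤ t}/π_i and F̂_H(t) = N̂⁻¹ Σ S_i 1{y_i ≤ t}/π_i with N̂ = Σ S_i/π_i; display (40) is not reproduced here, so these forms are stated as an assumption. The function name is illustrative.

```python
import numpy as np


def ht_and_hajek_cdf(y, S, pi, t):
    """Horvitz-Thompson and Hajek estimators of the population cdf at points t.

    y: study variable for all N population units; S: 0/1 inclusion indicators;
    pi: first-order inclusion probabilities; t: evaluation points.
    """
    y, S, pi, t = (np.asarray(a, dtype=float) for a in (y, S, pi, t))
    N = y.size
    w = S / pi                      # HT weights, zero for non-sampled units
    N_hat = w.sum()                 # HT estimator of the population size N
    if N_hat == 0:
        raise ValueError("empty sample: the Hajek estimator is undefined")
    # HT estimates of the totals sum_i 1{y_i <= t}, one per evaluation point
    counts = (y[None, :] <= t[:, None]).astype(float) @ w
    return counts / N, counts / N_hat


if __name__ == "__main__":
    ht, hajek = ht_and_hajek_cdf([1, 2, 3, 4], [1, 0, 1, 1], [0.5] * 4, [2.5, 4.0])
    print(ht, hajek)  # [0.5 1.5] [0.33333333 1.        ]
```

In the example the HT estimate at t = 4 exceeds 1, whereas the Hájek estimate equals 1; the Hájek estimator is undefined only when N̂ = 0, which is the case flagged in the text.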
The following lemma establishes conditional convergence of the marginal distributions for the sequence and hence for the corresponding sequence of HEPs as well.
**Lemma 6 (CWCM).**
Let , and be defined as in Lemma 2, let be the sequence of vectors of first order sample inclusion probabilities corresponding to , and let be the sequence of empirical processes defined by (42). Assume that conditions
- C1) contains a constant function which is not identically equal to zero, i.e. a function such that -almost surely for some constant ;
- C2) for every
and conditions A0, A1 and A2 are satisfied. Then the function
[TABLE]
with defined as in assumption A1, is a positive semidefinite covariance function, and for every finite-dimensional and for every
[TABLE]
where is the covariance matrix whose elements are given by .
Proof.
The proof is almost the same as the proof of Lemma 2. Define the sequences of sample inclusion indicators and as in the proof of Lemma 2. Then, define the sequence of stochastic processes by
[TABLE]
Note that for every , and that
[TABLE]
where is defined as in the proof of Lemma 2. Now, it follows from assumptions C1, C2 and A1 that
[TABLE]
where is defined as in (44). This implies that must be positive semidefinite and proves the first part of the conclusion of the lemma.
In order to prove the second part of the conclusion, consider for some given the triangular array of rowwise conditionally independent random vectors
[TABLE]
[TABLE]
where . Observe that the random vector can be written as
[TABLE]
Using the fact that along with condition A2 it is not difficult to show that the Lindeberg condition
[TABLE]
must be satisfied whenever and such that . Therefore it follows that
[TABLE]
Next, consider the sequence of stochastic processes with defined in the same way as but with in place of . Use assumption A0 along with Result 2 in Section 4 to show that
[TABLE]
Note that this does not require knowledge of the joint distributions of the vectors and .
Third, consider the sequence of stochastic processes with defined in the same way as but with in place of . Use Result 3 in Section 4 to conclude that
[TABLE]
as well.
Finally, note that the definition of coincides with the one of except for the fact that the former contains the first order sample inclusion probabilities corresponding to in place of those corresponding to , i.e. contains in place of . However, this problem can be easily fixed by using Result 1. ∎
**Remark 10.**
Assumption A0 implies that condition (41) holds. By (43) it follows therefore that the conditions of Lemma 6 imply also that
[TABLE]
**Remark 11.**
Assumption C2 is certainly satisfied if assumption F1 holds or if is measurable and uniformly bounded and assumption F1∗ holds.
The next two lemmas establish conditional AEC of the sequence for the case where there is a positive lower bound for the ’s and for the case where the ’s are proportional to some size variable which can take on arbitrarily small values, respectively.
**Lemma 7 (Conditional AEC).**
Let be a class of functions which satisfies assumption M2. Then, under the assumptions of Lemma 3, it follows that
[TABLE]
Proof.
First, note that
[TABLE]
where is the HTEP. From this it follows that
[TABLE]
Since by Theorem 2.8 in [8] the ’s are negatively associated, it follows that
[TABLE]
and assumption A2∗ implies that the right side of the latter inequality is bounded in probability (eventually almost surely). To complete the proof of the lemma it therefore remains to show that
[TABLE]
To this aim note that
[TABLE]
and that because assumptions F1, GC and M2 imply that is a -Donsker class (see Remark 4) and hence an outer almost sure -Glivenko-Cantelli class. ∎
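The proof of Lemma 7 uses negative association of the CPS inclusion indicators (Theorem 2.8 in [8]). One consequence of negative association is that all pairwise covariances are nonpositive, and for a very small population this consequence can be checked by brute-force enumeration. The sketch below assumes the usual definition of CPS as a Poisson design with working probabilities p_i conditioned on the sample size being n (rejective sampling in the sense of [6]); the working probabilities are arbitrary illustrative values, and the check covers only pairwise covariances, not negative association in full. It also shows that the first-order inclusion probabilities of a CPS design differ from its working probabilities.

```python
from itertools import combinations

import numpy as np


def cps_moments(p, n):
    """Exact first-order inclusion probabilities and covariances of a CPS design.

    P(s) is proportional to prod_i p_i^{s_i} (1 - p_i)^{1 - s_i} over samples
    s of fixed size n, i.e. a Poisson design conditioned on |s| = n.
    """
    p = np.asarray(p, dtype=float)
    N = p.size
    odds = p / (1.0 - p)  # the factor prod_i (1 - p_i) cancels after normalising
    samples = list(combinations(range(N), n))
    weights = np.array([np.prod(odds[list(s)]) for s in samples])
    probs = weights / weights.sum()
    pi2 = np.zeros((N, N))  # pi2[i, j] = P(S_i = 1, S_j = 1)
    for s, prob in zip(samples, probs):
        idx = np.array(s)
        pi2[np.ix_(idx, idx)] += prob
    pi1 = np.diag(pi2).copy()          # first-order inclusion probabilities
    cov = pi2 - np.outer(pi1, pi1)     # Cov(S_i, S_j)
    return pi1, cov


if __name__ == "__main__":
    p = [0.1, 0.3, 0.5, 0.7, 0.2, 0.6]  # illustrative working probabilities
    pi1, cov = cps_moments(p, n=3)
    off_diag = cov[~np.eye(len(p), dtype=bool)]
    print("first-order inclusion probabilities:", np.round(pi1, 4))
    print("largest off-diagonal covariance:", off_diag.max())
```

Full enumeration costs C(N, n) terms, so this is only a sanity check for toy populations; it is not how CPS samples are drawn in practice.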
**Lemma 8 (Conditional AEC).**
Let be a class of functions which satisfies assumptions C1 and M2. Then, under the assumptions of Lemma 5, it follows that
[TABLE]
Proof.
Follow the steps in the proof of Lemma 7 up to inequality (46) (with in place of ) and note that the right side of that inequality is bounded in probability (eventually almost surely) because assumptions B0 and B1 imply (32) (see the proof of Lemma 4), and assumptions C1 and F1∗ imply . To complete the proof of the lemma it therefore remains to show that
[TABLE]
To this aim note that
[TABLE]
and that because assumptions F1∗, GC∗ and M2 imply that is a -Donsker class (see Remark 9) and hence an outer almost sure -Glivenko-Cantelli class. ∎
Having found sufficient conditions for CWCM and for conditional AEC w.r.t. suitable semimetrics, we are now ready to prove the three desired weak convergence results. Since the sufficient conditions under consideration imply condition (41) (see Remark 10), the weak convergence results for and for the HEP sequence are equivalent. Since only the HEP sequence is of interest in applications, the weak convergence results will be stated only in terms of the latter.
**Theorem 10 (Conditional and unconditional weak convergence).**
Let be a class of functions which satisfies assumption C1 and let be the corresponding sequence of HEPs as defined in (40). Then, under the assumptions of Theorem 7 or the assumptions of Theorem 9 it follows that
- (i)
there exists a zero-mean Gaussian process with covariance function defined as in (44) which is a Borel measurable and tight random element of ;
- (ii)
(conditional weak convergence)
[TABLE]
- (iii)
(unconditional weak convergence)
[TABLE]
Moreover,
- (iv)
under the assumptions of Theorem 7 it follows that the sample paths are uniformly -continuous with probability ;
- (v)
under the assumptions of Theorem 9 it follows that the sample paths are uniformly -continuous with probability .
Proof.
Consider first the assumptions of Theorem 7. Remark 3 and Remark 11 show that the assumptions of Theorem 7 together with assumption C1 imply the assumptions of Lemma 6, and the conclusion of that lemma says that the sequence of auxiliary processes satisfies CWCM for some zero-mean Gaussian limit process with covariance function given by as defined in (44). Next, in the proof of Theorem 7 it has already been shown that the assumptions of that theorem are stronger than those of Lemma 3. Hence, the assumptions of Theorem 7 along with assumption C1 imply the assumptions of Lemma 7 (use the fact that assumption PM is stronger than assumption M2; see Remark 5), whose conclusion implies that is conditionally AEC w.r.t. . Since the first part of the conclusion of Lemma 3 says that is totally bounded w.r.t. , it follows by Corollary 1 that (and hence also the corresponding sequence of HEPs) satisfies part (ii) of the conclusion of the present theorem for some which satisfies the conditions given in parts (i) and (iv). Part (iii) of the conclusion of the theorem now follows from Remark 2 (recall that condition PM implies that the suprema in Remark 2 are measurable).
Now, consider the assumptions of Theorem 9. In the proof of Theorem 9 it has already been shown that its assumptions are stronger than those of Theorem 8, and in the proof of the latter theorem it has been shown that its assumptions imply the conditions of Lemma 4, whose conclusion says that conditions A0, A1 and A2 are satisfied. Use Remark 11 to conclude that the assumptions of Theorem 9 along with assumption C1 imply the assumptions of Lemma 6, whose conclusion says that the sequence of auxiliary processes satisfies CWCM for some zero-mean Gaussian limit process with covariance function given by as defined in (44). Next, recall that in the proof of Theorem 8 it has been shown that its assumptions imply those of Lemma 5, and conclude that the assumptions of Theorem 9 along with condition C1 must therefore imply the assumptions of Lemma 8 (use the fact that assumption PM is stronger than assumption M2; see Remark 5), whose conclusion implies that is conditionally AEC w.r.t. . Since the first part of the conclusion of Lemma 5 says that is totally bounded w.r.t. , it follows by Corollary 1 that the sequence of auxiliary processes (and hence also the corresponding sequence of HEPs) satisfies part (ii) of the conclusion of the present theorem for some which satisfies the conditions given in parts (i) and (v). Again, part (iii) of the conclusion of the theorem now follows from Remark 2 (recall that condition PM implies that the suprema in Remark 2 are measurable). ∎
**Corollary 4 (Joint weak convergence).**
Under the assumptions of Theorem 10 it follows that
- (i)
[TABLE]
where is defined as in the conclusion of Theorem 10, is the classical -indexed empirical process defined in (25), and where is a Borel measurable and tight -Brownian Bridge which is independent of .
Proof.
As already shown in the proof of Corollary 2 (Corollary 3), the assumptions of Theorem 7 (Theorem 9) imply that is a -Donsker class. The conclusion of the present corollary follows therefore from Theorem 10 and Theorem 5. ∎
7 Simulation results
This section is analogous to Section 5 in [11] (see also Appendix S4 in [1]). The numerical results given in this section have been obtained by using the R Statistical Software [13] to repeat times the following steps:
- 1)
Generate a population of independent observations from the linear model , where the ’s are i.i.d. lognormal with and , and where the ’s are independent zero-mean Gaussian random variables with , .
- 2)
Select a sample according to the CPS design with sample size (specified below) and with first order sample inclusion probabilities proportional to the values (this step was performed by using the function "UPmaxentropy" from the R package "sampling" [16]).
- 3)
Compute the Horvitz-Thompson and the Hájek estimators for the population cdf , , and compute the uniform distance between each of those estimators and , i.e. compute and for the case where .
- 4)
Estimate the -quantiles and of the limiting distributions of and , i.e. the -quantiles of the distributions of and . This was done by using Algorithm 5.1 in [10] which was also used in the simulation study in [1]. The details for the implementation of this algorithm are described below.
- 5)
Compute the asymptotic uniform -confidence bands for the population cdf based on the Horvitz-Thompson and the Hájek estimators and verify whether lies within these confidence bands, i.e. verify whether and whether , where and are the estimates of and obtained from step 4. Note that the widths of the two asymptotic uniform -confidence bands for are given by and , respectively.
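Steps 1) to 3) of one replication can be sketched as follows. This is a simplified Python analogue, not the R code used for the study. The model parameters (`mu`, `sigma`, `beta0`, `beta1`, `tau`) are hypothetical placeholders because their values are not restated here, and the errors are taken as homoscedastic for simplicity. The CPS sample is drawn by rejective sampling with the target inclusion probabilities used as working probabilities; since the first-order inclusion probabilities of such a design only approximate its working probabilities, this shortcut is cruder than "UPmaxentropy", which targets prescribed inclusion probabilities. Step 4) is omitted, and the uniform distance is taken over all real t.

```python
import numpy as np


def generate_population(N, rng, mu=0.0, sigma=0.5, beta0=1.0, beta1=1.0, tau=1.0):
    """Linear model y = beta0 + beta1 * x + e; all parameter values are hypothetical."""
    x = rng.lognormal(mean=mu, sigma=sigma, size=N)       # auxiliary size variable
    y = beta0 + beta1 * x + rng.normal(0.0, tau, size=N)  # study variable
    return x, y


def rejective_cps_sample(p, n, rng, max_tries=100_000):
    """Draw Poisson samples with working probabilities p until the size equals n."""
    for _ in range(max_tries):
        S = rng.random(p.size) < p
        if S.sum() == n:
            return S.astype(float)
    raise RuntimeError("no sample of size n drawn; check p and n")


def sup_distances(y, S, pi):
    """Uniform distances of the HT and Hajek cdf estimators from the population cdf.

    All three cdfs are right-continuous step functions that can only jump at
    population values, so the supremum over t is attained at the sorted
    population values (assumes no ties in y).
    """
    order = np.argsort(y)
    w = S[order] / pi[order]            # HT weights, zero for non-sampled units
    N = y.size
    F_pop = np.arange(1, N + 1) / N     # population cdf at the sorted values
    cum = np.cumsum(w)
    F_ht = cum / N
    F_hajek = cum / w.sum()
    return np.abs(F_ht - F_pop).max(), np.abs(F_hajek - F_pop).max()


def one_replication(N, n, rng):
    x, y = generate_population(N, rng)
    pi = n * x / x.sum()                # target inclusion probabilities proportional to x
    if pi.max() >= 1:
        raise ValueError("some pi_i >= 1; capping would be needed")
    # Simplification: pi is used as the vector of working probabilities.
    S = rejective_cps_sample(pi, n, rng)
    return S, sup_distances(y, S, pi)


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    S, (d_ht, d_hajek) = one_replication(N=1000, n=100, rng=rng)
    print(int(S.sum()), d_ht, d_hajek)
```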
The -quantiles of the distributions of and were estimated according to the following procedure (see Algorithm 5.1 in [10]):
- i)
Estimate the covariance matrices and for where correspond to the sampled population units, i.e. are the values of the subscript for which , . The components and , , of the two covariance matrices were estimated as follows (cf. Lemma 4 and Lemma 6):
[TABLE]
with
[TABLE]
and
[TABLE]
with
[TABLE]
- ii)
Compute the Cholesky decompositions of the estimated covariance matrices, i.e. compute two lower triangular matrices and such that and .
- iii)
Generate independently random vectors , , whose components are i.i.d. standard normal random variables and compute the vectors and which can be considered as realizations of the limit processes and , respectively.
- iv)
For each compute the maximum norms and (i.e. the two maxima of the absolute values of the components of and ), put the two vectors and in ascending order, put equal to the -quantile of the first vector, and put equal to the -quantile of the second vector.
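Steps ii) to iv) amount to a Monte Carlo approximation of a quantile of the maximum norm of a Gaussian vector. The sketch below assumes that the covariance matrix from step i) is already available as a NumPy array (the estimator formulas themselves are not reproduced here) and uses NumPy in place of the R code used for the study. The function name and the `jitter` argument are illustrative additions, not part of Algorithm 5.1 in [10] as described above.

```python
import numpy as np


def sup_norm_quantile(cov, level, M, rng, jitter=0.0):
    """Monte Carlo quantile of max_k |G_k| for G ~ N(0, cov).

    cov: estimated covariance matrix at the sampled points (step i);
    level: quantile level, e.g. 0.95; M: number of simulated vectors;
    jitter: optional diagonal term added before the Cholesky factorisation.
    """
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    L = np.linalg.cholesky(cov + jitter * np.eye(k))  # lower triangular, L @ L.T = cov
    Z = rng.standard_normal((M, k))                   # i.i.d. N(0, 1) components
    G = Z @ L.T                                       # each row is (L z)^T ~ N(0, cov)
    maxima = np.abs(G).max(axis=1)                    # maximum norms
    return np.quantile(maxima, level)                 # sorts internally; linear interpolation


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # one-dimensional check: approximately 1.96, the 0.975 quantile of N(0, 1)
    print(sup_norm_quantile([[1.0]], level=0.95, M=200_000, rng=rng))
```

The quantile level is left as a parameter because the exact level used in step 4 is not restated here. If the numerical Cholesky factorisation fails because an estimated matrix is only positive semidefinite, a small diagonal jitter is one common workaround; whether this situation arises with the estimators of step i) is not addressed in the text.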
Table 1 (for the HTEP) and Table 2 (for the HEP) summarize the simulation results. For each considered population size , for each considered sampling fraction and for each considered confidence level , the two tables report the estimate of the coverage probability of the corresponding confidence band for as well as the average width (the first figure within each bracket) and the maximum width (the second figure within each bracket) of the simulated confidence bands. The simulation results suggest that the confidence bands based on the HTEP and on the HEP are very similar. Their coverage is quite accurate for the populations of size and does not seem to depend on the sampling fraction . As expected, the width of the confidence bands is roughly proportional to , and it appears to be quite stable from sample to sample (the differences between the maximum widths and the average widths are rather small). However, for many applications the widths of the confidence bands might be too large. This problem can probably be overcome through alternative estimators which use the information provided by the auxiliary variable more efficiently (see e.g. [14] or [12] and references therein).
References
- [1] P. Bertail, E. Chautru, and S. Clémençon. Empirical processes in survey sampling with (conditional) Poisson designs. Scand. J. Stat., 44(1):97–111, 2017. ISSN 0303-6898. doi: 10.1111/sjos.12243. URL https://doi.org/10.1111/sjos.12243.
- [2] L. D. Brown. Fundamentals of statistical exponential families with applications in statistical decision theory, volume 9 of Institute of Mathematical Statistics Lecture Notes—Monograph Series. Institute of Mathematical Statistics, Hayward, CA, 1986. ISBN 0-940600-10-2.
- [3] X.-H. Chen, A. P. Dempster, and J. S. Liu. Weighted finite population sampling to maximize entropy. Biometrika, 81(3):457–469, 1994. ISSN 0006-3444. doi: 10.1093/biomet/81.3.457. URL https://doi.org/10.1093/biomet/81.3.457.
- [4] P. L. Conti. On the estimation of the distribution function of a finite population under high entropy sampling designs, with applications. Sankhya B, 76(2):234–259, 2014. ISSN 0976-8386. doi: 10.1007/s13571-014-0083-x. URL https://doi.org/10.1007/s13571-014-0083-x.
- [5] J. Dupačová. A note on rejective sampling. In Contributions to statistics, pages 71–78. Reidel, Dordrecht-Boston, Mass.-London, 1979.
- [6] J. Hájek. Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann. Math. Statist., 35:1491–1523, 1964. ISSN 0003-4851. doi: 10.1214/aoms/1177700375. URL https://doi.org/10.1214/aoms/1177700375.
- [7] W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13–30, 1963. ISSN 0162-1459. URL http://links.jstor.org/sici?sici=0162-1459(196303)58:301<13:PIFSOB>2.0.CO;2-D&origin=MSN.
- [8] K. Joag-Dev and F. Proschan. Negative association of random variables, with applications. Ann. Statist., 11(1):286–295, 1983. ISSN 0090-5364. doi: 10.1214/aos/1176346079. URL https://doi.org/10.1214/aos/1176346079.
