Functional central limit theorems for conditional Poisson sampling

Leo Pasquazzi

arXiv:1905.01021·math.ST·June 18, 2019


TL;DR

This paper refines and generalizes functional central limit theorems for conditional Poisson sampling, providing detailed proofs and insights useful for applications in survey sampling.

Contribution

It offers more suitable, generalized versions of existing theorems with detailed proofs, enhancing understanding of weak convergence in survey sampling.

Findings

01

Refined functional central limit theorems for conditional Poisson sampling.

02

Detailed discussion on proving weak convergence in bounded function spaces.

03

Enhanced theoretical framework for applications in survey sampling.

Abstract

This paper provides refined versions of some known functional central limit theorems for conditional Poisson sampling which are more suitable for applications. The theorems presented in this paper are generalizations of some results that have been recently published by Bertail, Chautru, and Clémençon [1]. The asymptotic equicontinuity part of the proofs presented in this paper is based on the same idea as in [1] but some of the missing details are provided. On the way to the functional central limit theorems, this paper provides a detailed discussion of what must be done in order to prove conditional and unconditional weak convergence in bounded function spaces in the context of survey sampling. The results from this discussion can be useful to prove further weak convergence results.

Tables

Table 1: Simulation results for the Horvitz-Thompson empirical process.

	$γ = 0.90$	$γ = 0.95$	$γ = 0.99$
$N = 500$
$α = 0.05$	0.841	0.916	0.983
	(0.4233; 0.4541)	(0.4755; 0.5121)	(0.5779; 0.6481)
$α = 0.10$	0.866	0.913	0.990
	(0.3022; 0.3185)	(0.3378; 0.3610)	(0.4079; 0.4662)
$N = 1000$
$α = 0.05$	0.877	0.937	0.991
	(0.3102; 0.3264)	(0.3470; 0.3676)	(0.4198; 0.4656)
$α = 0.10$	0.875	0.928	0.981
	(0.2190; 0.2310)	(0.2444; 0.2602)	(0.2942; 0.3296)
$N = 2000$
$α = 0.05$	0.873	0.938	0.993
	(0.2247; 0.2362)	(0.2509; 0.2669)	(0.3019; 0.3374)
$α = 0.10$	0.885	0.949	0.991
	(0.1574; 0.1666)	(0.1755; 0.1903)	(0.2110; 0.2384)

Table 2: Simulation results for the Hájek empirical process.

	$γ = 0.90$	$γ = 0.95$	$γ = 0.99$
$N = 500$
$α = 0.05$	0.860	0.928	0.991
	(0.4319; 0.4562)	(0.4838; 0.5100)	(0.5864; 0.6480)
$α = 0.10$	0.875	0.925	0.989
	(0.3062; 0.3203)	(0.3418; 0.3652)	(0.4127; 0.4523)
$N = 1000$
$α = 0.05$	0.887	0.940	0.993
	(0.3145; 0.3344)	(0.3514; 0.3785)	(0.4243; 0.4737)
$α = 0.10$	0.880	0.935	0.982
	(0.2212; 0.2332)	(0.2465; 0.2623)	(0.2962; 0.3313)
$N = 2000$
$α = 0.05$	0.878	0.944	0.994
	(0.2269; 0.2416)	(0.2529; 0.2686)	(0.3049; 0.3387)
$α = 0.10$	0.886	0.948	0.994
	(0.1585; 0.1684)	(0.1765; 0.1899)	(0.2116; 0.2370)

Equations

$$\underline{\mathbf{\pi}}_{N}:=(\pi_{1,N},\pi_{2,N},\dots,\pi_{N,N}):=(E_{d}S_{1,N},E_{d}S_{2,N},\dots,E_{d}S_{N,N})=(P_{d}\{S_{1,N}=1\},P_{d}\{S_{2,N}=1\},\dots,P_{d}\{S_{N,N}=1\}),$$

$$\mathbb{G}_{N}^{\prime}:=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left(\frac{S_{i,N}}{\pi_{i,N}}-1\right)\delta_{Y_{i}}.$$

$$\mathbb{G}_{N}^{\prime}f:=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left(\frac{S_{i,N}}{\pi_{i,N}}-1\right)f(Y_{i})$$

$$\mathbb{G}_{N}^{\prime}\rightsquigarrow\mathbb{G}^{\prime}\quad\text{in }l^{\infty}(\mathcal{F}),$$

$$E^{*}h(\mathbb{G}_{N}^{\prime})\rightarrow Eh(\mathbb{G}^{\prime})\quad\text{for all }h\in C_{b}(l^{\infty}(\mathcal{F})),$$

$$\sup_{h\in BL_{1}(l^{\infty}(\mathcal{F}))}\left|E^{*}h(\mathbb{G}_{N}^{\prime})-Eh(\mathbb{G}^{\prime})\right|\rightarrow 0,$$

$$\sup_{h\in BL_{1}(l^{\infty}(\mathcal{F}))}\left|E_{d}h(\mathbb{G}_{N}^{\prime})-Eh(\mathbb{G}^{\prime})\right|\overset{P*}{\rightarrow}0$$

$$\sup_{h\in BL_{1}(l^{\infty}(\mathcal{F}))}\left|E_{d}h(\mathbb{G}_{N}^{\prime})-Eh(\mathbb{G}^{\prime})\right|\overset{as*}{\rightarrow}0.$$

$$\prod_{i=1}^{\infty}(\Omega_{y,x},\mathcal{A}_{y,x},P_{y,x})\times(\Omega_{d},\mathcal{A}_{d},P_{d})$$

$$S_{i,N}:=\begin{cases}1&\text{if }D_{i}\leq\pi_{i,N}\\0&\text{otherwise}\end{cases}\qquad i=1,2,\dots,N$$

$$\mathfrak{p}_{N}(\mathbf{s}_{N}):=\mathfrak{p}_{N}(\mathbf{s}_{N};\mathbf{X}_{N})$$

$$\mathbf{S}_{N}:=\begin{cases}\mathbf{s}_{N}^{(1)}&\text{if }D\leq\mathfrak{p}_{N}^{(1)}\\\mathbf{s}_{N}^{(i)}&\text{if }\sum_{j=1}^{i-1}\mathfrak{p}_{N}^{(j)}<D\leq\sum_{j=1}^{i}\mathfrak{p}_{N}^{(j)}\text{ for }i=2,3,\dots,2^{N},\end{cases}$$

$$E_{d}g(\mathbf{S}_{N},\mathbf{Y}_{N},\mathbf{X}_{N})$$

$$\sum_{\mathbf{s}_{N}\in\{0,1\}^{N}}g(\mathbf{s}_{N},\mathbf{Y}_{N},\mathbf{X}_{N})P_{d}\{\mathbf{S}_{N}=\mathbf{s}_{N}\},$$

$$\sup_{h\in BL_{1}(l^{\infty}(\mathcal{F}))}\left|E_{d}h(\mathbb{H}_{N}^{\prime})-Eh(\mathbb{H}^{\prime})\right|\overset{P*(as*)}{\rightarrow}0$$

$$\mathbb{H}_{N}^{\prime}\rightsquigarrow\mathbb{H}^{\prime}\quad\text{in }l^{\infty}(\mathcal{F}).$$

$$\sup_{h\in BL_{1}(l^{\infty}(\mathcal{F}))}\left|Eh(\mathbb{H}_{N}^{\prime})-Eh(\mathbb{H}^{\prime})\right|\rightarrow 0$$

$$\liminf_{N\rightarrow\infty}P_{*}\{\mathbb{H}_{N}^{\prime}\in K^{\delta}\}\geq 1-\eta\quad\text{for every }\delta>0,$$

$$P_{*}\{\mathbb{H}_{N}^{\prime}\in K^{\delta}\}+A_{N}\geq 1-\eta\quad\text{for every }N=1,2,\dots.$$

$$P_{d}\{\mathbb{H}_{N}^{\prime}\in K^{\delta}\}+\widetilde{A}_{N}\geq 1-\eta\quad\text{for every }N=1,2,\dots,$$

$$P_{d}\{|\mathbb{H}_{N}^{\prime}f|\leq M\}+\widetilde{B}_{N}\geq 1-\eta\quad\text{for every }N=1,2,\dots$$

$$P_{d}\left\{\sup_{f,g\in\mathcal{F}:\rho(f,g)<\delta}|\mathbb{H}_{N}^{\prime}f-\mathbb{H}_{N}^{\prime}g|>\epsilon\right\}<\eta+\widetilde{C}_{N}\quad\text{for every }N=1,2,\dots$$

$$P_{d}\left\{\sup_{1\leq i\leq k}\sup_{f,g\in\mathcal{F}_{i}}|\mathbb{H}_{N}^{\prime}f-\mathbb{H}_{N}^{\prime}g|>\epsilon\right\}<\eta+\widetilde{D}_{N}\quad\text{for every }N=1,2,\dots$$

$$P_{d}\{|\mathbb{H}_{N}^{\prime}f|\leq M\}+\widetilde{A}_{N}\geq 1-\eta\quad\text{for every }N=1,2,\dots$$

$$\rho(f,g):=\sum_{m=1}^{\infty}2^{-m}[\rho_{m}(f,g)\wedge 1],\quad f,g\in\mathcal{F}.$$

$$\sup_{f,g\in\mathcal{F}:\rho(f,g)<\delta}|\mathbb{H}_{N}^{\prime}f-\mathbb{H}_{N}^{\prime}g|\leq\epsilon/3$$

$$\sup_{f,g\in\mathcal{F}:\rho(f,g)<\delta}|\mathbb{H}_{N}^{\prime}f-\mathbb{H}_{N}^{\prime}g|\leq\epsilon$$

$$P_{d}\{\lVert\mathbb{H}_{N}^{\prime}\rVert_{\mathcal{F}}\leq M\}+\widetilde{E}_{N}\geq 1-\eta\quad\text{for every }N=1,2,\dots$$

$$|\mathbb{H}_{N}^{\prime}f_{i}|\leq M-\epsilon,\quad i=1,2,\dots,k,\quad\text{and}\quad\sup_{1\leq i\leq k}\sup_{f,g\in\mathcal{F}_{i}}|\mathbb{H}_{N}^{\prime}f-\mathbb{H}_{N}^{\prime}g|\leq\epsilon$$

$$P_{d}\{|\mathbb{H}_{N}^{\prime}f_{i}|\leq M-\epsilon\}+\widetilde{B}_{N}\geq 1-\eta/(k+1),\quad i=1,2,\dots,k,$$


Taxonomy

Topics: Point processes and geometric inequalities · Probability and Risk Models · Statistical Distribution Estimation and Applications

Full text

Functional Central Limit Theorems for Conditional Poisson Sampling Designs

This work was supported by the grant 2016-ATE-0459 and the grant 2017-ATE-0402 from Università degli Studi di Milano-Bicocca.

Leo Pasquazzi

Dipartimento di Statistica e Metodi Quantitativi, Università degli Studi di Milano-Bicocca, Edificio U7, Via Bicocca degli Arcimboldi 8, 20126 Milano


Keywords: weak convergence, empirical process, conditional Poisson sampling, uniform entropy condition

Mathematics Subject Classification (2010): 62A05, 60F05, 60F17

1 Introduction

Bertail, Chautru, and Clémençon [1] have recently published a paper in which they propose some FCLTs for Poisson sampling designs as well as for conditional Poisson sampling designs (henceforth CPS designs or rejective sampling designs). The author of the present paper has already published a draft manuscript which provides quite substantial generalizations of the results for the Poisson sampling case (see [11]). In fact, [1] considers only empirical processes indexed by function classes which satisfy the uniform entropy condition, while [11] extends these results to arbitrary Donsker classes with uniformly bounded means. The proofs of the more general results given in [11] are based on the symmetrization technique and they differ substantially from those given in [1], which use the Hoeffding inequality (see [7]). Unfortunately, the symmetrization trick cannot be applied in the conditional Poisson sampling case, so the weak convergence results for conditional Poisson sampling given in [1] cannot be generalized by using the symmetrization technique as in [11]. Nevertheless, the results given in [1] are somewhat unsatisfactory because the assumptions about the sequence of conditional Poisson sampling designs are unnecessarily restrictive. In fact, perhaps in order to simplify the proofs, in [1] it is assumed that the first order sample inclusion probabilities of the underlying (approximately canonical) Poisson sampling designs are realizations of i.i.d. random variables which are bounded away from zero. As a consequence, the assumptions of the theorems presented in [1] imply that the sequence of sample sizes of the rejective sampling designs must be random. Moreover, the theorems cannot be applied to cases where there is dependence among the first order sample inclusion probabilities, or to cases where the sample inclusion probabilities are proportional to some size variable which can take on values arbitrarily close to zero. The results given in the present paper overcome these shortcomings.

This work is organized as follows. Section 2 introduces the probabilistic framework within which the FCLTs will be derived. The probabilistic model and all other definitions given in Section 2 are identical to those given in Section 2 of [11]. Section 3 provides some general definitions and theorems which are very useful for showing conditional weak convergence results in the context of survey sampling. The definitions and theorems provided in this section are conditional analogues of the definitions and theorems given in Chapter 1.5 on pages 34 - 41 in [17]. They are completely general and can be used to prove other weak convergence results in the context of survey sampling as well. Section 4 reviews the relevant conditional Poisson sampling theory which is due to Hájek (see [6]). In Section 5 the FCLTs for the conditional Poisson sampling case will be derived. Section 6 provides extensions for the Hájek empirical process and Section 7 concludes this work with a simulation study.

2 Notation and Definitions

Let $Y_{1}$, $Y_{2}$, …, $Y_{N}$ denote the values taken on by a study variable $Y$ on the $N$ units of a finite population and let $X_{1}$, $X_{2}$, …, $X_{N}$ denote corresponding values of an auxiliary variable $X$. In this paper it will be assumed that the $N$ ordered pairs $(Y_{i},X_{i})$ corresponding to a given finite population of interest are the first $N$ realizations of an infinite sequence of i.i.d. random variables which take on values in the Cartesian product of two measurable spaces which will be denoted by $(\mathcal{Y},\mathcal{A})$ and $(\mathcal{X},\mathcal{B})$, respectively. Moreover, as usual in finite population sampling theory, it will be assumed that the values taken on by the auxiliary variable $X$ are known in advance for all the $N$ population units, while the values taken on by the study variable $Y$ are only known for the population units that have been selected into a random sample. The corresponding vector of sample inclusion indicator functions will be denoted by $\mathbf{S}_{N}:=(S_{1,N},S_{2,N},\dots,S_{N,N})$ and it will be assumed that the vectors $\mathbf{S}_{N}$ and $\mathbf{Y}_{N}:=(Y_{1},Y_{2},\dots,Y_{N})$ are conditionally independent given $\mathbf{X}_{N}:=(X_{1},X_{2},\dots,X_{N})$. With reference to the sample design, probability and expectation will be denoted by $P_{d}$ and $E_{d}$, respectively. With this notation, the vector of first order sample inclusion probabilities will be given by

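$$\underline{\mathbf{\pi}}_{N}:=(\pi_{1,N},\pi_{2,N},\dots,\pi_{N,N}):=(E_{d}S_{1,N},E_{d}S_{2,N},\dots,E_{d}S_{N,N})=(P_{d}\{S_{1,N}=1\},P_{d}\{S_{2,N}=1\},\dots,P_{d}\{S_{N,N}=1\}),$$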

and from the conditional independence assumption it follows that $\underline{\mathbf{\pi}}_{N}$ must be a deterministic function of $\mathbf{X}_{N}$ .

Now, with reference to the measurable space $(\mathcal{Y},\mathcal{A})$ , consider the random empirical measure given by

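$$\mathbb{G}_{N}^{\prime}:=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left(\frac{S_{i,N}}{\pi_{i,N}}-1\right)\delta_{Y_{i}}.$$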

For a given $f:\mathcal{Y}\mapsto\mathbb{R}$ , the integral of $f$ with respect to $\mathbb{G}_{N}^{\prime}$ can be written as

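$$\mathbb{G}_{N}^{\prime}f:=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left(\frac{S_{i,N}}{\pi_{i,N}}-1\right)f(Y_{i})$$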

so that, for any given class $\mathcal{F}$ of functions $f:\mathcal{Y}\mapsto\mathbb{R}$, the random empirical measure $\mathbb{G}_{N}^{\prime}$, as a real-valued function of $f\in\mathcal{F}$, can be interpreted as a stochastic process indexed by the set $\mathcal{F}$. Since each term is weighted by the inverse of its inclusion probability, as in the Horvitz-Thompson estimator, $\mathbb{G}_{N}^{\prime}$ will be called the Horvitz-Thompson empirical process (henceforth HTEP). Depending on the values taken on by the study variable $Y$ and on the class of functions $\mathcal{F}$, a sample path of $\mathbb{G}_{N}^{\prime}$ could be either bounded or not. In the former case it will be an element of $l^{\infty}(\mathcal{F})$, the space of all bounded and real-valued functions with domain given by the class of functions $\mathcal{F}$. In what follows $l^{\infty}(\mathcal{F})$ will be considered as a metric space with distance function induced by the norm $\lVert z\rVert_{\mathcal{F}}:=\sup_{f\in\mathcal{F}}|z(f)|$.
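As a concrete illustration of the definition of $\mathbb{G}_{N}^{\prime}f$, the following Python sketch evaluates the process on a small grid of index functions and computes the sup norm of the resulting path over that grid. It is only a sketch under stated assumptions: it uses the $1/\sqrt{N}$ normalization, a Poisson sampling draw with equal inclusion probabilities, and the indicator functions $f_{t}(y)=1\{y\leq t\}$ as index class. The names `htep` and `indicator` and all numerical values are hypothetical and do not come from the paper.

```python
import math
import random


def htep(f, y, s, pi):
    """Evaluate N^{-1/2} * sum_i (S_i / pi_i - 1) * f(Y_i)."""
    n = len(y)
    total = sum((si / p - 1.0) * f(yi) for yi, si, p in zip(y, s, pi))
    return total / math.sqrt(n)


def indicator(t):
    """Index function f_t(y) = 1{y <= t} (an assumed choice of the class F)."""
    return lambda v: 1.0 if v <= t else 0.0


rng = random.Random(0)
N = 1000
y = [rng.gauss(0.0, 1.0) for _ in range(N)]       # simulated study variable
pi = [0.1] * N                                    # assumed inclusion probabilities
s = [1 if rng.random() <= p else 0 for p in pi]   # one Poisson sampling draw

grid = [-1.0, 0.0, 1.0]
path = [htep(indicator(t), y, s, pi) for t in grid]
sup_norm = max(abs(v) for v in path)              # sup norm restricted to the grid
```

With a finite grid the sup norm is only a lower bound for $\lVert\mathbb{G}_{N}^{\prime}\rVert_{\mathcal{F}}$ over the full class of indicators.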

As already mentioned in the introduction, the present paper provides FCLTs for conditional Poisson sampling designs (or rejective sampling designs). To be precise, the present paper investigates conditions under which

$$\mathbb{G}_{N}^{\prime}\rightsquigarrow\mathbb{G}^{\prime}\quad\text{in }l^{\infty}(\mathcal{F}),$$

where $\mathbb{G}^{\prime}$ is a Borel measurable and tight (in $l^{\infty}(\mathcal{F})$ ) Gaussian process. Both unconditional and conditional (on the realized values of $X$ and $Y$ ) weak convergence will be considered. Recall that unconditional weak convergence is defined as

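$$E^{*}h(\mathbb{G}_{N}^{\prime})\rightarrow Eh(\mathbb{G}^{\prime})\quad\text{for all }h\in C_{b}(l^{\infty}(\mathcal{F})),$$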

where $C_{b}(l^{\infty}(\mathcal{F}))$ is the class of all real-valued, continuous and bounded functions on $l^{\infty}(\mathcal{F})$. If the realizations of $\mathbb{G}^{\prime}$ lie in a separable subset of $l^{\infty}(\mathcal{F})$ almost surely, this is equivalent to

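$$\sup_{h\in BL_{1}(l^{\infty}(\mathcal{F}))}\left|E^{*}h(\mathbb{G}_{N}^{\prime})-Eh(\mathbb{G}^{\prime})\right|\rightarrow 0,$$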

where $BL_{1}(l^{\infty}(\mathcal{F}))$ is the set of all functions $h:l^{\infty}(\mathcal{F})\mapsto[0,1]$ such that $|h(z_{1})-h(z_{2})|\leq\lVert z_{1}-z_{2}\rVert_{\mathcal{F}}$ for every $z_{1},z_{2}\in l^{\infty}(\mathcal{F})$ (see Chapter 1.12 in [17]). Based on this observation, [17] provides two definitions of conditional weak convergence: conditional weak convergence in outer probability (henceforth opCWC), which in the context of this paper translates to the condition

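$$\sup_{h\in BL_{1}(l^{\infty}(\mathcal{F}))}\left|E_{d}h(\mathbb{G}_{N}^{\prime})-Eh(\mathbb{G}^{\prime})\right|\overset{P*}{\rightarrow}0$$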

(see page 181 in [17]), and outer almost sure conditional weak convergence (henceforth oasCWC), which in the context of this paper translates to the condition

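$$\sup_{h\in BL_{1}(l^{\infty}(\mathcal{F}))}\left|E_{d}h(\mathbb{G}_{N}^{\prime})-Eh(\mathbb{G}^{\prime})\right|\overset{as*}{\rightarrow}0.$$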

As expected, oasCWC implies opCWC (see Lemma 1.9.2 on page 53 in [17]). However, it seems that oasCWC is not strong enough to imply asymptotic measurability (cf. Theorem 2.9.6 on page 182 in [17] and the comments thereafter), which is a necessary condition for unconditional weak convergence (see Lemma 1.3.8 on page 21 in [17]).

Since the very definition of weak convergence relies on the concept of outer expectation, some assumptions about the underlying probability space will be necessary for what follows. Throughout this paper it will be assumed that the latter is a product space of the form

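$$\prod_{i=1}^{\infty}(\Omega_{y,x},\mathcal{A}_{y,x},P_{y,x})\times(\Omega_{d},\mathcal{A}_{d},P_{d})\tag{2}$$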

and that the elements of the random sequence $\{(Y_{i},X_{i})\}_{i=1}^{\infty}$ are the coordinate projections on the first infinitely many coordinates of the sample points $\omega\in\Omega_{y,x}^{\infty}\times\Omega_{d}$. On the other hand, the sample inclusion indicators $S_{i,N}$ are allowed to depend on all the coordinates. As suggested by the notation, it will be assumed that for each value of $N$ the corresponding sample inclusion indicator functions $S_{1,N}$, $S_{2,N}$, …, $S_{N,N}$ are the elements of one row of a triangular array of random variables. This assumption is needed in order to make sure that, as the population size increases, the sample design can be adapted for each value of $N$ to all the $N$ (known) values taken on by the auxiliary variable $X$. To make sure that the conditional independence assumption holds, it will be assumed that for each value of $N$ the corresponding vector $\mathbf{S}_{N}$ is defined as a function of the random vector $\mathbf{X}_{N}$ and of random variables $D_{1}$, $D_{2}$, … which are functions of the last coordinate of the sample points $\omega\in\Omega_{y,x}^{\infty}\times\Omega_{d}$ only, i.e. of the coordinate that takes on values in the set $\Omega_{d}$ (instead of a random sequence $\{D_{i}\}_{i=1}^{\infty}$ one could also consider a stochastic process $\{D_{t}:t\in T\}$ with an arbitrary index set $T$, but this will not be of interest in the present paper). For example, in the case of a Poisson sampling design with a given vector of first order sample inclusion probabilities $\underline{\mathbf{\pi}}_{N}$ (which could be a function of $\mathbf{X}_{N}$) one could define $\{D_{i}\}_{i=1}^{\infty}$ as a sequence of i.i.d. uniform-$[0,1]$ random variables and define for each value of $N$ the corresponding row of sample inclusion indicators by

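$$S_{i,N}:=\begin{cases}1&\text{if }D_{i}\leq\pi_{i,N}\\0&\text{otherwise}\end{cases}\qquad i=1,2,\dots,N.$$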
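The Poisson sampling construction just described, in which unit $i$ is included whenever $D_{i}\leq\pi_{i,N}$, can be sketched in a few lines of Python. This is an illustration only: the function name `poisson_sample`, the inclusion probabilities and the number of Monte Carlo replications are hypothetical choices.

```python
import random


def poisson_sample(pi, d):
    """Return the indicators S_i = 1 if D_i <= pi_i, and 0 otherwise."""
    return [1 if di <= p else 0 for di, p in zip(d, pi)]


rng = random.Random(1)
pi = [0.2, 0.5, 0.9, 0.05]        # assumed first order inclusion probabilities
d = [rng.random() for _ in pi]    # D_1, ..., D_N: i.i.d. uniform on [0, 1)
s = poisson_sample(pi, d)

# Monte Carlo check: empirical inclusion frequencies should be close to pi.
reps = 20000
counts = [0] * len(pi)
for _ in range(reps):
    draw = poisson_sample(pi, [rng.random() for _ in pi])
    counts = [c + x for c, x in zip(counts, draw)]
freq = [c / reps for c in counts]
```

Because each indicator depends on its own uniform variable, the indicators are independent and the realized sample size is random, which is the feature that the conditional (rejective) version of the design removes.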

Of course, the above probability space does not only work for Poisson sampling designs, but it can accommodate any non-informative sampling design. In fact, it is not difficult to show that for any non-informative sampling design the vector of sample inclusion indicators $\mathbf{S}_{N}$ can be defined as a function of $\mathbf{X}_{N}$ and of a single uniform- $[0,1]$ random variable $D$ that depends on the last coordinate of the sample points $\omega\in\Omega_{y,x}^{\infty}\times\Omega_{d}$ only. To this aim let

$$\mathfrak{p}_{N}(\mathbf{s}_{N}):=\mathfrak{p}_{N}(\mathbf{s}_{N};\mathbf{X}_{N})$$

denote the probability to select a given sample $\mathbf{s}_{N}\in\{0,1\}^{N}$ . Note that the definition of the function $\mathfrak{p}_{N}$ specifies a desired sampling design. Since the values taken on by the auxiliary variable $X$ are assumed to be already known before the sample is drawn, the sample selection probabilities $\mathfrak{p}_{N}(\mathbf{s}_{N})$ are allowed to depend on $\mathbf{X}_{N}$ . Now, let $\mathbf{s}_{N}^{(1)}$ , $\mathbf{s}_{N}^{(2)}$ , …, $\mathbf{s}_{N}^{(2^{N})}$ denote the $2^{N}$ elements of $\{0,1\}^{N}$ arranged in some fixed order (for example, according to the order determined by the binary expansion corresponding to the finite sequence of zeros and ones in $\mathbf{s}_{N}$ ), and put $\mathfrak{p}_{N}^{(i)}:=\mathfrak{p}_{N}(\mathbf{s}_{N}^{(i)})$ , $i=1,2,\dots,2^{N}$ . Then, define the vector of sample inclusion indicators $\mathbf{S}_{N}$ by

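$$\mathbf{S}_{N}:=\begin{cases}\mathbf{s}_{N}^{(1)}&\text{if }D\leq\mathfrak{p}_{N}^{(1)}\\\mathbf{s}_{N}^{(i)}&\text{if }\sum_{j=1}^{i-1}\mathfrak{p}_{N}^{(j)}<D\leq\sum_{j=1}^{i}\mathfrak{p}_{N}^{(j)}\text{ for }i=2,3,\dots,2^{N},\end{cases}$$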

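and note that for every $\mathbf{s}_{N}\in\{0,1\}^{N}$ this vector satisfies $P_{d}\{\mathbf{S}_{N}=\mathbf{s}_{N}\}=\mathfrak{p}_{N}(\mathbf{s}_{N})$ as desired. This concludes the proof of the above assertion that for any non-informative sampling design the vector $\mathbf{S}_{N}$ can be defined as a function of $\mathbf{X}_{N}$ and of a single uniform-$[0,1]$ random variable $D$.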
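The construction of $\mathbf{S}_{N}$ from a single uniform variable $D$ is an inverse-transform draw over the $2^{N}$ samples listed in a fixed order. The Python sketch below illustrates it for a very small population. It assumes the binary-expansion ordering and, purely as an example design, simple random sampling of $n=2$ units out of $N=4$; the names `enumerate_samples` and `draw_sample` are hypothetical.

```python
import itertools
import random


def enumerate_samples(n):
    """All elements of {0,1}^n, ordered by their binary expansion."""
    return [tuple(bits) for bits in itertools.product((0, 1), repeat=n)]


def draw_sample(p, d):
    """Return the first sample s^(i) with D <= p^(1) + ... + p^(i).

    `p` maps every sample (a tuple of zeros and ones) to its probability.
    """
    samples = enumerate_samples(len(next(iter(p))))
    cumulative = 0.0
    last_positive = None
    for s in samples:
        if p[s] > 0.0:
            last_positive = s
        cumulative += p[s]
        if d <= cumulative:
            return s
    # Only reached if rounding leaves the total slightly below d.
    return last_positive


N, n = 4, 2
support = [s for s in enumerate_samples(N) if sum(s) == n]
p = {s: (1.0 / len(support) if sum(s) == n else 0.0)
     for s in enumerate_samples(N)}

rng = random.Random(2)
draws = [draw_sample(p, rng.random()) for _ in range(6000)]
```

Enumerating all $2^{N}$ samples is feasible only for tiny $N$; the construction matters here as a device for defining the probability space, not as a practical sampling algorithm.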

Next, observe that in the above construction the sample selection probabilities $P_{d}\{\mathbf{S}_{N}=\mathbf{s}_{N}\}$ are functions of $\mathbf{X}_{N}$ . If for a given $\mathbf{s}_{N}\in\{0,1\}^{N}$ the corresponding sample selection probability $P_{d}\{\mathbf{S}_{N}=\mathbf{s}_{N}\}$ is a measurable function of $\mathbf{X}_{N}\in\mathcal{X}^{N}$ (this depends on the sampling design), then, with reference to the probability space of this paper, $P_{d}\{\mathbf{S}_{N}=\mathbf{s}_{N}\}$ can be interpreted as a conditional probability in the proper sense. Otherwise, $P_{d}\{\mathbf{S}_{N}=\mathbf{s}_{N}\}$ will just be a non measurable (random) function of $\mathbf{X}_{N}$ . More generally, the expectation with respect to the uniform random variable $D$ with $\mathbf{Y}_{N}$ and $\mathbf{X}_{N}$ kept fixed, which can be interpreted as design expectation and will therefore be denoted by $E_{d}$ , can be applied to any function $g$ of $\mathbf{S}_{N}$ , $\mathbf{Y}_{N}$ and $\mathbf{X}_{N}$ . In fact, the expectation

$$E_{d}g(\mathbf{S}_{N},\mathbf{Y}_{N},\mathbf{X}_{N})$$

is given by

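$$\sum_{\mathbf{s}_{N}\in\{0,1\}^{N}}g(\mathbf{s}_{N},\mathbf{Y}_{N},\mathbf{X}_{N})P_{d}\{\mathbf{S}_{N}=\mathbf{s}_{N}\},$$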

and $E_{d}g(\mathbf{S}_{N},\mathbf{Y}_{N},\mathbf{X}_{N})$ is thus a function of $\mathbf{Y}_{N}$ and $\mathbf{X}_{N}$ . If for every fixed $\mathbf{s}_{N}\in\{0,1\}^{N}$ the corresponding function $g(\mathbf{s}_{N},\cdot)$ is a measurable function of $\mathbf{Y}_{N}$ and $\mathbf{X}_{N}$ and the function $P_{d}\{\mathbf{S}_{N}=\mathbf{s}_{N}\}$ is a measurable function of $\mathbf{X}_{N}$ , then, with respect to the probability space of this paper, $E_{d}g(\mathbf{S}_{N},\mathbf{Y}_{N},\mathbf{X}_{N})$ can be interpreted as a conditional expectation in the proper sense (and in this case it will obviously be a measurable function of $\mathbf{Y}_{N}$ and $\mathbf{X}_{N}$ ), while otherwise it could either be a measurable or a non measurable function of $\mathbf{Y}_{N}$ and $\mathbf{X}_{N}$ .
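Since the design expectation is a finite sum over $\{0,1\}^{N}$ weighted by the sample selection probabilities, it can be computed exactly by enumeration when $N$ is small. The following Python sketch does this for a Poisson design whose inclusion probabilities are a function of $\mathbf{X}_{N}$ and checks the familiar design unbiasedness of the Horvitz-Thompson total. The helper names, the data and the rule $\pi_{i}=X_{i}/40$ are hypothetical and serve only as an example.

```python
import itertools


def poisson_design(pi):
    """Selection probability of every s in {0,1}^N under Poisson sampling."""
    probs = {}
    for s in itertools.product((0, 1), repeat=len(pi)):
        prob = 1.0
        for si, p_i in zip(s, pi):
            prob *= p_i if si == 1 else 1.0 - p_i
        probs[s] = prob
    return probs


def design_expectation(g, p, y, x):
    """Sum of g(s, y, x) * P_d{S = s} over all samples s."""
    return sum(g(s, y, x) * prob for s, prob in p.items())


def inclusion_probabilities(x):
    """Assumed rule: pi_i proportional to the auxiliary value X_i."""
    return [xi / 40.0 for xi in x]


def ht_total(s, y, x):
    """Horvitz-Thompson estimator of the population total of Y."""
    pi = inclusion_probabilities(x)
    return sum(si * yi / p_i for si, yi, p_i in zip(s, y, pi))


y = [3.0, 1.0, 4.0]
x = [10.0, 20.0, 30.0]
p = poisson_design(inclusion_probabilities(x))
expected = design_expectation(ht_total, p, y, x)  # should equal sum(y)
```

Here `y` and `x` are held fixed while the sum runs over the samples, which mirrors the interpretation of $E_{d}$ as an expectation with $\mathbf{Y}_{N}$ and $\mathbf{X}_{N}$ kept fixed.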

Throughout this paper it will be assumed that all the vectors of sample inclusion indicators $\mathbf{S}_{N}$ are defined as described in the above construction (the one which involves a single uniform- $[0,1]$ random variable $D$ ). Of course, in this way the random vectors $\mathbf{S}_{N}$ will be dependent for different values of $N$ , but for the purposes of this paper this dependence structure is irrelevant. Moreover, in what follows only measurable sample designs will be considered, i.e. sample designs such that for every fixed $\mathbf{s}_{N}\in\{0,1\}^{N}$ the corresponding sample selection probability in (3) is a measurable function of $\mathbf{X}_{N}$ . Note that this is a very mild restriction that should be satisfied in virtually every practical setting. However, it entails three important consequences which will be relevant for the proofs presented in this paper. They are: (i) the vectors of sample inclusion indicators $\mathbf{S}_{N}$ are measurable functions of $\mathbf{X}_{N}$ and of the uniform- $[0,1]$ random variable $D$ , (ii) for every $\mathbf{s}_{N}\in\{0,1\}^{N}$ the corresponding probability $P_{d}\{\mathbf{S}_{N}=\mathbf{s}_{N}\}$ is a conditional probability in the proper sense, and (iii) for $g$ a measurable function of $\mathbf{S}_{N}$ , $\mathbf{Y}_{N}$ and $\mathbf{X}_{N}$ the corresponding expectation $E_{d}g(\mathbf{S}_{N},\mathbf{Y}_{N},\mathbf{X}_{N})$ is a conditional expectation in the proper sense.

3 Weak convergence in bounded function spaces in the context of survey sampling

This section contains a detailed discussion of general methods for proving unconditional and conditional weak convergence in bounded function spaces in the context of survey sampling. Throughout this section it will be assumed that $\mathcal{F}$ is an arbitrary set and that $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is a sequence of mappings from the probability space (2) into $l^{\infty}(\mathcal{F})$ where each $\mathbb{H}_{N}^{\prime}$ depends on the sample points $\omega\in\Omega_{y,x}^{\infty}\times\Omega_{d}$ only through $\mathbf{Y}_{N}$ , $\mathbf{X}_{N}$ and $\mathbf{S}_{N}$ . Moreover, it will be assumed that for every $N=1,2,\dots$ and for every $f\in\mathcal{F}$ the corresponding coordinate projection $\mathbb{H}_{N}^{\prime}f$ is measurable. The scope of this section is to provide necessary and sufficient conditions for opCWC (oasCWC) in $l^{\infty}(\mathcal{F})$ , i.e. for

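$$\sup_{h\in BL_{1}(l^{\infty}(\mathcal{F}))}\left|E_{d}h(\mathbb{H}_{N}^{\prime})-Eh(\mathbb{H}^{\prime})\right|\overset{P*(as*)}{\rightarrow}0\tag{4}$$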

with $\mathbb{H}^{\prime}$ a Borel measurable and tight mapping from some probability space into $l^{\infty}(\mathcal{F})$ . In order to avoid repetitions, in what follows the symbols $\overset{P(as)}{\rightarrow}$ and $\overset{P*(as*)}{\rightarrow}$ will be used in order to express two versions of a convergence condition. This notation will often appear in the assumptions and in the conclusions of lemmas, theorems and corollaries. In these cases it is understood that the "probability convergence versions" of the assumptions imply the "probability convergence versions" of the conclusions and that the "almost sure convergence versions" of the assumptions imply the "almost sure convergence versions" of the conclusions.

What needs to be done in order to prove opCWC (oasCWC) seems to be clear from the general (unconditional) weak convergence theory laid out in [17]. In fact, according to Theorem 1.5.4 on page 35 in [17], if the sequence $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is asymptotically tight and all the finite dimensional marginals $(\mathbb{H}_{N}^{\prime}f_{1},\mathbb{H}_{N}^{\prime}f_{2},\dots,\mathbb{H}_{N}^{\prime}f_{r})$ converge weakly (in $\mathbb{R}^{r}$ ) to the corresponding marginals of some stochastic process $\{\mathbb{H}^{\prime}f:f\in\mathcal{F}\}$ , then there exists a version of $\mathbb{H}^{\prime}$ which is a Borel measurable and tight mapping from some probability space into $l^{\infty}(\mathcal{F})$ such that

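$$\mathbb{H}_{N}^{\prime}\rightsquigarrow\mathbb{H}^{\prime}\quad\text{in }l^{\infty}(\mathcal{F}).\tag{5}$$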

Since the realizations of $\mathbb{H}^{\prime}$ lie in a separable subset of $l^{\infty}(\mathcal{F})$ almost surely (this follows from tightness), it follows that condition (5) is equivalent to

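$$\sup_{h\in BL_{1}(l^{\infty}(\mathcal{F}))}\left|Eh(\mathbb{H}_{N}^{\prime})-Eh(\mathbb{H}^{\prime})\right|\rightarrow 0\tag{6}$$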

(see the comments at the top of page 73 in [17]). On the other hand, Theorem 1.5.4 on page 35 in [17] says also that if $\mathbb{H}^{\prime}$ is a Borel measurable and tight mapping from some probability space into $l^{\infty}(\mathcal{F})$ and if condition (5) or equivalently condition (6) holds, then it must be necessarily true that the sequence $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is asymptotically tight and that its finite-dimensional marginals converge weakly to the corresponding marginals of $\mathbb{H}^{\prime}$ .

Since the only difference between condition (6) and condition (4) is the fact that the unconditional expectation $Eh(\mathbb{H}_{N}^{\prime})$ is replaced by the sample design expectation $E_{d}h(\mathbb{H}_{N}^{\prime})$ (which is not necessarily a conditional expectation in the proper sense), one would expect that opCWC (oasCWC) is equivalent to the joint occurrence of some form of conditional asymptotic tightness and of some form of conditional weak convergence of the finite-dimensional marginals. In this section it will be shown that this is indeed true. The first step towards this goal is to provide a clear definition of what "conditional asymptotic tightness" and "conditional weak convergence of the finite-dimensional marginals" mean. To this aim recall that according to Definition 1.3.7 on pages 20-21 in [17] the sequence $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is asymptotically tight in the usual unconditional sense if for every $\eta>0$ there exists a compact set $K\subset l^{\infty}(\mathcal{F})$ such that

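$$\liminf_{N\rightarrow\infty}P_{*}\{\mathbb{H}_{N}^{\prime}\in K^{\delta}\}\geq 1-\eta\quad\text{for every }\delta>0,$$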

where $K^{\delta}:=\{z\in l^{\infty}(\mathcal{F}):\lVert z-z^{\prime}\rVert_{\mathcal{F}}<\delta\text{ for some }z^{\prime}\in K\}$ . Of course this condition is satisfied if and only if for every $\eta>0$ there exists a compact set $K\subset l^{\infty}(\mathcal{F})$ such that for every $\delta>0$ there exists a sequence of real numbers $A_{N}\rightarrow 0$ for which

$$P_{*}\{\mathbb{H}_{N}^{\prime}\in K^{\delta}\}+A_{N}\geq 1-\eta\quad\text{for every }N=1,2,\dots.$$

Based on this observation we can define two versions of conditional asymptotic tightness (henceforth CAT): a probability version by requiring that for every $\eta>0$ there exists a compact set $K\subset l^{\infty}(\mathcal{F})$ such that for every $\delta>0$ there exists a sequence of random variables $\widetilde{A}_{N}\overset{P}{\rightarrow}0$ for which

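$$P_{d}\{\mathbb{H}_{N}^{\prime}\in K^{\delta}\}+\widetilde{A}_{N}\geq 1-\eta\quad\text{for every }N=1,2,\dots,\tag{7}$$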

and an almost sure version of CAT by requiring $\widetilde{A}_{N}\overset{as}{\rightarrow}0$ instead of $\widetilde{A}_{N}\overset{P}{\rightarrow}0$ .

The following theorem provides two characterizations of CAT which are analogous to the characterizations of (unconditional) asymptotic tightness given in Theorem 1.5.6 and Theorem 1.5.7 on pages 36-37 in [17].

Theorem 1.

The following three conditions are equivalent:

(i)

the sequence $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is CAT;

(ii)

the marginals of the sequence $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ are CAT in the sense that for every $f\in\mathcal{F}$ and $\eta>0$ there exists a constant $M>0$ such that

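$$P_{d}\{|\mathbb{H}_{N}^{\prime}f|\leq M\}+\widetilde{B}_{N}\geq 1-\eta\quad\text{for every }N=1,2,\dots\tag{8}$$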

for some sequence of random variables $\widetilde{B}_{N}$ which goes to zero in probability (almost surely), and there exists a semimetric $\rho$ for which $\mathcal{F}$ is totally bounded and for which the sequence $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is conditionally asymptotically $\rho$ -equicontinuous (henceforth conditionally AEC w.r.t. $\rho$ ) in the sense that for every $\epsilon,\eta>0$ there exists a $\delta>0$ such that

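$$P_{d}\left\{\sup_{f,g\in\mathcal{F}:\rho(f,g)<\delta}|\mathbb{H}_{N}^{\prime}f-\mathbb{H}_{N}^{\prime}g|>\epsilon\right\}<\eta+\widetilde{C}_{N}\quad\text{for every }N=1,2,\dots\tag{9}$$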

for some sequence of random variables $\widetilde{C}_{N}$ which goes to zero in probability (almost surely);

(iii)

the marginals of the sequence $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ are CAT and the following conditional probability version of finite approximation holds: for every $\epsilon,\eta>0$ there exists a finite partition $\mathcal{F}=\cup_{i=1}^{k}\mathcal{F}_{i}$ such that

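$$P_{d}\left\{\sup_{1\leq i\leq k}\sup_{f,g\in\mathcal{F}_{i}}|\mathbb{H}_{N}^{\prime}f-\mathbb{H}_{N}^{\prime}g|>\epsilon\right\}<\eta+\widetilde{D}_{N}\quad\text{for every }N=1,2,\dots\tag{10}$$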

for some sequence of random variables $\widetilde{D}_{N}$ which goes to zero in probability (almost surely).

Proof.

(i) $\Rightarrow$ (ii). If condition (7) holds, then it follows that

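$$P_{d}\{|\mathbb{H}_{N}^{\prime}f|\leq M\}+\widetilde{A}_{N}\geq 1-\eta\quad\text{for every }N=1,2,\dots$$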

with $M:=\sup\{|z(f)|:z\in K^{\delta}\}<\infty$ and with the same sequence $\{\widetilde{A}_{N}\}_{N=1}^{\infty}$ as in (7). This shows that the marginals of the sequence $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ are CAT. Next, consider a sequence $\{K_{m}\}_{m=1}^{\infty}$ of compact subsets of $l^{\infty}(\mathcal{F})$ such that, for every fixed $m=1,2,\dots$ , condition (7) holds with $\eta=1/m$ and $K=K_{m}$ . Note that the sequence $\widetilde{A}_{N}$ in (7) depends on $\eta$ , $K$ and $\delta>0$ and hence we shall write $\widetilde{A}_{N}(\eta,K,\delta)$ instead of $\widetilde{A}_{N}$ . By part (b) of Theorem 6.2 on page 88 in [9], to every $K_{m}$ there corresponds a semimetric $\rho_{m}$ which makes $\mathcal{F}$ totally bounded and for which $K_{m}$ is a subset of $UC(\mathcal{F},\rho_{m})$ , i.e. of the class of all real-valued functions on $\mathcal{F}$ which are uniformly $\rho_{m}$ -continuous. Based on the sequence of semimetrics $\{\rho_{m}\}_{m=1}^{\infty}$ , define a new semimetric by

[TABLE]

Then it follows that $K_{m}\subset UC(\mathcal{F},\rho)$ for every $m=1,2,\dots$ and moreover it is not difficult to show that $\mathcal{F}$ is totally bounded w.r.t. $\rho$ too. Now, choose $\epsilon>0$ arbitrarily and note that $\mathbb{H}_{N}^{\prime}\in K_{m}$ implies

[TABLE]

for some small enough $\delta>0$ , and hence that $\mathbb{H}_{N}^{\prime}\in K_{m}^{\epsilon/3}$ implies

[TABLE]

for the same set of values of $\delta$ . For small enough values of $\delta>0$ it follows therefore that condition (9) is satisfied with $\eta=1/m$ and $\widetilde{C}_{N}:=\widetilde{A}_{N}(1/m,K_{m},\epsilon/3)$ .

(ii) $\Rightarrow$ (iii). Fix $\delta>0$ and choose a finite collection of open balls of $\rho$ -radius $\delta$ which covers $\mathcal{F}$ , disjointify and create a finite partition $\mathcal{F}=\cup_{i=1}^{k}\mathcal{F}_{i}$ such that each $\mathcal{F}_{i}$ is a subset of an open ball of $\rho$ -radius $\delta$ . Then (9) implies (10) with the sequence $\{\widetilde{D}_{N}\}_{N=1}^{\infty}$ equal to the sequence $\{\widetilde{C}_{N}\}_{N=1}^{\infty}$ from the definition of conditional AEC corresponding to $\epsilon$ , $\eta$ and $\delta$ .
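The disjointification step used in (ii) $\Rightarrow$ (iii) can be made concrete on a finite index set. The following is a minimal sketch, assuming a toy index set of grid points and the usual distance as the semimetric; the names `greedy_centers` and `disjointify` are illustrative and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_centers(points: Sequence[float],
                   rho: Callable[[float, float], float],
                   delta: float) -> List[float]:
    """Choose centers so that the open rho-balls of radius delta cover `points`."""
    centers: List[float] = []
    for f in points:
        # f becomes a new center only if no existing ball already contains it
        if all(rho(f, c) >= delta for c in centers):
            centers.append(f)
    return centers


def disjointify(points, rho, delta) -> Tuple[List[float], List[List[float]]]:
    """Assign every point to the first ball that contains it (disjoint cells)."""
    centers = greedy_centers(points, rho, delta)
    cells: List[List[float]] = [[] for _ in centers]
    for f in points:
        first = next(i for i, c in enumerate(centers) if rho(f, c) < delta)
        cells[first].append(f)
    return centers, cells


# Toy index set (an assumption of this sketch): 21 grid points in [0, 1].
points = [i / 20 for i in range(21)]
rho = lambda a, b: abs(a - b)
delta = 0.25
centers, cells = disjointify(points, rho, delta)
for c, cell in zip(centers, cells):
    print(f"center {c:.2f}: {len(cell)} indices")
```

Each cell is contained in a single open ball of $\rho$ -radius $\delta$ , which is the only property of the partition that the argument uses.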

(iii) $\Rightarrow$ (i). This part of the proof is essentially the same as the proof of Theorem 1.5.6 on page 36 in [17]. First, it will be shown that (iii) implies that $\lVert\mathbb{H}_{N}^{\prime}\rVert_{\mathcal{F}}$ is CAT, i.e. that for every $\eta>0$ there exists a constant $M$ such that

[TABLE]

for some sequence of random variables $\widetilde{E}_{N}$ which goes to zero in probability (almost surely). To this aim, choose $\epsilon,\eta>0$ arbitrarily and let $\mathcal{F}=\cup_{i=1}^{k}\mathcal{F}_{i}$ be a corresponding partition for which (10) holds. Then choose one index $f_{i}$ from each partition set $\mathcal{F}_{i}$ and note that

[TABLE]

implies $\lVert\mathbb{H}_{N}^{\prime}\rVert_{\mathcal{F}}\leq M$ . Now, from the assumptions in (iii) it follows that there exist sequences of random variables $\{\widetilde{B}_{N}\}_{N=1}^{\infty}$ and $\{\widetilde{D}_{N}\}_{N=1}^{\infty}$ which go to zero in probability (almost surely) such that, for every $N=1,2,\dots$ ,

[TABLE]

and

[TABLE]

and from this it follows that condition (11) holds with $\widetilde{E}_{N}:=k\widetilde{B}_{N}+\widetilde{D}_{N}$ .

Next, consider an arbitrary $\eta>0$ and put $\epsilon=\epsilon_{m}$ for an arbitrary sequence $\epsilon_{m}\downarrow 0$ . Let $\{\mathcal{F}=\cup_{i=1}^{k_{m}}\mathcal{F}_{m,i}\}_{m=1}^{\infty}$ be a corresponding sequence of partitions such that

[TABLE]

for some sequences of nonnegative random variables $\widetilde{D}_{m,N}$ which go to zero in probability (almost surely) when $m=1,2,\dots$ is kept fixed and $N$ goes to infinity. The conditional probability version of finite approximation ensures that for every fixed $m=1,2,\dots$ there exist such a partition $\mathcal{F}=\cup_{i=1}^{k_{m}}\mathcal{F}_{m,i}$ and such a sequence of random variables $\{\widetilde{D}_{m,N}\}_{N=1}^{\infty}$ . Now, denote by $z_{m,1},z_{m,2},\dots,z_{m,p_{m}}\in l^{\infty}(\mathcal{F})$ the functions which are constant on each partition set $\mathcal{F}_{m,i}$ and which can take on only the values $0,\pm\epsilon_{m},\pm 2\epsilon_{m},\dots,\pm\lfloor M/\epsilon_{m}\rfloor\epsilon_{m}$ (with $M$ the constant from (11)). Note that for each $m$ there are only finitely many such functions, i.e. that $p_{m}<\infty$ for every $m=1,2,\dots$ . Let $K_{m}$ denote the union of the $p_{m}$ closed balls of radius $\epsilon_{m}$ around the functions $z_{m,i}$ . Then,

[TABLE]

implies that $\mathbb{H}_{N}^{\prime}\in K_{m}$ . Consider the set $K:=\cap_{m=1}^{\infty}K_{m}$ and note that $K$ is closed and totally bounded and hence compact (since $l^{\infty}(\mathcal{F})$ is complete). Moreover, it can be shown that for every $\delta>0$ there is a finite $m$ such that $\cap_{i=1}^{m}K_{i}\subset K^{\delta}$ (the proof of this claim is given in the proof of Theorem 1.5.6 on pages 36 and 37 in [17]). Using this fact along with condition (11) and condition (12) yields

[TABLE]

which completes the proof. ∎

Remark 1.

Consider the case where $\mathcal{F}$ is a class of measurable functions $f:\mathcal{Y}\mapsto\mathbb{R}$ and where $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is a sequence of HTEPs as defined in (1). It is not difficult to show that in this case $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is conditionally AEC w.r.t. a given semimetric $\rho$ (see (ii) in the statement of the previous theorem) if and only if

[TABLE]

where

[TABLE]

The next theorem shows that CAT is a necessary condition for conditional weak convergence.

Theorem 2.

Let $\mathbb{H}^{\prime}$ be a Borel measurable and tight mapping from some probability space into $l^{\infty}(\mathcal{F})$ and assume that condition (4) holds. Then it follows that the sequence $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ satisfies the probability (almost sure) version of CAT.

Proof.

Let $B$ be an arbitrary Borel subset of $l^{\infty}(\mathcal{F})$ and let

[TABLE]

be the distance between $z\in l^{\infty}(\mathcal{F})$ and the set $B$ . Note that for $\delta>0$ the function

[TABLE]

has Lipschitz constant $\delta^{-1}$ and that $I(z\in B)\leq h_{B,\delta}(z)\leq I(z\in B^{\delta})$ , where $B^{\delta}$ is the open $\delta$ -enlargement of the set $B$ . Thus, $\delta h_{B,\delta}\in BL_{1}(l^{\infty}(\mathcal{F}))$ and therefore it follows that

[TABLE]

Since opCWC (oasCWC) implies that the supremum goes to zero in outer probability (outer almost surely), this shows that for every $\delta>0$ there exists a sequence of random variables $\widetilde{A}_{N}\overset{P(as)}{\rightarrow}0$ such that

[TABLE]

for every Borel subset $B$ of $l^{\infty}(\mathcal{F})$ and for every $N=1,2,\dots$ . Since $\mathbb{H}^{\prime}$ is tight by assumption, the conclusion of the theorem follows from this. ∎
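The properties of $h_{B,\delta}$ used in the proof above can be checked numerically in a finite-dimensional toy case. The sketch below assumes that $\mathcal{F}$ has three elements (so that $l^{\infty}(\mathcal{F})$ is $\mathbb{R}^{3}$ with the sup norm), that $B$ is a finite set, and that $h_{B,\delta}(z)=\max\{0,1-d(z,B)/\delta\}$ ; this form is an assumption of the sketch, chosen because it has the stated Lipschitz constant and satisfies the stated sandwich inequality.

```python
import random


def sup_norm(z1, z2):
    """Sup-norm distance between two functions on a finite index set."""
    return max(abs(a - b) for a, b in zip(z1, z2))


def dist_to_set(z, B):
    """d(z, B): sup-norm distance from z to the finite set B."""
    return min(sup_norm(z, w) for w in B)


def h(z, B, delta):
    """Assumed form of h_{B,delta}: 1 on B, 0 outside the open delta-enlargement."""
    return max(0.0, 1.0 - dist_to_set(z, B) / delta)


rng = random.Random(1)
B = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.5)]
delta = 0.5
zs = B + [tuple(rng.uniform(-2, 2) for _ in range(3)) for _ in range(200)]

# I(z in B) <= h(z) <= I(z in B^delta) on all test points
sandwich_ok = all(
    (1.0 if dist_to_set(z, B) == 0.0 else 0.0)
    <= h(z, B, delta)
    <= (1.0 if dist_to_set(z, B) < delta else 0.0)
    for z in zs
)
# |h(z1) - h(z2)| <= ||z1 - z2|| / delta on all pairs of test points
lipschitz_ok = all(
    abs(h(z1, B, delta) - h(z2, B, delta)) <= sup_norm(z1, z2) / delta + 1e-12
    for z1 in zs for z2 in zs
)
print(sandwich_ok, lipschitz_ok)
```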

Now, consider "conditional weak convergence of the finite-dimensional marginals". Perhaps the most obvious way to define this concept is to require pointwise convergence in probability (pointwise almost sure convergence) of the sample design characteristic function for every sequence of finite-dimensional vectors $\{\mathbb{H}_{N}^{\prime}\mathbf{f}\}_{N=1}^{\infty}:=\{(\mathbb{H}_{N}^{\prime}f_{1},\mathbb{H}_{N}^{\prime}f_{2},\dots,\mathbb{H}_{N}^{\prime}f_{r})^{\intercal}\}_{N=1}^{\infty}$ with $\mathbf{f}:=(f_{1},f_{2},\dots,f_{r})^{\intercal}\in\mathcal{F}^{r}$ and $r=1,2,\dots$ . Since we are assuming that the components of the vectors $\mathbb{H}_{N}^{\prime}\mathbf{f}$ are measurable, conditional weak convergence of the finite-dimensional marginals (henceforth CWCM) can therefore be defined as

[TABLE]

for every $\mathbf{t}\in\mathbb{R}^{r}$ , $\mathbf{f}\in\mathcal{F}^{r}$ and for every $r=1,2,\dots$ , where $\mathbb{H}^{\prime}\mathbf{f}:=(\mathbb{H}^{\prime}f_{1},\mathbb{H}^{\prime}f_{2},\dots,\mathbb{H}^{\prime}f_{r})^{\intercal}$ is the finite dimensional vector of random variables corresponding to some $\mathcal{F}$ -indexed stochastic process $\mathbb{H}^{\prime}:=\{\mathbb{H}^{\prime}f:f\in\mathcal{F}\}$ .

It is not difficult to prove that CWCM is a necessary condition for opCWC (oasCWC). The next theorem says that CWCM is already equivalent to opCWC (oasCWC) if the index set $\mathcal{F}$ is finite. In its statement $z\upharpoonright\mathcal{G}$ indicates the restriction of some function $z\in l^{\infty}(\mathcal{F})$ to a subset $\mathcal{G}$ of $\mathcal{F}$ .

Theorem 3.

Let $\mathcal{G}$ be a finite subset of $\mathcal{F}$ . Under the assumptions made at the beginning of this section

[TABLE]

is measurable, and CWCM is equivalent to

[TABLE]

Proof.

The proof is the same as the proof of Corollary 3.1 in [11]. ∎

Up to now it has already been shown that CAT and CWCM are necessary conditions for opCWC (oasCWC) and that CWCM is equivalent to opCWC (oasCWC) when $\mathcal{F}$ is finite. The next theorem says that CAT and CWCM together are sufficient conditions for opCWC (oasCWC) regardless of the cardinality of $\mathcal{F}$ .

Theorem 4.

Assume that $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ satisfies CAT and CWCM for some stochastic process $\{\mathbb{H}^{\prime}f:f\in\mathcal{F}\}$ . Then there exists a version of the stochastic process $\mathbb{H}^{\prime}$ which is a Borel measurable and tight mapping from some probability space into $l^{\infty}(\mathcal{F})$ such that opCWC (oasCWC) as defined in (4) holds.

Proof.

Consider the stochastic process $\{\mathbb{H}^{\prime}f:f\in\mathcal{F}\}$ from CWCM. The first step of the proof is to show that CAT and CWCM imply that there exists a version of $\mathbb{H}^{\prime}$ with the following properties:

a)

$\mathbb{H}^{\prime}$ is a Borel measurable and tight mapping into $l^{\infty}(\mathcal{F})$ ;

b)

the sample paths $f\mapsto\mathbb{H}^{\prime}f$ are uniformly $\rho$ -continuous with probability $1$ .

To this aim note that by the characterization of CAT given in (ii) of Theorem 1 there must exist a semimetric $\rho$ for which $\mathcal{F}$ is totally bounded and for which $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is conditionally AEC. Now, consider a countable and dense (w.r.t. $\rho$ ) subset $\mathcal{G}$ of $\mathcal{F}$ and note that the random variable

[TABLE]

is measurable. Conclude that

[TABLE]

is a conditional probability in the proper sense and use this fact along with (9) in order to show that for every $\epsilon,\eta>0$ there exists a $\delta>0$ such that

[TABLE]

i.e. that the sequence of mappings $\{\mathbb{H}_{N}^{\prime}\upharpoonright\mathcal{G}\}_{N=1}^{\infty}$ is asymptotically uniformly $\rho$ -equicontinuous in probability as defined on page 37 in [17]. By Theorem 1.5.7 on the same page in [17] it follows that $\{\mathbb{H}_{N}^{\prime}\upharpoonright\mathcal{G}\}_{N=1}^{\infty}$ is asymptotically tight, and since CWCM implies unconditional convergence of the marginal distributions of $\{\mathbb{H}_{N}^{\prime}\upharpoonright\mathcal{G}\}_{N=1}^{\infty}$ , it follows by Theorem 1.5.4 on page 35 in [17] that there exists a Borel measurable and tight mapping from some probability space into $l^{\infty}(\mathcal{G})$ , call it $\widetilde{\mathbb{H}}^{\prime}$ , such that $\mathbb{H}_{N}^{\prime}\rightsquigarrow\widetilde{\mathbb{H}}^{\prime}$ in $l^{\infty}(\mathcal{G})$ . Moreover, by Addendum 1.5.8 on page 37 in [17] it follows that the sample paths of $\widetilde{\mathbb{H}}^{\prime}$ are uniformly $\rho$ -continuous with probability $1$ . Let $c:l^{\infty}(\mathcal{G})\mapsto l^{\infty}(\mathcal{F})$ be the mapping which carries the uniformly $\rho$ -continuous functions in $l^{\infty}(\mathcal{G})$ to their uniformly $\rho$ -continuous extensions in $l^{\infty}(\mathcal{F})$ , and which transforms all other functions in $l^{\infty}(\mathcal{G})$ into the zero function in $l^{\infty}(\mathcal{F})$ . Then the mapping $c$ is certainly measurable and $\widehat{\mathbb{H}}^{\prime}:=c(\widetilde{\mathbb{H}}^{\prime})$ is a continuous function of $\widetilde{\mathbb{H}}^{\prime}$ with probability $1$ . It follows that $\widehat{\mathbb{H}}^{\prime}$ is a Borel measurable and tight mapping into $l^{\infty}(\mathcal{F})$ whose sample paths are uniformly $\rho$ -continuous with probability $1$ .
In order to prove a) and b) it suffices now to show that the finite dimensional distributions of $\widehat{\mathbb{H}}^{\prime}$ are the same as those of the limit process $\{\mathbb{H}^{\prime}f:f\in\mathcal{F}\}$ from CWCM, i.e. that every finite dimensional vector $\mathbf{f}:=(f_{1},f_{2},\dots,f_{r})^{\intercal}\in\mathcal{F}^{r}$ satisfies the condition

[TABLE]

where $C_{b}(\mathbb{R}^{r})$ is the set of all continuous and bounded functions $g:\mathbb{R}^{r}\mapsto\mathbb{R}$ . If all the components of $\mathbf{f}$ are elements of $\mathcal{G}$ , then (15) follows directly from CWCM and the definition of $\widehat{\mathbb{H}}^{\prime}$ . Otherwise, if some or all of the components of $\mathbf{f}$ are elements of $\mathcal{F}$ but not of $\mathcal{G}$ , then there exists a sequence $\{\mathbf{f}_{\nu}\}_{\nu=1}^{\infty}:=\{(f_{\nu,1},f_{\nu,2},\dots,f_{\nu,r})^{\intercal}\}_{\nu=1}^{\infty}$ in $\mathcal{G}^{r}$ such that $\rho_{r}(\mathbf{f}_{\nu},\mathbf{f}):=\max_{1\leq i\leq r}\rho(f_{i},f_{\nu,i})\rightarrow 0$ . Since the sample paths of $\widehat{\mathbb{H}}^{\prime}$ are uniformly $\rho$ -continuous with probability $1$ , it follows that $\widehat{\mathbb{H}}^{\prime}\mathbf{f}_{\nu}\overset{as}{\rightarrow}\widehat{\mathbb{H}}^{\prime}\mathbf{f}$ and hence that $\widehat{\mathbb{H}}^{\prime}\mathbf{f}_{\nu}\rightsquigarrow\widehat{\mathbb{H}}^{\prime}\mathbf{f}$ which is the same as $\mathbb{H}^{\prime}\mathbf{f}_{\nu}\rightsquigarrow\widehat{\mathbb{H}}^{\prime}\mathbf{f}$ . Now, consider $\mathcal{G}^{\dagger}:=\mathcal{G}\cup\{f_{1},f_{2},\dots,f_{r}\}$ and define $\widehat{\mathbb{H}}^{\dagger}$ in terms of $\mathcal{G}^{\dagger}$ in the same way as $\widehat{\mathbb{H}}^{\prime}$ has been defined above in terms of $\mathcal{G}$ . Then, as before, it follows by CWCM that (15) holds with $\widehat{\mathbb{H}}^{\dagger}\mathbf{f}$ in place of $\widehat{\mathbb{H}}^{\prime}\mathbf{f}$ , and since the sample paths of $\widehat{\mathbb{H}}^{\dagger}$ are uniformly $\rho$ -continuous with probability $1$ , it follows that $\widehat{\mathbb{H}}^{\dagger}\mathbf{f}_{\nu}\rightsquigarrow\widehat{\mathbb{H}}^{\dagger}\mathbf{f}$ .
Since for every $\nu=1,2,\dots$ the three random vectors $\widehat{\mathbb{H}}^{\dagger}\mathbf{f}_{\nu}$ , $\widehat{\mathbb{H}}^{\prime}\mathbf{f}_{\nu}$ and $\mathbb{H}^{\prime}\mathbf{f}_{\nu}$ all have the same distribution and since the distributions of $\widehat{\mathbb{H}}^{\dagger}\mathbf{f}$ and $\mathbb{H}^{\prime}\mathbf{f}$ are the same as well, this implies

[TABLE]

and hence that condition (15) holds also when some or all of the components of $\mathbf{f}$ are in $\mathcal{F}$ but not in $\mathcal{G}$ . This shows that the marginal distributions of $\widehat{\mathbb{H}}^{\prime}$ are the same as those of the stochastic process $\{\mathbb{H}^{\prime}f:f\in\mathcal{F}\}$ and hence that there exists a version of the latter process which satisfies a) and b).

Next, it will be shown that if $\mathbb{H}^{\prime}$ is a version of $\{\mathbb{H}^{\prime}f:f\in\mathcal{F}\}$ which satisfies a) and b), then opCWC (oasCWC) as defined in (4) holds (this part of the proof is essentially the same as the proof of Theorem 2.9.6 on page 182 in [17]). To this aim define for each $\delta>0$ a corresponding set $\mathcal{G}_{\delta}$ which contains the centers of a collection of open balls of $\rho$ -radius $\delta$ which cover $\mathcal{F}$ . Since $\mathcal{F}$ is totally bounded w.r.t. $\rho$ , each $\mathcal{G}_{\delta}$ can be chosen to be finite. Then define for each fixed $\delta>0$ a mapping $\Pi_{\delta}:\mathcal{F}\mapsto\mathcal{G}_{\delta}$ which maps $f\in\mathcal{F}$ to the element $g\in\mathcal{G}_{\delta}$ which is closest to $f$ . If there is more than one $g\in\mathcal{G}_{\delta}$ which minimizes $\rho(f,g)$ , $\Pi_{\delta}(f)$ can be defined to be any such $g$ . Since the sample paths of $\mathbb{H}^{\prime}$ are uniformly $\rho$ -continuous with probability $1$ , it follows that $\lVert\mathbb{H}^{\prime}\circ\Pi_{\delta}-\mathbb{H}^{\prime}\rVert_{\mathcal{F}}\overset{as}{\rightarrow}0$ if $\delta\rightarrow 0$ and hence that

[TABLE]

Next, it will be shown that

[TABLE]

To this aim, define for each $\delta>0$ the mapping $A_{\delta}:l^{\infty}(\mathcal{G}_{\delta})\mapsto l^{\infty}(\mathcal{F})$ by $A_{\delta}(z):=z\circ\Pi_{\delta}$ and note that $A_{\delta}$ transforms a function $z\in l^{\infty}(\mathcal{G}_{\delta})$ into a function $z^{\prime}\in l^{\infty}(\mathcal{F})$ by extending the domain from $\mathcal{G}_{\delta}$ to $\mathcal{F}$ : for $f\in\mathcal{G}_{\delta}$ the new function $z^{\prime}$ remains the same (in fact, $z^{\prime}(f):=z(\Pi_{\delta}(f))=z(f)$ ), and the new function $z^{\prime}$ is constant on each level set of $\Pi_{\delta}$ (since $\mathcal{G}_{\delta}$ is finite there is only a finite number of such level sets and the range of the new function $z^{\prime}$ must therefore be finite as well). Then, for $h:l^{\infty}(\mathcal{F})\mapsto\mathbb{R}$ and $H$ an arbitrary $\mathcal{F}$ -indexed stochastic process it follows that $h(H\circ\Pi_{\delta})=h\circ A_{\delta}(H\restriction\mathcal{G}_{\delta})$ . Moreover, if $h\in BL_{1}(l^{\infty}(\mathcal{F}))$ , then

[TABLE]

and the composition $h\circ A_{\delta}$ is therefore a member of $BL_{1}(l^{\infty}(\mathcal{G}_{\delta}))$ , i.e. of the set of all functions $g:l^{\infty}(\mathcal{G}_{\delta})\mapsto[0,1]$ such that $|g(z_{1})-g(z_{2})|\leq\lVert z_{1}-z_{2}\rVert_{\mathcal{G}_{\delta}}$ for every $z_{1},z_{2}\in l^{\infty}(\mathcal{G}_{\delta})$ . It follows that the supremum on the left side in (16) is bounded by

[TABLE]

which, by Theorem 3, is measurable and goes to zero in probability (almost surely). This proves (16).
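The roles of $\Pi_{\delta}$ and $A_{\delta}$ in the argument above can be illustrated on a finite index set. This is a minimal sketch, assuming a toy index set of 21 integers with a scaled absolute-value semimetric and five hand-picked centers; the helper names `nearest_center_map`, `extend` and `sup_distance` are hypothetical.

```python
def nearest_center_map(points, centers, rho):
    """Pi_delta: send each index f to a center g minimising rho(f, g).

    Ties are broken by the order of `centers` (any choice is allowed)."""
    return {f: min(centers, key=lambda g: rho(f, g)) for f in points}


def extend(z, projection):
    """A_delta(z) = z o Pi_delta, a function defined on all of `points`."""
    return {f: z[g] for f, g in projection.items()}


def sup_distance(z1, z2):
    """Sup-norm distance between two functions with the same domain."""
    return max(abs(z1[f] - z2[f]) for f in z1)


# Toy setting (an assumption of this sketch).
points = list(range(21))
rho = lambda a, b: abs(a - b) / 20
centers = [0, 5, 10, 15, 20]      # open balls of rho-radius 0.15 cover `points`
projection = nearest_center_map(points, centers, rho)

z1 = {0: 0.3, 5: -1.0, 10: 2.0, 15: 0.0, 20: 1.5}
z2 = {0: 0.1, 5: -0.4, 10: 2.5, 15: 0.2, 20: 1.5}
e1, e2 = extend(z1, projection), extend(z2, projection)

print(sup_distance(z1, z2), sup_distance(e1, e2))
```

Because the extension only repeats values already taken on the centers, the sup-norm distance does not increase, which is why $h\circ A_{\delta}$ inherits the bounded Lipschitz property from $h$ .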

Finally, in order to complete the proof it remains to show that for every $\epsilon>0$ there exists a $\delta>0$ such that

[TABLE]

for some sequence of random variables $\{\widetilde{C}_{N}\}_{N=1}^{\infty}$ which goes to zero in probability (almost surely). To this aim note that the left side in the last display is bounded by

[TABLE]

and that by (ii) of Theorem 1 there must exist a $\delta>0$ such that

[TABLE]

for some sequence of nonnegative random variables $\widetilde{C}_{N}\overset{P(as)}{\rightarrow}0$ . The proof of the theorem is now complete. ∎

Corollary 1.

Assume that $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ satisfies CWCM for some stochastic process $\{\mathbb{H}^{\prime}f:f\in\mathcal{F}\}$ , and assume that there exists a semimetric $\rho$ for which $\mathcal{F}$ is totally bounded and for which $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is conditionally AEC. Then, it follows that

(i)

there exists a version of $\mathbb{H}^{\prime}$ which is a Borel measurable and tight mapping from some probability space into $l^{\infty}(\mathcal{F})$ such that opCWC (oasCWC) as defined in (4) holds;

(ii)

the sample paths of $\mathbb{H}^{\prime}$ are uniformly $\rho$ -continuous with probability $1$ .

Proof.

It is easy to see that CWCM implies that the marginals of the sequence $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ are CAT as required by (ii) in the statement of Theorem 1. Thus, it follows from Theorem 1 that $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ is CAT and the above proof of Theorem 4 yields the two conclusions of the corollary. ∎

Remark 2.

As already pointed out in Section 2, conditional weak convergence is apparently not strong enough to imply asymptotic measurability of $\{\mathbb{H}_{N}^{\prime}\}_{N=1}^{\infty}$ which is a necessary condition for unconditional weak convergence. However, if for every $\mathcal{G}\subset\mathcal{F}$ the corresponding supremum

[TABLE]

is measurable, then it will be certainly true that conditional weak convergence is stronger than unconditional weak convergence. In fact, if the suprema in the above display are measurable, then it follows that the probabilities on the left sides in (9) and in (10) are conditional probabilities in the proper sense and hence that CAT implies asymptotic tightness in the usual unconditional sense. Since conditional weak convergence implies CWCM and since CWCM is certainly stronger than unconditional convergence of the marginal distributions, it follows by Theorem 1.5.4 on page 35 in [17] that conditional weak convergence implies unconditional weak convergence in this case.

Theorem 5.

Let $\{\mathbb{H}_{N}\}_{N=1}^{\infty}$ be a sequence of mappings from the probability space (2) into $l^{\infty}(\mathcal{F})$ and assume that each $\mathbb{H}_{N}$ depends on the sample points $\omega\in\Omega_{y,x}^{\infty}\times\Omega_{d}$ only through $\mathbf{Y}_{N}$ and $\mathbf{X}_{N}$ . Moreover, assume that $\mathbb{H}_{N}\rightsquigarrow\mathbb{H}$ in $l^{\infty}(\mathcal{F})$ with $\mathbb{H}$ a Borel measurable and tight mapping into $l^{\infty}(\mathcal{F})$ , and assume that opCWC as defined in (4) holds. Then it follows that

[TABLE]

with $\mathbb{H}$ and $\mathbb{H}^{\prime}$ independent.

Proof.

The proof is the same as the proof of Corollary 3.2 in [11]. ∎

4 Conditional Poisson sampling (or rejective sampling)

This section reviews some theoretical results about rejective sampling. The basic theory for this sampling design was developed by Hájek [6]. Some of the results contained in his paper will be needed in the next section. The relevant ones will be singled out in what follows.

Recall that a rejective sampling design is a conditional Poisson sampling design where the final sample is rejected unless its size equals a given natural number $n$ . Equivalently, rejective sampling can also be defined as random sampling with replacement of a fixed number $n$ of units according to specified selection probabilities where the final sample is rejected and the sampling procedure is repeated until a sample of $n$ different units is obtained. Rejective sampling is of great interest to researchers and practitioners because it provides the largest possible entropy subject to the constraints of a fixed sample size and given first order sample inclusion probabilities (see Theorem 3.4 in Hájek, 1981). By the definition of rejective sampling as conditional Poisson sampling it follows that the pdf of the vector of sample inclusion indicators is given by

[TABLE]

where $\mathbf{p}_{N}:=(p_{1,N},p_{2,N},\dots,p_{N,N})\in(0,1)^{N}$ is the vector of first order sample inclusion probabilities of the underlying Poisson sampling design, and where $\Omega_{N,n}:=\{\mathbf{s}_{N}\in\{0,1\}^{N}:\sum_{i=1}^{N}s_{i}=n\}$ is the set of all possible realizations of the vector of sample inclusion indicators $\mathbf{S}_{N}$ that give rise to samples of size $n$ . Of course the definition of rejective sampling and hence of $\mathfrak{p}_{N}^{R}(\mathbf{s}_{N};\mathbf{p}_{N})$ can also be extended to the case where some or even all of the $p_{i,N}$ ’s are $0$ or $1$ . However, if $p_{i,N}=0$ for some $i=1,2,\dots,N$ , then the corresponding population unit $i$ will be excluded from the sample with probability $1$ and unbiased estimation of population characteristics is hence impossible. Since the Horvitz-Thompson estimator is not well-defined in this case, we shall henceforth consider only rejective sampling designs such that $p_{i,N}>0$ for every $i=1,2,\dots,N$ . On the other hand, if $p_{i,N}=1$ for some $i=1,2,\dots,N$ , then the corresponding population unit $i$ will be included in the sample with probability $1$ and rejective sampling with sample size $n$ from a population of size $N$ will be equivalent to rejective sampling with sample size $n-1$ from a population of size $N-1$ . In this case the pdf of the vector of sample inclusion indicators will still be defined as in (17) provided that $n$ is at least as large as the number of population units for which $p_{i,N}=1$ . For smaller values of $n$ a corresponding rejective sampling design obviously does not exist.
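The definition of rejective sampling as Poisson sampling conditioned on the sample size can be sketched for a very small population. The code below is only an illustration under stated assumptions: it enumerates $\Omega_{N,n}$ by brute force (feasible only for small $N$ ), and the names `cps_pmf` and `draw_rejective` as well as the example vector `p` are hypothetical.

```python
import itertools
import random


def poisson_weight(s, p):
    """Probability of the indicator vector s under Poisson sampling with probabilities p."""
    w = 1.0
    for s_i, p_i in zip(s, p):
        w *= p_i if s_i else 1.0 - p_i
    return w


def cps_pmf(p, n):
    """Map each sample of size n to its probability: Poisson pmf conditioned on size n."""
    N = len(p)
    weights = {}
    for chosen in itertools.combinations(range(N), n):
        s = tuple(1 if i in chosen else 0 for i in range(N))
        weights[s] = poisson_weight(s, p)
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def draw_rejective(p, n, rng):
    """Draw Poisson samples and reject them until the sample size equals n."""
    while True:
        s = tuple(1 if rng.random() < p_i else 0 for p_i in p)
        if sum(s) == n:
            return s


p = [0.2, 0.5, 0.6, 0.7]          # illustrative Poisson inclusion probabilities
n = 2
pmf = cps_pmf(p, n)
sample = draw_rejective(p, n, random.Random(0))
print(len(pmf), sample)
```

Samples whose size differs from $n$ do not appear in the dictionary, which corresponds to probability zero outside $\Omega_{N,n}$ .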

Note that in general there are infinitely many underlying Poisson sampling designs which give rise to the same rejective sampling design. In fact, if the vector of first order inclusion probabilities corresponding to the underlying Poisson sampling design is changed to $\mathbf{p}_{N}^{\prime}:=(p_{1,N}^{\prime},p_{2,N}^{\prime},\dots,p_{N,N}^{\prime})$ in such a way that for some fixed constant $c>0$

[TABLE]

then $\mathfrak{p}_{N}^{R}(\mathbf{s}_{N};\mathbf{p}_{N})=\mathfrak{p}_{N}^{R}(\mathbf{s}_{N};\mathbf{p}_{N}^{\prime})$ for every sample $\mathbf{s}_{N}\in\{0,1\}^{N}$ . The underlying Poisson sampling design is called canonical if its first order sample inclusion probabilities are chosen so that $\sum_{i=1}^{N}p_{i,N}=n$ , i.e. so that the expected sample size of the underlying Poisson sampling design equals the fixed sample size of the rejective sampling plan. Of course, the first order sample inclusion probabilities corresponding to a rejective sampling design are in general different from those corresponding to any of the underlying Poisson sampling designs. However, Hájek (1964) showed that, in some asymptotic sense, they are uniformly close to those corresponding to the underlying canonical Poisson sampling design (see Theorem 5.1 in Hájek’s paper). In fact, Hájek proved the following result:

Result 1.

Let $\mathbf{\underline{\pi}}_{N}:=(\pi_{1,N},\pi_{2,N},\dots,\pi_{N,N})\in(0,1)^{N}$ be the vector of first order sample inclusion probabilities for a rejective sampling design and let $\mathbf{p}_{N}:=(p_{1,N},p_{2,N},\dots,p_{N,N})$ be the vector of first order sample inclusion probabilities of the corresponding canonical Poisson sampling design. Then it follows that

(i)

[TABLE]

(note that $d_{N}$ is the variance of the random sample size corresponding to the canonical Poisson sampling design);

(ii)

[TABLE]

(iii)

[TABLE]
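Both the existence of a canonical underlying Poisson design and the closeness of the $\pi_{i,N}$ ’s to the canonical $p_{i,N}$ ’s can be explored numerically. The sketch below makes several assumptions that are not taken from the paper: the rescaling is done on the odds $p_{i,N}/(1-p_{i,N})$ by bisection, the rejective inclusion probabilities are computed from the Poisson-binomial distribution of the sample size, and the probability vectors are arbitrary examples.

```python
def poisson_binomial_pmf(probs, kmax):
    """P(K = k), k = 0..kmax, for K a sum of independent Bernoulli(probs[i])."""
    dist = [1.0] + [0.0] * kmax
    for q in probs:
        for k in range(kmax, 0, -1):
            dist[k] = dist[k] * (1.0 - q) + dist[k - 1] * q
        dist[0] *= 1.0 - q
    return dist


def cps_inclusion_probs(p, n):
    """pi_i = p_i * P(sum_{j != i} S_j = n - 1) / P(sum_j S_j = n), for n >= 1."""
    denom = poisson_binomial_pmf(p, n)[n]
    return [p[i] * poisson_binomial_pmf(p[:i] + p[i + 1:], n - 1)[n - 1] / denom
            for i in range(len(p))]


def canonical_probs(p, n):
    """Rescale all odds by a common c > 0 so that the probabilities sum to n (0 < n < N)."""
    odds = [q / (1.0 - q) for q in p]
    size = lambda c: sum(c * w / (1.0 + c * w) for w in odds)
    lo, hi = 0.0, 1.0
    while size(hi) < n:
        hi *= 2.0
    for _ in range(200):               # plain bisection on the scaling constant c
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if size(mid) < n else (lo, mid)
    c = 0.5 * (lo + hi)
    return [c * w / (1.0 + c * w) for w in odds]


# Rescaling the odds leaves the rejective inclusion probabilities unchanged.
p_raw, n = [0.1, 0.2, 0.3, 0.4, 0.5], 2
p_can = canonical_probs(p_raw, n)
pi_raw = cps_inclusion_probs(p_raw, n)
pi_can = cps_inclusion_probs(p_can, n)


def max_relative_gap(N):
    """max_i |pi_i / p_i - 1| for a canonical example with n = N / 2 (N even)."""
    p = [0.2 + 0.6 * (i + 0.5) / N for i in range(N)]   # sums to N / 2
    pi = cps_inclusion_probs(p, N // 2)
    return max(abs(a / b - 1.0) for a, b in zip(pi, p))


gap_small, gap_large = max_relative_gap(10), max_relative_gap(60)
print(gap_small, gap_large)
```

In this example the relative gap shrinks as the population and hence $d_{N}$ grows, in line with the uniform closeness described above.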

Actually, Hájek did not use the double subscript notation to indicate the $p_{i,N}$ ’s and the $\pi_{i,N}$ ’s. However, it is easily checked that all the proofs given in his paper are actually meant for sequences of rejective sampling designs and corresponding canonical Poisson sampling designs where all the first order sample inclusion probabilities can be redefined as the population size $N$ increases. With respect to Result 1 it is worth noting that for every properly scaled vector of first order sample inclusion probabilities $\mathbf{\underline{\pi}}_{N}\in(0,1)^{N}$ there exists a corresponding rejective sampling design. In other words, for every $\mathbf{\underline{\pi}}_{N}\in(0,1)^{N}$ such that $\sum_{i=1}^{N}\pi_{i,N}=n$ for some $n=0,1,2,\dots,N$ there exists a corresponding vector $\mathbf{p}_{N}\in(0,1)^{N}$ such that

[TABLE]

This result has been shown by [5] but it can also be viewed as a consequence of a well-known theorem about exponential families (see Theorem 5 on page 67 in [15]). Fast algorithms to recover the vector $\mathbf{p}_{N}$ corresponding to a given vector $\mathbf{\underline{\pi}}_{N}$ and vice versa can be found in [3].
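For small populations the passage from $\mathbf{\underline{\pi}}_{N}$ to a canonical $\mathbf{p}_{N}$ can be sketched with a plain fixed-point iteration. This is not claimed to be one of the algorithms of [3]; the update rule $\mathbf{p}\leftarrow\mathbf{p}+\mathbf{\underline{\pi}}_{N}-\mathbf{\underline{\pi}}(\mathbf{p})$ , the helper names and the target vector are assumptions of this sketch, and convergence is only checked numerically on the example rather than proved.

```python
def poisson_binomial_pmf(probs, kmax):
    """P(K = k), k = 0..kmax, for K a sum of independent Bernoulli(probs[i])."""
    dist = [1.0] + [0.0] * kmax
    for q in probs:
        for k in range(kmax, 0, -1):
            dist[k] = dist[k] * (1.0 - q) + dist[k - 1] * q
        dist[0] *= 1.0 - q
    return dist


def cps_inclusion_probs(p, n):
    """Rejective inclusion probabilities for Poisson probabilities p and size n >= 1."""
    denom = poisson_binomial_pmf(p, n)[n]
    return [p[i] * poisson_binomial_pmf(p[:i] + p[i + 1:], n - 1)[n - 1] / denom
            for i in range(len(p))]


def canonical_from_inclusion(pi_target, n, tol=1e-12, max_iter=1000):
    """Search for p with sum(p) = n whose rejective inclusion probabilities equal pi_target."""
    p = list(pi_target)                      # start from the target itself
    for _ in range(max_iter):
        pi = cps_inclusion_probs(p, n)
        step = [t - a for t, a in zip(pi_target, pi)]
        p = [q + d for q, d in zip(p, step)]  # the update keeps sum(p) equal to n
        if max(abs(d) for d in step) < tol:
            return p
    raise RuntimeError("fixed-point iteration did not converge")


pi_target = [0.2, 0.3, 0.4, 0.6, 0.7, 0.8]   # illustrative target, sums to n = 3
n = 3
p_found = canonical_from_inclusion(pi_target, n)
pi_check = cps_inclusion_probs(p_found, n)
print([round(q, 4) for q in p_found])
```

Since the target and the current inclusion probabilities both sum to $n$ , every update preserves $\sum_{i}p_{i}=n$ , so the iterate remains canonical throughout.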

Having shown Result 1 and some similar approximation results (also for second order sample inclusion probabilities), Hájek [6] moves on to show asymptotic normality for the sequence of Horvitz-Thompson estimators. To this aim, he introduces a new sampling design, call it $P_{0}$ , which approximates the canonical Poisson sampling design associated with the rejective sampling design of interest. The sampling design $P_{0}$ can be implemented in three steps according to the following procedure. At the first step a sample of size $n$ is selected using the rejective sampling design of interest. As before, let $\mathbf{p}_{N}$ denote the vector of first order sample inclusion probabilities of the corresponding canonical Poisson sampling design. Then, independently from the outcome of the first step, a new random experiment is performed to ascertain the sample size $K$ of a Poisson sampling design with sample inclusion probabilities given by $\mathbf{p}_{N}$ . If $K=n$ , then the final sample according to $P_{0}$ will be the rejective sample obtained at the first step. However, if $K>n$ , rejective sampling is used again to select a sample of size $K-n$ from the population units that were not included in the first rejective sample, and this new sample is added to the sample obtained at the first step in order to obtain the final sample according to $P_{0}$ . The first order sample inclusion probabilities of the canonical Poisson sampling design that underlies the second rejective sampling plan are proportional to the $p_{i,N}$ -values of the population units that are not included in the first rejective sample.
On the other hand, if $K<n$ , rejective sampling is used again to select a sample of size $n-K$ from the population units that were already included in the rejective sample obtained at the first step, and the final sample according to $P_{0}$ is obtained by removing the units that are included in the second rejective sample from those which were already included in the first one. In this case, the first order sample inclusion probabilities of the underlying canonical Poisson sampling design will be proportional to the values of $1-p_{i,N}$ corresponding to the population units that were already included in the first rejective sample.
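The three-step construction of $P_{0}$ described above can be sketched as follows. Several details are assumptions of this sketch rather than statements from [6]: rejective samples are drawn by plain rejection, the second-stage probabilities are obtained by proportional rescaling and an error is raised if a rescaled value reaches $1$ , and the optional argument `K` (hypothetical) lets the Poisson sample size be fixed for illustration instead of being drawn at random.

```python
import random


def rejective_sample(units, probs, m, rng):
    """Conditional Poisson sample of size m from `units`, by plain rejection."""
    if m == 0:
        return set()
    while True:
        s = {u for u, q in zip(units, probs) if rng.random() < q}
        if len(s) == m:
            return s


def scaled(values, m):
    """Rescale positive values proportionally so that they sum to m."""
    c = m / sum(values)
    q = [c * v for v in values]
    if max(q) >= 1.0:
        raise ValueError("proportional rescaling leaves (0, 1); not handled in this sketch")
    return q


def p0_sample(p, n, rng, K=None):
    """Return (first rejective sample, final P0 sample, Poisson sample size K)."""
    N = len(p)
    first = rejective_sample(range(N), p, n, rng)          # step 1
    if K is None:                                          # step 2: independent Poisson size
        K = sum(rng.random() < q for q in p)
    if K == n:                                             # step 3
        return first, set(first), K
    if K > n:      # add K - n units, probabilities proportional to p_i outside the sample
        outside = [i for i in range(N) if i not in first]
        extra = rejective_sample(outside, scaled([p[i] for i in outside], K - n), K - n, rng)
        return first, first | extra, K
    # K < n: remove n - K units, probabilities proportional to 1 - p_i inside the sample
    inside = sorted(first)
    removed = rejective_sample(inside, scaled([1.0 - p[i] for i in inside], n - K), n - K, rng)
    return first, first - removed, K


# Illustrative canonical probabilities: they sum to n = 6.
p = [0.3, 0.4, 0.5, 0.6, 0.7, 0.5] * 2
n = 6
rng = random.Random(0)
for K_fixed in (4, 6, 8):
    first, final, K = p0_sample(p, n, rng, K=K_fixed)
    print(K, sorted(first), sorted(final))
```

By construction the final sample has size $K$ and differs from the first rejective sample in exactly $|K-n|$ units, which is the quantity that has to be small relative to $n$ for the approximation to be useful.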

Now, to give a formal statement of the sense in which the sample design $P_{0}$ provides an approximation to the canonical Poisson sampling design which underlies the rejective sample plan of interest, it will be convenient to introduce some notation. So, let $\mathbf{S}_{N}^{R}:=(S_{1,N}^{R},S_{2,N}^{R},\dots,S_{N,N}^{R})$ be the vector of sample inclusion indicators that describe the outcome of the rejective sampling design that is used at the first step of the experiment (i.e. the rejective sampling design of interest), and let $\mathbf{S}_{N}^{P_{0}}:=(S_{1,N}^{P_{0}},S_{2,N}^{P_{0}},\dots,S_{N,N}^{P_{0}})$ denote the vector of sample inclusion indicators that identify the final sample according to $P_{0}$ . Moreover, denote their joint pdf by $\mathfrak{p}_{N}^{R,P_{0}}(\cdot,\cdot;\mathbf{p}_{N}):\{0,1\}^{N}\times\{0,1\}^{N}\mapsto[0,1]$ , and the marginal pdfs corresponding to $\mathbf{S}_{N}^{R}$ and $\mathbf{S}_{N}^{P_{0}}$ by $\mathfrak{p}_{N}^{R}(\cdot;\mathbf{p}_{N})$ and $\mathfrak{p}_{N}^{P_{0}}(\cdot;\mathbf{p}_{N})$ , respectively. Finally, let $\mathfrak{p}_{N}^{P}(\cdot;\mathbf{p}_{N})$ be the pdf of the vector of sample inclusion indicators corresponding to the Poisson sampling design with first order inclusion probabilities given by the components of $\mathbf{p}_{N}$ (i.e. the canonical Poisson sampling design which underlies the rejective sampling design of interest). With this notation, the approximation result contained in Lemma 4.3 of Hájek’s paper can now be stated as follows:

Result 2.

The total variation distance

[TABLE]

converges to zero as $d_{N}\rightarrow\infty$ .

Based on Result 1 and Result 2, Hájek [6] proved asymptotic normality of the Horvitz-Thompson estimators corresponding to a sequence of rejective sampling designs as follows. First, he considered the sequence of underlying canonical Poisson sampling designs and showed asymptotic normality for a corresponding sequence of auxiliary statistics $\{T_{N}\}_{N=1}^{\infty}$ . As pointed out by [4] and by [1], the auxiliary statistic $T_{N}$ can be viewed as the residual of the projection of the Horvitz-Thompson estimator corresponding to the canonical Poisson sampling design on the random sample size from the latter design. Thus, if the goal is to estimate $\overline{f}:=\sum_{i=1}^{N}f(Y_{i})/N$ for some given real-valued function $f$ , then the corresponding auxiliary statistic $T_{N}$ can be written as

[TABLE]

where $\mathbf{S}_{N}^{P}:=(S_{1,N}^{P},S_{2,N}^{P},\dots,S_{N,N}^{P})$ is the vector of sample inclusion indicators for the canonical Poisson sampling design, and where

[TABLE]

Note that the design expectation of $T_{N}(f;\mathbf{S}_{N}^{P})$ coincides with $\overline{f}$ , and that the design variance of $T_{N}(f;\mathbf{S}_{N}^{P})$ must be smaller than that of $Y_{N}(f;\mathbf{S}_{N}^{P})$ unless $R_{N}(f)=0$ . However, $T_{N}(f;\mathbf{S}_{N}^{P})$ is not an estimator because it depends on the unknown value of $R_{N}(f)$ . At this point, having proved asymptotic normality for the sequence $\{T_{N}(f;\mathbf{S}_{N}^{P})\}_{N=1}^{\infty}$ , Hájek deduces asymptotic normality for the corresponding sequence $\{T_{N}(f;\mathbf{S}_{N}^{P_{0}})\}_{N=1}^{\infty}$ by using Result 2. From this he gets asymptotic normality for the sequence $\{Y_{N}(f;\mathbf{S}_{N}^{R})\}_{N=1}^{\infty}$ by proving the following result (note that $Y_{N}(f;\mathbf{S}_{N}^{R})=T_{N}(f;\mathbf{S}_{N}^{R})$ because $\sum_{i=1}^{N}S_{i,N}^{R}=n=\sum_{i=1}^{N}p_{i,N}$ ):

Result 3.

Let

[TABLE]

Then,

[TABLE]

where the expectation refers to $\mathfrak{p}_{N}^{R,P_{0}}(\cdot,\cdot;\mathbf{p}_{N})$ .

Actually, in his paper [6], Hájek did not single out Result 3 in a dedicated lemma or theorem, but he proved it in the course of the proof of his Theorem 7.1 which establishes asymptotic normality for the sequence $\{Y_{N}(f;\mathbf{S}_{N}^{R})\}_{N=1}^{\infty}$ . Note that the statement of the latter theorem actually considers only the case where the $p_{i,N}$ ’s are proportional to some size variable, but that the proof of Result 3 given in Hájek’s paper goes through for any sequence of vectors $\{\mathbf{p}_{N}\}_{N=1}^{\infty}$ such that $d_{N}\rightarrow\infty$ . From Result 3 one can finally deduce asymptotic normality for the Horvitz-Thompson estimators corresponding to a sequence of rejective sampling designs by using Result 1.
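The projection interpretation of the auxiliary statistic $T_{N}$ can be checked exactly on a toy population. The following Python sketch is only an illustration under stated assumptions: since the displays defining $T_{N}$ and $R_{N}(f)$ are not reproduced in this text, it assumes the least-squares form $T_{N}=Y_{N}-R_{N}(f)(\sum_{i}S_{i,N}^{P}-\sum_{i}p_{i,N})/N$ with $R_{N}(f)=\sum_{i}f(Y_{i})(1-p_{i,N})/\sum_{i}p_{i,N}(1-p_{i,N})$ . The function name `poisson_design_moments` and the numerical inputs are hypothetical.

```python
from itertools import product


def poisson_design_moments(f_vals, p):
    """Exact design moments of Y_N and T_N under Poisson sampling.

    All 2^N samples are enumerated, so this is only feasible for tiny N.
    """
    N = len(p)
    n_exp = sum(p)                          # expected sample size
    d = sum(pi * (1.0 - pi) for pi in p)    # design variance of the sample size
    # Assumed projection coefficient: least-squares slope of N*Y_N on the sample size.
    R = sum(fi * (1.0 - pi) for fi, pi in zip(f_vals, p)) / d
    e_y = e_t = e_y2 = e_t2 = e_tk = 0.0
    for s in product((0, 1), repeat=N):
        prob = 1.0
        for si, pi in zip(s, p):
            prob *= pi if si else 1.0 - pi
        k = sum(s) - n_exp                  # centered sample size
        y = sum(si * fi / pi for si, fi, pi in zip(s, f_vals, p)) / N
        t = y - R * k / N                   # residual of the projection
        e_y += prob * y
        e_t += prob * t
        e_y2 += prob * y * y
        e_t2 += prob * t * t
        e_tk += prob * t * k
    return {
        "mean_T": e_t,
        "var_Y": e_y2 - e_y ** 2,
        "var_T": e_t2 - e_t ** 2,
        "cov_T_size": e_tk,                 # equals Cov(T, size) because E[k] = 0
    }


moments = poisson_design_moments([1.0, 4.0, 2.0, 5.0], [0.3, 0.6, 0.5, 0.8])
```

Under the assumed form, `mean_T` equals the population mean of the `f_vals`, `cov_T_size` vanishes, and `var_T` falls below `var_Y` by $R_{N}(f)^{2}\sum_{i}p_{i,N}(1-p_{i,N})/N^{2}$ , in line with the remarks made above about the design expectation and the design variance of $T_{N}$ .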

5 Weak convergence theorems for CPS designs

In this section the functional central limit theorems for the rejective sampling case given in [1] will be proven again in somewhat greater generality. Since this requires some assumptions which involve the marginal distribution of the $Y_{i}$ component in $(Y_{i},X_{i})$ , it will be convenient to denote the latter distribution by $P_{y}$ . As usual in the empirical process literature, the symbol $P_{y}$ will also be used to indicate an operator on the function class $\mathcal{F}$ or on related function classes. For example, $P_{y}$ will also be used to indicate the real-valued function $f\in\mathcal{F}\mapsto\int f(y)dP_{y}(y)$ .

The first result of this section settles a measurability issue. It will be used in the rest of this paper without explicitly mentioning it.

Lemma 1.

For $\mathbf{s}_{N}\in\{0,1\}^{N}$ let $\mathfrak{p}_{N}^{R}(\mathbf{s}_{N};\mathbf{X}_{N})$ be the sample selection probabilities for a CPS sampling design, and let $\mathfrak{p}_{N}^{P}(\mathbf{s}_{N};\mathbf{X}_{N})$ be the sample selection probabilities for the corresponding canonical Poisson sampling design. Moreover, let $\underline{\mathbf{\pi}}_{N}$ be the vector of first order sample inclusion probabilities for the CPS design and let $\mathbf{p}_{N}$ be the vector of first order sample inclusion probabilities for the canonical Poisson sampling design. The following statements are equivalent:

(i)

The CPS design is measurable, i.e. for every $\mathbf{s}_{N}\in\{0,1\}^{N}$ the corresponding function $\mathbf{X}_{N}\mapsto\mathfrak{p}_{N}^{R}(\mathbf{s}_{N};\mathbf{X}_{N})$ is a measurable function from $(\mathcal{X},\mathcal{A})^{N}$ into $\mathbb{R}$ ;

(ii)

the vector $\underline{\mathbf{\pi}}_{N}$ is measurable, i.e. the function $\mathbf{X}_{N}\mapsto\underline{\mathbf{\pi}}_{N}$ is a measurable function from $(\mathcal{X},\mathcal{A})^{N}$ into $\mathbb{R}^{N}$ ;

(iii)

the vector $\mathbf{p}_{N}$ is measurable, i.e. the function $\mathbf{X}_{N}\mapsto\mathbf{p}_{N}$ is a measurable function from $(\mathcal{X},\mathcal{A})^{N}$ into $\mathbb{R}^{N}$ ;

(iv)

The canonical Poisson sampling design is measurable, i.e. for every $\mathbf{s}_{N}\in\{0,1\}^{N}$ the corresponding function $\mathbf{X}_{N}\mapsto\mathfrak{p}_{N}^{P}(\mathbf{s}_{N};\mathbf{X}_{N})$ is a measurable function from $(\mathcal{X},\mathcal{A})^{N}$ into $\mathbb{R}$ ;

Proof.

The proofs of the implications (i) $\Rightarrow$ (ii), (iii) $\Rightarrow$ (iv) and (iv) $\Rightarrow$ (i) are easy and the implication (ii) $\Rightarrow$ (iii) follows from Theorem 5 on page 67 in [15] which is a special case of a well-known result about exponential families (see for example [2]). ∎
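To make the objects in Lemma 1 concrete, the following Python sketch builds a CPS design from a given vector $\mathbf{p}_{N}$ by conditioning the canonical Poisson design on the sample size, and then computes the first order inclusion probabilities $\underline{\mathbf{\pi}}_{N}$ of the resulting design. It is a brute-force enumeration intended only for very small $N$ ; the function names are hypothetical and the components of `p` are assumed to lie strictly between $0$ and $1$ .

```python
from itertools import product


def poisson_prob(s, p):
    """Probability of the sample s under Poisson sampling with probabilities p."""
    prob = 1.0
    for si, pi in zip(s, p):
        prob *= pi if si else 1.0 - pi
    return prob


def cps_design(p, n):
    """CPS (rejective) design: the Poisson design conditioned on sample size n."""
    weights = {
        s: poisson_prob(s, p)
        for s in product((0, 1), repeat=len(p))
        if sum(s) == n
    }
    total = sum(weights.values())           # Poisson probability of size n
    return {s: w / total for s, w in weights.items()}


def inclusion_probs(design, N):
    """First order inclusion probabilities of a design given as {sample: prob}."""
    return [sum(prob for s, prob in design.items() if s[i]) for i in range(N)]


p = [0.2, 0.5, 0.3, 0.6, 0.4]               # components sum to n = 2
design = cps_design(p, 2)
pi = inclusion_probs(design, len(p))
```

Each step is a finite composition of products, sums and quotients of the components of $\mathbf{p}_{N}$ , which is the reason why the implications (iii) $\Rightarrow$ (iv) and (iv) $\Rightarrow$ (i) are easy. The inverse map from $\underline{\mathbf{\pi}}_{N}$ back to $\mathbf{p}_{N}$ is not computed here.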

The next lemma establishes conditional convergence of the marginal distributions of the sequence of HTEPs.

Lemma 2 (CWCM).

Let $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ be the sequence of vectors of sample inclusion indicators corresponding to a sequence of measurable CPS designs and let $\{\mathbf{p}_{N}\}_{N=1}^{\infty}$ be the sequence of vectors of first order sample inclusion probabilities of the corresponding sequence of canonical Poisson sampling designs. Let $\mathcal{F}$ be a class of measurable functions $f:\mathcal{Y}\mapsto\mathbb{R}$ and let $\{\mathbb{G}_{N}^{\prime}\}_{N=1}^{\infty}:=\{\{\mathbb{G}_{N}^{\prime}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ be the sequence of HTEPs corresponding to $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ and $\mathcal{F}$ .

Assume that:

A0)

the sequence $\{\mathbf{p}_{N}\}_{N=1}^{\infty}$ is such that

[TABLE]

A1)

there exists a function $\Sigma^{\prime}:\mathcal{F}\times\mathcal{F}\mapsto\mathbb{R}$ such that

[TABLE]

for every $f,g\in\mathcal{F}$ ,

A2)

for every finite-dimensional vector $\mathbf{f}:=(f_{1},f_{2},\dots,f_{r})^{\intercal}\in\mathcal{F}^{r}$ and for every $\epsilon>0$

[TABLE]

where $\lVert\cdot\rVert$ is the euclidean norm on $\mathbb{R}^{r}$ , $\mathbf{f}(Y_{i}):=(f_{1}(Y_{i}),f_{2}(Y_{i}),\dots,f_{r}(Y_{i}))^{\intercal}$ and $R_{N}(\mathbf{f}):=(R_{N}(f_{1}),R_{N}(f_{2}),\dots,R_{N}(f_{r}))^{\intercal}$ .

Then it follows that the function $\Sigma^{\prime}$ is a positive semidefinite covariance function, and for every finite-dimensional vector $\mathbf{f}\in\mathcal{F}^{r}$ and for every $\mathbf{t}\in\mathbb{R}^{r}$

[TABLE]

where $\mathbb{G}_{N}^{\prime}\mathbf{f}:=(\mathbb{G}_{N}^{\prime}f_{1},\mathbb{G}_{N}^{\prime}f_{2},\dots,\mathbb{G}_{N}^{\prime}f_{r})^{\intercal}$ , and where $\Sigma^{\prime}(\mathbf{f})$ is the covariance matrix whose elements are given by $\Sigma^{\prime}_{(ij)}(\mathbf{f}):=\Sigma^{\prime}(f_{i},f_{j})$ , $i,j=1,2,\dots,r$ .

Proof.

Assume WLOG that $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ and the two sequences $\{\mathbf{S}_{N}^{P}\}_{N=1}^{\infty}$ and $\{\mathbf{S}_{N}^{P_{0}}\}_{N=1}^{\infty}$ of the previous subsection are defined in such a way that the sequence of pdfs corresponding to $\{\mathbf{S}_{N}^{P}\}_{N=1}^{\infty}$ is given by $\{\mathfrak{p}_{N}^{P}(\cdot;\mathbf{p}_{N})\}_{N=1}^{\infty}$ , and such that the sequence of joint pdfs corresponding to $\{(\mathbf{S}_{N}^{R},\mathbf{S}_{N}^{P_{0}})\}_{N=1}^{\infty}$ is given by $\{\mathfrak{p}_{N}^{R,P_{0}}(\cdot,\cdot;\mathbf{p}_{N})\}_{N=1}^{\infty}$ . This can be done in many ways by defining each vector $\mathbf{S}_{N}^{P}$ , $\mathbf{S}_{N}^{P_{0}}$ and $\mathbf{S}_{N}^{R}$ as a measurable function of $\mathbf{X}_{N}$ and a single uniform- $[0,1]$ random variable $D$ as described in Section 2. In what follows the sequence of joint pdfs corresponding to $\{(\mathbf{S}_{N}^{P},\mathbf{S}_{N}^{R},\mathbf{S}_{N}^{P_{0}})\}_{N=1}^{\infty}$ will not be relevant.

Now, consider first the sequence of stochastic processes $\{\mathbb{T}_{N}^{P}\}_{N=1}^{\infty}:=\{\{\mathbb{T}_{N}^{P}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ with $\mathbb{T}_{N}^{P}f$ defined as

[TABLE]

Note that $E_{d}\mathbb{T}_{N}^{P}f=0$ for every $f\in\mathcal{F}$ , and that the left side of the display in condition A1 is the sequence of covariances $\Sigma_{N}^{\prime}(f,g):=E_{d}\mathbb{T}_{N}^{P}f\mathbb{T}_{N}^{P}g$ . Now, for $\mathbf{f}\in\mathcal{F}^{r}$ consider the triangular array of rowwise conditionally independent random vectors

[TABLE]

Observe that the random vector $\mathbb{T}_{N}^{P}\mathbf{f}:=(\mathbb{T}_{N}^{P}f_{1},\mathbb{T}_{N}^{P}f_{2},\dots,\mathbb{T}_{N}^{P}f_{r})^{\intercal}$ can be written as

[TABLE]

Using the fact that $\Sigma_{N}^{\prime}(f,g):=E_{d}\mathbb{T}_{N}^{P}f\mathbb{T}_{N}^{P}g\overset{P(as)}{\rightarrow}\Sigma^{\prime}(f,g)$ along with condition A2 it is not difficult to show that the Lindeberg condition

[TABLE]

must be satisfied whenever $\mathbf{f}\in\mathcal{F}^{r}$ and $\mathbf{t}\in\mathbb{R}^{r}$ are such that $\mathbf{t}^{\intercal}\Sigma^{\prime}(\mathbf{f})\mathbf{t}>0$ . Therefore it follows that

[TABLE]

Next, consider the sequence of stochastic processes $\{\mathbb{T}_{N}^{P_{0}}\}_{N=1}^{\infty}:=\{\{\mathbb{T}_{N}^{P_{0}}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ with $\mathbb{T}_{N}^{P_{0}}f$ defined in the same way as $\mathbb{T}_{N}^{P}f$ but with $\mathbf{S}_{N}^{P_{0}}$ in place of $\mathbf{S}_{N}^{P}$ . Use assumption A0 along with Result 2 in Section 4 to show that

[TABLE]

Note that this does not require knowledge of the joint distributions of the vectors $\mathbf{S}_{N}^{P_{0}}$ and $\mathbf{S}_{N}^{P}$ .

Third, consider the sequence of stochastic processes $\{\mathbb{Y}_{N}^{R}\}_{N=1}^{\infty}:=\{\{\mathbb{Y}_{N}^{R}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ with $\mathbb{Y}_{N}^{R}f$ defined in the same way as $\mathbb{Y}_{N}^{P}f$ but with $\mathbf{S}_{N}^{R}$ in place of $\mathbf{S}_{N}^{P}$ . Use Result 3 in Section 4 to conclude that

[TABLE]

as well.

Finally, note that the definition of $\mathbb{Y}_{N}^{R}$ coincides with the one of $\mathbb{G}_{N}^{\prime}$ except for the fact that the former contains the first order sample inclusion probabilities corresponding to $\mathbf{S}_{N}^{P}$ in place of those corresponding to $\mathbf{S}_{N}^{R}$ , i.e. $\mathbb{Y}_{N}^{R}$ contains $p_{i,N}$ in place of $\pi_{i,N}:=E_{d}S_{i,N}^{R}$ . However, this problem can be easily fixed by using Result 1. ∎
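The distinction between $\mathbb{Y}_{N}^{R}$ and $\mathbb{G}_{N}^{\prime}$ drawn at the end of the proof can be illustrated numerically. The following Python sketch draws a rejective sample by naive acceptance-rejection (repeat Poisson draws until the sample size equals $n$ ) and evaluates a Horvitz-Thompson type estimate with an arbitrary vector of weights, so that either the $p_{i,N}$ 's or the $\pi_{i,N}$ 's can be plugged in. This is not the construction via a single uniform random variable $D$ used in the paper; the function names, the seed and the inputs are hypothetical.

```python
import random


def rejective_sample(p, n, rng, max_tries=100_000):
    """Draw inclusion indicators from the CPS design by acceptance-rejection."""
    for _ in range(max_tries):
        # One Poisson draw: independent Bernoulli(p_i) inclusion indicators.
        s = [1 if rng.random() < pi else 0 for pi in p]
        if sum(s) == n:                     # accept only samples of size n
            return s
    raise RuntimeError("no Poisson draw of size n within max_tries")


def ht_type_estimate(f_vals, s, probs):
    """(1/N) * sum_i s_i f_i / probs_i; probs may hold the p_i's or the pi_i's."""
    return sum(si * fi / qi for si, fi, qi in zip(s, f_vals, probs)) / len(s)


rng = random.Random(0)
p = [0.2, 0.5, 0.3, 0.6, 0.4]               # components sum to n = 2
f_vals = [1.0, 4.0, 2.0, 5.0, 3.0]
s = rejective_sample(p, 2, rng)
estimate_with_p = ht_type_estimate(f_vals, s, p)
```

Passing the exact first order inclusion probabilities of the CPS design in place of `p` gives the genuine Horvitz-Thompson estimate; Result 1 is what makes the two versions asymptotically interchangeable.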

Remark 3.

If each vector $\mathbf{p}_{N}$ and every $f\in\mathcal{F}$ are measurable (in their respective senses), then condition A2 will certainly be satisfied if $P_{y}f^{2}<\infty$ for every $f\in\mathcal{F}$ and

A2∗)

there exists a constant $L>0$ such that $\min_{1\leq i\leq N}p_{i,N}>L$ with probability tending to $1$ (eventually almost surely).

Now, Lemma 2 provides sufficient conditions for convergence of the finite-dimensional marginal distributions of the sequence of HTEPs, but in order to establish (conditional) weak convergence in $l^{\infty}(\mathcal{F})$ for infinite function classes $\mathcal{F}$ it must still be shown that the sequence of HTEPs is (conditionally) asymptotically tight in $l^{\infty}(\mathcal{F})$ (for unconditional weak convergence this follows from Theorem 1.5.4 on page 35 in [17], while for conditional weak convergence this follows from Theorem 4 in Section 3). By Theorem 1.5.7 on page 37 in [17] (Theorem 1 in Section 3) this can be done by showing that there exists a semimetric $\rho$ for which $\mathcal{F}$ is totally bounded and for which the HTEP sequence is (conditionally) asymptotically equicontinuous (henceforth AEC). In this paper the choice of the semimetric $\rho$ will depend on the definition of the first order sample inclusion probabilities. In the next subsection it will be seen that if the first order sample inclusion probabilities are bounded away from zero, it is convenient to consider the $L_{2}(P_{y})$ -semimetric

$$\rho(f,g):=[P_{y}(f-g)^{2}]^{1/2},\quad f,g\in\mathcal{F}.$$

The subsequent subsection will then treat the case where the first order sample inclusion probabilities are proportional to some size variable which might take on arbitrarily small values. For that case another semimetric will be used.
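As a small illustration of the $L_{2}(P_{y})$ -semimetric $\rho(f,g)=[P_{y}(f-g)^{2}]^{1/2}$ , the following Python sketch computes its sample analogue, in which $P_{y}$ is replaced by the empirical measure of a list of observations. The function names and the data are hypothetical, and the class of indicator functions is chosen only because the result can be verified by hand.

```python
import math


def empirical_l2_semimetric(f, g, ys):
    """Sample analogue of rho(f, g): root mean squared difference over ys."""
    return math.sqrt(sum((f(y) - g(y)) ** 2 for y in ys) / len(ys))


def indicator(t):
    """The function y -> 1{y <= t}."""
    return lambda y: 1.0 if y <= t else 0.0


ys = [i / 10 for i in range(1, 11)]         # toy data 0.1, 0.2, ..., 1.0
dist = empirical_l2_semimetric(indicator(0.35), indicator(0.75), ys)
```

For indicator functions the squared distance is the fraction of observations falling between the two thresholds, which is why distinct functions can be at distance zero and why $\rho$ is only a semimetric.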

5.1 CPS designs with a positive lower bound on the first order sample inclusion probabilities

The next lemma provides sufficient conditions which make sure that $\mathcal{F}$ is totally bounded w.r.t. the $L_{2}(P_{y})$ semimetric $\rho$ and that the sequence of HTEPs is conditionally AEC w.r.t. $\rho$ .

Lemma 3 (Total boundedness and conditional AEC).

Let $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ and $\{\mathbf{p}_{N}\}_{N=1}^{\infty}$ be defined as in Lemma 2, let $\mathcal{F}$ be a class of measurable functions $f:\mathcal{Y}\mapsto\mathbb{R}$ and let $\{\mathbb{G}_{N}^{\prime}\}_{N=1}^{\infty}:=\{\{\mathbb{G}_{N}^{\prime}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ be the sequence of HTEPs corresponding to $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ and $\mathcal{F}$ . Assume that condition A2∗ holds and that the following assumptions are satisfied:

GC)

$(\mathcal{F}_{\infty})^{2}:=\{(f-g)^{2}:f,g\in\mathcal{F}\}$ is an outer almost sure $P_{y}$ -Glivenko-Cantelli class;

F1)

$\mathcal{F}$ has an envelope function $F$ such that $P_{y}^{*}F^{2}<\infty$ and such that the uniform entropy condition

[TABLE]

holds. In the last display the supremum is taken over all finitely discrete probability measures $Q_{y}$ on $\mathcal{Y}$ such that $\lVert F\rVert_{L_{2}(Q_{y})}^{2}:=\int F^{2}dQ_{y}>0$ .

Then it follows that

(i)

$\mathcal{F}$ is totally bounded w.r.t. $\rho$ ;

(ii)

[TABLE]

where

$$\mathcal{F}_{\delta}:=\{f-g:f,g\in\mathcal{F}\wedge\rho(f,g)<\delta\}.$$

Proof.

Part (i) of the conclusion follows from condition F1 (see Problem 2.5.1 on page 133 in [17]).

The proof of part (ii) of the conclusion is essentially the same as the proof of conditional AEC for the rejective sampling case given in [1] (see pages 12-13 in the supplement to that paper) but it corrects a little mistake in the final part of that proof. First it will be shown that for arbitrary $\delta_{N}\downarrow 0$ the corresponding stochastic processes $\{\mathbb{G}_{N}^{\prime}f:f\in\mathcal{F}_{\delta_{N}}\}$ are, with probability tending to $1$ (or eventually almost surely), conditionally subgaussian w.r.t. the empirical semimetric

$$\rho_{N}(f,g):=\left[\frac{1}{N}\sum_{i=1}^{N}(f(Y_{i})-g(Y_{i}))^{2}\right]^{1/2},$$

i.e. it will be shown that there exists a constant $C>0$ (which depends neither on the sample points $\omega\in\Omega_{y,x}^{\infty}\times\Omega_{d}$ nor on $N$ ) such that, with probability tending to $1$ (or eventually almost surely),

[TABLE]

To this aim, write

[TABLE]

so that, for every $x,\lambda>0$ ,

[TABLE]

by Markov’s inequality. Now, note that by Theorem 2.8 in [8] the components of $\mathbf{S}_{N}^{R}$ are negatively associated, and hence it follows that

[TABLE]

Since $E_{d}Z_{i,N}=0$ for every $i=1,2,\dots,N$ and for every $N=1,2,\dots$ , and since

[TABLE]

it follows from Hoeffding’s lemma (see [7]) that

[TABLE]

By assumption A2∗ and Result 1 from Hájek’s paper the right side does not exceed

[TABLE]

with probability tending to one (or eventually almost surely). In combination with (21) and (22) this shows that

[TABLE]

with probability tending to $1$ (or eventually almost surely). Combining this inequality with the same inequality for $\mathbb{G}_{N}^{\prime}g-\mathbb{G}_{N}^{\prime}f$ shows that

[TABLE]

with probability tending to $1$ (or eventually almost surely). Finally, optimizing the right side w.r.t. $\lambda>0$ yields the subgaussian inequality in (20) with $C=2L^{2}$ .

Now, note that

[TABLE]

and that the uniform entropy condition in assumption F1 implies that for every $\epsilon>0$ the square of the corresponding entropy number on the far right must be finite. From this it follows that $\mathcal{F}_{\delta_{N}}$ contains a countable subset $\mathcal{G}_{\delta_{N}}(\mathbf{Y}_{N})$ (note that this subset depends on $\mathbf{Y}_{N}$ ) such that

[TABLE]

As a consequence, the stochastic process $\{\mathbb{G}_{N}^{\prime}f:f\in\mathcal{F}_{\delta_{N}}\}$ is separable in the sense required for an application of Corollary 2.2.8 on page 101 in [17] with respect to the sample design distribution of the process. Since it has already been shown that the subgaussian inequality (20) holds with probability tending to $1$ (or eventually almost surely), it follows by the second part of the conclusion of the just cited corollary that there exists a constant $K>0$ (which depends neither on the sample points $\omega\in\Omega_{y,x}^{\infty}\times\Omega_{d}$ nor on $N$ ) such that

[TABLE]

with inner probability tending to one (or eventually inner almost surely), where $D(\epsilon,\mathcal{F}_{\delta_{N}},L_{2}(\mathbb{P}_{y,N}))$ denotes the packing number, i.e. the cardinality of the largest subset $\mathcal{H}$ of $\mathcal{F}_{\delta_{N}}$ such that $\rho_{N}(f,g)>\epsilon$ for every $f,g\in\mathcal{H}$ . Since

[TABLE]

it follows that the right side in (23) is bounded by a constant multiple of

[TABLE]

The proof can now be completed by using assumptions GC and F1 in order to show that the latter integral goes to zero outer almost surely. This can be done as in the proof of Theorem 2.5.2 on page 127 in [17] (see the lines following display (2.5.3) on page 128 in [17]; see also Remark 4 below in order to see that assumption GC can be replaced with a measurability condition). ∎
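The two analytic ingredients of the subgaussian step, Hoeffding's lemma and the optimization over $\lambda$ , can be checked numerically on a single summand. The following Python sketch is a hedged illustration: the displays defining $Z_{i,N}$ are not reproduced in this text, so the code assumes a generic centered variable of the form $Z=c(S/\pi-1)$ with $S$ Bernoulli with success probability $\pi$ , whose range has width $|c|/\pi$ . The function names and the parameter values are hypothetical.

```python
import math


def centered_bernoulli_mgf(lam, c, pi):
    """E exp(lam * Z) for Z = c * (S / pi - 1) with S ~ Bernoulli(pi)."""
    return pi * math.exp(lam * c * (1.0 / pi - 1.0)) + (1.0 - pi) * math.exp(-lam * c)


def hoeffding_bound(lam, c, pi):
    """Hoeffding's lemma: E exp(lam * Z) <= exp(lam^2 * width^2 / 8) if E Z = 0."""
    width = abs(c) / pi                     # Z takes the values -c and c * (1/pi - 1)
    return math.exp(lam ** 2 * width ** 2 / 8.0)


def chernoff_exponent(x, v):
    """Minimum over lam > 0 of -lam * x + lam^2 * v / 2, attained at lam = x / v."""
    lam = x / v
    return -lam * x + lam ** 2 * v / 2.0


grid = [(lam / 2.0, c, pi)
        for lam in range(-4, 5)
        for c in (-1.0, 0.5, 1.0)
        for pi in (0.2, 0.5, 0.8)]
lemma_holds = all(
    centered_bernoulli_mgf(lam, c, pi) <= hoeffding_bound(lam, c, pi) * (1.0 + 1e-12)
    for lam, c, pi in grid
)
```

If the moment generating function of a sum is bounded by $\exp(\lambda^{2}v/2)$ , then Markov's inequality gives the tail bound $\exp(-\lambda x+\lambda^{2}v/2)$ for every $\lambda>0$ , and `chernoff_exponent` returns its smallest exponent $-x^{2}/(2v)$ . This is the optimization referred to in the proof.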

Remark 4.

In the proof of Theorem 2.5.2 on page 127 in [17] it is shown that condition F1 together with condition

M1)

$(\mathcal{F}_{\infty})^{2}:=\{(f-g)^{2}:f,g\in\mathcal{F}\}$ is a $P_{y}$ -measurable class of functions (see Definition 2.3.3 on page 110 in [17]), i.e. the function

[TABLE]

is measurable on the completion of $(\mathcal{Y}^{N},\mathcal{A}^{N},P_{y}^{N})$ for every $N$ and for every $(e_{1},e_{2},\dots,e_{N})\in\mathbb{R}^{N}$

imply condition GC. Moreover, in the proof of Theorem 2.5.2 on page 127 in [17] assumption M1 is used only for this purpose. Thus, the proof of Theorem 2.5.2 on page 127 in [17] does actually show that assumptions F1, GC and

M2)

for every $\delta>0$ the corresponding function class $\mathcal{F}_{\delta}$ is a $P_{y}$ -measurable class of functions, i.e. the function

[TABLE]

is measurable on the completion of $(\mathcal{Y}^{N},\mathcal{A}^{N},P_{y}^{N})$ for every $N$ and for every $(e_{1},e_{2},\dots,e_{N})\in\mathbb{R}^{N}$

imply that $\mathcal{F}$ is a $P_{y}$ -Donsker class.

Remark 5.

It is not difficult to show that condition

PM)

$\mathcal{F}$ is a pointwise measurable class of functions, i.e. $\mathcal{F}$ contains a countable subset $\mathcal{G}$ such that for every $f\in\mathcal{F}$ there exists a sequence $\{g_{m}\}_{m=1}^{\infty}$ of functions $g_{m}\in\mathcal{G}$ such that $f$ is the pointwise limit of $\{g_{m}\}_{m=1}^{\infty}$ (see Example 2.3.4 on page 110 in [17])

implies condition M1 as well as condition M2.

Remark 6.

The FCLT for the rejective sampling case given in [1] (Theorem 3.2 on page 105 of that paper) imposes neither assumption M1 nor assumption GC. However, there is a mistake in the proof of conditional AEC given in [1]. In fact, inequality (S3) on page 13 in the supplement to [1] is false in general. According to the first inequality in the conclusion of Corollary 2.2.8 on page 101 in [17], which was used by the authors of [1] in order to obtain inequality (S3), the left hand side of inequality (S3) should actually be

[TABLE]

rather than

[TABLE]

with the function class $\mathcal{F}$ in place of $\mathcal{F}_{\delta}$ . As a consequence, the proof of conditional AEC given in [1] shows actually that

[TABLE]

which does not imply conditional AEC. In order to obtain conditional AEC, the authors of [1] should have used the second inequality in the conclusion of Corollary 2.2.8 on page 101 in [17] rather than the first one. In this way they would have obtained inequality (23) instead of inequality (S3), and in order to prove that under condition F1 the right side of (23) goes to zero outer almost surely some additional assumption seems to be necessary (cf. the proof of Theorem 2.5.2 on page 127 in [17]).

As already pointed out in [11] (cf. Remark 2), neither conditional AEC w.r.t. a given semimetric $\rho$ nor even the conclusion of Lemma 3 (which is certainly stronger than conditional AEC) seems to be strong enough to imply unconditional AEC, which for the HTEP sequence $\{\{\mathbb{G}_{N}^{\prime}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ can be defined as

[TABLE]

where $P$ is the product probability measure $P_{y,x}^{\infty}\times P_{d}$ (cf. the equivalent definition of conditional AEC given in Remark 1). The problem is that for uncountable function classes $\mathcal{F}$ the random functions $\lVert\mathbb{G}_{N}^{\prime}\rVert_{\mathcal{F}_{\delta}}$ , $\delta>0$ , might be non measurable and that $\lVert\mathbb{G}_{N}^{\prime}\rVert_{\mathcal{F}_{\delta_{N}}}$ might therefore be strictly smaller than $\lVert\mathbb{G}_{N}^{\prime}\rVert_{\mathcal{F}_{\delta_{N}}}^{*}$ with positive inner probability. As a consequence, the $P_{d}$ -probabilities on the left side in (13) might not be conditional probabilities in the proper sense and condition (24) might therefore fail even though condition (13) is satisfied (note that this is consistent with the conjecture that oasCWC does not imply unconditional weak convergence; see Remark 2). To be safe, in order to deduce unconditional AEC from conditional AEC condition PM will be used in this paper. In fact, condition PM makes sure that the random functions $\lVert\mathbb{G}_{N}^{\prime}\rVert_{\mathcal{F}_{\delta}}$ , $\delta>0$ , are all measurable and that the $P_{d}$ -probabilities on the left side in (13) are therefore conditional probabilities in the proper sense.

Now, combining the sufficient conditions for CWCM with those for total boundedness and conditional AEC yields the following weak convergence results:

Theorem 6 (conditional weak convergence).

Let $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ , $\{\mathbf{p}_{N}\}_{N=1}^{\infty}$ , $\mathcal{F}$ and $\{\mathbb{G}_{N}^{\prime}\}_{N=1}^{\infty}:=\{\{\mathbb{G}_{N}^{\prime}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ be defined as in Lemma 2. Assume that conditions A0, A1, A2∗, GC and F1 are satisfied. Then it follows that

(i)

there exists a zero-mean Gaussian process $\{\mathbb{G}^{\prime}f:f\in\mathcal{F}\}$ with covariance function given by $\Sigma^{\prime}$ which is a Borel measurable and tight mapping from some probability space into $l^{\infty}(\mathcal{F})$ such that

[TABLE]

(ii)

the sample paths $f\mapsto\mathbb{G}^{\prime}f$ are uniformly continuous w.r.t. the $L_{2}(P_{y})$ semimetric $\rho(f,g):=[P_{y}(f-g)^{2}]^{1/2}$ with probability $1$ .

Proof.

Assumptions A0, A1, A2∗ make sure that CWCM holds for some zero-mean Gaussian limit process with covariance function given by $\Sigma^{\prime}$ (see Lemma 2 and Remark 3), while assumptions A2∗, GC and F1 imply that $\mathcal{F}$ is totally bounded w.r.t. $\rho$ and that $\{\mathbb{G}_{N}^{\prime}\}_{N=1}^{\infty}$ is conditionally AEC w.r.t. $\rho$ (see Lemma 3). Both conclusions of the theorem follow now from Corollary 1. ∎

Theorem 7 (Unconditional weak convergence).

Let $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ , $\{\mathbf{p}_{N}\}_{N=1}^{\infty}$ , $\mathcal{F}$ and $\{\mathbb{G}_{N}^{\prime}\}_{N=1}^{\infty}:=\{\{\mathbb{G}_{N}^{\prime}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ be defined as in Lemma 2. Assume that conditions A0, A1, A2∗, F1 and PM are satisfied. Then it follows that

(i)

there exists a zero-mean Gaussian process $\{\mathbb{G}^{\prime}f:f\in\mathcal{F}\}$ with covariance function given by $\Sigma^{\prime}$ which is a Borel measurable and tight mapping from some probability space into $l^{\infty}(\mathcal{F})$ such that

[TABLE]

(ii)

the sample paths $f\mapsto\mathbb{G}^{\prime}f$ are uniformly continuous w.r.t. the $L_{2}(P_{y})$ semimetric $\rho(f,g):=[P_{y}(f-g)^{2}]^{1/2}$ with probability $1$ .

Proof.

Remark 4 and Remark 5 show that assumption F1 along with assumption PM imply assumption GC. The conditions of the present theorem are therefore stronger than the conditions of Theorem 6, and the conclusion of the present theorem follows therefore from Remark 2 (note that condition PM implies measurability of the suprema in Remark 2). ∎

The following corollary establishes joint weak convergence for the sequence of HTEPs and the classical sequence of $\mathcal{F}$ -indexed i.i.d. empirical processes given by

$$\mathbb{G}_{N}f:=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(f(Y_{i})-P_{y}f),\quad f\in\mathcal{F}.\qquad(25)$$

Corollary 2 (Joint weak convergence).

Under the assumptions of Theorem 7 it follows that

[TABLE]

where $\mathbb{G}^{\prime}$ is defined as in Theorem 6 (or Theorem 7), $\mathbb{G}_{N}$ is the classical $\mathcal{F}$ -indexed empirical process defined in (25), and where $\mathbb{G}$ is a Borel measurable and tight $P_{y}$ -Brownian Bridge which is independent from $\mathbb{G}^{\prime}$ .

Proof.

The assumptions of Theorem 7 are stronger than those of Theorem 6 (which imply opCWC) and they imply that $\mathcal{F}$ is a $P_{y}$ -Donsker class (see Remark 4 and Remark 5). The proof of the corollary follows now from an application of Theorem 5. ∎

5.2 CPS designs with first order sample inclusion probabilities proportional to some size variable which might take on arbitrarily small values

This subsection treats the case where the first order sample inclusion probabilities are proportional to some size variable which can take on values arbitrarily close to zero. Note that this case is not covered by the theorems given in the previous subsection because assumptions A0 and A2∗ imply that the first order sample inclusion probabilities are bounded away from zero with probability tending to $1$ or eventually almost surely (see Result 1 in Section 4). So, let $w:\mathcal{X}\mapsto(0,\infty)$ be a mapping such that $w(X_{i})$ can be interpreted as the "size" of the $i$ th population unit. Throughout this subsection it will be assumed that the first order sample inclusion probabilities are defined as

[TABLE]

where $c_{N}:\mathcal{X}^{N}\mapsto(0,\infty)$ is a function which makes sure that the expected sample size equals the value taken on by some other integer-valued function $n_{N}:\mathcal{X}^{N}\mapsto\{1,2,\dots,N\}$ (in many applications $\{n_{N}\}_{N=1}^{\infty}$ is simply a deterministic sequence of positive integers), i.e. $c_{N}$ makes sure that

[TABLE]

It is not difficult to show that the function $c_{N}$ is well defined, i.e. that for every $n_{N}\in[0,N]$ there exists a unique positive constant $c_{N}$ such that equation (27) holds. Moreover, under the assumptions

B0)

$n_{N}:\mathcal{X}^{N}\mapsto[0,N]$ is a measurable function and the sequence of expected sample sizes $\{n_{N}\}_{N=1}^{\infty}$ is such that

[TABLE]

B1)

$w:\mathcal{X}\mapsto(0,\infty)$ is a measurable function such that $Ew(X_{1})<\infty$ ,

it can also be shown that $c_{N}$ is measurable and that $c_{N}/N\rightarrow\theta$ in probability (almost surely), where $\theta$ is the unique (positive) constant such that

[TABLE]

The details of the proof of the latter claim are left to the reader.
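The existence and uniqueness of $c_{N}$ can be made tangible with a root-finding sketch. Display (26) is not reproduced in this text, so the following Python code assumes the common truncated proportional-to-size form $\pi_{i,N}=\min\{c_{N}w(X_{i})/\sum_{j}w(X_{j});1\}$ , which is consistent with $c_{N}/N\rightarrow\theta$ but may differ from the paper's definition in its normalization. The function names are hypothetical, and the expected sample size is restricted to $0<n<N$ so that the solution is unique.

```python
def solve_scale(w, n, iters=200):
    """Bisection for c such that sum_i min(c * w_i / sum(w), 1) = n (assumed form)."""
    total = sum(w)
    if not 0 < n < len(w):
        raise ValueError("n must lie strictly between 0 and N")

    def expected_size(c):
        return sum(min(c * wi / total, 1.0) for wi in w)

    # expected_size is continuous and nondecreasing, with value 0 at lo and N at hi.
    lo, hi = 0.0, total / min(w)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_size(mid) < n:
            lo = mid
        else:
            hi = mid
    return hi


def truncated_pps_probs(w, n):
    """Inclusion probabilities under the assumed truncated proportional-to-size rule."""
    c = solve_scale(w, n)
    total = sum(w)
    return [min(c * wi / total, 1.0) for wi in w]


probs = truncated_pps_probs([1.0, 1.0, 1.0, 10.0], 2)
```

With sizes $(1,1,1,10)$ and $n=2$ the untruncated rule would assign the large unit a value above $1$ ; under the assumed rule its probability is capped at $1$ and the remaining expected sample size is spread evenly over the three small units.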

Now, in order to obtain weak convergence theorems for the case where the first order sample inclusion probabilities are defined as in (26) it will be convenient to proceed as in Subsection 3.2 of [11] and to place restrictions on the class of functions

$$\mathcal{F}/w_{\theta}:=\{f/w_{\theta}:f\in\mathcal{F}\},$$

where $\mathcal{F}$ is the original class of interest, and where

[TABLE]

Note that the domain of the members of the class $\mathcal{F}/w_{\theta}$ is the range of the random vectors $(Y_{i},X_{i})$ (which is assumed to be $\mathcal{Y}\times\mathcal{X}$ ), and that the value taken on by $f/w_{\theta}\in\mathcal{F}/w_{\theta}$ at a given realization of the random vector $(Y_{i},X_{i})$ is given by $f/w_{\theta}(Y_{i},X_{i}):=f(Y_{i})/w_{\theta}(X_{i})$ .

The following lemma establishes CWCM for the HTEP sequence for the case where the first order sample inclusion probabilities are defined as in (26).

Lemma 4 (CWCM).

Let $\{\mathbf{\underline{\pi}}_{N}\}_{N=1}^{\infty}$ be the sequence of vectors of first order sample inclusion probabilities for a sequence of CPS designs and let $\{\mathbf{p}_{N}\}_{N=1}^{\infty}$ be the sequence of vectors of first order sample inclusion probabilities for the corresponding sequence of canonical Poisson sampling designs. Assume that the components of each vector $\mathbf{\underline{\pi}}_{N}$ are defined as in (26) and that conditions B0, B1 and condition

B2)

the members of $\mathcal{F}/w_{\theta}$ are square integrable, i.e. $E[f(Y_{1})/w_{\theta}(X_{1})]^{2}<\infty$ for every $f\in\mathcal{F}$

hold. Then it follows that conditions A0, A1, and A2 of Lemma 2 are satisfied and that the covariance function $\Sigma^{\prime}$ in condition A1 is given by

[TABLE]

with

[TABLE]

Proof.

Define

[TABLE]

and note that

[TABLE]

Next, note that

[TABLE]

and that the right hand side is positive only if $w(X_{i})$ lies between $\sum_{j=1}^{N}w(X_{j})/c_{N}$ and $Ew(X_{1})/\theta$ , in which case it is bounded by

[TABLE]

Since this bound does not depend on $i$ , and since under assumptions B0 and B1 it goes to zero in probability (almost surely), it follows that

[TABLE]

In combination with (30) this yields

[TABLE]

Using this result it is easily seen that

[TABLE]

Since $0<w_{\theta}(X_{1})\leq Ew(X_{1})/\theta<\infty$ , the limiting constant in the last line in the last display must be strictly positive unless $w_{\theta}(X_{1})=Ew(X_{1})/\theta$ with probability $1$ . However, in the latter case it would follow that $\theta=\alpha=1$ which contradicts assumption B0. This proves that the limiting constant on the right side in (33) is positive and hence that assumption A0 holds with $\pi_{i,N}$ in place of $p_{i,N}$ . From (i) in Result 1 in Section 4 it follows that assumption A0 in its original form must be satisfied as well. Actually, the previous argument shows more than that. In fact, it shows that under assumptions B0 and B1

[TABLE]

Next, consider assumption A1. Using (32), (34) and assumption B2 it is not difficult to show that assumption A1 is also satisfied with the limiting covariance function $\Sigma^{\prime}$ defined as in the statement of the lemma (the details of the proof are left to the reader).

Finally, it remains to show that the Lindeberg condition in assumption A2 holds as well. Also this can be easily shown by using (32) and assumption B2 (the details are left to the reader). ∎

Having established sufficient conditions for conditional convergence of the marginal distributions it remains to deal with AEC and total boundedness. The next lemma deals with both issues. As already mentioned above, the underlying semimetric will be different from the $L_{2}(P_{y})$ -semimetric $\rho$ which was used in the previous subsection. In fact, in the present setting it seems more convenient to use the semimetric

$$\rho_{w}(f,g):=\left\{E\left[\frac{f(Y_{1})-g(Y_{1})}{w_{\theta}(X_{1})}\right]^{2}\right\}^{1/2}$$

in place of $\rho$ . Note that $\rho_{w}$ can be viewed as the $L_{2}(P_{y,x})$ -semimetric on the function class $\mathcal{F}/w_{\theta}$ .

Lemma 5 (Total boundedness and conditional AEC).

Let $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ be the sequence of vectors of sample inclusion indicators for a sequence of measurable CPS designs, let $\mathcal{F}$ be a class of measurable functions $f:\mathcal{Y}\mapsto\mathbb{R}$ and let $\{\mathbb{G}_{N}^{\prime}\}_{N=1}^{\infty}$ be the sequence of HTEPs corresponding to $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ and $\mathcal{F}$ . Assume that the first order sample inclusion probabilities corresponding to each vector $\mathbf{S}_{N}^{R}$ are defined as in (26) and that conditions B0 and B1 hold. Moreover, assume that conditions

GC∗)

$(\mathcal{F}_{\infty}/w_{\theta})^{2}:=\{(f-g)^{2}/w_{\theta}^{2}:f,g\in\mathcal{F}\}$ is an outer almost sure $P_{y,x}$ -Glivenko-Cantelli class;

F1∗)

$\mathcal{F}$ has an envelope function $F$ such that $E[F^{*}(Y_{1})/w_{\theta}(X_{1})]^{2}<\infty$ and such that the uniform entropy condition

[TABLE]

holds, where the supremum is taken over all finitely discrete probability measures $Q_{y,x}$ on $\mathcal{Y}\times\mathcal{X}$ such that (footnote 2: note the abuse of notation: $\lVert F/w_{\theta}\rVert_{L_{2}(Q_{y,x})}$ and $\int[F(y)/w_{\theta}(x)]^{2}dQ_{y,x}(y,x)$ should actually be written as $\lVert(F\circ\phi_{y})/(w_{\theta}\circ\phi_{x})\rVert_{L_{2}(Q_{y,x})}$ and $\int[F\circ\phi_{y}(y,x)/w_{\theta}\circ\phi_{x}(y,x)]^{2}dQ_{y,x}(y,x)$ , respectively, with $\phi_{y}:\mathcal{Y}\times\mathcal{X}\mapsto\mathbb{R}$ and $\phi_{x}:\mathcal{Y}\times\mathcal{X}\mapsto\mathbb{R}$ defined as $\phi_{y}(y,x):=y$ and $\phi_{x}(y,x):=x$ for $(y,x)\in\mathcal{Y}\times\mathcal{X}$ )

[TABLE]

hold. Then it follows that

(i)

$\mathcal{F}$ is totally bounded w.r.t. $\rho_{w}$ ;

(ii)

[TABLE]

where $\mathcal{F}_{\delta}^{w}:=\{f-g:f,g\in\mathcal{F}\wedge\rho_{w}(f,g)<\delta\}$ .

Proof.

Part (i) of the conclusion follows from condition F1∗ (see Problem 2.5.1 on page 133 in [17]).

The proof of part (ii) of the conclusion is almost the same as the proof of Lemma 3. The first step is to show that for arbitrary $\delta_{N}\downarrow 0$ the corresponding stochastic processes $\{\mathbb{G}_{N}^{\prime}f:f\in\mathcal{F}_{\delta_{N}}^{w}\}$ are, with probability tending to $1$ (eventually almost surely), conditionally subgaussian w.r.t. the empirical semimetric

[TABLE]

i.e. to show that there exists a constant $C>0$ (which depends neither on the sample points $\omega\in\Omega_{y,x}^{\infty}\times\Omega_{d}$ nor on $N$ ) such that, with probability tending to $1$ (eventually almost surely),

[TABLE]

(cf. display (20)). To this aim, note that the difference $\mathbb{G}_{N}^{\prime}f-\mathbb{G}_{N}^{\prime}g$ can be written as

[TABLE]

where $w_{N}(X_{i})$ is defined as in (29). Then, note that $E_{d}Z_{i,N}=0$ for $i=1,2,\dots,N$ ,

[TABLE]

and that with probability tending to $1$ (eventually almost surely) the right side in the last inequality is bounded by

[TABLE]

where $L_{\theta}$ is a positive constant which depends only on $\theta$ (use (31) along with the fact that assumptions B0 and B1 imply $c_{N}/N\overset{P(as)}{\rightarrow}\theta$ ). Thus, it follows by Hoeffding’s lemma (see [7]) that, with probability tending to $1$ (eventually almost surely),

[TABLE]

Now, as in the proof of Lemma 3, use the fact that the components of $\mathbf{S}_{N}^{R}$ are negatively associated to conclude that

[TABLE]

with probability tending to $1$ (eventually almost surely). Optimizing the right side w.r.t. $\lambda$ yields then the subgaussian tail inequality in (36) with $C=2L_{\theta}^{2}$ .
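The optimization over $\lambda$ is the usual Chernoff argument. The sketch below assumes, since the relevant displays are not reproduced here, that the exponential moment bound has the form $E_{d}\exp\{\lambda(\mathbb{G}_{N}^{\prime}f-\mathbb{G}_{N}^{\prime}g)\}\leq\exp\{\lambda^{2}L_{\theta}^{2}d_{N}^{2}(f,g)/2\}$ for all real $\lambda$ , where $d_{N}$ is shorthand (introduced only for this illustration) for the empirical semimetric and $P_{d}$ denotes the conditional probability corresponding to $E_{d}$ . Under that assumption the stated constant $C=2L_{\theta}^{2}$ is recovered:

```latex
\begin{align*}
P_{d}\{\mathbb{G}_{N}^{\prime}f-\mathbb{G}_{N}^{\prime}g>x\}
 &\leq \inf_{\lambda>0} e^{-\lambda x}\,
       E_{d}\exp\{\lambda(\mathbb{G}_{N}^{\prime}f-\mathbb{G}_{N}^{\prime}g)\} \\
 &\leq \inf_{\lambda>0}\exp\Bigl\{-\lambda x
       +\tfrac{1}{2}\lambda^{2}L_{\theta}^{2}d_{N}^{2}(f,g)\Bigr\} \\
 &= \exp\Bigl\{-\frac{x^{2}}{2L_{\theta}^{2}d_{N}^{2}(f,g)}\Bigr\},
 \qquad\text{attained at } \lambda^{*}=\frac{x}{L_{\theta}^{2}d_{N}^{2}(f,g)} .
\end{align*}
```

Applying the same bound with the roles of $f$ and $g$ interchanged and adding the two inequalities gives the two-sided bound $2\exp\{-x^{2}/(Cd_{N}^{2}(f,g))\}$ with $C=2L_{\theta}^{2}$ .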

Next, note that Corollary 2.2.8 on page 101 in [17] can be applied also in the present case and conclude that, with probability tending to one (eventually almost surely),

[TABLE]

for some constant $K$ (which depends neither on the sample points $\omega\in\Omega_{y,x}^{\infty}\times\Omega_{d}$ nor on $N$ ). Now note that

[TABLE]

where $\mathcal{F}_{\delta_{N}}^{w}/w_{\theta}:=\{f/w_{\theta}:f\in\mathcal{F}_{\delta_{N}}^{w}\}$ , so that the integral in the second last display is bounded by a constant multiple of

[TABLE]

The proof can now be completed by using assumptions GC∗ and F1∗ in order to show that this last integral goes to zero outer almost surely. Again, this can be done by the method used in the proof of Theorem 2.5.2 on page 127 in [17] (see the lines following display 2.5.3 on page 128 in [17]; see also Remark 7 in order to see that assumption GC∗ can be replaced by a measurability condition). ∎

Remark 7.

Assume that $w_{\theta}:\mathcal{X}\mapsto(0,\infty)$ is any measurable and uniformly bounded function. Then condition F1∗ together with condition

M1’)

$(\mathcal{F}_{\infty}/w_{\theta})^{2}:=\{(f-g)^{2}/w_{\theta}^{2}:f,g\in\mathcal{F}\}$ is a $P_{y}$ -measurable class of functions (see Definition 2.3.3 on page 110 in [17]), i.e. the function

[TABLE]

is measurable on the completion of $((\mathcal{Y}\times\mathcal{X})^{N},(\mathcal{A}\times\mathcal{B})^{N},P_{y,x}^{N})$ for every $N$ and for every $(e_{1},e_{2},\dots,e_{N})\in\mathbb{R}^{N}$

imply condition GC∗ (cf. Remark 4). Of course, condition PM implies also condition M1’.

Remark 8.

Assume that $w_{\theta}:\mathcal{X}\mapsto(0,\infty)$ is any measurable and uniformly bounded function. Then, the uniform entropy conditions (19) and (35) are equivalent. To prove this claim, define the projections $\phi_{y}$ and $\phi_{x}$ as in footnote 2, define $\mathcal{F}\circ\phi_{y}:=\{f\circ\phi_{y}:f\in\mathcal{F}\}$ and let $Q_{y}:=Q_{y,x}\circ\phi_{y}^{-1}$ for any probability measure $Q_{y,x}$ on $(\mathcal{Y},\mathcal{A})\times(\mathcal{X},\mathcal{B})$ . Then note that $\lVert f\rVert_{L_{2}(Q_{y})}=\lVert f\circ\phi_{y}\rVert_{L_{2}(Q_{y,x})}$ for every measurable $f:\mathcal{Y}\mapsto\mathbb{R}$ and deduce that

[TABLE]

where the supremum on the left side ranges over the set of all finitely discrete probability measures on $\mathcal{Y}$ such that $\lVert F\rVert_{L_{2}(Q_{y})}>0$ , and where the supremum on the right side ranges over the set of all finitely discrete probability measures on $\mathcal{Y}\times\mathcal{X}$ such that $\lVert F\circ\phi_{y}\rVert_{L_{2}(Q_{y,x})}>0$ .
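The identity $\lVert f\rVert_{L_{2}(Q_{y})}=\lVert f\circ\phi_{y}\rVert_{L_{2}(Q_{y,x})}$ used in this step can be checked numerically on a small example. The following Python sketch uses a hypothetical finitely discrete measure (the atoms, the masses and the function `f` are made up for illustration only) and computes the norm once under the joint measure and once under its marginal on $\mathcal{Y}$ .

```python
import numpy as np

# Hypothetical finitely discrete measure Q_{y,x}: atoms (y_k, x_k) with masses q_k.
atoms_y = np.array([0.5, 0.5, 2.0, 3.0])
atoms_x = np.array([1.0, 4.0, 1.0, 2.0])
mass = np.array([0.1, 0.2, 0.3, 0.4])

def f(y):
    """An arbitrary measurable function on Y (illustrative choice)."""
    return y ** 2 - 1.0

# L2(Q_{y,x}) norm of f composed with the projection phi_y(y, x) = y.
norm_joint = np.sqrt(np.sum(mass * f(atoms_y) ** 2))

# Marginal Q_y = Q_{y,x} o phi_y^{-1}: add up the masses of atoms sharing the same y.
y_vals, inverse = np.unique(atoms_y, return_inverse=True)
marg_mass = np.bincount(inverse, weights=mass)

# L2(Q_y) norm of f.
norm_marginal = np.sqrt(np.sum(marg_mass * f(y_vals) ** 2))

print(norm_joint, norm_marginal)
```

Since the two norms agree for every function of $y$ alone, covering numbers of $\mathcal{F}$ under $Q_{y}$ and of $\mathcal{F}\circ\phi_{y}$ under $Q_{y,x}$ coincide, which is what the display expresses.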

Next, define for each finitely discrete probability measure $Q_{y,x}$ on $\mathcal{Y}\times\mathcal{X}$ a corresponding finitely discrete measure $R_{y,x}$ by setting

[TABLE]

Since this density is strictly positive, it follows that the supports of $Q_{y,x}$ and $R_{y,x}$ must be the same. Moreover, it follows that

[TABLE]

which shows that the mapping $Q_{y,x}\mapsto R_{y,x}$ is a bijection between the set of all finitely discrete probability measures on $\mathcal{Y}\times\mathcal{X}$ and the set of all finitely discrete measures on $\mathcal{Y}\times\mathcal{X}$ . Obviously, this bijection satisfies $Q_{y,x}(f\circ\phi_{y})=R_{y,x}[(f\circ\phi_{y})/(w_{\theta}\circ\phi_{x})]$ for every $f:\mathcal{Y}\mapsto\mathbb{R}$ . Conclude that

[TABLE]

where the supremum over $Q_{y,x}$ ranges over the set of all finitely discrete probability measures on $\mathcal{Y}\times\mathcal{X}$ such that $\lVert F\circ\phi_{y}\rVert_{L_{2}(Q_{y,x})}>0$ , and where the supremum over $R_{y,x}$ ranges over the set of all finitely discrete measures on $\mathcal{Y}\times\mathcal{X}$ such that $\lVert(F\circ\phi_{y})/(w_{\theta}\circ\phi_{x})\rVert_{L_{2}(R_{y,x})}>0$ . Next, note that

[TABLE]

where the supremum over $Q_{y,x}$ ranges over the set of all finitely discrete measures on $\mathcal{Y}\times\mathcal{X}$ such that $\lVert(F\circ\phi_{y})/(w_{\theta}\circ\phi_{x})\rVert_{L_{2}(Q_{y,x})}>0$ (see Problem 2.10.5 on page 204 in [17]). Now combine equations (37), (38) and (39) to obtain

[TABLE]

where, by an abuse of notation, the right side can be written as the integrand on the left side in (35). This shows that the uniform entropy integrals in (19) and (35) are actually the same.

Remark 9.

Assume that $w_{\theta}:\mathcal{X}\mapsto(0,\infty)$ is any measurable and uniformly bounded function. Then condition F1∗ does obviously imply condition B2. Moreover, from Remark 8 it follows that condition F1∗ implies also condition F1. Since condition GC∗ is stronger than condition GC, it follows further that conditions F1∗, GC∗ and M2 imply that $\mathcal{F}$ is a $P_{y}$ -Donsker class (cf. Remark 4).

Theorem 8 (conditional weak convergence).

Let $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ be the sequence of vectors of sample inclusion indicators corresponding to a sequence of measurable CPS designs, let $\mathcal{F}$ be a class of measurable functions $f:\mathcal{Y}\mapsto\mathbb{R}$ and let $\{\mathbb{G}_{N}^{\prime}\}_{N=1}^{\infty}$ be the sequence of HTEPs corresponding to $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ and $\mathcal{F}$ . Assume that the first order sample inclusion probabilities corresponding to each vector $\mathbf{S}_{N}^{R}$ are defined as in (26) and assume that conditions B0, B1, F1∗ and GC∗ are satisfied. Then it follows that

(i)

there exists a zero-mean Gaussian process $\{\mathbb{G}^{\prime}f:f\in\mathcal{F}\}$ with covariance function given by $\Sigma^{\prime}$ as defined in (28) (or in assumption A1) which is a Borel measurable and tight mapping from some probability space into $l^{\infty}(\mathcal{F})$ such that

[TABLE]

(ii)

the sample paths $f\mapsto\mathbb{G}^{\prime}f$ are uniformly continuous w.r.t. the semimetric $\rho_{w}(f,g):=[P_{y}(f-g)^{2}/w_{\theta}^{2}]^{1/2}$ with probability $1$ .

Proof.

Assumptions B0 and B1 imply that the function $w_{\theta}$ is well defined and that it is measurable and uniformly bounded, and together with assumption F1∗ they imply also condition B2. From Lemma 4 and Lemma 2 it follows therefore that $\{\mathbb{G}_{N}^{\prime}\}_{N=1}^{\infty}$ satisfies CWCM for some zero-mean Gaussian limit process with covariance function given by $\Sigma^{\prime}$ as defined in (28) (or in assumption A1). Moreover, Lemma 5 shows that $\mathcal{F}$ is totally bounded w.r.t. $\rho_{w}$ and that $\{\mathbb{G}_{N}^{\prime}\}_{N=1}^{\infty}$ is conditionally AEC w.r.t. $\rho_{w}$ . The two conclusions of the theorem follow now by Corollary 1. ∎

Theorem 9 (Unconditional weak convergence).

Let $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ , $\mathcal{F}$ and $\{\mathbb{G}_{N}^{\prime}\}_{N=1}^{\infty}$ be defined as in Theorem 8. Assume that the first order sample inclusion probabilities corresponding to each vector $\mathbf{S}_{N}^{R}$ are defined as in (26) and assume that conditions B0, B1, F1∗ and PM are satisfied. Then it follows that

(i)

there exists a zero-mean Gaussian process $\{\mathbb{G}^{\prime}f:f\in\mathcal{F}\}$ with covariance function given by $\Sigma^{\prime}$ as defined in (28) (or in assumption A1) which is a Borel measurable and tight random element of $l^{\infty}(\mathcal{F})$ such that

[TABLE]

(ii)

the sample paths $f\mapsto\mathbb{G}^{\prime}f$ are uniformly continuous w.r.t. the semimetric $\rho_{w}(f,g):=[P_{y}(f/w_{\theta}-g/w_{\theta})^{2}]^{1/2}$ with probability $1$ .

Proof.

Remark 7 shows that conditions F1∗ and PM imply assumption GC∗. The conditions of the present theorem are therefore stronger than those of Theorem 8, and the conclusions of the present theorem follow therefore from Theorem 8 and Remark 2 (note that assumption PM implies that the suprema in Remark 2 are measurable). ∎

Corollary 3 (Joint weak convergence).

Under the assumptions of Theorem 9 it follows that

[TABLE]

where $\mathbb{G}^{\prime}$ is defined as in Theorem 9, $\mathbb{G}_{N}$ is the classical $\mathcal{F}$ -indexed empirical process defined in (25), and where $\mathbb{G}$ is a Borel measurable and tight $P_{y}$ -Brownian Bridge which is independent from $\mathbb{G}^{\prime}$ .

Proof.

In the proof of Theorem 9 it has already been shown that the assumptions of Theorem 9 are stronger than those of Theorem 8 which imply opCWC. Moreover, from Remark 7, Remark 9 and Remark 5 it follows that $\mathcal{F}$ is a $P_{y}$ -Donsker class. The proof of the corollary follows now from an application of Theorem 5. ∎

6 Extensions for Hájek empirical processes

This section is very similar to Section 4 in [11]. It extends the weak convergence results for HTEP sequences to the corresponding Hájek empirical processes (henceforth HEP). Given a class $\mathcal{F}$ of functions $f:\mathcal{Y}\mapsto\mathbb{R}$ , the HEP is defined as

[TABLE]

with $\widehat{N}:=\sum_{i=1}^{N}(S_{i,N}/\pi_{i,N})$ the Horvitz-Thompson estimator of the population size $N$ . Note that the value taken on by $\mathbb{G}_{N}^{\prime\prime}f$ is undefined when $\widehat{N}=0$ . However, this will not be a problem here since the assumptions in the forthcoming theory will always imply that

[TABLE]

In fact, this condition allows us to consider in place of the HEP as defined in (40) the closely related empirical process given by

[TABLE]

where $\mathbb{P}_{y,N}:=\sum_{i=1}^{N}\delta_{Y_{i}}/N$ is the empirical measure on $\mathcal{Y}$ . In order to see why under condition (41) we can consider $\widetilde{\mathbb{G}}_{N}^{\prime\prime}$ in place of the HEP it is sufficient to observe that

[TABLE]

and that this together with condition (41) implies that any one of the three weak convergence results in $l^{\infty}(\mathcal{F})$ for the sequence $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ carries over immediately to the corresponding sequence of HEPs, and vice versa.

The following lemma establishes conditional convergence of the marginal distributions for the sequence $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ and hence for the corresponding sequence of HEPs as well.

Lemma 6 (CWCM).

Let $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ , $\{\mathbf{p}_{N}\}_{N=1}^{\infty}$ and $\mathcal{F}$ be defined as in Lemma 2, let $\{\mathbf{\underline{\pi}}_{N}\}_{N=1}^{\infty}$ be the sequence of vectors of first order sample inclusion probabilities corresponding to $\{\mathbf{S}_{N}^{R}\}_{N=1}^{\infty}$ , and let $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ be the sequence of empirical processes defined by (42). Assume that conditions

C1)

$\mathcal{F}$ contains a constant function which is not identically equal to zero, i.e. a function $f:\mathcal{Y}\mapsto\mathbb{R}$ such that $f\equiv C$ $P_{y}$ -almost surely for some constant $C\neq 0$ ;

C2)

$P_{y}|f|<\infty$ for every $f\in\mathcal{F}$

and conditions A0, A1 and A2 are satisfied. Then the function

[TABLE]

with $\Sigma^{\prime}$ defined as in assumption A1, is a positive semidefinite covariance function, and for every finite-dimensional $\mathbf{f}\in\mathcal{F}^{r}$ and for every $\mathbf{t}\in\mathbb{R}^{r}$

[TABLE]

where $\Sigma^{\prime\prime}(\mathbf{f})$ is the covariance matrix whose elements are given by $\Sigma^{\prime\prime}_{(ij)}(\mathbf{f}):=\Sigma^{\prime\prime}(f_{i},f_{j})$ .

Proof.

The proof is almost the same as the proof of Lemma 2. Define the sequences of sample inclusion indicators $\{\mathbf{S}_{N}^{P}\}_{N=1}^{\infty}$ and $\{\mathbf{S}_{N}^{P_{0}}\}_{N=1}^{\infty}$ as in the proof of Lemma 2. Then, define the sequence of stochastic processes $\{\mathbb{\widetilde{T}}_{N}^{P}\}_{N=1}^{\infty}:=\{\{\mathbb{\widetilde{T}}_{N}^{P}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ by

[TABLE]

Note that $E_{d}\mathbb{\widetilde{T}}_{N}^{P}f=0$ for every $f\in\mathcal{F}$ , and that

[TABLE]

where $\Sigma_{N}^{\prime}(f,g):=E_{d}\mathbb{T}_{N}^{P}f\mathbb{T}_{N}^{P}g$ is defined as in the proof of Lemma 2. Now, it follows from assumptions C1, C2 and A1 that

[TABLE]

where $\Sigma^{\prime\prime}(f,g)$ is defined as in (44). This implies that $\Sigma^{\prime\prime}:\mathcal{F}^{2}\mapsto\mathbb{R}$ must be positive semidefinite and proves the first part of the conclusion of the lemma.

In order to prove the second part of the conclusion, consider for some given $\mathbf{f}\in\mathcal{F}^{r}$ the triangular array of rowwise conditionally independent random vectors

[TABLE]

where $\mathbb{P}_{y,N}\mathbf{f}:=(\mathbb{P}_{y,N}f_{1},\mathbb{P}_{y,N}f_{2},\dots,\mathbb{P}_{y,N}f_{r})^{\intercal}$ . Observe that the random vector $\mathbb{\widetilde{T}}_{N}^{P}\mathbf{f}:=(\mathbb{\widetilde{T}}_{N}^{P}f_{1},\mathbb{\widetilde{T}}_{N}^{P}f_{2},\dots,\mathbb{\widetilde{T}}_{N}^{P}f_{r})^{\intercal}$ can be written as

[TABLE]

Using the fact that $\Sigma_{N}^{\prime\prime}(f,g):=E_{d}\mathbb{\widetilde{T}}_{N}^{P}f\mathbb{\widetilde{T}}_{N}^{P}g\overset{P(as)}{\rightarrow}\Sigma^{\prime\prime}(f,g)$ along with condition A2 it is not difficult to show that the Lindeberg condition

[TABLE]

must be satisfied whenever $\mathbf{f}\in\mathcal{F}^{r}$ and $\mathbf{t}\in\mathbb{R}^{r}$ are such that $\mathbf{t}^{\intercal}\Sigma^{\prime\prime}(\mathbf{f})\mathbf{t}>0$ . Therefore it follows that

[TABLE]

Next, consider the sequence of stochastic processes $\{\mathbb{\widetilde{T}}_{N}^{P_{0}}\}_{N=1}^{\infty}:=\{\{\mathbb{\widetilde{T}}_{N}^{P_{0}}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ with $\mathbb{\widetilde{T}}_{N}^{P_{0}}f$ defined in the same way as $\mathbb{\widetilde{T}}_{N}^{P}f$ but with $\mathbf{S}_{N}^{P_{0}}$ in place of $\mathbf{S}_{N}^{P}$ . Use assumption A0 along with Result 2 in Section 4 to show that

[TABLE]

Note that this does not require knowledge of the joint distributions of the vectors $\mathbf{S}_{N}^{P_{0}}$ and $\mathbf{S}_{N}^{P}$ .

Third, consider the sequence of stochastic processes $\{\mathbb{\widetilde{Y}}_{N}^{R}\}_{N=1}^{\infty}:=\{\{\mathbb{\widetilde{Y}}_{N}^{R}f:f\in\mathcal{F}\}\}_{N=1}^{\infty}$ with $\mathbb{\widetilde{Y}}_{N}^{R}f$ defined in the same way as $\mathbb{\widetilde{Y}}_{N}^{P}f$ but with $\mathbf{S}_{N}^{R}$ in place of $\mathbf{S}_{N}^{P}$ . Use Result 3 in Section 4 to conclude that

[TABLE]

as well.

Finally, note that the definition of $\mathbb{\widetilde{Y}}_{N}^{R}$ coincides with the one of $\mathbb{\widetilde{G}}_{N}^{\prime\prime}$ except for the fact that the former contains the first order sample inclusion probabilities corresponding to $\mathbf{S}_{N}^{P}$ in place of those corresponding to $\mathbf{S}_{N}^{R}$ , i.e. $\mathbb{\widetilde{Y}}_{N}^{R}$ contains $p_{i,N}$ in place of $\pi_{i,N}:=E_{d}S_{i,N}^{R}$ . However, this problem can be easily fixed by using Result 1. ∎

Remark 10.

Assumption A0 implies that condition (41) holds. By (43) it follows therefore that the conditions of Lemma 6 imply also that

[TABLE]

Remark 11.

Assumption C2 is certainly satisfied if assumption F1 holds or if $w_{\theta}:\mathcal{X}\mapsto(0,\infty)$ is measurable and uniformly bounded and assumption F1∗ holds.

The next two lemmas establish conditional AEC of the sequence $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ for the case where there is a positive lower bound for the $\pi_{i,N}$ ’s and for the case where the $\pi_{i,N}$ ’s are proportional to some size variable which can take on arbitrarily small values, respectively.

Lemma 7 (conditional AEC).

Let $\mathcal{F}$ be a class of functions $f:\mathcal{Y}\mapsto\mathbb{R}$ which satisfies assumption M2. Then, under the assumptions of Lemma 3, it follows that

[TABLE]

Proof.

First, note that

[TABLE]

where $\mathbb{G}^{\prime}_{N}$ is the HTEP. From this it follows that

[TABLE]

Since by Theorem 2.8 in [8] the $S_{i,N}^{R}$ ’s are negatively associated, it follows that

[TABLE]

and assumption A2∗ implies that the right side in the latter inequality is bounded in probability (eventually almost surely). To complete the proof of the lemma it remains therefore to show that

[TABLE]

To this aim note that

[TABLE]

and that $\lVert\mathbb{P}_{y,N}-P_{y}\rVert_{\mathcal{F}}\overset{as*}{\rightarrow}0$ because assumptions F1, GC and M2 imply that $\mathcal{F}$ is a $P_{y}$ -Donsker class (see Remark 4) and hence an outer almost sure $P_{y}$ -Glivenko-Cantelli class. ∎

Lemma 8 (conditional AEC).

Let $\mathcal{F}$ be a class of functions $f:\mathcal{Y}\mapsto\mathbb{R}$ which satisfies assumptions C1 and M2. Then, under the assumptions of Lemma 5, it follows that

[TABLE]

Proof.

Follow the steps in the proof of Lemma 7 up to inequality (46) (with $\mathcal{F}_{\delta_{N}}^{w}$ in place of $\mathcal{F}_{\delta_{N}}$ ) and note that the right side of that inequality is bounded in probability (eventually almost surely) because assumptions B0 and B1 imply (32) (see the proof of Lemma 4), and assumptions C1 and F1∗ imply $Ew_{\theta}^{-2}(X_{1})<\infty$ . To complete the proof of the lemma it remains therefore to show that

[TABLE]

To this aim note that

[TABLE]

and that $\lVert\mathbb{P}_{y,N}-P_{y}\rVert_{\mathcal{F}}\overset{as*}{\rightarrow}0$ because assumptions F1∗, GC∗ and M2 imply that $\mathcal{F}$ is a $P_{y}$ -Donsker class (see Remark 9) and hence an outer almost sure $P_{y}$ -Glivenko-Cantelli class. ∎

Having found sufficient conditions for CWCM and for conditional AEC w.r.t. suitable semimetrics, we are now ready to prove the three desired weak convergence results. Since the sufficient conditions under consideration imply condition (41) (see Remark 10), the weak convergence results for $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ and for the HEP sequence $\{\mathbb{G}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ are equivalent. Since only the HEP sequence is of interest in applications, the weak convergence results will be stated only in terms of the latter.

Theorem 10 (Conditional and unconditional weak convergence).

Let $\mathcal{F}$ be a class of functions $f:\mathcal{Y}\mapsto\mathbb{R}$ which satisfies assumption C1 and let $\{\mathbb{G}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ be the corresponding sequence of HEPs as defined in (40). Then, under the assumptions of Theorem 7 or the assumptions of Theorem 9 it follows that

(i)

there exists a zero-mean Gaussian process $\{\mathbb{G}^{\prime\prime}f:f\in\mathcal{F}\}$ with covariance function $\Sigma^{\prime\prime}:\mathcal{F}^{2}\mapsto\mathbb{R}$ defined as in (44) which is a Borel measurable and tight random element of $l^{\infty}(\mathcal{F})$ ;

(ii)

(conditional weak convergence)

[TABLE]

(iii)

(unconditional weak convergence)

[TABLE]

Moreover,

(iv)

under the assumptions of Theorem 7 it follows that the sample paths $f\mapsto\mathbb{G}^{\prime\prime}f$ are uniformly $\rho$ -continuous with probability $1$ ;

(v)

under the assumptions of Theorem 9 it follows that the sample paths $f\mapsto\mathbb{G}^{\prime\prime}f$ are uniformly $\rho_{w}$ -continuous with probability $1$ .

Proof.

Consider first the assumptions of Theorem 7. Remark 3 and Remark 11 show that the assumptions of Theorem 7 together with assumption C1 imply the assumptions of Lemma 6, and the conclusion of that lemma says that the sequence of auxiliary processes $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ satisfies CWCM for some zero-mean Gaussian limit process with covariance function given by $\Sigma^{\prime\prime}$ as defined in (44). Next, in the proof of Theorem 7 it has already been shown that the assumptions of that theorem are stronger than those of Lemma 3. Hence, the assumptions of Theorem 7 along with assumption C1 imply the assumptions of Lemma 7 (use the fact that assumption PM is stronger than assumption M2; see Remark 5) whose conclusion implies that $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ is conditionally AEC w.r.t. $\rho$ . Since the first part of the conclusion of Lemma 3 says that $\mathcal{F}$ is totally bounded w.r.t. $\rho$ , it follows by Corollary 1 that $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ (and hence also the corresponding sequence of HEPs) satisfies part (ii) of the conclusion of the present theorem for some $\mathbb{G}^{\prime\prime}$ which satisfies the conditions given in parts (i) and (iv). Part (iii) of the conclusion of the theorem follows now from Remark 2 (recall that condition PM implies that the suprema in Remark 2 are measurable).

Now, consider the assumptions of Theorem 9. In the proof of Theorem 9 it has already been shown that its assumptions are stronger than those of Theorem 8, and in the proof of the latter theorem it has been shown that its assumptions imply the conditions of Lemma 4 whose conclusion says that conditions A0, A1 and A2 are satisfied. Use Remark 11 to conclude that the assumptions of Theorem 9 along with assumption C1 imply the assumptions of Lemma 6 whose conclusion says that the sequence of auxiliary processes $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ satisfies CWCM for some zero-mean Gaussian limit process with covariance function given by $\Sigma^{\prime\prime}$ as defined in (44). Next, recall that in the proof of Theorem 8 it has been shown that its assumptions imply those of Lemma 5, and conclude that the assumptions of Theorem 9 along with condition C1 must therefore imply the assumptions of Lemma 8 (use the fact that assumption PM is stronger than assumption M2; see Remark 5) whose conclusion implies that $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ is conditionally AEC w.r.t. $\rho_{w}$ . Since the first part of the conclusion of Lemma 5 says that $\mathcal{F}$ is totally bounded w.r.t. $\rho_{w}$ , it follows by Corollary 1 that the sequence of auxiliary processes $\{\widetilde{\mathbb{G}}_{N}^{\prime\prime}\}_{N=1}^{\infty}$ (and hence also the corresponding sequence of HEPs) satisfies part (ii) of the conclusion of the present theorem for some $\mathbb{G}^{\prime\prime}$ which satisfies the conditions given in parts (i) and (v). Again, part (iii) of the conclusion of the theorem follows now from Remark 2 (recall that condition PM implies that the suprema in Remark 2 are measurable). ∎

Corollary 4 (Joint weak convergence).

Under the assumptions of Theorem 10 it follows that

(i)

[TABLE]

where $\mathbb{G}^{\prime\prime}$ is defined as in the conclusion of Theorem 10, $\mathbb{G}_{N}$ is the classical $\mathcal{F}$ -indexed empirical process defined in (25), and where $\mathbb{G}$ is a Borel measurable and tight $P_{y}$ -Brownian Bridge which is independent from $\mathbb{G}^{\prime\prime}$ .

Proof.

As already shown in the proof of Corollary 2 (Corollary 3), the assumptions of Theorem 7 (Theorem 9) imply that $\mathcal{F}$ is a $P_{y}$ -Donsker class. The conclusion of the present corollary follows therefore from Theorem 10 and Theorem 5. ∎

7 Simulation results

This section about simulation results is analogous to Section 5 in [11] (see also Appendix S4 in [1]). The numerical results given in this section have been obtained by using the R Statistical Software [13] in order to repeat $B=1000$ times the following steps:

1)

Generate a population of $N$ independent observations $(Y_{i},X_{i})$ from the linear model $Y_{i}=X_{i}+U_{i}$ , where the $X_{i}$ ’s are i.i.d. lognormal with $E(\ln X_{i})=0$ and $Var(\ln X_{i})=1$ , and where the $U_{i}$ ’s are independent zero mean Gaussian random variables with $Var(U_{i})=X_{i}^{2}$ , $i=1,2,\dots,N$ .

2)

Select a sample $\mathbf{s}_{N}:=(s_{1,N},s_{2,N},\dots,s_{N,N})$ according to the CPS design with sample size $n$ (specified below) and with first order sample inclusion probabilities $\pi_{i,N}$ proportional to the $X_{i}$ values (this step was performed by using the function "UPmaxentropy" from the R package "sampling" [16]).

3)

Compute the Horvitz-Thompson and the Hájek estimator for the population cdf $F_{Y,N}(t):=\sum_{i=1}^{N}I(Y_{i}\leq t)/N$ , $t\in\mathbb{R}$ , and compute the uniform distance between each of those estimators and $F_{Y,N}$ , i.e. compute $\lVert\mathbb{G}^{\prime}_{N}\rVert_{\mathcal{F}}$ and $\lVert\mathbb{G}^{\prime\prime}_{N}\rVert_{\mathcal{F}}$ for the case where $\mathcal{F}:=\{I(y\leq t):t\in\mathbb{R}\}$ .

4)

Estimate the $\gamma$ -quantiles $q_{\gamma}^{\prime}$ and $q_{\gamma}^{\prime\prime}$ of the limiting distributions of $\lVert\mathbb{G}^{\prime}_{N}\rVert_{\mathcal{F}}$ and $\lVert\mathbb{G}^{\prime\prime}_{N}\rVert_{\mathcal{F}}$ , i.e. the $\gamma$ -quantiles of the distributions of $\lVert\mathbb{G}^{\prime}\rVert_{\mathcal{F}}$ and $\lVert\mathbb{G}^{\prime\prime}\rVert_{\mathcal{F}}$ . This was done by using Algorithm 5.1 in [10] which was also used in the simulation study in [1]. The details for the implementation of this algorithm are described below.

5)

Compute the asymptotic uniform $\gamma$ -confidence bands for the population cdf $F_{Y,N}$ based on the Horvitz-Thompson and the Hájek estimators and verify whether $F_{Y,N}$ lies within these confidence bands, i.e. verify whether $\lVert\mathbb{G}^{\prime}_{N}\rVert_{\mathcal{F}}\leq\widehat{q}_{\gamma}^{\prime}$ and whether $\lVert\mathbb{G}^{\prime\prime}_{N}\rVert_{\mathcal{F}}\leq\widehat{q}_{\gamma}^{\prime\prime}$ , where $\widehat{q}_{\gamma}^{\prime}$ and $\widehat{q}_{\gamma}^{\prime\prime}$ are the estimates of $q_{\gamma}^{\prime}$ and $q_{\gamma}^{\prime\prime}$ obtained from step 4. Note that the widths of the two asymptotic uniform $\gamma$ -confidence bands for $F_{Y,N}$ are given by $2\widehat{q}_{\gamma}^{\prime}N^{-1/2}$ and $2\widehat{q}_{\gamma}^{\prime\prime}N^{-1/2}$ , respectively.
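Steps 1 and 3 can be sketched in a few lines of Python (the study itself was run in R; this is only an illustration). Two points are assumptions of the sketch and not taken from the paper: the helper `inclusion_probs` caps probabilities at 1 and rescales the rest, and step 2 is replaced by independent Bernoulli (Poisson sampling) draws as a stand-in, because a faithful CPS sampler such as `UPmaxentropy` is not reimplemented here. The scaling by $\sqrt{N}$ follows the band widths $2\widehat{q}_{\gamma}N^{-1/2}$ stated in step 5.

```python
import numpy as np

def inclusion_probs(x, n):
    """Inclusion probabilities proportional to x that sum to n.

    Probabilities that would exceed 1 are set to 1 and the others are
    rescaled (a common convention, assumed here for illustration).
    """
    pi = np.zeros(x.size)
    capped = np.zeros(x.size, dtype=bool)
    while True:
        free = ~capped
        pi[capped] = 1.0
        pi[free] = (n - capped.sum()) * x[free] / x[free].sum()
        over = free & (pi > 1.0)
        if not over.any():
            return pi
        capped |= over

def sup_distances(y, s, pi):
    """sqrt(N)-scaled uniform distances of the Horvitz-Thompson and Hajek
    cdf estimators from the population cdf F_{Y,N}."""
    N = y.size
    order = np.argsort(y)
    w = s[order] / pi[order]           # weights S_i / pi_i, sorted by Y_i
    F_pop = np.arange(1, N + 1) / N    # F_{Y,N} at the sorted Y_i (no ties assumed)
    F_ht = np.cumsum(w) / N            # Horvitz-Thompson estimator
    F_hajek = np.cumsum(w) / w.sum()   # Hajek estimator; undefined for an empty sample
    # All three are step functions that jump only at the Y_i, so the
    # supremum over t is attained at one of the sorted Y_i.
    return (np.sqrt(N) * np.max(np.abs(F_ht - F_pop)),
            np.sqrt(N) * np.max(np.abs(F_hajek - F_pop)))

rng = np.random.default_rng(2019)
N, n = 1000, 50                        # sampling fraction alpha = n / N = 0.05

# Step 1: Y_i = X_i + U_i with lognormal X_i and Var(U_i) = X_i^2.
x = rng.lognormal(mean=0.0, sigma=1.0, size=N)
y = x + x * rng.standard_normal(N)

# Step 2 (stand-in): independent Bernoulli draws, NOT a CPS sample.
pi = inclusion_probs(x, n)
s = (rng.random(N) < pi).astype(float)

# Step 3: uniform distances for the indicator class {I(y <= t)}.
sup_ht, sup_hajek = sup_distances(y, s, pi)
print(sup_ht, sup_hajek)
```

With a genuine CPS sampler in place of the stand-in, the two returned values would play the role of $\lVert\mathbb{G}^{\prime}_{N}\rVert_{\mathcal{F}}$ and $\lVert\mathbb{G}^{\prime\prime}_{N}\rVert_{\mathcal{F}}$ in step 5.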

The $\gamma$ -quantiles of the distributions of $\lVert\mathbb{G}^{\prime}\rVert_{\mathcal{F}}$ and $\lVert\mathbb{G}^{\prime\prime}\rVert_{\mathcal{F}}$ were estimated according to the following procedure (see Algorithm 5.1 in [10]):

i)

Estimate the covariance matrices $\Sigma^{\prime}(\mathbf{f})$ and $\Sigma^{\prime\prime}(\mathbf{f})$ for $\mathbf{f}:=(I(y\leq Y_{i_{1}}),I(y\leq Y_{i_{2}}),\dots,I(y\leq Y_{i_{n}}))^{\intercal}$ where $(i_{1},i_{2},\dots,i_{n})$ correspond to the sampled population units, i.e. $(i_{1},i_{2},\dots,i_{n})$ are the values of the subscript $i$ for which $s_{i,N}=1$ , $i=1,2,\dots,N$ . The components $\Sigma^{\prime}_{r,c}(\mathbf{f})$ and $\Sigma^{\prime\prime}_{r,c}(\mathbf{f})$ , $r,c=1,2,\dots,n$ , of the two covariance matrices were estimated as follows (cf. Lemma 4 and Lemma 6):

[TABLE]

with

[TABLE]

and

[TABLE]

with

[TABLE]

ii)

Compute the Cholesky decompositions of the estimated covariance matrices, i.e. compute two lower triangular matrices $L$ and $H$ such that $\widehat{\Sigma}^{\prime}(\mathbf{f})=LL^{\intercal}$ and $\widehat{\Sigma}^{\prime\prime}(\mathbf{f})=HH^{\intercal}$ .

iii)

Generate independently $1000$ random vectors $\mathbf{Z}_{b}:=(Z_{1,b},Z_{2,b},\dots,Z_{n,b})^{\intercal}$ , $b=1,2,\dots,1000$ , whose components $Z_{k,b}$ are i.i.d. standard normal random variables and compute the vectors $\mathbf{G}_{b}^{\prime}:=L\mathbf{Z}_{b}$ and $\mathbf{G}_{b}^{\prime\prime}:=H\mathbf{Z}_{b}$ which can be considered as realizations of the limit processes $\mathbb{G}^{\prime}$ and $\mathbb{G}^{\prime\prime}$ , respectively.

iv)

For each $b=1,2,\dots,1000$ compute the maximum norms $\lVert\mathbf{G}_{b}^{\prime}\rVert_{\infty}$ and $\lVert\mathbf{G}_{b}^{\prime\prime}\rVert_{\infty}$ (i.e. the two maxima of the absolute values of the components of $\mathbf{G}_{b}^{\prime}$ and $\mathbf{G}_{b}^{\prime\prime}$ ), put the two vectors $(\lVert\mathbf{G}_{1}^{\prime}\rVert_{\infty},\lVert\mathbf{G}_{2}^{\prime}\rVert_{\infty},\dots,\lVert\mathbf{G}_{1000}^{\prime}\rVert_{\infty})$ and $(\lVert\mathbf{G}_{1}^{\prime\prime}\rVert_{\infty},\lVert\mathbf{G}_{2}^{\prime\prime}\rVert_{\infty},\dots,\lVert\mathbf{G}_{1000}^{\prime\prime}\rVert_{\infty})$ in ascending order and put $\widehat{q}_{\gamma}^{\prime}$ equal to the $\gamma$ -quantile of the first vector, and put $\widehat{q}_{\gamma}^{\prime\prime}$ equal to the $\gamma$ -quantile of the second vector.
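Steps ii) to iv) can be sketched as follows in Python. The covariance estimators of step i) are not reproduced, so the demonstration uses a placeholder covariance matrix (that of a Brownian bridge on a grid), which is an assumption of this illustration only. The function name `sup_norm_quantile`, the small diagonal `jitter` added before the Cholesky factorization, and the default quantile interpolation of `np.quantile` are likewise choices of the sketch and not taken from the paper.

```python
import numpy as np

def sup_norm_quantile(sigma_hat, gamma, n_draws=1000, rng=None, jitter=1e-10):
    """Monte Carlo gamma-quantile of max_k |G_k| for G ~ N(0, sigma_hat)."""
    rng = np.random.default_rng() if rng is None else rng
    k = sigma_hat.shape[0]
    # Step ii): lower triangular factor with sigma_hat = L L^T (up to jitter).
    L = np.linalg.cholesky(sigma_hat + jitter * np.eye(k))
    # Step iii): each column of Z is one vector of i.i.d. standard normals.
    Z = rng.standard_normal((k, n_draws))
    G = L @ Z
    # Step iv): maximum norm of each simulated vector, then its quantile.
    sup_norms = np.max(np.abs(G), axis=0)
    return float(np.quantile(sup_norms, gamma))

# Placeholder covariance: Brownian bridge on an interior grid of (0, 1).
t = np.arange(1, 50) / 50
sigma_demo = np.minimum.outer(t, t) - np.outer(t, t)

# The same seed is reused so that all quantiles come from the same draws.
q_hat = {g: sup_norm_quantile(sigma_demo, g, rng=np.random.default_rng(0))
         for g in (0.90, 0.95, 0.99)}
print(q_hat)
```

In the setting of the paper the function would be called once with $\widehat{\Sigma}^{\prime}(\mathbf{f})$ and once with $\widehat{\Sigma}^{\prime\prime}(\mathbf{f})$ , giving $\widehat{q}_{\gamma}^{\prime}$ and $\widehat{q}_{\gamma}^{\prime\prime}$ .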

Table 1 (for the HTEP) and Table 2 (for the HEP) summarize the simulation results. For each considered population size $N=500,1000,2000$ , for each considered sampling fraction $\alpha:=n/N=0.05,0.10$ and for each considered confidence level $\gamma=0.90,0.95,0.99$ the two tables report the estimate of the coverage probability of the corresponding confidence band for $F_{Y,N}$ as well as the average width (the first figure within each bracket) and the maximum width (the second figure within each bracket) of the $B=1000$ simulated confidence bands. The simulation results suggest that the confidence bands based on the HTEP and on the HEP are very similar. Their coverage is quite accurate for the populations of size $N\geq 1000$ and seems not to depend on the sampling fraction $\alpha$ . As expected, the width of the confidence bands is roughly proportional to $n^{-1/2}$ and it appears to be quite stable from sample to sample (the differences between the maximum widths and the average widths are rather small). However, for many applications the widths of the confidence bands might be too large. This problem can probably be overcome through alternative estimators which use the information provided by the auxiliary variable $X$ more efficiently (see e.g. [14] or [12] and references therein).
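The claim that the width scales like $n^{-1/2}$ can be checked directly on the reported figures. The short Python sketch below uses the three average widths for $\gamma=0.90$ from Table 1 (HTEP) and multiplies each by $\sqrt{n}$ with $n=\alpha N$ ; if the scaling holds, the products should be roughly constant.

```python
import math

# (N, alpha, average width) for gamma = 0.90, copied from Table 1 (HTEP).
rows = [(500, 0.05, 0.4233), (500, 0.10, 0.3022), (1000, 0.05, 0.3102)]

scaled = []
for N, alpha, width in rows:
    n = round(alpha * N)                 # sample size
    scaled.append(width * math.sqrt(n))  # roughly constant if width ~ n^(-1/2)
    print(f"N={N:5d}  n={n:3d}  width={width:.4f}  width*sqrt(n)={scaled[-1]:.3f}")

# Relative spread of the rescaled widths.
spread = (max(scaled) - min(scaled)) / min(scaled)
print(f"relative spread: {spread:.3f}")
```

The rescaled widths differ by only a few percent, whereas the raw widths for $n=25$ and $n=50$ differ by a factor of about $1.4$ .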

Bibliography

[1] P. Bertail, E. Chautru, and S. Clémençon. Empirical processes in survey sampling with (conditional) Poisson designs. Scand. J. Stat., 44(1):97–111, 2017. ISSN 0303-6898. doi: 10.1111/sjos.12243. URL https://doi.org/10.1111/sjos.12243.
[2] L. D. Brown. Fundamentals of statistical exponential families with applications in statistical decision theory, volume 9 of Institute of Mathematical Statistics Lecture Notes—Monograph Series. Institute of Mathematical Statistics, Hayward, CA, 1986. ISBN 0-940600-10-2.
[3] X.-H. Chen, A. P. Dempster, and J. S. Liu. Weighted finite population sampling to maximize entropy. Biometrika, 81(3):457–469, 1994. ISSN 0006-3444. doi: 10.1093/biomet/81.3.457. URL https://doi.org/10.1093/biomet/81.3.457.
[4] P. L. Conti. On the estimation of the distribution function of a finite population under high entropy sampling designs, with applications. Sankhya B, 76(2):234–259, 2014. ISSN 0976-8386. doi: 10.1007/s13571-014-0083-x. URL https://doi.org/10.1007/s13571-014-0083-x.
[5] J. Dupačová. A note on rejective sampling. In Contributions to statistics, pages 71–78. Reidel, Dordrecht-Boston, Mass.-London, 1979.
[6] J. Hájek. Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann. Math. Statist., 35:1491–1523, 1964. ISSN 0003-4851. doi: 10.1214/aoms/1177700375. URL https://doi.org/10.1214/aoms/1177700375.
[7] W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13–30, 1963. ISSN 0162-1459. URL http://links.jstor.org/sici?sici=0162-1459(196303)58:301<13:PIFSOB>2.0.CO;2-D&origin=MSN.
[8] K. Joag-Dev and F. Proschan. Negative association of random variables, with applications. Ann. Statist., 11(1):286–295, 1983. ISSN 0090-5364. doi: 10.1214/aos/1176346079. URL https://doi.org/10.1214/aos/1176346079.

Functional Central Limit Theorems for Conditional Poisson sampling Designs (this work was supported by the grant 2016-ATE-0459 and the grant 2017-ATE-0402 from Università degli Studi di Milano-Bicocca).
