Empirical Process Results for Exchangeable Arrays

Laurent Davezies; Xavier D'Haultfoeuille; Yannick Guyonvarch

arXiv:1906.11293·math.ST·April 18, 2023


TL;DR

This paper establishes uniform laws of large numbers and central limit theorems for exchangeable arrays, which model dependence in dyadic data and multiway clustering, extending classical results to dependent data structures.

Contribution

It provides the first uniform laws of large numbers and CLTs for exchangeable arrays, including bootstrap convergence, under conditions similar to i.i.d. data.

Findings

01 Proves uniform laws of large numbers for exchangeable arrays.

02 Establishes central limit theorems for exchangeable arrays.

03 Demonstrates bootstrap convergence for dependent array data.

Abstract

Exchangeable arrays are natural tools to model common forms of dependence between units of a sample. Jointly exchangeable arrays are well suited to dyadic data, where observed random variables are indexed by two units from the same population. Examples include trade flows between countries or relationships in a network. Separately exchangeable arrays are well suited to multiway clustering, where units sharing the same cluster (e.g. geographical areas or sectors of activity when considering individual wages) may be dependent in an unrestricted way. We prove uniform laws of large numbers and central limit theorems for such exchangeable arrays. We obtain these results under the same moment restrictions and conditions on the class of functions as those typically assumed with i.i.d. data. We also show the convergence of bootstrap processes adapted to such arrays.

Tables

Table 1: KS tests of $F_{T_{\bm{i},t}}=F_{T_{\bm{i},t+1}}$ under different dependence assumptions

Pairs of years	KS test statistic	p-value (i.i.d.)	p-value (P.W. cl.)	p-value (E. cl.)	p-value (I. cl.)	p-value (dyadic)
2012-2013	0.048	$< 0.001$	$< 0.001$	$< 0.001$	$< 0.001$	$< 0.001$
2013-2014	0.018	$< 0.001$	$< 0.001$	$< 0.001$	0.026	0.038
2014-2015	0.022	$< 0.001$	$< 0.001$	$< 0.001$	0.005	0.007
2015-2016	0.002	0.44	0.391	0.377	0.951	0.998
2016-2017	0.012	$< 0.001$	$< 0.001$	$< 0.001$	0.215	0.254
2017-2018	0.045	$< 0.001$	$< 0.001$	$< 0.001$	$< 0.001$	$< 0.001$
Notes: data from the Comtrade database. “cl.”, “E”, “I” and “P.W.” stand for clustering, exporter, importer and pairwise, respectively. The p-values were obtained with 1,000 bootstrap samples.

Table 2: Point estimates of $\theta_{0}$ and p-values of $\theta_{0j}=0$ under different dependence assumptions

Variable	Estimator	p-value (i.i.d.)	p-value (P.W. cl.)	p-value (E. cl.)	p-value (I. cl.)	p-value (dyadic)
Log(E’s GDP)	0.732	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$
Log(I’s GDP)	0.741	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$
Log(E’s PCGDP)	0.157	0.003	$< 10^{- 3}$	0.04	0.001	0.078
Log(I’s PCGDP)	0.135	0.003	$< 10^{- 3}$	0.004	0.055	0.076
Log of distance	-0.784	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$
Contiguity	0.193	0.064	0.16	0.112	0.077	0.461
Common-language	0.746	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	0.056
Colonial-tie	0.025	0.867	0.902	0.891	0.882	0.952
Landlocked E	-0.863	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	0.004
Landlocked I	-0.696	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	0.011
E’s remoteness	0.66	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	$< 10^{- 3}$	0.036
I’s remoteness	0.562	$< 10^{- 3}$	$< 10^{- 3}$	0.003	0.004	0.105
P-T agreement	0.181	0.041	0.117	0.054	0.122	0.456
Openness	-0.107	0.416	0.522	0.498	0.453	0.771
Notes: data from Santos Silva and Tenreyro (2006), same specification as in their Table 3. “cl.”, “E”, “I”, “PCGDP”, “P-T”, “P.W.” stand for clustering, exporter, importer, per capita GDP, preferential-trade and pairwise, respectively. The p-values for the last column were obtained with 1,000 bootstrap samples.

Equations

$$\overline{B}=\{b=(b_{1},...,b_{k})\in B:\forall(i,j)\in\{1,...,k\}^{2},\,i\neq j,\,b_{i}\neq b_{j}\}.$$

$$\mathcal{E}_{r}=\Big\{(e_{1},...,e_{k})\in\{0,1\}^{k}:\sum_{j=1}^{k}e_{j}=r\Big\}.$$

$$Y_{\bm{i}}=\tau\left((U_{\{\bm{i}\odot\bm{e}\}^{+}})_{\bm{e}\in\cup_{r=1}^{k}\mathcal{E}_{r}}\right)\quad\forall\bm{i}\in\mathbb{I}_{k}.$$

$$Y_{i_{1},i_{2}}=\tau(U_{i_{1}},U_{i_{2}},U_{\{i_{1},i_{2}\}}).$$

$$\mathbb{P}_{n}f=\frac{(n-k)!}{n!}\sum_{\bm{i}\in\mathbb{I}_{n,k}}f(Y_{\bm{i}}),$$

$$\mathbb{G}_{n}f=\sqrt{n}\left(\mathbb{P}_{n}f-Pf\right).$$

$$\sup_{Q\in\mathcal{Q}}N\left(\eta||F||_{Q,1},\mathcal{F},||\cdot||_{Q,1}\right)<\infty;$$

$$\int_{0}^{\infty}\sup_{Q\in\mathcal{Q}}\sqrt{\log N\left(\eta||F||_{Q,2},\mathcal{F},||\cdot||_{Q,2}\right)}d\eta<\infty;$$

$$K(f_{1},f_{2})=\frac{1}{(k-1)!^{2}}\sum_{(\pi,\pi^{\prime})\in\mathfrak{S}(\{\bm{1}\})\times\mathfrak{S}(\{\bm{1}^{\prime}\})}\text{Cov}\left(f_{1}(Y_{\pi(\bm{1})}),f_{2}(Y_{\pi^{\prime}(\bm{1}^{\prime})})\right).$$

$$\|f\|_{1,1}$$

$$\|f\|_{1,2}$$

$$\mathbb{P}^{\ast}_{n}f=\frac{(n-k)!}{n!}\sum_{\bm{i}\in\mathbb{I}_{n,k}}W_{\bm{i}}f(Y_{\bm{i}}),$$

$$\mathbb{G}_{n}^{\ast}f=\sqrt{n}\left(\mathbb{P}^{\ast}_{n}f-\mathbb{P}_{n}f\right).$$

$$\sup_{h\in\text{BL}_{1}}\left|\mathbb{E}\left(h(\mathbb{G}_{n}^{\ast})\big|(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}\right)-\mathbb{E}\left(h(\mathbb{G})\right)\right|\stackrel{\text{as}*}{\longrightarrow}0,$$

$$\mathbb{E}\left(\mathbb{P}^{\ast}_{n}(f)\big|(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}\right)=\frac{1}{n^{k}}\sum_{\bm{i}\in\mathbb{I}_{n,k}}f(Y_{\bm{i}})\neq\mathbb{P}_{n}f.$$

$$K(y_{1},y_{2})=$$

$$\sqrt{n}\left(\widehat{\theta}-\theta_{0}\right)\stackrel{d}{\longrightarrow}\mathcal{N}\left(0,\mathbb{V}\left(g_{F_{Y}}^{\prime}(\mathbb{G})\right)\right).$$

$$\mu_{m}(f)=\mathbb{E}\left[\left[f(Y_{1,2})-\mathbb{E}(f(Y_{1,2}))\right]\psi_{m_{1}}(U_{1})\psi_{m_{2}}(U_{2})\psi_{m_{3}}(U_{\{1,2\}})\right].$$

$$\mathbb{G}^{d}(f)=$$

$$\int_{0}^{\infty}\sup_{Q\in\mathcal{Q}}\sqrt{\log N\left(\eta||F||_{Q,2},\mathcal{F},||\cdot||_{Q,2}\right)}d\eta<\infty.$$

$$\mathbb{G}_{n}^{m\ast}:f\mapsto\frac{1}{\sqrt{n}}\sum_{i_{1}=1}^{n}\xi_{i_{1}}\left\{\frac{1}{n-1}\sum_{1\leq i_{2}\neq i_{1}\leq n}\left[f(Y_{i_{1},i_{2}})+f(Y_{i_{2},i_{1}})\right]-2\mathbb{P}_{n}f\right\}.$$

$$(Y_{\bm{i}})_{\bm{i}\in\mathbb{N}^{+k}}\overset{d}{=}(Y_{\pi_{1}(i_{1}),...,\pi_{k}(i_{k})})_{\bm{i}\in\mathbb{N}^{+k}}.$$

$$\mathbb{P}_{\bm{n}}f$$

$$\mathbb{G}_{\bm{n}}f$$

$$\mathbb{G}_{\bm{n}}^{\ast}f=\sqrt{\underline{n}}\left(\frac{1}{\prod_{j=1}^{k}n_{j}}\sum_{\bm{1}\leq\bm{i}\leq\bm{n}}(W_{\bm{i}}-1)\sum_{\ell=1}^{N_{\bm{i}}}f(Y_{\bm{i},\ell})\right).$$

$$K_{\lambda}(f_{1},f_{2})=\sum_{j=1}^{k}\lambda_{j}\text{Cov}\left(f_{1}(Y_{\bm{1}}),f_{2}(Y_{\bm{2}_{j}})\right),$$

$$KS_{t}=\sup_{u\in\mathbb{R}}\frac{1}{n(n-1)}\sum_{(i_{1},i_{2})\in\mathbb{I}_{n,2}}\left(\mathds{1}_{\{T_{i_{1},i_{2},t}\leq u\}}-\mathds{1}_{\{T_{i_{1},i_{2},t+1}\leq u\}}\right).$$

$$KS_{t}^{\ast}=\sup_{u\in\mathbb{R}}\frac{1}{n(n-1)}\sum_{(i_{1},i_{2})\in\mathbb{I}_{n,2}}(W_{\bm{i}}-1)\left(\mathds{1}_{\{T_{i_{1},i_{2},t}\leq u\}}-\mathds{1}_{\{T_{i_{1},i_{2},t+1}\leq u\}}\right).$$

$$T_{i_{1},i_{2}}=\exp(\alpha_{0})G_{i_{1}}^{\alpha_{1}}G_{i_{2}}^{\alpha_{2}}D_{i_{1},i_{2}}^{\alpha_{3}}\exp(A_{i_{1},i_{2}}\beta)\eta_{i_{1},i_{2}},$$

$$\mathbb{E}\left[X_{\bm{i}}^{\prime}\left(T_{\bm{i}}-\exp(X_{\bm{i}}\theta_{0})\right)\right]=0,$$

$$\mathbb{E}\left[\Phi\left(\sup_{f\in\mathcal{F}}\left|\mathbb{P}_{n}f-Pf\right|\right)\right]$$


Full text

Empirical Process Results for Exchangeable Arrays

[Footnote: We are grateful to anonymous referees and an associate editor for their thoughtful comments that improved the paper. We would also like to thank Stéphane Bonhomme, Bryan Graham, Isabelle Méjean, Pedro Sant’Anna and participants at various seminars and conferences for their remarks.]

Laurent Davezies, CREST-ENSAE

Xavier D’Haultfœuille, CREST-ENSAE

Yannick Guyonvarch, CREST-ENSAE


Keywords: exchangeable arrays, empirical processes, bootstrap.

1 Introduction

Taking into account dependence between observations is crucial for making correct inference. For instance, different observations may face common shocks, tending to correlate them positively and thus leading to overly optimistic inference when ignored (Bertrand et al., 2004). Such common shocks may arise if the data are polyadic (e.g., dyadic), namely they involve interactions between several units of a given population. An example is international trade, where each observation corresponds to a pair of countries, one exporting and the other importing. We can then expect that two such pairs may be dependent whenever they share at least one country, because of that country’s specificities in terms of international trade. Common shocks may also correspond to aggregate fluctuations that affect all units sharing some characteristics. For instance, wages of two individuals may be correlated either because they live in the same geographical area, or because they work in the same sector. We refer to multiway clustering when there are several dimensions along which units may be correlated.

Holland and Leinhardt (1976) and Fafchamps and Gubert (2007) derived variance formulas for linear regressions with dyadic data, while Cameron et al. (2011) proposed similar formulas for multiway clustering. The Stata command ivreg2 and the R package multiwayvcov are now used routinely to report standard errors accounting for multiway clustering. However, theory has lagged behind this practice. Tabord-Meehan (2019) shows the asymptotic validity of inference based on Holland and Leinhardt’s suggestion for dyadic data, but for OLS estimators only. Graham (2019) and Graham et al. (2019) study respectively parametric regressions and density estimation with dyadic data. Regarding multiway clustering, the only papers we are aware of are the recent works of Menzel (2019) and MacKinnon et al. (2019). Again, they focus on linear parameters. [Footnote 1: On the other hand, and interestingly, Menzel (2019) studies inference both with and without asymptotic normality. He also shows that refinements in asymptotic approximations are possible using the wild bootstrap.]

In this paper, we establish uniform laws of large numbers (LLN) and central limit theorems (CLT) for this type of data. Uniform LLNs and CLTs are key in showing consistency and asymptotic normality of nonlinear estimators under weak regularity conditions. As such, they have been studied extensively with i.i.d. but also dependent data. We refer to, e.g., van der Vaart and Wellner (1996) and Giné and Nickl (2015) for overviews with i.i.d. data, and Dehling and Philipp (2002) for the case of time series (see also, e.g., Bertail et al., 2017; Han and Wellner, 2019, for recent results on sampling designs). Notably, we obtain these uniform LLNs and CLTs under the same moment restrictions and conditions on the class of functions as those usually considered with i.i.d. data. Thus, statistical results deduced from the uniform LLNs and CLTs with i.i.d. data directly extend to the exchangeable arrays we consider. As a proof of concept, we consider Z-estimators and smooth functionals of the empirical cumulative distribution function (cdf).

We also study consistency of a direct generalization of the standard bootstrap for i.i.d. data to polyadic data. A related bootstrap scheme for multiway clustering is the so-called pigeonhole bootstrap, suggested by McCullagh (2000) and studied by Owen (2007), but for which no uniform result has been established so far. For both, we establish weak convergence of the corresponding process. These results imply the validity of the corresponding bootstrap schemes in a wide range of settings, including Z-estimators and smooth functionals of the empirical cdf.

To prove these results, we first argue that polyadic data correspond to dissociated, jointly exchangeable arrays. Similarly, multiway clustering corresponds to dissociated separately exchangeable arrays. We then rely extensively on the so-called Aldous-Hoover-Kallenberg representation (Hoover, 1979; Aldous, 1981; Kallenberg, 1989) for such arrays. This representation allows us in particular to prove a symmetrization lemma, which is very useful to derive the uniform LLNs and CLTs. This lemma generalizes a similar result for i.i.d. data, but also for U-processes (see, e.g. de la Peña and Giné, 1999, Theorem 3.5.3). Note that simple LLNs and CLTs have been already proved, or are direct consequences of known results on dissociated, jointly exchangeable arrays. For LLNs, we refer to Eagleson and Weber (1978) and Lemma 7.35 in Kallenberg (2005). For CLTs, see Silverman (1976). But to our knowledge, no abstract uniform LLNs and CLTs have been proved so far for such arrays.

Finally, we illustrate our results with two applications to international trade. In the first, we test whether international trade remains stable from one year to another, using a Kolmogorov-Smirnov test. Given the dependence structure over pairs of countries and through time, the asymptotic distribution of the test under the null is complicated, making the bootstrap attractive. We show that neglecting the dependence between dyads leads to substantial over-rejection of the null hypothesis. Next, we estimate the so-called gravity equation, a very popular model for explaining trade between countries. Since Santos Silva and Tenreyro (2006), this equation has often been estimated with Poisson pseudo maximum likelihood, an estimator to which our results apply. Again, far fewer explanatory variables are significant at usual levels when accounting for dependence between pairs of countries than when considering such pairs to be i.i.d. observations (as in Santos Silva and Tenreyro, 2006).

The paper is organized as follows. Section 2 describes the set-up and gives our main results for jointly exchangeable arrays. In addition to uniform LLNs and CLTs, we prove weak convergence of our bootstrap scheme. We also show results for Z-estimators and smooth functionals of the empirical cdf. Section 3 considers a few extensions. In particular, we study separately exchangeable arrays. An important difference for such arrays is that the multiple dimensions, corresponding to different sources of clustering, may not grow at the same rate. We show that our results still hold in this case. We also study “degenerate” cases (in the same sense as with U-processes) and consider another bootstrap scheme. The two applications to international trade are developed in Section 4. The appendix presents three key lemmas. In the supplementary material, we present additional extensions. In particular, we generalize our main results to cases where the number of observations for each $k$ -tuple (e.g., the number of matches between two sport players) varies. We also display Monte Carlo simulations and all the proofs of our results.

2 The set up and main results

2.1 Set up

Before formally defining our data generating process, we introduce some notation. For any $A\subset\mathbb{R}$ and $B\subset\mathbb{R}^{k}$ for some $k\geq 2$ , we let $A^{+}=A\cap(0,\infty)$ and

$$\overline{B}=\{b=(b_{1},...,b_{k})\in B:\forall(i,j)\in\{1,...,k\}^{2},\,i\neq j,\,b_{i}\neq b_{j}\}.$$

We then let $\mathbb{I}_{k}=\overline{\mathbb{N}^{+k}}$ denote the set of $k$ -tuples of $\mathbb{N}^{+}$ without repetition. Similarly, for any $n\in\mathbb{N}^{+}$ , we let $\mathbb{I}_{n,k}=\overline{\{1,...,n\}^{k}}$ . For any $\bm{i}=(i_{1},...,i_{k})$ and $\bm{j}=(j_{1},...,j_{k})$ in $\mathbb{N}^{k}$ , we let $\bm{i}\odot\bm{j}=(i_{1}j_{1},...,i_{k}j_{k})$ . With a slight abuse of notation, we also let, for any $\bm{i}=(i_{1},...,i_{k})\in\mathbb{N}^{k}$ , $\{\bm{i}\}$ denote the set of distinct elements of $(i_{1},...i_{k})$ . For any $r\in\{1,...,k\}$ , we let

$$\mathcal{E}_{r}=\Big\{(e_{1},...,e_{k})\in\{0,1\}^{k}:\sum_{j=1}^{k}e_{j}=r\Big\}.$$

Finally, for any $A\subset\mathbb{N}^{+}$ , we let $\mathfrak{S}(A)$ denote the set of permutations on $A$ . For any $\bm{i}=(i_{1},...,i_{k})\in\mathbb{N}^{+k}$ and $\pi\in\mathfrak{S}(\mathbb{N}^{+})$ , we let $\pi(\bm{i})=(\pi(i_{1}),...,\pi(i_{k}))$ .

We are interested in polyadic data, that is to say random variables $Y_{\bm{i}}$ (whose support is denoted by $\mathcal{Y}$ ) indexed by $\bm{i}\in\mathbb{I}_{k}$ . Dyadic data, which are the most common case, correspond to $k=2$ . For instance, when considering trade data, $Y_{i_{1},i_{2}}$ corresponds to export flows from country $i_{1}$ to country $i_{2}$ . In network data, $Y_{i_{1},i_{2}}$ could be a dummy for whether there is a link from $i_{1}$ to $i_{2}$ . In directed networks, $Y_{i_{1},i_{2}}\neq Y_{i_{2},i_{1}}$ , while $Y_{i_{1},i_{2}}=Y_{i_{2},i_{1}}$ in undirected networks. Similarly, $Y_{i_{1},i_{2},i_{3}}$ could capture whether $(i_{1},i_{2},i_{3})$ forms a triad or not (see, e.g. Wasserman and Faust, 1994, for a motivation on triad counts). $Y_{\bm{i}}$ could also correspond to data subject to multiway clustering. Then $i_{1}$ ,…, $i_{k}$ are the indexes corresponding to the different dimensions of clustering, for instance geographical areas and sectors of activity. In such cases, however, adaptations of our set-up are needed, and we postpone this discussion to Section 3.3 below.

We assume that the random variables are generated according to a jointly exchangeable and dissociated array, defined formally as follows:

Assumption 1.

For any $\pi\in\mathfrak{S}(\mathbb{N}^{+})$ , $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}\overset{d}{=}(Y_{\pi(\bm{i})})_{\bm{i}\in\mathbb{I}_{k}}$ . Moreover, for any $A,B$ disjoint subsets of $\mathbb{N}^{+}$ with $\min(|A|,|B|)\geq k$ , $(Y_{\bm{i}})_{\bm{i}\in\overline{A^{k}}}$ is independent of $(Y_{\bm{i}})_{\bm{i}\in\overline{B^{k}}}$ .

The first part imposes that the labelling conveys no information: the joint distribution of the data remains identical under any possible permutation of the labels. The second part states that the array is dissociated: the variables are independent if they share no unit in common. For instance, $Y_{(i_{1},i_{2})}$ must be independent of $Y_{(j_{1},j_{2})}$ if $\{i_{1},i_{2}\}\cap\{j_{1},j_{2}\}=\emptyset$ . On the other hand, Assumption 1 does not impose independence otherwise. This is important in many applications. In the international trade example, $Y_{i_{1},i_{2}}$ and $Y_{i_{1},i_{3}}$ are likely to be dependent because if $i_{1}$ is open to international trade, it tends to export more than the average to any other country. It may also import more from other countries, meaning that $Y_{i_{1},i_{2}}$ and $Y_{i_{3},i_{1}}$ could also be dependent.

Lemma 2.1 below is very helpful to better understand the dependence structure imposed by joint exchangeability and dissociation. It may be seen as an extension of de Finetti’s theorem to arrays satisfying such restrictions. It is also key in establishing our asymptotic results below.

Lemma 2.1.

Assumption 1 holds if and only if there exist i.i.d. variables $(U_{J})_{J\subset\mathbb{N}^{+},1\leq|J|\leq k}$ and a measurable function $\tau$ such that almost surely,

$$Y_{\bm{i}}=\tau\left((U_{\{\bm{i}\odot\bm{e}\}^{+}})_{\bm{e}\in\cup_{r=1}^{k}\mathcal{E}_{r}}\right)\quad\forall\bm{i}\in\mathbb{I}_{k}.\qquad(2.1)$$

[Footnote 2: In this formula, the $(U_{\{\bm{i}\odot\bm{e}\}^{+}})_{\bm{e}\in\cup_{r=1}^{k}\mathcal{E}_{r}}$ appear according to a precise ordering, which we nonetheless leave implicit as it bears no importance hereafter.]

This result is due to Kallenberg (1989) but a weaker version, where the equality only holds in distribution, is known as Aldous-Hoover representation (Aldous, 1981; Hoover, 1979). Accordingly, we refer to (2.1) as the AHK representation hereafter. To illustrate it, let us consider dyadic data ( $k=2$ ). Then, according to Lemma 2.1, we have, for every $i_{1}<i_{2}$ ,

$$Y_{i_{1},i_{2}}=\tau(U_{i_{1}},U_{i_{2}},U_{\{i_{1},i_{2}\}}).\qquad(2.2)$$

Thus, in the example of trade flows, the volume of exports from $i_{1}$ to $i_{2}$ depends on factors specific to $i_{1}$ and $i_{2}$ , such as their own GDP, but also on factors relating both, such as the distance between the two countries. (2.2) has been also used by Bickel and Chen (2009) and Bickel et al. (2011) to model network formation (in which case $Y_{i_{1},i_{2}}=1$ if there is a link between $i_{1}$ and $i_{2}$ , 0 otherwise). Note also the link between (2.2) and U-statistics: $Y_{i_{1},i_{2}}$ would correspond to such a statistic if $\tau$ did not depend on its third argument.
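The dyadic representation (2.2) can be made concrete with a short simulation. The sketch below is only illustrative: the function names, the uniform draws for the latent variables and the particular link function `example_tau` are assumptions made here, not choices from the paper. It also assumes the same $\tau$ is applied to every ordered pair, with the pair-level variable shared by $(i_{1},i_{2})$ and $(i_{2},i_{1})$.

```python
import random


def simulate_dyadic_array(n, tau, seed=0):
    """Draw Y_{i1,i2} = tau(U_{i1}, U_{i2}, U_{{i1,i2}}) for all ordered pairs i1 != i2.

    Units are labelled 0..n-1. The latent variables are i.i.d. uniform draws,
    which is an illustrative assumption.
    """
    rng = random.Random(seed)
    unit_shock = [rng.random() for _ in range(n)]  # U_i, one per unit
    pair_shock = {}  # U_{{i1,i2}}, one per unordered pair
    for i1 in range(n):
        for i2 in range(i1 + 1, n):
            pair_shock[frozenset((i1, i2))] = rng.random()
    y = {}
    for i1 in range(n):
        for i2 in range(n):
            if i1 != i2:
                shared = pair_shock[frozenset((i1, i2))]
                y[(i1, i2)] = tau(unit_shock[i1], unit_shock[i2], shared)
    return y


def example_tau(u_exporter, u_importer, u_pair):
    # Hypothetical link function: exporter effect, importer effect, pair effect.
    return u_exporter + 0.5 * u_importer + 0.1 * u_pair


y = simulate_dyadic_array(5, example_tau, seed=1)
```

In this sketch, `y[(0, 1)]` and `y[(0, 2)]` both depend on the latent variable of unit 0, which is the source of dependence between dyads sharing a unit, while two dyads with no unit in common are built from disjoint latent variables.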

Under Assumption 1, the $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}$ have a common marginal probability distribution, which we denote by $P$ . We are interested in estimating and making inference on features of this distribution, such as its expectation or a quantile, based on observing the first $n$ units only, namely the sample $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{n,k}}$ , with $n\geq k$ .

2.2 Uniform laws of large numbers and central limit theorems

Let $\mathcal{F}$ denote a class of real-valued functions admitting a first moment with respect to the distribution $P$ and let $Pf$ denote the corresponding moment $\mathbb{E}\left[f(Y_{\bm{1}})\right]$ (with $\bm{1}$ the $k-$ tuple $(1,...,k)$ ). To avoid measurability issues and the use of outer expectations subsequently, we maintain the following assumption:

Assumption 2.

There exists a countable subclass $\mathcal{G}\subset\mathcal{F}$ such that elements of $\mathcal{F}$ are pointwise limits of sequences of elements of $\mathcal{G}$ .

Assumption 2 is not necessary but often imposed (see, e.g. Chernozhukov et al., 2014; Kato, 2019). We refer to Kosorok (2006, pp.137-140) for further discussion.

In this section, we study the empirical measure $\mathbb{P}_{n}$ and the empirical process $\mathbb{G}_{n}$ defined on $\mathcal{F}$ by

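$$\mathbb{P}_{n}f=\frac{(n-k)!}{n!}\sum_{\bm{i}\in\mathbb{I}_{n,k}}f(Y_{\bm{i}}),\qquad\mathbb{G}_{n}f=\sqrt{n}\left(\mathbb{P}_{n}f-Pf\right).$$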

Let $\ell^{\infty}(\mathcal{F})$ denote the set of bounded functions on $\mathcal{F}$ . We prove below that under restrictions on $\mathcal{F}$ , $\mathbb{P}_{n}f$ converges almost surely to $Pf$ uniformly over $f\in\mathcal{F}$ , while $\mathbb{G}_{n}$ converges weakly in $\ell^{\infty}(\mathcal{F})$ to a Gaussian process. We refer to, e.g., van der Vaart and Wellner (1996) for a formal definition of weak convergence of empirical processes. These results, stronger than pointwise convergence of $\mathbb{P}_{n}f$ and $\mathbb{G}_{n}f$ , are key in establishing the consistency and asymptotic normality of, e.g., smooth functionals of the empirical cdf or Z- and M-estimators. We consider briefly applications in Section 2.4 below, and refer to Part 3 of van der Vaart and Wellner (1996) for a more comprehensive review of statistical applications of empirical process results.
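For dyadic data ( $k=2$ ), the empirical measure averages $f(Y_{i_{1},i_{2}})$ over the $n(n-1)$ ordered pairs of distinct units, and the empirical process rescales the centered average by $\sqrt{n}$ . The following is a minimal sketch of these two quantities; the function names and the toy array are assumptions introduced here for illustration.

```python
import math


def empirical_measure(y, n, f):
    """P_n f for k = 2: average of f(Y_{i1,i2}) over ordered pairs with i1 != i2."""
    total = sum(f(y[(i1, i2)]) for i1 in range(n) for i2 in range(n) if i1 != i2)
    return total / (n * (n - 1))


def empirical_process(y, n, f, pf):
    """G_n f = sqrt(n) * (P_n f - P f), where pf is the population moment P f."""
    return math.sqrt(n) * (empirical_measure(y, n, f) - pf)


# Toy array (hypothetical values): Y_{i1,i2} = i1 + i2 for units 0, 1, 2.
n = 3
toy_y = {(i1, i2): float(i1 + i2) for i1 in range(n) for i2 in range(n) if i1 != i2}


def identity(v):
    return v


p_n = empirical_measure(toy_y, n, identity)
```

In practice $Pf$ is unknown, so `empirical_process` is mainly useful in simulations where the population moment can be computed.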

We use the rate $\sqrt{n}$ to normalize $\mathbb{P}_{n}f-Pf$ , though we have $n!/(n-k)!$ different random variables. In general, we cannot expect a better rate of convergence. To see this, let $(X_{i})_{i\in\mathbb{N}^{+}}$ be i.i.d. random variables and let $Y_{\bm{i}}=\sum_{j\in\{\bm{i}\}}X_{j}$ . Then $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}$ satisfies Assumption 1, and $\mathbb{P}_{n}f$ boils down to an average over $n$ i.i.d. terms only. In some cases, however, for instance if the $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}$ are i.i.d., the convergence rate is faster than $\sqrt{n}$ . [Footnote 3: As with U-statistics, we expect different rates depending on the degree of “degeneracy”.] Theorem 2.1 below remains valid in such cases, but the limit Gaussian process is then degenerate. We come back in more detail to such cases in Section 3.1 below.
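The example above can be checked numerically for $k=2$ and $f$ the identity: each $X_{j}$ appears in $2(n-1)$ of the $n(n-1)$ ordered pairs, so the average over pairs equals twice the sample mean of the $X_{j}$ . The sketch below assumes standard Gaussian draws and a sample size of 50, both chosen arbitrarily.

```python
import random


def pairwise_sum_average(x):
    """Average of Y_{i1,i2} = X_{i1} + X_{i2} over ordered pairs with i1 != i2."""
    n = len(x)
    total = sum(x[i1] + x[i2] for i1 in range(n) for i2 in range(n) if i1 != i2)
    return total / (n * (n - 1))


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]  # i.i.d. draws, distribution is arbitrary

p_n = pairwise_sum_average(x)
two_times_mean = 2 * sum(x) / len(x)
# p_n and two_times_mean agree up to floating-point error, so the n(n-1)
# dyadic observations carry no more information than n i.i.d. terms here.
```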

Let us now introduce the restrictions on $\mathcal{F}$ that we use to obtain uniform laws. We require additional notation for that purpose. For any $\eta>0$ and any seminorm $||\cdot||$ on a space containing $\mathcal{F}$ , $N(\eta,\mathcal{F},||\cdot||)$ denotes the minimal number of $||\cdot||$ -closed balls of radius $\eta$ with centers in $\mathcal{F}$ needed to cover $\mathcal{F}$ . $N_{[\;]}(\eta,\mathcal{F},||\cdot||)$ denotes the minimal number of $\eta$ -brackets needed to cover $\mathcal{F}$ , where an $\eta$ -bracket for $f\in\mathcal{F}$ is a pair of functions $(\ell,u)$ such that $\ell\leq f\leq u$ and $||u-\ell||<\eta$ . The seminorms we consider hereafter are $\|f\|_{\mu,r}=(\int|f|^{r}d\mu)^{1/r}$ for any $r\geq 1$ and probability measure or cdf $\mu$ . Hereafter, an envelope of $\mathcal{F}$ is a measurable function $F$ satisfying $F(u)\geq\sup_{f\in\mathcal{F}}|f(u)|$ . Finally, we let $\mathcal{Q}$ denote the set of probability measures with finite support on $\mathcal{Y}$ .

Assumption 3.

The class $\mathcal{F}$ either:

(i) admits an envelope $F$ with $PF<\infty$ and $\forall\eta>0$ ,

$$\sup_{Q\in\mathcal{Q}}N\left(\eta||F||_{Q,1},\mathcal{F},||\cdot||_{Q,1}\right)<\infty;$$

(ii) or satisfies $N_{[\;]}\left(\eta,\mathcal{F},||\cdot||_{L_{1}(P)}\right)<\infty$ for all $\eta>0$ .

Assumption 4.

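The class $\mathcal{F}$ either:

(i) admits an envelope $F$ with $PF^{2}<\infty$ and

$$\int_{0}^{\infty}\sup_{Q\in\mathcal{Q}}\sqrt{\log N\left(\eta||F||_{Q,2},\mathcal{F},||\cdot||_{Q,2}\right)}d\eta<\infty;$$

(ii) or satisfies $\int_{0}^{\infty}{\sqrt{\log N_{[\;]}\left(\eta,\mathcal{F},||\cdot||_{L_{2}(P)}\right)}d\eta}<\infty$ .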

Assumptions 3 and 4 are exactly the same as the conditions often imposed with i.i.d. data to show uniform LLNs and CLTs (see, e.g., Theorems 19.4, 19.5, 19.13 and 19.14 in van der Vaart, 2000). [Footnote 4: In van der Vaart (2000), the supremum in Assumptions 3 and 4 is taken over the set of probability measures $Q$ with finite support on $\mathcal{Y}$ and such that $||F||_{Q,2}>0$ . This additional restriction is simply due to a different convention in constructing covering numbers, as van der Vaart considers open balls while we use closed balls, following, e.g., Kato (2019).] In particular, Assumption 4-(i) (resp. (ii)) imposes a condition on what is usually referred to as the uniform (resp. bracketing) entropy integral, see, e.g., van der Vaart and Wellner (1996). Finiteness of the uniform entropy integral is satisfied by any VC-type class of functions (see Chernozhukov et al., 2014, for a definition), or by the convex hull of such classes under some restrictions. The bracketing entropy integral is finite for instance for classes of monotone or Hölder continuous functions (see, e.g. van der Vaart and Wellner, 1996).

The following theorem establishes uniform LLNs and CLTs under these two conditions. We denote by $\bm{1}^{\prime}$ the $k-$ tuple $(1,k+1,...,2k-1)$ .

Theorem 2.1.

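Suppose that Assumptions 1-2 hold. Then:

1. If Assumption 3 holds, $\sup_{f\in\mathcal{F}}\left|\mathbb{P}_{n}f-Pf\right|$ tends to 0 a.s. and in $L^{1}$ .

2. If Assumption 4 holds, the process $\mathbb{G}_{n}$ converges weakly in $\ell^{\infty}(\mathcal{F})$ to a centered Gaussian process $\mathbb{G}$ on $\mathcal{F}$ as $n$ tends to infinity. Moreover, the covariance kernel $K$ of $\mathbb{G}$ satisfies:

$$K(f_{1},f_{2})=\frac{1}{(k-1)!^{2}}\sum_{(\pi,\pi^{\prime})\in\mathfrak{S}(\{\bm{1}\})\times\mathfrak{S}(\{\bm{1}^{\prime}\})}\text{Cov}\left(f_{1}(Y_{\pi(\bm{1})}),f_{2}(Y_{\pi^{\prime}(\bm{1}^{\prime})})\right).$$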

The proof is given in the supplement. When Assumption 3-(ii) holds, Part 1 can be proved by essentially combining Theorem 3 in Eagleson and Weber (1978) and Lemma 7.35 in Kallenberg (2005). Part 2 was also proved for a finite $\mathcal{F}$ by Silverman (1976). But the weak convergence result under the bracketing entropy condition, and the uniform laws under the uniform entropy conditions, do not follow from such results. To prove the former, we adapt a maximal inequality in Giné and Nickl (2015, see their Lemma 3.5.12) to our context. To this end, we show that Hoeffding’s bound on U-statistics (Hoeffding, 1963, Section 5.a) still applies to our context.

To prove the results under the uniform entropy conditions, the key ingredient, as with i.i.d. data, is a symmetrization lemma stated in Appendix A below and proved in the supplement. Its proof relies extensively on Lemma 2.1 and a decoupling inequality that may be of independent interest (see Lemma A.2). The latter result generalizes a similar inequality for U-processes (see de la Peña, 1992). In the proofs of both lemmas, we follow similar strategies as with U-processes, with two complications. First, even with $k=2$ , $Y_{\bm{i}}$ does not only depend on $U_{i_{1}}$ and $U_{i_{2}}$ , but also on $U_{\{i_{1},i_{2}\}}$ . Second, when $k\geq 3$ , dependence between observations arises not only because of single-unit terms such as $U_{i_{1}}$ or $U_{i_{2}}$ , but also because of multiple-unit terms such as $U_{\{i_{1},i_{2}\}}$ .

As in the i.i.d. case, Assumption 3 is actually stronger than necessary to obtain the uniform law of large numbers. The following proposition gives an exact characterization, where, for simplicity, we restrict to $k=2$ . It is similar to the characterization for i.i.d. data (see, e.g. Theorem 3.7.4 in Giné and Nickl, 2015) or for U-processes (see Theorem 5.2.2 in de la Peña and Giné, 1999). Let us introduce the following norms:

[TABLE]

Proposition 2.1.

Suppose that Assumptions 1-2 hold and $\mathcal{F}$ admits an envelope $F$ with $PF<\infty$ . Then $\sup_{f\in\mathcal{F}}\left|\mathbb{P}_{n}f-Pf\right|\stackrel{\text{as}}{\longrightarrow}0$ if and only if both $\log N(\varepsilon,\mathcal{F},||\cdot||_{1,2})/n^{2}$ and $\log N(\varepsilon,\mathcal{F},||\cdot||_{1,1})/n$ tend to 0 in outer probability. [Footnote 5: For a definition of convergence in outer probability or outer almost-sure convergence considered below, see e.g. Chapter 1.9 in van der Vaart and Wellner (1996).]

Proposition 2.1 emphasizes the two aspects of dissociated, exchangeable arrays. The first is i.i.d. variations, through the random entropy term related to $||\cdot||_{1,2}$ , which only involves $(U_{\{i_{1},i_{2}\}})_{\bm{i}\in\mathbb{I}_{n,2}}$ . The second is U-statistic-like variations, through the random entropy term related to $||\cdot||_{1,1}$ : up to negligible terms, $||f||_{1,1}$ only depends on $(U_{i_{1}})_{1\leq i_{1}\leq n}$ . Key in establishing the necessity of these two conditions is a weak converse of the symmetrization lemma for $k=2$ , see Equation (LABEL:eq:desym) in the supplement.

2.3 Convergence of the bootstrap process

We now study the properties of the following bootstrap sampling scheme, which extends the pigeonhole bootstrap (McCullagh, 2000; Owen, 2007) to jointly exchangeable arrays:

1. $n$ units are sampled independently in $\{1,...,n\}$ with replacement and equal probability. $W_{i}$ denotes the number of times unit $i$ is sampled.

2. The $k$ -tuple $\bm{i}=(i_{1},...,i_{k})\in\mathbb{I}_{n,k}$ is then selected $W_{\bm{i}}=\prod_{j=1}^{k}W_{i_{j}}$ times in the bootstrap sample.

Then we consider $\mathbb{P}^{\ast}_{n}$ and $\mathbb{G}_{n}^{\ast}$ , defined on $\mathcal{F}$ by

[TABLE]
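The two sampling steps above can be sketched for $k=2$ as follows. This is a minimal illustration, not the authors' code: the function names and the toy array are hypothetical, and the normalization by $n(n-1)$ (the number of ordered pairs with distinct indices) is an assumption of the sketch.

```python
import numpy as np

def unit_weights(n, rng):
    """Step 1: draw n units with replacement; W[i] counts how often unit i is drawn."""
    draws = rng.integers(0, n, size=n)
    return np.bincount(draws, minlength=n)

def bootstrap_mean(fY, W):
    """Step 2 for k = 2: the pair (i1, i2), i1 != i2, receives weight W[i1] * W[i2].

    fY is an n x n array holding f(Y_{i1,i2}); its diagonal is ignored.
    Dividing by n(n-1) is an assumption of this sketch.
    """
    n = fY.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    pair_w = np.outer(W, W)
    return (pair_w[off_diag] * fY[off_diag]).sum() / (n * (n - 1))

rng = np.random.default_rng(0)
n = 30
U = rng.normal(size=n)
# Toy dyadic array: unit effects for both indices plus pair-level noise.
fY = U[:, None] + U[None, :] + rng.normal(size=(n, n))
W = unit_weights(n, rng)
boot_value = bootstrap_mean(fY, W)
```

With all weights equal to one, `bootstrap_mean` reduces to the plain average over pairs with distinct indices, which gives a quick sanity check.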

Asymptotic validity of the bootstrap amounts to showing that conditional on the data $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}$ , $\mathbb{G}_{n}^{\ast}$ converges weakly to the process $\mathbb{G}$ defined in Theorem 2.1. [Footnote 6: For the sake of brevity, we focus afterwards on convergence results under the sole uniform entropy condition (Assumption 4-(i)).] As discussed in, e.g., van der Vaart and Wellner (1996, Chapter 3.6), the outer almost-sure conditional weak convergence boils down to proving

[TABLE]

where $\text{BL}_{1}$ is the set of bounded and Lipschitz functions from $\ell^{\infty}(\mathcal{F})$ to $[0,1]$ and “ $\stackrel{{\scriptstyle\text{as}*}}{{\longrightarrow}}$ ” denotes outer almost-sure convergence.

Theorem 2.2.

If Assumptions 1-2 and 4-(i) hold, the process $\mathbb{G}^{\ast}_{n}$ converges weakly in $\ell^{\infty}(\mathcal{F})$ to $\mathbb{G}$ , conditional on $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}$ and outer almost surely.

This theorem ensures the asymptotic validity of the bootstrap above not only for sample means, but also for smooth functionals of the empirical cdf and nonlinear estimators, as we shall see below. The proof of Theorem 2.2, in Section LABEL:ssec:boot_gen_case of the supplement, follows the same lines as that of Theorem 2.1, though some of the corresponding steps are more involved, as often with the bootstrap. In particular, to prove pointwise convergence, we use arguments in Lindeberg’s proof of the CLT for triangular arrays, Theorem 2.1.1 and Urysohn’s subsequence principle, combined with Prohorov’s theorem.

Note that in contrast with the standard bootstrap for i.i.d. data,

[TABLE]

However, the difference between $\mathbb{P}_{n}$ and $\mathbb{P}^{\prime}_{n}$ , the empirical measure with weights $1/n^{k}$ , becomes negligible as $n\rightarrow\infty$ . Accordingly, we also show in the proof of Theorem 2.2 the almost-sure conditional convergence of $\sqrt{n}\left(\mathbb{P}_{n}^{\ast}f-\mathbb{P}^{\prime}_{n}f\right)$ , in addition to that of $\mathbb{G}^{*}_{n}$ .
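A short worked computation may help to see where the weights $1/n^{k}$ come from. It assumes that $\mathbb{P}^{\ast}_{n}f=\frac{(n-k)!}{n!}\sum_{\bm{i}\in\mathbb{I}_{n,k}}W_{\bm{i}}f(Y_{\bm{i}})$ ; this normalization is an assumption of the sketch. Since $(W_{1},...,W_{n})$ is multinomial with $n$ trials and equal cell probabilities $1/n$ , its mixed factorial moments give, for distinct indices $i_{1},...,i_{k}$ ,

$$\mathbb{E}\Big(\prod_{j=1}^{k}W_{i_{j}}\Big)=\frac{n(n-1)\cdots(n-k+1)}{n^{k}}=\frac{n!}{(n-k)!\,n^{k}},$$

so that, under the assumed normalization,

$$\mathbb{E}\left(\mathbb{P}^{\ast}_{n}f\,\big|\,(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{n,k}}\right)=\frac{(n-k)!}{n!}\cdot\frac{n!}{(n-k)!\,n^{k}}\sum_{\bm{i}\in\mathbb{I}_{n,k}}f(Y_{\bm{i}})=\frac{1}{n^{k}}\sum_{\bm{i}\in\mathbb{I}_{n,k}}f(Y_{\bm{i}})=\mathbb{P}^{\prime}_{n}f.$$

For $k=2$ this is $\frac{n-1}{n}\mathbb{P}_{n}f$ , which differs from $\mathbb{P}_{n}f$ only by a factor tending to one.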

2.4 Application to nonlinear estimators

Theorem 2.1 ensures consistency and asymptotic normality of a large class of estimators. In turn, Theorem 2.2 shows that using the bootstrap for such estimators is asymptotically valid. To illustrate these points, we consider here two popular classes of estimators, namely Z-estimators and smooth functionals of the empirical cdf. Similar results could be obtained for, e.g., M-estimators (see, e.g. Cheng and Huang, 2010) or generalized method of moments estimators (see, e.g. Hansen, 1982).

Let us first consider Z-estimators. Let $\Theta$ denote a normed space, endowed with the norm $\|\cdot\|_{\Theta}$ and let $(\psi_{\theta,h})_{(\theta,h)\in\Theta\times\mathcal{H}}$ denote a class of real, measurable functions. Let $\Psi(\theta)(h)=P\psi_{\theta,h}$ , $\Psi_{n}(\theta)(h)=\mathbb{P}_{n}\psi_{\theta,h}$ and $\Psi^{*}_{n}(\theta)(h)=\mathbb{P}^{*}_{n}\psi_{\theta,h}$ . We let, for any real function $g$ on $\mathcal{H}$ , $\|g\|_{\mathcal{H}}=\sup_{h\in\mathcal{H}}|g(h)|$ . The parameter of interest $\theta_{0}$ , which satisfies $\Psi(\theta_{0})=0$ , is estimated by $\widehat{\theta}=\arg\min_{\theta\in\Theta}\|\Psi_{n}(\theta)\|_{\mathcal{H}}$ . We also define $\widehat{\theta}^{*}=\arg\min_{\theta\in\Theta}\|\Psi^{*}_{n}(\theta)\|_{\mathcal{H}}$ as the bootstrap counterpart of $\widehat{\theta}$ . The following theorem extends Theorem 13.4 in Kosorok (2006) to jointly exchangeable and dissociated arrays. For related results on Z-estimators in the i.i.d. case, see Section 3.2 in van der Vaart and Wellner (1996) and Wellner and Zhan (1996).

Theorem 2.3.

Suppose that Assumption 1 holds and:

1. $\|\Psi(\theta_{m})\|_{\mathcal{H}}\rightarrow 0$ implies $\|\theta_{m}-\theta_{0}\|_{\Theta}\rightarrow 0$ for every $(\theta_{m})_{m\in\mathbb{N}}$ in $\Theta$ ;

2. The class $\{\psi_{\theta,h}:(\theta,h)\in\Theta\times\mathcal{H}\}$ satisfies Assumptions 2-3, with the envelope function $F$ satisfying $PF<\infty$ ;

3. There exists $\delta>0$ such that the class $\{\psi_{\theta,h}:\|\theta-\theta_{0}\|_{\Theta}<\delta,h\in\mathcal{H}\}$ satisfies Assumptions 2 and 4, with an envelope function $F_{\delta}$ satisfying $PF^{2}_{\delta}<\infty$ ;

4. $\lim_{\theta\rightarrow\theta_{0}}\sup_{h\in\mathcal{H}}P\left(\psi_{\theta,h}-\psi_{\theta_{0},h}\right)^{2}=0$ ;

5. $\|\Psi_{n}(\widehat{\theta})\|_{\mathcal{H}}=o_{p}(n^{-1/2})$ and $P\left(\|\sqrt{n}\Psi^{*}_{n}(\widehat{\theta}^{*})\|_{\mathcal{H}}>\eta|(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}\right)=o_{p}(1)$ for every $\eta>0$ ;

6. $\theta\mapsto\Psi(\theta)$ is Fréchet-differentiable at $\theta_{0}$ , with continuously invertible derivative $\dot{\Psi}_{\theta_{0}}$ .

Then $\sqrt{n}(\widehat{\theta}-\theta_{0})$ converges in distribution to a centered Gaussian process $\mathbb{G}$ . Moreover, conditional on $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}$ and almost surely, $\sqrt{n}(\widehat{\theta}^{*}-\widehat{\theta})$ converges in distribution to $\mathbb{G}$ .

Next, we consider smooth functionals of $F_{Y}$ , the cdf of $Y_{\bm{i}}$ . Suppose that $\mathcal{Y}\subset\mathbb{R}^{p}$ for some $p\in\mathbb{N}^{+}$ and $\theta_{0}=g(F_{Y})$ , where $g$ is Hadamard differentiable (for a definition, see, e.g., van der Vaart and Wellner, 1996, Section 3.9.1). We estimate $\theta_{0}$ with $\widehat{\theta}=g(\widehat{F_{Y}})$ , where $\widehat{F_{Y}}$ denotes the empirical cdf of $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{n,k}}$ . Finally, we let $\widehat{\theta}^{*}$ denote the bootstrap counterpart of $\widehat{\theta}$ .

Theorem 2.4.

Suppose that $g$ is Hadamard differentiable at $F_{Y}$ tangentially to a set $\mathbb{D}_{0}$ , with derivative equal to $g^{\prime}_{F_{Y}}$ . Suppose also that Assumption 1 holds. Then:

1. $\sqrt{n}(\widehat{F_{Y}}-F_{Y})$ converges weakly, as a process indexed by $y$ , to a Gaussian process $\mathbb{G}$ with kernel $K$ satisfying

[TABLE]

2. If $\mathbb{G}\in\mathbb{D}_{0}$ with probability one,

[TABLE]

Moreover, conditional on $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}$ and almost surely, $\sqrt{n}(\widehat{\theta}^{*}-\widehat{\theta})$ converges in distribution to the same limit.

In practice, $\mathbb{D}_{0}$ often corresponds to the set of functions that are continuous everywhere or at a certain point $y_{0}$ . This is the case for instance with $g:F_{Y}\mapsto F_{Y}^{-1}(\tau)$ for $\tau\in(0,1)$ . In such cases, one can show that $\mathbb{G}\in\mathbb{D}_{0}$ under the same condition as for i.i.d. data, namely that $F_{Y}$ is continuous everywhere or at the point $F_{Y}^{-1}(\tau)$ .

3 Extensions

We now consider several extensions to our main results. First, we study the asymptotic behavior of the properly normalized empirical process in degenerate cases where $K(f,f)=0$ . Second, we establish additional results on the bootstrap. Third, we study separately, rather than jointly, exchangeable arrays. Other extensions to arrays with multiple observations per $k$ -tuple and arrays where $Y_{\bm{i}}$ is defined even if there are identical indices in $\bm{i}$ are considered in the supplement. We also develop therein a test that the data are in fact i.i.d.

3.1 Degenerate cases

We consider here situations where $K(f,f)=0$ for all $f\in\mathcal{F}$ , focusing for simplicity on $k=2$ . [Footnote 7: If $K(f,f)=0$ for only some $f\in\mathcal{F}$ , we focus on $\mathcal{F}^{\prime}=\{f\in\mathcal{F}:K(f,f)=0\}$ .] Such a degeneracy appears for instance if the variables in the array are actually i.i.d., in which case $\sqrt{n}\mathbb{G}_{n}$ converges to a Gaussian process with covariance kernel $K(f_{1},f_{2})=\mathbb{C}ov(f_{1}(Y_{1,2}),f_{2}(Y_{1,2}))$ . As another example (see Menzel, 2019; Bretagnolle, 1983), suppose that $Y_{i_{1},i_{2}}=X_{i_{1}}X_{i_{2}}$ , with $(X_{i})_{i\in\mathbb{N}^{+}}$ i.i.d. variables with $\mathbb{E}(X_{1})=0$ , $\mathbb{V}(X_{1})=1$ . Let also $\mathcal{F}=\{f_{\lambda}(x)=\lambda x,\lambda\in I\}$ for a compact $I\subset\mathbb{R}$ . Then one can easily see that $\sqrt{n}\mathbb{G}_{n}$ converges weakly in $\ell^{\infty}(\mathcal{F})$ to $\mathbb{G}(f_{\lambda})=\lambda(Z^{2}-1)$ , with $Z$ a standard normal variable.
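The product example can be checked by hand. Since $Pf_{\lambda}=0$ and $\sum_{i_{1}\neq i_{2}}X_{i_{1}}X_{i_{2}}=(\sum_{i}X_{i})^{2}-\sum_{i}X_{i}^{2}$ , we get $\sqrt{n}\mathbb{G}_{n}f_{\lambda}=n\,\mathbb{P}_{n}f_{\lambda}=\lambda\frac{n}{n-1}\left[(n^{-1/2}\sum_{i}X_{i})^{2}-n^{-1}\sum_{i}X_{i}^{2}\right]$ , where the first term inside the brackets converges in distribution to $Z^{2}$ and the second tends to 1. The sketch below (function names are hypothetical) compares this closed form with a brute-force average over pairs.

```python
import numpy as np

def scaled_deviation(X, lam=1.0):
    """n * (P_n f_lam - P f_lam) for Y_{i1,i2} = X_{i1} * X_{i2}, using P f_lam = 0."""
    n = X.size
    s, q = X.sum(), (X ** 2).sum()
    # The sum over i1 != i2 of X_{i1} X_{i2} equals (sum X)^2 - sum X^2.
    return lam * (s ** 2 - q) / (n - 1)

def scaled_deviation_bruteforce(X, lam=1.0):
    """Same quantity, computed directly from the n x n array of products."""
    n = X.size
    Y = np.outer(X, X)
    off_diag = ~np.eye(n, dtype=bool)
    return n * lam * Y[off_diag].mean()

rng = np.random.default_rng(1)
X = rng.standard_normal(200)
closed_form = scaled_deviation(X, lam=2.0)
brute_force = scaled_deviation_bruteforce(X, lam=2.0)
```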

More generally and as with U-processes (see, e.g. Arcones and Giné, 1993), when $K(f,f)=0$ , the rate of convergence of $\mathbb{P}_{n}f-Pf$ is $n^{-1}$ rather than $n^{-1/2}$ and the asymptotic distribution may not be normal. For any $(i_{1},i_{2})\in\mathbb{I}_{2}$ , let $Y_{i_{1},i_{2}}=\tau(U_{i_{1}},U_{i_{2}},U_{\{i_{1},i_{2}\}})$ be the Aldous-Hoover-Kallenberg representation where, without loss of generality, the variables in $\tau(\cdot,\cdot,\cdot)$ are assumed to be uniform on $[0,1]$ . Let $\psi_{m}(u)=\left(1+\mathds{1}_{\{m\geq 2\}}\right)^{1/2}\cos\left(m\pi u\right)$ for $m$ even and $\psi_{m}(u)=\sqrt{2}\sin((m+1)\pi u)$ for $m$ odd. Then $(\psi_{m})_{m\in\mathbb{N}}$ forms an orthonormal basis of $L^{2}[0,1]$ . For all $\bm{m}\in\mathbb{N}^{3}$ and any $f\in\mathcal{F}$ , we define $\mu_{\bm{m}}(f)$ by

[TABLE]
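The orthonormality of the basis $(\psi_{m})_{m\in\mathbb{N}}$ used above can be verified numerically. The sketch below evaluates the first few basis functions on a midpoint grid of $[0,1]$ and computes their Gram matrix; the truncation level `M` and grid size `N` are arbitrary choices made for illustration.

```python
import numpy as np

def psi(m, u):
    """Basis function psi_m on [0, 1]: cosines for even m, sines for odd m."""
    if m % 2 == 0:
        scale = np.sqrt(2.0) if m >= 2 else 1.0
        return scale * np.cos(m * np.pi * u)
    return np.sqrt(2.0) * np.sin((m + 1) * np.pi * u)

M = 8                              # number of basis functions checked
N = 4096                           # number of midpoint-rule nodes
u = (np.arange(N) + 0.5) / N       # midpoints of a uniform partition of [0, 1]
basis = np.stack([psi(m, u) for m in range(M)])
gram = basis @ basis.T / N         # approximates the integrals of psi_m * psi_m'
```

The matrix `gram` should be numerically indistinguishable from the identity, since the midpoint rule integrates these low-frequency trigonometric products essentially exactly.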

Let $(Z_{m})_{m\in\mathbb{N}^{+}}$ , $(Z_{m_{1},m_{2}})_{(m_{1},m_{2})\in\mathbb{N}\times\mathbb{N}^{+}}$ and $(Z_{\{m_{1},m_{2}\},m_{3}})_{(m_{1},m_{2},m_{3})\in\mathbb{N}^{2}\times\mathbb{N}^{+}:m_{1}<m_{2}}$ denote independent standard normal variables. We then define the process $\mathbb{G}^{d}$ on $\mathcal{F}$ by

[TABLE]

To prove the convergence of $\sqrt{n}\mathbb{G}_{n}$ , we consider a condition on $\mathcal{F}$ that slightly differs from Assumption 4-(i).

Assumption 5.

The class $\mathcal{F}$ admits an envelope $F$ with $PF^{2}<\infty$ and

[TABLE]

Assumption 5 is more stringent than Assumption 4-(i). A similar condition was also imposed by Arcones and Giné (1993) for degenerate U-processes of order 1, see their condition (5.1).

Theorem 3.1.

Suppose that $k=2$ , Assumptions 1-2 and 5 hold and $K(f,f)=0$ for all $f\in\mathcal{F}$ . Then $\sqrt{n}\mathbb{G}_{n}$ converges weakly in $\ell^{\infty}(\mathcal{F})$ to $\mathbb{G}^{d}$ .

As with degenerate U-processes (see Section 5 of Arcones and Giné, 1993), the limit process is a Gaussian chaos process. The result is based in particular on a symmetrization lemma and a maximal inequality tailored to these degenerate cases. Specifically, the symmetrized process only includes Rademacher variables at the pair $\{i_{1},i_{2}\}$ level, or products $\varepsilon_{i_{1}}^{(1)}\varepsilon_{i_{1}}^{(2)}$ of Rademacher variables. We refer to Lemmas SLABEL:lem:sym3 and SLABEL:lem:max_ineq_deg_case in the supplement for more details.

Finally, we note that the bootstrap process considered above does not generally converge to $\mathbb{G}^{d}$ . [Footnote 8: The same holds true for the multiplier bootstrap process considered below.] With i.i.d. data, for instance, one can show that the variance of the bootstrapped mean converges to $3\mathbb{V}(Y_{i_{1},i_{2}})$ . We expect similar phenomena as with U-statistics, where the bootstrap is known to fail in degenerate cases (Arcones and Giné, 1992; Arcones and Giné, 1994). In the closely related case of separately exchangeable arrays (see Section 3.3 below), Menzel (2019) shows that a suitable wild bootstrap is consistent for the sample average, whether or not we have degeneracy. Whether such a result generalizes to the empirical process is left for future research.

3.2 Further results on the bootstrap

Theorem 2.2 shows convergence of the bootstrap process under conditions on $\mathcal{F}$ that ensure the convergence of the initial process $\mathbb{G}_{n}$ . The following result shows that under moment conditions, convergence of $\mathbb{G}_{n}$ is actually necessary for the convergence of $\mathbb{G}_{n}^{*}$ to a Gaussian process.

Theorem 3.2.

Suppose that Assumptions 1-2 hold, $Pf^{2}<\infty$ for all $f\in\mathcal{F}$ and $\mathcal{F}$ admits an envelope $F$ such that $PF^{1+\delta}<\infty$ for some $\delta>0$ . Then, if conditional on $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}$ and outer almost surely, the process $\mathbb{G}^{\ast}_{n}$ converges weakly in $\ell^{\infty}(\mathcal{F})$ to $\mathbb{G}$ , a centered Gaussian process, the process $\mathbb{G}_{n}$ also converges weakly in $\ell^{\infty}(\mathcal{F})$ to $\mathbb{G}$ .

Theorem 3.2 may be seen as a partial extension to jointly exchangeable arrays of Theorem 2.4 in Giné and Zinn (1990), which, with i.i.d. data, establishes the equivalence between the convergence of the bootstrap process and $PF^{2}<\infty$ together with convergence of the initial process.

With i.i.d. data, several other bootstrap schemes than the multinomial bootstrap are possible: see, e.g., Barbe and Bertail (1995) for an extensive review. The situation is probably no different with jointly exchangeable arrays. To illustrate this, we consider a version of the multiplier bootstrap adapted to such data (see, e.g., Kosorok, 2003, for the case of i.i.d. data). Specifically, let $(\xi_{i})_{i=1}^{n}$ be a sequence of i.i.d. random variables that are centered, have unit variance and are independent from the original data $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{n,2}}.$ We then consider the following process:

[TABLE]

The next theorem shows the conditional weak convergence of $\mathbb{G}_{n}^{m*}$ under the same conditions on $\mathcal{F}$ as previously.

Theorem 3.3.

Suppose that Assumptions 1-2 and 4-(i) hold and $(\xi_{i})_{i=1}^{n}$ is i.i.d. with $\mathbb{E}(\xi_{1})=0$ , $\mathbb{V}(\xi_{1})=1$ . Then, conditional on $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}$ and outer almost surely, the process $\mathbb{G}^{m\ast}_{n}$ converges weakly in $\ell^{\infty}(\mathcal{F})$ to $\mathbb{G}$ .

3.3 Separately exchangeable arrays

Up to now, we have considered cases where the $n$ units that interact stem from the same population. In some cases, however, they do not, because the $k$ populations differ. For instance, we may be interested only in relationships between men and women. In that case, the symmetry condition in Assumption 1 has to be strengthened: both the labelling of men and the labelling of women should be irrelevant. This corresponds to so-called separately exchangeable arrays, defined formally in Assumption 6 below. Another important motivation for considering separately exchangeable arrays is multiway clustering, namely dependence arising through different dimensions of clustering. For instance, wages of workers may be affected by local shocks or sector-of-activity shocks. In such cases, we observe $Y_{i_{1},i_{2}}$ , the wage of a worker in geographical area $i_{1}$ and sector of activity $i_{2}$ . [Footnote 9: Oftentimes, we actually have several observations per cell, and the number varies from one cell to another. This extension is discussed in Section LABEL:sub:heterog of the supplement.]

More generally, we consider in this section random variables $Y_{\bm{i}}$ where $\bm{i}=(i_{1},...,i_{k})\in\mathbb{N}^{+k}$ , implying that repetitions (e.g. $\bm{i}=(1,...,1)$ ) are allowed. We impose the following condition on these random variables.

Assumption 6.

For any $(\pi_{1},...,\pi_{k})\in\mathfrak{S}(\mathbb{N}^{+})^{k}$ ,

[TABLE]

Moreover, for any $A,B$ , disjoint subsets of $\mathbb{N}^{+}$ , $(Y_{\bm{i}})_{\bm{i}\in A^{k}}$ is independent of $(Y_{\bm{i}})_{\bm{i}\in B^{k}}$ .

This condition is stronger than Assumption 1 since it implies in particular equality in distribution for $\pi_{1}=...=\pi_{k}$ .

Let us redefine $\bm{1}$ here as $(1,...,1)$ and let $\bm{n}=(n_{1},...,n_{k})$ , where $n_{j}\geq 1$ denotes the number of units observed in population $j$ (or cluster $j$ with multiway clustering). Note that in general, $n_{j}\neq n_{j^{\prime}}$ for $j\neq j^{\prime}$ . The sample at hand is then $(Y_{\bm{i}})_{\bm{1}\leq\bm{i}\leq\bm{n}}$ , where $\bm{i}\geq\bm{i}^{\prime}$ means that $i_{j}\geq i^{\prime}_{j}$ for all $j=1,...,k$ . Let $\underline{n}=\min(n_{1},...,n_{k})$ . The empirical measure and empirical process that we consider for separately exchangeable arrays are:

[TABLE]

We also consider the “pigeonhole bootstrap”, suggested by McCullagh (2000) and studied, in the case of the sample mean and for particular models, by Owen (2007). This bootstrap scheme is very close to the one we considered in Section 2 for jointly exchangeable arrays, except that the weights are now independent from one coordinate to another:

1. For each $j\in\{1,...,k\}$ , $n_{j}$ elements are sampled with replacement and equal probability in the set $\{1,...,n_{j}\}$ . For each $i_{j}$ in this set, let $W^{j}_{i_{j}}$ denote the number of times $i_{j}$ is selected this way.

2. The $k$ -tuple $\bm{i}=(i_{1},...,i_{k})$ is then selected $W_{\bm{i}}=\prod_{j=1}^{k}W^{j}_{i_{j}}$ times in the bootstrap sample.

The bootstrap process $\mathbb{G}_{\bm{n}}^{\ast}$ is thus defined on $\mathcal{F}$ by

[TABLE]
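The pigeonhole scheme for separately exchangeable arrays can be sketched as follows. The function names are hypothetical, and dividing by $\prod_{j}n_{j}$ is an assumption of this illustration. The only change relative to the jointly exchangeable case is that one independent weight vector is drawn per dimension.

```python
import numpy as np

def separate_weights(dims, rng):
    """Step 1: for each dimension j, draw n_j indices with replacement and count them."""
    return [np.bincount(rng.integers(0, n_j, size=n_j), minlength=n_j) for n_j in dims]

def bootstrap_mean_separate(fY, weights):
    """Step 2: cell (i_1, ..., i_k) receives weight W^1[i_1] * ... * W^k[i_k].

    fY has shape (n_1, ..., n_k) and holds f(Y_i) for every cell.
    Dividing by n_1 * ... * n_k is an assumption of this sketch.
    """
    out = fY.astype(float)
    for W_j in weights:
        # Contract the current leading axis against its weight vector.
        out = np.tensordot(W_j, out, axes=(0, 0))
    return float(out) / fY.size

rng = np.random.default_rng(4)
dims = (12, 20)                      # e.g. 12 areas and 20 sectors
area = rng.normal(size=(dims[0], 1))
sector = rng.normal(size=(1, dims[1]))
fY = area + sector + rng.normal(size=dims)   # toy two-way clustered array
weights = separate_weights(dims, rng)
boot_value = bootstrap_mean_separate(fY, weights)
```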

Henceforth, we consider the convergence of $\mathbb{P}_{\bm{n}}$ , $\mathbb{G}_{\bm{n}}$ and $\mathbb{G}^{*}_{\bm{n}}$ as $\underline{n}$ tends to infinity. More precisely, as with multisample U-statistics (see, e.g. van der Vaart, 2000, Section 12.2), we assume that there is an index $m\in\mathbb{N}^{+}$ , left implicit hereafter, and increasing functions $g_{1},...,g_{k}$ such that for all $j$ , $n_{j}=g_{j}(m)\rightarrow\infty$ as $m\rightarrow\infty$ (we also assume without loss of generality that for all $m\in\mathbb{N}^{+}$ , $g_{j}(m+1)>g_{j}(m)$ for some $j$ ). The following theorem extends Theorems 2.1 and 2.2 to this set-up.

Theorem 3.4.

Suppose that Assumptions 2 and 6 hold and that for every $j=1,...,k$ , there exists $\lambda_{j}\geq 0$ such that $\underline{n}/n_{j}\rightarrow\lambda_{j}$ . Then:

1. If Assumption 3 holds, $\sup_{f\in\mathcal{F}}\left|\mathbb{P}_{\bm{n}}f-Pf\right|$ tends to 0 a.s. and in $L^{1}$ .

2. If Assumption 4-(i) holds, the process $\mathbb{G}_{\bm{n}}$ converges weakly in $\ell^{\infty}(\mathcal{F})$ to a centered Gaussian process $\mathbb{G}_{\lambda}$ on $\mathcal{F}$ as $\underline{n}$ tends to infinity. Moreover, the covariance kernel $K_{\lambda}$ of $\mathbb{G}_{\lambda}$ satisfies:

[TABLE]

where $\bm{2}_{j}$ is the $k$ -tuple with 2 in each entry but 1 in entry $j$ .

3. If Assumption 4-(i) holds, the process $\mathbb{G}^{\ast}_{\bm{n}}$ converges weakly to $\mathbb{G}_{\lambda}$ , conditional on $(Y_{\bm{i}})_{\bm{i}\in\mathbb{N}^{+k}}$ and outer almost surely.

Theorem 3.4 includes the case where $\lambda_{j}=0$ for some $j$ , corresponding to “strongly unbalanced” designs with different rates of convergence to $\infty$ along the different dimensions of the array. In that case, only the dimensions with the slowest rate of convergence contribute to the asymptotic distribution, as can be seen in (3.1).

Because the $(n_{j})_{j=1...k}$ are not all equal in general, Theorem 3.4 does not follow directly from Theorem 2.1, even if Assumption 6 is stronger than Assumption 1. We prove the result by showing a simpler and convenient version of the symmetrization lemma in this setting. We refer to Lemma SLABEL:lem:sym2 in the supplement for more details.

4 Applications to international trade

Finally, we illustrate the importance of accounting for dependence in real dyadic data, through two applications to international trade data.

4.1 Evolution of international trade

There is considerable interest in economics in the evolution of international trade. But before analyzing the causes and consequences of such an evolution, one must check that there are indeed some significant changes. In this first application, we test whether the distribution of exports remains the same between two consecutive years, using Comtrade data on all countries from 2012 to 2018. We use for that purpose the Kolmogorov-Smirnov (KS) test statistic

[TABLE]

where $T_{i_{1},i_{2},t}$ denotes the trade volume from country $i_{1}$ to country $i_{2}$ in year $t$ . Let us assume that Assumption 1 holds, with $Y_{\bm{i}}=(T_{\bm{i},t},T_{\bm{i},t+1})$ . Then, under the null hypothesis that the distributions of $T_{\bm{i},t}$ and $T_{\bm{i},t+1}$ are equal, we have, by Theorem 2.1, $\sqrt{n}KS_{t}\stackrel{{\scriptstyle d}}{{\longrightarrow}}\|\mathbb{G}\|_{\mathcal{F}}$ , with $\mathcal{F}=\{f_{u}(x,y)=\mathds{1}_{\{x\leq u\}}-\mathds{1}_{\{y\leq u\}}\}$ . Given the dependence structure both between pairs of countries and across time, the distribution of $\|\mathbb{G}\|_{\mathcal{F}}$ depends on the true data generating process. To estimate it, we rely on the recentered bootstrapped test statistic:

[TABLE]

We compute the p-value of the test by $\mathbb{P}\left(KS^{*}_{t}>KS_{t}\big{|}(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{n,k}}\right)$ . For the sake of comparison, we also compute p-values based on alternative forms of dependence that have been considered in applied work on similar data. Specifically, we first assume that the variables $(Y_{\bm{i}})_{\bm{i}}$ are i.i.d. We then assume pairwise clustering, where $Y_{i_{1},i_{2}}$ and $Y_{i_{2},i_{1}}$ may be dependent, but $Y_{\bm{i}}$ and $Y_{\bm{j}}$ are independent if $\bm{j}$ is not a permutation of $\bm{i}$ . We also consider one-way clustering according to $i_{1}$ (and, similarly, according to $i_{2}$ ). In this case, $Y_{i_{1},i_{2}}$ and $Y_{i_{1},i_{3}}$ may be dependent, but $Y_{i_{1},i_{2}}$ and $Y_{i^{\prime}_{1},i_{3}}$ are independent as soon as $i_{1}\neq i^{\prime}_{1}$ , whether or not $i_{2}=i_{3}$ . For each of these cases, we use the bootstrap, but with different bootstrap schemes accounting for these different dependence structures.
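A bootstrap p-value of this kind can be sketched under joint exchangeability as follows. This is an illustration under stated assumptions, not the authors' implementation. It takes $KS_{t}$ to be the largest absolute gap between the two empirical cdfs over pairs with distinct indices, and it recenters the bootstrapped gap at the original gap. All function names and the toy data are hypothetical.

```python
import numpy as np

def weighted_cdf(values, weights, grid):
    """Total weight of the entries of `values` that are <= each grid point."""
    order = np.argsort(values)
    cum = np.concatenate(([0.0], np.cumsum(weights[order])))
    return cum[np.searchsorted(values[order], grid, side="right")]

def cdf_gap(T0, T1, W):
    """Weighted cdf of T0 minus weighted cdf of T1 on the pooled grid, pair weights W[i1]*W[i2]."""
    n = T0.shape[0]
    off = ~np.eye(n, dtype=bool)            # keep pairs with distinct indices only
    x, y = T0[off], T1[off]
    w = np.outer(W, W)[off].astype(float)
    grid = np.sort(np.concatenate((x, y)))
    return (weighted_cdf(x, w, grid) - weighted_cdf(y, w, grid)) / (n * (n - 1))

def ks_bootstrap_pvalue(T0, T1, n_boot, rng):
    """Return (KS, p-value), with the bootstrapped gap recentered at the original gap."""
    n = T0.shape[0]
    gap = cdf_gap(T0, T1, np.ones(n))
    ks = np.abs(gap).max()
    ks_star = np.empty(n_boot)
    for b in range(n_boot):
        W = np.bincount(rng.integers(0, n, size=n), minlength=n)
        ks_star[b] = np.abs(cdf_gap(T0, T1, W) - gap).max()
    return ks, float(np.mean(ks_star > ks))

rng = np.random.default_rng(2)
n = 25
a = rng.normal(size=n)
T0 = a[:, None] + a[None, :] + rng.normal(size=(n, n))   # toy flows, year t
T1 = T0 + 0.1 * rng.normal(size=(n, n))                  # toy flows, year t + 1
ks, pval = ks_bootstrap_pvalue(T0, T1, n_boot=200, rng=rng)
```

The i.i.d., pairwise and one-way clustering variants would replace the draw of `W` by a resampling scheme matching the corresponding dependence structure.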

The results are displayed in Table 1. They suggest significant changes in export volumes in some years but not all. In particular, international trade seems very stable between 2015 and 2017. There is some evidence of changes between 2012 and 2015 but we still do not reject the null hypothesis at the 1% level for the years 2013-2014. The other columns of the table show the importance of accounting for dependence along both dimensions. In particular, assuming i.i.d. data or pairwise dependence always leads to a strong rejection of the null, except for 2015-2016. [Footnote 10: A concern is that if the data are actually i.i.d. (or, more generally, pairwise dependent), our bootstrap is conservative, which would explain the discrepancy between the p-values under pairwise dependence and non-degenerate joint exchangeability. Using the methodology in Section LABEL:sub:test_of_i_i_d_data of the supplement, we test for pairwise dependence. For the eight years we consider, the null hypothesis is rejected at all standard levels, with p-values always smaller than $10^{-4}$ .] Clustering along exporters also leads to artificially small p-values, in particular for the pairs 2013-2014, 2014-2015 and 2016-2017. In this context, clustering along importers leads to results that are closer to those based on dyadic data.

4.2 Estimation of a gravity equation

Second, we revisit Santos Silva and Tenreyro (2006), who estimate the so-called gravity equation for international trade. Omitting the year index, this gravity equation states that $T_{i_{1},i_{2}}$ satisfies

[TABLE]

where $G_{i}$ denotes country $i$ ’s GDP, which would correspond to the mass of $i$ in a traditional gravity equation, $D_{i_{1},i_{2}}$ denotes the distance between $i_{1}$ and $i_{2}$ , $A_{i_{1},i_{2}}$ are additional control variables and $\eta_{i_{1},i_{2}}$ is an unobserved term.

To estimate $\theta_{0}=(\alpha_{0},...,\alpha_{3},\beta^{\prime})^{\prime}$ , Santos Silva and Tenreyro (2006) suggest using the Poisson pseudo maximum likelihood (PPML for short) estimator $\widehat{\theta}$ . The idea, formalized in Gourieroux et al. (1984), is that with i.i.d. data, the PPML estimator is consistent and asymptotically normal for $\theta_{0}$ even if $T_{\bm{i}}$ does not follow a Poisson model, provided that $\mathbb{E}\left[\eta_{\bm{i}}|X_{\bm{i}}\right]=1$ , with $X_{\bm{i}}=(1,\ln(G_{i_{1}}),\ln(G_{i_{2}}),\ln(D_{\bm{i}}),A_{\bm{i}})$ . This is because the PPML estimator is based on the empirical counterpart of

[TABLE]

and this equality holds true if $\mathbb{E}\left[\eta_{\bm{i}}|X_{\bm{i}}\right]=1$ .
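As an illustration, the PPML estimator can be computed by solving the sample moment condition $\sum_{\bm{i}}w_{\bm{i}}X_{\bm{i}}\left(T_{\bm{i}}-\exp(X_{\bm{i}}\theta)\right)=0$ with Newton's method. The sketch below assumes this standard form of the Poisson first-order condition. The function name `ppml`, the optional weights `w` (which could hold bootstrap weights $W_{\bm{i}}$ ) and the toy data are hypothetical.

```python
import numpy as np

def ppml(X, T, w=None, max_iter=100, tol=1e-10):
    """Solve sum_i w_i X_i (T_i - exp(X_i' theta)) = 0 by Newton's method.

    X: (m, p) regressors with one row per pair i, T: (m,) outcomes,
    w: optional (m,) nonnegative weights.
    """
    m, p = X.shape
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    theta = np.zeros(p)
    for _ in range(max_iter):
        mu = np.exp(X @ theta)
        score = X.T @ (w * (T - mu))                 # weighted moment condition
        hessian = X.T @ (X * (w * mu)[:, None])      # minus its derivative in theta
        step = np.linalg.solve(hessian, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

rng = np.random.default_rng(3)
X = np.column_stack((np.ones(500), rng.normal(size=(500, 2))))
theta_true = np.array([0.2, 0.5, -0.3])
T = np.exp(X @ theta_true)    # noiseless outcomes: the moment condition holds exactly at theta_true
theta_hat = ppml(X, T)
```

Plain Newton steps are used here for brevity; a production implementation would typically add step-size control.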

Now, assuming as in Santos Silva and Tenreyro (2006) that the variables $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{2}}$ (with $Y_{\bm{i}}=(T_{\bm{i}},X_{\bm{i}})$ ) are i.i.d. is restrictive. We suppose instead that Assumption 1 holds. Then Theorem 2.3 applies to this setting, implying that $\widehat{\theta}$ is still consistent and asymptotically normal in this case. [Footnote 11: In this case, $\mathcal{H}=\{1,...,\text{dim}(X_{\bm{i}})\}$ and $\psi_{\theta,h}(Y_{\bm{i}})=X_{h,\bm{i}}(T_{\bm{i}}-\exp(X_{\bm{i}}\theta))$ . Then the key conditions 2 and 3 in Theorem 2.3 are satisfied as soon as $\Theta$ is bounded, see e.g. Example 19.7 in van der Vaart (2000).] Nonetheless, the rates of convergence and asymptotic variance are different in the two cases, resulting in different inference on $\theta_{0}$ . [Footnote 12: The same application has been considered by Graham (2019), who shows, assuming convergence of a certain sample average, the asymptotic normality of the PPML estimator under the same dependence structure as ours. On the other hand, he neither considers bootstrap-based inference nor proves the consistency of his (asymptotic) variance estimator.]

We use the same dataset as Santos Silva and Tenreyro (2006), which covers 136 countries for year 1990, and consider the exact same specification as the one they use in their Table 3. In this specification, the additional control variables $A_{\bm{i}}$ include exporter- and importer-level variables, namely their GDP per capita, a dummy variable equal to one if countries are landlocked and a remoteness index, which is the log of GDP-weighted average distance to all other countries. It also includes variables at the pair level, namely dummy variables for contiguity, common language, colonial tie, free-trade agreement and openness. This openness dummy is equal to one if at least one country is part of a preferential trade agreement. We refer to Santos Silva and Tenreyro (2006) for additional details.

Table 2 below presents the results. The first column displays the point estimates, which, as expected, are identical to those in Santos Silva and Tenreyro (2006). The other columns display the p-values for the null hypothesis that $\theta_{0j}$ , the $j$ -th component of $\theta_{0}$ , is equal to 0. We consider the same forms of dependence as with the KS test above. Under joint exchangeability, we compute the p-value $p_{j}$ for $\theta_{0j}=0$ using $p_{j}=\mathbb{P}\left(|\widehat{\theta}_{j}^{*}-\widehat{\theta}_{j}|>|\widehat{\theta}_{j}|\big{|}(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{n,k}}\right)$ . For other forms of dependence, we follow the usual practice of computing the p-values using the asymptotic normality of $\widehat{\theta}_{j}$ and estimators of the asymptotic variance under these various dependence structures.
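Given bootstrap draws of the $j$ -th coefficient, the p-value $p_{j}$ defined above can be approximated by a simple frequency. The helper below is a sketch with a hypothetical name; it takes the point estimate and an array of bootstrap estimates as inputs.

```python
import numpy as np

def bootstrap_pvalue(theta_hat_j, theta_star_j):
    """Share of bootstrap draws with |theta*_j - theta_hat_j| > |theta_hat_j|."""
    theta_star_j = np.asarray(theta_star_j, dtype=float)
    return float(np.mean(np.abs(theta_star_j - theta_hat_j) > np.abs(theta_hat_j)))

# Toy draws: deviations from the estimate are 0.5, 0.5, 1.5 and 1.5,
# and two of the four exceed |theta_hat_j| = 1.
p_j = bootstrap_pvalue(1.0, [0.5, 1.5, 2.5, -0.5])
```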

Using our bootstrap leads to much larger p-values than under the i.i.d. assumption. Only the log of distance and the log of GDP of the exporter and the importer appear to be significant at the $10^{-3}$ level, whereas five additional control variables are significant at that level under the i.i.d. assumption. In particular, common language and importer’s remoteness are not even significant at the usual 5% level. [Footnote 13: As in Footnote 10 above, we test for pairwise dependence, to see whether our results could be driven by the fact that our bootstrap is conservative in such cases. We obtain a p-value smaller than $10^{-4}$ and thus reject this hypothesis at all usual levels.] Interestingly, there is also a gap between assuming one-way clustering, either at the exporter or at the importer level, and assuming a jointly exchangeable and dissociated array. In the former case, we still have seven variables that are significant at the $10^{-3}$ level. Confidence intervals, not displayed here, lead to similar conclusions. In particular, compared to the average length of i.i.d.-based 95% confidence intervals, those based on pairwise clustering are only 8% wider. Those based on one-way clustering on exporters (resp. importers) are 20% (resp. 17%) wider. On the other hand, those based on Assumption 1 are 136% wider.

5 Conclusion

While polyadic data are increasingly used in applied work, and empirical researchers routinely account for multiway clustering when computing standard errors, the statistical theory behind these forms of dependence has lagged behind. Following Bickel and Chen (2009) and Menzel (2019), we link these dependence structures to jointly and separately exchangeable arrays. Using representation results for such arrays, we then prove uniform laws of large numbers and central limit theorems. These results imply consistency and asymptotic normality of various nonlinear estimators under such dependence. We also establish the general validity of natural extensions of the standard nonparametric bootstrap to such arrays. Our application shows that using those bootstrap schemes may make a large difference compared to assuming i.i.d. data or clustering along a single dimension, as has often been done.

One caveat is that for the bootstrap confidence intervals to be valid, the asymptotic variance of the estimator should be positive. This may not be the case, for instance if the data $(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{k}}$ are actually i.i.d. Inference based on the wild bootstrap without this positivity condition has been studied for sample averages under multiway clustering by Menzel (2019). How to conduct inference on nonlinear estimators under joint exchangeability or multiway clustering without this positivity condition remains an avenue for future research.

Appendix A Key lemmas

We first state the symmetrization lemma. Let $(\varepsilon_{A})_{A\subset\mathbb{N}^{+}}$ denote independent Rademacher variables, independent of $\left(Y_{\bm{i}}\right)_{\bm{i}\in\mathbb{I}_{k}}$ . Then:

Lemma A.1.

Suppose that Assumptions 1-2 hold and $P|f|<\infty$ for all $f\in\mathcal{F}$ . Then there exist real numbers $C_{1,k},...,C_{k,k}$ depending only on $k$ and $(Y_{\bm{i}}^{1})_{\bm{i}\in\mathbb{I}_{k}}$ ,…, $(Y_{\bm{i}}^{k})_{\bm{i}\in\mathbb{I}_{k}}$ , jointly exchangeable and dissociated arrays with $Y_{\bm{1}}^{j}\overset{d}{=}Y_{\bm{1}}$ for all $j\in\{1,...,k\}$ , satisfying

[TABLE]

Though more complicated than its i.i.d. version (see e.g. Lemma 2.3.1 in van der Vaart and Wellner, 1996), it serves the exact same purpose in the proofs of Theorems 2.1-2.2: conditional on the $\left(Y_{\bm{i}}^{r}\right)_{\bm{i}\in\mathbb{I}_{k}}$ , the process $f\mapsto\sum_{\bm{i}\in\mathbb{I}_{n,k}}\varepsilon_{\{\bm{i}\odot\bm{e}^{\prime}\}^{+}}f\left(Y_{\bm{i}}^{r}\right)$ is sub-Gaussian. In view of the AHK representation, the terms $\varepsilon_{\{\bm{i}\odot\bm{e}^{\prime}\}^{+}}$ could be expected. Given the aforementioned link with U-statistics, Lemma A.1 can also be seen as a generalization of the symmetrization lemma for U-processes for non-degenerate cases, see in particular Theorem 3.5.3 in de la Peña and Giné (1999).

The proof of Lemma A.1 crucially hinges upon the following decoupling inequality, which may be of independent interest. Hereafter, we let $\mathcal{A}_{r}=\{A\subseteq\{1,...,n\}:|A|=r\}$.
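To make the index set $\mathcal{A}_{r}$ concrete, the following Python sketch enumerates it for small $n$. The helper name `subsets_of_size` is ours and purely illustrative.

```python
from itertools import combinations


def subsets_of_size(n: int, r: int) -> list[frozenset[int]]:
    """Enumerate A_r = {A subset of {1,...,n} : |A| = r} (hypothetical helper)."""
    return [frozenset(c) for c in combinations(range(1, n + 1), r)]


# All 2-element subsets of {1, 2, 3, 4}.
for A in subsets_of_size(4, 2):
    print(sorted(A))
```

The family $\left(W_{A}\right)_{A\in\mathcal{A}_{r}}$ of Lemma A.2 thus contains one variable per subset listed.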

Lemma A.2.

Let $r\leq k$ , $\left(W_{A}\right)_{A\in\mathcal{A}_{r}}$ be a family of i.i.d. random variables with values in a Polish space $\mathcal{S}$ and $\left(W^{(j)}_{A}\right)_{A\in\mathcal{A}_{r}}$ , $j=1,...,|\mathcal{E}_{r}|$ be some independent copies of this family. Let $\Phi$ be a non-decreasing convex function from $\mathbb{R}^{+}$ to $\mathbb{R}$ and $\ell$ be a bijection from $\mathcal{E}_{r}$ to $\{1,...,|\mathcal{E}_{r}|\}$ . Let $\mathcal{H}$ be a pointwise measurable class of functions from $\mathcal{S}^{|\mathcal{E}_{r}|}\times\mathbb{I}_{n,k}$ to $\mathbb{R}$ such that $\mathbb{E}\left(\sup_{h\in\mathcal{H}}\left|h\left(\left(W_{\{\bm{i}\odot\bm{e}\}^{+}}\right)_{\bm{e}\in\mathcal{E}_{r}},\bm{i}\right)\right|\right)<\infty$ . Finally, let $L_{r}=\left(3|\mathcal{E}_{r}|^{|\mathcal{E}_{r}|}\right)^{|\mathcal{E}_{r}|-1}$ . Then

[TABLE]
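The objects entering Lemma A.2 can be illustrated numerically. The sketch below rests on two assumptions about notation that this appendix does not restate: that $\mathcal{E}_{r}$ is the set of vectors in $\{0,1\}^{k}$ with exactly $r$ ones, so that $|\mathcal{E}_{r}|=\binom{k}{r}$, and that $\{\bm{i}\odot\bm{e}\}^{+}$ is the set of positive entries of the componentwise product of $\bm{i}$ and $\bm{e}$. All function names are hypothetical.

```python
from itertools import combinations
from math import comb


def binary_vectors(k: int, r: int) -> list[tuple[int, ...]]:
    """E_r under the assumed definition: vectors in {0,1}^k with exactly r ones."""
    return [tuple(1 if j in ones else 0 for j in range(k))
            for ones in combinations(range(k), r)]


def index_set(i: tuple[int, ...], e: tuple[int, ...]) -> frozenset[int]:
    """Assumed meaning of {i ⊙ e}^+: positive entries of the componentwise product."""
    return frozenset(a * b for a, b in zip(i, e) if a * b > 0)


def decoupling_constant(k: int, r: int) -> int:
    """L_r = (3 |E_r|^{|E_r|})^{|E_r| - 1}, with |E_r| = C(k, r) assumed."""
    m = comb(k, r)
    return (3 * m ** m) ** (m - 1)


i = (5, 2, 7)  # a k-tuple of distinct positive indices, here k = 3
for e in binary_vectors(3, 2):
    print(e, sorted(index_set(i, e)))
print(decoupling_constant(3, 2))
```

Under these assumptions, $L_{r}$ grows very quickly with $|\mathcal{E}_{r}|$, which suggests that the inequality is meant as a theoretical device rather than as a source of sharp numerical bounds.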

The proof is given in the supplement. This result generalizes the decoupling inequality for $U$-statistics of de la Peña (1992) to our setting. As with $U$-statistics, it is possible to obtain a reverse inequality if $r\in\{1,k-1,k\}$ and $\pi\mapsto h\left(\left(W_{\{\bm{i}_{\pi}\odot\bm{e}\}^{+}}\right)_{\bm{e}\in\mathcal{E}_{r}},\bm{i}_{\pi}\right)$ is constant on $\mathfrak{S}_{k}$, for all $h\in\mathcal{H}$. With such a reverse inequality, it is possible to replace $Y_{\bm{i}}^{r}$ by $Y_{\bm{i}}$ in Lemma A.1. It is unclear to us, however, whether this reverse inequality still holds if $r\not\in\{1,k-1,k\}$ (which can only occur when $k\geq 4$). The key argument for the reverse inequality in de la Peña (1992) is that by the symmetry condition above, we can replace $h\left(\left(W_{\{\bm{i}_{\pi}\odot\bm{e}\}^{+}}\right)_{\bm{e}\in\mathcal{E}_{r}},\bm{i}_{\pi}\right)$ by an average over $k!$ terms. However, for the proof to extend to our setting, one would need an average over $|\mathcal{E}_{r}|!$ terms. This is not possible in general when $|\mathcal{E}_{r}|>k$, which is the case when $r\not\in\{1,k-1,k\}$.
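The counting claim at the end of this paragraph is easy to check numerically. The sketch below assumes, as the condition suggests, that $|\mathcal{E}_{r}|=\binom{k}{r}$; the function name is hypothetical.

```python
from math import comb


def ranks_exceeding_k(k: int) -> list[int]:
    """Values of r in {1,...,k} with C(k, r) > k, taking |E_r| = C(k, r) as an assumption."""
    return [r for r in range(1, k + 1) if comb(k, r) > k]


# For each k, list the r for which |E_r| > k under the assumed definition.
for k in range(2, 7):
    print(k, ranks_exceeding_k(k))
```

For $k\in\{2,3\}$ the list is empty, and for $k\geq 4$ it consists of $r=2,...,k-2$, in line with the statement that the problematic case requires $k\geq 4$.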

Next, in order to prove the convergence of the empirical process under the bracketing entropy condition (Assumption 4-(ii)), we establish the following maximal inequality, which is very close to that of Giné and Nickl (2015) for i.i.d. data (see their Lemma 3.5.12).

Lemma A.3.

Suppose that Assumption 1 holds. Let $(f_{j})_{1\leq j\leq N}$ be real-valued functions and $\mathcal{F}=\{x\mapsto ef_{j}(x),e\in\{-1,1\},j=1,...,N\}$ . Then:

[TABLE]
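One elementary consequence of this choice of $\mathcal{F}$, independent of the inequality itself, is that a supremum over $\mathcal{F}$ reduces to a maximum of absolute values over the $N$ original functions. For any map $G$ acting linearly on functions (the symbol $G$ is ours, for instance $Gf=\sum_{\bm{i}\in\mathbb{I}_{n,k}}f(Y_{\bm{i}})$):

$$\sup_{f\in\mathcal{F}}Gf=\max_{1\leq j\leq N}\max\left\{Gf_{j},-Gf_{j}\right\}=\max_{1\leq j\leq N}\left|Gf_{j}\right|.$$

This is presumably why the class is taken to be closed under sign changes: a bound on the supremum without absolute values then controls the maximum of the $|Gf_{j}|$.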

Bibliography

Aldous, D. J. (1981), ‘Representations for partially exchangeable arrays of random variables’, Journal of Multivariate Analysis 11(4), 581–598.
Arcones, M. A. and Giné, E. (1992), ‘On the bootstrap of $U$ and $V$ statistics’, Annals of Statistics 20(2), 655–674.
Arcones, M. A. and Giné, E. (1994), ‘U-processes indexed by Vapnik-Červonenkis classes of functions with applications to asymptotics and bootstrap of U-statistics with estimated parameters’, Stochastic Processes and their Applications 52(1), 17–38.
Arcones, M. and Giné, E. (1993), ‘Limit theorems for U-processes’, The Annals of Probability 21(3), 1494–1542.
Barbe, P. and Bertail, P. (1995), The weighted bootstrap, Vol. 98, Springer-Verlag New York.
Bertail, P., Chautru, E. and Clémençon, S. (2017), ‘Empirical processes in survey sampling with (conditional) Poisson designs’, Scandinavian Journal of Statistics 44(1), 97–111.
Bertrand, M., Duflo, E. and Mullainathan, S. (2004), ‘How much should we trust differences-in-differences estimates?’, The Quarterly Journal of Economics 119(1), 249–275.
