Empirical Process Results for Exchangeable Arrays
Laurent Davezies, Xavier D'Haultfoeuille, Yannick Guyonvarch

TL;DR
This paper establishes uniform laws of large numbers and central limit theorems for exchangeable arrays, which model dependence in dyadic data and multiway clustering, extending classical results to dependent data structures.
Contribution
It provides the first uniform laws of large numbers and CLTs for exchangeable arrays, including bootstrap convergence, under conditions similar to i.i.d. data.
Findings
Proves uniform laws of large numbers for exchangeable arrays.
Establishes central limit theorems for exchangeable arrays.
Demonstrates bootstrap convergence for dependent array data.
Abstract
Exchangeable arrays are natural tools to model common forms of dependence between units of a sample. Jointly exchangeable arrays are well suited to dyadic data, where observed random variables are indexed by two units from the same population. Examples include trade flows between countries or relationships in a network. Separately exchangeable arrays are well suited to multiway clustering, where units sharing the same cluster (e.g. geographical areas or sectors of activity when considering individual wages) may be dependent in an unrestricted way. We prove uniform laws of large numbers and central limit theorems for such exchangeable arrays. We obtain these results under the same moment restrictions and conditions on the class of functions as those typically assumed with i.i.d. data. We also show the convergence of bootstrap processes adapted to such arrays.
| Pairs of | KS test | p-values under different assumptions | ||||
| years | statistic | i.i.d. | P.W. cl. | E. cl. | I. cl. | dyadic |
| 2012-2013 | 0.048 | |||||
| 2013-2014 | 0.018 | 0.026 | 0.038 | |||
| 2014-2015 | 0.022 | 0.005 | 0.007 | |||
| 2015-2016 | 0.002 | 0.44 | 0.391 | 0.377 | 0.951 | 0.998 |
| 2016-2017 | 0.012 | 0.215 | 0.254 | |||
| 2017-2018 | 0.045 | |||||
| Notes: data from the Comtrade database. “cl.”, “E”, “I” and “P.W.” stand for clustering, exporter, importer and pairwise, respectively. The p-values were obtained with 1,000 bootstrap samples. | ||||||
| p-values under different assumptions | ||||||
| Variable | Estimator | i.i.d | P.W. cl. | E. cl. | I. cl. | dyadic |
| Log(E’s GDP) | 0.732 | |||||
| Log(I’s GDP) | 0.741 | |||||
| Log(E’s PCGDP) | 0.157 | 0.003 | 0.04 | 0.001 | 0.078 | |
| Log(I’s PCGDP) | 0.135 | 0.003 | 0.004 | 0.055 | 0.076 | |
| Log of distance | -0.784 | |||||
| Contiguity | 0.193 | 0.064 | 0.16 | 0.112 | 0.077 | 0.461 |
| Common-language | 0.746 | 0.056 | ||||
| Colonial-tie | 0.025 | 0.867 | 0.902 | 0.891 | 0.882 | 0.952 |
| Landlocked E | -0.863 | 0.004 | ||||
| Landlocked I | -0.696 | 0.011 | ||||
| E’s remoteness | 0.66 | 0.036 | ||||
| I’s remoteness | 0.562 | 0.003 | 0.004 | 0.105 | ||
| P-T agreement | 0.181 | 0.041 | 0.117 | 0.054 | 0.122 | 0.456 |
| Openness | -0.107 | 0.416 | 0.522 | 0.498 | 0.453 | 0.771 |
| Notes: data from Santos Silva and Tenreyro (2006), same specification as in their Table 3. “cl.”, “E”, “I”, “PCGDP”, “P-T”, “P.W.” stand for clustering, exporter, importer, per capita GDP, preferential-trade and pairwise, respectively. The p-values for the last column were obtained with 1,000 bootstrap samples. | ||||||
Acknowledgments: We are grateful to anonymous referees and an associate editor for their thoughtful comments that improved the paper. We would also like to thank Stéphane Bonhomme, Bryan Graham, Isabelle Méjean, Pedro Sant’Anna and participants at various seminars and conferences for their remarks.
Laurent Davezies, CREST-ENSAE
Xavier D’Haultfœuille, CREST-ENSAE
Yannick Guyonvarch, CREST-ENSAE
Keywords: exchangeable arrays, empirical processes, bootstrap.
1 Introduction
Taking into account dependence between observations is crucial for making correct inference. For instance, different observations may face common shocks, tending to correlate them positively and thus leading to overly optimistic inference when ignored (Bertrand et al., 2004). Such common shocks may arise if the data are polyadic (e.g., dyadic), namely they involve interactions between several units of a given population. An example is international trade, where each observation corresponds to a pair of countries, one exporting and the other importing. We can then expect that two such pairs may be dependent whenever they share at least one country, because of that country’s specificities in terms of international trade. Common shocks may also correspond to aggregate fluctuations that affect all units sharing some characteristics. For instance, wages of two individuals may be correlated either because they live in the same geographical area, or because they work in the same sector. We refer to multiway clustering when there are several dimensions along which units may be correlated.
Holland and Leinhardt (1976) and Fafchamps and Gubert (2007) derived variance formulas for linear regressions with dyadic data, while Cameron et al. (2011) proposed similar formulas for multiway clustering. The Stata command ivreg2 and the R package multiwaycov are now used routinely to report standard errors accounting for multiway clustering. However, theory has lagged behind this practice. Tabord-Meehan (2019) shows the asymptotic validity of inference based on Holland and Leinhardt’s suggestion for dyadic data, but for OLS estimators only. Graham (2019) and Graham et al. (2019) study, respectively, parametric regressions and density estimation with dyadic data. Regarding multiway clustering, the only papers we are aware of are the recent works of Menzel (2019) and MacKinnon et al. (2019). Again, they focus on linear parameters. (Footnote 1: On the other hand, and interestingly, Menzel (2019) studies inference both with and without asymptotic normality. He also shows that refinements in asymptotic approximations are possible using the wild bootstrap.)
In this paper, we establish uniform laws of large numbers (LLN) and central limit theorems (CLT) for such types of data. Uniform LLNs and CLTs are key in showing consistency and asymptotic normality of nonlinear estimators under weak regularity conditions. As such, they have been studied extensively with i.i.d. but also dependent data. We refer to, e.g., van der Vaart and Wellner (1996) and Giné and Nickl (2015) for overviews with i.i.d. data, and Dehling and Philipp (2002) for the case of time series (see also, e.g., Bertail et al., 2017; Han and Wellner, 2019, for recent results on sampling designs). Notably, we obtain these uniform LLNs and CLTs under the same moment restrictions and conditions on the class of functions as those usually considered with i.i.d. data. Thus, statistical results deduced from the uniform LLNs and CLTs with i.i.d. data directly extend to the exchangeable arrays we consider. As a proof of concept, we consider Z-estimators and smooth functionals of the empirical cumulative distribution function (cdf).
We also study consistency of a direct generalization of the standard bootstrap for i.i.d. data to polyadic data. A related bootstrap scheme for multiway clustering is the so-called pigeonhole bootstrap, suggested by McCullagh (2000) and studied by Owen (2007), but for which no uniform result has been established so far. For both, we establish weak convergence of the corresponding process. These results imply the validity of the corresponding bootstrap schemes in a wide range of settings, including Z-estimators and smooth functionals of the empirical cdf.
To prove these results, we first argue that polyadic data correspond to dissociated, jointly exchangeable arrays. Similarly, multiway clustering corresponds to dissociated separately exchangeable arrays. We then rely extensively on the so-called Aldous-Hoover-Kallenberg representation (Hoover, 1979; Aldous, 1981; Kallenberg, 1989) for such arrays. This representation allows us in particular to prove a symmetrization lemma, which is very useful to derive the uniform LLNs and CLTs. This lemma generalizes a similar result for i.i.d. data, but also for U-processes (see, e.g. de la Peña and Giné, 1999, Theorem 3.5.3). Note that simple LLNs and CLTs have been already proved, or are direct consequences of known results on dissociated, jointly exchangeable arrays. For LLNs, we refer to Eagleson and Weber (1978) and Lemma 7.35 in Kallenberg (2005). For CLTs, see Silverman (1976). But to our knowledge, no abstract uniform LLNs and CLTs have been proved so far for such arrays.
Finally, we illustrate our results with two applications to international trade. In the first, we test whether international trade remains stable from one year to another, using a Kolmogorov-Smirnov test. Given the dependence structure over pairs of countries and through time, the asymptotic distribution of the test under the null is complicated, making the bootstrap attractive. We show that neglecting the dependence between dyads leads to substantial over-rejection of the null hypothesis. Next, we estimate the so-called gravity equation, a very popular model for explaining trade between countries. Since Santos Silva and Tenreyro (2006), this equation has often been estimated with Poisson pseudo maximum likelihood, an estimator to which our results apply. Again, far fewer explanatory variables are significant at usual levels when accounting for dependence between pairs of countries than when considering such pairs to be i.i.d. observations (as in Santos Silva and Tenreyro, 2006).
The paper is organized as follows. Section 2 describes the set-up and gives our main results for jointly exchangeable arrays. In addition to uniform LLNs and CLTs, we prove weak convergence of our bootstrap scheme. We also show results for Z-estimators and smooth functionals of the empirical cdf. Section 3 considers a few extensions. In particular, we study separately exchangeable arrays. An important difference for such arrays is that the multiple dimensions, corresponding to different sources of clustering, may not grow at the same rate. We show that our results still hold in this case. We also study “degenerate” cases (in the same sense as with U-processes) and consider another bootstrap scheme. The two applications to international trade are developed in Section 4. The appendix presents three key lemmas. In the supplementary material, we present additional extensions. In particular, we generalize our main results to cases where the number of observations for each $k$-tuple (e.g., the number of matches between two sports players) varies. We also display Monte Carlo simulations and all the proofs of our results.
2 The set up and main results
2.1 Set up
Before formally defining our data generating process, we introduce some notation. For any and for some , we let and
[TABLE]
We then let denote the set of -tuples of without repetition. Similarly, for any , we let . For any and in , we let . With a slight abuse of notation, we also let, for any , denote the set of distinct elements of . For any , we let
[TABLE]
Finally, for any , we let denote the set of permutations on . For any and , we let .
We are interested in polyadic data, that is to say random variables (whose support is denoted by ) indexed by . Dyadic data, which are the most common case, correspond to . For instance, when considering trade data, corresponds to export flows from country to country . In network data, could be a dummy for whether there is a link from to . In directed networks, , while in undirected networks. Similarly, could capture whether forms a triad or not (see, e.g. Wasserman and Faust, 1994, for a motivation on triad counts). could also correspond to data subject to multiway clustering. Then ,…, are the indexes corresponding to the different dimensions of clustering, for instance geographical areas and sectors of activity. In such cases, however, adaptations of our set-up are needed, and we postpone this discussion to Section 3.3 below.
We assume that the random variables are generated according to a jointly exchangeable and dissociated array, defined formally as follows:
Assumption 1.
For any , . Moreover, for any disjoint subsets of with , is independent of .
The first part imposes that the labelling conveys no information: the joint distribution of the data remains identical under any possible permutation of the labels. The second part states that the array is dissociated: the variables are independent if they share no unit in common. For instance, must be independent of if . On the other hand, Assumption 1 does not impose independence otherwise. This is important in many applications. In the international trade example, and are likely to be dependent because if is open to international trade, it tends to export more than the average to any other country. It may also import more from other countries, meaning that and could also be dependent.
Lemma 2.1 below is very helpful to better understand the dependence structure imposed by joint exchangeability and dissociation. It may be seen as an extension of de Finetti’s theorem to arrays satisfying such restrictions. It is also key in establishing our asymptotic results below.
Lemma 2.1.
Assumption 1 holds if and only if there exist i.i.d. variables and a measurable function such that, almost surely (footnote 2: in this formula, the appear according to a precise ordering, which we nonetheless leave implicit as it bears no importance hereafter),
[TABLE]
This result is due to Kallenberg (1989) but a weaker version, where the equality only holds in distribution, is known as the Aldous-Hoover representation (Aldous, 1981; Hoover, 1979). Accordingly, we refer to (2.1) as the AHK representation hereafter. To illustrate it, let us consider dyadic data ($k=2$). Then, according to Lemma 2.1, we have, for every pair $(i_1,i_2)$ of distinct indices,
$$Y_{i_1,i_2}=\tau\left(U_{i_1},U_{i_2},U_{\{i_1,i_2\}}\right). \qquad (2.2)$$
Thus, in the example of trade flows, the volume of exports from to depends on factors specific to and , such as their own GDP, but also on factors relating both, such as the distance between the two countries. (2.2) has been also used by Bickel and Chen (2009) and Bickel et al. (2011) to model network formation (in which case if there is a link between and , 0 otherwise). Note also the link between (2.2) and U-statistics: would correspond to such a statistic if did not depend on its third argument.
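The dependence structure implied by the dyadic representation can be checked numerically. The following is a minimal Python sketch, not the authors' code: the additive link function `tau`, the Gaussian shocks and the helper names `simulate_dyadic` and `correlation` are all illustrative assumptions. Only the structure (two unit-specific inputs and one pair-specific input) reflects the representation. Across replications, two cells that share a unit should be correlated, while two cells with no unit in common should not.

```python
import random


def simulate_dyadic(n, seed):
    """Draw one array Y[(i, j)] = tau(U_i, U_j, U_{i,j}) over ordered pairs i != j."""
    rng = random.Random(seed)
    unit = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unit-specific shocks U_i
    # One shock per unordered pair {i, j}, shared by cells (i, j) and (j, i).
    pair = {frozenset((i, j)): rng.gauss(0.0, 1.0)
            for i in range(n) for j in range(i + 1, n)}

    def tau(u_i, u_j, u_ij):
        # Hypothetical link function, chosen only for illustration.
        return u_i + 0.5 * u_j + u_ij

    return {(i, j): tau(unit[i], unit[j], pair[frozenset((i, j))])
            for i in range(n) for j in range(n) if i != j}


def correlation(xs, ys):
    """Sample correlation coefficient of two equally long lists."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Replicate the array many times and compare cells across replications.
draws = [simulate_dyadic(4, seed) for seed in range(4000)]
y01 = [d[(0, 1)] for d in draws]
y02 = [d[(0, 2)] for d in draws]  # shares unit 0 with cell (0, 1)
y23 = [d[(2, 3)] for d in draws]  # shares no unit with cell (0, 1)

corr_shared = correlation(y01, y02)
corr_disjoint = correlation(y01, y23)
print(round(corr_shared, 2), round(corr_disjoint, 2))
```

With this particular `tau`, the theoretical correlation between cells sharing their first unit is 1/2.25, and it is zero for disjoint cells, in line with dissociation.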
Under Assumption 1, the have a common marginal probability distribution, which we denote by . We are interested in estimating and making inference on features of this distribution, such as its expectation or a quantile, based on observing the first units only, namely the sample , with .
2.2 Uniform laws of large numbers and central limit theorems
Let denote a class of real-valued functions admitting a first moment with respect to the distribution and let denote the corresponding moment (with the tuple ). To avoid measurability issues and the use of outer expectations subsequently, we maintain the following assumption:
Assumption 2.
There exists a countable subclass such that elements of are pointwise limits of sequences of elements of .
Assumption 2 is not necessary but often imposed (see, e.g. Chernozhukov et al., 2014; Kato, 2019). We refer to Kosorok (2006, pp.137-140) for further discussion.
In this section, we study the empirical measure and the empirical process defined on by
[TABLE]
[TABLE]
Let denote the set of bounded functions on . We prove below that under restrictions on , converges almost surely to uniformly over , while converges weakly in to a Gaussian process. We refer to, e.g., van der Vaart and Wellner (1996) for a formal definition of weak convergence of empirical processes. These results, stronger than pointwise convergence of and , are key in establishing the consistency and asymptotic normality of, e.g., smooth functionals of the empirical cdf or Z- and M-estimators. We consider briefly applications in Section 2.4 below, and refer to Part 3 of van der Vaart and Wellner (1996) for a more comprehensive review of statistical applications of empirical process results.
We use the rate to normalize , though we have different random variables. In general, we cannot expect a better rate of convergence. To see this, let be i.i.d. random variables and let . Then satisfies Assumption 1, and boils down to an average over i.i.d. terms only. In some cases, however, for instance if the are i.i.d., the convergence rate is faster than . (Footnote 3: As with U-statistics, we expect different rates depending on the degree of “degeneracy”.) Theorem 2.1 below remains valid in such cases, but the limit Gaussian process is then degenerate. We come back in more detail to such cases in Section 3.1 below.
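The argument on the rate can be illustrated with a short Python sketch for the dyadic case. It assumes, for illustration, that each cell $(i,j)$ equals a variable $X_i$ attached to its first index only. The helper `pair_average` and the Gaussian draws are hypothetical choices. The average over the $n(n-1)$ ordered pairs then coincides with the plain average of the $n$ variables, so it cannot converge faster than an i.i.d. mean over $n$ terms.

```python
import random


def pair_average(cell, n):
    """Average of cell(i, j) over the n(n-1) ordered pairs with i != j."""
    total = sum(cell(i, j) for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


rng = random.Random(1)
n = 50
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Assumed example: the cell (i, j) only carries the variable of its first index.
pn = pair_average(lambda i, j: x[i], n)
unit_mean = sum(x) / n

# Both numbers agree up to rounding: n(n-1) cells, but only n independent terms.
print(pn, unit_mean)
```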
Let us now introduce the restrictions on that we use to obtain uniform laws. We require additional notation for that purpose. For any and any seminorm on a space containing , denotes the minimal number of -closed balls of radius with centers in needed to cover . denotes the minimal number of -brackets needed to cover , where an -bracket for is a pair of functions such that and . The seminorms we consider hereafter are for any and probability measure or cdf . Hereafter, an envelope of is a measurable function satisfying . Finally, we let denote the set of probability measures with finite support on .
Assumption 3.
The class either:
(i) admits an envelope with and ,
[TABLE]
(ii) or satisfies for all .
Assumption 4.
The class either:
(i) admits an envelope with and
[TABLE]
(ii) or satisfies .
Assumptions 3 and 4 are exactly the same as the conditions often imposed with i.i.d. data to show uniform LLNs and CLTs (see, e.g., Theorems 19.4, 19.5, 19.13 and 19.14 in van der Vaart, 2000). (Footnote 4: In van der Vaart (2000), the supremum in Assumptions 3 and 4 is taken over the set of probability measures with finite support on and such that . This additional restriction is simply due to a different convention in constructing covering numbers, as van der Vaart considers open balls while we use closed balls, following, e.g., Kato (2019).) In particular, Assumption 4-(i) (resp. (ii)) imposes a condition on what is usually referred to as the uniform (resp. bracketing) entropy integral, see, e.g., van der Vaart and Wellner (1996). Finiteness of the uniform entropy integral is satisfied by any VC-type class of functions (see Chernozhukov et al., 2014, for a definition), or by the convex hull of such classes under some restrictions. The bracketing entropy integral is finite for instance for classes of monotone or Hölder continuous functions (see, e.g. van der Vaart and Wellner, 1996).
The following theorem establishes uniform LLNs and CLTs under these two conditions. We denote by the tuple .
Theorem 2.1.
Suppose that Assumptions 1-2 hold. Then:
1. If Assumption 3 holds, tends to 0 a.s. and in .
2. If Assumption 4 holds, the process converges weakly in to a centered Gaussian process on as tends to infinity. Moreover, the covariance kernel of satisfies:
[TABLE]
The proof is in the supplement. When Assumption 3-(ii) holds, Part 1 can be proved by essentially combining Theorem 3 in Eagleson and Weber (1978) and Lemma 7.35 in Kallenberg (2005). Part 2 was also proved for a finite by Silverman (1976). But the weak convergence result under the bracketing entropy condition, and the uniform laws under the uniform entropy conditions, do not follow from such results. To prove the former, we adapt a maximal inequality in Giné and Nickl (2015, see their Lemma 3.5.12) to our context. To this end, we show that Hoeffding’s bound on U-statistics (Hoeffding, 1963, Section 5.a) still applies to our context.
To prove the results under the uniform entropy conditions, the key ingredient, as with i.i.d. data, is a symmetrization lemma stated in Appendix A below and proved in the supplement. Its proof relies extensively on Lemma 2.1 and a decoupling inequality that may be of independent interest (see Lemma A.2). The latter result generalizes a similar inequality for U-processes (see de la Peña, 1992). In the proofs of both lemmas, we follow similar strategies as with U-processes, with two complications. First, even with , does not only depend on and , but also on . Second, when , dependence between observations arises not only because of single-unit terms such as or , but also because of multiple-unit terms such as .
As in the i.i.d. case, Assumption 3 is actually stronger than necessary to obtain the uniform law of large numbers. The following proposition gives an exact characterization, where, for simplicity, we restrict to . It is similar to the characterization for i.i.d. data (see, e.g. Theorem 3.7.4 in Giné and Nickl, 2015) or for U-processes (see Theorem 5.2.2 in de la Peña and Giné, 1999). Let us introduce the following norms:
[TABLE]
Proposition 2.1.
Suppose that Assumptions 1-2 hold and admits an envelope with . Then if and only if both and tend to $0$ in outer probability. (Footnote 5: For a definition of convergence in outer probability or outer almost-sure convergence considered below, see e.g. Chapter 1.9 in van der Vaart and Wellner (1996).)
Proposition 2.1 emphasizes the two aspects of dissociated, exchangeable arrays. The first is i.i.d. variations, through the random entropy term related to , which only involves . The second is U-statistic like variations, through the random entropy term related to : up to negligible terms, only depends on . Key in establishing the necessity of these two conditions is a weak converse of the symmetrization lemma for , see the supplement.
2.3 Convergence of the bootstrap process
We now study the properties of the following bootstrap sampling scheme, which extends the pigeonhole bootstrap (McCullagh, 2000; Owen, 2007) to jointly exchangeable arrays:
1. $n$ units are sampled independently in $\{1,\dots,n\}$ with replacement and equal probability. $W_i$ denotes the number of times unit $i$ is sampled.
2. the $k$-tuple $\boldsymbol{i}=(i_1,\dots,i_k)$ is then selected $\prod_{j=1}^{k} W_{i_j}$ times in the bootstrap sample.
Then we consider and , defined on by
[TABLE]
[TABLE]
Asymptotic validity of the bootstrap amounts to showing that conditional on the data , converges weakly to the process defined in Theorem 2.1. (Footnote 6: For the sake of brevity, we focus afterwards on convergence results under the sole uniform entropy condition (Assumption 4-(i)).) As discussed in, e.g., van der Vaart and Wellner (1996, Chapter 3.6), the outer almost-sure conditional weak convergence boils down to proving
[TABLE]
where is the set of bounded and Lipschitz functions from to and “” denotes outer almost-sure convergence.
Theorem 2.2.
If Assumptions 1-2 and 4-(i) hold, the process converges weakly in to , conditional on and outer almost surely.
This theorem ensures the asymptotic validity of the bootstrap above not only for sample means, but also for smooth functionals of the empirical cdf and nonlinear estimators, as we shall see below. The proof of Theorem 2.2, in the supplement, follows the same lines as that of Theorem 2.1, though some of the corresponding steps are more involved, as often with the bootstrap. In particular, to prove pointwise convergence, we use arguments in Lindeberg’s proof of the CLT for triangular arrays, Theorem 2.1.1 and Urysohn’s subsequence principle, combined with Prohorov’s theorem.
Note that in contrast with the standard bootstrap for i.i.d. data,
[TABLE]
However, the difference between and , the empirical measure with weights , becomes negligible as . Accordingly, we also show in the proof of Theorem 2.2 the almost-sure conditional convergence of , in addition to that of .
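The sampling scheme of this section can be sketched in a few lines of Python for dyadic data. This is a hedged illustration rather than the authors' implementation: the helpers `bootstrap_weights` and `bootstrap_pair_mean`, the toy data and the normalisation by $n(n-1)$ are assumptions. Units are resampled with replacement, and the cell $(i,j)$ is counted as many times as the product of the draw counts of $i$ and $j$.

```python
import random
from collections import Counter


def bootstrap_weights(n, rng):
    """Draw n units with replacement; return how many times each unit is drawn."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return [counts[i] for i in range(n)]  # a Counter returns 0 for missing keys


def bootstrap_pair_mean(y, w):
    """Weighted average over ordered pairs i != j, cell (i, j) counted w[i]*w[j] times.

    The normalisation by n(n-1) is an assumption of this sketch.
    """
    n = len(w)
    total = sum(w[i] * w[j] * y[i][j]
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


rng = random.Random(0)
n = 30
# Toy dyadic data (hypothetical): unit effects plus pair noise; the diagonal is unused.
a = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [[a[i] + a[j] + rng.gauss(0.0, 1.0) for j in range(n)] for i in range(n)]

boot_means = []
for _ in range(500):
    w = bootstrap_weights(n, rng)
    boot_means.append(bootstrap_pair_mean(y, w))

center = sum(boot_means) / len(boot_means)
boot_se = (sum((m - center) ** 2 for m in boot_means) / (len(boot_means) - 1)) ** 0.5
print(boot_se)
```

Under multinomial resampling, the product of two distinct draw counts has expectation $1-1/n$ rather than 1, so a mean normalised in this way is not exactly centered at the sample pair average. The gap vanishes as $n$ grows.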
2.4 Application to nonlinear estimators
Theorem 2.1 ensures consistency and asymptotic normality of a large class of estimators. In turn, Theorem 2.2 shows that using the bootstrap for such estimators is asymptotically valid. To illustrate these points, we consider here two popular classes of estimators, namely Z-estimators and smooth functionals of the empirical cdf. Similar results could be obtained for, e.g., M-estimators (see, e.g. Cheng and Huang, 2010) or generalized method of moments estimators (see, e.g. Hansen, 1982).
Let us first consider Z-estimators. Let denote a normed space, endowed with the norm and let denote a class of real, measurable functions. Let , and . We let, for any real function on , . The parameter of interest , which satisfies , is estimated by . We also define as the bootstrap counterpart of . The following theorem extends Theorem 13.4 in Kosorok (2006) to jointly exchangeable and dissociated arrays. For related results on Z-estimators in the i.i.d. case, see Section 3.2 in van der Vaart and Wellner (1996) and Wellner and Zhan (1996).
Theorem 2.3.
Suppose that Assumption 1 holds and:
1. implies for every in ;
2. The class satisfies Assumptions 2-3, with the envelope function satisfying ;
3. There exists such that the class satisfies Assumptions 2 and 4, with an envelope function satisfying ;
4. ;
5. and for every ;
6. is Fréchet-differentiable at , with continuously invertible derivative .
Then converges in distribution to a centered Gaussian process . Moreover, conditional on and almost surely, converges in distribution to .
Next, we consider smooth functionals of , the cdf of . Suppose that for some and , where is Hadamard differentiable (for a definition, see, e.g., van der Vaart and Wellner, 1996, Section 3.9.1). We estimate with , where denotes the empirical cdf of . Finally, we let denote the bootstrap counterpart of .
Theorem 2.4.
Suppose that is Hadamard differentiable at tangentially to a set , with derivative equal to . Suppose also that Assumption 1 holds. Then:
1. converges weakly, as a process indexed by , to a Gaussian process with kernel satisfying
[TABLE]
2. If with probability one,
[TABLE]
Moreover, conditional on and almost surely, converges in distribution to the same limit.
In practice, often corresponds to the set of functions that are continuous everywhere or at a certain point . This is the case for instance with for . In such cases, one can show that under the same condition as for i.i.d. data, namely that is continuous everywhere or at the point .
3 Extensions
We now consider several extensions to our main results. First, we study the asymptotic behavior of the properly normalized empirical process in degenerate cases where . Second, we establish additional results on the bootstrap. Third, we study separately, rather than jointly, exchangeable arrays. Other extensions to arrays with multiple observations per $k$-tuple and arrays where is defined even if there are identical indices in are considered in the supplement. We also develop therein a test that the data are in fact i.i.d.
3.1 Degenerate cases
We consider here situations where for all , focusing for simplicity on . (Footnote 7: If for only some , we focus on .) Such a degeneracy appears for instance if the variables in the array are actually i.i.d., in which case converges to a Gaussian process with covariance kernel . As another example (see Menzel, 2019; Bretagnolle, 1983), suppose that , with i.i.d. variables with , . Let also for a compact . Then one can easily see that converges weakly in to , with a standard normal variable.
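A non-Gaussian limit of this kind can be seen in a small simulation. The Python sketch below assumes, for illustration, a product array with cells $X_i X_j$ built from centered, unit-variance i.i.d. variables, and looks at $n$ times the pair average of the cells. The helper names and the Gaussian choice for the $X_i$ are hypothetical. The identity $\sum_{i\neq j} x_i x_j = (\sum_i x_i)^2 - \sum_i x_i^2$ shows that this statistic behaves like $Z^2-1$ with $Z$ standard normal, rather than like a normal variable.

```python
import random


def scaled_pair_mean(x):
    """n times the average of x[i] * x[j] over ordered pairs i != j (brute force)."""
    n = len(x)
    total = sum(x[i] * x[j] for i in range(n) for j in range(n) if i != j)
    return n * total / (n * (n - 1))


def scaled_pair_mean_closed_form(x):
    """Same quantity via sum_{i != j} x_i x_j = (sum x)^2 - sum x^2."""
    n = len(x)
    s, s2 = sum(x), sum(v * v for v in x)
    return (s * s - s2) / (n - 1)


rng = random.Random(2)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
brute = scaled_pair_mean(x)
closed = scaled_pair_mean_closed_form(x)

# Monte Carlo draws of the scaled statistic; their mean should be near 0,
# the mean of Z^2 - 1 for a standard normal Z.
draws = []
for _ in range(2000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    draws.append(scaled_pair_mean_closed_form(sample))
mc_mean = sum(draws) / len(draws)
print(brute, closed, mc_mean)
```

The scaling by $n$, rather than by the square root of $n$, is what keeps the statistic from collapsing to zero in this degenerate example.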
More generally and as with U-processes (see, e.g. Arcones and Giné, 1993), when , the rate of convergence of is rather than and the asymptotic distribution may not be normal. For any , let be the Aldous-Hoover-Kallenberg representation where, without loss of generality, the variables in are assumed to be uniform on . Let for even and for odd. Then forms an orthonormal basis of . For all and any , we define by
[TABLE]
Let , and denote independent standard normal variables. We then define the process on by
[TABLE]
To prove the convergence of , we consider a condition on that slightly differs from Assumption 4-(i).
Assumption 5.
The class admits an envelope with and
[TABLE]
Assumption 5 is more stringent than Assumption 4-(i). A similar condition was also imposed by Arcones and Giné (1993) for degenerate U-processes of order 1, see their condition (5.1).
Theorem 3.1.
Suppose that , Assumptions 1-2 and 5 hold and for all . Then converges weakly in to .
As with degenerate U-processes (see Section 5 of Arcones and Giné, 1993), the limit process is a Gaussian chaos process. The result is based in particular on a symmetrization lemma and a maximal inequality tailored to these degenerate cases. Specifically, the symmetrized process only includes Rademacher variables at the pair level, or products of Rademacher variables. We refer to the corresponding lemmas in the supplement for more details.
Finally, we note that the bootstrap process considered above does not generally converge to . (Footnote 8: The same holds true for the multiplier bootstrap process considered below.) With i.i.d. data, for instance, one can show that the variance of the bootstrapped mean converges to . We expect similar phenomena as with U-statistics, where the bootstrap is known to fail in degenerate cases (Arcones and Giné, 1992; Arcones and Giné, 1994). In the closely related case of separately exchangeable arrays (see Section 3.3 below), Menzel (2019) shows that a suitable wild bootstrap is consistent for the sample average, whether or not we have degeneracy. Whether such a result generalizes to the empirical process is left for future research.
3.2 Further results on the bootstrap
Theorem 2.2 shows convergence of the bootstrap process under conditions on that ensure the convergence of the initial process . The following result shows that under moment conditions, convergence of is actually necessary for the convergence of to a Gaussian process.
Theorem 3.2.
Suppose that Assumptions 1-2 hold, for all and admits an envelope such that for some . Then, if conditional on and outer almost surely, the process converges weakly in to , a centered Gaussian process, the process also converges weakly in to .
Theorem 3.2 may be seen as a partial extension to jointly exchangeable arrays of Theorem 2.4 in Giné and Zinn (1990), which, with i.i.d. data, establishes the equivalence between the convergence of the bootstrap process and together with convergence of the initial process.
With i.i.d. data, several bootstrap schemes other than the multinomial bootstrap are possible: see, e.g., Barbe and Bertail (1995) for an extensive review. The situation is probably no different with jointly exchangeable arrays. To illustrate this, we consider a version of the multiplier bootstrap adapted to such data (see, e.g., Kosorok, 2003, for the case of i.i.d. data). Specifically, let be a sequence of i.i.d. random variables that are centered, have unit variance and are independent of the original data. We then consider the following process:
[TABLE]
The next theorem shows the conditional weak convergence of under the same conditions on as previously.
Theorem 3.3.
Suppose that Assumptions 1-2 and 4-(i) hold and is i.i.d. with , . Then, conditional on and outer almost surely, the process converges weakly in to .
3.3 Separately exchangeable arrays
Up to now, we have considered cases where the units that interact stem from the same population. In some cases, however, the interacting units come from different populations. For instance, we may be interested only in relationships between men and women. In that case, the symmetry condition in Assumption 1 has to be strengthened: both the labelling of men and the labelling of women should be irrelevant. This corresponds to so-called separately exchangeable arrays, defined formally in Assumption 6 below. Another important motivation for considering separately exchangeable arrays is multiway clustering, namely dependence arising through different dimensions of clustering. For instance, wages of workers may be affected by local shocks or sector-of-activity shocks. In such cases, we observe , the wage of a worker in geographical area and sector of activity . [Footnote 9: Oftentimes, we actually have several observations per cell, and the number varies from one cell to another. This extension is discussed in the supplement.]
More generally, we consider in this section random variables where , implying that repetitions (e.g. ) are allowed. We impose the following condition on these random variables.
Assumption 6.
For any ,
[TABLE]
Moreover, for any , disjoint subsets of , is independent of .
This condition is stronger than Assumption 1 since it implies in particular equality in distribution for .
Let us redefine here as and let , where denotes the number of units observed in population (or cluster with multiway clustering). Note that in general, for . The sample at hand is then , where means that for all . Let . The empirical measure and empirical process that we consider for separately exchangeable arrays are:
[TABLE]
We also consider the “pigeonhole bootstrap”, suggested by McCullagh (2000) and studied, in the case of the sample mean and for particular models, by Owen (2007). This bootstrap scheme is very close to the one we considered in Section 2 for jointly exchangeable arrays, except that the weights are now independent from one coordinate to another:
1. For each , elements are sampled with replacement and equal probability in the set . For each in this set, let denote the number of times is selected this way.
2. The -tuple is then selected times in the bootstrap sample.
The bootstrap process is thus defined on by
[TABLE]
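As an illustration only, the two resampling steps of the pigeonhole bootstrap can be sketched in Python as follows. The function names `pigeonhole_weights` and `bootstrap_mean`, the zero-based labels and the dictionary layout of the data are assumptions made for this sketch and do not come from the paper.

```python
import random
from collections import Counter
from itertools import product

def pigeonhole_weights(sizes, rng):
    """Return {cell: multiplicity} for one pigeonhole bootstrap draw.

    sizes[j] is the number of units observed along dimension j
    (labels are zero-based here, purely for convenience).
    """
    counts = []
    for size in sizes:
        # Step 1: draw `size` labels with replacement along this dimension
        # and record how many times each label was selected.
        draws = Counter(rng.randrange(size) for _ in range(size))
        counts.append([draws[label] for label in range(size)])
    weights = {}
    for cell in product(*(range(size) for size in sizes)):
        # Step 2: a cell is selected as many times as the product of the
        # counts of its coordinates; counts are independent across dimensions.
        mult = 1
        for axis, label in enumerate(cell):
            mult *= counts[axis][label]
        weights[cell] = mult
    return weights

def bootstrap_mean(y, sizes, rng):
    """Bootstrap analogue of the sample mean; y maps each cell to a float."""
    weights = pigeonhole_weights(sizes, rng)
    n_cells = 1
    for size in sizes:
        n_cells *= size
    # The multiplicities always sum to the number of cells, so dividing by
    # n_cells gives the mean over the bootstrap sample.
    return sum(weights[cell] * y[cell] for cell in weights) / n_cells
```

For a two-way array with 3 rows and 4 columns, `pigeonhole_weights((3, 4), random.Random(0))` returns 12 nonnegative integer multiplicities that sum to 12. Repeating the draw many times and recomputing the statistic of interest gives its bootstrap distribution.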
Henceforth, we consider the convergence of , and as tends to infinity. More precisely, as with multisample U-statistics (see, e.g. van der Vaart, 2000, Section 12.2), we assume that there is an index , left implicit hereafter, and increasing functions such that for all , as (we also assume without loss of generality that for all , for some ). The following theorem extends Theorems 2.1 and 2.2 to this set-up.
Theorem 3.4.
Suppose that Assumptions 2 and 6 hold and that for every , there exists such that . Then:
1. If Assumption 3 holds, tends to 0 a.s. and in .
2. If Assumption 4-(i) holds, the process converges weakly in to a centered Gaussian process on as tends to infinity. Moreover, the covariance kernel of satisfies:
[TABLE]
where is the -tuple with 2 in each entry but 1 in entry .
3. If Assumption 4-(i) holds, the process converges weakly to , conditional on and outer almost surely.
Theorem 3.4 includes the case where for some , corresponding to “strongly unbalanced” designs with different rates of convergence to along the different dimensions of the array. In that case, only the dimensions with the slowest rate of convergence contribute to the asymptotic distribution, as can be seen in (3.1).
Because the are not all equal in general, Theorem 3.4 does not follow directly from Theorem 2.1, even though Assumption 6 is stronger than Assumption 1. We prove the result by establishing a simpler and convenient version of the symmetrization lemma in this setting. We refer to the supplement for more details.
4 Applications to international trade
Finally, we illustrate the importance of accounting for dependence in real dyadic data, through two applications to international trade data.
4.1 Evolution of international trade
There is considerable interest in economics in the evolution of international trade. But before analyzing the causes and consequences of such an evolution, one must check that there are indeed significant changes. In this first application, we test whether the distribution of exports remains the same between two consecutive years, using Comtrade data on all countries from 2012 to 2018. We use for that purpose the Kolmogorov-Smirnov (KS) test statistic
[TABLE]
where denotes the trade volume from country to country in year . Let us assume that Assumption 1 holds, with . Then, under the null hypothesis that the distributions of and are equal, we have, by Theorem 2.1, , with . Given the dependence structure both between pairs of countries and across time, the distribution of depends on the true data generating process. To estimate it, we rely on the recentered bootstrapped test statistic:
[TABLE]
We compute the p-value of the test by $\mathbb{P}\left(KS^{*}_{t}>KS_{t}\,\big|\,(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{n,k}}\right)$. For the sake of comparison, we also compute p-values based on alternative forms of dependence that have been considered in applied work on similar data. Specifically, we first assume that the variables are i.i.d. We then assume pairwise clustering, where and may be dependent, but and are independent if is not a permutation of . We also consider one-way clustering according to (and, similarly, according to ). In this case, and may be dependent, but and are independent as soon as , whether or not . For each of these cases, we use the bootstrap, but with different bootstrap schemes accounting for these different dependence structures.
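The dyadic p-value described above can be approximated by simulation, as in the following minimal sketch. It rests on several assumptions that are not stated in this section: the bootstrap for jointly exchangeable arrays is taken to draw n countries with replacement and to give the pair (i, j) the product of the two countries' counts; the bootstrap measure is normalized by the number n(n-1) of ordered pairs; and the common scaling factor of the two statistics is omitted, since it does not affect the comparison. The function names are hypothetical.

```python
import random

def cdf_gap(y_prev, y_curr, weights, grid):
    """Weighted difference between the two empirical c.d.f.s on `grid`.

    y_prev and y_curr map each ordered pair (i, j), i != j, to a trade volume;
    weights maps the same pairs to nonnegative multiplicities.
    """
    n_pairs = len(weights)
    return [
        sum(w * ((y_curr[p] <= y) - (y_prev[p] <= y))
            for p, w in weights.items()) / n_pairs
        for y in grid
    ]

def ks_bootstrap_pvalue(y_prev, y_curr, n, n_boot, rng):
    """Return (unscaled KS statistic, bootstrap p-value)."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    # Both c.d.f.s only jump at observed values, so the supremum over the
    # real line is attained on this finite grid.
    grid = sorted(set(y_prev.values()) | set(y_curr.values()))
    gap = cdf_gap(y_prev, y_curr, {p: 1 for p in pairs}, grid)
    ks = max(abs(g) for g in gap)
    exceed = 0
    for _ in range(n_boot):
        # Assumed scheme: resample countries, then weight each pair by the
        # product of the counts of its two countries.
        counts = [0] * n
        for _draw in range(n):
            counts[rng.randrange(n)] += 1
        weights = {(i, j): counts[i] * counts[j] for (i, j) in pairs}
        gap_star = cdf_gap(y_prev, y_curr, weights, grid)
        # Recentering: compare the bootstrap gap with the observed gap.
        ks_star = max(abs(gs - g) for gs, g in zip(gap_star, gap))
        exceed += ks_star > ks
    return ks, exceed / n_boot
```

The p-values under the other dependence structures would follow the same pattern, with only the construction of `weights` changed to match the assumed resampling scheme.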
The results are displayed in Table 1. They suggest significant changes in export volumes in some years but not all. In particular, international trade seems very stable between 2015 and 2017. There is some evidence of changes between 2012 and 2015 but we still do not reject the null hypothesis at the 1% level for the years 2013-2014. The other columns of the table show the importance of accounting for dependence along both dimensions. In particular, assuming i.i.d. data or pairwise dependence always leads to a strong rejection of the null, except for 2015-2016. [Footnote 10: A concern is that if the data are actually i.i.d. (or, more generally, pairwise dependent), our bootstrap is conservative, which would explain the discrepancy between the p-values under pairwise dependence and non-degenerate joint exchangeability. Using the methodology described in the supplement, we test for pairwise dependence. For the eight years we consider, the null hypothesis is rejected at all standard levels, with p-values always smaller than .] Clustering along exporters also leads to artificially small p-values, in particular for the pairs 2013-2014, 2014-2015 and 2016-2017. In this context, clustering along importers leads to results that are closer to those based on dyadic data.
4.2 Estimation of a gravity equation
Second, we revisit Santos Silva and Tenreyro (2006), who estimate the so-called gravity equation for international trade. Omitting the year index, this gravity equation states that satisfies
[TABLE]
where denotes country ’s GDP, which would correspond to the mass of in a traditional gravity equation, denotes the distance between and , are additional control variables and is an unobserved term.
To estimate , Santos Silva and Tenreyro (2006) suggest using the Poisson pseudo maximum likelihood (PPML for short) estimator . The idea, formalized in Gourieroux et al. (1984), is that with i.i.d. data, the PPML estimator is consistent and asymptotically normal for even if does not follow a Poisson model, provided that , with . This is because the PPML estimator is based on the empirical counterpart of
[TABLE]
and this equality holds true if .
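To make the estimating equation concrete, the sketch below solves the weighted sample moment condition sum_i w_i x_i (y_i - exp(x_i' theta)) = 0, which is the first-order condition of the Poisson pseudo-likelihood. The notation (x_i for the regressors of a pair, y_i for its trade flow, w_i for an optional bootstrap multiplicity) and the function names are assumptions of this sketch. The plain Newton iteration, without step-size control, is chosen for brevity and is not the authors' implementation.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ppml(xs, ys, weights=None, n_iter=50):
    """Solve sum_i w_i x_i (y_i - exp(x_i' theta)) = 0 by Newton's method."""
    d = len(xs[0])
    weights = weights or [1.0] * len(ys)
    theta = [0.0] * d
    for _ in range(n_iter):
        score = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for x, y, w in zip(xs, ys, weights):
            mu = math.exp(sum(a * b for a, b in zip(x, theta)))
            for r in range(d):
                # Weighted score of the Poisson pseudo-likelihood.
                score[r] += w * x[r] * (y - mu)
                for c in range(d):
                    # Negative Hessian of the pseudo-likelihood.
                    hess[r][c] += w * mu * x[r] * x[c]
        step = solve(hess, score)
        theta = [t + s for t, s in zip(theta, step)]
    return theta
```

With an intercept only, the solution is the log of the (weighted) mean of the outcome, which gives a quick sanity check. Passing bootstrap multiplicities as `weights` yields the bootstrap counterpart of the estimator, provided the weighted design remains non-singular.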
Now, assuming as in Santos Silva and Tenreyro (2006) that the variables (with ) are i.i.d. is restrictive. We suppose instead that Assumption 1 holds. Then Theorem 2.3 applies to this setting, implying that is still consistent and asymptotically normal in this case. [Footnote 11: In this case, and . Then the key conditions 2 and 3 in Theorem 2.3 are satisfied as soon as is bounded, see e.g. Example 19.7 in van der Vaart (2000).] Nonetheless, the rates of convergence and asymptotic variance are different in the two cases, resulting in different inference on . [Footnote 12: The same application has been considered by Graham (2019), who shows, assuming convergence of a certain sample average, the asymptotic normality of the PPML estimator under the same dependence structure as ours. On the other hand, he neither considers bootstrap-based inference nor proves the consistency of his (asymptotic) variance estimator.]
We use the same dataset as Santos Silva and Tenreyro (2006), which covers 136 countries for year 1990, and consider the exact same specification as the one they use in their Table 3. In this specification, the additional control variables include exporter- and importer-level variables, namely their GDP per capita, a dummy variable equal to one if countries are landlocked and a remoteness index, which is the log of GDP-weighted average distance to all other countries. It also includes variables at the pair level, namely dummy variables for contiguity, common language, colonial tie, free-trade agreement and openness. This openness dummy is equal to one if at least one country is part of a preferential trade agreement. We refer to Santos Silva and Tenreyro (2006) for additional details.
Table 2 below presents the results. The first column displays the point estimates, which, as expected, are identical to those in Santos Silva and Tenreyro (2006). The other columns display the p-values for the null hypothesis that , the -th component of , is equal to 0. We consider the same forms of dependence as with the KS test above. Under joint exchangeability, we compute the p-value for using $p_{j}=\mathbb{P}\left(|\widehat{\theta}_{j}^{*}-\widehat{\theta}_{j}|>|\widehat{\theta}_{j}|\,\big|\,(Y_{\bm{i}})_{\bm{i}\in\mathbb{I}_{n,k}}\right)$. For other forms of dependence, we follow the usual practice of computing the p-values using the asymptotic normality of and estimators of the asymptotic variance under these various dependence structures.
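In practice, the conditional probability defining the p-value under joint exchangeability is approximated by a frequency over bootstrap draws. A minimal sketch follows, assuming the bootstrap estimates of the j-th coefficient have already been computed; the function name and argument names are hypothetical.

```python
def bootstrap_pvalue(theta_hat_j, theta_star_j):
    """Share of bootstrap draws whose deviation from the point estimate
    exceeds the absolute value of the point estimate itself."""
    exceed = sum(
        abs(draw - theta_hat_j) > abs(theta_hat_j) for draw in theta_star_j
    )
    return exceed / len(theta_star_j)
```

For instance, with a point estimate of 1.0 and bootstrap draws 0.5, 2.5, -0.5 and 1.2, the deviations are 0.5, 1.5, 1.5 and 0.2, so two draws out of four exceed 1.0 and the function returns 0.5.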
Using our bootstrap leads to much larger p-values than under the i.i.d. assumption. Only the log of distance and the log of GDP of the exporter and the importer appear to be significant at the levels, whereas five additional control variables are significant at that level under the i.i.d. assumption. In particular, common language and importer’s remoteness are not even significant at the usual 5% level. [Footnote 13: As in Footnote 10 above, we test for pairwise dependence, to see whether our results could be driven by the fact that our bootstrap is conservative in such cases. We obtain a p-value smaller than and thus reject this hypothesis at all usual levels.] Interestingly, there is also a gap between assuming one-way clustering, either at the exporter or at the importer level, and assuming a jointly exchangeable and dissociated array. In the former case, we still have seven variables that are significant at the levels. Confidence intervals, not displayed here, lead to similar conclusions. In particular, compared to the average length of i.i.d.-based 95% confidence intervals, those based on pairwise clustering are only 8% wider. Those based on one-way clustering on exporters (resp. importers) are 20% (resp. 17%) wider. On the other hand, those based on Assumption 1 are 136% wider.
5 Conclusion
While polyadic data are increasingly used in applied work, and empirical researchers routinely account for multiway clustering when computing standard errors, the statistical theory behind these forms of dependence has lagged behind. Following Bickel and Chen (2009) and Menzel (2019), we link these dependence structures to jointly and separately exchangeable arrays. Using representation results for such arrays, we then prove uniform laws of large numbers and central limit theorems. These results imply consistency and asymptotic normality of various nonlinear estimators under such dependence. We also establish the general validity of natural extensions of the standard nonparametric bootstrap to such arrays. Our application shows that using those bootstrap schemes may make a large difference compared to assuming i.i.d. data or clustering along a single dimension, as has often been done.
One caveat is that for the bootstrap confidence intervals to be valid, the asymptotic variance of the estimator should be positive. This may not be the case, for instance if the data are actually i.i.d. Inference based on the wild bootstrap without this positivity condition has been studied for sample averages under multiway clustering by Menzel (2019). How to conduct inference on nonlinear estimators under joint exchangeability or multiway clustering without this positivity condition remains an avenue for future research.
Appendix A Key lemmas
We first state the symmetrization lemma. Let denote independent Rademacher variables, independent of . Then:
Lemma A.1.
Suppose that Assumptions 1-2 hold and for all . Then there exist real numbers depending only on and ,…, , jointly exchangeable and dissociated arrays with for all , satisfying
[TABLE]
Though more complicated than its i.i.d. version (see e.g. Lemma 2.3.1 in van der Vaart and Wellner, 1996), it serves the exact same purpose in the proofs of Theorems 2.1-2.2: conditional on the , the process is sub-Gaussian. In view of the AHK representation, the terms could be expected. Given the aforementioned link with U-statistics, Lemma A.1 can also be seen as a generalization of the symmetrization lemma for U-processes for non-degenerate cases, see in particular Theorem 3.5.3 in de la Peña and Giné (1999).
The proof of Lemma A.1 crucially hinges upon the following decoupling inequality, which may be of independent interest. Hereafter, we let .
Lemma A.2.
Let , be a family of i.i.d. random variables with values in a Polish space and , be some independent copies of this family. Let be a non-decreasing convex function from to and be a bijection from to . Let be a pointwise measurable class of functions from to such that . Finally, let . Then
[TABLE]
The proof is given in the supplement. This result generalizes the decoupling inequality for U-statistics of de la Peña (1992) to our setting. As with U-statistics, it is possible to obtain a reverse inequality if and is constant on , for all . With such a reverse inequality, it is possible to replace by in Lemma A.1. It is unclear to us, however, whether this reverse inequality still holds if (implying ). The key argument for the reverse inequality in de la Peña (1992) is that by the symmetry condition above, we can replace by an average over terms. However, for the proof to extend to our setting, one would need an average over terms. This is not possible in general when , which is the case when .
Next, in order to prove the convergence of the empirical process under the bracketing entropy condition (Assumption 4-(ii)), we establish the following maximal inequality, which is very close to that of Giné and Nickl (2015) for i.i.d. data (see their Lemma 3.5.12).
Lemma A.3.
Suppose that Assumption 1 holds. Let be real-valued functions and . Then:
[TABLE]
References
- Aldous, D. J. (1981), ‘Representations for partially exchangeable arrays of random variables’, Journal of Multivariate Analysis 11 (4), 581–598.
- Arcones, M. A. and Giné, E. (1992), ‘On the bootstrap of U and V statistics’, Annals of Statistics 20 (2), 655–674.
- Arcones, M. A. and Giné, E. (1994), ‘U-processes indexed by Vapnik-Červonenkis classes of functions with applications to asymptotics and bootstrap of U-statistics with estimated parameters’, Stochastic Processes and their Applications 52 (1), 17–38.
- Arcones, M. and Giné, E. (1993), ‘Limit theorems for U-processes’, The Annals of Probability 21 (3), 1494–1542.
- Barbe, P. and Bertail, P. (1995), The weighted bootstrap, Vol. 98, Springer-Verlag New York.
- Bertail, P., Chautru, E. and Clémençon, S. (2017), ‘Empirical processes in survey sampling with (conditional) Poisson designs’, Scandinavian Journal of Statistics 44 (1), 97–111.
- Bertrand, M., Duflo, E. and Mullainathan, S. (2004), ‘How much should we trust differences-in-differences estimates?’, The Quarterly Journal of Economics 119 (1), 249–275.
