TL;DR
This paper introduces the concept of local exchangeability, a relaxation of exchangeability allowing bounded distributional changes under local data swaps, with implications for Bayesian nonparametrics and permutation tests.
Contribution
It formalizes local exchangeability, proves its connection to measure-valued processes, and demonstrates practical applications in Bayesian inference and hypothesis testing.
Findings
Local empirical measures approximate underlying processes.
Local exchangeability characterizes certain stochastic processes.
Applications include Bayesian nonparametrics and covariate-dependent tests.
Local Exchangeability
Trevor Campbell
Saifuddin Syed
Chiao-Yu Yang
Michael I. Jordan
Tamara Broderick
Department of Statistics, University of British Columbia, Vancouver, Canada.
Department of Electrical Engineering and Computer Science, University of California Berkeley, Berkeley, USA.
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, USA.
Abstract
Exchangeability—in which the distribution of an infinite sequence is invariant to reorderings of its elements—implies the existence of a simple conditional independence structure that may be leveraged in the design of statistical models and inference procedures. In this work, we study a relaxation of exchangeability in which this invariance need not hold precisely. We introduce the notion of local exchangeability—where swapping data associated with nearby covariates causes a bounded change in the distribution. We prove that locally exchangeable processes correspond to independent observations from an underlying measure-valued stochastic process. Using this main probabilistic result, we show that the local empirical measure of a finite collection of observations provides an approximation of the underlying measure-valued process and Bayesian posterior predictive distributions. The paper concludes with applications of the main theoretical results to a model from Bayesian nonparametrics and covariate-dependent permutation tests.
Keywords: exchangeability, local, representation, de Finetti, Bayesian nonparametrics
1 Introduction
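Let $(X_n)_{n\in\mathbb{N}}$ be an infinite sequence of random elements in a standard Borel space $\mathcal{X}$. The sequence is said to be exchangeable if for any finite permutation $\pi$ of $\mathbb{N}$,
$$(X_1, X_2, \dots) \overset{d}{=} (X_{\pi(1)}, X_{\pi(2)}, \dots).$$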
At first sight this assumption appears innocent; intuitively, it suggests only that the order in which observations appear provides no information about those or future observations. But despite its apparent innocence, exchangeability has a powerful implication. In particular, de Finetti’s well-known theorem (e.g. Kallenberg, 2002, Theorem 11.10) states that an infinite sequence $(X_n)_{n\in\mathbb{N}}$ is exchangeable if and only if it is a mixture of i.i.d. sequences, i.e., there exists a unique random probability measure $G$ on $\mathcal{X}$ such that
$$\mathbb{P}\big((X_n)_{n\in\mathbb{N}} \in \cdot \mid G\big) = G^{\infty} \quad \text{a.s.},$$
where $G^{\infty}$ is the countably infinite product measure constructed from $G$. Thus, exchangeability provides a strong justification for the Bayesian approach to modeling (Jordan, 2010), and guarantees a latent conditional independence structure of $(X_n)_{n\in\mathbb{N}}$ that is useful in the design of computationally efficient inference algorithms. Exchangeability is also the basis of well-known nonparametric permutation testing procedures (Pitman, 1937a, b, c; Fisher, 1966, Ch. 3; Ernst, 2004; Lehmann and Romano, 2005, Ch. 15).
However, although exchangeability may be a useful idealization in modeling and analysis, many data come with covariates that preclude an honest belief in its validity. For example, given a corpus of documents tagged by publication date, one might reasonably expect the data to exhibit a time-dependence that is incompatible with exchangeability. Nevertheless, one might still expect the distribution not to change too much if we permuted documents published only one day apart; i.e., observations with similar covariates are intuitively “nearly exchangeable.” In this work, we investigate how to codify this intuition.
One option is to use a kind of partial exchangeability (de Finetti, 1938; Lauritzen, 1974; Diaconis and Freedman, 1978; Camerlenghi et al., 2019) in which the distribution is invariant to permutations within equivalence classes. Formally, we endow each observation $X_n$ with a covariate $t_n$ from a set $\mathcal{T}$ equipped with an equivalence relation $\sim$, and assert that the sequence distribution is invariant only to reordering observations with equivalent covariate values. Under this assumption as well as the availability of infinitely many observations at each covariate value, we have a similar representation of $(X_n)_{n\in\mathbb{N}}$ as a mixture of independent sequences given random probability measures $(G_t)_{t\in\mathcal{T}}$,
$$\mathbb{P}\big((X_n)_{n\in\mathbb{N}} \in \cdot \mid (G_t)_{t\in\mathcal{T}}\big) = \prod_{n\in\mathbb{N}} G_{t_n} \quad \text{a.s.}, \qquad G_t = G_{t'} \text{ whenever } t \sim t'.$$
The random probability measures $(G_t)_{t\in\mathcal{T}}$ can have an arbitrary dependence on one another; partially exchangeable sequences encompass those that are exchangeable (where the covariate does not matter), decoupled (where subsequences for each different covariate value are mutually independent), and the full range of models in between. In particular, partial exchangeability does not enforce the desideratum that observations with nearby covariates should have a similar law, and is too weak to be useful for restricting the class of underlying mixing measures for the data.
In this work, we introduce a new notion of local exchangeability, lying between partial and exact exchangeability, in which swapping data associated with nearby covariates causes a bounded change in total variation distance. We begin by studying probabilistic properties of locally exchangeable processes in Sections 2.1 and 2.2. The main result from these sections is in the spirit of de Finetti’s theorem: we prove that locally exchangeable processes correspond to independent observations from a unique underlying smooth measure-valued stochastic process. To the best of our knowledge, this representation theorem is the first to arise from an approximate probabilistic symmetry. Further, the existence of such an underlying process not only shows that de Finetti’s theorem is robust to perturbations away from exact exchangeability, justifying the Bayesian analysis of real data, but also imposes a useful constraint on the space of models one should consider when dealing with data that one suspects follows a locally exchangeable random process. Next, in Section 2.3, we use this result to show that the local empirical measure of a finite collection of observations can be used to provide an approximation of the underlying measure-valued process, Bayesian posterior predictive distributions, and the premetric that governs local exchangeability. These results rely heavily on the intuition that locally exchangeable observations from nearby covariates behave essentially like exchangeable observations. Finally, in Section 3, we provide example applications in two statistical models exhibiting local exchangeability, namely Gaussian processes (Rasmussen and Williams, 2006) and dependent Dirichlet processes (MacEachern, 1999, 2000), as well as grouped permutation tests in the presence of covariates. The paper concludes with a discussion of directions for future work. Proofs of all results are provided in the appendix.
1.1 Related work
Beyond de Finetti’s original result for infinite binary sequences (de Finetti, 1931) and its extensions to more general range spaces (de Finetti, 1937; Hewitt and Savage, 1955) and finite sequences (Diaconis, 1977; Diaconis and Freedman, 1980a)—see Aldous (1985) for an in-depth introduction—correspondences between probabilistic invariances and conditional latent structure (known as representation theorems) have been studied extensively. Notions of exchangeability and corresponding latent conditional structure now exist for a wide variety of probabilistic models, such as arrays (Aldous, 1981; Hoover, 1979; Austin and Panchenko, 2014; Jung et al., 2021), Markov processes (Diaconis and Freedman, 1980b), networks (Caron and Fox, 2017; Veitch and Roy, 2015; Borgs et al., 2018; Crane and Dempsey, 2016; Cai, Campbell and Broderick, 2016; Janson, 2017), combinatorial structures (Kingman, 1978; Pitman, 1995; Broderick, Pitman and Jordan, 2013; Campbell, Cai and Broderick, 2018; Crane and Dempsey, 2019), random measures (Kallenberg, 1990), and more (Diaconis, 1988; Kallenberg, 2005; Orbanz and Roy, 2015). Furthermore, weaker notions of exchangeability such as conditionally identical distributions (Berti, Pratelli and Rigo, 2004; Kallenberg, 1988) have been developed. All past work on probabilistic invariance and its consequences has pertained to exact invariance.
2 Local exchangeability
2.1 Definition
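Let $X = (X_t)_{t\in\mathcal{T}}$ be a stochastic process on an index (or covariate) set $\mathcal{T}$ taking values in a standard Borel space $\mathcal{X}$. To encode distance between covariates, we endow the set $\mathcal{T}$ with a premetric $d : \mathcal{T}\times\mathcal{T} \to [0,1]$ satisfying $d(t,t) = 0$ and $d(t,t') = d(t',t)$ for $t, t' \in \mathcal{T}$. We will formalize local exchangeability based on the finite-dimensional projections of $X$. For any subset $T \subseteq \mathcal{T}$ and injection $\pi : T \to \mathcal{T}$, let $X_T$ and $X_{\pi(T)}$ denote stochastic processes on index set $T$ such that
$$X_T = (X_t)_{t\in T}, \qquad X_{\pi(T)} = (X_{\pi(t)})_{t\in T}.$$
In other words, $X_T$ is the restriction of $X$ to index set $T$, while $X_{\pi(T)}$ is the restriction to $\pi(T)$ under the mapping $\pi$. Definition 1 captures the notion that observations with similar covariates should be close to exchangeable, i.e., the total variation between $X_T$ and $X_{\pi(T)}$ is small as long as the distances between $t$ and $\pi(t)$ are small for all $t \in T$.
Definition 1.
The process $X$ is locally exchangeable with respect to a premetric $d$ if for any finite subset $T \subseteq \mathcal{T}$ and injection $\pi : T \to \mathcal{T}$,
$$d_{\mathrm{TV}}\big(X_T, X_{\pi(T)}\big) \le \sum_{t\in T} d\big(t, \pi(t)\big). \qquad (6)$$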
Definition 1 generalizes both exchangeability and partial exchangeability among equivalence classes. In particular, the zero premetric, where $d(t,t') = 0$ identically, yields classical exchangeability, while the premetric $d(t,t') = \mathbb{1}[t \not\sim t']$ for an equivalence relation $\sim$ yields partial exchangeability. Further, any process is locally exchangeable with respect to the discrete premetric $d(t,t') = \mathbb{1}[t \neq t']$; in order to say something of value about a process $X$, it must satisfy Eq. 6 for a tighter premetric.
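The claim about the discrete premetric takes one line to verify, taking the right-hand side of Eq. 6 to be $\sum_{t\in T} d(t,\pi(t))$: either $\pi$ fixes every point of $T$, so both sides vanish, or some $t$ moves, so the right-hand side is at least $1$ while total variation never exceeds $1$.

```latex
\begin{aligned}
\pi(t) = t \ \ \forall t \in T
  &\;\Longrightarrow\; X_{\pi(T)} = X_T
  \;\Longrightarrow\; d_{\mathrm{TV}}(X_T, X_{\pi(T)}) = 0 = \sum_{t \in T} \mathbb{1}[t \neq \pi(t)], \\
\exists\, t \in T : \pi(t) \neq t
  &\;\Longrightarrow\; d_{\mathrm{TV}}(X_T, X_{\pi(T)}) \le 1 \le \sum_{t \in T} \mathbb{1}[t \neq \pi(t)].
\end{aligned}
```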
To quantify differences in distributions, Definition 1 employs the total variation distance, which for random elements $Y, Z$ in a measurable space $(\mathcal{S}, \Sigma)$ is defined as
$$d_{\mathrm{TV}}(Y, Z) = \sup_{A\in\Sigma} \big|\mathbb{P}(Y \in A) - \mathbb{P}(Z \in A)\big|.$$
The choice of total variation distance (as opposed to other metrics and divergences; see, e.g., Gibbs and Su, 2002) is motivated by its symmetry and generality. We make $d$ a premetric rather than, say, a pseudometric or metric, because the triangle inequality and positive definiteness are unused in the theory below. Further, we use a premetric with range $[0,1]$ because total variation always lies in this range, and so any valid bound in Eq. 6 for a premetric $d$ can be improved by replacing $d$ with $\min(d, 1)$. And although Definition 1 imposes a total variation bound only for all finite sets of covariates, it is equivalent to do so for all countable sets of covariates, as shown in Proposition 2.
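On a finite space, the supremum over events in the total variation definition can be evaluated by brute force, and it coincides with half the $\ell_1$ distance between the probability mass functions. The short sketch below checks this identity on a toy three-point space; the function names and the example distributions are illustrative and not taken from the paper.

```python
from itertools import combinations


def tv_bruteforce(p, q):
    """sup over events A of |P(A) - Q(A)| on a finite space (enumerates all 2^n events)."""
    outcomes = sorted(set(p) | set(q))
    best = 0.0
    for r in range(len(outcomes) + 1):
        for event in combinations(outcomes, r):
            pa = sum(p.get(x, 0.0) for x in event)
            qa = sum(q.get(x, 0.0) for x in event)
            best = max(best, abs(pa - qa))
    return best


def tv_half_l1(p, q):
    """Closed form on a finite space: half the L1 distance between mass functions."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in outcomes)


# Toy distributions on a three-point space (illustrative values).
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
print(tv_bruteforce(p, q), tv_half_l1(p, q))
```

The brute-force version is exponential in the size of the space and is only meant to make the definition concrete; the half-$\ell_1$ form is what one would use in practice.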
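Proposition 2.
If $X$ is locally exchangeable with respect to $d$, then for any countable subset $T \subseteq \mathcal{T}$ and injection $\pi : T \to \mathcal{T}$,
$$d_{\mathrm{TV}}\big(X_T, X_{\pi(T)}\big) \le \sum_{t\in T} d\big(t, \pi(t)\big).$$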
Example 3.
A simple example of local exchangeability that we will return to throughout the paper is the process $X = (X_t)_{t\in\mathbb{R}}$ of observable measurements from a Bayesian linear regression model on $\mathbb{R}$ with a quadratic trend,
[TABLE]
By Lemma 14, since the $X_t$ are independent conditioned on the latent regression parameter,
[TABLE]
We bound the terms in the sum using the Lipschitz continuity of the standard normal CDF $\Phi$,
[TABLE]
Therefore the process $X$ in the Bayesian linear regression model Eq. 9 is locally exchangeable with respect to the premetric $d(t,t') = \min\big(|t^2 - t'^2|/\sqrt{2\pi},\, 1\big)$. Note that we are free to take the minimum with $1$ because the total variation is bounded above by 1. This example illustrates why we opt for the generality of a premetric; here, observations at points $t$ and $-t$ are exactly exchangeable since $d(t,-t) = 0$, which does not generally hold for a metric, and does not satisfy the triangle inequality. Also note that the marginal distribution of is a multivariate Gaussian with off-diagonal covariance terms , which varies with ; multivariate Gaussians with exchangeable components must have constant off-diagonal covariance terms. Therefore this example also shows that there exist processes that are locally exchangeable but not exchangeable.
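The Lipschitz step can be sanity-checked numerically. For unit-variance normals, $d_{\mathrm{TV}}(\mathcal{N}(a,1), \mathcal{N}(b,1)) = 2\Phi(|a-b|/2) - 1$, and since $\Phi$ has slope at most $1/\sqrt{2\pi}$, this is at most $|a-b|/\sqrt{2\pi}$. The sketch below assumes, for illustration only, that conditionally on a scalar coefficient $\beta$ the observation at $t$ is $\mathcal{N}(\beta t^2, 1)$; this form is chosen to match the $|t^2 - t'^2|$ premetric and is not necessarily the exact model of Eq. 9. The grids of $\beta$ and $t$ values are arbitrary.

```python
import math


def normal_cdf(x):
    """Standard normal CDF Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tv_unit_normals(a, b):
    """Exact d_TV(N(a, 1), N(b, 1)) under the sup-over-events convention."""
    return 2.0 * normal_cdf(abs(a - b) / 2.0) - 1.0


def lipschitz_bound(a, b):
    """Phi has slope at most 1/sqrt(2*pi), so d_TV <= |a - b|/sqrt(2*pi); also d_TV <= 1."""
    return min(abs(a - b) / math.sqrt(2.0 * math.pi), 1.0)


def check_bound(betas, ts):
    """Assumed model: given beta, the observation at t is N(beta * t**2, 1)."""
    for beta in betas:
        for t in ts:
            for s in ts:
                a, b = beta * t**2, beta * s**2
                assert tv_unit_normals(a, b) <= lipschitz_bound(a, b) + 1e-12
    return True


betas = [-1.3, -0.2, 0.4, 1.0, 2.5]  # hypothetical coefficient values
ts = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.0, 2.0]
print(check_bound(betas, ts))
# Points t and -t give identical conditional laws, hence zero total variation.
print(tv_unit_normals(1.5**2, (-1.5) ** 2))
```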
2.2 de Finetti representation
In the previous example, we used the fact that the variables $X_t$ were conditionally independent given a latent random variable to demonstrate their local exchangeability. A natural question to ask is whether all locally exchangeable processes exhibit a similar structure. Theorem 5 answers this question in the affirmative, by providing a de Finetti-like representation of locally exchangeable processes similar to Eq. 3 and Eq. 4. This representation guarantees the existence of a simple conditional structure that can be leveraged in the design of statistical inference procedures, and justifies a Bayesian approach when dealing with covariate-dependent data. We first require a weak assumption on the space $(\mathcal{T}, d)$.
Definition 4 (Infinitely-separable space).
A premetric space is infinitely separable if there exists a countable subset such that for all , there exists a Cauchy sequence in such that and .
When is a metric, infinite separability is equivalent to being separable with no isolated points. When is a pseudometric, it is equivalent to the existence of a countable dense subset such that for all and , . In general, infinite separability ensures that there are infinitely many elements to swap “nearby” each covariate value of interest . This assumption precludes the situation where observations satisfy finite exchangeability (Diaconis, 1977; Diaconis and Freedman, 1980a) but not infinite exchangeability.
Theorem 5 shows that under infinite separability, the desired de Finetti-like representation does indeed exist. In particular, we show that there is a unique probability measure-valued process $G = (G_t)_{t\in\mathcal{T}}$ that renders $X$ conditionally independent, and that satisfies a continuity property with the same “smoothness” as the observed process. For the precise statement of the result in Theorem 5, recall that a modification of a stochastic process $G$ on $\mathcal{T}$ is any other process $G'$ on $\mathcal{T}$ such that $\mathbb{P}(G_t = G'_t) = 1$ for all $t \in \mathcal{T}$.
Theorem 5.
Suppose is infinitely separable. Then the process is locally exchangeable with respect to if and only if there exists a random measure-valued stochastic process (unique up to modification) such that for any finite subset of covariates and ,
[TABLE]
For example, given and the zero premetric , one recovers the de Finetti representation of exchangeable sequences; the smoothness condition asserts that must be constant for all as expected. Similarly, suppose we are given an equivalence relation on where each equivalence class has infinite cardinality. Then setting and recovers the de Finetti representation of partially exchangeable sequences under permutation within equivalence classes; here the smoothness condition asserts that must be constant within each equivalence class, but allows for general dependence between across the equivalence classes. Thus, in the same way that Definition 1 generalizes (partial) exchangeability, Theorem 5 generalizes the de Finetti representation theorem.
Note that we still obtain the “if” direction of Theorem 5 without imposing the infinite separability assumption on . In particular, if we are given a process satisfying Eq. 13, then the process is locally exchangeable with respect to both
[TABLE]
We refer to as the canonical premetric and as the strong canonical premetric. Note that is locally exchangeable with respect to any premetric satisfying , and in particular, . Given a particular , one can use Lemma 14 to derive an upper bound on these two premetrics (as demonstrated in Example 3), which then provides insight into the extent to which data generated from are exchangeable. Note that and may or may not be infinitely separable, depending on the characteristics of the process .
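The “if” direction can be checked exactly on a small toy model. In the sketch below, a latent variable takes two values, $G_t$ is a Bernoulli distribution whose success probability depends on the latent value and on $t$, and the premetric is taken to be $\mathbb{E}[d_{\mathrm{TV}}(G_t, G_{t'})]$. Because the total variation between two mixtures with the same mixing weights is at most the averaged total variation, and the total variation between product measures is at most the sum of the coordinatewise distances, the bound of Definition 1 must hold; the script computes both sides exactly. The prior `PRIOR`, the function `p_success`, and the injection `pi` are hypothetical choices.

```python
import itertools
import math

# Hypothetical latent variable: finitely many values with prior probabilities.
PRIOR = {0: 0.3, 1: 0.7}


def p_success(beta, t):
    """Hypothetical success probability of G_t = Bernoulli(p) given latent value beta."""
    return 1.0 / (1.0 + math.exp(-(beta + 1) * t))


def joint_law(ts):
    """Exact law of (X_t) for t in ts: a mixture over beta of product Bernoulli laws."""
    law = {}
    for x in itertools.product((0, 1), repeat=len(ts)):
        prob = 0.0
        for beta, w in PRIOR.items():
            term = w
            for xi, t in zip(x, ts):
                p = p_success(beta, t)
                term *= p if xi == 1 else 1.0 - p
            prob += term
        law[x] = prob
    return law


def tv(p, q):
    """Total variation between two mass functions on a finite space."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def premetric(t, s):
    """E[d_TV(G_t, G_s)]; TV between Bernoulli(p) and Bernoulli(q) is |p - q|."""
    return sum(w * abs(p_success(b, t) - p_success(b, s)) for b, w in PRIOR.items())


T = [0.0, 0.5, 1.0]
pi = {0.0: 0.1, 0.5: 1.0, 1.0: 0.6}  # an injection from T into the covariates
lhs = tv(joint_law(T), joint_law([pi[t] for t in T]))  # d_TV(X_T, X_pi(T))
rhs = sum(premetric(t, pi[t]) for t in T)  # sum over t of d(t, pi(t))
print(lhs, rhs)
```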
Example (continued).
In the linear regression example, the underlying measure-valued process is the collection of normal distributions
[TABLE]
Theorem 5 guarantees that this process is unique up to modification. In this case, the randomness in $G$ is entirely due to the latent regression parameter; in general, $G$ need not be determined by a finite-dimensional quantity. We can also verify that $G$ satisfies the required smoothness condition with respect to $d$, although this is not surprising given that we originally derived the premetric using the same technique:
[TABLE]
2.3 Local empirical measure process
The de Finetti result in Theorem 5 guarantees the existence of a unique underlying measure-valued process $G$, but does not provide any direct insight into the distribution of $G$ or whether it is identifiable given only (countably many) measurements of the process $X$. In the classical setting of an exchangeable sequence $(X_n)_{n\in\mathbb{N}}$, the empirical measure $\frac{1}{N}\sum_{n=1}^{N}\delta_{X_n}$ of a finite collection of observations serves this purpose, as it converges weakly to $G$ almost surely (Varadarajan, 1958), i.e.,
$$\lim_{N\to\infty} d_{\mathrm{LP}}\Big(\frac{1}{N}\sum_{n=1}^{N}\delta_{X_n},\; G\Big) = 0 \quad \text{a.s.}, \qquad (17)$$
where $d_{\mathrm{LP}}$ denotes the Lévy-Prokhorov metric. In the setting of local exchangeability more generally, however, the usual empirical measure does not provide a result similar to Eq. 17. If we are interested in understanding the distribution of $G_t$ for some $t \in \mathcal{T}$, and we collect measurements of $X$ at a finite set of covariates $T \subset \mathcal{T}$, the presence of covariates in $T$ that are far from $t$ can result in a non-vanishing bias in the empirical measure. To address this issue, for each $t \in \mathcal{T}$, let $t_1, \dots, t_{|T|}$ be an ordering of the set $T$ such that the values $d(t, t_i)$ are ordered from smallest to largest. Then define
[TABLE]
We construct the local empirical measure process via
[TABLE]
The local empirical measure process serves as an approximation of the measure-valued process underlying the locally exchangeable process . Note that , so is a probability measure for each . Further note that is measurable with respect to . Intuitively, includes only those observations at covariates sufficiently close to the point of interest such that the decrease in variance associated with adding another observation outweighs the potential increase in bias. The value represents how many observations are included in the local empirical measure at that location, and represents the average distance of their covariates to .
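A simplified version of the local empirical measure can be sketched as follows. It orders the observed covariates by $d(t, t_i)$, as in the text, but keeps a fixed number `n_keep` of the nearest observations with equal weight, rather than the data-adaptive number defined in the equations above; the helper name, the equal weighting, and that simplification are assumptions for illustration only. It also returns the average distance of the kept covariates to $t$.

```python
from collections import Counter


def local_empirical_measure(t, covariates, observations, dist, n_keep):
    """Simplified local empirical measure at covariate t (hypothetical helper).

    Observations are ordered by dist(t, t_i) from smallest to largest; the n_keep
    nearest receive equal weight 1/n_keep. Returns (measure, average distance).
    """
    if not 1 <= n_keep <= len(covariates):
        raise ValueError("n_keep must be between 1 and the number of observations")
    order = sorted(range(len(covariates)), key=lambda i: dist(t, covariates[i]))
    kept = order[:n_keep]
    counts = Counter(observations[i] for i in kept)
    measure = {x: c / n_keep for x, c in counts.items()}
    avg_dist = sum(dist(t, covariates[i]) for i in kept) / n_keep
    return measure, avg_dist


def dist(s, u):
    # A premetric with range [0, 1]: truncated absolute difference.
    return min(abs(s - u), 1.0)


covariates = [0.0, 0.1, 0.2, 5.0, 5.1]
observations = ["a", "a", "b", "z", "z"]
measure, avg = local_empirical_measure(0.05, covariates, observations, dist, n_keep=3)
print(measure, avg)
```

Increasing `n_keep` averages over more observations, but once the far-away covariates near 5 are admitted, the average distance jumps, mirroring the variance/bias trade-off described above.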
Our goal now is to provide a weak convergence result for the local empirical measure process in the limit of many observations, similar to that of Eq. 17. As a key step towards that goal, Theorem 6 provides bounds on both the expected squared estimation error (Eq. 21) as well as error tail probabilities (Eq. 22) when using the local empirical measure process in place of or , for all . Each bound in Theorem 6 has two terms: the first is related to the variance incurred by estimation via independent sampling, and the second is related to the bias incurred by using observations from . Note that Theorem 6 quantifies the approximation error using the metric
[TABLE]
where , are measurable subsets of , , and . We work with rather than standard metrics because it simplifies the analysis substantially. Although the properties of depend on the choice of in general, there exists a choice such that implies weak convergence (see Lemma 16 in the appendix), and the bounds below in Theorem 6 are valid for any choice of , as indicated by the supremum. We will use the metric and the results in Theorem 6 as a stepping stone to obtain weak convergence in Corollary 7 below.
Theorem 6.
Let be infinitely separable and be locally exchangeable with respect to . Then
[TABLE]
and for all , ,
[TABLE]
Furthermore, the same bounds in Eqs. 21 and 22 apply when is replaced with .
When all of the covariates in the observed set are close to , the bounds in Theorem 6 provide essentially the same guarantees as one would expect for exchangeable random variables. In particular, suppose for all , , and so . In this situation the bounds above reduce to
[TABLE]
Corollary 7 uses the results in Theorem 6 to obtain a weak convergence result for similar to Eq. 17. In particular, if we collect measurements of from a sequence of sets that concentrate around —for example, such that there exists a subsequence —then the local empirical measure converges weakly to both and the Bayesian posterior predictive distribution in probability. Recall that denotes the Lévy-Prokhorov metric.
Corollary 7.
Fix . Suppose we make observations at a sequence of finite sets , of covariates such that for all , . Then
[TABLE]
A byproduct of Corollary 7 is that one can characterize the distribution of by analyzing the distribution of conditioned on for a sequence of sets of covariates that concentrate around , i.e., and as . Note that it is not required to know the premetric governing local exchangeability in order to identify using this technique; one can instead construct the set of covariates such that for any premetric that dominates in the sense that for any two sequences of covariates , ,
[TABLE]
The requirement in Eq. 25 is typically not stringent; it states only that when covariates get close under , they must also get close under , with no other stipulation about relative rates, bounds, etc. In the following linear regression example, we will use the usual metric on .
Example (continued).
We return to the linear regression example to show how the distribution of can be recovered from the process via Corollary 7. The joint density of is
[TABLE]
Therefore the conditional distribution of given is given by
[TABLE]
If we then consider a sequence of sets of covariates that grows in size and concentrates quickly around —e.g., —we find that the conditional distribution of given converges to
[TABLE]
By setting , we recover the fact that is generated from , , i.e., the marginal of the original Bayesian linear regression model. Note that one can repeat essentially the same analysis for multiple covariates to recover finite marginal distributions. For example, if we consider the bivariate distribution of , we find that are generated independently from
[TABLE]
The analysis from the example in Section 2.1 can then be used to bound the strong canonical premetric $d_{sc}(t,t') = d_{\mathrm{TV}}(G_t, G_{t'}) \le \min\big(|t^2 - t'^2|/\sqrt{2\pi},\, 1\big)$. Thus, given only the process $X$, we have identified both a premetric under which $X$ is locally exchangeable and the measure-valued process $G$.
2.4 Regularity
The smoothness property of $G$ in Eq. 13 may seem unsatisfying at first glance; it bounds the absolute difference in the underlying mixing measure process at nearby locations only in expectation, leaving room for the possibility of sample discontinuities in $G_t$ as a function of $t$. But this flexibility is needed: there are many probabilistic models that, intuitively, generate observations that should be considered locally exchangeable but which have discontinuous latent mixing measures. For example, some dynamic nonparametric mixture models (Lin and Fisher, 2010; Chen et al., 2013) have components that are created and destroyed over time, causing discrete jumps in the mixing measure. As long as the jumps happen at diffuse random times, the probability of a jump occurring between two times decreases as the difference in time decreases, and the observations may still be locally exchangeable. However, intuitively, if there is a fixed location with a nonzero probability of a discrete jump in the mixing measure process, the observations cannot be locally exchangeable. Corollary 8 provides the precise statement.
Corollary 8.
Suppose is infinitely separable and is locally exchangeable with respect to . Then for all , , and ,
[TABLE]
That being said, it is worth examining whether different guarantees on properties of the underlying measure process result as a consequence of different properties of the premetric . Theorem 9 answers this question in the affirmative for processes on ; in particular, the faster the decay of relative to as , the stronger the guarantees on the behavior of the mixing measure . Note that while this result is presented for covariate space , the result can be extended to processes on and more general separable spaces (Pothoff, 2009, Theorems 2.8, 2.9, 4.5).
Theorem 9.
Let , , and be locally exchangeable with respect to a premetric satisfying as . Then:
1. (): $X$ is exchangeable and $G$ is a constant process.
2. (): $X$ is stationary and for any and , is weak-sense stationary with an -Hölder continuous modification.
3. (): $G$ may have no continuous modification.
Remark.
A rough converse of the first point holds: exchangeable implies constant , and is trivially for . But a similar claim for the second point is not true in general: stationary and locally exchangeable does not necessarily imply that for . For a counterexample, consider a square wave shifted by a uniform random variable, i.e., the process for . Here is stationary and locally exchangeable with , but for any as .
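The square-wave counterexample can be explored numerically. The sketch assumes a period-1 square wave with values in $\{0, 1\}$ shifted by $U \sim \mathrm{Uniform}(0,1)$; this form is an assumption chosen for illustration, and the expectation over $U$ is replaced by a fine midpoint grid. It illustrates that the marginal law of $X_t$ does not depend on $t$, while the probability that $X_t$ and $X_{t+h}$ differ is about $2h$, so it decays only linearly in $h$.

```python
def square_wave(t, u):
    # Assumed form: period-1 square wave with values in {0, 1}, phase-shifted by u.
    return 1 if (t + u) % 1.0 < 0.5 else 0


def _grid(n):
    # Midpoints of n equal cells of [0, 1), standing in for U ~ Uniform(0, 1).
    return [(k + 0.5) / n for k in range(n)]


def marginal_one(t, n=100_000):
    """P(X_t = 1), approximated by averaging over the grid for U."""
    return sum(square_wave(t, u) for u in _grid(n)) / n


def mismatch_probability(t, h, n=100_000):
    """P(X_t != X_{t+h}), approximated over the same grid for U."""
    return sum(square_wave(t, u) != square_wave(t + h, u) for u in _grid(n)) / n


for h in (0.1, 0.01, 0.001):
    p = mismatch_probability(0.3, h)
    print(h, p, p / h)  # the ratio stays near 2 rather than shrinking
```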
2.5 Approximate conditional independence
In the classical setting of exchangeable sequences , the empirical measure satisfies the following property: for all bounded measurable functions ,
[TABLE]
Thus and are conditionally independent given . In other words, the fact that corresponds to covariate values provides no additional information about beyond itself.
In the setting of local exchangeability, the question of how important the covariate values are in inferring the measure-valued process is relevant in practice: we do not often get to observe the true covariate values , but rather we observe discretized versions that are grouped into “bins.” For example, if corresponds to observed document data with timestamps , we may know those timestamps up to only a certain precision (e.g. days, months, years). This section shows that a “binned” version of the empirical measure provides an approximate conditional independence similar to Eq. 31, where the error of approximation decays smoothly by an amount corresponding to the uncertainty in covariate values.
Formally, suppose we partition our covariate space into disjoint bins , where each bin has observations . We may use a finite partition by setting all but finitely many to the empty set. Although we know the number of points in each bin (i.e., the cardinality of ), we will encode our lack of knowledge of their positions as randomness: , where is a probability distribution capturing our belief of how the unobserved covariates are generated within each bin. Following the intuition from the classical de Finetti’s theorem, we define the binned empirical measures , , and let denote the subgroup of permutations that permute observations only within each bin, i.e., such that , . Note that since there are only finitely many observations in total. Unlike classical exchangeability, does not provide exact conditional independence of and ; but Theorem 10 guarantees that it provides a form of approximate conditional independence, with error that depends on .
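The binned construction can be made concrete for a tiny example. The sketch below computes one empirical measure per bin and enumerates, by brute force, the permutations that move observations only within bins; there are $\prod_j |\text{bin}_j|!$ of them, and each leaves the binned empirical measures unchanged. The function names are hypothetical, and brute-force enumeration is only feasible for a handful of observations.

```python
from collections import Counter
from itertools import permutations


def binned_empirical_measures(bins, observations):
    """One empirical measure per bin; bins[i] is the bin label of observation i."""
    grouped = {}
    for b, x in zip(bins, observations):
        grouped.setdefault(b, []).append(x)
    return {b: {x: c / len(xs) for x, c in Counter(xs).items()} for b, xs in grouped.items()}


def within_bin_permutations(bins):
    """All index permutations s with bins[s[i]] == bins[i] for every position i."""
    n = len(bins)
    return [s for s in permutations(range(n)) if all(bins[s[i]] == bins[i] for i in range(n))]


bins = [0, 0, 1, 1, 1]
observations = ["a", "b", "a", "a", "c"]
measures = binned_empirical_measures(bins, observations)
perms = within_bin_permutations(bins)
print(len(perms), measures)
```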
Theorem 10.
Suppose is infinitely separable. If is locally exchangeable with respect to , and is a bounded measurable function,
[TABLE]
where and .
Remark.
Note that the expectation on the right hand side averages over the randomness both in the uncertain covariates and the permutation .
If is exchangeable within each bin , Theorem 10 states that and are conditionally independent given , as desired. Further, the deviance from independence is controlled by the deviance from exchangeability within each bin. In particular,
[TABLE]
where . Both bounds in Eq. 33 are independent of ; thus the result holds even if we are unwilling to express our uncertainty in the binned covariates via a distribution.
3 Examples
In this section, we provide example applications of the theory in Section 2. First, we use a case study of Gaussian processes to show how one can use posterior predictive distributions to analyze the local exchangeability of a process. In particular, we show how to derive the underlying measure process , as well as an appropriate premetric governing local exchangeability, using only finite marginals of the process . Second, we use a case study of dependent Dirichlet processes to show that one can use local empirical measures as a surrogate for otherwise intractable posterior predictive distributions in discrete Bayesian nonparametric models. See the appendix for other examples of Bayesian nonparametric models exhibiting local exchangeability—e.g., kernel beta process feature models (Hjort, 1990; Ren et al., 2011) and dynamic topic models (Blei and Lafferty, 2006; Wang, Blei and Heckerman, 2008), among others. Finally, we demonstrate a usage of local exchangeability as a tool to analyze the inflation of type-I error in matched permutation tests involving covariates.
3.1 Obtaining the underlying measure-valued process and premetric
We will first provide an example of how one can use the Bayesian posterior predictive distributions of a locally exchangeable process to derive the distribution of the underlying measure-valued process as well as the premetric of local exchangeability . This example applies the same strategy as in the running example from Section 2.3, albeit in a more sophisticated nonparametric model.
Consider a Gaussian process on with continuous mean function , and covariance function for continuous nonnegative and continuous symmetric positive-definite kernel . Define a set of unique covariate values , and consider the Euclidean metric on . For each and , let be a finite subset of covariates such that and . Direct analysis of the conditional density yields that as , the conditional distribution of given converges to
[TABLE]
where
[TABLE]
Eqs. 34 and 35 demonstrate that is conditionally independently drawn from the process where
[TABLE]
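To see the mechanism behind this limit, consider the usual setting of zero mean, constant noise variance, and a stationary kernel, and read the covariance as a latent function value plus independent noise at each covariate. As observation covariates pile up near $t$, the conditional variance of $X_t$ given those observations should shrink toward the noise variance alone. The NumPy sketch below assumes an RBF kernel with unit lengthscale, noise variance $0.25$, and covariates $t + 1/n^2$; these choices are illustrative, not taken from the paper.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0):
    # Assumed stationary kernel k(t, t') = exp(-(t - t')^2 / (2 * lengthscale^2)).
    return np.exp(-np.subtract.outer(a, b) ** 2 / (2.0 * lengthscale**2))


def predictive_variance(t, covariates, noise_var, lengthscale=1.0):
    """Var(X_t | X_T) for covariance k(s, s') + noise_var * 1[s == s'], with t not in T."""
    T = np.asarray(covariates, dtype=float)
    C_TT = rbf(T, T, lengthscale) + noise_var * np.eye(len(T))
    c_tT = rbf(np.array([t], dtype=float), T, lengthscale)[0]
    c_tt = 1.0 + noise_var  # k(t, t) = 1 for this kernel, plus the noise at t
    return float(c_tt - c_tT @ np.linalg.solve(C_TT, c_tT))


noise_var = 0.25
t = 0.0
variances = []
for N in (5, 50, 500):
    covs = [t + 1.0 / n**2 for n in range(1, N + 1)]  # nested sets concentrating at t
    variances.append(predictive_variance(t, covs, noise_var))
    print(N, variances[-1])
```

Because the covariate sets are nested, the conditional variances are non-increasing in $N$, and they can never drop below the noise variance.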
We now derive the strong canonical premetric of local exchangeability. In this setting,
[TABLE]
By Devroye, Mehrabian and Reddad (2020, Theorem 1.3),
[TABLE]
Applying Jensen’s inequality $\mathbb{E}|Y_t - Y_{t'}| \le \sqrt{\mathbb{E}(Y_t - Y_{t'})^2}$, then evaluating the expectation and using the bounds , and $\sqrt{x^2 + y^2} \le x + y$ (valid for $x, y \ge 0$) yields
[TABLE]
In the usual setting with zero mean , constant noise variance for some , and stationary kernel for some , Eq. 39 reduces to
[TABLE]
This example demonstrates that Gaussian processes are locally exchangeable in the presence of measurement noise, i.e., when . Note, however, that is not strictly necessary for local exchangeability; a necessary and sufficient characterization of local exchangeability for Gaussian processes would instead require analyzing the canonical metric per Theorem 5.
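The Jensen step in the derivation above, $\mathbb{E}|Y_t-Y_{t'}|\leq\sqrt{\mathbb{E}(Y_t-Y_{t'})^2}$, can be checked numerically. The sketch below assumes a zero-mean latent Gaussian process with a squared-exponential kernel $k(r)=\exp(-r^2/(2\ell^2))$ and no measurement noise; the kernel, the lengthscale `ell` and the helper names are our illustrative choices, not the paper's. For a centered Gaussian difference with variance $v=2(k(0)-k(|t-t'|))$ one has $\mathbb{E}|Y_t-Y_{t'}|=\sqrt{2v/\pi}$ exactly, so the Jensen bound loses only a factor $\sqrt{2/\pi}$.

```python
import numpy as np

def se_kernel(r, ell=0.5):
    """Squared-exponential kernel k(r) = exp(-r^2 / (2 ell^2)); an illustrative choice."""
    return np.exp(-r**2 / (2 * ell**2))

def jensen_check(t, t_prime, ell=0.5, n_samples=200_000, seed=0):
    """Compare Monte Carlo E|Y_t - Y_t'| with its exact value and the Jensen bound."""
    rng = np.random.default_rng(seed)
    r = abs(t - t_prime)
    cov = np.array([[se_kernel(0.0, ell), se_kernel(r, ell)],
                    [se_kernel(r, ell), se_kernel(0.0, ell)]])
    # Joint draws of (Y_t, Y_t') from the zero-mean latent process (no noise assumed)
    y = rng.multivariate_normal(np.zeros(2), cov, size=n_samples)
    mc_mean_abs = np.mean(np.abs(y[:, 0] - y[:, 1]))
    var_diff = 2 * (se_kernel(0.0, ell) - se_kernel(r, ell))
    jensen_bound = np.sqrt(var_diff)            # sqrt(E (Y_t - Y_t')^2)
    exact = np.sqrt(2 * var_diff / np.pi)       # E|Z| for Z ~ N(0, var_diff)
    return mc_mean_abs, exact, jensen_bound

mc, exact, bound = jensen_check(0.0, 0.3)
print(f"Monte Carlo {mc:.4f}, exact {exact:.4f}, Jensen bound {bound:.4f}")
```

Shrinking `abs(t - t_prime)` drives all three quantities to zero, which is the mechanism behind the premetric vanishing for nearby covariates.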
3.2 Approximate predictive distributions in discrete Bayesian nonparametrics
Next, we demonstrate that the local empirical measure can serve as a useful surrogate for otherwise intractable posterior predictive distributions in discrete Bayesian nonparametric models. The Dirichlet process (Ferguson, 1973) is a popular prior for the weights and component parameters in nonparametric mixture models. Draws from a Dirichlet process are discrete probability measures,
[TABLE]
where are weights satisfying , , and are component parameters, each with distribution given by (Sethuraman, 1994)
[TABLE]
for some distribution and concentration parameter . Given draws , the posterior predictive distribution of given the first draws is
[TABLE]
The fact that one can marginalize the (infinitely many) weights and parameters to arrive at Eq. 43 is critical in tractable computational inference for models involving the Dirichlet process (Neal, 2000).
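To make the two representations concrete, the sketch below draws truncated stick-breaking weights $\pi_k = V_k\prod_{j<k}(1-V_j)$ with $V_k\sim\mathrm{Beta}(1,\alpha)$, and separately simulates draws via the standard Blackwell–MacQueen (Pólya urn) predictive rule, which we take to be the form of Eq. 43: the next draw is a fresh sample from the base distribution with probability $\alpha/(\alpha+N)$ and otherwise copies a uniformly chosen previous draw. The truncation level, the standard normal base distribution and the function names are illustrative assumptions.

```python
import numpy as np

def stick_breaking(alpha, base_sampler, truncation, rng):
    """Truncated Sethuraman construction: pi_k = V_k * prod_{j<k} (1 - V_j)."""
    v = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    weights = v * remaining
    atoms = base_sampler(truncation, rng)      # i.i.d. component parameters
    return weights, atoms

def polya_urn_draw(alpha, previous, base_sampler, rng):
    """Draw X_{N+1} given X_{1:N} with the weights and parameters marginalized out."""
    n = len(previous)
    if rng.uniform() < alpha / (alpha + n):    # always true when n == 0
        return base_sampler(1, rng)[0]         # new value from the base distribution
    return previous[rng.integers(n)]           # copy a uniformly chosen earlier draw

rng = np.random.default_rng(1)
base = lambda size, rng: rng.normal(size=size)  # base distribution N(0, 1), illustrative
w, theta = stick_breaking(alpha=2.0, base_sampler=base, truncation=500, rng=rng)
draws = []
for _ in range(100):
    draws.append(polya_urn_draw(2.0, draws, base, rng))
print(f"truncated mass {w.sum():.6f}; distinct values in 100 urn draws: {len(set(draws))}")
```

The urn never touches the infinitely many weights, which is exactly what makes marginal samplers for Dirichlet process mixtures tractable.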
When the observations come with additional covariate information, the dependent Dirichlet process mixture model (MacEachern, 1999, 2000) may be used instead. There are many instantiations of the dependent Dirichlet process; for simplicity we consider a model where the weights are a function of a covariate but the component parameters are constant across covariate values, i.e.,
[TABLE]
where , and the stick variables are now i.i.d. stochastic processes on . The marginal distributions of at are designed to be so that the dependent Dirichlet process is marginally a Dirichlet process at each covariate value. Even for simple stochastic processes , however, the posterior predictive distribution has no tractable closed form. Nevertheless, the process is locally exchangeable with strong canonical premetric
[TABLE]
where and . Since is a product of independent variables, Lemma 13 yields
[TABLE]
The infinite sum converges to some , and so
[TABLE]
Therefore, as long as the stochastic process is sufficiently smooth and we condition on , where concentrates closely around , Theorem 6 implies that the posterior predictive distribution of given is approximately equal to the local empirical measure , which has a tractable closed-form expression.
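A minimal sketch of the local-empirical-measure surrogate follows. It assumes, as a simplification of our own, that the finite conditioning set consists of all observed covariates within a radius `r` of the query covariate, and that the measure puts equal mass on each corresponding observation. The function name, the radius and the synthetic covariate-dependent label distribution are hypothetical.

```python
from collections import Counter

import numpy as np

def local_empirical_measure(covariates, observations, t, r):
    """Uniform mixture of point masses at observations whose covariates lie within r of t.

    Hypothetical helper: the neighbourhood rule |t_i - t| <= r is an assumption.
    """
    covariates = np.asarray(covariates)
    mask = np.abs(covariates - t) <= r
    selected = [obs for obs, keep in zip(observations, mask) if keep]
    if not selected:
        raise ValueError("no observations within radius r of t")
    counts = Counter(selected)
    return {value: c / len(selected) for value, c in counts.items()}

rng = np.random.default_rng(2)
t_obs = rng.uniform(0, 1, size=20_000)
# Synthetic, smoothly varying discrete law (illustrative): P(label = 1 | t) = t
labels = (rng.uniform(size=t_obs.size) < t_obs).astype(int).tolist()
est = local_empirical_measure(t_obs, labels, t=0.7, r=0.05)
print(est)
```

Shrinking `r` reduces the bias from pooling dissimilar covariates but leaves fewer observations in the average; this is the trade-off that the error bounds of Theorem 6 quantify.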
3.3 Type-I error inflation in grouped permutation tests
One of the key applications of exchangeability in statistical data analysis is in the design of nonparametric permutation tests with exact type-I error bounds (Pitman, 1937a, b, c; Fisher, 1966, Ch. 3). In the notation of this work, we are given observations of a stochastic process at a finite set of covariates , a subgroup of permutations , and a test statistic . The null hypothesis is that is exchangeable; so we set a desired threshold , and reject the null with type-I error at most if
[TABLE]
where is defined as in Eq. 5. This setup is commonly used in observational studies with a control group and treatment group, where consists of permutations that swap matched pairs of elements in the control and treatment groups. However, a typical problem is that elements in the two groups are not exactly comparable due to the presence of covariates. In this case, a standard approach is to construct to permute only those elements with similar covariates from the control and treatment groups, under some metric (Cochran, 1965; Rubin, 1973a, b; Rosenbaum, 1989, 2002; Lu and Rosenbaum, 2004; Greevy et al., 2004; Hansen, 2004; Hansen and Klopfer, 2006; Baiocchi et al., 2010; Lu et al., 2011). Local exchangeability provides a general way to analyze the type-I error of these methods; Proposition 11 shows that for a locally exchangeable process, the type-I error may be inflated by a penalty governed by the average distance between pairs of covariates permuted by . Eq. 49 also incidentally provides a rigorous justification for past work that formulates the construction of as the minimization of this penalty (e.g., Rosenbaum (1989)).
Proposition 11.
Let be locally exchangeable with respect to . For ,
[TABLE]
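As an illustration of the grouped permutation test, the sketch below takes the group to be all $2^n$ ways of swapping the treated and control outcomes within each matched pair, and rejects when the randomization p-value (the fraction of permuted statistics at least as large as the observed one) is at most $\alpha$. We assume this p-value form agrees with the rule built from Eq. 5 up to tie handling. It also reports the average covariate distance across matched pairs, the quantity that drives the inflation term in Proposition 11; here the distance is $|x-x'|$ on the real line, which need not match the premetric in a given application. The statistic, the synthetic data and all names are illustrative.

```python
import itertools

import numpy as np

def paired_permutation_test(treated, control, alpha=0.05):
    """Randomization test over all within-pair swaps (|G| = 2^n).

    Swapping the outcomes in a pair flips the sign of that pair's difference,
    so each group element corresponds to a vector of signs.
    Statistic (illustrative): mean of treated-minus-control differences.
    """
    diffs = np.asarray(treated, float) - np.asarray(control, float)
    observed = diffs.mean()
    permuted = np.array([np.mean(diffs * np.array(signs))
                         for signs in itertools.product([1, -1], repeat=diffs.size)])
    p_value = np.mean(permuted >= observed)    # identity is included, so p >= 1/|G|
    return p_value <= alpha, p_value

def mean_pair_distance(x_treated, x_control):
    """Average covariate distance |x - x'| across matched pairs."""
    return float(np.mean(np.abs(np.asarray(x_treated) - np.asarray(x_control))))

rng = np.random.default_rng(3)
x_t = rng.uniform(0, 1, 10)
x_c = x_t + rng.normal(0, 0.02, 10)               # closely matched covariates
y_c = np.sin(x_c) + rng.normal(0, 0.1, 10)
y_t = np.sin(x_t) + 1.0 + rng.normal(0, 0.1, 10)  # hypothetical treatment effect of +1
reject, p = paired_permutation_test(y_t, y_c)
print(reject, p, mean_pair_distance(x_t, x_c))
```

Exact enumeration is only feasible for small $n$; with many pairs one would sample sign vectors uniformly instead.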
4 Discussion
The major question posed in this paper is what we can do with data when we do not believe that they are exchangeable, but are willing to believe that they are nearly exchangeable. This paper answers the question with a relaxed notion of local exchangeability in which swapping data associated with nearby covariates causes a bounded change in total variation distance. We have demonstrated that classical results for exchangeable processes are “robust to the real world;” indeed, locally exchangeable processes have a de Finetti representation that may be leveraged in the design of statistical models and inference procedures. Finally, many popular covariate-dependent statistical models—which violate the assumptions of exchangeability—satisfy local exchangeability, extending the reach of exchangeability-based analyses to these models.
One limitation of local exchangeability is the infinite separability assumption. There are applications in which the covariate space has isolated points that violate this condition, e.g., discrete time series where the covariate space is endowed with the Euclidean metric. However, if can be extended to a process on such that is infinitely separable and is locally exchangeable with respect to , then the theoretical results from this work hold for the marginal process . Another limitation is that the total variation bound in the definition of local exchangeability is quite weak, which has downstream consequences for the tightness of the error bounds in Section 2.3. Further study on alternate definitions of local exchangeability is warranted to strengthen these guarantees.
As a final note, it is also possible that an analogue of the theory of finite exchangeability (Diaconis and Freedman, 1980a) holds in the local setting; but it is not yet clear whether this is indeed true or what form it would take. It would also be of interest to investigate more general notions of local exchangeability under group actions, e.g., permutations that preserve some statistic of the data, which have been used in past work on randomization testing in the presence of covariates (Rosenbaum, 1984).
Acknowledgements
The authors thank Jonathan Huggins for illuminating discussions. T. Campbell is supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant and Discovery Launch Supplement. T. Broderick is supported in part by an NSF CAREER Award, an ARO YIP Award, ONR, and a Sloan Research Fellowship.
Appendix A Proofs
Proof of Proposition 2.
Choose some ordering of the countable set . We note that and are -valued random variables that are measurable with respect to , which is generated by the algebra of cylinder sets of the form for . Therefore,
[TABLE]
where we have replaced with its generator by the fact that for any algebra of sets , , , and probability measures on , there exists an such that . So by the definition of local exchangeability for finite sets of covariates,
[TABLE]
∎
Proof of Theorem 5.
We start with the reverse direction. Define the two product measures and . Then since and , by Jensen’s inequality,
[TABLE]
Finally, the proof technique of Sendler (1975, Lem. 2.1) and the smoothness of yields the conclusion,
[TABLE]
For the forward direction, suppose is locally exchangeable. Let be any ordering of the countable set from Definition 4, and let be the tail -algebra of . We will show that for any two covariates , , and are conditionally independent given . The argument extends via standard methods to that may be elements of , and then to any finite subset of .
By infinite separability (Definition 4), there exists a subsequence of indices such that is Cauchy and converges to . By taking another subsequence we can assume without loss of generality for all , and . Let be the mapping that takes , for all , and leaves all other fixed. Then denote , and let be the sequence with covariates mapped under . By reverse martingale convergence, for any bounded measurable ,
[TABLE]
as . Next, by local exchangeability and Proposition 2,
[TABLE]
and by Lemma 12(2), we have that the Wasserstein distance between and converges to 0 as . Together, the Wasserstein distance bound and reverse martingale above yield
[TABLE]
By Aldous (1985, Lemma 3.4),
[TABLE]
and thus and are conditionally independent given . As mentioned earlier, this argument extends to any finite subset of covariates, by considering subsequences of converging to each . Since takes values in a standard Borel space, there is a random measure for each for which (e.g., Kallenberg, 2002, Theorem 6.3). The collection of these random measures forms the desired stochastic process .
Next, we develop the smoothness property of . By both reverse and forward martingale convergence, we have that
[TABLE]
Using dominated convergence to move the limits out of the expectation, local exchangeability to bound the total variation between and , and Lemma 12(1),
[TABLE]
Finally, we show that is approximated by empirical averages of the observations ; this property will be used below to show that is unique up to modification. Consider any and any sequence converging to such that for each . Define . Then
[TABLE]
Noting that the right term is -measurable and applying Hoeffding’s inequality to the left,
[TABLE]
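For reference, Hoeffding's inequality for the mean $\bar{Z}_n$ of $n$ independent variables taking values in $[a,b]$ states $\mathbb{P}(|\bar{Z}_n-\mathbb{E}\bar{Z}_n|\geq\epsilon)\leq 2\exp(-2n\epsilon^2/(b-a)^2)$. The short simulation below uses Bernoulli variables as a stand-in for the bounded indicator functions of observations; the success probability, tolerance and sample sizes are illustrative choices.

```python
import numpy as np

def hoeffding_bound(n, eps, a=0.0, b=1.0):
    """Two-sided Hoeffding bound for the mean of n independent variables in [a, b]."""
    return 2 * np.exp(-2 * n * eps**2 / (b - a) ** 2)

def empirical_tail(n, eps, p=0.3, trials=100_000, seed=4):
    """Monte Carlo estimate of P(|mean of n Bernoulli(p) draws - p| >= eps)."""
    rng = np.random.default_rng(seed)
    means = rng.binomial(n, p, size=trials) / n
    # Small tolerance so that deviations of exactly eps count despite rounding
    return np.mean(np.abs(means - p) >= eps - 1e-12)

for n in (50, 200, 400):
    print(n, empirical_tail(n, 0.1), hoeffding_bound(n, 0.1))
```

The bound is loose for small $n$ but decays at the same exponential rate as the true tail, which is all the proof requires.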
Splitting the above expectation across two events—one where the measures satisfy
[TABLE]
and the other its complement—yields
[TABLE]
Applying Markov’s inequality, the triangle inequality, and Eq. 63,
[TABLE]
Thus, . We now show that is unique. Suppose there is another measure process that satisfies Eq. 13, from which is generated conditionally independently given some -algebra . By repeating the steps above, one can show that . Therefore,
[TABLE]
Since is a standard Borel space, for some countable algebra of sets (Preston, 2008, Prop. 3.1, 3.3). By noting that the countable intersection of unit-measure sets is also unit-measure,
[TABLE]
Finally by Carathéodory’s extension theorem (Kallenberg, 2002, Theorem 2.5), the probability measures and are almost surely equal. The extension of this argument to any finite subset of covariates is straightforward, implying that is uniquely determined up to modification. ∎
Proof of Theorem 6.
First, since and , by Jensen’s inequality,
[TABLE]
We will focus on a single term in the sum for some and drop the subscript, as the bound for all terms will be identical. Adding and subtracting ,
[TABLE]
Since is locally exchangeable, by Theorem 5, it is conditionally independently drawn from . Therefore . Hence we can use the tower property and expand the square to find that
[TABLE]
The first term can be bounded by using the same conditional independence property again—in particular, that —followed by Popoviciu’s inequality:
[TABLE]
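Popoviciu's inequality bounds the variance of a random variable supported in $[m,M]$ by $(M-m)^2/4$. The Bhatia–Davis inequality, invoked later in this proof via Bhatia and Davis (2000), sharpens this to $(M-\mu)(\mu-m)$, which never exceeds $(M-m)^2/4$ by the AM–GM inequality and is attained by two-point distributions on $\{m,M\}$. A quick check on random discrete distributions (an illustrative setup, not from the paper):

```python
import numpy as np

def variance_bounds(values, probs):
    """Return (variance, Bhatia-Davis bound, Popoviciu bound) for a discrete law."""
    values, probs = np.asarray(values, float), np.asarray(probs, float)
    mu = np.dot(probs, values)
    var = np.dot(probs, (values - mu) ** 2)
    m, M = values.min(), values.max()          # support endpoints (all probs > 0)
    bhatia_davis = (M - mu) * (mu - m)
    popoviciu = (M - m) ** 2 / 4
    return var, bhatia_davis, popoviciu

rng = np.random.default_rng(5)
for _ in range(1000):
    vals = rng.uniform(-3, 3, size=6)
    p = rng.dirichlet(np.ones(6))              # strictly positive weights
    var, bd, pop = variance_bounds(vals, p)
    assert var <= bd + 1e-12 and bd <= pop + 1e-12

# Two-point law on {0, 1}: Bhatia-Davis holds with equality
print(variance_bounds([0.0, 1.0], [0.3, 0.7]))
```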
For the second term, we first apply Jensen’s inequality by noting that , ,
[TABLE]
Since , we have that . Finally by Theorem 5, we know that . Hence
[TABLE]
We can combine the bounds on the first and second terms for each set , , since :
[TABLE]
Before proceeding further with this bound by substituting the definition of , we will obtain a similar result for the tail bound. We add and subtract and use the triangle inequality:
[TABLE]
By Lemma 15, we have
[TABLE]
For the first term in the sum, note that is a function of , which are conditionally independent given . Further note that for each , the value of can change by at most when varying the value of . Therefore by McDiarmid’s inequality,
[TABLE]
whenever . Expanding the definition of the norm and using Jensen’s inequality yields
[TABLE]
at which point the same logic as in Eq. 79 yields
[TABLE]
and hence for all \delta\geq\mathchoice{{\hbox{\displaystyle\sqrt{\sum_{t\in T}\xi_{t}(\tau)^{2},}}\lower 0.4pt\hbox{\vrule height=9.30444pt,depth=-7.44359pt}}}{{\hbox{\textstyle\sqrt{\sum_{t\in T}\xi_{t}(\tau)^{2},}}\lower 0.4pt\hbox{\vrule height=9.30444pt,depth=-7.44359pt}}}{{\hbox{\scriptstyle\sqrt{\sum_{t\in T}\xi_{t}(\tau)^{2},}}\lower 0.4pt\hbox{\vrule height=6.53888pt,depth=-5.23112pt}}}{{\hbox{\scriptscriptstyle\sqrt{\sum_{t\in T}\xi_{t}(\tau)^{2},}}\lower 0.4pt\hbox{\vrule height=5.03888pt,depth=-4.03113pt}}},
[TABLE]
For the second term in the sum, we apply Markov’s inequality to find that
[TABLE]
Noting that , we can apply Jensen’s inequality,
[TABLE]
Finally by local exchangeability and Theorem 5,
[TABLE]
Combining the bounds for the first and second term and shifting yields, for all ,
[TABLE]
We now substitute the definition of into both results in Eqs. 84 and 97. First, note that (suppressing notation in the remainder of the proof for brevity),
[TABLE]
where
[TABLE]
Further, by Bhatia and Davis (2000, Theorem 1) and the definition of ,
[TABLE]
Therefore
[TABLE]
and
[TABLE]
Finally, because neither upper bound depends explicitly on , we can take the supremum. To obtain the same results for , we apply the same proof technique, noting that (1) and (2) by the tower property and de Finetti result in Theorem 5, . ∎
Proof of Corollary 7.
For each , denote . Note that for any ,
[TABLE]
Therefore, for all ,
[TABLE]
We iterate this bound from to to find that for all ,
[TABLE]
Finally, we rearrange this bound to obtain an upper bound on for any :
[TABLE]
We also have that as : by definition of ,
[TABLE]
so if , then there would exist a subsequence such that for all sufficiently large. But this is not possible, since for any fixed , as concentrates around . Therefore , so that for any ,
[TABLE]
and hence
[TABLE]
Theorem 6 implies that both and as . By Markov’s inequality,
[TABLE]
Finally, note that Eq. 122 implies that any subsequence likewise satisfies , and hence any subsequence has a further subsequence such that . Since was arbitrary, Lemma 16 asserts that we can choose such that implies weak convergence, i.e., . Thus any subsequence has a further subsequence that satisfies . Hence by (Durrett, 2010, Theorem 2.3.2). ∎
Proof of Corollary 8.
By Markov’s inequality,
[TABLE]
and by Theorem 5,
[TABLE]
∎
Proof of Theorem 9.
First, note that by assumption, the space is infinitely separable. By local exchangeability, Theorem 5 implies that for any , finite subset , and ,
[TABLE]
where denotes the translation of all covariates in by . The Kolmogorov continuity theorem (Kallenberg, 2002, Theorem 3.23) implies that for all , has an -Hölder continuous modification. Note that an -Hölder continuous function for is constant.
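The final claim is the standard fact that Hölder continuity with exponent greater than one forces a function to be constant. As a sketch, write $f$ for a function with $|f(t)-f(s)|\leq C\|t-s\|^{\alpha}$ and $\alpha>1$, and assume that covariates lie in $\mathbb{R}^d$, so that the segment from $s$ to $t$ is available. Splitting that segment into $n$ equal pieces and applying the triangle inequality:

```latex
|f(t)-f(s)|
  \le \sum_{i=1}^{n}\Big|f\big(s+\tfrac{i}{n}(t-s)\big)-f\big(s+\tfrac{i-1}{n}(t-s)\big)\Big|
  \le n\,C\Big(\frac{\|t-s\|}{n}\Big)^{\alpha}
  = C\,\|t-s\|^{\alpha}\,n^{1-\alpha}
  \xrightarrow[n\to\infty]{} 0 .
```

Since $s$ and $t$ are arbitrary, $f$ is constant.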
First, assume . If we select , we have that for any , has a constant modification. In other words, for all , , . Since for a countable algebra (Preston, 2008, Prop. 3.1, 3.3), we have that , and hence by Carathéodory’s extension theorem (Kallenberg, 2002, Theorem 2.5), and are almost surely equal probability measures. This implies that is a constant process (up to modification) and is exchangeable.
Next, suppose . Then by Eq. 125,
[TABLE]
showing that is stationary. Next, since is stationary, for any and , the mean of satisfies
[TABLE]
Similarly, the autocovariance satisfies
[TABLE]
Hence is weak-sense stationary.
Finally, consider the process for , which is locally exchangeable with and hence . The underlying random measure process is specified by where is the Dirac measure at ; this has no sample-continuous modification. ∎
Proof of Theorem 10.
Let be the countable subset provided by infinite separability in Definition 4. Let be any ordering of , and . Reverse martingale convergence implies that
[TABLE]
where is the tail -algebra of . Defining , we have that is invariant to and thus is -measurable. Therefore
[TABLE]
By Lemma 12(1) and Proposition 2,
[TABLE]
Taking the limit as , moving it into the expectation in Eq. 134 via dominated convergence, and using the limit from Eq. 133 yields
[TABLE]
Identical reasoning to the above also shows that
[TABLE]
Finally, we add and subtract in the left-hand side of Eq. 32, apply the triangle inequality with the above bounds, and note that the sum over is the expectation over a uniformly random permutation to obtain the result. ∎
Proof of Proposition 11.
We rewrite the probability as an expectation,
[TABLE]
By local exchangeability, we can remap under any bijection , so that
[TABLE]
Finally, note that the outer indicator function tests whether is strictly greater than of the statistics across all . At most of these indicator functions can be nonzero, so
[TABLE]
Rearranging the bound yields the result. ∎
Appendix B Technical lemmata
Lemma 12.
Let be bounded random variables in for some , , and be random elements in some probability space.
1. If , then
[TABLE]
2. If , then for any 1-Lipschitz function ,
[TABLE]
Proof.
- Denoting ,
[TABLE]
Using the fact that is measurable with respect to and the tower property yields
[TABLE]
Since the difference is between the expectation of a function bounded in evaluated at and at , the assumed total variation bound provides the result.
- First, note that by 1-Lipschitz continuity. Then defining and , the triangle inequality yields
[TABLE]
The right hand term is bounded by by the assumed total variation bound and 1-Lipschitz continuity. Defining ,
[TABLE]
The first term in the expression can be bounded by via substitution of the conditional expectation formulae for , using the tower property, and controlling the difference in expectations with the assumed total variation bound. The second term is again a difference in expectation of a bounded function under and with the same bound . ∎
Lemma 13.
For any two sequences of real numbers , ,
[TABLE]
Proof.
The proof follows by adding and subtracting , then , etc., and then using the triangle inequality. ∎
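The add-and-subtract argument is the telescoping identity below, written for finite deterministic sequences $a_1,\dots,a_n$ and $b_1,\dots,b_n$; we assume, as in the stick-breaking applications, that all entries lie in $[-1,1]$ (for example in $[0,1]$), so every partial product has modulus at most one:

```latex
\prod_{i=1}^{n} a_i-\prod_{i=1}^{n} b_i
  = \sum_{k=1}^{n}\Big(\prod_{i<k} b_i\Big)\,(a_k-b_k)\,\Big(\prod_{i>k} a_i\Big),
\qquad\text{hence}\qquad
\Big|\prod_{i=1}^{n} a_i-\prod_{i=1}^{n} b_i\Big| \le \sum_{k=1}^{n}|a_k-b_k| .
```

The $k$-th summand equals $b_1\cdots b_{k-1}a_k\cdots a_n-b_1\cdots b_k a_{k+1}\cdots a_n$, so consecutive terms cancel, leaving only $\prod_i a_i-\prod_i b_i$.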
Lemma 14 (Reiss, 1981).
For any two finite product probability measures and ,
[TABLE]
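Lemma 14 says that the total variation between two product measures is at most the sum of the coordinatewise total variations. The sketch below verifies this exactly for products of four independent categorical coordinates, using $\mathrm{TV}(P,Q)=\tfrac12\sum_x|P(x)-Q(x)|$; the inequality holds in the same form under the convention without the factor $\tfrac12$. The random marginals are illustrative.

```python
import numpy as np

def tv(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def product_pmf(factors):
    """Flattened joint pmf of independent coordinates with the given marginal pmfs."""
    joint = np.array([1.0])
    for f in factors:
        joint = np.outer(joint, f).ravel()     # same cell ordering for every call
    return joint

rng = np.random.default_rng(6)
ps = [rng.dirichlet(np.ones(3)) for _ in range(4)]
qs = [rng.dirichlet(np.ones(3)) for _ in range(4)]
lhs = tv(product_pmf(ps), product_pmf(qs))
rhs = sum(tv(p, q) for p, q in zip(ps, qs))
print(f"TV of products {lhs:.4f} <= sum of marginal TVs {rhs:.4f}")
```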
Lemma 15.
For any two real-valued random variables and constants ,
[TABLE]
Proof.
[TABLE]
∎
Lemma 16.
Let be a standard Borel space. There exists a countable collection of measurable subsets , such that for all , , , and probability measures ,
[TABLE]
and for all such that each is a continuity set of ,
[TABLE]
Proof.
Since is a standard Borel space, we know that is generated by a topology with a countable base . Any open set can be expressed as a countable union of these sets. Consider the collection of all possible unions of , and construct a countable sequence of sets by ordering , then , and so on. Then for any open set , there exists a subsequence of such that .
Assume ; then for any open set and ,
[TABLE]
But if and only if , . Hence
[TABLE]
Since this holds for all and , by the continuity of measures,
[TABLE]
Hence . If each is a continuity set of , then implies that for each , which then implies . ∎
Appendix C Additional Examples
In this section, we show that many popular covariate-dependent models from Bayesian nonparametrics exhibit local exchangeability.
C.1 Dependent Dirichlet process mixtures
In a typical mixture model setting, we have observations generated via
[TABLE]
where are the mixture weights satisfying , ; are the component parameters; is the mixture component likelihood; and are the observations. A popular nonparametric prior for the weights and component parameters is the Dirichlet process (Ferguson, 1973), defined by (Sethuraman, 1994)
[TABLE]
for some distribution . When the observations come with additional covariate information, the dependent Dirichlet process mixture model (MacEachern, 1999, 2000) may be used to capture similarities between related mixture population data. Here, observations are generated via
[TABLE]
where the component parameters and stick variables are now i.i.d. stochastic processes on , and . The marginal distributions of and at are and , respectively. Thus, the dependent Dirichlet process is marginally a Dirichlet process for each covariate value, but can exhibit a wide range of dependencies across covariates. In this setting, we have and strong canonical premetric
[TABLE]
where and . We add and subtract and apply the triangle inequality to find that
[TABLE]
Since is a product of independent random variables, Lemma 13 yields
[TABLE]
The infinite sum converges to some , and so
[TABLE]
Therefore, if the stochastic processes for the parameters and stick variables are both smooth enough such that
[TABLE]
for some premetric , then is locally exchangeable with respect to . Many dependent processes similar to the dependent Dirichlet process and the kernel beta process below (see, e.g., Foti and Williamson, 2015) can be shown to exhibit local exchangeability using similar techniques.
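The sketch below instantiates one hypothetical family of stick processes and checks the Lemma 13 step: since $\pi_k(t)$ is a product of the $k$ factors $1-V_1(t),\dots,1-V_{k-1}(t),V_k(t)$, all in $[0,1]$, we have $|\pi_k(t)-\pi_k(t')|\leq\sum_{j\leq k}|V_j(t)-V_j(t')|$. The construction $V_k(t)=F^{-1}(\Phi(Z_k(t)))$ with $Z_k(t)=A_k\cos(\omega t)+B_k\sin(\omega t)$, $A_k,B_k\sim\mathcal{N}(0,1)$, is our own illustrative choice: $Z_k(t)$ is standard normal for every $t$, and $F^{-1}(u)=1-(1-u)^{1/\alpha}$ is the $\mathrm{Beta}(1,\alpha)$ quantile function, so each $V_k(t)$ has the required $\mathrm{Beta}(1,\alpha)$ marginal. The component parameters are held fixed across covariates here, as in Section 3.2.

```python
import math

import numpy as np

def std_normal_cdf(z):
    """Standard normal CDF Phi, via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def ddp_sticks(ts, alpha, truncation, omega, rng):
    """Hypothetical stick processes V_k(t) with Beta(1, alpha) marginals at every t.

    Returns an array of shape (truncation, len(ts)).
    """
    a = rng.normal(size=(truncation, 1))
    b = rng.normal(size=(truncation, 1))
    ts = np.asarray(ts, float)[None, :]
    z = a * np.cos(omega * ts) + b * np.sin(omega * ts)   # N(0, 1) for each t
    u = std_normal_cdf(z)                                 # Uniform(0, 1) for each t
    return 1.0 - (1.0 - u) ** (1.0 / alpha)               # Beta(1, alpha) quantile

def stick_weights(v):
    """pi_k(t) = V_k(t) * prod_{j<k} (1 - V_j(t)), computed column by column."""
    ones = np.ones((1, v.shape[1]))
    remaining = np.vstack([ones, np.cumprod(1.0 - v, axis=0)[:-1]])
    return v * remaining

rng = np.random.default_rng(7)
v = ddp_sticks([0.0, 0.05], alpha=1.5, truncation=200, omega=2.0, rng=rng)
pi = stick_weights(v)
weight_gap = np.abs(pi[:, 0] - pi[:, 1])
lemma13_bound = np.cumsum(np.abs(v[:, 0] - v[:, 1]))     # sum_{j<=k} |V_j(t) - V_j(t')|
print("sum_k |pi_k(t) - pi_k(t')| =", weight_gap.sum())
```

Larger `omega` makes the sticks vary faster in the covariate, which inflates the weight gap between nearby covariates and hence the premetric.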
C.2 Kernel beta processes
Another example of a model exhibiting local exchangeability from the Bayesian nonparametrics literature is the kernel beta process latent feature model (Ren et al., 2011). In a typical nonparametric latent feature modelling setting, we have observations generated via
[TABLE]
where are the feature frequencies satisfying , ; are the feature parameters; is the Bernoulli process that sets with probability and 0 otherwise, independently across ; and is the likelihood for each observation. A popular nonparametric prior for the weights and feature parameters is the beta process (Hjort, 1990), defined by
[TABLE]
where is a Poisson point process parametrized by its mean measure, is some positive function, is a probability distribution, and . When the observations come with covariate information, the kernel beta process (Ren et al., 2011) may be used to capture similarities in the latent features of related populations. In particular, we replace with
[TABLE]
where is a kernel function with range in centered at with parameters , and
[TABLE]
where and are probability distributions. In other words, the kernel beta process endows each atom with i.i.d. covariates and parameters , and makes the likelihood that an observation with covariate selects a feature with covariate depend on both and . Taking to be the space of covariates for simplicity, again we have and (marginalizing ) strong canonical premetric
[TABLE]
where and . Suppose is -Hölder continuous in total variation for , in the sense that
[TABLE]
for any collection of points , where , independently with probability and , respectively, and both assign 0 mass to all other sets. Then
[TABLE]
Finally, if the kernel is -Hölder continuous with constant depending on , the independence of , , and may be used to show that
[TABLE]
Therefore, the observations are locally exchangeable with , where collects the product of constants from the previous expression.
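Conditionally on the atoms, each observation's feature indicators are independent Bernoulli variables whose success probabilities are the feature frequencies multiplied by the kernel evaluated at the observation's covariate. Lemma 14, together with $\mathrm{TV}(\mathrm{Bern}(p),\mathrm{Bern}(q))=|p-q|$, then bounds the total variation between observations at $t$ and $t'$ by $\sum_k \pi_k|K(t)-K(t')|$. The sketch below checks this against the exact total variation for eight features. It assumes a Gaussian kernel $\exp(-(t-t_k)^2/(2\psi_k^2))$, which has range in $[0,1]$, and uses arbitrary frequencies in $[0,1]$ rather than an actual beta process draw; all values and names are illustrative.

```python
import itertools

import numpy as np

def gaussian_kernel(t, centers, widths):
    """Illustrative kernel with range in [0, 1], centred at each feature covariate."""
    return np.exp(-((t - centers) ** 2) / (2 * widths ** 2))

def bernoulli_product_tv(p, q):
    """Exact TV between products of independent Bernoulli(p_k) and Bernoulli(q_k)."""
    total = 0.0
    for z in itertools.product([0, 1], repeat=len(p)):
        z = np.array(z)
        pz = np.prod(np.where(z == 1, p, 1 - p))
        qz = np.prod(np.where(z == 1, q, 1 - q))
        total += abs(pz - qz)
    return 0.5 * total

rng = np.random.default_rng(8)
n_features = 8
freqs = rng.uniform(0, 1, n_features)       # stand-ins for feature frequencies
centers = rng.uniform(0, 1, n_features)     # feature covariates
widths = rng.uniform(0.1, 0.3, n_features)  # kernel parameters
t, t_prime = 0.40, 0.45
p = freqs * gaussian_kernel(t, centers, widths)
q = freqs * gaussian_kernel(t_prime, centers, widths)
exact = bernoulli_product_tv(p, q)
bound = np.sum(np.abs(p - q))               # Lemma 14 with TV(Bern(p), Bern(q)) = |p - q|
print(f"exact TV {exact:.4f} <= bound {bound:.4f}")
```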
C.3 Dynamic topic model
The dynamic topic model (Blei and Lafferty, 2006; Wang, Blei and Heckerman, 2008) is a model for text data that extends latent Dirichlet allocation (Blei, Ng and Jordan, 2003) to incorporate timestamp covariate information. In a continuous version of the model, observations are generated via
[TABLE]
where represents timestamps, is a vector of independent Wiener processes representing the popularity of topics at time , is a vector of independent Wiener processes representing the word frequencies for vocabulary of size in topic , is any -Lipschitz mapping from to the probability simplex for any , is the mean number of words per document, is the vector of counts of each vocabulary word in the document observed at time , and is the number of words in each document, taken to be the same across documents for simplicity. Here the covariate space is , and the observations are count vectors in where is the vocabulary size. In this setting, the strong canonical premetric is
[TABLE]
where and . But since multinomial variables are a function (in particular, a sum) of independent categorical random variables, Lemma 14 yields the bound
[TABLE]
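The step above uses two facts: the count vector is a function of the i.i.d. categorical word draws, and total variation cannot increase under a measurable map. Writing $N$ for the number of words per document, Lemma 14 therefore gives $\mathrm{TV}(\mathrm{Mult}(N,p),\mathrm{Mult}(N,q))\leq N\,\mathrm{TV}(\mathrm{Cat}(p),\mathrm{Cat}(q))$. A small exact check by enumerating count vectors, with vocabulary size 3, $N=6$ and two word distributions chosen for illustration:

```python
import itertools
import math

import numpy as np

def multinomial_pmf(counts, n, p):
    """Multinomial(n, p) probability of the count vector `counts`."""
    coeff = math.factorial(n)
    for c in counts:
        coeff //= math.factorial(c)            # exact integer multinomial coefficient
    return coeff * float(np.prod([pk ** c for pk, c in zip(p, counts)]))

def count_vectors(n, v):
    """All length-v vectors of nonnegative integers summing to n."""
    return [c for c in itertools.product(range(n + 1), repeat=v) if sum(c) == n]

def multinomial_tv(n, p, q):
    """Exact TV between Multinomial(n, p) and Multinomial(n, q)."""
    return 0.5 * sum(abs(multinomial_pmf(c, n, p) - multinomial_pmf(c, n, q))
                     for c in count_vectors(n, len(p)))

def categorical_tv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

p, q, n_words = [0.5, 0.3, 0.2], [0.45, 0.35, 0.2], 6
exact = multinomial_tv(n_words, p, q)
bound = n_words * categorical_tv(p, q)         # Lemma 14 over n_words i.i.d. draws
print(f"exact multinomial TV {exact:.4f} <= bound {bound:.4f}")
```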
We evaluate the total variation between two categorical distributions and apply the triangle inequality to find that
[TABLE]
Since , the components of and are i.i.d. across , and is -Lipschitz,
[TABLE]
where the last line follows by Jensen’s inequality. Therefore, the observations are locally exchangeable with $d(t,t^{\prime})=\min\left(1,\frac{1}{2}\mu L\left(K+V\right)\sqrt{|x-x^{\prime}|}\right)$.
References
- Aldous, D. (1981). Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis 11, 581–598.
- Aldous, D. (1985). Exchangeability and related topics. École d’été de probabilités de Saint-Flour XIII. Springer, Berlin.
- Austin, T. and Panchenko, D. (2014). A hierarchical version of the de Finetti and Aldous–Hoover representations. Probability Theory and Related Fields 159, 809–823.
- Baiocchi, M., Small, D., Lorch, S. and Rosenbaum, P. (2010). Building a stronger instrument in an observational study of perinatal care for premature infants. Journal of the American Statistical Association 105, 1285–1296.
- Berti, P., Pratelli, L. and Rigo, P. (2004). Limit theorems for a class of identically distributed random variables. The Annals of Probability 32, 2029–2052.
- Bhatia, R. and Davis, C. (2000). A better bound on the variance. The American Mathematical Monthly 107, 353–357.
- Blei, D. and Lafferty, J. (2006). Dynamic topic models. In International Conference on Machine Learning.
- Blei, D., Ng, A. and Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.
