A unified construction for series representations and finite approximations of completely random measures
Juho Lee, Xenia Miscouridou, François Caron

Juho Lee (corresponding author)
AITRICS, Seoul, South Korea
Xenia Miscouridou
Department of Statistics, University of Oxford, Oxford, United Kingdom
François Caron
Department of Statistics, University of Oxford, Oxford, United Kingdom
Abstract
Infinite-activity completely random measures (CRMs) have become important building blocks of complex Bayesian nonparametric models. They have been successfully used in various applications such as clustering, density estimation, latent feature models, survival analysis or network science. Popular infinite-activity CRMs include the (generalized) gamma process and the (stable) beta process. However, except in some specific cases, exact simulation or scalable inference with these models is challenging and finite-dimensional approximations are often considered. In this work, we propose a general and unified framework to derive both series representations and finite-dimensional approximations of CRMs. Our framework can be seen as an extension of constructions based on size-biased sampling of Poisson point process [46]. It includes as special cases several known series representations as well as novel ones. In particular, we show that one can get novel series representations for the generalized gamma process and the stable beta process. We also provide some analysis of the truncation error.
1 Introduction
Infinite-activity completely random measures (CRMs), and more generally functionals of infinite-activity Poisson random measures arise as building blocks of numerous modern structured statistical models. Examples include clustering and density estimation [49, 32, 33], spatial statistics [13, 56, 41], latent factor/trait models [24, 44, 59, 16, 3], network modeling [14, 60, 15, 17], recommendation systems [26], prediction, risk management and option pricing of financial assets [19] or survival analysis [28, 42]; see [40] for a review. Popular CRMs include the (generalized) gamma random measure (also known as tempered stable) [30, 13] or the (stable) beta random measure [28, 54]. Other popular random probability measures such as the Dirichlet process or the Pitman-Yor process are obtained by normalization or transformation of CRMs.
The use of statistical models based on infinite-activity CRMs poses a number of practical challenges regarding posterior inference and estimation. Except in some specific cases, most algorithms, whether based on Gibbs sampling [31, 60], slice sampling [27, 23], mean-field variational inference [9, 21, 39] or sequential Monte Carlo [11], [4, Section 3.2], require the use of a finite-dimensional approximation of the CRM. Finite-dimensional approximations can be obtained either by (i) truncating a series representation of the CRM, with stochastically decreasing weights, or (ii) considering a finite measure with atoms and iid weights, converging in distribution to the CRM as tends to infinity. For example, for the beta process with scale parameter and probability distribution , the inverse Lévy series representation is [55]
[TABLE]
A classical finite-dimensional approximation with iid weights is [24]
[TABLE]
Both representations are routinely used in Markov chain Monte Carlo and variational Bayes approximate inference algorithms [21, 45, 43]. Series and iid approximations are similarly used for the gamma process [60, 48] and the generalized gamma process [39]. Since the early work of [35] and [22] on the so-called inverse Lévy representation, various generic series representations of Poisson random measures have been proposed [12, 46, 50, 52]. Nested series representations have also been recently proposed for some specific CRMs [45, 43, 48]. Finite iid representations can be obtained using the infinite divisibility properties of the CRM [37] but as noted by [39], it generally does not lead to tractable representations, except in the gamma process case. Other ways of obtaining iid constructions are described in [29] for some family of CRMs. [18] provided a recent survey of the existing series representations and approximations as well as a truncation analysis.
The objective of this article is to present a general framework to obtain both series and iid representations of CRMs. Our construction builds on the definition of a Poisson random measure on an extended space; it generalizes the size-biased approach of [46], and admits as special cases both the size-biased and inverse-Lévy representations. We show that this construction draws connections between existing series and iid representations that appeared unrelated, and that it allows one to derive new series and iid representations. More precisely, we show how the iid representation of [39] is related to the size-biased construction of [46], and we derive novel series and iid representations of the generalized gamma and stable beta random measures. We also provide an asymptotic analysis of the truncation error for this class of approximations.
This article is organised as follows. In Section 2 we provide background material on completely random measures and some existing series representations for CRMs and describe the objectives. Section 3 describes the general construction for obtaining series and iid representations of CRMs. In Section 4 we describe a number of specific constructions, showing how one recovers some existing constructions as a particular case of our framework. In Section 5 we provide an analysis of the asymptotic truncation error, and discuss related approaches in Section 6. The proofs and additional background material are provided in the appendix.
Notations. For a measure on and a positive measurable function on , write . Let be the ordered points of a unit-rate Poisson point process on , that is are iid unit-rate exponential random variables. With a slight abuse of notation, we use the same notation for the distribution of a random variable and its pdf. For instance, the probability density function (pdf) of a gamma random variable is written as .
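The ordered points of a unit-rate Poisson point process, which drive every series representation below, can be simulated as cumulative sums of iid unit-rate exponential variables. The following is a minimal sketch of that fact; the function name `unit_rate_poisson_points` and the use of NumPy are illustrative choices, not part of the original text.

```python
import numpy as np

def unit_rate_poisson_points(n_points, rng):
    """Return the first n_points ordered points of a unit-rate
    Poisson point process on the positive half-line."""
    # Inter-arrival gaps are iid unit-rate exponential variables.
    increments = rng.exponential(scale=1.0, size=n_points)
    # The ordered points are the running sums of the gaps.
    return np.cumsum(increments)

rng = np.random.default_rng(0)
xi = unit_rate_poisson_points(5, rng)
print(xi)  # five strictly increasing positive numbers
```

The k-th point has a gamma distribution with shape k and unit rate, so it grows roughly like k.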
2 Background
2.1 Completely random measures
Let be a measurable space where . For any point , we refer to as the size of . Let be a Poisson random measure on with mean measure where is a Borel measure on , called size measure, satisfying
[TABLE]
and is a Markov probability kernel from to . The linear functional
[TABLE]
is an infinite-activity completely random measure [36] on with random weights and random atoms. We write . The conditions (3) imply that the atomic random measures and have an infinite number of atoms, and is almost surely finite. If does not depend on , the CRM is said to be homogeneous. We assume in the rest of this article that one can easily simulate from (or ) and/or it admits a tractable density with respect to some reference measure (e.g. Lebesgue). Two popular examples of CRMs are the generalized gamma process (GGP) [30, 13], also known as (exponentially) tilted stable process, with size measure
[TABLE]
where , and , or and , and the stable beta process (SBP) [28, 54] with
[TABLE]
where , , and is the beta function. When , both random measures are infinite-activity.
Remark 2.1.
The constructions described in this paper hold more generally when the first condition in Equation (3) is not satisfied, but for all . Note that in this case almost surely. An example of this more general case is given in Section 4.4 where .
2.2 Objective
Our objective is to derive general series representations for the Poisson random measure , or equivalently the CRM , of the form
[TABLE]
where the sizes , are stochastically ordered. That is, for any , We write and . Denote the measure obtained by truncating the above series after points
[TABLE]
where is a finitely exchangeable random sequence defined by where is a random permutation of the set . We will refer to the sequence (or ) as the sequential truncated representation, and (or ) as the exchangeable truncated representation. In Section 3 we will show that the exchangeable truncated representation can be approximated by a finite iid representation, which will be denoted .
In the rest of this paper, we will assume that the mean measure is available and one can sample from the conditional distribution (or in the homogeneous case). Under these conditions, we can obtain the representations (7) by first sampling (or ), then conditionally sampling (or ) from (or ).
2.3 Existing representations of CRMs
Inverse-Lévy representation. For any , let
[TABLE]
be the tail intensity of the size measure , and denote by its generalized inverse. The inverse Lévy representation [35, 22] is given by
[TABLE]
In this case, the sizes are ordered and it therefore leads to the best possible approximation in terms of the sizes. While this representation has been used in many applications [57, 42, 27, 10, 1, 6] its main limitation is that is in general non-tractable. Two exceptions are the beta random measure, whose inverse Lévy representation is given by Equation (1), and the stable random measure (corresponding to the measure (4) with and ) where the inverse Lévy representation is given by
Size-biased representation. The size-biased sequential and exchangeable representations and , introduced by [46, Section 4], are given as follows (note that this is different from what [18] call a size-biased representation). Let be defined as where is the generalized inverse of the Laplace exponent and where Additionally, given , we have are iid with distribution
[TABLE]
The term size-biased comes from the fact that the atoms are ordered by successively sampling without replacement according to their size
[TABLE]
In the case of the gamma random measure, which corresponds to Equation (4) with , [46] show that the series representation corresponds to [12]’s representation and is given by and .
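As a hedged illustration of the gamma case, the sketch below assumes the size measure rho(dw) = eta * w^(-1) * exp(-tau * w) dw, which is our reading of Equation (4) with the stability parameter set to zero; the paper's exact parameterization may differ by constants. Under that assumption the Laplace exponent is psi(t) = eta * log(1 + t / tau), so solving psi(T_k) = xi_k gives T_k = tau * (exp(xi_k / eta) - 1), and the size given T_k is exponential with rate tau + T_k. The function name and arguments are hypothetical.

```python
import numpy as np

def gamma_process_size_biased(eta, tau, n_atoms, rng):
    """Truncated size-biased series for a gamma random measure,
    ASSUMING rho(dw) = eta * w**-1 * exp(-tau * w) dw.
    Returns (arrival_times, weights), both of length n_atoms."""
    # Ordered points of a unit-rate Poisson process.
    xi = np.cumsum(rng.exponential(size=n_atoms))
    # Invert the assumed Laplace exponent psi(t) = eta * log(1 + t / tau).
    arrival_times = tau * np.expm1(xi / eta)
    # Given T_k, the size is exponential with rate tau + T_k.
    weights = rng.exponential(scale=1.0 / (tau + arrival_times))
    return arrival_times, weights

rng = np.random.default_rng(2)
eta, tau = 2.0, 1.0
totals = np.array([
    gamma_process_size_biased(eta, tau, 200, rng)[1].sum()
    for _ in range(4000)
])
# Under the assumed measure the total mass has mean eta / tau.
print(totals.mean(), eta / tau)
```

Since tau + T_k = tau * exp(xi_k / eta), each weight equals exp(-xi_k / eta) times an exponential variable with rate tau, which is the form usually attributed to [12].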
3 Series representations and finite approximations of CRMs
3.1 Arrival-time augmentation
Let be some Markov probability kernel from to with cdf satisfying, for any
[TABLE]
That is, if and , with , then .
We consider a Poisson random measure on the augmented space with mean measure . For a point , we refer to as the arrival time of the point . Indeed, the second condition in Equation (9) ensures that points with larger size are more likely to have a smaller arrival time . We may therefore consider the following analogy: atoms of the Poisson random measure are enrolled in a race, each atom having a strength , and stronger atoms are more likely to finish faster and therefore have a smaller . The first condition in Equation (9) ensures that for any hence we can order the arrival times. Let denote the sequence of ordered arrival times, and consider the augmented sequential representation where , are the associated sizes and locations. By the restriction theorem [38], is a Poisson random measure with mean and . We now give the general definitions of the sequential, exchangeable and iid representations of the CRM associated to the arrival time kernel . For simplicity of presentation, we assume that for any , is absolutely continuous with respect to the Lebesgue measure with , but one can also consider discontinuous cdfs , see Section 4.1 for an example.
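The race analogy can be checked numerically on a finite set of atoms. The sketch below assumes exponential arrival times whose rate equals the atom size (the choice studied in Section 4.2), and uses a small fixed vector of sizes as a stand-in for the infinite collection of atoms. Under that assumption, ordering by arrival time is the same as sampling without replacement with probability proportional to size, so the first finisher is atom i with probability w_i divided by the sum of the sizes. All names are illustrative.

```python
import numpy as np

def arrival_order(sizes, rng):
    """Order atoms by arrival time, where atom i arrives at an
    exponential time with rate sizes[i] (an assumed kernel)."""
    arrival_times = rng.exponential(scale=1.0 / sizes)
    return np.argsort(arrival_times)

rng = np.random.default_rng(1)
sizes = np.array([3.0, 1.0, 0.5, 0.5])
n_rep = 20000

# Empirical frequency with which each atom finishes first.
first = np.array([arrival_order(sizes, rng)[0] for _ in range(n_rep)])
freq_first = np.bincount(first, minlength=len(sizes)) / n_rep
target = sizes / sizes.sum()
print(freq_first)
print(target)
```

Stronger atoms tend to arrive earlier, but the order is random; a deterministic kernel (Section 4.1) would instead sort the atoms exactly by size.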
3.2 Series and truncated exchangeable constructions
Theorem 3.1.
Let be a parametric distribution on with parameter and be the associated parametric cumulative distribution function (cdf) satisfying condition (9). Consider the conditional distributions
[TABLE]
where
[TABLE]
The sequential construction is obtained as follows, for
[TABLE]
The truncated exchangeable construction is obtained, for by
[TABLE]
3.3 Finite iid construction
Note that tends to 1 almost surely as tends to infinity. This therefore suggests the following finite iid construction, as an approximation to the truncated measure
[TABLE]
Proposition 3.1.
Let be the finite iid approximation defined by Equation (13). Then converges in distribution to as .
For the iid construction, one needs to evaluate only once, and this can be done numerically if there is an analytic form for . Instead of the distribution , we can alternatively use more general distributions where is an increasing function such that . Proposition 3.1 also holds, as the proof can be straightforwardly adapted to this case. Note that if as tends to infinity for some constant and , then we can take . Proposition B.2 gives examples of admissible functions under generic assumptions on and .
4 Examples
We first show how the inverse Lévy and size-biased constructions described in Section 2.3 can be recovered as special cases of the general construction introduced in Section 3. We then derive novel constructions within this framework.
4.1 Deterministic arrival times (inverse-Lévy construction)
Assume that the arrival times are deterministic given the size , and inversely proportional to it, that is The distribution does not admit a density with respect to the Lebesgue measure, but one can still obtain expressions for the different quantities of interest. We obtain
[TABLE]
The sequential construction corresponds to the inverse-Lévy construction described in Section 2.3. The exchangeable representation is similar to the -truncation of normalized CRMs, used in [1, 2], except that the truncation threshold is treated as a random variable here.
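A minimal sketch of the sequential (inverse-Lévy) construction for the stable random measure, one of the two tractable cases noted in Section 2.3. It assumes the size measure rho(dw) = eta / Gamma(1 - sigma) * w^(-1 - sigma) dw with 0 < sigma < 1, which may differ from Equation (4) by a constant. Under that assumption the tail intensity is eta * x^(-sigma) / (sigma * Gamma(1 - sigma)), and inverting it at the ordered Poisson points gives the weights. The function names are hypothetical.

```python
import math
import numpy as np

def stable_tail(x, eta, sigma):
    """Tail intensity of the ASSUMED stable size measure."""
    return eta * x ** (-sigma) / (sigma * math.gamma(1.0 - sigma))

def stable_inverse_levy(eta, sigma, n_atoms, rng):
    """First n_atoms weights of the inverse-Levy series, obtained by
    inverting the tail intensity at unit-rate Poisson points."""
    xi = np.cumsum(rng.exponential(size=n_atoms))
    const = sigma * math.gamma(1.0 - sigma) / eta
    return (const * xi) ** (-1.0 / sigma)

rng = np.random.default_rng(3)
weights = stable_inverse_levy(eta=1.0, sigma=0.5, n_atoms=200, rng=rng)
print(weights[:5])  # strictly decreasing weights
```

The number of weights exceeding a threshold x is Poisson with mean `stable_tail(x, eta, sigma)`, which gives a simple sanity check on the inversion.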
4.2 Exponential arrival times (size-biased construction)
Consider an exponential arrival time distribution with and . This leads to [46]’s size-biased sequential and exchangeable representations described in Section 2.3. While this construction is not novel, it appears that it provides a novel series representation for the generalized gamma random measure. We also show that the iid representation associated to this arrival time distribution corresponds to the finite approximation proposed by [39].
Generalized gamma process.
In the case of the size measure (4) with and , we obtain the following sequential construction for the GGP, which appears to be novel
[TABLE]
In Section E.1, we compare this representation with Rosinski’s series representation for the GGP [51, 53]. The conditional distribution for the exchangeable and iid constructions is given by
[TABLE]
The random variable having this density is called the exponentially-tilted BFRY distribution [7, 20, 39], and written as . One can easily simulate from Equation (16), see Section D.2. Note that as hence we can consider the iid distribution . This corresponds precisely to the finite-dimensional approximation introduced by [39] for the GGP, which can therefore be seen as a particular case of our approach.
4.3 Gamma arrival times
As a generalization of the exponential arrival times, consider now a gamma arrival distribution
[TABLE]
where is a tuning parameter and is the lower incomplete gamma function. Since and as , converges in probability hence in distribution to , and therefore as tends to infinity, which corresponds to the arrival time cdf of the inverse Lévy representation. Hence, the construction based on the gamma arrival times bridges between the size-biased () and inverse-Lévy () constructions.
Generalized gamma process. Consider the generalized gamma process with size measure (4) and parameters , and . We have
[TABLE]
where . For the sequential and exchangeable constructions, we get
[TABLE]
For the iid construction, we can use Eq. 14 and estimate numerically or, using Proposition B.2 and Table 1, we can alternatively use . The normalizing constant of (and therefore ) has an analytic expression via standard functions. We call the random variable having distribution an exponentially-tilted generalized BFRY random variable, due to the form of the pdf obtained by exponentially tilting the pdf of the generalized BFRY distribution. This distribution has a number of remarkable properties that make it amenable to tractable simulation and posterior inference. Refer to Section D.4 for a more detailed description.
4.4 Inverse gamma arrival times
Consider now an inverse gamma arrival distribution where is the pdf of an inverse gamma random variable and is a tuning parameter. By a similar argument as for the gamma arrival times, we have as hence it also admits the inverse Lévy construction as a limiting case. The case is of particular interest, as it leads to a tractable novel representation for the GGP (see Section E.3), and provides a novel way of interpreting the classical iid approximation of the beta process.
Beta process. Consider the one-parameter beta process with size measure (5) with and . The bijective transformation gives the measure on . Note that is not a Lévy measure, but we can nonetheless use our construction as the tail Lévy intensity is finite. Using the inverse gamma kernel with , we obtain and the iid distribution . Applying the inverse transformation , we obtain , which corresponds to the classical iid approximation for the beta process, described in Equation (2). The iid construction for the beta process can alternatively be recovered using the arrival time distribution directly with (5), without change of variable.
4.5 Generalized Pareto arrival time
Consider the arrival time distribution where .
Stable beta process. Consider the stable beta process with measure (5) with . We have and
[TABLE]
These distributions admit the same conjugacy properties as the beta distribution, and one can sample exactly from these distributions as detailed in Section E.4.
5 Truncation error analysis
5.1 Error on functionals of the CRM
For a measurable function such that a.s., the error term associated with the truncation is defined as
[TABLE]
Taking for example corresponds to the error between the CRM and .
Proposition 5.1.
For , has the following moment generating function
[TABLE]
We now consider results for the special case . The mean and variance of the truncation error given are given by
[TABLE]
The next proposition provides an asymptotic expression for the error term, giving insight into how the error relates to the choice of the arrival time distribution . The proposition makes some assumptions of regular variation on the mean measure . Background on regular variation and Mellin transforms is given in Appendix A and the proof of Proposition 5.2 is given in Section B.4.
Proposition 5.2.
Assume that the mean measure is absolutely continuous with respect to the Lebesgue measure with density function such that
[TABLE]
where and . Assume additionally that where is a positive function on such that its Mellin transform (see A.3) converges in some open interval containing . Assume additionally that either (i) is differentiable with derivative and that the Mellin transform of is defined in some open interval containing , or (ii) that . Then we have
[TABLE]
where the constant is given by if and if is differentiable, and only depends on the arrival time distribution and .
The deterministic, gamma, inverse-gamma (for ) and generalized Pareto (for ) arrival time distributions discussed in Section 4 all verify the assumptions of 5.1. The associated kernels, Mellin transforms and constants are given in Table 1 in the appendix. Figure 1(a) shows the value of the constant for the deterministic and gamma arrival time, with different values of . As indicated in Section 4.3, the approximation gets closer to the deterministic/inverse Lévy construction as increases. Both the GGP and the SBP with verify Equation (18), with for the GGP and for the SBP. We run a simulation study in order to investigate the finite- properties of the proposed approximations. We report in Figure 1(b-c) the mean and variance of for gamma arrival times, for the stable process and the GGP with . For the stable process, we also compare to the inverse-Lévy approximation, as it has an analytic form. As expected, the approximation gets better as the value increases. Additional simulations for other arrival time distributions are given in Appendix F.
5.2 Error on the marginal likelihood
In this section we discuss the error on the marginal likelihood when truncated CRMs are used for hierarchical Bayesian models under the framework described in [16]. Let , and be its approximation with atoms. Let be a probability distribution on for all , and denote . Consider a hierarchical Bayesian model for observations , and denote the marginal likelihood for this model. Similarly, let be the marginal likelihood of the model with the same generative process, except with in place of . Following [16], we analyze the quality of approximation by comparing and . For the inverse Lévy case, one recovers the bound derived in [16, Theorem D.3].
Proposition 5.3.
We have the bound , where .
6 Discussion
Our series construction can be seen as a special case of Rosinski’s shot-noise series representation [52] (as is the case for most series constructions, see [52]), using the disintegration where is a Markov kernel (noting that ). [29] proposed alternative ways of deriving iid approximations for some classes of CRMs. That approach does not rely on a latent Poisson construction and is therefore different from the approach considered here. We emphasize that the finite iid construction is useful for both simulation and hierarchical Bayesian modeling in various contexts. Using Proposition B.2, one can approximate infinite-dimensional priors with finite-dimensional iid distributions without any numerical inversion. See Appendix G where we discuss an example of our construction applied to normalized GGP mixture models.
Appendix A Background on regular variation and Mellin transforms
This background material comes from the book of [8].
A.1 Definitions
Definition A.1 (Slowly varying function).
A function is slowly varying at infinity if for all ,
[TABLE]
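To make the definition concrete, the sketch below uses the standard formulation that L is slowly varying at infinity when L(c x) / L(x) tends to 1 as x grows, for every fixed c > 0. It evaluates this ratio for the logarithm, which is slowly varying, and for the square root, which is regularly varying with exponent 1/2 and therefore keeps a constant ratio equal to the square root of c. The helper name `scaling_ratio` is illustrative.

```python
import math

def scaling_ratio(func, c, x):
    """Ratio func(c * x) / func(x) appearing in the definition."""
    return func(c * x) / func(x)

c = 10.0
grid = [10.0 ** p for p in (2, 6, 12, 24)]

# Logarithm: the ratio drifts toward 1, so log is slowly varying.
log_ratios = [scaling_ratio(math.log, c, x) for x in grid]

# Square root: the ratio stays at sqrt(c), so sqrt is regularly
# varying with exponent 1/2 but not slowly varying.
sqrt_ratios = [scaling_ratio(math.sqrt, c, x) for x in grid]

print(log_ratios)
print(sqrt_ratios)
```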
Definition A.2 (Regularly varying function).
A function is regularly varying at infinity with exponent if for some slowly varying . A function is regularly varying at 0 if is regularly varying at infinity, i.e., for some and slowly varying .
A.2 Basic theorems for regularly varying functions
Let be a regularly varying function with exponent and slowly varying function locally bounded on .
Theorem A.1 (Karamata’s theorem).
[8, Propositions 1.5.8 and 1.5.10]. Suppose that as .
- When ,
[TABLE]
- When ,
[TABLE]
Corollary A.1.
This also holds when is regularly varying at 0. When and as ,
[TABLE]
A.3 Generalized Abelian theorem
Definition A.3.
Given a measurable kernel let
[TABLE]
be its Mellin transform, for such that the integral converges.
Remark A.1.
If has a Mellin transform which converges in , then has a Mellin transform which converges in .
Theorem A.2.
[8, Theorem 4.1.6, page 201] Let the Mellin transform of converge at least in the strip , where . Let , a slowly varying function. If is measurable, is bounded on every interval and
[TABLE]
then
[TABLE]
The next result is a trivial corollary of A.2, considering limits as tends to [math].
Corollary A.2.
Let the Mellin transform of converge at least in the strip , where . Let , a slowly varying function. If is measurable, is bounded on every interval and
[TABLE]
then
[TABLE]
Proof.
[TABLE]
where , bounded on every interval with
[TABLE]
and is such that its Mellin transform converges in the strip . A.2 above therefore gives the result. ∎
Appendix B Proofs
B.1 Proof of Theorem 3.1
The proof is an adaptation of the proof for the size-biased construction in [46, Section 4]. The mean measure of the Poisson random measure can be expressed as
[TABLE]
This is the mean measure of a marked Poisson point process, where are the points of an inhomogeneous Poisson point process with intensity , hence admit the representation Equation (11), and the marks have conditional distribution as shown in Equation (11). Let where is a random permutation of , and . By properties of the Poisson process on the real line, the random variables are iid given , with pdf
[TABLE]
Hence, given , the marks and are also iid, with conditional distribution where
[TABLE]
B.2 Proof of Proposition 3.1
The proof is similar to that of [39, Section 3.1]. Let be a measurable function.
[TABLE]
Note that and as tends to infinity. By the bounded convergence theorem, we therefore have
[TABLE]
as . Additionally, for any real sequence converging to we have as . We therefore obtain
[TABLE]
where the right-hand side is equal to the Laplace functional of the CRM by Campbell’s theorem [38].
B.3 Proof of Proposition 5.1
By the marking theorem for Poisson point processes [38, Chapter 5], given , the random measure is a Poisson random measure with mean measure . The result follows from Campbell’s theorem and the fact that .
B.4 Proof of Proposition 5.2
We state a slightly more general version of Proposition 5.2, where the constant in Equation (18) can more generally be any slowly varying function . We then prove this generalized proposition.
Proposition B.1.
[Slight generalization of Proposition 5.2] Assume that the mean measure is absolutely continuous with respect to the Lebesgue measure with density function such that
[TABLE]
where and is a slowly varying function. Assume additionally that where is a positive function on such that its Mellin transform (see A.3) converges in some open interval containing . Assume additionally that either (i) is differentiable with derivative and that the Mellin transform of is defined in some open interval containing , or (ii) that . Then we have
[TABLE]
where is some slowly varying function that depends on and but not and the constant is given by if and if is differentiable, and only depends on the arrival time distribution and .
In order to prove Proposition B.1, we first state the following proposition.
Proposition B.2.
Assume that
[TABLE]
where and is a slowly varying function. Assume additionally that
[TABLE]
where is a positive and differentiable function on , with derivative . Assume that the Mellin transform of is defined in some open interval containing . Then
[TABLE]
where is another slowly varying function depending on and , and defined in Equation (28). If is constant, then we simply have . In particular, this is the case for both the generalized gamma process and the stable beta process, which verify condition (23) when with for the GGP and for the SBP.
Proof of Proposition B.2.
The assumptions (21) and the first condition in Equation (9) both imply that
[TABLE]
Using integration by parts
[TABLE]
Now assume where is differentiable on . Then
[TABLE]
If the Mellin transform of is defined in some open interval containing , then A.2 implies
[TABLE]
as tends to infinity. In the case , is not differentiable, but we have directly
[TABLE]
Now we use inversion formulas for regularly varying functions to get the asymptotic regime for . Assume , then [25, Lemma 22] implies
[TABLE]
as , where is a slowly varying function defined by
[TABLE]
where denotes the de Bruijn conjugate of the slowly varying function [8, Theorem 1.5.13]. Note that only depends on and , but not the arrival time distribution .
∎
Proof of Proposition B.1.
Assume that the mean measure is absolutely continuous with respect to the Lebesgue measure with density function verifying
[TABLE]
where and is a slowly varying function. Equation (21) and [8, Proposition 1.5.8] imply that
[TABLE]
as tends to 0 where is a slowly varying function defined by
[TABLE]
Assume additionally that where is a positive function on such that its Mellin transform (see A.3) converges in some open interval containing . Note that
[TABLE]
As tends to infinity almost surely as tends to infinity, A.2 implies
[TABLE]
almost surely as tends to infinity.
As where almost surely as tends to infinity, using B.2 we obtain
[TABLE]
almost surely as tends to infinity. Combining Equation (33) with Equations (31) and (32), we obtain
[TABLE]
almost surely as tends to infinity. Note that if is constant, then all the other slowly varying functions are also constant with
[TABLE]
Finally, Equation (22) follows similarly to the proof of [25, Proposition 2]. Using Chebyshev’s inequality
[TABLE]
Take . As
[TABLE]
by the Borel-Cantelli lemma, given ,
[TABLE]
almost surely as . As is decreasing, we have, for any
[TABLE]
and it follows by sandwiching that
[TABLE]
almost surely as . Combining this with Equation (34) gives the final result, with the slowly varying function defined by
[TABLE]
Note that in the case of the GGP, the different slowly varying functions are all constant functions
[TABLE]
For the SBP, we have
[TABLE]
B.5 Proof of Proposition 5.3
From [16], we have the protobound
[TABLE]
In our case,
[TABLE]
where the last inequality follows from Jensen’s inequality.
Appendix C Mellin transforms
C.1 Deterministic kernel
Take Then
[TABLE]
if .
C.2 Gamma kernel
Take for .
[TABLE]
which converges for . Additionally, hence
[TABLE]
which converges for .
C.3 Inverse gamma kernel
Take . Note that if , . Then
[TABLE]
therefore defined for . Note that
[TABLE]
as tends to infinity, which corresponds to the inverse-Lévy case.
We have
[TABLE]
and
[TABLE]
defined for . Note again that as (inverse Lévy case).
C.4 Generalized Pareto kernel
Take . Then
[TABLE]
for . We have hence
[TABLE]
defined for .
Appendix D BFRY and related distributions
D.1 BFRY distribution
The BFRY distribution, first named in [20] after the work of Bertoin, Fujita, Roynette, and Yor [7], arose much earlier in various contexts [47, 58]. Recently it was highlighted in [39] as a finite-dimensional approximate distribution for the stable process, the generalized gamma process, and a special case of the stable beta process. The density of a BFRY distribution with parameter is written as
[TABLE]
One can easily verify that the distribution can be simulated as a ratio of independent gamma and beta random variables.
[TABLE]
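The ratio representation can be sketched as follows. The code assumes the usual form of the BFRY density with parameter alpha in (0, 1), namely alpha / Gamma(1 - alpha) * x^(-alpha - 1) * (1 - exp(-x)), for which the ratio of an independent Gamma(1 - alpha, 1) variable and a Beta(alpha, 1) variable has exactly this law; the symbol alpha and the function name are assumptions, since the parameter name is not shown in the text above. Because the mean of this distribution is infinite, the check uses the Laplace transform, which under the assumed density equals (1 + t)^alpha - t^alpha.

```python
import numpy as np

def sample_bfry(alpha, size, rng):
    """Draw BFRY(alpha) variables as a ratio of independent
    Gamma(1 - alpha, 1) and Beta(alpha, 1) random variables."""
    g = rng.gamma(shape=1.0 - alpha, scale=1.0, size=size)
    b = rng.beta(alpha, 1.0, size=size)
    return g / b

rng = np.random.default_rng(4)
alpha = 0.5
x = sample_bfry(alpha, 200_000, rng)

# Monte Carlo estimate of E[exp(-X)] against the closed form at t = 1.
laplace_mc = np.exp(-x).mean()
laplace_exact = 2.0 ** alpha - 1.0
print(laplace_mc, laplace_exact)
```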
D.2 Exponentially-tilted BFRY distribution
In [39, 34], the exponentially-tilted version of the BFRY distribution was discussed. The density of an exponentially-tilted BFRY random variable with parameters , and is
[TABLE]
Then it is easy to show that
[TABLE]
D.3 Generalized BFRY distribution
The generalized BFRY distribution, first discussed in [5], is obtained by generalizing the sampling procedure of the BFRY distribution. The generalized BFRY distribution with parameters and is obtained as
[TABLE]
Simple algebra then gives the density
[TABLE]
where is the lower incomplete gamma function.
D.4 Exponentially-tilted generalized BFRY distribution
The density of the exponentially-tilted generalized BFRY distribution with parameters and is
[TABLE]
where is the incomplete beta function. A random variable having this distribution can be simulated by rejection sampling. Alternatively, note that
[TABLE]
which means that the distribution is an infinite mixture of gamma distributions with mixing proportion
[TABLE]
Hence, sampling is straightforward: first sample the component index from the infinite discrete distribution above, then sample from the corresponding gamma distribution.
The exponentially-tilted GBFRY distribution has the convenient property of being a conjugate prior for the Poisson, gamma, normal (with fixed mean), and Pareto likelihoods. Let . Then, for Poisson,
[TABLE]
For gamma,
[TABLE]
For normal,
[TABLE]
For Pareto,
[TABLE]
D.5 Inverse generalized BFRY
One can also consider the counterpart of the generalized BFRY distribution in which the gamma distribution is replaced with an inverse gamma distribution. We define the inverse generalized BFRY distribution, whose pdf is written as
[TABLE]
Hence, one can realize that
[TABLE]
This distribution corresponds to the truncated exchangeable density of stable process with inverse gamma arrival times.
D.6 Exponentially-tilted inverse generalized BFRY
Finally, we consider an exponentially tilted version of inverse GBFRY distribution, whose pdf is written as
[TABLE]
Unfortunately, we do not have an analytic expression for the normalization constant. We can still sample from this distribution via rejection sampling. This distribution arises as the truncated exchangeable density of the generalized gamma process with inverse gamma arrival times.
Appendix E Detailed derivations of the results in Section 4 and additional examples
E.1 Exponential arrival times
Generalized gamma.
In the case of the size measure (4) with and , we have
[TABLE]
The arrival times are thus generated as
[TABLE]
The conditional distribution for the sequential construction is
[TABLE]
In summary, the sequential construction for the GGP is given by
[TABLE]
Comparison to Rosinski’s series representation for the GGP
Rosinski [51, 53] proposed the following series representation for the GGP/tempered stable process
[TABLE]
where ,. For large, with high probability, which corresponds to the inverse-Lévy construction for the stable process, and this construction has the same asymptotic error rate as the inverse-Lévy construction for the GGP. The asymptotic error of Rosinski’s representation is therefore lower than the asymptotic error for the series defined by Eq. 45, by a factor , according to Table 1.
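For illustration, the sketch below implements the standard form of Rosinski's representation for a tempered stable process, in which each jump is the minimum of a stable inverse-Lévy jump and an independent tempering term. It is a sketch under assumptions, not a transcription of the displayed equation. It assumes the GGP Lévy density is alpha / Gamma(1 - sigma) * w^(-1 - sigma) * exp(-tau * w) with 0 < sigma < 1 and tau > 0, and the function names are hypothetical. Under that parametrization the total mass W satisfies E[exp(-tW)] = exp(-(alpha / sigma) ((t + tau)^sigma - tau^sigma)), which can be used to check a truncated series.

```python
import math

import numpy as np


def rosinski_ggp_jumps(alpha, sigma, tau, n_terms, rng):
    """First n_terms jumps of a Rosinski-type series for the GGP.

    Assumed Levy density: alpha / Gamma(1 - sigma) * w^(-1 - sigma) * exp(-tau * w),
    with alpha > 0, 0 < sigma < 1, tau > 0.
    """
    # Arrival times of a unit-rate Poisson process on (0, inf).
    arrivals = np.cumsum(rng.exponential(1.0, size=n_terms))
    # Inverse-Levy jumps of the untempered stable process.
    stable = (sigma * math.gamma(1.0 - sigma) * arrivals / alpha) ** (-1.0 / sigma)
    # Independent tempering marks: E ~ Exp(1), U ~ Uniform(0, 1).
    e = rng.exponential(1.0, size=n_terms)
    u = rng.uniform(size=n_terms)
    return np.minimum(stable, e * u ** (1.0 / sigma) / tau)


def ggp_total_mass_laplace(alpha, sigma, tau, t):
    """E[exp(-t W)] for the total mass W under the assumed parametrization."""
    return math.exp(-alpha / sigma * ((t + tau) ** sigma - tau ** sigma))


rng = np.random.default_rng(0)
jumps = rosinski_ggp_jumps(alpha=1.0, sigma=0.5, tau=1.0, n_terms=1000, rng=rng)
print(jumps.sum())  # one draw of the truncated total mass
```

For large indices the stable term is almost always the smaller of the two, so the series behaves like the inverse-Lévy construction for the stable process, in line with the remark above.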
E.2 Gamma arrival times
Generalized gamma. Consider the generalized gamma process with (4), , and . We have
[TABLE]
where is the incomplete beta function. For , has the analytic expression
[TABLE]
For , there is no analytic expression for . For the sequential construction, we get
[TABLE]
For the exchangeable and iid constructions, we obtain
[TABLE]
When , is the distribution of a generalized BFRY distribution (Section D.3). When , corresponds to the distribution of exponentially-tilted generalized BFRY (Section D.4).
E.3 Inverse gamma arrival times
Take
[TABLE]
Stable process.
Consider the stable process with size measure (4), , and . We have
[TABLE]
For the sequential construction, we have
[TABLE]
For the exchangeable construction, we get
[TABLE]
which corresponds to the inverse generalized BFRY distribution. We can sample from this as . See Section D.5 for more details.
The case is of particular interest, as it leads to a tractable novel representation for the GGP and provides a novel way of interpreting the classical iid approximation of the beta process.
Generalized gamma process.
Consider the GGP with size measure (4) with and . Take inverse gamma arrival times with . We have
[TABLE]
where is a modified Bessel function of the second kind. Unfortunately, the arrival time is not available in closed form, so we resort to a numerical root-finding algorithm to compute . The sequential construction is then given by
[TABLE]
where is a generalized inverse Gaussian distribution with parameters . The exchangeable construction is given by
[TABLE]
This particular case, which to the best of our knowledge is novel, is useful because it covers the gamma process (). It also includes the stable process as a limiting case: as ,
[TABLE]
The sequential construction is impractical since we have to invert for each , but the exchangeable construction requires only one inversion for .
Beta process.
Consider the beta process:
[TABLE]
Take the bijective transformation which gives the measure on
[TABLE]
Note that is not a Lévy measure. Using the inverse gamma kernel with to obtain a series approximation for , we obtain
[TABLE]
and for the iid model we have where
[TABLE]
which is the distribution of an inverse gamma random variable with parameter . Setting the inverse transformation , we obtain
[TABLE]
which corresponds to the classical iid approximation for the beta process, described in the introduction.
This construction can also be obtained directly with a different arrival time distribution. Consider
[TABLE]
A sample from this distribution can be obtained as
[TABLE]
Then we have
[TABLE]
and as a result
[TABLE]
A sample from can be obtained by
[TABLE]
and the exchangeable construction corresponds to the iid beta approximation.
E.4 Generalized Pareto arrival time distribution
Consider the following arrival time distribution,
[TABLE]
where .
Stable beta process.
Consider the stable beta process with Lévy measure
[TABLE]
With the change of variable , we see that
[TABLE]
For the sequential model we obtain
[TABLE]
With the change of variable , we see that
[TABLE]
and thus a sample from can be obtained as
[TABLE]
For the exchangeable model, we have
[TABLE]
and by a similar calculation we see that a sample from can be obtained as
[TABLE]
Appendix F Details on simulations and additional results
We are interested in measuring the error , which does not appear to be tractable. Hence, we consider the approximate error defined for as
[TABLE]
where we simulate via sequential constructions. When no analytic expression is available (computing for gamma arrival times for the GGP when , computing for the inverse-Lévy construction for the GGP), we resort to a numerical inversion algorithm. For each configuration, we first sample a series of arrival time sequences and, conditioned on each sequence, sample 100 series of jumps to compute . We repeat this procedure 10 times for each arrival time sequence (thus 1,000 jump simulations in total for each configuration) and report the mean and standard deviation. Unless specified otherwise, we use the hyperparameters for the stable process (SP) and the GGP, and set . Fig. 1 in the paper reports for SP and GGP.
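The numerical inversion step for the inverse-Lévy construction can be sketched as below. The sketch makes several assumptions that are not taken from the text. It assumes the GGP Lévy density alpha / Gamma(1 - sigma) * w^(-1 - sigma) * exp(-tau * w), and it restricts to sigma = 1/2 so that the tail mass can be written with the standard library's `math.erfc`. A general sigma would need an incomplete gamma routine instead. The helper names `ggp_tail_half` and `invert_decreasing` are hypothetical, and bisection is one possible root finder, not necessarily the one used for the reported experiments.

```python
import math


def ggp_tail_half(x, alpha, tau):
    """Tail mass of the assumed GGP Levy measure on (x, inf) for sigma = 1/2."""
    z = tau * x
    return 2.0 * alpha * math.sqrt(tau) * (
        math.exp(-z) / math.sqrt(math.pi * z) - math.erfc(math.sqrt(z))
    )


def invert_decreasing(f, target, iters=200):
    """Solve f(x) = target for a decreasing f on (0, inf) by bisection in log-space."""
    lo = hi = 1.0
    while f(lo) < target:  # move the left end down until f(lo) >= target
        lo *= 0.5
    while f(hi) > target:  # move the right end up until f(hi) <= target
        hi *= 2.0
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def inverse_levy_jumps_half(alpha, tau, arrivals):
    """Map Poisson arrival times to jumps by inverting the tail mass (sigma = 1/2)."""
    return [invert_decreasing(lambda x: ggp_tail_half(x, alpha, tau), xi)
            for xi in arrivals]


print(inverse_levy_jumps_half(1.0, 1.0, [0.5, 1.7, 3.2]))
```

Bisection in log-space is chosen because the jumps span many orders of magnitude. For large arrival times the solution approaches the closed-form stable jump, which gives a cheap consistency check.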
Fig. 2 shows the approximate errors for gamma and inverse gamma arrival times for SP, and gamma arrival times for GGP. We observe that the variances quickly approach zero, except for inverse gamma arrival times in the case , for which our theory predicts infinite variance.
Fig. 3 shows the value of constants for gamma and inverse gamma arrival times with varying and values. Note that lower implies lower expected error by 5.2. We found that gamma arrival times exhibit lower when , and inverse gamma has lower when . This observation is empirically confirmed in Fig. 4.
Finally, we compared the approximate error to asymptotic value of . In case of gamma arrival times for GGP, according to 5.2, we have
[TABLE]
Fig. 5 compares empirical approximate errors with to (64) with different values of . We fixed here. One can see that the approximate error quickly approaches the asymptotic error.
Appendix G Example on normalized GGP mixture models
Consider a hierarchical Bayesian model
[TABLE]
We approximate the infinite-dimensional process with the finite iid process . Then, the rest of the model can be rewritten as
[TABLE]
We construct via gamma arrival times with . Using B.2, we have
[TABLE]
Note that we used the function $f(n)=\big(\frac{\sigma\Gamma(\kappa)\Gamma(1-\sigma)n}{\alpha\kappa^{\sigma}\Gamma(\kappa-\sigma)}\big)^{1/\sigma}$ in place of , thus both evaluation of the pdf and sampling can be done without any numerical approximation. The joint density of the mixture model is then written as
[TABLE]
where and are the density for and , , and . Now we are free to use any posterior inference algorithm, such as variational inference or stochastic gradient MCMC as in [39].
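As a small illustration, the closed-form function f(n) quoted above can be evaluated in the log domain so that large values of kappa do not overflow the gamma function. This is a sketch only. The name `f_n` is hypothetical, and the parameter checks (alpha > 0, 0 < sigma < 1, kappa > sigma, n > 0) are assumptions chosen so that every gamma function argument is positive.

```python
import math


def f_n(n, alpha, sigma, kappa):
    """Evaluate (sigma*Gamma(kappa)*Gamma(1-sigma)*n
    / (alpha*kappa^sigma*Gamma(kappa-sigma)))^(1/sigma) via log-gamma."""
    if not (n > 0 and alpha > 0.0 and 0.0 < sigma < 1.0 and kappa > sigma):
        raise ValueError("expected n > 0, alpha > 0, 0 < sigma < 1, kappa > sigma")
    log_ratio = (
        math.log(sigma)
        + math.lgamma(kappa)
        + math.lgamma(1.0 - sigma)
        + math.log(n)
        - math.log(alpha)
        - sigma * math.log(kappa)
        - math.lgamma(kappa - sigma)
    )
    return math.exp(log_ratio / sigma)


print(f_n(n=100, alpha=1.0, sigma=0.5, kappa=2.0))
```

Working with `math.lgamma` avoids the overflow that `math.gamma` raises for arguments above roughly 171.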
- [1] R. Argiento, I. Bianchini, and A. Guglielmi. A blocked Gibbs sampler for NGG-mixture models via a priori truncation. Statistics and Computing, 26(3):641–661, 2016.
- [2] R. Argiento, I. Bianchini, and A. Guglielmi. Posterior sampling from ε-approximation of normalized completely random measure mixtures. Electronic Journal of Statistics, 10(2):3516–3547, 2016.
- [3] F. Ayed and F. Caron. Nonnegative Bayesian nonparametric factor models with completely random measures for community detection. arXiv:1902.10693, 2019.
- [4] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342, 2010.
- [5] F. Ayed, J. Lee, and F. Caron. Beyond the Chinese restaurant and Pitman-Yor processes: statistical models with double power-law behavior. arXiv:1902.04714, 2019.
- [6] J. Arbel and I. Prünster. A moment-matching Ferguson & Klass algorithm. Statistics and Computing, 27(1):3–17, 2017.
- [7] J. Bertoin, T. Fujita, B. Roynette, and M. Yor. On a particular class of self-decomposable random variables: the durations of Bessel excursions straddling independent exponential times. Probability and Mathematical Statistics, 26(2):315–366, 2006.
- [8] N. H. Bingham, C. M. Goldie, and J. L. Teugels. Regular variation, volume 27. Cambridge University Press, 1989.
