A unified construction for series representations and finite approximations of completely random measures

Juho Lee; Xenia Miscouridou; François Caron

arXiv:1905.10733·math.ST·February 6, 2025


TL;DR

This paper introduces a unified framework for deriving series representations and finite approximations of completely random measures, enhancing scalability and simulation in Bayesian nonparametrics.

Contribution

It extends existing constructions to include new series representations for important CRMs like the generalized gamma and stable beta processes.

Findings

1. Includes known and novel series representations for CRMs.

2. Provides analysis of truncation errors in approximations.

3. Enables scalable inference in complex Bayesian models.


Tables

Table 1: Kernels, Mellin transforms and asymptotic constants for different arrival time distributions

Name	$\Lambda_{w}(t)$	$k(x)$	$\check{k}(-z)$	$k^{\prime}(x)$	$-\check{k}^{\prime}(-z)$	$C_{1}(\sigma)$
Deterministic	$\mathds{1}_{\{t\geq 1/w\}}$	$\mathds{1}_{\{x\leq 1\}}$	$z^{-1}$	–	–	$(1-\sigma)^{-1}$
Exponential	$1-e^{-wt}$	$e^{-x}$	$\Gamma(z)$	$-e^{-x}$	$\Gamma(z)$	$\Gamma(1-\sigma)^{1/\sigma}$
Gamma	$1-\frac{\Gamma(\kappa,\kappa wt)}{\Gamma(\kappa)}$	$\frac{\Gamma(\kappa,x\kappa)}{\Gamma(\kappa)}$	$\frac{\Gamma(z+\kappa)}{z\Gamma(\kappa)\kappa^{z}}$	$\frac{-\kappa^{\kappa}x^{\kappa-1}e^{-\kappa x}}{\Gamma(\kappa)}$	$\frac{\kappa^{1-z}\Gamma(\kappa+z-1)}{\Gamma(\kappa)}$	$\frac{(\kappa-\sigma)\Gamma(\kappa-\sigma)^{1/\sigma}}{(1-\sigma)\Gamma(\kappa)^{1/\sigma}}$
Inv. gamma	$\frac{\Gamma(\kappa,\kappa/(wt))}{\Gamma(\kappa)}$	$\frac{\gamma(\kappa,\kappa/x)}{\Gamma(\kappa)}$	$\frac{\Gamma(\kappa-z)}{z\Gamma(\kappa)\kappa^{-z}}$	$-\frac{\kappa^{\kappa}e^{-\kappa/x}}{x^{\kappa+1}\Gamma(\kappa)}$	$\frac{\kappa^{z-1}\Gamma(-z+\kappa+1)}{\Gamma(\kappa)}$	$\frac{\Gamma(\kappa+\sigma)^{1/\sigma}}{(1-\sigma)(\kappa+\sigma-1)\Gamma(\kappa)^{1/\sigma}}$
Gen. Pareto	$1-(1+wt)^{-c}$	$(1+x)^{-c}$	$B(z,c-z)$	$-c(1+x)^{-c-1}$	$cB(z,c+1-z)$	$\frac{B(1-\sigma,c+\sigma-1)}{(cB(1-\sigma,c+\sigma))^{1-1/\sigma}}$

Equations

$$G=\sum_{i=1}^{\infty}W_{i}\delta_{\theta_{i}},\quad\text{where }W_{i}=\prod_{j=1}^{i}\beta_{j},\quad\beta_{i}\overset{\text{iid}}{\sim}\operatorname{Beta}(\alpha,1),\quad\theta_{i}\overset{\text{iid}}{\sim}H,\quad i=1,2,\ldots$$

$$G_{n}=\sum_{i=1}^{n}W_{n,i}\delta_{\theta_{n,i}},\quad\text{where }W_{n,i}\overset{\text{iid}}{\sim}\operatorname{Beta}(\alpha/n,1),\quad\theta_{n,i}\overset{\text{iid}}{\sim}H,\quad i=1,\ldots,n.$$

$$\int_{0}^{\infty}(1-e^{-w})\rho(dw)<\infty\quad\text{and}\quad\int_{0}^{\infty}\rho(dw)=\infty$$

$$G=\int_{0}^{\infty}wN(dw,d\theta)$$

$$\rho(dw)=\frac{\alpha}{\Gamma(1-\sigma)}w^{-1-\sigma}e^{-\tau w}dw$$

$$\rho(dw)=\frac{\alpha}{B(1-\sigma,c+\sigma)}w^{-\sigma-1}(1-w)^{c+\sigma-1}\mathds{1}_{\{0<w<1\}}dw,$$

$$G=\sum_{i=1}^{\infty}W_{i}\delta_{\theta_{i}}$$

$$G_{n}$$

$$\overline{\rho}(x)=\int_{x}^{\infty}\rho(dw)$$

$$W_{i}=\overline{\rho}^{-1}(\xi_{i}).$$

$$\mathbb{P}(\overline{W}_{n,i}\in dw\mid T_{n+1}=t_{n+1})=\frac{(1-e^{-wt_{n+1}})\rho(dw)}{\Psi(t_{n+1})}.$$

$$\Pr(0<T_{1}\leq T_{2}\leq T_{3}\leq\ldots\mid G)=\prod_{k=1}^{\infty}\frac{W_{k}}{\sum_{j\geq k}W_{j}}.$$

$$\int_{0}^{\infty}\Lambda_{w}(t)\rho(dw)<\infty,\qquad\Lambda_{w_{2}}(t)\leq\Lambda_{w_{1}}(t)\text{ for all }0<w_{2}\leq w_{1}.$$

$$\phi_{t}(dw)=\frac{\lambda_{w}(t)\rho(dw)}{\psi(t)}\quad\text{and}\quad\varphi_{t}(dw)=\frac{\Lambda_{w}(t)\rho(dw)}{\Psi(t)}$$

$$\Psi(t)=\int_{0}^{\infty}\Lambda_{w}(t)\rho(dw)\quad\text{and}\quad\psi(t)=\int_{0}^{\infty}\lambda_{w}(t)\rho(dw).$$

$$T_{i}=\Psi^{-1}(\xi_{i}),\quad W_{i}\mid T_{i}=t_{i}\sim\phi_{t_{i}}\quad\text{and}\quad\theta_{i}\mid W_{i}=w_{i}\sim\mu_{w_{i}}.$$

$$T_{n+1}$$

$$\widetilde{G}_{n}=\sum_{i=1}^{n}\widetilde{W}_{n,i}\delta_{\widetilde{\theta}_{n,i}}\quad\text{where }\widetilde{W}_{n,i}\overset{\text{iid}}{\sim}\widetilde{\varphi}_{n}\text{ and}$$

$$\widetilde{\varphi}_{n}(dw)=\varphi_{\Psi^{-1}(n)}(dw)$$

$$\Lambda_{w}(t)$$

$$\phi_{t}(dw)$$

$$W_{i}\mid\xi_{i}\sim\operatorname{Gamma}\left(1-\sigma,(\sigma\xi_{i}/\alpha+\tau^{\sigma})^{1/\sigma}\right).$$

$$\varphi_{t}(dw)=\frac{\sigma w^{-1-\sigma}e^{-\tau w}(1-e^{-tw})}{\Gamma(1-\sigma)((t+\tau)^{\sigma}-\tau^{\sigma})}dw.$$

$$\lambda_{w}(dt)=\frac{t^{\kappa-1}e^{-\kappa wt}(\kappa w)^{\kappa}}{\Gamma(\kappa)}dt,\qquad\Lambda_{w}(t)=\frac{\gamma(\kappa,\kappa wt)}{\Gamma(\kappa)}.$$

$$\psi(t)=\eta\frac{t^{\kappa-1}}{(t+\tau/\kappa)^{\kappa-\sigma}},\qquad\Psi(t)=\begin{cases}\eta\left(\frac{\tau}{\kappa}\right)^{\sigma}B_{\frac{\kappa t}{\kappa t+\tau}}(\kappa,-\sigma)&\text{if }\tau>0\\ \frac{\eta}{\sigma}t^{\sigma}&\text{if }\tau=0\end{cases}$$

$$\phi_{t}(dw)=\operatorname{Gamma}(w;\kappa-\sigma,\kappa t+\tau)dw,\qquad\varphi_{t}(dw)\propto w^{-\sigma-1}e^{-\tau w}\gamma(\kappa,\kappa tw)dw.$$

$$\phi_{t}(dw)$$

$$\varphi_{t}(dw)$$

$$R_{n}=N(f)-N_{n}(f)=\sum_{i>n}f(W_{i},\theta_{i}).$$

$$\mathbb{E}[e^{-\lambda R_{n}}]=\mathbb{E}_{\xi}\bigg[\exp\bigg(-\int_{S}(1-e^{-\lambda f(w,\theta)})(1-\Lambda_{w}(\Psi^{-1}(\xi)))\rho(dw)\mu_{w}(d\theta)\bigg)\bigg].$$

$$\mathbb{E}[R_{n}\mid T_{n+1}]=\int_{0}^{\infty}w(1-\Lambda_{w}(T_{n+1}))\rho(dw),\qquad\mathbb{V}[R_{n}\mid T_{n+1}]=\int_{0}^{\infty}w^{2}(1-\Lambda_{w}(T_{n+1}))\rho(dw).$$

$$\rho(w)\sim\zeta_{0}w^{-1-\sigma}\text{ as }w\to 0$$


Taxonomy

Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Statistical Methods and Bayesian Inference

Full text

A unified construction for series representations and finite approximations of completely random measures

Juho Lee (corresponding author)

AITRICS, Seoul, South Korea

Xenia Miscouridou

Department of Statistics, University of Oxford, Oxford, United Kingdom

François Caron

Department of Statistics, University of Oxford, Oxford, United Kingdom

Abstract

Infinite-activity completely random measures (CRMs) have become important building blocks of complex Bayesian nonparametric models. They have been successfully used in various applications such as clustering, density estimation, latent feature models, survival analysis or network science. Popular infinite-activity CRMs include the (generalized) gamma process and the (stable) beta process. However, except in some specific cases, exact simulation or scalable inference with these models is challenging and finite-dimensional approximations are often considered. In this work, we propose a general and unified framework to derive both series representations and finite-dimensional approximations of CRMs. Our framework can be seen as an extension of constructions based on size-biased sampling of Poisson point process [46]. It includes as special cases several known series representations as well as novel ones. In particular, we show that one can get novel series representations for the generalized gamma process and the stable beta process. We also provide some analysis of the truncation error.

1 Introduction

Infinite-activity completely random measures (CRMs), and more generally functionals of infinite-activity Poisson random measures arise as building blocks of numerous modern structured statistical models. Examples include clustering and density estimation [49, 32, 33], spatial statistics [13, 56, 41], latent factor/trait models [24, 44, 59, 16, 3], network modeling [14, 60, 15, 17], recommendation systems [26], prediction, risk management and option pricing of financial assets [19] or survival analysis [28, 42]; see [40] for a review. Popular CRMs include the (generalized) gamma random measure (also known as tempered stable) [30, 13] or the (stable) beta random measure [28, 54]. Other popular random probability measures such as the Dirichlet process or the Pitman-Yor process are obtained by normalization or transformation of CRMs.

The use of statistical models based on infinite-activity CRMs poses a number of practical challenges regarding posterior inference and estimation. Except in some specific cases, most algorithms, either based on Gibbs sampling [31, 60], slice sampling [27, 23], mean-field variational inference [9, 21, 39] or sequential Monte Carlo [11],[4, Section 3.2.], require the use of a finite-dimensional approximation of the CRM. Finite-dimensional approximations can either be obtained by (i) truncating a series representation of the CRM, with stochastically decreasing weights, or (ii) by considering a finite measure with $n$ atoms and iid weights, converging in distribution to the CRM as $n$ tends to infinity. For example, for the beta process with scale parameter $\alpha>0$ and probability distribution $H$ , the inverse Lévy series representation is [55]

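$$G=\sum_{i=1}^{\infty}W_{i}\delta_{\theta_{i}},\quad\text{where }W_{i}=\prod_{j=1}^{i}\beta_{j},\quad\beta_{i}\overset{\text{iid}}{\sim}\operatorname{Beta}(\alpha,1),\quad\theta_{i}\overset{\text{iid}}{\sim}H,\quad i=1,2,\ldots\tag{1}$$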

A classical finite-dimensional approximation with iid weights is [24]

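$$G_{n}=\sum_{i=1}^{n}W_{n,i}\delta_{\theta_{n,i}},\quad\text{where }W_{n,i}\overset{\text{iid}}{\sim}\operatorname{Beta}(\alpha/n,1),\quad\theta_{n,i}\overset{\text{iid}}{\sim}H,\quad i=1,\ldots,n.\tag{2}$$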
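As a minimal sketch of the two beta process approximations just described (the product-of-betas series, $W_{i}=\prod_{j\leq i}\beta_{j}$ with $\beta_{j}\sim\operatorname{Beta}(\alpha,1)$, and the iid weights $W_{n,i}\sim\operatorname{Beta}(\alpha/n,1)$), the following Python snippet samples only the weights. The function names are illustrative and not taken from the paper; the atoms $\theta_{i}$ would be drawn separately from $H$.

```python
import random


def beta_process_series(alpha, n, rng):
    """First n weights of the series W_i = prod_{j<=i} beta_j, beta_j ~ Beta(alpha, 1)."""
    weights, w = [], 1.0
    for _ in range(n):
        w *= rng.betavariate(alpha, 1.0)  # multiply by a fresh Beta(alpha, 1) factor
        weights.append(w)
    return weights


def beta_process_iid(alpha, n, rng):
    """n iid weights W_{n,i} ~ Beta(alpha / n, 1)."""
    return [rng.betavariate(alpha / n, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(beta_process_series(2.0, 5, rng))
    print(beta_process_iid(2.0, 5, rng))
```

The series weights are non-increasing by construction, whereas the iid weights carry no ordering.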

Both representations are routinely used in Markov chain Monte Carlo and variational Bayes approximate inference algorithms [21, 45, 43]. Series and iid approximations are similarly used for the gamma process [60, 48] and the generalized gamma process [39]. Since the early work of [35] and [22] on the so-called inverse Lévy representation, various generic series representations of Poisson random measures have been proposed [12, 46, 50, 52]. Nested series representations have also been recently proposed for some specific CRMs [45, 43, 48]. Finite iid representations can be obtained using the infinite divisibility properties of the CRM [37] but as noted by [39], it generally does not lead to tractable representations, except in the gamma process case. Other ways of obtaining iid constructions are described in [29] for some family of CRMs. [18] provided a recent survey of the existing series representations and approximations as well as a truncation analysis.

The objective of this article is to present a general framework to obtain both series and iid representations of CRMs. Our construction builds on the definition of a Poisson random measure on an extended space; it generalizes the size-biased approach of [46], and admits as special cases both the size-biased and inverse-Lévy representations. We show that under this construction, one can draw connections between existing series and iid representations that appeared unrelated, and that it allows one to derive new series and iid representations. More precisely, we show how the iid representation of [39] is related to the size-biased construction of [46], and we derive novel series and iid representations of the generalized gamma and stable beta random measures. We also provide an asymptotic analysis of the truncation error for this class of approximations.

This article is organised as follows. In Section 2 we provide background material on completely random measures and some existing series representations for CRMs and describe the objectives. Section 3 describes the general construction for obtaining series and iid representations of CRMs. In Section 4 we describe a number of specific constructions, showing how one recovers some existing constructions as a particular case of our framework. In Section 5 we provide an analysis of the asymptotic truncation error, and discuss related approaches in Section 6. The proofs and additional background material are provided in the appendix.

Notations. For a measure $\nu$ on $S$ and a positive measurable function $h$ on $S$ , write $\nu(h)=\int_{S}h(x)\nu(dx)$ . Let $(\xi_{1},\xi_{2},\ldots)$ be the ordered points of a unit-rate Poisson point process on $(0,\infty)$ , that is $\xi_{1},\xi_{2}-\xi_{1},\xi_{3}-\xi_{2},\ldots$ are iid unit-rate exponential random variables. With a slight abuse of notation, we use the same notation for the distribution of a random variable and its pdf. For instance, the probability density function (pdf) of a gamma random variable $X\sim\operatorname{Gamma}(a,b)$ is written as $\operatorname{Gamma}(x;a,b)$ .

2 Background

2.1 Completely random measures

Let $(S,\mathcal{S})$ be a measurable space where $S=(0,\infty)\times\Theta$ . For any point $X=(W,\theta)\in S$ , we refer to $W>0$ as the size of $X$ . Let $N$ be a Poisson random measure on $S$ with mean measure $\nu(dw,d\theta)=\rho(dw)\mu_{w}(d\theta)$ where $\rho$ is a Borel measure on $(0,\infty)$ , called size measure, satisfying

$$\int_{0}^{\infty}(1-e^{-w})\rho(dw)<\infty\quad\text{and}\quad\int_{0}^{\infty}\rho(dw)=\infty\tag{3}$$

and $\mu_{w}(\cdot)$ is a Markov probability kernel from $(0,\infty)$ to $\Theta$ . The linear functional

$$G=\int_{0}^{\infty}wN(dw,d\theta)$$

is an infinite-activity completely random measure [36] on $\Theta$ with random weights and random atoms. We write $G\sim\operatorname*{CRM}(\rho,\mu_{w})$ . The conditions (3) imply that the atomic random measures $N$ and $G$ have an infinite number of atoms, and that $G(\Theta)$ is almost surely finite. If $\mu_{w}=H$ does not depend on $w$ , the CRM is said to be homogeneous. We assume in the rest of this article that one can easily simulate from $\mu_{w}$ (or $H$ ) and/or that it admits a tractable density with respect to some reference measure (e.g. Lebesgue). Two popular examples of CRMs are the generalized gamma process (GGP) [30, 13], also known as the (exponentially) tilted stable process, with size measure

$$\rho(dw)=\frac{\alpha}{\Gamma(1-\sigma)}w^{-1-\sigma}e^{-\tau w}dw\tag{4}$$

where $\alpha>0$ , $\sigma\in(0,1)$ and $\tau\geq 0$ , or $\sigma\leq 0$ and $\tau>0$ , and the stable beta process (SBP) [28, 54] with

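$$\rho(dw)=\frac{\alpha}{B(1-\sigma,c+\sigma)}w^{-\sigma-1}(1-w)^{c+\sigma-1}\mathds{1}_{\{0<w<1\}}dw,\tag{5}$$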

where $\sigma\in(-\infty,1)$ , $c>-\sigma$ , $\alpha>0$ and $B(\cdot,\cdot)$ is the beta function. When $\sigma\geq 0$ , both random measures are infinite-activity.

Remark 2.1.

The constructions described in this paper hold more generally when the first condition in Equation (3) is not satisfied, but $\int_{x}^{\infty}\rho(dw)<\infty$ for all $x>0$ . Note that in this case $G(\Theta)=\infty$ almost surely. An example of this more general case is given in Section 4.4 where $\rho(dw)=w^{-2}dw$ .

2.2 Objective

Our objective is to derive general series representations for the Poisson random measure $N$ , or equivalently the CRM $G$ , of the form

$$G=\sum_{i=1}^{\infty}W_{i}\delta_{\theta_{i}}$$

where the sizes $(W_{1},W_{2},\ldots)$ , are stochastically ordered. That is, for any $w>0$ , $\Pr(W_{i+1}>w)\leq\Pr(W_{i}>w).$ We write $W_{1}\succeq W_{2}\succeq\ldots$ and $X_{i}=(W_{i},\theta_{i})$ . Denote $G_{n}$ the measure obtained by truncating the above series after $n$ points

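$$G_{n}=\sum_{i=1}^{n}W_{i}\delta_{\theta_{i}}=\sum_{i=1}^{n}\overline{W}_{n,i}\delta_{\overline{\theta}_{n,i}}\tag{7}$$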

where $(\overline{X}_{n,1},\ldots,\overline{X}_{n,n})$ is a finitely exchangeable random sequence defined by $\overline{X}_{n,i}=X_{\pi_{n}(i)}$ where $\pi_{n}$ is a random permutation of the set $\{1,\ldots,n\}$ . We will refer to the sequence $(X_{1},\ldots,X_{n})$ (or $(W_{1},\ldots,W_{n})$ ) as the sequential truncated representation, and $(\overline{X}_{n,1},\ldots,\overline{X}_{n,n})$ (or $(\overline{W}_{n,1},\ldots,\overline{W}_{n,n})$ ) as the exchangeable truncated representation. In Section 3 we will show that the exchangeable truncated representation can be approximated by a finite iid representation, which will be denoted $(\widetilde{X}_{n,1},\ldots,\widetilde{X}_{n,n})$ .

In the rest of this paper, we will assume that the mean measure $\rho$ is available and that one can sample from the conditional distribution $\mu_{w}$ (or $H$ in the homogeneous case). Under these conditions, we can obtain the representations (7) by first sampling $(W_{1},\ldots,W_{n})$ (or $(\overline{W}_{n,1},\ldots,\overline{W}_{n,n})$ ), then conditionally sampling $(X_{1},\ldots,X_{n})$ (or $(\overline{X}_{n,1},\ldots,\overline{X}_{n,n})$ ) from $\mu_{w}$ (or $H$ ).

2.3 Existing representations of CRMs

Inverse-Lévy representation. For any $x>0$ , let

$$\overline{\rho}(x)=\int_{x}^{\infty}\rho(dw)$$

be the tail intensity of the size measure $\rho$ , and denote by $\overline{\rho}^{-1}(y)=\inf\{x>0\mid\overline{\rho}(x)\leq y\}$ its generalized inverse. The inverse Lévy representation [35, 22] is given by

$$W_{i}=\overline{\rho}^{-1}(\xi_{i}).$$

In this case, the sizes are ordered $W_{1}\geq W_{2}\geq\ldots$ and the representation therefore leads to the best possible approximation in terms of the sizes. While this representation has been used in many applications [57, 42, 27, 10, 1, 6], its main limitation is that $\overline{\rho}^{-1}$ is in general intractable. Two exceptions are the beta random measure, whose inverse Lévy representation is given by Equation (1), and the stable random measure (corresponding to the measure (4) with $\sigma\in(0,1)$ and $\tau=0$ ) where the inverse Lévy representation is given by $W_{i}=\left(\frac{\xi_{i}\sigma\Gamma(1-\sigma)}{\alpha}\right)^{-1/\sigma}.$
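For the stable random measure, the closed-form inverse can be checked directly: the tail intensity is $\overline{\rho}(x)=\alpha x^{-\sigma}/(\sigma\Gamma(1-\sigma))$, and applying it to $W_{i}$ should return $\xi_{i}$. The Python sketch below illustrates this; the helper names are hypothetical and the parameter values arbitrary.

```python
import math
import random


def unit_rate_poisson_points(n, rng):
    """First n ordered points xi_1 < xi_2 < ... of a unit-rate Poisson process."""
    points, total = [], 0.0
    for _ in range(n):
        total += rng.expovariate(1.0)  # iid unit-rate exponential gaps
        points.append(total)
    return points


def stable_tail_intensity(x, alpha, sigma):
    """Tail intensity of measure (4) with tau = 0: alpha x^{-sigma} / (sigma Gamma(1-sigma))."""
    return alpha * x ** (-sigma) / (sigma * math.gamma(1.0 - sigma))


def stable_inverse_levy_weights(alpha, sigma, xis):
    """W_i = (xi_i sigma Gamma(1-sigma) / alpha)^{-1/sigma}."""
    scale = sigma * math.gamma(1.0 - sigma) / alpha
    return [(xi * scale) ** (-1.0 / sigma) for xi in xis]


if __name__ == "__main__":
    rng = random.Random(0)
    xis = unit_rate_poisson_points(5, rng)
    print(stable_inverse_levy_weights(1.5, 0.4, xis))
```

Because the $\xi_{i}$ are increasing and the map is decreasing, the resulting weights come out in strictly decreasing order.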

Size-biased representation. The size-biased sequential and exchangeable representations $(W_{1},\ldots,W_{n})$ and $(\overline{W}_{n,1},\ldots,\overline{W}_{n,n})$ , introduced by [46, Section 4], are given as follows (note that this is different from what [18] call a size-biased representation). Let $0<T_{1}\leq T_{2}\leq\ldots$ be defined as $T_{i}=\Psi^{-1}(\xi_{i})$ where $\Psi^{-1}$ is the generalized inverse of the Laplace exponent $\Psi(t)=\int_{0}^{\infty}(1-e^{-wt})\rho(dw)$ and $\mathbb{P}(W_{i}\in dw\mid T_{i}=t)=\frac{we^{-wt}\rho(dw)}{\psi(t)}$ where $\psi(t)=\Psi^{\prime}(t)=\int_{0}^{\infty}we^{-wt}\rho(dw).$ Additionally, given $T_{n+1}=t_{n+1}$ , the sizes $\overline{W}_{n,1},\ldots,\overline{W}_{n,n}$ are iid with distribution

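$$\mathbb{P}(\overline{W}_{n,i}\in dw\mid T_{n+1}=t_{n+1})=\frac{(1-e^{-wt_{n+1}})\rho(dw)}{\Psi(t_{n+1})}.$$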

The term size-biased comes from the fact that the atoms are ordered by successively sampling without replacement according to their size $W$

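$$\Pr(0<T_{1}\leq T_{2}\leq T_{3}\leq\ldots\mid G)=\prod_{k=1}^{\infty}\frac{W_{k}}{\sum_{j\geq k}W_{j}}.$$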
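The size-biased ordering can be illustrated on a toy example with finitely many atoms, which is an assumption made purely for illustration since the CRMs considered here have infinitely many. If each atom with weight $w$ receives an independent exponential arrival time with rate $w$, the atom arriving first is atom $k$ with probability $w_{k}/\sum_{j}w_{j}$. The Python sketch below estimates these frequencies by simulation; the function name is hypothetical.

```python
import random


def first_arrival_frequencies(weights, n_trials, rng):
    """Empirical frequency with which each atom has the smallest exponential arrival time."""
    counts = [0] * len(weights)
    for _ in range(n_trials):
        # Atom with weight w gets an Exponential(rate = w) arrival time.
        times = [rng.expovariate(w) for w in weights]
        counts[times.index(min(times))] += 1
    return [c / n_trials for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [3.0, 2.0, 1.0]
    print(first_arrival_frequencies(weights, 20000, rng))
    print([w / sum(weights) for w in weights])  # size-biased probabilities for comparison
```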

In the case of the gamma random measure, which corresponds to Equation (4) with $\sigma=0$ , [46] show that the series representation corresponds to [12]’s representation and is given by $T_{i}=\Psi^{-1}(\xi_{i})=\tau(e^{\xi_{i}/\alpha}-1)$ and $W_{i}\mid T_{i}=t_{i}\sim\operatorname{Gamma}(1,\tau+t_{i})$ .
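A minimal Python sketch of this gamma process series follows, assuming that the second argument of $\operatorname{Gamma}(1,\tau+t_{i})$ is a rate, so that $W_{i}$ is exponential with rate $\tau+t_{i}$. The function name and parameter values are illustrative. As a sanity check, the expected total mass of the gamma random measure is $\int_{0}^{\infty}w\rho(dw)=\alpha/\tau$.

```python
import math
import random


def gamma_process_size_biased(alpha, tau, n, rng):
    """First n size-biased weights of the gamma random measure (sigma = 0 in (4))."""
    weights, xi = [], 0.0
    for _ in range(n):
        xi += rng.expovariate(1.0)              # next point of the unit-rate Poisson process
        t = tau * math.expm1(xi / alpha)        # T_i = tau (exp(xi_i / alpha) - 1)
        weights.append(rng.expovariate(tau + t))  # Gamma(1, rate) is Exponential(rate)
    return weights


if __name__ == "__main__":
    rng = random.Random(0)
    totals = [sum(gamma_process_size_biased(2.0, 1.0, 60, rng)) for _ in range(3000)]
    print(sum(totals) / len(totals))  # should be close to alpha / tau
```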

3 Series representations and finite approximations of CRMs

3.1 Arrival-time augmentation

Let $\lambda_{w}(dt)$ be some Markov probability kernel from $(0,\infty)$ to $(0,\infty)$ with cdf $\Lambda_{w}(t)=\int_{0}^{t}\lambda_{w}(du)$ satisfying, for any $t>0$

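$$\int_{0}^{\infty}\Lambda_{w}(t)\rho(dw)<\infty,\qquad\Lambda_{w_{2}}(t)\leq\Lambda_{w_{1}}(t)\text{ for all }0<w_{2}\leq w_{1}.\tag{9}$$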

That is, if $T_{1}\sim\lambda_{w_{1}}$ and $T_{2}\sim\lambda_{w_{2}}$ , with $0<w_{2}\leq w_{1}$ , then $T_{2}\succeq T_{1}$ .

We consider a Poisson random measure $N^{\prime}$ on the augmented space $S^{\prime}=(0,\infty)\times\Theta\times(0,\infty)$ with mean measure $\nu^{\prime}(dw,d\theta,dt)=\rho(dw)\mu_{w}(d\theta)\lambda_{w}(dt)$ . For a point $X^{\prime}=(W,\theta,T)\in S^{\prime}$ , we refer to $T$ as the arrival time of the point $X^{\prime}$ . Indeed, the second condition in Equation (9) ensures that points with larger size $W$ are more likely to have a smaller arrival time $T$ . We may therefore consider the following analogy: atoms of the Poisson random measure are enrolled in a race, each atom having a strength $W$ , and stronger atoms are more likely to finish faster and therefore have a smaller $T$ . The first condition in Equation (9) ensures that $N^{\prime}(\mathbb{R}_{+},\Theta,(0,t))<\infty$ for any $t>0$ , hence we can order the arrival times. Let $0<T_{1}\leq T_{2}\leq\ldots$ denote the sequence of ordered arrival times, and consider the augmented sequential representation $N^{\prime}=\sum_{i=1}^{\infty}\delta_{(W_{i},\theta_{i},T_{i})}$ where $X_{i}=(W_{i},\theta_{i})$ , $i\geq 1$ , are the associated sizes and locations. By the restriction theorem [38], $N=\sum_{i=1}^{\infty}\delta_{(W_{i},\theta_{i})}$ is a Poisson random measure with mean $\rho(dw)\mu_{w}(d\theta)$ and $G=\int_{0}^{\infty}wN(dw,d\theta)\sim\operatorname*{CRM}(\rho,\mu_{w})$ . We now give the general definitions of the sequential, exchangeable and iid representations of the CRM associated to the arrival time kernel $\lambda_{w}$ . For simplicity of presentation, we assume that for any $w$ , $\lambda_{w}$ is absolutely continuous with respect to the Lebesgue measure with $\lambda_{w}(dt)=\lambda_{w}(t)dt$ , but one can also consider discontinuous cdfs $\Lambda_{w}$ ; see Section 4.1 for an example.

3.2 Series and truncated exchangeable constructions

Theorem 3.1.

Let $\lambda_{w}$ be a parametric distribution on $(0,\infty)$ with parameter $w>0$ and $\Lambda_{w}$ be the associated parametric cumulative distribution function (cdf) satisfying condition (9). Consider the conditional distributions

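$$\phi_{t}(dw)=\frac{\lambda_{w}(t)\rho(dw)}{\psi(t)}\quad\text{and}\quad\varphi_{t}(dw)=\frac{\Lambda_{w}(t)\rho(dw)}{\Psi(t)}$$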

where

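$$\Psi(t)=\int_{0}^{\infty}\Lambda_{w}(t)\rho(dw)\quad\text{and}\quad\psi(t)=\int_{0}^{\infty}\lambda_{w}(t)\rho(dw).$$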

The sequential construction $G=\sum_{i=1}^{\infty}W_{i}\delta_{\theta_{i}}$ is obtained as follows, for $i\geq 1$

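$$T_{i}=\Psi^{-1}(\xi_{i}),\quad W_{i}\mid T_{i}=t_{i}\sim\phi_{t_{i}}\quad\text{and}\quad\theta_{i}\mid W_{i}=w_{i}\sim\mu_{w_{i}}.$$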

The truncated exchangeable construction $G_{n}=\sum_{i=1}^{n}\overline{W}_{n,i}\delta_{\overline{\theta}_{n,i}}$ is obtained, for $i=1,\ldots,n$ by

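$$T_{n+1}=\Psi^{-1}(\xi_{n+1}),\quad\overline{W}_{n,i}\mid T_{n+1}=t_{n+1}\overset{\text{iid}}{\sim}\varphi_{t_{n+1}}\quad\text{and}\quad\overline{\theta}_{n,i}\mid\overline{W}_{n,i}=w\sim\mu_{w}.$$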

3.3 Finite iid construction

Note that ${\xi_{n+1}}/{n}$ tends to 1 almost surely as $n$ tends to infinity. This therefore suggests the following finite iid construction, as an approximation to the truncated measure $G_{n}$

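$$\widetilde{G}_{n}=\sum_{i=1}^{n}\widetilde{W}_{n,i}\delta_{\widetilde{\theta}_{n,i}}\quad\text{where }\widetilde{W}_{n,i}\overset{\text{iid}}{\sim}\widetilde{\varphi}_{n}\text{ and}\tag{13}$$

$$\widetilde{\varphi}_{n}(dw)=\varphi_{\Psi^{-1}(n)}(dw).\tag{14}$$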

Proposition 3.1.

Let $\widetilde{G}_{n}$ be the finite iid approximation defined by Equation (13). Then $\widetilde{G}_{n}$ converges in distribution to $G\sim\operatorname*{CRM}(\rho,\mu_{w})$ as $n\rightarrow\infty$ .

For the iid construction, one needs to evaluate $\Psi^{-1}(n)$ only once, and this can be done numerically if there is an analytic form for $\Psi$ . Instead of the distribution $\widetilde{\varphi}_{n}=\varphi_{\Psi^{-1}(n)}$ , we can alternatively use more general distributions $\widetilde{\varphi}_{n}=\varphi_{f(n)}$ where $f$ is an increasing function such that $\Psi(f(n))\sim n\text{ as }n\to\infty$ . Proposition 3.1 also holds in this case, as the proof can be straightforwardly adapted. Note that if $\Psi(t)\sim ct^{\sigma}$ as $t$ tends to infinity for some constant $c$ and $\sigma>0$ , then we can take $f(n)=(n/c)^{1/\sigma}$ . B.2 gives examples of admissible functions $f$ under generic assumptions on $\rho$ and $\Lambda_{w}$ .

4 Examples

We first show how the inverse Lévy and size-biased constructions described in Section 2.3 can be recovered as special cases of the general construction introduced in Section 3. We then derive novel constructions within this framework.

4.1 Deterministic arrival times (inverse-Lévy construction)

Assume that the arrival times are deterministic given the size $W=w$ , and inversely proportional to it, that is $\lambda_{w}(dt)=\delta_{1/w}(dt).$ The distribution does not admit a density with respect to the Lebesgue measure, but one can still obtain expressions for the different quantities of interest. We obtain

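$$\Lambda_{w}(t)=\mathds{1}_{\{t\geq 1/w\}},\qquad\Psi(t)=\overline{\rho}(1/t),$$

$$\phi_{t}(dw)=\delta_{1/t}(dw),\qquad\varphi_{t}(dw)=\frac{\mathds{1}_{\{w\geq 1/t\}}\rho(dw)}{\overline{\rho}(1/t)}.$$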

The sequential construction corresponds to the inverse-Lévy construction described in Section 2.3. The exchangeable representation is similar to the $\epsilon$ -truncation of normalized CRMs, used in [1, 2], except that the truncation threshold $\epsilon=1/T_{n+1}$ is treated as a random variable here.

4.2 Exponential arrival times (size-biased construction)

Consider an exponential arrival time distribution with $\lambda_{w}(dt)=we^{-wt}dt$ and $\Lambda_{w}(t)=1-e^{-wt}$ . This leads to [46]’s size-biased sequential and exchangeable representations described in Section 2.3. While this construction is not novel, it appears that it provides a novel series representation for the generalized gamma random measure. We also show that the iid representation associated to this arrival time distribution corresponds to the finite approximation proposed by [39].

Generalized gamma process.

In the case of the size measure (4) with $\alpha>0,\sigma\in(0,1)$ and $\tau\geq 0$ , we obtain the following sequential construction for the GGP, which appears to be novel

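$$W_{i}\mid\xi_{i}\sim\operatorname{Gamma}\left(1-\sigma,(\sigma\xi_{i}/\alpha+\tau^{\sigma})^{1/\sigma}\right).$$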

In Section E.1, we compare this representation with Rosinski’s series representation for the GGP [51, 53]. The conditional distribution for the exchangeable and iid constructions is given by

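$$\varphi_{t}(dw)=\frac{\sigma w^{-1-\sigma}e^{-\tau w}(1-e^{-tw})}{\Gamma(1-\sigma)((t+\tau)^{\sigma}-\tau^{\sigma})}dw.\tag{16}$$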

A random variable with this density is said to follow the exponentially-tilted BFRY distribution [7, 20, 39], written $\mathrm{etBFRY}(\sigma,t,\tau)$ . One can easily simulate from Equation (16), see Section D.2. Note that $\Psi(t)\sim\alpha t^{\sigma}/\sigma$ as $t\to\infty$ , hence we can consider the iid distribution $\widetilde{\varphi}_{n}(dw)=\varphi_{(n\sigma/\alpha)^{1/\sigma}}(dw)$ . This corresponds precisely to the finite-dimensional approximation introduced by [39] for the GGP, which can therefore be seen as a particular case of our approach.
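The sequential GGP construction of this section, $W_{i}\mid\xi_{i}\sim\operatorname{Gamma}(1-\sigma,(\sigma\xi_{i}/\alpha+\tau^{\sigma})^{1/\sigma})$, can be sketched in a few lines of Python. The sketch assumes the second gamma parameter is a rate, consistent with the conditional density being proportional to $w^{-\sigma}e^{-(t+\tau)w}$; the function name is hypothetical. For $\tau>0$ the expected total mass is $\int_{0}^{\infty}w\rho(dw)=\alpha\tau^{\sigma-1}$, which gives a simple check up to truncation and Monte Carlo error.

```python
import random


def ggp_size_biased(alpha, sigma, tau, n, rng):
    """First n weights of the size-biased series for the generalized gamma process."""
    weights, xi = [], 0.0
    for _ in range(n):
        xi += rng.expovariate(1.0)  # next point of the unit-rate Poisson process
        rate = (sigma * xi / alpha + tau ** sigma) ** (1.0 / sigma)
        # random.gammavariate takes (shape, scale), so the scale is 1 / rate.
        weights.append(rng.gammavariate(1.0 - sigma, 1.0 / rate))
    return weights


if __name__ == "__main__":
    rng = random.Random(0)
    alpha, sigma, tau = 1.0, 0.5, 1.0
    totals = [sum(ggp_size_biased(alpha, sigma, tau, 150, rng)) for _ in range(1500)]
    print(sum(totals) / len(totals), alpha * tau ** (sigma - 1.0))
```

Setting `tau = 0.0` gives the stable case, for which the total mass has infinite mean and the check above no longer applies.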

4.3 Gamma arrival times

As a generalization of the exponential arrival times, consider now a gamma arrival distribution

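$$\lambda_{w}(dt)=\frac{t^{\kappa-1}e^{-\kappa wt}(\kappa w)^{\kappa}}{\Gamma(\kappa)}dt,\qquad\Lambda_{w}(t)=\frac{\gamma(\kappa,\kappa wt)}{\Gamma(\kappa)}.$$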

where $\kappa\geq 1$ is a tuning parameter and $\gamma(\kappa,t)=\int_{0}^{t}x^{\kappa-1}e^{-x}dx$ is the lower incomplete gamma function. Since $\mathbb{E}[T|w]\to 1/w$ and $\mathrm{Var}(T|w)\to 0$ as $\kappa\to\infty$ , $T$ converges in probability hence in distribution to $1/w$ , and therefore $\Lambda_{w}(t)\rightarrow\mathds{1}_{\{t\geq 1/w\}}$ as $\kappa$ tends to infinity, which corresponds to the arrival time cdf of the inverse Lévy representation. Hence, the construction based on the gamma arrival times bridges between the size-biased ( $\kappa=1$ ) and inverse-Lévy ( $\kappa\rightarrow\infty$ ) constructions.

Generalized gamma process. Consider the generalized gamma process with size measure (4) and parameters $\alpha>0$ , $\sigma\in(0,1)$ and $\tau\geq 0$ . We have

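$$\psi(t)=\eta\frac{t^{\kappa-1}}{(t+\tau/\kappa)^{\kappa-\sigma}},\qquad\Psi(t)=\begin{cases}\eta\left(\frac{\tau}{\kappa}\right)^{\sigma}B_{\frac{\kappa t}{\kappa t+\tau}}(\kappa,-\sigma)&\text{if }\tau>0\\ \frac{\eta}{\sigma}t^{\sigma}&\text{if }\tau=0\end{cases}$$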

where $\eta=\frac{\alpha\kappa^{\sigma}\Gamma(\kappa-\sigma)}{\Gamma(\kappa)\Gamma(1-\sigma)}$ . For the sequential and exchangeable constructions, we get

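$$\phi_{t}(dw)=\operatorname{Gamma}(w;\kappa-\sigma,\kappa t+\tau)dw,\qquad\varphi_{t}(dw)\propto w^{-\sigma-1}e^{-\tau w}\gamma(\kappa,\kappa tw)dw.$$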

For the iid construction, we can use Eq. 14 and estimate $\Psi^{-1}(n)$ numerically or, using B.2 and Table 1, we can alternatively use $\widetilde{\varphi}_{n}(dw)=\varphi_{(\sigma n/\eta)^{1/\sigma}}(dw)$ . The normalizing constant of $\varphi_{t}$ (and therefore $\widetilde{\varphi}_{n}$ ) has an analytic expression via standard functions. We call a random variable with distribution $\varphi_{t}$ an exponentially-tilted generalized BFRY random variable, since its pdf is obtained by exponentially tilting the pdf of the generalized BFRY distribution. This distribution has a number of remarkable properties that make it amenable to tractable simulation and posterior inference. Refer to Section D.4 for a more detailed description.
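For the stable case $\tau=0$ , where $\Psi(t)=\eta t^{\sigma}/\sigma$ inverts in closed form, the sequential construction with gamma arrival times can be sketched as follows: $T_{i}=(\sigma\xi_{i}/\eta)^{1/\sigma}$ and $W_{i}\mid T_{i}=t\sim\operatorname{Gamma}(\kappa-\sigma,\kappa t)$ . This is a sketch under the assumption that the second gamma parameter is a rate; the function names are hypothetical, and the case $\tau>0$ (which requires inverting an incomplete beta function) is deliberately left out.

```python
import math
import random


def eta_constant(alpha, sigma, kappa):
    """eta = alpha kappa^sigma Gamma(kappa - sigma) / (Gamma(kappa) Gamma(1 - sigma))."""
    log_ratio = math.lgamma(kappa - sigma) - math.lgamma(kappa)  # stable for large kappa
    return alpha * kappa ** sigma * math.exp(log_ratio) / math.gamma(1.0 - sigma)


def stable_gamma_arrival_weights(alpha, sigma, kappa, n, rng):
    """First n weights of the stable random measure (tau = 0) with gamma arrival times."""
    eta = eta_constant(alpha, sigma, kappa)
    weights, xi = [], 0.0
    for _ in range(n):
        xi += rng.expovariate(1.0)
        t = (sigma * xi / eta) ** (1.0 / sigma)  # T_i = Psi^{-1}(xi_i) when tau = 0
        # Shape kappa - sigma, rate kappa * t; gammavariate expects a scale.
        weights.append(rng.gammavariate(kappa - sigma, 1.0 / (kappa * t)))
    return weights


if __name__ == "__main__":
    rng = random.Random(0)
    print(stable_gamma_arrival_weights(1.0, 0.4, 5.0, 5, rng))
```

With $\kappa=1$ the constant reduces to $\eta=\alpha$ and the exponential arrival time case is recovered; as $\kappa$ grows, $\eta$ approaches $\alpha/\Gamma(1-\sigma)$ , in line with the inverse-Lévy limit.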

4.4 Inverse gamma arrival times

Consider now an inverse gamma arrival distribution $\lambda_{w}(dt)=\operatorname{iGamma}(t;\kappa,\kappa/w)dt$ where $\operatorname{iGamma}(t;a,b)$ is the pdf of an inverse gamma random variable and $\kappa\geq 1$ is a tuning parameter. By a similar argument as for the gamma arrival times, we have $\Lambda_{w}(t)\rightarrow\mathds{1}_{\{t\geq 1/w\}}$ as $\kappa\to\infty$ , hence it also admits the inverse Lévy construction as a limiting case. The case $\kappa=1$ is of particular interest, as it leads to a tractable novel representation for the GGP (see Section E.3), and provides a novel way of interpreting the classical iid approximation of the beta process.

Beta process. Consider the one-parameter beta process with size measure (5) with $\sigma=0$ and $c=1$ . The bijective transformation $u=-(\alpha\log(w))^{-1}$ gives the measure $\rho(du)=u^{-2}du$ on $(0,\infty)$ . Note that $\rho(du)$ is not a Lévy measure, but we can nonetheless use our construction as the tail Lévy intensity is finite. Using the inverse gamma kernel with $\kappa=1$ , we obtain $\Psi(t)=t$ and the iid distribution $\widetilde{\varphi}_{n}(du)=\operatorname{iGamma}(u;1,1/n)du$ . Applying the inverse transformation $\widetilde{W}_{i}=e^{-1/(\alpha\widetilde{U}_{i})}$ , we obtain $\widetilde{W}_{i}\sim\operatorname{Beta}(\alpha/n,1)$ , which corresponds to the classical iid approximation for the beta process, described in Equation (2). The iid construction for the beta process can alternatively be recovered using the arrival time distribution $\Lambda_{w}(t)=w^{\frac{\alpha}{t}}$ directly with (5), without change of variable.
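The change of variable in the beta process example can be checked numerically. The sketch below assumes the shape-scale convention for $\operatorname{iGamma}(u;1,1/n)$ , so that $1/U$ is exponential with rate $1/n$ , and compares the transformed samples with the $\operatorname{Beta}(\alpha/n,1)$ mean $\frac{\alpha/n}{\alpha/n+1}$ and cdf $x^{\alpha/n}$ . The function name is illustrative.

```python
import math
import random


def beta_weight_via_inverse_gamma(alpha, n, rng):
    """Sample W = exp(-1 / (alpha U)) with U ~ iGamma(1, 1/n)."""
    inv_u = rng.expovariate(1.0 / n)  # 1/U ~ Exponential(rate 1/n), i.e. mean n
    return math.exp(-inv_u / alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha, n = 2.0, 10
    samples = [beta_weight_via_inverse_gamma(alpha, n, rng) for _ in range(20000)]
    a = alpha / n
    print(sum(samples) / len(samples), a / (a + 1.0))             # empirical vs Beta(a, 1) mean
    print(sum(s <= 0.5 for s in samples) / len(samples), 0.5 ** a)  # empirical vs Beta(a, 1) cdf
```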

4.5 Generalized Pareto arrival time

Consider the arrival time distribution $\lambda_{w}(dt)=\frac{cw}{(tw+1)^{c+1}}dt$ where $c>0$ .

Stable beta process. Consider the stable beta process with measure (5) with $\sigma>0$ . We have $\Psi(t)=\frac{\alpha c}{\sigma}((t+1)^{\sigma}-1)$ and

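$$\phi_{t}(dw)\propto w^{-\sigma}(1-w)^{c+\sigma-1}(1+tw)^{-c-1}\mathds{1}_{\{0<w<1\}}dw,\qquad\varphi_{t}(dw)\propto w^{-\sigma-1}(1-w)^{c+\sigma-1}\left(1-(1+tw)^{-c}\right)\mathds{1}_{\{0<w<1\}}dw.$$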

These distributions admit the same conjugacy properties as the beta distribution, and one can sample exactly from these distributions as detailed in Section E.4.

5 Truncation error analysis

5.1 Error on functionals of the CRM

For a measurable function $f:S\rightarrow\mathbb{R}_{+}$ such that $N(f)<\infty$ a.s., the error term associated with the truncation is defined as

$$R_{n}=N(f)-N_{n}(f)=\sum_{i>n}f(W_{i},\theta_{i}).$$

Taking for example $f(w,\theta)=w$ corresponds to the $L_{1}$ error between the CRM $G$ and $G_{n}$ .

Proposition 5.1.

For $\xi\sim\operatorname{Gamma}(n+1,1)$ , $R_{n}$ has the following moment generating function

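$$\mathbb{E}[e^{-\lambda R_{n}}]=\mathbb{E}_{\xi}\bigg[\exp\bigg(-\int_{S}(1-e^{-\lambda f(w,\theta)})(1-\Lambda_{w}(\Psi^{-1}(\xi)))\rho(dw)\mu_{w}(d\theta)\bigg)\bigg].$$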

We now consider results for the special case $f(w,\theta)=w$ . The mean and variance of the truncation error $R_{n}$ given $T_{n+1}$ are given by

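$$\mathbb{E}[R_{n}\mid T_{n+1}]=\int_{0}^{\infty}w(1-\Lambda_{w}(T_{n+1}))\rho(dw),\qquad\mathbb{V}[R_{n}\mid T_{n+1}]=\int_{0}^{\infty}w^{2}(1-\Lambda_{w}(T_{n+1}))\rho(dw).$$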
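As a worked illustration (not taken from the paper), consider the GGP size measure (4) with exponential arrival times, so that $1-\Lambda_{w}(t)=e^{-wt}$ . The conditional mean $\int_{0}^{\infty}w(1-\Lambda_{w}(t))\rho(dw)$ and variance $\int_{0}^{\infty}w^{2}(1-\Lambda_{w}(t))\rho(dw)$ of the truncation error, given $T_{n+1}=t$ , then reduce to gamma integrals:

```latex
\begin{align*}
\mathbb{E}[R_{n}\mid T_{n+1}=t]
  &=\frac{\alpha}{\Gamma(1-\sigma)}\int_{0}^{\infty}w^{-\sigma}e^{-(t+\tau)w}\,dw
   =\frac{\alpha}{\Gamma(1-\sigma)}\cdot\frac{\Gamma(1-\sigma)}{(t+\tau)^{1-\sigma}}
   =\alpha(t+\tau)^{\sigma-1},\\
\mathbb{V}[R_{n}\mid T_{n+1}=t]
  &=\frac{\alpha}{\Gamma(1-\sigma)}\int_{0}^{\infty}w^{1-\sigma}e^{-(t+\tau)w}\,dw
   =\frac{\alpha\,\Gamma(2-\sigma)}{\Gamma(1-\sigma)\,(t+\tau)^{2-\sigma}}
   =\alpha(1-\sigma)(t+\tau)^{\sigma-2}.
\end{align*}
```

Both quantities decrease to zero as $t$ grows, that is, as more atoms are included in the truncation.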

The next proposition provides an asymptotic expression for the error term, giving insights on how the error relates to the choice of the arrival time distribution $\lambda_{w}(t)$ . The proposition makes some assumptions of regular variation on the mean measure $\rho$ . Background on regular variation and Mellin transforms is given in Appendix A and the proof of Proposition 5.2 is given in Section B.4.

Proposition 5.2.

Assume that the mean measure $\rho(dw)$ is absolutely continuous with respect to the Lebesgue measure with density function $\rho(w)$ such that

[TABLE]

where $\sigma\in(0,1)$ and $\zeta_{0}>0$ . Assume additionally that $\Lambda_{w}(t)=1-k(wt)$ where $k$ is a positive function on $(0,\infty)$ such that its Mellin transform (see A.3) $\check{k}$ converges in some open interval containing $[\sigma-2,\sigma-1]$ . Assume additionally that either (i) $k$ is differentiable with derivative $k^{\prime}$ and that the Mellin transform $\check{k}^{\prime}$ of $k^{\prime}$ is defined in some open interval containing $\sigma-1$ , or (ii) that $k(x)=\mathds{1}_{\{x\leq 1\}}$ . Then we have

[TABLE]

where the constant $C_{1}(\sigma)$ is given by $C_{1}(\sigma)=(1-\sigma)^{-1}$ if $k(x)=\mathds{1}_{\{x\leq 1\}}$ and $C_{1}(\sigma)=\frac{\check{k}(\sigma-1)}{(-\check{k}^{\prime}(\sigma-1))^{1-1/\sigma}}$ if $k$ is differentiable, and only depends on the arrival time distribution $\Lambda_{w}$ and $\sigma$ .

The deterministic, gamma, inverse-gamma (for $\kappa>2-\sigma$ ) and generalized Pareto (for $c>2-\sigma$ ) arrival time distributions discussed in Section 4 all verify the assumptions of Proposition 5.2. The associated kernels, Mellin transforms and constants $C_{1}(\sigma)$ are given in Table 1 in the appendix. Figure 1(a) shows the value of the constant $C_{1}(\sigma)$ for the deterministic and gamma arrival times, for different values of $\kappa$ . As indicated in Section 4.3, the approximation gets closer to the deterministic/inverse-Lévy construction as $\kappa$ increases. Both the GGP and the SBP with $\sigma>0$ verify Equation (18), with $\zeta_{0}=\frac{\alpha}{\Gamma(1-\sigma)}$ for the GGP and $\zeta_{0}=\frac{\alpha}{B(1-\sigma,c+\sigma)}$ for the SBP.

We run a simulation study to investigate the finite- $n$ properties of the proposed approximations. We report in Figure 1(b-c) the mean and variance of $R_{n}$ for gamma arrival times, for the stable process and the GGP with $\sigma=0.4$ . For the stable process, we also compare to the inverse-Lévy approximation, as it has an analytic form. As expected, the approximation gets better as the value of $\kappa$ increases. Additional simulations for other arrival time distributions are given in Appendix F.
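To make the dependence of $C_{1}(\sigma)$ on $\kappa$ concrete, the sketch below evaluates the constant for the gamma kernel $k(t)=\Gamma(\kappa,\kappa t)/\Gamma(\kappa)$ . It assumes the Mellin convention $\check{k}(z)=\int_{0}^{\infty}t^{-z-1}k(t)dt$ of [8]; the two closed forms in the comments were derived under that convention for this illustration and are not quoted from Table 1. Function names are hypothetical.

```python
import math

def c1_gamma_kernel(sigma, kappa):
    """C_1(sigma) for the gamma kernel k(t) = Gamma(kappa, kappa t) / Gamma(kappa)."""
    # Mellin transform of k at sigma - 1 (derived for this sketch):
    #   kappa**(sigma-1) * Gamma(kappa+1-sigma) / ((1-sigma) * Gamma(kappa))
    log_k = ((sigma - 1.0) * math.log(kappa)
             + math.lgamma(kappa + 1.0 - sigma) - math.lgamma(kappa)
             - math.log(1.0 - sigma))
    # Minus the Mellin transform of k' at sigma - 1 (derived for this sketch):
    #   kappa**sigma * Gamma(kappa-sigma) / Gamma(kappa)
    log_minus_kprime = (sigma * math.log(kappa)
                        + math.lgamma(kappa - sigma) - math.lgamma(kappa))
    # C_1 = k_check / (-k'_check) ** (1 - 1/sigma), computed on the log scale.
    return math.exp(log_k - (1.0 - 1.0 / sigma) * log_minus_kprime)

def c1_deterministic(sigma):
    """C_1(sigma) for the deterministic kernel k(x) = 1{x <= 1}."""
    return 1.0 / (1.0 - sigma)

sigma = 0.4
for kappa in (1.0, 2.0, 10.0, 100.0):
    print(kappa, c1_gamma_kernel(sigma, kappa))
print("deterministic", c1_deterministic(sigma))
```

With $\kappa=1$ the expression reduces to $\Gamma(1-\sigma)^{1/\sigma}$ , and for large $\kappa$ it approaches the deterministic value $(1-\sigma)^{-1}$ , in line with the discussion above.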

5.2 $L_{1}$ error on the marginal likelihood

In this section we discuss the $L_{1}$ error on the marginal likelihood when truncated CRMs are used in hierarchical Bayesian models, under the framework described in [16]. Let $W\sim\operatorname*{CRM}(\rho,\mu_{w})$ , and let $W_{n}$ be its approximation with $n$ atoms. Let $H(\cdot|w)$ be a probability distribution on $\mathbb{N}\cup\{0\}$ for all $w$ , and denote $\pi(w):=H(0|w)$ . Consider a hierarchical Bayesian model for $m$ observations $X_{1:m}:=\{X_{j}\}_{j=1}^{m}$ , $Z_{j}|w_{j}\sim H(w_{j}),\quad X_{j}|z_{j}\sim F(z_{j}),\quad j=1,\dots,m,$ and denote by $p_{m,\infty}(X_{1:m})$ the marginal likelihood of this model. Similarly, let $p_{m,n}(X_{1:m})$ be the marginal likelihood of the model with the same generative process, but with $W_{n}$ in place of $W$ . Following [16], we analyze the quality of the approximation by comparing $p_{m,\infty}(X_{1:m})$ and $p_{m,n}(X_{1:m})$ . For the inverse-Lévy case, one recovers the bound derived in [16, Theorem D.3].

Proposition 5.3.

We have the bound $0\leq\frac{1}{2}||p_{m,\infty}(X_{1:m})-p_{m,n}(X_{1:m})||\leq 1-e^{-B_{m,n}}$ , where $B_{m,n}\leq m\int_{0}^{\infty}\int_{0}^{\infty}(1-\pi(w))(1-\Lambda_{w}(\Psi^{-1}(\xi)))\rho(dw)\operatorname{Gamma}(\xi;n+1,1)d\xi$ .

6 Discussion

Our series construction can be seen as a special case of Rosinski’s shot-noise series representation [52] (as is the case for most series constructions, see [52]), using the disintegration $\rho(dw)=\int_{0}^{\infty}\nu(\xi,dw)d\xi$ where $\nu(\xi,dw)=\lambda_{w}(\Psi^{-1}(\xi))\rho(dw)/\psi(\Psi^{-1}(\xi))$ is a Markov kernel (noting that $\int_{0}^{\infty}\lambda_{w}(\Psi^{-1}(\xi))/\psi(\Psi^{-1}(\xi))d\xi=1$ ). [29] proposed alternative ways of deriving iid approximations for some classes of CRMs. That approach does not rely on a latent Poisson construction and is therefore different from the one considered here. We emphasize that the finite iid construction is useful for both simulation and hierarchical Bayesian modeling in various contexts. Using Proposition B.2, one can approximate infinite-dimensional priors with finite-dimensional iid distributions without any numerical inversion. See Appendix G, where we discuss an example of our construction applied to normalized GGP mixture models.

Appendix A Background on regular variation and Mellin transforms

This background material comes from the book of [8].

A.1 Definitions

Definition A.1 (Slowly varying function).

A function $\ell:(0,\infty)\to(0,\infty)$ is slowly varying at infinity if for all $c>0$ ,

$$\lim_{x\to\infty}\frac{\ell(cx)}{\ell(x)}=1.$$

Definition A.2 (Regularly varying function).

A function $f:(0,\infty)\to(0,\infty)$ is regularly varying at infinity with exponent $\rho\in\mathbb{R}$ if $f(x)=x^{\rho}\ell(x)$ for some slowly varying $\ell$ . A function $f$ is regularly varying at 0 if $f(1/x)$ is regularly varying at infinity, i.e., $f(x)=x^{-\rho}\ell(1/x)$ for some $\rho\in\mathbb{R}$ and slowly varying $\ell$ .

A.2 Basic theorems for regularly varying functions

Let $U$ be a regularly varying function with exponent $\rho$ and slowly varying function $\ell$ locally bounded on $(0,\infty)$ .

Theorem A.1 (Karamata’s theorem).

[8, Propositions 1.5.8 and 1.5.10]. Suppose that $U(t)\sim t^{\rho}\ell(t)$ as $t\to\infty$ .

• When $\rho>-1$ ,

[TABLE]

• When $\rho<-1$ ,

[TABLE]

Corollary A.1.

This also holds when $U$ is regularly varying at 0. When $\rho<-1$ and $U(s)\sim s^{\rho}\ell(1/s)$ as $s\to 0$ ,

[TABLE]

A.3 Generalized Abelian theorem

Definition A.3.

Given a measurable kernel $k:(0,\infty)\rightarrow\mathbb{R}$ let

$$\check{k}(z)=\int_{0}^{\infty}t^{-z-1}k(t)\,dt$$

be its Mellin transform, for $z\in\mathbb{C}$ such that the integral converges.

Remark A.1.

If $k(x)$ has a Mellin transform $\check{k}(z)$ which converges in $(z_{1},z_{2})$ , then $h(x)=k(1/x)$ has a Mellin transform $\check{h}(z)=\check{k}(-z)$ which converges in $(-z_{2},-z_{1})$ .
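The remark follows from a one-line change of variable. As a sketch, assuming the convention $\check{k}(z)=\int_{0}^{\infty}t^{-z-1}k(t)dt$ of [8] (the convention under which the deterministic kernel of Section C.1 converges for $z<0$ ), substituting $u=1/t$ gives

```latex
\check{h}(z)
  = \int_{0}^{\infty} t^{-z-1}\, k(1/t)\, dt
  = \int_{0}^{\infty} u^{z-1}\, k(u)\, du
  = \check{k}(-z),
\qquad u = 1/t,\quad dt = -u^{-2}\,du .
```

The last integral converges exactly when $-z\in(z_{1},z_{2})$ , that is, when $z\in(-z_{2},-z_{1})$ .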

Theorem A.2.

[8, Theorem 4.1.6 page 201] Let the Mellin transform $\check{k}$ of $k$ converge at least in the strip $\sigma\leq\operatorname{Re}(z)\leq\tau$ , where $-\infty<\sigma<\tau<\infty$ . Let $\rho\in(\sigma,\tau)$ , $\ell$ a slowly varying function, $c\in\mathbb{R}.$ If $f$ is measurable, $f(x)/x^{\sigma}$ is bounded on every interval $(0,a]$ and

[TABLE]

then

[TABLE]

The next result is a direct corollary of Theorem A.2, considering limits as $x$ tends to $0$ .

Corollary A.2.

Let the Mellin transform $\check{k}$ of $k$ converge at least in the strip $\tau_{1}\leq\operatorname{Re}(z)\leq\tau_{2}$ , where $-\infty<\tau_{1}<\tau_{2}<\infty$ . Let $\rho\in(\tau_{1},\tau_{2})$ , $\ell$ a slowly varying function, $c\in\mathbb{R}.$ If $f$ is measurable, $f(x)x^{-\tau_{2}}$ is bounded on every interval $[a,\infty)$ and

[TABLE]

then

[TABLE]

Proof.

[TABLE]

where $\widetilde{f}(x)=f(1/x)$ , $\widetilde{f}(x)/x^{-\tau_{2}}$ is bounded on every interval $(0,1/a]$ with

[TABLE]

and $\widetilde{k}(x)=k(1/x)$ is such that its Mellin transform converges in the strip $-\tau_{2}\leq\operatorname{Re}(z)\leq-\tau_{1}$ . Theorem A.2 above therefore gives the result. ∎

Appendix B Proofs

B.1 Proof of Theorem 3.1

The proof is an adaptation of the proof for the size-biased construction in [46, Section 4]. The mean measure $\nu^{\prime}$ of the Poisson random measure $N^{\prime}$ can be expressed as

[TABLE]

This is the mean measure of a marked Poisson point process, where $(T_{1},T_{2},\ldots)$ are the points of an inhomogeneous Poisson point process with intensity $\psi(t)$ and hence admit the representation in Equation (11), and the marks $(W_{i},\theta_{i})$ have conditional distribution $\phi_{t}(dw)\mu_{w}(d\theta)$ as shown in Equation (11). Let $(\overline{X}_{n,1}^{\prime},\ldots,\overline{X}_{n,n}^{\prime})=(X^{\prime}_{\pi_{1}},\ldots,X^{\prime}_{\pi_{n}})$ where $(\pi_{1},\ldots,\pi_{n})$ is a random permutation of $\{1,\ldots,n\}$ , and $\overline{X}_{n,i}^{\prime}=(\overline{T}_{n,i},\overline{W}_{n,i},\overline{\theta}_{n,i})$ . By properties of the Poisson process on the real line, the random variables $\overline{T}_{n,1},\ldots,\overline{T}_{n,n}$ are iid given $T_{n+1}=t_{n+1}$ , with pdf

[TABLE]

Hence, given $T_{n+1}=t_{n+1}$ , the marks $\overline{W}_{n,i}$ and $\overline{\theta}_{n,i}$ are also iid, with conditional distribution $\varphi_{t_{n+1}}(d\overline{w}_{n,i})\mu_{\overline{w}_{n,i}}(d\overline{\theta}_{n,i})$ where

[TABLE]

B.2 Proof of Proposition 3.1

The proof is similar to that of [39, Section 3.1]. Let $f:\Theta\rightarrow(0,\infty)$ be a measurable function.

[TABLE]

Note that $\Lambda_{w}(\Psi^{-1}(n))\leq 1,\forall n$ and $\Lambda_{w}(\Psi^{-1}(n))\rightarrow 1$ as $n$ tends to infinity. By the bounded convergence theorem, we therefore have

[TABLE]

as $n\rightarrow\infty$ . Additionally, for any real sequence $(a_{n})_{n\geq 1}$ converging to $a$ we have $(1-{a_{n}}/{n})^{n}\rightarrow e^{-a}$ as $n\rightarrow\infty$ . We therefore obtain

[TABLE]

where the right-hand side is equal to the Laplace functional $\mathbb{E}[e^{-G(f)}]$ of the CRM $G\sim\operatorname*{CRM}(\rho,\mu_{w})$ by Campbell’s theorem [38].

B.3 Proof of Proposition 5.1

By the marking theorem for Poisson point processes [38, Chapter 5], given $T_{n+1}=t_{n+1}$ , the random measure $\sum_{i\mid T_{i}\geq t_{n+1}}\delta_{X_{i}}$ is a Poisson random measure with mean measure $(1-\Lambda_{w}(t_{n+1}))\rho(dw)\mu_{w}(d\theta)$ . The result follows from Campbell’s theorem and the fact that $T_{n+1}=\Psi^{-1}(\xi_{n+1})$ .
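For the special case $f(w,\theta)=w$ , the same Poisson description yields the first two conditional moments of the truncation error. As a sketch, taking the mean measure stated above at face value and assuming that $\mu_{w}$ is a probability measure (so that $\theta$ integrates out), Campbell's theorem gives

```latex
\mathbb{E}\left[R_{n}\mid T_{n+1}=t\right]
  = \int_{0}^{\infty} w\,\bigl(1-\Lambda_{w}(t)\bigr)\,\rho(dw),
\qquad
\operatorname{Var}\left(R_{n}\mid T_{n+1}=t\right)
  = \int_{0}^{\infty} w^{2}\,\bigl(1-\Lambda_{w}(t)\bigr)\,\rho(dw).
```

Both integrals decrease as $t$ grows, since $\Lambda_{w}(t)$ increases to 1 for every $w$ .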

B.4 Proof of Proposition 5.2

We state a slightly more general version of Proposition 5.2, where the constant $\zeta_{0}$ in Equation (18) can more generally be any slowly varying function $\ell_{0}(1/x)$ . We then prove this generalized proposition.

Proposition B.1.

[Slight generalization of Proposition 5.2] Assume that the mean measure $\rho(dw)$ is absolutely continuous with respect to the Lebesgue measure with density function $\rho(w)$ such that

[TABLE]

where $\sigma\in(0,1)$ and $\ell_{0}$ is a slowly varying function. Assume additionally that $\Lambda_{w}(t)=1-k(wt)$ where $k$ is a positive function on $(0,\infty)$ such that its Mellin transform (see A.3) $\check{k}$ converges in some open interval containing $[\sigma-2,\sigma-1]$ . Assume additionally that either (i) $k$ is differentiable with derivative $k^{\prime}$ and that the Mellin transform $\check{k}^{\prime}$ of $k^{\prime}$ is defined in some open interval containing $\sigma-1$ , or (ii) that $k(x)=\mathds{1}_{\{x\leq 1\}}$ . Then we have

[TABLE]

where $\ell_{**}$ is some slowly varying function that depends on $\ell_{0}$ and $\sigma$ but not $\lambda_{w}$ and the constant $C_{1}(\sigma)$ is given by $C_{1}(\sigma)=(1-\sigma)^{-1}$ if $k(x)=\mathds{1}_{\{x\leq 1\}}$ and $C_{1}(\sigma)=\frac{\check{k}(\sigma-1)}{(-\check{k}^{\prime}(\sigma-1))^{1-1/\sigma}}$ if $k$ is differentiable, and only depends on the arrival time distribution $\Lambda_{w}$ and $\sigma$ .

In order to prove Proposition B.1, we first state the following proposition.

Proposition B.2.

Assume that

[TABLE]

where $\sigma\in(0,1)$ and $\ell$ is a slowly varying function. Assume additionally that

[TABLE]

where $k$ is a positive and differentiable function on $(0,\infty)$ , with derivative $k^{\prime}$ . Assume that the Mellin transform $\check{k}^{\prime}$ of $k^{\prime}$ is defined in some open interval containing $\sigma-1$ . Then

[TABLE]

where $\ell_{*}$ is another slowly varying function depending on $\ell$ and $\sigma$ , and defined in Equation (28). If $\ell(x)=c$ is constant, then we simply have $\ell_{*}(x)=c^{-1/\sigma}$ . In particular, this is the case for both the generalized gamma process and the stable beta process, which verify condition (23) when $\sigma>0$ with $\ell(x)=\frac{\alpha}{\sigma\Gamma(1-\sigma)}$ for the GGP and $\ell(x)=\frac{\alpha}{\sigma B(1-\sigma,c+\sigma)}$ for the SBP.

Proof of Proposition B.2.

The assumptions (21) and the first condition in Equation (9) both imply that

[TABLE]

Using integration by parts

[TABLE]

Now assume $\Lambda_{w}(t)=1-k(wt)$ where $k$ is differentiable on $(0,\infty)$ . Then

[TABLE]

If the Mellin transform $\check{k}^{\prime}$ of $k^{\prime}$ is defined in some open interval containing $\sigma-1$ , then A.2 implies

[TABLE]

as $t$ tends to infinity. In the case $k(x)=\mathds{1}_{\{x\leq 1\}}$ , $k$ is not differentiable, but we have directly

[TABLE]

Now we use inversion formulas for regularly varying functions to get the asymptotic regime for $\Psi^{-1}(t)$ . Assume $\sigma>0$ , then [25, Lemma 22] implies

[TABLE]

as $t\rightarrow\infty$ , where $\ell_{*}$ is a slowly varying function defined by

[TABLE]

where $\ell^{\#}$ denotes the de Bruijn conjugate of the slowly varying function $\ell$ [8, Theorem 1.5.13]. Note that $\ell_{*}$ only depends on $\ell$ and $\sigma$ , but not the arrival time distribution $\Lambda_{w}(t)$ .

∎

Assume that the mean measure $\rho(dw)$ is absolutely continuous with respect to the Lebesgue measure with density function $\rho(w)$ verifying

[TABLE]

where $\sigma\in[0,1]$ and $\ell_{0}$ is a slowly varying function. Equation (21) and [8, Proposition 1.5.8] imply that

[TABLE]

as $x$ tends to 0 where $\ell$ is a slowly varying function defined by

[TABLE]

Assume additionally that $\Lambda_{w}(t)=1-k(wt)$ where $k$ is a positive function on $(0,\infty)$ such that its Mellin transform (see A.3) $\check{k}$ converges in some open interval containing $[\sigma-2,\sigma-1]$ . Note that

[TABLE]

As $T_{n+1}$ tends to infinity almost surely as $n$ tends to infinity, A.2 implies

[TABLE]

almost surely as $n$ tends to infinity.

As $T_{n+1}=\Psi^{-1}(\gamma_{n+1})$ where $\gamma_{n+1}\sim n$ almost surely as $n$ tends to infinity, using Proposition B.2 we obtain

[TABLE]

almost surely as $n$ tends to infinity. Combining Equation (33) with Equations (31) and (32), we obtain

[TABLE]

almost surely as $n$ tends to infinity. Note that if $\ell_{0}(t)=\zeta_{0}$ is constant, then all the other slowly varying functions are also constant with

[TABLE]

Finally, Equation (22) follows similarly to the proof of [25, Proposition 2]. Using Chebyshev’s inequality

[TABLE]

Take $a_{n}=n^{2}$ . As

[TABLE]

by the Borel-Cantelli lemma, given $T_{n}$ ,

[TABLE]

almost surely as $n\to\infty$ . As $R_{n}$ is decreasing, we have, for any $m^{2}\leq n\leq(m+1)^{2}$

[TABLE]

and it follows by sandwiching that

[TABLE]

almost surely as $n\to\infty$ . Combining this with Equation (34) gives the final result, with $\ell_{**}$ the slowly varying function defined by

[TABLE]

Note that in the case of the GGP, the different slowly varying functions are all constant functions

[TABLE]

For the SBP, we have

[TABLE]

B.5 Proof of Proposition 5.3

From [16], we have the protobound

[TABLE]

In our case,

[TABLE]

where the last inequality follows from Jensen’s inequality.

Appendix C Mellin transforms

C.1 Deterministic kernel

Take $k(t)=\mathds{1}_{\{t\leq 1\}}.$ Then

$$\check{k}(z)=\int_{0}^{1}t^{-z-1}\,dt=-\frac{1}{z}$$

if $z<0$ .

C.2 Gamma kernel

Take $k(t)=\frac{\Gamma(\kappa,\kappa t)}{\Gamma(\kappa)}$ for $\kappa\geq 1$ .

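$$\check{k}(z)=\int_{0}^{\infty}t^{-z-1}\frac{\Gamma(\kappa,\kappa t)}{\Gamma(\kappa)}\,dt=-\frac{\kappa^{z}\,\Gamma(\kappa-z)}{z\,\Gamma(\kappa)}$$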

which converges for $z<0$ . Additionally, $k^{\prime}(t)=-\frac{\kappa^{\kappa}t^{\kappa-1}e^{-\kappa t}}{\Gamma(\kappa)}$ hence

$$\check{k^{\prime}}(z)=-\frac{\kappa^{z+1}\,\Gamma(\kappa-z-1)}{\Gamma(\kappa)}$$

which converges for $z<\kappa-1$ .

C.3 Inverse gamma kernel

Take $k(t)=\frac{\gamma(\kappa,\kappa/t)}{\Gamma(\kappa)}$ . Note that if $\kappa=1$ , $k(t)=(1-e^{-1/t})$ . Then

[TABLE]

therefore defined for $z\in(-\kappa,0)$ . Note that

[TABLE]

as $\kappa$ tends to infinity, which corresponds to the inverse-Lévy case.

We have

[TABLE]

and

[TABLE]

defined for $z\in(-1-\kappa,\infty)$ . Note again that $\check{k^{\prime}}(z)\rightarrow-1$ as $\kappa\rightarrow\infty$ (inverse Lévy case).

C.4 Generalized Pareto kernel

Take $k(t)=\frac{1}{(t+1)^{c}}$ . Then

$$\check{k}(z)=\int_{0}^{\infty}\frac{t^{-z-1}}{(t+1)^{c}}\,dt=B(-z,c+z)$$

for $z\in(-c,0)$ . We have $k^{\prime}(t)=-\frac{c}{(t+1)^{c+1}}$ hence

$$\check{k^{\prime}}(z)=-c\,B(-z,c+z+1)$$

defined for $z\in(-c-1,0)$ .

Appendix D BFRY and related distributions

D.1 BFRY distribution

The BFRY distribution, first named in [20] after the work of Bertoin, Fujita, Roynette, and Yor [7], arose much earlier in various contexts [47, 58]. It was recently highlighted in [39] as a finite-dimensional approximating distribution for the stable process, the generalized gamma process, and a special case of the stable beta process. The density of a BFRY distribution with parameter $\sigma\in(0,1)$ is written as

[TABLE]

One can easily verify that the distribution can be simulated as a ratio of independent gamma and beta random variables.

[TABLE]
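The ratio representation can be turned into a short sampler. The sketch below is an assumption-laden illustration: it takes the BFRY density to be $\frac{\sigma}{\Gamma(1-\sigma)}x^{-1-\sigma}(1-e^{-x})$ and the representation to be $X=G/B$ with $G\sim\operatorname{Gamma}(1-\sigma,1)$ and $B\sim\operatorname{Beta}(\sigma,1)$ independent, which is the form commonly given in the literature rather than one quoted from this section. Function names are hypothetical.

```python
import math
import random

def sample_bfry(sigma, rng):
    """Draw X = G / B with G ~ Gamma(1 - sigma, 1) and B ~ Beta(sigma, 1)."""
    g = rng.gammavariate(1.0 - sigma, 1.0)
    # Beta(sigma, 1) by inversion; 1 - random() lies in (0, 1], so b > 0.
    b = (1.0 - rng.random()) ** (1.0 / sigma)
    return g / b

def bfry_fractional_moment(sigma, p):
    """E[X**p] for p < sigma, computed as E[G**p] * E[B**(-p)]."""
    return (math.gamma(1.0 - sigma + p) / math.gamma(1.0 - sigma)) * sigma / (sigma - p)

rng = random.Random(2)
sigma, p = 0.5, 0.125
draws = [sample_bfry(sigma, rng) for _ in range(200_000)]
empirical = sum(x ** p for x in draws) / len(draws)
print(empirical, bfry_fractional_moment(sigma, p))
```

A fractional moment with $p<\sigma/2$ is used for the check because the distribution has infinite mean, so ordinary moments are not available.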

D.2 Exponentially-tilted BFRY distribution

In [39, 34], the exponentially-tilted version of the BFRY distribution was discussed. The density of an exponentially-tilted BFRY random variable with parameters $\sigma\in(0,1)$ , $c>0$ and $\tau>0$ is

[TABLE]

Then it is easy to show that

[TABLE]

D.3 Generalized BFRY distribution

The generalized BFRY distribution, first discussed in [5], is obtained by generalizing the sampling procedure of the BFRY distribution. The generalized BFRY distribution with parameters $\sigma\in(0,1)$ and $\kappa>\sigma$ is obtained as

[TABLE]

By simple algebra, we obtain the density as

[TABLE]

where $\gamma(\cdot,\cdot)$ is the lower incomplete gamma function.

D.4 Exponentially-tilted generalized BFRY distribution

The density of the exponentially-tilted generalized BFRY distribution with parameters $\sigma\in(0,1),\kappa>\sigma,t>0$ and $\tau>0$ is

[TABLE]

where $B_{x}(\cdot,\cdot)$ is the incomplete beta function. A random variable having this distribution can be simulated by rejection sampling. Alternatively, note that

[TABLE]

which means that the distribution is an infinite mixture of gamma distributions with mixing proportion

[TABLE]

Hence, sampling is straightforward: first sample the component $j$ from the above infinite discrete distribution, then sample from the corresponding gamma distribution.

The exponentially-tilted GBFRY distribution has the convenient property of being a conjugate prior for the Poisson, gamma, normal (with fixed mean), and Pareto likelihoods. Let $W\sim\operatorname*{etgBFRY}(\kappa,\sigma,t,\tau)$ . Then, for Poisson,

[TABLE]

For gamma,

[TABLE]

For normal,

[TABLE]

For Pareto,

[TABLE]

D.5 Inverse generalized BFRY

One can also consider the counterpart of the generalized BFRY distribution in which the gamma distribution is replaced with an inverse gamma distribution. We define the inverse generalized BFRY distribution, whose pdf is written as

[TABLE]

Hence, one can realize that

[TABLE]

This distribution corresponds to the truncated exchangeable density $\varphi_{t}(w)$ of the stable process with inverse gamma arrival times.

D.6 Exponentially-tilted inverse generalized BFRY

Finally, we consider an exponentially-tilted version of the inverse GBFRY distribution, whose pdf is written as

[TABLE]

Unfortunately, we do not have an analytic expression for the normalization constant, but we can still sample from this distribution via rejection sampling. This distribution arises as the truncated exchangeable density $\varphi_{t}(w)$ of the generalized gamma process with inverse gamma arrival times.

Appendix E Detailed derivations of the results in Section 4 and additional examples

E.1 Exponential arrival times

Generalized gamma.

In the case of the size measure (4) with $\alpha>0,\sigma\in(0,1)$ and $\tau\geq 0$ , we have

[TABLE]

The arrival times are thus generated as

[TABLE]

The conditional distribution for the sequential construction is

[TABLE]

In summary, the sequential construction for the GGP, is given by

[TABLE]

Comparison to Rosinski’s series representation for the GGP

Rosinski [51, 53] proposed the following series representation for the GGP/tempered stable process

[TABLE]

where $e_{i}\overset{iid}{\sim}\operatorname{Exp}(\tau)$ , $u_{i}\overset{iid}{\sim}\operatorname{Unif}(0,1)$ . For $i$ large, $W_{i}=\left(\frac{\xi_{i}\sigma\Gamma(1-\sigma)}{\alpha}\right)^{-1/\sigma}$ with high probability, which corresponds to the inverse-Lévy construction for the stable process, and this construction has the same asymptotic error rate as the inverse-Lévy construction for the GGP. The asymptotic error of Rosinski’s representation is therefore lower than the asymptotic error for the series defined by Eq. 45, by a factor $\Gamma(1-\sigma)^{1/\sigma}(1-\sigma)\in(1,2)$ , according to Table 1.
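The factor quoted above is easy to tabulate. The following short sketch evaluates $\Gamma(1-\sigma)^{1/\sigma}(1-\sigma)$ for a few values of $\sigma$ ; the function name is a hypothetical choice and the grid is arbitrary.

```python
import math

def error_ratio(sigma):
    """Gamma(1 - sigma) ** (1 / sigma) * (1 - sigma), computed on the log scale."""
    return math.exp(math.lgamma(1.0 - sigma) / sigma) * (1.0 - sigma)

for sigma in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(sigma, round(error_ratio(sigma), 4))
```

At $\sigma=0.5$ the factor equals $\Gamma(1/2)^{2}/2=\pi/2$ , which gives a quick sanity check on the implementation.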

E.2 Gamma arrival times

Generalized gamma. Consider the generalized gamma process with (4), $\alpha>0$ , $\sigma\in(0,1)$ and $\tau\geq 0$ . We have

[TABLE]

where $B_{x}(a,b)$ is the incomplete beta function. For $\tau=0$ , $\Psi^{-1}$ has the analytic expression

[TABLE]

For $\tau>0$ , there is no analytic expression for $\Psi^{-1}$ . For the sequential construction, we get

[TABLE]

For the exchangeable and iid constructions, we obtain

[TABLE]

When $\tau=0$ , $\varphi_{t}(dw)$ is the distribution of a generalized BFRY distribution (Section D.3). When $\tau>0$ , $\varphi_{t}(dw)$ corresponds to the distribution of exponentially-tilted generalized BFRY (Section D.4).

E.3 Inverse gamma arrival times

Take

[TABLE]

Stable process.

Consider the stable process with size measure (4), $\alpha>0$ , $\sigma\in(0,1)$ and $\tau=0$ . We have

[TABLE]

For the sequential construction, we have

[TABLE]

For the exchangeable construction, we get

[TABLE]

which corresponds to the inverse generalized BFRY distribution. We can sample from it as $x\sim\operatorname{iGamma}(\kappa+\sigma,1),y\sim\operatorname{Beta}(\sigma,1),w=\frac{\kappa x}{ty}$ . See Section D.5 for more details.

The case $\kappa=1$ is of particular interest, as it leads to a tractable novel representation for the GGP, and provides a novel way of interpreting the classical iid approximation of the beta process.
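The sampling recipe for the inverse generalized BFRY distribution translates directly into code. The sketch below assumes that $\operatorname{iGamma}(a,1)$ denotes the law of the reciprocal of a $\operatorname{Gamma}(a,1)$ variable; the function names, the parameter values, and the fractional-moment check are illustrative additions.

```python
import math
import random

def sample_inverse_gbfry(sigma, kappa, t, rng):
    """w = kappa * x / (t * y) with x ~ iGamma(kappa + sigma, 1), y ~ Beta(sigma, 1)."""
    x = 1.0 / rng.gammavariate(kappa + sigma, 1.0)  # inverse gamma draw
    y = (1.0 - rng.random()) ** (1.0 / sigma)       # Beta(sigma, 1) by inversion
    return kappa * x / (t * y)

def fractional_moment(sigma, kappa, t, p):
    """E[w**p] for p < sigma, using the independence of x and y."""
    ex = math.gamma(kappa + sigma - p) / math.gamma(kappa + sigma)  # E[x**p]
    ey = sigma / (sigma - p)                                        # E[y**(-p)]
    return (kappa / t) ** p * ex * ey

rng = random.Random(4)
sigma, kappa, t, p = 0.5, 2.0, 3.0, 0.1
draws = [sample_inverse_gbfry(sigma, kappa, t, rng) for _ in range(200_000)]
empirical = sum(w ** p for w in draws) / len(draws)
print(empirical, fractional_moment(sigma, kappa, t, p))
```

Because the draws have a power-law tail with index $\sigma$ , only moments of order below $\sigma$ exist, which is why a small fractional power is used in the comparison.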

Generalized gamma process.

Consider the GGP with size measure (4) with $\alpha>0,\sigma\in[0,1)$ and $\tau>0$ . Take inverse gamma arrival times with $\kappa=1$ . We have

[TABLE]

where $K_{\nu}(\cdot)$ is a modified Bessel function of the second kind. Unfortunately, the arrival time is not given analytically, so we may resort to a numerical root finding algorithm to compute $T_{i}=\Psi^{-1}(\xi_{i})$ . The sequential construction is then given by

[TABLE]

where $\operatorname{GIG}(p,a,b)$ is a generalized inverse Gaussian distribution with parameters $p\in{\mathbb{R}},a>0,b>0$ . The exchangeable construction is given by

[TABLE]

This particular case, which seems to be novel to the best of our knowledge, is useful because it covers the gamma process ( $\sigma=0$ ). It also includes the stable process $(\tau=0)$ as its limiting case - as $\tau\to 0$ ,

[TABLE]

The sequential construction is impractical since we have to invert $\Psi$ for each $t_{i}$ , but the exchangeable construction requires only one inversion for $t_{n+1}$ .
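The single inversion needed for the exchangeable construction can be done with any bracketing root finder, since $\Psi$ is continuous and increasing. The sketch below uses plain bisection; the helper name is hypothetical, and a power law $\Psi(t)=ct^{\sigma}$ with an arbitrary constant $c$ is used only as a stand-in with a known inverse, not as the Bessel-function expression of this section.

```python
def invert_increasing(psi, target, lo=1e-12, hi=1.0, tol=1e-10, max_iter=200):
    """Solve psi(t) = target for t > 0, assuming psi is continuous and increasing."""
    # Grow the bracket until psi(hi) reaches the target.
    while psi(hi) < target:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if psi(mid) < target:
            lo = mid
        else:
            hi = mid
        # Stop once the bracket is small relative to its location.
        if hi - lo <= tol * max(1.0, hi):
            break
    return 0.5 * (lo + hi)

# Stand-in with a closed-form inverse, used to check the routine.
c, sigma = 2.0, 0.5
psi = lambda t: c * t ** sigma
target = 7.3
t_star = invert_increasing(psi, target)
print(t_star, (target / c) ** (1.0 / sigma))
```

In the exchangeable construction the routine would be called once, with `target` set to the simulated value $\xi_{n+1}$ and `psi` set to the actual $\Psi$ .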

Beta process.

Consider the beta process:

[TABLE]

Take the bijective transformation $u=-(\alpha\log(w))^{-1}$ which gives the measure on $(0,\infty)$

[TABLE]

Note that $\rho(du)$ is not a Lévy measure. Using the inverse gamma kernel with $\kappa=1$ to obtain a series approximation for $\rho(du)$ , we obtain

[TABLE]

and for the iid model we have $\widetilde{U}_{i}\sim\widetilde{\phi}_{n}(du)$ where

[TABLE]

which is the distribution of an inverse gamma random variable with parameter $(1,1/n)$ . Setting the inverse transformation $\widetilde{W}_{i}=e^{-1/(\alpha\widetilde{U}_{i})}$ , we obtain

[TABLE]

which corresponds to the classical iid approximation for the beta process, described in the introduction.

This construction can also be obtained directly with different arrival time distribution. Consider

[TABLE]

A sample from this distribution can be obtained as

[TABLE]

Then we have

[TABLE]

and as a result

[TABLE]

A sample from $\phi_{t}(w)$ can be obtained by

[TABLE]

and the exchangeable construction corresponds to the iid beta approximation.

E.4 Generalized Pareto arrival time distribution

Consider the following arrival time distribution,

[TABLE]

where $c>0$ .

Stable beta process.

Consider the stable beta process with Lévy measure

[TABLE]

With change of variable $v=\frac{w}{1-w}$ , we see that

[TABLE]

For the sequential model we obtain

[TABLE]

With the change of variable $v=\frac{w}{1-w}$ , we see that

[TABLE]

and thus a sample from $\phi_{t}$ can be obtained as

[TABLE]

For the exchangeable model, we have

[TABLE]

and by a similar calculation we see that a sample from $\varphi_{t}$ can be obtained as

[TABLE]

Appendix F Details on simulations and additional results

We are interested in measuring the error $R_{n}|T_{n+1}$ , which does not appear to be tractable. Hence, we consider the approximate error $R_{n,\hat{n}}$ defined for $\hat{n}>n$ as

[TABLE]

where we simulate $(W_{i})_{i=1}^{\hat{n}}$ via sequential constructions. When no analytic expression is available (computing $\Psi^{-1}(t)$ for gamma arrival times for the GGP when $\kappa>1$ , computing $\bar{\rho}^{-1}(\xi)$ for the inverse-Lévy construction of the GGP), we resort to a numerical inversion algorithm. For each configuration, we first sample a series of arrival time sequences, and conditioned on that, sample 100 series of jumps $(W_{i})_{i=1}^{\hat{n}}$ to compute $R_{n,\hat{n}}$ . We repeat this procedure 10 times for each arrival time sequence (thus 1,000 jump simulations in total for each configuration) and report the mean and standard deviations. Unless specified otherwise, we use the hyperparameters $\alpha=2.0,\tau=1.0$ for the stable process (SP) and the GGP, and set $\hat{n}=10^{4}$ . Fig. 1 in the paper reports $R_{n,\hat{n}}$ for the SP and the GGP.

Fig. 2 shows the approximate errors for gamma and inverse gamma arrival times for the SP, and gamma arrival times for the GGP. We can observe that the variances quickly approach zero, except for the inverse gamma arrival time with $\kappa=1$ , for which our theory predicts infinite variance.
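As an illustration of this procedure in the one case with a fully analytic form, the sketch below simulates the inverse-Lévy jumps of the stable process, $W_{i}=\left(\xi_{i}\sigma\Gamma(1-\sigma)/\alpha\right)^{-1/\sigma}$ with $\xi_{i}$ the points of a unit-rate Poisson process, and sums the jumps beyond index $n$ . It assumes that $R_{n,\hat{n}}$ is the total mass of the jumps with indices $n+1,\ldots,\hat{n}$ ; function names and the seed are hypothetical.

```python
import math
import random

def inverse_levy_stable_jumps(alpha, sigma, n_hat, rng):
    """Jumps W_i = (xi_i * sigma * Gamma(1 - sigma) / alpha) ** (-1 / sigma)."""
    scale = sigma * math.gamma(1.0 - sigma) / alpha
    xi, jumps = 0.0, []
    for _ in range(n_hat):
        xi += rng.expovariate(1.0)  # next point of a unit-rate Poisson process
        jumps.append((xi * scale) ** (-1.0 / sigma))
    return jumps

def approximate_error(jumps, n):
    """Assumed form of R_{n, n_hat}: mass of the jumps with index n+1, ..., n_hat."""
    return sum(jumps[n:])

rng = random.Random(3)
jumps = inverse_levy_stable_jumps(alpha=2.0, sigma=0.4, n_hat=10_000, rng=rng)
for n in (10, 100, 1000):
    print(n, approximate_error(jumps, n))
```

Repeating the simulation over independent seeds and averaging would give Monte Carlo estimates of the mean and variance of the approximate error for each $n$ .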

Fig. 3 shows the value of the constants $C_{1}(\sigma)$ for gamma and inverse gamma arrival times with varying $\sigma$ and $\kappa$ values. Note that lower $C_{1}(\sigma)$ implies lower expected error $\mathbb{E}[R_{n}|T_{n+1}]$ by Proposition 5.2. We found that gamma arrival times exhibit lower $C_{1}(\sigma)$ when $0<\sigma<0.5$ , and inverse gamma arrival times have lower $C_{1}(\sigma)$ when $0.5<\sigma<1$ . This observation is empirically confirmed in Fig. 4.

Finally, we compared the approximate error to the asymptotic value of $R_{n}$ . In the case of gamma arrival times for the GGP, according to Proposition 5.2, we have

[TABLE]

Fig. 5 compares empirical approximate errors with $\hat{n}=10^{6}$ to (64) for different values of $\sigma$ . We fixed $\kappa=1$ here. One can see that the approximate error quickly approaches the asymptotic error.

Appendix G Example on normalized GGP mixture models

Consider a hierarchical Bayesian model

[TABLE]

We approximate the infinite-dimensional process $G$ with the finite iid process $\tilde{G}_{n}$ . Then, the rest of the model can be rewritten as

[TABLE]

We construct $\widetilde{G}_{n}$ via gamma arrival times with $\kappa>1$ . Using Proposition B.2, we have

[TABLE]

Note that we used the function $f(n)=\big{(}\frac{\sigma\Gamma(\kappa)\Gamma(1-\sigma)n}{\alpha\kappa^{\sigma}\Gamma(\kappa-\sigma)}\big{)}^{\frac{1}{\sigma}}$ in place of $\Psi^{-1}(n)$ , thus both evaluation of the pdf and sampling can be done without any numerical approximation. The joint density of the mixture model is then written as

[TABLE]

where $\ell$ and $h$ are the densities of $L$ and $H$ , $w_{\bullet}\vcentcolon=\sum_{i=1}^{n}w_{i}$ , and $m_{k}\vcentcolon=\sum_{j=1}^{m}\mathds{1}_{\{z_{j}=k\}}$ . Now we are free to use any posterior inference algorithm, such as variational inference or stochastic gradient MCMC as in [39].
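The surrogate $f(n)$ used in place of $\Psi^{-1}(n)$ is cheap to evaluate. The sketch below computes it on the log scale and checks it against the power law of which it is the exact inverse; that power law is written out here as an assumption about the $\tau=0$ case and is not quoted from the paper. Function names and parameter values are hypothetical.

```python
import math

def f_surrogate(n, alpha, sigma, kappa):
    """f(n) = (sigma G(kappa) G(1-sigma) n / (alpha kappa**sigma G(kappa-sigma))) ** (1/sigma)."""
    log_num = (math.log(sigma) + math.lgamma(kappa)
               + math.lgamma(1.0 - sigma) + math.log(n))
    log_den = (math.log(alpha) + sigma * math.log(kappa)
               + math.lgamma(kappa - sigma))
    return math.exp((log_num - log_den) / sigma)

def psi_power_law(t, alpha, sigma, kappa):
    """Power law c * t**sigma whose inverse is f (assumed form, see lead-in)."""
    coef = alpha * kappa ** sigma * math.gamma(kappa - sigma) / (
        sigma * math.gamma(kappa) * math.gamma(1.0 - sigma))
    return coef * t ** sigma

alpha, sigma, kappa = 2.0, 0.4, 3.0
for n in (10, 100, 1000):
    t_n = f_surrogate(n, alpha, sigma, kappa)
    print(n, t_n, psi_power_law(t_n, alpha, sigma, kappa))
```

The third printed column should reproduce $n$ , confirming that no numerical inversion is needed once $f$ is used as the truncation level.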

Bibliography

[1] R. Argiento, I. Bianchini, and A. Guglielmi. A blocked Gibbs sampler for NGG-mixture models via a priori truncation. Statistics and Computing, 26(3):641–661, 2016.
[2] R. Argiento, I. Bianchini, and A. Guglielmi. Posterior sampling from $\varepsilon$-approximation of normalized completely random measure mixtures. Electronic Journal of Statistics, 10(2):3516–3547, 2016.
[3] F. Ayed and F. Caron. Nonnegative Bayesian nonparametric factor models with completely random measures for community detection. arXiv:1902.10693, 2019.
[4] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342, 2010.
[5] F. Ayed, J. Lee, and F. Caron. Beyond the Chinese restaurant and Pitman-Yor processes: statistical models with double power-law behavior. arXiv:1902.04714, 2019.
[6] J. Arbel and I. Prünster. A moment-matching Ferguson & Klass algorithm. Statistics and Computing, 27(1):3–17, 2017.
[7] J. Bertoin, T. Fujita, B. Roynette, and M. Yor. On a particular class of self-decomposable random variables: the durations of Bessel excursions straddling independent exponential times. Probability and Mathematical Statistics, 26(2):315–366, 2006.
[8] N. H. Bingham, C. M. Goldie, and J. L. Teugels. Regular variation, volume 27. Cambridge University Press, 1989.
