On quasi-infinitely divisible random measures
Riccardo Passeggeri

Riccardo Passeggeri, LPSM, Sorbonne University. Email: [email protected]
Abstract
Quasi-infinitely divisible (QID) distributions have been recently introduced by Lindner, Pan and Sato (Trans. Amer. Math. Soc. 370, 8483-8520 (2018)). A random variable $X$ is QID if and only if there exist two infinitely divisible (ID) random variables $Y$ and $Z$ s.t. $X+Y\stackrel{d}{=}Z$ and $Y$ is independent of $X$. In this work, we show that a family of QID completely random measures (CRMs) is dense in the space of all CRMs with respect to convergence in distribution. We further demonstrate that the elements of this family possess a Lévy-Khintchine formulation and that there exists a one-to-one correspondence between their law and certain characteristic pairs. We prove the same results also for the class of point processes with independent increments. In the second part of the paper, we show the relevance of these results in the general Bayesian nonparametric framework based on CRMs developed by Broderick, Wilson and Jordan (Bernoulli, 24, 3181-3221 (2018)).
Keywords: quasi-infinite divisibility, completely random measure, dense class, nonparametric Bayesian analysis, automatic conjugacy.
MSC (2010): 60E07, 60G57, 60A10, 62F15
1 Introduction
A random measure $\xi$ on $S$, with underlying probability space $(\Omega,\mathcal{F},\mathbb{P})$, is a function $\Omega\times\mathcal{S}\to[0,\infty]$, such that $\xi(\omega,B)$ is $\mathcal{F}$-measurable in $\omega\in\Omega$ for fixed $B$ and a locally finite measure in $B\in\mathcal{S}$ for fixed $\omega$. Completely random measures (CRMs) have the additional property that for any disjoint $B_1,\dots,B_n\in\mathcal{S}$, $n\in\mathbb{N}$, the random variables $\xi(B_1),\dots,\xi(B_n)$ are independent. CRMs, also called independently scattered random measures or random measures with independent increments, have a fundamental role in nonparametric Bayesian analysis; as Ghosal and van der Vaart affirm in their recent book (see [7]), CRMs “arise as priors, or building blocks for priors, in many Bayesian nonparametric applications”.
Completely random measures have a long history which is inextricably linked with that of infinitely divisible distributions. In 1967, Kingman [14] proved a very appealing and useful representation theorem for all CRMs. He showed that any CRM is almost surely given by the sum of three components: one deterministic, one concentrated on a fixed set of atoms, and one concentrated on a random set of atoms. He further showed that the last component, which he called the ordinary component, is fully determined by a Poisson point process: $\xi_o(B)=\int_0^\infty x\,\eta(B\times dx)$, where $\eta$ is a Poisson point process on $S\times(0,\infty)$. The Poisson point process is the prime example of an infinitely divisible CRM.
Infinitely divisible (ID) distributions have an even longer history that goes back to the work of Lévy, Kolmogorov and De Finetti, among others. They constitute one of the most studied classes of probability distributions. One of their most attractive properties is that their characteristic functions have an explicit formulation, called the Lévy-Khintchine formulation, written in terms of three mathematical objects. These are the drift, which is a real valued constant, the Gaussian component, which is a non-negative constant, and the Lévy measure, which is a measure on $\mathbb{R}$ satisfying an integrability condition and with no mass at $0$. Gaussian and Poisson distributions are examples of this class.
In 2018, in [15], Lindner, Pan and Sato introduced the class of quasi-infinitely divisible (QID) distributions. A QID random variable is defined as follows: a random variable $X$ is QID (namely, has a QID distribution) if and only if there exist two ID random variables $Y$ and $Z$ s.t. $X+Y\stackrel{d}{=}Z$ and $Y$ is independent of $X$. QID distributions are like ID distributions except for the fact that the Lévy measure is now allowed to take negative values. In other words, a QID distribution has a Lévy-Khintchine formulation which is uniquely determined by a drift, a Gaussian component and by a ‘signed measure’ (more precisely a real valued set function) called the quasi-Lévy measure. Any ID distribution is QID, but the converse is not true in general.
In [15], the authors show that QID distributions are dense in the space of all probability distributions with respect to weak convergence and that distributions concentrated on the integers (or any shift and dilation of them) are QID if and only if their characteristic functions have no zeros, among other results. Further theoretical results have been achieved in [1, 13, 19, 20]. In [19], the QID framework is extended to real-valued random noises and stochastic processes. QID distributions have already been shown to have an impact in various fields: from mathematical physics, see [4] and [6], to number theory, see [16] and [17], and to insurance mathematics, see [21].
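To make the integer-lattice criterion concrete, the following minimal sketch checks numerically that a Bernoulli distribution with parameter $p<1/2$, which is not ID because it is non-degenerate with bounded support, admits a Lévy-Khintchine type representation with a signed measure. The example, the helper names and the choice $p=0.3$ are ours and are not taken from [15]; the masses $\nu(\{m\})=(-1)^{m+1}q^m/m$ with $q=p/(1-p)$ come from expanding $\log(1+qe^{it})$ as a power series.

```python
import numpy as np


def bernoulli_cf(t, p):
    """Characteristic function of a Bernoulli(p) random variable."""
    return (1.0 - p) + p * np.exp(1j * np.asarray(t, dtype=float))


def quasi_levy_masses(p, n_terms=200):
    """Signed masses nu({m}) = (-1)^(m+1) q^m / m with q = p/(1-p), m = 1..n_terms.

    Assumes p < 1/2, so that q < 1 and the series converges absolutely.
    """
    q = p / (1.0 - p)
    m = np.arange(1, n_terms + 1)
    return m, (-1.0) ** (m + 1) * q ** m / m


def cf_from_quasi_levy(t, p, n_terms=200):
    """exp(sum_m nu({m}) (e^{itm} - 1)): no Gaussian part, no compensator."""
    m, nu = quasi_levy_masses(p, n_terms)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    exponent = (nu * (np.exp(1j * np.outer(t, m)) - 1.0)).sum(axis=1)
    return np.exp(exponent)


p = 0.3  # illustrative value
t_grid = np.linspace(-np.pi, np.pi, 101)
m, nu = quasi_levy_masses(p)
max_err = np.max(np.abs(bernoulli_cf(t_grid, p) - cf_from_quasi_levy(t_grid, p)))
min_modulus = np.min(np.abs(bernoulli_cf(t_grid, p)))
print("max error between the two forms:", max_err)
print("nu({2}) =", nu[1])         # negative: nu is signed, hence not a Levy measure
print("min |cf| =", min_modulus)  # the characteristic function has no zeros
```

The two forms agree up to floating point error, while the mass at $m=2$ is negative. For $p=1/2$ the characteristic function vanishes at $t=\pi$ and the series above no longer converges, in line with the no-zeros criterion.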
The first main contribution of this paper is the density result for QID random measures. We prove that a certain class of QID completely random measures (CRMs), which we denote by , is dense with respect to convergence in distribution (precisely in both weak and vague convergence) in the space of all CRMs, also known as random measures with independent increments or as independently scattered random measures. This result extends the density result in [15] to the infinite dimensional setting of CRMs.
The elements of this class have quite remarkable features. First, like any CRM, they have an almost sure representation in terms of an ‘atomless’ ID component and an ‘atomic’ one. Second, the number of atoms is finite. Third, these random measures are almost surely finite and, moreover, their atomless component has finite Lévy measure.
Moreover, for the elements of this class, we are able to show an explicit spectral representation, namely the Lévy-Khintchine formulation, and prove that there exists a unique one-to-one correspondence between them and pairs of deterministic measures satisfying certain conditions, which we call characteristic pairs. We prove all these results also for the class of point processes with independent increments, of which the Poisson point process is an example.
With these results this paper shows that the fixed component of a CRM, which has been left out in Kingman’s analysis and in the theory of CRMs in general, has exactly the same nice representation results as the widely studied ordinary component. Thus, not only is there no real need to leave the fixed component out of the analysis (as Kingman graphically says, fixed atoms can be removed by simple surgery), but doing so might also be dangerous, since in many applications the fixed component has an irreplaceable role. This will also be evident in the Bayesian setting we discuss in this work (see also [2]).
In the last section we investigate the impact of these results in the nonparametric Bayesian statistical framework presented by Broderick, Wilson and Jordan in [2] based on CRMs (see also [3]). In particular, we consider priors to be given by elements in (with quasi-Lévy measure having a particular structure). First, we show that they are dense in the space of priors considered in [2] and [3] with respect to convergence in distribution, thus showing also that our density result is flexible enough to adjust to various assumptions/settings. Second, we present explicit formulations for their posterior distributions. Third, when focusing on point processes, we prove automatic conjugacy for all the elements of under the only condition that the characteristic function of the posterior distribution has no zeros. This condition is satisfied in many situations and the result is more general than the one of [2], which is based on the exponential structure of the likelihood.
We remark that the general nature of our results allows them to be applied in many Bayesian settings. Thus, the choice of the work of Broderick, Wilson and Jordan [2] represents a first easy example.
The paper is structured as follows. Section 2 covers notation and some preliminaries. In Section 3 we provide the density results for CRMs and in Subsection 3.1 the one for point processes with independent increments. In Section 4, we show various properties for the classes of QID random measures and QID point processes presented in Section 3. In particular we present the Lévy-Khintchine formulation and the one-to-one correspondence of these random measures with their unique characteristic pair. In Section 5, we present the Bayesian setting and the related results: computation of the posterior, convergence results for the posterior, and automatic conjugacy.
2 Notation and Preliminaries
By a measure on a measurable space we always mean a positive measure on , i.e. a $[0,\infty]$-valued $\sigma$-additive set function on that assigns the value $0$ to the empty set. For a non-empty set , by we mean the Borel $\sigma$-algebra of , unless stated differently. The law and the characteristic function of a random variable will be denoted by and by , respectively. For two measurable spaces and , we denote by the product $\sigma$-algebra of and , and by their Cartesian product. Let us recall some definitions.
Definition 2.1 (extended signed measure).
Given a measurable space , that is, a set with a $\sigma$-algebra on it, an extended signed measure is a function s.t. and is $\sigma$-additive, that is, it satisfies the equality where the series on the right must converge in absolutely (namely the value of the series is independent of the order of its elements), for any sequence of disjoint sets in .
As a consequence any extended signed measure can take plus or minus infinity as value, but not both. In this work, we use the term ‘signed measure’ for an extended signed measure. Further, the total variation of a signed measure $\mu$ is defined as the measure $|\mu|$ given by
\[|\mu|(A):=\sup\sum_{j=1}^{\infty}|\mu(A_{j})|,\]
where the supremum is taken over all the partitions $\{A_{j}\}$ of $A$. The total variation $|\mu|$ is finite if and only if $\mu$ is finite. Let us recall the definition of a signed bimeasure.
Definition 2.2 (signed bimeasure).
Let and be two measurable spaces. A signed bimeasure is a function such that:
(i) the function is a signed measure on for every ,
(ii) the function is a signed measure on for every .
Let be a separable and complete metric space with Borel $\sigma$-algebra $\mathcal{S}$ and let be the ring composed of bounded Borel sets in . The triplet is called a localised Borel space (see page 19 in [12]).
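Definition 2.3 (random measure).
A random measure $\xi$ on $S$, with underlying probability space $(\Omega,\mathcal{F},\mathbb{P})$, is a function $\Omega\times\mathcal{S}\to[0,\infty]$, such that $\xi(\omega,B)$ is $\mathcal{F}$-measurable in $\omega\in\Omega$ for fixed $B$ and a locally finite measure in $B\in\mathcal{S}$ for fixed $\omega$.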
Definition 2.4 (completely random measure).
A completely random measure (CRM) is a random measure s.t. for any disjoint , , the random variables are independent. CRMs are also called independently scattered random measures or random measures with independent increments.
Definition 2.5 (diffuse random measure).
Using the notation of the previous definition, we say that a random measure on is diffuse if is a locally finite diffuse measure in for fixed .
Remark 2.6.
The term ‘finite’ for random measures stands for a.s. finite. Thus, by a finite random measure we mean an a.s. finite random measure.
For a random measure on a Polish space , is a fixed atom of if and only if . Further, a random measure is called atomless if for every . The atomless condition is for random measures what the continuity in probability is for continuous time stochastic processes. We remark that an atomless random measure is not necessarily a diffuse random measure (see Corollary 12.11 in [11]). For example, think of a Poisson point process with , like the homogeneous Poisson point process, which has no fixed atoms but it is not diffuse.
Now, we introduce the concept of a quasi-Lévy type measure. We start with the following definition, which we recall from [15]:
Definition 2.7.
Let for and be the class of all Borel sets that are bounded away from zero. Let be a function such that is a finite signed measure for each and denote the total variation, positive and negative part of by , and respectively. Then the total variation , the positive part and the negative part of are defined to be the unique measures on satisfying
[TABLE]
[TABLE]
for , for some .
As mentioned in [15], is not a signed measure because it is defined on , which is not a $\sigma$-algebra. If it is possible to extend the definition of to such that is a signed measure, then we will identify with its extension to and speak of as a signed measure. Moreover, the uniqueness of , and is ensured by an application of Carathéodory’s extension theorem (see Lemma 2.14 in [19]). Further, notice that (see Remark 2.6 in [19]).
Definition 2.8 (quasi-Lévy type measure, quasi-Lévy measure, QID distribution, from [15]).
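A quasi-Lévy type measure is a function $\nu$ satisfying the condition in Definition 2.7 and such that its total variation $|\nu|$ satisfies $\int_{\mathbb{R}}(1\wedge x^{2})\,|\nu|(dx)<\infty$. Let $\mu$ be a probability distribution on $\mathbb{R}$. We say that $\mu$ is quasi-infinitely divisible if its characteristic function has a representation
\[\hat{\mu}(\theta)=\exp\left(i\theta\gamma-\frac{\theta^{2}}{2}a+\int_{\mathbb{R}}\left(e^{i\theta x}-1-i\theta x\mathbf{1}_{[-1,1]}(x)\right)\nu(dx)\right),\quad\theta\in\mathbb{R},\]
where $a,\gamma\in\mathbb{R}$ and $\nu$ is a quasi-Lévy type measure. The characteristic triplet $(\gamma,a,\nu)$ of $\mu$ is unique, and $a$ and $\gamma$ are called the Gaussian variance and the drift of $\mu$, respectively. A quasi-Lévy type measure $\nu$ is called a quasi-Lévy measure if additionally there exist a quasi-infinitely divisible distribution $\mu$ and some $a,\gamma\in\mathbb{R}$ such that $(\gamma,a,\nu)$ is the characteristic triplet of $\mu$. We call $\nu$ the quasi-Lévy measure of $\mu$.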
The above definition extends to the case (for ) as shown in Remark 2.4 in [15]. As pointed out in Example 2.9 of [15], a quasi-Lévy measure is always a quasi-Lévy type measure, while the converse is not true. Moreover, we say that a function is integrable with respect to a quasi-Lévy type measure if it is integrable with respect to . Then, we define:
[TABLE]
In this work we always keep the same order for the elements in the characteristic triplet: the first element is the drift, the second one is the Gaussian variance, and the third one is the (quasi) Lévy measure.
Definition 2.9 (QID random measure).
Let be a random measure. If is a QID random variable, for every , then we call a QID random measure.
We conclude with the following result on QID distributions.
Theorem 2.10 (Theorem 4.3.4 in [5]).
Let . The characteristic triplet , where is a finite quasi-Lévy type measure, is the characteristic triplet of a QID distribution on if and only if is a measure. In that case, is given by
[TABLE]
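As an illustration of the statement above, the following minimal sketch evaluates the exponential of a finite signed measure supported on the positive integers and checks that the result is a genuine probability distribution. It assumes the standard series $\mu=e^{-\nu(\mathbb{R})}\sum_{n\geq 0}\nu^{*n}/n!$ with zero drift and zero Gaussian variance, which is our reading of the formula in the theorem; the signed masses $\nu(\{m\})=(-1)^{m+1}q^{m}/m$, the value $p=0.3$ and the function name are illustrative choices and are not taken from [5].

```python
import math
import numpy as np


def exp_of_signed_measure(nu, total_mass):
    """Return e^{-total_mass} * sum_{n>=0} nu^{*n} / n!, restricted to {0, ..., K}.

    nu[k] is the signed mass at the integer k, with nu[0] == 0 and K = len(nu) - 1.
    Since nu lives on {1, 2, ...}, nu^{*n} puts no mass below n, so on
    {0, ..., K} the series is exact after K terms (up to rounding).
    total_mass must be the mass of the untruncated measure.
    """
    K = len(nu) - 1
    power = np.zeros(K + 1)
    power[0] = 1.0                                   # nu^{*0} is the unit mass at 0
    result = power.copy()
    for n in range(1, K + 1):
        power = np.convolve(power, nu)[: K + 1]      # nu^{*n} on {0, ..., K}
        result += power / math.factorial(n)
    return math.exp(-total_mass) * result


p = 0.3                      # illustrative value, p < 1/2
q = p / (1.0 - p)
K = 12
ks = np.arange(1, K + 1)
nu = np.zeros(K + 1)
nu[1:] = (-1.0) ** (ks + 1) * q ** ks / ks           # signed masses, negative at even k
mu = exp_of_signed_measure(nu, total_mass=math.log(1.0 + q))
print(np.round(mu, 12))
```

Although the measure takes negative values, its exponential is non-negative: up to rounding it puts mass $1-p$ at $0$ and $p$ at $1$, i.e. it is the Bernoulli distribution with parameter $p$.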
3 The density result for QID CRMs
In this section we present the density results for QID CRMs in the space of all CRMs with respect to convergence in distribution. Let us start with some preliminaries. Let be a separable and complete metric space with Borel $\sigma$-algebra $\mathcal{S}$ and let be the ring composed of bounded Borel sets in . Let be the space of all bounded continuous functions with bounded support. Let be the space of locally finite measures, namely if for every . The space may be endowed with the vague topology, denoted by , generated by the integration maps , for all . The vague topology is the coarsest topology making all continuous. The measurable space is a Polish space. The associated notion of vague convergence denoted by is defined by the condition for all .
An equivalent definition of random measure (see Definition 2.3) is the following: a random measure is a measurable mapping from to , where is the topology generated by all projection maps with , or, equivalently, by all integration maps with measurable . From Lemma 4.1 in [10] or Theorem 4.2 in [12], we know that and coincide. Hence it is equivalent to consider a random measure as a measurable mapping from to or to .
The convergence in distribution of to means that for every bounded continuous function on , or equivalently that , where for any bounded measures and , the weak convergence stands for for all as above. We write to stress that the convergence of distribution is for random measures considered as random elements in the space with vague topology. As mentioned in the previous section, in this setting an atom of a random measure is an element such that .
We recall now a fundamental result by Harris, see [8].
Theorem 3.1 (see Theorem 4.11 in [12]).
Let be random measures on . Then these conditions are equivalent:
(i) ,
(ii) for all ,
(iii) for all with .
The following density result extends Theorem 4.1 in [15].
Theorem 3.2.
Let be a connected interval of the real line. The class of QID distributions with finite quasi-Lévy measure, zero Gaussian variance and with support on is dense in the class of probability distributions with support on with respect to weak convergence.
Proof.
Some arguments of the proof are similar in nature to those of the proof of Theorem 4.1 in [15]. First, we prove the result when is bounded.
Let be a finite closed interval, thus for some . Let be a probability distribution with support . For , let , and define the discrete distribution concentrated on the lattice by
[TABLE]
Then, as . Observe that is the probability distribution of a random variable with values on . It remains to prove that each is a weak limit of QID distributions with finite quasi-Lévy measure, zero Gaussian variance and with support on . W.l.o.g. assume that the approximating sequence of distributions is such that for every . Assume that the characteristic function has zeros (in the other case we can directly use Corollary 3.10 in [15] to conclude). Let be a random variable with distribution and define . Then, is concentrated on with masses for , and its characteristic function has zeros. Then, the polynomial has zeros on the unit circle. Factorizing, we obtain , where , , denote the complex roots. Let , where and . Then, for small enough , is a polynomial with real coefficients, namely with . Observe that for small enough , and will be close, so . Now, let be a random variable with distribution and let . Observe that, for every , is a random variable with values on the lattice and its characteristic function has no zeros, and that as . Finally, by Corollary 3.10 in [15] we know that is QID with finite quasi-Lévy measure and zero Gaussian variance.
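The root-perturbation step can be illustrated numerically. The sketch below uses one simple perturbation, the shift $g(z)=f(z+h)$ of the probability generating polynomial $f$, which moves every root by $-h$ and keeps all coefficients non-negative; this particular perturbation, the function names and the values used are our illustrative assumptions and need not coincide with the construction in the proof. Since the characteristic function of a distribution on $\{0,\dots,n\}$ is $f(e^{it})$, it has a zero exactly when $f$ has a root on the unit circle.

```python
from math import comb

import numpy as np


def shift_polynomial(coeffs, h):
    """Coefficients (lowest degree first) of g(z) = f(z + h), where f(z) = sum_k coeffs[k] z^k."""
    out = np.zeros(len(coeffs))
    for k, a in enumerate(coeffs):
        for j in range(k + 1):
            # binomial expansion of a * (z + h)^k
            out[j] += a * comb(k, j) * h ** (k - j)
    return out


def distance_of_roots_from_unit_circle(coeffs):
    """Smallest | |root| - 1 | over the roots of the polynomial with the given coefficients."""
    roots = np.roots(coeffs[::-1])       # np.roots expects the highest degree first
    return np.min(np.abs(np.abs(roots) - 1.0))


pmf = np.full(4, 0.25)                   # uniform on {0,1,2,3}: roots -1, i, -i
h = 0.05                                 # illustrative perturbation size
shifted = shift_polynomial(pmf, h)
pmf_h = shifted / shifted.sum()          # shifted.sum() equals g(1), so pmf_h sums to one

print("before:", distance_of_roots_from_unit_circle(pmf))
print("after: ", distance_of_roots_from_unit_circle(pmf_h))
print("perturbed masses:", pmf_h)
```

Before the shift all three roots lie on the unit circle, so the characteristic function of the uniform distribution has zeros. After the shift no root has modulus one, the masses stay positive, and they remain close to the original ones, with the distance shrinking as $h\to 0$.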
Observe that if is a bounded open interval, say for some , then the above arguments apply. Let be a probability distribution with support . For any let and and let , and define the discrete distribution concentrated on the lattice as in (2). Then, as and, applying the same remaining arguments (in which is fixed) for and instead of and , we obtain the result for bounded and open.
Let now be an unbounded interval of the form for some . Let be a probability distribution with support on . For , let , and define the discrete distribution concentrated on the lattice as in (2). Then, as . Using the notation above, let be a random variable with distribution and define . Then, is concentrated on with masses and its characteristic function has zeros by assumption. We proceed as before. Thus, for small enough , we obtain a polynomial with real coefficients , namely with and , for small enough . Then, let be a random variable with distribution and let . Then, is a random variable with support on , its characteristic function has no zeros, and as . Hence, by Corollary 3.10 in [15] we obtain the result.
Similarly we obtain the result for , for and for , where . ∎
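Recall that the Lévy-Prokhorov metric (or better just Lévy metric since we work on $\mathbb{R}$) for two probability distributions $F$ and $G$ on $\mathbb{R}$ is defined as
\[L(F,G):=\inf\{\varepsilon>0:\ F(x-\varepsilon)-\varepsilon\leq G(x)\leq F(x+\varepsilon)+\varepsilon\ \text{ for all }x\in\mathbb{R}\}.\]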
Lemma 3.3.
Let and be any two probability distributions on and let and where . For every positive constant we have that .
Proof.
Let be any positive constant . Observe that and similarly we have that . This implies that if $\varepsilon>0$ satisfies $F(x-\varepsilon)-\varepsilon\leq G(x)\leq F(x+\varepsilon)+\varepsilon$ for all $x\in\mathbb{R}$, then it also satisfies $F_{c}(x-\varepsilon)-\varepsilon\leq G_{c}(x)\leq F_{c}(x+\varepsilon)+\varepsilon$ for all $x\in\mathbb{R}$. Then, we have
[TABLE]
[TABLE]
∎
Observe that for two real valued random variables and the above lemma affirms that for any we have that .
Another useful property of the Prokhorov metric is the following. From condition 3) of the section “Lévy metric” in [22] (page 405) given any probability distributions on , where , we have that
[TABLE]
For the next two results denote by the sequence of bounded sets (i.e. ) s.t. . Notice that such a sequence exists by the definition of , see page 19 in [12].
Proposition 3.4.
Consider an atomless CRM with corresponding unique pair . Let and let , for every , and . Then, and are finite measures and there exists a sequence of atomless finite CRMs with pair s.t. .
Proof.
From Kingman’s representation theorem (see [14] and see also Corollary 12.11 in [11] and Corollary 3.21 in [12]), we have that every atomless CRM has the following representation:
[TABLE]
for some non-random diffuse measure and a Poisson process on with intensity satisfying
[TABLE]
for every . In particular, for every we have that if and only if and condition holds for (see Corollary 12.11 in [11]). Further, notice that the above formulation implies that for every and
[TABLE]
Moreover, the unique one to one correspondence between and is shown in Theorem 3.20 of [12].
It is possible to see that and are measures on S and on , respectively. In particular, since for every then and
[TABLE]
for every . Thus, and are finite measures, for every .
Now, for every , let be a Poisson process on with intensity and let
[TABLE]
Then, we have that is an atomless CRM and since and are finite then is finite, for every (see Corollary 12.11 in [11]).
Concerning the stated convergence we have the following. From Lemma 12.2 in [11] (or from Lemma 3.1 in [12]) we have that for every
[TABLE]
Hence, by assumption we have that for every
[TABLE]
[TABLE]
[TABLE]
as . Then, by point (iii) in Theorem 3.1 (see also Lemma 4.24 in [12]) we obtain that , as . ∎
Now, let us denote by the set of all CRMs on (considered as random elements in endowed with the vague topology) and recall that . From Theorem 7.1 in [10] we know that an element of has the following representation
[TABLE]
with , where is the set of fixed atoms of in , is an atomless CRM, and , , are -valued random variables, which are mutually independent and independent of . We call the fixed component of . We remark that in Kingman’s representation is the sum of a deterministic and an ordinary component as shown in the proof of Proposition 3.4 in eq. (4).
Consider the following class of QID random measures:
[TABLE]
[TABLE]
[TABLE]
First, notice that is ID, because any atomless random measure with independent increments is ID. Second, observe that, in contrast with the usual representation of CRMs, the elements of are such that the atomless random measure has finite Lévy measure (thus, is finite), the number of fixed atoms is finite, and , , are -valued QID random variables with finite quasi-Lévy measure and zero Gaussian variance. Notice that the elements of are almost surely finite on $\mathcal{S}$. Thus, is strictly smaller than the class of QID CRMs, which in turn is strictly smaller than the class of all CRMs (namely ).
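To fix ideas, the following minimal sketch draws one realisation of a random measure with the structure just described, on the unit interval. Everything specific in it is an illustrative assumption of ours: the atomless part is a compound Poisson random measure with finite Lévy measure (rate times the standard exponential law), with uniform atom locations and no deterministic part; there are two fixed atoms; and their weights are Bernoulli, which are QID for $p\neq 1/2$ since the characteristic function then has no zeros.

```python
import numpy as np


def sample_random_measure(rng, rate=3.0, fixed_atoms=(0.25, 0.75), p=0.3):
    """Draw one realisation, returned as (locations, weights) of a purely atomic measure on [0, 1).

    Atomless part: Poisson(rate) many atoms, uniform locations, Exp(1) weights
    (hypothetical choice of a finite Levy measure).
    Fixed part: finitely many fixed atoms with independent Bernoulli(p) weights.
    """
    n_jumps = rng.poisson(rate)
    locations = rng.uniform(0.0, 1.0, size=n_jumps)
    weights = rng.exponential(1.0, size=n_jumps)
    fixed_weights = rng.binomial(1, p, size=len(fixed_atoms)).astype(float)
    return (np.concatenate([locations, np.asarray(fixed_atoms, dtype=float)]),
            np.concatenate([weights, fixed_weights]))


def mass(locations, weights, a, b):
    """Mass assigned by the realisation to the interval [a, b)."""
    inside = (locations >= a) & (locations < b)
    return float(weights[inside].sum())


rng = np.random.default_rng(0)
locs, w = sample_random_measure(rng)
print("mass of [0, 0.5):", mass(locs, w, 0.0, 0.5))
print("mass of [0.5, 1):", mass(locs, w, 0.5, 1.0))
print("total mass:      ", mass(locs, w, 0.0, 1.0))
```

Each realisation has finitely many atoms and finite total mass, and the masses of disjoint intervals are built from independent ingredients, which is the complete randomness property in this toy setting.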
We are ready to present the main result of this section.
Theorem 3.5.
The class is dense in the space of all CRMs with respect to the convergence in distribution.
Proof.
From Theorem 7.1 in [10] we know that any CRM has the following unique representation
[TABLE]
with , where is the set of fixed atoms of , is a random measure without fixed atoms with independent increments (hence, is an atomless ID random measure), and , , are -valued random variables, which are mutually independent and independent of .
From Theorem 3.2 with , we know that for each there exists a sequence of non-negative QID random variables with zero Gaussian variance and finite quasi-Lévy measure that converges in distribution to , for every . Denote by such a sequence.
Denote by the sequence of bounded sets s.t. and by the pair associated to . Let and , for every , and , as in Proposition 3.4. Then, by Proposition 3.4 there exists a sequence of finite CRMs with pair s.t. .
The first step is to show the existence of random measures with ID random measure equal in distribution to , with fixed atoms in and weights equal in distribution to . The existence is not immediate because we do not know whether the are mutually independent and independent of in the underlying probability space of . This is a classical problem in probability and the solution lies in the construction of a probability space under which these conditions are satisfied, which is given by the ‘product’ of the probability spaces.
For the sake of clarity and completeness let us write here the arguments. Fix . Denote the underlying probability spaces of by and of the random variable by , for . Consider the probability space where , and is the product probability measure of ,…,.
Let and let , where , for every . Observe that for every and we have that
[TABLE]
[TABLE]
[TABLE]
Now, let
[TABLE]
where ,…, are the same as the ones in (6). It is possible to see that, for every , is a measure because it is the sum of measures and that, for every , is a measurable function because it is the sum of measurable functions. Thus, is a random measure on and from its definition it is possible to see that it belongs to .
Since we can choose a subsequence of , which by abuse of notation we still denote by , such that for every and . From the above arguments there exists a sequence of random measures in (with possibly different underlying probability spaces) such that . Thus, using that we obtain that for every and .
Now, we need to show that . From Theorem 3.1, it is sufficient to show that for all . Since for every and for every then . Further, since and are independent of the corresponding fixed component, this reduces the goal to prove that for all .
Let , hence, is bounded and has bounded support, and by denoting the support of we have that and so that almost surely , , and . Thus, for each , a.s. and a.s..
Moreover, notice that it is sufficient to prove the result for any with for every . Indeed, consider any and let be its bound, then and so if then .
Now, consider any with for every . By the triangle inequality we have that
[TABLE]
The last element converges to zero as because as . For the other element, by (3) and by Lemma 3.3 we obtain that
[TABLE]
Thus, we have that as , which concludes the proof. ∎
Remark 3.6.
We could alternatively consider an almost sure equality in (7) and then use the existence and uniqueness results for random measures (see Theorem 2.15 and Corollary 2.16 in [11]) to obtain a random measure almost surely equal to . In addition, by the Kolmogorov extension theorem the same arguments of the first part of the above proof hold for the case of ‘equal’ to infinity, namely .
Further, we point out that if is such that the number of fixed atoms in any bounded set (i.e. in any ) is finite then the number of fixed atoms in the support of every is finite, namely has finite cardinality, and so the stated result follows directly from the mutual independence of the , , from the fact that as , for every and , and from the continuous mapping theorem.
Remark 3.7.
Let be a class of random measures like , but such that the ID component is not necessarily finite, i.e. the ‘’ is not necessarily finite. Then, trivially is dense in w.r.t. the convergence in distribution. Indeed, let be any CRM on . If we know the ID component of , i.e. , and for modelling/theoretical reasons we can take an approximating sequence of unbounded , then we can define the s.t. , where . Then, and from the arguments of the proof of Theorem 3.5 it is possible to see that .
It is also possible to consider the set of bounded measures, denoted by , which can be endowed with the vague topology, as for , but also with the weak topology. The weak topology on is the topology generated by the integration maps for all bounded continuous functions. Then, for random measures considered as random elements in , endowed with the weak topology, we will denote by the convergence in distribution. Observe that in this setting QID random measures as defined in Definition 2.9 are QID random measures on (hence we do not need to extend them) because for every they are all a.s. bounded.
We will use the following result of Kallenberg to prove our next result.
Theorem 3.8 (see Theorem 4.19 in [12]).
Let be a.s. bounded random measures on . Then these conditions are equivalent:
(i) ,
(ii) , and .
We are now ready to present our next result, which is similar to Theorem 3.5, but applies to and involves both the vague and the weak topology.
Theorem 3.9.
The class is dense in the space of all CRMs, considered as random elements in , endowed with either the vague topology or the weak topology, with respect to the convergence in distribution.
Proof.
Consider first the case of endowed with the vague topology. Then, by the same arguments as the ones used in the proof of Theorem 3.5 we obtain the result.
For the weak topology case, by the same arguments as the ones used in the proof of Theorem 3.5 we have that . Hence, according to Theorem 3.8 it remains to prove that , namely that . However, this has been proved in the proof of Theorem 3.5 – indeed, consider and notice that and are a.s. finite since and are almost surely bounded. Thus, the proof is complete. ∎
3.1 The density result for QID point processes
In this subsection we answer positively the following question: given any point process with independent increments is it possible to find a sequence of QID point processes with independent increments which converges in distribution to it?
Thus, in this subsection we restrict our focus to point processes with independent increments and check that the density result holds. There are two main reasons for doing this. First, the class of point processes with independent increments represents one of the most studied classes of completely random measures due to their nice theoretical properties and their importance in applications. Second, we have an explicit formulation for the quasi-Lévy measure and the drift of QID random variables supported on finite subsets of (see Theorem 3.9 in [15]).
Let us first show the density result for random variables supported on .
Proposition 3.10**.**
The class of QID distributions supported on finite subsets of is dense in the class of probability distributions with support on with respect to weak convergence.
Proof.
Let be a probability distribution with support on . For , define the discrete distribution concentrated on the lattice by
[TABLE]
Then, as . It remains to prove that each is a weak limit of QID distributions with support on . W.l.o.g. assume that the approximating sequence of distributions is such that for every . Assume that the characteristic function has zeros (in the other case we can directly use Theorem 3.9 in [15] to conclude). Let be a random variable with distribution and let for . Then, the polynomial has zeros on the unit circle. Factorizing, we obtain , where , , denote the complex roots. Let , where and . Then, for small enough , is a polynomial with real coefficients, namely with . Observe that for small enough , and will be close, so . Now, let be a random variable with distribution . We conclude by noticing that, for every , is a random variable with values on the lattice and its characteristic function has no zeros (thus it is QID by Theorem 3.9 in [15]), and that as . ∎
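The root-perturbation step of the proof can be illustrated numerically. The following is a minimal sketch, not the paper's exact construction: it assumes a distribution on the lattice {0, 1, ..., m} with unit step, and it moves every root lying on the unit circle radially outwards by a factor (1 + eps). The function name `perturb_off_unit_circle`, the parameter `eps`, and the tolerance `tol` are hypothetical choices made for this illustration.

```python
import numpy as np

def perturb_off_unit_circle(pmf, eps=1e-2, tol=1e-6):
    """Return a nearby pmf whose generating polynomial has no unit-modulus roots.

    pmf[j] is P(X = j) for j = 0, ..., m (unit lattice step is an assumption).
    Roots with modulus within `tol` of 1 are scaled by (1 + eps); this radial
    scaling is an illustrative choice, not necessarily the one used in the proof.
    """
    pmf = np.asarray(pmf, dtype=float)
    coeffs = pmf[::-1]                      # np.roots expects highest degree first
    roots = np.roots(coeffs)
    on_circle = np.abs(np.abs(roots) - 1.0) < tol
    # Scaling by a real factor maps conjugate pairs to conjugate pairs,
    # so the perturbed polynomial still has real coefficients.
    new_roots = np.where(on_circle, roots * (1.0 + eps), roots)
    new_coeffs = np.real(np.poly(new_roots))
    new_coeffs[np.abs(new_coeffs) < 1e-12] = 0.0   # remove floating-point noise
    new_pmf = new_coeffs[::-1]
    return new_pmf / new_pmf.sum()          # renormalise to total mass one

# Example: mass 1/2 at 0 and at 2; the polynomial (1 + z^2)/2 has roots +i, -i.
original = np.array([0.5, 0.0, 0.5])
perturbed = perturb_off_unit_circle(original, eps=1e-2)
print(perturbed, np.abs(np.roots(perturbed[::-1])))
```

For small `eps` the perturbed weights stay non-negative and close to the original ones, which is the numerical counterpart of the weak convergence claimed in the proof. For larger `eps`, or for other root configurations, non-negativity should be checked explicitly.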
From Corollary 3.21 in [12], for an atomless point process with independent increments the corresponding unique pair, which we denote by , is such that and is restricted to .
Let be the set of all the point processes in . In other words, let
[TABLE]
[TABLE]
[TABLE]
Obviously, we have .
Theorem 3.11**.**
* is dense in the space of all point processes with independent increments with respect to the convergence in distribution.*
Proof.
It follows from the same arguments as the ones used in the proof of Theorem 3.5. In particular, now we need to use Proposition 3.10 instead of Theorem 3.2. Further, now and and are concentrated on . Then, following the same arguments as the ones used in the proof of Theorem 3.5 we obtain the result. ∎
We conclude this subsection with the density result for finite point processes (for which the weak topology might also be used), namely the equivalent of Theorem 3.9 for point processes with independent increments.
Proposition 3.12**.**
* is dense in the space of point processes with independent increments, considered as random elements in , endowed with either the vague topology or with the weak topology, with respect to the convergence in distribution.*
Proof.
It follows from the same arguments as the ones used in the proof of Theorem 3.9, with Theorem 3.11 instead of Theorem 3.5. ∎
4 Properties of the dense class
In this section we explore some of the properties of the random measures in , with a particular focus on spectral representations.
Consider the same notation as in the previous section. Let be an atomless CRM (hence, ID). Using Theorem 12.10 and Corollary 12.11 in [11] we have that
[TABLE]
for every and , where is a finite diffuse measure on S and is a finite measure on with diffuse projections onto . Observe that we can extend to a finite measure on , by assigning value zero outside ; by abuse of notation, we call this finite measure .
Further, let , where , , , and where the ’s are mutually independent QID random variables with finite quasi-Lévy measure and zero Gaussian variance. With centering function equal to zero (as in (8)), denote by and the drift and the quasi-Lévy measure of , for . Notice that we can use such a centering function because the ’s have finite quasi-Lévy measure. Then, the Lévy-Khintchine formulation of is given by
[TABLE]
for every and , where and . Then, has the following formulation
[TABLE]
for every and , where and .
Proposition 4.1**.**
Let and adopt the notation above. Then, extends uniquely to a finite signed measure on .
Proof.
Consider the notations above. For the first statement we need to show that is a finite signed measure on . Since is a finite measure on , it remains to show that is a finite signed measure on . We know that where are finite signed measures on . It is possible to see that is a bimeasure on and that
[TABLE]
where the supremum is taken over all the finite families of disjoint elements of . Then, by Theorem 5.18 in [19] (see also Theorem 4 in [9]) extends to a finite signed measure on . Thus, is a finite signed measure on . ∎
Following the notation of the ID case (see [12] page 89), we call the quasi-Lévy measure of . Observe that in the ID case the Lévy measure might not even be σ-finite (see [14] pages 82-83), while here our quasi-Lévy measure is a finite signed measure. Further, we remark that a result similar to Proposition 4.1 holds for (see Remark 3.7). In this case, is a measure (not necessarily σ-finite) on (see Corollary 3.21 in [12]), and is the same as in the proof of Proposition 4.1. Thus, in this case is a signed measure that is not necessarily finite.
In the following result we show the existence of a unique correspondence between any element in and a characteristic pair.
Theorem 4.2**.**
*Let . Then, there exists a pair s.t. (9) holds, where and are a finite signed measure on S and , respectively, s.t. for every and :
(i) , for some diffuse finite measure on S, , and finitely many atoms ,
(ii) , for some finite measure on , which is the extension by zero of some measure on with diffuse projections onto , and for some finite signed measures ’s on , such that are measures.*
Conversely, for every such pair there exists a unique random measure s.t. (9) holds.
Proof.
Concerning the atomless component of , from Corollary 12.11 in [11] and Theorem 3.20 in [12] we know that there exists a one-to-one correspondence between an ID atomless random measure with independent increments and a characteristic pair, composed of a diffuse measure on S and a measure on with diffuse projections onto . In our case we note that the components of the characteristic pair are finite measures by definition.
For the fixed component of , by Theorem 2.10 we know that a characteristic triplet where the Gaussian component is zero and the quasi-Lévy measure is finite is the characteristic triplet of a QID random variable if and only if the exponential of the finite quasi-Lévy measure is a measure.
Then, by the definition of and by the discussion and the computations at the beginning of this section on the characteristic functions of the components of , we immediately obtain the result.
Notice that for the converse direction we need also to show the independence of the fixed and atomless components, but this follows immediately from the linear structure of and . ∎
Remark 4.3**.**
Notation: instead of using the characteristic pair we could have equivalently used the characteristic set , with the above structure, in order to have a one to one identification with .
4.1 Properties of the dense class
All the results presented in the previous section hold for . In this subsection, we show that even better results hold for the elements in . This is mainly due to the fact that we have more information about the structure of these random measures.
Let us recall Theorem 3.9 in [15]. Although we have used this theorem before, we restate it here for the reader's convenience.
Theorem 4.4** (Theorem 3.9 in [15]).**
*Let be a discrete distribution concentrated on for some , i.e., , where , and . Then the following are equivalent:
(i) is quasi-infinitely divisible.
(ii) The characteristic function of has no zeroes.
(iii) The polynomial in the complex variable has no roots on the unit circle, i.e. , for all with .*
Further, if one of the equivalent conditions (i)-(iii) holds, then the quasi-Lévy measure of is finite and concentrated on , the drift lies in , and the Gaussian variance of is 0. More precisely, if denote the complex roots of , counted with multiplicity, then the quasi-Lévy measure of is given by
[TABLE]
and the drift is equal to the number of those zeroes of this polynomial which lie inside the unit circle (counted with multiplicity), i.e., have modulus less than 1.
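Conditions (i)-(iii) and the drift statement can be checked numerically for a given lattice distribution. The sketch below assumes the distribution sits on {0, 1, ..., m} with unit step (the lattice symbols are not recoverable from the text here), so the returned count of inner roots is the drift expressed in lattice steps. The name `lattice_qid_summary` and the tolerance `tol` are hypothetical; the quasi-Lévy measure itself is not computed.

```python
import numpy as np

def lattice_qid_summary(pmf, tol=1e-9):
    """Check condition (iii) and count the roots inside the unit circle.

    pmf[j] is the mass at lattice point j, j = 0, ..., m, with pmf[m] > 0.
    Returns (is_qid, n_inside, roots), where n_inside is the number of roots
    of modulus less than 1, counted with multiplicity.
    """
    coeffs = np.asarray(pmf, dtype=float)[::-1]   # highest degree first
    roots = np.roots(coeffs)
    moduli = np.abs(roots)
    # Condition (iii): no root of the polynomial lies on the unit circle.
    is_qid = bool(np.all(np.abs(moduli - 1.0) > tol))
    n_inside = int(np.sum(moduli < 1.0 - tol))
    return is_qid, n_inside, roots

# Bernoulli(p) has the single root -(1 - p) / p.
for p in (0.3, 0.5, 0.8):
    is_qid, n_inside, _ = lattice_qid_summary([1.0 - p, p])
    print(p, is_qid, n_inside)
```

The Bernoulli case shows the three regimes: for p = 0.3 the root lies outside the unit circle, for p = 0.8 it lies inside, and for p = 0.5 it sits at -1 on the unit circle, so the distribution is not QID. Numerical root finding is sensitive for repeated roots, so the tolerance should be treated as a tuning choice.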
In the following theorem we adopt the following notation. Let . We denote by its atomless component and by , the QID random variables of its fixed component, i.e. . Further, for every , we denote the law of by , namely and denote by the complex roots of . Finally, we denote by the quasi-Lévy measure of , i.e.
[TABLE]
and by its drift, i.e. .
Theorem 4.5**.**
*Let . Then, there exists a pair s.t. (9) holds, where and are a finite signed measure on S and , respectively, s.t. for every and :
(i) , where , is an atom, and , for ,
(ii) , where is a finite measure on restricted on and with diffuse projections onto , and where satisfies (11), for .*
Conversely, for every such pair , where denote the complex roots of some polynomial for , there exists a unique random measure s.t. (9) holds.
Proof.
It follows from the same arguments as the ones used in Theorem 4.2 and from Theorem 4.4. In particular, the first direction is trivial. For the other direction, we have the following. As mentioned in the proof of Theorem 4.2, we have a one-to-one correspondence for the atomless part of and its characteristic pair. Concerning the fixed component, let us assume that there exist and which are functions of some complex roots of some complex polynomial with no roots on the unit circle, where , , and . Then, by Theorem 4.4 there exists a QID probability distribution . Since this holds for every then from the set of atoms we obtain the fixed component of a random measure in . ∎
The same comment in Remark 4.3 for Theorem 4.2 holds here for Theorem 4.5. In addition, we refer to [18] for further properties of certain subclasses of point processes with quasi-Lévy measures.
5 A Nonparametric Bayesian example
In this section we show how the setting and the results presented in Sections 3 and 4 apply to a particular class of nonparametric prior distributions. The framework is that of the paper by Broderick, Wilson and Jordan [2], which is also explored in subsequent papers; see [3] among others. In their work they analyse Bayesian nonparametric priors and likelihoods based on CRMs. In particular, they model the prior as:
[TABLE]
where the cardinality may be either finite or infinite and where is a pair consisting of the frequency (or rate) of the -th trait together with its trait , which belongs to some space of traits. Further, they model the data point for the -th individual as:
[TABLE]
where represents the degree to which the -th data point belongs to the trait .
This setting can be applied to many real world applications. In particular, in topic modelling we have that represents a topic; that is, is a distribution over words in a vocabulary. Further, might represent the frequency with which the topic occurs in a corpus of documents. Finally, represents the number of words in topic that occur in the th document. So the th document has a total length of words. In this case, the actual observation consists of the words in each document, and the topics of the whole corpus of documents are latent.
From a mathematical (and formal) point of view and are defined as CRMs. In particular, for the data , we let be drawn according to some distribution that takes as a parameter and has support on , that is , independently across and . We assume that are i.i.d. conditional on . Moreover, [2] consider the following assumptions for and :
Assumption A00: the atomless component of has characteristic pair s.t. and , where is any σ-finite measure on and is a proper distribution on with no atoms.
Assumptions A0, A1, and A2: has a finite number of fixed atoms, , and , respectively.
We remark that by Assumption A00 we have that the location of the non-fixed atoms and the frequencies are stochastically independent. We call the weights rate measure of . Moreover, the assumptions A0, A1 and A2 come from a modelling need. By assuming A0 we are saying that we initially know certain traits, by A1 that there are a countable infinity of possible traits, and by A2 that the amount of information from finitely represented data is finite (because by A2 the number of non-fixed atoms is finite).
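To make the role of Assumption A00 concrete, the following minimal sketch simulates the ordinary component of a CRM whose weights rate measure is finite and has finite support, together with one data point. Several ingredients are assumptions made only for this illustration: the trait distribution is taken to be uniform on [0, 1], the likelihood is taken to be Poisson with mean equal to the atom weight, and the names `sample_ordinary_component` and `sample_data` are hypothetical.

```python
import numpy as np

def sample_ordinary_component(rate_support, rate_masses, rng):
    """Sample the atoms of the ordinary component under a finite rate measure.

    The number of atoms is Poisson with mean equal to the total mass of the
    rate measure; given that number, weights are i.i.d. from the normalised
    rate measure and locations are i.i.d. from the trait distribution,
    independently of the weights (this is the independence in A00).
    """
    rate_masses = np.asarray(rate_masses, dtype=float)
    total = rate_masses.sum()
    n_atoms = rng.poisson(total)
    weights = rng.choice(rate_support, size=n_atoms, p=rate_masses / total)
    locations = rng.uniform(0.0, 1.0, size=n_atoms)   # assumed trait distribution
    return weights, locations

def sample_data(weights, rng):
    """Draw one count per atom; the Poisson likelihood is an assumed example."""
    return rng.poisson(weights)

rng = np.random.default_rng(0)
theta, psi = sample_ordinary_component([0.5, 1.0, 2.0], [1.0, 2.0, 0.5], rng)
x1 = sample_data(theta, rng)
print(len(theta), theta, psi, x1)
```

Because the rate measure is finite, the number of non-fixed atoms is almost surely finite, which is the situation of the random measures considered in this section. Atoms whose count in `x1` equals zero are present in the prior draw but are not observed in the data.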
The first main result in [2] is Theorem 3.1, which shows explicit formulations for the posterior distribution , and it is extended in Corollary 3.2 to the posterior . In the following result we are going to show that similar results hold for any random measure in without assuming A0, A1 or A2.
Notice that we can write , where , namely is the sum of the fixed and non-fixed atoms, thus is random. Following the notation of [2], we denote the fixed component of by and the law of by .
Proposition 5.1**.**
Let satisfying . Write , and let be generated conditional on according to with for proper, discrete probability mass function . It is enough to make the assumption for since the are i.i.d. conditional on .
Then let be a random measure with the distribution of (i.e. ). is a CRM with three parts.
1.* For each , has a fixed atom at with weight distributed according to the finite-dimensional posterior that comes from prior , likelihood , and observation . Moreover, is QID with no Gaussian component and finite quasi-Lévy measure, and .*
2.* Let be the union of atom locations across minus the fixed locations in the prior of . is finite. Let be the weight of the atom in located at , for some . Then has a fixed atom at with random weight , whose distribution .*
3.* The ordinary component of has finite weights rate measure .*
Remark 5.2**.**
Observe that since , it has finitely many fixed atoms, so assumption A0 is satisfied. Moreover, since is also finite and , assumption A2 is also satisfied. The only difference with Theorem 3.1 and Corollary 3.2 in [2] is that we do not necessarily satisfy assumption A1. However, A1 is a modelling assumption rather than a technical one. Indeed, the proof of this result follows from arguments similar to those used in the proofs of Theorem 3.1 and Corollary 3.2 in [2]. We include them for completeness.
Proof.
Let us first prove the result for . Any fixed atom in the prior is independent of the other fixed atoms and of the ordinary component. Thus, all of except is independent of . Thus, has a fixed atom at and . Recall that since is continuous, all the fixed and non-fixed atoms of are at a.s. distinct locations. Observe that by letting we can define the fixed and ordinary component of by and , respectively.
Let and let be all the locations of atoms in of size , which is finite and is a subset of the locations of atoms of . Further, let . Observe that the values are generated from a thinned Poisson point process with rate measure (also known as intensity measure) ; this is due to the -thinning of the Poisson point process which has rate measure . Moreover, given that , we have that . Finally, observe that atoms in may not be observed in ; this happens when the likelihood draw returns a zero. These atom weights form a Poisson point process with rate measure .
Considering as the new prior and observing that the assumptions are still satisfied by , we obtain the formulation for the posterior by induction, which concludes the proof. ∎
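The thinning argument in the proof can be sketched for a rate measure with finite support. The explicit formula is not recoverable from the text above, so the sketch relies on the form recalled from Theorem 3.1 and Corollary 3.2 in [2]: each unobserved atom survives an observation with the probability that the likelihood returns zero, so after several observations the rate mass at a weight is multiplied by that zero probability raised to the number of observations. This form, the Poisson likelihood, and the name `posterior_ordinary_rate` are assumptions of this illustration.

```python
import numpy as np

def posterior_ordinary_rate(rate_masses, zero_prob, n_obs=1):
    """Thin a finitely supported rate measure by the zero-count probability.

    rate_masses[i] is the prior rate mass at the i-th weight and zero_prob[i]
    is the probability that the likelihood returns 0 at that weight. The
    returned masses follow the assumed form mass * zero_prob ** n_obs.
    """
    rate_masses = np.asarray(rate_masses, dtype=float)
    zero_prob = np.asarray(zero_prob, dtype=float)
    return rate_masses * zero_prob ** n_obs

support = np.array([0.5, 1.0, 2.0])
masses = np.array([1.0, 2.0, 0.5])
# With a Poisson likelihood (an assumed example) the zero probability is exp(-weight).
post = posterior_ordinary_rate(masses, np.exp(-support), n_obs=3)
print(post, post.sum())
```

Since the zero probabilities are at most one, the thinned rate measure is dominated by the prior one and stays finite, in line with point 3 of Proposition 5.1.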
In the next result, we show that random measures in satisfying A00 are dense in the space of all CRMs satisfying A0, A1 and A2, namely all the random measures considered in [2] (and in [3]). Further, we show how this result translates into a convergence for the ordinary component of the posterior of these random measures.
Proposition 5.3**.**
Consider any random measure satisfying A00, A0, A1 and A2, namely as in Theorem 3.1 in [2]. Then, there exists a sequence of random measures in and satisfying A00 such that , as . Further, , as .
Proof.
The first part of this proof consists in realising that the arguments in the proof of Proposition 3.4 and Theorem 3.5 adapt to the present case.
Denote by the Lévy measure of . Following the proofs of Proposition 3.4 and Theorem 3.5 it is possible to see that the approximating sequence should have Lévy measure where and . However, given the assumptions on , namely that is a finite measure, we can (and we do) take the Lévy measure of to be given by . Then, applying the same arguments as the ones used in the proofs of Proposition 3.4 and Theorem 3.5, we obtain that the ordinary component of converges in distribution to the one of . The convergence of the fixed component follows directly from Theorem 3.5. Since is finite, we have that is in and that it satisfies A00.
For the convergence of the posteriors, consider with its respective data points , which are defined conditional on as in Proposition 5.1 and belong to some probability spaces possibly different from the one of the other data points. From Proposition 5.1 we have that has finite weights rate measure , while from Corollary 3.2 in [2] we know that has finite weights rate measure . Since we obtain the result by Proposition 3.4. ∎
We summarise our findings so far in words. First, we obtain an explicit expression for the posterior of any random measure in satisfying A00. Second, such random measures are dense with respect to convergence in distribution in the space of all priors considered in [2]. Third, when approximating in distribution such a prior, the ordinary component of the posteriors of these random measures converges to the one of the prior.
Thus, by these results we have a random truncation procedure; this is so because the number of non-fixed atoms of the prior is random and almost surely finite for every . Thus, the present truncation procedure extends the one of [3]. Indeed, we do not arbitrarily fix the number of non-fixed atoms of the truncated prior and we are able to keep explicit formulations for the posterior of the truncated prior.
In the next result we show that, under certain conditions, we have automatic conjugacy for random measures in satisfying A00.
Proposition 5.4**.**
Let satisfying A00 and with weights rate measure having finite support. Let be generated conditional on according to with for proper, discrete probability mass function . Assume that the characteristic functions of the random variables of the fixed component of have no zeros, namely assume that for every , and
[TABLE]
Then, , satisfies A00 and has weights rate measure with finite support.
Proof.
Assumption (12) implies that the characteristic functions of and of have no zeros. Further, they are also supported on a finite subset of . Then, by Theorem 4.4 we obtain the result. ∎
Remark 5.5**.**
Let and be as in Proposition 5.4. Notice that we can write , where , and , for . Further, we can write , where indicates the highest value in , and . Assumption (12) can be rewritten as: For every , and , assume that
[TABLE]
Moreover, by Theorem 4.4 this assumption (and so assumption (12)) is equivalent to the following assumption: For every and , assume that the polynomials and in the complex variable have no roots on the unit circle.
Remark 5.6**.**
The results presented in this section also hold if the weights rate measure is infinite, namely (under the additional assumptions A1 and A2). In particular, the equivalent of Proposition 5.1 would be identical to Corollary 3.2 except for the result of point 1, because here we additionally know that is QID with no Gaussian component and finite quasi-Lévy measure. Further, the equivalent of Proposition 5.3 would follow from the arguments presented, taking into consideration Remark 3.7. The equivalent of Proposition 5.4 is more subtle and it is presented below.
Consider the following class of QID random measures:
[TABLE]
[TABLE]
[TABLE]
Let indicate the set of random measures like in but with being any atomless point process with independent increments. As a side comment, we remark that it is possible to see that a result similar to Theorem 4.2 and Theorem 4.5 holds for the elements in , where thanks to Theorem 8.1 in [15] we are able to know the structure of their Lévy-Khintchine representation in more detail.
Proposition 5.7**.**
Let and assume A00, A0, A1 and A2. Let be generated conditional on according to with for proper, discrete probability mass function . Assume that the characteristic functions of the random variables of the fixed component of have no zeros, namely assume that for every , and
[TABLE]
Then, and satisfies A00, A0, A1 and A2.
Proof.
Assumption (13) implies that the characteristic functions of and of have no zeros. Further, they are also supported on . Then, by Theorem 8.1 in [15] we obtain the result. ∎
Observe that assumption (13) can be rewritten more explicitly as done in Remark 5.5 for assumption (12).
Acknowledgement
The author would like to thank Almut Veraart, Fabio Bernasconi and Ismael Castillo for useful discussions. The research developed in this paper is supported by the EPSRC (award ref. 1643696) at Imperial College London and by the Fondation Sciences Mathématiques de Paris (FSMP) fellowship, held at LPSM (Sorbonne University).
- [1] Berger, D. On Quasi-Infinitely Divisible Distributions with a Point Mass. In press in Mathematische Nachrichten (2019+).
- [2] Broderick, T., Wilson, A.C. and Jordan, M.I. Posteriors, conjugacy, and exponential families for completely random measures. Bernoulli 24, 3181-3221 (2018).
- [3] Campbell, T., Huggins, J.H., How, J. and Broderick, T. Truncated random measures. Bernoulli 25, 1256-1288 (2019).
- [4] Chaiba, H., Demni, N. and Mouayn, Z. Analysis of generalized negative binomial distributions attached to hyperbolic Landau levels. Journal of Mathematical Physics 57, 072-103 (2016).
- [5] Cuppens, R. Decomposition of Multivariate Probabilities. Academic Press, New York (1975).
- [6] Demni, N. and Mouayn, Z. Analysis of generalized Poisson distributions associated with higher Landau levels. Infinite Dimensional Analysis, Quantum Probability and Related Topics 18(04) (2015).
- [7] Ghosal, S. and Van der Vaart, A. Fundamentals of Nonparametric Bayesian Inference. Cambridge University Press (2017).
- [8] Harris. Random measures and motions of point processes. Z. Wahrsch. verw. Geb. 18, 85-115 (1971).
