Stick-breaking processes, clumping, and Markov chain occupation laws
Zach Dietz, William Lippitt, Sunder Sethuraman

TL;DR
This paper explores the relationships between clumped residual allocation models, a broad class of stick-breaking processes including Dirichlet processes, and the occupation laws of certain Markov chains, revealing new connections and limit behaviors.
Contribution
It introduces an intermediate structure in RAMs involving clumping, linking stick-breaking processes to Markov chain occupation laws, and characterizes their limits in new settings.
Findings
Joint law of intermediate RAM and visited states expressed via disordered GEM sequence.
Identifies a class of stick-breaking processes as limits of empirical occupation measures.
Connects inhomogeneous Markov chain behavior with generalized stick-breaking processes.
Zach Dietz
William Lippitt: Department of Mathematics, University of Arizona, Tucson, AZ 85721
Sunder Sethuraman: Department of Mathematics, University of Arizona, Tucson, AZ 85721
Abstract.
We consider the connections among ‘clumped’ residual allocation models (RAMs), a general class of stick-breaking processes including Dirichlet processes, and the occupation laws of certain discrete space time-inhomogeneous Markov chains related to simulated annealing and other applications. An intermediate structure is introduced in a given RAM, where proportions between successive indices in a list are added or clumped together to form another RAM. In particular, when the initial RAM is a Griffiths-Engen-McCloskey (GEM) sequence and the indices are given by the random times that an auxiliary Markov chain jumps away from its current state, the joint law of the intermediate RAM and the locations visited in the sojourns is given in terms of a ‘disordered’ GEM sequence, and an induced Markov chain. Through this joint law, we identify a large class of ‘stick breaking’ processes as the limits of empirical occupation measures for associated time-inhomogeneous Markov chains.
Key words and phrases:
residual allocation model, RAM, GEM, Dirichlet, inhomogeneous, Markov, stick breaking, occupation, empirical, clumping
2010 Mathematics Subject Classification:
60G57, 60E99, 60J10
1. Introduction and summary
In this article, we introduce an intermediate ‘clumped’ structure in residual allocation models of apportionment of a resource, such as Griffiths-Engen-McCloskey (GEM) models. Although this intermediate structure is perhaps of independent interest, through it we identify the empirical occupation law limits in a class of time-inhomogeneous discrete space Markov chains, associated with simulated annealing and other applications, as new types of stick-breaking processes built from Markovian samples, including Dirichlet processes. On the one hand, GEM models and Dirichlet processes have wide application in population genetics, ecology, combinatorial stochastic processes, and Bayesian nonparametric statistics; see books and surveys [8], [9], [18], [19], [27], [41] and references therein. On the other hand, the time-inhomogeneous Markov chains that we consider are stylized models of simulated annealing and Gibbs samplers or types of mRNA dynamics; see [5], [11], [15], [17], [25], [46]. In a sense, one purpose of the paper is to observe a perhaps unexpected connection between these a priori different objects.
We now discuss some of the relevant background on GEM and Dirichlet measures, and time-inhomogeneous Markov chains, before turning to an informal discussion of our results on the intermediate structure in GEM sequences and their connections with the occupation laws of the Markov chains.
1.1. GEM and Dirichlet measures
Consider the infinite-dimensional simplex of all discrete (probability) distributions on $\mathbb{N}$. A residual allocation model (RAM) is a distribution on this simplex, introduced in the 1940s [24] as a means to address problems of apportionment: Let $\{X_j\}_{j\geq 1}$ be independent $[0,1]$-valued random variables, called ‘residual fractions’. Consider the associated process $\mathbf{P}=\langle P_j : j\geq 1\rangle$, given by $P_1=X_1$ and, for $j\geq 2$,
$$P_j = X_j\Big(1-\sum_{i=1}^{j-1}P_i\Big) = X_j\prod_{i=1}^{j-1}(1-X_i);$$
see Lemma 3.1 for the induction leading to the last equality. If $\sum_{j\geq 1}P_j=1$ a.s., the distribution of $\mathbf{P}$ is the associated RAM. In general, $\mathbf{P}$ need not sum to $1$ for a given realization. We note a simple condition equivalent to $\sum_{j\geq 1}P_j=1$ a.s. is that $\prod_{j\geq 1}(1-X_j)=0$ a.s., the case for nontrivial, independent, identically distributed (iid) fractions (cf. Lemma 3.1).
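The residual allocation recursion described above can be sketched in a few lines of Python. This is only an illustration: the function name `ram_from_fractions` and the example fractions are assumptions made here, not taken from the paper. The returned remainder is the mass not yet assigned, which equals the product of the complements of the fractions used so far.

```python
def ram_from_fractions(fractions):
    """Allocate mass by residual fractions.

    Each fraction takes that share of whatever mass is still unassigned.
    Returns (masses, remaining), where remaining is the unassigned mass.
    """
    masses = []
    remaining = 1.0
    for x in fractions:
        masses.append(x * remaining)  # next index receives a share of the remainder
        remaining *= 1.0 - x          # the rest stays unassigned
    return masses, remaining


# Illustrative (hypothetical) fractions, chosen to be exact in floating point.
masses, remaining = ram_from_fractions([0.5, 0.25, 0.5])
print(masses, remaining)
```

For a finite list of fractions the masses never sum to one unless some fraction equals one; the remainder shrinks to zero only in the limit, which is the condition singled out in the text.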
The RAM when the fractions are iid Beta$(1,\theta)$ random variables is the well-known Griffiths-Engen-McCloskey GEM$(\theta)$ model. There have been many characterizations and studies of the GEM sequence and its variants in recent years. For instance, the GEM model is the unique RAM with iid fractions that is invariant in law under size-biased permutation. Also, the GEM sequence is the unique invariant measure of ‘split and merge’ dynamics. In addition, there are important connections with Poisson-Dirichlet models. See for instance, among others, [1], [2], [10], [14], [20], [28], [29], [30], [35], [38], [39], [40], [42], and references therein.
Moreover, the GEM sequence is a fundamental building block of Dirichlet processes, which often serve as a measure on priors in Bayesian nonparametric statistics [18], [19]. With respect to a measurable space , consider the space of probability measures endowed with the $\sigma$-field generated by the sets for and . We say that is a random probability sample from the Dirichlet process, with ‘parameters’ and probability measure on , if, for any finite partition , the vector has the Dirichlet distribution with parameters .
The ‘stick breaking’ representation of the Dirichlet process with parameters , in terms of a GEM sequence , and an independent sequence of iid random variables with common distribution , is given by
[TABLE]
There is a large literature on Dirichlet processes stemming from the seminal works [4], [16]. See [40], [45] with respect to the ‘stick breaking’ construction, and books [18], [19], [36], [41] for more on their history, other representations including that with respect to the ‘Chinese restaurant process’, and their use in practice.
In this article, we will concentrate on discrete spaces , that is those composed of either a finite or a countably infinite number of elements. We note, when is finite, and and for , the property that is given by a Dirichlet distribution was first stated in a population genetics context in [12]; see also [26].
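On a finite space, the stick-breaking representation can be simulated directly. The sketch below is a truncated approximation under stated assumptions: the GEM fractions are taken to be Beta(1, theta), the names `theta`, `mu`, `n_sticks` and the location variable `z` are illustrative, and the values of `theta` and `mu` are arbitrary. It checks empirically that the random measure has mean close to `mu`, as expected for a Dirichlet process.

```python
import random


def stick_breaking_measure(theta, mu, n_sticks, rng):
    """One truncated draw of sum_j P_j * delta_{Z_j} on the space range(len(mu))."""
    weights = [0.0] * len(mu)
    remaining = 1.0
    for _ in range(n_sticks):
        x = rng.betavariate(1.0, theta)                  # GEM fraction (assumed Beta(1, theta))
        z = rng.choices(range(len(mu)), weights=mu)[0]   # iid location drawn from mu
        weights[z] += x * remaining                      # stick piece placed at location z
        remaining *= 1.0 - x
    return weights  # the truncated tail `remaining` is discarded


rng = random.Random(1)
mu = [0.2, 0.3, 0.5]  # hypothetical base measure
draws = [stick_breaking_measure(2.0, mu, 100, rng) for _ in range(2000)]
means = [sum(d[k] for d in draws) / len(draws) for k in range(len(mu))]
print(means)
```

With 100 sticks the discarded tail is negligible for this value of `theta`; a smaller `n_sticks` or a much larger `theta` would need the tail to be handled explicitly.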
1.2. Time-inhomogeneous Markov chains
Let be a generator kernel on , that is for , and . Suppose the entries of are suitably bounded so that the kernel
[TABLE]
is a stochastic kernel for all large enough, and set otherwise. Let be the time-inhomogeneous Markov chain on the discrete space associated to kernels . Consider without zero rows. Then, every point in represents a valley from which the chain rarely but almost surely exits to enter another point valley. In this way, a certain ‘landscape’ is explored. The chain can be considered as a simplified model of simulated annealing or metastability (cf. [6], [17], [31], [37], [46]). From another view, continuous-time variants of such inhomogeneous chains have been used in the modeling of certain mRNA dynamics [25].
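A simulation sketch of such a chain follows. The display of the kernel is not reproduced here, so the code assumes the form K_n = I + G/n (identity plus generator over time), with the identity used whenever that matrix fails to be stochastic; this form, the timing convention (the move into time n uses K_n), and the example generator `G` are all assumptions for illustration.

```python
import random


def kernel(G, n):
    """Assumed kernel K_n = I + G/n when stochastic, else the identity."""
    size = len(G)
    identity = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    candidate = [[identity[i][j] + G[i][j] / n for j in range(size)] for i in range(size)]
    # Off-diagonal entries of G are nonnegative and rows sum to zero,
    # so only a negative diagonal entry can break stochasticity.
    if all(candidate[i][i] >= 0.0 for i in range(size)):
        return candidate
    return identity


def simulate(G, start, n_steps, rng):
    """Return the states at times 1..n_steps, starting from `start`."""
    path = [start]
    for n in range(2, n_steps + 1):
        row = kernel(G, n)[path[-1]]
        path.append(rng.choices(range(len(G)), weights=row)[0])
    return path


def occupation(path, size):
    """Fraction of time spent in each state along the path."""
    return [path.count(s) / len(path) for s in range(size)]


G = [[-1.0, 1.0], [2.0, -2.0]]  # hypothetical generator on two states
path = simulate(G, 0, 5000, random.Random(0))
print(occupation(path, 2))
```

Because the jump probabilities decay like 1/n, sojourns lengthen over time, which is the ‘valley’ behaviour described above; repeated runs with different seeds give visibly different occupation vectors.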
Interestingly, for finite , it was noted in [17] and [46] that the sample means of these chains do not converge a.s. or in probability, as would be the case for a homogeneous Markov chain. For generators without zero entries, weak convergence to an empirical occupation law
[TABLE]
was identified by computing its moments in [11]. Curiously, when is of the form for and a stochastic matrix with constant rows , it was also shown that is a Dirichlet distribution with parameters by matching the moments. Similar occupation laws were also derived in the continuous-time mRNA model in [25] as the stationary distributions of a promoter process on states, influencing levels of mRNA production.
In this context, part of our motivation is to understand this limit and its generalizations more constructively (Theorem 2.12).
1.3. Clumped structure and generalized ‘stick-breaking’ processes
We now describe a class of generalized stick-breaking processes. Let be a GEM sequence and, to be focused, let be an independent Markov chain with irreducible, recurrent transition kernel on a discrete space with initial distribution , although we also consider more general Markov chains, not necessarily irreducible or composed only of recurrent states, in several of our results.
Another motivation of ours is to understand the random measures
[TABLE]
seen as a natural generalization of stick-breaking representation of the Dirichlet process, with respect to Markovian samples instead of the iid ones in (1.1).
In general, is not exchangeable in the sense that the GEM sequence may not be replaced by an arbitrary permutation without changing the measure. In contrast, when is iid and is the Dirichlet process, such an exchangeability property holds; for example, the Poisson-Dirichlet order statistics of may be used instead without changing the Dirichlet process (cf. [40]). We also note that other generalizations of Dirichlet processes have been considered, among them, Polya tree [33], Pitman-Yor [40], [43], and Beta processes [7].
We now introduce a clumped intermediate structure which will help analyze . Suppose are the times when the Markov chain jumps to a different state with the convention . In particular, ‘skip-repetition’ is allowed: The chain can begin in state , jump to at time , and then may jump back at time into state . We note that these times are not only those times when a state is observed for the first time, as used in the definition of size-biased permutations.
Consider for . We show that (cf. Theorems 2.4 and 2.7), conditional on the locations , the sequence is a RAM where the associated fractions are Beta$\big(1,\theta(1-Q_{Y_i,Y_i})\big)$ for , a sort of ‘disordered’ GEM. Also, the law of can be computed as another Markov chain on with a transition kernel found in terms of . We will call the joint law of $\big(\langle R_i : i\geq 1\rangle,\{Y_i\}_{i\geq 1}\big)$ a type of Markov Chain conditional GEM, or ‘MCcGEM’, distribution.
In terms of the clumped intermediate structure, we see that
[TABLE]
This representation will allow us to identify as the limit of occupation laws of a matched time-inhomogeneous Markov chain (Theorems 2.12, 2.13).
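The equality between the stick-breaking form and the clumped form is a bookkeeping identity, which the following sketch checks on a small example. The function names and the toy masses and chain path are illustrative assumptions; clumps are formed over maximal runs of equal chain states, so a state may head several clumps (‘skip-repetition’).

```python
def clump_by_switches(masses, chain):
    """Merge consecutive masses whose chain states agree.

    Returns (clumped_masses, visited_states), one entry per sojourn of the chain.
    """
    clumped, visited = [], []
    for mass, state in zip(masses, chain):
        if visited and visited[-1] == state:
            clumped[-1] += mass   # same sojourn: add to the current clump
        else:
            clumped.append(mass)  # the chain switched: start a new clump
            visited.append(state)
    return clumped, visited


def measure(masses, states):
    """Total mass placed on each state."""
    total = {}
    for mass, state in zip(masses, states):
        total[state] = total.get(state, 0.0) + mass
    return total


# Hypothetical masses and chain path, exact in floating point.
masses = [0.5, 0.25, 0.125, 0.0625, 0.0625]
chain = ["a", "a", "b", "a", "a"]
clumped, visited = clump_by_switches(masses, chain)
print(clumped, visited)
print(measure(masses, chain), measure(clumped, visited))
```

Both calls to `measure` give the same result, since clumping only regroups adjacent terms that already sit on the same state.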
We will also see that satisfies a ‘self-similarity’ equation (cf. Theorem 2.17), uniquely characterizing its distribution. This equation is reminiscent of the regenerative structure present in ‘stick-breaking’ [45], in integral constructions of the Dirichlet process [32], [44], and in other related settings [21], [22].
Moreover, when is finite, we discuss the joint moments of the distribution in Theorem 2.19. Although a formula for the moments is given in [11], the description in Theorem 2.19 is more detailed, allowing identification of the marginal distributions as Beta products (cf. Theorem 2.18 and Corollary 2.20).
1.4. Occupation laws of time-inhomogeneous Markov chains
With respect to the time-inhomogeneous Markov chain with kernels (1.2), starting from initial distribution , consider the random empirical occupation measure on ,
[TABLE]
To connect with the intermediate clumping structure from the previous section, we will again implement a clumping procedure, this time to investigate local occupations, or clumped occupations, of the empirical measure of up to time .
However, in a Markov chain with kernels , later clumps of the chain are typically larger than earlier clumps. To keep the clump sizes from tending to zero after normalization, we consider the clumps in reverse chronological order, starting from time , so that the clumped occupations converge nontrivially in distribution.
Formally, let be the successive times when the Markov chain changes state, and let . Going backwards from time , let be the length of the last visit to state , be the length of the visit to state , and be the length of the visit to for . Let also and for . In addition, define for .
The figure below depicts, in a realization, the clumping boundaries marked in forward times, and the lengths of local occupations given backwards in time starting from time .
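[Figure: a timeline from $1$ to $n$, with the clumping boundaries $V_{N_n-3}$, $V_{N_n-2}$, $V_{N_n-1}$ marked in forward time and the local occupation lengths $\tau_{n,1}$, $\tau_{n,2}$, $\tau_{n,3}$ measured backwards from time $n$.]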
Then, is written as
[TABLE]
We show (cf. Theorem 2.10), for generators satisfying natural conditions, conditionally on the values , that the distributions of converge, as , to a disordered GEM with parameters given in terms of and . Also, converges, as , to a homogeneous Markov chain , with transition kernel in terms of and . In particular, the joint law of and converges, as , to a Markov Chain conditional GEM distribution, denoted as the MCcGEM distribution with respect to .
In Theorem 2.12, we will then be able to show that converges to a random measure given in terms of and either in ‘stick-breaking’ or ‘clumped’ forms (1.4), (1.5). In particular, when where is a constant stochastic matrix with identical rows , the associated sequences and simplify, and the limit is identified in Subsection 2.2.2 as a Dirichlet process. Returning to one of our motivations, we comment that when is finite these results represent a more constructive view of the limits (1.3) found in [11].
Organization of the paper. We develop notions, make remarks, and state the main results, Theorems 2.4, 2.7, 2.10, 2.12, 2.13, 2.17, 2.18, and 2.19, in this order, in Section 2. Proofs are then given in Section 3.
2. Statement of results
We now formalize notation and state our main results, and related remarks about them, in several subsections. Throughout, we will use the convention that empty sums equal $0$, and empty products are $1$. Also, , , and . The notation signifies that the vector is in row form.
2.1. RAMs, GEMs and MCcGEM laws
A residual allocation model (RAM) is a way of defining a random probability measure on by iteratively assigning a random portion of the unassigned probability remaining to the next integer.
Definition 2.1 (Residual Allocation Model - RAM).
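Let $\mathbf{X}=\{X_j\}_{j\geq 1}$ be a collection of independent $[0,1]$-valued random variables. Define $\mathbf{P}=\langle P_j : j\geq 1\rangle$ by
$$P_1 = X_1 \quad\text{and}\quad P_j = X_j\Big(1-\sum_{i=1}^{j-1}P_i\Big) \ \text{ for } j\geq 2. \tag{2.1}$$
Then, if $\mathbf{P}$ is a.s. a probability measure on $\mathbb{N}$, that is if $\sum_{j\geq 1}P_j=1$ a.s., we say $\mathbf{P}$ is a RAM. If $\mathbf{X}$ consists of iid fractions, and the associated $\mathbf{P}$ is a RAM, we say $\mathbf{P}$ is a self-similar RAM.
Consider now the following identity, verified in Lemma 3.1: For an arbitrary sequence of numbers $\{a_j\}_{j\geq 1}$ and $n\geq 1$,
$$1-\sum_{j=1}^{n}a_j\prod_{i=1}^{j-1}(1-a_i) = \prod_{j=1}^{n}(1-a_j).$$
Then, the sequence $\mathbf{P}$ in (2.1) satisfies $P_j = X_j\prod_{i=1}^{j-1}(1-X_i)$ for $j\geq 1$ (cf. Proposition 3.2). Accordingly, we have the useful observation that $\mathbf{P}$ is a RAM exactly when $\prod_{j\geq 1}(1-X_j)=0$ a.s.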
A specific, well-known example of a RAM is the Griffiths-Engen-McCloskey (GEM) sequence.
Definition 2.2 (GEM).
Fix $\theta>0$. Let $\mathbf{X}$ be a sequence of iid variables with common distribution Beta$(1,\theta)$. Then, the self-similar RAM $\mathbf{P}$, constructed from $\mathbf{X}$, is said to be a GEM$(\theta)$ distribution.
Also, consider a sequence $\{\theta_j\}_{j\geq 1}$ of positive numbers, and let $\mathbf{X}$ be a sequence of independent random variables where $X_j\sim$ Beta$(1,\theta_j)$ for $j\geq 1$. When the measure $\mathbf{P}$, found in terms of $\mathbf{X}$, is a RAM, we will say it is a disordered GEM sequence with parameters $\{\theta_j\}_{j\geq 1}$.
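A truncated sampler for GEM and disordered GEM sequences can be sketched as follows. It assumes the standard parameterization in which the j-th fraction is Beta(1, theta_j), with all theta_j equal in the GEM case; the function names, the truncation level, and the parameter values are illustrative choices.

```python
import random


def sample_disordered_gem(thetas, rng):
    """Truncated disordered GEM: the j-th fraction is Beta(1, thetas[j]) (assumed form).

    Returns (masses, remaining), where remaining is the mass left unassigned.
    """
    masses, remaining = [], 1.0
    for theta in thetas:
        x = rng.betavariate(1.0, theta)
        masses.append(x * remaining)
        remaining *= 1.0 - x
    return masses, remaining


def sample_gem(theta, n_terms, rng):
    """Truncated GEM(theta): a disordered GEM with constant parameters."""
    return sample_disordered_gem([theta] * n_terms, rng)


rng = random.Random(0)
masses, leftover = sample_gem(2.0, 200, rng)
print(sum(masses), leftover)
```

The leftover mass decays geometrically on average, so a few hundred terms are ample for moderate `theta`; larger parameters give smaller fractions and need longer truncations.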
Now, in a RAM , one can clump adjacent probabilities with respect to an increasing sequence , marking boundaries of clumps, to form a new probability measure on .
Definition 2.3 (Clumped measure).
Let be an increasing sequence in with and , and let be a RAM. We clump according to to construct a new probability measure on where, for ,
[TABLE]
We remark, when takes the value infinity at an entry in the sequence, necessarily is a distribution supported on .
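For a deterministic, finite list of boundaries, clumping can be sketched as below. The indexing convention (boundaries start at 0, and clump k collects the indices strictly after boundary k-1 up to and including boundary k) is an assumption made for this illustration. The sketch also checks numerically that the clumped masses are again generated by residual fractions, namely one minus the product of the complements of the original fractions within each clump; this follows from the product identity and is not quoted from the paper.

```python
import math


def ram_from_fractions(fractions):
    """Masses X_j * prod_{i<j} (1 - X_i) for a finite list of fractions."""
    masses, remaining = [], 1.0
    for x in fractions:
        masses.append(x * remaining)
        remaining *= 1.0 - x
    return masses


def clump(masses, boundaries):
    """Sum the masses over each block between successive boundaries."""
    return [sum(masses[a:b]) for a, b in zip(boundaries, boundaries[1:])]


def clumped_fractions(fractions, boundaries):
    """Share of the remaining mass taken by each block: 1 - prod(1 - X_i)."""
    return [1.0 - math.prod(1.0 - x for x in fractions[a:b])
            for a, b in zip(boundaries, boundaries[1:])]


# Hypothetical fractions and boundaries, exact in floating point.
fractions = [0.5, 0.5, 0.5, 0.5]
boundaries = [0, 1, 3, 4]
direct = clump(ram_from_fractions(fractions), boundaries)
via_fractions = ram_from_fractions(clumped_fractions(fractions, boundaries))
print(direct, via_fractions)
```

Since disjoint blocks of independent fractions are independent, the clumped fractions are independent whenever the boundaries are deterministic.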
An immediate question now is when is also a RAM. We will show that is always a RAM as long as is deterministic. However, the situation is more involved when a random sequence is used for the clumping.
Specifically, we will be interested in two types of random clumping sequences constructed from a Markov chain on the discrete space . The first sequence comes from considering clumps of repeated values in ; that is, will keep track of the times when switches values. The second sequence arises in considering the times when returns to its initial value .
For example, if is observed, we define and . More formally, let and, for , set
[TABLE]
In the case that reaches an absorbing state, denoted , the chain is eventually constant and is eventually infinite. In the case that is a transient state, the chain returns to the first state finitely many times and eventually takes the value infinity.
Define now by for . When does not reach an absorbing state, we think of as the sequence of values taken by without repetition. If however meets an absorbing state , will eventually be constant at value .
In the following theorem, the reader may wish to focus, on a first pass, on the case when possesses no absorbing states, in which the formulas simplify.
In what follows, we will say that a sequence is a ‘possible’ sequence for a Markov chain on if the event has positive probability for each .
Theorem 2.4 (Clumped RAMs).
Let be a RAM. Fix an increasing sequence in with and . Then,
- (1)
is a RAM with respect to fractions where
[TABLE]
Let now be a Markov chain, independent of and with homogeneous transition kernel .
- (2)
Then, the sequence is a Markov chain with homogeneous transition kernel given by
[TABLE]
Let be a possible sequence in with respect to . Let be a possible sequence in with respect to .
- (3)
Then, $\mathbf{P^{V}}\mid\mathbf{T}=\mathbf{t}$ and $\mathbf{P^{W}}\mid\mathbf{T}=\mathbf{t}$ are RAMs.
- (4)
Also, if is self-similar, $\mathbf{P^{V}}\mid\mathbf{Y}=\mathbf{y}$ is a RAM and, when is a recurrent state with respect to , $\mathbf{P^{W}}\mid T_{1}=t_{1}$ is a self-similar RAM.
We remark that the specifications of the fractions and their distributions in items (4) are given in the proof of Theorem 2.4. These specifications, in the case when is a GEM sequence, are part of Theorem 2.7.
Also, in item (4) above, we note that the self-similarity of is important to deduce in full generality that $\mathbf{P^{V}}\mid\mathbf{Y}$ is a RAM. Later, in Example 2.9, we see that $\mathbf{P^{V}}\mid\mathbf{Y}$ may not be a RAM if is not a self-similar RAM.
In addition, we observe that in item (4), when is a transient state, the sequence eventually takes constant value since is visited only finitely many times a.s. Given is a nontrivial variable, $\mathbf{X^{W}}\mid T_{1}=t_{1}$ cannot be iid. However, one may consider an iid sequence , say on a different probability space, where $Z_{1}\stackrel{d}{=}X^{W}_{1}\mid T_{1}=t_{1}$, and check that the self-similar RAM formed from fractions has the same distribution as $\mathbf{P^{W}}\mid T_{1}=t_{1}$.
We now consider the clumping procedures with respect to a GEM distribution . It will be convenient to define the notion of a generator kernel or matrix, these terms used interchangeably.
Definition 2.5 (Generator kernel).
Let be a square matrix on . We say that is a generator kernel if it satisfies for and . In addition, we will assume a boundedness condition, .
Every matrix of the form , where and is a stochastic kernel on , is a generator matrix. Moreover, we claim that every generator matrix can be (non-uniquely) decomposed in this fashion: The final condition in Definition 2.5 ensures that all entries are bounded, , so that a normalizing can be found.
We also observe that a generator matrix has a zero row, that is for some , exactly when is an absorbing state for a corresponding . In particular, when does not have zero rows, any corresponding does not have absorbing states.
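The decomposition of a generator into a scale and a stochastic kernel can be illustrated as follows, assuming the form G = theta (Q - I) with theta > 0 and Q stochastic, which is how the paragraph above is read here. The function names and the example generator are illustrative. Any theta at least as large as the largest exit rate works, which is the non-uniqueness mentioned in the text.

```python
def decompose_generator(G, theta=None):
    """Return (theta, Q) with G = theta * (Q - I), assuming that form.

    theta defaults to the largest exit rate max_i(-G[i][i]), or 1.0 if G is zero.
    """
    size = len(G)
    bound = max(-G[i][i] for i in range(size))
    if theta is None:
        theta = bound if bound > 0.0 else 1.0
    if theta <= 0.0 or theta < bound:
        raise ValueError("theta must be positive and at least the largest exit rate")
    Q = [[(1.0 if i == j else 0.0) + G[i][j] / theta for j in range(size)]
         for i in range(size)]
    return theta, Q


def recompose(theta, Q):
    """Rebuild theta * (Q - I)."""
    size = len(Q)
    return [[theta * (Q[i][j] - (1.0 if i == j else 0.0)) for j in range(size)]
            for i in range(size)]


G = [[-1.0, 1.0], [2.0, -2.0]]  # hypothetical generator
theta_a, Q_a = decompose_generator(G)        # smallest admissible theta
theta_b, Q_b = decompose_generator(G, 4.0)   # a larger, equally valid theta
print(theta_a, Q_a)
print(theta_b, Q_b)
```

A zero row of `G` yields a row of `Q` equal to the corresponding row of the identity, matching the remark that zero rows correspond to absorbing states.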
We now formally define the notion of a Markov Chain conditional GEM (MCcGEM) joint distribution on the space , endowed with the product topology and product $\sigma$-field formed in terms of the Borel $\sigma$-fields on and . This topology is discussed more in Subsection 3.4. By convention, we will say that a Beta random variable equals a.s.
Definition 2.6 (MCcGEM distribution).
With respect to a generator matrix , let be a homogeneous Markov chain with initial distribution and transition kernel on given by
[TABLE]
Consider variables , on the same probability space as , such that $X_{j}\mid\mathbf{Y}=\mathbf{y}\sim$ Beta and $\{X_{j}\mid\mathbf{Y}=\mathbf{y}\}_{j\geq 1}$ are independent. Define where for , and observe that $\mathbf{P}\mid\mathbf{Y}=\mathbf{y}$ is a disordered GEM with parameters (see below).
We say that the pair has MCcGEM distribution with respect to .
To see that $\mathbf{P}\mid\mathbf{Y}=\mathbf{y}$ is a disordered GEM, we need only observe that $\mathbf{P}\mid\mathbf{Y}=\mathbf{y}$ is a probability distribution on . Here, $\prod_{n\geq 1}(1-X_{n})\mid\big(\mathbf{Y}=\mathbf{y}\big)=0$ a.s. exactly when $\sum_{n\geq 1}X_{n}\mid\mathbf{Y}=\mathbf{y}$ diverges a.s. As the tail $\sigma$-field is trivial, the opposite is the summability $\sum_{n\geq 1}X_{n}\mid\big(\mathbf{Y}=\mathbf{y}\big)<\infty$ a.s. By Kolmogorov’s three-series theorem, and the fact that is composed of Beta random variables on with means and variances dominated by the means, almost sure summability holds exactly when . For a generator matrix , this is never the case as the terms are uniformly bounded above.
We now describe a relation between GEM distributions and MCcGEM laws through clumping with respect to a homogeneous Markov chain.
Theorem 2.7 (GEM to MCcGEM).
Let and let be a GEM distribution. Let also be an independent homogeneous Markov chain with kernel and initial distribution . Recall the associated switch times , the clumped distribution , and the Markov chain near (2.3).
Then, is a homogeneous Markov chain with kernel and is a disordered GEM with parameters , that is has MCcGEM distribution with respect to .
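The two ingredients of this statement can be computed explicitly for a finite kernel. The sketch below uses the standard embedded jump-chain kernel, Q_ij / (1 - Q_ii) off the diagonal with absorbing states kept fixed, which is assumed here to agree with the kernel intended in the theorem; the disordered GEM parameters theta (1 - Q_yy) are those stated in the introduction. The function names and the example kernel are illustrative.

```python
def jump_kernel(Q):
    """Kernel of the sequence of successive distinct states of a chain with kernel Q."""
    size = len(Q)
    K = []
    for i in range(size):
        leave = 1.0 - Q[i][i]  # probability of leaving state i in one step
        if leave == 0.0:
            K.append([1.0 if j == i else 0.0 for j in range(size)])  # absorbing state
        else:
            K.append([0.0 if j == i else Q[i][j] / leave for j in range(size)])
    return K


def disordered_parameters(theta, Q, visited):
    """Beta(1, .) parameters theta * (1 - Q[y][y]) along a sequence of visited states."""
    return [theta * (1.0 - Q[y][y]) for y in visited]


# Hypothetical kernel on three states; state 2 is absorbing.
Q = [[0.5, 0.25, 0.25],
     [0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
print(jump_kernel(Q))
print(disordered_parameters(2.0, Q, [0, 1, 0]))
```

When all diagonal entries of `Q` coincide, the parameters are constant along any visited sequence, so the conditional law no longer depends on the states visited.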
Some cases of interest are developed in the following examples.
Example 2.8.
Suppose GEM and that is a homogeneous Markov chain with stochastic kernel where has constant diagonal entries, for . By Theorem 2.7, $\mathbf{P^{V}}\mid\mathbf{Y}$ is a disordered GEM sequence with parameters . However, since , we conclude $\mathbf{P^{V}}\mid\mathbf{Y}=\mathbf{P^{V}}$ does not depend on and is actually a GEM sequence. In this case, the pair consists of independent sequences.
More generally, suppose is any random distribution on . Then, indeed, with respect to this Markov chain , by the proof of Part (4) of Theorem 2.4 (cf. (3.14)), the fractions do not depend on , and so $\mathbf{P^{V}}\mid\mathbf{Y}=\mathbf{P^{V}}$.
Example 2.9.
We now consider a RAM constructed from independent fractions for . Such a RAM is a member of the well-known 2-parameter GEM family, here with GEM. Let be a sequence of iid Bernoulli variables. Thought of as a Markov chain on the -state space , every entry of the stochastic kernel of equals . By the discussion in Example 2.8, as the diagonal entries of are the constant , we have $\mathbf{P^{V}}\mid\mathbf{Y}=\mathbf{P^{V}}$.
We now observe that is not a RAM: If it were a RAM, consider the associated non-atomic fractions (cf. Part (1) of Theorem 2.4). Compute
[TABLE]
Then, , , and . Hence, , and so the non-atomic fractions are not independent, and cannot be a RAM.
2.2. Clumping and time-inhomogeneous Markov chains
Of course, the notion of clumping can be applied to random probability measures on , which are not RAMs. In particular, to capture the empirical occupation law limit of a Markov chain, we study its local occupations, or clumps of the sequence indexed in time, as it explores the space . As noted in the introduction, we will look at these local occupations in reverse order.
Let be a Markov chain on the discrete space , without absorbing states. Recall the definition of the switching times (cf. (2.3)), and let index the first switch after time . For and , define
[TABLE]
Also, set
[TABLE]
and . Consider the sequences and .
As a concrete example, consider an observation . Then for , the local occupations are summarized by eventually constant sequences and . Similarly, when , we have and . For a more general depiction, please refer to the figure in Section 1.4.
Hence, for , we have generally that
[TABLE]
In the middle of the display, we see the average Markov chain occupation of state in the first steps. On the right-hand side, the sum is over local occupations, or clumps, of state , seen in the chain through steps. The notion suggested by this relation, part of the genesis of this article, is that we may study the limit average occupation law of by investigating the limit of the pair describing local occupations.
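The decomposition of the empirical occupation into reversed local occupations can be checked on a small path. In this sketch the local occupations are the run lengths of the path, listed from the most recent sojourn backwards and divided by the path length; the function names and the example path are illustrative, and the padding by eventually constant sequences used in the text is omitted.

```python
from itertools import groupby


def reversed_local_occupations(path):
    """Run lengths of `path` in reverse chronological order, normalized by len(path).

    Returns (proportions, states), one entry per sojourn.
    """
    n = len(path)
    runs = [(state, len(list(group))) for state, group in groupby(path)]
    runs.reverse()  # the most recent sojourn comes first
    proportions = [length / n for _, length in runs]
    states = [state for state, _ in runs]
    return proportions, states


def occupation_from_clumps(proportions, states):
    """Sum the clump proportions over the sojourns spent in each state."""
    total = {}
    for p, s in zip(proportions, states):
        total[s] = total.get(s, 0.0) + p
    return total


path = ["a", "b", "b", "a", "a", "a", "c", "c"]  # hypothetical observation
proportions, states = reversed_local_occupations(path)
print(proportions, states)
print(occupation_from_clumps(proportions, states))
```

The final dictionary coincides with the usual empirical occupation of each state, since every time step belongs to exactly one sojourn.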
We now focus on a class of time-inhomogeneous Markov chains for which the limits of have succinct representation. Specifically, we consider inhomogeneous Markov chains with transition kernels , where is a generator matrix with no zero entries on the diagonal. A finite space case where was taken to have no zero entries at all was studied in [11]; see also [15], [5] for related developments.
In these chains, the clump lengths are typically growing with , unlike for homogeneous Markov chains. In particular, rather than an ergodic theorem, it was shown in [11] (cf. (1.3)) that the occupation laws converge weakly to a nontrivial distribution. Here, we consider a countable space generalization, allowing for reducibility and transient states, and formulate a characterization of these occupation limits through the reversed clumping device described above.
In the following statement, we say that a matrix is non-negative if all its entries are non-negative. Additionally, weak convergences here are in the sense of finite-dimensional distributions, the natural sense associated to the product space endowed with the product topology.
Theorem 2.10 (Time-inhomogeneous MC to MCcGEM).
Let be a generator matrix on without zero rows. Let and be such that both , and define . Let also be a stochastic vector and be a stationary distribution of so that entry-wise,
[TABLE]
Define kernels by
[TABLE]
and let be the inhomogeneous Markov chain with transition kernels and initial distribution . Define as above with respect to , and also define the generator matrix by
[TABLE]
Then, converges weakly to the homogeneous Markov chain with kernel and initial distribution . Also, for a possible sequence of , we have that $\mathbf{P}_{n}\mid\mathbf{Y}_{n}=\mathbf{y}$ converges weakly to a disordered GEM sequence with parameters . Therefore, the associated pairs converge weakly to with MCcGEM distribution with respect to .
Example 2.11.
In the context of Example 2.8, suppose has constant diagonal entries . Then, the local occupations of the inhomogeneous Markov chain would converge to a GEM distribution, not just conditionally in terms of a MCcGEM distribution.
We now characterize the limit occupation law of in a ‘stick-breaking’ form with respect to either a MCcGEM distribution, or a paired GEM distribution and homogeneous Markov chain. In the following, weak convergence of is with respect to the discrete topology on , the space of probability measures on .
Theorem 2.12 (Occupation laws to MCcGEM and stick-breaking measures).
Consider the setting and assumptions of Theorem 2.10. Observe that is a stationary distribution of , and let be the homogeneous and stationary Markov chain with kernel and initial distribution . Let be a GEM sequence independent of .
Then, , where
[TABLE]
In a sense, reversing the procedure, starting from the stick-breaking process , we may identify it as the limit of the occupation measure of a matched time-inhomogeneous Markov chain, almost a corollary of Theorem 2.12.
Theorem 2.13 (Stick-breaking measures to Occupation laws).
Let and let be a GEM sequence. Let also be a stochastic matrix without absorbing states and with stationary distribution . Suppose is an independent homogeneous Markov chain with kernel starting from .
Then,
[TABLE]
where is the occupation law defined with respect to an inhomogeneous Markov chain , as in the setting of Theorem 2.10, with respect to generator matrix , starting from any distribution satisfying entry-wise. Here, and are given by $\tilde{G}_{ij}^{\prime}=\big(\mu_{j}/\mu_{i}\big)\tilde{G}_{j,i}\,\mathbbm{1}(\mu_{i}\neq 0)$ where , and .
In the next two subsections, we discuss remarks on Theorems 2.10 and 2.12, and a case when the random measure is a Dirichlet process.
2.2.1. Remarks
We now make several comments on Theorems 2.10 and 2.12.
1. Although we have specified that has no zero rows in Theorems 2.10 and 2.12, and therefore no absorbing states for , one can extend some of the statements trivially to the case when there are absorbing states. In particular, when the limit is the unit point mass at an absorbing state of , we have and . Then, the state is also an absorbing state for the inhomogeneous Markov chain , reached in finite time a.s. starting from . Also, the chain , starting from , is the constant sequence of ’s. In addition, the limit of is , and tends to a.s. We conclude that converges weakly to , a GEM with constant fractions . Moreover, the empirical distribution of the chain converges weakly to . We also observe that , and also both equal in distribution.
2. There is a degree of freedom in picking a pair . However, when specifying a MCcGEM distribution, each valid pair corresponds to the same generator matrix in this context. On the other hand, this family of pairs of a GEM distribution and Markov chain, indexed in , will have different joint distributions, although they all correspond to a single measure . We explore this notion in the case of Dirichlet processes in Subsection 2.2.2 below.
3. The convergence (2.9) is a condition on the structure of positive recurrent states of the homogeneous Markov chain run with kernel . Since the limit is a stationary distribution with respect to , the chain must have a positive recurrent state, and is positive only on such states. The initial distribution must be such that observation of a positive recurrent state occurs with probability 1.
In general, depends on when there is more than one irreducible class of positive recurrent states. We note, along with positive recurrent states, there may also be null recurrent and transient states associated with .
In the case that has a single class of positive recurrent states, then will be the unique stationary distribution associated with and will not depend on .
It could be that has an infinite number of null recurrent or transient states, in addition to positive recurrent states. But, the requirement that be stochastic means that the chain cannot visit a null recurrent state or remain indefinitely on transient states a.s. This reflects that the limit of corresponds to the long time average occupations of states in .
4. Any null recurrent or transient state of the chain run with corresponds to a zero row of or in other words an absorbing state for the chains and . However, such absorbing states are never visited by : The initial distribution is a stationary distribution of , which vanishes on these states. Moreover, as is also a stationary distribution of , the chain can only move on the positive recurrent states of , the states .
Similarly, starting from , the chain moves only on states , given that when either or and .
Also, we comment that the chain run with is a form of time-reversal of with respect to stationary distribution , reflecting the reverse chronological construction of the sequences.
2.2.2. Dirichlet process limits
In a particular case of Theorem 2.12, we observe that we may recover Dirichlet processes. Suppose for all . When has constant rows equal to , the Markov chain has transition kernel , and therefore is an iid sequence with common distribution . Then, , formed from a GEM sequence and an independent sequence of iid random variables , is the ‘stick-breaking’ representation of a Dirichlet process with parameters and measure on the discrete space (cf. [45]). Specifically, as noted in the introduction, when is finite we have that is a Dirichlet distribution with parameters . (cf. [12], [26]).
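The stick-breaking representation described above can be illustrated numerically. The following is a minimal sketch, assuming a finite state space, a GEM sequence built from iid Beta(1, θ) fractions, and truncation after finitely many sticks. The names `theta`, `mu`, `n_sticks` and the truncation rule are illustrative assumptions and do not come from the paper.

```python
import random


def gem_weights(theta, n_sticks, rng):
    """Truncated GEM(theta): P_j = X_j * prod_{i<j} (1 - X_i), X_i iid Beta(1, theta)."""
    weights, remaining = [], 1.0
    for _ in range(n_sticks):
        x = rng.betavariate(1.0, theta)
        weights.append(remaining * x)
        remaining *= 1.0 - x
    return weights, remaining  # remaining = mass not yet broken off


def stick_breaking_measure(theta, mu, n_sticks, rng):
    """Truncated random measure sum_j P_j * delta_{T_j}, with T_j iid from mu."""
    weights, remaining = gem_weights(theta, n_sticks, rng)
    states = list(range(len(mu)))
    nu = [0.0] * len(mu)
    for p in weights:
        nu[rng.choices(states, weights=mu)[0]] += p
    # Assumption: the leftover mass is placed at one additional iid location.
    nu[rng.choices(states, weights=mu)[0]] += remaining
    return nu
```

Averaging `nu[k]` over many samples should approach `mu[k]`, which is consistent with the mean of a Dirichlet distribution.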
Moreover, since the distribution of is determined by , there is a degree of freedom in specifying via a pair . Write in two forms: (1) where and is stochastic with constant rows , and also (2) where , , and is stochastic. Then again, and via Theorem 2.12, we recover a different stick-breaking representation, , of the Dirichlet process with parameters and , in terms of GEM sequence and an independent homogeneous Markov chain with and kernel .
Here, is the weighted average of and . Since no longer has constant rows, no longer consists of iid variables. The chain is, in a sense, a more or less ‘sticky’ version of an iid sequence depending on the weight of in the weighted average relation for .
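The degree of freedom in choosing a pair can be checked with exact arithmetic. The sketch below assumes that a pair (θ, Q) determines the generator through G = θ(Q − I), so that a second pair is obtained from Q' = I + G/θ'. This reading of the stripped symbols, and the function names, are assumptions made for illustration.

```python
from fractions import Fraction


def generator(theta, Q):
    """G = theta * (Q - I); assumed relation between a pair (theta, Q) and G."""
    n = len(Q)
    return [[theta * (Q[i][j] - (1 if i == j else 0)) for j in range(n)]
            for i in range(n)]


def kernel_for(G, theta_new):
    """Q' = I + G / theta_new, so that G = theta_new * (Q' - I)."""
    n = len(G)
    return [[(1 if i == j else 0) + G[i][j] / theta_new for j in range(n)]
            for i in range(n)]


def is_stochastic(Q):
    """True when every row is non-negative and sums to one."""
    return all(min(row) >= 0 and sum(row) == 1 for row in Q)


mu = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
Q = [mu[:] for _ in mu]                # constant rows equal to mu
G = generator(Fraction(2), Q)
Q_sticky = kernel_for(G, Fraction(4))  # equal-weight average of Q and I
```

With these numbers `Q_sticky` places extra mass on the diagonal, so the associated chain holds its current state longer than an iid sequence would. Not every `theta_new` is valid: if it is too small, `kernel_for` returns a matrix with negative diagonal entries.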
2.3. Self-similarity of the occupation laws
At this point, it is natural to ask for other ways to understand the laws in Theorem 2.12. Consider the general random measure
[TABLE]
where is a self-similar RAM composed of fractions , and is an independent homogeneous Markov chain with transition kernel and initial distribution , assigning zero probability to any transient state of . We remark that reduces to the measure in Theorem 2.12 when GEM and is a stationary vector of . We first discuss an example.
Example 2.14.
As we have noted earlier, if GEM and is an independent sequence of iid variables with distribution , the measure is the ‘stick-breaking’ representation of the Dirichlet process with parameters and measure on . Following [45], a self-similarity relation can be deduced:
[TABLE]
where is another random measure, and , and are independent. From such an equation, the Dirichlet process characterization of with parameters and measure on follows from classical considerations. Moreover, this relation is central in calculation of a posterior distribution, given say , when is thought of as a law on priors. See also the recent work [32] and [44] on related integral characterizations.
We now define a more general notion of self-similarity. This notion is well known (cf. [23] among other references). With respect to a measurable space , let be the space of probability measures on . Let be the smallest σ-field generated by sets of the form $\big\{\{\chi:\chi(A)<r\}: A\in\mathscr{B}_{\mathscr{A}},\ r\in[0,1]\big\}$.
Definition 2.15 (Self-similar random measure).
We say that the law of a random distribution on is self-similar with respect to if it satisfies
[TABLE]
where is a -valued random variable, is a random distribution on , and is a random measure with the same distribution as and independent of , defined on the space .
The key is that such self-similarity may uniquely identify a distribution. The following is part of Lemma 3.3 in [45]; see also [23] for more involved statements. For the convenience of the reader, a proof is given in Subsection 3.6.
Lemma 2.16.
There exists a unique in law self-similar random measure on with respect to when .
We now state that defined in (2.13) is self-similar in a certain way. Let be the iid fractions from which is constructed. For each recurrent state of , let be a Markov chain with transition kernel and initial value , independent of and . Define the finite cycle length and associated clumped residual fraction,
[TABLE]
Set
[TABLE]
Theorem 2.17 (Type of self-similarity).
The law of uniquely satisfies the following: Marginally, and, for each recurrent state of ,
[TABLE]
where is a random measure with the same law as , such that and are independent.
If is thought of as a distribution on priors, the notion of a posterior distribution given a cycle of data might be considered from the self-similarity (2.15). However, we remark that such a computation does not seem as tractable as in the case is a Dirichlet process (cf. [45]).
One might ask what happens when starting from a transient state . In this case, there is a positive chance that one will not return to . As above, one may write down a first ‘cycle’ decomposition but, because may not be finite, the decomposition does not immediately lead to a ‘self-similarity’ equation as in (2.15). However, one might consider a stick-breaking construction, on a different probability space, which does lead to a ‘self-similarity’ equation. Indeed, following the discussion after Theorem 2.4, consider for transient states an iid sequence of pairs with common distribution $(X^{i},\eta^{i})\mid T_{1}=i$, and form a stick-breaking construction, which after an exercise is seen to be equivalent-in-distribution to :
[TABLE]
Then, where , and and are independent.
2.4. Moments of the occupation laws
We first recall Theorems 1.3 and 1.4 in [11]: Suppose is a generator matrix on finite state space with no 0 entries. By identification through its moments, the limiting occupation random variable (cf. (2.12)) of an inhomogeneous Markov chain with kernels of the form was found: Let be the whole numbers. For and , we have
[TABLE]
where is the unique stochastic eigenvector of , and is the set of distinct permutations of the list of integers consisting of many 1’s, many 2’s, up through many ’s.
In particular, when can be written where and is the stochastic matrix with constant rows , the expectation reduces to the moments of the Dirichlet distribution with parameters : where is the Pochhammer symbol, that is a rising factorial. However, when is not of this form and , one can see that the moments may not describe a Dirichlet distribution.
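The Dirichlet moment formula mentioned above can be evaluated exactly with rising factorials. This is a small sketch, assuming the standard form E[∏ X_i^{n_i}] = ∏ (θμ_i)_{n_i} / (θ)_{n} with n = Σ n_i for a Dirichlet vector with parameters θμ. The function names are illustrative.

```python
from fractions import Fraction


def rising_factorial(x, n):
    """Pochhammer symbol (x)_n = x (x + 1) ... (x + n - 1), with (x)_0 = 1."""
    result = Fraction(1)
    for k in range(n):
        result *= x + k
    return result


def dirichlet_moment(theta, mu, powers):
    """Mixed moment of a Dirichlet vector with parameters theta * mu."""
    numerator = Fraction(1)
    for m, n in zip(mu, powers):
        numerator *= rising_factorial(theta * m, n)
    return numerator / rising_factorial(theta, sum(powers))


theta = Fraction(2)
mu = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
second_moment = dirichlet_moment(theta, mu, [2, 0, 0])
```

The first-order moments returned by `dirichlet_moment` recover `mu` itself, which matches the role of the stochastic eigenvector in the moment formula.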
In this context, we now give some further descriptions of these laws. Observe that the matrix $(I-G/j)^{-1}$ for is a resolvent operator with respect to the transition function of a continuous time Markov chain with generator . In particular, it is standard to write
[TABLE]
As a consequence, $\widetilde{K}_{j}:=(I-G/j)^{-1}$ itself is a stochastic kernel on .
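The stochasticity of the resolvent kernel can be checked on a small example. The sketch below inverts I − G/j exactly for a hypothetical 2×2 generator; the matrix `G_example` and the helper names are illustrative assumptions.

```python
from fractions import Fraction


def invert(matrix):
    """Exact Gauss-Jordan inverse; raises StopIteration if the matrix is singular."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = 1 / aug[col][col]
        aug[col] = [v * scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def resolvent_kernel(G, j):
    """(I - G/j)^{-1} for a generator matrix G and a positive integer j."""
    n = len(G)
    return invert([[Fraction(int(r == c)) - Fraction(G[r][c]) / j for c in range(n)]
                   for r in range(n)])


G_example = [[-1, 1], [2, -2]]  # hypothetical generator: rows sum to zero
K1 = resolvent_kernel(G_example, 1)
```

For this generator, `K1` has non-negative entries and unit row sums, and the same holds for larger `j`, where the kernel moves closer to the identity.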
In the Dirichlet case , where and each row of is the stationary measure , one can see by calculating via the backward equation and that
[TABLE]
More generally, let be the inhomogeneous Markov chain with initial distribution and transition kernels for .
We first observe a type of ‘duality’ relation between the moments of and .
Theorem 2.18 (Recasting moments I).
Recall the setting of [11] given above. Then and the measure with respect to is also the occupation law with respect to ,
[TABLE]
Moreover, the moments may be expressed in terms of ,
[TABLE]
and, in particular, .
Alternatively, we now recast the moment result (2.16) in an algebraic form where it can be more easily exploited. Let be the minimal polynomial of and be the polynomial such that . Define, for ,
[TABLE]
Theorem 2.19 (Recasting moments II).
We have is the matrix with constant rows , and for . As a consequence, for with and fixed constant ,
[TABLE]
One can now recover the moments of the marginals.
Corollary 2.20 (Marginals).
Let be the non-zero roots of , all of which are non-zero eigenvalues of . Let also be the zeros of considered as a function of . Then,
[TABLE]
Interestingly, when and are real and pairwise ordered , we recognize these marginal moments as the product of the th order moments of independent Beta variables for .
In the Dirichlet case, when is of the form where and is stochastic with constant rows , we have and . This corresponds to and , the th order moments of a Beta variable or equivalently the th marginal of a Dirichlet variable with parameters .
However, in general, and need not be sets of real numbers, and (2.21) gives the moments of beta products in a sense with complex parameters.
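In the real-parameter case, the product-of-Beta interpretation can be made concrete. The sketch below assumes the standard parametrization, in which a Beta(a, b) variable has n-th moment (a)_n / (a + b)_n, and multiplies such moments over independent factors. How the pairs (a, b) arise from the roots discussed above is not reproduced here, so the parameter pairs in the usage are hypothetical.

```python
from fractions import Fraction


def rising_factorial(x, n):
    """Pochhammer symbol (x)_n = x (x + 1) ... (x + n - 1), with (x)_0 = 1."""
    result = Fraction(1)
    for k in range(n):
        result *= x + k
    return result


def beta_moment(a, b, n):
    """n-th moment of a Beta(a, b) variable: (a)_n / (a + b)_n."""
    return rising_factorial(a, n) / rising_factorial(a + b, n)


def beta_product_moment(params, n):
    """n-th moment of a product of independent Beta(a, b) variables."""
    result = Fraction(1)
    for a, b in params:
        result *= beta_moment(a, b, n)
    return result
```

As a usage example, `beta_product_moment([(1, 1), (2, 1)], 1)` multiplies the means 1/2 and 2/3. A single factor with parameters (θμ_k, θ(1 − μ_k)) gives the Dirichlet marginal case.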
The marginal density function of can be written in terms of Meijer G-functions, typically denoted $G^{M,N}_{P,Q}\left(\left.\begin{array}{c}\vec{a}\\ \vec{b}\end{array}\right|z\right)$ where and are non-negative integers, , and . Given and , is given by
[TABLE]
The class of Meijer G-functions includes generalized hypergeometric functions, among others. For a thorough review of Meijer G-functions, their specification, and connection to Beta products via the Mellin transform, see [34]. See also [13] for a discussion of the distributional properties of the product of two Beta variables with complex parameters with an application to risk theory.
3. Proofs
We first note a standard algebraic identity which leads to useful formulas for RAMs. Recall our conventions specified at the beginning of section 2.
Lemma 3.1.
For any sequence of numbers and integer , we have
[TABLE]
Proof. We proceed by an induction. Equation (3.1) is trivially true for : . If it is true for , then the left-hand side of (3.1) equals
[TABLE]
Proposition 3.2.
Consider a distribution on and factors with
[TABLE]
Then, for .
In particular, if is a RAM constructed from , for , we have
[TABLE]
Proof.
Part (I) follows from (3.1) by an induction: Trivially, . Suppose for and so, by (3.1), we have . Then, .
For Part (II), the lines in (3.2) follow from Part (I) and (3.1). ∎
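The relation between a distribution and its residual fractions can be checked with exact arithmetic. The sketch below assumes the usual conventions P_j = X_j ∏_{i<j}(1 − X_i) and X_j = P_j / (1 − Σ_{i<j} P_i), together with the telescoping identity 1 − Σ_{j≤n} P_j = ∏_{j≤n}(1 − X_j). These formulas are our reading of the stripped displays, not a quotation of them.

```python
from fractions import Fraction


def ram_from_fractions(xs):
    """Build P_j = X_j * prod_{i<j} (1 - X_i) from residual fractions X_j."""
    ps, remaining = [], 1
    for x in xs:
        ps.append(remaining * x)
        remaining *= 1 - x
    return ps


def fractions_from_ram(ps):
    """Recover X_j = P_j / (1 - sum_{i<j} P_i); requires mass left at each step."""
    xs, used = [], 0
    for p in ps:
        xs.append(p / (1 - used))
        used += p
    return xs


xs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
ps = ram_from_fractions(xs)
```

Here the mass left after three sticks, `1 - sum(ps)`, equals the product of the three factors 1 − X_j, and `fractions_from_ram(ps)` returns the original fractions.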
3.1. Proof of Theorem 2.4: Clumped RAMs
Let be a RAM, and let be the independent proportions from which is constructed. From Proposition 3.2, for , we have .
Let be an increasing sequence in with and . Define new proportions from , using Proposition 3.2 again: For ,
[TABLE]
Recall, for , that when and otherwise, and .
We now proceed to the proofs of Parts (1)-(4).
3.1.1. Proof of Part (1)
We now verify that is a RAM with respect to fractions : Let . For , noting (3.6), write
[TABLE]
For , note and . Then, .
Since is composed of independent variables, so is . Hence, as , by definition, is a RAM constructed from independent proportions . ∎
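The clumping operation verified in Part (1) can be sketched as follows. The code assumes 0-based block boundaries `boundaries[k-1] <= i < boundaries[k]` and the clumped fraction 1 − ∏(1 − X_i) over each block. The index convention and the function names are illustrative assumptions and may differ from the paper's.

```python
from fractions import Fraction


def ram_from_fractions(xs):
    """Build P_j = X_j * prod_{i<j} (1 - X_i) from residual fractions X_j."""
    ps, remaining = [], 1
    for x in xs:
        ps.append(remaining * x)
        remaining *= 1 - x
    return ps


def clump(ps, boundaries):
    """Add up the proportions inside each block of consecutive indices."""
    return [sum(ps[a:b]) for a, b in zip(boundaries, boundaries[1:])]


def clumped_fractions(xs, boundaries):
    """Fraction attached to a block: 1 - prod_{i in block} (1 - X_i)."""
    out = []
    for a, b in zip(boundaries, boundaries[1:]):
        kept = 1
        for x in xs[a:b]:
            kept *= 1 - x
        out.append(1 - kept)
    return out


xs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)]
boundaries = [0, 2, 4]
clumped = clump(ram_from_fractions(xs), boundaries)
```

Rebuilding a RAM from `clumped_fractions(xs, boundaries)` gives the same proportions as `clumped`. Because each clumped fraction depends only on the fractions in its own block, independence of the original fractions carries over.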
3.1.2. Proof of Part (2)
Let be a possible sequence for in . Define . Then, is either non-repeating and , or is non-repeating until reaching a finite time , after which the sequence is constant.
For , the event that for means the chain starts in , staying there until time , when it switches to , remaining there until time , and so on up to time when it moves into . Write for that
[TABLE]
Suppose . Then, is an absorbing state of and, for , we have . Define and write for that
[TABLE]
We conclude therefore that is a Markov chain with kernel . ∎
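The passage from a chain to its sequence of visited states can be sketched in two pieces: a run-length encoding of a path, and the kernel of the resulting jump chain. The kernel formula used below, Q_ij / (1 − Q_ii) off the diagonal with absorbing states held fixed, is the standard jump-chain kernel and is assumed here to match the one in Part (2). The function names are illustrative.

```python
from fractions import Fraction


def sojourns(path):
    """Run-length encode a path into (visited states, sojourn lengths)."""
    states, lengths = [], []
    for s in path:
        if states and states[-1] == s:
            lengths[-1] += 1
        else:
            states.append(s)
            lengths.append(1)
    return states, lengths


def jump_kernel(Q):
    """Kernel of the visited-state chain; absorbing states stay put."""
    n = len(Q)
    K = [[0] * n for _ in range(n)]
    for i in range(n):
        stay = Q[i][i]
        if stay == 1:
            K[i][i] = 1  # absorbing state of Q: the sequence becomes constant
        else:
            for j in range(n):
                if j != i:
                    K[i][j] = Q[i][j] / (1 - stay)
    return K


Q = [[Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
     [0, 1, 0],
     [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]]
K = jump_kernel(Q)
```

For instance, `sojourns([0, 0, 1, 1, 1, 2])` separates the path into the states visited and the lengths of the stays, which are the two ingredients the clumping construction uses.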
3.1.3. Proof of Part (3)
Recall the definitions of the increasing random sequences and with (cf. (2.3)), and and . For each realization, and are functions of the Markov sequence . Therefore, conditional on given the possible trajectory with respect to , it follows immediately from the proved Part (1) that $\mathbf{P}^{\mathbf{V}}\mid\mathbf{T}=\mathbf{t}$ and $\mathbf{P}^{\mathbf{W}}\mid\mathbf{T}=\mathbf{t}$ are RAMs. ∎
3.1.4. Proof of Part (4)
If is a RAM, we have a.s. or a.s. respectively. Hence, in the two situations, we need only show the associated fractions or are conditionally independent or iid to deduce, respectively, that $\mathbf{P}^{\mathbf{V}}\mid\mathbf{Y}=\mathbf{y}$ is a RAM or $\mathbf{P}^{\mathbf{W}}\mid T_{1}=t_{1}$ is a self-similar RAM. We consider first the claim for , before discussing the statement for at the end.
Let be a possible sequence with respect to , and associate to the time as in the proof of part (2). With respect to fixed times for , noting (3.7), we have for that
[TABLE]
Suppose , and define . For , noting the calculation after (3.7), write
[TABLE]
Recall (3.6), and consider the variables where
[TABLE]
When is composed of iid variables, that is, is a self-similar RAM, we will argue now that the fractions $\mathbf{X}^{\mathbf{V}}\mid\mathbf{Y}=\mathbf{y}$ form a conditionally independent sequence, and therefore $\mathbf{P}^{\mathbf{V}}\mid\mathbf{Y}=\mathbf{y}$ is a RAM. We split into subcases, versus .
When , let , and . Write
[TABLE]
Relative to , define the sequence where and for , which marks the first times when changes states. In particular, on the event $\{V_{i+1}-V_{i}=l_{i},\ 1\leq i\leq n\}$, we have for . Given this event, from (3.6), the fractions satisfy for and are independent, no longer depending on . The last display (3.13), noting (3.1.4), equals
[TABLE]
in factored form. Therefore, the fractions are conditionally independent as desired and $\mathbf{P}^{\mathbf{V}}\mid\mathbf{Y}=\mathbf{y}$ is a RAM in the case .
When , note that the collection is a deterministic sequence of 1s. Thus, we need only show that the proportions are independent. Define and for , write that
[TABLE]
Define for the sequence as before, and note . One derives similarly, noting the calculation after (3.1.4), that the last display (3.15) equals
[TABLE]
in factored form. Therefore, the fractions are conditionally independent as desired and $\mathbf{P}^{\mathbf{V}}\mid\mathbf{Y}=\mathbf{y}$ is a RAM also in the case .
We now aim to show when is a self-similar RAM and is a recurrent state for that is a self-similar RAM. As is a recurrent state with respect to , almost surely the sequence does not take on the value . Consider the variables where
[TABLE]
Then, noting (3.6), almost surely, .
Following the above argument, with respect to when , we arrive at the equation
[TABLE]
But, given , the variables are iid cycle lengths of the Markov chain. Hence, the last display equals
[TABLE]
indicating the fractions are conditionally iid, and therefore $\mathbf{P}^{\mathbf{W}}\mid T_{1}=t_{1}$ is a self-similar RAM. ∎
3.2. Proof of Theorem 2.7: GEM to MCcGEM
Let be a GEM sequence, with respect to corresponding iid Beta proportions . Also, let be an independent Markov chain on , starting from distribution , with homogeneous kernel .
In Part (2) of Theorem 2.4, we showed that the associated sequence is a Markov chain with transition kernel on such that
[TABLE]
By inspection, the kernel , in the definition of the MCcGEM distribution (2.7), where .
Recall now the switch times with respect to the chain (cf. (2.3)). In Part (4) of Theorem 2.4, as is a self-similar RAM, we proved that , conditional on , is a RAM. In particular, we showed that the associated fractions , given , are independent variables. Hence, to identify the joint distribution of , we need only find the conditional distribution of each fraction $X^{V}_{j}\mid\mathbf{Y}$, for .
To this end, let be a possible sequence for . Associate to as before. Recall from (3.12) that for . Write, for and ,
[TABLE]
Note now, if is a Beta random variable, then E. Then, by the independence of and , noting from (3.1.4) that , the above display equals
[TABLE]
Thus, we see that $X^{V}_{j}\mid\mathbf{Y}=\mathbf{y}$ is a Beta random variable when .
When , recall that is an absorbing state, and so and . Thus Beta Beta.
Then, for all , we see that $X^{V}_{j}\mid\mathbf{Y}=\mathbf{y}$ is a Beta random variable. Hence $\mathbf{P}^{\mathbf{V}}\mid\mathbf{Y}=\mathbf{y}$ is a disordered GEM with parameters for . Therefore, we conclude that has a MCcGEM distribution with respect to . ∎
3.3. Proof of Theorem 2.10: Time inhomogeneous MC to MCcGEM
We first specify certain asymptotics which will be helpful, before going to the main body of the proof in Subsection 3.3.1.
Lemma 3.3.
For and integers , let
[TABLE]
Then, for and integers , we have
[TABLE]
Proof.
Write
[TABLE]
By Stirling’s approximation, for , we have as , from which the desired asymptotics follow immediately. ∎
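The kind of asymptotics used in this proof can be checked numerically. The exact quantities in Lemma 3.3 are not reproduced here; the sketch only illustrates the standard consequence of Stirling's approximation that Γ(n + a)/Γ(n) behaves like n^a for large n, which we assume is the estimate being invoked. The function name is illustrative.

```python
import math


def gamma_ratio(n, a):
    """Gamma(n + a) / Gamma(n), computed through log-gamma for numerical stability."""
    return math.exp(math.lgamma(n + a) - math.lgamma(n))


# Relative error of the approximation Gamma(n + a) / Gamma(n) ~ n**a.
n, a = 10**6, 0.5
relative_error = gamma_ratio(n, a) / n**a - 1.0
```

The relative error shrinks at rate 1/n, so products of factors of the form (1 − a/j) over long ranges can be replaced by powers of the endpoints at the cost of a vanishing correction.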
Proposition 3.4.
Let be an integer. Let also , , and be collections of positive numbers such that for . Then,
[TABLE]
Proof.
The argument follows by inputting the asymptotics in Lemma 3.3. We show only the case , as the extension to is straightforward.
Again, by Stirling’s approximation, for each , . Then, for and all large , we have
[TABLE]
Hence, for with , and sufficiently large , we estimate
[TABLE]
Now, by the monotonicity of , we have for that is between the integrals and . We may compute
[TABLE]
Then, inserting into (3.3), the proposition follows for . ∎
We now show a form of ‘weak ergodicity’ for the Markov chain .
Lemma 3.5.
For a generator matrix , let , and be an integer, such that and are non-negative kernels on . Recall that for (cf. (2.10)). Let be a stochastic vector and be a stationary distribution for such that entry-wise. Then, as , both (a) , and (b) $(\mu^{n})^{t}Q\rightarrow\mu^{t}$, hold entry-wise.
Proof.
We separate into four steps.
Step 1. Fix an integer and write the stochastic matrix,
[TABLE]
as a polynomial in with positive coefficients.
Step 2. We now show that any fixed degree coefficient of the polynomial vanishes as . For each , denote the th coefficient of by . By Lemma 3.3, as . Also, as by Lemma 3.3, we have for that
[TABLE]
Step 3. For each , let denote the vector in with a 1 in the entry corresponding to state and 0’s elsewhere. Since is a stochastic kernel, observe for each and that
[TABLE]
Also, as is a stationary eigenvector of , note that is also a stationary eigenvector of . Recall that as entry-wise, and . Hence, as is a polynomial in .
With these observations, for each and positive integers and , we may bound
[TABLE]
The last display converges by the calculation in Step 2 to $\max_{r>R}\big|(\mu^{m}-\mu)^{t}Q^{r}e_{l}\big|$, as , and in turn vanishes as . Hence, the first limit follows.
Step 4. Finally, by Fatou’s lemma, the proved first limit (a), and that is a stationary vector of , we have for each that
[TABLE]
Now, suppose for a particular that $\limsup_{n\rightarrow\infty}(\mu^{n})^{t}Qe_{k}=L>\mu_{k}$. Then, as $(\mu^{n})^{t}Q$ is a stochastic vector, we would have for each that
[TABLE]
But, as is a stochastic vector and noting (3.17), we have by Fatou’s lemma again that the last display is larger than , a contradiction, and the second limit (b) holds. ∎
3.3.1. Completion of the proof of Theorem 2.10
We will argue in a few steps.
Step 1. Recall the definition of kernel (cf. (2.11)). We now argue that is a generator matrix: As is a stationary vector of and , we have is the zero vector. Since is a generator matrix, we have for , and . Moreover,
[TABLE]
Step 2. Recall the Markov chain , with transition kernels (cf. (2.10)), starting from . Recall the associated variable and sequence .
Now, for define
[TABLE]
The variables are the associated fractions to the distribution on and, by Proposition 3.2, for ,
[TABLE]
For , also define
[TABLE]
In terms of the switching times , and the first time that the chain switches after time , we have , for , and for . Recall also that for . In words, are the times before time at which the chain switches states when considered in reverse order, and are the lengths of the associated sojourns in the figure below.
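[Figure: time line from 1 to $n$ marking the switch times $S_{3}$, $S_{2}$, $S_{1}$, $S_{0}=n$ in reverse order, together with the sojourn lengths $\tau_{n,1}$, $\tau_{n,2}$, $\tau_{n,3}$.]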
Step 3. Recall the sequence given in (2.8), where for and for . We now aim to compute the finite dimensional distributions of or equivalently of . To this end, fix the integer , and consider numbers such that , for , are all integers. Set also and recall .
Note from (3.19) and (3.20) that
[TABLE]
Then, with respect to a possible sequence , we have
[TABLE]
Note the computation for and ,
[TABLE]
Recall also that . Since , we observe
[TABLE]
Then, (3.21) equals
[TABLE]
Step 4. We now sum the display (3.22) over all appropriate values of such that for , where we recall is the time the chain switches after time . Then, we have from (3.20) that
[TABLE]
Moreover, also from (3.20), we have diverges to infinity as .
Recall and a.s. Then, with equation (3.23) in hand,
[TABLE]
Step 5. From (3.20), the sum index diverges to infinity as . Also, by Lemma 3.5, we have and $\lim_{s\rightarrow\infty}(\mu^{s})^{t}Qe_{y}=\mu_{y}$ for each . Therefore, as , we have
[TABLE]
Note that for each since by assumption has no zero rows. Thus, by Proposition 3.4, we have
[TABLE]
Hence, if for some , by bounding say , the limit (3.24) vanishes. Now, suppose that is such that for each . We may write the limit (3.24) as
[TABLE]
decomposed as a product of (a) the transition probability of the chain , with kernel (cf. (2.7)) and initial distribution , running through states , and of (b) the distribution functions of independent Beta random variables for . Hence, the finite dimensional distributional convergence of as is established. ∎
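The objects in this proof can be explored by simulation. The sketch below assumes that the time-inhomogeneous chain uses kernels of the form I + G/n, started at a time large enough for these to be non-negative, and it records the empirical occupation measure. The kernel form, the starting rule and the function name are assumptions made for illustration.

```python
import random


def occupation_measure(G, n_steps, start_state, rng):
    """Empirical occupation measure of a chain with assumed kernels I + G/n."""
    n_states = len(G)
    # Assumption: begin at a time n0 from which I + G/n has no negative entries.
    n0 = int(max(-G[i][i] for i in range(n_states))) + 1
    state = start_state
    counts = [0] * n_states
    for n in range(n0, n0 + n_steps):
        counts[state] += 1
        row = [(1.0 if j == state else 0.0) + G[state][j] / n
               for j in range(n_states)]
        state = rng.choices(range(n_states), weights=row)[0]
    return [c / n_steps for c in counts]
```

Because the jump probabilities decay like 1/n, sojourns grow longer over time and the occupation measure remains random in the limit. Its average over many independent runs should nonetheless be close to the stationary vector of `G`.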
3.4. Proof of Theorem 2.12: Occupation laws to MCcGEM and stick-breaking measures
Consider the pairs , and in the setting of Theorems 2.10 and 2.12. These objects belong to . We now discuss the topology on this space and its relatives, before going to the proof of (2.12) in Subsection 3.4.2.
3.4.1. Topology
We endow the space with a standard product metric and σ-field, generated in terms of this metric, which yields the usual product σ-field built from the Borel σ-fields on copies of : For ,
[TABLE]
Consider now the metric on defined as follows: For ,
[TABLE]
The corresponding σ-field on , generated by , is the usual product σ-field formed from the Borel σ-fields on copies of and . Importantly, weak convergence of probability measures on translates to finite dimensional convergence of these laws. Moreover, is a complete, separable metric space.
Recall that is the collection of all probabilities on :
[TABLE]
Since
[TABLE]
is a measurable set in . We may endow with the restriction of the metric and the σ-field generated from the associated metric topology.
For a fixed point , the projection map , given by
[TABLE]
is measurable, and also continuous on the subset .
Now, denote the collection of probabilities on ,
[TABLE]
and endow it with the metric , and the associated Borel σ-field. Define by
[TABLE]
Then, is a continuous and therefore measurable function on : Indeed, if and belong to , and the finite dimensional convergence holds, for each , we have . The claim now follows since (1) , and (2) .
3.4.2. Proof of (2.12)
First, we verify that the pairs , and belong almost surely to . Clearly, surely lives in by construction. Also, and lie almost surely in since, by Theorem 2.10 and the assumptions of Theorem 2.12, we have that and are RAMs, and so .
Now, from the finite dimensional, or in other words weak, convergence of to in Theorem 2.10, we have $\nu_{n}=g\big((\mathbf{P}_{n},\mathbf{Y}_{n})\big)=g\circ f\big((\mathbf{P}_{n},\mathbf{Y}_{n})\big)$ converges weakly to $\nu=g\circ f\big((\mathbf{P}',\mathbf{Y}')\big)$ by the continuous mapping theorem, and so the left equality in (2.12) holds.
On the other hand, with respect to , define and as in the setting of Theorem 2.7. Recall that is a Markov chain with kernel and initial stationary distribution . Then, by Theorem 2.7, noting that , we have that has a MCcGEM distribution. Hence, . Since almost surely, by ‘unclumping’,
[TABLE]
we have $g\circ f\big((\mathbf{P}',\mathbf{Y}')\big)\stackrel{d}{=}g\circ f\big((\mathbf{P}^{+},\mathbf{T}')\big)$, and the right equality of (2.12) holds. ∎
3.5. Proof of Theorem 2.13: Stick-breaking measures to Occupation laws
The claim follows from Theorem 2.12 once we verify that a homogeneous Markov chain with kernel and a homogeneous Markov chain with kernel , each with initial distribution , are equivalent in distribution.
To this end, for any generator matrix and associated stationary distribution , we observe that when and are both positive:
[TABLE]
Since and , we conclude that when and are both positive.
Finally, as is a stationary distribution, is only positive on positive recurrent states and for each recurrence class of , either assigns zero weight to each state in that class or strictly positive weights to each state in that class. Hence, homogeneous Markov chains with kernels and , starting from , are equal in distribution. ∎
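The time-reversal idea behind this argument can be illustrated on a small kernel. The sketch assumes the usual reversal of a stochastic kernel with respect to a strictly positive stationary vector, with entries μ_j Q_ji / μ_i, and checks that reversing twice returns the original kernel. The formula is the standard one and is assumed to correspond to the construction used in the proof.

```python
from fractions import Fraction


def time_reversal(Q, mu):
    """Reversed kernel with entries mu[j] * Q[j][i] / mu[i]; needs mu > 0."""
    if any(m <= 0 for m in mu):
        raise ValueError("mu must be strictly positive on every state")
    n = len(Q)
    return [[mu[j] * Q[j][i] / mu[i] for j in range(n)] for i in range(n)]


# Hypothetical example: a deterministic 3-cycle, stationary under the uniform law.
uniform = [Fraction(1, 3)] * 3
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
reversed_cycle = time_reversal(cycle, uniform)
```

Here `reversed_cycle` runs the cycle backwards, and a second reversal restores `cycle`. States with zero stationary mass are excluded by the function, in line with the restriction to states where the stationary weights are positive.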
3.6. Proof of Theorem 2.17: Type of self-similarity
We first give a proof of Lemma 2.16, before going to the main argument in Subsection 3.6.2.
3.6.1. Proof of Lemma 2.16
Let be i.i.d. copies of , independent of , all on a common probability space.
Existence: Let . Since , we have a.s., and so $\big\langle X_{j}\prod_{i=1}^{j-1}(1-X_{i}): j\geq 1\big\rangle$ is a RAM. Hence, is a random probability measure on as . Moreover, (2.14) holds straightforwardly:
[TABLE]
where has the same law as and is independent of .
Uniqueness: Suppose and both satisfy the self-similarity equation (2.14). On a probability space, where , and are independent, define a sequence of measures: , and, for ,
[TABLE]
By construction, and are two sequences of identically distributed random measures distributed as and respectively.
We note again that a.s. as . Then, in terms of the variational norm ,
[TABLE]
which vanishes a.s. as . Hence, . ∎
3.6.2. Completion of the proof of Theorem 2.17
Recall our conventions at the beginning of Section 2 and that is a collection of iid variables, and is the homogeneous Markov chain with kernel and initial distribution supported on recurrent states. Let be the RAM constructed from . For each recurrent state of , let $\mathbf{T}^{i}=\mathbf{T}\mid T_{1}=i$ be the Markov chain with transition kernel and initial value . Recall the a.s. finite time , and variable
[TABLE]
Recall also .
We now rewrite the measure $\nu^{i}=\nu\mid T_{1}=i$ as follows:
[TABLE]
Then, by (3.25) and Proposition 3.2 for we have
[TABLE]
Hence, as is composed of iid variables, independent of and therefore , we see that
[TABLE]
Clearly, as the chain starts over again at location , .
Moreover, by conditioning on the value of and noting that and are independent, the sequences $\big\langle \frac{P_{j-1+W^{i}}}{1-X^{i}}: j\geq 1\big\rangle$ and are independent. Similarly, we see that the sum , which depends only on variables and indexed beyond the first cycle, is independent of the pair . In particular, the sum .
Hence, from these observations, (3.6.2) represents the sought after self-similarity equation (2.15).
Finally, a distribution satisfying (2.15) is unique by Lemma 2.16 since a.s. Also, by assumption, where is supported only on recurrent states. Therefore, as necessarily is a recurrent state, the distribution of the pair is also unique. ∎
3.7. Proof of Theorems 2.18, 2.19 and Corollary 2.20: Recasting moments I, II, and marginals
We prove these results in succession.
3.7.1. Proof of Theorem 2.18
First, since is a generator matrix with bounded entries and for large enough
[TABLE]
we verify that
Next, to show (2.17), we relate the occupation law of the Markov chain , with transition kernels , to the occupation law of the Markov chain , with kernels , through a Borel-Cantelli argument. In passing, we note this could be also accomplished via an analytic argument.
Define , for , and note has constant row sums of 1. Since does not have 0 entries and , there exists an such that is a non-negative matrix, and hence stochastic. Note
[TABLE]
Consider now an auxiliary sequence of independent Bernoulli variables by possibly enlarging the probability space. Define a process $\mathbf{Z}'\mid\mathbf{B}$ with $Z'_{1}\mid\mathbf{B}\sim\mu$ and
[TABLE]
Then, noting (3.27), marginally, is a Markov chain with initial distribution and transition kernel
[TABLE]
Now, by the Borel-Cantelli lemma, and so a.s. Conditional on the event that , the chain is a Markov chain with transition kernels . Also, since is irreducible in the setting of [11], the initial distribution does not matter in the calculation of the occupation law (cf. Remark 3 in Subsection 2.2.1). Hence, the occupation law with respect to is also and (2.17) holds: Indeed, for and interval for , we have
[TABLE]
where is an expression which vanishes uniformly in as .
Finally, (2.18) follows straightforwardly by gathering together terms. ∎
3.7.2. Proof of Theorem 2.19
We break the argument into steps.
Step 1. First, we show that , , and their quotients are all well-defined. A generator matrix can always be written as for some and a stochastic matrix . The eigenvalues of correspond with the eigenvalues of . Additionally, since has no zero entries, is irreducible. Therefore, the algebraic multiplicity of the eigenvalue [math] of is . Thus, with respect to the minimal polynomial of , , there exists a polynomial such that and .
Define
[TABLE]
Since the eigenvalues of the stochastic matrix are bounded by , the (complex) eigenvalues of satisfy . Hence, the eigenvalues of have non-positive real part. Since and , we obtain that is not an eigenvalue of and so for . Thus, is well-defined for .
Step 2. We now verify for that
[TABLE]
Write
[TABLE]
In particular, as , we have
[TABLE]
from which the desired identity follows.
Step 3: We now show that is the constant matrix with rows . Note that is well-defined in (2.19). Since row sums of vanish for , we see that has constant row sums of . Now, necessarily, as is the minimal polynomial of . Since is irreducible, we can conclude that is a matrix with rows given by multiples of the unique stochastic eigenvector associated to and eigenvalue [math]. However, since has row sums equal to , the claim follows.
Moreover, noting that for any , the moment identity (2.20) is now a direct consequence of these calculations. ∎
3.7.3. Proof of Corollary 2.20
Recall is a degree polynomial where . Then, noting (2.19), we see that is also a degree polynomial in with -free leading coefficient . In particular, is a degree polynomial in with leading coefficient for each .
Now, fix , and denote by and the roots of and respectively when considered as functions of . In the formula (2.20), to calculate , there is only one list in , namely one composed of ’s. Then,
[TABLE]
as desired. ∎
Acknowledgement. We thank J. Sethuraman for enjoyable conversations on Dirichlet processes. Part of this research was supported by ARO W911NF-14-1-0179, and a Simons Foundation Sabbatical grant.
- [1] Arratia, R., Barbour, A. D., Tavaré, S. (1999) The Poisson-Dirichlet distribution and the scale-invariant Poisson process. Combin. Probab. Comput. 8 407–416.
- [2] Arratia, R., Barbour, A. D., Tavaré, S. (2003) Logarithmic Combinatorial Structures: A Probabilistic Approach. European Mathematical Society, Zürich.
- [3] Berman, A., Plemmons, R. J. (1979) Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York.
- [4] Blackwell, D., MacQueen, J. B. (1973) Ferguson distributions via Pólya urn schemes. Ann. Stat. 1 353–355.
- [5] Bouguet, F., Cloez, B. (2018) Fluctuations of the empirical measure of freezing Markov chains. Elec. J. Probab. 23 1–31.
- [6] Bovier, A., den Hollander, F. (2015) Metastability: a potential-theoretic approach. Grundlehren der mathematischen Wissenschaften 351, Springer, Berlin.
- [7] Broderick, T., Jordan, M., Pitman, J. (2012) Beta processes, stick-breaking and power laws. Bayesian Anal. 7 439–475.
- [8] Crane, H. (2016) The ubiquitous Ewens sampling formula. Statist. Sci. 31 1–19.
