Stick-breaking processes, clumping, and Markov chain occupation laws

Zach Dietz; William Lippitt; Sunder Sethuraman

arXiv:1901.08135·math.PR·January 25, 2019

TL;DR

This paper explores the relationships between clumped residual allocation models, a broad class of stick-breaking processes including Dirichlet processes, and the occupation laws of certain Markov chains, revealing new connections and limit behaviors.

Contribution

It introduces an intermediate structure in RAMs involving clumping, linking stick-breaking processes to Markov chain occupation laws, and characterizes their limits in new settings.

Findings

01

Joint law of intermediate RAM and visited states expressed via disordered GEM sequence.

02

Identifies a class of stick-breaking processes as limits of empirical occupation measures.

03

Connects inhomogeneous Markov chain behavior with generalized stick-breaking processes.

Abstract

We consider the connections among ‘clumped’ residual allocation models (RAMs), a general class of stick-breaking processes including Dirichlet processes, and the occupation laws of certain discrete space time-inhomogeneous Markov chains related to simulated annealing and other applications. An intermediate structure is introduced in a given RAM, where proportions between successive indices in a list are added or clumped together to form another RAM. In particular, when the initial RAM is a Griffiths-Engen-McCloskey (GEM) sequence and the indices are given by the random times that an auxiliary Markov chain jumps away from its current state, the joint law of the intermediate RAM and the locations visited in the sojourns is given in terms of a ‘disordered’ GEM sequence, and an induced Markov chain. Through this joint law, we identify a large class of ‘stick breaking’ processes as the…

Equations

$$P_{n}=\Big(1-\sum_{j=1}^{n-1}P_{j}\Big)X_{n}=(1-X_{1})\cdots(1-X_{n-1})X_{n}\quad\text{for }n\geq 2;$$

$$D(\cdot;\theta,\mu)=\sum_{i=1}^{\infty}P_{i}\,\delta_{Z_{i}}(\cdot).$$

$$K_{n}=I+\frac{G}{n}$$

$$\nu=\lim_{n\to\infty}\Big\langle\frac{1}{n}\sum_{j=1}^{n}\delta_{T_{j}}(i):i\in\mathscr{X}\Big\rangle,$$

$$\nu(\cdot;\theta,\mu,Q)=\sum_{i=1}^{\infty}P_{i}\,\delta_{T_{i}^{\prime}}(\cdot),$$

$$\nu(\cdot)=\sum_{i=1}^{\infty}R_{i}\,\delta_{Y_{i}}(\cdot).$$

$$\nu_{n}(\cdot)=\frac{1}{n}\sum_{j=1}^{n}\delta_{T_{j}}(\cdot).$$

$$\nu_{n}(\cdot)=\sum_{j=1}^{N_{n}-1}P_{n,j}\,\delta_{Y_{n,j}}(\cdot)=\sum_{j=1}^{\infty}P_{n,j}\,\delta_{Y_{n,j}}(\cdot).$$

$$P_{1}=X_{1}\quad\text{and}\quad P_{j}=X_{j}\Big(1-\sum_{i=1}^{j-1}P_{i}\Big)\quad\text{for }j\geq 2.$$

$$\prod_{j=1}^{k}(1-a_{j})+\sum_{j=1}^{k}a_{j}\prod_{i=1}^{j-1}(1-a_{i})=1.$$

$$P^{u}_{j}=\begin{cases}\displaystyle\sum_{i=u_{j}}^{u_{j+1}-1}P_{i}&\text{if }u_{j}<\infty\\ 0&\text{if }u_{j}=\infty.\end{cases}$$

$$V_{j+1}=\inf\{v>V_{j}:T_{v}\neq T_{v-1}\}\quad\text{and}\quad W_{j+1}=\inf\{w>W_{j}:T_{w}=T_{1}\}.$$

$$X^{u}_{j}=\begin{cases}\displaystyle\sum_{i=u_{j}}^{u_{j+1}-1}X_{i}\prod_{l=u_{j}}^{i-1}(1-X_{l})=1-\prod_{i=u_{j}}^{u_{j+1}-1}(1-X_{i})&\text{if }u_{j}<\infty\\ 1&\text{if }u_{j}=\infty.\end{cases}$$

$$K(z,w)=\begin{cases}\dfrac{Q_{z,w}}{1-Q_{z,z}}&\text{for }z\neq w,\ Q_{z,z}\neq 1\\ 1&\text{for }z=w,\ Q_{z,z}=1\\ 0&\text{otherwise.}\end{cases}$$

$$K_{G}(w,z)=\begin{cases}\dfrac{G_{w,z}}{-G_{w,w}}&\text{if }w\neq z,\ G_{w,w}\neq 0\\ 1&\text{if }w=z,\ G_{w,w}=0\\ 0&\text{otherwise.}\end{cases}$$

$$E\big[1-X_{1}^{V}\mid V_{2}-V_{1}=m,\ V_{3}-V_{2}=n\big]$$

$$=\prod_{j=1}^{m}\frac{2+j}{3+j}=\frac{3}{3+m}$$

$$E\big[1-X_{2}^{V}\mid V_{2}-V_{1}=m,\ V_{3}-V_{2}=n\big]$$

$$E\big[(1-X_{1}^{V})(1-X_{2}^{V})\mid V_{2}-V_{1}=m,\ V_{3}-V_{2}=n\big]$$

$$\tau_{n,1}=n+1-V_{N_{n}-1},\quad\tau_{n,k}=V_{N_{n}-(k-1)}-V_{N_{n}-k},\quad\text{and}\quad\tau_{n,i}=0.$$

$$Y_{n,1}=T_{n}=T_{V_{N_{n}-1}},\quad Y_{n,k}=T_{V_{N_{n}-k}},\quad\text{and}\quad Y_{n,i}=T_{1},$$

$$\nu_{n}(l):=\frac{1}{n}\sum_{j=1}^{n}\delta_{T_{j}}(l)=\sum_{j=1}^{\infty}P_{n,j}\,\delta_{Y_{n,j}}(l).$$

$$\pi^{t}Q^{n}\to\mu^{t}\quad\text{as }n\to\infty.$$

$$K_{n}=I+\frac{G}{n}\,\mathbb{1}(n>M),$$

$$G_{ij}^{\prime}=\frac{\mu_{j}}{\mu_{i}}\,G_{ji}\,\mathbb{1}(\mu_{i}\neq 0).$$

$$\nu\stackrel{d}{=}\Big\langle\sum_{j=1}^{\infty}P_{j}^{\prime}\,\delta_{Y_{j}^{\prime}}(l):l\in\mathscr{X}\Big\rangle\stackrel{d}{=}\Big\langle\sum_{j=1}^{\infty}P_{j}^{+}\,\delta_{T_{j}^{\prime}}(l):l\in\mathscr{X}\Big\rangle.$$

$$\Big\langle\sum_{j=1}^{\infty}P_{j}^{+}\,\delta_{T_{j}^{\prime}}(l):l\in\mathscr{X}\Big\rangle\stackrel{d}{=}\nu,$$

$$\nu\stackrel{d}{=}\Big\langle\sum_{j=1}^{\infty}P_{j}\,\delta_{T_{j}}(l):l\in\mathscr{X}\Big\rangle,$$

$$\nu\stackrel{d}{=}X_{1}\,\delta_{T_{1}}+(1-X_{1})\,\tilde{\nu},$$

$$\chi(\cdot)\stackrel{d}{=}X\,\eta(\cdot)+(1-X)\,\tilde{\chi}(\cdot),$$

$$W^{i}:=\inf\{j>1:T_{j}^{i}=i\}\quad\text{and}\quad X^{i}:=\sum_{j=1}^{W^{i}-1}X_{j}\prod_{l=1}^{j-1}(1-X_{l}).$$

$$\eta^{i}:=\left(X^{i}\right)^{-1}\sum_{j=1}^{W^{i}-1}\Big[X_{j}\prod_{l=1}^{j-1}(1-X_{l})\Big]\delta_{T_{j}^{i}}\quad\text{and}\quad\nu^{i}:=\nu\,\big|\,T_{1}=i.$$

Full text

Stick-breaking processes, clumping, and Markov chain occupation laws

Zach Dietz, William Lippitt, Sunder Sethuraman

William Lippitt: Department of Mathematics, University of Arizona, Tucson, AZ 85721

Sunder Sethuraman: Department of Mathematics, University of Arizona, Tucson, AZ 85721

Abstract.

We consider the connections among ‘clumped’ residual allocation models (RAMs), a general class of stick-breaking processes including Dirichlet processes, and the occupation laws of certain discrete space time-inhomogeneous Markov chains related to simulated annealing and other applications. An intermediate structure is introduced in a given RAM, where proportions between successive indices in a list are added or clumped together to form another RAM. In particular, when the initial RAM is a Griffiths-Engen-McCloskey (GEM) sequence and the indices are given by the random times that an auxiliary Markov chain jumps away from its current state, the joint law of the intermediate RAM and the locations visited in the sojourns is given in terms of a ‘disordered’ GEM sequence, and an induced Markov chain. Through this joint law, we identify a large class of ‘stick breaking’ processes as the limits of empirical occupation measures for associated time-inhomogeneous Markov chains.

Key words and phrases:

residual allocation model, RAM, GEM, Dirichlet, inhomogeneous, Markov, stick breaking, occupation, empirical, clumping

2010 Mathematics Subject Classification:

60G57, 60E99, 60J10

1. Introduction and summary

In this article, we introduce an intermediate ‘clumped’ structure in residual allocation models of apportionment of a resource, such as Griffiths-Engen-McCloskey (GEM) models. Although this intermediate structure is perhaps of independent interest, through it we identify the empirical occupation law limits in a class of time-inhomogeneous discrete space Markov chains, associated with simulated annealing and other applications, as new types of stick-breaking processes built from Markovian samples, including Dirichlet processes. On the one hand, GEM models and Dirichlet processes have wide application in population genetics, ecology, combinatorial stochastic processes, and Bayesian nonparametric statistics; see books and surveys [8], [9], [18], [19], [27], [41] and references therein. On the other hand, the time-inhomogeneous Markov chains that we consider are stylized models of simulated annealing and Gibbs samplers or types of mRNA dynamics; see [5], [11], [15], [17], [25], [46]. In a sense, one purpose of the paper is to observe a perhaps unexpected connection between these a priori different objects.

We now discuss some of the relevant background on GEM and Dirichlet measures, and time-inhomogeneous Markov chains, before turning to an informal discussion of our results on the intermediate structure in GEM sequences and their connections with the occupation laws of the Markov chains.

1.1. GEM and Dirichlet measures

Consider the infinite-dimensional simplex $\Delta_{\infty}$ of all discrete (probability) distributions on ${\mathbb{N}}=\{1,2,\ldots\}$. A residual allocation model (RAM) is a distribution on $\Delta_{\infty}$, introduced in the 1940s [24] as a means to address problems of apportionment: Let $\{X_{n}\}_{n\geq 1}$ be independent $[0,1]$-valued random variables, called ‘residual fractions’. Consider the associated process $\langle P_{n}:n\geq 1\rangle\in[0,1]^{{\mathbb{N}}}$, given by $P_{1}=X_{1}$ and

$$P_{n}=\Big(1-\sum_{j=1}^{n-1}P_{j}\Big)X_{n}=(1-X_{1})\cdots(1-X_{n-1})X_{n}\quad\text{for }n\geq 2;$$

see Lemma 3.1 for the induction leading to the last equality. If $\sum_{n\geq 1}P_{n}\stackrel{a.s.}{=}1$, the distribution $\langle P_{n}:n\geq 1\rangle\in\Delta_{\infty}$ is the associated RAM. In general, $\langle P_{n}:n\geq 1\rangle$ need not sum to $1$ for a given realization. We note that a simple condition equivalent to $\sum_{n\geq 1}P_{n}\stackrel{a.s.}{=}1$ is that $\prod_{j=1}^{\infty}(1-X_{j})\stackrel{a.s.}{=}0$, which holds for nontrivial independent, identically distributed (iid) fractions (cf. Lemma 3.1).

When the fractions $\{X_{n}\}_{n\geq 1}$ are iid Beta $(1,\theta)$ random variables, the RAM is the well-known Griffiths-Engen-McCloskey GEM $(\theta)$ model. There have been many characterizations and studies of the GEM sequence and its variants in recent years. For instance, the GEM model is the unique RAM with iid fractions that is invariant in law under size-biased permutation. Also, the GEM sequence is the unique invariant measure of ‘split and merge’ dynamics. In addition, there are important connections with Poisson-Dirichlet models. See for instance, among others, [1], [2], [10], [14], [20], [28], [29], [30], [35], [38], [39], [40], [42], and references therein.
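As a concrete illustration of the construction above, the following is a minimal Python sketch that draws the first weights of a GEM $(\theta)$ sequence by stick breaking. The function name `gem_weights`, the truncation level `n_terms`, and the seed are illustrative assumptions, not part of the paper; the truncation leaves a small unassigned mass, returned separately.

```python
import random


def gem_weights(theta, n_terms, rng):
    """Return (weights, leftover) for the first n_terms of a GEM(theta) sequence.

    Uses P_j = X_j * prod_{i<j} (1 - X_i) with X_j iid Beta(1, theta).
    `leftover` is the mass prod_{i<=n_terms} (1 - X_i) not yet assigned.
    """
    weights = []
    remaining = 1.0  # unassigned mass before breaking the next stick
    for _ in range(n_terms):
        x = rng.betavariate(1.0, theta)  # residual fraction X_j
        weights.append(remaining * x)
        remaining *= 1.0 - x
    return weights, remaining


# Hypothetical parameter choices, for illustration only.
rng = random.Random(0)
P, leftover = gem_weights(theta=2.0, n_terms=200, rng=rng)
```

The weights and the leftover mass always add up to one, which is the finite form of the condition $\prod_{j}(1-X_{j})=0$ discussed above.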

Moreover, the GEM sequence is a fundamental building block of Dirichlet processes, which often serve as a measure on priors in Bayesian nonparametric statistics [18], [19]. With respect to a measurable space $(\mathscr{X},\mathscr{B})$, consider the space of probability measures $\mathbb{P}_{\mathscr{X}}$ endowed with the $\sigma$-field generated by the sets $\{P:P(A)<r\}$ for $A\in\mathscr{B}$ and $r>0$. We say that $D$ is a random probability sample from the Dirichlet process, with ‘parameters’ $\theta>0$ and probability measure $\mu$ on $\mathscr{X}$, if, for any finite partition $\{A_{i}\}_{i=1}^{m}$ of $\mathscr{X}$, the vector $\langle D(A_{1}),\ldots,D(A_{m})\rangle$ has the Dirichlet distribution with parameters $\langle\theta\mu(A_{i}):1\leq i\leq m\rangle$.

The ‘stick breaking’ representation of the Dirichlet process with parameters $(\theta,\mu)$ , in terms of a GEM $(\theta)$ sequence $\langle P_{i}:i\geq 1\rangle$ , and an independent sequence of iid random variables $\{Z_{i}\}_{i\geq 1}$ with common distribution $\mu$ , is given by

$$D(\cdot;\theta,\mu)=\sum_{i=1}^{\infty}P_{i}\,\delta_{Z_{i}}(\cdot).\tag{1.1}$$
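The stick-breaking representation can be sketched in Python on a finite state space. This is a truncated approximation under stated assumptions: states are labeled `0..k-1`, the sum is cut at `n_terms`, and the name `dirichlet_process_sample` is hypothetical.

```python
import random


def dirichlet_process_sample(theta, mu, n_terms, rng):
    """Truncated stick-breaking sample D = sum_i P_i * delta_{Z_i}.

    mu is a list of probabilities on states 0..k-1. Returns (mass, leftover),
    where mass[s] approximates D({s}) and leftover is the unassigned mass.
    """
    k = len(mu)
    mass = [0.0] * k
    remaining = 1.0
    for _ in range(n_terms):
        x = rng.betavariate(1.0, theta)           # GEM fraction X_i
        z = rng.choices(range(k), weights=mu)[0]  # location Z_i, iid with law mu
        mass[z] += remaining * x                  # add P_i to the atom at Z_i
        remaining *= 1.0 - x
    return mass, remaining


# Hypothetical parameters, for illustration only.
rng = random.Random(1)
mu = [0.2, 0.3, 0.5]
D, leftover = dirichlet_process_sample(theta=2.0, mu=mu, n_terms=100, rng=rng)
```

Averaging many such samples should return values close to $\mu$, since each location $Z_{i}$ has law $\mu$ independently of the weights.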

There is a large literature on Dirichlet processes stemming from the seminal works [4], [16]. See [40], [45] with respect to the ‘stick breaking’ construction, and books [18], [19], [36], [41] for more on their history, other representations including that with respect to the ‘Chinese restaurant process’, and their use in practice.

In this article, we will concentrate on discrete spaces $\mathscr{X}\subset{\mathbb{N}}$, that is, those composed of either a finite or a countably infinite number of elements. We note, when $\mathscr{X}=\{1,\ldots,k\}$ is finite, $\mu=\langle\mu(1),\ldots,\mu(k)\rangle$ and $A_{i}=\{i\}$ for $1\leq i\leq k$, the property that $\langle D(A_{1}),\ldots,D(A_{k})\rangle$ is given by a Dirichlet distribution was first stated in a population genetics context in [12]; see also [26].

1.2. Time-inhomogeneous Markov chains

Let $G$ be a generator kernel on $\mathscr{X}$ , that is $G_{i,j}\geq 0$ for $i\neq j\in\mathscr{X}$ , and $G_{i,i}=-\sum_{j\neq i}G_{i,j}$ . Suppose the entries of $G$ are suitably bounded so that the kernel

$$K_{n}=I+\frac{G}{n}\tag{1.2}$$

is a stochastic kernel for all $n$ large enough, and set $K_{n}=I$ otherwise. Let $\{T_{n}\}_{n\geq 1}$ be the time-inhomogeneous Markov chain on the discrete space $\mathscr{X}$ associated to kernels $\{K_{n}\}_{n\geq 1}$ . Consider $G$ without zero rows. Then, every point in $\mathscr{X}$ represents a valley from which the chain rarely but almost surely exits to enter another point valley. In this way, a certain ‘landscape’ is explored. The chain can be considered as a simplified model of simulated annealing or metastability (cf. [6], [17], [31], [37], [46]). From another view, continuous-time variants of such inhomogeneous chains have been used in the modeling of certain mRNA dynamics [25].
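A minimal simulation sketch of such a chain and its empirical occupation measure follows. It assumes a finite state space labeled `0..k-1`, and it assumes that the step into time $m$ uses the kernel $K_{m}$; the indexing convention and the names `simulate_chain` and `occupation_measure` are illustrative choices, not taken from the paper.

```python
import random


def simulate_chain(G, mu, n_steps, rng):
    """Simulate T_1..T_n with kernels K_m = I + G/m, and K_m = I when not stochastic.

    I + G/m is stochastic exactly when m >= max_i |G[i][i]|, so earlier
    steps leave the state unchanged.
    """
    k = len(G)
    threshold = max(abs(G[i][i]) for i in range(k))
    state = rng.choices(range(k), weights=mu)[0]  # T_1 drawn from mu
    path = [state]
    for m in range(2, n_steps + 1):
        if m >= threshold:
            row = [G[state][j] / m + (1.0 if j == state else 0.0) for j in range(k)]
            state = rng.choices(range(k), weights=row)[0]
        path.append(state)
    return path


def occupation_measure(path, k):
    """Empirical occupation measure: fraction of time spent in each state."""
    counts = [0] * k
    for s in path:
        counts[s] += 1
    return [c / len(path) for c in counts]


# Hypothetical generator of the form G = theta * (Q - I), Q with constant rows mu.
theta = 2.0
mu = [0.2, 0.3, 0.5]
G = [[theta * (mu[j] - (1.0 if i == j else 0.0)) for j in range(3)] for i in range(3)]
rng = random.Random(2)
path = simulate_chain(G, mu, n_steps=5000, rng=rng)
nu_n = occupation_measure(path, 3)
```

Rerunning with different seeds gives visibly different values of `nu_n`, in line with the remark that the sample means of these chains do not settle to a deterministic limit.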

Interestingly, for finite $\mathscr{X}$ , it was noted in [17] and [46] that the sample means of these chains do not converge a.s. or in probability, as would be the case for a homogeneous Markov chain. For generators $G$ without zero entries, weak convergence to an empirical occupation law

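$$\nu=\lim_{n\to\infty}\Big\langle\frac{1}{n}\sum_{j=1}^{n}\delta_{T_{j}}(i):i\in\mathscr{X}\Big\rangle,\tag{1.3}$$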

was identified by computing its moments in [11]. Curiously, when $G$ is of the form $G=\theta(Q-I)$ for $\theta>0$ and $Q$ a stochastic matrix with constant rows $\mu$ , it was also shown that $\nu$ is a Dirichlet distribution with parameters $\{\theta\mu(i)\}_{i=1}^{k}$ by matching the moments. Similar occupation laws were also derived in the continuous-time mRNA model in [25] as the stationary distributions of a promoter process on $k$ states, influencing levels of mRNA production.

In this context, part of our motivation is to understand this limit and its generalizations more constructively (Theorem 2.12).

1.3. Clumped structure and generalized ‘stick-breaking’ processes

We now describe a class of generalized stick-breaking processes. Let $\langle P_{i}:i\geq 1\rangle$ be a GEM $(\theta)$ sequence and, for concreteness, let $\{T_{i}^{\prime}\}_{i\geq 1}$ be an independent Markov chain with irreducible, recurrent transition kernel $Q$ on a discrete space $\mathscr{X}$ with initial distribution $\pi$, although in several of our results we also consider more general Markov chains, not necessarily irreducible or composed only of recurrent states.

Another motivation of ours is to understand the random measures

$$\nu(\cdot;\theta,\mu,Q)=\sum_{i=1}^{\infty}P_{i}\,\delta_{T_{i}^{\prime}}(\cdot),\tag{1.4}$$

seen as a natural generalization of the stick-breaking representation of the Dirichlet process, with respect to Markovian samples $\{T^{\prime}_{i}\}_{i\geq 1}$ instead of the iid ones in (1.1).

In general, $\nu$ is not exchangeable in the sense that the GEM sequence $\langle P_{i}:i\geq 1\rangle$ may not be replaced by an arbitrary permutation without changing the measure. In contrast, when $\{T_{i}^{\prime}\}_{i\geq 1}$ is iid and $\nu$ is the Dirichlet process, such an exchangeability property holds; for example, the Poisson-Dirichlet order statistics $\langle\hat{P}_{i}:i\geq 1\rangle$ of $\langle P_{i}:i\geq 1\rangle$ may be used instead without changing the Dirichlet process (cf. [40]). We also note that other generalizations of Dirichlet processes have been considered, among them, Polya tree [33], Pitman-Yor [40], [43], and Beta processes [7].

We now introduce a clumped intermediate structure which will help analyze $\nu$ . Suppose $\{V_{i}\}_{i\geq 1}$ are the times when the Markov chain jumps to a different state with the convention $V_{1}=1$ . In particular, ‘skip-repetition’ is allowed: The chain can begin in state $x$ , jump to $y\neq x$ at time $V_{2}$ , and then may jump back at time $V_{3}$ into state $x$ . We note that these times are not only those times when a state is observed for the first time, as used in the definition of size-biased permutations.

Consider $R_{i}=\sum_{j=V_{i}}^{V_{i+1}-1}P_{j}$ for $i\geq 1$. We show (cf. Theorems 2.4 and 2.7) that, conditional on the locations $\{Y_{i}=T_{V_{i}}^{\prime}\}_{i\geq 1}$, the sequence $\langle R_{i}:i\geq 1\rangle$ is a RAM whose associated fractions are Beta $\big(1,\theta(1-Q_{Y_{i},Y_{i}})\big)$ for $i\geq 1$, a sort of ‘disordered’ GEM. Also, the law of $\{Y_{i}\}_{i\geq 1}$ can be computed as another Markov chain on $\mathscr{X}$ with a transition kernel found in terms of $Q$. We will call the joint law of $\big(\langle R_{i}:i\geq 1\rangle,\{Y_{i}\}_{i\geq 1}\big)$ a type of Markov Chain conditional GEM, or ‘MCcGEM’ distribution.

In terms of the clumped intermediate structure, we see that

$$\nu(\cdot)=\sum_{i=1}^{\infty}R_{i}\,\delta_{Y_{i}}(\cdot).\tag{1.5}$$

This representation will allow us to identify $\nu$ as the limit of occupation laws of a matched time-inhomogeneous Markov chain (Theorems 2.12, 2.13).

We will also see that $\nu$ satisfies a ‘self-similarity’ equation (cf. Theorem 2.17), uniquely characterizing its distribution. This equation is reminiscent of the regenerative structure present in ‘stick-breaking’ [45], in integral constructions of the Dirichlet process [32], [44], and in other related settings [21], [22].

Moreover, when ${\mathscr{X}}$ is finite, we discuss the joint moments of the distribution in Theorem 2.19. Although a formula for the moments is given in [11], the description in Theorem 2.19 is more detailed, allowing identification of the marginal distributions as Beta products (cf. Theorem 2.18 and Corollary 2.20).

1.4. Occupation laws of time-inhomogeneous Markov chains

With respect to the time-inhomogeneous Markov chain ${\bf T}=\{T_{n}\}_{n\geq 1}$ with kernels $\{K_{n}\}_{n\geq 1}$ (1.2), starting from initial distribution $\mu$ , consider the random empirical occupation measure on $\mathscr{X}$ ,

$$\nu_{n}(\cdot)=\frac{1}{n}\sum_{j=1}^{n}\delta_{T_{j}}(\cdot).$$

To connect with the intermediate clumping structure from the previous section, we will again implement a clumping procedure, this time to investigate local occupations, or clumped occupations, of the empirical measure of ${\bf T}$ up to time $n$ .

However, in a Markov chain with kernels $\{K_{n}\}_{n\geq 1}$ , later clumps of the chain are typically larger than earlier clumps. To keep the clump sizes from tending to zero after normalization, we consider the clumps in reverse chronological order, starting from time $n$ , so that the clumped occupations converge nontrivially in distribution.

Formally, let $1=V_{1}<V_{2}<\cdots$ be the successive times when the Markov chain changes state, and let $N_{n}=\min\{i:V_{i}>n\}$ . Going backwards from time $n$ , let $\tau_{n,1}$ be the length $n+1-V_{N_{n}-1}$ of the last visit to state $Y_{n,1}=T_{V_{N_{n}-1}}$ , $\tau_{n,2}$ be the length $V_{N_{n}-1}-V_{N_{n}-2}$ of the visit to state $Y_{n,2}=T_{V_{N_{n}-2}}$ , and $\tau_{n,k}$ be the length $V_{N_{n}-(k-1)}-V_{N_{n}-k}$ of the visit to $Y_{n,k}=T_{V_{N_{n}-k}}$ for $1<k<N_{n}$ . Let also $\tau_{n,k}=0$ and $Y_{n,k}=T_{1}$ for $k\geq N_{n}$ . In addition, define $P_{n,k}=\tau_{n,k}/n$ for $k\geq 1$ .

The figure below depicts, in a realization, the clumping boundaries $V_{j}$ marked in forward times, and the lengths of local occupations $\tau_{n,j}=nP_{n,j}$ given backwards in time starting from time $n$ .

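[Figure: a time line from $1$ to $n$ with the clumping boundaries $V_{N_{n}-3}$, $V_{N_{n}-2}$, $V_{N_{n}-1}$ marked, and the local occupation lengths $\tau_{n,3}$, $\tau_{n,2}$, $\tau_{n,1}$ read backwards from time $n$.]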

Then, $\nu_{n}$ is written as

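$$\nu_{n}(\cdot)=\sum_{j=1}^{N_{n}-1}P_{n,j}\,\delta_{Y_{n,j}}(\cdot)=\sum_{j=1}^{\infty}P_{n,j}\,\delta_{Y_{n,j}}(\cdot).$$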
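The reverse clumping amounts to a run-length encoding of the path read backwards from time $n$. A minimal Python sketch, on a short hypothetical path and with the illustrative name `reverse_clumps`:

```python
from itertools import groupby


def reverse_clumps(path):
    """Return [(Y_{n,k}, tau_{n,k})] for k = 1, 2, ..., N_n - 1.

    Each maximal run of equal values in the path is one sojourn; the runs
    are listed backwards in time, starting from the visit that contains time n.
    """
    runs = [(state, len(list(group))) for state, group in groupby(path)]
    return runs[::-1]


# Hypothetical path T_1..T_10, for illustration only.
path = [1, 1, 2, 2, 2, 2, 4, 1, 1, 5]
n = len(path)
clumps = reverse_clumps(path)
weights = [tau / n for _, tau in clumps]  # P_{n,k} = tau_{n,k} / n

# Rebuild nu_n from the clumped form: nu_n(l) = sum_k P_{n,k} * 1(Y_{n,k} = l).
nu_n = {}
for (state, _), p in zip(clumps, weights):
    nu_n[state] = nu_n.get(state, 0.0) + p
```

For this path the switch times are $V=(1,3,7,8,10)$, so the last visit (to state $5$) has length $\tau_{n,1}=n+1-V_{5}=1$, and the clumped form assigns state $1$ the mass $(2+2)/10$, as the direct count does.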

We show (cf. Theorem 2.10), for generators $G$ satisfying natural conditions, conditionally on the values $\{Y_{n,j}\}_{j\geq 1}$ , that the distributions of $\langle P_{n,j}:j\geq 1\rangle$ converge, as $n\rightarrow\infty$ , to a disordered GEM $\langle P^{+}_{j}:j\geq 1\rangle$ with parameters given in terms of $G$ and $\mu$ . Also, $\{Y_{n,j}\}_{j\geq 1}$ converges, as $n\rightarrow\infty$ , to a homogeneous Markov chain $\{Y_{j}\}_{j\geq 1}$ , with transition kernel in terms of $G$ and $\mu$ . In particular, the joint law of $\langle P_{n,j}:j\geq 1\rangle$ and $\{Y_{n,j}\}_{j\geq 1}$ converges, as $n\rightarrow\infty$ , to a Markov Chain conditional GEM distribution, denoted as the MCcGEM $(G)$ distribution with respect to $\mu$ .

In Theorem 2.12, we will then be able to show that $\nu_{n}$ converges to a random measure $\nu$ given in terms of $\langle P^{+}_{j}:j\geq 1\rangle$ and $\{Y_{j}\}_{j\geq 1}$ either in ‘stick-breaking’ or ‘clumped’ forms (1.4), (1.5). In particular, when $G=\theta(Q-I)$ where $Q$ is a constant stochastic matrix with identical rows $\mu$ , the associated sequences $\langle P^{+}_{j}:j\geq 1\rangle$ and $\{Y_{j}\}_{j\geq 1}$ simplify, and the limit $\nu$ is identified in Subsection 2.2.2 as a Dirichlet process. Returning to one of our motivations, we comment that when $\mathscr{X}$ is finite these results represent a more constructive view of the limits (1.3) found in [11].

Organization of the paper. We develop notions, make remarks, and state the main results, Theorems 2.4, 2.7, 2.10, 2.12, 2.13, 2.17, 2.18, and 2.19, in this order, in Section 2. Proofs are then given in Section 3.

2. Statement of results

We now formalize notation and state our main results, and related remarks about them, in several subsections. Throughout, we will use the convention that empty sums equal $0$, and empty products are $1$. Also, $1/0=\infty$, $0/0=0$, and $0^{0}=1$. The notation $v^{t}$ signifies that the vector $v$ is in row form.

2.1. RAMs, GEMs and MCcGEM laws

A residual allocation model (RAM) is a way of defining a random probability measure on $\mathbb{N}$ by iteratively assigning a random portion of the unassigned probability remaining to the next integer.

Definition 2.1 (Residual Allocation Model - RAM).

Let ${\bf X}=\{X_{j}\}_{j\geq 1}$ be a collection of independent $[0,1]$ -valued random variables. Define

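$$P_{1}=X_{1}\quad\text{and}\quad P_{j}=X_{j}\Big(1-\sum_{i=1}^{j-1}P_{i}\Big)\quad\text{for }j\geq 2.\tag{2.1}$$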

Then, if ${\bf P}=\langle P_{j}:j\geq 1\rangle$ is a.s. a probability measure on ${\mathbb{N}}$ , that is if $\sum_{j=1}^{\infty}P_{j}\stackrel{{\scriptstyle a.s.}}{{=}}1$ , we say ${\bf P}$ is a RAM. If ${\bf X}$ consists of iid fractions, and the associated ${\bf P}$ is a RAM, we say ${\bf P}$ is a self-similar RAM.

Consider now the following identity, verified in Lemma 3.1: For an arbitrary sequence of numbers $\{a_{j}\}_{j\geq 1}$ and $k\geq 1$ ,

$$\prod_{j=1}^{k}(1-a_{j})+\sum_{j=1}^{k}a_{j}\prod_{i=1}^{j-1}(1-a_{i})=1.$$

Then, the sequence in (2.1) satisfies $P_{j}=X_{j}\prod_{i=1}^{j-1}(1-X_{i})$ for $j\geq 1$ (cf. Proposition 3.2). Accordingly, we have the useful observation that ${\bf P}$ is a RAM exactly when $\prod_{j\geq 1}(1-X_{j})\stackrel{{\scriptstyle a.s.}}{{=}}0$ .
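The identity and the resulting product form can be checked numerically. The sketch below uses arbitrary illustrative fractions; the names `ram_recursive` and `ram_product` are hypothetical.

```python
import math


def ram_recursive(xs):
    """P_1 = X_1 and P_j = X_j * (1 - sum_{i<j} P_i), as in the definition."""
    ps = []
    for x in xs:
        ps.append(x * (1.0 - sum(ps)))
    return ps


def ram_product(xs):
    """P_j = X_j * prod_{i<j} (1 - X_i), the product form."""
    ps, remaining = [], 1.0
    for x in xs:
        ps.append(x * remaining)
        remaining *= 1.0 - x
    return ps


# Arbitrary fractions a_1..a_4 in [0, 1], for illustration only.
xs = [0.5, 0.25, 0.8, 0.1]
recursive = ram_recursive(xs)
product = ram_product(xs)

# Left-hand side of the identity: leftover mass plus the assigned masses.
identity_lhs = math.prod(1.0 - x for x in xs) + sum(product)
```

Here the assigned masses are $0.5$, $0.125$, $0.3$, $0.0075$ and the leftover is $0.0675$, which together give $1$.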

A specific, well-known example of a RAM is the Griffiths-Engen-McCloskey (GEM) sequence.

Definition 2.2 (GEM).

Fix $\theta>0$ . Let ${\bf X}=\{X_{j}\}_{j\geq 1}$ be a sequence of iid variables with common distribution Beta $(1,\theta)$ . Then, the self-similar RAM ${\bf P}$ , constructed from ${\bf X}$ , is said to be a GEM $(\theta)$ distribution.

Also, consider a sequence $\{\theta_{j}\}_{j\geq 1}$ of positive numbers, and let ${\bf X}$ be a sequence of independent random variables where $X_{j}\sim{\rm Beta}(1,\theta_{j})$ for $j\geq 1$ . When the measure ${\bf P}$ , found in terms of ${\bf X}$ , is a RAM, we will say it is a disordered GEM sequence with parameters $\{\theta_{j}\}_{j\geq 1}$ .

Now, in a RAM ${\bf P}$ , one can clump adjacent probabilities with respect to an increasing sequence ${\bf u}$ , marking boundaries of clumps, to form a new probability measure ${\bf P^{u}}$ on $\mathbb{N}$ .

Definition 2.3 (Clumped measure).

Let ${\bf u}=\{u_{j}\}_{j\geq 1}$ be an increasing sequence in $\mathbb{N}\cup\{\infty\}$ with $u_{1}=1$ and $\lim_{j\rightarrow\infty}u_{j}=\infty$ , and let ${\bf P}$ be a RAM. We clump ${\bf P}$ according to ${\bf u}$ to construct a new probability measure ${\bf P^{u}}=\langle P^{u}_{j}:j\geq 1\rangle$ on $\mathbb{N}$ where, for $j\geq 1$ ,

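$$P^{u}_{j}=\begin{cases}\displaystyle\sum_{i=u_{j}}^{u_{j+1}-1}P_{i}&\text{if }u_{j}<\infty\\ 0&\text{if }u_{j}=\infty.\end{cases}$$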

We remark, when ${\bf u}$ takes the value infinity at an entry $u_{j+1}$ in the sequence, necessarily ${\bf P^{u}}$ is a distribution supported on $\{1,2,\ldots,j\}$ .
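A small Python sketch of the clumping operation on a finite list of weights follows. It works under an assumption forced by truncation: when the next boundary is infinite or not listed, the clump absorbs all remaining listed weights. The name `clump` is illustrative.

```python
import math


def clump(weights, boundaries):
    """Clump weights P_1..P_m by 1-indexed boundaries u_1 = 1 < u_2 < ...

    Entries equal to math.inf produce clumps of mass 0. A clump whose next
    boundary is math.inf (or missing) takes every remaining listed weight.
    """
    clumped = []
    for j, u in enumerate(boundaries):
        if u == math.inf:
            clumped.append(0.0)
            continue
        nxt = boundaries[j + 1] if j + 1 < len(boundaries) else math.inf
        stop = len(weights) + 1 if nxt == math.inf else nxt
        clumped.append(sum(weights[u - 1:stop - 1]))  # P_u + ... + P_{stop-1}
    return clumped


# Illustrative weights and boundaries.
P = [0.4, 0.3, 0.2, 0.1]
finite_case = clump(P, [1, 3, 4])            # clumps {1,2}, {3}, {4}
infinite_case = clump(P, [1, 3, math.inf])   # clumps {1,2}, {3,4,...}, empty
```

In the second call $u_{3}=\infty$, so the clumped measure is supported on $\{1,2\}$, matching the remark above.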

An immediate question now is when ${\bf P^{u}}$ is also a RAM. We will show that ${\bf P^{u}}$ is always a RAM as long as ${\bf u}$ is deterministic. However, the situation is more involved when a random sequence is used for the clumping.

Specifically, we will be interested in two types of random clumping sequences constructed from a Markov chain ${\bf T}=\{T_{i}\}_{i\geq 1}$ on the discrete space $\mathscr{X}$ . The first sequence ${\bf V}$ comes from considering clumps of repeated values in ${\bf T}$ ; that is, ${\bf V}$ will keep track of the times when ${\bf T}$ switches values. The second sequence ${\bf W}$ arises in considering the times when ${\bf T}$ returns to its initial value $T_{1}$ .

For example, if ${\bf T}=(1,1,2,2,2,2,4,1,1,5,\ldots)$ is observed, we define ${\bf V}=(1,3,7,8,10,\ldots)$ and ${\bf W}=(1,2,8,9,\ldots)$. More formally, let $V_{1}=W_{1}=1$ and, for $j\geq 1$, set

$$V_{j+1}=\inf\{v>V_{j}:T_{v}\neq T_{v-1}\}\quad\text{and}\quad W_{j+1}=\inf\{w>W_{j}:T_{w}=T_{1}\}.$$
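Both sequences are straightforward to compute from a finite observed stretch of the chain. A minimal Python sketch, with illustrative function names and 1-indexed times as in the text:

```python
def switch_times(path):
    """V: times v at which T_v differs from T_{v-1}, with V_1 = 1."""
    return [1] + [v for v in range(2, len(path) + 1) if path[v - 1] != path[v - 2]]


def return_times(path):
    """W: times w at which T_w equals the initial value T_1, with W_1 = 1."""
    return [w for w in range(1, len(path) + 1) if path[w - 1] == path[0]]


# The observed stretch of T from the example in the text.
T = [1, 1, 2, 2, 2, 2, 4, 1, 1, 5]
V = switch_times(T)   # [1, 3, 7, 8, 10]
W = return_times(T)   # [1, 2, 8, 9]
```

Only the finite entries visible in the observed stretch are returned; an infinite entry of ${\bf V}$ or ${\bf W}$ cannot be detected from a finite path.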

In the case that ${\bf T}$ reaches an absorbing state, denoted $T_{\infty}$ , the chain is eventually constant and ${\bf V}$ is eventually infinite. In the case that $T_{1}$ is a transient state, the chain returns to the first state finitely many times and ${\bf W}$ eventually takes the value infinity.

Define now ${\bf Y}=\{Y_{j}\}_{j\geq 1}$ by $Y_{j}=T_{V_{j}}$ for $j\geq 1$ . When ${\bf T}$ does not reach an absorbing state, we think of ${\bf Y}$ as the sequence of values taken by ${\bf T}$ without repetition. If however ${\bf T}$ meets an absorbing state $T_{\infty}$ , ${\bf Y}$ will eventually be constant at value $T_{\infty}$ .

In the following theorem, the reader may wish to focus, on a first pass, on the case when ${\bf T}$ possesses no absorbing states, where the formulas simplify.

In what follows, we will say that a sequence ${\bf z}$ is a ‘possible’ sequence for a Markov chain ${\bf Z}$ on $\mathscr{X}$ if the event $\{Z_{i}=z_{i}:1\leq i\leq n\}$ has positive probability for each $n\geq 1$ .

Theorem 2.4 (Clumped RAMs).

Let ${\bf P}$ be a RAM. Fix an increasing sequence ${\bf u}=\{u_{j}\}_{j\geq 1}$ in $\mathbb{N}\cup\{\infty\}$ with $u_{1}=1$ and $\lim_{j\rightarrow\infty}u_{j}=\infty$ . Then,

(1)

${\bf P^{u}}$ is a RAM with respect to fractions ${\bf X^{u}}=\{X^{u}_{j}\}_{j\geq 1}$ where

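$$X^{u}_{j}=\begin{cases}\displaystyle\sum_{i=u_{j}}^{u_{j+1}-1}X_{i}\prod_{l=u_{j}}^{i-1}(1-X_{l})=1-\prod_{i=u_{j}}^{u_{j+1}-1}(1-X_{i})&\text{if }u_{j}<\infty\\ 1&\text{if }u_{j}=\infty.\end{cases}$$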

Let now ${\bf T}=\{T_{j}\}_{j\geq 1}$ be a Markov chain, independent of ${\bf P}$ and with homogeneous transition kernel $Q$ .

(2)

Then, the sequence ${\bf Y}=\{T_{V_{j}}\}_{j\geq 1}$ is a Markov chain with homogeneous transition kernel $K$ given by

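$$K(z,w)=\begin{cases}\dfrac{Q_{z,w}}{1-Q_{z,z}}&\text{for }z\neq w,\ Q_{z,z}\neq 1\\ 1&\text{for }z=w,\ Q_{z,z}=1\\ 0&\text{otherwise.}\end{cases}$$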

Let ${\bf t}$ be a possible sequence in $\mathscr{X}$ with respect to ${\bf T}$ . Let ${\bf y}$ be a possible sequence in $\mathscr{X}$ with respect to ${\bf Y}$ .

(3)

Then, ${\bf P^{V}}\bigr{|}{\bf T}={\bf t}$ and ${\bf P^{W}}\bigr{|}{\bf T}={\bf t}$ are RAMs.

(4)

Also, if ${\bf P}$ is self-similar, ${\bf P^{V}}\bigr{|}{\bf Y}={\bf y}$ is a RAM and, when $t_{1}$ is a recurrent state with respect to ${\bf T}$ , ${\bf P^{W}}\bigr{|}T_{1}=t_{1}$ is a self-similar RAM.
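Items (1) and (2) of the theorem lend themselves to a quick numerical check. The Python sketch below assumes finite boundaries only, a finite state space labeled `0..k-1`, and illustrative names (`clumped_fractions`, `weights_from_fractions`, `jump_kernel`); it uses the closed form $X^{u}_{j}=1-\prod_{i=u_{j}}^{u_{j+1}-1}(1-X_{i})$ for the clumped fractions and the kernel $K(z,w)=Q_{z,w}/(1-Q_{z,z})$ for $z\neq w$, with absorbing states held fixed.

```python
import math


def weights_from_fractions(xs):
    """P_j = X_j * prod_{i<j} (1 - X_i)."""
    ps, remaining = [], 1.0
    for x in xs:
        ps.append(x * remaining)
        remaining *= 1.0 - x
    return ps


def clumped_fractions(xs, boundaries):
    """X^u_j = 1 - prod_{i=u_j}^{u_{j+1}-1} (1 - X_i), for finite 1-indexed boundaries.

    The last listed boundary only closes the final clump.
    """
    return [
        1.0 - math.prod(1.0 - x for x in xs[lo - 1:hi - 1])
        for lo, hi in zip(boundaries, boundaries[1:])
    ]


def jump_kernel(Q):
    """Kernel of the chain of distinct successive values; absorbing states stay put."""
    k = len(Q)
    K = [[0.0] * k for _ in range(k)]
    for z in range(k):
        if Q[z][z] == 1.0:
            K[z][z] = 1.0
        else:
            for w in range(k):
                if w != z:
                    K[z][w] = Q[z][w] / (1.0 - Q[z][z])
    return K


# Illustrative fractions and boundaries: clumps {1,2}, {3}, {4,5}.
xs = [0.5, 0.25, 0.8, 0.1, 0.3]
u = [1, 3, 4, 6]
P = weights_from_fractions(xs)
P_u_direct = [sum(P[lo - 1:hi - 1]) for lo, hi in zip(u, u[1:])]
P_u_from_fractions = weights_from_fractions(clumped_fractions(xs, u))

# Illustrative kernel Q with one absorbing state (state 1).
Q = [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0], [0.3, 0.3, 0.4]]
K = jump_kernel(Q)
```

The two ways of computing the clumped weights agree, and each row of `K` sums to one.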

We remark that the specifications of the fractions and their distributions in item (4) are given in the proof of Theorem 2.4. These specifications, in the case when ${\bf P}$ is a GEM $(\theta)$ sequence, are part of Theorem 2.7.

Also, in item (4) above, we note that the self-similarity of ${\bf P}$ is important to deduce in full generality that ${\bf P^{V}}\bigr{|}{\bf Y}$ is a RAM. Later, in Example 2.9, we see that ${\bf P^{V}}\bigr{|}{\bf Y}$ may not be a RAM if ${\bf P}$ is not a self-similar RAM.

In addition, we observe that in item (4), when $t_{1}$ is a transient state, the sequence ${\bf X^{W}}|T_{1}=t_{1}$ eventually takes constant value $1$ since $t_{1}$ is visited only finitely many times a.s. Given that $X^{W}_{1}|T_{1}=t_{1}$ is a nontrivial variable, ${\bf X^{W}}\bigr{|}T_{1}=t_{1}$ cannot be iid. However, one may consider an iid sequence $\{Z_{i}\}_{i\geq 1}$, say on a different probability space, where $Z_{1}\stackrel{d}{=}X^{W}_{1}\bigr{|}T_{1}=t_{1}$, and check that the self-similar RAM formed from fractions $\{Z_{i}\}_{i\geq 1}$ has the same distribution as ${\bf P^{W}}\bigr{|}T_{1}=t_{1}$.

We now consider the clumping procedures with respect to a GEM distribution ${\bf P}$. It will be convenient to define the notion of a generator kernel or matrix, terms which we use interchangeably.

Definition 2.5 (Generator kernel).

Let $G=\{G_{i,j}:i,j\in\mathscr{X}\}$ be a square matrix on $\mathscr{X}$ . We say that $G$ is a generator kernel if it satisfies $G_{i,j}\geq 0$ for $i\neq j$ and $G_{i,i}=-\sum_{j\neq i}G_{i,j}$ . In addition, we will assume a boundedness condition, $\sup_{i}|G_{i,i}|<\infty$ .

Every matrix of the form $G=\theta(Q-I)$ , where $\theta>0$ and $Q$ is a stochastic kernel on $\mathscr{X}$ , is a generator matrix. Moreover, we claim that every generator matrix can be (non-uniquely) decomposed in this fashion: The final condition in Definition 2.5 ensures that all entries are bounded, $\sup_{l,k}|G_{l,k}|\leq\sup_{i}|G_{i,i}|<\infty$ , so that a normalizing $\theta$ can be found.
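The decomposition can be made explicit in code. The sketch below takes $\theta=\sup_{i}|G_{i,i}|$ by default, which is one admissible choice among many (any larger $\theta$ also works); the function name `decompose_generator` and the example matrix are illustrative assumptions.

```python
def decompose_generator(G, theta=None):
    """Return (theta, Q) with G = theta * (Q - I) and Q a stochastic matrix.

    theta defaults to max_i |G[i][i]| (or 1.0 if G is the zero matrix);
    any theta at least that large gives a valid, different Q.
    """
    k = len(G)
    bound = max(abs(G[i][i]) for i in range(k))
    if theta is None:
        theta = bound if bound > 0 else 1.0
    if theta < bound:
        raise ValueError("theta must be at least max_i |G[i][i]|")
    Q = [[G[i][j] / theta + (1.0 if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    return theta, Q


# Illustrative generator; its last row is zero.
G = [[-1.0, 0.5, 0.5], [2.0, -3.0, 1.0], [0.0, 0.0, 0.0]]
theta, Q = decompose_generator(G)
theta_alt, Q_alt = decompose_generator(G, theta=5.0)  # non-uniqueness
```

In this example the zero row of `G` becomes the row `[0, 0, 1]` of `Q`, so the corresponding state is absorbing for `Q`.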

We also observe that a generator matrix $G$ has a zero row, that is $G_{i,i}=0$ for some $i\geq 1$ , exactly when $i$ is an absorbing state for a corresponding $Q$ . In particular, when $G$ does not have zero rows, any corresponding $Q$ does not have absorbing states.

We now formally define the notion of a Markov Chain conditional GEM (MCcGEM) joint distribution on the space $[0,1]^{{\mathbb{N}}}\times\mathscr{X}^{{\mathbb{N}}}$ , endowed with the product topology and product $\sigma$ -field formed in terms of the Borel $\sigma$ -fields on $[0,1]$ and $\mathscr{X}$ . This topology is discussed more in Subsection 3.4. By convention, we will say that a Beta $(1,0)$ random variable equals $1$ a.s.

Definition 2.6 (MCcGEM distribution).

With respect to a generator matrix $G$ , let ${\bf Y}$ be a homogeneous Markov chain with initial distribution $\mu$ and transition kernel $K_{G}$ on $\mathscr{X}$ given by

[TABLE]

Consider variables ${\bf X}=\{X_{j}\}_{j\geq 1}$ , on the same probability space as ${\bf Y}$ , such that $X_{j}\bigr{|}{\bf Y}={\bf y}\sim$ Beta $(1,-G_{y_{j},y_{j}})$ and $\{X_{j}\bigr{|}{\bf Y}={\bf y}\}_{j\geq 1}$ are independent. Define ${\bf P}$ where $P_{j}=X_{j}\prod_{i=1}^{j-1}\left(1-X_{i}\right)$ for $j\geq 1$ , and observe that ${\bf P}\bigr{|}{\bf Y}={\bf y}$ is a disordered GEM with parameters $\{-G_{y_{j},y_{j}}\}_{j\geq 1}$ (see below).

We say that the pair $({\bf P},{\bf Y})$ has MCcGEM $(G)$ distribution with respect to $\mu$ .

To see that ${\bf P}\bigr|{\bf Y}={\bf y}$ is a disordered GEM, we need only observe that ${\bf P}\bigr|{\bf Y}={\bf y}$ is a probability distribution on $\mathbb{N}$, that is, $\prod_{n\geq 1}(1-X_{n})\bigr|\big({\bf Y}={\bf y}\big)=0$ a.s. This holds exactly when $\sum_{n\geq 1}X_{n}\bigr|{\bf Y}={\bf y}$ diverges a.s. As the tail $\sigma$-field is trivial, the only alternative is the summability $\sum_{n\geq 1}X_{n}\bigr|\big({\bf Y}={\bf y}\big)<\infty$ a.s. Since ${\bf X}|{\bf Y}={\bf y}$ is composed of independent Beta random variables on $[0,1]$ with means $\{(1-G_{y_{j},y_{j}})^{-1}\}_{j\geq 1}$ and variances dominated by the means, Kolmogorov’s $3$-series theorem shows that almost sure summability holds exactly when $\sum_{j\geq 1}|G_{y_{j},y_{j}}|^{-1}<\infty$. For a generator matrix $G$, this is never the case as the terms $\{|G_{x,x}|\}_{x\in{\mathscr{X}}}$ are uniformly bounded above.

We now describe a relation between GEM distributions and MCcGEM laws through clumping with respect to a homogeneous Markov chain.

Theorem 2.7 (GEM to MCcGEM).

Let $\theta>0$ and let ${\bf P}$ have the GEM $(\theta)$ distribution. Let also ${\bf T}=\{T_{j}\}_{j\geq 1}$ be an independent homogeneous Markov chain with kernel $Q$ and initial distribution $\mu$. Recall the associated switch times ${\bf V}$, the clumped distribution ${\bf P^{V}}$, and the Markov chain ${\bf Y}$ near (2.3).

Then, ${\bf Y}$ is a homogeneous Markov chain with kernel $K_{\theta(Q-I)}$ and ${\bf P^{V}}|{\bf Y=y}$ is a disordered GEM with parameters $\{\theta(1-Q_{y_{j},y_{j}})\}_{j\geq 1}$ , that is $({\bf P^{V}},{\bf Y})$ has MCcGEM $(\theta(Q-I))$ distribution with respect to $\mu$ .
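The clumping operation in Theorem 2.7 can be sketched numerically. The code below is an illustration only, under stated assumptions: the GEM sequence and the chain are truncated to finitely many terms, states are labeled $0,1,2$, and the kernel `Q`, the law `mu`, and all function names are hypothetical choices. It adds the weights $P_{i}$ over each sojourn of ${\bf T}$ and records the state visited, producing truncated versions of ${\bf P^{V}}$ and ${\bf Y}$.

```python
import random

def sample_gem(theta, size, rng):
    """First `size` weights of a GEM(theta) sequence (stick-breaking)."""
    weights, remaining = [], 1.0
    for _ in range(size):
        x = rng.betavariate(1.0, theta)   # fraction X_j ~ Beta(1, theta)
        weights.append(x * remaining)     # P_j = X_j * prod_{i<j} (1 - X_i)
        remaining *= 1.0 - x
    return weights

def sample_chain(Q, mu, size, rng):
    """Path T_1, ..., T_size of a homogeneous chain on {0, ..., len(Q)-1}."""
    states = range(len(Q))
    path = [rng.choices(states, weights=mu)[0]]
    for _ in range(size - 1):
        path.append(rng.choices(states, weights=Q[path[-1]])[0])
    return path

def clump(weights, path):
    """Add up weights over each sojourn of the path.

    Returns (clumped weights, visited states), i.e. truncated versions
    of P^V and Y.
    """
    clumped, visited = [], []
    for p, t in zip(weights, path):
        if visited and visited[-1] == t:
            clumped[-1] += p              # same sojourn: keep clumping
        else:
            clumped.append(p)             # switch time: start a new clump
            visited.append(t)
    return clumped, visited

rng = random.Random(0)
theta = 2.0
Q = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]  # hypothetical kernel
mu = [1 / 3, 1 / 3, 1 / 3]                               # hypothetical initial law
P = sample_gem(theta, 500, rng)
T = sample_chain(Q, mu, 500, rng)
PV, Y = clump(P, T)
```

By construction, consecutive entries of `Y` differ and the clumped weights have the same total mass as the original weights; the theorem describes the joint law of the pair, which this sketch only samples from.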

Some cases of interest are developed in the following examples.

Example 2.8.

Suppose $\bf{P}\sim$ GEM $(\theta)$ and that $\bf{T}$ is a homogeneous Markov chain with stochastic kernel $Q$ where $Q$ has constant diagonal entries, $Q_{i,i}=q$ for $i\in{\mathscr{X}}$ . By Theorem 2.7, ${\bf P^{V}}\bigr{|}{\bf Y}$ is a disordered GEM sequence with parameters $\{\theta(1-Q_{y_{i},y_{i}})\}_{i\geq 1}$ . However, since $Q_{y_{i},y_{i}}\equiv q$ , we conclude ${\bf P^{V}}\bigr{|}{\bf Y}={\bf P^{V}}$ does not depend on ${\bf Y}$ and is actually a GEM $(\theta(1-q))$ sequence. In this case, the pair $({\bf P^{V}},{\bf Y})$ consists of independent sequences.

More generally, suppose $\bf{P}$ is any random distribution on $\mathbb{N}$ . Then, indeed, with respect to this Markov chain $\bf{T}$ , by the proof of Part (4) of Theorem 2.4 (cf. (3.14)), the fractions ${\bf X^{V}}$ do not depend on ${\bf Y}$ , and so $\bf{P^{V}}\bigr{|}{\bf Y}=\bf{P^{V}}$ .

Example 2.9.

We now consider a RAM $\bf{P}$ constructed from independent fractions $X_{j}\sim{\rm Beta}(1/2,1+j/2)$ for $j\geq 1$ . Such a RAM is a member of the well-known 2-parameter GEM $(\alpha,\theta)$ family, here with ${\bf P}\sim$ GEM $(1/2,1)$ . Let $\bf{T}$ be a sequence of iid Bernoulli $(1/2)$ variables. Thought of as a Markov chain on the $2$ -state space ${\mathscr{X}}=\{1,2\}$ , every entry of the stochastic kernel $Q$ of ${\bf T}$ equals $1/2$ . By the discussion in Example 2.8, as the diagonal entries of $Q$ are the constant $q=1/2$ , we have ${\bf P^{V}}\bigr{|}{\bf Y}={\bf P^{V}}$ .

We now observe that ${\bf P^{V}}$ is not a RAM: If it were a RAM, consider the associated non-atomic fractions ${\bf X^{V}}$ (cf. Part (1) of Theorem 2.4). Compute

[TABLE]

Then, ${\mathscr{E}}\left[1-X_{1}^{V}\right]=\sum_{m\geq 1}\frac{3}{3+m}(.5)^{m}$ , ${\mathscr{E}}\left[1-X_{2}^{V}\right]=\sum_{n,m\geq 1}\frac{3+m}{3+m+n}(.5)^{m+n}$ , and ${\mathscr{E}}\left[(1-X_{1}^{V})(1-X_{2}^{V})\right]=\sum_{m,n\geq 1}\frac{3}{3+m+n}(.5)^{m+n}$ . Hence, $\text{Cov}[1-X_{1}^{V},1-X_{2}^{V}]\approx-.005391$ , and so the non-atomic fractions are not independent, and ${\bf P^{V}}$ cannot be a RAM.
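The covariance in Example 2.9 can be checked numerically from the three series above. This is a small sketch in which the only assumption is the truncation level `cutoff`; the neglected tails are of order $2^{-\text{cutoff}}$.

```python
def covariance_of_first_two_residuals(cutoff=200):
    """Truncate the geometric sums of Example 2.9 at `cutoff` terms each."""
    # E[1 - X_1^V] = sum_m 3 / (3 + m) * 2^{-m}
    e1 = sum(3 / (3 + m) * 0.5 ** m for m in range(1, cutoff + 1))
    e2 = 0.0    # E[1 - X_2^V]
    e12 = 0.0   # E[(1 - X_1^V)(1 - X_2^V)]
    for m in range(1, cutoff + 1):
        for n in range(1, cutoff + 1):
            w = 0.5 ** (m + n)
            e2 += (3 + m) / (3 + m + n) * w
            e12 += 3 / (3 + m + n) * w
    return e1, e2, e12, e12 - e1 * e2

e1, e2, e12, cov = covariance_of_first_two_residuals()
print(round(cov, 6))
```

The truncated sums give a small negative covariance that agrees with the value quoted in the example to the displayed precision.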

2.2. Clumping and time-inhomogeneous Markov chains

Of course, the notion of clumping can be applied to random probability measures on $\mathbb{N}$ , which are not RAMs. In particular, to capture the empirical occupation law limit of a Markov chain, we study its local occupations, or clumps of the sequence indexed in time, as it explores the space ${\mathscr{X}}$ . As noted in the introduction, we will look at these local occupations in reverse order.

Let ${\bf T}=\{T_{j}\}_{j\geq 1}$ be a Markov chain on the discrete space ${\mathscr{X}}$ , without absorbing states. Recall the definition of the switching times ${\bf V}$ (cf. (2.3)), and let $N_{n}=\min\{i:V_{i}>n\}$ index the first switch after time $n$ . For $1<k<N_{n}\leq i$ and $j\geq 1$ , define

[TABLE]

Also, set

[TABLE]

and $P_{n,j}=\tau_{n,j}/n$ . Consider the sequences ${\bf P}_{n}=\langle P_{n,j}:j\geq 1\rangle$ and ${\bf Y}_{n}=\{Y_{n,j}\}_{j\geq 1}$ .

As a concrete example, consider an observation ${\bf T}=(1,1,1,6,6,1,3,3,3,5,\ldots)$ . Then for $n=4$ , the local occupations are summarized by eventually constant sequences ${\bf P}_{4}=(1/4,3/4,0,0,0,0,\ldots)$ and ${\bf Y}_{4}=(6,1,1,1,1,1,\ldots)$ . Similarly, when $n=7$ , we have ${\bf P}_{7}=(1/7,1/7,2/7,3/7,0,0,0,\ldots)$ and ${\bf Y}_{7}=(3,1,6,1,1,1,1,\ldots)$ . For a more general depiction, please refer to the figure in Section 1.4.
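The reversed local occupations in this example can be reproduced mechanically. The sketch below is an illustration with hypothetical function names; it returns only the nonzero entries of ${\bf P}_{n}$ together with the matching entries of ${\bf Y}_{n}$, leaving out the eventually constant padding.

```python
from fractions import Fraction
from itertools import groupby

def reversed_local_occupations(path, n):
    """Clump the first n steps of `path` into sojourns, latest sojourn first.

    Returns (P_n, Y_n) restricted to their nonzero entries: P_n holds the
    fraction of the first n steps spent in each sojourn, and Y_n the state
    occupied during it.
    """
    runs = [(state, len(list(group))) for state, group in groupby(path[:n])]
    runs.reverse()
    P = [Fraction(length, n) for _, length in runs]
    Y = [state for state, _ in runs]
    return P, Y

def occupation(path, n, state):
    """Average occupation of `state` over the first n steps."""
    return Fraction(sum(1 for t in path[:n] if t == state), n)

T = [1, 1, 1, 6, 6, 1, 3, 3, 3, 5]
P4, Y4 = reversed_local_occupations(T, 4)
P7, Y7 = reversed_local_occupations(T, 7)
```

Summing the entries of `P7` over the positions where `Y7` equals a given state recovers the average occupation of that state in the first seven steps, which is the bookkeeping that links the local occupations to the empirical occupation law.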

Hence, for $l\in\mathscr{X}$ , we have generally that

[TABLE]

In the middle of the display, we see the average Markov chain ${\bf T}$ occupation of state $l$ in the first $n$ steps. On the right-hand side, the sum is over local occupations, or clumps, of state $l$ , seen in the chain ${\bf T}$ through $n$ steps. The notion suggested by this relation, part of the genesis of this article, is that we may study the limit average occupation law of ${\bf T}$ by investigating the limit of the pair $({\bf P}_{n},{\bf Y}_{n})$ describing local occupations.

We now focus on a class of time-inhomogeneous Markov chains for which the limits of $({\bf P_{n}},{\bf Y_{n}})$ have succinct representation. Specifically, we consider inhomogeneous Markov chains ${\bf T}$ with transition kernels $\{I+G/n\}$ , where $G$ is a generator matrix with no zero entries on the diagonal. A finite space ${\mathscr{X}}$ case where $G$ was taken to have no zero entries at all was studied in [11]; see also [15], [5] for related developments.

In these chains, the clump lengths $V_{k}-V_{k-1}$ are typically growing with $k$ , unlike for homogeneous Markov chains. In particular, rather than an ergodic theorem, it was shown in [11] (cf. (1.3)) that the occupation laws converge weakly to a nontrivial distribution. Here, we consider a countable space generalization, allowing for reducibility and transient states, and formulate a characterization of these occupation limits through the reversed clumping device described above.

In the following statement, we say that a matrix is non-negative if all its entries are non-negative. Additionally, weak convergences here are in the sense of finite-dimensional distributions, the natural sense associated to the product space $[0,1]^{\mathbb{N}}\times\mathscr{X}^{\mathbb{N}}$ endowed with the product topology.

Theorem 2.10 (Time-inhomogeneous MC to MCcGEM).

Let $G$ be a generator matrix on $\mathscr{X}$ without zero rows. Let $\theta>0$ and $M\in\mathbb{N}$ be such that both $M,\theta>\inf\{r\in\mathbb{R}^{+}:I+r^{-1}G\text{ is non-negative}\}$ , and define $Q=I+G/\theta$ . Let also $\pi$ be a stochastic vector and $\mu$ be a stationary distribution of $Q$ so that entry-wise,

[TABLE]

Define kernels $\{K_{n}\}_{n\geq 1}$ by

[TABLE]

and let ${\bf T}$ be the inhomogeneous Markov chain with transition kernels $\{K_{n}\}_{n\geq 1}$ and initial distribution $\pi$ . Define $({\bf P}_{n},{\bf Y}_{n})$ as above with respect to ${\bf T}$ , and also define the generator matrix $G^{\prime}$ by

[TABLE]

Then, ${\bf Y}_{n}$ converges weakly to the homogeneous Markov chain ${\bf Y}^{\prime}$ with kernel $K_{G^{\prime}}$ and initial distribution $\mu$ . Also, for a possible sequence ${\bf y}$ of ${\bf Y}^{\prime}$ , we have ${\bf P}_{n}\bigr{|}{\bf Y}_{n}={\bf y}$ converges weakly to a disordered GEM sequence ${\bf P}^{\prime}$ with parameters $\{-G^{\prime}_{y_{n},y_{n}}\}_{n\geq 1}$ . Therefore, the associated pairs $({\bf P}_{n},{\bf Y}_{n})$ converge weakly to $({\bf P}^{\prime},{\bf Y}^{\prime})$ with MCcGEM $(G^{\prime})$ distribution with respect to $\mu$ .

Example 2.11.

In the context of Example 2.8, suppose $G$ has constant diagonal entries $g$. Then, the local occupations ${\bf P}_{n}$ of the inhomogeneous Markov chain would converge to a GEM $(-g)$ distribution, not just conditionally in terms of a MCcGEM distribution.

We now characterize the limit occupation law of ${\bf T}$ in a ‘stick-breaking’ form with respect to either a MCcGEM distribution, or a paired GEM distribution and homogeneous Markov chain. In the following, weak convergence of $\nu_{n}$ is with respect to the discrete topology on $\Delta_{\mathscr{X}}$ , the space of probability measures on ${\mathscr{X}}$ .

Theorem 2.12 (Occupation laws to MCcGEM and stick-breaking measures).

Consider the setting and assumptions of Theorem 2.10. Observe that $\mu$ is a stationary distribution of $Q^{\prime}=I+G^{\prime}/\theta$ , and let ${\bf T}^{\prime}$ be the homogeneous and stationary Markov chain with kernel $Q^{\prime}$ and initial distribution $\mu$ . Let ${\bf P}^{+}$ be a GEM $(\theta)$ sequence independent of ${\bf T}^{\prime}$ .

Then, $\nu_{n}=\left\langle\frac{1}{n}\sum_{j=1}^{n}\delta_{T_{j}}(l):l\in\mathscr{X}\right\rangle\xrightarrow{d}\nu$ , where

[TABLE]

In a sense, reversing the procedure, starting from the stick-breaking process $\sum_{j\geq 1}P^{+}_{j}\delta_{T^{\prime}_{j}}$ , we may identify it as the limit of the occupation measure of a matched time-inhomogeneous Markov chain, almost a corollary of Theorem 2.12.

Theorem 2.13 (Stick-breaking measures to Occupation laws).

Let $\theta>0$ and let ${\bf P}^{+}$ be a GEM $(\theta)$ sequence. Let also $\tilde{Q}$ be a stochastic matrix without absorbing states and with stationary distribution $\mu$. Suppose ${\bf T}^{\prime}$ is an independent homogeneous Markov chain with kernel $\tilde{Q}$ starting from $\mu$.

Then,

[TABLE]

where $\nu\stackrel{d}{=}\lim_{n\rightarrow\infty}\nu_{n}$ is the occupation law defined with respect to an inhomogeneous Markov chain ${\bf T}$, as in the setting of Theorem 2.10, with respect to generator matrix $\tilde{G}^{\prime}$, starting from any distribution $\pi$ satisfying $\pi^{t}(\tilde{Q}^{\prime})^{n}\rightarrow\mu^{t}$ entry-wise. Here, $\tilde{G}^{\prime}$ and $\tilde{Q}^{\prime}$ are given by $\tilde{G}^{\prime}_{i,j}=\big(\mu_{j}/\mu_{i}\big)\tilde{G}_{j,i}{\mathbbm{1}}(\mu_{i}\neq 0)$ where $\tilde{G}=\theta(\tilde{Q}-I)$, and $\tilde{Q}^{\prime}=I+\tilde{G}^{\prime}/\theta$.

In the next two subsections, we discuss remarks on Theorems 2.10 and 2.12, and a case when the random measure $\nu$ is a Dirichlet process.

2.2.1. Remarks

We now make several comments on Theorems 2.10 and 2.12.

1. Although we have specified that $G$ has no zero rows in Theorems 2.10 and 2.12, and therefore no absorbing states for ${\bf T}$ , one can extend some of the statements trivially to the case when there are absorbing states. In particular, when the limit $\mu$ is the unit point mass at an absorbing state $z$ of $Q$ , we have $G^{\prime}_{z,z}=G_{z,z}=0$ and $K_{n}(z,z)=K_{G^{\prime}}(z,z)=1$ . Then, the state $z$ is also an absorbing state for the inhomogeneous Markov chain ${\bf T}$ , reached in finite time a.s. starting from $\pi$ . Also, the chain ${\bf T}^{\prime}$ , starting from $\mu$ , is the constant sequence of $z$ ’s. In addition, the limit of $Y_{n,1}$ is $z$ , and $P_{n,1}$ tends to $1$ a.s. We conclude that ${\bf P}_{n}$ converges weakly to ${\bf P}^{\prime}=\langle 1,0,\ldots\rangle$ , a GEM with constant fractions $1={\rm Beta}(1,0)$ . Moreover, the empirical distribution $\nu_{n}$ of the chain ${\bf T}$ converges weakly to $\delta_{z}$ . We also observe that $\sum_{j\geq 1}P^{\prime}_{j}\delta_{Y_{j}^{\prime}}$ , and also $\sum_{j\geq 1}P_{j}^{+}\delta_{T^{\prime}_{j}}$ both equal $\delta_{z}$ in distribution.

2. There is a degree of freedom in picking a pair $(\theta,Q)$ . However, when specifying a MCcGEM distribution, each valid pair corresponds to the same generator matrix $G$ in this context. On the other hand, this family of pairs $({\bf P}^{+},{\bf T}^{\prime})$ of a GEM distribution and Markov chain, indexed in $\theta$ , will have different joint distributions, although they all correspond to a single measure $\nu$ . We explore this notion in the case of Dirichlet processes in Subsection 2.2.2 below.

3. The convergence (2.9) is a condition on the structure of positive recurrent states of the homogeneous Markov chain ${\bf T^{Q}}$ run with kernel $Q=I+G/\theta$ . Since the limit $\mu$ is a stationary distribution with respect to $Q$ , the chain must have a positive recurrent state, and $\mu$ is positive only on such states. The initial distribution $\pi$ must be such that observation of a positive recurrent state occurs with probability 1.

In general, $\mu$ depends on $\pi$ when there is more than one irreducible class of positive recurrent states. We note, along with positive recurrent states, there may also be null recurrent and transient states associated with $Q$ .

In the case that $Q$ has a single class of positive recurrent states, then $\mu$ will be the unique stationary distribution associated with $Q$ and will not depend on $\pi$ .

It could be that $Q$ has an infinite number of null recurrent or transient states, in addition to positive recurrent states. But, the requirement that $\mu$ be stochastic means that the chain ${\bf T^{Q}}$ cannot visit a null recurrent state or remain indefinitely on transient states a.s. This reflects that the limit of $({\bf P_{n}},{\bf Y_{n}})$ corresponds to the long time average occupations of states in ${\mathscr{X}}$ .

4. Any null recurrent or transient state of the chain run with $Q$ corresponds to a zero row of $G^{\prime}$ or in other words an absorbing state for the chains ${\bf T}^{\prime}$ and ${\bf Y}^{\prime}$ . However, such absorbing states are never visited by ${\bf T}^{\prime}$ : The initial distribution $\mu$ is a stationary distribution of $Q$ , which vanishes on these states. Moreover, as $\mu$ is also a stationary distribution of $Q^{\prime}$ , the chain ${\bf T}^{\prime}$ can only move on the positive recurrent states of ${\bf T^{Q}}$ , the states $\{i\in\mathscr{X}:\mu_{i}>0\}$ .

Similarly, starting from $\mu$ , the chain ${\bf Y}^{\prime}$ moves only on states $\{i\in\mathscr{X}:\mu_{i}>0\}$ , given that $G^{\prime}_{w,z}=K_{G^{\prime}}(w,z)=0$ when either $\mu_{z}=0$ or $\mu_{w}=0$ and $w\neq z$ .

Also, we comment that the chain ${\bf T}^{\prime}$ run with $Q^{\prime}$ is a form of time-reversal of ${\bf T^{Q}}$ with respect to stationary distribution $\mu$ , reflecting the reverse chronological construction of the ${\bf Y}_{n}$ sequences.

2.2.2. Dirichlet process limits

In a particular case of Theorem 2.12, we observe that we may recover Dirichlet processes. Suppose $\mu_{i}>0$ for all $i\in{\mathscr{X}}$. When $Q$ has constant rows equal to $\mu^{t}$, the Markov chain ${\bf T^{\prime}}$ has transition kernel $Q^{\prime}=Q$, and therefore ${\bf T}^{\prime}$ is an iid sequence with common distribution $\mu$. Then, $\nu=\sum_{j\geq 1}P^{+}_{j}\delta_{T^{\prime}_{j}}$, formed from a GEM $(\theta)$ sequence ${\bf P}^{+}$ and an independent sequence of iid random variables ${\bf T}^{\prime}$, is the ‘stick-breaking’ representation of a Dirichlet process with parameters $\theta$ and measure $\mu$ on the discrete space $\mathscr{X}$ (cf. [45]). Specifically, as noted in the introduction, when $\mathscr{X}$ is finite we have that $\nu$ is a Dirichlet distribution with parameters $\{\theta\mu_{j}\}_{j\in\mathscr{X}}$ (cf. [12], [26]).
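As a rough illustration of the stick-breaking representation in the finite case, the following sketch samples $\nu=\sum_{j\geq 1}P^{+}_{j}\delta_{T^{\prime}_{j}}$ approximately. The assumptions are ours: states are labeled $0,1,2$, the values of `theta` and `mu` are arbitrary, sticks are broken only until the unassigned mass drops below a tolerance, and the function name is hypothetical. The Monte Carlo mean of $\nu$ should then be close to $\mu$, as expected for a Dirichlet vector with parameters $\theta\mu$.

```python
import random

def sample_stick_breaking_measure(theta, mu, rng, tol=1e-12):
    """One approximate draw of nu = sum_j P_j * delta_{T_j} on 0..len(mu)-1.

    P is GEM(theta) and T is iid with law mu; sticks are broken until the
    unassigned mass falls below `tol`, and the leftover mass is dropped.
    """
    nu = [0.0] * len(mu)
    states = range(len(mu))
    remaining = 1.0
    while remaining > tol:
        x = rng.betavariate(1.0, theta)              # X_j ~ Beta(1, theta)
        site = rng.choices(states, weights=mu)[0]    # T_j ~ mu
        nu[site] += x * remaining                    # add P_j at site T_j
        remaining *= 1.0 - x
    return nu

rng = random.Random(1)
theta, mu = 2.0, [0.2, 0.3, 0.5]   # hypothetical parameters
draws = [sample_stick_breaking_measure(theta, mu, rng) for _ in range(4000)]
means = [sum(d[i] for d in draws) / len(draws) for i in range(len(mu))]
```

Each draw is a probability vector up to the dropped tolerance, and `means` estimates ${\mathscr{E}}[\nu_{i}]=\mu_{i}$ with Monte Carlo error.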

Moreover, since the distribution of $\nu$ is determined by $G$ , there is a degree of freedom in specifying $G$ via a pair $(\theta,Q)$ . Write $G$ in two forms: (1) $G=\theta(Q-I)$ where $\theta>0$ and $Q$ is stochastic with constant rows $\mu^{t}$ , and also (2) $G=\tilde{\theta}(\tilde{Q}-I)$ where $\tilde{\theta}>0$ , $\theta\neq\tilde{\theta}$ , and $\tilde{Q}$ is stochastic. Then again, $\tilde{Q}=\tilde{Q}^{\prime}$ and via Theorem 2.12, we recover a different stick-breaking representation, $\sum_{j=1}^{\infty}P_{j}^{\tilde{\theta}}\delta_{T^{\tilde{Q}}_{j}}$ , of the Dirichlet process with parameters $\theta$ and $\mu$ , in terms of GEM $(\tilde{\theta})$ sequence $\mathbf{P}^{\tilde{\theta}}$ and an independent homogeneous Markov chain $\mathbf{T^{\tilde{Q}}}$ with $T^{\tilde{Q}}_{1}\sim\mu$ and kernel $\tilde{Q}$ .

Here, $\tilde{Q}=\frac{\theta}{\tilde{\theta}}Q+(1-\frac{\theta}{\tilde{\theta}})I$ is the weighted average of $Q$ and $I$ . Since $\tilde{Q}$ no longer has constant rows, $\mathbf{T^{\tilde{Q}}}$ no longer consists of iid variables. The chain ${\bf T^{\tilde{Q}}}$ is, in a sense, a more or less ‘sticky’ version of an iid $\sim\mu$ sequence depending on the weight of $I$ in the weighted average relation for $\tilde{Q}$ .
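The weighted-average relation can be verified exactly on a small example. In this sketch the measure `mu` and the values of `theta` and `theta_tilde` are hypothetical; note that $\tilde{Q}$ has non-negative entries only when $\tilde{\theta}\geq\theta(1-\mu_{i})$ for every $i$, which holds for the values chosen here.

```python
from fractions import Fraction

def sticky_kernel(mu, theta, theta_tilde):
    """Q_tilde = w * Q + (1 - w) * I with w = theta / theta_tilde,
    where every row of Q equals mu."""
    w = Fraction(theta) / Fraction(theta_tilde)
    n = len(mu)
    return [[w * mu[j] + (1 - w) * (1 if i == j else 0) for j in range(n)]
            for i in range(n)]

mu = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
theta, theta_tilde = 2, 5
Q_tilde = sticky_kernel(mu, theta, theta_tilde)
n = len(mu)

# Both pairs (theta, Q) and (theta_tilde, Q_tilde) give the same generator.
same_generator = all(
    theta_tilde * (Q_tilde[i][j] - (1 if i == j else 0))
    == theta * (mu[j] - (1 if i == j else 0))
    for i in range(n) for j in range(n))

# mu remains stationary for the sticky kernel.
stationary = all(
    sum(mu[i] * Q_tilde[i][j] for i in range(n)) == mu[j] for j in range(n))
```

With $\tilde{\theta}>\theta$ the diagonal of $\tilde{Q}$ is inflated, so the chain repeats its current state more often than an iid sequence would, while the finer GEM $(\tilde{\theta})$ weights compensate.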

2.3. Self-similarity of the occupation laws

At this point, it is natural to ask for other ways to understand the laws in Theorem 2.12. Consider the general random measure

[TABLE]

where ${\bf P}$ is a self-similar RAM composed of fractions ${\bf X}$ , and ${\bf T}$ is an independent homogeneous Markov chain with transition kernel $Q$ and initial distribution $\mu$ , assigning zero probability to any transient state of $Q$ . We remark that $\nu$ reduces to the measure in Theorem 2.12 when ${\bf P}\sim$ GEM $(\theta)$ and $\mu$ is a stationary vector of $Q$ . We first discuss an example.

Example 2.14.

As we have noted earlier, if ${\bf P}\sim$ GEM $(\theta)$ and ${\bf T}$ is an independent sequence of iid variables with distribution $\mu$ , the measure $\nu$ is the ‘stick-breaking’ representation of the Dirichlet process with parameters $\theta$ and measure $\mu$ on $\mathscr{X}$ . Following [45], a self-similarity relation can be deduced:

[TABLE]

where $\tilde{\nu}\stackrel{{\scriptstyle d}}{{=}}\nu$ is another random measure, and $X_{1}\sim{\rm Beta}(1,\theta)$ , $T_{1}\sim\mu$ and $\tilde{\nu}$ are independent. From such an equation, the Dirichlet process characterization of $\nu$ with parameters $\theta$ and measure $\mu$ on $\mathscr{X}$ follows from classical considerations. Moreover, this relation is central in calculation of a posterior distribution, given say $X_{1}$ , when $\nu$ is thought of as a law on priors. See also the recent work [32] and [44] on related integral characterizations.

We now define a more general notion of self-similarity. This notion is well known (cf. [23] among other references). With respect to a measurable space $(\mathscr{A},\mathscr{B}_{\mathscr{A}})$ , let $\mathbb{P}_{\mathscr{A}}$ be the space of probability measures on $(\mathscr{A},\mathscr{B}_{\mathscr{A}})$ . Let $\mathbb{F}_{\mathscr{A}}$ be the smallest $\sigma$ -field generated by sets of the form $\Big{\{}\{\chi:\chi(A)<r\}:A\in\mathscr{B}_{\mathscr{A}},r\in[0,1]\Big{\}}$ .

Definition 2.15 (Self-similar random measure).

We say that the law of a random distribution $\chi$ on $(\mathbb{P}_{\mathscr{A}},\mathbb{F}_{\mathscr{A}})$ is self-similar with respect to $(\eta,X)$ if it satisfies

[TABLE]

where $X$ is a $[0,1]$-valued random variable, $\eta$ is a random distribution on $\mathbb{P}_{\mathscr{A}}$, and $\tilde{\chi}$ is a random measure with the same distribution as $\chi$ and independent of $(\eta,X)$, defined on the space $[0,1]\times\mathbb{P}_{\mathscr{A}}\times\mathbb{P}_{\mathscr{A}}$.

The key is that such self-similarity may uniquely identify a distribution. The following is part of Lemma 3.3 in [45]; see also [23] for more involved statements. For the convenience of the reader, a proof is given in Subsection 3.6.

Lemma 2.16.

There exists a unique in law self-similar random measure $\chi$ on $(\mathbb{P}_{\mathscr{A}},\mathbb{F}_{\mathscr{A}})$ with respect to $(\eta,X)$ when $\mathscr{P}(X=0)<1$ .

We now state that $\nu$ defined in (2.13) is self-similar in a certain way. Let ${\bf X}=\{X_{j}\}_{j\geq 1}$ be the iid fractions from which ${\bf P}$ is constructed. For each recurrent state $i$ of $Q$ , let ${\bf T}^{i}$ be a Markov chain with transition kernel $Q$ and initial value $T^{i}_{1}=i$ , independent of ${\bf X}$ and $(\nu,T_{1})$ . Define the finite cycle length and associated clumped residual fraction,

[TABLE]

Set

[TABLE]

Theorem 2.17 (Type of self-similarity).

The law of $(\nu,T_{1})$ uniquely satisfies the following: Marginally, $T_{1}\sim\mu$ and, for each recurrent state $i$ of $Q$ ,

[TABLE]

where $\tilde{\nu}^{i}$ is a random measure with the same law as $\nu^{i}$, such that $\tilde{\nu}^{i}$ and $(\eta^{i},X^{i})$ are independent.

If $\nu$ is thought of as a distribution on priors, the notion of a posterior distribution given a cycle of data $X^{i}$ might be considered from the self-similarity (2.15). However, we remark that such a computation does not seem as tractable as in the case when $\nu$ is a Dirichlet process (cf. [45]).

One might ask what happens when starting from a transient state $T_{1}=i$. In this case, there is a positive chance that one will not return to $T_{1}$. As above, one may write down a first ‘cycle’ decomposition but, because $W^{i}$ may not be finite, the decomposition does not immediately lead to a ‘self-similarity’ equation as in (2.15). However, one might consider a stick-breaking construction, on a different probability space, which does lead to a ‘self-similarity’ equation. Indeed, following the discussion after Theorem 2.4, consider for transient states $T_{1}=i$ an iid sequence of pairs $\{(Z_{j},\eta_{j})\}_{j\geq 1}$ with common distribution $(X^{i},\eta^{i})\bigr|T_{1}=i$, and form a stick-breaking construction, which after an exercise is seen to be equivalent in distribution to $\nu^{i}$:

[TABLE]

Then, $\nu^{i}\stackrel{{\scriptstyle d}}{{=}}Z_{1}\eta_{1}+(1-Z_{1})\tilde{\nu}^{i}$ where $\tilde{\nu}^{i}\stackrel{{\scriptstyle d}}{{=}}\nu^{i}$ , and $\tilde{\nu}^{i}$ and $(\eta_{1},Z_{1})$ are independent.

2.4. Moments of the occupation laws

We first recall Theorems 1.3 and 1.4 in [11]: Suppose $G$ is a generator matrix on finite state space $\mathscr{X}=\{1,2,...,k\}$ with no 0 entries. By identification through its moments, the limiting occupation random variable $\nu=\langle\nu_{1},...,\nu_{k}\rangle$ (cf. (2.12)) of an inhomogeneous Markov chain $\mathbf{T}$ with kernels of the form $K_{n}=I+\frac{G}{n}1(n>M)$ was found: Let ${\mathbb{N}}_{0}=\{0,1,2,\ldots\}$ be the whole numbers. For ${\bf m}=(m_{1},\ldots,m_{k})\in{\mathbb{N}}_{0}^{k}$ and $N=\sum_{j=1}^{k}m_{j}>0$ , we have

[TABLE]

where $\mu$ is the unique stochastic eigenvector of $G$ , and $\mathbb{S}({\bf m})$ is the set of ${N\choose m_{1},...,m_{k}}$ distinct permutations of the list of $N$ integers consisting of $m_{1}$ many 1’s, $m_{2}$ many 2’s, up through $m_{k}$ many $k$ ’s.

In particular, when $G$ can be written $\theta(Q-I)$ where $\theta>0$ and $Q$ is the stochastic matrix with constant rows $\mu$ , the expectation reduces to the moments of the Dirichlet distribution with parameters $\theta\mu$ : ${\mathscr{E}}\left[\prod_{j=1}^{k}\nu_{j}^{m_{j}}\right]=((\theta)_{N})^{-1}\prod_{j=1}^{k}(\theta\mu_{j})_{m_{j}}$ where $(a)_{n}={\Gamma(a+n)}/{\Gamma(a)}$ is the Pochhammer symbol, that is a rising factorial. However, when $G$ is not of this form and $k\geq 3$ , one can see that the moments may not describe a Dirichlet distribution.

In this context, we detail now some more descriptions of these laws. Observe that the matrix $\big{(}I-G/j\big{)}^{-1}$ for $j\geq 1$ is a resolvent operator with respect to the transition function $\{{\mathscr{P}}^{G}_{s}:s\geq 0\}$ of a continuous time Markov chain with generator $G$ . In particular, it is standard to write

[TABLE]

As a consequence, $\widetilde{K}_{j}:=\big{(}I-G/j\big{)}^{-1}$ itself is a stochastic kernel on ${\mathscr{X}}$ .

In the Dirichlet case $G=\theta(Q-I)$ , where $\theta>0$ and each row of $Q$ is the stationary measure $\mu$ , one can see by calculating via the backward equation $\frac{d}{ds}{\mathscr{P}}^{G}_{s}=G{\mathscr{P}}^{G}_{s}$ and ${\mathscr{P}}^{G}_{0}=I$ that

[TABLE]

More generally, let ${\bf Z}=\{Z_{k}\}_{k\geq 1}$ be the inhomogeneous Markov chain with initial distribution $\mu$ and transition kernels $\widetilde{K}_{n}$ for $n\geq 1$ .

We first observe a type of ‘duality’ relation between the moments of $\nu$ and ${\bf Z}$ .

Theorem 2.18 (Recasting moments I).

Recall the setting of [11] given above. Then $\widetilde{K}_{n}=K_{n}+O(n^{-2})$ and the measure $\nu$ with respect to ${\bf T}$ is also the occupation law with respect to ${\bf Z}$ ,

[TABLE]

Moreover, the moments may be expressed in terms of ${\bf Z}$ ,

[TABLE]

and, in particular, ${\mathscr{E}}\left[\nu_{i}^{N}\right]=P(Z_{1}=...=Z_{N}=i)$ .

Alternatively, we now recast the moment result (2.16) in an algebraic form where it can be more easily exploited. Let $p_{min}(\lambda)$ be the minimal polynomial of $G$ and $q(\lambda)=\sum_{k=0}^{n}a_{k}\lambda^{k}$ be the polynomial such that $p_{min}(\lambda)=\lambda q(\lambda)$ . Define, for $j\geq 0$ ,

[TABLE]

Theorem 2.19 (Recasting moments II).

We have $p_{0}(G)/q(0)$ is the matrix with constant rows $\mu$ , and $p_{j}(G)/q(j)=\widetilde{K}_{j}$ for $j\geq 1$ . As a consequence, for ${\bf m}\in\mathbb{N}_{0}^{k}$ with $\sum_{i=1}^{k}m_{i}=N>0$ and fixed constant $\sigma_{0}=1$ ,

[TABLE]

One can now recover the moments of the marginals.

Corollary 2.20 (Marginals).

Let $\{\lambda_{l}\}_{l=1}^{n}$ be the non-zero roots of $q$ , all of which are non-zero eigenvalues of $G$ . Let also $\{\gamma_{i,l}\}_{l=1}^{n}$ be the zeros of $[p_{j}(G)]_{ii}$ considered as a function of $j$ . Then,

[TABLE]

Interestingly, when $\Lambda=\{\lambda_{l}\}_{l=1}^{n}$ and $\Gamma_{i}=\{\gamma_{i,l}\}_{l=1}^{n}$ are real and pairwise ordered $\lambda_{j}<\gamma_{i,j}<0$ , we recognize these marginal moments as the product of the $N$ th order moments of independent Beta $(-\gamma_{i,l},\gamma_{i,l}-\lambda_{l})$ variables for $1\leq l\leq n$ .

In the Dirichlet case, when $G$ is of the form $\theta(Q-I)$ where $\theta>0$ and $Q$ is stochastic with constant rows $\mu$ , we have $q(j)=j+\theta$ and $p_{j}(\lambda)=\lambda+j+\theta$ . This corresponds to $[p_{j}(G)]_{ii}=j+\theta\mu_{i}$ and ${\mathscr{E}}[\nu_{i}^{N}]=\frac{(\theta\mu_{i})_{N}}{(\theta)_{N}}$ , the $N$ th order moments of a Beta $(\theta\mu_{i},\theta(1-\mu_{i}))$ variable or equivalently the $i$ th marginal of a Dirichlet variable with parameters $\theta\mu$ .
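The Dirichlet-case formulas can be checked with exact rational arithmetic. In the sketch below, the measure `mu` and the value of `theta` are hypothetical. It verifies that $p_{j}(G)/q(j)=(G+(j+\theta)I)/(j+\theta)$ inverts $I-G/j$ and is stochastic, and that the marginal moment $(\theta\mu_{i})_{N}/(\theta)_{N}$ equals $\mu_{i}\prod_{j=1}^{N-1}\widetilde{K}_{j}(i,i)$. The last product is our reading of $P(Z_{1}=\cdots=Z_{N}=i)$, assuming the step from $Z_{j}$ to $Z_{j+1}$ uses $\widetilde{K}_{j}$.

```python
from fractions import Fraction

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dirichlet_generator(mu, theta):
    """G = theta * (Q - I), where every row of Q equals mu."""
    n = len(mu)
    return [[theta * (mu[j] - (1 if i == j else 0)) for j in range(n)]
            for i in range(n)]

def resolvent_candidate(G, theta, j):
    """p_j(G) / q(j) = (G + (j + theta) I) / (j + theta)."""
    n = len(G)
    return [[(G[a][b] + (j + theta) * (1 if a == b else 0)) / Fraction(j + theta)
             for b in range(n)] for a in range(n)]

def pochhammer(a, n):
    """Rising factorial (a)_n = a (a + 1) ... (a + n - 1)."""
    out = Fraction(1)
    for k in range(n):
        out *= a + k
    return out

def marginal_moment(mu_i, theta, N):
    """E[nu_i^N] in the Dirichlet case."""
    return pochhammer(theta * mu_i, N) / pochhammer(theta, N)

mu = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]   # hypothetical measure
theta = Fraction(2)
G = dirichlet_generator(mu, theta)
n = len(mu)
identity = [[Fraction(int(a == b)) for b in range(n)] for a in range(n)]

inverse_ok, stochastic_ok = True, True
for j in range(1, 6):
    K = resolvent_candidate(G, theta, j)
    I_minus = [[identity[a][b] - G[a][b] / j for b in range(n)] for a in range(n)]
    inverse_ok = inverse_ok and matmul(I_minus, K) == identity
    stochastic_ok = stochastic_ok and all(sum(row) == 1 for row in K)

# Moment of order N = 4 for state 0, computed two ways.
N, i = 4, 0
chain_product = mu[i]
for j in range(1, N):
    chain_product *= resolvent_candidate(G, theta, j)[i][i]
moments_agree = chain_product == marginal_moment(mu[i], theta, N)
```

The diagonal entry of the candidate resolvent is $(j+\theta\mu_{i})/(j+\theta)$, which is why the product telescopes into the ratio of rising factorials.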

However, in general, $\Lambda$ and $\Gamma_{i}$ need not be sets of real numbers, and (2.21) gives the moments of beta products in a sense with complex parameters.

The marginal density function $f_{i}$ of $\nu_{i}$ can be written in terms of Meijer G-functions, typically denoted $G^{M,N}_{P,Q}\left(\left.\begin{array}[]{c}\vec{a}\\ \vec{b}\end{array}\right|z\right)$ where $M\leq Q$ and $N\leq P$ are non-negative integers, $\vec{a}\in\mathbb{C}^{P}$ , and $\vec{b}\in\mathbb{C}^{Q}$ . Given $\Lambda$ and $\Gamma_{i}$ , $f_{i}$ is given by

[TABLE]

The class of Meijer G-functions includes generalized hypergeometric functions, among others. For a thorough review of Meijer G-functions, their specification, and connection to Beta products via the Mellin transform, see [34]. See also [13] for a discussion of the distributional properties of the product of two Beta variables with complex parameters with an application to risk theory.

3. Proofs

We first note a standard algebraic identity which leads to useful formulas for RAMs. Recall our conventions specified at the beginning of section 2.

Lemma 3.1.

For any sequence of numbers $a_{j}$ and integer $k\geq 1$ , we have

[TABLE]

Proof. We proceed by induction. Equation (3.1) is trivially true for $k=1$: $(1-a_{1})+a_{1}=1$. If it is true for $k-1$, then the left-hand side of (3.1) equals

[TABLE]

Proposition 3.2.

Consider a distribution ${\bf P}=\langle P_{j}:j\geq 1\rangle$ on ${\mathbb{N}}$ and factors ${\bf X}=\{X_{j}\}_{j\geq 1}$ with

[TABLE]

Then, $P_{j}=X_{j}\prod_{i=1}^{j-1}(1-X_{i})$ for $j\geq 1$ .

In particular, if ${\bf P}$ is a RAM constructed from ${\bf X}=\{X_{j}\}_{j\geq 1}$ , for $1\leq k\leq r$ , we have

[TABLE]

Proof.

Part (I) follows from (3.1) by an induction: Trivially, $P_{1}=X_{1}$ . Suppose $P_{k}=X_{k}\prod_{i=1}^{k-1}(1-X_{i})$ for $k\leq j$ and so, by (3.1), we have $\prod_{k=1}^{j}(1-X_{k})=1-\sum_{k=1}^{j}P_{k}$ . Then, $P_{j+1}=X_{j+1}\left(1-\sum_{k=1}^{j}P_{k}\right)=X_{j+1}\prod_{k=1}^{j}(1-X_{k})$ .

For Part (II), the lines in (3.2) follow from Part (I) and (3.1). ∎
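The two directions of Proposition 3.2 can be illustrated on a finite list of fractions. In this sketch the fractions `X` are arbitrary hypothetical values and exact rationals are used; it checks that the weights and the leftover product $\prod_{j}(1-X_{j})$ add up to $1$, as (3.1) is used in the proof, and that the fractions are recovered from the weights through $P_{j}=X_{j}\big(1-\sum_{i<j}P_{i}\big)$.

```python
from fractions import Fraction

def stick_breaking(fractions):
    """P_j = X_j * prod_{i<j} (1 - X_i) for a finite list of fractions X.

    Also returns the leftover mass prod_j (1 - X_j).
    """
    weights, remaining = [], Fraction(1)
    for x in fractions:
        weights.append(x * remaining)
        remaining *= 1 - x
    return weights, remaining

def recover_fractions(weights):
    """Invert the map: X_j = P_j / (1 - sum_{i<j} P_i)."""
    out, used = [], Fraction(0)
    for p in weights:
        out.append(p / (1 - used))   # assumes the residual mass is positive
        used += p
    return out

X = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4), Fraction(2, 5)]
P, residual = stick_breaking(X)
total = sum(P) + residual
```

Here `total` equals $1$ exactly, and `recover_fractions(P)` returns the original list `X`.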

3.1. Proof of Theorem 2.4: Clumped RAMs

Let ${\bf P}$ be a RAM, and let ${\bf X}=\{X_{j}\}_{j\geq 1}$ be the independent proportions from which ${\bf P}$ is constructed. From Proposition 3.2, for $j\geq 1$ , we have $P_{j}=X_{j}\prod_{i=1}^{j-1}(1-X_{i})$ .

Let ${\bf u}=\{u_{j}\}_{j\geq 1}$ be an increasing sequence in ${\mathbb{N}}\cup\{\infty\}$ with $u_{1}=1$ and $\lim_{j\rightarrow\infty}u_{j}=\infty$ . Define new proportions ${\bf X^{u}}=\{X^{u}_{j}\}_{j\geq 1}$ from ${\bf X}$ , using Proposition 3.2 again: For $j\geq 1$ ,

[TABLE]

Recall, for $j\geq 1$ , that $P^{u}_{j}=\sum_{i=u_{j}}^{u_{j+1}-1}P_{i}$ when $u_{j}<\infty$ and $P^{u}_{j}=0$ otherwise, and ${\bf P^{u}}=\{P^{u}_{j}\}_{j\geq 1}$ .

We now proceed to the proofs of Parts (1)-(4).

3.1.1. Proof of Part (1)

We now verify that ${\bf P^{u}}$ is a RAM with respect to fractions ${\bf X^{u}}$ : Let $B=\sup\{j:u_{j}<\infty\}$ . For $1\leq j\leq B$ , noting (3.6), write

[TABLE]

For $j>B$ , note $P_{j}^{u}=0$ and $\prod_{i=1}^{B}(1-X_{i}^{u})=1-\sum_{i=1}^{B}P_{i}^{u}=1-\sum_{i=1}^{\infty}P_{i}=0$ . Then, $X_{j}^{u}\prod_{i=1}^{j-1}(1-X_{i}^{u})=0=P_{j}^{u}$ .

Since ${\bf X}=\{X_{j}\}_{j\geq 1}$ is composed of independent variables, so is ${\bf X^{u}}=\{X^{u}_{j}\}_{j\geq 1}$ . Hence, as $\sum_{j\geq 1}P^{u}_{j}=\sum_{j\geq 1}P_{j}\stackrel{{\scriptstyle a.s.}}{{=}}1$ , by definition, ${\bf P^{u}}$ is a RAM constructed from independent proportions ${\bf X^{u}}$ . ∎
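Part (1) can be checked by hand on a truncated example. In the sketch below, the fractions `X` and the cut points `u` are hypothetical, with `u` given as 1-based indices and its last entry marking the end of the truncation. The clumped fractions are defined through $1-X^{u}_{j}=\prod_{k=u_{j}}^{u_{j+1}-1}(1-X_{k})$, and stick-breaking with them reproduces the clumped weights $P^{u}_{j}=\sum_{i=u_{j}}^{u_{j+1}-1}P_{i}$.

```python
from fractions import Fraction

def stick_breaking(fractions):
    """P_j = X_j * prod_{i<j} (1 - X_i) for a finite list of fractions."""
    weights, remaining = [], Fraction(1)
    for x in fractions:
        weights.append(x * remaining)
        remaining *= 1 - x
    return weights

def clump_fractions(X, u):
    """1 - X^u_j = prod_{k = u_j}^{u_{j+1} - 1} (1 - X_k), 1-based cut points u."""
    out = []
    for start, stop in zip(u, u[1:]):
        keep = Fraction(1)
        for k in range(start, stop):
            keep *= 1 - X[k - 1]
        out.append(1 - keep)
    return out

def clump_weights(P, u):
    """P^u_j = sum of P_i over u_j <= i < u_{j+1}."""
    return [sum(P[start - 1:stop - 1]) for start, stop in zip(u, u[1:])]

X = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4), Fraction(2, 5), Fraction(1, 6)]
u = [1, 3, 4, 6]                      # clumps {1, 2}, {3}, {4, 5}
P = stick_breaking(X)
Pu_direct = clump_weights(P, u)       # add the original weights
Xu = clump_fractions(X, u)
Pu_from_fractions = stick_breaking(Xu)  # stick-break with the clumped fractions
```

Both routes give the same clumped weights, and each clumped fraction depends on a disjoint block of the original fractions, which is the source of the independence used in the proof.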

3.1.2. Proof of Part (2)

Let ${\bf y}=\{y_{i}\}_{i\geq 1}$ be a possible sequence for ${\bf Y}$ in $\mathscr{X}$. Define $J=\inf\{j\geq 1:y_{j}=y_{j+1}\}$. Then, ${\bf y}$ is either non-repeating and $J=\infty$, or ${\bf y}$ is non-repeating until reaching a finite time $J$, after which the sequence is constant.

For $1\leq n<J$ , the event that $Y_{i}=y_{i}$ for $1\leq i\leq n$ means the chain ${\bf T}$ starts in $y_{1}$ , staying there until time $V_{2}$ , when it switches to $y_{2}$ , remaining there until time $V_{3}$ , and so on up to time $V_{n}$ when it moves into $y_{n}$ . Write for $n<J$ that

[TABLE]

Suppose $J<\infty$ . Then, $y_{J}$ is an absorbing state of ${\bf T}$ and, for $i\geq J$ , we have $Q_{y_{i},y_{i}}=K(y_{i},y_{i})=K(y_{i},y_{i+1})=1$ . Define $l_{J}=\infty$ and write for $n\geq J$ that

[TABLE]

We conclude therefore that ${\bf Y}$ is a Markov chain with kernel $K$ . ∎

3.1.3. Proof of Part (3)

Recall the definitions of the increasing random sequences ${\bf V}$ and ${\bf W}$ with $V_{1}=W_{1}=1$ (cf. (2.3)), and ${\bf P^{V}}$ and ${\bf P^{W}}$. For each realization, ${\bf V}$ and ${\bf W}$ are functions of the Markov sequence ${\bf T}$. Therefore, conditional on ${\bf T}={\bf t}$ for a possible trajectory ${\bf t}$ of ${\bf T}$, the sequences ${\bf V}$ and ${\bf W}$ are deterministic, and it follows immediately from Part (1), proved above, that ${\bf P^{V}}\bigr{|}{\bf T}={\bf t}$ and ${\bf P^{W}}\bigr{|}{\bf T}={\bf t}$ are RAMs. ∎

3.1.4. Proof of Part (4)

If ${\bf P}$ is a RAM, we have $\sum_{i\geq 1}P^{V}_{i}=\sum_{i\geq 1}P_{i}=1$ a.s. or $\sum_{i\geq 1}P^{W}_{i}=1$ a.s. respectively. Hence, in the two situations, we need only show the associated fractions ${\bf X^{V}}$ or ${\bf X^{W}}$ are conditionally independent or iid to deduce, respectively, that ${\bf P^{V}}\bigr{|}{\bf Y}={\bf y}$ is a RAM or ${\bf P^{W}}\bigr{|}T_{1}=t_{1}$ is a self-similar RAM. We consider first the claim for ${\bf P^{V}}$ , before discussing the statement for ${\bf P^{W}}$ at the end.

Let ${\bf y}$ be a possible sequence with respect to ${\bf Y}$ , and associate to ${\bf y}$ the time $J$ as in the proof of part (2). With respect to fixed times $V_{i+1}-V_{i}=l_{i}\in{\mathbb{N}}$ for $1\leq i<J$ , noting (3.7), we have for $m\leq n<J$ that

[TABLE]

Suppose $J<\infty$ , and define $l_{J}=\infty$ . For $n\geq J$ , noting the calculation after (3.7), write

[TABLE]

Recall (3.6), and consider the variables ${\bf X^{V}}=\{X^{V}_{j}\}_{j\geq 1}$ where

[TABLE]

When ${\bf X}$ is composed of iid variables, that is ${\bf P}$ is a self-similar RAM, we will argue now that the fractions ${\bf X^{V}}\bigr{|}{\bf Y}={\bf y}$ form a conditionally independent sequence, and therefore ${\bf P^{V}}\bigr{|}{\bf Y}={\bf y}$ is a RAM. We split into subcases, $J=\infty$ versus $J<\infty$.

When $J=\infty$ , let $r\geq n\geq 1$ , and $\langle\alpha_{i}:1\leq i\leq n\rangle\in(0,1)^{n}$ . Write

[TABLE]

Relative to $\{l_{j}\}_{j=1}^{n}$ , define the sequence ${\bf u}=\{u_{j}\}_{j=1}^{n+1}$ where $u_{1}=1$ and $u_{j}=1+\sum_{k=1}^{j-1}l_{k}$ for $2\leq j\leq n+1$ , which marks the first $n$ times when ${\bf Y}$ changes states. In particular, on the event $\big{\{}V_{i+1}-V_{i}=l_{i},1\leq i\leq n\big{\}}$ , we have $V_{j}=u_{j}$ for $1\leq j\leq n+1$ . Given this event, from (3.6), the fractions $\{X^{V}_{j}\}_{j=1}^{n}$ satisfy $1-X^{V}_{j}=\prod_{k=u_{j}}^{u_{j+1}-1}(1-X_{k})$ for $1\leq j\leq n$ and are independent, no longer depending on ${\bf Y}$ . The last display (3.13), noting (3.1.4), equals

[TABLE]

in factored form. Therefore, the fractions ${\bf X^{V}}$ are conditionally independent as desired and ${\bf P^{V}}\bigr{|}{\bf Y}={\bf y}$ is a RAM in the case $J=\infty$ .

When $J<\infty$ , note that the collection $\{X^{V}_{j}|{\bf Y=y}\}_{j\geq J}$ is a deterministic sequence of 1s. Thus, we need only show that the proportions $\{X^{V}_{j}|{\bf Y=y}\}_{j=1}^{J-1}$ are independent. Define $l_{J}=\infty$ and for $n\geq J$ , write that

[TABLE]

Define for $j\leq J+1$ the sequence $u_{j}$ as before, and note $u_{J+1}=\infty$ . One derives similarly, noting the calculation after (3.1.4), that the last display (3.15) equals

[TABLE]

in factored form. Therefore, the fractions ${\bf X^{V}}$ are conditionally independent as desired and ${\bf P^{V}}\bigr{|}{\bf Y}={\bf y}$ is a RAM also in the case $J<\infty$ .

We now aim to show when ${\bf P}$ is a self-similar RAM and $t_{1}$ is a recurrent state for ${\bf T}$ that ${\bf P^{W}}|T_{1}=t_{1}$ is a self-similar RAM. As $t_{1}$ is a recurrent state with respect to ${\bf T}$ , almost surely the sequence ${\bf W}$ does not take on the value $\infty$ . Consider the variables ${\bf X^{W}}=\{X^{W}_{j}\}_{j\geq 1}$ where

[TABLE]

Then, noting (3.6), almost surely, $X^{W}_{j}=\sum_{i=W_{j}}^{W_{j+1}-1}X_{i}\prod_{l=W_{j}}^{i-1}(1-X_{l})$ .

Following the above argument, with respect to ${\bf X^{V}}$ when $J=\infty$ , we arrive at the equation

[TABLE]

But, given $T_{1}=t_{1}$ , the variables $\{W_{j+1}-W_{j}\}_{j\geq 1}$ are iid cycle lengths of the Markov chain. Hence, the last display equals

[TABLE]

indicating the fractions ${\bf X^{W}}$ are conditionally iid, and therefore ${\bf P^{W}}\bigr{|}T_{1}=t_{1}$ is a self-similar RAM. ∎

3.2. Proof of Theorem 2.7: GEM to MCcGEM

Let ${\bf P}=\langle P_{i}:i\geq 1\rangle$ be a GEM $(\theta)$ sequence, with respect to corresponding iid Beta $(1,\theta)$ proportions ${\bf X}=\{X_{j}\}_{j\geq 1}$ . Also, let ${\bf T}$ be an independent Markov chain on $\mathscr{X}$ , starting from distribution $\mu$ , with homogeneous kernel $Q$ .

In Part (2) of Theorem 2.4, we showed that the associated sequence ${\bf Y}$ is a Markov chain with transition kernel $K$ on ${\mathscr{X}}$ such that

[TABLE]

By inspection, the kernel $K=K_{G}$ , in the definition of the MCcGEM distribution (2.7), where $G=\theta(Q-I)$ .

Recall now the switch times ${\bf V}$ with respect to the chain ${\bf T}$ (cf. (2.3)). In Part (4) of Theorem 2.4, as ${\bf P}$ is a self-similar RAM, we proved that ${\bf P^{V}}$, conditional on ${\bf Y}$, is a RAM. In particular, we showed that the associated fractions ${\bf X^{V}}=\{X^{V}_{j}\}_{j\geq 1}$, given ${\bf Y}$, are independent variables. Hence, to identify the joint distribution of $({\bf P^{V}},{\bf Y})$, we need only find the conditional distribution of each fraction $X^{V}_{j}\bigr{|}{\bf Y}$, for $j\geq 1$.

To this end, let ${\bf y}$ be a possible sequence for ${\bf Y}$ . Associate $J=\sup\{j:y_{j}\neq y_{j-1}\}$ to ${\bf y}$ as before. Recall from (3.12) that $X^{V}_{j}=1-\prod_{k=V_{j}}^{V_{j+1}-1}(1-X_{k})$ for $j\leq J$ . Write, for $j<\min\{J,n\}$ and $m\geq 1$ ,

[TABLE]

Note now, if $Z$ is a Beta$(1,\alpha)$ random variable, then $E[(1-Z)^{m}]=\frac{\alpha}{\alpha+m}$. Then, by the independence of ${\bf X}$ and ${\bf T}$, noting from (3.1.4) that ${\mathscr{P}}(V_{j+1}-V_{j}=\ell|Y_{i}=y_{i}:1\leq i\leq n)=Q_{y_{j},y_{j}}^{\ell-1}(1-Q_{y_{j},y_{j}})$, the above display equals

[TABLE]

Thus, we see that $X^{V}_{j}\biggr{|}{\bf Y}={\bf y}$ is a Beta $(1,\theta(1-Q_{y_{j},y_{j}}))$ random variable when $j<J$ .

When $J\leq j<\infty$ , recall that $y_{j}$ is an absorbing state, and so $Q_{y_{j},y_{j}}=1$ and $X_{j}^{V}=1$ . Thus $X_{j}^{V}|{\bf Y=y}\sim$ Beta $(1,0)=$ Beta $(1,\theta(1-Q_{y_{j},y_{j}}))$ .

Then, for all $j\geq 1$ , we see that $X^{V}_{j}\biggr{|}{\bf Y}={\bf y}$ is a Beta $(1,\theta(1-Q_{y_{j},y_{j}}))$ random variable. Hence ${\bf P^{V}}\bigr{|}{\bf Y}={\bf y}$ is a disordered GEM with parameters $\theta(1-Q_{y_{j},y_{j}})=-G_{y_{j},y_{j}}$ for $j\geq 1$ . Therefore, we conclude that $({\bf P^{V}},{\bf Y})$ has a MCcGEM $(\theta(Q-I))$ distribution with respect to $\mu$ . ∎
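The identification of the Beta$(1,\theta(1-Q_{y_{j},y_{j}}))$ law rests on a geometric sum: a sojourn of length $\ell$ has probability $q^{\ell-1}(1-q)$ with $q=Q_{y_{j},y_{j}}$, and on that event $(1-X^{V}_{j})^{m}$ has mean $\big(\frac{\theta}{\theta+m}\big)^{\ell}$. The Python sketch below, whose function names are illustrative assumptions, sums this series numerically and compares it with the $m$-th moment $\frac{\alpha}{\alpha+m}$ of $1-Z$ for $Z\sim$ Beta$(1,\alpha)$ with $\alpha=\theta(1-q)$. It is a check under these stated assumptions, not a substitute for the computation in the proof.

```python
def clumped_moment_series(theta, q, m, terms=5000):
    """Sum_{l>=1} q^(l-1) (1-q) * (theta/(theta+m))^l, truncated at `terms`."""
    r = theta / (theta + m)      # E[(1 - Z)^m] for Z ~ Beta(1, theta)
    term = r * (1.0 - q)         # l = 1 summand
    total = 0.0
    for _ in range(terms):
        total += term
        term *= r * q            # move from sojourn length l to l + 1
    return total


def beta_one_alpha_moment(alpha, m):
    """E[(1 - Z)^m] for Z ~ Beta(1, alpha)."""
    return alpha / (alpha + m)


if __name__ == "__main__":
    theta, q = 2.5, 0.4
    for m in range(1, 4):
        print(m, clumped_moment_series(theta, q, m),
              beta_one_alpha_moment(theta * (1.0 - q), m))
```

Since $1-X^{V}_{j}$ takes values in $[0,1]$, agreement of all these moments is what pins down the conditional law.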

3.3. Proof of Theorem 2.10: Time inhomogeneous MC to MCcGEM

We first specify certain asymptotics which will be helpful, before going to the main body of the proof in Subsection 3.3.1.

Lemma 3.3.

For $\gamma>0$ and integers $1\leq m\leq n$ , let

[TABLE]

Then, for $0<a<b$ and integers $M\geq 0$ , we have

[TABLE]

Proof.

Write

[TABLE]

By Stirling’s approximation, for $u,v\in\mathbb{R}$ , we have $\frac{\Gamma(n+u)}{\Gamma(n+v)}n^{v-u}\rightarrow 1$ as $n\rightarrow\infty$ , from which the desired asymptotics follow immediately. ∎
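The Stirling-type limit $\frac{\Gamma(n+u)}{\Gamma(n+v)}n^{v-u}\rightarrow 1$ used here can be observed numerically. The following Python sketch evaluates the ratio on the log scale with the standard library function `math.lgamma`; the function name `gamma_ratio_scaled` and the sample values of $u$ and $v$ are assumptions chosen only for illustration.

```python
from math import exp, lgamma, log


def gamma_ratio_scaled(n, u, v):
    """Return Gamma(n+u)/Gamma(n+v) * n^(v-u), computed via log-gamma."""
    return exp(lgamma(n + u) - lgamma(n + v) + (v - u) * log(n))


if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        # The printed values approach 1 as n grows.
        print(n, gamma_ratio_scaled(n, 0.3, 2.5))
```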

Proposition 3.4.

Let $r\geq 1$ be an integer. Let also $\{a_{i}\}_{i=1}^{r}$, $\{b_{i}\}_{i=1}^{r}$, and $\{\gamma_{i}\}_{i=1}^{r}$ be collections of positive numbers such that $a_{j}<b_{j}$ for $1\leq j\leq r$. Then,

[TABLE]

Proof.

The argument follows by inputting the asymptotics in Lemma 3.3. We show only the case $r=1$ , as the extension to $r>1$ is straightforward.

Again, by Stirling’s approximation, for each $u,v\in\mathbb{R}$ , $\lim_{n\rightarrow\infty}\frac{\Gamma(n+u)}{\Gamma(n+v)}n^{v-u}=1$ . Then, for $\epsilon>0$ and all large $n$ , we have

[TABLE]

Hence, for $\epsilon,a,b,\gamma>0$ with $a<b$ , and sufficiently large $n$ , we estimate

[TABLE]

Now, by the monotonicity of $s^{\gamma-1}$ , we have for $n>2/a$ that $\sum_{s=\lfloor an\rfloor}^{\lfloor bn\rfloor-1}s^{\gamma-1}$ is between the integrals $\int_{\lfloor an\rfloor-1}^{\lfloor bn\rfloor-1}s^{\gamma-1}ds$ and $\int_{\lfloor an\rfloor}^{\lfloor bn\rfloor}s^{\gamma-1}ds$ . We may compute

[TABLE]

Then, inserting into (3.3), the proposition follows for $r=1$ . ∎
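The integral comparison in the case $r=1$ amounts to the Riemann-sum statement that $n^{-\gamma}\sum_{s=\lfloor an\rfloor}^{\lfloor bn\rfloor-1}s^{\gamma-1}$ approaches $\int_{a}^{b}s^{\gamma-1}ds=\frac{b^{\gamma}-a^{\gamma}}{\gamma}$. The Python sketch below checks this numerically for one increasing and one decreasing power; the function names and parameter values are illustrative assumptions, and the sketch does not reproduce the full statement of Proposition 3.4.

```python
from math import floor


def scaled_power_sum(n, a, b, gamma):
    """n^(-gamma) * sum of s^(gamma-1) over floor(a n) <= s <= floor(b n) - 1."""
    lo, hi = floor(a * n), floor(b * n)
    return sum(s ** (gamma - 1.0) for s in range(lo, hi)) / n ** gamma


def integral_value(a, b, gamma):
    """Integral of s^(gamma-1) over [a, b]."""
    return (b ** gamma - a ** gamma) / gamma


if __name__ == "__main__":
    for gamma in (0.5, 1.5):
        print(gamma, scaled_power_sum(10**5, 0.2, 0.7, gamma),
              integral_value(0.2, 0.7, gamma))
```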

We now show a form of ‘weak ergodicity’ for the Markov chain ${\bf T}$ .

Lemma 3.5.

For a generator matrix $G$ , let $\theta>0$ , and $M\geq 1$ be an integer, such that $Q:=I+G/\theta$ and $I+G/M$ are non-negative kernels on $\mathscr{X}$ . Recall that $K_{n}=I+\frac{G}{n}{\mathbbm{1}}(n>M)$ for $n\geq 1$ (cf. (2.10)). Let $\pi$ be a stochastic vector and $\mu$ be a stationary distribution for $Q$ such that $\pi^{t}Q^{n}\rightarrow\mu^{t}$ entry-wise. Then, as $n\rightarrow\infty$ , both (a) $\mu^{n}:=\pi^{t}\prod_{i=1}^{n}K_{i}\rightarrow\mu^{t}$ , and (b) $\big{(}\mu^{n}\big{)}^{t}Q\rightarrow\mu^{t}$ , hold entry-wise.

Proof.

We separate into four steps.

Step 1. Fix an integer $m\geq\max(M,\theta)$ and write the stochastic matrix,

[TABLE]

as a polynomial in $Q$ with positive coefficients.

Step 2. We now show that any fixed degree coefficient of the polynomial vanishes as $n\rightarrow\infty$ . For each $i$ , denote the $n$ th coefficient of $Q^{i}$ by $[Q^{i}]_{n}$ . By Lemma 3.3, $[Q^{0}]_{n}=f_{m}^{n}(\theta)\rightarrow 0$ as $n\rightarrow\infty$ . Also, as $f^{n}_{m}(\theta)\sim n^{-\theta}$ by Lemma 3.3, we have for $i\geq 1$ that

[TABLE]

Step 3. For each $x\in{\mathscr{X}}$, let $e_{x}$ denote the vector in $\mathbb{R}^{\mathscr{X}}$ with a $1$ in the entry corresponding to state $x$ and $0$'s elsewhere. Since $Q$ is a stochastic kernel, observe for each $x\in{\mathscr{X}}$ and $n\geq m$ that

[TABLE]

Also, as $\mu$ is a stationary eigenvector of $Q$ , note that $\mu$ is also a stationary eigenvector of $\{K_{n}\}_{n\geq 1}$ . Recall that $\left(\pi-\mu\right)^{t}Q^{n}\rightarrow 0$ as $n\rightarrow\infty$ entry-wise, and $\mu^{m}=\pi^{t}\prod_{i=1}^{m}K_{i}$ . Hence, $\left(\mu^{m}-\mu\right)^{t}Q^{n}\rightarrow 0$ as $\prod_{i=1}^{m}K_{i}$ is a polynomial in $Q$ .

With these observations, for each $x\in{\mathscr{X}}$ and positive integers $n$ and $R<n-m$ , we may bound

[TABLE]

The last display converges by the calculation in Step 2 to $\max_{r>R}\big{|}\left(\mu^{m}-\mu\right)^{t}Q^{r}e_{x}\big{|}$, as $n\rightarrow\infty$, and in turn vanishes as $R\rightarrow\infty$. Hence, the first limit (a) follows.

Step 4. Finally, by Fatou’s lemma, the proved first limit (a), and that $\mu$ is a stationary vector of $Q$ , we have for each $j\in{\mathscr{X}}$ that

[TABLE]

Now, suppose for a particular $k\in{\mathscr{X}}$ that $\limsup_{n\rightarrow\infty}\big{(}\mu^{n}\big{)}^{t}Qe_{k}=L>\mu_{k}$ . Then, as $\big{(}\mu^{n}\big{)}^{t}Q$ is a stochastic vector, we would have for each $n\geq 1$ that

[TABLE]

But, as $\mu$ is a stochastic vector and noting (3.17), we have by Fatou’s lemma again that the last display is larger than $L+\sum_{l\neq k}\mu_{l}>1$ , a contradiction, and the second limit (b) holds. ∎
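Limit (a) of Lemma 3.5 can be illustrated on a small example. The Python sketch below propagates a starting vector $\pi$ through the kernels $K_{i}=I+\frac{G}{i}{\mathbbm{1}}(i>M)$ and compares the result with the stationary vector $\mu$. The two-state generator, the choice $M=3$, and the function name are assumptions made for this illustration (with $\theta=3$, the matrix $Q=I+G/\theta$ is a non-negative kernel with $\pi^{t}Q^{n}\rightarrow\mu^{t}$, so the hypotheses of the lemma hold).

```python
def marginal_after(pi, G, M, n):
    """Return the row vector pi^t K_1 ... K_n, where K_i = I + (G/i) 1(i > M)."""
    k = len(pi)
    vec = list(pi)
    for i in range(M + 1, n + 1):  # K_i is the identity for i <= M
        # vec K_i = vec + (vec G) / i
        vec = [vec[y] + sum(vec[x] * G[x][y] for x in range(k)) / i
               for y in range(k)]
    return vec


if __name__ == "__main__":
    G = [[-1.0, 1.0], [2.0, -2.0]]  # illustrative generator
    mu = [2.0 / 3.0, 1.0 / 3.0]     # solves mu^t G = 0
    for n in (5, 50, 2000):
        print(n, marginal_after([1.0, 0.0], G, 3, n), mu)
```

In this example the deviation from $\mu$ is multiplied by $1-3/i$ at step $i$, so it decays polynomially in $n$, in line with the vanishing coefficients of Step 2.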

3.3.1. Completion of the proof of Theorem 2.10

We will argue in a few steps.

Step 1. Recall the definition of kernel $G^{\prime}$ (cf. (2.11)). We now argue that $G^{\prime}$ is a generator matrix: As $\mu$ is a stationary vector of $Q$ and $G=\theta(Q-I)$ , we have $\mu^{t}G=0$ is the zero vector. Since $G$ is a generator matrix, we have $G^{\prime}_{i,j}=(\mu_{j}/\mu_{i})G_{j,i}1(\mu_{i}\neq 0)\geq 0$ for $i\neq j$ , and $\sum_{j}G^{\prime}_{i,j}=\frac{1(\mu_{i}\neq 0)}{\mu_{i}}\sum_{j}\mu_{j}G_{j,i}=0$ . Moreover,

[TABLE]
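The generator property of $G^{\prime}$ verified in Step 1 can be checked mechanically for a concrete example. The Python sketch below builds $G^{\prime}_{i,j}=(\mu_{j}/\mu_{i})G_{j,i}1(\mu_{i}\neq 0)$ and tests that off-diagonal entries are non-negative and rows sum to zero. The cyclic three-state generator with uniform $\mu$ and the function names are assumptions chosen for illustration.

```python
def reversed_generator(G, mu):
    """Return G' with G'_{i,j} = (mu_j / mu_i) * G_{j,i} * 1(mu_i != 0)."""
    k = len(mu)
    return [[(mu[j] / mu[i]) * G[j][i] if mu[i] != 0 else 0.0
             for j in range(k)] for i in range(k)]


def is_generator(G, tol=1e-12):
    """Check that off-diagonal entries are >= 0 and every row sums to zero."""
    k = len(G)
    rows_vanish = all(abs(sum(row)) <= tol for row in G)
    off_diag_ok = all(G[i][j] >= -tol
                      for i in range(k) for j in range(k) if i != j)
    return rows_vanish and off_diag_ok


if __name__ == "__main__":
    G = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]  # cycle 0->1->2->0
    mu = [1.0 / 3.0] * 3                                         # uniform, mu^t G = 0
    G_prime = reversed_generator(G, mu)
    print(G_prime, is_generator(G_prime))
```

For this example $G^{\prime}$ is the transpose of $G$, that is, the generator of the cycle run in the opposite direction.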

Step 2. Recall the Markov chain ${\bf T}$ , with transition kernels $\{K_{n}=I+\frac{G}{n}1(n>M)\}_{n\geq 1}$ (cf. (2.10)), starting from $\pi$ . Recall the associated variable $N_{n}$ and sequence ${\bf P}_{n}$ .

Now, for $i\geq N_{n}>j\geq 1$ define

[TABLE]

The variables ${\bf X}_{n}=\{X_{n,i}\}_{i\geq 1}$ are the associated fractions to the distribution ${\bf P}_{n}$ on ${\mathbb{N}}$ and, by Proposition 3.2, for $j\geq 1$ ,

[TABLE]

For $j\geq 0$ , also define

[TABLE]

In terms of the switching times ${\bf V}$ , and the first time $N_{n}$ that the chain ${\bf T}$ switches after time $n$ , we have $S_{0}=n$ , $S_{j}=V_{N_{n}-j}-1$ for $1\leq j\leq N_{n}-1$ , and $S_{j}=0$ for $j\geq N_{n}$ . Recall also that $\tau_{n,j}=nP_{n,j}$ for $j\geq 1$ . In words, $\{S_{j}\}$ are the times before time $n$ at which the chain switches states when considered in reverse order, and $\{\tau_{n,j}\}$ are the lengths of the associated sojourns in the figure below.

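[Figure: time axis from $1$ to $n$ showing the reversed switch times $\cdots<S_{3}<S_{2}<S_{1}<S_{0}=n$ and the sojourn lengths $\tau_{n,1}$, $\tau_{n,2}$, $\tau_{n,3}$ of the intervals between them.]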

Step 3. Recall the sequence ${\bf Y}_{n}$ given in (2.8), where $Y_{n,j}=T_{V_{N_{n}-j}}$ for $1\leq j\leq N_{n}-1$ and $Y_{n,i}=T_{1}$ for $i\geq N_{n}$ . We now aim to compute the finite dimensional distributions of $({\bf P}_{n},{\bf Y}_{n})$ or equivalently of $({\bf X}_{n},{\bf Y}_{n})$ . To this end, fix the integer $r\geq 1$ , and consider numbers $\{\beta_{j}\}_{j=1}^{r}\in(0,1)^{r}$ such that $s_{j}:=n\prod_{i=1}^{j}(1-\beta_{i})\in\mathbb{N}$ , for $1\leq j\leq r$ , are all integers. Set also $s_{0}=n$ and recall $S_{0}=n$ .

Note from (3.19) and (3.20) that

[TABLE]

Then, with respect to a possible sequence ${\bf y}$ , we have

[TABLE]

Note the computation for $M\leq l<n$ and $z\neq y$ ,

[TABLE]

Recall also that $\mu^{s}_{y}=P\left(T_{s}=y\right)$ . Since $G=\theta(Q-I)$ , we observe

[TABLE]

Then, (3.21) equals

[TABLE]

Step 4. We now sum the display (3.22) over all appropriate values of $\{s_{j}\}_{j=1}^{r}$ such that $0<X_{n,j}\leq\beta_{j}$ for $1\leq j\leq r<N_{n}$ , where we recall $N_{n}$ is the time the chain switches after time $n$ . Then, we have from (3.20) that

[TABLE]

Moreover, also from (3.20), we have $s_{r}\geq n\prod_{j=1}^{r}(1-\beta_{j})$ diverges to infinity as $n\rightarrow\infty$ .

Recall $s_{0}=n$ and $\lim_{n\rightarrow\infty}N_{n}=\infty$ a.s. Then, with equation (3.23) in hand,

[TABLE]

Step 5. From (3.20), the sum index $s_{r}\geq n\prod_{j=1}^{r}(1-\beta_{j})$ diverges to infinity as $n\rightarrow\infty$ . Also, by Lemma 3.5, we have $\lim_{s\rightarrow\infty}\mu^{s}_{y}=\mu_{y}$ and $\lim_{s\rightarrow\infty}\big{(}\mu^{s}\big{)}^{t}Qe_{y}=\mu_{y}$ for each $y\in{\mathscr{X}}$ . Therefore, as $n\rightarrow\infty$ , we have

[TABLE]

Note that $-G_{i,i}>0$ for each $i\in{\mathscr{X}}$ since by assumption $G$ has no zero rows. Thus, by Proposition 3.4, we have

[TABLE]

Hence, if $\mu_{y_{k}}=0$ for some $1\leq k\leq r$ , by bounding say $P(0<X_{n,j}\leq\beta_{j},Y_{n,j}=y_{j}:1\leq j\leq r)\leq P(0<X_{n,j}\leq\beta_{j},Y_{n,j}=y_{j}:1\leq j\leq k)$ , the limit (3.24) vanishes. Now, suppose that $\{y_{j}\}_{j=1}^{r}$ is such that $\mu_{y_{j}}>0$ for each $1\leq j\leq r$ . We may write the limit (3.24) as

[TABLE]

decomposed as a product of (a) the transition probability of the chain ${\bf Y}^{\prime}$ , with kernel $K_{G^{\prime}}$ (cf. (2.7)) and initial distribution $\mu$ , running through states $\{y_{j}\}_{j=1}^{r}$ , and of (b) the distribution functions of independent Beta $(1,-G^{\prime}_{y_{j},y_{j}})$ random variables for $1\leq j\leq r$ . Hence, the finite dimensional distributional convergence of $({\bf P}_{n},{\bf Y}_{n})$ as $n\rightarrow\infty$ is established. ∎

3.4. Proof of Theorem 2.12: Occupation laws to MCcGEM and stick-breaking measures

Consider the pairs $\{({\bf P}_{n},{\bf Y}_{n})\}_{n\geq 1}$ , $({\bf P}^{\prime},{\bf Y}^{\prime})$ and $({\bf P}^{+},{\bf T}^{\prime})$ in the setting of Theorems 2.10 and 2.12. These objects belong to $[0,1]^{\mathbb{N}}\times\mathscr{X}^{\mathbb{N}}$ . We now discuss the topology on this space and its relatives, before going to the proof of (2.12) in Subsection 3.4.2.

3.4.1. Topology

We endow the space $[0,1]^{\mathbb{N}}$ with a standard product metric $\rho^{1}$ and $\sigma$ -field, generated in terms of this metric, which yields the usual product $\sigma$ -field built from the Borel $\sigma$ -fields on copies of $[0,1]$ : For $p,p^{\prime}\in[0,1]^{\mathbb{N}}$ ,

[TABLE]

Consider now the metric $\rho$ on $[0,1]^{\mathbb{N}}\times\mathscr{X}^{\mathbb{N}}$ defined as follows: For $(p,y),(p^{\prime},y^{\prime})\in[0,1]^{\mathbb{N}}\times\mathscr{X}^{\mathbb{N}}$ ,

[TABLE]

The corresponding $\sigma$ -field on $[0,1]^{\mathbb{N}}\times\mathscr{X}^{\mathbb{N}}$ , generated by $\rho$ , is the usual product $\sigma$ -field formed from the Borel $\sigma$ -fields on copies of $[0,1]$ and $\mathscr{X}$ . Importantly, weak convergence of probability measures on $[0,1]^{\mathbb{N}}\times{\mathscr{X}}^{\mathbb{N}}$ translates to finite dimensional convergence of these laws. Moreover, $([0,1]^{\mathbb{N}}\times{\mathscr{X}}^{\mathbb{N}},\rho)$ is a complete, separable metric space.

Recall that $\Delta_{\infty}$ is the collection of all probabilities on ${\mathbb{N}}$ :

[TABLE]

Since

[TABLE]

$\Delta_{\infty}\times\mathscr{X}^{\mathbb{N}}$ is a measurable set in $[0,1]^{\mathbb{N}}\times\mathscr{X}^{\mathbb{N}}$ . We may endow $\Delta_{\infty}\times\mathscr{X}^{\mathbb{N}}$ with the restriction of the metric $\rho$ and the $\sigma$ -field generated from the associated metric topology.

For a fixed point $(p^{\prime},y^{\prime})\in\Delta_{\infty}\times\mathscr{X}^{\mathbb{N}}$ , the projection map $f:[0,1]^{\mathbb{N}}\times\mathscr{X}^{\mathbb{N}}\rightarrow\Delta_{\infty}\times\mathscr{X}^{\mathbb{N}}$ , given by

[TABLE]

is measurable, and also continuous on the subset $\Delta_{\infty}\times\mathscr{X}^{\mathbb{N}}$ .

Now, denote the collection of probabilities on $\mathscr{X}$ ,

[TABLE]

and endow it with the metric $\rho^{2}(p,p^{\prime})=\sum_{n\geq 1}2^{-n}|p_{n}-p^{\prime}_{n}|$ , and the associated Borel $\sigma$ -field. Define $g:\Delta_{\infty}\times{\mathscr{X}}^{\mathbb{N}}\rightarrow\Delta_{\mathscr{X}}$ by

[TABLE]

Then, $g$ is a continuous and therefore measurable function on $\Delta_{\infty}\times{\mathscr{X}}^{\mathbb{N}}$: Indeed, if $\{(p^{n},y^{n})\}_{n\geq 1}$ and $(p,y)$ belong to $\Delta_{\infty}\times{\mathscr{X}}^{\mathbb{N}}$, and the finite dimensional convergence $(p^{n},y^{n})\rightarrow(p,y)$ holds, then, for each $l\in{\mathscr{X}}$ and $A\geq 1$, we have $\sum_{j\geq A}p^{n}_{j}\mathbbm{1}_{l}(y^{n}_{j})\leq\sum_{j\geq A}p^{n}_{j}=1-\sum_{j<A}p^{n}_{j}\stackrel{{\scriptstyle n\rightarrow\infty}}{{\longrightarrow}}1-\sum_{j<A}p_{j}$. The claim now follows since (1) $\sum_{j<A}p^{n}_{j}\mathbbm{1}_{l}(y^{n}_{j})\stackrel{{\scriptstyle n\rightarrow\infty}}{{\longrightarrow}}\sum_{j<A}p_{j}\mathbbm{1}_{l}(y_{j})\stackrel{{\scriptstyle A\rightarrow\infty}}{{\longrightarrow}}g((p,y))$, and (2) $\sum_{j<A}p_{j}\stackrel{{\scriptstyle A\rightarrow\infty}}{{\longrightarrow}}1$.

3.4.2. Proof of (2.12)

First, we verify that the pairs $\{({\bf P}_{n},{\bf Y}_{n})\}_{n\geq 1}$, $({\bf P}^{\prime},{\bf Y}^{\prime})$ and $({\bf P}^{+},{\bf T}^{\prime})$ belong almost surely to $\Delta_{\infty}\times{\mathscr{X}}^{\mathbb{N}}$. Clearly, $\{({\bf P}_{n},{\bf Y}_{n})\}_{n\geq 1}$ surely lives in $\Delta_{\infty}\times{\mathscr{X}}^{\mathbb{N}}$ by construction. Also, $({\bf P}^{\prime},{\bf Y}^{\prime})$ and $({\bf P}^{+},{\bf T}^{\prime})$ lie almost surely in $\Delta_{\infty}\times{\mathscr{X}}^{\mathbb{N}}$ since, by Theorem 2.10 and the assumptions of Theorem 2.12, we have that ${\bf P}^{\prime}$ and ${\bf P}^{+}$ are RAMs, and so $\sum_{j=1}^{\infty}P^{\prime}_{j}\stackrel{{\scriptstyle d}}{{=}}\sum_{j=1}^{\infty}\hat{P}^{+}_{j}\stackrel{{\scriptstyle a.s.}}{{=}}1$.

Now, from the finite dimensional or in other words weak convergence of $({\bf P}_{n},{\bf Y}_{n})$ to $({\bf P}^{\prime},{\bf Y}^{\prime})$ in Theorem 2.10, we have $\nu_{n}=g\big{(}({\bf P}_{n},{\bf Y}_{n})\big{)}=g\circ f\big{(}({\bf P}_{n},{\bf Y}_{n})\big{)}$ converges weakly to $\nu=g\circ f\big{(}({\bf P}^{\prime},{\bf Y}^{\prime})\big{)}$ by the continuous mapping theorem, and so the left equality in (2.12) holds.

On the other hand, with respect to $({\bf P}^{+},{\bf T}^{\prime})$ , define ${\bf P^{+,V}}$ and ${\bf Y^{+}}$ as in the setting of Theorem 2.7. Recall that ${\bf T}^{\prime}$ is a Markov chain with kernel $Q^{\prime}=I+G^{\prime}/\theta$ and initial stationary distribution $\mu$ . Then, by Theorem 2.7, noting that $G^{\prime}=\theta(Q^{\prime}-I)$ , we have that $({\bf P^{+,V}},{\bf Y^{+}})$ has a MCcGEM $(G^{\prime})$ distribution. Hence, $({\bf P^{+,V}},{\bf Y^{+}})\stackrel{{\scriptstyle d}}{{=}}({\bf P}^{\prime},{\bf Y}^{\prime})$ . Since almost surely, by ‘unclumping’,

[TABLE]

we have $g\circ f\big{(}({\bf P}^{\prime},{\bf Y}^{\prime})\big{)}\stackrel{{\scriptstyle d}}{{=}}g\circ f\big{(}({\bf P}^{+},{\bf T}^{\prime})\big{)}$ , and the right equality of (2.12) holds. ∎

3.5. Proof of Theorem 2.13: Stick-breaking measures to Occupation laws

The claim follows from Theorem 2.12 once we verify that a homogeneous Markov chain with kernel $\tilde{Q}$ and a homogeneous Markov chain with kernel $(\tilde{Q}^{\prime})^{\prime}=I+(\tilde{G}^{\prime})^{\prime}/\theta$ , each with initial distribution $\mu$ , are equivalent in distribution.

To this end, for any generator matrix $\tilde{G}=\theta(\tilde{Q}-I)$ and associated stationary distribution $\mu$ , we observe that $(\tilde{G}^{\prime})^{\prime}_{ij}=\tilde{G}_{ij}$ when $\mu_{i}$ and $\mu_{j}$ are both positive:

[TABLE]

Since $\tilde{Q}=I+\tilde{G}/\theta$ and $(\tilde{Q}^{\prime})^{\prime}=I+(\tilde{G}^{\prime})^{\prime}/\theta$ , we conclude that $\tilde{Q}_{ij}=(\tilde{Q}^{\prime})^{\prime}_{ij}$ when $\mu_{i}$ and $\mu_{j}$ are both positive.
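The identity $(\tilde{G}^{\prime})^{\prime}_{ij}=\tilde{G}_{ij}$ for states with positive $\mu$-mass follows because applying the reversal twice multiplies $\tilde{G}_{ij}$ by $(\mu_{j}/\mu_{i})(\mu_{i}/\mu_{j})=1$. The Python sketch below confirms this on a non-reversible three-state example with non-uniform $\mu$. The reversal formula is the one recalled in Step 1 of Subsection 3.3.1; the example generator and the function names are assumptions made for illustration.

```python
def time_reversal(G, mu):
    """Return G' with G'_{i,j} = (mu_j / mu_i) * G_{j,i} * 1(mu_i != 0)."""
    k = len(mu)
    return [[(mu[j] / mu[i]) * G[j][i] if mu[i] != 0 else 0.0
             for j in range(k)] for i in range(k)]


def max_abs_diff(A, B):
    """Largest entry-wise absolute difference between two square matrices."""
    k = len(A)
    return max(abs(A[i][j] - B[i][j]) for i in range(k) for j in range(k))


if __name__ == "__main__":
    G = [[-2.0, 2.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]
    mu = [0.2, 0.4, 0.4]  # satisfies mu^t G = 0, all entries positive
    G_rev = time_reversal(G, mu)
    G_back = time_reversal(G_rev, mu)
    print(G_rev)                     # differs from G: the chain is not reversible
    print(max_abs_diff(G, G_back))   # double reversal recovers G
```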

Finally, as $\mu$ is a stationary distribution, $\mu$ is only positive on positive recurrent states and for each recurrence class of $\tilde{Q}$, $\mu$ either assigns $0$ weight to each state in that class or strictly positive weights to each state in that class. Hence, homogeneous Markov chains with kernels $\tilde{Q}$ and $(\tilde{Q}^{\prime})^{\prime}$, starting from $\mu$, are equal in distribution. ∎

3.6. Proof of Theorem 2.17: Type of self-similarity

We first give a proof of Lemma 2.16, before going to the main argument in Subsection 3.6.2.

3.6.1. Proof of Lemma 2.16

Let $\{(\eta_{j},X_{j})\}_{j\geq 1}$ be i.i.d. copies of $(\eta,X)$ , independent of $(\eta,X)$ , all on a common probability space.

Existence: Let $\chi(\cdot)=\sum_{j=1}^{\infty}\eta_{j}(\cdot)X_{j}\prod_{i=1}^{j-1}(1-X_{i})$. Since ${\mathscr{P}}(X=0)<1$, we have $\prod_{j\geq 1}(1-X_{j})=0$ a.s., and so $\big{\langle}X_{j}\prod_{i=1}^{j-1}(1-X_{i}):{j\geq 1}\big{\rangle}$ is a RAM. Hence, $\chi$ is a random probability measure on ${\mathscr{A}}$ as $\chi({\mathscr{A}})=\sum_{j=1}^{\infty}X_{j}\prod_{i=1}^{j-1}(1-X_{i})\stackrel{{\scriptstyle a.s.}}{{=}}1$. Moreover, (2.14) holds straightforwardly:

[TABLE]

where $\tilde{\chi}$ has the same law as $\chi$ and is independent of $(X_{1},\eta_{1})$ .
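The decomposition behind the existence argument, $\chi=X_{1}\eta_{1}+(1-X_{1})\tilde{\chi}$ with $\tilde{\chi}$ built from the shifted sequence $\{(\eta_{j},X_{j})\}_{j\geq 2}$, is a pathwise identity and can be checked on a truncated realization. In the Python sketch below each $\eta_{j}$ is taken to be a point mass at a labelled atom and the sequence is truncated with a final fraction equal to $1$; both simplifications, along with the function names, are assumptions made for illustration.

```python
def stick_measure(x, atoms):
    """Return chi = sum_j X_j prod_{i<j}(1 - X_i) delta_{atoms[j]} as a dict."""
    chi, remaining = {}, 1.0
    for xj, a in zip(x, atoms):
        chi[a] = chi.get(a, 0.0) + remaining * xj
        remaining *= 1.0 - xj
    return chi


def self_similar_rhs(x, atoms):
    """Return X_1 delta_{atoms[0]} + (1 - X_1) * chi_tilde, with chi_tilde shifted by one."""
    tail = stick_measure(x[1:], atoms[1:])
    rhs = {a: (1.0 - x[0]) * mass for a, mass in tail.items()}
    rhs[atoms[0]] = rhs.get(atoms[0], 0.0) + x[0]
    return rhs


if __name__ == "__main__":
    x = [0.5, 0.25, 0.4, 1.0]       # last fraction 1 closes the stick
    atoms = ["a", "b", "a", "c"]    # locations of the point masses eta_j
    print(stick_measure(x, atoms))
    print(self_similar_rhs(x, atoms))  # same masses, up to rounding
```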

Uniqueness: Suppose $\chi^{a}$ and $\chi^{b}$ both satisfy the self-similarity equation (2.14). On a probability space, where $\{(\eta_{j},X_{j})\}_{j\geq 1}$ , $\chi^{a}$ and $\chi^{b}$ are independent, define a sequence of measures: $\chi^{a}_{1}=\chi^{a}$ , $\chi^{b}_{1}=\chi^{b}$ and, for $j\geq 1$ ,

[TABLE]

By construction, $\{\chi^{a}_{j}\}_{j\geq 1}$ and $\{\chi^{b}_{j}\}_{j\geq 1}$ are two sequences of identically distributed random measures distributed as $\chi^{a}$ and $\chi^{b}$ respectively.

We note again that $\prod_{j\geq 1}(1-X_{j})=0$ a.s. as ${\mathscr{P}}(X=0)<1$ . Then, in terms of the variational norm $\|\cdot\|$ ,

[TABLE]

which vanishes a.s. as $j\rightarrow\infty$ . Hence, $\chi^{a}\stackrel{{\scriptstyle d}}{{=}}\chi^{b}$ . ∎

3.6.2. Completion of the proof of Theorem 2.17

Recall our conventions at the beginning of Section 2 and that ${\bf X}=\{X_{j}\}_{j\geq 1}$ is a collection of iid variables, and ${\bf T}$ is the homogeneous Markov chain with kernel $Q$ and initial distribution $\mu$ supported on recurrent states. Let ${\bf P}=\langle P_{j}:j\geq 1\rangle$ be the RAM constructed from ${\bf X}$ . For each recurrent state $i$ of $Q$ , let ${\bf T}^{i}={\bf T}\bigr{|}T_{1}=i$ be the Markov chain with transition kernel $Q$ and initial value $T^{i}_{1}=i$ . Recall the a.s. finite time $W^{i}=\inf\{l>1:T_{l}^{i}=i\}$ , and variable

[TABLE]

Recall also $\eta^{i}=\left(X^{i}\right)^{-1}\sum_{l=1}^{W^{i}-1}\left[X_{l}\prod_{n=1}^{l-1}(1-X_{n})\right]\delta_{T_{l}^{i}}$ .

We now rewrite the measure $\nu^{i}=\nu\bigr{|}T_{1}=i$ as follows:

[TABLE]

Then, by (3.25) and Proposition 3.2 for $j\geq 1$ we have

[TABLE]

Hence, as ${\bf X}$ is composed of iid variables, independent of ${\bf T}^{i}$ and therefore $W^{i}$ , we see that

[TABLE]

Clearly, as the chain starts over again at location $i$ , $\{T^{i}_{l}\}_{l\geq W^{i}}\stackrel{{\scriptstyle d}}{{=}}{\bf T}^{i}$ .

Moreover, by conditioning on the value of $W^{i}$ and noting that ${\bf X}$ and ${\bf T}^{i}$ are independent, the sequences $\big{\langle}\frac{P_{j-1+W^{i}}}{1-X^{i}}:j\geq 1\big{\rangle}$ and $\{T^{i}_{l}\}_{l\geq W^{i}}$ are independent. Similarly, we see that the sum $\sum_{l\geq W^{i}}\frac{P_{l}}{1-X^{i}}\delta_{T^{i}_{l}}$ , which depends only on variables $\{X_{k}\}_{k\geq W^{i}}$ and $\{T^{i}_{k}\}_{k\geq W^{i}}$ indexed beyond the first cycle, is independent of the pair $(X^{i},\eta^{i})$ . In particular, the sum $\tilde{\nu}^{i}:=\sum_{l\geq W^{i}}\frac{P_{l}}{1-X^{i}}\delta_{T^{i}_{l}}\stackrel{{\scriptstyle d}}{{=}}\nu^{i}$ .

Hence, from these observations, (3.6.2) represents the sought after self-similarity equation (2.15).

Finally, a distribution $\nu^{i}$ satisfying (2.15) is unique by Lemma 2.16 since $X^{i}_{1}\in(0,1]$ a.s. Also, by assumption, $T_{1}\sim\mu$ where $\mu$ is supported only on recurrent states. Therefore, as $T_{1}$ necessarily is a recurrent state, the distribution of the pair $(\nu,T_{1})$ is also unique. ∎

3.7. Proof of Theorems 2.18, 2.19 and Corollary 2.20: Recasting moments I, II, and marginals

We prove these results in succession.

3.7.1. Proof of Theorem 2.18

First, since $G$ is a $k\times k$ generator matrix with bounded entries and for large enough $j\in\mathbb{N}$

[TABLE]

we verify that $\widetilde{K}_{j}=K_{j}+O(j^{-2})$.

Next, to show (2.17), we relate the occupation law of the Markov chain $\mathbf{Z}$ , with transition kernels $\{\widetilde{K}_{n}\}$ , to the occupation law $\nu$ of the Markov chain $\mathbf{T}$ , with kernels $\{K_{n}\}$ , through a Borel-Cantelli argument. In passing, we note this could be also accomplished via an analytic argument.

Define $A_{j}:=\tilde{K}_{j}-K_{j}$, for $j\geq 1$, and note $A_{j}=O(j^{-2})$ has constant row sums of $0$. Since $G$ does not have $0$ entries and $K_{j}=I+\frac{G}{j}{\bf 1}(j>M)$, there exists an $a$ such that $R_{j}:=K_{j}+\frac{j^{2}}{a}A_{j}$ is a non-negative matrix, and hence stochastic. Note

[TABLE]

Consider now an auxiliary sequence of independent Bernoulli$(aj^{-2})$ variables ${\bf B}=\{B_{j}\}_{j\geq 1}$, possibly after enlarging the probability space. Define a process ${\bf Z^{\prime}}\bigr{|}{\bf B}$ with $Z^{\prime}_{1}\bigr{|}{\bf B}\sim\mu$ and

[TABLE]

Then, noting (3.27), marginally, ${\bf Z}^{\prime}$ is a Markov chain with initial distribution $\mu$ and transition kernel

[TABLE]

Now, as $\sum_{j\geq 1}aj^{-2}<\infty$, by the Borel-Cantelli lemma, ${\mathscr{P}}(B_{j}=1\text{ i.o.})=0$ and so $L:=\max\{j:B_{j}=1\}<\infty$ a.s. Conditional on the event $\{L=r\}$, the chain $\{Z^{\prime}_{j}\}_{j>r}$ is a Markov chain with transition kernels $\{K_{j}\}_{j>r}$. Also, since $G$ is irreducible in the setting of [11], the initial distribution does not matter in the calculation of the occupation law $\nu$ (cf. Remark 3 in Subsection 2.2.1). Hence, the occupation law with respect to ${\bf Z}$ is also $\nu$ and (2.17) holds: Indeed, for $l\in{\mathscr{X}}$ and interval $A=(a,b)$ for $0<a<b<1$, we have

[TABLE]

where $o(1)_{R}$ is an expression which vanishes uniformly in $n$ as $R\rightarrow\infty$ .

Finally, (2.18) follows straightforwardly by gathering together terms. ∎

3.7.2. Proof of Theorem 2.19

We break the argument into steps.

Step 1. First, we show that $p_{j}(\lambda)$, $q(j)$, and their quotients are all well-defined. A generator matrix $G$ can always be written as $G=\theta(Q-I)$ for some $\theta>0$ and a stochastic matrix $Q$. The eigenvalues $\lambda$ of $Q$ correspond to the eigenvalues $\theta(\lambda-1)$ of $G$. Additionally, since $G$ has no zero entries, $Q$ is irreducible. Therefore, the algebraic multiplicity of the eigenvalue $0$ of $G$ is $1$. Thus, with respect to the minimal polynomial of $G$, $p_{min}(\lambda)$, there exists a polynomial $q$ such that $p_{min}(\lambda)=\lambda q(\lambda)$ and $q(0)\neq 0$.

Define

[TABLE]

Since the eigenvalues of the stochastic matrix $I+G/\theta^{G}$ are bounded by $1$ , the (complex) eigenvalues $\tilde{\lambda}$ of $G$ satisfy $\left|1+\tilde{\lambda}/\theta^{G}\right|\leq 1$ . Hence, the eigenvalues of $G$ have non-positive real part. Since $p_{min}(\lambda)=\lambda q(\lambda)$ and $q(0)\neq 0$ , we obtain that $j\in{\mathbb{N}}$ is not an eigenvalue of $G$ and so $q(j)\neq 0$ for $j\geq 0$ . Thus, $p_{j}(\lambda)/q(j)$ is well-defined for $j\geq 0$ .

Step 2. We now verify for $j>0$ that

[TABLE]

Write

[TABLE]

In particular, as $Gq(G)=p_{min}(G)=0$ , we have

[TABLE]

from which the desired identity follows.

Step 3. We now show that $p_{0}(G)/q(0)$ is the constant matrix with rows $\mu$. Note that $p_{0}(\lambda)/q(0)=q(\lambda)/q(0)$ is well-defined in (2.19). Since row sums of $G^{k}$ vanish for $k\geq 1$, we see that $p_{j}(G)/q(j)$ has constant row sums of $p_{j}(0)/q(j)=1$. Now, necessarily, $p_{0}(G)G=0$ as $q(\lambda)\lambda=p_{min}(\lambda)$ is the minimal polynomial of $G$. Since $G$ is irreducible, we can conclude that $p_{0}(G)$ is a matrix with rows given by multiples of the unique stochastic eigenvector $\mu$ associated to $G$ and eigenvalue $0$. However, since $p_{0}(G)/q(0)$ has row sums equal to $1$, the claim follows.

Moreover, noting that $[p_{0}(G)/q(0)]_{i,j}=\mu_{j}$ for any $i,j\in{\mathscr{X}}$ , the moment identity (2.20) is now a direct consequence of these calculations. ∎
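The claim of Step 3, that $q(G)/q(0)$ is the matrix whose rows all equal $\mu$, can be seen on a small example. For the cyclic three-state generator $G=C-I$, with $C$ the cyclic permutation matrix, a hand computation gives the minimal polynomial $\lambda(\lambda^{2}+3\lambda+3)$, so that $q(\lambda)=\lambda^{2}+3\lambda+3$ and $q(0)=3$; the stationary vector is uniform. The Python sketch below evaluates $q(G)/q(0)$ by Horner's rule. The example, the hand-computed coefficients of $q$, and the function names are assumptions made for illustration.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    k = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(k)) for j in range(k)]
            for i in range(k)]


def poly_of_matrix(coeffs, G):
    """Evaluate sum_i coeffs[i] * G^i by Horner's rule (coeffs in ascending order)."""
    k = len(G)
    result = [[0.0] * k for _ in range(k)]
    for a in reversed(coeffs):
        result = mat_mul(result, G)
        for i in range(k):
            result[i][i] += a  # add a * I
    return result


if __name__ == "__main__":
    G = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]
    q_coeffs = [3.0, 3.0, 1.0]  # q(lambda) = 3 + 3 lambda + lambda^2
    qG = poly_of_matrix(q_coeffs, G)
    print([[entry / q_coeffs[0] for entry in row] for row in qG])
```

Each printed row is the uniform vector $(1/3,1/3,1/3)$, matching $\mu$ for this chain.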

3.7.3. Proof of Corollary 2.20

Recall $q(\lambda)=\sum_{i=0}^{n}a_{i}\lambda^{i}$ is a degree $n\leq k-1$ polynomial where $a_{n}\neq 0$. Then, noting (2.19), we see that $p_{j}(\lambda)$ is also a degree $n$ polynomial in $j$ with $\lambda$-free leading coefficient $a_{n}$. In particular, $[p_{j}(G)]_{i,i}$ is a degree $n$ polynomial in $j$ with leading coefficient $a_{n}I_{i,i}=a_{n}$ for each $i\in\mathscr{X}$.

Now, fix $i$ , and denote by $\{\gamma_{i,l}\}_{l=1}^{n}$ and $\{\lambda_{l}\}_{l=1}^{n}$ the roots of $[p_{j}(G)]_{ii}$ and $q(j)$ respectively when considered as functions of $j$ . In the formula (2.20), to calculate ${\mathscr{E}}[\nu_{i}^{N}]$ , there is only one list in ${\mathbb{S}}(N)$ , namely one composed of $N$ $1$ ’s. Then,

[TABLE]

as desired. ∎

Acknowledgement. We thank J. Sethuraman for enjoyable conversations on Dirichlet processes. Part of this research was supported by ARO W911NF-14-1-0179, and a Simons Foundation Sabbatical grant.

Bibliography

[1] Arratia, R., Barbour, A. D., Tavaré, S. (1999) The Poisson-Dirichlet distribution and the scale-invariant Poisson process. Combin. Probab. Comput. 8 407–416.
[2] Arratia, R., Barbour, A. D., Tavaré, S. (2003) Logarithmic Combinatorial Structures: A Probabilistic Approach. European Mathematical Society, Zürich.
[3] Berman, A., Plemmons, R. J. (1979) Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York.
[4] Blackwell, D., MacQueen, J. B. (1973) Ferguson distributions via Polya urn schemes. Ann. Stat. 1 353–355.
[5] Bouguet, F., Cloez, B. (2018) Fluctuations of the empirical measure of freezing Markov chains. Elec. J. Probab. 23 1–31.
[6] Bovier, A., den Hollander, F. (2015) Metastability: a potential-theoretic approach. Grundlehren der mathematischen Wissenschaften 351, Springer, Berlin.
[7] Broderick, T., Jordan, M., Pitman, J. (2012) Beta processes, stick-breaking and power laws. Bayesian Anal. 7 439–475.
[8] Crane, H. (2016) The ubiquitous Ewens sampling formula. Statist. Sci. 31 1–19.
