Antiduality and Möbius monotonicity: Generalized Coupon Collector Problem
Paweł Lorek

TL;DR
This paper introduces a systematic method to find antidual Markov chains related to a generalized coupon collector problem, revealing cutoff phenomena and constructing chains with prescribed stationary distributions.
Contribution
It develops a new approach based on Möbius monotonicity to identify antidual chains and applies this to generalized coupon collector problems, highlighting cutoff behaviors.
Findings
Identified several sharp antidual chains for coupon collector models.
Demonstrated cutoff phenomena with specific window sizes.
Constructed chains with prescribed stationary distributions and mixing times.
Antiduality and Möbius monotonicity: Generalized Coupon Collector Problem
Paweł Lorek
Mathematical Institute, University of Wrocław, pl. Grunwaldzki 2/4, 50-384 Wrocław, Poland.
Abstract.
For a given absorbing Markov chain $X^*$ on a finite state space, a chain $X$ is a sharp antidual of $X^*$ if the fastest strong stationary time of $X$ is equal, in distribution, to the absorption time of $X^*$. In this paper we show a systematic way of finding such an antidual based on some partial ordering of the state space. We use a theory of strong stationary duality developed recently for Möbius monotone Markov chains. We give several sharp antidual chains for the Markov chain corresponding to a generalized coupon collector problem. As a consequence, utilizing known results on the limiting distribution of the absorption time, we indicate a separation cutoff (with its window size) in several chains. We also present a chain which (under some conditions) has a prescribed stationary distribution and whose fastest strong stationary time is distributed as a prescribed mixture of sums of geometric random variables.
Key words and phrases:
Markov chains; Strong stationary duality; Antiduality; Absorption times; fastest strong stationary times; Möbius monotonicity; Generalized coupon collector problem; Double Dixie cup problem; separation cutoff; partial ordering; perfect simulation
1991 Mathematics Subject Classification:
60J10, 60G40, 06A06
Work supported by NCN Research Grant DEC-2013/10/E/ST1/00359
1. Introduction
Strong stationary times (SSTs) are a probabilistic tool for bounding the rate of convergence to stationarity for Markov chains. Aldous and Diaconis [1], [2] gave several examples of chains where SSTs were found ad hoc. Later, the authors of [8] introduced a more systematic way of finding SSTs. For a given general ergodic chain they showed that one can construct a so-called strong stationary dual (SSD) chain, i.e., a chain whose absorption time is equal to some SST (not only in distribution) under the coupling of the chain with its SSD presented in [8]. Moreover, they proved that there always exists a sharp SSD, in the sense that its corresponding SST is stochastically the smallest, in which case it is called the fastest strong stationary time (FSST).
Their construction for general chains is purely theoretical (it involves the knowledge of the distribution of the chain at each step). However, they give a detailed recipe for constructing such an SSD assuming that the time reversed chain is stochastically monotone w.r.t. a linear ordering. In particular, they consider birth and death chains, for which the SST has the same distribution as the absorption time in a dual chain, which turns out to be an absorbing birth and death chain. They also show that, assuming that the time reversed chain is stochastically monotone, one can always construct a set-valued SSD (see their Section 3.4, “greedy construction of a set-valued dual”). In this paper we actually start with some absorbing chain and show that it is a sharp SSD of a class (which we indicate) of ergodic chains. We exploit the results from [30], where the authors provided a recipe for constructing an SSD on the same state space for chains whose time reversal is Möbius monotone w.r.t. some partial ordering of the state space. This significantly enlarges the class of chains for which an SSD can be found, since in many chains the natural underlying ordering of the state space is only partial. Moreover, the method yields a sharp SSD, which is crucial for our applications.
Studying the rate of convergence of a chain to its stationary distribution, one is often interested in a so-called mixing time (i.e., the time until the chain is “close” to its stationary distribution). However, sometimes we can say much more than just a mixing time by showing that a so-called cutoff phenomenon occurs. Roughly speaking, this phenomenon describes a sharp transition in the convergence of the chain to its stationary distribution over a negligible period of time (cutoff window). There are two most commonly studied phenomena: separation cutoff and total variation cutoff, which differ in a distance used to measure the convergence (separation vs. total variation distance).
The total variation cutoff was first shown for random transposition card shuffling in [12]. The name comes from [1], where the authors showed that top-to-random card shuffling exhibits a total variation cutoff. Separation cutoff has recently been studied in a few contexts. For example: in [11] the authors gave necessary and sufficient conditions for the existence of a separation cutoff for birth and death chains (they use duality theory to convert convergence rates to hitting times, and Keilson’s representation of first hitting times), showing that there is a cutoff if and only if the product of the spectral gap and the mixing time tends to infinity; this was extended in [4], where the authors show that there is a cutoff measured in the $L^p$-norm ($1<p\le\infty$) if and only if the product of the spectral gap and the max-$L^p$ mixing time tends to infinity; computation of the cutoff time and window size in a variety of birth and death chains is given in [5]; a separation cutoff for skip-free chains was given in [32]; some other specific chains were considered in [7]; in [17] the author gives a formula for the separation for the Tsetlin library chain, specifying weights for which there is and for which there is no separation cutoff. Several examples of both separation and total variation cutoffs are given in [26]; a characterization of total variation cutoff for lazy (i.e., with probability of staying at least $1/2$) chains was recently given in [3]. In [6] the authors give a sufficient condition for skip-free chains to have real eigenvalues; they use Siegmund duality (actually antiduality), and the type of transitions of their (anti)dual resembles some chains we obtain for the coupon collector problem. It is worth mentioning that although a sequence of birth and death chains exhibits total variation cutoff if and only if it exhibits separation cutoff [11], [13], this is not the case (in general) for other chains, as shown in [21].
As mentioned before, the FSST is equal in distribution to the absorption time of the sharp SSD chain. Thus, there is a close relation between a sharp SSD and a separation cutoff. Roughly speaking, this cutoff can be studied by studying the limiting distribution of the absorption time of the SSD. This can be an extremely difficult task. However, since examples of chains with proven separation cutoff are always welcome, we can reverse the procedure: starting with some already absorbing chain we can try to find an ergodic sharp antidual chain (or even a class of such antidual chains). Such an approach was considered in [19] in the context of birth and death chains only. A connection between a separation cutoff and a coupon collector problem (including some generalizations, e.g., sampling different coupons at a time) was given in [36].
Using this approach we will indicate a separation cutoff time and a window size in several examples of chains, utilizing (nontrivial) results for the limiting distribution of the absorption time in some generalizations of the classical coupon collector problem. That is why we need a recipe for sharp antidual chains, which will be given based on results from [30]. Most of the examples that follow deal with some product-type chains. It is however worth noting that taking a product of chains where each chain exhibits a cutoff does not have to yield a chain (on a product space) exhibiting a cutoff. Such an example was recently given in [25].
The absorption time of many absorbing chains is distributed as a mixture of sums of geometric random variables with parameters being the eigenvalues of the transition matrix. E.g., the absorption time of a discrete time birth and death chain starting at the minimal state with the maximal one being absorbing is distributed as a sum of geometric random variables with such parameters, provided the chain is stochastically monotone. The result is usually attributed to Karlin and McGregor [23] or Keilson [24]. Fill [19] gave a stochastic proof of this result using the theory of SSD (the result was simultaneously obtained in [9]); later it was extended to skip-free Markov chains in Fill [18]. Miclo [33] showed that for a large class of absorbing chains on a finite state space, the absorption time is distributed as a mixture of sums of geometric random variables. A natural question arises: given a mixture of sums of geometric random variables and some distribution $\pi$, can we find an ergodic chain whose stationary distribution is $\pi$ and whose FSST is equal in distribution to this mixture? Or, as a special case of the question: given some distribution $\pi$, can we construct an ergodic chain with stationary distribution $\pi$ and a deterministic FSST? We provide positive answers to both questions (some assumptions on the distributions are needed). In particular, we present two ergodic chains on completely different state spaces having the same FSST.
The main goals of the paper are: i) we give a systematic way (based on partial ordering of the state space and Möbius monotonicity) for finding a class of sharp antidual chains; ii) we present nontrivial antidual chains related to some generalizations of coupon collector problem and, as a consequence, we show cutoff phenomena in some cases; iii) we present a construction of a chain with prescribed FSST and prescribed stationary distribution.
There is yet another potential application which served as a motivation for the paper (however, not exploited here): given a probability distribution $\pi$ on $\mathbb{E}$, how to simulate a sample from this distribution? Markov Chain Monte Carlo methods come with the answer: construct a chain with stationary distribution $\pi$ and run it long enough. The most common algorithms for such constructions are the Metropolis-Hastings algorithm and the Gibbs sampler (for studies on the rate of convergence for Metropolis-Hastings algorithms see, e.g., [10]; the cutoff for the Gibbs sampler for the Ising model on the lattice was studied in [31]). This paper suggests an alternative approach: given $\pi$ on $\mathbb{E}$, find some absorbing chain on $\mathbb{E}$ and then calculate a sharp antidual chain having this $\pi$ as its stationary distribution. Knowing, e.g., the expectation and variance of the absorption time, one can quite precisely determine the number of steps needed for simulation. Moreover, having a sharp SSD can actually allow for a perfect simulation from the distribution $\pi$. One can construct an appropriate coupling of the absorbing chain and its antidual, so that stopping the antidual chain when its SSD is absorbed yields an unbiased sample from $\pi$. The reader is referred for details to [8] (Section 2.4), [20] (Section 1.1) or [29] (Section 2.3, Algorithm 4). We want to emphasize that utilizing this was not the purpose of this paper, and the stationary distributions which appear in most of the examples are of product form, which means we can easily simulate them coordinate by coordinate.
The paper is organized as follows. In Section 2 we introduce preliminaries on strong stationary duality and separation cutoff. In Section 3 we recall the notion of Möbius monotonicity and give a matrix-form proof of the result from [30]. In Section 4 we present our main results. Firstly, in Section 4.1 in Theorem 4.1 we give a systematic way of finding a class of sharp antidual chains. Secondly, in Section 4.2 we introduce in detail the chain corresponding to the generalized coupon collector problem and present sharp antidual chains in Theorems 4.2 and 4.2. Then, in Section 4.3, we proceed with presenting separation cutoff results for some cases. In Section 4.4 we present our results concerning the construction of an ergodic chain with a prescribed stationary distribution and with a prescribed FSST. Section 5 includes the main proofs. Section 5.1 contains proofs of Theorems 4.2 and 4.2, whereas Section 5.2 contains the proof of Theorem 4.4.
2. Preliminaries
2.1. Strong stationary duality
Consider an ergodic (i.e., irreducible and aperiodic) Markov chain $X=(X_k)_{k\ge 0}$ on a finite state space $\mathbb{E}$ with an initial distribution $\nu$, a stationary distribution $\pi$ and a transition matrix $P$. Let $\mathbb{E}^*$ be the state space of an absorbing Markov chain $X^*$ with an initial distribution $\nu^*$ and a transition matrix $P^*$, whose unique absorbing state and unique irreducible class is denoted by $e^*_M$. Define $\Lambda$, a matrix of size $|\mathbb{E}^*|\times|\mathbb{E}|$, to be a link if it is a stochastic matrix with the property $\Lambda(e^*_M,e)=\pi(e)$ for all $e\in\mathbb{E}$. We say that $X^*$ is a strong stationary dual (SSD) of $X$ with link $\Lambda$ if
$$\nu=\nu^*\Lambda \quad\text{and}\quad \Lambda P=P^*\Lambda. \tag{1}$$
Diaconis and Fill [8] prove that then the absorption time $T^*$ of $X^*$ is a so-called strong stationary time (SST) for $X$. This is a random variable $T$ such that $X_T$ has distribution $\pi$ and $T$ is independent of $X_T$. The main application is in studying the rate of convergence of an ergodic chain to its stationary distribution, since for such a random variable we always have $d_{TV}(\nu P^k,\pi)\le sep(\nu P^k,\pi)\le P(T>k)$, where $d_{TV}$ stands for the total variation distance, and $sep(\nu P^k,\pi)=\max_{e\in\mathbb{E}}\left(1-\nu P^k(e)/\pi(e)\right)$ stands for the separation. Note that $sep$ is not symmetric and thus is not a distance between probability measures. The corresponding SST $T$ is sharp if $sep(\nu P^k,\pi)=P(T>k)$ for all $k\ge 0$. In such a case, $T$ is called the fastest strong stationary time (FSST) for $X$. For more details on this duality consult [8]. Moreover, the duality relation (1) allows for stochastic constructions, see, e.g., [19], where a stochastic proof for the passage time distribution for birth and death chains was given.
Note that once we fix $P$, $\nu$ and a link $\Lambda$, and if there exists a right-inverse $\Lambda^{-1}$ of $\Lambda$, i.e., $\Lambda\Lambda^{-1}=I$, we can simply calculate $P^*$ and $\nu^*$ from (1):
$$\nu^*=\nu\Lambda^{-1},\qquad P^*=\Lambda P\Lambda^{-1}.$$
If the resulting $P^*$ is a stochastic matrix and $\nu^*$ is a probability distribution, then (it will always correspond to an absorbing chain) we have found an SSD. However, we can start with some already absorbing chain $X^*$, then find some $\mathbb{E}$ and some probability distribution $\pi$ on $\mathbb{E}$, and an invertible link $\Lambda$, so that
$$\nu=\nu^*\Lambda,\qquad P=\Lambda^{-1}P^*\Lambda.$$
If the resulting $P$ is a stochastic matrix, then $X$ (with transition matrix $P$ and initial distribution $\nu$) is an ergodic chain with stationary distribution $\pi$, and $T^*$ (the time to absorption of $X^*$) is an SST for $X$. In such a case, $X$ is called an antidual of $X^*$. Moreover, if we somehow know that for some class of links relation (1) implies that the SSD is sharp (see Corollary 3), then we can possibly find many different antiduals, which all have the same fastest strong stationary time, which has a phase-type distribution. In such a case $X$ is called a sharp antidual of $X^*$.
2.2. Separation cutoff
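The forthcoming Theorem 4.1 gives a recipe for constructing a sharp antidual chain $X$ with a specified stationary distribution $\pi$ given an absorbing chain $X^*$, both on the same state space. It means that we have
$$sep(\nu P^k,\pi)=P(T^*>k),\qquad k\ge 0, \tag{2}$$
where $T^*$ is the absorption time of $X^*$. Thus, studying the distribution of the FSST of $X$ is equivalent to studying the distribution of $T^*$. Furthermore, a separation cutoff can be studied by studying the properties of $T^*$. In what follows, we introduce the notion of separation cutoff. Since the definition of the cutoff involves an increasing state space, we add a subscript $n$ to transition matrices, distributions, state spaces and absorption times. Suppose we have a sequence of ergodic Markov chains indexed by $n=1,2,\ldots$ Denote by $\pi_n$ the stationary distribution of the $n$-th chain, by $P_n$ its transition matrix and by $\nu_n$ its initial distribution. We say that this sequence exhibits a **separation cutoff at time $t_n$** with a window size $w_n$ if $w_n=o(t_n)$ and
$$\lim_{c\to\infty}\liminf_{n\to\infty} sep\left(\nu_n P_n^{\lfloor t_n-cw_n\rfloor},\pi_n\right)=1,\qquad \lim_{c\to\infty}\limsup_{n\to\infty} sep\left(\nu_n P_n^{\lfloor t_n+cw_n\rfloor},\pi_n\right)=0.$$
If the convergence to stationarity is measured in the total variation distance, we speak of a total variation cutoff.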
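The identity between the separation and the tail of the absorption time can be checked on a toy case. The sketch below is a hypothetical three-state example, not one taken from the paper: a pure-birth absorbing chain on the linearly ordered states 0 < 1 < 2, the uniform distribution as the assumed target, and a link whose rows are truncated stationary distributions. The names `P_STAR`, `LINK`, `LINK_INV` and `sep` are illustrative, and the inverse link was worked out by hand for this particular matrix.

```python
from fractions import Fraction as F

# Hypothetical absorbing chain on states 0 < 1 < 2 (state 2 is absorbing).
P_STAR = [[F(1, 2), F(1, 2), F(0)],
          [F(0), F(1, 2), F(1, 2)],
          [F(0), F(0), F(1)]]
PI = [F(1, 3)] * 3  # assumed target stationary distribution (uniform)

# Link: row j is PI truncated to {0, ..., j} and renormalised.
LINK = [[F(1), F(0), F(0)],
        [F(1, 2), F(1, 2), F(0)],
        [F(1, 3), F(1, 3), F(1, 3)]]
# Inverse of LINK, computed by hand for this 3x3 case.
LINK_INV = [[F(1), F(0), F(0)],
            [F(-1), F(2), F(0)],
            [F(0), F(-2), F(3)]]


def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def vec_mat(v, m):
    return [sum(x * y for x, y in zip(v, col)) for col in zip(*m)]


# Antidual transition matrix: solve LINK * P = P_STAR * LINK for P.
P = mat_mul(LINK_INV, mat_mul(P_STAR, LINK))


def sep(dist):
    """Separation of dist from PI: max over states of 1 - dist/PI."""
    return max(1 - d / p for d, p in zip(dist, PI))


def separation_and_tail(steps):
    """Pairs (separation after k steps, P(T* > k)) for k = 0..steps."""
    dist = [F(1), F(0), F(0)]  # antidual started at the minimal state
    dual = [F(1), F(0), F(0)]  # absorbing chain started at the minimal state
    out = []
    for _ in range(steps + 1):
        out.append((sep(dist), 1 - dual[2]))
        dist, dual = vec_mat(dist, P), vec_mat(dual, P_STAR)
    return out


if __name__ == "__main__":
    for k, (s, tail) in enumerate(separation_and_tail(6)):
        print(k, s, tail)
```

Exact rational arithmetic is used so that the two columns can be compared with `==` instead of a floating-point tolerance.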
3. Möbius monotonicity and duality
In general, there is no recipe for finding an SSD, i.e., a triplet $(\nu^*,P^*,\Lambda)$. In [8] the authors give a recipe for a dual on the same state space provided that the time reversed chain is stochastically monotone with respect to a total ordering. In [30] we give an extension of this result to state spaces which are only partially ordered by $\preceq$. Then, provided that the time reversed chain is Möbius monotone (plus some conditions on the initial distribution), we give a formula for a sharp SSD on the same state space $\mathbb{E}^*=\mathbb{E}$.
The Möbius monotonicity seems to be a natural one for extending the main result from [8] to partially ordered state spaces. In [28] we show that it is equivalent to the existence of a Siegmund dual of a chain with a given partial ordering. For a linearly ordered state space, stochastic monotonicity of a chain is required for the existence of a Siegmund dual (see [38]), and stochastic monotonicity of the time reversal is required for the existence of an SSD with a link being a truncated stationary distribution (see [8]). Both results fail for non-linear orderings, since both require Möbius monotonicity, which, in general, is different from the stochastic one. The monotonicities are equivalent for a linear ordering. For more relations between these (and other) monotonicities consult [29], and for applications of Siegmund duality to some generalizations of the gambler’s ruin problem consult [27]. We will introduce this monotonicity by trying to solve (1) with some given link $\Lambda$.
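We consider a finite state space $\mathbb{E}=\{e_1,\ldots,e_M\}$ partially ordered by $\preceq$ such that $e_M$ is the unique maximal state. For a function $f:\mathbb{E}\to\mathbb{R}$, by the lower-case bold symbol $\mathbf{f}$ we denote the row vector $\mathbf{f}=(f(e_1),\ldots,f(e_M))$.
The idea is to find an SSD with a transition matrix $P^*$ on the same state space $\mathbb{E}^*=\mathbb{E}$ with a link whose row corresponding to $e_j$ is the stationary distribution of $X$ truncated to $\{e_j\}^\downarrow=\{e: e\preceq e_j\}$, i.e.,
$$\Lambda(e_j,e_i)=\mathbf{1}(e_i\preceq e_j)\,\frac{\pi(e_i)}{H(e_j)},\qquad H(e_j)=\sum_{e\preceq e_j}\pi(e).$$
Note that for all $e\in\mathbb{E}$ we have $\Lambda(e_M,e)=\pi(e)$, as required. For a given ordering let $C(e_i,e_j)=\mathbf{1}(e_i\preceq e_j)$. For the partial ordering we require only that the state which is absorbing for $X^*$, denoted throughout the paper by $e_M$, is the unique maximal one (i.e., $e\preceq e_M$ for all $e\in\mathbb{E}$ and there is no $e\ne e_M$ such that $e'\preceq e$ for all $e'\in\mathbb{E}$). We always identify the ordering $\preceq$ with the matrix $C$, keeping in mind that the enumeration of states in $P$ and $C$ must be the same. Then the link can be written in a matrix form:
$$\Lambda=\mathrm{diag}(\mathbf{H})^{-1}\,C^{T}\,\mathrm{diag}(\boldsymbol{\pi}),$$
where $\mathrm{diag}(\mathbf{f})$ is a diagonal matrix with entries $f(e_1),\ldots,f(e_M)$. The states can always be rearranged in such a way that $e_i\preceq e_j$ implies $i\le j$, which means that $C$ is upper triangular, and thus $C$ and $\Lambda$ are invertible. Often $\mu=C^{-1}$ is called the Möbius function or the Möbius matrix of the partial order $\preceq$. Solving (1) for $P^*$ yields (recall that the transitions of the time reversed chain are given by $\overleftarrow{P}=\mathrm{diag}(\boldsymbol{\pi})^{-1}P^{T}\mathrm{diag}(\boldsymbol{\pi})$)
$$P^*=\Lambda P\Lambda^{-1}=\mathrm{diag}(\mathbf{H})^{-1}\left(C^{-1}\overleftarrow{P}C\right)^{T}\mathrm{diag}(\mathbf{H}),$$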
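which is a stochastic matrix if and only if each entry of $C^{-1}\overleftarrow{P}C$ is non-negative; in other words, if $\overleftarrow{P}$ is Möbius monotone. This way we proved the main part of Theorem 2 of [30]. We include it here, since this is a slightly different (matrix-form) proof. We restate the theorem for completeness, introducing formal definitions of monotonicities first. For a given partial ordering $\preceq$ and any matrix $P$ (not necessarily stochastic) we define $P(e,\{e_j\}^\downarrow)=\sum_{e'\preceq e_j}P(e,e')$ and similarly $P(e,\{e_j\}^\uparrow)=\sum_{e'\succeq e_j}P(e,e')$.
Definition. A Markov chain $X$ with transition matrix $P$ is Möbius monotone if $C^{-1}PC\ge 0$ (each entry non-negative). In terms of transition probabilities, it means that
$$\forall\, e_i,e_j\in\mathbb{E}:\quad \sum_{e\succeq e_i}\mu(e_i,e)\,P(e,\{e_j\}^\downarrow)\ge 0.$$
Recall that for a Möbius function we always have $\mu(e_i,e)=0$ whenever $e_i\not\preceq e$.
Definition. A function $f:\mathbb{E}\to\mathbb{R}$ is Möbius monotone if $\mathbf{f}\,(C^{T})^{-1}\ge 0$ (each entry non-negative). It means that
$$\forall\, e_i\in\mathbb{E}:\quad \sum_{e\succeq e_i}\mu(e_i,e)\,f(e)\ge 0.$$
Remark.
In Lorek, Szekli [30] this Möbius monotonicity (of both function and chain) was called ↓-Möbius monotonicity (see Definitions 2.1 and 2.2 therein).
Definition.
$P$ is ↑-Möbius monotone if $(C^{T})^{-1}PC^{T}\ge 0$ (each entry non-negative).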
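To make the definitions concrete, here is a minimal sketch that tests Möbius monotonicity numerically. It assumes that the criterion is entrywise non-negativity of $C^{-1}PC$ with $C(e,e')=\mathbf{1}(e\preceq e')$, computes the Möbius function by the standard recursion instead of inverting $C$, and uses a hypothetical lazy coordinate-flip chain on $\{0,1\}^2$ with the coordinate-wise order. The function names and the flip probabilities are illustrative choices.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

STATES = list(product((0, 1), repeat=2))


def leq(x, y):
    """Coordinate-wise partial order on {0,1}^2."""
    return all(a <= b for a, b in zip(x, y))


@lru_cache(maxsize=None)
def mobius(x, y):
    """Moebius function: mu(x,x)=1, mu(x,y) = -sum of mu(x,z) over x <= z < y."""
    if x == y:
        return 1
    if not leq(x, y):
        return 0
    return -sum(mobius(x, z) for z in STATES
                if leq(x, z) and leq(z, y) and z != y)


def lazy_flip_chain(q):
    """Hypothetical chain: flip coordinate k with probability q[k], else stay."""
    p = {x: {y: Fraction(0) for y in STATES} for x in STATES}
    for x in STATES:
        p[x][x] += 1 - sum(q)
        for k, qk in enumerate(q):
            y = x[:k] + (1 - x[k],) + x[k + 1:]
            p[x][y] += qk
    return p


def mobius_transform(p):
    """Entries of C^{-1} P C: sum over z >= x of mu(x,z) * P(z, down-set of y)."""
    return {
        (x, y): sum(mobius(x, z) * sum(p[z][w] for w in STATES if leq(w, y))
                    for z in STATES if leq(x, z))
        for x in STATES for y in STATES
    }


def is_mobius_monotone(p):
    return all(v >= 0 for v in mobius_transform(p).values())


if __name__ == "__main__":
    quarter, third = Fraction(1, 4), Fraction(1, 3)
    print(is_mobius_monotone(lazy_flip_chain((quarter, quarter))))
    print(is_mobius_monotone(lazy_flip_chain((third, third))))
```

For this toy chain the entry indexed by the minimal state equals one minus twice the total flip probability, so the check passes for flip probabilities 1/4 and fails for 1/3.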
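Theorem.
[Theorem 2 of [30]] Let $X$ be an ergodic Markov chain on a finite state space $\mathbb{E}=\{e_1,\ldots,e_M\}$, partially ordered by $\preceq$, with a unique maximal state $e_M$, and with a stationary distribution $\pi$. Assume that
- (i) $g(e)=\nu(e)/\pi(e)$ is Möbius monotone,
- (ii) the time reversed chain $\overleftarrow{X}$ is Möbius monotone.
Then there exists a strong stationary dual chain $X^*$ on $\mathbb{E}^*=\mathbb{E}$ with the following link
$$\Lambda(e_j,e_i)=\mathbf{1}(e_i\preceq e_j)\,\frac{\pi(e_i)}{H(e_j)}. \tag{5}$$
Let $H(e_j)=\sum_{e\preceq e_j}\pi(e)$. The SSD chain is uniquely determined by
$$\nu^*(e_i)=H(e_i)\sum_{e\succeq e_i}\mu(e_i,e)\,g(e),\qquad P^*(e_i,e_j)=\frac{H(e_j)}{H(e_i)}\sum_{e\succeq e_j}\mu(e_j,e)\,\overleftarrow{P}(e,\{e_i\}^\downarrow).$$
The following corollary will play a crucial role:
Corollary. The SSD constructed in Theorem 3 is sharp.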
Proof.
The link given in (5) is lower-triangular, thus, by Remark 2.39 in [8], the resulting SSD is sharp. ∎
4. Main results
4.1. General procedure for sharp anti-dual chains
The main contribution is a systematic way of finding a sharp antidual chain (on the same state space $\mathbb{E}$) of some given absorbing chain $X^*$ with the unique absorbing state $e_M$. The idea is clear from the previous section: introduce some partial ordering $\preceq$ and some distribution $\pi$ on $\mathbb{E}$. Then solve (1) for $P$ with the link given in (5). If the resulting matrix $P$ is non-negative, it will be a stochastic matrix of an ergodic Markov chain with the stationary distribution $\pi$. Moreover, changing $\pi$ and/or the ordering will usually yield a different sharp antidual. It means we can have a class of chains, all having the same fastest strong stationary time.
Fix some partial ordering on (expressed by ) having the unique maximal state and some distribution on . For given define
[TABLE]
With slight abuse of notation we will assume that is ↑-Möbius monotone meaning that . Definition 3 was stated for a Markov chain with a transition matrix , note however that does not have to be a stochastic matrix.
Theorem.
Let be an absorbing Markov chain on with the unique absorbing state . Let be the class of all partial orderings on with being unique maximal state. Consider the class of pairs of distributions and partial orderings such that is ↑-Möbius monotone:
[TABLE]
Then for any the chain with the link defined in (5) and with
[TABLE]
is a sharp antidual for , i.e., is a sharp SSD for . Equivalently, , where, for given and , the link is defined in (5).
Proof.
Since is a distribution on and is a stochastic matrix, is a distribution on . By assumption that is ↑-Möbius monotone, the matrix is non-negative. We will show that is its stationary distribution. Let . Last row of is equal to what can be expressed as , thus . We have
[TABLE]
Now we will show that the rows of sum up to 1, i.e., that . We have
[TABLE]
To show we need to show that for any . For diagonal matrices , and a square matrix (all of the same sizes) we have , thus
[TABLE]
Thus, is a stochastic matrix and thus is a Markov chain with the stationary distribution . Since (1) holds, is an SSD for . Theorem 3 and Corollary 3 imply that is a sharp SSD of . ∎
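The procedure can be sketched in a few lines of code. The example below is an illustration under stated assumptions, not the paper's implementation: it takes the link to be $\Lambda(e^*,e)=\pi(e)\mathbf{1}(e\preceq e^*)/H(e^*)$, solves $\Lambda P=P^*\Lambda$ for $P$ by exact matrix inversion, and applies this to a hypothetical two-coupon collector chain on $\{0,1\}^2$ with the coordinate-wise order and a uniform target distribution. All function names and the probabilities 1/3 and 1/2 are illustrative.

```python
from fractions import Fraction
from itertools import product


def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mat_inv(a):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(a)
    m = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        d = m[c][c]
        m[c] = [x / d for x in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def leq(x, y):
    """Coordinate-wise partial order."""
    return all(a <= b for a, b in zip(x, y))


def link(states, pi):
    """Row e* is pi truncated to {e : e <= e*} and renormalised."""
    rows = []
    for top in states:
        h = sum(pi[e] for e in states if leq(e, top))
        rows.append([pi[e] / h if leq(e, top) else Fraction(0) for e in states])
    return rows


def coupon_chain(states, p):
    """Absorbing chain: type k is sampled w.p. p[k]; collected types stay collected."""
    idx = {s: i for i, s in enumerate(states)}
    mat = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        mat[idx[s]][idx[s]] += 1 - sum(p)  # no coupon sampled
        for k, pk in enumerate(p):
            t = s[:k] + (1,) + s[k + 1:]
            mat[idx[s]][idx[t]] += pk
    return mat


def antidual(states, pi, p_star):
    """Solve Lambda P = P* Lambda for P; non-negativity must be checked by the caller."""
    lam = link(states, pi)
    return mat_mul(mat_inv(lam), mat_mul(p_star, lam))


STATES = list(product((0, 1), repeat=2))
PI = {s: Fraction(1, 4) for s in STATES}
P_COUPON = (Fraction(1, 3), Fraction(1, 2))
P = antidual(STATES, PI, coupon_chain(STATES, P_COUPON))

if __name__ == "__main__":
    for state, row in zip(STATES, P):
        print(state, [str(x) for x in row])
```

In this toy case the resulting matrix is a lazy walk on the square that flips coordinate $k$ with half of the corresponding coupon probability, which is non-negative and doubly stochastic, so the uniform distribution is stationary.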
Remark.
If, in addition, within ordering we have a unique minimal state, say , and starts from this state (i.e., ), then the antidual chain also starts from this state, i.e. . This is the case in all examples that follow.
Remark. The condition that is Möbius monotone (w.r.t. and ) is equivalent to non-negativity of the resulting matrix . In examples, it is often more convenient to calculate and directly.
4.2. Antidual chains for a generalized coupon collector problem
Consider different types of coupons. These are sampled independently with replacement. Sampled types are recorded. For let be the probability that the coupon of type is sampled, with . With the remaining probability, i.e., with probability , no coupon is sampled. We start with no coupons of any type. Let be the number of steps it takes to collect coupons of type , for some fixed integers . Let denote that coupon of type was sampled times. If and coupon of type is sampled, the chain does not move. The distribution of is the time to absorption in the state of the chain on the state space with initial distribution and the following transition matrix:
[TABLE]
We refer to $X^*$ as a generalized coupon collector chain. The case and is the classic coupon collector problem, which has a long history, see for example [16]. The term generalized is not used in a unique way. It is used when sequence is general but (e.g., [34]) or when but we are to collect more coupons of each type (see, e.g., [35], [14]). Although the chain given in (7) includes both mentioned generalizations, we consider two antidual chains for two different cases separately:
- a)
for general and with the uniform stationary distribution of antidual chain;
- b)
for general but with more general stationary distribution of antidual chain (including uniform one as special case).
The proofs are postponed to Section 5.1.
For convenience denote and . Define (with 1 on the position ).
Case: general and and a uniform stationary distribution of antidual chain
Theorem.
Let be a generalized coupon collector chain with the transition matrix given in (7) with fixed integers . Moreover, assume that
[TABLE]
Then the chain with and with transition matrix
[TABLE]
is an ergodic Markov chain with uniform distribution on which is a sharp antidual for .
Remark. Note that, for example, for , the condition (8) is always fulfilled.
Roughly speaking, the antidual has the following transitions. Being in state it can increase each coordinate by one (if feasible), it can stay in this state or it can change one of the coordinates to anything smaller. Changing some coordinate depends only on the value of this coordinate, and decreasing coordinate, say from to is constant for all (the probability depends only on and the formula itself is different on the border, i.e., when ).
Case: general and and a non-uniform distribution of antidual chain.
Theorem.
Let be a generalized coupon collector chain with the transition matrix given in (7). Assume that . Let for . Then the chain on the same state space with the initial distribution and the transition matrix
[TABLE]
is an ergodic Markov chain which is a sharp antidual for . The stationary distribution is the following:
[TABLE]
Remark.
The proof of Theorem 4.2 implies that the antidual chain has transitions consistent with partial ordering, i.e., at each step it can stay or it can either change one coordinate from 0 to 1 or vice-versa. This is not the case for any distribution . It can happen, that for some two coordinates change at a time or antidual does not exist (since some entries of can be negative). This is further commented after proof in Remark 5.1.
Taking the following concrete sequences of : or we obtain the following special cases:
Corollary. The chains with a common initial distribution and transition matrices
[TABLE]
and with the respective stationary distributions
[TABLE]
(where , called a level of ) are sharp antidual chains for given in (7).
Remark.
In [30] we considered the chain on with transition matrix given by
[TABLE]
The chain is reversible with product form stationary distribution:
[TABLE]
We showed that the chain is Möbius monotone if and only if . As partial ordering, coordinate-wise was used. Then we obtained the following dual chain:
[TABLE]
which is the absorbing chain (7) we started with, with and . Note that is a special case of given in (10) with .
Corollary. The matrices given in (9) and in (10) have eigenvalues of the form:
[TABLE]
(the multiplicity of which depends on the case).
Proof.
We can order the states of in such a way that given in (7) is upper triangular, thus the eigenvalues are the entries on the diagonal. If the link is invertible (which is the case), then the transition matrices and of SSD have the same set of eigenvalues, which is a direct consequence of relation (1). ∎
Remark.
Fix and . One can ask the following question: For what sequence is the associated stochastically the smallest? Conjecture 2 in [14] suggests that this is in the case of equal probabilities .
4.3. Results on the separation cutoff
Since obtained antidual chains are sharp (i.e., (2) holds), we can present a series of results on the separation cutoff utilizing existing results on the limiting distribution of .
We start with the simplest chain corresponding to the classical coupon collector problem.
Corollary. Consider a sequence of Markov chains indexed by on with an initial distribution and the transition matrix given in (10) with and any for . The stationary distribution is given in (11). The sequence exhibits a separation cutoff at time with window size .
Proof.
Denote the FSST of the chain by . It is known that . Moreover, converges in distribution (as ) to a standard Gumbel random variable (with c.d.f ), see [22].
Taking and we have
[TABLE]
Taking the limits as we have
[TABLE]
Taking the limit as finishes the proof. ∎
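The cutoff profile can be observed numerically. The sketch below assumes equal sampling probabilities $1/n$ (the parameters used in the corollary are not reproduced here), for which the classical limit theorem centres the collection time at $n\log n$ and scales it by $n$. It computes $P(T>t)$ exactly up to floating-point error by tracking the number of distinct types collected, and compares it with the Gumbel limit $1-\exp(-e^{-c})$. Since the antidual is sharp, this tail is also the separation of the antidual at time $t$. The function names are illustrative.

```python
import math


def absorption_tail(n, t):
    """P(T > t): not all n equally likely coupon types seen after t draws."""
    dist = [0.0] * (n + 1)  # dist[j] = P(j distinct types collected so far)
    dist[0] = 1.0
    for _ in range(t):
        new = [0.0] * (n + 1)
        for j, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[j] += mass * j / n  # an already collected type is drawn
            if j < n:
                new[j + 1] += mass * (n - j) / n  # a new type is drawn
        dist = new
    return 1.0 - dist[n]


def gumbel_tail(c):
    """Limit of P(T > n log n + c n) as n grows (standard Gumbel tail)."""
    return 1.0 - math.exp(-math.exp(-c))


def profile(n, cs=(-3, -1, 0, 1, 3)):
    """Rows (c, exact tail at n log n + c n, Gumbel limit)."""
    rows = []
    for c in cs:
        t = max(0, math.floor(n * math.log(n) + c * n))
        rows.append((c, absorption_tail(n, t), gumbel_tail(c)))
    return rows


if __name__ == "__main__":
    for c, exact, limit in profile(200):
        print(f"c={c:+d}  exact={exact:.4f}  gumbel={limit:.4f}")
```

Moving a few multiples of $n$ to the left or right of $n\log n$ takes the tail from nearly one to nearly zero, which is the window behaviour described in the definition of the cutoff.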
Results on the limiting distribution of from [34] let us indicate separation cutoffs for cases with non-constant probabilities . For example, we have the following corollary.
Corollary. Consider a piecewise constant probability density function on :
[TABLE]
where and . Without loss of generality assume that . Consider a sequence of Markov chains indexed by on with an initial distribution and the transition matrix given in (10) with
[TABLE]
and any for . The stationary distribution is given in (11). The sequence exhibits a separation cutoff at time with window size .
Proof.
Denote the FSST of the chain by (which is equal, in distribution, to collecting coupons). We have
[TABLE]
Lemma 3.1 in [34] implies that converges in distribution to a random variable with c.d.f . Thus, we have
[TABLE]
Similarly
[TABLE]
and
[TABLE]
Taking limits as finishes the proof. ∎
Next corollaries utilize results on time until some set of coupons is collected.
Corollary.
Consider a sequence of Markov chains indexed by on with an initial distribution and the transition matrix given in (9) with and (so that (8) holds). The stationary distribution is uniform. The sequence of chains exhibits a separation cutoff at time with window size .
Proof.
In [15] authors derived limiting distribution of showing that
[TABLE]
(where is the Euler-Mascheroni constant) converges in distribution to a standard Gumbel random variable. Similar calculations as in Corollary 4.3 finish the proof.
∎
Recently authors in [14] extended the result of [15] obtaining the limiting distribution of for and for quite general choices of probabilities . Let us indicate here one example (which actually includes result of Corollary 4.3 as a special case).
Corollary.
Consider a sequence of Markov chains indexed by on with an initial distribution and the transition matrix given in (9) with
[TABLE]
and (so that (8) holds). The stationary distribution is uniform. The sequence of chains exhibits a separation cutoff at time with window size .
Proof.
In [14] authors prove that
[TABLE]
converges in distribution to a standard Gumbel random variable. Again, similar calculations as in Corollary 4.3 finish the proof. ∎
4.4. Constructing an ergodic chain with a prespecified FSST and an arbitrary stationary distribution
Let us ask the following question (which was one of the main motivations for the paper):
How to construct a Markov chain on a state space of size with arbitrary stationary distribution whose FSST is deterministic, ?
The recipe is clear from previous sections: start with some absorbing chain for which , where is the absorption time. Probably the simplest one is the following: take with transitions for and and start it at state 1. Then of course we have the desired absorption time, and thus the antidual would have the desired stationary distribution and FSST.
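A minimal sketch of this recipe follows. It assumes the absorbing chain that moves deterministically from state $j$ to $j+1$ until it reaches the top state $N$, a linearly ordered state space, and a link whose rows are truncated and renormalised copies of the target distribution. The row formula used in the code was derived here from the relation $\Lambda P=P^*\Lambda$ for this special case and is not quoted from the paper. States are indexed from 0 in the code, and the function names are illustrative.

```python
from fractions import Fraction


def deterministic_fsst_chain(pi):
    """Antidual of the chain 0 -> 1 -> ... -> N-1 (top state absorbing), linear order."""
    n = len(pi)
    h = [sum(pi[: j + 1]) for j in range(n)]  # h[j] = pi[0] + ... + pi[j]
    # Link: row j is pi truncated to {0, ..., j} and renormalised.
    link = [[pi[k] / h[j] if k <= j else Fraction(0) for k in range(n)]
            for j in range(n)]
    rows = []
    for j in range(n):
        nxt = link[min(j + 1, n - 1)]  # row j of P* times the link
        h_prev = h[j - 1] if j > 0 else Fraction(0)
        row = [(h[j] * a - h_prev * b) / pi[j] for a, b in zip(nxt, link[j])]
        if min(row) < 0:
            raise ValueError("no antidual for this pi under the linear order")
        rows.append(row)
    return rows


def step(dist, mat):
    """One step of the chain: row vector times transition matrix."""
    return [sum(d * m for d, m in zip(dist, col)) for col in zip(*mat)]


if __name__ == "__main__":
    pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
    p = deterministic_fsst_chain(pi)
    dist = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
    for k in range(len(pi)):
        print(k, [str(x) for x in dist])
        dist = step(dist, p)
```

Started at the minimal state, the chain has zero mass at the top state before step $N-1$ (separation equal to one) and is exactly stationary at step $N-1$. The `ValueError` branch signals target distributions for which the computed matrix has a negative entry.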
The above example will be a special case of a more general result. Many absorbing chains have the absorption time distributed as a mixture of sums of independent geometric random variables with parameters being the eigenvalues of the transition matrix. E.g., for stochastically monotone discrete time birth and death chain starting at 1 with being the absorbing state, the time to absorption is distributed as a sum of geometric random variables with parameters being the eigenvalues of the transition matrix (which are positive in this case). This result follows from Karlin and McGregor [23] or Keilson [24]. Fill [19] gave a first stochastic proof of this result using dualities (the result was simultaneously obtained in [9]). This was extended to skip-free Markov chains in Fill [18]. Miclo [33] showed that for any absorbing chain on with positive eigenvalues and some reversibility condition (involving substochastic kernel corresponding to the transition matrix with row and column corresponding to absorbing state removed) there exists a measure such that the time to absorption has distribution
[TABLE]
where are the eigenvalues of the transition matrix sorted in non-increasing order and denotes the distribution of where .
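The simplest instance of this statement, a pure-birth chain, can be checked directly. The sketch below is only an illustration under stated assumptions: states are labelled 0, ..., n-1, the chain stays at state i with probability `lams[i]` and moves to i+1 otherwise, the last state is absorbing, and a geometric variable with parameter lam is taken to have P(G = k) = (1 - lam) * lam**(k - 1) for k = 1, 2, ... (this convention is an assumption of the sketch). The helper names are hypothetical. Since the transition matrix is upper triangular, the values in `lams` are its eigenvalues other than 1.

```python
from fractions import Fraction


def geometric_pmf(lam, t_max):
    """pmf on {0, ..., t_max} of G with P(G = k) = (1 - lam) * lam**(k - 1), k >= 1."""
    return [Fraction(0)] + [(1 - lam) * lam ** (k - 1) for k in range(1, t_max + 1)]


def convolve(a, b, t_max):
    """Convolution of two pmfs on the non-negative integers, truncated at t_max."""
    out = [Fraction(0)] * (t_max + 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j <= t_max:
                out[i + j] += x * y
    return out


def sum_of_geometrics_pmf(lams, t_max):
    """pmf of G_1 + ... + G_m for independent geometrics with parameters lams."""
    pmf = [Fraction(1)] + [Fraction(0)] * t_max  # point mass at 0
    for lam in lams:
        pmf = convolve(pmf, geometric_pmf(lam, t_max), t_max)
    return pmf


def absorption_pmf(lams, t_max):
    """pmf of the absorption time of the pure-birth chain started at state 0."""
    n = len(lams) + 1
    dist = [Fraction(1)] + [Fraction(0)] * (n - 1)
    pmf = [dist[-1]]  # mass already absorbed at time 0
    for _ in range(t_max):
        new = [Fraction(0)] * n
        for i, lam in enumerate(lams):
            new[i] += dist[i] * lam            # stay at i
            new[i + 1] += dist[i] * (1 - lam)  # move one step up
        new[-1] += dist[-1]                    # absorbing state
        pmf.append(new[-1] - dist[-1])         # mass absorbed exactly at this step
        dist = new
    return pmf


lams = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
t_max = 12
pmf_chain = absorption_pmf(lams, t_max)
pmf_sum = sum_of_geometrics_pmf(lams, t_max)
```

Exact rational arithmetic is used so that the two lists can be compared with `==` rather than up to a tolerance.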
For convenience denote . Our result is the following.
Theorem 4.4. Let and . Let be two probability distributions on such that for all . Define the matrix
[TABLE]
Assume that and the sequence are such that the matrix is non-negative. Then the Markov chain with the transition matrix and with the initial distribution given by
[TABLE]
has the FSST distributed as
[TABLE]
and is its stationary distribution. Moreover, are the eigenvalues of .
Note that is a skip-free chain: for given the only nonzero entries of are for . The proof of the theorem is postponed to Section 5.2.
Some interesting special cases of Theorem 4.4 follow easily. Applying Theorem 4.4 with and we obtain the following corollary.
Corollary 4.4. Consider a distribution on such that for all . The Markov chain on with transition matrix
[TABLE]
is ergodic with the stationary distribution . Assume the initial distribution is (i.e., ). Then the chain has a deterministic fastest strong stationary time such that .
Note that for this chain we have
[TABLE]
Thus, this is an extreme example of a separation cutoff: for any the chain is not mixed at all (the separation between the stationary distribution and the distribution at step is 1), and the chain mixes completely exactly at step (the distance is 0).
Simplifying the chain further by additionally taking the uniform distribution in Corollary 4.4, we obtain
[TABLE]
The chain is sketched in Fig. 1.
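The construction behind this extreme example can be checked on a small case. The following is a minimal sketch, not the paper's own computation. It assumes states labelled 0, ..., n-1 in their total order, an absorbing chain that moves deterministically one step up, and the usual link for a total ordering (the stationary distribution truncated to {0, ..., i} and renormalized). The candidate antidual is obtained by solving the duality relation link * P = P_star * link for P. All function names are hypothetical.

```python
from fractions import Fraction


def truncated_link(pi):
    """Row i is pi restricted to {0, ..., i} and renormalized (assumed link)."""
    n = len(pi)
    link = []
    for i in range(n):
        h_i = sum(pi[: i + 1])
        link.append([pi[j] / h_i if j <= i else Fraction(0) for j in range(n)])
    return link


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve_lower_triangular(lower, rhs):
    """Return X with lower * X = rhs, by forward substitution."""
    n = len(lower)
    x = []
    for i in range(n):
        row = [
            (rhs[i][j] - sum(lower[i][k] * x[k][j] for k in range(i))) / lower[i][i]
            for j in range(n)
        ]
        x.append(row)
    return x


def deterministic_absorbing_chain(n):
    """i -> i + 1 with probability 1; the last state is absorbing."""
    p = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1):
        p[i][i + 1] = Fraction(1)
    p[n - 1][n - 1] = Fraction(1)
    return p


def antidual(p_star, pi):
    """Solve link * P = P_star * link; reject the result if it is not non-negative."""
    link = truncated_link(pi)
    p = solve_lower_triangular(link, matmul(p_star, link))
    if any(entry < 0 for row in p for entry in row):
        raise ValueError("resulting matrix is not non-negative for this pi")
    return p


def separation_from_first_state(p, pi, t):
    """max_j (1 - P^t(0, j) / pi(j)) for the chain started at state 0."""
    n = len(pi)
    dist = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(t):
        dist = [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]
    return max(1 - d / q for d, q in zip(dist, pi))


n = 5
pi = [Fraction(1, n)] * n
P = antidual(deterministic_absorbing_chain(n), pi)
seps = [separation_from_first_state(P, pi, t) for t in range(n)]
```

With n = 5 and the uniform distribution, `seps` equals 1 for t = 0, 1, 2, 3 and 0 for t = 4: started at the first state, the chain stays at separation 1 until step n - 1 and is exactly stationary at that step. For a general `pi` the solved matrix need not be non-negative, which is why `antidual` checks it explicitly.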
Two Markov chains on essentially different state spaces with the same FSST
So far in this section we have considered chains on a totally ordered state space . We can also consider other state spaces. We will consider a chain on . We do not present the result in full generality; instead we present two chains, one on and the other on , both with uniform stationary distributions and the same FSST distributed as where for some fixed . Note that, in particular, the sizes of the state spaces are completely different, versus
Corollary.
Fix some integer and . Let be a Markov chain on with an initial distribution and transitions
[TABLE]
Let be a Markov chain on with initial distribution and with transitions
[TABLE]
(Recall that was called a level of ).
Then the FSSTs and of both chains have the same distribution:
[TABLE]
Both chains have the uniform stationary distribution on respective state spaces.
Proof.
We will show that the chains and are sharp antiduals of two different chains and , whose absorption times have the distribution given in the statement.
- Chain
This is a special case of the chain given in Theorem 4.4 with and the uniform stationary distribution . Taking we have that the initial distribution and that FSST is distributed as with . The distribution of is equal to with
- Chain
This is a special case of the chain given in Corollary 4.2 with . Thus, its sharp dual chain is given in (7). Recall that this is the case ; let us explicitly write the transitions of this using the notation from this section:
[TABLE]
Roughly speaking, this is the following random walk on the hypercube . Being at some state either we change one coordinate from 0 to 1 with probability or, with the remaining probability, we do nothing. State is an absorbing state. Since the probability of changing a 0 into a 1 does not depend on the actual state, the time to increase the current level depends only on the level. Being at any state on level the time to reach the next level has distribution (since there are zeros, each of which can be changed into 1 with probability ). Thus, if the chain starts somewhere on level 1, say , then the absorption time is equal in distribution to where . What remains to show is that yields . The proofs of Theorems 4.2 and 4.2 are all based on the coordinate-wise ordering, i.e.,
[TABLE]
Recall the link (it is given in (3))
[TABLE]
We have
[TABLE]
which finishes the proof.
∎
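The level argument used in the proof above can be illustrated on a small hypercube. This is a minimal sketch under stated assumptions: the state space is {0,1}^d, the level of a state is taken to be its number of ones, and each zero coordinate is flipped to 1 with the same probability `p` (with d * p <= 1), the chain staying put otherwise. The value of `p`, the dimension and the function names are hypothetical choices made for the illustration. The sketch compares the absorption-time distribution of the full walk with that of the lumped chain on levels, which moves from level k to k + 1 with probability (d - k) * p.

```python
from fractions import Fraction


def step_hypercube(dist, d, p):
    """One step of the walk: each zero coordinate becomes 1 with probability p."""
    new = {}
    for state, mass in dist.items():
        zeros = [i for i in range(d) if state[i] == 0]
        new[state] = new.get(state, 0) + mass * (1 - len(zeros) * p)
        for i in zeros:
            nxt = state[:i] + (1,) + state[i + 1:]
            new[nxt] = new.get(nxt, 0) + mass * p
    return new


def step_levels(dist, d, p):
    """One step of the lumped chain: level k -> k + 1 with probability (d - k) * p."""
    new = [Fraction(0)] * (d + 1)
    for k, mass in enumerate(dist):
        up = (d - k) * p
        new[k] += mass * (1 - up)
        if k < d:
            new[k + 1] += mass * up
    return new


def absorption_cdf_hypercube(start, p, t_max):
    """P(T <= t), t = 0..t_max, where T is the hitting time of (1, ..., 1)."""
    d = len(start)
    top = (1,) * d
    dist = {start: Fraction(1)}
    cdf = []
    for _ in range(t_max + 1):
        cdf.append(dist.get(top, Fraction(0)))
        dist = step_hypercube(dist, d, p)
    return cdf


def absorption_cdf_levels(level, d, p, t_max):
    """P(T <= t), t = 0..t_max, for the lumped chain started at the given level."""
    dist = [Fraction(0)] * (d + 1)
    dist[level] = Fraction(1)
    cdf = []
    for _ in range(t_max + 1):
        cdf.append(dist[d])
        dist = step_levels(dist, d, p)
    return cdf


d = 4
p = Fraction(1, 6)
cdf_cube = absorption_cdf_hypercube((0, 1, 0, 0), p, 10)
cdf_levels = absorption_cdf_levels(1, d, p, 10)
```

The two lists coincide, which reflects that the time spent on each level depends only on the level and not on the particular state.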
5. Proofs
5.1. Proofs of Theorems 4.2 and 4.2
In both proofs we use the coordinate-wise ordering (defined in (13)) for which is the unique minimal and is the unique maximal one.
Proof of Theorem 4.2.
For the ordering under consideration, directly from Proposition 5 in Rota [37], we find the corresponding Möbius function
[TABLE]
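A product-type Möbius function of this kind can be verified numerically on a small example. The sketch below assumes that the state space is a product of finite chains {0, ..., k_1} x ... x {0, ..., k_n} with the coordinate-wise ordering (the sizes used are hypothetical). It computes the Möbius function from the standard recursion mu(x, x) = 1 and mu(x, y) = -sum of mu(x, z) over x <= z < y, and compares it with the product formula in which each coordinate contributes 1 if unchanged, -1 if increased by exactly one, and 0 otherwise.

```python
from functools import lru_cache
from itertools import product


def leq(x, y):
    """Coordinate-wise ordering."""
    return all(a <= b for a, b in zip(x, y))


def make_mobius(states):
    """Moebius function of the poset (states, leq) via the defining recursion."""

    @lru_cache(maxsize=None)
    def mu(x, y):
        if x == y:
            return 1
        if not leq(x, y):
            return 0
        return -sum(mu(x, z) for z in states if leq(x, z) and leq(z, y) and z != y)

    return mu


def product_formula(x, y):
    """Product over coordinates: 1 if equal, -1 if y_i = x_i + 1, 0 otherwise."""
    value = 1
    for a, b in zip(x, y):
        diff = b - a
        if diff == 0:
            continue
        if diff == 1:
            value = -value
        else:
            return 0
    return value


sizes = (2, 1, 3)  # hypothetical k_1, k_2, k_3
states = tuple(product(*(range(k + 1) for k in sizes)))
mu = make_mobius(states)
mismatches = [
    (x, y) for x in states for y in states if mu(x, y) != product_formula(x, y)
]
```

In this small case `mismatches` is empty, so the recursion and the product formula agree on every pair of states.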
Let
[TABLE]
We will apply Theorem 4.1 with the above ordering and the uniform distribution on , i.e., . Since starts at the minimal state, so does - by Remark 4.1 - the antidual chain. The link is the uniform distribution truncated to , from (4) we have , thus
[TABLE]
The inverse is given by , thus
[TABLE]
Instead of calculating , we will calculate and then directly the antidual chain from (the conditions on -Möbius monotonicity will be read from the resulting antidual, see Remark 4.1). We have to calculate
[TABLE]
Because of the form of , we need only consider states which differ from by at most 1 on each coordinate.
[TABLE]
We need to calculate
[TABLE]
Note that for a given the only nonzero entries of are for or (if ), where (with 1 at position ). We have
[TABLE]
thus
[TABLE]
[TABLE]
For convenience, define
[TABLE]
Consider cases:
- Case 1. Increasing some coordinates: where are distinct integers and . When , the indicators in both and are equal to 0. When , the indicator in is equal to 0, whereas the indicator in can be nonzero only in case , and . Then we have
[TABLE]
- Case 2. Increasing two or more coordinates and decreasing any number of coordinates: for the same reasons as in the previous case, the indicators in both and are equal to 0.
- Case 3. Decreasing some coordinates: , where are distinct integers and .
Let , where and for . In (15) we sum over all such that . Let us split this sum into two sums over disjoint sets and , where
[TABLE]
Consider . Since it is incomparable with it means that for some we have such that and . Then the indicator in is equal to 0. The second indicator can be nonzero only when and . Thus, for any we have that for all such that . We have
[TABLE]
[TABLE]
The indicator is nonzero only when , and for , thus
[TABLE]
since the second sum does not depend on .
Consider . Then indicators in both and are nonzero, we have
[TABLE]
Consider cases:
, i.e., we decrease only one coordinate. In this case with only one 1 at position . Thus there are only two such that namely or . We have
[TABLE]
Note that for all the corresponding terms (for and ) are the same, thus they sum up to 0. The remaining terms:
[TABLE]
Finally, we have
[TABLE]
. Things are different in this case. Consider and fixed , where . Then there are different in , of which exactly give and exactly give , so that the terms or vanish (depending on the value of ). This implies that . For example, for and, for simplicity, for , there are the following four terms in :
[TABLE]
which sum up to 0.
Remark: In case for fixed there was no corresponding which could make the terms vanish.
- Case 4. Increasing one, decreasing another coordinate: . We have shown that increasing/decreasing coordinates has probability 0, thus there is no need to consider the case where we increase and decrease any number of coordinates in one step.
In this case the indicator in is zero. Concerning . Let, with one 1 at position . Note that for , the indicator in is also 0. Thus, the only nonzero terms are for either or (and then ):
[TABLE]
which sums to 0.
- Case 5. Staying at the same state: . Then the indicator is nonzero only when , whereas the indicator is nonzero when and any or when . We have
[TABLE]
The assumption (8) implies that . We have considered all the transitions. Let us check that each row of calculated sums up to 1. We have (with the convention )
[TABLE]
∎
Proof of Theorem 4.2.
Note that is the minimal state, and starts at this state , thus - by Remark 4.1 - this is also the initial distribution of the antidual chain, i.e., .
For convenience, define
[TABLE]
For the stationary distribution given in (11) we have
[TABLE]
[TABLE]
Denote
[TABLE]
The sum in the denominator of can be split into two sums: for and . We have
[TABLE]
Let us proceed with .
[TABLE]
Note that is not a stochastic matrix, since we have
[TABLE]
Now, calculating the antidual chain from Theorem 4.1, we have
[TABLE]
[TABLE]
where we applied the Möbius function for this ordering: (a consequence of (14)). We proceed with (16) by considering cases:
- Case 1. Increasing some coordinates: for some distinct integers .
First note that if , then, for any we have , thus .
For the sum in (16) is the following: ; the only nonzero term is for , thus
[TABLE]
- Case 2. Increasing two or more coordinates and decreasing any number of coordinates: for the same reasons as in the previous case (we would have to increase at least two coordinates in one step), such a transition has probability 0.
- Case 3: . Let us split into five disjoint sets:
[TABLE]
where means that and , and means that and are incomparable. Define also
[TABLE]
We have
[TABLE]
Let us consider cases and separately.
, i.e., . Note that then . We have
[TABLE]
We have and finally
[TABLE]
. Consider first . Assume thus that We have
[TABLE]
Summing up, , which is also the case for (the proof, although longer, is quite similar; we skip the details). This means that for
[TABLE]
- Case 4. Increasing one, decreasing another coordinate: . We have shown that increasing/decreasing coordinate has probability 0, thus it suffices to consider only changing two coordinates (one increasing, the other decreasing). Then the summands are nonzero only for or ; we have
[TABLE]
thus .
- Case 5. Staying at the same state: . Then we have
[TABLE]
The first term is equal to ; in the latter, the only possibility is to change the -th coordinate of to one:
[TABLE]
Finally, we obtain the matrix given in (10). ∎
Remark 5.1.
Showing that relied heavily on the fact that for the stationary distribution given in (11), we had and it did not depend on . That is why the terms and cancelled out. Similarly, it is the reason why decreasing coordinates has probability 0. For other stationary distributions, not of product form, such transitions are possible.
5.2. Proof of Theorem 4.4
Let be an absorbing chain on with transition matrix:
[TABLE]
where, for convenience, we set . Let be its initial distribution. This is a pure birth chain, thus its absorption time is distributed as (12). We will show that is its sharp antidual chain.
We consider the total ordering . Then the link given in (3) reads
[TABLE]
The inverse can be easily derived:
[TABLE]
Let us calculate
[TABLE]
Calculating transitions of the antidual chain:
[TABLE]
Consider separately the cases:
- •
. Then . This is nonzero only if or .
[TABLE]
- •
. We have
[TABLE]
Thus,
[TABLE]
- •
. We have
[TABLE]
Thus,
[TABLE]
Consider three sub-cases:
. Then we have
[TABLE]
. Then we have
[TABLE]
. Then we have
[TABLE]
For we obviously have . For we have
[TABLE]
Thus (cf. (6)) we considered all the cases. The only thing left to calculate is the initial distribution of the antidual chain. Using relation (1) we have
[TABLE]
The matrix is upper-triangular, thus are its eigenvalues. Because of the relation (1) these are also the eigenvalues of .
Acknowledgements
The author thanks anonymous reviewers for thorough reviews and appreciates the comments and suggestions, which contributed to improving the quality of the publication.
References
- [1] D. Aldous and P. Diaconis. Shuffling cards and stopping times. American Mathematical Monthly, 93(5):333–348, 1986.
- [2] D. Aldous and P. Diaconis. Strong Uniform Times and Finite Random Walks. Advances in Applied Mathematics, 8:69–97, 1987.
- [3] R. Basu, J. Hermon, and Y. Peres. Characterization of cutoff for reversible Markov chains. Annals of Probability, 45(3):1448–1487, 2017.
- [4] G.-Y. Chen and L. Saloff-Coste. The cutoff phenomenon for ergodic Markov processes. Electronic Journal of Probability, 13:26–78, 2008.
- [5] G.-Y. Chen and L. Saloff-Coste. Computing cutoff times of birth and death chains. Electronic Journal of Probability, 20:1–47, 2015.
- [6] M. C. H. Choi and P. Patie. A Sufficient Condition for Continuous-Time Finite Skip-Free Markov Chains to Have Real Eigenvalues. In: Bélair J., Frigaard I., Kunze H., Makarov R., Melnik R., Spiteri R. (eds) Mathematical and Computational Approaches in Advancing Modern Science and Engineering, pages 529–536, 2016.
- [7] S. B. Connor. Separation and coupling cutoffs for tuples of independent Markov processes. Latin American Journal of Probability and Mathematical Statistics, 7(3):65–77, 2010.
- [8] P. Diaconis and J. A. Fill. Strong stationary times via a new form of duality. The Annals of Probability, 18(4):1483–1522, 1990.
