Antiduality and Möbius monotonicity: Generalized Coupon Collector Problem

Paweł Lorek

arXiv:1903.00247·math.PR·March 4, 2019

TL;DR

This paper introduces a systematic method to find antidual Markov chains related to a generalized coupon collector problem, revealing cutoff phenomena and constructing chains with prescribed stationary distributions.

Contribution

It develops a new approach based on Möbius monotonicity to identify antidual chains and applies it to generalized coupon collector problems, highlighting cutoff behavior.

Findings

01

Identified several sharp antidual chains for coupon collector models.

02

Demonstrated cutoff phenomena with specific window sizes.

03

Constructed chains with prescribed stationary distributions and mixing times.

Abstract

For a given absorbing Markov chain $X^{*}$ on a finite state space, a chain $X$ is a sharp antidual of $X^{*}$ if the fastest strong stationary time of $X$ is equal, in distribution, to the absorption time of $X^{*}$. In this paper we show a systematic way of finding such an antidual based on some partial ordering of the state space. We use a theory of strong stationary duality developed recently for Möbius monotone Markov chains. We give several sharp antidual chains for Markov chain corresponding to a generalized coupon collector problem. As a consequence - utilizing known results on a limiting distribution of the absorption time - we indicate a separation cutoff (with its window size) in several chains. We also present a chain which (under some conditions) has a prescribed stationary distribution and its fastest strong stationary time is distributed as a prescribed mixture of sums of…

Equations

\nu=\nu^{*}\Lambda\quad\textrm{and}\quad\Lambda\mathbf{P}=\mathbf{P}^{*}\Lambda.

\mathbf{P}^{*}=\Lambda\mathbf{P}\Lambda^{-1}\quad\textrm{and}\quad\nu^{*}=\nu\Lambda^{-1}.

\mathbf{P}=\Lambda^{-1}\mathbf{P}^{*}\Lambda\quad\textrm{and}\quad\nu=\nu^{*}\Lambda.

sep(\nu\mathbf{P}^{k},\pi)=P(T_{FSST}>k)=P(T^{*}>k).

\begin{array}[]{lll}\displaystyle\lim_{c\to\infty}&\displaystyle\limsup_{d\to\infty}&\displaystyle sep(\nu_{(d)}\mathbf{P}^{t_{d}+cw_{d}}_{(d)},\pi_{(d)})=0,\\[6.0pt] \displaystyle\lim_{c\to\infty}&\displaystyle\liminf_{d\to\infty}&\displaystyle sep(\nu_{(d)}\mathbf{P}^{t_{d}-cw_{d}}_{(d)},\pi_{(d)})=1.\end{array}

\Lambda(\mathbf{e}_{i},\mathbf{e}_{j})=\frac{\pi(\mathbf{e}_{j})}{\sum_{\mathbf{e}^{\prime}:\mathbf{e}^{\prime}\preceq\mathbf{e}_{i}}\pi(\mathbf{e}^{\prime})}\mathbf{1}(\mathbf{e}_{j}\preceq\mathbf{e}_{i}).

\Lambda=(\mathbf{diag}(\boldsymbol{\pi}\mathbf{C}))^{-1}\mathbf{C}^{T}\mathbf{diag}(\boldsymbol{\pi}),

P^{*}

\forall(\mathbf{e}_{i},\mathbf{e}_{j}\in\mathbb{E})\quad\sum_{\mathbf{e}:\mathbf{e}\succeq\mathbf{e}_{i}}\mu(\mathbf{e}_{i},\mathbf{e})\mathbf{P}(\mathbf{e},\{\mathbf{e}_{j}\}^{\downarrow})\geq 0.

\forall(\mathbf{e}_{i}\in\mathbb{E})\quad\sum_{\mathbf{e}:\mathbf{e}\succeq\mathbf{e}_{i}}\mu(\mathbf{e}_{i},\mathbf{e})f(\mathbf{e})\geq 0.

\Lambda=(\mathbf{diag}(\boldsymbol{\pi}\mathbf{C}))^{-1}\mathbf{C}^{T}\mathbf{diag}(\boldsymbol{\pi}).

\begin{array}[]{llllrlllllllll}\nu^{*}&=&\nu\Lambda^{-1}&\textrm{ i.e.,}&\nu^{*}(\mathbf{e}_{i})&=&\displaystyle H(\mathbf{e}_{i})\sum_{\mathbf{e}:\mathbf{e}\succeq\mathbf{e}_{i}}\mu(\mathbf{e}_{i},\mathbf{e})g(\mathbf{e}),\\[12.0pt] \mathbf{P}^{*}&=&\Lambda\mathbf{P}\Lambda^{-1},&\textrm{ i.e.,}&\mathbf{P}^{*}(\mathbf{e}_{i},\mathbf{e}_{j})&=&\displaystyle{H(\mathbf{e}_{j})\over H(\mathbf{e}_{i})}\sum_{\mathbf{e}:\mathbf{e}\succeq\mathbf{e}_{j}}\mu(\mathbf{e}_{j},\mathbf{e})\overleftarrow{\mathbf{P}}(\mathbf{e},\{\mathbf{e}_{i}\}^{\downarrow}).&\\ \end{array}

\widehat{\mathbf{P}}^{*}=\mathbf{diag}(\boldsymbol{\pi}\mathbf{C})\,\mathbf{P}^{*}\,(\mathbf{diag}(\boldsymbol{\pi}\mathbf{C}))^{-1}.

\mathcal{P}(\mathbf{P}^{*})=\{(\pi,\mathbf{C}):\mathbf{C}\in\mathcal{C},\ \widehat{\mathbf{P}}^{*}\textrm{ is }{}^{\uparrow}\textrm{-Möbius monotone}\}.

\nu=\nu^{*}\Lambda,\quad\mathbf{P}=(\mathbf{diag}(\boldsymbol{\pi}))^{-1}(\mathbf{C}^{T})^{-1}\,\widehat{\mathbf{P}}^{*}\,\mathbf{C}^{T}\mathbf{diag}(\boldsymbol{\pi})

\pi\mathbf{P}=\pi\Lambda^{-1}\mathbf{P}^{*}\Lambda=\eta\mathbf{P}^{*}\Lambda=\eta\Lambda=\pi.

\mathbf{P}(1,\ldots,1)^{T}

\sum_{\mathbf{e}^{\prime}\in\mathbb{E}^{*}}\left((\mathbf{diag}(\boldsymbol{\pi}))^{-1}(\mathbf{C}^{T})^{-1}\mathbf{diag}(\boldsymbol{\pi}\mathbf{C})\right)(\mathbf{e},\mathbf{e}^{\prime})

\mathbf{P}^{*}((i_{1},\ldots,i_{d}),(i^{\prime}_{1},\ldots,i^{\prime}_{d}))=\left\{\begin{array}[]{llllllll}p_{j}&\textrm{if}&i_{j}^{\prime}=i_{j}+1,i_{k}^{\prime}=i_{k},k\neq j,\\[10.0pt] \displaystyle 1-\sum_{k=1}^{d}p_{k}+\sum_{k:i_{k}=N_{k}}p_{k}&\textrm{if}&i_{j}^{\prime}=i_{j},1\leq j\leq d.\end{array}\right.

\sum_{j=1}^{d}\left(1-\frac{1}{N_{j}(N_{j}+1)}\right)p_{j}\leq 1.

\left\{\begin{array}[]{lllllllllll}\displaystyle{i_{k}^{(1)}+1\over i_{k}^{(1)}+2}p_{k}&\textrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}+\mathbf{s}_{k},\\[18.0pt] \displaystyle\left({\mathbf{1}(i_{k}^{(1)}<N_{k})\over(i_{k}^{(1)}+1)(i_{k}^{(1)}+2)}+{\mathbf{1}(i_{k}^{(1)}=N_{k})\over N_{k}+1}\right)p_{k}&\textrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}-m\cdot\mathbf{s}_{k}\\ &&\textrm{\ with \ }1\leq m\leq i_{k},\\[12.0pt] \displaystyle 1-\sum_{j:i^{(1)}_{j}<N_{j}}\left(1-{1\over(i^{(1)}_{j}+1)(i^{(1)}_{j}+2)}\right)p_{j}-\sum_{j:i^{(1)}_{j}=N_{j}}{N_{j}\over N_{j}+1}p_{j}&\textrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}\end{array}\right.

\mathbf{P}(\mathbf{i}^{(1)},\mathbf{i}^{(2)})=\left\{\begin{array}[]{lllllllllll}\displaystyle a_{k}p_{k}&\rm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}+\mathbf{s}_{k},\\[12.0pt] \displaystyle\displaystyle 1-\sum_{j:i^{(1)}_{j}=0}a_{j}p_{j}-\sum_{j:i^{(1)}_{j}=1}(1-a_{j})p_{j}&\rm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)},\\[12.0pt] \displaystyle(1-a_{k})p_{k}&\rm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}-\mathbf{s}_{k},\\ \end{array}\right.

\begin{array}[]{lllllll}\displaystyle\pi(\mathbf{i})&=&\displaystyle\prod_{j=1}^{d}[a_{j}\mathbf{1}(i_{j}=1)+(1-a_{j})\mathbf{1}(i_{j}=0)].\\ \end{array}

\begin{array}{lll}\mathbf{P}_{1}(\mathbf{i}^{(1)},\mathbf{i}^{(2)})&=&\left\{\begin{array}{lll}\displaystyle{1\over 2}p_{k}&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}+\mathbf{s}_{k},\\[7.0pt] \displaystyle 1-{1\over 2}\sum_{j=1}^{d}p_{j}&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)},\\[7.0pt] \displaystyle{1\over 2}p_{k}&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}-\mathbf{s}_{k}.\end{array}\right.\\[10.0pt] \mathbf{P}_{2}(\mathbf{i}^{(1)},\mathbf{i}^{(2)})&=&\left\{\begin{array}{lll}\displaystyle{b\over{a+b}}p_{k}&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}+\mathbf{s}_{k},\\[7.0pt] \displaystyle 1-{b\over a+b}\sum_{j:i^{(1)}_{j}=0}p_{j}-{a\over a+b}\sum_{j:i^{(1)}_{j}=1}p_{j}&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)},\\[7.0pt] \displaystyle{a\over{a+b}}p_{k}&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}-\mathbf{s}_{k}.\end{array}\right.\end{array}

\pi_{1}(\mathbf{i})=\frac{1}{2^{d}},\qquad\pi_{2}(\mathbf{i})=\frac{a^{d-|\mathbf{i}|}b^{|\mathbf{i}|}}{(a+b)^{d}}

\mathbf{P}_{3}(\mathbf{i}^{(1)},\mathbf{i}^{(2)})=\left\{\begin{array}[]{llllll}\displaystyle\alpha_{k}&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}+\mathbf{s}_{k},\\[7.0pt] \displaystyle 1-\sum_{j:i^{(1)}_{j}=0}\alpha_{j}-\sum_{j:i^{(1)}_{j}=1}\beta_{j}&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)},\\[7.0pt] \displaystyle\beta_{k}&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}-\mathbf{s}_{k}.\\ \end{array}\right.

\pi_{3}(\mathbf{i})=\prod_{j:i_{j}=0}\frac{\beta_{j}}{\alpha_{j}+\beta_{j}}\prod_{j:i_{j}=1}\frac{\alpha_{j}}{\alpha_{j}+\beta_{j}}.

\mathbf{P}^{*}(\mathbf{i}^{(1)},\mathbf{i}^{(2)})=\left\{\begin{array}[]{llllll}\displaystyle\alpha_{k}+\beta_{k}&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)}+\mathbf{s}_{k},\\[7.0pt] \displaystyle 1-\sum_{j:i^{(1)}_{j}=0}(\alpha_{j}+\beta_{j})&\mathrm{if}&\mathbf{i}^{(2)}=\mathbf{i}^{(1)},\end{array}\right.

\lambda_{A}=1-\sum_{k\in A}p_{k},\quad\textrm{for }A\subseteq\{1,\ldots,d\}

\begin{array}{lll}sep(\nu_{(d)}\mathbf{P}_{d}^{d\log d+cd},\pi_{d})&=&P(T_{d}^{*}>d\log d+cd)=1-P\left({1\over d}(T_{d}^{*}-d\log d)\leq c\right),\\[10.0pt] sep(\nu_{(d)}\mathbf{P}_{d}^{d\log d-cd},\pi_{d})&=&P(T_{d}^{*}>d\log d-cd)=1-P\left({1\over d}(T_{d}^{*}-d\log d)\leq-c\right).\end{array}

Full text

Antiduality and Möbius monotonicity: Generalized Coupon Collector Problem

Paweł Lorek

Mathematical Institute, University of Wrocław, pl. Grunwaldzki 2/4, 50-384 Wrocław, Poland.

Abstract.

For a given absorbing Markov chain $X^{*}$ on a finite state space, a chain $X$ is a sharp antidual of $X^{*}$ if the fastest strong stationary time of $X$ is equal, in distribution, to the absorption time of $X^{*}$. In this paper we show a systematic way of finding such an antidual based on some partial ordering of the state space. We use a theory of strong stationary duality developed recently for Möbius monotone Markov chains. We give several sharp antidual chains for the Markov chain corresponding to a generalized coupon collector problem. As a consequence, utilizing known results on the limiting distribution of the absorption time, we indicate a separation cutoff (with its window size) in several chains. We also present a chain which (under some conditions) has a prescribed stationary distribution and whose fastest strong stationary time is distributed as a prescribed mixture of sums of geometric random variables.

Key words and phrases:

Markov chains; Strong stationary duality; Antiduality; Absorption times; fastest strong stationary times; Möbius monotonicity; Generalized coupon collector problem; Double Dixie cup problem; separation cutoff; partial ordering; perfect simulation

1991 Mathematics Subject Classification:

60J10, 60G40, 06A06

Work supported by NCN Research Grant DEC-2013/10/E/ST1/00359

1. Introduction

Strong stationary times (SSTs) are a probabilistic tool for bounding the rate of convergence to stationarity for Markov chains. Aldous and Diaconis [1], [2] gave several examples of chains where SSTs were found ad hoc. Later, the authors of [8] introduced a more systematic way of finding SSTs. For a given general ergodic chain they showed that one can construct a so-called strong stationary dual (SSD) chain, whose absorption time is equal (not only in distribution) to some SST, via the coupling of the chain with its SSD presented in [8]. Moreover, they proved that there always exists a sharp SSD, in the sense that its corresponding SST is stochastically the smallest, in which case it is called the fastest strong stationary time (FSST).

Their construction for general chains is purely theoretical (it involves the knowledge of the distribution of the chain at each step). However, they give a detailed recipe for constructing such an SSD assuming that the time-reversed chain is stochastically monotone w.r.t. a linear ordering. In particular, they consider birth and death chains, for which the SST has the same distribution as the absorption time of a dual chain, which turns out to be an absorbing birth and death chain. They also show that, assuming the time-reversed chain is stochastically monotone, one can always construct a set-valued SSD (see their Section 3.4, “greedy construction of a set-valued dual”). In this paper we actually start with some absorbing chain and show that it is a sharp SSD of a class (which we indicate) of ergodic chains. We exploit the results from [30], where the authors provided a recipe for constructing an SSD on the same state space for chains whose time reversal is Möbius monotone w.r.t. some partial ordering of the state space. This significantly enlarges the class of chains for which an SSD can be found, since in many chains the natural underlying ordering of the state space is only partial. Moreover, the method yields a sharp SSD, which is crucial for our applications.

When studying the rate of convergence of a chain to its stationary distribution, one is often interested in the so-called mixing time (i.e., the time until the chain is “close” to its stationary distribution). However, sometimes we can say much more than just estimate a mixing time, by showing that a so-called cutoff phenomenon occurs. Roughly speaking, this phenomenon describes a sharp transition in the convergence of the chain to its stationary distribution over a negligible period of time (the cutoff window). The two most commonly studied variants are the separation cutoff and the total variation cutoff, which differ in the distance used to measure the convergence (separation vs. total variation distance).

The total variation cutoff was first shown for random transposition card shuffling in [12]. The name comes from [1], where the authors showed that top-to-random card shuffling exhibits a total variation cutoff. The separation cutoff has recently been studied in a few contexts. For example, in [11] the authors gave necessary and sufficient conditions for the existence of a separation cutoff for birth and death chains (they use duality theory to convert convergence rates to hitting times, together with Keilson’s representation of first hitting times): there is a cutoff if and only if the product of the spectral gap and the mixing time tends to infinity. This was extended in [4], where the authors show that there is a cutoff measured in the $L^{p}$-norm ($1<p\leq\infty$) if and only if the product of the spectral gap and the max-$L^{p}$ mixing time tends to infinity. The computation of the cutoff time and window size in a variety of birth and death chains is given in [5]; a separation cutoff for skip-free chains was given in [32]; some other specific chains were considered in [7]; and in [17] the author gives a formula for the separation for the Tsetlin library chain, specifying weights for which there is and for which there is no separation cutoff. Several examples of both separation and total variation cutoffs are given in [26], and a characterization of the total variation cutoff for lazy chains (i.e., with probability of staying $\geq 1/2$) was recently given in [3]. In [6] the authors give a sufficient condition for skip-free chains to have real eigenvalues; they use Siegmund duality (actually antiduality), and the transitions of their (anti)dual resemble some chains we obtain for a coupon collector problem. It is worth mentioning that although a sequence of birth and death chains exhibits a total variation cutoff if and only if it exhibits a separation cutoff [11], [13], this is not the case (in general) for other chains, as shown in [21].

As mentioned before, the FSST is equal in distribution to the absorption time of the sharp SSD chain. Thus, there is a close relation between a sharp SSD and a separation cutoff. Roughly speaking, this cutoff can be studied by studying the limiting distribution of the absorption time of the SSD. This can be an extremely difficult task. However, since examples of chains with a proven separation cutoff are always welcome, we can reverse the procedure: starting with some absorbing chain, we can try to find an ergodic sharp antidual chain (or even a class of such antidual chains). Such an approach was considered in [19] in the context of birth and death chains only. A connection between a separation cutoff and a coupon collector problem (including some generalizations, e.g., sampling $k>1$ different coupons at a time) was given in [36].

Using this approach we will indicate a separation cutoff time and a window size in several examples of chains, utilizing (nontrivial) results on the limiting distribution of the absorption time in some generalizations of the classical coupon collector problem. That is why we need a recipe for sharp antidual chains, which will be given based on results from [30]. Most of the examples that follow deal with product-type chains. It is however worth noting that taking a product of chains, each of which exhibits a cutoff, does not have to yield a chain (on a product space) exhibiting a cutoff. Such an example was recently given in [25].

The absorption time of many absorbing chains is distributed as a mixture of sums of geometric random variables with parameters being the eigenvalues of the transition matrix. E.g., the absorption time of a discrete-time birth and death chain starting at the minimal state, with the maximal state being absorbing, is distributed as a sum of geometric random variables with such parameters, provided the chain is stochastically monotone. The result is usually attributed to Karlin and McGregor [23] or Keilson [24]. Fill [19] gave a stochastic proof of this result, also using the theory of SSD (the result was simultaneously obtained in [9]); it was later extended to skip-free Markov chains in Fill [18]. Miclo [33] showed that for a large class of absorbing chains on a finite state space, the absorption time is distributed as a mixture of sums of geometric random variables. A natural question arises: given a mixture of sums of geometric random variables and some distribution $\pi$, can we find an ergodic chain whose stationary distribution is $\pi$ and whose FSST is equal in distribution to this mixture? Or, as a special case: given some distribution $\pi$, can we construct an ergodic chain with stationary distribution $\pi$ and a deterministic FSST? We provide positive answers to both questions (some assumptions on the distributions are needed). In particular, we present two ergodic chains on completely different state spaces having the same FSST.

The main goals of the paper are: i) to give a systematic way (based on a partial ordering of the state space and Möbius monotonicity) of finding a class of sharp antidual chains; ii) to present nontrivial antidual chains related to some generalizations of the coupon collector problem and, as a consequence, to show cutoff phenomena in some cases; iii) to present a construction of a chain with a prescribed FSST and a prescribed stationary distribution.

There is yet another potential application which served as a motivation for the paper (however, not exploited here): given a probability distribution $\pi$ on $\mathbb{E}$, how does one simulate a sample from this distribution? Markov Chain Monte Carlo methods provide an answer: construct a chain with stationary distribution $\pi$ and run it long enough. The most common algorithms for such constructions are the Metropolis-Hastings algorithm and the Gibbs sampler (for studies on the rate of convergence of Metropolis-Hastings algorithms see, e.g., [10]; the cutoff for the Gibbs sampler for the Ising model on the lattice was studied in [31]). This paper suggests an alternative approach: given $\pi$ on $\mathbb{E}$, find some absorbing chain on $\mathbb{E}$ and then calculate a sharp antidual chain having this $\pi$ as its stationary distribution. Knowing, e.g., the expectation and variance of the absorption time, one can quite precisely determine the number of steps needed for simulation. Moreover, having a sharp SSD can actually allow for a perfect simulation from the distribution $\pi$. One can construct an appropriate coupling of the absorbing chain and its antidual, so that stopping the antidual chain when its SSD is absorbed yields an unbiased sample from $\pi$. The reader is referred for details to [8] (Section 2.4), [20] (Section 1.1) or [29] (Section 2.3, Algorithm 4). We want to emphasize that utilizing this was not the purpose of this paper, and the stationary distributions which appear in most of the examples are of product form, which means we can easily simulate them coordinate by coordinate.

The paper is organized as follows. In Section 2 we introduce preliminaries on strong stationary duality and separation cutoff. In Section 3 we recall the notion of Möbius monotonicity and give a matrix-form proof of the result from [30]. In Section 4 we present our main results. First, in Section 4.1, Theorem 4.1 gives a systematic way of finding a class of sharp antidual chains. Second, in Section 4.2 we introduce in detail the chain corresponding to the generalized coupon collector problem and present sharp antidual chains in Theorems 4.2 and 4.2. Then, in Section 4.3, we present separation cutoff results for some cases. In Section 4.4 we present our results concerning the construction of an ergodic chain with a prescribed stationary distribution and a prescribed FSST. Section 5 includes the main proofs: Section 5.1 contains the proofs of Theorems 4.2 and 4.2, whereas Section 5.2 contains the proof of Theorem 4.4.

2. Preliminaries

2.1. Strong stationary duality

Consider an ergodic (i.e., irreducible and aperiodic) Markov chain $X\sim(\nu,\mathbf{P})$ on a finite state space $\mathbb{E}=\{\mathbf{e}_{1},\ldots,\mathbf{e}_{M}\}$ with an initial distribution $\nu$ , a stationary distribution $\pi$ and a transition matrix $\mathbf{P}$ . Let $\mathbb{E}^{*}=\{\mathbf{e}_{1}^{*},\ldots,\mathbf{e}_{N}^{*}\}$ be a state space of an absorbing Markov chain $X^{*}\sim(\nu^{*},\mathbf{P}^{*})$ , whose unique absorbing state and unique irreducible class is denoted by $\mathbf{e}_{N}^{*}$ . Define $\Lambda$ , a matrix of size $N\times M$ , to be a link if it is a stochastic matrix with the property: $\Lambda(\mathbf{e}_{N}^{*},\mathbf{e})=\pi(\mathbf{e})$ for all $\mathbf{e}\in\mathbb{E}$ . We say that $X^{*}$ is a strong stationary dual (SSD) of $X$ with link $\Lambda$ if

$$\nu=\nu^{*}\Lambda\quad\textrm{and}\quad\Lambda\mathbf{P}=\mathbf{P}^{*}\Lambda.\tag{1}$$

Diaconis and Fill [8] prove that the absorption time $T^{*}$ of $X^{*}$ is then a so-called strong stationary time (SST) for $X$, i.e., a random variable $T$ such that $X_{T}$ has distribution $\pi$ and $T$ is independent of $X_{T}$. The main application is in studying the rate of convergence of an ergodic chain to its stationary distribution, since for such a random variable we always have $d_{TV}(\nu\mathbf{P}^{k},\pi)\leq sep(\nu\mathbf{P}^{k},\pi):=\max_{\mathbf{e}\in\mathbb{E}}\left(1-{\nu\mathbf{P}^{k}(\mathbf{e})/\pi(\mathbf{e})}\right)\leq P(T>k)$, where $d_{TV}$ stands for total variation distance and $sep$ stands for separation. Note that $sep$ is not symmetric and thus is not a distance between probability measures. The corresponding $T^{*}$ is sharp if $sep(\nu\mathbf{P}^{k},\pi)=P(T^{*}>k)$. In such a case, $T^{*}$ is called the fastest strong stationary time for $X$, which we denote by $T_{FSST}$. For more details on this duality consult [8]. Moreover, the duality relation (1) allows for stochastic constructions; see, e.g., [19], where a stochastic proof for the passage time distribution of a birth and death chain was given.
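To make the two distances concrete, the following minimal sketch computes $d_{TV}$ and $sep$ for a hypothetical two-state chain and records both at each step. The transition matrix, the initial distribution and the helper names (`step`, `d_tv`, `sep`) are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction as F

def step(dist, P):
    """One step of the chain: row vector times transition matrix."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def d_tv(mu, pi):
    """Total variation distance between two distributions."""
    return sum(abs(m - p) for m, p in zip(mu, pi)) / 2

def sep(mu, pi):
    """Separation of mu from pi: max over states of 1 - mu(e)/pi(e)."""
    return max(1 - m / p for m, p in zip(mu, pi))

# Hypothetical two-state ergodic chain (illustrative numbers only).
P = [[F(2, 3), F(1, 3)],
     [F(1, 6), F(5, 6)]]
pi = [F(1, 3), F(2, 3)]   # stationary distribution: pi P = pi
nu = [F(1), F(0)]         # start in state 0

assert step(pi, P) == pi  # sanity check of stationarity

dist = nu
history = []              # entries (k, d_TV, sep) for nu P^k
for k in range(6):
    history.append((k, d_tv(dist, pi), sep(dist, pi)))
    dist = step(dist, P)

for k, tv, s in history:
    print(k, float(tv), float(s))
```

Exact rational arithmetic is used so that the inequality $d_{TV}\leq sep$ can be checked without rounding noise; for larger chains floating point would be the more practical choice.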

Note that once we fix $\mathbb{E}^{*}$ and a link $\Lambda$, and if there exists a right-inverse $\Lambda^{-1}$ of $\Lambda$, we can simply calculate from (1):

$$\mathbf{P}^{*}=\Lambda\mathbf{P}\Lambda^{-1}\quad\textrm{and}\quad\nu^{*}=\nu\Lambda^{-1}.$$

If the resulting $\mathbf{P}^{*}$ is a stochastic, irreducible and aperiodic matrix and $\nu^{*}$ is a probability distribution, then (it will always correspond to an absorbing chain) we have found an SSD. However, we can start with some already absorbing chain $\mathbf{P}^{*}$ , then find some $\mathbb{E}$ and some probability distribution $\pi$ on $\mathbb{E}$ , and a link $\Lambda$ , so that

$$\mathbf{P}=\Lambda^{-1}\mathbf{P}^{*}\Lambda\quad\textrm{and}\quad\nu=\nu^{*}\Lambda.$$

If the resulting $\mathbf{P}$ is a stochastic matrix, then $X\sim(\nu,\mathbf{P})$ is an ergodic chain with stationary distribution $\pi$, and $T^{*}$ (the time to absorption of $X^{*}$) is an SST for $X$. In such a case, $X$ is called an antidual of $X^{*}$. Moreover, if we know that for some class of links relation (1) implies that $T^{*}$ is sharp (see Corollary 3), then we can possibly find many different antiduals, all of which have the same fastest strong stationary time $T^{*}$, which has a phase-type distribution. In such a case $X$ is called a sharp antidual of $X^{*}$.
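The antidual recipe can be checked by hand on the smallest possible case. The sketch below assumes a two-state absorbing chain $\mathbf{P}^{*}$ on $\{0,1\}$, a target distribution $\pi$, and a link whose rows are $\pi$ restricted (and renormalized) to $\{0\}$ and to $\{0,1\}$; the numbers `alpha`, `beta` and the helpers `matmul`, `vecmat`, `inverse` are hypothetical choices for illustration. It computes $\mathbf{P}=\Lambda^{-1}\mathbf{P}^{*}\Lambda$ and records, for each $k$, the separation of $\nu\mathbf{P}^{k}$ from $\pi$ next to $P(T^{*}>k)$.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def vecmat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def inverse(A):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical parameters (illustrative only).
alpha, beta = F(1, 3), F(1, 6)
pi = [beta / (alpha + beta), alpha / (alpha + beta)]

# Absorbing chain: leaves state 0 with probability alpha + beta, state 1 absorbs.
P_star = [[1 - (alpha + beta), alpha + beta],
          [F(0), F(1)]]
# Link: row 0 is pi restricted to {0}, row 1 is pi itself.
Lam = [[F(1), F(0)], pi]

P = matmul(matmul(inverse(Lam), P_star), Lam)   # candidate antidual
nu_star = [F(1), F(0)]
nu = vecmat(nu_star, Lam)

dist, dist_star = nu, nu_star
checks = []   # pairs (sep(nu P^k, pi), P(T* > k))
for k in range(8):
    separation = max(1 - m / p for m, p in zip(dist, pi))
    tail = 1 - dist_star[-1]   # mass not yet absorbed after k steps
    checks.append((separation, tail))
    dist, dist_star = vecmat(dist, P), vecmat(dist_star, P_star)

print(P)
```

For these numbers the resulting $\mathbf{P}$ moves from 0 to 1 with probability `alpha` and from 1 to 0 with probability `beta`, and the two entries of every pair in `checks` coincide, which is the sharpness property in this toy case.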

2.2. Separation cutoff

The forthcoming Theorem 4.1 gives a recipe for constructing a sharp antidual chain $X$ with a specified stationary distribution $\pi$, given an absorbing chain $X^{*}$, both on the same state space. This means that we have

$$sep(\nu\mathbf{P}^{k},\pi)=P(T_{FSST}>k)=P(T^{*}>k).$$

Thus, studying the distribution of $T_{FSST}$ is equivalent to studying the distribution of $T^{*}$. Furthermore, a separation cutoff can be studied by studying the properties of $T^{*}$. In what follows, we introduce the notion of separation cutoff. Since the definition of the cutoff involves an increasing state space, we add a subscript $(d)$ to transition matrices, distributions, state spaces and absorption times. Suppose we have a sequence of ergodic Markov chains $X_{(d)}\sim(\nu_{(d)},\mathbf{P}_{(d)})$ indexed by $d=1,2,\ldots$ Denote by $\pi_{(d)}$ the stationary distribution of $X_{(d)}$. We say that this sequence exhibits a **separation cutoff at time** $t_{d}$ with a window size $w_{d}=o(t_{d})$ if

$$\begin{array}{lll}\displaystyle\lim_{c\to\infty}&\displaystyle\limsup_{d\to\infty}&\displaystyle sep(\nu_{(d)}\mathbf{P}^{t_{d}+cw_{d}}_{(d)},\pi_{(d)})=0,\\[6.0pt] \displaystyle\lim_{c\to\infty}&\displaystyle\liminf_{d\to\infty}&\displaystyle sep(\nu_{(d)}\mathbf{P}^{t_{d}-cw_{d}}_{(d)},\pi_{(d)})=1.\end{array}$$

If the convergence to stationarity is measured in total variation distance, we speak of a total variation cutoff.
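As a numerical illustration of this definition, consider the classical coupon collector with $d$ equally likely coupons, taking $t_{d}=d\log d$ and $w_{d}=d$. The sketch below (function names are hypothetical) computes the tail $P(T^{*}_{d}>k)$ of the collection time by tracking only the number of distinct coupons collected, and evaluates it at $k=d\log d\pm cd$. For any chain whose separation equals this tail, the two values drifting towards 0 and 1 as $c$ grows is the behaviour the definition asks for; a finite $d$ can of course only suggest the limit.

```python
import math

def coupon_tail(d, k):
    """P(T > k): probability that k draws have not yet shown all d coupons."""
    prob = [0.0] * (d + 1)   # prob[m] = P(exactly m distinct coupons seen so far)
    prob[0] = 1.0
    for _ in range(k):
        new = [0.0] * (d + 1)
        for m in range(d + 1):
            if prob[m] == 0.0:
                continue
            new[m] += prob[m] * m / d                # draw an already seen coupon
            if m < d:
                new[m + 1] += prob[m] * (d - m) / d  # draw a new coupon
        prob = new
    return 1.0 - prob[d]

def tails_around_cutoff(d, c):
    """Tail of T at d*log(d) + c*d and at d*log(d) - c*d (rounded to integers)."""
    t = d * math.log(d)
    late = coupon_tail(d, int(round(t + c * d)))
    early = coupon_tail(d, max(0, int(round(t - c * d))))
    return late, early

for c in (1, 2, 4):
    late, early = tails_around_cutoff(100, c)
    print(c, round(late, 4), round(early, 4))
```

The dynamic program over the number of collected coupons is used instead of the inclusion-exclusion formula because the latter is an alternating sum with large binomial coefficients and loses precision in floating point.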

3. Möbius monotonicity and duality

In general, there is no recipe for finding an SSD, i.e., a triplet $\mathbb{E}^{*},\mathbf{P}^{*},\Lambda$. In [8] the authors give a recipe for a dual on the same state space $\mathbb{E}^{*}=\mathbb{E}$ provided that the time-reversed chain $\overleftarrow{X}$ is stochastically monotone with respect to a total ordering. In [30] we give an extension of this result to state spaces which are only partially ordered by $\preceq$. Then, provided that the time-reversed chain $\overleftarrow{X}$ is Möbius monotone (plus some conditions on the initial distribution), we give a formula for a sharp SSD on the same state space $\mathbb{E}^{*}=\mathbb{E}$.

Möbius monotonicity seems to be the natural notion for extending the main result of [8] to partially ordered state spaces. In [28] we show that it is equivalent to the existence of a Siegmund dual of a chain with a given partial ordering. For a linearly ordered state space, stochastic monotonicity of a chain is required for the existence of a Siegmund dual (see [38]), and stochastic monotonicity of the time reversal is required for the existence of an SSD with a link being a truncated stationary distribution (see [8]). Both results fail for non-linear orderings, since both require Möbius monotonicity, which, in general, is different from the stochastic one. The two monotonicities are equivalent for a linear ordering. For more relations between these (and other) monotonicities consult [29], and for applications of Siegmund duality to some generalizations of the gambler’s ruin problem consult [27]. We will introduce this monotonicity by trying to solve (1) with some given link $\Lambda$.

We consider a finite state space $\mathbb{E}=\{\mathbf{e}_{1},\ldots,\mathbf{e}_{M}\}$ partially ordered by $\preceq$ such that $\mathbf{e}_{M}$ is the unique maximal state. For a function $f:\mathbb{E}\to\mathbf{R}$ , by lower-case bold symbol $\boldsymbol{f}$ we denote the row vector $\boldsymbol{f}=(f(\mathbf{e}_{1}),\ldots,f(\mathbf{e}_{M}))$ .

The idea is to find an SSD $X^{*}$ with a transition matrix $\mathbf{P}^{*}$ on the same state space $\mathbb{E}^{*}=\mathbb{E}$ with a link, whose row corresponding to $\mathbf{e}$ is a stationary distribution of $X$ truncated to $\{\mathbf{e}\}^{\downarrow}:=\{\mathbf{e}^{\prime}:\mathbf{e}^{\prime}\preceq\mathbf{e}\}$ , i.e.,

[TABLE]

Note that for all $\mathbf{e}\in\mathbb{E}$ we have $\Lambda(\mathbf{e}_{M},\mathbf{e})=\pi(\mathbf{e})$ , as required. For a given ordering let $\mathbf{C}(\mathbf{e}_{i},\mathbf{e}_{j})=\mathbf{1}(\mathbf{e}_{i}\preceq\mathbf{e}_{j})$ . For the partial ordering we require only that the state which is absorbing for $X^{*}$ , denoted throughout the paper by $\mathbf{e}_{M}$ , is the unique maximal one (i.e., $\mathbf{C}(\mathbf{e}_{M},\mathbf{e}_{j})=\mathbf{1}(\mathbf{e}_{j}=\mathbf{e}_{M})$ for all $j$ and there is no $\mathbf{e}_{M_{2}}\neq\mathbf{e}_{M}$ such that $\mathbf{C}(\mathbf{e}_{M_{2}},\mathbf{e}_{j})=\mathbf{1}(\mathbf{e}_{j}=\mathbf{e}_{M_{2}})$ for all $j$ ). We always identify ordering $\preceq$ with the matrix $\mathbf{C}$ , keeping in mind, that enumeration of states in $\mathbf{C}$ and $\mathbf{P}$ must be the same. Then the link can be written in a matrix form:

[TABLE]

where $\mathbf{diag}(\boldsymbol{g})$ is a diagonal matrix with entries $g(\mathbf{e}_{1}),\ldots,g(\mathbf{e}_{M})$ . The states can always be rearranged in such a way that $\mathbf{C}(\mathbf{e}_{i},\mathbf{e}_{j})=1$ implies $i\leq j$ , which means that $\mathbf{C}$ is upper triangular with ones on the diagonal, so $\mathbf{C}$ , and thus $\Lambda$ , are invertible. Often, $\mu\equiv\mathbf{C}^{-1}$ is called the Möbius function or the Möbius matrix of the partial order $\preceq$ . Solving (1) for $\mathbf{P}^{*}$ yields (recall that the transitions of time reversed chains are given by $\overleftarrow{\mathbf{P}}=(\mathbf{diag}(\boldsymbol{\pi}))^{-1}\mathbf{P}^{T}(\mathbf{diag}(\boldsymbol{\pi}))$ )

[TABLE]

which is a stochastic matrix if and only if each entry of $\mathbf{C}^{-1}\overleftarrow{\mathbf{P}}\mathbf{C}$ is non-negative, in other words we say that $\overleftarrow{\mathbf{P}}$ is Möbius monotone. This way we proved the main part of Theorem 2 of [30]. We include it here, since this is a little bit different (matrix-form) proof. We will restate the theorem for completeness, introducing formal definitions of monotonicities first. For given partial ordering $\preceq$ and any matrix $\mathbf{P}$ (not necessarily stochastic) we define $\mathbf{P}(\mathbf{e},\{\mathbf{e}_{j}\}^{\downarrow})=\sum_{\mathbf{e}^{\prime}:\mathbf{e}^{\prime}\preceq\mathbf{e}_{j}}\mathbf{P}(\mathbf{e},\mathbf{e}^{\prime})$ and similarly $\mathbf{P}(\mathbf{e},\{\mathbf{e}_{j}\}^{\uparrow})=\sum_{\mathbf{e}^{\prime}:\mathbf{e}^{\prime}\succeq\mathbf{e}_{j}}\mathbf{P}(\mathbf{e},\mathbf{e}^{\prime})$ . {dfntn} Markov chain $X$ is Möbius monotone if $\displaystyle\mathbf{C}^{-1}\mathbf{P}\mathbf{C}\geq 0$ (each entry non-negative). In terms of transition probabilities, it means that

[TABLE]

Recall that for a Möbius function we always have $\mu(\mathbf{e}_{i},\mathbf{e})=0$ whenever $\mathbf{e}_{i}\npreceq\mathbf{e}$ . {dfntn} A function $f:\mathbb{E}\to\mathbf{R}$ is Möbius monotone if $\mathbf{f}(\mathbf{C}^{T})^{-1}\geq 0$ (each entry non-negative). It means that

[TABLE]

{rmrk}

In Lorek, Szekli [30] this Möbius monotonicity (of both, function and chain) was called ↓-Möbius monotonicity (see Definitions 2.1 and 2.2 therein).

{dfntn}

$X$ is ↑-Möbius monotone if $\displaystyle(\mathbf{C}^{T})^{-1}\mathbf{P}\mathbf{C}^{T}\geq 0$ (each entry non-negative).
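The two matrix conditions above can be checked mechanically on small examples. The following is a minimal sketch in exact rational arithmetic; the helper names (`matmul`, `inverse`, `order_matrix`, `is_mobius_monotone`, `is_up_mobius_monotone`) and the two toy chains on the totally ordered space $\{1,2,3\}$ are illustrative assumptions, not objects from the paper.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(A)
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def order_matrix(states, leq):
    """C[i][j] = 1 if states[i] precedes-or-equals states[j], else 0."""
    return [[F(int(leq(a, b))) for b in states] for a in states]

def is_mobius_monotone(P, C):
    """True if every entry of C^{-1} P C is non-negative."""
    M = matmul(matmul(inverse(C), P), C)
    return all(x >= 0 for row in M for x in row)

def is_up_mobius_monotone(P, C):
    """True if every entry of (C^T)^{-1} P C^T is non-negative."""
    Ct = [list(col) for col in zip(*C)]
    return is_mobius_monotone(P, Ct)

# Toy data (assumed for illustration): total order 1 <= 2 <= 3.
C = order_matrix([1, 2, 3], lambda a, b: a <= b)
lazy_walk = [[F(1, 2), F(1, 2), 0],
             [F(1, 4), F(1, 2), F(1, 4)],
             [0, F(1, 2), F(1, 2)]]
swap = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]  # exchanges states 1 and 3

print(is_mobius_monotone(lazy_walk, C), is_mobius_monotone(swap, C))
```

For a total order both conditions reduce to the usual stochastic monotonicity, which the lazy walk satisfies and the swap chain violates.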

{thrm}

[Theorem 2 of [30]] Let $X\sim(\nu,\mathbf{P})$ be an ergodic Markov chain on a finite state space $\mathbb{E}=\{\mathbf{e}_{1},\ldots,\mathbf{e}_{M}\}$ , partially ordered by $\preceq$ , with a unique maximal state $\mathbf{e}_{M}$ , and with a stationary distribution $\pi$ . Assume that

(i)

$g(\mathbf{e})={\nu(\mathbf{e})\over\pi(\mathbf{e})}$ is Möbius monotone,

(ii)

time reversed chain $\overleftarrow{X}$ is Möbius monotone.

Then there exists a strong stationary dual chain $X^{*}\sim(\nu^{*},\mathbf{P}^{*})$ on $\mathbb{E}^{*}=\mathbb{E}$ with the following link

[TABLE]

Let $H(\mathbf{e})=\sum_{\mathbf{e}^{\prime}\preceq\mathbf{e}}\pi(\mathbf{e}^{\prime})$ . The SSD chain is uniquely determined by

[TABLE]

The following Corollary will play a crucial role: {crllr} The SSD constructed in Theorem 3 is sharp.

Proof.

The link given in (5) is lower-triangular, thus, by Remark 2.39 in [8], the resulting SSD is sharp. ∎
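As an illustration of the construction, the sketch below builds the link $\Lambda=(\mathbf{diag}(\boldsymbol{\pi}\mathbf{C}))^{-1}\mathbf{C}^{T}\mathbf{diag}(\boldsymbol{\pi})$ and the dual $\mathbf{P}^{*}=\Lambda\mathbf{P}\Lambda^{-1}$ for a small reversible chain. The three-state lazy walk, its stationary distribution and all function names are assumptions chosen for the example; the code only checks that the resulting matrix is stochastic with state 3 absorbing.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(A)
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def link(pi, C):
    """Row i of the link is pi truncated to the down-set of state i."""
    n = len(pi)
    H = [sum(pi[i] * C[i][j] for i in range(n)) for j in range(n)]  # H = pi C
    return [[C[j][i] * pi[j] / H[i] for j in range(n)] for i in range(n)]

def strong_stationary_dual(P, pi, C):
    """Solve Lambda P = P* Lambda for P*."""
    L = link(pi, C)
    return matmul(matmul(L, P), inverse(L))

# Assumed toy chain: lazy walk on {1,2,3}, reversible w.r.t. pi = (1/4, 1/2, 1/4).
C = [[F(int(i <= j)) for j in range(3)] for i in range(3)]
pi = [F(1, 4), F(1, 2), F(1, 4)]
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]

P_star = strong_stationary_dual(P, pi, C)
for row in P_star:
    print([str(x) for x in row])
```

Here the time reversal equals $\mathbf{P}$ and is Möbius monotone for the total order, so every entry of the computed dual is non-negative.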

4. Main results

4.1. General procedure for sharp anti-dual chains

The main contribution is a systematic way of finding a sharp antidual chain (on the same state space $\mathbb{E}=\mathbb{E}^{*}$ ) of a given absorbing chain $X^{*}\sim(\nu^{*},\mathbf{P}^{*})$ with the unique absorbing state $\mathbf{e}_{M}$ . The idea is clear from the previous section: introduce some partial ordering and some distribution $\pi$ on $\mathbb{E}$ . Then solve $\Lambda\mathbf{P}=\mathbf{P}^{*}\Lambda$ for $\mathbf{P}$ with the link given in (5). If the resulting matrix is non-negative, it will be a stochastic matrix of an ergodic Markov chain $X$ with the stationary distribution $\pi$ . Moreover, changing $\pi$ and/or the ordering will usually yield a different sharp antidual. It means we can have a class of chains, all having the same fastest strong stationary time $T_{FSST}$ .

Fix some partial ordering $\preceq$ on $\mathbb{E}^{*}$ (expressed by $\mathbf{C}$ ) having the unique maximal state $\mathbf{e}_{M}$ and some distribution $\pi$ on $\mathbb{E}$ . For given $\mathbf{P}^{*}$ define

[TABLE]

With slight abuse of notation we will assume that $\widehat{\mathbf{P}^{*}}$ is ↑-Möbius monotone meaning that $(\mathbf{C}^{T})^{-1}\widehat{\mathbf{P}^{*}}\mathbf{C}^{T}\geq 0$ . Definition 3 was stated for a Markov chain $X$ with a transition matrix $\mathbf{P}$ , note however that $\widehat{\mathbf{P}^{*}}$ does not have to be a stochastic matrix.

{thrm}

Let $\mathbf{X}^{*}\sim(\nu^{*},\mathbf{P}^{*})$ be an absorbing Markov chain on $\mathbb{E}^{*}=\{\mathbf{e}_{1},\ldots,\mathbf{e}_{M}\}$ with the unique absorbing state $\mathbf{e}_{M}$ . Let $\mathcal{C}$ be the class of all partial orderings on $\mathbb{E}^{*}$ with $\mathbf{e}_{M}$ being unique maximal state. Consider the class of pairs of distributions and partial orderings such that $\widehat{\mathbf{P}^{*}}$ is ↑-Möbius monotone:

[TABLE]

Then for any $(\pi,\mathbf{C})\in\mathcal{P}(\mathbf{P}^{*})$ the chain $X\sim(\nu,\mathbf{P})$ with the link $\Lambda$ defined in (5) and with

[TABLE]

is a sharp antidual for $\mathbf{P}^{*}$ , i.e., $\mathbf{P}^{*}$ is a sharp SSD for $\mathbf{P}$ . Equivalently, $\mathbf{P}=\Lambda^{-1}\mathbf{P}^{*}\Lambda$ , where, for given $\pi$ and $\mathbf{C}$ , the link is defined in (5).

Proof.

Since $\nu^{*}$ is a distribution on $\mathbb{E}$ and $\Lambda$ is a stochastic matrix, $\nu$ is a distribution on $\mathbb{E}$ . By the assumption that $\widehat{\mathbf{P}^{*}}$ is ↑-Möbius monotone, the matrix $\mathbf{P}$ is non-negative. We will show that $\pi$ is its stationary distribution. Let $\boldsymbol{\eta}=(0,\ldots,0,1)$ . The last row of $\Lambda$ is equal to $\boldsymbol{\pi}$ , which can be expressed as $\boldsymbol{\eta}\Lambda=\boldsymbol{\pi}$ , thus $\boldsymbol{\eta}=\boldsymbol{\pi}\Lambda^{-1}$ . We have

[TABLE]

Now we will show that the rows of $\mathbf{P}$ sum up to 1, i.e., that $\mathbf{P}(1,\ldots,1)^{T}=(1,\ldots,1)^{T}$ . We have

[TABLE]

To show $(*)$ we need to show that $\sum_{\mathbf{e}^{\prime}\in\mathbb{E}^{*}}((\mathbf{diag}\boldsymbol{\pi})^{-1}(\mathbf{C}^{T})^{-1}\mathbf{diag}(\boldsymbol{\pi}\mathbf{C}))(\mathbf{e},\mathbf{e}^{\prime})=1$ for any $\mathbf{e}\in\mathbb{E}^{*}$ . For diagonal matrices $\mathbf{D}_{1}$ , $\mathbf{D}_{2}$ and a square matrix $\mathbf{A}$ (all of the same sizes) we have $\mathbf{D}_{1}\mathbf{A}\mathbf{D}_{2}(\mathbf{e},\mathbf{e}^{\prime})=$ $\mathbf{D}_{1}(\mathbf{e},\mathbf{e})\mathbf{A}(\mathbf{e},\mathbf{e}^{\prime})\mathbf{D}_{2}(\mathbf{e}^{\prime},\mathbf{e}^{\prime})$ , thus

[TABLE]

Thus, $\mathbf{P}$ is a stochastic matrix and thus $X\sim(\nu,\mathbf{P})$ is a Markov chain with the stationary distribution $\pi$ . Since (1) holds, $X^{*}$ is an SSD for $X$ . Theorem 3 and Corollary 3 imply that $X^{*}$ is a sharp SSD of $X$ . ∎

{rmrk}

If, in addition, within ordering $\preceq$ we have a unique minimal state, say $\mathbf{e}_{1}$ , and $X^{*}$ starts from this state (i.e., $\nu^{*}=\delta_{\mathbf{e}_{1}}$ ), then the antidual chain also starts from this state, i.e. $\nu=\delta_{\mathbf{e}_{1}}$ . This is the case in all examples that follow.

{rmrk} The condition that $\widehat{\mathbf{P}^{*}}$ is ↑-Möbius monotone (w.r.t. $\pi$ and $\mathbf{C}$ ) is equivalent to non-negativity of the resulting matrix $\mathbf{P}$ . In examples, it is often more convenient to calculate $\Lambda$ and $\Lambda^{-1}$ directly.
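The procedure of this subsection can be sketched numerically as follows: fix $\pi$ and $\mathbf{C}$, form the link, and compute $\mathbf{P}=\Lambda^{-1}\mathbf{P}^{*}\Lambda$ directly, reading non-negativity off the result. The absorbing chain used below (two coupon types, one copy of each, with assumed values $p_{1}=1/3$, $p_{2}=1/2$), the uniform $\pi$, the coordinate-wise ordering and the function names are illustrative choices only.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(A)
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def link(pi, C):
    """Lambda = diag(pi C)^{-1} C^T diag(pi)."""
    n = len(pi)
    H = [sum(pi[i] * C[i][j] for i in range(n)) for j in range(n)]
    return [[C[j][i] * pi[j] / H[i] for j in range(n)] for i in range(n)]

def antidual(P_star, pi, C):
    """Return P = Lambda^{-1} P* Lambda; the caller checks non-negativity."""
    L = link(pi, C)
    return matmul(matmul(inverse(L), P_star), L)

# States listed so that the maximal state (1,1) comes last.
states = [(0, 0), (1, 0), (0, 1), (1, 1)]
leq = lambda a, b: all(x <= y for x, y in zip(a, b))
C = [[F(int(leq(a, b))) for b in states] for a in states]
pi = [F(1, 4)] * 4
p1, p2 = F(1, 3), F(1, 2)          # assumed sampling probabilities
P_star = [[1 - p1 - p2, p1, p2, 0],
          [0, 1 - p2, 0, p2],
          [0, 0, 1 - p1, p1],
          [0, 0, 0, 1]]

P = antidual(P_star, pi, C)
is_nonnegative = all(x >= 0 for row in P for x in row)
pi_P = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
```

In this toy case the computed matrix is non-negative, has rows summing to one and leaves the uniform distribution invariant, in line with the theorem.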

4.2. Antidual chains for a generalized coupon collector problem

Consider $d$ different types of coupons. These are sampled independently with replacement. Sampled types are recorded. For $1\leq k\leq d$ let $p_{k}>0$ be the probability that the coupon of type $k$ is sampled, with $\sum_{k=1}^{d}p_{k}\leq 1$ . With the remaining probability, i.e., with probability $1-\sum_{k=1}^{d}p_{k}$ , no coupon is sampled. We start with no coupons of any type. Let $T^{*}$ be the number of steps it takes to collect $N_{j}$ coupons of type $j$ , $j=1,\ldots,d$ for some fixed integers $N_{1},\ldots,N_{d}$ . Let $(i_{1},\ldots,i_{d})$ denote the state in which a coupon of type $j$ has been recorded $i_{j}$ times. If $i_{j}=N_{j}$ and a coupon of type $j$ is sampled, the chain does not move. Then $T^{*}$ is distributed as the time to absorption in the state $(N_{1},\ldots,N_{d})$ of the chain $X^{*}\sim(\nu^{*},\mathbf{P}^{*})$ on the state space $\mathbb{E}^{*}=\{(i_{1},\ldots,i_{d}):0\leq i_{j}\leq N_{j},1\leq j\leq d\}$ with initial distribution $\nu^{*}=\delta_{(0,\ldots,0)}$ and the following transition matrix:

[TABLE]

We refer to $\mathbf{P}^{*}$ as a generalized coupon collector chain. The case $N_{j}=1,j=1,\ldots,d$ and $p_{k}=1/d$ is the classic coupon collector problem, which has a long history, see for example [16]. The term generalized is not used consistently in the literature. It is used when the sequence $\{p_{k}\}$ is general but $N_{1}=\ldots=N_{d}=1$ (e.g., [34]) or when $p_{k}=1/d$ but we are to collect more coupons of each type (see, e.g., [35], [14]). Although the chain $\mathbf{P}^{*}$ given in (7) includes both mentioned generalizations, we consider two antidual chains for two different cases separately:

a)

for general $N_{j}\geq 1$ and $p_{j},j=1,\ldots,d$ with the uniform stationary distribution of antidual chain;

b)

for general $p_{j}$ but $N_{j}=1,j=1,\ldots,d$ with more general stationary distribution of antidual chain (including uniform one as special case).

The proofs are postponed to Section 5.1.
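The verbal description of the generalized coupon collector chain can be turned into a small constructor. This is a sketch based only on the description in the text (a sampled coupon of type $k$ increments coordinate $k$ unless $i_{k}=N_{k}$, in which case the chain stays); the function name `collector_chain` and the dictionary representation are assumptions made for the example.

```python
from fractions import Fraction as F
from itertools import product

def collector_chain(N, p):
    """Build the absorbing chain as a dict of dicts.

    N[k] is the required number of coupons of type k and p[k] the
    probability of sampling type k (the p[k] sum to at most 1).
    """
    states = list(product(*(range(n + 1) for n in N)))
    P = {s: {} for s in states}
    for s in states:
        stay = 1 - sum(p)                  # no coupon sampled
        for k, pk in enumerate(p):
            if s[k] < N[k]:
                nxt = s[:k] + (s[k] + 1,) + s[k + 1:]
                P[s][nxt] = pk             # one more coupon of type k
            else:
                stay += pk                 # type k already complete
        P[s][s] = stay
    return states, P

# Assumed example: two types, N = (2, 1), p = (1/2, 1/3).
states, P = collector_chain((2, 1), (F(1, 2), F(1, 3)))
print(len(states), P[(2, 1)])
```

The state $(N_{1},\ldots,N_{d})$ is the only one whose self-loop has probability one, so it is the unique absorbing state.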

For convenience denote $\mathbf{i}=(i_{1},\ldots,i_{d})$ and $\mathbf{i}^{(k)}=(i_{1}^{(k)},\ldots,i_{d}^{(k)})$ . Define $\mathbf{s}_{k}:=(0,\ldots,1,\ldots,0)$ (with 1 on the position $k$ ).

Case: general $N_{j}\geq 1$ and $p_{j},j=1,\ldots,d$ and a uniform stationary distribution of antidual chain

{thrm}

Let $X^{*}\sim(\nu^{*},\mathbf{P}^{*})$ be a generalized coupon collector chain with the transition matrix given in (7) with fixed integers $N_{j}\geq 1,j=1,\ldots,d$ . Moreover, assume that

[TABLE]

Then the chain $X\sim(\nu,\mathbf{P})$ with $\nu=\delta_{(0,\ldots,0)}$ and with transition matrix

$\mathbf{P}(\mathbf{i}^{(1)},\mathbf{i}^{(2)})=$

[TABLE]

is an ergodic Markov chain with uniform distribution on $\mathbb{E}=\mathbb{E}^{*}$ which is a sharp antidual for $\mathbf{P}^{*}$ .

{rmrk} Note that, for example, for $N_{1}=\ldots=N_{d}=1$ the condition (8) is always fulfilled.

Roughly speaking, the antidual has the following transitions. Being in state $(i_{1},\ldots,i_{d})$ it can increase one coordinate by one (if feasible), it can stay in this state or it can change one of the coordinates to anything smaller. The probability of changing a coordinate depends only on the value of this coordinate, and the probability of decreasing a coordinate, say from $i_{j}$ to $i_{j}-m$ , is the same for all $1\leq m<i_{j}$ (it depends only on $i_{j}$ , and the formula itself is different on the border, i.e., when $i_{j}=N_{j}$ ).

Case: general $p_{j}$ and $N_{j}=1,j=1,\ldots,d$ and a non-uniform distribution of antidual chain.

{thrm}

Let $X^{*}\sim(\nu^{*},\mathbf{P}^{*})$ be a generalized coupon collector chain with the transition matrix given in (7). Assume that $N_{1}=\ldots=N_{d}=1$ . Let $a_{k}\in(0,1)$ for $k=1,\ldots,d$ . Then the chain $X\sim(\nu,\mathbf{P})$ on the same state space $\mathbb{E}=\mathbb{E}^{*}=\{0,1\}^{d}$ with the initial distribution $\nu=\nu^{*}=\delta_{(0,\ldots,0)}$ and the transition matrix

[TABLE]

is an ergodic Markov chain which is a sharp antidual for $\mathbf{P}^{*}$ . The stationary distribution is the following:

[TABLE]

{rmrk}

The proof of Theorem 4.2 implies that the antidual chain $X\sim(\nu,\mathbf{P})$ has transitions consistent with partial ordering, i.e., at each step it can stay or it can either change one coordinate from 0 to 1 or vice-versa. This is not the case for any distribution $\pi$ . It can happen, that for some $\pi$ two coordinates change at a time or antidual does not exist (since some entries of $\mathbf{P}$ can be negative). This is further commented after proof in Remark 5.1.

Taking the following concrete sequences of $a_{k}$ : $a_{k}={b\over a+b}$ or $a_{k}={1\over 2},k=1,\ldots,d$ we obtain the following special cases: {crllr} The chains $X^{(i)}\sim(\nu,\mathbf{P}_{i}),i=1,2$ with a common initial distribution $\nu=\delta_{(0,\ldots,0)}$ and transition matrices

[TABLE]

and with the respective stationary distributions

[TABLE]

(where $|\mathbf{i}|=\sum_{j=1}^{d}i_{j}$ , called a level of $\mathbf{i}$ ) are sharp antidual chains for $\mathbf{P}^{*}$ given in (7).

{rmrk}

In [30] we considered the chain on $\mathbb{E}=\{0,1\}^{d}$ with transition matrix $\mathbf{P}_{3}$ given by

[TABLE]

The chain is reversible with product form stationary distribution:

[TABLE]

We showed that the chain is Möbius monotone if and only if $\sum_{j=1}^{d}(\alpha_{j}+\beta_{j})\leq 1$ . As partial ordering, coordinate-wise was used. Then we obtained the following dual chain:

[TABLE]

which is the absorbing dual (7) we started with, with $p_{j}=\alpha_{j}+\beta_{j}$ and $N_{j}=1,j=1,\ldots,d$ . Note that $\mathbf{P}_{3}$ is a special case of $\mathbf{P}$ given in (10) with $a_{j}={\alpha_{j}\over\alpha_{j}+\beta_{j}}$ .

{crllr} The matrices $\mathbf{P}$ given in (9) and in (10) have eigenvalues of the form:

[TABLE]

(the multiplicity of which depends on the case).

Proof.

We can order the states of $X^{*}$ in such a way that $\mathbf{P}^{*}$ given in (7) is upper triangular, thus the eigenvalues are the entries on the diagonal. If the link $\Lambda$ is invertible (which is the case), then relation (1) gives $\mathbf{P}=\Lambda^{-1}\mathbf{P}^{*}\Lambda$ , so the transition matrices $\mathbf{P}$ and $\mathbf{P}^{*}$ are similar and have the same set of eigenvalues. ∎

{rmrk}

Fix $d$ and $N_{j}=N,j=1,\ldots,d$ . One can ask the following question: For what sequence $\{p_{k}\}$ is the associated $T_{FSST}$ stochastically the smallest? Conjecture 2 in [14] suggests that this is in the case of equal probabilities $p_{k}=1/d$ .

4.3. Results on the separation cutoff

Since obtained antidual chains are sharp (i.e., (2) holds), we can present a series of results on the separation cutoff utilizing existing results on the limiting distribution of $T^{*}$ .

We start with the simplest chain corresponding to the classical coupon collector problem. {crllr} Consider a sequence of Markov chains $X_{(d)}$ indexed by $d=1,2,\ldots$ on $\mathbb{E}_{(d)}=\{0,1\}^{d}$ with an initial distribution $\nu_{(d)}=\delta_{(0,\ldots,0)}$ and the transition matrix $\mathbf{P}_{(d)}$ given in (10) with $p_{k}={1\over d}$ and any $a_{k}\in(0,1)$ for $k=1,\ldots,d$ . The stationary distribution $\pi_{(d)}$ is given in (11). The sequence exhibits a separation cutoff at time $d\log d$ with window size $d$ .

Proof.

Denote the FSST of the chain by $T^{*}_{d}$ . It is known that $ET^{*}_{d}=d\sum_{i=1}^{d}{1\over i}\approx d\log d$ . Moreover, ${1\over d}(T^{*}_{d}-d\log d)$ converges in distribution (as $d\to\infty$ ) to a standard Gumbel random variable $Z$ (with c.d.f $P(Z\leq c)=e^{-e^{-c}}$ ), see [22].

Taking $t_{d}=d\log d$ and $w_{d}=d$ we have

[TABLE]

Taking the limits as $d\to\infty$ we have

[TABLE]

Taking the limit as $c\to\infty$ finishes the proof. ∎
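The limit used in the proof can be compared with exact finite-$d$ values. The sketch below computes $P(T^{*}_{d}>t)$ for the classical collector by tracking the number of distinct types seen, and evaluates it at $t=d\log d+cd$ next to the Gumbel tail $1-e^{-e^{-c}}$. It relies on the sharpness stated in the text (separation at time $t$ equals $P(T^{*}_{d}>t)$); the function names and the choice $d=200$ are assumptions for illustration, and floating point arithmetic is used.

```python
import math

def absorption_tail(d, t):
    """P(T* > t): probability that t uniform draws miss at least one of d types."""
    dist = [0.0] * (d + 1)   # dist[k] = P(exactly k distinct types seen)
    dist[0] = 1.0
    for _ in range(t):
        new = [0.0] * (d + 1)
        for k, q in enumerate(dist):
            if q == 0.0:
                continue
            new[k] += q * k / d                 # draw an already seen type
            if k < d:
                new[k + 1] += q * (d - k) / d   # draw a new type
        dist = new
    return 1.0 - dist[d]

def gumbel_tail(c):
    """Limiting value 1 - exp(-exp(-c)) of the separation at t_d + c * w_d."""
    return 1.0 - math.exp(-math.exp(-c))

d = 200
for c in (-2, 0, 2):
    t = round(d * math.log(d) + c * d)
    print(c, round(absorption_tail(d, t), 3), round(gumbel_tail(c), 3))
```

For large negative $c$ both columns approach 1 and for large positive $c$ both approach 0, which is the cutoff statement with window $d$.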

Results on the limiting distribution of $T^{*}_{d}$ from [34] let us indicate separation cutoffs for cases with non-constant probabilities $p_{k}$ . For example we can have the following corollary. {crllr} Consider piecewise constant probability density function on $[0,1]$ :

[TABLE]

where $\lambda_{1},\ldots,\lambda_{k}>0$ and $0=n_{0}<n_{1}<\cdots<n_{k}=1$ . Without loss of generality assume that $\lambda_{1}<\lambda_{2}<\ldots<\lambda_{k}$ . Consider a sequence of Markov chains $X_{(d)}$ indexed by $d=1,2,\ldots$ on $\mathbb{E}_{(d)}=\{0,1\}^{d}$ with an initial distribution $\nu_{(d)}=\delta_{(0,\ldots,0)}$ and the transition matrix $\mathbf{P}_{(d)}$ given in (10) with

[TABLE]

and any $a_{k}\in(0,1)$ for $k=1,\ldots,d$ . The stationary distribution $\pi_{(d)}$ is given in (11). The sequence exhibits a separation cutoff at time $t_{d}={d\over\lambda_{1}}(\log d-\log(n_{1}))$ with window size $w_{d}={d\over\lambda_{1}}$ .

Proof.

Denote the FSST of the chain by $T^{*}_{d}$ (which is equal, in distribution, to collecting $d$ coupons). We have

[TABLE]

Lemma 3.1 in [34] implies that ${1\over d}(T^{*}_{d}-{1\over\lambda_{1}}d\log d)$ converges in distribution to a random variable $Z$ with c.d.f $P(Z\leq c)=e^{-{n_{1}e^{-\lambda_{1}c}}}$ . Thus, we have

[TABLE]

Similarly

[TABLE]

and

[TABLE]

Taking limits as $c\to\infty$ finishes the proof. ∎

Next corollaries utilize results on time until some set of coupons is collected.

{crllr}

Consider a sequence of Markov chains $X_{(d)}$ indexed by $d=1,2,\ldots$ on $\mathbb{E}_{(d)}=\{0,1,\ldots,N\}^{d}$ with an initial distribution $\nu_{(d)}=\delta_{(0,\ldots,0)}$ and the transition matrix $\mathbf{P}_{(d)}$ given in (9) with $p_{k}={1\over d}$ and $N_{1}=\ldots=N_{d}=N\geq 2$ (so that (8) holds). The stationary distribution $\pi_{(d)}$ is uniform. The sequence of chains exhibits a separation cutoff at time $d\log d+(N-1)d\log\log d$ with window size $d$ .

Proof.

In [15] authors derived limiting distribution of $T^{*}_{d}$ showing that

[TABLE]

(where $\gamma=0.57721\ldots$ is the Euler-Mascheroni constant) converges in distribution to a standard Gumbel random variable. Similar calculations as in Corollary 4.3 finish the proof.

∎

Recently authors in [14] extended the result of [15] obtaining the limiting distribution of $T^{*}_{d}$ for $N_{1}=\ldots=N_{d}=N$ and for quite general choices of probabilities $p_{k}$ . Let us indicate here one example (which actually includes result of Corollary 4.3 as a special case).

{crllr}

Consider a sequence of Markov chains $X_{(d)}$ indexed by $d=1,2,\ldots$ on $\mathbb{E}_{(d)}=\{0,1,\ldots,N\}^{d}$ with an initial distribution $\nu_{(d)}=\delta_{(0,\ldots,0)}$ and the transition matrix $\mathbf{P}_{(d)}$ given in (9) with

[TABLE]

and $N_{1}=\ldots=N_{d}=N\geq 2$ (so that (8) holds). The stationary distribution $\pi_{(d)}$ is uniform. The sequence of chains exhibits a separation cutoff at time $d\log d+(N-1)d\log\log d$ with window size $d$ .

Proof.

In [14] authors prove that

[TABLE]

converges in distribution to a standard Gumbel random variable. Again, similar calculations as in Corollary 4.3 finish the proof. ∎

4.4. Constructing an ergodic chain with a prespecified FSST and an arbitrary stationary distribution

Let us ask the following question (which was one of the main motivations for the paper):

How to construct a Markov chain on a state space of size $M$ with arbitrary stationary distribution $\pi$ whose FSST $T$ is deterministic, $P(T=M-1)=1$ ?

The recipe is clear from previous sections: Start with some absorbing chain $X^{*}$ for which $P(T^{*}=M-1)=1$ , where $T^{*}$ is the absorption time. Probably the simplest one is the following: take $\mathbb{E}=\{1,\ldots,M\}$ with transitions $\mathbf{P}_{0}^{*}(k,k+1)=1$ for $k<M$ and $\mathbf{P}_{0}^{*}(M,M)=1$ and start it at state 1. Then of course we have the desired absorption time and thus the antidual would have the desired stationary distribution and FSST.

The above example will be a special case of a more general result. Many absorbing chains have the absorption time $T^{*}$ distributed as a mixture of sums of independent geometric random variables with parameters being the eigenvalues of the transition matrix. E.g., for stochastically monotone discrete time birth and death chain starting at 1 with $d>1$ being the absorbing state, the time to absorption is distributed as a sum of geometric random variables with parameters being the eigenvalues of the transition matrix (which are positive in this case). This result follows from Karlin and McGregor [23] or Keilson [24]. Fill [19] gave a first stochastic proof of this result using dualities (the result was simultaneously obtained in [9]). This was extended to skip-free Markov chains in Fill [18]. Miclo [33] showed that for any absorbing chain on $\mathbb{E}=\{\mathbf{e}_{1},\ldots,\mathbf{e}_{M}\}$ with positive eigenvalues and some reversibility condition (involving substochastic kernel corresponding to the transition matrix with row and column corresponding to absorbing state removed) there exists a measure $a=(a_{1},\ldots,a_{M})$ such that the time to absorption $T^{*}$ has distribution

[TABLE]

where $\lambda_{i}$ are the eigenvalues of the transition matrix sorted in non-increasing order and $\mathcal{G}(p_{1},\ldots,p_{k})$ denotes the distribution of $\sum_{j=1}^{k}X_{j},$ where $X_{j}\sim Geo(p_{j})$ .

For convenience denote $H(k):=\sum_{j=1}^{k}\pi(j)$ . Our result is the following. {thrm} Let $\mathbb{E}=\{1,\ldots,M\}$ and $p_{k}\in[0,1],k=1,\ldots,{M-1}$ . Let $a_{k},\pi(k),k=1,\ldots,M$ be two probability distributions on $\mathbb{E}$ such that $a_{k}\geq 0,\pi(k)>0$ for all $k\in\mathbb{E}$ . Define the matrix

[TABLE]

Assume that $\pi$ and the sequence $\{p_{k}\}_{k=1,\ldots,M}$ are such that the matrix $\mathbf{P}$ is non-negative. Then the Markov chain $X$ with the transition matrix $\mathbf{P}$ and with the initial distribution $\nu=(\nu(1),\ldots,\nu(M))$ given by

[TABLE]

has the FSST $T$ distributed as

[TABLE]

and $\pi$ is its stationary distribution. Moreover, $\{1-p_{1},\ldots,1-p_{M-1},1\}$ are the eigenvalues of $\mathbf{P}$ .

Note that $X$ is a skip-free chain: for given $k$ the only nonzero entries of $\mathbf{P}$ are $\mathbf{P}(k,s)$ for $s\leq k+1$ . The proof of the theorem is postponed to Section 5.2.

Several interesting special cases follow easily as corollaries of Theorem 4.4. Applying Theorem 4.4 with $p_{k}=1,k=1,\ldots,M-1,p_{M}=0$ and $a_{1}=1,a_{k}=0,k=2,\ldots,M$ we obtain the following corollary. {crllr} Consider a distribution $\pi$ on $\mathbb{E}=\{1,\ldots,M\}$ such that $\pi(k)>0$ for all $k\in\mathbb{E}$ . The Markov chain $X$ on $\mathbb{E}$ with transition matrix

[TABLE]

is ergodic with the stationary distribution $\pi$ . Assume the initial distribution is $\nu=\delta_{1}$ (i.e., $P(X_{0}=1)=1$ ). Then the chain has deterministic fastest strong stationary time $T$ such that $P(T=M-1)=1$ .

Note that for this chain we have

[TABLE]

Thus, this is an extreme example for a separation cutoff: For any $k\leq M-2$ the chain is completely not mixed (the separation between stationary distribution and distribution at step $k$ is 1) and the chain mixes completely exactly at step $k=M-1$ (the distance is 0).

Simplifying the chain further by taking additionally uniform distribution $\pi(k)={1\over M}$ in Corollary 4.4 we obtain

[TABLE]

The chain is sketched in Fig. 1.
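The deterministic FSST can be observed numerically in the uniform case. The sketch below does not use the displayed transition matrix; instead it assumes the recipe described above, namely the link for the total order on $\{1,\ldots,M\}$ with uniform $\pi$ and the deterministic absorbing chain $\mathbf{P}_{0}^{*}$, and computes $\mathbf{P}=\Lambda^{-1}\mathbf{P}_{0}^{*}\Lambda$ and the separation $\max_{j}(1-\nu\mathbf{P}^{k}(j)/\pi(j))$ step by step. The value $M=5$ and all names are illustrative.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(A)
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def separation(dist, pi):
    """Separation distance max_j (1 - dist(j) / pi(j))."""
    return max(1 - a / b for a, b in zip(dist, pi))

M_size = 5                                   # assumed size of the state space
pi = [F(1, M_size)] * M_size                 # uniform stationary distribution
H = [sum(pi[: i + 1]) for i in range(M_size)]
# Link for the total order: row i is pi truncated to {1, ..., i}.
Lam = [[pi[j] / H[i] if j <= i else F(0) for j in range(M_size)]
       for i in range(M_size)]
# Deterministic absorbing chain: k -> k+1, last state absorbing.
P0 = [[F(int(j == min(i + 1, M_size - 1))) for j in range(M_size)]
      for i in range(M_size)]

P = matmul(matmul(inverse(Lam), P0), Lam)

dist = [F(1)] + [F(0)] * (M_size - 1)        # start at state 1
seps = []
for k in range(M_size):
    seps.append(separation(dist, pi))
    dist = [sum(dist[i] * P[i][j] for i in range(M_size)) for j in range(M_size)]
print([str(s) for s in seps])
```

The printed separations stay at 1 up to step $M-2$ and drop to 0 at step $M-1$, matching the extreme cutoff described above.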

Two Markov chains on essentially different state spaces with the same FSST

So far in this section we considered chains on the totally ordered state space $\mathbb{E}=\{1,\ldots,M\}$ . We can also consider other state spaces. We will consider a chain on $\mathbb{E}^{(2)}=\{0,1\}^{d}$ . We will not present the full generality one can have; instead we will present two chains, one on $\mathbb{E}^{(1)}=\{1,\ldots,d\}$ and the other on $\mathbb{E}^{(2)}$ , both with uniform stationary distributions and the same FSST distributed as $\sum_{k=1}^{d-1}X_{k},$ where $X_{k}\sim Geo(k\cdot p)$ for some fixed $p\leq{1\over d}$ . Note that in particular the sizes of the state spaces are completely different, $d$ versus $2^{d}$ .

{crllr}

Fix some integer $d>1$ and $0<p\leq{1\over d}$ . Let $X^{(1)}$ be a Markov chain on $\mathbb{E}^{(1)}=\{1,\ldots,d\}$ with an initial distribution $\nu^{(1)}=(1,0,\ldots,0)$ and transitions

[TABLE]

Let $X^{(2)}$ be a Markov chain on $\mathbb{E}^{(2)}=\{0,1\}^{d}$ with initial distribution $\nu^{(2)}((0,\ldots,0))=\nu^{(2)}((1,0,\ldots,0))=1/2$ and with transitions

[TABLE]

(Recall that $|\mathbf{i}|=\sum_{j=1}^{d}i_{j}$ was called a level of $\mathbf{i}$ ).

Then the FSSTs $T^{(1)}$ and $T^{(2)}$ of both chains have the same distribution:

[TABLE]

Both chains have the uniform stationary distribution on respective state spaces.

Proof.

We will show that the chains $X^{(1)}$ and $X^{(2)}$ are sharp antidual chains of different chains $X^{*(1)}$ and $X^{*(2)}$ , whose absorption times have the distribution given in the statement.

•

Chain $X^{(1)}$

This is a special case of the chain given in Theorem 4.4 (with $M=d$ ) with $p_{k}=(d-k)p$ and the uniform stationary distribution $\pi$ . Taking $a_{1}=1,a_{k}=0,k=2,\ldots,M$ we have that the initial distribution is $\nu=(1,0,\ldots,0)$ and that the FSST $T^{(1)}$ is distributed as $\sum_{k=1}^{d-1}X_{k},X_{k}\sim Geo(p_{k})$ with $p_{k}=(d-k)p$ . Reindexing $k\mapsto d-k$ shows that $T^{(1)}$ is equal in distribution to $\sum_{k=1}^{d-1}Y_{k}$ with $Y_{k}\sim Geo(k\cdot p)$ .

•

Chain $X^{(2)}$

This is a special case of the chain $\mathbf{P}_{1}$ given in Corollary 4.2 with $p_{k}=p$ . Thus, its sharp dual chain is given in (7). Recall this is the case $N_{j}=1,j=1,\ldots,d$ , let us explicitly write the transitions of this $\mathbf{P}^{*}$ using notation from this section:

[TABLE]

Roughly speaking, this is the following random walk on the hypercube $\{0,1\}^{d}$ . Being at some state $\mathbf{i}=(i_{1},\ldots,i_{d}),i_{k}\in\{0,1\}$ , each coordinate equal to 0 is changed to 1 with probability $p$ (at most one coordinate per step), and with the remaining probability we do nothing. State $(1,\ldots,1)$ is an absorbing state. Since the probability of changing 0 into 1 does not depend on the actual state, the time to increase the current level depends only on the level. Being at any state on level $|\mathbf{i}|=l$ the time to reach the next level has distribution $Geo((d-l)p)$ (since there are $(d-l)$ zeros, each of which can be changed into 1 with probability $p$ ). Thus, if the chain starts somewhere on level 1, say $\nu^{*}((1,0,\ldots,0))=1$ , then the absorption time is equal in distribution to $\sum_{k=1}^{d-1}X_{k},$ where $X_{k}\sim Geo(k\cdot p)$ . What remains to show is that $\nu=\nu^{*}\Lambda$ yields $\nu^{(2)}((0,\ldots,0))=\nu^{(2)}((1,0,\ldots,0))=1/2$ . All the proofs of Theorems 4.2 and 4.2 are based on coordinate-wise ordering, i.e.,

[TABLE]

Recall the link $\Lambda$ (it is given in (3))

[TABLE]

We have

[TABLE]

which finishes the proof.

∎
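The level argument in the proof can be checked exactly for a small case. The sketch below compares the absorption-time tail of the hypercube collector chain started at $(1,0,\ldots,0)$ with that of a pure-birth chain on $\{1,\ldots,d\}$ moving from $k$ to $k+1$ with probability $(d-k)p$. The pure-birth chain is an assumed stand-in for the absorbing dual of $X^{(1)}$ (it has the stated sum-of-geometrics absorption time); the function names and the values $d=3$, $p=1/4$ are illustrative.

```python
from fractions import Fraction as F

def hypercube_tail(d, p, start, t_max):
    """P(T* > t), t = 0..t_max, for the collector chain on {0,1}^d with p_k = p."""
    top = (1,) * d
    dist = {start: F(1)}
    tails = []
    for _ in range(t_max + 1):
        tails.append(1 - dist.get(top, F(0)))
        new = {}
        for s, q in dist.items():
            stay = 1 - p * s.count(0)          # no zero coordinate is flipped
            new[s] = new.get(s, F(0)) + q * stay
            for k in range(d):
                if s[k] == 0:
                    nxt = s[:k] + (1,) + s[k + 1:]
                    new[nxt] = new.get(nxt, F(0)) + q * p
        dist = new
    return tails

def birth_tail(d, p, t_max):
    """P(T > t), t = 0..t_max, for the chain k -> k+1 w.p. (d-k)p started at 1."""
    dist = [F(0)] * (d + 1)                    # index 0 unused
    dist[1] = F(1)
    tails = []
    for _ in range(t_max + 1):
        tails.append(1 - dist[d])
        new = [F(0)] * (d + 1)
        for k in range(1, d + 1):
            up = (d - k) * p
            new[k] += dist[k] * (1 - up)
            if k < d:
                new[k + 1] += dist[k] * up
        dist = new
    return tails

d, p = 3, F(1, 4)
cube = hypercube_tail(d, p, (1, 0, 0), 12)
line = birth_tail(d, p, 12)
print(cube == line, [str(x) for x in cube[:4]])
```

Both tails coincide term by term, since the hypercube chain observed through its level is exactly the pure-birth chain.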

5. Proofs

5.1. Proofs of Theorems 4.2 and 4.2

In both proofs we use the coordinate-wise ordering (defined in (13)) for which $\mathbf{i}_{min}=(0,\ldots,0)$ is the unique minimal and $\mathbf{i}_{max}=(N_{1},\ldots,N_{d})$ is the unique maximal one.

Proof of Theorem 4.2.

For the ordering under consideration, directly from Proposition 5 in Rota [37], we find the corresponding Möbius function

[TABLE]
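For small cases the Möbius matrix can be obtained by inverting $\mathbf{C}$ and compared with the standard product formula for a product of chains: $\mu(\mathbf{i},\mathbf{i}^{\prime})=(-1)^{|\mathbf{i}^{\prime}-\mathbf{i}|}$ if $\mathbf{i}^{\prime}-\mathbf{i}\in\{0,1\}^{d}$ and $0$ otherwise. The sketch below assumes this well-known closed form for the comparison (the displayed formula itself is not reproduced here); the function names and the example $N=(2,1)$ are illustrative.

```python
from fractions import Fraction as F
from itertools import product

def inverse(A):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(A)
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def mobius_matrix(N):
    """States of the grid with coordinate-wise order, and the inverse of C."""
    states = list(product(*(range(n + 1) for n in N)))
    leq = lambda a, b: all(x <= y for x, y in zip(a, b))
    C = [[F(int(leq(a, b))) for b in states] for a in states]
    return states, inverse(C)

def product_formula(a, b):
    """Closed form: (-1)^{|b - a|} if b - a is a 0/1 vector, else 0."""
    diff = [y - x for x, y in zip(a, b)]
    if all(t in (0, 1) for t in diff):
        return (-1) ** sum(diff)
    return 0

states, mu = mobius_matrix((2, 1))
agrees = all(mu[i][j] == product_formula(a, b)
             for i, a in enumerate(states) for j, b in enumerate(states))
print(agrees)
```

The support of $\mu$ on vectors differing by at most one in each coordinate is what later restricts the sums to $\mathbf{r}\in\{0,1\}^{d}$.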

Let

[TABLE]

We will apply Theorem 4.1 with the above ordering and the uniform distribution $\pi$ on $\mathbb{E}^{*}$ , i.e., $\pi(\mathbf{i})={1\over\rho(\mathbf{i}_{max})}$ . Since $X^{*}$ starts at the minimal state, so does - by Remark 4.1 - the antidual chain. The link $\Lambda(\mathbf{i},\mathbf{i}^{\prime})$ is the uniform distribution truncated to $\{\mathbf{i}^{\prime}\preceq\mathbf{i}\}$ , from (4) we have $\Lambda=(\mathbf{diag}(\pi\mathbf{C}))^{-1}\mathbf{C}^{T}\mathbf{diag}(\pi)$ , thus

[TABLE]

The inverse is given by $\Lambda^{-1}=(\mathbf{diag}(\pi))^{-1}(\mathbf{C}^{-1})^{T}\mathbf{diag}(\pi\mathbf{C})$ , thus

[TABLE]

Instead of calculating $\widehat{\mathbf{P}^{*}}$ , we will calculate $\Lambda^{-1}$ and then directly the antidual chain from $\mathbf{P}=\Lambda^{-1}\mathbf{P}^{*}\Lambda$ (the conditions on $(\pi,\mathbf{C})$ -Möbius monotonicity will be read from the resulting antidual, see Remark 4.1). We have to calculate

[TABLE]

Because of the form of $\Lambda^{-1}$ , we need only to consider states which differ from $\mathbf{i}^{(1)}$ at most by 1 on each coordinate.

[TABLE]

We need to calculate

[TABLE]

Note that for a given $\mathbf{i}^{(1)}-\mathbf{r}\in\mathbb{E}^{*}$ the only nonzero entries of $\mathbf{P}^{*}(\mathbf{i}^{(1)}-\mathbf{r},\mathbf{i})$ are for $\mathbf{i}=\mathbf{i}^{(1)}-\mathbf{r}$ or $\mathbf{i}=\mathbf{i}^{(1)}-\mathbf{r}+\mathbf{s}_{j}$ (if $\mathbf{i}\in\mathbb{E}^{*}$ ), where $\mathbf{s}_{j}=(0,\ldots,0,1,0,\ldots,0)$ (with 1 at position $j$ ). We have

$(\mathbf{P}^{*}\Lambda)(\mathbf{i}^{(1)}-\mathbf{r},\mathbf{i}^{(2)})=$

[TABLE]

thus

[TABLE]

For convenience, define

[TABLE]

Consider cases:

•

Case 1. Increasing some coordinates: $\mathbf{i}^{(2)}=\mathbf{i}^{(1)}+m_{1}\mathbf{s}_{k_{1}}+\ldots+m_{t}\mathbf{s}_{k_{t}},$ where $1\leq k_{i}\leq d,i=1,\ldots,t$ are $t\geq 1$ distinct integers and $m_{i}\geq 1,i=1,\ldots,t$ . When $t\geq 2$ , then indicators in both, $H_{1}$ and $H_{2}$ are equal to 0. When $t=1$ , then the indicator in $H_{1}$ is equal to 0, whereas the indicator in $H_{2}$ can be nonzero only in case $m=1$ , $j=k$ and $\mathbf{r}=(0,\ldots,0)$ . Then we have

[TABLE]

•

Case 2. Increasing two or more coordinates and decreasing any number of coordinates: because of the same reasons as in previous case, indicators in both, $H_{1}$ and $H_{2}$ are equal to 0.

•

Case 3. Decreasing some coordinates: $\mathbf{i}^{(2)}=\mathbf{i}^{(1)}-m_{1}\mathbf{s}_{k_{1}}-\ldots-m_{t}\mathbf{s}_{k_{t}}$ , where $1\leq k_{i}\leq d,i=1,\ldots,t$ are $t\geq 1$ distinct integers and $1\leq m_{i}\leq i_{k_{i}},i=1,\ldots,t$ .

Let $\boldsymbol{\kappa}=(\kappa_{1},\ldots,\kappa_{d})$ , where $\kappa_{k_{i}}=1,i=1,\ldots,t$ and $\kappa_{j}=0$ for $j\notin\{k_{1},\ldots,k_{t}\}$ . In (15) we sum over all $\mathbf{r}\in\{0,1\}^{d}$ such that $\mathbf{i}^{(1)}-\mathbf{r}\in\mathbb{E}^{*}$ . Let us split this sum into two sums over disjoint sets $I_{1}$ and $I_{2}$ , where

[TABLE]

Consider $\mathbf{r}^{\prime}=(r_{1}^{\prime},\ldots,r_{d}^{\prime})\in I_{2}$ . Since it is incomparable with $\boldsymbol{\kappa}$ it means that for some $q\geq 1$ we have $a_{1},\ldots,a_{q}$ such that $\{a_{1},\ldots,a_{q}\}\cap\{k_{1},\ldots,k_{t}\}=\emptyset$ and $r_{a_{i}}^{\prime}=1,i=1,\ldots,q$ . Then the indicator in $H_{1}$ is equal to 0. The second indicator can be nonzero only when $q=1$ and $j=a_{1}$ . Thus, for any $\mathbf{r}\in I_{1}$ we have that $\mathbf{r}+\mathbf{s}_{n}\in I_{2},$ for all $1\leq n\leq d$ such that $n\neq k_{i},i=1,\ldots,t$ . We have

[TABLE]

The indicator is nonzero only when $j=n$ , and $r_{n}=0$ for $n\notin\{k_{1},\ldots,k_{t}\}$ , thus

[TABLE]

since the second sum does not depend on $\mathbf{r}$ .

Consider $\mathbf{r}\in I_{1}$ . Then indicators in both $H_{1}$ and $H_{2}$ are nonzero, we have

[TABLE]

Consider cases:

$a)$

$t=1$ , i.e., we decrease only one coordinate. In this case $\boldsymbol{\kappa}=(0,\ldots,0,1,0,\ldots,0)$ with only one 1 at position $k$ . Thus there are only two $\mathbf{r}$ such that $\mathbf{r}\preceq\boldsymbol{\kappa},$ namely $\mathbf{r}=(0,\ldots,0)$ or $\mathbf{r}=\boldsymbol{\kappa}$ . We have

[TABLE]

Note that for $j\neq k$ all the corresponding terms (for $\mathbf{r}=(0,\ldots,0)$ and $\mathbf{r}=\boldsymbol{\kappa}$ ) are the same, thus they sum up to 0. The remaining terms:

[TABLE]

Finally, we have

[TABLE]

$b)$

$t\geq 2$ . Things are different in this case. Consider $\mathbf{r}=(r_{1},\ldots,r_{d})\preceq\boldsymbol{\kappa}$ and fix $r_{n}$ , where $n\in\{k_{1},\ldots,k_{t}\}$ . Then there are $2^{t-1}$ different $\mathbf{r}$ in $S_{1}$ , of which exactly $2^{t-2}$ give $(-1)^{|\mathbf{r}|}=1$ and exactly $2^{t-2}$ give $(-1)^{|\mathbf{r}|}=-1$ , so that the terms ${i_{j}^{(1)}+1\over i_{j}^{(1)}+2}$ or ${i_{j}^{(1)}\over i_{j}^{(1)}+1}$ (depending on the value of $r_{n}$ ) vanish. This implies that $S_{1}=0$ . For example, for $t=2$ and, for simplicity, $d=2$ , there are the following four terms in $S_{1}$ :

[TABLE]

which sum up to 0.

Remark: In case $t=1$ for fixed $r_{k_{1}}$ there was no corresponding $n\neq k_{1}$ which could make the terms vanish.
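The sign-counting argument of case $b)$ can be checked by brute force. The sketch below is only an illustration: it restricts $\mathbf{r}$ to the $t$ coordinates in the support of $\boldsymbol{\kappa}$ (all other coordinates of $\mathbf{r}\preceq\boldsymbol{\kappa}$ are zero), fixes one of them, and sums the signs $(-1)^{|\mathbf{r}|}$. The function name `signed_count` is ours, not the paper's.

```python
from itertools import product


def signed_count(t, fixed_pos, fixed_val):
    """Return (number of r, sum of (-1)^{|r|}) over r in {0,1}^t
    with the coordinate r[fixed_pos] held equal to fixed_val."""
    count = 0
    total = 0
    for r in product((0, 1), repeat=t):
        if r[fixed_pos] != fixed_val:
            continue
        count += 1
        total += (-1) ** sum(r)
    return count, total


if __name__ == "__main__":
    for t in (1, 2, 3, 4):
        for val in (0, 1):
            count, total = signed_count(t, 0, val)
            print(f"t={t}, fixed value={val}: {count} vectors, signed sum {total}")
```

For $t\geq 2$ the signed sum is zero for either fixed value, while for $t=1$ a single vector remains and nothing cancels, in agreement with the remark above.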

•

Case 4. Increasing one, decreasing another coordinate: $\mathbf{i}^{(2)}=\mathbf{i}^{(1)}-m_{1}\mathbf{s}_{k_{1}}+m_{2}\mathbf{s}_{k_{2}}$ . We have shown that increasing/decreasing $t\geq 2$ coordinates has probability 0, thus there is no need to consider the case where we increase and decrease any number of coordinates in one step.

In this case the indicator in $H_{1}$ is zero. Concerning $H_{2}$ , let $\boldsymbol{\kappa}=(0,\ldots,0,1,0,\ldots,0)$ with one 1 at position $k_{1}$ . Note that for $\mathbf{r}\npreceq\boldsymbol{\kappa}$ , the indicator in $H_{2}$ is also 0. Thus, the only nonzero terms are for either $\mathbf{r}=(0,\ldots,0)$ or $\mathbf{r}=\boldsymbol{\kappa}$ (and then $j=k_{2}$ ):

[TABLE]

which sums up to 0.

•

Case 5. Staying at the same state: $\mathbf{i}^{(2)}=\mathbf{i}^{(1)}$ . Then the indicator $\mathbf{1}(\mathbf{i}^{(2)}\preceq\mathbf{i}^{(1)}-\mathbf{r})$ is nonzero only when $\mathbf{r}=(0,\ldots,0)$ , whereas the indicator $\mathbf{1}(\mathbf{i}^{(2)}\preceq\mathbf{i}^{(1)}-\mathbf{r}+\mathbf{s}_{j})$ is nonzero when $\mathbf{r}=(0,\ldots,0)$ and any $j=1,\ldots,d$ or when $\mathbf{r}=\mathbf{s}_{j}$ . We have

[TABLE]

Assumption (8) implies that $\mathbf{P}(\mathbf{i}^{(1)},\mathbf{i}^{(1)})\geq 0$ . We have considered all the transitions. Let us check that each row of the calculated $\mathbf{P}$ sums up to 1. We have (with the convention $\sum_{m=1}^{0}f(m)\equiv 0$ )

$\displaystyle\sum_{\mathbf{i}^{(2)}\in\mathbb{E}^{*}}\mathbf{P}(\mathbf{i}^{(1)},\mathbf{i}^{(2)})=$

[TABLE]

∎

Proof of Theorem 4.2.

Note that $(0,\ldots,0)$ is the minimal state and $X^{*}$ starts at this state, i.e., $\nu^{*}=\delta_{(0,\ldots,0)}$ . Thus, by Remark 4.1, this is also the initial distribution of the antidual chain, i.e., $\nu=\nu^{*}$ .

For convenience, define

[TABLE]

For the stationary distribution $\pi$ given in (11) we have

[TABLE]

Denote

[TABLE]

The sum in the denominator of $f(\mathbf{i},k)$ can be split into two sums: over $\mathbf{i}^{\prime\prime}:i^{\prime\prime}_{k}=0$ and over $\mathbf{i}^{\prime\prime}:i^{\prime\prime}_{k}=1$ . We have

[TABLE]

Let us proceed with $\widehat{\mathbf{P}^{*}}$ .

[TABLE]

Note that $\widehat{\mathbf{P}^{*}}$ is not a stochastic matrix, since we have

[TABLE]

Now, calculating the antidual chain from Theorem 4.1, we have

[TABLE]

where we applied the Möbius function for this ordering: $\mathbf{C}^{-1}(\mathbf{i},\mathbf{i}^{(1)})=(-1)^{|\mathbf{i}^{(1)}-\mathbf{i}|}\mathbf{1}(\mathbf{i}\preceq\mathbf{i}^{(1)})$ (a consequence of (14)). We proceed with (16) by considering cases:

•

Case 1. Increasing some coordinates: $\mathbf{i}^{(2)}=\mathbf{i}^{(1)}+\mathbf{s}_{k_{1}}+\ldots+\mathbf{s}_{k_{t}}$ for some $t\geq 1$ distinct integers $1\leq k_{i}\leq d,i=1,\ldots,t$ .

First note that if $t\geq 2$ , then for any $\mathbf{i}\preceq\mathbf{i}^{(1)}$ we have $\widehat{\mathbf{P}^{*}}(\mathbf{i},\{\mathbf{i}^{(1)}+\mathbf{s}_{k_{1}}+\ldots+\mathbf{s}_{k_{t}}\}^{\uparrow})=0$ , thus $\mathbf{P}(\mathbf{i}^{(1)},\mathbf{i}^{(1)}+\mathbf{s}_{k_{1}}+\ldots+\mathbf{s}_{k_{t}})=0$ .

For $t=1$ the sum in (16) is $\sum_{\mathbf{i}\preceq\mathbf{i}^{(1)}}\widehat{\mathbf{P}^{*}}(\mathbf{i},\{\mathbf{i}^{(1)}+\mathbf{s}_{k}\}^{\uparrow})(-1)^{|\mathbf{i}^{(1)}-\mathbf{i}|}$ ; the only nonzero term is for $\mathbf{i}=\mathbf{i}^{(1)}$ , thus

[TABLE]

•

Case 2. Increasing two or more coordinates and decreasing any number of coordinates: for the same reasons as in the previous case (we would have to increase at least two coordinates in one step), such a transition has probability 0.

•

Case 3: $\mathbf{i}^{(2)}=\mathbf{i}^{(1)}-\mathbf{s}_{k_{1}}-\ldots-\mathbf{s}_{k_{t}},t\geq 1$ . Let us split $\{\mathbf{e}\preceq\mathbf{i}^{(1)}\}$ into five disjoint sets:

[TABLE]

where $\mathbf{e}\prec\mathbf{e}^{\prime}$ means that $\mathbf{e}\preceq\mathbf{e}^{\prime}$ and $\mathbf{e}\neq\mathbf{e}^{\prime}$ , and $\mathbf{e}\npreceq\mathbf{e}^{\prime}$ means that $\mathbf{e}$ and $\mathbf{e}^{\prime}$ are incomparable. Define also

[TABLE]

We have

[TABLE]

Let us consider cases $t=1$ and $t\geq 2$ separately.

$a)$

$t=1$ , i.e., $\mathbf{i}^{(2)}=\mathbf{i}^{(1)}-\mathbf{s}_{k}\ (k_{1}\equiv k)$ . Note that then $I_{4}=\emptyset$ . We have

[TABLE]

We have $S_{1}+S_{2}+S_{3}+S_{4}+S_{5}=a_{k}p_{k}$ and finally

[TABLE]

$b)$

$t\geq 2$ . Consider first $t=2$ . Assume thus that $\mathbf{i}^{(2)}=\mathbf{i}^{(1)}-\mathbf{s}_{k_{1}}-\mathbf{s}_{k_{2}}.$ We have

[TABLE]

Summing up, $S_{1}+S_{2}+S_{3}+S_{4}+S_{5}=0$ , which is also the case for $t>2$ (the proof, although longer, is quite similar; we skip the details). This means that for $t\geq 2$

[TABLE]

•

Case 4. Increasing one, decreasing another coordinate: $\mathbf{i}^{(2)}=\mathbf{i}^{(1)}+\mathbf{s}_{k_{1}}-\mathbf{s}_{k_{2}}$ . We have shown that increasing or decreasing $t\geq 2$ coordinates has probability 0, thus it suffices to consider only changing two coordinates (one increasing, the other decreasing). Then the summands in $\sum_{\mathbf{i}\preceq\mathbf{i}^{(1)}}\widehat{\mathbf{P}^{*}}(\mathbf{i},\{\mathbf{i}^{(1)}+\mathbf{s}_{k_{1}}-\mathbf{s}_{k_{2}}\}^{\uparrow})(-1)^{|\mathbf{i}^{(1)}-\mathbf{i}|}$ are nonzero only for $\mathbf{i}=\mathbf{i}^{(1)}$ or $\mathbf{i}=\mathbf{i}^{(1)}-\mathbf{s}_{k_{2}}$ , and we have

[TABLE]

thus $\mathbf{P}(\mathbf{i}^{(1)},\mathbf{i}^{(1)}+\mathbf{s}_{k_{1}}-\mathbf{s}_{k_{2}})=0$ .

•

Case 5. Staying at the same state: $\mathbf{i}^{(2)}=\mathbf{i}^{(1)}$ . Then we have

[TABLE]

The first term is equal to $\sum_{\mathbf{i}}\widehat{\mathbf{P}^{*}}(\mathbf{i}^{(1)},\mathbf{i})$ ; in the latter, the only possibility is to change the $j$ -th coordinate of $\mathbf{i}^{(1)}-\mathbf{s}_{j}$ to one:

[TABLE]

Finally, we obtain matrix $\mathbf{P}$ given in (10). ∎
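The Möbius function used in the proof above is the standard one for the coordinate-wise ordering on $\{0,1\}^{d}$ . A minimal numerical sketch, assuming the state space is the full cube $\{0,1\}^{d}$ and $\mathbf{C}(\mathbf{i},\mathbf{i}^{\prime})=\mathbf{1}(\mathbf{i}\preceq\mathbf{i}^{\prime})$ , checks that the matrix with entries $(-1)^{|\mathbf{i}^{\prime}-\mathbf{i}|}\mathbf{1}(\mathbf{i}\preceq\mathbf{i}^{\prime})$ is indeed $\mathbf{C}^{-1}$ . The helper name `zeta_and_mobius` is hypothetical.

```python
from itertools import product

import numpy as np


def zeta_and_mobius(d):
    """Build C(i, j) = 1(i <= j coordinate-wise) on {0,1}^d together with
    the candidate inverse mu(i, j) = (-1)^{|j - i|} * 1(i <= j)."""
    states = list(product((0, 1), repeat=d))
    n = len(states)
    C = np.zeros((n, n), dtype=int)
    mu = np.zeros((n, n), dtype=int)
    for a, i in enumerate(states):
        for b, j in enumerate(states):
            if all(x <= y for x, y in zip(i, j)):
                C[a, b] = 1
                mu[a, b] = (-1) ** (sum(j) - sum(i))
    return C, mu


if __name__ == "__main__":
    C, mu = zeta_and_mobius(3)
    # The product should be the 8 x 8 identity matrix.
    print(np.array_equal(C @ mu, np.eye(8, dtype=int)))
```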

Remark 5.1.

Showing that $\mathbf{P}(\mathbf{i}^{(1)},\mathbf{i}^{(1)}+\mathbf{s}_{k_{1}}-\mathbf{s}_{k_{2}})=0$ relied heavily on the fact that for the stationary distribution given in (11), we had $f(\mathbf{i},j)=1-a_{j}$ and it did not depend on $\mathbf{i}$ . That is why the terms $f(\mathbf{i}^{(1)},{k_{1}})p_{k_{1}}$ and $f(\mathbf{i}^{(1)}-\mathbf{s}_{k_{2}},{k_{1}})p_{k_{1}}$ cancelled out. Similarly, it is the reason why decreasing $t\geq 2$ coordinates has probability 0. For other stationary distributions, not of product form, such transitions are possible.

5.2. Proof of Theorem 4.4

Let $\mathbf{X}^{*}$ be an absorbing chain on $\mathbb{E}=\{1,\ldots,M\},M\geq 2$ with transition matrix:

[TABLE]

where, for convenience, we set $p_{M}=0$ . Let $\nu^{*}=(a_{1},\ldots,a_{M})$ be its initial distribution. This is a pure birth chain, thus its absorption time $T^{*}$ is distributed as (12). We will show that $\mathbf{P}$ is its sharp antidual chain.
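As a small sanity check of the pure-birth structure, the sketch below computes the mean absorption time numerically and compares it with the mean of a mixture of sums of geometric random variables, $\sum_{k}a_{k}\sum_{j=k}^{M-1}1/p_{j}$ . It assumes that the transition matrix has the usual pure-birth form $\mathbf{P}^{*}(k,k)=1-p_{k}$ , $\mathbf{P}^{*}(k,k+1)=p_{k}$ ; the numerical values of $p_{k}$ and $a_{k}$ are arbitrary illustrations, and the function names are ours.

```python
import numpy as np


def pure_birth(p):
    """Assumed form of P*: P*(k,k) = 1 - p_k, P*(k,k+1) = p_k, state M absorbing."""
    M = len(p) + 1
    P = np.zeros((M, M))
    for k, pk in enumerate(p):
        P[k, k] = 1.0 - pk
        P[k, k + 1] = pk
    P[M - 1, M - 1] = 1.0
    return P


def mean_absorption_time(P, nu):
    """Mean time to reach the last state, starting from distribution nu."""
    M = P.shape[0]
    Q = P[: M - 1, : M - 1]  # transitions among transient states
    t = np.linalg.solve(np.eye(M - 1) - Q, np.ones(M - 1))
    return float(nu[: M - 1] @ t)


if __name__ == "__main__":
    p = np.array([0.5, 0.25, 0.2])       # illustrative birth probabilities
    nu = np.array([0.4, 0.3, 0.2, 0.1])  # illustrative initial distribution
    numeric = mean_absorption_time(pure_birth(p), nu)
    closed = sum(nu[k] * np.sum(1.0 / p[k:]) for k in range(len(p)))
    print(numeric, closed)
```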

We consider the total ordering $\preceq:=\leq$ . Then the link given in (3) reads

[TABLE]

The inverse $\Lambda^{-1}$ can be easily derived:

[TABLE]
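The inverse can be checked numerically. The sketch below assumes that, for the total ordering, the link has the usual Möbius-monotone form $\Lambda(k,s)=\mathbf{1}(s\leq k)\pi(s)/H(k)$ with $H(k)=\sum_{s\leq k}\pi(s)$ , and that its inverse is bidiagonal with $\Lambda^{-1}(k,k)=H(k)/\pi(k)$ and $\Lambda^{-1}(k,k-1)=-H(k-1)/\pi(k)$ . Both formulas are stated here as assumptions, the distribution `pi` is an arbitrary illustration, and the function names are ours.

```python
import numpy as np


def link(pi):
    """Assumed link: Lambda(k, s) = 1(s <= k) * pi(s) / H(k)."""
    H = np.cumsum(pi)
    M = len(pi)
    L = np.zeros((M, M))
    for k in range(M):
        L[k, : k + 1] = pi[: k + 1] / H[k]
    return L


def link_inverse(pi):
    """Candidate bidiagonal inverse of the link above."""
    H = np.cumsum(pi)
    M = len(pi)
    Linv = np.zeros((M, M))
    for k in range(M):
        Linv[k, k] = H[k] / pi[k]
        if k > 0:
            Linv[k, k - 1] = -H[k - 1] / pi[k]
    return Linv


if __name__ == "__main__":
    pi = np.array([0.1, 0.2, 0.3, 0.4])
    # The product should be the identity matrix up to rounding.
    print(np.allclose(link(pi) @ link_inverse(pi), np.eye(len(pi))))
```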

Let us calculate

[TABLE]

Calculating transitions of the antidual chain:

[TABLE]

Consider separately the cases:

•

$k=1$ . Then $\mathbf{P}(1,s)={H(1)\over\pi(1)}\mathbf{P}^{*}\Lambda(1,s)=\mathbf{P}^{*}\Lambda(1,s)$ . This is nonzero only if $s=1$ or $s=2$ .

[TABLE]

•

$k=M$ . We have

[TABLE]

Thus,

[TABLE]

•

$1<k<M$ . We have

[TABLE]

Thus,

[TABLE]

Consider three sub-cases:

$\diamond$

$s=k+1$ . Then we have

[TABLE]

$\diamond$

$s=k$ . Then we have

[TABLE]

$\diamond$

$s<k$ . Then we have

[TABLE]

For $k\in\{1,M\}$ we obviously have $\sum_{s=1}^{M}\mathbf{P}(k,s)=1$ . For $1<k<M$ we have

[TABLE]

Thus (cf. (6)) we considered all the cases. The only thing left to calculate is the initial distribution of the antidual chain. Using relation (1) we have

[TABLE]

The matrix $\mathbf{P}^{*}$ is upper-triangular, thus $\{1-p_{1},\ldots,1-p_{M-1},1\}$ are its eigenvalues. Because of the relation (1) these are also the eigenvalues of $\mathbf{P}$ .
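The eigenvalue statement can be illustrated numerically. The sketch below assumes the pure-birth form of $\mathbf{P}^{*}$ and the link $\Lambda(k,s)=\mathbf{1}(s\leq k)\pi(s)/H(k)$ , builds $\mathbf{P}=\Lambda^{-1}\mathbf{P}^{*}\Lambda$ from relation (1), and compares the spectra. The values of $p_{k}$ and $\pi$ are arbitrary illustrations; the sketch does not check the conditions under which $\mathbf{P}$ has nonnegative entries.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])        # illustrative birth probabilities, p_M = 0
pi = np.array([0.1, 0.2, 0.3, 0.4])  # illustrative stationary distribution
M = len(pi)

# Assumed pure-birth matrix: diagonal 1 - p_k (and 1 for state M), superdiagonal p_k.
P_star = np.diag(np.append(1.0 - p, 1.0)) + np.diag(p, k=1)

# Assumed link for the total ordering: Lambda(k, s) = 1(s <= k) pi(s) / H(k).
H = np.cumsum(pi)
Lam = np.tril(np.tile(pi, (M, 1))) / H[:, None]

# P = Lambda^{-1} P* Lambda, computed by solving Lambda P = P* Lambda.
P = np.linalg.solve(Lam, P_star @ Lam)

eig_P = np.sort(np.linalg.eigvals(P).real)
eig_star = np.sort(np.diag(P_star))

if __name__ == "__main__":
    print(eig_P)
    print(eig_star)
    print(np.allclose(P.sum(axis=1), 1.0), np.allclose(pi @ P, pi))
```

Since $\mathbf{P}$ and $\mathbf{P}^{*}$ are similar, the two printed spectra should agree up to rounding; the last line checks that the rows of $\mathbf{P}$ sum to 1 and that $\pi$ is stationary for it.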

Acknowledgements

The author thanks anonymous reviewers for thorough reviews and appreciates the comments and suggestions, which contributed to improving the quality of the publication.

Bibliography

[1] D. Aldous and P. Diaconis. Shuffling cards and stopping times. American Mathematical Monthly, 93(5):333–348, 1986.
[2] D. Aldous and P. Diaconis. Strong uniform times and finite random walks. Advances in Applied Mathematics, 8(1):69–97, 1987.
[3] R. Basu, J. Hermon, and Y. Peres. Characterization of cutoff for reversible Markov chains. Annals of Probability, 45(3):1448–1487, 2017.
[4] G.-Y. Chen and L. Saloff-Coste. The cutoff phenomenon for ergodic Markov processes. Electronic Journal of Probability, 13:26–78, 2008.
[5] G.-Y. Chen and L. Saloff-Coste. Computing cutoff times of birth and death chains. Electronic Journal of Probability, 20:1–47, 2015.
[6] M. C. H. Choi and P. Patie. A sufficient condition for continuous-time finite skip-free Markov chains to have real eigenvalues. In: J. Bélair, I. Frigaard, H. Kunze, R. Makarov, R. Melnik, R. Spiteri (eds), Mathematical and Computational Approaches in Advancing Modern Science and Engineering, pages 529–536, 2016.
[7] S. B. Connor. Separation and coupling cutoffs for tuples of independent Markov processes. Latin American Journal of Probability and Mathematical Statistics, 7(3):65–77, 2010.
[8] P. Diaconis and J. A. Fill. Strong stationary times via a new form of duality. Annals of Probability, 18(4):1483–1522, 1990.
