Mixing time of an unaligned Gibbs sampler on the square

Bal\'azs Gerencs\'er

arXiv:1706.08694·math.PR·October 9, 2018

Mixing time of an unaligned Gibbs sampler on the square

Bal\'azs Gerencs\'er

PDF

Open Access

TL;DR

This paper proves that the mixing time of a specific unaligned Gibbs sampler on the unit square is proportional to the square of a parameter, confirming a conjecture and improving previous exponential bounds.

Contribution

The paper confirms Diaconis's conjecture that the mixing time is O(A^2) for the unaligned Gibbs sampler, improving the previous exponential estimate.

Findings

01

Mixing time is proportional to A^2.

02

Confirmed conjecture by Diaconis.

03

Improved from exponential to quadratic bound.

Abstract

The paper concerns a particular example of the Gibbs sampler and its mixing efficiency. Coordinates of a point are rerandomized in the unit square $[0, 1]^{2}$ to approach a stationary distribution with density proportional to $exp (- A^{2} (u - v)^{2})$ for $(u, v) \in [0, 1]^{2}$ with some large parameter $A$ . Diaconis conjectured the mixing time of this process to be $O (A^{2})$ which we confirm in this paper. This improves on the currently known $O (exp (A^{2}))$ estimate.

Figures10

Click any figure to enlarge with its caption.

Equations109

t_{mix} (X, ε) := X (0) \in Ω sup min {t : ∥ L (X (t)) - η ∥_{TV} \leq ε} .

t_{mix} (X, ε) := X (0) \in Ω sup min {t : ∥ L (X (t)) - η ∥_{TV} \leq ε} .

t_{mix} (X, α) \leq β A^{2} .

t_{mix} (X, α) \leq β A^{2} .

X(t):=\begin{cases}\big{(}u^{+},X_{v}(t-1)\big{)},&\text{ if }r(t)=U,\text{where }u^{+}\sim\pi(\cdot,X_{v}(t-1)),\\ \big{(}X_{u}(t-1),v^{+}\big{)},&\text{ if }r(t)=V,\text{where }v^{+}\sim\pi(X_{u}(t-1),\cdot),\end{cases},

X(t):=\begin{cases}\big{(}u^{+},X_{v}(t-1)\big{)},&\text{ if }r(t)=U,\text{where }u^{+}\sim\pi(\cdot,X_{v}(t-1)),\\ \big{(}X_{u}(t-1),v^{+}\big{)},&\text{ if }r(t)=V,\text{where }v^{+}\sim\pi(X_{u}(t-1),\cdot),\end{cases},

X^{*} (2 s)

X^{*} (2 s)

X^{*} (2 s + 1)

Y(t):=\big{(}u^{+},Y_{u}(t-1)\big{)},\qquad\text{where }u^{+}\sim\pi(\cdot,Y_{u}(t-1)).

Y(t):=\big{(}u^{+},Y_{u}(t-1)\big{)},\qquad\text{where }u^{+}\sim\pi(\cdot,Y_{u}(t-1)).

Y^{\prime}(t):=\big{(}u^{+},Y^{\prime}_{u}(t-1)\big{)},\qquad\text{where }u^{+}\sim\sigma_{Y^{\prime}_{u}(t-1)}.

Y^{\prime}(t):=\big{(}u^{+},Y^{\prime}_{u}(t-1)\big{)},\qquad\text{where }u^{+}\sim\sigma_{Y^{\prime}_{u}(t-1)}.

σ_{v} ({u^{+} > 1}) \leq 2 P (u > 1, u \sim N (v, 1/ (2 A^{2}))) \leq 2 \frac{exp ( - A ^{2} ( 1/2 + δ ) ^{2} )}{2 π A ( 1/2 + δ )} .

σ_{v} ({u^{+} > 1}) \leq 2 P (u > 1, u \sim N (v, 1/ (2 A^{2}))) \leq 2 \frac{exp ( - A ^{2} ( 1/2 + δ ) ^{2} )}{2 π A ( 1/2 + δ )} .

2 \frac{exp ( - A ^{2} ( 2 δ ) ^{2} )}{2 π A ( 2 δ )} .

2 \frac{exp ( - A ^{2} ( 2 δ ) ^{2} )}{2 π A ( 2 δ )} .

ϕ_{\overset{v}{ˉ}} ([0, u]) \geq σ_{v} ([0, u]) .

ϕ_{\overset{v}{ˉ}} ([0, u]) \geq σ_{v} ([0, u]) .

ϕ_{\overset{v}{ˉ}} ([0, u]) \geq ϕ_{v} ([0, u]), ϕ_{v} ([0, u]) \geq σ_{v} ([0, u]) .

ϕ_{\overset{v}{ˉ}} ([0, u]) \geq ϕ_{v} ([0, u]), ϕ_{v} ([0, u]) \geq σ_{v} ([0, u]) .

\partial_{v} ϕ_{v} ([0, u]) = \partial_{v} (ϕ ([- v - u, - v + u]))

\partial_{v} ϕ_{v} ([0, u]) = \partial_{v} (ϕ ([- v - u, - v + u]))

= \partial_{v} \frac{1}{\int _{- \infty}^{\infty} φ} - v - u \int - v + u φ

= \partial_{v} \frac{1}{\int _{- \infty}^{\infty} φ} - v - u \int - v + u φ

= \frac{1}{\int _{- \infty}^{\infty} φ} (- φ (- v + u) + φ (- v - u)) \leq 0.

= \frac{1}{\int _{- \infty}^{\infty} φ} (- φ (- v + u) + φ (- v - u)) \leq 0.

ϕ_{v} ([0, u]) = ϕ ([- v - u, - v + u]) \geq σ_{v} ([0, u]) .

ϕ_{v} ([0, u]) = ϕ ([- v - u, - v + u]) \geq σ_{v} ([0, u]) .

\int_{- v - u}^{- v + u} φ / \int_{- \infty}^{\infty} φ

\int_{- v - u}^{- v + u} φ / \int_{- \infty}^{\infty} φ

\int_{- v - u}^{- v + u} φ \cdot \int_{- v}^{\infty} φ

(\int_{- v - u}^{- v} φ + \int_{- v}^{- v + u} φ) \cdot \int_{- v}^{\infty} φ

\int_{- v - u}^{- v} φ \cdot \int_{- v}^{\infty} φ

\int_{- v - u}^{- v} φ \cdot (\int_{- v}^{- v + u} φ + \int_{- v + u}^{\infty} φ)

\int_{- v - u}^{- v} φ \cdot \int_{- v + u}^{\infty} φ

- v - u \int - v e^{- A^{2} x^{2}} d x \cdot - v + u \int \infty e^{- A^{2} y^{2}} d y

- v - u \int - v e^{- A^{2} x^{2}} d x \cdot - v + u \int \infty e^{- A^{2} y^{2}} d y

0 \int u e^{- A^{2} (x + v)^{2}} d x \cdot u \int \infty e^{- A^{2} (y - v)^{2}} d y

0 \int u u \int \infty e^{- A^{2} (x^{2} + y^{2} + 2 v^{2} + 2 v (x - y))} d y d x

n \to \infty lim P (max (S_{1}, S_{2}, \dots, S_{n}) < α n) = \frac{2}{π} \int_{0}^{α} exp (- x^{2} /2) d x .

n \to \infty lim P (max (S_{1}, S_{2}, \dots, S_{n}) < α n) = \frac{2}{π} \int_{0}^{α} exp (- x^{2} /2) d x .

P (max (0, W (1) - W (0), \dots, W (α_{4} A^{2}) - W (0)) < 1/2 - δ)

P (max (0, W (1) - W (0), \dots, W (α_{4} A^{2}) - W (0)) < 1/2 - δ)

= P (max (0, W (1) - W (0), \dots, W (α_{4} A^{2}) - W (0)) 2 A < \frac{1 - 2 δ}{2 α _{4}} α_{4} A^{2})

\to \frac{2}{π} \int_{0}^{\frac{1 - 2 δ}{2 α _{4}}} exp (- x^{2} /2) d x .

P (ν_{c}^{2} < α_{4} A^{2}) \leq 2 (1 - \frac{2}{π} \int_{0}^{\frac{1 - 2 δ}{2 α _{4}}} exp (- x^{2} /2) d x) + ε =: β_{4} .

P (ν_{c}^{2} < α_{4} A^{2}) \leq 2 (1 - \frac{2}{π} \int_{0}^{\frac{1 - 2 δ}{2 α _{4}}} exp (- x^{2} /2) d x) + ε =: β_{4} .

∥ L (Y_{u} (α_{5} A^{2})) - π_{u} ∥_{TV} < 1/3.

∥ L (Y_{u} (α_{5} A^{2})) - π_{u} ∥_{TV} < 1/3.

∥ L (W (α_{5} A^{2})) - μ ∥_{TV} = \frac{1}{2} \int_{- \infty}^{\infty} \frac{exp ( - ( x - Y _{u} ( 0 ) ) ^{2} / α _{5} )}{α _{5} π} - \mathbbm 1_{[0, 1]} (x) d x

∥ L (W (α_{5} A^{2})) - μ ∥_{TV} = \frac{1}{2} \int_{- \infty}^{\infty} \frac{exp ( - ( x - Y _{u} ( 0 ) ) ^{2} / α _{5} )}{α _{5} π} - \mathbbm 1_{[0, 1]} (x) d x

∥ L (W (α_{5} A^{2})) - μ ∥_{TV} = 1 - \int_{- \infty}^{\infty} min (\frac{exp ( - ( x - Y _{u} ( 0 ) ) ^{2} / α _{5} )}{α _{5} π}, \mathbbm 1_{[0, 1]} (x)) d x

∥ L (W (α_{5} A^{2})) - μ ∥_{TV} = 1 - \int_{- \infty}^{\infty} min (\frac{exp ( - ( x - Y _{u} ( 0 ) ) ^{2} / α _{5} )}{α _{5} π}, \mathbbm 1_{[0, 1]} (x)) d x

= 1 - \int_{0}^{1} min (\frac{exp ( - ( x - Y _{u} ( 0 ) ) ^{2} / α _{5} )}{α _{5} π}, 1) d x

\leq 1 + 2 δ - \int_{- δ}^{1 + δ} min (\frac{exp ( - ( x - 1/2 ) ^{2} / α _{5} )}{α _{5} π}, 1) d x =: γ .

∥ L (Y_{u} (α_{5} A^{2})) - L (W (α_{5} A^{2})) ∥_{TV} \leq β_{4},

∥ L (Y_{u} (α_{5} A^{2})) - L (W (α_{5} A^{2})) ∥_{TV} \leq β_{4},

p_{u} (x) = \frac{A}{π} \int_{- x}^{1 - x} φ (y) d y,

p_{u} (x) = \frac{A}{π} \int_{- x}^{1 - x} φ (y) d y,

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMarkov Chains and Monte Carlo Methods · Stochastic processes and statistical mechanics · Bayesian Methods and Mixture Models

Full text

Mixing time of an unaligned Gibbs sampler on the square

Balázs Gerencsér B. Gerencsér is with the Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences and the ELTE Eötvös Loránd University, Department of Probability Theory and Statistics, [email protected]. His work is supported by NKFIH (National Research, Development and Innovation Office) grant PD 121107.

Abstract

The paper concerns a particular example of the Gibbs sampler and its mixing efficiency. Coordinates of a point are rerandomized in the unit square $[0,1]^{2}$ to approach a stationary distribution with density proportional to $\exp(-A^{2}(u-v)^{2})$ for $(u,v)\in[0,1]^{2}$ with some large parameter $A$ .

Diaconis conjectured the mixing time of this process to be $O(A^{2})$ which we confirm in this paper. This improves on the currently known $O(\exp(A^{2}))$ estimate.

1 Introduction

A standard use of Markov chains is to sample from a probability distribution that would be otherwise hard to access. This can happen when the distribution is supported on a set implicitly defined by some constraints, e.g., a convex body in a high dimensional space [5], [7], proper colorings of a graph [3], [8], etc. Several frameworks have been designed to achieve this goal including the Metropolis algorithm and the Gibbs sampler and their variants. There is a vast range of applications and studies, we refer the reader to [2], [1] for orientation.

A central and recurring question is the efficiency of these algorithms in the different settings. We highlight two phenomena that can decrease the performance of such algorithms. First, the incremental change the Markov chain allows is usually quite rigid and given by the structure of the state space. However, the desired stationary distribution does not need to be aligned with the directions where the Markov chain mixes fast. Second, some boundary effects might occur if the Markov chain can get trapped in some remote part of the state space.

In this paper we analyze an example of the Gibbs sampling procedure proposed by Diaconis which is surprisingly simple considering it captures both of the two phenomena above. We call the coordinate Gibbs sampler for the diagonal distribution the following process. Fix a large positive constant $A$ and on $[0,1]^{2}$ define the distribution $\pi$ with density proportional to $\exp(-A^{2}(u-v)^{2})$ for $(u,v)\in[0,1]^{2}$ . At each step randomly choose coordinate $u$ or $v$ and rerandomize it according to the conditional distribution of $\pi$ . Notice that the distribution of this Markov chain is mostly concentrated near the diagonal of the unit square, while only horizontal and vertical transitions are allowed. Furthermore, near $(0,0)$ and $(1,1)$ we see that both the high density of $\pi$ and also the boundaries of the square hinder the movement of the chain.

The efficiency of the algorithm is quantified by the mixing time of the Markov chain. For any Markov chain $X(0),X(1),\ldots$ on some state space $\Omega$ (which is $[0,1]^{2}$ in our case) let $\mathcal{L}(X(t))$ denote the distribution of the state at time $t$ and $\eta$ be the stationary distribution assuming it is unique (denoted by $\pi$ for our case). Using the total variation distance between measures, $\|\rho-\sigma\|_{\rm TV}:=\sup_{S\subseteq\Omega}|\rho(S)-\sigma(S)|$ we define the mixing time as

[TABLE]

Diaconis conjectured that the mixing time of the example proposed is $O(A^{2})$ , the goal of this paper is to confirm this bound.

Theorem 1.

Let $X(t)$ follow the coordinate Gibbs sampler for the diagonal distribution. For any $0<\alpha<1$ there exists $\beta>0$ such that for large enough $A$ ,

[TABLE]

Up until now only $O(\exp(A^{2}))$ was known which easily follows from a minorization condition of the transition kernel.

Observe that the diagonal nature of the distribution plays an important role in the mixing behavior, making the distribution and the randomization steps unaligned. If we took the distribution with density proportional to $\exp(A^{2}(u-1/2)^{2})$ for $(u,v)\in[0,1]^{2}$ , then the mixing time would decrease to be $O(1)$ . Indeed, this is a product distribution, product of one for $u$ and one (uniform) for $v$ , consequently after a rerandomization is performed along both coordinates, the distribution of the process will exactly match the prescribed one. This will happen with probability arbitrarily close to 1 within a corresponding finite number of steps, not depending on the value of $A$ .

The rest of the paper is organized as follows. In Section 2 a formal definition of the process of interest is provided and further variants are introduced that help the analysis. Section 3 provides the building blocks for the proof, to understand the short-term behavior of the process based on the initialization. Afterwards, the proof of Theorem 1 is aggregated in Section 4. Finally, a complementing lower bound demonstrating that Theorem 1 is essentially sharp is given in Section 5 together with some numerical simulations.

2 Preliminaries, alternative processes

We now formally define the coordinate Gibbs sampler for the diagonal distribution which we denote by $X(t)$ , then we introduce variants that will be more convenient to handle.

Let $\varphi(x):=\exp(-A^{2}x^{2})$ for some large $A>0$ and let $\pi$ be the probability distribution on $[0,1]^{2}$ with density $\mathcal{Z}^{-1}\varphi(u-v)$ at $(u,v)\in[0,1]^{2}$ (where $\mathcal{Z}=\int_{[0,1]^{2}}\varphi(u-v)$ ). We write $\pi(\cdot,v)$ for the conditional distribution of the $u$ coordinate when $v$ is fixed (similarly for $\pi(u,\cdot)$ ). Denote by $\pi_{u}$ the projection of $\pi$ , that is, the overall distribution of the $u$ coordinate.

When defining the coordinate Gibbs sampler for the diagonal distribution, we separate the decision of the direction of randomization and the randomization itself. For $t=1,2,\ldots$ let $r(t)$ be an i.i.d. sequence of variables of characters $U,V$ taking both with probability $1/2$ . Given some initial point $X(0)\in[0,1]^{2}$ the random variable $X(t)=(X_{u}(t),X_{v}(t))$ is generated as a Markov chain from $X(t-1)$ by randomizing along the axis given by $r(t)$ . Formally,

[TABLE]

where $u^{+},v^{+}$ are conditionally independent of the past at all steps.

Note that when multiple $U$ ’s follow each other in the series $r(t)$ (similarly for $V$ ), the values $u^{+}$ are repeatedly overwritten and forgotten, with no further mixing happening for the overall distribution. Therefore we define an alternative process where this effect does not occur, but rather the directions of randomization are deterministic.

Let $X^{*}(0):=X(0)$ , then the following process is generated:

[TABLE]

It would be convenient for the analysis if it wasn’t necessary to distinguish the steps based on the parity of the time index. For that reason, consider the following modification. At every even step take $X^{*}(2s)$ as before, at every odd step take $X^{*}(2s+1)$ flipped along the diagonal of the square (exchange the two coordinates). Equivalently, flip the process at every step while generating. As a result, the randomization happens in the same direction at every step. Note that the target distribution $\pi$ is symmetric along the diagonal therefore no adjustment is needed for the flipping. Formally, the process described is the following:

Let $Y(0):=X(0)$ , then the random variables $Y(t)$ are generated from $Y(t-1)$ as follows

[TABLE]

Observe that the scalar process $Y_{u}(t)$ is a Markov chain by itself simply because $Y(t)$ depends on $Y(t-1)$ only through $Y_{u}(t-1)$ .

3 Dynamics of $Y_{u}(t)$

In this section we prove two properties of the evolution of $Y_{u}(t)$ , which will be the key elements to compute the mixing time bounds. First, we show that the process cannot stay arbitrarily long at the sides of the unit interval, in $[0,1/2-\delta)$ or $(1/2+\delta,1]$ , where some small enough parameter $\delta>0$ will be chosen. Second, we prove that starting from a point in the middle part $[1/2-\delta,1/2+\delta]$ , the distribution of the process quickly approaches the stationary distribution.

3.1 Reaching the middle

We work on the case when the $Y_{u}(0)$ is away from the middle of $[0,1]$ . We want to ensure that the process does not stay near the boundaries for a long period. To quantify this, the time to reach the middle is defined as follows:

Definition 2.

Let $\nu_{m}:=\min\{s:Y_{u}(s)\in[1/2-\delta,1/2+\delta]\}$ .

Without the loss of generality we may assume that $Y_{u}(0)$ is on the left part of $[0,1]$ , thanks to the symmetry of $\pi$ w.r.t. $(1/2,1/2)$ . Therefore we start from $Y_{u}(0)<1/2-\delta$ . For this period before reaching the middle we introduce a slightly simplified process $Y^{\prime}$ , where both coordinates are allowed to take values in $[0,\infty)$ in principle. This is not supposed to have a substantially different behavior, but will allow more convenient analytic investigation as fewer boundaries are present.

For any $v\in\mathbb{R}$ let $\sigma_{v}$ be the measure on $[0,\infty)$ with density proportional to $\varphi(u-v)$ conditioned on $u\in[0,\infty)$ . Let $Y^{\prime}(0):=X(0)$ , then define the Markov chain $Y^{\prime}(t)$ as follows:

[TABLE]

We can generate $Y^{\prime}(t)$ to be coupled to $Y(t)$ as long as possible. For a fixed $v$ , $\pi(u,v)$ is proportional to $\varphi(u-v)$ conditioned on $u\in[0,1]$ . Therefore, when we need to generate $u^{+}$ we draw a random sample from $\sigma_{Y^{\prime}_{u}(t-1)}$ and use it for both $Y(t)$ and $Y^{\prime}(t)$ if $u^{+}<1$ . Otherwise, we use it for $Y^{\prime}(t)$ but for $Y(t)$ we draw a new independent sample from $\pi(\cdot,Y^{\prime}_{u}(t-1))$ . It is easy to verify this is overall a valid method for generating a random variable of distribution $\pi(\cdot,Y^{\prime}_{u}(t-1))$ .

In the latter case, we also signal decoupling by setting a stopping time $\nu_{c}^{1}=t$ . We show this rarely happens, when governed by a variant of $\nu_{m}$ . Let $\tilde{\nu}_{m}:=\min\{s:Y_{u}(s)\geq 1/2-\delta\}$ .

Lemma 3.

For any $\alpha_{1}>0$ there is $\beta_{1}>0$ such that $P(\nu_{c}^{1}<\min(\tilde{\nu}_{m},\alpha_{1}A^{2}))=O(\exp(-\beta_{1}A^{2}))$ .

Proof.

We want to bound the probability of decoupling at every point in time.

When $u^{+}$ is drawn, $Y^{\prime}_{u}(t-1)<1/2-\delta$ is ensured as $\tilde{\nu}_{m}$ has not yet occurred. For any $v<1/2-\delta$ we have

[TABLE]

Here we use that the conditional probability is at most twice the unconditional one (because of $v\geq 0$ ), use the monotonicity in $v$ , then apply a standard tail probability estimate for the Gaussian distribution.

These exceptional events may occur at most at $\alpha_{1}A^{2}$ different times, therefore by using the union bound the overall probability is $O(\exp(-\beta_{1}A^{2}))$ for any $\beta_{1}<(1/2+\delta)^{2}$ . ∎

Lemma 4.

There exists $\beta_{2}>0$ constant such that $P(\nu_{m}\neq\tilde{\nu}_{m})=O(\exp(-\beta_{2}A^{2})$ .

Proof.

By a similar argument as above this bad event $\{\nu_{m}\neq\tilde{\nu}_{m}\}$ happens when $Y_{u}(t-1)<1/2-\delta$ but $Y_{u}(t)>1/2+\delta$ when $\tilde{\nu}_{m}$ occurs, then a Gaussian tail probability estimate gives an upper bound of

[TABLE]

The lemma holds with $\beta_{2}=(2\delta)^{2}$ . ∎

Handling $Y^{\prime}(t)$ is still challenging due to the conditional distributions included in the definition. Therefore we introduce the following process that will be both convenient to handle and to relate to $Y^{\prime}(t)$ .

Let $\tilde{Z}(t)$ be a random walk with i.i.d. $\mathcal{N}(0,1/(2A^{2}))$ increments, starting from $\tilde{Z}(0):=X_{u}(0)$ .

Let $Z(t):=|\tilde{Z}(t)|$ .

Let us denote by $\phi$ the distribution of the centered Gaussian with variance $1/(2A^{2})$ . During the analysis of $Z(t)$ we will also need to use the distribution of the absolute value of a Gaussian distribution with variance $1/(2A^{2})$ . We denote it by $\phi_{x}$ when the original one is centered at $x$ and it is easy to verify that we can express it for any $A\subset[0,\infty)$ by $\phi_{x}(A)=\phi(A-x)+\phi(-A-x)$ .

Proposition 5.

$Z(t)$ * and $Y^{\prime}_{u}(t)$ can be coupled such that $Z(t)\leq Y^{\prime}_{u}(t)$ for all $t\geq 0$ .*

Proof.

At 0 we have $Z(0)=Y^{\prime}_{u}(0)$ . We construct the coupling iteratively, assuming $Z(t-1)\leq Y^{\prime}_{u}(t-1)$ we perform the next step of the coupling which will satisfy $Z(t)\leq Y^{\prime}_{u}(t)$ .

We will use the monotone coupling between the two. For two probability distributions $\rho,\rho^{\prime}$ the monotone coupling is the one assigning $x$ to $x^{\prime}$ when $\rho((-\infty,x])=\rho^{\prime}((-\infty,x^{\prime}])$ . (We now skip currently irrelevant technical details about continuity, etc.). It is easy to verify that $x\leq x^{\prime}$ is maintained through this coupling exactly if $\rho((-\infty,y])\geq\rho^{\prime}((-\infty,y])$ for all $y$ . In our case we will need the following:

Lemma 6.

For any $v\geq\bar{v}\geq 0$ and $u\geq 0$ :

[TABLE]

Here $\bar{v}$ corresponds to $Z(t-1)$ and $v$ to $Y^{\prime}_{u}(t-1)$ and we compare the distributions for step $t$ .

Proof.

We are going to prove the following two inequalities:

[TABLE]

For the first of the two we compute $\partial_{v}\phi_{v}([0,u])$ :

[TABLE]

This last inequality holds because $|-v+u|\leq|-v-u|$ and $\varphi(x)$ is decreasing in $|x|$ . Consequently, when $\bar{v}$ is increased to $v$ , the measure of $[0,u]$ decreases confirming the first inequality. This intuitively means that when a Gaussian distribution is shifted to the right then even the reflected Gaussian is shifted (if it was centered at a non-negative point).

The second inequality to confirm is the following:

[TABLE]

We rearrange and cancel out as much as possible from the domain of integrations.

[TABLE]

We substitute the functions to integrate and transform them to compare them on the same domain.

[TABLE]

On all the domain of integration we have $x\leq y$ . Therefore the exponent is larger at every point for the left hand side, which confirms the second inequality, completing the proof of the lemma. ∎

Lemma 6 thus ensures that the monotone coupling preserves the ordering, and we can indeed use the recursive coupling scheme while keeping $Z(t)\leq Y^{\prime}_{u}(t)$ at every step. ∎

Proposition 7.

For any $\alpha_{3}>0$ there exists $\beta_{3}>0$ with the following. For large enough $A$ with probability at least $1-\alpha_{3}$ we have $\nu_{m}<\beta_{3}A^{2}$ .

Proof.

First we look at the hitting time analogous to $\tilde{\nu}_{m}$ for $Y^{\prime}_{u}$ defined as $\hat{\mathbf{\nu}}_{m}=\min\{s:Y^{\prime}_{u}(s)\geq 1/2-\delta\}$ . Without aiming for tight estimates $\hat{\mathbf{\nu}}_{m}\leq t$ can be ensured by $Y^{\prime}_{u}(t)\geq 1/2-\delta$ and by Proposition 5 this holds whenever $Z(t)\geq 1/2-\delta$ . The latter is equivalent to $\tilde{Z}(t)\notin[-1/2+\delta,1/2-\delta]$ .

For some $\beta_{3}>0$ , the distribution of $\tilde{Z}(\beta_{3}A^{2})$ is $\mathcal{N}(X_{u}(0),\beta_{3}/2)$ . Choosing $\beta_{3}$ large enough, the probability of this falling into $[-1/2+\delta,1/2-\delta]$ can be made below $\alpha_{3}/2$ and this event is a superset of $\hat{\mathbf{\nu}}_{m}>\beta_{3}A^{2}$ .

Now apply Lemma 3 with $\alpha_{1}=\beta_{3}$ . Note that $\tilde{\nu}_{m}\neq\hat{\mathbf{\nu}}_{m}$ can only happen if $\nu_{c}^{1}<\tilde{\nu}_{m}$ . Also Lemma 4 ensures that $\nu_{m}$ and $\tilde{\nu}_{m}$ almost always coincide. Altogether, we have $\nu_{m}=\tilde{\nu}_{m}=\hat{\mathbf{\nu}}_{m}<\beta_{3}A^{2}$ with an exceptional probability at most $O(\exp(-\beta_{2}A^{2}))+O(\exp(-\beta_{1}A^{2}))+\alpha_{3}/2$ , this stays below $\alpha_{3}$ when $A$ is large enough, which completes the proof. ∎

3.2 Diffusion from the middle

In the previous subsection we have seen that the process $Y_{u}(t)$ eventually has to reach the middle of the interval $[0,1]$ as formulated in Proposition 7. Now we complement the analysis and consider the case when the process is initialized from the middle, meaning $Y_{u}(0)\in[1/2-\delta,1/2+\delta]$ . Intuitively, we expect the process to evolve as a random walk with independent Gaussian increments. However, we have to be careful as boundary effects might alter the behavior of $Y_{u}(t)$ when it moves near the ends of the interval $[0,1]$ . In this subsection we provide the techniques to estimate these boundary effects which will allow to conclude that the mixing of a random walk still translates to comparable mixing of $Y_{u}(t)$ .

Let $W(t)$ be a random walk with i.i.d. $\mathcal{N}(0,1/(2A^{2}))$ increments, starting from $W(0):=Y_{u}(0)$ . Our goal is to couple $W(t)$ with $Y_{u}(t)$ which only has a chance as long as $W(t)$ stays within $[0,1]$ .

Definition 8.

Let $\nu_{c}^{2}:=\min\{s:W(s)\notin[0,1]\}$ .

Lemma 9.

There exist a coupling of the processes $Y_{u}$ and $W$ such that $Y_{u}(t)=W(t)$ whenever $t<\nu_{c}^{2}$ .

Proof.

Assume the coupling holds until $t-1$ , having $Y_{u}(t-1)=W(t-1)$ . Let $\zeta\sim\mathcal{N}(0,1/(2A^{2}))$ be independent from the past, then define $W(t)=W(t-1)+\zeta$ . For $Y_{u}(t)$ , accept $Y_{u}(t-1)+\zeta$ if it is in $[0,1]$ otherwise redraw it according to $\pi(\cdot,Y_{u}(t-1))$ .

The same values are obtained for the two processes at $t$ except if $W(t)$ is outside $[0,1]$ . This is exactly the event we wanted to indicate with $\nu_{c}^{2}$ when we allow the two processes to decouple. ∎

Lemma 10.

For any $\alpha_{4}>0$ there exists $\beta_{4}>0$ with the following property. For $A$ large enough, if $Y_{u}(0)\in[1/2-\delta,1/2+\delta]$ there holds $P(\nu_{c}^{2}<\alpha_{4}A^{2})<\beta_{4}$ . We also have $\beta_{4}\rightarrow 0$ as we choose $\alpha_{4}\rightarrow 0$ .

Proof.

We need to control the minimum and the maximum of a random walk where we use the following result of Erdős and Kac [4]:

Theorem 11 (Erdős-Kac).

Let $\xi_{1},\xi_{,}\ldots$ i.i.d. random variables, $\mathbb{E}\xi_{k}=0,~{}D^{2}\xi_{k}=1$ . Let $S_{k}=\xi_{1}+\xi_{2}+\ldots+\xi_{k}.$ Then for any $\alpha\geq 0$

[TABLE]

Translating to the current situation, now that we use an initial value $Y_{u}(0)\in[1/2-\delta,1/2+\delta]$ as a reference, we want an upper bound on the probability that the partial sums generating $W(t)$ never exceed $1/2-\delta$ (nor they go below $-1/2+\delta$ ). The increments have variance $1/(2A^{2})$ and the number of steps is $\alpha_{4}A^{2}$ . Formally,

[TABLE]

Now $\nu_{c}^{2}<\alpha_{4}A^{2}$ can only occur if this event fails and the maximum exceeds $1/2-\delta$ , meaning $W(t)$ might exceed 1, or alternatively, the minimum of the process goes below $-1/2+\delta$ corresponding to $W(t)$ possibly leaving $[0,1]$ at 0. Consequently, we may fix any small $\varepsilon>0$ , then for any large enough $A$ we get

[TABLE]

Observe that the right hand side of the expression indeed converges to 0 as $\alpha_{4}\rightarrow 0$ . ∎

Proposition 12.

There exists a constant $\alpha_{5}>0$ such that for $A$ large enough, if $Y_{u}(0)\in[1/2-\delta,1/2+\delta]$ we have

[TABLE]

Proof.

We introduce $\alpha_{5}$ as a parameter. We will find sufficient conditions that ensure the claim of the proposition to hold, then pick a $\alpha_{5}$ that satisfies the conditions found.

We first compare two simpler distributions, that of $W(\alpha_{5}A^{2})$ and the uniform $\mu$ . By the definition of $W(t)$ , the distribution of $W(\alpha_{5}A^{2})$ is $\mathcal{N}(Y_{u}(0),\alpha_{5}/2)$ .

[TABLE]

The integrand has the form $|a-b|$ which we replace by $a+b-2\min(a,b)$ (knowing these variables are non-negative). Also, as the probability density functions integrate to 1, we get

[TABLE]

The last inequality follows because the constant term is increased by $2\delta$ , so is the length of the domain of the integration but the integrand is bounded above by 1. This step also involves an implicit change of variable depending on $Y_{u}(0)$ , and it results in a final expression independent of this starting condition. The $\gamma$ we get is also independent of $A$ , it does depend on $\delta$ but has a limit as $\delta\rightarrow 0$ .

The claim of the lemma is about two other distributions, now we relate them to the ones just compared. Using Lemma 10 for $\alpha_{4}=\alpha_{5}$ we know that $Y_{u}(t)$ and $W(t)$ can be coupled well up to $t=\alpha_{5}A^{2}$ , which directly implies

[TABLE]

where $\beta_{4}$ is the constant given by Lemma 10.

To compare $\pi_{u}$ with $\mu$ we show $\pi_{u}$ converges to $\mu$ in total variation as $A\rightarrow\infty$ . For every $x\in[0,1]$ define

[TABLE]

this is a function proportional to the density of $\pi_{u}$ . By standard Gaussian tail estimates for all $x\in(0,1)$ we get

[TABLE]

Hence for all $x\in(0,1),~{}p_{u}(x)\rightarrow 1$ as $A\rightarrow\infty$ . These are uniformly bounded functions, so $\int_{0}^{1}p_{u}\rightarrow 1$ . The expression to consider for the convergence of the distributions is

[TABLE]

Here $1/\int_{0}^{1}p_{u}$ is converging to 1 and is therefore bounded after some threshold, so the functions are eventually uniformly bounded and pointwise converging to 0. Thus the integrals also converge, and we get

[TABLE]

We can now combine our bounds of (2), (3) and (4):

[TABLE]

where $\varepsilon>0$ can be as small as wanted by setting $A$ large enough. The proposition holds if we can ensure this sum to be small enough.

Note that a strong compromise is present for the choice of the constant $\alpha_{5}$ . In (3) we want to limit how likely the boundaries of the unit interval are to be reached, at the same time in (2) we want to show that $Y_{u}(s)$ is already spread out to some extent.

Still, a specific choice is possible. For $\alpha_{5}=0.10$ Lemma 10 provides $\beta_{4}\approx 0.051$ when using $\delta=\varepsilon=0$ and computer calculations for (1). By choosing $\delta,\varepsilon>0$ but small enough, trusting computers but not too much, we can safely say $\beta_{4}<0.06$ . In (2) using the same choice of $\alpha_{5}$ we numerically get $\gamma\approx 0.263$ for $\delta=\varepsilon=0$ . Once again we allow a safety margin to only claim $\beta_{4}+\gamma+\varepsilon<1/3$ . ∎

4 Overall mixing

We are now ready to establish mixing time bounds for the process we understand the best, $Y_{u}(t)$ , then we will translate those results to the original process of interest $X(t)$ .

Let us define

[TABLE]

which measures the distance from the stationary distribution from the worst starting point. We can give good bounds based on the previous sections:

Lemma 13.

There exists a constant $\beta_{6}>0$ such that $d(\beta_{6}A^{2})<4/9$ .

Proof.

Intuitively, from any starting point we can first wait for the process to reach the middle and then let the diffusion happen from there, as these are components we can already control.

Let us apply Proposition 7 with $\alpha_{3}=1/9$ providing a certain $\beta_{3}$ . Once the process is in the middle part $[1/2-\delta,1/2+\delta]$ we know by Proposition 12 that in the subsequent $\alpha_{5}A^{2}$ steps sufficient diffusion occurs. Let $\beta_{6}=\beta_{3}+\alpha_{5}$ .

Formally, fix $Y_{u}(0)\in[0,1]$ . We perform our calculations by conditioning on the value of $\nu_{m}$ .

[TABLE]

Conditioned on $\nu_{m}=s$ , $Y_{u}(s)\in[1/2-\delta,1/2+\delta]$ , therefore Proposition 12 provides $\|\mathcal{L}(Y_{u}(s+\alpha_{5}A^{2})~{}|~{}\nu_{m}=s)-\pi_{u}\|_{\rm TV}<1/3$ . We use this for $s\leq\beta_{3}A^{2}$ , then performing $\beta_{3}A^{2}-s$ more steps can only decrease this distance, see [6, Chapter 4] for a detailed discussion about this. For $s>\beta_{3}A^{2}$ we use the trivial bound on the total variation distance. We get

[TABLE]

∎

A slight variation of $d(t)$ compares the distribution of the process when launched from two different starting points.

[TABLE]

Standard results provide the inequalities $d(t)\leq\bar{d}(t)\leq 2d(t)$ and the submultiplicativity $\bar{d}(s+t)\leq\bar{d}(s)\bar{d}(t)$ , see [6, Chapter 4]. The results therein are given for finite state Markov chains but are straightforward to translate to the current case of absolutely continuous distributions and transition kernels.

Proposition 14.

For any $0<\alpha_{7}<1$ there exists $\beta_{7}>0$ such that

[TABLE]

Proof.

Using Lemma 13 for any $k\geq 1$ we get

[TABLE]

For $k=\lceil\log\alpha_{7}/\log(8/9)\rceil$ this is less than $\alpha_{7}$ thus by setting $\beta_{7}=\beta_{6}\lceil\log\alpha_{7}/\log(8/9)\rceil$ the process will be close enough to the stationary distribution as required at $t=\beta_{7}A^{2}$ . ∎

Lemma 15.

The mixing time of $Y_{u}$ and $Y$ are nearly the same, for any $0<\alpha_{7}<1$

[TABLE]

Proof.

First, we use that the total variation distance between the marginals is at most the distance between the overall distributions. Consequently, for any $t$ we have $\|\mathcal{L}(Y_{u}(t-1))-\pi_{u}\|_{\rm TV}\leq\|\mathcal{L}(Y(t))-\pi\|_{\rm TV}$ . This gives $t_{\rm mix}(Y,\alpha_{7})\geq t_{\rm mix}(Y_{u},\alpha_{7})+1$ .

For the other direction, assume $\|\mathcal{L}(Y_{u}(t))-\pi_{u}\|_{\rm TV}\leq\alpha_{7}$ for some $t$ . This means there is an optimal coupling with a random variable $\tilde{Y}_{u}^{1}$ having distribution $\pi_{u}$ such that $P(Y_{u}(t)\neq\tilde{Y}_{u}^{1})\leq\alpha_{7}$ . As $\tilde{Y}_{u}^{1}$ has distribution $\pi_{u}$ , it is possible to draw an additional random variable $\tilde{Y}_{u}^{2}$ to get $(\tilde{Y}_{u}^{2},\tilde{Y}_{u}^{1})$ with distribution $\pi$ .

This is the same step when generating $Y_{u}(t+1)$ from $Y_{u}(t)$ thus we may keep the above coupling whenever already present. Therefore we have $P\big{(}(Y_{u}(t+1),Y_{u}(t))\neq(\tilde{Y}_{u}^{2},\tilde{Y}_{u}^{1})\big{)}\leq\alpha_{7}$ which can also be written as $\|\mathcal{L}(Y(t+1))-\pi\|_{\rm TV}\leq\alpha_{7}$ . This implies $t_{\rm mix}(Y,\alpha_{7})\leq t+1$ , completing the proof. ∎

We are now ready to prove the main theorem of the paper, as stated in the introduction.

Theorem 1.

Let $X(t)$ be the coordinate Gibbs sampler for the diagonal distribution. For any $0<\alpha<1$ there exists $\beta>0$ such that for large enough $A$

[TABLE]

Proof.

We use Proposition 14 with $\alpha_{7}=\alpha/2$ and get a constant $\beta_{7}$ such that $t_{\rm mix}(Y_{u},\alpha/2)\leq\beta_{7}A^{2}$ and by Lemma 15 also $t_{\rm mix}(Y,\alpha/2)\leq\beta_{7}A^{2}+1$ . At each step the distribution of $X^{*}$ and $Y$ might differ only by flipping along the diagonal, which does not change the distance from the (symmetric) $\pi$ thus also leaves the mixing time the same so we get $t_{\rm mix}(X^{*},\alpha_{7}/2)\leq\beta_{7}A^{2}+1$ .

The definition of $X^{*}$ was based on the observation that when the same coordinate is rerandomized repeatedly, no additional mixing happens and the values at that coordinate simply get overwritten. Let us now quantify this effect, counting how many times did the direction of randomization change:

[TABLE]

With this notation we see that $\mathcal{L}(X(t)~{}|~{}N(t)=k,r(1)=V)=\mathcal{L}(X^{*}(k+1))$ for all $t\geq 1$ .

Without the loss of generality we now focus on the case of $r(1)=V$ . Let us express the distribution of $X(t)$ conditioning on the value of $N(t)$ .

[TABLE]

We substitute $t=3\beta_{7}A^{2}$ and evaluate the total variation distance from $\pi$ .

[TABLE]

The last line holds with some positive $\varepsilon$ by Hoeffding’s inequality for the Binomial distribution and by substituting the upper bound on the total variation distance when we know $k$ is above the mixing time. For large enough $A$ this is below $\alpha$ .

By symmetry, the same bound holds for $\mathcal{L}(X(3\beta_{7}A^{2})~{}|~{}r(1)=U)$ and by convexity it is also true for the mixture of the two, the unconditional distribution of $X(3\beta_{7}A^{2})$ . This concludes the proof with $\beta=3\beta_{7}$ . ∎

Finally, let us comment on the multitude of constants $\alpha_{i},\beta_{i}$ appearing throughout the proofs, verifying that they can be consistently chosen when needed. First, a small enough $\delta>0$ has to be picked for the proof of Proposition 12 which also relies on Lemma 10. Once it is fixed, observe that in the remaining Sections 3.1 and 4 all the constants only depend on other ones with lower indices, with the last $\alpha,\beta$ of Theorem 1 also depending on some previous ones. This excludes the issue of circular dependence.

5 Further estimates

In this section we complement the main result Theorem 1 by a lower bound showing that the order of $A^{2}$ is exact and by demonstrating the evolution of the distribution via numerical simulations.

Such a lower bound is plausible once having Lemma 9 and Lemma 10, these roughly say that when starting from the middle $Y_{u}$ behaves like a random walk for order of $A^{2}$ steps and reaches only constant distance in order of $A^{2}$ steps. Let us proceed by forming a formal argument.

Theorem 16.

Let $X(t)$ be the coordinate Gibbs sampler for the diagonal distribution. There exists constants $\alpha^{\prime},\beta^{\prime}>0$ such that for large enough $A$

[TABLE]

First of all, to bound the mixing time from below it is sufficient to give a lower bound on the number of steps needed for a single starting point. In this spirit, we set $X(0)=(1/2,1/2)$ . With this choice, the arguments in Section 3.2 can be applied.

Set $S=[0,1/4]^{2}\cup[3/4,1]^{2}$ . Once we prove $\pi(S)-P(X(\beta^{\prime}A^{2})\in S)>\alpha^{\prime}$ for a proper choice of $\alpha^{\prime},\beta^{\prime}$ that warrants a large total variation distance at the time $\beta^{\prime}A^{2}$ and confirms our bound for the mixing time.

Lemma 17.

$\pi(S)\geq 1/8.$ **

Proof.

If we divide the unit square to 4-by-4 equal size smaller squares, then $S$ is composed of two of these smaller squares, see Figure 1.

It is enough to show that the selected squares forming $S$ have greater or equal probability than the other squares w.r.t. $\pi$ , this directly confirms $\pi(S)\geq 2/16=1/8$ .

To verify this, we compare the unnormalized density $\varphi$ on them. We use the simple inequality that for $u,v\in[0,1/4]$ and any $x\geq 0$ we have

[TABLE]

Indeed, note that $\varphi(y)$ is monotone decreasing with $|y|$ . Then for $u\geq v,x=0$ an easy comparison of the arguments provides the bound, while the other cases follow similarly. Observe that for $x=0$ this inequality compares $\varphi$ at some point and its reflection to the line $v=1/4$ . Setting $x>0$ corresponds to a further shift increasing the $v$ coordinate after the reflection.

Using this we see that the points of the small square labeled by $(a,a)$ in Figure 1 correspond to the points of $(a,b)$ after a reflection so $\varphi$ is pointwise larger on $(a,a)$ by the above inequality. The same comparison holds against $(a,c),~{}(a,d)$ where an additional shift is necessary besides the reflection. Consequently, $\pi$ is maximal for the $(a,a)$ square compared to the others in its column.

Additionally, note that $\varphi(u-v)$ is invariant under the shift of $(u,v)$ in the direction $(1,1)$ . Therefore $\pi$ is exactly the same for the four squares on the diagonal. Furthermore, all the other squares are diagonally shifted and/or reflected (w.r.t. the diagonal) copies of the ones considered above, where we have seen that their probability is upper bounded by the probability of the square $(a,a)$ . The distribution $\pi$ is symmetric w.r.t. the diagonal, so we conclude that $(a,a)$ (and therefore also $(d,d)$ ) have indeed maximal probability among all squares. ∎

Lemma 18.

For any $\alpha^{\prime}_{1}>0$ there exists $\beta^{\prime}_{1}>0$ such that for large enough $A$ any $t\leq\beta^{\prime}_{1}A^{2}$ satisfies

[TABLE]

Proof.

We want to rely on the previous observations that $Y_{u}(t)$ behaves like a random walk for a while with certain Gaussian increments. Using Lemma 10 we can choose $\alpha_{4}>0$ so that the corresponding $\beta_{4}$ goes below $\alpha^{\prime}_{1}/2$ . Let us denote this $\alpha_{4}$ by $\beta^{\prime}_{2}$ for convenience.

Also, there exists $\beta^{\prime}_{3}>0$ so that

[TABLE]

and clearly the same probability bound holds if the variance is decreased. Fixing $Y_{u}(0)=W(0)=1/2$ , the distribution of $W(\beta^{\prime}_{3}A^{2})$ is exactly $\mathcal{N}(1/2,\beta^{\prime}_{3}/2)$ .

To join our estimates we form

[TABLE]

For $t\leq\beta^{\prime}_{2}A^{2}$ the first term is below $\alpha^{\prime}_{1}/2$ as it is an upper bound for the decoupling of $W,Y_{u}$ to happen. For $t\leq\beta^{\prime}_{3}A^{2}$ the second term is below $\alpha^{\prime}_{1}/2$ . Altogether, if $t\leq\min(\beta^{\prime}_{2},\beta^{\prime}_{3})A^{2}$ ,

[TABLE]

Therefore by choosing $\beta^{\prime}_{1}=\min(\beta^{\prime}_{2},\beta^{\prime}_{3})$ we complete the proof. ∎

Proof of Theorem 16.

Apply Lemma 18 with $\alpha^{\prime}_{1}=1/16$ to get some $\beta^{\prime}_{1}$ . The distribution of $X(\beta^{\prime}_{1}A^{2})$ is a mixture of the distributions of $Y(t)$ and their diagonally flipped version for $t\leq\beta^{\prime}_{1}A^{2}$ , where $t$ corresponds to how many times the rerandomization happened in a new direction. The set $S$ is symmetric w.r.t. the diagonal so for $P(X(\beta^{\prime}_{1}A^{2})\in S)$ we can simply say it is a convex combination of $P(Y(t)\in S),~{}t\leq\beta^{\prime}_{1}A^{2}$ without needing any correction for the diagonal flip. Now by Lemma 18 each of these probabilities are below $\alpha^{\prime}_{1}$ , therefore it follows that

[TABLE]

Comparing this with the statement of Lemma 17 we get

[TABLE]

Consequently $\|\mathcal{L}(X(\beta^{\prime}_{1}A^{2}))-\pi\|_{\rm TV}>1/16$ , so $t_{\rm mix}(X,1/16)>\beta^{\prime}_{1}A^{2}$ . Thus the theorem holds with the choice $\alpha^{\prime}=1/16,~{}\beta^{\prime}=\beta^{\prime}_{1}$ . ∎

Finally, we present numerical approximations of the evolution of the distribution over time for different values of $A$ . The unit square is discretized with a resolution of $500\times 500$ and the distribution is calculated along these points. The starting point is always $(0,0)$ at the lower left corner. The results are presented in Figure 2 for different $A$ and different $t$ . Both the convergence to the stationary distribution is visible and also how this distribution becomes more concentrated along the diagonal for higher values of $A$ . We also computed the time necessary to get within a total variation distance of $1/4$ of the stationary distribution, for $A=10,t=71$ , for $A=50,t=1858$ , for $A=250,t=47233$ is needed. This is a good proxy for the mixing time, note that only a single (but intuitively bad) starting point was tested and the discretization might have introduced some error. Still, the quadratic growth of $t$ with respect to the increase of $A$ is already apparent.

Acknowledgments

The author would like to express his thanks to Persi Diaconis and György Michaletzky for their inspiring comments and to the American Institute of Mathematics for the stimulating workshop they hosted and organized.

Bibliography8

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] P. Diaconis , The Markov chain Monte Carlo revolution , Bulletin of the American Mathematical Society, 46 (2009), pp. 179–205.
2[2] P. Diaconis and L. Saloff-Coste , What do we know about the Metropolis algorithm? , Journal of Computer and System Sciences, 57 (1998), pp. 20–36.
3[3] M. Dyer, A. D. Flaxman, A. M. Frieze, and E. Vigoda , Randomly coloring sparse random graphs with fewer colors than the maximum degree , Random Structures & Algorithms, 29 (2006), pp. 450–465.
4[4] P. Erdős and M. Kac , On certain limit theorems of the theory of probability , Bulletin of the American Mathematical Society, 52 (1946), pp. 292–302.
5[5] R. Kannan, L. Lovász, and M. Simonovits , Random walks and an O ∗ ( n 5 ) superscript 𝑂 superscript 𝑛 5 {O}^{*}(n^{5}) volume algorithm for convex bodies , Random Structures & Algorithms, 11 (1997), pp. 1–50.
6[6] D. Levin, Y. Peres, and E. Wilmer , Markov chains and mixing times , American Mathematical Society, 2009.
7[7] L. Lovász and S. Vempala , Simulated annealing in convex bodies and an O ∗ ( n 4 ) superscript 𝑂 superscript 𝑛 4 {O^{*}}(n^{4}) volume algorithm , Journal of Computer and System Sciences, 72 (2006), pp. 392–417.
8[8] E. Mossel and A. Sly , Gibbs rapidly samples colorings of G ( n , d / n ) 𝐺 𝑛 𝑑 𝑛 {G}(n,d/n) , Probability Theory and Related Fields, 148 (2010), pp. 37–69.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Taxonomy

Mixing time of an unaligned Gibbs sampler on the square

Abstract

1 Introduction

Theorem 1**.**

2 Preliminaries, alternative processes

3 Dynamics of Yu(t)Y_{u}(t)Yu​(t)

3.1 Reaching the middle

Definition 2**.**

Lemma 3**.**

Proof.

Lemma 4**.**

Proof.

Proposition 5**.**

Proof.

Lemma 6**.**

Proof.

Proposition 7**.**

Proof.

3.2 Diffusion from the middle

Definition 8**.**

Lemma 9**.**

Proof.

Lemma 10**.**

Proof.

Theorem 11** (Erdős-Kac).**

Proposition 12**.**

Proof.

4 Overall mixing

Lemma 13**.**

Proof.

Proposition 14**.**

Proof.

Lemma 15**.**

Proof.

Theorem 1.

Proof.

5 Further estimates

Theorem 16**.**

Lemma 17**.**

Proof.

Lemma 18**.**

Proof.

Proof of Theorem 16.

Acknowledgments

Theorem 1.

3 Dynamics of $Y_{u}(t)$

Definition 2.

Lemma 3.

Lemma 4.

Proposition 5.

Lemma 6.

Proposition 7.

Definition 8.

Lemma 9.

Lemma 10.

Theorem 11 (Erdős-Kac).

Proposition 12.

Lemma 13.

Proposition 14.

Lemma 15.

Theorem 16.

Lemma 17.

Lemma 18.