Fast initial conditions for Glauber dynamics

Eyal Lubetzky; Allan Sly

arXiv:1701.06042·math.PR·January 24, 2017

TL;DR

This paper introduces new methods using information percolation to analyze the mixing times of Glauber dynamics for the 1D Ising model from various initial states, revealing temperature-dependent optimal starting conditions.

Contribution

It analyzes the mixing time of Glauber dynamics from specific deterministic, non-worst-case initial states (alternating and bi-alternating), showing that the alternating initial condition is asymptotically the fastest deterministic one at high temperatures.

Findings

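1. The alternating initial condition is asymptotically the fastest deterministic initial condition at high temperatures ($\beta<\beta_{0}$).

2. For $0<\beta\leq\beta_{0}$, the mixing time from the alternating initial condition is faster than at infinite temperature ($\beta=0$).

3. The dominant test function depends on the temperature: autocorrelation for $\beta<\beta_{0}$, the Hamiltonian for $\beta>\beta_{0}$.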

Abstract

In the study of Markov chain mixing times, analysis has centered on the performance from a worst-case starting state. Here, in the context of Glauber dynamics for the one-dimensional Ising model, we show how new ideas from information percolation can be used to establish mixing times from other starting states. At high temperatures we show that the alternating initial condition is asymptotically the fastest one, and, surprisingly, its mixing time is faster than at infinite temperature, accelerating as the inverse-temperature $\beta$ ranges from 0 to $\beta_{0}=\frac{1}{2}\mathrm{arctanh}(\frac{1}{3})$. Moreover, the dominant test function depends on the temperature: at $\beta<\beta_{0}$ it is autocorrelation, whereas at $\beta>\beta_{0}$ it is the Hamiltonian.

Equations

x^{\mathrm{alt}}(i)=\begin{cases}1 & i\equiv 0\pmod 2,\\ -1 & i\equiv 1\pmod 2,\end{cases}

x^{\mathrm{blt}}(i)=\begin{cases}1 & i\equiv 0,3\pmod 4,\\ -1 & i\equiv 1,2\pmod 4.\end{cases}

\left|t_{\textsc{mix}}^{x^{\mathrm{alt}}}(\varepsilon)-\max\big{\{}\tfrac{1}{4-2\theta},\tfrac{1}{4\theta}\big{\}}\log n\right|\leq C\log\log n\,.

\left|t_{\textsc{mix}}^{x^{\mathrm{blt}}}(\varepsilon)-\max\big\{\tfrac{1}{2},\tfrac{1}{4\theta}\big\}\log n\right|\leq C\log\log n\,.

\left|t_{\textsc{mix}}^{x^{+}}(\varepsilon)-\tfrac{1}{2\theta}\log n\right|\leq C\log\log n\,.

\nu\left(\left\{x_{0}:\left|t_{\textsc{mix}}^{x_{0}}(\varepsilon)-\tfrac{1}{2\theta}\log n\right|\leq C\log\log n\right\}\right)\to 1\quad\mbox{as $n\to\infty$}\,.

\left|t_{\textsc{mix}}^{\nu}(\varepsilon)-\tfrac{1}{4\theta}\log n\right|\leq C\log\log n\,.

t_{\star}=\tfrac{1}{4-2\theta}\log n-8\log\log n

\lim_{n\to\infty}\inf_{x_{0}}\big\|\mathbb{P}_{x_{0}}(X_{t_{\star}}\in\cdot)-\pi\big\|_{\textsc{tv}}=1\,.

\pi(\sigma)=\mathcal{Z}^{-1}e^{\beta\sum_{uv\in E}\sigma(u)\sigma(v)}\,,

t_{\textsc{mix}}^{x_{0}}(\varepsilon)=\inf\big{\{}t\;:\;\|\mathbb{P}_{x_{0}}(X_{t}\in\cdot)-\pi\|_{\textsc{tv}}\leq\varepsilon\big{\}}\,,

t_{\textsc{mix}}(\varepsilon)=\max_{x_{0}\in\Omega}t_{\textsc{mix}}^{x_{0}}(\varepsilon)\,,

\mathbb{P}\big(\mathscr{F}_{\textsc{s}}(v,t_{1},t_{2})\neq\emptyset\big)=e^{-(t_{2}-t_{1})\theta}\,.

\mathbb{P}\bigg{(}\max_{v\in\mathbb{Z}/n\mathbb{Z}}\max_{\begin{subarray}{c}0\leq s\leq t\\ \mathscr{F}_{\textsc{s}}(v,s,t)\neq\emptyset\end{subarray}}|v-\mathscr{F}_{\textsc{s}}(v,s,t)|\geq\tfrac{1}{10}\log^{2}n\bigg{)}\leq O(n^{-10})\,.

\mathbb{P}\bigg{(}\max_{\begin{subarray}{c}t-\log^{3/2}n\leq s\leq t\\ \mathscr{F}_{\textsc{s}}(v,s,t)\neq\emptyset\end{subarray}}|v-\mathscr{F}_{\textsc{s}}(v,s,t)|\geq\tfrac{1}{10}\log^{2}n\bigg{)}\leq O(n^{-11})\,.

\big\|\mathbb{P}_{x^{\mathrm{alt}}}(X_{t_{\star}}(v)\in\cdot)-\pi|_{v}\big\|_{\textsc{tv}}

\big\|\mathbb{P}_{x^{\mathrm{blt}}}(X_{t_{\star}}(v)\in\cdot)-\pi|_{v}\big\|_{\textsc{tv}}

\mathbb{P}(Y(s)=a)=\begin{cases}\frac{1}{2}+\frac{1}{2}e^{-2(1-\theta)s} & \mbox{if }a=x^{\mathrm{alt}}(v)\,,\\ \frac{1}{2}-\frac{1}{2}e^{-2(1-\theta)s} & \mbox{otherwise}\,.\end{cases}

\big\|\mathbb{P}_{x^{\mathrm{alt}}}(X_{t_{\star}}(v)\in\cdot)-\pi|_{v}\big\|_{\textsc{tv}}=\tfrac{1}{2}e^{-2(1-\theta)t_{\star}}e^{-\theta t_{\star}}=\tfrac{1}{2}e^{-(2-\theta)t_{\star}}\,.

\mathcal{B}=\bigg{\{}\max_{v\in\mathbb{Z}/n\mathbb{Z}}\max_{\begin{subarray}{c}t_{-}\leq s\leq t_{\star}\\ \mathscr{F}_{\textsc{s}}(v,s,t_{\star})\neq\emptyset\end{subarray}}|v-\mathscr{F}_{\textsc{s}}(v,s,t_{\star})|\leq\tfrac{1}{10}\log^{2}n\bigg{\}}\,,

\mathbb{P}(\mathcal{B})\geq 1-n^{-10}\,.

\{x:\mathscr{F}_{\textsc{s}}(x,t_{-},t_{\star})\neq\emptyset\}\subset\bigcup_{i}W_{i}\,,

\max_{i}|W_{i}|\leq\log^{3}n\,,

\min_{i\neq i'}d(W_{i},W_{i'})\geq\log^{2}n\,.

M_{i}=\{2i\log^{2}n,\ldots,(2i+1)\log^{2}n\}\qquad\big(1\leq i\leq\tfrac{n}{2\log^{2}n}\big)\,.

\mathcal{B}^{\prime}=\biggl{\{}\max_{v\in\cup_{i}M_{i}}\max_{\begin{subarray}{c}t_{-}\leq s\leq t_{\star}\\ \mathscr{F}_{\textsc{s}}(v,s,t_{\star})\neq\emptyset\end{subarray}}|v-\mathscr{F}_{\textsc{s}}(v,s,t_{\star})|\leq\tfrac{1}{10}\log^{2}n\biggr{\}}\,.

\mathcal{D}_{i}=\{\mathscr{F}_{\textsc{s}}(M_{i},t_{-},t_{\star})=\emptyset\}\,.

\mathbb{P}(\mathcal{D}_{i})\geq 1-|M_{i}|\,e^{-(1-\theta)\kappa\log\log n}\geq 1-\frac{1}{\log n}\,,

\mathbb{P}(\mathcal{D}_{i}^{c}\mid\mathcal{B}')\leq\frac{\mathbb{P}(\mathcal{D}_{i}^{c})}{\mathbb{P}(\mathcal{B}')}\leq\frac{2}{\log n}\,.

\mathbb{P}\Big(\mathcal{D}_{i}^{c},\mathcal{D}_{i+1}^{c},\ldots,\mathcal{D}_{i+\frac{1}{10}\log n}^{c}\;\Big|\;\mathcal{B}'\Big)\leq\Big(\frac{2}{\log n}\Big)^{\frac{1}{10}\log n}\leq n^{-10}\,;

\mathbb{P}\left(\mathcal{D}_{i}^{c},\mathcal{D}_{i+1}^{c},\ldots,\mathcal{D}_{i+\frac{1}{10}\log n}^{c}\right)\leq\mathbb{P}\left(\mathcal{D}_{i}^{c},\mathcal{D}_{i+1}^{c},\ldots,\mathcal{D}_{i+\frac{1}{10}\log n}^{c}\;\Big{|}\;\mathcal{B}^{\prime}\right)+\mathbb{P}\left({\mathcal{B}^{\prime}}^{c}\right)\leq 2n^{-10}.

Full text

Eyal Lubetzky

Courant Institute of Mathematical Sciences

New York University

New York, NY 10012, USA.

and

Allan Sly

Department of Mathematics

Princeton University

Princeton, NJ 08544, USA, and Department of Statistics

UC Berkeley

Berkeley, CA 94720, USA.

1. Introduction

In the study of mixing times of Markov chains, most of the focus has been on determining the asymptotics of the worst-case mixing time, while relatively little is known about the effect of different initial conditions. The latter question is quite natural from an algorithmic perspective on sampling, since one would ideally initiate the dynamics from the fastest initial condition. However, until recently, the tools available for analyzing Markov chains on complex systems, such as the Ising model, were insufficient for comparing the effect of different starting states; indeed, even pinpointing the asymptotics of the worst-case mixing time for Glauber dynamics for the Ising model can be highly nontrivial.

In this paper we compare different initial conditions for the Ising model on the cycle. In earlier work [LS4], we analyzed three different initial conditions. The all-plus state is provably the worst initial condition up to an additive constant. Another is a quenched random condition chosen from $\nu$ , the uniform distribution on configurations, which with high probability has a mixing time which is asymptotically as slow. A third initial condition is an annealed random condition chosen from $\nu$ , i.e., to start at time 0 from the uniform distribution, which is asymptotically twice as fast as all-plus.

Here we consider two natural deterministic initial configurations. The first is the alternating sequence

$$x^{\mathrm{alt}}(i)=\begin{cases}1 & i\equiv 0\pmod 2,\\ -1 & i\equiv 1\pmod 2,\end{cases}$$

which we will show is asymptotically the fastest deterministic initial condition, yet strictly slower than starting from the annealed random condition, for all $\beta<\beta_{0}:=\frac{1}{2}\operatorname{arctanh}(\frac{1}{3})$ (at $\beta=\beta_{0}$ the two match). The second is the bi-alternating sequence

$$x^{\mathrm{blt}}(i)=\begin{cases}1 & i\equiv 0,3\pmod 4,\\ -1 & i\equiv 1,2\pmod 4.\end{cases}$$

For convenience we will assume that $n$ is a multiple of 4, which ensures that the configurations are semi-translation invariant and turns both sequences into eigenvectors of the transition matrix of simple random walk on the cycle. (This is not necessary for the main result but leads to cleaner analysis.)

In what follows, set $\theta=\theta_{\beta}=1-\tanh(2\beta)$ , and let $t_{\textsc{mix}}^{x_{0}}(\varepsilon)$ denote the time it takes the dynamics to reach total variation distance at most $\varepsilon$ from stationarity, starting from the initial condition $x_{0}$ .

Theorem 1.

For every $\beta>0$ and $0<\varepsilon<1$ there exist $C(\beta)$ and $N(\beta,\varepsilon)$ such that the following hold for Glauber dynamics for the Ising model on the cycle $\mathbb{Z}/n\mathbb{Z}$ at inverse-temperature $\beta$ for all $n>N$ .

(i)

Alternating initial condition:

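$$\left|t_{\textsc{mix}}^{x^{\mathrm{alt}}}(\varepsilon)-\max\big\{\tfrac{1}{4-2\theta},\tfrac{1}{4\theta}\big\}\log n\right|\leq C\log\log n\,.$$

(ii)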

Bi-alternating initial condition:

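$$\left|t_{\textsc{mix}}^{x^{\mathrm{blt}}}(\varepsilon)-\max\big\{\tfrac{1}{2},\tfrac{1}{4\theta}\big\}\log n\right|\leq C\log\log n\,.$$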

Surprisingly, the mixing time from the alternating initial condition is initially faster than at infinite temperature: it decreases as a function of $\beta$ up to $\beta=\frac{1}{2}\operatorname{arctanh}(\frac{1}{3})$, and increases for larger $\beta$.

The following theorem summarizes the bounds we proved in [LS1, LS4] for the all-plus and random initial conditions. See Figure 1 for the relative performance of all these different initial conditions.

Theorem 2 ([LS1, LS4]).

In the same setting of Theorem 1, the following hold.

(i)

All-plus initial condition $x^{+}\equiv 1$ :

$$\left|t_{\textsc{mix}}^{x^{+}}(\varepsilon)-\tfrac{1}{2\theta}\log n\right|\leq C\log\log n\,.$$

(ii)

Quenched random initial condition:

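$$\nu\left(\left\{x_{0}:\left|t_{\textsc{mix}}^{x_{0}}(\varepsilon)-\tfrac{1}{2\theta}\log n\right|\leq C\log\log n\right\}\right)\to 1\quad\mbox{as $n\to\infty$}\,.$$

(iii)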

Annealed random initial condition:

$$\left|t_{\textsc{mix}}^{\nu}(\varepsilon)-\tfrac{1}{4\theta}\log n\right|\leq C\log\log n\,.$$

(Note that, in the case of the all-plus initial conditions, the mixing time $t_{\textsc{mix}}^{x^{+}}(\varepsilon)$ is known in higher precision: it was shown [LS1, LS4] to be within an additive constant (depending on $\varepsilon$ and $\beta$ ) of $\frac{1}{2\theta}\log n$ .)
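To make the comparison between Theorems 1 and 2 concrete, the following Python sketch tabulates the leading constants $c$ in $t_{\textsc{mix}}\approx c\log n$ as functions of $\beta$. It drops the $O(\log\log n)$ corrections, and the function names (`theta`, `leading_constants`, `BETA0`) are our own illustrative choices.

```python
import math


def theta(beta: float) -> float:
    """theta_beta = 1 - tanh(2 beta), as defined before Theorem 1."""
    return 1.0 - math.tanh(2.0 * beta)


def leading_constants(beta: float) -> dict:
    """Coefficients c with t_mix = c * log n + O(log log n), per Theorems 1 and 2."""
    th = theta(beta)
    return {
        "alternating": max(1.0 / (4.0 - 2.0 * th), 1.0 / (4.0 * th)),
        "bi-alternating": max(0.5, 1.0 / (4.0 * th)),
        "all-plus": 1.0 / (2.0 * th),  # also the quenched random constant
        "annealed": 1.0 / (4.0 * th),
    }


# beta_0 = (1/2) arctanh(1/3): theta = 2/3 and the two alternating terms meet
BETA0 = 0.5 * math.atanh(1.0 / 3.0)

if __name__ == "__main__":
    for beta in (0.0, 0.1, BETA0, 0.3):
        c = leading_constants(beta)
        print(f"beta={beta:.4f}  alt={c['alternating']:.4f}  annealed={c['annealed']:.4f}")
```

At $\beta=0$ the alternating constant is $\frac{1}{2}$; it decreases to $\frac{3}{8}$ at $\beta_{0}$, where it meets the annealed constant $\frac{1}{4\theta}$, and increases afterwards.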

The upper bounds on the mixing times in Theorem 1 rely on the information percolation framework introduced by the authors in [LS4]. The asymptotically matching lower bounds in that theorem are derived from two test functions: the Hamiltonian test function, which for instance matches our upper bound on the alternating initial condition for $\beta>\beta_{0}$; and the autocorrelation function, which gives rise to the following lower bound on every deterministic initial condition.

Proposition 3.

Let $X_{t}$ be Glauber dynamics for the Ising model on $\mathbb{Z}/n\mathbb{Z}$ at inverse-temperature $\beta$ . For every sequence of deterministic initial conditions $x_{0}$ , the dynamics at time

$$t_{\star}=\tfrac{1}{4-2\theta}\log n-8\log\log n$$

is at total variation distance $1-o(1)$ from equilibrium; that is,

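$$\lim_{n\to\infty}\inf_{x_{0}}\big\|\mathbb{P}_{x_{0}}(X_{t_{\star}}\in\cdot)-\pi\big\|_{\textsc{tv}}=1\,.$$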

As a consequence of this result and Theorem 1, Part (i), we see that the initial condition $x^{\mathrm{alt}}$ is indeed the optimal deterministic one in the range $\beta<\beta_{0}$ , and that $\beta_{0}$ marks the smallest $\beta$ where a deterministic initial condition can first match the performance of the annealed random condition.

The mixing time estimates in Theorem 1 (as well as those in Theorem 2) imply, in particular, that Glauber dynamics for the Ising model on the cycle exhibits the cutoff phenomenon from each of the respective starting configurations: a sharp transition in its distance from stationarity, which drops from near its maximum to near 0 over a negligible time period known as the cutoff window (here $O(\log\log n)$, vs. $t_{\textsc{mix}}$, which is of order $\log n$). Until recently, only relatively few occurrences of this phenomenon, which was discovered by Aldous and Diaconis in the early 1980’s (see [Aldous, AD, DiSh, Diaconis]), had been rigorously verified, even though it is believed to be widespread (e.g., Peres conjectured cutoff for the Ising model on any sequence of transitive graphs when the mixing time is of order $\log n$; see [LLP, Conjecture 1] and [LPW, §23.2]); see [LPW, §18].

For the Ising model on the cycle, the longstanding lower and upper bounds on $t_{\textsc{mix}}$ from a worst-case initial condition differed by a factor of 2 (in our notation, $\frac{1-o(1)}{2\theta}\log n$ and $\frac{1+o(1)}{\theta}\log n$), while cutoff was conjectured to occur (see, e.g., [LPW, Theorem 15.4], as well as [LPW, pp. 214, 248] and [LPW, p. 300, Question 8]). This was confirmed in [LS1], where the above lower bound was shown to be tight, via a proof that relied on log-Sobolev inequalities and applied to $\mathbb{Z}^{d}$, for any dimension $d\geq 1$, so long as the system features a certain decay-of-correlation property known as strong spatial mixing. This result was reproduced in [LS4] (with a finer estimate for the cutoff window) via the new information percolation method. Soon after, a remarkably short proof of cutoff for the cycle, crucially hinging on the correspondence between the one-dimensional Ising model and the “noisy voter” model, was obtained by Cox, Peres and Steif [CPS]. It is worth noting that the arguments both in [CPS] and in [LS1] are tailored to worst-case analysis, and do not seem to be able to treat specific initial conditions as examined here. In contrast, the information percolation approach does allow one to control the subtle effect of various initial conditions on mixing.

To conclude this section, we conjecture that Proposition 3 also holds for $t_{\star}=\max\{\frac{1-o(1)}{4-2\theta},\frac{1-o(1)}{4\theta}\}\log n$ , i.e., that $x^{\mathrm{alt}}$ is asymptotically fastest among all the deterministic initial conditions at all $\beta>0$ . We further conjecture that the obvious generalization of $x^{\mathrm{alt}}$ to $(\mathbb{Z}/n\mathbb{Z})^{d}$ for $d\geq 2$ (a checkerboard for $d=2$ ) is the analogous fastest deterministic initial condition throughout the high-temperature regime.

2. Update support and information percolation

In this section we define the update support and use the framework of information percolation (see the papers [LS3, LS5] as well as the survey paper [LS6] for an exposition of this method) to upper bound the total variation distance with alternating and bi-alternating initial conditions.

2.1. Basic Notation

The Ising model on a finite graph $G$ with vertex-set $V$ and edge-set $E$ is a distribution over the set of configurations $\Omega=\{\pm 1\}^{V}$ ; each $\sigma\in\Omega$ is an assignment of plus/minus spins to the sites in $V$ , and the probability of $\sigma\in\Omega$ is given by the Gibbs distribution

$$\pi(\sigma)=\mathcal{Z}^{-1}e^{\beta\sum_{uv\in E}\sigma(u)\sigma(v)}\,,$$

where $\mathcal{Z}$ is a normalizer (the partition-function) and $\beta$ is the inverse-temperature, here taken to be non-negative (ferromagnetic). The (continuous-time) heat-bath Glauber dynamics for the Ising model is the Markov chain—reversible w.r.t. the Ising measure $\pi$ —where each site is associated with a rate-1 Poisson clock, and as the clock at some site $u$ rings, the spin of $u$ is replaced by a sample from the marginal of $\pi$ given all other spins. See [Martinelli97] for an extensive account of this dynamics. In this paper we focus on the graph $G=\mathbb{Z}/n\mathbb{Z}$ and will let $X_{t}$ denote the Glauber dynamics Markov chain on $G$ .

An important way of measuring the convergence of a Markov chain $(X_{t})$ to its stationary measure $\pi$ is its total-variation mixing time, denoted $t_{\textsc{mix}}(\varepsilon)$ for a precision parameter $0<\varepsilon<1$. From initial condition $x_{0}$ we denote

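$$t_{\textsc{mix}}^{x_{0}}(\varepsilon)=\inf\big{\{}t\;:\;\|\mathbb{P}_{x_{0}}(X_{t}\in\cdot)-\pi\|_{\textsc{tv}}\leq\varepsilon\big{\}}\,,$$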

and the overall mixing time as measured from a worst-case initial condition is

$$t_{\textsc{mix}}(\varepsilon)=\max_{x_{0}\in\Omega}t_{\textsc{mix}}^{x_{0}}(\varepsilon)\,,$$

where here and in what follows $\mathbb{P}_{x_{0}}$ denotes the probability given $X_{0}=x_{0}$ , and the total-variation distance $\|\mu_{1}-\mu_{2}\|_{\textsc{tv}}$ is defined as $\max_{A\subset\Omega}|\mu_{1}(A)-\mu_{2}(A)|=\tfrac{1}{2}\sum_{\sigma\in\Omega}|\mu_{1}(\sigma)-\mu_{2}(\sigma)|$ .
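The two expressions for the total-variation distance can be checked against each other on small examples. The Python sketch below (helper names are ours) computes half the $L^{1}$ distance and, by brute force over all events $A$, the maximum of $|\mu_{1}(A)-\mu_{2}(A)|$; the brute-force version is exponential in the support size and only meant for tiny state spaces.

```python
from itertools import combinations


def tv_half_l1(mu: dict, nu: dict) -> float:
    """Total variation as half the L1 distance between probability vectors."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)


def tv_max_over_events(mu: dict, nu: dict) -> float:
    """Total variation as max_A |mu(A) - nu(A)|, by brute force over all events A."""
    support = list(set(mu) | set(nu))
    best = 0.0
    for r in range(len(support) + 1):
        for event in combinations(support, r):
            diff = sum(mu.get(x, 0.0) - nu.get(x, 0.0) for x in event)
            best = max(best, abs(diff))
    return best


if __name__ == "__main__":
    # A biased single-site law versus the uniform measure on {+1, -1}.
    mu = {1: 0.8, -1: 0.2}
    nu = {1: 0.5, -1: 0.5}
    print(tv_half_l1(mu, nu), tv_max_over_events(mu, nu))
```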

2.2. Information percolation clusters

The dynamics can be viewed as a deterministic function of $X_{0}$ and a random “update sequence” of the form $(J_{1},U_{1},t_{1}),(J_{2},U_{2},t_{2}),\ldots$, where $0<t_{1}<t_{2}<\ldots$ are the update times (the ringing of the Poisson clocks), the $J_{i}$’s are i.i.d. uniformly chosen sites (which clocks ring), and the $U_{i}$’s are i.i.d. uniform variables on $[0,1]$ (to generate coin tosses). There are various ways to encode such updates, but in the case of the one-dimensional model there is a particularly useful one. We add an extra variable $S_{i}$, which is a uniformly selected neighbor of $J_{i}$. Then, given the sequence of $(J_{i},S_{i},U_{i},t_{i})$, the updates are processed sequentially as follows: set $t_{0}=0$; the configuration $X_{t}$ for all $t\in[t_{i-1},t_{i})$ ($i\geq 1$) is obtained by updating the site $J_{i}$ via the unit variable $U_{i}$: if $U_{i}\leq\theta=1-\tanh(2\beta)$, update the spin at $J_{i}$ to a uniformly random value; otherwise (which happens with probability $1-\theta$), set it to the spin of $S_{i}$.
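As a sanity check of this encoding (our own check, not part of the paper), one can compare the probability that the updated spin is plus under the rule above with the heat-bath probability $e^{\beta s}/(e^{\beta s}+e^{-\beta s})$, where $s$ is the sum of the two neighboring spins. The function names below are hypothetical.

```python
import math
from itertools import product


def heat_bath_plus_prob(beta: float, left: int, right: int) -> float:
    """P(new spin = +1 | neighbours) for heat-bath dynamics on the cycle."""
    s = left + right
    return math.exp(beta * s) / (math.exp(beta * s) + math.exp(-beta * s))


def encoded_plus_prob(beta: float, left: int, right: int) -> float:
    """P(new spin = +1) under the encoding: with probability theta a fair coin,
    otherwise copy the spin of a uniformly chosen neighbour S_i."""
    th = 1.0 - math.tanh(2.0 * beta)
    frac_plus_neighbours = ((left == 1) + (right == 1)) / 2.0
    return th * 0.5 + (1.0 - th) * frac_plus_neighbours


if __name__ == "__main__":
    for beta in (0.0, 0.2, 1.0):
        for left, right in product((-1, 1), repeat=2):
            print(beta, left, right,
                  heat_bath_plus_prob(beta, left, right),
                  encoded_plus_prob(beta, left, right))
```

The two agree because $\theta/2+(1-\theta)=\frac{1}{2}(1+\tanh 2\beta)$ when both neighbors are plus, and both rules give $\frac{1}{2}$ when the neighbors disagree.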

With this description of the dynamics, we can work backwards to describe how the configurations at time $t_{\star}$ (or at any intermediate time) depend on the initial condition. The update support function, denoted $\mathscr{F}_{\textsc{s}}(A,s_{1},s_{2})$, as introduced in [LS1], is the random set whose value is the minimal subset $S\subset\mathbb{Z}/n\mathbb{Z}$ which determines the spins of $A$ given the update sequence along the interval $(s_{1},s_{2}]$.

We now describe the support of a vertex $v\in V$ as it evolves backwards in time from $s_{2}$ to $s_{1}$. Initially, $\mathscr{F}_{\textsc{s}}(v,s_{2},s_{2})=\{v\}$; then, updates in reverse chronological order alter the support: given the next update $(J_{i},S_{i},U_{i},t_{i})$, if $J_{i}=\mathscr{F}_{\textsc{s}}(v,t_{i+1},s_{2})$ and $U_{i}\leq\theta$ then $\mathscr{F}_{\textsc{s}}(v,t_{i},s_{2})$ is set to $\emptyset$, and if $U_{i}>\theta$ then it is set to $S_{i}$ (updates at other sites leave it unchanged). Thus, backwards in time, $\mathscr{F}_{\textsc{s}}(v,t,s_{2})$ performs a continuous-time simple random walk with jump rate $1-\theta$ which is killed at rate $\theta$. We refer to the full trajectory of the update support of a vertex as the history of the vertex. The survival time of the walk is exponential with rate $\theta$, and so for $t_{1}\leq t_{2}$,

$$\mathbb{P}\big(\mathscr{F}_{\textsc{s}}(v,t_{1},t_{2})\neq\emptyset\big)=e^{-(t_{2}-t_{1})\theta}\,.$$
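The backward history of a single vertex can be simulated directly, which gives a quick Monte Carlo check of the survival probability $e^{-(t_{2}-t_{1})\theta}$. This Python sketch is ours: it assumes that, run backwards in time, the clock rings at the current site still form a rate-1 Poisson process, and the helper names (`backward_history`, `survival_fraction`) are hypothetical.

```python
import math
import random


def backward_history(v: int, duration: float, theta: float, n: int,
                     rng: random.Random):
    """Follow the update support of site v backwards in time for `duration`.

    Clock rings at the current site arrive at rate 1; each ring kills the
    history with probability theta (fresh coin) and otherwise moves it to a
    uniformly chosen neighbour (copy of S_i). Returns the final site, or
    None if the history was killed."""
    pos, elapsed = v, 0.0
    while True:
        elapsed += rng.expovariate(1.0)  # time back to the previous ring at pos
        if elapsed > duration:
            return pos
        if rng.random() < theta:
            return None  # killed: the spin came from a fresh coin toss
        pos = (pos + rng.choice((-1, 1))) % n  # copied a neighbour: walk moves


def survival_fraction(duration: float, theta: float, n: int = 100,
                      trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(F_s(v, t - duration, t) is nonempty)."""
    rng = random.Random(seed)
    alive = sum(backward_history(0, duration, theta, n, rng) is not None
                for _ in range(trials))
    return alive / trials


if __name__ == "__main__":
    for th in (0.2, 0.5, 0.9):
        print(th, survival_fraction(1.5, th), math.exp(-1.5 * th))
```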

For general sets $A$ we have that $\mathscr{F}_{\textsc{s}}(A,s_{1},s_{2})=\bigcup_{v\in A}\mathscr{F}_{\textsc{s}}(v,s_{1},s_{2})$ and taken together the collection of the update supports of the vertices are a set of coalescing killed continuous-time random walks.

A key use of these histories is to effectively bound the spread of information, as achieved by the following lemma.

Lemma 2.1.

For any $t$ we have that

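$$\mathbb{P}\bigg{(}\max_{v\in\mathbb{Z}/n\mathbb{Z}}\max_{\begin{subarray}{c}0\leq s\leq t\\ \mathscr{F}_{\textsc{s}}(v,s,t)\neq\emptyset\end{subarray}}|v-\mathscr{F}_{\textsc{s}}(v,s,t)|\geq\tfrac{1}{10}\log^{2}n\bigg{)}\leq O(n^{-10})\,.$$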

Proof.

By equation (2.2) we have that $\mathbb{P}[\mathscr{F}_{\textsc{s}}(\mathbb{Z}/n\mathbb{Z},t-\log^{3/2}n,t)\neq\emptyset]=O(n^{-10})$ so it is sufficient to show that

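$$\mathbb{P}\bigg{(}\max_{\begin{subarray}{c}t-\log^{3/2}n\leq s\leq t\\ \mathscr{F}_{\textsc{s}}(v,s,t)\neq\emptyset\end{subarray}}|v-\mathscr{F}_{\textsc{s}}(v,s,t)|\geq\tfrac{1}{10}\log^{2}n\bigg{)}\leq O(n^{-11})\,.$$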

This probability is bounded above by the probability that a rate-$(1-\theta)$ continuous-time random walk makes at least $\frac{1}{10}\log^{2}n$ jumps by time $\log^{3/2}n$. This is exactly the probability that a Poisson random variable with mean $(1-\theta)\log^{3/2}n$ is at least $\frac{1}{10}\log^{2}n$, which satisfies the required bound by standard tail bounds. ∎
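One way to make the “standard tail bounds” explicit (our choice of bound; the proof does not specify one) is the Chernoff bound for a Poisson variable, applied with $\mu=(1-\theta)\log^{3/2}n$ and $k=\frac{1}{10}\log^{2}n$, so that $\mu/k=10(1-\theta)\log^{-1/2}n$:

```latex
\mathbb{P}\big(\mathrm{Poi}(\mu)\geq k\big)\leq e^{-\mu}\Big(\frac{e\mu}{k}\Big)^{k}\qquad(k>\mu),
\\[4pt]
\log\mathbb{P}\big(\mathrm{Poi}(\mu)\geq k\big)
\leq k\Big(1+\log\frac{\mu}{k}\Big)
=\tfrac{1}{10}\log^{2}n\,\Big(1+\log\big(10(1-\theta)\big)-\tfrac{1}{2}\log\log n\Big)
\leq-\tfrac{1}{40}\log^{2}n\,\log\log n .
```

The last inequality holds as soon as $\frac{1}{4}\log\log n\geq 1+\log(10(1-\theta))$, and the right-hand side is below $-11\log n$ for all large $n$, which yields the $O(n^{-11})$ bound for a single vertex.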

3. Upper bounds

We will consider the dynamics run up to time $t_{\star}$ and derive an upper bound on its mixing time. We will first estimate the total variation distance not of the full dynamics but simply at a single vertex from initial conditions $x^{\mathrm{alt}}$ and $x^{\mathrm{blt}}$ .

Lemma 3.1.

For $v\in\mathbb{Z}/n\mathbb{Z}$ we have that,

[TABLE]

Proof.

We will begin with the case of initial condition $x^{\mathrm{alt}}$. Of course $\pi|_{v}$ is the uniform measure on $\{\pm 1\}$. The history $\mathscr{F}_{\textsc{s}}(v,t,t_{\star})$ is killed before time $0$ with probability $1-e^{-\theta t_{\star}}$, and on this event $X_{t_{\star}}(v)$ is uniform on $\{\pm 1\}$. Condition that it survives to time $0$ and let $Y(s)=x^{\mathrm{alt}}(\mathscr{F}_{\textsc{s}}(v,t_{\star}-s,t_{\star}))$. Since every jump of the history flips the sign of $x^{\mathrm{alt}}$, this is simply a continuous-time random walk on $\{\pm 1\}$ which switches state at rate $1-\theta$. Thus,

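$$\mathbb{P}(Y(s)=a)=\begin{cases}\frac{1}{2}+\frac{1}{2}e^{-2(1-\theta)s} & \mbox{if }a=x^{\mathrm{alt}}(v)\,,\\ \frac{1}{2}-\frac{1}{2}e^{-2(1-\theta)s} & \mbox{otherwise}\,.\end{cases}$$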

It therefore follows that $\left\|\mathbb{P}\left(Y(t_{\star})\in\cdot\right)-\pi|_{v}\right\|_{\textsc{tv}}=\frac{1}{2}e^{-2(1-\theta)t_{\star}}$ , and altogether,

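$$\big\|\mathbb{P}_{x^{\mathrm{alt}}}(X_{t_{\star}}(v)\in\cdot)-\pi|_{v}\big\|_{\textsc{tv}}=\tfrac{1}{2}e^{-2(1-\theta)t_{\star}}e^{-\theta t_{\star}}=\tfrac{1}{2}e^{-(2-\theta)t_{\star}}\,.$$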

The case of $x^{\mathrm{blt}}$ follows similarly, with the exception that $Y(s)$ has jump rate $\frac{1}{2}(1-\theta)$ since it only switches sign with probability $\frac{1}{2}$ each step. ∎
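For completeness, here is the computation that “follows similarly” for $x^{\mathrm{blt}}$. This is our derivation from the stated jump rate $\frac{1}{2}(1-\theta)$; it uses that under $x^{\mathrm{blt}}$ every site has exactly one neighbor of each sign, so each jump of the history flips the sign with probability $\frac{1}{2}$, independently of the past (the $+$ sign below is for $a=x^{\mathrm{blt}}(v)$):

```latex
\mathbb{P}(Y(s)=a)=\tfrac{1}{2}\pm\tfrac{1}{2}e^{-2\cdot\frac{1}{2}(1-\theta)s}
=\tfrac{1}{2}\pm\tfrac{1}{2}e^{-(1-\theta)s},
\qquad
\big\|\mathbb{P}_{x^{\mathrm{blt}}}(X_{t_{\star}}(v)\in\cdot)-\pi|_{v}\big\|_{\textsc{tv}}
=\tfrac{1}{2}e^{-(1-\theta)t_{\star}}\,e^{-\theta t_{\star}}
=\tfrac{1}{2}e^{-t_{\star}} .
```

Compared with $\frac{1}{2}e^{-(2-\theta)t_{\star}}$ for $x^{\mathrm{alt}}$, the single-site discrepancy decays more slowly, consistent with the constant $\frac{1}{2}\geq\frac{1}{4-2\theta}$ in Theorem 1.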

3.1. Update Support

In this subsection we analyse the geometry of the update support, similarly to [LS1], in order to approximate the law of the Markov chain by a product measure. Let $\kappa=\frac{4}{1-\theta}$ and define the support time as $t_{-}=t_{\star}-\kappa\log\log n$. By Lemma 2.1 we expect the histories not to travel “too far” along the time interval from $t_{\star}$ back to $t_{-}$; precisely, if we define $\mathcal{B}$ as the event

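$$\mathcal{B}=\bigg{\{}\max_{v\in\mathbb{Z}/n\mathbb{Z}}\max_{\begin{subarray}{c}t_{-}\leq s\leq t_{\star}\\ \mathscr{F}_{\textsc{s}}(v,s,t_{\star})\neq\emptyset\end{subarray}}|v-\mathscr{F}_{\textsc{s}}(v,s,t_{\star})|\leq\tfrac{1}{10}\log^{2}n\bigg{\}}\,,$$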

then by Lemma 2.1,

$$\mathbb{P}(\mathcal{B})\geq 1-n^{-10}\,.$$

The following event says that the support at time $t_{-}$ clusters into small well separated components. Let $\mathcal{A}$ be the event that there exists a set of intervals $W_{1},\ldots,W_{m}\subset\mathbb{Z}/n\mathbb{Z}$ that (i) cover the support:

$$\{x:\mathscr{F}_{\textsc{s}}(x,t_{-},t_{\star})\neq\emptyset\}\subset\bigcup_{i}W_{i}\,,$$

(ii) have polylogarithmic size:

$$\max_{i}|W_{i}|\leq\log^{3}n\,,$$

and (iii) are well-separated:

$$\min_{i\neq i'}d(W_{i},W_{i'})\geq\log^{2}n\,.$$
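The three conditions defining $\mathcal{A}$ can be illustrated with a small Python sketch (ours, on a line segment with wrap-around ignored; in the paper the separation is $\log^{2}n$ and the size bound $\log^{3}n$). The hypothetical helper `cluster_intervals` greedily merges occupied sites closer than the separation; any decomposition satisfying (i) and (iii) must keep such sites in a common interval, so the greedy intervals are the finest valid ones and $\mathcal{A}$ holds exactly when they are small enough.

```python
def cluster_intervals(sites, sep):
    """Group occupied sites on a line segment (wrap-around ignored) into
    intervals (lo, hi) so that consecutive intervals are >= sep apart."""
    pts = sorted(set(sites))
    if not pts:
        return []
    intervals = []
    lo = hi = pts[0]
    for p in pts[1:]:
        if p - hi < sep:  # closer than sep: must share an interval
            hi = p
        else:  # gap of at least sep: close the interval, start a new one
            intervals.append((lo, hi))
            lo = hi = p
    intervals.append((lo, hi))
    return intervals


def satisfies_event_A(sites, intervals, max_size, sep):
    """Check (i) cover, (ii) size at most max_size, (iii) separation >= sep."""
    covered = all(any(lo <= x <= hi for lo, hi in intervals) for x in sites)
    small = all(hi - lo + 1 <= max_size for lo, hi in intervals)
    separated = all(b[0] - a[1] >= sep for a, b in zip(intervals, intervals[1:]))
    return covered and small and separated


if __name__ == "__main__":
    support = [1, 2, 10, 30, 31, 33]
    iv = cluster_intervals(support, sep=5)
    print(iv, satisfies_event_A(support, iv, max_size=4, sep=5))
```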

Lemma 3.2.

We have that $\mathbb{P}\left(\mathcal{A}\right)\geq 1-O(n^{-9})$ .

Proof.

Define the following intervals on $\mathbb{Z}/n\mathbb{Z}$ :

$$M_{i}=\{2i\log^{2}n,\ldots,(2i+1)\log^{2}n\}\qquad\big(1\leq i\leq\tfrac{n}{2\log^{2}n}\big)\,.$$

Restricting $\mathcal{B}$ to $\bigcup M_{i}$ , we let

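$$\mathcal{B}^{\prime}=\biggl{\{}\max_{v\in\cup_{i}M_{i}}\max_{\begin{subarray}{c}t_{-}\leq s\leq t_{\star}\\ \mathscr{F}_{\textsc{s}}(v,s,t_{\star})\neq\emptyset\end{subarray}}|v-\mathscr{F}_{\textsc{s}}(v,s,t_{\star})|\leq\tfrac{1}{10}\log^{2}n\biggr{\}}\,.$$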

Since $\mathcal{B}^{\prime}\supset\mathcal{B}$, we have $\mathbb{P}\left(\mathcal{B}^{\prime}\right)\geq\mathbb{P}(\mathcal{B})\geq 1-n^{-10}$ by Lemma 2.1. Next, let $\mathcal{D}_{i}$ be the event

$$\mathcal{D}_{i}=\{\mathscr{F}_{\textsc{s}}(M_{i},t_{-},t_{\star})=\emptyset\}\,.$$

By a union bound and equation (2.2), we have that

$$\mathbb{P}(\mathcal{D}_{i})\geq 1-|M_{i}|\,e^{-(1-\theta)\kappa\log\log n}\geq 1-\frac{1}{\log n}\,,$$

and so

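$$\mathbb{P}(\mathcal{D}_{i}^{c}\mid\mathcal{B}')\leq\frac{\mathbb{P}(\mathcal{D}_{i}^{c})}{\mathbb{P}(\mathcal{B}')}\leq\frac{2}{\log n}\,.$$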

Moreover, conditional on $\mathcal{B}^{\prime}$ the events $\mathcal{D}_{i}$ are conditionally independent since the history of $M_{i}$ is determined by the updates within the set $\{v:d(v,M_{i})\leq\frac{1}{10}\log^{2}n\}$ which are disjoint. Hence, for all $i$ ,

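$$\mathbb{P}\Big(\mathcal{D}_{i}^{c},\mathcal{D}_{i+1}^{c},\ldots,\mathcal{D}_{i+\frac{1}{10}\log n}^{c}\;\Big|\;\mathcal{B}'\Big)\leq\Big(\frac{2}{\log n}\Big)^{\frac{1}{10}\log n}\leq n^{-10}\,;$$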

hence,

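$$\mathbb{P}\left(\mathcal{D}_{i}^{c},\mathcal{D}_{i+1}^{c},\ldots,\mathcal{D}_{i+\frac{1}{10}\log n}^{c}\right)\leq\mathbb{P}\left(\mathcal{D}_{i}^{c},\mathcal{D}_{i+1}^{c},\ldots,\mathcal{D}_{i+\frac{1}{10}\log n}^{c}\;\Big{|}\;\mathcal{B}^{\prime}\right)+\mathbb{P}\left({\mathcal{B}^{\prime}}^{c}\right)\leq 2n^{-10}.$$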

Taking a union bound over all $i$ we have that

[TABLE]

We have thus arrived at the following: with probability at least $1-n^{-9}$, for every $v\in\mathbb{Z}/n\mathbb{Z}$ there exists, within distance $\frac{1}{5}\log^{3}n$ of $v$ on both the right and the left, a block of $\log^{2}n$ consecutive vertices whose histories are killed before $t_{-}$. This implies the existence of the decomposition and completes the proof of the lemma. ∎
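The two numerical steps at the end of this proof can be spelled out as follows; both hold for all sufficiently large $n$, in line with the requirement $n>N$ in Theorem 1. The second line is the union bound over the at most $\frac{n}{2\log^{2}n}$ indices $i$:

```latex
\log\Big(\frac{2}{\log n}\Big)^{\frac{1}{10}\log n}
=-\tfrac{1}{10}\log n\,\big(\log\log n-\log 2\big)
\leq-10\log n
\quad\Longleftrightarrow\quad
\log\log n\geq 100+\log 2 ,
\\[4pt]
\frac{n}{2\log^{2}n}\cdot 2n^{-10}=\frac{n^{-9}}{\log^{2}n}\leq n^{-9}.
```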

When the event $\mathcal{A}$ holds we will assume that there is some canonical choice of the $W_{i}$ ’s. We set

[TABLE]

On the event that both $\mathcal{A}$ and $\mathcal{B}$ hold, the sets $V_{i}$ are disjoint, and satisfy

[TABLE]

We will make use of Lemma 3.3 from [LS3], a special case of which is the following.

Lemma 3.3 ([LS3]).

For any $0\leq s\leq t$ and any set of vertices $W$ we have that

[TABLE]

Using this result, we have that

[TABLE]

3.2. Coupling with product measures

On the event $\mathcal{A}\cap\mathcal{B}$ we couple $X_{t_{-}}(\bigcup_{i}V_{i})$ and $\pi|_{\bigcup_{i}V_{i}}$ with product measures. Since the $V_{i}$ ’s depend only on the updates along the interval $[t_{-},t_{\star}]$ and are independent of the dynamics up to time $t_{-}$ we will treat the $V_{i}$ as fixed deterministic sets satisfying (3.6). Let $(\pi^{(1)},\ldots,\pi^{(m)})$ be a product measure of $m$ copies of $\pi$ . Then, by the exponential decay of correlation of the one-dimensional Ising model,

[TABLE]

Next, let $X^{(1)}_{t},\ldots,X^{(m)}_{t}$ be $m$ independent copies of the dynamics up to time $t_{-}$ . Define the event

[TABLE]

and for each $1\leq j\leq m$ define the analogous event

[TABLE]

where $\mathscr{F}_{\textsc{s}}^{(j)}$ is the support function for the dynamics $X^{(j)}_{t}$ . From Lemma 2.1, together with a union bound, we infer that

[TABLE]

Let $\tilde{X}_{t}$ denote $X_{t}$ conditioned on $\mathcal{E}$ and, similarly, let $\tilde{X}^{(j)}_{t}$ denote $X^{(j)}_{t}$ conditioned on $\mathcal{E}^{(j)}$ . Then

[TABLE]

and so

[TABLE]

Now, since the laws of the $\tilde{X}_{t_{-}}(V_{i})$ for distinct $i$ depend on disjoint sets of updates, they are independent and equal in distribution to $\tilde{X}^{(i)}_{t_{-}}(V_{i})$ , hence

[TABLE]

Since $\tilde{X}$ is $X$ conditioned on $\mathcal{E}$ ,

[TABLE]

Combining the previous three equations we find that

[TABLE]

Thus, to show that $\big{\|}\mathbb{P}_{x_{0}}\left(X_{t_{\star}}\in\cdot\right)-\pi\big{\|}_{\textsc{tv}}\to 0$ it is sufficient to prove that

[TABLE]

3.3. Local $L^{2}$ distance

Let $L=10$ , and for each $i$ set

[TABLE]

with $S_{i}=0$ if $|V_{i}|\leq L$ .

First we bound the right tail of the distribution of $S_{i}$ . If $|\mathscr{F}_{\textsc{s}}(V_{i},t_{-}-s,t_{-})|>L$ then at least $L+1$ histories from $V_{i}$ have survived to time $t_{-}-s$ and not intersected. Hence, by equation (2.2),

[TABLE]

Therefore, for $0<s<t_{-}$ we see that

[TABLE]

Let $\mathcal{I}$ denote the event that for all $i$ we have that $S_{i}<t_{-}$ . By (3.11),

[TABLE]

and so $t_{\star}\geq\frac{2}{L\theta}\log n$ implies that $\mathbb{P}\left(\mathcal{I}\right)\to 1$ . On the event $\mathcal{I}$ , we define

[TABLE]

Applying Lemma 3.3 we have that

[TABLE]

Lemma 3.4.

There exists $C=C(\beta)>0$ such that, for every $|U_{i}|\leq L$ and $0\leq S_{i}<t_{-}$ ,

[TABLE]

Proof.

We will consider the case of $x^{\mathrm{alt}}$; the proof for $x^{\mathrm{blt}}$ follows similarly. Let $R_{i}$ denote the first time the history coalesces to a single point:

[TABLE]

with the convention $R_{i}=t_{-}-S_{i}$ if $|\mathscr{F}_{\textsc{s}}(U_{i},0,t_{-}-S_{i})|\geq 2$ . By equation (2.2),

[TABLE]

Denote the vertex $a_{i}=\mathscr{F}_{\textsc{s}}(U_{i},t_{-}-S_{i}-R_{i},t_{-}-S_{i})$ . By Lemmas 3.1 and 3.3 we have that

[TABLE]

We estimate the right hand side as follows:

[TABLE]

where the final inequality follows by taking the maximal term in the sum. This, together with (3.14), completes the proof of the lemma. ∎

We now appeal to the $L^{1}$ -to- $L^{2}$ reduction developed in [LS1, LS3]. Recall that the $L^{2}$ -distance on measures is defined as

[TABLE]

and set

[TABLE]

By [LS3, Proposition 7],

[TABLE]

We are now ready to prove the upper bound for the main theorem.

Proof of Theorem 1, Upper bound.

Again we focus on the case of $x^{\mathrm{alt}}$ . Set

[TABLE]

With this choice of $t_{\star}$ we have that $\mathbb{P}(\mathcal{I}^{c})\to 0$ and so, by equations (3.10), (3.3) and (3.17), it is sufficient to show that

[TABLE]

Since, given any conditioning on the other vertices, each vertex is plus (respectively, minus) with probability at least $\frac{e^{-2\beta}}{e^{-2\beta}+e^{2\beta}}$, we have that

[TABLE]
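On the cycle, the uniform lower bound used here can be verified exhaustively, since the conditional law of a spin depends only on its two neighbors; the minimum is attained when both neighbors disagree with the spin in question. The Python helpers below are our own illustration.

```python
import math
from itertools import product


def plus_prob(beta: float, left: int, right: int) -> float:
    """Heat-bath probability that a site becomes +1 given its two neighbours."""
    s = left + right
    return math.exp(beta * s) / (math.exp(beta * s) + math.exp(-beta * s))


def min_conditional_prob(beta: float) -> float:
    """Smallest probability of either spin value over all neighbour configurations."""
    probs = []
    for left, right in product((-1, 1), repeat=2):
        p = plus_prob(beta, left, right)
        probs.extend((p, 1.0 - p))
    return min(probs)


def claimed_lower_bound(beta: float) -> float:
    """The bound e^{-2 beta} / (e^{-2 beta} + e^{2 beta}) from the text."""
    return math.exp(-2 * beta) / (math.exp(-2 * beta) + math.exp(2 * beta))


if __name__ == "__main__":
    for beta in (0.0, 0.3, 1.0):
        print(beta, min_conditional_prob(beta), claimed_lower_bound(beta))
```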

Comparing the $L^{1}$ and $L^{2}$ bounds, we have that for any measure $\mu$ and set $U_{i}$,

[TABLE]

Thus, by Lemma 3.4,

[TABLE]

for some $C^{\prime}(\beta)$ . Finally, by equation (3.11)

[TABLE]

Combining the previous two inequalities implies that $\mathbb{E}{\mathfrak{M}}_{t}\mathbbm{1}_{\mathcal{I}}\to 0$ and hence we have that

[TABLE]

as required. The proof for $x^{\mathrm{blt}}$ follows similarly for the choice of

[TABLE]

4. Lower bounds

In order to establish the lower bounds we will analyze two separate test functions. As a first step toward analyzing them, we establish the following decay-of-correlation bound.

Lemma 4.1.

Let $V_{1},V_{2}\subset\mathbb{Z}/n\mathbb{Z}$ such that $d(V_{1},V_{2})\geq\log^{2}n$ and let $f_{i}:\{\pm 1\}^{V_{i}}\to\mathbb{R}$ be functions with $\|f_{i}\|_{\infty}\leq 1$ . Then for any initial condition $x_{0}$ and time $t$ we have that

[TABLE]

Proof.

We will prove the result by showing that $Y_{i}=f_{i}(X_{t}(V_{i}))$ can be approximated locally. Let $V_{i}^{+}=\{v:d(v,V_{i})\leq\frac{1}{10}\log^{2}n\}$; since $d(V_{1},V_{2})\geq\log^{2}n$, the sets $V^{+}_{i}$ are disjoint. Let $\mathcal{J}_{i}$ denote the sigma-algebra generated by the updates in $V_{i}^{+}$ and set $\widehat{Y}_{i}=\mathbb{E}_{x_{0}}[Y_{i}\mid\mathcal{J}_{i}]$. Since the $V^{+}_{i}$ are disjoint, the $\widehat{Y}_{i}$ depend on independent updates and so are independent. Let

[TABLE]

be the event in Lemma 2.1. On the event $\mathcal{G}$ , the random variables $Y_{i}$ are completely determined by the initial condition and the updates in $V_{i}^{+}$ and so $Y_{i}I(\mathcal{G})=\widehat{Y}_{i}I(\mathcal{G})$ . Thus,

[TABLE]

and hence

[TABLE]

which completes the proof. ∎

Since the above bound is uniform in $t$, letting $t\to\infty$ shows that the same bound holds when $X$ is distributed according to the stationary measure.

4.1. Autocorrelation test functions

The magnetization test function attains, at least up to an additive constant, the mixing time from the all-plus initial condition, which is asymptotically the worst case (see [LS4]). In this light, it is natural to consider test functions for $x^{\mathrm{alt}}$ and $x^{\mathrm{blt}}$ based on the autocorrelation $\sum_{i=1}^{n}X_{t}(i)x_{0}(i)$. This can be seen as a special case of a test function based on conditional expectations,

[TABLE]

Because the histories evolve as killed random walks, this conditional expectation has the following useful representation. Let $P_{t}$ be the semigroup of a continuous-time rate-1 simple random walk on $\mathbb{Z}/n\mathbb{Z}$. Then, by the killed random walk representation, we have that

[TABLE]
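The representation can be illustrated numerically. The sketch below assumes it takes the form $\mathbb{E}_{x_0}[X_t(i)]=e^{-\theta t}(P_{(1-\theta)t}x_0)(i)$, that is, the history of $i$ jumps at rate $1-\theta$ to a uniform neighbour and is killed at rate $\theta$, after which the spin is a fair coin (as in the proof of Claim 4.3 below). It evaluates $P_{(1-\theta)t}$ by uniformization, summing Poisson-weighted powers of the one-step kernel; all helper names are hypothetical.

```python
import math


def make_alt(n):
    return [1 if i % 2 == 0 else -1 for i in range(n)]


def make_blt(n):
    return [1 if i % 4 in (0, 3) else -1 for i in range(n)]


def srw_step(f):
    """One step of the discrete-time simple random walk kernel K on Z/nZ:
    (K f)(x) = (f(x - 1) + f(x + 1)) / 2."""
    n = len(f)
    return [(f[(x - 1) % n] + f[(x + 1) % n]) / 2 for x in range(n)]


def killed_walk_mean(x0, t, theta, terms=200):
    """E_{x0}[X_t(i)] for every i, assuming the history of i jumps at rate
    (1 - theta) and is killed at rate theta (a killed spin has mean 0).
    Uniformization: P_{(1-theta)t} x0 = sum_m Poisson((1-theta)t)(m) K^m x0."""
    lam = (1 - theta) * t
    g = [float(v) for v in x0]      # K^m x0, starting at m = 0
    weight = math.exp(-lam)         # Poisson pmf at m = 0
    total = [0.0] * len(x0)
    for m in range(terms):
        total = [acc + weight * gv for acc, gv in zip(total, g)]
        g = srw_step(g)
        weight *= lam / (m + 1)
    survival = math.exp(-theta * t)  # probability the history is not killed
    return [survival * v for v in total]


n, t, theta = 12, 0.7, 0.4
print(killed_walk_mean(make_alt(n), t, theta)[:2])  # compare with e^{-(2-theta)t} * (1, -1)
print(killed_walk_mean(make_blt(n), t, theta)[:4])  # compare with e^{-t} * (1, -1, -1, 1)
```

For $x^{\mathrm{alt}}$ the sum collapses to $e^{-(2-\theta)t}x^{\mathrm{alt}}$, and for $x^{\mathrm{blt}}$, where a single step of the kernel already gives $0$, to $e^{-t}x^{\mathrm{blt}}$. Uniformization is used only because it needs no linear algebra library.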

The eigenvectors of $P_{t}$ are $e^{2\pi ikx/n}$ for $k\in\{0,\ldots,n-1\}$, with $P_{t}e^{2\pi ikx/n}=e^{-(1-\cos(2\pi k/n))t}e^{2\pi ikx/n}$; we refer to $1-\cos(2\pi k/n)$ as the corresponding eigenvalue. Since the simple random walk is reversible with uniform stationary distribution, we can write an orthonormal basis of real eigenvectors $\eta_{k}$ with eigenvalues $\lambda_{k}$. Note that both $x^{\mathrm{alt}}$ and $x^{\mathrm{blt}}$ are eigenvectors of $P_{t}$, with eigenvalues $2$ and $1$ respectively, and in fact $2$ is the largest eigenvalue. We first give a condition ensuring that the chain is not yet mixed when started from $x_{0}$.
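The spectral facts in this paragraph can be checked on a small cycle. The sketch below assumes the rate-1 walk jumps to a uniform neighbour at total rate 1, so that $P_t=e^{tL}$ with $(-Lf)(x)=f(x)-\frac{1}{2}(f(x-1)+f(x+1))$. It verifies $-L\cos(2\pi kx/n)=(1-\cos(2\pi k/n))\cos(2\pi kx/n)$ and that $x^{\mathrm{alt}}$ and $x^{\mathrm{blt}}$ have eigenvalues 2 and 1. Here $n=12$ is an arbitrary multiple of 4 and the helper names are illustrative.

```python
import math


def neg_generator(f):
    """(-L f)(x) = f(x) - (f(x - 1) + f(x + 1)) / 2, minus the generator of the
    continuous-time rate-1 simple random walk on Z/nZ (so P_t = e^{tL})."""
    n = len(f)
    return [f[x] - (f[(x - 1) % n] + f[(x + 1) % n]) / 2 for x in range(n)]


def cos_mode(n, k):
    """Real part of the Fourier mode e^{2 pi i k x / n}."""
    return [math.cos(2 * math.pi * k * x / n) for x in range(n)]


def eigenvalue(n, k):
    return 1 - math.cos(2 * math.pi * k / n)


n = 12                                                    # divisible by 4
x_alt = [1 if i % 2 == 0 else -1 for i in range(n)]       # mode k = n/2
x_blt = [1 if i % 4 in (0, 3) else -1 for i in range(n)]  # modes k = n/4, 3n/4

print(neg_generator(x_alt)[:4])                  # equals 2 * x_alt
print(neg_generator(x_blt)[:4])                  # equals 1 * x_blt
print(max(eigenvalue(n, k) for k in range(n)))   # attained at k = n/2
```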

Lemma 4.2.

If for a sequence of initial conditions $x_{0}$ and time points $t$ we have

[TABLE]

then

[TABLE]

Proof.

Let $Y$ be distributed according to the stationary distribution. Then by symmetry,

[TABLE]

while

[TABLE]

To estimate the variance, observe that

[TABLE]

By Lemma 4.1, this is at most

[TABLE]

where the final inequality follows by the rearrangement inequality. Since Lemma 4.1 also applies to the stationary distribution, we further have

[TABLE]

Our test function is the indicator of the set $A=\big{\{}x\in\{\pm 1\}^{\mathbb{Z}/n\mathbb{Z}}:R_{x_{0},t}(x)\geq\frac{1}{2}e^{-2\theta t}\|P_{(1-\theta)t}x_{0}\|_{2}^{2}\big{\}}$. By Chebyshev’s inequality,

[TABLE]

and so by the assumption of the lemma $\mathbb{P}_{x_{0}}\left(X_{t}\in A\right)\to 1$ . Similarly,

[TABLE]

so $\mathbb{P}\left(Y\in A\right)\to 0$, which completes the proof. ∎

We can now establish Proposition 3, giving a lower bound for any deterministic initial condition.

Proof of Proposition 3.

Writing $x_{0}=\sum_{j}b_{j}\eta_{j}$ we have that

[TABLE]

where the inequality follows from the fact that all the eigenvalues are bounded by 2. Thus,

[TABLE]

and so, by Lemma 4.2, we have that $\left\|\mathbb{P}_{x_{0}}\left(X_{t_{\star}}\in\cdot\right)-\pi\right\|_{\textsc{tv}}\to 1$ , as claimed. ∎

This gives the right bound in the case of $x^{\mathrm{alt}}$, since it is an eigenvector with eigenvalue 2. For $x^{\mathrm{blt}}$ we get a stronger lower bound: since it has eigenvalue 1,

[TABLE]

So, taking $t_{\star}=\frac{1}{2}\log n-8\log\log n$ ,

[TABLE]

and hence by Lemma 4.2 we have that

[TABLE]
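The eigenvalue computations behind Proposition 3 and the bound for $x^{\mathrm{blt}}$ can be summarized as follows. This is a worked sketch assuming that $\|\cdot\|_2$ is the norm in which the $\eta_j$ are orthonormal, and that $P_s\eta_j=e^{-\lambda_j s}\eta_j$ with $0\leq\lambda_j\leq 2$:

```latex
\begin{aligned}
\|P_{s}x_{0}\|_{2}^{2} &= \sum_{j} b_{j}^{2}e^{-2\lambda_{j}s} \;\geq\; e^{-4s}\|x_{0}\|_{2}^{2}
  && \text{(since } \lambda_{j}\leq 2\text{)},\\
e^{-2\theta t}\|P_{(1-\theta)t}x^{\mathrm{alt}}\|_{2}^{2} &= e^{-2\theta t}e^{-4(1-\theta)t}\|x^{\mathrm{alt}}\|_{2}^{2}
  = e^{-(4-2\theta)t}\|x^{\mathrm{alt}}\|_{2}^{2},\\
e^{-2\theta t}\|P_{(1-\theta)t}x^{\mathrm{blt}}\|_{2}^{2} &= e^{-2\theta t}e^{-2(1-\theta)t}\|x^{\mathrm{blt}}\|_{2}^{2}
  = e^{-2t}\|x^{\mathrm{blt}}\|_{2}^{2}.
\end{aligned}
```

Thus $x^{\mathrm{alt}}$ attains equality in the first line, while the signal from $x^{\mathrm{blt}}$ decays at rate $2\leq 4-2\theta$, consistent with the later time $t_{\star}=\frac{1}{2}\log n-8\log\log n$ used for it.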

4.2. Hamiltonian test functions

The alternating initial condition $x^{\mathrm{alt}}$ is an extremal configuration for the Hamiltonian (every edge is disagreeing), and measuring the convergence of the Hamiltonian to its stationary value gives another test of convergence. Such test functions were studied in [LS4] to analyze a random annealed initial condition. To treat $x^{\mathrm{alt}}$ and $x^{\mathrm{blt}}$ in a unified manner, consider the function $R:\{\pm 1\}^{\mathbb{Z}/n\mathbb{Z}}\to\mathbb{R}$ given by

[TABLE]

For every $x_{0}$ and $t$ we have that, by Lemma 4.1,

[TABLE]

Letting $t\to\infty$, the same bound holds for $Y$ drawn from the stationary distribution: $\operatorname{Var}(R(Y))=O(n\log^{2}n)$. Let $\mathscr{H}$ denote the set of all histories of the vertices from time $t_{\star}$, and consider $\mathbb{E}_{x_{0}}[X_{t_{\star}}(i)X_{t_{\star}}(i^{\prime})\mid\mathscr{H}]$. If the histories of $i$ and $i^{\prime}$ merge, then $X_{t_{\star}}(i)$ and $X_{t_{\star}}(i^{\prime})$ must take the same value, so $\mathbb{E}_{x_{0}}[X_{t_{\star}}(i)X_{t_{\star}}(i^{\prime})\mid\mathscr{H}]=1$. If the histories do not merge and at least one of them is killed before reaching time 0, then the corresponding spin is equally likely to be $\pm 1$, so $\mathbb{E}_{x_{0}}[X_{t_{\star}}(i)X_{t_{\star}}(i^{\prime})\mid\mathscr{H}]=0$. Thus, the initial condition can only play a role when both histories survive to time 0 without merging, as captured by the event

[TABLE]

Let $Y$ be an independent configuration distributed as $\pi$ and let $\mathbb{E}_{\pi}$ denote the expectation started from the stationary measure. Then

[TABLE]

as the ferromagnetic Ising model is positively correlated. On a graph consisting of two vertices joined by an edge, the correlation between the two spins of the Ising model is $\tanh\beta$. Correlations are monotone in the edges of the graph (by Griffiths’ inequality), so for neighboring vertices in $\mathbb{Z}/n\mathbb{Z}$ we have $\mathbb{E}[Y(i)Y(i+1)]\geq\tanh\beta>0$. It was shown in the proof of Theorem 6.4 of [LS4] that

[TABLE]

and so

[TABLE]

We will compare this bound with the behavior under the initial conditions $x^{\mathrm{alt}}$ and $x^{\mathrm{blt}}$ .
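The stationary correlation facts used here can be checked by brute force on small cycles. The sketch assumes the Ising weight $\exp(\beta\sum_i y_iy_{i+1})$ on $\mathbb{Z}/n\mathbb{Z}$ with $\beta>0$. For the cycle, the high-temperature expansion gives the closed form $\mathbb{E}[Y(i)Y(i+1)]=(\tanh\beta+\tanh^{n-1}\beta)/(1+\tanh^{n}\beta)$, which is at least $\tanh\beta$. Names are illustrative; enumeration is exponential in $n$, so this is only for small $n$.

```python
import itertools
import math


def edge_corr_cycle(n, beta):
    """Exact E[Y(0) Y(1)] for the Ising model on the cycle Z/nZ (n >= 3) with
    weight exp(beta * sum_i y_i y_{i+1}); brute force over all 2^n configurations."""
    num = den = 0.0
    for y in itertools.product((-1, 1), repeat=n):
        w = math.exp(beta * sum(y[i] * y[(i + 1) % n] for i in range(n)))
        num += w * y[0] * y[1]
        den += w
    return num / den


def edge_corr_single(beta):
    """Two vertices joined by a single edge: the correlation is tanh(beta)."""
    pairs = [(a, b) for a in (-1, 1) for b in (-1, 1)]
    num = sum(a * b * math.exp(beta * a * b) for a, b in pairs)
    den = sum(math.exp(beta * a * b) for a, b in pairs)
    return num / den


for n in (3, 6, 10):
    # The cycle correlation should dominate the single-edge value tanh(0.5).
    print(n, edge_corr_cycle(n, 0.5), edge_corr_single(0.5))
```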

Claim 4.3.

For $x_{0}\in\{x^{\mathrm{alt}},x^{\mathrm{blt}}\}$ and $i\equiv 0\pmod{4}$ we have that

[TABLE]

Proof.

We first treat the case of $x^{\mathrm{alt}}$. Let $Z_{1}(t)$ and $Z_{2}(t)$ be independent rate-$(1-\theta)$ continuous-time simple random walks with initial conditions $Z_{1}(0)=i$ and $Z_{2}(0)=i+1$. Let $T$ denote the first time the walks hit each other and let $W(t)=x^{\mathrm{alt}}(Z_{1}(t))x^{\mathrm{alt}}(Z_{2}(t))$. By the killed random walk representation of the histories, we have that

[TABLE]

Note that $W(t)$ is itself a Markov chain with state space $\{\pm 1\}$ and transition rate $2(1-\theta)$, since every jump of either walk changes the sign of $x^{\mathrm{alt}}$ at its position, and so

[TABLE]

Thus, since $W(0)=-1$ by the definition of $x^{\mathrm{alt}}$, and $W(T)=1$ because the two walks occupy the same site at time $T$, applying (4.5) we get

[TABLE]

Hence, $\mathbb{E}_{x^{\mathrm{alt}}}[X_{t_{\star}}(i)X_{t_{\star}}(i+1)\mathbbm{1}_{\mathcal{K}_{i,i+1}}\mid\mathscr{H}]\leq 0$ .

For $x^{\mathrm{blt}}$, the process $x^{\mathrm{blt}}(Z_{1}(t))x^{\mathrm{blt}}(Z_{2}(t))$ is again a Markov chain, but with transition rate $1-\theta$, since from every site exactly one of the two possible jumps changes the sign of $x^{\mathrm{blt}}$. The requirement that $i$ be a multiple of 4 ensures that $x^{\mathrm{blt}}(Z_{1}(0))x^{\mathrm{blt}}(Z_{2}(0))=-1$. The argument is otherwise unchanged. ∎
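The rate computation in this proof reduces to a deterministic fact about the two initial conditions: from every site, both possible jumps change the sign of $x^{\mathrm{alt}}$, while exactly one changes the sign of $x^{\mathrm{blt}}$. The sketch below checks this and records $\mathbb{E}[W(t)]=W(0)e^{-2rt}$ for a $\{\pm1\}$-valued chain flipping at rate $r$. It assumes, as in the proof, that each walk jumps at rate $1-\theta$ to a uniform neighbour; the function names are hypothetical.

```python
import math


def make_alt(n):
    return [1 if i % 2 == 0 else -1 for i in range(n)]


def make_blt(n):
    return [1 if i % 4 in (0, 3) else -1 for i in range(n)]


def sign_changing_jumps(x0):
    """For each site j, how many of the jumps j -> j-1, j -> j+1 change x0's sign."""
    n = len(x0)
    return [sum(x0[(j + d) % n] != x0[j] for d in (-1, 1)) for j in range(n)]


def flip_rate(x0, theta):
    """Rate at which W = x0(Z1) x0(Z2) flips when Z1, Z2 independently jump at
    rate (1 - theta) to a uniform neighbour. W is a Markov chain only if the
    number of sign-changing jumps is the same at every site, which we check."""
    counts = set(sign_changing_jumps(x0))
    if len(counts) != 1:
        raise ValueError("flip probability depends on the position")
    (c,) = counts
    return 2 * (1 - theta) * c / 2  # two walkers, each flips W w.p. c/2 per jump


def mean_w(w0, rate, t):
    """E[W(t)] for a {-1, +1}-valued chain flipping at the given rate:
    d/dt E[W] = -2 * rate * E[W]."""
    return w0 * math.exp(-2 * rate * t)


n, theta = 12, 0.3
print(flip_rate(make_alt(n), theta))  # 2 * (1 - theta)
print(flip_rate(make_blt(n), theta))  # 1 - theta
```

With $W(0)=-1$ and $W(T)=1$, one can split $\mathbb{E}[W(t)]=-e^{-2rt}$ according to whether $T\leq t$; on $\{T\leq t\}$ the conditional mean from time $T$ is `mean_w(1, r, t - T)` $>0$, so $\mathbb{E}[W(t)\mathbbm{1}_{\{T>t\}}]<0$, which mirrors the sign argument of the claim.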

Combining Claim 4.3 with equation (4.2), we obtain that

[TABLE]

and thus

[TABLE]

We are now ready to prove the second lower bound.

Lemma 4.4.

Set

[TABLE]

For $x_{0}\in\{x^{\mathrm{alt}},x^{\mathrm{blt}}\}$ we have

[TABLE]

Proof.

Denote by $A$ the event

[TABLE]

By Chebyshev’s inequality and equations (4.2) and (4.6),

[TABLE]

and similarly

[TABLE]

Hence, $\|\mathbb{P}_{x_{0}}(X_{t_{\star}}\in\cdot)-\pi\|_{\textsc{tv}}\to 1$ , as claimed. ∎

Proof of Theorem 1, Lower bound.

The case of $x^{\mathrm{alt}}$ follows from combining Proposition 3 and Lemma 4.4, while the lower bound for $x^{\mathrm{blt}}$ follows from equation (4.1) and Lemma 4.4. ∎
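The temperature dependence of the lower bound for $x^{\mathrm{alt}}$ can be tabulated directly. The sketch assumes the convention $\theta=1-\tanh(2\beta)$, under which $\beta_0=\frac12\operatorname{arctanh}(\frac13)$ corresponds to $\theta=\frac23$. It also reads the first term of $\max\{\frac{1}{4-2\theta},\frac{1}{4\theta}\}\log n$ in Theorem 1 as coming from the autocorrelation bound and the second from the Hamiltonian bound, as described in the abstract. Under these assumptions the two coefficients cross at $\beta_0$, where both equal $3/8<1/2$; helper names are illustrative.

```python
import math


def theta_of_beta(beta):
    """Assumed convention for 1D Glauber dynamics: theta = 1 - tanh(2 beta)."""
    return 1 - math.tanh(2 * beta)


def lower_bound_coefficients(beta):
    """Coefficients of log n for x_alt: autocorrelation 1/(4 - 2 theta)
    and Hamiltonian 1/(4 theta)."""
    th = theta_of_beta(beta)
    return 1 / (4 - 2 * th), 1 / (4 * th)


beta0 = 0.5 * math.atanh(1 / 3)  # theta = 2/3 here
for beta in (0.0, beta0, 2 * beta0):
    auto, ham = lower_bound_coefficients(beta)
    print(f"beta={beta:.4f}  autocorrelation={auto:.4f}  Hamiltonian={ham:.4f}")
```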

Acknowledgements

We thank Yuval Peres for helpful discussions.
