Quantitative spectral gap estimate and Wasserstein contraction of simple slice sampling

Viacheslav Natarovskii; Daniel Rudolf; Björn Sprungk

arXiv:1903.03824·math.PR·September 17, 2020

TL;DR

This paper proves Wasserstein contraction of simple slice sampling for log-concave, rotationally invariant target densities and derives from it an explicit lower bound on the spectral gap, which quantifies how fast the sampler converges.

Contribution

To the authors' knowledge, it provides the first explicit lower bound on the spectral gap of simple slice sampling: $1/(d+1)$ for log-concave, rotationally invariant densities on $\mathbb{R}^{d}$ or on Euclidean balls, with an extension to more general targets that depends only on the volume of the (super-)level sets of their unnormalized density.

Findings

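1. Wasserstein contraction is proven for simple slice sampling with log-concave, rotationally invariant target densities.

2. An explicit lower bound on the spectral gap is derived.

3. The spectral gap bound extends to target distributions through the volume of the (super-)level sets of their unnormalized density alone.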

Abstract

We prove Wasserstein contraction of simple slice sampling for approximate sampling w.r.t. distributions with log-concave and rotational invariant Lebesgue densities. This yields, in particular, an explicit quantitative lower bound of the spectral gap of simple slice sampling. Moreover, this lower bound carries over to more general target distributions depending only on the volume of the (super-)level sets of their unnormalized density.

Equations

$$\pi(A)=\frac{\int_{A}\varrho(x)\,{\rm d}x}{\int_{G}\varrho(x)\,{\rm d}x},\qquad A\in\mathcal{B}(G).$$

$$G(t):=\{x\in G\mid\varrho(x)\geq t\},$$

$${\rm gap}_{\pi}(U_{\varrho}):=1-\|U_{\varrho}\|_{L_{2}^{0}(\pi)\to L_{2}^{0}(\pi)},$$

$$\|\nu U_{\varrho}^{n}-\pi\|_{\rm tv}\leq\left(1-{\rm gap}_{\pi}(U_{\varrho})\right)^{n}\left\|\frac{{\rm d}\nu}{{\rm d}\pi}-1\right\|_{2,\pi},$$

$$\mathbb{E}\left|\frac{1}{n}\sum_{j=1}^{n}f(X_{j})-\mathbb{E}_{\pi}(f)\right|^{2}\leq\frac{2}{n\cdot{\rm gap}_{\pi}(U_{\varrho})}+\frac{c_{p}\left\|\frac{{\rm d}\nu}{{\rm d}\pi}-1\right\|_{\infty}}{n^{2}\cdot{\rm gap}_{\pi}(U_{\varrho})},$$

$$W(\mu,\nu):=\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{G\times G}|x-y|\,{\rm d}\gamma(x,y),$$

$$W(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))\leq\left(1-\frac{1}{d+1}\right)|x-y|.$$

$$W(\nu U_{\varrho}^{n},\pi)\leq\left(1-\frac{1}{d+1}\right)^{n}W(\nu,\pi)$$

$$n\geq(d+1)\log\left(\varepsilon^{-1}W(\nu,\pi)\right),$$

$$\operatorname{supp}\ell:=\left(0,\sup\{t\in(0,\infty)\mid\ell(t)>0\}\right)$$

$${\rm gap}_{\pi}(U_{\varrho})\geq\frac{1}{k+1}.$$

$$\mathbb{P}(X_{n+1}\in A\mid X_{1},\dots,X_{n})=U_{\varrho}(X_{n},A),$$

$$U_{\varrho}(x,A)=\frac{1}{\varrho(x)}\int_{0}^{\varrho(x)}U_{t}(A)\,{\rm d}t.$$

$$G(t):=\{x\in\mathbb{R}^{d}\mid\varrho(x)\geq t\},$$

$$\int_{B}U_{\varrho}(x,A)\,\pi({\rm d}x)=\int_{A}U_{\varrho}(x,B)\,\pi({\rm d}x),\qquad A,B\in\mathcal{B}(G).$$

$$W(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))\leq\left(1-\frac{1}{d+1}\right)\big|\,|x|-|y|\,\big|.$$

$$W(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))\leq\left(1-\frac{1}{d+1}\right)|x-y|.$$

$$W(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))\leq\frac{d}{d+1}\cdot\frac{1}{\lambda_{d}\big(B^{(d)}_{1}\big)^{1/d}}\int_{0}^{1}\left|\ell_{\varrho}(r\varrho(x))^{1/d}-\ell_{\varrho}(r\varrho(y))^{1/d}\right|{\rm d}r,$$

$$\varphi^{-1}\colon[-\log\|\varrho\|_{\infty},-\log\inf\varrho)\to[0,R).$$

$$\varphi^{-1}(t):=\sup\{s\in[0,R):\varphi(s)\leq t\},\qquad t\in[-\log\|\varrho\|_{\infty},\infty).$$

$$\varphi^{-1}(t)=R\qquad\forall t\geq-\log\inf\varrho.$$

$$G(t)=\left\{x\in\mathring{B}^{(d)}_{R}\mid|x|\leq\varphi^{-1}(\log t^{-1})\right\}=B^{(d)}_{(\ell(t)/\lambda_{d}(B_{1}^{(d)}))^{1/d}},\qquad t\in(0,\|\varrho\|_{\infty}),$$

$$u_{t,s}(A\times B):=\frac{1}{\lambda_{d}\big(B^{(d)}_{1}\big)}\int_{B^{(d)}_{1}}\mathbf{1}_{A}\bigg(\bigg(\frac{\ell(t)}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}z\bigg)\mathbf{1}_{B}\bigg(\bigg(\frac{\ell(s)}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}z\bigg){\rm d}z,$$

$$u_{t,s}(A\times G)=\frac{1}{\ell(t)}\int_{G(t)}\mathbf{1}_{A}(y)\,{\rm d}y=U_{t}(A).$$

$$c(x,y,A\times B):=\int_{0}^{1}u_{r\varrho(x),r\varrho(y)}(A\times B)\,{\rm d}r$$

$$u_{t,s}(A\times G)=U_{t}(A),\qquad u_{t,s}(G\times B)=U_{s}(B)$$

$$c(x,y,A\times G)=\int_{0}^{1}U_{r\varrho(x)}(A)\,{\rm d}r=\frac{1}{\varrho(x)}\int_{0}^{\varrho(x)}U_{t}(A)\,{\rm d}t=U_{\varrho}(x,A).$$

$$c(x,\widetilde{x},A\times B)=\frac{1}{\lambda_{d}\big(B^{(d)}_{1}\big)}\int_{0}^{1}\int_{B^{(d)}_{1}}\mathbf{1}_{A}\bigg(\bigg(\frac{\ell(r\varrho(x))}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}z\bigg)\mathbf{1}_{B}\bigg(\bigg(\frac{\ell(r\varrho(\widetilde{x}))}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}z\bigg){\rm d}z\,{\rm d}r.$$

$$W(U_{\varrho}(x,\cdot),U_{\varrho}(\widetilde{x},\cdot))\leq\frac{1}{\lambda_{d}\big(B^{(d)}_{1}\big)}\int_{0}^{1}\int_{B^{(d)}_{1}}\left|\bigg(\frac{\ell(r\varrho(x))}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}-\bigg(\frac{\ell(r\varrho(\widetilde{x}))}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}\right||z|\,{\rm d}z\,{\rm d}r$$

Full text

Quantitative spectral gap estimate and Wasserstein contraction of simple slice sampling

Viacheslav Natarovskii, Daniel Rudolf, Björn Sprungk

Institute for Mathematical Stochastics, Georg-August-Universität Göttingen, Goldschmidtstraße 7, 37077 Göttingen

… for Mathematical Statistics in the Biosciences, Goldschmidtstraße 7, 37077 Göttingen

Faculty of Mathematics and Computer Science, Technische Universität Bergakademie Freiberg

Abstract

We prove Wasserstein contraction of simple slice sampling for approximate sampling w.r.t. distributions with log-concave and rotational invariant Lebesgue densities. This yields, in particular, an explicit quantitative lower bound of the spectral gap of simple slice sampling. Moreover, this lower bound carries over to more general target distributions depending only on the volume of the (super-)level sets of their unnormalized density.

**Keywords:** Slice sampling, spectral gap, Wasserstein contraction

Classification. Primary: 65C40; Secondary: 60J22, 62D99, 65C05.

1 Introduction

A challenging problem in Bayesian statistics and computational science is sampling w.r.t. distributions which are only known up to a normalizing constant. Assume that $G\subseteq\mathbb{R}^{d}$ and $\varrho:G\rightarrow(0,\infty)$ is integrable w.r.t. the Lebesgue measure. The goal is to sample w.r.t. the distribution determined by $\varrho$, say $\pi$, that is,

$$\pi(A)=\frac{\int_{A}\varrho(x)\,{\rm d}x}{\int_{G}\varrho(x)\,{\rm d}x},\qquad A\in\mathcal{B}(G).$$

Here $\mathcal{B}(G)$ denotes the Borel $\sigma$-algebra. In most cases this can only be done approximately and the idea is to construct a (time-homogeneous) Markov chain $(X_{n})_{n\in\mathbb{N}}$ which has $\pi$ as limit distribution, i.e., for increasing $n$ the distribution of $X_{n}$ converges to $\pi$. Slice sampling methods provide auxiliary variable Markov chains for doing this and several different versions have been proposed and investigated [2, 7, 10, 11, 12, 14, 15, 20, 21]. In particular, Metropolis-Hastings algorithms can also be considered as such methods, see [7, 25]. In the present work we investigate simple slice sampling, which works as follows (it is straightforward to verify that $\pi$ is a stationary distribution of the simple slice sampler):

Algorithm 1.1.

Given the current state $X_{n}=x\in G$, the simple slice sampling algorithm generates the next Markov chain instance $X_{n+1}$ by the following two steps:

1. Draw $T_{n}$ uniformly distributed in $[0,\varrho(x)]$, call the result $t$.

2. Draw $X_{n+1}$ uniformly distributed on

$$G(t):=\{x\in G\mid\varrho(x)\geq t\},$$

the (super-)level set of $\varrho$ at $t$.
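The two steps can be sketched in code for the special case where the level sets are known in closed form. The following is a minimal illustration, not the authors' implementation: it assumes $G=\mathbb{R}^{d}$ and $\varrho(x)=\exp(-\varphi(|x|))$ with $\varphi$ strictly increasing, so that $G(t)$ is a centered ball of radius $\varphi^{-1}(\log t^{-1})$. The function names `sample_uniform_ball`, `simple_slice_step` and `run_chain`, and the arguments `phi` and `phi_inv`, are hypothetical.

```python
import math
import random


def sample_uniform_ball(d, radius, rng):
    """Draw a point uniformly from the centered Euclidean ball of the given radius in R^d."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in direction))
    # For a uniform point in the ball, the radial part is radius * U^(1/d).
    r = radius * rng.random() ** (1.0 / d)
    return [r * v / norm for v in direction]


def simple_slice_step(x, phi, phi_inv, rng):
    """One transition of the two-step algorithm for rho(x) = exp(-phi(|x|)) on R^d."""
    norm_x = math.sqrt(sum(v * v for v in x))
    # Step 1: t = w * rho(x) with w uniform in (0, 1], kept on the log scale
    # so that exp(-phi(|x|)) cannot underflow: log(1/t) = phi(|x|) - log(w).
    neg_log_t = phi(norm_x) - math.log(1.0 - rng.random())
    # Step 2: G(t) = {y : phi(|y|) <= log(1/t)} is a ball of radius phi_inv(log(1/t)).
    return sample_uniform_ball(len(x), phi_inv(neg_log_t), rng)


def run_chain(x0, phi, phi_inv, n_steps, seed=0):
    """Run the chain for n_steps transitions and return the visited states."""
    rng = random.Random(seed)
    x, path = list(x0), []
    for _ in range(n_steps):
        x = simple_slice_step(x, phi, phi_inv, rng)
        path.append(x)
    return path
```

For instance, `run_chain([0.0, 0.0, 0.0], lambda s: 0.5 * s * s, lambda u: math.sqrt(2.0 * u), 20000)` targets the standard normal distribution on $\mathbb{R}^{3}$. For a general $\varrho$ the second step has no such closed form, which is exactly the practical difficulty of simple slice sampling.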

The appeal of this algorithmic approach lies in the convergence behavior of the corresponding Markov chain, which is observed to be good in practice and is intuitively plausible. Indeed, robust convergence properties have also been established theoretically. Mira and Tierney [12] prove uniform ergodicity under boundedness conditions on $G$ and $\varrho$. Roberts and Rosenthal [20] provide qualitative statements about geometric ergodicity under weak assumptions and prove quantitative estimates of the total variation distance between the distribution of $X_{n}$ and $\pi$ under a condition on the initial state. However, less is known about the spectral gap. Namely, beyond the general implications [19, 22] of uniform and geometric ergodicity obtained from the results of [12, 20] there is, to our knowledge, no explicit estimate of the spectral gap of simple slice sampling available. Let $U_{\varrho}$ be the transition operator/kernel of a Markov chain generated by simple slice sampling of a distribution $\pi$ with (unnormalized) density $\varrho$. The spectral gap is defined by

$${\rm gap}_{\pi}(U_{\varrho}):=1-\|U_{\varrho}\|_{L_{2}^{0}(\pi)\to L_{2}^{0}(\pi)},$$

where $L^{0}_{2}(\pi)$ is the space of functions $f\colon G\to\mathbb{R}$ with zero mean and finite variance (i.e., $\mathbb{E}_{\pi}(f):=\int_{G}f{\rm d}\pi=0$ ; $\|f\|_{2,\pi}^{2}:=\int_{G}|f|^{2}{\rm d}\pi<\infty$ ). A spectral gap, that is, ${\rm gap}_{\pi}(U_{\varrho})>0$ , leads to desirable robustness and convergence properties. For example, it is well known that a spectral gap implies geometric ergodicity [9, 19], and since $U_{\varrho}$ is reversible, it also implies a central limit theorem (CLT) for all $f\in L_{2}(\pi)$ , see [8]. In addition to that it allows the estimation of the CLT asymptotic variance [6]. In particular, an explicit lower bound of ${\rm gap_{\pi}}(U_{\varrho})$ leads to quantitative estimates of the total variation distance and a mean squared error bound of Markov chain Monte Carlo. More precisely, it is well known, see for instance [17, Lemma 2], that

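$$\|\nu U_{\varrho}^{n}-\pi\|_{\rm tv}\leq\left(1-{\rm gap}_{\pi}(U_{\varrho})\right)^{n}\left\|\frac{{\rm d}\nu}{{\rm d}\pi}-1\right\|_{2,\pi},$$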

where $\|\nu-\mu\|_{\rm tv}:=\sup_{A\in\mathcal{B}(G)}|\nu(A)-\mu(A)|$ denotes the total variation distance, $\nu=\mathbb{P}_{X_{1}}$ and $\nu U_{\varrho}^{n}=\mathbb{P}_{X_{n+1}}$ . Moreover, in [22] it is shown for the sample average that

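$$\mathbb{E}\left|\frac{1}{n}\sum_{j=1}^{n}f(X_{j})-\mathbb{E}_{\pi}(f)\right|^{2}\leq\frac{2}{n\cdot{\rm gap}_{\pi}(U_{\varrho})}+\frac{c_{p}\left\|\frac{{\rm d}\nu}{{\rm d}\pi}-1\right\|_{\infty}}{n^{2}\cdot{\rm gap}_{\pi}(U_{\varrho})},$$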

for any $p>2$ and any $f\colon G\to\mathbb{R}$ with $\|f\|_{p}^{p}=\int_{G}|f|^{p}{\rm d}\pi\leq 1$ , where $c_{p}$ is an explicit constant which depends only on $p$ .

The crucial drawback of simple slice sampling is that the second step of the algorithm is difficult to perform, in particular in high-dimensional scenarios. However, in [15] and the more recent papers [13, 14, 16, 26, 27] efficient slice sampling algorithms are designed which mimic (to some extent) simple slice sampling. Already [15] constructs a number of algorithms which perform a single Markov chain step on the chosen level set instead of sampling the uniform distribution. We call those methods hybrid slice samplers. For us the motivation to study simple slice sampling is twofold:

1. There is, to our knowledge, no quantitative statement about the spectral gap available, and for simple slice sampling one would expect a particularly good dependence on the dimension, which we verify to some extent.

2. In the recent work [10] it is proven that certain hybrid slice samplers are, in terms of the spectral gap, worse than simple slice sampling but not much worse. Hence knowledge of the spectral gap of simple slice sampling might carry over to estimates of the spectral gap of hybrid slice samplers, in particular to those suggested in [15].

Now let us explain the main results of the present work. For this let the Wasserstein distance w.r.t. the Euclidean norm $|\cdot|$ of probability measures $\nu,\mu$ on $(G,\mathcal{B}(G))$ be given by

$$W(\mu,\nu):=\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{G\times G}|x-y|\,{\rm d}\gamma(x,y),$$

where $\Gamma(\mu,\nu)$ is the set of all couplings of $\mu$ and $\nu$, that is, the set of all probability measures on $G\times G$ with marginals $\mu$ and $\nu$.

First main result (Theorem 2.1): For a rotationally invariant and log-concave (unnormalized) density $\varrho$ defined either on a Euclidean ball or on the whole of $\mathbb{R}^{d}$ we show Wasserstein contraction of simple slice sampling, that is, for all $x,y\in G\subseteq\mathbb{R}^{d}$ we have

$$W(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))\leq\left(1-\frac{1}{d+1}\right)|x-y|.$$

This has a number of useful consequences. It is well known, see for instance [23, Section 2], that this implies

$$W(\nu U_{\varrho}^{n},\pi)\leq\left(1-\frac{1}{d+1}\right)^{n}W(\nu,\pi)\tag{1}$$

for any initial distribution $\nu$ on $G$. In addition, by [4, Theorem 1.5], see also [18, Proposition 30], it implies ${\rm gap}_{\pi}(U_{\varrho})\geq 1/(d+1)$. Two simple examples which satisfy the assumptions of Theorem 2.1 are given by $\varrho(x)=\exp(-|x|)$ and $\varrho(x)=\exp(-|x|^{2}/2)$ where $G=\mathbb{R}^{d}$. For the former, Roberts and Rosenthal [21] argue on the basis of empirical experiments that simple slice sampling “does not mix rapidly in higher dimensions”. Indeed, we observe theoretically that the performance of simple slice sampling gets worse with increasing dimension; however, we disagree to some extent with their statement, since the dependence on the dimension is moderate. Namely, from (1) we obtain for any initial distribution that for $W(\nu U_{\varrho}^{n},\pi)\leq\varepsilon$ with $\varepsilon\in(0,1)$ we need

$$n\geq(d+1)\log\left(\varepsilon^{-1}W(\nu,\pi)\right),$$

which increases only linearly in $d$ .
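The step from the contraction estimate to this iteration count is elementary; a short worked derivation, using only the inequality $1-u\leq e^{-u}$, reads

$$\left(1-\frac{1}{d+1}\right)^{n}W(\nu,\pi)\leq\exp\left(-\frac{n}{d+1}\right)W(\nu,\pi),$$

$$\exp\left(-\frac{n}{d+1}\right)W(\nu,\pi)\leq\varepsilon\quad\Longleftrightarrow\quad n\geq(d+1)\log\left(\varepsilon^{-1}W(\nu,\pi)\right).$$

As an illustration with hypothetical values $d=10$, $W(\nu,\pi)=1$ and $\varepsilon=10^{-2}$, the condition becomes $n\geq 11\log(100)\approx 50.7$, so $51$ steps suffice.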

Second main result (Theorem 3.10): Based on the fact that in the second step of Algorithm 1.1 we sample w.r.t. the uniform distribution on the (super-)level set $G(t)$, one can conjecture that its geometric shape does not matter, whereas its “size” or volume should matter (this is already observed in [20, 21]). To this end, we define the level-set function $\ell_{\varrho}\colon(0,\infty)\to[0,\infty)$ of $\varrho\colon G\to(0,\infty)$, with $G\subseteq\mathbb{R}^{d}$, by $\ell_{\varrho}(t):=\lambda_{d}(G(t))$ for $t\in(0,\infty)$, where $\lambda_{d}$ denotes the $d$-dimensional Lebesgue measure. The idea is now to identify certain “nice” properties of $\ell_{\varrho}$ which lead to spectral gap estimates. Here, we propose classes $\Lambda_{k}$, with $k\in\mathbb{N}$, of level-set functions containing all continuous $\ell\colon(0,\infty)\to[0,\infty)$ such that

• $\ell$ is strictly decreasing on the open interval

$$\operatorname{supp}\ell:=\left(0,\sup\{t\in(0,\infty)\mid\ell(t)>0\}\right)$$

(which implies the existence of the inverse $\ell^{-1}$ on $(0,\left\|\ell\right\|_{\infty})$ with $\left\|\ell\right\|_{\infty}:=\sup_{s\in(0,\infty)}\ell(s)$), and

• the function $g\colon(0,\left\|\ell\right\|_{\infty}^{1/k})\to\operatorname{supp}\ell$, given by $g(s)=\ell^{-1}(s^{k})$, is log-concave (i.e., $\log g$ is concave).

In Theorem 3.10 we then show that, if for an unnormalized density $\varrho\colon G\to(0,\infty)$ we have $\ell_{\varrho}\in\Lambda_{k}$ for a $k\in\mathbb{N}$ , then

$${\rm gap}_{\pi}(U_{\varrho})\geq\frac{1}{k+1}.\tag{2}$$

A crucial tool in the proof of Theorem 3.10 is the equality of the spectral gap of $U_{\varrho}$ and the spectral gap of the transition operator of the “level Markov chain” $(T_{n})_{n\in\mathbb{N}}$ defined within Algorithm 1.1. This statement is provided in Lemma 3.3. Observe that in the formulation of the second main result we did not impose any unimodality, log-concavity or rotational invariance assumption on $\varrho$. The $d$-variate function $\varrho$ may have more than one mode; the only requirement is that the corresponding level-set function belongs to $\Lambda_{k}$. In many cases this is satisfied for ${k=d}$; however, $k<d$ is also possible, see Example 3.15. It contains the special case where $\varrho$ is assumed to be the density of the $d$-variate standard normal distribution, which leads to $\ell_{\varrho}\in\Lambda_{\lceil d/2\rceil}$. In that case for large $d$ the lower bound from (2) improves the spectral gap estimate of Theorem 2.1 roughly by a factor of $2$. We also consider a $d$-variate “volcano density”, where we show that this leads to a level-set function in $\Lambda_{1}$, such that the corresponding spectral gap of simple slice sampling is independent of the dimension and satisfies the lower bound $1/2$.
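Membership in $\Lambda_{k}$ can be explored numerically when $\ell_{\varrho}$ is available in closed form. The sketch below assumes the unnormalized Gaussian density $\varrho(x)=\exp(-|x|^{2}/2)$ on $\mathbb{R}^{d}$, for which $G(t)$ is a ball of radius $\sqrt{2\log t^{-1}}$, so that $\ell(t)=\lambda_{d}(B_{1}^{(d)})(2\log t^{-1})^{d/2}$ and $\log g(s)=-\tfrac{1}{2}\big(s^{k}/\lambda_{d}(B_{1}^{(d)})\big)^{2/d}$. Under this closed form, $\log g$ is a negative multiple of $s^{2k/d}$, so concavity requires $2k\geq d$. The helper names are hypothetical and the grid test is only a heuristic check, not a proof.

```python
import math


def unit_ball_volume(d):
    """Lebesgue measure of the d-dimensional Euclidean unit ball."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)


def log_g(s, k, d):
    """log of g(s) = ell^{-1}(s^k) for ell(t) = vol(B_1) * (2 log(1/t))^(d/2)."""
    return -0.5 * (s ** k / unit_ball_volume(d)) ** (2.0 / d)


def looks_log_concave(k, d, grid):
    """Check that all second differences of log g on the grid are non-positive."""
    vals = [log_g(s, k, d) for s in grid]
    return all(
        vals[i - 1] - 2.0 * vals[i] + vals[i + 1] <= 1e-12
        for i in range(1, len(vals) - 1)
    )


grid = [0.05 * i for i in range(1, 101)]
print(looks_log_concave(3, 5, grid))  # k = 3, d = 5: 2k >= d
print(looks_log_concave(2, 5, grid))  # k = 2, d = 5: 2k < d
```

The tolerance `1e-12` only absorbs floating-point noise in the borderline case $2k=d$, where $\log g$ is linear.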

The outline of the paper is as follows. In the next section we provide basic notation and prove our main result w.r.t. the Wasserstein contractivity. Then, in Section 3 we state and discuss the necessary operator theoretic definitions and investigate the important relation between the Markov chains $(X_{n})_{n\in\mathbb{N}}$ and $(T_{n})_{n\in\mathbb{N}}$ generated by the simple slice sampling algorithm. There we also prove the main theorem about the lower bound of the spectral gap and illustrate the result after a discussion about the sets $\Lambda_{k}$ by examples.

2 Wasserstein contraction

Let $(\Omega,\mathcal{F},\mathbb{P})$ be the common probability space on which all random variables are defined. The sequence of random variables $(X_{n})_{n\in\mathbb{N}}$ determined by Algorithm 1.1 provides a Markov chain on $G$ , that is, for all $A\in\mathcal{B}(G)$ it satisfies (almost surely)

$$\mathbb{P}(X_{n+1}\in A\mid X_{1},\dots,X_{n})=U_{\varrho}(X_{n},A),$$

where the transition kernel of simple slice sampling $U_{\varrho}\colon G\times\mathcal{B}(G)\to[0,1]$ is given by

$$U_{\varrho}(x,A)=\frac{1}{\varrho(x)}\int_{0}^{\varrho(x)}U_{t}(A)\,{\rm d}t.$$

Here $U_{t}$ denotes the uniform distribution on the level set

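$$G(t):=\{x\in\mathbb{R}^{d}\mid\varrho(x)\geq t\},$$

thus, $U_{t}(A)=\frac{\lambda_{d}(A\cap G(t))}{\lambda_{d}(G(t))}$ for $t>0$. Note that by construction the transition kernel $U_{\varrho}$ is reversible w.r.t. $\pi$, that is,

$$\int_{B}U_{\varrho}(x,A)\,\pi({\rm d}x)=\int_{A}U_{\varrho}(x,B)\,\pi({\rm d}x),\qquad A,B\in\mathcal{B}(G).$$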

In particular, this implies that $\pi$ is a stationary distribution of $U_{\varrho}$ . Further, by $B^{(d)}_{R}$ we denote the $d$ -dimensional closed Euclidean ball with radius $R>0$ around zero and by $\mathring{B}^{(d)}_{R}$ its interior. For log-concave rotational invariant unnormalized densities we formulate now our Wasserstein contraction result of the simple slice sampler.

Theorem 2.1.

For $R\in(0,\infty]$ let $\varphi\colon[0,R)\to\mathbb{R}$ be a strictly increasing and convex function on $[0,R)$ . Define $\varrho\colon\mathring{B}^{(d)}_{R}\to(0,\infty)$ by $\varrho(x):=\exp\left(-\varphi(|x|)\right)$ . Then, for any ${x,y\in\mathring{B}^{(d)}_{R}}$ we have

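$$W(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))\leq\left(1-\frac{1}{d+1}\right)\big|\,|x|-|y|\,\big|.\tag{3}$$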

Before we prove the result let us provide some comments on it.

Remark 2.2.

Let us emphasize here that we allow $R=\infty$, which leads to $\mathring{B}^{(d)}_{R}=\mathbb{R}^{d}$. Moreover, since the right-hand side of (3) contains the absolute value of the difference of the Euclidean norms of $x$ and $y$, an immediate consequence of the reverse triangle inequality is

$$W(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))\leq\left(1-\frac{1}{d+1}\right)|x-y|.$$

Example 2.3.

Let $\varphi\colon[0,\infty)\to\mathbb{R}$ be given as $\varphi(s)=s^{2}/2$. This gives $\varrho(x)=\exp(-|x|^{2}/2)$, so that $\pi$ is the multivariate standard normal distribution. Since $\varphi$ is strictly increasing and convex, Theorem 2.1 applies with $R=\infty$ and we obtain (3).

For the proof of Theorem 2.1 we need the following auxiliary result.

Lemma 2.4.

With $G=\mathring{B}^{(d)}_{R}$ let $\varrho\colon G\to(0,\infty)$ be given as in Theorem 2.1. Then, for any $x,y\in G$ we have

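$$W(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))\leq\frac{d}{d+1}\cdot\frac{1}{\lambda_{d}\big(B^{(d)}_{1}\big)^{1/d}}\int_{0}^{1}\left|\ell_{\varrho}(r\varrho(x))^{1/d}-\ell_{\varrho}(r\varrho(y))^{1/d}\right|{\rm d}r,$$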

where $\ell_{\varrho}\colon(0,\infty)\to[0,\infty)$ is the level-set function defined by $\ell_{\varrho}(t):=\lambda_{d}\left(G(t)\right)$ .

Proof.

Since $\varphi$ is strictly increasing it is injective, and since it is in addition convex it is continuous. Moreover, note that the image of $\varphi$ satisfies $\varphi([0,R))=[-\log\|\varrho\|_{\infty},-\log\inf\varrho)$. Here $\|\varrho\|_{\infty}:=\sup_{x\in\mathring{B}^{(d)}_{R}}\varrho(x)$ and $\inf\varrho$ is an abbreviation of $\inf_{x\in\mathring{B}^{(d)}_{R}}\varrho(x)$ with the convention $\log 0:=-\infty$. Hence, there exists the inverse

$$\varphi^{-1}\colon[-\log\|\varrho\|_{\infty},-\log\inf\varrho)\to[0,R).$$

In the case $\inf\varrho=0$ the inverse $\varphi^{-1}$ is defined on $[-\log\|\varrho\|_{\infty},\infty)$ . In the case $\inf\varrho>0$ we extend the inverse $\varphi^{-1}$ to $[-\log\|\varrho\|_{\infty},\infty)$ by setting

$$\varphi^{-1}(t):=\sup\{s\in[0,R):\varphi(s)\leq t\},\qquad t\in[-\log\|\varrho\|_{\infty},\infty).$$

Note that by this extension we do not change $\varphi^{-1}$ in $[-\log\|\varrho\|_{\infty},-\log\inf\varrho)$ and obtain

$$\varphi^{-1}(t)=R\qquad\forall t\geq-\log\inf\varrho.$$

For simplicity of the notation we write $\ell$ for $\ell_{\varrho}$ . Observe that

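$$G(t)=\left\{x\in\mathring{B}^{(d)}_{R}\mid|x|\leq\varphi^{-1}(\log t^{-1})\right\}=B^{(d)}_{(\ell(t)/\lambda_{d}(B_{1}^{(d)}))^{1/d}},\qquad t\in(0,\|\varrho\|_{\infty}),$$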

since $\ell(t)=\lambda_{d}(G(t))=\varphi^{-1}(\log t^{-1})^{d}\ \lambda_{d}(B_{1}^{(d)})$ . Thus, $U_{t}$ denotes the uniform distribution on the Euclidean ball around the origin with radius $\left(\ell(t)/\lambda_{d}(B_{1}^{(d)})\right)^{1/d}$ . Now it is straightforward to verify that $u_{t,s}\colon\mathcal{B}(G^{2})\to[0,1]$ determined by

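$$u_{t,s}(A\times B):=\frac{1}{\lambda_{d}\big(B^{(d)}_{1}\big)}\int_{B^{(d)}_{1}}\mathbf{1}_{A}\bigg(\bigg(\frac{\ell(t)}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}z\bigg)\mathbf{1}_{B}\bigg(\bigg(\frac{\ell(s)}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}z\bigg){\rm d}z,$$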

where $A,B\in\mathcal{B}(G)$ , is a coupling of $U_{t}$ and $U_{s}$ . For example, we have

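$$u_{t,s}(A\times G)=\frac{1}{\lambda_{d}\big(B^{(d)}_{1}\big)}\int_{B^{(d)}_{1}}\mathbf{1}_{A}\bigg(\bigg(\frac{\ell(t)}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}z\bigg){\rm d}z=\frac{1}{\ell(t)}\int_{G(t)}\mathbf{1}_{A}(y)\,{\rm d}y=U_{t}(A).$$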

Further, note that $c\colon G^{2}\times\mathcal{B}(G^{2})\to[0,1]$ determined by

$$c(x,y,A\times B):=\int_{0}^{1}u_{r\varrho(x),r\varrho(y)}(A\times B)\,{\rm d}r$$

is a Markovian coupling of $U_{\varrho}(x,\cdot)$ and $U_{\varrho}(y,\cdot)$ , i.e., $c(x,y,A\times G)=U_{\varrho}(x,A)$ and $c(x,y,G\times B)=U_{\varrho}(y,B)$ for all $x,y\in G$ and $A,B\in\mathcal{B}(G)$ . Indeed, since

$$u_{t,s}(A\times G)=U_{t}(A),\qquad u_{t,s}(G\times B)=U_{s}(B)$$

we get for example

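$$c(x,y,A\times G)=\int_{0}^{1}U_{r\varrho(x)}(A)\,{\rm d}r=\frac{1}{\varrho(x)}\int_{0}^{\varrho(x)}U_{t}(A)\,{\rm d}t=U_{\varrho}(x,A).$$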

Summarized, for arbitrary $x,\widetilde{x}\in G$ and $A,B\in\mathcal{B}(G)$ we obtain

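$$c(x,\widetilde{x},A\times B)=\frac{1}{\lambda_{d}\big(B^{(d)}_{1}\big)}\int_{0}^{1}\int_{B^{(d)}_{1}}\mathbf{1}_{A}\bigg(\bigg(\frac{\ell(r\varrho(x))}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}z\bigg)\mathbf{1}_{B}\bigg(\bigg(\frac{\ell(r\varrho(\widetilde{x}))}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}z\bigg){\rm d}z\,{\rm d}r.$$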

Using the Markovian coupling we obtain for arbitrary $x,\widetilde{x}\in G$ that

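$$W(U_{\varrho}(x,\cdot),U_{\varrho}(\widetilde{x},\cdot))\leq\frac{1}{\lambda_{d}\big(B^{(d)}_{1}\big)}\int_{0}^{1}\int_{B^{(d)}_{1}}\left|\bigg(\frac{\ell(r\varrho(x))}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}-\bigg(\frac{\ell(r\varrho(\widetilde{x}))}{\lambda_{d}(B^{(d)}_{1})}\bigg)^{1/d}\right||z|\,{\rm d}z\,{\rm d}r=\frac{d}{d+1}\cdot\frac{1}{\lambda_{d}\big(B^{(d)}_{1}\big)^{1/d}}\int_{0}^{1}\left|\ell(r\varrho(x))^{1/d}-\ell(r\varrho(\widetilde{x}))^{1/d}\right|{\rm d}r,$$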

which finishes the proof. ∎
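The Markovian coupling used in this proof is easy to simulate, which gives a quick plausibility check of the resulting bound. The sketch below assumes $R=\infty$ and $\varrho(x)=\exp(-\varphi(|x|))$, so that the level set $G(r\varrho(x))$ is a ball of radius $\varphi^{-1}(\varphi(|x|)-\log r)$. Both coupled points share the same $r$ and the same $z$ from the unit ball, so their distance is the difference of the two radii times $|z|$, and only $|z|$ has to be sampled. The function name `coupled_distance` and its arguments are hypothetical.

```python
import math
import random


def coupled_distance(norm_x, norm_y, phi, phi_inv, d, n_samples, rng):
    """Monte Carlo estimate of the expected distance under the coupling c(x, y, .)."""
    total = 0.0
    for _ in range(n_samples):
        log_r = math.log(1.0 - rng.random())      # log of r, with r uniform in (0, 1]
        rad_x = phi_inv(phi(norm_x) - log_r)      # radius of G(r * rho(x))
        rad_y = phi_inv(phi(norm_y) - log_r)      # radius of G(r * rho(y))
        z_norm = rng.random() ** (1.0 / d)        # |z| for z uniform in the unit ball
        total += abs(rad_x - rad_y) * z_norm
    return total / n_samples


rng = random.Random(1)
d, nx, ny = 3, 1.0, 2.5
bound = (1.0 - 1.0 / (d + 1)) * abs(nx - ny)      # right-hand side of the contraction
gauss = coupled_distance(nx, ny, lambda s: 0.5 * s * s,
                         lambda u: math.sqrt(2.0 * u), d, 20000, rng)
linear = coupled_distance(nx, ny, lambda s: s, lambda u: u, d, 20000, rng)
print(bound, gauss, linear)
```

With $\varphi(s)=s$ the two radii always differ by exactly $\big||x|-|y|\big|$, so the estimate should sit at the bound up to Monte Carlo error, whereas for the Gaussian case it should fall strictly below it.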

Remark 2.5.

In the previous proof we used the coupling $u_{t,s}\in\Gamma(U_{t},U_{s})$ for $s,t\in(0,\|\varrho\|_{\infty})$ . In the setting of Lemma 2.4 observe that for $d=1$ it is related to the optimal Hoeffding-Fréchet coupling. This optimality property also holds for arbitrary $d>1$ , which is justified as follows. We derive an upper bound for $W(U_{t},U_{s})$ by $u_{t,s}$ ,

[TABLE]

where we used $\int_{B^{(d)}_{1}}|z|\textrm{d}z=\frac{d}{d+1}\lambda_{d}\big{(}B^{(d)}_{1}\big{)}.$ To derive a lower bound of $W(U_{t},U_{s})$ we apply the Kantorovich-Rubinstein duality formula of the Wasserstein distance (see e.g. [29, Chapter 1.2]) w.r.t. $U_{t}$ and $U_{s}$. It is given by

[TABLE]

where $\|g\|_{\rm Lip}:=\sup_{x,y\in G}\frac{|g(x)-g(y)|}{|x-y|}$ for $g\colon G\to\mathbb{R}$ . (The supremum is taken over Lipschitz continuous functions with Lipschitz constant less or equal to $1$ .) Considering $h(z):=|z|$ and noting $\|h\|_{\rm Lip}\leq 1$ as well as

[TABLE]

then yields

[TABLE]

Hence

[TABLE]

which implies that $u_{t,s}$ is an optimal coupling.
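The integral identity used in this remark follows from polar coordinates. As a short worked check, write $\sigma_{d-1}$ for the surface measure of the unit sphere in $\mathbb{R}^{d}$ (a symbol introduced here only for this computation):

$$\int_{B^{(d)}_{1}}|z|\,{\rm d}z=\sigma_{d-1}\int_{0}^{1}r\cdot r^{d-1}\,{\rm d}r=\frac{\sigma_{d-1}}{d+1},\qquad\lambda_{d}\big(B^{(d)}_{1}\big)=\sigma_{d-1}\int_{0}^{1}r^{d-1}\,{\rm d}r=\frac{\sigma_{d-1}}{d},$$

$$\text{hence}\qquad\int_{B^{(d)}_{1}}|z|\,{\rm d}z=\frac{d}{d+1}\,\lambda_{d}\big(B^{(d)}_{1}\big).$$

For $d=1$ this reduces to $\int_{-1}^{1}|z|\,{\rm d}z=1=\tfrac{1}{2}\cdot 2$.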

Now we provide the proof of Theorem 2.1.

Proof of Theorem 2.1.

Again, for $\ell_{\varrho}$ we write $\ell$ . To verify the claim of the theorem by Lemma 2.4 it is sufficient to show that

[TABLE]

Then, by the extended inverse $\varphi^{-1}$ derived in the proof of Lemma 2.4 we have

[TABLE]

Here also note that by the definition of $\varrho$ we have $\varphi(0)=-\log\|\varrho\|_{\infty}$ . The representation (4) yields for any $r\in(0,1]$ and $x\in\mathring{B}^{(d)}_{R}$ that

[TABLE]

which leads to

[TABLE]

We now show that for any $r\in(0,1]$ and any $s,\widetilde{s}\in[0,R)$ we have

[TABLE]

which immediately yields the assertion of the theorem.

For this let $s,\widetilde{s}\in[0,R)$ and assume without loss of generality that $s\leq\widetilde{s}$ . Define for arbitrary fixed $s\in[0,R)$ the value $r_{\min}(s)$ by

[TABLE]

Hence

[TABLE]

Moreover, we set

[TABLE]

and since $\varphi$ is continuous and increasing we have

[TABLE]

The same arguments lead to

[TABLE]

and

[TABLE]

for

[TABLE]

Note, that due to $s\leq\widetilde{s}$ we have $\varphi(s)\leq\varphi(\widetilde{s})$ and, thus, $r_{\min}(\widetilde{s})\leq r_{\min}(s)$ . We distinguish three cases w.r.t. $r\in(0,1]$ :

1. Assume $r\leq r_{\min}(\widetilde{s})$: Here $\varphi^{-1}(\varphi(s)-\log r)=\varphi^{-1}(\varphi(\widetilde{s})-\log r)=R$ and

[TABLE]

2. Assume $r>r_{\min}(s)$: Here

[TABLE]

with $s^{\prime}(r),\widetilde{s}^{\prime}(r)\in[0,R)$ . We now exploit the convexity of $\varphi$ on $[0,R)$ which is equivalent to

[TABLE]

being increasing in $u$ for fixed $v$ and vice versa (since $R_{\varphi}$ is symmetric).

Hence, since $s\leq s^{\prime}(r)$ and $\widetilde{s}\leq\widetilde{s}^{\prime}(r)$ , we obtain

[TABLE]

which implies

[TABLE]

3. Assume $r_{\min}(\widetilde{s})\leq r<r_{\min}(s)$ (this case only occurs if $\lim_{t\uparrow R}\varphi(t)=-\log\inf\varrho<\infty$; in that situation define $\varphi(R):=-\log\inf\varrho$ and observe that with this extension $\varphi$ is increasing and convex on $[0,R]$): Here

[TABLE]

By the fact that $\varphi$ is increasing and convex it is continuous, such that there exists an $\hat{s}\in[0,R)$ with $s\leq\hat{s}\leq\widetilde{s}$ satisfying

[TABLE]

and, hence, $\hat{s}^{\prime}(r)=R$ . By employing the same reasoning as in (5) using the convexity of $\varphi$ we have that

[TABLE]

This finishes the proof. ∎

It is fair to ask whether the estimate can be improved. The following example answers this question. Namely, in any dimension we find a parameterized family of unnormalized densities for which (3) holds with equality.

Example 2.6.

Let $\alpha>0$ be an arbitrary parameter. With the notation of Theorem 2.1 set $R=\infty$ and $\varphi(s)=\alpha s$ on $[0,\infty)$. The function $\varphi$ is strictly increasing and convex on $[0,\infty)$. Hence, for $\varrho\colon\mathbb{R}^{d}\to(0,\infty)$ with $\varrho(x)=\exp(-\alpha|x|)$ the estimate (3) holds. Further observe that $G(t)=B^{(d)}_{(\log t^{-1})/\alpha}$. For $x,y\in\mathbb{R}^{d}$ we use again the Kantorovich-Rubinstein duality formula of the Wasserstein distance w.r.t. $U_{\varrho}(x,\cdot)$ and $U_{\varrho}(y,\cdot)$, that is,

[TABLE]

where $\|g\|_{\rm Lip}:=\sup_{x,y\in\mathbb{R}^{d}}\frac{|g(x)-g(y)|}{|x-y|}$ for $g\colon\mathbb{R}^{d}\to\mathbb{R}$ . We argue as in Remark 2.5 and set $h(z)=|z|$ . Note that this function satisfies $\|h\|_{\rm Lip}\leq 1$ as well as

[TABLE]

where we again used the fact that $\int_{B^{(d)}_{1}}|z|\textrm{d}z=\frac{d}{d+1}\lambda_{d}\big{(}B^{(d)}_{1}\big{)}.$ Hence, by (6), employing the function $h$ we get a lower bound of $W(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))$ , which coincides with the upper bound (3). Thus, the Markovian coupling $c(x,y,\cdot)\in\Gamma(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))$ constructed in Lemma 2.4 is in this scenario optimal and

$$W(U_{\varrho}(x,\cdot),U_{\varrho}(y,\cdot))=\left(1-\frac{1}{d+1}\right)\big|\,|x|-|y|\,\big|.$$

This establishes that the inequality stated in Theorem 2.1 can, in general, not be improved.

3 Spectral gap estimate

In this section we investigate spectral gap properties of the Markov operator induced by the transition kernel $U_{\varrho}$ of the Markov chain $(X_{n})_{n\in\mathbb{N}}$ . For this we need further definitions. By $L_{2}(\pi)$ we denote the Hilbert space of functions $f\colon G\to\mathbb{R}$ with finite norm $\|f\|_{2,\pi}:=\left(\int_{G}|f|^{2}{\rm d}\pi\right)^{1/2}$ . By the reversibility of $U_{\varrho}$ we have that $\pi$ is a stationary distribution. The transition kernel $U_{\varrho}$ can be extended to a linear operator ${U}_{\varrho}\colon L_{2}(\pi)\to L_{2}(\pi)$ defined by

$$U_{\varrho}f(x):=\int_{G}f(y)\,U_{\varrho}(x,{\rm d}y),\qquad x\in G.$$

It is well known that a general Markov operator is self-adjoint on $L_{2}(\pi)$ iff the corresponding transition kernel is reversible w.r.t. $\pi$ , see for example [22, Lemma 3.9]. We denote the (mean) functional $\mathbb{E}_{\pi}\colon L_{2}(\pi)\to\mathbb{R}$ by $\mathbb{E}_{\pi}(f):=\int_{G}f{\rm d}\pi$ and note that this can be extended to a bounded linear operator $\mathbb{E}_{\pi}\colon L_{2}(\pi)\to L_{2}(\pi)$ with $\mathbb{E}_{\pi}(f)\equiv\int_{G}f{\rm d}\pi$ . With this notation the spectral gap of $U_{\varrho}$ is determined by the operator norm of $U_{\varrho}-\mathbb{E}_{\pi}$ , i.e., it is given by

$${\rm gap}_{\pi}(U_{\varrho}):=1-\|U_{\varrho}-\mathbb{E}_{\pi}\|_{L_{2}(\pi)\to L_{2}(\pi)}.$$

Further let $L^{0}_{2}(\pi)$ be the set of functions $f\in L_{2}(\pi)$ with $\mathbb{E}_{\pi}(f)=0$ . Using the normed linear space $L_{2}^{0}(\pi)$ it is well known that $\|U_{\varrho}\|_{L_{2}^{0}(\pi)\to L_{2}^{0}(\pi)}=\|U_{\varrho}-\mathbb{E}_{\pi}\|_{L_{2}(\pi)\to L_{2}(\pi)}$ , see e.g. [22, Lemma 3.16], such that

$${\rm gap}_{\pi}(U_{\varrho})=1-\|U_{\varrho}\|_{L_{2}^{0}(\pi)\to L_{2}^{0}(\pi)}.$$

An immediate consequence of Theorem 2.1, for example by applying [18, Proposition 30], is the following:

Corollary 3.1.

Assume that $\varphi$ satisfies the conditions formulated in Theorem 2.1 and $\varrho(x)=\exp(-\varphi(|x|))$ . Then

$${\rm gap}_{\pi}(U_{\varrho})\geq\frac{1}{d+1}.$$

The aim of this section is to extend and improve the previous estimate to a larger class of density functions which are not necessarily log-concave and rotational invariant.

For this, in addition to the Markov chain $(X_{n})_{n\in\mathbb{N}}$ , the auxiliary variable Markov chain $(T_{n})_{n\in\mathbb{N}}$ also determined by Algorithm 1.1 is useful. In the next section we introduce the corresponding transition kernel, provide a relation to $U_{\varrho}$ and investigate further properties of $(T_{n})_{n\in\mathbb{N}}$ .

3.1 Auxiliary variable Markov chain

The sequence of auxiliary random variables $(T_{n})_{n\in\mathbb{N}}$ from Algorithm 1.1 provides also a Markov chain. In contrast to $(X_{n})_{n\in\mathbb{N}}$ the Markov chain $(T_{n})_{n\in\mathbb{N}}$ is defined on $(\mathbb{R}^{+},\mathcal{B}(\mathbb{R}^{+}))$ , with $\mathbb{R}^{+}:=(0,\infty)$ and the transition kernel is given by

[TABLE]

Recall that the level-set function of $\varrho$ is given by $\ell_{\varrho}(t)=\lambda_{d}(G(t))$ and define a probability measure $\mu$ on $(\mathbb{R}^{+},\mathcal{B}(\mathbb{R}^{+}))$ by

[TABLE]

From [10, Lemma 1] it follows that the transition kernel $Q_{\varrho}$ is reversible w.r.t. $\mu$ . For the convenience of the reader we prove this fact in our setting.

Lemma 3.2.

The transition kernel $Q_{\varrho}$ on $(\mathbb{R}^{+},\mathcal{B}(\mathbb{R}^{+}))$ is reversible w.r.t. $\mu$ .

Proof.

For any $A,B\in\mathcal{B}(\mathbb{R}^{+})$ we have

[TABLE]

Using the fact that $\mathbf{1}_{G(s)}(x)=\mathbf{1}_{[0,\varrho(x)]}(s)$ we have

[TABLE]

Note that the right-hand side of the previous equation is symmetric in $A$ and $B$ , such that we can change their roles and argue backwards. This leads to

[TABLE]

which finishes the proof. ∎
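To make the auxiliary chain $(T_{n})_{n\in\mathbb{N}}$ concrete, it can be simulated on its own in the rotationally invariant setting of Theorem 2.1. The sketch below is an illustration under the assumptions $R=\infty$ and $\varrho(x)=\exp(-\varphi(|x|))$: then only $|X_{n}|$ is needed, and it is convenient to track $S_{n}:=-\log T_{n}$ instead of $T_{n}$. The function name `level_chain` and its arguments are hypothetical.

```python
import math
import random


def level_chain(phi, phi_inv, d, n_steps, s0=1.0, seed=0):
    """Simulate S_n = -log(T_n) for the level chain, assuming rho(x) = exp(-phi(|x|))."""
    rng = random.Random(seed)
    s, path = s0, []
    for _ in range(n_steps):
        # X is uniform on G(t), a ball of radius phi_inv(log(1/t)), so |X| = radius * U^(1/d).
        norm_x = phi_inv(s) * rng.random() ** (1.0 / d)
        # Next level T is uniform in (0, rho(X)]: -log T = phi(|X|) - log(W), W uniform in (0, 1].
        s = phi(norm_x) - math.log(1.0 - rng.random())
        path.append(s)
    return path


path = level_chain(lambda s: s, lambda u: u, d=3, n_steps=20000)
print(sum(path) / len(path))
```

For $\varphi(s)=s$ the printed average should settle near $d+1$, since at stationarity $S=|X|-\log W$ with $|X|$ Gamma-distributed with shape $d$ and $-\log W$ standard exponential. Note that the update uses $\varphi$ only through the radius of the level set, in line with the fact that the level chain sees $\varrho$ only via the volumes of its level sets.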

Now we present a relation of the spectral gap of $U_{\varrho}$ to the spectral gap of $Q_{\varrho}$ . Here we need the Hilbert space $L_{2}(\mu)$ , which consists of functions $h\colon\mathbb{R}^{+}\to\mathbb{R}$ with finite $\|h\|_{2,\mu}:=\left(\int_{\mathbb{R}^{+}}|h|^{2}\mu({\rm d}t)\right)^{1/2}$ . To state the spectral gap of $Q_{\varrho}$ let $\mathbb{E}_{\mu}\colon L_{2}(\mu)\to\mathbb{R}$ be the (mean) functional given by $\mathbb{E}_{\mu}h:=\int_{\mathbb{R}^{+}}h{\rm d}\mu$ , which we consider as linear operator mapping $L_{2}(\mu)$ functions to constant ones. Then, the spectral gap of $Q_{\varrho}$ is given by the operator norm

$${\rm gap}_{\mu}(Q_{\varrho}):=1-\|Q_{\varrho}-\mathbb{E}_{\mu}\|_{L_{2}(\mu)\to L_{2}(\mu)},$$

where the transition kernel $Q_{\varrho}$ is extended to the self-adjoint Markov operator $Q_{\varrho}\colon L_{2}(\mu)\to L_{2}(\mu)$ defined by

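$$Q_{\varrho}h(t):=\int_{\mathbb{R}^{+}}h(s)\,Q_{\varrho}(t,{\rm d}s),\qquad h\in L_{2}(\mu),\ t\in\mathbb{R}^{+}.$$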

Note that the self-adjointness here follows (as for $U_{\varrho}$) from the fact that $Q_{\varrho}$ is reversible. With this notation we obtain:

Lemma 3.3.

The spectral gaps of $Q_{\varrho}$ and $U_{\varrho}$ coincide, that is, ${\rm gap}_{\pi}(U_{\varrho})={\rm gap}_{\mu}(Q_{\varrho}).$

Proof.

Define the linear operators $V\colon L_{2}(\mu)\to L_{2}(\pi)$ and $V^{*}\colon L_{2}(\pi)\to L_{2}(\mu)$ by

[TABLE]

Now we show that $V^{*}$ is the adjoint operator of $V$ , i.e., $\langle Vg,f\rangle_{\pi}=\langle g,V^{*}f\rangle_{\mu}$ , where $\langle\cdot,\cdot\rangle_{\pi}$ and $\langle\cdot,\cdot\rangle_{\mu}$ are the inner products of $L_{2}(\pi)$ and $L_{2}(\mu)$ , respectively. We have

[TABLE]

Further we use the fact that $\boldsymbol{1}_{[0,\varrho(x)]}(t)=\boldsymbol{1}_{G(t)}(x)$ , that $\int_{G}\varrho(y){\rm d}y=\int_{0}^{\infty}\ell_{\varrho}(r){\rm d}r$ and change the order of the integrals. Finally, we have

[TABLE]

Furthermore, we have $U_{\varrho}=VV^{*}$ and $Q_{\varrho}=V^{*}V$ . Now, define $S\colon L_{2}(\mu)\to L_{2}(\pi)$ and $S^{*}\colon L_{2}(\pi)\to L_{2}(\mu)$ by

[TABLE]

Note that $S^{*}$ is the adjoint operator of $S$, and that $\mathbb{E}_{\pi}=SS^{*}$ and $\mathbb{E}_{\mu}=S^{*}S$. Define $R:=V-S$ with adjoint $R^{*}=V^{*}-S^{*}$. Since also $\mathbb{E}_{\pi}=SV^{*}=VS^{*}$, we have

$$RR^{*}=(V-S)(V^{*}-S^{*})=VV^{*}-VS^{*}-SV^{*}+SS^{*}=U_{\varrho}-\mathbb{E}_{\pi}.$$

Similarly, by $\mathbb{E}_{\mu}=V^{*}S=S^{*}V$ we obtain $R^{*}R=Q_{\varrho}-\mathbb{E}_{\mu}$ . Now using the well-known fact, see e.g. [5, Proposition 2.7], that

$$\|RR^{*}\|_{L_{2}(\pi)\to L_{2}(\pi)}=\|R^{*}R\|_{L_{2}(\mu)\to L_{2}(\mu)},$$

the statement of the lemma follows by

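$$\|U_{\varrho}-\mathbb{E}_{\pi}\|_{L_{2}(\pi)\to L_{2}(\pi)}=\|RR^{*}\|_{L_{2}(\pi)\to L_{2}(\pi)}=\|R^{*}R\|_{L_{2}(\mu)\to L_{2}(\mu)}=\|Q_{\varrho}-\mathbb{E}_{\mu}\|_{L_{2}(\mu)\to L_{2}(\mu)}$$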

and the definition of the spectral gap. ∎
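The operator-norm identity at the heart of this proof can be illustrated in finite dimensions. The sketch below is an analogy only: a random rectangular matrix `R` stands in for the operator $R\colon L_{2}(\mu)\to L_{2}(\pi)$, its transpose plays the role of the adjoint $R^{*}$, and the spectral norm replaces the operator norm.

```python
import numpy as np

rng = np.random.default_rng(0)

# R maps a 3-dimensional space into a 5-dimensional one, so R R^T and R^T R
# act on different spaces, just like R R^* on L_2(pi) and R^* R on L_2(mu).
R = rng.standard_normal((5, 3))

# ord=2 gives the spectral norm (largest singular value) of a matrix.
norm_RRt = np.linalg.norm(R @ R.T, 2)
norm_RtR = np.linalg.norm(R.T @ R, 2)
norm_R_squared = np.linalg.norm(R, 2) ** 2

print(norm_RRt, norm_RtR, norm_R_squared)
```

All three printed numbers agree up to rounding, reflecting that $RR^{*}$ and $R^{*}R$ share their non-zero spectrum even though they act on different spaces.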

Remark 3.4.

Similar arguments as in the previous proof have been used in [28, Section 4.2] in a finite state space setting as well as in [10, 24, 25].

Now we argue that the transition kernel $Q_{\varrho}$ (and therefore also the Markov operator) only depends on $\varrho$ via its level-set function $\ell_{\varrho}$ .

Lemma 3.5.

For an unnormalized density $\varrho\colon G\to\mathbb{R}^{+}$ we have for any $t\in\mathbb{R}^{+}$ that

[TABLE]

where on the right-hand side we use the Lebesgue-Stieltjes integral w.r.t. $-\ell_{\varrho}$ .

Proof.

Let $g\colon(t,\ell_{\varrho}(0))\to\mathbb{R}^{+}$ with $g(r)=\lambda_{1}\left(B\cap\left[0,r\right]\right)/r$ and note that the pushforward measure $\varrho_{*}\lambda_{d}$ on $\mathbb{R}_{+}$ is defined by

[TABLE]

Hence for any $r,s\in\mathbb{R}^{+}$ with $r<s$ we have

[TABLE]

where $\ell_{\varrho}(t+)$ denotes the right limit at $t\in\mathbb{R}^{+}$ of the left-continuous level-set function. Thus, $\varrho_{*}\lambda_{d}$ is the Lebesgue-Stieltjes measure associated to the monotone non-decreasing function $-\ell_{\varrho}\colon\mathbb{R}_{+}\to(-\infty,0]$ , see, e.g., [1, Section 1.3.2], and we obtain with a change of variable, see [3, Theorem 3.6.1, p. 190], that

[TABLE]

∎

Remark 3.6.

For a given $\varrho\colon G\to\mathbb{R}^{+}$ with continuously differentiable level-set function $\ell_{\varrho}$ the previous result can be stated as

[TABLE]
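As a numerical sanity check, the sketch below compares two ways of computing $Q_{\varrho}(t,B)$ for the illustrative choice $\varrho(x)=\exp(-|x|)$ on $G=\mathbb{R}$ and $B=(0,b]$. It rests on two assumptions about the displayed formulas: that the kernel reads $Q_{\varrho}(t,B)=\frac{1}{\ell_{\varrho}(t)}\int_{G(t)}\frac{\lambda_{1}(B\cap[0,\varrho(x)])}{\varrho(x)}\,{\rm d}x$, and that the differentiable version reads $Q_{\varrho}(t,B)=-\frac{1}{\ell_{\varrho}(t)}\int_{t}^{\infty}\frac{\lambda_{1}(B\cap[0,r])}{r}\,\ell_{\varrho}^{\prime}(r)\,{\rm d}r$. For this target $\ell_{\varrho}(r)=2\log(1/r)$ on $(0,1)$. All function names are hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule on a grid x with values y."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def q_direct(t, b, n=200_001):
    """Q(t, (0, b]) by integrating over the slice G(t) = {|x| <= log(1/t)}."""
    half_width = np.log(1.0 / t)
    x = np.linspace(-half_width, half_width, n)
    rho = np.exp(-np.abs(x))
    # lambda_1((0, b] ∩ [0, rho(x)]) / rho(x) = min(b, rho(x)) / rho(x)
    integrand = np.minimum(b, rho) / rho
    return trapezoid(integrand, x) / (2.0 * half_width)

def q_level_set(t, b, n=200_001):
    """Same quantity using only ell(r) = 2 log(1/r), i.e. -ell'(r) = 2 / r on (t, 1)."""
    r = np.linspace(t, 1.0, n)
    integrand = np.minimum(b, r) / r * (2.0 / r)
    return trapezoid(integrand, r) / (2.0 * np.log(1.0 / t))

def q_exact(t, b):
    """Closed form for 0 < t < b < 1, obtained by integrating by hand."""
    return (np.log(b / t) + 1.0 - b) / np.log(1.0 / t)
```

For instance, `q_direct(0.2, 0.5)` and `q_level_set(0.2, 0.5)` both approximate `q_exact(0.2, 0.5)`, although the second function never evaluates $\varrho$ itself.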

An immediate consequence of Lemma 3.3 and Lemma 3.5 is the following important result.

Corollary 3.7.

Let $d,\widetilde{d}\in\mathbb{N}$ and $G\subseteq\mathbb{R}^{d}$ as well as $\widetilde{G}\subseteq\mathbb{R}^{\widetilde{d}}$. Further let $\varrho\colon G\to\mathbb{R}^{+}$ and $\widetilde{\varrho}\colon\widetilde{G}\to\mathbb{R}^{+}$ satisfy $\ell_{\varrho}(t)=\ell_{\widetilde{\varrho}}(t)$ for all $t\in\mathbb{R}^{+}$. Then

[TABLE]

and

[TABLE]

where $\widetilde{\pi}$ denotes the distribution induced by $\widetilde{\varrho}$ .

Thus, the above corollary tells us that the spectral gap of simple slice sampling is entirely determined by the level-set function $\ell_{\varrho}\colon\mathbb{R}^{+}\to[0,\infty)$ of the (unnormalized) target density $\varrho$ and, for instance, does not necessarily depend on the dimension of $G$. In particular, Corollary 3.7 allows us to extend the spectral gap result of Corollary 3.1 to much larger classes of target distributions, as we explain in detail in the next subsection.

3.2 Spectral gap result

Corollary 3.7 implies that the lower bound for the spectral gap of simple slice sampling of rotational invariant and log-concave (unnormalized) target densities also holds for other target densities which share the same level-set function. Thus, our idea is to identify convenient classes of target densities $\varrho\colon G\to[0,\infty)$ , with $G\subseteq\mathbb{R}^{d}$ , which possess the same level-set function as a rotational invariant and log-concave unnormalized density $\widetilde{\varrho}\colon\widetilde{G}\to[0,\infty)$ , with $\widetilde{G}\subseteq\mathbb{R}^{\widetilde{d}}$ . We illustrate this approach first by an example and formalize it rigorously afterwards.

Example 3.8.

We consider a bimodal distribution $\pi$ on the set

[TABLE]

with $m_{0}=(5,0,\ldots,0)\in\mathbb{R}^{d}$ given by the unnormalized density

[TABLE]

Notice that $\varrho$ is positive on $G$. It is worth mentioning that, particularly in such scenarios, an efficient implementation of simple slice sampling is challenging; at this point we are merely interested in theoretical properties. By construction, the level sets of $\varrho$ consist of two disjoint balls, i.e., we have

[TABLE]

This leads to

[TABLE]

In Figure 1 and Figure 2 we provide an illustration of $\varrho$ and $\ell_{\varrho}$ for $d=2$.

Straightforwardly one obtains the inverse of $\ell_{\varrho}$ given by $\ell_{\varrho}^{-1}\colon(0,\ell_{\varrho}(0))\to(0,1/2)$ with

[TABLE]

Now, for $k\in\mathbb{N}$ we can define rotational invariant unnormalized densities

[TABLE]

by

[TABLE]

which have the same level-set function as $\varrho$, i.e., $\ell_{\varrho}(t)=\ell_{\widetilde{\varrho}^{(k)}}(t)$ for all $t\in(0,1/2)$. Note that the dimension of the domain of $\widetilde{\varrho}^{(k)}$ is $k$, whereas for $\varrho$ it is $d$, and $d$ does not need to coincide with $k$. In Figure 3 and Figure 4 we display $\widetilde{\varrho}^{(k)}$ for $k=1$, $k=2$ and $d=2$. By Corollary 3.7 we can conclude that the spectral gaps of $U_{\varrho}$ and $U_{\widetilde{\varrho}^{(k)}}$ are the same. Moreover, the auxiliary densities $\widetilde{\varrho}^{(k)}$ are of the form $\widetilde{\varrho}^{(k)}(x)=\exp(-\varphi_{k}(|x|))$ on their domain, where

[TABLE]

for all $s\in\big[0,(\ell_{\varrho}(0)/\lambda_{k}(B^{(k)}_{1}))^{1/k}\big)$. Thus, for $k\geq\lceil\frac{d}{2}\rceil$ the function $\varphi_{k}$ is strictly increasing and convex, i.e., the unnormalized density $\widetilde{\varrho}^{(k)}$ satisfies the assumptions of Theorem 2.1 and Corollary 3.1, respectively. Hence, we can conclude that simple slice sampling of the bimodal target $\pi$ on $\mathbb{R}^{d}$ given by $\varrho$ has a spectral gap of at least

[TABLE]

The previous example suggests the definition of the following classes of level-set functions.

Definition 3.9.

A continuous function $\ell\colon(0,\infty)\to[0,\infty]$ belongs to the class $\Lambda_{k}$ with $k\in\mathbb{N}$ if

1. $\ell$ is strictly decreasing on its open support

[TABLE]

which implies the existence of the inverse $\ell^{-1}$ on $\ell(\operatorname{supp}\ell)=(0,\|\ell\|_{\infty})$ with

[TABLE]

2. the function $g\colon\big(0,\|\ell\|_{\infty}^{1/k}\big)\to\operatorname{supp}\ell$ given by $g(s):=\ell^{-1}(s^{k})$ is log-concave, that is, $\log g$ is concave.

The main result of this section is then as follows:

Theorem 3.10.

For an unnormalized density $\varrho\colon G\to\mathbb{R}^{+}$ assume that its level-set function $\ell_{\varrho}\in\Lambda_{k}$ for $k\in\mathbb{N}$ . Then

[TABLE]

Proof.

The idea here is to construct an unnormalized density $\widetilde{\varrho}^{(k)}\colon\mathbb{R}^{k}\to\mathbb{R}^{+}$ such that $\ell_{\varrho}=\ell_{\widetilde{\varrho}^{(k)}}$ and $\widetilde{\varrho}^{(k)}$ satisfies the assumptions of Theorem 2.1. The statement then follows by Corollary 3.1 and Corollary 3.7. To this end, we define $\widetilde{\varrho}^{(k)}\colon\mathring{B}_{R_{k}}^{(k)}\to\mathbb{R}^{+}$ with ${R_{k}:=\left(\|\ell_{\varrho}\|_{\infty}/\lambda_{k}(B_{1}^{(k)})\right)^{1/k}}$ by

[TABLE]

By construction we have for any $t\in(0,\infty)$

[TABLE]

Next, we observe that $\widetilde{\varrho}^{(k)}(x)=\exp(-\varphi_{k}\left(\left|x\right|\right))$ for $|x|<R_{k}$ with

[TABLE]

Since $\ell_{\varrho}$ belongs to $\Lambda_{k}$ , we know that $s\mapsto\log\ell_{\varrho}^{-1}\left(s^{k}\right)$ is concave. This yields the convexity of $\varphi_{k}$ on $[0,R_{k})$ . Moreover, $\ell_{\varrho}\in\Lambda_{k}$ implies that also $\ell_{\varrho}^{-1}$ is strictly decreasing on $[0,\|\ell_{\varrho}\|_{\infty})$ . Thus, the mapping $s\mapsto\log\ell_{\varrho}^{-1}\left(s^{k}\right)$ is strictly decreasing and, therefore, $\varphi_{k}$ is strictly increasing. Hence, the unnormalized density $\widetilde{\varrho}^{(k)}$ satisfies the assumptions of Theorem 2.1 which finishes the proof. ∎
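The construction in this proof can be sketched numerically. The code below assumes that the rotationally invariant density has the form $\widetilde{\varrho}^{(k)}(x)=\ell_{\varrho}^{-1}\big(\lambda_{k}(B_{1}^{(k)})\,|x|^{k}\big)$, which is consistent with $\varphi_{k}$ being built from $s\mapsto\log\ell_{\varrho}^{-1}(s^{k})$, and it uses the illustrative level-set function $\ell(t)=\log(1/t)$ on $(0,1)$ with inverse $\ell^{-1}(s)=e^{-s}$. The helper names are hypothetical.

```python
import math

def unit_ball_volume(k):
    """Lebesgue measure of the Euclidean unit ball in R^k."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)

def make_rho_tilde(ell_inv, k):
    """Radial profile r -> rho_tilde(x) for |x| = r, built from the inverse level-set function."""
    c_k = unit_ball_volume(k)
    return lambda radius: ell_inv(c_k * radius ** k)

def level_set_volume(rho_tilde, t, k, r_max=50.0, iters=200):
    """Volume of {rho_tilde >= t}: a ball whose radius is found by bisection.

    Relies on rho_tilde being decreasing in the radius.
    """
    lo, hi = 0.0, r_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rho_tilde(mid) >= t:
            lo = mid
        else:
            hi = mid
    return unit_ball_volume(k) * lo ** k

# Illustrative level-set function ell(t) = log(1/t) with inverse exp(-s).
ell = lambda t: math.log(1.0 / t)
ell_inv = lambda s: math.exp(-s)
```

For every $k$, `level_set_volume(make_rho_tilde(ell_inv, k), t, k)` reproduces `ell(t)`, so the same level-set function is realized by a rotationally invariant density in any chosen dimension $k$.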

Notice that the smaller the index $k$ of the class $\Lambda_{k}$, the larger the lower bound on the spectral gap. Subsequently, we provide some (sufficient) characterizations of the classes $\Lambda_{k}$.

3.2.1 Properties of the class $\Lambda_{k}$

The requirements of a level-set function to belong to the class $\Lambda_{k}$ are not easy to check. We provide some auxiliary tools. The following is a trivial consequence of the definition of $\Lambda_{k}$ .

Proposition 3.11.

If $\ell\in\Lambda_{k}$ for $k\in\mathbb{N}$ and $c>0$ , then $c\cdot\ell\in\Lambda_{k}$ .

We now state a sufficient condition for membership in $\Lambda_{1}$.

Proposition 3.12.

If $\ell\colon(0,\infty)\to[0,\infty)$ is strictly decreasing and concave, then $\ell\in\Lambda_{1}.$

Proof.

Since $\ell$ is strictly decreasing and concave, its inverse $\ell^{-1}$ is also concave. A positive concave function is log-concave, so $\ell^{-1}$ is log-concave and $\ell\in\Lambda_{1}$. ∎

Assuming smoothness of $\ell$ the previous result can be extended and provides a characterisation of $\Lambda_{k}$ .

Proposition 3.13.

Let $\ell\colon(0,\infty)\to[0,\infty)$ be continuously differentiable on its open support $\operatorname{supp}\ell$ with $\ell^{\prime}(t)<0$ for all $t\in\operatorname{supp}\ell$. Define the function $\psi\colon\operatorname{supp}\ell\to(-\infty,0)$ by $\psi(t):=\frac{t\ell^{\prime}(t)}{\ell(t)^{1-1/k}}$ for $k\in\mathbb{N}$. Then

[TABLE]

Proof.

The function $\ell$ is strictly decreasing on $\operatorname{supp}\ell$ , since $\ell^{\prime}(t)<0$ on that interval. This implies that the inverse $\ell^{-1}\colon[0,\|\ell\|_{\infty})\to\operatorname{supp}\ell$ exists and is strictly decreasing. Define the function $\varphi_{k}\colon[0,\|\ell\|_{\infty}^{1/k})\to\mathbb{R}$ with $\varphi_{k}(s):=-\log\ell^{-1}(s^{k})$ . Observe that $\varphi_{k}$ is strictly increasing and by the inverse mapping theorem continuously differentiable on $\operatorname{supp}\ell$ . We have

[TABLE]

Given the assumptions we have that $\ell\in\Lambda_{k}$ if and only if $\varphi_{k}$ is convex. The latter is equivalent to $\varphi_{k}^{\prime}$ being increasing. Note that for $s\in[0,\|\ell\|_{\infty}^{1/k})$

[TABLE]

Hence, with $h(t):=-\frac{\ell^{1-1/k}(t)}{t\ell^{\prime}(t)}$ we obtain $\varphi_{k}^{\prime}(s)=k\cdot h(\ell^{-1}(s^{k})),$ which leads to the fact that

[TABLE]

However, the latter is equivalent to the fact that the mapping $t\mapsto\frac{t\ell^{\prime}(t)}{\ell(t)^{1-1/k}}$ is decreasing, since $\frac{\ell(t)^{1-1/k}}{t\ell^{\prime}(t)}<0$ on $\operatorname{supp}\ell$ . ∎
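The monotonicity criterion from this proof is easy to probe numerically. The sketch below evaluates $t\mapsto\frac{t\ell^{\prime}(t)}{\ell(t)^{1-1/k}}$ on a grid for the illustrative family $\ell(t)=(\log(1/t))^{p}$ on $(0,1)$ with $p>0$, for which the derivative is available in closed form. The family, the grid and the function names are assumptions made for illustration, and "decreasing" is checked in the non-strict sense up to a small tolerance.

```python
import numpy as np

def psi(t, p, k):
    """psi(t) = t * ell'(t) / ell(t)^(1 - 1/k) for ell(t) = (log(1/t))^p on (0, 1)."""
    log_inv = -np.log(t)
    ell = log_inv ** p
    ell_prime = -p * log_inv ** (p - 1) / t
    return t * ell_prime / ell ** (1.0 - 1.0 / k)

def is_decreasing(values, tol=1e-12):
    """True if the sequence is non-increasing up to the tolerance."""
    return bool(np.all(np.diff(values) <= tol))

t_grid = np.linspace(0.01, 0.99, 2000)

# For p = 2 the criterion fails for k = 1 and holds for k = 2 and k = 3.
results = {k: is_decreasing(psi(t_grid, p=2.0, k=k)) for k in (1, 2, 3)}
print(results)
```

For this family one finds by hand that $\psi(t)=-p\,(\log(1/t))^{p/k-1}$, which is non-increasing exactly when $k\geq p$, in agreement with the grid check.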

Remark 3.14.

Roberts and Rosenthal [20] derived convergence results for simple slice sampling under the assumption that $t\mapsto t\ell^{\prime}(t)$ is decreasing, which corresponds to the sufficient condition for $\ell\in\Lambda_{1}$. More precisely, they provide a quantitative bound of $\|U_{\varrho}^{n}(x,\cdot)-\pi(\cdot)\|_{\rm tv}$ for any continuously differentiable $\ell$ as in Proposition 3.13 and write “However, it is surprising that this same bound applies to any density $\varrho$ such that $t\ell^{\prime}(t)$ is non-increasing” (for the formulas we adapted their statement to our notation; in their work our $\varrho$ is $\pi$ and our $\ell$ is denoted by $Q$). We also observe this surprising fact, but w.r.t. the spectral gap. In contrast to their result, in general, we do not require the existence of the first derivative of the level-set function. Moreover, our result for $\Lambda_{k}$ with $k>1$ has no analogue in the work of Roberts and Rosenthal. To emphasize this we consider in Section 3.2.2 an example of a level-set function which is in $\Lambda_{2}$ but not in $\Lambda_{1}$.

3.2.2 Further examples

We illustrate in two more examples the advantages of Theorem 3.10 compared to Theorem 2.1.

Example 3.15.

For $\alpha>0$ and $\gamma>0$ let $\varrho^{(d)}\colon\mathbb{R}^{d}\to\mathbb{R}^{+}$ be given by $\varrho^{(d)}(x)=\exp(-\alpha|x|^{\gamma})$ . By Proposition 3.11 it is sufficient to consider

[TABLE]

with $c_{1}=\lambda_{d}(B_{1}^{(d)})$. The function $\ell$ is strictly decreasing and $\log\ell^{-1}(s^{k})=-\alpha s^{\gamma\frac{k}{d}}$. Thus, for any $\gamma\geq 1$ and $k=d$ it is concave on $(0,\infty)$, such that for these parameters $\ell\in\Lambda_{d}$ and by Theorem 3.10

[TABLE]

However, we notice that $\log\ell^{-1}(s^{k})=-\alpha s^{\gamma k/d}$ is concave for $k\geq\lceil d/\gamma\rceil$. Otherwise, for $k<\lceil d/\gamma\rceil$ it is convex. Thus, we have that $\ell_{\varrho}\in\Lambda_{\lceil d/\gamma\rceil}$ but if $d>\gamma$, then $\ell_{\varrho}\notin\Lambda_{\lceil d/\gamma\rceil-1}$. For instance, for $\gamma=d/2$ we have that $\ell_{\varrho}\in\Lambda_{2}$ and $\ell_{\varrho}\notin\Lambda_{1}$. Hence, Theorem 3.10 tells us that for this class of target densities

[TABLE]
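The role of the exponent $\gamma k/d$ in this example can be checked with second differences on a uniform grid. The sketch below uses the expression $\log\ell^{-1}(s^{k})=-\alpha s^{\gamma k/d}$ from the text; the grid, the tolerance and the helper names are illustrative choices.

```python
import math
import numpy as np

def log_ell_inv(s, k, d, gamma, alpha=1.0):
    """s -> log ell^{-1}(s^k) = -alpha * s^(gamma * k / d)."""
    return -alpha * s ** (gamma * k / d)

def is_concave_on_grid(values, tol=1e-10):
    """Concavity test via second differences; assumes a uniform grid."""
    second_diff = values[2:] - 2.0 * values[1:-1] + values[:-2]
    return bool(np.all(second_diff <= tol))

def smallest_class_index(d, gamma):
    """Smallest k with gamma * k / d >= 1, i.e. k = ceil(d / gamma)."""
    return math.ceil(d / gamma)

s_grid = np.linspace(0.1, 5.0, 500)
d, gamma = 4, 2.0
k_min = smallest_class_index(d, gamma)
concave_at_k_min = is_concave_on_grid(log_ell_inv(s_grid, k_min, d, gamma))
concave_below = is_concave_on_grid(log_ell_inv(s_grid, k_min - 1, d, gamma))
```

With $d=4$ and $\gamma=2$ this gives `k_min == 2`, concavity at $k=2$ (where the map is linear) and failure of concavity at $k=1$, matching the case $\gamma=d/2$ discussed above.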

In the following we consider a “volcano” density.

Example 3.16.

Let $\varrho^{(d)}\colon\mathbb{R}^{d}\to\mathbb{R}^{+}$ be given by $\varrho^{(d)}(x)=e^{-|x|^{2d}+2|x|^{d}}$. In contrast to Example 3.15 here we have more than a single peak. For $d=2$ the density is plotted in Figure 5.

It is easy to see that $\ell_{\varrho^{(d)}}$ is proportional to the strictly decreasing function $\ell\colon(0,\infty)\to[0,\infty)$ given by

[TABLE]

such that by Proposition 3.11 it is sufficient to consider $\ell$ . This leads to

[TABLE]

and we have that $\log\ell^{-1}(s)$ is concave, see also Figure 6. Hence $\ell\in\Lambda_{1}$ for arbitrary $d$ and Theorem 3.10 implies

[TABLE]
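The concavity claim of this example can be verified numerically. The closed forms used below are obtained by substituting $u=|x|^{d}$, which gives $\varrho^{(d)}=\exp(1-(u-1)^{2})$, and by dividing the level-set volume by $\lambda_{d}(B_{1}^{(d)})$. They should be read as an assumption about the normalization: $\ell(t)=1+\sqrt{1-\log t}$ for $t\in(0,1)$, $\ell(t)=2\sqrt{1-\log t}$ for $t\in[1,e]$ and $\ell(t)=0$ otherwise, with $\log\ell^{-1}(s)=1-s^{2}/4$ for $s\leq 2$ and $\log\ell^{-1}(s)=1-(s-1)^{2}$ for $s>2$. The function names are hypothetical.

```python
import numpy as np

def ell(t):
    """Level-set function of the volcano density divided by the unit-ball volume."""
    t = np.asarray(t, dtype=float)
    root = np.sqrt(np.clip(1.0 - np.log(t), 0.0, None))
    # t < 1: the level set is a ball; 1 <= t <= e: it is an annulus.
    return np.where(t < 1.0, 1.0 + root, 2.0 * root)

def log_ell_inv(s):
    """Logarithm of the inverse level-set function."""
    s = np.asarray(s, dtype=float)
    return np.where(s <= 2.0, 1.0 - s**2 / 4.0, 1.0 - (s - 1.0) ** 2)

def level_set_length_1d(t, n=400_001):
    """Grid estimate of lambda_1({rho >= t}) for d = 1, rho(x) = exp(-x^2 + 2|x|)."""
    x = np.linspace(-4.0, 4.0, n)
    rho = np.exp(-x**2 + 2.0 * np.abs(x))
    return float(np.count_nonzero(rho >= t) * (x[1] - x[0]))

s_grid = np.linspace(0.01, 6.0, 1000)
values = log_ell_inv(s_grid)
second_diff = values[2:] - 2.0 * values[1:-1] + values[:-2]
is_concave = bool(np.all(second_diff <= 1e-10))
```

The flag `is_concave` comes out true, including across the kink at $s=2$ where the slope drops from $-1$ to $-2$. For $d=1$ the unit ball has length $2$, so `level_set_length_1d(t)` should be close to `2 * ell(t)`.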

Acknowledgements

Viacheslav Natarovskii thanks the DFG Research Training Group 2088 for their support. Daniel Rudolf thanks Andreas Eberle for fruitful discussions on this topic and acknowledges support of the Felix-Bernstein-Institute for Mathematical Statistics in the Biosciences (Volkswagen Foundation) and the Campus laboratory AIMS. Björn Sprungk thanks the DFG for supporting this research within project 389483880 and the Research Training Group 2088.

Bibliography

[1] Athreya, K. B. and Lahiri, S. N. (2006). Measure theory and probability theory, Springer Texts in Statistics, Springer, New York.
[2] Besag, J. and Green, P. (1993). Spatial statistics and Bayesian computation, J. Roy. Statist. Soc. Ser. B, 25–37.
[3] Bogachev, V. I. (2007). Measure theory, vol. I, Springer-Verlag Berlin Heidelberg.
[4] Chen, M. and Wang, F. (1994). Application of coupling method to the first eigenvalue on manifold, Sci. China Ser. A 37, no. 1, 1–14.
[5] Conway, J. B. (1985). A course in functional analysis, Springer Verlag, New York.
[6] Flegal, J. and Jones, G. (2010). Batch means and spectral variance estimators in Markov chain Monte Carlo, Ann. Statist. 38, no. 2, 1034–1070.
[7] Higdon, D. (1998). Auxiliary variable methods for Markov chain Monte Carlo with applications, J. Amer. Statist. Assoc. 93, no. 442, 585–595.
[8] Kipnis, C. and Varadhan, S. (1986). Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions, Communications in Mathematical Physics 104, no. 1, 1–19.
