The Riemannian barycentre as a proxy for global optimisation

Salem Said; Jonathan H. Manton

arXiv:1902.03885·math.ST·February 12, 2019·GSI

TL;DR

This paper proposes a novel approach to global optimization on Riemannian symmetric spaces by replacing the minimization of a function with the problem of finding the barycentre of a Gibbs distribution, providing theoretical guarantees and an algorithm.

Contribution

It introduces a method to use Riemannian barycentres of Gibbs distributions as proxies for global minima, with explicit temperature bounds ensuring convexity and uniqueness.

Findings

01. Strong convexity of the energy function within a certain temperature range.

02. Explicit computation of the temperature threshold $T_{\delta}$.

03. Algorithmic framework for black-box optimization on Riemannian manifolds.

Equations

\mathcal{E}(x) = \frac{1}{2}\int_{M} d^{2}(x,z)\,P(dz) \quad \text{for } x \in M

P_{T}(dz) = (Z(T))^{-1}\exp\left[-\frac{U(z)}{T}\right]\mathrm{vol}(dz); \quad T > 0

W(P_{T},\delta_{x^{*}}) < \frac{\eta^{2}}{4\,\mathrm{diam}\,M} \implies d(\bar{x}_{T},x^{*}) < \eta

W(P_{T},\delta_{x^{*}}) \leq \sqrt{2\pi}\,(\pi/2)^{n-1}\,B_{n}^{-1}\,(\mu_{\max}/\mu_{\min})^{n/2}\,(T/\mu_{\min})^{1/2}

\mathcal{E}_{T}(\gamma(t)) = \frac{1}{2}\int_{M} d^{2}(\gamma(t),z)\,P_{T}(dz) = \frac{1}{2}\int_{M-\mathrm{Cut}(\gamma)} d^{2}(\gamma(t),z)\,P_{T}(dz)

G_{x} = \int_{M-\mathrm{Cut}(x)} G_{x}(z)\,P_{T}(dz); \quad H_{x} = \int_{M-\mathrm{Cut}(x)} H_{x}(z)\,P_{T}(dz)

\nabla\mathcal{E}_{T}(x) = G_{x} \quad\text{and}\quad \nabla^{2}\mathcal{E}_{T}(x) = H_{x}

f(T) = (2/\pi)\,(\pi/8)^{n/2}\,(\mu_{\max}/T)^{n/2}\,\exp(-U_{\delta}/T)

\nabla^{2}\mathcal{E}_{T}(x) \geq \mathrm{Ct}(2\delta)\,(1-\mathrm{vol}(M)\,f(T)) - \pi A_{M}\,f(T)

\mu_{\min}\,d^{2}(x,x^{*}) \leq 2\,(U(x)-U(x^{*})) \leq \mu_{\max}\,d^{2}(x,x^{*})

f(T,m,\rho) = (2/\pi)^{1/2}\,(\mu_{\max}/T)^{m/2}\,\exp(-U_{\rho}/T)

T_{o} = \min\{T_{o}^{1},T_{o}^{2}\} \quad\text{where}

\begin{array}{ll}T^{1}_{o}=&\inf\left\{T>0\,:\,f(T,n-2,\rho)\,>\,\rho^{2-n}\,A_{n-1}\,\right\}\\[8.5359pt] T^{2}_{o}=&\inf\left\{T>0\,:\,f(T,n+1,\rho)\,>\,\left(\mu_{\max}/\mu_{\min}\right)^{n/2}\,C_{n}\right\}\end{array}

T_{\delta} = \min\{T_{\delta}^{1},T_{\delta}^{2}\} - \varepsilon

\begin{array}{ll}T^{1}_{\delta}=&\inf\left\{T\leq T_{o}\,:\,\sqrt{2\pi}\,(T/\mu_{\min})^{1/2}\,>\,\delta^{2}\,\left(\mu_{\min}/\mu_{\max}\right)^{n/2}\,D_{n}\right\}\\[8.5359pt] T^{2}_{\delta}=&\inf\left\{T\leq T_{o}\,:\,f(T)\,>\,\mathrm{Ct}(2\delta)\left(\mathrm{Ct}(2\delta)\,\mathrm{vol}(M)+\pi A_{M}\right)^{-1}\,\right\}\end{array}

\hat{x}_{n-1}\,\#_{\frac{1}{n}}\,z_{n} = \gamma(1/n)

\|P_{n}-P_{T}\|_{TV} \leq (1-p_{T})^{n}

p_{T} \leq (\mathrm{vol}\,M)\,\inf_{x,z} q(x,z)\,\exp\left(-\sup_{x} U(x)/T\right)

U(x) = -P_{9}(x^{3}) \quad\text{for } x=(x^{1},x^{2},x^{3}) \in S^{2}

f_{x}(z) = \frac{1}{2}\,d^{2}(x,z)

\mathcal{E}_{T}(x) = \int_{M} f_{x}(z)\,P_{T}(dz)

\mathcal{E}_{0}(x) = \int_{M} f_{x}(z)\,\delta_{x^{*}}(dz) = \frac{1}{2}\,d^{2}(x,x^{*})

|\mathcal{E}_{T}(x)-\mathcal{E}_{0}(x)| \leq (\mathrm{diam}\,M)\,W(P_{T},\delta_{x^{*}})

\inf_{x\in B(x^{*},\eta)}\mathcal{E}_{T}(x) - \inf_{x\in B(x^{*},\eta)}\mathcal{E}_{0}(x) \leq (\mathrm{diam}\,M)\,W(P_{T},\delta_{x^{*}}) \quad\text{and}

\inf_{x\notin B(x^{*},\eta)}\mathcal{E}_{0}(x) - \inf_{x\notin B(x^{*},\eta)}\mathcal{E}_{T}(x) \leq (\mathrm{diam}\,M)\,W(P_{T},\delta_{x^{*}}) \quad\text{and}

\inf_{x\in B(x^{*},\eta)}\mathcal{E}_{0}(x) = 0 \quad\text{and}\quad \inf_{x\notin B(x^{*},\eta)}\mathcal{E}_{0}(x) = \frac{\eta^{2}}{2}

\inf_{x\in B(x^{*},\eta)}\mathcal{E}_{T}(x) < \frac{\eta^{2}}{4} < \inf_{x\notin B(x^{*},\eta)}\mathcal{E}_{T}(x)

\mu_{\min}\,d^{2}(x,x^{*}) \leq 2\,(U(x)-U(x^{*})) \leq \mu_{\max}\,d^{2}(x,x^{*})

P_{T}^{\rho}(dz) = \frac{\mathbf{1}_{B_{\rho}}(z)}{P_{T}(B_{\rho})}\cdot P_{T}(dz)

W(P_{T},\delta_{x^{*}}) \leq W(P_{T},P_{T}^{\rho}) + W(P_{T}^{\rho},\delta_{x^{*}})

W(P_{T},P_{T}^{\rho}) \leq (\mathrm{diam}\,M\times\mathrm{vol}\,M)\,\frac{2}{\pi}\left(\frac{\pi}{8}\right)^{n/2}\left(\frac{\mu_{\max}}{T}\right)^{n/2}\exp(-U_{\rho}/T)

W (P_{T}^{ρ}, δ_{x^{*}}) \leq 2 2 π (\frac{π}{2})^{n - 1} B_{n}^{- 1} (\frac{μ _{m a x}}{μ _{m i n}})^{n /2} (\frac{T}{μ _{m i n}})^{1/2}

f(T,n+1,\rho) \leq (\mu_{\max}/\mu_{\min})^{n/2}\,C_{n}

(\mathrm{diam}\,M\times\mathrm{vol}\,M)\,f(T,n+1,\rho) \leq 2\,(2\pi)^{n/2}\,B_{n}^{-1}\,(\mu_{\max}/\mu_{\min})^{n/2}

(\mathrm{diam}\,M\times\mathrm{vol}\,M)\,\frac{1}{\pi}\left(\frac{\pi}{8}\right)^{n/2} f(T,n+1,\rho) \leq \left(\frac{\pi}{2}\right)^{n-1} B_{n}^{-1}\,(\mu_{\max}/\mu_{\min})^{n/2}

P_{T}

K(dz_{1}\times dz_{2}) = P_{T}^{\rho}(dz_{1})\left[P_{T}(B_{\rho})\,\delta_{z_{1}}(dz_{2}) + \mathbf{1}_{B_{\rho}^{c}}(z_{2})\,P_{T}(dz_{2})\right]

W(P_{T},P_{T}^{\rho}) \leq (\mathrm{diam}\,M)\,P_{T}(B_{\rho}^{c})

P_{T}(B_{\rho}^{c}) \leq (Z(T))^{-1}\,(\mathrm{vol}\,M)\,\exp(-U_{\rho}/T)

Z(T) \geq \frac{\pi}{2}\left(\frac{8}{\pi}\right)^{n/2}\left(\frac{T}{\mu_{\max}}\right)^{n/2} \quad\text{for } T \leq T_{o}^{1}

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Point processes and geometric inequalities · Geometric Analysis and Curvature Flows

Full text

Salem Said (1), Jonathan H. Manton (2)

(1) Laboratoire IMS (CNRS 5218), Université de Bordeaux. Email: [email protected]

(2) Department of Electrical and Electronic Engineering, The University of Melbourne. Email: [email protected]

Abstract

Let $M$ be a simply-connected compact Riemannian symmetric space, and $U$ a twice-differentiable function on $M$, with unique global minimum at $x^{*}\in M$. The idea of the present work is to replace the problem of searching for the global minimum of $U$ by the problem of finding the Riemannian barycentre of the Gibbs distribution $P_{\scriptscriptstyle{T}}\propto\exp(-U/T)$. In other words, instead of minimising the function $U$ itself, to minimise $\mathcal{E}_{\scriptscriptstyle{T}}(x)=\frac{1}{2}\int d^{\scriptscriptstyle 2}(x,z)P_{\scriptscriptstyle{T}}(dz)$, where $d(\cdot,\cdot)$ denotes Riemannian distance. The following original result is proved: if $U$ is invariant by geodesic symmetry about $x^{*}$, then for each $\delta<\frac{1}{2}r_{\scriptscriptstyle cx}$ ($r_{\scriptscriptstyle cx}$ the convexity radius of $M$), there exists $T_{\scriptscriptstyle\delta}$ such that $T\leq T_{\scriptscriptstyle\delta}$ implies $\mathcal{E}_{\scriptscriptstyle{T}}$ is strongly convex on the geodesic ball $B(x^{*},\delta)$, and $x^{*}$ is the unique global minimum of $\mathcal{E}_{\scriptscriptstyle{T}}$. Moreover, this $T_{\scriptscriptstyle\delta}$ can be computed explicitly. This result gives rise to a general algorithm for black-box optimisation, which is briefly described, and will be further explored in future work.

Keywords:

Riemannian barycentre $\cdot$ black-box optimisation $\cdot$ symmetric space.

It is common knowledge that the Riemannian barycentre $\bar{x}$ , of a probability distribution $P$ defined on a Riemannian manifold $M$ , may fail to be unique. However, if $P$ is supported inside a geodesic ball $B(x^{*},\delta)$ with radius $\delta<\frac{1}{2}r_{\scriptscriptstyle cx}$ ( $r_{\scriptscriptstyle cx}$ the convexity radius of $M$ ), then $\bar{x}$ is unique and also belongs to $B(x^{*},\delta)$ . In fact, Afsari has shown this to be true, even when $\delta<r_{\scriptscriptstyle cx}$ (see [1][2]).

Does this statement continue to hold, if $P$ is not supported inside $B(x^{*},\delta)$ , but merely concentrated on this ball? The answer to this question is positive, assuming that $M$ is a simply-connected compact Riemannian symmetric space, and $P=P_{\scriptscriptstyle{T}}\propto\exp(-U/T)$ , where the function $U$ has unique global minimum at $x^{*}\in M$ . This is given by Proposition 2, in Section 2 below.

Proposition 2 motivates the main idea of the present work : the Riemannian barycentre $\bar{x}_{\scriptscriptstyle T}$ of $P_{\scriptscriptstyle{T}}$ can be used as a proxy for the global minimum $x^{*}$ of $U$ . In general, $\bar{x}_{\scriptscriptstyle T}$ only provides an approximation of $x^{*}$ , but the two are equal if $U$ is invariant by geodesic symmetry about $x^{*}$ , as stated in Proposition 3, in Section 4 below.

The following Section 1 introduces Proposition 1, which estimates the Riemannian distance between $\bar{x}_{\scriptscriptstyle T}$ and $x^{*}\,$ , as a function of $T$ .

1 Concentration of the barycentre

Let $P$ be a probability distribution on a complete Riemannian manifold $M$ . A (Riemannian) barycentre of $P$ is any global minimiser $\bar{x}\in M$ of the function

$$\mathcal{E}(x)=\frac{1}{2}\int_{M}d^{\scriptscriptstyle 2}(x,z)\,P(dz)\quad\text{for } x\in M \tag{1}$$

The following statement is due to Karcher, and was improved upon by Afsari [1][2] : if $P$ is supported inside a geodesic ball $B(x^{*},\delta)$ , where $x^{*}\in M$ and $\delta<\frac{1}{2}r_{\scriptscriptstyle cx}$ ( $r_{\scriptscriptstyle cx}$ the convexity radius of $M$ ), then $\mathcal{E}$ is strongly convex on $B(x^{*},\delta)$ , and $P$ has a unique barycentre $\bar{x}\in B(x^{*},\delta)$ .

On the other hand, the present work considers a setting where $P$ is not supported inside $B(x^{*},\delta)$ , but merely concentrated on this ball. Precisely, assume $P$ is equal to the Gibbs distribution

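$$P_{\scriptscriptstyle T}(dz)=\left(Z(T)\right)^{-1}\exp\left[-\frac{U(z)}{T}\right]\mathrm{vol}(dz)\,;\quad T>0 \tag{2}$$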

where $Z(T)$ is a normalising constant, $U$ is a $C^{2}$ function with unique global minimum at $x^{*}$ , and $\mathrm{vol}$ is the Riemannian volume of $M$ . Then, let $\mathcal{E}_{\scriptscriptstyle{T}}$ denote the function $\mathcal{E}$ in (1), and let $\bar{x}_{\scriptscriptstyle{T}}$ denote any barycentre of $P_{\scriptscriptstyle{T\,}}$ .

In this new setting, it is not clear whether $\mathcal{E}_{\scriptscriptstyle{T}}$ is differentiable or not. Therefore, statements about convexity of $\mathcal{E}_{\scriptscriptstyle{T}}$ and uniqueness of $\bar{x}_{\scriptscriptstyle{T}}$ are postponed to the following Section 2. For now, it is possible to state the following Proposition 1. In this proposition, $d(\cdot,\cdot)$ denotes Riemannian distance, and $W(\cdot,\cdot)$ denotes the Kantorovich ( $L^{1}$ -Wasserstein) distance [3][4]. Moreover, $(\,\mu_{\scriptscriptstyle\min\,},\mu_{\scriptscriptstyle\max})$ is any open interval which contains the spectrum of the Hessian $\nabla^{2}U(x^{*})$ , considered as a linear mapping of the tangent space $T_{\scriptscriptstyle x^{*}}M$ .

Proposition 1

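Assume $M$ is an $n$-dimensional compact Riemannian manifold with non-negative sectional curvature. Denote by $\delta_{\scriptscriptstyle x^{*}}$ the Dirac distribution at $x^{*}$. The following hold.

(i) for any $\eta>0$,

$$W(P_{\scriptscriptstyle T},\delta_{\scriptscriptstyle x^{*}})<\frac{\eta^{2}}{4\,\mathrm{diam}\,M}\;\Longrightarrow\;d(\bar{x}_{\scriptscriptstyle T},x^{*})<\eta \tag{3}$$

(ii) for $T\leq T_{o}$ (which can be computed explicitly),

$$W(P_{\scriptscriptstyle T},\delta_{\scriptscriptstyle x^{*}})\leq\sqrt{2\pi}\,(\pi/2)^{n-1}\,B_{n}^{-1}\left(\mu_{\scriptscriptstyle\max}/\mu_{\scriptscriptstyle\min}\right)^{n/2}\left(T/\mu_{\scriptscriptstyle\min}\right)^{1/2} \tag{4}$$

where $B_{n}=B(1/2,n/2)$ in terms of the Beta function.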

Proposition 1 is motivated by the idea of using $\bar{x}_{\scriptscriptstyle T}$ as an approximation of $x^{*}$ . Intuitively, this requires choosing $T$ so small that $P_{\scriptscriptstyle T}$ is sufficiently close to $\delta_{\scriptscriptstyle x^{*}\,}$ . Just how small a $T$ may be required is indicated by the inequality in (4). This inequality is optimal and explicit, in the following sense.

It is optimal because the dependence on $T^{1/2}$ in its right-hand side cannot be improved. Indeed, by the multi-dimensional Laplace approximation (see [5], for example), the left-hand side is equivalent to $\mathrm{L}\cdot T^{1/2}$ (in the limit $T\rightarrow 0$). It is explicit because, while this constant $\mathrm{L}$ is not tractable, the constants appearing in Inequality (4) depend explicitly on the manifold $M$ and the function $U$. In fact, this inequality does not follow from the multi-dimensional Laplace approximation, but rather from volume comparison theorems of Riemannian geometry [6].

In spite of these nice properties, Inequality (4) does not escape the curse of dimensionality. Indeed, for fixed $T$, its right-hand side increases exponentially with the dimension $n$ (note that $B_{n}$ decreases like $n^{\scriptscriptstyle-1/2}$). On the other hand, although $T_{o}$ also depends on $n$, it is typically much less affected by dimensionality, and decreases more slowly than $n^{-1}$ as $n$ increases.
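The growth with dimension can be seen by evaluating the right-hand side of Inequality (4) directly. The following is a minimal sketch, not part of the original paper: the function names and the values of $T$, $\mu_{\scriptscriptstyle\min}$ and $\mu_{\scriptscriptstyle\max}$ are hypothetical, and the leading constant is taken to be $\sqrt{2\pi}$, which is the reading consistent with the definition of $T^{1}_{\scriptscriptstyle\delta}$.

```python
import math


def beta(a, b):
    """Euler Beta function B(a, b), computed from Gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def wasserstein_bound(T, n, mu_min, mu_max):
    """Right-hand side of Inequality (4) in dimension n (assumed constant sqrt(2*pi))."""
    B_n = beta(0.5, n / 2.0)
    return (math.sqrt(2.0 * math.pi)
            * (math.pi / 2.0) ** (n - 1)
            / B_n
            * (mu_max / mu_min) ** (n / 2.0)
            * math.sqrt(T / mu_min))


if __name__ == "__main__":
    # Hypothetical values: fixed temperature, Hessian spectrum inside (1, 2).
    for n in (2, 4, 8, 16):
        print(n, wasserstein_bound(T=0.01, n=n, mu_min=1.0, mu_max=2.0))
```

For fixed $T$ the printed values grow with $n$, while for fixed $n$ the bound scales like $T^{1/2}$.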

2 Convexity and uniqueness

Assume now that $M$ is a simply-connected, compact Riemannian symmetric space. In this case, for any $T$ , the function $\mathcal{E}_{\scriptscriptstyle T}$ turns out to be $C^{2}$ throughout $M$ . This results from the following lemma.

Lemma 1

Let $M$ be a simply-connected compact Riemannian symmetric space. Let $\gamma:I\rightarrow M$ be a geodesic defined on a compact interval $I$. Denote by $\mathrm{Cut}(\gamma)$ the union of all cut loci $\mathrm{Cut}(\gamma(t))$ for $t\in I$. Then, the topological dimension of $\mathrm{Cut}(\gamma)$ is strictly less than $n=\dim M$. In particular, $\mathrm{Cut}(\gamma)$ is a set with volume equal to zero.

Remark : the assumption that $M$ is simply-connected cannot be removed, as the conclusion does not hold if $M$ is a real projective space.

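The proof of Lemma 1 uses the structure of Riemannian symmetric spaces, as well as some results from topological dimension theory [7] (Chapter VII). The notion of topological dimension arises because $\mathrm{Cut}(\gamma)$ may fail to be a manifold. The lemma immediately implies, for all $t\in I$,

$$\mathcal{E}_{\scriptscriptstyle T}(\gamma(t))=\frac{1}{2}\int_{M}d^{\scriptscriptstyle 2}(\gamma(t),z)\,P_{\scriptscriptstyle T}(dz)=\frac{1}{2}\int_{M-\mathrm{Cut}(\gamma)}d^{\scriptscriptstyle 2}(\gamma(t),z)\,P_{\scriptscriptstyle T}(dz)$$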

Then, since the domain of integration avoids the cut loci of all the $\gamma(t)$ , it becomes possible to differentiate under the integral. This is used in obtaining the following (the assumptions are the same as in Lemma 1).

Corollary 1

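For $x\in M$, let $G_{x}(z)=\nabla f_{z}(x)$ and $H_{x}(z)=\nabla^{2}f_{z}(x)$, where $f_{z}$ is the function $x\mapsto\frac{1}{2}\,d^{\scriptscriptstyle 2}(x,z)$. The following integrals converge for any $T$,

$$G_{x}=\int_{M-\mathrm{Cut}(x)}G_{x}(z)\,P_{\scriptscriptstyle T}(dz)\,;\quad H_{x}=\int_{M-\mathrm{Cut}(x)}H_{x}(z)\,P_{\scriptscriptstyle T}(dz)$$

and both depend continuously on $x$. Moreover,

$$\nabla\mathcal{E}_{\scriptscriptstyle T}(x)=G_{x}\quad\text{and}\quad\nabla^{2}\mathcal{E}_{\scriptscriptstyle T}(x)=H_{x} \tag{5}$$

so that $\mathcal{E}_{\scriptscriptstyle T}$ is $C^{2}$ throughout $M$.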

With Corollary 1 at hand, it is possible to obtain Proposition 2, which is concerned with the convexity of $\mathcal{E}_{\scriptscriptstyle T}$ and uniqueness of $\bar{x}_{\scriptscriptstyle T\,}$ . In this proposition, the following notation is used

$$f(T)=(2/\pi)\,(\pi/8)^{n/2}\left(\mu_{\scriptscriptstyle\max}/T\right)^{n/2}\exp(-U_{\scriptscriptstyle\delta}/T) \tag{6}$$

where $U_{\scriptscriptstyle\delta}=\inf\{U(x)-U(x^{*})\,;\,x\notin B(x^{*},\delta)\}$ for positive $\delta$. The reader may wish to note that $f(T)$ decreases to $0$ as $T$ decreases to $0$.

Proposition 2

Let $M$ be a simply-connected compact Riemannian symmetric space. Let $\kappa^{2}$ be the maximum sectional curvature of $M$, and $r_{\scriptscriptstyle cx}=\kappa^{-1}\frac{\pi}{2}$ its convexity radius. If $T\leq T_{o}$ (see (ii) of Proposition 1), then the following hold for any $\delta<\frac{1}{2}r_{\scriptscriptstyle cx}$.

(i) for all $x$ in the geodesic ball $B(x^{*},\delta)$,

$$\nabla^{2}\mathcal{E}_{\scriptscriptstyle T}(x)\geq\mathrm{Ct}(2\delta)\left(1-\mathrm{vol}(M)\,f(T)\right)-\pi A_{M}\,f(T) \tag{7}$$

where $\mathrm{Ct}(2\delta)=2\kappa\delta\cot(2\kappa\delta)>0$ and $A_{M}>0$ is a constant given by the structure of the symmetric space $M$.

(ii) there exists $T_{\scriptscriptstyle\delta}$ (which can be computed explicitly), such that $T\leq T_{\scriptscriptstyle\delta}$ implies $\mathcal{E}_{\scriptscriptstyle{T}}$ is strongly convex on $B(x^{*},\delta)$, and has a unique global minimum $\bar{x}_{\scriptscriptstyle T}\in B(x^{*},\delta)$. In particular, this means $\bar{x}_{\scriptscriptstyle T}$ is the unique barycentre of $P_{\scriptscriptstyle T}$.

Note that (ii) of Proposition 2 generalises the statement due to Karcher [1], which was recalled in Section 1.

3 Finding $T_{o}$ and $T_{\scriptscriptstyle\delta}$

Propositions 1 and 2 claim that $T_{o}$ and $T_{\scriptscriptstyle\delta}$ can be computed explicitly. This means that, with some knowledge of the Riemannian manifold $M$ and the function $U$ , $T_{o}$ and $T_{\scriptscriptstyle\delta}$ can be found by solving scalar equations. The current section gives the definitions of $T_{o}$ and $T_{\scriptscriptstyle\delta\,}$ .

In the notation of Proposition 1, let $\rho>0$ be small enough, so that,

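$$\mu_{\scriptscriptstyle\min}\,d^{\scriptscriptstyle 2}(x,x^{*})\leq 2\left(U(x)-U(x^{*})\right)\leq\mu_{\scriptscriptstyle\max}\,d^{\scriptscriptstyle 2}(x,x^{*})$$

whenever $d(x,x^{*})\leq\rho$, and consider the quantity

$$f(T,m,\rho)=(2/\pi)^{1/2}\left(\mu_{\scriptscriptstyle\max}/T\right)^{m/2}\exp(-U_{\scriptscriptstyle\rho}/T)$$

where $U_{\scriptscriptstyle\rho}$ is defined as in (6). Note that $f(T,m,\rho)$ decreases to $0$ as $T$ decreases to $0$, for fixed $m$ and $\rho$. Now, it is possible to define $T_{o}$ as

$$T_{o}=\min\{T^{1}_{o},T^{2}_{o}\}\quad\text{where}$$

$$\begin{array}{ll}T^{1}_{o}=&\inf\left\{T>0\,:\,f(T,n-2,\rho)\,>\,\rho^{2-n}\,A_{n-1}\,\right\}\\[8.5359pt] T^{2}_{o}=&\inf\left\{T>0\,:\,f(T,n+1,\rho)\,>\,\left(\mu_{\scriptscriptstyle\max}/\mu_{\scriptscriptstyle\min}\right)^{n/2}\,C_{n}\right\}\end{array}$$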

Here, $A_{n}=E|X|^{n}$ for $X\sim N(0,1)$ , and $C_{n}=\omega_{n}\,A_{n}/\!\left(\mathrm{diam}\,M\times\mathrm{vol}\,M\right)$ , where $\omega_{n}$ is the surface area of a unit sphere $S^{n-1\,}$ .

With regard to Proposition 2, define $T_{\scriptscriptstyle\delta}$ as follows,

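$$T_{\scriptscriptstyle\delta}=\min\{T^{1}_{\scriptscriptstyle\delta},T^{2}_{\scriptscriptstyle\delta}\}-\varepsilon$$

for some arbitrary $\varepsilon>0$. Here, in the notation of (4), (6) and (7),

$$\begin{array}{ll}T^{1}_{\scriptscriptstyle\delta}=&\inf\left\{T\leq T_{o}\,:\,\sqrt{2\pi}\,(T/\mu_{\scriptscriptstyle\min})^{1/2}\,>\,\delta^{2}\,\left(\mu_{\scriptscriptstyle\min}/\mu_{\scriptscriptstyle\max}\right)^{n/2}\,D_{n}\right\}\\[8.5359pt] T^{2}_{\scriptscriptstyle\delta}=&\inf\left\{T\leq T_{o}\,:\,f(T)\,>\,\mathrm{Ct}(2\delta)\left(\mathrm{Ct}(2\delta)\,\mathrm{vol}(M)+\pi A_{M}\right)^{-1}\,\right\}\end{array}$$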

where $D_{n}=\,(2/\pi)^{n-1}\,B_{n}/(4\,\mathrm{diam}\,M)$ .
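To illustrate what "solving scalar equations" can look like in practice, the sketch below approximates a threshold of the form $\inf\{T\leq T_{o}\,:\,f(T)>c\}$ by bisection, which is the shape of the definition of $T^{2}_{\scriptscriptstyle\delta}$. It is only a sketch under stated assumptions: the function names, the level $c$ and all numerical inputs are hypothetical, and the search relies on the elementary fact that $f(T)$ in (6) is increasing for $T\leq 2U_{\scriptscriptstyle\delta}/n$ (differentiate $\log f$). The computation is done on $\log f$ to avoid overflow and underflow at small $T$.

```python
import math


def log_f(T, n, mu_max, U_delta):
    """log of f(T) = (2/pi) (pi/8)^(n/2) (mu_max/T)^(n/2) exp(-U_delta/T)."""
    return (math.log(2.0 / math.pi)
            + 0.5 * n * math.log(math.pi / 8.0)
            + 0.5 * n * math.log(mu_max / T)
            - U_delta / T)


def first_crossing(level, T_o, n, mu_max, U_delta, iters=200):
    """Approximate inf{T <= T_o : f(T) > level} by bisection.

    f is increasing on (0, 2*U_delta/n], so its maximum over (0, T_o] is
    attained at hi = min(T_o, 2*U_delta/n). If f(hi) <= level the set is
    empty and math.inf is returned.
    """
    hi = min(T_o, 2.0 * U_delta / n)
    log_level = math.log(level)
    if log_f(hi, n, mu_max, U_delta) <= log_level:
        return math.inf
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_f(mid, n, mu_max, U_delta) > log_level:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical inputs: n = 2, mu_max = 2, U_delta = 1, T_o = 0.5, level c = 0.01.
    print(first_crossing(level=0.01, T_o=0.5, n=2, mu_max=2.0, U_delta=1.0))
```

In an actual application, the level would be $\mathrm{Ct}(2\delta)\left(\mathrm{Ct}(2\delta)\,\mathrm{vol}(M)+\pi A_{M}\right)^{-1}$, which requires knowledge of $M$ that is not supplied here.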

4 Black-box optimisation

Consider the problem of searching for the unique global minimum $x^{*}$ of $U$ . In black-box optimisation, it is only possible to evaluate $U(x)$ for given $x\in M$ , and the cost of this evaluation precludes numerical approximation of derivatives. Then, the problem is to find $x^{*}$ using successive evaluations of $U(x)$ (hopefully, as few of these evaluations as possible).

Here, a new algorithm for solving this problem is described. The idea of this algorithm is to find $\bar{x}_{\scriptscriptstyle T}$ using successive evaluations of $U(x)$, in the hope that $\bar{x}_{\scriptscriptstyle T}$ will provide a good approximation of $x^{*}$. While the quality of this approximation is controlled by Inequalities (3) and (4) of Proposition 1, in some cases of interest, $\bar{x}_{\scriptscriptstyle T}$ is exactly equal to $x^{*}$, for correctly chosen $T$, as in the following Proposition 3.

To state this proposition, let $s_{\scriptscriptstyle{x^{*}}}$ denote geodesic symmetry about $x^{*}$ (see [7]). This is the transformation of $M$ , which leaves $x^{*}$ fixed, and reverses the direction of geodesics passing through $x^{*}$ .

Proposition 3

Assume that $U$ is invariant by geodesic symmetry about $x^{*}$, in the sense that $U\circ s_{\scriptscriptstyle{x^{*}}}=U$. If $T\leq T_{\scriptscriptstyle\delta}$ (see (ii) of Proposition 2), then $\bar{x}_{\scriptscriptstyle T}=x^{*}$ is the unique barycentre of $P_{\scriptscriptstyle T}$.

Proposition 3 follows rather directly from Proposition 2. Precisely, by (ii) of Proposition 2, the condition $T\leq T_{\scriptscriptstyle\delta}$ implies $\mathcal{E}_{\scriptscriptstyle{T}}$ is strongly convex on $B(x^{*},\delta)$ , and $\bar{x}_{\scriptscriptstyle T}\in B(x^{*},\delta)$ . Thus, $\bar{x}_{\scriptscriptstyle T}$ is the unique stationary point of $\mathcal{E}_{\scriptscriptstyle{T}}$ in $B(x^{*},\delta)$ . But, using the fact that $U$ is invariant by geodesic symmetry about $x^{*}\,$ , it is possible to prove that $x^{*}$ is a stationary point of $\mathcal{E}_{\scriptscriptstyle{T\,}}$ , and this implies $\bar{x}_{\scriptscriptstyle T}=x^{*}$ .

The two following examples verify the conditions of Proposition 3.

Example 1: assume $M=\mathrm{Gr}(k,\mathbb{C}^{n})$ is a complex Grassmann manifold. In particular, $M$ is a simply-connected, compact Riemannian symmetric space. Identify $M$ with the set of Hermitian projectors $x:\mathbb{C}^{n}\rightarrow\mathbb{C}^{n}$ such that $\mathrm{tr}(x)=k$, where $\mathrm{tr}$ denotes the trace. Then, define $U(x)=-\,\mathrm{tr}(C\,x)$ for $x\in\mathrm{Gr}(k,\mathbb{C}^{n})$, where $C$ is a Hermitian positive-definite matrix with distinct eigenvalues. Now, the unique global minimum of $U$ occurs at $x^{*}$, the projector onto the principal $k$-subspace of $C$. Also, the geodesic symmetry $s_{\scriptscriptstyle x^{*}}$ is given by $s_{\scriptscriptstyle x^{*}}\cdot x=r_{\scriptscriptstyle x^{*}}x\,r_{\scriptscriptstyle x^{*}}$, where $r_{\scriptscriptstyle x^{*}}:\mathbb{C}^{n}\rightarrow\mathbb{C}^{n}$ denotes reflection through the image space of $x^{*}$. It is elementary to verify that $U$ is invariant by this geodesic symmetry.

Example 2: let $M$ be a simply-connected, compact Riemannian symmetric space, and $U_{\scriptscriptstyle o}$ a function on $M$ with unique global minimum at $o\in M$. Assume moreover that $U_{\scriptscriptstyle o}$ is invariant by geodesic symmetry about $o$. For each $x^{*}\in M$, there exists an isometry $g$ of $M$, such that $x^{*}=g\cdot o$. Then, $U(x)=U_{\scriptscriptstyle o}(g^{\scriptscriptstyle{-1}}\cdot x)$ has unique global minimum at $x^{*}$, and is invariant by geodesic symmetry about $x^{*}$.

Example 1 describes the standard problem of finding the principal subspace of the covariance matrix $C$ . In Example 2, the function $U_{\scriptscriptstyle o}$ is a known template, which undergoes an unknown transformation $g$ , leading to the observed pattern $U$ . This is a typical situation in pattern recognition problems.
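The invariance claimed in Example 1 can be checked numerically on a toy instance. The sketch below is an illustration under simplifying assumptions, none of which come from the paper: it uses real $3\times 3$ matrices instead of complex ones, takes $k=1$, and picks a diagonal $C$, so that the algebra $\mathrm{tr}(C\,r\,x\,r)=\mathrm{tr}(C\,x)$ is easy to follow. Here the reflection through the image of $x^{*}$ is written as $r=2x^{*}-I$, and all helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def projector(v):
    """Rank-one orthogonal projector onto the line spanned by v."""
    norm2 = sum(c * c for c in v)
    return [[vi * vj / norm2 for vj in v] for vi in v]


def U(C, x):
    """Cost U(x) = -tr(C x)."""
    return -trace(matmul(C, x))


# Hypothetical data: C is diagonal with distinct positive eigenvalues,
# so its principal 1-subspace is spanned by the first basis vector.
C = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
x_star = projector([1.0, 0.0, 0.0])

# Reflection through the image of x_star: identity on it, minus identity on its complement.
r = [[2.0 * x_star[i][j] - (1.0 if i == j else 0.0) for j in range(3)]
     for i in range(3)]


def symmetry(x):
    """Geodesic symmetry about x_star, acting on projectors by x -> r x r."""
    return matmul(matmul(r, x), r)


if __name__ == "__main__":
    x = projector([0.3, -1.2, 0.7])
    print(U(C, x), U(C, symmetry(x)))  # the two values should agree
```

The two printed values agree up to rounding because $r$ commutes with $C$ whenever the image of $x^{*}$ is spanned by eigenvectors of $C$.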

Of course, from a mathematical point of view, Example 2 is not really an example, since it describes the completely general setting where the conditions of Proposition 3 are verified. In this setting, consider the following algorithm.

Description of the algorithm :

– input : $T\leq T_{\scriptscriptstyle\delta}$ % to find such $T$ , see Section 3

$Q(x,dz)=q(x,z)\mathrm{vol}(dz)$ % symmetric Markov kernel

$\hat{x}_{\scriptscriptstyle 0}=z_{\scriptscriptstyle 0}\in M$ % initial guess for $x^{*}$

– iterate : for $n=1,2,\ldots$

(1) sample $z_{\scriptscriptstyle n}\sim q(z_{\scriptscriptstyle n-1},z)$

(2) compute $r_{\scriptscriptstyle n}=1-\min\left\{1,\exp\left[\left(U(z_{\scriptscriptstyle n-1})-U(z_{\scriptscriptstyle n})\right)\middle/T\right]\right\}$

(3) reject $z_{\scriptscriptstyle n}$ with probability $r_{\scriptscriptstyle n}$ % then, $z_{\scriptscriptstyle n}=z_{\scriptscriptstyle n-1}$

(4) $\hat{x}_{\scriptscriptstyle n}=\hat{x}_{\scriptscriptstyle n-1}\,\#_{\scriptscriptstyle\frac{1}{n}}\,z_{\scriptscriptstyle n}$ % see definition (10) below

– until : $\hat{x}_{\scriptscriptstyle n}$ no longer changes appreciably

– output : $\hat{x}_{\scriptscriptstyle n}$ % approximation of $x^{*}$

The above algorithm recursively computes the Riemannian barycentre $\hat{x}_{\scriptscriptstyle n}$ of the samples $z_{\scriptscriptstyle n}$ generated by a symmetric Metropolis-Hastings algorithm (see [8]). Here, the Metropolis-Hastings algorithm is implemented in lines (1)--(3). On the other hand, line (4) takes care of the Riemannian barycentre. Precisely, if $\gamma:[0,1]\rightarrow M$ is a length-minimising geodesic connecting $\hat{x}_{\scriptscriptstyle n-1}$ to $z_{\scriptscriptstyle n}$, let

$$\hat{x}_{\scriptscriptstyle n-1}\,\#_{\scriptscriptstyle\frac{1}{n}}\,z_{\scriptscriptstyle n}=\gamma(1/n) \tag{10}$$

This geodesic $\gamma$ need not be unique.
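As a sanity check on the weight $1/n$ in line (4), consider the flat case $M=\mathbb{R}^{d}$. This is an illustration only, since $\mathbb{R}^{d}$ is not compact and lies outside the setting of the present work. There, the geodesic from $\hat{x}_{\scriptscriptstyle n-1}$ to $z_{\scriptscriptstyle n}$ is a straight line segment, and the update reduces to the usual running mean,

$$\hat{x}_{\scriptscriptstyle n}=\gamma(1/n)=\left(1-\frac{1}{n}\right)\hat{x}_{\scriptscriptstyle n-1}+\frac{1}{n}\,z_{\scriptscriptstyle n}\qquad\text{so that}\qquad\hat{x}_{\scriptscriptstyle n}=\frac{1}{n}\sum_{k=1}^{n}z_{\scriptscriptstyle k}$$

The second identity follows by induction on $n$, starting from $\hat{x}_{\scriptscriptstyle 1}=z_{\scriptscriptstyle 1}$. On a curved manifold, line (4) plays the role of this running-mean update.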

The point of using the Metropolis-Hastings algorithm is that the generated $z_{\scriptscriptstyle n}$ eventually sample from the Gibbs distribution $P_{\scriptscriptstyle T}$. The convergence of the distribution $P_{\scriptscriptstyle n}$ of $z_{\scriptscriptstyle n}$ to $P_{\scriptscriptstyle T}$ takes place exponentially fast. Indeed, it may be inferred from [8] (see Theorem 8, Page 36) that

$$\|P_{\scriptscriptstyle n}-P_{\scriptscriptstyle T}\|_{\scriptscriptstyle TV}\leq(1-p_{\scriptscriptstyle T})^{n}$$

where $\|\cdot\|_{\scriptscriptstyle TV}$ is the total variation norm, and $p_{\scriptscriptstyle T}\in(0,1)$ verifies

$$p_{\scriptscriptstyle T}\leq(\mathrm{vol}\,M)\,\inf_{x,z}q(x,z)\,\exp\left(-\sup_{x}U(x)/T\right)$$

so the rate of convergence is degraded when $T$ is small.
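To get a feeling for how quickly the rate degrades, the sketch below evaluates the right-hand side of the bound on $p_{\scriptscriptstyle T}$ and the number of iterations $n$ needed for $(1-p_{\scriptscriptstyle T})^{n}$ to fall below a tolerance. The function names and all numerical inputs (volume, lower bound on $q$, supremum of $U$, tolerance) are hypothetical, and $p_{\scriptscriptstyle T}$ is simply taken equal to its upper bound, assumed to lie in $(0,1)$.

```python
import math


def p_T_upper_bound(vol_M, q_inf, U_sup, T):
    """Right-hand side of the bound on p_T: vol(M) * inf q * exp(-sup U / T)."""
    return vol_M * q_inf * math.exp(-U_sup / T)


def iterations_for_tv(p_T, tol):
    """Smallest n with (1 - p_T)**n <= tol, for 0 < p_T < 1 and 0 < tol < 1."""
    return math.ceil(math.log(tol) / math.log1p(-p_T))


if __name__ == "__main__":
    # Hypothetical inputs: vol(S^2) = 4*pi, inf q = 0.01, sup U = 2.
    for T in (1.0, 0.5, 0.2):
        p = p_T_upper_bound(4.0 * math.pi, 0.01, 2.0, T)
        print(T, p, iterations_for_tv(p, 1e-3))
```

Since the bound on $p_{\scriptscriptstyle T}$ decays like $\exp(-\sup_{x}U(x)/T)$, the iteration count obtained this way grows exponentially in $1/T$.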

Accordingly, the intuitive justification of the above algorithm is the following. Since the $z_{\scriptscriptstyle n}$ eventually sample from the Gibbs distribution $P_{\scriptscriptstyle T\,}$ , and the desired global minimum $x^{*}$ of $U$ is equal to the barycentre $\bar{x}_{\scriptscriptstyle T}$ of $P_{\scriptscriptstyle T\,}$ (by Proposition 3), then the barycentre $\hat{x}_{\scriptscriptstyle n}$ of the $z_{\scriptscriptstyle n}$ is expected to converge to $x^{*}$ .

It should be emphasised that, in the present state of the literature, there is no rigorous result which confirms this convergence $\hat{x}_{\scriptscriptstyle n}\rightarrow x^{*}$. It is therefore an open problem, to be confronted in future work.

For a basic computer experiment, consider $M=S^{2}\subset\mathbb{R}^{3},$ and let

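$$U(x)=-P_{\scriptscriptstyle 9}(x^{\scriptscriptstyle 3})\quad\text{for } x=(x^{\scriptscriptstyle 1},x^{\scriptscriptstyle 2},x^{\scriptscriptstyle 3})\in S^{2}$$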

where $P_{\scriptscriptstyle 9}$ is the Legendre polynomial of degree $9$ [9]. The unique global minimiser of $U$ is $x^{*}=(0,0,1)$ , and the conditions of Proposition 3 are verified, since $U$ is invariant by reflection in the $x^{\scriptscriptstyle 3}$ axis, which is geodesic symmetry about $x^{*}$ .

Figure 1 shows the dependence of $U(x)$ on $x^{\scriptscriptstyle 3}$, displaying multiple local minima and maxima. Figure 2 shows the algorithm overcoming these local minima and maxima, and converging to the global minimum $x^{*}=(0,0,1)$, within $n=5000$ iterations. The experiment was conducted with $T=0.2$, and the Markov kernel $Q$ obtained from the von Mises-Fisher distribution (see [10]). The initial guess $\hat{x}_{\scriptscriptstyle 0}=(0,0,-1)$ is not shown in Figure 2.
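A minimal sketch of such an experiment is given below. It follows lines (1)--(4) of the algorithm on $S^{2}$ with $U(x)=-P_{\scriptscriptstyle 9}(x^{\scriptscriptstyle 3})$, but it is not the authors' code and makes several assumptions: the proposal is an isotropic Gaussian perturbation followed by renormalisation (swapped in for the von Mises-Fisher kernel, and symmetric because its density depends only on the angle between the current point and the candidate), the step size `sigma` and the random seed are hypothetical, and the stopping rule is replaced by a fixed number of iterations.

```python
import math
import random


def legendre(degree, t):
    """Legendre polynomial P_degree(t), by Bonnet's three-term recursion."""
    p_prev, p = 1.0, t
    if degree == 0:
        return p_prev
    for k in range(1, degree):
        p_prev, p = p, ((2 * k + 1) * t * p - k * p_prev) / (k + 1)
    return p


def U(x):
    """Cost from the experiment: U(x) = -P_9(x^3) on the unit sphere."""
    return -legendre(9, x[2])


def normalise(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def propose(z, sigma, rng):
    """Symmetric proposal (an assumption, not the von Mises-Fisher kernel)."""
    return normalise(tuple(c + sigma * rng.gauss(0.0, 1.0) for c in z))


def geodesic_step(x, z, t):
    """Point gamma(t) on a minimising great-circle arc from x to z, as in line (4)."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x, z))))
    theta = math.acos(dot)
    if theta < 1e-12:
        return tuple(x)
    if math.pi - theta < 1e-6:
        # (Nearly) antipodal pair: the geodesic is not unique, so pick one direction.
        axis = (1.0, 0.0, 0.0) if abs(x[0]) < 0.9 else (0.0, 1.0, 0.0)
        d = sum(a * b for a, b in zip(axis, x))
        u = normalise(tuple(a - d * b for a, b in zip(axis, x)))
        return normalise(tuple(math.cos(t * theta) * a + math.sin(t * theta) * b
                               for a, b in zip(x, u)))
    # Spherical interpolation; the common factor 1/sin(theta) is absorbed by normalise.
    return normalise(tuple(math.sin((1.0 - t) * theta) * a + math.sin(t * theta) * b
                           for a, b in zip(x, z)))


def barycentre_search(T, n_iter, sigma, seed):
    """Metropolis-Hastings samples z_n with a recursive barycentre estimate x_hat."""
    rng = random.Random(seed)
    z = (0.0, 0.0, -1.0)          # initial guess, as in the experiment
    x_hat = z
    for n in range(1, n_iter + 1):
        candidate = propose(z, sigma, rng)                            # line (1)
        accept = math.exp(min(0.0, (U(z) - U(candidate)) / T))        # 1 - r_n, line (2)
        if rng.random() < accept:                                     # line (3)
            z = candidate
        x_hat = geodesic_step(x_hat, z, 1.0 / n)                      # line (4)
    return x_hat


if __name__ == "__main__":
    print(barycentre_search(T=0.2, n_iter=5000, sigma=0.5, seed=0))
```

No output is quoted here, since the result depends on the proposal, the step size and the seed, and since, as noted above, convergence of $\hat{x}_{\scriptscriptstyle n}$ is not guaranteed by any rigorous result.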

In comparison, a standard simulated annealing method offered less robust performance, which varied considerably with the choice of annealing schedule.

5 Proofs

This section is devoted to the proofs of the results stated in previous sections.

From now on, assume that $U(x^{*})=0$. There is no loss of generality in making this assumption.

5.1 Proof of Proposition 1

[TABLE]

Proof of (ii) : let $\rho\leq\min\{\mathrm{inj}\,x^{*},\,\kappa^{-1}\,\frac{\pi}{2}\}$ where $\mathrm{inj}\,x^{*}$ is the injectivity radius of $M$ at $x^{*}$ , and $\kappa^{2}$ is an upper bound on the sectional curvature of $M$ . Assume, in addition, $\rho$ is small enough so

[TABLE]

Proof of second estimate : the Kantorovich distance between $P^{\rho}_{\scriptscriptstyle T}$ and the Dirac distribution $\delta_{\scriptscriptstyle x^{*}}$ is equal to the expectation of the distance to $x^{*}$ , with respect to $P^{\rho}_{\scriptscriptstyle T}$ [4]. Precisely,

[TABLE]

6 Proof of Lemma 1

Denote $G$ the connected component at identity of the group of isometries of $M$ . It will be assumed that $G$ is simply-connected and semisimple [7]. Any geodesic $\gamma:I\rightarrow M$ is of the form [7][11],

[TABLE]

In order to describe the set $\mathrm{Cut}(x)$ , denote $K$ the isotropy group of $x$ in $G$ , and $\mathfrak{k}$ the Lie algebra of $K$ . Let $\mathfrak{g}=\mathfrak{k}+\mathfrak{p}$ be an orthogonal decomposition, with respect to the Killing form of $\mathfrak{g}$ , and let $\mathfrak{a}$ be a maximal Abelian subspace of $\mathfrak{p}$ . Define $\mathcal{S}=K/C_{\mathfrak{a}}$ ( $C_{\mathfrak{a}}$ the centraliser of $\mathfrak{a}$ in $K$ ), and consider the mapping

[TABLE]

The set $\mathrm{Cut}(x)$ is the image under $\phi$ of a certain set $\mathcal{S}\times\partial Q\,$ , which is now described, following [7][12].

Let $\Delta_{+}$ be the set of positive restricted roots associated to the pair $(G,K)$ , (each $\lambda\in\Delta_{+}$ is a linear form $\lambda:\mathfrak{a}\rightarrow\mathbb{R}$ ). Then, let $Q$ be the set of $a\in\mathfrak{a}$ such that $\left|\lambda(a)\right|\leq\pi$ for all $\lambda\in\Delta_{+}\,$ , and $\partial Q$ the boundary of $Q$ . Then

[TABLE]

Recapitulating (17b) and (17d),

[TABLE]

Lemma 1 states that the topological dimension of $\mathrm{Cut}(\gamma)$ is strictly less than $\dim\,M$ . This is proved using results from topological dimension theory [7][13].

Note that both $I$ and $\mathcal{S}$ are compact. Indeed, $\mathcal{S}$ is compact since it is the continuous image of the compact group $K$ under the projection $K\rightarrow K/C_{\mathfrak{a}}$ . Also, $\partial Q$ is compact in $\mathfrak{a}$ , and $\partial Q=\cup_{\lambda}\,\partial Q_{\lambda}$ where $\partial Q_{\lambda}=\partial Q\cap\{\lambda(a)=\pm\,\pi\}$ for $\lambda\in\Delta_{+}\,$ . Since $\{\lambda(a)=\pm\,\pi\}$ is the union of two (closed) hyperplanes in $\mathfrak{a}$ , $\partial Q_{\lambda}$ is compact. Now, each $I\times\mathcal{S}\times\partial Q_{\lambda}$ is compact, and therefore closed. It follows from (17e) that (see [13], Page 30),

[TABLE]

But, for each $\lambda$ ,

[TABLE]

where $\mathcal{S}_{\lambda}=K/C_{\lambda}$ ( $C_{\lambda}$ the centraliser of $\{\lambda(a)=\pm\,\pi\}$ in $K$ ). The above inclusion implies (by [13], Page 26),

[TABLE]

To conclude, note that the set $\mathbb{R}\times\mathcal{S}_{\lambda}\times\{\lambda(a)=\pm\,\pi\}$ is a differentiable manifold. It follows that (see [7], Page 345),

[TABLE]

The right-hand side of this inequality is

[TABLE]

since the dimension of a hyperplane in $\mathfrak{a}$ is $\dim\,\mathfrak{a}-1$ . In addition, according to [7] (Page 296), $\dim\,\mathcal{S}_{\lambda}<\dim\,\mathcal{S}$ . Thus,

[TABLE]

since $\dim\,M=\dim\,\mathcal{S}+\dim\,\mathfrak{a}$ [7]. Replacing this into (17h), it follows from (17f) and (17g) that $\dim\,\mathrm{Cut}(\gamma)<\dim\,M$ , as required. $\blacksquare$

7 Proof of Corollary 1

The corollary can be split into the two following claims, which will be proved separately.

First claim : both integrals $G_{x}$ and $H_{x}$ converge for any value of $T$ .

Second claim : $\mathcal{E}_{\scriptscriptstyle T}$ is $C^{2}$ throughout $M$ , with derivatives given by (5).

The fact that $G_{x}$ and $H_{x}$ depend continuously on $x$ is contained in the second claim, since (5) states that $G_{x}$ and $H_{x}$ are the gradient and Hessian of $\mathcal{E}_{\scriptscriptstyle T}$ at $x$ .

In the following proofs, the notation $\mathrm{D}(x)=M-\mathrm{Cut}(x)$ will be used, in order to avoid cumbersome expressions.

Proof of first claim : The convergence of the integral $G_{x}$ is straightforward, since the integrand $G_{x}(z)$ is a smooth and bounded function, from $\mathrm{D}(x)$ to $T_{x}M$ . This is because, by definition, $G_{x}(z)$ is given by

[TABLE]

The convergence of the integral $H_{x}$ is more difficult. While the integrand $H_{x}(z)$ is smooth on $\mathrm{D}(x)$ , it is not bounded. It will be seen that $H_{x}$ is an absolutely convergent improper integral.

Recall the mapping $\phi$ defined in (17c). Let $D_{+}$ be the set of points $a\in\mathfrak{a}$ which belong to the interior of $Q$ , and which verify $\lambda(a)\geq 0$ for each $\lambda\in\Delta_{+}\,$ . Let ${D}^{\scriptscriptstyle o}_{+}$ be the interior of $D_{+}\,$ . Then, $\phi$ maps $\mathcal{S}\times D_{+}$ onto $\mathrm{D}(x)$ , and is a diffeomorphism of $\mathcal{S}\times{D}^{\scriptscriptstyle o}_{+}$ onto its image in $\mathrm{D}(x)$ [7][12] (see Chapter VII in [7]). Using Sard’s theorem [14], it follows from the definition of $H_{x}$ that

[TABLE]

where $p_{\scriptscriptstyle T}$ denotes the density of $P_{\scriptscriptstyle T}$ with respect to the Riemannian volume of $M$ , and $J(a)$ is the Jacobian determinant of $\phi\,$ , given by [7]

[TABLE]

with $m_{\lambda}$ the multiplicity of the restricted root $\lambda$ , and where $\omega(ds)$ is the invariant Riemannian volume induced on $\mathcal{S}$ from $K$ .

Now, $H_{x}(\phi(s,a))$ can be expressed as follows ( $\cot$ is the cotangent function)

[TABLE]

where $\Pi_{0}(s)$ and the $\Pi_{\lambda}(s)$ denote orthogonal projectors, onto the respective eigenspaces of $H_{x}(\phi(s,a))$ .

According to this expression, $H_{x}(\phi(s,a))$ diverges to $-\infty$ whenever $\lambda(a)=\pi$ . However, the product

[TABLE]

which appears under the integral in (19a), is clearly continuous and bounded on the domain of integration. Thus, the absolute convergence of the integral $H_{x}$ follows immediately from (19a).
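The boundedness claim can be checked by hand in the simplest case. The sketch below is an illustration only and assumes the unit sphere $S^{n}$, which is not singled out in the text. On that sphere, the Hessian of $\frac{1}{2}d^{2}(x,z)$ has the transverse eigenvalue $r\cot r$ with $r=d(x,z)$, the volume density in geodesic polar coordinates is $\sin^{n-1}r$, and the cut locus is reached at $r=\pi$. The function names are hypothetical.

```python
import numpy as np

def hessian_eigenvalue(r):
    """Transverse eigenvalue r*cot(r) of the Hessian of d^2/2 on the unit sphere."""
    return r * np.cos(r) / np.sin(r)

def weighted_eigenvalue(r, n):
    """Eigenvalue times the volume density sin(r)^(n-1).

    Written in the simplified form r*cos(r)*sin(r)^(n-2), which has no
    division by sin(r) and is therefore finite up to r = pi when n >= 2.
    """
    return r * np.cos(r) * np.sin(r) ** (n - 2)

# Radii approaching the cut locus r = pi from below.
r = np.pi - np.logspace(-1, -8, 8)

unweighted = hessian_eigenvalue(r)        # diverges to -infinity
weighted = weighted_eigenvalue(r, n=2)    # stays bounded by pi in absolute value

print(unweighted[-1])
print(np.abs(weighted).max())
```

The divergence of $r\cot r$ is of order $(\pi-r)^{-1}$, and it is cancelled by the factor $\sin r$ contributed by the volume density, which is the mechanism the proof relies on.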

It now remains to provide a proof of (19c). Only a brief indication is given here. Expression (19c) is a slight improvement of the one in [15] (see Theorem IV.1, Page 636), where it is enough to note that if $R$ is the curvature tensor of $M$ , then the operator $R_{v}(u)=R(v,u)v$ has the eigenvalues $0$ and $(\lambda(a))^{2}$ for each $\lambda\in\Delta_{+\,}$ , whenever $v,u\in T_{x}M\simeq\mathfrak{p}$ with $v=\mathrm{Ad}(s)\,a$ [7][12]. It is well-known, by properties of the Jacobi equation [6], that $H_{x}(\phi(s,a))$ has the same eigenspace decomposition as $R_{v}$ , in this case. $\blacksquare$

Proof of second claim : the proof of this claim relies in a crucial way on Lemma 1. To compute the gradient and Hessian of the function $\mathcal{E}_{\scriptscriptstyle T}$ at $x\in M$ , consider any geodesic $\gamma:I\rightarrow M$ , defined on a compact interval $I=[-\tau,\tau]$ , such that $\gamma(0)=x$ . For each $t\in I$ , by definition of the function $\mathcal{E}_{\scriptscriptstyle T\,}$ ,

[TABLE]

Then, consider the integral $H_{x}=H_{\gamma(0)}$ , and recall Formulae (19a) and (19c). Each $z\in\mathrm{D}(\gamma)$ can be written under the form $z=\phi(s,a)$ where $(s,a)\in\mathcal{S}\times D_{+\,}$ . Accordingly, it follows from (19c) that

[TABLE]

where $\|\cdot\|_{F}$ is the Frobenius norm with respect to the Riemannian metric of $M$ , and $\kappa\in\Delta_{+}$ is the highest restricted root [7] ( $\kappa(a)\geq\lambda(a)$ for $\lambda\in\Delta_{+\,}$ , $a\in D_{+}$ ).

The required uniform integrability is equivalent to the statement that

[TABLE]

where the rate of convergence to this limit does not depend on $x$ . But, according to (20d), if $K>1$ , there exists $\epsilon>0$ such that

[TABLE]

and $\epsilon\rightarrow 0$ as $K\rightarrow\infty$ . In this case, the integral in (20e) is less than

[TABLE]

Now, using the same integral formula as in (19a), this last integral is equal to

[TABLE]

In view of (19b), since $\kappa\in\Delta_{+\,}$ , the function in square brackets is bounded on the closure of $D_{+}$ . In fact [7], its supremum is $\kappa^{2}=(\kappa,\kappa)$ where $(\cdot,\cdot)$ is the scalar product induced on $\mathfrak{a}^{*}$ (the dual space of $\mathfrak{a})$ by the Killing form of $\mathfrak{g}$ . Finally, by (20f), the integral in (20e) is less than

[TABLE]

Since $\kappa(a)\in[0,\pi)$ for $a\in D_{+}\,$ , this last integral converges to $0$ as $\epsilon\rightarrow 0$ , at a rate which does not depend on $x$ . This proves the required uniform integrability, so the proof is now complete. $\blacksquare$

8 Proof of Proposition 2

[TABLE]

Remark : in the statement of Proposition 2, the notation $\kappa^{2}$ is used for the maximum sectional curvature of $M$ . In the previous proof of Corollary 1, the same notation $\kappa^{2}$ was used for the squared norm of the highest restricted root. This is not an abuse of notation, since the two quantities are in fact equal [7] (see Page 334).

Proof of (i) : let $x\in B(x^{*},\delta)$ . By (5) of Corollary 1, $\nabla^{2}\mathcal{E}_{\scriptscriptstyle T}(x)$ is equal to $H_{x}\,$ . To obtain (7), decompose $H_{x}$ into two integrals

[TABLE]

This is possible since $B(x,r_{cx})\subset\mathrm{D}(x)$ , where $\mathrm{D}(x)=M-\mathrm{Cut}(x)$ . The first integral in (21a) will be denoted $I_{\scriptscriptstyle 1\,}$ , and the second integral $I_{\scriptscriptstyle 2\,}$ .

With regard to $I_{\scriptscriptstyle 1\,}$ , note the inclusions $B(x^{*},\delta)\subset B(x,2\delta)\subset B(x,r_{cx})$ , which follow from the triangle inequality. In addition, note that $H_{x}(z)\geq 0$ (in the Loewner order [16]), for $z\in B(x,r_{cx})$ . Therefore,

[TABLE]

However, from (19c) and the definition of $\kappa\in\Delta_{+}\,$ ,

[TABLE]

for $z=\phi(s,a)\in\mathrm{D}(x)$ . Using the Cauchy-Schwarz inequality, $\kappa(a)\leq\kappa\,\|a\|$ . Moreover, (17c) implies $\|a\|=d(x,z)$ , since $\mathrm{Ad}(s)$ is an isometry. Accordingly, if $z\in B(x,2\delta)$ , it follows from (21c) that

[TABLE]

where the last inequality is because $2\delta<r_{cx}=\kappa^{-1}\frac{\pi}{2\,}$ . Replacing in (21b) gives
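The positivity used in this step can be checked numerically. The sketch below assumes that the lower bound in (21d) has the form $2\kappa\delta\cot(2\kappa\delta)$ ; this is a reading of the surrounding text, since the displayed inequality is not available here, and the names `ct`, `kappa` and `delta` are illustrative. The only facts relied upon are that $t\mapsto t\cot t$ is decreasing on $(0,\pi)$ and positive on $(0,\frac{\pi}{2})$ .

```python
import numpy as np

def ct(r, kappa=1.0):
    """kappa*r*cot(kappa*r), the assumed form of the Hessian lower bound
    at distance r, for maximum sectional curvature kappa^2."""
    t = kappa * r
    return t * np.cos(t) / np.sin(t)

kappa = 2.0                        # illustrative value of kappa
r_cx = np.pi / (2.0 * kappa)       # convexity radius, r_cx = pi / (2 kappa)
delta = 0.4 * r_cx                 # any delta < r_cx / 2 will do

# Distances d(x, z) for z in the ball B(x, 2 delta).
radii = np.linspace(1e-6, 2.0 * delta, 200)
values = ct(radii, kappa)

# Bound evaluated at the largest admissible distance.
lower_bound = ct(2.0 * delta, kappa)
print(lower_bound, values.min())
```

Because $2\delta<r_{cx}$ forces $2\kappa\delta<\frac{\pi}{2}$ , the cotangent stays positive, and monotonicity makes the value at the boundary radius $2\delta$ the smallest one over the whole ball.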

[TABLE]

Finally, (15c) and (15d) imply that $P_{\scriptscriptstyle T}(B^{c}(x^{*},\delta))\leq\mathrm{vol}(M)\,f(T)$ , where $f(T)$ was defined in (6). Precisely, this follows after replacing $\rho$ by $\delta$ in (15c). Thus,

[TABLE]

The proof of (7) will be completed by showing

[TABLE]

Proof of (ii) : fix $\delta<\frac{1}{2}r_{cx\,}$ , and let $T_{\scriptscriptstyle\delta}$ be given by (9). If $T\leq T_{\scriptscriptstyle\delta\,}$ , then $T<T^{2}_{\scriptscriptstyle\delta\,}$ , so the definition of $T^{2}_{\scriptscriptstyle\delta}$ implies

[TABLE]

By (i) of Proposition 1, to prove that $\bar{x}_{\scriptscriptstyle T}\in B(x^{*},\delta)$ , it is enough to prove

[TABLE]

However, if $T\leq T_{\scriptscriptstyle\delta\,}$ , then $T<T_{o}\,$ . Therefore, by (ii) of Proposition 1, $W(P_{\scriptscriptstyle{T\,}},\delta_{\scriptscriptstyle x^{*}})$ satisfies inequality (4). Furthermore, because $T<T^{1}_{\scriptscriptstyle\delta\,}$ , it follows from the definition of $T^{1}_{\scriptscriptstyle\delta}$ that

[TABLE]

or, by replacing the expression of $D_{n}\,$ , and simplifying

[TABLE]

Thus, (23c) follows from (4) and (23d). This proves that $\bar{x}_{\scriptscriptstyle T}$ belongs to $B(x^{*},\delta)$ , and therefore $\bar{x}_{\scriptscriptstyle T}$ is the unique global minimum of $\mathcal{E}_{\scriptscriptstyle T\,}$ . But this is equivalent to saying that $\bar{x}_{\scriptscriptstyle T}$ is the unique barycentre of $P_{\scriptscriptstyle T\,}$ . $\blacksquare$

9 Proof of Proposition 3

Fix $\delta<\frac{1}{2}r_{cx\,}$ , and let $T_{\scriptscriptstyle\delta}$ be given by (9). By (ii) of Proposition 2, if $T\leq T_{\scriptscriptstyle\delta\,}$ , then $\mathcal{E}_{\scriptscriptstyle T}$ is strictly convex on $B(x^{*},\delta)$ , with unique global minimum $\bar{x}_{\scriptscriptstyle T}\in B(x^{*},\delta)\,$ . By definition, this unique global minimum $\bar{x}_{\scriptscriptstyle T}$ is the unique barycentre of $P_{\scriptscriptstyle T\,}$ .

Accordingly, to prove that $\bar{x}_{\scriptscriptstyle T}=x^{*}$ , it is enough to prove that $x^{*}$ is a stationary point of $\mathcal{E}_{\scriptscriptstyle T\,}$ . Indeed, as $\mathcal{E}_{\scriptscriptstyle T}$ is strictly convex on $B(x^{*},\delta)$ , it can have only one stationary point in $B(x^{*},\delta)\,$ . This stationary point is then identical to $\bar{x}_{\scriptscriptstyle T\,}$ .

The fact that $x^{*}$ is a stationary point of $\mathcal{E}_{\scriptscriptstyle T}$ will follow because $U$ is invariant by geodesic symmetry about $x^{*}$ . This invariance will be seen to imply

[TABLE]

To obtain (24a), it is possible to write, from the definition of $G_{x^{*}}$ ,

[TABLE]

where $\mathrm{D}(x)=M-\mathrm{Cut}(x)$ . From (18), since $s_{\scriptscriptstyle x^{*}}$ is an isometry, and reverses geodesics passing through $x^{*}$ ,

[TABLE]

Replacing this into (24b), and using $w=s_{\scriptscriptstyle x^{*}}(z)$ as a new variable of integration, it follows that

[TABLE]

because $s^{\scriptscriptstyle-1}_{\scriptscriptstyle x^{*}}=s_{\scriptscriptstyle x^{*}}$ and $s_{\scriptscriptstyle x^{*}}$ maps $\mathrm{D}(x)$ onto itself. Now, note that $P_{\scriptscriptstyle T}\circ s_{\scriptscriptstyle x^{*}}=P_{\scriptscriptstyle T\,}$ . This is clear, since from (2),

[TABLE]

However, by assumption, $U\circ s_{\scriptscriptstyle x^{*}}(w)=U(w)$ . Moreover, since $s_{\scriptscriptstyle x^{*}}$ is an isometry, it preserves Riemannian volume, so $\left(\mathrm{vol}\circ s_{\scriptscriptstyle x^{*}}\right)(dw)=\mathrm{vol}(dw)$ . Thus, (24c) reads

[TABLE]

By definition, the right-hand side is $G_{x^{*}}$ , so (24a) is obtained. $\blacksquare$
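The cancellation behind (24a) can be observed numerically. The following sketch is an illustration under stated assumptions: $M$ is taken to be the unit sphere $S^{2}$ with $x^{*}$ the north pole, so that the geodesic symmetry is $s_{x^{*}}(z_{1},z_{2},z_{3})=(-z_{1},-z_{2},z_{3})$ , and the two test functions `U_symmetric` and `U_asymmetric` are hypothetical choices, both with unique global minimum at $x^{*}$ . The integral $G_{x^{*}}=-\int\mathrm{Exp}^{-1}_{x^{*}}(z)\,P_{\scriptscriptstyle T}(dz)$ is approximated by midpoint quadrature in spherical coordinates.

```python
import numpy as np

def gibbs_gradient_at_pole(U, T, n_theta=400, n_phi=400):
    """Approximate G_{x*} = -int Log_{x*}(z) P_T(dz) on the unit sphere S^2,
    with x* the north pole, by midpoint quadrature in spherical coordinates."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta     # theta = d(x*, z)
    phi = (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi
    th, ph = np.meshgrid(theta, phi, indexing="ij")
    z = np.stack([np.sin(th) * np.cos(ph),
                  np.sin(th) * np.sin(ph),
                  np.cos(th)])
    # Gibbs density times the area element sin(theta) d(theta) d(phi).
    weight = np.exp(-U(z) / T) * np.sin(th)
    weight /= weight.sum()                                   # Z(T) cancels here
    # Log_{x*}(z) = theta * (cos phi, sin phi) in the tangent plane at the pole.
    log_map = np.stack([th * np.cos(ph), th * np.sin(ph)])
    return -(log_map * weight).sum(axis=(1, 2))

def U_symmetric(z):
    """Invariant under (z1, z2, z3) -> (-z1, -z2, z3)."""
    return (1.0 - z[2]) + 0.5 * z[0] ** 2

def U_asymmetric(z):
    """Same minimiser at the pole, but not invariant under the symmetry."""
    return (1.0 - z[2]) * (1.0 + 0.3 * z[0])

g_sym = gibbs_gradient_at_pole(U_symmetric, T=0.5)
g_asym = gibbs_gradient_at_pole(U_asymmetric, T=0.5)
print(np.linalg.norm(g_sym), np.linalg.norm(g_asym))
```

With an even number of longitude nodes, each node is paired with its image under $s_{x^{*}}$ , so the symmetric case cancels to rounding error, while the asymmetric case leaves a visibly non-zero gradient. This suggests that the symmetry assumption is doing real work in the argument, and is not merely a convenience.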

Bibliography

[1] Karcher, H.: Riemannian centre of mass and mollifier smoothing. Comm. Pure. Appl. Math. 30 (5), 509–541 (1977).
[2] Afsari, B.: Riemannian $L^{p}$ center of mass: existence, uniqueness, and convexity. Proc. Am. Math. Soc. 139 (2), 655–673 (2010).
[3] Kantorovich, L.V., Akilov, G.P.: Functional Analysis (Second Edition). Pergamon Press, Oxford (1982).
[4] Villani, C.: Optimal transport, old and new. 2nd edn. Springer-Verlag, Berlin-Heidelberg (2009).
[5] Wong, R.: Asymptotic approximations of integrals. Society for Industrial and Applied Mathematics (2001).
[6] Chavel, I.: Riemannian Geometry, a modern introduction. Cambridge University Press, Cambridge (2006).
[7] Helgason, S.: Differential geometry, Lie groups, and symmetric spaces. American Mathematical Society (1978).
[8] Roberts, G. O., Rosenthal, J. S.: General state space Markov chains and MCMC algorithms. Probab. Surveys 1, 20–71 (2004).


3 Finding $T_{o}$ and $T_{\scriptscriptstyle\delta}$