Comparison of Lasserre's measure-based bounds for polynomial optimization to bounds obtained by simulated annealing

Etienne de Klerk; Monique Laurent

arXiv:1703.00744·math.OC·March 3, 2017·Math. Oper. Res.


TL;DR

This paper relates Lasserre's measure-based upper bounds for polynomial optimization to bounds from simulated annealing, and uses this link to show that, for a polynomial over a convex body, the Lasserre bounds converge at rate $O(1/r)$ , improving the previously known $O(1/\sqrt{r})$ .

Contribution

For a polynomial $f$ over a convex body, the paper proves an $O(1/r)$ convergence rate for Lasserre's hierarchy of upper bounds, improving the earlier $O(1/\sqrt{r})$ rate. The proof goes through a comparison with simulated annealing bounds.

Findings

01

In the numerical examples, Lasserre's bounds are tighter than the simulated annealing bounds at matched degree and temperature.

02

Faster convergence rate established for polynomial optimization over convex bodies.

03

The bound linking the two hierarchies is far from tight on the test functions, and the exact convergence rate of Lasserre's bounds remains open.

Abstract

We consider the problem of minimizing a continuous function $f$ over a compact set $K$ . We compare the hierarchy of upper bounds proposed by Lasserre in [SIAM J. Optim. 21(3) (2011), pp. 864-885] to bounds that may be obtained from simulated annealing. We show that, when $f$ is a polynomial and $K$ a convex body, this comparison yields a faster rate of convergence of the Lasserre hierarchy than what was previously known in the literature.

Tables

Table 1: Test functions, all with $n=2$ , domain ${\mathbf{K}}=[-1,1]^{2}$ , and minimum $f_{\min,{\mathbf{K}}}=0$ .

Name	$f (x)$	${\hat{f}}_{\max}$	$d$	Convex?
Booth function	${(10 x_{1} + 20 x_{2} - 7)}^{2} + {(20 x_{1} + 10 x_{2} - 5)}^{2}$	$2594$	$2$	yes
Matyas function	$26 (x_{1}^{2} + x_{2}^{2}) - 48 x_{1} x_{2}$	$100$	$2$	yes
Motzkin polynomial	$64 (x_{1}^{4} x_{2}^{2} + x_{1}^{2} x_{2}^{4}) - 48 x_{1}^{2} x_{2}^{2} + 1$	$81$	$6$	no
Three-Hump Camel function	$\frac{5^{6}}{6} x_{1}^{6} - 5^{4} \cdot 1.05 x_{1}^{4} + 50 x_{1}^{2} + 25 x_{1} x_{2} + 25 x_{2}^{2}$	$2048$	$6$	no

Table 2: Comparison of the upper bounds $SA^{(r)}$ and $\underline{f}_{\mathbf{K}}^{(r)}$ for the test functions.

$r$	Booth Function		Matyas Function		Three–Hump Camel Function		Motzkin Polynomial
$r$	${\underline{f}}_{𝐊}^{(r)}$	$S A^{(r)}$	${\underline{f}}_{𝐊}^{(r)}$	$S A^{(r)}$	${\underline{f}}_{𝐊}^{(r)}$	$S A^{(r)}$	${\underline{f}}_{𝐊}^{(r)}$	$S A^{(r)}$
$3$	118.383	367.834	4.2817	15.4212	29.0005	247.462	1.0614	4.0250
$4$	97.6473	356.113	3.8942	14.8521	9.5806	241.700	0.8294	3.9697
$5$	69.8174	345.043	3.6894	14.3143	9.5806	236.102	0.8010	3.9157
$6$	63.5454	334.585	2.9956	13.8062	4.4398	230.663	0.8010	3.8631
$7$	47.0467	324.701	2.5469	13.3262	4.4398	225.381	0.7088	3.8118
$8$	41.6727	315.354	2.0430	12.8726	2.5503	220.251	0.5655	3.7618
$9$	34.2140	306.510	1.8335	12.4441	2.5503	215.269	0.5655	3.7130
$10$	28.7248	298.138	1.4784	12.0390	1.7127	210.431	0.5078	3.6654
$11$	25.6050	290.206	1.3764	11.6560	1.7127	205.734	0.4060	3.6190
$12$	21.1869	282.687	1.1178	11.2938	1.2775	201.173	0.4060	3.5737
$13$	19.5588	275.554	1.0686	10.9511	1.2775	196.745	0.3759	3.5296
$14$	16.5854	268.782	0.8742	10.6267	1.0185	192.446	0.3004	3.4865
$15$	15.2815	262.348	0.8524	10.3195	1.0185	188.272	0.3004	3.4444
$16$	13.4626	256.230	0.7020	10.0284	0.8434	184.220	0.2819	3.4034
$17$	12.2075	250.408	0.6952	9.75250	0.8434	180.287	0.2300	3.3633
$18$	11.0959	244.863	0.5760	9.49071	0.7113	176.469	0.2300	3.3242
$19$	9.9938	239.577	0.5760	9.24220	0.7113	172.762	0.2185	3.2860
$20$	9.2373	234.534	0.4815	9.00615	0.6064	169.164	0.1817	3.2487

Equations

$$f_{\min,\mathbf{K}}:=\min_{x\in\mathbf{K}}f(x).$$

$$f_{\min,\mathbf{K}}=\inf_{h\in\Sigma[x]}\int_{\mathbf{K}}h(x)f(x)\,dx\quad\text{s.t. }\int_{\mathbf{K}}h(x)\,dx=1.$$

$$\underline{f}^{(r)}_{\mathbf{K}}:=\inf_{h\in\Sigma[x]_{r}}\int_{\mathbf{K}}h(x)f(x)\,dx\quad\text{s.t. }\int_{\mathbf{K}}h(x)\,dx=1.$$

$$\underline{f}^{(r)}_{\mathbf{K}}-f_{\min,\mathbf{K}}\leq\frac{C_{f,\mathbf{K}}}{\sqrt{r}}\quad\text{for all }r\geq r_{\mathbf{K}}.$$

$$m_{\alpha}(\mathbf{K}):=\int_{\mathbf{K}}x^{\alpha}\,dx\quad\text{for }\alpha\in\mathbb{N}^{n}.$$

$$\underline{f}^{(r)}_{\mathbf{K}}=\min\ \sum_{\beta\in N(n,d)}f_{\beta}\sum_{\alpha\in N(n,2r)}h_{\alpha}m_{\alpha+\beta}(\mathbf{K})\quad\text{s.t. }\sum_{\alpha\in N(n,2r)}h_{\alpha}m_{\alpha}(\mathbf{K})=1,\ \sum_{\alpha\in N(n,2r)}h_{\alpha}x^{\alpha}\in\Sigma[x]_{r}.$$

$$Ax=\lambda Bx\quad(x\neq 0),$$

$$A_{\alpha,\beta}=\sum_{\delta\in N(n,d)}f_{\delta}\int_{\mathbf{K}}x^{\alpha+\beta+\delta}\,dx,\quad B_{\alpha,\beta}=\int_{\mathbf{K}}x^{\alpha+\beta}\,dx,\quad\alpha,\beta\in N(n,r).$$

$$P_{f}(x):=\frac{\exp(-f(x))}{\int_{\mathbf{K}}\exp(-f(x'))\,dx'}.$$

$$f_{\min,\mathbf{K}}\leq\mathbb{E}_{X\sim P_{f/t}}[f(X)].$$

$$\lim_{t\downarrow 0}\mathbb{E}_{X\sim P_{f/t}}[f(X)]=f_{\min,\mathbf{K}}.$$

$$f(x_{1},x_{2})=64(x_{1}^{4}x_{2}^{2}+x_{1}^{2}x_{2}^{4})-48x_{1}^{2}x_{2}^{2}+1$$

$$\mathbb{E}_{X\sim P_{f/t}}[f(X)]-\min_{x\in\mathbf{K}}f(x)\leq nt.$$

$$E_{\mathbf{K}}:=\mathbb{E}_{X\sim P_{f/t}}[f(X)]=\frac{\int_{\mathbf{K}}f(x)e^{-f(x)/t}\,dx}{\int_{\mathbf{K}}e^{-f(x)/t}\,dx}.$$

$$f_{\min,\mathbf{K}}=\min_{x\in\mathbf{K}}f(x)\leq E_{\mathbf{K}}.$$

$$\widehat{\mathbf{K}}:=\{(x,x_{n+1})\in\mathbb{R}^{n+1}:x\in\mathbf{K},\ f(x)\leq x_{n+1}\leq E_{\mathbf{K}}\}.$$

$$\min_{x\in\mathbf{K}}f(x)=\min_{(x,x_{n+1})\in\widehat{\mathbf{K}}}x_{n+1}.$$

$$E_{\widehat{\mathbf{K}}}:=\frac{\int_{\widehat{\mathbf{K}}}x_{n+1}e^{-x_{n+1}/t}\,dx_{n+1}\,dx}{\int_{\widehat{\mathbf{K}}}e^{-x_{n+1}/t}\,dx_{n+1}\,dx}.$$

$$E_{\widehat{\mathbf{K}}}=E_{\mathbf{K}}+t.$$

$$N_{\mathbf{K}}:=\int_{\mathbf{K}}f(x)e^{-f(x)/t}\,dx,\quad D_{\mathbf{K}}:=\int_{\mathbf{K}}e^{-f(x)/t}\,dx,$$

$$N_{\widehat{\mathbf{K}}}:=\int_{\widehat{\mathbf{K}}}x_{n+1}e^{-x_{n+1}/t}\,dx_{n+1}\,dx,\quad D_{\widehat{\mathbf{K}}}:=\int_{\widehat{\mathbf{K}}}e^{-x_{n+1}/t}\,dx_{n+1}\,dx.$$

$$D_{\widehat{\mathbf{K}}}=\int_{\mathbf{K}}\left(\int_{f(x)}^{E_{\mathbf{K}}}e^{-x_{n+1}/t}\,dx_{n+1}\right)dx=\int_{\mathbf{K}}\left(te^{-f(x)/t}-te^{-E_{\mathbf{K}}/t}\right)dx=tD_{\mathbf{K}}-te^{-E_{\mathbf{K}}/t}\,\mathrm{vol}(\mathbf{K}),$$

$$N_{\widehat{\mathbf{K}}}=\int_{\mathbf{K}}\left(\int_{f(x)}^{E_{\mathbf{K}}}x_{n+1}e^{-x_{n+1}/t}\,dx_{n+1}\right)dx=tN_{\mathbf{K}}-tE_{\mathbf{K}}e^{-E_{\mathbf{K}}/t}\,\mathrm{vol}(\mathbf{K})+tD_{\widehat{\mathbf{K}}}.$$

$$\frac{N_{\widehat{\mathbf{K}}}}{D_{\widehat{\mathbf{K}}}}=t+\frac{N_{\mathbf{K}}-E_{\mathbf{K}}e^{-E_{\mathbf{K}}/t}\,\mathrm{vol}(\mathbf{K})}{D_{\mathbf{K}}-e^{-E_{\mathbf{K}}/t}\,\mathrm{vol}(\mathbf{K})}=t+\frac{N_{\mathbf{K}}}{D_{\mathbf{K}}},$$

$$\mathbb{E}_{X\sim P_{f/t}}[f(X)]-\min_{x\in\mathbf{K}}f(x)=E_{\mathbf{K}}-\min_{x\in\mathbf{K}}f(x)=\left(E_{\widehat{\mathbf{K}}}-\min_{(x,x_{n+1})\in\widehat{\mathbf{K}}}x_{n+1}\right)+\left(E_{\mathbf{K}}-E_{\widehat{\mathbf{K}}}\right)\leq t(n+1)-t=tn.$$

$$\mathbb{E}_{X\sim P_{f/t}}[f(X)]-\min_{x\in\mathbf{K}}f(x)=\frac{\int_{0}^{1}xe^{-x/t}\,dx}{\int_{0}^{1}e^{-x/t}\,dx}-0=t-\frac{e^{-1/t}}{1-e^{-1/t}}\sim t\quad\text{for small }t.$$

$$\underline{f}^{(rd)}_{\mathbf{K}}\leq\mathbb{E}_{X\sim P_{f/t}}[f(X)]+\frac{\widehat{f}_{\max}}{2^{r}}\quad\text{for any integer }r\geq\frac{e\cdot\widehat{f}_{\max}}{t}\text{ and any }t>0.$$

$$\underline{f}^{(rd)}_{\mathbf{K}}-\min_{x\in\mathbf{K}}f(x)\leq\frac{c}{r},$$

$$\underline{f}^{(rd)}_{\mathbf{K}}-\min_{x\in\mathbf{K}}f(x)\leq nt+\frac{\widehat{f}_{\max}}{2^{r}}=\frac{ne\cdot\widehat{f}_{\max}}{r}+\frac{\widehat{f}_{\max}}{2^{r}}\leq\frac{(ne+1)\widehat{f}_{\max}}{r}.$$


Taxonomy

Topics: Machine Learning and Algorithms · Advanced Optimization Algorithms Research · Markov Chains and Monte Carlo Methods

Full text

Comparison of Lasserre’s measure–based bounds for polynomial optimization to bounds obtained by simulated annealing

Etienne de Klerk, Tilburg University and Delft University of Technology

Monique Laurent, Centrum Wiskunde & Informatica (CWI), Amsterdam and Tilburg University

Abstract

We consider the problem of minimizing a continuous function $f$ over a compact set ${\mathbf{K}}$ . We compare the hierarchy of upper bounds proposed by Lasserre in [SIAM J. Optim. $21(3)$ $(2011)$ , pp. $864-885$ ] to bounds that may be obtained from simulated annealing.

We show that, when $f$ is a polynomial and ${\mathbf{K}}$ a convex body, this comparison yields a faster rate of convergence of the Lasserre hierarchy than what was previously known in the literature.

Keywords: Polynomial optimization; Semidefinite optimization; Lasserre hierarchy; simulated annealing

AMS classification: 90C22; 90C26; 90C30

1 Introduction

We consider the problem of minimizing a continuous function $f:{\mathbb{R}}^{n}\to{\mathbb{R}}$ over a compact set $\mathbf{K}\subseteq{\mathbb{R}}^{n}$ . That is, we consider the problem of computing the parameter:

$$f_{\min,\mathbf{K}}:=\min_{x\in\mathbf{K}}f(x).$$

Our goal is to compare two convergent hierarchies of upper bounds on $f_{\min,\mathbf{K}}$ , namely measure-based bounds introduced by Lasserre [10], and simulated annealing bounds, as studied by Kalai and Vempala [6]. The bounds of Lasserre are obtained by minimizing over measures on ${\mathbf{K}}$ with sum-of-squares polynomial density functions of growing degree, while simulated annealing bounds use Boltzmann distributions on ${\mathbf{K}}$ with decreasing temperature parameters.

In this note we establish a relationship between these two approaches, linking the degree and temperature parameters in the two bounds (see Theorem 4.1 for a precise statement). As an application, when $f$ is a polynomial and $K$ is a convex body, we can show a faster convergence rate for the measure-based bounds of Lasserre. The new convergence rate is in $O(1/r)$ (see Corollary 4.3), where $2r$ is the degree of the sum-of-squares polynomial density function, while the dependence was in $O(1/\sqrt{r})$ in the previously best known result from [4].

Polynomial optimization has been a very active research area in recent years, since the seminal works of Lasserre [8] and Parrilo [13] (see also, e.g., the book [9] and the survey [11]). In particular, hierarchies of (lower and upper) bounds for the parameter $f_{\min,\mathbf{K}}$ have been proposed, based on sum-of-squares polynomials and semidefinite programming.

For a general compact set ${\mathbf{K}}$ , upper bounds for $f_{\min,\mathbf{K}}$ were introduced by Lasserre [10]. They are obtained by searching for a sum-of-squares polynomial density function of given maximum degree $2r$ that minimizes the integral of $f$ with respect to the corresponding probability measure on ${\mathbf{K}}$ . When $f$ is Lipschitz continuous and under a mild assumption on ${\mathbf{K}}$ (which holds, e.g., when ${\mathbf{K}}$ is a convex body), a convergence rate of order $O(1/\sqrt{r})$ was proved for these bounds in [4]. Stronger results are known when ${\mathbf{K}}$ is the hypercube $[0,1]^{n}$ or $[-1,1]^{n}$ . In [3] the authors give a hierarchy of upper bounds based on the Beta distribution, with the same convergence rate in $O(1/\sqrt{r})$ , but whose computation needs only elementary operations; moreover, an improved rate in $O(1/r)$ can be shown, e.g., when $f$ is quadratic. In addition, a convergence rate in $O(1/r^{2})$ is shown in [2], using distributions based on Jackson kernels and a larger class of sum-of-squares density functions.

In this paper we investigate the hierarchy of measure-based upper bounds of [10] and show that when $K$ is a convex body, convexity can be exploited to show an improved convergence rate in $O(1/r)$ , even for nonconvex functions. The key ingredient for this is to establish a relationship with upper bounds based on simulated annealing and to use a known convergence rate result from [6] for simulated annealing bounds in the convex case.

Simulated annealing was introduced by Kirkpatrick et al. [7] as a randomized search procedure for general optimization problems. It has enjoyed renewed interest for convex optimization problems since it was shown by Kalai and Vempala [6] that a polynomial-time implementation is possible. This requires so-called hit-and-run sampling from $\mathbf{K}$ , as introduced by Smith [14], that was shown to be a polynomial-time procedure by Lovász [12]. Most recently, Abernethy and Hazan [1] showed formal equivalence with a certain interior point method for convex optimization.

This unexpected equivalence between seemingly different methods has motivated the current work, which relates the bounds by Lasserre [10] to the simulated annealing bounds as well.

In what follows, we first introduce the measure-based upper bounds of Lasserre [10]. Then we recall the bounds based on simulated annealing and the known convergence results for a linear objective function $f$ , and we give an explicit proof of their extension to the case of a general convex function $f$ . After that we state our main result and the next section is devoted to its proof. In the last section we conclude with numerical examples showing the quality of the two types of bounds and some final remarks.

2 Lasserre’s hierarchy of upper bounds

Throughout, ${\mathbb{R}}[x]={\mathbb{R}}[x_{1},\dots,x_{n}]$ is the set of polynomials in $n$ variables with real coefficients and, for an integer $r\in{\mathbb{N}}$ , ${\mathbb{R}}[x]_{r}$ is the set of polynomials with degree at most $r$ . Any polynomial $f\in{\mathbb{R}}[x]_{r}$ can be written $f=\sum_{\alpha\in N(n,r)}f_{\alpha}x^{\alpha}$ , where we set $x^{\alpha}=\prod_{i=1}^{n}x_{i}^{\alpha_{i}}$ for $\alpha\in{\mathbb{N}}^{n}$ and $N(n,r)=\{\alpha\in{\mathbb{N}}^{n}:\sum_{i=1}^{n}\alpha_{i}\leq r\}$ . We let $\Sigma[x]$ denote the set of sums of squares of polynomials, and $\Sigma[x]_{r}=\Sigma[x]\cap{\mathbb{R}}[x]_{2r}$ consists of all sums of squares of polynomials with degree at most $2r$ .

We recall the following reformulation for $f_{\min,\mathbf{K}}$ , established by Lasserre [10]:

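$$f_{\min,\mathbf{K}}=\inf_{h\in\Sigma[x]}\int_{\mathbf{K}}h(x)f(x)\,dx\quad\text{s.t. }\int_{\mathbf{K}}h(x)\,dx=1.$$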

By bounding the degree of the polynomial $h\in\Sigma[x]$ by $2r$ , we can define the parameter:

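$$\underline{f}^{(r)}_{\mathbf{K}}:=\inf_{h\in\Sigma[x]_{r}}\int_{\mathbf{K}}h(x)f(x)\,dx\quad\text{s.t. }\int_{\mathbf{K}}h(x)\,dx=1.\tag{1}$$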

Clearly, the inequality $f_{\min,\mathbf{K}}\leq\underline{f}^{(r)}_{\mathbf{K}}$ holds for all $r\in{\mathbb{N}}$ . Lasserre [10] gave conditions under which the infimum is attained in the program (1). De Klerk, Laurent and Sun [4, Theorem 3] established the following rate of convergence for the bounds $\underline{f}^{(r)}_{\mathbf{K}}$ .

Theorem 2.1 (De Klerk, Laurent, and Sun [4]).

Let $f\in{\mathbb{R}}[x]$ and $\mathbf{K}$ a convex body. There exist constants $C_{f,{\mathbf{K}}}$ (depending only on $f$ and ${\mathbf{K}}$ ) and $r_{\mathbf{K}}$ (depending only on ${\mathbf{K}}$ ) such that

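$$\underline{f}^{(r)}_{\mathbf{K}}-f_{\min,\mathbf{K}}\leq\frac{C_{f,\mathbf{K}}}{\sqrt{r}}\quad\text{for all }r\geq r_{\mathbf{K}}.$$

That is, the following asymptotic convergence rate holds: $\underline{f}^{(r)}_{\mathbf{K}}-f_{\min,\mathbf{K}}=O\left({1\over\sqrt{r}}\right).$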

This result of [4] holds in fact under more general assumptions, namely when $f$ is Lipschitz continuous and ${\mathbf{K}}$ satisfies a technical assumption (Assumption 1 in [4]), which says (roughly) that around any point in $\mathbf{K}$ there is a ball whose intersection with ${\mathbf{K}}$ is at least a constant fraction of the unit ball.

As explained in [10] the parameter $\underline{f}^{(r)}_{\mathbf{K}}$ can be computed using semidefinite programming, assuming one knows the moments $m_{\alpha}({\mathbf{K}})$ of the Lebesgue measure on ${\mathbf{K}}$ , where

$$m_{\alpha}(\mathbf{K}):=\int_{\mathbf{K}}x^{\alpha}\,dx\quad\text{for }\alpha\in\mathbb{N}^{n}.$$

Indeed suppose $f(x)=\sum_{\beta\in N(n,d)}f_{\beta}x^{\beta}$ has degree $d$ . Writing $h\in\Sigma[x]_{r}$ as $h(x)=\sum_{\alpha\in N(n,2r)}h_{\alpha}x^{\alpha}$ , the parameter $\underline{f}^{(r)}_{\mathbf{K}}$ from (1) can be reformulated as follows:

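$$\underline{f}^{(r)}_{\mathbf{K}}=\min\ \sum_{\beta\in N(n,d)}f_{\beta}\sum_{\alpha\in N(n,2r)}h_{\alpha}m_{\alpha+\beta}(\mathbf{K})\quad\text{s.t. }\sum_{\alpha\in N(n,2r)}h_{\alpha}m_{\alpha}(\mathbf{K})=1,\ \sum_{\alpha\in N(n,2r)}h_{\alpha}x^{\alpha}\in\Sigma[x]_{r}.$$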

Since the sum-of-squares condition on $h$ may be written as a linear matrix inequality, this is a semidefinite program. In fact, since it only has one linear equality constraint, it may even be rewritten as a generalised eigenvalue problem. In particular, $\underline{f}_{\mathbf{K}}^{(r)}$ is equal to the smallest generalized eigenvalue of the system:

$$Ax=\lambda Bx\quad(x\neq 0),$$

where the symmetric matrices $A$ and $B$ are of order ${n+r\choose r}$ with rows and columns indexed by $N(n,r)$ , and

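$$A_{\alpha,\beta}=\sum_{\delta\in N(n,d)}f_{\delta}\int_{\mathbf{K}}x^{\alpha+\beta+\delta}\,dx,\quad B_{\alpha,\beta}=\int_{\mathbf{K}}x^{\alpha+\beta}\,dx,\quad\alpha,\beta\in N(n,r).$$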

For more details, see [10, 4, 3].
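As an illustration of the generalised eigenvalue formulation, the following minimal sketch assembles $A$ and $B$ for the box ${\mathbf{K}}=[-1,1]^{n}$ and returns the smallest generalized eigenvalue. It is not the authors' implementation: the function names are hypothetical, the moments use the elementary formula $\int_{-1}^{1}x^{a}dx=2/(a+1)$ for even $a$ (and $0$ for odd $a$), and the reduction to a standard symmetric eigenproblem through a Cholesky factor of $B$ is just one convenient choice. The monomial basis becomes ill-conditioned as $r$ grows, so this is only meant for small $r$ .

```python
import itertools
import numpy as np

def exponents(n, r):
    """All alpha in N^n with |alpha| <= r, i.e. the index set N(n, r)."""
    return [a for a in itertools.product(range(r + 1), repeat=n) if sum(a) <= r]

def box_moment(alpha):
    """Moment of the Lebesgue measure on [-1,1]^n for the monomial x^alpha."""
    m = 1.0
    for a in alpha:
        if a % 2 == 1:
            return 0.0          # odd powers integrate to zero on [-1,1]
        m *= 2.0 / (a + 1)
    return m

def lasserre_upper_bound(f_coeffs, n, r):
    """Smallest generalized eigenvalue of A x = lambda B x for K = [-1,1]^n.

    f_coeffs maps exponent tuples delta to the coefficients f_delta of f.
    """
    idx = exponents(n, r)
    size = len(idx)
    A = np.zeros((size, size))
    B = np.zeros((size, size))
    for i, a in enumerate(idx):
        for j, b in enumerate(idx):
            ab = tuple(p + q for p, q in zip(a, b))
            B[i, j] = box_moment(ab)
            A[i, j] = sum(c * box_moment(tuple(p + q for p, q in zip(ab, d)))
                          for d, c in f_coeffs.items())
    # B is a positive definite Gram matrix, so B = L L^T and the problem
    # becomes the standard symmetric eigenproblem for L^{-1} A L^{-T}.
    L = np.linalg.cholesky(B)
    C = np.linalg.solve(L, np.linalg.solve(L, A).T).T
    return float(np.linalg.eigvalsh((C + C.T) / 2)[0])

# Matyas function from Table 1: 26 (x1^2 + x2^2) - 48 x1 x2.
matyas = {(2, 0): 26.0, (0, 2): 26.0, (1, 1): -48.0}
bounds = [lasserre_upper_bound(matyas, 2, r) for r in range(4)]
print(bounds)
```

For $r=0$ the density is constant, so the bound is the average of $f$ over the box, here $52/3$ . The values for larger $r$ may be compared with the column for the Matyas function in Table 2, assuming the same indexing convention for $r$ .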

3 Bounds from simulated annealing

Given a continuous function $f$ , consider the associated Boltzmann distribution over the set ${\mathbf{K}}$ , defined by the density function:

$$P_{f}(x):=\frac{\exp(-f(x))}{\int_{\mathbf{K}}\exp(-f(x'))\,dx'}.$$

Write $X\sim P_{f}$ if the random variable $X$ takes values in ${\mathbf{K}}$ according to the Boltzmann distribution.

The idea of simulated annealing is to sample $X\sim P_{f/t}$ where $t>0$ is a fixed ‘temperature’ parameter, that is subsequently decreased. Clearly, for any $t>0$ , we have

$$f_{\min,\mathbf{K}}\leq\mathbb{E}_{X\sim P_{f/t}}[f(X)].\tag{5}$$

The point is that, under mild assumptions, these bounds converge to the minimum of $f$ over ${\mathbf{K}}$ (see, e.g., [15]):

$$\lim_{t\downarrow 0}\mathbb{E}_{X\sim P_{f/t}}[f(X)]=f_{\min,\mathbf{K}}.$$

The key step in the practical utilization of these bounds is therefore to perform the sampling of $X\sim P_{f/t}$ .

Example 3.1.

Consider the minimization of the Motzkin polynomial

$$f(x_{1},x_{2})=64(x_{1}^{4}x_{2}^{2}+x_{1}^{2}x_{2}^{4})-48x_{1}^{2}x_{2}^{2}+1$$

over ${\mathbf{K}}=[-1,1]^{2}$ , where there are four global minimizers at the points $\left(\pm\frac{1}{2},\pm\frac{1}{2}\right)$ , and $f_{\min,{\mathbf{K}}}=0$ . Figure 1 shows the corresponding Boltzmann density function for $t=\frac{1}{2}$ . Note that this density has four modes, roughly positioned at the four global minimizers of $f$ in $[-1,1]^{2}$ . The corresponding upper bound on $f_{\min,{\mathbf{K}}}=0$ is $\mathbb{E}_{X\sim P_{f/t}}[f(X)]\approx 0.7257$ ( $t=\frac{1}{2}$ ).

To obtain a better upper bound on $f_{\min,{\mathbf{K}}}$ from the Lasserre hierarchy, one needs to use a degree $14$ s.o.s. polynomial density; in particular, one has $\underline{f}^{(6)}_{\mathbf{K}}=0.8010$ (degree $12$ ) and $\underline{f}^{(7)}_{\mathbf{K}}=0.7088$ (degree $14$ ). More detailed numerical results are given in Section 5.
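The Boltzmann expectation in this example can be estimated without sampling, since ${\mathbf{K}}$ is a low-dimensional box. The sketch below uses a plain midpoint rule on an $m\times m$ grid; this quadrature and the names `motzkin` and `boltzmann_mean` are illustrative assumptions, not the method used in the paper. The printed number can be compared with the value $0.7257$ reported above for $t=\frac{1}{2}$ .

```python
import math

def motzkin(x1, x2):
    """Motzkin polynomial in the scaling of Example 3.1."""
    return 64 * (x1**4 * x2**2 + x1**2 * x2**4) - 48 * x1**2 * x2**2 + 1

def boltzmann_mean(f, t, m=200):
    """Midpoint-rule estimate of E_{X ~ P_{f/t}}[f(X)] on [-1,1]^2.

    The cell area cancels between numerator and denominator, so it is omitted.
    """
    h = 2.0 / m
    pts = [-1.0 + (i + 0.5) * h for i in range(m)]
    num = den = 0.0
    for x1 in pts:
        for x2 in pts:
            v = f(x1, x2)
            w = math.exp(-v / t)   # unnormalised Boltzmann weight
            num += v * w
            den += w
    return num / den

sa_half = boltzmann_mean(motzkin, 0.5)
print(round(sa_half, 4))
```

Lowering `t` concentrates the weights near the four minimizers and decreases the estimate, while a very large `t` recovers the plain average of $f$ over the box, which equals $4.2$ for this polynomial.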

When $f$ is linear and ${\mathbf{K}}$ a convex body, Kalai and Vempala [6, Lemma 4.1] show that the rate of convergence of the bounds in (5) is linear in the temperature $t$ .

Theorem 3.2 (Kalai and Vempala [6]).

Let $f(x)=c^{T}x$ where $c$ is a unit vector, and let ${\mathbf{K}}$ be a convex body. Then, for any $t>0$ , we have

$$\mathbb{E}_{X\sim P_{f/t}}[f(X)]-\min_{x\in\mathbf{K}}f(x)\leq nt.$$

We indicate how to extend the result of Kalai and Vempala in Theorem 3.2 to the case of an arbitrary convex function $f$ . This more general result is hinted at in §6 of [6], where the authors write

“… a statement analogous to [Theorem 2] holds also for general convex functions …”

but no precise statement is given there. In any event, as we will now show, the more general result may readily be derived from Theorem 3.2 (in fact, from the special case of a linear coordinate function $f(x)=x_{i}$ for some $i$ ).

Corollary 3.3.

Let $f$ be a convex function and let ${\mathbf{K}}\subseteq{\mathbb{R}}^{n}$ be a convex body. Then, for any $t>0$ , we have

$$\mathbb{E}_{X\sim P_{f/t}}[f(X)]-\min_{x\in\mathbf{K}}f(x)\leq nt.$$

Proof.

Set

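$$E_{\mathbf{K}}:=\mathbb{E}_{X\sim P_{f/t}}[f(X)]=\frac{\int_{\mathbf{K}}f(x)e^{-f(x)/t}\,dx}{\int_{\mathbf{K}}e^{-f(x)/t}\,dx}.$$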

Then we have

$$f_{\min,\mathbf{K}}=\min_{x\in\mathbf{K}}f(x)\leq E_{\mathbf{K}}.$$

Define the set

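$$\widehat{\mathbf{K}}:=\{(x,x_{n+1})\in\mathbb{R}^{n+1}:x\in\mathbf{K},\ f(x)\leq x_{n+1}\leq E_{\mathbf{K}}\}.$$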

Then $\widehat{\mathbf{K}}$ is a convex body and we have

$$\min_{x\in\mathbf{K}}f(x)=\min_{(x,x_{n+1})\in\widehat{\mathbf{K}}}x_{n+1}.$$

Accordingly, define the parameter

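$$E_{\widehat{\mathbf{K}}}:=\frac{\int_{\widehat{\mathbf{K}}}x_{n+1}e^{-x_{n+1}/t}\,dx_{n+1}\,dx}{\int_{\widehat{\mathbf{K}}}e^{-x_{n+1}/t}\,dx_{n+1}\,dx}.$$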

Corollary 3.3 will follow if we show that

$$E_{\widehat{\mathbf{K}}}=E_{\mathbf{K}}+t.\tag{6}$$

To this end set $E_{\mathbf{K}}={N_{\mathbf{K}}\over D_{\mathbf{K}}}$ and $E_{\widehat{\mathbf{K}}}={N_{\widehat{\mathbf{K}}}\over D_{\widehat{\mathbf{K}}}}$ , where we define

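$$N_{\mathbf{K}}:=\int_{\mathbf{K}}f(x)e^{-f(x)/t}\,dx,\quad D_{\mathbf{K}}:=\int_{\mathbf{K}}e^{-f(x)/t}\,dx,$$

$$N_{\widehat{\mathbf{K}}}:=\int_{\widehat{\mathbf{K}}}x_{n+1}e^{-x_{n+1}/t}\,dx_{n+1}\,dx,\quad D_{\widehat{\mathbf{K}}}:=\int_{\widehat{\mathbf{K}}}e^{-x_{n+1}/t}\,dx_{n+1}\,dx.$$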

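We work out the parameters $N_{\widehat{\mathbf{K}}}$ and $D_{\widehat{\mathbf{K}}}$ (using integration by parts):

$$D_{\widehat{\mathbf{K}}}=\int_{\mathbf{K}}\left(\int_{f(x)}^{E_{\mathbf{K}}}e^{-x_{n+1}/t}\,dx_{n+1}\right)dx=\int_{\mathbf{K}}\left(te^{-f(x)/t}-te^{-E_{\mathbf{K}}/t}\right)dx=tD_{\mathbf{K}}-te^{-E_{\mathbf{K}}/t}\,\mathrm{vol}(\mathbf{K}),$$

$$N_{\widehat{\mathbf{K}}}=\int_{\mathbf{K}}\left(\int_{f(x)}^{E_{\mathbf{K}}}x_{n+1}e^{-x_{n+1}/t}\,dx_{n+1}\right)dx=tN_{\mathbf{K}}-tE_{\mathbf{K}}e^{-E_{\mathbf{K}}/t}\,\mathrm{vol}(\mathbf{K})+tD_{\widehat{\mathbf{K}}}.$$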

Then, using the fact that $E_{\mathbf{K}}={N_{\mathbf{K}}\over D_{\mathbf{K}}}$ , we obtain:

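$$\frac{N_{\widehat{\mathbf{K}}}}{D_{\widehat{\mathbf{K}}}}=t+\frac{N_{\mathbf{K}}-E_{\mathbf{K}}e^{-E_{\mathbf{K}}/t}\,\mathrm{vol}(\mathbf{K})}{D_{\mathbf{K}}-e^{-E_{\mathbf{K}}/t}\,\mathrm{vol}(\mathbf{K})}=t+\frac{N_{\mathbf{K}}}{D_{\mathbf{K}}},$$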

which proves relation (6).

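We can now derive the result of Corollary 3.3. Indeed, using Theorem 3.2 applied to $\widehat{\mathbf{K}}$ and the linear function $x_{n+1}$ , we get

$$\mathbb{E}_{X\sim P_{f/t}}[f(X)]-\min_{x\in\mathbf{K}}f(x)=E_{\mathbf{K}}-\min_{x\in\mathbf{K}}f(x)=\left(E_{\widehat{\mathbf{K}}}-\min_{(x,x_{n+1})\in\widehat{\mathbf{K}}}x_{n+1}\right)+\left(E_{\mathbf{K}}-E_{\widehat{\mathbf{K}}}\right)\leq t(n+1)-t=tn.$$

∎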

The bound in the corollary is tight asymptotically, as the following example shows.

Example 3.4.

Consider the univariate problem $\min_{x}\{x\;|\;x\in[0,1]\}$ . Thus, in this case, $f(x)=x$ , ${\mathbf{K}}=[0,1]$ and $\min_{x\in{\mathbf{K}}}f(x)=0$ . For given temperature $t>0$ , we have

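$$\mathbb{E}_{X\sim P_{f/t}}[f(X)]-\min_{x\in\mathbf{K}}f(x)=\frac{\int_{0}^{1}xe^{-x/t}\,dx}{\int_{0}^{1}e^{-x/t}\,dx}-0=t-\frac{e^{-1/t}}{1-e^{-1/t}}\sim t\quad\text{for small }t.$$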
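The closed form in this example is easy to check numerically. The sketch below evaluates the expression $t-e^{-1/t}/(1-e^{-1/t})$ and compares it with an independent midpoint-rule quadrature of the same ratio of integrals; the function names are hypothetical and the quadrature is only a sanity check.

```python
import math

def gap_closed_form(t):
    """Closed form of E[f(X)] - min f for f(x) = x on K = [0, 1]."""
    q = math.exp(-1.0 / t)
    return t - q / (1.0 - q)

def gap_quadrature(t, m=100000):
    """The same quantity via a midpoint rule with m cells."""
    h = 1.0 / m
    num = den = 0.0
    for i in range(m):
        x = (i + 0.5) * h
        w = math.exp(-x / t)   # unnormalised Boltzmann weight
        num += x * w
        den += w
    return num / den

for t in (1.0, 0.5, 0.1, 0.01):
    gap = gap_closed_form(t)
    # The ratio gap / t approaches 1 as t decreases, in line with the
    # bound n t for n = 1 being asymptotically tight.
    print(t, gap, gap / t)
```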

4 Main results

We will prove the following relationship between the sum-of-squares based upper bound (1) of Lasserre and the bound (5) based on simulated annealing.

Theorem 4.1.

Let $f$ be a polynomial of degree $d$ , let ${\mathbf{K}}$ be a compact set and set $\widehat{f}_{\max}=\max_{x\in{\mathbf{K}}}|f(x)|.$ Then we have

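$$\underline{f}^{(rd)}_{\mathbf{K}}\leq\mathbb{E}_{X\sim P_{f/t}}[f(X)]+\frac{\widehat{f}_{\max}}{2^{r}}\quad\text{for any integer }r\geq\frac{e\cdot\widehat{f}_{\max}}{t}\text{ and any }t>0.$$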

For the problem of minimizing a convex polynomial function over a convex body, we obtain the following improved convergence rate for the sum-of-squares based bounds of Lasserre.

Corollary 4.2.

Let $f\in{\mathbb{R}}[x]$ be a convex polynomial of degree $d$ and let ${\mathbf{K}}$ be a convex body. Then for any integer $r\geq 1$ one has

$$\underline{f}^{(rd)}_{\mathbf{K}}-\min_{x\in\mathbf{K}}f(x)\leq\frac{c}{r},$$

for some constant $c>0$ that does not depend on $r$ . (For instance, $c=(ne+1)\widehat{f}_{\max}$ .)

Proof.

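Let $r\geq 1$ and set $t={e\cdot\widehat{f}_{\max}\over r}$ . Combining Corollary 3.3 and Theorem 4.1, we get

$$\underline{f}^{(rd)}_{\mathbf{K}}-\min_{x\in\mathbf{K}}f(x)\leq nt+\frac{\widehat{f}_{\max}}{2^{r}}=\frac{ne\cdot\widehat{f}_{\max}}{r}+\frac{\widehat{f}_{\max}}{2^{r}}\leq\frac{(ne+1)\widehat{f}_{\max}}{r},$$

where the last step uses $2^{r}\geq r$ .

∎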
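To get a feel for the sizes involved, the following sketch evaluates the sample constant $c=(ne+1)\widehat{f}_{\max}$ and the temperature $t=e\cdot\widehat{f}_{\max}/r$ from the proof for the Matyas function of Table 1 ( $n=2$ , $\widehat{f}_{\max}=100$ ). The split of the gap into a simulated annealing term $nt$ plus a truncation term $\widehat{f}_{\max}/2^{r}$ is our reading of how the two results combine, and the function names are hypothetical.

```python
import math

def corollary_4_2_constant(n, f_hat_max):
    """Sample constant c = (n e + 1) * f_hat_max quoted after Corollary 4.2."""
    return (n * math.e + 1.0) * f_hat_max

def matched_temperature(r, f_hat_max):
    """Temperature t = e * f_hat_max / r chosen in the proof."""
    return math.e * f_hat_max / r

n, f_hat_max = 2, 100.0   # Matyas function on [-1,1]^2 (Table 1)
c = corollary_4_2_constant(n, f_hat_max)
for r in (1, 10, 100, 1000):
    t = matched_temperature(r, f_hat_max)
    two_terms = n * t + f_hat_max / 2**r   # annealing term + truncation term
    print(r, t, two_terms, c / r)
```

Note that the guarantee $c/r$ refers to the bound of index $rd$ , so for this quadratic example it concerns $\underline{f}^{(2r)}_{\mathbf{K}}$ .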

For convex polynomials $f$ , this improves on the known $O(1/\sqrt{r})$ result from Theorem 2.1. One may in fact use the last corollary to obtain the same rate of convergence in terms of $r$ for all polynomials, without the convexity assumption, as we will now show.

Corollary 4.3.

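If $f$ is a polynomial and ${\mathbf{K}}$ a convex body, then there is a $c>0$ depending on $f$ and ${\mathbf{K}}$ only, so that

$$\underline{f}^{(2r)}_{\mathbf{K}}-\min_{x\in\mathbf{K}}f(x)\leq\frac{c}{r}\quad\text{for any integer }r\geq 1.$$

A suitable value for $c$ is

$$c=(ne+1)\left(f_{\min,\mathbf{K}}+C^{1}_{f}\cdot\mathrm{diam}(\mathbf{K})+C^{2}_{f}\cdot\mathrm{diam}(\mathbf{K})^{2}\right),$$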

where $C^{1}_{f}=\max_{x\in{\mathbf{K}}}\|\nabla f(x)\|_{2}$ and $C^{2}_{f}=\max_{x\in{\mathbf{K}}}\|\nabla^{2}f(x)\|_{2}$ .

We first define a convex quadratic function $q$ that upper bounds $f$ on ${\mathbf{K}}$ as follows:

$$q(x)=f(a)+\nabla f(a)^{T}(x-a)+C^{2}_{f}\|x-a\|_{2}^{2},$$

where $C^{2}_{f}=\max_{x\in{\mathbf{K}}}\|\nabla^{2}f(x)\|_{2}$ , and $a$ is the minimizer of $f$ on ${\mathbf{K}}$ . Note that $q(x)\geq f(x)$ for all $x\in{\mathbf{K}}$ by Taylor’s theorem, and $\min_{x\in{\mathbf{K}}}q(x)=f(a)$ .

By definition of the Lasserre hierarchy,

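$$\underline{f}^{(2r)}_{\mathbf{K}}\leq\underline{q}^{(2r)}_{\mathbf{K}}\quad\text{since }f\leq q\text{ on }\mathbf{K}.$$

Invoking Corollary 4.2 and using that the degree of $q$ is $2$ , we obtain:

$$\underline{f}^{(2r)}_{\mathbf{K}}-f_{\min,\mathbf{K}}\leq\underline{q}^{(2r)}_{\mathbf{K}}-\min_{x\in\mathbf{K}}q(x)\leq\frac{(ne+1)\hat{q}_{\max}}{r},$$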

where $\hat{q}_{\max}=\max_{x\in{\mathbf{K}}}q(x)\leq f_{\min,{\mathbf{K}}}+C^{1}_{f}\cdot\mbox{diam}({\mathbf{K}})+C^{2}_{f}\cdot\mbox{diam}({\mathbf{K}})^{2}$ .∎

The last result improves on the known $O\left(\frac{1}{\sqrt{r}}\right)$ rate in Theorem 2.1.

Proof of Theorem 4.1

The key idea in the proof of Theorem 4.1 is to replace the Boltzmann density function by a polynomial approximation.

To this end, we first recall a basic result on approximating the exponential function by its truncated Taylor series.

Lemma 4.4 (De Klerk, Laurent and Sun [4]).

Let $\phi_{2r}(\lambda)$ denote the (univariate) polynomial of degree $2r$ obtained by truncating the Taylor series expansion of $e^{-\lambda}$ at the order $2r$ . That is,

$$\phi_{2r}(\lambda)=\sum_{k=0}^{2r}\frac{(-\lambda)^{k}}{k!}.$$

Then $\phi_{2r}$ is a sum of squares of polynomials. Moreover, we have

[TABLE]
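The behaviour of the truncated series can be probed numerically. The sketch below assumes that the error estimate takes the standard Lagrange remainder form $0\leq\phi_{2r}(\lambda)-e^{-\lambda}\leq\lambda^{2r+1}/(2r+1)!$ for $\lambda\geq 0$ ; this is a well-known Taylor estimate and our reading of the lemma, not a quotation of it. The function names are hypothetical.

```python
import math

def phi(lam, r):
    """Taylor series of exp(-lam), truncated at order 2r."""
    return sum((-lam) ** k / math.factorial(k) for k in range(2 * r + 1))

def remainder_bound(lam, r):
    """Lagrange-type bound lam^(2r+1) / (2r+1)! for lam >= 0."""
    return lam ** (2 * r + 1) / math.factorial(2 * r + 1)

r = 3
for lam in (0.0, 0.5, 1.0, 2.0, 5.0):
    err = phi(lam, r) - math.exp(-lam)   # nonnegative for lam >= 0
    print(lam, phi(lam, r), err, remainder_bound(lam, r))
```

The even truncation order matters: an even-degree truncation of the exponential series is positive on the whole real line, which is consistent with $\phi_{2r}$ being a sum of squares.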

We now define the following approximation of the Boltzmann density $P_{f/t}$ :

$$\varphi_{2r,t}(x):=\frac{\phi_{2r}(f(x)/t)}{\int_{\mathbf{K}}\phi_{2r}(f(x')/t)\,dx'}.\tag{8}$$

By construction, $\varphi_{2r,t}$ is a sum-of-squares polynomial probability density function on ${\mathbf{K}}$ , with degree $2rd$ if $f$ is a polynomial of degree $d$ . Moreover, by relation (7) in Lemma 4.4, we obtain

[TABLE]

From this we can derive the following result.

Lemma 4.5.

For any continuous $f$ and scalar $t>0$ one has

[TABLE]

Proof.

As $\varphi_{2r,t}(x)$ is a polynomial of degree $2rd$ and a probability density function on ${\mathbf{K}}$ (by (8)), we have:

[TABLE]

Using the above inequality (10) for $\varphi_{2r,t}(x)$ we can upper bound the integral on the right hand side:

[TABLE]

Combining with the inequality (12) gives the desired result.∎

We now proceed to the proof of Theorem 4.1. In view of Lemma 4.5, we only need to bound the last right-hand-side term in (11):

[TABLE]

and to show that $T\leq{\widehat{f}_{\max}\over 2^{r}}$ .

By the definition of $\widehat{f}_{\max}$ we have

[TABLE]

which implies

[TABLE]

Combining with the Stirling approximation inequality,

[TABLE]

applied to $(2r+1)!$ , we obtain:

[TABLE]

Consider $r\geq{e\cdot\widehat{f}_{\max}\over t}$ , so that $\widehat{f}_{\max}/t\leq r/e$ . Then, using the fact that $r/(2r+1)\leq 1/2$ , we obtain

[TABLE]

This concludes the proof of Theorem 4.1.

5 Concluding remarks

We conclude with a numerical comparison of the two hierarchies of bounds. By Theorem 4.1, it is reasonable to compare the bounds $\underline{f}^{(r)}_{\mathbf{K}}$ and $\mathop{\mathbb{E}}_{X\sim P_{f/t}}[f(X)]$ , with $t=\frac{e\cdot d\cdot\widehat{f}_{\max}}{r}$ and $d$ the degree of $f$ . Thus we define, for the purpose of comparison:

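$$SA^{(r)}:=\mathop{\mathbb{E}}_{X\sim P_{f/t}}[f(X)]\quad\text{with }t=\frac{e\cdot d\cdot\widehat{f}_{\max}}{r}.$$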

We calculated the bounds for the polynomial test functions listed in Table 1.

The bounds are shown in Table 2. The bounds $\underline{f}_{\mathbf{K}}^{(r)}$ were taken from [2], while the bounds $SA^{(r)}$ were computed via numerical integration, in particular using the Matlab routine sum2 of the package Chebfun [5].
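For readers without access to Chebfun, the simulated annealing column can be approximated with elementary quadrature. The sketch below computes $SA^{(r)}$ for the Matyas function using the temperature $t=e\cdot d\cdot\widehat{f}_{\max}/r$ stated earlier in this section and a midpoint rule on a $200\times 200$ grid; the quadrature and the names `matyas` and `sa_bound` are illustrative assumptions, not the routine used for Table 2.

```python
import math

def matyas(x1, x2):
    """Matyas function in the scaling of Table 1."""
    return 26 * (x1**2 + x2**2) - 48 * x1 * x2

def sa_bound(f, f_hat_max, d, r, m=200):
    """Midpoint-rule estimate of SA^(r) on [-1,1]^2 at t = e * d * f_hat_max / r."""
    t = math.e * d * f_hat_max / r
    h = 2.0 / m
    pts = [-1.0 + (i + 0.5) * h for i in range(m)]
    num = den = 0.0
    for x1 in pts:
        for x2 in pts:
            v = f(x1, x2)
            w = math.exp(-v / t)   # unnormalised Boltzmann weight
            num += v * w
            den += w
    return num / den

for r in (3, 10, 20):
    print(r, sa_bound(matyas, 100.0, 2, r))
```

The value for $r=3$ should land close to the entry $15.4212$ in Table 2, up to quadrature error.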

The results in the table show that the bound in Theorem 4.1 is far from tight for these examples. In fact, it may well be that the convergence rates of $\underline{f}_{\mathbf{K}}^{(r)}$ and $SA^{(r)}$ are different for convex $f$ . We know that $SA^{(r)}-f_{\min,{\mathbf{K}}}=\Theta(1/r)$ is the exact convergence rate for the simulated annealing bounds for convex $f$ (cf. Example 3.4), but it was speculated in [2] that one may in fact have $\underline{f}_{\mathbf{K}}^{(r)}-f_{\min,{\mathbf{K}}}=O(1/r^{2})$ , even for non-convex $f$ . Determining the exact convergence rate of $\underline{f}_{\mathbf{K}}^{(r)}$ remains an open problem.

Finally, one should point out that it is not really meaningful to compare the computational complexities of computing the two bounds $\underline{f}_{\mathbf{K}}^{(r)}$ and $SA^{(r)}$ , as explained below.

For any polynomial $f$ and convex body ${\mathbf{K}}$ , $\underline{f}^{(r)}_{\mathbf{K}}$ may be computed by solving a generalised eigenvalue problem with matrices of order ${n+r\choose r}$ , as long as the moments of the Lebesgue measure on ${\mathbf{K}}$ are known. The generalised eigenvalue computation may be done in $O\left({n+r\choose r}^{3}\right)$ operations; see [3] for details. Thus this is a polynomial-time procedure for fixed values of $r$ .
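The growth of this cost is easy to tabulate. The short sketch below prints the matrix order ${n+r\choose r}$ and its cube for $n=2$ , as a rough operation count for the dense generalised eigenvalue computation; the helper name is hypothetical and constant factors are ignored.

```python
import math

def matrix_order(n, r):
    """Order of the matrices A and B, i.e. the size of N(n, r)."""
    return math.comb(n + r, r)

for r in (5, 10, 20):
    size = matrix_order(2, r)
    print(r, size, size**3)   # r, matrix order, cubic operation count
```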

For non-convex $f$ , the complexity of computing $\mathop{\mathbb{E}}_{X\sim P_{f/t}}[f(X)]$ is not known. When $f$ is linear, it is shown in [1] that $\mathop{\mathbb{E}}_{X\sim P_{rf}}[f(X)]$ with $t=O(1/r)$ may be obtained in $O^{*}\left(n^{4.5}\log(r)\right)$ membership oracle calls for ${\mathbf{K}}$ , where the $O^{*}(\cdot)$ notation suppresses logarithmic factors.

Since the assumptions on the available information are different for the two types of bounds, there is no simple way to compare the respective complexities.

Bibliography

[1] J. Abernethy and E. Hazan. Faster Convex Optimization: Simulated Annealing with an Efficient Universal Barrier. arXiv:1507.02528, July 2015.
[2] E. de Klerk, R. Hess and M. Laurent. Improved convergence rates for Lasserre-type hierarchies of upper bounds for box-constrained polynomial optimization. SIAM J. Optim. (to appear), arXiv:1603.03329v1 (2016).
[3] E. de Klerk, J.-B. Lasserre, M. Laurent, and Z. Sun. Bound-constrained polynomial optimization using only elementary calculations. Mathematics of Operations Research (to appear), arXiv:1507.04404v2 (2016).
[4] E. de Klerk, M. Laurent, Z. Sun. Convergence analysis for Lasserre's measure-based hierarchy of upper bounds for polynomial optimization. Math. Program. Ser. A (2016). doi:10.1007/s10107-016-1043-1.
[5] T. A. Driscoll, N. Hale, and L. N. Trefethen, editors. Chebfun Guide. Pafnuty Publications, Oxford, 2014.
[6] A. T. Kalai and S. Vempala. Simulated annealing for convex optimization. Mathematics of Operations Research 31(2), 253–266 (2006).
[7] S. Kirkpatrick, C.D. Gelatt, Jr., M.P. Vecchi. Optimization by simulated annealing. Science 220, 671–680 (1983).
[8] J.B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11, 796–817 (2001).
