Stochastic Polynomial Optimization

Jiawang Nie; Liu Yang; Suhan Zhong

arXiv:1908.05689·math.OC·August 19, 2019·Optim. Methods Softw.


TL;DR

This paper introduces a stochastic polynomial optimization framework using sample averages and Moment-SOS relaxations, analyzing its properties and demonstrating effectiveness through numerical experiments.

Contribution

It presents a novel approach combining sample average approximation with Moment-SOS relaxations for stochastic polynomial optimization.

Findings

1. Effective solution method demonstrated via numerical experiments

2. Properties of the proposed optimization and relaxations analyzed

3. Sample average approach integrated with Moment-SOS relaxations

Abstract

This paper studies stochastic optimization problems with polynomials. We propose an optimization model with sample averages and perturbations. The Lasserre type Moment-SOS relaxations are used to solve the sample average optimization. Properties of the optimization and its relaxations are studied. Numerical experiments are presented.

Tables (10)

Table 1. Performance of PSAA for Example 4.1

	$\epsilon$	0	$10^{-4}$	$10^{-3}$	$10^{-2}$	$10^{-1}$
I	solvable?	yes	yes	yes	yes	yes
	time (sec.)	1.13	1.34	1.23	1.23	1.23
	$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	1.29e-06	6.41e-07	2.38e-07	1.30e-07	4.75e-08
	$|\langle f_{N}, y^{*}\rangle - f_{\min}|$	2.05e-02	2.05e-02	2.07e-02	3.41e-02	3.33e-01
II	solvable?	yes	yes	yes	yes	yes
	time (sec.)	1.12	1.29	1.23	1.21	1.22
	$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	9.15e-07	6.41e-07	2.06e-07	1.27e-07	1.53e-08
	$|\langle f_{N}, y^{*}\rangle - f_{\min}|$	4.84e-04	4.86e-04	6.91e-04	1.38e-02	3.13e-01
III	solvable?	no	yes	yes	yes	yes
	time (sec.)	2.14	2.78	1.40	1.16	1.15
	$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	n.a.	1.13e+03	7.50e-08	1.25e-07	8.22e-09
	$|\langle f_{N}, y^{*}\rangle - f_{\min}|$	n.a.	7.14e+05	3.85e-02	7.00e-03	2.94e-01
IV	solvable?	no	yes	yes	yes	yes
	time (sec.)	2.19	2.79	1.40	1.14	1.16
	$|\langle f_{N}, y\rangle - f_{N}(u)|$	n.a.	1.35e+03	1.89e-07	1.26e-07	1.54e-08
	$|\langle f_{N}, y\rangle - f_{\min}|$	n.a.	6.52e+05	4.73e-04	1.32e-02	3.14e-01

Table 2. Performance of PSAA for Example 4.2

	$\epsilon$	0	$10^{-4}$	$10^{-3}$	$10^{-2}$
I	solvable?	yes	yes	yes	yes
	time (sec.)	0.37	0.29	0.15	0.26
	$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	1.81e+05	1.29e-08	5.28e-09	9.53e-09
	$|\langle f_{N}, y^{*}\rangle - f_{\min}|$	1.81e+05	1.48e-02	1.48e-02	1.47e-02
II	solvable?	yes	yes	yes	yes
	time (sec.)	0.23	0.12	0.14	0.12
	$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	1.87e+05	1.29e-08	5.61e-09	9.64e-09
	$|\langle f_{N}, y^{*}\rangle - f_{\min}|$	1.87e+05	6.13e-03	6.12e-03	6.12e-03
III	solvable?	yes	yes	yes	yes
	time (sec.)	0.12	0.13	0.10	0.09
	$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	5.58e-02	1.29e-08	5.61e-09	9.70e-09
	$|\langle f_{N}, y^{*}\rangle - f_{\min}|$	1.40e-02	8.56e-03	8.56e-03	8.55e-03

Table 3. Performance of PSAA for Example 4.3

	$\epsilon$	0	$10^{-4}$	$10^{-3}$
I	solvable?	yes	yes	yes
	time (sec.)	0.17	0.13	0.12
	$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	1.86e+03	4.18e-06	3.55e-06
	$|\langle f_{N}, y^{*}\rangle - f_{\min}|$	3.44e+01	2.09e-04	2.30e-02
II	solvable?	yes	yes	yes
	time (sec.)	0.13	0.14	0.16
	$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	1.85e+03	4.17e-06	3.56e-06
	$|\langle f_{N}, y^{*}\rangle - f_{\min}|$	3.44e+01	1.29e-02	9.87e-03

Table 4. Performance of PSAA for Example 4.4.

I	$\epsilon$	0	0.0012	0.004	0.008
	solvable?	no	yes	yes	yes
	time (sec.)	0.12	0.10	0.07	0.09
	$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	n.a.	5.31e-03	3.36e-04	1.09e-04
	$|\langle f_{N}, y^{*}\rangle - f_{\min}|$	n.a.	5.44e-03	4.61e-04	2.34e-04
II	$\epsilon$	0	$10^{-4}$	$10^{-3}$	$10^{-2}$
	solvable?	yes	yes	yes	yes
	time (sec.)	0.08	0.08	0.07	0.06
	$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	5.03e-09	1.84e-09	1.61e-09	1.86e-09
	$|\langle f_{N}, y^{*}\rangle - f_{\min}|$	2.50e-04	2.50e-04	2.50e-04	2.50e-04

Table 5. The values of $\epsilon^{*}$ for Example 4.5.

	I	II	III	IV
$N$	500	1000	5000	10000
$\bar{\xi}$	2.11	1.96	2.01	2.02
$s_{2}$	6.43	5.71	6.13	6.07
$\epsilon^{*}$	0.807543	3.3618e-10	0.073413	0.146826

Table 6. Performance of PSAA for Example 4.5.

	III			IV
$\epsilon$	0	$\epsilon^{*}$	0.1	0	$\epsilon^{*}$	0.2
solvable?	no	yes	yes	no	yes	yes
time (sec.)	0.23	0.20	0.11	0.10	0.16	0.07
$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	n.a.	5.30e+01	2.02e-01	n.a.	6.47e+01	2.28e-01
$|\langle f_{N}, y\rangle - f_{\min}|$	n.a.	5.39e+01	3.90e-01	n.a.	6.49e+01	2.96e-01
$\|u - v^{*}\|$	n.a.	3.09e-02	1.28e-01	n.a.	5.86e-02	2.73e-01
$u$	n.a.	$\begin{bmatrix} 1.0216 \\ 0.0035 \\ 0.0035 \\ -1.0216 \end{bmatrix}$	$\begin{bmatrix} 0.9102 \\ 0.0071 \\ 0.0071 \\ -0.9102 \end{bmatrix}$	n.a.	$\begin{bmatrix} 0.9591 \\ 0.0059 \\ 0.0059 \\ -0.9591 \end{bmatrix}$	$\begin{bmatrix} 0.8070 \\ 0.0085 \\ 0.0085 \\ -0.8070 \end{bmatrix}$

Table 7. The values of $\epsilon^{*}$ for Example 4.6.

	I	II	III	IV
$N$	500	1000	5000	10000
$s_{1}$	0.98	1.08	1.01	0.97
$s_{2}$	1.06	0.96	1.02	0.96
$\epsilon^{*}$	0.023094	0.023094	0.017321	1.1076e-09

Table 8. Performance of PSAA for Example 4.6.

	II			III
$\epsilon$	0	$\epsilon^{*}$	0.05	0	$\epsilon^{*}$	0.05
solvable?	no	yes	yes	no	yes	yes
time (sec.)	0.06	0.21	0.06	0.06	0.13	0.05
$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	n.a.	1.13e+02	1.36e-02	n.a.	5.65e+00	6.25e-03
$|\langle f_{N}, y\rangle - f_{\min}|$	n.a.	1.13e+02	4.21e-03	n.a.	5.65e+00	2.80e-03
$\|u - v^{*}\|$	n.a.	1.20e-03	1.85e-02	n.a.	6.08e-03	2.58e-02
$u$	n.a.	$\begin{bmatrix} 1.0000 \\ 0.7059 \end{bmatrix}$	$\begin{bmatrix} 1.0000 \\ 0.6886 \end{bmatrix}$	n.a.	$\begin{bmatrix} 1.0000 \\ 0.7010 \end{bmatrix}$	$\begin{bmatrix} 1.0000 \\ 0.6813 \end{bmatrix}$

Table 9. The values of $\epsilon^{*}$ for Example 4.7.

	I	II	III
$N$	500	1000	5000
$\bar{\xi}$	0.99	1.03	1.00
$s_{2}$	1.32	1.38	1.33
$s_{3}$	1.97	2.09	1.99
$\epsilon^{*}$	0.508637	0.518810	0.508637

Table 10. Performance of PSAA for Example 4.7.

	I		II		III
$\epsilon$	0	$\epsilon^{*}$	0	$\epsilon^{*}$	0	$\epsilon^{*}$
solvable?	yes	yes	yes	yes	yes	yes
time (sec.)	0.07	0.06	0.07	0.06	0.08	0.06
$|\langle f_{N}, y^{*}\rangle - f_{N}(u)|$	7.08e+01	1.07e-06	7.08e+01	8.44e-07	7.08e+01	1.07e-06
$|\langle f_{N}, y\rangle - f_{\min}|$	2.07e+01	2.00e-02	2.07e+01	1.00e-02	2.07e+01	1.00e-02
$\|u - v^{*}\|$	1.67e+00	2.97e-08	1.67e+00	2.97e-08	1.67e+00	2.97e-08
$u$	$\begin{bmatrix} 1.9637 \\ 1.9637 \\ 1.9630 \end{bmatrix}$	$\begin{bmatrix} 1.0000 \\ 1.0000 \\ 1.0000 \end{bmatrix}$	$\begin{bmatrix} 1.9631 \\ 1.9631 \\ 1.9627 \end{bmatrix}$	$\begin{bmatrix} 1.0000 \\ 1.0000 \\ 1.0000 \end{bmatrix}$	$\begin{bmatrix} 1.9631 \\ 1.9631 \\ 1.9627 \end{bmatrix}$	$\begin{bmatrix} 1.0000 \\ 1.0000 \\ 1.0000 \end{bmatrix}$

Equations (161)

\min_{x\in K}\ f(x):=\mathbb{E}[F(x,\xi)]

F(x,\xi):=\sum_{\alpha=(\alpha_{1},\ldots,\alpha_{n})}c_{\alpha}(\xi)\,x_{1}^{\alpha_{1}}\cdots x_{n}^{\alpha_{n}}

K:=\{x\in\mathbb{R}^{n}:\ g_{1}(x)\geq 0,\ldots,g_{m}(x)\geq 0\},

\left\{\begin{array}{rl}\min&f(x):=\mathbb{E}\big[F(x,\xi)\big]\\ \mathrm{s.t.}&g_{1}(x)\geq 0,\ldots,g_{m}(x)\geq 0.\end{array}\right.

f_{N}(x):=\frac{1}{N}\sum_{k=1}^{N}F(x,\xi^{(k)}).

f_{N}(x)\to f(x)\quad\mbox{as}\quad N\to\infty,

\left\{\begin{array}{rl}\min&f_{N}(x)\\ \mathrm{s.t.}&g_{1}(x)\geq 0,\ldots,g_{m}(x)\geq 0.\end{array}\right.

\mbox{dist}(A,B):=\max\left\{\sup_{x\in A}\inf_{y\in B}\|x-y\|,\ \sup_{x\in B}\inf_{y\in A}\|x-y\|\right\}.

\left\{\begin{array}{rl}\min&f_{N}(x)+\epsilon\|[x]_{2d}\|\\ \mathrm{s.t.}&g_{1}(x)\geq 0,\ldots,g_{m}(x)\geq 0,\end{array}\right.

2d\geq\max\{\deg(f_{N}),\deg(g_{1}),\ldots,\deg(g_{m})\}

|\alpha|:=\alpha_{1}+\cdots+\alpha_{n},\quad x^{\alpha}:=x_{1}^{\alpha_{1}}\cdots x_{n}^{\alpha_{n}}.

\mathbb{N}_{d}^{n}:=\{\alpha\in\mathbb{N}^{n}:\ |\alpha|\leq d\}.

[x]_{k}:=\begin{bmatrix}1&x_{1}&\cdots&x_{n}&x_{1}^{2}&x_{1}x_{2}&\cdots&x_{n}^{k}\end{bmatrix}^{T}.

Q(g):=\Sigma[x]+g_{1}\cdot\Sigma[x]+\cdots+g_{m}\cdot\Sigma[x].

Q(g)_{2d}:=\Sigma[x]_{2d}+g_{1}\cdot\Sigma[x]_{2d-\deg(g_{1})}+\cdots+g_{m}\cdot\Sigma[x]_{2d-\deg(g_{m})}.

\cdots\subseteq Q(g)_{2d}\subseteq Q(g)_{2d+2}\subseteq\cdots\subseteq Q(g).

\mathbb{R}^{\mathbb{N}_{d}^{n}}:=\{y=(y_{\alpha})_{\alpha\in\mathbb{N}_{d}^{n}}:\ y_{\alpha}\in\mathbb{R}\}.

\mathscr{R}_{y}\Big({\sum}_{\alpha\in\mathbb{N}_{d}^{n}}f_{\alpha}x^{\alpha}\Big):={\sum}_{\alpha\in\mathbb{N}_{d}^{n}}f_{\alpha}y_{\alpha}.

\langle f,y\rangle:=\mathscr{R}_{y}(f).

vec(a)^{T}\Big(L_{p}^{(d)}[y]\Big)vec(b)=\mathscr{R}_{y}(pab)

L_{p}^{(3)}[y]=\begin{bmatrix}
y_{110}-y_{003}&y_{210}-y_{103}&y_{120}-y_{013}&y_{111}-y_{004}\\
y_{210}-y_{103}&y_{310}-y_{203}&y_{220}-y_{113}&y_{211}-y_{104}\\
y_{120}-y_{013}&y_{220}-y_{113}&y_{130}-y_{023}&y_{121}-y_{014}\\
y_{111}-y_{004}&y_{211}-y_{104}&y_{121}-y_{014}&y_{112}-y_{005}
\end{bmatrix}.

M_{d}[y]:=L_{1}^{(d)}[y].

M_{2}[y]=\begin{bmatrix}
y_{000}&y_{100}&y_{010}&y_{001}&y_{200}&y_{110}&y_{101}&y_{020}&y_{011}&y_{002}\\
y_{100}&y_{200}&y_{110}&y_{101}&y_{300}&y_{210}&y_{201}&y_{120}&y_{111}&y_{102}\\
y_{010}&y_{110}&y_{020}&y_{011}&y_{210}&y_{120}&y_{111}&y_{030}&y_{021}&y_{012}\\
y_{001}&y_{101}&y_{011}&y_{002}&y_{201}&y_{111}&y_{102}&y_{021}&y_{012}&y_{003}\\
y_{200}&y_{300}&y_{210}&y_{201}&y_{400}&y_{310}&y_{301}&y_{220}&y_{211}&y_{202}\\
y_{110}&y_{210}&y_{120}&y_{111}&y_{310}&y_{220}&y_{211}&y_{130}&y_{121}&y_{112}\\
y_{101}&y_{201}&y_{111}&y_{102}&y_{301}&y_{211}&y_{202}&y_{121}&y_{112}&y_{103}\\
y_{020}&y_{120}&y_{030}&y_{021}&y_{220}&y_{130}&y_{121}&y_{040}&y_{031}&y_{022}\\
y_{011}&y_{111}&y_{021}&y_{012}&y_{211}&y_{121}&y_{112}&y_{031}&y_{022}&y_{013}\\
y_{002}&y_{102}&y_{012}&y_{003}&y_{202}&y_{112}&y_{103}&y_{022}&y_{013}&y_{004}
\end{bmatrix}.

\mathscr{S}(g)_{2d}:=\Big\{y\in\mathbb{R}^{\mathbb{N}^{n}_{2d}}\ \Big|\ M_{d}[y]\succeq 0,\,L_{g_{1}}^{(d)}[y]\succeq 0,\ldots,L_{g_{m}}^{(d)}[y]\succeq 0\Big\}.

\pi:\ \mathbb{R}^{\mathbb{N}_{2d}^{n}}\to\mathbb{R}^{n},\quad y\mapsto u=(y_{e_{1}},\ldots,y_{e_{n}}).

K\subseteq\pi\Big(\mathscr{S}(g)_{2d}\cap\{y_{0}=1\}\Big).

f_{N}(x):=\frac{1}{N}\sum_{k=1}^{N}F(x,\xi^{(k)}).

d=\Big\lceil\frac{1}{2}\max\{\deg(f_{N}),\deg(g_{1}),\ldots,\deg(g_{m})\}\Big\rceil.

\left\{\begin{array}{rl}\min&f_{N}(x)+\epsilon\|[x]_{2d}\|\\ \mathrm{s.t.}&g_{1}(x)\geq 0,\ldots,g_{m}(x)\geq 0,\end{array}\right.

f_{N}(x)=\langle f_{N},[x]_{2d}\rangle,\quad M_{d}\big[[x]_{2d}\big]\succeq 0,\quad L_{g_{i}}^{(d)}\big[[x]_{2d}\big]\succeq 0


Taxonomy

Topics: Advanced Optimization Algorithms Research · Polynomial and algebraic computation · Risk and Portfolio Optimization

Full text

Stochastic Polynomial Optimization

Jiawang Nie, Liu Yang, and Suhan Zhong

Jiawang Nie and Suhan Zhong: Department of Mathematics, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, USA, 92093.

Liu Yang: School of Mathematics and Computational Sciences, Xiangtan University, Xiangtan, Hunan, China, 411105.

Abstract.

This paper studies stochastic optimization problems with polynomials. We propose an optimization model with sample averages and perturbations. The Lasserre type Moment-SOS relaxations are used to solve the sample average optimization. Properties of the optimization and its relaxations are studied. Numerical experiments are presented.

Key words and phrases:

stochastic optimization, polynomial, Moment-SOS relaxation, semidefinite program

2010 Mathematics Subject Classification:

90C15, 90C22, 90C31, 65K05

1. Introduction

Stochastic optimization is about functions that depend on random variables. A typical stochastic optimization problem is

$$\min_{x\in K}\ f(x):=\mathbb{E}[F(x,\xi)] \tag{1.1}$$

where $F:\mathbb{R}^{n}\times\mathbb{R}^{r}\to\mathbb{R}$ is a function in $(x,\xi)$. The variable $\xi$ is a random vector, and the decision variable $x\in\mathbb{R}^{n}$ is required to be contained in a set $K\subseteq\mathbb{R}^{n}$. In (1.1), the symbol $\mathbb{E}$ denotes the expectation of a function in the random vector $\xi$. Commonly used methods for solving stochastic optimization problems are based on sample average approximation (SAA). We refer to [5, 8, 17, 24, 25, 33, 35, 36, 38, 39, 41] for related work on stochastic optimization. The SAA methods use sample averages to approximate the expectation function $f(x)$, transforming the stochastic optimization into a deterministic optimization problem. Many classical SAA methods assume the objective functions are convex and are based on evaluations of gradients or subgradients. They can also be applied to nonconvex problems; however, global optimality may not be guaranteed. There is relatively little work on nonconvex stochastic optimization [1, 9, 10]. Generally, nonconvex stochastic optimization problems are computationally challenging, because the deterministic case is already difficult.

This paper discusses the special case of stochastic polynomial optimization, i.e., $F(x,\xi)$ is a polynomial function in $x$ and $K$ is a semialgebraic set defined by polynomials. When $F$ does not depend on $\xi$, this is the case of polynomial optimization. The Lasserre type Moment-SOS relaxations are efficient and reliable for solving polynomial optimization [20, 23]. When $F$ depends on the random vector $\xi$, the objective $f(x)$ is the expectation of $F(x,\xi)$ with respect to $\xi$. Hence, $f(x)$ is still a polynomial. When the distribution of $\xi$ is known explicitly (e.g., Gaussian, Poisson, etc.), the objective $f$ can be expressed by an integral formula. However, when the distribution of $\xi$ is not known exactly, or its density function is too complicated for evaluating the expectation, it is not practical to get an explicit formula for $f$. In most methods for stochastic optimization, the objective $f$ is approximated by sample averages. In this article, we discuss how to use the Moment-SOS relaxation for solving stochastic polynomial optimization with sample average approximation.

In stochastic polynomial optimization, we assume that

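$$F(x,\xi):=\sum_{\alpha=(\alpha_{1},\ldots,\alpha_{n})}c_{\alpha}(\xi)\,x_{1}^{\alpha_{1}}\cdots x_{n}^{\alpha_{n}}$$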

is a polynomial in $x\in\mathbb{R}^{n}$ . Here, each coefficient $c_{\alpha}(\xi)$ is a measurable function in $\xi$ . The feasible set $K$ is assumed to be in the form

$$K:=\{x\in\mathbb{R}^{n}:\ g_{1}(x)\geq 0,\ldots,g_{m}(x)\geq 0\}, \tag{1.2}$$

for given polynomials $g_{1},\ldots,g_{m}$ in $x\in\mathbb{R}^{n}$ . For each $x$ , $F(x,\xi)$ is a measurable function in $\xi$ . So the objective $f(x):=\mathbb{E}[F(x,\xi)]$ is also a polynomial. The stochastic optimization can be expressed as

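$$\left\{\begin{array}{rl}\min&f(x):=\mathbb{E}\big[F(x,\xi)\big]\\ \mathrm{s.t.}&g_{1}(x)\geq 0,\ldots,g_{m}(x)\geq 0.\end{array}\right. \tag{1.3}$$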

The coefficients of the polynomial $f$ are typically not known explicitly, because the true distribution of $\xi$ is usually not known exactly. However, they can be estimated by sample average approximation. In applications, we can generate samples of $\xi$ , say, $N$ random samples $\xi^{(1)},\xi^{(2)},...,\xi^{(N)}$ . The expectation function $f(x)=\mathbb{E}[F(x,\xi)]$ can be approximated by the sample average

$$f_{N}(x):=\frac{1}{N}\sum_{k=1}^{N}F(x,\xi^{(k)}).$$

If each sample $\xi^{(k)}$ has the same distribution as $\xi$, then $\mathbb{E}[f_{N}(x)]=f(x)$. Furthermore, when all $\xi^{(k)}$ are independent and identically distributed (i.i.d.), the Law of Large Numbers (LLN) (see [15]) implies that

$$f_{N}(x)\to f(x)\quad\mbox{as}\quad N\to\infty,$$

with probability one and under some regularity conditions. The resulting sample average approximation for (1.3) is

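$$\left\{\begin{array}{rl}\min&f_{N}(x)\\ \mathrm{s.t.}&g_{1}(x)\geq 0,\ldots,g_{m}(x)\geq 0.\end{array}\right. \tag{1.4}$$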

This is also a polynomial optimization problem. It can be solved globally by the Lasserre type Moment-SOS hierarchy of relaxations [18].
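As a minimal sketch of how the sample average $f_{N}$ can be formed in practice, the coefficients $c_{\alpha}(\xi)$ can be averaged one monomial at a time. The dictionary representation, the helper names `sample_average_poly` and `eval_poly`, and the particular $F$ and samples below are illustrative assumptions, not taken from the paper.

```python
def sample_average_poly(coeff_fns, samples):
    """Average each coefficient c_alpha(xi) over the given samples.

    coeff_fns: dict mapping an exponent tuple alpha to a function xi -> c_alpha(xi)
    samples:   list of sample values xi^(1), ..., xi^(N)
    Returns a dict alpha -> coefficient of x^alpha in f_N.
    """
    n_samples = len(samples)
    return {alpha: sum(c(xi) for xi in samples) / n_samples
            for alpha, c in coeff_fns.items()}


def eval_poly(poly, x):
    """Evaluate a polynomial given as {alpha: coefficient} at the point x."""
    total = 0.0
    for alpha, coef in poly.items():
        term = coef
        for x_i, a_i in zip(x, alpha):
            term *= x_i ** a_i
        total += term
    return total


# Hypothetical F(x, xi) = xi * x1^2 + xi^2 * x1 * x2 + x2^4 with scalar xi.
coeff_fns = {
    (2, 0): lambda xi: xi,
    (1, 1): lambda xi: xi ** 2,
    (0, 4): lambda xi: 1.0,
}
samples = [1.0, 2.0, 3.0]  # hypothetical draws of xi
f_N = sample_average_poly(coeff_fns, samples)
```

Because $F$ is linear in its coefficients, evaluating `f_N` at a point agrees with averaging $F(x,\xi^{(k)})$ over the samples, which is the definition of $f_{N}$ given above.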

Sample average approximation methods have good statistical properties. For convenience, denote by $\vartheta^{*},\vartheta_{N}$ the optimal values of (1.3) and (1.4) respectively, and denote by $S,S_{N}$ their optimizer sets respectively. Assume that: i) $f$ is continuous and $S$ is nonempty; ii) there is a compact set $C\subseteq\mathbb{R}^{n}$ such that $S\subseteq C$ and $f_{N}$ converges uniformly to $f$ on $C$, with probability one; iii) for all $N$ large enough, $\emptyset\not=S_{N}\subseteq C$. Then, it can be shown (see [39]) that $\vartheta_{N}\longrightarrow\vartheta^{*}$ and $\mbox{dist}(S_{N},S)\longrightarrow 0$ as $N\rightarrow\infty$, with probability one. Here, for sets $A,B\subseteq\mathbb{R}^{n}$, their distance is defined as

$\mbox{dist}(A,B):=\max\left\{\sup_{x\in A}\inf_{y\in B}\|x-y\|,\,\sup_{x\in B}\inf_{y\in A}\|x-y\|\right\}.$

Moreover, if $\xi^{(1)},\xi^{(2)},\ldots,\xi^{(N)}$ are independent and identically distributed as $\xi$, then $\mathbb{E}[\vartheta_{N}]\,\leq\,\mathbb{E}[\vartheta_{N+1}]\,\leq\,\vartheta^{*}$. That is, as the sample size $N$ increases, the sample average optimization gives a better approximation for (1.3). We refer to [39] for more details about the convergence of the sample average optimization.
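For finite point sets the suprema and infima in the definition of $\mbox{dist}(A,B)$ become maxima and minima, so the set distance can be computed directly. The sketch below assumes finite sets of points given as tuples; the function name `hausdorff_distance` and the sample sets are illustrative only.

```python
import math


def hausdorff_distance(A, B):
    """dist(A, B) for finite, nonempty point sets A and B in R^n."""

    def directed(P, Q):
        # sup over p in P of the distance from p to the set Q
        return max(min(math.dist(p, q) for q in Q) for p in P)

    return max(directed(A, B), directed(B, A))


# Hypothetical optimizer sets: S_N contains a spurious extra minimizer.
S_N = [(0.0, 0.0), (1.0, 0.0)]
S = [(0.0, 0.0)]
d = hausdorff_distance(S_N, S)
```

In this example one directed distance is zero and the other is one, so the symmetric maximum is what registers the extra point in `S_N`.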

When $F(x,\xi)$ is a polynomial in $x$ , the sample average approximation (1.4) is also a polynomial optimization problem. The Lasserre type Moment-SOS relaxations can be applied to solve it. However, the following concerns need to be addressed:

• For given samples $\xi^{(1)},\ldots,\xi^{(N)}$, the optimizer set $S_{N}$ of (1.4) may (or may not) be far away from the optimizer set $S$ of (1.3). For instance, it is possible that (1.3) is bounded from below and has a global minimizer, while (1.4) is unbounded from below and has no global minimizers.

• The sample average $f_{N}(x)$ is only an approximation for the objective $f(x)$. Usually, the optimizer sets of (1.3) and (1.4) are not exactly the same. Therefore, the sample average approximation (1.4) does not need to be solved exactly. However, we still expect that the optimizer set $S_{N}$ of (1.4) (if it is nonempty) is a good approximation for the optimizer set $S$ of (1.3), though they might be very different. Generally, the optimizer set $S$ cannot be determined exactly, unless the objective $f(x)$ can be determined exactly.

To address the above concerns, we propose the following perturbation sample average approximation (PSAA) model

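$$\left\{\begin{array}{rl}\min&f_{N}(x)+\epsilon\|[x]_{2d}\|\\ \mathrm{s.t.}&g_{1}(x)\geq 0,\ldots,g_{m}(x)\geq 0,\end{array}\right. \tag{1.5}$$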

where $d$ is the smallest integer such that

$$2d\geq\max\{\deg(f_{N}),\deg(g_{1}),\ldots,\deg(g_{m})\}$$

and $[x]_{2d}$ is the vector of monomials in $x$ and with degrees at most $2d$ (see (2.1)). The norm $\|\cdot\|$ is the standard Euclidean norm. We use the Lasserre type Moment-SOS relaxation of degree $2d$ for solving (1.5). A small parameter $\epsilon>0$ is often selected, for (1.5) to approximate (1.3) well. In this article, we discuss properties of (1.5), as well as its Moment-SOS relaxations. The perturbation term $\epsilon\|[x]_{2d}\|$ plays an important role in sample average approximation. The paper is organized as follows. We review some basics for polynomial optimization in Section 2. The properties of the perturbed sample average optimization (1.5) are discussed in Section 3. The numerical experiments are given in Section 4.
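A small univariate illustration of why the perturbation term helps, under assumptions chosen purely for this sketch: take $F(x,\xi)=\xi x^{4}$ with no constraints, and three hypothetical samples whose mean is negative. The sample average is then unbounded from below, while $f_{N}(x)+\epsilon\|[x]_{4}\|$ is bounded from below once $\epsilon$ exceeds the magnitude of the negative leading coefficient, since $\|[x]_{4}\|\geq x^{4}$.

```python
import math

# Hypothetical draws of xi; their mean is about -0.1.
samples = [-1.0, 0.5, 0.2]
c4 = sum(samples) / len(samples)


def f_N(x):
    """Sample average of F(x, xi) = xi * x^4 over the samples above."""
    return c4 * x ** 4


def monomial_norm(x, two_d=4):
    """Euclidean norm of [x]_{2d} = (1, x, x^2, ..., x^{2d}) for scalar x."""
    return math.sqrt(sum(x ** (2 * j) for j in range(two_d + 1)))


def perturbed(x, eps):
    """Objective of the perturbed model: f_N(x) + eps * ||[x]_{2d}||."""
    return f_N(x) + eps * monomial_norm(x)


grid = [k / 2.0 for k in range(-200, 201)]
min_unperturbed = min(f_N(x) for x in grid)          # decreases as the grid widens
min_perturbed = min(perturbed(x, 0.2) for x in grid)  # stays positive for eps = 0.2
```

The grid search is only a visual check and is not how the paper solves the problem; the paper uses Moment-SOS relaxations for that purpose.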

2. Preliminaries

Notation

The symbols $\mathbb{N},\mathbb{R}$ denote the sets of nonnegative integers and real numbers, respectively. For given $x\in\mathbb{R}^{n}$ and a real scalar $r>0$, $B(x,r)$ denotes the closed ball in $\mathbb{R}^{n}$ centered at $x$ with radius $r$, under the standard Euclidean norm. For a real symmetric matrix $X$, we write $X\succeq 0$ (resp., $X\succ 0$) to mean that $X$ is positive semidefinite (resp., positive definite). The symbol $\mathbb{R}[x]:=\mathbb{R}[x_{1},\ldots,x_{n}]$ denotes the ring of polynomials with real coefficients in $x:=(x_{1},\ldots,x_{n})$. For a polynomial $f$, $\deg(f)$ refers to its total degree. For a tuple of polynomials $p=(p_{1},p_{2},\ldots,p_{m})$, $\deg(p)$ refers to the maximum of the degrees of the $p_{i}$. For a degree $d$, $\mathbb{R}[x]_{d}$ stands for the space of all real polynomials in $x$ of degree at most $d$. For a nonnegative integer vector $\alpha=(\alpha_{1},\ldots,\alpha_{n})\in\mathbb{N}^{n}$, denote

$$|\alpha|:=\alpha_{1}+\cdots+\alpha_{n},\quad x^{\alpha}:=x_{1}^{\alpha_{1}}\cdots x_{n}^{\alpha_{n}}.$$

For convenience, denote the monomial power set

$$\mathbb{N}_{d}^{n}:=\{\alpha\in\mathbb{N}^{n}:\ |\alpha|\leq d\}.$$

For a degree $k$ , $[x]_{k}$ denotes the vector of all monomials of degrees at most $k$ , ordered in the graded lexicographic ordering, i.e.,

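$$[x]_{k}:=\begin{bmatrix}1&x_{1}&\cdots&x_{n}&x_{1}^{2}&x_{1}x_{2}&\cdots&x_{n}^{k}\end{bmatrix}^{T}. \tag{2.1}$$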

(The superscript T denotes the transpose of a vector or matrix.) For $t\in\mathbb{R}$ , $\lceil t\rceil$ denotes the smallest integer greater than or equal to $t$ . For a function $p(\xi)$ in a random vector $\xi$ , $\mathbb{E}[p(\xi)]$ stands for the expectation of $p(\xi)$ , with respect to the distribution of $\xi$ .
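The graded lexicographic ordering of $[x]_{k}$ can be sketched as follows: monomials are grouped by total degree, and within each degree the exponent tuples are listed with higher powers of $x_{1}$ first. The helper names `monomial_exponents` and `monomial_vector` are assumptions made for this illustration.

```python
from itertools import product
from math import prod


def monomial_exponents(n, k):
    """Exponent tuples alpha with |alpha| <= k, in graded lexicographic order."""
    exps = []
    for deg in range(k + 1):
        layer = [a for a in product(range(deg + 1), repeat=n) if sum(a) == deg]
        # Within a degree, x1^2 precedes x1*x2, which precedes x2^2, and so on.
        layer.sort(reverse=True)
        exps.extend(layer)
    return exps


def monomial_vector(x, k):
    """The vector [x]_k of all monomials of degree at most k, evaluated at x."""
    return [prod(x_i ** a_i for x_i, a_i in zip(x, alpha))
            for alpha in monomial_exponents(len(x), k)]


order = monomial_exponents(3, 2)
values = monomial_vector((2.0, 3.0), 2)  # entries are 1, x1, x2, x1^2, x1*x2, x2^2
```

For $n=3$ and $k=2$ the resulting order is $1,x_{1},x_{2},x_{3},x_{1}^{2},x_{1}x_{2},x_{1}x_{3},x_{2}^{2},x_{2}x_{3},x_{3}^{2}$, which is the same order that labels the rows and columns of a moment matrix $M_{2}[y]$.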

A polynomial $\sigma$ is said to be a sum of squares (SOS) if $\sigma=s_{1}^{2}+s_{2}^{2}+\cdots+s_{k}^{2}$, for some $k\in\mathbb{N}$ and polynomials $s_{1},s_{2},\ldots,s_{k}\in\mathbb{R}[x]$. Clearly, if $\sigma$ is SOS and has degree $2d$, then each $s_{i}$ must have degree at most $d$. We use $\Sigma[x]$ to denote the cone of all SOS polynomials, and $\Sigma[x]_{2d}$ to denote the truncation of SOS polynomials in $\mathbb{R}[x]_{2d}$. Checking whether a polynomial is SOS or not can be done by solving a semidefinite program [18, 31].

For a tuple $g:=(g_{1},\ldots,g_{m})$ of polynomials in $\mathbb{R}[x]$ , its quadratic module is the set

$$Q(g):=\Sigma[x]+g_{1}\cdot\Sigma[x]+\cdots+g_{m}\cdot\Sigma[x].$$

The $2d$ -th truncation of $Q(g)$ is

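$$Q(g)_{2d}:=\Sigma[x]_{2d}+g_{1}\cdot\Sigma[x]_{2d-\deg(g_{1})}+\cdots+g_{m}\cdot\Sigma[x]_{2d-\deg(g_{m})}.$$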

The following nesting relation of containment clearly holds:

$$\cdots\subseteq Q(g)_{2d}\subseteq Q(g)_{2d+2}\subseteq\cdots\subseteq Q(g).$$

Indeed, each $Q(g)_{2d}$ is a convex cone in the space $\mathbb{R}[x]_{2d}$. The tuple $g$ determines the semialgebraic set in (1.2). Obviously, if $f\in Q(g)$, then $f\geq 0$ on $K$. To ensure $f\in Q(g)$, we often require $f>0$ on $K$. The quadratic module $Q(g)$ is said to be archimedean if there exists a single polynomial $p\in Q(g)$ such that the inequality $p(x)\geq 0$ defines a compact set in $\mathbb{R}^{n}$. If $Q(g)$ is archimedean, then the set $K$ must be compact. The converse is not necessarily true. However, if $K$ is compact (say, $K\subseteq B(0,R)$ for some radius $R$), one can always enforce $Q(g)$ to be archimedean by adding the redundant polynomial $R^{2}-\|x\|^{2}$ to the tuple $g$. When $Q(g)$ is archimedean, if $f>0$ on $K$, then we must have $f\in Q(g)$. This conclusion is referred to as Putinar’s Positivstellensatz, which was shown in [34]. Interestingly, when $f\geq 0$ on $K$, we still have $f\in Q(g)$, under some optimality conditions [28].

For a given dimension $n$ and degree $d$ , denote by $\mathbb{R}^{\mathbb{N}_{d}^{n}}$ the space of real vectors that are indexed by $\alpha\in\mathbb{N}^{n}_{d}$ , i.e.,

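$$\mathbb{R}^{\mathbb{N}_{d}^{n}}:=\{y=(y_{\alpha})_{\alpha\in\mathbb{N}_{d}^{n}}:\ y_{\alpha}\in\mathbb{R}\}.$$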

Each vector in $\mathbb{R}^{\mathbb{N}_{d}^{n}}$ is called a truncated multi-sequence (tms) of degree $d$ . A tms $y\in\mathbb{R}^{\mathbb{N}_{d}^{n}}$ gives the linear functional $\mathscr{R}_{y}$ acting on $\mathbb{R}[x]_{d}$ as

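$$\mathscr{R}_{y}\Big({\sum}_{\alpha\in\mathbb{N}_{d}^{n}}f_{\alpha}x^{\alpha}\Big):={\sum}_{\alpha\in\mathbb{N}_{d}^{n}}f_{\alpha}y_{\alpha}.$$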

The $\mathscr{R}_{y}$ is called a Riesz functional. For $f\in\mathbb{R}[x]_{d}$ and $y\in\mathbb{R}^{\mathbb{N}_{d}^{n}}$ , we denote

$$\langle f,y\rangle:=\mathscr{R}_{y}(f).$$

The tms $y\in\mathbb{R}^{\mathbb{N}_{d}^{n}}$ is said to admit a representing measure supported in a set $T\subseteq\mathbb{R}^{n}$ if there exists a Borel measure $\mu$, supported in $T$, such that $y_{\alpha}=\int x^{\alpha}\mathtt{d}\mu$ for all $\alpha\in\mathbb{N}_{d}^{n}$. This is equivalent to requiring that $\langle f,y\rangle=\int f(x)\mathtt{d}\mu$ for all $f\in\mathbb{R}[x]_{d}$. Such $\mu$ is called a $T$-representing measure for $y$. We refer to [7, 12, 29] for recent work on truncated moment problems.
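To make the link between a tms and a representing measure concrete, the sketch below builds the moments of a finitely atomic measure $\mu=\sum_{i}\lambda_{i}\delta_{u_{i}}$ and checks that the Riesz functional reproduces the integral of a polynomial. The atoms, weights, polynomial, and helper names are hypothetical choices for this illustration.

```python
from itertools import product
from math import prod


def exponents_up_to(n, d):
    """All alpha in N^n with |alpha| <= d (order is irrelevant for a dict)."""
    return [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]


def monomial(u, alpha):
    return prod(u_i ** a_i for u_i, a_i in zip(u, alpha))


def moments_of_atomic_measure(atoms, weights, d):
    """tms y with y_alpha = sum_i weights[i] * atoms[i]^alpha for |alpha| <= d."""
    n = len(atoms[0])
    return {alpha: sum(w * monomial(u, alpha) for u, w in zip(atoms, weights))
            for alpha in exponents_up_to(n, d)}


def riesz(f, y):
    """<f, y> = sum_alpha f_alpha * y_alpha, for f given as {alpha: f_alpha}."""
    return sum(coef * y[alpha] for alpha, coef in f.items())


def eval_poly(f, x):
    return sum(coef * monomial(x, alpha) for alpha, coef in f.items())


atoms = [(1.0, 0.0), (0.0, 2.0)]   # support points u_1, u_2
weights = [0.5, 1.5]               # positive weights lambda_1, lambda_2
y = moments_of_atomic_measure(atoms, weights, 2)
f = {(0, 0): 1.0, (1, 1): 3.0, (0, 2): -2.0}   # f = 1 + 3*x1*x2 - 2*x2^2
integral = sum(w * eval_poly(f, u) for u, w in zip(atoms, weights))
pairing = riesz(f, y)
```

The two quantities `integral` and `pairing` coincide because the Riesz functional is linear in the coefficients of $f$ and each $y_{\alpha}$ is the integral of $x^{\alpha}$.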

For a polynomial $p\in\mathbb{R}[x]_{2d}$ , the $d$ th localizing matrix of $p$ associated to a tms $y\in\mathbb{R}^{\mathbb{N}^{n}_{2d}}$ , is the symmetric matrix $L_{p}^{(d)}[y]$ such that

$$vec(a)^{T}\Big(L_{p}^{(d)}[y]\Big)vec(b)=\mathscr{R}_{y}(pab)$$

for all polynomials $a,b\in\mathbb{R}[x]_{t}$ , with $t=d-\lceil\deg(p)/2\rceil$ . In the above, the $vec(a)$ denotes the coefficient vector of the polynomial $a$ . For instance, when $n=3$ and $p=x_{1}x_{2}-x_{3}^{3}$ , for $y\in\mathbb{R}^{\mathbb{N}^{3}_{6}}$ , we have

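$$L_{p}^{(3)}[y]=\begin{bmatrix}
y_{110}-y_{003}&y_{210}-y_{103}&y_{120}-y_{013}&y_{111}-y_{004}\\
y_{210}-y_{103}&y_{310}-y_{203}&y_{220}-y_{113}&y_{211}-y_{104}\\
y_{120}-y_{013}&y_{220}-y_{113}&y_{130}-y_{023}&y_{121}-y_{014}\\
y_{111}-y_{004}&y_{211}-y_{104}&y_{121}-y_{014}&y_{112}-y_{005}
\end{bmatrix}.$$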

For the special case of the constant one polynomial $p=1$, $L_{1}^{(d)}[y]$ reduces to the so-called moment matrix

$$M_{d}[y]:=L_{1}^{(d)}[y].$$

The columns and rows of $L_{p}^{(d)}[y]$ , as well as $M_{d}[y]$ , are labelled by $\alpha\in\mathbb{N}^{n}$ with $2|\alpha|+\deg(p)\leq 2d$ . For instance, for $n=3$ and $y\in\mathbb{R}^{\mathbb{N}_{4}^{3}}$ , we have

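$$M_{2}[y]=\begin{bmatrix}
y_{000}&y_{100}&y_{010}&y_{001}&y_{200}&y_{110}&y_{101}&y_{020}&y_{011}&y_{002}\\
y_{100}&y_{200}&y_{110}&y_{101}&y_{300}&y_{210}&y_{201}&y_{120}&y_{111}&y_{102}\\
y_{010}&y_{110}&y_{020}&y_{011}&y_{210}&y_{120}&y_{111}&y_{030}&y_{021}&y_{012}\\
y_{001}&y_{101}&y_{011}&y_{002}&y_{201}&y_{111}&y_{102}&y_{021}&y_{012}&y_{003}\\
y_{200}&y_{300}&y_{210}&y_{201}&y_{400}&y_{310}&y_{301}&y_{220}&y_{211}&y_{202}\\
y_{110}&y_{210}&y_{120}&y_{111}&y_{310}&y_{220}&y_{211}&y_{130}&y_{121}&y_{112}\\
y_{101}&y_{201}&y_{111}&y_{102}&y_{301}&y_{211}&y_{202}&y_{121}&y_{112}&y_{103}\\
y_{020}&y_{120}&y_{030}&y_{021}&y_{220}&y_{130}&y_{121}&y_{040}&y_{031}&y_{022}\\
y_{011}&y_{111}&y_{021}&y_{012}&y_{211}&y_{121}&y_{112}&y_{031}&y_{022}&y_{013}\\
y_{002}&y_{102}&y_{012}&y_{003}&y_{202}&y_{112}&y_{103}&y_{022}&y_{013}&y_{004}
\end{bmatrix}.$$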
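The defining identity $vec(a)^{T}L_{p}^{(d)}[y]\,vec(b)=\mathscr{R}_{y}(pab)$ means that the entry of $L_{p}^{(d)}[y]$ indexed by $(\alpha,\beta)$ is $\sum_{\gamma}p_{\gamma}y_{\alpha+\beta+\gamma}$. A minimal sketch of this construction follows; the dictionary representation and the function names are assumptions for illustration. It is checked on the example $p=x_{1}x_{2}-x_{3}^{3}$ with $y=[u]_{6}$ for a hypothetical point $u$, in which case $L_{p}^{(3)}[y]=p(u)\,[u]_{1}[u]_{1}^{T}$.

```python
from itertools import product
from math import prod


def monomial_exponents(n, k):
    """Exponent tuples with |alpha| <= k in graded lexicographic order."""
    exps = []
    for deg in range(k + 1):
        layer = [a for a in product(range(deg + 1), repeat=n) if sum(a) == deg]
        layer.sort(reverse=True)
        exps.extend(layer)
    return exps


def localizing_matrix(p, y, n, d):
    """L_p^{(d)}[y] as a list of rows; p is {gamma: p_gamma}, y is {alpha: y_alpha}."""
    deg_p = max(sum(gamma) for gamma in p)
    t = d - (deg_p + 1) // 2            # t = d - ceil(deg(p) / 2)
    idx = monomial_exponents(n, t)

    def entry(a, b):
        return sum(c * y[tuple(g + ai + bi for g, ai, bi in zip(gamma, a, b))]
                   for gamma, c in p.items())

    return [[entry(a, b) for b in idx] for a in idx]


def moment_matrix(y, n, d):
    """M_d[y] is the localizing matrix of the constant polynomial 1."""
    return localizing_matrix({(0,) * n: 1.0}, y, n, d)


u = (1.0, 2.0, -1.0)   # hypothetical point in R^3
y = {alpha: prod(u_i ** a_i for u_i, a_i in zip(u, alpha))
     for alpha in monomial_exponents(3, 6)}          # y = [u]_6
p = {(1, 1, 0): 1.0, (0, 0, 3): -1.0}                # p = x1*x2 - x3^3
L = localizing_matrix(p, y, 3, 3)                    # 4 x 4
M = moment_matrix(y, 3, 2)                           # 10 x 10
```

For a general tms `y` the same functions return the symbolic pattern shown in the matrices above, with each `y[alpha]` playing the role of $y_{\alpha}$.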

Suppose $g:=(g_{1},\ldots,g_{m})$ is a tuple of polynomials in $\mathbb{R}[x]_{2d}$ . Consider the cone of tms of degree $2d$

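$$\mathscr{S}(g)_{2d}:=\Big\{y\in\mathbb{R}^{\mathbb{N}^{n}_{2d}}\ \Big|\ M_{d}[y]\succeq 0,\,L_{g_{1}}^{(d)}[y]\succeq 0,\ldots,L_{g_{m}}^{(d)}[y]\succeq 0\Big\}. \tag{2.6}$$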

It is a closed convex cone in $\mathbb{R}^{\mathbb{N}^{n}_{2d}}$ . Consider the projection map:

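$$\pi:\ \mathbb{R}^{\mathbb{N}_{2d}^{n}}\to\mathbb{R}^{n},\quad y\mapsto u=(y_{e_{1}},\ldots,y_{e_{n}}). \tag{2.7}$$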

Let $K$ be the semialgebraic set as in (1.2), then

$$K\subseteq\pi\Big(\mathscr{S}(g)_{2d}\cap\{y_{0}=1\}\Big). \tag{2.8}$$

This is because for each $u\in K$, the tms $y:=[u]_{2d}$ belongs to $\mathscr{S}(g)_{2d}$ and $\pi(y)=u$. Therefore, the linear section $\{y_{0}=1\}$ of the cone $\mathscr{S}(g)_{2d}$ is a lifted convex relaxation of the set $K$. The cone $\mathscr{S}(g)_{2d}$ and the quadratic module $Q(g)_{2d}$ are dual to each other. This is because $\langle f,y\rangle\geq 0$ for all $f\in Q(g)_{2d}$ and $y\in\mathscr{S}(g)_{2d}$. We refer to [20, 23, 30] for these basic properties. Interestingly, the containment in (2.8) is an equality when the polynomials $g_{i}$ are SOS-concave [11].

There is much work on polynomial optimization. The Lasserre type Moment-SOS hierarchy of relaxations was introduced in [18]. The Moment-SOS hierarchy was proved to have finite convergence under some optimality conditions [16, 19, 28]. The flat extension or flat truncation condition can be used to certify its convergence [7, 13, 26]. For unconstrained optimization, the performance of the standard SOS relaxations was studied in [31, 32]. For the special case of finite feasible sets, the convergence was studied in [21, 22, 27]. We refer to [3, 20, 23, 37] for more detailed introductions to polynomial optimization.

3. The sample average optimization

Let $\xi^{(1)},\ldots,\xi^{(N)}$ be given samples for the random vector $\xi$ . Consider the sample average function

$$f_{N}(x):=\frac{1}{N}\sum_{k=1}^{N}F(x,\xi^{(k)}).$$

When $F(x,\xi)$ is a polynomial in $x\in\mathbb{R}^{n}$ , $f_{N}(x)$ is also a polynomial in $x$ . We assume that the feasible set $K$ is given as in (1.2), for a tuple $g:=(g_{1},\ldots,g_{m})$ of polynomials. Let $d$ be the degree:

$$d=\Big\lceil\frac{1}{2}\max\{\deg(f_{N}),\deg(g_{1}),\ldots,\deg(g_{m})\}\Big\rceil.$$

Instead of solving (1.4) directly, we propose to solve the sample average optimization with perturbation

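$$\left\{\begin{array}{rl}\min&f_{N}(x)+\epsilon\|[x]_{2d}\|\\ \mathrm{s.t.}&g_{1}(x)\geq 0,\ldots,g_{m}(x)\geq 0,\end{array}\right. \tag{3.2}$$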

for a small parameter $\epsilon>0$ . The Lasserre type moment relaxation can be applied to solve (3.2). Recall the notation $[x]_{2d}$ , $M_{d}[y]$ , $L_{g_{i}}^{(d)}[y]$ as in Section 2. Observe that

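$$f_{N}(x)=\langle f_{N},[x]_{2d}\rangle,\quad M_{d}\big[[x]_{2d}\big]\succeq 0,\quad L_{g_{i}}^{(d)}\big[[x]_{2d}\big]\succeq 0$$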

for all $x\in K$ and all $i=1,\ldots,m$ . If we replace the monomial vector $[x]_{2d}$ by a tms $y\in\mathbb{R}^{\mathbb{N}^{n}_{2d}}$ , then (3.2) is relaxed to the following convex optimization

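$$\left\{\begin{array}{rl}\min&\langle f_{N},y\rangle+\epsilon\|y\|\\ \mathrm{s.t.}&y_{0}=1,\ M_{d}[y]\succeq 0,\\ &L_{g_{i}}^{(d)}[y]\succeq 0\ (i=1,\ldots,m),\quad y\in\mathbb{R}^{\mathbb{N}^{n}_{2d}}.\end{array}\right. \tag{3.3}$$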

It is a semidefinite program, with a norm function in the objective. The relaxation (3.3) is said to be tight if its optimal value is the same as that of (3.2). In this paper, we choose $\|\cdot\|$ to be the standard Euclidean norm, but any other vector norm can also be used. The equality constraint $y_{0}=1$ means that the first entry of $y$ is equal to one. The set of all $y$ satisfying the linear matrix inequalities in (3.3) is just the cone $\mathscr{S}(g)_{2d}$, defined as in (2.6). The cone $\mathscr{S}(g)_{2d}$ and the truncated quadratic module $Q(g)_{2d}$ are dual to each other. Therefore, the Lagrange function for (3.3) is

[TABLE]

for dual variables $q\in Q(g)_{2d}$ and $\gamma\in\mathbb{R}$ . The function $\mathcal{L}(y,q,\gamma)$ has a finite minimum value for $y\in\mathbb{R}^{\mathbb{N}^{n}_{2d}}$ if and only if

[TABLE]

in which case the minimum value is $\gamma$. (Here $vec(p)$ denotes the coefficient vector of $p$.) Therefore, the dual optimization problem of (3.3) is

[TABLE]

Because the sample average $f_{N}(x)$ is only an approximation for $f(x)$ , it is possible that there is no scalar $\gamma$ such that $f_{N}-\gamma\in Q(g)_{2d}$ . The perturbation term $\epsilon\|y\|$ in (3.3) motivates us to find the maximum $\gamma$ such that $f_{N}-p-\gamma\in Q(g)_{2d}$ , for some polynomial $p$ whose coefficient vector has a small norm. This leads to the following algorithm.

Algorithm 3.1.

Generate samples $\xi^{(1)},\ldots,\xi^{(N)}$ , according to the distribution of $\xi$ . Choose a small perturbation parameter $\epsilon>0$ .

Step 1

Compute the sample average $f_{N}=N^{-1}\sum_{k=1}^{N}F(x,\xi^{(k)})$ .

Step 2

Solve the semidefinite relaxation problem (3.3). If (3.3) has no minimizer (e.g., it is unbounded from below because $\epsilon$ is too small), increase the value of $\epsilon$ (e.g., let $\epsilon:=2\epsilon$ ), until (3.3) has a minimizer, which we denote as $y^{*}$ .

Step 3

Let $u=\pi(y^{*})$ , where $\pi$ is the projection map in (2.7), or equivalently, let

$$u=(y^{*}_{e_{1}},\ldots,y^{*}_{e_{n}}).$$

Output $u$ as a candidate minimizer for the sample average optimization with perturbation (3.2), and stop.
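Steps 1 and 3 of Algorithm 3.1 are easy to sketch in code once a monomial ordering is fixed. The following Python/NumPy fragment is only an illustration: the graded lexicographic ordering, the helper names, and the toy integrand $F(x,\xi)=x_{1}+\xi_{1}x_{1}^{2}+\xi_{2}x_{2}^{2}$ are assumptions made here, and $\pi$ is assumed to select the first-order moments $y_{e_{1}},\ldots,y_{e_{n}}$ . Step 2 (the semidefinite program) is not included.

```python
import itertools
import numpy as np

def monomial_exponents(n, deg):
    """Exponent tuples alpha in N^n with |alpha| <= deg, graded lexicographic order."""
    exps = []
    for total in range(deg + 1):
        level = [a for a in itertools.product(range(total + 1), repeat=n)
                 if sum(a) == total]
        exps.extend(sorted(level, reverse=True))  # x_1 before x_2, etc.
    return exps

def sample_average_coeffs(coeff_fn, samples):
    """Step 1: coefficient vector of f_N = (1/N) * sum_k F(x, xi^(k))."""
    return np.mean([coeff_fn(xi) for xi in samples], axis=0)

def project(y, exps):
    """Step 3 (assumed form of pi): entries of y indexed by e_1, ..., e_n."""
    n = len(exps[0])
    index = {a: i for i, a in enumerate(exps)}
    units = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
    return np.array([y[index[e]] for e in units])

def toy_coeffs(xi):
    """Hypothetical F(x, xi) = x1 + xi1*x1^2 + xi2*x2^2 in the basis
    [1, x1, x2, x1^2, x1*x2, x2^2]."""
    return np.array([0.0, 1.0, 0.0, xi[0], 0.0, xi[1]])

exps = monomial_exponents(2, 2)
samples = np.array([[1.0, 2.0], [3.0, 4.0]])          # two hypothetical samples
f_N = sample_average_coeffs(toy_coeffs, samples)      # coefficients of f_N
y_star = np.array([1.0, 0.5, -2.0, 0.25, -1.0, 4.0])  # stands in for an SDP solution
u = project(y_star, exps)                             # candidate minimizer
```

Here `y_star` is the monomial vector of the point $(0.5,-2)$ , so the projection simply reads that point back from the degree-one entries.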

For $\epsilon>0$ , the minimizer of the relaxation (3.3) is always unique (if it exists), because its objective is strictly convex. Our numerical experiments demonstrate that Algorithm 3.1 is efficient for solving (3.2).

Theorem 3.2.

Assume that $u^{*}$ is a minimizer of (3.2) and $y^{*}$ is a minimizer of (3.3). Then, for $\epsilon>0$ , the relaxation (3.3) is tight if and only if $\mbox{rank}\,M_{d}[y^{*}]=1$ . In particular, for the case $\mbox{rank}\,M_{d}[y^{*}]=1$ , the point $u=\pi(y^{*})$ is a minimizer of (3.2).

Proof.

Let $\vartheta_{1},\vartheta_{2}$ be optimal values of (3.2) and (3.3) respectively.

“ $\Leftarrow$ ” It is clear that $\vartheta_{1}\geq\vartheta_{2}$ . If $\mbox{rank}\,M_{d}[y^{*}]=1$ , then for $u=\pi(y^{*})$ one can show that $M_{d}[y^{*}]=[u]_{d}([u]_{d})^{T}$ . Hence, $y^{*}=[u]_{2d}$ , $\langle f_{N},y^{*}\rangle=f_{N}(u)$ , and each $g_{i}(u)\geq 0$ (see [13, 26]). So, $u$ is a feasible point of (3.2) and

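$$\vartheta_{1}\leq f_{N}(u)+\epsilon\|[u]_{2d}\|=\langle f_{N},y^{*}\rangle+\epsilon\|y^{*}\|=\vartheta_{2}.$$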

Therefore, $\vartheta_{1}=\vartheta_{2}$ , $u$ is a minimizer of (3.2), and the relaxation (3.3) is tight.

“ $\Rightarrow$ ” Let $\tilde{y}:=[u^{*}]_{2d}$ , then $f_{N}(u^{*})=\langle f_{N},\tilde{y}\rangle$ and $\|[u^{*}]_{2d}\|=\|\tilde{y}\|$ . If the relaxation (3.3) is tight, then $\vartheta_{1}=\vartheta_{2}$ and $\tilde{y}$ is a minimizer of (3.3). For $\epsilon>0$ , the objective of (3.3) is strictly convex, so its minimizer must be unique. Hence, $\tilde{y}=y^{*}$ and

$$M_{d}[y^{*}]=M_{d}[\tilde{y}]=[u^{*}]_{d}([u^{*}]_{d})^{T}.$$

Therefore, $\mbox{rank}\,M_{d}[y^{*}]=\mbox{rank}\,M_{d}[\tilde{y}]=1$ . ∎
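The rank-one criterion of Theorem 3.2 can be checked numerically once a candidate $y^{*}$ is available. The sketch below is an illustration under assumptions made here: monomials are ordered graded-lexicographically, the helper names are hypothetical, and the rank is decided with a relative singular-value tolerance. It builds $M_{d}[y]$ from a tms $y$ and compares a monomial vector (rank one) with an average of two monomial vectors (rank two).

```python
import itertools
import numpy as np

def exponents(n, deg):
    """Exponent tuples with total degree <= deg, graded lexicographic order."""
    out = []
    for total in range(deg + 1):
        level = [a for a in itertools.product(range(total + 1), repeat=n)
                 if sum(a) == total]
        out.extend(sorted(level, reverse=True))
    return out

def monomial_vector(u, deg):
    """[u]_deg: all monomials of the point u up to the given degree."""
    return np.array([np.prod(np.power(u, a)) for a in exponents(len(u), deg)])

def moment_matrix(y, n, d):
    """M_d[y]: the entry in row alpha, column beta equals y_{alpha+beta}."""
    index = {a: i for i, a in enumerate(exponents(n, 2 * d))}
    rows = exponents(n, d)
    return np.array([[y[index[tuple(p + q for p, q in zip(a, b))]]
                      for b in rows] for a in rows])

def numerical_rank(M, tol=1e-8):
    """Number of singular values above a relative tolerance."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * max(s[0], 1.0)))

u = np.array([0.5, -2.0])
v = np.array([1.0, 1.0])
y_point = monomial_vector(u, 2)                             # y = [u]_{2d}, d = 1
y_mix = 0.5 * (monomial_vector(u, 2) + monomial_vector(v, 2))

M_point = moment_matrix(y_point, n=2, d=1)
M_mix = moment_matrix(y_mix, n=2, d=1)
rank_point = numerical_rank(M_point)   # rank one: relaxation would be tight
rank_mix = numerical_rank(M_mix)       # higher rank: no point certificate
```

In floating-point computation the rank is only determined up to the chosen tolerance, so a solver output should be judged with a threshold suited to its accuracy.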

When the sample average $f_{N}(x)$ is unbounded from below on the feasible set $K$ , the moment relaxation (3.3) might still be unbounded from below if $\epsilon>0$ is small. However, if $\epsilon>0$ is sufficiently large, then (3.3) is feasible and has a minimizer. Indeed, we have the following theorem.

Theorem 3.3.

Suppose the feasible set $K$ has nonempty interior. If $\epsilon>0$ is sufficiently large, then both (3.3) and (3.4) have optimizers and their optimal values are the same.

Proof.

When $K$ has nonempty interior, the quadratic module $Q(g)_{2d}$ is a closed cone (see [23, Theorem 3.49]) and the cone $\mathscr{S}(g)_{2d}$ has nonempty interior. For instance, if $\nu$ is the Gaussian measure, then the tms

[TABLE]

is an interior point of the cone $\mathscr{S}(g)_{2d}$ . In other words, $M_{d}[\hat{y}]\succ 0$ and all $L_{g_{i}}^{(d)}[\hat{y}]\succ 0$ . This is because $\int_{K}p^{2}d\nu>0$ and $\int_{K}g_{i}p^{2}d\nu>0$ for all nonzero polynomials $p$ . Moreover, $\hat{y}_{0}=1$ . The convex relaxation (3.3) is strictly feasible (i.e., there is a feasible $y$ such that each matrix in (3.3) is positive definite). When $\epsilon>0$ is big, the SOS relaxation (3.4) is also strictly feasible. For instance, for the choice

[TABLE]

we have that

[TABLE]

In the above, int denotes the interior of a set. Therefore, for big $\epsilon>0$ , both (3.3) and (3.4) have strictly feasible points. By the strong duality theorem (see [2, 4]), they have the same optimal value and they both achieve the optimal value, i.e., they have optimizers. ∎
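The positive definiteness claimed for the Gaussian moment matrix can be illustrated in the simplest setting. The sketch below is a simplification made here, not the construction of the proof: it takes $n=1$ and $K=\mathbb{R}$ (so there are no localizing matrices), uses the exact moments of the standard normal distribution, and checks that the resulting Hankel matrix $M_{d}[\hat{y}]$ has strictly positive eigenvalues. The function names are hypothetical.

```python
import numpy as np

def gaussian_moment(k):
    """E(x^k) for x ~ N(0,1): zero for odd k, (k-1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    # Product of k-1, k-3, ..., 1; the empty product (k = 0) equals 1.
    return float(np.prod(np.arange(k - 1, 0, -2)))

def hankel_moment_matrix(d):
    """Univariate moment matrix M_d[y] with entries y_{i+j}, 0 <= i, j <= d."""
    y = [gaussian_moment(k) for k in range(2 * d + 1)]
    return np.array([[y[i + j] for j in range(d + 1)] for i in range(d + 1)])

M = hankel_moment_matrix(3)
min_eig = np.linalg.eigvalsh(M).min()   # strictly positive means M is positive definite
```

For a constrained set $K$ , the same check would apply to the moments of the measure restricted to $K$ and to each localizing matrix, which this fragment does not attempt.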

In applications, however, we often choose a small $\epsilon>0$ , because we expect that (3.2) is a good approximation for (1.4). The value of $\epsilon$ affects the performance of (3.3). When $\epsilon>0$ is too small, (3.3) might be unbounded from below and have no minimizer. If $\epsilon>0$ is big, (3.3) might give a loose approximation for (1.4). For efficiency, we seek the smallest value of $\epsilon$ such that (3.3) is bounded from below and has a minimizer. When $K$ has nonempty interior, the relaxation (3.3) is strictly feasible, i.e., there exists $\hat{y}$ such that all the matrices $M_{d}[\hat{y}]$ and $L_{g_{i}}^{(d)}[\hat{y}]$ are positive definite. Therefore, strong duality holds between (3.3) and (3.4). To ensure that (3.3) is solvable (i.e., it has a minimizer), the dual optimization problem (3.4) needs to be feasible. Consider the optimization problem

[TABLE]

The above is a convex optimization problem with semidefinite constraints. In computational practice, we often choose $\epsilon>0$ in a heuristic way, e.g., $\epsilon=10^{-2}$ . If this $\epsilon$ is not large enough, we can increase its value until (3.3) performs well.
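The heuristic of starting from a small $\epsilon$ and doubling it, as in Step 2 of Algorithm 3.1, can be written as a short loop around any routine that solves (3.3). The sketch below is an illustration under assumptions made here: `solve` is a hypothetical callback returning a minimizer or `None`, and `toy_solver` replaces the semidefinite program by the unconstrained problem $\min_{y}\langle c,y\rangle+\epsilon\|y\|$ , which has the minimizer $y=0$ exactly when $\epsilon\geq\|c\|$ .

```python
import numpy as np

def increase_until_solvable(solve, eps0, factor=2.0, max_iter=60):
    """Multiply eps by `factor` until solve(eps) returns a minimizer."""
    eps = eps0
    for _ in range(max_iter):
        y = solve(eps)
        if y is not None:
            return eps, y
        eps *= factor
    raise RuntimeError("no minimizer found within max_iter increases")

def toy_solver(c):
    """Stand-in for the relaxation: min <c,y> + eps*||y|| over all y.
    The problem is bounded below (minimizer y = 0) iff eps >= ||c||."""
    def solve(eps):
        return np.zeros_like(c) if eps >= np.linalg.norm(c) else None
    return solve

c = np.array([3.0, 4.0])                       # ||c|| = 5, hypothetical data
eps_found, y_found = increase_until_solvable(toy_solver(c), eps0=1e-2)
```

Doubling overshoots the smallest admissible value by at most a factor of two; a bisection between the last failing and the first successful value would tighten it if a smaller $\epsilon$ is wanted.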

4. Numerical Experiments

This section presents numerical experiments in which Algorithm 3.1 is applied to stochastic polynomial optimization problems. The computation is implemented in MATLAB R2018a, on a laptop with an 8th Generation Intel® Core™ i5-8250U CPU and 16 GB of RAM. The moment relaxation (3.3) is solved by the software GloptiPoly 3 [14], which calls the semidefinite programming solver SeDuMi [40]. The computational results are displayed with four decimal digits. We use “PSAA” to denote the perturbation sample average approximation model (3.2). Its relaxation (3.3) is said to be solvable if it has a minimizer $y^{*}$ . In that case, let $u$ be the point given in Step 3 of Algorithm 3.1, i.e., $u=\pi(y^{*})$ . Otherwise, (3.3) is said to be not solvable and we use “n.a.” to indicate that the relevant values are not available. We use $f_{min}$ and $v^{*}$ to denote the optimal value and the minimizer of (1.3) respectively. The symbol $\bar{\xi}$ stands for the sample average of the random vector $\xi\in\mathbb{R}^{r}$ , while $\bar{\xi}_{i}$ refers to its $i$ th entry:

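$$\bar{\xi}:=\frac{1}{N}\sum_{k=1}^{N}\xi^{(k)},\qquad\bar{\xi}_{i}:=\frac{1}{N}\sum_{k=1}^{N}\xi^{(k)}_{i}.$$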

In our numerical examples, we use the following classical distributions for random variables (see [6]; let $\delta_{a}$ denote the Dirac function supported at $a$ ):

•

$Ber(p)$ denotes the Bernoulli distribution with success probability $p$ , whose density function is $(1-p)\cdot\delta_{0}+p\cdot\delta_{1}$ .

•

$Geo(p)$ denotes the geometric distribution with success probability $p$ , whose density function is $\sum_{n=0}^{\infty}(1-p)^{n}p\cdot\delta_{n}$ .

•

$\mathcal{P}(\lambda)$ denotes the Poisson distribution with parameter $\lambda>0$ , whose density function is $\sum_{n=0}^{\infty}e^{-\lambda}\frac{\lambda^{n}}{n!}\cdot\delta_{n}$ .

•

$\mathcal{U}(a,b)$ , with $a<b$ , denotes the uniform distribution on $[a,b]$ .

•

$\mathcal{N}(\mu,P)$ denotes the normal distribution with the expectation $\mu$ and covariance matrix $P$ .
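Samples from the distributions listed above, and the corresponding sample averages $\bar{\xi}$ , can be generated as in the following sketch. It uses NumPy rather than the MATLAB setup of the experiments, and the parameter values, seed, and sample size are arbitrary choices made here for illustration. One detail matters: NumPy's geometric sampler counts trials (support $1,2,\ldots$ ), so it is shifted by one to match the support $0,1,\ldots$ of $Geo(p)$ as defined above.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, for reproducibility only
N = 1000                         # sample size (arbitrary)

samples = {
    "Ber(0.3)": rng.binomial(1, 0.3, size=N),
    # Shift by one: NumPy returns the number of trials, not of failures.
    "Geo(0.5)": rng.geometric(0.5, size=N) - 1,
    "P(2)": rng.poisson(2.0, size=N),
    "U(0,2)": rng.uniform(0.0, 2.0, size=N),
}
# Bivariate normal with hypothetical mean vector and covariance matrix.
normal = rng.multivariate_normal(mean=[0.0, 1.0],
                                 cov=[[1.0, 0.2], [0.2, 2.0]], size=N)

xi_bar = {name: s.mean() for name, s in samples.items()}  # scalar sample averages
normal_bar = normal.mean(axis=0)                          # entrywise sample average
```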

Example 4.1.

Consider the stochastic optimization problem

[TABLE]

where

[TABLE]

We have $\mathbb{E}(\xi_{1})=\mathbb{E}(\xi_{2})=1$ . The constraining polynomial tuple is

[TABLE]

The feasible set $K$ is a polyhedron and the exact objective is

[TABLE]

For the optimization (1.3), the optimal value and the minimizer are

[TABLE]

Since the approximation polynomial $f_{N}$ is uniquely determined by $\bar{\xi}_{1}$ and $\bar{\xi}_{2}$ , we can pick some typical sample averages to explore the performance of our optimization model (3.2) and its relaxation (3.3). For instance, we consider the samples of $\xi$ such that

[TABLE]

There are four cases of the signs:

[TABLE]

We solve the optimization model (3.2) for each case. The computational results are reported in Table 1. The PSAA model (3.2) has clear advantages for cases III and IV, when compared to the unperturbed case (i.e., $\epsilon=0$ ). It gives reliable optimizers, while the classical SAA model (1.4) does not return good ones.

Example 4.2.

Consider the stochastic polynomial optimization

[TABLE]

where

[TABLE]

If $f(x)$ is evaluated exactly, the optimal value and the minimizer of (1.3) are

[TABLE]

As in Example 4.1, we explore the performance of (3.2)-(3.3) for the following cases of samples:

[TABLE]

The computational results are reported in Table 2. The PSAA model (3.2) performs better than the classical SAA model (1.4) (i.e., $\epsilon=0$ ). For all these cases, (3.2) gives more reliable optimizers. Moreover, solving the relaxation (3.3) costs less computational time for cases I and II.

Example 4.3.

Consider the stochastic optimization

[TABLE]

where

[TABLE]

The exact objective $f(x)=\mathbb{E}[F(x,\xi)]$ can be evaluated as

[TABLE]

For the optimization (1.3), its optimal value and minimizer are

[TABLE]

For convenience, we use “ $\bar{\xi}=\mathbb{E}(\xi)+o(10^{-3})$ ” to denote a random sample average of size $N=1000$ with error in the order of $10^{-3}$ . Consider two cases:

[TABLE]

The numerical results are reported in Table 3. The PSAA model (3.2) gives a better optimizer than the unperturbed one, for both cases.

Example 4.4.

Consider the stochastic polynomial optimization

[TABLE]

where

[TABLE]

The feasible set $K$ is a simplex, which is compact and satisfies the archimedean condition. For all samples $\xi^{(i)}$ , the sample average $f_{N}(x)$ is bounded from below on $K$ and it has a minimizer. For this example, $\mathbb{E}(\xi_{1})=0.5$ and $\mathbb{E}(\xi_{2})=2$ , so

[TABLE]

The optimal value and minimizer of (1.3) are

[TABLE]

We consider two cases of samples

[TABLE]

In the above, $\epsilon^{*}$ is the minimum value of (3.5).

The numerical results are reported in Table 4. The PSAA model (3.2) performs very well for both cases. Compared with the classical SAA model (1.4) (i.e., $\epsilon=0$ ), it has quite clear advantages for case I. It successfully returned a good minimizer, while (1.4) is unbounded from below and does not return a minimizer.

Example 4.5.

Consider the unconstrained stochastic optimization

[TABLE]

where

[TABLE]

Evaluating the expectation, we get $\mathbb{E}(\xi)=2,\mathbb{E}(\xi^{2})=6$ and

[TABLE]

The optimal value and minimizer of (1.3) are

[TABLE]

Approximate $\xi,\xi^{2}$ by their sample averages $\bar{\xi},s_{2}$ , i.e., $s_{2}=\frac{1}{N}\sum_{k=1}^{N}(\xi^{(k)})^{2}$ . We make samples of different sizes and compute $\epsilon^{*}$ in (3.5) for each case.

We focus on cases III and IV.

The computational results are reported in Table 6. The perturbation term in the PSAA model (3.2) makes a big difference for computing reliable minimizers. The PSAA model returned minimizers that are close to the optimizer of (1.3), while the classical SAA model (i.e., $\epsilon=0$ ) is unbounded from below and fails to return a minimizer.

Example 4.6.

Consider the stochastic polynomial optimization

[TABLE]

where

[TABLE]

Note that $\mathbb{E}(\xi_{1}\xi_{3})=\mathbb{E}(\xi_{2}\xi_{3})=1$ and the exact objective is

[TABLE]

Its optimal value and minimizer are

[TABLE]

Denote $s_{1}:=\frac{1}{N}\sum_{k=1}^{N}\xi_{1}^{(k)}\xi_{3}^{(k)},\ s_{2}:=\frac{1}{N}\sum_{k=1}^{N}\xi_{2}^{(k)}\xi_{3}^{(k)}$ . We make samples of different sizes and compute $\epsilon^{*}$ in (3.5) for each case. The values of $\epsilon^{*}$ are given in Table 7.

We focus on cases II and III, for which the numerical results are reported in Table 8.

While the optimization (1.4) is unbounded from below via the classical SAA model, our PSAA model (3.2) provides good lower bounds and returns reliable minimizers for all cases.

Example 4.7.

Consider the stochastic polynomial optimization

[TABLE]

where $\xi\sim\mathcal{U}(0,2)$ and

[TABLE]

One can see that $\mathbb{E}(\xi)=1,\mathbb{E}(\xi^{2})=4/3$ , $\mathbb{E}(\xi^{3})=2$ , and the exact objective

[TABLE]

Its optimal value and minimizer are

[TABLE]

As in Examples 4.5-4.6, we approximate $\xi,\xi^{2},\xi^{3}$ by their sample averages $\bar{\xi},s_{2},s_{3}$ , where $s_{2}=\frac{1}{N}\sum_{k=1}^{N}\big(\xi^{(k)}\big)^{2},\ s_{3}=\frac{1}{N}\sum_{k=1}^{N}\big(\xi^{(k)}\big)^{3}$ . Several samples of different sizes are made. The values of $\epsilon^{*}$ in (3.5) are given in Table 9.
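The quality of these sample moments for $\xi\sim\mathcal{U}(0,2)$ can be gauged against the exact values $\mathbb{E}(\xi^{k})=2^{k}/(k+1)$ , which give $1$ , $4/3$ and $2$ for $k=1,2,3$ . The following sketch (seed and sample sizes are arbitrary choices made here, and the sample moments are normalized by $N$ ) reports the absolute errors for several $N$ .

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, for reproducibility only

# Exact moments of U(0,2): integral of x^k / 2 over [0, 2] equals 2^k / (k + 1).
exact = {k: 2.0**k / (k + 1) for k in (1, 2, 3)}

def sample_moments(N):
    """Sample averages of xi, xi^2, xi^3 from N draws of U(0,2)."""
    xi = rng.uniform(0.0, 2.0, size=N)
    return {k: float(np.mean(xi**k)) for k in (1, 2, 3)}

for N in (100, 10_000, 200_000):
    s = sample_moments(N)
    errors = {k: abs(s[k] - exact[k]) for k in exact}
    print(N, errors)
```

The errors shrink roughly like $N^{-1/2}$ , so for small $N$ the coefficients of $f_{N}$ can differ noticeably from those of $f$ .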

We apply the classical SAA model (i.e., $\epsilon=0$ ) and the PSAA model (i.e., $\epsilon=\epsilon^{*}$ ) to cases I, II and III.

The computational results are reported in Table 10. The PSAA model (3.2) performs much better than the classical SAA model (1.4), as (3.2) gives more reliable minimizers for all cases.

5. Conclusion

This paper proposes a sample average optimization model with a perturbation term for solving stochastic polynomial optimization. The perturbation optimization model performs better than the classical one without perturbations. The Lasserre type moment relaxations are used to solve the perturbation optimization. In particular, we show that the moment relaxation is tight if and only if the moment matrix of the minimizer is rank one. Numerical experiments demonstrated advantages of our perturbation optimization model.

Bibliography

[1] F. Bastin, C. Cirillo and P. Toint, Convergence theory for nonconvex stochastic programming with an application to mixed logit, Math. Program., 108(2–3), pp. 207–234, 2006.
[2] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, MPS-SIAM Series on Optimization, SIAM, Philadelphia, 2001.
[3] G. Blekherman, P. Parrilo and R. Thomas (eds.), Semidefinite optimization and convex algebraic geometry, MOS-SIAM Series on Optimization, SIAM, 2013.
[4] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[5] M. Branda, Sample approximation technique for mixed-integer stochastic programming problems with several chance constraints, Operations Research Letters, 40(3), pp. 207–211, 2012.
[6] K. Chung, A course in probability theory, Academic Press, 2001.
[7] R. Curto and L. Fialkow, Truncated K-moment problems in several variables, Journal of Operator Theory, 54(2005), pp. 189–226.
[8] M. Fu (ed.), Handbook of simulation optimization, volume 216, Springer, New York, 2015.
