A duality principle for non-convex optimization in $\mathbb{R}^n$

Fabio Botelho

arXiv:1903.06014·math.OC·April 2, 2019


TL;DR

This paper introduces a duality principle for certain non-convex optimization problems in $\mathbb{R}^n$, establishing conditions for optimality and the absence of a duality gap using convex analysis and D.C. optimization techniques.

Contribution

It develops a duality framework for non-convex problems, extending classical convex analysis tools and proving no duality gap at local extremal points.

Findings

01

Established a duality principle for non-convex optimization in $\mathbb{R}^n$.

02

Derived global sufficient optimality conditions.

03

Proved the absence of duality gap at local extremal points.

Abstract

This article develops a duality principle for a class of optimization problems in $\mathbb{R}^{n}$. The results are obtained based on standard tools of convex analysis and on a well-known result of Toland for D.C. optimization. Global sufficient optimality conditions are also presented, as well as relations between the critical points of the primal and dual formulations. Finally, we formally prove there is no duality gap between the primal and dual formulations in a local extremal context.

Equations

$J(x) = -G_{1}(x) + G_{2}(x, 0),$

$G_{1}(x) = -\frac{x^{T}Ax}{2} + \frac{K}{2}x^{T}x - f^{T}x$

$G_{2}(x, v) = \sum_{j=1}^{N}\frac{\gamma_{j}}{2}\left(\frac{x^{T}B_{j}x}{2} + c_{j} + v_{j}\right)^{2} + \frac{K}{2}x^{T}x,$

$J(x) = \frac{x^{T}Ax}{2} + \sum_{j=1}^{N}\frac{\gamma_{j}}{2}\left(\frac{x^{T}B_{j}x}{2} + c_{j}\right)^{2} + f^{T}x.$

$\{e_{1}, \dots, e_{n}\}$

$\delta_{ij}=\left\{\begin{array}{lr}1,&\text{ if }i=j\\ 0,&\text{ otherwise},\end{array}\right.$

$J(x) \to +\infty$

$G_{1}^{*}(v^{*})$

$G_{2}^{*}(v^{*}, v_{0}^{*})$

$C^{*} = \left\{v_{0}^{*} \in \mathbb{R}^{N} : \sum_{j=1}^{N}(v_{0}^{*})_{j}B_{j} + KI_{d} > \mathbf{0}\right\}.$

$B^{*} = \left\{v_{0}^{*} \in \mathbb{R}^{N} : A + \sum_{j=1}^{N}(v_{0}^{*})_{j}B_{j} > \mathbf{0}\right\}$

$A^{*} = B^{*} \cap C^{*}.$

$J^{*}(v^{*}, v_{0}^{*}) = G_{1}^{*}(v^{*}) - G_{2}^{*}(v^{*}, v_{0}^{*}),$

$\tilde{J}^{*}(v^{*}) = \sup_{v_{0}^{*} \in C^{*}} J^{*}(v^{*}, v_{0}^{*}).$

$(\hat{v}_{0}^{*})_{j} = \gamma_{j}\left(\frac{x_{0}^{T}B_{j}x_{0}}{2} + c_{j}\right),$

$\hat{v}^{*} = \sum_{j=1}^{N}(\hat{v}_{0}^{*})_{j}B_{j}x_{0} + Kx_{0},$

$H_{3} = P_{1}\overline{E}P_{2},$

$\alpha \equiv (\alpha)_{n \times n} = (I_{d} - H_{3})D - I_{d},$

$\alpha_{1} = -\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p} + KI_{d}\right)^{-1}(\alpha)\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p} + KI_{d}\right).$

$P_{1}=\left[\begin{array}{cccc}B_{1}x_{0}&B_{2}x_{0}&\cdots&B_{N}x_{0}\end{array}\right]_{n\times N}$

$P_{2}=\left[\begin{array}{c}x_{0}^{T}B_{1}\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p}+KI_{d}\right)^{-1}\\ x_{0}^{T}B_{2}\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p}+KI_{d}\right)^{-1}\\ \vdots\\ x_{0}^{T}B_{N}\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p}+KI_{d}\right)^{-1}\end{array}\right]_{N\times n}$

$E = \{E_{l\eta}\} = \left\{\gamma_{l}x_{0}^{T}B_{l}\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p} + KI_{d}\right)^{-1}B_{\eta}x_{0} + \delta_{l\eta}\right\}_{N \times N}$

$\overline{E} = \{\overline{E}_{l\eta}\} = \{E_{l\eta}\}^{-1}.$

$D = \hat{B}\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p} + KI_{d}\right)^{-1} + I_{d}$

$\hat{B}_{n \times n} = \{\hat{B}_{jk}\} = \left\{\sum_{l=1}^{N}\sum_{s,q=1}^{n}\gamma_{l}(x_{0})_{s}(B_{l})_{js}(B_{l})_{qk}(x_{0})_{q}\right\}.$

$\delta\tilde{J}^{*}(\hat{v}^{*}) = \mathbf{0},$

$\delta^{2}\tilde{J}^{*}(\hat{v}^{*}) > \mathbf{0},$


Taxonomy

Topics: Optimization and Variational Analysis · Advanced Optimization Algorithms Research · Sparse and Compressive Sensing Techniques

Full text

A duality principle for non-convex optimization in $\mathbb{R}^{n}$

Fabio Silva Botelho

Departamento de Matemática

Universidade Federal de Santa Catarina, UFSC

Florianópolis, SC - Brazil


1 Introduction

Consider a function $J:\mathbb{R}^{n}\rightarrow\mathbb{R}$ defined by

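$J(x) = -G_{1}(x) + G_{2}(x, 0),$

where

$G_{1}(x) = -\frac{x^{T}Ax}{2} + \frac{K}{2}x^{T}x - f^{T}x$

and

$G_{2}(x, v) = \sum_{j=1}^{N}\frac{\gamma_{j}}{2}\left(\frac{x^{T}B_{j}x}{2} + c_{j} + v_{j}\right)^{2} + \frac{K}{2}x^{T}x,$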

and where $x\in\mathbb{R}^{n}$, $v\in\mathbb{R}^{N}$, $A$ is an $n\times n$ real symmetric matrix, each $B_{j}$ is an $n\times n$ real symmetric matrix and $c_{j},\gamma_{j}\in\mathbb{R}$, with $\gamma_{j}>0,$ $\forall j\in\{1,\ldots,N\}.$

Finally, $f\in\mathbb{R}^{n}$ as well.

Observe that

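$J(x) = \frac{x^{T}Ax}{2} + \sum_{j=1}^{N}\frac{\gamma_{j}}{2}\left(\frac{x^{T}B_{j}x}{2} + c_{j}\right)^{2} + f^{T}x.$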
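The cancellation behind the decomposition $J(x)=-G_{1}(x)+G_{2}(x,0)$ can be written out in one step. The derivation below is a short sketch that uses only the definitions of $G_{1}$ and $G_{2}$ listed for this paper. The one added observation, offered as a remark and not as a statement of the paper at this point, is that the term $\frac{K}{2}x^{T}x$ is added to both functions and therefore drops out of $J$, while it makes the Hessian of $G_{1}$ equal to $KI_{d}-A$, so that $G_{1}$ is convex once $KI_{d}>A$.

```latex
\begin{aligned}
-G_{1}(x) + G_{2}(x,0)
&= \frac{x^{T}Ax}{2} - \frac{K}{2}x^{T}x + f^{T}x
   + \sum_{j=1}^{N}\frac{\gamma_{j}}{2}\left(\frac{x^{T}B_{j}x}{2} + c_{j}\right)^{2}
   + \frac{K}{2}x^{T}x \\
&= \frac{x^{T}Ax}{2}
   + \sum_{j=1}^{N}\frac{\gamma_{j}}{2}\left(\frac{x^{T}B_{j}x}{2} + c_{j}\right)^{2}
   + f^{T}x \\
&= J(x).
\end{aligned}
```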

We shall develop a duality principle which has no restriction concerning $n$ and $N$ , so that it includes the case $n\neq N.$

Also, we establish a relation between the corresponding critical points of the primal and dual formulations.

The main result is established through an extension of a Toland result found in [7].

We emphasize that this work extends and continues the original works of Bielski and Telega [1, 2], combined with the work of Toland [7]. The technical details follow, to some extent, the results in [3]. In this sense, our work complements the results in [1, 2], now applied to the simpler context of $\mathbb{R}^{n}$.

Similar problems have been addressed in [5, 6], among others.

2 The main result

We start this section with a remark.

Remark 2.1.

Regarding notation, we denote the canonical basis of $\mathbb{R}^{n}$ by

$\{e_{1}, \dots, e_{n}\},$

and we recall that, in general, $A^{T}$ denotes the transpose of the matrix $A$. For an $n\times n$ matrix $A$ we write $A>\mathbf{0}$ if $A$ is positive definite. Finally, $I_{d}$ denotes the $n\times n$ identity matrix and $\{\delta_{ij}\}$ denotes the standard $N\times N$ Kronecker delta, that is,

$\delta_{ij}=\left\{\begin{array}{lr}1,&\text{ if }i=j\\ 0,&\text{ otherwise},\end{array}\right.$

$\forall i,j\in\{1,\ldots,N\}.$
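The ordering $A>\mathbf{0}$ used throughout can be tested numerically. The sketch below is one possible check and is not taken from the paper: it attempts a Cholesky factorization, which succeeds with strictly positive pivots exactly when a symmetric matrix is positive definite. The function names `is_positive_definite` and `dominates`, the tolerance `tol`, and the sample matrix are all hypothetical, chosen for illustration only.

```python
import math


def is_positive_definite(m, tol=1e-12):
    """Return True if the symmetric matrix m (a list of rows) is positive definite.

    The test attempts a Cholesky factorization m = L L^T and fails as soon
    as a pivot is not strictly positive (up to the assumed tolerance tol).
    """
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= tol:
                    return False
                low[i][i] = math.sqrt(pivot)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return True


def dominates(k, a):
    """Check the matrix inequality k * I_d > a, i.e. k * I_d - a > 0."""
    n = len(a)
    diff = [[(k if i == j else 0.0) - a[i][j] for j in range(n)] for i in range(n)]
    return is_positive_definite(diff)


# Hypothetical symmetric matrix with eigenvalues 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]
print(is_positive_definite(A))  # True
print(dominates(3.5, A))        # True: 3.5 exceeds the largest eigenvalue
print(dominates(2.5, A))        # False: 2.5 does not
```

The same helper applies to any condition of the form $M>\mathbf{0}$ with $M$ symmetric, for instance a hypothesis such as $KI_{d}>A$.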

Our main result is summarized by the following theorem.

Theorem 2.2.

Let $J:\mathbb{R}^{n}\rightarrow\mathbb{R}$ be defined by

[TABLE]

where

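$G_{1}(x) = -\frac{x^{T}Ax}{2} + \frac{K}{2}x^{T}x - f^{T}x$

and

$G_{2}(x, v) = \sum_{j=1}^{N}\frac{\gamma_{j}}{2}\left(\frac{x^{T}B_{j}x}{2} + c_{j} + v_{j}\right)^{2} + \frac{K}{2}x^{T}x.$

Assume $A$ is an $n\times n$ symmetric matrix and $B_{j}$ are $n\times n$ symmetric matrices $\forall j\in\{1,\dots,N\}$ such that

$J(x) \to +\infty$

as $|x|\rightarrow\infty,$ and $K>0$ is such that $KI_{d}>A$.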

Define also $G^{*}_{1}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ by

[TABLE]

and $G_{2}^{*}:\mathbb{R}^{n}\times C^{*}\rightarrow\mathbb{R}$ by

[TABLE]

where

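$C^{*} = \left\{v_{0}^{*} \in \mathbb{R}^{N} : \sum_{j=1}^{N}(v_{0}^{*})_{j}B_{j} + KI_{d} > \mathbf{0}\right\}.$

Moreover, define

$B^{*} = \left\{v_{0}^{*} \in \mathbb{R}^{N} : A + \sum_{j=1}^{N}(v_{0}^{*})_{j}B_{j} > \mathbf{0}\right\}$

and

$A^{*} = B^{*} \cap C^{*}.$

At this point we denote

$J^{*}(v^{*}, v_{0}^{*}) = G_{1}^{*}(v^{*}) - G_{2}^{*}(v^{*}, v_{0}^{*}),$

and define

$\tilde{J}^{*}(v^{*}) = \sup_{v_{0}^{*} \in C^{*}} J^{*}(v^{*}, v_{0}^{*}).$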

Assume $x_{0}\in\mathbb{R}^{n}$ is such that $\delta J(x_{0})=\mathbf{0}$ and define

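$(\hat{v}_{0}^{*})_{j} = \gamma_{j}\left(\frac{x_{0}^{T}B_{j}x_{0}}{2} + c_{j}\right),$

$\hat{v}^{*} = \sum_{j=1}^{N}(\hat{v}_{0}^{*})_{j}B_{j}x_{0} + Kx_{0},$

and

$H_{3} = P_{1}\overline{E}P_{2},$

$\alpha \equiv (\alpha)_{n \times n} = (I_{d} - H_{3})D - I_{d},$

$\alpha_{1} = -\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p} + KI_{d}\right)^{-1}(\alpha)\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p} + KI_{d}\right),$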

where

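$P_{1}=\left[\begin{array}{cccc}B_{1}x_{0}&B_{2}x_{0}&\cdots&B_{N}x_{0}\end{array}\right]_{n\times N}$

and

$P_{2}=\left[\begin{array}{c}x_{0}^{T}B_{1}\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p}+KI_{d}\right)^{-1}\\ x_{0}^{T}B_{2}\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p}+KI_{d}\right)^{-1}\\ \vdots\\ x_{0}^{T}B_{N}\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p}+KI_{d}\right)^{-1}\end{array}\right]_{N\times n}$

where

$E = \{E_{l\eta}\} = \left\{\gamma_{l}x_{0}^{T}B_{l}\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p} + KI_{d}\right)^{-1}B_{\eta}x_{0} + \delta_{l\eta}\right\}_{N \times N}$

and

$\overline{E} = \{\overline{E}_{l\eta}\} = \{E_{l\eta}\}^{-1}.$

Furthermore,

$D = \hat{B}\left(\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p} + KI_{d}\right)^{-1} + I_{d}$

where

$\hat{B}_{n \times n} = \{\hat{B}_{jk}\} = \left\{\sum_{l=1}^{N}\sum_{s,q=1}^{n}\gamma_{l}(x_{0})_{s}(B_{l})_{js}(B_{l})_{qk}(x_{0})_{q}\right\}.$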

Under such assumptions and notation, we have the following.

1. If $\delta^{2}J(x_{0})>\mathbf{0}$, $\delta^{2}J(x_{0})+(KI_{d}-A)(\alpha_{1})>\mathbf{0}$ and $\hat{v}_{0}^{*}\in C^{*}$, then

$\delta\tilde{J}^{*}(\hat{v}^{*}) = \mathbf{0},$

and

$\delta^{2}\tilde{J}^{*}(\hat{v}^{*}) > \mathbf{0},$

so that there exist $r>0$ and $r_{1}>0$ such that

[TABLE]

2. If $\hat{v}_{0}^{*}\in A^{*}$ so that

[TABLE]

define

[TABLE]

Thus in such a case, we have

[TABLE]

and

[TABLE]

3. If $\delta^{2}J(x_{0})<\mathbf{0}$, $\delta^{2}J(x_{0})+(KI_{d}-A)(\alpha_{1})<\mathbf{0}$ and $\hat{v}_{0}^{*}\in C^{*}$, then

[TABLE]

and

[TABLE]

so that there exist $r>0$ and $r_{1}>0$ such that

[TABLE]
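To make the objects in the statement concrete, the following is a minimal sketch for the scalar case $n=N=1$. All numerical data ($a$, $b$, $c$, $\gamma$, $f$, $K$ and the point $x_{0}$) are hypothetical and were picked only so that $x_{0}=2$ is a critical point of $J$. The code evaluates $\hat{v}_{0}^{*}$ and $\hat{v}^{*}$ from their definitions and tests membership of $\hat{v}_{0}^{*}$ in $C^{*}$, $B^{*}$ and $A^{*}$. It also checks that $\hat{v}^{*}$ equals the derivative of $G_{1}$ at $x_{0}$, which follows from $J=-G_{1}+G_{2}(\cdot,0)$ and $\delta J(x_{0})=\mathbf{0}$.

```python
# Hypothetical scalar data (n = N = 1), chosen for illustration only.
a, b, c, gamma, f, K = 0.0, 1.0, -1.0, 1.0, -2.0, 1.0  # K > a, so K*I_d > A


def J(x):
    """Primal functional J(x) = a x^2/2 + gamma/2 (b x^2/2 + c)^2 + f x."""
    return a * x * x / 2 + gamma / 2 * (b * x * x / 2 + c) ** 2 + f * x


def dJ(x):
    """First derivative of J."""
    return a * x + gamma * (b * x * x / 2 + c) * b * x + f


def dG1(x):
    """First derivative of G_1(x) = -a x^2/2 + K x^2/2 - f x."""
    return -a * x + K * x - f


x0 = 2.0  # critical point of J for the data above: dJ(x0) == 0

# Dual quantities attached to x0, following the definitions in the theorem.
v0_hat = gamma * (b * x0 * x0 / 2 + c)
v_hat = v0_hat * b * x0 + K * x0

# Membership tests; in the scalar case "> 0" is ordinary positivity.
in_C = v0_hat * b + K > 0
in_B = a + v0_hat * b > 0
in_A = in_B and in_C

print(dJ(x0), v0_hat, v_hat, dG1(x0), in_A)
```

For these data $J'(x)=\tfrac{1}{2}x^{3}-x-2=(x-2)\left(\tfrac{1}{2}x^{2}+x+1\right)$, so $x_{0}=2$ is the only real critical point of this particular $J$.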

Proof.

From $\delta J(x_{0})=\mathbf{0}$ we obtain

[TABLE]

Hence

[TABLE]

Thus,

[TABLE]

so that

[TABLE]

and therefore

[TABLE]

From this and the implicit function theorem, we get

[TABLE]

However, from

[TABLE]

we have

[TABLE]

so that from (12), we obtain

[TABLE]

Hence, we may denote

[TABLE]

On the other hand from (10), we have

[TABLE]

and

[TABLE]

Therefore

[TABLE]

Observe also that

[TABLE]

where $\hat{v}_{0}^{*}$ is such that

[TABLE]

Taking the variation of this last equation in $v^{*}_{k}$ , we get

[TABLE]

From this, denoting

[TABLE]

we obtain

[TABLE]

Also

[TABLE]

so that

[TABLE]

Therefore

[TABLE]

Therefore, recalling that

[TABLE]

where

[TABLE]

we may write

[TABLE]

Therefore, denoting also

[TABLE]

we have

[TABLE]

Since $D$, $H_{1}$ and $H_{2}$ are symmetric positive definite matrices, assuming $\delta^{2}J(x_{0})>\mathbf{0}$ and $\delta^{2}J(x_{0})+(KI_{d}-A)(\alpha_{1})>\mathbf{0}$, we have

[TABLE]

so that there exist $r>0$ and $r_{1}>0$ such that

[TABLE]

Assume now $\hat{v}_{0}^{*}\in A^{*}$ so that

[TABLE]

Observe that if $v_{0}^{*}\in A^{*}$ , then

[TABLE]

is such that

[TABLE]

so that defining

[TABLE]

we have that $J_{2}^{*}$ is convex as the supremum of a family of convex functions.

Similarly as above, we may obtain

[TABLE]

and

[TABLE]

From this, since $J_{2}^{*}$ is convex, from the min-max theorem and from the general result in Toland [7], we may infer that

[TABLE]

Hence

[TABLE]

so that

[TABLE]

Finally, the proof of the third item is similar to that of the first one.

This completes the proof. ∎

Remark 2.3.

For the special case in which $n=N=1$ we obtain $\alpha_{1}=0.$

Remark 2.4.

A more general result may be obtained if we allow $K$ to be an $n\times n$ symmetric matrix. Specifically, for the case

[TABLE]

we get

[TABLE]

and in such a case

[TABLE]

so that we recover at least approximately a correspondence between $\delta^{2}J(x_{0})$ and $\delta^{2}\tilde{J}^{*}(\hat{v}^{*}),$ up to considering the sign of $H_{2}$ as well.

Observe that in this last context,

[TABLE]

and

[TABLE]

Remark 2.5.

Let us now consider a dual functional proposed in the current literature (see [6], for example). For the model addressed in this article, such a functional is expressed as

[TABLE]

Taking the variation (in fact derivative) of such a functional in $(v_{0}^{*})_{j}$ , since the matrices in question are symmetric, we obtain

[TABLE]

Now taking the derivative of this expression with respect to $(v_{0}^{*})_{k}$ we get

[TABLE]

Since the matrices in question are symmetric, at a critical point as specified in the last theorem, we obtain,

[TABLE]

On the other hand, for the functional $J(x)$ we obtain

[TABLE]

where

[TABLE]

From this we may see that there exists a qualitative correspondence (in terms of positivity or negativity in a matrix sense) between the two second derivative matrices only for the special case $n=N=1$. Even then, we have to consider the sign of $\sum_{p=1}^{N}(\hat{v}_{0}^{*})_{p}B_{p}+A$ to reach the correct conclusion.

For a general case such a correspondence may not hold even if $n=N.$

3 Conclusion

In this article we have developed a duality principle for a class of non-convex optimization problems in $\mathbb{R}^{n}$. For such a class of problems we address the case in which, for the variables in question, $n\neq N.$

We believe we have obtained a new way of developing the dual formulation, establishing a correct relation between the critical points of the primal and dual problems, with no duality gap between such primal and dual formulations.

This problem has been addressed in a similar form in [5, 6], for example. It is not our objective here to comment extensively on such previous results, but only to offer a new possibility for obtaining the dual formulations for such a class of problems.

Bibliography

[1] W.R. Bielski, A. Galka, J.J. Telega, The Complementary Energy Principle and Duality for Geometrically Nonlinear Elastic Shells. I. Simple case of moderate rotations around a tangent to the middle surface. Bulletin of the Polish Academy of Sciences, Technical Sciences, Vol. 38, No. 7-9, 1988.
[2] W.R. Bielski and J.J. Telega, A Contribution to Contact Problems for a Class of Solids and Structures, Arch. Mech., 37, 4-5, pp. 303-320, Warszawa 1985.
[3] F. Botelho, Functional Analysis and Applied Optimization in Banach Spaces, (Springer Switzerland, 2014).
[4] F. Botelho, Real Analysis and Applications, (Springer Switzerland, 2018).
[5] D.Y. Gao and H.F. Yu, Multi-scale modelling and canonical dual finite element method in phase transition in solids. Int. J. Solids Struct., 45, 3660-3673 (2008).
[6] D.Y. Gao and C. Wu, On the Triality Theory in Global Optimization, arXiv:1104.2970v2, February, 2012.
[7] J.F. Toland, A duality principle for non-convex optimisation and the calculus of variations, Arch. Rational Mech. Anal., 71, No. 1 (1979), 41-61.
