Newton-like dynamics associated to nonconvex optimization problems
Radu Ioan Boţ, Ernö Robert Csetnek

TL;DR
This paper introduces a Newton-like dynamical system for nonconvex optimization, shows that its trajectories converge to critical points when the objective satisfies the Kurdyka-Łojasiewicz property, and derives convergence rates in terms of the Łojasiewicz exponent.
Contribution
It proposes a Newton-like dynamical system framework for nonconvex optimization and establishes convergence results and rates under the Kurdyka-Łojasiewicz property.
Findings
Limit points are contained in the set of critical points.
Trajectory convergence to a critical point is proven under the Kurdyka-Łojasiewicz property.
Convergence rates depend on the Łojasiewicz exponent of the objective function.
Abstract
We consider the dynamical system \begin{equation*}\left\{ \begin{array}{ll} v(t)\in\partial\phi(x(t))\\ \lambda\dot x(t) + \dot v(t) + v(t) + \nabla \psi(x(t))=0, \end{array}\right.\end{equation*} where $\phi:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is a proper, convex and lower semicontinuous function, $\psi:\mathbb{R}^n\to\mathbb{R}$ is a (possibly nonconvex) smooth function and $\lambda>0$ is a parameter which controls the velocity. We show that the set of limit points of the trajectory $x$ is contained in the set of critical points of the objective function $\phi+\psi$, which is here seen as the set of the zeros of its limiting subdifferential. If the objective function satisfies the Kurdyka-Łojasiewicz property, then we can prove convergence of the whole trajectory to a critical point. Furthermore, convergence rates for the orbits are obtained in terms of the Łojasiewicz exponent of the objective…
Radu Ioan Boţ University of Vienna, Faculty of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria, email: [email protected]. Research partially supported by FWF (Austrian Science Fund), project I 2419-N32.
Ernö Robert Csetnek University of Vienna, Faculty of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria, email: [email protected]. Research supported by FWF (Austrian Science Fund), project P 29809-N32.
Abstract. We consider the dynamical system
\begin{equation*}\left\{ \begin{array}{ll} v(t)\in\partial\phi(x(t))\\ \lambda\dot x(t) + \dot v(t) + v(t) + \nabla \psi(x(t))=0, \end{array}\right.\end{equation*}
where $\phi:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is a proper, convex and lower semicontinuous function, $\psi:\mathbb{R}^n\to\mathbb{R}$ is a (possibly nonconvex) smooth function and $\lambda>0$ is a parameter which controls the velocity. We show that the set of limit points of the trajectory $x$ is contained in the set of critical points of the objective function $\phi+\psi$, which is here seen as the set of the zeros of its limiting subdifferential. If the objective function satisfies the Kurdyka-Łojasiewicz property, then we can prove convergence of the whole trajectory to a critical point. Furthermore, convergence rates for the orbits are obtained in terms of the Łojasiewicz exponent of the objective function, provided the latter satisfies the Łojasiewicz property.
Key Words. dynamical systems, Newton-like methods, Lyapunov analysis, nonsmooth optimization, limiting subdifferential, Kurdyka-Łojasiewicz property
AMS subject classification. 34G25, 47J25, 47H05, 90C26, 90C30, 65K10
1 Introduction and preliminaries
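The dynamical system
\begin{equation*}\left\{ \begin{array}{ll} v(t)\in T(x(t))\\ \lambda(t)\dot x(t) + \dot v(t) + v(t)=0, \end{array}\right.\tag{1}\end{equation*}
where $\lambda$ is a damping function and $T$ is a (set-valued) maximally monotone operator, has been introduced and investigated in [10] as a continuous version of Newton and Levenberg-Marquardt-type algorithms. It has been shown that under mild conditions on $\lambda$ the trajectory $x(t)$ converges weakly to a zero of the operator $T$, while $v(t)$ converges to zero as $t\to+\infty$.
These investigations have been continued in [2] in the context of solving optimization problems of the form
\begin{equation*}\inf_{x\in\mathbb{R}^n}\big[\phi(x)+\psi(x)\big],\tag{2}\end{equation*}
where $\phi:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is a proper, convex and lower semicontinuous function and $\psi:\mathbb{R}^n\to\mathbb{R}$ is a convex and differentiable function with locally Lipschitz-continuous gradient. More precisely, problem (2) has been approached via the dynamical system
\begin{equation*}\left\{ \begin{array}{ll} v(t)\in\partial\phi(x(t))\\ \lambda(t)\dot x(t) + \dot v(t) + v(t) + \nabla\psi(x(t))=0, \end{array}\right.\tag{3}\end{equation*}
where $\partial\phi$ is the convex subdifferential of $\phi$. It has been shown in [2] that if the set of minimizers of (2) is nonempty and some mild conditions on the damping function $\lambda$ are satisfied, then the trajectory $x(t)$ converges to a minimizer of (2) as $t\to+\infty$. Further investigations on dynamical systems of similar type have been reported in [1] and [21].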
The aim of this paper is to perform an asymptotic analysis of the dynamical system (3) in the absence of the convexity of $\psi$, for a constant damping function $\lambda$ and by assuming that the objective function of (2) satisfies the Kurdyka-Łojasiewicz property, in other words, is a KL function. The class of KL functions includes semialgebraic, real subanalytic and uniformly convex functions, as well as convex functions satisfying a growth condition. The convergence analysis relies on methods of real algebraic geometry introduced by Łojasiewicz [30] and Kurdyka [28] and developed recently in the nonsmooth setting by Attouch, Bolte and Svaiter [7] and Bolte, Sabach and Teboulle [16].
Optimization problems involving KL functions have attracted the interest of the community since the works of Łojasiewicz [30], Simon [34], Haraux and Jendoubi [26]. The most important contributions of the last years in the field include the works of Alvarez, Attouch, Bolte and Redont [3, Section 4] and Bolte, Daniilidis and Lewis [12, Section 4]. Since then, interest in this topic has grown continuously (see [5, 6, 7, 15, 16, 20, 18, 19, 23, 24, 27, 32]).
In the first part of the paper we show that the set of limit points of the trajectory $x$ generated by (3) is entirely contained in the set of critical points of the objective function $\phi+\psi$, which is seen as the set of zeros of its limiting subdifferential. Under some supplementary conditions, including the Kurdyka-Łojasiewicz property, we prove the convergence of the trajectory $x$ to a critical point of $\phi+\psi$. Furthermore, convergence rates for the orbits are obtained in terms of the Łojasiewicz exponent of the objective function, provided the latter satisfies the Łojasiewicz property.
In the following we recall some notions and results which are needed throughout the paper. We consider on $\mathbb{R}^n$ the Euclidean scalar product and the corresponding norm, denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$, respectively.
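The domain of the function $f:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is defined by $\operatorname{dom}f=\{x\in\mathbb{R}^n: f(x)<+\infty\}$ and we say that $f$ is proper, if it has a nonempty domain. For the following generalized subdifferential notions and their basic properties we refer to [17, 31, 33]. Let $f:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function. The Fréchet (viscosity) subdifferential of $f$ at $x\in\operatorname{dom}f$ is the set
\begin{equation*}\hat\partial f(x)=\left\{v\in\mathbb{R}^n:\ \liminf_{y\to x}\frac{f(y)-f(x)-\langle v,y-x\rangle}{\|y-x\|}\ge 0\right\}.\end{equation*}
If $x\notin\operatorname{dom}f$, we set $\hat\partial f(x):=\emptyset$. The limiting (Mordukhovich) subdifferential is defined at $x\in\operatorname{dom}f$ by
\begin{equation*}\partial_L f(x)=\left\{v\in\mathbb{R}^n:\ \exists\, x_k\to x,\ f(x_k)\to f(x)\ \text{and}\ \exists\, v_k\in\hat\partial f(x_k),\ v_k\to v\ \text{as}\ k\to+\infty\right\},\end{equation*}
while for $x\notin\operatorname{dom}f$, we set $\partial_L f(x):=\emptyset$. Obviously, $\hat\partial f(x)\subseteq\partial_L f(x)$ for each $x\in\mathbb{R}^n$.
When $f$ is convex, these subdifferential notions coincide with the convex subdifferential, thus $\hat\partial f(x)=\partial_L f(x)=\partial f(x)=\{v\in\mathbb{R}^n: f(y)\ge f(x)+\langle v,y-x\rangle\ \forall y\in\mathbb{R}^n\}$ for all $x\in\mathbb{R}^n$.
The following closedness criterion of the graph of the limiting subdifferential will be used in the convergence analysis: if $(x_k)_{k\in\mathbb{N}}$ and $(v_k)_{k\in\mathbb{N}}$ are sequences in $\mathbb{R}^n$ such that $v_k\in\partial_L f(x_k)$ for all $k\in\mathbb{N}$, $(x_k,v_k)\to(x,v)$ and $f(x_k)\to f(x)$ as $k\to+\infty$, then $v\in\partial_L f(x)$.
The Fermat rule reads in this nonsmooth setting as follows: if $x\in\mathbb{R}^n$ is a local minimizer of $f$, then $0\in\partial_L f(x)$. We denote by
\begin{equation*}\operatorname{crit}(f)=\{x\in\mathbb{R}^n:\ 0\in\partial_L f(x)\}\end{equation*}
the set of (limiting)-critical points of $f$.
When $f$ is continuously differentiable around $x\in\mathbb{R}^n$ we have $\partial_L f(x)=\{\nabla f(x)\}$. We will also make use of the following subdifferential sum rule: if $f:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is proper and lower semicontinuous and $h:\mathbb{R}^n\to\mathbb{R}$ is a continuously differentiable function, then $\partial_L(f+h)(x)=\partial_L f(x)+\nabla h(x)$ for all $x\in\mathbb{R}^n$.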
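As a small illustration of how the Fréchet and the limiting subdifferential can differ for a nonconvex function, consider the following standard one-dimensional example (the function $f(x)=-|x|$ is chosen here for illustration only and does not appear in the paper).

```latex
% Illustrative example (not from the paper): f(x) = -|x| on \mathbb{R}.
% For x \neq 0 the function is differentiable, so
\hat\partial f(x) = \partial_L f(x) = \{-\operatorname{sign}(x)\}, \qquad x\neq 0.
% At x = 0, for any v \in \mathbb{R},
\liminf_{y\to 0}\frac{-|y| - v\,y}{|y|}
  = \liminf_{y\to 0}\big(-1 - v\operatorname{sign}(y)\big) = -1-|v| < 0,
% hence the Frechet subdifferential is empty, while limits of gradients give
\hat\partial f(0)=\emptyset, \qquad \partial_L f(0)=\{-1,\,1\}.
% In particular 0 \notin \partial_L f(0): the local maximizer 0 is not a
% limiting-critical point of f.
```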
Further, we recall the notion of a locally absolutely continuous function and state two of its basic properties.
Definition 1
(see [10, 2]) A function $x:[0,+\infty)\to\mathbb{R}^n$ is said to be locally absolutely continuous, if it is absolutely continuous on every interval $[0,T]$ for $T>0$.
Remark 1
- (a) An absolutely continuous function is differentiable almost everywhere, its derivative coincides with its distributional derivative almost everywhere and one can recover the function from its derivative by integration.
- (b) If $x:[0,T]\to\mathbb{R}^n$ is absolutely continuous for $T>0$ and $B:\mathbb{R}^n\to\mathbb{R}^n$ is $L$-Lipschitz continuous for $L\ge 0$, then the function $z=B\circ x$ is absolutely continuous, too. Moreover, $z$ is differentiable almost everywhere on $[0,T]$ and the inequality $\|\dot z(t)\|\le L\|\dot x(t)\|$ holds for almost every $t\in[0,T]$.
The following two results, which can be interpreted as continuous versions of the quasi-Fejér monotonicity for sequences, will play an important role in the asymptotic analysis of the trajectories of the dynamical system (3). For their proofs we refer the reader to [2, Lemma 5.1] and [2, Lemma 5.2], respectively.
Lemma 2
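Suppose that $F:[0,+\infty)\to\mathbb{R}$ is locally absolutely continuous and bounded from below and that there exists $G\in L^1([0,+\infty))$ such that for almost every $t\in[0,+\infty)$
\begin{equation*}\frac{d}{dt}F(t)\le G(t).\end{equation*}
Then there exists $\lim_{t\to+\infty}F(t)\in\mathbb{R}$.
Lemma 3
If $1\le p<\infty$, $1\le r\le\infty$, $F:[0,+\infty)\to[0,+\infty)$ is locally absolutely continuous, $F\in L^p([0,+\infty))$, $G:[0,+\infty)\to\mathbb{R}$, $G\in L^r([0,+\infty))$ and for almost every $t\in[0,+\infty)$
\begin{equation*}\frac{d}{dt}F(t)\le G(t),\end{equation*}
then $\lim_{t\to+\infty}F(t)=0$.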
The following result, which is due to Brézis ([22, Lemme 3.3, p. 73]; see also [8, Lemma 3.2]), provides an expression for the derivative of the composition of convex functions with absolutely continuous trajectories.
Lemma 4
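Let $f:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ be a proper, convex and lower semicontinuous function. Let $x\in L^2([0,T],\mathbb{R}^n)$ be absolutely continuous such that $\dot x\in L^2([0,T],\mathbb{R}^n)$ and $x(t)\in\operatorname{dom}f$ for almost every $t\in[0,T]$. Assume that there exists $\xi\in L^2([0,T],\mathbb{R}^n)$ such that $\xi(t)\in\partial f(x(t))$ for almost every $t\in[0,T]$. Then the function $t\mapsto f(x(t))$ is absolutely continuous and for almost every $t$ such that $x(t)\in\operatorname{dom}\partial f$ we have
\begin{equation*}\frac{d}{dt}f(x(t))=\langle\dot x(t),h\rangle\quad\forall h\in\partial f(x(t)).\end{equation*}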
2 Asymptotic analysis
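In this paper we investigate the dynamical system
\begin{equation*}\left\{ \begin{array}{ll} v(t)\in\partial\phi(x(t))\\ \lambda\dot x(t) + \dot v(t) + v(t) + \nabla\psi(x(t))=0\\ x(0)=x_0,\ v(0)=v_0, \end{array}\right.\tag{4}\end{equation*}
where $x_0,v_0\in\mathbb{R}^n$ and $\lambda>0$. We assume that $\phi:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is proper, convex and lower semicontinuous and $\psi:\mathbb{R}^n\to\mathbb{R}$ is possibly nonconvex and Fréchet differentiable with $L$-Lipschitz continuous gradient, for $L>0$; in other words, $\|\nabla\psi(x)-\nabla\psi(y)\|\le L\|x-y\|$ for all $x,y\in\mathbb{R}^n$.
In the following we specify what we understand by a solution of the dynamical system (4).
Definition 2
Let $x_0,v_0\in\mathbb{R}^n$ and $\lambda>0$ be such that $v_0\in\partial\phi(x_0)$. We say that the pair $(x,v)$ is a strong global solution of (4) if the following properties are satisfied:
- (i) $x,v:[0,+\infty)\to\mathbb{R}^n$ are locally absolutely continuous functions;
- (ii) $v(t)\in\partial\phi(x(t))$ for every $t\in[0,+\infty)$;
- (iii) $\lambda\dot x(t)+\dot v(t)+v(t)+\nabla\psi(x(t))=0$ for almost every $t\in[0,+\infty)$;
- (iv) $x(0)=x_0$, $v(0)=v_0$.
The existence and uniqueness of the trajectories generated by (4) have been investigated in [2]. A careful look at the proofs in [2] reveals that the existence results there do not use the convexity of $\psi$, but only the Lipschitz continuity of its gradient.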
We start our convergence analysis with the following technical result.
Lemma 5
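Let $x_0,v_0\in\mathbb{R}^n$ and $\lambda>0$ be such that $v_0\in\partial\phi(x_0)$. Let $(x,v):[0,+\infty)\to\mathbb{R}^n\times\mathbb{R}^n$ be the unique strong global solution of the dynamical system (4). Then the following statements are true:
- (i) $\langle\dot x(t),\dot v(t)\rangle\ge 0$ for almost every $t\in[0,+\infty)$;
- (ii) $\frac{d}{dt}\phi(x(t))=\langle\dot x(t),v(t)\rangle$ for almost every $t\in[0,+\infty)$.
**Proof.**
(i) See [10, Proposition 3.1]. The proof relies on the first relation in (4) and the monotonicity of the convex subdifferential.
(ii) The proof makes use of Lemma 4. This relation has already been stated in [2, relation (51)], and its proof there does not use the convexity of $\psi$.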
Lemma 6
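Let $x_0,v_0\in\mathbb{R}^n$ and $\lambda>0$ be such that $v_0\in\partial\phi(x_0)$. Let $(x,v):[0,+\infty)\to\mathbb{R}^n\times\mathbb{R}^n$ be the unique strong global solution of the dynamical system (4). Suppose that $\phi+\psi$ is bounded from below. Then the following statements are true:
- (i) $\frac{d}{dt}(\phi+\psi)(x(t))+\lambda\|\dot x(t)\|^2+\langle\dot x(t),\dot v(t)\rangle=0$ for almost every $t\in[0,+\infty)$;
- (ii) $\dot x,\dot v,v+\nabla\psi(x)\in L^2([0,+\infty);\mathbb{R}^n)$, $\langle\dot x(\cdot),\dot v(\cdot)\rangle\in L^1([0,+\infty))$ and $\lim_{t\to+\infty}\dot x(t)=\lim_{t\to+\infty}\dot v(t)=\lim_{t\to+\infty}\big(v(t)+\nabla\psi(x(t))\big)=0$;
- (iii) $\exists\lim_{t\to+\infty}(\phi+\psi)\big(x(t)\big)\in\mathbb{R}$.
**Proof.**
(i) The statement follows by inner multiplying both sides of the second relation in (4) by $\dot x(t)$ and by taking afterwards into consideration Lemma 5(ii).
(ii) After integrating the relation (i) and by taking into account that $\phi+\psi$ is bounded from below, we easily derive $\dot x\in L^2([0,+\infty);\mathbb{R}^n)$ and $\langle\dot x(\cdot),\dot v(\cdot)\rangle\in L^1([0,+\infty))$ (see also Lemma 5(i)). Further, by using the second relation in (4), Remark 1(b) and Lemma 5(i), we obtain for almost every $t\in[0,+\infty)$:
[TABLE]
hence
[TABLE]
Since $\dot x\in L^2([0,+\infty);\mathbb{R}^n)$, a simple integration argument yields that $v+\nabla\psi(x)\in L^2([0,+\infty);\mathbb{R}^n)$. Considering the second equation in (4), we further obtain that $\dot v\in L^2([0,+\infty);\mathbb{R}^n)$. This fact combined with Lemma 3 and (5) implies that $\lim_{t\to+\infty}\big(v(t)+\nabla\psi(x(t))\big)=0$. From the second equation in (4) we obtain
\begin{equation*}\lim_{t\to+\infty}\big(\lambda\dot x(t)+\dot v(t)\big)=0.\tag{6}\end{equation*}
Further, from Lemma 5(i) we have for almost every $t\in[0,+\infty)$
\begin{equation*}\|\dot v(t)\|^2\le\lambda^2\|\dot x(t)\|^2+2\lambda\langle\dot x(t),\dot v(t)\rangle+\|\dot v(t)\|^2=\|\lambda\dot x(t)+\dot v(t)\|^2,\end{equation*}
hence from (6) we get $\lim_{t\to+\infty}\dot v(t)=0$. Combining this with (6) we conclude that $\lim_{t\to+\infty}\dot x(t)=0$.
(iii) From (i) and Lemma 5(i) it follows that
\begin{equation*}\lambda\|\dot x(t)\|^2+\frac{d}{dt}(\phi+\psi)(x(t))\le 0\tag{7}\end{equation*}
for almost every $t\in[0,+\infty)$. The conclusion follows by applying Lemma 2.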
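The energy identity behind Lemma 6(i) can be written out step by step. The following is a sketch under the assumption that Lemma 5(ii) is the chain rule $\frac{d}{dt}\phi(x(t))=\langle\dot x(t),v(t)\rangle$ and Lemma 5(i) is the monotonicity estimate $\langle\dot x(t),\dot v(t)\rangle\ge 0$; both readings are inferred from the cited sources rather than quoted from the paper.

```latex
% Sketch: take the inner product of the second relation in (4) with \dot x(t).
\lambda\|\dot x(t)\|^2 + \langle \dot x(t),\dot v(t)\rangle
  + \langle \dot x(t), v(t)\rangle
  + \langle \dot x(t), \nabla\psi(x(t))\rangle = 0 .
% Assumed chain rules (Lemma 5(ii) for \phi, smoothness for \psi):
\langle \dot x(t), v(t)\rangle = \frac{d}{dt}\phi(x(t)), \qquad
\langle \dot x(t), \nabla\psi(x(t))\rangle = \frac{d}{dt}\psi(x(t)) .
% Hence, for almost every t,
\frac{d}{dt}(\phi+\psi)(x(t))
  = -\lambda\|\dot x(t)\|^2 - \langle \dot x(t),\dot v(t)\rangle \le 0 ,
% so t \mapsto (\phi+\psi)(x(t)) is nonincreasing along the trajectory.
```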
Lemma 7
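Let $x_0,v_0\in\mathbb{R}^n$ and $\lambda>0$ be such that $v_0\in\partial\phi(x_0)$. Let $(x,v):[0,+\infty)\to\mathbb{R}^n\times\mathbb{R}^n$ be the unique strong global solution of the dynamical system (4). Suppose that $\phi+\psi$ is bounded from below. Let $(t_k)_{k\in\mathbb{N}}$ be a sequence such that $t_k\to+\infty$ and $x(t_k)\to\bar x\in\mathbb{R}^n$ as $k\to+\infty$. Then
\begin{equation*}0\in\partial_L(\phi+\psi)(\bar x).\end{equation*}
**Proof.**
From the first relation in (4) and the subdifferential sum rule of the limiting subdifferential we derive for any $k\in\mathbb{N}$
\begin{equation*}v(t_k)+\nabla\psi(x(t_k))\in\partial\phi(x(t_k))+\nabla\psi(x(t_k))=\partial_L(\phi+\psi)(x(t_k)).\tag{8}\end{equation*}
Further, we have
\begin{equation*}x(t_k)\to\bar x\ \text{as}\ k\to+\infty\tag{9}\end{equation*}
and (see Lemma 6(ii))
\begin{equation*}v(t_k)+\nabla\psi(x(t_k))\to 0\ \text{as}\ k\to+\infty.\tag{10}\end{equation*}
According to the closedness property of the limiting subdifferential, the proof is complete as soon as we show that
\begin{equation*}\lim_{k\to+\infty}(\phi+\psi)(x(t_k))=(\phi+\psi)(\bar x).\tag{11}\end{equation*}
From (9), (10) and the continuity of $\nabla\psi$ we get
\begin{equation*}v(t_k)\to-\nabla\psi(\bar x)\ \text{as}\ k\to+\infty.\tag{12}\end{equation*}
Further, since $v(t_k)\in\partial\phi(x(t_k))$, we have
\begin{equation*}\phi(\bar x)\ge\phi(x(t_k))+\langle v(t_k),\bar x-x(t_k)\rangle\quad\forall k\in\mathbb{N}.\end{equation*}
Combining this with (9) and (12) we derive
\begin{equation*}\limsup_{k\to+\infty}\phi(x(t_k))\le\phi(\bar x).\end{equation*}
A direct consequence of the lower semicontinuity of $\phi$ is the relation
\begin{equation*}\liminf_{k\to+\infty}\phi(x(t_k))\ge\phi(\bar x),\end{equation*}
which combined with (9) and the continuity of $\psi$ yields (11).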
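We define the limit set of $x$ as
\begin{equation*}\omega(x):=\{\bar x\in\mathbb{R}^n:\ \exists\, t_k\to+\infty\ \text{such that}\ x(t_k)\to\bar x\ \text{as}\ k\to+\infty\}.\end{equation*}
We also use the distance function to a set, defined for $A\subseteq\mathbb{R}^n$ as $\operatorname{dist}(x,A)=\inf_{y\in A}\|x-y\|$ for all $x\in\mathbb{R}^n$.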
Lemma 8
Let $x_0,v_0\in\mathbb{R}^n$ and $\lambda>0$ be such that $v_0\in\partial\phi(x_0)$. Let $(x,v):[0,+\infty)\to\mathbb{R}^n\times\mathbb{R}^n$ be the unique strong global solution of the dynamical system (4). Suppose that $\phi+\psi$ is bounded from below and $x$ is bounded. Then the following statements are true:
- (i) $\omega(x)\subseteq\operatorname{crit}(\phi+\psi)$;
- (ii) $\omega(x)$ is nonempty, compact and connected;
- (iii) $\lim_{t\to+\infty}\operatorname{dist}\big(x(t),\omega(x)\big)=0$;
- (iv) $\phi+\psi$ is finite and constant on $\omega(x)$.
**Proof.**
Statement (i) is a direct consequence of Lemma 7.
Statement (ii) is a classical result from [25]. We also refer the reader to the proof of Theorem 4.1 in [3], where it is shown that the properties of $\omega(x)$ of being nonempty, compact and connected are generic for bounded trajectories fulfilling $\lim_{t\to+\infty}\dot x(t)=0$.
Statement (iii) follows immediately since $\omega(x)$ is nonempty.
(iv) According to Lemma 6(iii), there exists $\lim_{t\to+\infty}(\phi+\psi)\big(x(t)\big)\in\mathbb{R}$. Let us denote by $l\in\mathbb{R}$ this limit. Take $\bar x\in\omega(x)$. Then there exists $t_k\to+\infty$ such that $x(t_k)\to\bar x$ as $k\to+\infty$. From the proof of Lemma 7 we have that $\lim_{k\to+\infty}(\phi+\psi)(x(t_k))=(\phi+\psi)(\bar x)$, hence $(\phi+\psi)(\bar x)=l$.
Remark 9
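Suppose that $\phi+\psi$ is coercive, in other words,
\begin{equation*}\lim_{\|u\|\to+\infty}(\phi+\psi)(u)=+\infty.\end{equation*}
Let $x_0,v_0\in\mathbb{R}^n$ and $\lambda>0$ be such that $v_0\in\partial\phi(x_0)$. Let $(x,v):[0,+\infty)\to\mathbb{R}^n\times\mathbb{R}^n$ be the unique strong global solution of the dynamical system (4). Then $\phi+\psi$ is bounded from below and $x$ is bounded.
Indeed, since $\phi+\psi$ is a proper, lower semicontinuous and coercive function, it follows that $\inf_{u\in\mathbb{R}^n}[\phi(u)+\psi(u)]$ is finite and the infimum is attained. Hence $\phi+\psi$ is bounded from below. On the other hand, from (7) it follows
\begin{equation*}(\phi+\psi)(x(T))\le(\phi+\psi)(x_0)\quad\forall T\ge 0.\end{equation*}
Since $\phi+\psi$ is coercive, the lower level sets of $\phi+\psi$ are bounded, hence the above inequality yields that $x$ is bounded. Notice that in this case $v$ is bounded too, due to the relation $\lim_{t\to+\infty}\big(v(t)+\nabla\psi(x(t))\big)=0$ (Lemma 6(ii)) and the Lipschitz continuity of $\nabla\psi$.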
3 Convergence of the trajectory when the objective function satisfies the Kurdyka-Łojasiewicz property
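In order to enforce the convergence of the whole trajectory to a critical point of the objective function as $t\to+\infty$, more involved analytic features of the functions have to be considered.
A crucial role in the asymptotic analysis of the dynamical system (4) is played by the class of functions satisfying the Kurdyka-Łojasiewicz property. For $\eta\in(0,+\infty]$, we denote by $\Theta_\eta$ the class of concave and continuous functions $\varphi:[0,\eta)\to[0,+\infty)$ such that $\varphi(0)=0$, $\varphi$ is continuously differentiable on $(0,\eta)$, continuous at $0$ and $\varphi'(s)>0$ for all $s\in(0,\eta)$.
Definition 3
(Kurdyka-Łojasiewicz property) Let $f:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function. We say that $f$ satisfies the Kurdyka-Łojasiewicz (KL) property at $\bar x\in\operatorname{dom}\partial_L f=\{x\in\mathbb{R}^n:\partial_L f(x)\neq\emptyset\}$, if there exist $\eta\in(0,+\infty]$, a neighborhood $U$ of $\bar x$ and a function $\varphi\in\Theta_\eta$ such that for all $x$ in the intersection
\begin{equation*}U\cap\{x\in\mathbb{R}^n:\ f(\bar x)<f(x)<f(\bar x)+\eta\}\end{equation*}
the following inequality holds
\begin{equation*}\varphi'\big(f(x)-f(\bar x)\big)\operatorname{dist}\big(0,\partial_L f(x)\big)\ge 1.\end{equation*}
If $f$ satisfies the KL property at each point in $\operatorname{dom}\partial_L f$, then $f$ is called a KL function.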
The origins of this notion go back to the pioneering work of Łojasiewicz [30], where it is proved that for a real-analytic function $f:\mathbb{R}^n\to\mathbb{R}$ and a critical point $\bar x\in\mathbb{R}^n$ (that is $\nabla f(\bar x)=0$), there exists $\theta\in[1/2,1)$ such that the function $|f-f(\bar x)|^{\theta}\|\nabla f\|^{-1}$ is bounded around $\bar x$. This corresponds to the situation when $\varphi(s)=Cs^{1-\theta}$ for $C>0$. The result of Łojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [28] extended this property to differentiable functions definable in o-minimal structures. Further extensions to the nonsmooth setting can be found in [12, 6, 13, 14].
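The boundedness statement of Łojasiewicz can be probed numerically on a toy example. The sketch below is illustrative only: the test function $f(x)=x^4$, the sample grid and the helper name `lojasiewicz_ratio` are assumptions introduced here, not objects from the paper. It evaluates the ratio $|f(x)-f(\bar x)|^{\theta}/|f'(x)|$ near the critical point $\bar x=0$ for two candidate exponents.

```python
import numpy as np

def lojasiewicz_ratio(f, grad, x_bar, xs, theta):
    """Return |f(x) - f(x_bar)|**theta / |grad(x)| at the sample points xs."""
    return np.abs(f(xs) - f(x_bar)) ** theta / np.abs(grad(xs))

# Hypothetical test function with a flat critical point at 0.
f = lambda x: x ** 4
grad = lambda x: 4 * x ** 3

# Sample points approaching the critical point (x = 0 itself is excluded).
xs = np.logspace(-6, -1, 50)

ratio_good = lojasiewicz_ratio(f, grad, 0.0, xs, theta=0.75)
ratio_bad = lojasiewicz_ratio(f, grad, 0.0, xs, theta=0.5)

print(ratio_good.max())  # remains bounded near the critical point
print(ratio_bad.max())   # blows up as x approaches 0
```

For this particular function the ratio is constant for $\theta=3/4$ and behaves like $1/(4|x|)$ for $\theta=1/2$, so the flatter the function near the critical point, the larger the exponent that is needed.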
One of the remarkable properties of the KL functions is their ubiquity in applications (see [16]). We refer the reader to [12, 6, 14, 16, 13, 7, 5] and the references therein for more properties of the KL functions and illustrating examples.
In the analysis below the following uniform KL property given in [16, Lemma 6] will be used.
Lemma 10
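Let $\Omega\subseteq\mathbb{R}^n$ be a compact set and let $f:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function. Assume that $f$ is constant on $\Omega$ and that it satisfies the KL property at each point of $\Omega$. Then there exist $\varepsilon,\eta>0$ and $\varphi\in\Theta_\eta$ such that for all $\bar x\in\Omega$ and all $x$ in the intersection
\begin{equation*}\{x\in\mathbb{R}^n:\ \operatorname{dist}(x,\Omega)<\varepsilon\}\cap\{x\in\mathbb{R}^n:\ f(\bar x)<f(x)<f(\bar x)+\eta\}\end{equation*}
the inequality
\begin{equation*}\varphi'\big(f(x)-f(\bar x)\big)\operatorname{dist}\big(0,\partial_L f(x)\big)\ge 1\end{equation*}
holds.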
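For reasons outlined in Remark 14 below, we prove the convergence of the trajectory generated by (4) as $t\to+\infty$ under the assumption that $\phi:\mathbb{R}^n\to\mathbb{R}$ is convex and differentiable with $L_\phi$-Lipschitz continuous gradient for $L_\phi>0$. In these circumstances the dynamical system (4) reads
\begin{equation*}\left\{ \begin{array}{ll} v(t)=\nabla\phi(x(t))\\ \lambda\dot x(t) + \dot v(t) + v(t) + \nabla\psi(x(t))=0\\ x(0)=x_0,\ v(0)=v_0=\nabla\phi(x_0), \end{array}\right.\tag{15}\end{equation*}
where $x_0,v_0\in\mathbb{R}^n$ and $\lambda>0$.
Remark 11
We notice that we do not require second order assumptions for $\phi$. However, if $\phi$ is a twice continuously differentiable function, then the dynamical system (15) can be equivalently written as
\begin{equation*}\left\{ \begin{array}{ll} \lambda\dot x(t) + \nabla^2\phi(x(t))\big(\dot x(t)\big) + \nabla\phi(x(t)) + \nabla\psi(x(t))=0\\ x(0)=x_0, \end{array}\right.\tag{16}\end{equation*}
where $x_0\in\mathbb{R}^n$ and $\lambda>0$. This is a differential equation with a Hessian-driven damping term. We refer the reader to [3] and [9] for more insights into dynamical systems with Hessian-driven damping terms and for motivations for considering them. Moreover, as in [9], the driving forces have been split as $\nabla\phi+\nabla\psi$, where $\nabla\psi$ stands for classical smooth driving forces and $\nabla\phi$ incorporates the contact forces.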
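To get a feeling for the Hessian-driven form of the dynamics, one can integrate it numerically in one dimension. The following is a minimal sketch, not the authors' method: it assumes the system reads $(\lambda+\phi''(x))\dot x=-(\phi'(x)+\psi'(x))$, uses the hypothetical choices $\phi(x)=x^2/2$ and $\psi(x)=\cos(2x)$ (convex and smooth, respectively nonconvex with Lipschitz gradient), and discretizes with explicit Euler; the function name `simulate` and all parameter values are illustrative.

```python
import numpy as np

def simulate(x0, lam=1.0, h=0.01, steps=2000):
    """Explicit Euler for (lam + phi''(x)) x' = -(phi'(x) + psi'(x)) in 1D."""
    grad_phi = lambda x: x                       # phi(x) = x**2 / 2 (convex)
    hess_phi = lambda x: 1.0                     # phi''(x)
    grad_psi = lambda x: -2.0 * np.sin(2.0 * x)  # psi(x) = cos(2x) (nonconvex)
    energy = lambda x: 0.5 * x ** 2 + np.cos(2.0 * x)

    x = float(x0)
    energies = [energy(x)]
    for _ in range(steps):
        # Newton-like velocity: gradient rescaled by (lam + Hessian of phi).
        x_dot = -(grad_phi(x) + grad_psi(x)) / (lam + hess_phi(x))
        x += h * x_dot
        energies.append(energy(x))
    residual = grad_phi(x) + grad_psi(x)         # gradient of phi + psi at the end
    return x, residual, np.array(energies)

x_end, residual, energies = simulate(x0=2.0)
print(x_end, residual)
```

In this toy run the objective value decreases along the discrete trajectory and the final gradient residual is close to zero, which is the qualitative behavior the convergence results describe; the step size `h` is chosen small enough for the explicit scheme to remain a descent method.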
In this context, an improved version of Lemma 5(i) can be stated.
Lemma 12
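Let $x_0,v_0\in\mathbb{R}^n$ and $\lambda>0$ be such that $v_0=\nabla\phi(x_0)$. Let $(x,v):[0,+\infty)\to\mathbb{R}^n\times\mathbb{R}^n$ be the unique strong global solution of the dynamical system (15). Then, with $L_\phi>0$ denoting the Lipschitz constant of $\nabla\phi$:
\begin{equation*}\langle\dot x(t),\dot v(t)\rangle\ge\frac{1}{L_\phi}\|\dot v(t)\|^2\quad\text{for almost every}\ t\in[0,+\infty).\tag{17}\end{equation*}
**Proof.**
Take an arbitrary $\delta>0$. For $t\ge 0$ we have
\begin{equation*}\langle v(t+\delta)-v(t),x(t+\delta)-x(t)\rangle=\langle\nabla\phi(x(t+\delta))-\nabla\phi(x(t)),x(t+\delta)-x(t)\rangle\ge\frac{1}{L_\phi}\|v(t+\delta)-v(t)\|^2,\tag{18}\end{equation*}
where the inequality follows from the Baillon-Haddad Theorem [11, Corollary 18.16]. The conclusion follows by dividing (18) by $\delta^2$ and by taking the limit as $\delta$ converges to zero from above.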
We are now in the position to prove the convergence of the trajectories generated by (15).
Theorem 13
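Let $x_0,v_0\in\mathbb{R}^n$ and $\lambda>0$ be such that $v_0=\nabla\phi(x_0)$. Let $(x,v):[0,+\infty)\to\mathbb{R}^n\times\mathbb{R}^n$ be the unique strong global solution of the dynamical system (15). Suppose that $\phi+\psi$ is a KL function which is bounded from below and $x$ is bounded. Then the following statements are true: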
- (i)
*, and *
\lim_{t\rightarrow+\infty}\dot{x}(t)=\lim_{t\rightarrow+\infty}\dot{v}(t)=\lim_{t\rightarrow+\infty}\big{(}\nabla\phi(x(t))+\nabla\psi(x(t))\big{)}=0;
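- (ii) there exists $\bar x\in\operatorname{crit}(\phi+\psi)$ (that is $\nabla(\phi+\psi)(\bar x)=0$) such that $\lim_{t\to+\infty}x(t)=\bar x$.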
**Proof. **
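According to Lemma 8, we can choose an element $\bar x\in\operatorname{crit}(\phi+\psi)$ (that is $\nabla(\phi+\psi)(\bar x)=0$) such that $\bar x\in\omega(x)$. According to Lemma 6(iii), the proof of Lemma 7 and the proof of Lemma 8(iv), we have
\begin{equation*}\lim_{t\to+\infty}(\phi+\psi)(x(t))=(\phi+\psi)(\bar x).\end{equation*}
We consider the following two cases.
I. There exists $\bar t\ge 0$ such that
\begin{equation*}(\phi+\psi)(x(\bar t))=(\phi+\psi)(\bar x).\end{equation*}
From (7) we obtain for every $t\ge\bar t$ that
\begin{equation*}(\phi+\psi)(x(t))\le(\phi+\psi)(x(\bar t))=(\phi+\psi)(\bar x).\end{equation*}
Thus $(\phi+\psi)(x(t))=(\phi+\psi)(\bar x)$ for every $t\ge\bar t$. According to Lemma 6(i) and (17), it follows that $\dot x(t)=\dot v(t)=0$ for almost every $t\in[\bar t,+\infty)$, hence $x$ and $v$ are constant on $[\bar t,+\infty)$ and the conclusion follows.
II. For every $t\ge 0$ it holds $(\phi+\psi)(x(t))>(\phi+\psi)(\bar x)$. Take $\Omega:=\omega(x)$.
By using Lemma 8(ii), (iv) and the fact that $\phi+\psi$ is a KL function, by Lemma 10, there exist positive numbers $\epsilon$ and $\eta$ and a concave function $\varphi\in\Theta_\eta$ such that for all $u$ belonging to the intersection
\begin{equation*}\{u\in\mathbb{R}^n:\ \operatorname{dist}(u,\Omega)<\epsilon\}\cap\{u\in\mathbb{R}^n:\ (\phi+\psi)(\bar x)<(\phi+\psi)(u)<(\phi+\psi)(\bar x)+\eta\}\tag{19}\end{equation*}
one has
\begin{equation*}\varphi'\big((\phi+\psi)(u)-(\phi+\psi)(\bar x)\big)\cdot\|\nabla\phi(u)+\nabla\psi(u)\|\ge 1.\tag{20}\end{equation*}
Let $t_1\ge 0$ be such that $(\phi+\psi)(x(t))<(\phi+\psi)(\bar x)+\eta$ for all $t\ge t_1$. Since $\lim_{t\to+\infty}\operatorname{dist}\big(x(t),\Omega\big)=0$ (see Lemma 8(iii)), there exists $t_2\ge 0$ such that for all $t\ge t_2$ the inequality $\operatorname{dist}\big(x(t),\Omega\big)<\epsilon$ holds. Hence for all $t\ge T:=\max\{t_1,t_2\}$, $x(t)$ belongs to the intersection in (19). Thus, according to (20), for every $t\ge T$ we have
\begin{equation*}\varphi'\big((\phi+\psi)(x(t))-(\phi+\psi)(\bar x)\big)\cdot\|\nabla\phi(x(t))+\nabla\psi(x(t))\|\ge 1.\end{equation*}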
From the second equation in (15) we obtain for almost every
[TABLE]
By using Lemma 6(i), that and
[TABLE]
we further deduce that for almost every it holds
[TABLE]
We now invoke inequality (17) of Lemma 12 and obtain
[TABLE]
Let (not depending on ) be such that
[TABLE]
One can for instance choose such that . From (24) we derive the inequality
[TABLE]
which holds for almost every . Since is bounded from below, by integration it follows . From here we obtain that exists and the conclusion follows from the results obtained in the previous section.
Remark 14
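Taking a closer look at the above proof, one can notice that the inequality (23) can be obtained also when $\phi$ is a (possibly nonsmooth) proper, convex and lower semicontinuous function. However, in order to conclude that $\dot x\in L^1([0,+\infty);\mathbb{R}^n)$, the inequality obtained in Lemma 5(i) is not enough. The improved version stated in Lemma 12 is crucial in the convergence analysis.
If one attempts to obtain in the nonsmooth setting the inequality stated in Lemma 12, from the proof of Lemma 12 it becomes clear that one would need the inequality
\begin{equation*}\langle x^*-y^*,x-y\rangle\ge\frac{1}{L_\phi}\|x^*-y^*\|^2\end{equation*}
for all $x,y\in\mathbb{R}^n$ and all $x^*,y^*\in\mathbb{R}^n$ such that $x^*\in\partial\phi(x)$ and $y^*\in\partial\phi(y)$, where $L_\phi>0$ is a constant. This is nothing else than (see for example [11])
\begin{equation*}\langle x^*-y^*,x-y\rangle\ge\frac{1}{L_\phi}\|x^*-y^*\|^2\end{equation*}
for all $x^*,y^*\in\mathbb{R}^n$ and all $x,y\in\mathbb{R}^n$ such that $x\in\partial\phi^*(x^*)$ and $y\in\partial\phi^*(y^*)$. Here $\phi^*$ denotes the Fenchel conjugate of $\phi$, defined for all $z\in\mathbb{R}^n$ by $\phi^*(z)=\sup_{x\in\mathbb{R}^n}\{\langle z,x\rangle-\phi(x)\}$. The latter inequality is equivalent to $\partial\phi^*$ being $\frac{1}{L_\phi}$-strongly monotone, which is further equivalent (see [35, Theorem 3.5.10] or [11]) to $\phi^*$ being strongly convex. This is the same as asking that $\phi$ is differentiable on the whole $\mathbb{R}^n$ with Lipschitz-continuous gradient (see [11, Theorem 18.15]). In conclusion, the smooth setting provides the necessary prerequisites for obtaining the result in Lemma 12 and, finally, Theorem 13.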
4 Convergence rates
In this section we investigate the convergence rates of the trajectories generated by the dynamical system (15) as $t\to+\infty$. When solving optimization problems involving KL functions, convergence rates have been proved to depend on the so-called Łojasiewicz exponent (see [30, 12, 5, 24]). The main result of this section refers to the KL functions which satisfy Definition 3 for $\varphi(s)=Cs^{1-\theta}$, where $C>0$ and $\theta\in(0,1)$. We recall the following definition considered in [5].
Definition 4
Let $f:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function. The function $f$ is said to have the Łojasiewicz property, if for every $\bar x\in\operatorname{crit}f$ there exist $C,\varepsilon>0$ and $\theta\in(0,1)$ such that
\begin{equation*}|f(x)-f(\bar x)|^{\theta}\le C\|x^*\|\quad\text{for every}\ x\ \text{fulfilling}\ \|x-\bar x\|<\varepsilon\ \text{and every}\ x^*\in\partial_L f(x).\end{equation*}
According to [6, Lemma 2.1 and Remark 3.2(b)], the KL property is automatically satisfied at any noncritical point, a fact which motivates the restriction to critical points in the above definition. The real number $\theta$ in the above definition is called the Łojasiewicz exponent of the function $f$ at the critical point $\bar x$.
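The link between the KL inequality with a power-type desingularizing function and a Łojasiewicz-type inequality can be made explicit with a short computation. The sketch below assumes the notation $\varphi(s)=Cs^{1-\theta}$ with $C>0$ and $\theta\in(0,1)$ for the desingularizing function; these symbol names are an assumption where the source text does not display them.

```latex
% Sketch: KL inequality with \varphi(s) = C s^{1-\theta}, C > 0, \theta \in (0,1).
\varphi'(s) = C(1-\theta)\,s^{-\theta}, \qquad s>0 .
% For x with f(\bar x) < f(x) < f(\bar x) + \eta the KL inequality
\varphi'\big(f(x)-f(\bar x)\big)\,\operatorname{dist}\big(0,\partial_L f(x)\big) \ge 1
% becomes
C(1-\theta)\,\big(f(x)-f(\bar x)\big)^{-\theta}\,
  \operatorname{dist}\big(0,\partial_L f(x)\big) \ge 1 ,
% that is,
\big(f(x)-f(\bar x)\big)^{\theta} \le C(1-\theta)\,\|x^*\|
  \qquad \forall\, x^*\in\partial_L f(x),
% a Lojasiewicz-type inequality with constant C(1-\theta) and exponent \theta.
```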
The convergence rates obtained in the following theorem are in the spirit of [12] and [5].
Theorem 15
Let and be such that . Let be the unique strong global solution of the dynamical system (15). Suppose that is bounded and is a function which is bounded from below and satisfies Definition 3 for , where and . Then there exists (that is ) such that and . Let be the Łojasiewicz exponent of at , according to the Definition 4. Then there exist and such that for every the following statements are true:
- (i)
if , then and converge in finite time;
- (ii)
if , then ;
- (iii)
if , then .
**Proof. **
According to the proof of Theorem 13, and there exists , in other words , such that and . Let be the Łojasiewicz exponent of at , according to the Definition 4.
We define by (see also [12])
[TABLE]
It is immediate that
[TABLE]
Indeed, this follows by noticing that for
[TABLE]
and by letting afterwards .
Similarly, we have
[TABLE]
[TABLE]
We assume that for every we have As seen in the proof of Theorem 13 otherwise the conclusion follows automatically. Furthermore, by invoking again the proof of Theorem 13 , there exist , and such that for almost every (see (26))
[TABLE]
and
[TABLE]
We derive by integration for
[TABLE]
[TABLE]
hence
[TABLE]
Since is the Łojasiewicz exponent of at , we have
[TABLE]
for every . From the second relation in (15) we derive for almost every
[TABLE]
which combined with (32) yields
[TABLE]
Since
[TABLE]
we conclude that there exists such that for almost every
[TABLE]
If , then
[TABLE]
for almost every . By multiplying with and integrating afterwards from to , it follows that there exist such that
[TABLE]
and the conclusion of (ii) is immediate from (30).
Assume that . We obtain from (35)
[TABLE]
for almost every .
By integration we obtain
[TABLE]
where . Thus there exists such that
[TABLE]
which implies that and are constant on .
Finally, suppose that . We obtain from (35)
[TABLE]
for almost every . By integration we derive
[TABLE]
where . Statement (iii) follows from (30).
References
- [1] B. Abbas, An asymptotic viscosity selection result for the regularized Newton dynamic, arXiv:1504.07793v1, 2015
- [2] B. Abbas, H. Attouch, B.F. Svaiter, Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces, Journal of Optimization Theory and Applications 161(2), 331–360, 2014
- [3] F. Alvarez, H. Attouch, J. Bolte, P. Redont, A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics, Journal de Mathématiques Pures et Appliquées (9) 81(8), 747–779, 2002
- [4] H. Attouch, G. Buttazzo, G. Michaille, Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization, Second Edition, MOS-SIAM Series on Optimization, Philadelphia, 2014
- [5] H. Attouch, J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Mathematical Programming 116(1-2) Series B, 5–16, 2009
- [6] H. Attouch, J. Bolte, P. Redont, A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research 35(2), 438–457, 2010
- [7] H. Attouch, J. Bolte, B.F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Mathematical Programming 137(1-2) Series A, 91–129, 2013
- [8] H. Attouch, M.-O. Czarnecki, Asymptotic behavior of coupled dynamical systems with multiscale aspects, Journal of Differential Equations 248(6), 1315–1344, 2010
