The Kurdyka-Łojasiewicz inequality as regularity condition
Daniel Gerth, Stefan Kindermann

TL;DR
This paper demonstrates that the Kurdyka-Łojasiewicz inequality can serve as a regularity condition in Tikhonov regularization, linking it to existing smoothness and rate conditions in Banach spaces.
Contribution
It establishes the equivalence between the KL inequality and various known regularity conditions, providing a unified framework for convergence analysis.
Findings
KL inequality is equivalent to known regularity conditions
Theoretical link between KL inequality and convergence rates
Illustrative examples with source conditions and stability estimates
Abstract
We show that a Kurdyka-Łojasiewicz (KL) inequality can be used as regularity condition for Tikhonov regularization with linear operators in Banach spaces. In fact, we prove the equivalence of a KL inequality and various known regularity conditions (variational inequality, rate conditions, and others) that are utilized for postulating smoothness conditions to obtain convergence rates. Case examples of rate estimates for Tikhonov regularization with source conditions or with conditional stability estimate illustrate the theoretical result.
Daniel Gerth∗, Stefan Kindermann†
∗Faculty of Mathematics, Chemnitz University of Technology, 09107 Chemnitz, Germany
†Industrial Mathematics Institute, Johannes Kepler University Linz, 4040 Linz, Austria
1 Introduction
In the theory of the regularization of ill-posed inverse problems, it is well-known that the behavior of regularization methods essentially depends on the interplay of the forward operator with the true solution. Over time, several conditions have been developed that, usually formulated as assumptions, allow for a more or less precise description of the regularization process. In this paper, we will connect the set of smoothness conditions discussed in the recent paper [23] to a Kurdyka-Łojasiewicz (KL) inequality. The KL inequality, which we introduce in detail in Section 3, has been utilized in various branches of mathematics since its discovery in the 1960s. Hence, it may open new perspectives on inverse problems.
Before going into detail, we introduce the setting of our paper. We consider operator equations
[TABLE]
where is a bounded linear operator mapping from an infinite-dimensional Banach space to an infinite-dimensional Hilbert space . We assume that the range of is not closed in , , such that is not continuously invertible and hence (1.1) is ill-posed. We assume that only noisy data is available with for . Due to the ill-posedness of (1.1) and the noisy data, we employ the Tikhonov-type regularization
[TABLE]
to determine a stable approximation to the true solution for which holds. In (1.2), is the regularization parameter and the penalty functional. The minimizer of (1.2) is the regularized solution, i.e.,
[TABLE]
By omitting the superscript , we denote noise-free data and variables, i.e.,
[TABLE]
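The setting above can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and not taken from the paper: the operator is a hypothetical diagonal matrix with decaying singular values, the penalty is the squared Euclidean norm, the functional is taken as ||Ax - y||^2 + alpha ||x||^2 (the paper's scaling may differ), and the choice alpha = delta is arbitrary.

```python
import math

# Hypothetical toy problem: A = diag(sigma_i) on R^n with rapidly decaying
# singular values mimics an ill-posed operator equation.
n = 50
sigma = [i ** -2.0 for i in range(1, n + 1)]
x_true = [i ** -2.0 for i in range(1, n + 1)]
y_exact = [s * x for s, x in zip(sigma, x_true)]

# Deterministic perturbation whose Euclidean norm equals delta.
delta = 1e-3
y_noisy = [y + delta * (-1) ** i / math.sqrt(n) for i, y in enumerate(y_exact)]


def tikhonov(y, alpha):
    """Minimizer of ||A x - y||^2 + alpha * ||x||^2 for the diagonal A above."""
    return [s * yi / (s * s + alpha) for s, yi in zip(sigma, y)]


def error(x):
    """Euclidean distance to the true solution."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_true)))


x_naive = [yi / s for s, yi in zip(sigma, y_noisy)]  # unregularized inversion
x_reg = tikhonov(y_noisy, alpha=delta)               # illustrative choice alpha = delta

print("naive error      :", error(x_naive))
print("regularized error:", error(x_reg))
```

In this toy example the unregularized inversion amplifies the data error through the small singular values, while the regularized solution stays close to the true solution.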
In order to guarantee existence and stability of the approximations and , respectively, we impose the following standard assumptions (see, e.g., [23, 31]) on the penalty functional throughout the paper:
Assumption 1.1.
The functional is a proper, convex functional defined on a Banach space , which is lower semicontinuous with respect to weak (or weak*) sequential convergence. Additionally, we assume that is a stabilizing (weakly coercive) functional, i.e., the sublevel sets of are, for all , weakly (or weakly*) sequentially compact. Moreover, we assume that at least one solution of (1.1) with finite penalty value exists and that the subgradient exists.
With the basic regularization properties covered as consequence of Assumption 1.1, we move directly to the discussion of convergence rates. In Banach space regularization, the Bregman distance
[TABLE]
where the subgradient is an element of the subdifferential of in the point , has become a popular choice to measure the speed of convergence of the approximate solution to the true solution . In this paper, we follow the approach of [23] and consider the Bregman distance
[TABLE]
with subgradient taken at the approximate solutions. Note that the Bregman distance is not symmetric in its arguments. Our task is to find an index function , i.e., a monotonically increasing function with that is continuous (possibly only in a neighborhood of 0), such that
[TABLE]
It is well-known that no uniform function exists for all , and that has to take into account the interplay between the operator , the solution , and the penalty functional , in combination with an appropriate choice of the regularization parameter in (1.2) and (1.4), respectively. Many conditions have been developed that control this interplay and yield convergence rates (1.5). It is the aim of this paper to show the equivalence of most of the known conditions and, more importantly, to add another equivalent condition in the form of the KL inequality.
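The asymmetry of the Bregman distance mentioned above can be checked with a short sketch. It uses the standard definition J(x) - J(z) - <xi, x - z> with xi a subgradient of J at z; the penalty J(x) = (1/4) sum x_i^4 and the helper names are hypothetical choices for illustration only.

```python
def bregman(J, dJ, z, x):
    """Bregman distance of x from z, with the (sub)gradient dJ taken at z."""
    inner = sum(g * (xi - zi) for g, xi, zi in zip(dJ(z), x, z))
    return J(x) - J(z) - inner


def J(x):
    # Hypothetical smooth convex penalty J(x) = (1/4) * sum x_i^4.
    return sum(xi ** 4 for xi in x) / 4.0


def dJ(x):
    # Gradient of J, the only element of the subdifferential here.
    return [xi ** 3 for xi in x]


a, b = [1.0], [2.0]
print(bregman(J, dJ, a, b))  # gradient taken at a
print(bregman(J, dJ, b, a))  # gradient taken at b
```

Swapping the arguments changes the value, so it matters at which point the subgradient is taken.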
2 Convergence rate theory for convex Tikhonov regularization
For the complete statement of our equivalence results, we also need Flemming’s distance function [14, 15]:
[TABLE]
Theorem 2.1.
The following statements are equivalent:
- (a)
(-rate) There is an index function such that
[TABLE]
- (b)
(-rate) There is an index function such that
[TABLE]
- (c)
(Variational inequality) There is an index function such that
[TABLE]
- (d)
(Distance function) There is an index function such that
[TABLE]
- (e)
(Dual -rate) There exists an index function such that for all a exist with and
[TABLE]
- (f)
(KL-inequality) There exists a concave index function such that is nonincreasing with , with
[TABLE]
Proof.
In the proof we provide the formulas for converting the various index functions. In [23, Prop. 2.4] the equivalence of (a) and (b) was shown:
[TABLE]
Also in [23, Prop 3.3] it was shown that
[TABLE]
It follows that is increasing and by continuity of , it can be shown that . We now show (b) ⇒ (c): From (2.2), it follows, for all and all ,
[TABLE]
Thus, from the optimality of , we find
[TABLE]
Taking the infimum over yields the variational inequality (2.3) with the function
[TABLE]
It follows easily that is an index function.
Moreover, (d) ⇔ (c) by results of Flemming [14, Lemma 3.4], [15, Thm. 12.32], with
[TABLE]
and
[TABLE]
Concerning (e), we remark that by duality we may rewrite the Tikhonov functional as
[TABLE]
Young’s inequality yields , and by setting it is clear that (e) is just a reformulation of (b) (note that the infimum over is attained):
[TABLE]
Similar formulas were actually already used by Flemming [15].
The equivalence of the KL inequality (f), which is one of the main results of this paper, will be shown in Section 4 in Theorem 4.1. ∎
Hence, any of the conditions in Theorem 2.1 implies the others. These conditions imply a certain decay rate for the approximation error in the Bregman distance, which in turn yields a convergence rate for the total error measured in the Bregman distance. Moreover, we immediately obtain error estimates in the strict metric and a Tikhonov rate (these results were obtained in, or follow easily from, [23, Thm. 2.8, Prop. 3.7]):
Theorem 2.2.
Let any of the equivalent assumptions in Theorem 2.1 hold. Then, for all ,
1. (Bregman rate) there is a constant such that
[TABLE]
2. (strict metric rate) there is a constant such that for all
[TABLE]
3. (Tikhonov rate) there is a constant such that
[TABLE]
Moreover, defining the companion as
[TABLE]
the a-priori choice
[TABLE]
obtained by equilibrating the error decomposition (2.7) yields the following convergence rate:
Corollary 2.1.
Let any of the equivalent assumptions in Theorem 2.1 hold. Then with the choice (2.11) we obtain the convergence rates
[TABLE]
Note that the same rates hold for the analogous error measures in (2.8) and (2.9).
3 The Łojasiewicz-inequality
In this section we give a brief overview of the Kurdyka-Łojasiewicz (KL) inequality and some of its implications. A main reason for our interest in this inequality is its broad spectrum of applications in several mathematical disciplines, which may open new interconnections for inverse problems. We start with a short and certainly incomplete account of its history.
Łojasiewicz showed that for any real analytic function there is such that
[TABLE]
remains bounded around any critical point , i.e., [28, 29]. Kurdyka [27] later generalized the result to functions whose graphs belong to an o-minimal structure. A further generalization to nonsmooth subanalytic functions was given in [6]. It can also be formulated in (general) Hilbert spaces, see, e.g., [11, 21], and has applications, for example, in PDE analysis (see, for example, [22, 24, 32]), neural networks [16] and complexity theory [30]. First approaches towards inverse problems were made in [18, 19]. In the optimization literature, the KL inequality has emerged as a powerful tool to characterize the convergence properties of iterative algorithms; see, e.g., [1, 2, 5, 6, 7, 17, 19].
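A minimal numerical sketch of the classical Łojasiewicz gradient inequality may help here. It assumes the common textbook form, in which the quotient |f(x) - f(x̄)|^θ / |f'(x)| stays bounded near a critical point x̄ for a suitable exponent θ in [1/2, 1); the test function f(x) = x^4 and the helper name are hypothetical choices. For this f the exponent θ = 3/4 works, while θ = 1/2 does not.

```python
def lojasiewicz_ratio(x, theta):
    """|f(x) - f(0)|^theta / |f'(x)| for f(x) = x^4 with critical point 0."""
    f = x ** 4
    df = 4.0 * x ** 3
    return abs(f) ** theta / abs(df)


# Points approaching the critical point 0.
points = [10.0 ** -k for k in range(1, 6)]

bounded = [lojasiewicz_ratio(x, 0.75) for x in points]   # constant 1/4
unbounded = [lojasiewicz_ratio(x, 0.5) for x in points]  # grows like 1/(4|x|)

print(bounded)
print(unbounded)
```

The flatter the function is at the critical point, the closer to 1 the exponent has to be; for f(x) = x^2 the exponent 1/2 would suffice.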
It is known that the KL inequality immediately yields a measure for the distance between the level-sets of a function, which, under some additional assumptions, directly yields convergence rates for the noise free Tikhonov functional (1.4). To show the generality of the KL inequality, we temporarily consider the problem
[TABLE]
where is a complete metric space with metric and is lower semicontinuous. To formulate the result in this abstract setting, we use the following notation.
Definition 3.1.
We denote by
[TABLE]
the level-set of for the levels . With slight abuse of notation we write, for fixed , . Furthermore, for any , the distance of to a set is denoted by
[TABLE]
With this we recall the Hausdorff distance between sets,
[TABLE]
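For finite point sets on the real line, the point-to-set distance and the Hausdorff distance can be computed directly. The sketch below uses the usual symmetric definition (the larger of the two one-sided distances), which is assumed to match the definition recalled above; the function names and the sample sets are hypothetical.

```python
def dist_to_set(x, S):
    """Distance of the point x to the finite set S of real numbers."""
    return min(abs(x - s) for s in S)


def hausdorff(A, B):
    """Hausdorff distance between two finite sets of real numbers."""
    a_to_b = max(dist_to_set(a, B) for a in A)
    b_to_a = max(dist_to_set(b, A) for b in B)
    return max(a_to_b, b_to_a)


A = [0.0, 1.0]
B = [0.0, 3.0]
print(hausdorff(A, B))
```

Here the point 3 in B is at distance 2 from A, while every point of A is within distance 1 of B, so the Hausdorff distance is 2.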
The KL inequality is directly linked to certain index functions, which we specify below.
Definition 3.2.
A concave function is called desingularization function or smooth index function if , , and for all . We denote the set of all such with .
Now we are ready to cite the main inspiration for our work. It is taken from [4]. In comparison to the original result we have omitted a third equivalence to the concept of metric regularity, see [20]. Note that we replaced with .
Proposition 3.1.
[4, Corollary 4] Let be a lower semicontinuous function defined on a complete metric space and . Assume that . Then the following assumptions are equivalent.
- (a)
For all
[TABLE]
- (b)
For all
[TABLE]
where is the strong slope.
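A one-dimensional sketch can make the link between level-set distances and the slope condition concrete. It assumes the form common in the KL literature, namely that the product of the derivative of the desingularization function at f(x) and the strong slope at x is at least 1, and that the Hausdorff distance between two level sets is bounded by the difference of the desingularization function at the two levels; constants and notation in the proposition may differ. The function f(x) = |x|^4, the function phi(s) = s^(1/4), and the grid are hypothetical choices.

```python
p = 4.0


def f(x):
    return abs(x) ** p


def strong_slope(x):
    # For a smooth function the strong slope is the absolute value of the derivative.
    return p * abs(x) ** (p - 1.0)


def phi(s):
    # Candidate desingularization function for f.
    return s ** (1.0 / p)


def dphi(s):
    return (1.0 / p) * s ** (1.0 / p - 1.0)


# Slope form: phi'(f(x)) * |grad f|(x) should be at least 1 (here it equals 1).
xs = [k / 10.0 for k in range(1, 11)]
kl_products = [dphi(f(x)) * strong_slope(x) for x in xs]

# Level-set form: discretize the level sets on a grid and compare distances.
grid = [k / 100.0 for k in range(-150, 151)]


def level_set(r):
    return [x for x in grid if f(x) <= r]


def hausdorff(A, B):
    a_to_b = max(min(abs(a - b) for b in B) for a in A)
    b_to_a = max(min(abs(a - b) for a in A) for b in B)
    return max(a_to_b, b_to_a)


r1, r2 = 1.0, 0.0625
d = hausdorff(level_set(r1), level_set(r2))
print(kl_products)
print(d, abs(phi(r1) - phi(r2)))
```

In this example both conditions hold with equality up to the grid resolution: the level sets are the intervals [-r^(1/4), r^(1/4)], so their Hausdorff distance is the difference of the radii.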
Now we return to being a Banach space and consider the Tikhonov functional . Due to the convexity of the penalty , we can write Proposition 3.1 in the following way, where
[TABLE]
is the remoteness of the subdifferential of in ; see also [3].
Corollary 3.1.
Let either be injective or be strictly convex. Then, for the Tikhonov functional from (1.4), the following are equivalent for a smooth index function , , and .
- (a)
[TABLE]
- (b)
[TABLE]
Proof.
Due to Assumption 1.1 minimizers of exist, and due to the injectivity of or strict convexity of the minimizers are unique. Hence it is plain to see from the definition of the Hausdorff-metric (3.3) that
[TABLE]
and we obtain (a). For (semi)-convex functions, the strong slope coincides with ([4, Remark 12]), from which the remainder follows. ∎
We close this section by mentioning two obstacles in the application of Corollary 3.1. Firstly, the sum of two functionals does not necessarily fulfill a KL inequality even if both summands do. It is therefore not clear how to properly treat such a sum functional. While a partial answer is given in [19, Theorem 3.11], we cannot apply those results since they require an invertible operator . We will sketch in Section 6 that the Tikhonov functional (1.4) behaves differently than would be expected from the sum of its parts. The second obstacle is that Corollary 3.1 only holds in the noise-free case. To the best of the authors' knowledge, there are no results on how the KL inequality behaves under noisy data. Closing this gap is, however, beyond the scope of this paper.
4 The KL-regularity condition
Due to the equivalences of Theorem 2.1, it is sufficient to connect one of the conditions (a)-(e) with the KL inequality, and (b) appears to be the simplest choice.
Theorem 4.1.
The following are equivalent:
- (a)
There is a such that is nonincreasing with and a constant such that
[TABLE]
- (b)
There is an index function such that
[TABLE]
The functions and are connected via .
Proof.
First, we observe that in our context, where is the minimizer of the Tikhonov functional and is the point of interest, the KL inequality (4.1) can be written as
[TABLE]
where
[TABLE]
By concavity, is monotonically decreasing and thus (4.3) leads to
[TABLE]
Dividing both sides by yields (b) with
[TABLE]
This function is an index function by assumptions.
On the other hand, we write (b) as
[TABLE]
and by defining
[TABLE]
we have
[TABLE]
As is nonincreasing so is , hence
[TABLE]
Finally, identifying and noting that , we get the KL inequality (4.3) up to constants. As is nonincreasing, is concave. Note that such that the stated condition on follows as is an index function. ∎
It is interesting that in the proof we stumbled upon the companion function from (2.10). Namely, we have . The proof also reveals the identification
[TABLE]
Equation (2.11) for the a priori choice () of the regularization parameter then reads
[TABLE]
and we obtain the formal convergence rate
[TABLE]
Since is by definition concave, it holds that
[TABLE]
which follows from the property of the “subgradient” of concave functions, where the inequality is reversed compared to convex ones:
[TABLE]
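As a worked illustration of how the reversed subgradient inequality for concave functions is typically used, the following sketch derives one standard consequence. The notation is an assumption: \varphi denotes a concave function that is continuous on [0, r), differentiable on (0, r), and satisfies \varphi(0) = 0. The displayed inequalities above may be stated in a different form.

```latex
% Tangent lines of a concave function lie above its graph:
\varphi(t) \le \varphi(s) + \varphi'(s)\,(t - s)
  \qquad \text{for all } s \in (0, r),\ t \in [0, r).
% Choosing t = 0 and using \varphi(0) = 0 gives
0 \le \varphi(s) - s\,\varphi'(s)
  \quad\Longleftrightarrow\quad
  s\,\varphi'(s) \le \varphi(s).
% Example: \varphi(s) = s^{\kappa} with 0 < \kappa \le 1 yields
% s\,\varphi'(s) = \kappa\, s^{\kappa} \le s^{\kappa} = \varphi(s).
```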
5 Relation to conditional stability estimates
We illustrate how the KL theory quite directly yields convergence rates in the case that a conditional stability estimate holds. Note that such estimates are a very useful tool in, e.g., parameter identification problems for partial differential equations; see, e.g., [8, 10, 25, 33]. The use of conditional stability estimates (5.2) for rate estimates was investigated in particular by Cheng and Yamamoto in the seminal article [9].
Consider the Tikhonov functionals
[TABLE]
where and . We furthermore assume that the Hilbert space is continuously embedded into a Banach space , and there we assume a conditional stability estimate to hold (which, for simplicity, we take as a Hölder function): for some we assume that
[TABLE]
Cheng and Yamamoto have considered precisely this setup and verified convergence rates.
Here we illustrate the approach via the KL-inequality. To this end, we extend the Tikhonov functionals as follows to :
[TABLE]
At first we verify the KL-inequality (3.7) for on . Note that it is enough to consider the inequality for , thus for . In this case it reads
[TABLE]
In the following we write for the adjoint of in the space .
By [3, Prop 3.1] the strong slope or the remoteness can be characterized by the directional derivative ,
[TABLE]
where is the usual gradient in the space :
[TABLE]
The optimality condition for reads
[TABLE]
After some algebraic manipulation exploiting this identity, we obtain
[TABLE]
Using the optimality condition, the conditional stability estimate (5.2), and (5.3), we have
[TABLE]
Thus,
[TABLE]
and consequently
[TABLE]
We have thus found a KL-inequality (3.7) with
[TABLE]
We now apply Proposition 3.1 to (which agrees with for the relevant arguments) and obtain
[TABLE]
noting that is Hölder continuous. We have
[TABLE]
Since
[TABLE]
and
[TABLE]
we obtain that
[TABLE]
Thus, choosing yields
[TABLE]
and hence the convergence rate
[TABLE]
This is the same parameter choice and the same rate as obtained by Cheng and Yamamoto.
6 Example: Tikhonov regularization
Due to the (partial) equivalence of the KL-inequality with the conditions of [23], their examples apply in our case as long as is a power function. Therefore, we will not go through all of those examples again, but focus on the most prominent one, which is classical Tikhonov regularization
[TABLE]
where is a linear operator between Hilbert spaces and and denotes the norm in the respective spaces.
As is well known, the convergence behavior of Tikhonov-regularization (6.1) depends on the specific solution , and we employ here source conditions of the type
[TABLE]
While the treatment of more general source conditions is possible within our framework (see [23]), it shall be sufficient here to treat only the classical setting (6.2).
We recall from [18] that the residual fulfills a KL inequality with
[TABLE]
if
[TABLE]
i.e., both and lie in the source set (6.2). This will become important again later. For now we simply apply the theory from [23] in the case and demonstrate that the KL inequality and Corollary 2.1 yield convergence in the Bregman distance. Before starting, we summarize some results from [23, Section 4.1]. Namely, we have for (6.1) and under (6.2) that
[TABLE]
and
[TABLE]
Then we have from (6.5) and (6.6) that
[TABLE]
Because , the KL inequality requires
[TABLE]
and it is easy to see that we even have equality for
[TABLE]
with derivative
[TABLE]
This function satisfies the condition in Theorem 4.1. From this, we obtain
[TABLE]
This yields, according to (4.4)
[TABLE]
and the convergence rate is given by
[TABLE]
Identifying , we obtain the well-known rate
[TABLE]
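The classical rate under a source condition can be checked numerically in a toy setting. The sketch below assumes a hypothetical diagonal operator, exact (noise-free) data, the functional ||Ax - y||^2 + alpha ||x||^2, and a source condition of the form x = (A*A)^mu w with 0 < mu <= 1; the scaling used in the paper may differ. It verifies the well-known bound ||x_alpha - x|| <= alpha^mu ||w|| for the approximation error in the norm.

```python
import math

# Hypothetical diagonal operator with singular values sigma_i = 1/i.
n = 2000
sigma = [1.0 / i for i in range(1, n + 1)]

# Source element w and exponent mu; the true solution is x = (A*A)^mu w.
mu = 0.5
w = [1.0 / i for i in range(1, n + 1)]
x_true = [s ** (2.0 * mu) * wi for s, wi in zip(sigma, w)]
norm_w = math.sqrt(sum(wi * wi for wi in w))


def approximation_error(alpha):
    """||x_alpha - x_true|| for exact data; componentwise the error is
    alpha / (sigma_i^2 + alpha) * x_true_i."""
    return math.sqrt(sum((alpha / (s * s + alpha) * x) ** 2
                         for s, x in zip(sigma, x_true)))


alphas = [10.0 ** -k for k in range(1, 7)]
ratios = [approximation_error(a) / a ** mu for a in alphas]
print(ratios)
print("bound ||w|| =", norm_w)
```

The ratios stay below ||w|| for every alpha, which is the noise-free counterpart of the rate discussed above; the observed decay can be faster than alpha^mu when w itself has additional smoothness.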
Note that Corollary 3.1 does not apply directly since it would yield a convergence rate , which is clearly off the correct rate by a square in the exponent. We will now sketch a likely explanation for this.
Comparing the functionals (6.1) and (5.1), it appears that similar techniques should lead to a KL inequality. This is indeed the case, and we obtain for the classical Tikhonov functional (6.1)
[TABLE]
We follow the next steps to arrive at the equivalent of (5), which reads
[TABLE]
The conditional stability estimate (5.2) no longer holds, but the source condition (6.4) yields an alternative. Namely, using the interpolation inequality
[TABLE]
for all , we see that
[TABLE]
Inserting this into (6.8), and following the argument after (5), we obtain
[TABLE]
which yields a KL inequality with or
[TABLE]
Comparing this with the previous results, we see that we have the same function as for the residual functional (6.3), but this is only the square root of the function from (6.7) that we derived earlier in this section. Note that fulfill a KL inequality with . The discrepancy is due to the local character of the KL inequality for ill-posed problems. From the optimality condition of the classical Tikhonov functional (6.1) it follows that (and , respectively) are always in the range of . Therefore, while may fulfill the source condition (6.2) for arbitrary , the source condition (6.4) with only holds for , and we can only apply Corollary 3.1 in this case. Indeed, using the well-known a priori choice , we have , which yields, via Corollary 3.1 with from (6.3), the convergence rate . Therefore, the different index functions (6.7) and (6.3) do not contradict each other.
Acknowledgement
Part of this research was started during a visit of the second author to the Chemnitz University of Technology. S.K. would like to thank the Faculty of Mathematics in Chemnitz and especially Bernd Hofmann for their great hospitality. D.G. would like to thank Prof. Masahiro Yamamoto for his hospitality during his stay in Tokyo, where the author first learned of the KL inequality.
- [1] P.-A. Absil, R. Mahony and B. Andrews, Convergence of the iterates of descent methods for analytic cost functions, SIAM J. Optim., 16 (2005), pp. 531–547.
- [2] H. Attouch and J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Programming, 116 (2009), pp. 5–16.
- [3] D. Azé and J. N. Corvellec, Characterizations of error bounds for lower semicontinuous functions on metric spaces, ESAIM Control Optim. Calc. Var., 10 (2004), pp. 409–425.
- [4] J. Bolte, A. Daniilidis, O. Ley and L. Mazet, Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity, T. Am. Math. Soc., 382 (2010), pp. 3319–3363.
- [5] J. Bolte, S. Sabach and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Prog., 146 (2014), pp. 459–494.
- [6] J. Bolte, A. Daniilidis and A. Lewis, The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM J. Opt., 17 (2007), pp. 1205–1223.
- [7] R. I. Boţ and E. R. Csetnek, Proximal-gradient algorithms for fractional programming, Optimization, 66 (2017), pp. 1383–1396.
- [8] A. L. Bukhgeim, J. Cheng and M. Yamamoto, Stability for an inverse boundary problem of determining a part of a boundary, Inverse Problems, 15 (1999), pp. 1021–1032.
