The Kurdyka-Łojasiewicz inequality as regularity condition

Daniel Gerth; Stefan Kindermann

arXiv:1905.10177·math.FA·May 27, 2019

TL;DR

This paper demonstrates that the Kurdyka-Łojasiewicz inequality can serve as a regularity condition in Tikhonov regularization, linking it to existing smoothness and rate conditions in Banach spaces.

Contribution

It establishes the equivalence between the KL inequality and various known regularity conditions, providing a unified framework for convergence analysis.

Findings

01 KL inequality is equivalent to known regularity conditions

02 Theoretical link between KL inequality and convergence rates

03 Illustrative examples with source conditions and stability estimates

Abstract

We show that a Kurdyka-Łojasiewicz (KL) inequality can be used as regularity condition for Tikhonov regularization with linear operators in Banach spaces. In fact, we prove the equivalence of a KL inequality and various known regularity conditions (variational inequality, rate conditions, and others) that are utilized for postulating smoothness conditions to obtain convergence rates. Case examples of rate estimates for Tikhonov regularization with source conditions or with conditional stability estimate illustrate the theoretical result.

Equations

$$Ax=y$$

$$T_{\alpha}^{\delta}(x)=\frac{1}{2}\|Ax-y^{\delta}\|^{2}+\alpha J(x)$$

$$x_{\alpha}^{\delta}=\operatorname*{arg\,min}_{x\in\mathcal{D}(A)}T_{\alpha}^{\delta}(x).$$

$$x_{\alpha}=\operatorname*{arg\,min}_{x\in\mathcal{D}(A)}T_{\alpha}(x)\quad\text{with}\quad T_{\alpha}(x)=\frac{1}{2}\|Ax-y\|^{2}+\alpha J(x).$$

$$B_{\xi}(z,x):=J(x)-J(z)-\langle\xi,x-z\rangle\geq 0,\qquad x\in X,\ \xi\in\partial J(z)\subset X^{*},$$

$$B_{\xi_{\alpha}^{\delta}}(x_{\alpha}^{\delta},x^{\dagger})$$

$$B_{\xi_{\alpha}^{\delta}}(x_{\alpha}^{\delta},x^{\dagger})\leq\varphi(\delta).$$

$$D(r):=\sup_{x\in X}\bigl(J(x^{\dagger})-J(x)-r\|Ax-Ax^{\dagger}\|\bigr).$$

$$J(x^{\dagger})-J(x_{\alpha})\leq\Psi_{1}(\alpha)\quad\text{for all }\alpha>0.$$

$$\frac{1}{\alpha}\bigl(T_{\alpha}(x^{\dagger})-T_{\alpha}(x_{\alpha})\bigr)\leq\Psi_{2}(\alpha)\quad\text{for all }\alpha>0.$$

$$J(x^{\dagger})-J(x)\leq\Phi_{3}(\|Ax^{\dagger}-Ax\|)\quad\text{for all }x\in X.$$

$$D\Bigl(\frac{1}{r}\Bigr)\leq\Psi_{4}(r)\quad\forall r>0.$$

$$J^{*}(A^{*}z)-J^{*}(x^{*})-(x^{\dagger},A^{*}z-x^{*})_{X,X^{*}}+\frac{\alpha}{2}\|z\|^{2}\leq\Psi_{5}(\alpha),$$

$$\partial\bigl(\varphi\circ(T_{\alpha}(x^{\dagger})-T_{\alpha}(x_{\alpha}))\bigr)\geq\frac{1}{k}.$$

$$\text{(a)}\Rightarrow\text{(b)}:\ \Psi_{2}\leq\Psi_{1},\qquad\text{(b)}\Rightarrow\text{(a)}:\ \Psi_{1}\leq 2\Psi_{2}.$$

$$\text{(c)}\Rightarrow\text{(b)}:\quad\Psi_{2}(\alpha)\leq\sup_{t>0}\Bigl(\Phi_{3}(t)-\frac{t^{2}}{2\alpha}\Bigr).$$

$$J(x^{\dagger})-J(x_{\alpha})-\frac{1}{2\alpha}\|Ax_{\alpha}-Ax^{\dagger}\|^{2}\leq\frac{1}{\alpha}T_{\alpha}(x^{\dagger},Ax^{\dagger})-\frac{1}{\alpha}T_{\alpha}(x_{\alpha},Ax^{\dagger})\leq\Psi_{2}(\alpha).$$

$$J(x^{\dagger})-J(x)\leq\Psi_{2}(\alpha)+\frac{1}{2\alpha}\|Ax-Ax^{\dagger}\|^{2}.$$

$$\text{(b)}\Rightarrow\text{(c)}:\quad\Phi_{3}(\alpha)=\inf_{t>0}\Bigl(\Psi_{2}(t)+\frac{\alpha^{2}}{2t}\Bigr).$$

$$\text{(d)}\Rightarrow\text{(c)}:\quad\Phi_{3}(\alpha)=\inf_{r>0}\bigl(\Phi_{4}(r)+r\alpha\bigr)$$

$$\text{(c)}\Rightarrow\text{(d)}:\quad\Phi_{4}(\alpha)=\sup_{t>0}\bigl(\Phi_{3}(t)-\alpha t\bigr).$$

$$\begin{aligned}\frac{1}{\alpha}\bigl(T_{\alpha}(x^{\dagger})-T_{\alpha}(x_{\alpha})\bigr)&=J(x^{\dagger})-\frac{1}{\alpha}T_{\alpha}(x_{\alpha})\\&=J(x^{\dagger})-\frac{1}{\alpha}\sup_{p}\Bigl[-\frac{1}{2}\|p\|^{2}+(p,Ax^{\dagger})-\alpha J^{*}\Bigl(\frac{1}{\alpha}A^{*}p\Bigr)\Bigr]\\&=\inf_{p}\Bigl[J(x^{\dagger})+J^{*}\Bigl(\frac{1}{\alpha}A^{*}p\Bigr)-\frac{1}{\alpha}(p,Ax^{\dagger})+\frac{1}{2\alpha}\|p\|^{2}\Bigr].\end{aligned}$$

$$\text{(b)}\Leftrightarrow\text{(e)}:\quad\Psi_{5}=\Psi_{2}.$$

$$B_{\xi_{\alpha}^{\delta}}(x_{\alpha}^{\delta},x^{\dagger})\leq C\Bigl(\frac{\delta^{2}}{\alpha}+\Psi_{2}(\alpha)\Bigr);$$

$$\begin{aligned}J(x^{\dagger})-J(x_{\alpha}^{\delta})&\leq C\Bigl(\Psi_{2}(\alpha)+\frac{\delta^{2}}{\alpha}\Bigr),\\\|Ax_{\alpha}^{\delta}-Ax^{\dagger}\|^{2}&\leq C\bigl(\alpha\Psi_{2}(\alpha)+\delta^{2}\bigr);\end{aligned}$$

$$\Bigl|J(x^{\dagger})-\frac{1}{\alpha}T_{\alpha}^{\delta}(x_{\alpha}^{\delta})\Bigr|\leq C\Bigl(\Psi_{2}(\alpha)+\frac{\delta^{2}}{\alpha}\Bigr).$$

$$\Theta(\alpha):=\sqrt{\alpha\Psi_{2}(\alpha)},$$

$$\alpha_{*}=\alpha_{*}(\delta):=(\Theta^{2})^{-1}\Bigl(\frac{\delta^{2}}{2}\Bigr)=\Theta^{-1}\Bigl(\frac{\delta}{\sqrt{2}}\Bigr)$$

$$B_{\xi_{\alpha}^{\delta}}(x_{\alpha}^{\delta},x^{\dagger})\leq 2\Psi_{2}\Bigl(\Theta^{-1}\Bigl(\frac{\delta}{\sqrt{2}}\Bigr)\Bigr).$$

Taxonomy

Topics: Numerical methods in inverse problems · Stability and Controllability of Differential Equations · Topology Optimization in Engineering

Full text

The Kurdyka-Łojasiewicz inequality as regularity condition

Daniel Gerth∗, Stefan Kindermann†

∗Faculty of Mathematics, Chemnitz University of Technology,

09107 Chemnitz, Germany

[email protected]

†Industrial Mathematics Institute,

Johannes Kepler University Linz, 4040 Linz, Austria,

[email protected]

Abstract

We show that a Kurdyka-Łojasiewicz (KL) inequality can be used as regularity condition for Tikhonov regularization with linear operators in Banach spaces. In fact, we prove the equivalence of a KL inequality and various known regularity conditions (variational inequality, rate conditions, and others) that are utilized for postulating smoothness conditions to obtain convergence rates. Case examples of rate estimates for Tikhonov regularization with source conditions or with conditional stability estimate illustrate the theoretical result.

1 Introduction

In the theory of the regularization of ill-posed inverse problems, it is well known that the behavior of regularization methods essentially depends on the interplay of the forward operator with the true solution. Over time, several conditions have been developed that, usually formulated as assumptions, allow for a more or less precise description of the regularization process. In this paper, we will connect the set of smoothness conditions discussed in the recent paper [23] to a Kurdyka-Łojasiewicz (KL) inequality. The KL inequality, which we introduce in detail in Section 3, has been utilized in various branches of mathematics since its discovery in the 1960s. Hence, it may open new perspectives on inverse problems.

Before going into detail, we introduce the setting of our paper. We consider operator equations

$$Ax=y \tag{1.1}$$

where $A$ is a bounded linear operator mapping from an infinite-dimensional Banach space $X$ to an infinite-dimensional Hilbert space $H$. We assume that the range ${\mathcal{R}}(A)$ of $A$ is not closed in $H$, ${\mathcal{R}}(A)\neq\overline{{\mathcal{R}}(A)}$, such that $A$ is not continuously invertible and hence (1.1) is ill-posed. We assume that only noisy data $y^{\delta}$ is available with $\|y-y^{\delta}\|\leq\delta$ for $\delta>0$. Due to the ill-posedness of (1.1) and the noisy data, we employ the Tikhonov-type regularization

$$T_{\alpha}^{\delta}(x)=\frac{1}{2}\|Ax-y^{\delta}\|^{2}+\alpha J(x) \tag{1.2}$$

to determine a stable approximation to the true solution $x^{\dagger}$ for which $Ax^{\dagger}=y$ holds. In (1.2), $\alpha>0$ is the regularization parameter and $J:{\mathcal{D}}(J)\subset X\rightarrow\mathbb{R}$ the penalty functional. The minimizer of (1.2) is the regularized solution, i.e.,

$$x_{\alpha}^{\delta}=\operatorname*{arg\,min}_{x\in\mathcal{D}(A)}T_{\alpha}^{\delta}(x). \tag{1.3}$$

By omitting the superscript $\delta$, we denote noise-free data and variables, i.e.,

$$x_{\alpha}=\operatorname*{arg\,min}_{x\in\mathcal{D}(A)}T_{\alpha}(x)\quad\text{with}\quad T_{\alpha}(x)=\frac{1}{2}\|Ax-y\|^{2}+\alpha J(x). \tag{1.4}$$

In order to guarantee existence and stability of the approximations $x_{\alpha}^{\delta}$ and $x_{\alpha}$ , respectively, we impose the following standard assumptions (see, e.g., [23, 31]) on the penalty functional $J$ throughout the paper:

Assumption 1.1.

The functional $J:X\rightarrow[0,\infty]$ is a proper, convex functional defined on a Banach space $X$, which is lower semicontinuous with respect to weak (or weak$^{*}$) sequential convergence. Additionally, we assume that $J$ is a stabilizing (weakly coercive) functional, i.e., the sublevel sets $[J\leq c]:=\{x\in X:J(x)\leq c\}$ of $J$ are, for all $c\geq 0$, weakly (or weakly$^{*}$) sequentially compact. Moreover, we assume that at least one solution $x^{\dagger}$ of (1.1) with finite penalty value $J(x^{\dagger})<\infty$ exists and that the subdifferential $\partial J(x^{\dagger})$ is nonempty.

With the basic regularization properties covered as a consequence of Assumption 1.1, we move directly to the discussion of convergence rates. In Banach space regularization, the Bregman distance

$$B_{\xi}(z,x):=J(x)-J(z)-\langle\xi,x-z\rangle\geq 0,\qquad x\in X,\ \xi\in\partial J(z)\subset X^{*},$$

where the subgradient $\xi$ is an element of the subdifferential $\partial J(z)$ of $J$ at the point $z\in X$, has become a popular choice to measure the speed of convergence of the approximate solution to the true solution $x^{\dagger}$. In this paper, we follow the approach of [23] and consider the Bregman distance

$$B_{\xi_{\alpha}^{\delta}}(x_{\alpha}^{\delta},x^{\dagger})$$

with subgradient taken at the approximate solutions. Note that the Bregman distance is not symmetric in its arguments. Our task is to find an index function $\varphi$, i.e., a monotonically increasing function $\varphi:[0,\infty)\rightarrow[0,\infty)$ with $\varphi(0)=0$ that is continuous (possibly only in a neighborhood of $0$), such that

$$B_{\xi_{\alpha}^{\delta}}(x_{\alpha}^{\delta},x^{\dagger})\leq\varphi(\delta). \tag{1.5}$$

It is well known that no uniform function $\varphi$ exists for all $x^{\dagger}\in X$, and that $\varphi$ has to take into account the interplay between the operator $A$, the solution $x^{\dagger}$, and the penalty functional $J$, in combination with an appropriate choice of the regularization parameter $\alpha>0$ in (1.2) and (1.4), respectively. Many conditions have been developed that control this interplay and yield convergence rates (1.5). It is the aim of this paper to show the equivalence of most of the known conditions and, more importantly, to add another equivalent condition in the form of the KL inequality.
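The Bregman distance introduced above can be illustrated with a small numerical sketch. The penalty $J(x)=\frac{1}{p}\sum_i|x_i|^p$ on $\mathbb{R}^n$ and all function names below are assumptions chosen for illustration only; they are not taken from the paper. For $p=2$ the Bregman distance reduces to half the squared Euclidean distance, while for $p=4$ the lack of symmetry becomes visible.

```python
def power_penalty(x, p):
    """J(x) = (1/p) * sum_i |x_i|^p, a convex penalty for p >= 1 (illustrative choice)."""
    return sum(abs(xi) ** p for xi in x) / p


def power_penalty_gradient(z, p):
    """Gradient of J at z for p > 1; it is the only element of the subdifferential."""
    return [(1.0 if zi >= 0 else -1.0) * abs(zi) ** (p - 1) for zi in z]


def bregman_distance(z, x, p):
    """B_xi(z, x) = J(x) - J(z) - <xi, x - z>, with xi taken at the first argument z."""
    xi = power_penalty_gradient(z, p)
    pairing = sum(g * (xv - zv) for g, xv, zv in zip(xi, x, z))
    return power_penalty(x, p) - power_penalty(z, p) - pairing


z = [1.0, -2.0]
x = [0.5, 1.0]

# p = 2: the Bregman distance equals half the squared Euclidean distance.
half_sq_dist = 0.5 * sum((xv - zv) ** 2 for xv, zv in zip(x, z))
print(bregman_distance(z, x, 2), half_sq_dist)

# p = 4: swapping the two arguments changes the value (no symmetry).
print(bregman_distance(z, x, 4), bregman_distance(x, z, 4))
```

In the setting of the paper the first argument is the regularized solution and the second the true solution, so the subgradient is taken at the approximation.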

2 Convergence rate theory for convex Tikhonov regularization

For the complete statement of our equivalence results, we also need Flemming’s distance function [14, 15]:

$$D(r):=\sup_{x\in X}\bigl(J(x^{\dagger})-J(x)-r\|Ax-Ax^{\dagger}\|\bigr).$$

Theorem 2.1.

The following statements are equivalent:

(a)

( $J$ -rate) There is an index function $\Psi_{1}$ such that

$$J(x^{\dagger})-J(x_{\alpha})\leq\Psi_{1}(\alpha)\quad\text{for all }\alpha>0. \tag{2.1}$$

(b)

( $T$ -rate) There is an index function $\Psi_{2}$ such that

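$$\frac{1}{\alpha}\bigl(T_{\alpha}(x^{\dagger})-T_{\alpha}(x_{\alpha})\bigr)\leq\Psi_{2}(\alpha)\quad\text{for all }\alpha>0. \tag{2.2}$$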

(c)

(Variational inequality) There is an index function $\Phi_{3}$ such that

$$J(x^{\dagger})-J(x)\leq\Phi_{3}(\|Ax^{\dagger}-Ax\|)\quad\text{for all }x\in X. \tag{2.3}$$

(d)

(Distance function) There is an index function $\Psi_{4}$ such that

$$D\Bigl(\frac{1}{r}\Bigr)\leq\Psi_{4}(r)\quad\forall r>0. \tag{2.4}$$

(e)

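(Dual $T$-rate) There exists an index function $\Psi_{5}$ such that for all $\alpha>0$ there exists a $z\in Y$ with $x^{*}\in\partial J(x^{\dagger})$ and

$$J^{*}(A^{*}z)-J^{*}(x^{*})-(x^{\dagger},A^{*}z-x^{*})_{X,X^{*}}+\frac{\alpha}{2}\|z\|^{2}\leq\Psi_{5}(\alpha). \tag{2.5}$$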

(f)

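(KL-inequality) There exists a concave index function $\varphi$ such that $(\partial\varphi)^{-1}(z)z$ is nonincreasing with $\lim_{z\to\infty}(\partial\varphi)^{-1}(z)z=0$, and a constant $k$ such that

$$\partial\bigl(\varphi\circ(T_{\alpha}(x^{\dagger})-T_{\alpha}(x_{\alpha}))\bigr)\geq\frac{1}{k}. \tag{2.6}$$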

Proof.

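In the proof we provide the formulas for converting the various index functions. In [23, Prop. 2.4] the equivalence of (a) and (b) was shown:

$$\text{(a)}\Rightarrow\text{(b)}:\ \Psi_{2}\leq\Psi_{1},\qquad\text{(b)}\Rightarrow\text{(a)}:\ \Psi_{1}\leq 2\Psi_{2}.$$

Also in [23, Prop. 3.3] it was shown that

$$\text{(c)}\Rightarrow\text{(b)}:\quad\Psi_{2}(\alpha)\leq\sup_{t>0}\Bigl(\Phi_{3}(t)-\frac{t^{2}}{2\alpha}\Bigr).$$

It follows that $\Psi_{2}$ is increasing and, by continuity of $\Phi_{3}$, it can be shown that $\Psi_{2}(0)=0$. We now show (b) $\Rightarrow$ (c): From (2.2), it follows, for all $x$ and all $\alpha$,

$$J(x^{\dagger})-J(x_{\alpha})-\frac{1}{2\alpha}\|Ax_{\alpha}-Ax^{\dagger}\|^{2}\leq\frac{1}{\alpha}T_{\alpha}(x^{\dagger},Ax^{\dagger})-\frac{1}{\alpha}T_{\alpha}(x_{\alpha},Ax^{\dagger})\leq\Psi_{2}(\alpha).$$

Thus, from the optimality of $x_{\alpha}$, we find

$$J(x^{\dagger})-J(x)\leq\Psi_{2}(\alpha)+\frac{1}{2\alpha}\|Ax-Ax^{\dagger}\|^{2}.$$

Taking the infimum over $\alpha$ yields the variational inequality (2.3) with the function $\Phi_{3}$

$$\text{(b)}\Rightarrow\text{(c)}:\quad\Phi_{3}(\alpha)=\inf_{t>0}\Bigl(\Psi_{2}(t)+\frac{\alpha^{2}}{2t}\Bigr).$$

It follows easily that $\Phi_{3}$ is an index function.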

Moreover, (d) $\Leftrightarrow$ (c) by results of Flemming [14, Lemma 3.4] [15, Thm. 12.32], with

$$\text{(d)}\Rightarrow\text{(c)}:\quad\Phi_{3}(\alpha)=\inf_{r>0}\bigl(\Phi_{4}(r)+r\alpha\bigr)$$

and

$$\text{(c)}\Rightarrow\text{(d)}:\quad\Phi_{4}(\alpha)=\sup_{t>0}\bigl(\Phi_{3}(t)-\alpha t\bigr).$$

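Concerning (e), we remark that by duality we may rewrite the Tikhonov functional as

$$\begin{aligned}\frac{1}{\alpha}\bigl(T_{\alpha}(x^{\dagger})-T_{\alpha}(x_{\alpha})\bigr)&=J(x^{\dagger})-\frac{1}{\alpha}T_{\alpha}(x_{\alpha})\\&=J(x^{\dagger})-\frac{1}{\alpha}\sup_{p}\Bigl[-\frac{1}{2}\|p\|^{2}+(p,Ax^{\dagger})-\alpha J^{*}\Bigl(\frac{1}{\alpha}A^{*}p\Bigr)\Bigr]\\&=\inf_{p}\Bigl[J(x^{\dagger})+J^{*}\Bigl(\frac{1}{\alpha}A^{*}p\Bigr)-\frac{1}{\alpha}(p,Ax^{\dagger})+\frac{1}{2\alpha}\|p\|^{2}\Bigr].\end{aligned}$$

The Fenchel-Young equality for $x^{*}\in\partial J(x^{\dagger})$ yields $J(x^{\dagger})=(x^{\dagger},x^{*})-J^{*}(x^{*})$, and by setting $z=\frac{1}{\alpha}p$ it is clear that (e) is just a reformulation of (b) (note that the infimum over $p$ is attained):

$$\text{(b)}\Leftrightarrow\text{(e)}:\quad\Psi_{5}=\Psi_{2}.$$

Similar formulas were actually already used by Flemming [15].

The essential equivalence of the KL inequality (f) is one of the main results of this paper and is shown in Section 4, Theorem 4.1. ∎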
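As a worked illustration of the conversion formula for (b) $\Rightarrow$ (c) in the proof above, suppose that $\Psi_{2}$ is of power type, $\Psi_{2}(t)=c\,t^{\nu}$ with constants $c>0$ and $\nu>0$. This power form is an assumption made only for this example, and the argument of $\Phi_{3}$ is written as $s$ to avoid confusion with the regularization parameter. The infimum can then be evaluated in closed form:

```latex
\begin{align*}
g(t) &:= c\,t^{\nu} + \frac{s^{2}}{2t}, \qquad
g'(t_{*}) = 0 \iff t_{*}^{\nu+1} = \frac{s^{2}}{2c\nu}, \\
\Phi_{3}(s) &= \inf_{t>0} g(t) = g(t_{*})
  = c\,(1+\nu)\,t_{*}^{\nu}
  = c\,(1+\nu)\left(\frac{s^{2}}{2c\nu}\right)^{\frac{\nu}{\nu+1}}
  \sim s^{\frac{2\nu}{\nu+1}} .
\end{align*}
```

For $c=1$ and $\nu=1$ this gives $\Phi_{3}(s)=\sqrt{2}\,s$, and for $\nu=2\mu$ the exponent becomes $\frac{4\mu}{2\mu+1}$.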

Hence, any of the conditions in Theorem 2.1 implies the other ones. These conditions imply a certain decay rate for the approximation error in the Bregman distance. This subsequently yields a convergence rate for the total error measured in the Bregman distance. Moreover, we immediately obtain error bounds in the strict metric and a Tikhonov rate (these results were obtained in or follow easily from [23, Thm. 2.8, Prop. 3.7]):

Theorem 2.2.

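Let any of the equivalent assumptions in Theorem 2.1 hold. Then, for all $\alpha>0$,

1.

(Bregman rate) there is a constant $C$ such that

$$B_{\xi_{\alpha}^{\delta}}(x_{\alpha}^{\delta},x^{\dagger})\leq C\Bigl(\frac{\delta^{2}}{\alpha}+\Psi_{2}(\alpha)\Bigr); \tag{2.7}$$

2.

(strict metric rate) there is a constant $C$ such that

$$\begin{aligned}J(x^{\dagger})-J(x_{\alpha}^{\delta})&\leq C\Bigl(\Psi_{2}(\alpha)+\frac{\delta^{2}}{\alpha}\Bigr),\\\|Ax_{\alpha}^{\delta}-Ax^{\dagger}\|^{2}&\leq C\bigl(\alpha\Psi_{2}(\alpha)+\delta^{2}\bigr);\end{aligned} \tag{2.8}$$

3.

(Tikhonov rate) there is a constant $C$ such that

$$\Bigl|J(x^{\dagger})-\frac{1}{\alpha}T_{\alpha}^{\delta}(x_{\alpha}^{\delta})\Bigr|\leq C\Bigl(\Psi_{2}(\alpha)+\frac{\delta^{2}}{\alpha}\Bigr). \tag{2.9}$$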

Moreover, defining the companion $\Theta(\alpha)$ as

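$$\Theta(\alpha):=\sqrt{\alpha\Psi_{2}(\alpha)}, \tag{2.10}$$

the a-priori choice

$$\alpha_{*}=\alpha_{*}(\delta):=(\Theta^{2})^{-1}\Bigl(\frac{\delta^{2}}{2}\Bigr)=\Theta^{-1}\Bigl(\frac{\delta}{\sqrt{2}}\Bigr) \tag{2.11}$$

obtained by equilibrating the error decomposition (2.7) yields the following convergence rate:

Corollary 2.1.

Let any of the equivalent assumptions in Theorem 2.1 hold. Then with the choice (2.11) we obtain the convergence rate

$$B_{\xi_{\alpha}^{\delta}}(x_{\alpha}^{\delta},x^{\dagger})\leq 2\Psi_{2}\Bigl(\Theta^{-1}\Bigl(\frac{\delta}{\sqrt{2}}\Bigr)\Bigr).$$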

Note that the same rates hold for the analogous error measures in (2.8) and (2.9).
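The a-priori parameter choice can be made concrete with a short sketch. It assumes a power-type rate function $\Psi_{2}(\alpha)=c\,\alpha^{\nu}$ (the constants `c` and `nu` are hypothetical) and reads the choice (2.11) as the solution of $\alpha\Psi_{2}(\alpha)=\delta^{2}/2$. Under these assumptions $\alpha_{*}$ has a closed form and the resulting bound $2\Psi_{2}(\alpha_{*})$ decays like $\delta^{2\nu/(1+\nu)}$.

```python
import math


def psi2(alpha, c, nu):
    """Hypothetical power-type rate function Psi_2(alpha) = c * alpha**nu."""
    return c * alpha ** nu


def a_priori_alpha(delta, c, nu):
    """Solve alpha * Psi_2(alpha) = delta**2 / 2 for alpha (closed form for power type)."""
    return (delta ** 2 / (2.0 * c)) ** (1.0 / (1.0 + nu))


def bregman_rate_bound(delta, c, nu):
    """Value 2 * Psi_2(alpha_*) obtained with the balanced parameter choice."""
    return 2.0 * psi2(a_priori_alpha(delta, c, nu), c, nu)


c, nu = 1.0, 0.5
for delta in (1e-1, 1e-2, 1e-3):
    alpha_star = a_priori_alpha(delta, c, nu)
    # Columns: noise level, balanced parameter, resulting bound.
    print(delta, alpha_star, bregman_rate_bound(delta, c, nu))
```

With $\nu=2\mu$ the decay exponent is $\frac{4\mu}{2\mu+1}$, which is the exponent of the squared-norm rate of classical Tikhonov regularization under a source condition of order $\mu\leq\frac{1}{2}$.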

3 The Łojasiewicz-inequality

In this section we give a brief and certainly incomplete overview of the Kurdyka-Łojasiewicz (KL) inequality and some of its implications. A main reason for our interest in this inequality is its broad spectrum of applications in several mathematical disciplines, which may open new interconnections for inverse problems.

Łojasiewicz showed that for any real analytic function $f:D(f)\subset\mathbb{R}^{n}\rightarrow\mathbb{R}$ there is $\theta\in[0,1)$ such that

[TABLE]

remains bounded around any critical point $\bar{x}$ , i.e., $\nabla f(\bar{x})=0$ [28, 29]. Kurdyka [27] later generalized the result to $C^{1}$ functions whose graphs belong to an o-minimal structure. A further generalization to nonsmooth subanalytic functions was given in [6]. It can also be formulated in (general) Hilbert spaces, see, e.g., [11, 21], and has applications, for example, in PDE analysis (see, for example, [22, 24, 32]), neural networks [16] and complexity theory [30]. First approaches towards inverse problems were made in [18, 19]. In the optimization literature, the KL inequality has emerged as a powerful tool to characterize the convergence properties of iterative algorithms; see, e.g., [1, 2, 5, 6, 7, 17, 19].

It is known that the KL inequality immediately yields a measure for the distance between the level-sets of a function, which, under some additional assumptions, directly yields convergence rates for the noise free Tikhonov functional (1.4). To show the generality of the KL inequality, we temporarily consider the problem

[TABLE]

where $X$ is a complete metric space with metric $d(x,y)$ and $f:X\rightarrow\mathbb{R}\cup\{\infty\}$ is lower semicontinuous. To formulate the result in this abstract setting, we use the following notation.

Definition 3.1.

We denote by

[TABLE]

the level-set of $f$ for the levels $t_{1}\leq t_{2}$ . With slight abuse of notation we write, for fixed $x\in X$ , $[f(x)]:=[f=f(x)]$ . Furthermore, for any $x\in X$ , the distance of $x$ to a set $S\subset X$ is denoted by

[TABLE]

With this we recall the Hausdorff distance between sets,

[TABLE]
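The notions of this definition can be sketched for finite point sets. The code assumes the standard two-sided form of the Hausdorff distance and uses the model function $f(x)=x^{2}$ on the real line, whose level sets are $[f=t]=\{-\sqrt{t},\sqrt{t}\}$; both the model function and the helper names are illustrative assumptions.

```python
import math


def dist_point_to_set(x, S):
    """Distance of a point x to a finite, nonempty set S of points in R^n."""
    return min(math.dist(x, s) for s in S)


def hausdorff_distance(S1, S2):
    """Two-sided Hausdorff distance between finite, nonempty point sets."""
    excess_12 = max(dist_point_to_set(x, S2) for x in S1)
    excess_21 = max(dist_point_to_set(y, S1) for y in S2)
    return max(excess_12, excess_21)


def level_set_of_square(t):
    """Level set [f = t] of f(x) = x**2 on the real line, for t > 0."""
    root = math.sqrt(t)
    return [(-root,), (root,)]


# Distance between the level sets at levels 1 and 4, next to |sqrt(4) - sqrt(1)|.
d = hausdorff_distance(level_set_of_square(1.0), level_set_of_square(4.0))
print(d, abs(math.sqrt(4.0) - math.sqrt(1.0)))
```

For this model function the distance between two level sets equals the difference of the square roots of the levels, which is the kind of level-set estimate that a function $\varphi(t)=\sqrt{t}$ encodes.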

The KL inequality is directly linked to certain index functions, which we specify below.

Definition 3.2.

A concave function $\varphi:[0,\bar{r})\rightarrow\mathbb{R}$ is called a desingularizing function or smooth index function if $\varphi\in C[0,\bar{r})\cap C^{1}(0,\bar{r})$, $\varphi(0)=0$, and $\varphi^{\prime}(x)>0$ for all $x\in(0,\bar{r})$. We denote the set of all such $\varphi$ by $\mathcal{K}(0,\bar{r})$.

Now we are ready to cite the main inspiration for our work. It is taken from [4]. In comparison to the original result we have omitted a third equivalence to the concept of metric regularity, see [20]. Note that we replaced $f$ with $f-\inf f$ .

Proposition 3.1.

[4, Corollary 4] Let $f:X\rightarrow\mathbb{R}\cup\{\infty\}$ be a lower semicontinuous function defined on a complete metric space and $\varphi\in\mathcal{K}(0,r_{0})$. Assume that $[\inf f<f<r_{0}-\inf f]\neq\emptyset$. Then the following statements are equivalent.

(a)

For all $r_{1},r_{2}\in(\inf f,r_{0})$

[TABLE]

(b)

For all $x\in[0<f<r_{0}]$

[TABLE]

where $|\nabla f|(x):=\limsup_{\tilde{x}\rightarrow x}\frac{\max(f(x)-f(\tilde{x}),0)}{d(x,\tilde{x})}$ is the strong slope.

Now we return to $X$ being a Banach space and consider the Tikhonov functional $f=T_{\alpha}(x)$ . Due to the convexity of the penalty $J$ , we can write Proposition 3.1 in the following way, where

[TABLE]

is the remoteness of the subdifferential of $f$ in $x$ ; see also [3].

Corollary 3.1.

Let either $A$ be injective or $J$ be strictly convex. Then, for the Tikhonov functional $T_{\alpha}(x)$ from (1.4), the following are equivalent for a smooth index function $\varphi\in\mathcal{K}(0,\tilde{r})$ , $x\in[T_{\alpha}(x_{\alpha})\leq T_{\alpha}(x)\leq\tilde{r}]$ , and $0<k<\infty$ .

(a)

[TABLE]

(b)

[TABLE]

Proof.

Due to Assumption 1.1 minimizers of $T_{\alpha}(x)$ exist, and due to the injectivity of $A$ or strict convexity of $J$ the minimizers are unique. Hence it is plain to see from the definition of the Hausdorff-metric (3.3) that

[TABLE]

and we obtain (a). For (semi)-convex functions, the strong slope coincides with $\|\partial T_{\alpha}(x)\|_{-}$ ([4, Remark 12]), from which the remainder follows. ∎

We close this section by mentioning two obstacles in the application of Corollary 3.1. Firstly, it should be noted that a functional $f=g+h$ does not necessarily fulfill a KL inequality although both $g$ and $h$ do so. It is therefore not clear how to properly treat such a sum functional. While a partial answer is given in [19, Theorem 3.11], we cannot apply the results since they require an invertible operator $A$. We will sketch in Section 6 that the Tikhonov functional (1.4) behaves differently than would be expected from the sum of its parts. The second issue in applying Corollary 3.1 lies in the fact that it only holds in the noise-free case. To the best of the authors' knowledge, there are no results on how the KL inequality behaves under noisy data. It is, however, beyond the scope of this paper to close this gap.

4 The KL-regularity condition

Due to the equivalences of Theorem 2.1, it is sufficient to connect one of the conditions (a)-(e) with the KL inequality, and (b) appears to be the simplest.

Theorem 4.1.

The following are equivalent:

(a)

There is a $\varphi\in\mathcal{K}(0,\infty)$ such that $(\partial\varphi)^{-1}(z)z$ is nonincreasing with $\lim_{z\to\infty}(\partial\varphi)^{-1}(z)z=0$ and a constant $k$ such that

[TABLE]

(b)

There is an index function $\Psi$ such that

[TABLE]

The functions $\varphi$ and $\Psi$ are connected via $\Psi(t)=\tfrac{1}{t}(\partial\varphi)^{-1}\left(\frac{1}{tk\|[\partial J](x^{\dagger})\|_{-}}\right)$ .

Proof.

First, we observe that in our context, where $x_{\alpha}$ is the minimizer of the Tikhonov functional and $x^{\dagger}$ is the point of interest, the KL inequality (4.1) can be written as

[TABLE]

where

[TABLE]

By concavity, $\partial\varphi$ is monotonically decreasing and thus (4.3) leads to

[TABLE]

Dividing both sides by $\alpha>0$ yields (b) with

[TABLE]

This function is an index function by assumptions.

On the other hand, we write (b) as

[TABLE]

and by defining

[TABLE]

we have

[TABLE]

As $\Psi(\frac{1}{\alpha})$ is nonincreasing so is $\bar{\Theta}$ , hence

[TABLE]

Finally, identifying $\partial\varphi=\bar{\Theta}^{-1}$ and noting that $\|\partial T_{\alpha}(x^{\dagger})\|\sim\alpha$, we get the KL inequality (4.3) up to constants. As $\bar{\Theta}^{-1}$ is nonincreasing, $\varphi$ is concave. Note that $(\partial\varphi)^{-1}(z)z=\Psi(\frac{1}{z})$, so that the stated conditions on $\varphi$ follow since $\Psi$ is an index function. ∎
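The connection formula between $\varphi$ and $\Psi$ stated in Theorem 4.1 can be worked out for a power-type desingularizing function. The choice $\varphi(s)=c\,s^{\kappa}$ with constants $c>0$ and $0<\kappa<1$ is an assumption made for illustration, and $m:=\|[\partial J](x^{\dagger})\|_{-}$ is assumed to be positive and finite:

```latex
\begin{align*}
\varphi(s) &= c\,s^{\kappa}, \qquad
\varphi'(s) = c\kappa\,s^{\kappa-1}, \qquad
(\varphi')^{-1}(z) = \left(\frac{c\kappa}{z}\right)^{\frac{1}{1-\kappa}}, \\
(\varphi')^{-1}(z)\,z &= (c\kappa)^{\frac{1}{1-\kappa}}\, z^{-\frac{\kappa}{1-\kappa}}
  \;\longrightarrow\; 0 \quad (z\to\infty), \\
\Psi(t) &= \frac{1}{t}\,(\varphi')^{-1}\!\left(\frac{1}{t\,k\,m}\right)
  = (c\kappa\,k\,m)^{\frac{1}{1-\kappa}}\; t^{\frac{\kappa}{1-\kappa}} .
\end{align*}
```

In this example $(\varphi')^{-1}(z)z$ is nonincreasing and tends to zero, as required in Theorem 4.1 (a). A rate function $\Psi(t)\sim t^{2\mu}$ corresponds to $\kappa=\frac{2\mu}{2\mu+1}$, i.e., $\varphi(s)\sim s^{\frac{2\mu}{2\mu+1}}$.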

It is interesting that in the proof we stumbled upon the companion function $\Theta$ from (2.10). Namely, we have $\Theta^{2}(\alpha)=\bar{\Theta}(\frac{1}{\alpha})$ . The proof also reveals the identification

[TABLE]

Equation (2.11) for the a priori choice ( $\Theta^{2}(\alpha_{*})\sim\delta^{2}$ ) of the regularization parameter then reads

[TABLE]

and we obtain the formal convergence rate

[TABLE]

Since $\varphi\in\mathcal{K}(0,r_{0})$ is by definition concave, it holds that

[TABLE]

which follows from the property of the “subgradient” of concave functions, where the inequality is reversed compared to convex ones:

[TABLE]

5 Relation to conditional stability estimates

We illustrate how the KL theory quite directly yields convergence rates in the case that a conditional stability estimate holds. Note that such estimates are a very useful tool in, e.g., parameter identification problems in partial differential equations; see, e.g., [8, 10, 25, 33]. The use of conditional stability estimates (5.2) for rate estimates was in particular investigated by Cheng and Yamamoto in the seminal article [9].

Consider the Tikhonov functionals

[TABLE]

where $A:Z\to Y$ and $y=Ax^{\dagger}$ . We furthermore assume that the Hilbert space $Z\hookrightarrow X$ is continuously embedded into a Banach space $X$ , and there we assume a conditional stability estimate to hold (which, for simplicity, we take as a Hölder function): for some $0\leq\alpha<2$ we assume that

[TABLE]

Cheng and Yamamoto have considered precisely this setup and verified convergence rates.

Here we illustrate the approach via the KL-inequality. To this end, we extend the Tikhonov functionals as follows to $X$ :

[TABLE]

At first we verify the KL-inequality (3.7) for $\bar{T}_{\alpha}$ on $X$ . Note that it is enough to consider the inequality for $\bar{T}_{\alpha}(x)<\infty$ , thus for $x\in Z$ . In this case it reads

[TABLE]

In the following we write $A^{*}$ for the adjoint of $A$ in the space $Z$ .

By [3, Prop 3.1] the strong slope or the remoteness can be characterized by the directional derivative $\bar{T}_{\alpha}^{\prime}$ ,

[TABLE]

where $\nabla T_{\alpha}(x)$ is the usual gradient in the space $Z$ :

[TABLE]

The optimality condition for $x_{\alpha}$ reads

[TABLE]

After some algebraic manipulation exploiting this identity, we obtain

[TABLE]

Using the optimality condition, the conditional stability estimate (5.2), and (5.3), we have

[TABLE]

Thus,

[TABLE]

and consequently

[TABLE]

We have thus found a KL-inequality (3.7) with

[TABLE]

We now apply Proposition 3.1 to $\bar{T}_{\alpha}$ (which agrees with ${T}_{\alpha}$ for the relevant arguments) and obtain

[TABLE]

noting that $\varphi$ is Hölder continuous. We have

[TABLE]

Since

[TABLE]

and

[TABLE]

we obtain that

[TABLE]

Thus, choosing $\alpha\sim\delta^{2}$ yields

[TABLE]

and hence the convergence rate

[TABLE]

This is the same parameter choice and the same rate as obtained by Cheng and Yamamoto.

6 Example: Tikhonov regularization

Due to the (partial) equivalence of the KL-inequality with the conditions of [23], their examples apply in our case as long as $\Psi$ is a power function. Therefore, we will not go through all of those examples again, but focus on the most prominent one, which is classical Tikhonov regularization

[TABLE]

where $A:X\rightarrow Y$ is a linear operator between Hilbert spaces $X$ and $Y$ and $\|\cdot\|$ denotes the norm in the respective spaces.

As is well known, the convergence behavior of Tikhonov-regularization (6.1) depends on the specific solution $x^{\dagger}$ , and we employ here source conditions of the type

[TABLE]

While the treatment of more general source conditions $x^{\dagger}=\phi(A^{\ast}A)w$ is possible within our framework (see [23]), it shall be sufficient here to treat only the classical setting (6.2).
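A minimal numerical sketch of classical Tikhonov regularization under a source condition may help to fix ideas. It rests on several assumptions that are not taken from the paper: a diagonal model operator $A=\mathrm{diag}(\sigma_j)$ with $\sigma_j=1/j$, the penalty $J(x)=\frac{1}{2}\|x\|^{2}$, the source condition read as $x^{\dagger}=(A^{\ast}A)^{\mu}w$, a deterministic perturbation of norm $\delta$, and the a priori choice $\alpha=\delta^{\frac{2}{2\mu+1}}$. The error is compared with the standard estimate $\frac{\delta}{2\sqrt{\alpha}}+\alpha^{\mu}\|w\|$, which holds for $0\leq\mu\leq 1$ under this penalty scaling.

```python
import math


def tikhonov_diagonal(sigma, y, alpha):
    """Minimizer of 0.5*||A x - y||^2 + 0.5*alpha*||x||^2 for A = diag(sigma)."""
    return [s * yj / (s * s + alpha) for s, yj in zip(sigma, y)]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


n = 200
mu = 0.25
sigma = [1.0 / j for j in range(1, n + 1)]   # decaying singular values
w = [1.0 / j for j in range(1, n + 1)]       # source element (hypothetical)
x_true = [s ** (2 * mu) * wj for s, wj in zip(sigma, w)]  # (A*A)^mu w
y = [s * xj for s, xj in zip(sigma, x_true)]              # exact data


def reconstruction_error(delta):
    """Return (error, bound) for noise level delta with alpha = delta^(2/(2*mu+1))."""
    e = [(-1.0) ** j for j in range(n)]      # deterministic noise direction
    scale = delta / norm(e)
    y_delta = [yj + scale * ej for yj, ej in zip(y, e)]
    alpha = delta ** (2.0 / (2.0 * mu + 1.0))
    x_rec = tikhonov_diagonal(sigma, y_delta, alpha)
    err = norm([a - b for a, b in zip(x_rec, x_true)])
    bound = delta / (2.0 * math.sqrt(alpha)) + alpha ** mu * norm(w)
    return err, bound


for delta in (1e-1, 1e-2, 1e-3):
    print(delta, *reconstruction_error(delta))
```

The diagonal model keeps the sketch free of linear algebra libraries; for a general matrix the same minimizer solves the normal equations $(A^{\ast}A+\alpha I)x=A^{\ast}y^{\delta}$.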

We recall from [18] that the residual $\|Ax-Ax^{\dagger}\|^{2}$ fulfills a KL inequality with

[TABLE]

if

[TABLE]

i.e., both $x$ and $x^{\dagger}$ lie in the source set (6.2). This will become important again later. For now we simply apply the theory from [23] in the case $0<\mu<\frac{1}{2}$ and demonstrate that the KL inequality and Corollary 2.1 yield convergence in the Bregman distance. Before starting, we summarize some results from [23, Section 4.1]. Namely, we have for (6.1) and under (6.2) that

[TABLE]

and

[TABLE]

Then we have from (6.5) and (6.6) that

[TABLE]

Because $\|\nabla T_{\alpha}(x^{\dagger})\|=\alpha\|x^{\dagger}\|$, the KL inequality requires

[TABLE]

and it is easy to see that we even have equality for

[TABLE]

with derivative

[TABLE]

This function satisfies the condition in Theorem 4.1. From this, we obtain

[TABLE]

This yields, according to (4.4)

[TABLE]

and the convergence rate is given by

[TABLE]

Identifying $\|x_{\alpha}^{\delta}-x^{\dagger}\|^{2}=B_{\xi_{\alpha}^{\delta}}(x_{\alpha}^{\delta},x^{\dagger})$ , we obtain the well-known rate

[TABLE]

Note that Corollary 3.1 does not apply directly since it would yield a convergence rate $\|x^{\dagger}-x_{\alpha}^{\delta}\|\leq c\delta^{\frac{4\mu}{2\mu+1}}$, which is clearly off the correct rate by a square in the exponent. We will now sketch a likely explanation for this.

Comparing the functionals (6.1) and (5.1), it appears that similar techniques should lead to a KL inequality. This is indeed the case, and we obtain for the classical Tikhonov functional (6.1)

[TABLE]

We follow the next steps to arrive at the equivalent of (5), which reads

[TABLE]

The conditional stability estimate (5.2) no longer holds, but the source condition (6.4) yields an alternative. Namely, using the interpolation inequality

[TABLE]

for all $q>r\geq 0$ , we see that

[TABLE]

Inserting this into (6.8), and following the argument after (5), we obtain

[TABLE]

which yields a KL inequality with $\varphi^{\prime}(t)\sim t^{\frac{\mu}{2\mu+1}-1}$ or

[TABLE]

Comparing this with the previous results, we see that we have the same function $\varphi$ as for the residual functional (6.3), but this $\varphi$ is only the square root of the function from (6.7) that we derived earlier in this section. Note that $\|\cdot\|^{2}$ fulfills a KL inequality with $\varphi(t)=\sqrt{t}$. The discrepancy is due to the local character of the KL inequality for ill-posed problems. From the optimality condition of the classical Tikhonov functional (6.1) it follows that $x_{\alpha}$ (and $x_{\alpha}^{\delta}$, respectively) is always in the range of $A^{\ast}$, which coincides with the range of $(A^{\ast}A)^{\frac{1}{2}}$. Therefore, while $x^{\dagger}$ may fulfill the source condition (6.2) for arbitrary $0<\mu<\infty$, the source condition (6.4) with $x\in\{x_{\alpha},x_{\alpha}^{\delta}\}$ only holds for $\mu=\frac{1}{2}$, and we can only apply Corollary 3.1 in this case. Indeed, using the well-known a priori choice $\alpha\sim\delta^{\frac{2}{2\mu+1}}=\delta$, we have $T_{\alpha}(x^{\dagger})-T_{\alpha}(x_{\alpha}^{\delta})\sim\delta^{2}$, which yields via Corollary 3.1 with $\varphi(t)$ from (6.3) with $\mu=\frac{1}{2}$ the convergence rate $\|x_{\alpha}^{\delta}-x^{\dagger}\|\sim\sqrt{\delta}$. Therefore, the different index functions $\varphi$ in (6.7) and (6.3) do not contradict each other.

Acknowledgement

Part of this research was started during a visit of the second author to the Chemnitz University of Technology. S.K. would like to thank the Faculty of Mathematics in Chemnitz and especially Bernd Hofmann for their great hospitality. D.G. would like to thank Prof. Masahiro Yamamoto for his hospitality during his stay in Tokyo, where the author first learned of the KL inequality.

Bibliography

[1] P.-A. Absil, R. Mahony and B. Andrews, Convergence of the iterates of descent methods for analytic cost functions, SIAM J. Optim., 16 (2005), pp. 531–547.
[2] H. Attouch and J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Programming, 116 (2009), pp. 5–16.
[3] D. Azé and J. N. Corvellec, Characterizations of error bounds for lower semicontinuous functions on metric spaces, ESAIM Control Optim. Calc. Var., 10 (2004), pp. 409–425.
[4] J. Bolte, A. Daniilidis, O. Ley and L. Mazet, Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity, T. Am. Math. Soc., 382 (2010), pp. 3319–3363.
[5] J. Bolte, S. Sabach and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Prog., 146 (2014), pp. 459–494.
[6] J. Bolte, A. Daniilidis and A. Lewis, The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM J. Opt., 17 (2007), pp. 1205–1223.
[7] R. I. Boţ and E. R. Csetnek, Proximal-gradient algorithms for fractional programming, Optimization, 66 (2017), pp. 1383–1396.
[8] A. L. Bukhgeim, J. Cheng and M. Yamamoto, Stability for an inverse boundary problem of determining a part of a boundary, Inverse Problems, 15 (1999), pp. 1021–1032.
