Case studies and a pitfall for nonlinear variational regularization under conditional stability

Daniel Gerth; Bernd Hofmann; Christopher Hofmann

arXiv:1905.11682·math.NA·May 29, 2019


Open Access

TL;DR

This paper investigates the use of conditional stability estimates in nonlinear Tikhonov regularization, highlighting potential pitfalls and demonstrating convergence behavior through theoretical analysis and numerical case studies.

Contribution

It provides a detailed analysis of convergence and rates in nonlinear Tikhonov regularization under conditional stability, emphasizing the importance of correct stability estimates.

Findings

01

Oversmoothing penalties can achieve optimal convergence rates with modified assumptions.

02

Incorrect stability estimates can lead to failure of convergence.

03

Numerical examples illustrate potential pitfalls in applying conditional stability.

Abstract

Conditional stability estimates are a popular tool for the regularization of ill-posed problems. A drawback in particular under nonlinear operators is that additional regularization is needed for obtaining stable approximate solutions if the validity area of such estimates is not completely known. In this paper we consider Tikhonov regularization under conditional stability estimates for nonlinear ill-posed operator equations in Hilbert scales. We summarize assertions on convergence and convergence rate in three cases describing the relative smoothness of the penalty in the Tikhonov functional and of the exact solution. For oversmoothing penalties, for which the true solution no longer attains a finite value, we present a result with modified assumptions for a priori choices of the regularization parameter yielding convergence rates of optimal order for noisy data. We strongly highlight…

Tables

Table 1: Model Problem 1: Numerically computed convergence rates (39) and $\alpha$-rates (40) for the five test cases, with values of $p$ estimated from the index $\kappa_{x}$, which approximately characterizes the smoothness of the exact solution in these test cases.

RS	$c_{x}$	$κ_{x}$	$c_{α}$	$κ_{α}$	est. $p$	$\frac{4}{p + 1}$
1	0.9578	0.3276	5.8483	2.7950	0.4871	2.6898
2	0.9017	0.3426	13.5714	2.8609	0.5212	2.6290
3	1.6102	0.4110	0.2782	2.3221	0.6978	2.3560
4	0.2571	0.2582	462.1747	2.7974	0.3481	2.9671
5	0.8868	0.6135	25.8986	1.9546	1.5875	1.5459

Table 2: Model Problem 1: Numerically computed convergence rates (39) and $\alpha$-rates (40) for various $s$ in (41) for given $x^{\dagger}\in X_{1/3-\epsilon}$.

$s$	$c_{x}$	$κ_{x}$	$c_{α}$	$κ_{α}$
0.1	0.9460	0.2647	86.90	1.8168
0.33	1.1492	0.2828	337.42	2.1324
0.9	1.2633	0.2919	250.08	2.5319

Table 3: Model Problem 3: Numerically computed convergence rates (39) and $\alpha$-rates (40) in case (b) for various $s$ in (41) and $x^{\dagger}\in X_{1/3}$.

$s$	$c_{x}$	$κ_{x}$	$c_{α}$	$κ_{α}$
0.1	0.1275	0.2531	6.04e+02	1.6479
0.33	0.1228	0.2461	1.66e+03	1.8805
0.9	0.1229	0.2427	3.84e+04	2.5385

Equations

$$F(x)=y$$

$$\|y-y^{\delta}\|_{Y}\leq\delta$$

$$T_{\alpha}^{\delta}(x):=\|F(x)-y^{\delta}\|_{Y}^{2}+\alpha\|Bx\|_{X}^{2}\to\min,\quad\mbox{subject to}\quad x\in\mathcal{D}(F).$$

$$\|x-x^{\dagger}\|_{X}\leq K\,\varphi(\|F(x)-F(x^{\dagger})\|_{Y})\quad\mbox{for all}\quad x\in\mathcal{B}_{r}^{X}(x^{\dagger})\cap\mathcal{D}(F)$$

$$\|x-x^{\dagger}\|_{-a}\leq K\,\|F(x)-F(x^{\dagger})\|_{Y}^{\gamma}\quad\mbox{for all}\quad x\in Q\cap\mathcal{D}(F)$$

$$\|F(x)-y^{\delta}\|_{Y}^{2}\to\min,\quad\mbox{subject to}\quad x\in Q\cap\mathcal{D}(F).$$

$$\|x\|_{t}\leq\|x\|_{-a}^{\frac{p-t}{p+a}}\,\|x\|_{p}^{\frac{t+a}{p+a}}$$

$$\|x_{ls}^{\delta}-x^{\dagger}\|_{X}\leq\bar{K}\,\delta^{\frac{\gamma p}{p+a}}$$

$$T_{\alpha}^{\delta}(x_{\alpha}^{\delta})\leq T_{\alpha}^{\delta}(x^{\dagger}),$$

$$\|x_{\alpha}^{\delta}\|_{1}^{2}\leq\|x^{\dagger}\|_{1}^{2}+\frac{\delta^{2}}{\alpha}.$$

$$\alpha\to 0\quad\mbox{and}\quad\frac{\delta^{2}}{\alpha}\to 0\quad\mbox{as}\quad\delta\to 0.$$

$$\lim_{n\to\infty}\|x_{n}\|_{1}=\|x^{\dagger}\|_{1},$$

$$\lim_{n\to\infty}\|x_{n}-x^{\dagger}\|_{\nu}=0\quad\mbox{for all}\quad 0\leq\nu\leq 1.$$

$$\|x-x^{\dagger}\|_{-a}\leq K(\rho)\,\|F(x)-F(x^{\dagger})\|_{Y}^{\gamma}\quad\mbox{for all}\quad x\in\mathcal{B}_{\rho}^{X_{1}}(0)\cap\mathcal{D}(F)$$

$$\|x_{\alpha}^{\delta}-x^{\dagger}\|_{X}=\mathcal{O}\left(\delta^{\frac{\gamma p}{p+a}}\right)\quad\mbox{as}\quad\delta\to 0,$$

$$\alpha(\delta)\sim\delta^{2-2\gamma\frac{p-1}{p+a}}.$$

$$\|x_{\alpha}^{\delta}-x^{\dagger}\|_{X}=\mathcal{O}\left(\delta^{\frac{\gamma}{1+a}}\right)\quad\mbox{as}\quad\delta\to 0,$$

$$\alpha(\delta)\sim\delta^{2}.$$

$$\|x_{\alpha}^{\delta}-x^{\dagger}\|_{-a}\leq C\,\delta^{\gamma},$$

$$\|x_{\alpha}^{\delta}-x^{\dagger}\|_{X}\leq C\left(\|x_{\alpha}^{\delta}\|_{1}+\|x^{\dagger}\|_{1}\right)^{\frac{a}{1+a}}\delta^{\frac{\gamma}{1+a}}.$$

$$\|x-x^{\dagger}\|_{-a}\leq K(r)\,\|F(x)-F(x^{\dagger})\|_{Y}^{\gamma}\quad\mbox{for all}\quad x\in\mathcal{B}_{r}^{X}(x^{\dagger})\cap\mathcal{D}(F)$$

$$\underline{K}\,\|x-x^{\dagger}\|_{-a}\leq\|F(x)-F(x^{\dagger})\|_{Y}\quad\mbox{for all}\quad x\in\mathcal{D}(F)\cap\mathcal{D}(B)=\mathcal{D}(F)\cap X_{1}$$

$$\|F(x)-F(x^{\dagger})\|_{Y}\leq\overline{K}\,\|x-x^{\dagger}\|_{-a}\quad\mbox{for all}\quad x\in\mathcal{B}_{r}^{X}(x^{\dagger})\cap X_{1}$$

$$\|x_{\alpha_{*}}^{\delta}-x^{\dagger}\|_{X}=\mathcal{O}\left(\delta^{\frac{p}{p+a}}\right)\quad\mbox{as}\quad\delta\to 0,$$

$$\alpha_{*}=\alpha(\delta)=\delta^{2-2\gamma\frac{p-1}{p+a}}.$$

$$\alpha(\delta)\to 0\quad\mbox{and}\quad\frac{\delta^{2}}{\alpha(\delta)}=\delta^{\frac{2(p-1)}{p+a}}\to\infty\quad\mbox{as}\quad\delta\to 0.$$

$$\alpha(\delta)\to 0\quad\mbox{and}\quad\frac{\delta^{2}}{\alpha(\delta)}\to\infty\quad\mbox{as}\quad\delta\to 0$$

$$\|x_{\alpha}^{\delta}-x^{\dagger}\|_{X}=\mathcal{O}\left(\delta^{\frac{p}{p+a}}\right)\quad\mbox{as}\quad\delta\to 0$$

$$\alpha(\delta)=\delta^{2-2\frac{p-1}{p+a}}=\delta^{\frac{2(a+1)}{a+p}}.$$

$$\underline{K}\,\|x\|_{-a}\leq\|Ax\|_{Y}\leq\overline{K}\,\|x\|_{-a}\quad\mbox{for all}\quad x\in X,$$

Taxonomy

Topics: Numerical methods in inverse problems · Thermoelastic and Magnetoelastic Phenomena · Probabilistic and Robust Engineering Design

Full text

Faculty of Mathematics, Chemnitz University of Technology, 09107 Chemnitz, Germany

Email: daniel.gerth/bernd.hofmann/christopher.[email protected]

Abstract

Conditional stability estimates are a popular tool for the regularization of ill-posed problems. A drawback in particular under nonlinear operators is that additional regularization is needed for obtaining stable approximate solutions if the validity area of such estimates is not completely known. In this paper we consider Tikhonov regularization under conditional stability estimates for nonlinear ill-posed operator equations in Hilbert scales. We summarize assertions on convergence and convergence rate in three cases describing the relative smoothness of the penalty in the Tikhonov functional and of the exact solution. For oversmoothing penalties, for which the true solution no longer attains a finite value, we present a result with modified assumptions for a priori choices of the regularization parameter yielding convergence rates of optimal order for noisy data. We strongly highlight the local character of the conditional stability estimate and demonstrate that pitfalls may occur through incorrect stability estimates. Then convergence can completely fail and the stabilizing effect of conditional stability may be lost. Comprehensive numerical case studies for some nonlinear examples illustrate such effects.

MSC 2010: 47J06, 65J20, 47A52

Keywords: Nonlinear inverse problems, conditional stability, Tikhonov regularization, oversmoothing penalties, Hilbert scales, convergence rates

1 Introduction

Regularization theory for nonlinear ill-posed inverse problems is always a challenging endeavor. In contrast to linear inverse problems, where the theory is rather coherent and well-developed (see, for example, the monographs [10, 35]), the nonlinear theory is harder to grasp. Numerous assumptions exist in the literature that restrict the nonlinear behavior of the forward operator in such a way that stable approximate solutions exist which converge to the exact solution in the limit of vanishing data noise. It is important to keep in mind that the nonlinearity conditions only hold locally. A main goal of this paper is to show that this can be a pitfall, as incorrect localization leads to the loss of the stabilizing property. A second objective is to verify theoretical convergence results in numerical examples and to point out some open questions. To this end, we focus here on regularization in Hilbert scales. Going into detail, we consider in this paper the stable approximate solution of the nonlinear operator equation

$$F(x)=y \tag{1}$$

by variational (Tikhonov-type) regularization. Equation (1) serves as a model for an inverse problem where the nonlinear forward operator $F:\mathcal{D}(F)\subseteq X\to Y$ maps between the infinite dimensional real Hilbert spaces $X$ and $Y$ with domain $\mathcal{D}(F)$ . The symbols $\|\cdot\|_{X},\;\|\cdot\|_{Y}$ and $\langle\cdot,\cdot\rangle_{X},\;\langle\cdot,\cdot\rangle_{Y}$ designate the norms and inner products of the spaces $X$ and $Y$ , respectively. Instead of the exact right-hand side $y=F(x^{\dagger})$ , with the uniquely determined preimage $x^{\dagger}\in\mathcal{D}(F)$ , we assume to know a noisy element $y^{\delta}\in Y$ satisfying the noise model

$$\|y-y^{\delta}\|_{Y}\leq\delta \tag{2}$$

with some noise level $\delta>0$ . Based on this data element $y^{\delta}\in Y$ we use as approximations to $x^{\dagger}$ global minimizers $x_{\alpha}^{\delta}\in\mathcal{D}(F)$ of the extremal problem

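$$T_{\alpha}^{\delta}(x):=\|F(x)-y^{\delta}\|_{Y}^{2}+\alpha\|Bx\|_{X}^{2}\to\min,\quad\mbox{subject to}\quad x\in\mathcal{D}(F). \tag{3}$$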

Here, $B\colon\mathcal{D}(B)\subset X\to X$ is a densely defined, unbounded, linear, and self-adjoint operator which is strictly positive such that $\|Bx\|_{X}\geq c_{B}\|x\|_{X}$ holds for all $x\in\mathcal{D}(B)$. Such operators $B$ generate a Hilbert scale $\{X_{\nu}\}_{\nu\in\mathbb{R}}$, where $X_{\nu}=\mathcal{D}(B^{\nu})$ coincides with the range $\mathcal{R}(B^{-\nu})$ of the operator $B^{-\nu}$. In particular $X_{0}=X$, and we set $\|x\|_{\nu}:=\|B^{\nu}x\|_{X}$ for the norm of the Hilbert scale element $x\in X_{\nu}$. With this, the specific Tikhonov functional $T^{\delta}_{\alpha}:X\to[0,\infty]$ in (3) is the weighted sum of the quadratic misfit functional $\|F(\cdot)-y^{\delta}\|_{Y}^{2}$ and the Hilbert-scale penalty functional $\|B\,\cdot\|_{X}^{2}=\|\cdot\|_{1}^{2}$, where the regularization parameter $\alpha>0$ acts as weight factor. Note that no generality is lost by considering only the penalty in the $1$-norm $\|\cdot\|_{1}$, since one can always rescale the operator $B$ to obtain $\|Bx\|=\|(B^{\frac{1}{p}})^{p}x\|=\|\tilde{B}^{p}x\|$ for $p>0$, i.e., one obtains a penalty of arbitrary index $p$ in the Hilbert scale generated by the operator $\tilde{B}:=B^{\frac{1}{p}}$. Finally, we mention that for $x\in\mathcal{D}(F)$ we set $T^{\delta}_{\alpha}(x):=+\infty$ if $x\notin\mathcal{D}(B)$, and that the Tikhonov functional attains a well-defined value $0\leq T^{\delta}_{\alpha}(x)<+\infty$ if $x\in\mathcal{D}:=\mathcal{D}(F)\cap\mathcal{D}(B)\not=\emptyset$.
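To make the Hilbert-scale norm and the Tikhonov functional concrete, the following minimal sketch uses a finite diagonal toy model. Everything in it is an assumption made for illustration and is not taken from the paper: $B$ is taken to have eigenvalues $k=1,\dots,n$, the forward map is replaced by a hypothetical linear stand-in with $(Fx)_k=k^{-a}x_k$, and the names `scale_norm`, `forward`, `tikhonov` and `tikhonov_minimizer` are invented here. For this linear diagonal case the minimizer is available in closed form, which is not true for the nonlinear operators treated in the paper.

```python
import math

def scale_norm(x, nu):
    """Hilbert-scale norm ||x||_nu = ||B^nu x||_X for the toy choice B e_k = k e_k."""
    return math.sqrt(sum((k ** nu * xk) ** 2 for k, xk in enumerate(x, start=1)))

def forward(x, a=1.0):
    """Hypothetical linear stand-in for F: (F x)_k = k^(-a) x_k."""
    return [k ** (-a) * xk for k, xk in enumerate(x, start=1)]

def tikhonov(x, y_delta, alpha, a=1.0):
    """T(x) = ||F(x) - y_delta||_Y^2 + alpha * ||B x||_X^2."""
    misfit = sum((fk - yk) ** 2 for fk, yk in zip(forward(x, a), y_delta))
    return misfit + alpha * scale_norm(x, 1.0) ** 2

def tikhonov_minimizer(y_delta, alpha, a=1.0):
    """Coefficientwise closed-form minimizer, valid only for this diagonal toy model."""
    return [k ** (-a) * yk / (k ** (-2 * a) + alpha * k ** 2)
            for k, yk in enumerate(y_delta, start=1)]

# Toy data: exact solution with coefficients k^(-2), which has a finite 1-norm.
n = 50
x_true = [k ** (-2.0) for k in range(1, n + 1)]
y = forward(x_true)

# Deterministic perturbation scaled so that ||y - y_delta||_Y equals delta.
delta = 1e-3
y_delta = [yk + delta / math.sqrt(n) * (-1) ** k for k, yk in enumerate(y, start=1)]

alpha = 1e-4
x_reg = tikhonov_minimizer(y_delta, alpha)

# The minimizer cannot have a larger functional value than the exact solution.
print(tikhonov(x_reg, y_delta, alpha) <= tikhonov(x_true, y_delta, alpha))
```

The diagonal structure is chosen only because it keeps every norm a weighted sum of squares; a discretized differential operator $B$ would need an eigendecomposition or a linear solve instead.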

A typical phenomenon of the nonlinear equation (1) as a model for an inverse problem is local ill-posedness at the solution point $x^{\dagger}\in\mathcal{D}(F)$ (cf. [27, Def. 2] or [26, Def. 3]), which means that inequalities of the form

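$$\|x-x^{\dagger}\|_{X}\leq K\,\varphi(\|F(x)-F(x^{\dagger})\|_{Y})\quad\mbox{for all}\quad x\in\mathcal{B}_{r}^{X}(x^{\dagger})\cap\mathcal{D}(F) \tag{4}$$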

cannot hold for any positive constants $K,\;r$ and any index function $\varphi$. (Throughout, $\mathcal{B}^{H}_{r}(\bar{x})$ denotes a closed ball in the Hilbert space $H$ around $\bar{x}\in H$ with radius $r>0$. Furthermore, we call a function $\varphi\colon[0,\infty)\to[0,\infty)$ an index function if it is continuous, strictly increasing, and satisfies the boundary condition $\varphi(0)=0$.) However, the inverse problem literature offers numerous examples where the left-hand term $\|x-x^{\dagger}\|_{X}$ in (4) is replaced with a weaker norm $\|x-x^{\dagger}\|_{-a}\;(a>0)$ and a corresponding conditional stability estimate holds. In the sequel, we restrict our considerations to the concave index functions $\varphi(t)=t^{\gamma}$ of Hölder-type with exponents $0<\gamma\leq 1$ and hence to conditional stability estimates of the form

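$$\|x-x^{\dagger}\|_{-a}\leq K\,\|F(x)-F(x^{\dagger})\|_{Y}^{\gamma}\quad\mbox{for all}\quad x\in Q\cap\mathcal{D}(F) \tag{5}$$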

with some index $a>0$ , which can be interpreted as degree of ill-posedness of $F$ at $x^{\dagger}$ , a suitable subset $Q$ in $X$ which acts as the aforementioned localization of the nonlinearity condition, and a constant $K>0$ that may depend on $Q$ .

Let us consider the situation that $x^{\dagger}\in Q$ and $Q$ is known. Then one may employ a least squares iteration process of minimizing the norm square

$$\|F(x)-y^{\delta}\|_{Y}^{2}\to\min,\quad\mbox{subject to}\quad x\in Q\cap\mathcal{D}(F). \tag{6}$$

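The minimizers $x_{ls}^{\delta}$ of (6) satisfy $\|F(x_{ls}^{\delta})-y^{\delta}\|_{Y}\leq\delta$ by definition and due to $x^{\dagger}\in Q$. The triangle inequality then gives $\|F(x_{ls}^{\delta})-F(x^{\dagger})\|_{Y}\leq 2\delta$, so that (5) yields $\|x_{ls}^{\delta}-x^{\dagger}\|_{-a}\leq K(2\delta)^{\gamma}$. Hence we have convergence $\|x_{ls}^{\delta}-x^{\dagger}\|_{-a}\to 0$ as $\delta\to 0$ of these least squares-type solutions to $x^{\dagger}$ in the norm of the space $X_{-a}$, which is weaker than the one in $X$.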

To achieve convergence and even convergence rates in the norm of $X$ , additional smoothness $x^{\dagger}\in X_{p}$ for some $p>0$ is needed. If the approximate solutions $x_{ls}^{\delta}\in Q\cap\mathcal{D}(F)$ also possess such smoothness with $\|x_{ls}^{\delta}\|_{p}$ uniformly bounded for all $0<\delta\leq\bar{\delta}$ , then, with $-a<t\leq p$ the interpolation inequality in Hilbert scales (see [33]) applies in the form

$$\|x\|_{t}\leq\|x\|_{-a}^{\frac{p-t}{p+a}}\,\|x\|_{p}^{\frac{t+a}{p+a}} \tag{7}$$

for all $x\in X_{p}$ . Hence we derive from (5) and (7) with $t=0$ and by the triangle inequality that

$$\|x_{ls}^{\delta}-x^{\dagger}\|_{X}\leq\bar{K}\,\delta^{\frac{\gamma p}{p+a}}$$

for sufficiently small $\delta>0$ and some constant $\bar{K}$ . A way to ensure the property that the approximate solutions belong to $X_{p}\cap Q\cap\mathcal{D}(F)$ is to use regularized solutions which minimize the Tikhonov functional $\|F(x)-y^{\delta}\|_{Y}^{2}+\alpha\|B^{s}x\|_{X}^{2}$ , subject to $x\in Q\cap\mathcal{D}(F)$ , where $s\geq p$ is required. Hence, Tikhonov-type regularization is here an auxiliary tool which complements the conditional stability estimate (5) in order to obtain stable approximate solutions measured in the norm of $X$ .
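The interpolation inequality (7) used in this argument can be checked numerically in a diagonal toy setting. The sketch below assumes, purely for illustration, that $B$ has eigenvalues $k=1,\dots,n$, so that $\|x\|_{\nu}^{2}=\sum_k k^{2\nu}x_k^{2}$; in that setting (7) is an instance of Hölder's inequality. The helper names `scale_norm` and `interpolation_bound` and the test vector are hypothetical.

```python
import math

def scale_norm(x, nu):
    """||x||_nu for the toy choice B e_k = k e_k (an assumption, not the paper's B)."""
    return math.sqrt(sum((k ** nu * xk) ** 2 for k, xk in enumerate(x, start=1)))

def interpolation_bound(x, t, a, p):
    """Right-hand side ||x||_{-a}^theta * ||x||_p^(1 - theta), theta = (p - t)/(p + a)."""
    theta = (p - t) / (p + a)
    return scale_norm(x, -a) ** theta * scale_norm(x, p) ** (1.0 - theta)

# Arbitrary test vector with decaying coefficients.
x = [math.sin(k) / k ** 2 for k in range(1, 201)]
a, p, t = 1.0, 1.5, 0.0

lhs = scale_norm(x, t)                  # ||x||_0, the norm in X
rhs = interpolation_bound(x, t, a, p)   # interpolated bound between X_{-a} and X_p
print(lhs <= rhs)

# A single eigenvector makes the inequality an equality (up to rounding).
e2 = [0.0, 2.0]
gap = abs(scale_norm(e2, t) - interpolation_bound(e2, t, a, p))
print(gap < 1e-12)
```

With $t=0$ the bound shows how a small error in the weak norm $\|\cdot\|_{-a}$ together with a bounded $\|\cdot\|_{p}$ norm controls the error in $X$, which is the mechanism behind the rate exponent $\frac{\gamma p}{p+a}$.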

On the other hand, we have to take into account the frequently occurring situation that the set $Q$ in (5) is not or not completely known and a minimization process according to (6) is impossible, because of a not completely known set of constraints for the optimization problem. Nevertheless, a combination of the conditional stability estimate (5) with variational regularization of the form (3) can be successful. For a systematic treatment of convergence results in the context of regularization theory we will distinguish the following cases relating the smoothness of the solution $x^{\dagger}$ and of the approximate solutions $x_{\alpha}^{\delta}$ implied by the functional (3):

Case distinction.

(a) Classical regularization: $x^{\dagger}\in X_{p}$ for $p>1$, which means that $\|Bx^{\dagger}\|_{X}^{2}<+\infty$ and there is some source element $w\in X_{\varepsilon}\;(\varepsilon>0)$ such that $x^{\dagger}=B^{-1}w$;

(b) Matching smoothness: $x^{\dagger}\in X_{1}$, i.e. $\|Bx^{\dagger}\|_{X}<\infty$, but $x^{\dagger}\notin X_{1+\varepsilon}$ for all $\varepsilon>0$;

(c) Oversmoothing regularization: $x^{\dagger}\in X_{p}$ for some $0<p<1$, but $x^{\dagger}\notin X_{1}$, i.e. $\|Bx^{\dagger}\|_{X}=+\infty$.

The goal of this paper is to discuss the different opportunities and limitations for convergence and rates of regularized solutions $x_{\alpha}^{\delta}$ in the situations (a), (b), and (c), respectively. It is organized as follows: Section 2 recalls assertions on convergence of regularized solutions in cases (a) and (b). Moreover, the usual technical assumptions on the forward operator, its domain, and the exact solution are listed. In Section 3, Hölder rate results under conditional stability estimates are summarized for the cases of classical regularization and matching smoothness. The rate result of Proposition 3.7 for the oversmoothing case (c) is of specific interest. It requires two-sided inequalities as conditional stability estimates, whereas in cases (a) and (b) only one-sided inequalities are needed. Three inverse model problems of ill-posed nonlinear equations covering all cases (a), (b), and (c) are outlined in Section 4, for which numerical case studies are presented in Section 5. The proof of Proposition 3.7 is given in the appendix.

2 Convergence

In this section we collect properties of the regularized solutions $x_{\alpha}^{\delta}$ obtained as solutions of the optimization problem (3) for the cases (a), (b), and (c) in different ways. Throughout this paper we suppose that the following assumption concerning the nonlinear forward operator $F$ and the solvability of the operator equation (1) holds true.

Assumption 1.

The operator $F:\mathcal{D}(F)\subseteq X\to Y$ is weak-to-weak sequentially continuous and its domain $\mathcal{D}(F)$ is a convex and closed subset of $X$. For the right-hand side $y=F(x^{\dagger})\in Y$ under consideration let $x^{\dagger}\in\mathcal{D}(F)$ be the uniquely determined solution to the operator equation (1).

Under the setting introduced in Section 1, the penalty $\|Bx\|_{X}^{2}$ as part of the Tikhonov functional $T_{\alpha}^{\delta}$ in (3) is a non-negative, convex, and sequentially lower semi-continuous functional. Moreover, this functional is stabilizing in the sense that all its sublevel sets are weakly sequentially compact in $X$. Taking also into account Assumption 1, the Assumptions 3.11 and 3.22 of [45] are satisfied and the assertions from [45, Section 4.1.1] apply, which ensure existence and stability of the regularized solutions $x_{\alpha}^{\delta}$ in our present Hilbert scale setting, consistently for all three cases (a), (b), and (c).

We emphasize at this point that we always have $x_{\alpha}^{\delta}\in X_{1}$ by definition of the minimizers in (3), but only in the cases (a) and (b) can one make use of the inequality

$$T_{\alpha}^{\delta}(x_{\alpha}^{\delta})\leq T_{\alpha}^{\delta}(x^{\dagger}), \tag{8}$$

which implies for all $\alpha>0$ that

$$\|x_{\alpha}^{\delta}\|_{1}^{2}\leq\|x^{\dagger}\|_{1}^{2}+\frac{\delta^{2}}{\alpha}. \tag{9}$$
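For completeness, the step from the minimizing property of $x_{\alpha}^{\delta}$ to the bound on its $1$-norm can be written out as follows. This is a short sketch added for the reader; it uses only $y=F(x^{\dagger})$, the noise model $\|y-y^{\delta}\|_{Y}\leq\delta$ and the definition of $T_{\alpha}^{\delta}$, and it assumes $x^{\dagger}\in X_{1}$, that is, cases (a) and (b).

```latex
\begin{align*}
\alpha\,\|x_{\alpha}^{\delta}\|_{1}^{2}
  &\leq \|F(x_{\alpha}^{\delta})-y^{\delta}\|_{Y}^{2}+\alpha\,\|x_{\alpha}^{\delta}\|_{1}^{2}
   = T_{\alpha}^{\delta}(x_{\alpha}^{\delta}) \\
  &\leq T_{\alpha}^{\delta}(x^{\dagger})
   = \|F(x^{\dagger})-y^{\delta}\|_{Y}^{2}+\alpha\,\|x^{\dagger}\|_{1}^{2}
   \leq \delta^{2}+\alpha\,\|x^{\dagger}\|_{1}^{2}.
\end{align*}
```

Dividing by $\alpha>0$ gives $\|x_{\alpha}^{\delta}\|_{1}^{2}\leq\|x^{\dagger}\|_{1}^{2}+\delta^{2}/\alpha$, so the squared $1$-norm of the regularized solutions stays bounded whenever $\delta^{2}/\alpha$ does.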

In the case (c), however, due to $x^{\dagger}\notin X_{1}$ and hence $\|x^{\dagger}\|_{1}=+\infty$ we have no such uniform bounds of $\|x_{\alpha}^{\delta}\|_{1}$ from above. On the contrary, in [13] it was shown that $\|x_{\alpha}^{\delta}\|_{1}\rightarrow\infty$ as $\delta\rightarrow 0$ is necessary even for weak convergence of the regularizers $x_{\alpha}^{\delta}$ to $x^{\dagger}$ .

In order to obtain convergence of the regularized solutions $x_{\alpha}^{\delta}$ to $x^{\dagger}$ as $\delta\to 0$ , the interplay of the noise level and the choice of the regularization parameter $\alpha>0$ , which we choose either a priori $\alpha=\alpha(\delta)$ or a posteriori $\alpha=\alpha(\delta,y^{\delta})$ , must be appropriate. In the literature, this interplay is typically controlled by the limit conditions

$$\alpha\to 0\quad\mbox{and}\quad\frac{\delta^{2}}{\alpha}\to 0\quad\mbox{as}\quad\delta\to 0. \tag{10}$$

In our case (a) this is a sufficient description.

Proposition 2.1.

Let the regularization parameter $\alpha>0$ fulfill the conditions (10). Then we have under Assumption 1 and for case (a), i.e. for $1<p<\infty$ , by setting $\alpha_{n}=\alpha(\delta_{n})$ or $\alpha_{n}=\alpha(\delta_{n},y^{\delta_{n}})$ , $x_{n}=x_{\alpha_{n}}^{\delta_{n}}$ , that for $\delta_{n}\to 0$ as $n\to\infty$

$$\lim_{n\to\infty}\|x_{n}\|_{1}=\|x^{\dagger}\|_{1},$$

and

$$\lim_{n\to\infty}\|x_{n}-x^{\dagger}\|_{\nu}=0\quad\mbox{for all}\quad 0\leq\nu\leq 1.$$

Proof 2.2.

The proof follows along the lines of Theorem 4.3 and Corollary 4.6 from [45].

As we will see in Proposition 3.1 in the next section, the optimal parameter choice fulfills the conditions (10) in case (a). In case (b), where the smoothness of $x^{\dagger}$ coincides with the smoothness of the regularization, i.e., $p=1$, the situation is less clear. On the one hand, Proposition 2.1 holds in exactly the same way for case (b), because (9) holds in both cases. Hence, we have the following corollary:

Corollary 2.3.

Under the assumptions of Proposition 2.1, in particular for the cases (a) and (b) and for a regularization parameter choice satisfying (10), we have that the regularized solutions $x_{\alpha}^{\delta}$ belong to the ball $\mathcal{B}^{X_{\nu}}_{r}(x^{\dagger})$ for prescribed values $r>0$ and $0\leq\nu\leq 1$ whenever $\delta>0$ is sufficiently small.

The surprising difference between the cases (a) and (b), on the other hand, is that the optimal choice of the regularization parameter for (b) violates the second condition in (10): we show in Proposition 3.3 below that $\alpha\sim\delta^{2}$ yields the optimal convergence rate. Since a convergence rate implies norm convergence, this means that the condition $\delta^{2}/\alpha\rightarrow 0$ in (10) is sufficient but not necessary for convergence, at least in case (b).

In case (c) with oversmoothing penalty, the inequality (8) and consequently (9) are missing. Results of Proposition 2.1 and Corollary 2.3 in general do not apply in that case. One cannot even show weak convergence $x_{n}\rightharpoonup x^{\dagger}$ in $X$ , and regularized solutions $x_{\alpha}^{\delta}$ need not belong to a ball $\mathcal{B}^{X}_{r}(x^{\dagger})$ with small radius $r>0$ if $\delta>0$ is sufficiently small. As will be shown in Proposition 3.7 of Section 3 (see also [24, 25]), convergence rates can be proven under stronger conditions also for (c), where we have some $0<p<1$ such that $x^{\dagger}\in X_{p}$ . The key to these results was the appropriate choice of $\alpha$ either by an a priori or a posteriori parameter choice. In particular, $\frac{\delta^{2}}{\alpha}\rightarrow\infty$ as $\delta\rightarrow 0$ , which violates (10), is typical there. The interplay of $\alpha$ and $\delta$ will be in the focus of our numerical case studies in Section 5 below.

3 Convergence rate results

In this section, we are going to discuss convergence rate results for cases (a) and (b) on one hand, but also (c) on the other hand. In addition to Assumption 1 some versions of conditional stability estimates have to be imposed which, in combination with the smoothness assumptions $x^{\dagger}\in X_{p}$ , are essentially hidden forms of source conditions for the solution $x^{\dagger}$ .

In Assumption 2 we first consider the situation for the setting $Q:=\mathcal{B}^{X_{1}}_{\rho}(0)$ . This model setting was comprehensively discussed and illustrated by examples of associated nonlinear inverse problems in the papers [8, 9, 24, 28]. Here we have evidently $x^{\dagger}\in Q$ for the cases (a) and (b) whenever $\|x^{\dagger}\|_{1}\leq\rho$ .

Assumption 2.

Let for fixed $a>0$ and $0<\gamma\leq 1$ the conditional stability estimates

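$$\|x-x^{\dagger}\|_{-a}\leq K(\rho)\,\|F(x)-F(x^{\dagger})\|_{Y}^{\gamma}\quad\mbox{for all}\quad x\in\mathcal{B}_{\rho}^{X_{1}}(0)\cap\mathcal{D}(F) \tag{11}$$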

hold, where constants $K(\rho)>0$ are supposed to exist for all radii $\rho>0$ .

Then the following proposition, which is a direct consequence of [9, Theorem 2.1] when adapting the corresponding proof, yields an order optimal convergence rate in case (a).

Proposition 3.1.

Under Assumptions 1 and 2 and for $x^{\dagger}\in X_{p}$ with $1<p\leq a+2$ we have the rate of convergence of regularized solutions $x_{\alpha}^{\delta}\in\mathcal{D}(F)\cap\mathcal{D}(B)$ to the solution $x^{\dagger}\in\mathcal{D}(F)\cap\mathcal{D}(B)$ as

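$$\|x_{\alpha}^{\delta}-x^{\dagger}\|_{X}=\mathcal{O}\left(\delta^{\frac{\gamma p}{p+a}}\right)\quad\mbox{as}\quad\delta\to 0, \tag{12}$$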

provided that the regularization parameter $\alpha=\alpha(\delta)$ is chosen a priori as

$$\alpha(\delta)\sim\delta^{2-2\gamma\frac{p-1}{p+a}}. \tag{13}$$

We easily see that the convergence results of Proposition 2.1 apply here for $p>1$ and that in particular (13) implies (10). The additional smoothness of $x^{\dagger}$, which is always required to obtain convergence rates in the regularization of ill-posed problems, appears in Hilbert scales in the form $x^{\dagger}=B^{-p}v$ with some source element $v\in X$.
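The claim that the a priori choice (13) satisfies the limit conditions (10) can be verified in two lines. The following check is added for the reader; it writes $\alpha(\delta)=c\,\delta^{e}$ with a hypothetical constant $c>0$ in place of the symbol $\sim$, and uses only $0<\gamma\leq 1$, $a>0$ and $p>1$.

```latex
\begin{align*}
e := 2-2\gamma\,\frac{p-1}{p+a}
  &\;\geq\; 2-2\,\frac{p-1}{p+a} = \frac{2(a+1)}{p+a} > 0
  &&\Longrightarrow\quad \alpha(\delta)=c\,\delta^{e}\to 0, \\
\frac{\delta^{2}}{\alpha(\delta)}
  &= \frac{1}{c}\,\delta^{2\gamma\frac{p-1}{p+a}}\to 0
  &&\text{since } \gamma>0 \text{ and } p>1 .
\end{align*}
```

The same computation shows why the borderline $p=1$ behaves differently: the exponent of $\delta^{2}/\alpha$ is then zero, so the quotient stays constant instead of tending to zero.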

Remark 3.2.

We mention that along the lines of [9, Theorem 2.2] the rate (12) can also be shown under the assumptions of Proposition 3.1 when the regularization parameter $\alpha=\alpha(\delta,y^{\delta})$ is chosen a posteriori by a sequential discrepancy principle.

The modified version of the rate result for case (b) is as follows:

Proposition 3.3.

Under the Assumptions 1 and 2 and for $x^{\dagger}\in X_{1}$ we have the rate of convergence of regularized solutions $x_{\alpha}^{\delta}\in\mathcal{D}(F)\cap\mathcal{D}(B)$ to the solution $x^{\dagger}\in\mathcal{D}(F)\cap\mathcal{D}(B)$ as

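$$\|x_{\alpha}^{\delta}-x^{\dagger}\|_{X}=\mathcal{O}\left(\delta^{\frac{\gamma}{1+a}}\right)\quad\mbox{as}\quad\delta\to 0, \tag{14}$$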

if the regularization parameter $\alpha=\alpha(\delta)$ is chosen a priori as

$$\alpha(\delta)\sim\delta^{2}. \tag{15}$$

Proof 3.4.

By the standard technique of variational regularization under conditional stability estimates (cf. [9, Proof of Theorem 1.1] or [45, Section 4.2.5]) we obtain for the choice (15) of the regularization parameter and by using the conditional stability estimate (11) the inequality

$$\|x_{\alpha}^{\delta}-x^{\dagger}\|_{-a}\leq C\,\delta^{\gamma}, \tag{16}$$

where the constant $C>0$ depends, via $\rho$ and $K(\rho)$, on $\|x^{\dagger}\|_{1}$ and on upper and lower bounds of $\delta^{2}/\alpha$. Combining this with the interpolation inequality (7), taking $t=0$ and $p=1$, and applying the triangle inequality provides us with the estimate

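$$\|x_{\alpha}^{\delta}-x^{\dagger}\|_{X}\leq C\left(\|x_{\alpha}^{\delta}\|_{1}+\|x^{\dagger}\|_{1}\right)^{\frac{a}{1+a}}\delta^{\frac{\gamma}{1+a}}.$$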

Due to (9) the norm $\|x_{\alpha}^{\delta}\|_{1}$ is uniformly bounded by a finite constant for $\alpha(\delta)$ from (15). This yields the rate (14) and completes the proof. Finally, we should note that the inequality (16) can only be established, because constants $K(\rho)>0$ in (11) exist for arbitrarily large $\rho>0$ .

In the borderline case (b) we also have a borderline a priori choice of the regularization parameter, which violates the second limit condition in (10): the quotient $\frac{\delta^{2}}{\alpha}$ is uniformly bounded below by a positive constant and above by a finite constant.

In Assumption 3 we consider alternatively the situation that $Q:=\mathcal{B}^{X}_{r}(x^{\dagger})$ . This model, which is illustrated by Example 1 in Section 4 below, is typical for conditional stability estimates that arise from nonlinearity conditions imposed on the forward operator $F$ in a neighbourhood of the solution $x^{\dagger}$ . In this context, the radius $r>0$ which restricts the validity area of stability estimates can be rather small. In all cases of the Case distinction we have here $x^{\dagger}\in Q\cap\mathcal{D}(F)$ , but only for (a) and (b) also $x^{\dagger}\in\mathcal{D}(F)\cap\mathcal{D}(B)$ .

Assumption 3.

Let for fixed $a>0$ and $0<\gamma\leq 1$ the conditional stability estimate

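$$\|x-x^{\dagger}\|_{-a}\leq K(r)\,\|F(x)-F(x^{\dagger})\|_{Y}^{\gamma}\quad\mbox{for all}\quad x\in\mathcal{B}_{r}^{X}(x^{\dagger})\cap\mathcal{D}(F) \tag{17}$$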

hold, where the constant $K(r)>0$ depends on the largest admissible radius $r>0$ .

Corollary 3.5.

The assertion of Proposition 3.1 remains true if Assumption 2 is replaced with Assumption 3.

Proof 3.6.

To see the validity of Proposition 3.1 under Assumption 3 in case (a) of the Case distinction, where the regularization parameter choice satisfies (10), it is enough to take the assertion of Corollary 2.3 into account. This assertion implies that for sufficiently small $\delta>0$ the regularized solutions $x_{\alpha}^{\delta}$ belong to the ball $\mathcal{B}^{X}_{r}(x^{\dagger})$ for prescribed $r>0$ . Then the conditional stability estimate (17) applies and yields the convergence rate (12) along the lines of the proof of [9, Theorem 2.1].

In case (b), however, for the choice (15) of Proposition 3.3 the condition (10) fails and even if $\delta>0$ is sufficiently small, it cannot be shown that $x_{\alpha}^{\delta}\in\mathcal{B}^{X}_{r}(x^{\dagger})$ for prescribed $r>0$ . Consequently, the conditional stability estimate (17) need not hold for the regularized solutions $x=x_{\alpha}^{\delta}$ and the rate assertion (14) of Proposition 3.3 is only valid under Assumption 3 if constants $K(r)>0$ in (17) exist for arbitrarily large $r>0$ . This is, however, the case in the exponential growth model of Example 1 below.

Now we turn to the cases with oversmoothing penalty, where $x^{\dagger}\notin X_{1}$ and restrict ourselves to $\gamma=1$ in the conditional stability estimates. As is well-known since the paper by Natterer [39], convergence rates in this case require lower and upper estimates of $\|F(x)-F(x^{\dagger})\|_{Y}$ by multiples of the term $\|x-x^{\dagger}\|_{-a}$ . We start with a corresponding analytical result. The goal of the case studies in Section 5 below is to gain further insight into the behavior of regularized solutions in case (c) for a priori and a posteriori choices of the regularization parameter.

Assumption 4.

Let $a>0$ . Moreover, let $x^{\dagger}$ be an interior point of $\mathcal{D}(F)$ such that for the radius $r>0$ we have $\mathcal{B}^{X}_{r}(x^{\dagger})\subset\mathcal{D}(F)$ and the two estimates

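$$\underline{K}\,\|x-x^{\dagger}\|_{-a}\leq\|F(x)-F(x^{\dagger})\|_{Y}\quad\mbox{for all}\quad x\in\mathcal{D}(F)\cap\mathcal{D}(B)=\mathcal{D}(F)\cap X_{1} \tag{18}$$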

and

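$$\|F(x)-F(x^{\dagger})\|_{Y}\leq\overline{K}\,\|x-x^{\dagger}\|_{-a}\quad\mbox{for all}\quad x\in\mathcal{B}_{r}^{X}(x^{\dagger})\cap X_{1} \tag{19}$$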

hold true, where $0<\underline{K}\leq\overline{K}<\infty$ are constants.

Proposition 3.7.

Let $x^{\dagger}\in X_{p}$ for some $0<p<1$ , but $x^{\dagger}\notin X_{1}$ . Under the Assumptions 1 and 4 we then have the rate of convergence of regularized solutions to the exact solution as

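$$\|x_{\alpha_{*}}^{\delta}-x^{\dagger}\|_{X}=\mathcal{O}\left(\delta^{\frac{p}{p+a}}\right)\quad\mbox{as}\quad\delta\to 0, \tag{20}$$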

if the regularization parameter is chosen a priori as

$$\alpha_{*}=\alpha(\delta)=\delta^{2-2\gamma\frac{p-1}{p+a}}. \tag{21}$$

The proof of Proposition 3.7 is given in the appendix along the lines of [25, Theorem 1], where we set for simplicity $\bar{x}=0$. Note that Theorem 1 in [25] refers to a simplified version of the pair of estimates (18) and (19), which are both assumed there to hold for all $x\in\mathcal{D}(F)$. As the proof in the appendix shows, the upper estimate (19) is only exploited by auxiliary elements $x_{\alpha}$, which belong to $\mathcal{B}^{X}_{r}(x^{\dagger})\cap X_{1}$ for sufficiently small $\alpha>0$. On the other hand, there are no arguments for restricting the noisy regularized solutions $x_{\alpha}^{\delta}$ to small balls. Consequently, the lower estimate (18) needs to hold for all elements in $\mathcal{D}(F)\cap X_{1}$. This is an essential drawback for the application of Proposition 3.7 to practical problems. An analogue of Proposition 3.7 for the discrepancy principle as parameter choice rule can be formulated and proven along the lines of the paper [24].

As already mentioned in Section 2, we stress again that, despite the assertion of Proposition 3.7, norm convergence of regularized solutions cannot be shown in general for case (c), not even weak convergence in $X$ can be established. Evidently the parameter choice (21) violates (10) since we have

[TABLE]

It appears that

[TABLE]

tends to be the typical situation in the oversmoothing case (c), at least for regularization parameters yielding optimal convergence rates. Numerical case studies below support this conjecture. A similar behavior of the regularization parameters was noted for oversmoothing $\ell^{1}$ -regularization [13].

To conclude and summarize this section, we stress that in all cases of the Case distinction, we have under the appropriate conditional stability assumption (to show the similarities between the cases, we fix $\gamma=1$ for (a), (b), and (c) for the next assertion) and for $x^{\dagger}\in X_{p}$ for some $p>0$ the convergence rate

[TABLE]

under both the discrepancy principle and the a priori parameter choice

[TABLE]

Hence, we obtain the same parameter choice and the same convergence rate as in the case of a linear operator. Namely in [39], (22) and (23) were obtained for a linear operator $A:X\rightarrow Y$ under a two-sided inequality

[TABLE]

in analogy to the estimates from Assumption 4.
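The interplay between the rate (22), the parameter choice (23) and the quotient $\delta^{2}/\alpha$ can be tabulated with a few lines of code. The following is a minimal sketch, assuming that (22) has the form $\delta^{p/(a+p)}$ and that (23) has the Natterer-type form $\alpha\sim\delta^{2(a+1)/(a+p)}$; both forms are inferred from the surrounding text, and the function names are hypothetical.

```python
def predicted_exponents(p, a=1.0):
    """Return (kappa_x, kappa_alpha, ratio_exponent) for solution smoothness p.

    ASSUMPTION: error rate delta**(p/(a+p)) and a priori choice
    alpha ~ delta**(2*(a+1)/(a+p)), inferred from the text around (22), (23).
    """
    if p <= 0 or a <= 0:
        raise ValueError("p and a must be positive")
    kappa_x = p / (a + p)                    # exponent of the error rate
    kappa_alpha = 2.0 * (a + 1.0) / (a + p)  # exponent of alpha in delta
    ratio_exponent = 2.0 - kappa_alpha       # delta^2/alpha ~ delta**ratio_exponent
    return kappa_x, kappa_alpha, ratio_exponent


def ratio_limit(p, a=1.0, tol=1e-12):
    """Describe the limit of delta^2/alpha as delta -> 0."""
    _, _, ratio_exponent = predicted_exponents(p, a)
    if ratio_exponent < -tol:
        return "infinity"   # oversmoothing penalty, p < 1
    if ratio_exponent > tol:
        return "zero"       # classical setting, p > 1
    return "constant"       # borderline p = 1


if __name__ == "__main__":
    for p in (1.0 / 3.0, 0.5, 1.0, 1.5):
        print(p, predicted_exponents(p), ratio_limit(p))
```

Under these assumptions the exponent of $\delta^{2}/\alpha$ equals $2(p-1)/(a+p)$, so its sign changes exactly at $p=1$.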

4 Examples

In the following, we introduce two nonlinear inverse problems of type (1), for which we will investigate the analytic results from the previous section numerically. Before doing so, we will introduce two similar, but different Hilbert scales used as penalty in the minimization problem (3) and as measure of the solution smoothness. On the one hand, we consider the standard Sobolev scale $H^{p}[0,1]$. For integer values of $p\geq 0$, these function spaces consist of functions whose $p$-th derivative is still in $L^{2}(0,1)$. For real parameters $p>0$, the spaces can be defined by an interpolation argument [1]. Using Fourier analysis, one can define a norm in $H^{p}[0,1]$ via

[TABLE]

where $\hat{x}$ is the Fourier-transform of $x$ . Then $x\in H^{p}[0,1]$ iff $\|x\|_{H^{p}[0,1]}<\infty$ . The Sobolev scale for $p\geq 0$ does not constitute a Hilbert scale in the strict sense, but for each $0<p^{\ast}<\infty$ there is an operator $B:L^{2}(0,1)\to L^{2}(0,1)$ such that $\{X_{p}\}_{0\leq p\leq p^{\ast}}$ is a Hilbert scale [40]. This is not an issue in numerical experiments. Note that the norm (24) is easy to implement, in particular it allows a precise gauging of the solution smoothness.
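To illustrate the remark that the norm (24) is easy to implement, here is a minimal sketch of a discrete analogue. It assumes that (24) has the standard weighted form $\int(1+|\xi|^{2})^{p}|\hat{x}(\xi)|^{2}\,d\xi$, replaces the Fourier transform by a normalized discrete Fourier transform of grid samples, and identifies $\xi$ with the integer frequency index $k$; these discretization choices and the function names are assumptions for illustration only.

```python
import cmath
import math


def fourier_coefficients(samples):
    """Naive O(N^2) DFT normalized by N; returns pairs (k, c_k), k centered at 0."""
    n = len(samples)
    coeffs = []
    for k in range(-(n // 2), (n + 1) // 2):
        c = sum(x * cmath.exp(-2j * math.pi * k * j / n)
                for j, x in enumerate(samples)) / n
        coeffs.append((k, c))
    return coeffs


def sobolev_norm(samples, p):
    """Discrete H^p-type norm: sqrt(sum_k (1 + k^2)^p |c_k|^2).

    ASSUMPTION: frequency xi is identified with the integer index k.
    """
    total = sum((1.0 + k * k) ** p * abs(c) ** 2
                for k, c in fourier_coefficients(samples))
    return math.sqrt(total)


if __name__ == "__main__":
    n = 32
    wave = [math.cos(2.0 * math.pi * 3 * j / n) for j in range(n)]
    for p in (0.0, 0.5, 1.0):
        print(p, sobolev_norm(wave, p))
```

Increasing $p$ puts more weight on high frequencies, which is what allows the solution smoothness to be gauged through the decay of the coefficients.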

The reason why the Sobolev scale does not form a Hilbert scale for arbitrary values of $p$ lies in the boundary values. In order to generate a full Hilbert scale $\{X_{\tau}\}_{\tau\in\mathbb{R}}$ , we exploit the simple integration operator

[TABLE]

of Volterra-type mapping in $X=Y=L^{2}(0,1)$ and set

[TABLE]

By considering the Riemann-Liouville fractional integral operator $J^{p}$ and its adjoint $(J^{*})^{p}=(J^{p})^{*}$ for $0<p\leq 1$ we have that

[TABLE]

cf. [15, 16, 43], and hence by [16, Lemma 8]

[TABLE]

where the fractional Sobolev spaces $H^{p}[0,1]$ occur. One can also show that

[TABLE]

On the other hand, it is well-known that $X_{2}\subset X_{p}\subset X_{1}$ for $1<p<2$ and that $X_{2}$ is characterized by

[TABLE]

in an explicit manner, see for example [35, Beispiel 2.1.5]. We omit discussing higher smoothness spaces since we will not consider those in our examples. In the following we present examples that show and illustrate the occurrence of the cases (a), (b) and (c) of the Case distinction in Section 1. Note that for $0<p<\frac{1}{2}$ the Sobolev-scale $H^{p}[0,1]$ and the Hilbert scale $\{X_{p}\}_{p>0}$ induced by $J$ coincide.

Model problem 1 (Exponential growth model).

The following exponential growth model has been previously discussed in the literature, and we refer for more details and properties to [18, Section 3.1] and [20]. To identify the time dependent growth rate $x(t)\;(0\leq t\leq T)$ of a population we use observations $y(t)\;(0\leq t\leq T)$ of the time-dependent size of the population with initial size $y(0)=y_{0}>0$ , where the O.D.E. initial value problem

[TABLE]

is assumed to hold. For simplicity, let $T:=1$ in the sequel and consider the space setting $X=Y:=L^{2}(0,1)$. Then we simply derive the nonlinear forward operator $F:x\mapsto y$ mapping in the real Hilbert space $L^{2}(0,1)$ as

[TABLE]

with full domain $\mathcal{D}(F)=L^{2}(0,1)$ and with the Fréchet derivative

[TABLE]

It can be shown that there is some constant $\hat{K}>0$ such that for all $x\in X$ the inequality

[TABLE]

is valid. By applying the triangle inequality to (31) we obtain the estimate

[TABLE]

for all $x\in\mathcal{B}^{X}_{r}(x^{\dagger})$ , where the constant $\check{K}(r)>0$ attains the form $\check{K}(r):=r\hat{K}+1$ for arbitrary $r>0$ .
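The forward operator of this model problem and its derivative can be sketched on a uniform grid as follows. The sketch assumes the closed form $[F(x)](t)=y_{0}\exp\bigl(\int_{0}^{t}x(\tau)\,d\tau\bigr)$ that solves the initial value problem $y'=x\,y$, $y(0)=y_{0}$, and the derivative $[F'(x)h](t)=[F(x)](t)\int_{0}^{t}h(\tau)\,d\tau$; the trapezoidal rule mirrors the discretization mentioned in Section 5, while the function names are hypothetical.

```python
import math


def cumulative_trapezoid(values, step):
    """Cumulative integral from 0 to each grid point by the trapezoidal rule."""
    out = [0.0]
    for left, right in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * step * (left + right))
    return out


def forward_growth(x, y0=1.0):
    """ASSUMED form of F: [F(x)](t_i) = y0 * exp(int_0^{t_i} x) on [0, 1]."""
    step = 1.0 / (len(x) - 1)
    return [y0 * math.exp(v) for v in cumulative_trapezoid(x, step)]


def frechet_growth(x, direction, y0=1.0):
    """ASSUMED form of F'(x)h: pointwise product of F(x) and int_0^t h."""
    step = 1.0 / (len(x) - 1)
    fx = forward_growth(x, y0)
    jh = cumulative_trapezoid(direction, step)
    return [f * j for f, j in zip(fx, jh)]


if __name__ == "__main__":
    n = 201
    grid = [i / (n - 1) for i in range(n)]
    x = [1.0] * n                       # constant growth rate
    print(forward_growth(x)[-1])        # close to e, since y(t) = exp(t)
    print(frechet_growth(x, grid)[-1])  # close to e/2, since int_0^1 t dt = 1/2
```

The derivative is the integration operator $J$ followed by multiplication with $F(x)$, which is the structure used in the estimate leading to (33).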

Using the Hilbert scale generated by the operator $J$ from (25), taking into account that $\|Jh\|_{Y}=\|(J^{*}J)^{1/2}h\|_{X}=\|B^{-1}h\|_{X}=\|h\|_{-1}$ for all $h\in X$ , and that there is some $0<\underline{c}<\infty$ such that $\underline{c}\leq[F(x^{\dagger})](t)\;(0\leq t\leq 1)$ for the multiplier function in $F^{\prime}(x^{\dagger})$ , there is a constant $0<c_{0}<\infty$ satisfying

[TABLE]

This implies by formula (32) the estimate

[TABLE]

This estimate is of the form (17) with $a:=1$ and $K(r):=\frac{r\hat{K}+1}{c_{0}}$ . But it is specific for this example that there exist constants $K(r)>0$ for arbitrarily large radii $r>0$ such that (33) is valid.

The case (a) of the Case distinction in Section 1 occurs due to formula (28) if the solution is sufficiently smooth, i.e. $x^{\dagger}\in H^{p}[0,1]$ for some $p>1$ and, for the Hilbert scale induced by $J$ , it fulfills the necessary boundary conditions. Case (b) will be the subject of Model problem 3 below. The oversmoothing case (c) of the Case distinction occurs either if the solution is insufficiently smooth, i.e. $x^{\dagger}\in H^{p}[0,1]$ for $0<p<1$ , or in case of the Hilbert scale induced by $J$ , one might have $x^{\dagger}\in H^{p}[0,1]$ for $p\geq 1$ but the boundary condition $x^{\dagger}(1)=0$ fails. Due to formula (27) we then have $x^{\dagger}\in X_{p}$ for all $p<1/2$ , but $x^{\dagger}\notin X_{p}$ for all $p>1/2$ and consequently also $x^{\dagger}\notin X_{1}$ . This is, for example, the case for the constant function $x^{\dagger}(t)=1\;(0\leq t\leq 1)$ . We complete this example with the remark that due to formula (29) a function $x^{\dagger}\in H^{2}[0,1]$ , like the function $x^{\dagger}(t)=-(t-0.5)^{2}+0.25$ used in the case studies below and satisfying $x^{\dagger}(1)=0$ , does not belong to $X_{2}$ whenever its first derivative at $t=0$ does not vanish.
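For the quadratic reference function mentioned above, the two boundary quantities can be checked by direct computation; this is only an arithmetic illustration of the statement in the text.

```latex
x^{\dagger}(t) = -(t-0.5)^{2} + 0.25
\;\Longrightarrow\;
x^{\dagger}(1) = -0.25 + 0.25 = 0,
\qquad
(x^{\dagger})'(t) = -2\,(t-0.5),
\qquad
(x^{\dagger})'(0) = 1 \neq 0 .
```

Hence the boundary condition at $t=1$ is satisfied, while the derivative at $t=0$ does not vanish.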

Model problem 2 (Autoconvolution).

As a second problem, we consider under the same space setting $X=Y:=L^{2}(0,1)$ the autoconvolution operator on the unit interval defined as

[TABLE]

with full domain $\mathcal{D}(F)=L^{2}(0,1)$ . This operator and the associated nonlinear operator equation (1) with applications in statistics and physics have been discussed early in the literature of inverse problems (cf. [14]). Due to extensions in laser optics, the deautoconvolution problem was comprehensively revisited recently (see, e.g., [5] and [12]). Even though $F$ from (34) is a non-compact operator, we have for all $x\in X$ a compact Fréchet derivative

[TABLE]
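A discrete version of the autoconvolution operator can be sketched as follows, assuming that (34) has the usual form $[F(x)](s)=\int_{0}^{s}x(s-t)\,x(t)\,dt$ for $0\leq s\leq 1$ and using the trapezoidal rule on a uniform grid; the function name is hypothetical.

```python
def autoconvolution(x):
    """ASSUMED form of (34): [F(x)](s_i) = int_0^{s_i} x(s_i - t) x(t) dt.

    x holds samples on a uniform grid over [0, 1]; the integral is
    approximated by the composite trapezoidal rule.
    """
    n = len(x)
    step = 1.0 / (n - 1)
    result = [0.0]  # the integral over an empty interval vanishes
    for i in range(1, n):
        interior = sum(x[i - j] * x[j] for j in range(1, i))
        endpoints = 0.5 * (x[i] * x[0] + x[0] * x[i])
        result.append(step * (interior + endpoints))
    return result


if __name__ == "__main__":
    n = 201
    ones = [1.0] * n
    y = autoconvolution(ones)
    print(y[0], y[n // 2], y[-1])  # approximately 0, 0.5 and 1: F(1)(s) = s
```

For the constant function $x\equiv 1$ the integrand is constant, so the quadrature reproduces $[F(x)](s)=s$ up to rounding.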

Taking the Hilbert scale $\{X_{\tau}\}_{\tau\in\mathbb{R}}$ based on the operator $B$ from (26) and the integral operator $J$ from (25), we see for the specific solution

[TABLE]

that

[TABLE]

Unfortunately no estimate of the form (32) is available, because such estimates with $F$ -differences on the right-hand side are not known for the autoconvolution operator. However, as a condition characterizing the nonlinearity of $F$ the inequality

[TABLE]

is valid. Thus we have for all $x\in X$ and $x^{\dagger}$ from (35), by using the triangle inequality,

[TABLE]

Using the interpolation inequality (7) in the form

[TABLE]

we derive for $x-x^{\dagger}\in X_{1}$ the inequality

[TABLE]

and, if moreover $\|x-x^{\dagger}\|_{1}\leq\kappa<2$ , even the conditional stability estimate

[TABLE]

The estimate (36) can only unfold a stabilizing effect if approximate solutions $x$ are such that $x-x^{\dagger}\in\mathcal{B}^{X_{1}}_{\kappa}(0)$ for some $\kappa<2$ . For $x^{\dagger}$ from (35) with $x^{\dagger}(1)=1\not=0$ we have $x^{\dagger}\notin X_{1}$ , but regularized solutions $x=x_{\alpha}^{\delta}$ solving the extremal problem (3) have by definition the property $x_{\alpha}^{\delta}\in X_{1}$ , which implies that $x_{\alpha}^{\delta}-x^{\dagger}\notin X_{1}$ . This is a pitfall, because convergence assertions for $x_{\alpha}^{\delta}$ as $\delta\to 0$ are missing in case (c) and thus the behaviour of $x_{\alpha}^{\delta}$ remains completely unclear.

Model problem 3 (Situation of $x^{\dagger}$ meeting case (b)).

It is not straightforward to construct an example for case (b) of the Case distinction. We base our construction on the observation that the series $\sum\limits_{n=2}^{\infty}\frac{1}{n(\log n)^{2}}$ is convergent, i.e. it characterizes a finite value, whereas the series $\sum\limits_{n=2}^{\infty}\frac{n^{\varepsilon}}{n(\log n)^{2}}$ is divergent for all $\varepsilon>0$, i.e. we have $\sum\limits_{n=2}^{\infty}\frac{n^{\varepsilon}}{n(\log n)^{2}}=\infty$. In order to be able to use the model operators and the Hilbert scale introduced before, we use the following integral formulation.

Lemma 4.1.

The improper integral $\int_{2}^{\infty}\frac{1}{x^{\eta}\log^{2}(x)}\,dx$ converges for $\eta\geq 1$ and diverges for $\eta<1$ .

Proof 4.2.

It is

[TABLE]

$C\in\mathbb{R}$ , where $E(z):=\int_{z}^{\infty}\frac{e^{-t}}{t}\,dt$ . The claim follows since

[TABLE]
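As an elementary cross-check of Lemma 4.1, which does not follow the route via $E(z)$ taken in the proof, the borderline case $\eta=1$ can be integrated explicitly, and the remaining cases follow by comparison:

```latex
\int_{2}^{\infty}\frac{dx}{x\,\log^{2}(x)}
  \overset{u=\log x}{=} \int_{\log 2}^{\infty}\frac{du}{u^{2}}
  = \frac{1}{\log 2} < \infty ,
\qquad
\frac{1}{x^{\eta}\log^{2}(x)} \le \frac{x^{-\eta}}{\log^{2}(2)}
  \quad (\eta>1,\ x\ge 2),
\qquad
\frac{1}{x^{\eta}\log^{2}(x)} \ge \frac{1}{x}
  \quad (\eta<1,\ x \text{ sufficiently large}).
```

The second bound gives convergence for $\eta>1$ since $\int_{2}^{\infty}x^{-\eta}\,dx<\infty$, and the third gives divergence for $\eta<1$ since $x^{1-\eta}\geq\log^{2}(x)$ for large $x$ and $\int_{2}^{\infty}\frac{dx}{x}=\infty$.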

Hence, we construct exact solutions $x^{\dagger}$ via their Fourier transform

[TABLE]

to obtain, after an inverse Fourier transform, solutions $x^{\dagger}\in H^{p}[0,1]$ with $x^{\dagger}\notin H^{p+\epsilon}[0,1]$ for arbitrarily small $\epsilon>0$. Namely, for this $x^{\dagger}$, the $H^{p}$-norm in Fourier domain (24) reads

[TABLE]

Since we have little control over the boundary values through this approach, we will only consider the Hilbert scale induced by (26) for $0<p<\frac{1}{2}$ .

5 Case studies

In this section we provide numerical evidence for the behavior of regularized solutions $x_{\alpha}^{\delta}$ with respect to the Case distinction from Section 1 and the Model problems from Section 4.

5.1 Numerical studies for Model Problem 1

We consider the forward operator $F$ from (30) in the setting $X=Y=L^{2}(0,1)$ , $\mathcal{D}(F)=X$ . As was shown, a conditional stability estimate of the form (17) is valid there with $a=1$ and $\gamma=1$ (cf. formula (33)). It must be emphasized that Assumption 3 applies even in an extended manner, which means that there are finite constants $K(r)>0$ for arbitrarily large radii $r>0$ such that (33) is valid.

In our first set of experiments, we will investigate the interplay between the value $p\in(0,1)$, $\alpha$-rates of the regularization parameter and error rates of regularized solutions $x_{\alpha}^{\delta}$ using several test cases. To this end, we consider five reference solutions as shown in Figure 1. Of these examples, only RS5 fulfills the boundary condition $x(1)=0$, hence RS1–RS4 can only be elements of $X_{p}$ for $0<p<\frac{1}{2}$. Since for RS5 $x^{\prime}(0)\neq 0$, we have in this case $x^{\dagger}\in H^{p}[0,1]$ for $p\leq\frac{3}{2}$.

To confirm our theoretical findings of Section 3, we solve (3) using, after discretization via the trapezoidal rule for the integral, the MATLAB®-function fmincon. Typically, we use a discretization level $N=200$ . To the simulated data $y=F(x^{\dagger})$ we add random noise for which we prescribe the relative error $\bar{\delta}$ such that $\|y-y^{\delta}\|=\bar{\delta}\|y\|$ , i.e., we have (2) with $\delta=\bar{\delta}\|y\|$ . To obtain the $X_{1}$ norm in the penalty, we set $\|\cdot\|_{1}=\|\cdot\|_{H^{1}[0,1]}$ and additionally force the boundary condition $x(1)=0$ . The regularization parameter $\alpha$ is chosen as $\alpha_{\scriptscriptstyle DP}=\alpha(\delta,y^{\delta})$ using the discrepancy principle, i.e.,

[TABLE]

with some prescribed multiplier $C>1$. Unless otherwise noted, $C=1.1$ was used. From Section 3, we know that this should yield an $\alpha$-rate similar to the a priori choice

[TABLE]

cf. formula (23), that has already been used by Natterer in [39] for linear problems in the case of oversmoothing penalties. We should also be able to observe the order optimal convergence rate

[TABLE]
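The noise model and the discrepancy principle described above can be sketched on a toy problem. The sketch below is not the nonlinear setting of this section and does not use fmincon: it swaps in a diagonal linear operator, for which the Tikhonov minimizer is available in closed form, and it assumes that (38) has the two-sided form $\delta\leq\|F(x_{\alpha}^{\delta})-y^{\delta}\|\leq C\delta$. All names are hypothetical.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def add_relative_noise(y, rel_level, rng):
    """Return y_delta with ||y - y_delta|| = rel_level * ||y||."""
    noise = [rng.gauss(0.0, 1.0) for _ in y]
    scale = rel_level * norm(y) / norm(noise)
    return [yi + scale * ni for yi, ni in zip(y, noise)]


def tikhonov_diagonal(sigma, weight, y_delta, alpha):
    """Minimizer of sum (sigma_k x_k - y_k)^2 + alpha * sum (weight_k x_k)^2."""
    return [s * y / (s * s + alpha * w * w)
            for s, w, y in zip(sigma, weight, y_delta)]


def discrepancy_alpha(sigma, weight, y_delta, delta, c=1.1, iters=200):
    """Bisection in log(alpha) until delta <= residual <= c * delta.

    ASSUMPTION: two-sided discrepancy principle; the residual of this toy
    problem is continuous and increasing in alpha, so bisection applies.
    """
    lo, hi = 1e-16, 1e16
    for _ in range(iters):
        alpha = math.sqrt(lo * hi)
        x = tikhonov_diagonal(sigma, weight, y_delta, alpha)
        res = norm([s * xi - y for s, xi, y in zip(sigma, x, y_delta)])
        if res < delta:
            lo = alpha      # residual too small: increase alpha
        elif res > c * delta:
            hi = alpha      # residual too large: decrease alpha
        else:
            return alpha, res
    raise RuntimeError("no alpha met the discrepancy bounds")


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = [1.0 / k for k in range(1, 51)]     # toy smoothing operator
    weight = [float(k) for k in range(1, 51)]   # toy penalty weights
    y = [s / k for s, k in zip(sigma, range(1, 51))]
    y_delta = add_relative_noise(y, 0.01, rng)
    print(discrepancy_alpha(sigma, weight, y_delta, 0.01 * norm(y)))
```

Repeating the last step over a sequence of noise levels yields pairs $(\delta,\alpha_{DP})$ of the kind evaluated in the regressions of this section.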

Since $x^{\dagger}$ is known, we can compute the regularization errors $\|x_{\alpha}^{\delta}-x^{\dagger}\|_{X}$ . We interpret this as a function of $\delta$ and make a regression for the model function

[TABLE]

and similarly we estimate the function behind the regularization parameter through the ansatz

[TABLE]

Comparing (39) and the predicted rate (22), we have $\kappa_{x}=\frac{p}{a+p}$, hence we can estimate the smoothness of the solution as $p=\frac{a\kappa_{x}}{1-\kappa_{x}}$. Recall that $a=1$ in this example. From this estimated $p$, we can calculate the a priori parameter choice (23) and compare it to the measured one. Results on regularized solutions $x_{\alpha_{DP}}^{\delta}$ with the discrepancy principle for all five reference solutions are summarized in Table 1.

As discussed before, the reference solutions RS1, RS2, RS3 and RS4 all belong to case (c) of the Case distinction whereas RS5 belongs to case (a). Our computed results are consistent with this classification. For RS1–RS4 we obtain an estimated $p<1$ and $\kappa_{\alpha}>2$, i.e., $\delta^{2}/\alpha\rightarrow\infty$ as $\delta\to 0$. As expected we have $0<\kappa_{\alpha}<2$ for RS5 ($p>1$) together with $\frac{\delta^{2}}{\alpha_{DP}}\rightarrow 0$ as $\delta\to 0$. For RS1 we know that $p$ is bounded above by 0.5 and the estimated value 0.487 fits well. In particular in the oversmoothing cases, we have an excellent fit between the $\alpha$-rates from the discrepancy principle and the a priori choice based on our estimate of $p$.
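The regression step can be sketched as an ordinary least-squares fit in log-log coordinates. The sketch assumes that the model functions (39) and (40) are pure power laws, $c_{x}\delta^{\kappa_{x}}$ and $c_{\alpha}\delta^{\kappa_{\alpha}}$; the function names and the synthetic data are hypothetical and serve only to illustrate the computation of $p$ from $\kappa_{x}$.

```python
import math


def fit_power_law(deltas, values):
    """Least-squares fit of log(values) = log(c) + kappa * log(deltas)."""
    xs = [math.log(d) for d in deltas]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    kappa = sxy / sxx
    c = math.exp(mean_y - kappa * mean_x)
    return c, kappa


def smoothness_from_rate(kappa_x, a=1.0):
    """Invert kappa_x = p / (a + p) for p."""
    if not 0.0 < kappa_x < 1.0:
        raise ValueError("kappa_x must lie strictly between 0 and 1")
    return a * kappa_x / (1.0 - kappa_x)


if __name__ == "__main__":
    # Synthetic errors following an exact power law, for illustration only.
    deltas = [10.0 ** (-k) for k in range(1, 7)]
    errors = [2.0 * d ** 0.25 for d in deltas]
    c_x, kappa_x = fit_power_law(deltas, errors)
    print(c_x, kappa_x, smoothness_from_rate(kappa_x))
```

With measured errors in place of the synthetic ones, the same two calls give the estimated $\kappa_{x}$ and the corresponding smoothness index $p$.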

As a second scenario for this model problem, we use the Sobolev scale $H^{p}[0,1]$, $0<p<\frac{1}{2}$, to investigate a particular case of (c) in our Case distinction. Using the Fourier transform, we construct our solutions $x^{\dagger}$ such that $\hat{x}^{\dagger}(\xi):=(1+|\xi|^{2})^{-\frac{p}{2}-\frac{1}{4}}$, which yields solutions $x^{\dagger}\in H^{p-\epsilon}[0,1]$ for all $\epsilon>0$, but $x^{\dagger}\notin H^{p}[0,1]$. This follows from the definition of the $H^{p}$-norm (24) in Fourier domain, because $\int_{\mathbb{R}}(1+|\xi|^{2})^{\nu}d\xi$ converges for $\nu<-1/2$ (but diverges for $\nu\geq-1/2$). We take $p=\frac{1}{3}$ and in principle repeat the previous experiments with the new solutions. The main difference is that we now minimize a Tikhonov functional with variable penalty smoothness,

[TABLE]

From Section 2 we would expect $\delta^{2}/\alpha\rightarrow 0$ for $s<p=\frac{1}{3}$ , $\delta^{2}/\alpha\approx\mathrm{const}$ for $s=p=\frac{1}{3}$ , and $\delta^{2}/\alpha\rightarrow\infty$ for $s>p=\frac{1}{3}$ . The numerical results confirm this behavior, see Figure 2 for a plot and Table 2 for the regression results along (39) and (40). Note that in particular the exponent in the convergence rate $\kappa_{x}$ remains approximately constant as predicted by the theory.
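The threshold $s=p$ can be read off directly from the construction of $x^{\dagger}$. Assuming that the norm (24) has the standard weighted form in Fourier domain and reading the prescribed transform as $\hat{x}^{\dagger}(\xi)=(1+|\xi|^{2})^{-\frac{p}{2}-\frac{1}{4}}$, one obtains

```latex
\|x^{\dagger}\|_{H^{s}[0,1]}^{2}
  = \int_{\mathbb{R}} (1+|\xi|^{2})^{s}\,|\hat{x}^{\dagger}(\xi)|^{2}\,d\xi
  = \int_{\mathbb{R}} (1+|\xi|^{2})^{\,s-p-\frac{1}{2}}\,d\xi ,
\qquad
\|x^{\dagger}\|_{H^{s}[0,1]} < \infty
\;\Longleftrightarrow\;
s-p-\tfrac{1}{2} < -\tfrac{1}{2}
\;\Longleftrightarrow\;
s<p .
```

So the penalty is finite at $x^{\dagger}$ exactly for $s<p$, and it is oversmoothing for $s\geq p$.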

Note that the “bumpy” structure in Figure 2 and related plots below is due to the discrepancy principle, as exemplified in Figure 3 for $s=0.9$.

5.2 Numerical studies for Model Problem 2

We now turn to the autoconvolution operator $F$ from (34). In this context, we consider only the specific solution $x^{\dagger}(t)=1\;(0\leq t\leq 1)$, where $x^{\dagger}\notin X_{1}$, and the minimization problem (3). It must be emphasized that here, in contrast to Model problem 1, Assumption 3 does not apply for any radius $r>0$. It is therefore completely unclear which behavior the regularized solutions $x_{\alpha}^{\delta}$ show when the noise level $\delta$ tends to zero. It is a pitfall for exploiting Tikhonov regularization to get stable approximations for $x^{\dagger}$ when the regularized solutions $x_{\alpha}^{\delta}$ from case (c), i.e., $0<p<1$, do not meet the validity area of the conditional stability estimate even if $\delta>0$ is sufficiently small. Then, due to $x_{\alpha}^{\delta}\notin Q$, estimates of type (5) are useless, convergence $x_{\alpha}^{\delta}\to x^{\dagger}$ as $\delta\to 0$ cannot be ensured, and the behavior of the regularized solutions remains unclear. Such a situation occurs, as shown before, in Model problem 2 with $Q=\mathcal{B}^{X_{1}}_{\kappa}(0)$ and $x^{\dagger}\notin Q$ for $x^{\dagger}\equiv 1$. The following numerical case studies for that situation illuminate the properties of $x_{\alpha}^{\delta}$. For the test computations, again the discrepancy principle $\alpha_{DP}=\alpha(\delta,y^{\delta})$ according to (38) has been used with $C=1.3$ and a discretization of $N=200$ grid points over the interval $[0,1]$. In particular, we demonstrate that $\|x_{\alpha_{DP}}^{\delta}-x^{{\dagger}}\|_{X}$ does not tend to zero for $\delta\rightarrow 0$.

Figure 4 shows the regularization error $\|x_{\alpha_{DP}}^{\delta}-x^{{\dagger}}\|_{X}$ depending on the relative noise level $\bar{\delta}$. It can be seen that $\|x_{\alpha_{DP}}^{\delta}-x^{{\dagger}}\|_{X}$ decreases for decreasing noise levels $\bar{\delta}$ whenever $\bar{\delta}\geq 2.9\cdot 10^{-4}$. If $\bar{\delta}$ falls below the value $2.9\cdot 10^{-4}$, then the monotonicity turns around and $\|x_{\alpha_{DP}}^{\delta}-x^{{\dagger}}\|_{X}$ begins to grow. As illustrated in the overview of Figure 5, the regularized solutions tend to oscillate for small $\delta>0$, especially near the left and right boundaries of the interval $[0,1]$ in the sense of the Gibbs phenomenon. The Gibbs phenomenon at the right boundary $t=1$ accompanies the required jump from one to zero between $x^{\dagger}\notin X_{1}$ and $x_{\alpha}^{\delta}\in X_{1}$. The oscillations blow up for small values of $\delta$ (Figure 5 (c)–(f)) and indicate non-convergence of $x_{\alpha}^{\delta}$ for $\delta\to 0$. Note that the Gibbs phenomenon starts to appear around the minimum of $\delta^{2}/\alpha$, compare to Figure 4.

To confirm that this phenomenon is inherent to the oversmoothing situation, we consider again the Tikhonov functional (41) with $H^{s}$ -penalty for $x^{\dagger}\equiv 1$ , $s=0.1$ and $s=0.5$ respectively. As $x^{\dagger}\equiv 1\in X_{p}$ for $0<p<1/2$ we expect similar asymptotic behavior of $\frac{\delta^{2}}{\alpha}$ for $\delta\rightarrow 0$ as at the end of Section 5.1. Figure 6 shows the result.

5.3 Numerical studies for Model Problem 3

Based on the Case distinction in Section 1 we now study the convergence rates and properties of $\frac{\delta^{2}}{\alpha}$ as $\delta$ decays to zero for Model Problem 3 in case (b) of the Case distinction. Using the Sobolev scale with norm (24) we define $x^{\dagger}\in X_{p}$, but $x^{\dagger}\notin X_{p+\epsilon}$, via (37). For given $x^{\dagger}\in X_{p}$ in the above sense, we then turn to the Tikhonov functional (41) with penalty in $H^{s}[0,1]$. Again we choose $p=\frac{1}{3}$, which means $x^{\dagger}\in X_{1/3}$, such that we can employ the theory from Model Problem 1. For $s<p$ we are in the classical setting and therefore expect $\frac{\delta^{2}}{\alpha}\rightarrow 0$ as $\delta\rightarrow 0$; for $s>p$ we are in an oversmoothing situation and expect that $\frac{\delta^{2}}{\alpha}\rightarrow\infty$. Letting $s=p$ yields precisely case (b) of the Case distinction, and $\frac{\delta^{2}}{\alpha}$ should remain approximately constant. The numerical results, see Figure 7 and Table 3, confirm this. Note that, since $a=1$ and $p=\frac{1}{3}$, we expect and obtain $\kappa_{x}=0.25$. We also see that $\kappa_{\alpha}<2$ for $s=0.1$, $\kappa_{\alpha}>2$ for $s=0.9$, i.e. in the oversmoothing situation, and $\kappa_{\alpha}\approx 2$, and therefore $\frac{\delta^{2}}{\alpha}$ approximately constant, for the situation where $x^{\dagger}$ and penalty term are of the same smoothness.

Appendix: Proof of Proposition 3.7

In this proof we set $E:=\|x^{\dagger}\|_{p}$ . To prove the convergence rate result (20) under the a priori parameter choice (21) it is sufficient to show that for sufficiently small $\delta>0$ there are two constants $K>0$ and $\tilde{E}>0$ such that the inequalities

[TABLE]

and

[TABLE]

hold. Namely, the convergence rate (20) follows directly from inequality chain

[TABLE]

which is valid for sufficiently small $\delta>0$ as a consequence of (42), (43) and of the interpolation inequality for the Hilbert scale $\{X_{\tau}\}_{\tau\in\mathbb{R}}$ .
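For orientation, the interpolation step can be written out as follows, assuming that (42) bounds the error in the weak norm by $\|x_{\alpha_{\ast}}^{\delta}-x^{\dagger}\|_{-a}\leq K\delta$ and that (43) bounds it in the strong norm by $\|x_{\alpha_{\ast}}^{\delta}-x^{\dagger}\|_{p}\leq\tilde{E}$; these forms are inferred from the roles of $K$ and $\tilde{E}$ in the text.

```latex
\|x_{\alpha_{\ast}}^{\delta}-x^{\dagger}\|_{X}
 \;\le\; \|x_{\alpha_{\ast}}^{\delta}-x^{\dagger}\|_{-a}^{\frac{p}{a+p}}\,
         \|x_{\alpha_{\ast}}^{\delta}-x^{\dagger}\|_{p}^{\frac{a}{a+p}}
 \;\le\; (K\delta)^{\frac{p}{a+p}}\,\tilde{E}^{\frac{a}{a+p}}
 \;=\; K^{\frac{p}{a+p}}\,\tilde{E}^{\frac{a}{a+p}}\;\delta^{\frac{p}{a+p}} .
```

The exponents arise from writing the index $0$ of $X=X_{0}$ as the convex combination $0=\frac{p}{a+p}\,(-a)+\frac{a}{a+p}\,p$ of the indices $-a$ and $p$.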

As an essential tool for the proof we use auxiliary elements $x_{\alpha}$ , which are for all $\alpha>0$ the uniquely determined minimizers over all $x\in X$ of the artificial Tikhonov functional

[TABLE]

Note that the elements $x_{\alpha}$ are independent of the noise level $\delta>0$ and belong by definition to $X_{1}$ , which is in strong contrast to $x^{\dagger}\notin X_{1}$ .

The following lemma is an immediate consequence of [24, Prop. 2], see also [25, Prop. 3].

Lemma 5.1.

Let $\|x^{\dagger}\|_{p}=E$ and $x_{\alpha}$ be the minimizer of the functional $T_{-a,\alpha}$ from (44) over all $x\in X$ . Given $\alpha_{\ast}=\alpha_{\ast}(\delta)>0$ as defined by formula (21) the resulting element $x_{\alpha_{\ast}}$ obeys the bounds

[TABLE]

and

[TABLE]

Due to (45) we have $\|x_{\alpha_{\ast}}-x^{\dagger}\|_{X}\to 0$ as $\delta\to 0$ . Hence by Assumption 4, in particular because $x^{\dagger}$ is an interior point of $\mathcal{D}(F)$ , for sufficiently small $\delta>0$ the element $x_{\alpha_{\ast}}$ belongs to $\mathcal{B}^{X}_{r}(x^{\dagger})\subset\mathcal{D}(F)$ and moreover with $x_{\alpha_{\ast}}\in X_{1}$ the inequality (19) applies for $x=x_{\alpha_{\ast}}$ and such small $\delta$ .

Instead of the inequality (8), which is missing in case of oversmoothing penalties, we can use here the inequality

[TABLE]

as minimizing property for the Tikhonov functional. Using (48) it is enough to bound $T^{\delta}_{\alpha_{\ast}}(x_{\alpha_{\ast}})$ by $\overline{C}^{2}\delta^{2}$ with

[TABLE]

in order to obtain the estimates

[TABLE]

Since the inequality (19) applies for $x=x_{\alpha_{\ast}}$ and sufficiently small $\delta>0$ , we can estimate for such $\delta$ as follows:

[TABLE]

This ensures the estimates (50) and (51). Based on this, we now show that the inequality (42) is valid for some $K>0$. Here, we use the inequality (18) of Assumption 4, which applies for $x=x_{\alpha_{\ast}}^{\delta}$, and we find

[TABLE]

where $\overline{C}$ is the constant from (49) and we derive $K:=\frac{1}{\underline{K}}\left(\overline{C}+1\right)$ .
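The value of $K$ can be traced in one line, assuming that (18) is the lower estimate $\underline{K}\,\|x-x^{\dagger}\|_{-a}\leq\|F(x)-F(x^{\dagger})\|_{Y}$, that (50) reads $\|F(x_{\alpha_{\ast}}^{\delta})-y^{\delta}\|_{Y}\leq\overline{C}\delta$, and that $y=F(x^{\dagger})$ with $\|y-y^{\delta}\|_{Y}\leq\delta$:

```latex
\underline{K}\,\|x_{\alpha_{\ast}}^{\delta}-x^{\dagger}\|_{-a}
 \;\le\; \|F(x_{\alpha_{\ast}}^{\delta})-F(x^{\dagger})\|_{Y}
 \;\le\; \|F(x_{\alpha_{\ast}}^{\delta})-y^{\delta}\|_{Y} + \|y^{\delta}-y\|_{Y}
 \;\le\; \overline{C}\,\delta + \delta ,
\qquad\text{hence}\qquad
\|x_{\alpha_{\ast}}^{\delta}-x^{\dagger}\|_{-a}
 \;\le\; \frac{\overline{C}+1}{\underline{K}}\;\delta .
```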

Secondly, we still have to show the existence of a constant $\tilde{E}>0$ such that the inequality (43) holds. By using the triangle inequality in combination with (51) and (47) we find that

[TABLE]

Again, we use the interpolation inequality and can estimate further as

[TABLE]

Finally, we have now

[TABLE]

This shows (43) and thus completes the proof of Proposition 3.7. ∎

Acknowledgment

We thank the colleagues Volker Michel and Robert Plato from the University of Siegen for a hint to the series that allowed us to formulate Model problem 3. The research was financially supported by Deutsche Forschungsgemeinschaft (DFG-grant HO 1454/12-1).

Bibliography


[1] R. A. Adams and J. F. J. Fournier. Sobolev spaces. Elsevier/Academic Press, Amsterdam, 2003.
[2] S. W. Anzengruber, B. Hofmann, and P. Mathé. Regularization properties of the sequential discrepancy principle for Tikhonov regularization in Banach spaces. Appl. Anal., 93(7):1382–1400, 2014.
[3] S. W. Anzengruber and R. Ramlau. Morozov’s discrepancy principle for Tikhonov-type functionals with nonlinear operators. Inverse Problems, 26(2):025001 (17pp), 2010.
[4] R. I. Boţ and B. Hofmann. An extension of the variational inequality approach for obtaining convergence rates in regularization of nonlinear ill-posed problems. Journal of Integral Equations and Applications, 22(3):369–392, 2010.
[5] S. Bürger and B. Hofmann. About a deficit in low order convergence rates on the example of autoconvolution. Applicable Analysis, 94:477–493, 2015.
[6] M. Burger, J. Flemming, and B. Hofmann. Convergence rates in $\ell^{1}$-regularization if the sparsity assumption fails. Inverse Problems, 29:025013 (16pp), 2013.
[7] J. Cheng, B. Hofmann, and S. Lu. The index function and Tikhonov regularization for ill-posed problems. J. Comput. Appl. Math., 265:110–119, 2014.
[8] J. Cheng and M. Yamamoto. On new strategy for a priori choice of regularizing parameters in Tikhonov’s regularization. Inverse Problems, 16:L31–L38, 2000.
