Local convergence analysis of the Gauss-Newton-Kurchatov method

Ioannis K. Argyros; Stepan Shakhno

arXiv:1906.03505·math.NA·September 23, 2024

TL;DR

This paper analyzes the local convergence of the Gauss-Newton-Kurchatov method for nonlinear least squares problems, improving convergence region and accuracy over previous results through refined estimates and weaker hypotheses.

Contribution

It provides an enhanced convergence analysis of the Gauss-Newton-Kurchatov method, extending the convergence region and improving solution accuracy under weaker assumptions.

Findings

1. Extended convergence region compared to previous results
2. Finer error estimates and solution localization
3. Numerical examples confirm theoretical improvements

Equations

$$\min_{x\in\mathbb{R}^{n}}\frac{1}{2}F(x)^{\top}F(x),$$

$$\min_{x\in\mathbb{R}^{n}}\frac{1}{2}(F(x)+G(x))^{\top}(F(x)+G(x)),$$

$$\begin{array}{l}x_{n+1}=x_{n}-(A_{n}^{\top}A_{n})^{-1}A_{n}^{\top}(F(x_{n})+G(x_{n})),\\ A_{n}=F^{\prime}(x_{n})+G(2x_{n}-x_{n-1},x_{n-1}),\quad n=0,1,\ldots,\end{array}$$

$$\begin{array}{l}x_{n+1}=x_{n}-A_{n}^{-1}(F(x_{n})+G(x_{n})),\\ A_{n}=F^{\prime}(x_{n})+G(2x_{n}-x_{n-1},x_{n-1}),\quad n=0,1,\ldots.\end{array}$$

$$\begin{array}{l}x_{n+1}=x_{n}-A_{n}^{-1}(F(x_{n})+G(x_{n})),\\ A_{n}=F^{\prime}(x_{n})+G(x_{n},x_{n-1}),\quad n=0,1,\ldots.\end{array}$$

$$\|F^{\prime}(x)-F^{\prime}(x^{*})\|\leq L_{0}\|x-x^{*}\|,$$

$$\|G(x,y)-G(u,v)\|\leq M_{0}(\|x-u\|+\|y-v\|),$$

$$\|G(u,x,y)-G(v,x,y)\|\leq N_{0}\|u-v\|.$$

$$h(t)=B\,[2\alpha+(L_{0}+2M_{0})t+N_{0}t^{2}]\,[(L_{0}/2+M_{0})t+N_{0}t^{2}].$$

$$\|F^{\prime}(x)-F^{\prime}(y)\|\leq L\|x-y\|$$

$$\|G(x,y)-G(u,v)\|\leq M(\|x-u\|+\|y-v\|)$$

$$\|G(u,x,y)-G(v,x,y)\|\leq N\|u-v\|.$$

$$\|F^{\prime}(x)-F^{\prime}(y)\|\leq L_{1}\|x-y\|$$

$$\|(A_{*}^{\top}A_{*})^{-1}\|\leq B.$$

$$\|F(x^{*})+G(x^{*})\|\leq\eta,\qquad\|F^{\prime}(x^{*})+G(x^{*},x^{*})\|\leq\alpha,$$

$$B(L+2M)\eta<1,$$

$$\Omega(x^{*},3r_{*})\subseteq D,$$

$$\begin{array}{l}q(r)=B\,[(\alpha+(L+2M)r+4Nr^{2})((L/2+M)r+4Nr^{2})+(L+2M+4Nr)\eta]\\ \quad+B\,[2\alpha+(L_{0}+2M_{0})r+4N_{0}r^{2}]\,[(L_{0}+2M_{0})r+4N_{0}r^{2}]-1.\end{array}$$

$$\begin{array}{l}\|x_{n+1}-x^{*}\|\leq C_{1}\|x_{n}-x^{*}\|+C_{2}\|x_{n}-x_{n-1}\|^{2}+C_{3}\|x_{n}-x^{*}\|^{2}\\ \quad+C_{4}\|x_{n-1}-x^{*}\|^{2}\|x_{n}-x^{*}\|,\end{array}$$

$$g(r)=B\,[1-B(2\alpha+(L_{0}+2M_{0})r+4N_{0}r^{2})((L_{0}+2M_{0})r+4N_{0}r^{2})]^{-1},$$

$$C_{1}=g(r_{*})(L+2M)\eta,\qquad C_{2}=g(r_{*})N\eta,$$

$$C_{3}=g(r_{*})(L/2+M)(\alpha+(L+2M)r_{*}+4Nr_{*}^{2}),$$

$$C_{4}=g(r_{*})N(\alpha+(L+2M)r_{*}+4Nr_{*}^{2}).$$

$$\begin{array}{l}\|2x_{0}-x_{-1}-x^{*}\|\leq\|x_{0}-x^{*}\|+\|x_{0}-x_{-1}\|\\ \quad\leq\|x_{0}-x^{*}\|+\|x_{0}-x^{*}\|+\|x_{-1}-x^{*}\|<3r_{*}.\end{array}$$

$$\begin{array}{l}\|I-(A_{*}^{\top}A_{*})^{-1}A_{0}^{\top}A_{0}\|=\|(A_{*}^{\top}A_{*})^{-1}(A_{*}^{\top}A_{*}-A_{0}^{\top}A_{0})\|\\ \quad=\|(A_{*}^{\top}A_{*})^{-1}(A_{*}^{\top}(A_{*}-A_{0})+(A_{*}^{\top}-A_{0}^{\top})(A_{0}-A_{*})+(A_{*}^{\top}-A_{0}^{\top})A_{*})\|\\ \quad\leq\|(A_{*}^{\top}A_{*})^{-1}\|\,(\|A_{*}^{\top}\|\|A_{*}-A_{0}\|+\|A_{*}^{\top}-A_{0}^{\top}\|\|A_{0}-A_{*}\|+\|A_{*}^{\top}-A_{0}^{\top}\|\|A_{*}\|)\\ \quad\leq B\,(\alpha\|A_{*}-A_{0}\|+\|A_{*}^{\top}-A_{0}^{\top}\|\|A_{0}-A_{*}\|+\alpha\|A_{*}^{\top}-A_{0}^{\top}\|).\end{array}$$

$$\begin{array}{l}\|G(2x_{0}-x_{-1},x_{-1})-G(x_{0},x_{0})\|\\ \quad=\|G(2x_{0}-x_{-1},x_{-1})-G(x_{0},x_{-1})+G(x_{0},x_{-1})-G(x_{0},x_{0})\|\\ \quad=\|G(2x_{0}-x_{-1},x_{-1},x_{0})(x_{0}-x_{-1})-G(x_{0},x_{-1},x_{0})(x_{0}-x_{-1})\|\\ \quad\leq N_{0}\|x_{0}-x_{-1}\|^{2}\end{array}$$

$$\begin{array}{l}\|G(2x_{0}-x_{-1},x_{-1})-G(x_{0},x^{*})\|\\ \quad=\|G(2x_{0}-x_{-1},x_{-1})-G(x_{0},x_{0})+G(x_{0},x_{0})-G(x_{0},x^{*})\|\\ \quad\leq N_{0}\|x_{0}-x_{-1}\|^{2}+M_{0}\|x_{0}-x^{*}\|.\end{array}$$

$$\begin{array}{l}\|A_{0}-A_{*}\|=\|(F^{\prime}(x_{0})+G(2x_{0}-x_{-1},x_{-1}))-(F^{\prime}(x^{*})+G(x^{*},x^{*}))\|\\ \quad=\|F^{\prime}(x_{0})-F^{\prime}(x^{*})+G(2x_{0}-x_{-1},x_{-1})-G(x_{0},x^{*})+G(x_{0},x^{*})-G(x^{*},x^{*})\|\\ \quad\leq L_{0}\|x_{0}-x^{*}\|+N_{0}\|x_{0}-x_{-1}\|^{2}+2M_{0}\|x_{0}-x^{*}\|\\ \quad=(L_{0}+2M_{0})\|x_{0}-x^{*}\|+N_{0}\|x_{0}-x_{-1}\|^{2}.\end{array}$$

$$\|A_{0}\|\leq\|A_{*}\|+\|A_{0}-A_{*}\|\leq\alpha+(L_{0}+2M_{0})\|x_{0}-x^{*}\|+N_{0}\|x_{0}-x_{-1}\|^{2}.$$

Taxonomy

Topics: Iterative Methods for Nonlinear Equations · Advanced Optimization Algorithms Research · Numerical methods in inverse problems

Full text

Local convergence analysis of the Gauss-Newton-Kurchatov method

Ioannis K. Argyros¹, Stepan Shakhno²

¹Department of Mathematics, Cameron University,

Lawton, OK 73505, USA;

[email protected],

²Department of Theory of Optimal Processes,

Ivan Franko National University of Lviv,

Lviv, Ukraine, 79000;

[email protected]

Abstract

We present a local convergence analysis of the Gauss-Newton-Kurchatov method for solving nonlinear least squares problems with a decomposition of the operator. Instead of computing the full Jacobian, the method uses the sum of the derivative of the differentiable part of the operator and the divided difference of the nondifferentiable part. A theorem establishing the convergence conditions, the convergence radius and the convergence order of the method was proved in (Shakhno 2017). However, the radius of convergence is small in general, which limits the choice of initial points. Using tighter estimates on the distances, under weaker hypotheses (Argyros et al. 2013), we provide an analysis of the Gauss-Newton-Kurchatov method with the following advantages over the corresponding results (Shakhno 2017): an extended convergence region, finer error estimates, and at least as precise information on the location of the solution. The numerical examples illustrate the theoretical results.

**Keywords:** Gauss-Newton-Kurchatov method, local convergence, Fréchet derivative, Lipschitz / center-Lipschitz condition, convergence domain.

**AMS Classification:** 65F20, 65G99, 65H10, 49M15

1 Introduction

Let us consider the problem of finding an approximate solution of the nonlinear least squares problem

$$\min_{x\in\mathbb{R}^{n}}\frac{1}{2}F(x)^{\top}F(x), \tag{1}$$

where the residual function $F:D\subseteq\mathbb{R}^{n}\to\mathbb{R}^{m}$, $m\geq n$, is nonlinear in $x$, $F$ is continuously differentiable, and $D$ is an open convex set in $\mathbb{R}^{n}$.

A large number of problems in applied mathematics and in engineering are solved by finding solutions of problem (1). Examples include solving overdetermined systems of nonlinear equations, estimating parameters of physical processes from measurement results, constructing nonlinear regression models for engineering problems, dynamic systems, etc. The solution methods used are iterative: starting from one or several initial approximations, a sequence is constructed that converges to a solution of problem (1).

Known methods of the Gauss-Newton type (Dennis et al. 1996; Ortega et al. 1970; Argyros 2008; Shakhno 2001), whose iterative formulas involve derivatives of the function, are used to solve problem (1). In practice, however, computing the derivatives can be difficult. In this case, we can use iterative-difference methods (Argyros 2008; Ren et al. 2010, 2011; Shakhno et al. 1999, 2005), which do not require the matrix of derivatives and are often not inferior to the Gauss-Newton method in order of convergence and number of iterations. Sometimes, however, the nonlinear function consists of a differentiable and a non-differentiable part. Then a nonlinear least squares problem arises

$$\min_{x\in\mathbb{R}^{n}}\frac{1}{2}(F(x)+G(x))^{\top}(F(x)+G(x)), \tag{2}$$

where the residual function $F+G:D\subseteq\mathbb{R}^{n}\to\mathbb{R}^{m}$, $m\geq n$, is nonlinear in $x$, $F$ is continuously differentiable, $G$ is a continuous function whose differentiability is, in general, not assumed, and $D$ is an open convex set in $\mathbb{R}^{n}$. Although iterative-difference methods can be applied to the nonlinear problem (2), it is also possible to construct iterative methods that take the decomposition of the residual function into account. For nonlinear equations, such methods (Shakhno et al. 2014, 2011; Shakhno 2016; Cătinaş 1994; Iakymchuk et al. 2016) were constructed as combinations of the Newton method (Dennis et al. 1996; Ortega et al. 1970; Argyros 2008; Deuflhard 2004) and the iterative-difference chord (secant) and Kurchatov methods (Dennis et al. 1996; Ortega et al. 1970; Shakhno 2006, 2007; Argyros 2008; Ren et al. 2010, 2011; Shakhno et al. 2005).

In (Shakhno 2017), we proposed a method for solving the nonlinear least squares problem (2) with a non-differentiable operator, built on the Gauss-Newton method (Dennis et al. 1996; Ortega et al. 1970) and the Kurchatov type method (Shakhno et al. 2011, 2005; Ren 2011). We studied its local convergence under Lipschitz conditions and showed its effectiveness in comparison with other methods on test problems.

2 Preliminaries

To find the solution of the problem (2) we consider the Gauss-Newton-Kurchatov method (Shakhno 2017):

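$$\begin{array}{l}x_{n+1}=x_{n}-(A_{n}^{\top}A_{n})^{-1}A_{n}^{\top}(F(x_{n})+G(x_{n})),\\ A_{n}=F^{\prime}(x_{n})+G(2x_{n}-x_{n-1},x_{n-1}),\quad n=0,1,\ldots,\end{array} \tag{3}$$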

where $F^{\prime}(x_{n})$ is the Jacobian matrix of $F(x)$; $G(2x_{n}-x_{n-1},x_{n-1})$ is the first-order divided difference of the function $G$ at the points $2x_{n}-x_{n-1}$, $x_{n-1}$ (Ulm 1967); and $x_{0}$, $x_{-1}$ are initial approximations. Method (3) is a combination of the Gauss-Newton method (Dennis et al. 1996; Ortega et al. 1970) and the Kurchatov type method (Shakhno et al. 2011, 2005; Ren 2011).
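Iteration (3) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the componentwise divided-difference formula, the nudge used when two points coincide, the stopping rule on the step norm, and the small test problem (`F`, `dF`, `G`, with solution $(1,2)$ and zero residual) are all assumptions introduced here.

```python
import numpy as np


def divided_difference(G, x, y, eps=1e-12, nudge=1e-7):
    """First-order divided difference G(x, y), returned as an m-by-n matrix.

    Column j is [G(x_1..x_j, y_{j+1}..y_n) - G(x_1..x_{j-1}, y_j..y_n)] / (x_j - y_j),
    which telescopes to G(x, y) @ (x - y) == G(x) - G(y).
    """
    x = np.asarray(x, dtype=float)
    y = np.array(y, dtype=float)  # copy, because y may be nudged below
    # Assumption: where x_j and y_j (nearly) coincide, shift y_j to avoid 0/0.
    close = np.abs(x - y) < eps
    y[close] = x[close] - nudge
    cols = []
    for j in range(x.size):
        upper = np.concatenate([x[: j + 1], y[j + 1:]])
        lower = np.concatenate([x[:j], y[j:]])
        cols.append((G(upper) - G(lower)) / (x[j] - y[j]))
    return np.column_stack(cols)


def gauss_newton_kurchatov(F, dF, G, x0, x_prev, tol=1e-10, max_iter=50):
    """Iterate (3); stop when the step norm drops below tol (assumed rule)."""
    x_prev = np.asarray(x_prev, dtype=float)
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        A = dF(x) + divided_difference(G, 2 * x - x_prev, x_prev)
        r = F(x) + G(x)
        # For full column rank A, the least-squares solution of A s = r
        # equals (A^T A)^{-1} A^T r.
        s, *_ = np.linalg.lstsq(A, r, rcond=None)
        x_prev, x = x, x - s
        if np.linalg.norm(s) < tol:
            return x, k + 1
    return x, max_iter


# Hypothetical test problem (n = 2, m = 3) with F + G = 0 at (1, 2).
def F(x):
    return np.array([x[0]**2 + x[1] - 3.1, x[0] + x[1]**2 - 5.2, x[0] * x[1] - 2.0])


def dF(x):
    return np.array([[2 * x[0], 1.0], [1.0, 2 * x[1]], [x[1], x[0]]])


def G(x):
    return 0.1 * np.array([abs(x[0]), abs(x[1]), 0.0])


x0 = np.array([1.2, 1.8])
x_star, iters = gauss_newton_kurchatov(F, dF, G, x0, x0 - 1e-4)
print(x_star, iters)
```

Solving the linear least-squares problem with `numpy.linalg.lstsq` avoids forming $A_{n}^{\top}A_{n}$ explicitly, which is usually better conditioned than inverting the normal matrix.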

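If $m=n$, method (3) reduces to the Newton-Kurchatov method for solving the nonlinear equation $F(x)+G(x)=0$ (Shakhno et al. 2016, 2015; Hernández-Verón 2017; Iakymchuk et al. 2016):

$$\begin{array}{l}x_{n+1}=x_{n}-A_{n}^{-1}(F(x_{n})+G(x_{n})),\\ A_{n}=F^{\prime}(x_{n})+G(2x_{n}-x_{n-1},x_{n-1}),\quad n=0,1,\ldots.\end{array} \tag{4}$$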
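The reduction for $m=n$ can be checked in one line, under the assumption (needed for the square method to be defined at all) that $A_{n}$ is invertible:

```latex
(A_{n}^{\top}A_{n})^{-1}A_{n}^{\top}
  = A_{n}^{-1}(A_{n}^{\top})^{-1}A_{n}^{\top}
  = A_{n}^{-1}.
```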

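Setting $A_{n}=F^{\prime}(x_{n})+G(x_{n},x_{n-1})$ in (3), we obtain a combination of the Gauss-Newton method (Dennis et al. 1996; Ortega et al. 1970) and the Secant type method (Ren et al. 2010; Shakhno et al. 2005) of the form (Shakhno et al. 2017)

$$\begin{array}{l}x_{n+1}=x_{n}-A_{n}^{-1}(F(x_{n})+G(x_{n})),\\ A_{n}=F^{\prime}(x_{n})+G(x_{n},x_{n-1}),\quad n=0,1,\ldots.\end{array} \tag{5}$$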

We need the following Lipschitz conditions.

Definition 2.1.

We say that the Fréchet derivative $F^{\prime}$ satisfies the center Lipschitz condition on $D$ if there exists $L_{0}>0$ such that for each $x\in D$

$$\|F^{\prime}(x)-F^{\prime}(x^{*})\|\leq L_{0}\|x-x^{*}\|, \tag{6}$$

where $x^{*}\in D$ solves problem (2).

Definition 2.2.

We say that the divided differences $G(\cdot,\cdot)$ and $G(\cdot,\cdot,\cdot)$ satisfy the special Lipschitz conditions on $D\times D$ and $D\times D\times D$, respectively, if there exist $M_{0}>0$ and $N_{0}>0$ such that for each $x,y,u,v\in D$

$$\|G(x,y)-G(u,v)\|\leq M_{0}(\|x-u\|+\|y-v\|), \tag{7}$$

and

$$\|G(u,x,y)-G(v,x,y)\|\leq N_{0}\|u-v\|. \tag{8}$$

Let $B>0$ and $\alpha>0$. Define the function $h$ on $[0,+\infty)$ by

$$h(t)=B\,[2\alpha+(L_{0}+2M_{0})t+N_{0}t^{2}]\,[(L_{0}/2+M_{0})t+N_{0}t^{2}]. \tag{9}$$

Suppose that the equation $h(t)=1$ has at least one positive solution. Denote by $\gamma$ the smallest such solution. Set $D_{0}=D\cap\Omega(x^{*},\gamma)$.
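Since $h(t)-1$ is a quartic polynomial in $t$, the number $\gamma$ can be found numerically from its roots. The following sketch assumes the form of $h$ in (9); the constant values are hypothetical and chosen only for illustration.

```python
import numpy as np


def h(t, B, alpha, L0, M0, N0):
    """The function h from (9)."""
    return B * (2 * alpha + (L0 + 2 * M0) * t + N0 * t**2) * ((L0 / 2 + M0) * t + N0 * t**2)


def smallest_positive_root(B, alpha, L0, M0, N0):
    """Smallest positive solution of h(t) = 1, or None if there is none."""
    first = [N0, L0 + 2 * M0, 2 * alpha]   # coefficients, highest degree first
    second = [N0, L0 / 2 + M0, 0.0]
    poly = B * np.polymul(first, second)
    poly[-1] -= 1.0                        # h(t) - 1
    roots = np.roots(poly)
    real = roots[np.abs(roots.imag) < 1e-10].real
    positive = real[real > 0]
    return positive.min() if positive.size else None


# Hypothetical constants, not taken from the paper.
params = dict(B=1.0, alpha=1.0, L0=1.0, M0=0.5, N0=0.25)
gamma = smallest_positive_root(**params)
print(gamma)
```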

Definition 2.3.

We say that the Fréchet derivative $F^{\prime}$ satisfies the restricted special Lipschitz condition on $D_{0}$ if there exists $L>0$ such that for each $x,y\in D_{0}$

$$\|F^{\prime}(x)-F^{\prime}(y)\|\leq L\|x-y\|. \tag{10}$$

Definition 2.4.

We say that the divided differences $G(\cdot,\cdot)$ and $G(\cdot,\cdot,\cdot)$ satisfy the special Lipschitz conditions on $D_{0}\times D_{0}$ and $D_{0}\times D_{0}\times D_{0}$, respectively, if there exist $M>0$ and $N>0$ such that for each $x,y,u,v\in D_{0}$

$$\|G(x,y)-G(u,v)\|\leq M(\|x-u\|+\|y-v\|) \tag{11}$$

and

$$\|G(u,x,y)-G(v,x,y)\|\leq N\|u-v\|. \tag{12}$$

The following condition, together with (7) and (8), has been used instead of the preceding ones in the study of such iterative methods (Shakhno 2017).

Definition 2.5.

We say that the Fréchet derivative $F^{\prime}$ satisfies the Lipschitz condition on $D$ if there exists $L_{1}>0$ such that for each $x,y\in D$

$$\|F^{\prime}(x)-F^{\prime}(y)\|\leq L_{1}\|x-y\|.$$

Let $\Omega(x^{*},3r_{*})=\{x:\|x-x^{*}\|<3r_{*}\}$.

3 Convergence analysis of the iterative process (3)

Next, we improve Theorem 1 (Shakhno et al. 2017).

Theorem 3.1.

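Let the function $F+G:\mathbb{R}^{n}\to\mathbb{R}^{m}$ be continuous on the open subset $D\subseteq\mathbb{R}^{n}$, let $F$ be continuously differentiable in this domain, and let $G$ be a continuous function. Assume that problem (2) has a solution $x^{*}$ in $D$, that the inverse operator $(A_{*}^{\top}A_{*})^{-1}=[(F^{\prime}(x^{*})+G(x^{*},x^{*}))^{\top}(F^{\prime}(x^{*})+G(x^{*},x^{*}))]^{-1}$ exists, and that

$$\|(A_{*}^{\top}A_{*})^{-1}\|\leq B,\qquad\|F(x^{*})+G(x^{*})\|\leq\eta,\qquad\|F^{\prime}(x^{*})+G(x^{*},x^{*})\|\leq\alpha.$$

Suppose that estimates (6), (7), (8), (10), (11), (12) hold, that $\gamma$ given by (9) exists, and that

$$B(L+2M)\eta<1, \tag{15}$$

$$\Omega(x^{*},3r_{*})\subseteq D,$$

where $r_{*}$ is the unique positive zero of the function $q$ given by

$$\begin{array}{l}q(r)=B\,[(\alpha+(L+2M)r+4Nr^{2})((L/2+M)r+4Nr^{2})+(L+2M+4Nr)\eta]\\ \quad+B\,[2\alpha+(L_{0}+2M_{0})r+4N_{0}r^{2}]\,[(L_{0}+2M_{0})r+4N_{0}r^{2}]-1.\end{array} \tag{16}$$

Then for $x_{0},x_{-1}\in\Omega(x^{*},r_{*})$ the iterative process (3) is well defined, the sequence $\{x_{n}\}$, $n=0,1,\ldots$, generated by it remains in the open subset $\Omega(x^{*},r_{*})$ and converges to the solution $x^{*}$. Moreover, the following error estimates hold for $n=0,1,\ldots$

$$\begin{array}{l}\|x_{n+1}-x^{*}\|\leq C_{1}\|x_{n}-x^{*}\|+C_{2}\|x_{n}-x_{n-1}\|^{2}+C_{3}\|x_{n}-x^{*}\|^{2}\\ \quad+C_{4}\|x_{n-1}-x^{*}\|^{2}\|x_{n}-x^{*}\|,\end{array} \tag{17}$$

where

$$\begin{array}{l}g(r)=B\,[1-B(2\alpha+(L_{0}+2M_{0})r+4N_{0}r^{2})((L_{0}+2M_{0})r+4N_{0}r^{2})]^{-1},\\ C_{1}=g(r_{*})(L+2M)\eta,\qquad C_{2}=g(r_{*})N\eta,\\ C_{3}=g(r_{*})(L/2+M)(\alpha+(L+2M)r_{*}+4Nr_{*}^{2}),\\ C_{4}=g(r_{*})N(\alpha+(L+2M)r_{*}+4Nr_{*}^{2}).\end{array} \tag{18}$$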
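For given constants, the radius $r_{*}$ and the constants $C_{1},\ldots,C_{4}$ of the theorem can be evaluated numerically. The sketch below assumes the forms of $q$ and $g$ stated in (16) and (18), uses plain bisection (an illustrative choice), and works with hypothetical constant values that are not taken from the paper.

```python
def q(r, B, alpha, eta, L, M, N, L0, M0, N0):
    """The function q from (16)."""
    phi = alpha + (L + 2 * M) * r + 4 * N * r**2
    first = B * (phi * ((L / 2 + M) * r + 4 * N * r**2) + (L + 2 * M + 4 * N * r) * eta)
    s0 = (L0 + 2 * M0) * r + 4 * N0 * r**2
    second = B * (2 * alpha + s0) * s0
    return first + second - 1.0


def g(r, B, alpha, L0, M0, N0):
    """The function g from (18)."""
    s0 = (L0 + 2 * M0) * r + 4 * N0 * r**2
    return B / (1.0 - B * (2 * alpha + s0) * s0)


def positive_zero(f, hi=1.0, tol=1e-14):
    """Bisection for the zero of an increasing function f with f(0) < 0."""
    if f(0.0) >= 0:
        raise ValueError("need f(0) < 0, which is condition (15) for q")
    while f(hi) <= 0:          # expand the bracket until the sign changes
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical constants, for illustration only.
c = dict(B=1.0, alpha=1.0, eta=0.1, L=1.0, M=0.5, N=0.25, L0=0.8, M0=0.4, N0=0.2)
r_star = positive_zero(lambda r: q(r, **c))
g_star = g(r_star, c["B"], c["alpha"], c["L0"], c["M0"], c["N0"])
phi_star = c["alpha"] + (c["L"] + 2 * c["M"]) * r_star + 4 * c["N"] * r_star**2
C1 = g_star * (c["L"] + 2 * c["M"]) * c["eta"]
C2 = g_star * c["N"] * c["eta"]
C3 = g_star * (c["L"] / 2 + c["M"]) * phi_star
C4 = g_star * c["N"] * phi_star
print(r_star, C1, C2, C3, C4)
```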

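Proof. By (15), $q(0)=B(L+2M)\eta-1<0$, while $q(r)>0$ for sufficiently large $r$. Hence, by the intermediate value theorem, $q$ has a positive zero on $[0,r]$, denoted by $r_{*}$. Since $q^{\prime}(r)\geq 0$ for $r\geq 0$, this zero is the only one on $[0,r]$.

By assumption, $x_{0},x_{-1}\in\Omega(x^{*},r_{*})$. Then we have

$$\begin{array}{l}\|2x_{0}-x_{-1}-x^{*}\|\leq\|x_{0}-x^{*}\|+\|x_{0}-x_{-1}\|\\ \quad\leq\|x_{0}-x^{*}\|+\|x_{0}-x^{*}\|+\|x_{-1}-x^{*}\|<3r_{*}.\end{array}$$

So, $2x_{0}-x_{-1}\in\Omega(x^{*},3r_{*})$.

Denote $A_{n}=F^{\prime}(x_{n})+G(2x_{n}-x_{n-1},x_{n-1})$. For $n=0$ we obtain the estimate

$$\begin{array}{l}\|I-(A_{*}^{\top}A_{*})^{-1}A_{0}^{\top}A_{0}\|=\|(A_{*}^{\top}A_{*})^{-1}(A_{*}^{\top}A_{*}-A_{0}^{\top}A_{0})\|\\ \quad=\|(A_{*}^{\top}A_{*})^{-1}(A_{*}^{\top}(A_{*}-A_{0})+(A_{*}^{\top}-A_{0}^{\top})(A_{0}-A_{*})+(A_{*}^{\top}-A_{0}^{\top})A_{*})\|\\ \quad\leq\|(A_{*}^{\top}A_{*})^{-1}\|\,(\|A_{*}^{\top}\|\|A_{*}-A_{0}\|+\|A_{*}^{\top}-A_{0}^{\top}\|\|A_{0}-A_{*}\|+\|A_{*}^{\top}-A_{0}^{\top}\|\|A_{*}\|)\\ \quad\leq B\,(\alpha\|A_{*}-A_{0}\|+\|A_{*}^{\top}-A_{0}^{\top}\|\|A_{0}-A_{*}\|+\alpha\|A_{*}^{\top}-A_{0}^{\top}\|).\end{array} \tag{19}$$

Using (8), we get

$$\begin{array}{l}\|G(2x_{0}-x_{-1},x_{-1})-G(x_{0},x_{0})\|\\ \quad=\|G(2x_{0}-x_{-1},x_{-1})-G(x_{0},x_{-1})+G(x_{0},x_{-1})-G(x_{0},x_{0})\|\\ \quad=\|G(2x_{0}-x_{-1},x_{-1},x_{0})(x_{0}-x_{-1})-G(x_{0},x_{-1},x_{0})(x_{0}-x_{-1})\|\\ \quad\leq N_{0}\|x_{0}-x_{-1}\|^{2}\end{array} \tag{20}$$

and

$$\begin{array}{l}\|G(2x_{0}-x_{-1},x_{-1})-G(x_{0},x^{*})\|\\ \quad=\|G(2x_{0}-x_{-1},x_{-1})-G(x_{0},x_{0})+G(x_{0},x_{0})-G(x_{0},x^{*})\|\\ \quad\leq N_{0}\|x_{0}-x_{-1}\|^{2}+M_{0}\|x_{0}-x^{*}\|.\end{array} \tag{21}$$

We use inequalities (6), (7), (20), (21):

$$\begin{array}{l}\|A_{0}-A_{*}\|=\|(F^{\prime}(x_{0})+G(2x_{0}-x_{-1},x_{-1}))-(F^{\prime}(x^{*})+G(x^{*},x^{*}))\|\\ \quad=\|F^{\prime}(x_{0})-F^{\prime}(x^{*})+G(2x_{0}-x_{-1},x_{-1})-G(x_{0},x^{*})+G(x_{0},x^{*})-G(x^{*},x^{*})\|\\ \quad\leq L_{0}\|x_{0}-x^{*}\|+N_{0}\|x_{0}-x_{-1}\|^{2}+2M_{0}\|x_{0}-x^{*}\|\\ \quad=(L_{0}+2M_{0})\|x_{0}-x^{*}\|+N_{0}\|x_{0}-x_{-1}\|^{2}.\end{array} \tag{22}$$

Then

$$\|A_{0}\|\leq\|A_{*}\|+\|A_{0}-A_{*}\|\leq\alpha+(L_{0}+2M_{0})\|x_{0}-x^{*}\|+N_{0}\|x_{0}-x_{-1}\|^{2}. \tag{23}$$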

Then we obtain from the inequality (19) and the definition $r_{*}$ (16)

[TABLE]

By Banach's theorem on the inverse operator (Ortega et al. 1970), $(A_{0}^{\top}A_{0})^{-1}$ exists, and from (24) we have

[TABLE]

Consequently, iterate $x_{1}$ is well defined.

Next, we show that $x_{1}\in\Omega(x^{*},r_{*})$. Using the equality

[TABLE]

we will get an estimate

[TABLE]

Hence, taking into account (21), (23) and inequalities

[TABLE]

we will get

[TABLE]

where

[TABLE]

Hence, $x_{1}\in\Omega(x^{*},r_{*})$ and inequality (17) is true for $n=0$.

Assume that $x_{n}\in\Omega(x^{*},r_{*})$ for $n=0,1,\ldots,k$, and that the estimate (17) holds for $n=0,1,\ldots,k-1$, where $k\geq 1$ is an integer. Next we prove that $x_{k+1}\in\Omega(x^{*},r_{*})$ and that the estimate (17) holds for $n=k$.

Define

[TABLE]

So, $(A_{k}^{\top}A_{k})^{-1}$ exists and

[TABLE]

Therefore, the iteration $x_{k+1}$ is well defined, and we can get in turn

[TABLE]

i.e. $x_{k+1}\in\Omega(x^{*},r_{*})$, and estimate (17) holds for $n=k$.

Consequently, the iterative process (3) is well defined, $x_{n}\in\Omega(x^{*},r_{*})$ for all $n\geq 0$ , and estimate (17) holds for all $n\geq 0$ .

Next, we prove that $x_{n}\to x^{*}$ as $n\to\infty$. Define functions $a$ and $b$ on $[0,r_{*}]$ by:

[TABLE]

where $\varphi(r)=\alpha+(L+2M)r+4Nr^{2}$ .

According to the choice $r_{*}$ , we have

[TABLE]

Using the estimate (17), the definition of constants $C_{i},\,\,i=1,2,3,4$ , as well as the functions $a$ and $b$ , for $n\geq 0$ , we obtain

[TABLE]

Similarly to (Ren et al. 2011), we prove that under the conditions (25), (26) the sequence $\{x_{n}\}$ converges to $x^{*}$ as $n\to\infty$.

First of all, for a real number $r_{*}>0$ and initial points $x_{0},x_{-1}\in\Omega(x^{*},r_{*})$ there exists a real number $r^{\prime}$ such that $0<r^{\prime}<r_{*}$ and $x_{0},x_{-1}\in\Omega(x^{*},r^{\prime})$. Then all the above estimates for the sequence $\{x_{n}\}$ remain valid with $r_{*}$ replaced by $r^{\prime}$. In particular, from (27) for $n\geq 0$, we get

[TABLE]

where $a=a(r^{\prime})$ , $b=b(r^{\prime})$ .

Clearly, we also have

[TABLE]

Define sequences $\{\theta_{n}\}$, $\{\rho_{n}\}$:

[TABLE]

Dividing both sides of inequality (28) by $r^{\prime}$, we obtain $\theta_{n+1}\leq a\theta_{n}+b\theta_{n-1}$, $n=0,1,2,\ldots$.

By definition of the sequence $\{\rho_{n}\}$, we have

[TABLE]

For the sequence $\{\rho_{n}\}$, explicit formulas are known:

[TABLE]

where

[TABLE]

and

[TABLE]

Note that

[TABLE]

Taking into account (30) and (31), we conclude that $\theta_{n}\to 0$ as $n\to\infty$. Therefore, $x_{n}\to x^{*}$ as $n\to\infty$. $\square$
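The role of the majorizing sequence in the proof can be illustrated numerically. The sketch below is built on assumptions, because the displayed definitions are not reproduced here: it takes $\rho_{n+1}=a\rho_{n}+b\rho_{n-1}$ with hypothetical values $a,b\geq 0$, $a+b<1$, and checks that the larger root of $\lambda^{2}=a\lambda+b$ is below one, which forces $\rho_{n}\to 0$.

```python
import math


def majorant(a, b, rho_prev, rho0, steps):
    """Terms of rho_{n+1} = a * rho_n + b * rho_{n-1}, starting from rho_{-1}, rho_0."""
    seq = [rho_prev, rho0]
    for _ in range(steps):
        seq.append(a * seq[-1] + b * seq[-2])
    return seq


def dominant_root(a, b):
    """Larger root of lambda^2 = a * lambda + b; it is < 1 exactly when a + b < 1."""
    return (a + math.sqrt(a * a + 4 * b)) / 2


a, b = 0.5, 0.3            # hypothetical values with a + b < 1
rho = majorant(a, b, 1.0, 1.0, 200)
lam = dominant_root(a, b)
print(lam, rho[-1])
```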

Remark 3.2.

If $L_{0}=L=L_{1}$, $M_{0}=M$ and $N_{0}=N$, our results specialize to the corresponding ones in (Shakhno 2017). Otherwise they constitute an improvement. As an example, let $q_{1}$, $g_{1}$, $C_{1}^{1}$, $C_{2}^{1}$, $C_{3}^{1}$, $C_{4}^{1}$, $r_{*}^{1}$ denote the functions and parameters used in (Shakhno 2017), where $L_{0}$, $L$, $M$, $N$ are replaced by $L_{1}$, $L_{1}$, $M_{0}$, $N_{0}$, respectively. Then, since $L_{0}\leq L_{1}$, $L\leq L_{1}$, $M\leq M_{0}$, $N\leq N_{0}$ and since $D_{0}\subseteq D$, we have $q(r)\leq q_{1}(r)$, $g(r)\leq g_{1}(r)$, $C_{1}\leq C_{1}^{1}$, $C_{2}\leq C_{2}^{1}$, $C_{3}\leq C_{3}^{1}$, $C_{4}\leq C_{4}^{1}$, so $r_{*}^{1}\leq r_{*}$, and the new error bounds are tighter than the corresponding ones (23) in (Shakhno 2017).

Moreover, we have

[TABLE]

but not vice versa, unless $L_{0}=L$ and $M_{0}=M$.

Hence, the new sufficient convergence criteria for method (3) are weaker. These advantages are obtained under the same computational cost as (Shakhno 2017), since in practice the new constants are special cases of the previous ones.
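The comparison $r_{*}^{1}\leq r_{*}$ from Remark 3.2 can be illustrated numerically. In the sketch below, $q$ is assumed to have the form (16); the earlier radius is obtained by substituting $L_{1},L_{1},M_{0},N_{0}$ for $L_{0},L,M,N$; and every constant value is hypothetical, chosen only so that $L_{0}\leq L\leq L_{1}$, $M\leq M_{0}$, $N\leq N_{0}$.

```python
def q(r, B, alpha, eta, L, M, N, L0, M0, N0):
    """The function q from (16)."""
    phi = alpha + (L + 2 * M) * r + 4 * N * r**2
    first = B * (phi * ((L / 2 + M) * r + 4 * N * r**2) + (L + 2 * M + 4 * N * r) * eta)
    s0 = (L0 + 2 * M0) * r + 4 * N0 * r**2
    return first + B * (2 * alpha + s0) * s0 - 1.0


def radius(**consts):
    """Positive zero of q by bisection; requires q(0) < 0, i.e. condition (15)."""
    lo, hi = 0.0, 1.0
    assert q(lo, **consts) < 0
    while q(hi, **consts) <= 0:
        hi *= 2.0
    while hi - lo > 1e-14:
        mid = 0.5 * (lo + hi)
        if q(mid, **consts) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical constants: center and restricted ones are smaller than L1, M0, N0.
B, alpha, eta = 1.0, 1.0, 0.1
L0, L, L1 = 0.8, 0.9, 1.0
M, M0 = 0.4, 0.5
N, N0 = 0.2, 0.25

r_new = radius(B=B, alpha=alpha, eta=eta, L=L, M=M, N=N, L0=L0, M0=M0, N0=N0)
r_old = radius(B=B, alpha=alpha, eta=eta, L=L1, M=M0, N=N0, L0=L1, M0=M0, N0=N0)
print(r_old, r_new)
```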

Corollary 3.3.

In the case of zero residual, the convergence order of the iterative process (3) is quadratic.

If $\eta=0$ , we have a nonlinear least squares problem with zero residual in the solution. Then the constants $C_{1}=0$ and $C_{2}\,=0$ and (17) reduces to

[TABLE]

It follows from the inequality (32) that the order of convergence of the iterative process (3) is not higher than quadratic. Consequently, there exist a constant $C_{5}\geq 0$ and a positive integer $N$ such that for all $n\geq N$

[TABLE]

By

[TABLE]

we have

[TABLE]

and from (32) we have

[TABLE]

Consequently, the convergence order of the iterative process (3) is quadratic.
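In computations, the convergence order can be estimated from three consecutive errors $e_{n}=\|x_{n}-x^{*}\|$ by $p_{n}\approx\log(e_{n+1}/e_{n})/\log(e_{n}/e_{n-1})$. The sketch below applies this standard estimate to a synthetic error sequence $e_{n+1}=Ce_{n}^{2}$ with a hypothetical constant $C=2$; it does not use data from the paper.

```python
import math


def estimated_orders(errors):
    """p_n = log(e_{n+1} / e_n) / log(e_n / e_{n-1}) for consecutive errors."""
    return [
        math.log(errors[i + 1] / errors[i]) / math.log(errors[i] / errors[i - 1])
        for i in range(1, len(errors) - 1)
    ]


# Synthetic quadratically convergent errors: e_{n+1} = C * e_n^2 with C = 2.
errors = [0.1]
for _ in range(4):
    errors.append(2.0 * errors[-1] ** 2)

orders = estimated_orders(errors)
print(orders)
```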

As we see from the estimates (17) and (18), the convergence of the iterative process (3) essentially depends on the terms containing the values $\eta$ , $\alpha$ , $L$ , $M$ and $N$ .

For problems with zero residual in the solution ( $\eta=0$ ), the quadratic convergence of the iterative process (3) is established.

For problems with a small residual in the solution ($\eta$ is small) and with weak nonlinearity ($\alpha$, $L_{0}$, $L$, $M$ and $N$ are small), the convergence of the iterative process is linear. In the case of a large residual ($\eta$ is large) or for strongly nonlinear problems ($\alpha$, $L_{0}$, $L$, $M$ and $N$ are large), the iterative process (3) may not converge at all.

4 Results of numerical experiment

On several test cases, we compare the convergence rates of the Gauss-Newton-Kurchatov method (3), the Gauss-Newton-Secant method (5), and the Secant-type difference method (Ren et al. 2010; Shakhno et al. 2005)

[TABLE]

and the Kurchatov-type difference method (Ren et al. 2011; Shakhno et al. 2005)

[TABLE]

We tested the methods on nonlinear systems with a non-differentiable operator, with zero and non-zero residual. The classical Gauss-Newton method and the Newton method cannot be applied to these problems.

Results were computed with accuracy $\varepsilon=10^{-8}$. The additional approximation was chosen as follows: $x_{-1}=x_{0}-10^{-4}$. The calculations were carried out until the following conditions were fulfilled

[TABLE]

with $f(x)={\mathop{\min}\limits_{x\in{\bf{\rm R}}^{n}}}\frac{1}{2}(F(x)+G(x))^{{\bf\top}}(F(x)+G(x))$ .

Example 1 (Shakhno et al. 2014; Argyros 2008; Cătinaş 1994):

[TABLE]

Example 2. $n=2,\;m=3$:

[TABLE]

Table 1 shows the results of a numerical experiment. In particular, the investigated methods are compared by the number of iterations performed to find a solution with a given accuracy.

Table 1. Number of iterations for solving the test problems

[TABLE]

5 Conclusions

It follows from the theoretical results, practical calculations and comparison of the results obtained, that the combined differential-difference methods (3) and (5) converge faster than the Kurchatov type method (34) and the Secant type method (33). As proved, in the case of zero residual, method (3) has a quadratic order of convergence and does not require the calculation of derivatives from a non-differentiable part of the operator. Then the method (3), as well as the method (5), are effective methods for solving nonlinear least squares problems with non-differentiable operator.

Bibliography

[1] Argyros, I.K.: Convergence and applications of Newton-type iterations. Springer-Verlag, New York (2008)
[2] Argyros, I.K., Hilout, S.: On an improved convergence analysis of Newton's method. J. Applied Math. Comp. 225, 372–386 (2013)
[3] Argyros, I.K., Magrenan, A.A.: A contemporary study of iterative methods: Convergence, Dynamics and applications. Acad. Press, Elsevier, London (2018)
[4] Cătinaş, E.: On some iterative methods for solving nonlinear equations. Revue d'Analyse Numér. Théor. de l'Appr. 23, 47–53 (1994)
[5] Dennis, J.E. (Jr.), Schnabel, R.B.: Numerical methods for unconstrained optimization and nonlinear equations. SIAM, Philadelphia (1996)
[6] Deuflhard, P.: Newton methods for nonlinear problems. Affine invariance and adaptive algorithms. Springer-Verlag, Berlin (2004)
[7] Hernández-Verón, M.A., Rubio, M.J.: On the local convergence of Newton–Kurchatov-type method for non-differentiable operators. Appl. Math. Comp. 304, 1–9 (2017) https://doi.org/10.1016/j.amc.2017.01.010
[8] Iakymchuk, R., Shakhno, S., Yarmola, H.: Combined Newton-Kurchatov method for solving nonlinear operator equations. Proc. Appl. Math. Mech. 16, 719–720 (2016) https://doi.org/10.1002/pamm.201610348