This paper extends Rayleigh Quotient Iteration to nonlinear equations with constraints, linking its cubic convergence to second covariant derivatives and applying it to matrix and tensor eigenproblems.
Contribution
It generalizes RQI for constrained nonlinear problems, establishes a geometric interpretation of cubic convergence, and develops new RQIs for matrices and tensors.
Findings
01
Established cubic convergence linked to second covariant derivatives.
02
Derived new RQIs for matrix and tensor eigenproblems.
03
Connected classical RQI forms with geometric concepts.
Abstract
We generalize the Rayleigh Quotient Iteration (RQI) to the problem of solving a nonlinear equation where the variables are divided into two subsets, one satisfying additional equality constraints and the other could be considered as (generalized nonlinear Lagrange) multipliers. This framework covers several problems, including the (linear\slash nonlinear) eigenvalue problems, the constrained optimization problem, and the tensor eigenpair problem. Often, the RQI increment could be computed in two equivalent forms. The classical Rayleigh quotient algorithm uses the {\it Schur form}, while the projected Hessian method in constrained optimization uses the {\it Newton form}. We link the cubic convergence of these iterations with a {\it constrained Chebyshev term}, showing it is related to the geometric concept of {\it second covariant derivative}. Both the generalized Rayleigh quotient and…
Tables1
Table 1. Table 1. Examples of residual errors for Rayleigh (RQI) and Rayleigh-Chebyshev (RC) iterations for tensor eigenpairs (Tensor, section 6.1.1 , n = 6 , m = 3 , d = 2 formulae-sequence 𝑛 6 formulae-sequence 𝑚 3 𝑑 2 n=6,m=3,d=2 ) and eigenvalue with a constant term (EigC) ( section 6.3 , n = 10 𝑛 10 n=10 ).
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMatrix Theory and Algorithms · Tensor decomposition and applications · Advanced Optimization Algorithms Research
Full text
Rayleigh Quotient Iteration, cubic convergence, and second covariant derivative
We generalize the Rayleigh Quotient Iteration (RQI) to the problem of solving a nonlinear equation where the variables are divided into two subsets, one satisfying additional equality constraints and the other could be considered as (generalized nonlinear Lagrange) multipliers. This framework covers several problems, including the (linear/nonlinear) eigenvalue problems, the constrained optimization problem, and the tensor eigenpair problem. Often, the RQI increment could be computed in two equivalent forms. The classical Rayleigh quotient algorithm uses the Schur form, while the projected Hessian method in constrained optimization uses the Newton form. We link the cubic convergence of these iterations with a constrained Chebyshev term, showing it is related to the geometric concept of second covariant derivative. Both the generalized Rayleigh quotient and the Hessian of the retraction used in the RQI appear in the Chebyshev term. We derive several cubic convergence results in application and construct new RQIs for matrix and tensor problems.
Key words and phrases:
Tensor eigenpair, Newton method, Chebyshev method, Rayleigh quotient iteration, cubic convergence, feasibility perturbation, second covariant derivative.
2020 Mathematics Subject Classification:
65K10, 65F10, 65F15, 15A69, 49Q12, 90C23
1. Introduction
Several important problems in mathematics could be reduced to solving a vector equation of the form L(X,λ)=0, where the function L and the unknowns are divided into two groups as follows
[TABLE]
Here, X and λ are vector variables, defined on two vector spaces E and EL, respectively. The function L(X,λ)=0 in the first equation involves all variables, L maps E×EL to E. In the second group, the constraintC(X)=0, involves only X, C maps E to the constraint vector space EL. The vector variable λ∈EL plays the role of (generalized Lagrange) multipliers. Thus L maps E×EL to itself. This setup covers at least four classes of problems encountered in the literature:
a) The eigenvector problem:
[TABLE]
where X is an n×1 vector, A is an n×n matrix, λ is a scalar. Here, E is the space of n×1 vectors and EL is the base field, R. A related problem has L(X,λ)=AX−Xλ−b=0 where b=0 is a vector ([1] and references therein).
b) The constrained optimization problem.
This is the problem of optimizing a cost function f in a space E under the constraint C(X)=0 where C is a function with target space EL and λ∈EL are the Lagrange multipliers. Let C′ be the Jacobian of C then the Lagrangian multiplier equation is
[TABLE]
Note C′(X)T is a map from EL to E. The system eq. 1.1 gives us the set of critical points.
c) The nonlinear eigenvalue problem:
[TABLE]
In the real case, we set EL=R, with λ∈R is scalar. Here P is a matrix with polynomial entries in λ. While this is not in the form of eq. 1.1 we can impose the constraint C(X)=XTX−1 (or C(X)=zTX−1 for a fixed vector z). See [2] for a survey.
d) The tensor eigenpairs problem: Here, E=Fn is a vector space over a base field F, F=R or C. Set EL=F , thus λ is scalar. Let T and B be two vector-valued functions from E to itself, with entries Ti and Bi,1≤i≤n are homogeneous polynomials of degrees m−1 and d−1 respectively. The evaluations of T and B at X∈E are written T(X[m−1]) and B(X[d−1]). Set
[TABLE]
A popular constraint is C(X)=21(XTX−1) for the real case, and we will study C(X)=21(X∗X−1) for the complex case (∗ is the Hermitian transpose). An important case is when T=1/mT^′, where T^′ denotes the gradient of a scalar homogeneous polynomial T^ of order m, and B(X)=X. These eigenpairs could be used to determine if T^ is nonnegative ([3]).
The Rayleigh quotient iteration (RQI) is among the most powerful methods to compute eigenvalues and vectors. For a vector X, the Rayleigh quotient is λ=XTXXTAX, or XTAX on the unit sphere. In the i-th step, with λi computed from Xi by this equation, the iteration computes
[TABLE]
It has cubic convergence when A is normal and quadratic otherwise (for suitable initial points).
Similar iterations have been suggested for the remaining problems. Points satisfying C(X)=0 are called feasible points. At the i-th step, where X=Xi is feasible, a (vector-valued) function R(X) of X is used to compute λ for the iterative step, then an intermediate step X^i+1 is produced by solving a linear equation depending on λ, and a feasibility perturbation is applied to produce the next feasible step Xi+1. The feasibility perturbation is often available for common constraints, thus the overall procedure consists of constructing the generalized Rayleigh quotient R and the iteration equation. In the literature, convergence results are obtained separately for each problem. We show here there is a common procedure and convergence analysis for these problems, including criteria for cubic convergence.
Recall an iteration {Xi} converging to X∗ has order k if ∣Xi+1−X∗∣≤M∣Xi−X∗∣k for some M, for large enough i. Quadratic and cubic convergence corresponds to k=2 and k=3. It is well-known Newton’s method has quadratic convergence order and the Chebyshev method [4] achieves cubic convergence. We link the cubic convergence of the Rayleigh quotient iteration with a constrained Chebyshev term. While very little geometric prerequisite is needed in the proof or in applications, we explain this Chebyshev term is essentially a second covariant derivative using two connections constructed from the problem formulation.
1.1. Generalized RQI: quadratic convergence
We denote the Jacobian of a nonlinear map F between vector spaces V and W by F′. For X∈V, F′(X) is a linear operator between V and W. Let Lin(V,W) be the space of linear operators from V to W. We will assume sufficient smoothness. Consider a constraint C(X)=0 defining a feasible set M such that C′(X) is of full rank in the subset of interest, M is equipped with a feasibility perturbation r. Recall the tangent space of M at X∈M is the nullspace TXM, or TX for short, of C′(X). We are mainly interested in feasibility perturbation from the tangent space. For X∈M,η∈TX, instead of considering a map from E to M, mapping X+η to a point on M, following [5], we consider a map r sending (X,η) to r(X,η)∈M for η sufficiently small, with r(X,0)=X, (0=0TX∈TX is the zero vector). For the constraint XTX=1, we can use the rescaling r(X,η)=∣X+η∣1(X+η)=(1+∣η∣2)1/21(X+η).
Fix X, then r(X,.):η↦r(X,η) is a map from TX to E. Assume it has the Taylor expansion
[TABLE]
or r is a retraction, thus, the partial derivative rη of r in η, considered as a linear map from TX to E satisfies rη(X,0)η=η. This holds for the rescaling perturbation of the sphere and most feasibility perturbations in practice, and we can normalize a reasonable perturbation to this form.
The second ingredient of a generalized RQI is a projection function. In eqs. 1.2 and 1.5 of the classical RQI, with λ=XTAX
[TABLE]
The matrix Π(X)=I−XXT is a projection, as (I−XXT)2=I−XXT. Thus Π(X)F(X)=Π(X)2AX=F(X). This seems incidental, but it is crucial in generalizing the RQI.
An operator P∈Lin(E,E) is called an affine projection if P2=P. It is often constructed in two ways. If A− is a left inverse of a linear operator A, A−A=I, then AA− is a projection. If P is a projection then IE−P is also a projection. If η∈E is in the range Im(P) of P then Pη=η. In the special case where P is self-adjoint, P is called an orthogonal projection. We call a smooth map Π from M to Lin(E,E), the space of linear operators on E an (affine) projection function, or just projection when there is no confusion, if, for X∈M, Π(X) is an affine projection.
We will assume for a projection function Π, Π(X) is of rank dimE−dimEL for X∈M.
The following claim is the key to our generalized RQI to solve eq. 1.1.
Claim 1**.**
Consider a smooth feasible set M defined by C(X)=0, C′(X) is of full rank for X∈M, and Π:M→Lin(E,E) is a projection function. Let F be a smooth function from M to E satisfying
[TABLE]
For an initial point X0∈M sufficiently close to a solution X∗ of F(X)=0, the iteration Xi+1=r(Xi,ηi) with r satisfies eq. 1.6 and ηi∈TXiM=TXi solving the system (1.8), (1.9) below
[TABLE]
is well-defined and converges quadratically to X∗ if the system is non-degenerated at X∗.
To use this claim to solve eq. 1.1, note for any projection Π, L(X,λ)=0 is equivalent to
[TABLE]
If a function R(X) substituted to λ satisfying eq. 1.11 for all X, consider F(X):=L(X,R(X)), then Π(X)F(X)=F(X) is satisfied by design, and we can use the iteration in the claim.
To simplify eq. 1.8, we construct Π as below. Let Lλ(X,λ) be the partial derivative of L in λ, considered as a linear map from EL to E. Assume it is onto, which is the case when the inverse function theorem is satisfied for eq. 1.1, then it has a left inverse Lλ−(X,λ) (any computationally efficient inverse or the Moore-Penrose inverse can be used). Solve for λ=R(X) from
[TABLE]
Then with Π−=LλLλ−, Π(X):=IE−Π−(X,R(X)) is a projection, eq. 1.11 becomes eq. 1.13 and eq. 1.14 holds as Lλ=LλLλ−Lλ by the left inverse assumption
[TABLE]
From eq. 1.14, the expression Π(X)F′(X) in claim 1 with F(X)=L(X,R(X)) reduces to
[TABLE]
Here LX(X,λ) is the partial derivative in X. We will choose R solving eq. 1.12.
For the eigenvalue problem L(X,λ)=AX−Xλ with constraint XTX=1, Lλ(X,λ)=−X, take the left inverse Lλ−=−XT, then eq. 1.12 gives us λ=XTAX and Π(X)=IE−XXT. Using the iteration in claim 1 for F(X)=AX−XXTAX, we get the Rayleigh quotient iteration, since
Lemma 1.1** (Schur form).**
Let P∈Lin(E,E) be an affine projection, B∈Lin(E,E) be an invertible operator, F∈E, and D∈Lin(E,EL). Consider the linear equations in η
[TABLE]
If H∈Lin(EL,E) is such that PH=0 and DB−1H is invertible, η below solves eq. 1.15
[TABLE]
The proof is by direct substitution, see section A.1.
Continuing with the eigenvalue problem, with λ=XTAX, F(X)=AX−XXTAX, LX(X,λ)η=(A−λIE)η for η∈E. In the lemma, set B=A−λIE, H:λ↦Xλ, D:η↦XTη, F=AX−Xλ, then B−1F=X and ζ:=B−1H=(A−λI)−1X, hence
[TABLE]
with (XTζ)−1 is a scalar. Then X+η is proportional to (A−λI)−1X, and the retraction gives us the iteration Xi+1=∥(A−λI)−1Xi∥−1(A−λI)−1Xi.
This “explains” why the Rayleigh quotient iteration has quadratic convergence and relates it to the Newton-type iteration of claim 1. This explanation for the classical Rayleigh quotient essentially appeared in [5, example 5]. Note Π is hidden in the classical RQI.
In lemma 1.1, take H=Lλ(X,R(X)), we have the Schur form for claim 1. We will apply this result to the remaining three problems in section 6. The Schur form’s usage is restricted by the condition LX(X,R(X)) is invertible. In optimization problems (section 6.2), eq. 1.8 is often solved directly. Let QΠ be an orthonormal basis of the nullspace of Lλ−(X,λ) (same as the range of Π(X)), and QT an orthonormal basis of TX, then expressing L and ΠLX in these bases (evaluated at (X,λ)) we have
[TABLE]
the Newton form of the iteration. Note QΠQΠT and Π(X) are both projections to the range of Π(X), hence, QΠQΠTΠ=Π,QΠQΠTL=L which helps verify ΠLXη=QΠQΠTΠLXη=−QΠQΠTL=−L.
To summarize, we have a procedure to solve eq. 1.1 by a generalized RQI
Find a left inverse Lλ−(X,λ) of Lλ(X,λ) such that Lλ−(X,λ)L(X,λ)=0 is easy to solve for λ as a function λ=R(X) in X. Choose a retraction r.
Determine if the Schur form or the Newton form solution of eq. 1.8 is preferable. For the Newton form, also set up Π(X)=IE−Lλ(X,λ)Lλ−(X,λ) with λ=R(X).
The discussion in this section and section 5 are not required to follow the rest of the paper. However, we believe they are interesting interpretations of the results in geometric terms, clarifying several geometric concepts that are often presented more abstractly.
The projection function Π on the feasible set M associates to each feasible point a vector space EX:=Im(Π(X)). The collections EΠ=∪X∈MEX and TM=∪X∈MTX are vector bundles in the geometric literature, assuming appropriate smoothness and constant rank of EX and TX. They could be considered smooth subsets of E2 of pairs (X,ω) with X satisfying a nonlinear constraint C(X)=0 not involving ω, while
the constraint on ω is linear. We have EΠ=TM in the eigenvalue problem, but they are different for the RQI in section 6.1.1.
A function F from M to E satisfying the condition Π(X)F(X)=F(X) is called a section of the bundle EΠ, we require F(X)∈EX.
Our RQI framework could be understood as a construction of a projection function Π defining a vector bundle EΠ, (and the connection ∇ below) and a section F(X)=L(X,R(X)). A section c of the tangent bundle (a function to E with c(X)∈TX) is called a vector field, each c(X),X∈M is a tangent vector at X.
For a section F and a tangent vector η∈TX,X∈M, we define
[TABLE]
Here, Π′(X,η) and F′(X,η) are directional derivatives of Π and F in direction η, the second equality follows by differentiating Π(X)F(X)=F(X) in direction η. If c is a vector field then ∇cF:X↦Π(X)F′(X;c(X)) is a section of EΠ. The expression for ∇cF shows it is a covariant derivative, or connection in the differential geometric sense [6, section 5.2]. Intuitively, a covariant derivative is a rule to take directional derivative of sections, resulting in sections. The Newton increment equation reads ∇ηF(X)=−F(X).
Thus, claim 1 is a Newton method on a vector bundle, see [5, 7, 8] and references therein. Newton method on the tangent bundle is well-studied in the Riemannian optimization literature. The novelty here is the identification of a generalized Rayleigh quotient with the construction of a vector bundle and a connection from the data of eq. 1.1 using a left-inverse, even for nonlinear multipliers, and the convergence analysis with a retraction (see [5, 6] for the tangent bundle case), versus geodesics in [7]. The Schur form iteration for the general case is also new.
1.3. Cubic convergence and Chebyshev iterations
For normal matrices, the classical RQI has cubic convergence and the two-sided iterations in [9, 10] also converge cubically for any matrix. The second goal of the paper is to clarify the condition for cubic convergence. We will define notations more properly in section 1.4, but briefly, the Hessian of a function F could be considered as a map valued in bilinear functions, we use the notation F(2)(X;η[2])=F(2)(X)η[2] to denote the evaluation of this map at X, and of the bilinear function in both linear variables at η. The (partial) Hessian rηη(X,0;η[2]) of the retraction r in the variable η at (X,0) and the directional derivative R′(X;η) also appear in the analysis.
Claim 2**.**
For a function F satisfying Π(X)F(X)=F(X) as in claim 1, and for a tangent vector η∈TX at X, set
[TABLE]
If Π(X∗)G(X∗;η[2])=0 for all η∈TX∗ at a solution X∗, the RQI in claim 1 converges cubically to X∗ when X0 is sufficiently close to X∗. If ηi=XiNT is the RQI increment in claim 1, then the Rayleigh-Chebyshev iteration Xi+1=r(Xi,XiNC) with XiNC=ηi−21Xiτ∈TX and Xiτ∈TXi satisfies
[TABLE]
converges cubically for an initial point X0 sufficiently close to X∗. If F=L(X,R(X)), λ=R(X) arises from an RQI as in section 1.1, we can use GL in place of G:
[TABLE]
The above is a constrained version of the Chebyshev iteration, see section 3. For the eigenvalue problems, expanding r(X,η)=(1+∣η∣2)1/2X+η to Taylor series in η to the second term gives rηη(X,0;η[2])=−X∣η∣2. In section 6, we use this to verify cubic convergence for several eigenvalue RQIs. The term F′(X)rηη(X,0;η[2]) often vanishes at a solution X∗ for homogeneous tensor/eigenvalue problems. In section 5, we relate it to a connection constructed from r, and give an interpretation of the Chebyshev term as a second covariant derivative.
We apply the general analysis here to construct new RQIs for the tensor eigenpair problem in section 6.1. A unitary version of the Schur form RQI over C finds all complex eigenpairs and also identifies the real pairs as a by-product in the generic case, improving on previous work [11]. We find new complex pairs for the Motzkin polynomial [12, example 5.9], completing the eigenpair count for this classical example. We verify cubic convergence for several eigenvalue problems, propose a cubic convergence iteration for the generalized eigenvalue problem, and derive Chebyshev iterations in a few interesting cases. A natural question is if we can develop a framework for constrained homotopy continuation [13], using RQI-type iterations. We hope this work will lead to further research along this line.
1.4. Notations and outline
We mimic the convention of [14] for derivatives. We work with a base field F which is R or C. We denote by Fn×m the space of n×m matrices on F, by Lin(V,W) the space of linear map between vector spaces V and W. The zero vector in V is denoted 0V, or just [math]. The inner product of two (column) vectors X1,X2 is X1TX2 or X1∗X2, identifying F1×1 with F.
For a vector-valued function F from an open subset D⊂V to W, by F′ we denote the Jacobian or Fréchet derivative, and the directional derivative at X∈D in direction η is written F′(X)η or F′(X;η), the latter is preferable when we need grouping. Thus, if D is an open subset in V, F′ is an operator-valued function, F′(X)∈Lin(V,W). Higher (partial) derivatives could be considered as a map from D to the space of multilinear maps from V to W. Thus, at X∈D, the l-th-order derivative of F, denoted by F(l)(X) is a l-linear map from V to W. For η1,⋯,ηl∈V, we denote its evaluation as F(l)(X;η1,⋯ηl) or F(l)(X;[η1,⋯ηl]). In the expressions F(l)(X)η[l]=F(l)(X;η[l]), η[l] denotes η being repeated l times. If Φ is an operator value function from V to Lin(W1,W2), with V,W1,W2 are vector spaces and Φ(X)ω denotes its valuation at X∈V operating on ω∈W1, then Φ(l)(X;η1,⋯ηl)ω denotes the l-th order derivative of Φ evaluated in direction η1,⋯ηl and operate at ω. In general, we use the round brackets for base variable evaluations or for groupings of base and directional variables. The semicolumn separates the base variable(s) from the directional derivative variables. The l-terms multi-dimensional Taylor series expansion around X0 is [14, NR 3.3-3]
[TABLE]
Partial derivatives are sometimes denoted by the usual subscript convention, eg LX,Lλ when convenient. However, we will use a position-based notation for more complex formulas. If F is defined using two vector variables X,Y∈D1×D2 for two sets D1,D2, then we write F(X,.) for the map Y↦F(X,Y) with fixed X in the appropriate domain, and write F(.,Y) similarly. The partial derivatives in Y evaluated at Y are F(X,.)′(Y)=FY(X,Y), which is a linear operator, and F(X,.)′(Y;η) denotes its valuation at η. Thus, for a function F and a retraction r, the directional derivative in direction (0,ξ) of F(r(X,η)) is F′(r(X,η);r(X,.)′(η;ξ)) by the chain rule. The notation r(X,.)′(η)ξ=r(X,.)′(η;ξ) is also used.
Denote by AV0↓W0 the restriction of a map A∈Lin(V,W) from two spaces V and W to subspaces V0⊂V,W0⊂W, with A(V0)⊂W0. The inverse, if exists, is denoted by AV0↓W0−1, shorthand for (AV0↓W0)−1.
To focus on the main ideas, we will assume sufficient smoothness, and use the word smooth for short. Most results hold for class C3 or C4.
The main idea of the convergence analysis is to require the retraction r maps an open ball BXf(ρ) of the tangent space of a feasible point Xf (an initial or final point) to an open subset of M in section 2, then translate the problem to one of linear constraint, with a nontrivial retraction s. The linear constraint case in section 3 is the most technical, the proofs are in the appendices. The iterations in section 4 are as described in the introduction, with applications in section 6. The relationship with the second covariant derivative is in section 5.
2. Feasibility perturbation and retractions
We state the definition and prove the basic properties of a retraction in this section, illustrated with the example of the rescaling retraction of the sphere. We also define the rescaling retractions and compute their Taylor expansion in the tangent variable.
2.1. Definition and basic properties
Assume C:E→EL is a smooth map, with Jacobian C′(X) which is ontoEL at any X∈E with C(X)=0. Let M be the corresponding feasible set, the solution set of C(X)=0. This is a multidimensional smooth surface, a manifold. Denote by TM, its tangent bundle, the subset of E2=E×E of pairs (X,η) such that
[TABLE]
For a fixed X∈M, the vector space defined by CX(X)η=0 is the tangent space at X, denoted by TXM, or TX for short subsequently, and η is called a tangent vector. In elementary physics, an element η∈TXM is considered as a velocity vector of a particle moving smoothly on M. The Jacobian of the constraints eq. 2.1 is a block diagonal matrix with two diagonal blocks C′(X), hence is ontoEL2, thus TM is also a smooth manifold. In the next two paragraphs, we summarize the basic geometric facts required, well-known for surfaces and curves.
Locally, M is parametrized by open subsets of ET:=RdimE−dimEL. For X∈M, a local parametrization around X is defined as a smooth map ϕ, injective, from an open subset Ω of ET into E, with X∈ϕ(Ω)⊂M, such that ϕ−1, defined on ϕ(Ω), is continuous, and ϕ′(z) is injective for z∈Ω. The full rank assumption of C′(X) implies there is a local parametrization around X.
Differentiating C(ϕ(z))=0 in direction v∈ET, we get C′(ϕ(z))ϕ′(z)v=0, or ϕ′(z)v is tangent to M at ϕ(z). The dimension count and the injective assumption of ϕ′(z) imply ϕ′(z) is bijective to Tϕ(z). Let Ω2 be an open subset of ET, then for (z,v)∈Ω×Ω2⊂Ω×ET, define
[TABLE]
then Dϕ is a local parametrization of TM⊂E2.
A map r from an open subset T1 of TM to M is smooth at (X,η)∈T1 if for a local parametrization Dϕ on Ω×Ω2 around (X,η) and ϕ1 on Ω1 around r(X,η), the combined map ϕ1−1∘r∘Dϕ from Dϕ−1(T1)∩Ω×Ω2 to Ω1 is smooth. This is independent of parametrizations.
We are interested in a perturbation r mapping a pair (X,η)∈TM to M for η sufficiently small, with r(X,0)=X, since our Newton increments are tangent vectors.
Lemma 2.1**.**
Fix X∈M. If K is a C1 map from a neighborhood U⊂TX of ξ∈TX to M⊂E then for η∈TX, K′(ξ) as a map from TX to E satisfies K′(ξ)η∈TK(ξ), or K′(ξ)(U)⊂TK(ξ).
Proof.
Differentiate C(K(ξ+tη))=0 in t at t=0, we get C′(K(ξ))K′(ξ;η)=0 by the chain rule.
∎
Applying this to feasibility perturbations, for ξ∈TX, then r(X,.)′(ξ) could be considered as a map from TX to Tr(X,ξ). In particular, r(X,.)′(0) maps TX to itself. Note r(X,.)(ξ)′∈Lin(TX,E), but we will denote r(X,.)′(ξ)−1 the inverse, if exists, of r(X,.)′(ξ)TX↓Tr(X,ξ) (recall this means restricting the range of r(X,.)′(ξ) to the image).
Remark 2.1*.*
Technically, it is simplest to consider a retraction as a map from TM onto M satisfying certain requirements on the Taylor series. This works for the rescaling map on the sphere, as (X+η)/∣X+η∣ is defined for any η∈TX. However, we do want to consider, for example, rescaling maps for other constraints where rescaling only works for tangent vectors close enough to 0X. The following makes explicit the requirement that the radius where a perturbation exists is not too small, required in iterations, otherwise, it is adapted from [5, 15].
Definition 2.2**.**
For a point Xf∈M, a retraction around Xf (or a retraction to Ωr) is a smooth map r from an open subset Tr⊂TM to M, with range in M containing a neighborhood Ωr of Xf, and there is ρr>0 such that Tr contains
[TABLE]
If r(X,0) is defined then we require r(X,0)=X and r(X,.)′(0)=ITX.
Remark 2.3*.*
Let ϕ be a local parametrization of M with ϕ(0)=Xf. Requiring the existence of Ωr and ρr such that r is defined in TΩρrr is equivalent to requiring the range of r containing {(ϕ(z)∣∣z∣≤ρ1} and the domain of r contains a set TΩϕ,ρ1,ρ2r for some ρ1,ρ2>0 where
[TABLE]
This follows as ϕ′(z) and ϕ′(z)−1 exist and are continuous for a parametrization, thus bounded uniformly on a bounded set, so we can translate a bound on ∣η∣=∣ϕ′(z)v∣ to a bound on v, and vice versa.
We can normalize a feasibility perturbation to a retraction. If r is feasibility perturbation satisfying the conditions of the retraction except for r(X,.)′(0) is only assumed to be invertible, then r^(X,η):=r(X,(r(X,.)′(0))−1η) is a retraction since r^(X,.)′(0;η)=r(X,.)′(0;(r(X,.)′(0))−1η)=η.
If M is the unit sphere, for a feasible point Xf, the rescaling of Xf+η for η∈TXf maps TXf bijectively to the hemisphere centered at Xf, and this map parametrizes the hemisphere.
We now generalize this result, showing r(Xf,.) parametrizes an open set around Xf.
Proposition 2.1**.**
Assume r is a retraction around a feasible point Xf∈M.
1) There is a radius ρ>0 such that the map Kf:=r(Xf,.):η↦r(Xf,η) from the ball BXf(ρ):={η∈TXf∣∣η∣<ρ} to its image CXf(ρ)⊂M is a local parametrization. The ball BXf(ρ)⊂TXf is called a retraction ball, Kf(BXf(ρ))=CXf(ρ)⊂M is called the retraction cap.
2) Thus, the set TΩKf,ρ1,ρ2r in eq. 2.4 exists, for each Xf∈Ω, there are radii ρ1>0, ρ2>0 such that Kf maps BXf(ρ1) to CXf(ρ1) with invertible Jacobian, and for all X∈CXf(ρ1), and δ∈TXf with ∣δ∣<ρ2, if X=r(Xf,ξ) then r(X,Kf′(ξ;δ)) and Kf−1r(X,Kf′(ξ;δ)) exist.
Proof.
Let ϕ be a local parametrization of M around Xf. By remark 2.3, there are radii r0,Δ0 such that ϕ(B0) is in the range of r, where B0 is the ball radius r0 in ET=RdimE−dimEL, and TΩϕ,r0,Δ0r exists.
Since Kf:=r(Xf,.) is continuous, Kf−1(ϕ(B0)) is open in TXf, thus there is a neighborhood U⊂TXf of 0TXf such that Kf(U)⊂ϕ(B0). Consider the map h:U×ET→ET, h(η,z)=ϕ−1(Kf(η))−z for (η,z)∈U×ET. Since Kf′(0)ξ=ξ, by the chain rule, the partial derivative hη(0,0) is ξ↦ϕ′(0)−1ξ∈TXf for ξ∈TXf, where ϕ′(0) is invertible as a map to TXf. Thus, the implicit function theorem [14, 5.2.4] applies, we have open balls B1⊂B0 centered at 0ET, Bη⊂TXf at 0TXf, and a unique function ηf(z) with h(ηf(z),z)=0 for z∈B0,ηf(z)∈Bη.
We have ϕ−1(Kf(ηf(z))=z or Kf(ηf(z))=ϕ(z). If Y∈ϕ(B1), then η=ηf∘ϕ−1(Y) satisfies Kf(η)=ϕ(ϕ−1(Y))=Y. Since Kf is continuous, Kf−1(ϕ(B1)) is open, and its intersection with B1 is open, containing a ball of radius ρ. Restricting to BXf(ρ), the uniqueness of the implicit function shows Kf is one-to-one, with continuous inverse ηf∘ϕ−1. The remaining properties of parametrizations are verified. This proves 1).
We offer the first clue why the (partial) Hessian of r appears in claim 2.
Proposition 2.2**.**
If A is a smooth function on E with values in a vector space V, assume X∈M and η∈TXM then A′(X)η is only dependent on the values of A on M. If r is a retraction then
[TABLE]
is also only dependent on the values of A on M.
Proof.
The first statement is well-known. Let ϕ be a local parametrization around X defined on Ω⊂ET, and η=ϕ′(0)v for some v∈ET. Consider the function g(t)=A(ϕ(tv)) for t sufficiently small. Then g˙(0)=A′(ϕ(0))ϕ′(0)v=A′(X)η, but g depends on values of A in ϕ(Ω)⊂M only.
If r is a retraction, consider f(t)=A(r(X,tη)). The second derivative at [math] of f is
[TABLE]
by the chain rule. On the other hand, A(r(X,tη)) is dependent on the values of A on M only.
∎
We can get similar statements for higher derivatives. The function s(ξ,δ) below expresses a retraction to X=r(Xf,ξ)∈M,ξ∈TXf in terms of a retraction to a chosen point Xf∈M. It appears when we translate an iteration on M to an iteration on TXf by a change of variable.
Proposition 2.3**.**
For Xf∈⊂M, set Kf:=r(Xf,.), then with ρ1,ρ2>0 as in 2) proposition 2.1,
[TABLE]
exists for ∣ξ∣<ρ1 and ∣δ∣<ρ2 and is a retraction to TXf∈E. Moreover,
[TABLE]
Proof.
Recall for ϕ1,ϕ2∈TX, r(X,.)′(ϕ1;ϕ2) is limt→0t1(r(X,ϕ1+tϕ2)−r(X,ϕ1)). Rewrite the equation for s then differentiate it with respect to δ in direction ϵ∈TXf twice
[TABLE]
Set δ to zero, we get the first three terms of the power series expansion of s(ξ,ϵ),
[TABLE]
where we used r(Kf,ξ),.)′(0;Kf′(ξ;ϵ))=Kf′(ξ;ϵ) as r is a retraction, note (Kf′(ξ))−1 exists in the retraction ball. The statement follows from the Taylor series expansion.
∎
2.2. The rescaling retraction on the unit sphere
The computation of Kf′ for the unit sphere is known in [6, section 8.1].
Proposition 2.4**.**
Consider the sphere M=Sn−1∈Rn (n≥2), defined by the equation XTX=1. For Xf∈Sn−1, the retraction Kf(ξ)=r(Xf,ξ)=(1+∣ξ∣2)1/21(Xf+ξ) defined on Tr=TSn−1 maps ξ∈TXf to the hemisphere {X∈Sn−1∣XfTX>0}. For ρ>0, let CXf(ρ) be the retraction cap, then
[TABLE]
Any ρ1,ρ2>0 satisfying condition 2.15 below satisfies the first condition of 2), proposition 2.1, that is s(Xf;ξ,δ) exists if ∣ξ∣<ρ1,∣δ∣<ρ2.
[TABLE]
Proof.
The Taylor series expansion of (1+∣ξ∣2)−1/2 gives eq. 2.10. Since XfTr(Xf,ξ)=(1+∣ξ∣2)1/21>0, and the function (1+ρ2)−1/2 is decreasing in ρ, the image of Kf is in the hemisphere XfTX>0 and (2.11) follows. We can verify directly Kf−1X on the right-hand side of eq. 2.12 satisfying XfT(Kf−1X)=0, and Xf+Kf−1X=XfTX1X is proportional to X, thus ∣Xf+Kf−1X∣1(Xf+Kf−1X)=X.
A routine derivative gives Kf′. Since r(Kf(ξ),Kf′(ξ,δ)) is proportional to Kf(ξ)+Kf′(ξ,δ), we assume the scaling factor is of the form c(1+∣ξ∣2)1/2, then using XfTξ=XfTδ=0
[TABLE]
which gives us the formula for s. It exists if f1=1+∣ξ∣2−ξTδ>0. If ∣δ∣=ρ is constant, then f1 is smallest if δ=∣ξ∣ρξ, with f1=1+∣ξ∣2−ρ∣ξ∣. The region defined by 1+ρ12−ρ1ρ2>0,ρ1>0,ρ2>0 could be divided into three subregions, the union of two is described in eq. 2.15, characterized by the condition that if 0<z1<ρ1,0<z2<ρ2 then 1+z12−z1z2>0. The remaining subregion with ρ1≥1,ρ2≥2 does not have this property to ensure s(ξ,δ) exists, if ∣ξ∣=z1,∣δ∣=ρ=z2.
∎
2.3. Rescaling retractions
We generalize the rescaling retraction on the sphere to feasible sets with a single constraint.
Proposition 2.5** (Rescaling retraction).**
Assume EL=R, and C′(X;X)=0∈EL=R near Xf∈M. There is a neighborhood Ωr⊂M of Xf and ρr>0 such that the equation in γ∈R
[TABLE]
for (X,η)∈TΩρrr⊂TM (in eq. 2.3) has an implicit function solution with γ=1 at η=0. Then r(X,η)=γ(X+η) is a retraction defined on TΩr. We have the Taylor expansion
[TABLE]
Proof.
Let ϕ be a local parametrization from ΩT⊂ET=Rdim(E)−dim(EL) of M near Xf. Consider h(z,v,γ)=C(γ(ϕ(z)+ϕ′(z)v))∈EL defined on (z,v,γ)∈ΩT×ET×R. Then h(0,0,1)=C(Xf)=0, hγ(0,0,1)=C′(Xf;Xf)=0. Thus, the implicit function theorem applies, giving us a ball D=D1×D2 in ΩT×ET of points {(z,v)} with ∣z∣<ρ1,∣v∣<ρ2 and δγ>0 such that γ can be solved uniquely as a function γϕ(z,v) with γϕ(0,0)=1 for (z,v)∈D and ∣γ−1∣<δγ. The conditions of remark 2.3 for r(X,η)=γ(X+η) follows by properties of the implicit function. By local uniqueness of the implicit solution, γϕ(z,0)=1, or r(X,0)=X for X∈ϕ(D1).
Composing with ϕ−1, γ=γϕ∘ϕ−1 is a function of X and η, its partial derivatives are evaluated by implicit function rule for g(γ,X,η)=C(γ(X+η)). Hence, we can compute r(X,.)′ using
[TABLE]
where ξ∈TX, hence C′(X,ξ)=0. Consider a fixed X, set γη=γ(X,.)′,γηη=γ(X,.)(2). Take directional derivative of C(γ(X+η))=0 in η in tangent directions ξ1,ξ2 consecutively
[TABLE]
At η=0,γ=1, γη(0;ξ)=0 for ξ∈TXM since C′(X,ξ)=0 as before, the above gives us
[TABLE]
Thus, equation (2.17) follows from r(X,.)(2)(0;η[2])=γηη(0;η[2])X and the Taylor formula.
∎
The requirement C′(X;X)=0 means the vector X is not tangent to M. Note that when C(X)=f(X)−1 and f is a homogeneous function of degree n=0, Euler’s identity implies C′(X;X)=nf(X)=n=0, the equation C(γ(X+η))=0 has a simple solution γ=f(X+η)−1/n if η is sufficiently small. The sphere corresponds to f(X)=XTX and n=2. In general, γ is a solution of a scalar equation.
3. Linear constraints and unconstrained case with a projection
This section is the most technical in the paper. The main proofs are deferred to the appendix.
Using the result of section 2, we translate the constrained problem to one of linear constraints, by considering a reference point Xf, and solve F(ξ)=F(r(Xf,ξ))=0 instead of F(X)=0, for ξ∈TXf with the projection Φ(ξ)=Π(r(Xf,ξ)), Φ(ξ)F(ξ)=F(ξ), and the retraction s in proposition 2.3. The feasible set is now called T0 (=TXf). The variable becomes ξ instead of X.
A local linear constrained problem becomes an unconstrained problem if we parametrized the constraint set. Hence, we can consider the unconstrained problem over a vector space T0.
Thus, we will start with two vector spaces T0 and E, with a function F from an open subset Ω from T0 to E, with Jacobian F′. We assume Ω to be convex, and dimT0≤dimE.
When dimT0=dimE, for ξ∗∈Ω, if F(ξ∗)=0 with F′(ξ∗) invertible, an inverse function FI exists, FI(F(ξ))=ξ near ξ∗. For Y∈U⊂E where the inverse function is defined, the k terms Taylor expansion of FI at F(ξi) (assumed to be in U) is
[TABLE]
The Taylor coefficients for FI could be computed by the implicit-valued function theorem, or by power series substitution. We have FI′(Y)η=(F′(ξ))−1η with ξ=FI(Y) and
[TABLE]
Following [4], we can construct an iteration with order-k convergence as follows. For Y=0, FI(0)=ξ∗, and if we set ξi+1=ξi+∑j=1k−1j!1FI(j)(F(ξi);[−F(ξi)][j]),
the residual ξ∗−ξi+1, from the above expansion is O(∣F(ξi)∣k)=O(∣ξi−ξ∗∣k) if we consider a bounded region and F has bounded derivatives. Thus, k-th order convergence follows from the Taylor series of FI by design. The case k=2 corresponds to the Newton method with increment δ2=−(F′(ξ))−1F(ξ), and the case k=3 corresponds to the Chebyshev method, with step
ξC=ξ+δ2−21(F′(ξ))−1F(2)(ξ;δ2[2]).
When dimT0<dimE and F′(ξ∗) is not invertible, assume there is an affine projection Φ such that Φ(ξ)F(ξ)=F(ξ) for ξ∈Ω. We also assume dim(Im(Φ(ξ)))=dim(T0). In proposition 3.1, we define a left inverse F♮ of F′ and compute derivatives of F⋄, playing the role of FI (compare eqs. 3.8 and 3.9 with FI′ and FI(2)). The proof is in section A.3.
Recall the notation AV0↓W0 denoting the restriction of a map A from two spaces V and W to subspaces V0⊂V,W0⊂W, with inverse (if exists) denoted by AV0↓W0−1, shorthand for (AV0↓W0)−1.
Proposition 3.1**.**
Assume the map F from Ω⊂T0 to E and the projection function Φ:Ω↦Lin(E,E) are smooth. Assume for ξ∈Ω
[TABLE]
Set Eξ:=Im(Φ(ξ))⊂E. If Φ(ξ)F′(ξ) is a bijection from T0 to Eξ, for ω∈E, let F♮(ξ)ω∈T0 be the unique solution η of Φ(ξ)F′(ξ)η=Φ(ξ)ω, thus
F♮(ξ)Eξ↓T0=(Φ(ξ)F′(ξ))T0↓Eξ−1 and
[TABLE]
1) For each ξ, the linear map F♮(ξ)∈Lin(E,T0), if exists, is a left inverse of F′(ξ) and
[TABLE]
Assume F♮(ξ) is defined in a domain D1⊂Ω, then it is differentiable and for ξ,η∈T0 and ω∈E
[TABLE]
2) Assume F♮(ξ) is defined in a domain D1⊂Ω. Then for (ξ,Y)∈D1×E
[TABLE]
For ξf∈E, there is an implicit-function solution ξ=F⋄(Y) for the second equation near (ξf,F(ξf))∈D1×E, with Y∈E, ξ∈T0, where F⋄ is of class Ck if F and Φ are. If (ξ,Y) satisfies eq. 3.6 then (ξ,Z) also satisfies eq. 3.6 for Z=Φ(ξ)Y. For ω∈E, we have
[TABLE]
Assume F and Φ are smooth, then for ω∈E
[TABLE]
3) If (IT0−F♮(ξ)Φ′(ξ;.;Y))−1 exists and is bounded in a region D1, and F and Φ are of class Ck, k≥1 with bounded derivatives, then F⋄ is also of class Ck with bounded derivatives.
We now formulate the convergence theorem with a nontrivial retraction. We will see the retraction in proposition 2.3 will appear in RQIs. For the case of the sphere eq. 2.14, it has the form ss:(ξ,δ)↦ξ+1+ξT(ξ−δ)1+∣ξ∣2δ, for ξ,δ∈T0, and is only defined for δ satisfying eq. 2.15. Thus, we need to address retractions of this type.
Theorem 3.1**.**
Assume the function F, the projection Φ are smooth satisfying Φ(ξ)F(ξ)=F(ξ) for ξ∈Ω⊂T0 as in eq. 3.1. 1) Assume (Φ(ξ)F′(ξ))T0↓Eξ−1 is defined in a ball B(ξ∗,ρ) in T0, with F(ξ∗)=0. For ρ,ρ2>0, assume the map s from D1=B(ξ∗,ρ)×B(0,ρ2)⊂T0×T0 to T0 of class C2 has the Taylor expansion in δ
[TABLE]
such that there is a constant cs,2 with ∣s2(ξ,δ)∣≤c2∣δ∣2 in D1. Then there exists a radius ρ3 such that if the starting point ξ0 satisfies ∣ξ0−ξ∗∣<ρ3 then the iteration defined by
[TABLE]
is well-defined and converges quadratically to ξ∗. 2) If s is of class C3 and the 3-term Taylor series of s in δ is given by
[TABLE]
with ∣s3(ξ,δ)∣<cs,3∣δ∣3 and F is of class C3. If Φ(ξ∗)G(ξ∗,δ[2])=0 for all δ∈T0 where
[TABLE]
then the Newton iteration in 1) converges cubically to ξ∗. 3) With the smoothness assumption in 2), but we do not assume Φ(ξ∗)G(ξ∗,δ[2])=0, then there exists a radius ρ4 such that for ∣ξ0−ξ∗∣<ρ4, the following iteration converges cubically to ξ∗
4. Constrained equations and Rayleigh/Rayleigh-Chebyshev iterations
We now consider a feasible set M defined by a full range constraint C(X)=0 with a retraction r. Let Π be an affine projection function from M to Lin(E,E), Π(X)2=Π(X) for X∈M. Denote EX=Im(Π(X)),TX=Null(C′(X)). In the subset of M considered, we assume
[TABLE]
We try to solve F(X)=0 for a function F from M to E satisfying the condition
[TABLE]
If (Π(X)F′(X))TX↓EX is invertible, (i.e. the system in claim 1 is nondegenerate), define
[TABLE]
Proposition 4.1**.**
Fix a feasible point Xf∈M, let r be a retraction around Xf with a retraction cap CXf(ρ)⊂M, parametrized by a retraction ball BXf(ρ)⊂TXf via Kf:=r(Xf,.). Let F be a map from M to E, Π be an affine projection. For X=r(Xf,ξ)∈C(Xf,ρ), consider the Newton increment and step
[TABLE]
assume they are well-defined and XN∈CXf(ρ). For ξ∈BXf(ρ), set Φ(ξ):=Π(r(Xf,ξ)), F(ξ):=F(r(Xf,ξ)). Then the Newton-type iteration ξNf of the problem F(ξ)=0
[TABLE]
with s defined in eq. 2.5 corresponds to the Newton iteration XN. That means
[TABLE]
Proof.
From the chain rule, F′(ξ) and its left-inverse F♮(ξ) in eq. 3.2 are
[TABLE]
where (r(Xf,.)′(ξ)) moves to the front in the inverse. For eqs. 4.8, 4.6 and 4.7, with X=r(Xf,ξ)
[TABLE]
We now translate the theorems in section 3 to results on constrained iterations.
Theorem 4.1** (Local convergence).**
Assume F and Π satisfy eqs. 4.1 and 4.2. Assume F(X∗)=0 and (Π(X∗)F′(X∗))TX∗↓EX∗ is invertible. Let r be a retraction around X∗. Then there is a radius ρ>0 defining a retraction ball BX∗(ρ)⊂TX∗ and cap CX∗(ρ)⊂M such that for X0∈CX∗(ρ), the iteration Xi+1=XiN defined in proposition 4.1 is well-defined and converges to X∗.
If Π(X∗)G(X∗;η[2])=0 for η∈TX∗, where G is a tensor-valued function on M defined by
[TABLE]
then the Newton iteration Xi+1=XiN converges cubically. Define the Chebyshev increment
[TABLE]
Then there is ρC>0 such that for X0∈CX∗(ρC), the Rayleigh-Chebyshev iteration Xi+1=r(Xi,XiNC) converge cubically to X∗.
Proof.
Using proposition 4.1, the Newton case follows from theorem 3.1. Set F(ξ)=F(r(Xf,ξ)) for a feasible point Xf, then differentiate F′(ξ)δ=F′(r(Xf,ξ))r(Xf,.)′(ξ;δ) in direction δ
[TABLE]
if X=r(Xf,ξ),η=r(Xf,.)′(ξ;δ) and G defined in eq. 3.13. Cubic convergence of the Newton algorithm if G vanishes follows. The statement for Rayleigh-Chebyshev also follows.
∎
4.1. Rayleigh quotient iteration for constrained equations
As mentioned, for the equation eq. 1.1, if Lλ(X,λ) is onto, as a linear map from EL to E, then there is a left-inverse Lλ−(X,λ) of Lλ(X,λ). If it is constructed smoothly, then with the Rayleigh quotient λ=R(X) solving Lλ−(X,λ)L(X,λ)=0, the projection Π(X):=IE−Lλ(X,R(X))Lλ−(X,R(X)) satisfies
[TABLE]
Let F(X)=L(X,R(X)), then Π(X)F(X)=F(X), thus, the results of the previous section apply. Using Π(X)Lλ(X,R(X))=0 to simplify Π(X)F′(X) and Π(X)G(X), the following theorem follows from the chain rule. The second derivatives of L are bilinear maps, LXX operates on two copies of E via the variable η below, LXλ operates on E×EL via η∈E and R′(X;η)∈EL, and Lλλ on R′(X;η)[2]∈EL2.
Theorem 4.2**.**
If the Rayleigh quotient λ=R(X) and the projection Π satisfies eq. 4.13, with L and R, C, r, Π are smooth, then with λ=R(X), the Newton increment XNT∈TX of the RQI iteration Xi+1=r(Xi,XNT) is the solution to the equation
[TABLE]
and the convergence criteria for F=L(X,R(X))=0 in theorem 4.1 apply. Let
[TABLE]
Assuming C, r, Π, L,R are smooth. For η∈TX, set ν=r(X,.)(2)(0;η[2]) and
[TABLE]
then Π(X)GL(X;η[2])=0 if and only if Π(X)G(X;η[2])=0 in eq. 4.11. Thus, if Π(X∗)GL(X∗;η[2])=0 for all η∈TX∗ and F♮(X∗) exists for a solution X∗ of F(X)=0, the RQI above converges cubically. If F♮(X∗) exists then the Rayleigh-Chebyshev iteration Xi+1=r(Xi,XNC) with
[TABLE]
converges cubically to X∗ if X0∈CX∗(ρ) for some ρ>0. If LX(X,λ)∈Lin(E,E) is invertible and C′(X)LX(X,λ)−1Lλ(X,λ) is invertible, XNT and Xτ could be computed in Schur form
[TABLE]
If L is affine in λ, L(X,λ)=A(X)−H(X)λ for two functions A(X) and H(X), then LX(X,λ)=A′(X)−H′(X)λ. Dropping the variable name X for brevity, if H− is a left inverse of H, we can take Lλ− to be −H−, giving us a Rayleigh quotient λ=H−A and F=A−HH−A=ΠA with Π=IE−HH−. Thus, in lemma 1.1, we can use A instead of F in the Schur form solution
[TABLE]
5. Cubic convergence and second covariant derivative
In section 1.2, with the projection function Π, we defined the vector bundle EΠ=∪X∈MIm(Π(X)) with a connection(∇cF)(X)=Π(X)F′(X,c(X))∈TX for a sectionF, Π(X)F(X)=F(X) and a vector field c, c(X)∈TX. In the unconstrained case, the Chebyshev iteration
[TABLE]
converge cubically, as seen in section 3. It is natural to ask if there is a relationship between the Rayleigh-Chebyshev term and ∇ in the constrained case. The answer is yes, with the concept of a second covariant derivative.
If C is the constrained function for M, for two vector fields c1,c2, functions from M to E satisfying C′(X)ci(X)=0, i=1,2,X∈M, then c2′(X;c1(X)) is not a vector field, as taking derivative of C′(X)c2(X)=0 in direction c1(X) does not give C′(X)c2′(X;c1(X))=0 but
[TABLE]
An adjustment of the form c2′(X,c1(X))+Γ(X,c1(X),c2(X)) is needed to get a vector field, called a covariant derivative of c2 at X along c1. As seen in section 1.2, a projection function to TM defines a covariant derivative (connection). A retraction r also defines a covariant derivative ∇r:
Proposition 5.1**.**
For a retraction r and two vector fields c1,c2, ∇c1rc2 defined by
[TABLE]
is a vector field, thus ∇r is a covariant derivative.
Proof.
Differentiate C′(r(X,tc1(X)))r(X,.)′(tc1(X);c2(X))=0 (from lemma 2.1) in t then set t=0 we get
The association of a connection with a retraction is discussed in [6, section 8.1.1]. Here, we give an explicit formula. A naive attempt for a geometric expression of the Chebyshev term is to consider an adjustment like 21∇η(∇ηF) , with ∇ is the connection from Π in section 1.2 and η is the Newton increment. To evaluate the outer
∇η, we need the inner to be a section, for this we need to replace η with a vector field. For two vector fields c1,c2
[TABLE]
The last term involves c2′(X;c1(X)), hence dependent on the extensions of η to vector fields. But ∇∇c1rc2F(X)=Π(X)F′(X;∇c1r(c2X)) is well-defined, hence ∇c1,c22F(X) given below is not dependent on derivatives of c1 or c2, it is the second covariant derivative using ∇ and ∇r
[TABLE]
Therefore, we can define 21∇η,η2F(X) as above, for vector fields satisfying c1(X)=c2(X)=η. [16] uses this expression as the third-order adjustment for the geodesic retraction. Compare with eq. 4.11, we have an extra term Π(X)Π′(X,η)F′(X)η. Differentiate Π(Y)Π′(Y;c(Y))F(Y)=0 (eq. A.1) in direction c(Y), Y∈M
[TABLE]
At a solution X∗, F(X∗)=0, the first three terms are zero, hence Π(X∗)Π′(X∗,η)F′(X∗)η=0 for any tangent vector η at X∗ if c(X∗)=η. Thus, this term does not affect the order of convergence.
6. Applications
The codes for these sections are found in folder colab in [17], implemented in Python and Julia (for two examples showing cubic convergence of Rayleigh-Chebyshev iterations).
6.1. Complex tensor eigenpairs
For our purpose, a tensor is a vector-valued homogeneous function. An example of the real tensor eigenpair problem is finding critical points of a scalar homogeneous function T^ under the constraint B^=1. Here, B^ is a scalar homogeneous polynomial, which leads to the equation gradT^=λgradB^. We will focus on equations of the form T=λB explained below, the constraints imposed may be unrelated to B.
We consider a complex-valued homogeneous vector function T, which may not be a gradient of a scalar function. Each entry of T is a scalar homogeneous polynomial of order m−1. The coefficients (ti1⋯im)1≤ij≤n,1≤j≤m of entry Ti1,i1=1⋯n can be arranged to what is called a multidimensional array (also called a tensor), we call T a tensor of dimension n order m. For k vectors Xn−k+1⋯Xn of size n in the base field, contraction means taking sum over the last k indices of (ti1⋯im) multiply by entries of Xj’s, j=n−k+1⋯n, the uncontracted entries are written I. Write X[k] for X repeated k times. Abuse of notation, T(X[m−1])=T(I,X[m−1]) is the vector homogeneous function evaluated at X, its Jacobian is (m−1)T(I,I,X[m−2]), while T^(X[m]) is the scalar function in the symmetric case, where gradT^=mT (We mostly follows [11]).
Let B be a homogeneous polynomial vector function of order d−1, the tensor eigenpair problem in eq. 1.4 solves for pairs (X,λ) satisfying T(Z[m−1])=λB(Z[d−1]).
We will focus on the case d=2, much of what is discussed here holds for d≥2. The number of complex eigenpairs is given as by ∑i=0n−1(m−1)i(d−1)n−i−1 ([13, 12]), with certain counting convention described there. An important case is the Z-pairs [3], [18], where B(Z)=Z
[TABLE]
with the normalization ZTZ=1 in the real case [3]. For the complex case, we normalize by Z∗Z=1 (∗ is the Hermitian transpose), and call the pairs unitary Z-pairs (UZ-pairs).
There are a few methods to find all real eigenpairs [13, 19], with the first citation using the homotopy method is state-of-the-art, which also finds complex pairs. In [11], the authors proposed a RQI-type algorithm, called O-NCM, which empirically finds all real Z-pairs in less time than the homotopy method, however, there is no certification that all real pairs are found, as there is no formula to count real pairs. Using the RQI method described in this paper, we extend the O-NCM method to the complex case, using the count ∑(m−1)i for complex (Z-pairs) to decide if we have found all eigenpairs, if all pairs are distinct, up to scaling.
If m>2, we can assume λ is real, or EL=R, by scaling λ by exp(−iθ) and Z by exp(−iθ/(m−2)), with θ is the angle of λ in polar coordinates. We treat Z as a real vector of dimension 2n.
From C(Z)=Z∗Z−1, we have C′(Z)η=2Re(Z∗η), thus a tangent vector satisfies Re(Z∗η)=0. We have Lλ(Z,λ)(δ)=−Zδ, δ∈R. A left inverse is Lλ−:ω↦−Re(Z∗ω)∈EL,ω∈E, thus
[TABLE]
To use theorem 4.2, we solve for the Rayleigh quotient λ=R(Z) from Lλ−(X,λ)L(Z,λ)=0, or Re(Z∗T(Z[m−1])−λReZ∗Z)=0
[TABLE]
The projection is Π(Z)ω=ω−ZRe(Z∗ω) for ω∈E. The Newton increment ZNT satisfies
[TABLE]
The Newton form eq. 1.17 is the complex version of the algorithm O-NCM op. cit. The tangent space is the same as the range EZ of Π(Z), both are nullspace of ω↦ReZ∗ω. In eq. 4.21, with A=T(Z[m−1]), H=Z, set ζ=LX−1H,ν=LX−1A, the Schur form solution is
[TABLE]
The corresponding Schur form in the real case is different from algorithm NCM in [11]. The authors found O-NCM outperforms NCM. Performance comparison between our RQI Newton and Schur forms is inconclusive, however, we focus on the Schur form for ease of implementation.
In algorithm 2, the initial point Z0 on the unitary sphere Z0∗Z0=1 is chosen randomly. To make sure we only count distinct pairs under the equivalent relation, we keep a table tracking all eigenpairs we have found within our search, identifying pairs where Z is scaled by a (m−2)-th root of 1. This algorithm finds all real pairs if the pairs are isolated. The approach has the advantage of a quadratic convergence algorithm for an individual pair. It does not work well with nonisolated zero, but some remedies could be applied. The speed to find all pairs depends on the distribution of the pairs; generally it works well for randomly generated tensors.
In UZPairsEigenTensor.ipynb in [17] (containing calculations for all examples here), we compare with examples in [19]. For several examples in that paper, RQI outperforms by a large time factor. For nonisolated eigenpairs or infinite pairs, both approaches need modifications. We will not go into details, but note the case where T is the gradient of the Motzkin’s polynomial T^(Z[6])=z04z1+z02z14+z26−3z02z12z22 [12, example 5.9], we found new complex pairs. The authors found 25 eigenpairs out of the 31 expected pairs. The 6 missing pairs are complex pairs, (satisfying ZTZ=0, the authors used the E-pair normalization ZTZ=1) with eigenvalues 121, eigenvectors of the form (z0,−zˉ0,0), 4z04=−1 (2 equivalent pairs) and 163, with eigenvectors (±2−1,±2−1,22) (4 equivalent pairs). Up to equivalence, we found 6 real pairs with eigenvalue [math], while the paper counted 14. A perturbation, adding a small diagonal tensor shows there are indeed 14 (8 complex and 6 real) pairs of small eigenvalues that collapse to the 6 real pairs, the eigenvectors (1,0,0) and (0,1,0) each has multiplicity 5, and each eigenvector (±31,±31,31) has multiplicity 1 for the count of 14. See [13, table 7] or [17] for the remaining pairs.
6.1.1. B eigenpairs
For the general problem L(X,λ)=T(X[m−1])−B(X[d−1])λ, it is easy to derive several RQIs based on different constraints, e.g. by using a linear constraint CTZ=1, where C is a randomly generated real vector [13], thus EL=C, which we will not go into detail.
We discuss briefly the case where T,B are symmetric, T=(1/m)T^′,B=(1/d)B^′ where T^,B^ are scalar homogeneous polynomials of order m and d, over R. Assuming B^(X[d])=1 has a nonempty solution set. We can use the constraint B^(X[d])=1 with the rescaling/projection retraction discussed in section 2.3. The tangent space at X is defined by the equation B(X[d−1])Tη=0. We can take Lλ−(X,λ)=XT, resulting in the Rayleigh quotient R(X)=T^(X[m]). The projection is defined by Π(X)ω=ω−B(X[d−1])XTω, hence Im(Π(X))=EX is defined by XTω=0. The RQI step can be computed as above.
To analyze cubic convergence, R′(X∗)η=mT(X∗[m−1])Tη=mλB(X∗[d−1])Tη=0, at an eigenpair (X∗,R(X∗) and η∈TX∗. By eq. 2.17, rηη(X,0)η[2] is proportional to X,
[TABLE]
From eq. 4.16, only ΠLXX remains in GL. Thus, for m=d=2, we have a cubically convergence iteration for the generalized eigenvalue problemAX−BXλ=0 for symmetric matrices A and B (T(X)=AX,B=BX).
[TABLE]
if B is invertible, where Xrscl is the rescaling of X∈Rn to satisfy XTBX=1. Table 1 compares RQI and Rayleigh-Chebyshev iteration for m=3,d=2 (see also [17]).
6.2. Constrained optimization
Recall in this problem, L(X,λ)=∇f(X)−C′(X)Tλ for a real-valued function f, Lλ(X,λ)=−C′(X)T. Let CX=C′(X) then the Moore-Penrose left inverse is Lλ−(X)=−(CXCXT)−1CX, and the equation Lλ−(X)L(X,λ)=0 implies λ=(CXCXT)−1CX∇f(X) which is known classically. This implies Π(X)=IE−CXT(CXCXT)−1CX. Newton’s method of this type has been studied before [20, 21], using the Newton form to solve for XNT. Let Q be an orthogonal basis of the range of Π(X), (which is the nullspace of CX), constructed from the QR decomposition of CXT. Then QTQ=I and Π(X)=QQT. In eq. 1.17, QΠ=QT=Q, and the Newton increment reduces to
[TABLE]
with QT∇f(X) called the projected gradient, QTLX(X,λ)Q the projected Hessian (the manifold literature calls Π(X)∇f(X) the projected gradient and Π(X)LX(X,λ) the projected Hessian). The Schur complement method is also used, see [22, Chapter 10].
6.3. Eigenvector and eigenvector with a constant term
We have shown our RQI with the constraint XTX−1=0 is the classical RQI for the eigenvalue problem, with LX(X,λ)η=(A−λI)η, Lλ(X,λ)=−X. With R(X)=XTAX, R′(X)η=ηTAX+XTAη=ηT(A+AT)X. Thus,
[TABLE]
When A is normal, at an eigenvector (X∗,λ) we have AX∗=ATX∗=λX∗, hence R′(X∗)η=2ληTX∗=0, thus G(X∗)=0, therefore we have cubic convergence.
When A is not normal, at an eigenvector X∗, only 2ηTATX∗ may be nonzero. To the extent we have tested, the Rayleigh-Chebyshev iteration seems not competitive for generic matrices, as although it eventually has cubic convergence, its area of convergence seems smaller than RQI.
For nonnormal matrices, the two-sided Rayleigh quotient algorithm in [9] has cubic convergence. It is a special case of the two-sided nonlinear eigenvalue algorithm in section 6.4.
For b∈Rn,b=0, an RQI for L(X,λ)=AX−λX−b=0 could be derived similarly. LX,Lλ,Lλ− are as above, the Rayleigh quotient is λ=XTAX−XTb. In the resulting Schur form iteration, LX−1L=X−(A−λI)−1b. Define Zunit=Z/∥Z∥ for Z∈Rn then
[TABLE]
The Chebyshev iteration is more stable than the eigenvector case as (A−λI) is often invertible near a solution. While requiring less iterations, it still does not outperform the RQI in total time. This does not rule out applications to special matrices. The expression for GL is unchanged, with η in eq. 6.6 and R′(X,η)=ηT(A+AT)X−ηTb. The RQI does not converge cubically even for symmetric matrices as ηTAX=ηTb near a solution. The Chebyshev increment is
[TABLE]
and the Chebyshev iteration is Xi+1=(Xi+ηC)unit. The last two columns of table 1 shows the RQI and Rayleigh-Chebyshev iterations for this case.
6.4. Nonlinear eigenvalue problem
A nonlinear RQI appeared in algorithm 4.9 in [2], proposed in [10]. We consider E=Rn, EL=R. Recall L(X,λ)=P(λ)X where P is a square matrix with polynomial entries.
Assume the unit sphere constraint C(X)=21(XTX−1). For η∈E, we have C′(X;η)=XTη and LX(X,λ)η=P(λ)η for η∈E, Lλ(X,λ)δ=P′(λ)Xδ for δ∈R. Assuming XTP′(λ)X=0, Lλ−(X,λ)ω=XTP′(λ)XXT is a left inverse of Lλ(X,λ). The equation Lλ−(X,λ)L(X,λ)=0 for the Rayleigh quotient implies XTP(λ)X=0, and λ=R(X) is solved from here. The projection is
[TABLE]
Applying algorithm 1 in Schur form, LX(X,λ)−1L(X,λ)=X,LX(X,λ)−1Lλ(X,λ)=P(λ)−1P′(λ)X,
[TABLE]
Thus, η+X is proportional to P(λ)−1P′(λ)X, and together with the retraction, the iteration could be given as Xi+1=∣P(λi)−1P′(λi)Xi∣P(λi)−1P′(λi)Xi.
For cubic convergence analysis, from eq. 4.16
[TABLE]
At a solution, P(λ∗)X∗=0, thus the last term vanishes. From the implicit function theorem
[TABLE]
If P is normal, R′(X∗,η)=0 at a solution, hence GL(X∗)=0, implying cubic convergence.
The two-sided iteration. Consider E=R2n and EL=R2, X=[uTvT]T∈E, u,v∈Rn and λ=[λ1,λ2]T∈EL. Set C(X)=21[uTu−1,vTv−1]T∈R2 and
[TABLE]
In matrix form, LX(X,λ)=P^(λ), Lλ and a left inverse Lλ− are
We have C′(X)=bdiag(uT,vT)∈R2×2n where bdiag denotes a rectangular block diagonal matrix and LX−1L=X. In algorithm 1
[TABLE]
where η is the Newton increment. Thus, components of X+η are proportional to P(λ1)−1P(λ1)u and P(λ2)T)−1P(λ2)Tv. Hence, the Newton step is
[TABLE]
For λ∈R, vTP(λ)u=uTP(λ)Tv, so in eq. 6.10 we can choose λ1=λ2, the resulting iteration converges to left and right eigenvectors of the same eigenvalue. We have cubic convergence in this case (if λ1=λ2, the iteration may still converge quadratically to different eigenvalues). In fact, if λ(u,v) solves vTP(λ)u=0, differentiate in the tangent direction η=(ηu,ηv)
[TABLE]
At an eigenvalue λ∗ with right and left eigenvectors u∗,v∗, the first two terms vanish, hence λ′(X∗,η)=0 if v∗TP′(λ∗)u∗=0. For X∗=(u∗,v∗),λ∗=(λ∗,λ∗), we have GL(X∗,λ∗)=0 since
[TABLE]
7. Concluding remarks
We gave an effective procedure to derive Rayleigh quotient iterations for constrained equations with nonlinear multipliers and provided a clear analysis of second and third-order convergence. The theory developed here explains classical cubic convergence results. It also gives an effective algorithm to find all complex tensor eigenpairs. In future work, we will provide a Kantorovich’s type theorem [8]. The non-isolated zeros case could also be studied.
Let T0 and E be two vector spaces, Ω⊂T0 be an open subset, Φ:Ω→Lin(E,E) be a smooth projection function. If Z∈Im(Φ(ξ))⊂E, hence Φ(ξ)Z=Z, then for η∈T0
[TABLE]
Proof.
Differentiate Φ(ξ)2Z=Z in direction η, we get Φ(ξ)Φ′(ξ;η)Z+Φ′(ξ;η)Φ(ξ)Z=0. Apply Φ(ξ) to both sides we get 2Φ(ξ)Φ′(ξ;η)Z=0.
∎
The first two equalities in 1) are immediate. Identify E and T0 with Rn and Rk so we can define transpose, if F♮ is defined, then Φ(ξ)F′(ξ) is of rank k, and B(ξ):=(Φ(ξ)F′(ξ))TΦ(ξ)F′(ξ)∈Lin(T0,T0) is invertible. The equation for F♮ could be written B(ξ)F♮(ξ)ω=(Φ(ξ)F′(ξ))TΦ(ξ)ω for ω∈E. Since B is differentiable, ξ↦B(ξ)−1 is differentiable, thus F♮(ξ) is differentiable.
For eq. 3.5,
differentiated Φ(ξ)F′(ξ)F♮(ξ)ω=Φ(ξ)ω in ξ in direction η then apply F♮(ξ)
[TABLE]
For 2), since F♮(ξ)Φ(ξ)=F♮(ξ), Φ(ξ)Y−F(ξ)=0 implies
[TABLE]
The converse follows by simplifying Φ(ξ)F′(ξ)F♮(ξ)[Y−F(ξ)]=0, using Φ(ξ)F(ξ)=F(ξ).
Consider G:(ξ,Y)↦F♮(ξ)[Y−F(ξ)], mapping D1×T0 to T0. If (ξ,Y) satisfies F(ξ)=Φ(ξ)Y
[TABLE]
for η∈T0, where using eq. 3.5 to expand (F♮)′, we note F♮(ξ)[Y−Φ(ξ)Y]=0, hence the first and last terms in eq. 3.5 vanish, then use eq. A.1. To apply the implicit function theorem for G, note that F♮(ξ)Φ′(ξ;.;Z)=0 for Z=F(ξ), hence, IT0−F♮(ξ)Φ′(ξ;.;Z)=IT0 is invertible, and near Z it is still invertible, thus we have eq. 3.7 from
[TABLE]
Differentiate the above in Y in direction ω1∈E, noting ξ=F⋄(Y) (and not constant) we have
[TABLE]
For Y=F(ξ)=:Z,ω1=ω, using eqs. 3.8 and A.1 to simplify the above
[TABLE]
Let ν=Φ′(ξ;F♮(ξ)ω)F(ξ)−F′(ξ)F♮(ξ)ω. Differentiate Φ(ξ)F(ξ)=F(ξ) in direction F♮(ξ)ω
[TABLE]
Thus, ν=−Φ(ξ)F(ξ)′F♮(ξ)ω. By eq. A.1,
F♮(ξ)Φ′(ξ;F♮(ξ)ω)ν=0. Let ψ=(ω+Φ′(ξ;F♮(ξ)ω)Z). Use eq. 3.5 and F♮(ξ)ψ=F♮(ξ)ω to expand F⋄(2)(Z;ω[2]) further to
[TABLE]
This proves eq. 3.9. Higher derivatives of F⋄ could be computed by repeated differentiation of F⋄′. Inductively, its derivatives could be expressed algebraically in terms of derivatives of F,Φ and powers of (IT0−F♮(ξ)Φ′(ξ;.)Y)−1, giving us the Ck condition with bounded derivatives.
From the mean value theorem, if F′(ξ) is bounded by bF′ in a domain then
[TABLE]
For 1), from the Taylor formula of F⋄ at ξi in 2) of proposition 3.1, set R2 to be the Taylor remainder
[TABLE]
in a ball in D1 where F⋄ is defined with bounded derivatives. Boundedness and bilinearity of F⋄(2) implies there is cF⋄>0 such that ∣F⋄(2)(ξ;δ[2])∣≤cF⋄∣δ∣2 for δ∈T0. Since ξi+1=s(ξi,δ2,i),
[TABLE]
provided δ2,i is defined and ∣δ2,i∣<ρ2. Thus, in D1, ∣δ2∣=∣F♮(ξ)F(ξ)∣≤∥F♮(ξ)∥∣F(ξ)∣≤bF♮bF′∣ξ−ξ∗∣ if ∥F♮(ξ)∥ and ∣F′(ξ)∣ are bounded by bF♮ and bF′, respectively. Together, we have
[TABLE]
For ξi to be well-defined for all i, we also need ∣δ2,i∣<ρ2. Thus, if ∣ξ0−ξ∗∣<ρ3 with
[TABLE]
then δ2,i and ξi are well-defined, ∣ξi−ξ∗∣ decreases with i, {ξi} converges quadratically to ξ∗.
For 2), write s2(ξi,δ2,i)=21s(ξi,.)(2)(0,δ2,i[2])+s3(ξi,δ2,i) and add 21F♮(ξ∗)G(ξ∗,δ2,i[2])=0 in eq. A.2
[TABLE]
if we bound F⋄(3)(.,[δ1,δ2,δ3])≤cF⋄,3∣δ1∣∣δ2∣∣δ3∣, hence the integral by cF⋄,3∫01(1−t)2∣F(ξi)∣∣F(ξi)∣2dt using the mean value theorem for F⋄(2) and F(ξ∗)=0, and similarly for the terms involving s3 and ∣s(ξ∗,.)[2]−s(ξi,.)[2]∣, eventually get an expression of the form c∣ξ∗−ξi∣3. It remains to estimate the expression Q below using eq. 3.9, adding −2F♮(ξi)Φ′(ξi,F♮(ξi)F(ξi))F(ξi)=0 from eq. A.1
[TABLE]
The first term is zero since F(ξ∗)=0, and the other terms are bounded by the mean value theorem, for example, for the last term, consider the mean value of
f(t)=F♮(ξ∗)Φ′(ξt,F♮(ξt)F(ξi))F(ξi) with ξ(t)=ξi+t(ξ∗−ξi), which gives us a term (ξ∗−ξi), together with the two terms F(ξi) and boundedness of derivatives giving an O(∣ξ∗−ξi∣3) estimate.
For 3), with R3 denotes the third-order Taylor remainder of F⋄ centered at F(ξi), from eqs. 3.8 and 3.9, (the term with 2F♮Φ′F is zero by eq. A.1)
[TABLE]
Let s3 be the third Taylor remainder for s, expand δ3,i, add and subtract 21F♮(ξi)Φ(2)(ξi,δ2,i[2])F(ξi)
[TABLE]
We show each term of the last expression above is O(∣ξi−ξ∗∣3). For R3, in a sufficiently small ball, there is a constant cF⋄,3 such that ∣F⋄(3)(ξ,δ[3])∣≤cF⋄,3∣δ∣3 for δ∈T0 and
[TABLE]
For the next term, bound ∥F♮(ξi)∥ by a constant in a ball, Φ(2)(ξ,δ2,i[2]) by bΦ,2∣δ2,i∣2 for a constant bΦ,2, and F(ξi) by bF′∣ξi−ξ∗∣. Since s(ξi,.)(2) is bilinear, expand δ3,i=δ2,i+(δ3,i−δ2,i), noting ∣δ2,i∣<c3∣ξ∗−ξi∣,
∣δ3,i−δ2,i∣=21∣F♮(ξ)G(ξi;δ2,i[2])≤c4∗∣δ2,i∣2≤c4∣ξ∗−ξi∣2 for constants c3,c4, the third term is bounded by
[TABLE]
The s3 term is third order. The choice of the initial point for the series to be well-defined is similar to the quadratic case.
Bibliography22
The reference list from the paper itself. Each links out to its DOI / PubMed record.
1[1] E. Hazan, T. Koren, A linear-time algorithm for trust region problems., Math. Program. 158 (2016) 363––381. doi:https://doi.org/10.1007/s 10107-015-0933-y . · doi ↗
2[2] S. Güttel, F. Tisseur, The nonlinear eigenvalue problem, Acta Numer. 26 (2017) 1–94. doi:10.1017/S 0962492917000034 . · doi ↗
3[3] L. Qi, Eigenvalues and invariants of tensors, Journal of Mathematical Analysis and Applications 325 (2) (2007) 1363 – 1377. doi:https://doi.org/10.1016/j.jmaa.2006.02.071 . · doi ↗
4[4] M. Nechepurenko, On Chebyshev’s method for functional equations, Uspekhi Mat. Nauk 9 (2) (1954) 163––170.
5[5] R. L. Adler, J. Dedieu, J. Y. Margulies, M. Martens, M. Shub, Newton’s method on Riemannian manifolds and a geometric model for the human spine, IMA Journal of Numerical Analysis 22 (3) (2002) 359–390.
6[6] P.-A. Absil, R. Mahony, R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, NJ, USA, 2007.
7[7] J. Wang, Convergence of Newton’s method for sections on Riemannian manifolds, Journal of Optimization Theory and Applications 148 (2011) 125–145.
8[8] O. Ferreira, B. Svaiter, Kantorovich’s Theorem on Newton’s Method in Riemannian Manifolds, Journal of Complexity 18 (1) (2002) 304–329.