A Riemannian Derivative-Free Polak-Ribiere-Polyak Method for Tangent Vector Field
Teng-Teng Yao, Zhi Zhao, Zheng-Jian Bai, Xiao-Qing Jin

TL;DR
This paper introduces a derivative-free optimization method on Riemannian manifolds for finding zeros of tangent vector fields, combining a Polak-Ribiere-Polyak approach with a hybrid Newton method, supported by convergence analysis and numerical experiments.
Contribution
It proposes a novel Riemannian derivative-free optimization algorithm with a hybrid scheme and convergence guarantees, addressing tangent vector field zero-finding problems.
Findings
The method converges globally under mild conditions.
Numerical experiments demonstrate improved efficiency.
The hybrid approach outperforms standalone methods.
Abstract
This paper is concerned with the problem of finding a zero of a tangent vector field on a Riemannian manifold. We first reformulate the problem as an equivalent Riemannian optimization problem. Then we propose a Riemannian derivative-free Polak-Ribi\'ere-Polyak method for solving the Riemannian optimization problem, where a non-monotone line search is employed. The global convergence of the proposed method is established under some mild assumptions. To further improve the efficiency, we also provide a hybrid method, which combines the proposed geometric method with the Riemannian Newton method. Finally, some numerical experiments are reported to illustrate the efficiency of the proposed method.
| DIM. | CT. | IT. | NF. | Res0. | Res. | |
| 1000 | 29535 | 0.6938 s | 131.7 | 137.7 | 1.5558 | |
| 2000 | 59535 | 2.5411 s | 147.5 | 154.1 | 1.5714 | |
| 3000 | 89535 | 5.1273 s | 179.4 | 185 | 1.5780 | |
| 4000 | 119535 | 7.8231 s | 176.3 | 185.7 | 1.5746 | |
| 5000 | 149535 | 14.0162 s | 186.9 | 195.9 | 1.5710 | |
| 6000 | 179535 | 19.1962 s | 188.3 | 198.1 | 1.5707 | |
| 7000 | 209535 | 25.5629 s | 179.4 | 189.8 | 1.5765 | |
| 8000 | 239535 | 33.7059 s | 176.3 | 197.1 | 1.5816 | |
| 9000 | 269535 | 41.5631 s | 186.9 | 194.1 | 1.5789 | |
| 10000 | 299535 | 54.4226 s | 188.3 | 202.7 | 1.5754 | |
| DIM. | CT. | IT. | NF. | Res0. | Res. | |
| 20 | 59790 | 3.9187 s | 186.1 | 193.7 | 1.2880 | |
| 40 | 119180 | 5.7383 s | 169.0 | 175.2 | 1.8110 | |
| 60 | 178170 | 9.1466 s | 170.9 | 179.1 | 2.2136 | |
| 80 | 236760 | 11.8402 s | 165.5 | 176.1 | 2.5571 | |
| 100 | 294950 | 15.9621 s | 155.2 | 165.6 | 2.8397 | |
| 120 | 352740 | 17.1358 s | 151.5 | 160.9 | 3.1074 | |
| 140 | 410130 | 19.2442 s | 139.3 | 148.9 | 3.3493 | |
| 160 | 467120 | 21.7834 s | 134.7 | 143.9 | 3.5464 | |
| 180 | 523710 | 24.5269 s | 131.8 | 142.4 | 3.7413 | |
| 200 | 579900 | 30.2070 s | 157.5 | 149.1 | 3.9330 | |
| DIM. | CT. | IT. | NF. | Res0. | Res. | |
| 200 | 5535 | 0.1362 s | 100.6 | 112.4 | ||
| 400 | 11535 | 0.4595 s | 114.6 | 128.2 | ||
| 600 | 17535 | 1.5889 s | 135.5 | 149.1 | ||
| 800 | 23535 | 1.5889 s | 124.8 | 138.6 | ||
| 1000 | 29535 | 2.6262 s | 139.3 | 156.9 | ||
| 2000 | 59535 | 13.6532 s | 219.3 | 235.5 | ||
| 3000 | 89535 | 36.7627 s | 276.1 | 291.9 | ||
| 4000 | 119535 | 63.2379 s | 275.2 | 292.6 | ||
| 5000 | 149535 | 120.2924 s | 307.0 | 325.6 | ||
| DIM. | CT. | IT. | NF. | Res0. | Res. | |
| 20 | 39790 | 8.6164 s | 170.7 | 186.1 | ||
| 40 | 79180 | 13.0390 s | 196.6 | 212.8 | ||
| 60 | 118170 | 17.9773 s | 211.5 | 228.9 | ||
| 80 | 156760 | 22.1530 s | 198.1 | 215.3 | ||
| 100 | 194950 | 31.0476 s | 241.0 | 258.6 | ||
| 120 | 232740 | 38.5245 s | 238.3 | 258.7 | ||
| 140 | 270130 | 42.8672 s | 232.4 | 249.8 | ||
| 160 | 307120 | 53.2935 s | 252.0 | 270.6 | ||
| 180 | 343710 | 48.4526 s | 203.9 | 223.1 | ||
| 200 | 379900 | 60.0266 s | 222.1 | 240.3 | ||
| DIM. | CT. | IT. | NF. | Res0. | Res. | |
|---|---|---|---|---|---|---|
| 100 | 5050 | 0.0159 s | 5.9 | 7.0 | ||
| 200 | 20100 | 0.0432 s | 6.2 | 7.2 | ||
| 300 | 45150 | 0.1038 s | 6.4 | 7.4 | ||
| 400 | 80200 | 0.2603 s | 6.5 | 7.5 | ||
| 500 | 125250 | 0.4486 s | 6.6 | 7.6 | ||
| 600 | 180300 | 0.6918 s | 6.3 | 7.3 | ||
| 700 | 245350 | 1.0513 s | 6.4 | 7.4 | ||
| 800 | 320400 | 1.5852 s | 6.6 | 7.6 | ||
| 900 | 405450 | 2.1248 s | 6.6 | 7.6 | ||
| 1000 | 500500 | 2.8110 s | 6.5 | 7.5 |
| PRP-Newton | CT. | IT. | NF. | NCG. | Res0. | Res. | ||
|---|---|---|---|---|---|---|---|---|
| 1000 | PRP Step | 0.0970 s | 13 | 20 | ||||
| Newton Step | 3.5740 s | 12 | 13 | 1073 | ||||
| PRP Step | 0.4780 s | 85 | 92 | |||||
| Newton Step | 1.3960 s | 2 | 3 | 425 | ||||
| 2000 | PRP Step | 0.1720 s | 13 | 18 | ||||
| Newton Step | 15.9840 s | 16 | 17 | 1761 | ||||
| PRP Step | 1.1880 s | 95 | 100 | |||||
| Newton Step | 6.5630 s | 2 | 3 | 705 | ||||
| 3000 | PRP Step | 0.5150 s | 13 | 22 | ||||
| Newton Step | 46.9220 s | 22 | 23 | 1967 | ||||
| PRP Step | 3.3630 s | 114 | 123 | |||||
| Newton Step | 14.8350 s | 2 | 3 | 621 | ||||
| 4000 | PRP Step | 0.7770 s | 13 | 20 | ||||
| Newton Step | 155.6490 s | 38 | 39 | 3940 | ||||
| PRP Step | 5.3730 s | 114 | 121 | |||||
| Newton Step | 33.8150 s | 2 | 3 | 859 | ||||
| 5000 | PRP Step | 1.1880 s | 13 | 18 | ||||
| Newton Step | 315.0050 s | 49 | 50 | 5008 | ||||
| PRP Step | 13.7560 s | 184 | 189 | |||||
| Newton Step | 136.9180 s | 5 | 6 | 2154 | ||||
| PRP-Newton | CT. | IT. | NF. | NCG. | Res0. | Res. | ||
| 20 | PRP Step | 0.3950 s | 12 | 21 | ||||
| Newton Step | 48.8860 s | 23 | 24 | 2605 | ||||
| PRP Step | 2.9630 s | 132 | 141 | |||||
| Newton Step | 15.7150 s | 4 | 5 | 838 | ||||
| 40 | PRP Step | 0.5300 s | 13 | 18 | ||||
| Newton Step | 104.7220 s | 30 | 31 | 3771 | ||||
| PRP Step | 3.4950 s | 98 | 103 | |||||
| Newton Step | 38.1140 s | 3 | 4 | 1348 | ||||
| 60 | PRP Step | 0.9530 s | 15 | 30 | ||||
| Newton Step | 63.3580 s | 15 | 16 | 1778 | ||||
| PRP Step | 6.4520 s | 121 | 136 | |||||
| Newton Step | 27.8180 s | 2 | 3 | 726 | ||||
| 80 | PRP Step | 1.5960 s | 16 | 27 | ||||
| Newton Step | 128.4430 s | 22 | 23 | 2445 | ||||
| PRP Step | 12.2660 s | 159 | 170 | |||||
| Newton Step | 56.0430 s | 3 | 4 | 1062 | ||||
| 100 | PRP Step | 2.1910 s | 17 | 30 | ||||
| Newton Step | 251.5150 s | 25 | 26 | 3773 | ||||
| PRP Step | 11.0250 s | 105 | 118 | |||||
| Newton Step | 76.9210 s | 3 | 4 | 1148 | ||||
| PRP-Newton | CT. | IT. | NF. | NCG. | Res0. | Res. | ||
|---|---|---|---|---|---|---|---|---|
| 1000 | PRP Step | 1.2650 s | 81 | 98 | ||||
| Newton Step | 4.6090 s | 3 | 4 | 629 | ||||
| PRP Step | 2.5630 s | 161 | 178 | |||||
| Newton Step | 1.5150 s | 1 | 2 | 189 | ||||
| 2000 | PRP Step | 5.9340 s | 90 | 105 | ||||
| Newton Step | 29.3860 s | 3 | 4 | 855 | ||||
| PRP Step | 10.9610 s | 174 | 189 | |||||
| Newton Step | 9.1750 s | 1 | 2 | 265 | ||||
| 3000 | PRP Step | 13.8460 s | 98 | 111 | ||||
| Newton Step | 64.1920 s | 3 | 4 | 860 | ||||
| PRP Step | 26.5980 s | 197 | 210 | |||||
| Newton Step | 20.1030 s | 1 | 2 | 260 | ||||
| 4000 | PRP Step | 34.0490 s | 138 | 155 | ||||
| Newton Step | 110.148 s | 3 | 4 | 850 | ||||
| PRP Step | 67.5570 s | 289 | 306 | |||||
| Newton Step | 34.1720 s | 1 | 2 | 257 | ||||
| 5000 | PRP Step | 37.3010 s | 91 | 116 | ||||
| Newton Step | 214.2700 s | 3 | 4 | 936 | ||||
| PRP Step | 75.7800 s | 199 | 224 | |||||
| Newton Step | 54.7750 s | 1 | 2 | 276 | ||||
| PRP-Newton | CT. | IT. | NF. | NCG. | Res0. | Res. | ||
| 20 | PRP Step | 5.8470 s | 95 | 121 | ||||
| Newton Step | 14.6980 s | 2 | 3 | 461 | ||||
| PRP Step | 14.6830 s | 199 | 225 | |||||
| Newton Step | 9.2240 s | 1 | 2 | 239 | ||||
| 40 | PRP Step | 7.9220 s | 109 | 128 | ||||
| Newton Step | 65.6650 s | 3 | 4 | 1799 | ||||
| PRP Step | 16.3930 s | 235 | 254 | |||||
| Newton Step | 8.9040 s | 1 | 2 | 240 | ||||
| 60 | PRP Step | 7.2970 s | 95 | 114 | ||||
| Newton Step | 46.2810 s | 4 | 5 | 1142 | ||||
| PRP Step | 13.6400 s | 179 | 198 | |||||
| Newton Step | 9.8130 s | 1 | 2 | 240 | ||||
| 80 | PRP Step | 11.2390 s | 94 | 111 | ||||
| Newton Step | 50.2640 s | 3 | 4 | 847 | ||||
| PRP Step | 22.1340 s | 193 | 210 | |||||
| Newton Step | 15.2620 s | 1 | 2 | 252 | ||||
| 100 | PRP Step | 15.1400 s | 104 | 121 | ||||
| Newton Step | 148.2740 s | 6 | 7 | 2104 | ||||
| PRP Step | 28.7670 s | 212 | 229 | |||||
| Newton Step | 18.3350 s | 1 | 2 | 257 | ||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsIterative Methods for Nonlinear Equations · Advanced Optimization Algorithms Research · Statistical and numerical algorithms
A Riemannian Derivative-Free Polak-Ribiére-Polyak Method for Tangent Vector Field
Teng-Teng Yao Department of Mathematics, School of Sciences, Zhejiang University of Science and Technology, Hangzhou 310023, People’s Republic of China ([email protected]). The research of this author is supported by the National Natural Science Foundation of China (No. 11701514).
Zhi Zhao Department of Mathematics, School of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People’s Republic of China ([email protected]). The research of this author is supported by the National Natural Science Foundation of China (No. 11601112).
Zheng-Jian Bai Corresponding author. School of Mathematical Sciences and Fujian Provincial Key Laboratory on Mathematical Modeling & High Performance Scientific Computing, Xiamen University, Xiamen 361005, People’s Republic of China ([email protected]). The research of this author is partially supported by the National Natural Science Foundation of China (No. 11671337), the Natural Science Foundation of Fujian Province of China (No. 2016J01035), and the Fundamental Research Funds for the Central Universities (No. 20720180008).
Xiao-Qing Jin Department of Mathematics, University of Macau, Macao, People’s Republic of China ([email protected]). The research of this author is supported by the research grant MYRG2016-00077-FST from University of Macau.
Abstract
This paper is concerned with the problem of finding a zero of a tangent vector field on a Riemannian manifold. We first reformulate the problem as an equivalent Riemannian optimization problem. Then we propose a Riemannian derivative-free Polak-Ribiére-Polyak method for solving the Riemannian optimization problem, where a non-monotone line search is employed. The global convergence of the proposed method is established under some mild assumptions. To further improve the efficiency, we also provide a hybrid method, which combines the proposed geometric method with the Riemannian Newton method. Finally, some numerical experiments are reported to illustrate the efficiency of the proposed method.
Keywords. Tangent vector field, Riemannian manifold, Polak-Ribiére-Polyak method, non-monotone line search.
2010 AMS subject classifications. 65K05, 90C30, 90C56.
1 Introduction
Let be a finite-dimensional Riemannian manifold and let be the Riemannian metric on with its induced norm . Let denote the Riemannian connection on induced by the Riemannian metric . Let be the tangent space of at a point and be the tangent bundle of . In this paper, we aim to find a zero of a continuously differentiable tangent vector field , i.e., find such that
[TABLE]
where is the zero tangent vector of .
Such smooth tangent vector fields arise in many applications such as geodesic convex optimizations on Riemannian manifolds where the gradients of the convex objective functions are geodesic monotone vector fields [11, 27], statistical principal component analysis where the Oja’s flow leads to the Oja’s vector field [29, 30], the discretized Kohn-Sham (KS) total energy minimization in electronic structure calculations [6, 26, 32], and the trace ratio optimization in the linear discriminant analysis (LDA) for dimension reduction [28, 41, 42] where the corresponding eigenvector-dependent nonlinear eigenvalue problems are smooth tangent vector fields, etc.
In particular, for multivalued monotone tangent vector fields on Hadamard manifolds, several proximal point algorithms have been proposed in [17, 23, 34, 35, 36], where the convergence analysis is investigated under some different assumptions. However, these proximal point algorithms are mainly restricted to finding zeros of monotone tangent vector fields.
For smooth tangent vector fields on general Riemannian manifolds, Riemannian Newton method was widely studied (see for instance [1, 3, 14, 24]). In [1, Section 6.1], Absil et al. presented a geometric Newton method for solving (1.1): Given current , solve the Riemannian Newton equation
[TABLE]
for and set
[TABLE]
where is a retraction defined on [1, Definition 4.1.1] and for , is the restriction of to . Here, denotes the Jacobian of at a point , which is a linear operator from to defined by [1, p.111]
[TABLE]
With respect to the Riemannian metric , the adjoint of is defined by
[TABLE]
The quadratic convergence of the Riemannian Newton method was established under the nonsingularity assumption of the Jacobian of at a solution point [1, Theorem 6.3.2]. In [2], Absil et al. also proposed a geometric Newton method for finding a zero of Oja’s vector field.
The advatange of a geometric Newton method lies in its quadratic convergence. However, it is often computationally costly to solve the Riemannian Newton equation, especially when the Jacobian is ill-conditioned. In the case of large-scale problems, the Jacobians of some tangent vector fields (e.g., monotone tangent vector fields on Hadamard manifolds) may not be easily available. Finally, the convergence of the Riemannian Newton method also depends on the starting point. Therefore, it is indispensable to find an efficient globally convergent Jacobian-free method for solving (1.1), especially for large-scale problems.
In recent years, some derivative-free optimization methods have been proposed for solving nonlinear systems of equations of the form of defined on Euclidean spaces [8, 9, 10, 16, 25, 39, 40], where is a continuously differentiable mapping. These methods use at the current iterate as a search direction and their global convergence are guaranteed by using some non-monotone line search techniques. These methods need not to form the Jacabian matrices and require a small storage space and thus are applicable to solving large-scale nonlinear systems of equations. Sparked by this, in this paper, we propose a Riemannian Derivative-Free Polak-Ribiére-Polyak (PRP) conjugate gradient method for solving (1.1). The global convergence is established under some assumptions. We apply the proposed method to finding zeros of Oja’s vector fields, the tangent vector field corresponding to the trace ratio optimization problem, and monotone tangent vector fields on Hadamard manifolds accordingly. Finally, we combine the proposed method with the Riemannian Newton method to get a solution of high accuracy.
The remaining part of this paper is organized as follows. In section 2, we present a Riemannian derivative-free PRP conjugate gradient method for solving (1.1). In section 3, we give the global convergence of the proposed method under some basic assumptions. In section 4, the proposed method is used to find zeros of tangent vector fields for some practical applications. In section 5, we present a hybrid method. Finally some concluding remarks are given in section 6.
2 A Riemannian Derivative-free Polak-Ribiére-Polyak Method
We first recall the Riemannian nonlinear conjugate gradient method for solving the following optimization problem
[TABLE]
where is a continuously differentiable function. A nonlinear conjugate gradient method aims to update the current iterate by
[TABLE]
where the step length is determined by a line search. The search direction is given by
[TABLE]
where is a scalar, is the Riemannian gradient of at the point , and is a vector transport associated with the retraction [1, Definition 8.1.1]. In particular, for the Riemannian PRP method in [1, p.182], the parameter is given by
[TABLE]
For more Riemannian nonlinear conjugate gradient methods, one may refer to [1, 15, 31, 33, 38, 44, 45]. However, for any Riemannian nonlinear conjugate gradient method for solving problem (2.1), the Riemannian gradient of is needed.
To solve (1.1), it is natural to consider the following minimization problem
[TABLE]
Since is continuously differentiable, the function is also continuously differentiable. By the definition of Riemannian gradient and using the compatibility of Riemannian connection with the Riemannian metric , we have
[TABLE]
for all . Thus,
[TABLE]
In order to apply the Riemannian PPR method determined by (2.2), (2.3), and (2.4) for solving problem (2.5), we need the Riemannian gradient of . By using (2.6), to calculate the Riemannian gradient of at the current iterate , we need to compute the adjoint of the Jacobian of at . If the Jacobian of is not available or numerically expensive to calculate, then it is unsuitable to directly apply a Riemannian nonlinear conjugate gradient method to problem (2.5).
In the following, we propose a derivative-free PRP method for solving (1.1). This is motivated by the derivative-free PRP method for solving a large-scale nonlinear system of equations of the form with being continuously differentiable [25], where the search direction uses and at the current iterate and the previous iterate and a non-monotone line search is used. In particular, we use the PRP method defined by (2.2), (2.3), and (2.4) to problem (2.5), where the Riemannian gradients of at the current iterate is replaced by the values of the tangent vector field at , and a Riemannian nonmonotnoe line search is employed. We now describe a Riemannian derivative-free PRP algorithm for solving (1.1) as follows.
Algorithm 2.1
(A Riemannian derivative-free PRP method (RDF-PRP))**
Step 0.
Choose an initial point , , , , , . Let , , . Select a positive sequence such that
[TABLE]
Step 1.
If , then stop. Otherwise, go to Step 2.
Step 2.
Set
[TABLE]
where
[TABLE]
Step 3.
*Determine such that *
if
[TABLE]
then set
[TABLE]
Else if
[TABLE]
then set
[TABLE]
Step 4.
Choose and compute
[TABLE]
Step 5.
Replace by and go to Step 1.
We point out that the non-monotone line search in Step 3 of Algorithm 2.1 can be seen as a generalization of that in [7, 25]. Let
[TABLE]
By following the similar proof of [7, Lemma 2.2], for any choice of , we have for all that
[TABLE]
Then condition (2.10) or (2.12) holds for some . This shows that the line search step in Algorithm 2.1 is well-defined.
3 Convergence Analysis
In this section, we establish the global convergence of Algorithm 2.1. To facilitate the analysis, we define the pullback of through by [1, p.55]
[TABLE]
For , let denote the restriction of to , i.e.,
[TABLE]
We also need the following assumptions.
Assumption 3.1
The level set is bounded, where is a constant defined by (2.7). 2. 2.
In some neighborhood of , is continuously differentiable and is Lipschitz continuous with respect to the vector transport , i.e., there is a constant such that
[TABLE]
for all and with . 3. 3.
The vector transport is bounded, i.e., there exists a constant such that
[TABLE]
for all and .
Under Assumption 3.1, the tangent vector field is bounded on , i.e., there exists a constant such that
[TABLE]
By using the continuity of and Assumption 3.1, it is easy to see that the level set is closed and bounded and thus is a compact subset of . According to Corollary 7.4.6 in [1], there exist two scalars and such that
[TABLE]
for and with . If the vector tansport is chosen as the parallel translation, then the inequality in (3.2) holds as an equality with . Specially, if is an embedded Riemannian submanifold of a Euclidean space and is defined through orthogonal projection by the formula (8.10) in [1, p.174], then the inequality in (3.2) holds with .
To establish the global convergence Algorithm 2.1, we need the following preliminary lemma whose proof is similar to that of Lemma 3.2 and Lemma 3.3 in [25], and thus we omit it here.
Lemma 3.2
Suppose Assumption 3.1 is satisfied. Then the sequence generated by Algorithm 2.1 is contained in . In addition, we have
[TABLE]
For the search directions generated by Algorithm 2.1, we have the following result. The proof can be can seen as a generalization of [25, Lemma 3.4].
Lemma 3.3
Suppose Assumption 3.1 is satisfied and Algorithm 2.1 generates infinite sequences and . If the sequence is bounded below by a constant , i.e.,
[TABLE]
then there exists a constant such that
[TABLE]
and
[TABLE]
Proof: We first prove (3.6). From (2.9), (2.11), (2.13), (3.1), and (3.4) it follows that for all sufficiently large,
[TABLE]
It follows from (2.8), (2.9), (3.2), (3.3), and (3.8) that for all sufficiently large,
[TABLE]
By Lemma 3.2, for any constant , there exists an index such that
[TABLE]
This, together with (3.9), yields for all ,
[TABLE]
Hence, (3.6) holds by setting .
Next, we prove (3.7). By using Lemma 3.2, (3.1), (3.5), (3.6), and (3.8) we have for all sufficiently large that
[TABLE]
This, together with Lemma 3.2, yields (3.7).
On the global convergence of Algorithm 2.1, we have the following theorem. The proof is a generalization of Theorem 3.5 in [25] and Theorem 1 in [9].
Theorem 3.4
Suppose Assumption 3.1 is satisfied and Algorithm 2.1 generates an infinite sequence . Then we have
[TABLE]
or for any accumulation point of
[TABLE]
Proof: Let be any accumulation point of the sequence . One may assume that , taking a subsequence if necessary. By Lemma 3.2 we have
[TABLE]
If , then it follows from (3.11) that
[TABLE]
In this case, since is continuous and .
In the following, we assume that and . From Step 3 of Algorithm 2.1, taking a subsequence if necessary, we may assume that satisfies neither (2.10) nor (2.12) for large enough and thus
[TABLE]
and
[TABLE]
From (2.15) we have . This, together with (3.12), yields
[TABLE]
From Assumption 3.1, it follows that
[TABLE]
By hypothesis, the sequence is bounded from below. Thus, the condition (3.5) in Lemma 3.3 is satisfied. By using Lemma 3.3 and (3.3) we have
[TABLE]
where . Hence,
[TABLE]
Let for all . It follows from (2.8) that for ,
[TABLE]
By the mean value theorem and using (3.15), there exists a such that
[TABLE]
This, together with (3.14), yields
[TABLE]
By Lemma 3.2 and using the smoothness and local rigidity condition of retraction [1, (4.2)] we have
[TABLE]
where denotes the identity operator on . From Lemma 3.2, Lemma 3.3, (2.11), (2.13), and using the smoothness and consistency condition of vector transport [1, Definition 8.1.1], we obtain
[TABLE]
By using Lemma 3.2, Lemma 3.3, (3.17), (3.18), and taking limits in (3.16), we have
[TABLE]
Similarly, we can deduce from (3.13) that
[TABLE]
The equality (3.10) follows from the last two inequalities.
From Theorem 3.4, we have the following corollary.
Corollary 3.5
Suppose Assumption 3.1 is satisfied and Algorithm 2.1 generates an infinite sequence . Let be an accumulation point of . If
[TABLE]
then .
Suppose is a strongly geodesic monotone vector field [12, 23, 27, 37] and is continuously differentiable, then there exists a positive constant such that
[TABLE]
By Corollary 3.5 and (3.19), we have the following result.
Corollary 3.6
Suppose or is strictly monotone and continuously differentiable, and Algorithm 2.1 generates an infinite sequence . Then every accumulation point of is a zero of .
4 Numerical Experiments
In this section, we consider the application of Algorithm 2.1 to finding zeros of Oja’s vector fields [2], the tangent vector field corresponding to the trace ratio optimization problem [28, 41, 42], and monotone tangent vector fields on Hadamard manifolds [13]. All numerical tests are carried out using MATLAB R2010a on a Lenovo Laptop Intel(R) Core(TM)2 i7-8550U with a 1.80 GHz CPU and 16-GB RAM.
In our numerical tests, we set , , , , , and for all . In Step 3 of Algorithm 2.1, the initial steplength is set to be
[TABLE]
where
[TABLE]
The stopping criterion for Algorithm 2.1 for solving (1.1) is set to be [9, 25]
[TABLE]
where , , and denotes the dimension of .
For comparison purposes, we repeat our experiments over different random generated problems. In our numerical tests, ‘DIM.’ denotes the dimension of , ‘CT.’, IT.’, and ‘NF.’ mean the averaged total computing time in seconds, the averaged number of iterations, the averaged number of function evaluations at the final iterates of our algorithm accordingly. In addition, ‘Res0.’ and ‘Res.’ denote the averaged residual at the initial iterates and final iterates of our algorithm, respectively.
Example 4.1
We consider the problem of finding a zero of Oja’s vector field defined by real symmetric positive-definite matrices [2]. Let be a symmetric positive-definite matrix, and be a positive integer smaller than . The Oja’s vector field associated with is given by [2, 29, 30]
[TABLE]
Suppose is of full column rank. Then is a solution to if and only if the column space of is an invariant subspace of and is orthonormal (i.e., ) (see [2, Proposition 2.1]), where is the identity matrix of order . Thus we can restrict the nonlinear map to the compact Stiefel manifold [1, p.26], i.e., . The dimension of the Stiefel manifold is equal to [1, p.27]. Let , which is the orthogonal group [1, p.27]. Since for any , the zeros of are degenerate, thus Newton’s method can’t be applied directly. To apply Riemannian Newton’s method, one need to restrict to the Grassmann manifold , while the application of Algorithm 2.1 to finding a zero of (4.1) does not need the nondegeneracy condition of the zeros of . Let be endowed with induced Riemannian metric from , i.e.,
[TABLE]
The retraction on is chosen as [1, p.59]
[TABLE]
for all and , where is the factor of the QR decomposition of with . Here, the set denotes the set of all real matrices with linearly independent columns, , and is an upper triangular matrix with strictly positive diagonal elements. The orthogonal projection of a matrix onto is given by
[TABLE]
where and for a real square matrix. Since is an embeded submanifold of , we may adopt the vector transport defined by [1, p.174]
[TABLE]
for , where . Thus condition (3.2) in Assumption 3.1 is satisfied.
We consider the problem of finding a zero of the Oja’s vector field defined by (4.1) with varying and . Let be a random matrix generated by the MATLAB built-in functions rand, randn, and qr:
[TABLE]
Thus is a random symmetric positive-definite matrix with uniformly distributed eigenvalues in the interval . The starting points are randomly generated by the MATLAB built-in functions randn and qr:
[TABLE]
Table 4.1 lists the numerical results for Example 4.1. We observe from Table 4.1 that the iteration number and the number of function evaluations do not change obviously with the increase of the dimension of the Stiefel manifold . This indicates that Algorithm 2.1 is stable and suitable for solving large-scale problems.
To further illustrate the effectiveness of our algorithm, in Figure 4.1, we give the convergence history of Algorithm 2.1 for two tests with and . Figure 4.1 depicts the logarithm of the residual versus the number of iterations for finding a zero of Oja’s vector field defined in Example 4.1. The convergence trajectory indicates that the residual decreases steadily as the number of iterations increases.
Example 4.2
We consider the problem of finding a zero of the tangent vector field corresponding to the first-order optimization conditions for the trace ratio optimization problem [28, 41, 42]. Let be real symmetric matrices with being positive-definite and be a positive integer smaller than . The tangent vector field is given by [41, Theorem 2.1]
[TABLE]
where
[TABLE]
and for any real symmetric matrix . We choose the retraction on as in (4.2). The vector transport on is chosen the same as (4.3) and thus condition (3.2) in Assumption 3.1 is satisfied.
We consider the problem of finding a zero of the tangent vector field defined by (4.4) with varying and . Let be random matrices generated by the MATLAB built-in functions rand, randn, orth, diag, and ones [5]:**
[TABLE]
The starting points are randomly generated by the MATLAB built-in functions randn and qr:
[TABLE]
In Table 4.2, we report numerical results for Example 4.2 with varying values of and . In Figure 4.2, we give the convergence history of Algorithm 2.1 for two tests with and . Figure 4.2 depicts the logarithm of the residual versus the number of iterations for finding a zero of the tangent vector field defined in (4.4). We see from Table 4.2 and Figure 4.2 that Algorithm 2.1 is stable and efficient for solving large-scale problems.
Example 4.3
Let denote the set of all real symmetric positive definite matrices. Endowing with the following Riemannian metric
[TABLE]
Thus, is a Hadamard manifold manifold of nonpositive curvature everywhere [22, 36]. The dimension of is equal to [21, Proposition 2.1]. The geodesic monotone vector field is defined by [13]
[TABLE]
The retraction on is chosen as [22, (3.10)]
[TABLE]
for and . The vector transport associated with the above is chosen as [22, (3.13)]
[TABLE]
for and . Thus condition (3.2) in Assumption 3.1 is satisfied.
We consider the problem of finding a zero of the vector field defined by (4.5) with varying . The starting points are randomly generated by the MATLAB built-in functions rand, randn, and qr:
[TABLE]
Table 4.3 shows the numerical results for Example 4.3. We observe from Table 4.3 that Algorithm 2.1 requires only a few iterations and function evaluations for finding an approximate zero of the monotone vector field (4.5) with different values of . This indicates that Algorithm 2.1 is very stable and efficient for solving large-scale problems. In Figure 4.3, we give the convergence history of Algorithm 2.1 for two tests with and . Figure 4.3 depicts the logarithm of the residual versus the number of iterations for finding a zero of the tangent vector field defined in (4.5). The convergence trajectory indicates that the residual decreases very rapidly as the number of iterations increases, which shows the local fast convergence speed of Algorithm 2.1 for solving large-scale problems.
5 Hybrid Method
We note that Algorithm 2.1 is globally convergent. We see from the numerical experiments in section 4 that, in general, Algorithm 2.1 converges at a low or medium order of accuracy. To improve the efficiency, one may adopt some hybrid method. A possible strategy is to combine Algorithm 2.1 with the Riemannian Newton method. As noted in section 1, the Riemannian Newton method may be computationally expensive but has quadratic convergence. In particular, one may use Algorithm 2.1 to generate an initial point for the Riemannian Newton method with a relatively low accuracy and then switch to the Riemannian Newton method for finding a solution of high accuracy. A hybrid algorithm for solving (1.1) is described as follows.
Algorithm 5.1
(PRP-Newton Method)**
Step 0.
Choose an initial point , , and , , , , . Let , , . Select a positive sequence such that (2.7) is satisfied.
Step 1.
For , do the RDF-PRP iteration as follows:**
- (a).
Set to be (2.8) where and are given by (2.9).
- (b).
Determine such that if the condition (2.10) is satisfied, then compute from (2.11); else if the condition (2.12) is satisfied, then compute from (2.13).
- (c).
Choose and compute and from (2.14).
- (d).
Stop if .
Step 2.
Set to be the limit point of the RDF-PRP iteration.
Step 3.
For , do the Riemannian Newton iteration as follows:**
- (a).
Apply the conjugate gradient (CG) method [19, Algorithm 10.2.1] to solving
[TABLE]
for such that
[TABLE]
where .
- (b).
Set
[TABLE]
- (c).
Stop if .
We point out that, in Step 3 of Algorithm 5.1, the Riemannian Newton equation is solved inexactly by choosing appropriate value of . In addition, different values of lead to different starting points for the Riemannian Newton method.
For demonstration purpose, we use Algorithm 5.1 to Examples 4.1–4.2, i.e., finding zeros of the tangent vector fields defined by (4.1) and (4.4). To develop the Riemannian Newton method, one need to restrict the tangent vector fields in (4.1) and (4.4) to the Grassmann manifold endowed with the induced Riemannian metric from . The restriction of defined in (4.1) to is given by
[TABLE]
where denotes the equivalent class corresponding to a point . Given and a tangent vector , let denote the horizontal lift of at , where denotes the horizontal space at [43, p.757]. The horizontal lift of J\widehat{F}([X])\big{[}\xi_{[X]}\big{]}\in\mathcal{H}_{X} at is denoted by \overline{J\widehat{F}([X])\big{[}\xi_{[X]}\big{]}}, which has the following form:
[TABLE]
Similarly, the restriction of defined in (4.4) to is given by
[TABLE]
Given a point and a tangent vector , the horizontal lift of J\widehat{F}([X])\big{[}\xi_{[X]}\big{]}\in\mathcal{H}_{X} at is denoted by \overline{J\widehat{F}([X])\big{[}\xi_{[X]}\big{]}}, which is given by
[TABLE]
where
[TABLE]
and
[TABLE]
For the application of Riemannian optimization algorithms on Riemannian quotient manifolds, one can refer to [1, p.86 and p.121] and [43].
Next, we consider the application of Algorithm 5.1 to Examples 4.1–4.2 for different values of and . In our numerical tests, ‘NCG.’ denotes the total number of CG iterations of the Newton step at the final iterate of Algorithm 5.1. In our numerical tests, we set , the parameter pairs are set to be and , respectively, and the other parameters and the starting points are set as in section 4. For simplicity, two different pairs of are tested.
Table 5.1 displays the numerical results for Example 4.1 with different values of and . In Figure 5.1, we give the convergence history of Algorithm 5.1 for two tests of Example 4.1 with and . Figure 5.1 depicts the logarithm of the residual versus the number of iterations for finding a zero of the tangent vector field defined in (4.1). Table 5.2 shows the numerical results for Example 4.2 with different values of and . In Figure 5.2, we give the convergence history of Algorithm 5.1 for for two tests of Example 4.2 with and . Figure 5.2 depicts the logarithm of the residual versus the number of iterations for finding a zero of the tangent vector field defined in (4.4).
We observe from Tables 5.1–5.2 and Figures 5.1–5.2 that, by choosing suitable , Algorithm 2.1 may provide a good initial point for the Riemannian Newton method, which give a high accuracy solution. This shows that the proposed hybrid method is very effective for solving large-scale problems.
6 Conclusions
In this paper, we have proposed a Riemannian Derivative-Free PRP Method for finding a zero of a tangent vector field on a Riemannian manifold. By using a non-monotone line search, the global convergence of the proposed geometric method is established under some mild conditions. To further improve the efficiency, we also provide a hybrid method, which combines the proposed geometric algorithm with the Riemannian Newton method. Numerical tests illustrate the efficiency of the proposed geometric algorithm for large-scale problems. An interesting question is how to choose the stopping tolerance such that the overall computational cost of Algorithm 5.1 is minimized, which needs further study.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] P.-A. Absil, R. Mahony, and R. Sepulchre , Optimization Algorithms on Matrix Manifolds , Princeton University Press, Princeton, 2008.
- 2[2] P.-A. Absil, M. Ishteva, L. Lathauwer, and S. van Huffel , A geometric Newton method for Oja’s vector field , Neural Comput., 21, (2009), pp. 1415–1433.
- 3[3] R. L. Adler, J.-P. Dedieu, J. Y. Margulies, M. Martens, and M. Shub , Newton’s method on Riemannian manifolds and a geometric model for the human spine , IMA J. Numer. Anal., 22 (2002), pp. 359–390.
- 4[4] G. C. Bento and J. X. Cruz Neto , Finite termination of the proximal point method for convex functions on Hadamard manifolds , Optim., 63 (2014), pp. 1281–1288.
- 5[5] Y. F. Cai, Z. G. Jia, and Z. J. Bai , Perturbation analysis of an eigenvector-dependent nonlinear eigenvalue problem wiith applications , ar Xiv:1803.01518, 2018.
- 6[6] H. Chen, X. Dai, X. Gong, L. He, and A. Zhou , Adaptive finite element approximations for Kohn-Sham models , Multiscale Model. Simul., 12 (2014), pp. 1828–1869.
- 7[7] W. Cheng and D. Li , A derivative-free non-monotone line search and its applications to the spectral residual method , IMA J. Numer. Anal., 29 (2009), pp. 814–825.
- 8[8] W. Cheng, Y. Xiao, and Q. J. Hu , A family of derivative-free conjugate gradient methods for large-scale nonlinear systems of equations , J. Comput. Appl. Math., 224 (2009), pp. 11–19.
