A Riemannian Derivative-Free Polak-Ribiere-Polyak Method for Tangent Vector Field

Teng-Teng Yao; Zhi Zhao; Zheng-Jian Bai; Xiao-Qing Jin

arXiv:1901.04700·math.NA·December 20, 2024


TL;DR

This paper introduces a Riemannian derivative-free Polak-Ribiere-Polyak method for finding zeros of tangent vector fields on Riemannian manifolds, together with a hybrid method that combines it with the Riemannian Newton method, and supports both with a convergence analysis and numerical experiments.

Contribution

It proposes a novel Riemannian derivative-free optimization algorithm with a hybrid scheme and convergence guarantees, addressing tangent vector field zero-finding problems.

Findings

1. The method converges globally under mild conditions.
2. Numerical experiments demonstrate improved efficiency.
3. The hybrid approach outperforms standalone methods.

Abstract

This paper is concerned with the problem of finding a zero of a tangent vector field on a Riemannian manifold. We first reformulate the problem as an equivalent Riemannian optimization problem. Then we propose a Riemannian derivative-free Polak-Ribiére-Polyak method for solving the Riemannian optimization problem, where a non-monotone line search is employed. The global convergence of the proposed method is established under some mild assumptions. To further improve the efficiency, we also provide a hybrid method, which combines the proposed geometric method with the Riemannian Newton method. Finally, some numerical experiments are reported to illustrate the efficiency of the proposed method.

Tables (5)

Table 4.1: Numerical results for Example 4.1.

$p = 30$
$m$	DIM.	CT.	IT.	NF.	Res0.	Res.
1000	29535	0.6938 s	131.7	137.7	1.5558	$1.8068 \times 10^{- 4}$
2000	59535	2.5411 s	147.5	154.1	1.5714	$2.5403 \times 10^{- 4}$
3000	89535	5.1273 s	179.4	185	1.5780	$3.1178 \times 10^{- 4}$
4000	119535	7.8231 s	176.3	185.7	1.5746	$3.5450 \times 10^{- 4}$
5000	149535	14.0162 s	186.9	195.9	1.5710	$3.9744 \times 10^{- 4}$
6000	179535	19.1962 s	188.3	198.1	1.5707	$4.3520 \times 10^{- 4}$
7000	209535	25.5629 s	179.4	189.8	1.5765	$4.6722 \times 10^{- 4}$
8000	239535	33.7059 s	176.3	197.1	1.5816	$5.0002 \times 10^{- 4}$
9000	269535	41.5631 s	186.9	194.1	1.5789	$5.3148 \times 10^{- 4}$
10000	299535	54.4226 s	188.3	202.7	1.5754	$5.5995 \times 10^{- 4}$
$m = 3000$
$p$	DIM.	CT.	IT.	NF.	Res0.	Res.
20	59790	3.9187 s	186.1	193.7	1.2880	$2.5059 \times 10^{- 4}$
40	119180	5.7383 s	169.0	175.2	1.8110	$3.5750 \times 10^{- 4}$
60	178170	9.1466 s	170.9	179.1	2.2136	$4.3705 \times 10^{- 4}$
80	236760	11.8402 s	165.5	176.1	2.5571	$5.0355 \times 10^{- 4}$
100	294950	15.9621 s	155.2	165.6	2.8397	$5.6216 \times 10^{- 4}$
120	352740	17.1358 s	151.5	160.9	3.1074	$6.1428 \times 10^{- 4}$
140	410130	19.2442 s	139.3	148.9	3.3493	$6.6242 \times 10^{- 4}$
160	467120	21.7834 s	134.7	143.9	3.5464	$7.0720 \times 10^{- 4}$
180	523710	24.5269 s	131.8	142.4	3.7413	$7.4701 \times 10^{- 4}$
200	579900	30.2070 s	157.5	149.1	3.9330	$7.8585 \times 10^{- 4}$

Table 4.2: Numerical results for Example 4.2.

$p = 30$
$m$	DIM.	CT.	IT.	NF.	Res0.	Res.
200	5535	0.1362 s	100.6	112.4	$5.0376 \times 10^{1}$	$5.3678 \times 10^{- 4}$
400	11535	0.4595 s	114.6	128.2	$7.4573 \times 10^{1}$	$8.1477 \times 10^{- 4}$
600	17535	1.5889 s	135.5	149.1	$9.2751 \times 10^{1}$	$1.0105 \times 10^{- 3}$
800	23535	1.5889 s	124.8	138.6	$1.0726 \times 10^{2}$	$1.1774 \times 10^{- 3}$
1000	29535	2.6262 s	139.3	156.9	$1.2027 \times 10^{2}$	$1.3185 \times 10^{- 3}$
2000	59535	13.6532 s	219.3	235.5	$1.7193 \times 10^{2}$	$1.9084 \times 10^{- 3}$
3000	89535	36.7627 s	276.1	291.9	$2.1109 \times 10^{2}$	$2.3692 \times 10^{- 3}$
4000	119535	63.2379 s	275.2	292.6	$2.4399 \times 10^{2}$	$2.7274 \times 10^{- 3}$
5000	149535	120.2924 s	307.0	325.6	$2.7326 \times 10^{2}$	$3.0436 \times 10^{- 3}$
$m = 2000$
$p$	DIM.	CT.	IT.	NF.	Res0.	Res.
20	39790	8.6164 s	170.7	186.1	$1.4107 \times 10^{2}$	$1.5714 \times 10^{- 3}$
40	79180	13.0390 s	196.6	212.8	$1.9777 \times 10^{2}$	$2.2190 \times 10^{- 3}$
60	118170	17.9773 s	211.5	228.9	$2.4134 \times 10^{2}$	$2.7032 \times 10^{- 3}$
80	156760	22.1530 s	198.1	215.3	$2.7716 \times 10^{2}$	$3.1042 \times 10^{- 3}$
100	194950	31.0476 s	241.0	258.6	$3.0827 \times 10^{2}$	$3.4166 \times 10^{- 3}$
120	232740	38.5245 s	238.3	258.7	$3.3572 \times 10^{2}$	$3.7598 \times 10^{- 3}$
140	270130	42.8672 s	232.4	249.8	$3.6124 \times 10^{2}$	$4.0373 \times 10^{- 3}$
160	307120	53.2935 s	252.0	270.6	$3.8352 \times 10^{2}$	$4.2983 \times 10^{- 3}$
180	343710	48.4526 s	203.9	223.1	$4.0462 \times 10^{2}$	$4.5413 \times 10^{- 3}$
200	379900	60.0266 s	222.1	240.3	$4.2432 \times 10^{2}$	$4.7507 \times 10^{- 3}$

Table 4.3: Numerical results for Example 4.3.

$m$	DIM.	CT.	IT.	NF.	Res0.	Res.
100	5050	0.0159 s	5.9	7.0	$1.3313 \times 10^{3}$	$2.0499 \times 10^{- 4}$
200	20100	0.0432 s	6.2	7.2	$3.7973 \times 10^{3}$	$3.8884 \times 10^{- 4}$
300	45150	0.1038 s	6.4	7.4	$6.9051 \times 10^{3}$	$3.0280 \times 10^{- 5}$
400	80200	0.2603 s	6.5	7.5	$1.0638 \times 10^{4}$	$6.7415 \times 10^{- 5}$
500	125250	0.4486 s	6.6	7.6	$1.4760 \times 10^{4}$	$9.8202 \times 10^{- 5}$
600	180300	0.6918 s	6.3	7.3	$1.9674 \times 10^{4}$	$6.4741 \times 10^{- 5}$
700	245350	1.0513 s	6.4	7.4	$2.4651 \times 10^{4}$	$1.7830 \times 10^{- 4}$
800	320400	1.5852 s	6.6	7.6	$3.0098 \times 10^{4}$	$2.6627 \times 10^{- 4}$
900	405450	2.1248 s	6.6	7.6	$3.6046 \times 10^{4}$	$3.6220 \times 10^{- 4}$
1000	500500	2.8110 s	6.5	7.5	$4.2361 \times 10^{4}$	$2.6249 \times 10^{- 4}$

Table 5.1: Numerical results for Example 4.1.

$p = 30$
$m$	$(ζ_{1}, ζ_{2})$	PRP-Newton	CT.	IT.	NF.	NCG.	Res0.	Res.
1000	$(10^{- 1}, 10^{- 7})$	PRP Step	0.0970 s	13	20		$1.5637$	$9.2121 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	3.5740 s	12	13	1073	$9.2121 \times 10^{- 2}$	$1.2530 \times 10^{- 8}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	0.4780 s	85	92		$1.5637$	$9.6138 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	1.3960 s	2	3	425	$9.6138 \times 10^{- 4}$	$4.2673 \times 10^{- 10}$
2000	$(10^{- 1}, 10^{- 7})$	PRP Step	0.1720 s	13	18		$1.5877$	$9.8514 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	15.9840 s	16	17	1761	$9.8514 \times 10^{- 2}$	$1.4495 \times 10^{- 11}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	1.1880 s	95	100		$1.5877$	$9.7528 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	6.5630 s	2	3	705	$9.7528 \times 10^{- 4}$	$9.9379 \times 10^{- 11}$
3000	$(10^{- 1}, 10^{- 7})$	PRP Step	0.5150 s	13	22		$1.5635$	$8.6702 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	46.9220 s	22	23	1967	$8.6702 \times 10^{- 2}$	$3.6051 \times 10^{- 8}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	3.3630 s	114	123		$1.5635$	$9.9517 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	14.8350 s	2	3	621	$9.9517 \times 10^{- 4}$	$5.1732 \times 10^{- 13}$
4000	$(10^{- 1}, 10^{- 7})$	PRP Step	0.7770 s	13	20		$1.5762$	$8.6654 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	155.6490 s	38	39	3940	$8.6654 \times 10^{- 2}$	$2.3522 \times 10^{- 8}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	5.3730 s	114	121		$1.5762$	$9.9009 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	33.8150 s	2	3	859	$9.9009 \times 10^{- 4}$	$1.6149 \times 10^{- 8}$
5000	$(10^{- 1}, 10^{- 7})$	PRP Step	1.1880 s	13	18		$1.5813$	$9.4306 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	315.0050 s	49	50	5008	$9.4306 \times 10^{- 2}$	$5.9576 \times 10^{- 8}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	13.7560 s	184	189		$1.5813$	$9.8964 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	136.9180 s	5	6	2154	$9.8964 \times 10^{- 4}$	$5.2168 \times 10^{- 11}$
$m = 3000$
$p$	$(ζ_{1}, ζ_{2})$	PRP-Newton	CT.	IT.	NF.	NCG.	Res0.	Res.
20	$(10^{- 1}, 10^{- 7})$	PRP Step	0.3950 s	12	21		$1.2778$	$8.5232 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	48.8860 s	23	24	2605	$8.5232 \times 10^{- 2}$	$2.9885 \times 10^{- 11}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	2.9630 s	132	141		$1.2778$	$9.8691 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	15.7150 s	4	5	838	$9.8691 \times 10^{- 4}$	$1.2869 \times 10^{- 8}$
40	$(10^{- 1}, 10^{- 7})$	PRP Step	0.5300 s	13	18		$1.8237$	$9.9742 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	104.7220 s	30	31	3771	$9.9742 \times 10^{- 2}$	$3.3097 \times 10^{- 9}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	3.4950 s	98	103		$1.8237$	$9.6445 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	38.1140 s	3	4	1348	$9.6445 \times 10^{- 4}$	$2.0956 \times 10^{- 10}$
60	$(10^{- 1}, 10^{- 7})$	PRP Step	0.9530 s	15	30		$2.2435$	$9.6070 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	63.3580 s	15	16	1778	$9.6070 \times 10^{- 2}$	$6.9081 \times 10^{- 10}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	6.4520 s	121	136		$2.2435$	$9.8044 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	27.8180 s	2	3	726	$9.8044 \times 10^{- 4}$	$4.1246 \times 10^{- 9}$
80	$(10^{- 1}, 10^{- 7})$	PRP Step	1.5960 s	16	27		$2.5407$	$9.9187 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	128.4430 s	22	23	2445	$9.9187 \times 10^{- 2}$	$1.6713 \times 10^{- 9}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	12.2660 s	159	170		$2.5407$	$9.8479 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	56.0430 s	3	4	1062	$9.8479 \times 10^{- 4}$	$1.0519 \times 10^{- 11}$
100	$(10^{- 1}, 10^{- 7})$	PRP Step	2.1910 s	17	30		$2.9047$	$9.6005 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	251.5150 s	25	26	3773	$9.6005 \times 10^{- 2}$	$1.9501 \times 10^{- 11}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	11.0250 s	105	118		$2.9047$	$9.9363 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	76.9210 s	3	4	1148	$9.9363 \times 10^{- 4}$	$4.1883 \times 10^{- 8}$

Table 5.2: Numerical results for Example 4.2.

$p = 30$
$m$	$(ζ_{1}, ζ_{2})$	PRP-Newton	CT.	IT.	NF.	NCG.	Res0.	Res.
1000	$(10^{- 1}, 10^{- 7})$	PRP Step	1.2650 s	81	98		$1.2016 \times 10^{2}$	$9.8709 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	4.6090 s	3	4	629	$9.8709 \times 10^{- 2}$	$4.9052 \times 10^{- 10}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	2.5630 s	161	178		$1.2016 \times 10^{2}$	$9.5649 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	1.5150 s	1	2	189	$9.5649 \times 10^{- 4}$	$2.0833 \times 10^{- 10}$
2000	$(10^{- 1}, 10^{- 7})$	PRP Step	5.9340 s	90	105		$1.7128 \times 10^{2}$	$9.8977 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	29.3860 s	3	4	855	$9.8977 \times 10^{- 2}$	$2.4162 \times 10^{- 9}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	10.9610 s	174	189		$1.7128 \times 10^{2}$	$9.8098 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	9.1750 s	1	2	265	$9.8098 \times 10^{- 4}$	$6.9162 \times 10^{- 10}$
3000	$(10^{- 1}, 10^{- 7})$	PRP Step	13.8460 s	98	111		$2.0983 \times 10^{2}$	$9.9004 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	64.1920 s	3	4	860	$9.9004 \times 10^{- 2}$	$7.2418 \times 10^{- 11}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	26.5980 s	197	210		$2.0983 \times 10^{2}$	$9.4199 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	20.1030 s	1	2	260	$9.4199 \times 10^{- 4}$	$8.3461 \times 10^{- 11}$
4000	$(10^{- 1}, 10^{- 7})$	PRP Step	34.0490 s	138	155		$2.4355 \times 10^{2}$	$9.9053 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	110.148 s	3	4	850	$9.9053 \times 10^{- 2}$	$1.4005 \times 10^{- 9}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	67.5570 s	289	306		$2.4355 \times 10^{2}$	$9.7171 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	34.1720 s	1	2	257	$9.7171 \times 10^{- 4}$	$1.3732 \times 10^{- 10}$
5000	$(10^{- 1}, 10^{- 7})$	PRP Step	37.3010 s	91	116		$2.7389 \times 10^{2}$	$9.8258 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	214.2700 s	3	4	936	$9.8258 \times 10^{- 2}$	$1.1501 \times 10^{- 9}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	75.7800 s	199	224		$2.7389 \times 10^{2}$	$9.6717 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	54.7750 s	1	2	276	$9.6717 \times 10^{- 4}$	$1.9560 \times 10^{- 11}$
$m = 2000$
$p$	$(ζ_{1}, ζ_{2})$	PRP-Newton	CT.	IT.	NF.	NCG.	Res0.	Res.
20	$(10^{- 1}, 10^{- 7})$	PRP Step	5.8470 s	95	121		$1.4046 \times 10^{2}$	$9.5643 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	14.6980 s	2	3	461	$9.5643 \times 10^{- 2}$	$4.7077 \times 10^{- 9}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	14.6830 s	199	225		$1.4046 \times 10^{2}$	$9.3379 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	9.2240 s	1	2	239	$9.3379 \times 10^{- 4}$	$2.2436 \times 10^{- 10}$
40	$(10^{- 1}, 10^{- 7})$	PRP Step	7.9220 s	109	128		$1.9775 \times 10^{2}$	$9.9096 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	65.6650 s	3	4	1799	$9.9096 \times 10^{- 2}$	$5.1574 \times 10^{- 10}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	16.3930 s	235	254		$1.9775 \times 10^{2}$	$9.8092 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	8.9040 s	1	2	240	$9.8092 \times 10^{- 4}$	$1.2536 \times 10^{- 10}$
60	$(10^{- 1}, 10^{- 7})$	PRP Step	7.2970 s	95	114		$2.4225 \times 10^{2}$	$9.7040 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	46.2810 s	4	5	1142	$9.7040 \times 10^{- 2}$	$2.5144 \times 10^{- 10}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	13.6400 s	179	198		$2.4225 \times 10^{2}$	$9.8101 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	9.8130 s	1	2	240	$9.8101 \times 10^{- 4}$	$1.5623 \times 10^{- 10}$
80	$(10^{- 1}, 10^{- 7})$	PRP Step	11.2390 s	94	111		$2.7770 \times 10^{2}$	$9.9043 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	50.2640 s	3	4	847	$9.9043 \times 10^{- 2}$	$6.0549 \times 10^{- 10}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	22.1340 s	193	210		$2.7770 \times 10^{2}$	$9.6451 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	15.2620 s	1	2	252	$9.6451 \times 10^{- 4}$	$1.0738 \times 10^{- 10}$
100	$(10^{- 1}, 10^{- 7})$	PRP Step	15.1400 s	104	121		$3.0711 \times 10^{2}$	$9.7853 \times 10^{- 2}$
	$(10^{- 1}, 10^{- 7})$	Newton Step	148.2740 s	6	7	2104	$9.7853 \times 10^{- 2}$	$2.1292 \times 10^{- 11}$
	$(10^{- 3}, 10^{- 7})$	PRP Step	28.7670 s	212	229		$3.0711 \times 10^{2}$	$9.7389 \times 10^{- 4}$
	$(10^{- 3}, 10^{- 7})$	Newton Step	18.3350 s	1	2	257	$9.7389 \times 10^{- 4}$	$4.2447 \times 10^{- 10}$

Equations (168)

$$F(X) = 0_{X},$$

$$JF(X_{k})[\Delta X_{k}] = -F(X_{k})$$

$$X_{k+1} = R_{X_{k}}(\Delta X_{k}),$$

$$JF(X)[\xi_{X}] := \nabla_{\xi_{X}} F, \quad \forall\, \xi_{X} \in T_{X}\mathcal{M}.$$

$$\langle \xi_{X}, (JF(X))^{*}[\eta_{X}] \rangle = \langle JF(X)[\xi_{X}], \eta_{X} \rangle, \quad \forall\, \xi_{X}, \eta_{X} \in T_{X}\mathcal{M}.$$

$$\begin{array}{cc}\min & g(Z)\\ \text{subject to (s.t.)} & Z\in\mathcal{M},\end{array}$$

$$Z_{k+1} = R_{Z_{k}}(\alpha_{k}\Delta Z_{k}),$$

$$\Delta Z_{k}=\begin{cases}-\operatorname{grad} g(Z_{k}), & \text{if } k=0,\\ -\operatorname{grad} g(Z_{k})+\beta_{k}\mathcal{T}_{\alpha_{k-1}\Delta Z_{k-1}}\Delta Z_{k-1}, & \text{if } k\geq 1,\end{cases}$$

$$\beta_{k} = \frac{\langle \operatorname{grad} g(Z_{k}),\, \operatorname{grad} g(Z_{k}) - \mathcal{T}_{\alpha_{k-1}\Delta Z_{k-1}} \operatorname{grad} g(Z_{k-1}) \rangle}{\|\operatorname{grad} g(Z_{k-1})\|^{2}}.$$

$$\begin{array}{cc}\min & f(X) := \frac{1}{2}\|F(X)\|^{2}\\ \text{s.t.} & X\in\mathcal{M}.\end{array}$$

$$\mathrm{D}f(X)[\xi_{X}]$$

$$\operatorname{grad} f(X) = (JF(X))^{*}[F(X)].$$

$$\sum_{k=0}^{\infty} \delta_{k} = \delta < \infty.$$

$$\Delta X_{k} := \begin{cases}-F(X_{0}), & \text{if } k=0,\\ -F(X_{k})+\beta_{k}\mathcal{T}_{\Delta Z_{k-1}}\Delta X_{k-1}, & \text{if } k\geq 1,\end{cases}$$

$$\beta_{k} := \frac{\langle F(X_{k}), Y_{k-1} \rangle}{\|F(X_{k-1})\|^{2}}, \qquad Y_{k-1} := F(X_{k}) - \mathcal{T}_{\Delta Z_{k-1}} F(X_{k-1}).$$

$$f(R_{X_{k}}(\alpha_{k}\Delta X_{k})) \leq \Gamma_{k} + \delta_{k} - t_{1}\alpha_{k}^{2}\|\Delta X_{k}\|^{2} - t_{2}\alpha_{k}^{2} f(X_{k}),$$

$$\Delta Z_{k} := \alpha_{k}\Delta X_{k}, \qquad X_{k+1} := R_{X_{k}}(\Delta Z_{k}).$$

$$f(R_{X_{k}}(-\alpha_{k}\Delta X_{k})) \leq \Gamma_{k} + \delta_{k} - t_{1}\alpha_{k}^{2}\|\Delta X_{k}\|^{2} - t_{2}\alpha_{k}^{2} f(X_{k}),$$

$$\Delta Z_{k} := -\alpha_{k}\Delta X_{k}, \qquad X_{k+1} := R_{X_{k}}(\Delta Z_{k}).$$

$$\Phi_{k+1} = \lambda_{k}\Phi_{k} + 1, \qquad \Gamma_{k+1} = \frac{\lambda_{k}\Phi_{k}(\Gamma_{k} + \delta_{k}) + f(X_{k+1})}{\Phi_{k+1}}.$$

$$\Lambda_{k} := \frac{1}{k+1}\sum_{j=0}^{k}\big(f(X_{j}) + j\,\delta_{j-1}\big) \quad\text{and}\quad \delta_{-1} = 0.$$

$$f(X_{k}) \leq \Gamma_{k} \leq \Lambda_{k}, \qquad \Gamma_{k+1} \leq \Gamma_{k} + \delta_{k}.$$

$$\widehat{f}(\xi) := f(R(\xi)), \quad \forall\, \xi \in T\mathcal{M}.$$

$$\widehat{f}_{X}(\xi_{X}) := f(R_{X}(\xi_{X})), \quad \forall\, \xi_{X} \in T_{X}\mathcal{M}.$$

$$\|F(R_{X}(\xi_{X})) - \mathcal{T}_{\xi_{X}}F(X)\| \leq L\cdot\operatorname{dist}\big(X, R_{X}(\xi_{X})\big),$$

$$\|\mathcal{T}_{\eta_{X}}\xi_{X}\| \leq C\cdot\|\xi_{X}\|,$$

$$\|F(X)\| \leq \tau_{1}, \quad \forall\, X\in\Omega.$$

$$\nu\|\xi_{X}\| \geq \operatorname{dist}\big(X, R_{X}(\xi_{X})\big),$$

$$\lim_{k\to\infty}\alpha_{k}\|\Delta X_{k}\| = 0 \quad\text{and}\quad \lim_{k\to\infty}\alpha_{k}^{2} f(X_{k}) = 0.$$

$$\|F(X_{k})\| \geq \tau, \quad \forall\, k\geq 0,$$


Taxonomy

Topics: Iterative Methods for Nonlinear Equations · Advanced Optimization Algorithms Research · Statistical and numerical algorithms

Full text

A Riemannian Derivative-Free Polak-Ribiére-Polyak Method for Tangent Vector Field

Teng-Teng Yao, Department of Mathematics, School of Sciences, Zhejiang University of Science and Technology, Hangzhou 310023, People’s Republic of China. The research of this author is supported by the National Natural Science Foundation of China (No. 11701514).

Zhi Zhao, Department of Mathematics, School of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People’s Republic of China. The research of this author is supported by the National Natural Science Foundation of China (No. 11601112).

Zheng-Jian Bai (corresponding author), School of Mathematical Sciences and Fujian Provincial Key Laboratory on Mathematical Modeling & High Performance Scientific Computing, Xiamen University, Xiamen 361005, People’s Republic of China. The research of this author is partially supported by the National Natural Science Foundation of China (No. 11671337), the Natural Science Foundation of Fujian Province of China (No. 2016J01035), and the Fundamental Research Funds for the Central Universities (No. 20720180008).

Xiao-Qing Jin, Department of Mathematics, University of Macau, Macao, People’s Republic of China. The research of this author is supported by the research grant MYRG2016-00077-FST from University of Macau.


Keywords. Tangent vector field, Riemannian manifold, Polak-Ribiére-Polyak method, non-monotone line search.

2010 AMS subject classifications. 65K05, 90C30, 90C56.

1 Introduction

Let $\mathcal{M}$ be a finite-dimensional Riemannian manifold and let $\langle\cdot,\cdot\rangle$ be the Riemannian metric on $\mathcal{M}$ with its induced norm $\|\cdot\|$ . Let $\nabla$ denote the Riemannian connection on $\mathcal{M}$ induced by the Riemannian metric $\langle\cdot,\cdot\rangle$ . Let $T_{X}\mathcal{M}$ be the tangent space of $\mathcal{M}$ at a point $X\in\mathcal{M}$ and $T\mathcal{M}:=\cup_{X\in\mathcal{M}}T_{X}\mathcal{M}$ be the tangent bundle of $\mathcal{M}$ . In this paper, we aim to find a zero of a continuously differentiable tangent vector field $F:\mathcal{M}\rightarrow T\mathcal{M}$ , i.e., find $X\in\mathcal{M}$ such that

$$F(X) = 0_{X}, \tag{1.1}$$

where $0_{X}$ is the zero tangent vector of $T_{X}\mathcal{M}$ .

Such smooth tangent vector fields arise in many applications, for example: geodesically convex optimization on Riemannian manifolds, where the gradients of geodesically convex objective functions are geodesically monotone vector fields [11, 27]; principal component analysis in statistics, where Oja’s flow leads to Oja’s vector field [29, 30]; the discretized Kohn-Sham (KS) total energy minimization in electronic structure calculations [6, 26, 32]; and the trace ratio optimization in linear discriminant analysis (LDA) for dimension reduction [28, 41, 42]. In the last two cases, the corresponding eigenvector-dependent nonlinear eigenvalue problems define smooth tangent vector fields.

For multivalued monotone tangent vector fields on Hadamard manifolds, several proximal point algorithms have been proposed [17, 23, 34, 35, 36], and their convergence has been analyzed under various assumptions. However, these proximal point algorithms are mainly restricted to finding zeros of monotone tangent vector fields.

For smooth tangent vector fields on general Riemannian manifolds, the Riemannian Newton method has been widely studied (see, for instance, [1, 3, 14, 24]). In [1, Section 6.1], Absil et al. presented a geometric Newton method for solving (1.1): given the current iterate $X_{k}\in\mathcal{M}$, solve the Riemannian Newton equation

$$JF(X_{k})[\Delta X_{k}] = -F(X_{k})$$

for $\Delta X_{k}\in T_{X_{k}}\mathcal{M}$ and set

$$X_{k+1} = R_{X_{k}}(\Delta X_{k}),$$

where $R$ is a retraction defined on $\mathcal{M}$ [1, Definition 4.1.1] and for $X\in\mathcal{M}$ , $R_{X}$ is the restriction of $R$ to $T_{X}\mathcal{M}$ . Here, $JF(X)$ denotes the Jacobian of $F$ at a point $X\in\mathcal{M}$ , which is a linear operator from $T_{X}\mathcal{M}$ to $T_{X}\mathcal{M}$ defined by [1, p.111]

$$JF(X)[\xi_{X}] := \nabla_{\xi_{X}} F, \quad \forall\, \xi_{X} \in T_{X}\mathcal{M}.$$

With respect to the Riemannian metric $\langle\cdot,\cdot\rangle$ , the adjoint $(JF(X))^{*}:T_{X}\mathcal{M}\to T_{X}\mathcal{M}$ of $JF(X)$ is defined by

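$$\langle \xi_{X}, (JF(X))^{*}[\eta_{X}] \rangle = \langle JF(X)[\xi_{X}], \eta_{X} \rangle, \quad \forall\, \xi_{X}, \eta_{X} \in T_{X}\mathcal{M}.$$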

The quadratic convergence of the Riemannian Newton method was established under the nonsingularity assumption of the Jacobian of $F$ at a solution point [1, Theorem 6.3.2]. In [2], Absil et al. also proposed a geometric Newton method for finding a zero of Oja’s vector field.
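To make the Newton iteration above concrete, here is a minimal sketch under assumptions that are ours, not the paper's: $\mathcal{M}$ is the unit sphere $S^{n-1}\subset\mathbb{R}^{n}$, $F(x)=Ax-(x^{T}Ax)x$ for a symmetric matrix $A$ (a one-column Oja-type field, tangent at every unit vector $x$), the retraction is $R_{x}(\xi)=(x+\xi)/\|x+\xi\|$, and the Jacobian is $JF(x)[\xi]=P_{x}(A\xi)-(x^{T}Ax)\xi$ with $P_{x}=I-xx^{T}$. The tangent Newton equation is solved through an equivalent bordered linear system; all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def matvec(A, x):
    return [dot(row, x) for row in A]

def oja_field(A, x):
    """F(x) = A x - (x^T A x) x; tangent to the unit sphere at a unit vector x."""
    Ax = matvec(A, x)
    rho = dot(x, Ax)
    return [a - rho * b for a, b in zip(Ax, x)]

def retract(x, xi):
    """Projective retraction R_x(xi) = (x + xi) / ||x + xi||."""
    y = [a + b for a, b in zip(x, xi)]
    ny = norm(y)
    return [a / ny for a in y]

def solve(M, b):
    """Dense Gaussian elimination with partial pivoting (fine for small n)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = s / aug[r][r]
    return z

def newton_step(A, x):
    """Solve JF(x)[dx] = -F(x) for a tangent dx, where JF(x)[xi] = P_x(A xi) - rho xi.

    Writing P_x(A xi) = A xi - mu x (mu = x^T A xi) gives the bordered system
    [[A - rho I, -x], [x^T, 0]] [xi; mu] = [-F(x); 0].
    """
    n = len(x)
    rho = dot(x, matvec(A, x))
    F = oja_field(A, x)
    M = [[A[i][j] - (rho if i == j else 0.0) for j in range(n)] + [-x[i]]
         for i in range(n)]
    M.append(list(x) + [0.0])
    z = solve(M, [-v for v in F] + [0.0])
    return z[:n]

def riemannian_newton(A, x0, tol=1e-12, max_iter=20):
    """Iterate X_{k+1} = R_{X_k}(dX_k) until ||F(X_k)|| <= tol."""
    x = retract(x0, [0.0] * len(x0))  # normalize the starting point
    for _ in range(max_iter):
        if norm(oja_field(A, x)) <= tol:
            break
        x = retract(x, newton_step(A, x))
    return x

if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
    x = riemannian_newton(A, [1.0, 1.0, 1.0])
    print(x, norm(oja_field(A, x)))
```

With this particular retraction the Newton update coincides with a normalized Rayleigh quotient iteration step, which is why only a handful of iterations are needed on small examples; each step still requires solving a linear system involving the Jacobian, which is the cost the derivative-free method is designed to avoid.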

The advantage of a geometric Newton method lies in its quadratic convergence. However, solving the Riemannian Newton equation is often computationally costly, especially when the Jacobian is ill-conditioned. For large-scale problems, the Jacobians of some tangent vector fields (e.g., monotone tangent vector fields on Hadamard manifolds) may not be readily available. Moreover, the convergence of the Riemannian Newton method depends on the starting point. Therefore, it is important to develop an efficient, globally convergent, Jacobian-free method for solving (1.1), especially for large-scale problems.

In recent years, several derivative-free optimization methods have been proposed for solving nonlinear systems of equations $G({\bf x})={\bf 0}$ on Euclidean spaces [8, 9, 10, 16, 25, 39, 40], where $G:{\mathbb{R}}^{n}\to{\mathbb{R}}^{n}$ is a continuously differentiable mapping. These methods use $\pm G({\bf x}_{k})$ at the current iterate ${\bf x}_{k}$ as a search direction, and their global convergence is guaranteed by non-monotone line search techniques. Since they do not form Jacobian matrices and require little storage, they are well suited to large-scale nonlinear systems of equations. Motivated by this, we propose in this paper a Riemannian derivative-free Polak-Ribiére-Polyak (PRP) conjugate gradient method for solving (1.1) and establish its global convergence under some assumptions. We apply the proposed method to finding zeros of Oja’s vector fields, of the tangent vector field corresponding to the trace ratio optimization problem, and of monotone tangent vector fields on Hadamard manifolds. Finally, we combine the proposed method with the Riemannian Newton method to obtain solutions of high accuracy.

The remainder of this paper is organized as follows. In section 2, we present a Riemannian derivative-free PRP conjugate gradient method for solving (1.1). In section 3, we establish the global convergence of the proposed method under some basic assumptions. In section 4, the proposed method is used to find zeros of tangent vector fields arising in several practical applications. In section 5, we present a hybrid method. Finally, some concluding remarks are given in section 6.

2 A Riemannian Derivative-free Polak-Ribiére-Polyak Method

We first recall the Riemannian nonlinear conjugate gradient method for solving the following optimization problem

$$\begin{array}{cc}\min & g(Z)\\ \text{subject to (s.t.)} & Z\in\mathcal{M},\end{array} \tag{2.1}$$

where $g:\mathcal{M}\to{\mathbb{R}}$ is a continuously differentiable function. A nonlinear conjugate gradient method updates the current iterate $Z_{k}\in\mathcal{M}$ by

$$Z_{k+1} = R_{Z_{k}}(\alpha_{k}\Delta Z_{k}), \tag{2.2}$$

where the step length $\alpha_{k}$ is determined by a line search. The search direction $\Delta Z_{k}\in T_{Z_{k}}\mathcal{M}$ is given by

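$$\Delta Z_{k}=\begin{cases}-\operatorname{grad} g(Z_{k}), & \text{if } k=0,\\ -\operatorname{grad} g(Z_{k})+\beta_{k}\mathcal{T}_{\alpha_{k-1}\Delta Z_{k-1}}\Delta Z_{k-1}, & \text{if } k\geq 1,\end{cases} \tag{2.3}$$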

where $\beta_{k}$ is a scalar, ${\rm grad\,}g(Z_{k})$ is the Riemannian gradient of $g$ at the point $Z_{k}$ , and $\mathcal{T}$ is a vector transport associated with the retraction $R$ [1, Definition 8.1.1]. In particular, for the Riemannian PRP method in [1, p.182], the parameter $\beta_{k}$ is given by

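$$\beta_{k} = \frac{\langle \operatorname{grad} g(Z_{k}),\, \operatorname{grad} g(Z_{k}) - \mathcal{T}_{\alpha_{k-1}\Delta Z_{k-1}} \operatorname{grad} g(Z_{k-1}) \rangle}{\|\operatorname{grad} g(Z_{k-1})\|^{2}}. \tag{2.4}$$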

For more Riemannian nonlinear conjugate gradient methods, one may refer to [1, 15, 31, 33, 38, 44, 45]. However, every Riemannian nonlinear conjugate gradient method for solving problem (2.1) requires the Riemannian gradient of $g$.
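Once the previous gradient has been moved into the current tangent space, the PRP parameter and the new search direction only need inner products. The sketch below assumes $\mathcal{M}=S^{n-1}$ and the projection-based vector transport $\mathcal{T}_{\eta}\xi=P_{R_{x}(\eta)}\xi$ (in the spirit of the projection formula in [1, p.174]); the function names are hypothetical and the vectors are plain Python lists in the ambient space $\mathbb{R}^{n}$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def transport(y, v):
    """Projection-based vector transport onto T_y S^{n-1}: v - (y^T v) y.

    y = R_x(eta) is the new unit-norm iterate, so this plays the role of
    T_eta v for a tangent vector v at the previous iterate x.
    """
    c = dot(y, v)
    return [vi - c * yi for vi, yi in zip(v, y)]

def prp_beta(grad_new, grad_old, y_new):
    """PRP parameter as in (2.4): <g_k, g_k - T g_{k-1}> / ||g_{k-1}||^2."""
    g_old_t = transport(y_new, grad_old)
    diff = [a - b for a, b in zip(grad_new, g_old_t)]
    return dot(grad_new, diff) / dot(grad_old, grad_old)

def cg_direction(grad_new, grad_old, dir_old, y_new):
    """Direction as in (2.3) for k >= 1: -g_k + beta_k * T(dZ_{k-1})."""
    beta = prp_beta(grad_new, grad_old, y_new)
    d_t = transport(y_new, dir_old)
    return [-g + beta * d for g, d in zip(grad_new, d_t)]
```

If the new gradient equals the transported old one, $\beta_{k}=0$ and the direction reduces to steepest descent, which is the familiar automatic-restart behavior of PRP.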

To solve (1.1), it is natural to consider the following minimization problem

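$$\begin{array}{cc}\min & f(X) := \frac{1}{2}\|F(X)\|^{2}\\ \text{s.t.} & X\in\mathcal{M}.\end{array} \tag{2.5}$$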

Since $F:\mathcal{M}\to T\mathcal{M}$ is continuously differentiable, the function $f:\mathcal{M}\to{\mathbb{R}}$ is also continuously differentiable. By the definition of Riemannian gradient and using the compatibility of Riemannian connection $\nabla$ with the Riemannian metric $\langle\cdot,\cdot\rangle$ , we have

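$$\mathrm{D}f(X)[\xi_{X}] = \langle \nabla_{\xi_{X}}F, F(X)\rangle = \langle JF(X)[\xi_{X}], F(X)\rangle = \langle \xi_{X}, (JF(X))^{*}[F(X)]\rangle$$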

for all $\xi_{X}\in T_{X}\mathcal{M}$ . Thus,

$$\operatorname{grad} f(X) = (JF(X))^{*}[F(X)]. \tag{2.6}$$

In order to apply the Riemannian PRP method determined by (2.2), (2.3), and (2.4) to problem (2.5), we need the Riemannian gradient of $f$. By (2.6), computing the Riemannian gradient of $f$ at the current iterate $X_{k}$ requires the adjoint of the Jacobian of $F$ at $X_{k}$. If the Jacobian of $F$ is unavailable or expensive to compute, then it is unsuitable to apply a Riemannian nonlinear conjugate gradient method directly to problem (2.5).
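Identity (2.6) can be checked numerically. The sketch below uses the unit sphere and $F(x)=Ax-(x^{T}Ax)x$ with a symmetric $A$ (our assumption). In that setting $JF(x)[\xi]=P_{x}(A\xi)-(x^{T}Ax)\xi$ is self-adjoint on $T_{x}S^{n-1}$, so $\operatorname{grad} f(x)=JF(x)[F(x)]$, and a central difference of the pullback $t\mapsto f(R_{x}(t\xi))$ at $t=0$ should agree with $\langle\operatorname{grad} f(x),\xi\rangle$. Note that evaluating the gradient needs the Jacobian, which is exactly what a derivative-free method avoids. Function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def matvec(A, x):
    return [dot(row, x) for row in A]

def proj(x, v):
    """Orthogonal projection P_x v = v - (x^T v) x onto T_x S^{n-1}."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]

def retract(x, xi):
    """Projective retraction (x + xi) / ||x + xi||."""
    y = [a + b for a, b in zip(x, xi)]
    ny = norm(y)
    return [a / ny for a in y]

def oja_field(A, x):
    Ax = matvec(A, x)
    rho = dot(x, Ax)
    return [a - rho * b for a, b in zip(Ax, x)]

def f(A, x):
    """Merit function f(x) = ||F(x)||^2 / 2 from (2.5)."""
    F = oja_field(A, x)
    return 0.5 * dot(F, F)

def grad_f(A, x):
    """grad f(x) = (JF(x))^*[F(x)] = P_x(A F) - rho F (JF is self-adjoint here)."""
    F = oja_field(A, x)
    rho = dot(x, matvec(A, x))
    PAF = proj(x, matvec(A, F))
    return [a - rho * b for a, b in zip(PAF, F)]

def directional_fd(A, x, xi, t=1e-6):
    """Central difference of the pullback t -> f(R_x(t xi)) at t = 0."""
    plus = retract(x, [t * v for v in xi])
    minus = retract(x, [-t * v for v in xi])
    return (f(A, plus) - f(A, minus)) / (2.0 * t)
```

The comparison works for any retraction, because $\frac{d}{dt}R_{x}(t\xi)\big|_{t=0}=\xi$.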

In the following, we propose a derivative-free PRP method for solving (1.1). It is motivated by the derivative-free PRP method of [25] for solving a large-scale nonlinear system of equations $G({\bf x})={\bf 0}$ with $G:{\mathbb{R}}^{n}\to{\mathbb{R}}^{n}$ continuously differentiable, in which the search direction is built from $G({\bf x}_{k})$ and $G({\bf x}_{k-1})$ at the current iterate ${\bf x}_{k}$ and the previous iterate ${\bf x}_{k-1}$, and a non-monotone line search is used. Specifically, we apply the PRP method defined by (2.2), (2.3), and (2.4) to problem (2.5), replace the Riemannian gradient of $f$ at the current iterate $X_{k}$ by the value $F(X_{k})$ of the tangent vector field, and employ a Riemannian non-monotone line search. The resulting Riemannian derivative-free PRP algorithm for solving (1.1) reads as follows.

Algorithm 2.1 (A Riemannian derivative-free PRP method (RDF-PRP))

Step 0.

Choose an initial point $X_{0}\in\mathcal{M}$ , $\bar{\epsilon}>0$ , $t_{1},t_{2}>0$ , $0<\rho<1$ , $0<\lambda_{\min}<\lambda_{\max}<1$ , $0<\alpha_{\min}\leq\alpha\leq\alpha_{\max}$ . Let $k:=0$ , $\Gamma_{0}:=f(X_{0})$ , $\Phi_{0}:=1$ . Select a positive sequence $\{\delta_{k}\}$ such that

$$\sum_{k=0}^{\infty} \delta_{k} = \delta < \infty. \tag{2.7}$$

Step 1.

If $\|F(X_{k})\|\leq\bar{\epsilon}$ , then stop. Otherwise, go to Step 2.

Step 2.

Set

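$$\Delta X_{k} := \begin{cases}-F(X_{0}), & \text{if } k=0,\\ -F(X_{k})+\beta_{k}\mathcal{T}_{\Delta Z_{k-1}}\Delta X_{k-1}, & \text{if } k\geq 1,\end{cases} \tag{2.8}$$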

where

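$$\beta_{k} := \frac{\langle F(X_{k}), Y_{k-1} \rangle}{\|F(X_{k-1})\|^{2}}, \qquad Y_{k-1} := F(X_{k}) - \mathcal{T}_{\Delta Z_{k-1}} F(X_{k-1}). \tag{2.9}$$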

Step 3.

Determine $\alpha_{k}=\max\{\alpha\rho^{j}:\ j=0,1,2,\ldots\}$ such that one of the following two conditions holds.

If

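$$f(R_{X_{k}}(\alpha_{k}\Delta X_{k})) \leq \Gamma_{k} + \delta_{k} - t_{1}\alpha_{k}^{2}\|\Delta X_{k}\|^{2} - t_{2}\alpha_{k}^{2} f(X_{k}), \tag{2.10}$$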

then set

$$\Delta Z_{k} := \alpha_{k}\Delta X_{k}, \qquad X_{k+1} := R_{X_{k}}(\Delta Z_{k}). \tag{2.11}$$

Else if

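$$f(R_{X_{k}}(-\alpha_{k}\Delta X_{k})) \leq \Gamma_{k} + \delta_{k} - t_{1}\alpha_{k}^{2}\|\Delta X_{k}\|^{2} - t_{2}\alpha_{k}^{2} f(X_{k}), \tag{2.12}$$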

then set

$$\Delta Z_{k} := -\alpha_{k}\Delta X_{k}, \qquad X_{k+1} := R_{X_{k}}(\Delta Z_{k}). \tag{2.13}$$

Step 4.

Choose $\lambda_{k}\in[\lambda_{\min},\lambda_{\max}]$ and compute

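$$\Phi_{k+1} = \lambda_{k}\Phi_{k} + 1, \qquad \Gamma_{k+1} = \frac{\lambda_{k}\Phi_{k}(\Gamma_{k} + \delta_{k}) + f(X_{k+1})}{\Phi_{k+1}}.$$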

Step 5.

Replace $k$ by $k+1$ and go to Step 1.

We point out that the non-monotone line search in Step 3 of Algorithm 2.1 can be seen as a generalization of that in [7, 25]. Let

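$$\Lambda_{k} := \frac{1}{k+1}\sum_{j=0}^{k}\big(f(X_{j}) + j\,\delta_{j-1}\big) \quad\text{and}\quad \delta_{-1} = 0.$$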

Following the proof of [7, Lemma 2.2], one can show that, for any choice of $\lambda_{k}\in[0,1]$ and all $k\geq 0$,

$$f(X_{k}) \leq \Gamma_{k} \leq \Lambda_{k}, \qquad \Gamma_{k+1} \leq \Gamma_{k} + \delta_{k}.$$

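In particular, since $f(X_{k})\leq\Gamma_{k}$ and $\delta_{k}>0$, the right-hand sides of (2.10) and (2.12) tend to $\Gamma_{k}+\delta_{k}>f(X_{k})$ as $\alpha_{k}\to 0$, while, by the continuity of $f$ and $R_{X_{k}}(0_{X_{k}})=X_{k}$, the left-hand sides tend to $f(X_{k})$. Hence condition (2.10) (and likewise (2.12)) holds for all sufficiently small $\alpha_{k}$, so the backtracking in Step 3 terminates after finitely many trials. This shows that the line search step in Algorithm 2.1 is well-defined.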
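Step 4 is cheap to implement. In the sketch below (hypothetical helper names), $\Gamma_{k+1}$ is a convex combination of $\Gamma_{k}+\delta_{k}$ and $f(X_{k+1})$ with weights $\lambda_{k}\Phi_{k}/\Phi_{k+1}$ and $1/\Phi_{k+1}$, so any accepted step with $f(X_{k+1})\le\Gamma_{k}+\delta_{k}$ gives $f(X_{k+1})\le\Gamma_{k+1}\le\Gamma_{k}+\delta_{k}$. Two limiting cases are easy to verify by hand: $\lambda_{k}\equiv 0$ gives $\Gamma_{k}=f(X_{k})$, and $\lambda_{k}\equiv 1$ gives $\Phi_{k}=k+1$ and $\Gamma_{k}=\frac{1}{k+1}\sum_{j=0}^{k}\big(f(X_{j})+j\,\delta_{j-1}\big)$ with $\delta_{-1}=0$. The update is written with $\delta_{k}$ in the numerator, which is the form consistent with $\Gamma_{k+1}\le\Gamma_{k}+\delta_{k}$.

```python
def update_reference(gamma, phi, f_new, lam, delta):
    """Step 4 of Algorithm 2.1.

    Phi_{k+1}   = lam * Phi_k + 1
    Gamma_{k+1} = (lam * Phi_k * (Gamma_k + delta_k) + f(X_{k+1})) / Phi_{k+1}
    """
    phi_new = lam * phi + 1.0
    gamma_new = (lam * phi * (gamma + delta) + f_new) / phi_new
    return gamma_new, phi_new

def reference_sequence(f_values, lam, deltas):
    """Apply the update along f(X_0), f(X_1), ... and return Gamma_0, Gamma_1, ..."""
    gamma, phi = f_values[0], 1.0  # Gamma_0 = f(X_0), Phi_0 = 1
    gammas = [gamma]
    for f_new, delta in zip(f_values[1:], deltas):
        gamma, phi = update_reference(gamma, phi, f_new, lam, delta)
        gammas.append(gamma)
    return gammas
```

Larger $\lambda_{k}$ makes $\Gamma_{k}$ remember more of the history and hence makes the line search more permissive.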

3 Convergence Analysis

In this section, we establish the global convergence of Algorithm 2.1. To facilitate the analysis, we define the pullback $\widehat{f}:T\mathcal{M}\to{\mathbb{R}}$ of $f:\mathcal{M}\to{\mathbb{R}}$ through $R$ by [1, p.55]

$$\widehat{f}(\xi) := f(R(\xi)), \quad \forall\, \xi \in T\mathcal{M}.$$

For $X\in\mathcal{M}$ , let $\widehat{f}_{X}$ denote the restriction of $\widehat{f}$ to $T_{X}\mathcal{M}$ , i.e.,

$$\widehat{f}_{X}(\xi_{X}) := f(R_{X}(\xi_{X})), \quad \forall\, \xi_{X} \in T_{X}\mathcal{M}.$$

We also need the following assumptions.

Assumption 3.1

1. The level set $\Omega:=\{X\in\mathcal{M}\;|\;f(X)\leq f(X_{0})+\delta\}$ is bounded, where $\delta$ is the constant defined by (2.7).

2. In some neighborhood $V$ of $\Omega$, $F$ is continuously differentiable and Lipschitz continuous with respect to the vector transport $\mathcal{T}$, i.e., there is a constant $L>0$ such that

$$\|F(R_{X}(\xi_{X})) - \mathcal{T}_{\xi_{X}}F(X)\| \leq L\cdot\operatorname{dist}\big(X, R_{X}(\xi_{X})\big) \tag{3.1}$$

for all $X\in V$ and $\xi_{X}\in T_{X}\mathcal{M}$ with $R_{X}(\xi_{X})\in V$.

3. The vector transport $\mathcal{T}$ is bounded, i.e., there exists a constant $C>0$ such that

$$\|\mathcal{T}_{\eta_{X}}\xi_{X}\| \leq C\cdot\|\xi_{X}\| \tag{3.2}$$

for all $X\in\mathcal{M}$ and $\xi_{X},\eta_{X}\in T_{X}\mathcal{M}$.

Under Assumption 3.1, the tangent vector field $F$ is bounded on $\Omega$ , i.e., there exists a constant $\tau_{1}>0$ such that

[TABLE]

By using the continuity of $f$ and Assumption 3.1, it is easy to see that the level set $\Omega$ is closed and bounded and thus $\Omega$ is a compact subset of $\mathcal{M}$ . According to Corollary 7.4.6 in [1], there exist two scalars $\nu>0$ and $\mu>0$ such that

[TABLE]

for $X\in\Omega$ and $\xi_{X}\in T_{X}\mathcal{M}$ with $\|\xi_{X}\|\leq\mu$. If the vector transport $\mathcal{T}$ is chosen as the parallel translation, then the inequality in (3.2) holds as an equality with $C=1$. Moreover, if $\mathcal{M}$ is an embedded Riemannian submanifold of a Euclidean space and $\mathcal{T}$ is defined through orthogonal projection by formula (8.10) in [1, p.174], then the inequality in (3.2) holds with $C=1$.

To establish the global convergence of Algorithm 2.1, we need the following preliminary lemma, whose proof is similar to those of Lemmas 3.2 and 3.3 in [25] and is therefore omitted.

Lemma 3.2

Suppose Assumption 3.1 is satisfied. Then the sequence $\{X_{k}\}$ generated by Algorithm 2.1 is contained in $\Omega$ . In addition, we have

[TABLE]

For the search directions $\{\Delta X_{k}\}$ generated by Algorithm 2.1, we have the following result, whose proof can be seen as a generalization of that of [25, Lemma 3.4].

Lemma 3.3

Suppose Assumption 3.1 is satisfied and Algorithm 2.1 generates infinite sequences $\{X_{k}\}$ and $\{\Delta X_{k}\}$ . If the sequence $\{\|F(X_{k})\|\}$ is bounded below by a constant $\tau>0$ , i.e.,

[TABLE]

then there exists a constant $T>0$ such that

[TABLE]

and

[TABLE]

Proof: We first prove (3.6). From (2.9), (2.11), (2.13), (3.1), and (3.4) it follows that for all $k$ sufficiently large,

[TABLE]

It follows from (2.8), (2.9), (3.2), (3.3), and (3.8) that for all $k$ sufficiently large,

[TABLE]

By Lemma 3.2, for any constant $\pi\in(0,1)$ , there exists an index $k_{0}>0$ such that

[TABLE]

This, together with (3.9), yields for all $k>k_{0}$ ,

[TABLE]

Hence, (3.6) holds by setting $T:=\max\{\|\Delta X_{1}\|,\|\Delta X_{2}\|,\ldots,\|\Delta X_{k_{0}}\|,\frac{\tau_{1}}{1-\pi}+\|\Delta X_{k_{0}}\|\}$ .

Next, we prove (3.7). By using Lemma 3.2, (3.1), (3.5), (3.6), and (3.8) we have for all $k$ sufficiently large that

[TABLE]

This, together with Lemma 3.2, yields (3.7).

Concerning the global convergence of Algorithm 2.1, we have the following theorem, whose proof generalizes those of Theorem 3.5 in [25] and Theorem 1 in [9].

Theorem 3.4

Suppose Assumption 3.1 is satisfied and Algorithm 2.1 generates an infinite sequence $\{X_{k}\}$ . Then we have

[TABLE]

or for any accumulation point $X_{*}$ of $\{X_{k}\}$

[TABLE]

Proof: Let $X_{*}$ be any accumulation point of the sequence $\{X_{k}\}$ . One may assume that $\lim\limits_{k\to\infty}X_{k}=X_{*}$ , taking a subsequence if necessary. By Lemma 3.2 we have

[TABLE]

If $\liminf\limits_{k\to\infty}\alpha_{k}>0$ , then it follows from (3.11) that

[TABLE]

In this case, $\|F(X_{*})\|=0$ since $F$ is continuous and $\lim\limits_{k\to\infty}X_{k}=X_{*}$ .

In the following, we assume that $\liminf\limits_{k\to\infty}\alpha_{k}=0$ and $\liminf\limits_{k\to\infty}\|F(X_{k})\|>0$ . From Step 3 of Algorithm 2.1, taking a subsequence if necessary, we may assume that $\rho^{-1}\alpha_{k}$ satisfies neither (2.10) nor (2.12) for $k$ large enough and thus

[TABLE]

and

[TABLE]

From (2.15) we have $\Gamma_{k}\geq f(X_{k})\geq 0$ . This, together with (3.12), yields

[TABLE]

From Assumption 3.1, it follows that

[TABLE]

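By hypothesis, $\liminf_{k\to\infty}\|F(X_{k})\|>0$; since also $\|F(X_{k})\|>0$ for every $k$ (otherwise Algorithm 2.1 would have stopped), the sequence $\{\|F(X_{k})\|\}$ is bounded below by a positive constant. Thus, condition (3.5) in Lemma 3.3 is satisfied. By using Lemma 3.3 and (3.3) we have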

[TABLE]

where $\Upsilon=t_{1}\rho^{-2}T^{2}+t_{2}\rho^{-2}(f(X_{0})+\delta)$ . Hence,

[TABLE]

Let $\gamma(t):=R_{X_{k}}(t\rho^{-1}\alpha_{k}\Delta X_{k})$ for all $t\in[0,1]$. It follows from (2.8) that for $t\in(0,1)$,

[TABLE]

By the mean value theorem and using (3.15), there exists a $\theta\in(0,1)$ such that

[TABLE]

This, together with (3.14), yields

[TABLE]

By Lemma 3.2 and using the smoothness and the local rigidity condition of the retraction [1, (4.2)], we have

[TABLE]

where ${\rm id}_{T_{X_{*}}\mathcal{M}}$ denotes the identity operator on $T_{X_{*}}\mathcal{M}$. From Lemma 3.2, Lemma 3.3, (2.11), (2.13), and using the smoothness and the consistency condition of the vector transport [1, Definition 8.1.1], we obtain

[TABLE]

By using Lemma 3.2, Lemma 3.3, (3.17), (3.18), and taking limits in (3.16), we have

[TABLE]

Similarly, we can deduce from (3.13) that

[TABLE]

The equality (3.10) follows from the last two inequalities.

From Theorem 3.4, we have the following corollary.

Corollary 3.5

Suppose Assumption 3.1 is satisfied and Algorithm 2.1 generates an infinite sequence $\{X_{k}\}$ . Let $X_{*}$ be an accumulation point of $\{X_{k}\}$ . If

[TABLE]

then $F(X_{*})=0_{X_{*}}$ .

If $F:\mathcal{M}\to T\mathcal{M}$ is a strongly geodesic monotone vector field [12, 23, 27, 37] and is continuously differentiable, then there exists a constant $\lambda>0$ such that

[TABLE]

By Corollary 3.5 and (3.19), we have the following result.

Corollary 3.6

Suppose $F$ or $-F$ is strongly geodesic monotone and continuously differentiable, and Algorithm 2.1 generates an infinite sequence $\{X_{k}\}$. Then every accumulation point of $\{X_{k}\}$ is a zero of $F$.

4 Numerical Experiments

In this section, we apply Algorithm 2.1 to finding zeros of Oja’s vector fields [2], of the tangent vector field corresponding to the trace ratio optimization problem [28, 41, 42], and of monotone tangent vector fields on Hadamard manifolds [13]. All numerical tests are carried out using MATLAB R2010a on a Lenovo laptop with an Intel(R) Core(TM)2 i7-8550U 1.80 GHz CPU and 16 GB of RAM.

In our numerical tests, we set $\rho=0.5$ , $\lambda_{k}=0.6$ , $t_{1}=t_{2}=10^{-10}$ , $\alpha_{\min}=10^{-10}$ , $\alpha_{\max}=10^{10}$ , and $\delta_{k}=\|F(X_{0})\|/((2+k)\ln^{2}(2+k))$ for all $k$ . In Step 3 of Algorithm 2.1, the initial steplength $\alpha_{k_{0}}$ is set to be

[TABLE]

where

[TABLE]
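The forcing sequence $\delta_{k}=\|F(X_{0})\|/((2+k)\ln^{2}(2+k))$ chosen above is summable. Condition (2.7) is not reproduced in this section; assuming it asks for $\sum_{k\geq 0}\delta_{k}\leq\delta<\infty$ (which matches how $\delta$ enters the level set $\Omega$ in Assumption 3.1), the integral test gives $\sum_{k\geq 0}\delta_{k}\leq\|F(X_{0})\|\big(\frac{1}{2\ln^{2}2}+\int_{2}^{\infty}\frac{dx}{x\ln^{2}x}\big)=\|F(X_{0})\|\big(\frac{1}{2\ln^{2}2}+\frac{1}{\ln 2}\big)\approx 2.4834\,\|F(X_{0})\|$. The short check below compares a long partial sum with this bound; the function names are hypothetical.

```python
import math


def delta_k(k, res0):
    """Forcing term delta_k = ||F(X_0)|| / ((2+k) ln^2(2+k)) from section 4."""
    n = 2 + k
    return res0 / (n * math.log(n) ** 2)


def delta_sum_bound(res0):
    """Integral-test bound on sum_{k>=0} delta_k.

    n -> 1/(n ln^2 n) is decreasing on [2, inf), so the series is at most its
    first term 1/(2 ln^2 2) plus int_2^inf dx/(x ln^2 x) = 1/ln 2.
    """
    return res0 * (1.0 / (2.0 * math.log(2.0) ** 2) + 1.0 / math.log(2.0))


res0 = 1.5558   # Res0. of the m = 1000 row in Table 4.1, used only as a scale
partial = sum(delta_k(k, res0) for k in range(100_000))
print(f"partial sum = {partial:.4f}, bound = {delta_sum_bound(res0):.4f}")
```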

The stopping criterion for Algorithm 2.1 for solving (1.1) is set to be [9, 25]

[TABLE]

where $e_{a}=10^{-6}$ , $e_{r}=10^{-5}$ , and $M$ denotes the dimension of $\mathcal{M}$ .
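The displayed stopping rule is not reproduced in this section. Given $e_{a}$, $e_{r}$, and $M$, a natural reading is the criterion $\|F(X_{k})\|/\sqrt{M}\leq e_{a}+e_{r}\|F(X_{0})\|/\sqrt{M}$ used in derivative-free spectral residual methods. The sketch below assumes this form and checks it against the averaged Res0. and Res. values of Table 4.1 ($p=30$), with $M=mp-\frac{1}{2}p(p+1)$; the helper names are hypothetical.

```python
import math


def stop_threshold(res0, dim, e_a=1e-6, e_r=1e-5):
    """Residual level at which the assumed criterion
    ||F(X_k)|| / sqrt(M) <= e_a + e_r * ||F(X_0)|| / sqrt(M) is met."""
    return e_a * math.sqrt(dim) + e_r * res0


def stiefel_dim(m, p):
    """dim St(p, m) = m p - p (p + 1) / 2."""
    return m * p - p * (p + 1) // 2


# (m, Res0., Res.) averages from Table 4.1 with p = 30
table_4_1 = [(1000, 1.5558, 1.8068e-4), (2000, 1.5714, 2.5403e-4),
             (3000, 1.5780, 3.1178e-4), (4000, 1.5746, 3.5450e-4)]
for m, res0, res in table_4_1:
    dim = stiefel_dim(m, 30)
    thr = stop_threshold(res0, dim)
    print(f"m={m} DIM={dim} Res={res:.4e} threshold={thr:.4e} met={res <= thr}")
```

Under this assumption, every row of Table 4.1 sits just below its threshold, which is consistent with the reported data. Because the threshold is affine in $\|F(X_{0})\|$ at fixed $M$, averages of runs that each satisfy it also satisfy it.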

For each test, we repeat our experiments on $10$ different randomly generated problems and report averaged results. In our numerical tests, ‘DIM.’ denotes the dimension of $\mathcal{M}$, and ‘CT.’, ‘IT.’, and ‘NF.’ denote the average total computing time in seconds, the average number of iterations, and the average number of function evaluations of our algorithm, respectively. In addition, ‘Res0.’ and ‘Res.’ denote the average residual $\|F(X_{k})\|$ at the initial and final iterates of our algorithm, respectively.

Example 4.1

We consider the problem of finding a zero of Oja’s vector field defined by a real symmetric positive-definite matrix [2]. Let $A\in{\mathbb{R}}^{m\times m}$ be a symmetric positive-definite matrix, and let $p$ be a positive integer smaller than $m$. Oja’s vector field $F:{\mathbb{R}}^{m\times p}\to{\mathbb{R}}^{m\times p}$ associated with $A$ is given by [2, 29, 30]

$$F(X):=AX-XX^{T}AX,\quad X\in{\mathbb{R}}^{m\times p}.\tag{4.1}$$

Suppose $X\in{\mathbb{R}}^{m\times p}$ has full column rank. Then $X$ is a solution to $F(X)=\mathbf{0}$ if and only if the column space of $X$ is an invariant subspace of $A$ and $X$ is orthonormal (i.e., $X^{T}X=I_{p}$) (see [2, Proposition 2.1]), where $I_{p}$ is the identity matrix of order $p$. Thus we can restrict the nonlinear map $F$ to the compact Stiefel manifold ${\rm St}(p,m)$ [1, p.26], i.e., $F:{\rm St}(p,m)\to T{\rm St}(p,m)$. The dimension of ${\rm St}(p,m)$ is $mp-\frac{1}{2}p(p+1)$ [1, p.27]. Let $\mathcal{O}(p)={\rm St}(p,p)$ denote the orthogonal group [1, p.27]. Since $F(XQ)=F(X)Q$ for any $Q\in\mathcal{O}(p)$, the zeros of $F$ come in orbits $\{XQ\,|\,Q\in\mathcal{O}(p)\}$ and are therefore degenerate, so Newton’s method cannot be applied directly. To apply the Riemannian Newton method, one needs to restrict $F$ to the Grassmann manifold ${\rm Grass}(p,m):={\rm St}(p,m)/\mathcal{O}(p)$, whereas applying Algorithm 2.1 to finding a zero of (4.1) does not require the zeros of $F$ to be nondegenerate. We endow ${\rm St}(p,m)$ with the Riemannian metric induced from ${\mathbb{R}}^{m\times p}$, i.e.,

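$$\langle\xi_{X},\eta_{X}\rangle:={\rm tr}(\xi_{X}^{T}\eta_{X})\quad\text{for all }\xi_{X},\eta_{X}\in T_{X}{\rm St}(p,m).$$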
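The structural facts about Oja’s vector field used above can be checked on a tiny example. The sketch assumes the standard form $F(X)=AX-XX^{T}AX$ from [2] and uses hypothetical $3\times 3$ and $3\times 2$ matrices with integer entries, so every comparison is exact. It shows that $F$ vanishes at an orthonormal basis of an invariant subspace, does not vanish at a non-orthonormal basis of the same subspace, and satisfies $F(XQ)=F(X)Q$ for orthogonal $Q$.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def oja(A, X):
    """Oja's vector field F(X) = A X - X X^T A X."""
    AX = matmul(A, X)
    XXtAX = matmul(X, matmul(transpose(X), AX))
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(AX, XXtAX)]


A = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]    # symmetric positive definite
X_orth = [[1, 0], [0, 1], [0, 0]]        # orthonormal basis of span{e1, e2}
X_scaled = [[2, 0], [0, 1], [0, 0]]      # same invariant subspace, not orthonormal
X = [[1, 2], [0, 1], [1, 0]]             # generic full-rank matrix
Q = [[0, -1], [1, 0]]                    # orthogonal: rotation by 90 degrees

print(oja(A, X_orth))                                  # zero matrix
print(oja(A, X_scaled))                                # nonzero
print(oja(A, matmul(X, Q)) == matmul(oja(A, X), Q))    # True
```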

The retraction $R$ on ${\rm St}(p,m)$ is chosen as [1, p.59]

$$R_{X}(\xi_{X}):={\rm qf}(X+\xi_{X})\tag{4.2}$$

for all $\xi_{X}\in T_{X}{\rm St}(p,m)$ and $X\in{\rm St}(p,m)$ , where ${\rm qf}(X+\xi_{X})$ is the $Q$ factor of the QR decomposition of $X+\xi_{X}\in{\mathbb{R}}^{m\times p}_{*}$ with $X+\xi_{X}=Q\widetilde{R}$ . Here, the set ${\mathbb{R}}^{m\times p}_{*}$ denotes the set of all real $m\times p$ matrices with linearly independent columns, $Q\in{\rm St}(p,m)$ , and $\widetilde{R}$ is an upper triangular $p\times p$ matrix with strictly positive diagonal elements. The orthogonal projection of a matrix $Z\in{\mathbb{R}}^{m\times p}$ onto $T_{X}{\rm St}(p,m)$ is given by

[TABLE]

where ${\rm skew}(A):=(A-A^{T})/2$ and ${\rm sym}(A):=(A+A^{T})/2$ for a real square matrix $A$. Since ${\rm St}(p,m)$ is an embedded submanifold of ${\mathbb{R}}^{m\times p}$, we may adopt the vector transport defined by [1, p.174]

[TABLE]

for $\xi_{X},\eta_{X}\in T_{X}{\rm St}(p,m)$ , where $Y:=R_{X}(\eta_{X})\in{\rm St}(p,m)$ . Thus condition (3.2) in Assumption 3.1 is satisfied.
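A pure-Python sketch of the three geometric ingredients on ${\rm St}(p,m)$ follows. It implements the qf retraction, computed by modified Gram-Schmidt, which returns the $Q$ factor whose $\widetilde{R}$ has a positive diagonal. It implements the orthogonal projection onto $T_{X}{\rm St}(p,m)$, denoted $P_{X}$ here and written in the standard form $Z-X\,{\rm sym}(X^{T}Z)=(I-XX^{T})Z+X\,{\rm skew}(X^{T}Z)$ from [1]; the displayed formula is not reproduced in this section, so this form is an assumption consistent with it. Finally, it implements the projection-based vector transport $\xi_{X}\mapsto P_{Y}(\xi_{X})$ with $Y=R_{X}(\eta_{X})$. The function names and test matrices are hypothetical.

```python
import math


def qf(M):
    """Q factor of the thin QR factorization M = Q R with diag(R) > 0.

    Uses modified Gram-Schmidt; M is an m x p full-column-rank list of rows.
    """
    m, p = len(M), len(M[0])
    basis = []
    for j in range(p):
        v = [M[i][j] for i in range(m)]
        for u in basis:                        # subtract components along q_1..q_{j-1}
            r = sum(a * b for a, b in zip(u, v))
            v = [a - r * b for a, b in zip(v, u)]
        nv = math.sqrt(sum(a * a for a in v))  # this is R[j][j] > 0
        basis.append([a / nv for a in v])
    return [[basis[j][i] for j in range(p)] for i in range(m)]


def tmul(A, B):
    """A^T B for matrices stored as lists of rows."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def retract(X, xi):
    """Retraction R_X(xi) = qf(X + xi) on St(p, m)."""
    return qf([[a + b for a, b in zip(rx, rk)] for rx, rk in zip(X, xi)])


def project(X, Z):
    """Orthogonal projection onto T_X St(p, m): Z - X sym(X^T Z)."""
    XtZ = tmul(X, Z)
    p = len(XtZ)
    S = [[0.5 * (XtZ[i][j] + XtZ[j][i]) for j in range(p)] for i in range(p)]
    return [[z - sum(x * S[k][j] for k, x in enumerate(rx)) for j, z in enumerate(rz)]
            for rx, rz in zip(X, Z)]


def transport(X, eta, xi):
    """Projection-based vector transport of xi to T_Y St(p, m), Y = R_X(eta)."""
    return project(retract(X, eta), xi)


def fro(A):
    """Frobenius norm, i.e. the norm induced by <xi, eta> = tr(xi^T eta)."""
    return math.sqrt(sum(a * a for row in A for a in row))


X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
eta = [[0.0, -0.5], [0.5, 0.0], [0.3, 0.2]]   # X^T eta is skew: eta is tangent at X
xi = [[0.0, 0.4], [-0.4, 0.0], [1.0, -2.0]]
Y = retract(X, eta)
T = transport(X, eta, xi)
print(tmul(Y, Y))                 # approximately the 2 x 2 identity
print(fro(T), "<=", fro(xi))      # ||T|| <= ||xi||
```

Because an orthogonal projection never increases the Frobenius norm, the printed norms illustrate why (3.2) holds with $C=1$ for this transport.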

We consider the problem of finding a zero of the Oja’s vector field $F:{\rm St}(p,m)\to T{\rm St}(p,m)$ defined by (4.1) with varying $m$ and $p$ . Let $A$ be a random $m\times m$ matrix generated by the MATLAB built-in functions rand, randn, and qr:

[TABLE]

Thus $A$ is a random symmetric positive-definite matrix with uniformly distributed eigenvalues in the interval $[0,1]$ . The starting points are randomly generated by the MATLAB built-in functions randn and qr:

[TABLE]
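The MATLAB commands for generating $A$ and $X_{0}$ are not reproduced in this section. Based only on the description above (a random orthogonal eigenbasis and eigenvalues uniformly distributed in $[0,1]$), a hypothetical Python stand-in is $A=Q\,{\rm diag}(u)\,Q^{T}$ with $Q={\rm qf}({\rm randn}(m,m))$, together with $X_{0}={\rm qf}({\rm randn}(m,p))$. It uses only the standard `random` module and a Gram-Schmidt qf with one reorthogonalization pass; the function names and seed are assumptions.

```python
import math
import random


def qf(M):
    """Q factor (positive diag(R)) by Gram-Schmidt, two passes for orthogonality."""
    m, p = len(M), len(M[0])
    basis = []
    for j in range(p):
        v = [M[i][j] for i in range(m)]
        for _ in range(2):                     # reorthogonalize once
            for u in basis:
                r = sum(a * b for a, b in zip(u, v))
                v = [a - r * b for a, b in zip(v, u)]
        nv = math.sqrt(sum(a * a for a in v))
        basis.append([a / nv for a in v])
    return [[basis[j][i] for j in range(p)] for i in range(m)]


def random_spd(m, rng):
    """Hypothetical generator: A = Q diag(u) Q^T, Q = qf(randn(m, m)), u_i ~ U[0, 1)."""
    Q = qf([[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(m)])
    u = [rng.random() for _ in range(m)]
    A = [[sum(Q[i][k] * u[k] * Q[j][k] for k in range(m)) for j in range(m)]
         for i in range(m)]
    return A, Q, u


def random_start(m, p, rng):
    """Starting point X_0 = qf(randn(m, p)) on St(p, m)."""
    return qf([[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(m)])


rng = random.Random(2024)
A, Q, u = random_spd(6, rng)
X0 = random_start(6, 2, rng)
print("eigenvalues:", sorted(u))
```

By construction, the columns of $Q$ are eigenvectors of $A$ with eigenvalues $u_{i}$, so $A$ is symmetric with its spectrum in $[0,1)$.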

Table 4.1 lists the numerical results for Example 4.1. We observe from Table 4.1 that the number of iterations and the number of function evaluations grow only mildly as the dimension of the Stiefel manifold ${\rm St}(p,m)$ increases. This indicates that Algorithm 2.1 is stable and suitable for solving large-scale problems.

To further illustrate the effectiveness of our algorithm, in Figure 4.1, we give the convergence history of Algorithm 2.1 for two tests with $(m,p)=(6000,30)$ and $(m,p)=(3000,120)$ . Figure 4.1 depicts the logarithm of the residual versus the number of iterations for finding a zero of Oja’s vector field defined in Example 4.1. The convergence trajectory indicates that the residual decreases steadily as the number of iterations increases.

Example 4.2

We consider the problem of finding a zero of the tangent vector field corresponding to the first-order optimization conditions for the trace ratio optimization problem [28, 41, 42]. Let $A,B,C\in{\mathbb{R}}^{m\times m}$ be real symmetric matrices with $B$ being positive-definite and $p$ be a positive integer smaller than $m/2$ . The tangent vector field $F:{\rm St}(p,m)\to T{\rm St}(p,m)$ is given by [41, Theorem 2.1]

[TABLE]

where

[TABLE]

and $\phi_{S}(X):={\rm tr}(X^{T}SX)$ for any $m\times m$ real symmetric matrix $S$. We choose the retraction $R$ on ${\rm St}(p,m)$ as in (4.2). The vector transport on ${\rm St}(p,m)$ is chosen as in (4.3), and thus condition (3.2) in Assumption 3.1 is satisfied.

We consider the problem of finding a zero of the tangent vector field $F$ defined by (4.4) with varying $m$ and $p$. Let $A,B,C$ be random $m\times m$ matrices generated by the MATLAB built-in functions rand, randn, orth, diag, and ones [5]:

[TABLE]

The starting points are randomly generated by the MATLAB built-in functions randn and qr:

[TABLE]

In Table 4.2, we report numerical results for Example 4.2 with varying values of $m$ and $p$ . In Figure 4.2, we give the convergence history of Algorithm 2.1 for two tests with $(m,p)=(3000,30)$ and $(m,p)=(2000,100)$ . Figure 4.2 depicts the logarithm of the residual versus the number of iterations for finding a zero of the tangent vector field $F$ defined in (4.4). We see from Table 4.2 and Figure 4.2 that Algorithm 2.1 is stable and efficient for solving large-scale problems.

Example 4.3

Let $S_{++}^{m}$ denote the set of all $m\times m$ real symmetric positive definite matrices. We endow $S_{++}^{m}$ with the following Riemannian metric:

[TABLE]

With this metric, $S_{++}^{m}$ is a Hadamard manifold, i.e., a complete, simply connected Riemannian manifold of nonpositive sectional curvature [22, 36]. The dimension of $S_{++}^{m}$ is equal to $m(m+1)/2$ [21, Proposition 2.1]. The geodesic monotone vector field $F:S_{++}^{m}\to TS_{++}^{m}$ is defined by [13]

[TABLE]

The retraction $R$ on $S_{++}^{m}$ is chosen as [22, (3.10)]

[TABLE]

for $\xi_{X}\in T_{X}S_{++}^{m}$ and $X\in S_{++}^{m}$ . The vector transport associated with the above $R$ is chosen as [22, (3.13)]

[TABLE]

for $\xi_{X},\eta_{X}\in T_{X}S_{++}^{m}$ and $X\in S_{++}^{m}$ . Thus condition (3.2) in Assumption 3.1 is satisfied.

We consider the problem of finding a zero of the vector field $F$ defined by (4.5) with varying $m$ . The starting points are randomly generated by the MATLAB built-in functions rand, randn, and qr:

[TABLE]

Table 4.3 shows the numerical results for Example 4.3. We observe from Table 4.3 that Algorithm 2.1 requires only a few iterations and function evaluations to find an approximate zero of the monotone vector field (4.5) for different values of $m$. This indicates that Algorithm 2.1 is very stable and efficient for solving large-scale problems. In Figure 4.3, we give the convergence history of Algorithm 2.1 for two tests with $m=600$ and $m=1000$. Figure 4.3 depicts the logarithm of the residual versus the number of iterations for finding a zero of the tangent vector field $F$ defined in (4.5). The residual decreases very rapidly as the number of iterations increases, which indicates fast local convergence of Algorithm 2.1 on these large-scale problems.

5 Hybrid Method

We note that Algorithm 2.1 is globally convergent. However, the numerical experiments in section 4 show that, in general, Algorithm 2.1 yields solutions of only low or medium accuracy. To improve the efficiency, one may adopt a hybrid method. A possible strategy is to combine Algorithm 2.1 with the Riemannian Newton method. As noted in section 1, the Riemannian Newton method may be computationally expensive but converges quadratically near a solution. In particular, one may use Algorithm 2.1 to compute an initial point of relatively low accuracy and then switch to the Riemannian Newton method to obtain a solution of high accuracy. A hybrid algorithm for solving (1.1) is described as follows.

Algorithm 5.1

(PRP-Newton Method)

Step 0.

Choose an initial point $X_{0}\in\mathcal{M}$ , $0<\zeta_{2}<\zeta_{1}$ , and $0<\varsigma<1$ , $t_{1},t_{2}>0$ , $0<\rho<1$ , $0<\lambda_{\min}<\lambda_{\max}<1$ , $0<\alpha_{\min}\leq\alpha\leq\alpha_{\max}$ . Let $k:=0$ , $\Gamma_{0}:=f(X_{0})$ , $\Phi_{0}:=1$ . Select a positive sequence $\{\delta_{k}\}$ such that (2.7) is satisfied.

Step 1.

For $k=1,2,\ldots$, do the RDF-PRP iteration as follows:

(a).

Set $\Delta X_{k}$ to be (2.8) where $\beta_{k}$ and $Y_{k}$ are given by (2.9).

(b).

Determine $\alpha_{k}=\max\{\alpha\rho^{j},j=0,1,2,\ldots\}$ such that if the condition (2.10) is satisfied, then compute $X_{k+1}$ from (2.11); else if the condition (2.12) is satisfied, then compute $X_{k+1}$ from (2.13).

(c).

Choose $\lambda_{k}\in[\lambda_{\min},\lambda_{\max}]$ and compute $\Phi_{k+1}=\lambda_{k}\Phi_{k}+1$ and $\Gamma_{k+1}$ from (2.14).

(d).

Stop if $\|F(X_{k})\|<\zeta_{1}$ .

Step 2.

Set $X_{0}$ to be the final iterate of the RDF-PRP iteration.

Step 3.

For $k=1,2,\ldots$, do the Riemannian Newton iteration as follows:

(a).

Apply the conjugate gradient (CG) method [19, Algorithm 10.2.1] to solve

[TABLE]

for $\Delta X_{k-1}\in T_{X_{k-1}}\mathcal{M}$ such that

[TABLE]

where $\varsigma_{k-1}:=\min\{\varsigma,\|F(X_{k-1})\|\}$ .

(b).

Set

[TABLE]

(c).

Stop if $\|F(X_{k})\|<\zeta_{2}$ .

We point out that, in Step 3 of Algorithm 5.1, the Riemannian Newton equation is solved inexactly, with the accuracy controlled by the choice of $\varsigma$. In addition, different values of $\zeta_{1}$ lead to different starting points for the Riemannian Newton method.
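The inexact solve in Step 3(a) can be sketched in $\mathbb{R}^{n}$. The residual test displayed after the Newton equation is not reproduced in this section, so the sketch assumes the standard inexact Newton condition $\|JF(X_{k-1})[\Delta X_{k-1}]+F(X_{k-1})\|\leq\varsigma_{k-1}\|F(X_{k-1})\|$ with $\varsigma_{k-1}=\min\{\varsigma,\|F(X_{k-1})\|\}$. It also replaces the Jacobian operator by a symmetric positive definite matrix, because plain CG requires a self-adjoint positive definite operator. The function name, matrix, and right-hand side are hypothetical.

```python
import math


def cg_inexact_newton_step(J, F, varsigma=1e-8, max_iter=None):
    """Approximately solve the Newton equation J s = -F by conjugate gradients.

    Stops as soon as ||J s + F|| <= varsigma_k ||F|| with
    varsigma_k = min(varsigma, ||F||). J must be symmetric positive definite.
    Returns (s, number of CG iterations).
    """
    n = len(F)
    max_iter = 10 * n if max_iter is None else max_iter

    def matvec(v):
        return [sum(a * b for a, b in zip(row, v)) for row in J]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    norm_F = math.sqrt(dot(F, F))
    tol = min(varsigma, norm_F) * norm_F   # forcing-term stopping level
    s = [0.0] * n
    r = [-f for f in F]                     # r = -F - J s, so ||r|| = ||J s + F||
    d = r[:]
    rr = dot(r, r)
    its = 0
    while math.sqrt(rr) > tol and its < max_iter:
        Jd = matvec(d)
        alpha = rr / dot(d, Jd)             # exact minimizer along d
        s = [si + alpha * di for si, di in zip(s, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, Jd)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
        its += 1
    return s, its


J = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # SPD model Jacobian
F = [1.0, -2.0, 0.5]
for vs in (1e-8, 0.5):
    s, its = cg_inexact_newton_step(J, F, varsigma=vs)
    print(f"varsigma={vs:g}: CG iterations={its}, step={s}")
```

Because $\varsigma_{k-1}\leq\|F(X_{k-1})\|$, the forcing term shrinks as the iterates approach a zero, which is the classical condition for an inexact Newton method to retain a quadratic local rate; a larger $\varsigma$ lets CG stop after fewer iterations far from the solution.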

For demonstration purposes, we apply Algorithm 5.1 to Examples 4.1–4.2, i.e., to finding zeros of the tangent vector fields defined by (4.1) and (4.4). To apply the Riemannian Newton method, one needs to restrict the tangent vector fields in (4.1) and (4.4) to the Grassmann manifold ${\rm Grass}(p,m)$ endowed with the Riemannian metric induced from ${\rm St}(p,m)$. The restriction $\widehat{F}:{\rm Grass}(p,m)\to T{\rm Grass}(p,m)$ of $F$ defined in (4.1) to ${\rm Grass}(p,m)$ is given by

[TABLE]

where $[X]:=\{XQ\in{\rm St}(p,m)\ |\ Q\in\mathcal{O}(p)\}\in{\rm Grass}(p,m)$ denotes the equivalence class corresponding to a point $X\in{\rm St}(p,m)$. Given $X\in{\rm St}(p,m)$ and a tangent vector $\xi_{[X]}\in T_{[X]}{\rm Grass}(p,m)$, let $\overline{\xi_{[X]}}\in\mathcal{H}_{X}$ denote the horizontal lift of $\xi_{[X]}\in T_{[X]}{\rm Grass}(p,m)$ at $X\in{\rm St}(p,m)$, where $\mathcal{H}_{X}$ denotes the horizontal space at $X\in{\rm St}(p,m)$ [43, p.757]. The horizontal lift of $J\widehat{F}([X])\big{[}\xi_{[X]}\big{]}\in\mathcal{H}_{X}$ at $X\in{\rm St}(p,m)$ is denoted by $\overline{J\widehat{F}([X])\big{[}\xi_{[X]}\big{]}}$, which has the following form:

[TABLE]

Similarly, the restriction $\widehat{F}:{\rm Grass}(p,m)\to T{\rm Grass}(p,m)$ of $F$ defined in (4.4) to ${\rm Grass}(p,m)$ is given by

[TABLE]

Given a point $X\in{\rm St}(p,m)$ and a tangent vector $\xi_{[X]}\in T_{[X]}{\rm Grass}(p,m)$ , the horizontal lift of $J\widehat{F}([X])\big{[}\xi_{[X]}\big{]}\in\mathcal{H}_{X}$ at $X\in{\rm St}(p,m)$ is denoted by $\overline{J\widehat{F}([X])\big{[}\xi_{[X]}\big{]}}$ , which is given by

[TABLE]

where

[TABLE]

and

[TABLE]

For the application of Riemannian optimization algorithms on Riemannian quotient manifolds, one can refer to [1, p.86 and p.121] and [43].

Next, we consider the application of Algorithm 5.1 to Examples 4.1–4.2 for different values of $m$ and $p$. In our numerical tests, ‘NCG.’ denotes the total number of CG iterations of the Newton step at the final iterate of Algorithm 5.1. We set $\varsigma=10^{-8}$ and test two parameter pairs, $(\zeta_{1},\zeta_{2})=(10^{-1},10^{-7})$ and $(\zeta_{1},\zeta_{2})=(10^{-3},10^{-7})$; the other parameters and the starting points are set as in section 4.

Table 5.1 displays the numerical results for Example 4.1 with different values of $m$ and $p$. In Figure 5.1, we give the convergence history of Algorithm 5.1 for two tests of Example 4.1 with $(m,p)=(2000,30)$ and $(m,p)=(3000,60)$. Figure 5.1 depicts the logarithm of the residual versus the number of iterations for finding a zero of the tangent vector field $F$ defined in (4.1). Table 5.2 shows the numerical results for Example 4.2 with different values of $m$ and $p$. In Figure 5.2, we give the convergence history of Algorithm 5.1 for two tests of Example 4.2 with $(m,p)=(1000,30)$ and $(m,p)=(2000,60)$. Figure 5.2 depicts the logarithm of the residual versus the number of iterations for finding a zero of the tangent vector field $F$ defined in (4.4).

We observe from Tables 5.1–5.2 and Figures 5.1–5.2 that, with a suitable choice of $\zeta_{1}$, Algorithm 2.1 can provide a good initial point for the Riemannian Newton method, which then yields a highly accurate solution. This shows that the proposed hybrid method is very effective for solving large-scale problems.
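The trade-off governed by $\zeta_{1}$ can be seen on a hypothetical scalar toy problem $F(x)=x+0.1x^{3}-1$, which is monotone since $F'(x)=1+0.3x^{2}>0$. For brevity, the derivative-free phase here is a plain residual iteration $x\leftarrow x-F(x)$, a stand-in for the RDF-PRP phase of Algorithm 5.1 rather than Algorithm 2.1 itself. Once $|F(x)|<\zeta_{1}$, the sketch switches to Newton steps until $|F(x)|<\zeta_{2}$.

```python
def hybrid_solve(F, dF, x0, zeta1, zeta2, max_iter=100):
    """Toy two-phase solver for a scalar equation F(x) = 0.

    Phase 1 (hypothetical stand-in for the RDF-PRP phase): derivative-free
    residual steps x <- x - F(x) until |F(x)| < zeta1.
    Phase 2: Newton steps x <- x - F(x) / F'(x) until |F(x)| < zeta2.
    Returns (x, phase-1 iterations, Newton iterations).
    """
    x, n1 = x0, 0
    while abs(F(x)) >= zeta1 and n1 < max_iter:
        x -= F(x)
        n1 += 1
    n2 = 0
    while abs(F(x)) >= zeta2 and n2 < max_iter:
        x -= F(x) / dF(x)
        n2 += 1
    return x, n1, n2


F = lambda x: x + 0.1 * x**3 - 1.0     # monotone: F'(x) = 1 + 0.3 x^2 > 0
dF = lambda x: 1.0 + 0.3 * x**2
for zeta1 in (1e-1, 1e-3):
    x, n1, n2 = hybrid_solve(F, dF, 0.0, zeta1, 1e-7)
    print(f"zeta1={zeta1:g}: x={x:.10f}, phase-1 steps={n1}, Newton steps={n2}")
```

In this toy, the tighter switch $\zeta_{1}=10^{-3}$ spends more derivative-free steps and no more Newton steps than $\zeta_{1}=10^{-1}$; which split is cheaper overall depends on the relative cost of the two phases.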

6 Conclusions

In this paper, we have proposed a Riemannian derivative-free PRP method for finding a zero of a tangent vector field on a Riemannian manifold. By using a non-monotone line search, the global convergence of the proposed geometric method is established under some mild conditions. To further improve the efficiency, we also provide a hybrid method, which combines the proposed geometric algorithm with the Riemannian Newton method. Numerical tests illustrate the efficiency of the proposed geometric algorithm for large-scale problems. An interesting question, which requires further study, is how to choose the stopping tolerance $\zeta_{1}$ so that the overall computational cost of Algorithm 5.1 is minimized.

Bibliography


[1] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, 2008.
[2] P.-A. Absil, M. Ishteva, L. De Lathauwer, and S. Van Huffel, A geometric Newton method for Oja’s vector field, Neural Comput., 21 (2009), pp. 1415–1433.
[3] R. L. Adler, J.-P. Dedieu, J. Y. Margulies, M. Martens, and M. Shub, Newton’s method on Riemannian manifolds and a geometric model for the human spine, IMA J. Numer. Anal., 22 (2002), pp. 359–390.
[4] G. C. Bento and J. X. Cruz Neto, Finite termination of the proximal point method for convex functions on Hadamard manifolds, Optim., 63 (2014), pp. 1281–1288.
[5] Y. F. Cai, Z. G. Jia, and Z. J. Bai, Perturbation analysis of an eigenvector-dependent nonlinear eigenvalue problem with applications, arXiv:1803.01518, 2018.
[6] H. Chen, X. Dai, X. Gong, L. He, and A. Zhou, Adaptive finite element approximations for Kohn-Sham models, Multiscale Model. Simul., 12 (2014), pp. 1828–1869.
[7] W. Cheng and D. Li, A derivative-free non-monotone line search and its applications to the spectral residual method, IMA J. Numer. Anal., 29 (2009), pp. 814–825.
[8] W. Cheng, Y. Xiao, and Q. J. Hu, A family of derivative-free conjugate gradient methods for large-scale nonlinear systems of equations, J. Comput. Appl. Math., 224 (2009), pp. 11–19.