Iteration-complexity and asymptotic analysis of steepest descent method for multiobjective optimization on Riemannian manifolds

Orizon P. Ferreira; Maurício S. Louzeiro; Leandro F. Prudente

arXiv:1906.05975·math.OC·June 17, 2019·J. Optim. Theory Appl.

TL;DR

This paper analyzes the convergence and complexity of the steepest descent method for multiobjective optimization on Riemannian manifolds, providing theoretical bounds and numerical validation for various stepsize strategies.

Contribution

It introduces iteration-complexity bounds and asymptotic analysis for the steepest descent method on Riemannian manifolds with multiple stepsize rules, a novel extension in this context.

Findings

1. The method converges under different stepsize strategies.

2. Complexity bounds are established for each stepsize rule.

3. Numerical experiments confirm theoretical results.

Abstract

The steepest descent method for multiobjective optimization on Riemannian manifolds with lower bounded sectional curvature is analyzed in this paper. The aim of the paper is twofold. First, an asymptotic analysis of the method is presented with three different finite procedures for determining the stepsize, namely, Lipschitz stepsize, adaptive stepsize and Armijo-type stepsize. The second aim is to present, by assuming that the Jacobian of the objective function is componentwise Lipschitz continuous, iteration-complexity bounds for the method with these three stepsize strategies. In addition, some examples are presented to emphasize the importance of working in this new context. Numerical experiments are provided to illustrate the effectiveness of the method in this new setting and certify the obtained theoretical results.

Tables

Table 1: Performance of the Riemannian and Euclidean gradient methods on Rosenbrock's problem.

Method	%	it	evalf	evalg
Riemannian method	100.0	5.0	49.0	12.0
Euclidean method	95.1	1629.0	5721.0	3260.0

Table 2: Performance of the Riemannian gradient method related to Example 29 varying: (a) the dimension of the space; (b) the number of objectives.

$n$	$m$	%	it	evalf	evalg
10	2	100.0	26.5	117.5	55.0
100	2	100.0	71.5	220.0	145.0
400	2	100.0	273.0	622.0	548.0
1000	2	100.0	17.0	104.0	36.0

Table 3: Performance of the Riemannian gradient method related to Example 22 for: (a) bicriteria problems; (b) three-criteria problems.

$n$	$m$	%	it	evalf	evalg
5	2	100.0	8.0	26.5	18.0
10	2	100.0	13.0	38.5	28.0
20	2	100.0	18.0	49.0	38.0
50	2	100.0	27.0	64.0	56.0

Equations

$$\kappa<0,\qquad \hat{\kappa}:=\sqrt{|\kappa|}.$$

$$\cosh(\hat{\kappa}d(\gamma(t),q))\leq\cosh(\hat{\kappa}d(p,q))+\hat{\kappa}\cosh(\hat{\kappa}d(p,q))\sinh(t\hat{\kappa}\|v\|)\left(\frac{t\|v\|}{2}-\frac{\tanh(\hat{\kappa}d(p,q))}{\hat{\kappa}d(p,q)}\frac{\langle v,\beta^{\prime}(0)\rangle}{\|v\|}\right),$$

$$d^{2}(\gamma(t),q)\leq d^{2}(p,q)+\frac{\sinh(\hat{\kappa}t\|v\|)}{\hat{\kappa}}\left(t\|v\|\frac{\hat{\kappa}d(p,q)}{\tanh(\hat{\kappa}d(p,q))}-\frac{2\langle v,\beta^{\prime}(0)\rangle}{\|v\|}\right).$$

$$\|\mbox{hess}\,f(p)\|:=\sup\left\{\|\mbox{hess}\,f(p)v\|:\ v\in T_{p}\mathcal{M},\ \|v\|=1\right\}.$$

$$e:=(1,\ldots,1)\in\mathbb{R}^{m}.$$

$$F(\exp_{p}(tv))\preceq F(p)+t\nabla F(p)v+t^{2}\frac{L}{2}\|v\|^{2}e,\qquad\forall\,t\in[0,+\infty),\ v\in T_{p}\mathcal{M}.$$

$$\min\{F(p):\ p\in\mathcal{M}\}.$$

$$\min_{v\in T_{p}\mathcal{M}}\left\{\max_{i\in\mathcal{I}}\langle\operatorname{grad}f_{i}(p),v\rangle+\frac{1}{2}\|v\|^{2}\right\},\qquad\mathcal{I}=\{1,\ldots,m\}.$$

$$v_{p}:=\operatorname{argmin}_{v\in T_{p}\mathcal{M}}\left\{\max_{i\in\mathcal{I}}\langle\operatorname{grad}f_{i}(p),v\rangle+\frac{1}{2}\|v\|^{2}\right\}.$$

$$v_{p}=-\sum_{j\in\mathcal{I}(v_{p})}\mu_{j}\operatorname{grad}f_{j}(p),\qquad\sum_{j\in\mathcal{I}(v_{p})}\mu_{j}=1,$$

$$\max_{i\in\mathcal{I}}\langle\operatorname{grad}f_{i}(p),v_{p}\rangle=-\|v_{p}\|^{2}.$$

$$-\|v_{p}\|^{2}=\langle-v_{p},v_{p}\rangle=\left\langle\sum_{j\in\mathcal{I}(v_{p})}\mu_{j}\operatorname{grad}f_{j}(p),v_{p}\right\rangle=\sum_{j\in\mathcal{I}(v_{p})}\mu_{j}\langle\operatorname{grad}f_{j}(p),v_{p}\rangle.$$

$$F(\exp_{p}(tv_{p}))\preceq F(p)+\left(\frac{Lt^{2}}{2}-t\right)\|v_{p}\|^{2}e,\qquad\forall\,t\in[0,+\infty).$$

$$p_{k+1}:=\exp_{p_{k}}(t_{k}v_{k}).$$

$$\varepsilon<t_{k}\leq\frac{1}{L}.$$

$$i_{k}:=\min\left\{i:\ F(\exp_{p_{k}}(\eta^{i}t_{k-1}v_{k}))\preceq F(p_{k})-\zeta\eta^{i}t_{k-1}\|v_{k}\|^{2}e,\ i=0,1,\ldots\right\}.$$

$$F(\exp_{p_{k}}(v_{k}/L))\preceq F(p_{k})-(\zeta\|v_{k}\|^{2}/L)e.$$

$$\frac{\eta}{L}\leq t_{k}\leq\frac{1}{L_{0}}.$$

$$F(\exp_{p_{k}}(\hat{t}_{k_{\ell}}v_{k}))\preceq F(p_{k})-\delta\hat{t}_{k_{\ell}}\|v_{k}\|^{2}e,$$

$$F(\exp_{p_{k}}(tv_{k}))\preceq F(p_{k})-\delta t\|v_{k}\|^{2}e.$$

$$\mathcal{A}:=\{p\in\mathcal{M}:\ F(p)\preceq F(p_{k}),\ k=0,1,\ldots\}.$$

$$F(p_{k+1})\preceq F(p_{k})-\nu t_{k}\|v_{k}\|^{2}e,\qquad k=0,1,\ldots,$$

$$0\preceq\sum_{k=0}^{\ell}t_{k}\|v_{k}\|^{2}e\preceq\frac{1}{\nu}\sum_{k=0}^{\ell}\left(F(p_{k})-F(p_{k+1})\right)\preceq\frac{1}{\nu}\left(F(p_{0})-F(q)\right),$$

$$\sum_{k=0}^{\infty}t_{k}^{2}\|v_{k}\|^{2}\leq\rho:=\begin{cases}\min_{i\in\mathcal{I}}\left\{2[f_{i}(p_{0})-f_{i}(q)]/L:\ q\in\mathcal{A}\right\},&\mbox{for Strategy 1};\\ \min_{i\in\mathcal{I}}\left\{[f_{i}(p_{0})-f_{i}(q)]/(\zeta L_{0}):\ q\in\mathcal{A}\right\},&\mbox{for Strategy 2};\\ \min_{i\in\mathcal{I}}\left\{t_{\max}[f_{i}(p_{0})-f_{i}(q)]/\delta:\ q\in\mathcal{A}\right\},&\mbox{for Strategy 3}.\end{cases}$$

$$\mathcal{C}_{\rho,\kappa}^{q}$$

$$\mathcal{K}_{\rho,\kappa}^{q}$$

$$d(p_{k+1},q)\leq\frac{1}{\hat{\kappa}}\,\mathcal{C}_{\rho,\kappa}^{q},\qquad k=0,1,\ldots.$$

$$d^{2}(p_{k+1},q)\leq d^{2}(p_{k},q)+\mathcal{K}_{\rho,\kappa}^{q}\,t_{k}^{2}\|v_{k}\|^{2},\qquad k=0,1,\ldots.$$

$$\langle v_{k},\beta^{\prime}(0)\rangle=-\sum_{j\in\mathcal{I}(v_{k})}\mu_{j}\langle\operatorname{grad}f_{j}(p_{k}),\beta^{\prime}(0)\rangle\geq 0,\qquad\sum_{j\in\mathcal{I}(v_{k})}\mu_{j}=1.$$

$$\cosh(\hat{\kappa}d(p_{k+1},q))\leq\cosh(\hat{\kappa}d(p_{k},q))\left(1+\frac{1}{2}(\hat{\kappa}t_{k}\|v_{k}\|)^{2}\frac{\sinh(\hat{\kappa}t_{k}\|v_{k}\|)}{\hat{\kappa}t_{k}\|v_{k}\|}\right).$$

$$\cosh(\hat{\kappa}d(p_{k+1},q))\leq\cosh(\hat{\kappa}d(p_{k},q))\left(1+\sigma(t_{k}\|v_{k}\|)^{2}\right),$$

Full text

O. P. Ferreira, IME/UFG, Avenida Esperança, s/n, Campus Samambaia, Goiânia, GO, 74690-900, Brazil.

M. S. Louzeiro, TU Chemnitz, Fakultät für Mathematik, D-09107, Chemnitz, Germany.

L. F. Prudente, IME/UFG, Avenida Esperança, s/n, Campus Samambaia, Goiânia, GO, 74690-900, Brazil.

Keywords: Steepest descent method, multiobjective optimization problem, Riemannian manifold, lower bounded curvature, iteration-complexity bound.

**AMS subject classification:** 90C33, 49K05, 47J25.

1 Introduction

A constrained multiobjective optimization problem with constraint set $\mathcal{M}$ consists of $m$ objective functions $f_{1},\ldots,f_{m}$ that have to be optimized simultaneously on $\mathcal{M}$. In recent years, there has been a significant increase in the number of papers addressing this class of problems; see, for example, [1, 2, 3, 4, 5, 6, 7]. Among the methods designed for solving multiobjective optimization problems, we are interested in the steepest descent method. This method was proposed in [8], and since then several variants have been considered, including but not limited to [9, 10, 11, 12, 13, 14]. Recently, some iteration-complexity results for the gradient method for unconstrained multiobjective optimization problems were presented in [15]. These results yield the same global rates as those of the steepest descent method in scalar optimization.

Constrained optimization problems in which the constraint set $\mathcal{M}$ can be endowed with a Riemannian manifold structure have been studied extensively in the last few years. Part of the interest in using Riemannian geometry tools to study this class of problems arises from the following fact: by endowing $\mathcal{M}$ with a suitable Riemannian metric, a Euclidean nonconvex constrained problem with constraint set $\mathcal{M}$ can be seen as a Riemannian convex unconstrained problem. In addition, for differentiable functions, the gradient can also become Riemannian Lipschitz continuous; see [16]. Consequently, the geometric and algebraic structures that come from the Riemannian metric make it possible to greatly reduce the computational cost of solving such problems. Indeed, it is well known that the iteration-complexity of several optimization methods for convex problems whose objective functions have Lipschitz continuous gradients is much lower than for nonconvex problems; see, for example, [17, 18, 19, 20, 21] and references therein. Furthermore, many optimization problems are naturally posed in the Riemannian context; see [22, 18, 23, 20]. To take advantage of the intrinsic Riemannian geometric structure, it is then preferable to treat these problems as those of finding singularities of gradient vector fields on Riemannian manifolds rather than using Lagrange multipliers or projection methods; see [24, 23, 25]. In this sense, constrained optimization problems can be seen as unconstrained from the point of view of Riemannian geometry. Moreover, intrinsic Riemannian structures can also open up new research directions that aid in developing competitive optimization algorithms; see [26, 22, 18, 27, 23, 20]. More about concepts and techniques of optimization in the Riemannian context can be found in [28, 29, 30, 31, 32, 21, 33, 25, 34] and the bibliographies therein.

In this paper we study the steepest descent method for multiobjective optimization on Riemannian manifolds. The aim is twofold. First, an asymptotic analysis is carried out for quasi-convex and convex vectorial functions. An asymptotic analysis of this method in the Riemannian context has already been done in [35]; see also [36]. However, the analysis presented in these previous works covers only the stepsize given by the Armijo rule and requires the Riemannian manifold to have nonnegative sectional curvature. The asymptotic analysis presented here extends the previous ones in two respects: it covers three different finite procedures for determining the stepsize, namely, Lipschitz stepsize, adaptive stepsize and Armijo-type stepsize, and it assumes only that the sectional curvature of the Riemannian manifold is bounded from below. The second aim is to present iteration-complexity bounds for the steepest descent method for multiobjective optimization on Riemannian manifolds. It is worth noting that our results generalize to the Riemannian context the results obtained in [15]. Besides, we present one iteration-complexity bound that is new even in the Euclidean setting. In addition, some examples are presented to emphasize the importance of working in this new context. Numerical experiments are provided to illustrate the effectiveness of the method in this new setting and certify the obtained theoretical results.

The organization of this paper is as follows. In Section 2, we present some notations and auxiliary results used throughout the paper. In Section 3, we present the algorithm and the stepsizes that will be used. In Section 3.1, we carry out the asymptotic convergence analysis of the sequence generated by the steepest descent method. In Section 3.2, we present iteration-complexity bounds related to the steepest descent method. In Section 4, we present examples of vectorial convex functions with componentwise Lipschitz continuous Jacobian. Numerical experiments are presented in Section 5. Finally, some conclusions are given in Section 6.

2 Notations and Auxiliary Concepts

In this section, we recall some concepts, notations, and basic results about Riemannian manifolds and vector optimization. For more details we refer the reader to [37, 38, 25, 19].

We denote by $T_{p}\mathcal{M}$ the tangent space of a finite dimensional Riemannian manifold $\mathcal{M}$ at $p$, and by $T\mathcal{M}=\cup_{p\in\mathcal{M}}T_{p}\mathcal{M}$ the tangent bundle of $\mathcal{M}$. The norm associated with the Riemannian metric $\langle\cdot,\cdot\rangle$ is denoted by $\|\cdot\|$. We use $\ell(\alpha)$ to denote the length of a piecewise smooth curve $\alpha:[a,b]\to\mathcal{M}$. The Riemannian distance between $p$ and $q$ in $\mathcal{M}$ is denoted by $d(p,q)$. We denote by ${\cal X}(\mathcal{M})$ the space of smooth vector fields on $\mathcal{M}$. Let $\nabla$ be the Levi-Civita connection associated with $(\mathcal{M},\langle\cdot,\cdot\rangle)$. For each $t\in[a,b]$ and each piecewise smooth curve $\alpha:[a,b]\to\mathcal{M}$, the connection $\nabla$ induces an isometry, relative to $\langle\cdot,\cdot\rangle$, $P_{\alpha,a,t}\colon T_{\alpha(a)}{\mathcal{M}}\to T_{\alpha(t)}{\mathcal{M}}$ defined by $P_{\alpha,a,t}\,v=V(t)$, where $V$ is the unique vector field along the curve $\alpha$ such that $\nabla_{\alpha^{\prime}(t)}V(t)=0$ and $V(a)=v$; this is the so-called parallel transport along $\alpha$ joining $\alpha(a)$ to $\alpha(t)$. When there is no confusion, $P_{\alpha,p,q}$ denotes the parallel transport along the segment $\alpha$ joining $p$ to $q$. Since the geodesic equation $\nabla_{\gamma^{\prime}}\gamma^{\prime}=0$ is a second order nonlinear ordinary differential equation, the geodesic $\gamma=\gamma_{v}(\cdot,p)$ is determined by its position $p$ and velocity $v$ at $p$. The restriction of a geodesic to a closed bounded interval is called a geodesic segment. For any two points $p,q\in\mathcal{M}$, $\Gamma_{pq}$ denotes the set of all geodesic segments $\gamma:[0,1]\rightarrow\mathcal{M}$ with $\gamma(0)=p$ and $\gamma(1)=q$. A geodesic segment joining $p$ to $q$ in $\mathcal{M}$ is said to be minimal if its length is equal to $d(p,q)$.

In this paper, all manifolds are assumed to be connected, finite dimensional, and complete. The Hopf-Rinow theorem asserts that any pair of points in a complete Riemannian manifold $\mathcal{M}$ can be joined by a (not necessarily unique) minimal geodesic segment. Owing to the completeness of the Riemannian manifold $\mathcal{M}$, the exponential map $\exp_{p}:T_{p}\mathcal{M}\to\mathcal{M}$ is given by $\exp_{p}v\,=\,\gamma_{v}(1,p)$, for each $p\in\mathcal{M}$. For a differentiable function $f:\mathcal{M}\to\mathbb{R}$, the Riemannian metric induces the mapping $f\mapsto\operatorname{grad}f$, which associates to $f$ its gradient via the rule $\langle\operatorname{grad}f(p),V(p)\rangle:=df(p)V(p)$, for all $p\in\mathcal{M}$ and $V\in{\cal X}(\mathcal{M})$. For a twice-differentiable function, the mapping $f\mapsto\mbox{hess}f$ associates to $f$ its Hessian via the rule $\langle\mbox{hess}fV,V\rangle:=d^{2}f(V,V)$, for all $V\in{\cal X}(\mathcal{M})$, where the last equalities imply that $\mbox{hess}fV=\nabla_{V}\operatorname{grad}f$, for all $V\in{\cal X}(\mathcal{M})$.

Let us introduce some concepts of vector optimization on a Riemannian manifold $\mathcal{M}$. Letting ${\cal I}:=\{1,\ldots,m\}$, define ${\mathbb{R}}^{m}_{+}:=\{x\in{\mathbb{R}}^{m}:\ x_{i}\geq 0,\ i\in{\cal I}\}$ and ${\mathbb{R}}^{m}_{++}:=\{x\in{\mathbb{R}}^{m}:\ x_{i}>0,\ i\in{\cal I}\}$. For $x,\,y\in{\mathbb{R}}^{m}$, $y\succeq x$ (or $x\preceq y$) means that $y-x\in{\mathbb{R}}^{m}_{+}$, and $y\succ x$ (or $x\prec y$) means that $y-x\in{\mathbb{R}}^{m}_{++}$. Let $F:=\left(f_{1},\ldots,f_{m}\right):\mathcal{M}\to\mathbb{R}^{m}$ be a differentiable function. We denote the Riemannian Jacobian of $F$ at a point $p\in\mathcal{M}$ by $\nabla F(p)v:=\left(\langle\operatorname{grad}f_{1}(p),v\rangle,\ldots,\langle\operatorname{grad}f_{m}(p),v\rangle\right)$, where $v\in T_{p}\mathcal{M}$, and the image of the Riemannian Jacobian of $F$ at $p$ by $\mbox{Im}(\nabla F(p)):=\left\{\nabla F(p)v:\ v\in T_{p}\mathcal{M}\right\}.$ A vectorial function $F:\mathcal{M}\to\mathbb{R}^{m}$ is said to be convex on $\mathcal{M}$ if for any $p,q\in\mathcal{M}$ and $\gamma\in\Gamma_{pq}$ the composition $F\circ\gamma:[0,1]\to\mathbb{R}^{m}$ satisfies $F\circ\gamma(t)\preceq(1-t)F(p)+tF(q),$ for all $t\in[0,1].$ By convexity of $F$, it follows that $\nabla F(p)\gamma^{\prime}(0)\preceq F(q)-F(p)$. A vectorial function $F$ is called quasi-convex on $\mathcal{M}$ if, for every $p,q\in\mathcal{M}$ and $\gamma\in\Gamma_{pq}$, it holds that $F(\gamma(t))\preceq\operatorname{max}\{F(p),F(q)\}$, for all $t\in[0,1]$, where the maximum is taken coordinate by coordinate. It is immediate from the above definitions that if $F$ is convex then it is quasi-convex. Moreover, if $F$ is a quasi-convex function, then $F(q)\preceq F(p)$ implies $\nabla F(p)\gamma^{\prime}(0)\preceq 0$.

The next result plays an important role in the next sections. Its proof, which is omitted here, follows the same ideas as those presented in the proof of [30, Lemma 3.2], with some minor technical adjustments needed to adapt it to our setting. To simplify the notation throughout the paper, we define

$$\kappa<0,\qquad\hat{\kappa}:=\sqrt{|\kappa|}. \tag{1}$$

Lemma 1.

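Let $\mathcal{M}$ be a complete Riemannian manifold with sectional curvature $K\geq\kappa$. Let $p,q\in\mathcal{M}$, $p\neq q$, $v\in T_{p}\mathcal{M}$, ${\gamma}:[0,\infty)\longrightarrow\mathcal{M}$ be defined by ${\gamma}(t)=\mbox{exp}_{p}\left(tv\right)$ and $\beta:[0,1]\rightarrow\mathcal{M}$ be a minimizing geodesic with $\beta(0)=p$ and $\beta(1)=q$. Then, for any $t\in[0,\infty)$ there holds

$$\cosh(\hat{\kappa}d(\gamma(t),q))\leq\cosh(\hat{\kappa}d(p,q))+\hat{\kappa}\cosh(\hat{\kappa}d(p,q))\sinh(t\hat{\kappa}\|v\|)\left(\frac{t\|v\|}{2}-\frac{\tanh(\hat{\kappa}d(p,q))}{\hat{\kappa}d(p,q)}\frac{\langle v,\beta^{\prime}(0)\rangle}{\|v\|}\right),$$

and, consequently, the following inequality holds

$$d^{2}(\gamma(t),q)\leq d^{2}(p,q)+\frac{\sinh(\hat{\kappa}t\|v\|)}{\hat{\kappa}}\left(t\|v\|\frac{\hat{\kappa}d(p,q)}{\tanh(\hat{\kappa}d(p,q))}-\frac{2\langle v,\beta^{\prime}(0)\rangle}{\|v\|}\right).$$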

Next we present the definition of Lipschitz continuous gradient vector field; see [39].

Definition 2.

Let $f$ be a differentiable function on the set ${\cal M}$ . The gradient vector field of $f$ is said to be Lipschitz continuous on ${\cal M}$ with constant $L\geq 0$ if, for any $p,q\in{\cal M}$ and $\gamma\in\Gamma_{pq}$ , it holds that $\left\|P_{{\gamma},p,q}\operatorname{grad}f(p)-\operatorname{grad}f(q)\right\|\leq L\ell(\gamma).$

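The norm of the Hessian $\mbox{hess}\,f$ at $p\in{\mathcal{M}}$ is given by

$$\|\mbox{hess}\,f(p)\|:=\sup\left\{\|\mbox{hess}\,f(p)v\|:\ v\in T_{p}\mathcal{M},\ \|v\|=1\right\}.$$

The proof of the next result is similar to that of its Euclidean counterpart and is omitted.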

Lemma 3.

Let $f:\mathcal{M}\to\mathbb{R}$ be a twice continuously differentiable function. The gradient vector field of $f$ is Lipschitz continuous with constant $L\geq 0$ if, and only if, $\|\mbox{hess}\,f(p)\|\leq L$ for all $p\in\mathcal{M}$.

In the following we present the concept of Lipschitz continuity for the Riemannian Jacobian of a vectorial function.

Definition 4.

Let $F:=\left(f_{1},\ldots,f_{m}\right):{\cal M}\to\mathbb{R}^{m}$ be a differentiable function. If for each $f_{i}:{\cal M}\rightarrow\mathbb{R}$ there exists a $L_{i}\geq 0$ such that $\left\|P_{{\gamma},p,q}\operatorname{grad}f_{i}(p)-\operatorname{grad}f_{i}(q)\right\|\leq L_{i}\ell(\gamma)$ , for any $p,q\in{\cal M}$ and $\gamma\in\Gamma_{pq}$ , then we say that $\nabla F$ is componentwise Lipschitz continuous on ${\cal M}$ with constant $L:=\operatorname{max}_{i=1,\ldots,m}\ L_{i}$ .

The proof of the next lemma follows, with appropriate adjustments, the same idea as the proof of the scalar version presented in [17, Corollary 2.1]. Throughout the paper we use the following notation

$$e:=(1,\ldots,1)\in\mathbb{R}^{m}.$$

Lemma 5.

Let $F:=\left(f_{1},\ldots,f_{m}\right):{\cal M}\to\mathbb{R}^{m}$ be a differentiable function. Assume that $\nabla F$ is componentwise Lipschitz continuous on ${\cal M}$ with constant $L\geq 0$ and $p\in{\cal M}$ . Then there holds

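$$F(\exp_{p}(tv))\preceq F(p)+t\nabla F(p)v+t^{2}\frac{L}{2}\|v\|^{2}e,\qquad\forall\,t\in[0,+\infty),\ v\in T_{p}\mathcal{M}.$$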

Next we introduce the concept of quasi-Fejér convergence, which plays an important role in the analysis of the gradient method.

Definition 6.

A sequence $\{y_{k}\}$ in the complete metric space $(\mathcal{M},d)$ is quasi-Fejér convergent to a set $W\subset\mathcal{M}$ if, for every $w\in W$, there exists a sequence $\{\epsilon_{k}\}\subset\mathbb{R}$ such that $\epsilon_{k}\geq 0$, $\sum_{k=1}^{\infty}\epsilon_{k}<+\infty$, and $d^{2}(y_{k+1},w)\leq d^{2}(y_{k},w)+\epsilon_{k}$, for all $k=0,1,\ldots$.

In the following we state the main property of the quasi-Fejér concept. Its proof follows the same path as that of its Euclidean counterpart proved in [40], replacing the Euclidean distance by the Riemannian one.

Theorem 7.

Let $\{y_{k}\}$ be a sequence in the complete metric space $(\mathcal{M},d)$ . If $\{y_{k}\}$ is quasi-Fejér convergent to a nonempty set $W\subset\mathcal{M}$ , then $\{y_{k}\}$ is bounded. Furthermore, if a cluster point $\bar{y}$ of $\{y_{k}\}$ belongs to $W$ , then $\lim_{k\to\infty}y_{k}=\bar{y}$ .

Hereafter, we assume that $\mathcal{M}$ is a complete Riemannian manifold with sectional curvature $K\geq\kappa$, where $\kappa<0$. We point out that for Riemannian manifolds with nonnegative sectional curvature, the convergence analysis of the steepest descent method for convex and quasi-convex vector functions is well understood; see, for example, [35, 36].

3 Steepest Descent for Multiobjective Optimization

Let $F:=\left(f_{1},\ldots,f_{m}\right):\mathcal{M}\to\mathbb{R}^{m}$ be a continuously differentiable function. We denote the problem of finding an optimum Pareto point of $F$ by

$$\min\{F(p):\ p\in\mathcal{M}\}. \tag{2}$$

A point $p\in\mathcal{M}$ satisfying $\mbox{Im}(\nabla F(p))\cap(-{\mathbb{R}}^{m}_{++})=\emptyset$ is called critical Pareto. An optimum Pareto point of $F$ is a point $p_{*}\in\mathcal{M}$ such that there exists no other $p\in\mathcal{M}$ with $F(p)\preceq F(p_{*})$ and $F(p)\neq F(p_{*})$. Moreover, a point $p_{*}\in\mathcal{M}$ is a weak optimum Pareto point of $F$ if there is no $p\in\mathcal{M}$ with $F(p)\prec F(p_{*})$. Consider the following problem

$$\min_{v\in T_{p}\mathcal{M}}\left\{\max_{i\in\mathcal{I}}\langle\operatorname{grad}f_{i}(p),v\rangle+\frac{1}{2}\|v\|^{2}\right\},\qquad\mathcal{I}=\{1,\ldots,m\}. \tag{3}$$

Whenever $p\in\mathcal{M}$ is not critical Pareto, the optimization problem (3) has only one solution, which is called the steepest descent direction for $F$ at $p$ and is denoted by

$$v_{p}:=\operatorname{argmin}_{v\in T_{p}\mathcal{M}}\left\{\max_{i\in\mathcal{I}}\langle\operatorname{grad}f_{i}(p),v\rangle+\frac{1}{2}\|v\|^{2}\right\}. \tag{4}$$

In the next lemma we state an important property of the steepest descent direction. Its proof can be found in [35, Lemma 5.1].

Lemma 8.

The steepest descent direction mapping $\mathcal{M}\ni p\mapsto v_{p}\in T_{p}\mathcal{M}$ is a continuous vector field.

Moreover, the vector $v_{p}$ is the solution of problem (3) if and only if there exist $\mu_{j}\geq 0$, for $j\in{\cal I}(v_{p}):=\{j\in{\cal I}:\ \langle\operatorname{grad}f_{j}(p),v_{p}\rangle=\operatorname{max}_{i\in{\cal I}}\langle\operatorname{grad}f_{i}(p),v_{p}\rangle\}$, such that

$$v_{p}=-\sum_{j\in\mathcal{I}(v_{p})}\mu_{j}\operatorname{grad}f_{j}(p),\qquad\sum_{j\in\mathcal{I}(v_{p})}\mu_{j}=1, \tag{5}$$

see [35, Lemma 4.1]. In the following lemma we state an important inequality for our convergence analysis and a characterization of critical Pareto points $p\in\mathcal{M}$.

Lemma 9.

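Let $p\in\mathcal{M}$ and let $v_{p}$ be as defined in (4). Then,

$$\max_{i\in\mathcal{I}}\langle\operatorname{grad}f_{i}(p),v_{p}\rangle=-\|v_{p}\|^{2}. \tag{6}$$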

Consequently, $\nabla F(p)v_{p}\preceq-\left\|v_{p}\right\|^{2}e.$ In addition, $p$ is critical Pareto point of $F$ if, and only if, $\left\|v_{p}\right\|=0$ .

Proof.

Let $p\in\mathcal{M}$ and let $v_{p}$ be as in (4). Thus, from the first equality in (5) we have

$$-\|v_{p}\|^{2}=\langle-v_{p},v_{p}\rangle=\left\langle\sum_{j\in\mathcal{I}(v_{p})}\mu_{j}\operatorname{grad}f_{j}(p),v_{p}\right\rangle=\sum_{j\in\mathcal{I}(v_{p})}\mu_{j}\langle\operatorname{grad}f_{j}(p),v_{p}\rangle.$$

Hence, by the definition of ${\cal I}(v_{p})$ and the second equality in (5), it is easy to verify that (6) holds. The second statement follows from the definitions of $\nabla F(p)v_{p}$ and ${\cal I}(v_{p})$. We proceed with the proof of the third statement of the lemma. Assuming that $p$ is critical Pareto, it follows from the definition that there exists $i\in{\cal I}$ such that $\left\langle\operatorname{grad}f_{i}(p),v_{p}\right\rangle\geq 0$. Then, by the first part of the lemma, $-\left\|v_{p}\right\|^{2}=\operatorname{max}_{i\in{\cal I}}\langle\operatorname{grad}f_{i}(p),v_{p}\rangle\geq 0$, and hence $\left\|v_{p}\right\|=0$. The converse follows from [35, Lemma 4.2] and the proof is concluded. ∎
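For two objectives, the steepest descent direction can be computed in closed form from the multiplier characterization of $v_{p}$ stated before Lemma 9. The following Python sketch illustrates this and checks the identity of Lemma 9 numerically. It assumes that tangent vectors are given by their coordinates in an orthonormal basis of $T_{p}\mathcal{M}$, so that the metric is the dot product; the function name `steepest_descent_direction` and the data are hypothetical and not taken from the paper.

```python
import numpy as np

def steepest_descent_direction(g1, g2):
    """Steepest descent direction for m = 2, gradients in orthonormal coordinates.

    The direction has the form v = -(mu*g1 + (1 - mu)*g2) with mu in [0, 1],
    where mu minimizes ||mu*g1 + (1 - mu)*g2||^2 (a one-dimensional quadratic).
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        mu = 0.5  # equal gradients: every mu gives the same direction
    else:
        # stationary point of the quadratic in mu, clipped to [0, 1]
        mu = float(np.clip(g2 @ (g2 - g1) / denom, 0.0, 1.0))
    v = -(mu * g1 + (1.0 - mu) * g2)
    return v, mu

# Illustrative data (not from the paper).
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
v, mu = steepest_descent_direction(g1, g2)
lhs = max(g1 @ v, g2 @ v)   # max_i <grad f_i(p), v_p>
rhs = -(v @ v)              # -||v_p||^2
```

For more than two objectives the same construction leads to a small quadratic program over the unit simplex, which this sketch does not cover.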

The proof of the next lemma is a direct combination of Lemma 5 with the first part of Lemma 9 and is omitted.

Lemma 10.

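Assume that $\nabla F$ is componentwise Lipschitz continuous on $\mathcal{M}$ with constant $L\geq 0$. Let $p\in\mathcal{M}$ and let $v_{p}$ be as defined in (4). Then, there holds

$$F(\exp_{p}(tv_{p}))\preceq F(p)+\left(\frac{Lt^{2}}{2}-t\right)\|v_{p}\|^{2}e,\qquad\forall\,t\in[0,+\infty).$$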
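The omitted argument can be written out in two lines. The following is a sketch that uses only Lemma 5 applied with $v=v_{p}$, the consequence $\nabla F(p)v_{p}\preceq-\|v_{p}\|^{2}e$ of Lemma 9, and $t\geq 0$:

$$
\begin{aligned}
F(\exp_{p}(tv_{p}))&\preceq F(p)+t\,\nabla F(p)v_{p}+\frac{Lt^{2}}{2}\|v_{p}\|^{2}e\\
&\preceq F(p)-t\|v_{p}\|^{2}e+\frac{Lt^{2}}{2}\|v_{p}\|^{2}e=F(p)+\left(\frac{Lt^{2}}{2}-t\right)\|v_{p}\|^{2}e.
\end{aligned}
$$

When $L>0$, the factor $Lt^{2}/2-t$ is smallest at $t=1/L$, where the guaranteed decrease in every component is $\|v_{p}\|^{2}/(2L)$.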

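Next we state the steepest descent algorithm on Riemannian manifolds to solve (2). Starting from a point $p_{0}\in\mathcal{M}$, the algorithm computes the steepest descent direction $v_{k}:=v_{p_{k}}$ as in (4), chooses a stepsize $t_{k}>0$, and updates the iterate by

$$p_{k+1}:=\exp_{p_{k}}(t_{k}v_{k}). \tag{7}$$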

Our goal is to analyze Algorithm 1 with three different strategies for choosing the stepsize $t_{k}>0$. An analogous analysis for the scalar case can be found in [16]. In the first strategy we assume that $\nabla F$ is componentwise Lipschitz continuous, while the last two require no Lipschitz condition. The statements of the strategies are as follows:

Strategy 1 (Lipschitz stepsize).

Assume that $\nabla F$ is componentwise Lipschitz continuous on $\mathcal{M}$ with constant $L\geq 0$ . Let $\varepsilon>0$ and take

$$\varepsilon<t_{k}\leq\frac{1}{L}. \tag{8}$$

Even when $\nabla F$ is known to be componentwise Lipschitz continuous, the Lipschitz constant is in general not computable. The next strategy can then be used to compute the stepsize without any Lipschitz condition. However, as we shall show, if $\nabla F$ is componentwise Lipschitz continuous with constant $L>0$, the computed stepsize is an approximation to $1/L$; see the scalar case in [41, 16].

Strategy 2 (adaptive stepsize).

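Take $\zeta\in(0,1/2]$, $L_{0}>0$, $t_{0}:=L_{0}^{-1}$, and $0<\eta<1$. Let $v_{k}=v_{p_{k}}$ be defined as in (4). Set $t_{k}:=\eta^{i_{k}}t_{k-1}$, where

$$i_{k}:=\min\left\{i:\ F(\exp_{p_{k}}(\eta^{i}t_{k-1}v_{k}))\preceq F(p_{k})-\zeta\eta^{i}t_{k-1}\|v_{k}\|^{2}e,\ i=0,1,\ldots\right\}. \tag{9}$$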

In the next remark we show that if $\nabla F$ is componentwise Lipschitz continuous on $\mathcal{M}$ , the adaptive stepsize can be seen as an approximation for $1/L$ .

Remark 11.

Suppose that $\nabla F$ is componentwise Lipschitz continuous on $\mathcal{M}$ with constant $L>0$. Let $L_{0}>0$ be an estimate for $L$ and $v_{k}=v_{p_{k}}$ be defined as in (4). Taking $t=1/L$, using Lemma 10 and taking into account that $\zeta\leq 1/2$, we obtain

$$F(\exp_{p_{k}}(v_{k}/L))\preceq F(p_{k})-(\zeta\|v_{k}\|^{2}/L)e.$$

Hence, it follows that $t_{k}=1/L$ is always accepted in Strategy 2 with $i_{k}=0$. Therefore, if $L_{0}\geq L$ then we have $t_{k}=1/L_{0}$, i.e., the stepsize is constant. On the other hand, if $L_{0}\leq L$ then, owing to $\eta<1$, we conclude that $t_{k}$ in Strategy 2 satisfies

$$\frac{\eta}{L}\leq t_{k}\leq\frac{1}{L_{0}}. \tag{10}$$
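The backtracking in Strategy 2 and the bounds of Remark 11 can be illustrated on a small flat example. The sketch below is an illustration under stated assumptions, not the authors' implementation: it takes $\mathcal{M}=\mathbb{R}^{2}$ with $\exp_{p}(v)=p+v$ (zero curvature, hence $K\geq\kappa$ for any $\kappa<0$), two quadratic objectives chosen for this example with $L=4$, the closed-form direction for two objectives, and the hypothetical helper names `direction` and `adaptive_step`.

```python
import numpy as np

def direction(g1, g2):
    """Steepest descent direction for two gradients (closed form for m = 2)."""
    d = g1 - g2
    dd = float(d @ d)
    mu = 0.5 if dd == 0.0 else float(np.clip(g2 @ (g2 - g1) / dd, 0.0, 1.0))
    return -(mu * g1 + (1.0 - mu) * g2)

def adaptive_step(F, exp, p, v, t_prev, zeta, eta):
    """Strategy 2: return eta**i * t_prev for the smallest i passing the test."""
    t, vv, Fp = t_prev, float(v @ v), F(p)
    while not np.all(F(exp(p, t * v)) <= Fp - zeta * t * vv):
        t *= eta
    return t

# Illustrative flat problem (assumption): two quadratics on R^2.
a, b = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
F = lambda p: np.array([0.5 * (p - a) @ (p - a), 2.0 * (p - b) @ (p - b)])
grads = lambda p: (p - a, 4.0 * (p - b))
exp = lambda p, v: p + v

L, L0, zeta, eta = 4.0, 1.0, 0.25, 0.5   # L = max(1, 4); L0 underestimates L
p, t = np.array([-1.0, 0.5]), 1.0 / L0   # t starts at t_0 = 1 / L0
steps = []
for k in range(30):
    v = direction(*grads(p))
    if v @ v < 1e-10:        # numerically critical Pareto: stop
        break
    t = adaptive_step(F, exp, p, v, t, zeta, eta)
    steps.append(t)
    p = exp(p, t * v)
```

In this run the first trial stepsize $1/L_{0}=1$ is rejected twice, and every accepted stepsize stays within the interval $[\eta/L,1/L_{0}]$ described in Remark 11.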

In the following strategy a stepsize satisfying an Armijo-type sufficient descent condition is chosen using a backtracking approach.

Strategy 3 (Armijo-type stepsize).

Let $t_{\operatorname{max}}>t_{\operatorname{min}}>0$, $0<\omega_{1}<\omega_{2}<1$ and $\delta\in(0,1)$. Let $v_{k}=v_{p_{k}}$ be defined as in (4). The stepsize $t_{k}$ is chosen according to the following algorithm:

**Step 0.** Set $\ell=0$ and take ${\hat{t}}_{k_{0}}\in[t_{\operatorname{min}},t_{\operatorname{max}}]$.

**Step 1.** If

$$F(\exp_{p_{k}}(\hat{t}_{k_{\ell}}v_{k}))\preceq F(p_{k})-\delta\hat{t}_{k_{\ell}}\|v_{k}\|^{2}e, \tag{11}$$

then set $t_{k}:={\hat{t}}_{k_{\ell}}$ and stop.

**Step 2.** Choose a stepsize ${\hat{t}}_{k_{\ell+1}}\in[\omega_{1}{\hat{t}}_{k_{\ell}},\omega_{2}{\hat{t}}_{k_{\ell}}]$, set $\ell\leftarrow\ell+1$ and proceed to Step 1.

In the next remark we show that, for $\nabla F$ componentwise Lipschitz continuous on $\mathcal{M}$ , the stepsizes in Strategy 3 are bounded below by a positive constant.

Remark 12.

Assume that $\nabla F$ is componentwise Lipschitz continuous on $\mathcal{M}$ with constant $L>0$, $t_{\operatorname{max}}>2[1-\delta]/L$ and $t_{\operatorname{min}}<2\omega_{1}(1-\delta)/L$. Hence, for any $t\in(0,2[1-\delta]/L]$, from Lemma 10 we have

$$F(\exp_{p_{k}}(tv_{k}))\preceq F(p_{k})-\delta t\|v_{k}\|^{2}e.$$

Therefore, $t_{k}$ in Strategy 3 satisfies the inequality $t_{k}>t_{\operatorname{min}}$, for all $k=0,1,\ldots.$
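To make the Armijo-type backtracking of Strategy 3 concrete, the following Python sketch runs the steepest descent iteration $p_{k+1}=\exp_{p_{k}}(t_{k}v_{k})$ on the unit sphere $S^{2}$ for two linear objectives. Everything specific here is an assumption made for illustration: the sphere (curvature $1$, so $K\geq\kappa$ for any $\kappa<0$), the objectives $f_{i}(p)=-\langle a_{i},p\rangle$, the fixed contraction $\hat{t}_{k_{\ell+1}}=\omega\hat{t}_{k_{\ell}}$ with $\omega=0.5\in[\omega_{1},\omega_{2}]$, and the helper names.

```python
import numpy as np

def exp_sphere(p, v):
    """Exponential map of the unit sphere: great circle from p in direction v."""
    nv = np.linalg.norm(v)
    return p if nv == 0.0 else np.cos(nv) * p + np.sin(nv) * (v / nv)

def riemannian_grad(egrad, p):
    """Project a Euclidean gradient onto the tangent space of the sphere at p."""
    return egrad - (egrad @ p) * p

def direction(g1, g2):
    """Steepest descent direction for two gradients (closed form for m = 2)."""
    d = g1 - g2
    dd = float(d @ d)
    mu = 0.5 if dd == 0.0 else float(np.clip(g2 @ (g2 - g1) / dd, 0.0, 1.0))
    return -(mu * g1 + (1.0 - mu) * g2)

def armijo_step(F, p, v, t_init, delta, omega, max_backtracks=50):
    """Strategy 3 with the fixed choice t_hat_{l+1} = omega * t_hat_l."""
    t, vv, Fp = t_init, float(v @ v), F(p)
    for _ in range(max_backtracks):
        # Armijo-type sufficient descent test, componentwise
        if np.all(F(exp_sphere(p, t * v)) <= Fp - delta * t * vv):
            return t
        t *= omega
    raise RuntimeError("Armijo backtracking did not terminate")

# Illustrative problem (assumption): F(p) = (-<a1, p>, -<a2, p>) on S^2.
a1, a2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
F = lambda p: np.array([-(a1 @ p), -(a2 @ p)])

p = np.array([0.0, 0.0, 1.0])
history = [F(p)]
for k in range(50):
    v = direction(riemannian_grad(-a1, p), riemannian_grad(-a2, p))
    if v @ v < 1e-12:        # numerically critical Pareto: stop
        break
    t = armijo_step(F, p, v, t_init=1.0, delta=0.1, omega=0.5)
    p = exp_sphere(p, t * v)  # p_{k+1} = exp_{p_k}(t_k v_k)
    history.append(F(p))
```

Each accepted step decreases both objectives, and the iterates stay on the sphere because the update uses the exponential map rather than a Euclidean step followed by a projection.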

Since the well-definedness of Strategies 2 and 3 follows from standard arguments, we omit its proof here. Hence, the sequence $\{p_{k}\}$ generated by Algorithm 1 with Strategies 1, 2 or 3 is well-defined. Finally, we recall that $p$ is critical Pareto if, and only if, $\left\|v_{p}\right\|=0$. Therefore, from now on we assume that $\left\|v_{k}\right\|\neq 0$, for all $k$. Moreover, let us denote by $\{p_{k}\}$ the infinite sequence generated by Algorithm 1.

3.1 Asymptotic Convergence Analysis

In this section, we analyze asymptotic convergence of the sequence $\{p_{k}\}$ generated by Algorithm 1 with Strategies 1, 2 and 3. Let us define

$$\mathcal{A}:=\{p\in\mathcal{M}:\ F(p)\preceq F(p_{k}),\ k=0,1,\ldots\}.$$

To proceed with our analysis, from now on, we assume that the set ${\cal A}$ is non-empty. A condition guaranteeing this assumption is the existence of an accumulation point for the sequence $\{p_{k}\}$.

Lemma 13.

Let $\{p_{k}\}$ be generated with any of Strategies 1, 2 or 3. Then,

[TABLE]

where $\nu=1/2$ for Strategy 1, $\nu=\zeta$ for Strategy 2 and $\nu=\delta$ for Strategy 3. As a consequence, $\lim_{k\to+\infty}t_{k}\left\|v_{k}\right\|^{2}=0$ .

Proof.

The inequality (12) for Strategies 2 and 3 follows from (7), (9) and (11), respectively. Now, assume that $\{p_{k}\}$ is generated by using Strategy 1. In this case, combining (7) with Lemma 10 and taking into account that (8) implies $(Lt_{k}/2-1)\leq-1/2$ , (12) follows with $\nu=1/2$ . To proceed with the proof of the last statement, take $q\in{\cal A}$ and an integer number $\ell>0$ . Thus, (12) yields

[TABLE]

which implies the desired result, and the proof of the lemma is concluded. ∎

To simplify the statement and proof of the next result we need to define three auxiliary constants. For that, let $p_{0}\in\mathcal{M}$ . By using (12) together with (8), (10) and (11) define the first constant $\rho>0$ as follows

[TABLE]

The other two auxiliary constants ${\cal C}_{\rho,\kappa}^{q}>0$ and ${\cal K}_{\rho,\kappa}^{q}>0$ are defined as follows

[TABLE]

where the constants ${\hat{\kappa}}$ and $\rho$ are defined in (1) and (13), respectively.

Lemma 14.

Let $\{p_{k}\}$ be generated with any of Strategies 1, 2 or 3 and $q\in{\cal A}$ . Assume that the function $F$ is quasi-convex on $\mathcal{M}$ . Then,

[TABLE]

As a consequence, $\{p_{k}\}$ is bounded and the following inequality holds

[TABLE]

Proof.

For each $k$ , let ${\gamma_{k}}:[0,\infty)\longrightarrow\mathcal{M}$ be defined by ${\gamma_{k}}(t)=\mbox{exp}_{p_{k}}\left(tv_{k}\right)$ . Let $\beta_{k}:[0,1]\rightarrow\mathcal{M}$ be a minimizing geodesic with $\beta_{k}(0)=p_{k}$ and $\beta_{k}(1)=q$ . By using (5), the definition of $v_{k}$ , the quasi-convexity of $F$ , and taking into account that $q\in{\cal A}$ , we have

[TABLE]

Thus, applying the first inequality of Lemma 1, with $t=t_{k}$ , $\gamma=\gamma_{k}$ , $\beta=\beta_{k}$ and $p=p_{k}$ , and using (7) and (18), we obtain

[TABLE]

Since (13) implies $t_{k}\left\|v_{k}\right\|\leq\sqrt{\rho}$ , and the map $(0,+\infty)\ni t\mapsto\sinh(t)/t$ is increasing, we conclude that

[TABLE]

where $\sigma:=\hat{\kappa}(\sinh(\hat{\kappa}\sqrt{\rho}))/(2\sqrt{\rho})$ . Now note that the last inequality implies that

[TABLE]

Therefore, by using (13), it follows that $\cosh(\hat{\kappa}d(p_{k+1},q))\leq\cosh(\hat{\kappa}d(p_{0},q))e^{\sigma\rho}$ which, considering the definition of $\sigma$ and (14), yields (16). The boundedness of $\{p_{k}\}$ is immediate from (16). We proceed with the proof of (17). Now, we apply the second inequality of Lemma 1 and again we take into account (7) and (18) to conclude that

[TABLE]

Since the maps $(0,+\infty)\ni t\mapsto t/\tanh(t)$ and $(0,+\infty)\ni t\mapsto\sinh(t)/t$ are increasing and positive, taking into account (16) and that $t_{k}\left\|v_{k}\right\|\leq\sqrt{\rho}$ , the inequality (19) becomes

[TABLE]

Therefore, by using (15) we have the desired inequality. ∎

In the next result we show that if $F$ is a quasi-convex function on a Riemannian manifold with lower bounded sectional curvature, then $\{p_{k}\}$ converges to a critical Pareto point of $F$ .

Theorem 15.

Let $\{p_{k}\}$ be generated with any of Strategies 1, 2 or 3. If $F$ is quasi-convex, then $\{p_{k}\}$ converges to a critical Pareto point of $F$ .

Proof.

Since ${\cal A}$ is non-empty, Lemma 14 and (13) imply that $\{p_{k}\}$ is bounded and quasi-Fejér convergent to the set ${\cal A}$ . Taking into account Lemma 13 we conclude that $\{f_{s}(p_{k})\}$ is non-increasing, for all $s=1,\ldots,m$ . Thus, all cluster points of $\{p_{k}\}$ belong to ${\cal A}$ . Hence, Theorem 7 implies that $\{p_{k}\}$ converges to a point $\bar{p}\in{\cal A}$ . It remains to prove that $\bar{p}$ is a critical Pareto point of $F$ . We know that, for any of Strategies 1, 2 or 3, the sequence $\{t_{k}\}$ is bounded. Let $\bar{t}\geq 0$ be a cluster point of $\{t_{k}\}$ and take $\{t_{k_{j}}\}$ such that $\lim_{j\to\infty}t_{k_{j}}=\bar{t}$ . First we suppose that $\bar{t}>0$ . Since $\lim_{j\to\infty}p_{k_{j}}=\bar{p}$ and $\lim_{j\to\infty}t_{k_{j}}=\bar{t}$ , (13) and Lemma 8 imply that $0=\lim_{j\to\infty}t_{k_{j}}\left\|v_{k_{j}}\right\|=\bar{t}\left\|v_{\bar{p}}\right\|.$ Thus, considering that we are under the assumption $\bar{t}>0$ , we obtain $v_{\bar{p}}=0$ . Therefore, Lemma 9 implies that $\bar{p}$ is a critical Pareto point of $F$ . Now, we suppose that $\bar{t}=0$ . In this case, we just need to analyze Strategies 2 and 3, since in Strategy 1 we have $\epsilon\leq\bar{t}$ . First assume that Strategy 2 is used and take $r\in\mathbb{N}$ . Since $\lim_{j\to\infty}t_{k_{j}}=0$ we conclude that if $j$ is large enough, $t_{k_{j}}<\eta^{r}t_{0}=:C_{r}$ . Thus, for each $j$ large enough, from (9) we have

[TABLE]

for some $s_{j}\in\{1,\ldots,m\}$ . Since the set $\{1,\ldots,m\}$ is finite, without loss of generality, we assume that there exist ${\hat{s}}$ and an infinite set of indices $j$ such that

[TABLE]

Since $\lim_{j\to\infty}p_{k_{j}}=\bar{p}$ and $\lim_{j\to\infty}t_{k_{j}}=\bar{t}$ , letting $j$ go to $+\infty$ and taking into account that $v_{p}$ and the exponential map are continuous, we obtain

[TABLE]

Thus, letting $r$ go to $+\infty$ yields $\langle\operatorname{grad}f_{\hat{s}}(\bar{p}),v_{\bar{p}}\rangle\geq-\zeta\|v_{\bar{p}}\|^{2}$ . Hence, from Lemma 9 we conclude that $-\|v_{\bar{p}}\|^{2}\geq-\zeta\|v_{\bar{p}}\|^{2}$ and, considering that $\zeta\in(0,1/2]$ , we have $\left\|v_{\bar{p}}\right\|=0$ . Consequently, using again Lemma 9, we have that $\bar{p}$ is a critical Pareto point of $F$ . Finally, assume that Strategy 3 is used. Since $\lim_{j\to\infty}t_{k_{j}}=0$ we conclude that if $j$ is large enough we have $t_{k_{j}}<t_{\operatorname{min}}$ . Thus, if $j$ is large enough, there exists $0<{\hat{t}}_{j}\leq t_{\operatorname{max}}$ such that $0<\omega_{1}{\hat{t}}_{j}\leq{t}_{k_{j}}$ and

[TABLE]

for some $s_{j}\in\{1,\ldots,m\}$ . Since the set $\{1,\ldots,m\}$ is finite, without loss of generality, we assume that there exist ${\hat{s}}$ and an infinite set of indices $j$ such that

[TABLE]

Let $\gamma_{j}(t):=\exp_{p_{k_{j}}}(tv_{k_{j}})$ , for $t>0$ , be a geodesic segment. Thus, the mean value theorem implies that there exists ${\bar{t}}_{j}\in(0,{\hat{t}}_{j})$ such that

[TABLE]

On the other hand, let $B_{\epsilon}(\overline{p})\subset\mathcal{M}$ be a totally normal ball. Hence, considering that $\lim_{j\to+\infty}p_{k_{j}}=\bar{p}$ , Lemma 8 implies that $\lim_{j\to+\infty}v_{k_{j}}=\bar{v}_{\bar{p}}$ . Moreover, $0<\omega_{1}{\hat{t}}_{j}\leq{t}_{k_{j}}$ implies that $\lim_{j\to+\infty}\hat{t}_{j}=0$ . Owing to $0<{\bar{t}}_{j}\leq{\hat{t}}_{j}$ we obtain that $\lim_{j\to+\infty}\bar{t}_{j}=0$ . Hence, for all $j$ large enough we have $\{\bar{t}_{j}\}\subset(0,1)$ and $\gamma_{j}(\bar{t}_{j})\in B_{\epsilon}(\overline{p})$ , which implies

[TABLE]

Thus, letting $j$ go to $+\infty$ and using [42, Lemma 1.1], we conclude that $\lim_{j\rightarrow+\infty}P_{\gamma_{j},0,\bar{t}_{j}}v_{k_{j}}=\bar{v}_{\bar{p}}$ (for a general version of this equality, see [42, Lemma 1.2]). Then, letting $j$ go to $+\infty$ in (20) and taking into account Lemma 8 and the continuity of $\operatorname{grad}f_{\hat{s}}$ and of the exponential map, we obtain $\langle\operatorname{grad}f_{\hat{s}}(\bar{p}),v_{\bar{p}}\rangle\geq-\delta\|v_{\bar{p}}\|^{2}$ . Hence, Lemma 9 implies that $-\|v_{\bar{p}}\|^{2}\geq-\delta\|v_{\bar{p}}\|^{2}$ and, considering that $\delta\in(0,1)$ , we have $\left\|v_{\bar{p}}\right\|=0$ . Consequently, using again Lemma 9 we conclude that $\bar{p}$ is a critical Pareto point of $F$ . Therefore, for each of Strategies 1, 2 and 3, $\bar{p}$ is a critical Pareto point of $F$ , which concludes the proof. ∎

Corollary 16.

Let $\{p_{k}\}$ be generated with any of Strategies 1, 2 or 3. If $F$ is convex, then $\{p_{k}\}$ converges to a weak optimal Pareto point of $F$ .

Proof.

Since $F$ is convex, critical points are weak optimal Pareto points of $F$ , see [35, Proposition 5.2]. Considering that convex functions are also quasi-convex, the result follows from Theorem 15. ∎

3.2 Iteration-Complexity Analysis

In this section we present iteration-complexity bounds for the steepest descent method with Strategies 1, 2 and 3, for $F$ whose Jacobian $\nabla F$ is componentwise Lipschitz continuous with constant $L>0$ . For this purpose, by using (8), (10) and Remark 12, define

[TABLE]

The following result extends the scalar result [17, Theorem 3.1] to the multiobjective setting. Moreover, it also extends [15, Theorem 3.1] to the Riemannian context.

Theorem 17.

Let $\{p_{k}\}$ be generated with any of Strategies 1, 2 or 3, and set $f_{i}^{*}:=\inf\{f_{i}(q):q\in{\cal M}\}$ , for $i\in{\cal I}$ . Suppose that $f_{i}^{*}>-\infty$ for some $i\in{\cal I}$ , and define $i_{*}\in{\cal I}$ such that

[TABLE]

Then, for every $N\in\mathbb{N}$ , there holds

[TABLE]

where $\nu=1/2$ for Strategy 1, $\nu=\zeta$ for Strategy 2 and $\nu=\delta$ for Strategy 3.

Proof.

It follows from Lemma 13 that $\nu t_{k}\left\|v_{k}\right\|^{2}e\preceq F(p_{k})-F(p_{k+1})$ , for all $k=0,1,\ldots$ . By summing both sides of this inequality for $k=0,1,\ldots,N-1$ and using (21), we obtain

[TABLE]

Thus, by the definition of $i_{*}$ , we conclude from the last inequality that

[TABLE]

which implies the statement of the theorem. ∎
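The telescoping step used in the proof above can be written out explicitly. The display below is a sketch reconstructed from Lemma 13 and the wording of the proof, under the assumption that (21) supplies a constant $\xi>0$ with $t_{k}\geq\xi$ for all $k$; it holds for any index $i$ with $f_{i}^{*}>-\infty$ and is not a verbatim copy of the original displays.

```latex
\nu\,\xi\sum_{k=0}^{N-1}\|v_{k}\|^{2}
\;\leq\;\sum_{k=0}^{N-1}\nu\,t_{k}\|v_{k}\|^{2}
\;\leq\;\sum_{k=0}^{N-1}\bigl(f_{i}(p_{k})-f_{i}(p_{k+1})\bigr)
\;=\;f_{i}(p_{0})-f_{i}(p_{N})
\;\leq\;f_{i}(p_{0})-f_{i}^{*},
\qquad\text{hence}\qquad
\min_{0\leq k\leq N-1}\|v_{k}\|
\;\leq\;\sqrt{\frac{f_{i}(p_{0})-f_{i}^{*}}{\nu\,\xi}}\;\frac{1}{\sqrt{N}} .
```

The last inequality uses only that the minimum of $N$ non-negative numbers is at most their average.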

Remark 18.

It is worth mentioning that in the above result it was not necessary to use any hypothesis about convexity of $F$ and curvature of ${\cal M}$ .

Now we are going to prove that under the assumption of convexity Theorem 17 can be improved. We begin by presenting an auxiliary inequality.

Lemma 19.

Let $\{p_{k}\}$ be generated with any of Strategies 1, 2 or 3. Assume that $F$ is a convex function on $\mathcal{M}$ . Then, for $q\in{\cal A}$ and each $k$ , there exist $\mu_{j}^{k}\geq 0$ , $j\in{\cal I}(v_{k})$ , satisfying $\sum_{j\in{\cal I}(v_{k})}\mu_{j}^{k}=1$ such that

[TABLE]

where $\rho$ is defined in (13).

Proof.

For each $k$ , let ${\gamma}_{k}:[0,\infty)\longrightarrow\mathcal{M}$ be defined by ${\gamma_{k}}(t)=\mbox{exp}_{p_{k}}\left(tv_{k}\right)$ and let $\beta_{k}:[0,1]\rightarrow\mathcal{M}$ be a minimizing geodesic with $\beta_{k}(0)=p_{k}$ and $\beta_{k}(1)=q$ . Using (5) and the convexity of $F$ we conclude that there exist $\mu_{j}^{k}\geq 0$ , $j\in{\cal I}(v_{k})$ , satisfying $\sum_{j\in{\cal I}(v_{k})}\mu_{j}^{k}=1$ such that

[TABLE]

Applying the second inequality of Lemma 1 with $\beta=\beta_{k}$ , $\gamma=\gamma_{k}$ and $t=t_{k}$ and using the last inequality we obtain

[TABLE]

Since $(0,+\infty)\ni t\mapsto t/\tanh(t)$ and $(0,+\infty)\ni t\mapsto\psi(t):=\sinh(t)/t$ are increasing, taking into account that (13) implies $t_{k}\left\|v_{k}\right\|\leq\sqrt{\rho}$ , and using (16), the inequality (23) becomes

[TABLE]

Therefore, since $f_{j}(q)-f_{j}(p_{k})\leq 0$ and $\psi$ is bounded from below by $1$ , the inequality (22) follows by using (15), which concludes the proof. ∎

The next result, with minor adjustments, is a generalization of [15, Theorem 4.1] to the Riemannian setting when the Armijo-type strategy is used.

Proposition 20.

Let $\{p_{k}\}$ be generated with any of Strategies 1, 2 or 3. Assume that $F$ is a convex function on $\mathcal{M}$ and $q\in{\cal A}$ . Then, for every $N\in\mathbb{N}$ , there are non-negative numbers $\lambda_{1},\ldots,\lambda_{m}$ with $\sum_{i=1}^{m}\lambda_{i}=1$ , satisfying

[TABLE]

where $\rho$ is defined in (13).

Proof.

Since $f_{i}(p_{k})-f_{i}(q)\geq 0$ for all $i$ , Lemma 19 and (21) imply that there exist $\mu_{i}^{k}\geq 0$ such that

[TABLE]

and $\sum_{i=1}^{m}\mu_{i}^{k}=1$ , where, for each $k$ , we define $\mu_{i}^{k}:=0$ for all $i\notin{\cal I}(v_{k})$ . By summing both sides of this inequality for $k=0,1,\ldots,N-1$ and using (13), it follows that

[TABLE]

Since $\{f_{i}(p_{k})\}$ is a non-increasing sequence for each $i\in\{1,\ldots,m\}$ , by some algebraic manipulations in the previous inequality we have

[TABLE]

Defining $\lambda_{i}:=\sum_{k=0}^{N-1}\mu^{k}_{i}/N$ we obtain the inequality in (24). To complete the proof, we have to show that $\sum_{i=1}^{m}\lambda_{i}=1$ . For that, it is sufficient to note that

[TABLE]

and $\sum_{i=1}^{m}\mu_{i}^{k}=1$ for each $k$ . ∎

Finally, we are ready to present the main result of this section, namely, the improvement of Theorem 17. We remark that this result is new, even in the Euclidean context.

Theorem 21.

Let $\{p_{k}\}$ be generated with any of Strategies 1, 2 or 3. Assume that $F$ is a convex function on $\mathcal{M}$ and $q\in{\cal A}$ . Then, for every $N\in\mathbb{N}$ , there holds

[TABLE]

where $\rho$ is defined in (13) and $\nu=1/2$ for Strategy 1, $\nu=\zeta$ for Strategy 2 and $\nu=\delta$ for Strategy 3.

Proof.

Let $N\in\mathbb{N}$ and denote by ${\lceil N/2\rceil}$ the least integer that is greater than or equal to $N/2$ . It follows from Lemma 13 that $\nu t_{k}\left\|v_{k}\right\|^{2}e\preceq F(p_{k})-F(p_{k+1})$ , for all $k=0,1,\ldots$ . Thus, by summing both sides of this inequality for $k={\lceil N/2\rceil},\ldots,N$ and using (21), we obtain

[TABLE]

Hence, taking non-negative numbers $\lambda_{1},\ldots,\lambda_{m}$ as in Proposition 20 and considering that $q\in{\cal A}$ , we conclude from the last inequality that

[TABLE]

Thus, from Proposition 20 and considering that $N/2\leq\lceil N/2\rceil$ it follows that

[TABLE]

Therefore, $\operatorname{min}\{\|v_{k}\|^{2}:~{}k=\lceil N/2\rceil,\ldots,N\}\leq 2(d^{2}(p_{0},q)+{\cal K}_{\rho,\kappa}^{q}\rho)/(\nu{\xi}^{2}N^{2})$ , which implies the desired inequality. ∎
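The final inequality of the proof can be inverted to estimate how many iterations guarantee $\min_{k}\|v_{k}\|\leq\varepsilon$. The following Python helper is a small sketch of that arithmetic; the function name and argument names are hypothetical, and the constants $d^{2}(p_{0},q)$, ${\cal K}_{\rho,\kappa}^{q}$, $\rho$, $\nu$ and $\xi$ are assumed to be supplied by the user, since in practice they are rarely known exactly.

```python
import math


def iterations_for_tolerance(dist0_sq: float, k_const: float, rho: float,
                             nu: float, xi: float, eps: float) -> int:
    """Smallest N with sqrt(2 * (d^2 + K * rho) / nu) / (xi * N) <= eps.

    dist0_sq stands for d^2(p_0, q) and k_const for the constant K of the
    bound min ||v_k||^2 <= 2 * (d^2 + K * rho) / (nu * xi^2 * N^2).
    """
    if min(nu, xi, eps) <= 0.0:
        raise ValueError("nu, xi and eps must be positive")
    bound = math.sqrt(2.0 * (dist0_sq + k_const * rho) / nu) / (xi * eps)
    return max(1, math.ceil(bound))
```

With $d^{2}(p_{0},q)=1$, ${\cal K}_{\rho,\kappa}^{q}\rho=1$, $\nu=1/2$, $\xi=1$ and $\varepsilon=0.1$ the helper returns 29, which reflects the ${\cal O}(1/\varepsilon)$ dependence implied by the $1/N$ rate.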

4 Examples

In this section we present some examples to illustrate the results obtained in previous sections. In particular, we present examples of convex vectorial functions whose Riemannian Jacobians are componentwise Lipschitz continuous.

Example 22.

Let ${\mathbb{P}}^{n}_{++}$ be the cone of symmetric positive definite matrices. Define the vectorial function $F(X)=\left(f_{1}(X),\ldots,f_{m}(X)\right)$ , where $f_{i}:{\mathbb{P}}^{n}_{++}\longrightarrow\mathbb{R}$ is given by

[TABLE]

$a_{i},b_{i},c_{i},d_{i}\in{\mathbb{R}_{++}}$ with $d_{i}<a_{i}b_{i}$ for all $i=1,\ldots,m$ . Endowing ${\mathbb{P}}^{n}_{++}$ with the Riemannian metric given by

[TABLE]

where $\mbox{tr}(X)$ denotes the trace of $X\in{\mathbb{P}}^{n}$ , we obtain a Riemannian manifold $\mathcal{M}:=({\mathbb{P}}^{n}_{++},\langle\cdot,\cdot\rangle)$ with nonpositive sectional curvature, see [43, Theorem 1.2. p. 325]. In ${\cal M}$ , $f_{i}$ is convex and has Lipschitz gradient with constant $L_{i}\leq a_{i}b_{i}^{2}n$ , for each $i=1,\ldots,m$ , see [16, Example 4.5]. Hence, from Definition 4 the Jacobian $\nabla F$ is componentwise Lipschitz continuous with constant $L\leq n\operatorname{max}\{a_{1}b_{1}^{2},\ldots,a_{m}b_{m}^{2}\}$ . In $\mathcal{M}$ , the exponential mapping $\exp_{X}:T_{X}\mathcal{M}\to\mathcal{M}$ is given by

[TABLE]

Therefore, from Corollary 16 we can apply Algorithm 1 with Strategies 1, 2 or 3 to find weak optimal Pareto points of $F$ .

In the following we present, without giving the details, one more example of a convex vectorial function with Lipschitz gradients in the Riemannian manifold $\mathcal{M}:=({\mathbb{P}}^{n}_{++},\langle\cdot,\cdot\rangle)$ .

Example 23.

Let $F(X)=\left(f_{1}(X),\ldots,f_{m}(X)\right)$ be a vectorial function, where $f_{i}:{\mathbb{P}}^{n}_{++}\to\mathbb{R}$ is defined by

[TABLE]

$a_{i},b_{i}\in{\mathbb{R}_{++}}$ for all $i=1,\ldots,m$ . In $\mathcal{M}:=({\mathbb{P}}^{n}_{++},\langle\cdot,\cdot\rangle)$ , $f_{i}$ is convex and has Lipschitz gradient with constant $L_{i}\leq 2a_{i}\sqrt{n}$ , for each $i=1,\ldots,m$ , see [16, Example 4.4]. The Jacobian $\nabla F$ is componentwise Lipschitz continuous with constant $L\leq 2\sqrt{n}\operatorname{max}\{a_{1},\ldots,a_{m}\}$ .

Now, we present some preliminary results to study examples of convex vectorial functions with componentwise Lipschitz continuous Riemannian Jacobians. We begin with a result that, with some adjustments in the notation, can be found in [44, Lemma 2].

Lemma 24.

Let ${\bar{\mathcal{M}}}$ and ${\mathcal{M}}$ be Riemannian manifolds, ${\bar{\nabla}}$ be the Levi-Civita connection associated to ${\bar{\mathcal{M}}}$ and $\varphi:{\bar{\mathcal{M}}}\rightarrow{\mathcal{M}}$ be an isometry. Then, ${\nabla}:{\cal X}(\mathcal{M})\times{\cal X}(\mathcal{M})\rightarrow{\cal X}(\mathcal{M})$ defined by

[TABLE]

is the Levi-Civita connection associated to ${\mathcal{M}}$ , where ${\bar{V}}=\mbox{d}\varphi^{-1}{V}$ and ${\bar{U}}=\mbox{d}\varphi^{-1}{U}$ .

Proof.

Let $f$ be continuously differentiable, and let ${V}$ and ${U}$ be vector fields in ${\mathcal{M}}$ . Since $\varphi$ is a diffeomorphism, ${f}\circ\varphi$ is continuously differentiable, and ${\bar{V}}=\mbox{d}\varphi^{-1}{V}$ and ${\bar{U}}=\mbox{d}\varphi^{-1}{U}$ are vector fields in ${\bar{\mathcal{M}}}$ . Thus, we can prove that (27) satisfies [38, equations (1.9), (1.10), (1.11) and (1.12) on pages 27 and 28] and therefore is the Levi-Civita connection associated to $\mathcal{M}$ . ∎

The next result is the main tool used in the following examples.

Theorem 25.

Let ${\mathcal{M}}$ and ${\bar{\mathcal{M}}}$ be Riemannian manifolds, $f:{\mathcal{M}}\rightarrow{\mathbb{R}}$ be a twice-differentiable function and $\varphi:{\bar{\mathcal{M}}}\rightarrow{\mathcal{M}}$ be an isometry. Then, $f$ has gradient vector field Lipschitz continuous with constant $L\geq 0$ if, and only if, $g:{\bar{\mathcal{M}}}\rightarrow{\mathbb{R}}$ defined by $g:=f\circ\varphi$ , has gradient vector field Lipschitz continuous with constant $L\geq 0$ .

Proof.

Let ${\bar{V}}\in{\cal X}(\bar{\mathcal{M}})$ and set ${V}(\varphi(q))=\mbox{d}\varphi(q){\bar{V}}(q)$ . Thus, by using the definition of the gradient vector field and the chain rule, we have

[TABLE]

Taking into account that $\varphi$ is an isometry and ${V}(\varphi(q))=\mbox{d}\varphi(q){\bar{V}}(q)$ , we obtain that

[TABLE]

Hence, combining the two equalities above we conclude that $\operatorname{grad}f(\varphi(q))=\mbox{d}\varphi(q)\operatorname{grad}g(q)$ . Moreover, the definition of the hessian of $f$ together with Lemma 24 yield

[TABLE]

which implies that $\mbox{hess}\,f(\varphi(q))\mbox{d}\varphi(q)=\mbox{d}\varphi(q)\mbox{hess}\,g(q)$ . Then, using again that $\varphi$ is an isometry, we have $\|\mbox{hess}\,f(\varphi(q))\|=\|\mbox{hess}\,g(q)\|.$ Therefore, by using Lemma 3 the result follows. ∎

The next result is an important property of isometries; its proof can be found in [45, Proposition 5.6.1, p. 196].

Proposition 26.

Let ${\mathcal{M}}$ and ${\bar{\mathcal{M}}}$ be complete Riemannian manifolds. If $\varphi:{\bar{\mathcal{M}}}\rightarrow{\mathcal{M}}$ is an isometry and $\gamma$ is a geodesic in ${\bar{\mathcal{M}}}$ , then $\varphi\circ\gamma$ is a geodesic in ${\mathcal{M}}$ .

The following result is a direct consequence of the definition of isometry and Proposition 26.

Theorem 27.

Let ${\mathcal{M}}$ , ${\bar{\mathcal{M}}}$ be Riemannian manifolds and $\varphi:{\bar{\mathcal{M}}}\rightarrow{\mathcal{M}}$ an isometry. The function $g:{\mathcal{M}}\rightarrow{\mathbb{R}}$ is convex if and only if $f:{\bar{\mathcal{M}}}\rightarrow{\mathbb{R}}$ , defined by $f(p)=(g\circ\varphi)(p)$ , is convex.

In the next example we change the metric of the Euclidean space $\mathbb{R}^{n}$ to prove, in particular, that the extended Rosenbrock’s banana function is convex and has Lipschitz gradient in $\mathbb{R}^{n}$ with this new metric. It is worth pointing out that the convexity of this function in two dimensions has been established in [25, p. 83].

Example 28 (Rosenbrock’s banana function class).

Let $f_{j}:\mathbb{R}^{2n}\to\mathbb{R}$ be a variant of the Rosenbrock’s banana function, defined by

[TABLE]

for $j=1,\ldots,m$ . Denote ${\bar{\mathcal{M}}}$ as the Euclidean space $\mathbb{R}^{2n}$ with the usual metric. It is well known that $f_{j}$ is non-convex and its gradient is non-Lipschitz continuous in ${\bar{\mathcal{M}}}$ . Endowing $\mathbb{R}^{2n}$ with the new Riemannian metric $\langle u,v\rangle:=u^{T}G(x)v$ , where $u,v\in\mathbb{R}^{2n}$ and $G(x)$ is the $2n\times 2n$ block diagonal matrix $G(x)=\operatorname{diag}(G_{1}(x),\ldots,G_{n}(x))$ , where the blocks are given by

[TABLE]

and $x=(x_{1},\ldots,x_{2n})$ , we obtain a Riemannian manifold ${\mathcal{M}}:=(\mathbb{R}^{2n},G)$ . Taking into account that the function $\varphi:{\bar{\mathcal{M}}}\to{\mathcal{M}}$ defined by

[TABLE]

is an isometry, the Riemannian manifold ${\mathcal{M}}$ is complete and has constant sectional curvature $K=0$ . On the other hand, $g_{j}:{\bar{\mathcal{M}}}\to\mathbb{R}$ defined by

[TABLE]

is a quadratic function, which is convex with gradient vector field Lipschitz in ${\bar{\mathcal{M}}}$ with constant $L_{j}:=\operatorname{max}\{2,2a_{1j},\ldots,2a_{nj}\}$ . Therefore, Theorem 27 and Theorem 25 imply, respectively, that $f_{j}$ is also convex and has gradient vector field Lipschitz continuous, with constant $L_{j}$ , in ${\mathcal{M}}$ . Let $F=\left(f_{1},\ldots,f_{m}\right)$ be the Rosenbrock’s banana vectorial function. Hence, $F$ is convex and Definition 4 implies that $\nabla F$ is componentwise Lipschitz continuous with constant $L=\operatorname{max}\{2,2a_{11},\ldots,2a_{nm}\}$ . The gradient of $f_{j}$ is given by $\operatorname{grad}f_{j}(x)=G(x)^{-1}f^{\prime}_{j}(x)$ , where $f^{\prime}_{j}$ is the usual gradient of $f_{j}$ . Given $z\in{\bar{\mathcal{M}}}$ , the exponential map in ${\bar{\mathcal{M}}}$ , ${\overline{\exp}}_{z}:T_{z}{\bar{\mathcal{M}}}\rightarrow{\bar{\mathcal{M}}}$ , is given by ${\overline{\exp}}_{z}({\bar{v}})=z+{\bar{v}}$ . Since $\varphi$ is an isometry, Proposition 26 implies that the exponential map in ${\mathcal{M}}$ , $\exp_{x}:T_{x}{\mathcal{M}}\to{\mathcal{M}}$ , is given by $\exp_{x}(v)=\varphi(\varphi^{-1}(x)+{\mbox{d}\varphi}^{-1}(x)v).$ Thus, since $\varphi^{-1}(x)=(x_{1},x_{1}^{2}-x_{2},\ldots,x_{2n-1},x_{2n-1}^{2}-x_{2n})$ and ${\mbox{d}\varphi}^{-1}(x)v=(v_{1},2x_{1}v_{1}-v_{2},\ldots,v_{2n-1},2x_{2n-1}v_{2n-1}-v_{2n})$ , we obtain that

[TABLE]

where $x:=(x_{1},\ldots,x_{2n})$ and $v:=(v_{1},\ldots,v_{2n})$ .
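The composition $\exp_{x}(v)=\varphi(\varphi^{-1}(x)+\mbox{d}\varphi^{-1}(x)v)$ of Example 28 can be checked numerically with a few lines of Python. The sketch below assumes that $\varphi(z)=(z_{1},z_{1}^{2}-z_{2},\ldots)$ blockwise, which is inferred from the stated formula for $\varphi^{-1}$ (this map is its own inverse); the function names are hypothetical.

```python
from typing import List, Sequence


def phi(z: Sequence[float]) -> List[float]:
    """Blockwise map (z1, z2) -> (z1, z1^2 - z2); it is its own inverse."""
    out: List[float] = []
    for i in range(0, len(z), 2):
        out += [z[i], z[i] ** 2 - z[i + 1]]
    return out


def dphi_inv(x: Sequence[float], v: Sequence[float]) -> List[float]:
    """Differential of phi^{-1} at x applied to v: (v1, 2*x1*v1 - v2) per block."""
    out: List[float] = []
    for i in range(0, len(x), 2):
        out += [v[i], 2.0 * x[i] * v[i] - v[i + 1]]
    return out


def rosenbrock_exp(x: Sequence[float], v: Sequence[float]) -> List[float]:
    """exp_x(v) = phi(phi^{-1}(x) + dphi^{-1}(x) v) for the metric of Example 28."""
    z = [a + b for a, b in zip(phi(x), dphi_inv(x, v))]
    return phi(z)
```

Expanding the composition by hand gives, on each block, $(x_{1}+v_{1},\,x_{2}+v_{2}+v_{1}^{2})$, so geodesics of this metric are parabolas in the original coordinates.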

We end this section by presenting a family of vectorial functions in the positive orthant $\mathbb{R}_{+}^{n}$ that are not convex and whose gradients are not componentwise Lipschitz continuous. However, after a suitable change of the metric of $\mathbb{R}_{+}^{n}$ , the functions of that family are convex and have componentwise Lipschitz continuous gradients on this new Riemannian manifold.

Example 29.

Let $f_{j}:\mathbb{R}^{n}_{++}\to\mathbb{R}$ be defined by

[TABLE]

where $x:=(x_{1},\ldots,x_{n})\in\mathbb{R}^{n}_{++}$ , $u_{j}:=(u_{1j},\ldots,u_{nj})^{T}\in\mathbb{R}_{+}^{n}$ , $w_{j}:=(w_{1j},\ldots,w_{nj})^{T}\in\mathbb{R}_{+}^{n}$ and $a_{j},b_{j},c_{j}\in\mathbb{R}_{++}$ , for all $j=1,\ldots,m$ . Denote ${\bar{\mathcal{M}}}$ as the Euclidean space $\mathbb{R}^{n}$ with the usual metric. The function $f_{j}$ is in general non-convex and its gradient is non-Lipschitz in ${\bar{\mathcal{M}}}$ . Endowing $\mathbb{R}^{n}_{++}$ with the new Riemannian metric $\langle u,v\rangle:=u^{T}G(x)v$ , where $u,v\in T_{x}{\mathcal{M}}$ and $G(x)$ is the $n\times n$ diagonal matrix

[TABLE]

we obtain the Riemannian manifold ${\mathcal{M}}:=(\mathbb{R}_{++}^{n},G)$ . Since $\varphi:{\bar{\mathcal{M}}}\to{\mathcal{M}}$ defined by

[TABLE]

is an isometry, ${\mathcal{M}}$ is complete and has constant sectional curvature $K=0$ . The function $g_{j}:{\bar{\mathcal{M}}}\to\mathbb{R}$ defined by

[TABLE]

is convex and its gradient is Lipschitz in ${\bar{\mathcal{M}}}$ with constant $L_{j}\leq a_{j}u_{j}^{T}u_{j}/b_{j}+2c_{j}$ . Thus, Theorem 27 and Theorem 25 imply, respectively, that $f_{j}$ is also convex and has gradient Lipschitz in ${\mathcal{M}}$ with constant $L_{j}$ . Therefore, the vectorial function $F(x)=\left(f_{1}(x),\ldots,f_{m}(x)\right)$ is convex and Definition 4 implies that $\nabla F$ is componentwise Lipschitz continuous with constant $L=\operatorname{max}\{L_{1},\ldots,L_{m}\}$ . The gradient of $f_{j}$ is given by

[TABLE]

where $\operatorname{diag}(x):=\operatorname{diag}(x_{1},\ldots,x_{n})$ and $f_{j}^{\prime}$ is the usual derivative. Using the isometry (30), Proposition 26 implies that the exponential map in ${\mathcal{M}}$ , $\exp_{x}:T_{x}{\mathcal{M}}\to{\mathcal{M}}$ , is given by $\exp_{x}(v)=\varphi(\varphi^{-1}(x)+{\mbox{d}\varphi}^{-1}(x)v).$ Since $\varphi^{-1}(x)=\left(\ln{x_{1}},\dots,\ln{x_{n}}\right)$ and ${\mbox{d}\varphi}^{-1}(x)v=(x_{1}^{-1}v_{1},\ldots,x_{n}^{-1}v_{n})$ , where $v=(v_{1},\ldots,v_{n})$ , we have

[TABLE]
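The same composition can be evaluated for Example 29. The Python sketch below assumes that $\varphi(z)=(e^{z_{1}},\ldots,e^{z_{n}})$, which is inferred from the stated $\varphi^{-1}(x)=(\ln x_{1},\ldots,\ln x_{n})$; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def orthant_exp(x: Sequence[float], v: Sequence[float]) -> List[float]:
    """exp_x(v) = phi(phi^{-1}(x) + dphi^{-1}(x) v) with phi = componentwise exp.

    Componentwise this equals x_i * exp(v_i / x_i), so the result stays in
    the positive orthant for every tangent vector v.
    """
    # phi^{-1}(x) = ln x and dphi^{-1}(x) v = v / x, both componentwise
    z = [math.log(xi) + vi / xi for xi, vi in zip(x, v)]
    return [math.exp(zi) for zi in z]
```

In contrast with the Euclidean update $x+v$, this update cannot leave $\mathbb{R}^{n}_{++}$, so no projection or step truncation is needed to keep the iterates feasible.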

5 Numerical experiments

In order to illustrate the applicability of our proposal, we implemented Algorithm 1 with the Armijo-type stepsize and tested it on the functions of the examples in Section 4. Without attempting to go into details, we mention that the Armijo-type line search sketched out in Strategy 3 was coded based on (quadratic) polynomial interpolations of the coordinate functions. We refer the reader to [46] for a careful discussion about line search strategies for vector optimization problems. We set $\delta=10^{-4}$ , $t_{\operatorname{min}}=10^{-2}$ , $t_{\operatorname{max}}=10^{2}$ , $\omega_{1}=0.05$ , and $\omega_{2}=0.95$ . Given a Riemannian manifold $\mathcal{M}$ , the steepest descent direction $v_{p}$ at a non-critical point $p\in\mathcal{M}$ as in (4) can be calculated by solving for $\lambda\in\mathbb{R}$ and $u\in T_{p}\mathcal{M}$ the following differentiable problem

[TABLE]

which is a convex quadratic problem with linear inequality constraints, see [8]. In our implementation, for calculating $v_{p}$ , we solve problem (31) using Algencan [47], an augmented Lagrangian code for general nonlinear programming.
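For two objectives the direction subproblem has a closed-form solution through its dual, which is convenient for small tests. The Python sketch below is not the Algencan-based approach used in the paper; it assumes that (31) is the usual reformulation of $\min_{u}\max_{i}\langle\operatorname{grad}f_{i}(p),u\rangle+\frac{1}{2}\|u\|^{2}$, works in Euclidean coordinates, and uses hypothetical function names. For a general metric, `dot` would be replaced by the Riemannian inner product and the inputs by the Riemannian gradients.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product (stand-in for the Riemannian one)."""
    return sum(x * y for x, y in zip(a, b))


def steepest_direction_two(g1: Sequence[float], g2: Sequence[float]) -> List[float]:
    """Return v = -(a*g1 + (1-a)*g2), minus the min-norm point of conv{g1, g2}."""
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        a = 0.5  # g1 == g2: every convex combination coincides
    else:
        # Minimizer of ||a*g1 + (1-a)*g2||^2 over the real line, clipped to [0, 1]
        a = -dot(diff, g2) / denom
        a = min(1.0, max(0.0, a))
    return [-(a * x + (1.0 - a) * y) for x, y in zip(g1, g2)]
```

The returned vector satisfies $\langle g_{i},v\rangle\leq-\|v\|^{2}$ for $i=1,2$, and it vanishes exactly when the origin lies in the convex hull of the two gradients, which is the criticality condition.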

We stopped the execution of the algorithm at $p_{k}$ declaring convergence if

[TABLE]

where ${\cal I}=\{1,\ldots,m\}$ , and eps denotes the machine precision. In our experiments we used $\texttt{eps}=2^{-52}\approx 2.22\times 10^{-16}$ . We point out that this convergence criterion was proposed in the numerical tests of [48] and also used in [16, 1]. The maximum number of allowed iterations was set to 10000. Codes are written in double precision Fortran 90 and are freely available at https://orizon.ime.ufg.br/.

5.1 Rosenbrock’s Problem

We start the numerical experiments by verifying the practical behavior of Algorithm 1 in a small instance of the Rosenbrock’s problem given by the functions in Example 28. We considered $n=1$ , $m=2$ in (28), and set $F(x)=(f_{1}(x),f_{2}(x))$ where

[TABLE]

Functions $f_{1}$ and $f_{2}$ have global minimizers at $x^{*}=(1,1)$ and $\hat{x}=(2,4)$ , respectively. Note that $f_{1}(x^{*})=f_{2}(\hat{x})=0$ and $f_{1}(\hat{x})=f_{2}(x^{*})=1$ . Figure 1(a) shows a representation of the image set of $F(x)$ around the Pareto front, obtained by discretizing the square $[-5,5]\times[-5,5]$ by a fine grid and plotting all the image points. We ran the algorithm 1000 times using starting points drawn from a uniform random distribution on $(-5,5)\times(-5,5)$ . In all instances, Algorithm 1 stopped at a point satisfying the convergence criterion. Figure 1(b) shows the image set of all final iterates. Thus, given a reasonable number of starting points, Algorithm 1 was able to estimate the Pareto front of the considered Rosenbrock’s problem. The value space generated by the Riemannian gradient method using another 200 random starting points with image belonging to the box $(0,4)\times(0,4)$ can be seen in Figure 2(a). A full point represents a final iterate whereas the beginning of a straight segment represents the corresponding starting point.

For comparative purposes, we implemented and tested the Euclidean gradient method for minimizing (32)–(33). In summary, the Euclidean method corresponds to Algorithm 1 with the usual inner product and the exponential map given by $\mbox{exp}_{x}(v)=x+v$ . We point out that an Armijo-type line search equivalent to the one employed in the Riemannian case was coded in the Euclidean algorithm. We also ran the Euclidean algorithm using the same 1000 starting points belonging to $(-5,5)\times(-5,5)$ considered for the Riemannian algorithm. For each method, Table 1 reports the percentage of runs that reached a critical point ( $\%$ ) and, for the successful runs, the median number of iterations (it), the median number of function evaluations (evalf), and the median number of gradient evaluations (evalg). Thus, the data reported in Table 1 represent a typical run of the Riemannian and the Euclidean algorithms. It is worth noting that we counted each evaluation of a coordinate function (resp. gradient) in the calculation of evalf (resp. evalg). Note that the number of steepest descent direction calculations is equal to the number of iterations.

As can be seen in Table 1, in the considered Rosenbrock’s problem, the Riemannian algorithm is much superior to the Euclidean one. The introduction of a suitable metric that makes $F$ convex with componentwise Lipschitz continuous Jacobian enabled a huge reduction in computational cost to solve the problem. Figure 2(b) shows a typical behavior of the methods on the Rosenbrock’s problem (32)–(33). For each method, we plotted the image set of the generated sequence for the particular case where the starting point is $(0.5,0.2)$ . The convergence criterion was satisfied with 25 and 1585 iterations for the Riemannian and Euclidean gradient methods, respectively. Due to the small step sizes taken by the Euclidean method (typically of the order of $10^{-3}$ ), the corresponding path illustrated in Figure 2(b) appears to be a continuous segment. In turn, the Riemannian method quickly approaches the Pareto front.

5.2 Example in the Positive Orthant

Now we consider the application of Algorithm 1 for minimizing the vector function $F(x)=(f_{1}(x),\ldots,f_{m}(x))$ where $f_{j}(x)$ is given by (29). Note that for the Riemannian manifold ${\mathcal{M}}=(\mathbb{R}_{++}^{n},G)$ and $x\in\mathbb{R}_{++}^{n}$ , the tangent space $T_{x}{\mathcal{M}}$ corresponds to $\mathbb{R}^{n}$ . Thus, problem (31) to calculate $v_{x}$ is directly posed as a quadratic programming problem.

Since in the previous section we solved only a small Rosenbrock’s problem, we now consider larger instances of the problem related to Example 29. First, we kept the number of objectives equal to two and varied the dimension of the space, assigning the following values: $n=10$ , $100$ , $400$ , and $1000$ . In the second set of tests, we set $n=100$ and varied the number of objectives, taking $m=10$ , $20$ , $100$ , and $200$ . All the parameters of each function $f_{j}$ in (29) were randomly generated in $(0,1)$ . Each problem instance was solved 20 times using starting points drawn from a uniform random distribution inside the box $(0,10)^{n}$ . The results in Table 2 are given in the same form as in Table 1.

The highlight of Table 2 is that Algorithm 1 was robust with respect to the dimension and to the number of objectives, which is consistent with the theoretical results. The results of the present section suggest that Algorithm 1 is potentially able to solve large problems. Surprisingly, for the first set of problems, fewer function/gradient evaluations were required for the case where $n=1000$ compared to smaller instances of the problem.

5.3 Example in the Cone of Symmetric Positive Definite Matrices

Let $\mathcal{M}$ be the Riemannian manifold $({\mathbb{P}}^{n}_{++},\langle\cdot,\cdot\rangle)$ , where the inner product is defined as in Example 22. For $X\in\mathbb{P}_{++}^{n}$ , the tangent space $T_{X}{\mathcal{M}}$ corresponds to the set of the symmetric matrices ${\mathbb{P}}^{n}$ . In our implementation, in order to compute the steepest descent direction, in addition to $\lambda$ , the unknowns of problem (31) are the $(n^{2}+n)/2$ entries of the lower triangular part of the symmetric matrix $u$ .

Given $X\in{\mathbb{P}}^{n}_{++}$ and $V\in{\mathbb{P}}^{n}$, a direct calculation shows that the exponential map in (26) can be rewritten as $\exp_{X}(V)=Xe^{X^{-1}V}$. To compute the inverse of the matrix $X$, we used the LAPACK routine dpotri, which uses the Cholesky factorization of $X$. To compute matrix exponentials, we used the dgpadm routine of the EXPOKIT package [49]. It should be noted that dpotri and dgpadm are dense routines.
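The formula $\exp_{X}(V)=Xe^{X^{-1}V}$ can be evaluated in several equivalent ways. The sketch below does not reproduce the dpotri/dgpadm route described above. It swaps in a different technique that relies only on NumPy: with the Cholesky factor $X=LL^{T}$ and $S=L^{-1}VL^{-T}$, one has $Xe^{X^{-1}V}=Le^{S}L^{T}$, and since $S$ is symmetric its exponential follows from an eigendecomposition. The function name `spd_exp` is hypothetical.

```python
import numpy as np


def spd_exp(X, V):
    """Evaluate X @ expm(inv(X) @ V) for symmetric positive definite X, symmetric V."""
    L = np.linalg.cholesky(X)            # X = L @ L.T
    Y = np.linalg.solve(L, V)            # Y = L^{-1} V
    S = np.linalg.solve(L, Y.T)          # S = L^{-1} V L^{-T}, symmetric in exact arithmetic
    S = 0.5 * (S + S.T)                  # remove rounding asymmetry
    w, Q = np.linalg.eigh(S)             # S = Q diag(w) Q^T
    E = (Q * np.exp(w)) @ Q.T            # matrix exponential of S
    return L @ E @ L.T


X = np.array([[2.0, 0.5],
              [0.5, 1.0]])
V = np.array([[0.1, -0.3],
              [-0.3, 0.2]])
P = spd_exp(X, V)
print(np.allclose(P, P.T), np.all(np.linalg.eigvalsh(P) > 0))
```

One reason to prefer this arrangement in a sketch is that the result is symmetric positive definite by construction, whereas forming $X^{-1}V$ explicitly yields a nonsymmetric intermediate matrix.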

We considered bicriteria and three-criteria problem instances related to Example 22. The parameters of function (25) were randomly generated in $(0,1)$. For each instance, we ran the Riemannian gradient method 20 times using random starting points with eigenvalues in the interval $(0,100)$. The results in Table 3 show that Algorithm 1 solved all the instances with moderate computational effort. It is worth mentioning that, in a typical iteration, the first trial step size of Strategy 3 defined by

[TABLE]

satisfies the sufficient descent condition (11). Indeed, as can be seen in Table 3, the values reported in the evalf columns are only slightly greater than the corresponding number of iterations times the number of objectives $m$. We observe that the choice (34) corresponds to the safeguarded Shanno and Phua [50] recommendation and was first proposed in the multiobjective optimization setting in [1].

Finally, we report that Algorithm 1 converges in a single iteration when applied to instances of Example 23. The considered metric makes it possible to exploit the structure of the problem, turning it into a trivial problem from the Riemannian perspective.

6 Conclusions

In this paper, the behavior of the steepest descent method for multiobjective optimization on Riemannian manifolds with lower bounded sectional curvature was analyzed. It would be interesting to study stochastic versions of this method. Another interesting question to be investigated is the extension and analysis of the subgradient method in this new setting.

Bibliography

1. Lucambio Pérez, L.R., Prudente, L.F.: Nonlinear conjugate gradient methods for vector optimization. SIAM J. Optim. 28 (3), 2690–2720 (2018).
2. Gonçalves, M.L.N., Prudente, L.F.: On the extension of the Hager-Zhang conjugate gradient method for vector optimization. Technical report pp. 1–19 (2018).
3. Bento, G.C., Cruz Neto, J.X., López, G., Soubeyran, A., Souza, J.C.O.: The proximal point method for locally Lipschitz functions in multiobjective optimization with application to the compromise problem. SIAM J. Optim. 28 (2), 1104–1120 (2018).
4. Montonen, O., Karmitsa, N., Mäkelä, M.M.: Multiple subgradient descent bundle method for convex nonsmooth multiobjective optimization. Optimization 67 (1), 139–158 (2018).
5. Carrizo, G.A., Lotito, P.A., Maciel, M.C.: Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem. Math. Program. 159 (1-2, Ser. A), 339–369 (2016).
6. Fliege, J., Vaz, A.I.F.: A method for constrained multiobjective optimization based on SQP techniques. SIAM J. Optim. 26 (4), 2091–2119 (2016).
7. Morovati, V., Pourkarimi, L., Basirzadeh, H.: Barzilai and Borwein’s method for multiobjective optimization problems. Numer. Algorithms 72 (3), 539–604 (2016).
8. Fliege, J., Svaiter, B.F.: Steepest descent methods for multicriteria optimization. Math. Methods Oper. Res. 51 (3), 479–494 (2000).
