The condition number of a function relative to a set
David H. Gutman, Javier F. Peña

TL;DR
This paper introduces a new concept of a relative condition number for convex functions with respect to a set, extending classical notions to constrained optimization and providing bounds and characterizations for specific function-set pairs.
Contribution
The paper defines a relative condition number for convex functions relative to a set, generalizing classical condition numbers and analyzing its properties and bounds in specific cases.
Findings
The relative condition number extends classical properties and characterizations.
Bounds are provided for functions of the form f = g ∘ A relative to convex sets.
The relative condition number influences the convergence analysis of first-order methods.
The condition number of a function relative to a set
David H. Gutman Department of Industrial, Manufacturing, and Systems Engineering, Texas Tech University, USA, [email protected]
Javier F. Peña Tepper School of Business, Carnegie Mellon University, USA, [email protected]
Abstract
The condition number of a differentiable convex function, namely the ratio of its smoothness to strong convexity constants, is closely tied to fundamental properties of the function. In particular, the condition number of a quadratic convex function is the square of the aspect ratio of a canonical ellipsoid associated to the function. Furthermore, the condition number of a function bounds the linear rate of convergence of the gradient descent algorithm for unconstrained convex minimization.
We propose a condition number of a differentiable convex function relative to a reference convex set and distance function pair. This relative condition number is defined as the ratio of relative smoothness to relative strong convexity constants. We show that the relative condition number extends the main properties of the traditional condition number both in terms of its geometric insight and in terms of its role in characterizing the linear convergence of first-order methods for constrained convex minimization.
When the reference set is a convex cone or a polyhedron and the function $f$ is of the form $f = g \circ A$, we provide characterizations of and bounds on the condition number of $f$ relative to the reference set in terms of the usual condition number of $g$ and a suitable condition number of the pair formed by $A$ and the reference set.
1 Introduction
Let $f$ be a convex differentiable function. The condition number of $f$ is the ratio $L_f/\mu_f$ where $L_f$ and $\mu_f$ are respectively the smoothness and strong convexity constants of the function $f$. See Definition 1 and equation (9) below. The condition number is closely tied to a number of fundamental properties of the function $f$. In the special case when $f$ is a quadratic convex function the condition number has the following geometric interpretation. Suppose $f(x) = \tfrac{1}{2}\|Ax\|_2^2$ where $A \in \mathbb{R}^{n \times n}$ is non-singular. Then the condition number of $f$ is
$$\frac{L_f}{\mu_f} = \|A\|^2 \cdot \|A^{-1}\|^2. \qquad (1)$$
The latter quantity is the square of the aspect ratio of the ellipsoid $A(\mathbb{B}) := \{Ax : \|x\|_2 \le 1\}$ since $\|A\|$ and $1/\|A^{-1}\|$ are respectively the radius of the smallest ball that contains $A(\mathbb{B})$ and the radius of the largest ball contained in $A(\mathbb{B})$.
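As a quick numerical illustration of this geometric interpretation, the following sketch assumes the quadratic $f(x) = \tfrac{1}{2}\|Ax\|_2^2$ with a non-singular matrix $A$. The matrix, the function names, and the use of the Euclidean norm are choices made here for illustration only. It compares the ratio of the extreme eigenvalues of the Hessian $A^{\mathsf T}A$ with the squared ratio of the extreme singular values of $A$, which are the longest and shortest semi-axes of the image of the unit ball under $A$.

```python
import numpy as np


def quadratic_condition_number(A):
    """Ratio L/mu for f(x) = 0.5 * ||A x||_2^2 (assumed form, A non-singular)."""
    hessian = A.T @ A
    eigenvalues = np.linalg.eigvalsh(hessian)  # sorted in ascending order
    return eigenvalues[-1] / eigenvalues[0]


def squared_aspect_ratio(A):
    """Square of (longest semi-axis / shortest semi-axis) of the ellipsoid A(unit ball)."""
    sigma = np.linalg.svd(A, compute_uv=False)  # sorted in descending order
    return (sigma[0] / sigma[-1]) ** 2


# Illustrative non-singular matrix (hypothetical data).
A = np.array([[3.0, 1.0], [0.0, 0.5]])
kappa = quadratic_condition_number(A)
ratio = squared_aspect_ratio(A)
```

Both quantities agree up to rounding because the eigenvalues of $A^{\mathsf T}A$ are the squared singular values of $A$.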
The condition number also bounds the linear convergence rate of the gradient descent algorithm for the unconstrained minimization problem
[TABLE]
More precisely, for a suitable choice of step sizes the iterates generated by the gradient descent algorithm satisfy
[TABLE]
and
[TABLE]
where and . The articles [4, 8, 17, 21, 22, 23, 24], among others, discuss the above type of linear convergence and a number of interesting related developments. In particular, Necoara, Nesterov and Glineur [22] establish linear convergence properties for a wide class of first-order methods under assumptions that are relaxations of strong convexity.
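The linear rate can be observed numerically. The sketch below is a minimal illustration under assumptions made here: a diagonal quadratic objective, the constant step size $1/L$, and the standard textbook bound $f(x_k) - f^\star \le (1 - \mu/L)^k (f(x_0) - f^\star)$, which may differ in form from the inequalities displayed above.

```python
import numpy as np


def gradient_descent(grad, x0, step, num_iters):
    """Plain gradient descent with a constant step size; returns all iterates."""
    x = np.array(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(num_iters):
        x = x - step * grad(x)
        iterates.append(x.copy())
    return iterates


# Hypothetical test problem: f(x) = 0.5 * x^T Q x, minimized at x = 0 with f* = 0.
Q = np.diag([1.0, 4.0, 10.0])
mu, L = 1.0, 10.0  # smallest and largest eigenvalues of Q


def f(x):
    return 0.5 * x @ Q @ x


def grad_f(x):
    return Q @ x


iterates = gradient_descent(grad_f, x0=[1.0, 1.0, 1.0], step=1.0 / L, num_iters=50)
gaps = [f(x) for x in iterates]

# Check the assumed bound f(x_k) - f* <= (1 - mu/L)^k * (f(x_0) - f*).
bound_holds = all(
    gaps[k] <= (1.0 - mu / L) ** k * gaps[0] + 1e-12 for k in range(len(gaps))
)
```

The contraction factor $1 - \mu/L$ depends only on the condition number $L/\mu$, which is the role the relative condition number is meant to play in the constrained setting.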
Let $f$ be a convex differentiable function, $C$ be a convex set, and $D$ be a distance-like function, that is, $D(x,x) = 0$ and $D(y,x) \ge 0$ for all $x, y \in C$. We propose a relative smoothness constant $L_{f,C,D}$ and a relative strong convexity constant $\mu_{f,C,D}$ of the function $f$ relative to the pair $(C, D)$. See Definition 2 and equation (8) below for details. We show that the relative condition number $L_{f,C,D}/\mu_{f,C,D}$ extends the above properties of the traditional condition number both in terms of its geometric insight and in terms of its role in characterizing the linear convergence of first-order methods for the constrained convex minimization problem
[TABLE]
As Example 1 illustrates, the relative condition number depends on the combination of the constraint set and the function . In particular, Example 1 shows that the relative condition number can be vastly different (both smaller or larger) than the usual condition number depending on how the shape of fits . Example 1 also shows that can be strictly positive in cases when . Our main results highlight deeper connections between the relative constants and geometric features of the set . In particular, when for some matrix and , and is conic or polyhedral, we provide characterizations of and bounds on and in terms of and and some condition properties of the pair .
We show that the relative condition number and some related quantities readily yield linear convergence rates for the mirror descent, Frank-Wolfe, and Frank-Wolfe with away steps algorithms for the constrained minimization problem (2). We should note that these linear convergence properties have been previously established in [3, 2, 18, 20, 13, 22, 24, 28, 32] under various kinds of assumptions. Our approach shows that all of these linear convergence results hinge on a similar type of relative conditioning. Our approach also reveals that several linear convergence results can be sharpened. We show that the linear convergence of the mirror descent algorithm (Proposition 6 and Proposition 7) holds for a sharper rate and under more general assumptions than those in [20, 32]. More precisely, Proposition 6 and Proposition 7 show that linear convergence holds under new conditions of relative quasi-strong convexity and relative functional growth that are typically weaker than the type of relative strong convexity assumed in [20, 32]. In contrast to the previous results in [3, 13], our linear convergence result for the Frank-Wolfe algorithm (Proposition 8) is stated in terms of an affine invariant relative condition number defined via a natural radial distance function. Our approach based on the relative condition number yields a proof of linear convergence for the Frank-Wolfe with away steps algorithm that is significantly shorter, simpler, and at least as sharp as or sharper than the ones previously presented in [2, 18, 28]. Unlike previous approaches, our proof of linear convergence of the Frank-Wolfe with away steps algorithm (Proposition 9) highlights some similarities with the proof of linear convergence of the regular Frank-Wolfe algorithm (Proposition 8). Like the results presented in [18, Appendix C and D], the linear convergence of the Frank-Wolfe with away steps algorithm (Proposition 9) is stated in terms of an affine invariant relative condition number.
The relative constants and are defined globally. In particular, they do not depend on any specific point in . We consider several variants of relative strong convexity following the constructions of Necoara, Nesterov and Glineur [22]. In particular, we define a relative quasi-strong convexity constant and a relative functional growth constant . See Definition 3 and equation (12). Unlike , the constants and depend on the set of minimizers of on . We show that relative quasi-strong convexity is a relaxation of relative strong convexity. We also show that under suitable assumptions relative functional growth is a relaxation of relative quasi-strong convexity. Not surprisingly, there are classes of non-strongly convex functions for which the constant is positive while and may not be. (See Theorem 4.)
Our work draws on and connects several seemingly unrelated threads of research on first-order methods [1, 2, 18, 20, 22, 28, 32] and on condition measures for convex optimization [10, 9, 12, 11, 19, 25, 27, 30, 31]. Our construction of the relative smoothness and relative strong convexity constants is inspired by and closely related to the work of Lu, Freund, and Nesterov [20] and of Bauschke, Bolte, and Teboulle [1, 32]. Lu et al. [20] extend the concepts of smoothness and strong convexity constants by considering them relative to a reference function; see [20, Definition 1.1 and 1.2]. Our construction is identical to theirs in the special case when the distance function is the Bregman distance function associated to a reference function and the function $f$ is strictly convex. Bauschke, Bolte, and Teboulle [1] define a Lipschitz-like condition that is equivalent to smoothness relative to a reference function. As we detail in Section 5, our relative constants are also identical to the curvature constant, away curvature constant and geometric strong convexity constant proposed by Jaggi [16] and by Lacoste-Julien and Jaggi in [18, Appendix C] for properly chosen distance-like functions. Our constructions of relative functional growth and relative quasi-strong convexity are natural extensions of analogous concepts proposed by Necoara, Nesterov, and Glineur [22] to unveil relaxations of strong convexity that ensure the linear convergence of first-order methods. Our relative functional growth concept is in the same spirit as the quadratic functional growth approach used by Beck and Shtern [2] to establish the linear convergence of a conditional gradient algorithm with away steps for non-strongly convex functions.
In contrast to the approaches in [2, 18, 20, 22, 28], our construction of the relative condition constants applies to any pair of reference set and distance function. Our main results (Section 3 and Section 4) reveal some interesting insights when the distance function is bounded by a squared norm. We establish a close connection between our relative conditioning approach and the conditioning of linear conic systems pioneered by Renegar [30, 31] and further developed by a number of authors [6, 10, 9, 12, 11, 19, 25, 27, 26]. We especially draw on ideas developed in the recent paper [26]. We note that, consistent with our construction of the relative constants, all of our results concerning them scale appropriately, that is, they scale by $\lambda$ whenever the objective function $f$ is replaced by $\lambda f$ for some constant $\lambda > 0$. In particular, the relative condition number and all of our bounds on it are invariant under positive scaling of $f$.
The main sections of the paper are organized as follows. Section 2 presents our central construction, namely relative smoothness and relative strong convexity. This section also introduces relative quasi strong convexity and relative functional growth, both of which are variants of relative strong convexity. Section 3 and Section 4 present the main technical results of the paper. Section 3 develops several properties of the constants and . More precisely, Proposition 2 gives an upper bound on when is of the form for some . Proposition 2(a) shows that the bound is tight. The more involved Theorem 1 and Theorem 2 give lower bounds on when is of the form and is a convex cone or a polyhedron. These bounds readily imply that for the relative condition number can be bounded in terms of the product of the classical condition number and a condition number of the pair . See equation (21) and equation (24). Corollary 1 and Corollary 2 show that the bounds in Theorem 1 and Theorem 2 are tight. Section 4 develops properties analogous to those in Section 3 but for the constants and . Section 5 details linear convergence results for the mirror descent algorithm, Frank-Wolfe algorithm, and Frank-Wolfe with away steps algorithm for problem (2). In all cases the linear convergence properties are stated in terms of the relative constants and for suitable choices of distance-like function . The main results in Section 5 can be summarized as follows. Consider the mirror descent algorithm for problem (2) with a Bregman distance associated to a reference function . Proposition 6 shows the following linear convergence result: if and then the mirror descent iterates satisfy
[TABLE]
for . Proposition 7 gives a linear convergence result of similar flavor when . The rates of convergence in both Proposition 6 and Proposition 7 are at least as sharp, and possibly much sharper, than those in [20, 32] and apply to a broader class of functions. In particular, as Example 7 in Section 4 shows, there are instances where occurs. In such instances Proposition 7 yields the linear convergence of mirror descent whereas the linear convergence results in [20, 32] do not apply.
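To make the mirror descent setting concrete, here is a minimal sketch under assumptions chosen for illustration: the reference set is the standard simplex, the reference function is the negative entropy (so the Bregman distance is the Kullback-Leibler divergence and the update is multiplicative), and the objective is the hypothetical quadratic $f(x) = \tfrac{1}{2}\|x - c\|_2^2$ with $c$ in the interior of the simplex. The step size 1 relies on $f$ being 1-smooth relative to the negative entropy on the simplex, which follows from Pinsker's inequality. This is not the authors' implementation.

```python
import numpy as np


def entropic_mirror_descent(grad, x0, step, num_iters):
    """Mirror descent on the simplex with the negative-entropy reference function.

    Each update solves min_x <grad(x_k), x> + (1/step) * KL(x, x_k) over the simplex,
    which has the closed-form multiplicative solution used below.
    """
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        weights = x * np.exp(-step * grad(x))
        x = weights / weights.sum()
    return x


# Hypothetical problem data: the minimizer over the simplex is c itself.
c = np.array([0.2, 0.3, 0.5])


def f(x):
    return 0.5 * np.sum((x - c) ** 2)


def grad_f(x):
    return x - c


x_start = np.ones(3) / 3.0
x_md = entropic_mirror_descent(grad_f, x_start, step=1.0, num_iters=200)
```

On this instance the objective gap shrinks geometrically, which is the qualitative behavior that the relative constants are designed to quantify.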
Proposition 8 gives a strikingly similar linear convergence result for the Frank-Wolfe algorithm: suppose is a compact convex set endowed with a linear oracle and and for the radial distance function defined via (46). Proposition 8 shows that the Frank-Wolfe iterates satisfy
[TABLE]
This rate of convergence subsumes and is sharper than the previously known linear convergence results for the Frank-Wolfe algorithm in [13, 3].
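For readers less familiar with the Frank-Wolfe algorithm, the following sketch shows the basic iteration over the standard simplex, where the linear oracle simply returns the vertex with the smallest gradient entry. The objective $f(x) = \tfrac{1}{2}\|x - c\|_2^2$, the data $c$, and the exact line search formula (valid for this particular quadratic only) are assumptions made here for illustration.

```python
import numpy as np


def frank_wolfe_simplex(grad, x0, num_iters):
    """Frank-Wolfe over the standard simplex for f(x) = 0.5 * ||x - c||^2.

    The exact line search below uses the fact that the Hessian of this f is the
    identity; a different objective would need a different step-size rule.
    """
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        g = grad(x)
        vertex = np.zeros_like(x)
        vertex[np.argmin(g)] = 1.0  # linear oracle: minimize <g, s> over the simplex
        direction = vertex - x
        curvature = direction @ direction
        if curvature == 0.0:
            break  # already at the vertex returned by the oracle
        step = np.clip(-(g @ direction) / curvature, 0.0, 1.0)
        x = x + step * direction
    return x


# Hypothetical problem data with the minimizer c in the interior of the simplex.
c = np.array([0.2, 0.3, 0.5])


def f(x):
    return 0.5 * np.sum((x - c) ** 2)


def grad_f(x):
    return x - c


x_start = np.array([1.0, 0.0, 0.0])
x_fw = frank_wolfe_simplex(grad_f, x_start, num_iters=200)
```

Every iterate is a convex combination of vertices, so feasibility is maintained without any projection step.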
Proposition 9 gives a result of similar flavor for the Frank-Wolfe with away steps algorithm: suppose is a polytope endowed with a vertex linear oracle, and and for the distance functions and defined via (49) and (51). Proposition 9 shows that if the Frank-Wolfe with away steps algorithm starts from a vertex in then the subsequent iterates satisfy
[TABLE]
This rate of convergence is at least as sharp, and possibly much sharper, than the rates previously shown in [2, 18, 28].
Throughout the paper we define a number of new objects that are necessary for our main developments. To help the reader recall the definition and notation associated to these new objects, Table 1 displays the section and equation where each object is defined.
2 Conditioning relative to a reference set and distance function pair
This section presents the central ideas of this paper. We introduce the concepts of relative smoothness and relative strong convexity of a function relative to a reference set and distance function pair. We also introduce some variants of relative strong convexity that are natural extensions of the approach developed by Necoara, Nesterov and Glineur [22].
Throughout the entire paper we will make the following blanket assumption about the triple $(f, C, D)$.
Assumption 1.
The function $f$ is convex and differentiable. The set $C$ is convex. The function $D$ is a reference distance-like function, that is, $D(x,x) = 0$ for all $x \in C$ and $D(y,x) \ge 0$ for all $x, y \in C$.
Throughout our developments we will consider the following classes of reference distance-like functions:
- The Bregman distance associated to a reference convex differentiable function $h$, that is,
$$D_h(y,x) := h(y) - h(x) - \langle \nabla h(x), y - x \rangle.$$
- The square of a (not necessarily Euclidean) norm in $\mathbb{R}^n$, that is,
[TABLE]
- The square of the radial distance function defined as follows
[TABLE]
Notice that the function coincides with the gauge function of the set on . Figure 1 illustrates the level sets defined by for .
- The square of the diametral distance function defined as follows
[TABLE]
Figure 2 illustrates the level sets defined by the diametral distance for .
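The first two classes of distance functions above can be illustrated with a short computation. The sketch below uses the standard definition of the Bregman distance, $D_h(y,x) = h(y) - h(x) - \langle \nabla h(x), y - x\rangle$, with two reference functions chosen here as examples: half the squared Euclidean norm, which gives half the squared Euclidean distance, and the negative entropy, which gives the Kullback-Leibler divergence on the simplex. The helper names and the sample points are hypothetical.

```python
import numpy as np


def bregman_distance(h, grad_h, y, x):
    """Bregman distance D_h(y, x) = h(y) - h(x) - <grad h(x), y - x>."""
    return h(y) - h(x) - grad_h(x) @ (y - x)


def half_sq_norm(x):
    return 0.5 * x @ x


def half_sq_norm_grad(x):
    return x


def neg_entropy(x):
    return np.sum(x * np.log(x))


def neg_entropy_grad(x):
    return np.log(x) + 1.0


# Two points in the interior of the standard simplex (illustrative data).
x = np.array([0.2, 0.3, 0.5])
y = np.array([0.6, 0.3, 0.1])

d_euclid = bregman_distance(half_sq_norm, half_sq_norm_grad, y, x)
d_kl = bregman_distance(neg_entropy, neg_entropy_grad, y, x)
```

Note that `d_kl` is not symmetric in its arguments, which is one reason the definitions in this paper keep track of the order of $x$ and $y$.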
Our main construction is based on bounding the behavior of the Bregman distance associated to in terms of the reference distance function . The following set-valued mapping provides a key building block for our construction. For let denote the set
[TABLE]
It is easy to see that can also be written as
[TABLE]
Observe that if is strictly convex then for all . The set captures the largest convex subset of that includes and where fails to be strictly convex. In particular, when is of the form for and strictly convex, it is easy to see that . We will further discuss functions of this form in Section 3 and Section 4. To illustrate the set-valued mapping in a different example, consider the function defined as
[TABLE]
where In this case
[TABLE]
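The remark about functions of the form $g \circ A$ can be checked numerically. In the sketch below, which rests on assumptions made here, $f(x) = \tfrac{1}{2}\|Ax - b\|_2^2$ with a rank-deficient matrix $A$, so $g(u) = \tfrac{1}{2}\|u - b\|_2^2$ is strictly convex. The Bregman distance of $f$ from a point $y$ vanishes along directions in the kernel of $A$ and is positive along other directions, consistent with the set where $f$ fails to be strictly convex around $y$ being $\{x : Ax = Ay\}$. The data are hypothetical.

```python
import numpy as np

# Rank-deficient A: the direction (1, -1, 0) lies in its kernel.
A = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
b = np.array([1.0, -2.0])


def f(x):
    return 0.5 * np.sum((A @ x - b) ** 2)


def grad_f(x):
    return A.T @ (A @ x - b)


def bregman_f(x, y):
    """Bregman distance D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - grad_f(y) @ (x - y)


y = np.array([0.5, -1.0, 2.0])
kernel_dir = np.array([1.0, -1.0, 0.0])  # satisfies A @ kernel_dir = 0
other_dir = np.array([1.0, 1.0, 0.0])    # not in the kernel of A

d_kernel = bregman_f(y + 3.0 * kernel_dir, y)
d_other = bregman_f(y + 3.0 * other_dir, y)
```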
2.1 Relative smoothness and relative strong convexity
To motivate our main construction we first recall the classical notion of smoothness and strong convexity constants. We recall these classical concepts in a format that we subsequently use for our main construction. Recall that for a convex differentiable function $f$ and points $x, y$ in its domain the Bregman distance $D_f(y,x)$ is
$$D_f(y,x) := f(y) - f(x) - \langle \nabla f(x), y - x \rangle.$$
Definition 1.
Suppose is convex and differentiable and for some norm in .
- (a)
The function is smooth for the norm if there exists a constant such that
[TABLE]
- (b)
The function is strongly convex for the norm if there exists a constant such that
[TABLE]
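One way to build intuition for Definition 1 is to sample the ratio between the Bregman distance of $f$ and half the squared distance between the two points. The sketch below assumes the common normalization in which smoothness and strong convexity bound $D_f(y,x)$ above and below by $\tfrac{L}{2}\|y-x\|^2$ and $\tfrac{\mu}{2}\|y-x\|^2$, uses the Euclidean norm, and takes a hypothetical quadratic $f(x) = \tfrac{1}{2}x^{\mathsf T}Qx$. For such a function every sampled ratio must lie between the extreme eigenvalues of $Q$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positive definite matrix defining f(x) = 0.5 * x^T Q x.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])


def f(x):
    return 0.5 * x @ Q @ x


def grad_f(x):
    return Q @ x


def bregman_f(y, x):
    """Bregman distance D_f(y, x) = f(y) - f(x) - <grad f(x), y - x>."""
    return f(y) - f(x) - grad_f(x) @ (y - x)


# Sample ratios D_f(y, x) / (0.5 * ||y - x||^2) at random pairs of points.
ratios = []
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    ratios.append(bregman_f(y, x) / (0.5 * np.sum((y - x) ** 2)))

eigenvalues = np.linalg.eigvalsh(Q)  # ascending: [mu, L] for this quadratic
mu_estimate, L_estimate = min(ratios), max(ratios)
```

Sampling only gives inner estimates of the two constants; the eigenvalues give them exactly for a quadratic.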
Next, we present our main construction. In Definition 2 and throughout the paper we will use the following notational convention. For a nonempty and let and denote and respectively.
Definition 2.
Let satisfy Assumption 1.
- (a)
We say that is smooth relative to if there exists a constant such that
[TABLE]
- (b)
We say that is strongly convex relative to if there exists a constant such that
[TABLE]
When for some convex differentiable function , the above relative smoothness concept is identical to the smoothness of relative to on as defined in [20]. The latter in turn is equivalent to the Lipschitz-like condition defined in [1]. Furthermore, when and is strictly convex, the above relative strong convexity concept is identical to the strong convexity of relative to on as defined in [20]. We note that as in [20], the above definitions (6) and (7) are not symmetric in and since they depend on and which are not necessarily symmetric. Observe that the term instead of in (7) makes this definition of relative strong convexity less stringent than the classical one (5) or the one in [20]. This is a key feature of our construction.
We will use the following notation throughout the rest of the paper. Suppose satisfies Assumption 1. Let and be the following relative smoothness and strong convexity constants
[TABLE]
In addition, suppose is convex and differentiable and for some norm in . Let and be the following classical smoothness and strong convexity constants
[TABLE]
The following example illustrates the values of the relative smoothness and strong convexity constants and of a convex quadratic function relative to for some canonical choices of and . Example 1 highlights that the relative constants and depend on the combination of the constraint set and the function . In particular, Example 1 shows that the relative condition number can be vastly different (both smaller or larger) than the usual condition number depending on how the shape of fits . Example 1 also lays the ground for the main properties that we develop in Section 3.
Example 1.
Let with and and be endowed with the Euclidean norm. Let and Then has the following smoothness and strong convexity constants and relative to for some particular choices of .
- (a)
For we have and , where denotes the smallest positive singular value. Observe that in this case but only when is full column rank.
- (b)
Suppose is a linear subspace such that the mapping defined via is nonzero. Then and . Observe that in this case and can be quite a bit smaller. Likewise, and can be quite a bit larger.
For instance, suppose for some positive with . If then
[TABLE]
In this case we have .
- (c)
Suppose . In this case . On the other hand, if then is the following kind of squared signed smallest singular value of
[TABLE]
where and denote the unit balls in and respectively. In other words, is the square of the radius of the largest ball centered at zero and contained in . Observe that if and then and can be quite a bit smaller. For instance, if for then
[TABLE]
In this case we have .
The statements (a), (b), and (c) in Example 1 can be verified directly but they also follow from the more general Proposition 2, Corollary 1, and Corollary 2 in Section 3 below.
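The subspace case in Example 1(b) lends itself to a small computation. The sketch below is based on one reading of that example and on assumptions stated here: the objective is a least-squares function with matrix $A$, the distance is half the squared Euclidean distance, and the reference set is a linear subspace given by a spanning matrix. Under these assumptions the relative smoothness constant is the squared largest singular value of $A$ restricted to the subspace, and the relative strong convexity constant is the squared smallest positive singular value of that restriction. The function name and the data are hypothetical.

```python
import numpy as np


def restricted_constants(A, spanning_vectors):
    """Squared extreme singular values of A restricted to span(spanning_vectors).

    Returns (largest squared singular value, smallest positive squared singular value).
    """
    Q, _ = np.linalg.qr(spanning_vectors)  # orthonormal basis of the subspace
    sigma = np.linalg.svd(A @ Q, compute_uv=False)  # descending order
    positive = sigma[sigma > 1e-12]
    return sigma[0] ** 2, positive[-1] ** 2


# Hypothetical data: A is badly conditioned, but not on the chosen subspace.
A = np.diag([1.0, 2.0, 100.0])
subspace = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # span{e1, e2}

L_rel, mu_rel = restricted_constants(A, subspace)
relative_condition = L_rel / mu_rel

sigma_full = np.linalg.svd(A, compute_uv=False)
classical_condition = (sigma_full[0] / sigma_full[-1]) ** 2
```

Here the relative condition number is 4 while the classical one is 10000, which illustrates how much the fit between the set and the function can matter.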
2.2 Relative quasi strong convexity and relative functional growth
Following [22], we next consider two variants of relative strong convexity that are natural extensions of the quasi-strong convexity and quadratic functional growth concepts defined in [22]. For that purpose, we will rely on the following strengthening of Assumption 1.
Assumption 2.
Suppose satisfy Assumption 1, is finite, , and the map is well defined for all .
Definition 3.
Suppose satisfies Assumption 2.
- (a)
We say that is quasi-strongly-convex relative to if there exists a constant such that
[TABLE]
- (b)
We say that has -relative functional growth on if there exists a constant such that
[TABLE]
Throughout the sequel we will use the following notation analogous to (8). Suppose satisfies Assumption 2. Let and be as follows
[TABLE]
The next proposition shows that, as one may intuitively expect, relative quasi-strong convexity is a relaxation of relative strong convexity. In other words, whenever satisfies Assumption 2.
Proposition 1.
Suppose satisfy Assumption 2. If is such that satisfies (7) then satisfies (10).
Proof.
The construction of implies that for all . Therefore, if satisfies (7) then by taking it follows that
[TABLE]
∎
The following simple example shows that, perhaps contrary to what one might intuitively expect, relative functional growth is not necessarily a relaxation of strong relative convexity unless some additional assumptions are made about or .
Example 2.
Let and be the function . For we have . Thus for and the tuple satisfies (7). However, observe that for all and
[TABLE]
In particular, does not satisfy (11) for any .
It can be shown that under additional assumptions on or the relative functional growth condition is a relaxation of the relative strong convexity condition. In particular, relative functional growth is a relaxation of relative strong convexity when is a squared norm as we discuss in Section 4 below.
3 Properties of the relative smoothness and relative strong convexity constants when $f$ is of the form $g \circ A$
This section develops some properties of the relative constants and when is of the form for and and is bounded in terms of some norm in . The main results of this section are Theorem 1 and Theorem 2. These results provide lower bounds on in terms of and the norms of some canonical set-valued mappings that depend on and . In a similar vein, Proposition 2 gives an upper bound on in terms of and the norm of a canonical mapping associated to and .
We will rely on the objects and defined next. For nonempty and let
[TABLE]
The set-valued mapping can be seen as an extension of the set-valued mapping introduced in Section 2.1.
For and a convex cone let be the set-valued mapping defined via
[TABLE]
and let be its inverse, that is,
[TABLE]
Suppose and are endowed with norms. Define the norms of and of as follows
[TABLE]
Observe that if and is a convex set that contains more than one point then
[TABLE]
where denotes the linear subspace spanned by , that is,
[TABLE]
In particular, the following property of the relative smoothness constant readily follows.
Proposition 2.
Let and be a convex set that contains more than one point.
(a)
If is endowed with the Euclidean norm, for some norm in , and for some then
[TABLE]
(b)
Suppose are endowed with norms and for the norm in . If where is smooth for the norm in then
[TABLE]
Proof.
(a)
This follows from (17) and .
(b)
This follows from (17) and . The latter inequality follows from the smoothness of .
∎
We next discuss far more interesting results that either characterize or lower bound the relative strong convexity constant .
3.1 Lower bound on the relative strong convexity constant when the reference set is a convex cone and its image under $A$ is a linear subspace
In this subsection we will consider the special case when is a convex cone and is such that is a linear subspace of . The latter condition is equivalent to the following Slater condition: there exists such that , where denotes the relative interior of . When this is the case, the norms and have the following geometric interpretation. Let and denote the unit balls in and respectively. It is easy to see that if is a convex cone and is a linear subspace then
[TABLE]
and
[TABLE]
In other words, is the radius of the smallest ball in centered at the origin that contains . Similarly, is the radius of the largest ball in centered at the origin and that is contained in . Example 3 illustrates this geometric interpretation of and in a simple instance.
Example 3.
Let for and . Let be endowed with the Euclidean norm and let be endowed with the norm. In this case and
[TABLE]
Therefore and as Figure 3 illustrates.
The above norms, especially and other related quantities, have been extensively studied in the literature on condition measures for convex optimization [6, 9, 11, 27, 31, 30]. They have been further extended to the broader variational analysis context [19, 7]. In particular, when the family of conic systems is well-posed. That is, for all the conic system is feasible and remains so for sufficiently small perturbations of . In this case it follows from [31] that the quantity is precisely the distance to ill-posedness introduced by Renegar [30, 31], that is, the size of the smallest perturbation on so that the conic system is infeasible for some . A similar identity holds for the distance to non-surjectivity of closed sublinear set-valued mappings [19]. The latter in turn extends to a far more general identity for the radius of metric regularity [7].
Observe that if and is a linear subspace then is automatically a linear subspace. If in addition and are each endowed with Euclidean norms, then (18) and (19) yield
[TABLE]
Corollary 1 and Theorem 1 below show that there is a tight connection between the relative strong convexity constant and the norm when is of the form . Both of these results rely on the following proposition that characterizes a certain type of Hoffman constant [15]. Proposition 3 is closely related to developments in [26, 29]. Proposition 3 extends [29, Theorem 2] that only applies to the case .
Proposition 3.
Suppose and are endowed with norms. Let and be a convex cone such that contains more than one point. If is a linear subspace then
[TABLE]
Proof.
Fix and . Since is a linear subspace, it follows that and thus for some with Hence and Since this holds for arbitrary and we conclude that
[TABLE]
To prove the reverse inequality, let and be such that and for all with . Pick with . Then for all . Thus and
[TABLE]
To finish let . ∎
Proposition 3 readily yields the following result that generalizes Example 1.
Corollary 1.
Suppose is endowed with the Euclidean norm , is endowed with a norm , and . If for some and , is a convex cone, and is a linear subspace that contains more than one point then
[TABLE]
Proof.
This follows from Proposition 3 and the observation that for this choice of and we have and ∎
The following result extends Corollary 1 to a broader class of functions.
Theorem 1.
Suppose and are endowed with norms and for the norm in . Let be a convex differentiable function, and be a convex cone such that is a linear subspace that contains more than one point. If is strongly convex for the norm in then the function satisfies
[TABLE]
Proof.
Observe that for all Since is strongly convex, it follows that for all and for all . Therefore Proposition 3 implies that
[TABLE]
∎
If are as in Corollary 1 then by Proposition 2 the relative condition number is
[TABLE]
which has a striking resemblance to the classical condition number (1) of More generally, if are as in Theorem 1, , and is also smooth then by Proposition 2 we obtain the following bound on the relative condition number in terms of the condition number of and a condition number of the pair :
[TABLE]
3.2 Lower bound on the relative strong convexity constant when the reference set is a polyhedron
The results in Section 3.1 require to be a convex cone and to be a linear subspace. We next provide some results of similar flavor that relax these assumptions in exchange for the assumption that is a polyhedron. The crux of the main results in this section is Proposition 4. This technical result is drawn from the recent paper of Peña, Vera, and Zuluaga [26]. The latter paper develops a number of properties of a new class of relative Hoffman bounds. In particular, it introduces the sets of tangent cones and described below. These two sets of tangent cones are at the heart of the main developments in [26].
For a nonempty polyhedron let , where is the tangent cone of at , that is,
[TABLE]
We will rely on the following subset of that depends on how and fit together. Let
[TABLE]
In this definition, minimal is to be interpreted as minimal with respect to inclusion. This restriction guarantees that the set is of minimal size as it does not include redundant cones from .
Observe that is finite since is polyhedral and thus is finite as well. The following example illustrates the interesting relationship between and the tangent cones of captured by .
Example 4.
Suppose and . In this case each element of is of the form for some . Observe that is a linear subspace if and only if is feasible. Thus the set is in one-to-one correspondence with the maximal sets such that is feasible.
Observe that when is a polyhedral cone and is a linear subspace. Thus the following proposition subsumes Proposition 3 when is polyhedral.
Proposition 4.
Suppose and are endowed with norms. Let and be a polyhedron such that contains more than one point. Then
[TABLE]
Proof.
This follows as a special case of [26, Proposition 5 and Corollary 3]. ∎
Corollary 2.
Suppose is endowed with the Euclidean norm , is endowed with a norm , and . If for some and , and is a polyhedron such that contains more than one point then
[TABLE]
Proof.
Proceed exactly as in the proof of Corollary 1 but apply Proposition 4 instead of Proposition 3. ∎
Theorem 2.
Suppose and are endowed with norms and for the norm in . Let be a convex differentiable function, and be a polyhedron such that contains more than one point. If is strongly convex for the norm in then the function satisfies
[TABLE]
Proof.
Proceeding exactly as in the proof of Theorem 1 but applying Proposition 4 instead of Proposition 3 we get
[TABLE]
∎
Observe that if is polyhedral then and
[TABLE]
Thus Proposition 2 implies that for as in Corollary 2, the relative condition number has the following expression, which is again strikingly similar to the classical condition number (1) of :
[TABLE]
Proposition 2 also implies that if are as in Theorem 2, , and is smooth then the relative condition number can be bounded in terms of the condition number of and a condition number of the pair as follows:
[TABLE]
We next place some of the developments by Peña and Rodríguez [28] in the context of this paper. To that end, consider the special case when is the standard simplex in . For let and let denote the set of faces of . Furthermore, for let denote the set of columns of that do not belong to . Suppose is endowed with a norm and for let . Following [28] define the facial distance of as follows
[TABLE]
Let denote the diameter of the set of columns of defined as follows
[TABLE]
In the special case when it follows from [28, Theorem 1] that (23) in Proposition 4 has the following geometric characterization
[TABLE]
Furthermore, in this same special case when it is easy to see that (17) has the following geometric characterization
[TABLE]
Figure 4 gives a visualization of and of the facial distance for and . It depicts and in the hyperplane .
Example 5 below, a special case of Corollary 2, shows that for , , and the relative condition number is the square of , which has a flavor of an aspect ratio of . This gives an interesting analogy to (1).
**Example 5.**
Suppose is endowed with the norm, is endowed with the Euclidean norm, and for some with at least two different columns and . Then for Corollary 2 and identities (28) and (27) yield
[TABLE]
In particular,
[TABLE]
More generally, if for some smooth and strongly convex function then
[TABLE]
In particular,
[TABLE]
4 Properties of and
We next provide bounds on and analogous to those developed in Section 3 for . Proposition 1 already established . It is intuitively clear that could be a lot larger. When is a squared norm, the exact same technique used in [22, Theorem 1] shows that . Indeed, when is a squared norm, the relationships among other variants of strong convexity introduced in [22] extend to our context in a straightforward fashion, as we next explain.
**Definition 4.**
Suppose satisfy Assumption 2.
- (a)
We say that has -under approximation on if there exists a constant such that
[TABLE]
- (b)
We say that has -gradient growth on if there exists a constant such that
[TABLE]
Suppose satisfies Assumption 2 and is a squared norm. Then for  [22, Theorem 4] yields the following chain of implications for :
[TABLE]
We note that [22, Theorem 4] is stated and proven for the Euclidean norm but the same statement and proof hold for any norm.
From the above chain of implications it follows that if satisfies Assumption 2 and is a squared norm then . In particular, any lower bound on , such as those in Theorem 1 or Theorem 2, is also a lower bound on and on when is a squared norm. We next show that the ideas in Section 3 can be extended to obtain sharper bounds on these two constants.
4.1 A sharper lower bound on
Suppose and is a polyhedron such that contains more than one point, and is nonempty. Proposition 4 readily implies
[TABLE]
Proposition 5 below, which extends Proposition 4, gives a sharper version of (31). Suppose is a polyhedron, and is nonempty. Let
[TABLE]
where
[TABLE]
Proposition 5 can be proven via a straightforward modification of techniques in [26]. We provide the details of this modification in Appendix A.
**Proposition 5.**
Suppose and are endowed with norms. Let and be a polyhedron such that contains more than one point. Then for all nonempty
[TABLE]
Furthermore, if is convex then
[TABLE]
**Corollary 3.**
Suppose is endowed with the Euclidean norm , is endowed with a norm , and . If for some and , and is a polyhedron such that contains more than one point and , then
[TABLE]
Proof.
Proceed exactly as in the proof of Corollary 1 but apply Proposition 5 instead of Proposition 3. ∎
The following theorem gives a lower bound on analogous to the one on in Theorem 2. In light of Proposition 5, the lower bound on in Theorem 3 is at least as large as, and possibly much larger than, the one on in Theorem 2.
**Theorem 3.**
Suppose and are endowed with norms and for the norm in . Let and be a polyhedron such that has more than one point. If is -strongly convex for the norm in then the function satisfies
[TABLE]
Proof.
Observe that for all and
[TABLE]
Since is strongly convex on , it follows that for all and , and it also follows that for all . Therefore
[TABLE]
To finish, apply Proposition 5. ∎
Once again there is an interesting connection with the developments in [28] when . Consider the special case when has at least two different columns, is nonempty, and is the smallest face of that contains . From [28, Theorem 3] it follows that if is endowed with the one-norm then
[TABLE]
The following example illustrates the difference between and .
**Example 6.**
Suppose is endowed with the one-norm and . Suppose is endowed with the Euclidean norm, and for some with at least two different columns and . As noted in Example 5, in this case
[TABLE]
This relative strong convexity constant depends only on but not on . On the other hand, the smallest face of containing is
[TABLE]
which evidently depends on both and . Theorem 3 and (35) yield
[TABLE]
It is evident that
[TABLE]
Furthermore, as illustrated in [28], the difference between these two quantities can be arbitrarily large. Consequently, the bound in Theorem 3 can be far sharper than that in Theorem 2.
4.2 A sharper lower bound on
Suppose is defined as where is a strongly convex function, and . Theorem 3 does not apply to this kind of function due to the extra linear term . Indeed for a function of this form the constant may be zero, see Example 7 below. On the other hand, the next result shows that for a function of this form and for a polyhedral set it is always the case that provided a suitable linear cut is added to .
**Theorem 4.**
Suppose and are endowed with norms and for the norm in . Let and be a polyhedron such that contains more than one point. Suppose is -strongly convex for the norm in and is defined via . Then the vector is the same for all and satisfies for all . Furthermore, one of the following two possible cases applies depending on the range of values of for .
Case 1:
For all we have . In this case
[TABLE]
Case 2:
For some we have . In this case for all
[TABLE]
for the polyhedron , the matrix and the norm in defined as follows
[TABLE]
Proof.
The optimality conditions for imply that
[TABLE]
Thus for all the strong convexity of and (36) imply
[TABLE]
Hence whenever In particular, is the same for all . Furthermore, the optimality conditions for imply that for all . In particular, for all .
Next, the strong convexity of on implies that for all
[TABLE]
If for all then Case 1 applies. In this case for all and thus
[TABLE]
If for some then Case 2 applies. In this case for all and thus
[TABLE]
Next, observe that for and
[TABLE]
To finish, apply Proposition 5 in either case. ∎
Observe that if in Theorem 4 is bounded then Case 2 gives a lower bound on by taking because for this choice of .
We conclude this section with a simple example showing that can occur. The example also shows that the additional bound on in Theorem 4, Case 2 cannot simply be dropped without making some additional assumptions.
**Example 7.**
Let be endowed with the one-norm and let Suppose is as follows
[TABLE]
If then . For we have and . Hence . On the other hand, Theorem 4 implies that A more careful calculation shows that in this case .
On the other hand, if then For and we have and . Therefore . Furthermore, in the context of Theorem 4 we have . A simple calculation shows that for all we have and
5 Convergence of first-order methods
This section details linear convergence results for the mirror descent algorithm, Frank-Wolfe algorithm, and Frank-Wolfe algorithm with away steps for problem (2). The linear convergence statements for the three algorithms are strikingly similar. They are stated in terms of the relative constants and for suitable choices of distance-like functions .
5.1 Mirror descent algorithm
Suppose is convex and differentiable on and the Bregman proximal map
[TABLE]
is computable for and . The mirror descent algorithm for problem (2) is based on the following update for :
[TABLE]
Algorithm 1 gives a description of the mirror descent algorithm for (2).
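As a minimal sketch of the update above, the following Python code runs mirror descent on the standard simplex under the assumption that the reference function is the negative entropy, so that the Bregman proximal map reduces to a closed-form exponentiated-gradient step. The test objective, the point `c`, the constant `L = 1`, and all function names are illustrative assumptions; they are not taken from Algorithm 1.

```python
import math

def entropic_prox_step(x, grad, L):
    """Closed-form minimizer of <grad, y> + L * KL(y, x) over the simplex."""
    w = [xi * math.exp(-gi / L) for xi, gi in zip(x, grad)]
    total = sum(w)
    return [wi / total for wi in w]

def mirror_descent(grad_f, x0, L, iters):
    """Repeat the proximal step with a fixed constant L; return all iterates."""
    iterates = [list(x0)]
    for _ in range(iters):
        x = iterates[-1]
        iterates.append(entropic_prox_step(x, grad_f(x), L))
    return iterates

# Hypothetical test problem: f(x) = 0.5 * ||x - c||_2^2 with c in the simplex.
c = [0.2, 0.3, 0.5]

def f(x):
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def grad_f(x):
    return [xi - ci for xi, ci in zip(x, c)]

# L = 1 is a valid choice here because 0.5 * ||y - x||_2^2 <= KL(y, x)
# on the simplex (norm comparison plus Pinsker's inequality).
iterates = mirror_descent(grad_f, [1 / 3, 1 / 3, 1 / 3], L=1.0, iters=200)
values = [f(x) for x in iterates]
```

With this choice of `L` the objective values in `values` are nonincreasing, which is the descent behavior that the relative smoothness condition is meant to guarantee.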
Proposition 6 and Proposition 7 show the linear convergence of Algorithm 1 provided that suitable relative smoothness and relative quasi-strong convexity or relative functional growth conditions hold. Throughout the remainder of this subsection we assume that satisfy Assumption 1.
We should note that Proposition 6 and its proof are straightforward modifications of the linear convergence results in [20, 32]. However, Proposition 6 shows that the linear convergence of Algorithm 1 holds with a sharper rate and under more general assumptions than those in [20, 32]. In particular, the rate in Proposition 6 is stated in terms of a relative quasi-strong convexity constant, which is always at least as large as, and possibly much larger than, the kind of relative strong convexity constant in [20, 32]. Furthermore, our results in Section 3 and Section 4 guarantee linear convergence when is of the form provided and satisfy smoothness and strong convexity assumptions. The linear convergence results in [20, 32] do not apply to functions of this form because they are not strictly convex and thus the kind of relative strong convexity constant in [20, 32] is typically zero.
The following lemma, which is a straightforward extension of results presented in [32], provides the crux of the proof of Proposition 6.
**Lemma 1.**
Suppose and . If and
[TABLE]
then
[TABLE]
Proof.
Since and we have
[TABLE]
and
[TABLE]
In addition, the three-point property of  [5, Lemma 3.1] yields
[TABLE]
By putting together (39), (40), and (41) we get
[TABLE]
We get (38) by observing that the optimality conditions for (37) imply
[TABLE]
∎
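The three-point property used in the proof can be checked numerically in a concrete case. The sketch below assumes the distance is the Bregman distance of the negative entropy (the generalized Kullback-Leibler divergence) and verifies the identity D(x, z) = D(x, y) + D(y, z) + <∇h(y) − ∇h(z), x − y> at three hypothetical points; the function names and the points are illustrative only.

```python
import math

def h_grad(x):
    """Gradient of the negative entropy h(x) = sum_i x_i * log(x_i)."""
    return [math.log(xi) + 1.0 for xi in x]

def bregman(x, y):
    """Bregman distance of the negative entropy (generalized KL divergence)."""
    return sum(xi * math.log(xi / yi) - xi + yi for xi, yi in zip(x, y))

def three_point_residual(x, y, z):
    """D(x,z) - [D(x,y) + D(y,z) + <grad h(y) - grad h(z), x - y>]."""
    gy, gz = h_grad(y), h_grad(z)
    inner = sum((a - b) * (xi - yi) for a, b, xi, yi in zip(gy, gz, x, y))
    return bregman(x, z) - (bregman(x, y) + bregman(y, z) + inner)

# Three arbitrary positive points; the residual should vanish up to rounding.
residual = three_point_residual([0.2, 0.3, 0.5], [0.4, 0.4, 0.2], [0.1, 0.6, 0.3])
```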
**Proposition 6.**
Suppose and . If in Algorithm 1 then the iterates generated by Algorithm 1 satisfy
[TABLE]
and
[TABLE]
Proof.
Lemma 1 applied to implies that
[TABLE]
Therefore
[TABLE]
Thus (42) readily follows. Inequality (43) also yields
[TABLE]
∎
Proposition 6 implies that if and then Algorithm 1 yields such that in at most
[TABLE]
iterations.
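To make this kind of iteration count concrete, the sketch below assumes a generic linear rate in which the optimality measure shrinks by a factor of at most 1 − μ/L per iteration. That factor, the function name, and the numerical values are assumptions made for illustration, not the exact statement of Proposition 6.

```python
import math

def iterations_for_accuracy(L, mu, initial_gap, eps):
    """Smallest k with (1 - mu/L)**k * initial_gap <= eps (assumed rate)."""
    if eps >= initial_gap:
        return 0
    rate = 1.0 - mu / L
    if rate <= 0.0:
        return 1
    return math.ceil(math.log(eps / initial_gap) / math.log(rate))

# Hypothetical constants: relative condition number L / mu = 10.
k = iterations_for_accuracy(L=10.0, mu=1.0, initial_gap=1.0, eps=1e-6)

# The simpler bound (L / mu) * log(initial_gap / eps) is never smaller,
# because log(1 - s) <= -s for every s in [0, 1).
simple_bound = (10.0 / 1.0) * math.log(1.0 / 1e-6)
```

Under this assumed rate the count grows linearly in the relative condition number L/μ and logarithmically in 1/ε.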
Proposition 7 below shows that the same kind of iteration bound holds under a relative functional growth assumption instead of the quasi strong convexity assumption in Proposition 6. We note that although Proposition 7 is similar in flavor to Proposition 6, it is stated in terms of the novel concept of relative functional growth. Furthermore, neither Proposition 6 nor Proposition 7 implies the other since neither nor necessarily bounds the other. (See Example 2 and Example 7.)
**Proposition 7.**
Suppose and . If in Algorithm 1 then for the iterates generated by Algorithm 1 satisfy
[TABLE]
In addition, Algorithm 1 yields such that in at most
[TABLE]
iterations.
Proof.
Since , it follows from [20, Theorem 3.1] that the -th iterate generated by Algorithm 1 satisfies
[TABLE]
Therefore, since ,
[TABLE]
Thus (44) follows. It also follows that for
[TABLE]
and thus (45) follows as well. ∎
To ease our exposition, in Proposition 6 and Proposition 7 we assumed is known and used in Step 3 of Algorithm 1. However, it is easy to see that these two results also hold if the assumption is relaxed to the assumption and . The latter condition is easier to implement via a standard backtracking procedure. We also assume knowledge of suitable relative smoothness constants for the choice of stepsize in Step 4 of Algorithm 2 and in Step 9 of Algorithm 3 below. As in Algorithm 1, this assumption can be relaxed via a standard backtracking procedure.
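A standard backtracking procedure of the kind mentioned above can be sketched as follows. The sketch assumes the entropic setting on the simplex, doubles a trial constant until the relative-smoothness inequality holds at the trial point, and carries the accepted constant forward. The growth factor, the starting guess `0.01`, the tolerance, and the test problem are illustrative assumptions.

```python
import math

def kl(y, x):
    """Kullback-Leibler divergence between two points of the simplex."""
    return sum(yi * math.log(yi / xi) for yi, xi in zip(y, x) if yi > 0.0)

def entropic_prox_step(x, grad, L):
    """Closed-form minimizer of <grad, y> + L * KL(y, x) over the simplex."""
    w = [xi * math.exp(-gi / L) for xi, gi in zip(x, grad)]
    total = sum(w)
    return [wi / total for wi in w]

def backtracking_step(f, grad_f, x, L, growth=2.0, max_tries=60):
    """Increase L until f(y) <= f(x) + <g, y - x> + L * KL(y, x) holds."""
    g, fx = grad_f(x), f(x)
    for _ in range(max_tries):
        y = entropic_prox_step(x, g, L)
        model = fx + sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x)) + L * kl(y, x)
        if f(y) <= model + 1e-15:  # small tolerance for rounding
            return y, L
        L *= growth
    raise RuntimeError("backtracking did not find an admissible constant")

# Hypothetical test problem: f(x) = 0.5 * ||x - c||_2^2 on the simplex.
c = [0.2, 0.3, 0.5]

def f(x):
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def grad_f(x):
    return [xi - ci for xi, ci in zip(x, c)]

x, L = [1 / 3, 1 / 3, 1 / 3], 0.01  # deliberately small initial guess
values = [f(x)]
for _ in range(300):
    x, L = backtracking_step(f, grad_f, x, L)
    values.append(f(x))
```

Because any constant of at least 1 passes the test for this objective, the doubling scheme never pushes `L` beyond twice that value, so the accepted constant stays within a factor of two of a valid relative smoothness constant.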
5.2 Frank-Wolfe algorithm
Suppose is a compact convex set and a linear oracle for is available, that is, the map
[TABLE]
is computable.
The Frank-Wolfe algorithm, also known as the conditional gradient algorithm, for (2) is based on the following update for
[TABLE]
Algorithm 2 gives a description of the Frank-Wolfe algorithm for (2).
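The update above can be sketched in Python for the standard simplex, where the linear oracle simply returns the vertex with the smallest gradient coordinate. The step size `min(1, gap / L)` minimizes the quadratic upper model along the Frank-Wolfe direction; the objective, the starting vertex, and the curvature constant `L = 2` (the squared Euclidean diameter of the simplex, valid for this particular quadratic) are illustrative assumptions rather than the exact statement of Algorithm 2.

```python
def linear_oracle_simplex(g):
    """Index of the vertex e_i minimizing <g, e_i> over the standard simplex."""
    return min(range(len(g)), key=lambda j: g[j])

def frank_wolfe(f, grad_f, x0, L, iters):
    """Frank-Wolfe with the short step t = min(1, gap / L)."""
    x = list(x0)
    values = [f(x)]
    for _ in range(iters):
        g = grad_f(x)
        s = linear_oracle_simplex(g)
        # Frank-Wolfe gap <g, x - e_s>; nonnegative up to rounding.
        gap = sum(gi * xi for gi, xi in zip(g, x)) - g[s]
        t = min(1.0, max(0.0, gap / L))
        x = [(1.0 - t) * xi for xi in x]  # move toward the vertex e_s
        x[s] += t
        values.append(f(x))
    return x, values

# Hypothetical test problem: the minimizer c lies in the relative interior.
c = [0.2, 0.3, 0.5]

def f(x):
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def grad_f(x):
    return [xi - ci for xi, ci in zip(x, c)]

x_final, values = frank_wolfe(f, grad_f, [1.0, 0.0, 0.0], L=2.0, iters=1000)
```

In this example the minimizer lies in the relative interior of the simplex, which is the situation in which the plain Frank-Wolfe iteration is known to converge linearly.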
Let where is the radial distance defined as follows: for
[TABLE]
Hence the relative smoothness constant is the smallest such that for all and
[TABLE]
Observe that the relative smoothness constant is precisely the curvature constant of on defined by Jaggi [16].
The relative quasi strong convexity constant is the largest such that for all
[TABLE]
Similarly, the relative functional growth constant is the largest such that for all
[TABLE]
The next result shows the linear convergence of Algorithm 2 when or is finite. As we note below, Proposition 8 is at least as sharp as the linear convergence rates established in [13, 3].
**Proposition 8.**
Suppose and If each stepsize in Step 4 of Algorithm 2 is chosen via
[TABLE]
then the iterates generated by Algorithm 2 satisfy
[TABLE]
Proof.
It suffices to show that at iteration
[TABLE]
Indeed, inequality (47), the choice of , and (48) imply that
[TABLE]
We next show (48). The construction of the radial distance and the choice of in Algorithm 2 imply that
[TABLE]
We next consider the two possible values of separately.
Case 1: . In this case we have
[TABLE]
Rearranging and applying the arithmetic-mean geometric-mean inequality we get
[TABLE]
Case 2: . In this case we have
[TABLE]
Therefore the last term is at least as large as the geometric mean of the first two and we get
[TABLE]
∎
To conclude this subsection, we discuss some natural bounds on and . Recall that denotes the relative interior of . Similarly, let denote the relative boundary of . As previously discussed in [16], from (47) it readily follows that if is -smooth on for some norm in then
[TABLE]
On the other hand, if is -strongly convex on for some norm in and the single element satisfies then for all we have for some . The strong convexity of thus implies both
[TABLE]
Therefore when is both -smooth and -strongly convex and we have
[TABLE]
Observe that the right-hand side in both inequalities is an interesting combination of the usual condition number of and a kind of condition number of the set around the point . The first bound above and Proposition 8 yield a linear convergence result similar to [13, Theorem 2] but with a sharper rate.
The above bounds can be extended to a broader context. Suppose for some strongly convex function and . Then for all we have
[TABLE]
Consequently, if then for all
[TABLE]
Observe that can in turn be bounded below as follows
[TABLE]
Therefore when where is -smooth and -strongly convex then for all both and are bounded above by
[TABLE]
This bound and Proposition 8 yield a linear convergence result similar to [3, Proposition 3.2] but with a sharper rate.
5.3 Frank-Wolfe with away steps algorithm
Suppose is a polytope and a vertex linear oracle for is available, that is, the map
[TABLE]
is computable and outputs a vertex of for all .
For this kind of linear oracle, each step of the Frank-Wolfe algorithm adds weight to some vertex . The basic idea of the Frank-Wolfe with away steps algorithm is to combine regular steps of the Frank-Wolfe algorithm with away steps that reduce weight from some vertex . To that end, the algorithm requires an additional vertex representation of . More precisely, let and be such that
[TABLE]
Algorithm 3 describes a Frank-Wolfe with away steps algorithm. We should highlight that although the set could be immense, the algorithm does not require it explicitly. Instead the algorithm only maintains and that are far more manageable. Indeed, by using the IRR procedure in [2] or its modification described in [14], Step 10 in Algorithm 3 can guarantee that the sets have size at most for .
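The interplay between regular and away steps can be sketched on the standard simplex, where a point is its own vertex representation: the weights are the coordinates and the active vertices are the coordinates with positive weight. The sketch below follows a common variant that compares the Frank-Wolfe gap with the away gap and takes the direction with the larger one; Algorithm 3 may differ in details such as the direction rule, the step size, and the representation-reduction step. The objective, the constant `L = 2`, and all names are illustrative assumptions.

```python
def away_step_frank_wolfe(f, grad_f, x0, L, iters):
    """Frank-Wolfe with away steps on the standard simplex (illustrative variant)."""
    x = list(x0)
    n = len(x)
    values = [f(x)]
    for _ in range(iters):
        g = grad_f(x)
        gx = sum(gi * xi for gi, xi in zip(g, x))
        s = min(range(n), key=lambda j: g[j])  # Frank-Wolfe vertex
        a = max((j for j in range(n) if x[j] > 0.0), key=lambda j: g[j])  # away vertex
        gap_fw, gap_away = gx - g[s], g[a] - gx
        if gap_fw >= gap_away:
            # Regular step: add weight to vertex e_s.
            t = min(1.0, max(0.0, gap_fw / L))
            x = [(1.0 - t) * xi for xi in x]
            x[s] += t
        else:
            # Away step: remove weight from vertex e_a, at most all of it.
            t_max = x[a] / (1.0 - x[a])
            t = min(t_max, max(0.0, gap_away / L))
            drop = t == t_max
            x = [(1.0 + t) * xi for xi in x]
            x[a] = 0.0 if drop else x[a] - t
        values.append(f(x))
    return x, values

# Hypothetical test problem: the minimizer (0.5, 0.5, 0) lies on an edge.
c = [0.6, 0.6, -0.2]

def f(x):
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def grad_f(x):
    return [xi - ci for xi, ci in zip(x, c)]

x_final, values = away_step_frank_wolfe(f, grad_f, [0.0, 0.0, 1.0], L=2.0, iters=200)
```

In this example the starting vertex is not part of the optimal face, and a single away step removes all of its weight, after which the iterates stay on the edge that contains the minimizer.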
Proposition 9 below establishes the linear convergence of Algorithm 3 under suitable relative smoothness and quasi strong convexity or functional growth conditions. To that end, we consider two variants of the radial distance. Let where is the diametral distance defined via
[TABLE]
The relative smoothness constant is the smallest such that for all and with
[TABLE]
The relative smoothness constant is precisely the away curvature constant of on defined by Lacoste-Julien and Jaggi [18].
To capture the appropriate relative strong convexity conditions, we rely on a more involved variant of the radial distance. For , let denote the collection of all subsets such that is a positive convex combination of the elements in . Let where is defined via
[TABLE]
The relative strong convexity constant is at least as large as
[TABLE]
The latter quantity is precisely the geometric strong convexity constant defined by Lacoste-Julien and Jaggi [18, Appendix C]. Notice that it matches when is strictly convex because in that case for all . Otherwise, could be larger.
The relative quasi strong convexity constant is the largest such that for all
[TABLE]
Similarly, the relative functional growth constant is the largest such that for all
[TABLE]
Since and is at least as large as the geometric strong convexity constant in [18, Appendix C], the following linear convergence result is at least as sharp as the one given in [18, Theorem 8] for the Frank-Wolfe with away steps algorithm.
**Proposition 9.**
Suppose and If each stepsize in Step 9 of Algorithm 3 is chosen via
[TABLE]
then the iterates generated by Algorithm 3 satisfy
[TABLE]
Proof.
This proof follows a similar reasoning to the proof of Proposition 8. First we claim that at iteration
[TABLE]
To show this claim, consider the two possible values of separately.
Case 1: . In this case we have
[TABLE]
Rearranging and applying the arithmetic-mean geometric-mean inequality we get
[TABLE]
Case 2: . In this case we have
[TABLE]
Therefore the last term is at least as large as the geometric mean of the first two and we get
[TABLE]
To finish the proof, we next show (52) by relying on (53). To do so, we replicate some of the main ideas previously introduced in [2, 18, 28].
The choice of at iteration and (52) imply that
[TABLE]
We consider separately the three possible cases that can occur for at iteration , namely , and
Case 1: . In this case . In addition, inequalities (50) and (54), and the choice of imply that
[TABLE]
Case 2: . In this case . In addition, inequality (50), the choice of , and the convexity of imply that
[TABLE]
Case 3: . In this case . In addition, (50) and the choice of imply that
[TABLE]
We next show that in the first iterations Case 3 can occur at most times by using the argument introduced by Lacoste-Julien and Jaggi in [18]. Since and for it follows that for each iteration when Case 3 occurred there must have been at least one previous iteration when Case 1 occurred. Hence in the first iterations Case 3 could occur at most times.
To finish the proof, observe that at every iteration when Case 1 or Case 2 occurs inequalities (55) and (56) yield
[TABLE]
We note that the minimum in the last expression is necessary because may indeed occur. For a concrete example, see [28, Example 6].
∎
We next discuss some bounds on and on in terms of the set . We should note that the bounds below on and on have also been derived, albeit following a different approach, in [18, Appendix C].
From (50) it readily follows that if is -smooth on for some norm in then
[TABLE]
On the other hand, from [28, Theorem 1] it follows that for all
[TABLE]
where .
Hence if is -strongly convex on for some norm in then for all we have
[TABLE]
and consequently
[TABLE]
Therefore when is both -smooth and -strongly convex on for some norm in we have
[TABLE]
Once again, the right-hand side is an interesting combination of the usual condition number of and a kind of condition number of . Furthermore, by proceeding as in Example 5 it follows that when is of the form for some and we have and . Thus for we have
[TABLE]
This illustrates how the condition number of relative to depends on how the shape of and fit together.
We also have the following sharper lower bound on . From [28, Theorem 3] it follows that
[TABLE]
where is the smallest face of that contains . It thus follows that if is -strongly convex on for some norm then
[TABLE]
Finally we note that Theorem 4 implies that when is of the form for some strongly convex function . Indeed, with a slight abuse of notation, let denote the matrix whose columns are the elements of and consider the function defined via . Observe that for
[TABLE]
Consequently,
[TABLE]
for the distance function . The functional growth constant in turn can be bounded below as detailed in Theorem 4 since can be written as and is strongly convex.
The linear convergence bounds in Proposition 9 are tight modulo some small constants. This can be readily inferred from [28, Example 3 and Example 4].
Appendix A Proof of Proposition 5
The construction of implies and  for all . Hence
[TABLE]
where the last step follows from [26, Lemma 1]. This proves the second inequality in (33).
Let The first inequality in (33) can be stated as follows: for all and
[TABLE]
We prove (57) by contradiction. Suppose that there exist and such that That is,
[TABLE]
Let and consider the convex optimization problem
[TABLE]
Observe that since . Thus there exists such that and
[TABLE]
Therefore there exists feasible for (59) with . On the other hand, (58) implies that there does not exist any feasible for (59) with . It thus follows that (59) has an optimal solution with . Now consider the modification of (59) obtained by replacing with :
[TABLE]
Proceeding as above with in lieu of it follows that (60) has an optimal solution with . In particular, is a feasible solution to (59) with which contradicts the optimality of We therefore conclude that (57) must hold and thus (33) is proven.
We next prove (34) when is convex. To that end, suppose and . Then for some . Let be such that and for all with . By scaling if necessary we can assume that and thus for some . Observe that implies both and . It thus follows that
[TABLE]
Since this holds for all and identity (34) follows. ∎
References
- [1] H. Bauschke, J. Bolte, and M. Teboulle. A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Mathematics of Operations Research, 42(2):330–348, 2016.
- [2] A. Beck and S. Shtern. Linearly convergent away-step conditional gradient for non-strongly convex functions. Mathematical Programming, 164:1–27, 2017.
- [3] A. Beck and M. Teboulle. A conditional gradient method with linear rate of convergence for solving convex linear systems. Math. Meth. of Oper. Res., 59(2):235–247, 2004.
- [4] S. Bubeck, Y. Lee, and M. Singh. A geometric alternative to Nesterov's accelerated gradient descent. arXiv preprint arXiv:1506.08187, 2015.
- [5] G. Chen and M. Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization, 3(3):538–543, 1993.
- [6] D. Cheung and F. Cucker. A new condition number for linear programming. Math. Prog., 91(2):163–174, 2001.
- [7] A. L. Dontchev, A. S. Lewis, and R. T. Rockafellar. The radius of metric regularity. Trans. Amer. Math. Soc., 355(2):493–517 (electronic), 2003.
- [8] D. Drusvyatskiy, M. Fazel, and S. Roy. An optimal first order method based on optimal quadratic averaging. SIAM Journal on Optimization, 28(1):251–271, 2018.
