Equivalence and invariance of the chi and Hoffman constants of a matrix
Javier F. Pena, Juan C. Vera, and Luis F. Zuluaga

TL;DR
This paper proves that the chi and Hoffman constants for a full column rank matrix are identical and explores their invariance and equivalence with related condition measures, extending to subspace-dependent variants.
Contribution
It establishes the equality and invariance of the chi and Hoffman constants, and relates them to other condition measures, revealing fundamental connections.
Findings
Chi and Hoffman constants are identical for full column rank matrices.
Invariance of these constants under sign changes of matrix rows.
Extensions to subspace-dependent variants and relations to other condition measures.
Abstract
We show that the following two condition measures of a full column rank matrix $A\in\mathbb{R}^{m\times n}$ are identical: the chi constant and a signed Hoffman constant. This identity is naturally suggested by the evident invariance of the chi constant under sign changes of the rows of $A$. We also show that similar equivalence and invariance properties extend to variants of the chi and Hoffman constants that depend only on the linear subspace $A(\mathbb{R}^n)\subseteq\mathbb{R}^m$. Finally, we show similar identities between the chi constants and signed versions of Renegar's and Grassmannian condition measures.
Javier F. Peña, Tepper School of Business, Carnegie Mellon University, USA
Juan C. Vera, Department of Econometrics and Operations Research, Tilburg University, The Netherlands
Luis F. Zuluaga, Department of Industrial and Systems Engineering, Lehigh University, USA
AMS Subject Classification: 65K10, 65F22, 90C25, 90C57
Keywords: Condition measures, invariance, weighted least squares, linear inequalities
1 Introduction
We show a novel equivalence between the following two condition measures of a matrix that play central roles in numerical linear algebra and in convex optimization: the chi measure [3, 5, 6, 29, 30] and the Hoffman constant [12, 9, 14, 34]. We also show some similar equivalences for some variants of these constants.
Let $A\in\mathbb{R}^{m\times n}$ be a full column rank matrix. The chi constant $\chi(A)$ and its variant $\bar\chi(A)$ arise in the analysis of weighted least squares problems [4, 6, 7, 13]. In particular, $\bar\chi(A)$ plays a central role in the analysis of Vavasis and Ye’s interior-point algorithm for linear programming [19, 33]. A remarkable feature of Vavasis and Ye’s algorithm is its sole dependence on the matrix $A$ defining the primal and dual constraints.
The Hoffman constant $H(A)$ is associated to Hoffman’s Lemma [12, 9], a fundamental error bound for systems of linear constraints of the form $Ax\le b$. The Hoffman constant and other similar error bounds are used to establish the convergence rate of a wide variety of optimization algorithms [2, 8, 10, 15, 16, 18, 20, 23, 24, 34].
As we discuss in Section 2, the chi constant $\chi(A)$ and its variant $\bar\chi(A)$ can be seen as measures of worst behavior of a canonical solution mapping for the following weighted least squares problems
[TABLE]
where $D$ is a diagonal matrix with positive diagonal entries.
Similarly, the Hoffman constant $H(A)$ and its variant $\bar H(A)$ can be seen as measures of worst behavior of a canonical solution mapping for the following system of linear inequalities
[TABLE]
It is not immediately obvious that there should be a relationship between the chi and Hoffman constants. Nonetheless, it is known that $H(A)\le\chi(A)$ and that $\chi(A)$ can be arbitrarily larger [11, 26]. Thus an equivalence between the constants $\chi(A)$ and $H(A)$ appears impossible. The main goal of this paper is to show that this apparent impossibility can be attributed to and rectified via a canonical sign invariance property of $\chi(A)$ detailed in equation (2) below. Namely, the constant $\chi(A)$ does not change when the signs of some of the rows of $A$ are flipped, as the solution mapping (7) satisfies this sign invariance property. On the other hand, the constant $H(A)$ does not satisfy this sign invariance property and thus $\chi(A)$ and $H(A)$ cannot be identical. Our main result (Theorem 1) shows that $\chi(A)$ and $H(A)$ become identical after properly tweaking $H(A)$ to ensure the sign invariance property.
A similar type of invariance consideration yields identities between the variants $\bar\chi(A)$ and $\bar H(A)$. Our developments can be further extended to obtain analogous identities between the four measures $\chi(A), \bar\chi(A), H(A), \bar H(A)$ and the following two popular condition measures for systems of linear inequalities: Renegar’s distance to ill-posedness [27] and the Grassmannian condition measure [1].
The above developments are similar in spirit to results previously derived by Tunçel [32], by Todd, Tunçel, and Ye [31], and by Ho and Tunçel [11]. These articles compare various condition measures for linear programming including the chi and Hoffman constants. However, there are two major differences between our developments and theirs. First, most of the results in [32, 11, 31] provide only inequalities and hence are weaker than our identities concerning the chi and Hoffman constants. Second, the articles [32, 11, 31] do not deal with Renegar’s and Grassmannian condition measures but instead relate the chi and Hoffman constants with Ye’s condition measure [35] for polyhedra of the form . Hence we deliberately chose not to discuss Ye’s condition measure in this paper. However, we note that our results can be extended to identities involving Ye’s condition measure by drawing on the recent work by Peña and Roshchina [25].
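To formally state the sign invariance property, we rely on the following convenient notation. Let $\mathcal{S}\subseteq\mathbb{R}^{m\times m}$ denote the set of signature matrices defined as follows
$$\mathcal{S} := \{\mathrm{Diag}(d) : d\in\{-1,1\}^m\}. \tag{1}$$
The constant $\chi(A)$ satisfies the following sign invariance property:
$$\chi(SA) = \chi(A) \quad\text{for all } S\in\mathcal{S}. \tag{2}$$
Our main result states that $\chi(A)$ and $H(A)$ become identical if we take a suitable closure of $H(A)$ to ensure the sign invariance property.
Theorem 1.
Let $A\in\mathbb{R}^{m\times n}$ be a full column-rank matrix. Then
$$\chi(A) = \max_{S\in\mathcal{S}} H(SA). \tag{3}$$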
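A similar type of invariance property relates the measures $\chi(A)$ and $\bar\chi(A)$. The construction of $\bar\chi(A)$ depends only on the subspace $A(\mathbb{R}^n)$. Thus $\bar\chi(A)$ readily satisfies the following invariance under right multiplication by non-singular matrices
$$\bar\chi(AR) = \bar\chi(A) \quad\text{for all non-singular } R\in\mathbb{R}^{n\times n}.$$
In analogy to Theorem 1, the measures $\chi(A)$ and $\bar\chi(A)$ become identical if we take a suitable closure of $\chi(A)$ to ensure the same invariance under right multiplication by non-singular matrices (see Proposition 1):
$$\bar\chi(A) = \min_{R\in\mathbb{R}^{n\times n}\ \text{non-singular}} \|AR\|\cdot\chi(AR).$$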
Furthermore, the same kind of identity holds for the measures and (see Proposition 3):
[TABLE]
In particular, identity (3) in Theorem 1 readily extends to the measures and as follows (see Corollary 1):
[TABLE]
Our proof of Theorem 1 will actually show the following stronger identity when all rows of are non-zero (see Theorem 2):
[TABLE]
This stronger identity in turn yields some interesting connections with Renegar’s distance to ill-posedness [27, 28] and the Grassmannian condition number of [1]. More precisely, in Section 4 we show the following identity analogous to (3) (see Proposition 5):
[TABLE]
and the following identity analogous to (4) (see Corollary 2):
[TABLE]
The main sections of the paper are organized as follows. Section 2 recalls the construction of the chi constants as well as the Hoffman constants and some of their main properties. Our presentation deliberately follows separate but similar formats for the chi constants and for the Hoffman constants. Section 3 presents the proof of Theorem 1. To do so, we state and prove the stronger Theorem 2. Finally, Section 4 recalls the construction of Renegar’s condition measure and of the Grassmannian condition measure. This section also proves identities (5) and (6).
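Throughout the paper, whenever we encounter a Euclidean space $\mathbb{R}^n$ we implicitly assume that it is endowed with the Euclidean norm defined by the canonical inner product in $\mathbb{R}^n$, that is, $\|x\| = \sqrt{\langle x, x\rangle}$ for all $x\in\mathbb{R}^n$. Likewise, whenever we encounter a space of matrices $\mathbb{R}^{m\times n}$ we implicitly assume that it is endowed with the operator norm, that is,
$$\|A\| := \max_{x\in\mathbb{R}^n,\ \|x\|\le 1}\|Ax\|$$
for all $A\in\mathbb{R}^{m\times n}$.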
2 Definition and properties of the chi and Hoffman constants
This section recalls the construction and main properties of the constants $\chi(A), \bar\chi(A)$ and $H(A), \bar H(A)$. These constants can be seen as condition measures for two fundamental problems in scientific computing, namely weighted least squares and linear inequalities.
2.1 Weighted least squares
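Let $\mathcal{D}$ denote the set of diagonal matrices in $\mathbb{R}^{m\times m}$ with positive diagonal entries. That is,
$$\mathcal{D} := \{\mathrm{Diag}(d) : d\in\mathbb{R}^m_{++}\},$$
where $\mathbb{R}^m_{++}$ denotes the set of vectors in $\mathbb{R}^m$ with positive entries.
Suppose $A\in\mathbb{R}^{m\times n}$. Given $D\in\mathcal{D}$ and $b\in\mathbb{R}^m$, consider the weighted least squares problem
$$\min_{x\in\mathbb{R}^n} \|D^{1/2}(Ax-b)\|. \tag{7}$$
When $A$ is full column-rank, it is easy to see that the solution to (7) is precisely $A_D^\dagger b$ where $A_D^\dagger$ is the following weighted pseudo-inverse of $A$ [6, 29]:
$$A_D^\dagger := (A^{\mathsf T}DA)^{-1}A^{\mathsf T}D. \tag{8}$$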
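The weighted pseudo-inverse can be checked numerically. The following is a minimal sketch, assuming that the weighted least squares problem reads $\min_x \|D^{1/2}(Ax-b)\|$ and that the weighted pseudo-inverse is $(A^{\mathsf T}DA)^{-1}A^{\mathsf T}D$; the helper name `weighted_pinv` and the random test data are illustrative choices and do not come from the paper.

```python
import numpy as np

def weighted_pinv(A, d):
    """Assumed weighted pseudo-inverse (A^T D A)^{-1} A^T D with D = Diag(d), d > 0."""
    AtD = A.T * d                      # scales column i of A^T by d[i], i.e. A^T D
    return np.linalg.solve(AtD @ A, AtD)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))        # full column rank with probability one
d = rng.uniform(0.1, 10.0, size=6)     # positive diagonal of D
b = rng.standard_normal(6)

# Solution given by the weighted pseudo-inverse.
x_formula = weighted_pinv(A, d) @ b

# Reference: ordinary least squares on the rescaled data D^{1/2} A and D^{1/2} b.
sqrt_d = np.sqrt(d)
x_lstsq, *_ = np.linalg.lstsq(A * sqrt_d[:, None], b * sqrt_d, rcond=None)

# Optimality condition of the weighted problem: A^T D (A x - b) = 0.
gradient = A.T @ (d * (A @ x_formula - b))

print(np.allclose(x_formula, x_lstsq), np.allclose(gradient, 0.0))
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; for badly scaled weights the rescaled `lstsq` route is usually the more stable of the two.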
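2.1.1 Condition measures $\chi(A)$ and $\bar\chi(A)$
Suppose $A\in\mathbb{R}^{m\times n}$ is full column-rank. The condition measure $\chi(A)$ is defined as the following worst-case characteristic of the family of solution mappings $b\mapsto A_D^\dagger b$ constructed via (8):
$$\chi(A) := \sup_{D\in\mathcal{D}} \|A_D^\dagger\| = \sup_{D\in\mathcal{D}} \|(A^{\mathsf T}DA)^{-1}A^{\mathsf T}D\|. \tag{9}$$
Consider the following alternative reformulation of the weighted least-squares problem (7) in the subspace $A(\mathbb{R}^n)$:
$$\min_{y\in A(\mathbb{R}^n)} \|D^{1/2}(y-b)\|. \tag{10}$$
The solution to (10) is evidently the $D$-projection of $b$ onto $A(\mathbb{R}^n)$. Once again, it is easy to see that if $A$ is full column-rank then the $D$-projection onto $A(\mathbb{R}^n)$ is
$$AA_D^\dagger = A(A^{\mathsf T}DA)^{-1}A^{\mathsf T}D.$$
The condition measure $\bar\chi(A)$ is defined as the following worst-case characteristic of the family of solution mappings $b\mapsto AA_D^\dagger b$:
$$\bar\chi(A) := \sup_{D\in\mathcal{D}} \|AA_D^\dagger\|. \tag{11}$$
Although it is not immediately evident, the constants $\chi(A)$ and $\bar\chi(A)$ are finite for any full column-rank matrix $A$. This fact was independently shown by Ben-Tal and Teboulle [3], Dikin [5], Stewart [29], and Todd [30]. The constants $\chi(A)$ and $\bar\chi(A)$ play a key role in weighted least-squares problems [6, 7, 4] and in linear programming [11, 31, 32, 33].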
We record some alternative expressions for and that are closely related to the constructions of and discussed below. First, observe that
[TABLE]
Second, observe that
[TABLE]
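2.1.2 Properties of $\chi(A)$ and $\bar\chi(A)$
Suppose $A\in\mathbb{R}^{m\times n}$ is full column-rank and $D\in\mathcal{D}$. By construction, the solution mappings $A_D^\dagger$ and $AA_D^\dagger$ satisfy the following property: for $S\in\mathcal{S}$ we have $(SA)_D^\dagger = A_D^\dagger S$. In particular $\|(SA)_D^\dagger\| = \|A_D^\dagger\|$ and $\|SA(SA)_D^\dagger\| = \|AA_D^\dagger\|$. Therefore (9) and (11) imply that the constants $\chi(A)$ and $\bar\chi(A)$ satisfy the following sign invariance property:
$$\chi(SA) = \chi(A) \quad\text{and}\quad \bar\chi(SA) = \bar\chi(A) \quad\text{for all } S\in\mathcal{S}.$$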
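The sign invariance can be verified directly. The short derivation below is a sketch under the assumption that the weighted pseudo-inverse is $A_D^\dagger = (A^{\mathsf T}DA)^{-1}A^{\mathsf T}D$ and that $S$ is a diagonal matrix with entries $\pm 1$; it only uses that $S$ is symmetric and orthogonal, and that diagonal matrices commute, so that $SDS = D$.

```latex
\begin{align*}
(SA)^{\mathsf T} D\,(SA) &= A^{\mathsf T}(SDS)A = A^{\mathsf T} D A, \\
(SA)^{\mathsf T} D &= A^{\mathsf T} S D = A^{\mathsf T} D S, \\
(SA)^\dagger_D &= (A^{\mathsf T} D A)^{-1} A^{\mathsf T} D\, S = A^\dagger_D\, S, \\
\|(SA)^\dagger_D\| &= \|A^\dagger_D S\| = \|A^\dagger_D\|, \\
\|SA\,(SA)^\dagger_D\| &= \|S\,(A A^\dagger_D)\,S\| = \|A A^\dagger_D\|.
\end{align*}
```

The last two lines hold because multiplication by the orthogonal matrix $S$ on either side preserves the operator norm. Taking the supremum over $D$ then gives the invariance of both chi constants.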
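Furthermore, the quantity $\bar\chi(A)$ depends only on the subspace $A(\mathbb{R}^n)$, which evidently satisfies $AR(\mathbb{R}^n) = A(\mathbb{R}^n)$ for all non-singular $R\in\mathbb{R}^{n\times n}$. Therefore, the constant $\bar\chi(A)$ is invariant under multiplication by non-singular matrices, that is,
$$\bar\chi(AR) = \bar\chi(A) \quad\text{for all non-singular } R\in\mathbb{R}^{n\times n}. \tag{12}$$
The constant $\chi(A)$ is not invariant under multiplication by non-singular matrices. Proposition 1 shows that $\bar\chi(A)$ is the closure of $\chi(A)$ under this kind of invariance.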
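Proposition 1.
Suppose $A\in\mathbb{R}^{m\times n}$ is full column-rank. Then $\bar\chi(A)\le\|A\|\cdot\chi(A)$ and $\bar\chi(A)=\chi(A)$ when the columns of $A$ are orthonormal. In particular,
$$\bar\chi(A) = \min_{R\in\mathbb{R}^{n\times n}\ \text{non-singular}} \|AR\|\cdot\chi(AR). \tag{13}$$
Proof.
Since $\|Ax\|\le\|A\|\cdot\|x\|$ for all $x\in\mathbb{R}^n$, the construction (9) and (11) of $\chi(A)$ and $\bar\chi(A)$ readily implies that
$$\bar\chi(A)\le\|A\|\cdot\chi(A). \tag{14}$$
Next, we show that $\bar\chi(A)=\chi(A)$ when the columns of $A$ are orthonormal. To that end, observe that if the columns of $A$ are orthonormal then $\|Ax\|=\|x\|$ for all $x\in\mathbb{R}^n$. In particular, if the columns of $A$ are orthonormal then $\|AA_D^\dagger\|=\|A_D^\dagger\|$ for all $D\in\mathcal{D}$. Thus (9) and (11) imply that $\bar\chi(A)=\chi(A)$.
Finally, from (12) and (14) it follows that $\bar\chi(A)=\bar\chi(AR)\le\|AR\|\cdot\chi(AR)$ for all $R\in\mathbb{R}^{n\times n}$ non-singular, with equality when the columns of $AR$ are orthonormal. Thus (13) follows.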
∎
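The two steps of the above proof can be illustrated numerically for individual weight matrices. The sketch below assumes that the two solution mappings are $A_D^\dagger=(A^{\mathsf T}DA)^{-1}A^{\mathsf T}D$ and $AA_D^\dagger$; the helper `weighted_pinv`, the QR-based orthonormalization, and the sampled weights are illustrative assumptions. It checks, for a few sampled $D$, that the $D$-projection depends only on the range of $A$, that $\|AA_D^\dagger\|\le\|A\|\cdot\|A_D^\dagger\|$, and that equality of norms holds when the columns are orthonormal.

```python
import numpy as np

def weighted_pinv(A, d):
    """Assumed weighted pseudo-inverse (A^T D A)^{-1} A^T D with D = Diag(d), d > 0."""
    AtD = A.T * d
    return np.linalg.solve(AtD @ A, AtD)

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
Q, R = np.linalg.qr(A)     # Q = A R^{-1}: orthonormal columns, same range as A

checks = []
for _ in range(5):
    d = 10.0 ** rng.uniform(-2, 2, size=6)
    pinv_A, pinv_Q = weighted_pinv(A, d), weighted_pinv(Q, d)
    proj_A, proj_Q = A @ pinv_A, Q @ pinv_Q

    # The D-projection depends only on the subspace, not on the chosen basis.
    same_projection = np.allclose(proj_A, proj_Q)
    # Submultiplicativity of the operator norm.
    upper_bound = (np.linalg.norm(proj_A, 2)
                   <= np.linalg.norm(A, 2) * np.linalg.norm(pinv_A, 2) * (1 + 1e-12))
    # Orthonormal columns preserve norms: ||Q M|| = ||M||.
    equal_norms = np.isclose(np.linalg.norm(proj_Q, 2), np.linalg.norm(pinv_Q, 2))
    checks.append((same_projection, upper_bound, equal_norms))

print(all(all(c) for c in checks))
```

Sampling finitely many $D$ only illustrates the pointwise relations used in the proof; it does not compute the suprema that define the two constants.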
In the special case when $m=n$ and $A$ is non-singular it is easy to see that
$$\chi(A) = \|A^{-1}\|.$$
We will rely on the following related characterization of $\chi(A)$ from [6]. The same characterization is also stated and proved in [36] by adapting a technique from [31]. In the statement below, for $A\in\mathbb{R}^{m\times n}$ and $J\subseteq\{1,\dots,m\}$ the matrix $A_J$ denotes the submatrix of $A$ defined by the rows of $A$ indexed by $J$.
Proposition 2.
Suppose $A\in\mathbb{R}^{m\times n}$ has full column-rank. Then
[TABLE]
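For small matrices the subset characterization can be evaluated by brute force. The sketch below assumes the well-known form $\chi(A)=\max\{\|A_J^{-1}\|\}$, where $J$ ranges over index sets of $n$ rows with $A_J$ non-singular, and compares it with lower bounds obtained by sampling weights in $\sup_D\|(A^{\mathsf T}DA)^{-1}A^{\mathsf T}D\|$. The function names, the example matrix, and the sampling scheme are illustrative assumptions.

```python
import itertools
import numpy as np

def weighted_pinv(A, d):
    """Assumed weighted pseudo-inverse (A^T D A)^{-1} A^T D with D = Diag(d), d > 0."""
    AtD = A.T * d
    return np.linalg.solve(AtD @ A, AtD)

def chi_by_subsets(A, tol=1e-12):
    """Max of ||A_J^{-1}|| over n-row subsets J with A_J non-singular; returns (value, J)."""
    m, n = A.shape
    best, best_J = 0.0, None
    for J in itertools.combinations(range(m), n):
        smin = np.linalg.svd(A[list(J), :], compute_uv=False)[-1]
        if smin > tol and 1.0 / smin > best:
            best, best_J = 1.0 / smin, J
    return best, best_J

def chi_sampled(A, trials=2000, seed=0):
    """Lower bound on chi(A) from randomly sampled positive weights."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    return max(np.linalg.norm(weighted_pinv(A, 10.0 ** rng.uniform(-3, 3, size=m)), 2)
               for _ in range(trials))

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0]])
chi_exact, J_star = chi_by_subsets(A)
chi_lower = chi_sampled(A)

# Concentrating the weights on the rows in J_star approaches the maximum.
d_conc = np.full(A.shape[0], 1e-12)
d_conc[list(J_star)] = 1.0
chi_conc = np.linalg.norm(weighted_pinv(A, d_conc), 2)

print(chi_lower <= chi_exact * (1 + 1e-8), np.isclose(chi_conc, chi_exact, rtol=1e-6))
```

The enumeration visits all $\binom{m}{n}$ row subsets, so it is only practical for small $m$ and $n$.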
2.2 Linear inequalities
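Suppose $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$. Consider the feasibility problem
$$\text{find } x\in\mathbb{R}^n \text{ such that } Ax\le b. \tag{16}$$
The solution of (16) is the set
$$P_A(b) := \{x\in\mathbb{R}^n : Ax\le b\}. \tag{17}$$
Observe that $P_A(b)\ne\emptyset$ if and only if $b\in A(\mathbb{R}^n)+\mathbb{R}^m_+$.
2.2.1 Condition measures $H(A)$ and $\bar H(A)$
Suppose $A\in\mathbb{R}^{m\times n}$ is a nonzero matrix. The condition measure $H(A)$ is defined as the following worst-case characteristic of the solution mapping $b\mapsto P_A(b)$ constructed via (17):
$$H(A) := \sup_{b\in A(\mathbb{R}^n)+\mathbb{R}^m_+,\ u\notin P_A(b)} \frac{\mathrm{dist}(u,P_A(b))}{\|(Au-b)_+\|}. \tag{18}$$
Here and throughout the paper, $\mathrm{dist}(u,S)$ denotes the following point-to-set distance for all $u\in\mathbb{R}^n$ and $S\subseteq\mathbb{R}^n$:
$$\mathrm{dist}(u,S) := \inf_{x\in S}\|u-x\|.$$
The constant $H(A)$ can be equivalently defined as the smallest constant depending only on $A$ such that the following error bound holds for all $b\in A(\mathbb{R}^n)+\mathbb{R}^m_+$ and all $u\in\mathbb{R}^n$:
$$\mathrm{dist}(u,P_A(b)) \le H(A)\cdot\|(Au-b)_+\|.$$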
Again, it is not immediately evident that $H(A)$ is finite. This fact was shown by Hoffman in his seminal paper [12]. Other proofs of this fundamental result can be found in [9, 26, 34]. After Hoffman’s initial work, the literature on error bounds has developed extensively [17, 18, 21, 22, 23, 37]. Error bounds play a key role in optimization and variational analysis. In particular, error bounds are widely used to establish the convergence rate of a variety of algorithms [2, 8, 10, 15, 16, 18, 20, 23, 24, 34].
Consider the following reformulation of (16) in the subspace :
[TABLE]
The solution of (19) is the set
[TABLE]
Define as the following worst-case characteristic of the solution mapping :
[TABLE]
The constant can be equivalently defined as the smallest constant depending only on the subspace such that the following error bound holds for all and
[TABLE]
2.2.2 Properties of and
By construction, depends only on and thus is invariant under multiplication by non-singular matrices, i.e.,
[TABLE]
On the other hand, is not invariant under multiplication by non-singular matrices. Proposition 3 shows that is the closure of under this kind of invariance.
Proposition 3.
Suppose is a nonzero matrix. Then and when the nonzero columns of are orthonormal. In particular,
[TABLE]
Proof.
This proof is similar to the proof of Proposition 1. Observe that for all because for all . Hence (18) and (20) imply that
[TABLE]
We next show that when the nonzero columns of are orthonormal. For ease of exposition, consider first the case when all columns of are nonzero and orthonormal. In this case it is easy to see that if and only if for some unique with . Therefore for all . From (18) and (20) it follows that .
Next consider the more general case when some columns of are zero. Without loss of generality assume that for some with nonzero orthonormal columns for some . Since the columns of are orthonormal, we have . To finish, it suffices to show that and . Indeed, holds because and . On the other hand, for let denote the subvector of first entries of . Then for all and thus . Hence
[TABLE]
Finally from (21) and (23) it follows that for all non-singular. Thus (22) follows.
∎
We will also rely on the following two properties of . First, in the special case when or equivalently we have [26, Corollary 1]
[TABLE]
Second, for general we have the following related characterization of discussed in [26] but that can be traced back to [14, 34, 36].
Proposition 4.
Suppose $A\in\mathbb{R}^{m\times n}$ is full column-rank. Then
[TABLE]
Observe both the similarity and subtle difference between the right-most expressions in the characterization (15) of in Proposition 2 and the characterization (25) of in Proposition 4: the first maximum is taken over the same collection of sets in both (15) and (25) whereas the second maximum is taken over in (15) and over in (25).
3 Proof of Theorem 1
We will prove the following stronger version of Theorem 1.
Theorem 2.
Let $A\in\mathbb{R}^{m\times n}$ be a full column-rank matrix. Then
[TABLE]
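where the matrix $\mathbb{A}\in\mathbb{R}^{2m\times n}$ is obtained by stacking $A$ on top of $-A$, that is,
$$\mathbb{A} := \begin{bmatrix} A \\ -A \end{bmatrix}. \tag{27}$$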
Furthermore, if all rows of are nonzero then (26) can be sharpened to
[TABLE]
Proof.
From (15) in Proposition 2 and (25) in Proposition 4 it immediately follows that . Thus the sign invariance of readily yields
[TABLE]
To prove the reverse inequality we rely on (15) and (25) again. Suppose is such that is non-singular and
[TABLE]
Thus for some such that . Choose such that for each and let . Observe that is nonsingular and
[TABLE]
Therefore
[TABLE]
Thus the first identity in (26) is established. Next, Proposition 2 and Proposition 4 imply that for all
[TABLE]
The second inequality follows because all rows of are rows of as well. Hence by taking the maximum over and applying the first identity in (26), we obtain the second identity in (26).
When all rows of are non-zero, it follows that has all nonzero entries for an arbitrarily small perturbation of . Therefore the matrix above can be chosen so that both and Thus the sharper identity (28) follows. ∎
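The identity of Theorem 1 can be sanity-checked in the simplest case of a non-singular $2\times 2$ matrix. The sketch below rests on two assumptions that should be read as such: that the identity reads $\chi(A)=\max_{S} H(SA)$ over signature matrices $S$, and that for a non-singular square matrix the special-case formula (24) takes the form $H(A)=1/\min\{\|A^{\mathsf T}v\| : v\ge 0,\ \|v\|=1\}$, with $\chi(A)=\|A^{-1}\|$. The grid search and the example matrix are illustrative choices.

```python
import itertools
import numpy as np

def hoffman_square_2x2(A, grid=20001):
    """Assumed formula H(A) = 1 / min ||A^T v|| over unit v >= 0, for non-singular 2x2 A."""
    theta = np.linspace(0.0, np.pi / 2, grid)
    V = np.vstack([np.cos(theta), np.sin(theta)])   # columns: unit vectors with v >= 0
    return 1.0 / np.linalg.norm(A.T @ V, axis=0).min()

# Two nearly parallel rows: chi is large, but the unsigned Hoffman constant is not.
A = np.array([[1.0, 0.0], [1.0, 0.1]])

chi = np.linalg.norm(np.linalg.inv(A), 2)           # chi(A) = ||A^{-1}|| when A is square
H_plain = hoffman_square_2x2(A)
H_signed = max(hoffman_square_2x2(np.diag(s) @ A)
               for s in itertools.product([1.0, -1.0], repeat=2))

print(H_plain, H_signed, chi)
```

In this example flipping the sign of one row makes the two rows nearly opposite, so a nonnegative combination of them can almost cancel; this is the mechanism that the maximum over signature matrices captures. The grid search only approximates the inner minimum, so agreement between `H_signed` and `chi` is up to discretization error.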
Corollary 1.
Let $A\in\mathbb{R}^{m\times n}$ be a full column-rank matrix. Then
[TABLE]
where is as in (27). Furthermore, if all rows of are nonzero then
[TABLE]
Proof.
This is an immediate consequence of Theorem 2, Proposition 1 and Proposition 3. ∎
We note that when is full column-rank but some rows of are zero, then the following amended version of (28) holds for the submatrix obtained after deleting the zero rows from :
[TABLE]
The construction of and enables us to rewrite the latter identity as follows
[TABLE]
4 Renegar’s and Grassmannian condition numbers
Suppose is such that . This property can be equivalently stated as , that is, for all the system of linear inequalities
[TABLE]
is feasible. In his seminal paper on condition measures for optimization [27], Renegar defined the distance to infeasibility of as the smallest perturbation that can be made on so that this property is lost. That is
[TABLE]
Renegar also defined as a condition number of .
We have the following characterization of in terms analogous to that in Theorem 2.
Proposition 5.
Let be a full column-rank matrix. If then . Consequently, if all rows of full column-rank matrix are nonzero then
[TABLE]
Proof.
When , the distance to ill-posedness has the following property similar in spirit to Proposition 2 and Proposition 4 (see [28, Theorem 3.5]):
[TABLE]
From (24) and (30) it follows that when . The latter condition and (28) in turn imply (29) if all rows of are nonzero. ∎
Amelunxen and Bürgisser [1] proposed a condition number via the Grassmannian manifold of linear subspaces of $\mathbb{R}^m$ of some fixed dimension. This condition number can be seen as a variant of Renegar’s condition measure that depends only on $A(\mathbb{R}^n)$, akin to the variants $\bar\chi(A)$ and $\bar H(A)$ of $\chi(A)$ and $H(A)$ respectively. We next recall the description of the Grassmannian condition number proposed by Amelunxen and Bürgisser [1]. First, define the Grassmannian distance between two linear subspaces $W_1, W_2\subseteq\mathbb{R}^m$ of the same dimension as
$$d(W_1,W_2) := \|\Pi_{W_1}-\Pi_{W_2}\|,$$
where $\Pi_{W_1}$ and $\Pi_{W_2}$ denote the orthogonal projection matrices onto $W_1$ and $W_2$ respectively.
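The Grassmannian distance is straightforward to compute from basis matrices. The sketch below assumes that the distance is the operator norm of the difference of the two orthogonal projectors, and that each subspace is given as the range of a full column-rank matrix; the function names are illustrative and not taken from [1].

```python
import numpy as np

def orth_projector(B):
    """Orthogonal projector onto the range of a full column-rank matrix B."""
    Q, _ = np.linalg.qr(B)             # reduced QR: columns of Q span range(B)
    return Q @ Q.T

def grassmann_dist(B1, B2):
    """Assumed distance ||P_1 - P_2|| (operator norm) between range(B1) and range(B2)."""
    return np.linalg.norm(orth_projector(B1) - orth_projector(B2), 2)

# Two lines in the plane at angle 0.3: the distance is the sine of the angle.
angle = 0.3
line1 = np.array([[1.0], [0.0]])
line2 = np.array([[np.cos(angle)], [np.sin(angle)]])
d_lines = grassmann_dist(line1, line2)

# The distance depends only on the subspaces: right multiplication by a
# non-singular matrix R does not change range(A).
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2))
R = np.array([[2.0, 1.0], [0.0, 3.0]])
d_same = grassmann_dist(A, A @ R)

print(np.isclose(d_lines, np.sin(angle)), np.isclose(d_same, 0.0))
```

The second check mirrors the invariance under right multiplication by non-singular matrices that the subspace-dependent condition measures share.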
Suppose satisfies . Let and define the Grassmannian condition number of as follows
[TABLE]
Since depends only on , it automatically satisfies the following invariance property just as and do: For all non-singular
[TABLE]
The pair of quantities are related to each other in the same way the pairs of quantities and are. More precisely, we have the following analogue of Proposition 1 and Proposition 3.
Proposition 6.
Suppose is a nonzero matrix and . Then and when the non-zero columns of are orthonormal. Consequently, if is a nonzero matrix
[TABLE]
Proof.
Suppose and . Then the inequality follows from [1, Theorem 1.4] and the identity when the nonzero columns of are orthonormal follows from [1, Theorem 1.3]. The latter two facts and (31) in turn imply (32) when is a nonzero matrix. ∎
We conclude with the following characterization of in terms analogous to that in Corollary 1.
Corollary 2.
Suppose $A\in\mathbb{R}^{m\times n}$ is a full column-rank matrix and all rows of $A$ are nonzero. Then
[TABLE]
Proof.
This is an immediate consequence of Proposition 1, Proposition 5, and Proposition 6. ∎
- [1] D. Amelunxen and P. Bürgisser. A coordinate-free condition number for convex programming. SIAM J. on Optim., 22(3):1029–1041, 2012.
- [2] A. Beck and S. Shtern. Linearly convergent away-step conditional gradient for non-strongly convex functions. Mathematical Programming, 164:1–27, 2017.
- [3] A. Ben-Tal and M. Teboulle. A geometric property of the least squares solution of linear equations. Linear Algebra and its Applications, 139:165–170, 1990.
- [4] E. Bobrovnikova and S. Vavasis. Accurate solution of weighted least squares by iterative methods. SIAM Journal on Matrix Analysis and Applications, 22(4):1153–1174, 2001.
- [5] I. Dikin. On the speed of an iterative process. Upravlyaemye Sistemi, 12(1):54–60, 1974.
- [6] A. Forsgren. On linear least-squares problems with diagonally dominant weight matrices. SIAM Journal on Matrix Analysis and Applications, 17(4):763–788, 1996.
- [7] A. Forsgren and G. Sporre. On weighted linear least-squares problems related to interior methods for convex quadratic programming. SIAM Journal on Matrix Analysis and Applications, 23(1):42–56, 2001.
- [8] D. Garber. Fast rates for online gradient descent without strong convexity via Hoffman’s bound. arXiv preprint arXiv:1802.04623, 2018.
