Equivalence and invariance of the chi and Hoffman constants of a matrix

Javier F. Peña, Juan C. Vera, and Luis F. Zuluaga

arXiv:1905.06366·math.OC·May 19, 2020

TL;DR

This paper proves that the chi constant of a full column rank matrix equals a signed version of its Hoffman constant, and explores invariance properties and equivalences with related condition measures, including subspace-dependent variants.

Contribution

It establishes the identity between the chi constant and a signed Hoffman constant, describes the invariance properties behind it, and relates both constants to Renegar's and Grassmannian condition measures.

Findings

01

The chi constant of a full column rank matrix equals a signed Hoffman constant: the maximum of $H(SA)$ over all signature matrices $S$.

02

The chi constant is invariant under sign changes of the matrix rows, whereas the plain Hoffman constant is not.

03

Extensions to subspace-dependent variants and relations to other condition measures.

Abstract

We show that the following two condition measures of a full column rank matrix $A\in{\mathbb{R}}^{m\times n}$ are identical: the chi constant and a signed Hoffman constant. This identity is naturally suggested by the evident invariance of the chi constant under sign changes of the rows of $A$ . We also show that similar equivalence and invariance properties extend to variants of the chi and Hoffman constants that depend only on the linear subspace $A({\mathbb{R}}^{n}):=\{Ax:x\in{\mathbb{R}}^{n}\}\subseteq{\mathbb{R}}^{m}$ . Finally, we show similar identities between the chi constants and signed versions of Renegar's and Grassmannian condition measures.

Equations

$$b\mapsto\operatorname*{arg\,min}_{x\in{\mathbb{R}}^{n}}\,(Ax-b)^{\text{\sf T}}D(Ax-b)$$

$$b\mapsto\{x\in{\mathbb{R}}^{n}:Ax\leq b\}.$$

$${\mathscr{S}}:=\{\mathrm{Diag}(s):s\in\{-1,1\}^{m}\}.$$

$$\chi(A)=\chi(SA)\;\text{ for all }S\in{\mathscr{S}}.$$

$$\chi(A)=\max_{S\in{\mathscr{S}}}H(SA).$$

$$\overline{\chi}(A)=\overline{\chi}(AR)\;\text{ for all non-singular }R\in{\mathbb{R}}^{n\times n}.$$

$$\overline{\chi}(A)=\min_{\substack{R\in{\mathbb{R}}^{n\times n}\\ \text{non-singular}}}\|AR\|\cdot\chi(AR).$$

$$\overline{H}(A)=\min_{\substack{R\in{\mathbb{R}}^{n\times n}\\ \text{non-singular}}}\|AR\|\cdot H(AR).$$

$$\overline{\chi}(A)=\max_{S\in{\mathscr{S}}}\overline{H}(SA).$$

$$\chi(A)=\max_{\substack{S\in{\mathscr{S}}\\ SA({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset}}H(SA).$$

$$\chi(A)=\max_{\substack{S\in{\mathscr{S}}\\ SA({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset}}\frac{1}{\mathcal{R}(SA)}$$

$$\overline{\chi}(A)=\max_{\substack{S\in{\mathscr{S}}\\ SA({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset}}\mathcal{G}(SA).$$

$$\|A\|=\max_{\substack{x\in{\mathbb{R}}^{d}\\ \|x\|\leq 1}}\|Ax\|$$

$${\mathscr{D}}:=\{\mathrm{Diag}(d):d\in{\mathbb{R}}^{m}_{++}\},$$

$$\min_{x\in{\mathbb{R}}^{n}}\,(Ax-b)^{\text{\sf T}}D(Ax-b).$$

$$A_{D}^{\dagger}=(A^{\text{\sf T}}DA)^{-1}A^{\text{\sf T}}D.$$

$$\chi(A):=\max_{D\in{\mathscr{D}}}\|A_{D}^{\dagger}\|.$$

$$\min_{y\in A({\mathbb{R}}^{n})}\,(y-b)^{\text{\sf T}}D(y-b).$$

$$A(A^{\text{\sf T}}DA)^{-1}A^{\text{\sf T}}D=AA_{D}^{\dagger}.$$

$$\overline{\chi}(A):=\max_{D\in{\mathscr{D}}}\|AA_{D}^{\dagger}\|=\max_{D\in{\mathscr{D}}}\|A(A^{\text{\sf T}}DA)^{-1}A^{\text{\sf T}}D\|.$$

$$\chi(A)=\max_{D\in{\mathscr{D}}}\;\max_{\substack{b\in{\mathbb{R}}^{m},\,x\in{\mathbb{R}}^{n}\\ Ax\neq b}}\frac{\|x-A_{D}^{\dagger}(b)\|}{\|Ax-b\|}.$$

$$\overline{\chi}(A)=\max_{D\in{\mathscr{D}}}\;\max_{\substack{b\in{\mathbb{R}}^{m},\,y\in A({\mathbb{R}}^{n})\\ y\neq b}}\frac{\|y-AA_{D}^{\dagger}(b)\|}{\|y-b\|}.$$

$$\chi(A)=\chi(SA)\;\text{ and }\;\overline{\chi}(A)=\overline{\chi}(SA),\;\text{ for all }S\in{\mathscr{S}}.$$

$$\overline{\chi}(A)=\overline{\chi}(AR),\;\text{ for all non-singular }R\in{\mathbb{R}}^{n\times n}.$$

$$\overline{\chi}(A)=\min_{\substack{R\in{\mathbb{R}}^{n\times n}\\ \text{non-singular}}}\|AR\|\cdot\chi(AR).$$

$$\overline{\chi}(A)\leq\|A\|\cdot\chi(A).$$

$$\chi(A)=\|A^{-1}\|.$$

$$\chi(A)=\max_{\substack{J\subseteq[m]\\ A_{J}\text{ non-singular}}}\|A_{J}^{-1}\|=\max_{\substack{J\subseteq[m]\\ A_{J}\text{ non-singular}}}\;\max_{\substack{v\in{\mathbb{R}}^{J}\\ \|A_{J}^{\text{\sf T}}v\|=1}}\|v\|.$$

$$Ax\leq b.$$

$$P_{A}(b):=\{x\in{\mathbb{R}}^{n}:Ax\leq b\}.$$

Taxonomy

Topics: Matrix Theory and Algorithms · Advanced Optimization Algorithms Research · Sparse and Compressive Sensing Techniques

Full text

Equivalence and invariance of the chi and Hoffman constants of a matrix

Javier F. Peña, Tepper School of Business, Carnegie Mellon University, USA

Juan C. Vera, Department of Econometrics and Operations Research, Tilburg University, The Netherlands

Luis F. Zuluaga, Department of Industrial and Systems Engineering, Lehigh University, USA

Abstract

We show that the following two condition measures of a full column rank matrix $A\in{\mathbb{R}}^{m\times n}$ are identical: the chi constant and a signed Hoffman constant. This identity is naturally suggested by the evident invariance of the chi constant under sign changes of the rows of $A$ . We also show that similar equivalence and invariance properties extend to variants of the chi and Hoffman constants that depend only on the linear subspace $A({\mathbb{R}}^{n}):=\{Ax:x\in{\mathbb{R}}^{n}\}\subseteq{\mathbb{R}}^{m}$ . Finally, we show similar identities between the chi constants and signed versions of Renegar’s and Grassmannian condition measures.

AMS Subject Classification: 65K10, 65F22, 90C25, 90C57

Keywords: Condition measures, invariance, weighted least squares, linear inequalities

1 Introduction

We show a novel equivalence between the following two condition measures of a matrix that play central roles in numerical linear algebra and in convex optimization: the chi measure [3, 5, 6, 29, 30] and the Hoffman constant [12, 9, 14, 34]. We also show some similar equivalences for some variants of these constants.

Let $A\in{\mathbb{R}}^{m\times n}$ be a full column rank matrix. The chi constant $\chi(A)$ and its variant $\overline{\chi}(A)$ arise in the analysis of weighted least squares problems [4, 6, 7, 13]. In particular, $\overline{\chi}(A)$ plays a central role in the analysis of Vavasis and Ye’s interior-point algorithm for linear programming [19, 33]. A remarkable feature of Vavasis and Ye’s algorithm is its sole dependence on the matrix $A$ defining the primal and dual constraints.

The Hoffman constant $H(A)$ is associated with Hoffman’s Lemma [12, 9], a fundamental error bound for systems of linear constraints of the form $Ax\leq b$ . The Hoffman constant and other similar error bounds are used to establish the convergence rate of a wide variety of optimization algorithms [2, 8, 10, 15, 16, 18, 20, 23, 24, 34].

As we discuss in Section 2, the chi constant $\chi(A)$ and its variant $\overline{\chi}(A)$ can be seen as measures of worst behavior of a canonical solution mapping for the following weighted least squares problems

$$b\mapsto\operatorname*{arg\,min}_{x\in{\mathbb{R}}^{n}}\,(Ax-b)^{\text{\sf T}}D(Ax-b)$$

where $D$ is a diagonal matrix with positive diagonal entries.

Similarly, the Hoffman constant $H(A)$ and its variant $\overline{H}(A)$ can be seen as measures of worst behavior of a canonical solution mapping for the following system of linear inequalities

$$b\mapsto\{x\in{\mathbb{R}}^{n}:Ax\leq b\}.$$

It is not immediately obvious that there should be a relationship between the chi and Hoffman constants. Nonetheless, it is known that $H(A)\leq\chi(A)$ and that $\chi(A)$ can be arbitrarily larger [11, 26]. Thus an equivalence between the constants $H(A)$ and $\chi(A)$ appears impossible. The main goal of this paper is to show that this apparent impossibility can be attributed to and rectified via a canonical sign invariance property of $\chi(A)$ detailed in equation (2) below. Namely, the constant $\chi(A)$ does not change when the signs of some of the rows of $A$ are flipped as the solution mapping (7) satisfies this sign invariance property. On the other hand, the constant $H(A)$ does not satisfy this sign invariance property and thus $H(A)$ and $\chi(A)$ cannot be identical. Our main result (Theorem 1) shows that $\chi(A)$ and $H(A)$ become identical after properly tweaking $H(A)$ to ensure the sign invariance property.

A similar type of invariance consideration yields identities between the variants $\overline{\chi}(A)$ and $\overline{H}(A)$ . Our developments can be further extended to obtain analogous identities between the four measures $\chi(A),\overline{\chi}(A),H(A),\overline{H}(A)$ and the following two popular condition measures for systems of linear inequalities: Renegar’s distance to ill-posedness $\mathcal{R}(A)$ [27] and the Grassmannian condition measure $\mathcal{G}(A)$ [1].

The above developments are similar in spirit to results previously derived by Tunçel [32], by Todd, Tunçel, and Ye [31], and by Ho and Tunçel [11]. These articles compare various condition measures for linear programming including the chi and Hoffman constants. However, there are two major differences between our developments and theirs. First, most of the results in [32, 11, 31] provide only inequalities and hence are weaker than our identities concerning the chi and Hoffman constants. Second, the articles [32, 11, 31] do not deal with Renegar’s and Grassmannian condition measures but instead relate the chi and Hoffman constants with Ye’s condition measure [35] for polyhedra of the form $\{A^{\text{\sf T}}y:y\geq 0,\|y\|_{1}=1\}$ . Hence we deliberately chose not to discuss Ye’s condition measure in this paper. However, we note that our results can be extended to identities involving Ye’s condition measure by drawing on the recent work by Peña and Roshchina [25].

To formally state the sign invariance property, we rely on the following convenient notation. Let ${\mathscr{S}}\subseteq{\mathbb{R}}^{m\times m}$ denote the set of signature matrices defined as follows

$${\mathscr{S}}:=\{\mathrm{Diag}(s):s\in\{-1,1\}^{m}\}.$$

The constant $\chi(A)$ satisfies the following sign invariance property:

$$\chi(A)=\chi(SA)\;\text{ for all }S\in{\mathscr{S}}.\tag{2}$$

Our main result states that $\chi(A)$ and $H(A)$ become identical if we take a suitable closure of $H(A)$ to ensure the sign invariance property.

Theorem 1.

Let $A\in{\mathbb{R}}^{m\times n}$ be a full column-rank matrix. Then

$$\chi(A)=\max_{S\in{\mathscr{S}}}H(SA).\tag{3}$$

A similar type of invariance property relates the measures $\chi(A)$ and $\overline{\chi}(A)$ . The construction of $\overline{\chi}(A)$ depends only on the subspace $A({\mathbb{R}}^{n})$ . Thus $\overline{\chi}(A)$ readily satisfies the following invariance under right multiplication by non-singular matrices

$$\overline{\chi}(A)=\overline{\chi}(AR)\;\text{ for all non-singular }R\in{\mathbb{R}}^{n\times n}.$$

In analogy to Theorem 1, the measures $\chi(A)$ and $\overline{\chi}(A)$ become identical if we take a suitable closure of $\chi(A)$ to ensure the same invariance under right multiplication by non-singular matrices (see Proposition 1):

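$$\overline{\chi}(A)=\min_{\substack{R\in{\mathbb{R}}^{n\times n}\\ \text{non-singular}}}\|AR\|\cdot\chi(AR).$$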

Furthermore, the same kind of identity holds for the measures $H(A)$ and $\overline{H}(A)$ (see Proposition 3):

$$\overline{H}(A)=\min_{\substack{R\in{\mathbb{R}}^{n\times n}\\ \text{non-singular}}}\|AR\|\cdot H(AR).$$

In particular, identity (3) in Theorem 1 readily extends to the measures $\overline{\chi}(A)$ and $\overline{H}(A)$ as follows (see Corollary 1):

$$\overline{\chi}(A)=\max_{S\in{\mathscr{S}}}\overline{H}(SA).\tag{4}$$

Our proof of Theorem 1 will actually show the following stronger identity when all rows of $A$ are non-zero (see Theorem 2):

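$$\chi(A)=\max_{\substack{S\in{\mathscr{S}}\\ SA({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset}}H(SA).$$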

This stronger identity in turn yields some interesting connections with Renegar’s distance to ill-posedness $\mathcal{R}(A)$ [27, 28] and the Grassmannian condition number $\mathcal{G}(A)$ [1]. More precisely, in Section 4 we show the following identity analogous to (3) (see Proposition 5):

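$$\chi(A)=\max_{\substack{S\in{\mathscr{S}}\\ SA({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset}}\frac{1}{\mathcal{R}(SA)}\tag{5}$$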

and the following identity analogous to (4) (see Corollary 2):

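$$\overline{\chi}(A)=\max_{\substack{S\in{\mathscr{S}}\\ SA({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset}}\mathcal{G}(SA).\tag{6}$$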

The main sections of the paper are organized as follows. Section 2 recalls the construction of the chi constants $\chi(A),\overline{\chi}(A)$ as well as the Hoffman constants $H(A),\overline{H}(A)$ and some of their main properties. Our presentation deliberately follows separate but similar formats for $\chi(A),\overline{\chi}(A)$ and for $H(A),\overline{H}(A)$ . Section 3 presents the proof of Theorem 1. To do so, we state and prove the stronger Theorem 2. Finally, Section 4 recalls the construction of Renegar’s condition measure $\mathcal{R}(A)$ and of the Grassmannian condition measure $\mathcal{G}(A)$ . This section also proves identities (5) and (6).

Throughout the paper, whenever we encounter a Euclidean space ${\mathbb{R}}^{d}$ we implicitly assume that it is endowed with the Euclidean norm defined by the canonical inner product in ${\mathbb{R}}^{d}$ , that is, $\|u\|:=\sqrt{u^{\text{\sf T}}u}$ for all $u\in{\mathbb{R}}^{d}$ . Likewise, whenever we encounter a space of matrices ${\mathbb{R}}^{p\times d}$ we implicitly assume that it is endowed with the operator norm, that is,

$$\|A\|=\max_{\substack{x\in{\mathbb{R}}^{d}\\ \|x\|\leq 1}}\|Ax\|$$

for all $A\in{\mathbb{R}}^{p\times d}$ .

2 Definition and properties of the chi and Hoffman constants

This section recalls the construction and main properties of the constants $\chi(A),\overline{\chi}(A)$ and $H(A),\overline{H}(A)$ . These constants can be seen as condition measures for two fundamental problems in scientific computing, namely weighted least squares and linear inequalities.

2.1 Weighted least squares

Let ${\mathscr{D}}\subseteq{\mathbb{R}}^{m\times m}$ denote the set of diagonal matrices in ${\mathbb{R}}^{m\times m}$ with positive diagonal entries. That is,

$${\mathscr{D}}:=\{\mathrm{Diag}(d):d\in{\mathbb{R}}^{m}_{++}\},$$

where ${\mathbb{R}}^{m}_{++}\subseteq{\mathbb{R}}^{m}$ denotes the set of vectors in ${\mathbb{R}}^{m}$ with positive entries.

Suppose $A\in{\mathbb{R}}^{m\times n}$ . Given $D\in{\mathscr{D}}$ , consider the weighted least squares problem

$$\min_{x\in{\mathbb{R}}^{n}}\,(Ax-b)^{\text{\sf T}}D(Ax-b).\tag{7}$$

When $A$ is full column-rank, it is easy to see that the solution to (7) is precisely $A_{D}^{\dagger}b$ where $A_{D}^{\dagger}$ is the following weighted pseudo-inverse of $A$ [6, 29]:

$$A_{D}^{\dagger}=(A^{\text{\sf T}}DA)^{-1}A^{\text{\sf T}}D.\tag{8}$$
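As an illustrative check that is not part of the paper, the following sketch builds the weighted pseudo-inverse $(A^{\text{\sf T}}DA)^{-1}A^{\text{\sf T}}D$ for a small $3\times 2$ matrix in plain Python and confirms that $x=A_{D}^{\dagger}b$ satisfies the normal equations $A^{\text{\sf T}}D(Ax-b)=0$ of the weighted least squares problem. The matrix, the weights, and the helper names `weighted_pinv` and `matvec` are assumptions chosen only for illustration.

```python
def weighted_pinv(A, d):
    """A_D^dagger = (A^T D A)^{-1} A^T D for an m x 2 matrix A and D = Diag(d), d > 0."""
    m = len(A)
    # Entries of the symmetric 2 x 2 Gram matrix G = A^T D A.
    g11 = sum(d[i] * A[i][0] ** 2 for i in range(m))
    g12 = sum(d[i] * A[i][0] * A[i][1] for i in range(m))
    g22 = sum(d[i] * A[i][1] ** 2 for i in range(m))
    det = g11 * g22 - g12 ** 2  # nonzero when A has full column rank
    # Rows of G^{-1} A^T D, using the closed-form inverse of a 2 x 2 matrix.
    row0 = [d[i] * (g22 * A[i][0] - g12 * A[i][1]) / det for i in range(m)]
    row1 = [d[i] * (g11 * A[i][1] - g12 * A[i][0]) / det for i in range(m)]
    return [row0, row1]


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


A = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]  # hypothetical full column rank example
d = [1.0, 2.0, 3.0]                       # positive weights, D = Diag(d)
b = [1.0, -1.0, 2.0]

x = matvec(weighted_pinv(A, d), b)
residual = [ri - bi for ri, bi in zip(matvec(A, x), b)]
# Normal equations: each component of A^T D (A x - b) should vanish.
normal_eq = [sum(A[i][k] * d[i] * residual[i] for i in range(3)) for k in range(2)]
```

The helper is restricted to two columns so that the inverse of $A^{\text{\sf T}}DA$ has a closed form; a general implementation would solve the linear system instead.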

2.1.1 Condition measures $\chi(A)$ and $\overline{\chi}(A)$

Suppose $A\in{\mathbb{R}}^{m\times n}$ is full column-rank. The condition measure $\chi(A)$ is defined as the following worst-case characteristic of the family of solution mappings $A^{\dagger}_{D}:{\mathbb{R}}^{m}\rightarrow{\mathbb{R}}^{n}$ constructed via (8):

$$\chi(A):=\max_{D\in{\mathscr{D}}}\|A_{D}^{\dagger}\|.\tag{9}$$

Consider the following alternative reformulation of the weighted least-squares problem (7) in the subspace $A({\mathbb{R}}^{n})$ :

$$\min_{y\in A({\mathbb{R}}^{n})}\,(y-b)^{\text{\sf T}}D(y-b).\tag{10}$$

The solution to (10) is evidently the $D$ -projection of $b$ onto $A({\mathbb{R}}^{n})$ . Once again, it is easy to see that if $A$ is full column-rank then the $D$ -projection onto $A({\mathbb{R}}^{n})$ is

$$A(A^{\text{\sf T}}DA)^{-1}A^{\text{\sf T}}D=AA_{D}^{\dagger}.$$

The condition measure $\overline{\chi}(A)$ is defined as the following worst-case characteristic of the family of solution mappings $AA_{D}^{\dagger}:{\mathbb{R}}^{m}\rightarrow A({\mathbb{R}}^{n})$ :

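$$\overline{\chi}(A):=\max_{D\in{\mathscr{D}}}\|AA_{D}^{\dagger}\|=\max_{D\in{\mathscr{D}}}\|A(A^{\text{\sf T}}DA)^{-1}A^{\text{\sf T}}D\|.\tag{11}$$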

Although it is not immediately evident, the constants $\chi(A)$ and $\overline{\chi}(A)$ are finite for any full column-rank matrix $A\in{\mathbb{R}}^{m\times n}$ . This fact was independently shown by Ben-Tal and Teboulle [3], Dikin [5], Stewart [29], and Todd [30]. The constants $\chi(A)$ and $\overline{\chi}(A)$ play a key role in weighted least-squares problems [6, 7, 4] and in linear programming [11, 31, 32, 33].

We record some alternative expressions for $\chi(A)$ and $\overline{\chi}(A)$ that are closely related to the constructions of $H(A)$ and $\overline{H}(A)$ discussed below. First, observe that

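$$\chi(A)=\max_{D\in{\mathscr{D}}}\;\max_{\substack{b\in{\mathbb{R}}^{m},\,x\in{\mathbb{R}}^{n}\\ Ax\neq b}}\frac{\|x-A_{D}^{\dagger}(b)\|}{\|Ax-b\|}.$$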

Second, observe that

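$$\overline{\chi}(A)=\max_{D\in{\mathscr{D}}}\;\max_{\substack{b\in{\mathbb{R}}^{m},\,y\in A({\mathbb{R}}^{n})\\ y\neq b}}\frac{\|y-AA_{D}^{\dagger}(b)\|}{\|y-b\|}.$$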

2.1.2 Properties of $\chi(A)$ and $\overline{\chi}(A)$

Suppose $A\in{\mathbb{R}}^{m\times n}$ is full column-rank and $D\in{\mathscr{D}}$ . By construction, the solution mappings $A_{D}^{\dagger}$ and $AA_{D}^{\dagger}$ satisfy the following property: if $S\in{\mathscr{S}}$ then $(SA)_{D}^{\dagger}=A_{D}^{\dagger}S$ . In particular $\|(SA)_{D}^{\dagger}\|=\|A_{D}^{\dagger}\|$ and $\|(SA)(SA)_{D}^{\dagger}\|=\|AA_{D}^{\dagger}\|$ . Therefore (9) and (11) imply that the constants $\chi(A)$ and $\overline{\chi}(A)$ satisfy the following sign invariance property:

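$$\chi(A)=\chi(SA)\;\text{ and }\;\overline{\chi}(A)=\overline{\chi}(SA),\;\text{ for all }S\in{\mathscr{S}}.$$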
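The relation $(SA)_{D}^{\dagger}=A_{D}^{\dagger}S$ behind this sign invariance can be checked numerically. The sketch below is an illustration under assumed data (a hypothetical $3\times 2$ matrix, weights, and sign pattern), with a helper `weighted_pinv` that is our own and limited to two columns. Since $S$ is orthogonal, equality of the two matrices up to the column signs also gives equality of their operator norms.

```python
def weighted_pinv(A, d):
    """A_D^dagger = (A^T D A)^{-1} A^T D for an m x 2 matrix A and D = Diag(d), d > 0."""
    m = len(A)
    g11 = sum(d[i] * A[i][0] ** 2 for i in range(m))
    g12 = sum(d[i] * A[i][0] * A[i][1] for i in range(m))
    g22 = sum(d[i] * A[i][1] ** 2 for i in range(m))
    det = g11 * g22 - g12 ** 2
    row0 = [d[i] * (g22 * A[i][0] - g12 * A[i][1]) / det for i in range(m)]
    row1 = [d[i] * (g11 * A[i][1] - g12 * A[i][0]) / det for i in range(m)]
    return [row0, row1]


A = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]  # hypothetical example matrix
d = [1.0, 2.0, 3.0]                       # D = Diag(d)
s = [1.0, -1.0, 1.0]                      # diagonal of a signature matrix S

# Flip the sign of the rows of A selected by s.
SA = [[s[i] * entry for entry in A[i]] for i in range(3)]

P = weighted_pinv(A, d)
P_signed = weighted_pinv(SA, d)

# (SA)_D^dagger should equal A_D^dagger S: column i of A_D^dagger scaled by s_i.
gap = max(abs(P_signed[k][i] - P[k][i] * s[i]) for k in range(2) for i in range(3))
```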

Furthermore, the quantity $\overline{\chi}(A)$ depends only on the subspace $A({\mathbb{R}}^{n})$ which evidently satisfies $A({\mathbb{R}}^{n})=AR({\mathbb{R}}^{n})$ for all non-singular $R\in{\mathbb{R}}^{n\times n}$ . Therefore, the constant $\overline{\chi}(A)$ is invariant under multiplication by non-singular matrices, that is,

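$$\overline{\chi}(A)=\overline{\chi}(AR),\;\text{ for all non-singular }R\in{\mathbb{R}}^{n\times n}.\tag{12}$$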

The constant $\chi(A)$ is not invariant under multiplication by non-singular matrices. Proposition 1 shows that $\overline{\chi}(A)$ is the closure of $\chi(A)$ under this kind of invariance.

Proposition 1.

Suppose $A\in{\mathbb{R}}^{m\times n}$ is full column-rank. Then $\overline{\chi}(A)\leq\|A\|\cdot\chi(A)$ and $\overline{\chi}(A)=\chi(A)$ when the columns of $A$ are orthonormal. In particular,

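$$\overline{\chi}(A)=\min_{\substack{R\in{\mathbb{R}}^{n\times n}\\ \text{non-singular}}}\|AR\|\cdot\chi(AR).\tag{13}$$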

Proof.

Since $\|AA_{D}^{\dagger}\|\leq\|A\|\cdot\|A_{D}^{\dagger}\|$ , the construction (9) and (11) of $\chi(A)$ and $\overline{\chi}(A)$ readily implies that

$$\overline{\chi}(A)\leq\|A\|\cdot\chi(A).\tag{14}$$

Next, we show that $\overline{\chi}(A)=\chi(A)$ when the columns of $A$ are orthonormal. To that end, observe that if the columns of $A$ are orthonormal then $\|Ax\|=\|x\|$ for all $x\in{\mathbb{R}}^{n}$ . In particular, if the columns of $A$ are orthonormal then $\|AA_{D}^{\dagger}\|=\|A_{D}^{\dagger}\|$ for all $D\in{\mathscr{D}}$ . Thus (9) and (11) imply that $\overline{\chi}(A)=\chi(A)$ .

Finally, from (12) and (14) it follows that $\overline{\chi}(A)=\overline{\chi}(AR)\leq\|AR\|\cdot\chi(AR)$ for all $R\in{\mathbb{R}}^{n\times n}$ non-singular. Thus (13) follows.

∎

In the special case when $m=n$ and $A\in{\mathbb{R}}^{n\times n}$ is non-singular it is easy to see that

$$\chi(A)=\|A^{-1}\|.$$
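One way to see the square case: when $A$ is non-singular, $(A^{\text{\sf T}}DA)^{-1}A^{\text{\sf T}}D=A^{-1}D^{-1}A^{-\text{\sf T}}A^{\text{\sf T}}D=A^{-1}$ for every $D\in{\mathscr{D}}$ , so the maximum over $D$ in the definition of $\chi(A)$ is taken over a constant and equals $\|A^{-1}\|$ . The sketch below checks this for a hypothetical $2\times 2$ matrix and a few assumed weight vectors; the helper `weighted_pinv` is ours and not taken from the paper.

```python
def weighted_pinv(A, d):
    """A_D^dagger = (A^T D A)^{-1} A^T D for an m x 2 matrix A and D = Diag(d), d > 0."""
    m = len(A)
    g11 = sum(d[i] * A[i][0] ** 2 for i in range(m))
    g12 = sum(d[i] * A[i][0] * A[i][1] for i in range(m))
    g22 = sum(d[i] * A[i][1] ** 2 for i in range(m))
    det = g11 * g22 - g12 ** 2
    row0 = [d[i] * (g22 * A[i][0] - g12 * A[i][1]) / det for i in range(m)]
    row1 = [d[i] * (g11 * A[i][1] - g12 * A[i][0]) / det for i in range(m)]
    return [row0, row1]


A = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical non-singular 2 x 2 matrix
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det_A, -A[0][1] / det_A],
         [-A[1][0] / det_A, A[0][0] / det_A]]

# For each weight vector, compare A_D^dagger entrywise with the ordinary inverse.
max_gap = 0.0
for d in ([1.0, 1.0], [0.5, 4.0], [3.0, 0.25]):
    P = weighted_pinv(A, d)
    for k in range(2):
        for i in range(2):
            max_gap = max(max_gap, abs(P[k][i] - A_inv[k][i]))
```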

We will rely on the following related characterization of $\chi(A)$ from [6]. The same characterization is also stated and proved in [36] by adapting a technique from [31]. In the statement below for $A\in{\mathbb{R}}^{m\times n}$ and $J\subseteq[m]:=\{1,\dots,m\}$ the matrix $A_{J}\in{\mathbb{R}}^{J\times n}$ denotes the $|J|\times n$ submatrix of $A$ defined by the rows of $A$ indexed by $J$ .

Proposition 2.

Suppose $A\in{\mathbb{R}}^{m\times n}$ has full column-rank. Then

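$$\chi(A)=\max_{\substack{J\subseteq[m]\\ A_{J}\text{ non-singular}}}\|A_{J}^{-1}\|=\max_{\substack{J\subseteq[m]\\ A_{J}\text{ non-singular}}}\;\max_{\substack{v\in{\mathbb{R}}^{J}\\ \|A_{J}^{\text{\sf T}}v\|=1}}\|v\|.\tag{15}$$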
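The characterization of $\chi(A)$ as the largest $\|A_{J}^{-1}\|$ over non-singular $n\times n$ row submatrices $A_{J}$ can be explored numerically for $n=2$ . The sketch below is an illustration only: it enumerates the $2\times 2$ row blocks of a hypothetical $3\times 2$ matrix, then compares the result with $\|A_{D}^{\dagger}\|$ for randomly sampled weights and for weights concentrated on the block with the largest inverse. The example matrix, the sampling range, and all helper names are assumptions.

```python
import itertools
import math
import random


def norm_two_rows(P):
    """Operator norm of a 2 x k matrix: sqrt of the largest eigenvalue of P P^T."""
    p = sum(x * x for x in P[0])
    r = sum(y * y for y in P[1])
    q = sum(x * y for x, y in zip(P[0], P[1]))
    return math.sqrt((p + r) / 2 + math.hypot((p - r) / 2, q))


def weighted_pinv(A, d):
    """A_D^dagger = (A^T D A)^{-1} A^T D for an m x 2 matrix A and D = Diag(d), d > 0."""
    m = len(A)
    g11 = sum(d[i] * A[i][0] ** 2 for i in range(m))
    g12 = sum(d[i] * A[i][0] * A[i][1] for i in range(m))
    g22 = sum(d[i] * A[i][1] ** 2 for i in range(m))
    det = g11 * g22 - g12 ** 2
    row0 = [d[i] * (g22 * A[i][0] - g12 * A[i][1]) / det for i in range(m)]
    row1 = [d[i] * (g11 * A[i][1] - g12 * A[i][0]) / det for i in range(m)]
    return [row0, row1]


def chi_from_bases(A):
    """Largest ||A_J^{-1}|| over the non-singular 2 x 2 row blocks A_J of A."""
    best = 0.0
    for i, j in itertools.combinations(range(len(A)), 2):
        (a, b), (c, e) = A[i], A[j]
        det = a * e - b * c
        if abs(det) > 1e-12:
            inverse = [[e / det, -b / det], [-c / det, a / det]]
            best = max(best, norm_two_rows(inverse))
    return best


A = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]  # hypothetical example; rows 0 and 1 are nearly parallel
chi = chi_from_bases(A)

random.seed(0)
# Largest ||A_D^dagger|| over randomly sampled positive weights.
sampled = max(
    norm_two_rows(weighted_pinv(A, [10 ** random.uniform(-2, 2) for _ in A]))
    for _ in range(2000)
)
# Putting almost all weight on rows {0, 1} approaches the maximizing block.
near_max = norm_two_rows(weighted_pinv(A, [1.0, 1.0, 1e-8]))
```

In this example the sampled norms stay below the value computed from the row blocks, and the concentrated weights come close to it, which is consistent with the maximum over $D$ being approached as the weights degenerate.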

2.2 Linear inequalities

Suppose $A\in{\mathbb{R}}^{m\times n}$ . Consider the feasibility problem

$$Ax\leq b.\tag{16}$$

The solution of (16) is the set

$$P_{A}(b):=\{x\in{\mathbb{R}}^{n}:Ax\leq b\}.\tag{17}$$

Observe that $P_{A}(b)\neq\emptyset$ if and only if $b\in A({\mathbb{R}}^{n})+{\mathbb{R}}^{m}_{+}$ .

2.2.1 Condition measures $H(A)$ and $\overline{H}(A)$

Suppose $A\in{\mathbb{R}}^{m\times n}$ is a nonzero matrix. The condition measure $H(A)$ is defined as the following worst-case characteristic of the solution mapping $P_{A}:{\mathbb{R}}^{m}\rightrightarrows{\mathbb{R}}^{n}$ constructed via (17):

[TABLE]

Here and throughout the paper, ${\mathrm{dist}}(u,S)$ denotes the following point-to-set distance for all $u\in{\mathbb{R}}^{d}$ and $S\subseteq{\mathbb{R}}^{d}$ :

[TABLE]

The constant $H(A)$ can be equivalently defined as the smallest constant depending only on $A$ such that the following error bound holds for all $b\in A({\mathbb{R}}^{n})+{\mathbb{R}}^{m}_{+}$ and all $x\in{\mathbb{R}}^{n}$ :

[TABLE]

Again, it is not immediately evident that $H(A)$ is finite. This fact was shown by Hoffman in his seminal paper [12]. Other proofs of this fundamental result can be found in [9, 26, 34]. After Hoffman’s initial work, the literature on error bounds has developed extensively [17, 18, 21, 22, 23, 37]. Error bounds play a key role in optimization and variational analysis. In particular, error bounds are widely used to establish the convergence rate of a variety of algorithms [2, 8, 10, 15, 16, 18, 20, 23, 24, 34].
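To make the error bound concrete, the sketch below assumes it takes the standard form of Hoffman's lemma, ${\mathrm{dist}}(x,P_{A}(b))\leq H\cdot\|(Ax-b)_{+}\|$ , where $(\cdot)_{+}$ denotes the componentwise positive part. It treats the simplest case $n=1$ , in which $P_{A}(b)$ is an interval and an elementary argument shows that the bound holds with $H=1/\min_{a_{i}\neq 0}|a_{i}|$ . This closed form, the example column, and the sampling scheme are our own illustration and are not quoted from the paper.

```python
import math
import random

a = [0.5, -3.0, 2.0]  # hypothetical single column, A = a in R^{3 x 1}
H = 1.0 / min(abs(ai) for ai in a if ai != 0.0)


def dist_to_solution_set(x, b):
    """Distance from x to the interval P_A(b) = {t : a_i * t <= b_i for all i}."""
    hi = min(bi / ai for ai, bi in zip(a, b) if ai > 0)
    lo = max(bi / ai for ai, bi in zip(a, b) if ai < 0)
    return max(lo - x, 0.0, x - hi)


def residual(x, b):
    """Euclidean norm of the positive part of a * x - b."""
    return math.sqrt(sum(max(ai * x - bi, 0.0) ** 2 for ai, bi in zip(a, b)))


random.seed(1)
worst_ratio = 0.0
for _ in range(5000):
    x0 = random.uniform(-5, 5)
    # b = a * x0 + slack with slack >= 0, so P_A(b) contains x0 and is nonempty.
    b = [ai * x0 + random.uniform(0, 2) for ai in a]
    x = random.uniform(-10, 10)
    res = residual(x, b)
    if res > 1e-6:  # skip (nearly) feasible points to avoid dividing by ~0
        worst_ratio = max(worst_ratio, dist_to_solution_set(x, b) / res)
```

Here `worst_ratio` is only a sampled lower estimate of the smallest valid constant; it never exceeds `H` in this example.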

Consider the following reformulation of (16) in the subspace $A({\mathbb{R}}^{n})$ :

[TABLE]

The solution of (19) is the set

[TABLE]

Define $\overline{H}(A)$ as the following worst-case characteristic of the solution mapping $AP_{A}:{\mathbb{R}}^{m}\rightrightarrows A({\mathbb{R}}^{n})$ :

[TABLE]

The constant $\overline{H}(A)$ can be equivalently defined as the smallest constant depending only on the subspace $A({\mathbb{R}}^{n})$ such that the following error bound holds for all $b\in A({\mathbb{R}}^{n})+{\mathbb{R}}^{m}_{+}$ and $v\in A({\mathbb{R}}^{n})+b$

[TABLE]

2.2.2 Properties of $H(A)$ and $\overline{H}(A)$

By construction, $\overline{H}(A)$ depends only on $A({\mathbb{R}}^{n})$ and thus is invariant under multiplication by non-singular matrices, i.e.,

$$\overline{H}(A)=\overline{H}(AR),\;\text{ for all non-singular }R\in{\mathbb{R}}^{n\times n}.\tag{21}$$

On the other hand, $H(A)$ is not invariant under multiplication by non-singular matrices. Proposition 3 shows that $\overline{H}(A)$ is the closure of $H(A)$ under this kind of invariance.

Proposition 3.

Suppose $A\in{\mathbb{R}}^{m\times n}$ is a nonzero matrix. Then $\overline{H}(A)\leq\|A\|\cdot H(A)$ and $\overline{H}(A)=H(A)$ when the nonzero columns of $A$ are orthonormal. In particular,

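$$\overline{H}(A)=\min_{\substack{R\in{\mathbb{R}}^{n\times n}\\ \text{non-singular}}}\|AR\|\cdot H(AR).\tag{22}$$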

Proof.

This proof is similar to the proof of Proposition 1. Observe that ${\mathrm{dist}}(Ax,AP_{A}(b))\leq\|A\|\cdot{\mathrm{dist}}(x,P_{A}(b))$ for all $x\in{\mathbb{R}}^{n}$ because $\|Ax-Au\|\leq\|A\|\cdot\|x-u\|$ for all $x,u\in{\mathbb{R}}^{n}$ . Hence (18) and (20) imply that

$$\overline{H}(A)\leq\|A\|\cdot H(A).\tag{23}$$

We next show that $\overline{H}(A)=H(A)$ when the nonzero columns of $A$ are orthonormal. For ease of exposition, consider first the case when all columns of $A$ are nonzero and orthonormal. In this case it is easy to see that $y\in A({\mathbb{R}}^{n})$ if and only if $y=Ax$ for some unique $x\in{\mathbb{R}}^{n}$ with $\|y\|=\|x\|$ . Therefore ${\mathrm{dist}}(y,AP_{A}(b))={\mathrm{dist}}(x,P_{A}(b))$ for all $y=Ax\in A({\mathbb{R}}^{n})$ . From (18) and (20) it follows that $\overline{H}(A)=H(A)$ .

Next consider the more general case when some columns of $A$ are zero. Without loss of generality assume that $A=\begin{bmatrix}\tilde{A}&0\end{bmatrix}$ for some $\tilde{A}\in{\mathbb{R}}^{m\times k}$ with nonzero orthonormal columns for some $k<n$ . Since the columns of $\tilde{A}$ are orthonormal, we have $\overline{H}(\tilde{A})=H(\tilde{A})$ . To finish, it suffices to show that $\overline{H}(A)=\overline{H}(\tilde{A})$ and $H(A)=H(\tilde{A})$ . Indeed, $\overline{H}(A)=\overline{H}(\tilde{A})$ holds because $A({\mathbb{R}}^{n})=\tilde{A}({\mathbb{R}}^{k})$ and $AP_{A}(b)=\tilde{A}P_{\tilde{A}}(b)$ . On the other hand, for $x\in{\mathbb{R}}^{n}$ let $\tilde{x}\in{\mathbb{R}}^{k}$ denote the subvector of first $k$ entries of $x$ . Then $Ax=\tilde{A}\tilde{x}$ for all $x\in{\mathbb{R}}^{n}$ and thus $P_{A}(b)=P_{\tilde{A}}(b)\times{\mathbb{R}}^{n-k}$ . Hence

[TABLE]

Finally, from (21) and (23) it follows that $\overline{H}(A)=\overline{H}(AR)\leq\|AR\|\cdot H(AR)$ for all $R\in{\mathbb{R}}^{n\times n}$ non-singular. Thus (22) follows.

∎

We will also rely on the following two properties of $H(A)$ . First, in the special case when $A({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset$ or equivalently $A({\mathbb{R}}^{n})+{\mathbb{R}}^{m}_{+}={\mathbb{R}}^{m}$ we have [26, Corollary 1]

[TABLE]

Second, for general $A\in{\mathbb{R}}^{m\times n}$ we have the following related characterization of $H(A)$ discussed in [26] but that can be traced back to [14, 34, 36].

Proposition 4.

Suppose $A\in{\mathbb{R}}^{m\times n}$ is full column-rank. Then

[TABLE]

Observe both the similarity and subtle difference between the right-most expressions in the characterization (15) of $\chi(A)$ in Proposition 2 and the characterization (25) of $H(A)$ in Proposition 4: the first maximum is taken over the same collection of sets $J$ in both (15) and (25) whereas the second maximum is taken over $v\in{\mathbb{R}}^{J}$ in (15) and over $v\in{\mathbb{R}}^{J}_{+}$ in (25).

3 Proof of Theorem 1

We will prove the following stronger version of Theorem 1.

Theorem 2.

Let $A\in{\mathbb{R}}^{m\times n}$ be a full column-rank matrix. Then

$$\chi(A)=\max_{S\in{\mathscr{S}}}H(SA)=H(\mathbf{A}),\tag{26}$$

where $\mathbf{A}\in{\mathbb{R}}^{2m\times n}$ is the column-wise concatenation of $A$ and $-A$ , that is,

$$\mathbf{A}:=\begin{bmatrix}A\\ -A\end{bmatrix}.\tag{27}$$

Furthermore, if all rows of $A$ are nonzero then (26) can be sharpened to

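$$\chi(A)=\max_{\substack{S\in{\mathscr{S}}\\ SA({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset}}H(SA).\tag{28}$$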

Proof.

From (15) in Proposition 2 and (25) in Proposition 4 it immediately follows that $H(A)\leq\chi(A)$ . Thus the sign invariance of $\chi(A)$ readily yields

[TABLE]

To prove the reverse inequality we rely on (15) and (25) again. Suppose $\hat{J}\subseteq[m]$ is such that $A_{\hat{J}}$ is non-singular and

[TABLE]

Thus $\chi(A)=\|\hat{v}\|$ for some $\hat{v}\in{\mathbb{R}}^{\hat{J}}$ such that $\|A_{\hat{J}}^{\text{\sf T}}\hat{v}\|=1$ . Choose $\hat{S}\in{\mathscr{S}}$ such that $\hat{S}_{ii}=\text{sign}(\hat{v}_{i})$ for each $i\in\hat{J}$ and let $u:=\hat{S}_{\hat{J}}\hat{v}\in{\mathbb{R}}^{\hat{J}}_{+}$ . Observe that $(\hat{S}A)_{\hat{J}}=\hat{S}_{\hat{J}}A_{\hat{J}}$ is nonsingular and

[TABLE]

Therefore

[TABLE]

Thus the first identity in (26) is established. Next, Proposition 2 and Proposition 4 imply that for all $S\in{\mathscr{S}}$

[TABLE]

The second inequality follows because all rows of $SA$ are rows of $\mathbf{A}$ as well. Hence by taking the maximum over $S\in{\mathscr{S}}$ and applying the first identity in (26), we obtain the second identity in (26).

When all rows of $A$ are non-zero, it follows that $A\tilde{v}$ has all nonzero entries for an arbitrarily small perturbation $\tilde{v}$ of $\hat{v}$ . Therefore the matrix $\hat{S}\in{\mathscr{S}}$ above can be chosen so that both $\hat{S}_{\hat{J}}\hat{v}\in{\mathbb{R}}^{\hat{J}}_{+}$ and $\hat{S}A^{\text{\sf T}}\tilde{v}\in{\mathbb{R}}^{m}_{++}.$ Thus the sharper identity (28) follows. ∎
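The sign-flipping step of the proof can be illustrated in the square case $m=n=2$ , where $\chi(A)=\|A^{-1}\|$ . The sketch below assumes, following the description of (25) given after Proposition 4, that for a non-singular square matrix the Hoffman constant equals $\max\{\|v\|:v\geq 0,\ \|A^{\text{\sf T}}v\|=1\}$ , and approximates this maximum by sampling unit vectors in the nonnegative orthant. The example matrix, the grid size, and the function name `hoffman_square` are hypothetical choices made for illustration.

```python
import itertools
import math

A = [[1.0, 0.0], [1.0, 0.1]]  # hypothetical non-singular 2 x 2 example


def hoffman_square(A, s, samples=20001):
    """Approximate max ||v|| / ||(SA)^T v|| over v >= 0, where S = Diag(s)."""
    best = 0.0
    for k in range(samples):
        t = (math.pi / 2) * k / (samples - 1)
        # w = S v for the unit vector v = (cos t, sin t) in the nonnegative orthant.
        w = (s[0] * math.cos(t), s[1] * math.sin(t))
        # (SA)^T v = A^T S v = A^T w.
        atw = (A[0][0] * w[0] + A[1][0] * w[1], A[0][1] * w[0] + A[1][1] * w[1])
        best = max(best, 1.0 / math.hypot(atw[0], atw[1]))
    return best


# chi(A) = ||A^{-1}|| = 1 / sigma_min(A), from the smaller eigenvalue of A^T A.
p = A[0][0] ** 2 + A[1][0] ** 2
r = A[0][1] ** 2 + A[1][1] ** 2
q = A[0][0] * A[0][1] + A[1][0] * A[1][1]
chi = 1.0 / math.sqrt((p + r) / 2 - math.hypot((p - r) / 2, q))

# Without sign changes only the nonnegative orthant is explored.
h_plain = hoffman_square(A, (1, 1))
# Maximizing over all four signature matrices covers every direction of v.
h_signed = max(hoffman_square(A, s) for s in itertools.product((1, -1), repeat=2))
```

For this matrix the unsigned estimate stays at 1 while the signed maximum approaches $\|A^{-1}\|\approx 14.16$ , matching the observation that $\chi(A)$ can be much larger than $H(A)$ and that the gap closes once sign changes are allowed.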

Corollary 1.

Let $A\in{\mathbb{R}}^{m\times n}$ be a full column-rank matrix. Then

[TABLE]

where $\mathbf{A}$ is as in (27). Furthermore, if all rows of $A$ are nonzero then

[TABLE]

Proof.

This is an immediate consequence of Theorem 2, Proposition 1 and Proposition 3. ∎

We note that when $A\in{\mathbb{R}}^{m\times n}$ is full column-rank but some rows of $A$ are zero, the following amended version of (28) holds for the submatrix $\tilde{A}\in{\mathbb{R}}^{\ell\times n}$ obtained after deleting the zero rows from $A$ :

[TABLE]

The construction of $\chi(A)$ and $H(A)$ enables us to rewrite the latter identity as follows

[TABLE]

4 Renegar’s and Grassmannian condition numbers

Suppose $A\in{\mathbb{R}}^{m\times n}$ is such that $A({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset$ . This property can be equivalently stated as $A({\mathbb{R}}^{n})+{\mathbb{R}}^{m}_{+}={\mathbb{R}}^{m}$ , that is, for all $b\in{\mathbb{R}}^{m}$ the system of linear inequalities

$$Ax\leq b$$

is feasible. In his seminal paper on condition measures for optimization [27], Renegar defined the distance to infeasibility of $A$ as the smallest perturbation that can be made on $A$ so that this property is lost. That is

[TABLE]

Renegar also defined $\|A\|/\mathcal{R}(A)$ as a condition number of $A$ .

We have the following characterization of $\chi(A)$ in terms of $\mathcal{R}(A)$ analogous to that in Theorem 2.

Proposition 5.

Let $A\in{\mathbb{R}}^{m\times n}$ be a full column-rank matrix. If $A({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset$ then $H(A)=1/\mathcal{R}(A)$ . Consequently, if all rows of the full column-rank matrix $A\in{\mathbb{R}}^{m\times n}$ are nonzero then

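$$\chi(A)=\max_{\substack{S\in{\mathscr{S}}\\ SA({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset}}\frac{1}{\mathcal{R}(SA)}.\tag{29}$$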

Proof.

When $A({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset$ , the distance to ill-posedness $\mathcal{R}(A)$ has the following property similar in spirit to Proposition 2 and Proposition 4 (see [28, Theorem 3.5]):

[TABLE]

From (24) and (30) it follows that $H(A)=1/\mathcal{R}(A)$ when $A({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset$ . The latter condition and (28) in turn imply (29) if all rows of $A$ are nonzero. ∎

Amelunxen and Bürgisser [1] proposed a condition number via the Grassmannian manifold of linear subspaces of ${\mathbb{R}}^{m}$ of some fixed dimension. This condition number can be seen as a variant of Renegar’s condition measure that depends only on $A({\mathbb{R}}^{n})$ akin to the variants $\overline{\chi}(A)$ and $\overline{H}(A)$ of $\chi(A)$ and $H(A)$ respectively. We next recall the description of the Grassmannian condition number proposed by Amelunxen and Bürgisser [1]. First, define the Grassmannian distance ${\mathrm{dist}}(L,L^{\prime})$ between two linear subspaces $L,L^{\prime}\subseteq{\mathbb{R}}^{m}$ of the same dimension as

[TABLE]

where $\Pi_{L}$ and $\Pi_{L^{\prime}}$ denote the orthogonal projection matrices onto $L$ and $L^{\prime}$ respectively.

Suppose $A\in{\mathbb{R}}^{m\times n}$ satisfies $A({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset$ . Let $L:=A({\mathbb{R}}^{n})$ and define the Grassmannian condition number of $A$ as follows

[TABLE]

Since $\mathcal{G}(A)$ depends only on $A({\mathbb{R}}^{n})$ , it automatically satisfies the following invariance property just as $\overline{\chi}(A)$ and $\overline{H}(A)$ do: For all non-singular $R\in{\mathbb{R}}^{n\times n}$

$$\mathcal{G}(A)=\mathcal{G}(AR).\tag{31}$$

The pair of quantities $1/\mathcal{R}(A),\mathcal{G}(A)$ are related to each other in the same way the pairs of quantities $\chi(A),\overline{\chi}(A)$ and $H(A),\overline{H}(A)$ are. More precisely, we have the following analogue of Proposition 1 and Proposition 3.

Proposition 6.

Suppose $A\in{\mathbb{R}}^{m\times n}$ is a nonzero matrix and $A({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset$ . Then $\mathcal{G}(A)\leq\|A\|/\mathcal{R}(A)$ and $\mathcal{G}(A)=1/\mathcal{R}(A)$ when the non-zero columns of $A$ are orthonormal. Consequently, if $A\in{\mathbb{R}}^{m\times n}$ is a nonzero matrix

[TABLE]

Proof.

Suppose $A\in{\mathbb{R}}^{m\times n}$ and $A({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset$ . Then the inequality $\mathcal{G}(A)\leq\|A\|/\mathcal{R}(A)$ follows from [1, Theorem 1.4] and the identity $\mathcal{G}(A)=1/\mathcal{R}(A)$ when the nonzero columns of $A$ are orthonormal follows from [1, Theorem 1.3]. The latter two facts and (31) in turn imply (32) when $A\in{\mathbb{R}}^{m\times n}$ is a nonzero matrix. ∎

We conclude with the following characterization of $\overline{\chi}(A)$ in terms of $\mathcal{G}(A)$ analogous to that in Corollary 1.

Corollary 2.

Suppose $A\in{\mathbb{R}}^{m\times n}$ is a full column-rank matrix and all rows of $A$ are nonzero. Then

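$$\overline{\chi}(A)=\max_{\substack{S\in{\mathscr{S}}\\ SA({\mathbb{R}}^{n})\cap{\mathbb{R}}^{m}_{++}\neq\emptyset}}\mathcal{G}(SA).$$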

Proof.

This is an immediate consequence of Proposition 1, Proposition 5, and Proposition 6. ∎

Bibliography

[1] D. Amelunxen and P. Bürgisser. A coordinate-free condition number for convex programming. SIAM J. on Optim., 22(3):1029–1041, 2012.
[2] A. Beck and S. Shtern. Linearly convergent away-step conditional gradient for non-strongly convex functions. Mathematical Programming, 164:1–27, 2017.
[3] A. Ben-Tal and M. Teboulle. A geometric property of the least squares solution of linear equations. Linear Algebra and its Applications, 139:165–170, 1990.
[4] E. Bobrovnikova and S. Vavasis. Accurate solution of weighted least squares by iterative methods. SIAM Journal on Matrix Analysis and Applications, 22(4):1153–1174, 2001.
[5] I. Dikin. On the speed of an iterative process. Upravlyaemye Sistemi, 12(1):54–60, 1974.
[6] A. Forsgren. On linear least-squares problems with diagonally dominant weight matrices. SIAM Journal on Matrix Analysis and Applications, 17(4):763–788, 1996.
[7] A. Forsgren and G. Sporre. On weighted linear least-squares problems related to interior methods for convex quadratic programming. SIAM Journal on Matrix Analysis and Applications, 23(1):42–56, 2001.
[8] D. Garber. Fast rates for online gradient descent without strong convexity via Hoffman’s bound. arXiv preprint arXiv:1802.04623, 2018.
