A Rank Revealing Factorization Using Arbitrary Norms

Reid Atcheson

arXiv:1905.02355 · math.NA · May 27, 2019


TL;DR

This paper generalizes the rank-revealing QR factorization to arbitrary norms, enabling low-rank approximations with different error metrics, including the $l^1$ norm, and provides a practical Python implementation.

Contribution

It introduces a generalized QR factorization framework for arbitrary norms and demonstrates its application to $l^1$ norm low-rank approximation.

Findings

1. Generalized QR factorization for any norm with analogous properties (the lower conditioning bound remains a conjecture).

2. Application to $l^1$ norm low-rank approximation.

3. Python code provided for practical implementation.

Abstract

The classic rank-revealing QR factorization factorizes a matrix $A$ as $AP=QR$, where $P$ permutes the columns of $A$, $Q$ is an orthogonal matrix, and $R$ is upper triangular with non-increasing diagonal entries. This is called rank-revealing because careful choice of $P$ allows the user to truncate the factorization for a low-rank approximation of $A$ with an error term computed in the $l^{2}$ norm. In this paper I generalize the QR factorization to use an arbitrary norm and prove analogous properties for $Q$ and $R$ in this setting. I then show an application of this algorithm to compute low-rank approximations to $A$ with error term in the $l^{1}$ norm instead of the $l^{2}$ norm. I provide Python code for the $l^{1}$ case as a demonstration of the idea.



Reid Atcheson, Numerical Algorithms Group ([email protected])


Keywords: QR factorization, rank-revealing QR factorization, low-rank approximation

AMS subject classifications: 65F35, 65F30

1 Introduction

Low-rank approximation allows the user to compress an input matrix in a very informative way. The low-rank factors can reveal useful information about the data that make up the input matrix; this idea forms the basis of Principal Component Analysis (PCA). The gold standard of low-rank approximation is the SVD, which gives optimal low-rank approximations with respect to the Euclidean norm $\|\cdot\|_{2}$. The drawback of the SVD is that algorithms for computing it are typically iterative in nature, or even probabilistic. A non-iterative and deterministic algorithm that reveals rank information can therefore be useful.
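As a concrete illustration of this optimality (the Eckart-Young theorem), the sketch below forms a rank-$k$ truncated SVD with `numpy.linalg.svd` and checks that its spectral-norm error equals the $(k+1)$-st singular value. The random test matrix and the helper name `truncated_svd` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k truncated SVD approximation of A and A's singular values."""
    U, s, Vt = np.linalg.svd(A)  # s is sorted in nonincreasing order
    # Keep only the k largest singular triplets.
    Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return Ak, s

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
k = 3
Ak, s = truncated_svd(A, k)

# Eckart-Young: the spectral-norm error of the best rank-k approximation is s[k].
err = np.linalg.norm(A - Ak, 2)
print(err, s[k])  # the two numbers agree up to rounding
```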

The rank-revealing QR factorization [2] is a deterministic and non-iterative algorithm that provides rank information on the input matrix by way of the diagonal entries of its upper triangular factor. This factorization can also be used directly for low-rank approximation, bypassing the SVD entirely, and this has been exploited heavily in areas such as hierarchical compression of matrices [3],[4]. As with the SVD, the quality of this low-rank approximation is often best in the Euclidean norm $\|\cdot\|_{2}$, because the $QR$ factorization is explicitly based on the Euclidean dot product. However, this optimality in the Euclidean norm has some undesirable consequences in other fields.
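For reference, classical QR with column pivoting is available in SciPy as `scipy.linalg.qr(A, pivoting=True)`, which returns the permutation as an index array `P` with `A[:, P] == Q @ R`. The sketch below (an illustration, not code from the paper) builds a nearly rank-4 matrix, shows the sharp drop in $|R_{ii}|$ after the fourth entry, and truncates the factorization to rank $k=4$.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(1)
# A nearly rank-4 matrix: a rank-4 product plus a tiny perturbation.
A = rng.standard_normal((10, 4)) @ rng.standard_normal((4, 10))
A += 1e-8 * rng.standard_normal((10, 10))

Q, R, P = qr(A, pivoting=True)  # A[:, P] = Q @ R
d = np.abs(np.diag(R))
print(d)  # drops sharply after the 4th entry

k = 4
# Rank-k approximation of the permuted matrix from the leading k columns of Q.
approx = Q[:, :k] @ R[:k, :]
err = np.linalg.norm(A[:, P] - approx, 2)
```

The error of the truncation is governed by the trailing block of $R$, which is why small trailing diagonal entries signal that truncation is safe.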

For some applications of data analysis, the optimality of a low-rank approximation in the Euclidean norm results in unfavorable low-rank factors, because outliers in the data can quickly overwhelm the Euclidean norm of that data, resulting in poor approximations. This has led to the field of "L1 PCA", which tries to find optimal low-rank approximations in the $l^{1}$ norm instead of the $l^{2}$ norm [11],[6],[7]. Unfortunately, since the $QR$ factorization is highly specialized to the Euclidean norm, this suggests that rank-revealing $QR$ strategies cannot help in this domain. Thus this new area of low-rank approximation has moved in the direction of iterative or probabilistic SVD-like algorithms [7].

In this paper I show that the $QR$ factorization can be generalized to norms other than the Euclidean norm. I derive the algorithm, state and prove analogous properties of the resulting $Q$ and $R$ factors, and then show numerical results. This yields a deterministic and non-iterative algorithm with rank-revealing properties with the potential to give optimality in norms besides the Euclidean norm.

The paper is organized as follows. The main theory and algorithm are presented in Section 2, a Python implementation of this algorithm for the special case of the $l^{1}$ norm is given in Section 3, experimental results are in Section 4, and the conclusions follow in Section 5.

2 Main results

I start by presenting the algorithm that this paper is based on. This algorithm accepts a matrix $A\in\mathbb{R}^{m\times m}$ and any norm $\|\cdot\|$ on $\mathbb{R}^{m}$, and returns a permutation $P$, an upper triangular matrix $R$ with nonincreasing diagonal, and a matrix $Q$ such that $AP=QR$. I then prove key facts about this algorithm (theorem 2.1) and state a conjecture (conjecture 2.2). I also prove that when the input norm is the Euclidean norm, the factorization reduces to a classical $QR$ factorization, in the sense that $Q$ becomes orthogonal (theorem 2.4). I begin with algorithm 1 below.
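Since the listing of algorithm 1 does not appear here, the following is a hedged sketch reconstructed from the relations used in the proofs below, not the author's code. It assumes that the first pivot is the column of largest norm, that each later pivot $k_j$ is the column whose least-norm residual $\min_c\|A_i-Q^jc\|$ is largest, and that $Q_{j+1}=\gamma_j^{-1}(A_{k_j}-Q^jc_j)$ with $\gamma_j$ the norm of that residual. The names `norm_rrqr`, `norm`, and `lstnorm` are hypothetical; `lstnorm(M, b)` must return $\operatorname{arg\,min}_c\|b-Mc\|$ in the chosen norm. The usage plugs in the $l^2$ solver `numpy.linalg.lstsq`, for which theorem 2.4 predicts an orthogonal $Q$.

```python
import numpy as np

def norm_rrqr(A, norm, lstnorm):
    """Hypothetical sketch of an arbitrary-norm pivoted QR with A[:, perm] = Q @ R.

    norm(v)       -> float, the chosen vector norm
    lstnorm(M, b) -> c minimizing norm(b - M @ c)
    Assumes A has full column rank (otherwise some residual norm gamma is zero).
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    # Base case (assumed): pivot on the column with the largest norm.
    col_norms = [norm(A[:, i]) for i in range(n)]
    k = int(np.argmax(col_norms))
    perm = [k]
    R[0, 0] = col_norms[k]
    Q[:, 0] = A[:, k] / R[0, 0]
    for j in range(1, n):
        Qj = Q[:, :j]
        best = None
        # Pick the remaining column that span(Q^j) approximates worst in this norm.
        for i in range(n):
            if i in perm:
                continue
            c = lstnorm(Qj, A[:, i])
            gamma = norm(A[:, i] - Qj @ c)
            if best is None or gamma > best[0]:
                best = (gamma, i, c)
        gamma, k, c = best
        if gamma == 0.0:
            raise ValueError("A is rank deficient; truncate the factorization here")
        perm.append(k)
        R[:j, j] = c       # coefficients of the pivot column in the basis Q^j
        R[j, j] = gamma    # residual norm becomes the diagonal entry
        Q[:, j] = (A[:, k] - Qj @ c) / gamma  # unit-norm new column Q_{j+1}
    return Q, R, np.array(perm)

# Usage with the Euclidean norm and an l2 least-squares solver.
l2_norm = lambda v: np.linalg.norm(v, 2)
l2_solve = lambda M, b: np.linalg.lstsq(M, b, rcond=None)[0]

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
Q, R, perm = norm_rrqr(A, l2_norm, l2_solve)
```

Passing the norm and the least-norm solver as arguments keeps the factorization itself norm-agnostic; only `norm` and `lstnorm` change when switching to another norm.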

The key theoretical result of this paper is summarized in theorem 2.1. Following this theorem is a conjecture that appears to hold based on numerical evidence (see Section 4), but a full proof remains elusive. Finally, I prove in theorem 2.4 that if $\|\cdot\|=\|\cdot\|_{2}$ then algorithm 1 outputs an orthogonal $Q$.

Theorem 2.1 (Arbitrary-norm Rank-Revealing $QR$ Factorization).

Suppose that $A\in\mathbb{R}^{m\times m}$, $\|\cdot\|$ is a norm, and that $P,Q,R$ are output by algorithm 1.

Then the following properties hold:

(15) $AP=QR$,

(16) $R$ is upper triangular with nonincreasing diagonal entries.

There exists a constant $C_{1}>0$ independent of $A$ such that

(17) $\max_{\|x\|=1}\|Qx\|\leq C_{1}$.

Conjecture 2.2 (Inverse Bound).

Suppose that $A\in\mathbb{R}^{m\times m}$, $\|\cdot\|$ is a norm, and that $P,Q,R$ are output by algorithm 1.

Then there exists a constant $C_{2}>0$ that depends only on the norm $\|\cdot\|$ such that

(18) $\min_{\|x\|=1}\|Qx\|\geq C_{2}$.

Properties 15 and 16 are standard and precisely match the classical $QR$ factorization with column pivoting. Properties 17 and 18 perhaps require more explanation. In the classical $QR$ factorization the matrix $Q$ is orthogonal ( $Q^{T}Q=I$ ). Strictly speaking we could insist that $Q$ also be orthogonal in the above theorem, but the utility of orthogonality is lost when using norms different from the $l^{2}$ norm. This utility stems from the fact that the $l^{2}$ norm is derived from an inner product, so orthogonality has strong implications on the conditioning of $Q$ in this norm.

Thus, as an analogue of orthogonality, I require that the matrix $Q$ be well conditioned. Together, the bounds 17 and 18 would show not only that $Q$ is invertible (full rank), but also that the conditioning of $Q$ does not depend on the conditioning of $A$, which the theorem allows to be highly numerically singular. For example, if we were to state this theorem for $\|\cdot\|=\|\cdot\|_{2}$, we would have $C_{1}=C_{2}=1$.

I now prove theorem 2.1, minus the conjecture:

Proof 2.3.

To prove equation 15, note that $AP^{1}=Q^{1}R(1,1)$ follows directly from the base-case definitions of these quantities. Now assume $AP^{j}=Q^{j}R(1:j,1:j)$ for some $j$. Then

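$Q^{j+1}R(1:j+1,1:j+1)=\left(Q^{j}R(1:j,1:j),\;Q^{j}c_{j}+\gamma_{j}Q_{j+1}\right)=\left(AP^{j},\;A_{k_{j}}\right)=AP^{j+1}.$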

For 16, $R$ is upper triangular by construction. To show that its diagonal entries are nonincreasing, observe that from the optimality property of $c_{j}$ we have

$R(1,1)\geq\|A_{k_{1}}\|\geq\|A_{k_{1}}-Q^{1}c_{1}\|=R(2,2)$

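and for any $j>1$:

$R(j,j)=\max_{i}\min_{c\in\mathbb{R}^{j-1}}\|A_{i}-Q^{j-1}c\|\geq\max_{i}\min_{c\in\mathbb{R}^{j}}\|A_{i}-Q^{j}c\|=R(j+1,j+1),$

where the inequality holds because $Q^{j-1}$ consists of the first $j-1$ columns of $Q^{j}$, so each inner minimum on the right is taken over a larger set.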

Finally, for the conditioning property 17, observe that if $\|x\|=1$ then

$\|Qx\|\leq\sum_{i=1}^{m}\|Q_{i}x_{i}\|=\sum_{i=1}^{m}\|Q_{i}\|\,|x_{i}|\leq\|x\|_{1},$

where the final inequality is a consequence of Hölder's inequality together with the fact that each column $Q_{i}$ has unit norm by construction. Finally, since all norms on a finite-dimensional space are equivalent, we may choose $C_{1}>0$ such that $\|x\|_{1}\leq C_{1}\|x\|$ holds for all $x$, which completes the proof of 17. The bound 18 remains a conjecture, but it is supported by the numerical evidence in Section 4.
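A quick numerical check of the chain of inequalities above, for illustration only: taking the $l^\infty$ norm as the arbitrary norm (an illustrative choice), the columns of a random $Q$ are scaled to unit $l^\infty$ norm, and we verify $\|Qx\|_\infty\leq\|x\|_1\leq C_1\|x\|_\infty$ with $C_1=m$ from norm equivalence.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 5
Q = rng.standard_normal((m, m))
# Scale every column to unit l-infinity norm, mimicking ||Q_i|| = 1.
Q /= np.max(np.abs(Q), axis=0)

checks = []
for _ in range(1000):
    x = rng.standard_normal(m)
    x /= np.max(np.abs(x))          # normalize so that ||x||_inf = 1
    lhs = np.max(np.abs(Q @ x))     # ||Q x||_inf
    mid = np.sum(np.abs(x))         # ||x||_1
    checks.append(lhs <= mid + 1e-12 and mid <= m + 1e-12)
print(all(checks))  # True
```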

Theorem 2.4 (Classic QR as Special Case).

Suppose that $A\in\mathbb{R}^{m\times m}$, $\|\cdot\|_{2}$ is the $l^{2}$ norm, and that $P,Q,R$ are output by algorithm 1.

Then $Q$ is orthogonal, i.e. $Q^{T}Q=I$.

Proof 2.5.

By the inductive definition of $Q$ in 8 we have

$Q^{j+1}=\left(Q^{j},\;\gamma_{j}^{-1}(A_{k_{j}}-Q^{j}c_{j})\right).$

Recall that $c_{j}$ solves the minimization problem

$c_{j}=\operatorname*{arg\,min}_{c\in\mathbb{R}^{j}}\|A_{k_{j}}-Q^{j}c\|_{2},$

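which means that $Q^{j}c_{j}$ is the $l^{2}$ projection of $A_{k_{j}}$ onto the space $V=\operatorname*{span}(Q_{1},\ldots,Q_{j})$. Since $Q_{j+1}$ is a scalar multiple of the residual of this projection, it is orthogonal to the whole space $V$. Because every column of $Q$ also has unit $l^{2}$ norm by construction, induction on $j$ shows that the columns of $Q$ are orthonormal, i.e. $Q^{T}Q=I$.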
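The key step of this proof, that the $l^2$ least-squares residual is orthogonal to $V=\operatorname{span}(Q_1,\ldots,Q_j)$, can be checked numerically with `numpy.linalg.lstsq`. The random orthonormal basis and pivot column below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, j = 7, 3
# Orthonormal basis Q^j for V, from the QR factorization of a random matrix.
Qj, _ = np.linalg.qr(rng.standard_normal((m, j)))
a = rng.standard_normal(m)  # plays the role of the pivot column A_{k_j}

c = np.linalg.lstsq(Qj, a, rcond=None)[0]  # c_j = argmin ||a - Q^j c||_2
r = a - Qj @ c                             # residual of the l2 projection
q_new = r / np.linalg.norm(r)              # next column Q_{j+1}

# Q_{j+1} is orthogonal to every column of Q^j and has unit length.
print(np.abs(Qj.T @ q_new).max())  # on the order of machine precision
```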

3 Implementation for $l^{1}$ norm using linear programming

The key ingredient of algorithm 1 is the ability to compute solutions to minimum-norm linear problems such as $\operatorname*{arg\,min}\|b-Ax\|$. For the $l^{2}$ case there are well-established, robust algorithms for this problem, but the situation is less clear for other norms. For the $l^{1}$ norm, the problem can be cast as a linear program. In other words:

$\operatorname*{arg\,min}_{x}\|b-Ax\|_{1}$

is equivalent to the linear program

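$\operatorname*{arg\,min}_{x,\,t}\sum_{i=1}^{m}t_{i}\quad\text{subject to}\quad b-Ax\leq t,\;\;Ax-b\leq t,$

where the inequalities hold componentwise and $t\in\mathbb{R}^{m}$ is an auxiliary variable; the $x$ component of the solution solves the $l^{1}$ problem.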
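The appendix listings are not reproduced in this extract. As a minimal sketch of the same idea, assuming SciPy's `scipy.optimize.linprog` with the HiGHS backend, the linear program above can be assembled over the stacked variable $z=(x,t)$: the constraints $b-Ax\leq t$ and $Ax-b\leq t$ become $-Ax-t\leq-b$ and $Ax-t\leq b$. The function name `l1_solve` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def l1_solve(A, b):
    """Return x minimizing ||b - A x||_1 via the linear program above."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    I = np.eye(m)
    # Objective: minimize sum(t); the x variables have zero cost.
    cost = np.concatenate([np.zeros(n), np.ones(m)])
    # -A x - t <= -b   (i.e. b - A x <= t)
    #  A x - t <=  b   (i.e. A x - b <= t)
    A_ub = np.block([[-A, -I], [A, -I]])
    b_ub = np.concatenate([-b, b])
    # The constraints force t >= |b - A x| >= 0, so no variable needs a bound.
    bounds = [(None, None)] * (n + m)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]

# Fitting a constant to data with an outlier: the l1 fit is the median (3),
# whereas least squares would return the mean (22).
A = np.ones((5, 1))
b = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
x = l1_solve(A, b)
print(x)  # approximately [3.]
```

A function with this `(M, b) -> c` shape could serve as the least-norm solver in an arbitrary-norm factorization, at the cost of one LP per candidate column.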

This is implemented in Python in listing 8 of the appendix, using the open-source tools NumPy [12] and SciPy [5].

The linear-programming approach is correct but does not seem to scale well for larger matrices $A$. Thus I also provide a Python implementation that uses the NAG numerical library [9] in listing 9. Furthermore, I have implemented the relations 4 and 8 as a Python function in listing 10. This implementation can use either the NAG $l^{1}$ solver from listing 9 or the open-source $l^{1}$ solver from listing 8, selected by the value of the l1alg parameter.

I now proceed to show numerical results of this algorithm.

4 Experimental results

The results below are designed to validate some of the theoretical properties proven and asserted earlier. These include properties like the well-conditioning of $Q$ and the non-increasing property for the diagonal of $R$. I also include results on low-rank approximation from this factorization, as that was the primary motivation for deriving this algorithm.

4.1 Diagonal Entries of R

These experiments test property 16 of theorem 2.1. Here I take $A\in\mathbb{R}^{m\times m}$ constructed explicitly from an SVD factorization $A=U\Sigma V^{T}$, with diagonal entries of $\Sigma$ varying in relative size, which I indicate with $\sigma_{m}=\min_{i}\Sigma_{i,i}$ and $\sigma_{1}=\max_{i}\Sigma_{i,i}$.
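One way to build such test matrices is sketched below. It assumes, since the paper does not specify these details here, that $U$ and $V$ are random orthogonal factors obtained from QR factorizations of Gaussian matrices and that the singular values are spaced logarithmically between $\sigma_1$ and $\sigma_m$. The helper name `matrix_with_singular_values` is hypothetical.

```python
import numpy as np

def matrix_with_singular_values(sigma, rng):
    """Return A = U diag(sigma) V^T with random orthogonal U and V (hypothetical helper)."""
    m = len(sigma)
    U, _ = np.linalg.qr(rng.standard_normal((m, m)))
    V, _ = np.linalg.qr(rng.standard_normal((m, m)))
    return U @ np.diag(sigma) @ V.T

rng = np.random.default_rng(5)
m = 50
# Singular values decaying from sigma_1 = 1 to sigma_m = 1e-6.
sigma = np.logspace(0, -6, m)
A = matrix_with_singular_values(sigma, rng)
```

Varying the exponent of $\sigma_m$ then sweeps $A$ from well conditioned to nearly singular while $\sigma_1$ stays fixed.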

If $R$ truly has rank-revealing properties then it should exhibit rapid decay of diagonal entries when $A$ becomes progressively more singular.

These results suggest that $R$ is capturing low rank information.

4.2 Conditioning of Q

An important requirement for a successful rank-revealing factorization $AP=QR$ is that the conditioning of $Q$ be independent of the conditioning of $A$. The key theoretical result that would prove this is 18, but unfortunately I was unable to prove it. Here I give numerical evidence that it does appear to be true.

I take $A\in\mathbb{R}^{m\times m}$ constructed explicitly from an SVD factorization $A=U\Sigma V^{T}$ with diagonal entries of $\Sigma$ varying in relative size. I compute the condition numbers $\|A\|_{1}\|A^{-1}\|_{1}$ and $\|Q\|_{1}\|Q^{-1}\|_{1}$ and plot them against each other in figure 3.
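The $l^1$ condition number $\|A\|_1\|A^{-1}\|_1$ used in this experiment is what `numpy.linalg.cond` computes with `p=1`; the induced $l^1$ matrix norm is the maximum absolute column sum, `numpy.linalg.norm(A, 1)`. A small hand-checkable example on an illustrative $2\times 2$ matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
Ainv = np.linalg.inv(A)                # (1/10) * [[3, -1], [-2, 4]]

norm1 = np.linalg.norm(A, 1)           # max column sum: max(6, 4) = 6
kappa1 = norm1 * np.linalg.norm(Ainv, 1)  # 6 * 0.5
print(kappa1, np.linalg.cond(A, 1))    # both equal 3 up to rounding
```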

4.3 Factorization error

Next I illustrate that the factorization error $\|AP-QR\|$ also does not depend on the conditioning of $A$ .

I take $A\in\mathbb{R}^{m\times m}$ constructed explicitly from an SVD factorization $A=U\Sigma V^{T}$ with diagonal entries of $\Sigma$ varying in relative size. I compute the condition numbers $\|A\|_{1}\|A^{-1}\|_{1}$ and the factorization errors $\|AP-QR\|_{1}$, and plot them against each other in figure 4.
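As a baseline for this experiment (not one of the paper's runs), the same measurement can be made for classical column-pivoted QR from SciPy. The sketch assumes random orthogonal $U,V$ and logarithmically spaced singular values from $1$ down to a shrinking $\sigma_m$; sizes and seeds are arbitrary choices.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(6)
m = 40
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((m, m)))

results = []
for sigma_min in [1e-2, 1e-6, 1e-10]:
    A = U @ np.diag(np.logspace(0, np.log10(sigma_min), m)) @ V.T
    Q, R, P = qr(A, pivoting=True)             # A[:, P] = Q @ R
    cond1 = np.linalg.cond(A, 1)               # ||A||_1 ||A^{-1}||_1
    err1 = np.linalg.norm(A[:, P] - Q @ R, 1)  # ||AP - QR||_1
    results.append((cond1, err1))
    print(f"cond_1(A) = {cond1:.1e}, ||AP - QR||_1 = {err1:.1e}")
```

For this baseline the factorization error stays near machine precision while the condition number grows by many orders of magnitude.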

4.4 Low-rank approximation

With the rank-revealing properties validated I now show an example of low-rank approximation.

For this test I again generate $A$ by forming it as an explicit SVD factorization $A=U\Sigma V^{T}$ with $\max_{i}\Sigma_{i,i}=1$ and $\min_{i}\Sigma_{i,i}=10^{-6}$. I then compute two factorizations of $A$:

[TABLE]

Next I truncate the factorizations to be a rank-$k$ approximation to $A$ as follows:

[TABLE]

For the first study I compare the induced $l^{1}$ matrix-norm errors of these approximations for $k=1,\ldots,60$; the results are shown in figure 5.
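The truncation formula is not recoverable from this extract. Assuming the standard choice of keeping the first $k$ columns of $Q$ and the first $k$ rows of $R$, so that $AP\approx Q(:,1:k)R(1:k,:)$, the induced $l^1$ error curve can be computed as sketched below, shown for classical pivoted QR only. The helper name `truncation_errors` is hypothetical.

```python
import numpy as np
from scipy.linalg import qr

def truncation_errors(A, ks):
    """Induced l1 error of rank-k truncations of pivoted QR (hypothetical helper)."""
    Q, R, P = qr(A, pivoting=True)  # A[:, P] = Q @ R
    AP = A[:, P]
    return [np.linalg.norm(AP - Q[:, :k] @ R[:k, :], 1) for k in ks]

rng = np.random.default_rng(7)
m = 60
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
# Singular values from 1 down to 1e-6, as in this experiment.
A = U @ np.diag(np.logspace(0, -6, m)) @ V.T

errs = truncation_errors(A, range(1, m + 1))
```

An $l^1$ version would use the same loop with the factors produced by an $l^1$ factorization in place of `scipy.linalg.qr`.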

This result suggests that there is little difference between the $l^{1}$ and $l^{2}$ rank-revealing factorizations in this test. Section 4.5 shows the subtle difference between the $l^{1}$ and $l^{2}$ norms for low-rank approximation, and why the $l^{1}$ norm may be preferred in some situations.

4.5 Resistance of $l^{1}$ norm to outliers

One of the original motivations for deriving algorithm 1 was to be able to do $l^{1}$ low-rank approximations, which can be very robust with respect to outliers in data [1].

I show here that, to some extent, this robustness appears to be reflected in the $l^{1}$ version of the rank-revealing factorization. To illustrate this I first show a "clean" example without outliers, and then compute rank-$k$ approximations for $k=1,2,3$ with both classical RRQR and $l^{1}$ RRQR. Then I introduce outliers into the same data and show that the classical RRQR algorithm is quickly drawn to over-resolve the outlier data, because that data dominates the Euclidean norm.

The first case shows the low-rank approximations without outliers in the input data.

In the next case I introduce two outliers of much larger magnitude than the surrounding data. The classical RRQR quickly gravitates to the columns containing these outliers, because the outlier data gets squared in the Euclidean norm and then dominates it. In the $l^{1}$ norm, however, this effect is much less pronounced. See figure 7.
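A toy calculation (not from the paper) makes the squaring effect concrete: for a column with 99 entries of size 1 plus one outlier of size 50, compare the fraction of the column's norm contributed by the outlier in the squared $l^2$ norm and in the $l^1$ norm.

```python
import numpy as np

col = np.ones(100)
col[0] = 50.0  # one outlier among 99 entries of size 1

share_l2 = col[0] ** 2 / np.sum(col ** 2)     # outlier's share of ||col||_2^2
share_l1 = abs(col[0]) / np.sum(np.abs(col))  # outlier's share of ||col||_1
print(f"l2^2 share: {share_l2:.2f}, l1 share: {share_l1:.2f}")  # 0.96 vs 0.34
```

A single entry thus accounts for almost all of the squared Euclidean norm but only about a third of the $l^1$ norm, which is consistent with the pivoting behavior described above.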

5 Conclusions

I derived a rank-revealing factorization that shares some similarities with classic rank-revealing QR with column pivoting. Instead of the $Q$ factor being orthogonal, its conditioning appears to be independent of the conditioning of $A$. Furthermore, the rank-revealing factorization presented here does not depend strictly on dot products and the $l^{2}$ norm. I validated that claim by implementing the algorithm for the $l^{1}$ norm case, where computing the least-norm solution is equivalent to solving a linear program.

While I was able to numerically validate the conditioning properties of $Q$, I was able to prove mathematically only the upper bound 17. The key remaining fact, the lower bound 18, is still conjecture 2.2. Without the orthogonality available in the $l^{2}$ case, most avenues of proof are lost. I believe, however, that careful use of the optimality properties of the least-norm solution $c_{j}$ (see eqns 8) may be able to overcome the loss of orthogonality.

Appendix A Python Implementations

This section contains Python implementations of the key algorithms of this paper. I give three implementations here. The first two, listings 8 and 9, solve the same problem and may be used interchangeably in listing 10, which computes the rank-revealing factorization itself.

The first listing solves linear systems in the least-$l^{1}$-norm sense by translating the problem into a linear program and solving it with SciPy. This has the advantage of requiring only open-source tools that are readily available on the internet.

The next Python implementation makes use of the NAG Library routine e02ga through the NAG Library for Python [8]. Full documentation for this routine may be found at [10]. This relies on the closed-source NAG Library, but since the e02ga routine is specialized to the least-$l^{1}$-norm problem, it is significantly faster than the generic linear-programming approach shown above. This matters for the rank-revealing factorization because it spends almost all of its time solving linear systems in this minimum-norm sense, so the faster solver enables factorizing much larger matrices.

Finally comes the factorization itself. As mentioned above, this factorization depends on the ability to solve linear systems in the least-norm sense. Since I gave two possible ways to achieve this, I made the least-norm solver an input argument, which may be set either to the fully open-source solver or to the faster NAG-based solver.

Bibliography

[1] E. J. Candès, X. Li, Y. Ma, and J. Wright, Robust Principal Component Analysis?, J. ACM, 58 (2011), pp. 11:1–11:37, https://doi.org/10.1145/1970392.1970395, http://doi.acm.org/10.1145/1970392.1970395 (accessed 2019-05-04).
[2] T. F. Chan, Rank revealing QR factorizations, Linear Algebra and its Applications, 88-89 (1987), pp. 67–82, https://doi.org/10.1016/0024-3795(87)90103-0, http://www.sciencedirect.com/science/article/pii/0024379587901030 (accessed 2019-05-05).
[3] W. Hackbusch, A Sparse Matrix Arithmetic Based on $\mathcal{H}$-Matrices. Part I: Introduction to $\mathcal{H}$-Matrices, Computing, 62 (1999), pp. 89–108, https://doi.org/10.1007/s006070050015 (accessed 2019-05-06).
[4] W. Hackbusch, L. Grasedyck, and S. Börm, An introduction to hierarchical matrices, Mathematica Bohemica, 127 (2002), pp. 229–241.
[5] E. Jones, T. Oliphant, P. Peterson, and others, SciPy: Open source scientific tools for Python, 2001, http://www.scipy.org/.
[6] P. P. Markopoulos, S. Kundu, S. Chamadia, and D. A. Pados, Efficient L1-Norm Principal-Component Analysis via Bit Flipping, IEEE Transactions on Signal Processing, 65 (2017), pp. 4252–4264, https://doi.org/10.1109/TSP.2017.2708023, http://arxiv.org/abs/1610.01959 (accessed 2019-04-28). arXiv:1610.01959.
[7] P. P. Markopoulos, S. Kundu, S. Chamadia, N. Tsagkarakis, and D. A. Pados, Outlier-Resistant Data Processing with L1-Norm Principal Component Analysis, in Advances in Principal Component Analysis: Research and Development, G. R. Naik, ed., Springer Singapore, Singapore, 2018, pp. 121–135, https://doi.org/10.1007/978-981-10-6704-4_6 (accessed 2019-05-04).
[8] The Numerical Algorithms Group (NAG), The NAG Library, www.nag.com.
