QRP Variation of Cross--Approximation Iterations for Low Rank   Approximation

Victor Y. Pan; John Svadlenka

arXiv:1901.00377·math.NA·January 17, 2025

QRP Variation of Cross--Approximation Iterations for Low Rank Approximation

Victor Y. Pan, John Svadlenka

PDF

Open Access

TL;DR

This paper explores a variation of cross-approximation iterations for low rank matrix approximation, emphasizing superfast algorithms that significantly reduce computational resources, with applications to accelerating the Fast Multipole Method.

Contribution

It introduces a QRP variation of cross-approximation iterations, enhancing superfast low rank approximation techniques for large-scale matrices.

Findings

01

Superfast LRA reduces computational complexity.

02

Application to accelerate the Fast Multipole Method.

03

Extensive testing confirms efficiency improvements.

Abstract

We call matrix algorithms superfast if they use much fewer flops and memory cells than the input matrix has entries. Using such algorithms is indispensable for Big Data Mining and Analysis, where the input matrices are so immense that one can only access a small fraction of all their entries. A natural remedy is Low Rank Approximation (LRA) of these matrices, which is routinely computed by means of Cross-Approximation iterations for more than a decade of worldwide application in computational practice. We point out and extensively test an important application of superfast LRA to significant acceleration of the celebrated Fast Multipole Method, which turns it into Superfast Multipole Method.

Tables1

Table 1. Table 1: LRA approximation of HSS matrices from [ XXG12 ]

			Spectral Norm		Chebyshev Norm
Inputs	C–A loops	HSS rank	mean	std	mean	std
B B	1	26	8.11e-07	1.45e-06	3.19e-07	5.23e-07
B B	5	26	4.60e-08	6.43e-08	7.33e-09	1.22e-08
C	1	16	5.62e-03	8.99e-03	3.00e-03	4.37e-03
C	5	16	3.37e-05	1.78e-05	8.77e-06	1.01e-05
D	1	13	1.12e-07	8.99e-08	1.35e-07	1.47e-07
D	5	13	1.50e-07	1.82e-07	2.09e-07	2.29e-07
E	1	14	5.35e-04	6.14e-04	2.90e-04	3.51e-04
E	5	14	1.90e-05	1.04e-05	5.49e-06	4.79e-06
F	1	37	1.14e-05	4.49e-05	6.02e-06	2.16e-05
F	5	37	4.92e-07	8.19e-07	1.12e-07	2.60e-07

Equations2

M = A B + E, ∣∣ E ∣∣/∣∣ M ∣∣ \leq ϵ,

M = A B + E, ∣∣ E ∣∣/∣∣ M ∣∣ \leq ϵ,

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsElectromagnetic Scattering and Analysis · Tensor decomposition and applications · Matrix Theory and Algorithms

Full text

On the Superfast Multipole Method

Victor Y. Pan

Victor Y. Pan*[1,2],[a]* and John Svadlenka*[2],[c]*

[1] Department of Computer Science

Lehman College of the City University of New York

Bronx, NY 10468 USA

[2] Ph.D. Programs in Computer Science and Mathematics

The Graduate Center of of the City University of New York

New York, NY 10036 USA

[a] [email protected]

http://comet.lehman.cuny.edu/vpan/

[b] [email protected]

Abstract

We call matrix algorithms superfast if they involve much fewer memory cells and flops than the input matrix has entries. Using such algorithms is indispensable for Big Data Mining and Analysis, where the input matrices are so immense that realistically one can only access a small fraction of all their entries. A natural remedy is Low Rank Approximation of these matrices,111Hereafter we use the acronym LRA. which is routinely computed by means of Cross–Approximation222Hereafter we use the acronym C–A. iterations for more than a decade of worldwide application in computational practice. We point out and extensively test an important application of superfast LRA to significant acceleration of the celebrated Fast Multipole Method, which turns it into Supefast Multipole Method.

Keywords:

Low Rank Approximation, Cross–Approximation, Fast Multipole Method.

2000 Math. Subject Classification:

65F30, 68Q25, 15A52

1 Introduction: Superfast LRA

Low rank approximation (hereafter LRA) of a matrix is a fundamental subject of Numerical Linear Algebra and Computer Science. An $m\times n$ matrix $M$ admits its close approximation333Here and hereafter the concepts “low”, “large”, “small”, “far”, “close”, etc. are defined in context. The inequalities $a\ll b$ and $b\gg a$ show that $|a/b|$ is small in context. of rank $r$ if and only if the matrix $M$ has numerical rank $r$ (then we write $\operatorname{nrank}(W)=r$ ) or equivalently if and only if

[TABLE]

for a small integer $r$ , $A\in\mathbb{C}^{m\times r}$ , $B\in\mathbb{C}^{r\times n}$ , a fixed matrix norm $||\cdot||$ , and a small tolerance $\epsilon$ . Such an LRA approximates the $mn$ entries of $M$ by using $(m+n)r$ entries of $A$ and $B$ instead of $mn$ entries of $M$ , and one can operate with a low rank matrix, e.g., multiply it by a vector superfast. This is a crucial benefit in applications of LRA to Big Data Mining and Analysis, where the input matrices $M$ , e.g., unfolding matrices of multidimensional tensors, are so immense that realistically one can only access a tiny fraction of all their entries. LRA is a natural remedy, and for more than a decade the Cross–Approximation (C-A) iterations have routinely been computing accurate LRA superfast in worldwide computational practice ( cf. [T96], [GTZ97], [T00], [B00], [BR03], [GOSTZ10], [B11], [O18], [OZ18]).

2 An Application – Superfast Multipole Method

Superfast LRA algorithms can be extended to numerous important computational problems linked to LRA. Next we we point out and extensively test a simple but apparently unnoticed application of superfast LRA to significant acceleration of the celebrated Fast Multipole Method (FMM), which turns it into Superfast Multipole Method.

2.1 Fast and Superfast Multipole Method

FMM enables superfast multiplication by a vector of so called HSS matrices provided that low rank generators are available for its off-diagonal blocks. Such generators are not available in some important applications, however (see, e.g, [XXG12], [XXCB14], and [P15]), but C–A algorithms compute them superfast, thus turning FMM into Superfast Multipole Method. Since the method is highly important we supply some details of its bottleneck stage of HSS computations, which we perform superfast by incorporating superfast LRA.

HSS matrices naturally extend the class of banded matrices and their inverses, are closely linked to FMM, and are increasingly popular (see [BGH03], [GH03], [MRT05], [CGS07], [VVGM05], [VVM07/08], [B10], [X12], [XXG12], [EGH13], [X13], [XXCB14], and the bibliography therein).

Definition 1.

(Neutered Block Columns. See [MRT05].) With each diagonal block of a block matrix associate its complement in its block column, and call this complement a neutered block column.

Definition 2.

(HSS matrices. See [CGS07], [X12], [X13], [XXCB14].)

A block matrix $M$ of size $m\times n$ is called an $r$ -HSS matrix, for a positive integer $r$ ,

(i) if all diagonal blocks of this matrix consist of $O((m+n)r)$ entries overall and

(ii) if $r$ is the maximal rank of its neutered block columns.

Remark 3.

Many authors work with $(l,u)$ -HSS (rather than $r$ -HSS) matrices $M$ for which $l$ and $u$ are the maximal ranks of the sub- and super-diagonal blocks, respectively. The $(l,u)$ -HSS and $r$ -HSS matrices are closely related. If a neutered block column $N$ is the union of a sub-diagonal block $B_{-}$ and a super-diagonal block $B_{+}$ , then $\operatorname{rank}(N)\leq\operatorname{rank}(B_{-})+\operatorname{rank}(B_{+})$ , and so an $(l,u)$ -HSS matrix is an $r$ -HSS matrix, for $r\leq l+u$ , while clearly an $r$ -HSS matrix is a $(r,r)$ -HSS matrix.

The FMM exploits the $r$ -HSS structure of a matrix as follows:

(i) Cover all off-block-diagonal entries with a set of non-overlapping neutered block columns.

(ii) Express every neutered block column $N$ of this set as the product $FH$ of two generator matrices, $F$ of size $h\times r$ and $H$ of size $r\times k$ . Call the pair $\{F,H\}$ a length $r$ generator of the neutered block column $N$ .

(iii) Multiply the matrix $M$ by a vector by separately multiplying generators and diagonal blocks by subvectors, involving $O((m+n)r)$ flops overall, and

(iv) in a more advanced application of FMM solve a nonsingular $r$ -HSS linear system of $n$ equations by using $O(nr\log^{2}(n))$ flops under some mild additional assumptions on the input.

This approach is readily extended to the same operations with $(r,\xi)$ -HSS matrices, that is, matrices approximated by $r$ -HSS matrices within a perturbation norm bound $\xi$ where a positive tolerance $\xi$ is small in context (for example, is the unit round-off). Likewise, one defines an $(r,\xi)$ -HSS representation and $(r,\xi)$ -generators.

$(r,\xi)$ -HSS matrices (for $r$ small in context) appear routinely in matrix computations, and computations with such matrices are performed efficiently by using the above techniques.

In some applications of the FMM (see [BGP05], [VVVF10]) stage (ii) is omitted because short generators for all neutered block columns are readily available, but this is not the case in a variety of other important applications (see [XXG12], [XXCB14], and [P15]). This stage of the computation of generators is precisely LRA of the neutered block columns, which turns out to be the bottleneck stage of FMM in these applications, and superfast LRA algorithms provide a remedy.

Indeed apply a fast algorithm at this stage, e.g., the algorithm of [HMT11] with a Gaussian multiplier. Multiplication of a $q\times h$ matrix by an $h\times r$ Gaussian matrix requires $(2h-1)qr$ flops, while standard HSS-representation of an $n\times n$ HSS matrix includes $q\times h$ neutered block columns for $q\approx m/2$ and $h\approx n/2$ . In this case the cost of computing an $r$ -HSS representation of the matrix $M$ has at least order $mnr$ . For $r\ll\min\{m,n\}$ , this is much greater than $O((m+n)r\log^{2}(n))$ flops, used at the other stages of the computations.

We alleviate such a problem, however, when we compute LRA of $(r,\xi)$ -generators by applying superfast algorithms.

3 Computation of LRAs for benchmark HSS matrices

In this section, the contribution of the secind author, we cover our tests of the Superfast Multipole Method where we applied C–A iterations in order to compute LRA of the generators of the off-diagonal blocks of HSS matrices. Namely we tested HSS matrices that approximate $1024\times 1024$ Cauchy-like matrices derived from benchmark Toeplitz matrices B, C, D, E, and F of [XXG12, Section 5]. For the computation of LRA we applied the algorithm of [GOSTZ10].

Table 1 displays the relative errors of the approximation of the $1024\times 1024$ HSS input matrices in the spectral and Chebyshev norms averaged over 100 tests. Each approximation was obtained by means of combining the exact diagonal blocks and LRA of the off-diagonal blocks. We computed LRA of all these blocks superfast.

In good accordance with extnsive empirical evidence about the power of C–A iterations, already the first C–A loop have consistently yielded reasonably close LRA, but our further improvement was achieved in five C–A loops in our tests for all but one of the five families of input matrices.

The reported HSS rank is the larger of the numerical ranks for the $512\times 512$ off-diagonal blocks. This HSS rank was used as an upper bound in our binary search that determined the numerical rank of each off-diagonal block for the purpose of computing its LRA. We based the binary search on minimizing the difference (in the spectral norm) between each off-diagonal block and its LRA.

The output error norms were quite low. Even in the case of the matrix C, obtained from Prolate Toeplitz matrices – extremely ill-conditioned, they ranged from $10^{-3}$ to $10^{-6}$ .

We have also performed further numerical experiments on all the HSS input matrices by using a hybrid LRA algorithm: we used random pre-processing with Gaussian and Hadamard (abridged and permuted) multipliers by incorporating Algorithm 4.1 of [HMT11], but only for the off-diagonal blocks of smaller sizes while retaining our previous way for computing LRA of the larger off-diagonal blocks. We have not displayed the results of these experiments because they yielded no substantial improvement in accuracy in comparison to the exclusive use of the less expensive LRA on all off-diagonal blocks.

Acknowledgements: Our research was supported by NSF Grants CCF–1116736, CCF–1563942, and CCF–133834 and PSC CUNY Award 69813 00 48.

Bibliography31

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[B 00] M. Bebendorf, Approximation of Boundary Element Matrices, Numer. Math. , 86, 4 , 565–589, 2000.
2[B 10] S. Börm, Efficient Numerical Methods for Non-local Operators: ℋ 2 superscript ℋ 2 \mathcal{H}^{2} -Matrix Compression, Algorithms and Analysis , European Math. Society, 2010.
3[B 11] M. Bebendorf, Adaptive Cross Approximation of Multivariate Functions, Constructive approximation , 34, 2 , 149–179, 2011.
4[BGH 03] S. Börm, L. Grasedyck, W. Hackbusch, Introduction to Hierarchical Matrices with Applications, Engineering Analysis with Boundary Elements , 27, 5 , 405–422, 2003.
5[BGP 05] A. Bini, L. Gemignani, V. Y. Pan, Fast and Stable QR Eigenvalue Algorithms for Generalized Semiseparable Matrices and Secular Equation, Numerische Mathematik , 100, 3 , 373–408, 2005.
6[BR 03] M. Bebendorf, S. Rjasanow, Adaptive Low-Rank Approximation of Collocation Matrices, Computing , 70, 1 , 1–24, 2003.
7[BY 13] L. A. Barba, R. Yokota, How Will the Fast Multipole Method Fare in Exascale Era? SIAM News , 46 , 6 , 1–3, July/August 2013.
8[C 00] B. A. Cipra, The Best of the 20th Century: Editors Name Top 10 Algorithms, SIAM News , 33, 4 , 2, May 16, 2000.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Taxonomy

On the Superfast Multipole Method

Abstract

Keywords:

2000 Math. Subject Classification:

1 Introduction: Superfast LRA

2 An Application – Superfast Multipole Method

2.1 Fast and Superfast Multipole Method

Definition 1**.**

Definition 2**.**

Remark 3**.**

3 Computation of LRAs for benchmark HSS matrices

Definition 1.

Definition 2.

Remark 3.