QRP Variation of Cross--Approximation Iterations for Low Rank Approximation
Victor Y. Pan, John Svadlenka

TL;DR
This paper explores a variation of cross-approximation iterations for low rank matrix approximation, emphasizing superfast algorithms that significantly reduce computational resources, with applications to accelerating the Fast Multipole Method.
Contribution
It introduces a QRP variation of cross-approximation iterations, enhancing superfast low rank approximation techniques for large-scale matrices.
Findings
Superfast LRA reduces computational complexity.
Application to accelerate the Fast Multipole Method.
Extensive testing confirms efficiency improvements.
Abstract
We call matrix algorithms superfast if they use much fewer flops and memory cells than the input matrix has entries. Using such algorithms is indispensable for Big Data Mining and Analysis, where the input matrices are so immense that one can only access a small fraction of all their entries. A natural remedy is Low Rank Approximation (LRA) of these matrices, which is routinely computed by means of Cross-Approximation iterations for more than a decade of worldwide application in computational practice. We point out and extensively test an important application of superfast LRA to significant acceleration of the celebrated Fast Multipole Method, which turns it into Superfast Multipole Method.
| Spectral Norm | Chebyshev Norm | |||||
| Inputs | C–A loops | HSS rank | mean | std | mean | std |
| B B | 1 | 26 | 8.11e-07 | 1.45e-06 | 3.19e-07 | 5.23e-07 |
| 5 | 26 | 4.60e-08 | 6.43e-08 | 7.33e-09 | 1.22e-08 | |
| C | 1 | 16 | 5.62e-03 | 8.99e-03 | 3.00e-03 | 4.37e-03 |
| 5 | 16 | 3.37e-05 | 1.78e-05 | 8.77e-06 | 1.01e-05 | |
| D | 1 | 13 | 1.12e-07 | 8.99e-08 | 1.35e-07 | 1.47e-07 |
| 5 | 13 | 1.50e-07 | 1.82e-07 | 2.09e-07 | 2.29e-07 | |
| E | 1 | 14 | 5.35e-04 | 6.14e-04 | 2.90e-04 | 3.51e-04 |
| 5 | 14 | 1.90e-05 | 1.04e-05 | 5.49e-06 | 4.79e-06 | |
| F | 1 | 37 | 1.14e-05 | 4.49e-05 | 6.02e-06 | 2.16e-05 |
| 5 | 37 | 4.92e-07 | 8.19e-07 | 1.12e-07 | 2.60e-07 | |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsElectromagnetic Scattering and Analysis · Tensor decomposition and applications · Matrix Theory and Algorithms
On the Superfast Multipole Method
Victor Y. Pan
Victor Y. Pan*[1,2],[a]* and John Svadlenka*[2],[c]*
[1] Department of Computer Science
Lehman College of the City University of New York
Bronx, NY 10468 USA
[2] Ph.D. Programs in Computer Science and Mathematics
The Graduate Center of of the City University of New York
New York, NY 10036 USA
http://comet.lehman.cuny.edu/vpan/
Abstract
We call matrix algorithms superfast if they involve much fewer memory cells and flops than the input matrix has entries. Using such algorithms is indispensable for Big Data Mining and Analysis, where the input matrices are so immense that realistically one can only access a small fraction of all their entries. A natural remedy is Low Rank Approximation of these matrices,111Hereafter we use the acronym LRA. which is routinely computed by means of Cross–Approximation222Hereafter we use the acronym C–A. iterations for more than a decade of worldwide application in computational practice. We point out and extensively test an important application of superfast LRA to significant acceleration of the celebrated Fast Multipole Method, which turns it into Supefast Multipole Method.
Keywords:
Low Rank Approximation, Cross–Approximation, Fast Multipole Method.
2000 Math. Subject Classification:
65F30, 68Q25, 15A52
1 Introduction: Superfast LRA
Low rank approximation (hereafter LRA) of a matrix is a fundamental subject of Numerical Linear Algebra and Computer Science. An matrix admits its close approximation333Here and hereafter the concepts “low”, “large”, “small”, “far”, “close”, etc. are defined in context. The inequalities and show that is small in context. of rank if and only if the matrix has numerical rank (then we write ) or equivalently if and only if
[TABLE]
for a small integer , , , a fixed matrix norm , and a small tolerance . Such an LRA approximates the entries of by using entries of and instead of entries of , and one can operate with a low rank matrix, e.g., multiply it by a vector superfast. This is a crucial benefit in applications of LRA to Big Data Mining and Analysis, where the input matrices , e.g., unfolding matrices of multidimensional tensors, are so immense that realistically one can only access a tiny fraction of all their entries. LRA is a natural remedy, and for more than a decade the Cross–Approximation (C-A) iterations have routinely been computing accurate LRA superfast in worldwide computational practice ( cf. [T96], [GTZ97], [T00], [B00], [BR03], [GOSTZ10], [B11], [O18], [OZ18]).
2 An Application – Superfast Multipole Method
Superfast LRA algorithms can be extended to numerous important computational problems linked to LRA. Next we we point out and extensively test a simple but apparently unnoticed application of superfast LRA to significant acceleration of the celebrated Fast Multipole Method (FMM), which turns it into Superfast Multipole Method.
2.1 Fast and Superfast Multipole Method
FMM enables superfast multiplication by a vector of so called HSS matrices provided that low rank generators are available for its off-diagonal blocks. Such generators are not available in some important applications, however (see, e.g, [XXG12], [XXCB14], and [P15]), but C–A algorithms compute them superfast, thus turning FMM into Superfast Multipole Method. Since the method is highly important we supply some details of its bottleneck stage of HSS computations, which we perform superfast by incorporating superfast LRA.
HSS matrices naturally extend the class of banded matrices and their inverses, are closely linked to FMM, and are increasingly popular (see [BGH03], [GH03], [MRT05], [CGS07], [VVGM05], [VVM07/08], [B10], [X12], [XXG12], [EGH13], [X13], [XXCB14], and the bibliography therein).
Definition 1**.**
(Neutered Block Columns. See [MRT05].) With each diagonal block of a block matrix associate its complement in its block column, and call this complement a neutered block column.
Definition 2**.**
(HSS matrices. See [CGS07], [X12], [X13], [XXCB14].)
A block matrix of size is called an -HSS matrix, for a positive integer ,
(i) if all diagonal blocks of this matrix consist of entries overall and
(ii) if is the maximal rank of its neutered block columns.
Remark 3**.**
Many authors work with -HSS (rather than -HSS) matrices for which and are the maximal ranks of the sub- and super-diagonal blocks, respectively. The -HSS and -HSS matrices are closely related. If a neutered block column is the union of a sub-diagonal block and a super-diagonal block , then , and so an -HSS matrix is an -HSS matrix, for , while clearly an -HSS matrix is a -HSS matrix.
The FMM exploits the -HSS structure of a matrix as follows:
(i) Cover all off-block-diagonal entries with a set of non-overlapping neutered block columns.
(ii) Express every neutered block column of this set as the product of two generator matrices, of size and of size . Call the pair a length generator of the neutered block column .
(iii) Multiply the matrix by a vector by separately multiplying generators and diagonal blocks by subvectors, involving flops overall, and
(iv) in a more advanced application of FMM solve a nonsingular -HSS linear system of equations by using flops under some mild additional assumptions on the input.
This approach is readily extended to the same operations with -HSS matrices, that is, matrices approximated by -HSS matrices within a perturbation norm bound where a positive tolerance is small in context (for example, is the unit round-off). Likewise, one defines an -HSS representation and -generators.
-HSS matrices (for small in context) appear routinely in matrix computations, and computations with such matrices are performed efficiently by using the above techniques.
In some applications of the FMM (see [BGP05], [VVVF10]) stage (ii) is omitted because short generators for all neutered block columns are readily available, but this is not the case in a variety of other important applications (see [XXG12], [XXCB14], and [P15]). This stage of the computation of generators is precisely LRA of the neutered block columns, which turns out to be the bottleneck stage of FMM in these applications, and superfast LRA algorithms provide a remedy.
Indeed apply a fast algorithm at this stage, e.g., the algorithm of [HMT11] with a Gaussian multiplier. Multiplication of a matrix by an Gaussian matrix requires flops, while standard HSS-representation of an HSS matrix includes neutered block columns for and . In this case the cost of computing an -HSS representation of the matrix has at least order . For , this is much greater than flops, used at the other stages of the computations.
We alleviate such a problem, however, when we compute LRA of -generators by applying superfast algorithms.
3 Computation of LRAs for benchmark HSS matrices
In this section, the contribution of the secind author, we cover our tests of the Superfast Multipole Method where we applied C–A iterations in order to compute LRA of the generators of the off-diagonal blocks of HSS matrices. Namely we tested HSS matrices that approximate Cauchy-like matrices derived from benchmark Toeplitz matrices B, C, D, E, and F of [XXG12, Section 5]. For the computation of LRA we applied the algorithm of [GOSTZ10].
Table 1 displays the relative errors of the approximation of the HSS input matrices in the spectral and Chebyshev norms averaged over 100 tests. Each approximation was obtained by means of combining the exact diagonal blocks and LRA of the off-diagonal blocks. We computed LRA of all these blocks superfast.
In good accordance with extnsive empirical evidence about the power of C–A iterations, already the first C–A loop have consistently yielded reasonably close LRA, but our further improvement was achieved in five C–A loops in our tests for all but one of the five families of input matrices.
The reported HSS rank is the larger of the numerical ranks for the off-diagonal blocks. This HSS rank was used as an upper bound in our binary search that determined the numerical rank of each off-diagonal block for the purpose of computing its LRA. We based the binary search on minimizing the difference (in the spectral norm) between each off-diagonal block and its LRA.
The output error norms were quite low. Even in the case of the matrix C, obtained from Prolate Toeplitz matrices – extremely ill-conditioned, they ranged from to .
We have also performed further numerical experiments on all the HSS input matrices by using a hybrid LRA algorithm: we used random pre-processing with Gaussian and Hadamard (abridged and permuted) multipliers by incorporating Algorithm 4.1 of [HMT11], but only for the off-diagonal blocks of smaller sizes while retaining our previous way for computing LRA of the larger off-diagonal blocks. We have not displayed the results of these experiments because they yielded no substantial improvement in accuracy in comparison to the exclusive use of the less expensive LRA on all off-diagonal blocks.
Acknowledgements: Our research was supported by NSF Grants CCF–1116736, CCF–1563942, and CCF–133834 and PSC CUNY Award 69813 00 48.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[B 00] M. Bebendorf, Approximation of Boundary Element Matrices, Numer. Math. , 86, 4 , 565–589, 2000.
- 2[B 10] S. Börm, Efficient Numerical Methods for Non-local Operators: ℋ 2 superscript ℋ 2 \mathcal{H}^{2} -Matrix Compression, Algorithms and Analysis , European Math. Society, 2010.
- 3[B 11] M. Bebendorf, Adaptive Cross Approximation of Multivariate Functions, Constructive approximation , 34, 2 , 149–179, 2011.
- 4[BGH 03] S. Börm, L. Grasedyck, W. Hackbusch, Introduction to Hierarchical Matrices with Applications, Engineering Analysis with Boundary Elements , 27, 5 , 405–422, 2003.
- 5[BGP 05] A. Bini, L. Gemignani, V. Y. Pan, Fast and Stable QR Eigenvalue Algorithms for Generalized Semiseparable Matrices and Secular Equation, Numerische Mathematik , 100, 3 , 373–408, 2005.
- 6[BR 03] M. Bebendorf, S. Rjasanow, Adaptive Low-Rank Approximation of Collocation Matrices, Computing , 70, 1 , 1–24, 2003.
- 7[BY 13] L. A. Barba, R. Yokota, How Will the Fast Multipole Method Fare in Exascale Era? SIAM News , 46 , 6 , 1–3, July/August 2013.
- 8[C 00] B. A. Cipra, The Best of the 20th Century: Editors Name Top 10 Algorithms, SIAM News , 33, 4 , 2, May 16, 2000.
