An Optimized Sparse Approximate Matrix Multiply for Matrices with Decay

Nicolas Bock; Matt Challacombe

arXiv:1203.1692·cs.NA·September 5, 2012

TL;DR

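This paper introduces an optimized single-precision implementation of the Sparse Approximate Matrix Multiply (SpAMM), an $O(n \log n)$ algorithm for multiplying matrices with decay. For dense quantum chemical matrices it reaches a lower max-norm error than SGEMM at sufficiently tight tolerances, and it is faster than SGEMM beyond a cross-over near $n \sim 1000$.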

Contribution

The paper presents an optimized single-precision implementation of the SpAMM algorithm that outperforms the dense SGEMM routine beyond a cross-over near $n \sim 1000$ and is significantly faster than naive SpAMM implementations built on MKL or ACML.

Findings

01

Achieves $O(n \log n)$ complexity for matrices with decay.

02

Outperforms the dense routine SGEMM in speed for matrices as small as $n \sim 1000$, with a lower max-norm error when the SpAMM tolerance is below $2 \times 10^{-8}$.

03

Potential hardware prefetch improvements could further double or triple the speed.

Abstract

We present an optimized single-precision implementation of the Sparse Approximate Matrix Multiply (SpAMM) [M. Challacombe and N. Bock, arXiv:1011.3534 (2010)], a fast algorithm for matrix-matrix multiplication for matrices with decay that achieves an $O(n \log n)$ computational complexity with respect to matrix dimension $n$. We find that the max norm of the error achieved with a SpAMM tolerance below $2 \times 10^{-8}$ is lower than that of the single-precision SGEMM for dense quantum chemical matrices, while outperforming SGEMM with a cross-over already for small matrices ($n \sim 1000$). Relative to naive implementations of SpAMM using Intel's Math Kernel Library (MKL) or AMD's Core Math Library (ACML), our optimized version is found to be significantly faster. Detailed performance comparisons are made for quantum chemical matrices…
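To make the role of the SpAMM tolerance concrete, here is a minimal pure-Python sketch of the recursive idea as described in the cited SpAMM paper: the product is split into quadrant sub-products, and any sub-product whose Frobenius norms multiply to less than the tolerance is skipped. This is an illustration only, not the authors' optimized implementation. The names `spamm`, `block_norm`, `dense_matmul`, and `decay_matrix`, the leaf size of 2, and the exponentially decaying test matrix are all assumptions made for the example. It uses double-precision Python floats rather than single precision, recomputes block norms on every call instead of storing them in a tree, and assumes the matrix dimension is a power of two.

```python
import math


def block_norm(M, r0, c0, n):
    """Frobenius norm of the n-by-n block of M with top-left corner (r0, c0)."""
    return math.sqrt(sum(M[r0 + i][c0 + j] ** 2
                         for i in range(n) for j in range(n)))


def spamm(A, B, C, tau, i0=0, k0=0, j0=0, n=None, leaf=2):
    """Accumulate A[i0:, k0:] * B[k0:, j0:] (n-by-n blocks) into C[i0:, j0:].

    A sub-product is skipped when the product of the two block norms is
    below tau. Returns the number of leaf products actually computed.
    Assumes square matrices whose dimension is a power of two (illustrative).
    """
    if n is None:
        n = len(A)
    # Norms are recomputed here for brevity; a practical code would cache them.
    if block_norm(A, i0, k0, n) * block_norm(B, k0, j0, n) < tau:
        return 0  # contribution is bounded by tau in norm, so drop it
    if n <= leaf:
        # Dense multiply on the leaf block.
        for i in range(n):
            for k in range(n):
                a = A[i0 + i][k0 + k]
                for j in range(n):
                    C[i0 + i][j0 + j] += a * B[k0 + k][j0 + j]
        return 1
    h = n // 2
    computed = 0
    # Eight quadrant sub-products: C_ij += A_ik * B_kj.
    for di in (0, h):
        for dj in (0, h):
            for dk in (0, h):
                computed += spamm(A, B, C, tau,
                                  i0 + di, k0 + dk, j0 + dj, h, leaf)
    return computed


def dense_matmul(A, B):
    """Reference triple-loop product, used only to measure the error."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def decay_matrix(n, base=0.1):
    """Hypothetical test matrix with exponential decay away from the diagonal."""
    return [[base ** abs(i - j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    n, tau = 8, 1e-3
    A = decay_matrix(n)
    C = [[0.0] * n for _ in range(n)]
    computed = spamm(A, A, C, tau)
    exact = dense_matmul(A, A)
    err = max(abs(C[i][j] - exact[i][j]) for i in range(n) for j in range(n))
    print("leaf products computed:", computed, "of", (n // 2) ** 3)
    print("max-norm error:", err)
```

With a tolerance of zero nothing is skipped and the result matches the dense product. Raising the tolerance trades accuracy for fewer leaf products, which is the trade-off the abstract quantifies against SGEMM.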
