Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms
Jean-Noel Quintin, Khalid Hasanov, Alexey Lastovetsky

TL;DR
This paper presents HSUMMA, a hierarchical parallel matrix multiplication algorithm that significantly reduces communication costs on large-scale distributed memory platforms, outperforming SUMMA, especially at higher core counts.
Contribution
The paper introduces HSUMMA, a novel hierarchical redesign of SUMMA, to lower communication costs in parallel matrix multiplication on large-scale systems.
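The hierarchical idea can be illustrated on a single broadcast, such as SUMMA sending a panel of A along one process row. The sketch below assumes, as a simplification, that the q processes of that row are split into g equal groups and that the data travels first between groups and then inside each group. The function names, the equal-group layout and the choice of the root's in-group position for the group leaders are hypothetical illustrations, not the paper's exact scheme. The sketch only builds the two-phase send plan and lets you check that every process is reached.

```python
def hierarchical_bcast_schedule(q, g, root=0):
    """Hypothetical two-phase broadcast plan over processes 0..q-1 split into g equal groups.

    Returns (outer, inner); each is a list of (sender, [receivers]) pairs.
    Phase 1 (outer): root sends to the process at the same position in every other group.
    Phase 2 (inner): each of those g processes forwards the data inside its own group.
    """
    assert q % g == 0, "this sketch assumes equal-sized groups"
    s = q // g                      # processes per group
    pos = root % s                  # root's position inside its group
    leaders = [grp * s + pos for grp in range(g)]
    outer = [(root, [p for p in leaders if p != root])]
    inner = []
    for leader in leaders:
        start = leader - pos        # first process of this leader's group
        inner.append((leader, [p for p in range(start, start + s) if p != leader]))
    return outer, inner


def receivers(schedule):
    """All processes that receive data in either phase, sorted."""
    outer, inner = schedule
    got = []
    for _, dst in outer + inner:
        got.extend(dst)
    return sorted(got)


outer, inner = hierarchical_bcast_schedule(8, 2, root=5)
print(outer)  # [(5, [1])]
print(inner)  # [(1, [0, 2, 3]), (5, [4, 6, 7])]
```

Both g = 1 and g = q reduce this plan to a single flat broadcast, so the two-level structure only matters for intermediate group counts.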
Findings
Up to 2.08x reduction in communication cost on 2048 cores.
Up to 5.89x reduction in communication cost on 16384 cores.
Experiments conducted on an IBM BlueGene-P, with the gain growing as the core count increases.
Abstract
Matrix multiplication is a very important computation kernel, both in its own right as a building block of many scientific applications and as a popular representative of other scientific applications. Cannon's algorithm, which dates back to 1969, was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However, this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm, as it can also be used on a non-square number of processors. Since then, the number of processors in HPC platforms has increased by two orders of magnitude, making the contribution of communication to the overall execution time more significant. Therefore, the state-of-the-art parallel matrix multiplication algorithms should be revisited to reduce the…
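SUMMA's freedom from a square process grid can be seen in a small serial simulation. The sketch below assumes a 2D block distribution of C over a pr x pc grid and models each per-panel broadcast (a panel of A along a process row, a panel of B along a process column) as every process in that row or column reading the same slice. The names summa and block_range and the panel width nb are hypothetical; no real message passing (e.g. MPI) is involved.

```python
def block_range(n, parts, idx):
    """Half-open range [lo, hi) of block idx when n items are split into parts blocks."""
    base, extra = divmod(n, parts)
    lo = idx * base + min(idx, extra)
    hi = lo + base + (1 if idx < extra else 0)
    return lo, hi


def summa(A, B, pr, pc, nb=1):
    """Serially simulate SUMMA for C = A * B on a pr x pc process grid (any pr, pc)."""
    m, k, n = len(A), len(B), len(B[0])
    # Local C block of each process (i, j), initialised to zeros.
    C_blocks = {}
    for i in range(pr):
        r0, r1 = block_range(m, pr, i)
        for j in range(pc):
            c0, c1 = block_range(n, pc, j)
            C_blocks[i, j] = [[0.0] * (c1 - c0) for _ in range(r1 - r0)]
    # Sweep the inner dimension in panels of width nb.
    for p0 in range(0, k, nb):
        p1 = min(p0 + nb, k)
        for i in range(pr):
            r0, r1 = block_range(m, pr, i)
            # Row broadcast: every process in grid row i receives A[r0:r1, p0:p1].
            a_panel = [row[p0:p1] for row in A[r0:r1]]
            for j in range(pc):
                c0, c1 = block_range(n, pc, j)
                # Column broadcast: every process in grid column j receives B[p0:p1, c0:c1].
                b_panel = [row[c0:c1] for row in B[p0:p1]]
                Cij = C_blocks[i, j]
                # Local rank-(p1 - p0) update: Cij += a_panel * b_panel.
                for r in range(r1 - r0):
                    for c in range(c1 - c0):
                        Cij[r][c] += sum(a_panel[r][t] * b_panel[t][c] for t in range(p1 - p0))
    # Gather the distributed blocks into one matrix for checking.
    C = [[0.0] * n for _ in range(m)]
    for (i, j), blk in C_blocks.items():
        r0, _ = block_range(m, pr, i)
        c0, _ = block_range(n, pc, j)
        for r, row in enumerate(blk):
            for c, v in enumerate(row):
                C[r0 + r][c0 + c] = v
    return C
```

For example, `summa(A, B, 2, 3)` runs on a non-square 2 x 3 grid of six simulated processes and returns the same product as a plain triple loop; Cannon's algorithm would need a square grid for the same job.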
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Interconnection Networks and Systems
