Revisiting Matrix Product on Master-Worker Platforms
Jack Dongarra, Jean-Francois Pineau, Yves Robert, Zhiao Shi, and Frederic Vivien

TL;DR
This paper develops new parallel matrix multiplication algorithms tailored for heterogeneous master-worker platforms with centralized data and limited memory, focusing on resource selection and communication efficiency.
Contribution
It introduces algorithms for resource selection and communication ordering in heterogeneous master-worker platforms with centralized data and limited memory.
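Resource selection on a star platform can be illustrated with a small steady-state model. The sketch below is an assumption-laden illustration, not the authors' algorithm. It assumes a one-port master that sends to one worker at a time, and that worker i needs 2μ blocks (μ of A, μ of B) for every μ² block updates. It also assumes hypothetical per-worker costs `c` (master time to send one block) and `w` (worker time for one block update). Under these assumptions, maximizing total throughput under the master's port budget is a fractional knapsack, solved greedily by increasing port cost per update. Transfers of C blocks are ignored. The names `Worker` and `select_workers` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    c: float  # hypothetical: master's time to send one block to this worker
    w: float  # hypothetical: worker's time to perform one block update
    mu: int   # hypothetical: side of the mu x mu chunk of C the worker holds


def select_workers(workers):
    """Return steady-state rates (block updates per time unit) of enrolled workers.

    Assumed model: each block update on a worker costs 2*c/mu of the one-port
    master's time. Maximizing the sum of rates under a port budget of 1 is a
    fractional knapsack, so workers are enrolled greedily by increasing cost.
    C-block traffic is ignored in this sketch.
    """
    rates = {}
    port_left = 1.0  # fraction of each time unit the master can still spend sending
    for wk in sorted(workers, key=lambda p: 2 * p.c / p.mu):
        if port_left <= 0:
            break  # master's link is saturated; remaining workers stay idle
        cost = 2 * wk.c / wk.mu  # port time consumed per block update on this worker
        cap = 1 / wk.w  # compute-bound update rate of this worker
        rate = min(cap, port_left / cost)
        rates[wk.name] = rate
        port_left -= rate * cost
    return rates


workers = [
    Worker("P1", c=1.0, w=1.0, mu=4),
    Worker("P2", c=2.0, w=0.5, mu=4),
    Worker("P3", c=4.0, w=1.0, mu=2),
]
print(select_workers(workers))  # {'P1': 1.0, 'P2': 0.5}; P3 is not enrolled
```

In this simplified model, workers are ranked by how much of the master's link they consume per unit of work rather than by raw compute speed. P3 is left out even though it is idle, because the master's port is already saturated by P1 and P2.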
Findings
The proposed algorithms improve resource utilization on heterogeneous platforms.
Numerical experiments on homogeneous platforms demonstrate their effectiveness.
The work focuses on centralized data and limited worker memory.
Abstract
This paper is aimed at designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While matrix-product is well-understood for homogeneous 2D-arrays of processors (e.g., Cannon algorithm and ScaLAPACK outer product algorithm), there are three key hypotheses that render our work original and innovative:
- Centralized data. We assume that all matrix files originate from, and must be returned to, the master.
- Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers.
- Limited memory. Because we investigate the parallelization of large problems, we cannot assume that full matrix panels can be stored in the worker memories and re-used for subsequent updates (as in ScaLAPACK).
We have devised efficient algorithms for resource selection (deciding which workers to…
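The effect of the limited-memory hypothesis on communication can be shown with a small sketch. It assumes a hypothetical worker layout, not necessarily the paper's. In this layout, a worker keeps a μ×μ chunk of C blocks resident and, for each outer-product step k, receives μ blocks of A and μ blocks of B. Double-buffering these blocks gives the memory bound μ² + 4μ ≤ m, with m counted in blocks. Blocks are represented by scalars to keep the code dependency-free; in practice each would be a square submatrix. The function names are illustrative. Only A and B blocks are counted; moving the C chunk to and from the master is not.

```python
from math import isqrt


def largest_mu(m: int) -> int:
    """Largest mu with mu**2 + 4*mu <= m (assumed layout: mu^2 C blocks
    plus double-buffered mu A blocks and mu B blocks)."""
    # mu^2 + 4*mu <= m  <=>  (mu + 2)^2 <= m + 4
    return max(isqrt(m + 4) - 2, 0)


def chunked_product(A, B, mu):
    """C = A * B over t x t 'blocks' (scalars here), updating C one
    mu x mu chunk at a time. Returns (C, number of A/B blocks sent)."""
    t = len(A)
    assert t % mu == 0, "this sketch assumes mu divides t"
    C = [[0.0] * t for _ in range(t)]
    sent = 0
    for I in range(0, t, mu):
        for J in range(0, t, mu):
            # The worker keeps chunk C[I:I+mu][J:J+mu] resident for all k.
            for k in range(t):
                a_col = [A[I + i][k] for i in range(mu)]  # mu blocks of A
                b_row = [B[k][J + j] for j in range(mu)]  # mu blocks of B
                sent += 2 * mu
                for i in range(mu):
                    for j in range(mu):  # mu^2 block updates per 2*mu blocks received
                        C[I + i][J + j] += a_col[i] * b_row[j]
    return C, sent


t = 6
A = [[i + j for j in range(t)] for i in range(t)]
B = [[i - j for j in range(t)] for i in range(t)]
mu = largest_mu(21)  # mu = 3, since 3**2 + 4*3 = 21
C, sent = chunked_product(A, B, mu)
print(mu, sent, t**3)  # 3 144 216: 2/mu blocks sent per block update
```

Under this layout, the master sends 2μ blocks for every μ² updates, so the communication-to-computation ratio is 2/μ. A worker with more memory, and hence a larger μ, needs proportionally less bandwidth.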
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scheduling and Optimization Algorithms · Cloud Computing and Resource Management
