Hybrid static/dynamic scheduling for already optimized dense matrix factorization
Simplice Donfack, Laura Grigori, William D. Gropp, Vivek Kale

TL;DR
This paper introduces a hybrid static/dynamic scheduling strategy for dense matrix factorization that balances data locality against load balance and outperforms fully static scheduling, fully dynamic scheduling, and the corresponding library routines.
Contribution
The paper proposes a hybrid static/dynamic method for scheduling the task dependency graph of communication-avoiding LU factorization (CALU). The hybrid version outperforms the fully static and fully dynamic versions of CALU, as well as the corresponding LU routines in existing libraries.
Findings
Up to 64% improvement over fully dynamic CALU on a 48-core AMD Opteron NUMA machine.
Up to 30% improvement over fully static CALU on the same AMD machine.
Speedups of up to 110% over MKL and 82% over PLASMA.
Abstract
We present the use of a hybrid static/dynamic scheduling strategy of the task dependency graph for direct methods used in dense numerical linear algebra. This strategy provides a balance of data locality, load balance, and low dequeue overhead. We show that the use of this scheduling in communication-avoiding dense factorization leads to significant performance gains. On a 48-core AMD Opteron NUMA machine, our experiments show that we can achieve up to 64% improvement over a version of CALU that uses fully dynamic scheduling, and up to 30% improvement over the version of CALU that uses fully static scheduling. On a 16-core Intel Xeon machine, our hybrid static/dynamic scheduling approach is up to 8% faster than the version of CALU that uses fully static or fully dynamic scheduling. Our algorithm leads to speedups over the corresponding routines for computing LU…
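The general idea of mixing the two scheduling styles can be illustrated with a minimal Python sketch. It is not the authors' CALU implementation. It ignores task dependencies entirely, and the names `split_tasks`, `run_hybrid`, and `static_fraction`, as well as the round-robin ownership rule, are assumptions made only for illustration. Each worker first runs a fixed, pre-assigned share of the tasks (which favors data locality and needs no dequeue), then pulls the remaining tasks from a shared queue (which evens out load imbalance).

```python
import queue
import threading


def split_tasks(tasks, num_workers, static_fraction):
    """Partition tasks into per-worker static lists and one shared dynamic queue.

    static_fraction is a hypothetical tuning knob: the share of tasks
    that get a fixed owner before execution starts.
    """
    n_static = int(len(tasks) * static_fraction)
    static_lists = [[] for _ in range(num_workers)]
    for i, task in enumerate(tasks[:n_static]):
        # Assumed ownership rule: round-robin. A fixed owner means the
        # same worker always touches the same data.
        static_lists[i % num_workers].append(task)
    dynamic = queue.Queue()
    for task in tasks[n_static:]:
        dynamic.put(task)
    return static_lists, dynamic


def run_hybrid(tasks, num_workers=4, static_fraction=0.8, work=lambda t: t * t):
    """Run independent tasks: static part first, then the shared dynamic part."""
    static_lists, dynamic = split_tasks(tasks, num_workers, static_fraction)
    results = {}
    lock = threading.Lock()

    def record(task, worker_id):
        value = work(task)
        with lock:
            results[task] = (worker_id, value)

    def worker(worker_id):
        # Static phase: no queue access, so no dequeue overhead.
        for task in static_lists[worker_id]:
            record(task, worker_id)
        # Dynamic phase: whoever is idle takes the next available task.
        while True:
            try:
                task = dynamic.get_nowait()
            except queue.Empty:
                return
            record(task, worker_id)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_hybrid(list(range(20)))` returns a dictionary mapping each task to the worker that ran it and the computed value. Setting `static_fraction` to 1.0 or 0.0 degenerates into a fully static or fully dynamic schedule, the two baselines the abstract compares against.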
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Matrix Theory and Algorithms
