Hybrid static/dynamic scheduling for already optimized dense matrix factorization

Simplice Donfack; Laura Grigori; William D. Gropp; Vivek Kale

arXiv:1110.2677 · cs.DC · October 13, 2011 · 1 citation

TL;DR

This paper introduces a hybrid static/dynamic scheduling strategy for dense matrix factorization that balances data locality, load balance, and dequeue overhead, and that outperforms fully static scheduling, fully dynamic scheduling, and existing library routines.

Contribution

The paper proposes a hybrid static/dynamic scheduling method for the task dependency graph of communication-avoiding dense factorization (CALU) and reports performance gains over fully static CALU, fully dynamic CALU, and the corresponding LU routines in MKL and PLASMA.

Findings

01

Up to 64% performance improvement over fully dynamic CALU on a 48-core AMD Opteron NUMA machine.

02

Up to 30% improvement over fully static CALU on a 48-core AMD Opteron NUMA machine.

03

Speedups of up to 110% over the corresponding LU routine in MKL and up to 82% over the one in PLASMA.

Abstract

We present the use of a hybrid static/dynamic scheduling strategy of the task dependency graph for direct methods used in dense numerical linear algebra. This strategy provides a balance of data locality, load balance, and low dequeue overhead. We show that using this scheduling in communication-avoiding dense factorization leads to significant performance gains. On a 48-core AMD Opteron NUMA machine, our experiments show that we can achieve up to 64% improvement over a version of CALU that uses fully dynamic scheduling, and up to 30% improvement over the version of CALU that uses fully static scheduling. On a 16-core Intel Xeon machine, our hybrid static/dynamic scheduling approach is up to 8% faster than the versions of CALU that use fully static or fully dynamic scheduling. Our algorithm leads to speedups over the corresponding routines for computing LU…
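
The idea of mixing static and dynamic scheduling can be illustrated with a minimal sketch. It is not the paper's implementation: it ignores task dependencies, uses independent callables in place of factorization tasks, and every name in it (`split_tasks`, `run_hybrid`, `static_fraction`) is hypothetical. A leading fraction of the tasks is pre-assigned round-robin to per-worker private queues (no locking, predictable placement), and the remainder goes to one shared queue that idle workers drain (load balance, at the cost of a lock on each dequeue).

```python
import threading
from collections import deque


def split_tasks(num_tasks, num_workers, static_fraction):
    """Partition task indices into per-worker static queues and one shared queue.

    Assumption for illustration: the first `static_fraction` of the tasks is
    statically assigned round-robin; the rest is scheduled dynamically.
    """
    n_static = int(num_tasks * static_fraction)
    static_queues = [[] for _ in range(num_workers)]
    for idx in range(n_static):
        static_queues[idx % num_workers].append(idx)
    dynamic_queue = deque(range(n_static, num_tasks))
    return static_queues, dynamic_queue


def run_hybrid(tasks, num_workers=4, static_fraction=0.5):
    """Run independent callables with a hybrid static/dynamic assignment.

    Returns (results, executed_by), where executed_by[i] is a tuple
    ("static" or "dynamic", worker_id) describing how task i was scheduled.
    """
    static_queues, dynamic_queue = split_tasks(
        len(tasks), num_workers, static_fraction
    )
    lock = threading.Lock()
    results = [None] * len(tasks)
    executed_by = [None] * len(tasks)

    def worker(wid):
        # Static phase: the queue is private to this worker, so no lock is needed.
        for idx in static_queues[wid]:
            results[idx] = tasks[idx]()
            executed_by[idx] = ("static", wid)
        # Dynamic phase: pull remaining work from the shared queue under a lock.
        while True:
            with lock:
                if not dynamic_queue:
                    return
                idx = dynamic_queue.popleft()
            results[idx] = tasks[idx]()
            executed_by[idx] = ("dynamic", wid)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, executed_by


if __name__ == "__main__":
    demo_tasks = [lambda i=i: i * i for i in range(10)]
    results, executed_by = run_hybrid(demo_tasks, num_workers=3, static_fraction=0.5)
    print(results)
    print(executed_by)
```

In this sketch, `static_fraction=1.0` degenerates to a fully static assignment and `static_fraction=0.0` to a fully dynamic one, which mirrors the two baselines the abstract compares against. Which worker runs each dynamic task varies from run to run.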

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Matrix Theory and Algorithms