A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization with Partial Pivoting
Sandra Catalán, José R. Herrero, Enrique S. Quintana-Ortí, Rafael Rodríguez-Sánchez, Robert van de Geijn

TL;DR
This paper introduces two novel load-balancing techniques, worker sharing (WS) and early termination (ET), for thread-level LU factorization with look-ahead, improving performance on multi-core processors.
Contribution
It presents a new malleable thread-level implementation of BLAS and demonstrates how to effectively balance load during LU factorization with look-ahead.
Findings
Combining WS and ET achieves performance competitive with task-parallel solutions.
The techniques reduce load imbalance during dense matrix factorizations.
Experimental results on Intel Xeon show performance improvements.
Abstract
We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target the scenario where two thread teams are created/activated during the factorization, with each team in charge of performing an independent task/branch of execution. The first technique promotes worker sharing (WS) between the two tasks, allowing the threads of the task that completes first to be reallocated for use by the costlier task. The second technique allows a fast task to alert the slower task of completion, enforcing the early termination (ET) of the second task, and a smooth transition of the factorization procedure into the next iteration. The two mechanisms are instantiated via a new malleable thread-level implementation of the Basic Linear Algebra…
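The control flow of the two mechanisms can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the paper's implementation: the function `run_iteration`, its parameters, and the chunked work queue are all hypothetical, the numerical work is replaced by bookkeeping, and Python threads only show the coordination pattern, not the performance behavior of a malleable BLAS. One thread stands in for the panel-factorization team and several threads for the trailing-update team.

```python
import queue
import threading


def run_iteration(panel_cols, update_chunks, update_team=3):
    """Toy model of one look-ahead iteration; all names are hypothetical."""
    update_done = threading.Event()   # ET flag, raised once the update is complete
    chunks = queue.Queue()            # pending pieces of the trailing update
    for c in range(update_chunks):
        chunks.put(c)
    factored = []                     # panel columns factored this iteration
    updated = []                      # (chunk, worker) pairs, for inspection
    lock = threading.Lock()

    def drain_update(worker):
        # Malleable loop: any thread may join at any time and take chunks.
        while True:
            try:
                c = chunks.get_nowait()
            except queue.Empty:
                return
            with lock:
                updated.append((c, worker))

    def panel_task():
        for col in range(panel_cols):
            if update_done.is_set():  # ET: stop at a column boundary
                break
            factored.append(col)      # stand-in for factoring one column
        drain_update("panel")         # WS: help with whatever update remains

    panel = threading.Thread(target=panel_task)
    team = [threading.Thread(target=drain_update, args=(f"update-{i}",))
            for i in range(update_team)]
    for t in [panel] + team:
        t.start()
    for t in team:
        t.join()
    update_done.set()                 # update team finished: alert the panel task
    panel.join()
    return factored, updated
```

Calling `run_iteration(panel_cols=8, update_chunks=50)` returns the columns the panel task managed to factor and the list of processed update chunks. Which task finishes first depends on thread scheduling, but two properties hold in every run: each update chunk is processed exactly once, and the factored columns always form a leading prefix of the panel, which is what lets the next iteration proceed with a narrower panel after an early termination.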
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
