Programming Parallel Dense Matrix Factorizations with Look-Ahead and OpenMP
Sandra Catalán, Adrián Castelló, Francisco D. Igual, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí

TL;DR
This paper presents a novel parallelization strategy for dense matrix factorizations using OpenMP, employing static look-ahead and cache-aware techniques to improve performance on multicore systems.
Contribution
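It embeds a static look-ahead technique directly into DMF algorithms, departing from both the conventional multithreaded-BLAS approach and runtime-assisted task-based implementations, in order to overcome the panel-factorization bottleneck on multicore processors.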
Findings
Achieves high performance with explicit look-ahead and cache-aware BLAS
Simplifies implementation of high-performance LAPACK routines
Effective on various multicore platforms
Abstract
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multithreaded version of BLAS. This approach is also different from the more sophisticated runtime-assisted implementations, which decompose the operation into tasks and identify dependencies via directives and runtime support. Instead, our strategy attains high performance by explicitly embedding a static look-ahead technique into the DMF code, in order to overcome the performance bottleneck of the panel factorization, and realizing the trailing update via a cache-aware multithreaded implementation of the BLAS. Although the parallel algorithms are specified with a high level of abstraction, the actual implementation can be easily derived from them, paving the road to deriving a high…
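To make the idea of static look-ahead concrete, the following is a minimal sketch in C with OpenMP, not the authors' implementation. It assumes a row-major matrix and an LU factorization without pivoting (a simplification that is only safe for matrices such as diagonally dominant ones), and it replaces the cache-aware multithreaded BLAS with plain loops. The names `factor_panel`, `update_cols`, `lu_lookahead`, and `lu_residual` are hypothetical. At each iteration, one branch updates and factors the next panel while the other branch applies the current panel to the remaining trailing columns, so the panel factorization is taken off the critical path.

```c
#include <assert.h>

/* Unblocked LU (no pivoting) of the panel A[k0:n, k0:k1), in place.
   Assumption: the matrix is such that no pivot is zero. */
static void factor_panel(double *A, int n, int k0, int k1) {
    for (int k = k0; k < k1; ++k) {
        assert(A[k * n + k] != 0.0);
        for (int i = k + 1; i < n; ++i) {
            A[i * n + k] /= A[k * n + k];              /* multiplier L(i,k) */
            for (int j = k + 1; j < k1; ++j)
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
        }
    }
}

/* Apply the already factored panel [k0,k1) to columns [c0,c1).
   This fuses the triangular solve and the matrix-multiply update. */
static void update_cols(double *A, int n, int k0, int k1, int c0, int c1) {
    for (int k = k0; k < k1; ++k)
        for (int i = k + 1; i < n; ++i)
            for (int j = c0; j < c1; ++j)
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
}

/* Blocked LU with a look-ahead of depth one and block size b. */
void lu_lookahead(double *A, int n, int b) {
    assert(n > 0 && b > 0);
    factor_panel(A, n, 0, b < n ? b : n);              /* first panel */
    for (int k0 = 0; k0 + b < n; k0 += b) {
        int k1 = k0 + b;                               /* start of next panel */
        int k2 = (k1 + b < n) ? k1 + b : n;            /* end of next panel */
        #pragma omp parallel sections
        {
            #pragma omp section
            {   /* look-ahead branch: update and factor the next panel */
                update_cols(A, n, k0, k1, k1, k2);
                factor_panel(A, n, k1, k2);
            }
            #pragma omp section
            {   /* remainder branch: update the rest of the trailing matrix */
                update_cols(A, n, k0, k1, k2, n);
            }
        }
    }
}

/* Largest entry of |L*U - A0|, with L unit lower triangular and U upper
   triangular, both stored in LU. */
double lu_residual(const double *LU, const double *A0, int n) {
    double worst = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            int m = i < j ? i : j;
            for (int k = 0; k <= m; ++k)
                s += (k == i ? 1.0 : LU[i * n + k]) * LU[k * n + j];
            double d = s - A0[i * n + j];
            if (d < 0.0) d = -d;
            if (d > worst) worst = d;
        }
    return worst;
}
```

The two sections write to disjoint column ranges and only read the current panel, so they can run concurrently without synchronization inside an iteration. In this sketch each branch runs on a single thread; a production code would instead give the remainder branch a team of threads and a tuned BLAS, which is the role the abstract assigns to the cache-aware multithreaded BLAS. Compiled without OpenMP support, the pragmas are ignored and the code runs sequentially with the same result.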
