A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack Dongarra

TL;DR
This paper introduces a class of parallel tiled algorithms for linear algebra factorizations on multicore architectures, enabling fine-grain parallelism and dynamic task scheduling to improve performance over traditional LAPACK methods.
Contribution
It presents a novel task-based parallel algorithm framework for Cholesky, LU, and QR factorizations that exploits fine-grain parallelism and dynamic scheduling on multicore systems.
Findings
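The tiled algorithms outperform the corresponding LAPACK implementations on multicore systems.
Dynamic scheduling permits out-of-order execution of tasks, which hides the intrinsically sequential tasks in the factorization.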
Demonstrates effective use of loose synchronization in parallel algorithms.
Abstract
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated, or new algorithms have to be developed, in order to take advantage of the architectural features of these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the Cholesky, LU and QR factorizations in which the operations can be represented as a sequence of small tasks that operate on square blocks of data. These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks, which will completely hide the presence of intrinsically sequential tasks in the factorization. Performance…
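The idea of small tasks on square blocks, scheduled by their dependencies, can be sketched for the Cholesky case. The sketch below is illustrative only and is not the paper's implementation. The function names (`cholesky_tasks`, `dependencies`, `schedule`) are hypothetical. The kernel labels POTRF, TRSM, SYRK and GEMM are the standard LAPACK/BLAS names, used here on the assumption that the factorization follows the usual right-looking tiled decomposition. Unit-time tasks and a greedy ready-list scheduler are simplifying assumptions. No numerical work is done; only the task graph is built and scheduled.

```python
from collections import defaultdict


def cholesky_tasks(nt):
    """List the tile kernels of a right-looking tiled Cholesky in sequential order.

    Each task is (name, reads, writes); tiles are (row, col) index pairs
    in an nt x nt grid of square tiles. Kernel names are labels only.
    """
    tasks = []
    for k in range(nt):
        # Factor the diagonal tile (the intrinsically sequential step).
        tasks.append((f"POTRF({k})", [], [(k, k)]))
        # Triangular solves on the tiles below the diagonal tile.
        for i in range(k + 1, nt):
            tasks.append((f"TRSM({i},{k})", [(k, k)], [(i, k)]))
        # Update the trailing submatrix tile by tile.
        for i in range(k + 1, nt):
            tasks.append((f"SYRK({i},{k})", [(i, k)], [(i, i)]))
            for j in range(k + 1, i):
                tasks.append((f"GEMM({i},{j},{k})", [(i, k), (j, k)], [(i, j)]))
    return tasks


def dependencies(tasks):
    """Return deps, where deps[t] is the set of earlier tasks that t must wait for."""
    last_writer = {}
    readers = defaultdict(list)  # readers of each tile since its last write
    deps = []
    for t, (_, reads, writes) in enumerate(tasks):
        d = set()
        for tile in reads:  # read-after-write
            if tile in last_writer:
                d.add(last_writer[tile])
        for tile in writes:  # write-after-write and write-after-read
            if tile in last_writer:
                d.add(last_writer[tile])
            d.update(readers[tile])
        for tile in reads:
            readers[tile].append(t)
        for tile in writes:
            last_writer[tile] = t
            readers[tile] = []
        deps.append(d)
    return deps


def schedule(tasks, deps, workers):
    """Greedy list scheduling with unit-time tasks (assumes workers >= 1).

    Returns a list of time steps, each a list of task indices run together.
    """
    done = set()
    pending = list(range(len(tasks)))
    steps = []
    while pending:
        # A task is ready once every task it depends on has completed.
        ready = [t for t in pending if deps[t] <= done]
        batch = ready[:workers]
        steps.append(batch)
        done.update(batch)
        chosen = set(batch)
        pending = [t for t in pending if t not in chosen]
    return steps


if __name__ == "__main__":
    tasks = cholesky_tasks(4)
    deps = dependencies(tasks)
    for p in (1, 2, 4):
        print(p, "workers:", len(schedule(tasks, deps, p)), "steps")
```

With one worker the number of steps equals the number of tasks. With more workers, the step count shrinks toward the length of the longest dependency chain, which is the sense in which a dependency-driven schedule can overlap the sequential diagonal factorizations with other work.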
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Matrix Theory and Algorithms
