Developing a High Performance Software Library with MPI and CUDA for Matrix Computations
Bogdan Oancea, Tudorel Andrei

TL;DR
This paper presents a high-performance linear algebra library that combines MPI and CUDA to efficiently solve large linear systems on heterogeneous CPU-GPU clusters, demonstrating significant speedups over CPU-only implementations.
Contribution
The paper introduces a novel hybrid MPI and CUDA-based library for solving large linear systems, integrating direct and iterative methods for improved performance.
Findings
MPI/CUDA implementation outperforms CPU-only programs
Hybrid approach enables efficient large-scale computations
Library supports LU, Cholesky, and iterative solvers
Abstract
The paradigm of parallel computing is changing. CUDA is now a popular programming model for general-purpose computation on GPUs, and a great number of applications have been ported to CUDA, obtaining speedups of orders of magnitude compared to optimized CPU implementations. Hybrid approaches that combine the message-passing model with the shared-memory model for parallel computing are a solution for very large applications. We considered a heterogeneous cluster that combines CPU and GPU computations, using MPI and CUDA to develop a high-performance linear algebra library. Our library focuses on solvers for large linear systems, because such systems are a common problem in science and engineering. Direct methods for computing the solution of such systems can be very expensive due to high memory requirements and computational cost. An efficient alternative are iterative…
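To make the contrast between direct and iterative solvers concrete, here is a minimal sketch of one iterative method, conjugate gradient, in plain Python. The abstract is cut off before it names the iterative methods the library provides, so the choice of conjugate gradient is an assumption made only for illustration, as are the function names `matvec`, `dot`, and `conjugate_gradient`. The sketch assumes a symmetric positive definite matrix and runs on a single CPU; it does not reproduce the paper's MPI/CUDA code.

```python
# Illustrative sketch only: conjugate gradient for a symmetric positive
# definite system A x = b. All names here are hypothetical and are not
# taken from the library described in the paper.

def matvec(A, x):
    """Dense matrix-vector product; the only operation that touches A."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Approximate the solution of A x = b without factorizing A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:   # stop once the residual norm is small
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small symmetric positive definite example.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

Each iteration needs only a matrix-vector product and a few vector updates, so the working memory beyond the matrix itself is a handful of vectors. A direct method such as LU or Cholesky must also store the factors. In a hybrid setting, the matrix-vector product and the inner products are the natural candidates for distribution across nodes and offloading to GPUs.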
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Matrix Theory and Algorithms · Numerical Methods and Algorithms
