Developing a High Performance Software Library with MPI and CUDA for Matrix Computations

Bogdan Oancea; Tudorel Andrei

arXiv:1511.07174 · cs.DC · February 9, 2018 · 2 cites

Open Access

TL;DR

This paper presents a high-performance linear algebra library that combines MPI and CUDA to efficiently solve large linear systems on heterogeneous CPU-GPU clusters, demonstrating significant speedups over CPU-only implementations.

Contribution

The paper introduces a novel hybrid MPI and CUDA-based library for solving large linear systems, integrating direct and iterative methods for improved performance.

Findings

01. MPI/CUDA implementation outperforms CPU-only programs

02. Hybrid approach enables efficient large-scale computations

03. Library supports LU, Cholesky, and iterative solvers

Abstract

Nowadays, the paradigm of parallel computing is changing. CUDA is now a popular programming model for general-purpose computation on GPUs, and a great number of applications have been ported to CUDA, obtaining speedups of orders of magnitude compared to optimized CPU implementations. Hybrid approaches that combine the message-passing model with the shared-memory model for parallel computing are a solution for very large applications. We considered a heterogeneous cluster that combines CPU and GPU computations using MPI and CUDA to develop a high-performance linear algebra library. Our library focuses on solvers for large linear systems because such systems are a common problem in science and engineering. Direct methods for computing the solution of such systems can be very expensive due to high memory requirements and computational cost. An efficient alternative are iterative…
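
To make the contrast between direct and iterative solvers concrete, here is a minimal single-process sketch of one well-known iterative method, the conjugate gradient method for symmetric positive definite systems. It is an illustration only: the visible abstract does not say which iterative methods the library implements, and the function names (`matvec`, `dot`, `conjugate_gradient`) and parameters (`tol`, `max_iter`) are hypothetical, not the library's API. No MPI or CUDA is used here.

```python
# Illustrative sketch only: a plain-Python conjugate gradient solver.
# All names are hypothetical and not taken from the paper's library.

def matvec(A, x):
    """Dense matrix-vector product, A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Approximately solve A x = b for symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop once the residual norm is small
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

Each iteration needs only one matrix-vector product and a few vector operations, and the matrix is never factorized, which is why methods of this kind avoid the memory cost of a direct factorization. In a hybrid setting, the matrix-vector product and the dot products are the natural candidates for distribution across processes and offloading to a GPU, though how the paper's library does this is not shown in the text above.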

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Matrix Theory and Algorithms · Numerical Methods and Algorithms