Efficient Interleaved Batch Matrix Solvers for CUDA

Andrew Gloster; Enda Carroll; Miguel Bustamante; Lennon O'Naraigh

arXiv:1909.04539·cs.DC·September 13, 2019

TL;DR

This paper introduces a CUDA-based method for solving batches of tridiagonal and pentadiagonal systems that share one LHS matrix. It reduces memory usage and compute time, enabling larger problem sizes on a single GPU.

Contribution

It proposes a novel data access methodology that minimizes storage and enhances speed for batch matrix solvers sharing a common LHS matrix.

Findings

01 Reduced storage overhead by storing only one copy of the shared LHS matrix

02 Reduced compute time relative to cuThomasBatch and cuPentBatch

03 Enabled solving more systems on a single GPU
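The storage finding can be made concrete with a rough count of stored values. The sketch below is a back-of-the-envelope estimate, not a figure from the paper: the function name `values_stored`, the padding of every diagonal to length `n`, and the example sizes are all assumptions made for illustration.

```python
def values_stored(n, m, diagonals, shared_lhs):
    """Count stored values for m systems of size n.

    diagonals: 3 for tridiagonal, 5 for pentadiagonal.
    Assumption: every diagonal is padded to length n and each
    system has exactly one right-hand-side vector of length n.
    """
    lhs_copies = 1 if shared_lhs else m
    lhs = diagonals * n * lhs_copies
    rhs = n * m
    return lhs + rhs


# Hypothetical batch: 100,000 tridiagonal systems with 1,024 unknowns each.
n, m = 1024, 100_000
per_system = values_stored(n, m, diagonals=3, shared_lhs=False)
shared = values_stored(n, m, diagonals=3, shared_lhs=True)
ratio = per_system / shared
```

Under these assumptions the shared-LHS layout needs roughly a quarter of the storage for tridiagonal batches, since the right-hand sides dominate once the diagonals are stored only once. For pentadiagonal batches the same count gives a ratio approaching six.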

Abstract

In this paper we present a new data-access methodology for solving batches of tridiagonal and pentadiagonal systems that all share the same LHS matrix. Storing only one copy of this matrix significantly reduces storage overhead, and we show that it also reduces compute time. Together, these two results yield a more efficient implementation than the current state-of-the-art algorithms cuThomasBatch and cuPentBatch, allowing a greater number of systems to be solved on a single GPU.
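To illustrate the idea of one shared LHS with many right-hand sides, here is a minimal CPU sketch in Python of the tridiagonal case using the standard Thomas algorithm. It is not the authors' CUDA implementation. The function names, the choice to factor the LHS once up front, and the layout in which element `i` of system `j` lives at index `i * m + j` are assumptions made for illustration. The loop over `j` stands in for what would be one GPU thread per system, where the interleaved layout lets neighbouring threads touch neighbouring memory addresses.

```python
def factor_tridiagonal(a, b, c):
    """Forward elimination on the single shared LHS (no pivoting).

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[n-1] unused). All have length n.
    Returns the modified super-diagonal and the pivots.
    """
    n = len(b)
    c_prime = [0.0] * n
    denom = [0.0] * n
    denom[0] = b[0]
    c_prime[0] = c[0] / denom[0]
    for i in range(1, n):
        denom[i] = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom[i]
    return c_prime, denom


def solve_interleaved(a, c_prime, denom, rhs, m):
    """Solve m systems in place; rhs[i * m + j] is row i of system j."""
    n = len(denom)
    for j in range(m):  # on a GPU, each j would be handled by one thread
        rhs[j] = rhs[j] / denom[0]
        # Forward sweep: reuse the shared factorisation for every system.
        for i in range(1, n):
            rhs[i * m + j] = (rhs[i * m + j] - a[i] * rhs[(i - 1) * m + j]) / denom[i]
        # Back substitution.
        for i in range(n - 2, -1, -1):
            rhs[i * m + j] -= c_prime[i] * rhs[(i + 1) * m + j]
    return rhs


# Shared LHS: the 4x4 matrix with 2 on the diagonal and -1 beside it.
a = [0.0, -1.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [-1.0, -1.0, -1.0, 0.0]
c_prime, denom = factor_tridiagonal(a, b, c)

# Two right-hand sides, [0, 0, 0, 5] and [1, 0, 0, 1], stored interleaved.
rhs = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 5.0, 1.0]
x = solve_interleaved(a, c_prime, denom, rhs, m=2)
```

Only `a`, `c_prime`, and `denom` are stored for the LHS regardless of how many systems are solved. The sketch assumes the matrix needs no pivoting, which holds for diagonally dominant systems such as the one used here.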

Taxonomy

Topics: Matrix Theory and Algorithms · Parallel Computing and Optimization Techniques · Graph Theory and Algorithms