Pipelined Iterative Solvers with Kernel Fusion for Graphics Processing Units

Karl Rupp; Josef Weinbub; Ansgar Jüngel; Tibor Grasser

arXiv:1410.4054·cs.MS·November 7, 2016

TL;DR

This paper presents GPU implementations of pipelined iterative solvers that use extensive kernel fusion. They deliver the largest performance gains for small to medium-sized systems, which benefits transient problems in particular, and are competitive with existing GPU solver packages.

Contribution

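It demonstrates that pipelined formulations of iterative solvers, implemented with extensive kernel fusion, outperform conventional implementations of classical formulations on GPUs, especially for small to medium-sized systems. The CUDA and OpenCL implementations are freely available in ViennaCL.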

Findings

01

Significant performance gains for small to medium systems

02

On par with vendor-tuned implementations for very large systems

03

Especially beneficial for transient problems, where many small to medium-sized systems must be solved

Abstract

We revisit the implementation of iterative solvers on discrete graphics processing units and demonstrate the benefit of implementations using extensive kernel fusion for pipelined formulations over conventional implementations of classical formulations. The proposed implementations with both CUDA and OpenCL are freely available in ViennaCL and are shown to be competitive with or even superior to other solver packages for graphics processing units. Highest performance gains are obtained for small to medium-sized systems, while our implementations are on par with vendor-tuned implementations for very large systems. Our results are especially beneficial for transient problems, where many small to medium-sized systems instead of a single big system need to be solved.
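
To illustrate what kernel fusion means here, the following is a minimal sketch in plain Python, not the ViennaCL code. It assumes a conjugate-gradient-style step in which the solution and residual vectors are updated and the residual norm is then computed. The function names `unfused_update` and `fused_update` are hypothetical. The unfused version makes three passes over the data, which on a GPU would correspond to three kernel launches; the fused version does the same work in a single pass.

```python
def unfused_update(x, r, p, Ap, alpha):
    """Three separate passes over the vectors, mimicking three kernel launches."""
    x_new = [xi + alpha * pi for xi, pi in zip(x, p)]        # pass 1: x + alpha * p
    r_new = [ri - alpha * api for ri, api in zip(r, Ap)]     # pass 2: r - alpha * Ap
    rr = sum(ri * ri for ri in r_new)                        # pass 3: inner product <r, r>
    return x_new, r_new, rr


def fused_update(x, r, p, Ap, alpha):
    """One pass doing both updates and the reduction, mimicking one fused kernel."""
    x_new, r_new, rr = [], [], 0.0
    for xi, ri, pi, api in zip(x, r, p, Ap):
        ri_new = ri - alpha * api          # residual update
        x_new.append(xi + alpha * pi)      # solution update
        r_new.append(ri_new)
        rr += ri_new * ri_new              # accumulate <r, r> while r is still at hand
    return x_new, r_new, rr
```

Both functions return the same values; the difference is that the fused variant reads and writes each vector entry once and would need only one launch. Since launch overhead and memory traffic dominate when the vectors are short, this is consistent with the abstract's observation that the gains are highest for small to medium-sized systems.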
