Pipelined Iterative Solvers with Kernel Fusion for Graphics Processing Units
Karl Rupp, Josef Weinbub, Ansgar Jüngel, Tibor Grasser

TL;DR
This paper presents GPU implementations of pipelined iterative solvers that use extensive kernel fusion. The largest performance gains occur for small to medium-sized systems, which benefits transient problems, and the implementations are competitive with or superior to existing GPU solver packages.
Contribution
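It shows that pipelined formulations of iterative solvers, implemented with extensive kernel fusion, outperform conventional implementations of classical formulations on discrete GPUs, especially for small to medium-sized systems. The CUDA and OpenCL implementations are freely available in ViennaCL.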
Findings
Highest performance gains for small to medium-sized systems
On par with vendor-tuned implementations for very large systems
Especially beneficial for transient problems, where many small to medium-sized systems are solved instead of a single large one
Abstract
We revisit the implementation of iterative solvers on discrete graphics processing units and demonstrate the benefit of implementations using extensive kernel fusion for pipelined formulations over conventional implementations of classical formulations. The proposed implementations with both CUDA and OpenCL are freely available in ViennaCL and are shown to be competitive with or even superior to other solver packages for graphics processing units. Highest performance gains are obtained for small to medium-sized systems, while our implementations are on par with vendor-tuned implementations for very large systems. Our results are especially beneficial for transient problems, where many small to medium-sized systems instead of a single big system need to be solved.
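To illustrate what kernel fusion means in this context, the following is a minimal CPU-side sketch in Python. It is not the ViennaCL CUDA or OpenCL code, and the abstract does not state which operations are fused; the vector updates of a conjugate gradient step are assumed here purely as an example, and the function names are hypothetical. The unfused variant makes three passes over the vectors, as three separate kernel launches would, while the fused variant performs the same arithmetic in a single pass.

```python
def cg_update_unfused(x, r, p, Ap, alpha):
    """Three separate passes, standing in for three kernel launches."""
    n = len(x)
    for i in range(n):      # pass 1: x <- x + alpha * p
        x[i] += alpha * p[i]
    for i in range(n):      # pass 2: r <- r - alpha * Ap
        r[i] -= alpha * Ap[i]
    rr = 0.0
    for i in range(n):      # pass 3: rr <- <r, r>
        rr += r[i] * r[i]
    return rr


def cg_update_fused(x, r, p, Ap, alpha):
    """One pass: both vector updates and the dot product share a loop."""
    rr = 0.0
    for i in range(len(x)):
        x[i] += alpha * p[i]
        r[i] -= alpha * Ap[i]
        rr += r[i] * r[i]   # uses the freshly updated r[i]
    return rr
```

On a GPU, each separate pass typically costs one kernel launch and one round trip through global memory, so merging passes plausibly matters most when the vectors are small and launch overhead dominates. This is consistent with the abstract's statement that the gains are highest for small to medium-sized systems.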
