TL;DR
This paper ports batched iterative solvers to Intel GPUs using SYCL and optimizes them there, reporting an average 2.4x speedup on Intel GPU Max 1550s over the authors' previous CUDA implementation on NVIDIA H100 GPUs for PeleLM application inputs.
Contribution
The paper introduces a SYCL-based implementation of batched iterative solvers optimized for Intel GPUs, enhancing performance and portability for scientific computing.
Findings
Achieved an average 2.4x speedup on Intel GPU Max 1550s (Ponte Vecchio) over the previous CUDA implementation on NVIDIA H100 GPUs for the PeleLM application inputs.
Successfully integrated the solvers into the Ginkgo library for real-world scientific applications.
Demonstrated the viability of SYCL for high-performance scientific computing on Intel GPU architectures.
Abstract
Batched linear solvers play a vital role in computational sciences, especially in the fields of plasma physics and combustion simulations. With the imminent deployment of the Aurora Supercomputer and other upcoming systems equipped with Intel GPUs, there is a compelling demand to expand the capabilities of these solvers for Intel GPU architectures. In this paper, we present our efforts in porting and optimizing the batched iterative solvers on Intel GPUs using the SYCL programming model. On the Intel GPU Max 1550s (Ponte Vecchio GPUs), these new solvers outperform our previous CUDA implementation on NVIDIA H100 GPUs by an average of 2.4x for the PeleLM application inputs. The batched solvers are ready for production use in real-world scientific applications through the Ginkgo library, complementing the performance portability of the batched…
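To illustrate what "batched iterative solver" means, here is a minimal sketch in plain Python. It is not the paper's SYCL code and does not use Ginkgo's API. Jacobi iteration is a stand-in chosen for brevity (this summary does not name the solvers the paper ports), and the function names `jacobi_single` and `batched_jacobi` are hypothetical. The point it shows is that each small system in the batch is solved independently and stops at its own iteration count.

```python
def jacobi_single(A, b, tol=1e-10, max_iter=500):
    """Solve one small dense system A x = b with Jacobi iteration.

    Returns (x, iterations_used). Assumes A has a nonzero diagonal and
    that Jacobi converges for it (e.g. A is strictly diagonally dominant).
    """
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        # Jacobi update: every component uses only the previous iterate.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Max-norm of the residual b - A x_new for this system only.
        res = max(
            abs(b[i] - sum(A[i][j] * x_new[j] for j in range(n)))
            for i in range(n)
        )
        x = x_new
        if res < tol:
            return x, it
    return x, max_iter


def batched_jacobi(batch_A, batch_b, tol=1e-10, max_iter=500):
    """Solve many independent small systems.

    This sketch loops over the batch sequentially; a GPU implementation
    would process the independent systems in parallel instead.
    """
    return [jacobi_single(A, b, tol, max_iter) for A, b in zip(batch_A, batch_b)]


# Two independent 2x2 systems with different convergence behaviour.
batch_A = [
    [[4.0, 1.0], [1.0, 3.0]],
    [[10.0, 1.0], [1.0, 10.0]],
]
batch_b = [[1.0, 2.0], [11.0, 11.0]]

results = batched_jacobi(batch_A, batch_b)
for k, (x, its) in enumerate(results):
    print(f"system {k}: x = {x}, iterations = {its}")
```

The second system is more strongly diagonally dominant, so it converges in fewer iterations than the first. Tracking convergence per system, rather than iterating the whole batch until the slowest system finishes, is the property this sketch is meant to highlight.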
